Similar Documents
20 similar documents found (search time: 31 ms)
1.
Clustering is often useful for analyzing and summarizing information within large datasets. Model-based clustering methods have been found to be effective for determining the number of clusters, dealing with outliers, and selecting the best clustering method in datasets that are small to moderate in size. For large datasets, current model-based clustering methods tend to be limited by memory and time requirements and the increasing difficulty of maximum likelihood estimation. They may fit too many clusters in some portions of the data and/or miss clusters containing relatively few observations. We propose an incremental approach for data that can be processed as a whole in memory, which is relatively efficient computationally and has the ability to find small clusters in large datasets. The method starts by drawing a random sample of the data, selecting and fitting a clustering model to the sample, and extending the model to the full dataset by additional EM iterations. New clusters are then added incrementally, initialized with the observations that are poorly fit by the current model. We demonstrate the effectiveness of this method by applying it to simulated data, and to image data where its performance can be assessed visually.
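A minimal sketch of the incremental idea described in this abstract, not the authors' implementation: fit a Gaussian mixture to a random sample, extend it to the full data with further EM iterations, then add components seeded by the worst-fit points. All function names, parameter values, and the use of scikit-learn's GaussianMixture are illustrative assumptions.

```python
# Sketch: incremental mixture-model clustering (sample -> extend -> add clusters).
import numpy as np
from sklearn.mixture import GaussianMixture

def incremental_gmm(X, sample_size=2000, n_init_components=3, n_new=2, rng=None):
    rng = np.random.default_rng(rng)
    sample = X[rng.choice(len(X), size=min(sample_size, len(X)), replace=False)]

    # Step 1: fit an initial clustering model to the sample.
    gmm = GaussianMixture(n_components=n_init_components, covariance_type="full")
    gmm.fit(sample)

    # Step 2: extend to the full dataset with additional EM iterations,
    # warm-started from the sample fit.
    gmm = GaussianMixture(n_components=n_init_components, covariance_type="full",
                          means_init=gmm.means_, weights_init=gmm.weights_)
    gmm.fit(X)

    # Step 3: add clusters one at a time, initialized from the observations
    # that are poorly fit (lowest log-density) under the current model.
    for _ in range(n_new):
        badness = -gmm.score_samples(X)
        worst = X[np.argsort(badness)[-max(10, len(X) // 100):]]
        means = np.vstack([gmm.means_, worst.mean(axis=0)])
        gmm = GaussianMixture(n_components=len(means), covariance_type="full",
                              means_init=means)
        gmm.fit(X)
    return gmm
```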

2.
In this article, we study an explicit scheme for the solution of the sine‐Gordon equation when the space discretization is carried out by an overlapping multidomain pseudo‐spectral technique. By using differentiation matrices, the equation is reduced to a nonlinear system of ordinary differential equations in time that can be discretized with the explicit fourth‐order Runge–Kutta method. To achieve approximation with high accuracy in large domains, the number of space grid points must be large enough. This yields very large and full matrices in the pseudo‐spectral method, which causes large memory requirements. The domain decomposition approach provides sparsity in the matrices obtained after the discretization, and this property reduces storage for large matrices and provides economical ways of performing matrix–vector multiplications. Therefore, we propose a multidomain pseudo‐spectral method for the numerical simulation of the sine‐Gordon equation in large domains. Test examples are given to demonstrate the accuracy and capability of the proposed method. Numerical experiments show that the multidomain scheme has excellent long‐time numerical behavior for the sine‐Gordon equation in one and two dimensions. Copyright © 2013 John Wiley & Sons, Ltd.
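A single-domain sketch of the time stepping described above, not the overlapping multidomain scheme: a Fourier pseudo-spectral second derivative on a periodic interval and classical RK4 for u_tt = u_xx - sin(u), written as a first-order system. The domain length, step size, and grid are illustrative assumptions.

```python
# Sketch: Fourier pseudo-spectral space discretization + RK4 time stepping
# for the sine-Gordon equation on a single periodic domain.
import numpy as np

def solve_sine_gordon(u0, v0, L=40.0, dt=1e-3, n_steps=1000):
    n = len(u0)
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)        # spectral wavenumbers

    def uxx(u):                                          # pseudo-spectral u_xx
        return np.real(np.fft.ifft(-(k ** 2) * np.fft.fft(u)))

    def rhs(state):                                      # (u, v) with v = u_t
        u, v = state
        return np.array([v, uxx(u) - np.sin(u)])

    state = np.array([u0, v0], dtype=float)
    for _ in range(n_steps):                             # classical RK4
        k1 = rhs(state)
        k2 = rhs(state + 0.5 * dt * k1)
        k3 = rhs(state + 0.5 * dt * k2)
        k4 = rhs(state + dt * k3)
        state = state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return state[0], state[1]
```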

3.
The effect of the oil price time series on the long-run properties of Vector AutoRegressive (VAR) models for price levels and import demand is investigated. As the oil price variable is assumed to be weakly exogenous for the long-run parameters, a cointegration testing procedure allowing for weakly exogenous variables is developed using an LU decomposition of the long-run multiplier matrix. The likelihood-based cointegration test statistics, Wald, Likelihood Ratio and Lagrange Multiplier, are constructed and their limiting distributions derived. Using these tests, we find that incorporating the oil price in a model for the domestic or import price level of seven industrialized countries decreases the long-run memory of the inflation rate. Second, we find that the results for import demand can be classified with respect to the oil-importing or oil-exporting status of the specific country. The result for Japan is typical as its import price is not influenced by GNP in the long run, which is the case for all other countries.

4.
A meaningful rank, as well as efficient methods for computing such a rank, is necessary in many areas of application. Major methodologies for ranking often exploit principal eigenvectors. Kleinberg's HITS model is one such methodology. The standard approach for computing the HITS rank is the power method. Unlike PageRank calculations, where many acceleration schemes have been proposed, relatively few works exist on accelerating HITS rank calculation. This is mainly because the power method often works quite well in the HITS setting. However, there are cases where the power method is ineffective; moreover, a systematic acceleration over the power method is desirable even when the power method works well. We propose a practical acceleration scheme for HITS rank calculations based on the filtered power method with adaptive Chebyshev polynomials. For cases where the gap-ratio is below 0.85, for which the power method works well, our scheme is about twice as fast as the power method. For cases where the gap-ratio is unfavorable for the power method, our scheme can provide significant speedup. When the ranking problems are of very large scale, even a single matrix–vector product can be expensive, and acceleration is then highly necessary. The scheme we propose provides a consistent reduction in the number of matrix–vector products as well as in CPU time over the power method, with little memory overhead.
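A sketch of the baseline power iteration for HITS that the abstract starts from (the authority vector is the dominant eigenvector of AᵀA, the hub vector follows as A·a); the adaptive Chebyshev filtering that provides the acceleration is not reproduced here. Names and tolerances are illustrative.

```python
# Sketch: plain power method for HITS authority/hub scores.
import numpy as np

def hits_power_method(A, tol=1e-10, max_iter=1000):
    n = A.shape[1]
    a = np.ones(n) / np.sqrt(n)          # authority scores
    for _ in range(max_iter):
        a_new = A.T @ (A @ a)            # one application of A^T A
        a_new /= np.linalg.norm(a_new)
        if np.linalg.norm(a_new - a) < tol:
            a = a_new
            break
        a = a_new
    h = A @ a                            # hub scores
    return a / a.sum(), h / h.sum()
```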

5.
In this paper we consider the dynamic behavior of a firm subject to environmental regulation. As a social planner, the government wants to reduce the level of pollution. To reach that aim it can, among other instruments, set an upper limit on the firm's polluting emissions. The paper determines how this policy instrument influences the firm's decisions concerning investment, abatement efforts, and the choice of whether to leave some capacity unused. The abatement process is modeled as input substitution rather than end-of-pipe. Using standard optimal control theory to determine the firm's dynamic investment decisions, it turns out that it is always optimal to approach a long-run optimal level of capital. In some cases this equilibrium is reached within finite time, but usually it is approached asymptotically. Different scenarios are considered, ranging from an attractive clean input to an unattractive clean input, and from a mild emission limit to a very tight one. It is shown that for large capital stocks, and/or when the marginal cash flow per unit of emissions is larger for the dirty input than for the clean input, it can be optimal to leave some production capacity unused. Also, since the convex installation costs make it attractive to spread investments over time, it can happen that investment in productive capital is positive although capacity remains unused.

6.
The Gaussian geostatistical model has been widely used for modeling spatial data. However, this model suffers from a severe difficulty in computation: it requires users to invert a large covariance matrix. This is infeasible when the number of observations is large. In this article, we propose an auxiliary lattice-based approach for tackling this difficulty. By introducing an auxiliary lattice to the space of observations and defining a Gaussian Markov random field on the auxiliary lattice, our model completely avoids the requirement of matrix inversion. It is remarkable that the computational complexity of our method is only O(n), where n is the number of observations. Hence, our method can be applied to very large datasets with reasonable computational (CPU) times. The numerical results indicate that our model can approximate Gaussian random fields very well in terms of predictions, even for those with long correlation lengths. For real data examples, our model can generally outperform conventional Gaussian random field models in both prediction errors and CPU times. Supplemental materials for the article are available online.
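A sketch of the sparsity idea only, not the authors' auxiliary-lattice model: build a sparse CAR-type precision matrix for a Gaussian Markov random field on a regular lattice and solve a linear system with it; with sparse factorizations no dense covariance matrix is ever formed or inverted. The precision parameterization and values are illustrative assumptions.

```python
# Sketch: sparse precision matrix of a lattice GMRF, solved without dense inversion.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def car_precision(m, kappa=0.1):
    """Precision Q = kappa*I + D - W for a 4-neighbour m x m lattice graph."""
    grid = sp.eye(m, format="csr")
    chain = sp.diags([np.ones(m - 1), np.ones(m - 1)], [-1, 1], format="csr")
    W = sp.kron(grid, chain) + sp.kron(chain, grid)    # lattice adjacency
    D = sp.diags(np.asarray(W.sum(axis=1)).ravel())
    return (kappa * sp.eye(m * m) + D - W).tocsc()

m = 100                                   # 10,000 lattice sites
Q = car_precision(m)
b = np.random.default_rng(0).normal(size=m * m)
x = spsolve(Q, b)                         # sparse solve, no dense covariance inverse
```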

7.
Semiparametric estimation of the long-memory parameter of financial time series is mostly carried out in the frequency domain, and bandwidth selection is an essential step. Different bandwidths may yield markedly different estimates of the long-memory parameter, or even contradictory conclusions, which in turn affects the assessment of the stationarity of the series. This paper proposes a two-step method for selecting the bandwidth in semiparametric long-memory estimation of financial time series and for then estimating the long-memory parameter. First, to overcome the semiparametric methods' neglect of short-run structure, information criteria are used to identify the short-memory structure of an ARFIMA(p,d,q) process. Second, a short-memory model is fitted to the differenced series, and the bandwidth and the long-memory parameter estimate are chosen according to the goodness of fit. Numerical simulations show that, taking the smallest root mean squared error of the long-memory parameter estimate as the criterion, the two-step method outperforms the alternatives. In an empirical test on daily realized volatility of the SSE 50 Index, the two-step method has the smallest prediction error among long-memory models; compared with short-memory models, it has an advantage at medium-term forecast horizons.
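A sketch of a standard log-periodogram (GPH-type) semiparametric estimator of the long-memory parameter d, showing where the bandwidth m enters and how the estimate moves with it; the two-step bandwidth selection proposed in the abstract is not reproduced. The simulated series and bandwidth values are illustrative assumptions.

```python
# Sketch: GPH-type log-periodogram regression; the bandwidth m controls how many
# low frequencies enter the regression, and the estimate of d depends on it.
import numpy as np

def gph_estimate(x, m):
    """Regress log I(lambda_j) on log(4 sin^2(lambda_j/2)), j = 1..m; slope = -d."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    freqs = 2.0 * np.pi * np.arange(1, m + 1) / n
    dft = np.fft.fft(x - x.mean())[1:m + 1]
    periodogram = (np.abs(dft) ** 2) / (2.0 * np.pi * n)
    regressor = np.log(4.0 * np.sin(freqs / 2.0) ** 2)
    X = np.column_stack([np.ones(m), regressor])
    beta, *_ = np.linalg.lstsq(X, np.log(periodogram), rcond=None)
    return -beta[1]

# Different bandwidths can give noticeably different estimates:
rng = np.random.default_rng(1)
x = 0.01 * np.cumsum(rng.normal(size=2000)) + rng.normal(size=2000)
print([round(gph_estimate(x, m), 3) for m in (20, 50, 100)])
```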

8.
Approximation schemes for optimal compression with static and sliding dictionaries that can run on a simple array of processors with distributed memory and no interconnections are presented. These approximation algorithms can be implemented on both small- and large-scale parallel systems. The sliding dictionary method requires large files on large-scale systems. As far as lossless image compression is concerned, arithmetic encoders enable the best lossless compressors, but they are often ruled out because they are too complex. Storer extended dictionary text compression to bi-level images to avoid arithmetic encoders (BLOCK MATCHING). We were able to partition an image into up to a hundred areas and to apply the BLOCK MATCHING heuristic independently to each area with no loss of compression effectiveness. Therefore, the approach is suitable for a small-scale parallel system at no communication cost. On the other hand, bi-level image compression seems to require communication on large-scale systems. With regard to grey-scale and color images, parallelizable lossless image compression (PALIC) is a highly parallelizable and scalable lossless compressor, since it is applied independently to blocks of 8 × 8 pixels. We experimented with the BLOCK MATCHING and PALIC heuristics using up to 32 processors of a machine with 256 Intel Xeon 3.06 GHz processors, on a test set of large topographic bi-level images and color images in RGB format. We obtained the expected speed-up of the compression and decompression times, achieving parallel running times about 25 times faster than the sequential ones. Finally, scalable algorithms computing static and sliding-dictionary optimal text compression on an exclusive-read, exclusive-write shared-memory parallel machine are presented. On the same model, compression by block matching of bi-level images is shown, which can be implemented on a full binary tree architecture under some realistic assumptions with no scalability issues.
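A sketch of the block-parallel idea only: partition the input into independent blocks and compress each on a separate worker with no inter-processor communication. It uses zlib as a stand-in compressor, not the BLOCK MATCHING or PALIC heuristics of the abstract; block counts and worker counts are illustrative.

```python
# Sketch: communication-free block-parallel compression with a stand-in codec.
import zlib
from multiprocessing import Pool

def compress_block(block: bytes) -> bytes:
    return zlib.compress(block, level=9)

def parallel_compress(data: bytes, n_blocks: int = 8, workers: int = 4):
    size = (len(data) + n_blocks - 1) // n_blocks
    blocks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(processes=workers) as pool:
        compressed = pool.map(compress_block, blocks)   # each block is independent
    return compressed

if __name__ == "__main__":
    payload = bytes(1_000_000)                          # dummy compressible payload
    parts = parallel_compress(payload)
    restored = b"".join(zlib.decompress(p) for p in parts)
    assert restored == payload
```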

9.
The semiparametric proportional odds model for survival data is useful when mortality rates of different groups converge over time. However, fitting the model by maximum likelihood proves computationally cumbersome for large datasets because the number of parameters exceeds the number of uncensored observations. We present here an alternative to the standard Newton-Raphson method of maximum likelihood estimation. Our algorithm, an example of a minorization-maximization (MM) algorithm, is guaranteed to converge to the maximum likelihood estimate whenever it exists. For large problems, both the algorithm and its quasi-Newton accelerated counterpart outperform Newton-Raphson by more than two orders of magnitude.
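A toy illustration of the MM principle on a much simpler problem than the proportional odds model above: computing a sample median by repeatedly majorizing |y_i - θ| with a quadratic, so that each surrogate update is a weighted mean and the objective improves monotonically, mirroring the monotone convergence property mentioned in the abstract. This is a generic textbook-style MM example, not the paper's algorithm.

```python
# Sketch: MM (majorize-minimize) iteration for the sample median.
import numpy as np

def mm_median(y, n_iter=100, eps=1e-8):
    y = np.asarray(y, dtype=float)
    theta = y.mean()                          # any starting value works
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(y - theta), eps)   # quadratic majorizer weights
        theta = np.sum(w * y) / np.sum(w)              # minimize the surrogate
    return theta

y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
print(mm_median(y), np.median(y))             # both approach 3.0
```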

10.
Adjacency constraints along with even-flow harvest constraints are important in long-term forest planning. Simulated annealing (SA) has previously been applied successfully to such constraints. The objective of this paper was to assess the performance of SA under three new methods of introducing biased probabilities in the management unit (MU) selection and to compare them to the conventional method that assumes uniform probabilities. The new methods were implemented as a search vector approach based on the number of treatment schedules describing sequences of silvicultural treatments over time and on the standard deviation of net present value within MUs (Methods 2 and 3, respectively), and by combining the two approaches (Method 4). We constructed three hundred hypothetical forests (datasets) for three different landscapes characterized by different initial age class distributions (young, normal and old). Each dataset encompassed 1600 management units. The evaluation of the methods was done by means of objective function values, first feasible iteration, and time consumption. Introducing a bias in the MU selection improves solutions compared to the conventional method (Method 1); however, the new methods generally require more computational time. Method 4 is the best alternative because, for a large share of the datasets, it produced the best average and maximum objective function values and had lower time consumption than Methods 2 and 3. Although Method 4 performed very well, Methods 2 and 3 should not be neglected, because for a considerable number of datasets the maximum objective function values were obtained by these methods.
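A generic simulated annealing skeleton showing only where biased (non-uniform) management-unit selection probabilities enter the move generation; the objective function, the treatment schedules, and the weight definitions behind Methods 2-4 are placeholders, not the authors' model. All names and parameter values are assumptions.

```python
# Sketch: SA with weighted (biased) selection of the management unit to perturb.
import numpy as np

def simulated_annealing(objective, schedules_per_unit, weights,
                        n_iter=100_000, t0=1.0, cooling=0.9999, rng=None):
    rng = np.random.default_rng(rng)
    n_units = len(schedules_per_unit)
    probs = np.asarray(weights, dtype=float)
    probs /= probs.sum()                                    # biased selection probabilities

    state = [rng.integers(k) for k in schedules_per_unit]   # one schedule per MU
    best, best_val = list(state), objective(state)
    value, temp = best_val, t0

    for _ in range(n_iter):
        mu = rng.choice(n_units, p=probs)                   # biased MU selection
        old = state[mu]
        state[mu] = rng.integers(schedules_per_unit[mu])    # propose a new schedule
        new_val = objective(state)
        if new_val >= value or rng.random() < np.exp((new_val - value) / temp):
            value = new_val
            if value > best_val:
                best, best_val = list(state), value
        else:
            state[mu] = old                                 # reject the move
        temp *= cooling
    return best, best_val
```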

11.
Efficient subroutines for dense matrix computations have recently been developed and are available on many high-speed computers. On some computers the speed of many dense matrix operations is close to the peak performance. For sparse matrices, storage and operations can be saved by storing and operating on the nonzero elements only. However, the price is a considerable degradation of computational speed on supercomputers (due to the use of indirect addressing, the need to insert new nonzeros into the sparse storage scheme, the lack of data locality, etc.). On many high-speed computers a dense matrix technique is preferable to a sparse matrix technique when the matrices are not large, because the high computational speed fully compensates for the disadvantages of using more arithmetic operations and more storage. For very large matrices the computations must be organized as a sequence of tasks, in each of which a dense block is treated. The blocks must be large enough to achieve a high computational speed, but not too large, because this would lead to a large increase in both computing time and storage. A special “locally optimized reordering algorithm” (LORA) is described, which reorders the matrix so that dense blocks can be constructed and treated with standard software, say LAPACK or NAG. These ideas are implemented for linear least-squares problems. The rectangular matrices that appear in such problems are decomposed by an orthogonal method. Results obtained on a CRAY C92A computer demonstrate the efficiency of using large dense blocks.

12.
13.
A new Lagrangian relaxation (LR) approach is developed for job shop scheduling problems. In the approach, operation precedence constraints rather than machine capacity constraints are relaxed. The relaxed problem is decomposed into single or parallel machine scheduling subproblems. These subproblems, which are NP-complete in general, are approximately solved by using fast heuristic algorithms. The dual problem is solved by using a recently developed “surrogate subgradient method” that allows approximate optimization of the subproblems. Since the algorithms for subproblems do not depend on the time horizon of the scheduling problems and are very fast, our new LR approach is efficient, particularly for large problems with long time horizons. For these problems, the machine decomposition-based LR approach requires much less memory and computation time as compared to a part decomposition-based approach as demonstrated by numerical testing.

14.
In this paper we concentrate on testing for multiple changes in the mean of a series of independent random variables. The suggested method applies a maximum-type test statistic. Our primary focus is on the effective calculation of critical values for very large sample sizes comprising (tens of) thousands of observations and a moderate to large number of segments. To that end, Monte Carlo simulations and a modified Bellman’s principle of optimality are used. It is shown that computer memory becomes a critical bottleneck in solving a problem of this size; thus, minimizing the memory requirements and choosing an appropriate order of calculations are the keys to success. In addition, a formula is presented that can be used to obtain approximate asymptotic critical values based on the theory of the exceedance probability of Gaussian fields over a high level.
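A sketch of the Monte Carlo part only: simulate the null distribution of the classical max-type CUSUM statistic for a single change in mean and read off critical values as quantiles. The Bellman-type dynamic programming for multiple changes and the memory-saving ordering described above are not reproduced; sample size, number of replications, and the single-change statistic itself are illustrative assumptions.

```python
# Sketch: Monte Carlo critical values for a max-type CUSUM statistic under H0.
import numpy as np

def max_cusum(x, sigma=1.0):
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = np.cumsum(x)
    bridge = s[:-1] - (np.arange(1, n) / n) * s[-1]     # CUSUM process
    return np.max(np.abs(bridge)) / (sigma * np.sqrt(n))

def mc_critical_value(n, alpha=0.05, n_sim=10_000, seed=0):
    rng = np.random.default_rng(seed)
    stats = [max_cusum(rng.standard_normal(n)) for _ in range(n_sim)]
    return np.quantile(stats, 1.0 - alpha)

# Should be close to the asymptotic sup-Brownian-bridge 95% quantile (~1.36).
print(mc_critical_value(n=1000))
```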

15.
A new shift-adaptive meshfree method for solving a class of time-dependent partial differential equations (PDEs) in a bounded one-dimensional domain with moving boundaries and nonhomogeneous boundary conditions is introduced. The radial basis function (RBF) collocation method is combined with a finite difference scheme because, unlike with Kansa's method, nonlinear PDEs can then be converted to a system of linear equations. The grid-free property of the RBF method is exploited, and a new adaptive algorithm is used to choose the location of the collocation points in the first time step only. In fact, instead of applying the adaptive algorithm on the entire domain of the problem (as with other existing adaptive algorithms), the new adaptive algorithm is applied only over time steps. Furthermore, because of the radial property of the RBFs, the adaptive strategy is applied only in the first time step; in the other time steps, the adaptive nodes (obtained in the first time step) are shifted. Thus, only one small system of linear equations must be solved (by the LU decomposition method), rather than a large linear or nonlinear system of equations as in Kansa's method (adaptive strategy applied to the entire domain), or a large number of small linear systems of equations when the adaptive strategy is applied at each time step. This saves considerable time and memory. Stability of the scheme is analyzed using the von Neumann method. Results show that the new method is capable of reducing the number of nodes in the grid without compromising the accuracy of the solution, and that the adaptive grading scheme is effective in localizing oscillations due to sharp gradients or discontinuities in the solution. The efficiency and effectiveness of the proposed procedure are examined by adaptively solving two difficult benchmark problems, a regularized long-wave equation and a Korteweg-de Vries problem. © 2016 Wiley Periodicals, Inc. Numer Methods Partial Differential Eq 32: 1622–1646, 2016
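A minimal sketch of RBF collocation itself: a multiquadric collocation matrix and the corresponding first-derivative (differentiation) matrix on scattered 1-D nodes. It does not implement the shift-adaptive, moving-boundary scheme of the abstract; the shape parameter and node set are illustrative assumptions.

```python
# Sketch: multiquadric RBF differentiation matrix D such that D f(nodes) ~ f'(nodes).
import numpy as np

def mq(r, c=0.5):
    return np.sqrt(r ** 2 + c ** 2)                 # multiquadric basis

def mq_dx(dx, c=0.5):
    return dx / np.sqrt(dx ** 2 + c ** 2)           # d/dx of the multiquadric

def rbf_derivative_matrix(nodes, c=0.5):
    dx = nodes[:, None] - nodes[None, :]
    A = mq(np.abs(dx), c)                           # collocation (interpolation) matrix
    B = mq_dx(dx, c)                                # derivatives of the basis at the nodes
    return B @ np.linalg.solve(A, np.eye(len(nodes)))

nodes = np.sort(np.random.default_rng(2).uniform(0.0, 1.0, 25))
D = rbf_derivative_matrix(nodes)
err = np.max(np.abs(D @ np.sin(2 * nodes) - 2 * np.cos(2 * nodes)))
print(err)                                          # derivative error; should be small
```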

16.
A commonly used method of monitoring the condition of rail track is to run an inspection vehicle over the track at intervals of about 3 months. Measurements of several geometric properties of the track are automatically recorded about every 900 mm, resulting in long sequences of data (signals) arising from runs of up to 100 km. Condition monitoring is done by comparing the results of a current run with those of a previously recorded reference run. Before this can be done, the two signals need to be aligned so that corresponding distance measurements in each signal actually refer to the same point on the track. A procedure for matching the two signals is presented, which has at its heart a dynamic programming method. The procedure is demonstrated on data from rail tracks in Australia.
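A sketch of the dynamic programming alignment idea at the heart of such a procedure, in the spirit of dynamic time warping: build a cumulative cost table and backtrack to obtain a warping path that maps samples of the current signal onto the reference. It is a generic illustration, not the paper's exact matching procedure.

```python
# Sketch: dynamic-programming (DTW-style) alignment of two signals.
import numpy as np

def dp_align(reference, current):
    n, m = len(reference), len(current)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (reference[i - 1] - current[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # backtrack to recover the alignment path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return cost[n, m], path[::-1]
```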

17.
In research areas such as clinical drug trials, survival analysis, and reliability statistics, studies often have a fixed duration owing to time and cost considerations. Because a study is forced to end when it expires, or because some patients withdraw partway through the trial, the resulting data are usually censored. For censored data we adopt an unbiased-transformation approach, whose main advantage is that the resulting estimators have explicit (closed-form) solutions. We first discuss the mean-square consistency of the regression coefficient estimators in a linear regression model under longitudinal right-censored data, and then extend the conclusion to contaminated linear models, obtaining strongly consistent estimators of the contamination coefficient and the regression coefficients.

18.
Heuristics for Large Constrained Vehicle Routing Problems
This paper presents a heuristic for solving very large routing problems (thousands of customers and hundreds of vehicles) with side constraints such as time windows. When applied to traditional benchmarks (Solomon's), we obtain high-quality results with short solution times (a few seconds). We also introduce an LDS (Limited Discrepancy Search) variation that produces state-of-the-art results. The heart of this heuristic is a combination of a look-ahead insertion algorithm, an incremental local optimization scheme, and a constraint solver for constrained traveling salesman problems. The incrementality means that, instead of visiting some large neighborhood after an initial solution has been found, a limited number of moves is examined on the partial solution after each insertion. This incremental version is not only faster, it also yields better results than applying local optimization once a full solution has been built. We also show how additional constraints can be used to guide the insertion process. Because of its use of separate CP (Constraint Programming) modules, this method is flexible and may be used to solve large dispatching problems that include many additional constraints, such as setup times (asymmetrical distances) or skill matching.
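A sketch of the insertion step only: repeatedly insert the customer whose cheapest feasible insertion is cheapest overall, subject to a single vehicle-capacity side constraint. The constraint solver, time windows, incremental local optimization, and LDS variation described above are not reproduced; the data layout and function names are assumptions.

```python
# Sketch: cheapest-insertion construction for a capacity-constrained routing toy.
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def cheapest_insertion(depot, customers, demand, capacity, n_vehicles):
    routes = [[depot, depot] for _ in range(n_vehicles)]   # each route starts/ends at depot
    loads = [0.0] * n_vehicles
    unrouted = set(customers)
    while unrouted:
        best = None     # (cost increase, customer, route index, position)
        for c in unrouted:
            for r, route in enumerate(routes):
                if loads[r] + demand[c] > capacity:
                    continue
                for pos in range(1, len(route)):
                    delta = (dist(route[pos - 1], c) + dist(c, route[pos])
                             - dist(route[pos - 1], route[pos]))
                    if best is None or delta < best[0]:
                        best = (delta, c, r, pos)
        if best is None:
            raise ValueError("no feasible insertion left")
        _, c, r, pos = best
        routes[r].insert(pos, c)
        loads[r] += demand[c]
        unrouted.discard(c)
    return routes
```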

19.
The modification of particle distributions by low amplitude magnetohydrodynamic modes is an important topic for magnetically confined plasmas. Low amplitude modes are known to be capable of producing significant modification of injected neutral beam profiles, and the same can be expected in burning plasmas for the alpha particle distributions. Flattening of a distribution due to phase mixing in an island or due to portions of phase space becoming stochastic is a process extremely rapid on the time scale of an experiment but still very long compared to the time scale of guiding center simulations. Thus it is very valuable to be able to locate significant resonances and to predict the final particle distribution produced by a given spectrum of magnetohydrodynamic modes. In this paper we introduce a new method of determining domains of phase space in which good surfaces do not exist and use this method for quickly determining the final state of the particle distribution without carrying out the full time evolution leading to it.

20.
This paper is concerned with the efficient solution of the linear systems of equations that arise from an adaptive-in-space, implicit-in-time discretisation of the Black-Scholes equation. These nonsymmetric systems are very large and sparse, so an iterative method will usually be the method of choice. However, such a method may require a large number of iterations to converge, particularly when the timestep used is large (which is often the case towards the end of a simulation that uses adaptive timestepping). An appropriate preconditioner is therefore desirable. In this paper we show that a very simple multigrid algorithm with standard components works well as a preconditioner for these problems. We analyse the eigenvalue spectrum of the multigrid iteration matrix for uniform grid problems and illustrate the method's efficiency in practice by considering the results of numerical experiments on both uniform grids and grids that use adaptivity in space.
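A sketch of the general idea of using a simple multigrid cycle as a preconditioner for a Krylov solver, shown here as a two-grid V-cycle with textbook components (weighted-Jacobi smoothing, full-weighting restriction, linear-interpolation prolongation) applied to a 1-D Poisson-type model matrix standing in for the discretised Black-Scholes operator. The model problem, grid sizes, and component choices are assumptions, not the paper's setup.

```python
# Sketch: a two-grid cycle used as a preconditioner for GMRES on a model problem.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, gmres, spsolve

def poisson_1d(n):
    """3-point finite-difference matrix on n interior points of (0, 1)."""
    return sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr") * (n + 1) ** 2

def two_grid_cycle(A, b, n_smooth=3, omega=2.0 / 3.0):
    n = A.shape[0]
    nc = (n - 1) // 2                    # assumes n = 2*nc + 1
    d = A.diagonal()
    x = np.zeros(n)
    for _ in range(n_smooth):            # pre-smoothing (weighted Jacobi)
        x += omega * (b - A @ x) / d
    r = b - A @ x
    rc = 0.25 * r[:-2:2] + 0.5 * r[1:-1:2] + 0.25 * r[2::2]   # full-weighting restriction
    ec = spsolve(poisson_1d(nc).tocsc(), rc)                  # exact coarse solve
    e = np.zeros(n)                                           # linear-interpolation prolongation
    e[1::2] = ec
    e[2:-1:2] = 0.5 * (ec[:-1] + ec[1:])
    e[0], e[-1] = 0.5 * ec[0], 0.5 * ec[-1]
    x += e
    for _ in range(n_smooth):            # post-smoothing
        x += omega * (b - A @ x) / d
    return x

n = 1023
A = poisson_1d(n)
b = np.ones(n)
M = LinearOperator((n, n), matvec=lambda r: two_grid_cycle(A, r), dtype=float)
x, info = gmres(A, b, M=M)
print(info, np.linalg.norm(A @ x - b))   # few iterations, small residual expected
```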
