Similar literature
 Found 20 similar documents; search took 31 ms
1.
In high-dimensional data modeling, Multivariate Adaptive Regression Splines (MARS) is a popular nonparametric regression technique used to define the nonlinear relationship between a response variable and the predictors with the help of splines. MARS uses piecewise linear functions for local fit and applies an adaptive procedure to select the number and location of breaking points (called knots). The function estimate is generated via a two-step procedure: forward selection and backward elimination. In the first step, a large number of local fits is obtained by selecting a large number of knots via a lack-of-fit criterion; in the second, the least contributing local fits or knots are removed. In the conventional adaptive spline procedure, knots are selected from the set of all distinct data points, which makes the forward selection procedure computationally expensive and leads to high local variance. To avoid this drawback, it is possible to restrict the knot points to a subset of the data points. In this context, a new method is proposed for knot selection which is based on a mapping approach like self-organizing maps. With this method, fewer but more representative data points become eligible to be used as knots for function estimation in the forward step of MARS. The proposed method is applied to many simulated and real datasets, and the results show that it provides a time-efficient forward step for knot selection and model estimation without degrading model accuracy and prediction performance.
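The piecewise linear building block described above can be sketched in a few lines. This is only an illustration: `hinge_pair` is the standard reflected pair of MARS hinge functions, while `candidate_knots` is a hypothetical stand-in that thins the candidate set using evenly spaced order statistics. It is not the mapping-based (self-organizing-map style) selection the abstract proposes.

```python
def hinge_pair(x, knot):
    """Return the reflected pair (max(0, x - knot), max(0, knot - x))."""
    return max(0.0, x - knot), max(0.0, knot - x)


def candidate_knots(values, n_knots):
    """Restrict knot candidates to n_knots evenly spaced distinct values.

    Assumption: a simple placeholder for a smarter, mapping-based choice.
    """
    ordered = sorted(set(values))
    if n_knots >= len(ordered):
        return ordered
    step = (len(ordered) - 1) / (n_knots + 1)
    # int() truncates, so each index stays strictly inside the range.
    return [ordered[int(step * (i + 1))] for i in range(n_knots)]


if __name__ == "__main__":
    knots = candidate_knots(range(11), 3)
    print(knots)
    print([hinge_pair(4.0, k) for k in knots])
```

Fewer candidate knots means fewer basis-function pairs to evaluate in each forward step, which is the source of the time savings the abstract reports.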

2.
This paper introduces a model-based approach to the important data mining tool Multivariate Adaptive Regression Splines (MARS), which was originally organized in a more model-free way. Indeed, MARS denotes a modern methodology from statistical learning which is important in both classification and regression, with an increasing number of applications in many areas of science, economy and technology. It is very useful for high-dimensional problems and shows great promise for fitting nonlinear multivariate functions. The MARS algorithm for estimating the model function consists of two sub-algorithms: the forward and the backward stepwise algorithm. In our paper, we propose not to use the backward stepwise algorithm. Instead, we construct a penalized residual sum of squares for MARS as a Tikhonov regularization problem, which is also known as ridge regression. We treat this problem using continuous optimization techniques, which we consider to become an important complementary technology and model-based alternative to the concept of the backward stepwise algorithm. In particular, we apply the elegant framework of conic quadratic programming. This is an area of convex optimization which is very well-structured, thereby resembling linear programming and, hence, permitting the use of powerful interior point methods. Based on these theoretical and algorithmic studies, this paper also contains an application to diabetes data. We evaluate and compare the performance of the established MARS and our new CMARS in classifying diabetic persons, where CMARS turns out to be very competitive and promising.
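To make the penalized residual sum of squares concrete, the sketch below solves the plain ridge form of the problem, minimizing ||y - X b||^2 + lam * ||b||^2, through its normal equations. Two simplifying assumptions are made here: the penalty matrix is the identity, and the system is solved by Gaussian elimination rather than by the conic quadratic programming and interior point methods used in the paper. The function name `ridge_fit` is illustrative.

```python
def ridge_fit(rows, targets, lam):
    """Solve (X^T X + lam * I) beta = X^T y by Gaussian elimination."""
    p = len(rows[0])
    # Augmented matrix of the regularised normal equations.
    a = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
          for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, targets))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(col + 1, p):
            factor = a[k][col] / a[col][col]
            for j in range(col, p + 1):
                a[k][j] -= factor * a[col][j]
    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        rest = sum(a[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (a[i][p] - rest) / a[i][i]
    return beta


if __name__ == "__main__":
    basis_values = [[1.0, 0.0], [0.0, 1.0]]  # rows of basis-function values
    print(ridge_fit(basis_values, [2.0, 4.0], lam=1.0))
```

Larger values of `lam` shrink the coefficients toward zero, which plays the role that pruning plays in the backward stepwise algorithm.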

3.
The article develops a hybrid variational Bayes (VB) algorithm that combines the mean-field and stochastic linear regression fixed-form VB methods. The new estimation algorithm can be used to approximate any posterior without relying on conjugate priors. We propose a divide and recombine strategy for the analysis of large datasets, which partitions a large dataset into smaller subsets and then combines the variational distributions that have been learned in parallel on each separate subset using the hybrid VB algorithm. We also describe an efficient model selection strategy using cross-validation, which is straightforward to implement as a by-product of the parallel run. The proposed method is applied to fitting generalized linear mixed models. The computational efficiency of the parallel and hybrid VB algorithm is demonstrated on several simulated and real datasets. Supplementary material for this article is available online.

4.
Clustering is often useful for analyzing and summarizing information within large datasets. Model-based clustering methods have been found to be effective for determining the number of clusters, dealing with outliers, and selecting the best clustering method in datasets that are small to moderate in size. For large datasets, current model-based clustering methods tend to be limited by memory and time requirements and the increasing difficulty of maximum likelihood estimation. They may fit too many clusters in some portions of the data and/or miss clusters containing relatively few observations. We propose an incremental approach for data that can be processed as a whole in memory, which is relatively efficient computationally and has the ability to find small clusters in large datasets. The method starts by drawing a random sample of the data, selecting and fitting a clustering model to the sample, and extending the model to the full dataset by additional EM iterations. New clusters are then added incrementally, initialized with the observations that are poorly fit by the current model. We demonstrate the effectiveness of this method by applying it to simulated data, and to image data where its performance can be assessed visually.

5.
In this paper, we apply newly developed methods called GAM & CQP and CMARS to country defaults. These are techniques refined by us using Conic Quadratic Programming. Moreover, we compare these new methods with common and regularly used classification tools, applied to data from 33 emerging markets in the period 1980-2005. We conclude that GAM & CQP and CMARS provide an efficient alternative in predictions. The aim of this study is to develop a model for predicting countries’ default possibilities with the help of modern techniques of continuous optimization, especially conic quadratic programming. We want to show that the continuous optimization techniques used in data mining are also very successful in financial theory and application. With this paper we contribute to further benefits from model-based methods of applied mathematics in the financial sector. In this way, we aim to help build up our nations.

6.
This paper studies the single-job lot streaming problem in a two-stage hybrid flowshop that has m identical machines at the first stage and one machine at the second stage, to minimise the makespan. A setup time is considered before processing each sublot on a machine. For the problem with the number of sublots given, we prove that it is optimal to use a rotation method for allocating and sequencing the sublots on the machines. With such allocation and sequencing, the sublot sizes are then optimised using linear programming. We then consider the problem with equal sublot sizes and develop an efficient method for determining the optimal number of sublots. Finally, optimal and heuristic solution methods for the general problem are proposed and the worst-case performance of the equal-sublot solution is analysed. Computational results are also reported, demonstrating the close-to-optimal performance of the heuristic methods in different problem settings.

7.
Based on two modified secant equations proposed by Yuan, and Li and Fukushima, we extend the approach proposed by Andrei, and introduce two hybrid conjugate gradient methods for unconstrained optimization problems. Our methods are hybridizations of the Hestenes-Stiefel and Dai-Yuan conjugate gradient methods. Under proper conditions, we show that one of the proposed algorithms is globally convergent for uniformly convex functions and the other is globally convergent for general functions. To enhance the performance of the line search procedure, we propose a new approach for computing the initial value of the steplength for initiating the line search procedure. We compare implementations of our algorithms with two efficient, representative hybrid conjugate gradient methods proposed by Andrei, using unconstrained optimization test problems from the CUTEr collection. Numerical results show that, in the sense of the performance profile introduced by Dolan and Moré, the proposed hybrid algorithms are competitive, and in some cases more efficient.
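For orientation, the two classical update parameters being hybridized can be written down directly. The sketch below computes the Hestenes-Stiefel and Dai-Yuan parameters from the new gradient, the old gradient and the old search direction, and blends them with a convex weight `theta`. The blend and the name `hybrid_beta` are assumptions for illustration: the abstract does not give the rule by which the proposed methods choose the combination, and the modified secant equations are not modeled here.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def hs_dy_betas(g_new, g_old, d_old):
    """Return (beta_HS, beta_DY) for one conjugate gradient update."""
    y = [a - b for a, b in zip(g_new, g_old)]  # gradient difference y_k
    denom = dot(d_old, y)                      # shared denominator d_k^T y_k
    beta_hs = dot(g_new, y) / denom
    beta_dy = dot(g_new, g_new) / denom
    return beta_hs, beta_dy


def hybrid_beta(g_new, g_old, d_old, theta):
    """Convex combination of the two parameters; theta in [0, 1] is assumed given."""
    beta_hs, beta_dy = hs_dy_betas(g_new, g_old, d_old)
    return (1.0 - theta) * beta_hs + theta * beta_dy


if __name__ == "__main__":
    print(hs_dy_betas([1.0, 1.0], [0.0, 2.0], [0.0, -2.0]))
    print(hybrid_beta([1.0, 1.0], [0.0, 2.0], [0.0, -2.0], theta=0.5))
```

The next search direction is then the negative new gradient plus the chosen parameter times the old direction.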

8.
Local search methods are widely used to improve the performance of evolutionary computation algorithms in all kinds of domains. Employing advanced and efficient exploration mechanisms becomes crucial in complex and very large (in terms of search space) problems, such as when applying evolutionary algorithms to large-scale data mining tasks. Recently, the GAssist Pittsburgh evolutionary learning system was extended with memetic operators for discrete representations that use information from the supervised learning process to heuristically edit classification rules and rule sets. In this paper we first adapt some of these operators to BioHEL, a different evolutionary learning system applying the iterative learning approach, and afterwards propose versions of these operators designed for continuous attributes and for dealing with noise. The performance of all these operators and their combination is extensively evaluated on a broad range of synthetic large-scale datasets to identify the settings that present the best balance between efficiency and accuracy. Finally, the identified best configurations are compared with other classes of machine learning methods on both synthetic and real-world large-scale datasets and show very competitive performance.

9.
Many real-life problems in economics, finance, management, engineering and other fields can be stated as minimax problems, which demonstrates the importance of having reliable methods to tackle them. In this paper, an algorithm for linearly constrained minimax problems is presented in which we combine trust-region methods with line-search and curve-search methods. By means of this hybrid technique, the algorithm avoids possibly solving the trust-region subproblems many times and makes better use of the advantages of the different methods. Under weaker conditions, global and superlinear convergence are achieved. Numerical experiments show that the new algorithm is robust and efficient.

10.
Our recently developed CMARS is powerful in handling complex and heterogeneous data. We incorporate into CMARS the existence of uncertainty about the scenarios. Indeed, data include noise in both output and input variables. Therefore, solutions of the optimization problem may reveal a remarkable sensitivity to perturbations in the parameters of the problem. The data uncertainty results in uncertain constraints and an uncertain objective function. To overcome this difficulty, we refine our CMARS algorithm with a robust optimization technique proposed to cope with data uncertainty. In our previous study, we presented the new robust CMARS (RCMARS) in theory and method and illustrated it with a numerical example. In this study, we present RCMARS results with different uncertainty scenarios for our numerical example.

11.
We propose Near-optimal Nonlinear Regression Trees with hyperplane splits (NNRTs) that use a polynomial prediction function in the leaf nodes, which we solve by stochastic gradient methods. On synthetic data, we show experimentally that the algorithm converges to the global optimum. We compare NNRTs, ORT-LH, Multivariate Adaptive Regression Splines (MARS), Random Forests (RF) and XGBoost on 40 real-world datasets and show that overall NNRTs have a performance edge over all other methods.

12.
The paper shows that the use of a memetic algorithm (MA), a genetic algorithm (GA) combined with local search, synergistically combined with Lagrangian relaxation is effective and efficient for solving large unit commitment problems in electric power systems. It is shown that standard implementations of GA or MA are not competitive with the traditional methods of dynamic programming (DP) and Lagrangian relaxation (LR). However, an MA seeded with LR proves to be superior to all alternatives on large problems. Eight problems from the literature and a new large, randomly generated problem are used to compare the performance of the proposed seeded MA with GA, MA, DP and LR. Compared with previously published results, this hybrid approach solves the larger problems better and uses less computational time.

13.
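Modeling subspace data clustering with finite samples, and carrying it out at large scale, are the main problems facing subspace learning. Most existing models are not suited to large-scale computation. This paper proposes a new optimization model that combines spectral projection feedback with side-information optimization. While improving the learning ability of the model, it adopts an efficient piecewise sign update algorithm, which makes it suitable for large-scale computation. Using relatively large simulated and real examples, we analyze and verify that the new optimization model and its fast algorithm are more effective than other existing models and algorithms.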

14.
In this study, we consider a semi-desirable facility location problem in a continuous planar region considering the interaction between the facility and the existing demand points. A facility can be defined as semi-desirable if it has both undesirable and desirable effects on the people living in the vicinity. Our aim is to maximize the weighted distance of the facility from the closest demand point as well as to minimize the service cost of the facility. The distance between the facility and the demand points is measured with the rectilinear metric. For the solution of the problem, a three-phase interactive geometrical branch and bound algorithm is suggested to find the most preferred efficient solution. In the first two phases, we aim to eliminate the parts of the feasible region whose inefficiency can be proved. The third phase has been suggested for an interactive search in the remaining regions with the involvement of a decision maker (DM). In the third phase, the DM is given the opportunity to use either an exact or an approximate procedure to carry out the search. The exact procedure is based on the reference point approach and guarantees to find an efficient point as the most preferred solution. On the other hand, in the approximate procedure, a hybrid methodology is used to increase the efficiency of the reference point approach. The approximate procedure can be used when the DM prefers to see locally efficient solutions so as to save computation time. We demonstrate the performance of the proposed method through example problems.

15.
A multi-objective optimization evolutionary algorithm that incorporates preference information interactively is proposed. A new nine-grade evaluation method is used to quantify the linguistic preferences expressed by the decision maker (DM) so as to reduce his/her cognitive overload. When comparing individuals, the classical Pareto dominance relation is commonly used, but it has difficulty in dealing with problems involving large numbers of objectives, in which it gives an unmanageably large set of Pareto optimal solutions. In order to overcome this limitation, a new outranking relation called “strength superior”, which is based on the preference information, is constructed via a fuzzy inference system to help the algorithm find a few solutions located in the preferred regions, and a graphical user interface is used to realize the interaction between the DM and the algorithm. The computational complexity of the proposed algorithm is analyzed theoretically, and its ability to handle preference information is validated through simulation. The influence of parameters on the performance of the algorithm is discussed, and comparisons to another preference-guided multi-objective evolutionary algorithm indicate that the proposed algorithm is effective in solving high-dimensional optimization problems.
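The classical Pareto dominance relation mentioned above is short enough to state in code. The sketch assumes that every objective is minimized; the function names are illustrative. It shows only the baseline relation, not the "strength superior" outranking relation, whose definition the abstract does not give.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimisation)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    population = [(1, 3), (2, 2), (3, 1), (3, 3)]
    print(pareto_front(population))
```

With many objectives, few pairs of points satisfy `dominates`, so almost every point survives in `pareto_front`. That is the limitation that motivates a preference-based relation.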

16.
Quasi-independence is a common assumption for analyzing truncated data. To verify this condition, we propose a class of weighted log-rank type statistics that include existing tests proposed by Tsai (1990) and Martin and Betensky (2005) as special cases. To choose an appropriate weight function that may lead to a more powerful test, we derive a score test when the dependence structure under the alternative hypothesis is modeled via the odds ratio function proposed by Chaieb, Rivest and Abdous (2006). Asymptotic properties of the proposed tests are established based on the functional delta method, which can handle more general situations than results based on rank statistics or U-statistics. Extension of the proposed methodology under two different censoring settings is also discussed. Simulations are performed to examine the finite-sample performance of the proposed method and its competitors. Two datasets are analyzed for illustrative purposes.

17.
Kernel logistic regression (KLR) is a very powerful algorithm that has been shown to be very competitive with many state-of-the-art machine learning algorithms such as support vector machines (SVM). Unlike SVM, KLR can be easily extended to multi-class problems and produces class posterior probability estimates, making it very useful for many real-world applications. However, the training of KLR using gradient-based methods or iterative re-weighted least squares can be unbearably slow for large datasets. Coupled with poor conditioning and parameter tuning, training KLR can quickly become infeasible for some real datasets. The goal of this paper is to present simple, fast, scalable, and efficient algorithms for learning KLR. First, based on a simple approximation of the logistic function, a least squares algorithm for KLR is derived that avoids the iterative tuning of gradient-based methods. Second, inspired by the extreme learning machine (ELM) theory, an explicit feature space is constructed through a generalized single hidden layer feedforward network and used for training iterative re-weighted least squares KLR (IRLS-KLR) and the newly proposed least squares KLR (LS-KLR). Finally, for large-scale and/or poorly conditioned problems, a robust and efficient preconditioned learning technique is proposed for learning the algorithms presented in the paper. Numerical results on a series of artificial and 12 real benchmark datasets show first that LS-KLR compares favorably with SVM and traditional IRLS-KLR in terms of accuracy and learning speed. Second, the extension of ELM to KLR results in simple, scalable and very fast algorithms with comparable generalization performance to their original versions. Finally, the introduced preconditioned learning method can significantly increase the learning speed of IRLS-KLR.

18.
In this article, a new methodology based on a fuzzy proportional-integral-derivative (PID) controller is proposed to damp low-frequency oscillation in a multimachine power system, where the parameters of the proposed controller are optimized offline automatically by hybrid genetic algorithm (GA) and particle swarm optimization (PSO) techniques. This newly proposed method is more efficient because it copes with oscillations and different operating points. In this strategy, the controller is tuned online from the knowledge base and fuzzy inference. In the proposed method, exact tuning of the rule base and membership functions (MF) is very important for achieving the desired level of robust performance. The motivation for using the GA and PSO as a hybrid method is to reduce fuzzy effort and take large parametric uncertainties into account. This newly developed control strategy combines the advantages of the GA and PSO techniques to optimally tune the rule base and MF parameters of the fuzzy controller, which leads to a flexible controller with a simple structure that is easy to implement. The proposed method is tested on three-machine nine-bus and 16-machine power systems with different operating conditions in the presence of disturbance and nonlinearity. The effectiveness of the proposed controller is compared with a robust PSS tuned using PSO and a fuzzy controller whose rule base is optimized by GA, through the figure of demerit and the integral of the time multiplied absolute value of the error performance indices. Evaluation of the results shows that the proposed method achieves good robust performance for a wide range of load changes in the presence of disturbance and system nonlinearities and is superior to the other controllers. © 2014 Wiley Periodicals, Inc. Complexity 21: 78–93, 2015

19.
In this paper, we study the shortest path tour problem, in which a shortest path from a given origin node to a given destination node must be found in a directed graph with non-negative arc lengths. Such a path needs to cross a sequence of node subsets that are given in a fixed order. The subsets are disjoint and may differ in size. A polynomial-time reduction of the problem to a classical shortest path problem over a modified digraph is described, and two solution methods, based on the above reduction and on dynamic programming, respectively, are proposed and compared with the state-of-the-art solving procedure. The proposed methods are tested on existing datasets for this problem and on a large class of new benchmark instances. The computational experience shows that both proposed methods exhibit consistently improved performance in terms of computational time with respect to the existing solution method.
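One way to picture a reduction to a classical shortest path problem is to run Dijkstra's algorithm over states `(node, k)`, where `k` counts how many subsets have already been crossed in order. The sketch below follows that idea. It is an illustrative construction under stated assumptions, not necessarily the modified digraph of the paper: the graph is an adjacency dictionary, all node labels share one comparable type, and the function name is hypothetical.

```python
import heapq


def shortest_path_tour(arcs, origin, destination, subsets):
    """Length of a shortest origin-to-destination path crossing subsets in order.

    arcs maps a node to a list of (neighbour, non-negative length) pairs.
    Returns None when no feasible path exists.
    """
    def progress(node, k):
        # Reaching a node of the next required subset advances the counter.
        if k < len(subsets) and node in subsets[k]:
            return k + 1
        return k

    start = (origin, progress(origin, 0))
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, (u, k) = heapq.heappop(heap)
        if d > dist.get((u, k), float("inf")):
            continue  # stale heap entry
        if u == destination and k == len(subsets):
            return d
        for v, w in arcs.get(u, []):
            state = (v, progress(v, k))
            if d + w < dist.get(state, float("inf")):
                dist[state] = d + w
                heapq.heappush(heap, (d + w, state))
    return None


if __name__ == "__main__":
    graph = {"s": [("a", 1), ("d", 1)], "a": [("b", 1)], "b": [("d", 1)]}
    print(shortest_path_tour(graph, "s", "d", [{"a"}, {"b"}]))
```

The state space has at most (number of subsets + 1) copies of each node, so the search stays polynomial in the size of the input.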

20.
The primary model for cluster analysis is the latent class model. This model yields the mixture likelihood. Due to numerous local maxima, the success of the EM algorithm in maximizing the mixture likelihood depends on the initial starting point of the algorithm. In this article, good starting points for the EM algorithm are obtained by applying classification methods to randomly selected subsamples of the data. The performance of the resulting two-step algorithm, classification followed by EM, is compared to, and found superior to, the baseline algorithm of EM started from a random partition of the data. Though the algorithm is not complicated, comparing it to the baseline algorithm and assessing its performance with several classification methods is nontrivial. The strategy employed for comparing the algorithms is to identify canonical forms for the easiest and most difficult datasets to cluster within a large collection of cluster datasets and then to compare the performance of the two algorithms on these datasets. This has led to the discovery that, in the case of three homogeneous clusters, the most difficult datasets to cluster are those in which the clusters are arranged on a line and the easiest are those in which the clusters are arranged on an equilateral triangle. The performance of the two-step algorithm is assessed using several classification methods and is shown to be able to cluster large, difficult datasets consisting of three highly overlapping clusters arranged on a line with 10,000 observations and 8 variables.
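The dependence of EM on its starting point is easiest to see in a toy setting. The sketch below runs EM for a two-component Gaussian mixture in one dimension, taking the initial means as an argument so that they could come from a preliminary classification step. It is a deliberately reduced illustration: the paper works with multivariate data and three clusters, and the unit starting variances, equal starting weights and variance floor here are assumptions.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, initial_means, iterations=50):
    """Run EM from the supplied starting means; return (weights, means, variances)."""
    means = list(initial_means)
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j])
                    for j in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor guards against collapse
    return weights, means, variances


if __name__ == "__main__":
    sample = [-1.0, 0.0, 1.0, 9.0, 10.0, 11.0]
    print(em_two_gaussians(sample, initial_means=[1.0, 9.0]))
```

Starting means taken from a rough partition of a subsample place EM near a good local maximum, which is the intuition behind the two-step algorithm.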
