Similar Literature (20 results found)
1.
We consider the problem of deleting bad influential observations (outliers) in linear regression models. The problem is formulated as a Quadratic Mixed Integer Programming (QMIP) problem, where penalty costs for discarding outliers are incorporated into the objective function. The optimum solution defines a robust regression estimator called penalized trimmed squares (PTS). Due to the high computational complexity of the resulting QMIP problem, the proposed robust procedure is computationally suitable for small sample data. The computational performance and the effectiveness of the new procedure are improved significantly by using the idea of the ε-insensitive loss function from support vector machine regression. Small errors are ignored, and the mathematical formulation gains the sparseness property. The good performance of the ε-insensitive PTS (IPTS) estimator allows identification of multiple outliers while avoiding masking and swamping effects. The computational effectiveness and successful outlier detection of the proposed method are demonstrated via simulated experiments. This research has been partially funded by the Greek Ministry of Education under the program Pythagoras II.
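A minimal brute-force sketch of the PTS idea described above, assuming a simple per-observation penalty: each observation may be deleted at its penalty cost, and the objective is the sum of squared residuals of the kept points plus the penalties of the deleted ones. The enumeration stands in for the QMIP solver; the penalty values and data are illustrative.

```python
import itertools
import numpy as np

def pts_brute_force(X, y, penalties):
    """Enumerate all deletion patterns of the Penalized Trimmed Squares
    objective: SSE of the kept points plus the penalties of deleted ones.
    Only feasible for very small samples (the paper uses QMIP instead)."""
    n = len(y)
    best = (np.inf, None, None)
    for keep in itertools.product([0, 1], repeat=n):
        keep = np.array(keep, dtype=bool)
        if keep.sum() < X.shape[1]:        # need enough points to fit
            continue
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        sse = np.sum((y[keep] - X[keep] @ beta) ** 2)
        cost = sse + penalties[~keep].sum()
        if cost < best[0]:
            best = (cost, beta, ~keep)
    return best  # (objective, coefficients, outlier flags)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(10), rng.normal(size=10)])
y = 2 + 3 * X[:, 1] + rng.normal(scale=0.1, size=10)
y[0] += 15                                  # plant one gross outlier
cost, beta, outliers = pts_brute_force(X, y, penalties=np.full(10, 4.0))
print(beta, np.where(outliers)[0])          # observation 0 should be flagged
```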

2.
We introduce fast and robust algorithms for lower rank approximation to given matrices based on robust alternating regression. The alternating least squares regression, also called criss-cross regression, was used for lower rank approximation of matrices, but it lacks robustness against outliers in these matrices. We use robust regression estimators and address some of the complications arising from this approach. We find it helpful to use high breakdown estimators in the initial iterations, followed by M estimators with monotone score functions in later iterations towards convergence. In addition to robustness, the computational speed is another important consideration in the development of our proposed algorithm, because alternating robust regression can be computationally intensive for large matrices. Based on a mix of the least trimmed squares (LTS) and Huber's M estimators, we demonstrate that fast and robust lower rank approximations are possible for modestly large matrices.
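A simplified numpy sketch of robust alternating regression for a rank-1 approximation, using Huber IRLS weights in each alternation. The paper's actual algorithm also uses LTS-based starting values, higher ranks, and proper convergence checks, so the starting value, scale estimate, and constants below are illustrative assumptions.

```python
import numpy as np

def huber_weights(r, c=1.345):
    """IRLS weights for Huber's M-estimator: 1 inside [-c, c], c/|r| outside."""
    a = np.abs(r)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))

def robust_rank1(X, n_iter=50):
    """Rank-1 approximation X ~ u v' by alternating weighted least squares
    with Huber weights (a sketch of robust criss-cross regression)."""
    n, p = X.shape
    u = np.ones(n)
    v = X[np.argsort(np.abs(X).sum(1))[n // 2]]    # a "central" row as start
    for _ in range(n_iter):
        s = max(np.median(np.abs(X - np.outer(u, v))), 1e-12)  # crude scale
        W = huber_weights((X - np.outer(u, v)) / s)
        u = (W * X * v).sum(1) / np.maximum((W * v**2).sum(1), 1e-12)
        W = huber_weights((X - np.outer(u, v)) / s)
        v = (W * X * u[:, None]).sum(0) / np.maximum((W * u[:, None]**2).sum(0), 1e-12)
    return u, v

rng = np.random.default_rng(1)
X = np.outer(rng.normal(size=30), rng.normal(size=8))
X[3, 5] += 20                                   # contaminate one cell
u, v = robust_rank1(X)
print(np.abs(X - np.outer(u, v)).mean())        # small except at the bad cell
```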

3.
This paper proposes a robust procedure for solving multiphase regression problems that is efficient enough to deal with data contaminated by atypical observations, whether due to measurement errors or drawn from heavy-tailed distributions. By combining the expectation–maximization (EM) algorithm with the M-estimation technique, we simultaneously derive robust estimates of the change-points and the regression parameters. Because the resulting method is still not resistant to high leverage outliers, we further suggest a modified version that first moderately trims those outliers and then applies the new procedure to the trimmed data. The study sets up two robust algorithms that use the Huber loss function and Tukey's biweight function, respectively, to replace the least squares criterion in the normality-based EM algorithm, and illustrates the effectiveness and superiority of the proposed algorithms through extensive simulations and sensitivity analyses. Experimental results show the ability of the proposed method to withstand outliers and heavy-tailed distributions. Moreover, since resistance to high leverage outliers is particularly important due to their devastating effect on fitting a regression model, various real-world applications show the practicability of this approach.
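For reference, a sketch of the two loss functions named above; the formulas and the usual 95%-efficiency tuning constants are standard, though the paper's implementation details may differ.

```python
import numpy as np

def huber_rho(r, c=1.345):
    """Huber loss: quadratic near zero, linear in the tails."""
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * a - 0.5 * c**2)

def tukey_rho(r, c=4.685):
    """Tukey biweight loss: redescending, constant beyond c, so gross
    outliers stop contributing to the fit entirely."""
    a = np.minimum(np.abs(r) / c, 1.0)
    return (c**2 / 6.0) * (1.0 - (1.0 - a**2) ** 3)

r = np.linspace(-10, 10, 5)
print(huber_rho(r))
print(tukey_rho(r))   # flat at c**2/6 for |r| >= c
```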

4.
A Monitoring Data Analysis Method Based on Least Trimmed Squares Estimation
Outliers are inevitable in safety monitoring data for hydraulic engineering works, and the most widely used least squares (LS) method cannot reject them; on the contrary, it readily absorbs them, so the regression curve deviates severely from reality. To remedy this deficiency of LS, this paper, starting from the principle of minimizing the sum of squared residuals, proposes least trimmed squares (LTS) estimation for building safety monitoring models for hydraulic engineering works. Monitoring data from an actual project are analyzed, outliers are rejected, and an optimal data subset is obtained; solving for the regression coefficients of this subset yields the fitted curve closest to the actual data. Compared with LS estimation, the LTS results are more reasonable and more robust, and prediction accuracy improves markedly. LTS estimation therefore has good prospects for application in data analysis tasks such as hydraulic engineering safety monitoring.
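A minimal sketch of how an approximate LTS fit can be computed with random elemental starts and concentration steps, in the spirit of FAST-LTS; the subset size, number of starts, and data below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def lts_fit(X, y, h=None, n_starts=200, n_csteps=10, seed=0):
    """Approximate LTS: from each random p-point start, repeatedly refit
    OLS on the h observations with the smallest squared residuals
    (concentration or "C" steps), and keep the best objective found."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    h = h or (n + p + 1) // 2
    best_obj, best_beta = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=p, replace=False)      # elemental start
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        for _ in range(n_csteps):
            keep = np.argsort((y - X @ beta) ** 2)[:h]
            beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        obj = np.sort((y - X @ beta) ** 2)[:h].sum()
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = 1 + 2 * X[:, 1] + rng.normal(scale=0.2, size=100)
y[:15] += 10                                            # 15% outliers
print(lts_fit(X, y))                                    # close to [1, 2]
```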

5.
A statistical procedure is called robust if it is insensitive to the occurrence of gross errors in the data. The ordinary least squares regression technique does not satisfy this property, because even a single outlier can totally offset the result. Therefore, the least trimmed squares (LTS) technique is introduced, which can resist the effect of a large percentage of outliers. The latter method is illustrated on data concerning life insurance, pension funds, health insurance, and inflation.
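A toy numpy demonstration of the zero-breakdown property stated above: corrupting a single response value visibly shifts the least squares fit. The data are synthetic.

```python
import numpy as np

# A single gross error can move the least squares fit arbitrarily far:
rng = np.random.default_rng(3)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=50)
X = np.column_stack([np.ones(50), x])
clean_beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y[0] += 1e4                                  # corrupt one observation
dirty_beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(clean_beta, dirty_beta)                # the fit shifts badly
```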

6.
This paper focuses on efficient computational approaches to compute approximate solutions of a linear inverse problem that is contaminated with mixed Poisson–Gaussian noise, and when there are additional outliers in the measured data. The Poisson–Gaussian noise leads to a weighted minimization problem, with solution-dependent weights. To address outliers, the standard least squares fit-to-data metric is replaced by the Talwar robust regression function. Convexity, regularization parameter selection schemes, and incorporation of non-negative constraints are investigated. A projected Newton algorithm is used to solve the resulting constrained optimization problem, and a preconditioner is proposed to accelerate conjugate gradient Hessian solves. Numerical experiments on problems from image deblurring illustrate the effectiveness of the methods.
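For reference, a sketch of the Talwar function mentioned above: a truncated quadratic, so residuals beyond a cutoff contribute a constant and drop out of the weighted normal equations. The tuning constant 2.795 is a common choice, not necessarily the paper's.

```python
import numpy as np

def talwar_rho(r, c=2.795):
    """Talwar (truncated quadratic) loss: ordinary least squares inside
    [-c, c], constant outside, so flagged outliers stop contributing."""
    return np.where(np.abs(r) <= c, 0.5 * r**2, 0.5 * c**2)

def talwar_weights(r, c=2.795):
    """IRLS weights: 1 for inliers, 0 for residuals beyond the cutoff."""
    return (np.abs(r) <= c).astype(float)
```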

7.
《Optimization》2012,61(12):1467-1490
Large outliers break down linear and nonlinear regression models. Robust regression methods allow one to filter out the outliers when building a model. By replacing the traditional least squares criterion with the least trimmed squares (LTS) criterion, in which half of the data is treated as potential outliers, one can fit accurate regression models to strongly contaminated data. High-breakdown methods have become very well established in linear regression, but have only recently been applied to nonlinear regression. In this work, we examine the problem of fitting artificial neural networks (ANNs) to contaminated data using the LTS criterion. We introduce a penalized LTS criterion which prevents unnecessary removal of valid data. Training of ANNs leads to a challenging non-smooth global optimization problem. We compare the efficiency of several derivative-free optimization methods in solving it, and show that our approach identifies the outliers correctly when ANNs are used for nonlinear regression.
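A sketch of how a penalized LTS criterion could be evaluated for a candidate set of network weights inside a derivative-free optimizer, assuming a constant per-observation trimming penalty; the rule "trim only when the squared residual exceeds the penalty" and the names below are illustrative, not the paper's exact formulation.

```python
import numpy as np

def penalized_lts_objective(residuals, penalty, min_keep):
    """Penalized LTS criterion for fixed residuals: each trimmed point
    costs `penalty`, so a point is worth trimming only when its squared
    residual exceeds the penalty; `min_keep` caps the trimmed fraction."""
    r2 = np.sort(residuals ** 2)[::-1]            # largest first
    max_trim = len(r2) - min_keep
    trim = min(int(np.sum(r2 > penalty)), max_trim)
    return r2[trim:].sum() + penalty * trim

r = np.array([0.1, -0.2, 0.05, 8.0, 0.15])
print(penalized_lts_objective(r, penalty=1.0, min_keep=3))  # trims only 8.0
```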

8.
Kernel logistic regression (KLR) is a very powerful algorithm that has been shown to be very competitive with many state-of-the-art machine learning algorithms such as support vector machines (SVM). Unlike SVM, KLR can be easily extended to multi-class problems and produces class posterior probability estimates, making it very useful for many real-world applications. However, the training of KLR using gradient-based methods or iteratively reweighted least squares can be unbearably slow for large datasets. Coupled with poor conditioning of the design matrix and parameter tuning, training KLR can quickly become infeasible for some real datasets. The goal of this paper is to present simple, fast, scalable, and efficient algorithms for learning KLR. First, based on a simple approximation of the logistic function, a least squares algorithm for KLR is derived that avoids the iterative tuning of gradient-based methods. Second, inspired by extreme learning machine (ELM) theory, an explicit feature space is constructed through a generalized single-hidden-layer feedforward network and used for training iteratively reweighted least squares KLR (IRLS-KLR) and the newly proposed least squares KLR (LS-KLR). Finally, for large-scale and/or poorly conditioned problems, a robust and efficient preconditioned learning technique is proposed for learning the algorithms presented in the paper. Numerical results on a series of artificial and 12 real benchmark datasets show, first, that LS-KLR compares favorably with SVM and traditional IRLS-KLR in terms of accuracy and learning speed. Second, the extension of ELM to KLR results in simple, scalable and very fast algorithms with generalization performance comparable to their original versions. Finally, the introduced preconditioned learning method can significantly increase the learning speed of IRLS-KLR.
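A plain numpy sketch of IRLS/Newton training for kernel logistic regression with an RBF kernel and a ridge penalty, without the approximations, ELM features, or preconditioning that the paper contributes; all constants and data are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian RBF kernel matrix between row-sample matrices A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def irls_klr(X, y, lam=1e-2, gamma=0.5, n_iter=25):
    """Kernel logistic regression trained with IRLS/Newton steps.
    Minimizes the penalized negative log-likelihood with f = K @ alpha;
    the common factor K is cancelled from the Newton system."""
    K = rbf_kernel(X, X, gamma)
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-K @ alpha))
        W = np.clip(p * (1 - p), 1e-8, None)
        grad = p - y + lam * alpha
        alpha -= np.linalg.solve(W[:, None] * K + lam * np.eye(n), grad)
    return alpha, K

rng = np.random.default_rng(4)
X = rng.normal(size=(80, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)     # nonlinear XOR-like labels
alpha, K = irls_klr(X, y)
p = 1.0 / (1.0 + np.exp(-K @ alpha))
print(((p > 0.5) == y).mean())                # training accuracy
```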

9.
In this paper, we investigate robust estimation of the number of components in the mixture of regression models using trimmed information criteria. Compared to the traditional information criteria, the trimmed criteria are robust and not sensitive to outliers. The superiority of the trimmed criteria over the traditional information criteria is illustrated through a simulation study. Two real data applications are also used to illustrate the effectiveness of the trimmed model selection methods.
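One common way a trimmed information criterion can be written, shown as a sketch: only the best-fitting (1 - trim) fraction of per-observation log-likelihoods enters the BIC. The trimming fraction and the use of the kept count in the penalty term are assumptions, not the paper's exact definition.

```python
import numpy as np

def trimmed_bic(loglik_per_obs, n_params, trim=0.1):
    """Trimmed BIC: keep the (1 - trim) fraction of observations with the
    highest log-likelihood so outliers cannot dominate model selection."""
    ll = np.sort(loglik_per_obs)            # ascending: worst-fit first
    h = int(np.ceil((1 - trim) * len(ll)))
    return -2 * ll[-h:].sum() + n_params * np.log(h)
```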

10.
The presence of groups containing high leverage outliers makes linear regression a difficult problem due to the masking effect. The available high breakdown estimators based on least trimmed squares (LTS) often do not succeed in detecting masked high leverage outliers in finite samples. An alternative to the LTS estimator, called the penalised trimmed squares (PTS) estimator, was introduced by the authors in Zioutas and Avramidis (2005) Acta Math Appl Sin 21:323–334 and Zioutas et al. (2007) REVSTAT 5:115–136, and it appears to be less sensitive to the masking problem. This estimator is defined by a Quadratic Mixed Integer Programming (QMIP) problem whose objective function includes a penalty cost for each observation, which serves as an upper bound on the residual error for any feasible regression line. Since the PTS does not require presetting the number of outliers to delete from the data set, it attains better efficiency than other estimators. However, due to the high computational complexity of the resulting QMIP problem, computing exact solutions for moderately large regression problems is infeasible. In this paper we further establish the theoretical properties of the PTS estimator, such as high breakdown and efficiency, and propose an approximate algorithm called Fast-PTS to compute the PTS estimator for large data sets efficiently. Extensive computational experiments on sets of benchmark instances with varying degrees of outlier contamination indicate that the proposed algorithm performs well in identifying groups of high leverage outliers in reasonable computational time.

11.
In this paper, we propose a robust support vector regression with a novel generic nonconvex quadratic ε-insensitive loss function. The proposed method is robust to outliers and noise, since it can adaptively control the loss value and decrease their negative influence on the decision function by adjusting the elastic interval parameter and the adaptive robustification parameter. Given the nonconvexity of the optimization problem, a concave–convex programming procedure is employed to solve it. Experimental results on two artificial data sets and three real-world data sets indicate that the proposed method outperforms support vector regression, L1-norm support vector regression, least squares support vector regression, robust least squares support vector regression, and support vector regression with the Huber loss function in both robustness and generalization ability.
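For intuition, a sketch contrasting the standard ε-insensitive loss with a truncated (hence nonconvex) quadratic ε-insensitive variant whose value is capped so distant outliers exert bounded influence; the paper's exact parameterization differs, so the form and constants below are illustrative.

```python
import numpy as np

def eps_insensitive(r, eps=0.1):
    """Standard epsilon-insensitive loss: zero inside the tube."""
    return np.maximum(np.abs(r) - eps, 0.0)

def truncated_quad_eps(r, eps=0.1, theta=2.0):
    """An illustrative nonconvex quadratic epsilon-insensitive loss:
    quadratic growth outside the tube, capped at theta**2 so distant
    outliers exert only bounded influence on the fit."""
    excess = np.maximum(np.abs(r) - eps, 0.0)
    return np.minimum(excess ** 2, theta ** 2)
```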

12.
A Grey Combination Forecasting Model for Pipeline Corrosion Rate Based on LS-SVM
To improve the accuracy of pipeline corrosion rate prediction, a grey combination forecasting model based on the least squares support vector machine (LS-SVM) is established. The predictions of several grey models for the pipeline corrosion rate serve as the inputs of the support vector machine, and the measured corrosion rates serve as its outputs; the machine is trained with the LS-SVM regression algorithm and a Gaussian kernel, and the trained machine then produces the combination forecast. The model combines the advantages of grey models (little raw data required, simple modeling, convenient computation) with the strengths of LS-SVM (strong generalization ability, good nonlinear fitting, suitability for small samples), thereby compensating for the shortcomings of any single model and avoiding the tendency of neural network combination forecasting to become trapped in local optima. The model is simple in structure and practical, and simulation results verify its effectiveness.
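A minimal sketch of the combination idea, assuming the grey-model predictions are already available as input features: a Gaussian-kernel LS-SVM regressor is trained by solving its linear KKT system. The simulated "grey model outputs" below are stand-ins; no actual grey modeling is performed.

```python
import numpy as np

def lssvm_train(Z, y, gamma=10.0, sigma=1.0):
    """LS-SVM regression with a Gaussian kernel: solve the linear KKT
    system [[0, 1'], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma**2))
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]                       # bias b, coefficients alpha

def lssvm_predict(Znew, Z, b, alpha, sigma=1.0):
    d2 = ((Znew[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2)) @ alpha + b

# Hypothetical combination step: columns of Z mimic the corrosion-rate
# predictions of three grey models; y holds the "measured" rates.
rng = np.random.default_rng(5)
truth = np.cumsum(rng.uniform(0.1, 0.3, size=20))
Z = truth[:, None] + rng.normal(scale=0.05, size=(20, 3))
b, alpha = lssvm_train(Z, truth)
print(np.abs(lssvm_predict(Z, Z, b, alpha) - truth).mean())  # in-sample error
```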

13.
In this paper, we propose a robust L1-norm non-parallel proximal support vector machine (L1-NPSVM), which aims at robust binary classification in contrast to GEPSVM, especially for problems with outliers. The proposed L1-NPSVM has three main properties. First, unlike the traditional GEPSVM, which solves two generalized eigenvalue problems, our L1-NPSVM solves a pair of L1-norm optimization problems using a simple, justifiable iterative technique. Second, by introducing the L1-norm, our L1-NPSVM is considerably more robust to outliers than GEPSVM. Third, compared with GEPSVM, no parameters need to be regularized in our L1-NPSVM. The effectiveness of the proposed method is demonstrated by tests on a simple artificial example as well as on some UCI datasets, which show improvements over GEPSVM.

14.
We investigate a robust penalized logistic regression algorithm based on a minimum distance criterion. Influential outliers are often associated with the explosion of parameter vector estimates, but in the context of standard logistic regression, the bias due to outliers always causes the parameter vector to implode, that is, shrink toward the zero vector. Thus, using LASSO-like penalties to perform variable selection in the presence of outliers can result in missed detections of relevant covariates. We show that by choosing a minimum distance criterion together with an elastic net penalty, we can simultaneously find a parsimonious model and avoid estimation implosion even in the presence of many outliers in the important small n large p situation. Minimizing the penalized minimum distance criterion is a challenging problem due to its nonconvexity. To meet the challenge, we develop a simple and efficient MM (majorization–minimization) algorithm that can be adapted gracefully to the small n large p context. Performance of our algorithm is evaluated on simulated and real datasets. This article has supplementary materials available online.

15.
Kernel-based regression (KBR) methods, such as support vector machine for regression (SVR), are a well-established methodology for estimating the nonlinear functional relationship between the response variable and predictor variables. KBR methods can be very sensitive to influential observations, which in turn have a noticeable impact on the model coefficients. The robustness of KBR methods has recently been the subject of wide-scale investigations with the aim of obtaining a regression estimator insensitive to outlying observations. However, existing robust KBR (RKBR) methods only consider Y-space outliers and, consequently, are sensitive to X-space outliers. As a result, even a single anomalous outlying observation in X-space may greatly affect the estimator. To resolve this issue, we propose a new RKBR method that gives reliable results even if the training data set is contaminated with both Y-space and X-space outliers. The proposed method utilizes a weighting scheme based on the hat matrix that resembles the generalized M-estimator (GM-estimator) of conventional robust linear analysis. The diagonal elements of the hat matrix in the kernel-induced feature space are used as leverage measures to downweight the effects of potential X-space outliers. We show that the kernelized hat diagonal elements can be obtained via the eigendecomposition of the kernel matrix. A regularized version of the kernelized hat diagonal elements is also proposed for the case where the kernel matrix has full rank and the plain kernelized hat diagonal elements are not suitable as leverage measures. We show that the two kernelized leverage measures, the kernel hat diagonal element and its regularized version, are related to statistical distance measures in the feature space. We also develop an efficient kernelized training algorithm for parameter estimation based on the iteratively reweighted least squares (IRLS) method. Experimental results from simulated examples and real data sets demonstrate the robustness of our proposed method compared with conventional approaches.
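A sketch of the regularized kernelized leverage computation described above: the diagonal of H = K (K + lam*I)^{-1} obtained from the eigendecomposition of the kernel matrix, followed by an illustrative GM-style downweighting rule (the quantile cutoff is an assumption, not the paper's rule).

```python
import numpy as np

def kernel_hat_diagonals(K, lam=1e-2):
    """Diagonal of the regularized kernel hat matrix H = K (K + lam I)^-1
    via the eigendecomposition of the symmetric kernel matrix:
    h_ii = sum_k u_ik^2 * d_k / (d_k + lam)."""
    d, U = np.linalg.eigh(K)
    return (U ** 2) @ (d / (d + lam))

def leverage_weights(h, q=0.9):
    """Illustrative GM-style weights: points whose kernel leverage exceeds
    the q-quantile are downweighted proportionally."""
    cut = np.quantile(h, q)
    return np.minimum(1.0, cut / np.maximum(h, 1e-12))

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 2))
X[0] *= 8                                        # one X-space outlier
K = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1))
h = kernel_hat_diagonals(K)
print(leverage_weights(h)[0])                    # the isolated point is downweighted
```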

16.
Classical robust statistical methods dealing with noisy data are often based on modifications of convex loss functions. In recent years, nonconvex loss-based robust methods have become increasingly popular. A nonconvex loss can provide robust estimation for data contaminated with outliers. The significant challenge is that a nonconvex loss can be numerically difficult to optimize. This article proposes a quadratic majorization algorithm for nonconvex (QManc) loss. The QManc decomposes a nonconvex loss into a sequence of simpler optimization problems. Subsequently, the QManc is applied to a powerful machine learning algorithm: the quadratic majorization boosting algorithm (QMBA). We develop QMBA for robust classification (binary and multi-category) and regression. On high-dimensional cancer genetics data and in simulations, the QMBA is comparable with convex loss-based boosting algorithms for clean data, and outperforms the latter for data contaminated with outliers. The QMBA is also superior to boosting directly implemented to optimize nonconvex loss functions. Supplementary material for this article is available online.

17.
Exploratory graphical tools based on trimming are proposed for detecting the main clusters in a given dataset. The trimming is obtained by resorting to trimmed k-means methodology. The analysis always reduces to the examination of real-valued curves, even in the multivariate case. As the technique is based on a robust clustering criterion, it is able to handle the presence of different kinds of outliers. An algorithm is proposed to carry out this (computer-intensive) method. As with classical k-means, the method is especially oriented to mixtures of spherical distributions. A possible generalization is outlined to overcome this drawback.
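A minimal sketch of the trimmed k-means iteration underlying the proposed graphical tools: assign points to their nearest center, trim the α fraction farthest from their centers, and update the centers from the kept points only. The initialization, stopping rule, and demo data are illustrative simplifications.

```python
import numpy as np

def trimmed_kmeans(X, k=3, alpha=0.1, n_iter=50, seed=0):
    """Trimmed k-means sketch: after assigning points to their nearest
    center, discard the alpha fraction with the largest distances before
    recomputing the centers, so outliers never pull the centers."""
    rng = np.random.default_rng(seed)
    n = len(X)
    h = int(np.ceil((1 - alpha) * n))
    centers = X[rng.choice(n, size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        label = d2.argmin(1)
        keep = np.argsort(d2.min(1))[:h]            # trim the worst fits
        for j in range(k):
            pts = X[keep][label[keep] == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers, label, keep

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(m, 0.3, size=(40, 2)) for m in (0, 4, 8)]
              + [rng.uniform(-10, 20, size=(12, 2))])   # scattered outliers
centers, label, keep = trimmed_kmeans(X, k=3, alpha=0.1)
print(np.sort(centers[:, 0]))   # should land near 0, 4, 8 when recovered
```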

18.
To perform multiple regression, the least squares estimator is commonly used. However, this estimator is not robust to outliers. Therefore, robust methods such as S-estimation have been proposed. These estimators flag any observation with a large residual as an outlier and downweight it in the subsequent procedure. However, a large residual may be caused by an outlier in only a single predictor variable, and downweighting the complete observation results in a loss of information. Therefore, we propose the shooting S-estimator, a regression estimator especially designed for situations where a large number of observations suffer from contamination in a small number of predictor variables. The shooting S-estimator combines the ideas of the coordinate descent algorithm with simple S-regression, which makes it robust against componentwise contamination, at the cost of losing the regression equivariance property.

19.
In the multiple linear regression model, we presuppose assumptions (independence, normality, variance homogeneity, and so on) on the error term. When case weights are given because of variance heterogeneity, we can efficiently estimate the regression parameters using the weighted least squares estimator. Unfortunately, this estimator is as sensitive to outliers as the ordinary least squares estimator. Thus, in this paper, we propose some statistics for the detection of outliers in weighted least squares regression.
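One standard choice of detection statistic consistent with the setting above, shown as a sketch: internally studentized residuals computed on the square-root-weighted (transformed) model. The paper's proposed statistics may differ.

```python
import numpy as np

def wls_studentized_residuals(X, y, w):
    """Internally studentized residuals for weighted least squares,
    obtained by rescaling to the transformed model sqrt(w)*y ~ sqrt(w)*X
    and applying the usual OLS hat-matrix diagnostics there."""
    sw = np.sqrt(w)
    Xs, ys = sw[:, None] * X, sw * y
    H = Xs @ np.linalg.solve(Xs.T @ Xs, Xs.T)    # hat matrix of the WLS fit
    e = ys - H @ ys                              # transformed residuals
    n, p = X.shape
    s2 = (e @ e) / (n - p)
    return e / np.sqrt(s2 * (1 - np.diag(H)))

rng = np.random.default_rng(8)
x = rng.normal(size=60)
w = 1.0 / (1.0 + x**2)                           # heteroscedastic weights
y = 2 + x + rng.normal(scale=1 / np.sqrt(w))
y[5] += 12                                       # plant an outlier
X = np.column_stack([np.ones(60), x])
t = wls_studentized_residuals(X, y, w)
print(np.where(np.abs(t) > 2.5)[0])              # observation 5 should be flagged
```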

20.
In order to downweight or ignore unusual data and multicollinearity effects, some alternative robust estimators are introduced. First, a ridge least trimmed squares approach is discussed. Then, based on a penalization scheme, a nonlinear integer programming problem is suggested. Because of its complexity and difficulty, the proposed optimization problem is solved by a tabu search heuristic algorithm. The robust generalized cross-validation criterion is also employed for selecting the optimal ridge parameter. Finally, a simulation case and two real-world data sets are studied computationally to support our theoretical discussion.
