Similar Articles
 20 similar articles found (search time: 78 ms)
1.
We consider the problem of deleting bad influential observations (outliers) in linear regression models. The problem is formulated as a Quadratic Mixed Integer Programming (QMIP) problem, where penalty costs for discarding outliers are incorporated into the objective function. The optimum solution defines a robust regression estimator called penalized trimmed squares (PTS). Due to the high computational complexity of the resulting QMIP problem, the proposed robust procedure is computationally suitable for small samples. The computational performance and effectiveness of the new procedure are improved significantly by borrowing the idea of the ε-insensitive loss function from support vector machine regression: small errors are ignored, and the mathematical formulation gains the sparseness property. The good performance of the ε-insensitive PTS (IPTS) estimator allows identification of multiple outliers while avoiding masking and swamping effects. The computational effectiveness and successful outlier detection of the proposed method are demonstrated via simulation experiments. This research has been partially funded by the Greek Ministry of Education under the program Pythagoras II.
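The per-observation structure of such an objective is easy to illustrate: for fixed regression coefficients, each observation either contributes its squared ε-insensitive residual or is discarded at its penalty cost, whichever is cheaper. A minimal sketch of that decomposition (function names are ours, not the paper's; the actual IPTS estimator optimizes jointly over the coefficients via QMIP):

```python
import numpy as np

def eps_insensitive(res, eps):
    # errors smaller than eps are ignored (set to zero)
    return np.maximum(np.abs(res) - eps, 0.0)

def pts_objective(res, eps, penalty):
    # for fixed coefficients, the optimal keep/discard decision is per-point:
    # pay the squared eps-insensitive residual or the deletion penalty cost
    r2 = eps_insensitive(res, eps) ** 2
    return float(np.minimum(r2, penalty).sum())
```

A gross outlier thus contributes at most its bounded penalty cost, which is what keeps the fit from being dragged toward it.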

2.
《Optimization》2012,61(12):1467-1490
Large outliers break down linear and nonlinear regression models. Robust regression methods allow one to filter out the outliers when building a model. By replacing the traditional least squares criterion with the least trimmed squares (LTS) criterion, in which half of the data are treated as potential outliers, one can fit accurate regression models to strongly contaminated data. High-breakdown methods have become very well established in linear regression, but have only recently begun to be applied to nonlinear regression. In this work, we examine the problem of fitting artificial neural networks (ANNs) to contaminated data using the LTS criterion. We introduce a penalized LTS criterion which prevents unnecessary removal of valid data. Training of ANNs leads to a challenging non-smooth global optimization problem. We compare the efficiency of several derivative-free optimization methods in solving it, and show that our approach identifies the outliers correctly when ANNs are used for nonlinear regression.
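The LTS criterion itself is simple to state: minimize the sum of the h smallest squared residuals. For the linear case, a rough FAST-LTS-style sketch with random elemental starts and concentration steps (our illustration of the criterion, not the paper's ANN training procedure) looks like:

```python
import numpy as np

def lts_fit(X, y, h, n_starts=50, n_csteps=20, seed=None):
    """FAST-LTS-style sketch for linear regression: random elemental starts
    followed by concentration (C-) steps, keeping the best of all starts."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, size=p, replace=False)          # elemental start
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        for _ in range(n_csteps):
            keep = np.argsort((y - X @ beta) ** 2)[:h]      # h best-fitting points
            beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        obj = np.sort((y - X @ beta) ** 2)[:h].sum()        # LTS criterion
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta, best_obj
```

Each concentration step can only decrease the LTS objective, so every start converges to a local minimum; the random restarts guard against bad basins.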

3.
The presence of groups of high leverage outliers makes linear regression a difficult problem due to the masking effect. The available high-breakdown estimators based on least trimmed squares often do not succeed in detecting masked high leverage outliers in finite samples. An alternative to the LTS estimator, called the Penalised Trimmed Squares (PTS) estimator, was introduced by the authors in Zioutas and Avramidis (2005) Acta Math Appl Sin 21:323–334 and Zioutas et al. (2007) REVSTAT 5:115–136, and it appears to be less sensitive to the masking problem. This estimator is defined by a Quadratic Mixed Integer Programming (QMIP) problem, whose objective function includes a penalty cost for each observation that serves as an upper bound on the residual error for any feasible regression line. Since the PTS does not require presetting the number of outliers to delete from the data set, it has better efficiency compared with other estimators. However, due to the high computational complexity of the resulting QMIP problem, computing exact solutions for moderately large regression problems is infeasible. In this paper we further establish the theoretical properties of the PTS estimator, such as high breakdown and efficiency, and propose an approximate algorithm called Fast-PTS to compute the PTS estimator for large data sets efficiently. Extensive computational experiments on sets of benchmark instances with varying degrees of outlier contamination indicate that the proposed algorithm performs well in identifying groups of high leverage outliers in reasonable computational time.

4.
The paper studies a new class of robust regression estimators based on the two-step least weighted squares (2S-LWS) estimator, which employs data-adaptive weights determined from the empirical distribution or quantile functions of regression residuals obtained from an initial robust fit. Like many existing two-step robust methods, the proposed 2S-LWS estimator preserves the robust properties of the initial robust estimate. However, contrary to existing methods, the first-order asymptotic behavior of 2S-LWS is fully independent of the initial estimate under mild conditions. We propose data-adaptive weighting schemes that perform well for both cross-sectional and time-series data, and prove the asymptotic normality and efficiency of the resulting procedure. A simulation study documents these theoretical properties in finite samples.

5.
We consider median regression with a LASSO-type penalty term for variable selection. With a fixed number of variables in the regression model, a two-stage method is proposed for simultaneous estimation and variable selection in which the degree of penalization is chosen adaptively. A Bayesian information criterion type approach is proposed and used to obtain a data-driven procedure which is proved to automatically select asymptotically optimal tuning parameters. It is shown that the resulting estimator achieves the so-called oracle property. The combination of median regression and the LASSO penalty is computationally easy to implement via standard linear programming. A random perturbation scheme can be used to obtain a simple estimator of the standard error. Simulation studies are conducted to assess the finite-sample performance of the proposed method, and we illustrate the methodology with a real example.
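The linear programming reformulation mentioned here is standard: split both the coefficients and the residuals into positive and negative parts, so that both absolute-value sums become linear. A minimal sketch using scipy's HiGHS solver (function and variable names are ours; intercept handling and adaptive tuning are omitted):

```python
import numpy as np
from scipy.optimize import linprog

def lad_lasso(X, y, lam):
    """LAD (median) regression with an L1 penalty, solved as an LP:
    min sum|y - Xb| + lam * sum|b|, with b = bp - bn and residual = up - un."""
    n, p = X.shape
    # variable order: bp (p), bn (p), up (n), un (n), all nonnegative
    c = np.concatenate([lam * np.ones(2 * p), np.ones(2 * n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])   # X b + up - un = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:p] - res.x[p:2 * p]
```

At the optimum, at most one of each (bp, bn) and (up, un) pair is nonzero, so the linear objective equals the penalized sum of absolute values.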

6.
In this article, we develop an efficient robust method for simultaneous estimation of the mean and covariance in regression models for longitudinal data. Based on the Cholesky decomposition of the covariance matrix and a rewriting of the regression model, we propose a weighted least squares estimator, in which the weights are estimated within a generalized empirical likelihood framework. The proposed estimator attains high efficiency from its close connection to the empirical likelihood method, and achieves robustness by bounding the weighted sum of squared residuals. A simulation study shows that, compared to existing robust estimation methods for longitudinal data, the proposed estimator has relatively high efficiency and comparable robustness. Finally, the proposed method is used to analyse a real data set.

7.
To perform multiple regression, the least squares estimator is commonly used; however, this estimator is not robust to outliers. Therefore, robust methods such as S-estimation have been proposed. These estimators flag any observation with a large residual as an outlier and downweight it in the subsequent procedure. However, a large residual may be caused by an outlier in only a single predictor variable, and downweighting the complete observation results in a loss of information. We therefore propose the shooting S-estimator, a regression estimator designed especially for situations where a large number of observations suffer from contamination in a small number of predictor variables. The shooting S-estimator combines the idea of the coordinate descent algorithm with simple S-regression, which makes it robust against componentwise contamination, at the cost of losing the regression equivariance property.
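The coordinate-descent ("shooting") structure can be sketched with a robust univariate fit in place of the simple S-regression the paper uses. Below we substitute a weighted-median LAD slope, which is simpler but conveys how each coordinate is refit against its partial residuals (the names and the LAD substitution are ours, not the paper's):

```python
import numpy as np

def weighted_median_slope(x, r):
    """Univariate LAD slope for r ~ b*x: the weighted median of r_i/x_i
    with weights |x_i| (entries with x_i == 0 carry no slope information)."""
    mask = x != 0
    ratios, w = r[mask] / x[mask], np.abs(x[mask])
    order = np.argsort(ratios)
    cw = np.cumsum(w[order])
    return ratios[order][np.searchsorted(cw, cw[-1] / 2)]

def shooting_lad(X, y, n_sweeps=50):
    """Coordinate descent: each coordinate is refit robustly against the
    partial residuals with all other coordinates held fixed."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual
            beta[j] = weighted_median_slope(X[:, j], r)
    return beta
```

Because each update only looks at one predictor at a time, a cell-level outlier corrupts only that coordinate's fit, which is the componentwise-robustness idea; the price, as the abstract notes, is the loss of regression equivariance.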

8.
This paper formulates the quadratic penalty function for the dual of the linear program associated with the \(L_1\)-constrained linear quantile regression model. We prove that the solution of the original linear program can be obtained by minimizing the quadratic penalty function, and derive the corresponding formulas. The resulting quadratic penalty function is unconstrained and can thus be minimized efficiently by a generalized Newton algorithm with Armijo step size. The resulting algorithm is easy to implement and requires no sophisticated optimization package beyond a linear equation solver. The proposed approach generalizes, with slight modification, to the quantile regression model in a reproducing kernel Hilbert space. Extensive experiments on simulated and real-world data show that the proposed Newton quantile regression algorithms achieve performance comparable to state-of-the-art methods.

9.
Nonparametric kernel regression methods have in recent years been applied to the analysis of longitudinal data (Lin and Carroll, 2000). A rather controversial question is whether the correlation among longitudinal observations should be accounted for in nonparametric kernel regression. Lin and Carroll (2000) showed that the kernel estimator based on working independence (i.e., ignoring the correlation) is (asymptotically) the most efficient within a class of kernel GEE estimators. Based on a mixed-effects model approach, the authors propose a different class of kernel estimators that naturally and effectively incorporates the correlation structure of the longitudinal data. The proposed estimator attains the same asymptotic efficiency as Lin and Carroll's estimator and performs better in finite samples. The method readily yields nonparametric curve estimates at both the population and the individual level. The proposed estimator has good statistical properties and is easy to implement, making it attractive to practitioners.

10.
In this study, a new nonparametric approach using the Bernstein copula approximation is proposed to estimate the Pickands dependence function. New data points obtained from the Bernstein copula approximation serve to estimate the unknown Pickands dependence function. A kernel regression method is then used to derive an intrinsic estimator satisfying the convexity constraint. Several extreme-value copula models are used to measure the performance of the estimator in a comprehensive simulation study, and a real-data example is also presented. The proposed Pickands estimator provides a flexible way to obtain a better fit and performs better than the conventional estimators.

11.
This paper reports a robust kernel estimator for fixed-design nonparametric regression models. A Stahel-Donoho kernel estimator is introduced, in which the weight functions depend on both the depths of the data and the distances between the design points and the estimation points. Based on a local approximation, a computational technique is given to approximate the incomputable depths of the errors, making the new estimator computationally efficient. The proposed estimator attains a high breakdown point and has desirable asymptotic properties such as asymptotic normality and convergence in mean squared error. Unlike the depth-weighted estimator for parametric regression models, this depth-weighted nonparametric estimator has a simple variance structure, so its efficiency can be compared with that of the original estimator. Simulations show that the new method smooths the regression estimate and achieves a desirable balance between robustness and efficiency.

12.
Robust Depth-Weighted Wavelet for Nonparametric Regression Models
In nonparametric regression models, the original regression estimators, including the kernel, Fourier series, and wavelet estimators, are constructed as weighted sums of the data, with weights depending only on the distances between the design points and the estimation points. As a result, these estimators are not robust to perturbations in the data. To avoid this problem, a new nonparametric regression model, called the depth-weighted regression model, is introduced, and the depth-weighted wavelet estimator is then defined. The new estimator is robust to perturbations in the data and attains a very high breakdown value, close to 1/2. Asymptotic behaviour such as asymptotic normality is also obtained. Simulations illustrate that the proposed wavelet estimator is more robust than the original wavelet estimator and, as the price to pay for robustness, is slightly less efficient than the original method.

13.
The outlier detection problem and the robust covariance estimation problem are often interchangeable. Without outliers, the classical method of maximum likelihood estimation (MLE) can be used to estimate the parameters of a known distribution from observational data. When outliers are present, they dominate the log-likelihood function, pulling the MLE estimators toward them. Many robust statistical methods have been developed to detect outliers and to produce estimators that are robust against deviations from model assumptions. However, existing methods suffer either from computational complexity as the problem size increases or from giving up desirable properties such as affine equivariance. An alternative approach is to design a special mathematical programming model to find optimal weights for all the observations, such that at the optimal solution outliers receive smaller weights and can be detected. This method produces a covariance estimator with the following properties: first, it is affine equivariant; second, it is computationally efficient even for large problem sizes; third, it is easy to incorporate prior beliefs into the estimator by using semidefinite programming. The accuracy of the method is tested under different contamination models, including recently proposed ones. The method is not only faster than the Fast-MCD method for high-dimensional data but also reasonably accurate for the tested cases.

14.
We establish the consistency, asymptotic normality, and efficiency of estimators derived by minimizing the median of a loss function in a Bayesian context. We contrast this procedure with the behavior of two Frequentist procedures, the least median of squares (LMS) and least trimmed squares (LTS) estimators, in regression problems. The LMS estimator is the Frequentist version of our estimator, and the LTS estimator approaches a median-based estimator as the trimming approaches 50% on each side. We argue that the Bayesian median-based method is a good tradeoff between the two Frequentist estimators.

15.
An Improvement on Scale Parameter Estimation in Robust Regression
In robust M-estimation for linear regression models, the median absolute deviation (MAD) is commonly used as the scale parameter. We introduce Sn, a robust estimator of scale for univariate data proposed by Rousseeuw et al., into the regression setting, discuss several of its desirable properties, and show through a small simulation study that using Sn instead of MAD as the scale parameter considerably improves the efficiency of the regression estimates.
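For reference, Sn has a simple, if naive O(n²), form: the median over i of the median over j of |x_i − x_j|, times a consistency constant. The sketch below uses plain medians rather than the low/high medians of the exact definition, and contrasts it with MAD:

```python
import numpy as np

def mad(x):
    """Median absolute deviation, scaled for consistency at the normal."""
    return 1.4826 * np.median(np.abs(x - np.median(x)))

def sn(x):
    """Rousseeuw-Croux Sn scale estimator, naive O(n^2) version using plain
    medians: Sn = c * med_i { med_j |x_i - x_j| },  c = 1.1926."""
    x = np.asarray(x, dtype=float)
    inner = np.median(np.abs(x[:, None] - x[None, :]), axis=1)
    return 1.1926 * np.median(inner)
```

Unlike MAD, Sn needs no location estimate and does not implicitly assume a symmetric distribution, which is part of why it can yield more efficient scale estimates.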

16.
Standard methods for the optimal allocation of shares in a financial portfolio are determined by second-order conditions that are very sensitive to outliers. The well-known Markowitz approach, which is based on the input of a mean vector and a covariance matrix, can provide questionable results in financial management, since small changes in the inputs may lead to irrelevant portfolio allocations. However, existing robust estimators often suffer from masking of multiple influential observations, so we propose a new robust estimator that suitably weights the data using a forward search approach. A Monte Carlo simulation study and an application to real data show the advantages of the proposed approach.

17.
In the multiple changepoint setting, various search methods have been proposed which involve optimizing either a constrained or penalized cost function over the possible numbers and locations of changepoints using dynamic programming. Recent work in the penalized optimization setting has focused on developing an exact pruning-based approach that, under certain conditions, is linear in the number of data points. Such an approach naturally requires the specification of a penalty to avoid under- or over-fitting. Work has been undertaken to identify appropriate penalty choices for data-generating processes with known distributional form, but in many applications the model assumed for the data is not correct and these penalty choices are not always appropriate. To this end, we present a method that finds the solution path for all penalty values across a continuous range. This permits an evaluation of the various segmentations to identify a suitable penalty choice. The computational complexity of this approach can be linear in the number of data points and linear in the difference between the number of changepoints in the optimal segmentations for the smallest and largest penalty values. Supplementary materials for this article are available online.
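The penalized-cost optimization underlying this line of work can be written as a simple dynamic program (the "optimal partitioning" recursion). The sketch below uses a squared-error segment cost and omits the pruning and solution-path machinery that the article is actually about:

```python
import numpy as np

def op_changepoints(y, penalty):
    """Optimal-partitioning dynamic program: F[t] is the best penalized cost
    of segmenting y[:t]; each new segment pays its squared-error cost plus
    the penalty. Returns the optimal changepoint locations."""
    n = len(y)
    csum = np.concatenate([[0.0], np.cumsum(y)])
    csum2 = np.concatenate([[0.0], np.cumsum(np.square(y))])

    def seg_cost(s, t):
        # cost of fitting y[s:t] by its mean
        m = csum[t] - csum[s]
        return csum2[t] - csum2[s] - m * m / (t - s)

    F = np.full(n + 1, np.inf)
    F[0] = -penalty
    last = np.zeros(n + 1, dtype=int)
    for t in range(1, n + 1):
        cand = [F[s] + seg_cost(s, t) + penalty for s in range(t)]
        F[t] = min(cand)
        last[t] = int(np.argmin(cand))
    cps, t = [], n
    while last[t] > 0:              # backtrack to recover changepoints
        t = last[t]
        cps.append(t)
    return sorted(cps)
```

This naive recursion is O(n²); the pruned methods discussed above reduce it, under conditions, to O(n), and the solution-path idea runs it conceptually over a whole range of penalty values at once.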

18.
We introduce fast and robust algorithms for lower-rank approximation of given matrices based on robust alternating regression. Alternating least squares regression, also called criss-cross regression, has been used for lower-rank approximation of matrices, but it lacks robustness against outliers in these matrices. We use robust regression estimators and address some of the complications arising from this approach. We find it helpful to use high-breakdown estimators in the initial iterations, followed by M-estimators with monotone score functions in later iterations toward convergence. Besides robustness, computational speed is another important consideration in the development of the proposed algorithm, because alternating robust regression can be computationally intensive for large matrices. Based on a mix of the least trimmed squares (LTS) and Huber's M-estimators, we demonstrate that fast and robust lower-rank approximations are possible for modestly large matrices.
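The criss-cross idea for a rank-one fit: alternate between regressing the rows of M on v to update u, and the columns on u to update v, with robust weights recomputed from the current residuals. The sketch below uses Huber weights throughout, whereas the paper alternates high-breakdown LTS starts with monotone M-estimators; all names are ours:

```python
import numpy as np

def huber_weights(R, c=1.345):
    """Huber weights from a residual matrix R, scaled by a MAD-type estimate."""
    s = np.median(np.abs(R)) / 0.6745 + 1e-12
    a = np.abs(R) / s
    return np.where(a <= c, 1.0, c / a)

def robust_rank1(M, n_iter=50):
    """Rank-one approximation M ~ u v^T by alternating Huber-weighted
    row/column regressions (illustrative criss-cross sketch)."""
    n, p = M.shape
    v = M[np.argmax(np.abs(M).sum(axis=1))].astype(float)   # start from a row
    u = (M @ v) / (v @ v)
    for _ in range(n_iter):
        W = huber_weights(M - np.outer(u, v))
        u = ((W * M) @ v) / (W @ (v * v))       # row-wise weighted LS slopes
        W = huber_weights(M - np.outer(u, v))
        v = ((W * M).T @ u) / (W.T @ (u * u))   # column-wise weighted LS slopes
    return u, v
```

As the abstract warns, a purely monotone M-step can converge to a bad local fit when gross outliers dominate the starting value, which is exactly why the paper recommends high-breakdown estimators in the initial iterations.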

19.
A new algorithm to solve exact least trimmed squares (LTS) regression is presented. The adding row algorithm (ARA) extends existing methods that compute the LTS estimator for a given coverage. It employs a tree-based strategy to compute a set of LTS regressors for a range of coverage values, so prior knowledge of the optimal coverage is not required. New nodes in the regression tree are generated by updating the QR decomposition of the data matrix after adding one observation to the regression model. The ARA is enhanced by a branch-and-bound strategy: an exhaustive algorithm that uses a cutting test to prune nonoptimal subtrees, significantly improving the computational performance of the ARA. Observation preordering throughout the traversal of the regression tree is investigated. A computationally efficient and numerically stable calculation of the bounds using Givens rotations is designed around the QR decomposition, avoiding the need to explicitly update the triangular factor when an observation is added; this reduces the overall computational load of the preordering device by approximately half. A solution is proposed to allow preordering when the model is underdetermined, employing pseudo-orthogonal rotations to downdate the QR decomposition. The strategies are illustrated by example, and experimental results confirm the computational efficiency of the proposed algorithms. Supplemental materials (R package and formal proofs) are available online.
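The basic updating step the ARA builds on — appending one observation to a QR decomposition via Givens rotations — can be sketched directly (a standard textbook update, not the paper's full bounding machinery; Q is not tracked here):

```python
import numpy as np

def qr_add_row(R, new_row):
    """Update the triangular factor R of a QR decomposition after appending
    one observation row: Givens rotations zero out the new row column by
    column, restoring the triangular form."""
    R = np.vstack([R, new_row.astype(float)])
    p = R.shape[1]
    for j in range(p):
        a, b = R[j, j], R[-1, j]
        r = np.hypot(a, b)
        if r == 0.0:
            continue
        c, s = a / r, b / r
        row_j, row_new = R[j].copy(), R[-1].copy()
        R[j] = c * row_j + s * row_new      # rotate rows j and new into place
        R[-1] = -s * row_j + c * row_new    # entry (last, j) becomes zero
    return R[:-1]
```

Because the rotations are orthogonal, the updated factor satisfies Rᵀ R = Xᵀ X for the augmented data matrix, which is all a least squares solve needs.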

20.
The estimation of the correlation dimension of continuous and discrete deterministic chaotic processes corrupted by additive noise and outlier observations is investigated. In this paper we propose a new estimator of the correlation dimension based on the similarity between the evolution of the Gaussian kernel correlation sum (Gkcs) and that of a modified Boltzmann sigmoidal function (mBsf). The estimator is given by the maximum value of the first derivative of the logarithmic transform of the Gkcs with respect to the logarithmic transform of the bandwidth, so the proposed estimator is independent of the choice of regression region, unlike other regression estimators of the correlation dimension. A simulation study indicates the robustness of the proposed estimator to different types of noise, such as independent Gaussian noise, non-independent Gaussian noise, and uniform noise at high noise levels; moreover, the estimator is also robust to the presence of 60% outlier observations. Application of this new estimator, with confidence intervals determined by the moving block bootstrap method, to the daily time series of adjusted closing prices of the S&P 500 index reveals the stochastic behavior of this financial time series.
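The core idea — read the correlation dimension off the maximum slope of log Gkcs versus log bandwidth — can be sketched directly. This is a naive finite-difference version; normalization conventions for the Gaussian kernel correlation sum vary in the literature:

```python
import numpy as np

def gkcs(X, bandwidths):
    """Gaussian-kernel correlation sum over a grid of bandwidths
    (one common definition, averaging the kernel over all point pairs)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    iu = np.triu_indices(len(X), k=1)
    return np.array([np.mean(np.exp(-d2[iu] / (4.0 * h * h)))
                     for h in bandwidths])

def corr_dim(X, bandwidths):
    """Correlation-dimension estimate: the maximum finite-difference slope
    of log C(h) against log h over the bandwidth grid."""
    logh = np.log(bandwidths)
    logc = np.log(gkcs(X, bandwidths))
    return np.max(np.diff(logc) / np.diff(logh))
```

Taking the maximum of the slope, rather than fitting a line over a hand-picked scaling region, is what removes the regression-region choice that the abstract highlights.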
