Similar literature
1.
The canonical correlation (CANCOR) method for dimension reduction in a regression setting is based on the classical estimates of the first and second moments of the data, and is therefore sensitive to outliers. In this paper, we study a weighted canonical correlation (WCANCOR) method, which captures a subspace of the central dimension reduction subspace, as well as its asymptotic properties. In the proposed WCANCOR method, each observation is weighted based on its Mahalanobis distance to the location of the predictor distribution. Robust estimates of the location and scatter, such as the minimum covariance determinant (MCD) estimator of Rousseeuw [P.J. Rousseeuw, Multivariate estimation with high breakdown point, Mathematical Statistics and Applications B (1985) 283-297], can be used to compute the Mahalanobis distance. To determine the number of significant dimensions in WCANCOR, a weighted permutation test is considered. A comparison of SIR, CANCOR and WCANCOR is also made through simulation studies to show the robustness of WCANCOR to outlying observations. As an example, the Boston housing data is analyzed using the proposed WCANCOR method.
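
The weighting step described in this abstract can be illustrated with a minimal Python sketch. It assumes two predictors, a location and scatter estimate that have already been computed elsewhere (for example by MCD), and a hard 0/1 weight with a chi-square cutoff. The function names and the cutoff rule are illustrative assumptions, not the weight function used in the paper.

```python
# Hypothetical helper names; the 0/1 weight rule is an assumption for illustration.
CHI2_95_DF2 = 5.991  # 0.95 quantile of the chi-square distribution with 2 degrees of freedom


def mahalanobis_sq_2d(x, center, scatter):
    """Squared Mahalanobis distance of a 2-D point from `center` under `scatter`."""
    (a, b), (c, d) = scatter
    det = a * d - b * c
    if det <= 0:
        raise ValueError("scatter matrix must have a positive determinant")
    dx, dy = x[0] - center[0], x[1] - center[1]
    # Closed-form inverse of a 2x2 matrix: (1/det) * [[d, -b], [-c, a]].
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def hard_weights(points, center, scatter, cutoff=CHI2_95_DF2):
    """Weight 1.0 for points inside the cutoff ellipse, 0.0 for flagged outliers."""
    return [
        1.0 if mahalanobis_sq_2d(p, center, scatter) <= cutoff else 0.0
        for p in points
    ]
```

With an identity scatter matrix and the origin as location, the point (3, 4) has squared distance 25 and receives weight 0 under this rule, while (1, 1) keeps weight 1.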

2.
In sampling theory, the traditional ratio estimator is the most common estimator of the population mean when the correlation between the study and auxiliary variables is high and positive. We introduce a new ratio-type estimator based on the order statistics of a simple random sample. We show that this new estimator is considerably more efficient than the traditional ratio estimator under non-normality, and remarkably robust to data anomalies such as the presence of outliers in data sets.
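
For orientation, the traditional ratio estimator that this abstract uses as its baseline can be sketched in a few lines of Python. The sketch assumes the population mean of the auxiliary variable is known; the function and parameter names are hypothetical, and the new order-statistics estimator from the paper is not reproduced here.

```python
from statistics import mean


def ratio_estimate(y_sample, x_sample, x_population_mean):
    """Classical ratio estimator: (sample mean of y / sample mean of x) * known mean of x."""
    if len(y_sample) != len(x_sample):
        raise ValueError("samples must have the same length")
    x_bar = mean(x_sample)
    if x_bar == 0:
        raise ValueError("sample mean of the auxiliary variable must be nonzero")
    return mean(y_sample) / x_bar * x_population_mean
```

For example, with study values 2, 4, 6, auxiliary values 1, 2, 3 and a known auxiliary mean of 5, the sample ratio is 2 and the estimate is 10.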

3.
The least squares (LS) estimator seems the natural estimator of the coefficients of a Gaussian linear regression model. However, if the dimension of the vector of coefficients is greater than 2 and the residuals are independent and identically distributed, this conventional estimator is not admissible. James and Stein [Estimation with quadratic loss, Proceedings of the Fourth Berkeley Symposium vol. 1, 1961, pp. 361-379] proposed a shrinkage estimator (the James-Stein estimator) which improves on the least squares estimator with respect to the mean squared error loss function. In this paper, we investigate the mean squared error of the James-Stein (JS) estimator for the regression coefficients when the residuals are generated from a Gaussian stationary process. Sufficient conditions for the JS estimator to improve on the LS estimator are then given. It is important to know the influence of the dependence on the JS estimator. Numerical studies also illuminate some interesting features of the improvement. The results have potential applications in economics, engineering, and the natural sciences.
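
The shrinkage idea behind the James-Stein estimator can be shown with a short Python sketch of the textbook case: a single observation vector with independent components and a known common variance. This is an illustrative simplification; the dependent-residual regression setting studied in the paper is not covered, and the function name and the `sigma2` parameter are assumptions.

```python
def james_stein(z, sigma2=1.0):
    """Shrink the observation vector z toward the origin (textbook James-Stein form)."""
    p = len(z)
    if p < 3:
        raise ValueError("the James-Stein improvement requires dimension greater than 2")
    norm_sq = sum(v * v for v in z)
    if norm_sq == 0:
        return [0.0] * p  # nothing to shrink
    shrink = 1.0 - (p - 2) * sigma2 / norm_sq
    return [shrink * v for v in z]
```

For z = (1, 2, 2) and unit variance, the squared norm is 9 and the shrinkage factor is 1 - 1/9 = 8/9, so every component is pulled slightly toward zero.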

4.
A Gaussian measurement error assumption, that is, an assumption that the data are observed up to Gaussian noise, can bias any parameter estimation in the presence of outliers. A heavy tailed error assumption based on Student’s t distribution helps reduce the bias. However, it may be less efficient in estimating parameters if the heavy tailed assumption is uniformly applied to all of the data when most of them are normally observed. We propose a mixture error assumption that selectively converts Gaussian errors into Student’s t errors according to latent outlier indicators, leveraging the best of the Gaussian and Student’s t errors; a parameter estimation can be not only robust but also accurate. Using simulated hospital profiling data and astronomical time series of brightness data, we demonstrate the potential for the proposed mixture error assumption to estimate parameters accurately in the presence of outliers. Supplemental materials for this article are available online.

5.
Theories of nonparametric regression are usually based on the assumption that the design density exists. However, in some applications such as those involving high-dimensional or chaotic time series data, the design measure may be singular and may be likely to have a fractal (nonintegral) dimension. In this paper, the popular Nadaraya–Watson estimator is studied under the general setup that the continuity of the design measure is governed by the local or pointwise dimension. It will be shown in the iid setup that the nonparametric regression estimator achieves a convergence rate which is dependent only on the pointwise dimension. The case of time series data is also studied. For the latter case, a new mixing condition is introduced, and an assumption of marginal or joint density is completely avoided. Three examples, a fractal regression and two applications for predicting chaotic time series, are used to illustrate the implications of the obtained results.
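
As a reminder of what the Nadaraya–Watson estimator computes, here is a minimal Python sketch for a scalar predictor. The Gaussian kernel and the fixed bandwidth are assumptions made for illustration; the abstract does not say which kernel or bandwidth rule the paper uses.

```python
import math


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys, with weights centred at the query point x0."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    # Gaussian kernel weights (normalising constants cancel in the ratio).
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total
```

The estimate at a query point is a locally weighted mean of the responses, so a query placed midway between two design points with equal weights returns the average of their responses.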

6.
The presence of groups containing high leverage outliers makes linear regression a difficult problem due to the masking effect. The available high breakdown estimators based on Least Trimmed Squares (LTS) often do not succeed in detecting masked high leverage outliers in finite samples. An alternative to the LTS estimator, called the Penalised Trimmed Squares (PTS) estimator, was introduced by the authors in Zioutas and Avramidis (2005) Acta Math Appl Sin 21:323–334 and Zioutas et al. (2007) REVSTAT 5:115–136, and it appears to be less sensitive to the masking problem. This estimator is defined by a Quadratic Mixed Integer Programming (QMIP) problem, in which the objective function includes a penalty cost for each observation that serves as an upper bound on the residual error for any feasible regression line. Since the PTS does not require presetting the number of outliers to delete from the data set, it has better efficiency than other estimators. However, due to the high computational complexity of the resulting QMIP problem, exact solutions for moderately large regression problems are infeasible. In this paper we further establish the theoretical properties of the PTS estimator, such as high breakdown and efficiency, and propose an approximate algorithm called Fast-PTS to compute the PTS estimator for large data sets efficiently. Extensive computational experiments on sets of benchmark instances with varying degrees of outlier contamination indicate that the proposed algorithm performs well in identifying groups of high leverage outliers in reasonable computational time.

7.

Euclidean embedding from noisy observations containing outlier errors is an important and challenging problem in statistics and machine learning. Many existing methods struggle with outliers because they lack the ability to detect them. In this paper, we propose a matrix optimization based embedding model that can produce reliable embeddings and identify the outliers jointly. We show that the estimators obtained by the proposed method satisfy a non-asymptotic risk bound, implying that the model provides a high accuracy estimator with high probability when the order of the sample size is roughly the degree of freedom up to a logarithmic factor. Moreover, we show that under some mild conditions, the proposed model can also identify the outliers without any prior information with high probability. Finally, numerical experiments demonstrate that the matrix optimization-based model can produce configurations of high quality and successfully identify outliers even for large networks.


8.
Correlation between microstructure noise and latent financial logarithmic returns is an empirically relevant phenomenon with sound theoretical justification. With few notable exceptions, all integrated variance estimators proposed in the financial literature are not designed to explicitly handle such a dependence, or handle it only in special settings. We provide an integrated variance estimator that is robust to correlated noise and returns. For this purpose, a generalization of the forward filtering backward sampling algorithm is proposed, to provide a sampling technique for a latent conditionally Gaussian random sequence. We apply our methodology to intraday Microsoft prices and compare it in a simulation study with established alternatives, showing an advantage in terms of root‐mean‐square error and dispersion.

9.
We consider adaptively estimating the value of a linear functional from indirect white noise observations. For a flexible approach, the problem is embedded in an abstract Hilbert scale. We develop an adaptive estimator that is rate optimal within a logarithmic factor simultaneously over a wide collection of balls in the Hilbert scale. It is shown that the proposed estimator has the best possible adaptive properties for a wide range of linear functionals. The case of discretized indirect white noise observations is studied, and the adaptive estimator in this setting is developed. Received: 5 May 1999 / Revised version: 25 October 1999 / Published online: 5 September 2000

10.
To perform multiple regression, the least squares estimator is commonly used. However, this estimator is not robust to outliers. Therefore, robust methods such as S-estimation have been proposed. These estimators flag any observation with a large residual as an outlier and downweight it in the further procedure. However, a large residual may be caused by an outlier in only one single predictor variable, and downweighting the complete observation results in a loss of information. Therefore, we propose the shooting S-estimator, a regression estimator that is especially designed for situations where a large number of observations suffer from contamination in a small number of predictor variables. The shooting S-estimator combines the ideas of the coordinate descent algorithm with simple S-regression, which makes it robust against componentwise contamination, at the cost of failing the regression equivariance property.

11.
We consider the problem of deleting bad influential observations (outliers) in linear regression models. The problem is formulated as a Quadratic Mixed Integer Programming (QMIP) problem, where penalty costs for discarding outliers are included in the objective function. The optimum solution defines a robust regression estimator called penalized trimmed squares (PTS). Due to the high computational complexity of the resulting QMIP problem, the proposed robust procedure is computationally suitable only for small-sample data. The computational performance and the effectiveness of the new procedure are improved significantly by using the idea of the ε-insensitive loss function from support vector machine regression. Small errors are ignored, and the mathematical formulation gains the sparseness property. The good performance of the ε-insensitive PTS (IPTS) estimator allows identification of multiple outliers while avoiding masking or swamping effects. The computational effectiveness and successful outlier detection of the proposed method are demonstrated via simulated experiments. This research has been partially funded by the Greek Ministry of Education under the program Pythagoras II.
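
The ε-insensitive loss borrowed from support vector regression is simple to state in code. The Python sketch below shows the linear form of that loss, with a hypothetical function name; how the loss is embedded in the IPTS objective is not specified in the abstract and is not reproduced here.

```python
def eps_insensitive_loss(residual, epsilon):
    """Zero inside the tube |residual| <= epsilon, linear in the excess outside it."""
    if epsilon < 0:
        raise ValueError("epsilon must be nonnegative")
    return max(0.0, abs(residual) - epsilon)
```

Residuals smaller than ε in absolute value contribute nothing, which is the source of the sparseness mentioned in the abstract: only observations outside the tube enter the objective.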

12.
This study considers rank estimation of the regression coefficients of the single index regression model. Conditions needed for the consistency and asymptotic normality of the proposed estimator are established. Monte Carlo simulation experiments demonstrate the robustness and efficiency of the proposed estimator compared to the semiparametric least squares estimator. A real-life example illustrates that the rank regression procedure effectively corrects model nonlinearity even in the presence of outliers in the response space.

13.
In the multiple linear regression model, assumptions such as independence, normality and variance homogeneity are imposed on the error term. When case weights are given because of variance heterogeneity, the regression parameters can be estimated efficiently using the weighted least squares estimator. Unfortunately, this estimator is sensitive to outliers, like the ordinary least squares estimator. Thus, in this paper, we propose some statistics for the detection of outliers in weighted least squares regression.
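
To make the weighted least squares fit concrete, the following Python sketch handles the single-predictor case in closed form. It only illustrates how case weights enter the fit; the outlier-detection statistics proposed in the paper are not shown, and the function name is hypothetical.

```python
def weighted_least_squares_line(xs, ys, weights):
    """Fit y = intercept + slope * x by minimising sum(w_i * residual_i ** 2)."""
    if not (len(xs) == len(ys) == len(weights)):
        raise ValueError("xs, ys and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted means of predictor and response.
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total
    y_bar = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    if sxx == 0:
        raise ValueError("predictor has no weighted variation")
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope
```

Giving an observation weight zero removes its influence entirely, which shows why a single badly fitting case with a large weight can dominate the estimate.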

14.
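Asymptotic normality of nonparametric regression function estimators   Total citations: 6 (self-citations: 0; citations by others: 6)
Hu Shuhe (胡舒合), Acta Mathematica Sinica (《数学学报》), 2002, 45(3): 433-442
This paper studies the Nadaraya-Watson estimator of a nonparametric regression function for independent or dependent samples and proves the asymptotic normality of the estimator under simple and reasonable conditions. The results can be applied in time series analysis.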

15.
This paper focuses on efficient computational approaches to compute approximate solutions of a linear inverse problem that is contaminated with mixed Poisson–Gaussian noise, and when there are additional outliers in the measured data. The Poisson–Gaussian noise leads to a weighted minimization problem, with solution-dependent weights. To address outliers, the standard least squares fit-to-data metric is replaced by the Talwar robust regression function. Convexity, regularization parameter selection schemes, and incorporation of nonnegativity constraints are investigated. A projected Newton algorithm is used to solve the resulting constrained optimization problem, and a preconditioner is proposed to accelerate conjugate gradient Hessian solves. Numerical experiments on problems from image deblurring illustrate the effectiveness of the methods.
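
The Talwar function mentioned in this abstract is quadratic for small residuals and constant beyond a threshold. The Python sketch below gives its usual form together with the corresponding 0/1 weight. The scaling by 1/2 and the default threshold of 2.795 (a common default in robust-fitting software) are assumptions, not values taken from the paper.

```python
TALWAR_DEFAULT_C = 2.795  # assumed default tuning constant, not taken from the paper


def talwar_rho(residual, c=TALWAR_DEFAULT_C):
    """Quadratic loss for |residual| <= c, capped at c**2 / 2 beyond the threshold."""
    if c <= 0:
        raise ValueError("threshold c must be positive")
    r = abs(residual)
    return 0.5 * r * r if r <= c else 0.5 * c * c


def talwar_weight(residual, c=TALWAR_DEFAULT_C):
    """Reweighting factor implied by the Talwar loss: keep (1.0) or discard (0.0)."""
    return 1.0 if abs(residual) <= c else 0.0
```

Because the loss is flat beyond the threshold, large residuals stop influencing the fit, but the function is no longer convex, which is consistent with the abstract listing convexity among the issues investigated.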

16.
This paper studies spectral density estimation based on amplitude modulation, including missing data as a specific case. A generalized periodogram is introduced and smoothed with a local linear regression smoother to give a consistent estimator of the spectral density. We explore the asymptotic properties of the proposed estimator and its application to time series data with periodically missing observations. A simple data-driven local bandwidth selection rule is proposed and an algorithm for computing the spectral density estimate is presented. The effectiveness of the proposed method is demonstrated using simulations. The application to outlier detection based on a leave-one-out diagnostic is also considered. An illustrative example shows that the proposed diagnostic procedure succeeds in revealing outliers in time series without masking and smearing effects. Supported by Chinese NSF Grants 10001004 and 39930160, and Fellowship of City University of Hong Kong.

17.
We define a non-parametric estimator of the integrated leverage effect as the integrated covariation between the logarithmic asset price and its volatility. In Curato and Sanfelici (2015), a consistent estimator of the leverage effect has been introduced through a pre-estimate of the Fourier coefficients of the volatility. This is a novel approach compared to the ones present in the literature which use a pre-estimate of the spot volatility path. In this paper, we show the asymptotic normality of the Fourier estimator for non-equidistant observations. Moreover, its finite sample properties are analyzed in a simulation study also in the presence of microstructure noise.

18.
The kernel-based regression (KBR) method, such as the support vector machine for regression (SVR), is a well-established methodology for estimating the nonlinear functional relationship between the response variable and predictor variables. KBR methods can be very sensitive to influential observations that in turn have a noticeable impact on the model coefficients. The robustness of KBR methods has recently been the subject of wide-scale investigations with the aim of obtaining a regression estimator insensitive to outlying observations. However, existing robust KBR (RKBR) methods only consider Y-space outliers and, consequently, are sensitive to X-space outliers. As a result, even a single anomalous outlying observation in X-space may greatly affect the estimator. In order to resolve this issue, we propose a new RKBR method that gives reliable results even if a training data set is contaminated with both Y-space and X-space outliers. The proposed method utilizes a weighting scheme based on the hat matrix that resembles the generalized M-estimator (GM-estimator) of conventional robust linear analysis. The diagonal elements of the hat matrix in kernel-induced feature space are used as leverage measures to downweight the effects of potential X-space outliers. We show that the kernelized hat diagonal elements can be obtained via eigen decomposition of the kernel matrix. A regularized version of the kernelized hat diagonal elements is also proposed to deal with the case of the kernel matrix having full rank, where the kernelized hat diagonal elements are not suitable as leverage measures. We show that the two kernelized leverage measures, namely the kernel hat diagonal element and the regularized one, are related to statistical distance measures in the feature space. We also develop an efficient kernelized training algorithm for the parameter estimation based on the iteratively reweighted least squares (IRLS) method. The experimental results from simulated examples and real data sets demonstrate the robustness of our proposed method compared with conventional approaches.

19.
The problem of imputing missing observations under the linear regression model is considered. It is assumed that observations are missing at random and all the observations on the auxiliary or independent variables are available. Estimates of the regression parameters based on singly and multiply imputed values are given. Jackknife as well as bootstrap estimates of the variance of the singly imputed estimator of the regression parameters are given. These estimators are shown to be consistent. The asymptotic distributions of the imputed estimators are also given to obtain interval estimates of the parameters of interest. These interval estimates are then compared with the interval estimates obtained from multiple imputation. It is shown that singly imputed estimators perform at least as well as multiply imputed estimators. A new nonparametric multiply imputed estimator is proposed and shown to perform as well as a multiply imputed estimator under normality. The singly imputed estimator, however, still remains at least as good as a multiply imputed estimator.

20.
In this article we consider volatility inference in the presence of both market microstructure noise and endogenous time. Estimators of the integrated volatility in such a setting are proposed, and their asymptotic properties are studied. Our proposed estimator is compared with existing popular volatility estimators via numerical studies. The results show that our estimator can have substantially better performance when time endogeneity exists.
