1.

Robust dimension reduction based on canonical correlation

Jianhui Zhou 《Journal of multivariate analysis》2009,100(1):195-209

The canonical correlation (CANCOR) method for dimension reduction in a regression setting is based on the classical estimates of the first and second moments of the data, and is therefore sensitive to outliers. In this paper, we study a weighted canonical correlation (WCANCOR) method, which captures a subspace of the central dimension reduction subspace, as well as its asymptotic properties. In the proposed WCANCOR method, each observation is weighted based on its Mahalanobis distance to the location of the predictor distribution. Robust estimates of the location and scatter, such as the minimum covariance determinant (MCD) estimator of Rousseeuw [P.J. Rousseeuw, Multivariate estimation with high breakdown point, Mathematical Statistics and Applications B (1985) 283-297], can be used to compute the Mahalanobis distance. To determine the number of significant dimensions in WCANCOR, a weighted permutation test is considered. A comparison of SIR, CANCOR and WCANCOR is also made through simulation studies to show the robustness of WCANCOR to outlying observations. As an example, the Boston housing data is analyzed using the proposed WCANCOR method.

2.

A robust alternative to the ratio estimator under non-normality

Evrim Oral, Ece Oral 《Statistics & probability letters》2011,81(8):930-936

In sampling theory, the traditional ratio estimator is the most common estimator of the population mean when the correlation between the study and auxiliary variables is high and positive. We introduce a new ratio-type estimator based on the order statistics of a simple random sample. We show that this new estimator is considerably more efficient than the traditional ratio estimator under non-normality, and remarkably robust to data anomalies such as the presence of outliers in data sets.
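
For orientation, the traditional ratio estimator mentioned in the abstract can be sketched as follows. This is a minimal illustration of the classical estimator only, not of the authors' new order-statistics estimator; the function name `ratio_estimate`, the parameter `x_pop_mean`, and the toy data are assumptions made for the example.

```python
from statistics import mean


def ratio_estimate(y_sample, x_sample, x_pop_mean):
    """Classical ratio estimator of the population mean of y.

    y_sample, x_sample: paired study and auxiliary observations.
    x_pop_mean: known population mean of the auxiliary variable.
    """
    if len(y_sample) != len(x_sample):
        raise ValueError("y_sample and x_sample must have the same length")
    x_bar = mean(x_sample)
    if x_bar == 0:
        raise ValueError("sample mean of the auxiliary variable is zero")
    # (sample mean of y / sample mean of x) scaled by the known mean of x
    return mean(y_sample) / x_bar * x_pop_mean


# Hypothetical toy data in which y is exactly twice x.
y = [12.0, 15.0, 9.0, 20.0]
x = [6.0, 7.5, 4.5, 10.0]
print(ratio_estimate(y, x, x_pop_mean=8.0))
```

Because the estimator relies on plain sample means, a single extreme value in either sample shifts it directly, which is the sensitivity a robust alternative is meant to reduce.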

3.

James-Stein estimators for time series regression models

Motohiro Senda 《Journal of multivariate analysis》2006,97(9):1984-1996

The least squares (LS) estimator seems the natural estimator of the coefficients of a Gaussian linear regression model. However, if the dimension of the vector of coefficients is greater than 2 and the residuals are independent and identically distributed, this conventional estimator is not admissible. James and Stein [Estimation with quadratic loss, Proceedings of the Fourth Berkeley Symposium vol. 1, 1961, pp. 361-379] proposed a shrinkage estimator (James-Stein estimator) which improves the least squares estimator with respect to the mean squared error loss function. In this paper, we investigate the mean squared error of the James-Stein (JS) estimator for the regression coefficients when the residuals are generated from a Gaussian stationary process. Then, sufficient conditions for the JS to improve the LS are given. It is important to know the influence of the dependence on the JS. Numerical studies also illuminate some interesting features of the improvement. The results have potential applications to economics, engineering, and natural sciences.
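
The shrinkage idea can be sketched in its simplest textbook setting: estimating a p-dimensional normal mean from one observation vector with known, common variance. This is an assumption-laden illustration, not the time series regression estimator studied in the paper; the name `james_stein` and the parameter `sigma2` are chosen for the example.

```python
def james_stein(x, sigma2=1.0):
    """Classical James-Stein shrinkage of a vector x toward the origin.

    Assumes x ~ N(theta, sigma2 * I) with known sigma2 and dimension p > 2.
    """
    p = len(x)
    if p < 3:
        raise ValueError("James-Stein shrinkage needs dimension greater than 2")
    norm2 = sum(v * v for v in x)
    if norm2 == 0.0:
        # Nothing to shrink: the observation is already at the origin.
        return [0.0] * p
    # Shrinkage factor 1 - (p - 2) * sigma2 / ||x||^2
    shrink = 1.0 - (p - 2) * sigma2 / norm2
    return [shrink * v for v in x]


# Hypothetical observation in dimension 4: factor is 1 - 2/25 = 0.92.
print(james_stein([3.0, 4.0, 0.0, 0.0]))
```

The factor can become negative when the squared norm is small; the positive-part variant, which truncates the factor at zero, is a common refinement not shown here.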

4.

Robust and Accurate Inference via a Mixture of Gaussian and Student’s t Errors

Hyungsuk Tak, Justin A. Ellis, Sujit K. Ghosh 《Journal of computational and graphical statistics》2019,28(2):415-426

A Gaussian measurement error assumption, that is, an assumption that the data are observed up to Gaussian noise, can bias any parameter estimation in the presence of outliers. A heavy tailed error assumption based on Student’s t distribution helps reduce the bias. However, it may be less efficient in estimating parameters if the heavy tailed assumption is uniformly applied to all of the data when most of them are normally observed. We propose a mixture error assumption that selectively converts Gaussian errors into Student’s t errors according to latent outlier indicators, leveraging the best of the Gaussian and Student’s t errors; a parameter estimation can be not only robust but also accurate. Using simulated hospital profiling data and astronomical time series of brightness data, we demonstrate the potential for the proposed mixture error assumption to estimate parameters accurately in the presence of outliers. Supplemental materials for this article are available online.

5.

Nonparametric Regression with Singular Design

Zhan-Qian Lu 《Journal of multivariate analysis》1999,70(2):177

Theories of nonparametric regression are usually based on the assumption that the design density exists. However, in some applications such as those involving high-dimensional or chaotic time series data, the design measure may be singular and is likely to have a fractal (nonintegral) dimension. In this paper, the popular Nadaraya–Watson estimator is studied under the general setup that the continuity of the design measure is governed by the local or pointwise dimension. It will be shown in the iid setup that the nonparametric regression estimator achieves a convergence rate which depends only on the pointwise dimension. The case of time series data is also studied. For the latter case, a new mixing condition is introduced, and an assumption of marginal or joint density is completely avoided. Three examples, a fractal regression and two applications for predicting chaotic time series, are used to illustrate the implications of the obtained results.
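
As a reminder of what the Nadaraya–Watson estimator computes, here is a minimal one-dimensional sketch: a kernel-weighted average of the responses near the query point. The Gaussian kernel, the function name `nadaraya_watson`, and the `bandwidth` parameter are assumptions for illustration; the paper's setting (singular design, pointwise dimension) is not modelled here.

```python
import math


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Nadaraya-Watson regression estimate at x0 with a Gaussian kernel."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    # Kernel weight of each design point relative to the query point x0.
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase bandwidth")
    # Weighted average of the responses.
    return sum(w * y for w, y in zip(weights, ys)) / total


# Hypothetical data: two design points placed symmetrically around x0 = 0.
print(nadaraya_watson(0.0, [-1.0, 1.0], [0.0, 2.0], bandwidth=0.5))
```

The estimator uses only the design points and a bandwidth, never a design density, which is why it can still be evaluated when no such density exists.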

6.

A fast algorithm for robust regression with penalised trimmed squares

L. Pitsoulis, G. Zioutas 《Computational Statistics》2010,25(4):663-689

The presence of groups containing high leverage outliers makes linear regression a difficult problem due to the masking effect. The available high breakdown estimators based on Least Trimmed Squares (LTS) often do not succeed in detecting masked high leverage outliers in finite samples. An alternative to the LTS estimator, called the Penalised Trimmed Squares (PTS) estimator, was introduced by the authors in Zioutas and Avramidis (2005) Acta Math Appl Sin 21:323–334 and Zioutas et al. (2007) REVSTAT 5:115–136, and it appears to be less sensitive to the masking problem. This estimator is defined by a Quadratic Mixed Integer Programming (QMIP) problem whose objective function includes a penalty cost for each observation, which serves as an upper bound on the residual error for any feasible regression line. Since the PTS does not require presetting the number of outliers to delete from the data set, it has better efficiency than other estimators. However, due to the high computational complexity of the resulting QMIP problem, computing exact solutions for moderately large regression problems is infeasible. In this paper we further establish the theoretical properties of the PTS estimator, such as high breakdown and efficiency, and propose an approximate algorithm called Fast-PTS to compute the PTS estimator for large data sets efficiently. Extensive computational experiments on sets of benchmark instances with varying degrees of outlier contamination indicate that the proposed algorithm performs well in identifying groups of high leverage outliers in reasonable computational time.
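
To make the baseline concrete, the Least Trimmed Squares criterion that the abstract contrasts with PTS can be sketched for a simple regression line: sum only the h smallest squared residuals. This is a sketch of the standard LTS objective under the assumption of a single predictor, not the PTS objective or the Fast-PTS algorithm; `lts_objective`, `intercept`, `slope`, and `h` are names chosen for the example.

```python
def lts_objective(xs, ys, intercept, slope, h):
    """LTS criterion: sum of the h smallest squared residuals of a line."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if not 1 <= h <= n:
        raise ValueError("h must be between 1 and the sample size")
    squared = sorted((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    # Trimming: the n - h largest squared residuals are ignored.
    return sum(squared[:h])


# Hypothetical data on the line y = x with one gross outlier at x = 3.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 30.0]
print(lts_objective(xs, ys, intercept=0.0, slope=1.0, h=3))
print(lts_objective(xs, ys, intercept=0.0, slope=1.0, h=4))
```

The coverage h has to be fixed in advance, which is the presetting that the PTS formulation is described as avoiding.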

7.

Matrix optimization based Euclidean embedding with outliers

Zhang Qian, Zhao Xinyuan, Ding Chao 《Computational Optimization and Applications》2021,79(2):235-271

Euclidean embedding from noisy observations containing outlier errors is an important and challenging problem in statistics and machine learning. Many existing methods would struggle with outliers due to a lack of detection ability. In this paper, we propose a matrix optimization based embedding model that can produce reliable embeddings and identify the outliers jointly. We show that the estimators obtained by the proposed method satisfy a non-asymptotic risk bound, implying that the model provides a high accuracy estimator with high probability when the order of the sample size is roughly the degree of freedom up to a logarithmic factor. Moreover, we show that under some mild conditions, the proposed model also can identify the outliers without any prior information with high probability. Finally, numerical experiments demonstrate that the matrix optimization-based model can produce configurations of high quality and successfully identify outliers even for large networks.

8.

Conditionally Gaussian random sequences for an integrated variance estimator with correlation between noise and returns

Stefano Peluso, Antonietta Mira, Pietro Muliere 《Applied Stochastic Models in Business and Industry》2019,35(5):1282-1297

Correlation between microstructure noise and latent financial logarithmic returns is an empirically relevant phenomenon with sound theoretical justification. With few notable exceptions, all integrated variance estimators proposed in the financial literature are not designed to explicitly handle such a dependence, or handle it only in special settings. We provide an integrated variance estimator that is robust to correlated noise and returns. For this purpose, a generalization of the forward filtering backward sampling algorithm is proposed, to provide a sampling technique for a latent conditionally Gaussian random sequence. We apply our methodology to intraday Microsoft prices and compare it in a simulation study with established alternatives, showing an advantage in terms of root‐mean‐square error and dispersion.

9.

Adaptive estimation of linear functionals in Hilbert scales from indirect white noise observations

Alexander Goldenshluger, Sergei V. Pereverzev 《Probability Theory and Related Fields》2000,118(2):169-186

We consider adaptive estimation of the value of a linear functional from indirect white noise observations. For a flexible approach, the problem is embedded in an abstract Hilbert scale. We develop an adaptive estimator that is rate optimal within a logarithmic factor simultaneously over a wide collection of balls in the Hilbert scale. It is shown that the proposed estimator has the best possible adaptive properties for a wide range of linear functionals. The case of discretized indirect white noise observations is studied, and the adaptive estimator in this setting is developed. Received: 5 May 1999 / Revised version: 25 October 1999 / Published online: 5 September 2000

10.

The shooting S-estimator for robust regression

Viktoria Öllerer, Andreas Alfons, Christophe Croux 《Computational Statistics》2016,31(3):829-844

To perform multiple regression, the least squares estimator is commonly used. However, this estimator is not robust to outliers. Therefore, robust methods such as S-estimation have been proposed. These estimators flag any observation with a large residual as an outlier and downweight it in the further procedure. However, a large residual may be caused by an outlier in only one single predictor variable, and downweighting the complete observation results in a loss of information. Therefore, we propose the shooting S-estimator, a regression estimator that is especially designed for situations where a large number of observations suffer from contamination in a small number of predictor variables. The shooting S-estimator combines the ideas of the coordinate descent algorithm with simple S-regression, which makes it robust against componentwise contamination, at the cost of failing the regression equivariance property.

11.

Quadratic mixed integer programming and support vectors for deleting outliers in robust regression

G. Zioutas, L. Pitsoulis, A. Avramidis 《Annals of Operations Research》2009,166(1):339-353

We consider the problem of deleting bad influential observations (outliers) in linear regression models. The problem is formulated as a Quadratic Mixed Integer Programming (QMIP) problem, where penalty costs for discarding outliers are included in the objective function. The optimum solution defines a robust regression estimator called penalized trimmed squares (PTS). Due to the high computational complexity of the resulting QMIP problem, the proposed robust procedure is computationally suitable only for small samples. The computational performance and the effectiveness of the new procedure are improved significantly by using the idea of the ε-Insensitive loss function from support vector machine regression. Small errors are ignored, and the mathematical formula gains the sparseness property. The good performance of the ε-Insensitive PTS (IPTS) estimator allows identification of multiple outliers while avoiding masking or swamping effects. The computational effectiveness and successful outlier detection of the proposed method are demonstrated via simulated experiments. This research has been partially funded by the Greek Ministry of Education under the program Pythagoras II.

12.

General rank-based estimation for regression single index models

Huybrechts F. Bindele, Ash Abebe, Karlene N. Meyer 《Annals of the Institute of Statistical Mathematics》2018,70(5):1115-1146

This study considers rank estimation of the regression coefficients of the single index regression model. Conditions needed for the consistency and asymptotic normality of the proposed estimator are established. Monte Carlo simulation experiments demonstrate the robustness and efficiency of the proposed estimator compared to the semiparametric least squares estimator. A real-life example illustrates that the rank regression procedure effectively corrects model nonlinearity even in the presence of outliers in the response space.

13.

Detection of outliers in weighted least squares regression

Bang Yong Sohn, Guk Boh Kim 《Journal of Applied Mathematics and Computing》1997,4(2):441-452

In the multiple linear regression model, we presuppose assumptions (independence, normality, variance homogeneity and so on) on the error term. When case weights are given because of variance heterogeneity, we can estimate the regression parameters efficiently using the weighted least squares estimator. Unfortunately, this estimator is sensitive to outliers, like the ordinary least squares estimator. Thus, in this paper, we propose some statistics for the detection of outliers in weighted least squares regression.

14.

Asymptotic normality of nonparametric regression function estimators

Hu Shuhe (胡舒合) 《Acta Mathematica Sinica (数学学报)》2002,45(3):433-442

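This paper studies the Nadaraya-Watson estimator of a nonparametric regression function for independent or dependent samples. Under simple and reasonable conditions, the asymptotic normality of the estimator is proved. The results obtained can be applied in time series analysis.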

15.

Robust regression for mixed Poisson–Gaussian model

Marie Kubínová, James G. Nagy 《Numerical Algorithms》2018,79(3):825-851

This paper focuses on efficient computational approaches to compute approximate solutions of a linear inverse problem that is contaminated with mixed Poisson–Gaussian noise, and when there are additional outliers in the measured data. The Poisson–Gaussian noise leads to a weighted minimization problem, with solution-dependent weights. To address outliers, the standard least squares fit-to-data metric is replaced by the Talwar robust regression function. Convexity, regularization parameter selection schemes, and incorporation of non-negative constraints are investigated. A projected Newton algorithm is used to solve the resulting constrained optimization problem, and a preconditioner is proposed to accelerate conjugate gradient Hessian solves. Numerical experiments on problems from image deblurring illustrate the effectiveness of the methods.
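
The Talwar function replaces the quadratic loss by one that is quadratic for small residuals and constant beyond a cutoff, so large residuals stop influencing the fit. The sketch below uses the common textbook form with a cutoff `beta`; the exact scaling and cutoff used in the paper are not stated in the abstract, so the constants and the names `talwar_rho` and `talwar_weight` are assumptions.

```python
def talwar_rho(r, beta):
    """Talwar loss: r^2 / 2 inside the cutoff, constant beta^2 / 2 outside."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    return 0.5 * r * r if abs(r) <= beta else 0.5 * beta * beta


def talwar_weight(r, beta):
    """Matching reweighting factor: 1 inside the cutoff, 0 outside."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    return 1.0 if abs(r) <= beta else 0.0


# Hypothetical residuals with cutoff 2: the last one is treated as an outlier.
for r in (0.5, -1.0, 5.0):
    print(r, talwar_rho(r, 2.0), talwar_weight(r, 2.0))
```

Because the loss is flat beyond the cutoff, the resulting objective is not convex in general, which is consistent with the abstract listing convexity among the questions investigated.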

16.

Spectral density estimation with amplitude modulation and outlier detection

Jiancheng Jiang, Y. V. Hui 《Annals of the Institute of Statistical Mathematics》2004,56(4):611-630

This paper studies spectral density estimation based on amplitude modulation, including missing data as a specific case. A generalized periodogram is introduced and smoothed by a local linear regression smoother to give a consistent estimator of the spectral density. We explore the asymptotic properties of the proposed estimator and its application to time series data with periodically missing values. A simple data-driven local bandwidth selection rule is proposed and an algorithm for computing the spectral density estimate is presented. The effectiveness of the proposed method is demonstrated using simulations. The application to outlier detection based on a leave-one-out diagnostic is also considered. An illustrative example shows that the proposed diagnostic procedure succeeds in revealing outliers in time series without masking and smearing effects. Supported by Chinese NSF Grants 10001004 and 39930160, and Fellowship of City University of Hong Kong.

17.

Estimation of the stochastic leverage effect using the Fourier transform method

Imma Valentina Curato 《Stochastic Processes and their Applications》2019,129(9):3207-3238

We define a non-parametric estimator of the integrated leverage effect as the integrated covariation between the logarithmic asset price and its volatility. In Curato and Sanfelici (2015), a consistent estimator of the leverage effect has been introduced through a pre-estimate of the Fourier coefficients of the volatility. This is a novel approach compared to the ones present in the literature which use a pre-estimate of the spot volatility path. In this paper, we show the asymptotic normality of the Fourier estimator for non-equidistant observations. Moreover, its finite sample properties are analyzed in a simulation study also in the presence of microstructure noise.

18.

Robust kernel-based regression with bounded influence for outliers

Sangheum Hwang, Dohyun Kim, Myong K Jeong, Bong-Jin Yum 《The Journal of the Operational Research Society》2015,66(8):1385-1398

The kernel-based regression (KBR) method, such as support vector machine for regression (SVR), is a well-established methodology for estimating the nonlinear functional relationship between the response variable and predictor variables. KBR methods can be very sensitive to influential observations that in turn have a noticeable impact on the model coefficients. The robustness of KBR methods has recently been the subject of wide-scale investigations with the aim of obtaining a regression estimator insensitive to outlying observations. However, existing robust KBR (RKBR) methods only consider Y-space outliers and, consequently, are sensitive to X-space outliers. As a result, even a single anomalous outlying observation in X-space may greatly affect the estimator. In order to resolve this issue, we propose a new RKBR method that gives reliable results even if a training data set is contaminated with both Y-space and X-space outliers. The proposed method utilizes a weighting scheme based on the hat matrix that resembles the generalized M-estimator (GM-estimator) of conventional robust linear analysis. The diagonal elements of the hat matrix in kernel-induced feature space are used as leverage measures to downweight the effects of potential X-space outliers. We show that the kernelized hat diagonal elements can be obtained via eigen decomposition of the kernel matrix. The regularized version of kernelized hat diagonal elements is also proposed to deal with the case of the kernel matrix having full rank, where the kernelized hat diagonal elements are not suitable for leverage. We have shown that two kernelized leverage measures, namely, the kernel hat diagonal element and the regularized one, are related to statistical distance measures in the feature space. We also develop an efficiently kernelized training algorithm for the parameter estimation based on the iteratively reweighted least squares (IRLS) method. The experimental results from simulated examples and real data sets demonstrate the robustness of our proposed method compared with conventional approaches.

19.

Multiple imputation and other resampling schemes for imputing missing observations

Muni S. Srivastava, Mohammad Dolatabadi 《Journal of multivariate analysis》2009,100(9):1919-1937

The problem of imputing missing observations under the linear regression model is considered. It is assumed that observations are missing at random and all the observations on the auxiliary or independent variables are available. Estimates of the regression parameters based on singly and multiply imputed values are given. Jackknife as well as bootstrap estimates of the variance of the singly imputed estimator of the regression parameters are given. These estimators are shown to be consistent estimators. The asymptotic distributions of the imputed estimators are also given to obtain interval estimates of the parameters of interest. These interval estimates are then compared with the interval estimates obtained from multiple imputation. It is shown that singly imputed estimators perform at least as well as multiply imputed estimators. A new nonparametric multiply imputed estimator is proposed and shown to perform as well as a multiply imputed estimator under normality. The singly imputed estimator, however, still remains at least as good as a multiply imputed estimator.

20.

Volatility inference in the presence of both endogenous time and microstructure noise

Yingying Li, Zhiyuan Zhang, Xinghua Zheng 《Stochastic Processes and their Applications》2013

In this article we consider the volatility inference in the presence of both market microstructure noise and endogenous time. Estimators of the integrated volatility in such a setting are proposed, and their asymptotic properties are studied. Our proposed estimator is compared with the existing popular volatility estimators via numerical studies. The results show that our estimator can have substantially better performance when time endogeneity exists.