Similar Articles
Found 10 similar articles (search time: 125 ms)
1.
The kernel-based regression (KBR) method, such as the support vector machine for regression (SVR), is a well-established methodology for estimating the nonlinear functional relationship between the response variable and predictor variables. KBR methods can be very sensitive to influential observations, which in turn have a noticeable impact on the model coefficients. The robustness of KBR methods has recently been the subject of wide-scale investigation with the aim of obtaining a regression estimator insensitive to outlying observations. However, existing robust KBR (RKBR) methods only consider Y-space outliers and, consequently, are sensitive to X-space outliers. As a result, even a single anomalous outlying observation in X-space may greatly affect the estimator. To resolve this issue, we propose a new RKBR method that gives reliable results even if the training data set is contaminated with both Y-space and X-space outliers. The proposed method utilizes a weighting scheme based on the hat matrix that resembles the generalized M-estimator (GM-estimator) of conventional robust linear analysis. The diagonal elements of the hat matrix in the kernel-induced feature space are used as leverage measures to downweight the effects of potential X-space outliers. We show that the kernelized hat diagonal elements can be obtained via an eigendecomposition of the kernel matrix. A regularized version of the kernelized hat diagonal elements is also proposed to handle the case where the kernel matrix has full rank and the plain kernelized hat diagonal elements are not suitable as leverage measures. We show that the two kernelized leverage measures, namely the kernel hat diagonal element and its regularized counterpart, are related to statistical distance measures in the feature space. We also develop an efficient kernelized training algorithm for parameter estimation based on the iteratively reweighted least squares (IRLS) method.
The experimental results from simulated examples and real data sets demonstrate the robustness of our proposed method compared with conventional approaches.
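As a concrete illustration, the regularized kernelized hat diagonals described above can be computed from an eigendecomposition of the kernel matrix. The sketch below (Python/NumPy) uses an RBF kernel, a ridge parameter `lam`, and a median-based downweighting rule, all of which are illustrative assumptions rather than the paper's exact choices:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Pairwise squared distances -> Gaussian (RBF) kernel matrix
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_hat_diagonals(K, lam=1e-2):
    """Regularized hat diagonals h_ii of H = K (K + lam I)^{-1},
    computed via the eigendecomposition of the symmetric kernel matrix."""
    d, V = np.linalg.eigh(K)
    shrink = d / (d + lam)             # eigenvalue shrinkage factors in [0, 1)
    return (V**2) @ shrink             # h_ii = sum_j V_ij^2 * d_j / (d_j + lam)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
X[0] = [8.0, 8.0]                      # plant an X-space outlier (high leverage)
K = rbf_kernel(X, gamma=0.5)
h = kernel_hat_diagonals(K, lam=1e-2)
w = np.minimum(1.0, np.median(h) / h)  # GM-style downweighting (illustrative)
```

The planted X-space outlier receives the largest hat diagonal and is therefore downweighted, which is the behavior a GM-style weighting scheme relies on.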

2.
In this paper, we consider robust generalized estimating equations for the analysis of semiparametric generalized partial linear mixed models (GPLMMs) for longitudinal data. We approximate the non-parametric function in the GPLMM by a regression spline, and make use of bounded scores and leverage-based weights in the estimating equation to achieve robustness against outliers and influential data points, respectively. Under some regularity conditions, the asymptotic properties of the robust estimators are investigated. To avoid the computational problems involving high-dimensional integrals in our estimators, we adopt a robust Monte Carlo Newton-Raphson (RMCNR) algorithm for fitting GPLMMs. Small simulations are carried out to study the behavior of the robust estimates in the presence of outliers, and these estimates are also compared to their corresponding non-robust estimates. The proposed robust method is illustrated in the analysis of two real data sets.
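A minimal sketch of the two robustness ingredients the abstract names: a bounded (Huber-type) score against Y-space outliers and hat-matrix leverage weights against influential design points. The Huber constant 1.345, the leverage cutoff 2p/n, and evaluation at the true coefficients are illustrative assumptions, not the paper's specification:

```python
import numpy as np

def huber_psi(r, c=1.345):
    """Bounded score function: identity near zero, clipped beyond c."""
    return np.clip(r, -c, c)

def leverage_weights(X):
    """Downweighting based on hat-matrix leverages of the design matrix
    (an illustrative stand-in for the paper's leverage-based weights)."""
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)           # leverages h_ii of H = X (X'X)^{-1} X'
    p = X.shape[1]
    return np.minimum(1.0, (2 * p / len(h)) / h)

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
beta = np.array([1.0, 2.0])
y = X @ beta + rng.normal(size=50)
y[0] += 20.0                           # a Y-space outlier
r = y - X @ beta                       # residuals at the true coefficients
w = leverage_weights(X)
score = X.T @ (w * huber_psi(r))       # robustified estimating function
```

The outlier's contribution to the estimating function is capped at the Huber bound, so a single gross error cannot dominate the fit.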

3.
We propose a unified strategy for estimator construction, selection, and performance assessment in the presence of censoring. This approach is entirely driven by the choice of a loss function for the full (uncensored) data structure and can be stated in terms of the following three main steps. (1) First, define the parameter of interest as the minimizer of the expected loss, or risk, for a full data loss function chosen to represent the desired measure of performance. Map the full data loss function into an observed (censored) data loss function having the same expected value and leading to an efficient estimator of this risk. (2) Next, construct candidate estimators based on the loss function for the observed data. (3) Then, apply cross-validation to estimate risk based on the observed data loss function and to select an optimal estimator among the candidates. A number of common estimation procedures follow this approach in the full data situation, but depart from it when faced with the obstacle of evaluating the loss function for censored observations. Here, we argue that one can, and should, also adhere to this estimation road map in censored data situations. Tree-based methods, where the candidate estimators in Step 2 are generated by recursive binary partitioning of a suitably defined covariate space, provide a striking example of the chasm between estimation procedures for full data and censored data (e.g., regression trees as in CART for uncensored data and adaptations to censored data). Common approaches for regression trees bypass the risk estimation problem for censored outcomes by altering the node splitting and tree pruning criteria in manners that are specific to right-censored data. This article describes an application of our unified methodology to tree-based estimation with censored data.
The approach encompasses univariate outcome prediction, multivariate outcome prediction, and density estimation, simply by defining a suitable loss function for each of these problems. The proposed method for tree-based estimation with censoring is evaluated using a simulation study and the analysis of CGH copy number and survival data from breast cancer patients.
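Step (1) of the road map, mapping a full-data loss to an observed-data loss with the same expected value, is commonly done by inverse-probability-of-censoring weighting (IPCW). A hedged sketch, with squared-error loss on log survival time as an assumed example:

```python
import numpy as np

def ipcw_risk(t_obs, delta, pred, G_hat):
    """IPCW estimate of the full-data squared-error risk E[(log T - pred)^2]:
    uncensored observations (delta = 1) are reweighted by 1/G(T), where G is
    the survival function of the censoring time."""
    w = delta / np.maximum(G_hat(t_obs), 1e-8)
    return np.mean(w * (np.log(t_obs) - pred)**2)

rng = np.random.default_rng(2)
t = rng.exponential(scale=2.0, size=200)
pred = np.full(200, np.mean(np.log(t)))
# Sanity check: with no censoring and G = 1, the IPCW risk reduces to the
# ordinary empirical risk, as the same-expected-value property requires.
risk_full = ipcw_risk(t, np.ones(200), pred, lambda s: np.ones_like(s))
```

In practice `G_hat` would be estimated (e.g., by a Kaplan-Meier fit on the censoring times), and the cross-validated risk in Step (3) would average this loss over held-out folds.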

4.
In some applications of kernel density estimation the data may have a highly non-uniform distribution and be confined to a compact region. Standard fixed bandwidth density estimates can struggle to cope with the spatially variable smoothing requirements, and will be subject to excessive bias at the boundary of the region. While adaptive kernel estimators can address the first of these issues, the study of boundary kernel methods has been restricted to the fixed bandwidth context. We propose a new linear boundary kernel which reduces the asymptotic order of the bias of an adaptive density estimator at the boundary, and is simple to implement even on an irregular boundary. The properties of this adaptive boundary kernel are examined theoretically. In particular, we demonstrate that the asymptotic performance of the density estimator is maintained when the adaptive bandwidth is defined in terms of a pilot estimate rather than the true underlying density. We examine the performance for finite sample sizes numerically through analysis of simulated and real data sets.
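For intuition, the boundary bias problem can be reproduced in a few lines. The sketch below uses the simple reflection correction rather than the paper's linear boundary kernel (and a fixed rather than adaptive bandwidth), so it only illustrates the phenomenon being addressed:

```python
import numpy as np

def kde(x_grid, data, h):
    """Plain fixed-bandwidth Gaussian kernel density estimate."""
    u = (x_grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def kde_reflect(x_grid, data, h):
    """Reflection boundary correction at 0 for densities supported on
    [0, inf); a simpler alternative to the paper's linear boundary kernel."""
    return kde(x_grid, data, h) + kde(x_grid, -data, h)

rng = np.random.default_rng(3)
data = rng.exponential(size=2000)       # true density has f(0) = 1
grid = np.array([0.0])
h = 0.2
f_plain = kde(grid, data, h)[0]         # biased: mass leaks past the boundary
f_refl = kde_reflect(grid, data, h)[0]  # boundary-corrected
```

At the boundary the uncorrected estimate loses roughly half the density mass, while the reflected estimate is much closer to the true value of 1.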

5.
In this paper we derive rates of uniform strong convergence for the kernel estimator of the regression function in a left-truncation model. It is assumed that the lifetime observations with multivariate covariates form a stationary α-mixing sequence. The estimation of the covariate’s density is considered as well. Under the assumption that the lifetime observations are bounded, we show that, by an appropriate choice of the bandwidth, both estimators of the covariate’s density and regression function attain the optimal strong convergence rate known from independent complete samples.
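The base estimator whose convergence rate is at issue is the Nadaraya-Watson kernel regression estimator. The sketch below shows it on complete, independent data, deliberately ignoring the truncation and α-mixing structure that the paper actually handles:

```python
import numpy as np

def nadaraya_watson(x0, X, Y, h):
    """Nadaraya-Watson kernel regression estimate of m(x0) = E[Y | X = x0]."""
    w = np.exp(-0.5 * ((x0 - X) / h)**2)
    return np.sum(w * Y) / np.sum(w)

rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=1000)
Y = np.sin(np.pi * X) + 0.1 * rng.normal(size=1000)
m_hat = nadaraya_watson(0.5, X, Y, h=0.1)   # true m(0.5) = sin(pi/2) = 1
```

Under left truncation the kernel weights would additionally be adjusted by an estimate of the truncation distribution, which is where the paper's analysis departs from this complete-data sketch.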

6.
The estimation of a regression function by kernel methods for longitudinal or functional data is considered. In longitudinal data analysis, a random function typically represents a subject that is observed at a small number of time points, while in functional data studies the random realization is usually measured on a dense grid. Essentially the same methods, however, can be applied to both sampling plans, as well as to a number of settings lying between them. In this paper general results are derived for the asymptotic distributions of real-valued functions whose arguments are functionals formed by weighted averages of longitudinal or functional data. Asymptotic distributions for the estimators of the mean and covariance functions obtained from noisy observations with within-subject correlation are studied. These asymptotic normality results are comparable to the standard rates obtained from independent data, as illustrated in a simulation study. In addition, this paper discusses the conditions on the sampling plans that are required for the validity of local properties of kernel-based estimators for longitudinal or functional data.
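The weighted averages in question can be illustrated by a local-constant estimate of the mean function that pools sparse longitudinal observations across subjects. The design (200 subjects, 4 points each), the bandwidth, and the random-shift correlation model are illustrative assumptions:

```python
import numpy as np

def mean_function(t0, times, values, h):
    """Local-constant (kernel-weighted average) estimate of the mean
    function mu(t0), pooling observations across all subjects."""
    w = np.exp(-0.5 * ((t0 - times) / h)**2)
    return np.sum(w * values) / np.sum(w)

rng = np.random.default_rng(5)
times, values = [], []
for _ in range(200):                   # 200 subjects, 4 points each (sparse)
    t = rng.uniform(0, 1, size=4)
    b = rng.normal(scale=0.3)          # subject-level shift: within-subject correlation
    times.append(t)
    values.append(np.cos(2 * np.pi * t) + b + 0.1 * rng.normal(size=4))
times, values = np.concatenate(times), np.concatenate(values)
mu_hat = mean_function(0.5, times, values, h=0.05)   # true mu(0.5) = -1
```

Because the subject-level shifts are shared within subjects, the pooled observations are correlated; the abstract's point is that the estimator's asymptotic normality nevertheless matches the standard independent-data rates under suitable sampling-plan conditions.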

7.
The usual Bayes-Stein shrinkages of maximum likelihood estimates towards a common value may be refined by taking fuller account of the locations of the individual observations. Under a Bayesian formulation, the types of shrinkage depend critically upon the nature of the common distribution assumed for the parameters at the second stage of the prior model. In the present paper this distribution is estimated empirically from the data, permitting the data to determine the nature of the shrinkages. For example, when the observations are located in two or more clearly distinct groups, the maximum likelihood estimates are, roughly speaking, constrained towards common values within each group. The method also detects outliers: an extreme observation will either be regarded as an outlier and not substantially adjusted towards the other observations, or its outlier status will be rejected, in which case a more radical adjustment takes place. The method is appropriate for a wide range of sampling distributions and may also be viewed as an alternative to standard multiple comparisons, cluster analysis, and nonparametric kernel methods.
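For contrast with the empirically estimated second-stage distribution described above, the classical parametric normal-normal shrinkage can be sketched as follows; the method-of-moments prior variance and the single common shrinkage factor are exactly the simplifications the paper moves beyond:

```python
import numpy as np

def eb_shrink(x, sigma2):
    """Parametric empirical Bayes (normal-normal) shrinkage toward the grand
    mean, with the prior variance estimated by method of moments.  Unlike the
    paper's approach, every estimate is shrunk by the same factor, so clusters
    and outliers are not treated adaptively."""
    m = x.mean()
    tau2 = max(x.var(ddof=1) - sigma2, 0.0)   # estimated prior variance
    B = sigma2 / (sigma2 + tau2)              # common shrinkage factor
    return m + (1.0 - B) * (x - m), B

rng = np.random.default_rng(6)
theta = rng.normal(scale=2.0, size=50)        # true parameters
x = theta + rng.normal(size=50)               # MLEs with sampling variance 1
post, B = eb_shrink(x, sigma2=1.0)
```

With an empirically estimated prior, the amount of shrinkage would instead vary by observation, pulling each estimate toward its own cluster and leaving clear outliers nearly untouched.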

8.
We study a test statistic based on the integrated squared difference between a kernel estimator of the copula density and a kernel-smoothed estimator of the parametric copula density. We show for fixed smoothing parameters that the test is consistent and that the asymptotic properties are driven by a U-statistic of order 4 with degeneracy of order 1. For practical implementation we suggest computing the critical values through a semiparametric bootstrap. Monte Carlo results show that the bootstrap procedure performs well in small samples. In particular, size and power are less sensitive to the smoothing parameter choice than they are under the asymptotic approximation obtained for a vanishing bandwidth.
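The test construction and its bootstrap calibration can be mimicked in one dimension (testing normality instead of a copula family), which keeps the sketch short; the statistic, bandwidth, grid, and bootstrap size below are illustrative assumptions:

```python
import numpy as np

def kde_grid(grid, data, h):
    u = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def isd_stat(data, grid, h):
    """Integrated squared difference between the kernel density estimate and
    the kernel-SMOOTHED fitted N(mu, s^2) density: convolving the model with
    the same kernel gives N(mu, s^2 + h^2), matching the smoothing bias."""
    mu, s = data.mean(), data.std(ddof=1)
    f_hat = kde_grid(grid, data, h)
    v = s**2 + h**2
    f_mod = np.exp(-0.5 * (grid - mu)**2 / v) / np.sqrt(2 * np.pi * v)
    return np.sum((f_hat - f_mod)**2) * (grid[1] - grid[0])

def bootstrap_pvalue(data, h=0.3, n_boot=200, seed=7):
    """Parametric-bootstrap p-value: resample from the fitted model and
    compare the resampled statistics with the observed one."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(data.min() - 1.0, data.max() + 1.0, 200)
    mu, s = data.mean(), data.std(ddof=1)
    t_obs = isd_stat(data, grid, h)
    t_boot = np.array([isd_stat(mu + s * rng.standard_normal(len(data)), grid, h)
                       for _ in range(n_boot)])
    return np.mean(t_boot >= t_obs)

rng = np.random.default_rng(8)
p_null = bootstrap_pvalue(rng.standard_normal(100))   # H0 (normality) true
p_alt = bootstrap_pvalue(rng.exponential(size=100))   # H0 false
```

In the copula setting the resampling would be semiparametric (pseudo-observations from ranks, resampled from the fitted copula), but the logic of comparing the observed statistic against its bootstrap distribution is the same.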

9.
Parallel to Cox's [JRSS B34 (1972) 187-230] proportional hazards model, generalized logistic models have been discussed by Anderson [Bull. Int. Statist. Inst. 48 (1979) 35-53] and others. The essential assumption is that the ratio of the two densities has a known parametric form. A nice property of this model is that it naturally relates to the logistic regression model for categorical data. In astronomical, demographic, epidemiological, and other studies the variable of interest is often truncated by an associated variable. This paper studies generalized logistic models for the two-sample truncated data problem, where the ratio of the two lifetime densities is assumed to have the form exp{α+φ(x;β)}. Here φ is a known function of x and β, and the baseline density is unspecified. We develop a semiparametric maximum likelihood method for the case where the two samples have a common truncation distribution. It is shown that inferences for β do not depend on the nonparametric components. We also derive an iterative algorithm to maximize the semiparametric likelihood for the general case where different truncation distributions are allowed. We further discuss how to check the goodness of fit of the generalized logistic model. The developed methods are illustrated and evaluated using both simulated and real data.
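The link between the density-ratio form exp{α+φ(x;β)} and logistic regression can be made concrete for the simple tilt φ(x;β)=βx with two untruncated samples; the gradient-ascent fit and the Gaussian example below are illustrative assumptions:

```python
import numpy as np

def fit_density_ratio(x0, x1, iters=500, lr=1.0):
    """Estimate (alpha*, beta) in the density-ratio model
    f1(x)/f0(x) = exp(alpha + beta * x) by pooling the two samples and
    fitting logistic regression of the sample label on x (gradient ascent).
    alpha* absorbs log(n1/n0); beta is the parameter of interest."""
    x = np.concatenate([x0, x1])
    y = np.concatenate([np.zeros(len(x0)), np.ones(len(x1))])
    Z = np.column_stack([np.ones_like(x), x])
    w = np.zeros(2)                              # (alpha*, beta)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Z @ w))
        w += lr * Z.T @ (y - p) / len(y)         # average log-likelihood gradient
    return w

rng = np.random.default_rng(9)
x0 = rng.normal(0.0, 1.0, size=2000)             # f0 = N(0, 1)
x1 = rng.normal(1.0, 1.0, size=2000)             # f1 = N(1, 1): true beta = 1
w = fit_density_ratio(x0, x1)                    # true alpha = -0.5 here
```

Two equal-variance normals satisfy the model exactly with β = 1 and α = -0.5, which is why the prospective logistic fit recovers the density-ratio parameters; the paper's contribution is extending this connection to truncated samples.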

10.
In this paper, we establish an inequality for the characteristic functions of strongly mixing random vectors, from which an upper bound is obtained for the supremum of the absolute difference between two multivariate probability density functions based on strongly mixing random vectors. As an application, we consider the consistency and asymptotic normality of a kernel estimate of a density function under strong mixing. Our results generalize some known results in the literature.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号