1.

Robust Model Averaging Method Based on LOF Algorithm

Fan Wang Kang You & Guohua Zou 《Communications in Mathematical Research》2023,39(3):386-413

Model averaging is a good alternative to model selection: it can deal with the uncertainty arising from the model selection process and make full use of the information from various candidate models. However, most existing model averaging criteria do not consider the influence of outliers on the estimation procedure. The purpose of this paper is to develop a robust model averaging approach based on the local outlier factor (LOF) algorithm, which can downweight outliers in the covariates. Asymptotic optimality of the proposed robust model averaging estimator is derived under some regularity conditions. Further, we prove that the LOF-based weight estimator is consistent for the theoretically optimal weight vector. Numerical studies, including Monte Carlo simulations and a real data example, are provided to illustrate the proposed methodology.

2.

Frequentist model averaging for linear mixed-effects models

Xinjie CHEN Guohua ZOU Xinyu ZHANG 《Frontiers of Mathematics in China》2013,8(3):497-515

Linear mixed-effects models are a powerful tool for the analysis of longitudinal data. The aim of this paper is to study model averaging for linear mixed-effects models. The asymptotic distribution of the frequentist model average estimator is derived, and a confidence interval procedure with an actual coverage probability that tends to the nominal level in large samples is developed. The two confidence intervals based on the model averaging and based on the full model are shown to be asymptotically equivalent. A simulation study shows good finite sample performance of the model average estimators.

3.

Choice of weights in FMA estimators under general parametric models

ZHANG XinYu ZOU GuoHua LIANG Hua 《Science China Mathematics》2013,56(3):443-457

The choice of weights in frequentist model average estimators is an important but difficult problem. Liang et al. (2011) suggested a criterion for the choice of weights under a general parametric framework, which is termed the generalized OPT (GOPT) criterion in the present paper. However, no properties or applications of the criterion have been studied. This paper is devoted to the further investigation of the GOPT criterion. We show how to use this criterion to compare some existing weights, such as the smoothed AIC-based and BIC-based weights, and to choose between model averaging and model selection. Its connection to the Mallows and ordinary OPT criteria is established. The asymptotic optimality of the criterion in the case of non-random weights is also obtained. Finite sample performance of the GOPT criterion is assessed by simulations. Application to the analysis of two real data sets is presented as well.
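The smoothed AIC-based and BIC-based weights mentioned in this abstract are commonly computed as weights proportional to exp(-IC/2), normalised over the candidate models. The abstract itself does not state the formula, so the Python sketch below assumes that standard form; the function name `smoothed_ic_weights` and the AIC values are hypothetical and serve only as an illustration. It does not implement the GOPT criterion.

```python
import numpy as np

def smoothed_ic_weights(ic_values):
    """Smoothed information-criterion weights: w_m proportional to exp(-IC_m / 2).

    Assumed standard form; pass AIC values for smoothed-AIC weights
    or BIC values for smoothed-BIC weights.
    """
    ic = np.asarray(ic_values, dtype=float)
    # Subtract the minimum before exponentiating for numerical stability;
    # the shift cancels in the normalisation below.
    delta = ic - ic.min()
    raw = np.exp(-0.5 * delta)
    return raw / raw.sum()

# Hypothetical AIC values for three candidate models (illustrative only).
aic = [100.0, 102.0, 110.0]
weights = smoothed_ic_weights(aic)
print(weights)
```

Model selection corresponds to putting all weight on the model with the smallest criterion value, whereas these smoothed weights spread it across models according to their criterion differences.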

4.

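Model average estimation for linear measurement error models

Wang Haiying Zou Guohua 《Journal of Systems Science and Mathematical Sciences》2012,32(1):1-14

Frequentist model average estimation has received considerable attention in recent years, but no study has yet addressed observational data with measurement errors. This paper considers model average estimation for linear measurement error models. It derives the asymptotic distribution of the model average estimator, constructs, following the idea of Hjort and Claeskens (2003), a confidence interval whose probability of covering the true parameter tends to the nominal level, and proves that this interval is asymptotically equivalent to the confidence interval based on the normal approximation under the full model. Simulation results show that when the covariates are measured with error, the model average estimator can markedly improve the efficiency of point estimation.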

5.

Robust Simulation-Based Estimation of ARMA Models

《Journal of computational and graphical statistics》2013,22(2):370-387

This article proposes a new approach to the robust estimation of a mixed autoregressive and moving average (ARMA) model. It is based on the indirect inference method originally proposed for models with an intractable likelihood function. The proposed estimation algorithm is based on an auxiliary autoregressive representation whose parameters are first estimated on the observed time series and then on data simulated from the ARMA model. To simulate data, the parameters of the ARMA model have to be set. By varying these we can minimize a distance between the simulation-based and the observation-based auxiliary estimates. The argument of the minimum then yields an estimator for the parameterization of the ARMA model. This simulation-based estimation procedure inherits the properties of the auxiliary model estimator. For instance, robustness is achieved with GM estimators. An essential feature of the introduced estimator, compared to existing robust estimators for ARMA models, is its theoretical tractability, which allows us to show consistency and asymptotic normality. Moreover, it is possible to characterize the influence function and the breakdown point of the estimator. In a small sample Monte Carlo study it is found that the new estimator performs fairly well when compared with existing procedures. Furthermore, with two real examples, we also compare the proposed inferential method with two different approaches based on outlier detection.
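The indirect inference loop described in this abstract can be sketched in Python for the simplest case. This is only an illustration under stated assumptions, not the article's procedure: the model is restricted to an MA(1) process, the auxiliary AR(3) model is fitted by ordinary least squares rather than a robust GM estimator (so the sketch is not robust), and the minimisation is a plain grid search. All function names are hypothetical.

```python
import numpy as np

def simulate_ma1(theta, shocks):
    """Simulate x_t = e_t + theta * e_{t-1} from a fixed shock sequence."""
    return shocks[1:] + theta * shocks[:-1]

def fit_ar(x, order=3):
    """Least-squares AR(order) coefficients: the auxiliary estimator.

    Assumption: plain OLS stands in for the robust GM estimator.
    """
    n = len(x)
    # Column k holds the lag-(k+1) values aligned with the targets x[order:].
    lags = np.column_stack([x[order - k - 1 : n - k - 1] for k in range(order)])
    coef, *_ = np.linalg.lstsq(lags, x[order:], rcond=None)
    return coef

def indirect_inference_ma1(x_obs, grid, sim_factor=5, order=3, seed=0):
    """Pick the theta whose simulated auxiliary estimate is closest to the observed one."""
    rng = np.random.default_rng(seed)
    beta_obs = fit_ar(x_obs, order)
    # Common random numbers: the same shocks are reused for every candidate
    # theta so that the distance varies smoothly over the grid.
    shocks = rng.standard_normal(sim_factor * len(x_obs) + 1)
    distances = [
        np.sum((fit_ar(simulate_ma1(theta, shocks), order) - beta_obs) ** 2)
        for theta in grid
    ]
    return float(grid[int(np.argmin(distances))])

# Illustrative data: an MA(1) series with a known parameter.
true_theta = 0.5
data_rng = np.random.default_rng(42)
x_obs = simulate_ma1(true_theta, data_rng.standard_normal(2001))

# Candidate values restricted to the invertible region |theta| < 1.
grid = np.linspace(-0.9, 0.9, 91)
theta_hat = indirect_inference_ma1(x_obs, grid)
print(theta_hat)
```

Swapping `fit_ar` for a robust auxiliary estimator is where, according to the abstract, the robustness of the overall procedure would come from.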

6.

Frequentist model averaging for threshold models

Gao Yan Zhang Xinyu Wang Shouyang Chong Terence Tai-leung Zou Guohua 《Annals of the Institute of Statistical Mathematics》2019,71(2):275-306

This paper develops a frequentist model averaging approach for threshold model specifications. The resulting estimator is proved to be...

7.

Conditional likelihood estimation and efficiency comparisons in proportional odds model with missing covariates

S. H. Hsieh S. M. Lee P. S. Shen M. F. Liu 《Annals of the Institute of Statistical Mathematics》2011,63(5):887-921

In this article, a conditional likelihood approach is developed for dealing with ordinal data with missing covariates in the proportional odds model. Based on the validation data set, we propose the Breslow and Cain (Biometrika 75:11–20, 1988) type estimators using different estimates of the selection probabilities, which may be treated as nuisance parameters. Under the assumption that the observed covariates and surrogate variables are categorical, we present large sample theory for the proposed estimators and show that they are more efficient than the estimator using the true selection probabilities. Simulation results support the theoretical analysis. We also illustrate the approaches using data from a survey of cable TV satisfaction.

8.

Semiparametric estimation in regression with missing covariates using single-index models

Sun Zhuoer Wang Suojin 《Annals of the Institute of Statistical Mathematics》2019,71(5):1201-1232

We investigate semiparametric estimation of regression coefficients through generalized estimating equations with single-index models when some covariates are missing at random. Existing popular semiparametric estimators may run into difficulties when some selection probabilities are small or the dimension of the covariates is not low. We propose a new simple parameter estimator using a kernel-assisted estimator for the augmentation by a single-index model without using the inverse of selection probabilities. We show that under certain conditions the proposed estimator is as efficient as the existing methods based on standard kernel smoothing, which are often practically infeasible in the case of multiple covariates. A simulation study and a real data example are presented to illustrate the proposed method. The numerical results show that the proposed estimator avoids some numerical issues caused by estimated small selection probabilities that are needed in other estimators.


9.

Outlier detection and robust covariance estimation using mathematical programming

Tri-Dzung Nguyen Roy E. Welsch 《Advances in Data Analysis and Classification》2010,4(4):301-334

The outlier detection problem and the robust covariance estimation problem are often interchangeable. Without outliers, the classical method of maximum likelihood estimation (MLE) can be used to estimate parameters of a known distribution from observational data. When outliers are present, they dominate the log likelihood function, causing the MLE estimators to be pulled toward them. Many robust statistical methods have been developed to detect outliers and to produce estimators that are robust against deviation from model assumptions. However, the existing methods suffer either from computational complexity when problem size increases or from giving up desirable properties, such as affine equivariance. An alternative approach is to design a special mathematical programming model to find the optimal weights for all the observations, such that at the optimal solution, outliers are given smaller weights and can be detected. This method produces a covariance estimator that has the following properties: First, it is affine equivariant. Second, it is computationally efficient even for large problem sizes. Third, it is easy to incorporate prior beliefs into the estimator by using semi-definite programming. The accuracy of this method is tested for different contamination models, including recently proposed ones. The method is not only faster than the Fast-MCD method for high dimensional data but also has reasonable accuracy for the tested cases.

10.

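FIC model selection and model averaging for linear models with censored responses

Sun Zhimeng Ma Jingyi Su Zhi 《Scientia Sinica Mathematica》2013,43(7):647-661

This paper presents a focused information criterion (FIC) model selection method and a smoothed FIC model averaging estimation method for linear models whose response variable is subject to random right censoring. It proves the asymptotic normality of the FIC model selection estimator and the smoothed FIC model averaging estimator of the focus parameter, and studies the finite-sample properties of the estimators by simulation. The simulations show that, in terms of mean squared error and the empirical coverage probability of confidence intervals at a given confidence level, the smoothed FIC model averaging estimator of the focus parameter outperforms model selection estimators based on FIC, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC); the FIC model selection estimator in turn shows some advantage over the AIC and BIC model selection estimators. An analysis of the primary biliary cirrhosis data set illustrates the application of the method to a practical problem.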

11.

Post selection shrinkage estimation for high‐dimensional data analysis

Xiaoli Gao S. E. Ahmed Yang Feng 《Applied Stochastic Models in Business and Industry》2017,33(2):97-120

In high‐dimensional data settings where p ≫ n, many penalized regularization approaches have been studied for simultaneous variable selection and estimation. However, when covariates with weak effects are present, many existing variable selection methods, including the Lasso and its generalizations, cannot distinguish covariates with weak contributions from those with none. Thus, prediction based only on a subset model of selected covariates can be inefficient. In this paper, we propose a post selection shrinkage estimation strategy to improve the prediction performance of a selected subset model. Such a post selection shrinkage estimator (PSE) is data adaptive and constructed by shrinking a post selection weighted ridge estimator in the direction of a selected candidate subset. Under an asymptotic distributional quadratic risk criterion, its prediction performance is explored analytically. We show that the proposed post selection PSE performs better than the post selection weighted ridge estimator. More importantly, it significantly improves the prediction performance of any candidate subset model selected by most existing Lasso‐type variable selection methods. The relative performance of the post selection PSE is demonstrated by both simulation studies and real‐data analysis. Copyright © 2016 John Wiley & Sons, Ltd.

12.

Multiple imputations and the missing censoring indicator model

Sundarraman Subramanian 《Journal of multivariate analysis》2011,102(1):105-117

Semiparametric random censorship (SRC) models (Dikta, 1998) provide an attractive framework for estimating survival functions when censoring indicators are fully or partially available. When there are missing censoring indicators (MCIs), the SRC approach employs a model-based estimate of the conditional expectation of the censoring indicator given the observed time, where the model parameters are estimated using only the complete cases. The multiple imputations approach, on the other hand, utilizes this model-based estimate to impute the missing censoring indicators and form several completed data sets. The Kaplan-Meier and SRC estimators based on the several completed data sets are averaged to arrive at the multiple imputations Kaplan-Meier (MIKM) and the multiple imputations SRC (MISRC) estimators. While the MIKM estimator is asymptotically as efficient as or less efficient than the standard SRC-based estimator that involves no imputations, here we investigate the performance of the MISRC estimator and prove that it attains the benchmark variance set by the SRC-based estimator. We also present numerical results comparing the performances of the estimators under several misspecified models for the above mentioned conditional expectation.

13.

Toward Automatic Model Comparison: An Adaptive Sequential Monte Carlo Approach

Yan Zhou Adam M. Johansen John A.D. Aston 《Journal of computational and graphical statistics》2016,25(3):701-726

Model comparison for the purposes of selection, averaging, and validation is a problem found throughout statistics. Within the Bayesian paradigm, these problems all require the calculation of the posterior probabilities of models within a particular class. Substantial progress has been made in recent years, but difficulties remain in the implementation of existing schemes. This article presents adaptive sequential Monte Carlo (SMC) sampling strategies to characterize the posterior distribution of a collection of models, as well as the parameters of those models. Both a simple product estimator and a combination of SMC and a path sampling estimator are considered and existing theoretical results are extended to include the path sampling variant. A novel approach to the automatic specification of distributions within SMC algorithms is presented and shown to outperform the state of the art in this area. The performance of the proposed strategies is demonstrated via an extensive empirical study. Comparisons with state-of-the-art algorithms show that the proposed algorithms are always competitive, and often substantially superior to alternative techniques, at equal computational cost and considerably less application-specific implementation effort. Supplementary materials for this article are available online.

14.

Model selection bias and Freedman’s paradox

Paul M. Lukacs Kenneth P. Burnham David R. Anderson 《Annals of the Institute of Statistical Mathematics》2010,62(1):117-125

In situations where limited knowledge of a system exists and the ratio of data points to variables is small, variable selection methods can often be misleading. Freedman (Am Stat 37:152–155, 1983) demonstrated how common it is to select completely unrelated variables as highly “significant” when the number of data points is similar in magnitude to the number of variables. A new type of model averaging estimator based on model selection with Akaike’s AIC is used with linear regression to investigate the problems of likely inclusion of spurious effects and model selection bias, the bias introduced while using the data to select a single seemingly “best” model from a (often large) set of models employing many predictor variables. The new model averaging estimator helps reduce these problems and provides confidence interval coverage at the nominal level while traditional stepwise selection has poor inferential properties.
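Freedman's paradox, as summarised in this abstract, can be illustrated with a small simulation. The Python sketch below is a rough illustration under assumptions that are not taken from the article: 100 observations, 50 pure-noise predictors, a two-pass screening procedure, and normal-approximation cutoffs (1.15 for roughly the 25% level, 1.96 for the 5% level) in place of exact t quantiles. The function names are hypothetical, and the sketch does not implement the model averaging estimator the article proposes.

```python
import numpy as np

def t_statistics(X, y):
    """OLS t-statistics for a regression of y on X (no intercept)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta / np.sqrt(np.diag(cov))

def freedman_two_pass(rng, n=100, p=50, screen_cut=1.15, final_cut=1.96):
    """Count 'significant' noise variables after screening and refitting."""
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)  # generated independently of X: no true effects
    # Pass 1: keep variables whose |t| exceeds the loose screening cutoff.
    keep = np.abs(t_statistics(X, y)) > screen_cut
    if not keep.any():
        return 0
    # Pass 2: refit on the survivors and count those passing the 5% cutoff.
    t_second = t_statistics(X[:, keep], y)
    return int(np.sum(np.abs(t_second) > final_cut))

rng = np.random.default_rng(0)
counts = [freedman_two_pass(rng) for _ in range(200)]
mean_significant = float(np.mean(counts))

# Expected false positives if all 50 noise variables were tested once at 5%.
nominal = 0.05 * 50
print(mean_significant, nominal)
```

Comparing `mean_significant` with `nominal` shows how screening followed by refitting on the same data inflates the apparent significance of variables that are unrelated to the response.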

15.

The benefit of data-based model complexity selection via prediction error curves in time-to-event data

Christine Porzelius Martin Schumacher Harald Binder 《Computational Statistics》2011,26(2):293-302

The fitting of predictive survival models usually involves determination of model complexity parameters. Up to now, there has been no generally applicable model selection criterion for semi- or non-parametric approaches. The integrated prediction error curve, an estimator of the integrated Brier score, can close this gap and allows a reasonable, data-based choice of complexity parameters for any kind of model where risk predictions can be obtained. Random survival forests are used as an example throughout the article. Here, a critical complexity parameter might be the number of candidate variables at each node. Model selection by our integrated prediction error curve criterion is compared to a frequently used rule of thumb, investigating the potential benefit regarding prediction performance. For that, simulated microarray survival data as well as two real data sets of patients with diffuse large-B-cell lymphoma and of patients with neuroblastoma are used. It is shown that the optimal parameter value depends on the amount of information in the data and that a data-based selection can therefore be beneficial in several settings.

16.

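Semiparametric estimation of the average treatment effect under heteroscedasticity

Wang Pei Zhou Yahong 《Journal of Applied Statistics and Management》2010,29(6)

This paper studies semiparametric estimation of the average treatment effect under the assumption that the disturbance term has a symmetric distribution. It allows a very general form of heteroscedasticity, which greatly broadens the range of heteroscedasticity that can be handled when estimating the average treatment effect. The paper gives a consistent estimator with an N^(1/2) rate of convergence and establishes its asymptotic normality. It follows the two-step estimation approach common in the parametric framework, which is also widely used in semiparametric research. A simple Monte Carlo simulation is used to illustrate, by comparison, the practical value of the proposed estimation method.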

17.

Using thresholding difference-based estimators for variable selection in partial linear models

June Luo Patrick Gerard 《Statistics & probability letters》2013

A commonly used semiparametric model is considered. We adopt two difference based estimators of the linear component of the model and propose corresponding thresholding estimators that can be used for variable selection. For each thresholding estimator, variable selection in the linear component is developed and consistency of the variable selection procedure is shown. We evaluate our method in a simulation study and implement it on a real data set.

18.

Frequentist standard errors of Bayes estimators

DongHyuk Lee Raymond J. Carroll Samiran Sinha 《Computational Statistics》2017,32(3):867-888

Frequentist standard errors are a measure of uncertainty of an estimator, and the basis for statistical inferences. Frequentist standard errors can also be derived for Bayes estimators. However, except in special cases, the computation of the standard error of Bayesian estimators requires bootstrapping, which in combination with Markov chain Monte Carlo can be highly time consuming. We discuss an alternative approach for computing frequentist standard errors of Bayesian estimators, including importance sampling. Through several numerical examples we show that our approach can be much more computationally efficient than the standard bootstrap.

19.

Robust Depth-Weighted Wavelet for Nonparametric Regression Models

Lu LIN 《Acta Mathematica Sinica, English Series》2005,21(3):585-592

In nonparametric regression models, the original regression estimators, including the kernel estimator, Fourier series estimator and wavelet estimator, are always constructed as weighted sums of the data, with weights that depend only on the distance between the design points and the estimation points. As a result, these estimators are not robust to perturbations in the data. To avoid this problem, a new nonparametric regression model, called the depth-weighted regression model, is introduced, and the depth-weighted wavelet estimator is then defined. The new estimator is robust to perturbations in the data and attains a very high breakdown value, close to 1/2. In addition, some asymptotic properties, such as asymptotic normality, are obtained. Simulations illustrate that the proposed wavelet estimator is more robust than the original wavelet estimator and that, as the price of robustness, the new method is slightly less efficient than the original method.

20.

Robust variable selection for finite mixture regression models

Qingguo Tang R. J. Karunamuni 《Annals of the Institute of Statistical Mathematics》2018,70(3):489-521

Finite mixture regression (FMR) models are frequently used in statistical modeling, often with many covariates with low significance. Variable selection techniques can be employed to identify the covariates with little influence on the response. The problem of variable selection in FMR models is studied here. Penalized likelihood-based approaches are sensitive to data contamination, and their efficiency may be significantly reduced when the model is slightly misspecified. We propose a new robust variable selection procedure for FMR models. The proposed method is based on minimum-distance techniques, which seem to have some automatic robustness to model misspecification. We show that the proposed estimator has the variable selection consistency and oracle property. The finite-sample breakdown point of the estimator is established to demonstrate its robustness. We examine small-sample and robustness properties of the estimator using a Monte Carlo study. We also analyze a real data set.