Similar Literature
20 similar records found.
1.
This paper presents a decomposition of the posterior distribution of the covariance matrix of normal models under a family of prior distributions when missing data are ignorable and monotone. This decomposition is an extension of Bartlett's decomposition of the Wishart distribution to monotone missing data. It is not only theoretically interesting but also practically useful. First, with monotone missing data, it allows more efficient drawing of parameters from the posterior distribution than the factorized likelihood approach. Furthermore, with nonmonotone missing data, it allows a very efficient monotone data augmentation algorithm and thereby multiple imputation of the missing data needed to create a monotone pattern.
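Bartlett's decomposition is also the standard recipe for simulating Wishart draws directly; a minimal numpy sketch of the complete-data case (the paper's extension to monotone missing data is not reproduced here):

```python
import numpy as np

def sample_wishart_bartlett(df, scale, rng):
    """Draw one Wishart(df, scale) matrix via Bartlett's decomposition:
    a lower-triangular factor with sqrt(chi-square) diagonals and
    standard-normal sub-diagonals, rotated by the Cholesky factor of scale."""
    p = scale.shape[0]
    L = np.zeros((p, p))
    for i in range(p):
        L[i, i] = np.sqrt(rng.chisquare(df - i))   # chi-square(df - i) on the diagonal
        L[i, :i] = rng.standard_normal(i)          # N(0, 1) below the diagonal
    C = np.linalg.cholesky(scale)
    A = C @ L
    return A @ A.T

rng = np.random.default_rng(0)
scale = np.array([[2.0, 0.5], [0.5, 1.0]])
draws = [sample_wishart_bartlett(10, scale, rng) for _ in range(5000)]
print(np.mean(draws, axis=0))   # should be close to df * scale
```

The Monte Carlo mean approximates E[W] = df · scale, which is a quick sanity check on the sampler.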

2.
The missing data mechanism often depends on the values of the responses, which leads to nonignorable nonresponse. In such a situation, inference based on approaches that ignore the missing data mechanism may not be valid. A crucial step is to model the nature of the missingness. We specify a parametric model for the missingness mechanism and then propose a conditional score function approach for estimation. This approach imputes the score function by taking the conditional expectation of the score function for the missing data given the available information. Inference then proceeds by replacing unknown terms with the related nonparametric estimators based on the observed data. The proposed score function does not suffer from the non-identifiability problem, and the proposed estimator is shown to be consistent and asymptotically normal. We also construct a confidence region for the parameter of interest using the empirical likelihood method. Simulation studies demonstrate that the proposed inference procedure performs well in many settings. We apply the proposed method to a data set from a growth hormone and exercise intervention study.

3.
Multiple imputation (MI) methods have been widely applied in economic applications as a robust statistical way to incorporate data where some observations have missing values for some variables. In stochastic frontier analysis (SFA), however, application of these techniques has been sparse, and the case for such models has not received attention in the appropriate academic literature. This paper fills this gap and explores the robust properties of MI within the stochastic frontier context. From a methodological perspective, we depart from the standard MI literature by demonstrating, conceptually and through simulation, that it is not appropriate to use imputations of the dependent variable within SFA modelling, although they can be useful for predicting the values of missing explanatory variables. Fundamentally, this is because efficiency analysis involves decomposing a residual into noise and inefficiency, and as a result any imputation of a dependent variable would be imputing efficiency based on some concept of average inefficiency in the sample. A further contribution, discussed and illustrated for the first time in the SFA literature, is that using auxiliary variables (outside of those contained in the SFA model) can enhance the imputations of missing values. Our empirical example neatly articulates that often the source of missing data is only a sub-set of the components comprising a composite (or complex) measure, and that the other parts that are observed are very useful in predicting the missing values.

4.
Most current implementations of multiple imputation (MI) assume that data are missing at random (MAR), but this assumption is generally untestable. We performed analyses to test the effects of auxiliary variables on MI when the data are missing not at random (MNAR) using simulated data and real data. In the analyses we varied (a) the correlation, (b) the level of missing data, (c) the pattern of missing data, and (d) the sample size. Results showed that MI performed adequately without auxiliary variables, but auxiliary variables also had a modest impact on bias in the real data and improved efficiency in both data sets. The results of this study suggest that, counter to the concern about violation of the MAR assumption, MI appears to be quite robust to missing data that are MNAR in analytic situations such as the ones presented here. Further, results can be improved via the use of auxiliary variables, particularly when efficiency is a primary concern.
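The kind of MNAR experiment described above can be sketched in a few lines: delete large values of Y with higher probability, then let a correlated auxiliary variable drive a regression imputation. The data-generating values below are illustrative, not the study's:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
aux = rng.standard_normal(n)                                 # auxiliary variable
y = 0.9 * aux + np.sqrt(1 - 0.81) * rng.standard_normal(n)   # corr(y, aux) = 0.9
miss = rng.random(n) < 1 / (1 + np.exp(-2 * y))              # MNAR: P(miss) grows with y
obs = ~miss

# Regression imputation of y from the auxiliary variable, fit on observed cases
X = np.column_stack([np.ones(obs.sum()), aux[obs]])
beta, *_ = np.linalg.lstsq(X, y[obs], rcond=None)
y_imp = y.copy()
y_imp[miss] = beta[0] + beta[1] * aux[miss]

bias_cc = abs(y[obs].mean() - y.mean())   # complete-case bias
bias_mi = abs(y_imp.mean() - y.mean())    # bias after auxiliary-based imputation
print(bias_cc, bias_mi)
```

The complete-case mean is badly biased, while the auxiliary-based imputation removes most (though, since the data are MNAR, not all) of that bias.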

5.
The 2004 Basel II Accord has pointed out the benefits of credit risk management through internal models using internal data to estimate risk components: probability of default (PD), loss given default, exposure at default and maturity. Internal data are the primary data source for PD estimates; banks are permitted to use statistical default prediction models to estimate the borrowers' PD, subject to some requirements concerning accuracy, completeness and appropriateness of data. In practice, however, internal records are usually incomplete or do not contain adequate history to estimate the PD. Missing data are especially critical for low-default portfolios, characterised by inadequate default records, which make it difficult to design statistically significant prediction models. Several methods might be used to deal with missing data, such as list-wise deletion, application-specific list-wise deletion, substitution techniques or imputation models (simple and multiple variants). List-wise deletion is an easy-to-use method widely applied by social scientists, but it discards substantial data and reduces the diversity of information, biasing the model's parameters, results and inferences. The choice of the best method to handle missing data largely depends on the nature of the missing values (MCAR, MAR and MNAR processes), but there is a lack of empirical analysis of their effect on credit risk, which limits the validity of resulting models. In this paper, we analyse the nature and effects of missing data in credit risk modelling (MCAR, MAR and MNAR processes), using a currently scarce data set on consumer borrowers that includes different percentages and distributions of missing data.
The findings are used to analyse the performance of several methods for dealing with missing data, such as list-wise deletion, simple imputation methods, MLE models and advanced multiple imputation (MI) alternatives based on Markov chain Monte Carlo and re-sampling methods. Results are evaluated and compared across models in terms of robustness, accuracy and complexity. In particular, MI models are found to provide very valuable solutions to the credit risk missing data problem.
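Whatever imputation engine is used, the m imputed-data analyses are pooled with Rubin's combining rules, which add the between-imputation variance B to the average within-imputation variance W. A small sketch with illustrative numbers (not from the paper):

```python
import numpy as np

# Illustrative estimates from m = 5 imputed data sets
est = np.array([1.02, 0.97, 1.05, 0.99, 1.01])       # point estimates
var = np.array([0.040, 0.038, 0.041, 0.039, 0.040])  # their squared standard errors
m = len(est)

qbar = est.mean()              # pooled point estimate
W = var.mean()                 # within-imputation variance
B = est.var(ddof=1)            # between-imputation variance
T = W + (1 + 1 / m) * B        # total variance (Rubin's rules)
print(qbar, T)
```

The (1 + 1/m) factor inflates B to account for using only a finite number of imputations.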

6.
Consider two linear models with incomplete sample data, in which the covariate observations are fully observed and the response observations are missing at random. Random regression imputation is used to fill in the missing responses, yielding "complete" sample data for the two linear regression models. Under certain conditions, the limiting distribution of the log empirical likelihood ratio statistic for the difference between the response quantiles is shown to be a weighted chi-squared distribution with one degree of freedom, and this result is used to construct empirical likelihood confidence intervals for the quantile difference. Simulation results show that the confidence intervals obtained under random imputation have high coverage accuracy.

7.
We propose a new method to impute missing values in mixed data sets. It is based on a principal component method, factorial analysis for mixed data, which balances the influence of all the variables, continuous and categorical, in the construction of the principal components. Because the imputation uses the principal axes and components, the prediction of the missing values is based on the similarity between individuals and on the relationships between variables. The properties of the method are illustrated via simulations, and the quality of the imputation is assessed using real data sets. The method is compared to a recent method based on random forests (Stekhoven and Bühlmann, Bioinformatics 28:113–118, 2011) and shows better performance, especially for the imputation of categorical variables and in situations with highly linear relationships between continuous variables.
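The continuous-data core of this kind of principal-component imputation can be sketched as an alternating low-rank SVD fit (the FAMD method additionally balances categorical variables, which is not shown here):

```python
import numpy as np

def iterative_pca_impute(X, rank=2, n_iter=100):
    """Fill NaNs by alternating a rank-`rank` SVD fit with refreshing the
    missing cells from the fitted low-rank reconstruction."""
    miss = np.isnan(X)
    Xf = np.where(miss, np.nanmean(X, axis=0), X)   # initialize with column means
    for _ in range(n_iter):
        mu = Xf.mean(axis=0)
        U, s, Vt = np.linalg.svd(Xf - mu, full_matrices=False)
        fitted = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu
        Xf[miss] = fitted[miss]                     # update only the missing cells
    return Xf

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 2)) @ rng.standard_normal((2, 5))
X += 0.01 * rng.standard_normal(X.shape)            # near rank-2 data
X_miss = X.copy()
X_miss[rng.random(X.shape) < 0.1] = np.nan          # 10% missing completely at random
X_hat = iterative_pca_impute(X_miss, rank=2)
rmse = np.sqrt(np.mean((X_hat - X)[np.isnan(X_miss)] ** 2))
print(rmse)   # small, because the data really are close to rank 2
```

Because the fit uses the principal axes, each imputed cell is predicted from the individuals and variables most similar to it, as the abstract describes.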

8.
In some multivariate problems with missing data, pairs of variables exist that are never observed together. For example, some modern biological tools can produce data of this form. As a result of this structure, the covariance matrix is only partially identifiable, and point estimation requires that identifying assumptions be made. These assumptions can introduce an unknown and potentially large bias into the inference. This paper presents a method based on semidefinite programming for automatically quantifying this potential bias by computing the range of possible equal-likelihood inferred values for convex functions of the covariance matrix. We focus on the bias of missing value imputation via conditional expectation and show that our method can give an accurate assessment of the true error in cases where estimates based on sampling uncertainty alone are overly optimistic.
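The partial identifiability can be made concrete with a small example: if variables 1 and 3 are never observed together, every value of their covariance that keeps the matrix positive semidefinite is equally supported by the likelihood. A brute-force scan of that range, with illustrative numbers (the paper computes such ranges exactly via semidefinite programming):

```python
import numpy as np

# Known pieces: variances and the covariances involving the "bridge"
# variable 2; s13 is never observed and is only partially identified.
s11, s22, s33 = 1.0, 1.0, 1.0
s12, s23 = 0.8, 0.8

feasible = []
for s13 in np.linspace(-1, 1, 20001):
    S = np.array([[s11, s12, s13],
                  [s12, s22, s23],
                  [s13, s23, s33]])
    if np.linalg.eigvalsh(S)[0] >= -1e-10:    # positive semidefinite?
        feasible.append(s13)
print(min(feasible), max(feasible))   # identified only up to an interval
```

Here the data pin s13 down only to the interval [0.28, 1.0]; any point estimate inside it rests on an untestable identifying assumption.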

9.
In many applications, some covariates may be missing for various reasons. Regression quantiles can be either biased or under-powered when the missing data are ignored. Multiple imputation and EM-based augmentation approaches have been proposed to fully utilize the data with missing covariates for quantile regression. Both methods, however, are computationally expensive. We propose a fast imputation algorithm (FI) to handle missing covariates in quantile regression, which is an extension of fractional imputation in likelihood-based regressions. FI and modified imputation algorithms (FIIPW and MIIPW) are compared to existing MI and IPW approaches in simulation studies, and applied to part of the National Collaborative Perinatal Project study.
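The quantile-regression objective underlying all of these methods is the check loss; a generic sketch of median regression on complete simulated data (the fractional imputation step itself is not shown):

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(beta, X, y, tau):
    """Koenker-Bassett check loss for quantile level tau."""
    r = y - X @ beta
    return np.sum(r * (tau - (r < 0)))

rng = np.random.default_rng(3)
n = 5000
x = rng.standard_normal(n)
y = 1.0 + 2.0 * x + rng.standard_normal(n)   # symmetric errors: median line = mean line
X = np.column_stack([np.ones(n), x])
fit = minimize(check_loss, x0=np.zeros(2), args=(X, y, 0.5),
               method="Nelder-Mead", options={"maxiter": 2000})
print(fit.x)   # close to [1, 2] at tau = 0.5
```

Nelder-Mead is used here only because the check loss is non-smooth; production code would use a dedicated quantile-regression solver.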

10.
Principled techniques for incomplete-data problems are increasingly part of mainstream statistical practice. Among the many techniques proposed so far, inference by multiple imputation (MI) has emerged as one of the most popular. While many strategies leading to inference by MI are available in cross-sectional settings, the same richness does not exist in multilevel applications. The limited methods available for multilevel applications rely on multivariate adaptations of mixed-effects models. This approach preserves the mean structure across clusters and incorporates distinct variance components into the imputation process. In this paper, I add to these methods by considering a random covariance structure and develop computational algorithms. The attraction of this new imputation modeling strategy is that it correctly reflects the mean and variance structure of the joint distribution of the data, and allows the covariances to differ across the clusters. Using Markov chain Monte Carlo techniques, a predictive distribution of missing data given observed data is simulated, leading to the creation of multiple imputations. To circumvent the large sample size required to support independent covariance estimates for the level-1 error term, I consider distributional impositions mimicking random-effects distributions assigned a priori. These techniques are illustrated in an example exploring relationships between victimization and individual- and contextual-level factors that raise the risk of violent crime.

11.
General procedures are proposed for nonparametric classification in the presence of missing covariates. Both kernel-based imputation as well as Horvitz-Thompson-type inverse weighting approaches are employed to handle the presence of missing covariates. In the case of imputation, it is a certain regression function which is being imputed (and not the missing values). Using the theory of empirical processes, the performance of the resulting classifiers is assessed by obtaining exponential bounds on the deviations of their conditional errors from that of the Bayes classifier. These bounds, in conjunction with the Borel-Cantelli lemma, immediately provide various strong consistency results.
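The Horvitz-Thompson idea can be illustrated on the simpler problem of estimating a mean: under MAR, weighting each fully observed case by the inverse of its observation probability (known here for the sketch; estimated in practice) removes the complete-case bias. A hypothetical example:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50000
x = rng.standard_normal(n)
y = x + rng.standard_normal(n)             # true mean of y is 0
p_obs = 1 / (1 + np.exp(-x))               # P(y observed | x): MAR, known here
obs = rng.random(n) < p_obs

naive = y[obs].mean()                      # biased complete-case mean
ipw = np.sum(y[obs] / p_obs[obs]) / np.sum(1 / p_obs[obs])   # inverse-weighted mean
print(naive, ipw)
```

The naive mean is pulled upward because large-x (hence large-y) cases are over-represented among the observed; the inverse-weighted estimate is close to zero.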

12.
This paper considers the problem of parameter estimation in a general class of semiparametric models when observations are subject to missingness at random. The semiparametric models allow for estimating functions that are non-smooth with respect to the parameter. We propose a nonparametric imputation method for the missing values, which then leads to imputed estimating equations for the finite dimensional parameter of interest. The asymptotic normality of the parameter estimator is proved in a general setting, and is investigated in detail for a number of specific semiparametric models. Finally, we study the small sample performance of the proposed estimator via simulations.

13.
Objective: To impute and analyse the missing data commonly found in hospital discharge-patient questionnaires, so as to guarantee the quality of the statistical survey and to provide technical support and quality assurance for hospitals and higher-level health authorities in assessing the current situation, forecasting and decision-making. Methods: Using SAS 9.1, the Markov chain Monte Carlo (MCMC) model for multiple imputation was applied to impute the missing data repeatedly, and the results were analysed jointly. Results: MCMC imputation with 10 imputations gave the best results. Conclusion: Multiple imputation (MI) has clear advantages for handling missing data in hospital discharge questionnaires, with broad applicability and high imputation efficiency.

14.
This paper discusses two linear models with incomplete sample data, in which the covariate observations are fully observed and the response observations are missing at random (MAR). We use inverse-probability-weighted imputation to fill in the missing responses, obtaining "complete" sample data for the two linear regression models, and on the basis of these "complete" data we construct the log empirical likelihood ratio statistic for the difference between the response quantiles. Unlike previous results, we show under certain conditions that the limiting distribution of this statistic is a standard chi-squared distribution, which reduces the error introduced by estimating the weights and yields more accurate empirical likelihood confidence intervals for the quantile difference.

15.
In this paper, we consider weighted local polynomial calibration estimation and imputation estimation of a non-parametric function when the data are right censored and the censoring indicators are missing at random, and establish the asymptotic normality of these estimators. As applications, we derive the weighted local linear calibration estimators and imputation estimators of the conditional distribution function, the conditional density function and the conditional quantile function, and investigate the asymptotic normality of these estimators. Finally, simulation studies are conducted to illustrate the finite-sample performance of the estimators.

16.
A clustering method is presented for analysing multivariate binary data with missing values. When all values are observed, Govaert [3] studied the relations between clustering methods and statistical models, showing how the identification of a mixture of Bernoulli distributions with the same parameter for all clusters and for all variables corresponds to a clustering criterion based on the L1 distance characterizing the MNDBIN method (Marchetti [8]). He first generalized this model by allowing the parameters to depend on the variables, and then by allowing them to depend both on the variables and on the clusters. We use these models to derive a clustering method adapted to missing data. This method optimizes a criterion by a standard iterative partitioning algorithm, which removes the need either to ignore objects or to substitute the missing data. We study several versions of this algorithm and, finally, give a brief account of the application of this method to some simulated data.

17.
Conditionally specified statistical models are frequently constructed from one-parameter exponential family conditional distributions. One way to formulate such a model is to specify the dependence structure among random variables through the use of a Markov random field (MRF). A common assumption on the Gibbsian form of the MRF model is that dependence is expressed only through pairs of random variables, which we refer to as the “pairwise-only dependence” assumption. Based on this assumption, J. Besag (1974, J. Roy. Statist. Soc. Ser. B 36, 192–225) formulated exponential family “auto-models” and showed the form that one-parameter exponential family conditional densities must take in such models. We extend these results by relaxing the pairwise-only dependence assumption, and we give a necessary form that one-parameter exponential family conditional densities must take under more general conditions of multiway dependence. Data on the spatial distribution of European corn borer larvae are fitted using a model with Bernoulli conditional distributions and several dependence structures, including pairwise-only, three-way, and four-way dependencies.

18.
An imputation method for missing ordinal-variable data in survey questionnaires
This paper discusses the imputation of missing data for ordinal variables in survey questionnaires. Based on multivariate statistical theory, and combining the overall trend with individual deviations, a new imputation method is proposed that makes the imputed values more accurate and realistic; the method is further extended to questionnaires in which the variables have unequal numbers of levels.

19.
This note introduces a monotony coefficient as a new measure of monotone dependence in a two-dimensional sample. Some properties of this measure are derived. In particular, it is shown that the absolute value of the monotony coefficient for a two-dimensional sample lies between |r| and 1, where r is the Pearson correlation coefficient for the sample; that the monotony coefficient equals 1 for any monotone increasing sample and equals −1 for any monotone decreasing sample. This article contains a few examples demonstrating that the monotony coefficient is a more accurate measure of the degree of monotone dependence for a non-linear relationship than the Pearson, Spearman and Kendall correlation coefficients. The monotony coefficient is a tool that can be applied to samples in order to find dependencies between random variables; it is especially useful for finding pairs of dependent variables in a large dataset with many variables. Undergraduate students in mathematics and science would benefit from learning and applying this measure of monotone dependence.
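The note's exact formula for the monotony coefficient is not reproduced in this abstract; as a stand-in illustration of the phenomenon it targets, a rank-based measure reaches 1 on a strictly monotone but highly non-linear sample while Pearson's r falls well short of it:

```python
import numpy as np
from scipy import stats

x = np.linspace(0.01, 5, 200)
y = np.exp(3 * x)                       # strictly monotone increasing, strongly non-linear
pearson, _ = stats.pearsonr(x, y)       # linear correlation: well below 1
spearman, _ = stats.spearmanr(x, y)     # rank correlation: 1 for any monotone sample
print(pearson, spearman)
```

This matches the bound stated above: any sensible monotone-dependence measure should sit between |r| and 1 here, and equal 1 for a monotone increasing sample.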

20.
The available methods for handling missing values in principal component analysis only provide point estimates of the parameters (axes and components) and estimates of the missing values. To take into account the variability due to missing values, a multiple imputation method is proposed. First, a method to generate multiple imputed data sets from a principal component analysis model is defined. Then, two ways to visualize the uncertainty due to missing values in the principal component analysis results are described. The first consists in projecting the imputed data sets onto a reference configuration as supplementary elements, to assess the stability of the individuals (respectively, of the variables). The second consists in performing a principal component analysis on each imputed data set and fitting each obtained configuration onto the reference one with a Procrustes rotation. The latter strategy allows one to assess the variability of the principal component analysis parameters induced by the missing values. The methodology is then evaluated on a real data set.
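The second visualization strategy can be sketched with scipy's Procrustes routine: a configuration that differs from the reference only by a rotation plus small noise (a toy stand-in for imputation variability) fits it almost perfectly, so only genuine shape differences contribute to the disparity:

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(5)
ref = rng.standard_normal((30, 2))                        # reference configuration
theta = 0.4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
other = ref @ R + 0.01 * rng.standard_normal((30, 2))     # rotated copy + small noise
_, _, disparity = procrustes(ref, other)                  # squared misfit after best fit
print(disparity)   # near 0: configurations differ only by rotation and noise
```

Repeating this for each imputed data set and plotting the fitted configurations around the reference one gives the uncertainty display the abstract describes.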
