1.
This research attempts to solve the problem of dealing with missing data via the interface of Data Envelopment Analysis (DEA) and human behavior. Missing data is under continuing discussion in various research fields, especially those highly dependent on data. In practice and in research, necessary data often cannot be obtained, for example because of procedural factors or a lack of needed responses. This raises the question of how to deal with missing data. In this paper, modified DEA models are developed to estimate the appropriate value of missing data within its interval, based on DEA and the Inter-dimensional Similarity Halo Effect. The estimated value of missing data is determined by the General Impression of original DEA efficiency. To evaluate the effectiveness of this method, an impact factor is proposed. In addition, the advantages of the proposed approach are illustrated in comparison with previous methods.
2.
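Because a large amount of real-world data is skewed, a mode regression model for skew-normal data is constructed. Since missing data also occur frequently, imputation methods are used to handle the incomplete data sets; to compare imputation performance, statistical inference is studied for the case where the response variable is missing at random. The Gauss-Newton iterative method is used to obtain maximum likelihood estimates of the parameters of the mode regression model, and the model's performance is compared under three imputation schemes: mean imputation, regression imputation, and mode imputation. Random simulation and real-data analysis ...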
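As a rough illustration of two of the imputation schemes named in this abstract, the sketch below fills a missing response either with the observed mean or with the prediction of a least-squares line. It is a generic example, not the authors' skew-normal mode regression model; the function names `mean_impute` and `regression_impute` and the toy data are assumptions made for illustration.

```python
from statistics import mean

def mean_impute(y):
    """Replace None entries of y with the mean of the observed entries."""
    observed = [v for v in y if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in y]

def regression_impute(x, y):
    """Replace None entries of y with predictions from a least-squares
    line fitted on the complete (x, y) pairs."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    x_bar = mean(xi for xi, _ in pairs)
    y_bar = mean(yi for _, yi in pairs)
    sxx = sum((xi - x_bar) ** 2 for xi, _ in pairs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * xi if yi is None else yi
            for xi, yi in zip(x, y)]

# Toy data (hypothetical): the last response is missing.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, None]

y_mean = mean_impute(y)          # fills with the mean of 2, 4, 6, 8
y_reg = regression_impute(x, y)  # fills with the fitted line at x = 5
```

On this toy data, mean imputation ignores the trend in x, while regression imputation extrapolates it; which scheme is preferable for a given model is exactly the kind of question the abstract says the paper examines.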
3.
The restricted EM algorithm under linear inequalities in a linear model with missing data Total citations: 1, self-citations: 0, citations by others: 1
ZHENG Shurong, SHI Ningzhong & GUO Jianhua; School of Mathematics & Statistics, Northeast Normal University, Changchun, China; Institute of Mathematics, Jilin University, Changchun, China 《Science in China Series A (English Edition)》2005,48(6):819-828
This paper discusses the maximum likelihood estimate of β under the linear inequalities A0β ≥ a in a linear model with missing data, proposes the restricted EM algorithm, and proves its convergence.
4.
We develop two methods for imputing missing values in regression situations. We examine the standard fixed-effects linear-regression model Y = Xβ + ε, where the regressors X are fixed and ε is the error term. This research focuses on the problem of missing X values. A particular component of market-share analysis has motivated this research, where the price and other promotional instruments of each brand are allowed to have their own impact on the total sales volume in a consumer-products category. When a brand is not distributed in a particular week, only a few of the many measures occurring in that observation are missing. ‘What values should be imputed for the missing measures?’ is the central question this paper addresses. This context creates a unique problem in the missing-data literature, i.e. there is no true value for the missing measure. Using influence functions from robust statistics, we develop two loss functions, each of which is a function of the missing and existing X values. These loss functions turn out to be sums of ratios of low-order polynomials. The minimization of either loss function is an unconstrained non-linear-optimization problem. The solution to this non-linear optimization leads to imputed values that have minimal influence on the estimates of the parameters of the regression model. Estimates using the method for replacing missing values are compared with estimates obtained via some conventional methods.
5.
This paper deals with the nonparametric estimation of additive models in the presence of missing data in the response variable, specifically for additive models estimated by the backfitting algorithm with local polynomial smoothers [1]. Three estimators are presented, one based on the available data and two based on a complete sample obtained from imputation techniques. We also develop a data-driven local bandwidth selector based on a wild bootstrap approximation of the mean squared error of the estimators. The performance of the estimators and of the local bootstrap bandwidth selection method is explored through simulation experiments.
6.
E. Álvarez A. Arcos S. González J.F. Muñoz M. Rueda 《Journal of Computational and Applied Mathematics》2013
This paper discusses the estimation of a population proportion in the presence of missing data, using auxiliary information at the estimation stage. A general class of estimators that make efficient use of the available information is proposed. Some theoretical properties of the proposed estimators are analyzed, and they allow us to find the optimal value for the proposed class in the sense of minimal variance. The optimal estimator is thus more efficient than the customary estimator. Results derived from a simulation study indicate that the proposed optimal estimator gives desirable results in comparison to alternative estimators.
7.
Singular spectrum analysis is a natural generalization of principal component methods for time series data. In this paper we propose an imputation method, to be used with singular spectrum-based techniques, which is based on a weighted combination of the forecasts and hindcasts yielded by the recurrent forecast method. Despite the method's ease of implementation, the results suggest an overall good fit, with an adjustment ability similar to that of the alternative method according to some measures of predictive performance.
8.
Ying Wang Weiming Wan Rui-Sheng Wang Enmin Feng 《Journal of Computational and Applied Mathematics》2009
Mutual information can be used as a measure for the association of a genetic marker or a combination of markers with the phenotype. In this paper, we study the imputation of missing genotype data. We first utilize joint mutual information to compute the dependence between SNP sites, then construct a mathematical model in order to find the two SNP sites having maximal dependence with missing SNP sites, and further study the properties of this model. Finally, an extension method to haplotype-based imputation is proposed to impute the missing values in genotype data. To verify our method, extensive experiments have been performed, and numerical results show that our method is superior to haplotype-based imputation methods. At the same time, numerical results also show that joint mutual information can better measure the dependence between SNP sites. According to experimental results, we also conclude that the dependence between adjacent SNP sites is not necessarily the strongest.
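The selection step described in this abstract can be sketched in Python as follows. This is only one possible reading, under the assumption that "joint mutual information" means the mutual information between the pair of genotypes at two sites (treated as a single variable) and the genotype at the target site; the helper names `mutual_information` and `best_pair` and the toy genotype data are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from math import log2

def mutual_information(a, b):
    """Empirical mutual information (in bits) between two equally long
    sequences of discrete labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a = Counter(a)
    count_b = Counter(b)
    # p(u,v) * log2( p(u,v) / (p(u) p(v)) ), written with raw counts
    return sum((c / n) * log2(c * n / (count_a[u] * count_b[v]))
               for (u, v), c in joint.items())

def best_pair(sites, target):
    """Indices (i, j) of the two sites whose joint genotype has the
    largest mutual information with the target site."""
    return max(combinations(range(len(sites)), 2),
               key=lambda ij: mutual_information(
                   list(zip(sites[ij[0]], sites[ij[1]])), target))

# Toy genotypes (hypothetical): rows are SNP sites, columns are individuals.
sites = [[0, 0, 1, 1],
         [0, 1, 0, 1],
         [0, 0, 0, 0]]
target = [0, 1, 1, 2]  # fully determined by sites 0 and 1 together

pair = best_pair(sites, target)
```

In this toy case the target is determined by sites 0 and 1 jointly but only partly by either one alone, so the pair carries more information than any pair involving the uninformative site 2.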
9.
Maximum likelihood inference for the Cox regression model with applications to missing covariates Total citations: 1, self-citations: 0, citations by others: 1
In this paper, we carry out an in-depth theoretical investigation of the existence of maximum likelihood estimates for the Cox model [D.R. Cox, Regression models and life tables (with discussion), Journal of the Royal Statistical Society, Series B 34 (1972) 187–220; D.R. Cox, Partial likelihood, Biometrika 62 (1975) 269–276] both in the full data setting and in the presence of missing covariate data. The main motivation for this work arises from missing data problems, where models can easily become difficult to estimate with certain missing data configurations or large missing data fractions. We establish necessary and sufficient conditions for existence of the maximum partial likelihood estimate (MPLE) for completely observed data (i.e., no missing data) settings as well as sufficient conditions for existence of the maximum likelihood estimate (MLE) for survival data with missing covariates via a profile likelihood method. Several theorems are given to establish these conditions. A real dataset from a cancer clinical trial is presented to further illustrate the proposed methodology.
10.
New clustering methods for interval data Total citations: 3, self-citations: 0, citations by others: 3
Marie Chavent Francisco de A. T. de Carvalho Yves Lechevallier Rosanna Verde 《Computational Statistics》2006,21(2):211-229
Summary: In this paper we propose two clustering methods for interval data based on the dynamic cluster algorithm. These methods use different homogeneity criteria as well as different kinds of cluster representations (prototypes). Some tools to interpret the final partitions are also introduced. An application of one of the methods concludes the paper.
11.
Suppose that there are two nonparametric populations x and y with missing data on both of them. We are interested in constructing confidence intervals on the quantile differences of x and y. Random imputation is used. Empirical likelihood confidence intervals on the differences are constructed. Supported by the National Natural Science Foundation of China (No. 10661003) and Natural Science Foundation of Guangxi (No. 0728092).
12.
Detecting population (group) differences is useful in many applications, such as medical research. In this paper, we explore the probabilistic theory for identifying the quantile differences between two populations. Suppose that there are two populations x and y with missing data on both of them, where x is nonparametric and y is parametric. We are interested in constructing confidence intervals on the quantile differences of x and y. Random hot deck imputation is used to fill in missing data. Semi-empirical likelihood confidence intervals on the differences are constructed.
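For readers unfamiliar with the imputation step mentioned in this abstract, the following is a minimal Python sketch of random hot deck imputation in its simplest form: each missing value is replaced by a value drawn at random, with replacement, from the observed values of the same sample. The function name `random_hot_deck`, the use of `None` to mark missing entries, and the toy data are assumptions for illustration; the confidence-interval construction of the paper is not reproduced here.

```python
import random

def random_hot_deck(values, rng):
    """Fill each None with a donor drawn at random (with replacement)
    from the observed values of the same sample."""
    donors = [v for v in values if v is not None]
    return [rng.choice(donors) if v is None else v for v in values]

# Toy sample (hypothetical) with two missing entries.
rng = random.Random(0)  # seeded only to make the run reproducible
x = [1.2, None, 3.4, 2.2, None]
x_imputed = random_hot_deck(x, rng)
```

Because donors are drawn from the observed values themselves, the imputed sample keeps the empirical support of the observed data, which is one reason this scheme pairs naturally with quantile-based inference.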
13.
In model-based cluster analysis, the expectation-maximization (EM) algorithm has a number of desirable properties, but in some situations this algorithm can be slow to converge. Some variants have been proposed to speed up EM by reducing the time spent in the E-step, in the case of Gaussian mixtures. The main aims of such methods are first to speed up the convergence of EM, and second to yield the same results as EM itself (or results not far from them). In this paper, we compare these methods on categorical data, with the latent class model, and we propose a new variant that sustains better results on synthetic and real data sets, in terms of convergence speed-up and number of misclassified objects.
14.
Tadashi Nakamura 《Annals of the Institute of Statistical Mathematics》1984,36(1):375-393
Summary: A new type of random sample, called a generalized censored data sample, is defined. An approach to finding criteria for the existence of a maximum likelihood estimate from a finite generalized censored data sample is presented. This approach, named the probability contents boundary analysis, systematically gives a number of practical criteria, each of which is effective for various kinds of typical distribution families in statistical analysis.
15.
A. Pérez-González J. M. Vilar-Fernández W. González-Manteiga 《Annals of the Institute of Statistical Mathematics》2009,61(1):85-109
The main objective of this work is the nonparametric estimation of the regression function with correlated errors when observations are missing in the response variable. Two nonparametric estimators of the regression function are proposed. The asymptotic properties of these estimators are studied; expressions for the bias and the variance are obtained and the joint asymptotic normality is established. A simulation study is also included.
16.
HE Shuyuan (何书元) 《Acta Mathematicae Applicatae Sinica (English Series)》1994,10(1):12-33
ESTIMATING A DISTRIBUTION FUNCTION WITH TRUNCATED DATA. HE SHUYUAN (何书元) (Department of Probability and Statistics, Peking University, Beijing 10...
17.
Kentaro Tanaka Akimichi Takemura 《Annals of the Institute of Statistical Mathematics》2005,57(1):1-19
We consider maximum likelihood estimation of finite mixtures of uniform distributions. We prove that the maximum likelihood estimator is strongly consistent if the scale parameters of the component uniform distributions are restricted from below by exp(−n^d), 0 < d < 1, where n is the sample size.
18.
An approach to dealing with missing data, both during the design and normal operation of a neuro-fuzzy classifier, is presented in this paper. Missing values are processed within a general fuzzy min–max neural network architecture utilising hyperbox fuzzy sets as input data cluster prototypes. An emphasis is put on ways of quantifying the uncertainty which missing data might have caused. This takes the form of a classification procedure whose primary objective is the reduction of the number of viable alternatives rather than attempting to produce one winning class without supporting evidence. If required, ways of selecting the most probable class among the viable alternatives found during the primary classification step, which are based on utilising the data frequency information, are also proposed. The reliability of the classification and the completeness of information are communicated by producing upper and lower classification membership values similar in essence to plausibility and belief measures to be found in the theory of evidence or possibility and necessity values to be found in fuzzy set theory. Similarities and differences between the proposed method and various fuzzy, neuro-fuzzy and probabilistic algorithms are also discussed. A number of simulation results for well-known data sets are provided in order to illustrate the properties and performance of the proposed approach.
19.
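This paper discusses parameter estimation for two Poisson populations with partially missing data and the likelihood ratio test of whether the two populations are identical. Strong consistency and asymptotic normality of the estimators are proved, the limiting distribution of the likelihood ratio test is given, and testing based on the exact distribution is also discussed.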
20.
A hierarchical model for the joint mortality analysis of pension scheme data with missing covariates
A hierarchical model is developed for the joint mortality analysis of pension scheme datasets. The proposed model allows for a rigorous statistical treatment of missing data. While our approach works for any missing data pattern, we are particularly interested in a scenario where some covariates are observed for members of one pension scheme but not the other. Therefore, our approach allows for the joint modelling of datasets which contain different information about individual lives. The proposed model generalizes the specification of parametric models when accounting for covariates. We consider parameter uncertainty using Bayesian techniques. Model parametrization is analysed in order to obtain an efficient MCMC sampler, and model selection is addressed. The inferential framework described here accommodates any missing-data pattern, and turns out to be useful for analysing statistical relationships among covariates. Finally, we assess the financial impact of using the covariates, and of the optimal use of the whole available sample when combining data from different mortality experiences.