Similar literature
1.
A finite mixture model using the multivariate t distribution has been well recognized as a robust extension of Gaussian mixtures. This paper presents an efficient PX-EM algorithm for supervised learning of multivariate t mixture models in the presence of missing values. To simplify the development of new theoretic results and facilitate the implementation of the PX-EM algorithm, two auxiliary indicator matrices are incorporated into the model and shown to be effective. The proposed methodology is a flexible mixture analyzer that allows practitioners to handle real-world multivariate data sets with complex missing patterns in a more efficient manner. The performance of computational aspects is investigated through a simulation study and the procedure is also applied to the analysis of real data with varying proportions of synthetic missing values.  相似文献   

2.
This article presents new computational techniques for multivariate longitudinal or clustered data with missing values. Current methodology for linear mixed-effects models can accommodate imbalance or missing data in a single response variable, but it cannot handle missing values in multiple responses or additional covariates. Applying a multivariate extension of a popular linear mixed-effects model, we create multiple imputations of missing values for subsequent analyses by a straightforward and effective Markov chain Monte Carlo procedure. We also derive and implement a new EM algorithm for parameter estimation which converges more rapidly than traditional EM algorithms because it does not treat the random effects as “missing data,” but integrates them out of the likelihood function analytically. These techniques are illustrated on models for adolescent alcohol use in a large school-based prevention trial.  相似文献   

3.
We establish computationally flexible methods and algorithms for the analysis of multivariate skew normal models when missing values occur in the data. To facilitate the computation and simplify the theoretical derivation, two auxiliary permutation matrices are incorporated into the model to identify the observed and missing components of each observation. Under missing at random mechanisms, we formulate an analytically simple ECM algorithm for computing parameter estimates and recovering each missing value with a single-valued imputation. Gibbs sampling is used to perform Bayesian inference on model parameters and to create multiple imputations for missing values. The proposed methodologies are illustrated through a real data set and comparisons are made with those obtained from fitting the normal counterparts.

4.
The problem of missing values is common in statistical analysis. One approach to deal with missing values is to delete the incomplete cases from the data set. This approach may disregard valuable information, especially in small samples. An alternative approach is to reconstruct the missing values using the information in the data set. The major purpose of this paper is to investigate how a neural network approach performs compared to statistical techniques for reconstructing missing values. The backpropagation algorithm is used as the learning method to reconstruct missing values. The results of back-propagation are compared with results from two methods, viz., (1) using averages, and (2) using iterative regression analysis, to compute missing values. Experimental results show that backpropagation consistently outperforms other methods in both the training and the test data sets, and suggest that the neural network approach is a useful tool for reconstructing missing values in multivariate analysis.  相似文献   

5.
A clustering method is presented for analysing multivariate binary data with missing values. When all values are observed, Govaert [3] has studied the relations between clustering methods and statistical models. The author has shown how the identification of a mixture of Bernoulli distributions with the same parameter for all clusters and for all variables corresponds to a clustering criterion which uses the L1 distance characterizing the MNDBIN method (Marchetti [8]). He first generalized this model by selecting parameters which can depend on variables and finally by selecting parameters which can depend both on variables and on clusters. We use the previous models to derive a clustering method adapted to missing data. This method optimizes a criterion by a standard iterative partitioning algorithm which removes the necessity either to ignore objects or to substitute the missing data. We study several versions of this algorithm and, finally, a brief account is given of the application of this method to some simulated data.

6.
In this article, we propose and explore a multivariate logistic regression model for analyzing multiple binary outcomes with incomplete covariate data where auxiliary information is available. The auxiliary data are extraneous to the regression model of interest but predictive of the covariate with missing data. Horton and Laird [N.J. Horton, N.M. Laird, Maximum likelihood analysis of logistic regression models with incomplete covariate data and auxiliary information, Biometrics 57 (2001) 34–42] describe how the auxiliary information can be incorporated into a regression model for a single binary outcome with missing covariates, and hence the efficiency of the regression estimators can be improved. We consider extending the method of [9] to the case of a multivariate logistic regression model for multiple correlated outcomes, and with missing covariates and completely observed auxiliary information. We demonstrate that in the case of moderate to strong associations among the multiple outcomes, one can achieve considerable gains in efficiency from estimators in a multivariate model as compared to the marginal estimators of the same parameters.  相似文献   

7.
The paper addresses bivariate surface fitting problems, where data points lie on the vertices of a rectangular grid. Efficient and stable algorithms can be found in the literature to solve such problems. If data values are missing at some grid points, there exists a computational method for finding a least squares spline by fixing appropriate values for the missing data. We extended this technique to arbitrary least squares problems as well as to linear least squares problems with linear equality constraints. Numerical examples are given to show the effectiveness of the technique presented. AMS subject classification (2000)  65D05, 65D07, 65D10, 65F05, 65F20  相似文献   

8.
Nonparametric factorial designs for multivariate observations are considered under the framework of general rank-score statistics. Unlike most of the literature, we do not assume the continuity of the underlying distribution functions. The models studied include general repeated measures designs, compound symmetry designs, and designs for longitudinal data. In particular, designs for ordered categorical data are included. The vectors of the multivariate observations may have different lengths. Moreover, our general framework includes missing values and singular covariance matrices which occur quite frequently in practical data analysis problems. The asymptotic properties of the proposed statistics are studied under general nonparametric hypotheses as well as under a sequence of nonparametric contiguous alternatives. L2-consistent estimators for the unknown covariance matrices are given and two types of quadratic forms are considered for testing the nonparametric hypotheses. The results are applied to a two-way mixed model assuming compound symmetry and to a factorial design for longitudinal data. The main idea of the proofs is based on some moment inequalities for empirical distribution functions in mixed models. The details are provided in the Appendix.  相似文献   

9.
In some multivariate problems with missing data, pairs of variables exist that are never observed together. For example, some modern biological tools can produce data of this form. As a result of this structure, the covariance matrix is only partially identifiable, and point estimation requires that identifying assumptions be made. These assumptions can introduce an unknown and potentially large bias into the inference. This paper presents a method based on semidefinite programming for automatically quantifying this potential bias by computing the range of possible equal-likelihood inferred values for convex functions of the covariance matrix. We focus on the bias of missing value imputation via conditional expectation and show that our method can give an accurate assessment of the true error in cases where estimates based on sampling uncertainty alone are overly optimistic.  相似文献   

10.
Maximum likelihood estimation in finite mixture distributions is typically approached as an incomplete data problem to allow application of the expectation-maximization (EM) algorithm. In its general formulation, the EM algorithm involves the notion of a complete data space, in which the observed measurements and incomplete data are embedded. An advantage is that many difficult estimation problems are facilitated when viewed in this way. One drawback is that the simultaneous update used by standard EM requires overly informative complete data spaces, which leads to slow convergence in some situations. In the incomplete data context, it has been shown that the use of less informative complete data spaces, or equivalently smaller missing data spaces, can lead to faster convergence without sacrificing simplicity. However, in the mixture case, little progress has been made in speeding up EM. In this article we propose a component-wise EM for mixtures. It uses, at each iteration, the smallest admissible missing data space by intrinsically decoupling the parameter updates. Monotonicity is maintained, although the estimated proportions may not sum to one during the course of the iteration. However, we prove that the mixing proportions will satisfy this constraint upon convergence. Our proof of convergence relies on the interpretation of our procedure as a proximal point algorithm. For performance comparison, we consider standard EM as well as two other algorithms based on missing data space reduction, namely the SAGE and AECME algorithms. We provide adaptations of these general procedures to the mixture case. We also consider the ECME algorithm, which is not a data augmentation scheme but still aims at accelerating EM. Our numerical experiments illustrate the advantages of the component-wise EM algorithm relative to these other methods.
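
For orientation, the standard EM baseline that this abstract compares against can be sketched as follows. This is a minimal illustration for a two-component univariate Gaussian mixture, not the component-wise EM proposed in the article; the function names, the simulated data, the starting values and the variance floor are all assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gaussian_mixture(data, means, variances, weights, n_iter=100):
    """Standard EM: one E-step, then a simultaneous update of all parameters."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: every component is updated from the same responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # hypothetical floor to avoid collapse
    return means, variances, weights


# Simulated data (made up for this sketch): two well-separated components.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(5.0, 1.0) for _ in range(200)]
means, variances, weights = em_gaussian_mixture(data, [1.0, 4.0], [1.0, 1.0], [0.5, 0.5])
print(means, variances, weights)
```

Because the responsibilities of each point sum to one, the mixing proportions sum to one after every M-step here. The abstract notes that the component-wise variant gives up this property during the iterations and recovers it only at convergence.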

11.
In this paper, we carry out an in-depth theoretical investigation for existence of maximum likelihood estimates for the Cox model [D.R. Cox, Regression models and life tables (with discussion), Journal of the Royal Statistical Society, Series B 34 (1972) 187–220; D.R. Cox, Partial likelihood, Biometrika 62 (1975) 269–276] both in the full data setting as well as in the presence of missing covariate data. The main motivation for this work arises from missing data problems, where models can easily become difficult to estimate with certain missing data configurations or large missing data fractions. We establish necessary and sufficient conditions for existence of the maximum partial likelihood estimate (MPLE) for completely observed data (i.e., no missing data) settings as well as sufficient conditions for existence of the maximum likelihood estimate (MLE) for survival data with missing covariates via a profile likelihood method. Several theorems are given to establish these conditions. A real dataset from a cancer clinical trial is presented to further illustrate the proposed methodology.  相似文献   

12.
Multiple imputation (MI) has become a standard statistical technique for dealing with missing values. The CDC Anthrax Vaccine Research Program (AVRP) dataset created new challenges for MI due to the large number of variables of different types and the limited sample size. A common method for imputing missing data in such complex studies is to specify, for each of J variables with missing values, a univariate conditional distribution given all other variables, and then to draw imputations by iterating over the J conditional distributions. Such fully conditional imputation strategies have the theoretical drawback that the conditional distributions may be incompatible. When the missingness pattern is monotone, a theoretically valid approach is to specify, for each variable with missing values, a conditional distribution given the variables with fewer or the same number of missing values and sequentially draw from these distributions. In this article, we propose the “multiple imputation by ordered monotone blocks” approach, which combines these two basic approaches by decomposing any missingness pattern into a collection of smaller “constructed” monotone missingness patterns, and iterating. We apply this strategy to impute the missing data in the AVRP interim data. Supplemental materials, including all source code and a synthetic example dataset, are available online.  相似文献   

13.
A mixture approach to clustering is an important technique in cluster analysis. A mixture of multivariate multinomial distributions is usually used to analyze categorical data with a latent class model. Parameter estimation is an important step for a mixture distribution. Described here are four approaches to estimating the parameters of a mixture of multivariate multinomial distributions. The first approach is an extended maximum likelihood (ML) method. The second approach is based on the well-known expectation maximization (EM) algorithm. The third approach is the classification maximum likelihood (CML) algorithm. In this paper, we propose a new approach using the so-called fuzzy class model and then create the fuzzy classification maximum likelihood (FCML) approach for categorical data. The accuracy, robustness and effectiveness of these four types of algorithms for estimating the parameters of multivariate binomial mixtures are compared using real empirical data and samples drawn from the multivariate binomial mixtures of two classes. The results show that the proposed FCML algorithm presents better accuracy, robustness and effectiveness. Overall, the FCML algorithm is superior to the ML, EM and CML algorithms. Thus, we recommend FCML as another good tool for estimating the parameters of mixture multivariate multinomial models.

14.
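In time series modeling, missing data can severely degrade model accuracy, so imputing the missing values is especially important. Air quality index (AQI) data for Beijing were selected, and 10% of the values were removed at random. The missing values were then imputed with the EM algorithm and with straight-line fitting using polyfit, respectively; after the data were completed, an ARMA model was fitted and used for forecasting. The results show that imputation with the polyfit function performs better.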
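
A straight-line imputation of the kind mentioned in this abstract might look like the sketch below. It is an illustration only: the abstract does not say which points the line is fitted to, so the local window, the function names and the sample values (made up, not real AQI data) are assumptions. The line fit is written out by hand; `numpy.polyfit(x, y, 1)` computes the same least-squares slope and intercept.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of a straight line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_linear(series, window=2):
    """Fill None entries with values from a line fitted to nearby observed points."""
    observed = [(i, v) for i, v in enumerate(series) if v is not None]
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        # Use up to `window` observed points on each side (an assumed choice).
        left = [p for p in observed if p[0] < i][-window:]
        right = [p for p in observed if p[0] > i][:window]
        points = left + right
        if len(points) < 2:
            raise ValueError("need at least two observed points to fit a line")
        slope, intercept = fit_line([p[0] for p in points], [p[1] for p in points])
        filled[i] = slope * i + intercept
    return filled


# Made-up series with two gaps; None marks a missing observation.
aqi = [80.0, 82.0, None, 86.0, 88.0, None, 92.0]
print(impute_linear(aqi))
```

Only originally observed values enter each fit, so one imputed point never influences another.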

15.
Local influence analysis for multivariate $t$ distribution data   Cited by: 4 (self-citations: 0, citations by others: 4)
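For multivariate $t$ distribution data, carrying out influence analysis directly with the probability density is difficult. By introducing weights that follow a Gamma distribution, this paper represents the distribution as a mixture of particular multivariate normal distributions. On this basis, the weights are treated as missing data and the EM algorithm is introduced, so that local influence analysis can be carried out using the conditional expectation of the complete-data likelihood function. The paper further gives a systematic study of local influence analysis under the weighted perturbation model, derives the corresponding diagnostic statistics, and illustrates the effectiveness of the method with two examples.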
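
The mixture representation referred to here is the standard scale-mixture form of the multivariate $t$ distribution. The symbols below ($p$ for the dimension, $\mu$, $\Sigma$, $\nu$ for location, scale matrix and degrees of freedom, $\tau$ for the Gamma weight) are notation assumed for this note, not taken from the paper:

$$x \mid \tau \sim N_p\!\left(\mu, \Sigma/\tau\right), \qquad \tau \sim \mathrm{Gamma}\!\left(\tfrac{\nu}{2}, \tfrac{\nu}{2}\right) \quad\Longrightarrow\quad x \sim t_p(\mu, \Sigma, \nu),$$

with the Gamma distribution parameterized by shape and rate. Treating $\tau$ as missing data, the E-step needs its conditional expectation, which has the closed form

$$E[\tau \mid x] = \frac{\nu + p}{\nu + \delta(x)}, \qquad \delta(x) = (x - \mu)^{\top} \Sigma^{-1} (x - \mu).$$

Observations with a large Mahalanobis distance $\delta(x)$ therefore receive small weights.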

16.
Statistical Inference for Stochastic Processes - The problem of linear interpolation in the context of a multivariate time series having multiple (possibly non-consecutive) missing values is...  相似文献   

17.
The available methods to handle missing values in principal component analysis only provide point estimates of the parameters (axes and components) and estimates of the missing values. To take into account the variability due to missing values, a multiple imputation method is proposed. First, a method to generate multiple imputed data sets from a principal component analysis model is defined. Then, two ways to visualize the uncertainty due to missing values on the principal component analysis results are described. The first one consists in projecting the imputed data sets onto a reference configuration as supplementary elements to assess the stability of the individuals (respectively of the variables). The second one consists in performing a principal component analysis on each imputed data set and fitting each obtained configuration onto the reference one with Procrustes rotation. The latter strategy makes it possible to assess the variability of the principal component analysis parameters induced by the missing values. The methodology is then evaluated on a real data set.

18.
In this paper, we describe models for dependent multivariate survival data using finite mixtures of positive stable frailty distributions. We investigate the cross-ratio function as a local measure of association. We estimate the parameters in the stable mixture together with the parameters of the (conditional) proportional hazards model in a Bayesian framework using Markov chain Monte Carlo algorithms. We illustrate the methodology using data on kidney infections.  相似文献   

19.
A class of multivariate distributions that are mixtures of the positive powers of a max-infinitely divisible distribution is studied. A subclass has the property that all weighted minima or maxima belong to a given location or scale family. By choosing appropriate parametric families for the mixing distribution and the distribution being mixed, families of multivariate copulas with a flexible dependence structure and with closed form cumulative distribution functions are obtained. Some dependence properties of the class, as well as some characterizations, are given. Conditions for max-infinite divisibility of multivariate distributions are obtained.

20.
Generalized canonical correlation analysis is a versatile technique that allows the joint analysis of several sets of data matrices. The generalized canonical correlation analysis solution can be obtained through an eigenequation and distributional assumptions are not required. When dealing with multiple set data, it frequently happens that some values are missing. In this paper, two new methods for dealing with missing values in generalized canonical correlation analysis are introduced. The first approach, which does not require iterations, is a generalization of the Test Equating method available for principal component analysis. In the second approach, missing values are imputed in such a way that the generalized canonical correlation analysis objective function does not increase in subsequent steps. Convergence is achieved when the value of the objective function remains constant. By means of a simulation study, we assess the performance of the new methods. We compare the results with those of two available methods: the missing-data passive method, introduced in Gifi’s homogeneity analysis framework, and the GENCOM algorithm developed by Green and Carroll. An application using World Bank data is used to illustrate the proposed methods.
