Similar Literature
20 similar articles found.
1.
We present a very fast algorithm for general matrix factorization of a data matrix for use in the statistical analysis of high-dimensional data via latent factors. Such data are prevalent across many application areas and generate an ever-increasing demand for methods of dimension reduction in order to undertake the statistical analysis of interest. Our algorithm takes a gradient-based approach that works with an arbitrary loss function, provided the latter is differentiable. The speed and effectiveness of our algorithm for dimension reduction are demonstrated in the context of supervised classification of some real high-dimensional data sets from the bioinformatics literature.
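To make the idea concrete, here is a minimal sketch (not the authors' implementation) of gradient-based factorization X ≈ AB under a differentiable loss; squared error, the rank, the learning rate, and the iteration count are illustrative choices.

```python
import numpy as np

def factorize(X, rank=5, lr=1e-3, n_iter=2000, seed=0):
    """Gradient-descent matrix factorization X ~ A @ B for a
    differentiable loss (squared error here, as one example)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    A = 0.1 * rng.standard_normal((n, rank))
    B = 0.1 * rng.standard_normal((rank, p))
    for _ in range(n_iter):
        R = A @ B - X          # residual = dLoss/d(AB) for squared loss
        A -= lr * (R @ B.T)    # chain rule: dLoss/dA
        B -= lr * (A.T @ R)    # chain rule: dLoss/dB
    return A, B

X = np.random.default_rng(1).standard_normal((100, 50))
A, B = factorize(X, rank=5)
print(np.linalg.norm(X - A @ B) / np.linalg.norm(X))  # relative fit error
```

Swapping in another differentiable loss only changes the residual line, which is the point of the gradient-based formulation.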

2.
Variable-selection control charts are an important tool for high-dimensional statistical process monitoring. Traditional variable-selection control charts rarely account for spatial correlation in high-dimensional processes, which lowers their monitoring efficiency. To address this, we propose a monitoring model for high-dimensional spatially correlated processes based on the fused LASSO. First, the likelihood-ratio test is modified using the fused-LASSO algorithm; next, a monitoring statistic based on the penalized likelihood ratio is derived; finally, the performance of the proposed model is analyzed through simulation and a real case study. Both the simulations and the real case show that, in high-dimensional spatially correlated processes, when adjacent monitored variables shift simultaneously, the proposed method accurately identifies the potentially faulty variables and achieves good monitoring performance.

3.
We propose an ℓ1-penalized algorithm for fitting high-dimensional generalized linear mixed models (GLMMs). GLMMs can be viewed as an extension of generalized linear models for clustered observations. Our Lasso-type approach for GLMMs should mainly be used as a variable screening method to reduce the number of variables below the sample size. We then suggest a refitting by maximum likelihood based on the selected variables only. This is an effective correction to overcome problems stemming from the variable screening procedure, which are more severe for GLMMs than for generalized linear models. We illustrate the performance of our algorithm on simulated as well as real data examples. Supplementary materials are available online, and the algorithm is implemented in the R package glmmixedlasso.
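The screen-then-refit recipe can be sketched as follows, ignoring the random-effects part of the GLMM for brevity (so this is plain lasso-screened logistic regression, not the glmmixedlasso algorithm itself); the simulated data and tuning constants are made up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, p = 100, 500                      # p >> n: the screening regime
X = rng.standard_normal((n, p))
prob = 1.0 / (1.0 + np.exp(-(X[:, 0] - X[:, 1])))
y = rng.binomial(1, prob)            # only variables 0 and 1 matter

# Step 1: L1-penalized fit as a variable screen (random effects omitted here).
screen = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
kept = np.flatnonzero(screen.coef_.ravel() != 0)

# Step 2: unpenalized maximum likelihood refit on the selected variables only,
# correcting the shrinkage bias introduced by the screening step.
refit = sm.Logit(y, sm.add_constant(X[:, kept])).fit(disp=0)
print("kept:", kept)
print(refit.params.round(2))
```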

4.
Boosting is a successful method for dealing with problems of high-dimensional classification of independent data. However, existing variants do not address the correlations in the context of longitudinal or cluster study-designs with measurements collected across two or more time points or in clusters. This article presents two new variants of boosting with a focus on high-dimensional classification problems with matched-pair binary responses or, more generally, any correlated binary responses. The first method is based on the generic functional gradient descent algorithm and the second method is based on a direct likelihood optimization approach. The performance and the computational requirements of the algorithms were evaluated using simulations. Whereas the performance of the two methods is similar, the computational efficiency of the generic-functional-gradient-descent-based algorithm far exceeds that of the direct-likelihood-optimization-based algorithm. The former method is illustrated using data on gene expression changes in de novo and relapsed childhood acute lymphoblastic leukemia. Computer code implementing the algorithms and the relevant dataset are available online as supplemental materials.
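The first variant builds on generic functional gradient descent; a bare-bones sketch of that generic scheme, for independent binary responses with logistic loss and stump base learners (not the paper's correlated-response variants), looks like this.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fgd_boost(X, y, n_rounds=100, nu=0.1):
    """Generic functional-gradient-descent boosting for y in {0,1} under
    the logistic loss, with depth-1 regression trees (stumps) as base learners."""
    F = np.zeros(len(y))                      # current function values
    learners = []
    for _ in range(n_rounds):
        p = 1.0 / (1.0 + np.exp(-F))
        residual = y - p                      # negative gradient of logistic loss
        tree = DecisionTreeRegressor(max_depth=1).fit(X, residual)
        learners.append(tree)
        F += nu * tree.predict(X)             # small step along the fitted gradient
    return F, learners

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
F, _ = fgd_boost(X, y)
print("training accuracy:", ((F > 0) == (y == 1)).mean())
```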

5.
Joint latent class modeling of disease prevalence and high-dimensional semicontinuous biomarker data has been proposed to study the relationship between diseases and their related biomarkers. However, statistical inference for the joint latent class modeling approach has proved very challenging because of the computational complexity of seeking maximum likelihood estimates. In this article, we propose a series of composite likelihoods for maximum composite likelihood estimation, as well as an enhanced Monte Carlo expectation–maximization (MCEM) algorithm for maximum likelihood estimation, in the context of joint latent class models. Theoretically, the maximum composite likelihood estimates are consistent and asymptotically normal. Numerically, we show that, compared with the MCEM algorithm that maximizes the full likelihood, the composite likelihood approach coupled with the quasi-Newton method not only substantially reduces the computational complexity and duration but also retains comparable estimation efficiency.

6.
MM Algorithms for Some Discrete Multivariate Distributions
The MM (minorization–maximization) principle is a versatile tool for constructing optimization algorithms. Every EM algorithm is an MM algorithm but not vice versa. This article derives MM algorithms for maximum likelihood estimation with discrete multivariate distributions such as the Dirichlet-multinomial and Connor–Mosimann distributions, the Neerchal–Morel distribution, the negative-multinomial distribution, certain distributions on partitions, and zero-truncated and zero-inflated distributions. These MM algorithms increase the likelihood at each iteration and reliably converge to the maximum from well-chosen initial values. Because they involve no matrix inversion, the algorithms are especially pertinent to high-dimensional problems. To illustrate the performance of the MM algorithms, we compare them to Newton’s method on data used to classify handwritten digits.
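Since every EM algorithm is an MM algorithm, a small EM for one of the zero-inflated distributions mentioned above conveys the flavor: closed-form updates, no matrix inversion, and the likelihood increases at every iteration. A sketch for a zero-inflated Poisson (illustrative, not the authors' code):

```python
import numpy as np

def zip_em(x, n_iter=200):
    """EM (hence MM) for a zero-inflated Poisson:
    P(X=0) = pi + (1-pi)exp(-lam), P(X=k) = (1-pi) Pois(k; lam) for k > 0."""
    x = np.asarray(x, dtype=float)
    pi, lam = 0.5, x[x > 0].mean()           # crude but serviceable start
    for _ in range(n_iter):
        # E-step: posterior probability that each zero came from the point mass.
        z = np.where(x == 0, pi / (pi + (1 - pi) * np.exp(-lam)), 0.0)
        # M-step: closed-form updates; each iteration increases the likelihood.
        pi = z.mean()
        lam = x.sum() / (1 - z).sum()
    return pi, lam

rng = np.random.default_rng(0)
n = 5000
inflated = rng.random(n) < 0.3
x = np.where(inflated, 0, rng.poisson(2.0, n))
print(zip_em(x))   # should land near (0.3, 2.0)
```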

7.
High-dimensional constrained matrix regression refers to multi-response, multi-predictor statistical regression with nonconvex constraints in the high-dimensional setting. Its mathematical model is an NP-hard matrix optimization problem, with wide applications in machine learning and artificial intelligence, medical-imaging diagnosis, gene-expression analysis, brain neural networks, risk management, and other fields. This paper surveys and reviews recent results on both the optimization theory and the algorithms for high-dimensional constrained matrix regression, and lists the corresponding key references.

8.
The traditional maximum likelihood estimator (MLE) is often of limited use in complex high-dimensional data due to the intractability of the underlying likelihood function. Maximum composite likelihood estimation (McLE) avoids full likelihood specification by combining a number of partial likelihood objects depending on small data subsets, thus enabling inference for complex data. A fundamental difficulty in making the McLE approach practicable is the selection from numerous candidate likelihood objects for constructing the composite likelihood function. In this article, we propose a flexible Gibbs sampling scheme for optimal selection of sub-likelihood components. The sampled composite likelihood functions are shown to converge to the one maximally informative on the unknown parameters in equilibrium, since sub-likelihood objects are chosen with probability depending on the variance of the corresponding McLE. A penalized version of our method generates sparse likelihoods with a relatively small number of components when the data complexity is intense. Our algorithms are illustrated through numerical examples on simulated data as well as real genotype single nucleotide polymorphism (SNP) data from a case–control study.

9.
Computing high-dimensional normal probability integrals has long been a topic of interest to statisticians. Early work was reviewed by Gupta (1963) [1], who also supplied an extensive bibliography; for more recent work see the monograph by Tong (1990) [2]. Although the literature is large, good algorithms exist only for the two- and three-dimensional cases (see, e.g., Zhang–Yang, 1993 [3]); for higher dimensions no generally accepted effective algorithm is available. In the high-dimensional case m > 3, most papers assume that the integration region or the correlation matrix has a special form; otherwise one must resort to Monte Carlo methods [4] or quasi-Monte Carlo methods (also called number-theoretic net methods; see, e.g., Fang–Wang, 1994 [5]). But even the quasi-Monte Carlo method, generally regarded as the better of the two, has a convergence rate of only O(n^{-2/m}), so for truly ...
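For illustration, a quasi-Monte Carlo estimate of P(X ≤ b) for a multivariate normal can be sketched with scrambled Sobol points (this assumes SciPy's scipy.stats.qmc module is available; the dimension, bound, and correlation matrix below are arbitrary).

```python
import numpy as np
from scipy.stats import norm, qmc

def mvn_prob_qmc(b, corr, m_pow2=13, seed=0):
    """Quasi-Monte Carlo estimate of P(X <= b) for X ~ N(0, corr):
    scrambled Sobol points in (0,1)^m are mapped through the inverse
    normal CDF, then correlated via the Cholesky factor of corr."""
    dim = len(b)
    L = np.linalg.cholesky(corr)
    u = qmc.Sobol(d=dim, scramble=True, seed=seed).random_base2(m_pow2)
    z = norm.ppf(u)                  # standard normal quasi-samples
    x = z @ L.T                      # rows now ~ N(0, corr)
    return np.mean(np.all(x <= b, axis=1))

dim = 5
corr = 0.5 * np.ones((dim, dim)) + 0.5 * np.eye(dim)  # equicorrelation 0.5
b = np.ones(dim)
print(mvn_prob_qmc(b, corr))
```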

10.
This article investigates likelihood inference for high-dimensional factor analysis of time series data. We develop a matrix decomposition technique to obtain expressions for the likelihood function and its derivatives. With such expressions, the traditional delta method, which relies heavily on the score function and Hessian matrix, can be extended to high-dimensional cases. We establish asymptotic theory, including consistency and asymptotic normality. Moreover, fast computational algorithms are developed for estimation. Applications to high-dimensional stock price data and portfolio analysis are discussed. The technical proofs of the asymptotic results and the computer codes are available online.
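One standard decomposition of this kind, shown here as a sketch rather than the article's exact technique, uses the Woodbury identity and the matrix determinant lemma for a factor-model covariance Σ = ΛΛ′ + Ψ with diagonal Ψ, so every p×p inverse and determinant reduces to a k×k core (k = number of factors).

```python
import numpy as np

def factor_loglik(X, Lam, psi):
    """Gaussian log-likelihood with Sigma = Lam @ Lam.T + diag(psi), evaluated
    via Woodbury and the matrix determinant lemma: only a k x k system appears."""
    n, p = X.shape
    k = Lam.shape[1]
    Psi_inv_Lam = Lam / psi[:, None]                 # Psi^{-1} Lam
    M = np.eye(k) + Lam.T @ Psi_inv_Lam              # k x k core matrix
    # log|Sigma| = log|M| + sum(log psi)  (determinant lemma)
    logdet = np.linalg.slogdet(M)[1] + np.log(psi).sum()
    # x' Sigma^{-1} x = x'Psi^{-1}x - (x'Psi^{-1}Lam) M^{-1} (Lam'Psi^{-1}x)
    A = X / psi[None, :]
    B = A @ Lam
    quad = np.einsum("ij,ij->i", A, X) - np.einsum(
        "ij,ij->i", B @ np.linalg.inv(M), B)
    return -0.5 * (n * p * np.log(2 * np.pi) + n * logdet + quad.sum())

rng = np.random.default_rng(0)
p, k, n = 200, 3, 50
Lam = rng.standard_normal((p, k))
psi = 0.5 + rng.random(p)
Sigma = Lam @ Lam.T + np.diag(psi)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
print(factor_loglik(X, Lam, psi))
```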

11.
We present an algorithm capable of reconstructing a non-manifold surface embedded as a point cloud in a high-dimensional space. Our algorithm extends a previously developed incremental method and produces a non-optimal triangulation, but will work for non-orientable surfaces, and for surfaces with certain types of self-intersection. The self-intersections must be ordinary double curves and are fitted locally by intersecting planes using a degenerate quadratic surface. We present the algorithm in detail and provide many examples, including a dataset describing molecular conformations of cyclo-octane.

12.
Statistical inference based on a selected model can be overly optimistic and even misleading because of the uncertainty inherent in the model selection procedure, especially in high-dimensional data analysis. In this article, we propose a bootstrap-based tilted correlation screening learning (TCSL) algorithm to alleviate this uncertainty. The algorithm is inspired by the recently proposed variable selection method, the TCS algorithm, which screens variables via the tilted correlation. Our algorithm can reduce the prediction error and make the interpretation more reliable. A further gain is its reduced computational cost relative to the TCS algorithm when the dimension is large. Extensive simulation examples and the analysis of one real dataset demonstrate the good performance of our algorithm. Supplementary materials for this article are available online.
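A rough sketch of the bootstrap-stabilized screening idea follows, with plain marginal correlation standing in for the tilted correlation (which is more involved); the sizes and thresholds are arbitrary.

```python
import numpy as np

def bootstrap_corr_screen(X, y, top_k=10, n_boot=200, seed=0):
    """Bootstrap stabilization of a simple correlation screen: record how
    often each variable ranks in the top_k by |marginal correlation| across
    resamples, and use the inclusion frequency as a stability score."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    freq = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)              # bootstrap resample
        Xb, yb = X[idx], y[idx]
        Xc = (Xb - Xb.mean(0)) / Xb.std(0)
        yc = (yb - yb.mean()) / yb.std()
        corr = np.abs(Xc.T @ yc) / n             # |marginal correlations|
        freq[np.argsort(corr)[-top_k:]] += 1
    return freq / n_boot                         # inclusion frequencies

rng = np.random.default_rng(1)
n, p = 100, 300
X = rng.standard_normal((n, p))
y = 2 * X[:, 0] - X[:, 1] + rng.standard_normal(n)
print(np.argsort(bootstrap_corr_screen(X, y))[-5:])  # 0 and 1 should appear
```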

13.
Kong Cui Juan, Liang Han Ying. Acta Mathematica Sinica, English Series, 2021, 37(12): 1803–1825.
We investigate the two-sample quantile difference by the empirical likelihood method when the responses with high-dimensional covariates of the two...

14.
We propose an empirical likelihood method to test whether the coefficients in a possibly high-dimensional linear model are equal to given values. The asymptotic distribution of the test statistic is independent of the number of covariates in the linear model.
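A low-dimensional sketch of such an empirical likelihood ratio test, written with Owen-style estimating equations (the article's high-dimensional calibration differs), might look like this.

```python
import numpy as np
from scipy.optimize import root
from scipy.stats import chi2

def el_test_coef(X, y, beta0):
    """Empirical-likelihood ratio test of H0: beta = beta0 in y = X beta + e,
    based on the estimating functions g_i = x_i (y_i - x_i' beta0)."""
    g = X * (y - X @ beta0)[:, None]             # n x p estimating functions
    def score(lam):                              # sum_i g_i / (1 + lam' g_i) = 0
        w = 1.0 + g @ lam
        return (g / w[:, None]).sum(axis=0)
    lam = root(score, np.zeros(X.shape[1])).x    # Lagrange multiplier
    stat = 2.0 * np.log(1.0 + g @ lam).sum()     # -2 log EL ratio, ~ chi2_p
    return stat, chi2.sf(stat, df=X.shape[1])

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.standard_normal((n, p))
beta = np.array([1.0, 0.5, 0.0])
y = X @ beta + rng.standard_normal(n)
print(el_test_coef(X, y, beta))                  # large p-value under H0
```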

15.
A new method is proposed for estimating a high-dimensional covariance matrix based on network structure with heteroscedastic response variables. The method greatly reduces the computational complexity by transforming the high-dimensional covariance matrix estimation problem into a low-dimensional linear regression problem, and it remains effective even when the sample size is finite; the estimation error decreases as the matrix dimension increases. In addition, the paper presents a method for identifying influential nodes in a network via the covariance matrix. This method is well suited to academic collaboration networks because it accounts for both a node's own contribution and its impact on other nodes.

16.
Multivariate normal mixtures provide a flexible model for high-dimensional data. They are widely used in statistical genetics, statistical finance, and other disciplines. Due to the unboundedness of the likelihood function, classical likelihood-based methods, which may have nice practical properties, are inconsistent. In this paper, we recommend a penalized likelihood method for estimating the mixing distribution. We show that the maximum penalized likelihood estimator is strongly consistent when the number of components has a known upper bound. We also explore a convenient EM-algorithm for computing the maximum penalized likelihood estimator. Extensive simulations are conducted to explore the effectiveness and the practical limitations of both the new method and the ratified maximum likelihood estimators. Guidelines are provided based on the simulation results.
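A one-dimensional sketch of a penalized EM is given below; the particular variance penalty, which keeps component variances away from zero and thereby cures the unbounded likelihood, is one common choice and an assumption here, not necessarily the article's exact form.

```python
import numpy as np
from scipy.stats import norm

def penalized_em(x, K=2, a_n=1.0, n_iter=300, seed=0):
    """EM for a univariate K-component normal mixture with the variance
    penalty -a_n*(s2x/sig2 + log sig2) per component (assumed form), which
    bounds the variances away from 0 and keeps the objective finite."""
    rng = np.random.default_rng(seed)
    n = len(x)
    s2x = x.var()
    w = np.full(K, 1.0 / K)
    mu = rng.choice(x, K, replace=False)
    sig2 = np.full(K, s2x)
    for _ in range(n_iter):
        # E-step: responsibilities.
        dens = w * norm.pdf(x[:, None], mu, np.sqrt(sig2))
        r = dens / dens.sum(axis=1, keepdims=True)
        nk = r.sum(axis=0)
        # M-step: the penalty adds 2*a_n*s2x / 2*a_n to the variance update.
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        ss = (r * (x[:, None] - mu) ** 2).sum(axis=0)
        sig2 = (ss + 2 * a_n * s2x) / (nk + 2 * a_n)
    return w, mu, sig2

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(2, 0.5, 200)])
print(penalized_em(x))
```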

17.
The semiparametric proportional odds model for survival data is useful when mortality rates of different groups converge over time. However, fitting the model by maximum likelihood proves computationally cumbersome for large datasets because the number of parameters exceeds the number of uncensored observations. We present here an alternative to the standard Newton-Raphson method of maximum likelihood estimation. Our algorithm, an example of a minorization-maximization (MM) algorithm, is guaranteed to converge to the maximum likelihood estimate whenever it exists. For large problems, both the algorithm and its quasi-Newton accelerated counterpart outperform Newton-Raphson by more than two orders of magnitude.

18.
The multinomial logit model is the most widely used model for unordered multi-category responses. However, applications are typically restricted to a few predictors because, in the high-dimensional case, maximum likelihood estimates frequently do not exist. In this paper we develop a boosting technique called multinomBoost that performs variable selection and fits the multinomial logit model even when predictors are high-dimensional. Since in multi-category models the effect of one predictor variable is represented by several parameters, one has to distinguish between variable selection and parameter selection. A special feature of the approach is that, in contrast to existing approaches, it selects variables rather than parameters. The method can also distinguish between mandatory and optional predictors, and it adapts to metric, binary, nominal, and ordinal predictors. Regularization within the algorithm allows the inclusion of nominal and ordinal variables with many categories; for ordinal predictors the order information is used. The performance of the boosting technique with respect to mean squared error, prediction error, and the identification of relevant variables is investigated in a simulation study. The method is applied to the national Indonesia contraceptive prevalence survey and to the identification of glass, and the results are compared with the Lasso approach, which selects parameters.

19.
This article proposes a penalized likelihood method to jointly estimate multiple precision matrices for use in quadratic discriminant analysis (QDA) and model-based clustering. We use a ridge penalty and a ridge fusion penalty to introduce shrinkage and promote similarity between precision matrix estimates. We use blockwise coordinate descent for optimization, and validation likelihood is used for tuning parameter selection. Our method is applied in QDA and semi-supervised model-based clustering.
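For a single matrix, the ridge-penalized precision estimate even has a closed form in the eigenbasis of the sample covariance; the fusion penalty couples several such problems across classes. A single-matrix sketch (our reconstruction, not the authors' code):

```python
import numpy as np

def ridge_precision(S, lam):
    """Closed-form ridge-penalized precision estimate: maximize
    log det(Omega) - tr(S Omega) - lam * ||Omega||_F^2.
    Stationarity gives Omega^{-1} = S + 2*lam*Omega, solved per eigenvalue:
    2*lam*w^2 + d*w - 1 = 0 with positive root w."""
    d, V = np.linalg.eigh(S)
    omega = (-d + np.sqrt(d ** 2 + 8 * lam)) / (4 * lam)
    return (V * omega) @ V.T                 # V diag(omega) V'

rng = np.random.default_rng(0)
n, p = 50, 100                               # p > n: sample covariance singular
X = rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)
Omega = ridge_precision(S, lam=0.5)
print(np.linalg.eigvalsh(Omega).min())       # strictly positive despite p > n
```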

20.
Gaussian graphical models represent the underlying graph structure of conditional dependence between random variables, which can be determined using their partial correlation or precision matrix. In a high-dimensional setting, the precision matrix is estimated using penalized likelihood by adding a penalization term, which controls the amount of sparsity in the precision matrix and totally characterizes the complexity and structure of the graph. The most commonly used penalization term is the L1 norm of the precision matrix scaled by the regularization parameter, which determines the trade-off between sparsity of the graph and fit to the data. In this article, we propose several procedures to select the regularization parameter in the estimation of graphical models that focus on recovering reliably the appropriate network structure of the graph. We conduct an extensive simulation study to show that the proposed methods produce useful results for different network topologies. The approaches are also applied in a high-dimensional case study of gene expression data with the aim to discover the genes relevant to colon cancer. Using these data, we find graph structures, which are verified to display significant biological gene associations. Supplementary material is available online.
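One simple selector of this kind, sketched with scikit-learn's graphical_lasso and a BIC grid search (the article studies several more refined procedures), is shown below.

```python
import numpy as np
from sklearn.covariance import empirical_covariance, graphical_lasso

def select_alpha_bic(X, alphas):
    """Pick the graphical-lasso regularization parameter by BIC; the edge
    count serves as the model dimension (diagonal terms are constant)."""
    n, p = X.shape
    S = empirical_covariance(X)
    best = (np.inf, None, None)
    for a in alphas:
        cov, prec = graphical_lasso(S, alpha=a)
        loglik = 0.5 * n * (np.linalg.slogdet(prec)[1] - np.trace(S @ prec))
        n_edges = (np.abs(prec[np.triu_indices(p, k=1)]) > 1e-8).sum()
        bic = -2 * loglik + np.log(n) * n_edges
        if bic < best[0]:
            best = (bic, a, prec)
    return best

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(10), np.eye(10), size=200)
bic, alpha, prec = select_alpha_bic(X, alphas=[0.01, 0.05, 0.1, 0.2, 0.5])
print(alpha, bic)
```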
