Similar literature
 16 similar documents found (search time: 218 ms)
1.
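Based on a finite mixture model with latent variables, this paper proposes a Bayes clustering method for ordinal data. The EM algorithm is used to estimate the model parameters, the BIC criterion to determine the number of clusters, and a procedure similar to Bayes discrimination to classify each observation. Simulation results show that the proposed method clusters well and that the computational cost is acceptable for data sets of moderate size.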

2.
俞燕  徐勤丰  孙鹏飞 《应用数学》2006,19(3):600-605
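Based on a finite mixture model of Dirichlet distributions, this paper proposes a Bayes clustering method for compositional data. The EM algorithm is used to estimate the model parameters, the BIC criterion to determine the number of clusters, and a procedure similar to Bayes discrimination to classify each observation. The computational formulas are derived and a program is implemented. Simulation results show that the proposed method clusters well.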

3.
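Mixture-of-experts regression models are widely used for classification, clustering, and regression analysis of data from heterogeneous populations. For skew-normal data, this study proposes a joint location and scale mixture-of-experts regression model that simultaneously models the location, scale, and mixing-proportion parameters. Maximum likelihood estimation of the model parameters is studied using the MM and EM algorithms. Simulations and a real-data example illustrate the effectiveness and practicality of the model and method.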

4.
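Mixture-of-experts regression models are widely used for classification, clustering, and regression analysis of data from heterogeneous populations. For skew-normal data, this study proposes a joint location and scale mixture-of-experts regression model that simultaneously models the location, scale, and mixing-proportion parameters. Maximum likelihood estimation of the model parameters is studied using the MM and EM algorithms. Simulations and a real-data example illustrate the effectiveness and practicality of the model and method.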

5.
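Based on a finite mixture model with latent variables, this paper proposes a Bayes clustering method for ordinal data. The EM algorithm is used to estimate the model parameters, the BIC criterion to determine the number of clusters, and a procedure similar to Bayes discrimination to classify each observation. Simulation results show that the proposed method clusters well and that the computational cost is acceptable for data sets of moderate size.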

6.
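The skew-t-normal distribution is an important statistical tool for analyzing peaked, heavy-tailed data. This study proposes a mixture of linear joint location and scale models for skew-t-normal data, and studies maximum likelihood estimation of the model parameters via the EM algorithm and the Newton-Raphson method. Simulation experiments verify the effectiveness of the proposed method. Finally, an analysis of real data shows that the model and method are practical and feasible.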

7.
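Mixture models have become one of the most popular techniques in data analysis. Because they rest on an explicit mathematical model, they usually give more accurate results than traditional cluster-analysis methods. The key factor is the number of subpopulations in the mixture, which determines the final result of the analysis. The expectation-maximization (EM) algorithm, commonly used for parameter estimation in mixture models as well as in machine learning and clustering, is an iterative algorithm for computing maximum likelihood estimates from incomplete data or data with missing values. Researchers often use AIC and BIC to determine the number of subpopulations, but in practice these two criteria are unstable and may even give wrong results. To address this problem, the paper proposes a new method that uses a scree plot of the likelihood function to determine the number of subpopulations in a mixture model. Experiments show that, in most ideal settings, the proposed method selects the same number of subpopulations as AIC and BIC, while on typical real data or under non-ideal conditions the scree-plot method gives more reliable results. The new method is then applied to parameter estimation for the selected Yellowstone National Park geyser data.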
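As a rough illustration of the quantities this abstract compares, the sketch below fits one-dimensional Gaussian mixtures by EM for several candidate numbers of components and reports the log-likelihood (the values a likelihood scree plot would display) together with AIC and BIC. It is a minimal sketch under stated assumptions, not the paper's method: the synthetic data, the function names `fit_gmm_em` and `information_criteria`, the random initialization, and the variance floor are all hypothetical choices made for this example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_em(data, k, n_iter=200, seed=0):
    """Fit a k-component 1-D Gaussian mixture by EM; return the final log-likelihood."""
    rng = random.Random(seed)
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    means = rng.sample(data, k)            # assumption: start from k random observations
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    var_floor = 1e-3 * overall_var         # assumption: floor to avoid degenerate components
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: weighted updates of mixing weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, var_floor)
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)) or 1e-300)
        for x in data
    )


def information_criteria(loglik, k, n):
    """Return (AIC, BIC) for a 1-D Gaussian mixture with k components."""
    n_params = 3 * k - 1                   # k means, k variances, k - 1 free weights
    return 2 * n_params - 2 * loglik, n_params * math.log(n) - 2 * loglik


# Synthetic two-cluster data (an assumption; not the data used in the abstract).
rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(150)] + [rng.gauss(6.0, 1.0) for _ in range(150)]

results = {}
for k in range(1, 5):
    loglik = fit_gmm_em(data, k)
    aic, bic = information_criteria(loglik, k, len(data))
    results[k] = (loglik, aic, bic)
    print(f"k={k}  loglik={loglik:9.2f}  AIC={aic:9.2f}  BIC={bic:9.2f}")
```

Plotting the printed log-likelihood against k and looking for the point where the gain levels off gives a scree-style reading of the number of components; the AIC and BIC columns give the penalized alternatives for comparison.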

8.
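Estimation and asymptotic properties of semiparametric generalized linear mixed-effects models   Total citations: 1 (self-citations: 0, citations by others: 1)
Semiparametric generalized linear mixed-effects models are widely used in psychology, breeding, medicine, and other fields. Zhang (1998) estimated the parametric and nonparametric parts of the model by maximum penalized likelihood estimation (MPLE), but that method applies only to models for normal data. For commonly used models such as the Poisson, the usual approach is to treat the random effects as missing data and then apply the EM algorithm. Building on the MCNR algorithm proposed by McCulloch (1997), this paper extends that algorithm to semiparametric generalized linear mixed-effects models and obtains the corresponding estimation algorithm. For the nonparametric part, P-splines are used for fitting and the smoothing parameter is selected by GCV; the consistency and asymptotic normality of the resulting estimators are also proved. Finally, simulations and a real example compare the method with other algorithms and verify its effectiveness.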

9.
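When analyzing heterogeneous and asymmetric data, skew-normal mixture models offer a more flexible modeling approach than the classical Gaussian mixture model. However, because the likelihood function is unbounded and the shape parameter can diverge, the maximum likelihood estimator of the model is not well defined, which in turn leads to unsatisfactory inference. To address both problems simultaneously, this paper proposes a new estimation scheme based on penalized likelihood and proves that the resulting penalized maximum likelihood estimator is strongly consistent when the number of components in the mixture is greater than or equal to the true number of components. A corresponding penalized EM (expectation maximization) algorithm is also proposed to compute the penalized estimator. Finally, simulations compare the finite-sample performance of the estimator with existing methods, and two real examples illustrate the effectiveness of the method.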

10.
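Tweedie family distributions are often used in property insurance to model claim amounts, while mixture-of-experts regression models have been widely studied in statistics and machine learning and are used for classification, clustering, and regression analysis of data from heterogeneous populations. Based on the Tweedie family, this paper proposes a generalized linear joint mean and dispersion mixture-of-experts regression model, offering a reference for the development of actuarial techniques for non-life insurance ratemaking. The maximum likelihood estimates of the model are then obtained with the EM algorithm, and simulation experiments verify the effectiveness of the proposed method. Finally, an analysis of Air Quality Index (AQI) data shows that the model and method are practical and feasible.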

11.
A new statistical methodology is developed for fitting left-truncated loss data by using the G-component finite mixture model with any combination of Gamma, Lognormal, and Weibull distributions. The EM algorithm, along with the emEM initialization strategy, is employed for model fitting. We propose a new grid map which considers the model selection criterion (AIC or BIC) and risk measures at the same time, by using the entire space of models under consideration. A simulation study validates our proposed approach. The application of the proposed methodology and use of new grid maps are illustrated through analyzing a real data set that includes left-truncated insurance losses.

12.
The problem of estimating the number of hidden states in a hidden Markov model is considered. Emphasis is placed on cross-validated likelihood criteria. Using cross-validation to assess the number of hidden states makes it possible to circumvent the well-documented technical difficulties of the order identification problem in mixture models. Moreover, from a predictive perspective, it does not require that the sampling distribution belong to one of the competing models. However, computing the cross-validated likelihood for hidden Markov models when only one training sample is available is difficult, since the data are not independent. Two approaches are proposed to compute the cross-validated likelihood for a hidden Markov model. The first uses a deterministic half-sampling procedure, and the second adapts the EM algorithm for hidden Markov models to take into account the randomly missing values induced by cross-validation. Numerical experiments on both simulated and real data sets compare different versions of the cross-validated likelihood criterion with penalised likelihood criteria, including BIC and a penalised marginal likelihood criterion. These experiments highlight the promising behaviour of the deterministic half-sampling criterion.

13.
Count data with excess zeros are often encountered in medical, biomedical, and public health applications. In this paper, an extension of zero-inflated Poisson mixed regression models is presented for dealing with multilevel data sets, referred to as hierarchical mixture zero-inflated Poisson mixed regression models. A stochastic EM algorithm is developed for obtaining the ML estimates of the parameters of interest, and models with different numbers of latent classes are compared through the BIC criterion. An application to count data from a Shanghai Adolescence Fitness Survey and a simulation study illustrate the usefulness and effectiveness of our methodologies.
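To make the zero-inflation idea concrete, here is a minimal sketch of EM for a plain zero-inflated Poisson model with no covariates, no random effects, and no latent classes. It is a deliberately simplified, deterministic EM, not the stochastic EM or the hierarchical mixture model of the abstract; the function names, starting values, and simulated data are assumptions made for illustration.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw a Poisson variate by Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def fit_zip_em(counts, n_iter=200):
    """EM for a zero-inflated Poisson with P(Y = 0) = pi + (1 - pi) * exp(-lam)."""
    n = len(counts)
    total = sum(counts)
    n_zero = sum(1 for y in counts if y == 0)
    pi, lam = 0.5, max(total / n, 1e-6)    # assumption: crude starting values
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero.
        z = pi / (pi + (1.0 - pi) * math.exp(-lam))
        expected_structural = n_zero * z
        # M-step: closed-form updates for the mixing proportion and the rate.
        pi = expected_structural / n
        lam = max(total / (n - expected_structural), 1e-6)
    return pi, lam


# Simulated data (hypothetical parameters chosen for this example).
rng = random.Random(7)
true_pi, true_lam = 0.3, 2.5
counts = [0 if rng.random() < true_pi else sample_poisson(rng, true_lam) for _ in range(2000)]

pi_hat, lam_hat = fit_zip_em(counts)
print(f"estimated pi = {pi_hat:.3f}, estimated lambda = {lam_hat:.3f}")
```

Only the zeros carry missing information here, so the E-step reduces to a single probability shared by all observed zeros. Adding covariates or random effects, as in the abstract, replaces the closed-form M-step with a regression fit.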

14.
Finite mixture models have been used to fit data from heterogeneous populations in many applications. The Expectation Maximization (EM) algorithm is the most popular method for estimating the parameters of a finite mixture model. A Bayesian approach is another method for fitting a mixture model. However, the EM algorithm often converges to a local maximum and is sensitive to the choice of starting points. In the Bayesian approach, the Markov Chain Monte Carlo (MCMC) chain sometimes converges to a local mode and has difficulty moving to another mode. Hence, in this paper we propose a new method to overcome this limitation of the EM algorithm, so that EM can estimate the parameters at the global maximum region, and to develop a more effective Bayesian approach, so that the MCMC chain moves from one mode to another more easily in the mixture model. Our approach uses both simulated annealing (SA) and adaptive rejection Metropolis sampling (ARMS). Although SA is a well-known approach for detecting distinct modes, its limitation is the difficulty of choosing a sequence of suitable proposal distributions for a target distribution. Since ARMS uses a piecewise linear envelope function as a proposal distribution, we incorporate ARMS into an SA approach so that we can start from a more suitable proposal distribution and detect separate modes. As a result, we can detect the global maximum region and estimate the parameters in this region. We refer to this approach as ARMS annealing. By combining ARMS annealing with the EM algorithm and with the Bayesian approach, respectively, we propose two approaches: an EM-ARMS annealing algorithm and a Bayesian-ARMS annealing approach. We compare our two approaches with the traditional EM algorithm alone and the Bayesian approach alone using simulation, showing that our two approaches are comparable to each other but perform better than either the EM algorithm alone or the Bayesian approach alone. Our two approaches detect the global maximum region well and estimate the parameters in this region. We demonstrate the advantage of our approaches using an example of a mixture of two Poisson regression models. This mixture model is used to analyze survey data on the number of charitable donations.
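The sensitivity of EM to starting points that motivates this abstract is commonly handled, in its simplest form, by running EM from several random starts and keeping the fit with the highest log-likelihood. The sketch below does this for a mixture of two Poisson distributions without covariates. It is a plain multi-start baseline, not the EM-ARMS annealing algorithm of the abstract; the function names, the number of starts, and the simulated data are assumptions for illustration.

```python
import math
import random


def poisson_pmf(y, lam):
    """Poisson probability mass at y, computed via the log scale for stability."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def sample_poisson(rng, lam):
    """Draw a Poisson variate by Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def em_two_poisson(counts, lam1, lam2, weight=0.5, n_iter=200):
    """EM for weight * Pois(lam1) + (1 - weight) * Pois(lam2)."""
    n = len(counts)
    for _ in range(n_iter):
        # E-step: posterior probability that each count came from component 1.
        resp = []
        for y in counts:
            a = weight * poisson_pmf(y, lam1)
            b = (1.0 - weight) * poisson_pmf(y, lam2)
            resp.append(a / (a + b) if a + b > 0.0 else 0.5)
        # M-step: update the mixing weight and both rates, kept off the boundary.
        s = sum(resp)
        weight = min(max(s / n, 1e-6), 1.0 - 1e-6)
        lam1 = max(sum(r * y for r, y in zip(resp, counts)) / max(s, 1e-12), 1e-6)
        lam2 = max(sum((1.0 - r) * y for r, y in zip(resp, counts)) / max(n - s, 1e-12), 1e-6)
    loglik = sum(
        math.log(weight * poisson_pmf(y, lam1) + (1.0 - weight) * poisson_pmf(y, lam2) or 1e-300)
        for y in counts
    )
    return loglik, weight, lam1, lam2


def multi_start_em(counts, n_starts=5, seed=0):
    """Run EM from several random starting rates; return the best fit and all fits."""
    rng = random.Random(seed)
    upper = max(counts) + 1.0
    fits = [
        em_two_poisson(counts, rng.uniform(0.1, upper), rng.uniform(0.1, upper))
        for _ in range(n_starts)
    ]
    return max(fits), fits                 # tuples compare on log-likelihood first


# Simulated counts from 0.4 * Pois(1) + 0.6 * Pois(8) (hypothetical parameters).
rng = random.Random(3)
counts = [sample_poisson(rng, 1.0 if rng.random() < 0.4 else 8.0) for _ in range(400)]

best, fits = multi_start_em(counts)
low_rate, high_rate = sorted(best[2:])
print(f"best loglik = {best[0]:.2f}, rates = ({low_rate:.2f}, {high_rate:.2f})")
```

Comparing the log-likelihoods in `fits` shows whether different starts ended in different solutions. Multi-start only samples the starting space, which is the weakness that annealing-based schemes aim to address.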

15.
Recovering a system model from noisy data is a key challenge in the analysis of dynamical systems. Based on a data-driven identification approach, we develop a model selection algorithm called Entropy Regression Bayesian Information Criterion (ER-BIC). First, the entropy regression (ER) identification algorithm is used to obtain candidate models that are close to the Pareto optimum, and these are combined into a library of candidate models. Second, the BIC score of each model in the library is calculated using the Bayesian information criterion (BIC), and the scores are ranked from smallest to largest. Third, the model with the smallest BIC score is selected as the one to be optimized. Finally, the ER-BIC algorithm is applied to several classical dynamical systems, including one-dimensional polynomial and RC circuit systems, two-dimensional Duffing and classical ODE systems, and the three-dimensional Lorenz 63 and Lorenz 84 systems. The results show that the new algorithm accurately identifies the system model in the presence of noise and the time variable $t$, laying the foundation for nonlinear analysis.
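The ranking step described in this abstract (score every candidate with BIC, sort from smallest to largest, keep the smallest) can be sketched on a toy problem. In the example below the candidate library is simply polynomials of degree 0 to 5 fitted by ordinary least squares to noisy quadratic data. The entropy regression step is not reproduced, and the data, the function names, and the Gaussian-error form of BIC, n ln(RSS/n) + k ln n, are assumptions made for illustration.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations; returns (coeffs, rss)."""
    p = degree + 1
    gram = [[sum(x ** (i + j) for x in xs) for j in range(p)] for i in range(p)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(p)]
    coeffs = solve_linear(gram, rhs)
    rss = sum((y - sum(c * x ** i for i, c in enumerate(coeffs))) ** 2 for x, y in zip(xs, ys))
    return coeffs, rss


def gaussian_bic(rss, n, n_params):
    """BIC for a Gaussian-error model, up to an additive constant."""
    return n * math.log(rss / n) + n_params * math.log(n)


# Noisy observations of 1 - 2x + 3x^2 on [-1, 1] (hypothetical test system).
rng = random.Random(1)
xs = [i / 50.0 - 1.0 for i in range(101)]
ys = [1.0 - 2.0 * x + 3.0 * x * x + rng.gauss(0.0, 0.1) for x in xs]

bic_by_degree = {}
for degree in range(0, 6):
    _, rss = fit_polynomial(xs, ys, degree)
    bic_by_degree[degree] = gaussian_bic(rss, len(xs), degree + 1)

ranking = sorted(bic_by_degree.items(), key=lambda item: item[1])   # smallest BIC first
best_degree, best_bic = ranking[0]
for degree, bic in ranking:
    print(f"degree={degree}  BIC={bic:8.2f}")
```

The underfitted candidates (degrees 0 and 1) receive much larger scores because their residuals are large, while the penalty term separates the adequate candidates by parameter count.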

16.
Basing cluster analysis on mixture models has become a classical and powerful approach. It enables some classical criteria, such as the well-known k-means criterion, to be explained. To classify the rows or the columns of a contingency table, an adapted version of k-means known as Mndki2, which uses the chi-square distance, can be used. Unfortunately, this simple and effective method, which can be used jointly with correspondence analysis based on the same representation of the data, cannot be associated with a mixture model in the same way as the classical k-means algorithm. In this paper we show that the Mndki2 algorithm can be viewed as an approximation of a classifying version of the EM algorithm for a mixture of multinomial distributions. The algorithms arising in this context are compared experimentally using Monte Carlo simulations.
