
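Research on the Number of Subpopulations in Mixture Models Based on the EM Algorithm
Cite this article: 赵杨璐, 段丹丹, 胡饶敏, 唐加山, 温勇, 袁克海. 基于EM算法的混合模型中子总体个数的研究[J]. 数理统计与管理, 2020, 39(1): 35-50.
Authors: 赵杨璐  段丹丹  胡饶敏  唐加山  温勇  袁克海
Affiliation: College of Science, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023
Abstract: Mixture models have become one of the most popular techniques in data analysis. Because they rest on an explicit mathematical model, they usually give more accurate results than the traditional methods of cluster analysis. The key factor is the number of subpopulations in the mixture model, which determines the final result of the analysis. The expectation-maximization (EM) algorithm is widely used for parameter estimation in mixture models, as well as in machine learning and clustering; it is an iterative algorithm for computing maximum likelihood estimates of parameters from incomplete data or data with missing values. Researchers usually determine the number of subpopulations with AIC and BIC, but in practice these two methods are unstable and may even give wrong results. To address this problem, this paper proposes a new method that uses a scree plot of the likelihood function to determine the number of subpopulations in a mixture model. Experimental results show that, under most ideal conditions, the proposed method gives the same number of subpopulations as the number of clusters determined by AIC and BIC, while for typical real data or under non-ideal conditions the scree plot method can give more reliable results. The new method is then applied to parameter estimation for the selected geyser data from Yellowstone National Park.

Keywords: EM  mixture model  number of subpopulations  scree plot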

On the Number of Components in Mixture Model Based on EM Algorithm
ZHAO Yang-lu, DUAN Dan-dan, HU Rao-min, TANG Jia-shan, WEN Yong, YUAN Ke-hai. On the Number of Components in Mixture Model Based on EM Algorithm[J]. Application of Statistics and Management, 2020, 39(1): 35-50.
Authors:ZHAO Yang-lu  DUAN Dan-dan  HU Rao-min  TANG Jia-shan  WEN Yong  YUAN Ke-hai
Institution: College of Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Abstract: Mixture modeling has become a popular technique in data analysis. Because it is model based, it typically yields more accurate results than conventional methods of cluster analysis. A key element is the number of components in the mixture model, which determines the final result of the data analysis. The expectation-maximization (EM) algorithm is commonly used for parameter estimation in mixture models and in the fields of machine learning and data clustering. EM is an iterative algorithm for computing maximum likelihood estimates of model parameters from incomplete data; here the number of components in the mixture model cannot be observed or is a missing value. Researchers often use AIC and BIC to determine the number of components in a mixture model. However, these two criteria are not reliable in applications and often yield misleading results in real data analysis. To address this problem, this paper proposes a new method for determining the number of components in mixture modeling: it uses the scree plot of the likelihood function to determine the number of clusters. The simulation results show that the scree plot method not only obtains the same number of components as AIC and BIC in most cases, but also yields more reliable results when conditions are not ideal, as is typical with real data. Subsequently, the new method is applied to parameter estimation for the geyser data from Yellowstone National Park.
Keywords: EM algorithm  mixture model  number of components  scree plot
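
Illustration: the abstract describes fitting mixture models by EM for several candidate numbers of components and reading the number off a scree plot of the likelihood. The sketch below is one possible reading of that idea, not the authors' procedure. It assumes a one-dimensional Gaussian mixture, a quantile-based initialisation, synthetic two-cluster data, and a hypothetical elbow rule (largest drop between successive log-likelihood gains) standing in for visual inspection of the plot. The names `em_loglik` and `scree_elbow` are invented for this example. AIC and BIC are printed alongside for comparison, using 3k - 1 free parameters for k components.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_loglik(data, k, n_iter=200, tol=1e-8, min_var=1e-3):
    """Fit a 1-D Gaussian mixture with k components by EM.

    Returns the log-likelihood computed in the last E-step.
    """
    n = len(data)
    xs = sorted(data)
    mean = sum(xs) / n
    total_var = sum((x - mean) ** 2 for x in xs) / n
    # Assumed initialisation: means at evenly spaced quantiles, shared variance.
    mus = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [max(total_var / k ** 2, min_var)] * k
    weights = [1.0 / k] * k
    prev = -math.inf
    loglik = prev
    for _ in range(n_iter):
        # E-step: responsibilities and the current log-likelihood.
        resp, loglik = [], 0.0
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, variances)]
            total = max(sum(dens), 1e-300)  # guard against underflow to zero
            loglik += math.log(total)
            resp.append([d / total for d in dens])
        if loglik - prev < tol:
            break
        prev = loglik
        # M-step: update weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, min_var)  # floor avoids degenerate spikes
    return loglik


def scree_elbow(logliks):
    """Hypothetical elbow rule; logliks[i] belongs to k = i + 1 components.

    Picks the k where the gain in log-likelihood falls off most sharply.
    It needs at least three values and can never return k = 1.
    """
    gains = [b - a for a, b in zip(logliks, logliks[1:])]
    drops = [g_in - g_out for g_in, g_out in zip(gains, gains[1:])]
    return drops.index(max(drops)) + 2


# Synthetic data (an assumption for this demo): two well-separated clusters.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(150)]
data += [rng.gauss(10.0, 1.0) for _ in range(150)]

logliks = [em_loglik(data, k) for k in range(1, 6)]
elbow = scree_elbow(logliks)

for k, ll in enumerate(logliks, start=1):
    p = 3 * k - 1  # k means, k variances, k - 1 free weights
    aic = 2 * p - 2 * ll
    bic = p * math.log(len(data)) - 2 * ll
    print(f"k={k}  loglik={ll:.1f}  AIC={aic:.1f}  BIC={bic:.1f}")
print("elbow of the log-likelihood scree curve: k =", elbow)
```

Plotting `logliks` against k gives the scree curve itself; the elbow rule here only automates one way of reading it and is not taken from the paper.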
This article is indexed in databases including 维普 and 万方数据.