
Unsupervised Feature Selection for LDA (in English)
Cite this article: Xu Weiran, Du Gang, Chen Guang, Guo Jun, Yang Jie. Unsupervised Feature Selection for LDA (in English)[J]. China communications magazine, 2011, 8(5): 54-62.
Authors: Xu Weiran, Du Gang, Chen Guang, Guo Jun, Yang Jie
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60905017 and 61072061; the National High Technology Research and Development Program of China (863 Program) under Grant No. 2009AA01A346; the 111 Project of China under Grant No. B08004; and the Special Project for Innovative Young Researchers of Beijing University of Posts and Telecommunications

Received: 2011-10-27

Unsupervised Feature Selection for Latent Dirichlet Allocation
Xu Weiran, Du Gang, Chen Guang, Guo Jun, Yang Jie. Unsupervised Feature Selection for Latent Dirichlet Allocation[J]. China communications magazine, 2011, 8(5): 54-62.
Authors: Xu Weiran, Du Gang, Chen Guang, Guo Jun, Yang Jie
Institution: School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, P.R. China
Abstract: As a generative model, Latent Dirichlet Allocation (LDA) focuses on how to generate data and does not optimize the discrimination capability of its topics. This paper aims to improve that discrimination capability through unsupervised feature selection. Theoretical analysis shows that the discrimination capability of a topic is limited by the discrimination capability of its representative words. The discrimination capability of a word is approximated by the Information Gain of the word with respect to topics, which is used to distinguish “general words” from “special words” in LDA topics. We therefore add a constraint to the LDA objective function so that “general words” occur only in “general topics” rather than in “special topics”. A heuristic algorithm is then presented to obtain the solution. Experiments show that this method not only improves the information gain of topics but also makes the topics easier for humans to understand.
Keywords: pattern recognition; unsupervised feature selection; Latent Dirichlet Allocation; general topic; special topic
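
The abstract says that a word's discrimination capability is approximated by its Information Gain with respect to topics, and that this score separates “general words” from “special words”. The paper's exact formula is not given in this record, so the following is only a minimal sketch under an assumed, simplified definition: IG(w) = H(Z) - H(Z | w), where Z is the topic variable. The function names, the input layout (a topic prior plus per-topic word probabilities), and the threshold are all hypothetical and chosen for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def word_information_gain(topic_prior, topic_word):
    """Score each word by H(Z) - H(Z | w) (assumed, simplified definition).

    topic_prior: list of P(z) for the K topics.
    topic_word:  list of K dicts mapping word -> P(w | z).
    """
    h_z = entropy(topic_prior)
    vocab = set().union(*topic_word)
    gains = {}
    for w in vocab:
        # Joint P(z, w) for every topic, then normalise to the posterior P(z | w).
        joint = [pz * tw.get(w, 0.0) for pz, tw in zip(topic_prior, topic_word)]
        p_w = sum(joint)
        if p_w == 0:
            continue  # word never generated by any topic
        posterior = [j / p_w for j in joint]
        gains[w] = h_z - entropy(posterior)
    return gains


def split_general_special(gains, threshold):
    """Hypothetical split: low-gain words are 'general', the rest 'special'."""
    general = sorted(w for w, g in gains.items() if g < threshold)
    special = sorted(w for w, g in gains.items() if g >= threshold)
    return general, special


# Toy example: "the" is equally likely under both topics, so observing it
# says nothing about the topic; "gene" and "stock" each identify one topic.
prior = [0.5, 0.5]
topics = [{"the": 0.5, "gene": 0.5}, {"the": 0.5, "stock": 0.5}]
gains = word_information_gain(prior, topics)
general_words, special_words = split_general_special(gains, threshold=0.5)
```

In this toy setup, a word spread evenly across topics scores zero and lands in the general group, while a word confined to one topic scores the full topic entropy. The constraint on the LDA objective and the heuristic solver mentioned in the abstract are not reproduced here.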