Unsupervised Feature Selection for Latent Dirichlet Allocation (in English)


Cite this article:	徐蔚然, 杜刚, 陈光, 郭军, 杨洁. Unsupervised Feature Selection for LDA (in English)[J]. 中国通信学报, 2011, 8(5):54-62.

Authors:	徐蔚然, 杜刚, 陈光, 郭军, 杨洁

Funding:	Supported by the National Natural Science Foundation of China under Grants No. 60905017 and 61072061; the National High Technology Research and Development Program of China (863 Program) under Grant No. 2009AA01A346; the 111 Project of China under Grant No. B08004; and the Special Project for Innovative Young Researchers of Beijing University of Posts and Telecommunications

Abstract:	As a generative model, Latent Dirichlet Allocation (LDA) focuses on how to generate data and does not optimize the discrimination capability of its topics. This paper aims to improve that discrimination capability through unsupervised feature selection. Theoretical analysis shows that the discrimination capability of a topic is limited by the discrimination capability of its representative words. The discrimination capability of a word is approximated by the Information Gain of the word for topics, which is used to d...
Received:	2011-10-27
Unsupervised Feature Selection for Latent Dirichlet Allocation

Xu Weiran, Du Gang, Chen Guang, Guo Jun, Yang Jie. Unsupervised Feature Selection for Latent Dirichlet Allocation[J]. China communications magazine, 2011, 8(5):54-62.

Authors:	Xu Weiran, Du Gang, Chen Guang, Guo Jun, Yang Jie

Institution:	School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, P.R. China

Abstract:	As a generative model, Latent Dirichlet Allocation (LDA) focuses on how to generate data and does not optimize the discrimination capability of its topics. This paper aims to improve that discrimination capability through unsupervised feature selection. Theoretical analysis shows that the discrimination capability of a topic is limited by the discrimination capability of its representative words. The discrimination capability of a word is approximated by the Information Gain of the word for topics, which is used to distinguish between "general words" and "special words" in LDA topics. We therefore add a constraint to the LDA objective function so that "general words" occur only in "general topics" rather than in "special topics". A heuristic algorithm is then presented to obtain the solution. Experiments show that this method not only improves the information gain of topics but also makes the topics easier for humans to understand.
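
The abstract does not give the exact formula, so the following is only a minimal sketch of one common way to score the Information Gain of a word for topics, assuming a fitted topic-word matrix P(w|z) and a topic prior P(z) are available. The function names `entropy` and `word_information_gain`, the toy matrices, and the choice of the standard presence/absence definition of information gain are all assumptions made for illustration, not the paper's method.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits along the last axis, treating 0*log(0) as 0."""
    p = np.asarray(p, dtype=float)
    safe = np.where(p > 0, p, 1.0)  # log2(1) = 0, so zero entries contribute 0
    return -(p * np.log2(safe)).sum(axis=-1)

def word_information_gain(topic_word, topic_prior):
    """Information gain of each word about the topic variable (assumed definition).

    topic_word:  array (K, V), row k is P(w | z=k); each row sums to 1.
    topic_prior: array (K,), P(z); sums to 1.
    Assumes every word has marginal probability strictly between 0 and 1.
    """
    topic_word = np.asarray(topic_word, dtype=float)
    topic_prior = np.asarray(topic_prior, dtype=float)

    joint = topic_prior[:, None] * topic_word       # P(z, w), shape (K, V)
    p_w = joint.sum(axis=0)                         # P(w), shape (V,)
    p_z_given_w = (joint / p_w).T                   # P(z | w), shape (V, K)

    joint_not = topic_prior[:, None] - joint        # P(z, not w)
    p_z_given_not_w = (joint_not / (1.0 - p_w)).T   # P(z | not w), shape (V, K)

    conditional = p_w * entropy(p_z_given_w) + (1.0 - p_w) * entropy(p_z_given_not_w)
    return entropy(topic_prior) - conditional       # IG(w) = H(Z) - H(Z | w present?)

# Toy example (hypothetical numbers): word 0 is shared by both topics,
# words 1 and 2 each belong to a single topic.
topic_word = np.array([[0.5, 0.5, 0.0],
                       [0.5, 0.0, 0.5]])
topic_prior = np.array([0.5, 0.5])
ig = word_information_gain(topic_word, topic_prior)
```

In this toy setup the shared word 0 gets an information gain of zero, while the topic-specific words 1 and 2 get equal positive scores. Under the stated assumptions, a low score would mark a candidate "general word" and a high score a candidate "special word".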

Keywords:	pattern recognition; unsupervised feature selection; Latent Dirichlet Allocation; general topic; special topic
This article is indexed in VIP (维普) and other databases.