
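Improvement of Feature Selection Algorithm in Maximum Entropy Model and Disambiguation of Error-Correction Candidates
Cite this article: ZHANG Yang-sen, CAO Yuan-da, YU Shi-wen. Improvement of Feature Selection Algorithm in Maximum Entropy Model and Disambiguation of Error-Correction Candidates[J]. Journal of Beijing Institute of Technology, 2006, 26(1): 36-40
Authors: ZHANG Yang-sen, CAO Yuan-da, YU Shi-wen
Affiliations: Institute of Computational Linguistics, Peking University, Beijing 100871; Department of Computer and Automation, Beijing Information Science and Technology University, Beijing 100101; School of Software, Beijing Institute of Technology, Beijing 100081; Institute of Computational Linguistics, Peking University, Beijing 100871
Funding: Ministry of Science and Technology research project; National Key Science and Technology Project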
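Abstract: The feature selection method for building language models under the maximum entropy principle is improved. A candidate feature set is obtained from the training samples using feature templates, and features are chosen from this set by combining frequency with average mutual information. When selecting effective features, candidates whose frequency exceeds a given threshold, or whose average mutual information is very large, are added to the effective feature set directly, and the parameter estimation procedure is not invoked each time a feature is selected, which speeds up feature selection. The improved algorithm is applied to disambiguating text error-correction suggestions, and experiments show that the improved feature selection algorithm is effective.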

Keywords: maximum entropy method; feature selection; language modeling; error-correction disambiguation
Article ID: 1001-0645(2006)01-0036-05
Received: 2005-04-12
Revised: 2005-04-12

Improvement of Feature Selection Algorithm in Maximum Entropy Model and Disambiguation of Error-Correction Candidates
ZHANG Yang-sen, CAO Yuan-da and YU Shi-wen. Improvement of Feature Selection Algorithm in Maximum Entropy Model and Disambiguation of Error-Correction Candidates[J]. Journal of Beijing Institute of Technology (Natural Science Edition), 2006, 26(1): 36-40
Authors: ZHANG Yang-sen, CAO Yuan-da, YU Shi-wen
Affiliation: 1. Institute of Computational Linguistics, Peking University, Beijing 100871, China; 2. School of Computer Software, Beijing Institute of Technology, Beijing 100081, China; 3. Department of Computer and Automation, Beijing Information and Technology University, Beijing 100101, China
Abstract: An improved feature selection algorithm for the maximum entropy modeling approach is presented. A candidate feature set is acquired from the training corpus using feature templates, and features are selected from it by combining feature frequency with average mutual information. During selection, candidate features whose frequency or average mutual information exceeds a threshold are put into the effective feature set directly. The parameter estimation algorithm is not executed after each feature is chosen, so feature selection is faster. The improved model is applied to ranking error-correction candidates. Experiments show that it achieves higher efficiency and precision.
Keywords: maximum entropy approach; feature selection; language modeling; error-correction disambiguation
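
The direct-admission step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the average mutual information here is assumed to be the standard four-cell mutual information between the binary events "context predicate fires" and "label is y". The function names, the sample format, and both thresholds are hypothetical. Candidates that are not admitted directly are only returned; the later stage that scores them and re-estimates model parameters is not shown.

```python
import math
from collections import Counter


def average_mutual_information(n_xy, n_x, n_y, n):
    """Mutual information between two binary events, computed from counts.

    n_xy: samples where predicate x fires and the label is y
    n_x:  samples where predicate x fires
    n_y:  samples whose label is y
    n:    total number of samples
    (Assumed definition; the abstract does not give the formula.)
    """
    # Each cell is (joint count, marginal count for x-side, marginal for y-side).
    cells = [
        (n_xy, n_x, n_y),                          # x fires, label is y
        (n_x - n_xy, n_x, n - n_y),                # x fires, label is not y
        (n_y - n_xy, n - n_x, n_y),                # x silent, label is y
        (n - n_x - n_y + n_xy, n - n_x, n - n_y),  # x silent, label is not y
    ]
    ami = 0.0
    for joint, margin_x, margin_y in cells:
        if joint > 0:  # empty cells contribute nothing
            ami += (joint / n) * math.log((joint * n) / (margin_x * margin_y))
    return ami


def select_features(samples, freq_threshold, ami_threshold):
    """Split candidate (predicate, label) features into directly admitted ones
    and the rest. `samples` is a list of (set_of_predicates, label) pairs."""
    n = len(samples)
    pair_counts, pred_counts, label_counts = Counter(), Counter(), Counter()
    for predicates, label in samples:
        label_counts[label] += 1
        for pred in set(predicates):
            pred_counts[pred] += 1
            pair_counts[(pred, label)] += 1

    selected, remaining = [], []
    for (pred, label), n_xy in pair_counts.items():
        ami = average_mutual_information(
            n_xy, pred_counts[pred], label_counts[label], n
        )
        # Admit directly when either criterion passes its (hypothetical) threshold.
        if n_xy > freq_threshold or ami > ami_threshold:
            selected.append((pred, label))
        else:
            remaining.append((pred, label))
    return selected, remaining


samples = [({"a"}, "Y"), ({"a"}, "Y"), ({"b"}, "N"), ({"b"}, "N")]
selected, remaining = select_features(samples, freq_threshold=5, ami_threshold=0.5)
```

In this toy data each predicate perfectly predicts its label, so both candidate features pass the mutual-information test even though neither is frequent enough to pass the frequency test.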