
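Chinese Error Correction in Search Engines Based on an N-gram Statistical Model
Cite this article: CHEN Zhi-peng, LV Yu-qin, LIU Hua-sheng, LIU Gang, TU Hui. Chinese Error Correction in Search Engines Based on an N-gram Statistical Model[J]. Journal of China Academy of Electronics and Information Technology, 2009, 4(3).
Authors: CHEN Zhi-peng  LV Yu-qin  LIU Hua-sheng  LIU Gang  TU Hui
Affiliation: School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876
Abstract: Keyword correction is an important auxiliary function for improving retrieval efficiency in search engines. This paper proposes a method that relies entirely on analyzing contextual statistical information. Taking the characteristics of the Chinese language into account, the method builds an N-gram statistical model, analyzes and compares the results, and then computes TF/IDF weights to obtain the best correction. Experiments verify that the method achieves automatic checking and correction of keywords entered into a search engine.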

Keywords: search engine  input error correction  N-gram model  TF/IDF

Chinese Spelling Correction in Search Engines Based on N-gram Model
CHEN Zhi-peng, LV Yu-qin, LIU Hua-sheng, LIU Gang, TU Hui. Chinese Spelling Correction in Search Engines Based on N-gram Model[J]. Journal of China Academy of Electronics and Information Technology, 2009, 4(3).
Authors:CHEN Zhi-peng  LV Yu-qin  LIU Hua-sheng  LIU Gang  TU Hui
Institution: School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
Abstract: Keyword spelling correction plays an important part in improving the efficiency of a search engine. This article discusses a method that analyzes only context-sensitive statistics. In keeping with the characteristics of the Chinese language, the method builds an N-gram model, analyzes and compares the results, and then calculates TF/IDF weights to obtain the best correction. The correction model is tested in actual practice and is proved effec...
Keywords:search engine  spelling correction  N-grams model  TF/IDF weight  
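
The abstract names two ingredients, an N-gram model and TF/IDF weights, without giving details of how they are combined. The following is a minimal sketch of one way such a pipeline could look, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the toy `CORPUS`, the hypothetical `CONFUSION` set of easily confused characters, the use of character bigrams with add-one smoothing, the particular IDF formula, and the choice to simply add the two scores.

```python
import math
from collections import Counter

# Toy collection of correct queries (assumption; a real system would use logs or documents).
CORPUS = ["北京大学", "北京天气", "北京邮电大学", "搜索引擎", "天气预报"]

# Hypothetical confusion set: characters that are easily mistyped for one another.
CONFUSION = {"京": ["景", "经"], "景": ["京", "经"], "经": ["京", "景"]}


def train_bigrams(corpus):
    """Count character bigrams and their left contexts, with sentence boundaries."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for text in corpus:
        chars = ["<s>"] + list(text) + ["</s>"]
        vocab.update(chars)
        contexts.update(chars[:-1])
        bigrams.update(zip(chars, chars[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(text, contexts, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability of a candidate string."""
    chars = ["<s>"] + list(text) + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(chars, chars[1:])
    )


def candidates(query):
    """The query itself plus every single-character substitution from CONFUSION."""
    result = {query}
    for i, ch in enumerate(query):
        for alt in CONFUSION.get(ch, []):
            result.add(query[:i] + alt + query[i + 1:])
    return result


def tfidf_weight(text, corpus):
    """Sum of tf * idf over the candidate's character bigrams seen in the corpus."""
    terms = Counter(text[i:i + 2] for i in range(len(text) - 1))
    weight = 0.0
    for term, tf in terms.items():
        df = sum(1 for doc in corpus if term in doc)
        if df:  # bigrams unknown to the collection contribute nothing
            weight += tf * (math.log(len(corpus) / df) + 1.0)
    return weight


def correct(query, corpus=CORPUS):
    """Return the candidate with the best combined bigram and TF-IDF score."""
    contexts, bigrams, vocab_size = train_bigrams(corpus)
    return max(
        candidates(query),
        key=lambda c: log_prob(c, contexts, bigrams, vocab_size) + tfidf_weight(c, corpus),
    )


if __name__ == "__main__":
    print(correct("北景大学"))
```

On this toy data the mistyped query 北景大学 is mapped to 北京大学, because the bigrams 北京 and 京大 occur in the collection while 北景 and 景大 do not. A query that is already correct, such as 北京天气, is returned unchanged.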
This article is indexed in databases including CNKI, VIP (维普), and Wanfang Data.