Research and Development of Medical Search Engine Based on Nutch
Citation: YUAN En-Ge, WU Xiang-Qian, YANG Wen-Zhong. Research and Development of Medical Search Engine Based on Nutch[J]. Journal of Xinjiang University (Science & Engineering), 2014(2): 217-221.
Authors: YUAN En-Ge (袁恩阁), WU Xiang-Qian (吴向前), YANG Wen-Zhong (杨文忠)
Institution: College of Electrical Engineering, Xinjiang University, Urumqi, Xinjiang 830047, China
Funding: Regional Science Fund (61262087)
Abstract: Public demand for medical information obtained online keeps growing, while general-purpose search engines retrieve domain-specific information with poor accuracy and low efficiency. To address this, this paper designs a medical vertical search engine built on Nutch components. The system implements Chinese word segmentation and derives a domain term library by training on text. It uses the vector space model (VSM) to compute how relevant each web page is to the medical topic, filters web pages on that basis, and adds a topic-relevance factor to the ranking algorithm. Test results show that, compared with a general-purpose search engine, the system achieves higher precision when retrieving medical information, reduces interference from irrelevant results, and locates medical information more accurately, so it can offer the public more targeted services.

Keywords: Nutch; vertical search engine; medical information; Chinese word segmentation; text categorization
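The abstract says the system performs Chinese word segmentation with a term library learned from text, but it does not name the segmentation algorithm. As an assumption, the sketch below uses forward maximum matching (FMM), a common dictionary-based method: at each position it takes the longest lexicon entry that matches. The function name `fmm_segment`, the `max_len` limit, and the sample lexicon are hypothetical illustrations, not taken from the paper or from Nutch.

```python
from __future__ import annotations


def fmm_segment(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Segment Chinese text by forward maximum matching (FMM).

    Hypothetical helper: at each position, take the longest lexicon entry
    (at most max_len characters) that matches; if none matches, emit a
    single character so segmentation always makes progress.
    """
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon hit;
        # a single character is always accepted so the loop advances.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    # Hypothetical sample lexicon standing in for the trained term library.
    lexicon = {"糖尿病", "患者", "饮食", "注意", "事项"}
    print(fmm_segment("糖尿病患者饮食注意事项", lexicon))
    # ['糖尿病', '患者', '饮食', '注意', '事项']
```

FMM is greedy, so ambiguous strings can be split wrongly, and any word missing from the lexicon falls apart into single characters. This is why the quality of the domain term library matters so much for a medical search engine.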
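The abstract states that page relevance to the medical topic is computed with the vector space model and then used for filtering and in ranking, but it gives no weighting scheme, cutoff, or ranking formula. The sketch below is one possible reading under stated assumptions: pages already segmented into terms become raw term-frequency vectors, relevance is cosine similarity against a topic vector, pages below a hypothetical `threshold=0.3` are dropped, and the rest are sorted by a hypothetical linear blend `(1 - alpha) * base_score + alpha * relevance`. The names `topic_relevance`, `filter_and_rank`, and `topic_weights` are illustrative only.

```python
from __future__ import annotations

import math
from collections import Counter


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def topic_relevance(tokens: list[str], topic_weights: dict[str, float]) -> float:
    """VSM relevance of a segmented page to the medical topic.

    The page vector uses raw term frequencies (an assumption); the topic
    vector is the hypothetical weight table passed in.
    """
    return cosine(Counter(tokens), topic_weights)


def filter_and_rank(pages, topic_weights, threshold=0.3, alpha=0.5):
    """Drop off-topic pages, then sort the rest by a blended score.

    pages: iterable of (url, tokens, base_score) tuples.
    threshold and alpha are hypothetical tuning values, and the linear
    blend is an assumed stand-in for the paper's ranking formula.
    """
    kept = []
    for url, tokens, base_score in pages:
        rel = topic_relevance(tokens, topic_weights)
        if rel >= threshold:  # filtering step: discard low-relevance pages
            kept.append((url, (1 - alpha) * base_score + alpha * rel))
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    topic = {"糖尿病": 2.0, "患者": 1.0, "治疗": 1.0}  # hypothetical weights
    pages = [
        ("a", ["糖尿病", "患者", "治疗"], 0.2),
        ("b", ["足球", "比赛", "治疗"], 0.9),
        ("c", ["股票"], 0.8),
        ("d", ["糖尿病"], 0.5),
    ]
    # Prints "d 0.658" then "a 0.571"; b and c fall below the threshold.
    for url, score in filter_and_rank(pages, topic):
        print(url, round(score, 3))
```

Cosine similarity normalizes by vector length, so a long page is not favored merely for containing more words; in practice TF-IDF weights are often used in place of the raw counts assumed here.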
This article is indexed in VIP (Weipu) and other databases.