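Research on Identifying Product Review Spam Based on Combination Classification of KNN and Bayesian Algorithms
Cite this article: LIANG Zhao, CHEN Si-yu, LIANG Xiao-lin, KANG Xin. Research on Identifying Product Review Spam Based on Combination Classification of KNN and Bayesian Algorithms[J]. Mathematics in Economics (经济数学), 2016(1): 36-41.
Authors: LIANG Zhao, CHEN Si-yu, LIANG Xiao-lin, KANG Xin
Affiliations: 1. Yali Middle School, Changsha, Hunan 410071; 2. School of Mathematics and Statistics, Changsha University of Science and Technology, Changsha, Hunan 410114
Funding: Graduate Innovation Project of Changsha University of Science and Technology (CX2015SS20)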
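Abstract: Spam product reviews reduce, to some extent, the reference value of review information. This paper builds a recognition model that filters spam out of review text while keeping genuine product reviews. First, the characteristics of product reviews are analyzed, and 14 features are extracted through four modules: data collection, text preprocessing, mutual-information testing, and text representation. Next, exploiting the high complementarity of the two algorithms, a combined classifier based on KNN and Bayes is built. Finally, the model is tested by cross-validation on iPhone 6 Plus product reviews, giving a correct recognition rate of 75.3%, a recall of 82.1%, and an F1 score of 77.5%.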
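The mutual-information test mentioned in the abstract scores how much a candidate feature says about the spam/genuine label. The abstract does not give the exact statistic the authors used, so the following is only a minimal Python sketch assuming binary term-presence features and the standard definition I(X;Y) = sum over x, y of p(x,y) log2[p(x,y) / (p(x) p(y))]. The function names `mutual_information` and `top_k_terms` and the toy reviews are hypothetical.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """Mutual information (bits) between 'term occurs in review' and the label.

    docs:   list of token sets (hypothetical preprocessed reviews)
    labels: list of 0/1 labels, 1 = spam
    """
    n = len(docs)
    presence = [term in doc for doc in docs]
    joint = Counter(zip(presence, labels))  # counts of (x, y) pairs
    px = Counter(presence)                  # marginal counts of x
    py = Counter(labels)                    # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        # zero-count cells contribute 0 and never appear in `joint`
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def top_k_terms(docs, labels, k):
    """Rank the vocabulary by mutual information (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: (-mutual_information(docs, labels, t), t))
    return ranked[:k]


# Hypothetical toy data: two spam reviews (1) and two genuine reviews (0)
docs = [{"free", "gift", "link"}, {"free", "link"},
        {"battery", "screen"}, {"screen", "camera"}]
labels = [1, 1, 0, 0]
print(top_k_terms(docs, labels, 3))  # ['free', 'link', 'screen']
```

Terms that perfectly split the toy labels ("free", "link", "screen") score 1 bit, while terms seen in only one review score about 0.31 bits, which is why they drop out of the top three.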

Keywords: KNN algorithm; Bayes algorithm; combined classifier; mutual information; cross-validation
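The abstract does not say how the KNN and Bayes outputs are merged. As an illustration only, the Python sketch below assumes numeric feature vectors (such as the 14 extracted features), a Gaussian naive Bayes model standing in for the authors' Bayes variant, and an equal-weight average of the two spam probabilities. It evaluates the combination with interleaved k-fold cross-validation and reports precision, recall, and F1 from counts pooled over folds; whether the paper's "correct recognition rate" is precision is not stated in the abstract. All names (`knn_spam_prob`, `GaussianBayes`, `combined_spam_prob`, `cross_validate`) and the toy data are hypothetical.

```python
import math


def knn_spam_prob(train_X, train_y, x, k=3):
    """Share of the k nearest training reviews (Euclidean distance) labelled spam."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


class GaussianBayes:
    """Minimal Gaussian naive Bayes for labels 0 (genuine) and 1 (spam)."""

    def fit(self, X, y):
        self.stats = {}
        for c in set(y):
            rows = [x for x, t in zip(X, y) if t == c]
            cols = list(zip(*rows))
            means = [sum(col) / len(rows) for col in cols]
            # small variance floor avoids division by zero on constant features
            variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                         for col, m in zip(cols, means)]
            self.stats[c] = (math.log(len(rows) / len(y)), means, variances)
        return self

    def log_joint(self, c, x):
        """log P(c) + sum of per-feature Gaussian log-likelihoods."""
        prior, means, variances = self.stats[c]
        return prior + sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                           for xi, m, v in zip(x, means, variances))

    def spam_prob(self, x):
        if 1 not in self.stats:
            return 0.0
        if 0 not in self.stats:
            return 1.0
        d = self.log_joint(0, x) - self.log_joint(1, x)
        # P(spam | x) = 1 / (1 + e^d), evaluated without overflow
        if d >= 0:
            e = math.exp(-d)
            return e / (1 + e)
        return 1 / (1 + math.exp(d))


def combined_spam_prob(train_X, train_y, bayes, x, k=3, w=0.5):
    """Assumed combination rule: weighted average of the two spam probabilities."""
    return w * knn_spam_prob(train_X, train_y, x, k) + (1 - w) * bayes.spam_prob(x)


def cross_validate(X, y, folds=5, k=3, w=0.5):
    """Interleaved k-fold CV; precision, recall and F1 from counts pooled over folds."""
    tp = fp = fn = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        bayes = GaussianBayes().fit(train_X, train_y)
        for i in test:
            pred = int(combined_spam_prob(train_X, train_y, bayes, X[i], k, w) >= 0.5)
            if pred == 1 and y[i] == 1:
                tp += 1
            elif pred == 1 and y[i] == 0:
                fp += 1
            elif pred == 0 and y[i] == 1:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical 2-feature vectors: spam clustered near (5, 5), genuine near (0, 0)
spam = [(5 + 0.1 * i, 5 - 0.1 * i) for i in range(10)]
genuine = [(0.1 * i, 0.1 * i) for i in range(10)]
X = spam + genuine
y = [1] * 10 + [0] * 10
print(cross_validate(X, y))  # (1.0, 1.0, 1.0) on this separable toy set
```

Averaging probabilities is one simple way to let the two models correct each other: when one is unsure and the other is confident, the confident one tends to decide. A weighted vote or a stacked meta-classifier would be equally plausible readings of "combined classifier", and real review data will not be as cleanly separable as this toy set.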

This article is indexed in CNKI, Wanfang Data, and other databases.