Detection of Class-Imbalance Spam Reviews Based on BalanceCascade-GBDT Algorithm
Citation:TAO Chaojie, YANG Jin. Detection of Class-Imbalance Spam Reviews Based on BalanceCascade-GBDT Algorithm[J]. 经济数学, 2020, 37(3): 214-220.
Authors:TAO Chaojie (陶朝杰)  YANG Jin (杨进)
Institution:College of Science, University of Shanghai for Science and Technology, Shanghai 200093, China
Funding:Shanghai First-Class Discipline Construction Project; Humanities and Social Sciences Research Project of the Ministry of Education
Abstract:Spam reviews are an unavoidable problem in the development of e-commerce. To address the class imbalance found in online review data, this paper proposes a spam review detection method based on the BalanceCascade-GBDT algorithm. BalanceCascade sets the false alarm rate of each classifier so that the sample space of the majority class shrinks step by step, and then combines all base classifiers into the final classifier. GBDT is widely used in classification for its high accuracy and interpretability, and because it is unstable under perturbations of the training sample, it is a well-suited base classifier. The model is built on the Yelp review dataset and evaluated by AUC; comparisons with logistic regression, random forest, and neural network algorithms demonstrate the effectiveness of the method.

Keywords:spam review  class-imbalance  GBDT  BalanceCascade  machine learning
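
The cascade step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. To stay dependency-free it uses a nearest-centroid scorer as a stand-in for GBDT, averages the base scores rather than using any particular combination rule, and runs on synthetic 2-D points instead of Yelp reviews. All function names (`fit_centroid_scorer`, `balance_cascade`, `ensemble_score`, `auc`, `make_points`) are hypothetical and introduced only for illustration.

```python
import random


def fit_centroid_scorer(pos, neg):
    """Stand-in base learner (NOT GBDT): scores a point by centroid distances."""
    dim = len(pos[0])
    c_pos = [sum(x[j] for x in pos) / len(pos) for j in range(dim)]
    c_neg = [sum(x[j] for x in neg) / len(neg) for j in range(dim)]

    def score(x):
        d_pos = sum((x[j] - c_pos[j]) ** 2 for j in range(dim))
        d_neg = sum((x[j] - c_neg[j]) ** 2 for j in range(dim))
        return d_neg - d_pos  # larger means closer to the minority centroid

    return score


def balance_cascade(pos, neg, n_rounds, fit_base, rng):
    """Train n_rounds base scorers while shrinking the majority set `neg`."""
    if n_rounds < 2 or len(pos) >= len(neg):
        raise ValueError("need n_rounds >= 2 and a larger majority class")
    # Per-round false alarm rate: the share of majority samples kept each
    # round, chosen so that |neg| falls to about |pos| after n_rounds - 1 cuts.
    keep_rate = (len(pos) / len(neg)) ** (1.0 / (n_rounds - 1))
    remaining = list(neg)
    models = []
    for i in range(n_rounds):
        # Balanced training set: all minority samples plus an equal-size draw.
        subset = rng.sample(remaining, len(pos))
        models.append(fit_base(pos, subset))
        if i < n_rounds - 1:
            # Drop the majority samples this scorer already ranks lowest;
            # only the highest-scoring (false alarm) samples stay.
            remaining.sort(key=models[-1], reverse=True)
            keep = max(len(pos), round(keep_rate * len(remaining)))
            remaining = remaining[:keep]
    return models


def ensemble_score(models, x):
    """Average the base scores (a simplifying assumption of this sketch)."""
    return sum(m(x) for m in models) / len(models)


def auc(pos_scores, neg_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def make_points(rng, n, center):
    """Synthetic 2-D Gaussian points; illustrative data only."""
    return [(rng.gauss(center, 1.0), rng.gauss(center, 1.0)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    pos, neg = make_points(rng, 30, 1.5), make_points(rng, 600, 0.0)
    models = balance_cascade(pos, neg, 5, fit_centroid_scorer, rng)
    test_pos, test_neg = make_points(rng, 100, 1.5), make_points(rng, 1000, 0.0)
    print(auc([ensemble_score(models, x) for x in test_pos],
              [ensemble_score(models, x) for x in test_neg]))
```

To move closer to the setting the abstract describes, `fit_base` could wrap a gradient-boosting classifier (for example scikit-learn's `GradientBoostingClassifier`) and return its positive-class probability as the score. The cascade loop itself would not change.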
This article is indexed in Wanfang Data (万方数据) and other databases.