Research on Data Poorness in Online Deceptive Review Identification
Cite this article: ZHANG Wen, WANG Qiang, TANG Zi-xu, QIN Guang-jie, LI Jian. Research on Data Poorness in Online Deceptive Review Identification[J]. Operations Research and Management Science, 2022, 31(11): 167-173.
Authors: ZHANG Wen, WANG Qiang, TANG Zi-xu, QIN Guang-jie, LI Jian
Institution: 1. School of Economics and Management, Beijing University of Technology, Beijing 100124, China; 2. Shandong Inspur New Infrastructure Technology Co., Ltd., Jinan 250011, Shandong, China
Funding: National Natural Science Foundation of China (72174018, 71932002); Key Project of the Social Science Program of the Beijing Municipal Education Commission (SZ202110005001); Beijing Natural Science Foundation (9222001)
Received: 2021-04-22

Abstract: Advances in machine learning have improved the accuracy of online deceptive review identification. However, current machine learning models lack enough labeled data for training. This paper proposes GAN-RDE (GAN-Review Dataset Expansion), a review dataset expansion method based on Generative Adversarial Networks (GAN), to address the shortage of training data in deceptive review identification. Specifically, we first divide the initial review data into a truthful review dataset and a deceptive review dataset and train a separate GAN on each, generating vectors that follow the feature distributions of truthful and deceptive reviews, respectively. Then, we merge these generated vectors with the feature-word vector matrix of the initial review dataset to expand the model training data. Finally, we use Naive Bayes, a multi-layer perceptron, and a support vector machine as base classifiers to compare deceptive review identification before and after data expansion. The experimental results show that, after the review dataset is expanded with GAN-RDE, the machine learning models identify deceptive reviews with significantly higher accuracy.
Keywords: deceptive review; generative adversarial networks; multi-layer perceptron; support vector machine; machine learning
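The three-step pipeline in the abstract (one GAN per class, then appending generated rows to the feature matrix, then comparing classifiers before and after expansion) can be illustrated with a small NumPy sketch. It rests on several assumptions that are not taken from the paper. The features are random toy vectors rather than real review text. The GAN is deliberately tiny, with a linear generator, a quadratic-logit discriminator, and hand-written gradients, and is not the authors' architecture. Gaussian naive Bayes stands in for all three classifiers. All helper names (`train_toy_gan`, `toy_reviews`, `fit_gaussian_nb`, `predict_gaussian_nb`) are hypothetical. The sketch shows the data flow only and is not expected to reproduce the paper's accuracy gains.

```python
import numpy as np

D_FEAT = 10  # hypothetical feature dimension (e.g. size of a TF-IDF vocabulary)


def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-np.clip(s, -30.0, 30.0)))


def train_toy_gan(X, noise_dim=4, steps=1500, lr=0.02, batch=32, seed=0):
    """Train a tiny GAN on the feature vectors X (n x d) of ONE review class.

    Generator:     G(z) = z @ W + b                        (linear)
    Discriminator: D(x) = sigmoid(x @ v + (x**2) @ u + c)  (quadratic logit)
    The quadratic term lets D compare per-feature variances, not just means.
    Returns a function that draws m synthetic feature vectors.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W, b = rng.normal(scale=0.1, size=(noise_dim, d)), np.zeros(d)
    v, u, c = np.zeros(d), np.zeros(d), 0.0
    for _ in range(steps):
        real = X[rng.integers(0, n, batch)]
        z = rng.normal(size=(batch, noise_dim))
        fake = z @ W + b
        # Discriminator step: minimise -mean log D(real) - mean log(1 - D(fake)).
        dr = sigmoid(real @ v + real**2 @ u + c) - 1.0  # d(loss)/d(logit), real rows
        df = sigmoid(fake @ v + fake**2 @ u + c)        # d(loss)/d(logit), fake rows
        v -= lr * ((dr[:, None] * real).mean(0) + (df[:, None] * fake).mean(0))
        u -= lr * ((dr[:, None] * real**2).mean(0) + (df[:, None] * fake**2).mean(0))
        c -= lr * (dr.mean() + df.mean())
        # Generator step (non-saturating loss): minimise -mean log D(G(z)).
        g = sigmoid(fake @ v + fake**2 @ u + c) - 1.0
        grad_fake = g[:, None] * (v + 2.0 * u * fake) / batch  # chain rule through the logit
        W -= lr * (z.T @ grad_fake)
        b -= lr * grad_fake.sum(0)

    def sample(m):
        return rng.normal(size=(m, noise_dim)) @ W + b

    return sample


def fit_gaussian_nb(X, y):
    classes = np.unique(y)
    stats = [(X[y == k].mean(0), X[y == k].var(0) + 1e-6, np.mean(y == k)) for k in classes]
    return classes, stats


def predict_gaussian_nb(model, X):
    classes, stats = model
    scores = [np.log(p) - 0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(1)
              for mu, var, p in stats]
    return classes[np.argmax(np.stack(scores, axis=1), axis=1)]


def toy_reviews(n, shift, seed):
    # Hypothetical stand-in for review feature vectors; NOT the paper's data.
    return np.random.default_rng(seed).normal(loc=shift, size=(n, D_FEAT))


# A deliberately small labelled set (0 = truthful, 1 = deceptive) mimics data poverty.
X_truth, X_decep = toy_reviews(20, 0.0, 1), toy_reviews(20, 0.5, 2)
X_train = np.vstack([X_truth, X_decep])
y_train = np.array([0] * 20 + [1] * 20)
X_test = np.vstack([toy_reviews(500, 0.0, 3), toy_reviews(500, 0.5, 4)])
y_test = np.array([0] * 500 + [1] * 500)

# Step 1: train one GAN per class.
gen_truth = train_toy_gan(X_truth, seed=10)
gen_decep = train_toy_gan(X_decep, seed=11)
# Step 2: append generated rows (with their class labels) to the original matrix.
X_aug = np.vstack([X_train, gen_truth(200), gen_decep(200)])
y_aug = np.concatenate([y_train, np.zeros(200, dtype=int), np.ones(200, dtype=int)])
# Step 3: fit the same classifier before and after expansion; compare test accuracy.
acc_before = np.mean(predict_gaussian_nb(fit_gaussian_nb(X_train, y_train), X_test) == y_test)
acc_after = np.mean(predict_gaussian_nb(fit_gaussian_nb(X_aug, y_aug), X_test) == y_test)
print(f"accuracy before: {acc_before:.3f}  after: {acc_after:.3f}")
```

The sketch trains one generator per class, so every synthetic row gets its label automatically from the generator that produced it. This mirrors the abstract's split into truthful and deceptive datasets. How closely the generated rows match each class, and whether they help a given classifier, depends on the GAN's capacity and training. The toy model above makes no claim on either point.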