Research on Data Poorness in Online Deceptive Review Identification

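Cite this article:	ZHANG Wen, WANG Qiang, TANG Zi-xu, QIN Guang-jie, LI Jian. Research on Data Poorness in Online Deceptive Review Identification[J]. Operations Research and Management Science (运筹与管理), 2022, 31(11): 167-173.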

Authors:	ZHANG Wen, WANG Qiang, TANG Zi-xu, QIN Guang-jie, LI Jian

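Affiliations:	1. School of Economics and Management, Beijing University of Technology, Beijing 100124, China; 2. Shandong Inspur New Infrastructure Technology Co., Ltd., Jinan, Shandong 250011, China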

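Funding:	National Natural Science Foundation of China (72174018, 71932002); Key Project of the Social Science Program of the Beijing Municipal Education Commission (SZ202110005001); Beijing Natural Science Foundation (9222001)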

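Abstract:	Advances in machine learning have improved the accuracy of online deceptive review identification, yet current machine learning models lack sufficient labeled data for training. Building on generative adversarial networks (GAN), this paper proposes a review dataset expansion method, GAN-RDE (GAN-Review Dataset Expansion), to address the shortage of training data in deceptive review identification. Specifically, the initial review data are first divided into a truthful review dataset and a deceptive review dataset, and a GAN is trained on each to generate vectors that conform to the feature distributions of truthful and deceptive reviews, respectively. The generated vectors are then merged with the feature word vector matrix of the initial review dataset to expand the training data. Finally, naive Bayes, multi-layer perceptron, and support vector machine are used as base classifiers to compare deceptive review identification performance before and after data expansion. The experimental results show that after the review dataset is expanded with GAN-RDE, the accuracy of machine learning models in identifying deceptive reviews improves significantly.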
Keywords:	deceptive review; generative adversarial network; multi-layer perceptron; support vector machine; machine learning
Received:	2021-04-22
Research on Data Poorness in Online Deceptive Review Identification

ZHANG Wen, WANG Qiang, TANG Zi-xu, QIN Guang-jie, LI Jian. Research on Data Poorness in Online Deceptive Review Identification[J]. Operations Research and Management Science, 2022, 31(11): 167-173.

Authors:	ZHANG Wen, WANG Qiang, TANG Zi-xu, QIN Guang-jie, LI Jian

Institution:	1. School of Economics and Management, Beijing University of Technology, Beijing 100124, China; 2. Shandong Inspur New Infrastructure Technology Co., Ltd., Jinan 250011, China

Abstract:	Advances in machine learning have improved the accuracy of online deceptive review identification. However, current machine learning models lack enough labeled data for training. This paper proposes a review dataset expansion method, GAN-RDE (GAN-Review Dataset Expansion), based on generative adversarial networks (GAN), to address the shortage of training data in deceptive review identification. First, we divide the initial review data into a truthful review dataset and a deceptive review dataset and train a GAN on each, generating vectors that conform to the feature distributions of truthful and deceptive reviews, respectively. Second, we combine the generated vectors with the feature word vector matrix of the initial review dataset to expand the training data. Finally, naive Bayes, multi-layer perceptron, and support vector machine are used as base classifiers to compare deceptive review identification performance before and after data expansion. The experimental results show that classifiers trained on the dataset expanded with GAN-RDE outperform classifiers trained on the unexpanded dataset in deceptive review identification.

Keywords:	deceptive review; generative adversarial networks; multi-layer perceptron; support vector machine; machine learning
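
The expansion step described in the abstract (split by class, generate class-conforming vectors, merge them with the original feature matrix) can be illustrated with a minimal sketch. This is not the authors' implementation: to stay short and dependency-free, a per-dimension Gaussian sampler stands in for the trained GAN generator, and the names `fit_stand_in_generator`, `expand_dataset`, and the toy feature vectors are hypothetical.

```python
import random
import statistics


def fit_stand_in_generator(vectors):
    """Fit a per-dimension Gaussian to one class of feature vectors.

    ASSUMPTION: this sampler is a stand-in for a trained GAN generator;
    it is NOT a GAN and only mimics "vectors that follow the class
    feature distribution" in the simplest possible way.
    """
    dims = list(zip(*vectors))  # transpose: one tuple per feature dimension
    means = [statistics.fmean(d) for d in dims]
    stds = [statistics.pstdev(d) for d in dims]

    def generate(n, rng):
        # Draw n synthetic vectors, one Gaussian draw per dimension.
        return [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(n)]

    return generate


def expand_dataset(features, labels, n_new_per_class, seed=0):
    """Append class-wise synthetic vectors to the original dataset."""
    rng = random.Random(seed)
    expanded_x = [list(v) for v in features]  # keep the original rows first
    expanded_y = list(labels)
    for cls in sorted(set(labels)):
        # Split by class, fit one generator per class, then merge its samples.
        class_vectors = [v for v, y in zip(features, labels) if y == cls]
        generate = fit_stand_in_generator(class_vectors)
        expanded_x.extend(generate(n_new_per_class, rng))
        expanded_y.extend([cls] * n_new_per_class)
    return expanded_x, expanded_y


# Hypothetical toy data: label 0 = truthful review, label 1 = deceptive review.
features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
labels = [0, 0, 1, 1]
expanded_x, expanded_y = expand_dataset(features, labels, n_new_per_class=3)
```

In a fuller pipeline, the expanded matrix would be passed to each base classifier and its performance compared with training on the original rows alone.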
