Similar literature
19 similar documents found (search time: 265 ms)
1.
Topic-model-based semi-supervised sentiment classification of online text   (total citations: 1, self-citations: 0, citations by others: 1)
To address the class imbalance, scarcity of labels, and irregular writing encountered when classifying the sentiment of online review text, a topic-based semi-supervised learning model with threshold adjustment is proposed. Topic features are extracted from the unstructured text, a classifier is trained on a small amount of sentiment-labeled text, and the decision threshold is adjusted by optimizing evaluation metrics, so that the sentiment orientation of user reviews can be identified. Simulation studies show that the threshold-adjusted semi-supervised model copes well with imbalanced and unlabeled data. In an empirical study, a sentiment classifier built on hotel-review text effectively predicted the sentiment polarity of minority-class review samples, confirming that the topic-model-based, threshold-adjusted semi-supervised model is applicable and feasible for real-world sentiment classification of online reviews.
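As a rough illustration of the threshold-adjustment idea (not the paper's exact model), the sketch below extracts LDA topic features from a tiny invented review corpus, trains a logistic-regression sentiment classifier, and then picks the decision threshold that maximizes F1 on the minority class; the corpus, topic count, and threshold grid are all assumptions made for illustration.

```python
# Sketch: topic features + decision-threshold tuning for imbalanced sentiment data.
# The tiny corpus, topic count and threshold grid are illustrative assumptions only.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

texts = [
    "great room friendly staff would stay again",
    "terrible service dirty room never again",
    "nice location clean and quiet",
    "awful breakfast rude reception",
    "comfortable bed good value",
    "lovely view helpful staff",
    "broken shower no hot water",
    "excellent stay highly recommended",
]
labels = np.array([1, 0, 1, 0, 1, 1, 0, 1])   # 1 = positive (majority), 0 = negative (minority)

# Topic features extracted from the unstructured text.
counts = CountVectorizer().fit_transform(texts)
topics = LatentDirichletAllocation(n_components=3, random_state=0).fit_transform(counts)

# Train a plain probabilistic classifier on the topic features.
clf = LogisticRegression(max_iter=1000).fit(topics, labels)
proba_neg = clf.predict_proba(topics)[:, 0]          # P(minority class)

# Adjust the decision threshold to maximise F1 on the minority class
# (evaluated on the training set here, purely to keep the sketch short).
best_t, best_f1 = 0.5, -1.0
for t in np.linspace(0.05, 0.95, 19):
    pred = (proba_neg >= t).astype(int)              # 1 = predicted minority
    f1 = f1_score((labels == 0).astype(int), pred)
    if f1 > best_f1:
        best_t, best_f1 = t, f1

print(f"chosen threshold={best_t:.2f}, minority-class F1={best_f1:.2f}")
```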

2.
张文  王强  唐子旭  秦广杰  李健 《运筹与管理》2022,31(11):167-173
Advances in machine learning have improved the accuracy of fake-review detection online, yet current models lack sufficient labeled data for training. This paper proposes GAN-RDE (GAN-Review Dataset Expansion), a review-dataset expansion method based on generative adversarial networks (GANs), to alleviate the shortage of training data in fake-review detection. Specifically, the initial reviews are first divided into a genuine-review set and a fake-review set, and a GAN is trained on each set to generate vectors that follow the feature distribution of genuine and fake reviews, respectively. The generated vectors are then merged with the term-feature matrix of the initial review data to enlarge the training data. Finally, naive Bayes, multilayer perceptron, and support vector machine classifiers are used as base classifiers to compare fake-review detection before and after expansion. Experimental results show that expanding the review dataset with GAN-RDE significantly improves the accuracy with which the machine learning models identify fake reviews.
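To convey only the data-expansion step, the sketch below fits a simple class-conditional Gaussian to the feature vectors of each review class and samples synthetic vectors from it before retraining a base classifier. The Gaussian sampler is a deliberately simplified stand-in for the two GAN generators described above, and the review feature vectors are invented for illustration.

```python
# Sketch of the dataset-expansion idea behind GAN-RDE, with a class-conditional
# Gaussian sampler standing in for the trained GAN generators (assumption).
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Invented review feature vectors: 1 = genuine review, 0 = fake review (scarce).
X_real = rng.normal(0.0, 1.0, size=(200, 20))
X_fake = rng.normal(0.8, 1.2, size=(40, 20))
X = np.vstack([X_real, X_fake])
y = np.array([1] * 200 + [0] * 40)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

def sample_like(X_class, n, rng):
    """Draw n synthetic vectors following the (Gaussian-approximated) feature
    distribution of X_class -- the role played by a GAN generator in GAN-RDE."""
    mean = X_class.mean(axis=0)
    cov = np.cov(X_class, rowvar=False) + 1e-6 * np.eye(X_class.shape[1])
    return rng.multivariate_normal(mean, cov, size=n)

# Expand each class of the training data with synthetic vectors.
synth_fake = sample_like(X_tr[y_tr == 0], 150, rng)
synth_real = sample_like(X_tr[y_tr == 1], 50, rng)
X_aug = np.vstack([X_tr, synth_fake, synth_real])
y_aug = np.concatenate([y_tr, np.zeros(150, int), np.ones(50, int)])

# Compare a base classifier trained before and after expansion.
for name, (Xf, yf) in {"original": (X_tr, y_tr), "expanded": (X_aug, y_aug)}.items():
    acc = accuracy_score(y_te, GaussianNB().fit(Xf, yf).predict(X_te))
    print(f"{name:9s} training set -> test accuracy {acc:.3f}")
```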

3.
蒋翠清  梁坤  丁勇  段锐 《运筹与管理》2017,26(2):135-139
AdaBoost-based credit scoring in online (P2P) lending suffers from high disagreement among base classifiers and high misclassification costs. Existing studies do not account for the effect of this disagreement and of misclassification cost on the sample weights of the base classifiers, which reduces the effectiveness of credit scoring for online lending. This paper therefore proposes a credit scoring method based on an improved AdaBoost. The method adjusts AdaBoost's sample-weighting strategy according to the error rate of each base classifier, the degree of disagreement among base classifiers on each sample, and the misclassification cost of each sample, so that the improved AdaBoost concentrates its learning on hard-to-classify samples and on samples with high misclassification cost, thereby improving the effectiveness of credit scoring. Experiments on data from the Paipaidai platform show that the proposed method significantly outperforms conventional AdaBoost-based credit scoring in both classification accuracy and misclassification cost.
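The following minimal sketch shows one way the reweighting idea can be written down: a standard AdaBoost loop over decision stumps in which misclassified samples have their weights inflated by a per-sample misclassification cost. It is a generic cost-sensitive variant, not the authors' exact weighting rule, and the data and cost values are invented.

```python
# Sketch: AdaBoost with a per-sample misclassification cost folded into the
# weight update (a generic cost-sensitive variant, not the paper's exact rule).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=400, n_features=10, weights=[0.85, 0.15],
                             random_state=0)
y = np.where(y01 == 1, 1, -1)                 # defaults (minority class) = +1
cost = np.where(y == 1, 5.0, 1.0)             # missing a default is 5x as costly (assumed)

n, T = len(y), 30
w = np.full(n, 1.0 / n)
stumps, alphas = [], []

for _ in range(T):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    miss = pred != y
    err = np.clip(w[miss].sum() / w.sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)
    # Inflate the weight of costly, misclassified samples; shrink correct ones.
    w *= np.exp(alpha * np.where(miss, cost, -1.0))
    w /= w.sum()
    stumps.append(stump)
    alphas.append(alpha)

def predict(X):
    score = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.where(score >= 0, 1, -1)

print("training accuracy:", (predict(X) == y).mean())
```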

4.
《数理统计与管理》2015,(5):809-820
Imbalanced data refers to classification problems in which one class of the target variable has far more observations than the others. To remedy shortcomings of the SMOTE algorithm and its variants for such data, this paper proposes a new oversampling algorithm, SMUP (Synthetic Minority Using Proximity of Random Forests), which replaces SMOTE's distance measure with a sample-similarity measure derived from random-forest proximity and thereby improves classification accuracy. Experiments show that single classifiers built on SMUP achieve higher accuracy on the minority class and overcome SMOTE's difficulty in measuring distances for categorical features, and that ensemble classifiers built on SMUP also clearly outperform the SMOTE variants. Most importantly, SMUP unifies distance measurement for continuous, mixed, and categorical features within a single framework, which is convenient for practical applications.
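A minimal sketch of the proximity idea follows (numeric features only, and certainly not the full SMUP algorithm): a random forest is fitted, the proximity of two samples is taken as the fraction of trees in which they fall in the same leaf, and each minority sample is interpolated toward one of its highest-proximity minority neighbours. The data and parameter values are invented.

```python
# Sketch: SMOTE-style oversampling with random-forest proximity as the
# similarity measure (numeric features only; a simplification of SMUP).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=8, weights=[0.9, 0.1],
                           random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
leaves = forest.apply(X)                      # (n_samples, n_trees) leaf indices

minority = np.where(y == 1)[0]
X_min = X[minority]
leaves_min = leaves[minority]

new_samples = []
for i in range(len(minority)):
    # Proximity of minority sample i to every other minority sample:
    # the fraction of trees in which the two land in the same leaf.
    prox = (leaves_min == leaves_min[i]).mean(axis=1)
    prox[i] = -1.0                            # exclude the sample itself
    neighbours = np.argsort(prox)[-5:]        # 5 most similar minority samples
    j = rng.choice(neighbours)
    lam = rng.random()
    new_samples.append(X_min[i] + lam * (X_min[j] - X_min[i]))

X_res = np.vstack([X, np.array(new_samples)])
y_res = np.concatenate([y, np.ones(len(new_samples), dtype=int)])
print("class counts after oversampling:", np.bincount(y_res))
```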

5.
This work studies Bayesian classification based on Gaussian mixture models. Bayesian classification assigns a sample to the class with the largest posterior probability, which requires estimating the relevant class-conditional distributions. For continuous data drawn from a mixture of several classes, a single distribution is often inadequate; a mixture model, which can be fitted with the EM algorithm, is a better choice. Simulation experiments show that Bayesian classification based on Gaussian mixture models is feasible and effective. When there are many features, different features contribute differently to classification, so a base classifier is built for each feature with the Gaussian-mixture Bayesian method; ensemble learning is then applied, with AdaBoost assigning a weight to each classifier, and the final classifier is their linear combination. Validation on the Wine Data Set from the UCI repository shows that combining this classification method with ensemble learning yields high and stable accuracy.
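A compact sketch of the core idea, assuming scikit-learn's bundled copy of the UCI Wine data: one Gaussian mixture is fitted per class with EM, and a sample is assigned to the class maximising log prior plus mixture log-likelihood (the per-feature AdaBoost ensemble from the abstract is omitted, and the component count is an arbitrary choice).

```python
# Sketch: Bayes classification with one EM-fitted Gaussian mixture per class.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

classes = np.unique(y_tr)
mixtures, log_priors = [], []
for c in classes:
    Xc = X_tr[y_tr == c]
    # Two components per class is an illustrative choice, not taken from the paper.
    gm = GaussianMixture(n_components=2, covariance_type="full",
                         reg_covar=1e-3, random_state=0).fit(Xc)
    mixtures.append(gm)
    log_priors.append(np.log(len(Xc) / len(X_tr)))

# Posterior is proportional to prior * class-conditional mixture density.
scores = np.column_stack([lp + gm.score_samples(X_te)
                          for gm, lp in zip(mixtures, log_priors)])
y_pred = classes[np.argmax(scores, axis=1)]
print("test accuracy:", (y_pred == y_te).mean())
```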

6.
For the classification of imbalanced data sets, a clustering-based undersampling method is proposed. The majority-class samples in the training set are clustered several times with different numbers of clusters; the cluster centers are then used in place of the majority-class samples and combined with the minority-class samples to form several new training sets. Classifiers are trained on these sets, those with a tendency to misclassify are discarded, and the remaining classifiers vote on the final prediction. Simulation experiments compare several undersampling methods on 16 data sets with different imbalance ratios. Both the theoretical analysis and the experimental results show that the proposed clustering-based undersampling method effectively mitigates the imbalance of such data sets.
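A minimal sketch of the scheme on invented data: the majority class is clustered with several different numbers of clusters, each set of centroids is combined with the minority class to train one classifier, and the classifiers vote. The step that discards poorly performing classifiers is reduced here to a simple training-accuracy filter, which is an assumption rather than the paper's rule.

```python
# Sketch: clustering-based undersampling -- cluster centroids replace the
# majority class, one classifier per clustering, then majority vote.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

X_maj, X_min = X_tr[y_tr == 0], X_tr[y_tr == 1]
classifiers = []
for k in (len(X_min), 2 * len(X_min), 3 * len(X_min)):   # several cluster counts
    centers = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_maj).cluster_centers_
    X_new = np.vstack([centers, X_min])
    y_new = np.concatenate([np.zeros(len(centers), int), np.ones(len(X_min), int)])
    clf = DecisionTreeClassifier(random_state=0).fit(X_new, y_new)
    # Keep only classifiers that do reasonably well on the full training pool
    # (a crude stand-in for "discarding classifiers prone to misclassify").
    if clf.score(X_tr, y_tr) > 0.7:
        classifiers.append(clf)

votes = np.mean([clf.predict(X_te) for clf in classifiers], axis=0)
y_pred = (votes >= 0.5).astype(int)
print("minority-class recall:", (y_pred[y_te == 1] == 1).mean())
```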

7.
Machine learning algorithms come in many types, and among them ensemble learning is a well-founded, statistically based method implemented on computers. The basic idea and implementation steps of ensemble learning are outlined, and the Bagging ensemble algorithm is used to build a personal credit scoring model in pursuit of better predictions. Features are screened by information gain and V-fold cross-validation is applied; the classification accuracy and robustness of a single classifier and of the Bagging ensemble are then compared on UCI credit data. The results show that Bagging with decision trees effectively improves accuracy and holds a clear advantage in personal credit scoring.
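A brief sketch of that workflow, with synthetic data standing in for the UCI credit set: features are screened with a mutual-information filter (used here as a stand-in for information gain), and 10-fold cross-validation compares a single decision tree with a bagged ensemble of trees.

```python
# Sketch: information-gain-style feature screening + V-fold comparison of a
# single tree against Bagging (synthetic data stands in for UCI credit data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=24, n_informative=8,
                           random_state=0)

# Keep the 10 features with the highest mutual information with the label.
X_sel = SelectKBest(mutual_info_classif, k=10).fit_transform(X, y)

single_tree = DecisionTreeClassifier(random_state=0)
bagged_trees = BaggingClassifier(n_estimators=50, random_state=0)  # bags decision trees by default

for name, model in [("single tree", single_tree), ("bagging", bagged_trees)]:
    scores = cross_val_score(model, X_sel, y, cv=10)          # V = 10 folds
    print(f"{name:12s} accuracy {scores.mean():.3f} +/- {scores.std():.3f}")
```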

8.
Boosting is an effective method for combining classifiers: it improves the classification performance of unstable learning algorithms but has little effect on stable ones. BAN (BN-augmented naive Bayes) is an augmented Bayesian-network classifier whose performance is easily improved by boosting. Wrapping-BAN-GBN, a wrapper classifier built from a general Bayesian network (GBN) and BAN, is compared with Boosting-BAN, a boosted ensemble of BAN classifiers. The experimental results show that on most of the data sets the Boosting-BAN classifier achieves higher classification accuracy.

9.
Existing one-class classification algorithms usually describe the similarity between samples with the classical Euclidean metric, which fails to capture the intrinsic distribution structure of some data sets and thus limits the descriptive power of these methods. A distance-metric learning algorithm for one-class data in high-dimensional space is proposed to improve the descriptive performance of one-class classifiers. Unlike existing metric-learning algorithms, it requires only target-class data; by introducing a regularization term based on the prior distribution of the samples and an L1-norm sparsity constraint on the metric, it can effectively learn a distance metric for one-class data in the high-dimensional, small-sample setting, and the resulting optimization problem is solved efficiently by block coordinate descent. The learned metric can easily be embedded into one-class classifiers. Simulation results show that the learned metric effectively improves the descriptive performance of one-class classifiers, in particular SVDD, and thereby strengthens their generalization ability.

10.
Four types of gene-expression-based classifiers were built to separate the endometrial samples of 87 women into cancer patients and non-patients. Irrelevant genes were first filtered out by signal-to-noise ratio, the dimensionality of the samples was then reduced by principal component analysis, and for each of the four classifiers 75 samples were randomly selected for training with the remaining 12 used for testing. The experimental results show that all four classifiers are suitable for endometrial cancer classification. Finally, using leave-one-out cross-validation as the evaluation criterion, the comparison shows that the 5BP-ELMAN classifier is the most effective tumor classifier for endometrial cancer.
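A schematic version of that pipeline on randomly generated expression data (87 samples, as in the abstract): a univariate F-test filter stands in for the signal-to-noise screen, PCA reduces the dimensionality, a small multilayer perceptron stands in for the BP/Elman networks, and leave-one-out cross-validation is used for evaluation. All parameter values are illustrative assumptions.

```python
# Sketch: gene filtering + PCA + classifier, scored by leave-one-out CV.
# The F-test filter and MLP are stand-ins for the SNR filter and Elman network.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 87 "patients", 500 synthetic "genes" (real microarray data would go here).
X, y = make_classification(n_samples=87, n_features=500, n_informative=20,
                           random_state=0)

pipeline = make_pipeline(
    SelectKBest(f_classif, k=50),      # drop genes weakly related to the label
    StandardScaler(),
    PCA(n_components=5),               # reduce the sample dimensionality
    MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000, random_state=0),
)

scores = cross_val_score(pipeline, X, y, cv=LeaveOneOut())
print("leave-one-out accuracy:", scores.mean())
```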

11.
Diverse reduct subspaces based co-training for partially labeled data   (total citations: 1, self-citations: 0, citations by others: 1)
Rough set theory is an effective supervised learning model for labeled data. However, it is often the case that practical problems involve both labeled and unlabeled data, which is outside the realm of traditional rough set theory. In this paper, the problem of attribute reduction for partially labeled data is first studied. With a new definition of the discernibility matrix, a Markov blanket based heuristic algorithm is put forward to compute the optimal reduct of partially labeled data. A novel rough co-training model is then proposed, which could capitalize on the unlabeled data to improve the performance of a rough classifier learned from only a few labeled data. The model employs two diverse reducts of partially labeled data to train its base classifiers on the labeled data, and then makes the base classifiers learn from each other on the unlabeled data iteratively. The classifiers constructed in different reduct subspaces could benefit from their diversity on the unlabeled data and significantly improve the performance of the rough co-training model. Finally, the rough co-training model is theoretically analyzed, and the upper bound on its performance improvement is given. The experimental results show that the proposed model outperforms other representative models in terms of accuracy and even compares favorably with a rough classifier trained on fully labeled training data.
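As a generic illustration of the co-training loop, the sketch below uses two random feature subsets standing in for the two diverse reducts and invented data; two naive-Bayes classifiers repeatedly label the most confident unlabeled samples for each other.

```python
# Sketch: generic co-training with two feature "views" standing in for the
# two reduct subspaces; each classifier labels confident samples for the other.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           random_state=0)
labeled = rng.choice(len(y), size=30, replace=False)          # few labeled samples
unlabeled = np.setdiff1d(np.arange(len(y)), labeled)

views = [np.arange(0, 10), np.arange(10, 20)]                 # two feature subsets
L = {0: list(labeled), 1: list(labeled)}                      # labeled pool per view
y_work = y.copy()                                             # pseudo-labels live here
pool = list(unlabeled)

for _ in range(10):                                           # co-training rounds
    if not pool:
        break
    for v in (0, 1):
        clf = GaussianNB().fit(X[np.ix_(L[v], views[v])], y_work[L[v]])
        proba = clf.predict_proba(X[np.ix_(pool, views[v])])
        best = np.argsort(proba.max(axis=1))[-5:]             # 5 most confident samples
        chosen = [pool[i] for i in best]
        y_work[chosen] = proba[best].argmax(axis=1)           # pseudo-label them
        L[1 - v].extend(chosen)                               # hand them to the peer
        pool = [p for p in pool if p not in set(chosen)]

# Evaluate the two view classifiers on the samples that were never pseudo-labeled.
for v in (0, 1):
    clf = GaussianNB().fit(X[np.ix_(L[v], views[v])], y_work[L[v]])
    acc = clf.score(X[np.ix_(pool, views[v])], y[pool])
    print(f"view {v} accuracy on remaining unlabeled pool: {acc:.3f}")
```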

12.
Target tracking is one of the most important issues in computer vision and has been applied in many fields of science, engineering and industry. Because of occlusion during tracking, typical approaches with a single classifier learn much of the occluding background information, which degrades tracking performance and eventually leads to the failure of the tracking algorithm. This paper presents a new correlative-classifiers approach to address this problem. Our idea is to derive a group of correlative classifiers based on a sample-set method. We then propose a strategy to establish the classifiers and to query the classifiers suitable for tracking the next frame. To deal with the nonlinear problem, a particle filter is adopted and integrated with the sample-set method. To choose the target from the candidate particles, we define a similarity measurement between particles and the sample set. The proposed sample-set method includes the following steps. First, we crop a positive sample set around the target and a negative sample set far away from the target. Second, we extract average Haar-like features from these samples and compute their statistical characteristics, which represent the target model. Third, we define the similarity measurement based on the statistical characteristics of these two sets to judge the similarity between candidate particles and the target model. Finally, we choose the particle with the largest similarity score as the target in the new frame. A number of experiments show the robustness and efficiency of the proposed approach compared with other state-of-the-art trackers.

13.
The traditional RFM approach to customer segmentation cannot fully characterize customer behavior, and the weights of the RFM indicators have not been analyzed. The indicator system for customer segmentation is therefore extended on the basis of RFM, and an AHP-based strategy for determining the weights of the RFM indicators is proposed. Given the many shortcomings of a single traditional classifier, a combined SOM & SVM classifier model is proposed that exploits the respective strengths of the individual SOM and SVM classifiers and fuses their classification information, avoiding the one-sidedness a single classifier may exhibit and thereby improving classification accuracy. The effectiveness of the model is finally verified with a real example.
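A small numeric sketch of the AHP weighting step for three RFM-style indicators, using an invented pairwise-comparison matrix: the weights are the normalised principal eigenvector, and the consistency ratio checks whether the judgements are acceptably coherent.

```python
# Sketch: AHP weights for three indicators (e.g. R, F, M) from an invented
# pairwise-comparison matrix, plus Saaty's consistency check.
import numpy as np

# A[i, j] = how much more important indicator i is than indicator j (assumed judgements).
A = np.array([[1.0, 1/3, 1/2],
              [3.0, 1.0, 2.0],
              [2.0, 1/2, 1.0]])

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
weights = np.abs(eigvecs[:, k].real)
weights /= weights.sum()                      # principal eigenvector, normalised

n = A.shape[0]
lam_max = eigvals.real[k]
ci = (lam_max - n) / (n - 1)                  # consistency index
ri = 0.58                                     # Saaty's random index for n = 3
cr = ci / ri                                  # consistency ratio (< 0.1 is acceptable)

for name, w in zip(["R", "F", "M"], weights):
    print(f"weight({name}) = {w:.3f}")
print(f"consistency ratio = {cr:.3f}")
```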

14.
Mathematical Diagnostics (MD) deals with identification problems arising in different practical areas. Some of these problems can be described by mathematical models where it is required to identify points belonging to two or more sets of points. Most of the existing tools provide some identification rule (a classifier) by means of which a given point is assigned (attributed) to one of the given sets. Each classifier can be viewed as a virtual expert. If there exist several classifiers (experts), the problem of evaluation of experts’ conclusions arises. In the paper for the case of supervised classification the method of virtual experts (the VE-method) is described. Based on this method, a generalized VE method is proposed where each of the classifiers can be chosen from a given family of classifiers. As a result, a new optimization problem with a discontinuous functional is stated. Examples illustrating the proposed approach are provided. The work of the second author was supported by the Russian Foundation for Fundamental Studies (RFFI) under Grant No 03-01-00668.

15.
A fuzzy random forest   (total citations: 4, self-citations: 0, citations by others: 0)
When individual classifiers are combined appropriately, a statistically significant increase in classification accuracy is usually obtained. Multiple classifier systems are the result of combining several individual classifiers. Following Breiman’s methodology, in this paper a multiple classifier system based on a “forest” of fuzzy decision trees, i.e., a fuzzy random forest, is proposed. This approach combines the robustness of multiple classifier systems, the power of the randomness to increase the diversity of the trees, and the flexibility of fuzzy logic and fuzzy sets for imperfect data management. Various combination methods to obtain the final decision of the multiple classifier system are proposed and compared. Some of them are weighted combination methods which weight the decisions of the different elements of the multiple classifier system (leaves or trees). A comparative study with several datasets is made to show the efficiency of the proposed multiple classifier system and the various combination methods. The proposed multiple classifier system exhibits good classification accuracy, comparable to that of the best classifiers when tested with conventional data sets. However, unlike other classifiers, the proposed classifier provides a similar accuracy when tested with imperfect datasets (with missing and fuzzy values) and with datasets with noise.

16.
A Feature Selection Newton Method for Support Vector Machine Classification   (total citations: 4, self-citations: 1, citations by others: 3)
A fast Newton method that suppresses input space features is proposed for a linear programming formulation of support vector machine classifiers. The proposed stand-alone method can handle classification problems in very high dimensional spaces, such as 28,032 dimensions, and generates a classifier that depends on very few input features, such as 7 out of the original 28,032. The method can also handle problems with a large number of data points and requires no specialized linear programming packages but merely a linear equation solver. For nonlinear kernel classifiers, the method utilizes a minimal number of kernel functions in the classifier that it generates.

17.
Driven by the large economic stakes of recommender systems in e-commerce, malicious users mount shilling (profile-injection) attacks for illicit gain, manipulating recommendation results and posing a serious information-security threat; identifying and detecting such attacks is therefore key to securing recommender systems. Traditional support vector machines (SVMs) are constrained by both small sample sizes and class imbalance. This paper proposes a shilling-attack detection method that combines a semi-supervised SVM with an asymmetric ensemble strategy. An initial SVM is trained first; the k-nearest-neighbor method is then introduced to refine the labels of samples near the decision boundary, and a mixed set of labeled and unlabeled data is used to reduce the demand for labeled data. Finally, an asymmetric weighted ensemble strategy is designed that emphasizes classification accuracy on attack samples and reduces the ensemble's sensitivity to class imbalance. Experimental results show that the method effectively alleviates the small-sample and imbalanced-data problems and achieves good detection performance.
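A generic sketch of the semi-supervised step only, using scikit-learn's self-training wrapper around an SVM on invented data; the KNN label refinement and the asymmetric weighted ensemble from the abstract are not reproduced here, and the labeling fraction is an assumption.

```python
# Sketch: self-training SVM on mostly-unlabeled data, a generic stand-in for
# the paper's semi-supervised SVM (KNN refinement and the asymmetric ensemble
# are omitted).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=800, n_features=15, weights=[0.85, 0.15],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

# Pretend labels are scarce: keep about 10% of the training labels, mark the rest -1.
y_semi = y_tr.copy()
unlabeled = rng.random(len(y_semi)) > 0.10
y_semi[unlabeled] = -1

labeled_only = SVC(random_state=0).fit(X_tr[~unlabeled], y_tr[~unlabeled])
self_trained = SelfTrainingClassifier(SVC(probability=True, random_state=0),
                                      threshold=0.8).fit(X_tr, y_semi)

print("SVM on labeled data only:", labeled_only.score(X_te, y_te))
print("self-training SVM       :", self_trained.score(X_te, y_te))
```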

18.
This article uses projection depth (PD) for robust classification of multivariate data. Here we consider two types of classifiers, namely, the maximum depth classifier and the modified depth-based classifier. The latter involves kernel density estimation, where one needs to choose the associated scale of smoothing. We consider both the single scale and the multi-scale versions of kernel density estimation, and investigate the large sample properties of the resulting classifiers under appropriate regularity conditions. Some simulated and real data sets are analyzed to evaluate the finite sample performance of these classification tools.

19.
High-dimensional low sample size (HDLSS) data are becoming increasingly common in statistical applications. When the data can be partitioned into two classes, a basic task is to construct a classifier that can assign objects to the correct class. Binary linear classifiers have been shown to be especially useful in HDLSS settings and preferable to more complicated classifiers because of their ease of interpretability. We propose a computational tool called direction-projection-permutation (DiProPerm), which rigorously assesses whether a binary linear classifier is detecting statistically significant differences between two high-dimensional distributions. The basic idea behind DiProPerm involves working directly with the one-dimensional projections of the data induced by the binary linear classifier. Theoretical properties of DiProPerm are studied under the HDLSS asymptotic regime whereby dimension diverges to infinity while sample size remains fixed. We show that certain variations of DiProPerm are consistent and that consistency is a nontrivial property of tests in the HDLSS asymptotic regime. The practical utility of DiProPerm is demonstrated on HDLSS gene expression microarray datasets. Finally, an empirical power study is conducted comparing DiProPerm to several alternative two-sample HDLSS tests to understand the advantages and disadvantages of each method.
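A bare-bones sketch of the DiProPerm idea on invented HDLSS data: a linear SVM supplies the direction, the test statistic is the difference of the class means of the one-dimensional projections, and the null distribution comes from refitting on permuted labels. The real procedure offers several direction and statistic choices; this is only one of them.

```python
# Sketch: a DiProPerm-style two-sample permutation test on HDLSS data, using
# a linear SVM direction and a mean-difference statistic.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
d, n_per_class = 300, 25                      # dimension >> sample size
X = np.vstack([rng.normal(0.0, 1.0, (n_per_class, d)),
               rng.normal(0.3, 1.0, (n_per_class, d))])
y = np.array([0] * n_per_class + [1] * n_per_class)

def projection_statistic(X, labels):
    """Mean difference of the data projected onto the linear-SVM normal vector."""
    direction = LinearSVC(max_iter=20000).fit(X, labels).coef_.ravel()
    direction /= np.linalg.norm(direction)
    proj = X @ direction
    return abs(proj[labels == 1].mean() - proj[labels == 0].mean())

observed = projection_statistic(X, y)

# Null distribution: refit the classifier on permuted labels.
n_perm = 200
null = np.array([projection_statistic(X, rng.permutation(y)) for _ in range(n_perm)])
p_value = (1 + (null >= observed).sum()) / (n_perm + 1)
print(f"observed statistic {observed:.3f}, permutation p-value {p_value:.3f}")
```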
