Similar literature
 19 similar articles found; search took 312 ms
1.
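Semi-supervised sentiment classification of online text based on topic models   Cited by: 1 (self-citations: 0, other citations: 1)
To address the imbalanced, unlabeled, and irregular nature of the data in sentiment classification of online review text, this work proposes a topic-based semi-supervised learning model with threshold adjustment. The model extracts topic features from unstructured text, trains a classifier on a small number of sentiment-labeled texts, and adjusts the threshold by optimizing an evaluation metric, in order to identify the sentiment orientation of user reviews. Simulation studies show that the threshold-adjusted semi-supervised model adapts well to imbalanced and unlabeled data. In an empirical study, a text sentiment classifier built on hotel review data shows that the model can effectively predict the sentiment polarity of minority-class reviews, confirming the applicability and feasibility of the topic-model-based, threshold-adjusted semi-supervised sentiment classification model in practical problems.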
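
The threshold-adjustment step in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not name the metric being optimized, so F1 of the minority class is assumed here, and the scores, labels, and function names are hypothetical.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 of the positive (minority) class when predicting 1 for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels):
    """Try each distinct score as a cut-off and keep the one with the highest F1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))


# Hypothetical classifier scores for a few labeled reviews (1 = minority class).
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.80]
labels = [0, 0, 0, 0, 1, 0, 1, 1]
threshold = best_threshold(scores, labels)
```

On this toy data the selected cut-off is 0.35 rather than the default 0.5, which recovers all three minority-class reviews at the cost of one false positive. That is the kind of trade-off threshold adjustment is meant to expose on imbalanced data.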

2.
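Semantic feature extraction from public security case texts refers to extracting features such as the modus operandi from case texts. In essence, it is a special kind of text classification problem. This work builds a framework for text semantic feature extraction based on convolutional neural networks (CNN). A CNN text classification model is constructed; for the multi-label feature extraction problem, the problem transformation method is combined with CNN classification to extract features; the problems caused by imbalanced data in classification are discussed, and the loss function of the CNN model is improved. Empirical results show that the CNN model classifies text better than traditional models such as support vector machines; that combining binary relevance, one of the problem transformation methods, with the CNN model yields high accuracy for multi-label semantic feature extraction; and that the improved CNN model is better suited to classifying imbalanced data, with a significant improvement in macro-averaged F1.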
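
Binary relevance, the problem transformation mentioned in the abstract above, turns one multi-label task into one independent binary task per label. The sketch below shows only that transformation. The sample texts, label names, and keyword-based stand-in classifiers are hypothetical and take the place of the CNN models, which are not reproduced here.

```python
def binary_relevance_split(samples, label_sets):
    """Turn one multi-label dataset into one binary dataset per label."""
    all_labels = sorted(set().union(*label_sets))
    return {
        label: [(x, 1 if label in labels else 0) for x, labels in zip(samples, label_sets)]
        for label in all_labels
    }


def predict_label_set(binary_classifiers, x):
    """Collect every label whose binary classifier answers yes for x."""
    return {label for label, clf in binary_classifiers.items() if clf(x)}


# Hypothetical case descriptions, each tagged with a set of features.
samples = [
    "forced the lock at night",
    "climbed through window at night",
    "forced the lock in daytime",
]
label_sets = [{"forced_entry", "night"}, {"night"}, {"forced_entry"}]
tasks = binary_relevance_split(samples, label_sets)

# Stand-in classifiers (assumption): simple keyword rules instead of trained CNNs.
classifiers = {
    "forced_entry": lambda text: "lock" in text,
    "night": lambda text: "night" in text,
}
predicted = predict_label_set(classifiers, "forced the lock at night")
```

Each entry of `tasks` is an ordinary binary training set, so any binary classifier can be trained on it. The price of this simplicity is that dependencies between labels are ignored.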

3.
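Text keyword extraction based on weighted complex networks   Cited by: 2 (self-citations: 0, other citations: 2)
A new algorithm for extracting text keywords based on complex networks is proposed. First, a weighted complex network model of the text is built from the relationships between its feature words. Next, a composite feature value is computed for each node from its weighted clustering coefficient and its betweenness. Finally, keywords are extracted according to the composite feature value. Experimental results show that the extracted keywords reflect the topic of the text well and that extraction accuracy is clearly higher than that of existing algorithms.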

4.
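This work studies how different word segmentation modes affect text classification results. Two traditional text representation methods (LDA and LSA) and two classification methods (support vector machine and logistic regression) are used, giving four groups of experiments for comparative analysis. The results show that, compared with the traditional segmentation method, the second, search-engine-style segmentation method, which splits words and adds compound words, is more effective for classification. Specifically, with LDA text representations of the two segmentations, the highest classification accuracy is 95.38% for mode 2 and 93.7% for mode 1. With LSA text representations, the highest accuracy is 96.44% for mode 2 and 95.2% for mode 1.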

5.
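Spam product reviews reduce, to some extent, the reference value of review information. This paper aims to build a recognition model that removes spam reviews from review text and keeps genuine product reviews. First, the characteristics of product reviews are analyzed, and 14 features are extracted across four modules: data collection, text preprocessing, mutual information testing, and text representation. Then, exploiting their high complementarity, a combined classifier model based on the KNN and Bayes algorithms is built. Finally, cross-validation on product reviews of the iPhone 6 Plus gives the following evaluation metrics: correct recognition rate 75.3%, recall 82.1%, and F1 score 77.5%.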

6.
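Identifying users' purchase intent is one of the important ways to raise the purchase rate (PR) in e-commerce. To deal with unclear purchase intent, a new model is proposed. The model feeds trained Word2Vec (WV) word vectors into a convolutional neural network (CNN) and further extracts text features with a deep semantic model (DSSM). An empirical analysis is carried out in the Keras framework using real search data from The Home Depot, a US building materials e-commerce site. The results show that, on a five-class problem, the new model reaches an F1-score of 80.6% on the test set. The new model uses Word2Vec and a CNN to extract text features and applies DSSM to further extract feature representations of user queries and product description documents in a high-dimensional space. This makes the most of the semantic similarity between a user query and the correct product description, avoids interference from subjective factors during feature extraction, and improves the recognition of purchase intent.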

7.
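For early tumor diagnosis, a feature extraction method based on the lifting wavelet transform is proposed to analyze and discriminate tumor data samples. The method applies the lifting wavelet transform to gene expression profile microarray data from 190 liver cancer cases (including controls) and 107 lung cancer cases (including controls), extracts the low-frequency information of the signal, and trains a support vector machine to build a classifier model that distinguishes cancer from non-cancer samples. Experimental results show that the feature genes extracted by the lifting wavelet transform give a high classification rate when fed into the classifier, and that either a linear kernel or a radial basis function kernel in the support vector machine achieves good classification. The model was tested on 20 randomly selected gene expression profile microarray samples with very good results, so the proposed method has some practical value for tumor diagnosis.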

8.
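Traditional analysis of text data usually represents text by term frequency or term frequency-inverse document frequency statistics. Such methods capture only part of the information in a text and ignore its intrinsic semantic features. This paper studies a probabilistic language model of Chinese word cohesion, whose basic idea is to model the order in which words appear in a text; the model supports quantitative analysis of text semantics in short-text data mining. It addresses two main problems: (1) how to convert Chinese words into numeric vectors in a reasonable way while ensuring that Chinese synonyms have similar features in the numeric space; (2) how to build a suitable vector space that preserves the semantic and structural features of Chinese text. Finally, using real message texts from the message board of a city's housing administration department, the probabilistic language model is solved with two algorithms, a BP neural network and an RNN. Comparison with traditional text processing methods shows that the proposed model has certain advantages for short-text semantic mining.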

9.
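This work studies spam text recognition. An SVM classification model is built on feature vectors of Apple phone review texts to identify spam texts, and its results are compared with those of a BP neural network discriminant model. The classification accuracy is 71% on the first 400 Apple phone samples (training) and 70.12% on the remaining 196 samples (test). The main factors affecting the recognition of spam opinion texts are therefore: 1) the extraction of feature terms from review text and the computation of the text feature space vectors; 2) the choice of classification method, among which SVM gives the best text recognition results.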

10.
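Liu Xiao, Wang Xiaoli. 《运筹与管理》, 2021, 30(3): 104-111
Classifying customer value and identifying high-value customers is crucial to airline profitability. This paper proposes an airline customer value classification model based on k-means and neighborhood rough sets. First, a comprehensive evaluation index system for airline customer value is built from the dual perspectives of current value and potential value. Next, airline customers are clustered with an Elbow-based k-means method, the indices of the decision system are reduced with the neighborhood rough set method, and an initial screening of customer value is carried out on the reduced decision system. Before evaluation, SMOTE is used to remove the data imbalance, and a grid-search ensemble classifier approach is then used to evaluate and validate the classification of airline customer value. Finally, airline customer value is segmented according to the evaluation results. The paper closes with an empirical analysis and validation on 62,988 real customer records from a domestic airline, in which the classification accuracy for the potential VIP customer group reaches 92%, offering a new approach to airline customer value classification.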
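
The SMOTE step named in the abstract above balances classes by creating synthetic minority samples on the line segments between a minority sample and its nearest minority neighbors. The sketch below illustrates that interpolation idea only. The function name, parameters, and toy points are hypothetical, and the paper's actual configuration is not given in the abstract.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (excluding the point itself)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic


# Hypothetical minority-class customers described by two numeric features.
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
synthetic = smote(minority, n_new=5)
```

Because every synthetic point lies between two real minority points, the new samples stay inside the region the minority class already occupies. This requires at least two minority samples.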

11.
This paper describes a method for periodic subject-related search that combines keyword search with subject-related filtering based on text classifiers. We consider various classification algorithms from the standpoint of their efficiency in solving the problem under study.

12.
New challenges in knowledge extraction include interpreting and classifying data sets while simultaneously considering related information to confirm results or identify false positives. We discuss a data fusion algorithmic framework targeted at this problem. It includes separate base classifiers for each data type and a fusion method for combining the individual classifiers. The fusion method is an extension of current ensemble classification techniques and has the advantage of allowing data to remain in heterogeneous databases. In this paper, we focus on the applicability of such a framework to the protein phosphorylation prediction problem.

13.
Supervised learning methods are powerful techniques to learn a function from a given set of labeled data, the so-called training data. In this paper the support vector machines approach is applied to an image classification task. Starting with the corresponding Tikhonov regularization problem, reformulated as a convex optimization problem, we introduce a conjugate dual problem to it and prove that, whenever strong duality holds, the function to be learned can be expressed via the dual optimal solutions. Corresponding dual problems are then derived for different loss functions. The theoretical results are applied by numerically solving a classification task using high-dimensional real-world data in order to obtain optimal classifiers. The results demonstrate the excellent performance of support vector classification for this particular problem.

14.
Social media, such as blogs and on-line forums, contain a huge amount of information that is typically unorganized and fragmented. An increasingly important issue is to classify on-line texts in order to detect possible anomalies. For example, on-line texts representing consumer opinions can be very valuable and profitable for companies, but they can also cause serious damage if they are negative or fake. In this contribution we present a novel statistical methodology, rooted in classical text classification, to address such issues. Several classifiers have been proposed in the literature, among them support vector machines and naive Bayes classifiers. These approaches are not effective when classifying texts that belong to an unknown author. We therefore propose a new method based on the combination of classification trees with nonparametric approaches, such as the Kruskal-Wallis and Brunner-Dette-Munk tests. The main application of our proposal is the ability to classify an author as a new one, who is potentially trustworthy, or as an old one, who is potentially fake.

15.
Mathematical programming (MP) discriminant analysis models are widely used to generate linear discriminant functions that can be adopted as classification models. Nonlinear classification models may have better classification performance than linear classifiers, but although MP methods can be used to generate nonlinear discriminant functions, functions of specified form must be evaluated separately. Piecewise-linear functions can approximate nonlinear functions, and two new MP methods for generating piecewise-linear discriminant functions are developed in this paper. The first method uses maximization of classification accuracy (MCA) as the objective, while the second uses an approach based on minimization of the sum of deviations (MSD). The use of these new MP models is illustrated in an application to a test problem and the results are compared with those from standard MCA and MSD models.

16.
The use of boxes for pattern classification has been widespread and is a fairly natural way in which to partition data into different classes or categories. In this paper we consider multi-category classifiers which are based on unions of boxes. The classification method studied may be described as follows: find boxes such that all points in the region enclosed by each box are assumed to belong to the same category, and then classify remaining points by considering their distances to these boxes, assigning to a point the category of the nearest box. This extends the simple method of classifying by unions of boxes by incorporating a natural way (based on proximity) of classifying points outside the boxes. We analyze the generalization accuracy of such classifiers and we obtain generalization error bounds that depend on a measure of how definitive the classification of the training points is.
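
The nearest-box rule described in the abstract above can be sketched as follows. The abstract does not specify the distance measure, so Euclidean distance to an axis-aligned box is assumed here, and the boxes, categories, and function names are hypothetical. A point inside a box has distance 0 to it, so a single rule covers points inside and outside the boxes.

```python
import math


def distance_to_box(point, box):
    """Euclidean distance from a point to an axis-aligned box (0 if inside)."""
    lower, upper = box
    return math.sqrt(
        sum(max(lo - x, 0.0, x - hi) ** 2 for x, lo, hi in zip(point, lower, upper))
    )


def classify(point, labelled_boxes):
    """Assign the category of the box nearest to the point."""
    category, _ = min(
        ((cat, distance_to_box(point, box)) for cat, box in labelled_boxes),
        key=lambda pair: pair[1],
    )
    return category


# Hypothetical boxes, each given as (category, (lower corner, upper corner)).
boxes = [
    ("A", ((0.0, 0.0), (1.0, 1.0))),
    ("B", ((3.0, 0.0), (4.0, 1.0))),
    ("C", ((0.0, 3.0), (1.0, 4.0))),
]
inside = classify((0.5, 0.5), boxes)     # lies inside box A
near_b = classify((2.8, 0.5), boxes)     # outside all boxes, closest to B
near_c = classify((0.5, 2.2), boxes)     # outside all boxes, closest to C
```

The gap between the distance to the nearest box and the distance to the nearest box of a different category gives a rough sense of how definitive a classification is, which is the kind of quantity the generalization bounds in the abstract depend on.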

17.
The Mumford-Shah energy functional is a successful image segmentation model. It is a non-convex variational problem and so far lacks good initialization techniques. In this paper, motivated by the fact that an image histogram is a combination of several Gaussian distributions whose centers can be considered approximations of cluster centers, we introduce a histogram-based initialization method to compute the cluster centers. With this technique, we then devise an effective multi-region Mumford-Shah image segmentation method and adopt the recent proximal alternating minimization method to solve the minimization problem. Experiments indicate that our histogram initialization method is more robust than existing methods, and that our segmentation method is very effective for both gray and color images.

18.
In this work we address a technique for effectively clustering points in specific convex sets, called homogeneous boxes, whose sides are aligned with the coordinate axes (isothetic condition). The proposed clustering approach is based on homogeneity conditions rather than on a distance measure and, although it was originally developed in the context of the logical analysis of data, it is now placed within the framework of supervised clustering. First, we introduce the basic concepts in box geometry; then, we consider a generalized clustering algorithm based on a class of graphs, called incompatibility graphs. For supervised classification problems, we consider classifiers based on box sets, and compare their overall performance to the accuracy levels of competing methods for a wide range of real data sets. The results show that the proposed method performs comparably with other supervised learning methods in terms of accuracy.

19.
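Zuo Jing, Dou Xiangsheng. 《运筹与管理》, 2020, 29(1): 124-130
Because of shape changes, illumination changes, visual collision, and visual blur, vehicle classification and counting from surveillance video has long been a complex open problem. To address it, this paper proposes a new model for better foreground extraction. Specifically, during the initial foreground extraction, a model is built to judge whether vehicles collide; for colliding vehicles, the foreground is re-extracted more accurately after processing with a dual threshold in gray-level space and in the YCbCr image space. On this basis, a gap feature vector is defined for colliding vehicles, turning vehicle segmentation into an optimization problem of finding split points, which yields an efficient segmentation algorithm that separates colliding vehicles accurately. A neural network then classifies the vehicles, and a counting algorithm is designed that builds on the correct segmentation of colliding vehicles. Experimental results show that the proposed model performs very well in classifying and counting vehicles in video, and that its processing speed meets timeliness requirements. Compared with counting traffic manually or building three-dimensional models to handle classification and counting under vehicle collision, this method balances accuracy and timeliness, improving efficiency and reducing cost.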
