Similar Documents
19 similar documents found (search time: 156 ms)
1.
In a commercially competitive environment, recommender systems are vulnerable to shilling attacks. Social recommendation algorithms based on trust relationships have proven an effective remedy, but existing work considers only explicit trust and leaves implicit trust relationships largely unexploited. This paper therefore proposes an attack-resistant social recommendation algorithm based on mining multiple implicit trust relationships. First, drawing on the trust-antecedent framework from sociology and organisational behaviour science, methods for extracting and quantifying each trust factor are studied from two perspectives: global trust and local trust. Next, a trust adjustment factor integrates the local and global trust degrees into an overall trust degree for each user. Finally, attack users are isolated outside the trusted neighbourhood on the basis of this overall trust degree, and trust-based personalised recommendations are generated. Extensive comparative experiments show that the algorithm improves recommendation accuracy while effectively suppressing the influence of shilling attacks.
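The aggregation step in this abstract can be illustrated with a minimal sketch. The function names, the adjustment factor `beta`, and the threshold value are hypothetical; the paper's actual quantification of each trust factor is not specified here.

```python
# Hypothetical sketch of the trust-aggregation step: overall trust is a
# blend of local and global trust, and candidate neighbours whose overall
# trust falls below a threshold (e.g. injected attack profiles) are
# excluded from the trusted neighbourhood. All names/values illustrative.

def overall_trust(local_trust, global_trust, beta=0.5):
    """Blend local and global trust with an adjustment factor beta."""
    return beta * local_trust + (1.0 - beta) * global_trust

def trusted_neighbours(candidates, threshold=0.4, beta=0.5):
    """Keep only candidates whose blended trust clears the threshold.
    candidates: {user: (local_trust, global_trust)}."""
    return [
        user for user, (lt, gt) in candidates.items()
        if overall_trust(lt, gt, beta) >= threshold
    ]
```

A user with consistently low local and global trust, as a shilling profile tends to be, never enters the neighbourhood used for prediction.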

2.
A New Data Classification Method Based on SVM Theory   (Cited by 2: 0 self-citations, 2 by others)
Exploiting the particular strengths of SVM classifiers in pattern recognition, this paper modifies the standard SVM model to obtain a new, simple data classification method. Theoretical analysis and experiments show that, compared with standard SVM classification, the method can handle large-scale data recognition while maintaining a high recognition rate and saving storage space.

3.
Class imbalance, where one class has far fewer samples than the others, arises in many practical machine-learning applications. Imbalance biases the decision boundary toward the majority class at the expense of the minority class, so test samples are wrongly assigned to the majority class. To address this, the paper proposes a balanced graph-based semi-supervised learning method. A balancing factor added to the energy function makes the confidence values not only smooth over the graph but also balanced across classes, effectively mitigating the adverse effect of imbalance. Statistical analysis of comparative experiments on 21 benchmark data sets shows that, under class imbalance, the new method significantly (significance level 0.05) outperforms support vector machines and other graph-based semi-supervised methods.
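A common way to realise the balancing idea in graph-based semi-supervised learning is class-mass normalisation after label propagation. The sketch below is illustrative only (not the paper's exact energy-function formulation): labels propagate over the graph, seeds stay clamped, and each class column is rescaled so that a class with many seeds does not dominate purely by mass.

```python
# Illustrative sketch: label propagation on a small undirected graph,
# followed by class-mass normalisation so neither class wins merely
# because it has more labelled seeds. Not the paper's exact algorithm.

def propagate(weights, labels, n_iter=50):
    """weights: {(i, j): w} edges of an undirected graph of n nodes;
    labels: {node: class_index} for the labelled seeds."""
    n = max(max(i, j) for i, j in weights) + 1
    k = max(labels.values()) + 1
    adj = [[0.0] * n for _ in range(n)]
    for (i, j), w in weights.items():          # symmetric adjacency
        adj[i][j] = adj[j][i] = w
    scores = [[0.0] * k for _ in range(n)]
    for node, c in labels.items():
        scores[node][c] = 1.0
    for _ in range(n_iter):
        new = [[0.0] * k for _ in range(n)]
        for i in range(n):
            deg = sum(adj[i])
            for j in range(n):
                if adj[i][j]:
                    for c in range(k):
                        new[i][c] += adj[i][j] / deg * scores[j][c]
        for node, c in labels.items():          # clamp the seeds
            new[node] = [1.0 if cc == c else 0.0 for cc in range(k)]
        scores = new
    # class-mass normalisation: rescale each class column to equal mass
    mass = [sum(scores[i][c] for i in range(n)) for c in range(k)]
    return [max(range(k), key=lambda c: scores[i][c] / mass[c])
            for i in range(n)]
```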

4.
When a conventional support vector machine (SVM) classifies imbalanced data, the genuine minority-class support vectors are few and hard to identify, so classification performance suffers. To address this, a hybrid-sampling SVM method for imbalanced data (BSMS) is proposed. First, the original imbalanced data, after an initial SVM classification, are partitioned by location into a support-vector region (SV), a majority-class non-support-vector region (MNSV), and a minority-class non-support-vector region (FNSV), and the samples in the MNSV and FNSV regions are denoised. Then, misclassified minority samples in the SV region, together with some correctly classified minority samples near the decision boundary, are repeatedly oversampled until the training set with the best test result is found. Finally, part of the MNSV samples are selectively and randomly deleted. Experimental results show that the method outperforms other sampling methods.

5.
An intrusion-detection model is built with Bayesian stepwise discriminant analysis, casting intrusion detection as a classification problem. Redundant feature variables are eliminated through stepwise introduction, which effectively reduces the computational complexity of the original discriminant function while preserving discriminant performance. Using the 10% subset of the KDD CUP 99 data as experimental data, a concrete model instance is built for the common denial-of-service (DoS) attack. Experimental results show that the model attains a high resubstitution accuracy on in-sample connection records and a high detection accuracy on out-of-sample records.

6.
The Mahalanobis-Taguchi system, proposed by the renowned Japanese quality engineer Genichi Taguchi, is a pattern-recognition method that integrates orthogonal experimental design, the signal-to-noise ratio, and the Mahalanobis distance to screen effective feature variables and to diagnose, evaluate, and predict for a population under test. Because screening features with orthogonal arrays and signal-to-noise ratios has known shortcomings, and rough sets are an effective tool for handling incomplete and uncertain information and for attribute reduction, rough sets are introduced to screen the effective feature variables and thereby improve the Mahalanobis-Taguchi system. Early detection of cancer cells aids the prevention and timely treatment of breast cancer. Taking breast-cancer cell classification as the application, 600 cells from the UCI database are selected as the study sample, the improved Mahalanobis-Taguchi system is used to distinguish normal cells from breast-cancer cells, and its performance is compared with the classical system. The results show that the rough-set-based improvement achieves higher classification accuracy than the classical Mahalanobis-Taguchi system, greatly reduces the number of feature variables, simplifies data collection, and offers a technical reference for the early diagnosis of breast cancer and for other practical classification tasks.

7.
With the rapid development of machine learning and bioinformatics, cancer-subtype classification has become a research hotspot; subtype assignments can guide treatment and prognosis. In recent years many supervised learning methods have been applied to this task. Considering the high dimensionality, small sample sizes, and class imbalance of the data, this paper first reduces dimensionality with LDA, then balances the data with the SMOTE algorithm, and finally classifies cancer subtypes with an Extra-Trees model. The model is validated on 3,296 samples covering 25 subtypes of 9 cancers from TCGA. Experimental results show that the proposed model classifies cancer subtypes well.
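The SMOTE step in this pipeline generates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. A minimal stdlib sketch of that interpolation (parameter names and defaults are illustrative, not the paper's configuration):

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples, each interpolated
    between a randomly chosen minority sample and one of its k nearest
    minority neighbours. `minority` is a list of feature tuples."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist2(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()                      # random point on the segment
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic
```

Each synthetic point lies on a segment between two real minority points, so it stays inside the minority class's local region rather than being a plain duplicate.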

8.
Semantic feature extraction from public-security case texts means extracting features such as the modus operandi from case documents; in essence it is a special text-classification problem. A framework for semantic feature extraction based on convolutional neural networks (CNNs) is constructed: a CNN text-classification model is built; for the multi-label feature-extraction problem, problem-transformation methods are combined with the CNN classifier; and the problems caused by imbalanced data in classification are discussed, leading to an improved loss function for the CNN model. Empirical results show that the CNN model outperforms traditional classifiers such as support vector machines on text classification; that the binary-relevance problem-transformation method combined with the CNN model extracts multi-label semantic features with high accuracy; and that the improved CNN model is better suited to imbalanced data, with a significant gain in macro-averaged F1.

9.
A Corporate Financial Distress Early-Warning Model for Imbalanced Data   (Cited by 1: 0 self-citations, 1 by others)
In the stock market, far fewer companies are designated "ST" than ordinary companies, so the data used to train financial distress early-warning models are severely imbalanced, and standard classifiers such as logistic regression cannot handle imbalanced data. This paper builds an early-warning model that can, using a weighted L1-regularised support vector machine (w-L1SVM). On one hand, w-L1SVM weights the loss functions of the two classes, effectively addressing the loss of predictive accuracy caused by sample imbalance; on the other, its LASSO penalty lets the model perform feature selection directly during training. Numerical simulations verify the prediction and feature-selection performance of w-L1SVM on imbalanced classification problems. In an empirical study of listed companies in the machinery, equipment and instruments sector of the Chinese stock market, the w-L1SVM early-warning model effectively selects important financial indicators and predicts which companies will be designated "ST", significantly outperforming the unweighted traditional model and demonstrating the suitability of w-L1SVM for financial distress warning.
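The two ingredients of w-L1SVM, class-weighted hinge loss and an L1 penalty, can be sketched with plain subgradient descent. This is a toy illustration under assumed hyperparameters, not the paper's solver:

```python
import random

def fit_weighted_l1_svm(X, y, class_weight, lam=0.01, lr=0.05,
                        epochs=200, seed=0):
    """Toy subgradient-descent sketch of a class-weighted, L1-penalised
    linear SVM: sum_i w_{y_i} * hinge(y_i*(x_i.beta + b)) + lam*||beta||_1,
    with y_i in {-1, +1} and class_weight mapping label -> loss weight.
    Illustrative only; hyperparameters are assumptions."""
    rng = random.Random(seed)
    d = len(X[0])
    beta, b = [0.0] * d, 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            margin = y[i] * (sum(bj * xj for bj, xj in zip(beta, X[i])) + b)
            w = class_weight[y[i]]
            if margin < 1:                       # hinge-loss subgradient step
                for j in range(d):
                    beta[j] += lr * w * y[i] * X[i][j]
                b += lr * w * y[i]
            # L1 subgradient: shrink weights toward zero (feature selection)
            beta = [bj - lr * lam * (1 if bj > 0 else -1 if bj < 0 else 0)
                    for bj in beta]
    return beta, b

def predict(beta, b, x):
    return 1 if sum(bj * xj for bj, xj in zip(beta, x)) + b >= 0 else -1
```

Giving the minority ("ST") class a larger loss weight makes misclassifying it costlier, which is how the weighting counteracts the imbalance.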

10.
Customer credit scoring is an important part of the daily operations of banks and other financial institutions. Defaulting customers are typically a small minority of the customer base, while customers who repay on time form the majority: the familiar class-imbalance problem in credit scoring. Existing credit-scoring methods cannot effectively cope with the imbalance caused by the absolute scarcity of minority-class samples. This study introduces transfer learning to integrate information from inside and outside the system to address this scarcity. To use information from external minority-class samples more efficiently, a new transfer-learning model is constructed: building on the ensemble-based transfer bagging model, its base-model generation is improved with two-stage sampling and its combination strategy with the group method of data handling (GMDH). An empirical study on credit-card customer data from a commercial bank in Chongqing shows that, compared with common credit-scoring methods, the new model better handles the effect of class imbalance under absolute scarcity, with notably higher predictive accuracy for the minority of defaulting customers.

11.
A Feature Selection Newton Method for Support Vector Machine Classification   (Cited by 4: 1 self-citation, 3 by others)
A fast Newton method, that suppresses input space features, is proposed for a linear programming formulation of support vector machine classifiers. The proposed stand-alone method can handle classification problems in very high dimensional spaces, such as 28,032 dimensions, and generates a classifier that depends on very few input features, such as 7 out of the original 28,032. The method can also handle problems with a large number of data points and requires no specialized linear programming packages but merely a linear equation solver. For nonlinear kernel classifiers, the method utilizes a minimal number of kernel functions in the classifier that it generates.

12.
Support vector machine (SVM) is a popular tool for machine learning tasks. It has been successfully applied in many fields, but parameter optimization for SVM remains an ongoing research issue. In this paper, to tune the parameters of SVM, one form of inter-cluster distance in the feature space is calculated for all the SVM classifiers of a multi-class problem. The inter-cluster distance in the feature space indicates how well the classes are separated: a larger value implies a more separated pair of classes. For each classifier, the optimal kernel parameter that yields the largest inter-cluster distance is found. Then a new continuous search interval of the kernel parameter, covering the optimal kernel parameter of each class pair, is determined. A self-adaptive differential evolution algorithm searches for the optimal parameter combination over the continuous intervals of the kernel parameter and the penalty parameter. Finally, the proposed method is applied to several real-world data sets as well as to fault diagnosis for rolling-element bearings. The results show that it is both effective and computationally efficient for parameter optimization of multi-class SVM.
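One common form of the inter-cluster distance for an RBF kernel is the squared distance between the two class means in feature space, computable entirely through the kernel trick. A sketch under that assumption (the paper's exact form of the distance is not specified here):

```python
import math

def rbf(a, b, gamma):
    """Gaussian RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def inter_cluster_distance(A, B, gamma):
    """Squared distance between class means in RBF feature space:
    ||m_A - m_B||^2 = mean K(A,A) - 2 mean K(A,B) + mean K(B,B)."""
    kaa = sum(rbf(a, a2, gamma) for a in A for a2 in A) / len(A) ** 2
    kbb = sum(rbf(b, b2, gamma) for b in B for b2 in B) / len(B) ** 2
    kab = sum(rbf(a, b, gamma) for a in A for b in B) / (len(A) * len(B))
    return kaa - 2.0 * kab + kbb

def best_gamma(A, B, candidates):
    """Pick the kernel parameter that maximises class separation."""
    return max(candidates, key=lambda g: inter_cluster_distance(A, B, g))
```

Note the criterion is not monotone in `gamma`: too small and the classes blur together, too large and even same-class points look dissimilar, so an interior optimum exists for each class pair.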

13.
In many classification applications and face recognition tasks, unlabelled data are available for training along with labelled samples, and their use can improve the performance of a classifier. In this paper, a semi-supervised growing neural gas is proposed for learning from such partly labelled datasets in face recognition applications. The classifier is first trained on the labelled data; then unlabelled data are gradually classified and added to the training data, the classifier is retrained, and the process repeats. The proposed iterative algorithm conforms to the EM framework and is demonstrated, on both artificial and real datasets, to significantly boost the classification rate through the use of unlabelled data. The improvement is particularly great when the labelled dataset is small. A comparison with support vector machine classifiers is also given. The algorithm is computationally efficient and easy to implement.
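The train/label/retrain loop described above can be sketched generically. For brevity the growing neural gas is replaced here by a nearest-centroid stand-in; only the iterative wrapper reflects the abstract:

```python
def centroid_fit(X, y):
    """Toy nearest-centroid learner standing in for the growing neural gas."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return {c: tuple(sum(col) / len(pts) for col in zip(*pts))
            for c, pts in groups.items()}

def centroid_predict(model, x):
    return min(model, key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(model[c], x)))

def self_train(X_lab, y_lab, X_unlab, rounds=3):
    """Sketch of the iterative scheme: train on labelled data, label the
    unlabelled pool, fold it into the training set, retrain, repeat."""
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    model = centroid_fit(X, y)
    for _ in range(rounds):
        if not pool:
            break
        pseudo = [centroid_predict(model, x) for x in pool]   # pseudo-labels
        model = centroid_fit(X + pool, y + pseudo)            # retrain
    return model
```

Re-estimating the pseudo-labels against the refreshed model on every round is what gives the loop its EM-like character.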

14.
The performance of kernel-based methods, such as the support vector machine (SVM), is greatly affected by the choice of kernel function. Multiple kernel learning (MKL) is a promising family of machine learning algorithms that has attracted much attention in recent years; MKL combines multiple sub-kernels to seek better results than single-kernel learning. To improve the efficiency of SVM and MKL, in this paper the Kullback-Leibler kernel function is derived to develop the SVM. The proposed method employs an improved ensemble learning framework, named KLMKB, which applies AdaBoost to learn multiple kernel-based classifiers. In the experiment on hyperspectral remote-sensing image classification, features selected via the Optimum Index Factor (OIF) are used to classify the satellite image. We extensively examine the performance of our approach in comparison with relevant state-of-the-art algorithms on a number of benchmark classification data sets and a hyperspectral remote-sensing image data set. Experimental results show that our method exhibits stable behaviour and notable accuracy across the different data sets.

15.
We propose two methods for tuning membership functions of a kernel fuzzy classifier based on the idea of SVM (support vector machine) training. We assume that in a kernel fuzzy classifier a fuzzy rule is defined for each class in the feature space. In the first method, we tune the slopes of all the membership functions at the same time so that the margin between classes is maximized, under the constraints that the degree of membership of each data sample is largest for its own class. This method is similar to a linear all-at-once SVM; we call it AAO tuning. In the second method, we tune the membership function of one class at a time: for a given class, the slope of the associated membership function is tuned so that the margin between that class and the remaining classes is maximized, under the constraints that the degrees of membership are large for data belonging to the class and small for the remaining data. This method is similar to a linear one-against-all SVM and is called OAA tuning. Computer experiments with fuzzy classifiers based on kernel discriminant analysis and with those based on ellipsoidal regions show that both methods usually improve classification performance by tuning the membership functions, with AAO tuning slightly better than OAA tuning.

16.
The parameters of a support vector machine directly affect its generalisation ability. To avoid the subjectivity of manual parameter selection, an improved genetic algorithm is proposed to optimise the parameters, and the method is applied to the five-grade classification of personal bank credit. For this multi-class problem, three binary classifiers are designed, each with its own parameters; experiments confirm that finer-grained classification is achieved.

17.
Educational data-mining tasks such as personalised exercise recommendation, difficulty prediction, and learner modelling rely on student response data and on knowledge-point annotations of exercises, which at present are labelled manually. Using machine learning to annotate exercise knowledge points automatically is therefore an urgent need. For automatic annotation over massive exercise banks, this paper proposes an ensemble-learning method for multi-knowledge-point annotation. First, the annotation problem is formally defined, and a knowledge graph of knowledge points, built from textbook tables of contents and domain knowledge, supplies the class labels. Second, multiple support vector machines are trained as base classifiers using an ensemble-learning approach, the best-performing base classifiers are selected and combined, and a multi-knowledge-point annotation model is constructed. Finally, experiments on senior-high-school mathematics exercises from the database of an online education platform show that the proposed method predicts the knowledge points examined by an exercise with good results.

18.
Optimization, 2012, 61(7): 1099-1116
In this article we study support vector machine (SVM) classifiers in the face of uncertain knowledge sets and show how data uncertainty in knowledge sets can be treated in SVM classification by employing robust optimization. We present knowledge-based SVM classifiers with uncertain knowledge sets using convex quadratic optimization duality. We show that the knowledge-based SVM, where prior knowledge is in the form of uncertain linear constraints, results in an uncertain convex optimization problem with a set containment constraint. Using a new extension of Farkas' lemma, we reformulate the robust counterpart of the uncertain convex optimization problem in the case of interval uncertainty as a convex quadratic optimization problem. We then reformulate the resulting convex optimization problems as a simple quadratic optimization problem with non-negativity constraints using Lagrange duality. We obtain the solution of the converted problem by a fixed-point iterative algorithm and establish the convergence of the algorithm. We finally present some preliminary results of our computational experiments with the method.

19.
Previous studies on financial distress prediction (FDP) mostly construct FDP models based on a balanced data set, or apply only traditional classification methods to an imbalanced data set, which often results in an overestimation of an FDP model's ability to recognise distressed companies. Our study focuses on support vector machine (SVM) methods for FDP based on imbalanced data sets. We propose a new imbalance-oriented SVM method that combines the synthetic minority over-sampling technique (SMOTE) with the Bagging ensemble learning algorithm and uses SVM as the base classifier. It is named the SMOTE-Bagging-based SVM-ensemble (SB-SVM-ensemble) and is theoretically more effective for FDP modelling on imbalanced data sets with a limited number of samples. For comparison, the traditional SVM method as well as three classical imbalance-oriented SVM methods, namely cost-sensitive SVM, SMOTE-SVM, and data-set-partition-based SVM-ensemble, are also introduced. We collect an imbalanced FDP data set from Chinese publicly traded companies and carry out 100 experiments to empirically test its effectiveness. The experimental results indicate that the new SB-SVM-ensemble method outperforms the traditional methods and is a useful tool for imbalanced FDP modelling.
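The skeleton of the SB-SVM-ensemble idea, bootstrap sampling, minority rebalancing inside each bag, and majority voting, can be sketched as follows. Plain duplication stands in for SMOTE interpolation and a toy nearest-centroid learner stands in for the SVM base classifier; both substitutions are for brevity only:

```python
import random

def _fit_centroid(X, y):
    """Toy base learner standing in for the SVM base classifier."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return {c: tuple(sum(col) / len(p) for col in zip(*p))
            for c, p in groups.items()}

def _predict_centroid(model, x):
    return min(model, key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(model[c], x)))

def sb_ensemble_fit(X, y, n_models=9, seed=0):
    """Each base model sees a bootstrap sample whose minority class is
    re-sampled up to roughly the majority size (a crude stand-in for
    SMOTE interpolation)."""
    rng = random.Random(seed)
    minority = min(set(y), key=y.count)
    min_idx = [i for i in range(len(y)) if y[i] == minority]
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]   # bootstrap
        bx, by = [X[i] for i in idx], [y[i] for i in idx]
        while 2 * by.count(minority) < len(by):                # rebalance
            j = rng.choice(min_idx)
            bx.append(X[j])
            by.append(y[j])
        models.append(_fit_centroid(bx, by))
    return models

def sb_ensemble_predict(models, x):
    votes = [_predict_centroid(m, x) for m in models]
    return max(set(votes), key=votes.count)                    # majority vote
```

Because every bag is rebalanced before training, each base model sees the minority (distressed) class at full strength, and the vote aggregates their diverse decision boundaries.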
