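Imbalanced Data Classification Method Based on Support Vector Machine Mixed Sampling
Citation: JIANG Fei, YANG Ming, LIU Yu-xin. Imbalanced data classification method based on support vector machine mixed sampling[J]. 数学的实践与认识 (Mathematics in Practice and Theory), 2021(1): 88-96.
Authors: JIANG Fei (姜飞), YANG Ming (杨明), LIU Yu-xin (刘雨欣)
Affiliation: School of Science, North University of China
Funding: National Natural Science Foundation of China (61971381); Natural Science Foundation of Shanxi Province (201801D121158).
Abstract: When a traditional support vector machine (SVM) is used to classify imbalanced data, the genuine minority-class support vector samples are too few and hard to identify, so the classification results are unsatisfactory. To address this problem, an imbalanced data classification method based on SVM mixed sampling (BSMS) is proposed. The method first classifies the original imbalanced data with an SVM and, according to where each sample lies, divides the data into three regions: the support vector region (SV), the majority-class non-support-vector region (MNSV) and the minority-class non-support-vector region (FNSV); the samples in the MNSV and FNSV regions are then denoised. Next, the misclassified minority-class samples in the SV region, together with some correctly classified minority-class samples close to the decision boundary, are repeatedly oversampled until the training set with the best test performance is found. Finally, some samples in the MNSV region are selectively and randomly deleted. Experimental results show that the method outperforms the other sampling methods it was compared with.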

Keywords: imbalanced data; support vector machine; oversampling; undersampling
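The region-partitioning and denoising step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions that the abstract does not state: the SVM is linear with an already-trained weight vector `w` and bias `b`; the minority class is labelled +1 and the majority class -1; the SV region is taken to be the margin band |f(x)| <= 1; and "denoising" is interpreted as dropping samples that lie outside the margin on the side of the opposite class. The names `decision_value` and `partition_regions` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def decision_value(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Linear SVM decision function f(x) = w.x + b (w, b assumed already trained)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def partition_regions(
    points: Sequence[Point],
    labels: Sequence[int],
    w: Sequence[float],
    b: float,
) -> Dict[str, List[Tuple[Point, int]]]:
    """Split samples into SV / MNSV / FNSV regions and drop assumed noise.

    Assumed conventions (not from the paper): minority label is +1,
    majority label is -1, and the SV region is the band |f(x)| <= 1.
    """
    regions: Dict[str, List[Tuple[Point, int]]] = {"SV": [], "MNSV": [], "FNSV": []}
    for x, y in zip(points, labels):
        f = decision_value(x, w, b)
        if abs(f) <= 1.0:
            # Inside or on the margin: kept whatever its label.
            regions["SV"].append((x, y))
        elif f < -1.0:
            # Deep on the majority side: minority samples here are treated as noise.
            if y == -1:
                regions["MNSV"].append((x, y))
        else:
            # Deep on the minority side: majority samples here are treated as noise.
            if y == +1:
                regions["FNSV"].append((x, y))
    return regions
```

Keeping misclassified samples inside the margin (rather than discarding them) matters here, because the later oversampling step in the abstract works precisely on the misclassified minority samples of the SV region.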

Classification of Unbalanced Data Based on SVM Mixed Sampling
JIANG Fei, YANG Ming, LIU Yu-xin. Classification of Unbalanced Data Based on SVM Mixed Sampling[J]. Mathematics in Practice and Theory, 2021(1): 88-96.
Authors: JIANG Fei, YANG Ming, LIU Yu-xin
Institution: School of Science, North University of China, Taiyuan 030051, China
Abstract: When a traditional support vector machine (SVM) is used to classify unbalanced data, the genuine minority-class support vector samples are too few and difficult to identify, which leads to unsatisfactory classification results. To solve this problem, an unbalanced data classification method based on SVM mixed sampling (BSMS) is proposed. The method first divides the original unbalanced data, after classification by an SVM, into three regions according to their location: the support vector region (SV), the majority non-support vector region (MNSV) and the minority non-support vector region (FNSV), and denoises the samples in the MNSV and FNSV regions. Then, the misclassified minority samples in the SV region, together with some correctly classified minority samples near the decision boundary, are repeatedly oversampled until the training set with the best test performance is found. Finally, some samples of the MNSV region are selectively and randomly deleted. The experimental results show that this method outperforms the other sampling methods it was compared with.
Keywords: unbalanced data; SVM; oversampling; undersampling
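The oversampling loop and the final random deletion from the MNSV region described in the abstract can be sketched as below. The abstract does not say which oversampling rule BSMS uses, so this sketch swaps in SMOTE-style linear interpolation toward the nearest other minority sample as a stand-in. The scoring callback `evaluate`, the parameters `rounds`, `step` and `keep_ratio`, and all function names are hypothetical; in practice `evaluate` might train an SVM and return a test-set metric, which is likewise an assumption.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def interpolate(a: Point, c: Point, u: float) -> Point:
    """Return a + u * (c - a), a point on the segment from a to c."""
    return tuple(ai + u * (ci - ai) for ai, ci in zip(a, c))


def oversample_minority(seeds: Sequence[Point], pool: Sequence[Point],
                        n_new: int, rng: random.Random) -> List[Point]:
    """SMOTE-style oversampling (stand-in technique, assumed).

    seeds: minority samples chosen for oversampling (misclassified SV-region
    samples plus correctly classified ones near the boundary).
    pool: all minority samples; it must contain at least one point other
    than each seed, otherwise min() below raises ValueError.
    """
    synthetic: List[Point] = []
    for _ in range(n_new):
        s = rng.choice(seeds)
        others = [p for p in pool if p != s]
        # Nearest other minority sample by squared Euclidean distance.
        nn = min(others, key=lambda p: sum((pi - si) ** 2 for pi, si in zip(p, s)))
        synthetic.append(interpolate(s, nn, rng.random()))
    return synthetic


def undersample(mnsv: Sequence[Point], keep_ratio: float,
                rng: random.Random) -> List[Point]:
    """Randomly keep about keep_ratio of the MNSV samples, deleting the rest."""
    if not mnsv:
        return []
    k = min(len(mnsv), max(1, round(keep_ratio * len(mnsv))))
    return rng.sample(list(mnsv), k)


def search_best(seeds: Sequence[Point], pool: Sequence[Point],
                minority: Sequence[Point], majority: Sequence[Point],
                evaluate: Callable[[List[Point], List[Point]], float],
                rounds: int, step: int,
                rng: random.Random) -> Tuple[List[Point], float]:
    """Oversample in rounds and keep the minority set with the best score.

    evaluate is a hypothetical callback; the abstract does not specify how
    the "best" training set is scored.
    """
    current = list(minority)
    best, best_score = list(current), evaluate(list(current), list(majority))
    for _ in range(rounds):
        current += oversample_minority(seeds, pool, step, rng)
        score = evaluate(list(current), list(majority))
        if score > best_score:
            best, best_score = list(current), score
    return best, best_score
```

Following the order given in the abstract, a training set would be assembled from the best minority set returned by `search_best`, the majority samples of the SV region, and the output of `undersample` applied to the MNSV region. Deleting only from MNSV leaves the majority samples nearest the boundary untouched.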