Similar literature
20 similar documents found (search time: 119 ms)
1.
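To address the classification of imbalanced data sets, a clustering-based undersampling method is proposed. Using several different numbers of clusters, the majority-class samples in the training set are clustered several times. The cluster centers are then taken as the majority-class samples and combined with the minority-class samples to form several new training sets. Classifiers are trained on these sets, classifiers with a tendency to misclassify are discarded, and the final classification is decided by voting. Simulation experiments compare several undersampling methods on 16 data sets with different imbalance ratios. Theoretical analysis and experimental results show that the proposed clustering-based undersampling method effectively alleviates the imbalance of imbalanced data sets.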
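
The construction of the reduced training sets described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the plain Lloyd's k-means, its evenly spaced initialisation, and the names `kmeans` and `undersampled_sets` are all assumptions made for the sketch, and the later steps (training classifiers, discarding error-prone ones, voting) are left out.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns k cluster centers as tuples."""
    # Deterministic initialisation (an assumption of this sketch):
    # k points spread evenly through the input list.
    step = len(points) / k
    centers = [tuple(points[int(i * step)]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(g) for c in zip(*g))
    return centers

def undersampled_sets(majority, minority, ks):
    """Build one (X, y) training set per cluster count in ks.

    The majority class (label 0) is replaced by its k cluster centers;
    the minority class (label 1) is kept unchanged.
    """
    sets = []
    for k in ks:
        centers = kmeans(majority, k)
        X = centers + [tuple(m) for m in minority]
        y = [0] * len(centers) + [1] * len(minority)
        sets.append((X, y))
    return sets
```

Each returned set would then be used to train one classifier of the ensemble; with `ks = [3, 5]`, for example, the majority class is represented by 3 and 5 centers respectively.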

2.
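Many practical machine learning applications suffer from data imbalance, that is, one class has far fewer samples than the others. Data imbalance biases the decision boundary towards the large class at the expense of the small class, so that test samples are wrongly assigned to the large class. To address this problem, the paper proposes a balanced graph-based semi-supervised learning method. The method introduces a balancing factor term into the energy function, so that the confidence values are not only as smooth as possible on the graph but also as balanced as possible across classes, which effectively reduces the adverse effect of data imbalance. Statistical analysis of comparative experiments on 21 benchmark data sets shows that, on imbalanced data, the new method classifies significantly better (significance level 0.05) than the support vector machine and other graph-based semi-supervised learning methods.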

3.
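Because the distribution of the samples strongly influences the generalization ability of the classifier when a binary-tree support vector machine is built, an improved method for constructing the hierarchy of a binary-tree support vector machine is proposed. The separability of a class is defined as the ratio of the between-class sample distance and the weighted within-class sample distance to their standard deviation. Classes with a large between-class distance and widely spread within-class samples are separated first. Comparison with different multiclass classification algorithms on benchmark data sets verifies the superiority of the improved binary-tree support vector machine. The improved algorithm is applied to fault diagnosis of the gas-path components of a twin-spool turbojet engine and achieves a good fault recognition rate.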

4.
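ECG signal classification is an important research topic in health care. Most methods neither reduce the missed-diagnosis rate for classes with few samples nor reduce the complexity of preprocessing. To address this, an ECG signal classification algorithm based on an improved deep residual shrinkage network (IDRSN), named the DRSL algorithm, is proposed. First, the synthetic minority over-sampling technique (SMOTE) is used to augment the classes with few samples, which addresses the class imbalance problem. Second, the improved deep residual shrinkage network extracts spatial features: its residual modules avoid the overfitting caused by deepening the network, and its squeeze-and-excitation and soft-thresholding sub-networks extract important local features and automatically remove noise. Then a long short-term memory (LSTM) network extracts temporal features. Finally, a fully connected network outputs the classification result. Experiments on the MIT-BIH arrhythmia data set show that the algorithm outperforms the IDRSN, DRSN, GAN+2DCNN, CNN+LSTM_ATTENTION, and SE-CNN-LSTM classification algorithms.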
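
The soft-thresholding operation that the abstract credits with noise removal can be illustrated in a few lines. This is only a sketch of the standard soft-thresholding function: in a shrinkage network the threshold is typically produced by a learned sub-network, whereas here `tau` is a fixed value chosen by hand, and the names `soft_threshold` and `shrink` are made up for the example.

```python
def soft_threshold(x, tau):
    """Shrink x towards zero by tau; values with |x| <= tau become 0."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0

def shrink(signal, tau):
    """Apply soft thresholding element-wise to a list of feature values."""
    return [soft_threshold(v, tau) for v in signal]
```

Small-magnitude values, which are more likely to be noise, are set to zero, while larger values are kept but reduced by `tau`.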

5.
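Based on a support vector domain classification model for imbalanced data sets, a method for predicting the personal credit of bank customers is proposed. The main credit prediction methods and their shortcomings are analyzed first. The support vector domain classification model and a multiplicative update algorithm for the nonnegative quadratic programming of its parameters are then studied, which leads to the proposed prediction method. Finally, the proposed method is compared experimentally with the support vector machine prediction method on synthetic and real data. The results show that, for the imbalanced data analysis problem of predicting the personal credit of bank customers, the classification method based on the support vector domain model is more effective.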

6.
《数理统计与管理》2015,(5):809-820
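Imbalanced data are data in which, in a classification problem, one class of the target variable has far more observations than the other classes. To address the shortcomings of SMOTE, an algorithm for handling imbalanced data, and of its derivatives, this paper proposes a new over-sampling algorithm, SMUP (Synthetic Minority Using Proximity of Random Forests), which uses sample similarity to improve the distance measure in SMOTE and thereby raises classification accuracy. Experimental results show that a single classifier based on SMUP effectively raises the classification accuracy of the minority class and resolves SMOTE's poor distance measurement for categorical features. An ensemble classifier based on SMUP also clearly outperforms the SMOTE derivatives. Most importantly, SMUP integrates the distance measures for continuous, mixed, and categorical features into one unified framework, which is convenient in practical applications.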
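
To show where the distance measure enters SMOTE, here is a minimal sketch of the basic SMOTE interpolation step with a pluggable distance function. It is not the SMUP algorithm: the default distance is plain Euclidean, the features are assumed to be numeric, and the name `smote` and its `dist` and `rng` parameters are illustrative assumptions. A proximity-based measure such as the one the abstract describes would be passed in place of `euclidean`.

```python
import random

def euclidean(p, q):
    """Default distance: Euclidean distance between numeric points."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def smote(minority, n_new, k=5, dist=euclidean, rng=None):
    """Generate n_new synthetic minority samples.

    For each new sample: pick a minority point x, pick one of its k nearest
    minority neighbours under `dist`, and interpolate at a random position
    on the segment between the two. Requires at least two minority points.
    """
    rng = rng or random.Random()
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [m for m in minority if m is not x]
        neighbours = sorted(others, key=lambda m: dist(x, m))[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # interpolation position in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

Only the neighbour search depends on `dist`, which is why changing the distance measure changes which pairs of samples are interpolated without touching the rest of the procedure.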

7.
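To reduce the complexity of the quadratic programming involved in finding support vectors, the geometric information of the training set is used to select, from each of the two classes, the set of boundary vectors nearest to the other class. These are the samples most likely to become support vectors, and they replace the original training set in training. For newly added samples, if some violate the KKT conditions, only those new samples are learned. At the same time, non-support-vector samples in the original set that may turn into support vectors are identified. Based on this analysis, a new incremental support vector machine learning algorithm based on nearest boundary vectors is proposed. Experiments on benchmark data sets show that the algorithm is feasible and effective.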
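
The KKT-based filtering of new samples can be sketched as below. The sketch relies on a standard fact about soft-margin SVMs: a sample that was not part of training (so its multiplier is zero) satisfies the KKT conditions exactly when y * f(x) >= 1. The decision function `f`, the tolerance `tol`, and the function names are assumptions for illustration; the abstract does not give the authors' exact procedure.

```python
def violates_kkt(f, x, y, tol=1e-6):
    """True if a new sample (x, y), with y in {-1, +1}, violates the KKT
    conditions of the current SVM, i.e. lies inside the margin or on the
    wrong side of the decision boundary: y * f(x) < 1."""
    return y * f(x) < 1 - tol

def select_for_retraining(f, new_samples):
    """Keep only the new samples that the current model does not already
    handle; the others would not change the solution."""
    return [(x, y) for x, y in new_samples if violates_kkt(f, x, y)]
```

For example, with a hypothetical linear decision function `f = lambda x: 2 * x[0] - 1`, the sample `((0.6,), 1)` gives y * f(x) = 0.2 and is selected, whereas `((2.0,), 1)` gives 3 and is skipped.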

8.
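Based on the optimal binary classification method of the support vector machine, and taking cancer diagnosis as an example, a support vector machine model for disease diagnosis is built. Two indicators, the activities of adenosine triphosphatase (ATPase) and succinate dehydrogenase (SDH), from 50 non-cancer patients and 100 cancer patients were divided into groups for training and simulated diagnosis. The diagnostic accuracy on the test samples was 98.03%, so the support vector machine can be used to build a clinical disease diagnosis system.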

9.
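For the early diagnosis of tumors, a feature extraction method based on the lifting wavelet transform is proposed to analyze and discriminate tumor data samples. The method applies the lifting wavelet transform to gene expression profile microarray data from 190 liver cancer cases (including controls) and 107 lung cancer cases (including controls), extracts the low-frequency information of the signal, and trains a support vector machine to build a classifier model that distinguishes cancer from non-cancer samples. Experimental results show that the feature genes extracted by the lifting wavelet transform yield a high classification rate when fed to the classifier, and that either a linear kernel or a radial basis function kernel in the support vector machine gives good classification results. The model was tested on 20 randomly selected gene expression profile microarray samples with very good results, so the proposed method has some practical value for tumor diagnosis.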

10.
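As a traditional machine learning method based on vector spaces, the support vector machine cannot be applied directly to tensor data without destroying the spatial structure of the data and incurring the curse of dimensionality and the small-sample problem. As a higher-order generalization of the support vector machine, the support tensor machine for classifying tensor data has attracted the attention of many researchers and has been applied in remote sensing imaging, video analysis, finance, fault diagnosis, and other fields. As with the support vector machine, the loss functions used in existing support tensor machine models are mostly surrogates of the L0/1 function. This work uses the original L0/1 function directly as the loss function and, exploiting the low-rank structure of tensor data, builds a low-rank support tensor machine model for binary classification. For this nonconvex, discontinuous tensor optimization problem, an alternating direction method of multipliers is designed, and numerical experiments on simulated and real data verify the effectiveness of the model and the algorithm.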
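
For readers unfamiliar with the distinction between the 0/1 loss and its surrogates, the 0/1 loss and its most familiar surrogate, the hinge loss, can be written as follows. This is one common convention, assumed here for illustration; the abstract does not state the exact formulation used, and the symbols $t$, $y$, and $f$ are introduced only for this sketch.

```latex
\ell_{0/1}(t) =
\begin{cases}
1, & t > 0 \\
0, & t \le 0
\end{cases}
\qquad
\ell_{\mathrm{hinge}}(t) = \max(0,\, t),
\qquad
t = 1 - y\, f(X), \quad y \in \{-1, +1\}
```

The hinge loss is convex and continuous, which makes it easy to optimize. The 0/1 loss is neither, which is consistent with the abstract's description of the resulting problem as nonconvex and discontinuous.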

11.
Previous studies on financial distress prediction (FDP) mostly construct FDP models based on a balanced data set, or only use traditional classification methods for FDP modelling based on an imbalanced data set, which often results in an overestimation of an FDP model's recognition ability for distressed companies. Our study focuses on support vector machine (SVM) methods for FDP based on imbalanced data sets. We propose a new imbalance-oriented SVM method that combines the synthetic minority over-sampling technique (SMOTE) with the Bagging ensemble learning algorithm and uses SVM as the base classifier. It is named the SMOTE-Bagging-based SVM-ensemble (SB-SVM-ensemble), and it is theoretically more effective for FDP modelling based on imbalanced data sets with a limited number of samples. For a comparative study, the traditional SVM method as well as three classical imbalance-oriented SVM methods, namely cost-sensitive SVM, SMOTE-SVM, and data-set-partition-based SVM-ensemble, are also introduced. We collect an imbalanced data set for FDP from Chinese publicly traded companies and carry out 100 experiments to empirically test its effectiveness. The experimental results indicate that the new SB-SVM-ensemble method outperforms the traditional methods and is a useful tool for imbalanced FDP modelling.

12.
In this work, we create a quality map of a slate deposit, using the results of an investigation based on surface geology and continuous core borehole sampling. Once the quality of the slate and the location of the sampling points have been defined, different kinds of support vector machines (SVMs), namely SVM classification (multiclass one-against-all), ordinal SVM and SVM regression, are used to draw up the quality map. The results are also compared with those for kriging.

13.
In this paper, we propose a new optimization framework for improving feature selection in medical data classification. We call this framework the Support Feature Machine (SFM). SFM is used in feature selection to find the optimal group of features that show strong separability between two classes. The separability is measured in terms of inter-class and intra-class distances. The objective of the SFM optimization model is to maximize the number of correctly classified data samples in the training set, that is, samples whose intra-class distances are smaller than their inter-class distances. This concept can be combined with the modified nearest neighbor rule for unbalanced data. In addition, a variation of SFM that provides feature weights (prioritization) is also presented. The proposed SFM framework and its extensions were tested on 5 real medical datasets related to the diagnosis of epilepsy, breast cancer, heart disease, diabetes, and liver disorders. The classification performance of SFM is compared with that of support vector machine (SVM) classification and Logical Analysis of Data (LAD), which is also an optimization-based feature selection technique. SFM gives very good classification results, yet uses far fewer features to make the decision than SVM and LAD. This result has significant implications for diagnostic practice. The outcome of this study suggests that the SFM framework can be used as a quick decision-making tool in real clinical settings.

14.
We introduce a novel modification to standard support vector machine (SVM) formulations based on a limited amount of penalty-free slack to reduce the influence of misclassified samples or outliers. We show that free slack relaxes support vectors and pushes them towards their respective classes, hence we use the name relaxed support vector machines (RSVM) for our method. We present theoretical properties of the RSVM formulation and develop its dual formulation for nonlinear classification via kernels. We show the connection between the dual RSVM and the dual of the standard SVM formulations. We provide error bounds for RSVM and show that it is stable and universally consistent, with error bounds tighter than those for the standard SVM. We also introduce a linear programming version of RSVM, which we call RSVMLP. We apply RSVM and RSVMLP to synthetic data and benchmark binary classification problems, and compare our results with standard SVM classification results. We show that relaxed influential support vectors may lead to better classification results. We develop a two-phase method called RSVM2 for multiple instance classification (MIC) problems, where RSVM formulations are used as classifiers. We extend the two-phase method to the linear programming case and develop RSVMLP2. We demonstrate the classification characteristics of RSVM2 and RSVMLP2, and report our classification results compared to results obtained by other SVM-based MIC methods on public benchmark datasets. We show that both RSVM2 and RSVMLP2 are faster and produce more accurate classification results.

15.
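In geoscience, correct rock classification helps in studying the genesis, formation conditions, and evolution of rocks, as well as in engineering design. Because geological conditions are diverse, variable, and complex, it is hard to classify rock samples accurately. Principal component analysis (PCA) is used to extract principal components from the many oxide indicators that affect igneous rock classification, a genetic algorithm (GA) optimizes the support vector machine parameters, and a support vector machine (SVM) is trained on public data for real igneous rocks, giving a PCA-GA-SVM model for igneous rock classification. Using real igneous rock data, the predictions are compared with those of a BP neural network model improved with the Levenberg-Marquardt algorithm (LM-BP). The results show that the PCA-GA-SVM model predicts igneous rock classes much more accurately than the LM-BP neural network, agrees with the actual classification, and has broad application prospects.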

16.
The support vector machine (SVM) is known for its good performance in two-class classification, but its extension to multiclass classification is still an ongoing research issue. In this article, we propose a new approach for classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). We show that the IVM not only performs as well as the SVM in two-class classification, but also can naturally be generalized to the multiclass case. Furthermore, the IVM provides an estimate of the underlying probability. Similar to the support points of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM. This gives the IVM a potential computational advantage over the SVM.

17.
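To make full use of the strengths of the SVM in personal credit evaluation and overcome its weaknesses, a personal credit evaluation model based on a support vector machine committee machine is proposed. For personal credit evaluation, the model is combined with a method that constructs new learning samples by estimating attribute utility functions. Empirical analysis and comparison with the SVM method show that the model has better, faster, and more adaptable predictive classification ability.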

18.
The support vector machine (SVM) represents a new and very promising technique for machine learning tasks involving classification, regression or novelty detection. Improvements of its generalization ability can be achieved by incorporating prior knowledge of the task at hand. We propose a new hybrid algorithm consisting of signal-adapted wavelet decompositions and hard margin SVMs for waveform classification. The adaptation of the wavelet decompositions is tailored for hard margin SV classifiers with radial basis functions as kernels. It allows the optimization of the representation of the data before training the SVM and does not suffer from computationally expensive validation techniques. We assess the performance of our algorithm against the background of current concerns in medical diagnostics, namely the classification of endocardial electrograms and the detection of otoacoustic emissions. Here the performance of hard margin SVMs can be significantly improved by our adapted preprocessing step.

19.
Existing support vector machines (SVMs) all assume that every feature of the training samples contributes equally to the construction of the optimal separating hyperplane. However, in a given real-world data set, some features may be more relevant to the classification information than others. In this paper, the linear feature-weighted support vector machine (LFWSVM) is proposed to deal with this problem. The proposed model is constructed in two phases. First, a mutual information (MI) based approach is used to assign an appropriate weight to each feature of the given data set. Second, the proposed model is trained on the samples with their features weighted by the obtained feature weight vector. Meanwhile, the feature weights are embedded in the quadratic programming through detailed theoretical deduction to obtain the dual solution to the original optimization problem. Although the calculation of feature weights adds some computational cost, the proposed model generally exhibits better generalization performance than the traditional support vector machine (SVM) with a linear kernel function. Experimental results on one synthetic data set and several benchmark data sets confirm the benefits of the proposed method. Moreover, the experiments also show that the proposed MI based approach to determining feature weights is superior to the other two most commonly used methods.
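
The first phase, assigning a weight to each feature from its mutual information with the class label, might look roughly like this. The sketch assumes discrete (or already discretized) feature values and normalizes the weights to sum to one; both choices, along with the function names, are assumptions made for illustration, since the abstract does not specify how the weights are scaled.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # sum over observed pairs of p(x,y) * log( p(x,y) / (p(x) p(y)) )
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def feature_weights(X, y):
    """One weight per feature column, proportional to its MI with y."""
    mi = [mutual_information([row[j] for row in X], y)
          for j in range(len(X[0]))]
    total = sum(mi)
    if total <= 0:  # no feature is informative: fall back to equal weights
        return [1.0 / len(mi)] * len(mi)
    return [m / total for m in mi]

def apply_weights(X, weights):
    """Scale each feature by its weight before training a linear SVM."""
    return [[v * w for v, w in zip(row, weights)] for row in X]
```

In the second phase, the weighted samples returned by `apply_weights` would be passed to an ordinary linear SVM trainer, so that informative features have more influence on the separating hyperplane.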

20.
The performance of kernel-based methods, such as the support vector machine (SVM), is greatly affected by the choice of kernel function. Multiple kernel learning (MKL) is a promising family of machine learning algorithms and has attracted much attention in recent years. MKL combines multiple sub-kernels to seek better results than single kernel learning. In order to improve the efficiency of SVM and MKL, in this paper, the Kullback–Leibler kernel function is derived to develop SVM. The proposed method employs an improved ensemble learning framework, named KLMKB, which applies Adaboost to learning multiple kernel-based classifiers. In the experiment on hyperspectral remote sensing image classification, we employ features selected through the Optimum Index Factor (OIF) to classify the satellite image. We extensively examine the performance of our approach in comparison to some relevant and state-of-the-art algorithms on a number of benchmark classification data sets and a hyperspectral remote sensing image data set. Experimental results show that our method behaves stably and achieves noticeable accuracy on different data sets.
