Similar articles
1.
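The traditional Mahalanobis-Taguchi System (MTS) is used mainly for classification and diagnosis. This paper studies MTS as a comprehensive evaluation method and develops the method and its steps for two cases: with and without a reference space. To address the shortcomings of variable screening in traditional MTS, a multi-objective programming model is built to screen the evaluation indicators and is solved with a genetic algorithm. Two real cases compare the MTS comprehensive evaluation method with several commonly used comprehensive evaluation methods. The results show that MTS can screen evaluation indicators and avoids the indicator-weighting problem, making it a practical and effective comprehensive evaluation method.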

2.
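The Mahalanobis-Taguchi System (MTS) is a multivariate pattern recognition method. It first builds a reference space from normal samples, then screens effective variables using orthogonal arrays and signal-to-noise ratios, and finally uses the Mahalanobis distance for classification, diagnosis, and prediction. When the normal samples used to build the reference space are contaminated by a few outliers, the performance of MTS inevitably suffers. Based on the principle of multivariate control charts, the suitability of the samples used to build the reference space is assessed, the sample points that fall outside the control limits are deleted, and a new reference space is built. A feasibility analysis and a comparison of classification performance on UCI data sets show that the performance of MTS improves significantly after optimization with the multivariate control chart.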
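To make the reference-space step in the abstract above concrete, here is a minimal sketch in plain Python. It assumes the usual MTS convention of standardizing each variable with the normal-group mean and standard deviation and dividing the Mahalanobis distance by the number of variables; the function names (`build_reference_space`, `scaled_md`) are hypothetical and are not taken from the paper.

```python
import math

def mean_std(column):
    """Mean and population standard deviation of one variable."""
    n = len(column)
    m = sum(column) / n
    return m, math.sqrt(sum((v - m) ** 2 for v in column) / n)

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def build_reference_space(normal):
    """Fit per-variable statistics and the inverse correlation matrix."""
    n, k = len(normal), len(normal[0])
    stats = [mean_std([row[j] for row in normal]) for j in range(k)]
    z = [[(row[j] - stats[j][0]) / stats[j][1] for j in range(k)]
         for row in normal]
    corr = [[sum(zr[a] * zr[b] for zr in z) / n for b in range(k)]
            for a in range(k)]
    return stats, invert(corr)

def scaled_md(x, space):
    """Mahalanobis distance of x to the reference space, divided by k."""
    stats, inv = space
    k = len(x)
    z = [(x[j] - stats[j][0]) / stats[j][1] for j in range(k)]
    quad = sum(z[a] * inv[a][b] * z[b] for a in range(k) for b in range(k))
    return quad / k

# Illustrative data (assumed, not from the paper): two correlated variables.
normal = [[1.0, 2.0], [2.0, 2.9], [3.0, 4.2], [4.0, 4.8], [5.0, 6.1]]
space = build_reference_space(normal)
average_md = sum(scaled_md(row, space) for row in normal) / len(normal)
```

With this scaling the average distance of the normal samples is 1, so a sample that breaks the correlation pattern of the normal group stands out with a distance far above 1.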

3.
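The Mahalanobis-Taguchi System is a new pattern recognition technique. It applies the full set of ideas from Taguchi's signal-to-noise-ratio experimental design to the problem of feature variable selection in pattern recognition, builds a reference space from normal samples, and uses Mahalanobis distance values to identify the class of a sample. The paper discusses the basic principles of the Mahalanobis-Taguchi System and applies the MTGS model to Fisher's iris classification problem, showing that the method discriminates and classifies well.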
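For reference, the quantities named in the abstract above are commonly written as follows. This is one widely used formulation, given as an assumption: the abstract does not say which signal-to-noise ratio the authors used, and the symbols (k, t, z_j, R) are introduced here for illustration.

```latex
% z_j : standardized feature vector of sample j (k features)
% R   : correlation matrix of the normal (reference) samples
\[
  MD_j = \frac{1}{k}\, z_j^{\mathsf T} R^{-1} z_j
\]
% Larger-the-better SN ratio over t known abnormal samples
\[
  \eta = -10 \log_{10}\!\left[ \frac{1}{t} \sum_{j=1}^{t} \frac{1}{MD_j} \right]
\]
% Gain of one feature across the runs of the orthogonal array
\[
  \mathrm{Gain} = \bar{\eta}_{\text{included}} - \bar{\eta}_{\text{excluded}}
\]
```

A feature with a positive gain raises the average SN ratio when it is included and is typically kept; a feature with a negative gain is a candidate for removal.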

4.
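The Mahalanobis-Taguchi System, proposed by the renowned Japanese quality engineer Genichi Taguchi, is a pattern recognition method that integrates orthogonal experimental design, the signal-to-noise ratio, and the Mahalanobis distance to screen effective feature variables and to diagnose, evaluate, and predict the population under test. Screening feature variables with orthogonal arrays and signal-to-noise ratios may have shortcomings, whereas rough sets are an effective way to handle incomplete information, such as imperfect and uncertain data, and to perform attribute reduction; rough sets are therefore introduced to screen effective feature variables and improve the Mahalanobis-Taguchi System. Early detection of cancer cells helps with the early prevention and timely treatment of breast cancer. Against the background of breast cancer cell classification, 600 cells from the UCI database are selected as study samples, the improved method is used to distinguish normal cells from breast cancer cells, and its classification performance is compared with that of the classical Mahalanobis-Taguchi System. The results show that the rough-set-based improved system classifies breast cancer cells more accurately than the classical system, and that the rough set method greatly reduces the number of feature variables, which simplifies data collection and provides a technical reference for the early diagnosis of breast cancer and for other practical classification tasks.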

5.
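For anomaly detection in multivariate data that contain multiple normal classes, a semi-supervised anomaly detection method based on a multi-class Mahalanobis-Taguchi System is proposed. A Mahalanobis space is built separately for each normal class in the training data set, yielding a multi-class measurement scale based on the Mahalanobis distance. The method classifies the normal data in the test set while also detecting abnormal data. Its effectiveness is verified on simulated Gaussian mixture model data with outliers.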

6.
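In practice, one often encounters the imbalanced-data problem, in which one class in a classification data set has markedly fewer samples than the others. In a binary data set, the class with more samples is generally called the positive class and the class with fewer samples the negative class. To improve classification performance on imbalanced data sets, the paper proposes first using K-means to find the center of the negative class and then generating a new data set following the basic principle of SMOTE. Comparing the new data set with the original imbalanced data set across different classification algorithms shows that the improved algorithm classifies markedly better; finally, a paired t-test verifies the effectiveness of the algorithm.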
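The abstract above does not spell out how the K-means center and SMOTE are combined. One plausible reading, shown here strictly as an assumption, is to interpolate between the minority-class center and randomly chosen minority samples, in the same way that SMOTE interpolates between a sample and one of its neighbors. With a single cluster the K-means center is simply the class mean, which is what `centroid` computes; both function names are hypothetical.

```python
import random

def centroid(points):
    """Mean point of a class (the K-means center when k = 1)."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def oversample_toward_center(minority, n_new, seed=0):
    """Create n_new synthetic points on segments from the center to samples."""
    rng = random.Random(seed)      # fixed seed keeps the sketch reproducible
    center = centroid(minority)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)   # a real minority sample
        u = rng.random()           # interpolation factor in [0, 1)
        synthetic.append([c + u * (xi - c) for c, xi in zip(center, x)])
    return synthetic

# Illustrative minority class (assumed data).
minority = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]
new_points = oversample_toward_center(minority, n_new=5)
```

Every synthetic point is a convex combination of the center and a real sample, so it stays inside the region already occupied by the minority class.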

7.
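Combining artificial intelligence with medical data can speed up disease diagnosis, improve diagnostic accuracy, identify the key indicators in the diagnostic process, and improve medical workflows. Taking the preferences of medical decision makers as the premise, a Mahalanobis-Taguchi System algorithm with a threshold improved by a quadratic loss function is proposed. Orthogonal arrays and signal-to-noise ratios are used to optimize the indicators and reduce model complexity; the improved threshold raises the sensitivity of the model and satisfies the decision preferences of medical workers. The method is applied to UCI breast cancer data and to clinical asthma data from a Grade-A tertiary hospital, and is compared with other threshold-improved Mahalanobis-Taguchi System algorithms and with intelligent algorithms. The results show that the threshold-improved algorithm has high recognition sensitivity, simplifies the diagnostic indicators, and requires little training time, making it a more effective method for intelligent medical diagnosis.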

8.
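The Mahalanobis-Taguchi System is a classification technique that uses the Mahalanobis distance as its measurement scale and builds a Mahalanobis space from selected normal samples in order to diagnose and predict multivariate systems. The Mahalanobis distance is very sensitive to changes in the sample data, so the data quality of the normal samples used to build the Mahalanobis space directly affects classification accuracy. In practice, normal samples are mostly selected by subjective experience, and an objective, standardized selection mechanism is lacking. This paper proposes a control-chart-based mechanism for generating the Mahalanobis space: an initial Mahalanobis space is first built from normal samples selected by experts; the increment in each normal sample's Mahalanobis distance between the initial Mahalanobis space and the corresponding reduced Mahalanobis space is then taken as a new measurement scale and used to build an individuals control chart; and the stability rules of the control chart are applied to remove abnormal data, yielding a Mahalanobis space in a stable state. Experimental analysis shows that the method is effective and improves the classification accuracy of the Mahalanobis-Taguchi System.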
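The pruning loop described above can be sketched with a standard individuals (X) chart. This sketch makes two simplifying assumptions: the input is any list of per-sample measurements (the paper uses Mahalanobis-distance increments), and only the basic rule "a point outside the 3-sigma limits" is applied rather than the full set of stability rules. The function names are hypothetical.

```python
def individuals_limits(values):
    """Lower limit, center line and upper limit of an individuals chart."""
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    half_width = 2.66 * mr_bar   # 3 / d2, with d2 = 1.128 for ranges of two
    return center - half_width, center, center + half_width

def prune_until_stable(values, max_rounds=10):
    """Drop out-of-limit points and recompute until none remain."""
    kept = list(values)
    for _ in range(max_rounds):
        lcl, _, ucl = individuals_limits(kept)
        inside = [v for v in kept if lcl <= v <= ucl]
        if len(inside) == len(kept):
            break                # chart is stable under this rule
        kept = inside
    return kept

# Illustrative measurements (assumed data) with one obvious outlier.
measurements = [1.0, 1.2, 0.9, 1.1, 1.0, 5.0, 1.1, 0.95, 1.05, 1.0]
stable = prune_until_stable(measurements)
```

The limits are recomputed after each removal because a large outlier inflates the average moving range and can mask smaller ones.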

9.
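Based on a support vector domain classification model for imbalanced data sets, a method for predicting the personal credit of bank customers is proposed. The paper first analyzes the main credit prediction methods and their shortcomings, then studies the support vector domain classification model and a multiplicative update algorithm for the nonnegative quadratic program that determines its parameters, and on that basis proposes the prediction method. Finally, experiments on artificial and real data compare the proposed method with support vector machine prediction. The results show that, for the imbalanced-data problem of predicting the personal credit of bank customers, classification based on the support vector domain model is more effective.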

10.
陶朝杰  杨进 《经济数学》2020,37(3):214-220
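Fake reviews are an unavoidable problem in the development of e-commerce. To handle the class imbalance in online review data, a fake review identification method based on the BalanceCascade-GBDT algorithm is proposed. BalanceCascade gradually shrinks the majority-class sample space by setting the false positive rate of each classifier, and then combines all base classifiers into the final classifier. GBDT is widely used in classification problems for its high accuracy and interpretability, and because it is unstable under sample perturbation it is a very suitable base classifier. The model is built on the Yelp review data set, uses the AUC as the evaluation metric, and is compared with logistic regression, random forest, and neural network algorithms; the experiments demonstrate the effectiveness of the method.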
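Since the abstract above relies on the AUC as its metric, a short sketch of how it can be computed may help. It uses the pairwise definition: the AUC is the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The function name and the sample data are illustrative assumptions.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0      # positive ranked above negative
            elif p == q:
                wins += 0.5      # tie counts as half
    return wins / (len(positives) * len(negatives))

# 1 = fake review, 0 = genuine review (illustrative labels and scores).
value = auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
```

The AUC does not depend on the class ratio, which is one reason it is often preferred to accuracy on imbalanced data. The quadratic loop is fine for a sketch; a rank-based formula is the usual choice for large data sets.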

11.
Classification systems are very important for decision making and have attracted the attention of many researchers. Traditional classifiers are usually either domain specific or produce unsatisfactory results on large or imbalanced classification problems. Hence, genetic algorithms (GAs) have recently been combined with traditional classifiers to find useful knowledge for decision making. The main concerns with such GA-based systems, however, are their limited coverage of the search space and the growth of computational cost with population size. This paper introduces a rule-based knowledge discovery model that combines C4.5 (a decision-tree-based rule induction algorithm) with a new parallel genetic algorithm based on the idea of massive parallelism. The prime goal of the model is to produce a compact set of informative rules for any kind of classification problem. More specifically, the proposed model uses C4.5 as the base method to generate rules, which are then refined by the proposed parallel GA. The developed system is compared with pure C4.5 and with a hybrid system (C4.5 + sequential genetic algorithm) on six real-world benchmark data sets from the UCI (University of California at Irvine) machine learning repository. The experiments validate the effectiveness of the new model, and the results indicate in particular that the model performs well on large-volume data sets.

12.
Unsupervised classification is a highly important task in machine learning. Although the support vector machine (SVM) has achieved great success in supervised classification, it is much less used to classify unlabeled data points, and SVM-based clustering has many drawbacks, including sensitivity to nonlinear kernels and random initializations, high computational cost, and unsuitability for imbalanced data sets. In this paper, to exploit the advantages of SVM and overcome the drawbacks of SVM-based clustering methods, we propose a completely new two-stage unsupervised classification method that needs no initialization: a new unsupervised kernel-free quadratic surface SVM (QSSVM) model is proposed to avoid selecting kernels and related kernel parameters, and a golden-section algorithm is then designed to generate the appropriate classifier for balanced and imbalanced data. By studying certain properties of the proposed model, a convergent decomposition algorithm is developed to solve this non-convex QSSVM model effectively and efficiently (in terms of computational cost). Numerical tests on artificial and public benchmark data indicate that the proposed unsupervised QSSVM method outperforms well-known clustering methods (including SVM-based and other state-of-the-art methods), particularly in classification accuracy. Moreover, we extend the proposed method and apply it to credit risk assessment by incorporating t-test-based feature weights. The promising numerical results on benchmark personal credit data and real-world corporate credit data demonstrate the effectiveness, efficiency, and interpretability of the proposed method and indicate its potential in certain real-world applications.
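The abstract above does not say which quantity its golden-section algorithm searches over, so the following is only a sketch of the generic technique: a golden-section search for the minimizer of a one-dimensional unimodal function. The function name and the quadratic test objective are illustrative assumptions, not the authors' procedure.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0   # 1 / golden ratio, about 0.618

def golden_section_minimize(f, lo, hi, tol=1e-8):
    """Return an approximate minimizer of a unimodal f on [lo, hi]."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)            # left interior point
    d = a + INV_PHI * (b - a)            # right interior point
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2.0

best = golden_section_minimize(lambda x: (x - 2.0) ** 2, 0.0, 5.0)
```

Each iteration reuses one of the two previous function values, so the interval shrinks by a factor of about 0.618 at the cost of a single new evaluation.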

13.
Previous studies on financial distress prediction (FDP) mostly construct FDP models on a balanced data set, or use only traditional classification methods when modelling on an imbalanced data set, which often leads to an overestimation of a model's ability to recognize distressed companies. Our study focuses on support vector machine (SVM) methods for FDP based on imbalanced data sets. We propose a new imbalance-oriented SVM method that combines the synthetic minority over-sampling technique (SMOTE) with the Bagging ensemble learning algorithm and uses SVM as the base classifier. It is named the SMOTE-Bagging-based SVM-ensemble (SB-SVM-ensemble) and is theoretically more effective for FDP modelling on imbalanced data sets with a limited number of samples. For comparison, the traditional SVM method and three classical imbalance-oriented SVM methods (cost-sensitive SVM, SMOTE-SVM, and data-set-partition-based SVM-ensemble) are also introduced. We collect an imbalanced FDP data set from Chinese publicly traded companies and carry out 100 experiments to test the method's effectiveness empirically. The experimental results indicate that the new SB-SVM-ensemble method outperforms the traditional methods and is a useful tool for imbalanced FDP modelling.

14.
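This paper studies online virtual machine placement in cloud environments, a vector bin packing problem, and proposes a multi-dimensional space partition model and an online energy-efficient virtual machine placement algorithm, OEEVMP. The multi-dimensional space partition model guides virtual machine placement and avoids unbalanced use of multi-dimensional resources. Based on this model, OEEVMP strikes a balance between local and global optimality in the number of running physical machines, thereby improving the energy efficiency of virtual machine placement. In simulation experiments, OEEVMP is compared with the MFFD algorithm, and the results verify the feasibility and effectiveness of the proposed algorithm. Finally, the two parameters that control the model are analyzed and the best parameter combination is given.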
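To illustrate what "vector bin packing" means for online placement, here is a plain first-fit sketch: each arriving virtual machine goes onto the first running host that has enough free capacity in every resource dimension, and a new host is started only when none fits. This is a generic baseline shown under stated assumptions; it is neither OEEVMP nor MFFD, and the function name and data are hypothetical.

```python
def first_fit_placement(vm_demands, host_capacity):
    """Place VMs in arrival order; return the VM indices assigned to each host."""
    hosts = []                                    # each entry: [free, vm_indices]
    for index, demand in enumerate(vm_demands):
        if any(d > c for d, c in zip(demand, host_capacity)):
            raise ValueError(f"VM {index} exceeds the capacity of an empty host")
        for host in hosts:
            if all(d <= f for d, f in zip(demand, host[0])):
                break                             # first running host that fits
        else:
            host = [list(host_capacity), []]      # start a new physical machine
            hosts.append(host)
        host[0] = [f - d for f, d in zip(host[0], demand)]
        host[1].append(index)
    return [host[1] for host in hosts]

# Demands as (CPU cores, memory in GB); the values are illustrative.
placement = first_fit_placement([(4, 2), (3, 6), (5, 5), (2, 1)], (8, 8))
```

In this example the first host ends with one free core and no free memory, which is the kind of unbalanced use of multi-dimensional resources that the abstract's partition model aims to avoid.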

15.
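Remote sensing image classification is an important application of remote sensing technology and plays an important role in its development. Given the characteristics of remote sensing image data, the BP neural network is the model mainly used among current nonlinear methods. However, BP neural networks are sensitive to the initial weights and thresholds, easily fall into local minima, and converge slowly. To improve the classification accuracy for remote sensing images, an MEA-BP model is therefore proposed. The mind evolutionary algorithm (MEA) first replaces the BP algorithm for the initial search, and an improved BP algorithm then refines the optimized weights and thresholds of the network; a BP neural network classification model based on the mind evolutionary algorithm is thus built and applied to remote sensing image classification. Simulation results show that the new model effectively improves the accuracy of remote sensing image classification, offers a new method for the task, and has broad research value.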

16.
The primary model for cluster analysis is the latent class model. This model yields the mixture likelihood. Due to numerous local maxima, the success of the EM algorithm in maximizing the mixture likelihood depends on the initial starting point of the algorithm. In this article, good starting points for the EM algorithm are obtained by applying classification methods to randomly selected subsamples of the data. The performance of the resulting two-step algorithm, classification followed by EM, is compared to, and found superior to, the baseline algorithm of EM started from a random partition of the data. Though the algorithm is not complicated, comparing it to the baseline algorithm and assessing its performance with several classification methods is nontrivial. The strategy employed for comparing the algorithms is to identify canonical forms for the easiest and most difficult datasets to cluster within a large collection of cluster datasets and then to compare the performance of the two algorithms on these datasets. This has led to the discovery that, in the case of three homogeneous clusters, the most difficult datasets to cluster are those in which the clusters are arranged on a line and the easiest are those in which the clusters are arranged on an equilateral triangle. The performance of the two-step algorithm is assessed using several classification methods and is shown to be able to cluster large, difficult datasets consisting of three highly overlapping clusters arranged on a line with 10,000 observations and 8 variables.
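The dependence of EM on its starting point can be seen in a very small setting. The sketch below runs EM for a one-dimensional mixture of two Gaussians and takes the starting means as an argument; in the two-step scheme described above they would come from a classification method applied to a subsample, whereas here they are simply passed in. The function names, the unit starting variances, and the data are assumptions made for illustration.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_two_gaussians(data, start_means, iterations=50):
    """EM for a two-component 1-D Gaussian mixture from given starting means."""
    means = list(start_means)
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            dens = [weights[k] * normal_pdf(x, means[k], variances[k])
                    for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)   # floor avoids a collapsed component
    return weights, means, variances

# Two well-separated groups (illustrative data).
data = [0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1]
weights, means, variances = em_two_gaussians(data, start_means=(1.0, 4.0))
```

With well-separated groups almost any reasonable start converges to the same solution; the abstract's point is that with heavily overlapping clusters the choice of start decides which local maximum EM reaches.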

17.
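The parameters of a support vector machine directly affect its generalization ability. To counter the subjectivity of parameter selection, an improved genetic algorithm is proposed to optimize the parameters, and the approach is applied to the five-grade classification of personal bank credit. For this multi-class problem, three binary classifiers are designed, each with its own parameters, and experiments confirm that a finer classification can be achieved.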

18.
A clustering algorithm usually optimizes a qualification metric as it runs. In conventional clustering algorithms this metric treats all features as equally important; in other words, each feature participates in the clustering process to the same degree. Clearly, some features in a dataset carry more information than others, so some features should be given lower importance during clustering or classification because they carry less information, have higher variance, and so on. Artificial intelligence communities have therefore long wanted a weighting mechanism for any task that uses a number of features equally to make a decision. The open problem is how features can participate in the clustering process in a weighted manner (in any algorithm, but especially in clustering algorithms). This problem has recently been addressed by locally adaptive clustering (LAC). However, like its traditional competitors, LAC is inefficient on data with imbalanced clusters. This paper addresses the problem by proposing a weighted locally adaptive clustering (WLAC) algorithm based on the LAC algorithm. The WLAC algorithm, however, is sensitive to two parameters that must be tuned manually, and its performance depends on how well they are tuned. The paper proposes two solutions. The first uses a simple clustering ensemble framework to examine the sensitivity of the WLAC algorithm to manual tuning. The second is based on a cluster selection method.

19.
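The uncapacitated facility location problem (UFLP) is a classical combinatorial optimization problem that has been proved NP-hard: it is easy to state but hard to solve. First, based on the mathematical model of the UFLP and its specific characteristics, the operators of the bat algorithm are redesigned and a bat algorithm for the UFLP is given. Second, three feasibility-repair methods are constructed and combined with the bat algorithm for the UFLP and with the Lagrangian relaxation algorithm to design a Lagrangian bat algorithm for the problem. Finally, simulation examples and comparisons with other algorithms verify the feasibility of this hybrid algorithm for solving the UFLP and show that it is an effective approach to discrete problems.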
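Any metaheuristic for the UFLP, a bat algorithm included, needs an objective function for a candidate set of open facilities. A minimal sketch of that objective follows: the total cost is the sum of the fixed costs of the open facilities plus, for each customer, the cheapest service cost among the open facilities. The function name and the cost data are illustrative assumptions, not the paper's encoding.

```python
def uflp_cost(open_flags, fixed_costs, service_costs):
    """Total cost of a UFLP solution given as 0/1 flags per facility."""
    opened = [i for i, flag in enumerate(open_flags) if flag]
    if not opened:
        raise ValueError("at least one facility must be open")
    total = sum(fixed_costs[i] for i in opened)
    for row in service_costs:                 # one row per customer
        total += min(row[i] for i in opened)  # serve from the cheapest open site
    return total

# Two candidate facilities, three customers (illustrative data).
fixed_costs = [10, 12]
service_costs = [[2, 9], [8, 3], [4, 4]]
cost_first_only = uflp_cost([1, 0], fixed_costs, service_costs)
```

Because there are no capacity limits, each customer is simply assigned to its cheapest open facility, so a solution is fully described by the 0/1 vector of open facilities. That is what makes the problem a natural fit for discrete search operators.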
