Similar literature
1.
A method for feature selection in linear regression based on an extension of Akaike’s information criterion is proposed. Using the classical Akaike’s information criterion (AIC) for feature selection requires an exhaustive search through all subsets of features, which is computationally prohibitive. A new information criterion is proposed that is a continuous extension of AIC. As a result, the feature selection problem is reduced to a smooth optimization problem, and an efficient procedure for solving it is derived. Experiments show that the proposed method selects features in linear regression efficiently. In the experiments, the proposed procedure is compared with the relevance vector machine, a feature selection method based on a Bayesian approach, and both procedures yield similar results. The main distinction of the proposed method is that certain regularization coefficients are identically zero. This makes it possible to avoid the underfitting effect that is characteristic of the relevance vector machine. A special case (the so-called nondiagonal regularization) is considered in which both methods are identical.
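
To make the cost of the exhaustive search concrete, here is a minimal sketch of best-subset selection by AIC. It assumes the common Gaussian-noise form `AIC = n * ln(RSS / n) + 2k` (defined up to an additive constant); the abstract does not state the exact criterion, and every function name below is hypothetical. The point is only that the loop fits 2^d - 1 models for d features.

```python
import itertools
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss_for_subset(X, y, subset):
    """Residual sum of squares of least squares restricted to `subset`."""
    cols = list(subset)
    # Normal equations: (Xs^T Xs) w = Xs^T y
    gram = [[sum(row[i] * row[j] for row in X) for j in cols] for i in cols]
    rhs = [sum(row[i] * t for row, t in zip(X, y)) for i in cols]
    w = solve_linear(gram, rhs)
    return sum(
        (t - sum(wi * row[i] for wi, i in zip(w, cols))) ** 2
        for row, t in zip(X, y)
    )


def aic(rss, n, k):
    """Assumed Gaussian-noise AIC, up to an additive constant."""
    return n * math.log(rss / n) + 2 * k


def best_subset_by_aic(X, y):
    """Exhaustive search; returns (best subset, number of models fitted)."""
    n, d = len(X), len(X[0])
    best_score, best_subset, evaluated = None, None, 0
    for k in range(1, d + 1):
        for subset in itertools.combinations(range(d), k):
            evaluated += 1
            score = aic(rss_for_subset(X, y, subset), n, k)
            if best_score is None or score < best_score:
                best_score, best_subset = score, subset
    return best_subset, evaluated
```

With 30 features this loop would already fit more than a billion models, which is the motivation for replacing the discrete search with a smooth optimization problem.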

2.
In this paper, we propose a new optimization framework for improving feature selection in medical data classification. We call this framework Support Feature Machine (SFM). SFM searches for the group of features that shows the strongest separability between two classes, where separability is measured in terms of inter-class and intra-class distances. The objective of the SFM optimization model is to maximize the number of correctly classified training samples, that is, samples whose intra-class distances are smaller than their inter-class distances. This concept can be combined with a modified nearest neighbor rule for unbalanced data. In addition, a variation of SFM that provides feature weights (prioritization) is presented. The proposed SFM framework and its extensions were tested on 5 real medical datasets related to the diagnosis of epilepsy, breast cancer, heart disease, diabetes, and liver disorders. The classification performance of SFM is compared with that of support vector machine (SVM) classification and Logical Analysis of Data (LAD), which is also an optimization-based feature selection technique. SFM gives very good classification results, yet uses far fewer features to make the decision than SVM and LAD. This result has significant implications for diagnostic practice. The outcome of this study suggests that the SFM framework can be used as a quick decision-making tool in real clinical settings.
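
The separability criterion described above can be illustrated with a small sketch. It is not the SFM optimization model itself: it only counts, for a fixed feature subset, the training samples whose average intra-class distance is smaller than their average inter-class distance. The use of averages, the L1 distance, and the name `count_separated` are all assumptions made for illustration.

```python
def count_separated(X, labels, features):
    """Count samples that are closer (on average) to their own class.

    X        : list of feature tuples
    labels   : class label per sample
    features : indices of the features used to measure distance
    """
    def dist(a, b):
        # L1 distance restricted to the selected features (assumed metric)
        return sum(abs(a[j] - b[j]) for j in features)

    count = 0
    for i, (xi, yi) in enumerate(zip(X, labels)):
        intra = [dist(xi, xk)
                 for k, (xk, yk) in enumerate(zip(X, labels))
                 if k != i and yk == yi]
        inter = [dist(xi, xk) for xk, yk in zip(X, labels) if yk != yi]
        if intra and inter and sum(intra) / len(intra) < sum(inter) / len(inter):
            count += 1
    return count
```

A feature selection procedure in this spirit would search over `features` for the subset that maximizes the returned count.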

3.
A highly accurate algorithm, based on support vector machines formulated as linear programs (Refs. 1–2), is proposed here as a completely unconstrained minimization problem (Ref. 3). Combined with a chunking procedure (Ref. 4), this approach, which requires nothing more complex than a linear equation solver, leads to a simple and accurate method for classifying million-point datasets. Because a 1-norm support vector machine underlies the proposed approach, the method suppresses input space features as well. A state-of-the-art linear programming package (CPLEX, Ref. 5) fails to solve problems handled by the proposed algorithm. This research was supported by National Science Foundation Grants CCR-0138308 and IIS-0511905.
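
For orientation, a 1-norm SVM is commonly written as the following linear program. This is the standard textbook form, given here as an assumption; the abstract does not show the paper's own notation. Here $A \in \mathbb{R}^{m \times n}$ holds the data points as rows, $D$ is the diagonal matrix of $\pm 1$ labels, $e$ is a vector of ones, $y$ is the slack vector, and $\nu > 0$ is a trade-off parameter.

$$
\min_{w,\,\gamma,\,y}\ \nu\, e^{\top} y + \|w\|_1
\quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e,\quad y \ge 0 .
$$

Writing $w = p - q$ with $p, q \ge 0$ and replacing $\|w\|_1$ by $e^{\top}(p + q)$ makes the problem linear. The 1-norm penalty tends to drive many components of $w$ to exactly zero, which is why such a formulation suppresses input features.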

4.
This paper presents a novel knowledge-based linear classification model for multi-category discrimination of sets or objects with prior knowledge. The prior knowledge takes the form of multiple polyhedral sets belonging to one or more categories or classes, and it is introduced as additional constraints into the formulation of the Tikhonov linear least squares multi-class support vector machine model. The resulting formulation leads to a least squares problem that can be solved using matrix methods or iterative methods. Investigations include the development of a linear knowledge-based classification model extended to the case of multi-categorical discrimination and expressed as a single unconstrained optimization problem. Advantages of this formulation include explicit expressions for the classification weights of the classifier(s) and the ability to incorporate prior knowledge directly into the classifiers. In addition, it can provide fast solutions for the optimal classification weights for multi-categorical separation without the use of specialized solver software. To evaluate the model, data and prior knowledge from the Wisconsin breast cancer prognosis and two-phase flow regimes in pipes were used to train and test the proposed formulation.

5.
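When a traditional support vector machine (SVM) is used to classify imbalanced data, the true minority-class support vectors are too few and hard to identify, so classification performance is unsatisfactory. To address this problem, an imbalanced-data classification method based on SVM hybrid sampling (BSMS) is proposed. The method first partitions the original imbalanced data, after SVM classification, by location into a support vector region (SV), a majority-class non-support-vector region (MN...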

6.
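To exploit the strengths of SVMs in personal credit assessment while overcoming their weaknesses, a personal credit assessment model based on a committee machine of support vector machines is proposed. The model is combined with a method that constructs new training samples by estimating attribute utility functions. Empirical analysis and a comparison with the plain SVM method show that the model predicts and classifies better, faster, and with greater adaptability.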

7.
Traditionally, robust and fuzzy support vector machine models are used to handle binary classification problems with noise and outliers. These models in general suffer from the negative effects of mislabeled training points and disregard position information. In this paper, we propose a novel method to better address these issues. First, we adopt the intuitionistic fuzzy set approach to detect suspected mislabeled training points. Then we omit their labels but use their full position information to build a semi-supervised support vector machine (\(\mathrm{S^3VM}\)) model. After that, we reformulate the corresponding model as a non-convex problem and design a branch-and-bound algorithm to solve it. A new lower bound estimator is used to improve the accuracy and efficiency for binary classification. Numerical tests are conducted to compare the performance of the proposed method with other benchmark support vector machine models. The results strongly support the superior performance of the proposed method.

8.
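In 2019 China's green bond issuance remained among the highest in the world, and green bonds became an important financing channel for private environmental protection enterprises. Since 2018, however, frequent credit risk events at environmental protection enterprises have sounded an alarm, and building a suitable credit-risk early-warning mechanism for private environmental protection enterprises is urgent. Environmental protection is an emerging industry dominated by state-owned enterprises, and the sample data for private enterprises are small in size and high in dimension, which limits the applicability of traditional credit risk models. A weighted support vector machine model is therefore adopted: it assigns different weights to samples of different classes and uses a large number of financial features to build the risk early-warning model. The study finds that the weighted support vector machine model has excellent early-warning performance. Because environmental protection enterprises have long capital recovery cycles and high up-front project investment, the authors recommend strengthening financial management, safeguarding asset liquidity, and building a complete industrial chain.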

9.
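For the early diagnosis of tumors, a feature extraction method based on the lifting wavelet transform is proposed for analyzing and discriminating tumor data samples. The method applies the lifting wavelet transform to gene expression profile microarray data from 190 liver cancer cases (including controls) and 107 lung cancer cases (including controls), extracts the low-frequency information of the signal, and trains a support vector machine to construct a classifier that distinguishes cancer from non-cancer samples. Experimental results show that the feature genes extracted by the lifting wavelet transform yield a high classification rate when fed to the classifier, and that either a linear kernel or a radial basis function kernel gives good classification results in the support vector machine. The model was tested on 20 randomly selected gene expression profile microarray samples with very good results, so the proposed method has some practical value for tumor diagnosis.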

10.
Supervised classification is an important part of corporate data mining to support decision making in customer-centric planning tasks. The paper proposes a hierarchical reference model for support vector machine based classification within this discipline. The approach balances the conflicting goals of transparent yet accurate models and compares favourably to alternative classifiers in a large-scale empirical evaluation in real-world customer relationship management applications. Recent advances in support vector machine oriented research are incorporated to approach feature, instance and model selection in a unified framework.

11.
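A face recognition method based on important facial features is proposed. The important features of the face are first selected and made concrete, and principal component analysis is applied to them. A support vector machine (SVM) is then used to design an important-feature classifier that identifies the important features in a test face image, and an SVM face classifier is designed to determine the class of the face image. Simulation experiments on the ORL face image database show that the method outperforms general face recognition methods based on holistic features and is fairly robust.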

12.
In this paper, we formally establish connections between two standard approaches proposed for resolving multi-objective programs, namely, the nonpreemptive and the preemptive methods. We demonstrate in the linear case that, if the preemptive problem has an optimal solution, then there exists a set of weights for the nonpreemptive problem such that any optimal solution to the nonpreemptive problem is optimal for the preemptive problem. Conversely, and more importantly, any optimal solution to the preemptive problem is optimal for the nonpreemptive problem. A similar result is established for arbitrary multi-objective functions being optimized over a finite discrete set. Thus, the preemptive problem is subsumed within the nonpreemptive problem in these cases. Although we actually construct a set of equivalent weights, we do not advocate our technique as a computational device for solving the preemptive problem. However, a previous attempt (Ref. 1), which does prescribe a set of equivalent weights to solve a preemptive problem as a linear program, is shown to be erroneous. Moreover, our constructive proof exhibits the features of the problem which govern the determination of such equivalent weights.
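
The finite discrete case can be illustrated with a small sketch. It assumes integer-valued objectives that are all minimized, with the first objective having the highest priority, and it uses a simple mixed-radix weight construction chosen for illustration; this is not the construction from the paper, and the function names are hypothetical.

```python
def lexicographic_min(points):
    """Preemptive optimum: tuples compare lexicographically in Python."""
    return min(points)


def mixed_radix_weights(points):
    """Weights under which the weighted sum reproduces the priority order.

    Each weight exceeds the largest possible total variation of all
    lower-priority terms, so a unit gain in a higher-priority objective
    can never be offset by lower-priority ones (integer objectives assumed).
    """
    m = len(points[0])
    spans = [max(p[i] for p in points) - min(p[i] for p in points)
             for i in range(m)]
    weights = [1] * m
    for i in range(m - 2, -1, -1):
        weights[i] = weights[i + 1] * (spans[i + 1] + 1)
    return weights


def weighted_min(points, weights):
    """Nonpreemptive optimum for the given weights."""
    return min(points, key=lambda p: sum(w * v for w, v in zip(weights, p)))
```

For objective vectors `[(3, 1, 9), (2, 7, 7), (2, 5, 8), (2, 5, 6), (4, 0, 0)]` the weights come out as `[80, 10, 1]`, and both formulations select `(2, 5, 6)`. Weights of this kind grow quickly with the number of priority levels, which is consistent with the authors' remark that equivalent weights are not recommended as a computational device.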

13.
In this paper, we propose a kernel-free semi-supervised quadratic surface support vector machine model for binary classification. The model is formulated as a mixed-integer programming problem, which is equivalent to a non-convex optimization problem with absolute-value constraints. Using relaxation techniques, we derive a semi-definite programming problem for semi-supervised learning. The proposed model is tested by solving this problem on some artificial and public benchmark data sets. Preliminary computational results indicate that, in terms of classification accuracy, the proposed method outperforms some well-known existing methods for solving semi-supervised support vector machines with a Gaussian kernel.

14.
In this paper, a novel fuzzy support vector machine (FSVM) coupled with a memetic particle swarm optimization (MPSO) algorithm is introduced. Its application to a license plate recognition problem is studied comprehensively. The proposed recognition model comprises linear FSVM classifiers which are used to locate a two-character window of the license plate. A new MPSO algorithm is constructed which consists of three layers, i.e., a global optimization layer, a component optimization layer, and a local optimization layer. During the construction process, MPSO performs FSVM parameter tuning, feature selection, and training instance selection simultaneously. A total of 220 real Malaysian car plate images are used for evaluation. The experimental results indicate the effectiveness of the proposed model for license plate recognition problems.

15.
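To improve the numerical performance of the proximal support vector machine (PSVM), an $\ell_0$-norm regularization term is introduced into the PSVM model, giving a sparse proximal support vector machine (SPSVM) with improved feature selection ability. However, problems with an $\ell_0$-norm regularization term are often NP-hard. To overcome this difficulty, a non-convex continuous function is used to approximate the $\ell_0$-norm, and a suitable DC decomposition converts the problem into a DC program to be solved; the convergence of the algorithm is also discussed. Numerical experiments show that the proposed method is effective and stable on both simulated and real data.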

16.
A linear programming model is proposed for assigning linear attribute weights in the journal-ranking problem. The constraints in the model are derived solely from any quasi-dominance relations that can be established between the journals. The objective function of the model minimizes the maximum difference between the implied valuations for the pair of journals that define a constraint. In the sense that personal inputs are not introduced, the derived weights are preference neutral. The feasibility of the procedure is demonstrated for two sets of data. By considering various random samples of journals from the larger data set, it is shown that large differences can emerge in the attribute weights and in the journal rankings from different samples of journals, even when the sample sizes are large relative to the population size.

17.
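Zhang Wen, Wang Qiang, Tang Zixu, Qin Guangjie, Li Jian. 《运筹与管理》 (Operations Research and Management Science), 2022, 31(11): 167-173
Advances in machine learning have improved the accuracy of online fake review detection, but current machine learning models lack enough labeled data for training. Based on generative adversarial networks (GAN), this paper proposes a review dataset expansion method, GAN-RDE (GAN-Review Dataset Expansion), to address the scarcity of training data in fake review detection. Specifically, the initial review data are first divided into a genuine review dataset and a fake review dataset, and a GAN is trained on each to generate vectors that follow the feature distributions of genuine and fake reviews, respectively. The vectors obtained from GAN training are then merged with the feature-word vector matrix of the initial review dataset to expand the model training data. Finally, naive Bayes, a multilayer perceptron, and a support vector machine are used as base classifiers to compare fake review detection before and after data expansion. Experimental results show that expanding the review dataset with GAN-RDE significantly improves the accuracy with which machine learning models detect fake reviews.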

18.
Feature selection consists of choosing a subset of available features that capture the relevant properties of the data. In supervised pattern classification, a good choice of features is fundamental for building compact and accurate classifiers. In this paper, we develop an efficient feature selection method using the zero-norm $\ell_0$ in the context of support vector machines (SVMs). The discontinuity of $\ell_0$ at the origin makes the corresponding optimization problem difficult to solve. To overcome this drawback, we use a robust DC (difference of convex functions) programming approach, which is a general framework for non-convex continuous optimisation. We consider an appropriate continuous approximation to $\ell_0$ such that the resulting problem can be formulated as a DC program. Our DC algorithm (DCA) converges finitely and requires solving one linear program at each iteration. Computational experiments on standard datasets, including challenging problems from the NIPS 2003 feature selection challenge and gene selection for cancer classification, show that the proposed method is promising: while it suppresses up to more than 99% of the features, it can still provide a good classification. Moreover, the comparative results illustrate the superiority of the proposed approach over standard methods such as classical SVMs and the feature selection concave method.
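
As an illustration of what a DC-friendly approximation of $\ell_0$ can look like, one widely used choice is the concave exponential function. The abstract does not say which approximation the paper adopts, so the choice below and the parameter $\alpha > 0$ are assumptions.

$$
\|w\|_0 \;\approx\; \sum_{i=1}^{n} \bigl(1 - e^{-\alpha |w_i|}\bigr),
\qquad
1 - e^{-\alpha |t|} \;=\; \underbrace{\alpha |t|}_{g(t)} \;-\; \underbrace{\bigl(\alpha |t| - 1 + e^{-\alpha |t|}\bigr)}_{h(t)} .
$$

Both $g$ and $h$ are convex: $g$ is a scaled absolute value, and $h$ is even with $h(0) = 0$, $h'(0) = 0$, and $h''(t) = \alpha^2 e^{-\alpha |t|} > 0$ for $t \neq 0$. A DC algorithm then linearizes $h$ at the current iterate and minimizes the resulting convex problem; because $g$ is polyhedral, with a linear SVM loss each such subproblem can be posed as a linear program.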

19.
In previous studies, a wrapper feature selection method for decision support in the steel sheet incremental cold shaping process (SSICS) was proposed. The problem included both regression and classification, and the learned models were neural networks and support vector machines, respectively. SSICS is the type of problem in which the number of features is similar to the number of instances in the data set, which is representative of many real-world decision support problems found in industry. This study focuses on several questions and improvements that were left open and offers proposals for each of them. More specifically, this study evaluates the relevance of different cross validation methods for the learned models, and also proposes several improvements, such as allowing the number of chosen features and some of the parameters of the neural networks to evolve accordingly. Well-known data sets have been used in the experiments, and an in-depth analysis of the experimental results is included. $5 \times 2$ CV has been found to be the most interesting cross validation method for this kind of problem. In addition, adapting the number of features and, consequently, the model parameters clearly improves the performance of the approach. The different enhancements have been applied to the real-world problem, and several conclusions have been drawn from the results obtained.
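
For readers unfamiliar with $5 \times 2$ CV, the scheme repeats 2-fold cross validation five times with a fresh shuffle each time, giving ten train/test pairs. The sketch below generates such index splits with the standard library only; the function name and the `seed` parameter are hypothetical, and the abstract does not describe how the authors implemented their splits.

```python
import random


def five_by_two_splits(n_samples, seed=0):
    """Return 10 (train_indices, test_indices) pairs for 5x2 CV."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    half = n_samples // 2
    splits = []
    for _ in range(5):
        rng.shuffle(indices)  # new random partition for each repetition
        fold_a = sorted(indices[:half])
        fold_b = sorted(indices[half:])
        # Each half serves once for training and once for testing.
        splits.append((fold_a, fold_b))
        splits.append((fold_b, fold_a))
    return splits
```

Because every model is trained on only half of the data, this scheme is comparatively cheap inside a wrapper method, where each candidate feature subset requires its own round of training.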

20.
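Feature selection for interval-valued symbolic data reduces the dimensionality of the data and extracts its key features. This paper proposes a new feature selection method for interval-valued symbolic data. First, the method measures the similarity of interval numbers with the interval Hausdorff distance and the interval Euclidean distance, and estimates the feature weights of the interval-valued symbolic data by building an optimization model that maximizes the similarity between sample points and their class centers. Second, a classifier is built from the feature weights to evaluate the quality of the estimated weights. Finally, to verify the effectiveness of the method, numerical experiments are conducted on synthetic and real datasets. The results show that the method effectively removes irrelevant features and identifies the features related to the class labels.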
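
The two interval distances named above can be sketched as follows. The definitions used here are the usual endpoint-based ones (for closed intervals the Hausdorff distance reduces to the larger endpoint difference); the abstract does not spell out its exact formulas, so treat these, together with the weighted combination and all function names, as assumptions.

```python
import math


def hausdorff_interval(a, b):
    """Hausdorff distance between closed intervals a=(lo, hi), b=(lo, hi)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def euclidean_interval(a, b):
    """Endpoint-based Euclidean distance between two intervals."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)


def weighted_distance(x, center, weights, dist=hausdorff_interval):
    """Weighted sum of per-feature interval distances to a class center.

    x, center : sequences of intervals, one per feature
    weights   : one non-negative weight per feature (hypothetical input)
    """
    return sum(w * dist(xi, ci) for w, xi, ci in zip(weights, x, center))
```

A feature whose weight is driven to zero no longer contributes to `weighted_distance`, which is how weight estimation of this kind can act as feature selection.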
