Similar documents
20 similar documents found (search time: 15 ms)
1.
We introduce new criteria for growing classification trees with ordinal response variables. For this purpose, Breiman et al. (Classification and regression trees. Wadsworth, Belmont, 1984) had extended their twoing criterion to the ordinal case. Following the CART procedure, we extend the well-known Gini–Simpson criterion to the ordinal case. A second criterion is based on the exclusivity preference property (introduced for the nominal case by Taylor and Silverman in Stat Comput 3:147–161, 1993), suitably modified for ordinal responses. The proposed methods are compared with the ordered twoing criterion via simulations.
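The distance-weighted flavour of an ordinal impurity measure can be sketched as follows. This is a hypothetical weighting chosen for illustration (pairwise disagreement weighted by distance on the ordinal scale); the paper's exact generalised Gini–Simpson criterion may differ:

```python
def ordinal_gini(counts):
    """Ordinal-flavoured Gini-Simpson impurity of a tree node.

    `counts[k]` is the number of node observations in ordinal class k.
    Class-pair disagreements are weighted by their distance |i - j| on
    the ordinal scale, so confusing far-apart classes costs more than
    confusing adjacent ones (illustrative weighting, not the paper's).
    """
    n = sum(counts)
    p = [c / n for c in counts]
    return sum(p[i] * p[j] * abs(i - j)
               for i in range(len(p)) for j in range(len(p)))
```

A pure node scores 0, and a node split evenly between the two extreme classes scores higher than one split between adjacent classes, which is exactly the ordering a nominal Gini index cannot express.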

2.
A general methodology for selecting predictors for Gaussian generative classification models is presented. The problem is cast as a model selection problem. Three possible roles are considered for each predictor: a variable can be a relevant classification predictor or not, and an irrelevant variable can be either linearly dependent on a subset of the relevant predictors or independent of them. This variable selection model was inspired by previous work on variable selection in model-based clustering. A BIC-like model selection criterion is proposed and optimized through two embedded forward stepwise variable selection algorithms, one for classification and one for linear regression. Model identifiability and the consistency of the variable selection criterion are proved. Numerical experiments on simulated and real data sets illustrate the usefulness of this variable selection methodology. In particular, this well-grounded variable selection model is shown to markedly improve the classification performance of quadratic discriminant analysis in a high-dimensional context.
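The backbone of such a BIC-like criterion is the usual penalised log-likelihood trade-off: adding predictors raises the fit but pays log(n) per extra parameter. A minimal sketch (the paper's criterion additionally accounts for the regression of irrelevant variables on relevant ones):

```python
import math

def bic_score(loglik, n_params, n_samples):
    """BIC-like model selection criterion: -2*loglik + k*log(n).

    Smaller is better in this sign convention; a forward stepwise
    search accepts a candidate predictor only if it lowers the score.
    """
    return -2.0 * loglik + n_params * math.log(n_samples)
```

For example, a richer model must raise the log-likelihood by more than `0.5 * (extra params) * log(n)` to win:

```python
# model A: loglik -100 with 3 params; model B: loglik -98 with 8 params
# on n = 50 samples, B's fit gain does not cover its complexity penalty
```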

3.
In this paper we construct the linear support vector machine (SVM) based on the nonlinear rescaling (NR) methodology (see [Polyak in Math Program 54:177–222, 1992; Polyak in Math Program Ser A 92:197–235, 2002; Polyak and Teboulle in Math Program 76:265–284, 1997] and references therein). The NR formulation of the linear SVM leads to an algorithm that reduces the number of support vectors without compromising classification performance relative to the linear soft-margin SVM formulation. The NR algorithm computes both a primal and a dual approximation at each step. The dual variables associated with the data set provide important information about each data point and play the key role in selecting the set of support vectors. Experimental results on ten benchmark classification problems show that the NR formulation is feasible: the quality of discrimination is in most instances comparable to the linear soft-margin SVM, while the number of support vectors was in several instances substantially reduced.

4.
The performance of kernel-based methods, such as the support vector machine (SVM), is greatly affected by the choice of kernel function. Multiple kernel learning (MKL) is a promising family of machine learning algorithms that has attracted much attention in recent years. MKL combines multiple sub-kernels to obtain better results than single-kernel learning. To improve the efficiency of SVM and MKL, this paper derives a Kullback–Leibler kernel function for the SVM. The proposed method employs an improved ensemble learning framework, named KLMKB, which applies AdaBoost to learn multiple kernel-based classifiers. In the experiment on hyperspectral remote sensing image classification, features selected through the Optimum Index Factor (OIF) are used to classify the satellite image. We extensively examine the performance of our approach against relevant state-of-the-art algorithms on a number of benchmark classification data sets and a hyperspectral remote sensing image data set. Experimental results show that our method behaves stably and achieves noticeable accuracy across different data sets.
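The basic MKL combination rule is a weighted sum of sub-kernels, which is itself a valid kernel when the weights are non-negative. A minimal sketch of that rule (the paper's KLMKB framework additionally boosts the resulting classifiers with AdaBoost):

```python
def combined_kernel(kernels, weights):
    """MKL-style combined kernel: k(x, z) = sum_m w_m * k_m(x, z).

    `kernels` is a list of kernel functions and `weights` a list of
    non-negative combination weights; the returned callable is the
    combined kernel usable inside any kernel machine.
    """
    return lambda x, z: sum(w * k(x, z) for w, k in zip(weights, kernels))
```

For instance, combining a linear sub-kernel with a constant (bias) sub-kernel under weights (2, 3) evaluates to `2 * <x, z> + 3`.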

5.
We propose a penalized likelihood method to fit the linear discriminant analysis model when the predictor is matrix valued. We simultaneously estimate the means and the precision matrix, which we assume has a Kronecker product decomposition. Our penalties encourage pairs of response category mean matrix estimators to have equal entries and also encourage zeros in the precision matrix estimator. To compute our estimators, we use a blockwise coordinate descent algorithm. To update the optimization variables corresponding to response category mean matrices, we use an alternating minimization algorithm that takes advantage of the Kronecker structure of the precision matrix. We show that our method can outperform relevant competitors in classification, even when our modeling assumptions are violated. We analyze three real datasets to demonstrate our method’s applicability. Supplementary materials, including an R package implementing our method, are available online.
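The computational payoff of a Kronecker-structured precision matrix comes from the standard identity (A ⊗ B) vec(X) = vec(B X Aᵀ), which lets the large Kronecker product act on a matrix without ever being formed. A small sketch of that identity (the paper's alternating minimisation exploits this structure; the exact update rules are its own):

```python
def kron_matvec(A, B, X):
    """Apply (A kron B) to vec(X) via the identity
    (A kron B) vec(X) = vec(B @ X @ A.T), returned in matrix form.

    Avoids materialising the (pq x pq) Kronecker product when
    A is p x p and B is q x q.
    """
    # compute B @ X
    BX = [[sum(B[i][k] * X[k][j] for k in range(len(X)))
           for j in range(len(X[0]))] for i in range(len(B))]
    # compute (B @ X) @ A.T
    return [[sum(BX[i][k] * A[j][k] for k in range(len(A[0])))
             for j in range(len(A))] for i in range(len(BX))]
```

On 2×2 factors this costs a handful of multiplications instead of a 4×4 matrix-vector product, and the gap grows quadratically with the factor sizes.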

6.
Advances in Data Analysis and Classification - Support vector machine (SVM) is a powerful tool in binary classification, known to attain excellent misclassification rates. On the other hand, many...

7.
This paper presents a new scaling- and rotation-invariant encoding scheme for shapes. Support vector machines (SVMs) and artificial neural networks (ANNs) are used to classify shapes encoded by the new method. The SVM classification accuracy is 95.9 ± 2.9% over 14 categories and 79.2 ± 2.1% over 40 categories, showing that the SVM is one of the best tools for such classification problems. The experimental results showed that the SVM achieved better performance than the ANN. A sensitivity test shows that the SVM is quite robust to different parameter values. In addition, our coding method is comparable to a previous coding scheme in terms of SVM and ANN performance.

8.
Credit applicants are assigned to good or bad risk classes according to their record of defaulting. Each applicant is described by a high-dimensional input vector of situational characteristics and by an associated class label. A statistical model that maps the inputs to the labels can decide whether a new credit applicant should be accepted or rejected, by predicting the class label from the new inputs. Support vector machines (SVMs) from statistical learning theory can build such models from the data, requiring extremely weak prior assumptions about the model structure. Furthermore, SVMs divide a set of labelled credit applicants into subsets of ‘typical’ and ‘critical’ patterns. The correct class label of a typical pattern is usually very easy to predict, even with linear classification methods; such patterns do not contain much information about the classification boundary. The critical patterns (the support vectors) contain the less trivial training examples. For instance, linear discriminant analysis with prior training-subset selection via SVM also leads to improved generalization. Using non-linear SVMs, more ‘surprising’ critical regions may be detected, but owing to the relative sparseness of the data, this potential seems limited in credit-scoring practice.

9.
An expert system was desired for a group decision-making process. A highly variable data set from previous groups' decisions was available to simulate past group decisions. This data set has much missing information and contains many possible errors. Classification and regression trees (CART) was selected for rule induction, and compared with multiple linear regression and discriminant analysis. We conclude that CART's decision rules can be used for rule induction. CART uses all available information and can predict observations with missing data. Errors in results from CART compare well with those from multiple linear regression and discriminant analysis. CART results are easier to understand.

10.
A face recognition method based on important facial features is proposed. The important facial features are first selected and made explicit, and principal component analysis is applied to them. A support vector machine (SVM) classifier for the important features is then designed to locate those features in test face images, and an SVM face classifier assigns each face image to its class. Simulation experiments on the ORL face image database show that the method outperforms typical holistic-feature face recognition methods and is fairly robust.

11.
Multicategory Classification by Support Vector Machines
We examine the problem of how to discriminate between objects of three or more classes. Specifically, we investigate how two-class discrimination methods can be extended to the multiclass case. We show how linear programming (LP) approaches based on the work of Mangasarian and quadratic programming (QP) approaches based on Vapnik's support vector machine (SVM) can be combined to yield two new approaches to the multiclass problem. In LP multiclass discrimination, a single linear program is used to construct a piecewise-linear classification function. In our proposed multiclass SVM method, a single quadratic program is used to construct a piecewise-nonlinear classification function. Each piece of this function can take the form of a polynomial, a radial basis function, or even a neural network. For k > 2 classes, the SVM method as originally proposed required constructing a two-class SVM to separate each class from the remaining classes. Similarly, k two-class linear programs can be used for the multiclass problem. We performed an empirical study of the original LP method, the proposed k LP method, the proposed single QP method and the original k QP method. We discuss the advantages and disadvantages of each approach.
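The k-machine scheme mentioned above (one two-class machine separating each class from the rest, with prediction by the highest-scoring machine) can be sketched generically; `fit_binary` stands in for any two-class learner, SVM or LP-based:

```python
def one_vs_rest_train(X, y, classes, fit_binary):
    """Train k two-class machines, one per class, each separating
    that class (+1) from the remaining classes (-1).

    `fit_binary(X, labels)` is any two-class learner that returns a
    scoring function; the choice of learner is an assumption here.
    """
    return {c: fit_binary(X, [1 if yi == c else -1 for yi in y])
            for c in classes}

def one_vs_rest_predict(models, x):
    # assign the class whose machine scores x the highest
    return max(models, key=lambda c: models[c](x))
```

With a trivial nearest-centroid scorer plugged in as `fit_binary`, the combined rule already partitions a 1-D, 3-class toy problem correctly.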

12.
Support vector machine (SVM) training may be posed as a large quadratic program (QP) with bound constraints and a single linear equality constraint. We propose a (block) coordinate gradient descent method for solving this problem and, more generally, linearly constrained smooth optimization. Our method is closely related to the decomposition methods currently popular for SVM training. We establish global convergence and, under a local error bound assumption (which is satisfied by the SVM QP), a linear rate of convergence when the coordinate block is chosen by a Gauss–Southwell-type rule to ensure sufficient descent. We show that, for the SVM QP with n variables, this rule can be implemented in O(n) operations using Rockafellar's notion of conformal realization. Thus, for SVM training, our method requires only O(n) operations per iteration and, in contrast to existing decomposition methods, achieves linear convergence without additional assumptions. We report numerical experience with the method on some large SVM QPs arising from two-class data classification. Our experience suggests that the method can be efficient for SVM training with nonlinear kernels.
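The decomposition idea underlying such methods is that moving a *pair* of dual variables along the direction u = yᵢeᵢ − yⱼeⱼ preserves the equality constraint yᵀα = 0, so each iteration only solves a clipped one-dimensional problem. A minimal sketch of one such pairwise step on the SVM dual (illustrating the mechanism only; the paper's Gauss–Southwell block rule and convergence machinery are its own):

```python
def svm_dual_pair_step(alpha, y, Q, i, j, C):
    """One pairwise coordinate ascent step on the SVM dual QP
        max  e'a - 0.5 a'Qa   s.t.  0 <= a <= C,  y'a = 0,
    where Q[k][l] = y_k * y_l * K(x_k, x_l).

    Moving along u = y_i*e_i - y_j*e_j keeps y'a constant, so only
    the step length d needs to be optimised and clipped to the box.
    """
    n = len(alpha)
    # gradient of the dual objective: g = e - Q a
    g = [1.0 - sum(Q[k][l] * alpha[l] for l in range(n)) for k in range(n)]
    gu = g[i] * y[i] - g[j] * y[j]                         # slope along u
    uQu = Q[i][i] + Q[j][j] - 2.0 * y[i] * y[j] * Q[i][j]  # curvature along u
    if uQu <= 0.0:
        return list(alpha)
    d = gu / uQu                                           # unclipped step

    def feasible(a, s):
        # range of d keeping 0 <= a + s*d <= C, for s in {+1, -1}
        return (-a, C - a) if s > 0 else (a - C, a)

    lo1, hi1 = feasible(alpha[i], y[i])
    lo2, hi2 = feasible(alpha[j], -y[j])
    d = max(max(lo1, lo2), min(d, min(hi1, hi2)))
    new = list(alpha)
    new[i] += d * y[i]
    new[j] -= d * y[j]
    return new
```

On the two-point toy problem x₁ = +1 (y = +1), x₂ = −1 (y = −1) with a linear kernel, one step lands exactly on the known optimum α = (0.5, 0.5), giving the maximum-margin separator w = 1.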

13.
The classification problem consists of using some known objects, usually described by a large vector of features, to induce a model that assigns other objects to known classes. The present paper deals with the optimization of nearest neighbor classifiers via metaheuristic algorithms, namely tabu search, genetic algorithms and ant colony optimization. The performance of the proposed algorithms is tested using data from 1411 firms in the loan portfolio of a leading Greek commercial bank, classifying the firms into groups representing different levels of credit risk. On the same data, the algorithms are also compared with UTADIS, SVM, CART, and other classification methods.

14.
We propose a new binary classification and variable selection technique especially designed for high-dimensional predictors. Among many predictors, typically only a small fraction have a significant impact on prediction. In such a situation, more interpretable models with better prediction accuracy can be obtained by variable selection along with classification. By adding an ℓ1-type penalty to the loss function, common classification methods such as logistic regression or support vector machines (SVM) can perform variable selection. Existing penalized SVM methods all attempt to solve for all the parameters of the penalization problem jointly. When the data dimension is very high, this joint optimization problem is very complex and involves a lot of memory allocation. In this article, we propose a new penalized forward search technique that reduces the high-dimensional optimization problem to a sequence of one-dimensional optimizations by iterating the selection steps. The new algorithm can be regarded as a forward selection version of the penalized SVM and its variants. The advantage of optimizing in one dimension is that the location of the optimum can be found by intelligent search, exploiting the convexity and piecewise linear or quadratic structure of the criterion function. In each step, the predictor most able to predict the outcome is added to the model, and the search is repeated until convergence. Comparison of our new classification rule with the ℓ1-SVM and other common methods shows very promising performance: the proposed method leads to much leaner models without compromising misclassification rates, particularly for high-dimensional predictors.
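The appeal of reducing the search to one dimension is that the penalized univariate problem often has a closed-form solution. The sketch below illustrates one forward-selection step under a squared-error surrogate, where the 1-D ℓ1-penalized fit is soft-thresholding; this is an illustrative stand-in, not the paper's SVM loss or its exact search rule:

```python
def soft_threshold(z, lam):
    """Closed-form minimiser of 0.5*(b - z)^2 + lam*|b|."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def penalized_forward_step(X, r, selected, lam):
    """One penalized forward-search step (squared-error surrogate).

    For each unselected predictor j, solve the 1-D penalized problem
        min_b 0.5 * ||r - b*x_j||^2 + lam*|b|
    in closed form and return (j, b, gain) for the predictor giving
    the largest criterion decrease, or None if no predictor enters.
    """
    best = None
    for j in range(len(X[0])):
        if j in selected:
            continue
        xj = [row[j] for row in X]
        sxx = sum(v * v for v in xj)
        z = sum(v * ri for v, ri in zip(xj, r)) / sxx
        b = soft_threshold(z, lam / sxx)
        # decrease of the criterion relative to b = 0
        gain = z * b * sxx - 0.5 * b * b * sxx - lam * abs(b)
        if b != 0.0 and (best is None or gain > best[2]):
            best = (j, b, gain)
    return best
```

Each step costs one pass over the remaining predictors; repeating it and updating the residual `r` gives the iterative forward search described above.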

15.
This paper proposes an objective personal-credit indicator system. Classification and regression trees are first used to quantify each indicator's influence on credit status, and these quantified values set a different scoring weight for each indicator. A risk measure is then defined to determine the score of each value an indicator can take, yielding a new evaluation indicator system. An empirical analysis on real sample data shows that the new indicator system evaluates borrower risk well.

16.
We develop several variable selection methods that use signomial functions to select relevant variables for multi-class classification, taking all classes into consideration. We introduce an \(\ell _{1}\)-norm regularization function to measure the number of selected variables, and two adaptive parameters that assign different importance weights to different variables according to their relative importance. The proposed methods select variables suitable for predicting the output and automatically determine how many variables to select. With the selected variables, they then obtain the resulting classifiers directly, without an additional classification step. The classifiers obtained by the proposed methods yield classification accuracy competitive with or better than that of existing methods.

17.
Driven by the substantial economic stakes of recommender systems in e-commerce, malicious users mount shilling attacks for illicit profit, manipulating recommendation results and exposing recommender systems to serious information-security threats; identifying and detecting shilling attacks is therefore key to securing them. Traditional support vector machine (SVM) methods are constrained by two problems at once: small samples and class imbalance. This paper proposes a shilling-attack detection method combining semi-supervised SVM with an asymmetric ensemble strategy. An initial SVM is first trained; a k-nearest-neighbor step then refines the label quality of samples near the decision boundary, and a mixed set of labeled and unlabeled data reduces the need for labeled data. Finally, an asymmetrically weighted ensemble strategy is designed that emphasizes classification accuracy on attack samples, reducing the ensemble classifier's sensitivity to class imbalance. Experimental results show that the method effectively addresses the small-sample and imbalanced-data problems and achieves good detection performance.

18.
The topological classification is discussed for real polynomials of degree 4 in two real independent variables whose critical points and critical values are all different. It is proved that among the 17 746 topological types of smooth functions with the same number of critical points, at most 426 types are realizable by polynomials of degree 4.

19.
Classification of samples into two or more classes is of interest to scientists in almost every field. Traditional statistical methodology for classification does not work well when there are more variables (p) than samples (n), and it is highly sensitive to outlying observations. In this study, a robust partial least squares based classification method is proposed to handle data containing outliers where $n \ll p$. The proposed method is applied to well-known benchmark datasets and its properties are explored through an extensive simulation study.

20.
As a traditional vector-space-based machine learning method, the support vector machine cannot directly handle tensor-valued data: flattening tensors destroys their spatial structure and invites both the curse of dimensionality and small-sample problems. As a higher-order generalization of the support vector machine, the support tensor machine for classifying tensor data has attracted wide attention and has been applied in remote sensing, video analysis, finance, fault diagnosis, and other fields. As with support vector machines, existing support tensor machine models mostly adopt surrogate functions of the L0/1 loss. This paper instead uses the original L0/1 function directly as the loss and, exploiting the low-rank structure of tensor data, builds a low-rank support tensor machine model for binary classification. For this nonconvex, discontinuous tensor optimization problem, an alternating direction method of multipliers (ADMM) is designed, and numerical experiments on simulated and real data verify the effectiveness of the model and the algorithm.
