Similar Documents
20 similar documents retrieved.
1.
We propose a new binary classification and variable selection technique designed especially for high-dimensional predictors. Among many predictors, typically only a small fraction have a significant impact on prediction. In such a situation, more interpretable models with better prediction accuracy can be obtained by performing variable selection along with classification. By adding an ℓ1-type penalty to the loss function, common classification methods such as logistic regression or support vector machines (SVM) can perform variable selection. Existing penalized SVM methods all attempt to solve for all parameters of the penalized problem jointly. When the data dimension is very high, this joint optimization problem is complex and requires substantial memory. In this article, we propose a new penalized forward search technique that reduces the high-dimensional optimization problem to a sequence of one-dimensional optimizations by iterating the selection steps. The new algorithm can be regarded as a forward selection version of the penalized SVM and its variants. The advantage of optimizing in one dimension is that the location of the optimum can be found by an intelligent search that exploits the convexity and piecewise linear or quadratic structure of the criterion function. In each step, the predictor most able to predict the outcome is added to the model. The search is then repeated iteratively until convergence. Comparison of our new classification rule with the ℓ1-SVM and other common methods shows very promising performance, in that the proposed method leads to much leaner models without compromising misclassification rates, particularly for high-dimensional predictors.
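To make the one-dimensional reduction concrete, the sketch below implements a naive penalized forward (coordinate-wise) search for a linear classifier with hinge loss and an ℓ1 penalty. It is a hedged illustration, not the authors' algorithm: a generic bounded scalar optimizer stands in for the intelligent search that exploits the piecewise linear or quadratic structure, and all names, bounds, and stopping rules are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def objective(beta, X, y, lam):
    # Hinge loss plus l1 penalty; labels y are in {-1, +1}.
    margins = 1.0 - y * (X @ beta)
    return np.mean(np.maximum(margins, 0.0)) + lam * np.sum(np.abs(beta))

def coord_objective(val, j, beta, X, y, lam):
    # Objective as a function of coordinate j alone, others held fixed.
    b = beta.copy()
    b[j] = val
    return objective(b, X, y, lam)

def forward_penalized_svm(X, y, lam=0.1, max_steps=50, tol=1e-6):
    p = X.shape[1]
    beta = np.zeros(p)
    for _ in range(max_steps):
        best_j, best_val, best_obj = None, None, objective(beta, X, y, lam)
        for j in range(p):  # one-dimensional search along each coordinate
            res = minimize_scalar(coord_objective, bounds=(-10.0, 10.0),
                                  method="bounded", args=(j, beta, X, y, lam))
            if res.fun < best_obj - tol:
                best_j, best_val, best_obj = j, res.x, res.fun
        if best_j is None:  # converged: no coordinate improves the criterion
            break
        beta[best_j] = best_val  # add/update the single most useful predictor
    return beta
```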

2.
Classification of high-dimensional data with thousands to tens of thousands of dimensions is a challenging task due to the high dimensionality and the quality of the feature set. The problem can be addressed by using feature selection to choose only informative features, or feature construction to create new high-level features. Genetic programming (GP) using a tree-based representation can be used for both feature construction and implicit feature selection. This work presents a comprehensive study of the use of GP for feature construction and selection on high-dimensional classification problems. Different combinations of the constructed and/or selected features are tested and compared on seven high-dimensional gene expression problems, and different classification algorithms are used to evaluate their performance. The results show that the constructed and/or selected feature sets can significantly reduce the dimensionality and maintain or even increase the classification accuracy in most cases. The cases where overfitting occurred are analysed via the distribution of features. Further analysis is also performed to show why the constructed features can achieve promising classification performance.
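As a hedged pointer for readers who want to experiment with GP-based feature construction in Python, the sketch below uses gplearn's SymbolicTransformer (assumed installed) to evolve tree-based constructed features and feeds them to a downstream classifier. This is a generic stand-in, not the authors' GP system; all settings are illustrative.

```python
from gplearn.genetic import SymbolicTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Evolve 10 constructed features from arithmetic building blocks.
gp = SymbolicTransformer(population_size=500, generations=20,
                         n_components=10,
                         function_set=("add", "sub", "mul", "div"),
                         random_state=0)
model = make_pipeline(gp, LogisticRegression(max_iter=1000))
# model.fit(X_train, y_train); model.score(X_test, y_test)
```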

3.
In many real-world classification problems, class-conditional classification noise (CCC-Noise) deteriorates the performance of a classifier that is naively built by ignoring it. In this paper, we investigate the impact of CCC-Noise on the quality of a popular generative classifier, normal discriminant analysis (NDA), and its corresponding discriminative classifier, logistic regression (LR). We consider the problem of two multivariate normal populations with a common covariance matrix. We compare the asymptotic distribution of the misclassification error rate of these two classifiers under CCC-Noise. We show that when the noise level is low, the asymptotic error rates of both procedures are only slightly affected. We also show that LR is less deteriorated by CCC-Noise than NDA. Under CCC-Noise, the Mahalanobis distance between the populations plays a vital role in determining the relative performance of the two procedures. In particular, when this distance is small, LR tends to be more tolerant of CCC-Noise than NDA.
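The comparison is easy to probe numerically. The hedged simulation below draws two multivariate normal classes with a common (identity) covariance, flips training labels with class-conditional rates, and compares LDA (standing in for NDA) with logistic regression on clean test data; all settings, including the noise rates and the separation delta, are illustrative.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p, delta = 2000, 5, 1.0      # delta sets the Mahalanobis distance
mu0, mu1 = np.zeros(p), np.r_[delta, np.zeros(p - 1)]

def sample(m):
    y = rng.integers(0, 2, m)
    X = np.where(y[:, None] == 1, mu1, mu0) + rng.standard_normal((m, p))
    return X, y

X, y = sample(n)
flip = np.where(y == 0, 0.10, 0.30)        # class-conditional noise rates
y_noisy = np.where(rng.random(n) < flip, 1 - y, y)

X_test, y_test = sample(10_000)            # clean test set
for name, clf in [("NDA/LDA", LinearDiscriminantAnalysis()),
                  ("LR", LogisticRegression(max_iter=1000))]:
    err = 1.0 - clf.fit(X, y_noisy).score(X_test, y_test)
    print(f"{name}: test error {err:.3f}")
```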

4.
We study three different approaches to formulating a misclassification cost minimizing genetic algorithm (GA) fitness function for a GA-neural network classifier: a fitness function that directly minimizes total misclassification cost, a fitness function that uses posterior probability to minimize total misclassification cost, and a hybrid fitness function that uses the average of the first two. Using simulated data sets representing three different distributions and four different misclassification cost matrices, we test the performance of the three fitness functions on a two-group classification problem. Our results indicate that the posterior probability-based fitness function and the hybrid fitness function are less prone to overfitting the training data, but the direct misclassification cost minimizing fitness function provides the lowest overall misclassification cost in training tests. For holdout sample tests, when cost asymmetries are low (less than or equal to a ratio of 1:2), the hybrid fitness function yields the best results; when cost asymmetries are high (greater than or equal to a ratio of 1:4), the direct total misclassification cost minimizing function provides the best results. We validate our findings using real-world data from a bankruptcy prediction problem.
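In code terms, the three fitness criteria can be written as below (a GA would maximize the negatives of these costs). This is a hedged sketch: the GA and the neural network are omitted, `scores` are classifier outputs in [0, 1] read as posterior probabilities of group 1, and the cost-matrix convention is an assumption.

```python
import numpy as np

def direct_cost(y, y_pred, cost):
    """Total realized cost; cost[i, j] = cost of predicting j when truth is i."""
    return float(np.sum(cost[y, y_pred]))

def posterior_cost(y, scores, cost):
    """Expected cost under the estimated posteriors P(group 1 | x) = scores."""
    post = np.column_stack([1.0 - scores, scores])
    return float(np.sum(post * cost[y]))   # cost[y]: cost row of the true group

def hybrid_cost(y, scores, cost, threshold=0.5):
    """Average of the direct and posterior-based criteria."""
    y_pred = (scores >= threshold).astype(int)
    return 0.5 * (direct_cost(y, y_pred, cost) + posterior_cost(y, scores, cost))
```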

5.
Latent tree models were proposed as a class of models for unsupervised learning and have been applied to problems such as clustering and density estimation. In this paper, we study the usefulness of latent tree models in another paradigm, namely supervised learning. We propose a novel generative classifier called the latent tree classifier (LTC). An LTC represents each class-conditional distribution of attributes using a latent tree model and uses Bayes' rule to make predictions. Latent tree models can capture complex relationships among attributes. Therefore, the LTC is able to approximate the true distribution behind the data well and thus achieves good classification accuracy. We present an algorithm for learning the LTC and empirically evaluate it on an extensive collection of UCI data. The results show that the LTC compares favorably to the state of the art in terms of classification accuracy. We also demonstrate that the LTC can reveal underlying concepts and discover interesting subgroups within each class.
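The generative recipe itself (model each class-conditional density, predict with Bayes' rule) is compact. The sketch below uses a Gaussian mixture per class as a hedged structural stand-in for the latent tree model, which scikit-learn does not provide; the mixture count and names are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

class GenerativeBayesClassifier:
    """Per-class density model + Bayes' rule; the LTC replaces the GMM
    with a latent tree model for each class-conditional distribution."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        self.models_ = [GaussianMixture(n_components=3, random_state=0)
                        .fit(X[y == c]) for c in self.classes_]
        return self

    def predict(self, X):
        # Maximize log P(c) + log p(x | c) over classes c.
        scores = np.column_stack([np.log(p) + m.score_samples(X)
                                  for p, m in zip(self.priors_, self.models_)])
        return self.classes_[np.argmax(scores, axis=1)]
```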

6.
Multi-category classification algorithms play an important role in both the theory and practice of machine learning. In this paper, we consider an approach to multi-category classification based on minimizing a convex surrogate of the nonstandard misclassification loss. We bound the excess misclassification error by the excess convex risk. We construct an adaptive procedure to search for the classifier and obtain its rate of convergence to the Bayes rule.
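The bound in question is a comparison (calibration) inequality of the kind below, stated generically; the ψ and the surrogate loss are placeholders, since the paper's specific nonstandard loss and its transform are not reproduced here.

```latex
% Generic calibration inequality: \psi is a convex function with
% \psi(0) = 0 determined by the surrogate loss \phi.
\[
  \psi\bigl(R(f) - R^{*}\bigr) \;\le\; R_{\phi}(f) - R_{\phi}^{*},
\]
% where R(f) is the misclassification risk of the classifier f, R^{*}
% the Bayes risk, R_{\phi}(f) the convex \phi-risk, and R_{\phi}^{*}
% its infimum over measurable f. Inverting \psi bounds the excess
% misclassification error by the excess convex risk.
```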

7.
Classification is concerned with the development of rules for allocating observations to groups, and is a fundamental problem in machine learning. Much of the previous work on classification models investigates two-group discrimination. Multi-category classification is considered less often, because generalizations of two-group models tend to produce misclassification rates that are higher than desirable. Indeed, producing "good" two-group classification rules is a challenging task for some applications, and producing good multi-category rules is generally more difficult. Moreover, even when the "optimal" classification rule is known, inter-group misclassification rates may be higher than tolerable for a given classification model. We investigate properties of a mixed-integer programming based multi-category classification model that allows for the pre-specification of limits on inter-group misclassification rates. The mechanism by which the limits are satisfied is a reserved judgment region: an artificial category for observations whose attributes do not sufficiently indicate membership in any particular group. The method is shown to be a consistent estimator of a classification rule with misclassification limits, and performance on simulated and real-world data is demonstrated.
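The decision rule implied by a reserved judgment region is simple to state in code. The sketch below is a hedged illustration of that rule only, not of the mixed-integer program that chooses the thresholds: scores and per-class thresholds are assumed given.

```python
import numpy as np

RESERVED = -1  # label for the artificial reserved-judgment category

def classify_with_reserve(scores, thresholds):
    """scores: (n, k) group membership scores; thresholds: length-k array.
    An observation is assigned to its best group only if that group's
    score clears the group's threshold; otherwise judgment is reserved."""
    best = np.argmax(scores, axis=1)
    confident = scores[np.arange(len(best)), best] >= thresholds[best]
    return np.where(confident, best, RESERVED)
```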

8.
We consider the problem of multivariate density estimation using samples from the distribution of interest as well as auxiliary samples from a related distribution. We assume that data from the target distribution and the related distribution may occur individually as well as in pairs. Using the nonparametric maximum likelihood estimator of the joint distribution, we derive a kernel density estimator of the marginal density. We show theoretically, in a simple special case, that the implied estimator of the marginal density has smaller integrated mean squared error than a similar estimator that ignores the dependence of the paired observations. We establish consistency of the marginal density estimator under suitable conditions. Through a simulation study with dependent and non-normal populations, we demonstrate the small-sample superiority of the proposed estimator over the estimator that ignores dependence of the samples. The application of the density estimator to nonparametric classification is also discussed. It is shown that the misclassification probability of the resulting classifier is asymptotically equivalent to that of the Bayes classifier. We also include a data-analytic illustration.
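The classification application mentioned at the end has a plain form worth writing out: one kernel density estimate per class and a Bayes-rule assignment by the larger prior-weighted density. The sketch below shows that baseline only; the paired-sample NPMLE refinement that is the paper's contribution is not reproduced.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_bayes_classifier(X0, X1, prior1=0.5):
    """X0, X1: (n_i, d) training samples for classes 0 and 1."""
    f0, f1 = gaussian_kde(X0.T), gaussian_kde(X1.T)  # expects (d, n)
    def predict(X):
        s1 = prior1 * f1(X.T)            # prior-weighted density, class 1
        s0 = (1.0 - prior1) * f0(X.T)    # prior-weighted density, class 0
        return (s1 > s0).astype(int)     # Bayes rule
    return predict
```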

9.
The support vector machine (SVM) is known for its good performance in two-class classification, but its extension to multiclass classification is still an ongoing research issue. In this article, we propose a new approach for classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). We show that the IVM not only performs as well as the SVM in two-class classification, but also can naturally be generalized to the multiclass case. Furthermore, the IVM provides an estimate of the underlying probability. Similar to the support points of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM. This gives the IVM a potential computational advantage over the SVM.
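A rough sketch of the import-point idea follows: kernel logistic regression whose basis functions are indexed by a greedily grown subset of the training data. This is a hedged, brute-force stand-in (it refits a full model per candidate, so it is slow), not the paper's algorithm; the kernel, sizes, and selection criterion are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.metrics.pairwise import rbf_kernel

def fit_ivm(X, y, n_import=10, gamma=1.0):
    """y: 0/1 labels. Greedily add the import point whose kernel basis
    function most reduces the training negative log-likelihood."""
    imports, remaining = [], list(range(len(X)))
    model = None
    for _ in range(n_import):
        best = (None, np.inf, None)
        for j in remaining:
            K = rbf_kernel(X, X[imports + [j]], gamma=gamma)
            clf = LogisticRegression(max_iter=200).fit(K, y)
            nll = log_loss(y, clf.predict_proba(K))
            if nll < best[1]:
                best = (j, nll, clf)
        imports.append(best[0])
        remaining.remove(best[0])
        model = best[2]
    return imports, model  # predict via rbf_kernel(X_new, X[imports])
```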

10.
Multiclass classification and probability estimation have important applications in data analytics. Support vector machines (SVMs) have shown great success in various real-world problems due to their high classification accuracy. However, one main limitation of standard SVMs is that they do not provide class probability estimates, and thus fail to offer an uncertainty measure about class prediction. In this article, we propose a simple yet effective framework to endow kernel SVMs with the feature of multiclass probability estimation. The new probability estimator does not rely on any parametric assumption on the data distribution; therefore, it is flexible and robust. Theoretically, we show that the proposed estimator is asymptotically consistent. Computationally, the new procedure can be conveniently implemented using standard SVM software. Our extensive numerical studies demonstrate competitive performance of the new estimator compared with existing methods such as multiple logistic regression, linear discriminant analysis, tree-based methods, and random forests, under various classification settings. Supplementary materials for this article are available online.
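For orientation only: the conventional way to get class probabilities from kernel SVMs in standard software is Platt-style calibration, which is exactly the kind of parametric post-processing the estimator above avoids. The snippet is a baseline for comparison, not the proposed method.

```python
from sklearn.svm import SVC

# Pairwise Platt scaling fitted by internal cross-validation.
baseline = SVC(kernel="rbf", probability=True)
# baseline.fit(X_train, y_train); baseline.predict_proba(X_test)
```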

11.
Multicategory Classification by Support Vector Machines
We examine the problem of discriminating between objects of three or more classes. Specifically, we investigate how two-class discrimination methods can be extended to the multiclass case. We show how the linear programming (LP) approaches based on the work of Mangasarian and the quadratic programming (QP) approaches based on Vapnik's Support Vector Machine (SVM) can be combined to yield two new approaches to the multiclass problem. In LP multiclass discrimination, a single linear program is used to construct a piecewise-linear classification function. In our proposed multiclass SVM method, a single quadratic program is used to construct a piecewise-nonlinear classification function. Each piece of this function can take the form of a polynomial, a radial basis function, or even a neural network. For problems with k > 2 classes, the SVM method as originally proposed requires constructing a two-class SVM to separate each class from the remaining classes. Similarly, k two-class linear programs can be used for the multiclass problem. We performed an empirical study of the original LP method, the proposed k-LP method, the proposed single-QP method, and the original k-QP method, and we discuss the advantages and disadvantages of each approach.
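The k two-class construction referred to above is short enough to write down; the hedged sketch below builds one kernel SVM per class against the rest and predicts by the largest decision value. The single-QP piecewise-nonlinear formulation that is the article's contribution is not shown.

```python
import numpy as np
from sklearn.svm import SVC

class OneVsRestSVM:
    """k two-class SVMs, one per class versus the remaining classes."""
    def __init__(self, **svm_kwargs):
        self.svm_kwargs = svm_kwargs

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.models_ = [SVC(**self.svm_kwargs)
                        .fit(X, np.where(y == c, 1, -1))
                        for c in self.classes_]
        return self

    def predict(self, X):
        scores = np.column_stack([m.decision_function(X)
                                  for m in self.models_])
        return self.classes_[np.argmax(scores, axis=1)]

# clf = OneVsRestSVM(kernel="rbf").fit(X_train, y_train)
```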

12.
The boosting algorithm is one of the most successful binary classification techniques due to its relative immunity to overfitting and flexible implementation. Several attempts have been made to extend the binary boosting algorithm to multiclass classification. In this article, a novel cost-sensitive multiclass boosting algorithm is proposed that naturally extends the popular binary AdaBoost algorithm and admits unequal misclassification costs. The proposed multiclass boosting algorithm achieves superior classification performance by combining weak candidate models that only need to be better than random guessing. More importantly, the proposed algorithm achieves a large margin separation of the training sample while attaining an L1-norm constraint on the model complexity. Finally, the effectiveness of the proposed algorithm is demonstrated in a number of simulated and real experiments. The supplementary files are available online, including the technical proofs, the implemented R code, and the real datasets.
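One natural way to let boosting honor unequal costs is to scale the exponential reweighting by the cost of the specific error committed. The sketch below shows that single step in a discrete AdaBoost-style update; it is an illustrative variant, not the article's algorithm, and the cost-matrix convention is an assumption.

```python
import numpy as np

def cost_sensitive_reweight(w, y, y_pred, cost, alpha):
    """w: current sample weights; cost[i, j] = cost of predicting class j
    when the truth is class i (zero on the diagonal); alpha: model weight."""
    per_sample = cost[y, y_pred]          # cost actually incurred per sample
    w = w * np.exp(alpha * per_sample)    # up-weight expensive mistakes
    return w / w.sum()                    # renormalize to a distribution
```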

13.
Boosting is an effective method for combining classifiers: it can improve the classification performance of unstable learning algorithms, but has little effect on stable ones. BAN (BN-augmented naive Bayes) is an enhanced Bayesian network classifier whose classification performance is easily improved by boosting. We compare Wrapping-BAN-GBN, a wrapper classifier built from GBN (general BN) and BAN, with Boosting-BAN, a boosted ensemble of BAN classifiers. Experimental results show that on most of the test data sets the Boosting-BAN classifier achieves higher classification accuracy.
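A BAN learner is not available in common Python libraries; as a hedged stand-in that preserves the construction (boosting a Bayesian-network-family base classifier), the snippet below boosts naive Bayes with AdaBoost. The parameter is named `estimator` in recent scikit-learn (`base_estimator` in older versions).

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB

# Boosted naive Bayes as a rough analogue of Boosting-BAN.
boosted_nb = AdaBoostClassifier(estimator=GaussianNB(), n_estimators=50)
# boosted_nb.fit(X_train, y_train); boosted_nb.score(X_test, y_test)
```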

14.
15.
Minimax Optimal Rates of Convergence for Multicategory Classifications
In the problem of classification (or pattern recognition), given a set of n samples, we attempt to construct a classifier g_n with a small misclassification error. It is important to study the rate at which the misclassification error converges as n tends to infinity. It is known that no such rate can hold uniformly over the set of all distributions. In this paper we obtain the optimal convergence rates for a class of distributions L^(λ,ω) in multicategory classification and nonstandard binary classification.

16.
蒋翠清  梁坤  丁勇  段锐 《运筹与管理》2017,26(2):135-139
Adaboost-based credit evaluation in online peer-to-peer lending suffers from high disagreement among base classifiers and high misclassification costs on certain samples. Existing studies do not account for the influence of disagreement and misclassification cost on the sample weights of the base classifiers, which reduces the effectiveness of credit evaluation for online lending. We therefore propose a credit evaluation method based on an improved Adaboost. According to the error rate of each base classifier, the degree of disagreement among base classifiers on each sample, and each sample's misclassification cost, the method adjusts Adaboost's sample-weighting strategy, so that the improved Adaboost model focuses its learning on hard-to-classify samples and samples with high misclassification costs, thereby improving the effectiveness of credit evaluation for online lending. Experimental results on data from the Paipaidai platform show that the proposed method significantly outperforms traditional Adaboost-based credit evaluation methods in classification accuracy and misclassification cost.
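The modified weighting strategy can be sketched as a single update step: the usual exponential up-weighting of misclassified samples is scaled by per-sample factors for base-classifier disagreement and misclassification cost. The combination below is a hedged illustration, not the paper's exact formula.

```python
import numpy as np

def improved_adaboost_reweight(w, y, y_pred, alpha, disagreement, cost):
    """w: current sample weights; alpha: base-classifier weight;
    disagreement: per-sample disagreement of earlier base classifiers
    in [0, 1]; cost: per-sample misclassification cost (> 0)."""
    wrong = (y != y_pred).astype(float)
    w = w * np.exp(alpha * wrong * (1.0 + disagreement) * cost)
    return w / w.sum()  # hard, expensive samples gain weight fastest
```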

17.
As an extension of the Pawlak rough set model, the decision-theoretic rough set model (DTRS) adopts Bayesian decision theory to compute the thresholds required in probabilistic rough set models. It gives a new semantic interpretation of the positive, boundary, and negative regions through three-way decisions. DTRS has been widely discussed and applied in data mining and decision making. However, one limitation of DTRS is its inability to deal with numerical data directly. To overcome this disadvantage and extend the theory of DTRS, this paper proposes a neighborhood-based decision-theoretic rough set model (NDTRS) under the DTRS framework. Basic concepts of NDTRS are introduced. A positive-region-related attribute reduct and a minimum-cost attribute reduct in the proposed model are defined and analyzed. Experimental results show that our methods can obtain a short reduct. Furthermore, a new neighborhood classifier based on three-way decisions is constructed and compared with other classifiers. Comparison experiments show that the proposed classifier achieves high accuracy and low misclassification cost.
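The three-way decision itself reduces to comparing a neighborhood-conditional probability against two thresholds. The sketch below shows that rule for a binary target; the neighborhood radius and the thresholds alpha > beta are illustrative inputs, and the attribute-reduction machinery of NDTRS is not reproduced.

```python
import numpy as np

def three_way_decide(X, y, x, delta=0.5, alpha=0.7, beta=0.3):
    """Accept, reject, or defer on sample x using its delta-neighborhood."""
    dist = np.linalg.norm(X - x, axis=1)
    nbrs = y[dist <= delta]
    if len(nbrs) == 0:
        return "boundary"                # no evidence: defer the decision
    p = np.mean(nbrs == 1)               # P(target class | neighborhood)
    if p >= alpha:
        return "positive"                # accept
    if p <= beta:
        return "negative"                # reject
    return "boundary"                    # three-way: defer
```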

18.
We propose a method for classifying the instrument playing in a piece of sound. Features are derived from a pre-filtered time series divided into small windows. Features from the (transformed) spectrum, Perceptual Linear Prediction (PLP), and Mel Frequency Cepstral Coefficients (MFCCs), as known from speech processing, are then selected. As a clustering method, k-means is applied, yielding a reduced number of features for the classification task. An SVM classifier using a polynomial kernel yields good results: the accuracy is very convincing given a misclassification error of roughly 19% across 59 different classes of instruments. As expected, the misclassification error is smaller for problems with fewer classes. The rastamat library (Ellis, PLP and RASTA (and MFCC, and inversion) in Matlab, online web resource, 2005) has been ported from Matlab to R, which means that feature extraction as known from speech processing is now easily available from the statistical programming language R. This software was run on a cluster of machines for the computationally intensive evaluation of the proposed method.
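A condensed Python analogue of the pipeline (window-level MFCCs, k-means to compress them to a fixed-length vector, polynomial-kernel SVM) is sketched below, using librosa rather than the rastamat port; the paths, cluster counts, and kernel settings are illustrative assumptions.

```python
import numpy as np
import librosa
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def clip_features(path, n_mfcc=13, k=8):
    """Summarize one audio file as k cluster centers of its MFCC frames."""
    signal, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)
    centers = KMeans(n_clusters=k, n_init=10).fit(mfcc).cluster_centers_
    return centers.ravel()               # fixed-length feature vector

# X = np.array([clip_features(p) for p in paths])  # paths: audio files
# clf = SVC(kernel="poly", degree=3).fit(X, instrument_labels)
```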

19.
The Support Vector Machine (SVM) is known to be a powerful nonparametric classification technique even for high-dimensional data. Although predictive ability is important, obtaining an easy-to-interpret classifier is also crucial in many applications. The linear SVM provides a classifier based on a linear score. In the case of functional data, the coefficient function that defines this linear score usually has many irregular oscillations, making it difficult to interpret.

20.
A method for classifying facial expressions from the analysis of facial deformations is presented. The classification process is based on the transferable belief model (TBM) framework. Facial expressions are related to the six universal emotions, namely Joy, Surprise, Disgust, Sadness, Anger, and Fear, as well as Neutral. The proposed classifier relies on data from a contour segmentation technique that extracts an expression skeleton of facial features (mouth, eyes, and eyebrows) and derives simple distance coefficients from every face image of a video sequence. The characteristic distances are fed to a rule-based decision system that relies on the TBM and data fusion to assign a facial expression to every face image. We first demonstrate the feasibility of facial expression classification with simple data (only five facial distances are considered). We also demonstrate the efficiency of the TBM for emotion classification. The TBM-based classifier was compared with a Bayesian classifier working on the same data; both classifiers were tested on three different databases.
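The fusion step at the heart of the TBM classifier is the unnormalized (conjunctive) rule of combination over basic belief assignments. The sketch below combines two illustrative BBAs on a reduced emotion frame; the focal sets and masses are made up for the example.

```python
def tbm_combine(m1, m2):
    """m1, m2: dicts mapping frozenset focal elements to masses.
    TBM-style conjunctive combination; mass on frozenset() is conflict."""
    out = {}
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            out[inter] = out.get(inter, 0.0) + wa * wb
    return out

m_mouth = {frozenset({"Joy"}): 0.6, frozenset({"Joy", "Surprise"}): 0.4}
m_eyes = {frozenset({"Surprise"}): 0.5, frozenset({"Joy", "Surprise"}): 0.5}
print(tbm_combine(m_mouth, m_eyes))
```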

