Similar Documents
20 similar documents found (search time: 125 ms)
1.
Classification models can be developed by statistical or mathematical programming discriminant analysis techniques. Variable selection extensions of these techniques allow the development of classification models with a limited number of variables. Although stepwise statistical variable selection methods are widely used, the performance of the resultant classification models may not be optimal because of the stepwise selection protocol and the nature of the group separation criterion. A mixed integer programming approach for selecting variables for maximum classification accuracy is developed in this paper, and the performance of this approach, measured by the leave-one-out hit rate, is compared with the published results from a statistical approach in which all possible variable subsets were considered. Although this mixed integer programming approach can only be applied to problems with a relatively small number of observations, it may be of great value where classification decisions must be based on a limited number of observations.
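A representative two-group formulation in this spirit (notation ours, not necessarily the paper's exact model) uses binary misclassification indicators y_i, feature-inclusion binaries z_j, and a big-M constant:

\begin{aligned}
\min_{w,\,c,\,y,\,z} \quad & \sum_{i=1}^{n} y_i \\
\text{s.t.} \quad & \sum_{j} w_j x_{ij} \le c + M y_i, && i \in G_1, \\
& \sum_{j} w_j x_{ij} \ge c + \varepsilon - M y_i, && i \in G_2, \\
& -U z_j \le w_j \le U z_j, \quad \sum_{j} z_j \le p, \quad y_i, z_j \in \{0,1\}.
\end{aligned}

Minimizing the count of misclassified training observations optimizes the hit rate directly, while \sum_j z_j \le p limits the classifier to at most p variables.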

2.
Mathematical programming (MP) discriminant analysis models can be used to develop classification models for assigning observations of unknown class membership to one of a number of specified classes using values of a set of features associated with each observation. Since most MP discriminant analysis models generate linear discriminant functions, these MP models are generally used to develop linear classification models. Nonlinear classifiers may, however, have better classification performance than linear classifiers. In this paper, a mixed integer programming model is developed to generate nonlinear discriminant functions composed of monotone piecewise-linear marginal utility functions for each feature and the cut-off value for class membership. It is also shown that this model can be extended for feature selection. The performance of this new MP model for two-group discriminant analysis is compared with statistical discriminant analysis and other MP discriminant analysis models using a real problem and a number of simulated problem sets.
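As a rough illustration of such a classifier (the breakpoints, slopes, and cut-off below are hypothetical; in the paper they would be determined by the MIP model), each feature is mapped through a monotone piecewise-linear utility and the utilities are summed against a cut-off:

def marginal_utility(x, breakpoints, slopes):
    # Monotone piecewise-linear utility: nonnegative segment slopes
    # guarantee monotonicity in the feature value.
    u = 0.0
    for k, s in enumerate(slopes):
        lo, hi = breakpoints[k], breakpoints[k + 1]
        u += s * (min(max(x, lo), hi) - lo)
    return u

def classify(obs, utilities, cutoff):
    # utilities: one (breakpoints, slopes) pair per feature
    score = sum(marginal_utility(x, bp, sl)
                for x, (bp, sl) in zip(obs, utilities))
    return 1 if score >= cutoff else 2

# Hypothetical two-feature model with utility pieces on [0, 1] and [1, 2].
utilities = [((0.0, 1.0, 2.0), (0.5, 0.1)), ((0.0, 1.0, 2.0), (0.3, 0.4))]
print(classify((1.5, 0.8), utilities, cutoff=1.0))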

3.
Classification on high-dimensional data with thousands to tens of thousands of dimensions is a challenging task due to the high dimensionality and the quality of the feature set. The problem can be addressed by using feature selection to choose only informative features or feature construction to create new high-level features. Genetic programming (GP) using a tree-based representation can be used for both feature construction and implicit feature selection. This work presents a comprehensive study to investigate the use of GP for feature construction and selection on high-dimensional classification problems. Different combinations of the constructed and/or selected features are tested and compared on seven high-dimensional gene expression problems, and different classification algorithms are used to evaluate their performance. The results show that the constructed and/or selected feature sets can significantly reduce the dimensionality and maintain or even increase the classification accuracy in most cases. The cases in which overfitting occurred are analysed via the distribution of features. Further analysis is also performed to show why the constructed features can achieve promising classification performance.
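A minimal sketch of how a tree-based GP individual doubles as feature construction and implicit feature selection (the operator set and the tree itself are illustrative, not the paper's):

import numpy as np

# A GP tree as nested tuples: ('add', ('mul', 'x3', 'x7'), 'x1') = x3*x7 + x1.
OPS = {'add': np.add, 'sub': np.subtract, 'mul': np.multiply, 'max': np.maximum}

def eval_tree(tree, X, names):
    # Recursively evaluate the tree on data matrix X; string nodes are
    # terminals referring to original features.
    if isinstance(tree, str):
        return X[:, names.index(tree)]
    op, *children = tree
    return OPS[op](*(eval_tree(c, X, names) for c in children))

X = np.random.rand(100, 8)
names = ['x%d' % i for i in range(8)]
# The evaluated tree yields one constructed high-level feature; the terminals
# it uses (x1, x3, x7) are the implicitly selected original features.
constructed = eval_tree(('add', ('mul', 'x3', 'x7'), 'x1'), X, names)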

4.
The curse of dimensionality refers to the fact that high-dimensional data is often difficult to work with. A large number of features can increase the noise of the data and thus the error of a learning algorithm. Feature selection is a solution for such problems, where the data dimensionality needs to be reduced. Different feature selection algorithms may yield feature subsets that can be considered local optima in the space of feature subsets. Ensemble feature selection combines independent feature subsets and might give a better approximation to the optimal subset of features. We propose an ensemble feature selection approach based on an assessment of the reliability of the feature selectors. It aims at providing a unique and stable feature selection without ignoring predictive accuracy. A classification algorithm is used as an evaluator to assign a confidence to features selected by ensemble members based on the associated classification performance. We compare our proposed approach to several existing techniques and to individual feature selection algorithms. Results show that our approach often improves classification performance and feature selection stability for high-dimensional data sets.
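A simplified sketch of confidence-weighted ensemble voting under our reading of the approach (the selectors, classifier, and subset sizes are stand-ins, not the paper's configuration):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=80, n_features=200, n_informative=10,
                           random_state=0)
votes = np.zeros(X.shape[1])
for score_fn in (f_classif, mutual_info_classif):   # ensemble members
    mask = SelectKBest(score_fn, k=20).fit(X, y).get_support()
    # Reliability of this member: cross-validated accuracy of a classifier
    # trained only on its selected subset.
    reliability = cross_val_score(LogisticRegression(max_iter=1000),
                                  X[:, mask], y, cv=5).mean()
    votes[mask] += reliability                      # confidence-weighted vote
consensus = np.argsort(votes)[-20:]                 # final stable subset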

5.
This paper investigates the feature subset selection problem for binary classification using a logistic regression model. We develop a modified discrete particle swarm optimization (PSO) algorithm for the feature subset selection problem. This approach embodies an adaptive feature selection procedure which dynamically accounts for the relevance and dependence of the features included in the feature subset. We compare the proposed methodology with the tabu search and scatter search algorithms using publicly available datasets. The results show that the proposed discrete PSO algorithm is competitive in terms of both classification accuracy and computational performance.
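For intuition, a plain binary PSO position update (the paper's modified algorithm adds an adaptive relevance/dependence mechanism on top of something like this; the parameter values are conventional defaults):

import numpy as np

def binary_pso_step(pos, vel, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    # pos, pbest, gbest are 0/1 vectors (1 = feature in the subset);
    # vel is real-valued and pushed through a sigmoid to give bit probabilities.
    rng = rng if rng is not None else np.random.default_rng()
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    prob = 1.0 / (1.0 + np.exp(-vel))
    return (rng.random(pos.shape) < prob).astype(int), vel

# The fitness of a particle would be, e.g., the cross-validated accuracy of a
# logistic regression model fitted on the features its bits switch on.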

6.
Feature selection plays an important role in the successful application of machine learning techniques to large real-world datasets. Avoiding model overfitting, especially when the number of features far exceeds the number of observations, requires selecting informative features and/or eliminating irrelevant ones. Searching for an optimal subset of features can be computationally expensive. Functional magnetic resonance imaging (fMRI) produces datasets with such characteristics, creating challenges for applying machine learning techniques to classify cognitive states based on fMRI data. In this study, we present an embedded feature selection framework that integrates sparse regularization and classification. This optimization approach attempts to maximize training accuracy while simultaneously enforcing sparsity by penalizing the objective function for the coefficients of the features. This process allows many coefficients to become zero, which effectively eliminates the corresponding features from the classification model. To demonstrate the utility of the approach, we apply our framework to three different real-world fMRI datasets. The results show that regularized classifiers yield better classification accuracy, especially when the number of initial features is large. The results further show that sparse regularization is key to achieving scientifically relevant generalizability and functional localization of classifier features. The approach is thus highly suited for the analysis of fMRI data.
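The generic embedded mechanism can be reproduced with any off-the-shelf sparse classifier; a minimal sketch with an ℓ1-penalized logistic regression (a stand-in for the authors' optimizer, on synthetic data):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Far more features than observations, as in fMRI cognitive-state decoding.
X, y = make_classification(n_samples=60, n_features=5000, n_informative=15,
                           random_state=0)
# The L1 penalty trades training fit against sparsity: most coefficients are
# driven to exactly zero, removing those features from the model.
clf = LogisticRegression(penalty='l1', solver='liblinear', C=0.1).fit(X, y)
kept = np.flatnonzero(clf.coef_)
print('%d of %d features survive the penalty' % (kept.size, X.shape[1]))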

7.
We propose a new binary classification and variable selection technique especially designed for high-dimensional predictors. Among many predictors, typically only a small fraction have a significant impact on prediction. In such a situation, more interpretable models with better prediction accuracy can be obtained by variable selection along with classification. By adding an ℓ1-type penalty to the loss function, common classification methods such as logistic regression or support vector machines (SVM) can perform variable selection. Existing penalized SVM methods all attempt to solve for all the parameters of the penalized problem jointly. When the data dimension is very high, this joint optimization problem is very complex and requires substantial memory. In this article, we propose a new penalized forward search technique that reduces the high-dimensional optimization problem to a sequence of one-dimensional optimizations by iterating the selection steps. The new algorithm can be regarded as a forward selection version of the penalized SVM and its variants. The advantage of optimizing in one dimension is that the location of the optimum solution can be found by intelligent search, exploiting the convexity and piecewise linear or quadratic structure of the criterion function. In each step, the predictor that best predicts the outcome is added to the model. The search is then repeated iteratively until convergence. Comparison of our new classification rule with the ℓ1-SVM and other common methods shows very promising performance: the proposed method leads to much leaner models without compromising misclassification rates, particularly for high-dimensional predictors.
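A sketch of the one-dimensional step under our reading (a bounded scalar search stands in for the paper's intelligent search exploiting the piecewise structure; labels y are assumed in {-1, +1} and lam is a hypothetical penalty weight):

import numpy as np
from scipy.optimize import minimize_scalar

def step_loss(beta, f_prev, xj, y, lam):
    # Penalized hinge loss as a function of the single new coefficient beta.
    margins = y * (f_prev + beta * xj)
    return np.maximum(0.0, 1.0 - margins).mean() + lam * abs(beta)

def forward_step(X, y, f_prev, active, lam=0.05):
    # Try each unused predictor; each trial is a one-dimensional convex
    # minimization. Return the predictor achieving the lowest optimum.
    best = None
    for j in range(X.shape[1]):
        if j in active:
            continue
        res = minimize_scalar(step_loss, bounds=(-10.0, 10.0), method='bounded',
                              args=(f_prev, X[:, j], y, lam))
        if best is None or res.fun < best[0]:
            best = (res.fun, j, res.x)
    return best  # (loss, chosen predictor, its coefficient)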

8.
Feature selection consists of choosing a subset of available features that captures the relevant properties of the data. In supervised pattern classification, a good choice of features is fundamental for building compact and accurate classifiers. In this paper, we develop an efficient feature selection method using the zero-norm ℓ0 in the context of support vector machines (SVMs). The discontinuity of ℓ0 at the origin makes the corresponding optimization problem difficult to solve. To overcome this drawback, we use a robust DC (difference of convex functions) programming approach, a general framework for non-convex continuous optimization. We consider an appropriate continuous approximation to ℓ0 such that the resulting problem can be formulated as a DC program. Our DC algorithm (DCA) has finite convergence and requires solving one linear program at each iteration. Computational experiments on standard datasets, including the challenging feature-selection problems of the NIPS 2003 feature selection challenge and gene selection for cancer classification, show that the proposed method is promising: while it suppresses more than 99% of the features, it can still provide good classification. Moreover, the comparative results illustrate the superiority of the proposed approach over standard methods such as classical SVMs and the feature selection concave (FSV) approach.
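One standard continuous approximation used in this literature (our notation; the paper's exact choice may differ) is the exponential concave approximation

\|w\|_0 \;\approx\; \sum_{i=1}^{n} \bigl(1 - e^{-\alpha |w_i|}\bigr), \qquad \alpha > 0,

which admits the DC decomposition g(w) - h(w) with convex parts

g(w) = \alpha \sum_{i} |w_i|, \qquad h(w) = \sum_{i} \bigl(\alpha |w_i| - 1 + e^{-\alpha |w_i|}\bigr),

so each DCA iteration linearizes h at the current point and reduces to a weighted ℓ1 problem, solvable as a single linear program.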

9.
Bayesian approaches to prediction and the assessment of predictive uncertainty in generalized linear models are often based on averaging predictions over different models, and this requires methods for accounting for model uncertainty. When there are linear dependencies among potential predictor variables in a generalized linear model, existing Markov chain Monte Carlo algorithms for sampling from the posterior distribution on the model and parameter space in Bayesian variable selection problems may not work well. This article describes a sampling algorithm based on the Swendsen-Wang algorithm for the Ising model, which works well when the predictors are far from orthogonality. In variable selection for generalized linear models we can index different models by a binary parameter vector, where each binary variable indicates whether or not a given predictor variable is included in the model. The posterior distribution on the model is then a distribution on this collection of binary strings, and by viewing this posterior as a binary spatial field we can sample from it with a scheme inspired by the Swendsen-Wang algorithm. The algorithm we describe extends a similar algorithm for variable selection problems in linear models. The benefits of the algorithm are demonstrated for both real and simulated data.

10.
Keiji Tatsumi, Tetsuzo Tanino. TOP, 2014, 22(3): 815-840
Machine learning is an important branch of artificial intelligence. Among its many models, the support vector machine is a popular one with high classification ability which can be trained by mathematical programming methods. Since the model was originally formulated for binary classification, various kinds of extensions have been investigated for multi-class classification. In this paper, we review some existing models and introduce new models which we recently proposed. The models are derived from the viewpoint of multi-objective maximization of geometric margins for a discriminant function, and each model can be trained by solving a second-order cone programming problem. Numerical experiments show that discriminant functions with high generalization ability can be obtained by these models.

11.
The main challenge in working with gene expression microarrays is that the sample size is small compared to the large number of variables (genes). In many studies, the main focus is on finding a small subset of the genes that are most important for differentiating between different types of cancer, enabling simpler and cheaper diagnostic arrays. In this paper, a sparse Bayesian variable selection method in a probit model is proposed for gene selection and classification. We assign a sparse prior to the regression parameters and perform variable selection by indexing the covariates of the model with a binary vector. The correlation prior assigned to the binary vector is able to distinguish models of the same size. The performance of the proposed method is demonstrated on one simulated data set and two well-known real data sets, and the results show that our method is comparable with other existing methods in variable selection and classification.

12.
In assembly line balancing problems, parallel execution of assembly operations is often advocated because of its enhanced flexibility and minimal lead-time. Although the theoretical maximum number of possible assembly sequences explodes combinatorially with the number of components in a product, graphical representations can depict these sequences in a surveyable way. The AND/OR graph representation is an appropriate basis for optimum sequence selection, which can be achieved via heuristic, metaheuristic, and exact methods. An exact method, based on binary linear programming, is described. To arrive at the appropriate model, a novel approach for AND/OR graph generation, based on subassembly detection, is presented. The method is demonstrated with simple cases and then extended to increasingly complex products. A modification of the optimization method is applied, which enables a search for sequences with maximum parallelism.

13.
The presence of less relevant or highly correlated features often decreases classification accuracy. Feature selection, in which the most informative variables are selected for model generation, is an important step in data-driven modeling. In feature selection, one often tries to satisfy multiple criteria such as feature discriminating power, model performance, or subset cardinality. A multi-objective formulation of the feature selection problem is therefore more appropriate. In this paper, we propose to use fuzzy criteria in feature selection through a fuzzy decision making framework. This formulation allows for a more flexible definition of the goals in feature selection, and avoids the problem of weighting different goals that arises in classical multi-objective optimization. The optimization problem is solved using an ant colony optimization algorithm proposed in our previous work. We illustrate the added value of the approach by applying our proposed fuzzy feature selection algorithm to eight benchmark problems.
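A toy illustration of the fuzzy aggregation idea (the membership shapes are hypothetical; the paper's definitions may differ): each criterion is scored by a membership function and the goals are intersected by a minimum, so no explicit weights are needed:

def fuzzy_quality(accuracy, n_selected, n_total):
    # Membership degree for accuracy: 0 at 0.5 (chance level), 1 at perfect.
    mu_accuracy = min(max((accuracy - 0.5) / 0.5, 0.0), 1.0)
    # Membership degree for compactness: smaller subsets score higher.
    mu_compact = 1.0 - n_selected / n_total
    # Bellman-Zadeh intersection of fuzzy goals: take the weakest criterion.
    return min(mu_accuracy, mu_compact)

# An ant colony would maximize this score over candidate feature subsets.
print(fuzzy_quality(accuracy=0.9, n_selected=5, n_total=40))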

14.
Discrete support vector machines (DSVM), originally proposed for binary classification problems, have been shown to outperform other competing approaches on well-known benchmark datasets. Here we address their extension to multicategory classification by developing three different methods. Two of them are based respectively on one-against-all and round-robin classification schemes, in which a number of binary discrimination problems are solved by means of a variant of DSVM. The third method directly addresses the multicategory classification task by building a decision tree in which an optimal split to separate classes is derived at each node by a new extended formulation of DSVM. Computational tests on publicly available datasets are then conducted to compare the three multicategory classifiers based on DSVM with other methods, indicating that the proposed techniques achieve significantly higher accuracies. This research was partially supported by PRIN grant 2004132117.
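A minimal round-robin (one-against-one) scheme, with sklearn's LinearSVC standing in for the binary DSVM variant:

import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

def round_robin_predict(X_train, y_train, X_test):
    # One binary problem per pair of classes; each trained classifier casts
    # one vote per test point, and the most-voted class wins.
    classes = np.unique(y_train)
    votes = np.zeros((len(X_test), len(classes)), dtype=int)
    for a, b in combinations(range(len(classes)), 2):
        mask = np.isin(y_train, [classes[a], classes[b]])
        pred = LinearSVC().fit(X_train[mask], y_train[mask]).predict(X_test)
        votes[pred == classes[a], a] += 1
        votes[pred == classes[b], b] += 1
    return classes[votes.argmax(axis=1)]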

15.
Feature Selection (FS) is an important pre-processing step in data mining and classification tasks. The aim of FS is to select a small subset of the most important and discriminative features. Traditional feature selection methods assume that the entire input feature set is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with time as new features stream in. A critical challenge for online streaming feature selection (OSFS) is the unavailability of the entire feature set before learning starts. Several efforts have been made to address the OSFS problem; however, they all need some prior knowledge about the entire feature space to select informative features. In this paper, the OSFS problem is considered from the rough sets (RS) perspective and a new OSFS algorithm, called OS-NRRSAR-SA, is proposed. The main motivation for this approach is that RS-based data mining does not require any domain knowledge other than the given dataset. The proposed algorithm uses classical significance analysis concepts from RS theory to control the unknown feature space in OSFS problems. The algorithm is evaluated extensively on several high-dimensional datasets in terms of compactness, classification accuracy, run-time, and robustness against noise. Experimental results demonstrate that the algorithm achieves better results than existing OSFS algorithms in every respect.
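A sketch of the rough-set significance test at the core of such algorithms (assuming discretized features; the actual OS-NRRSAR-SA acceptance rule is more involved):

from collections import defaultdict

def dependency(X, y, features):
    # Rough-set dependency degree: the fraction of objects whose equivalence
    # class under `features` is pure with respect to the decision y.
    blocks = defaultdict(list)
    for row, label in zip(X, y):
        blocks[tuple(row[j] for j in features)].append(label)
    pos = sum(len(v) for v in blocks.values() if len(set(v)) == 1)
    return pos / len(X)

def significance(X, y, current, candidate):
    # Gain in dependency when a newly streamed-in feature joins the subset;
    # a non-positive gain suggests the feature is redundant.
    return dependency(X, y, current + [candidate]) - dependency(X, y, current)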

16.
In the context of learning theory, many efforts have been devoted to developing classification algorithms able to scale up to massive data problems. In this paper the complementary issue is addressed, aimed at deriving powerful classification rules by accurately learning from few examples. This task is accomplished by solving a new mixed integer programming model that extends the notion of discrete support vector machines, in order to derive an optimal set of separating hyperplanes for binary classification problems. According to the cardinality of the set of hyperplanes, the classification region may take the form of a convex polyhedron or a polytope in the original space where the examples are defined. Computational tests on benchmark datasets highlight the effectiveness of the proposed model, which yields the greatest accuracy when compared to other classification approaches. This research was partially supported by PRIN grant 2004132117.

17.
In this paper, we propose a kernel-free semi-supervised quadratic surface support vector machine model for binary classification. The model is formulated as a mixed-integer programming problem, which is equivalent to a non-convex optimization problem with absolute-value constraints. Using relaxation techniques, we derive a semi-definite programming problem for semi-supervised learning. The proposed model is tested on several artificial and public benchmark data sets. Preliminary computational results indicate that the proposed method outperforms some well-known methods for semi-supervised support vector machines with a Gaussian kernel in terms of classification accuracy.
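The "kernel-free" label refers to fitting a quadratic separating surface directly in the input space, so the classifier takes the form (standard in this line of work; notation ours)

f(x) = \tfrac{1}{2}\, x^{\top} W x + b^{\top} x + c, \qquad W = W^{\top},

with class assignment \operatorname{sign}(f(x)); since W, b, and c are optimized directly, no kernel function or kernel parameter has to be selected.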

18.
We develop a class of methods for minimizing a nondifferentiable function which is the maximum of a finite number of smooth functions. The methods proceed by iteratively solving quadratic programming problems to generate search directions. For efficiency, we suggest updating the matrices in the quadratic programming problems in a variable metric way. By doing so, the methods possess many attractive features of variable metric methods and can be viewed as their natural extension to the nondifferentiable case. To avoid the difficulties of an exact line search, a practical stepsize procedure is also introduced. Under mild assumptions the resulting methods converge globally. Research supported by the National Science Foundation under grant number ENG 7903881.
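For the minimax problem \min_x \max_{1 \le i \le m} f_i(x), the direction-finding subproblem at iterate x_k typically takes the form (notation ours)

\min_{d,\,v} \; v + \tfrac{1}{2}\, d^{\top} B_k d \quad \text{s.t.} \quad f_i(x_k) + \nabla f_i(x_k)^{\top} d \le v, \quad i = 1, \dots, m,

where B_k is the matrix updated in a variable metric (quasi-Newton) fashion across iterations, and the solution d serves as the search direction.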

19.
In this paper, a logarithmic method is developed to solve optimization problems containing products of free-sign discrete functions (PFDF). Current deterministic methods for handling these problems are based on the concept of continuous variables; they transform the original model into another programming model (e.g., DC programming, convex programming) and solve it with a commercial solver. As the nature of a discrete variable is quite different from that of a continuous one, developing a novel method to address these problems is necessary. This study proposes a concise and efficient method that linearizes a PFDF term into a set of linear inequalities directly, without redundant transformation. Further, the proposed method requires only a logarithmic number of binary variables and constraints. Numerical examples demonstrate that the proposed formulation significantly outperforms current approaches.
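A standard logarithmic encoding in the same spirit (the paper's exact construction for products may differ): a discrete variable d \in \{d_0, \dots, d_{m-1}\} is modeled with weights \lambda_j \ge 0, \sum_j \lambda_j = 1, and only r = \lceil \log_2 m \rceil binaries z_k via

\sum_{j\,:\,\mathrm{bit}_k(j) = 1} \lambda_j \le z_k, \qquad \sum_{j\,:\,\mathrm{bit}_k(j) = 0} \lambda_j \le 1 - z_k, \qquad k = 1, \dots, r,

which forces exactly one \lambda_j to equal 1, so d = \sum_j d_j \lambda_j selects a single admissible value while the binary-variable count grows only logarithmically in m.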

20.
We consider linear programming approaches for support vector machines (SVM). The linear programming problems are introduced as an approximation of the quadratic programming problems commonly used in SVM. When we consider kernel-based nonlinear discriminators, the approximation can be viewed as kernel principal component analysis, which generates an important subspace of the feature space characterized by the kernel function. We show that any data points nonlinearly, and implicitly, projected into the feature space by kernel functions can be approximately expressed as points lying explicitly in a low-dimensional Euclidean space, which enables us to develop linear programming formulations for nonlinear discriminators. We also introduce linear programming formulations for multicategory classification problems. We show that the same maximal margin principle exploited in SVM can be incorporated into the linear programming formulations. Moreover, by extracting a low-dimensional feature subspace, we can generate nonlinear multicategory discriminators by solving linear programming problems. Numerical experiments on real-world datasets are presented. We show that a fairly low-dimensional feature subspace can achieve reasonable accuracy, and that the linear programming formulations compute discriminators efficiently. We also discuss a sampling strategy which might be crucial for huge datasets.
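A compact sketch of the pipeline the abstract describes, with sklearn's KernelPCA for the subspace extraction and LinearSVC as a stand-in for the linear programming discriminator:

from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA
from sklearn.svm import LinearSVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
# Explicit low-dimensional coordinates approximating the implicit kernel
# feature space (kernel PCA with an RBF kernel).
Z = KernelPCA(n_components=5, kernel='rbf', gamma=2.0).fit_transform(X)
# A linear discriminator on Z acts as a nonlinear discriminator in the
# original input space.
clf = LinearSVC().fit(Z, y)
print('training accuracy:', clf.score(Z, y))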
