Similar literature
20 similar articles found (search time: 46 ms)
1.
A Feature Selection Newton Method for Support Vector Machine Classification   Total citations: 4 (self-citations: 1, citations by others: 3)
A fast Newton method that suppresses input space features is proposed for a linear programming formulation of support vector machine classifiers. The proposed stand-alone method can handle classification problems in very high dimensional spaces, such as 28,032 dimensions, and generates a classifier that depends on very few input features, such as 7 out of the original 28,032. The method can also handle problems with a large number of data points and requires no specialized linear programming packages but merely a linear equation solver. For nonlinear kernel classifiers, the method utilizes a minimal number of kernel functions in the classifier that it generates.

2.
Based on the Adaboost algorithm, a modified boosting method is proposed in this paper for solving classification problems. This method predicts the class label of an example by weighted majority voting over an ensemble of classifiers. Each classifier is obtained by applying a given weak learner to a subsample (smaller than the original training set) drawn from the original training set according to the probability distribution maintained over the training set. A parameter is introduced into the reweighting scheme of Adaboost to update the probabilities assigned to training examples so that the algorithm can be more accurate than Adaboost. The experimental results on synthetic data and on several real-world data sets from the UCI repository show that the proposed method improves on Adaboost in prediction accuracy, execution speed, and robustness to classification noise. Furthermore, the diversity–accuracy patterns of the ensemble classifiers are investigated by kappa–error diagrams.
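
The weighted majority vote mentioned in this abstract can be illustrated with a minimal sketch. The function name `weighted_majority_vote` and the example weights are assumptions made for illustration; the abstract does not give the paper's actual weight-update rule, so only the voting step is shown.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label whose supporting classifiers carry the largest total weight.

    predictions: one predicted label per classifier in the ensemble.
    weights: one non-negative weight per classifier (hypothetical values here).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Pick the label with the highest accumulated weight.
    return max(totals, key=totals.get)
```

For example, with predictions `["a", "b", "b"]` and weights `[0.7, 0.2, 0.3]`, the single classifier voting for `"a"` outweighs the two voting for `"b"`.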

3.
Construction of classifier ensembles by means of artificial immune systems   Total citations: 2 (self-citations: 0, citations by others: 2)
This paper presents the application of Artificial Immune Systems to the design of classifier ensembles. Ensembles of classifiers are a very interesting alternative to single classifiers when facing difficult problems. In general, ensembles are able to achieve better performance in terms of learning and generalisation errors. Several papers have shown that the processes of classifier design and combination must be related in order to obtain better ensembles. Artificial Immune Systems are a recent paradigm based on the immune systems of animals. The features of this new paradigm make it very appropriate for the design of systems where many components must cooperate to solve a given task. The design of classifier ensembles can be considered within such a group of systems, as the cooperation of the individual classifiers is able to improve the performance of the overall system. This paper studies the viability of Artificial Immune Systems when dealing with ensemble design. We construct a population of classifiers that is evolved using an Artificial Immune algorithm. From this population of classifiers several different ensembles can be extracted. These ensembles are favourably compared with ensembles obtained using standard methods in 35 real-world classification problems from the UCI Machine Learning Repository.

4.
Optimization, 2012, 61(7): 1099-1116
In this article we study support vector machine (SVM) classifiers in the face of uncertain knowledge sets and show how data uncertainty in knowledge sets can be treated in SVM classification by employing robust optimization. We present knowledge-based SVM classifiers with uncertain knowledge sets using convex quadratic optimization duality. We show that the knowledge-based SVM, where prior knowledge is in the form of uncertain linear constraints, results in an uncertain convex optimization problem with a set containment constraint. Using a new extension of Farkas' lemma, we reformulate the robust counterpart of the uncertain convex optimization problem in the case of interval uncertainty as a convex quadratic optimization problem. We then reformulate the resulting convex optimization problems as a simple quadratic optimization problem with non-negativity constraints using the Lagrange duality. We obtain the solution of the converted problem by a fixed point iterative algorithm and establish the convergence of the algorithm. We finally present some preliminary results of our computational experiments with the method.

5.
Given linearly inseparable sets R of red points and B of blue points, we consider several measures of how far they are from being separable. Intuitively, given a potential separator (“classifier”), we measure its quality (“error”) according to how much work it would take to move the misclassified points across the classifier to yield separated sets. We consider several measures of work and provide algorithms to find linear classifiers that minimize the error under these different measures.
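
One way to make the notion of "work" concrete is sketched below. The abstract does not say which measures the paper uses, so this example assumes one plausible choice: the sum of Euclidean distances that misclassified points would have to travel to reach the separating hyperplane. The function name `separation_error` and the convention that red points belong on the non-negative side are assumptions for illustration only.

```python
import math


def separation_error(w, b, red, blue):
    """Total distance misclassified points must move to reach the hyperplane w.x + b = 0.

    Assumed convention: red points belong where w.x + b >= 0,
    blue points where w.x + b <= 0.
    """
    norm = math.sqrt(sum(c * c for c in w))
    if norm == 0:
        raise ValueError("w must be a non-zero vector")

    def signed_distance(point):
        return (sum(wi * pi for wi, pi in zip(w, point)) + b) / norm

    error = 0.0
    for point in red:
        # A red point on the negative side is misclassified.
        error += max(0.0, -signed_distance(point))
    for point in blue:
        # A blue point on the positive side is misclassified.
        error += max(0.0, signed_distance(point))
    return error
```

A separator with error 0 under this measure separates the two sets; minimizing the error over `w` and `b` is the optimization problem the abstract refers to, which this sketch does not attempt.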

6.
A fuzzy random forest   Total citations: 4 (self-citations: 0, citations by others: 4)
When individual classifiers are combined appropriately, a statistically significant increase in classification accuracy is usually obtained. Multiple classifier systems are the result of combining several individual classifiers. Following Breiman’s methodology, in this paper a multiple classifier system based on a “forest” of fuzzy decision trees, i.e., a fuzzy random forest, is proposed. This approach combines the robustness of multiple classifier systems, the power of randomness to increase the diversity of the trees, and the flexibility of fuzzy logic and fuzzy sets for imperfect data management. Various combination methods to obtain the final decision of the multiple classifier system are proposed and compared. Some of them are weighted combination methods, which weight the decisions of the different elements of the multiple classifier system (leaves or trees). A comparative study with several datasets is made to show the efficiency of the proposed multiple classifier system and the various combination methods. The proposed multiple classifier system exhibits good classification accuracy, comparable to that of the best classifiers when tested with conventional data sets. However, unlike other classifiers, the proposed classifier provides a similar accuracy when tested with imperfect datasets (with missing and fuzzy values) and with datasets with noise.

7.
A regularized classifier is proposed for a two-population classification problem of mixed continuous and categorical variables in a general location model (GLOM). The limiting overall expected error for the classifier is given. It can be used in an optimization search for the regularization parameters. For a heteroscedastic spherical dispersion across all locations, an asymptotic error is available which provides an alternative criterion for the optimization search. In addition, the asymptotic error can serve as a baseline for practical comparisons with other classifiers. Results based on a simulation and two real datasets are presented.

8.
We consider the problem of retaining the motions of an abstract dynamic system in a given constraint set. Constructions from the programmed iteration method are extended to problems whose dynamics, in general, does not possess any topological properties. The weaker requirements are compensated by introducing transfinite iterations of the programmed absorption operator. The technique of fixed points of mappings in chain-complete partially ordered sets is used in the proofs. The proposed procedure produces a set where the retention problem is solved in the class of quasistrategies. The control interval is not assumed to be bounded.

9.
One issue in data classification problems is to find an optimal subset of instances to train a classifier. Training sets that represent the characteristics of each class well have better chances of building a successful predictor. There are cases where data are redundant or take large amounts of computing time in the learning process. To overcome this issue, instance selection techniques have been proposed. These techniques remove examples from the data set so that classifiers are built faster and, in some cases, with better accuracy. Some of these techniques are based on nearest neighbors, ordered removal, random sampling and evolutionary methods. The weaknesses of these methods generally involve lack of accuracy, overfitting, lack of robustness when the data set size increases and high complexity. This work proposes a simple and fast immune-inspired suppressive algorithm for instance selection, called SeleSup. According to self-regulation mechanisms, those cells unable to neutralize danger tend to disappear from the organism. Therefore, by analogy, data not relevant to the learning of a classifier are eliminated from the training process. The proposed method was compared with three important instance selection algorithms on a number of data sets. The experiments showed that our mechanism substantially reduces the data set size and is accurate and robust, especially on larger data sets.

10.
The problem of identification in fuzzy systems described by fuzzy equations is considered. The identification method and its performance index are also presented. The formal procedure of the identification algorithm is illustrated by means of a numerical example. The possibility of using the proposed algorithm to solve a control problem is discussed as well.

11.
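陶朝杰, 杨进. 经济数学, 2020, 37(3): 214-220
Fake reviews are an unavoidable problem in the development of e-commerce. To address class imbalance in online review data, a fake review detection method based on the BalanceCascade-GBDT algorithm is proposed. The BalanceCascade algorithm progressively shrinks the majority-class sample space by setting the false positive rate of each classifier, and then combines all base classifiers into the final classifier. GBDT is widely used in classification problems for its high accuracy and interpretability and, as an unstable algorithm that is sensitive to sample perturbation, it is a well-suited base classifier. The model is built on the Yelp review data set, uses the AUC value as the evaluation metric, and is compared with logistic regression, random forest, and neural network algorithms; the experiments demonstrate the effectiveness of the method.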

12.
Diverse reduct subspaces based co-training for partially labeled data   Total citations: 1 (self-citations: 0, citations by others: 1)
Rough set theory is an effective supervised learning model for labeled data. However, it is often the case that practical problems involve both labeled and unlabeled data, which is outside the realm of traditional rough set theory. In this paper, the problem of attribute reduction for partially labeled data is first studied. With a new definition of discernibility matrix, a Markov blanket based heuristic algorithm is put forward to compute the optimal reduct of partially labeled data. A novel rough co-training model is then proposed, which can capitalize on the unlabeled data to improve the performance of a rough classifier learned from only a few labeled data. The model employs two diverse reducts of partially labeled data to train its base classifiers on the labeled data, and then makes the base classifiers learn from each other on the unlabeled data iteratively. The classifiers constructed in different reduct subspaces can benefit from their diversity on the unlabeled data and significantly improve the performance of the rough co-training model. Finally, the rough co-training model is theoretically analyzed, and the upper bound on its performance improvement is given. The experimental results show that the proposed model outperforms other representative models in terms of accuracy and even compares favorably with a rough classifier trained with all training data labeled.

13.
This article uses projection depth (PD) for robust classification of multivariate data. Here we consider two types of classifiers, namely, the maximum depth classifier and the modified depth-based classifier. The latter involves kernel density estimation, where one needs to choose the associated scale of smoothing. We consider both the single scale and the multi-scale versions of kernel density estimation, and investigate the large sample properties of the resulting classifiers under appropriate regularity conditions. Some simulated and real data sets are analyzed to evaluate the finite sample performance of these classification tools.

14.
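The traditional RFM customer segmentation method does not characterize customer behavior well, and it does not analyze the weights of the RFM indicators. To address this, the indicator system for customer segmentation is extended beyond the RFM indicators, and an AHP-based strategy for determining the RFM indicator weights is proposed. Because traditional single classifiers have many shortcomings, a combined classifier model based on SOM and SVM is proposed; it exploits the respective strengths of the SOM and SVM single classifiers and combines the classification information of both, avoiding the one-sidedness that a single classifier may exhibit and thereby improving classification accuracy. Finally, the effectiveness of the model is verified on a practical example.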
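
As a rough illustration of deriving indicator weights with AHP, the sketch below turns a pairwise comparison matrix into normalized weights using the row geometric mean method. This is an assumption: the abstract does not say which AHP variant the paper uses (the principal eigenvector method is the other common choice), and the function name `ahp_weights` and the example judgments are hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Derive priority weights from an AHP pairwise comparison matrix.

    pairwise[i][j] states how much more important indicator i is than
    indicator j, so pairwise[j][i] should equal 1 / pairwise[i][j].
    Uses the row geometric mean approximation (requires Python 3.8+).
    """
    n = len(pairwise)
    if n == 0 or any(len(row) != n for row in pairwise):
        raise ValueError("pairwise matrix must be square and non-empty")
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical judgments: R is twice as important as F and four times as important as M.
rfm_matrix = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
r_weight, f_weight, m_weight = ahp_weights(rfm_matrix)
```

For a perfectly consistent matrix such as this one, the geometric mean method and the eigenvector method give the same weights; for inconsistent judgments they can differ, and a consistency check would normally be added.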

15.
This paper presents a novel knowledge-based linear classification model for multi-category discrimination of sets or objects with prior knowledge. The prior knowledge is in the form of multiple polyhedral sets belonging to one or more categories or classes and it is introduced as additional constraints into the formulation of the Tikhonov linear least squares multi-class support vector machine model. The resulting formulation leads to a least squares problem that can be solved using matrix methods or iterative methods. Investigations include the development of a linear knowledge-based classification model extended to the case of multi-categorical discrimination and expressed as a single unconstrained optimization problem. Advantages of this formulation include explicit expressions for the classification weights of the classifier(s) and its ability to incorporate prior knowledge directly into the classifiers. In addition, it can provide fast solutions for the optimal classification weights for multi-categorical separation without the use of specialized solver software. To evaluate the model, data and prior knowledge from the Wisconsin breast cancer prognosis and two-phase flow regimes in pipes were used to train and test the proposed formulation.

16.
A multicriteria identification and prediction method for mathematical models of simulation type in the case of several identification criteria (error functions) is proposed. The necessity of the multicriteria formulation arises, for example, when one needs to take into account errors of completely different origins (not reducible to a single characteristic) or when there is no information on the class of noise in the data to be analyzed. An identification sets method is described based on the approximation and visualization of the multidimensional graph of the identification error function and sets of suboptimal parameters. This method allows for additional advantages of the multicriteria approach, namely, the construction and visual analysis of the frontier and the effective identification set (frontier and the Pareto set for identification criteria), various representations of the sets of Pareto effective and subeffective parameter combinations, and the corresponding predictive trajectory tubes. The approximation is based on the deep holes method, which yields metric ε-coverings with nearly optimal properties, and on multiphase approximation methods for the Edgeworth–Pareto hull. The visualization relies on the approach of interactive decision maps. With the use of the multicriteria method, multiple-choice solutions of identification and prediction problems can be produced and justified by analyzing the stability of the optimal solution not only with respect to the parameters (robustness with respect to data) but also with respect to the chosen set of identification criteria (robustness with respect to the given collection of functionals).

17.
In this paper, we propose a genetic programming (GP) based approach to evolve fuzzy rule based classifiers. For a c-class problem, a classifier consists of c trees. Each tree, T_i, of the multi-tree classifier represents a set of rules for class i. During the evolutionary process, the inaccurate/inactive rules of the initial set of rules are removed by a cleaning scheme. This allows good rules to survive, which eventually determines the number of rules. In the beginning, our GP scheme uses a randomly selected subset of features and then evolves the features to be used in each rule. The initial rules are constructed using prototypes, which are generated randomly as well as by the fuzzy k-means (FKM) algorithm. Experiments are conducted in three different ways: using only randomly generated rules, using a mixture of randomly generated rules and FKM prototype based rules, and using exclusively FKM prototype based rules. The performance of the classifiers is comparable irrespective of the type of initial rules. This emphasizes the novelty of the proposed evolutionary scheme. In this context, we propose a new mutation operation to alter the rule parameters. The GP scheme optimizes the structure of rules as well as the parameters involved. The method is validated on six benchmark data sets and the performance of the proposed scheme is found to be satisfactory.

18.
This paper points out three questionable areas in the realm of similarity measures and then provides a new method that rectifies the problem. The purpose of this paper is fourfold. First, we propose a scenario where the three similarity measures proposed by Hung and Yang (2004) [1] cannot help a decision maker decide a pattern recognition problem. Second, we present our method for solving the dilemma. Third, we show that our proposed similarity measures satisfy the axioms for well-defined similarity measures. Fourth, we prove that our method can solve pattern recognition problems. Our findings will help researchers handle similarity problems in an intuitionistic fuzzy set environment.

19.
This paper considers the model discrimination problem among a finite number of models in safety-critical systems that are subjected to constraints that can be disjunctive and where state and input constraints can be coupled with each other. In particular, we consider both the optimal input design problem for active model discrimination that is solved offline as well as the online passive model discrimination problem via a model invalidation framework. To overcome the issues associated with non-convex and generalized semi-infinite constraints due to the disjunctive and coupled constraints, we propose some techniques for reformulating these constraints in a computationally tractable manner by leveraging the Karush–Kuhn–Tucker (KKT) conditions and introducing binary variables, thus recasting the active and passive model discrimination problems into tractable mixed-integer linear/quadratic programming (MILP/MIQP) problems. When compared with existing approaches, our method is able to obtain the optimal solution and is observed in simulations to also result in less computation time. Finally, we demonstrate the effectiveness of the proposed active model discrimination approach for estimating driver intention with disjunctive safety constraints and state–input coupled curvature constraints, as well as for fault identification.

20.
Target tracking is one of the most important issues in computer vision and has been applied in many fields of science, engineering and industry. Because of occlusion during tracking, typical approaches with a single classifier learn a large amount of occluding background information, which degrades tracking performance and eventually leads to the failure of the tracking algorithm. This paper presents a new correlative classifiers approach to address this problem. Our idea is to derive a group of correlative classifiers based on a sample set method. We then propose a strategy to establish the classifiers and to query the suitable classifiers for tracking in the next frame. To deal with nonlinearity, a particle filter is adopted and integrated with the sample set method. For choosing the target from candidate particles, we define a similarity measurement between particles and the sample set. The proposed sample set method includes the following steps. First, we crop a positive sample set around the target and a negative sample set far away from the target. Second, we extract the average Haar-like feature from these samples and calculate their statistical characteristic, which represents the target model. Third, we define the similarity measurement based on the statistical characteristic of these two sets to judge the similarity between candidate particles and the target model. Finally, we choose the particle with the largest similarity score as the target in the new frame. A number of experiments show the robustness and efficiency of the proposed approach when compared with other state-of-the-art trackers.
