Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
Local search methods are widely used to improve the performance of evolutionary computation algorithms in all kinds of domains. Employing advanced and efficient exploration mechanisms becomes crucial in complex problems with very large search spaces, such as when applying evolutionary algorithms to large-scale data mining tasks. Recently, the GAssist Pittsburgh evolutionary learning system was extended with memetic operators for discrete representations that use information from the supervised learning process to heuristically edit classification rules and rule sets. In this paper we first adapt some of these operators to BioHEL, a different evolutionary learning system applying the iterative learning approach, and afterwards propose versions of these operators designed for continuous attributes and for dealing with noise. The performance of all these operators and their combination is extensively evaluated on a broad range of synthetic large-scale datasets to identify the settings that present the best balance between efficiency and accuracy. Finally, the identified best configurations are compared with other classes of machine learning methods on both synthetic and real-world large-scale datasets and show very competitive performance.

2.
A scenario tree is an efficient way to represent a stochastic data process in decision problems under uncertainty. This paper addresses how to efficiently generate appropriate scenario trees. A knowledge‐based scenario tree generation method is proposed; the new method is further improved by accounting for subjective judgements or expectations about the random future. Compared with existing approaches, our knowledge‐based algorithms avoid complicated mathematical models and time‐consuming estimation, simulation and optimization, so large‐scale scenario trees can be generated quickly. To show the advantages of the new algorithms, a multiperiod portfolio selection problem is considered, and a dynamic risk measure is adopted to control the intermediate risk, which is superior to the single‐period risk measure used in the existing literature. A series of numerical experiments is carried out using real trading data from the Shanghai stock market. The results show that the scenarios generated by our algorithms properly represent the underlying distribution and that the algorithms are fast: for example, a scenario tree with up to 10,000 scenarios can be generated in less than half a minute. The applications to the multiperiod portfolio management problem demonstrate that our scenario tree generation methods are stable, and that the optimal trading strategies obtained with the generated scenario tree are reasonable, efficient and robust. Copyright © 2013 John Wiley & Sons, Ltd.

3.
A fuzzy random forest   Total citations: 4, self-citations: 0, citations by others: 4
When individual classifiers are combined appropriately, a statistically significant increase in classification accuracy is usually obtained. Multiple classifier systems are the result of combining several individual classifiers. Following Breiman’s methodology, in this paper a multiple classifier system based on a “forest” of fuzzy decision trees, i.e., a fuzzy random forest, is proposed. This approach combines the robustness of multiple classifier systems, the power of randomness to increase the diversity of the trees, and the flexibility of fuzzy logic and fuzzy sets for imperfect data management. Various combination methods to obtain the final decision of the multiple classifier system are proposed and compared. Some of them are weighted combination methods, which weight the decisions of the different elements of the multiple classifier system (leaves or trees). A comparative study with several datasets is made to show the efficiency of the proposed multiple classifier system and the various combination methods. The proposed multiple classifier system exhibits good classification accuracy, comparable to that of the best classifiers when tested with conventional data sets. However, unlike other classifiers, the proposed classifier provides a similar accuracy when tested with imperfect datasets (with missing and fuzzy values) and with datasets with noise.
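
The idea of a weighted combination of tree decisions can be illustrated with a minimal sketch. It is not the paper's method: the function name `weighted_vote`, the per-tree membership degrees, and the tree weights below are all hypothetical values chosen for illustration, and the aggregation (a weighted sum followed by an argmax) is only one of many possible combination rules.

```python
def weighted_vote(tree_outputs, tree_weights):
    """Combine per-tree class membership degrees with a weighted sum.

    tree_outputs: list of dicts mapping class label -> membership degree
    tree_weights: list of non-negative weights, one per tree (assumed given)
    Returns the winning label and the aggregated score per label.
    """
    totals = {}
    for output, weight in zip(tree_outputs, tree_weights):
        for label, degree in output.items():
            totals[label] = totals.get(label, 0.0) + weight * degree
    winner = max(totals, key=totals.get)
    return winner, totals


# Hypothetical outputs of three fuzzy trees for a single example.
outputs = [
    {"pos": 0.7, "neg": 0.3},
    {"pos": 0.4, "neg": 0.6},
    {"pos": 0.2, "neg": 0.8},
]
weights = [0.5, 0.3, 0.2]  # hypothetical tree weights

label, totals = weighted_vote(outputs, weights)
print(label, totals)
```

In this toy case the first tree carries the largest weight, so its preference decides the outcome even though two of the three trees lean the other way, which is exactly the behaviour that distinguishes weighted from unweighted voting.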

4.
This paper presents a hybrid method for identification of Pareto-optimal fuzzy classifiers (FCs). In contrast to many existing methods, the initial population for multiobjective evolutionary algorithms (MOEAs) is neither created randomly nor dependent on a priori knowledge. Instead, it is created by the proposed two-step initialization method. First, a decision tree (DT) created by the C4.5 algorithm is transformed into an FC. Thereby, relevant variables are selected and an initial partition of the input space is performed. Then, the rest of the population is created by randomly replacing some parameters of the initial FC, so that the initial population is widely spread. This improves the convergence of MOEAs to the correct Pareto front. The initial population is optimized by the NSGA-II algorithm, and a set of Pareto-optimal FCs representing the trade-off between accuracy and interpretability is obtained. The method does not require any a priori knowledge of the number of fuzzy sets, the distribution of fuzzy sets or the number of relevant variables; these are all determined by the method itself. Performance of the obtained FCs is validated on six benchmark data sets from the literature. The obtained results are compared with those of a recently published paper [H. Ishibuchi, Y. Nojima, Analysis of interpretability-accuracy tradeoff of fuzzy systems by multiobjective fuzzy genetics-based machine learning, International Journal of Approximate Reasoning 44 (1) (2007) 4–31] and the benefits of our method are clearly shown.

5.
In this paper, we study the performance of various state-of-the-art classification algorithms applied to eight real-life credit scoring data sets. Some of the data sets originate from major Benelux and UK financial institutions. Different types of classifiers are evaluated and compared. Besides the well-known classification algorithms (e.g., logistic regression, discriminant analysis, k-nearest neighbour, neural networks and decision trees), this study also investigates the suitability and performance of some recently proposed, advanced kernel-based classification algorithms such as support vector machines and least-squares support vector machines (LS-SVMs). The performance is assessed using the classification accuracy and the area under the receiver operating characteristic curve. Statistically significant performance differences are identified using the appropriate test statistics. It is found that both the LS-SVM and neural network classifiers yield very good performance, but that simple classifiers such as logistic regression and linear discriminant analysis also perform very well for credit scoring.
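
The two performance measures named in this abstract, classification accuracy and the area under the ROC curve (AUC), can be computed as in the following sketch. The labels and scores are made-up toy values, not data from the study, and the function names are hypothetical. The AUC here uses the pairwise-ranking definition (the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half), which is simple but quadratic in the number of examples.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples classified correctly at a fixed score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy data: 1 = positive class, 0 = negative class (labels are illustrative).
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.3]

auc_value = roc_auc(labels, scores)
acc_value = accuracy(labels, scores)
print(auc_value, acc_value)
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality over all thresholds, which is one reason studies of this kind often report both.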

6.
This paper proposes input selection methods for fuzzy modeling, which are based on decision tree search approaches. The branching decision at each node of the tree is made based on the accuracy of the model available at the node. We propose two decision tree search approaches, bottom-up and top-down, and four different measures for selecting the most appropriate set of inputs at every branching node (or decision node). Both decision tree approaches are tested using real-world application examples. These methods are applied to fuzzy modeling of two different classification problems and to fuzzy modeling of two dynamic processes. The model accuracies for the four examples are compared in terms of several performance measures. Moreover, the advantages and drawbacks of using bottom-up or top-down approaches are discussed.

7.
Dynamic optimization and multi-objective optimization have separately gained increasing attention from the research community during the last decade. However, few studies have been reported on dynamic multi-objective optimization (dMO) and few effective dMO methods have been proposed. In this paper, we fill these gaps by developing new dMO test problems and a new effective dMO algorithm. In the newly designed dMO problems, Pareto-optimal decision values (i.e., Pareto-optimal solutions: POS) or both POS and Pareto-optimal objective values (i.e., Pareto-optimal front: POF) change with time. A new multi-strategy ensemble multi-objective evolutionary algorithm (MS-MOEA) is proposed to tackle the challenges of dMO. In MS-MOEA, the convergence speed is accelerated by a new offspring-creating mechanism powered by adaptive genetic and differential operators (GDM); a Gaussian mutation operator is employed to cope with premature convergence; and a memory-like strategy is proposed to obtain a better starting population when a change takes place. To show the advantages of the proposed algorithm, we experimentally compare MS-MOEA with several algorithms equipped with a traditional restart strategy. The results suggest that such a multi-strategy ensemble approach is promising for dealing with dMO problems.

8.
In solving multi-objective optimization problems, evolutionary algorithms have been adequately applied to demonstrate that multiple and well-spread Pareto-optimal solutions can be found in a single simulation run. In this paper, we discuss and bring together various classical generating methods, which are either quite well known or have fallen into oblivion due to publication in less accessible journals, and some of which were suggested even before the inception of evolutionary methodologies. These generating methods specialize either in finding multiple Pareto-optimal solutions in a single simulation run or in maintaining a good diversity by systematically solving a number of scalarizing problems. Most classical generating methodologies are classified into four groups, mainly based on their working principles, and one representative method from each group is chosen in the present study for a detailed discussion and for a performance comparison with a state-of-the-art evolutionary method. On visual comparisons of the efficient frontiers obtained for a number of two- and three-objective test problems, the results bring out interesting insights about the strengths and weaknesses of these approaches. The results should motivate researchers to design hybrid multi-objective optimization algorithms which may be better than each of the individual methods.

9.
Many rule systems generated from decision trees (like CART, ID3, C4.5, etc.) or from direct frequency counting methods (like Apriori) are usually non-significant or even contradictory. Nevertheless, most papers on this subject demonstrate that important reductions can be made to generated rule sets by searching for and removing redundancies and conflicts and simplifying the similarities between them. The objective of this paper is to present an algorithm (RBS: Reduction Based on Significance) for allocating a significance value to each rule in the system so that experts may select the rules that should be considered as preferable and understand the exact degree of correlation between the different rule attributes. Significance is calculated from the antecedent frequency and rule frequency parameters for each rule; if the antecedent frequency is above the minimal level and the rule frequency is in a critical interval, the rule's significance ratio is computed by the algorithm. These critical boundaries are calculated by an incremental method and the rule space is divided according to them. The significance function is defined for these intervals. As with other methods of rule reduction, our approach can also be applied to rule sets generated from decision trees or frequency counting algorithms, independently and after the rule set has been created. Three simulated data sets are used to carry out a computational experiment. Other standard data sets from the UCI repository (UCI Machine Learning Repository) and two particular data sets with expert interpretation are also used, in order to obtain greater consistency. The proposed method offers a smaller and more easily understandable rule set than the original sets, and highlights the most significant attribute correlations, quantifying their influence on the consequent attribute.

10.
The constraint proposal method for computing Pareto-optimal solutions is extended to multi-party negotiations. In the method a neutral coordinator assists decision makers in finding Pareto-optimal solutions so that the elicitation of the decision makers' value functions is not required. During the procedure the decision makers have to indicate their most preferred points on different sets of linear constraints. The method can be used to generate either one Pareto-optimal solution dominating the status quo solution of the negotiation or an approximation to the Pareto frontier. In the latter case a distributive negotiation among the efficient agreements can be carried out afterwards.

11.
Regression trees are a popular alternative to classical regression methods. A number of approaches exist for constructing regression trees. Most of these techniques, including CART, are sequential in nature and locally optimal at each node split, so the final tree solution found may not be the best tree overall. In addition, small changes in the training data often lead to large changes in the final result due to the relative instability of these greedy tree-growing algorithms. Ensemble techniques, such as random forests, attempt to take advantage of this instability by growing a forest of trees from the data and averaging their predictions. The predictive performance is improved, but the simplicity of a single-tree solution is lost.

In earlier work, we introduced the Tree Analysis with Randomly Generated and Evolved Trees (TARGET) method for constructing classification trees via genetic algorithms. In this article, we extend the TARGET approach to regression trees. Simulated data and real-world data are used to illustrate the TARGET process and compare its performance to CART, Bayesian CART, and random forests. The empirical results indicate that TARGET regression trees have better predictive performance than recursive partitioning methods, such as CART, and single-tree stochastic search methods, such as Bayesian CART. The predictive performance of TARGET is slightly worse than that of ensemble methods, such as random forests, but the TARGET solutions are far more interpretable.

12.
This paper proposes, describes and evaluates T3C, a classification algorithm that builds decision trees of depth at most three, and results in high accuracy whilst keeping the size of the tree reasonably small. T3C is an improvement over algorithm T3 in the way it performs splits on continuous attributes. When run against publicly available data sets, T3C achieved lower generalisation error than T3 and the popular C4.5, and competitive results compared to Random Forest and Rotation Forest.

13.
In the Knowledge Discovery Process, classification algorithms are often used to help create models with training data that can be used to predict the classes of untested data instances. While there are several factors involved with classification algorithms that can influence classification results, such as the node splitting measures used in making decision trees, feature selection is often used as a pre-classification step when using large data sets to help eliminate irrelevant or redundant attributes in order to increase computational efficiency and possibly to increase classification accuracy. One important factor common to both feature selection and classification using decision trees is attribute discretization, which is the process of dividing attribute values into a smaller number of discrete values. In this paper, we will present and explore a new hybrid approach, ChiBlur, which involves the use of concepts from both the blurring and χ²-based approaches to feature selection, as well as concepts from multi-objective optimization. We will compare this new algorithm with algorithms based on the blurring and χ²-based approaches.

14.
Data classification is an important area of data mining. Several well-known techniques, such as decision trees and neural networks, are available for this task. In this paper we propose a Kalman particle swarm optimized (KPSO) polynomial equation for classification on several well-known data sets. Our proposed method builds on findings from our earlier work on data classification using polynomial neural networks (PNNs), such as the number of terms, the number and combination of features in each term, and the degree of the polynomial equation. KPSO optimizes these polynomial equations with faster convergence than PSO. The polynomial equation that gives the best performance is taken as the classification model. Our simulation results show that the proposed approach gives classification accuracy competitive with that of PNN on many datasets.

15.
A Dual-Objective Evolutionary Algorithm for Rules Extraction in Data Mining   Total citations: 1, self-citations: 0, citations by others: 1
This paper presents a dual-objective evolutionary algorithm (DOEA) for extracting multiple decision rule lists in data mining, which aims at satisfying the classification criteria of high accuracy and ease of user comprehension. Unlike existing approaches, the algorithm incorporates the concept of Pareto dominance to evolve a set of non-dominated decision rule lists, each having a different classification accuracy and number of rules over a specified range. The classification results of DOEA are analyzed and compared with existing rule-based and non-rule-based classifiers on 8 test problems obtained from the UCI Machine Learning Repository. It is shown that the DOEA produces comprehensible rules with competitive classification accuracy as compared to many methods in the literature. Results obtained from box plots and t-tests further examine its invariance to random partition of datasets. An erratum to this article is available at .

16.
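Network Traffic Classification Based on DDAG-SVM   Total citations: 1, self-citations: 0, citations by others: 1
As Internet technology continues to develop, many new network communications use dynamic ports, protocol encryption and similar techniques, so traditional traffic classification techniques are no longer applicable. Using as traffic features the payload size of the first packet from client to server after the TCP three-way handshake, the payload sizes of the first and second packets from server to client, and the server port, this paper proposes a network traffic classification method based on DDAG-SVM. To address the error accumulation effect of traditional DDAG-SVM, which degrades classification performance, the DDAG-SVM decision tree is reconstructed according to inter-class separability, so that at each step the two most easily separated traffic classes are chosen to form the classification decision surface. Test results show that the method achieves high classification accuracy.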
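
How a decision DAG turns pairwise classifiers into a multi-class decision can be sketched as follows. This is a generic illustration of DDAG evaluation, not the paper's reconstructed tree: the class names, the one-dimensional centroids, and the `nearest_centroid` stand-in for trained pairwise SVMs are all hypothetical. The order of the `classes` list determines which pair is tested at each node, so that is where a separability-based ordering would take effect.

```python
def ddag_predict(x, classes, pairwise_predict):
    """Evaluate a decision DAG over an ordered list of classes.

    Each node compares the first and last remaining classes and discards
    the loser, so k classes need exactly k - 1 pairwise decisions.
    """
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        winner = pairwise_predict(a, b, x)
        if winner == a:
            remaining.pop()    # class b is eliminated
        else:
            remaining.pop(0)   # class a is eliminated
    return remaining[0]


# Hypothetical stand-in for trained pairwise SVMs: nearest 1-D centroid.
centroids = {"web": 1.0, "mail": 5.0, "p2p": 9.0}


def nearest_centroid(a, b, x):
    """Return whichever of classes a and b has the centroid closer to x."""
    return a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b


result = ddag_predict(4.2, ["web", "mail", "p2p"], nearest_centroid)
print(result)
```

Because a class eliminated at an early node can never be recovered, a mistake near the root propagates to the final answer, which is the error accumulation the abstract refers to.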

17.
Evolutionary Design of Nearest Prototype Classifiers   Total citations: 3, self-citations: 0, citations by others: 3
In pattern classification problems, many works have been carried out with the aim of designing good classifiers from different perspectives. These works achieve very good results in many domains. However, in general they are very dependent on some crucial parameters involved in the design. These parameters have to be found by a trial and error process or by some automatic methods, like heuristic search and genetic algorithms, that strongly decrease the performance of the method. For instance, in nearest prototype approaches, the main parameters are the number of prototypes to use, the initial set, and a smoothing parameter. In this work, an evolutionary approach based on the Nearest Prototype Classifier (ENPC) is introduced in which no parameters are involved, thus overcoming all the problems that classical methods have in tuning and searching for the appropriate values. The algorithm is based on the evolution of a set of prototypes that can execute several operators in order to increase their quality in a local sense, with a high classification accuracy emerging for the whole classifier. This new approach has been tested using four different classical domains, including such artificial distributions as spiral and uniformly distributed data sets, the Iris Data Set and an application domain about diabetes. In all cases, the experiments show successful results, not only in the classification accuracy, but also in the number and distribution of the prototypes achieved.
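
The classification step shared by nearest prototype approaches can be sketched as below. Only the final lookup is shown: the prototype positions and labels are hypothetical, and the evolutionary operators that ENPC uses to create and adjust prototypes are not reproduced here. Euclidean distance is assumed.

```python
import math


def nearest_prototype(x, prototypes):
    """Return the label of the prototype closest to x (Euclidean distance).

    prototypes: list of (vector, label) pairs.
    """
    best_label, best_dist = None, math.inf
    for vector, label in prototypes:
        dist = math.dist(x, vector)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical prototypes in a 2-D feature space.
prototypes = [((0.0, 0.0), "a"), ((5.0, 5.0), "b")]

print(nearest_prototype((1.0, 1.0), prototypes))
print(nearest_prototype((4.0, 3.0), prototypes))
```

The number and placement of the prototypes fully determine the decision regions, which is why the abstract treats them as the quantities to be discovered rather than fixed in advance.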

18.
This article introduces a classification tree algorithm that can simultaneously reduce tree size, improve class prediction, and enhance data visualization. We accomplish this by fitting a bivariate linear discriminant model to the data in each node. Standard algorithms can produce fairly large tree structures because they employ a very simple node model, wherein the entire partition associated with a node is assigned to one class. We reduce the size of our trees by letting the discriminant models share part of the data complexity. Being themselves classifiers, the discriminant models can also help to improve prediction accuracy. Finally, because the discriminant models use only two predictor variables at a time, their effects are easily visualized by means of two-dimensional plots. Our algorithm does not simply fit discriminant models to the terminal nodes of a pruned tree, as this does not reduce the size of the tree. Instead, discriminant modeling is carried out in all phases of tree growth and the misclassification costs of the node models are explicitly used to prune the tree. Our algorithm is also distinct from the “linear combination split” algorithms that partition the data space with arbitrarily oriented hyperplanes. We use axis-orthogonal splits to preserve the interpretability of the tree structures. An extensive empirical study with real datasets shows that, in general, our algorithm has better prediction power than many other tree or nontree algorithms.

19.
Rough set theory is a useful mathematical tool to deal with vagueness and uncertainty in available information. The results of a rough set approach are usually presented in the form of a set of decision rules derived from a decision table. Because using the original decision table is not the only way to implement a rough set approach, it could be interesting to investigate possible improvement in classification performance by replacing the original table with an alternative table obtained by pairwise comparisons among patterns. In this paper, a decision table based on pairwise comparisons is generated using the preference relation as in the Preference Ranking Organization Methods for Enrichment Evaluations (PROMETHEE) methods, to gauge the intensity of preference for one pattern over another pattern on each criterion before classification. The rough-set-based rule classifier (RSRC) provided by the well-known library for the Rough Set Exploration System (RSES) running under Windows has been successfully used to generate decision rules by using the pairwise-comparisons-based tables. Specifically, parameters related to the preference function on each criterion have been determined using a genetic-algorithm-based approach. Computer simulations involving several real-world data sets have revealed that the proposed classification method performs well compared to other well-known classification methods and to RSRC using the original tables.

20.
The non-dominated sorting genetic algorithm II (NSGA-II) is an effective algorithm for finding the Pareto-optimal front of multi-objective optimization problems. To further enhance the advantage of NSGA-II, this study proposes an evaluative NSGA-II (E-NSGA-II) in which a novel gene-therapy method is incorporated into the crossover operation to retain superior schema patterns in the evolutionary population and enhance its solution capability. The merit of each selected gene in a crossover chromosome is estimated by exchanging the therapeutic genes in both mating chromosomes and observing the resulting fitness differences. Hence, the evaluative crossover operation can generate effective genomes based on the gene merit without explicitly analyzing the solution space. Experiments on nine unconstrained multi-objective benchmarks and four constrained problems show that E-NSGA-II can find Pareto-optimal solutions in all test cases with better convergence and diversity than several existing algorithms.
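
The notion of non-dominated sorting that gives NSGA-II its name can be illustrated with a short sketch. It assumes that all objectives are minimized and uses a simple front-peeling loop rather than the bookkeeping of the fast non-dominated sort used in NSGA-II itself. The objective vectors are made-up toy values, and nothing here reproduces the gene-therapy crossover of E-NSGA-II.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one
    (all objectives are assumed to be minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points):
    """Split point indices into successive Pareto fronts by repeated peeling."""
    fronts = []
    remaining = list(range(len(points)))
    while remaining:
        front = [
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


# Toy two-objective values for five candidate solutions.
points = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
fronts = non_dominated_sort(points)
print(fronts)
```

The first front holds the solutions that no other candidate dominates, that is, the current approximation of the Pareto-optimal front; later fronts rank the remaining solutions for selection.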
