首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
2.
The outcomes at the tips of a decision tree cannot always be represented by a single numerical value on a one-dimensional axis. In many decision problems they are multidimensional, and their performance is expressed in a mixture of verbal statements and physical or monetary values. We propose to use the scores of cardinal methods for multicriteria decision analysis in order to represent the relative performance of the outcomes. Thereafter, we evaluate the chance forks in the tree via the corresponding aggregation procedure: in the Multiplicative AHP via weighted geometric means of the scores and in SMART via weighted arithmetic means. The procedure is based on the idea that the numerical values of verbal quantifiers like somewhat more, more, … do not depend on what we compare, whether it is relative importance or relative likelihood.  相似文献   

3.
A method for parallel construction of a classifier ensemble for solving the problem of localization of neuron sources within the brain on the basis of the analysis of electroencephalography signals is described. The idea of the proposed parallel numerical method consists in the consideration of the source parameters as attributes of decision tress constructed in parallel. The method is based on formation of a training data set from an experimental signal and construction of a classifier on the basis of the value of error of the potential, that is, the difference between the measured and model values of the potential. The efficiency of parallelization of the localization problem, namely, the data distribution between processors, and the distributed training of the ensembles of decision trees are considered. Analysis of the scalability of the problem of construction of a classifier ensemble with a increase in the number of processors in the course of solution of the problem of localization of a neuron source on multiprocessor computational complexes is presented. The parallel source localization algorithm is developed for architectures with either common or distributed memory. The algorithm is realized using the MPI technology; a hybrid model of parallel calculations using MPI and OpenMPI is also discussed.  相似文献   

4.
Empirical studies have shown that the performance of decision tree induction usually improves when the trees are pruned. Whether these results hold in general and to what extent pruning improves the accuracy of a concept have not been investigated theoretically. This paper provides a theoretical study of pruning. We focus on a particular type of pruning and determine a bound on the error due to pruning. This is combined with PAC (probably approximately correct) learning theory to determine a sample size sufficient to guarantee a probabilistic bound on the concept error. We also discuss additional pruning rules and give an analysis for the pruning error.  相似文献   

5.
The New Basel Accord, which was implemented in 2007, has made a significant difference to the use of modelling within financial organisations. In particular it has highlighted the importance of Loss Given Default (LGD) modelling. We propose a decision tree approach to modelling LGD for unsecured consumer loans where the uncertainty in some of the nodes is modelled using a mixture model, where the parameters are obtained using regression. A case study based on default data from the in-house collections department of a UK financial organisation is used to show how such regression can be undertaken.  相似文献   

6.
In multicriteria decision problems many values must be assigned, such as the importance of the different criteria and the values of the alternatives with respect to subjective criteria. Since these assignments are approximate, it is very important to analyze the sensitivity of results when small modifications of the assignments are made. When solving a multicriteria decision problem, it is desirable to choose a decision function that leads to a solution as stable as possible. We propose here a method based on genetic programming that produces better decision functions than the commonly used ones. The theoretical expectations are validated by case studies.  相似文献   

7.
We propose a generic decision tree framework that supports reusable components design. The proposed generic decision tree framework consists of several sub-problems which were recognized by analyzing well-known decision tree induction algorithms, namely ID3, C4.5, CART, CHAID, QUEST, GUIDE, CRUISE, and CTREE. We identified reusable components in these algorithms as well as in several of their partial improvements that can be used as solutions for sub-problems in the generic decision tree framework. The identified components can now be used outside the algorithm they originate from. Combining reusable components allows the replication of original algorithms, their modification but also the creation of new decision tree induction algorithms. Every original algorithm can outperform other algorithms under specific conditions but can also perform poorly when these conditions change. Reusable components allow exchanging of solutions from various algorithms and fast design of new algorithms. We offer a generic framework for component-based algorithms design that enhances understanding, testing and usability of decision tree algorithm parts.  相似文献   

8.
Decision-tree algorithm provides one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Over the years, additional methodologies have been investigated and proposed to deal with continuous or multi-valued data, and with missing or noisy features. Recently, with the growing popularity of fuzzy representation, some researchers have proposed to utilize fuzzy representation in decision trees to deal with similar situations. This paper presents a survey of current methods for Fuzzy Decision Tree (FDT) designment and the various existing issues. After considering potential advantages of FDT classifiers over traditional decision tree classifiers, we discuss the subjects of FDT including attribute selection criteria, inference for decision assignment and stopping criteria. To be best of our knowledge, this is the first overview of fuzzy decision tree classifier.  相似文献   

9.
An important approach to decision modeling is the induction of knowledge structures—such as rules, trees, and graphs—from empirical data describing previous conditions and the resulting decisions. We examine here a specific knowledge structure, a logic tree, in which the conditions are leaves, the decision is the root, and the intermediate nodes are logical operators. We then use genetic algorithms (GAs) to construct logic trees that best represent the correspondence between conditions and decisions described by the empirical data. We also investigate an important characteristic of the GA search, the fitness distance correlation. Finally, we comment on the usefulness of GAs in knowledge modeling.  相似文献   

10.
11.
This paper is concerned with a special case of the generalized minimum spanning tree problem. The problem is defined on an undirected graph, where the vertex set is partitioned into clusters, and non-negative costs are associated with the edges. The problem is to find a tree of minimum cost containing at least one vertex in each cluster. We consider a geometric case of the problem where the graph is complete, all vertices are situated in the plane, and Euclidean distance defines the edge cost. We prove that the problem is strongly -hard even in the case of a special structure of the clustering called grid clustering. We construct an exact exponential time dynamic programming algorithm and, based on this dynamic programming algorithm, we develop a polynomial time approximation scheme for the problem with grid clustering.  相似文献   

12.
Automatic construction of decision trees for classification   总被引:1,自引:0,他引:1  
An algorithm for learning decision trees for classification and prediction is described which converts real-valued attributes into intervals using statistical considerations. The trees are automatically pruned with the help of a threshold for the estimated class probabilities in an interval. By means of this threshold the user can control the complexity of the tree, i.e. the degree of approximation of class regions in feature space. Costs can be included in the learning phase if a cost matrix is given. In this case class dependent thresholds are used.Some applications are described, especially the task of predicting the high water level in a mountain river.  相似文献   

13.
Classification algorithms based on full decision trees are investigated. Due to the decision tree construction under consideration, all the features satisfying a branching criterion are taken into account at each special vertex. The generalization ability of a full decision tree is estimated by applying margin theory. It is shown on real-life problems that the construction of a full decision tree leads to an increase in the margins of the learning objects; moreover, the number of objects with a positive margin increases as well. It is shown that the empirical Rademacher complexity of a full decision tree is lower than that of a classical decision tree.  相似文献   

14.
This paper presents a mixed integer linear programming (MILP) model for a new class of dynamic project selection and funding problems under risk given multiple scarce resources of different qualifications. The underlying stochastic decision tree concept extends classical approaches mainly in that it adds a novel node type that allows for the continuous control of discrete branching probability distributions. The control functions are piecewise linear and are convex for the costs and concave for the benefits. The MILP-model has been embedded in a prototype Decision Support System (DSS). With respect to the proposed solution the DSS provides complete probability distributions for both costs and benefits.  相似文献   

15.
This paper presents a hybrid method for identification of Pareto-optimal fuzzy classifiers (FCs). In contrast to many existing methods, the initial population for multiobjective evolutionary algorithms (MOEAs) is neither created randomly nor a priori knowledge is required. Instead, it is created by the proposed two-step initialization method. First, a decision tree (DT) created by C4.5 algorithm is transformed into an FC. Therefore, relevant variables are selected and initial partition of input space is performed. Then, the rest of the population is created by randomly replacing some parameters of the initial FC, such that, the initial population is widely spread. That improves the convergence of MOEAs into the correct Pareto front. The initial population is optimized by NSGA-II algorithm and a set of Pareto-optimal FCs representing the trade-off between accuracy and interpretability is obtained. The method does not require any a priori knowledge of the number of fuzzy sets, distribution of fuzzy sets or the number of relevant variables. They are all determined by it. Performance of the obtained FCs is validated by six benchmark data sets from the literature. The obtained results are compared to a recently published paper [H. Ishibuchi, Y. Nojima, Analysis of interpretability-accuracy tradeoff of fuzzy systems by multiobjective fuzzy genetics-based machine learning, International Journal of Approximate Reasoning 44 (1) (2007) 4–31] and the benefits of our method are clearly shown.  相似文献   

16.
One of the most important aspect of molecular and computational biology is the reconstruction of evolutionary relationships. The area is well explored after decades of intensive research. Despite this fact there remains a need for good and efficient algorithms that are capable of reconstructing the evolutionary relationship in reasonable time.  相似文献   

17.
Clustering analysis plays an important role in the filed of data mining. Nowadays, hierarchical clustering technique is becoming one of the most widely used clustering techniques. However, for most algorithms of hierarchical clustering technique, the requirements of high execution efficiency and high accuracy of clustering result cannot be met at the same time. After analyzing the advantages and disadvantages of the hierarchical algorithms, the paper puts forward a two-stage clustering algorithm, named Chameleon Based on Clustering Feature Tree (CBCFT), which hybridizes the Clustering Tree of algorithm BIRCH with algorithm CHAMELEON. By calculating the time complexity of CBCFT, the paper argues that the time complexity of CBCFT increases linearly with the number of data. By experimenting on sample data set, this paper demonstrates that CBCFT is able to identify clusters with large variance in size and shape and is robust to outliers. Moreover, the result of CBCFT is as similar as that of CHAMELEON, but CBCFT overcomes the shortcoming of the low execution efficiency of CHAMELEON. Although the execution time of CBCFT is longer than BIRCH, the clustering result of CBCFT is much satisfactory than that of BIRCH. Finally, through a case of customer segmentation of Chinese Petroleum Corp. HUBEI branch; the paper demonstrates that the clustering result of the case is meaningful and useful. The research is partially supported by National Natural Science Foundation of China (grants #70372049 and #70121001).  相似文献   

18.
Classification of signals with random forms is considered as a statistical hypothesis test problem. A method for empirical construction of decision procedures in statistical hypothesis test problems is proposed.  相似文献   

19.
In the boolean decision tree model there is at least a linear gap between the Monte Carlo and the Las Vegas complexity of a function depending on the error probability. We prove for a large class of read-once formulae that this trivial speed-up is the best that a Monte Carlo algorithm can achieve. For every formula F belonging to that class we show that the Monte Carlo complexity of F with two-sided error p is (1 ? 2p)R(F), and with one-sided error p is (1 ? p)R(F), where R(F) denotes the Las Vegas complexity of F. The result follows from a general lower bound that we derive on the Monte Carlo complexity of these formulae. This bound is analogous to the lower bound due to Saks and Wigderson on their Las Vegas complexity.  相似文献   

20.
We consider the randomized decision tree complexity of the recursive 3‐majority function. We prove a lower bound of for the two‐sided‐error randomized decision tree complexity of evaluating height h formulae with error . This improves the lower bound of given by Jayram, Kumar, and Sivakumar (STOC'03), and the one of given by Leonardos (ICALP'13). Second, we improve the upper bound by giving a new zero‐error randomized decision tree algorithm that has complexity at most . The previous best known algorithm achieved complexity . The new lower bound follows from a better analysis of the base case of the recursion of Jayram et al. The new algorithm uses a novel “interleaving” of two recursive algorithms. © 2015 Wiley Periodicals, Inc. Random Struct. Alg., 48, 612–638, 2016  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号