Similar literature
 20 similar documents found (search time: 881 ms)
1.
2.
Classification and rule induction are two important tasks for extracting knowledge from data. In rule induction, knowledge is represented as IF-THEN rules, which are easily understood and applied by problem-domain experts. This paper proposes MEPAR-miner (Multi-Expression Programming for Association Rule Mining), a new chromosome representation and solution technique for rule induction based on Multi-Expression Programming (MEP). MEP is a relatively new technique in evolutionary programming, first introduced in 2002 by Oltean and Dumitrescu. MEP uses a linear chromosome structure. In MEP, multiple logical expressions of different sizes are used to represent different logical rules. MEP expressions can be encoded and implemented in a flexible and efficient manner. MEP is generally applied to prediction problems; this paper presents a new algorithm that enables MEP to discover classification rules. The performance of the developed algorithm is tested on nine publicly available binary and n-ary classification data sets. Extensive experiments demonstrate that MEPAR-miner can discover effective classification rules that are as good as (or better than) those obtained by traditional rule induction methods. It is also shown that an effective gene encoding structure directly improves the predictive accuracy of logical IF-THEN rules.

3.
A Dual-Objective Evolutionary Algorithm for Rules Extraction in Data Mining   (Total citations: 1, self-citations: 0, citations by others: 1)
This paper presents a dual-objective evolutionary algorithm (DOEA) for extracting multiple decision rule lists in data mining, which aims at satisfying the classification criteria of high accuracy and ease of user comprehension. Unlike existing approaches, the algorithm incorporates the concept of Pareto dominance to evolve a set of non-dominated decision rule lists, each having a different classification accuracy and number of rules over a specified range. The classification results of DOEA are analyzed and compared with existing rule-based and non-rule-based classifiers on 8 test problems obtained from the UCI Machine Learning Repository. It is shown that DOEA produces comprehensible rules with classification accuracy competitive with many methods in the literature. Results obtained from box plots and t-tests further examine its invariance to random partitioning of the datasets. An erratum to this article is available at .
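
The Pareto-dominance idea mentioned in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the DOEA algorithm itself: each candidate rule list is reduced to a hypothetical pair (accuracy, number of rules), accuracy is maximized, rule count is minimized, and the sample values are invented.

```python
from typing import List, Tuple

# Hypothetical summary of a rule list: (classification accuracy, number of rules).
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives and strictly better on one.

    Assumption: higher accuracy is better, fewer rules is better.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def non_dominated(cands: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Invented example values, for illustration only.
candidates = [(0.90, 12), (0.88, 5), (0.85, 7), (0.92, 20), (0.80, 3)]
front = non_dominated(candidates)
```

Here (0.85, 7) is dropped because (0.88, 5) is both more accurate and shorter; the remaining candidates trade accuracy against size, which is the kind of set a user could choose from.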

4.
Rough set theory is a useful mathematical tool for dealing with vagueness and uncertainty in available information. The results of a rough set approach are usually presented in the form of a set of decision rules derived from a decision table. Because using the original decision table is not the only way to implement a rough set approach, it is of interest to investigate whether classification performance can be improved by replacing the original table with an alternative table obtained by pairwise comparisons among patterns. In this paper, a decision table based on pairwise comparisons is generated using the preference relation as in the Preference Ranking Organization Methods for Enrichment Evaluations (PROMETHEE) methods, to gauge the intensity of preference for one pattern over another on each criterion before classification. The rough-set-based rule classifier (RSRC) provided by the well-known library for the Rough Set Exploration System (RSES) running under Windows has been successfully used to generate decision rules from the pairwise-comparisons-based tables. Specifically, parameters related to the preference function on each criterion have been determined using a genetic-algorithm-based approach. Computer simulations involving several real-world data sets have revealed that the proposed classification method performs well compared to other well-known classification methods and to RSRC using the original tables.

5.
Fuzzy Rule-Based Systems have been successfully applied to pattern classification problems. In this type of classification system, the classical Fuzzy Reasoning Method (FRM) classifies a new example with the consequent of the rule with the greatest degree of association. By using this reasoning method, we lose the information provided by the other rules with different linguistic labels which also represent this value in the pattern attribute, although probably to a lesser degree. The aim of this paper is to present new FRMs which allow us to improve the system performance while maintaining its interpretability. The common aspect of the proposals is the participation, in the classification of the new pattern, of the rules that have been fired by that pattern. We formally describe the behaviour of a general reasoning method, analyze six proposals for this general model, and present a method to learn the parameters of these FRMs by means of Genetic Algorithms, adapting the inference mechanism to the set of rules. Finally, to show the increase in the system's generalization capability provided by the proposed FRMs, we report some results obtained by integrating them into a fuzzy rule generation process.

6.
A bootstrap-based aggregate classifier for model-based clustering   (Total citations: 1, self-citations: 0, citations by others: 1)
In model-based clustering, a situation in which true class labels are unknown and that is therefore also referred to as unsupervised learning, observations are typically classified by the Bayes modal rule. In this study, we assess whether alternative classifiers from the classification or supervised-learning literature (developed for situations in which class labels are known) can improve on the Bayes rule. More specifically, we investigate the performance of bootstrap-based aggregate (bagging) rules after adapting these to the model-based clustering context. It is argued that specific issues, such as the label-switching problem, have to be carefully addressed when using bootstrap methods in model-based clustering. Our two Monte Carlo studies show that classification based on the Bayes rule is rather stable and difficult to improve by bootstrap-based aggregate rules, even for sparse data. An empirical example illustrates the various approaches described in this paper.

7.
Dealing with the large amount of data resulting from association rule mining is a big challenge. The essential issue is how to provide efficient methods for summarizing and representing meaningful discovered knowledge from databases. This paper presents a new approach called multi-tier granule mining to improve the performance of association rule mining. Rather than using patterns, it uses granules to represent knowledge that is implicitly contained in relational databases. This approach also uses multi-tier structures and association mappings to interpret association rules in terms of granules. Consequently, association rules can be quickly assessed and meaningless association rules can be justified according to these association mappings. The experimental results indicate that the proposed approach is promising.

8.
This paper proposes fuzzy symbolic modeling as a framework for intelligent data analysis and model interpretation in classification and regression problems. The fuzzy symbolic modeling approach is based on the eigenstructure analysis of the data similarity matrix to define the number of fuzzy rules in the model. Each fuzzy rule is associated with a symbol and is defined by a Gaussian membership function. The prototypes for the rules are computed by a clustering algorithm, and the model output parameters are computed as the solutions of a bounded quadratic optimization problem. In classification problems, the rules’ parameters are interpreted as the rules’ confidence. In regression problems, the rules’ parameters are used to derive rules’ confidences for classes that represent ranges of output variable values. The resulting model is evaluated based on a set of benchmark datasets for classification and regression problems. Nonparametric statistical tests were performed on the benchmark results, showing that the proposed approach produces compact fuzzy models with accuracy comparable to models produced by the standard modeling approaches. The resulting model is also exploited from the interpretability point of view, showing how the rule weights provide additional information to help in data and model understanding, such that it can be used as a decision support tool for the prediction of new data.

9.
We consider the problem of multi-criteria classification. In this problem, a set of “if … then …” decision rules is used as a preference model to classify objects evaluated by a set of criteria and regular attributes. Given a sample of classification examples, called the learning data set, the rules are induced from dominance-based rough approximations of preference-ordered decision classes, according to the Variable Consistency Dominance-based Rough Set Approach (VC-DRSA). The main question to be answered in this paper is how to classify an object using decision rules in situations where it is covered by (i) no rule, (ii) exactly one rule, (iii) several rules. The proposed classification scheme can be applied to both the learning data set (to restore the classification known from examples) and the testing data set (to predict the classification of new objects). A hypothetical example from the area of telecommunications is used to illustrate the proposed classification method and to compare it with some previous proposals.

10.
Local search methods are widely used to improve the performance of evolutionary computation algorithms in all kinds of domains. Employing advanced and efficient exploration mechanisms becomes crucial in complex problems with very large search spaces, such as when applying evolutionary algorithms to large-scale data mining tasks. Recently, the GAssist Pittsburgh evolutionary learning system was extended with memetic operators for discrete representations that use information from the supervised learning process to heuristically edit classification rules and rule sets. In this paper we first adapt some of these operators to BioHEL, a different evolutionary learning system applying the iterative learning approach, and afterwards propose versions of these operators designed for continuous attributes and for dealing with noise. The performance of all these operators and their combination is extensively evaluated on a broad range of synthetic large-scale datasets to identify the settings that present the best balance between efficiency and accuracy. Finally, the identified best configurations are compared with other classes of machine learning methods on both synthetic and real-world large-scale datasets and show very competent performance.

11.
Classification is concerned with the development of rules for the allocation of observations to groups, and is a fundamental problem in machine learning. Much of the previous work on classification models investigates two-group discrimination. Multi-category classification is less often considered due to the tendency of generalizations of two-group models to produce misclassification rates that are higher than desirable. Indeed, producing “good” two-group classification rules is a challenging task for some applications, and producing good multi-category rules is generally more difficult. Additionally, even when the “optimal” classification rule is known, inter-group misclassification rates may be higher than tolerable for a given classification model. We investigate properties of a mixed-integer programming based multi-category classification model that allows for the pre-specification of limits on inter-group misclassification rates. The mechanism by which the limits are satisfied is the use of a reserved judgment region, an artificial category into which observations are placed whose attributes do not sufficiently indicate membership in any particular group. The method is shown to be a consistent estimator of a classification rule with misclassification limits, and performance on simulated and real-world data is demonstrated.

12.
We propose a new binary classification and variable selection technique especially designed for high-dimensional predictors. Among many predictors, typically only a small fraction have a significant impact on prediction. In such a situation, more interpretable models with better prediction accuracy can be obtained by variable selection along with classification. By adding an ℓ1-type penalty to the loss function, common classification methods such as logistic regression or support vector machines (SVM) can perform variable selection. Existing penalized SVM methods all attempt to solve jointly for all the parameters involved in the penalization problem. When the data dimension is very high, the joint optimization problem is very complex and involves a lot of memory allocation. In this article, we propose a new penalized forward search technique that can reduce high-dimensional optimization problems to one-dimensional optimization by iterating the selection steps. The new algorithm can be regarded as a forward selection version of the penalized SVM and its variants. The advantage of optimizing in one dimension is that the location of the optimum solution can be obtained with intelligent search by exploiting convexity and a piecewise linear or quadratic structure of the criterion function. In each step, the predictor that is most able to predict the outcome is chosen for the model. The search is then repeated iteratively until convergence occurs. Comparison of our new classification rule with ℓ1-SVM and other common methods shows very promising performance, in that the proposed method leads to much leaner models without compromising misclassification rates, particularly for high-dimensional predictors.

13.
In recent years, support vector machines (SVMs) have been successfully applied to a wide range of applications. However, since the classifier is described as a complex mathematical function, it is rather incomprehensible for humans. This opacity prevents SVMs from being used in many real-life applications where both accuracy and comprehensibility are required, such as medical diagnosis and credit risk evaluation. To overcome this limitation, rules can be extracted from the trained SVM that are interpretable by humans and keep as much of the accuracy of the SVM as possible. In this paper, we provide an overview of the recently proposed rule extraction techniques for SVMs and introduce two others taken from the artificial neural networks domain, namely Trepan and G-REX. The described techniques are compared using publicly available datasets, such as Ripley’s synthetic dataset and the multi-class iris dataset. We also look at medical diagnosis and credit scoring, where comprehensibility is a key requirement and even a regulatory recommendation. Our experiments show that the SVM rule extraction techniques lose only a small percentage in performance compared to SVMs and therefore rank at the top of comprehensible classification techniques.

14.
This paper compares heuristic criteria used for extracting a pre-specified number of fuzzy classification rules from numerical data. We examine the performance of each heuristic criterion through computational experiments on well-known test problems. Experimental results show that better results are obtained from composite criteria of confidence and support measures than their individual use. It is also shown that genetic algorithm-based rule selection can improve the classification ability of extracted fuzzy rules by searching for good rule combinations. This observation suggests the importance of taking into account the combinatorial effect of fuzzy rules (i.e., the interaction among them).

15.
This paper considers the outpatient no-show problem faced by a rural free clinic located in the south-eastern United States. Using data mining and simulation techniques, we develop sequencing schemes for patients, in order to optimize a combination of performance measures used at the clinic. We utilize association rule mining (ARM) to build a model for predicting patient no-shows; and then use a set covering optimization method to derive three manageable sets of rules for patient sequencing. Simulation is used to determine the optimal number of patients and to evaluate the models. The ARM technique presented here results in significant improvements over models that do not employ rules, supporting the conjecture that, when dealing with noisy data such as in an outpatient clinic, extracting partial patterns, as is done by ARM, can be of significant value for simulation modelling.

16.
The prediction of surface roughness is a challenging problem. In order to improve the prediction accuracy in the end milling process, an improved approach is proposed to model surface roughness with an adaptive network-based fuzzy inference system (ANFIS) and the leave-one-out cross-validation (LOO-CV) approach. This approach focuses on both architecture and parameter optimization. LOO-CV, which is an effective measure for evaluating the generalization capability of a model, is employed to find the most suitable membership function and the optimal rule base of the ANFIS model for surface roughness prediction. To find the optimal rule base of ANFIS, a new “top down” rules reduction method is suggested. Three machining parameters (the spindle speed, feed rate and depth of cut) are used as inputs to the model. Based on the same experimental data, the predictive results of ANFIS with LOO-CV are compared with results reported recently in the literature and with ANFIS combined with clustering methods. The comparisons indicate that the presented approach outperforms the competing methods, and that the prediction accuracy can be improved to 96.38%. ANFIS with LOO-CV is an effective approach for predicting surface roughness in the end milling process.

17.
Cancer classification using genomic data is one of the major research areas in the medical field, and a number of binary classification methods have therefore been proposed in recent years. The Top Scoring Pair (TSP) method is one of the most promising techniques; it classifies genomic data in a lower-dimensional subspace using a simple decision rule. In the present paper, we propose a supervised classification technique that utilizes incremental generalized eigenvalue and top scoring pair classifiers to obtain higher classification accuracy with a small training set. We validate our method by applying it to well-known microarray data sets.

18.
The paper presents a discussion on evaluation methods in decision analysis. The presentation begins with the discussion of the expected value rule for selection amongst a number of available courses of action. Then a number of other evaluation rules to either replace or supplement the expected value are presented. They are discussed from a choice rather than preference view. To improve the expected value rule (or any other similar rule), it is suggested that it should be supplemented with other, qualitative rules rather than engaging in further modifications in pursuit of the perfect rule. A characteristic of qualitative rules is that they do not rely on multiplying probabilities and values but treat them as separate numeric entities. Once a rule has been agreed upon, it can be applied to all the alternatives, provided there is a computational procedure for evaluating the alternatives under that rule. Delta dominance is introduced as a unifying concept for many of the dominance rules in current use. Dominance and threshold methods are discussed and the kinship between them is pointed out.
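
The expected value rule that this abstract starts from can be illustrated with a small sketch. It covers only the textbook rule (multiply each probability by its value, sum, and pick the largest), not the supplementary qualitative rules or delta dominance discussed in the paper; the alternative names and numbers are invented for illustration.

```python
from typing import Dict, List, Tuple

# One possible outcome of an alternative: (probability, value).
Outcome = Tuple[float, float]


def expected_value(outcomes: List[Outcome]) -> float:
    """Sum of probability * value over the outcomes of one alternative."""
    return sum(p * v for p, v in outcomes)


def best_by_expected_value(alternatives: Dict[str, List[Outcome]]) -> str:
    """Return the name of the alternative with the highest expected value."""
    return max(alternatives, key=lambda name: expected_value(alternatives[name]))


# Invented courses of action, for illustration only.
alternatives = {
    "A": [(0.5, 100.0), (0.5, 0.0)],  # risky: expected value 50
    "B": [(0.9, 40.0), (0.1, 60.0)],  # safer: expected value 42
}
choice = best_by_expected_value(alternatives)
```

The rule picks "A" even though half of its probability mass yields nothing, which is the kind of case where a supplementary rule that looks at probabilities and values separately could lead to a different choice.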

19.
This article presents a hybrid model for multiple criteria decision making problems. The proposed decision model consists of three parts: (i) DEA (data envelopment analysis) is used to provide the best combination of the performance parameters of the original data; (ii) the weight of each attribute is calculated by applying AFS (axiomatic fuzzy set) theory and the AHP (analytic hierarchy process) method; and (iii) TOPSIS (technique for order preference by similarity to ideal solution) is applied to rank that best combination based on the weights of the attributes. In addition, we provide definite semantic interpretations of the decision results by AFS theory. Specifically, the model not only employs the performance parameters from raw data, but also considers the preferences of decision-makers, which can make the decision results more reasonable. The proposed model is verified on a robot selection problem. Using the selection index, the evaluation of alternative robots and the selection of the most appropriate one become feasible. Moreover, a numerical example for supplier selection is included to illustrate the application of the model to newly developed problems.

20.
Data collected from a survey typically consist of attributes that are mostly if not completely binary-valued or binary-encoded. We present a method for handling such data where the underlying data analysis can be cast as a classification problem. We propose a hybrid method that combines neural network and decision tree methods. The network is trained to remove irrelevant data attributes and the decision tree is applied to extract comprehensible classification rules from the trained network. The conditions of the rules are in the form of a conjunction of M-of-N constructs. An M-of-N construct is a rule condition that is satisfied if (at least, exactly, at most) M of the N binary attributes in the construct are present. The effectiveness of the method is illustrated on data collected for a study of global car market segmentation. The results show that besides achieving high predictive accuracy, the method also allows meaningful interpretation of the relationships among the data variables.
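
The M-of-N construct defined in this abstract is simple enough to sketch directly. The following is a minimal illustration of the definition only, not of the paper's neural-network and decision-tree pipeline; the function name `m_of_n` and the `mode` strings are hypothetical choices made for this example.

```python
from typing import Sequence


def m_of_n(bits: Sequence[int], m: int, mode: str = "at_least") -> bool:
    """Evaluate an M-of-N condition over N binary attributes.

    bits: the N binary attribute values (1 or True means "present").
    mode: "at_least", "exactly", or "at_most" (hypothetical labels).
    """
    present = sum(1 for b in bits if b)
    if mode == "at_least":
        return present >= m
    if mode == "exactly":
        return present == m
    if mode == "at_most":
        return present <= m
    raise ValueError(f"unknown mode: {mode}")


def rule_fires(constructs: Sequence[bool]) -> bool:
    """A rule condition is a conjunction of M-of-N constructs."""
    return all(constructs)


# Invented survey answers: 3 of the 5 attributes are present.
answers = [1, 0, 1, 1, 0]
fires = rule_fires([
    m_of_n(answers, 2, "at_least"),    # at least 2 of 5 present
    m_of_n(answers[:2], 1, "at_most"), # at most 1 of the first 2 present
])
```

A single construct such as "at least 2 of 5" stands in for many ordinary conjunctions, which is why rules built from these constructs can stay short on binary survey data.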
