Similar Articles
20 similar articles found (search time: 46 ms)
1.
Classification and rule induction are two important tasks for extracting knowledge from data. In rule induction, knowledge is represented as IF-THEN rules, which are easily understood and applied by problem-domain experts. In this paper, a new chromosome representation and solution technique based on Multi-Expression Programming (MEP), named MEPAR-miner (Multi-Expression Programming for Association Rule Mining), is proposed for rule induction. MEP is a relatively new evolutionary programming technique, first introduced in 2002 by Oltean and Dumitrescu, that uses a linear chromosome structure; multiple logical expressions of different sizes represent different logical rules, and MEP expressions can be encoded and evaluated in a flexible and efficient manner. MEP has generally been applied to prediction problems; this paper presents a new algorithm that enables MEP to discover classification rules. The performance of the developed algorithm is tested on nine publicly available binary and n-ary classification data sets. Extensive experiments demonstrate that MEPAR-miner can discover effective classification rules that are as good as, or better than, those obtained by traditional rule induction methods. It is also shown that an effective gene encoding structure directly improves the predictive accuracy of the induced IF-THEN rules.
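The core MEP idea described above, a linear chromosome in which each gene is either a terminal or an operator that may only reference earlier genes, can be sketched as follows. This is a minimal illustration with hypothetical gene encodings and attribute names, not the paper's actual representation; each gene's value can serve as one candidate rule antecedent.

```python
# Hypothetical MEP-style linear chromosome: each gene is ("term", attr, thr)
# or (op, i, j) where i, j index earlier genes only, so evaluation is one pass.
def evaluate_chromosome(chromosome, sample):
    """Evaluate every gene; each gene yields one candidate IF-part."""
    values = []
    for gene in chromosome:
        if gene[0] == "term":               # attribute test, e.g. a > 3.0
            _, attr, thr = gene
            values.append(sample[attr] > thr)
        else:                               # logical combination of earlier genes
            op, i, j = gene
            values.append(values[i] and values[j] if op == "and"
                          else values[i] or values[j])
    return values

sample = {"a": 5.0, "b": 1.0}
chrom = [("term", "a", 3.0),   # gene 0: a > 3
         ("term", "b", 2.0),   # gene 1: b > 2
         ("or", 0, 1),         # gene 2: gene 0 OR gene 1
         ("and", 0, 2)]        # gene 3: gene 0 AND gene 2
antecedents = evaluate_chromosome(chrom, sample)
```

Because every gene is evaluated, a single chromosome encodes several expressions of different sizes at once, which is the property the abstract highlights.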

2.
3.
4.
We consider a proteomic mass spectrometry case-control study for the calibration of a diagnostic rule for the detection of early-stage breast cancer. For each patient, a pair of distinct mass spectra is recorded, each derived from a different prior fractionation procedure applied to the available patient serum. We propose a procedure for combining the distinct spectral expressions from patients when calibrating a diagnostic discriminant rule. This is achieved by first calibrating two distinct prediction rules separately, each on only one of the two available spectral data sources. A double cross-validatory approach then summarizes the available spectral data, through the two classifiers, as posterior class probabilities, on which a combined predictor can be calibrated.
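The final step, calibrating a combined predictor on the two classifiers' posterior class probabilities, can be sketched minimally as a weighted average of the two posteriors followed by thresholding. This is an illustrative stand-in, not the authors' double cross-validatory procedure; the weight and threshold values are assumptions.

```python
def combine_posteriors(p1, p2, w=0.5):
    """Weighted average of two classifiers' posterior case probabilities."""
    return [w * a + (1 - w) * b for a, b in zip(p1, p2)]

def classify(p_combined, threshold=0.5):
    """Hard decision from the combined posterior: 1 = case, 0 = control."""
    return [1 if p >= threshold else 0 for p in p_combined]

# toy posteriors from the two fraction-specific classifiers
p_frac1 = [0.9, 0.2, 0.6]
p_frac2 = [0.7, 0.4, 0.3]
labels = classify(combine_posteriors(p_frac1, p_frac2))
```

In the paper the combiner itself is calibrated on cross-validated posteriors; here the equal weight simply illustrates why summarizing each spectrum source to a probability makes the two data sources commensurable.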

5.
Breast cancer is the most commonly diagnosed form of cancer in women. While mammography is the standard modality for detecting breast cancer, it has been shown that medical thermography, which uses cameras with sensitivities in the thermal infrared, is also well suited to the task, especially in dense tissue and for small tumors. In this paper, we present an approach to analysing breast thermograms that first extracts a series of image features describing bilateral (a)symmetry between the left and right breast regions, and then uses these features in a subsequent classification step. For the classification, we employ an ant colony optimisation based pattern recognition algorithm that is shown to provide a concise rule base with good classification performance.
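Bilateral (a)symmetry features of the kind described above can be sketched as simple statistical differences between the temperature values of the left and right regions. The specific statistics chosen here (mean, standard deviation, median) are illustrative assumptions, not necessarily the paper's feature set.

```python
import statistics

def asymmetry_features(left, right):
    """Simple bilateral (a)symmetry descriptors between two temperature
    regions: absolute differences of mean, standard deviation, and median."""
    return {
        "mean_diff": abs(statistics.mean(left) - statistics.mean(right)),
        "std_diff": abs(statistics.stdev(left) - statistics.stdev(right)),
        "median_diff": abs(statistics.median(left) - statistics.median(right)),
    }

# toy temperature samples (degrees C) from the two breast regions
left = [36.1, 36.4, 36.2, 36.3]
right = [36.8, 37.1, 36.9, 37.2]
feats = asymmetry_features(left, right)
```

A healthy thermogram is typically near-symmetric, so larger differences feed the subsequent classification step as evidence of abnormality.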

6.
Rough set theory is a data mining approach for managing vagueness; it can discover important facts hidden in data. The literature indicates that current rough set based approaches cannot guarantee that the classification of a decision table is credible, and that they are unable to generate robust decision rules when new attributes are incrementally added. In this study, an incremental attribute-oriented rule-extraction algorithm is proposed to address this deficiency, which is commonly observed in the literature on decision rule induction. The proposed approach handles incremental attributes based on the alternative rule extraction algorithm (AREA), which was presented for discovering preference-based rules according to the reducts with the maximum strength index (SI), specifically for the case where the desired reducts are not necessarily unique, since several reducts may share the same SI value. Under AREA, an alternative rule is a rule that holds a preference identical to the original decision rule and may be more attractive to a decision-maker than the original one. The proposed approach operates effectively as new attributes are added to the database or information system: it is not required to re-compute the updated data set from scratch as at the initial stage. The proposed algorithm also excludes repetitive rules during the solution search stage, whereas most rule induction approaches generate them. The approach efficiently and effectively generates complete, robust, and non-repetitive decision rules, and the rules derived from the data set indicate how to study this problem effectively in further investigations.

7.
In this paper we study lattice rules, which are cubature formulae for approximating integrals over the unit cube [0,1]^s of integrands from a weighted reproducing kernel Hilbert space. We assume that the weights are independent random variables with a given mean and variance, for two reasons stemming from practical applications: (i) it is usually not known in practice how to choose the weights, so by treating them as random variables we obtain constructions of lattice rules that are robust with respect to the weights, which to some extent removes the necessity of choosing the weights carefully; and (ii) in practice it is convenient to use the same lattice rule for many different integrands, and since the best choice of weights may vary from integrand to integrand, treating the weights as random variables does justice to how lattice rules are used in applications. The worst-case error is therefore a random variable depending on the random weights. We show how one can construct lattice rules that perform well for weights taken from a set of large measure; such lattice rules are robust with respect to certain changes in the weights. The construction algorithm uses the component-by-component (cbc) idea based on two criteria: one uses the mean of the worst-case error, and the second uses a bound on the variance of the worst-case error. We call the new algorithm the cbc2c (component-by-component with 2 constraints) algorithm. We also study a generalized version using r constraints, the cbcrc (component-by-component with r constraints) algorithm, and show that lattice rules generated by the cbcrc algorithm simultaneously work well for all weights in a subspace spanned by the chosen weights γ^(1), ..., γ^(r). Thus, in applications, instead of finding one set of weights, it is enough to find a convex polytope in which the optimal weights lie.
The price for this method is a factor r in the upper bound on the error and in the construction cost of the lattice rule; the burden of determining one set of weights very precisely can thus be shifted to the construction of good lattice rules. Numerical results indicate the benefit of using the cbc2c algorithm for certain choices of weights.
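For background, a rank-1 lattice rule approximates an integral over the unit cube by averaging the integrand at the points x_k = {k z / n} for a generating vector z. The sketch below shows only this evaluation step, not the cbc2c construction of z; the Fibonacci-type generating vector is a standard two-dimensional example chosen for illustration.

```python
def lattice_points(z, n):
    """Rank-1 lattice point set: x_k = (k * z mod n) / n, k = 0..n-1."""
    s = len(z)
    return [[(k * z[j] % n) / n for j in range(s)] for k in range(n)]

def lattice_rule(f, z, n):
    """Equal-weight cubature: average of f over the lattice points."""
    return sum(f(x) for x in lattice_points(z, n)) / n

# integrate f(x, y) = x * y over [0,1]^2 (exact value 0.25) with the
# classical Fibonacci generating vector z = (1, 55), n = 89
approx = lattice_rule(lambda x: x[0] * x[1], z=[1, 55], n=89)
```

The quality of such a rule hinges entirely on the generating vector, which is what the cbc2c and cbcrc algorithms in the paper construct under random weights.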

8.
In this paper, we propose a genetic programming (GP) based approach to evolve fuzzy rule based classifiers. For a c-class problem, a classifier consists of c trees. Each tree, T_i, of the multi-tree classifier represents a set of rules for class i. During the evolutionary process, the inaccurate or inactive rules in the initial rule set are removed by a cleaning scheme. This allows good rules to survive and eventually determines the number of rules. At the outset, our GP scheme uses a randomly selected subset of features and then evolves the features to be used in each rule. The initial rules are constructed using prototypes, which are generated both randomly and by the fuzzy k-means (FKM) algorithm. Experiments are conducted in three different ways: using only randomly generated rules, using a mixture of randomly generated and FKM prototype based rules, and using exclusively FKM prototype based rules. The performance of the classifiers is comparable irrespective of the type of initial rules, which underlines the merit of the proposed evolutionary scheme. In this context, we propose a new mutation operation to alter the rule parameters. The GP scheme optimizes both the structure of the rules and the parameters involved. The method is validated on six benchmark data sets, and the performance of the proposed scheme is found to be satisfactory.
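The evaluation side of such a multi-rule-set classifier can be sketched with Gaussian membership functions and a product t-norm: each class owns a set of rules, and a sample is assigned to the class whose rules fire most strongly. The membership-function shapes, the t-norm choice, and the toy rules below are assumptions for illustration, not the paper's evolved classifier.

```python
import math

def gaussian_mf(x, center, sigma):
    """Gaussian membership value of x for a fuzzy set (center, sigma)."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))

def rule_firing(sample, rule):
    """Product t-norm over the rule's antecedent membership functions."""
    strength = 1.0
    for attr, (c, s) in rule.items():
        strength *= gaussian_mf(sample[attr], c, s)
    return strength

def classify(sample, rules_per_class):
    """Assign the class whose rule set attains the maximum firing strength."""
    return max(rules_per_class,
               key=lambda cls: max(rule_firing(sample, r)
                                   for r in rules_per_class[cls]))

# one toy rule per class: {attribute: (center, sigma)}
rules = {"A": [{"x": (0.0, 1.0)}], "B": [{"x": (5.0, 1.0)}]}
label = classify({"x": 4.2}, rules)
```

In the paper, GP evolves both which features appear in each rule and the rule parameters; the cleaning scheme would prune rules whose firing strength is never decisive.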

9.
Self-rostering is receiving increasing attention in the literature and in practice. With self-rostering, employees propose the schedule they prefer to work during a given planning horizon. However, these proposed schedules often do not match the staffing demand specified by the organization. We present an approach that supports the creation of feasible schedules, taking the schedules proposed by the employees as input and aiming to divide the burden of shift reassignments fairly among the employees. We discuss computational results and indicate how various model parameters influence scheduling performance indicators. The presented approach is flexible and easily extendable: labor rule checks are isolated from the actual algorithm, which makes it easy to include additional labor rules. Moreover, our approach enables the user to trade off the quality of the resulting roster against the extent to which the planner can track the decisions of the algorithm.
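The design point above, isolating labor rule checks from the reassignment algorithm, can be sketched as a list of independent predicates over a schedule; adding a labor rule is then just appending a function. The rules and shift names below are hypothetical examples, not the paper's actual rule set.

```python
# Each labor rule is a standalone predicate on one employee's weekly schedule,
# so the rostering algorithm only ever calls is_feasible().
def max_shifts_rule(schedule, limit=5):
    """At most `limit` working shifts in the horizon."""
    return len([s for s in schedule if s != "off"]) <= limit

def no_consecutive_nights_rule(schedule):
    """No two night shifts on consecutive days."""
    return all(not (a == "night" and b == "night")
               for a, b in zip(schedule, schedule[1:]))

LABOR_RULES = [max_shifts_rule, no_consecutive_nights_rule]

def is_feasible(schedule):
    """A schedule is feasible iff every registered labor rule accepts it."""
    return all(rule(schedule) for rule in LABOR_RULES)

ok = is_feasible(["day", "off", "night", "day", "off", "day", "off"])
bad = is_feasible(["night", "night", "day", "day", "day", "day", "day"])
```

Keeping the predicates separate from the search logic is exactly what makes the approach "easily extendable" in the sense the abstract describes.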

10.
When combining classifiers in the Dempster-Shafer framework, Dempster’s rule is generally used. However, this rule assumes the classifiers to be independent. This paper investigates the use of other operators for combining non-independent classifiers, including the cautious rule and, more generally, t-norm based rules whose behavior ranges between Dempster’s rule and the cautious rule. Two strategies are investigated for learning an optimal combination scheme based on a parameterized family of t-norms. The first learns a single rule by minimizing an error criterion. The second is a two-step procedure in which groups of classifiers with similar outputs are first identified using a clustering algorithm; within- and between-cluster rules are then determined by minimizing an error criterion. Experiments with various synthetic and real data sets demonstrate the effectiveness of both the single-rule and two-step strategies. Overall, optimizing a single t-norm based rule yields better results than using a fixed rule, including Dempster’s rule, and the two-step strategy brings further improvements.
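For reference, Dempster's rule itself combines two mass functions by multiplying masses of focal elements, accumulating the products on set intersections, and renormalizing by the total non-conflicting mass. A minimal sketch over frozenset focal elements (the frame and mass values are toy choices):

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: intersect focal elements, multiply masses,
    and renormalize by 1 - K, where K is the conflicting mass."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb                      # empty intersection
    return {fs: v / (1.0 - conflict) for fs, v in combined.items()}

A, B = frozenset({"a"}), frozenset({"b"})
AB = A | B                                           # total ignorance
m1 = {A: 0.6, AB: 0.4}
m2 = {A: 0.5, B: 0.3, AB: 0.2}
m = dempster_combine(m1, m2)
```

Because the products ma * mb model stochastically independent sources, applying this rule to correlated classifiers double-counts shared evidence, which is what motivates the cautious and t-norm based alternatives studied in the paper.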

11.
Branch and Bound (B&B) algorithms are known to exhibit an irregular search tree, so developing a parallel version of this kind of algorithm is a challenge. The efficiency of a B&B algorithm depends on the chosen branching, bounding, selection, rejection, and termination rules. The question we investigate is how the chosen platform, consisting of the programming language and the libraries or skeletons used, influences programming effort and algorithm performance. In frameworks with a high level of abstraction, the selection rule and data management structures are usually hidden from the programmer, as is the load balancing strategy when the algorithm is run in parallel. We investigate the question by implementing a multidimensional global optimization B&B algorithm with three frameworks of decreasing abstraction: Bobpp, Threading Building Blocks (TBB), and a customized Pthreads implementation. We find that the Bobpp implementation is easy to code but exhibits the poorest scalability. By contrast, the TBB and Pthreads implementations scale almost linearly on the platform used, with the TBB approach showing slightly better productivity.
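The rule structure of a B&B algorithm mentioned above (branching, bounding, selection, rejection, termination) can be made concrete with a tiny sequential interval B&B for a one-dimensional minimization. This is a didactic sketch, not the paper's multidimensional algorithm: the objective, the Lipschitz-style slack constant, and the tolerance are assumptions.

```python
import heapq

def branch_and_bound(f, lo, hi, tol=1e-4):
    """Minimize f on [lo, hi]. Selection rule: best-first on the lower bound
    (heap order). Rejection rule: prune boxes whose bound >= incumbent.
    Branching rule: bisect. Termination rule: stop splitting below tol."""
    best_x, best_val = lo, f(lo)
    heap = [(0.0, lo, hi)]                    # (lower bound, interval)
    while heap:
        bound, a, b = heapq.heappop(heap)     # selection
        if bound >= best_val:                 # rejection
            continue
        mid = (a + b) / 2
        if f(mid) < best_val:                 # update incumbent
            best_x, best_val = mid, f(mid)
        if b - a > tol:                       # branching / termination
            for lo2, hi2 in ((a, mid), (mid, b)):
                m = (lo2 + hi2) / 2
                # crude but valid lower bound: f at midpoint minus a slack
                # proportional to the width (assumes |f'| <= 2 on the box)
                lb = f(m) - 2 * (hi2 - lo2)
                if lb < best_val:
                    heapq.heappush(heap, (lb, lo2, hi2))
    return best_x, best_val

x, v = branch_and_bound(lambda t: (t - 0.3) ** 2, 0.0, 1.0)
```

The search tree is irregular precisely because pruning depends on the incumbent found so far, which is why parallelizing the heap (the selection rule and its data structure) is the part frameworks like Bobpp and TBB hide or expose.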

12.
We recently proposed several new pivot rules for achieving dual feasibility in linear programming that are distinct from existing ones: the objective function value need no longer change monotonically during the solution process. In this paper, we report further computational testing with one of them, the most-obtuse-angle rule. A two-phase dual simplex algorithm, in which the rule is used as the row selection rule for Phase 1, has been implemented in FORTRAN 77 modules and tested on a set of standard linear programming problems from NETLIB. We show that, if full pricing is applied, our code clearly outperforms MINOS 5.3, one of the best current implementations of the simplex algorithm.

13.
The relative performance of five simple rules for selecting facility locations is compared with that of two heuristic location-allocation algorithms in the context of a real region. It is hypothesized that the performance of a rule is likely to be systematically related to the spatial structure of the region in which it is used. In the Iowa settlement system, known to have predictable structural regularity, some rules performed at over 95 percent of the level achieved by the algorithms (the greedy add and the modified vertex exchange algorithms) on a range of commonly used criteria for evaluating locational efficiency. Under certain circumstances, these rules may be recommended as alternatives to more formal algorithms.
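One of the benchmark algorithms named above, greedy add, repeatedly opens the candidate facility that most reduces total demand-weighted travel cost. A minimal one-dimensional p-median sketch (the coordinates and unweighted demand are toy assumptions):

```python
def greedy_add(candidates, demand_points, p):
    """Greedy-add heuristic for the p-median problem: at each step open the
    candidate that minimizes total distance to the nearest open facility."""
    def total_cost(open_sites):
        return sum(min(abs(d - s) for s in open_sites) for d in demand_points)

    open_sites = []
    for _ in range(p):
        best = min((c for c in candidates if c not in open_sites),
                   key=lambda c: total_cost(open_sites + [c]))
        open_sites.append(best)
    return open_sites

# candidate sites and demand points on a line; open p = 2 facilities
sites = greedy_add(candidates=[0, 2, 5, 9], demand_points=[1, 2, 3, 8, 9], p=2)
```

Simple location rules of the kind the paper evaluates can be compared against this heuristic's objective value to measure their relative locational efficiency.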

14.
In this paper we study construction algorithms for polynomial lattice rules modulo arbitrary polynomials. Polynomial lattice rules are a special class of digital nets which yield well-distributed point sets in the unit cube for numerical integration.
Niederreiter obtained an existence result for polynomial lattice rules modulo arbitrary polynomials for which the underlying point set has a small star discrepancy, and recently Dick, Leobacher and Pillichshammer introduced construction algorithms for polynomial lattice rules modulo an irreducible polynomial for which the underlying point set has a small (weighted) star discrepancy.
In this work we provide construction algorithms for polynomial lattice rules modulo arbitrary polynomials, thereby generalizing the previously obtained results. More precisely, we use a component-by-component algorithm and a Korobov-type algorithm. We show how the search space of the Korobov-type algorithm can be reduced without sacrificing the convergence rate, so this algorithm is particularly fast. Our findings are based on a detailed analysis of quantities closely related to the (weighted) star discrepancy.

15.
With new treatments and novel technologies available, precision medicine has become a key topic in the new era of healthcare. Traditional statistical methods for precision medicine focus on subgroup discovery through identifying interactions between a few markers and treatment regimes. However, given the large scale and high dimensionality of modern datasets, it is difficult to detect interactions between treatment and high-dimensional covariates. Recently, novel approaches have emerged that seek to directly estimate individualized treatment rules (ITR) by maximizing the expected clinical reward using, for example, support vector machines (SVM) or decision trees. The latter enjoy great popularity in clinical practice due to their interpretability. In this article, we propose a new reward function and a novel decision tree algorithm to directly maximize rewards. We further improve the single-tree decision rule with an ensemble decision tree algorithm, ITR random forests. Our final decision rule is an average over single decision trees, and it yields a soft probability rather than a hard choice; depending on how strong the treatment recommendation is, physicians can make decisions based on our model along with their own judgment and experience. The performance of the ITR forest and tree methods is assessed through simulations along with applications to a randomized controlled trial (RCT) of 1385 patients with diabetes and an EMR cohort of 5177 patients with diabetes. The ITR forest and tree methods are implemented in the statistical software R (https://github.com/kdoub5ha/ITR.Forest). Supplementary materials for this article are available online.
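The "average over single decision trees yielding a soft probability" idea can be sketched as follows: each tree maps a patient's covariates to a 0/1 treatment recommendation, and the forest reports the fraction of trees recommending treatment. The single-split toy trees and feature names are hypothetical, not the paper's fitted model.

```python
def tree_recommend(tree, x):
    """One decision rule: a tree is a nested (feature, threshold, left, right)
    tuple whose leaves are 0/1 treatment recommendations."""
    while isinstance(tree, tuple):
        feat, thr, left, right = tree
        tree = left if x[feat] <= thr else right
    return tree

def forest_recommend(trees, x):
    """Soft recommendation: fraction of trees voting for treatment 1."""
    return sum(tree_recommend(t, x) for t in trees) / len(trees)

# three toy single-split trees (purely illustrative)
trees = [("age", 50, 0, 1), ("age", 60, 0, 1), ("bmi", 30, 1, 0)]
prob = forest_recommend(trees, {"age": 55, "bmi": 28})
```

A physician can then act on the strength of the recommendation (for example, a probability near 0.5 signals genuine ambiguity), which is the interpretability advantage the abstract emphasizes.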

16.
We investigate the Lane–Riesenfeld subdivision algorithm for uniform B-splines, when the arithmetic mean in the various steps of the algorithm is replaced by nonlinear, symmetric, binary averaging rules. The averaging rules may be different in different steps of the algorithm. We review the notion of a symmetric binary averaging rule, and we derive some of its relevant properties. We then provide sufficient conditions on the nonlinear binary averaging rules used in the Lane–Riesenfeld algorithm that ensure the convergence of the algorithm to a continuous function. We also show that, when the averaging rules are C^2 with uniformly bounded second derivatives, the limit is a C^1 function. A canonical family of nonlinear, symmetric averaging rules, the p-averages, is presented, and the Lane–Riesenfeld algorithm with these averages is investigated.
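One refinement round of the Lane–Riesenfeld algorithm doubles the control points and then applies a binary average a fixed number of times; replacing the arithmetic mean by a nonlinear symmetric average is then a one-line change. The power-mean form of the p-average below is an assumption for illustration (it is symmetric and reduces to the arithmetic mean at p = 1), and it requires positive data.

```python
def p_average(a, b, p):
    """Symmetric binary power mean of two positive numbers; p = 1 gives the
    arithmetic mean, smaller p interpolates toward the geometric mean."""
    return ((a ** p + b ** p) / 2) ** (1 / p)

def lane_riesenfeld_step(points, degree, p=1.0):
    """One Lane-Riesenfeld round: duplicate every control point (the doubling
    step), then apply the binary average `degree` times."""
    refined = [x for x in points for _ in range(2)]
    for _ in range(degree):
        refined = [p_average(a, b, p) for a, b in zip(refined, refined[1:])]
    return refined

coarse = [1.0, 4.0, 9.0]
fine = lane_riesenfeld_step(coarse, degree=2, p=1.0)   # quadratic B-spline case
```

With p = 1 this reproduces classical B-spline subdivision; the paper's question is which conditions on such nonlinear averages preserve convergence and smoothness of the limit curve.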

17.
In this paper, a fuzzy wavelet network is proposed to approximate arbitrary nonlinear functions, based on the multiresolution analysis (MRA) theory of the wavelet transform and on fuzzy concepts. The presented network combines TSK fuzzy models with the wavelet transform and the ROLS learning algorithm while still preserving linearity in the parameters. To reduce the number of fuzzy rules, fuzzy clustering is invoked: wavelets that are close to each other in the Euclidean norm are placed in one group and used in the consequent part of a fuzzy rule. The antecedent parts of the rules are Gaussian membership functions; their deviation parameter is determined with the help of the golden section method, and the mean of each function is derived by averaging the centers of all wavelets related to that particular rule. The overall developed fuzzy wavelet network is called fuzzy wave-net, and simulation results show superior performance over previous networks.
The present work is complemented by a second part, focusing on control aspects, to be published in this journal [17]; that part proposes an observer-based self-structuring robust adaptive fuzzy wave-net (FWN) controller for a class of nonlinear uncertain multi-input multi-output systems.

18.
In this paper we propose a new classification procedure based on a hybrid approach. The classification problem is solved by minimizing the distance between the members of each cluster and the cluster centers. The determination of the cluster centers is therefore a critical step in our approach, and it is addressed using the k-means algorithm. Once the centers of each class are determined, the center-neighbourhood rule is applied to assign an element to a class using mathematical programming. The implementation of our hybrid approach was validated on benchmark datasets and applied to an original biological dataset of 84 breast cancer tumours. Each tumour was measured for five parameters corresponding to the expression of five biomarkers (proteins). The obtained classification is discussed using biological knowledge and the classical clinical experts' classification of breast tumours.
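The two building blocks above, k-means placement of class centers followed by a nearest-center assignment rule, can be sketched in one dimension (the data and initial centers are toy values; the paper additionally refines assignments via mathematical programming, which this sketch omits):

```python
def nearest_center(x, centers):
    """Center-neighbourhood rule: assign x to the class of its nearest center."""
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))

def lloyd_1d(points, centers, iters=10):
    """A few Lloyd (k-means) iterations to place the class centers."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for x in points:
            clusters[nearest_center(x, centers)].append(x)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

data = [1.0, 1.2, 0.8, 8.0, 8.4, 7.6]       # two well-separated groups
centers = lloyd_1d(data, [0.0, 10.0])        # toy initial centers
label = nearest_center(7.9, centers)         # classify a new measurement
```

In the paper's application each tumour is a five-dimensional biomarker vector, so the absolute difference would be replaced by a Euclidean distance, but the center-then-assign logic is the same.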

19.
In this paper, we propose a novel method to mine association rules for classification problems, namely AFSRC (AFS association rules for classification), realized in the framework of axiomatic fuzzy set (AFS) theory. This model provides a simple and efficient rule generation mechanism. It can also retain meaningful rules for imbalanced classes by fuzzifying the concept of the class support of a rule. In addition, AFSRC can handle different data types occurring simultaneously and can produce membership functions automatically from the available data. An extensive suite of experiments is reported, offering a comprehensive comparison of the performance of the method with that of other methods available in the literature. The experimental results show that AFSRC outperforms most of the other methods in terms of accuracy and interpretability, forming a classifier with high accuracy and a more interpretable rule base of smaller size while retaining a sound balance between these two characteristics.

20.
Ordering algorithms aim to pre-order a matrix in order to achieve a favourable structure for factorization. Two prototype structures are described which are commonly aimed at by current algorithms. A different structure, based on a lower Hessenberg form, is introduced. We show that the common structures may be obtained from the Hessenberg form in a simple way. A fundamental and very simple algorithm is presented for deriving the lower Hessenberg form. This algorithm is inherently a part of other common algorithms, but is usually obscured by other detailed heuristics. Some of these heuristics are seen to correspond to different tie-break rules in the fundamental algorithm. We describe a particularly simple tie-break rule used in SPK1 [11], which is effective and not at all well known. Ordering algorithms need to be modified to enable pivoting for numerical stability to be carried out; we describe how the idea of threshold pivoting can be used in the context of these algorithms. Most ordering algorithms in common use employ some sort of look-ahead to promote a favourable structure. We argue that look-ahead is generally ineffective in practice, and that numerical evidence supports the use of very simple strategies. A new ordering algorithm is presented which aims to obtain a narrower bandwidth in the lower Hessenberg form. Some limited numerical experience is presented which enables us to draw tentative conclusions about various ordering strategies and how they compare with those in common use.

