Similar literature
20 similar documents found
1.
We propose a classification approach that exploits the relationship between ellipsoidal separation and the Support Vector Machine (SVM) with a quadratic kernel. By adding a semidefinite programming (SDP) constraint to the SVM model, we ensure that the chosen hyperplane in feature space represents a non-degenerate ellipsoid in input space. This allows us to exploit SDP techniques within Support Vector Regression (SVR) approaches, yielding better results when ellipsoid-shaped separators are appropriate for the classification task. We compare our approach with spherical separation and SVM on some classification problems.
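
The link between a quadratic-kernel separator and an ellipsoid can be made concrete with a short derivation. This is only a sketch under assumed notation: the symbols $A$, $b$, $c$, $x_c$ and $\rho$ are not taken from the abstract, and the abstract does not state the exact form of the SDP constraint.

```latex
% Assumed notation: a quadratic decision surface in input space
f(x) = x^{\top} A x + b^{\top} x + c = 0, \qquad A = A^{\top}.

% If A is positive definite (the role played by a constraint A \succ 0),
% completing the square gives
(x - x_c)^{\top} A \,(x - x_c) = \rho,
\qquad x_c = -\tfrac{1}{2} A^{-1} b,
\qquad \rho = \tfrac{1}{4}\, b^{\top} A^{-1} b - c .

% The surface is a non-degenerate ellipsoid centred at x_c when \rho > 0.
```

Without a definiteness condition on $A$, the same quadratic form could describe a hyperboloid or a paraboloid instead of a closed separator.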

2.
With the rapid growth of databases in many modern enterprises, data mining has become an increasingly important approach for data analysis. The operations research community has contributed significantly to this field, especially through the formulation and solution of numerous data mining problems as optimization problems, and several operations research applications can also be addressed using data mining methods. This paper provides a survey of the intersection of operations research and data mining. The primary goals of the paper are to illustrate the range of interactions between the two fields, present some detailed examples of important research work, and provide comprehensive references to other important work in the area. The paper thus looks at both the different optimization methods that can be used for data mining and the data mining process itself, showing how operations research methods can be used in almost every step of this process. Promising directions for future research are also identified throughout the paper. Finally, the paper looks at some applications related to the management of electronic services, namely customer relationship management and personalization.

3.
In this paper, we propose a novel method for mining association rules for classification problems, namely AFSRC (AFS association rules for classification), realized in the framework of axiomatic fuzzy set (AFS) theory. This model provides a simple and efficient rule generation mechanism. It can also retain meaningful rules for imbalanced classes by fuzzifying the concept of the class support of a rule. In addition, AFSRC can handle different data types occurring simultaneously. Furthermore, the new model can produce membership functions automatically by processing available data. An extensive suite of experiments is reported, offering a comprehensive comparison of the performance of the method with that of some other methods available in the literature. The experimental results show that AFSRC outperforms most other methods in terms of accuracy and interpretability. AFSRC forms a classifier with high accuracy and a smaller, more interpretable rule base while retaining a sound balance between these two characteristics.

4.
Mining association rules is a popular and well-researched method for discovering interesting relations between variables in large databases. A practical problem is that at medium to low support values, a large number of frequent itemsets and an even larger number of association rules are often found in a database. A widely used approach is to gradually increase minimum support and minimum confidence, or to filter the found rules using increasingly strict constraints on additional measures of interestingness, until the set of rules found is reduced to a manageable size. In this paper we describe a different approach, based on the idea of first defining a set of “interesting” itemsets (e.g., by a mixture of mining and expert knowledge) and then, in a second step, selectively generating rules for only these itemsets. The main advantage of this approach over increasing thresholds or filtering rules is that the number of rules found is significantly reduced, while at the same time it is not necessary to increase the support and confidence thresholds, which might lead to missing important information in the database.
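
The two-step idea described above (fix a set of itemsets first, then generate rules only for those) can be illustrated with a small sketch. It uses the standard definitions of support and confidence; the function names, the toy transactions and the `interesting` list are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def rules_for_itemset(transactions, itemset, min_confidence=0.0):
    """Generate rules A -> B with A and B partitioning one pre-selected itemset.

    Returns tuples (antecedent, consequent, support, confidence).
    """
    itemset = frozenset(itemset)
    full_support = support(transactions, itemset)
    rules = []
    for k in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), k):
            a = frozenset(antecedent)
            supp_a = support(transactions, a)
            if supp_a == 0:
                continue  # antecedent never occurs, confidence undefined
            confidence = full_support / supp_a
            if confidence >= min_confidence:
                rules.append((a, itemset - a, full_support, confidence))
    return rules


# Toy data (hypothetical).
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

# Step 1: the "interesting" itemsets, here simply given by hand.
interesting = [{"bread", "milk"}]

# Step 2: generate rules only for those itemsets.
all_rules = [r for s in interesting for r in rules_for_itemset(transactions, s)]
```

Because rules are generated only for the chosen itemsets, no global support or confidence threshold has to be raised to keep the rule set small.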

5.
Data mining aims to find patterns in organizational databases. However, most mining techniques do not consider knowledge of the quality of the database. In this work, we show how to incorporate into classification mining recent advances in the data quality field that view a database as the product of an imprecise manufacturing process in which the flaws/defects are captured in quality matrices. We develop a general-purpose method of incorporating data quality matrices into the data mining classification task. Our work differs from existing data preparation techniques: while other approaches detect and fix errors to ensure consistency with the entire data set, our work makes use of a priori knowledge of how the data is produced/manufactured.

6.
Recently developed SAGE technology enables us to simultaneously quantify the expression levels of thousands of genes in a population of cells. SAGE data is helpful in the classification of different types of cancer. However, one main challenge in this task is that only a small number of samples is available compared to the huge number of genes, many of which are irrelevant for classification. Another main challenge is the lack of appropriate statistical methods that consider the specific properties of SAGE data. We propose an efficient solution that selects relevant genes by information gain and builds a multinomial event model for SAGE data. Promising results, in terms of accuracy, were obtained for the proposed model.
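
The gene-selection step can be sketched as follows. This is an illustrative computation of information gain only; splitting each gene's counts at a fixed threshold is an assumption made here for simplicity, the toy counts are invented, and the multinomial event model from the abstract is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting samples into values <= threshold and > threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # the split does not separate anything
    n = len(labels)
    remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - remainder


# Hypothetical tag counts for two genes over four samples.
labels = ["tumor", "tumor", "normal", "normal"]
gene_a = [12, 15, 1, 0]   # separates the classes well
gene_b = [3, 0, 4, 0]     # carries no class information at this threshold

gain_a = information_gain(gene_a, labels, threshold=5)
gain_b = information_gain(gene_b, labels, threshold=1)
```

Genes would then be ranked by gain and only the top-ranked ones kept, which reduces the number of features relative to the small number of samples.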

7.
A general approach to designing multiple classifiers represents them as a combination of several binary classifiers in order to enable correction of classification errors and increase reliability. This method is explained, for example, in Witten and Frank (Data Mining: Practical Machine Learning Tools and Techniques, 2005, Sect. 7.5). The aim of this paper is to investigate representations of this sort based on Brandt semigroups. We give a formula for the maximum number of errors of binary classifiers, which can be corrected by a multiple classifier of this type. Examples show that our formula does not carry over to larger classes of semigroups.

8.
The q-mode problem is a combinatorial optimization problem that requires partitioning of objects into clusters. We discuss theoretical properties of an existing mixed integer programming (MIP) model for this problem and offer alternative models and enhancements. Through a comprehensive experiment we investigate computational properties of these MIP models. This experiment reveals that, in practice, the MIP approach is more effective for instances containing strong natural clusters and it is not as effective for instances containing weak natural clusters. The experiment also reveals that one of the MIP models that we propose is more effective than the other models for solving larger instances of the problem.

9.
Advanced Genetic Programming Based Machine Learning
A Genetic Programming based approach for solving classification problems is presented in this paper. Classification is understood as the act of placing an object into a set of categories, based on the object’s properties; classification algorithms are designed to learn a function which maps a vector of object features into one of several classes. This is done by analyzing a set of input-output examples (“training samples”) of the function. Here we present a method based on the theory of Genetic Algorithms and Genetic Programming that interprets classification problems as optimization problems: each presented instance of the classification problem is interpreted as an instance of an optimization problem, and a solution is found by a heuristic optimization algorithm. The major new aspects presented in this paper are advanced algorithmic concepts as well as suitable genetic operators for this problem class (mainly the creation of new hypotheses by merging already existing ones and their detailed evaluation). The experimental part of the paper documents the results produced using new hybrid variants of Genetic Algorithms as well as investigated parameter settings. Graphical analysis is done using a novel multiclass classifier analysis concept based on the theory of Receiver Operating Characteristic curves. The work described in this paper was done within the Translational Research Project L282 “GP-Based Techniques for the Design of Virtual Sensors” sponsored by the Austrian Science Fund (FWF).

10.
We present the design of more effective and efficient genetic algorithm based data mining techniques that use the concepts of feature selection. Explicit feature selection is traditionally done as a wrapper approach where every candidate feature subset is evaluated by executing the data mining algorithm on that subset. In this article we present a GA that performs mining and feature selection simultaneously by evolving a binary code alongside the chromosome structure used for evolving the rules. We then present a wrapper approach to feature selection based on the Hausdorff distance measure. Results from applying the above techniques to a real-world data mining problem show that combining both feature selection methods provides the best performance in terms of prediction accuracy and computational efficiency.

11.
This paper presents a goal programming model that allows for the flexible handling of the two group classification problem. The goal programming model is based around the concepts of non-standard preference functions and penalty function modelling. An extension to a generalised distance metric case is given. The inclusion of multiple levels of classification based upon different levels of certainty is incorporated into the model. The model is tested on a real-life data set pertaining to cinema-going attendance and conclusions are drawn both in the context of the methodology and of the application.

12.
This study shows how data envelopment analysis (DEA) can be used to reduce the vertical dimensionality of certain data mining databases. The study illustrates basic concepts using a real-world graduate admissions decision task. It is well known that cost sensitive mixed integer programming (MIP) problems are NP-complete. This study shows that heuristic solutions for cost sensitive classification problems can be obtained by solving a simple goal programming problem that reduces the vertical dimension of the original learning dataset. Using simulated datasets and a misclassification cost performance metric, the performance of the proposed goal programming heuristic is compared with the extended DEA-discriminant analysis MIP approach. The holdout sample results of our experiments show that the proposed heuristic approach outperforms the extended DEA-discriminant analysis MIP approach.

13.
Progress in bioinformatics and biotechnology has generated a huge number of sequences that require detailed analysis. Several data mining techniques can be used to discover patterns in large databases. This paper describes the development of a tool/methodology to extract hydrophobicity patterns/profiles that achieve a specific secondary structure in proteins. The results indicate that association rules can be an efficient method for investigating this kind of problem. This work contributes to two areas: prediction of protein structure and protein folding.

14.
Support Vector Machines (SVMs) are known to be a powerful nonparametric classification technique even for high-dimensional data. Although predictive ability is important, obtaining an easy-to-interpret classifier is also crucial in many applications. Linear SVM provides a classifier based on a linear score. In the case of functional data, the coefficient function that defines such a linear score usually has many irregular oscillations, making it difficult to interpret.

15.
The purpose of this paper is to discuss the various pivot rules of the simplex method and its variants that have been developed in the last two decades, starting from the appearance of the minimal index rule of Bland. We are mainly concerned with finiteness properties of simplex type pivot rules. Well known classical results concerning the simplex method are not considered in this survey, but the connection between the new pivot methods and the classical ones, if there is any, is discussed. In this paper we discuss three classes of recently developed pivot rules for linear programming. The first and largest class is the class of essentially combinatorial pivot rules including minimal index type rules and recursive rules. These rules only use labeling and signs of the variables. The second class contains those pivot rules which can actually be considered as variants or generalizations or specializations of Lemke's method, and so they are closely related to parametric programming. The last class has the common feature that the rules all have close connections to certain interior point methods. Finally, we mention some open problems for future research. On leave from the Eötvös University, Budapest, and partially supported by OTKA No. 2115.

16.
Support Vector Machines (SVMs) are now very popular as a powerful method for pattern classification problems. One of the main features of SVMs is that they produce a separating hyperplane which maximizes the margin in the feature space induced by a nonlinear mapping using a kernel function. As a result, SVMs can treat not only linear separation but also nonlinear separation. While the soft margin method of SVMs considers only the distance between the separating hyperplane and misclassified data, we propose in this paper a multi-objective programming formulation that considers surplus variables. A similar formulation was extensively researched in linear discriminant analysis, mostly in the 1980s, using Goal Programming (GP). This paper compares conventional methods such as SVMs and GP with our proposed formulation through several examples. Received: September 2003, Revised: December 2003.

17.
Cancer classification using genomic data is one of the major research areas in the medical field. Therefore, a number of binary classification methods have been proposed in recent years. The Top Scoring Pair (TSP) method is one of the most promising techniques that classify genomic data in a lower dimensional subspace using a simple decision rule. In the present paper, we propose a supervised classification technique that utilizes incremental generalized eigenvalue and top scoring pair classifiers to obtain higher classification accuracy with a small training set. We validate our method by applying it to well-known microarray data sets.
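
The "simple decision rule" of the basic TSP classifier can be sketched as below: for every pair of genes, compare how often one is expressed below the other in each class, keep the pair with the largest difference, and classify a new sample by that single comparison. This covers only the plain TSP step, not the incremental generalized eigenvalue component of the proposed technique; the function names and toy data are hypothetical.

```python
from itertools import combinations


def fit_tsp(samples, labels):
    """Return (i, j, class_if_less, class_otherwise) for the top scoring gene pair."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("TSP is a binary classifier")
    c0, c1 = classes

    def freq(i, j, c):
        # Relative frequency of the event x[i] < x[j] within class c.
        rows = [x for x, y in zip(samples, labels) if y == c]
        return sum(1 for x in rows if x[i] < x[j]) / len(rows)

    best = None
    for i, j in combinations(range(len(samples[0])), 2):
        p0, p1 = freq(i, j, c0), freq(i, j, c1)
        score = abs(p0 - p1)
        if best is None or score > best[0]:
            # The class in which x[i] < x[j] is more frequent is predicted
            # whenever the comparison holds.
            when_less, otherwise = (c0, c1) if p0 > p1 else (c1, c0)
            best = (score, i, j, when_less, otherwise)
    _, i, j, when_less, otherwise = best
    return i, j, when_less, otherwise


def predict_tsp(model, x):
    """Classify one sample by comparing the two selected genes."""
    i, j, when_less, otherwise = model
    return when_less if x[i] < x[j] else otherwise


# Hypothetical expression values: rows are samples, columns are genes.
samples = [[1, 5, 3], [2, 6, 1], [5, 1, 3], [6, 2, 7]]
labels = ["A", "A", "B", "B"]
model = fit_tsp(samples, labels)
```

Since the rule depends only on the relative order of two values within a sample, it is unaffected by any monotone rescaling of that sample's measurements.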

18.
For a given vector $x_0$, the sequence $\{x_t\}$ which optimizes the sum of discounted rewards $r(x_t, x_{t+1})$, where $r$ is a quadratic function, is shown to be generated by a linear decision rule $x_{t+1} = S x_t + R$. Moreover, the coefficients $R$, $S$ are given by explicit formulas in terms of the coefficients of the reward function $r$. A unique steady-state is shown to exist (except for a degenerate case), and its stability is discussed.
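
To illustrate what a steady state of such a linear decision rule looks like, the sketch below iterates the rule from an arbitrary start. The matrix `S` and vector `R` are invented for illustration and are not derived from any reward function; the paper's explicit formulas for them are not reproduced here.

```python
def step(S, R, x):
    """Apply one step of the rule x_{t+1} = S x_t + R (S as a list of rows)."""
    return [sum(s * xj for s, xj in zip(row, x)) + r for row, r in zip(S, R)]


def iterate(S, R, x0, steps):
    """Return x_steps obtained by applying the rule `steps` times to x0."""
    x = list(x0)
    for _ in range(steps):
        x = step(S, R, x)
    return x


# Hypothetical coefficients; both eigenvalues of S are 0.5, so iterates converge.
S = [[0.5, 0.0],
     [0.25, 0.5]]
R = [1.0, 1.0]

x_star = iterate(S, R, [10.0, -4.0], 200)
```

A steady state satisfies x* = S x* + R, that is (I - S) x* = R; for the values assumed here this gives x* = (2, 3), and the iteration converges to it because every eigenvalue of `S` has modulus below one.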

19.
In this paper we introduce the concept of s-monotone index selection rule for linear programming problems. We show that several known anti-cycling pivot rules like the minimal index, Last-In–First-Out and the most-often-selected-variable pivot rules are s-monotone index selection rules. Furthermore, we show a possible way to define new s-monotone pivot rules. We prove that several known algorithms like the primal (dual) simplex, MBU-simplex algorithms and criss-cross algorithm with s-monotone pivot rules are finite methods.

20.
Outranking methods constitute an important class of multicriteria classification models. Often, however, their implementation is cumbersome, due to the large number of parameters that the decision maker must specify. Past studies tried to address this issue using linear and nonlinear programming, to elicit the necessary preferential information from assignment examples. In this study, an evolutionary approach, based on the differential evolution algorithm, is proposed in the context of the ELECTRE TRI method. Computational results are given to test the effectiveness of the methodology and the quality of the obtained models.
