Similar literature
 20 similar documents found (search time: 46 ms)
1.
Data mining aims to find patterns in organizational databases. However, most data mining techniques do not take into account knowledge of the quality of the database. In this work, we show how to incorporate into classification mining recent advances in the data quality field that view a database as the product of an imprecise manufacturing process whose flaws and defects are captured in quality matrices. We develop a general-purpose method of incorporating data quality matrices into the data mining classification task. Our work differs from existing data preparation techniques: whereas other approaches detect and fix errors to ensure consistency with the entire data set, our work makes use of a priori knowledge of how the data is produced or manufactured.

2.
This paper concerns classification by Boolean functions. We investigate the classification accuracy obtained by standard classification techniques on unseen points (elements of the domain, {0,1}^n, for some n) that are similar, in particular senses, to the points that have been observed as training observations. Explicitly, we use a new measure of how similar a point x ∈ {0,1}^n is to a set of such points to restrict the domain of points on which we offer a classification. For points sufficiently dissimilar, no classification is given. We report on experimental results which indicate that the classification accuracies obtained on the resulting restricted domains are better than those obtained without restriction. These experiments involve a number of standard data-sets and classification techniques. We also compare the classification accuracies with those obtained by restricting the domain on which classification is given by using the Hamming distance.
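
The Hamming-distance baseline mentioned at the end of this abstract can be illustrated with a minimal sketch. It is not the paper's new similarity measure, which the abstract does not define. The nearest-neighbour base classifier, the function names, and the `max_dist` threshold are all assumptions made for illustration.

```python
from typing import Optional, Sequence

Point = Sequence[int]


def hamming(x: Point, y: Point) -> int:
    """Number of coordinates in which two equal-length binary vectors differ."""
    if len(x) != len(y):
        raise ValueError("points must have the same length")
    return sum(a != b for a, b in zip(x, y))


def classify_or_abstain(
    x: Point,
    train_points: Sequence[Point],
    train_labels: Sequence[int],
    max_dist: int,
) -> Optional[int]:
    """Return the label of the nearest training point (hypothetical base
    classifier), or None when x is farther than max_dist from every
    training point, i.e. x lies outside the restricted domain."""
    best_dist, best_label = min(
        (hamming(x, p), label) for p, label in zip(train_points, train_labels)
    )
    return best_label if best_dist <= max_dist else None


# Toy training set in {0,1}^4 (made-up data).
train_points = [(0, 0, 0, 0), (1, 1, 1, 1)]
train_labels = [0, 1]

print(classify_or_abstain((0, 0, 0, 1), train_points, train_labels, max_dist=1))
print(classify_or_abstain((0, 0, 1, 1), train_points, train_labels, max_dist=1))
```

In this sketch the first query is one bit away from a training point and receives a label, while the second is two bits away from both training points and is left unclassified.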

3.
We propose a multivariate statistical framework for regional development assessment based on structural equation modelling with latent variables and show how such methods can be combined with non-parametric classification methods such as cluster analysis to obtain development groupings of territorial units. This approach has several advantages over current approaches in the literature: it takes account of distributional issues such as departures from normality, in turn enabling the application of more powerful inferential techniques; it enables modelling of structural relationships among latent development dimensions and, subsequently, formal statistical testing of model specification and of various hypotheses on the estimated parameters; it allows for a complex structure of the factor loadings in the measurement models for the latent variables, which can also be formally tested in the confirmatory framework; and it enables computation of latent variable scores that take into account structural or causal relationships among latent variables and the complex structure of the factor loadings in the measurement models. We apply these methods to the regional development classification of Slovenia and Croatia.

4.
This article presents techniques for constructing classifiers that combine statistical information from training data with tangent approximations to known transformations; it demonstrates the techniques by applying them to a face recognition task. Our approach is to build Bayes classifiers with approximate class-conditional probability densities for measured data. The high dimension of the measurements in modern classification problems such as speech or image recognition makes inferring probability densities from feasibly sized training datasets difficult. We address the difficulty by imposing severely simplifying assumptions and exploiting a priori information about transformations to which classification should be invariant. For the face recognition task, we used a five-parameter group of such transformations consisting of rotation, shifts, and scalings. On the face recognition task, a classifier based on our techniques has an error rate that is 20% lower than that of the best algorithm in a reference software distribution.

5.
The first part of this series described how the German concept of Grundvorstellungen fits into current concepts, especially its central position as a mediator between reality and mathematics. The next stage is therefore to explain the use of proportion and percentage calculations within this concept and how it can be used as a criterion to detect the demands of mathematical problems. First, we look at a classification of mathematical items. This classification shows the complexity of a mathematical item with respect to Grundvorstellungen. This consideration yields hierarchical levels of demand for these items. Furthermore, to show how to describe and interpret results on the basis of these levels, we refer to selected results of the PISA 2000 comparative study.

6.
In this paper, we consider the problem of signal classification. First, the signal is translated into a persistence diagram through the use of delay-embedding and persistent homology. Endowing the data space of persistence diagrams with a metric from point processes, we show that it admits statistical structure in the form of Fréchet means and variances and a classification scheme is established. In contrast with the Wasserstein distance, this metric accounts for changes in small persistence and changes in cardinality. The classification results using this distance are benchmarked on both synthetic data and real acoustic signals and it is demonstrated that this classifier outperforms current signal classification techniques.
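
The delay-embedding step named in this abstract can be sketched as follows. This is a generic time-delay embedding, not the authors' code; the function name and the parameters `dim` and `delay` are assumptions, and the subsequent persistent homology computation is not shown.

```python
from typing import List, Sequence, Tuple


def delay_embed(
    signal: Sequence[float], dim: int, delay: int
) -> List[Tuple[float, ...]]:
    """Map a 1-D signal to points (s[i], s[i + delay], ..., s[i + (dim-1)*delay]).

    The resulting point cloud is the kind of object from which a
    persistence diagram would then be computed.
    """
    if dim < 1 or delay < 1:
        raise ValueError("dim and delay must be positive integers")
    n_points = len(signal) - (dim - 1) * delay
    if n_points <= 0:
        return []  # signal too short for this choice of dim and delay
    return [
        tuple(signal[i + k * delay] for k in range(dim))
        for i in range(n_points)
    ]


# Made-up signal: five samples embedded in the plane with a delay of 2.
points = delay_embed([0, 1, 2, 3, 4], dim=2, delay=2)
print(points)  # [(0, 2), (1, 3), (2, 4)]
```

The choice of `dim` and `delay` controls the shape of the point cloud; the abstract does not state which values the authors used.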

7.
Supervised learning methods are powerful techniques to learn a function from a given set of labeled data, the so-called training data. In this paper the support vector machines approach is applied to an image classification task. Starting with the corresponding Tikhonov regularization problem, reformulated as a convex optimization problem, we introduce a conjugate dual problem to it and prove that, whenever strong duality holds, the function to be learned can be expressed via the dual optimal solutions. Corresponding dual problems are then derived for different loss functions. The theoretical results are applied by numerically solving a classification task using high dimensional real-world data in order to obtain optimal classifiers. The results demonstrate the excellent performance of support vector classification for this particular problem.

8.
A new family of plug-in classification techniques has recently been developed in the statistics and machine learning literature. A plug-in classification technique (PICT) is a method that takes a standard classifier (such as LDA or TREES) and plugs it into an algorithm to produce a new classifier. The standard classifier is known as the base classifier. These methods often produce large improvements over using a single classifier. In this article we investigate one of these methods and give some motivation for its success.

9.
Stochastic Global Optimization: Problem Classes and Solution Techniques   Cited by: 4 (self-citations: 0; citations by others: 4)
There is a lack of a representative set of test problems for comparing global optimization methods. To remedy this, a classification of essentially unconstrained global optimization problems into unimodal, easy, moderately difficult, and difficult problems is proposed. The problem features giving this classification are the chance to miss the region of attraction of the global minimum, embeddedness of the global minimum, and the number of minimizers. The classification of some often used test problems is given, and it is recognized that most of them are easy and some even unimodal. The global optimization solution techniques treated are global, local, and adaptive search, and their use for tackling different classes of problems is discussed. The problem of fair comparison of methods is then addressed. Further possible components of a general global optimization tool based on the problem classes and solution techniques are presented.

10.
Cancer classification using genomic data is one of the major research areas in the medical field. Therefore, a number of binary classification methods have been proposed in recent years. The Top Scoring Pair (TSP) method is one of the most promising techniques; it classifies genomic data in a lower-dimensional subspace using a simple decision rule. In the present paper, we propose a supervised classification technique that utilizes incremental generalized eigenvalue and top scoring pair classifiers to obtain higher classification accuracy with a small training set. We validate our method by applying it to well-known microarray data sets.
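
The "simple decision rule" of a top scoring pair classifier can be sketched as below. This shows only the basic TSP idea as it is commonly described (pick the feature pair whose ordering differs most between the two classes), not the combined method with incremental generalized eigenvalue classifiers proposed in the paper. The function names, the 0/1 label coding, and the toy data are assumptions.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Model = Tuple[int, int, int]  # (feature i, feature j, label predicted when x[i] < x[j])


def fit_tsp(X: Sequence[Sequence[float]], y: Sequence[int]) -> Model:
    """Choose the pair (i, j) maximizing |P(x_i < x_j | class 1) - P(x_i < x_j | class 0)|."""
    rows = {c: [r for r, label in zip(X, y) if label == c] for c in (0, 1)}
    if not rows[0] or not rows[1]:
        raise ValueError("both classes (0 and 1) must be present")
    if len(X[0]) < 2:
        raise ValueError("at least two features are required")
    best = None
    for i, j in combinations(range(len(X[0])), 2):
        # Fraction of samples in each class for which feature i is below feature j.
        p = {c: sum(r[i] < r[j] for r in rows[c]) / len(rows[c]) for c in (0, 1)}
        score = abs(p[1] - p[0])
        if best is None or score > best[0]:
            best = (score, i, j, 1 if p[1] > p[0] else 0)
    _, i, j, label_if_less = best
    return i, j, label_if_less


def predict_tsp(model: Model, row: Sequence[float]) -> int:
    """Apply the single-comparison decision rule."""
    i, j, label_if_less = model
    return label_if_less if row[i] < row[j] else 1 - label_if_less


# Made-up expression values: 4 samples, 3 genes.
X: List[List[float]] = [[1.0, 5.0, 3.0], [2.0, 6.0, 2.5], [5.0, 1.0, 3.0], [6.0, 2.0, 2.8]]
y = [0, 0, 1, 1]
model = fit_tsp(X, y)
print(model, predict_tsp(model, [0.5, 4.0, 9.0]), predict_tsp(model, [7.0, 3.0, 1.0]))
```

Because the rule depends only on the relative order of two measurements, it is unaffected by any monotone rescaling applied to a whole sample.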

11.
Many real-world business situations require classification decisions that must often be made on the basis of judgment and past performance. In this paper, we propose a decision framework that combines multiple models or techniques in a complementary fashion to provide input to managers who make such decisions on a routine basis. We illustrate the framework by specifically using five different classification techniques – neural networks, discriminant analysis, quadratic discriminant analysis (QDA), k-nearest neighbor (KNN), and multinomial logistic regression analysis (MNL). Application of the decision framework to actual retail department store data shows that it is most useful in those cases where uncertainty is high and a priori classification cannot be made with a high degree of reliability. The proposed framework thus enhances the value of exception reporting, and provides managers additional insights into the phenomenon being studied.

12.
Classification models can be developed by statistical or mathematical programming discriminant analysis techniques. Variable selection extensions of these techniques allow the development of classification models with a limited number of variables. Although stepwise statistical variable selection methods are widely used, the performance of the resultant classification models may not be optimal because of the stepwise selection protocol and the nature of the group separation criterion. A mixed integer programming approach for selecting variables for maximum classification accuracy is developed in this paper and the performance of this approach, measured by the leave-one-out hit rate, is compared with the published results from a statistical approach in which all possible variable subsets were considered. Although this mixed integer programming approach can only be applied to problems with a relatively small number of observations, it may be of great value where classification decisions must be based on a limited number of observations.
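
The leave-one-out hit rate used as the performance measure here can be sketched as follows. The nearest-centroid classifier is a stand-in chosen for brevity, not the paper's mixed integer programming model, and all function names and data are assumptions.

```python
from typing import Dict, List, Sequence


def fit_centroids(X: Sequence[Sequence[float]], y: Sequence[int]) -> Dict[int, List[float]]:
    """Stand-in classifier: store the mean vector of each class."""
    groups: Dict[int, List[Sequence[float]]] = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def predict_centroid(centroids: Dict[int, List[float]], row: Sequence[float]) -> int:
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(row, centroids[label])),
    )


def leave_one_out_hit_rate(X: List[List[float]], y: List[int]) -> float:
    """Fraction of observations classified correctly when each one is held
    out in turn and the model is refitted on the remaining observations."""
    hits = 0
    for k in range(len(X)):
        model = fit_centroids(X[:k] + X[k + 1:], y[:k] + y[k + 1:])
        hits += predict_centroid(model, X[k]) == y[k]
    return hits / len(X)


# Made-up one-variable data; the third observation sits close to the other class.
X = [[0.0], [1.0], [9.0], [10.0], [11.0]]
y = [0, 0, 0, 1, 1]
print(leave_one_out_hit_rate(X, y))  # 0.8
```

The model is refitted once per observation, so the cost grows with the sample size. That is unrelated to the limit on the number of observations that the abstract attributes to the mixed integer programming approach.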

13.
Under the assumption that the variables in the wave equation can be separated and its coefficients are periodic, we develop a classification of seismic eigenwaves and use it to answer some questions as to how to specify the type and basic parameters of a wave on the basis of measurements of amplitudes, whether there exist points of chaos, and how to predict them.

14.
This work continues the account given in Part I of the paper [1] by presenting a short summary of some of the mathematical techniques employed in the wave front analysis of quasi‐linear hyperbolic partial differential equations. Starting from a number of important physical examples, the classification of quasi‐linear first‐order systems is discussed and followed by a simple account of the theory of characteristics for systems involving n dependent and two independent variables. A special example is discussed showing how discontinuities arise in solutions, and the paper is concluded by an account of wave front analysis as applied to the piston problem of gas dynamics.

15.
The theory of group classification of differential equations is analyzed, substantially extended and enhanced based on the new notions of conditional equivalence group and normalized class of differential equations. Effective new techniques are proposed. Using these, we exhaustively describe admissible point transformations in classes of nonlinear (1+1)-dimensional Schrödinger equations, in particular, in the class of nonlinear (1+1)-dimensional Schrödinger equations with modular nonlinearities and potentials and some subclasses thereof. We then carry out a complete group classification in this class, representing it as a union of disjoint normalized subclasses and applying a combination of algebraic and compatibility methods. Moreover, we introduce the complete classification of (1+2)-dimensional cubic Schrödinger equations with potentials. The proposed approach can be applied to studying symmetry properties of a wide range of differential equations.

16.
We consider the problem of multi-criteria classification. In this problem, a set of “if … then …” decision rules is used as a preference model to classify objects evaluated by a set of criteria and regular attributes. Given a sample of classification examples, called the learning data set, the rules are induced from dominance-based rough approximations of preference-ordered decision classes, according to the Variable Consistency Dominance-based Rough Set Approach (VC-DRSA). The main question to be answered in this paper is how to classify an object using decision rules in situations where it is covered by (i) no rule, (ii) exactly one rule, or (iii) several rules. The proposed classification scheme can be applied both to the learning data set (to restore the classification known from examples) and to the testing data set (to predict the classification of new objects). A hypothetical example from the area of telecommunications is used to illustrate the proposed classification method and to compare it with some previous proposals.

17.
The classification problem is of major importance to a plethora of research fields. The growth of research on classification methods has led to the development of several techniques. The objective of this research is to provide some insight into the relative performance of some well-known classification methods, through an experimental analysis covering data sets with different characteristics. The methods used in the analysis include statistical techniques, machine learning methods and multicriteria decision aid. The results of the study can be used to support the design of classification systems and the identification of the proper methods that could be used given the data characteristics.

18.
This article proposes a new quantity for assessing the number of groups or clusters in a dataset. The key idea is to view clustering as a supervised classification problem, in which we must also estimate the “true” class labels. The resulting “prediction strength” measure assesses how many groups can be predicted from the data, and how well. In the process, we develop novel notions of bias and variance for unlabeled data. Prediction strength performs well in simulation studies, and we apply it to clusters of breast cancer samples from a DNA microarray study. Finally, some consistency properties of the method are established.

19.
Automated classification of granite slabs is a key aspect of the automation of processes in the granite transformation sector. This classification task is currently performed manually on the basis of the subjective opinions of an expert in regard to texture and colour. We describe a classification method based on machine learning techniques fed with spectral information for the rock, supplied in the form of discrete values captured by a suitably parameterized spectrophotometer. The machine learning techniques applied in our research take a functional perspective, with the spectral function smoothed in accordance with the data supplied by the spectrophotometer. On the basis of the results obtained, it can be concluded that the proposed method is suitable for automatically classifying ornamental rock.

20.
We demonstrate how optimization problems arise in the field of pattern classification, in particular in using piecewise-linear classification and classification based on an optimal linear separator. We motivate the need in this area for a general purpose optimization approach. We discuss ALOPEX, a biased random search approach, from the point of view of this need. While ALOPEX itself failed to fulfil our need, a newly-introduced generalization of it (iterated ALOPEX) was found to be appropriate for the optimization problems of our particular concern. We conclude the paper with a brief critical evaluation of this approach as compared to our original aims.
