Similar Documents (20 results)
1.
In this work we address an extension of box clustering in supervised classification problems that makes use of optimization problems to refine the results obtained by agglomerative techniques. The central concept of box clustering is that of homogeneous boxes, which under some conditions give rise to overtrained classifiers. Thus, we focus our attention on pruning out redundant boxes, using the information gleaned from the other boxes, under the hypothesis that such a choice identifies simpler models with good predictive power. We propose a pruning method based on an integer optimization problem and a family of subproblems derived from the main one. The overall performance is then compared to the accuracy levels of competing methods on a wide range of real data sets. The method has proven to be robust, making it possible to derive a more compact system of boxes in the instance space with good performance on training and test data.
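The pruning step above is driven by an integer optimization problem. As a rough illustration, the sketch below sets up one plausible formulation (keep the fewest homogeneous boxes that still cover every training point of their class) using PuLP; the toy data and the pure set-cover objective are assumptions, not the authors' exact model.

```python
# Illustrative box-pruning sketch: choose a minimum subset of homogeneous boxes
# that still covers every training point of the class they belong to.
# Requires: pip install pulp
import pulp

# Hypothetical data: points (id -> class) and candidate homogeneous boxes
# (id -> (covered point ids, class)).
points = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}
boxes = {
    "b1": ({0, 1}, "A"),
    "b2": ({1, 2}, "A"),
    "b3": ({0, 1, 2}, "A"),
    "b4": ({3, 4}, "B"),
}

prob = pulp.LpProblem("box_pruning", pulp.LpMinimize)
use = {b: pulp.LpVariable(f"use_{b}", cat="Binary") for b in boxes}

# Objective: keep as few boxes as possible (redundant boxes get pruned).
prob += pulp.lpSum(use.values())

# Coverage: every point must stay covered by at least one retained box of its class.
for p, cls in points.items():
    covering = [use[b] for b, (cov, bcls) in boxes.items() if p in cov and bcls == cls]
    prob += pulp.lpSum(covering) >= 1, f"cover_point_{p}"

prob.solve(pulp.PULP_CBC_CMD(msg=False))
kept = [b for b in boxes if use[b].value() == 1]
print("retained boxes:", kept)  # e.g. ['b3', 'b4']
```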

2.
Chance constrained uncertain classification via robust optimization
This paper studies the problem of constructing robust classifiers when the training data are plagued with uncertainty. The problem is posed as a Chance-Constrained Program (CCP) which ensures that the uncertain data points are classified correctly with high probability. Unfortunately, such a CCP turns out to be intractable. The key novelty is in employing Bernstein bounding schemes to relax the CCP into a convex second-order cone program whose solution is guaranteed to satisfy the probabilistic constraint. Prior to this work, only Chebyshev-based relaxations were exploited in learning algorithms. Bernstein bounds employ richer partial information and hence can be far less conservative than Chebyshev bounds. Owing to this efficient modeling of uncertainty, the resulting classifiers achieve higher classification margins and hence better generalization. Methodologies for classifying uncertain test data points and error measures for evaluating classifiers robust to uncertain data are discussed. Experimental results on synthetic and real-world datasets show that the proposed classifiers are better equipped to handle data uncertainty and outperform the state of the art in many cases.
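To make the second-order cone structure concrete, the sketch below builds a Chebyshev-style robust linear classifier in cvxpy: each uncertain point must be separated with a margin inflated by its uncertainty ellipsoid. The Bernstein-based relaxation of the paper uses tighter, moment-dependent constants that are not reproduced here; the data and the margin parameter kappa are invented for the example.

```python
# Sketch of a robust linear classifier with second-order cone constraints:
# each uncertain point (mean x_i, covariance factor S_i) must be classified
# correctly with a margin that accounts for its uncertainty ellipsoid.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, d = 40, 2
X = np.vstack([rng.normal(-1, 0.3, (n // 2, d)), rng.normal(1, 0.3, (n // 2, d))])
y = np.array([-1] * (n // 2) + [1] * (n // 2))
S = [0.2 * np.eye(d) for _ in range(n)]   # square-root covariance of each point
kappa = 1.5                               # confidence parameter (assumed)

w, b = cp.Variable(d), cp.Variable()
xi = cp.Variable(n, nonneg=True)
constraints = [
    y[i] * (X[i] @ w + b) >= 1 - xi[i] + kappa * cp.norm(S[i] @ w, 2)
    for i in range(n)
]
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + 10 * cp.sum(xi)), constraints)
prob.solve()
print("robust separator w =", w.value, "b =", b.value)
```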

3.
In this work we address a technique for effectively clustering points into specific convex sets, called homogeneous boxes, having sides aligned with the coordinate axes (isothetic condition). The proposed clustering approach is based on homogeneity conditions rather than on a distance measure, and, although it was originally developed in the context of the logical analysis of data, it is now placed within the framework of supervised clustering. First, we introduce the basic concepts of box geometry; then, we consider a generalized clustering algorithm based on a class of graphs, called incompatibility graphs. For supervised classification problems, we consider classifiers based on box sets, and compare their overall performance to the accuracy levels of competing methods on a wide range of real data sets. The results show that the proposed method performs comparably with other supervised learning methods in terms of accuracy.
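The key primitive is the homogeneity test for an isothetic box: the axis-aligned bounding box of a group of same-class points must contain no point of another class. A minimal check, on made-up data, looks like this:

```python
# Sketch: test whether the isothetic bounding box of a group of same-class
# points is homogeneous, i.e. contains no point of a different class.
import numpy as np

def is_homogeneous_box(group_pts, other_pts):
    lo, hi = group_pts.min(axis=0), group_pts.max(axis=0)       # box corners
    inside = np.all((other_pts >= lo) & (other_pts <= hi), axis=1)
    return not inside.any(), (lo, hi)

class_a = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.2]])
class_b = np.array([[2.0, 2.0], [0.6, 0.5]])                    # second point falls inside

ok, box = is_homogeneous_box(class_a, class_b)
print("homogeneous:", ok, "box:", box)                          # homogeneous: False
```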

4.
In this work we apply tools developed for the study of fractal properties of time series to the problem of classifying defects in welded joints probed by ultrasonic techniques. We employ the fractal tools in a preprocessing step, producing curves with considerably fewer points than the original signals. These curves are then used in the classification step, which is realized by applying an extension of the Karhunen–Loève linear transformation. We show that our approach leads to small error rates, comparable with those obtained by more time-consuming methods based on non-linear classifiers.
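The pipeline is two-step: compress each signal into a short scale-dependent curve, then classify the curves after a Karhunen–Loève (PCA-type) projection. In the sketch below, the lag-variance curve is only an illustrative stand-in for the paper's fractal descriptors, and the signals and labels are synthetic.

```python
# Sketch of the two-step pipeline: (1) compress each ultrasonic signal into a
# short scale-dependent curve (a crude fractal-style descriptor), (2) apply a
# Karhunen-Loeve / PCA projection followed by a simple linear classifier.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def lag_variance_curve(signal, lags=(1, 2, 4, 8, 16, 32)):
    # log-variance of increments at several lags; far fewer points than the raw signal
    return np.array([np.log(np.var(signal[lag:] - signal[:-lag]) + 1e-12) for lag in lags])

rng = np.random.default_rng(1)
signals = rng.standard_normal((200, 1024)).cumsum(axis=1)   # synthetic "ultrasonic" traces
labels = rng.integers(0, 2, 200)                            # hypothetical defect classes

curves = np.array([lag_variance_curve(s) for s in signals])
clf = make_pipeline(PCA(n_components=3), LogisticRegression())
clf.fit(curves, labels)
print("training accuracy:", clf.score(curves, labels))
```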

5.
New challenges in knowledge extraction include interpreting and classifying data sets while simultaneously considering related information to confirm results or identify false positives. We discuss a data fusion algorithmic framework targeted at this problem. It includes separate base classifiers for each data type and a fusion method for combining the individual classifiers. The fusion method is an extension of current ensemble classification techniques and has the advantage of allowing data to remain in heterogeneous databases. In this paper, we focus on the applicability of such a framework to the protein phosphorylation prediction problem.
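A minimal late-fusion sketch of this idea (one base classifier per data source, outputs combined by averaging class probabilities) is shown below, with synthetic stand-ins for two heterogeneous sources; the averaging rule is a simplification of the ensemble-style fusion described above.

```python
# Sketch of late fusion: one base classifier per data source, combined by
# averaging predicted class probabilities.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 300
y = rng.integers(0, 2, n)
source_seq = rng.standard_normal((n, 20)) + y[:, None] * 0.5   # e.g. sequence-derived features
source_struct = rng.standard_normal((n, 5)) + y[:, None] * 0.3 # e.g. structural features

base_seq = LogisticRegression().fit(source_seq[:200], y[:200])
base_struct = RandomForestClassifier(n_estimators=100, random_state=0).fit(source_struct[:200], y[:200])

# Fusion: average the probability estimates of the per-source classifiers.
proba = 0.5 * base_seq.predict_proba(source_seq[200:]) + \
        0.5 * base_struct.predict_proba(source_struct[200:])
pred = proba.argmax(axis=1)
print("fused accuracy:", (pred == y[200:]).mean())
```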

6.
Social media, such as blogs and on-line forums, contain a huge amount of information that is typically unorganized and fragmented. An important and increasingly relevant issue is to classify on-line texts in order to detect possible anomalies. For example, on-line texts representing consumer opinions can be very valuable and profitable for companies, but they can also cause serious damage if they are negative or faked. In this contribution we present a novel statistical methodology, rooted in the context of classical text classification, to address such issues. Several classifiers have been proposed in the literature, among them support vector machines and naive Bayes classifiers. These approaches are not effective when coping with the problem of classifying texts belonging to an unknown author. To this aim, we propose a new method based on the combination of classification trees with non-parametric approaches, such as the Kruskal–Wallis and Brunner–Dette–Munk tests. The main application of the proposed method is the ability to classify an author as a new one, who is potentially trustworthy, or as a known one, whose texts are potentially faked.
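A very rough sketch of pairing a nonparametric test with a tree is given below: per-feature Kruskal–Wallis statistics comparing a questioned text with a known author's texts are fed to a decision tree that decides "same author" vs "new author". The stylometric features, the case construction and the tree depth are invented for illustration and do not reproduce the authors' procedure.

```python
# Rough sketch: per-feature Kruskal-Wallis statistics comparing a questioned
# text against a known author's texts, fed to a decision tree that flags the
# questioned text as "same author" or "new author".
import numpy as np
from scipy.stats import kruskal
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)

def kw_profile(known_texts, questioned):
    # one Kruskal-Wallis H statistic per stylometric feature column
    return np.array([kruskal(known_texts[:, j], questioned[:, j]).statistic
                     for j in range(known_texts.shape[1])])

# Synthetic "stylometric features" (e.g. word-length, punctuation rates, ...)
def make_case(same_author):
    known = rng.normal(0, 1, (30, 8))
    shift = 0.0 if same_author else 1.0
    questioned = rng.normal(shift, 1, (10, 8))
    return kw_profile(known, questioned)

X = np.array([make_case(same_author=bool(i % 2)) for i in range(200)])
y = np.array([i % 2 for i in range(200)])          # 1 = same author, 0 = new author

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X[:150], y[:150])
print("held-out accuracy:", tree.score(X[150:], y[150:]))
```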

7.
8.
An important aspect of the solution process for constraint satisfaction problems is to identify exclusion boxes, i.e., boxes that contain no feasible points. This paper presents a certificate of infeasibility for finding such boxes by solving a linearly constrained nonsmooth optimization problem. Furthermore, the constructed certificate can be used to enlarge an exclusion box by solving a nonlinearly constrained nonsmooth optimization problem.
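In spirit, certifying that a box is an exclusion box amounts to showing that the worst constraint violation stays strictly positive everywhere inside it. The sketch below only approximates this numerically (a derivative-free minimizer over the box, for a hypothetical constraint system), so it illustrates the idea rather than the paper's rigorous certificate.

```python
# Numerical sketch (not a rigorous certificate): a box is an exclusion box for
# the system g_i(x) <= 0 if the minimum over the box of max_i g_i(x) is positive.
import numpy as np
from scipy.optimize import differential_evolution

def g(x):
    # hypothetical constraint system g_i(x) <= 0
    return np.array([x[0] ** 2 + x[1] ** 2 - 1.0,   # inside the unit disk
                     x[0] + x[1] - 0.5])            # below the line x + y = 0.5

def worst_violation(x):
    return np.max(g(x))          # nonsmooth objective: largest constraint value

box = [(2.0, 3.0), (2.0, 3.0)]   # candidate exclusion box
res = differential_evolution(worst_violation, bounds=box, seed=0, tol=1e-9)

# If even the minimal worst-violation is > 0, no feasible point can lie in the box.
print("min of max_i g_i over the box:", res.fun)
print("likely exclusion box:", res.fun > 0)
```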

9.
Learning from examples is a frequently arising challenge, with a large number of algorithms proposed in the classification, data mining and machine learning literature. The evaluation of the quality of such algorithms is frequently carried out ex post, on an experimental basis: their performance is measured either by cross-validation on benchmark data sets or by clinical trials. Few of these approaches evaluate the learning process ex ante, on its own merits. In this paper, we discuss a property of rule-based classifiers which we call “justifiability”, and which focuses on the type of information extracted from the given training set in order to classify new observations. We investigate some interesting mathematical properties of justifiable classifiers. In particular, we establish the existence of justifiable classifiers, and we show that several well-known learning approaches, such as decision trees or nearest-neighbor-based methods, automatically provide justifiable classifiers. We also identify maximal subsets of observations which must be classified in the same way by every justifiable classifier. Finally, we illustrate by a numerical example that using classifiers based on “most justifiable” rules does not seem to lead to overfitting, even though it involves an element of optimization.

10.
Educational data mining tasks such as personalized exercise recommendation, question difficulty prediction and learner modeling rely on student response data and on knowledge-point annotations of test questions; at present these knowledge points are labeled manually. Automatically annotating question knowledge points with machine learning methods is therefore an urgent need. To address automatic knowledge-point annotation for massive question banks, this paper proposes an ensemble-learning-based method for multi-knowledge-point labeling of test questions. First, the knowledge-point labeling problem is formally defined, and a knowledge graph of knowledge points is built from textbook tables of contents and domain knowledge to serve as the label set. Second, multiple support vector machines are trained as base classifiers within an ensemble learning scheme; the best-performing base classifiers are selected and combined to form the multi-knowledge-point labeling model. Finally, using high-school mathematics questions from the database of an online education platform as the experimental data set, the proposed method is applied to predict the knowledge points examined by each question and achieves good results.
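A minimal sketch of the labeling idea (several SVM base classifiers per knowledge point trained on bootstrap samples, with only the stronger ones kept and combined by voting) is given below. The features, labels and selection threshold are placeholders, not the paper's actual pipeline.

```python
# Sketch of ensemble multi-label knowledge-point tagging: for each knowledge
# point, train several LinearSVC base classifiers on bootstrap samples, keep
# only those above a validation-accuracy threshold, and predict by majority vote.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(4)
n_items, n_feats, n_kps = 500, 50, 6           # questions, text features, knowledge points
X = rng.standard_normal((n_items, n_feats))
Y = (rng.random((n_items, n_kps)) < 0.3).astype(int)   # multi-hot knowledge-point labels
X_tr, Y_tr, X_va, Y_va = X[:350], Y[:350], X[350:], Y[350:]

def ensemble_for_label(y_tr, y_va, n_base=7, threshold=0.55):
    kept = []
    for _ in range(n_base):
        idx = rng.integers(0, len(X_tr), len(X_tr))          # bootstrap sample
        clf = LinearSVC().fit(X_tr[idx], y_tr[idx])
        if clf.score(X_va, y_va) >= threshold:               # keep strong base classifiers only
            kept.append(clf)
    return kept

ensembles = [ensemble_for_label(Y_tr[:, k], Y_va[:, k]) for k in range(n_kps)]

def predict(x):
    votes = [np.mean([clf.predict(x[None])[0] for clf in ens]) if ens else 0.0
             for ens in ensembles]
    return (np.array(votes) >= 0.5).astype(int)              # predicted knowledge points

print("predicted knowledge points for one question:", predict(X_va[0]))
```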

11.
We approach the problem of classifying injective modules over an integral domain by considering the class of semistar Noetherian domains. When working with such domains, one has to focus on semistar ideals; as a consequence for modules, we restrict our study to the class of injective hulls of co-semistar modules, those in which the annihilator ideal of each nonzero element is semistar. We obtain a complete classification of this class by describing its elements as injective hulls of uniquely determined direct sums of indecomposable injective modules; if, moreover, we consider stable semistar operations, then we can further improve this result, obtaining a natural generalization of the classical Noetherian case. Our approach provides a unified treatment of results on injective modules over various kinds of domains obtained by Matlis, Cailleau, Beck, Fuchs and Kim–Kim–Park.

12.
The notion of homomorphism homogeneity was introduced by Cameron and Nešetřil as a natural generalization of the classical model-theoretic notion of homogeneity. A relational structure is called homomorphism homogeneous (HH) if every homomorphism between finite substructures extends to an endomorphism. It is called polymorphism homogeneous (PH) if every finite power of the structure is homomorphism homogeneous. Despite the similarity of the definitions, the HH and PH structures lead a life quite separate from the homogeneous structures. While the classification theory of homogeneous structures is dominated by Fraïssé theory, other methods are needed for classifying HH and PH structures. In this paper we give a complete classification of HH countable tournaments (with loops allowed). We use this result to derive a classification of countable PH tournaments. The method of classification is designed to be useful also for other classes of relational structures. Our results extend previous research on the classification of finite HH and PH tournaments by Ilić, Mašulović, Nenadov, and the first author.

13.
Objects lying in four different boxes are rearranged in such a way that the number of objects in each box stays the same. Askey, Ismail, and Koornwinder proved that the cardinality of the set of rearrangements for which the number of objects changing boxes is even exceeds the cardinality of the set of rearrangements for which that number is odd. We give a simple counting proof of this fact.
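A tiny brute-force check of the statement, reading a "rearrangement" as any reassignment of labeled objects to boxes that preserves every box size, can be run as follows (the box sizes are arbitrary):

```python
# Brute-force check (labeled-objects reading): among all reassignments of the
# objects that keep every box size fixed, those moving an even number of
# objects outnumber those moving an odd number.
from itertools import permutations

box_sizes = (2, 1, 1, 2)                         # arbitrary small example with four boxes
original = tuple(b for b, size in enumerate(box_sizes) for _ in range(size))

even = odd = 0
for assignment in set(permutations(original)):   # all distinct box-label sequences
    moved = sum(a != b for a, b in zip(assignment, original))
    if moved % 2 == 0:
        even += 1
    else:
        odd += 1

print(f"even movers: {even}, odd movers: {odd}, even > odd: {even > odd}")
```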

14.
Many machine-learning-based algorithms contain a training step that is done once. This step is usually computationally expensive, since it involves processing huge matrices. If the training profile is extracted from an evolving, dynamic dataset, it has to be updated whenever features of the training dataset change. This paper proposes an efficient way to update such a profile when the data is constantly evolving. We assume that the data is modeled by a kernel method and processed by a spectral decomposition. In many algorithms for clustering and classification, a low-dimensional representation of the affinity (kernel) graph of the embedded training dataset is computed and then used to classify newly arrived data points. We present methods for updating such embeddings of the training datasets incrementally, without performing the entire computation whenever a small number of the training samples change. Efficient computation of such an algorithm is critical in many web-based applications.
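One standard way to extend a fixed spectral embedding to newly arrived points without redoing the eigendecomposition is the Nyström out-of-sample formula, sketched below; this illustrates the setting the paper works in rather than its specific incremental-update algorithm.

```python
# Sketch: Gaussian-kernel spectral embedding of a training set, plus a Nystrom
# out-of-sample extension for newly arrived points, so the full
# eigendecomposition is not recomputed for every new sample.
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(5)
X_train = rng.standard_normal((100, 3))
K = gaussian_kernel(X_train, X_train)

# Low-dimensional embedding from the leading eigenpairs of the affinity matrix.
vals, vecs = np.linalg.eigh(K)
vals, vecs = vals[::-1][:5], vecs[:, ::-1][:, :5]      # keep 5 leading eigenpairs
embed_train = vecs * np.sqrt(np.maximum(vals, 0))      # training embedding

def embed_new(X_new):
    # Nystrom extension: project new points onto the existing eigenbasis.
    K_new = gaussian_kernel(X_new, X_train)
    return K_new @ vecs / np.sqrt(np.maximum(vals, 1e-12))

print(embed_train.shape, embed_new(rng.standard_normal((2, 3))).shape)  # (100, 5) (2, 5)
```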

15.
A new heuristic for the well-known (two-dimensional orthogonal) pallet loading problem (PLP) is proposed in this paper. This heuristic, referred to as the G4-heuristic, is based on the definition of the so-called G4-structure of packing patterns. The G4-structure is a generalization of the commonly used block structure of packing patterns, which requires the same orientation of the packed boxes within each block. The G4-heuristic yields an optimal solution in approximately 99% of the test instances and solves exactly all instances in which at most 50 boxes are contained in an optimal packing. Although the algorithm is pseudo-polynomial, the computational experiments reported show that instances with more than 200 packed boxes in an optimal solution can also be handled within a small amount of computational time. Moreover, no instance is known so far in which the gap between the optimal value and the value obtained by the G4-heuristic is larger than one box.
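The G4-structure generalizes block patterns in which every block holds boxes of a single orientation. The much simpler two-block heuristic below conveys what such structured packings look like; it is deliberately not the G4-heuristic itself.

```python
# Simple two-block heuristic for the pallet loading problem, to illustrate the
# kind of block structure that the G4-structure generalizes: cut the pallet at
# x, fill the left part with (l x w) boxes and the right part with (w x l) boxes.
def two_block_count(L, W, l, w):
    best = 0
    for x in range(0, L + 1):
        left = (x // l) * (W // w)          # boxes in orientation l-by-w
        right = ((L - x) // w) * (W // l)   # boxes rotated 90 degrees
        best = max(best, left + right)
    return best

# Example: 22 x 16 pallet, 5 x 3 boxes; the area bound is 23 boxes.
L, W, l, w = 22, 16, 5, 3
print("two-block packing:", two_block_count(L, W, l, w), "boxes")
print("area upper bound:", (L * W) // (l * w), "boxes")
```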

16.
The logistic regression framework has long been the most widely used statistical method for assessing customer credit risk. Recently, a more pragmatic approach has been adopted, where the primary goal is credit risk prediction rather than explanation. In this context, several classification techniques, such as support vector machines, have been shown to perform well on credit scoring. While the investigation of better classifiers is an important research topic, the specific methodology chosen in real-world applications has to deal with the challenges arising from the data collected in industry. Such data are often highly unbalanced, part of the information can be missing, and some common hypotheses, such as the i.i.d. assumption, can be violated. In this paper we present a case study based on a sample of IBM Italian customers, which presents all the challenges mentioned above. The main objective is to build and validate robust models able to handle missing information, class imbalance and non-i.i.d. data points. We define a missing data imputation method and propose the use of an ensemble classification technique, subagging, which is particularly suitable for highly unbalanced data such as credit scoring data. Both the imputation and subagging steps are embedded in a customized cross-validation loop, which handles dependencies between different credit requests. The methodology has been applied using several classifiers (kernel support vector machines, nearest neighbors, decision trees, AdaBoost) and their subagged versions. The use of subagging improves the performance of the base classifier, and we show that subagged decision trees achieve better performance while keeping the model simple and reasonably interpretable.
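Subagging draws subsamples without replacement rather than bootstrap samples; for unbalanced credit data a natural variant subsamples the majority class down to the minority size in every round. A stripped-down sketch with synthetic data and a decision-tree base learner:

```python
# Sketch of subagging with balanced subsamples for unbalanced credit data:
# each round draws (without replacement) all minority cases plus an equally
# sized subsample of the majority class, fits a tree, and probabilities are averaged.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(6)
n = 2000
X = rng.standard_normal((n, 10))
y = (rng.random(n) < 0.05).astype(int)            # ~5% defaults: highly unbalanced

def subagged_trees(X, y, rounds=25):
    minority, majority = np.where(y == 1)[0], np.where(y == 0)[0]
    models = []
    for _ in range(rounds):
        maj_sub = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, maj_sub])
        models.append(DecisionTreeClassifier(max_depth=4).fit(X[idx], y[idx]))
    return models

models = subagged_trees(X, y)
scores = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)  # averaged default scores
print("mean predicted default probability:", scores.mean())
```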

17.
We define and study the properties of a notion of morphism of enriched categories, intermediate between strong functor and profunctor. Suggested by bicategorical considerations, it turns out to be a generalization of the Mealy machine, well known since the 1950s in the theory of computation. When the base category is closed, we construct a classifying category for Mealy morphisms, as we call them. This is also seen to give the free tensor completion of an enriched category.

18.
In this paper, a matrix modular neural network (MMNN), based on task decomposition with subspace division by adaptive affinity propagation clustering, is developed to solve classification tasks. First, we propose an adaptive version of affinity propagation clustering, which is adopted to divide each class subspace into several clusters. Through these divisions of the class spaces, a classification problem can be decomposed into many binary classification subtasks between cluster pairs, which are much easier than the classification task in the original multi-class space. Each of these binary classification subtasks is solved by a neural network designed through a dynamic process. All designed network modules then form a matrix structure, which produces a matrix of outputs that is fed to an integration machine so that a classification decision can be made. Finally, the experimental results show that the proposed MMNN system has more powerful generalization capability than a single three-layer perceptron and than modular neural networks adopting other task decomposition techniques, while consuming less training time.
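The overall architecture (split each class into clusters with affinity propagation, train one small network per cross-class cluster pair, and integrate the resulting matrix of module outputs) can be sketched as below. The min-max integration rule and the synthetic data are stand-ins for the paper's integration machine and experiments.

```python
# Sketch of a matrix-modular setup: affinity propagation splits each class into
# clusters, a small network is trained for every (class-0 cluster, class-1
# cluster) pair, and the matrix of module outputs is integrated min-max style.
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(7)
X0 = np.vstack([rng.normal(c, 0.3, (40, 2)) for c in ((-2, -2), (2, 2))])   # class 0
X1 = np.vstack([rng.normal(c, 0.3, (40, 2)) for c in ((-2, 2), (2, -2))])   # class 1

def class_clusters(X):
    labels = AffinityPropagation(random_state=0).fit_predict(X)
    return [X[labels == k] for k in np.unique(labels)]

clusters0, clusters1 = class_clusters(X0), class_clusters(X1)

# The module matrix: one binary network per cluster pair.
modules = [[MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs", max_iter=2000,
                          random_state=0)
            .fit(np.vstack([A, B]), np.r_[np.zeros(len(A)), np.ones(len(B))])
            for B in clusters1] for A in clusters0]

def predict(x):
    # Integration: min over class-1 clusters, then max over class-0 clusters,
    # of each module's estimated probability of class 0.
    p0 = max(min(m.predict_proba(x[None])[0, 0] for m in row) for row in modules)
    return 0 if p0 >= 0.5 else 1

print("prediction for a point near (2, 2):", predict(np.array([2.0, 2.0])))  # expected: 0
```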

19.
Various classification theorems of thick subcategories of a triangulated category have been obtained in many areas of mathematics. In this paper, as a higher-dimensional version of the classification theorem of thick subcategories of the stable category of finitely generated representations of a finite p-group due to Benson, Carlson and Rickard, we consider classifying thick subcategories of the stable category of Cohen-Macaulay modules over a Gorenstein local ring. The main result of this paper yields a complete classification of the thick subcategories of the stable category of Cohen-Macaulay modules over a local hypersurface in terms of specialization-closed subsets of the prime ideal spectrum of the ring which are contained in its singular locus.

20.
A field of endomorphisms R is called a Nijenhuis operator if its Nijenhuis torsion vanishes. In this work we study a specific kind of singular points of R called points of scalar type. We show that the tangent space at such points possesses a natural structure of a left-symmetric algebra (also known as pre-Lie or Vinberg–Kozul algebras). Following Weinstein's approach to the linearization of Poisson structures, we state the linearization problem for Nijenhuis operators and give an answer in terms of non-degenerate left-symmetric algebras. In particular, in dimension 2, we give a classification of non-degenerate left-symmetric algebras in the smooth category and, with some small gaps, in the analytic one. These two cases, analytic and smooth, differ. We also obtain a complete classification of two-dimensional real left-symmetric algebras, which may be an interesting result in its own right.
