Similar Documents (20 results)
1.
In this work we address an extension of box clustering in supervised classification problems that makes use of optimization problems to refine the results obtained by agglomerative techniques. The central concept of box clustering is that of homogeneous boxes, which under some conditions give rise to overtrained classifiers. Thus, we focus our attention on pruning out redundant boxes, using the information gleaned from the other boxes, under the hypothesis that such a choice identifies simpler models with good predictive power. We propose a pruning method based on an integer optimization problem and a family of subproblems derived from the main one. The overall performance is then compared to the accuracy levels of competing methods on a wide range of real data sets. The method has proven to be robust, making it possible to derive a more compact system of boxes in the instance space with good performance on training and test data.
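The pruning step above is driven by an integer optimization problem. As a rough illustration, the sketch below sets up one plausible formulation (keep the fewest homogeneous boxes that still cover every training point of their class) using PuLP; the toy data and the pure set-cover objective are assumptions, not the authors' exact model.

```python
# Illustrative box-pruning sketch: choose a minimum subset of homogeneous boxes
# that still covers every training point of the class they belong to.
# Requires: pip install pulp
import pulp

# Hypothetical data: points (id -> class) and candidate homogeneous boxes
# (id -> (covered point ids, class)).
points = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}
boxes = {
    "b1": ({0, 1}, "A"),
    "b2": ({1, 2}, "A"),
    "b3": ({0, 1, 2}, "A"),
    "b4": ({3, 4}, "B"),
}

prob = pulp.LpProblem("box_pruning", pulp.LpMinimize)
use = {b: pulp.LpVariable(f"use_{b}", cat="Binary") for b in boxes}

# Objective: keep as few boxes as possible (redundant boxes get pruned).
prob += pulp.lpSum(use.values())

# Coverage: every point must stay covered by at least one retained box of its class.
for p, cls in points.items():
    covering = [use[b] for b, (cov, bcls) in boxes.items() if p in cov and bcls == cls]
    prob += pulp.lpSum(covering) >= 1, f"cover_point_{p}"

prob.solve(pulp.PULP_CBC_CMD(msg=False))
kept = [b for b in boxes if use[b].value() == 1]
print("retained boxes:", kept)  # e.g. ['b3', 'b4']
```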

2.
Chance constrained uncertain classification via robust optimization
This paper studies the problem of constructing robust classifiers when the training data are plagued with uncertainty. The problem is posed as a Chance-Constrained Program (CCP) which ensures that the uncertain data points are classified correctly with high probability. Unfortunately, such a CCP turns out to be intractable. The key novelty is in employing Bernstein bounding schemes to relax the CCP into a convex second-order cone program whose solution is guaranteed to satisfy the probabilistic constraint. Prior to this work, only Chebyshev-based relaxations were exploited in learning algorithms. Bernstein bounds employ richer partial information and hence can be far less conservative than Chebyshev bounds. Owing to this efficient modeling of uncertainty, the resulting classifiers achieve higher classification margins and hence better generalization. Methodologies for classifying uncertain test data points and error measures for evaluating classifiers robust to uncertain data are discussed. Experimental results on synthetic and real-world datasets show that the proposed classifiers are better equipped to handle data uncertainty and outperform the state of the art in many cases.
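To make the second-order cone structure concrete, the sketch below builds a Chebyshev-style robust linear classifier in cvxpy: each uncertain point must be separated with a margin inflated by its uncertainty ellipsoid. The Bernstein-based relaxation of the paper uses tighter, moment-dependent constants that are not reproduced here; the data and the margin parameter kappa are invented for the example.

```python
# Sketch of a robust linear classifier with second-order cone constraints:
# each uncertain point (mean x_i, covariance factor S_i) must be classified
# correctly with a margin that accounts for its uncertainty ellipsoid.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, d = 40, 2
X = np.vstack([rng.normal(-1, 0.3, (n // 2, d)), rng.normal(1, 0.3, (n // 2, d))])
y = np.array([-1] * (n // 2) + [1] * (n // 2))
S = [0.2 * np.eye(d) for _ in range(n)]   # square-root covariance of each point
kappa = 1.5                               # confidence parameter (assumed)

w, b = cp.Variable(d), cp.Variable()
xi = cp.Variable(n, nonneg=True)
constraints = [
    y[i] * (X[i] @ w + b) >= 1 - xi[i] + kappa * cp.norm(S[i] @ w, 2)
    for i in range(n)
]
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + 10 * cp.sum(xi)), constraints)
prob.solve()
print("robust separator w =", w.value, "b =", b.value)
```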

3.
In this work we address a technique for effectively clustering points into specific convex sets, called homogeneous boxes, having sides aligned with the coordinate axes (isothetic condition). The proposed clustering approach is based on homogeneity conditions rather than on a distance measure, and, although it was originally developed in the context of the logical analysis of data, it is now placed within the framework of supervised clustering. First, we introduce the basic concepts of box geometry; then, we consider a generalized clustering algorithm based on a class of graphs, called incompatibility graphs. For supervised classification problems, we consider classifiers based on box sets, and compare their overall performance to the accuracy levels of competing methods on a wide range of real data sets. The results show that the proposed method performs comparably with other supervised learning methods in terms of accuracy.
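The key primitive is the homogeneity test for an isothetic box: the axis-aligned bounding box of a group of same-class points must contain no point of another class. A minimal check, on made-up data, looks like this:

```python
# Sketch: test whether the isothetic bounding box of a group of same-class
# points is homogeneous, i.e. contains no point of a different class.
import numpy as np

def is_homogeneous_box(group_pts, other_pts):
    lo, hi = group_pts.min(axis=0), group_pts.max(axis=0)       # box corners
    inside = np.all((other_pts >= lo) & (other_pts <= hi), axis=1)
    return not inside.any(), (lo, hi)

class_a = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.2]])
class_b = np.array([[2.0, 2.0], [0.6, 0.5]])                    # second point falls inside

ok, box = is_homogeneous_box(class_a, class_b)
print("homogeneous:", ok, "box:", box)                          # homogeneous: False
```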

4.
In this work we apply tools developed for the study of fractal properties of time series to the problem of classifying defects in welded joints probed by ultrasonic techniques. We employ the fractal tools in a preprocessing step, producing curves with considerably fewer points than the original signals. These curves are then used in the classification step, which is realized by applying an extension of the Karhunen–Loève linear transformation. We show that our approach leads to small error rates, comparable with those obtained by more time-consuming methods based on non-linear classifiers.
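The pipeline is two-step: compress each signal into a short scale-dependent curve, then classify the curves after a Karhunen–Loève (PCA-type) projection. In the sketch below, the lag-variance curve is only an illustrative stand-in for the paper's fractal descriptors, and the signals and labels are synthetic.

```python
# Sketch of the two-step pipeline: (1) compress each ultrasonic signal into a
# short scale-dependent curve (a crude fractal-style descriptor), (2) apply a
# Karhunen-Loeve / PCA projection followed by a simple linear classifier.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def lag_variance_curve(signal, lags=(1, 2, 4, 8, 16, 32)):
    # log-variance of increments at several lags; far fewer points than the raw signal
    return np.array([np.log(np.var(signal[lag:] - signal[:-lag]) + 1e-12) for lag in lags])

rng = np.random.default_rng(1)
signals = rng.standard_normal((200, 1024)).cumsum(axis=1)   # synthetic "ultrasonic" traces
labels = rng.integers(0, 2, 200)                            # hypothetical defect classes

curves = np.array([lag_variance_curve(s) for s in signals])
clf = make_pipeline(PCA(n_components=3), LogisticRegression())
clf.fit(curves, labels)
print("training accuracy:", clf.score(curves, labels))
```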

5.
New challenges in knowledge extraction include interpreting and classifying data sets while simultaneously considering related information to confirm results or identify false positives. We discuss a data fusion algorithmic framework targeted at this problem. It includes separate base classifiers for each data type and a fusion method for combining the individual classifiers. The fusion method is an extension of current ensemble classification techniques and has the advantage of allowing data to remain in heterogeneous databases. In this paper, we focus on the applicability of such a framework to the protein phosphorylation prediction problem.
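A minimal late-fusion sketch of this idea (one base classifier per data source, outputs combined by averaging class probabilities) is shown below, with synthetic stand-ins for two heterogeneous sources; the averaging rule is a simplification of the ensemble-style fusion described above.

```python
# Sketch of late fusion: one base classifier per data source, combined by
# averaging predicted class probabilities.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 300
y = rng.integers(0, 2, n)
source_seq = rng.standard_normal((n, 20)) + y[:, None] * 0.5   # e.g. sequence-derived features
source_struct = rng.standard_normal((n, 5)) + y[:, None] * 0.3 # e.g. structural features

base_seq = LogisticRegression().fit(source_seq[:200], y[:200])
base_struct = RandomForestClassifier(n_estimators=100, random_state=0).fit(source_struct[:200], y[:200])

# Fusion: average the probability estimates of the per-source classifiers.
proba = 0.5 * base_seq.predict_proba(source_seq[200:]) + \
        0.5 * base_struct.predict_proba(source_struct[200:])
pred = proba.argmax(axis=1)
print("fused accuracy:", (pred == y[200:]).mean())
```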

6.
Social media, such as blogs and on-line forums, contain a huge amount of information that is typically unorganized and fragmented. An important and increasingly relevant issue is to classify on-line texts in order to detect possible anomalies. For example, on-line texts representing consumer opinions can be very valuable and profitable for companies, but they can also cause serious damage if they are negative or faked. In this contribution we present a novel statistical methodology, rooted in the context of classical text classification, to address such issues. Several classifiers have been proposed in the literature, among them support vector machines and naive Bayes classifiers. These approaches are not effective when coping with the problem of classifying texts belonging to an unknown author. To this aim, we propose a new method based on the combination of classification trees with non-parametric approaches, such as the Kruskal–Wallis and Brunner–Dette–Munk tests. The main application of the proposed method is the ability to classify an author as a new one, who is potentially trustworthy, or as a known one, whose texts are potentially faked.
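A very rough sketch of pairing a nonparametric test with a tree is given below: per-feature Kruskal–Wallis statistics comparing a questioned text with a known author's texts are fed to a decision tree that decides "same author" vs "new author". The stylometric features, the case construction and the tree depth are invented for illustration and do not reproduce the authors' procedure.

```python
# Rough sketch: per-feature Kruskal-Wallis statistics comparing a questioned
# text against a known author's texts, fed to a decision tree that flags the
# questioned text as "same author" or "new author".
import numpy as np
from scipy.stats import kruskal
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)

def kw_profile(known_texts, questioned):
    # one Kruskal-Wallis H statistic per stylometric feature column
    return np.array([kruskal(known_texts[:, j], questioned[:, j]).statistic
                     for j in range(known_texts.shape[1])])

# Synthetic "stylometric features" (e.g. word-length, punctuation rates, ...)
def make_case(same_author):
    known = rng.normal(0, 1, (30, 8))
    shift = 0.0 if same_author else 1.0
    questioned = rng.normal(shift, 1, (10, 8))
    return kw_profile(known, questioned)

X = np.array([make_case(same_author=bool(i % 2)) for i in range(200)])
y = np.array([i % 2 for i in range(200)])          # 1 = same author, 0 = new author

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X[:150], y[:150])
print("held-out accuracy:", tree.score(X[150:], y[150:]))
```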

7.
8.
An important aspect of the solution process for constraint satisfaction problems is to identify exclusion boxes, i.e., boxes that contain no feasible points. This paper presents a certificate of infeasibility for finding such boxes by solving a linearly constrained nonsmooth optimization problem. Furthermore, the constructed certificate can be used to enlarge an exclusion box by solving a nonlinearly constrained nonsmooth optimization problem.
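In spirit, certifying that a box is an exclusion box amounts to showing that the worst constraint violation stays strictly positive everywhere inside it. The sketch below only approximates this numerically (a derivative-free minimizer over the box, for a hypothetical constraint system), so it illustrates the idea rather than the paper's rigorous certificate.

```python
# Numerical sketch (not a rigorous certificate): a box is an exclusion box for
# the system g_i(x) <= 0 if the minimum over the box of max_i g_i(x) is positive.
import numpy as np
from scipy.optimize import differential_evolution

def g(x):
    # hypothetical constraint system g_i(x) <= 0
    return np.array([x[0] ** 2 + x[1] ** 2 - 1.0,   # inside the unit disk
                     x[0] + x[1] - 0.5])            # below the line x + y = 0.5

def worst_violation(x):
    return np.max(g(x))          # nonsmooth objective: largest constraint value

box = [(2.0, 3.0), (2.0, 3.0)]   # candidate exclusion box
res = differential_evolution(worst_violation, bounds=box, seed=0, tol=1e-9)

# If even the minimal worst-violation is > 0, no feasible point can lie in the box.
print("min of max_i g_i over the box:", res.fun)
print("likely exclusion box:", res.fun > 0)
```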

9.
Learning from examples is a frequently arising challenge, with a large number of algorithms proposed in the classification, data mining and machine learning literature. The evaluation of the quality of such algorithms is frequently carried out ex post, on an experimental basis: their performance is measured either by cross-validation on benchmark data sets or by clinical trials. Few of these approaches evaluate the learning process ex ante, on its own merits. In this paper, we discuss a property of rule-based classifiers which we call “justifiability”, and which focuses on the type of information extracted from the given training set in order to classify new observations. We investigate some interesting mathematical properties of justifiable classifiers. In particular, we establish the existence of justifiable classifiers, and we show that several well-known learning approaches, such as decision trees or nearest-neighbor-based methods, automatically provide justifiable classifiers. We also identify maximal subsets of observations which must be classified in the same way by every justifiable classifier. Finally, we illustrate by a numerical example that using classifiers based on “most justifiable” rules does not seem to lead to overfitting, even though it involves an element of optimization.

10.
Educational data mining tasks such as personalized exercise recommendation, question difficulty prediction and learner modeling rely on student response data and on knowledge-point annotations of test questions; at present these knowledge points are labeled manually. Automatically annotating question knowledge points with machine learning methods is therefore an urgent need. To address automatic knowledge-point annotation for massive question banks, this paper proposes an ensemble-learning-based method for multi-knowledge-point labeling of test questions. First, the knowledge-point labeling problem is formally defined, and a knowledge graph of knowledge points is built from textbook tables of contents and domain knowledge to serve as the label set. Second, multiple support vector machines are trained as base classifiers within an ensemble learning scheme; the best-performing base classifiers are selected and combined to form the multi-knowledge-point labeling model. Finally, using high-school mathematics questions from the database of an online education platform as the experimental data set, the proposed method is applied to predict the knowledge points examined by each question and achieves good results.
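A minimal sketch of the labeling idea (several SVM base classifiers per knowledge point trained on bootstrap samples, with only the stronger ones kept and combined by voting) is given below. The features, labels and selection threshold are placeholders, not the paper's actual pipeline.

```python
# Sketch of ensemble multi-label knowledge-point tagging: for each knowledge
# point, train several LinearSVC base classifiers on bootstrap samples, keep
# only those above a validation-accuracy threshold, and predict by majority vote.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(4)
n_items, n_feats, n_kps = 500, 50, 6           # questions, text features, knowledge points
X = rng.standard_normal((n_items, n_feats))
Y = (rng.random((n_items, n_kps)) < 0.3).astype(int)   # multi-hot knowledge-point labels
X_tr, Y_tr, X_va, Y_va = X[:350], Y[:350], X[350:], Y[350:]

def ensemble_for_label(y_tr, y_va, n_base=7, threshold=0.55):
    kept = []
    for _ in range(n_base):
        idx = rng.integers(0, len(X_tr), len(X_tr))          # bootstrap sample
        clf = LinearSVC().fit(X_tr[idx], y_tr[idx])
        if clf.score(X_va, y_va) >= threshold:               # keep strong base classifiers only
            kept.append(clf)
    return kept

ensembles = [ensemble_for_label(Y_tr[:, k], Y_va[:, k]) for k in range(n_kps)]

def predict(x):
    votes = [np.mean([clf.predict(x[None])[0] for clf in ens]) if ens else 0.0
             for ens in ensembles]
    return (np.array(votes) >= 0.5).astype(int)              # predicted knowledge points

print("predicted knowledge points for one question:", predict(X_va[0]))
```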

11.
We approach the problem of classifying injective modules over an integral domain by considering the class of semistar Noetherian domains. When working with such domains, one has to focus on semistar ideals; as a consequence for modules, we restrict our study to the class of injective hulls of co-semistar modules, those in which the annihilator ideal of each nonzero element is semistar. We obtain a complete classification of this class by describing its elements as injective hulls of uniquely determined direct sums of indecomposable injective modules; if, moreover, we consider stable semistar operations, then we can further improve this result, obtaining a natural generalization of the classical Noetherian case. Our approach provides a unified treatment of results on injective modules over various kinds of domains obtained by Matlis, Cailleau, Beck, Fuchs and Kim–Kim–Park.

12.
The notion of homomorphism homogeneity was introduced by Cameron and Nešetřil as a natural generalization of the classical model-theoretic notion of homogeneity. A relational structure is called homomorphism homogeneous (HH) if every homomorphism between finite substructures extends to an endomorphism. It is called polymorphism homogeneous (PH) if every finite power of the structure is homomorphism homogeneous. Despite the similarity of the definitions, the HH and PH structures lead a life quite separate from the homogeneous structures. While the classification theory of homogeneous structures is dominated by Fraïssé theory, other methods are needed for classifying HH and PH structures. In this paper we give a complete classification of HH countable tournaments (with loops allowed). We use this result to derive a classification of countable PH tournaments. The method of classification is designed to be useful also for other classes of relational structures. Our results extend previous research on the classification of finite HH and PH tournaments by Ilić, Mašulović, Nenadov, and the first author.

13.
Objects lying in four different boxes are rearranged in such a way that the number of objects in each box stays the same. Askey, Ismail, and Koornwinder proved that the cardinality of the set of rearrangements for which the number of objects changing boxes is even exceeds the cardinality of the set of rearrangements for which that number is odd. We give a simple counting proof of this fact.
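A tiny brute-force check of the statement, reading a "rearrangement" as any reassignment of labeled objects to boxes that preserves every box size, can be run as follows (the box sizes are arbitrary):

```python
# Brute-force check (labeled-objects reading): among all reassignments of the
# objects that keep every box size fixed, those moving an even number of
# objects outnumber those moving an odd number.
from itertools import permutations

box_sizes = (2, 1, 1, 2)                         # arbitrary small example with four boxes
original = tuple(b for b, size in enumerate(box_sizes) for _ in range(size))

even = odd = 0
for assignment in set(permutations(original)):   # all distinct box-label sequences
    moved = sum(a != b for a, b in zip(assignment, original))
    if moved % 2 == 0:
        even += 1
    else:
        odd += 1

print(f"even movers: {even}, odd movers: {odd}, even > odd: {even > odd}")
```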

14.
Many machine-learning-based algorithms contain a training step that is done once. This step is usually computationally expensive, since it involves processing huge matrices. If the training profile is extracted from an evolving, dynamic dataset, it has to be updated whenever features of the training dataset change. This paper proposes an efficient way to update such a profile when the data is constantly evolving. We assume that the data is modeled by a kernel method and processed by a spectral decomposition. In many algorithms for clustering and classification, a low-dimensional representation of the affinity (kernel) graph of the embedded training dataset is computed and then used to classify newly arrived data points. We present methods for updating such embeddings of the training datasets incrementally, without performing the entire computation whenever a small number of the training samples change. Efficient computation of such an algorithm is critical in many web-based applications.
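One standard way to extend a fixed spectral embedding to newly arrived points without redoing the eigendecomposition is the Nyström out-of-sample formula, sketched below; this illustrates the setting the paper works in rather than its specific incremental-update algorithm.

```python
# Sketch: Gaussian-kernel spectral embedding of a training set, plus a Nystrom
# out-of-sample extension for newly arrived points, so the full
# eigendecomposition is not recomputed for every new sample.
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(5)
X_train = rng.standard_normal((100, 3))
K = gaussian_kernel(X_train, X_train)

# Low-dimensional embedding from the leading eigenpairs of the affinity matrix.
vals, vecs = np.linalg.eigh(K)
vals, vecs = vals[::-1][:5], vecs[:, ::-1][:, :5]      # keep 5 leading eigenpairs
embed_train = vecs * np.sqrt(np.maximum(vals, 0))      # training embedding

def embed_new(X_new):
    # Nystrom extension: project new points onto the existing eigenbasis.
    K_new = gaussian_kernel(X_new, X_train)
    return K_new @ vecs / np.sqrt(np.maximum(vals, 1e-12))

print(embed_train.shape, embed_new(rng.standard_normal((2, 3))).shape)  # (100, 5) (2, 5)
```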

15.
A new heuristic for the well-known (two-dimensional orthogonal) pallet loading problem (PLP) is proposed in this paper. This heuristic, referred to as the G4-heuristic, is based on the definition of the so-called G4-structure of packing patterns. The G4-structure is a generalization of the commonly used block structure of packing patterns, which requires the same orientation of the packed boxes within each block. The G4-heuristic yields an optimal solution in approximately 99% of the test instances and solves exactly all instances in which at most 50 boxes are contained in an optimal packing. Although the algorithm is pseudo-polynomial, the computational experiments reported show that instances with more than 200 packed boxes in an optimal solution can also be handled within a small amount of computational time. Moreover, no instance is known so far in which the gap between the optimal value and the value obtained by the G4-heuristic is larger than one box.
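The G4-structure generalizes block patterns in which every block holds boxes of a single orientation. The much simpler two-block heuristic below conveys what such structured packings look like; it is deliberately not the G4-heuristic itself.

```python
# Simple two-block heuristic for the pallet loading problem, to illustrate the
# kind of block structure that the G4-structure generalizes: cut the pallet at
# x, fill the left part with (l x w) boxes and the right part with (w x l) boxes.
def two_block_count(L, W, l, w):
    best = 0
    for x in range(0, L + 1):
        left = (x // l) * (W // w)          # boxes in orientation l-by-w
        right = ((L - x) // w) * (W // l)   # boxes rotated 90 degrees
        best = max(best, left + right)
    return best

# Example: 22 x 16 pallet, 5 x 3 boxes; the area bound is 23 boxes.
L, W, l, w = 22, 16, 5, 3
print("two-block packing:", two_block_count(L, W, l, w), "boxes")
print("area upper bound:", (L * W) // (l * w), "boxes")
```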

16.
The logistic regression framework has long been the most widely used statistical method for assessing customer credit risk. Recently, a more pragmatic approach has been adopted, where the primary goal is credit risk prediction rather than explanation. In this context, several classification techniques, such as support vector machines, have been shown to perform well on credit scoring. While the investigation of better classifiers is an important research topic, the specific methodology chosen in real-world applications has to deal with the challenges arising from the data collected in industry. Such data are often highly unbalanced, part of the information can be missing, and some common hypotheses, such as the i.i.d. assumption, can be violated. In this paper we present a case study based on a sample of IBM Italian customers, which presents all the challenges mentioned above. The main objective is to build and validate robust models able to handle missing information, class imbalance and non-i.i.d. data points. We define a missing data imputation method and propose the use of an ensemble classification technique, subagging, which is particularly suitable for highly unbalanced data such as credit scoring data. Both the imputation and subagging steps are embedded in a customized cross-validation loop, which handles dependencies between different credit requests. The methodology has been applied using several classifiers (kernel support vector machines, nearest neighbors, decision trees, AdaBoost) and their subagged versions. The use of subagging improves the performance of the base classifier, and we show that subagged decision trees achieve better performance while keeping the model simple and reasonably interpretable.
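Subagging draws subsamples without replacement rather than bootstrap samples; for unbalanced credit data a natural variant subsamples the majority class down to the minority size in every round. A stripped-down sketch with synthetic data and a decision-tree base learner:

```python
# Sketch of subagging with balanced subsamples for unbalanced credit data:
# each round draws (without replacement) all minority cases plus an equally
# sized subsample of the majority class, fits a tree, and probabilities are averaged.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(6)
n = 2000
X = rng.standard_normal((n, 10))
y = (rng.random(n) < 0.05).astype(int)            # ~5% defaults: highly unbalanced

def subagged_trees(X, y, rounds=25):
    minority, majority = np.where(y == 1)[0], np.where(y == 0)[0]
    models = []
    for _ in range(rounds):
        maj_sub = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, maj_sub])
        models.append(DecisionTreeClassifier(max_depth=4).fit(X[idx], y[idx]))
    return models

models = subagged_trees(X, y)
scores = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)  # averaged default scores
print("mean predicted default probability:", scores.mean())
```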

17.
We define and study the properties of a notion of morphism of enriched categories, intermediate between strong functor and profunctor. Suggested by bicategorical considerations, it turns out to be a generalization of the Mealy machine, well known since the 1950s in the theory of computation. When the base category is closed, we construct a classifying category for Mealy morphisms, as we call them. This is also seen to give the free tensor completion of an enriched category.

18.
In this paper, a matrix modular neural network (MMNN), based on task decomposition with subspace division by adaptive affinity propagation clustering, is developed to solve classification tasks. First, we propose an adaptive version of affinity propagation clustering, which is adopted to divide each class subspace into several clusters. Through these divisions of the class spaces, a classification problem can be decomposed into many binary classification subtasks between cluster pairs, which are much easier than the classification task in the original multi-class space. Each of these binary classification subtasks is solved by a neural network designed through a dynamic process. All designed network modules then form a matrix structure, which produces a matrix of outputs that is fed to an integration machine so that a classification decision can be made. Finally, the experimental results show that the proposed MMNN system has more powerful generalization capability than a single three-layer perceptron and than modular neural networks adopting other task decomposition techniques, while consuming less training time.
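The overall architecture (split each class into clusters with affinity propagation, train one small network per cross-class cluster pair, and integrate the resulting matrix of module outputs) can be sketched as below. The min-max integration rule and the synthetic data are stand-ins for the paper's integration machine and experiments.

```python
# Sketch of a matrix-modular setup: affinity propagation splits each class into
# clusters, a small network is trained for every (class-0 cluster, class-1
# cluster) pair, and the matrix of module outputs is integrated min-max style.
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(7)
X0 = np.vstack([rng.normal(c, 0.3, (40, 2)) for c in ((-2, -2), (2, 2))])   # class 0
X1 = np.vstack([rng.normal(c, 0.3, (40, 2)) for c in ((-2, 2), (2, -2))])   # class 1

def class_clusters(X):
    labels = AffinityPropagation(random_state=0).fit_predict(X)
    return [X[labels == k] for k in np.unique(labels)]

clusters0, clusters1 = class_clusters(X0), class_clusters(X1)

# The module matrix: one binary network per cluster pair.
modules = [[MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs", max_iter=2000,
                          random_state=0)
            .fit(np.vstack([A, B]), np.r_[np.zeros(len(A)), np.ones(len(B))])
            for B in clusters1] for A in clusters0]

def predict(x):
    # Integration: min over class-1 clusters, then max over class-0 clusters,
    # of each module's estimated probability of class 0.
    p0 = max(min(m.predict_proba(x[None])[0, 0] for m in row) for row in modules)
    return 0 if p0 >= 0.5 else 1

print("prediction for a point near (2, 2):", predict(np.array([2.0, 2.0])))  # expected: 0
```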

19.
Various classification theorems of thick subcategories of a triangulated category have been obtained in many areas of mathematics. In this paper, as a higher-dimensional version of the classification theorem of thick subcategories of the stable category of finitely generated representations of a finite p-group due to Benson, Carlson and Rickard, we consider classifying thick subcategories of the stable category of Cohen-Macaulay modules over a Gorenstein local ring. The main result of this paper yields a complete classification of the thick subcategories of the stable category of Cohen-Macaulay modules over a local hypersurface in terms of specialization-closed subsets of the prime ideal spectrum of the ring which are contained in its singular locus.

20.
A field of endomorphisms R is called a Nijenhuis operator if its Nijenhuis torsion vanishes. In this work we study a specific kind of singular points of R called points of scalar type. We show that the tangent space at such points possesses a natural structure of a left-symmetric algebra (also known as pre-Lie or Vinberg–Kozul algebras). Following Weinstein's approach to the linearization of Poisson structures, we state the linearization problem for Nijenhuis operators and give an answer in terms of non-degenerate left-symmetric algebras. In particular, in dimension 2, we give a classification of non-degenerate left-symmetric algebras in the smooth category and, with some small gaps, in the analytic one. These two cases, analytic and smooth, differ. We also obtain a complete classification of two-dimensional real left-symmetric algebras, which may be an interesting result in its own right.
