Similar Documents
20 similar documents found (search time: 31 ms).
1.
Bayesian networks (BNs) provide a powerful graphical model for encoding the probabilistic relationships among a set of variables, and hence can naturally be used for classification. However, Bayesian network classifiers (BNCs) learned in the common way, using likelihood scores, usually achieve only mediocre classification accuracy, because these scores are not specific to classification but rather suit a general inference problem. We propose risk minimization by cross validation (RMCV) using the 0/1 loss function, a classification-oriented score for unrestricted BNCs. RMCV extends the classification-oriented scores commonly used in learning restricted BNCs and non-BN classifiers. Using small real and synthetic problems that allow learning all possible graphs, we empirically demonstrate RMCV's superiority over marginal and class-conditional likelihood-based scores with respect to classification accuracy. Experiments using twenty-two real-world datasets show that BNCs learned using an RMCV-based algorithm significantly outperform the naive Bayesian classifier (NBC), tree-augmented NBC (TAN), and other BNCs learned using marginal or conditional likelihood scores, and are on par with state-of-the-art non-BN classifiers such as the support vector machine, neural network, and classification tree. These experiments also show that an optimized version of RMCV is faster than all other unrestricted BNCs and comparable with the neural network with respect to run-time. The main conclusion from our experiments is that unrestricted BNCs, when learned properly, can be a good alternative to restricted BNCs and traditional machine-learning classifiers with respect to both accuracy and efficiency.
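A minimal sketch of the RMCV scoring idea: each candidate network structure is scored by its cross-validated 0/1 loss rather than by likelihood. The `make_classifier` factory is a hypothetical stand-in; a real implementation would build and fit a BNC with a fixed candidate graph.

```python
import numpy as np

def rmcv_score(make_classifier, X, y, k=5, seed=0):
    """Cross-validated 0/1-loss score for one candidate structure.

    make_classifier: hypothetical factory returning an object with
    fit(X, y) and predict(X); structure learning itself is not shown.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errors = 0
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        clf = make_classifier()
        clf.fit(X[train], y[train])
        errors += np.sum(clf.predict(X[test]) != y[test])
    return errors / len(y)  # lower is better; minimized over candidate graphs
```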

2.
The logistic regression framework has long been the most widely used statistical method for assessing customer credit risk. Recently, a more pragmatic approach has been adopted, where the first issue is credit risk prediction rather than explanation. In this context, several classification techniques have been shown to perform well on credit scoring, support vector machines among others. While the investigation of better classifiers is an important research topic, the specific methodology chosen in real-world applications has to deal with the challenges arising from the data collected in the industry. Such data are often highly unbalanced, part of the information can be missing, and some common hypotheses, such as the i.i.d. assumption, can be violated. In this paper we present a case study based on a sample of IBM Italian customers, which presents all the challenges mentioned above. The main objective is to build and validate robust models able to handle missing information, class imbalance, and non-i.i.d. data points. We define a missing-data imputation method and propose the use of an ensemble classification technique, subagging, particularly suitable for highly unbalanced data such as credit scoring data. Both the imputation and subagging steps are embedded in a customized cross-validation loop that handles dependencies between different credit requests. The methodology has been applied using several classifiers (kernel support vector machines, nearest neighbors, decision trees, AdaBoost) and their subagged versions. The use of subagging improves the performance of the base classifier, and we show that subagged decision trees achieve better performance while keeping the model simple and reasonably interpretable.
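A minimal sketch of a balanced subagging variant for an unbalanced binary problem (the paper's custom CV loop over dependent credit requests is not reproduced): each base tree sees all minority cases plus an equally sized subsample, drawn without replacement, of the majority class.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def subagged_predict(X_train, y_train, X_test, n_models=50, seed=0):
    """Subagging sketch: minority class is labeled 1, majority 0."""
    rng = np.random.default_rng(seed)
    minority = np.where(y_train == 1)[0]
    majority = np.where(y_train == 0)[0]
    votes = np.zeros(len(X_test))
    for _ in range(n_models):
        sub = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, sub])
        tree = DecisionTreeClassifier(max_depth=4)
        tree.fit(X_train[idx], y_train[idx])
        votes += tree.predict(X_test)
    return (votes / n_models >= 0.5).astype(int)  # majority vote
```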

3.
While graphical models for continuous data (Gaussian graphical models) and discrete data (Ising models) have been extensively studied, there is little work on graphical models for datasets with both continuous and discrete variables (mixed data), which are common in many scientific applications. We propose a novel graphical model for mixed data, which is simple enough to be suitable for high-dimensional data, yet flexible enough to represent all possible graph structures. We develop a computationally efficient regression-based algorithm for fitting the model by focusing on the conditional log-likelihood of each variable given the rest. The parameters have a natural group structure, and sparsity in the fitted graph is attained by incorporating a group lasso penalty, approximated by a weighted lasso penalty for computational efficiency. We demonstrate the effectiveness of our method through an extensive simulation study and apply it to a music annotation dataset (CAL500), obtaining a sparse and interpretable graphical model relating the continuous features of the audio signal to binary variables such as genre, emotions, and usage associated with particular songs. While we focus on binary discrete variables for the main presentation, we also show that the proposed methodology can be easily extended to general discrete variables.
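A minimal node-wise sketch of the regression idea, using a plain l1 penalty as a stand-in for the (weighted-lasso-approximated) group lasso: regress each variable on all the others, and read candidate edges off the nonzero coefficients.

```python
import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression

def nodewise_fit(X_cont, X_bin, alpha=0.1):
    """Node-wise conditional fits for mixed data: lasso regressions for
    continuous nodes, l1-penalized logistic regressions for binary nodes."""
    p = X_cont.shape[1]
    q = X_bin.shape[1]
    supports = []
    for j in range(p):  # continuous node j given the rest
        others = np.hstack([np.delete(X_cont, j, axis=1), X_bin])
        coef = Lasso(alpha=alpha).fit(others, X_cont[:, j]).coef_
        supports.append(coef)
    for j in range(q):  # binary node j given the rest
        others = np.hstack([X_cont, np.delete(X_bin, j, axis=1)])
        coef = LogisticRegression(penalty="l1", solver="liblinear",
                                  C=1.0 / alpha).fit(others, X_bin[:, j]).coef_
        supports.append(coef.ravel())
    return supports  # a full method would symmetrize the edge supports
```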

4.
We describe various sets of conditional independence relationships, sufficient for qualitatively comparing non-vanishing squared partial correlations of a Gaussian random vector. These sufficient conditions are satisfied by several graphical Markov models. Rules for comparing degree of association among the vertices of such Gaussian graphical models are also developed. We apply these rules to compare conditional dependencies on Gaussian trees. In particular for trees, we show that such dependence can be completely characterised by the length of the paths joining the dependent vertices to each other and to the vertices conditioned on. We also apply our results to postulate rules for model selection for polytree models. Our rules apply to mutual information of Gaussian random vectors as well.
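For concreteness, the quantities being compared can be computed directly from the covariance matrix: the partial correlation of a pair given all remaining variables is read off the precision matrix via the standard identity used below.

```python
import numpy as np

def squared_partial_corr(Sigma):
    """Squared partial correlations of each pair given all remaining
    variables, from the precision matrix Omega = inv(Sigma) via
    rho_{ij.rest} = -Omega_ij / sqrt(Omega_ii * Omega_jj)."""
    Omega = np.linalg.inv(Sigma)
    d = np.sqrt(np.diag(Omega))
    rho = -Omega / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho ** 2
```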

5.
This paper presents an important development of a novel non-parametric object classification technique, namely CaRBS (Classification and Ranking Belief Simplex), to enable regression-type analyses. Termed RCaRBS, it is, like CaRBS, an evidence-based technique, with its mathematical operations based on the Dempster–Shafer theory of evidence. Its use is demonstrated here by modelling the strategic fit of a set of public organizations. In addition to the consideration of the predictive fit of a series of models, graphical exploration of the contribution of individual variables in the derived models is also undertaken when using RCaRBS. Comparison analyses, including fivefold cross-validation, are carried out using multiple regression and neural network models. The findings highlight that RCaRBS achieves test-set predictive fit on a par with regression and better fit than neural networks. The RCaRBS technique also enables researchers to explore non-linear relationships (contributions) between variables in greater detail than either regression or neural network models.
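RCaRBS's internals are not spelled out in the abstract, but its operations rest on Dempster–Shafer evidence combination. A minimal sketch of Dempster's rule for two mass functions over focal sets (the example focal elements are illustrative):

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination for mass functions keyed by
    frozenset focal elements, e.g.
    m = {frozenset({'fit'}): 0.6, frozenset({'fit', 'unfit'}): 0.4}."""
    combined, conflict = {}, 0.0
    for (a, pa), (b, pb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + pa * pb
        else:
            conflict += pa * pb  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}
```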

6.
This paper presents the use of graphical models and copula functions in Estimation of Distribution Algorithms (EDAs) for solving multivariate optimization problems. It is shown in this work how the incorporation of copula functions and graphical models for modeling the dependencies among variables provides some theoretical advantages over traditional EDAs. By means of copula functions and two well-known graphical models, this paper presents a novel approach for defining new EDAs. Each dependence is modeled by a copula function chosen from a predefined set of six functions that aim to cover a wide range of inter-relations. It is also shown how the use of mutual information in the learning of graphical models implies a natural way of employing copula entropies. The experimental results on separable and non-separable functions show that the two new EDAs, which adopt copula functions to model dependencies, perform better than their original versions based on Gaussian variables.
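To illustrate one model-and-sample step of a copula-based EDA: the sketch below uses only the Gaussian copula (the paper draws from a set of six) with empirical marginals, assuming `elite` holds the selected solutions of the current generation.

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_sample(elite, n_samples, seed=0):
    """Map elite solutions to normal scores, estimate their correlation,
    draw correlated normals, and map back through empirical quantiles."""
    rng = np.random.default_rng(seed)
    n, d = elite.shape
    # ranks -> uniform pseudo-observations -> normal scores
    u = (np.argsort(np.argsort(elite, axis=0), axis=0) + 1) / (n + 1)
    z = norm.ppf(u)
    corr = np.corrcoef(z, rowvar=False)
    draws = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    u_new = norm.cdf(draws)
    # empirical marginal quantiles of the elite set as inverse marginals
    return np.column_stack(
        [np.quantile(elite[:, j], u_new[:, j]) for j in range(d)])
```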

7.
We explore a Bayesian framework for constructing combinations of classifier outputs, as a means to improving overall classification results. We propose a sequential Bayesian framework to estimate the posterior probability of being in a certain class given multiple classifiers. This framework, which employs meta-Gaussian modelling but makes no assumptions about the distribution of classifier outputs, allows us to capture nonlinear dependencies between the combined classifiers and individuals. An important property of our method is that it produces a combined classifier that dominates the individuals upon which it is based in terms of Bayes risk, error rate, and receiver operating characteristic (ROC) curve. To illustrate the method, we show empirical results from the combination of credit scores generated from four different scoring models.
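As a simplified stand-in for the meta-Gaussian combination (not the paper's sequential scheme), the sketch below fits class-conditional multivariate Gaussians to the stacked classifier scores and combines them by Bayes' rule:

```python
import numpy as np
from scipy.stats import multivariate_normal

def combine_scores(train_scores, train_labels, test_scores, prior1=0.5):
    """Posterior P(class 1 | score vector) under class-conditional
    Gaussians; rows of *_scores hold one score per base classifier."""
    s1 = train_scores[train_labels == 1]
    s0 = train_scores[train_labels == 0]
    g1 = multivariate_normal(s1.mean(axis=0), np.cov(s1, rowvar=False))
    g0 = multivariate_normal(s0.mean(axis=0), np.cov(s0, rowvar=False))
    l1 = prior1 * g1.pdf(test_scores)
    l0 = (1 - prior1) * g0.pdf(test_scores)
    return l1 / (l1 + l0)
```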

8.
Mathematical programming (MP) discriminant analysis models can be used to develop classification models for assigning observations of unknown class membership to one of a number of specified classes using values of a set of features associated with each observation. Since most MP discriminant analysis models generate linear discriminant functions, these MP models are generally used to develop linear classification models. Nonlinear classifiers may, however, have better classification performance than linear classifiers. In this paper, a mixed integer programming model is developed to generate nonlinear discriminant functions composed of monotone piecewise-linear marginal utility functions for each feature and the cut-off value for class membership. It is also shown that this model can be extended for feature selection. The performance of this new MP model for two-group discriminant analysis is compared with statistical discriminant analysis and other MP discriminant analysis models using a real problem and a number of simulated problem sets.
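To make the MP discriminant idea concrete, here is the classical linear "minimize the sum of deviations" (MSD) baseline solved as an LP; the paper's model replaces the linear score with piecewise-linear marginal utilities and requires a mixed integer solver, which is not reproduced here.

```python
import numpy as np
from scipy.optimize import linprog

def msd_discriminant(X1, X2):
    """Classical MSD LP: find w, c with w.x <= c - 1 for group 1 and
    w.x >= c + 1 for group 2, softened by deviations d_i >= 0 whose
    sum is minimized. Variables are stacked as [w (p), c (1), d (n)]."""
    n1, p = X1.shape
    n2 = X2.shape[0]
    n = n1 + n2
    cost = np.concatenate([np.zeros(p + 1), np.ones(n)])
    A = np.zeros((n, p + 1 + n))
    b = -np.ones(n)
    A[:n1, :p] = X1;  A[:n1, p] = -1.0    # group 1:  w.x - c - d <= -1
    A[n1:, :p] = -X2; A[n1:, p] = 1.0     # group 2: -w.x + c - d <= -1
    A[np.arange(n), p + 1 + np.arange(n)] = -1.0
    bounds = [(None, None)] * (p + 1) + [(0, None)] * n
    res = linprog(cost, A_ub=A, b_ub=b, bounds=bounds)
    return res.x[:p], res.x[p]  # weights w and cut-off c
```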

9.
Multi-dimensional classification aims at finding a function that assigns a vector of class values to a given vector of features. In this paper, this problem is tackled by a general family of models, called multi-dimensional Bayesian network classifiers (MBCs). This probabilistic graphical model organizes class and feature variables as three different subgraphs: class subgraph, feature subgraph, and bridge (from class to features) subgraph. Under the standard 0-1 loss function, the most probable explanation (MPE) must be computed, for which we provide theoretical results in both general MBCs and in MBCs decomposable into maximal connected components. Moreover, when computing the MPE, the space of class-value vectors is traversed following a special ordering (a Gray code). Under other loss functions defined in accordance with a decomposable structure, we derive theoretical results on how to minimize the expected loss. Besides these inference issues, the paper presents flexible algorithms for learning MBC structures from data based on filter, wrapper and hybrid approaches. The cardinality of the search space is also given. New performance evaluation metrics adapted from the single-class setting are introduced. Experimental results with three benchmark data sets are encouraging, outperforming state-of-the-art algorithms for multi-label classification.
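A minimal sketch of the Gray-code traversal used when computing the MPE: consecutive class vectors differ in a single bit, which is what lets a real MBC implementation update the score incrementally. `joint_prob` is a hypothetical callable standing in for the MBC's conditional joint over class values.

```python
def mpe_by_gray_code(n_classes, joint_prob):
    """Brute-force MPE over binary class vectors in Gray-code order.

    joint_prob: hypothetical callable returning P(classes = c | evidence).
    """
    best, best_p = None, -1.0
    for i in range(2 ** n_classes):
        g = i ^ (i >> 1)                       # Gray code of i
        c = [(g >> k) & 1 for k in range(n_classes)]
        p = joint_prob(c)
        if p > best_p:
            best, best_p = c, p
    return best, best_p
```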

10.
Drummond and Holte introduced the theory of cost curves, a graphical technique for visualizing the performance of binary classifiers over the full range of possible class distributions and misclassification costs. In this paper, we use this concept to develop the Improvement Curve, a new performance metric for predictive models. Improvement curves are more user-friendly than cost curves and enable direct inter-classifier comparisons. We apply improvement curves to measure risk-assessment processes at Canada’s marine ports. We illustrate how implementing even a basic predictive model would lead to improved efficiency for the Canada Border Services Agency, regardless of class distributions or misclassification costs.
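For reference, a classifier's cost curve is a straight line in normalized expected cost as a function of the probability-cost value pc, which folds class priors and misclassification costs into one axis:

```python
import numpy as np

def cost_curve(fpr, fnr, n_points=101):
    """Cost-curve line for one classifier:
    NEC(pc) = fnr * pc + fpr * (1 - pc)."""
    pc = np.linspace(0.0, 1.0, n_points)
    return pc, fnr * pc + fpr * (1.0 - pc)

# Trivial classifiers bound the plot: always-negative gives NEC = pc,
# always-positive gives NEC = 1 - pc. An improvement curve compares a
# model's NEC against such a baseline process across all pc values.
```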

11.
Many simple and complex methods have been developed to solve the classification problem. Boosting is one of the best known techniques for improving the accuracy of classifiers. However, boosting is prone to overfitting with noisy data and the final model is difficult to interpret. Some boosting methods, including AdaBoost, are also very sensitive to outliers. In this article we propose a new method, GA-Ensemble, which directly solves for the set of weak classifiers and their associated weights using a genetic algorithm. The genetic algorithm utilizes a new penalized fitness function that limits the number of weak classifiers and controls the effects of outliers by maximizing an appropriately chosen $p$th percentile of margins. We compare the test set error rates of GA-Ensemble, AdaBoost, and GentleBoost (an outlier-resistant version of AdaBoost) using several artificial data sets and real-world data sets from the UC-Irvine Machine Learning Repository. GA-Ensemble is found to be more resistant to outliers and results in simpler predictive models than AdaBoost and GentleBoost.
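A minimal sketch of the penalized fitness idea, assuming margins are already computed as the (normalized) signed weighted vote on each training example; the penalty weight `lam` is an illustrative tuning choice:

```python
import numpy as np

def penalized_fitness(margins, active, p=10, lam=0.01):
    """Fitness of one candidate ensemble: maximize the p-th percentile
    of the voting margins while penalizing ensemble size.

    margins: per-example margin y_i * (weighted vote) / sum(|weights|)
    active:  boolean mask of weak classifiers included in the ensemble
    """
    return np.percentile(margins, p) - lam * np.sum(active)
```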

12.
Neural network classifiers have been widely used in classification due to their adaptive and parallel processing ability. This paper concerns the classification of underwater passive sonar signals radiated by ships using neural networks. The classification process can be divided into two stages: one is signal preprocessing and feature extraction, the other is the recognition process. In the preprocessing and feature extraction stage, the wavelet transform (WT) is used to extract tonal features from the average power spectral density (APSD) of the input data. In the classification stage, two kinds of neural network classifiers are used to evaluate the classification results: the hyperplane-based classifier—Multilayer Perceptron (MLP)—and the kernel-based classifier—Adaptive Kernel Classifier (AKC). The experimental results obtained from MLP with different configurations and algorithms show that the bipolar continuous function possesses a wider range and a higher value of the learning rate than the unipolar continuous function. Besides, AKC with fixed radius (modified AKC) sometimes gives better performance than AKC, but the former takes more training time in selecting the width of the receptive field. More importantly, networks trained on tonal features extracted by the WT achieve correct classification rates of 96% or 94%, whereas training on the original APSDs yields only an 80% correct classification rate.
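A minimal sketch of WT-based tonal feature extraction from an APSD using PyWavelets; the wavelet family, decomposition level, and the choice of per-band energies as features are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np
import pywt  # PyWavelets

def tonal_features(apsd, wavelet="db4", level=4):
    """Multilevel discrete wavelet transform of the average power
    spectral density; the energy of each detail band serves as a
    compact tonal feature vector for the classifier."""
    coeffs = pywt.wavedec(apsd, wavelet, level=level)
    return np.array([np.sum(c ** 2) for c in coeffs[1:]])  # detail energies
```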

13.
Feature selection plays an important role in the successful application of machine learning techniques to large real-world datasets. Avoiding model overfitting, especially when the number of features far exceeds the number of observations, requires selecting informative features and/or eliminating irrelevant ones. Searching for an optimal subset of features can be computationally expensive. Functional magnetic resonance imaging (fMRI) produces datasets with such characteristics, creating challenges for applying machine learning techniques to classify cognitive states based on fMRI data. In this study, we present an embedded feature selection framework that integrates sparse optimization for regularization (or sparse regularization) and classification. This optimization approach attempts to maximize training accuracy while simultaneously enforcing sparsity by penalizing the objective function for the coefficients of the features. This process allows many coefficients to become zero, which effectively eliminates their corresponding features from the classification model. To demonstrate the utility of the approach, we apply our framework to three different real-world fMRI datasets. The results show that regularized classifiers yield better classification accuracy, especially when the number of initial features is large. The results further show that sparse regularization is key to achieving scientifically relevant generalizability and functional localization of classifier features. The approach is thus highly suited for analysis of fMRI data.
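A common instance of this embedded-selection pattern (shown as an illustration, not the paper's specific solver) is l1-penalized logistic regression: the penalty drives many coefficients exactly to zero, and the surviving features localize the states being decoded.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sparse_state_classifier(X, y, C=0.1):
    """Embedded feature selection via l1-penalized logistic regression.

    X: examples x features (e.g. voxels), y: cognitive-state labels;
    smaller C enforces stronger sparsity."""
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, y)
    selected = np.flatnonzero(clf.coef_.ravel())  # indices of kept features
    return clf, selected
```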

14.
Measuring the performance of microfinance institutions (MFIs) is challenging, as MFIs must achieve the twin objectives of outreach and sustainability. We propose a new measure to capture the performance of MFIs by placing their twin achievements in a 2 × 2 classification matrix. To make a dichotomous classification, MFIs that meet both of their twin objectives are classified as ‘1’ and MFIs that could not meet their dual objectives simultaneously are designated as ‘0’. Six classifiers are applied to analyze the operating and financial characteristics of MFIs, offering a predictive modeling solution for achieving their objectives, and the results of the classifiers are compared using the technique for order preference by similarity to ideal solution (TOPSIS) to identify an appropriate classifier based on a ranking of performance measures. Of the six classifiers applied in the study, kernel support vector machines achieved the highest accuracy and the lowest classification error rate, best discriminating MFIs that achieve their twin objectives. MFIs can use both of these steps to identify, from their operating characteristics, whether they are on the right path to attaining their multiple objectives.
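A minimal TOPSIS sketch for the classifier-ranking step, assuming a decision matrix of performance measures per classifier (weights and the benefit/cost designation of each measure are inputs the analyst supplies):

```python
import numpy as np

def topsis(scores, weights, benefit):
    """Rank alternatives by closeness to the ideal solution.

    scores:  (classifiers x measures) decision matrix
    weights: importance of each measure, summing to 1
    benefit: boolean per measure; True if larger is better
             (False for, e.g., classification error rate)."""
    norm = scores / np.linalg.norm(scores, axis=0)  # vector normalization
    v = norm * weights
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_plus = np.linalg.norm(v - ideal, axis=1)
    d_minus = np.linalg.norm(v - anti, axis=1)
    return d_minus / (d_plus + d_minus)  # higher closeness = better rank
```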

15.
The support vector machine (SVM) is one of the most popular classification methods in the machine learning literature. Binary SVM methods have been extensively studied, and have achieved many successes in various disciplines. However, generalization to multicategory SVM (MSVM) methods can be very challenging. Many existing methods estimate k functions for k classes with an explicit sum-to-zero constraint. It was shown recently that such a formulation can be suboptimal. Moreover, many existing MSVMs are not Fisher consistent, or do not take into account the effect of outliers. In this paper, we focus on classification in the angle-based framework, which is free of the explicit sum-to-zero constraint, hence more efficient, and propose two robust MSVM methods using truncated hinge loss functions. We show that our new classifiers can enjoy Fisher consistency, and simultaneously alleviate the impact of outliers to achieve more stable classification performance. To implement our proposed classifiers, we employ the difference convex algorithm for efficient computation. Theoretical and numerical results obtained indicate that for problems with potential outliers, our robust angle-based MSVMs can be very competitive among existing methods.
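The key ingredient is easy to state in code: the truncated hinge loss agrees with the hinge for moderate margins but flattens beyond a truncation point s, so a badly misclassified point stops pulling on the fit. The value s = -1 below is an illustrative choice.

```python
import numpy as np

def truncated_hinge(u, s=-1.0):
    """Truncated hinge loss T_s(u) = H_1(u) - H_s(u), where
    H_a(u) = max(a - u, 0) and s < 1.

    For u >= 1 the loss is 0; for s <= u < 1 it equals the usual
    hinge 1 - u; for u < s it is capped at the constant 1 - s."""
    return np.maximum(1.0 - u, 0.0) - np.maximum(s - u, 0.0)
```

The difference of the two hinge terms is exactly the difference-of-convex decomposition the abstract's algorithm exploits.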

16.
17.
This paper presents an analysis of credit rating using fuzzy rule-based systems. The disadvantage of the models used in previous studies is that it is difficult to extract understandable knowledge from them. The root of this problem is the use of natural language that is typical for the credit rating process. This problem can be solved using fuzzy logic, which enables users to model the meaning of natural language words. Therefore, the fuzzy rule-based system adapted by a feed-forward neural network is designed to classify US companies (divided into the finance, manufacturing, mining, retail trade, services, and transportation industries) and municipalities into the credit rating classes obtained from rating agencies. Features are selected using a filter combined with a genetic algorithm as a search method. The resulting subsets of features confirm the assumption that the rating process is industry-specific (i.e. specific determinants are used for each industry). The results show that the credit rating classes assigned to bond issuers can be classified with high classification accuracy using low numbers of features, membership functions, and if-then rules. The comparison of selected fuzzy rule-based classifiers indicates that it is possible to increase classification performance by using different classifiers for individual industries.
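A minimal sketch of the fuzzy rule-base machinery: triangular membership functions turn crisp inputs into degrees of truth, and an if-then rule fires with the minimum (AND) of its antecedents. The ratio names, breakpoints, and the single rule below are purely illustrative, not determinants from the paper.

```python
import numpy as np

def tri_mf(x, a, b, c):
    """Triangular membership function peaking at b on support [a, c]."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# toy rule: IF leverage is HIGH and coverage is LOW
#           THEN rating is SPECULATIVE
def rule_fire(leverage, coverage):
    high_lev = tri_mf(leverage, 0.5, 1.0, 1.5)
    low_cov = tri_mf(coverage, 0.0, 0.5, 1.0)
    return min(high_lev, low_cov)  # AND realized as the minimum t-norm
```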

18.
Social media, such as blogs and online forums, contain a huge amount of information that is typically unorganized and fragmented. An issue of growing importance is classifying online texts in order to detect possible anomalies. For example, online texts representing consumer opinions can be very valuable and profitable for companies, but they can also cause serious damage if they are negative or faked. In this contribution we present a novel statistical methodology, rooted in the context of classical text classification, to address such issues. In the literature, several classifiers have been proposed, among them support vector machine and naive Bayes classifiers. These approaches are not effective when coping with the problem of classifying texts belonging to an unknown author. To this aim, we propose a new method based on the combination of classification trees with non-parametric approaches, such as the Kruskal–Wallis and Brunner–Dette–Munk tests. The main application of what we propose is the capability to classify an author as a new one, who is potentially trustworthy, or as a known one, who is potentially faking.
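One of the non-parametric building blocks is directly available in SciPy. A minimal sketch of a Kruskal–Wallis screen over a stylometric feature (the feature choice and its use as a pre-tree filter are illustrative assumptions):

```python
from scipy.stats import kruskal

def feature_separates_authors(feature_by_author, alpha=0.05):
    """Does a stylometric feature (e.g. sentence length) differ across
    known authors? feature_by_author: one sample array per author."""
    stat, p_value = kruskal(*feature_by_author)
    return p_value < alpha, p_value
```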

19.
A conditionally specified joint model is convenient to use in fields such as spatial data modeling, Gibbs sampling, and missing data imputation. One potential problem with such an approach is that the conditionally specified models may be incompatible, which can lead to serious problems in applications. We propose an odds ratio representation of a joint density to study the issue and derive conditions under which conditionally specified distributions are compatible and yield a joint distribution. Our conditions are simpler to verify than those proposed in the literature. The proposal also explicitly constructs joint densities that are fully compatible with the conditionally specified densities when the conditional densities are compatible, and partially compatible with the conditional densities when they are incompatible. The construction result is then applied to checking the compatibility of the conditionally specified models. Ways to modify the conditionally specified models based on the construction of the joint models are also discussed when the conditionally specified models are incompatible.
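For intuition, a classical compatibility check in the discrete, strictly positive, two-variable case (not the paper's odds-ratio machinery): since p(x|y)/p(y|x) = p(x)/p(y) whenever a joint exists, the ratio matrix must factorize as u(x)v(y), i.e. have rank one.

```python
import numpy as np

def conditionals_compatible(P_x_given_y, P_y_given_x, tol=1e-10):
    """Rank-one check for two positive discrete conditionals.

    P_x_given_y[x, y] = p(x|y), shape (nx, ny)
    P_y_given_x[y, x] = p(y|x), shape (ny, nx)"""
    R = P_x_given_y / P_y_given_x.T       # align both as [x, y]
    s = np.linalg.svd(R, compute_uv=False)
    return s[1] / s[0] < tol              # effectively rank one
```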

20.
Construction of classifier ensembles by means of artificial immune systems
This paper presents the application of Artificial Immune Systems to the design of classifier ensembles. Ensembles of classifiers are a very interesting alternative to single classifiers when facing difficult problems. In general, ensembles are able to achieve better performance in terms of learning and generalisation errors. Several papers have shown that the processes of classifier design and combination must be related in order to obtain better ensembles. Artificial Immune Systems are a recent paradigm based on the immune systems of animals. The features of this new paradigm make it very appropriate for the design of systems where many components must cooperate to solve a given task. The design of classifier ensembles can be considered within such a group of systems, as the cooperation of the individual classifiers is able to improve the performance of the overall system. This paper studies the viability of Artificial Immune Systems when dealing with ensemble design. We construct a population of classifiers that is evolved using an Artificial Immune algorithm. From this population of classifiers several different ensembles can be extracted. These ensembles compare favourably with ensembles obtained using standard methods on 35 real-world classification problems from the UCI Machine Learning Repository.
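A minimal clonal-selection sketch of the evolutionary loop, assuming classifiers are encoded as real-valued parameter vectors and `affinity` is a hypothetical callable (e.g. validation accuracy of the classifier a vector encodes); the mutation schedule is an illustrative choice, not the paper's exact algorithm.

```python
import numpy as np

def clonal_selection(pop, affinity, n_iter=50, n_best=5, n_clones=4, seed=0):
    """Evolve a population of classifier encodings: clone the highest-
    affinity antibodies, hypermutate the clones, keep the best."""
    rng = np.random.default_rng(seed)
    for _ in range(n_iter):
        fit = np.array([affinity(a) for a in pop])
        order = np.argsort(fit)[::-1]
        clones = []
        for rank, i in enumerate(order[:n_best]):
            scale = 0.1 * (rank + 1)  # weaker parents mutate more strongly
            for _ in range(n_clones):
                clones.append(pop[i] + rng.normal(0, scale, size=pop[i].shape))
        pool = np.vstack([pop, np.array(clones)])
        pool_fit = np.array([affinity(a) for a in pool])
        pop = pool[np.argsort(pool_fit)[::-1][:len(pop)]]
    return pop  # the top members form candidate ensembles
```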
