Similar documents
Found 20 similar documents (search time: 15 ms)
1.
“Logical analysis of data” (LAD) is a methodology developed since the late eighties, aimed at discovering hidden structural information in data sets. LAD was originally developed for analyzing binary data by using the theory of partially defined Boolean functions. An extension of LAD to the analysis of numerical data sets is achieved through the process of “binarization”, which replaces each numerical variable by binary “indicator” variables, each showing whether the value of the original variable is above or below a certain level. Binarization has been successfully applied to the analysis of a variety of real-life data sets. This paper develops the theoretical foundations of the binarization process by studying the combinatorial optimization problems related to minimizing the number of binary variables. To provide an algorithmic framework for the practical solution of such problems, we construct compact linear integer programming formulations of them. We develop polynomial-time algorithms for some of these minimization problems and prove the NP-hardness of others. The authors gratefully acknowledge the partial support of the Office of Naval Research (grants N00014-92-J1375 and N00014-92-J4083).
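The binarization step described in this abstract can be illustrated with a minimal sketch. This is not the paper's optimized procedure: the function name `binarize` and the choice of midpoints between consecutive distinct values as candidate cut points are assumptions of this sketch; selecting a minimum subset of cuts is precisely the optimization problem the paper studies.

```python
import numpy as np

def binarize(X):
    """Replace each numerical column by binary indicator columns x >= cut,
    using midpoints between consecutive distinct values as candidate cuts.
    (The paper minimizes the number of such indicators; here we keep all.)"""
    cols, cuts = [], []
    for j in range(X.shape[1]):
        vals = np.unique(X[:, j])
        mids = (vals[:-1] + vals[1:]) / 2.0
        for c in mids:
            cols.append((X[:, j] >= c).astype(int))
            cuts.append((j, c))
    return np.column_stack(cols), cuts

# toy data: two numerical variables, three observations
X = np.array([[1.0, 10.0],
              [2.0, 10.0],
              [3.0, 20.0]])
B, cuts = binarize(X)
# variable 0 yields cuts 1.5 and 2.5; variable 1 yields cut 15.0
```

Each column of `B` is one binary "indicator" variable; the LAD optimization models in the paper then choose the smallest subset of these columns that still separates the classes.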

2.
In this paper, we illustrate how data envelopment analysis (DEA) can be used to aid interactive classification. We assume that the scoring function for the classification problem is known. We use DEA to identify difficult-to-classify cases in a database and present them to the decision-maker one at a time. The decision-maker assigns a class to the presented case, and based on this assignment a tradeoff cutting plane is drawn using the scoring function and the decision-maker's input. The procedure continues for a finite number of iterations and terminates with the final discriminant function. We also show how a hybrid DEA and mathematical programming approach can be used when user interaction is not desired. For the non-interactive case, we compare the hybrid DEA and mathematical programming approach with several statistical and machine learning approaches, and show that it provides competitive performance.

3.
The paper presents a review of the basic concepts of the Logical Analysis of Data (LAD), along with a series of discrete optimization models associated with the implementation of various components of its general methodology, as well as an outline of applications of LAD to medical problems. The combinatorial optimization models described in the paper represent variations on the general theme of set covering, including some with nonlinear objective functions. The medical applications described include the development of diagnostic and prognostic systems in cancer research and pulmonology, risk assessment among cardiac patients, and the design of biomaterials.
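Since the abstract above centers its optimization models on set covering, a sketch of the classic greedy heuristic for that problem may help; the function name and the toy instance are illustrative only, not taken from the paper.

```python
def greedy_set_cover(universe, subsets):
    """Greedy heuristic for set covering: repeatedly pick the subset that
    covers the most still-uncovered elements (ln n approximation factor)."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(subsets)), key=lambda i: len(subsets[i] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen

# toy instance: cover {1..5} with four candidate subsets
sets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
cover = greedy_set_cover({1, 2, 3, 4, 5}, sets)
```

In LAD, the elements to cover are typically observation pairs that must be distinguished, and the subsets correspond to candidate patterns or binary features.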

4.
Nowadays, the diffusion of smartphones, tablet computers, and other multipurpose equipment with high-speed Internet access makes new data types available for data analysis and classification in marketing. It is now possible, for example, to collect images/snaps, music, or videos instead of ratings. With appropriate algorithms and software at hand, a marketing researcher could simply group or classify respondents according to the content of uploaded images/snaps, music, or videos. However, such algorithms and software are still little known in marketing research. This paper tries to close this gap. Algorithms and software from computer science are presented, adapted, and applied to data analysis and classification in marketing. The new SPSS-like software package IMADAC is introduced.

5.
The problem of recognition (classification) by precedents is considered. Issues of improving the recognition ability and the training rate of logical correctors, i.e., recognition procedures based on the construction of correct sets of elementary classifiers, are studied. The concept of a correct set of generic elementary classifiers is introduced and used to construct and investigate a qualitatively new model of the logical corrector. This model uses a wider class of correcting functions than the earlier models of logical correctors.

6.
In recent years, several methods have been proposed to deal with functional data classification problems (e.g., one-dimensional curves or two- or three-dimensional images). One popular general approach is the kernel-based method proposed by Ferraty and Vieu (Comput Stat Data Anal 44:161–173, 2003). The performance of this general method depends heavily on the choice of the semi-metric. Motivated by Fan and Lin (J Am Stat Assoc 93:1007–1021, 1998) and our image data, we propose a new semi-metric, based on wavelet thresholding, for classifying functional data. This wavelet-thresholding semi-metric is able to adapt to the smoothness of the data and provides particularly good classification when data features are localized and/or sparse. We conduct simulation studies to compare our proposed method with several functional classification methods and study the relative performance of the methods for classifying positron emission tomography images.
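A wavelet-thresholding semi-metric of the kind this abstract describes can be sketched as follows. The sketch makes several assumptions not fixed by the abstract: it uses the Haar wavelet for simplicity, a hard threshold of 0.5, and signals whose length is a power of two; the function names are illustrative.

```python
import numpy as np

def haar_dwt(x):
    """Full orthonormal Haar wavelet transform of a length-2^k signal."""
    x = np.asarray(x, dtype=float)
    coeffs = []
    while len(x) > 1:
        avg = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
        det = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
        coeffs.append(det)
        x = avg
    coeffs.append(x)                             # coarsest approximation
    return np.concatenate(coeffs[::-1])

def wavelet_semimetric(x, y, thresh=0.5):
    """Semi-metric: hard-threshold small wavelet coefficients, then take the
    Euclidean distance between the thresholded coefficient vectors."""
    cx, cy = haar_dwt(x), haar_dwt(y)
    cx = np.where(np.abs(cx) > thresh, cx, 0.0)
    cy = np.where(np.abs(cy) > thresh, cy, 0.0)
    return np.linalg.norm(cx - cy)

d_same = wavelet_semimetric([1., 2., 3., 4.], [1., 2., 3., 4.])  # identical curves
d_diff = wavelet_semimetric([0.] * 4, [4.] * 4)                  # shifted curves
```

Thresholding is what makes this a semi-metric rather than a metric: two curves differing only in coefficients below the threshold are at distance zero.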

7.
PLS classification of functional data   (Cited: 2; self-citations: 0; citations by others: 2)
A partial least squares (PLS) approach is proposed for linear discriminant analysis (LDA) when the predictors are data of functional type (curves). Based on the equivalence between LDA and multiple linear regression (binary response), and between LDA and canonical correlation analysis (more than two groups), PLS regression on functional data is used to estimate the discriminant coefficient functions. A simulation study as well as an application to kneading data compare the PLS model results with those given by other methods.
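The PLS-for-discrimination idea in this abstract can be sketched in the multivariate (discretized-curve) setting. This is an assumption-laden illustration, not the paper's functional method: the curves are treated as plain vectors, a single NIPALS-style component is extracted, and the function name `pls_da` is invented here.

```python
import numpy as np

def pls_da(X, y, n_comp=1):
    """PLS for binary discrimination: extract components by NIPALS-style
    deflation, then regress the centered class indicator on the scores.
    X: (n, p) discretized curves; y: labels in {0, 1}."""
    Xc = X - X.mean(axis=0)
    yc = np.where(np.asarray(y) == 1, 1.0, -1.0)
    yc = yc - yc.mean()
    Xd, yd, scores_list = Xc.copy(), yc.copy(), []
    for _ in range(n_comp):
        w = Xd.T @ yd                                  # weight vector
        w /= np.linalg.norm(w)
        t = Xd @ w                                     # component scores
        Xd = Xd - np.outer(t, (Xd.T @ t) / (t @ t))    # deflate X
        yd = yd - t * (yd @ t) / (t @ t)               # deflate y
        scores_list.append(t)
    T = np.column_stack(scores_list)
    b, *_ = np.linalg.lstsq(T, yc, rcond=None)         # regression on scores
    return T, b

# toy "curves" sampled at 3 points: class 1 shifted upward
X = np.array([[0.0, 0.0, 0.0],
              [0.1, 0.0, 0.1],
              [1.0, 1.0, 1.0],
              [1.1, 0.9, 1.0]])
y = [0, 0, 1, 1]
T, b = pls_da(X, y, n_comp=1)
scores = T @ b   # discriminant scores; sign separates the two classes here
```

In the functional setting of the paper, the weight vector would instead be estimated as a coefficient function, typically via a basis expansion of the curves.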

8.
In this paper we propose a robust classification rule for skewed unimodal distributions. For low-dimensional data, the classifier is based on minimizing the adjusted outlyingness to each group. In the case of high-dimensional data, the robustified SIMCA method is adjusted for skewness. The robustness of the methods is investigated through different simulations and by applying them to several datasets.

9.
10.
The effects of data heterogeneity on the efficiency estimate by data envelopment analysis are evaluated here in terms of empirical applications in the computer industry. Scale or size variations of firms and heteroscedasticity are the two forms of heterogeneity considered here. Our empirical results show that the adverse effects of data heterogeneity can be considerably reduced by the methods suggested here.

11.
We study the asymptotic behaviour of the solution of elliptic problems with periodic data when the size of the domain on which the problem is set becomes unbounded.

12.
This paper considers the problem of interval scale data in the most widely used models of data envelopment analysis (DEA), the CCR and BCC models. Radial models require inputs and outputs measured on a ratio scale. Our focus is on how to deal with interval scale variables, especially when the interval scale variable is the difference of two ratio scale variables, such as profit or the decrease/increase in bank accounts. We suggest using these ratio scale variables directly in a radial DEA model.

13.
In this paper, we investigate DEA with interval input-output data. First, we introduce various extensions of efficiency and show that 25 of them are essential. Second, we formulate the efficiency test problems as mixed integer programming problems. We prove that 14 of the 25 problems can be reduced to linear programming problems and that the other 11 efficiencies can be tested by solving a finite sequence of linear programming problems. Third, in order to obtain efficiency scores, we extend the SBM model to interval input-output data. Fourth, to moderate a possible positive overassessment by DEA, we introduce the inverted DEA model with interval input-output data. Using efficiency and inefficiency scores, we propose a classification of DMUs. Finally, we apply the proposed approach to Japanese bank data and demonstrate its advantages.

14.
15.
This paper reexamines the unintended consequences of the two widely cited models for measuring environmental efficiency—the hyperbolic efficiency model (HEM) and directional distance function (DDF). I prove the existence of three main problems: (1) these two models are not monotonic in undesirable outputs (i.e., a firm's efficiency may increase when polluting more, and vice versa), (2) strongly dominated firms may appear efficient, and (3) some firms' environmental efficiency scores may be computed against strongly dominated points. Using the supply-chain carbon emissions data from the 50 major U.S. manufacturing companies, I empirically compare these two models with a weighted additive DEA model. The empirical results corroborate the analytical findings that the DDF and HEM models can generate spurious efficiency estimates and must be used with extreme caution.

16.
In the domain of data preparation for supervised classification, filter methods for variable ranking are time efficient. However, their intrinsic univariate limitation prevents them from detecting redundancies or constructive interactions between variables. This paper introduces a new method to automatically, rapidly and reliably extract the classificatory information of a pair of input variables. It is based on a simultaneous partitioning of the domains of the two input variables, into intervals in the numerical case and into groups of categories in the categorical case. The resulting input data grid makes it possible to quantify the joint information between the two input variables and the output variable. The best joint partitioning is found by maximizing a Bayesian model selection criterion. Intensive experiments demonstrate the benefits of the approach, especially the significant improvement of accuracy for classification tasks.
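The grid idea above can be sketched by fixing the partition rather than optimizing it. The sketch's assumptions: equal-frequency binning replaces the paper's Bayesian criterion, the information measure is plain mutual information in nats, and the function name is invented. The XOR-style example shows why a bivariate grid detects interactions a univariate filter misses.

```python
import numpy as np

def grid_class_info(x1, x2, y, bins=2):
    """Discretize each input into equal-frequency intervals, cross them into
    a grid, and compute the mutual information I(grid cell; class) in nats.
    (The paper searches the best partition; here the partition is fixed.)"""
    y = np.asarray(y)
    def discretize(x):
        qs = np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1])
        return np.searchsorted(qs, x)
    cell = discretize(x1) * bins + discretize(x2)   # joint grid cell index
    mi = 0.0
    for c in np.unique(cell):
        for k in np.unique(y):
            p_joint = np.mean((cell == c) & (y == k))
            if p_joint > 0:
                mi += p_joint * np.log(p_joint / (np.mean(cell == c) * np.mean(y == k)))
    return mi

# XOR-like data: neither variable alone predicts y, but the pair does
x1 = np.array([0.0, 0.0, 1.0, 1.0])
x2 = np.array([0.0, 1.0, 0.0, 1.0])
y = np.array([0, 1, 1, 0])
mi = grid_class_info(x1, x2, y)   # log(2) nats: the grid fully determines y
```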

17.
Advances in Data Analysis and Classification - Brand confusion occurs when a consumer is exposed to an advertisement (ad) for brand A but believes that it is for brand B. If more consumers are...

18.
The supervised classification of fuzzy data obtained from a random experiment is discussed. The data generation process is modelled through random fuzzy sets which, from a formal point of view, can be identified with certain function-valued random elements. First, one of the most versatile discriminant approaches in the context of functional data analysis is adapted to the specific case of interest; in this way, discriminant analysis based on nonparametric kernel density estimation is discussed. In general, this criterion is shown not to be optimal and to require large sample sizes. To avoid these inconveniences, a simpler approach is introduced that eludes density estimation by considering conditional probabilities on certain balls. The approaches are applied to two experiments: one concerning fuzzy perceptions and linguistic labels, and another concerning flood analysis. The methods are tested against linear discriminant analysis using random K-fold cross-validation.
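The ball-based alternative mentioned in this abstract can be sketched as follows. The sketch assumes the fuzzy data have already been represented as finite-dimensional vectors and uses the Euclidean norm; the function name, the fixed radius, and the nearest-neighbour fallback for an empty ball are all choices of this illustration, not the paper's.

```python
import numpy as np

def ball_classify(X_train, y_train, x_new, h):
    """Assign x_new to the class with the highest empirical conditional
    probability on the ball of radius h around it; if no training point
    falls inside the ball, fall back to the nearest neighbour's class."""
    d = np.linalg.norm(X_train - x_new, axis=1)
    inside = d <= h
    if not inside.any():
        return y_train[np.argmin(d)]
    labels, counts = np.unique(y_train[inside], return_counts=True)
    return labels[np.argmax(counts)]

# two well-separated groups of (vectorized) fuzzy observations
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]])
y_train = np.array([0, 0, 1, 1])
pred = ball_classify(X_train, y_train, np.array([0.05, 0.0]), h=1.0)
```

Compared with kernel density estimation, this rule only needs empirical frequencies inside a ball, which is the simplification the abstract highlights.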

19.
Cancer classification using genomic data is one of the major research areas in the medical field, and a number of binary classification methods have been proposed in recent years. The Top Scoring Pair (TSP) method is one of the most promising techniques; it classifies genomic data in a lower-dimensional subspace using a simple decision rule. In the present paper, we propose a supervised classification technique that combines incremental generalized eigenvalue and top scoring pair classifiers to obtain higher classification accuracy with a small training set. We validate our method by applying it to well-known microarray data sets.
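The TSP decision rule referenced above is simple enough to sketch directly: pick the feature pair whose ordering best separates the classes, then classify by that ordering. The function names and toy data are this sketch's assumptions; the paper's contribution (combining TSP with an incremental generalized eigenvalue classifier) is not reproduced here.

```python
import numpy as np
from itertools import combinations

def fit_tsp(X, y):
    """Top Scoring Pair: pick the feature pair (i, j) maximizing
    |P(X_i < X_j | y=0) - P(X_i < X_j | y=1)|."""
    y = np.asarray(y)
    best, best_score = None, -1.0
    for i, j in combinations(range(X.shape[1]), 2):
        p0 = np.mean(X[y == 0, i] < X[y == 0, j])
        p1 = np.mean(X[y == 1, i] < X[y == 1, j])
        if abs(p0 - p1) > best_score:
            best, best_score = (i, j, p0, p1), abs(p0 - p1)
    return best

def predict_tsp(model, x):
    """Classify by whether x obeys the ordering typical of class 0."""
    i, j, p0, p1 = model
    return 0 if (x[i] < x[j]) == (p0 > p1) else 1

# toy expression data: feature 0 < feature 1 in class 0, reversed in class 1
X = np.array([[1.0, 2.0, 5.0],
              [1.0, 3.0, 5.0],
              [3.0, 1.0, 5.0],
              [4.0, 2.0, 5.0]])
y = [0, 0, 1, 1]
model = fit_tsp(X, y)
pred_a = predict_tsp(model, [1.0, 9.0, 0.0])
pred_b = predict_tsp(model, [9.0, 1.0, 0.0])
```

Because the rule depends only on the relative order of two measurements, it is invariant to monotone normalization of each array, which is why TSP is attractive for microarray data.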

20.
We consider the performance of the independent rule in the classification of multivariate binary data. In this article, broad studies are presented, including the performance of the independent rule when the number of variables, d, is fixed or grows with the sample size, n. The latter situation includes the case d = O(n^τ) for τ > 0, which covers “the small sample and the large dimension”, namely d ≫ n when τ > 1. Park and Ghosh [J. Park, J.K. Ghosh, Persistence of plug-in rule in classification of high dimensional binary data, Journal of Statistical Planning and Inference 137 (2007) 3687–3707] studied the independent rule in terms of the consistency of the misclassification error rate, called persistence, under growing numbers of dimensions, but they did not investigate the convergence rate. We present asymptotic results on the convergence rate under a structured parameter space and highlight that variable selection is necessary to improve the performance of the independent rule. We also extend the applications of the independent rule to correlated binary data, such as the Bahadur representation and the logit model. It is emphasized that variable selection is also needed for correlated binary data to improve the performance of the independent rule.
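The independent rule studied in this abstract is the naive Bayes classifier for multivariate binary data: each coordinate is modelled as an independent Bernoulli within each class. A minimal sketch follows; the Laplace smoothing parameter and function names are assumptions of this illustration.

```python
import numpy as np

def fit_independent_rule(X, y, alpha=1.0):
    """Independent rule for binary data: estimate each coordinate's Bernoulli
    parameter per class (with Laplace smoothing alpha), ignoring all
    correlations between coordinates."""
    y = np.asarray(y)
    params = {}
    for k in (0, 1):
        Xk = X[y == k]
        p = (Xk.sum(axis=0) + alpha) / (len(Xk) + 2 * alpha)
        params[k] = (p, len(Xk) / len(y))   # per-coordinate probs, class prior
    return params

def predict_independent_rule(params, x):
    """Pick the class maximizing the (log) product of coordinate likelihoods."""
    scores = {}
    for k, (p, prior) in params.items():
        scores[k] = np.log(prior) + np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    return max(scores, key=scores.get)

# toy binary data: coordinates 0 and 2 carry the class signal
X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]])
y = [0, 0, 1, 1]
params = fit_independent_rule(X, y)
pred0 = predict_independent_rule(params, np.array([1, 1, 0]))
pred1 = predict_independent_rule(params, np.array([0, 0, 1]))
```

With d growing in n, each of the d per-coordinate estimates contributes error to the log-likelihood sum, which is why the abstract stresses variable selection: dropping uninformative coordinates removes pure estimation noise from the rule.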

