Similar literature
 Found 20 similar documents; search time 31 ms
1.
In high-dimensional classification problems, one is often interested in finding a few important discriminant directions in order to reduce the dimensionality. Fisher's linear discriminant analysis (LDA) is a commonly used method. Although LDA is guaranteed to find the best directions when each class has a Gaussian density with a common covariance matrix, it can fail if the class densities are more general. Using a likelihood-based interpretation of Fisher's LDA criterion, we develop a general method for finding important discriminant directions without assuming the class densities belong to any particular parametric family. We also show that our method can be easily integrated with projection pursuit density estimation to produce a powerful procedure for (reduced-rank) nonparametric discriminant analysis.
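As a point of reference for the abstract above, the classical two-class Fisher direction can be sketched as follows. This is a minimal illustration of standard LDA, not the nonparametric method the paper proposes; the function name `fisher_direction` and the restriction to 2-D points are assumptions made to keep the example self-contained.

```python
def fisher_direction(class_a, class_b):
    """Return w = S_w^{-1} (mean_a - mean_b) for two classes of 2-D points.

    Hypothetical helper for illustration; S_w is the pooled within-class
    scatter matrix, inverted in closed form because it is 2x2.
    """
    def mean(points):
        n = len(points)
        return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

    def scatter(points, m):
        # Entries of the 2x2 scatter matrix: sums of products of deviations.
        sxx = sum((p[0] - m[0]) ** 2 for p in points)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
        syy = sum((p[1] - m[1]) ** 2 for p in points)
        return sxx, sxy, syy

    ma, mb = mean(class_a), mean(class_b)
    axx, axy, ayy = scatter(class_a, ma)
    bxx, bxy, byy = scatter(class_b, mb)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy  # pooled S_w

    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")

    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # Inverse of [[sxx, sxy], [sxy, syy]] is (1/det) * [[syy, -sxy], [-sxy, sxx]].
    return [(syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det]
```

For two unit squares that differ only by a shift along the first axis, the returned direction lies along that axis, which is the behaviour expected under the common-covariance Gaussian setting the abstract mentions.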

2.
An expert system was desired for a group decision-making process. A highly variable data set from previous groups' decisions was available to simulate past group decisions. This data set has much missing information and contains many possible errors. Classification and regression trees (CART) was selected for rule induction, and compared with multiple linear regression and discriminant analysis. We conclude that CART's decision rules can be used for rule induction. CART uses all available information and can predict observations with missing data. Errors in results from CART compare well with those from multiple linear regression and discriminant analysis. CART results are easier to understand.

3.
In this paper, we analyze matrix dynamics for online linear discriminant analysis (online LDA). Convergence of the dynamics has been studied for nonsingular cases; our main contribution is an analysis of singular cases, which is key to efficient calculation without full-size square matrices. All fixed points of the dynamics are identified and their stability is examined.

4.
Corporate credit granting is a key commercial activity of financial institutions nowadays. A critical first step in the credit granting process usually involves a careful financial analysis of the creditworthiness of the potential client. Wrong decisions result either in forgoing valuable clients or, more severely, in substantial capital losses if the client subsequently defaults. It is thus of crucial importance to develop models that estimate the probability of corporate bankruptcy with a high degree of accuracy. Many studies have focused on the use of financial ratios in linear statistical models, such as linear discriminant analysis and logistic regression. However, the obtained error rates are often high. In this paper, Least Squares Support Vector Machine (LS-SVM) classifiers, also known as kernel Fisher discriminant analysis, are applied within the Bayesian evidence framework in order to automatically infer and analyze the creditworthiness of potential corporate clients. The inferred posterior class probabilities of bankruptcy are then used to analyze the sensitivity of the classifier output with respect to the given inputs and to assist in the credit assignment decision-making process. The suggested nonlinear kernel-based classifiers yield better performance than linear discriminant analysis and logistic regression when applied to a real-life data set concerning commercial credit granting to mid-cap Belgian and Dutch firms.

5.
The clusterwise regression model is used to perform cluster analysis within a regression framework. While the traditional regression model assumes the regression coefficient (β) to be identical for all subjects in the sample, the clusterwise regression model allows β to vary with subjects of different clusters. Since the cluster membership is unknown, the estimation of the clusterwise regression is a tough combinatorial optimization problem. In this research, we propose a “Generalized Clusterwise Regression Model” which is formulated as a mathematical programming (MP) problem. A nonlinear programming procedure (with linear constraints) is proposed to solve the combinatorial problem and to estimate the cluster membership and β simultaneously. Moreover, by integrating the cluster analysis with the discriminant analysis, a clusterwise discriminant model is developed to incorporate parameter heterogeneity into the traditional discriminant analysis. The cluster membership and discriminant parameters are estimated simultaneously by another nonlinear programming model.

6.
The aim of this paper is twofold. In the first part, we recapitulate the main results regarding the shrinkage properties of partial least squares (PLS) regression. In particular, we give an alternative proof of the shape of the PLS shrinkage factors. It is well known that some of the factors are >1. We discuss in detail the effect of shrinkage factors on the mean squared error of linear estimators and argue that we cannot extend the results to PLS directly, as it is nonlinear. In the second part, we investigate the effect of shrinkage factors empirically. In particular, we point out that experiments on simulated and real-world data show that bounding the absolute value of the PLS shrinkage factors by 1 seems to lead to a lower mean squared error.

7.
Statistical methods of discrimination and classification are used for the prediction of protein structure from amino acid sequence data. This provides information for the establishment of new paradigms of carcinogenesis modeling on the basis of gene expression. Feed-forward neural networks and standard statistical classification procedures are used to classify proteins into fold classes. Logistic regression, additive models, and projection pursuit regression from the family of methods based on a posteriori probabilities; linear, quadratic, and flexible discriminant analysis from the class of methods based on class-conditional probabilities; and the nearest-neighbor classification rule are applied to a data set of 268 sequences. From analyzing the prediction error obtained with a test sample (n = 125) and with a cross-validation procedure, we conclude that standard linear discriminant analysis and nearest-neighbor methods are at the same time statistically feasible and potent competitors to the more flexible tools of feed-forward neural networks. Further research is needed to explore the gain obtainable from statistical methods by the application to larger sets of protein sequence data and to compare the results with those from biophysical approaches.

8.
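Application of the Logistic regression model in credit risk analysis   Total citations: 2, self-citations: 0, citations by others: 2
Using SPSS, a Logistic regression credit evaluation model is built and used to classify 106 Chinese listed companies from the year 2000 into two classes, "poor" and "normal", according to their operating condition. For each listed company, four main financial indicators of its operating condition are considered: earnings per share, net assets per share, return on net assets, and cash flow per share. Simulation results show that the Logistic regression credit evaluation model achieves a discrimination accuracy of 99.06% on the full set of 106 samples. In addition, the results show that when a linear discriminant analysis model is built from the model coefficients given by the SPSS Discriminant procedure and a Logistic regression model is built from the model parameters given by the SPSS Multinomial Logistic procedure, the Logistic regression model discriminates less well than the linear discriminant model. However, removing unqualified samples or normalizing the sample data improves the classification accuracy of the Logistic regression model.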

9.
A hybrid genetic model for the prediction of corporate failure   Total citations: 1, self-citations: 0, citations by others: 1
This study examines the potential of a neural network (NN) model, whose inputs and structure are automatically selected by means of a genetic algorithm (GA), for the prediction of corporate failure using information drawn from financial statements. The results of this model are compared with those of a linear discriminant analysis (LDA) model. Data from a matched sample of 178 publicly quoted, failed and non-failed, US firms, drawn from the period 1991 to 2000, are used to train and test the models. The best evolved neural network correctly classified 86.7 (76.6)% of the firms in the training set, one (three) year(s) prior to failure, and 80.7 (66.0)% in the out-of-sample validation set. The LDA model correctly categorised 81.7 (75.0)% and 76.0 (64.7)% respectively. The results provide support for a hypothesis that corporate failure can be anticipated, and that a hybrid GA/NN model can outperform an LDA model in this domain. MSC codes: 62M45, 68W10, 90B50, 91C20

10.
The supervised classification of fuzzy data obtained from a random experiment is discussed. The data generation process is modelled through random fuzzy sets which, from a formal point of view, can be identified with certain function-valued random elements. First, one of the most versatile discriminant approaches in the context of functional data analysis is adapted to the specific case of interest. In this way, discriminant analysis based on nonparametric kernel density estimation is discussed. In general, this criterion is shown not to be optimal and to require large sample sizes. To avoid such inconveniences, a simpler approach which eludes the density estimation by considering conditional probabilities on certain balls is introduced. The approaches are applied to two experiments; one concerning fuzzy perceptions and linguistic labels and another one concerning flood analysis. The methods are tested against linear discriminant analysis and random K-fold cross validation.

11.
In this paper, we study the performance of various state-of-the-art classification algorithms applied to eight real-life credit scoring data sets. Some of the data sets originate from major Benelux and UK financial institutions. Different types of classifiers are evaluated and compared. Besides the well-known classification algorithms (e.g. logistic regression, discriminant analysis, k-nearest neighbour, neural networks and decision trees), this study also investigates the suitability and performance of some recently proposed, advanced kernel-based classification algorithms such as support vector machines and least-squares support vector machines (LS-SVMs). The performance is assessed using the classification accuracy and the area under the receiver operating characteristic curve. Statistically significant performance differences are identified using the appropriate test statistics. It is found that both the LS-SVM and neural network classifiers yield very good performance, but simple classifiers such as logistic regression and linear discriminant analysis also perform very well for credit scoring.

12.
This article introduces a new method of supervised learning based on linear discrimination among the vertices of a regular simplex in Euclidean space. Each vertex represents a different category. Discrimination is phrased as a regression problem involving ε-insensitive residuals and a quadratic penalty on the coefficients of the linear predictors. The objective function can be minimized by a primal MM (majorization–minimization) algorithm that (a) relies on quadratic majorization and iteratively re-weighted least squares, (b) is simpler to program than algorithms that pass to the dual of the original optimization problem, and (c) can be accelerated by step doubling. Limited comparisons on real and simulated data suggest that the MM algorithm is competitive in statistical accuracy and computational speed with the best currently available algorithms for discriminant analysis.
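The insensitive residual mentioned in the abstract above can be illustrated with a short sketch. It assumes the common definition in which the Euclidean norm of the residual vector is reduced by a tolerance ε and clipped at zero; the name `eps_insensitive` is hypothetical, and the exact form used in the article may differ.

```python
import math


def eps_insensitive(residual, eps):
    """Euclidean norm of a residual vector, reduced by eps and clipped at zero.

    Assumed form: max(0, ||r|| - eps). Residuals whose norm is at most eps
    therefore contribute nothing to the loss.
    """
    norm = math.sqrt(sum(r * r for r in residual))
    return max(0.0, norm - eps)
```

Under this definition a predictor that lands within distance ε of the correct simplex vertex incurs no penalty, which is what makes the residual "insensitive" to small errors.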

13.
We propose a new algorithm for sparse estimation of eigenvectors in generalized eigenvalue problems (GEPs). The GEP arises in a number of modern data-analytic situations and statistical methods, including principal component analysis (PCA), multiclass linear discriminant analysis (LDA), canonical correlation analysis (CCA), sufficient dimension reduction (SDR), and invariant co-ordinate selection. We propose to modify the standard generalized orthogonal iteration with a sparsity-inducing penalty for the eigenvectors. To achieve this goal, we generalize the equation-solving step of orthogonal iteration to a penalized convex optimization problem. The resulting algorithm, called penalized orthogonal iteration, provides accurate estimation of the true eigenspace, when it is sparse. Also proposed is a computationally more efficient alternative, which works well for PCA and LDA problems. Numerical studies reveal that the proposed algorithms are competitive, and that our tuning procedure works well. We demonstrate applications of the proposed algorithm to obtain sparse estimates for PCA, multiclass LDA, CCA, and SDR. Supplementary materials for this article are available online.

14.
Credit scoring is a risk evaluation task considered critical for financial institutions in order to avoid wrong decisions that may result in huge losses. Classification models are one of the most widely used groups of data mining approaches and greatly help decision makers and managers reduce the credit risk of granting credit to customers, rather than relying on intuitive experience or portfolio management. Accuracy is one of the most important criteria for choosing a credit-scoring model; hence, research directed at improving the effectiveness of credit scoring models has never stopped. In this article, a hybrid binary classification model, namely FMLP, is proposed for credit scoring, based on the basic concepts of fuzzy logic and artificial neural networks (ANNs). In the proposed model, instead of the crisp weights and biases used in traditional multilayer perceptrons (MLPs), fuzzy numbers are used in order to better model the uncertainties and complexities in financial data sets. Empirical results on three well-known benchmark credit data sets indicate that the proposed hybrid model outperforms its components as well as other classification models such as support vector machines (SVMs), K-nearest neighbor (KNN), quadratic discriminant analysis (QDA), and linear discriminant analysis (LDA). Therefore, it can be concluded that the proposed model can be an appropriate alternative tool for financial binary classification problems, especially in high-uncertainty conditions. © 2013 Wiley Periodicals, Inc. Complexity 18: 46–57, 2013

15.
Currently, prenatal screening for Down Syndrome (DS) uses the mother's age as well as three biochemical markers for risk prediction. Risk calculations for the biochemical markers use a quadratic discriminant function. In this paper we compare several classification procedures to quadratic discrimination methods for biochemical-based DS risk prediction, based on data from a prospective multicentre prenatal screening study. We investigate alternative methods including linear discriminant methods, logistic regression methods, neural network methods, and classification and regression-tree methods. Several experiments are performed, and in each experiment resampling methods are used to create training and testing data sets. The procedures on the test data set are summarized by the area under their receiver operating characteristic curves. In each experiment this process is repeated 500 times and then the classification procedures are compared. We find that several methods are superior to the currently used quadratic discriminant method for risk estimation for these data. The implications of these results for prenatal screening programs are discussed.
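The summary statistic used in the abstract above, the area under the receiver operating characteristic curve, can be computed directly from test-set scores. The sketch below uses the standard pairwise (Mann-Whitney) formulation; the function name `auc` and the split of scores into two lists are assumptions for illustration and are not taken from the study.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the positive
    case receives the higher score; ties count as half a win.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

In a resampling experiment of the kind described, a function like this would be applied to each test split in turn and the resulting values compared across classification procedures.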

16.
Partial LAD regression uses the L1 norm associated with least absolute deviations (LAD) regression while retaining the same algorithmic structure as univariate partial least squares (PLS) regression. We use the bootstrap in order to assess the partial LAD regression model performance and to make comparisons to PLS regression. We use a variety of examples coming from NIR experiments as well as two sets of experimental data.

17.
Soltysik and Yarnold propose, as a method for two-group multivariate optimal discriminant analysis (MultiODA), selecting a linear discriminant function based on an algorithm by Warmack and Gonzalez. An important assumption underlying the Warmack–Gonzalez algorithm is likely to be violated when the data in the discriminant training samples are discrete, and in particular when they are nominal, causing the algorithm to fail. We offer modest changes to the algorithm that overcome this limitation.

18.
This paper studies how to identify influential observations in the functional linear model in which the predictor is functional and the response is scalar. Measurement of the effects of a single observation on estimation and prediction when the model is estimated by the principal components method is undertaken. For that, three statistics are introduced for measuring the influence of each observation on estimation and prediction of the functional linear model with scalar response that are generalizations of the measures proposed for the standard regression model by [D.R. Cook, Detection of influential observations in linear regression, Technometrics 19 (1977) 15-18; D. Peña, A new statistic for influence in linear regression, Technometrics 47 (2005) 1-12] respectively. A smoothed bootstrap method is proposed to estimate the quantiles of the influence measures, which allows us to point out which observations have the larger influence on estimation and prediction. The behavior of the three statistics and of the bootstrap-based quantile estimation method is analyzed via a simulation study. Finally, the practical use of the proposed statistics is illustrated by the analysis of a real data example, which shows that the proposed measures are useful for detecting heterogeneity in the functional linear model with scalar response.

19.
Normal distribution based discriminant methods have been used for the classification of new entities into different groups based on a discriminant rule constructed from the learning set. In practice, if the groups are not homogeneous, then mixture discriminant analysis of Hastie and Tibshirani (J R Stat Soc Ser B 58(1):155–176, 1996) is a useful approach, assuming that the distribution of the feature vectors is a mixture of multivariate normals. In this paper a new logistic regression model for heterogeneous group structure of the learning set is proposed based on penalized multinomial mixture logit models. This approach is shown through simulation studies to be more effective. The results were compared with the standard mixture discriminant analysis approach using the probability of misclassification criterion. This comparison showed a slight reduction in the average probability of misclassification using this penalized multinomial mixture logit model as compared to the classical discriminant rules. It also showed better results when applied to real-life data problems, producing smaller errors.

20.
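In recent years, the theory and applications of functional data analysis have developed rapidly. In many practical applications, the response variable is subject to random right censoring. We consider a functional partially linear quantile regression model to describe the relationship between functional and scalar predictors and a right-censored response. The unknown slope function is approximated using functional principal component basis functions, and estimators of the unknown coefficients are obtained by minimizing an inverse-probability-weighted quantile loss function. The estimation method is easy to implement with weighted quantile regression routines. Under certain assumptions, the asymptotic normality of the finite-dimensional parameter estimator and the convergence rate of the slope function estimator are established. Finally, simulations and a real-data application demonstrate the effectiveness of the proposed method.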
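One common way to write an inverse-probability-weighted quantile loss of the kind described in the abstract above is sketched here. All notation is an assumption introduced for illustration and is not taken from the paper: $Y_i$ is the observed (possibly censored) response, $\delta_i$ is 1 when the response is uncensored and 0 otherwise, $\hat G$ is an estimate of the censoring survival function (for example Kaplan-Meier), $\xi_{ij}$ are functional principal component scores with coefficients $b_j$, $Z_i$ are the scalar predictors with coefficients $\theta$, and $\tau$ is the quantile level.

```latex
\rho_\tau(u) = u\,\bigl(\tau - \mathbf{1}\{u < 0\}\bigr),
\qquad
(\hat b, \hat\theta) = \arg\min_{b,\,\theta}
\sum_{i=1}^{n} \frac{\delta_i}{\hat G(Y_i)}\,
\rho_\tau\!\Bigl(Y_i - \sum_{j=1}^{m} b_j\,\xi_{ij} - Z_i^{\top}\theta\Bigr)
```

Because censored observations receive weight zero and uncensored ones a fixed positive weight, an objective of this form is an ordinary weighted quantile regression, which is consistent with the abstract's remark that the estimator is easy to implement with weighted quantile regression routines.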
