Similar literature
20 similar documents found (search time: 46 ms)
1.
Credit scoring is a method of modelling the potential risk of credit applications. Traditionally, logistic regression and discriminant analysis are the most widely used approaches for building scoring models in the industry. However, these methods suffer from several limitations, such as instability with high-dimensional data and small sample sizes, intensive variable selection effort and an inability to handle non-linear features efficiently. Most importantly, with these algorithms it is difficult to automate the modelling process, and when population changes occur the static models usually fail to adapt and may need to be rebuilt from scratch. In recent years, kernel learning approaches have been investigated to solve these problems. However, existing applications of such methods (in particular the SVM) to credit scoring have all focused on batch models and have not addressed the important problem of how to update the scoring model on-line. This paper presents a novel and practical adaptive scoring system based on an incremental kernel method. With this approach, the scoring model is adjusted by an on-line update procedure that always converges to the optimal solution without information loss or numerical difficulties. Non-linear features in the data are automatically included in the model through a kernel transformation. The approach requires no variable reduction effort and is robust for scoring data with a large number of attributes and highly unbalanced class distributions. Moreover, a new potential kernel function is introduced to further improve the predictive performance of the scoring model, and a kernel attribute ranking technique is used to add transparency to the final model. Experimental studies using real-world data sets have demonstrated the effectiveness of the proposed method.
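The abstract does not give the update equations of its incremental kernel method, so the sketch below is only a loose illustration of the on-line kernel idea, using a classical kernel perceptron on synthetic applicant data (all data and parameters are invented for the sketch). Unlike the paper's procedure, this simple learner does not guarantee convergence to the exact batch solution.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(x, y, gamma=0.5):
    # Gaussian (RBF) kernel between two feature vectors.
    return np.exp(-gamma * np.sum((x - y) ** 2))

# Toy application stream: two Gaussian blobs (good vs bad applicants).
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([1] * 100 + [-1] * 100)
order = rng.permutation(200)
X, y = X[order], y[order]

# Online kernel perceptron: the model is updated one application at a
# time; misclassified points are stored with their labels as kernel
# expansion terms, so non-linear structure enters via the kernel.
support, alphas = [], []
for xi, yi in zip(X, y):
    score = sum(a * rbf(s, xi) for s, a in zip(support, alphas))
    if yi * score <= 0:            # mistake -> incremental update
        support.append(xi)
        alphas.append(yi)

def predict(x):
    return np.sign(sum(a * rbf(s, x) for s, a in zip(support, alphas)))

acc = np.mean([predict(xi) == yi for xi, yi in zip(X, y)])
```

The stored points play a role analogous to support vectors; a production system would additionally bound their number.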

2.
The purpose of the present paper is to explore the ability of neural networks such as multilayer perceptrons and modular neural networks, and traditional techniques such as linear discriminant analysis and logistic regression, in building credit scoring models in the credit union environment. Also, since funding and small sample size often preclude the use of customized credit scoring models at small credit unions, we investigate the performance of generic models and compare them with customized models. Our results indicate that customized neural networks offer a very promising avenue if the measure of performance is percentage of bad loans correctly classified. However, if the measure of performance is percentage of good and bad loans correctly classified, logistic regression models are comparable to the neural networks approach. The performance of generic models was not as good as the customized models, particularly when it came to correctly classifying bad loans. Although we found significant differences in the results for the three credit unions, our modular neural network could not accommodate these differences, indicating that more innovative architectures might be necessary for building effective generic models.

3.
The logistic regression framework has long been the most widely used statistical method for assessing customer credit risk. Recently, a more pragmatic approach has been adopted, where the first concern is credit risk prediction rather than explanation. In this context, several classification techniques, such as support vector machines, have been shown to perform well on credit scoring. While the investigation of better classifiers is an important research topic, the methodology chosen in real-world applications has to deal with the challenges arising from data collected in the industry. Such data are often highly unbalanced, part of the information can be missing, and common hypotheses, such as the i.i.d. assumption, can be violated. In this paper we present a case study based on a sample of IBM Italian customers which presents all of these challenges. The main objective is to build and validate robust models able to handle missing information, class imbalance and non-i.i.d. data points. We define a missing-data imputation method and propose the use of an ensemble classification technique, subagging, which is particularly suitable for highly unbalanced data such as credit scoring data. Both the imputation and subagging steps are embedded in a customized cross-validation loop which handles dependencies between different credit requests. The methodology has been applied using several classifiers (kernel support vector machines, nearest neighbours, decision trees, AdaBoost) and their subagged versions. Subagging improves the performance of the base classifier, and we show that subagged decision trees achieve better performance while keeping the model simple and reasonably interpretable.
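A minimal sketch of the subagging idea: aggregate base learners fitted on subsamples drawn without replacement, here additionally balanced across classes to cope with unbalanced data. The nearest-centroid base learner and all data are stand-ins for the paper's decision trees and SVMs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Unbalanced toy data: 5% "bad" (1), 95% "good" (0).
X_bad = rng.normal(2.0, 1.0, (25, 2))
X_good = rng.normal(0.0, 1.0, (475, 2))
X = np.vstack([X_bad, X_good])
y = np.array([1] * 25 + [0] * 475)

def centroid_fit(Xs, ys):
    # Base learner: nearest class centroid (stands in for a tree).
    return Xs[ys == 0].mean(axis=0), Xs[ys == 1].mean(axis=0)

def centroid_predict(model, Xq):
    c0, c1 = model
    d0 = np.linalg.norm(Xq - c0, axis=1)
    d1 = np.linalg.norm(Xq - c1, axis=1)
    return (d1 < d0).astype(int)

# Subagging: each base learner is fitted on a balanced subsample drawn
# WITHOUT replacement, so every learner sees a 50/50 class mix.
def subagg_fit(X, y, B=25, m=20):
    models = []
    idx_bad, idx_good = np.where(y == 1)[0], np.where(y == 0)[0]
    for _ in range(B):
        sb = rng.choice(idx_bad, size=m, replace=False)
        sg = rng.choice(idx_good, size=m, replace=False)
        take = np.concatenate([sb, sg])
        models.append(centroid_fit(X[take], y[take]))
    return models

def subagg_predict(models, Xq):
    votes = np.mean([centroid_predict(m, Xq) for m in models], axis=0)
    return (votes >= 0.5).astype(int)

models = subagg_fit(X, y)
pred = subagg_predict(models, X)
bad_recall = pred[y == 1].mean()   # fraction of defaulters caught
```

Because every subsample is balanced, the minority "bad" class is no longer drowned out by the majority, which is the point of subagging for scoring data.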

4.
Data-based scorecards, such as those used in credit scoring, age with time and need to be rebuilt or readjusted. Unlike the huge literature on modelling the replacement and maintenance of equipment, there have been hardly any models that deal with this problem for scorecards. This paper identifies an effective way of describing the predictive ability of a scorecard and from this derives a simple model of how its predictive ability will develop. Using a dynamic programming approach one can then find when it is optimal to rebuild and when to readjust a scorecard. Failing to readjust or rebuild scorecards as they aged was one of the defects in credit scoring identified in the investigations into the sub-prime mortgage crisis.
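The paper's dynamic programme is not specified in the abstract; the toy value iteration below only illustrates the shape of a rebuild-versus-readjust decision. The degradation levels, per-period losses and action costs are wholly hypothetical.

```python
# Illustrative dynamic program (not the paper's model): predictive
# ability degrades one level per period; "readjust" recovers one level
# at small cost, "rebuild" restores full ability at high cost.
# All numbers below are hypothetical.
N_STATES = 6          # 0 = freshly built, 5 = fully degraded
HORIZON = 24          # planning periods (e.g. months)
LOSS = [0, 1, 2, 4, 7, 11]   # per-period profit loss by degradation level
C_READJ, C_REBUILD = 2, 8

# V[t][s] = minimal total cost from period t onward in state s.
V = [[0.0] * N_STATES for _ in range(HORIZON + 1)]
policy = [[None] * N_STATES for _ in range(HORIZON)]
for t in range(HORIZON - 1, -1, -1):
    for s in range(N_STATES):
        aged = min(s + 1, N_STATES - 1)           # natural ageing
        keep = LOSS[s] + V[t + 1][aged]
        # readjust: recover one level now, then age back to s next period
        readj = C_READJ + LOSS[max(s - 1, 0)] + V[t + 1][s]
        rebuild = C_REBUILD + LOSS[0] + V[t + 1][1]
        best = min(keep, readj, rebuild)
        V[t][s] = best
        policy[t][s] = ["keep", "readjust", "rebuild"][
            [keep, readj, rebuild].index(best)]
```

With these numbers the backward recursion prescribes rebuilding a fully degraded scorecard early in the horizon and keeping a fresh one, mirroring the rebuild/readjust trade-off the paper optimises.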

5.
The features used may have an important effect on the performance of credit scoring models. The process of choosing the best set of features for credit scoring models is usually unsystematic and dominated by somewhat arbitrary trial and error. This paper presents an empirical study of four machine learning feature selection methods, which provide an automatic data mining technique for reducing the feature space. The study illustrates how the four methods (ReliefF, correlation-based, consistency-based and wrapper algorithms) help to improve three aspects of the performance of scoring models: model simplicity, model speed and model accuracy. The experiments are conducted on real data sets using four classification algorithms: model tree (M5), neural network (multi-layer perceptron with back-propagation), logistic regression, and k-nearest-neighbours.
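As a rough illustration of the correlation-based family of filters studied here, the sketch below ranks synthetic features by absolute correlation with the label and keeps the top k. This is not the paper's exact CFS algorithm (which scores feature subsets, not single features), and a wrapper method would instead retrain the classifier on candidate subsets.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy applicant data: 2 informative features, 8 pure-noise features.
n = 400
informative = rng.normal(0, 1, (n, 2))
noise = rng.normal(0, 1, (n, 8))
X = np.hstack([informative, noise])
y = (informative.sum(axis=1) + 0.3 * rng.normal(0, 1, n) > 0).astype(int)

# Correlation-based filter: score each feature by |corr(feature, label)|
# and keep the k highest-scoring features.
def corr_filter(X, y, k):
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]

selected = corr_filter(X, y, k=2)
```

On this toy problem the filter recovers the two informative columns, shrinking the feature space fivefold, which is the "model simplicity and speed" benefit the study measures.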

6.
Application of feature selection methods to the choice of credit scoring indicators   (Total citations: 2; self-citations: 0; citations by others: 2)
The indicator variables used in a credit scoring model have an important effect on its performance, and the scientific rigour and standardization of indicator selection still need improvement. This paper studies the application of feature selection methods from machine learning to the quantitative determination of the indicator system of credit scoring models. Taking a real credit assessment problem as an example, four feature selection methods (ReliefF, correlation-based, consistency-based and wrapper) are compared experimentally, verifying that feature selection can improve the performance of credit scoring models in three respects: parsimony, speed and accuracy. Among them, the consistency-based and wrapper methods outperformed the ReliefF and correlation-based methods.

7.
Customer credit assessment is an important part of the day-to-day operations of banks and other financial institutions. Defaulting customers are usually a minority of the overall customer population, while customers who repay on time form the majority; this is the class imbalance problem commonly encountered in customer credit assessment. Existing credit assessment methods cannot yet effectively handle the class imbalance caused by the scarcity of minority-class samples. This study introduces transfer learning to integrate information from inside and outside the system in order to address this problem. To use information from external minority-class samples more efficiently, a new transfer learning model is constructed: building on an ensemble-based transfer bagging model, two-stage sampling and the group method of data handling are used to improve its base-model generation and ensemble strategy, respectively. An empirical study on credit card customer data from a commercial bank in Chongqing shows that, compared with commonly used credit assessment methods, the new model better handles the impact of class imbalance under absolute scarcity, and in particular achieves better predictive accuracy for the minority class of defaulting customers.

8.
Recent years have seen the development of many credit scoring models for assessing the creditworthiness of loan applicants. Traditional credit scoring methodology has involved statistical and mathematical programming techniques such as discriminant analysis, linear and logistic regression, linear and quadratic programming, and decision trees. However, the importance of credit-granting decisions for financial institutions has caused growing interest in a variety of computational intelligence techniques. This paper concentrates on evolutionary computing, viewed as one of the most promising paradigms of computational intelligence. Taking into account the synergistic relationship between the Economics and Computer Science communities, the aim of this paper is to summarize the most recent developments in the application of evolutionary algorithms to credit scoring by means of a thorough review of scientific articles published during the period 2000–2012.

9.
Credit scoring is a risk evaluation task that is critical for financial institutions, since wrong decisions may result in huge losses. Classification models are among the most widely used data mining approaches and greatly help decision makers and managers to reduce the risk of granting credit to customers, instead of relying on intuitive experience or portfolio management. Accuracy is one of the most important criteria for choosing a credit-scoring model, and hence research directed at improving the effectiveness of credit scoring models has never stopped. In this article, a hybrid binary classification model, FMLP, is proposed for credit scoring, based on the basic concepts of fuzzy logic and artificial neural networks (ANNs). In the proposed model, fuzzy numbers are used instead of the crisp weights and biases of traditional multilayer perceptrons (MLPs), in order to better model the uncertainties and complexities of financial data sets. Empirical results on three well-known benchmark credit data sets indicate that the proposed hybrid model outperforms its components as well as other classification models such as support vector machines (SVMs), K-nearest neighbour (KNN), quadratic discriminant analysis (QDA), and linear discriminant analysis (LDA). It can therefore be concluded that the proposed model is an appropriate alternative tool for financial binary classification problems, especially under conditions of high uncertainty. © 2013 Wiley Periodicals, Inc. Complexity 18: 46–57, 2013

10.
A personal credit scoring model based on principal component analysis and neural networks (PCA-NN) is applied in order to obtain better predictive classification ability. Empirical analysis and comparison with SVMs, linear discriminant analysis, logistic regression, nearest-neighbour estimation, classification and regression trees, and plain neural networks show that the method achieves very good predictive performance.
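A rough sketch of the PCA-NN pipeline on synthetic data: PCA via SVD compresses correlated credit indicators into a few components, and a single-layer sigmoid classifier then stands in for the paper's neural network. All data and hyperparameters here are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy credit data: 10 correlated indicators driven by 2 latent factors.
n = 300
latent = rng.normal(0, 1, (n, 2))
W = rng.normal(0, 1, (2, 10))
X = latent @ W + 0.1 * rng.normal(0, 1, (n, 10))
y = (latent[:, 0] - latent[:, 1] > 0).astype(float)

# Step 1: PCA via SVD; keep the leading components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
Z = Xc @ Vt[:k].T            # principal-component scores

# Step 2: single-layer sigmoid classifier on the PC scores
# (a simple stand-in for the paper's neural network), trained
# by batch gradient descent on the log-loss.
w = np.zeros(k); b = 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(Z @ w + b)))
    g = p - y
    w -= 0.1 * Z.T @ g / n
    b -= 0.1 * g.mean()

p = 1 / (1 + np.exp(-(Z @ w + b)))
acc = np.mean((p > 0.5) == (y == 1))
```

The PCA step removes the collinearity among indicators before the classifier sees them, which is the usual motivation for the PCA-NN combination.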

11.
Credit applicants are assigned to good or bad risk classes according to their record of defaulting. Each applicant is described by a high-dimensional input vector of situational characteristics and by an associated class label. A statistical model, which maps the inputs to the labels, can decide whether a new credit applicant should be accepted or rejected by predicting the class label given the new inputs. Support vector machines (SVMs) from statistical learning theory can build such models from the data, requiring extremely weak prior assumptions about the model structure. Furthermore, SVMs divide a set of labelled credit applicants into subsets of ‘typical’ and ‘critical’ patterns. The correct class label of a typical pattern is usually very easy to predict, even with linear classification methods; such patterns do not contain much information about the classification boundary. The critical patterns (the support vectors) contain the less trivial training examples. For instance, linear discriminant analysis with prior training-subset selection via SVM also leads to improved generalization. Using non-linear SVMs, more ‘surprising’ critical regions may be detected, but owing to the relative sparseness of the data this potential seems to be limited in credit scoring practice.
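To illustrate the typical/critical split, the sketch below trains a linear SVM with the Pegasos subgradient method (a lightweight stand-in for a full SVM solver) and labels margin violators as ‘critical’ patterns, the analogue of the support vectors. Data and parameters are synthetic.

```python
import numpy as np

rng = np.random.default_rng(4)

# Two overlapping applicant classes, symmetric about the origin.
X = np.vstack([rng.normal(-1, 1, (150, 2)), rng.normal(1, 1, (150, 2))])
y = np.array([-1] * 150 + [1] * 150)

# Linear SVM via the Pegasos subgradient method (no bias term needed,
# since the toy classes are symmetric about the origin).
lam, w = 0.01, np.zeros(2)
for t in range(1, 5001):
    i = rng.integers(len(y))
    eta = 1 / (lam * t)
    if y[i] * (X[i] @ w) < 1:          # hinge-loss violation
        w = (1 - eta * lam) * w + eta * y[i] * X[i]
    else:
        w = (1 - eta * lam) * w

# 'Critical' patterns: points on or inside the margin (the support
# vectors); 'typical' patterns sit safely beyond it.
margin = y * (X @ w)
critical = margin < 1.0
typical = ~critical
acc = np.mean(np.sign(X @ w) == y)
```

Training a second, simpler classifier only on the critical subset is the subset-selection idea the abstract describes for linear discriminant analysis.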

12.
Corporate credit granting is a key commercial activity of financial institutions nowadays. A critical first step in the credit granting process usually involves a careful financial analysis of the creditworthiness of the potential client. Wrong decisions result either in foregoing valuable clients or, more severely, in substantial capital losses if the client subsequently defaults. It is thus of crucial importance to develop models that estimate the probability of corporate bankruptcy with a high degree of accuracy. Many studies focused on the use of financial ratios in linear statistical models, such as linear discriminant analysis and logistic regression. However, the obtained error rates are often high. In this paper, Least Squares Support Vector Machine (LS-SVM) classifiers, also known as kernel Fisher discriminant analysis, are applied within the Bayesian evidence framework in order to automatically infer and analyze the creditworthiness of potential corporate clients. The inferred posterior class probabilities of bankruptcy are then used to analyze the sensitivity of the classifier output with respect to the given inputs and to assist in the credit assignment decision making process. The suggested nonlinear kernel based classifiers yield better performances than linear discriminant analysis and logistic regression when applied to a real-life data set concerning commercial credit granting to mid-cap Belgian and Dutch firms.

13.
This paper first analyses, from the perspective of missing-data mechanisms, the sample bias problem that arises in the development and application of credit scoring models, and proposes that reject inference can be used to address it. Then, building on the Heckman two-stage model previously applied to reject inference, two new two-stage approaches are proposed for handling reject inference in credit scoring: a quasi-likelihood two-stage model and a generalized partially linear model. Empirical analysis shows that both methods yield very satisfactory results. The study also finds that external data, such as the People's Bank of China credit registry, are the most effective basis for reject inference; when such data are lacking, the quasi-likelihood two-stage model and the generalized partially linear model are comparatively effective reject inference methods.

14.
Conditional density estimation in a parametric regression setting, where the problem is to estimate a parametric density of the response given the predictor, is a classical and prominent topic in regression analysis. This article explores this problem in a nonparametric setting where no assumption about the shape of the underlying conditional density is made. For the first time in the literature, it is proved that there exists a nonparametric data-driven estimator that matches the performance of an oracle which: (i) knows the underlying conditional density, (ii) adapts to an unknown design of predictors, (iii) performs a dimension reduction if the response does not depend on the predictor, (iv) is minimax over a vast set of anisotropic bivariate function classes. All these results are established via an oracle inequality which is on par with those known in the univariate density estimation literature. Further, the asymptotically optimal estimator is tested on an interesting actuarial example which explores the relationship between credit scoring and premiums for basic auto-insurance for 54 undergraduate college students.

15.
Behavioural scoring models are generally used to estimate the probability that a customer of a financial institution who owns a credit product will default on this product in a fixed time horizon. However, a single customer usually purchases many credit products from an institution, while behavioural scoring models generally treat each of these products independently. In order to make credit risk management easier and more efficient, it is interesting to develop customer default scoring models. These models estimate the probability that a customer of a certain financial institution will have credit issues with at least one product in a fixed time horizon. In this study, three strategies to develop customer default scoring models are described: one regularly used by financial institutions and two proposed herein. The performance of these strategies is compared using an actual database supplied by a financial institution and a Monte Carlo simulation study.

16.
One of the aims of credit scoring models is to predict the probability of repayment of any applicant, and yet such models are usually parameterised using a sample of accepted applicants only. This may lead to biased estimates of the parameters. In this paper we examine two issues. First, we compare the classification accuracy of a model based only on accepted applicants relative to one based on a sample of all applicants. We find only a minimal difference, given the cutoff scores for the old model used by the data supplier. Using a simulated model we examine the predictive performance of models estimated from bands of applicants, ranked by predicted creditworthiness. We find that the lower the risk band of the training sample, the less accurate the predictions for all applicants. We also find that the lower the risk band of the training sample, the greater the overestimate of the true performance of the model when tested on a sample of applicants within the same risk band, as a financial institution would do. The overestimation may be very large. Second, we examine the predictive accuracy of a bivariate probit model with selection (BVP). This parameterises the accept–reject model, allowing for (unknown) omitted variables to be correlated with those of the original good–bad model. The BVP model may improve accuracy if the loan officer has overridden a scoring rule. We find that a small improvement is sometimes possible when using the BVP model.

17.
In this paper, we study the performance of various state-of-the-art classification algorithms applied to eight real-life credit scoring data sets. Some of the data sets originate from major Benelux and UK financial institutions. Different types of classifiers are evaluated and compared. Besides the well-known classification algorithms (e.g. logistic regression, discriminant analysis, k-nearest neighbour, neural networks and decision trees), this study also investigates the suitability and performance of some recently proposed, advanced kernel-based classification algorithms such as support vector machines and least-squares support vector machines (LS-SVMs). Performance is assessed using the classification accuracy and the area under the receiver operating characteristic curve. Statistically significant performance differences are identified using the appropriate test statistics. It is found that both the LS-SVM and neural network classifiers yield a very good performance, but also that simple classifiers such as logistic regression and linear discriminant analysis perform very well for credit scoring.
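A minimal sketch of the evaluation protocol described here: fit a model on a training split and compare test-set AUCs, computing the area under the ROC curve via the rank-sum (Mann-Whitney) identity. The logistic model and the single-indicator benchmark below are illustrative stand-ins for the paper's classifier suite, on invented data.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy scoring data with a train/test split.
n = 600
X = rng.normal(0, 1, (n, 3))
logit = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
Xtr, ytr, Xte, yte = X[:400], y[:400], X[400:], y[400:]

def fit_logistic(X, y, iters=3000, lr=0.1):
    # Plain gradient descent on the logistic log-loss.
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(iters):
        p = 1 / (1 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w, b

def auc(scores, y):
    # Area under the ROC curve via the rank-sum identity.
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n1 = y.sum(); n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n0 * n1)

w, b = fit_logistic(Xtr, ytr)
auc_lr = auc(Xte @ w + b, yte)
auc_naive = auc(Xte[:, 0], yte)   # single-indicator benchmark
```

A full benchmarking study would add repeated splits and significance tests on the AUC differences, as the paper does.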

18.
To exploit the advantages of SVMs in personal credit scoring while overcoming their shortcomings, a personal credit scoring model based on a committee machine of support vector machines is proposed. The model is combined with a method for constructing new training samples based on attribute utility function estimation. Empirical analysis and comparison with the plain SVM approach show that the model achieves better, faster and more adaptable predictive classification.

19.
Dataset shift is present in almost all real-world applications, since most of them are constantly dealing with changing environments. Detecting fractures in datasets on time allows recalibrating the models before a significant decrease in the model’s performance is observed. Since small changes are normal in most applications and do not justify the efforts that a model recalibration requires, we are only interested in identifying those changes that are critical for the correct functioning of the model. In this work we propose a model-dependent backtesting strategy designed to identify significant changes in the covariates, relating a confidence zone of the change to a maximal deviance measure obtained from the coefficients of the model. Using logistic regression as a predictive approach, we performed experiments on simulated data, and on a real-world credit scoring dataset. The results show that the proposed method has better performance than traditional approaches, consistently identifying major changes in variables while taking into account important characteristics of the problem, such as sample sizes and variances, and uncertainty in the coefficients.
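The paper's backtesting statistic is not reproduced here; the sketch below only illustrates the underlying idea of relating coefficient changes to coefficient uncertainty: refit a logistic model on two periods and flag coefficients whose change is large relative to approximate standard errors. The threshold and data are invented.

```python
import numpy as np

rng = np.random.default_rng(6)

def fit_logistic(X, y, iters=4000, lr=0.2):
    # Gradient descent on the log-loss; X already contains an intercept column.
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def coef_se(X, w):
    # Approximate standard errors from the inverse observed Fisher information.
    p = 1 / (1 + np.exp(-(X @ w)))
    H = (X * (p * (1 - p))[:, None]).T @ X
    return np.sqrt(np.diag(np.linalg.inv(H)))

n = 1000
X1 = np.hstack([np.ones((n, 1)), rng.normal(0, 1, (n, 2))])
w_true = np.array([-1.0, 1.0, 0.5])
y1 = (rng.random(n) < 1 / (1 + np.exp(-(X1 @ w_true)))).astype(float)

# Period 2: one covariate's effect has genuinely shifted (0.5 -> 1.5).
w_shift = np.array([-1.0, 1.0, 1.5])
X2 = np.hstack([np.ones((n, 1)), rng.normal(0, 1, (n, 2))])
y2 = (rng.random(n) < 1 / (1 + np.exp(-(X2 @ w_shift)))).astype(float)

w1, w2 = fit_logistic(X1, y1), fit_logistic(X2, y2)
z = np.abs(w1 - w2) / np.hypot(coef_se(X1, w1), coef_se(X2, w2))
flagged = z > 3.0   # flag only changes well outside coefficient noise
```

Benign sampling noise leaves the unchanged coefficients unflagged, which captures the paper's point that only changes critical for the model should trigger recalibration.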

20.
If a credit scoring model is built using only applicants who have been previously accepted for credit, such a non-random sample selection may produce bias in the estimated model parameters and accordingly the model's predictions of repayment performance may not be optimal. Previous empirical research suggests that omission of rejected applicants has a detrimental impact on model estimation and prediction. This paper explores the extent to which, given the previous cutoff score applied to decide on accepted applicants, the number of included variables influences the efficacy of a commonly used reject inference technique, reweighting. The analysis benefits from the availability of a rare sample, where virtually no applicant was denied credit. The general indication is that the efficacy of reject inference is little influenced by either model leanness or interaction between model leanness and the rejection rate that determined the sample. However, there remains some hint that very lean models may benefit from reject inference where modelling is conducted on data characterized by a very high rate of applicant rejection.
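A minimal sketch of the reweighting technique examined in the paper: band applicants by the old score, then weight each accepted case by the band's all-to-accepted ratio, so accepts near the cutoff stand in for rejects when the new model is fitted with case weights. The data, banding and cutoff are invented; note that bands containing no accepts (deep rejects) remain unrepresented, which is a known limitation of the method.

```python
import numpy as np

rng = np.random.default_rng(7)

# Through-the-door population with a known "old" score used for accepts.
n = 2000
x = rng.normal(0, 1, n)                      # applicant characteristic
old_score = x + rng.normal(0, 0.5, n)        # previous scorecard output
accepted = old_score > -0.3                  # previous cutoff
y = (rng.random(n) < 1 / (1 + np.exp(-(1.2 * x - 0.5)))).astype(int)

# Reweighting: band applicants by the old score, then give each accepted
# case the weight (all applicants in band) / (accepted in band).
edges = np.quantile(old_score, np.linspace(0.1, 0.9, 9))
bands = np.digitize(old_score, edges)
weights = np.zeros(n)
for band in np.unique(bands):
    in_band = bands == band
    acc_band = in_band & accepted
    if acc_band.any():
        weights[acc_band] = in_band.sum() / acc_band.sum()
```

The resulting `weights` would then be passed as case weights to the scoring model fit; rejects get weight zero, and accepts in bands with many rejects are up-weighted.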

