Similar articles
 20 similar articles found
1.
This paper presents a new approach for consumer credit scoring by tailoring a profit-based classification performance measure to credit risk modeling. This performance measure takes into account the expected profits and losses of credit granting and thereby better aligns the model developers’ objectives with those of the lending company. It is based on the Expected Maximum Profit (EMP) measure and is used to find a trade-off between the expected losses (driven by the exposure of the loan and the loss given default) and the operational income given by the loan. Additionally, one of the major advantages of the proposed measure is that it permits calculation of the optimal cutoff value, which is necessary for model implementation. To test the proposed approach, we use a dataset of loans granted by a government institution and benchmark the accuracy and monetary gain of using EMP, accuracy, and the area under the ROC curve as measures for selecting model parameters and for determining the respective cutoff values. The results show that our proposed profit-based classification measure outperforms the alternative approaches in terms of both accuracy and monetary value in the test set, and that it facilitates model deployment.
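The idea of choosing a cutoff by profit rather than by accuracy can be illustrated with a minimal sketch. This is not the EMP measure itself, which the abstract does not spell out. It assumes a fixed gain per repaid loan and a fixed loss per defaulted loan, and the names `best_cutoff`, `gain_good`, and `loss_bad` are hypothetical.

```python
def best_cutoff(scores, defaulted, gain_good, loss_bad):
    """Return (cutoff, profit); applicants with a default score below cutoff are accepted."""
    # Candidate cutoffs: every distinct score, plus one value above the maximum (accept all)
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    best = (candidates[0], 0.0)  # the lowest cutoff rejects everyone, so profit is zero
    for c in candidates:
        profit = 0.0
        for s, d in zip(scores, defaulted):
            if s < c:  # loan granted
                # Assumption: fixed gain for a repaid loan, fixed loss for a default
                profit += -loss_bad if d else gain_good
        if profit > best[1]:
            best = (c, profit)
    return best
```

With predicted default scores and observed outcomes, `best_cutoff(scores, defaulted, 1.0, 3.0)` returns the cutoff that maximizes total profit on that sample. A fuller treatment would let the loss vary per loan with exposure and loss given default.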

2.
The logistic regression framework has long been the most widely used statistical method for assessing customer credit risk. Recently, a more pragmatic approach has been adopted, in which the primary concern is credit risk prediction rather than explanation. In this context, several classification techniques, support vector machines among them, have been shown to perform well on credit scoring. While the investigation of better classifiers is an important research topic, the specific methodology chosen in real-world applications has to deal with the challenges arising from the data collected in industry. Such data are often highly unbalanced, part of the information can be missing, and some common hypotheses, such as the i.i.d. assumption, can be violated. In this paper we present a case study based on a sample of IBM Italian customers, which presents all the challenges mentioned above. The main objective is to build and validate robust models able to handle missing information, class imbalance, and non-i.i.d. data points. We define a missing data imputation method and propose the use of an ensemble classification technique, subagging, which is particularly suitable for highly unbalanced data such as credit scoring data. Both the imputation and subagging steps are embedded in a customized cross-validation loop, which handles dependencies between different credit requests. The methodology has been applied using several classifiers (kernel support vector machines, nearest neighbors, decision trees, AdaBoost) and their subagged versions. The use of subagging improves the performance of the base classifier, and we show that subagged decision trees achieve better performance while keeping the model simple and reasonably interpretable.
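Subagging (subsample aggregating) can be sketched as follows: each base learner is fit on a subsample drawn without replacement, and the predictions are combined by majority vote. This is a minimal illustration, not the paper's pipeline. The nearest-neighbor base learner, the class-balanced draw, and all function and parameter names are assumptions chosen to keep the example short.

```python
import random
from collections import Counter

def nearest_neighbor_predict(train, x):
    """Base learner: label of the closest training point (squared Euclidean distance)."""
    return min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))[1]

def subagging_predict(data, x, n_models=25, per_class=2, seed=0):
    """Majority vote over base learners, each using a subsample drawn without replacement."""
    rng = random.Random(seed)
    by_class = {}
    for features, label in data:
        by_class.setdefault(label, []).append((features, label))
    votes = Counter()
    for _ in range(n_models):
        # Assumption: take the same number of points from every class to offset imbalance
        subsample = []
        for rows in by_class.values():
            subsample.extend(rng.sample(rows, min(per_class, len(rows))))
        votes[nearest_neighbor_predict(subsample, x)] += 1
    return votes.most_common(1)[0][0]
```

Here `data` is a list of `(features, label)` pairs with `features` a tuple of numbers. Drawing without replacement is what distinguishes subagging from bagging, which uses bootstrap samples.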

3.
In recent years, support vector machines (SVMs) have been successfully applied to a wide range of problems. However, since the classifier is described as a complex mathematical function, it is rather incomprehensible to humans. This opacity prevents SVMs from being used in many real-life applications where both accuracy and comprehensibility are required, such as medical diagnosis and credit risk evaluation. To overcome this limitation, rules can be extracted from the trained SVM that are interpretable by humans and keep as much of the accuracy of the SVM as possible. In this paper, we provide an overview of the recently proposed rule extraction techniques for SVMs and introduce two others taken from the artificial neural networks domain, namely Trepan and G-REX. The described techniques are compared using publicly available datasets, such as Ripley’s synthetic dataset and the multi-class iris dataset. We also look at medical diagnosis and credit scoring, where comprehensibility is a key requirement and even a regulatory recommendation. Our experiments show that the SVM rule extraction techniques lose only a small percentage in performance compared to SVMs and therefore rank at the top of comprehensible classification techniques.

4.
Deductive reasoning with classical logic is hampered when imprecision is present in the variables, although human reasoning can cope quite adequately with vague concepts. A new approach to reasoning which allows imprecise conclusions to be drawn consistently from imprecise premises was introduced by Baldwin [2]. This method is economical in calculation as it avoids the high dimensionality that fuzzy set representations often involve. This paper briefly reviews the method from an operational viewpoint, isolating the individual processes that are used in the method. A feasible algorithm for computing each process is then presented. It is assumed that the reader is familiar with the concept of, and operations on, fuzzy sets introduced by Zadeh [14].

5.
Credit risk analysis is an active research area in financial risk management, and credit scoring is one of the key analytical techniques in credit risk evaluation. In this study, a novel intelligent-agent-based fuzzy group decision making (GDM) model is proposed as an effective multicriteria decision analysis (MCDA) tool for credit risk evaluation. In the proposed model, several artificial intelligence techniques, acting as intelligent agents, are first used to analyze and evaluate the risk levels of credit applicants over a set of pre-defined criteria. These evaluation results, generated by the different intelligent agents, are then fuzzified into fuzzy opinions on the credit risk level of the applicants. Finally, these fuzzified opinions are aggregated into a group consensus, and the fuzzy aggregated consensus is defuzzified into a crisp aggregated value to support the final decision of decision-makers at credit-granting institutions. For illustration and verification purposes, a simple numerical example and three real-world credit application approval datasets are presented.
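The fuzzify, aggregate, and defuzzify sequence described above can be sketched in a few lines. The abstract does not state which fuzzy representation is used, so this sketch assumes triangular fuzzy numbers built from the minimum, mean, and maximum of each agent's scores, a weighted-sum aggregation, and centroid defuzzification. All function names are hypothetical.

```python
def fuzzify(scores):
    """Turn one agent's scores into a triangular fuzzy number (low, mid, high)."""
    # Assumption: low = minimum score, mid = mean score, high = maximum score
    return (min(scores), sum(scores) / len(scores), max(scores))

def aggregate(opinions, weights):
    """Weighted sum of triangular fuzzy numbers, taken component by component."""
    return tuple(sum(w * o[i] for o, w in zip(opinions, weights)) for i in range(3))

def defuzzify(tfn):
    """Centroid of a triangular fuzzy number: the mean of its three vertices."""
    low, mid, high = tfn
    return (low + mid + high) / 3
```

For two agents weighted equally, `defuzzify(aggregate([fuzzify(a), fuzzify(b)], [0.5, 0.5]))` yields a single crisp risk value that can be compared against a decision threshold.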

6.
In this paper, we propose a novel method for mining association rules for classification problems, namely AFSRC (AFS association rules for classification), realized in the framework of axiomatic fuzzy set (AFS) theory. This model provides a simple and efficient rule generation mechanism. It can also retain meaningful rules for imbalanced classes by fuzzifying the concept of the class support of a rule. In addition, AFSRC can handle different data types occurring simultaneously. Furthermore, the new model can produce membership functions automatically by processing the available data. An extensive suite of experiments is reported, offering a comprehensive comparison of the performance of the method with that of other methods available in the literature. The experimental results show that AFSRC outperforms most of the other methods in terms of accuracy and interpretability. AFSRC forms a classifier with high accuracy and a smaller, more interpretable rule base while retaining a sound balance between these two characteristics.

7.
This paper discusses models for evaluating credit risk in relation to the retailing industry. Hunt’s [Hunt, S.D., 2000. A General Theory of Competition. Sage Publications Inc., California] Resource–Advantage Theory of Competition is used as a basis for variable selection, given the theory’s relevance to retail competition. The study focuses on the US retail market. Four standard credit scoring methodologies (Naïve Bayes, Logistic Regression, Recursive Partitioning, and Artificial Neural Network) are compared with Sequential Minimal Optimization (SMO), using a sample of 195 healthy companies and 51 distressed firms over five time periods from 1994 to 2002.

8.
Redundant fuzzy rules exclusion by genetic algorithms   (Total citations: 1; self-citations: 0; citations by others: 1)
A genetic-algorithm-based method is proposed for excluding potentially redundant if-then fuzzy rules that have been extracted from numerical input-output data. The main idea is the separation of the input space into activation rectangles corresponding to certain output intervals. The generation of the fuzzy rules and the membership functions is based on these activation rectangles, and an appropriate fuzzy rule inference mechanism is proposed. As the method usually produces too many rules, it is necessary to exclude the potentially redundant if-then rules. A concept is proposed for varying the family of sensitivity parameters that define the overlapping of the fuzzy regions. Genetic algorithms are used to solve the following combinatorial optimization problem: the generation of families of sensitivity parameters. In this way the potentially redundant if-then fuzzy rules are excluded.

The method formalizes the synthesis of the fuzzy system and could be used for function approximation, classification, and control purposes. An illustrative example of applying the method to fuzzy traffic control is given.


9.
A learning process for fuzzy control rules using genetic algorithms   (Total citations: 10; self-citations: 0; citations by others: 10)
The purpose of this paper is to present a genetic learning process for learning fuzzy control rules from examples. It is developed in three stages: the first is a genetic process for generating fuzzy rules based on an iterative rule learning approach; the second combines two kinds of rules, expert rules (if there are any) and the previously generated fuzzy control rules, and removes the redundant fuzzy rules; and the third is a tuning process for adjusting the membership functions of the fuzzy rules. The three components of the learning process are developed by formulating suitable genetic algorithms.

10.
Multiple classifier systems combine several individual classifiers to deliver a final classification decision. In this paper the performance of several multiple classifier systems is evaluated in terms of their ability to correctly classify consumers as good or bad credit risks. Empirical results suggest that some multiple classifier systems deliver significantly better performance than the single best classifier, but many do not. Overall, bagging and boosting outperform other multi-classifier systems, and a new boosting algorithm, Error Trimmed Boosting, outperforms bagging and AdaBoost by a significant margin.

11.
With the fast development of financial products and services, banks’ credit departments have collected large amounts of data, which risk analysts use to build credit scoring models that evaluate an applicant’s credit risk accurately. One of these models is the Multi-Criteria Optimization Classifier (MCOC). By finding a trade-off between the overlap of different classes and the total distance from input points to the decision boundary, MCOC can derive a decision function from distinct classes of training data and subsequently use this function to predict the class label of an unseen sample. In many real-world applications, however, owing to noise, outliers, class imbalance, nonlinearly separable problems, and other uncertainties in the data, classification quality degenerates rapidly when using MCOC. In this paper, we propose a novel multi-criteria optimization classifier based on kernel, fuzzification, and penalty factors (KFP-MCOC). First, a kernel function is used to map input points into a high-dimensional feature space; then an appropriate fuzzy membership function is introduced into MCOC and associated with each data point in the feature space; finally, unequal penalty factors are added to the input points of imbalanced classes. Thus, the effects of the aforementioned problems are reduced. Our experimental results on credit risk evaluation and their comparison with MCOC, support vector machines (SVM), and fuzzy SVM show that KFP-MCOC can enhance the separation of different applicants, the efficiency of credit risk scoring, and the generalization of predicting the credit rank of a new credit applicant.

12.
Mixture cure models were originally proposed in medical statistics to model long-term survival of cancer patients in terms of two distinct subpopulations: those that are cured of the event of interest and will never relapse, and those that are uncured and are susceptible to the event. In the present paper, we introduce mixture cure models to the area of credit scoring, where, similarly to the medical setting, a large proportion of the dataset may not experience the event of interest, i.e. default, during the loan term. We estimate a mixture cure model predicting (time to) default on a UK personal loan portfolio, and compare its performance to the Cox proportional hazards method and standard logistic regression. Results for credit scoring at an account level and prediction of the number of defaults at a portfolio level are presented; model performance is evaluated through cross-validation on discrimination and calibration measures. Discrimination performance for all three approaches was found to be high and competitive. Calibration performance for the survival approaches was found to be superior to logistic regression for intermediate time intervals and useful for fixed 12-month time horizon estimates, reinforcing the flexibility of survival analysis both as a risk ranking tool and as a means of providing robust estimates of probability of default over time. Furthermore, the mixture cure model’s ability to distinguish between two subpopulations can offer additional insights by estimating the parameters that determine susceptibility to default in addition to the parameters that influence a borrower’s time to default.
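The abstract does not give the model equations. In the standard mixture cure formulation, which is assumed here rather than taken from the paper, the population survival function combines a non-susceptible fraction with the survival function of the susceptible accounts. The symbols \pi, S_u, and \beta are notation introduced for this sketch.

```latex
% S(t | x): probability that an account with covariates x has not defaulted by time t
% \pi(x):   probability that the account is susceptible to default
% S_u(t|x): survival function of the susceptible accounts only
S(t \mid x) = \bigl(1 - \pi(x)\bigr) + \pi(x)\, S_u(t \mid x),
\qquad
\pi(x) = \frac{1}{1 + \exp(-\beta^{\top} x)}
```

Because S_u(t | x) tends to 0 as t grows, S(t | x) levels off at 1 - \pi(x), the share of accounts that never default. The logistic form for \pi(x) is one common choice. It is what separates the parameters governing susceptibility from those governing time to default.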

13.
Fuzzy Sets and Systems, 2004, 141(1): 47-58
This paper presents a novel boosting algorithm for genetic learning of fuzzy classification rules. The method is based on the iterative rule learning approach to fuzzy rule base system design. The fuzzy rule base is generated in an incremental fashion, in that the evolutionary algorithm optimizes one fuzzy classifier rule at a time. The boosting mechanism reduces the weight of those training instances that are classified correctly by the new rule. Therefore, the next rule generation cycle focuses on fuzzy rules that account for the currently uncovered or misclassified instances. The weight of a fuzzy rule reflects the relative strength the boosting algorithm assigns to the rule class when it aggregates the cast votes. The approach is compared with other classification algorithms on a number of problem sets from the UCI repository.
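The reweighting step described above, in which correctly classified instances lose weight so that the next rule concentrates on the rest, can be sketched as follows. The abstract does not give the update formula. The AdaBoost-style factor used here is an assumption, and `boost_step` is a hypothetical name.

```python
def boost_step(weights, correct):
    """One boosting round: weighted error of the new rule, then the updated weights."""
    total = sum(weights)
    # Weighted share of instances the new rule gets wrong (or fails to cover)
    error = sum(w for w, ok in zip(weights, correct) if not ok) / total
    # Assumption: AdaBoost-style factor, below 1 whenever error < 0.5
    # (error must be below 1.0, otherwise the division is undefined)
    beta = error / (1.0 - error)
    # Only correctly classified instances are down-weighted
    new_weights = [w * beta if ok else w for w, ok in zip(weights, correct)]
    return error, new_weights
```

After each round, the instances that remain misclassified carry relatively more weight, which steers the evolutionary search for the next rule toward them.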

14.
The classification problem consists of using some known objects, usually described by a large vector of features, to induce a model that classifies others into known classes. The present paper deals with the optimization of Nearest Neighbor Classifiers via Metaheuristic Algorithms. The Metaheuristic Algorithms used include tabu search, genetic algorithms, and ant colony optimization. The performance of the proposed algorithms is tested using data from 1411 firms derived from the loan portfolio of a leading Greek Commercial Bank in order to classify the firms into groups representing different levels of credit risk. A comparison of the algorithms with other classification methods, such as UTADIS, SVM, and CART, is also performed using these data.

15.
Applying the classical association rule extraction framework to fuzzy datasets leads to unmanageably large sets of association rules. Moreover, the discretization operation leads to information loss and hampers efficient exploitation of the mined knowledge. To overcome these drawbacks, this paper proposes the extraction and exploitation of compact and informative generic bases of fuzzy association rules. The presented approach relies on the extension, within the fuzzy context, of the notions of closure and Galois connection, which we introduce in this paper. In order to select, without loss of information, a generic subset of all fuzzy association rules (FARs), we define three fuzzy generic bases from which the remaining (redundant) FARs are generated. These generic bases constitute a compact nucleus of fuzzy association rules, from which it is possible to informatively derive all the remaining rules. In order to ensure a sound and complete derivation process, we introduce an axiomatic system allowing the complete derivation of all the redundant rules. The results obtained from experiments carried out on benchmark datasets are very encouraging. They highlight a very substantial reduction in the number of extracted fuzzy association rules without information loss.

16.
Sequential pattern mining from sequence databases has been recognized as an important data mining problem with various applications. Items in a sequence database can be organized into a concept hierarchy according to a taxonomy. Based on the hierarchy, sequential patterns can be found not only at the leaf nodes (individual items) of the hierarchy, but also at higher levels of the hierarchy; this is called multiple-level sequential pattern mining. However, the taxonomies used in previous research, which are based on crisp relationships between any two disjoint levels, cannot handle the uncertainties and fuzziness of real life. For example, tomatoes could be classified into the Fruit category, but could also be regarded as belonging to the Vegetable category. To deal with the fuzzy nature of taxonomies, Chen and Huang developed a novel knowledge discovery model to mine fuzzy multi-level sequential patterns, where the relationships from one level to another can be represented by a value between 0 and 1. In their work, a generalized sequential patterns (GSP)-like algorithm was developed to find fuzzy multi-level sequential patterns. This algorithm, however, faces a difficult problem, since the mining process may have to generate and examine a huge set of combinatorial subsequences and requires multiple scans of the database. In this paper, we propose a new efficient algorithm to mine this type of pattern based on the divide-and-conquer strategy. In addition, another efficient algorithm is developed to discover fuzzy cross-level sequential patterns. Since the proposed algorithm greatly reduces the effort of candidate subsequence generation, performance is improved significantly. Experiments show that the proposed algorithm is much more efficient and scalable than the previous one. In mining real-life databases, our work enhances the model's practicability and could promote more applications in business.

17.
Classifying magnetic resonance spectra is often difficult due to the curse of dimensionality: scenarios in which a high-dimensional feature space is coupled with a small sample size. We present an aggregation strategy that combines predicted disease states from multiple classifiers using several fuzzy integration variants. Rather than using all input features for each classifier, these multiple classifiers are presented with different, randomly selected subsets of the spectral features. Results from a set of detailed experiments using this strategy are carefully compared against classification performance benchmarks. We empirically demonstrate that the aggregated predictions are consistently superior to the corresponding prediction from the best individual classifier.

18.
This paper describes a rapid technique, communal analysis suspicion scoring (CASS), for generating numeric suspicion scores on streaming credit applications based on implicit links to each other, over both time and space. CASS includes pair-wise communal scoring of identifier attributes for applications, definition of categories of suspiciousness for application-pairs, the incorporation of temporal and spatial weights, and smoothed k-wise scoring of multiple linked application-pairs. Results on mining several hundred thousand real credit applications demonstrate that CASS reduces false alarm rates while maintaining reasonable hit rates. CASS is scalable for this large data sample, and can rapidly detect early symptoms of identity crime. In addition, new insights have been observed from the relationships between applications.

19.
We present a methodology for granting and following up credits for micro-entrepreneurs. This segment of grantees is very relevant for many economies, especially in developing countries, but shows a behavior different from that of classical consumers, for whom established credit scoring systems exist. Parts of our methodology follow a proven procedure we have applied successfully in several credit scoring projects. Other parts, such as cut-off point construction and model follow-up, however, had to be developed and constitute original contributions of the present paper. The results from two credit scoring projects we developed in Chile, one for a private bank and one for a governmental credit granting institution, provide interesting insights into micro-entrepreneurs’ repayment behavior, which could also be of interest for the respective segment in countries with similar characteristics.

20.
The number of Non-Performing Loans has increased in recent years, paralleling the current financial crisis, thus increasing the importance of credit scoring models. This study proposes a three-stage hybrid Adaptive Neuro Fuzzy Inference System credit scoring model, which is based on statistical techniques and Neuro Fuzzy. The proposed model’s performance was compared with conventional and commonly utilized models. The credit scoring models are tested using a 10-fold cross-validation process with the credit card data of an international bank operating in Turkey. Results demonstrate that the proposed model consistently performs better than the Linear Discriminant Analysis, Logistic Regression Analysis, and Artificial Neural Network (ANN) approaches in terms of average correct classification rate and estimated misclassification cost. As with ANN, the proposed model has learning ability; unlike ANN, the model is not a black box. In the proposed model, the interpretation of independent variables may provide valuable information for bankers and consumers, especially in explaining why credit applications are rejected.
