Similar literature
1.
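Machine learning algorithms come in many types and modes of operation. Among them, ensemble learning is an effective machine learning method that is grounded in statistical theory and implemented on computers. This paper describes the basic idea and implementation steps of ensemble learning and applies the Bagging ensemble algorithm to build a personal credit evaluation model, with the aim of obtaining better predictions. Indicators are screened by information gain, and V-fold cross-validation on the UCI credit data is used to compare the classification accuracy and robustness of a single classifier against the Bagging ensemble classifier. The results show that Bagging with decision trees effectively improves classification accuracy and has a clear advantage in personal credit evaluation.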
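
The Bagging idea in this abstract can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: it uses a one-feature threshold rule (a depth-1 decision tree) as the base learner, and the function names `fit_stump`, `fit_bagging`, and `predict_bagging` are hypothetical, not taken from the paper. Each base learner is trained on a bootstrap resample, and the ensemble predicts by majority vote.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a depth-1 tree on one feature: predict 1 when polarity * (x - t) > 0."""
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [1 if polarity * (x - t) > 0 else 0 for x in xs]
            errors = sum(p != y for p, y in zip(preds, ys))
            if best is None or errors < best[0]:
                best = (errors, t, polarity)
    return best[1], best[2]

def predict_stump(stump, x):
    t, polarity = stump
    return 1 if polarity * (x - t) > 0 else 0

def fit_bagging(xs, ys, n_estimators=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def predict_bagging(models, x):
    """Aggregate the base learners by majority vote."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

In practice the base learner would be a full decision tree over all screened indicators, and accuracy would be estimated by V-fold cross-validation rather than on the training data.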

2.
《数理统计与管理》2019,(5):812-822
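Customers in credit bureau data typically show an imbalanced "many good, few bad" structure, which causes ordinary classification models to fail when predicting customer credit behavior. Building on the modeling idea of zero-inflated count models, this paper proposes zero-inflated credit scoring models (ZICSM) for binary, multi-class, and count response variables. The models split the customer population into three parts (stable good customers, unstable good customers, and bad customers) and use this structure to form two loan approval mechanisms, one strict and one lenient. ZICSM reweights the objective function so that the model pays more attention to "bad" customers, and adds a penalty term to the objective function so that the model can perform group variable selection. The paper also proposes the RS score, an indicator that balances risk control against market share, to evaluate the classification performance of credit rating models. Simulation and empirical studies show that ZICSM can increase the loan returns of financial institutions, makes their approval mechanisms more flexible, and is suited to handling imbalance in credit bureau data.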
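
As background for the zero-inflated idea, here is a minimal sketch of the standard zero-inflated Poisson (ZIP) distribution. It is not the paper's ZICSM specification, which the abstract does not give. The mapping in the comments (structural zeros as "stable good" customers, Poisson zeros as "unstable good" customers, positive counts as "bad" customers) is an interpretive assumption, and the function names are hypothetical.

```python
import math

def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson P(Y = k).

    pi  : probability of a structural zero (assumed: a "stable good" customer)
    lam : Poisson mean for the remaining customers
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero can be structural, or a Poisson zero ("unstable good").
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson  # positive counts: "bad" customers

def prob_structural_given_zero(pi, lam):
    """Posterior probability that an observed zero is a structural zero."""
    return pi / zip_pmf(0, pi, lam)
```

Splitting the observed zeros this way is what allows two cut-offs: a strict rule that approves only likely structural zeros, and a lenient rule that approves any likely zero.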

3.
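Discretizing continuous variables is an early stage of credit rating modeling. Well-founded discretization improves a model's classification performance and parameter stability and makes the rating model easier to present as a product. To account for the difference in misclassification costs in credit rating, the class-attribute consistency maximization criterion is adjusted with class weights, yielding the ACACM criterion, and a data discretization algorithm based on ACACM is proposed. The ACACM algorithm adjusts the weights that the original algorithm assigns to individuals of different classes, so that it leans toward characterizing defaulting customers, whose misclassification cost is higher. The discretized variables thus improve the rating model's risk control ability and are better suited to credit rating modeling.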

4.
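Credit classification is an important part of credit risk management. Its main purpose is to separate creditworthy customers from defaulting customers among applicants, based on the information the applicants provide, so as to give credit decision makers a basis for their decisions. To distinguish different credit customers correctly, defaulting customers in particular, kernel principal component analysis (KPCA) and support vector machines are combined into a KPCA-based least squares fuzzy support vector machine with variable penalty factors, which is used to classify credit data. The model first preprocesses the sample data, then uses KPCA to reduce the dimensionality of the data nonlinearly, and finally applies the least squares fuzzy support vector machine with variable penalty factors to classify the reduced data. For validation, two public credit data sets are used in an empirical analysis. The results show that the model achieves good classification results and can provide an important reference for credit decision makers.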

5.
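By the end of 2014, China had 49.8406 million registered individual businesses, and the individual and private economy employed 250 million people. Because small loans to merchants in China are dispersed across many borrowers and the borrowers' financial information is incomplete, credit rating systems for merchant small loans are very underdeveloped, and the vast majority of banks have not built one at all. This paper uses correlation analysis to remove indicators that carry redundant information and significance testing to select indicators that significantly affect merchants' default status, building a small-loan credit rating indicator system that significantly discriminates default status. On this basis, it combines PROMETHEE-II (a preference ranking method) with cluster analysis to construct a credit rating model for merchant small loans, and tests it empirically on a sample of 2157 merchant small loans from a state-owned commercial bank in China. The paper makes three contributions. First, it introduces PROMETHEE-II into merchant small-loan credit rating and builds a PROMETHEE-II-based credit scoring model that solves for each merchant's net-flow credit score Φ(a). This reveals how interactions between merchant a and the other merchants, and among the evaluation indicators, affect the result, and it avoids a weakness of existing studies, in which substitutability among indicators seriously undermines the reliability of the evaluation. Second, drawing on the fuzzy clustering idea that "the more concentrated the data, the more they should be grouped into one class", it classifies merchants' credit scores by R-type clustering and then applies the Kruskal-Wallis (K-W) test, a nonparametric test, to the number of classes l to determine merchants' credit grades. This ensures both that merchants of different grades differ significantly in credit score and that different grades reflect different credit characteristics. It also avoids the subjectivity of existing methods that divide credit grades by score intervals, default probability thresholds, or customer-count distributions, where the intervals, thresholds, or quantiles are set by hand. Third, the empirical study shows that the factors affecting merchant small-loan credit risk rank in importance as follows: X3 solvency > X1 basic information > X6 macro environment > X5 operating capacity > X2 guarantee and joint guarantee > X4 profitability.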
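
The net-flow score Φ(a) mentioned above can be illustrated with the textbook PROMETHEE II calculation. This is a sketch under assumptions, not the paper's model: it uses the "usual" preference function (full preference whenever one alternative beats another on a criterion), treats every criterion as benefit-type, and the name `promethee_ii` and its arguments are hypothetical.

```python
def promethee_ii(scores, weights):
    """Return the net flow Phi(a) for each alternative.

    scores  : one row per alternative, one column per benefit-type criterion
    weights : criterion weights, assumed to sum to 1
    """
    n = len(scores)

    def pi(a, b):
        # Aggregated preference of a over b under the usual criterion.
        return sum(w for w, x, y in zip(weights, scores[a], scores[b]) if x > y)

    flows = []
    for a in range(n):
        others = [b for b in range(n) if b != a]
        leaving = sum(pi(a, b) for b in others) / (n - 1)   # phi+
        entering = sum(pi(b, a) for b in others) / (n - 1)  # phi-
        flows.append(leaving - entering)
    return flows

# Three merchants scored on two equally weighted indicators.
print(promethee_ii([[3, 3], [2, 2], [1, 1]], [0.5, 0.5]))  # [1.0, 0.0, -1.0]
```

Because every score comes from pairwise comparisons with all other merchants, a strong value on one indicator cannot simply offset a weak value on another in the way a weighted sum allows.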

6.
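Building a credit risk model for rural credit cooperatives is of great significance for improving the rural financial risk management system and the cooperatives' operation and management. From the two angles of willingness to repay and ability to repay, this paper systematically analyzes the main factors affecting the default rate of rural credit cooperative borrowers. On this basis it builds a borrower default rate prediction model using the logistic method and validates the model's discriminatory power with the Gini coefficient. The empirical results show that borrower age, region, ratio of loan amount to household income, closeness of the credit relationship with the cooperative, and household registration status are all significant. The model identifies defaults well both in-sample and out-of-sample, and can therefore serve as a useful reference for pre-loan borrower credit evaluation, loan issuance, and risk management at rural credit cooperatives.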
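
One common way to compute a Gini coefficient for a scoring model is through the identity Gini = 2·AUC − 1. The sketch below assumes that definition; the abstract does not say which variant the authors used. Labels use 1 for default and 0 for non-default, and the function names are hypothetical.

```python
def auc(scores, labels):
    """Probability that a random defaulter scores above a random non-defaulter."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count a tie as half a win.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def gini(scores, labels):
    """Gini coefficient of the predicted default scores (2 * AUC - 1)."""
    return 2.0 * auc(scores, labels) - 1.0

print(gini([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.5
```

A value of 1 means the scores separate the two groups perfectly, and a value near 0 means they discriminate no better than chance.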

7.
程砚秋 《运筹与管理》2016,25(6):181-189
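Credit risk evaluation of small enterprises is both a bank risk management problem and a matter of economic and social stability. In small-enterprise lending, default samples are far fewer than non-default samples, and misjudging a defaulting customer has a large impact on the bank. In view of this, an imbalanced support vector machine is used to weight the credit risk evaluation indicators of small enterprises, and an evaluation model is built that effectively separates defaulting from non-defaulting customers. Weights are assigned according to how strongly the presence or absence of a given indicator, and changes in its value, affect the default status of a borrowing enterprise. This reflects the principle that the greater an indicator's influence on default status, the larger its weight. The correct identification rate of default samples and the precision and recall of default samples are used as the measures of customer identification in the SVM weighting model, which corrects the situation where imbalanced data produce high overall accuracy but low accuracy on default samples. The results show that indicators such as the industry prosperity index, the capital fixation ratio, the cash content of net profit, the Engel coefficient, and the operating profit margin have a large influence on small-enterprise credit risk.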
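
The gap between overall accuracy and accuracy on default samples is easy to show with a small example. The data below are made up purely for illustration (95 non-defaulters, 5 defaulters, and a classifier that catches only one defaulter), and the function name is hypothetical.

```python
def default_class_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall), with label 1 = default."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Illustrative data only: the classifier flags one of the five defaulters.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 95 + [1] + [0] * 4
print(default_class_metrics(y_true, y_pred))  # (0.96, 1.0, 0.2)
```

Accuracy is 96% even though four of the five defaulters are missed, which is why precision and recall on the default class are the more informative measures here.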

8.
《数理统计与管理》2015,(6):1048-1056
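Conservative consumption habits in China keep default rates in credit rating modeling data low, so the data are imbalanced, and this imbalance harms the predictive performance of logistic regression models. This paper brings the idea of an asymmetric link function into credit rating: the distribution function of a skewed logistic distribution is used as the inverse of the link function, and the skewness parameter and regression coefficients are estimated from real data. The study shows that skewed logistic regression predicts better than ordinary logistic regression, and that on a data set with a 10% default rate it also outperforms decision trees, neural networks, and support vector machines.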
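
To illustrate what an asymmetric inverse link looks like, the sketch below uses one common skewed logistic form, F(η) = (1 + e^(−η))^(−α). This parameterization is an assumption, since the abstract does not give the paper's exact form. With α = 1 it reduces to the ordinary logistic function.

```python
import math

def skewed_logistic_cdf(eta, alpha):
    """Inverse link: P(default) = (1 + exp(-eta)) ** (-alpha).

    eta   : linear predictor x'beta
    alpha : skewness parameter (alpha = 1 gives ordinary logistic regression)
    """
    return (1.0 + math.exp(-eta)) ** (-alpha)

# The ordinary logistic curve is symmetric: F(eta) + F(-eta) = 1.
symmetric = skewed_logistic_cdf(1.0, 1.0) + skewed_logistic_cdf(-1.0, 1.0)
# With alpha != 1 the symmetry is lost, so the curve can approach 0 and 1 at different rates.
skewed = skewed_logistic_cdf(1.0, 2.0) + skewed_logistic_cdf(-1.0, 2.0)
```

The extra parameter lets the fitted probability curve adapt to data in which one class is rare, which a symmetric link cannot do.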

9.
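To improve learning efficiency on samples with prior knowledge, this paper introduces a semi-supervised learning strategy into the affinity propagation clustering model, takes dynamic changes in sample information into account, and integrates multi-indicator panel data, proposing a multi-indicator panel data clustering model with intelligent information processing. Financial data for 2009-2013 from 30 listed real estate companies are selected, and the model is used for clustering and performance evaluation. The results show that the model distinguishes the class characteristics of samples more effectively and offers a more effective method for performance evaluation of listed companies and for financial management and decision making.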

10.
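Many practical applications of machine learning suffer from data imbalance, where one class has far fewer samples than the others. Imbalance biases the decision boundary toward the large class at the expense of the small class, so that test samples are wrongly assigned to the large class. To address this, the paper proposes a balanced graph-based semi-supervised learning method. The method adds a balancing factor term to the energy function so that the confidence values are not only as smooth as possible on the graph but also as balanced as possible across classes, which effectively reduces the adverse effect of imbalance. Statistical analysis of comparative experiments on 21 benchmark data sets shows that, on imbalanced data, the new method classifies significantly better (at the 0.05 significance level) than support vector machines and other graph-based semi-supervised learning methods.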

11.
The class imbalance problem is common in the credit scoring domain, as the number of defaulters is usually much smaller than the number of non-defaulters. To date, research on class imbalance has mainly focused on identifying and reducing its adverse effect on the predictive accuracy of machine learning techniques, while its impact on machine learning interpretability has never been studied in the literature. This paper fills this gap by analysing how the stability of Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), two popular interpretation methods, is affected by class imbalance. Our experiments use 2016–2020 UK residential mortgage data collected from European Datawarehouse. We evaluate the stability of LIME and SHAP on datasets of progressively increasing class imbalance. The results show that interpretations generated by LIME and SHAP become less stable as the class imbalance increases, which indicates that class imbalance does have an adverse effect on machine learning interpretability. To check the robustness of our outcomes, we also analyse two open-source credit scoring datasets and obtain similar results.

12.
The logistic regression framework has long been the most widely used statistical method for assessing customer credit risk. Recently, a more pragmatic approach has been adopted, in which the primary concern is credit risk prediction rather than explanation. In this context, several classification techniques, support vector machines among them, have been shown to perform well on credit scoring. While the search for better classifiers is an important research topic, the methodology chosen in real-world applications has to deal with the challenges posed by the data actually collected in industry. Such data are often highly unbalanced, part of the information can be missing, and some common hypotheses, such as the i.i.d. assumption, can be violated. In this paper we present a case study based on a sample of IBM Italian customers, which presents all the challenges mentioned above. The main objective is to build and validate robust models that can handle missing information, class imbalance, and non-i.i.d. data points. We define a missing data imputation method and propose the use of an ensemble classification technique, subagging, which is particularly suitable for highly unbalanced data such as credit scoring data. Both the imputation and subagging steps are embedded in a customized cross-validation loop, which handles dependencies between different credit requests. The methodology has been applied using several classifiers (kernel support vector machines, nearest neighbors, decision trees, Adaboost) and their subagged versions. Subagging improves the performance of the base classifier, and we show that subagged decision trees achieve better performance while keeping the model simple and reasonably interpretable.
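
A cross-validation loop that respects dependencies between credit requests can be sketched as a grouped split. The abstract does not say how the authors define a dependency, so this sketch assumes that requests from the same customer are dependent. The `groups` argument (one customer identifier per request) and the function name are hypothetical.

```python
def grouped_folds(groups, n_folds):
    """Yield (train_idx, test_idx) so that no group appears on both sides.

    groups  : one group label per sample (assumed: the customer identifier)
    n_folds : number of folds
    """
    unique = sorted(set(groups))
    # Assign whole groups to folds in round-robin order.
    fold_of_group = {g: i % n_folds for i, g in enumerate(unique)}
    for fold in range(n_folds):
        test_idx = [i for i, g in enumerate(groups) if fold_of_group[g] == fold]
        train_idx = [i for i, g in enumerate(groups) if fold_of_group[g] != fold]
        yield train_idx, test_idx
```

Imputation and subagging would then be fitted on `train_idx` only inside each iteration, so that no information from the held-out requests leaks into the model.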

13.
Credit scoring is a method of modelling the potential risk of credit applications. Traditionally, logistic regression and discriminant analysis are the approaches most widely used to create scoring models in the industry. However, these methods have several limitations: they are unstable with high-dimensional data and small sample sizes, they require intensive variable selection effort, and they cannot handle non-linear features efficiently. Most importantly, with these algorithms it is difficult to automate the modelling process, and when the population changes, the static models usually fail to adapt and may need to be rebuilt from scratch. In the last few years, the kernel learning approach has been investigated as a solution to these problems. However, existing applications of such methods (in particular the SVM) in credit scoring have all focused on the batch model and have not addressed the important problem of how to update the scoring model on-line. This paper presents a novel and practical adaptive scoring system based on an incremental kernel method. With this approach, the scoring model is adjusted by an on-line update procedure that always converges to the optimal solution without information loss or numerical difficulties. Non-linear features in the data are automatically included in the model through a kernel transformation. The approach does not require any variable reduction effort and is robust for scoring data with a large number of attributes and highly unbalanced class distributions. Moreover, a new potential kernel function is introduced to further improve the predictive performance of the scoring model, and a kernel attribute ranking technique is used to add transparency to the final model. Experimental studies using real-world data sets demonstrate the effectiveness of the proposed method.

14.
The features used may have an important effect on the performance of credit scoring models. The process of choosing the best set of features for credit scoring models is usually unsystematic and dominated by somewhat arbitrary trial and error. This paper presents an empirical study of four machine learning feature selection methods, which provide an automatic data mining technique for reducing the feature space. The study illustrates how four feature selection methods (the 'ReliefF', 'Correlation-based', 'Consistency-based' and 'Wrapper' algorithms) help to improve three aspects of the performance of scoring models: model simplicity, model speed and model accuracy. The experiments are conducted on real data sets using four classification algorithms: 'model tree (M5)', 'neural network (multi-layer perceptron with back-propagation)', 'logistic regression', and 'k-nearest-neighbours'.

15.
Traditional methods of applying classification models to credit scoring may ignore the effect of censoring. Survival analysis has been introduced because of its ability to deal with censored data. The mixture cure model, an important branch of survival models, has also been applied to credit scoring, under the assumption that the study population is a mixture of never-default and will-default customers.
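
The mixture assumption can be written as a population survival function S(t) = (1 − p) + p·S_u(t), where p is the share of will-default customers and S_u(t) is their survival function. The sketch below is a minimal illustration of that formula. The exponential form of S_u(t) and the parameter names are assumptions made for the example, not details from the abstract.

```python
import math

def population_survival(t, p_default, hazard):
    """Mixture cure survival: probability of not having defaulted by time t.

    p_default : share of will-default customers (never-default share is 1 - p_default)
    hazard    : constant default hazard of will-default customers (assumed exponential)
    """
    survival_susceptible = math.exp(-hazard * t)
    return (1.0 - p_default) + p_default * survival_susceptible
```

Unlike an ordinary survival curve, this one levels off at 1 − p_default instead of falling to zero, which reflects the customers who never default.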

16.
Fierce competition and the recent financial crisis in the financial and banking industries have increased the importance of credit scoring. An accurate estimate of credit risk helps organizations decide whether or not to grant credit to potential customers. Many classification methods have been suggested in the literature to handle this problem. This paper proposes a model for evaluating credit risk based on binary quantile regression, using Bayesian estimation. The paper points out the distinct advantages of this approach: (i) the method provides accurate predictions of which customers may default in the future, (ii) the approach provides detailed insight into the effects of the explanatory variables on the probability of default, and (iii) the methodology is ideally suited to building a segmentation scheme of the customers in terms of risk of default and the corresponding uncertainty about the prediction. An often-studied dataset from a German bank is used to show the applicability of the proposed method. The results demonstrate that the methodology can be an important tool for credit companies that want to take the credit risk of their customers fully into account.

17.
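To make full use of the strengths of SVM in personal credit evaluation and overcome its weaknesses, a personal credit evaluation model based on a committee machine of support vector machines is proposed. The model is combined with a method that constructs new learning samples by estimating attribute utility functions and is then applied to personal credit evaluation. Empirical analysis and comparison with the SVM method show that the model classifies better and faster and adapts more widely.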

18.
The number of Non-Performing Loans has increased in recent years, paralleling the current financial crisis, thus increasing the importance of credit scoring models. This study proposes a three-stage hybrid Adaptive Neuro Fuzzy Inference System credit scoring model, which is based on statistical techniques and Neuro Fuzzy. The proposed model's performance was compared with conventional and commonly used models. The credit scoring models are tested using a 10-fold cross-validation process with the credit card data of an international bank operating in Turkey. Results demonstrate that the proposed model consistently performs better than the Linear Discriminant Analysis, Logistic Regression Analysis, and Artificial Neural Network (ANN) approaches in terms of average correct classification rate and estimated misclassification cost. Like ANN, the proposed model has learning ability; unlike ANN, it is not a black box. In the proposed model, the interpretation of independent variables may provide valuable information for bankers and consumers, especially in explaining why credit applications are rejected.

19.
Credit scoring is a risk evaluation task that financial institutions treat as a critical decision, because wrong decisions may result in huge losses. Classification models are one of the most widely used groups of data mining approaches, and they greatly help decision makers and managers to reduce the credit risk of granting credit to customers, in place of intuitive experience or portfolio management. Accuracy is one of the most important criteria for choosing a credit-scoring model, and hence research aimed at improving the effectiveness of credit scoring models has never stopped. In this article, a hybrid binary classification model, namely FMLP, is proposed for credit scoring, based on the basic concepts of fuzzy logic and artificial neural networks (ANNs). In the proposed model, fuzzy numbers are used instead of the crisp weights and biases of traditional multilayer perceptrons (MLPs), in order to better model the uncertainties and complexities in financial data sets. Empirical results on three well-known benchmark credit data sets indicate that the proposed hybrid model outperforms its components and also other classification models such as support vector machines (SVMs), K-nearest neighbor (KNN), quadratic discriminant analysis (QDA), and linear discriminant analysis (LDA). Therefore, it can be concluded that the proposed model can be an appropriate alternative tool for financial binary classification problems, especially under conditions of high uncertainty. © 2013 Wiley Periodicals, Inc. Complexity 18: 46–57, 2013
