Similar Documents
Found 20 similar documents (search time: 984 ms)
1.
Customer churn prediction models aim to identify the customers with the highest propensity to attrite, allowing companies to improve the efficiency of customer retention campaigns and to reduce the costs associated with churn. Although cost reduction is their prime objective, churn prediction models are typically evaluated using statistically based performance measures, resulting in suboptimal model selection. Therefore, in the first part of this paper, a novel, profit-centric performance measure is developed by calculating the maximum profit that can be generated by including the optimal fraction of customers with the highest predicted probabilities to attrite in a retention campaign. The novel measure selects the optimal model and the fraction of customers to include, yielding a significant increase in profits compared to statistical measures. In the second part, an extensive benchmarking experiment is conducted, evaluating various classification techniques applied to eleven real-life data sets from telecom operators worldwide, using both the profit-centric and statistically based performance measures. The experimental results show that a small number of variables suffices to predict churn with high accuracy, and that oversampling generally does not improve performance significantly. Finally, a large group of classifiers is found to yield comparable performance.
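The profit-centric measure described above can be sketched in a few lines: rank customers by predicted churn probability, sweep over inclusion fractions, and keep the fraction that maximizes campaign profit. The profit parameters here (`clv`, `contact_cost`, `success_rate`) are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of a profit-based evaluation of a churn model:
# customers are ranked by predicted churn probability, and for each
# included fraction we compute the campaign profit, keeping the best one.
def max_profit(probs, is_churner, clv=200.0, contact_cost=5.0, success_rate=0.3):
    """Return (best_profit, best_fraction) over all inclusion cutoffs."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    best_profit, best_frac, profit = 0.0, 0.0, 0.0
    for rank, i in enumerate(order, start=1):
        # A contacted churner is retained with probability success_rate,
        # recovering their CLV; every contact costs contact_cost.
        profit += (success_rate * clv if is_churner[i] else 0.0) - contact_cost
        if profit > best_profit:
            best_profit, best_frac = profit, rank / len(order)
    return best_profit, best_frac

probs = [0.9, 0.8, 0.7, 0.2, 0.1]
churn = [1, 1, 0, 0, 0]
print(max_profit(probs, churn))  # contacting the top 40% maximizes profit
```

Two models can then be compared on the profit each achieves at its own optimal fraction, rather than on AUC or accuracy.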

2.
In this paper, we propose ADTreesLogit, a model that integrates the advantages of the ADTrees model and the logistic regression model, to improve the predictive accuracy and interpretability of existing churn prediction models. We show that the overall predictive accuracy of the ADTreesLogit model compares favorably with that of TreeNet®, the model that won the Gold Prize in the 2003 mobile customer churn prediction modeling contest (The Duke/NCR Teradata Churn Modeling Tournament). In fact, ADTreesLogit has better predictive accuracy than TreeNet® at two important observation points.

3.
The definition and modeling of customer loyalty have been central issues in customer relationship management for many years. Recent papers propose solutions to detect customers that are becoming less loyal, also called churners. The churner status is then defined as a function of the volume of commercial transactions. In the context of a Belgian retail financial service company, our first contribution is to redefine the notion of customer loyalty by considering it from a customer-centric viewpoint instead of a product-centric one. We hereby use the customer lifetime value (CLV), defined as the discounted value of future marginal earnings based on the customer's activity. Hence, a churner is defined as someone whose CLV, and thus the related marginal profit, is decreasing. As a second contribution, the loss incurred by the CLV decrease is used to appraise the cost of misclassifying a customer, by introducing a new loss function. In the empirical study, we compare the accuracy of various classification techniques commonly used in the domain of churn prediction, including two cost-sensitive classifiers. Our final conclusion is that since profit is what really matters in a commercial environment, standard statistical accuracy measures for prediction need to be revised and a more profit-oriented focus may be desirable.
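The CLV-based churn definition above can be made concrete with a minimal sketch, assuming a simple discounted sum of expected future margins (the discount rate and margin streams are illustrative, not the paper's calibration):

```python
# Illustrative sketch: CLV as the discounted sum of expected future
# marginal earnings, with a "churner" flagged when the CLV estimated this
# period drops below the previous period's estimate.
def clv(margins, discount_rate=0.1):
    """Discounted value of a stream of expected future marginal earnings."""
    return sum(m / (1 + discount_rate) ** t for t, m in enumerate(margins, start=1))

def is_churner(clv_previous, clv_current):
    return clv_current < clv_previous

v1 = clv([100, 100, 100])   # stable activity
v2 = clv([100, 60, 20])     # declining activity
print(round(v1, 2), round(v2, 2), is_churner(v1, v2))
```

Under this definition, churn is a drop in expected profitability rather than a drop in transaction volume, which is exactly the customer-centric shift the abstract argues for.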

4.
Currently, in order to remain competitive, companies are adopting customer-centered strategies, and consequently customer relationship management is gaining increasing importance. In this context, customer retention deserves particular attention. This paper proposes a model for partial churn detection in the retail grocery sector that includes as a predictor the similarity of the products’ first-purchase sequence with churner and non-churner sequences. The sequence of first-purchase events is modeled using Markov for discrimination. Two classification techniques are used in the empirical study: logistic regression and random forests. A real sample of approximately 95,000 new customers, taken from the data warehouse of a European retailing company, is analyzed. The empirical results reveal the relevance of including a products’ sequence likelihood in partial churn prediction models, as well as the supremacy of logistic regression when compared with random forests.

5.
Mobile phone carriers in a saturated market must focus on customer retention to maintain profitability. This study investigates the incorporation of social network information into churn prediction models to improve accuracy, timeliness, and profitability. Traditional models are built using customer attributes; however, these data are often incomplete for prepaid customers. Alternatively, call record graphs that are current and complete for all customers can be analysed. A procedure was developed to build the call graph and extract relevant features from it for use in classification models. The scalability and applicability of this technique are demonstrated on a telecommunications data set containing 1.4 million customers and over 30 million calls each month. The models are evaluated based on ROC plots, lift curves, and expected profitability. The results show how using network features can improve performance over local features while retaining high interpretability and usability.
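One simple network feature of the kind described above is the fraction of a subscriber's call contacts who have already churned. A minimal sketch, assuming call records arrive as `(caller, callee)` pairs (the record format and the feature itself are illustrative):

```python
# Build an undirected call graph from call records and compute, for each
# subscriber, the share of their contacts who already churned.
from collections import defaultdict

def neighbor_churn_ratio(calls, churned):
    graph = defaultdict(set)
    for caller, callee in calls:          # undirected call graph
        graph[caller].add(callee)
        graph[callee].add(caller)
    return {node: sum(n in churned for n in nbrs) / len(nbrs)
            for node, nbrs in graph.items()}

calls = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
ratios = neighbor_churn_ratio(calls, churned={"b"})
print(ratios["a"], ratios["d"])
```

Because the feature depends only on call activity, it is available even for prepaid customers with no attribute data, which is the gap the abstract highlights.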

6.
Companies' interest in customer relationship modelling and key issues such as customer lifetime value and churn has substantially increased over the years. However, the complexity of building, interpreting and applying these models creates obstacles for their implementation. The main contribution of this paper is to show how domain knowledge can be incorporated in the data mining process for churn prediction: first, through the evaluation of coefficient signs in a logistic regression model, and second, by analysing a decision table (DT) extracted from a decision tree or rule-based classifier. An algorithm to check DTs for violations of monotonicity constraints is presented, which involves the repeated application of condition reordering and table contraction to detect counter-intuitive patterns. Both approaches are applied to two telecom data sets to empirically demonstrate how domain knowledge can be used to ensure the interpretability of the resulting models.
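The monotonicity constraint above can be illustrated with a brute-force pairwise check (a hedged sketch, not the paper's reordering/contraction algorithm): if one row of the decision table is at least as risky as another on every ordinal condition attribute, its predicted churn score should not be lower.

```python
# Detect monotonicity violations in a decision table by comparing all
# row pairs; a pair violates the constraint when one row dominates the
# other on every condition but receives a lower score.
from itertools import combinations

def monotonicity_violations(rows):
    """rows: list of (conditions_tuple, score). Returns violating pairs."""
    dominates = lambda x, y: all(a >= b for a, b in zip(x, y))
    return [(x, y) for (x, sx), (y, sy) in combinations(rows, 2)
            if (dominates(x, y) and sx < sy) or (dominates(y, x) and sy < sx)]

# Row ((1, 1), 0.1) dominates ((1, 0), 0.2) yet scores lower: a violation.
table = [((1, 0), 0.2), ((1, 1), 0.1), ((0, 0), 0.05)]
print(monotonicity_violations(table))
```

This quadratic check is fine for small extracted tables; the condition-reordering and contraction steps the abstract mentions serve to make the search tractable and the counter-intuitive patterns visible to a domain expert.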

7.
The value of the customer has been widely recognized in terms of financial planning and efficient resource allocation, including in the financial service industry. Previous studies have shown that directly observable information can be used to make reasonable predictions of customer attrition probabilities. However, these studies do not take full account of customer behavior information. In this paper, we demonstrate that efficient use of information can add value to the financial services industry and improve the prediction of customer attrition. To achieve this, we apply an orthogonal polynomial approximation analysis to derive unobservable information, which is then used as explanatory variables in a probit–hazard rate model. Our results show that derived information can improve our understanding of customer attrition behavior and give better predictions. We conclude that both researchers and the financial service industry should gather and use derived financial information in addition to directly observable information.

8.
The defection or churn of customers represents an important concern for any company and a central matter of interest in customer base analysis. An additional complication arises in non-contractual settings, where the characteristics that should be observed to say that a customer has totally or partially defected are not clearly defined. As a matter of fact, different definitions of the churn situation could be used in this context. Focusing on non-contractual settings, in this paper we propose a methodology for evaluating the short-term economic effects that using a certain definition of churn would have on a company. With this aim, we define two efficiency measures for the economic results of a marketing campaign implemented against churn, and these measures are computed using a set of definitions of partial defection. Our methodology finds the definition that maximizes both efficiency measures and, moreover, the monetary amount that the company should invest per customer in the campaign to achieve the optimal solution. This has been modelled as a multiobjective optimization problem that we solve using compromise programming. Numerical results using real data from a Spanish retailing company are presented and discussed in order to show the performance and validity of our proposal.
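The compromise-programming step can be sketched in miniature (all numbers invented, and a weighted L1 distance is assumed where the paper may use another Lp metric): for each candidate per-customer investment, two efficiency measures are computed, and the chosen investment is the one closest to the ideal point formed by the best value of each measure separately.

```python
# Toy compromise-programming sketch: pick the candidate whose pair of
# efficiency measures is closest (weighted L1) to the ideal point.
def compromise(candidates, weights=(0.5, 0.5)):
    """candidates: {investment: (eff1, eff2)}, both measures to be maximized."""
    ideal = tuple(max(v[i] for v in candidates.values()) for i in (0, 1))
    def distance(v):
        return sum(w * (z - x) for w, z, x in zip(weights, ideal, v))
    return min(candidates, key=lambda c: distance(candidates[c]))

candidates = {5.0: (0.60, 0.70), 10.0: (0.80, 0.65), 15.0: (0.75, 0.80)}
print(compromise(candidates))  # the per-customer investment nearest the ideal
```

Compromise programming thus turns the two conflicting efficiency measures into a single ranking without requiring one to be traded off against the other explicitly.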

9.
This paper presents an integrated fuzzy-optimization, customer-grouping-based logistics distribution methodology for quickly responding to a variety of customer demands. The proposed methodology involves three main mechanisms: (1) pre-route customer classification using fuzzy clustering techniques, (2) determination of customer-group-based delivery service priority and (3) en-route goods delivery using multi-objective optimization programming methods. In the process of pre-route customer classification, the proposed method groups customers’ orders primarily based on the multiple attributes of customer demands, rather than by the static geographic attributes mainly considered in classical vehicle routing algorithms. Numerical studies, including a real-world application, are conducted to illustrate the applicability of the proposed method and its potential advantages over existing operational strategies. Using the proposed method, it is shown that the overall performance of a logistics distribution system can be improved by more than 20%, according to the numerical results from the case studied.

10.
In many industrial processes hundreds of noisy and correlated process variables are collected for monitoring and control purposes. The goal is often to correctly classify production batches into classes, such as good or failed, based on the process variables. We propose a method for selecting the best process variables for classification of process batches using multiple criteria including classification performance measures (i.e., sensitivity and specificity) and the measurement cost. The method applies Partial Least Squares (PLS) regression on the training set to derive an importance index for each variable. Then an iterative classification/elimination procedure using k-Nearest Neighbor is carried out. Finally, Pareto analysis is used to select the best set of variables and avoid excessive retention of variables. The method proposed here consistently selects process variables important for classification, regardless of the batches included in the training data. Further, we demonstrate the advantages of the proposed method using six industrial datasets.
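The final Pareto-analysis step above can be illustrated with a small sketch: among candidate variable subsets scored on sensitivity, specificity and measurement cost, keep only the non-dominated ones. The subset names, scores and costs here are made up, not from the paper.

```python
# Pareto front over candidate variable subsets: a subset is kept unless
# another subset is at least as good on sensitivity and specificity, at
# most as costly, and strictly better on at least one criterion.
def pareto_front(candidates):
    """candidates: list of (name, sensitivity, specificity, cost)."""
    def dominated(a, b):  # True when b dominates a
        return (b[1] >= a[1] and b[2] >= a[2] and b[3] <= a[3]
                and (b[1] > a[1] or b[2] > a[2] or b[3] < a[3]))
    return [a for a in candidates
            if not any(dominated(a, b) for b in candidates if b is not a)]

subsets = [("all vars", 0.95, 0.90, 100),
           ("top 10",   0.94, 0.91, 20),
           ("top 5",    0.85, 0.80, 10),
           ("random 5", 0.70, 0.65, 10)]
print([name for name, *_ in pareto_front(subsets)])
```

The surviving subsets represent genuinely different trade-offs between classification performance and measurement cost, which is why Pareto analysis avoids retaining variables that add cost without adding discrimination.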

11.
Multivariate longitudinal data arise frequently in a variety of applications, where multiple outcomes are measured repeatedly from the same subject. In this paper, we first propose a two-stage weighted least squares estimation procedure for the regression coefficients when the random error follows an irregular autoregressive (AR) process, and establish asymptotic normality properties for the resulting estimators. We then apply the smoothly clipped absolute deviation (SCAD) variable selection approach to determine the order of the AR error process. We further propose a test statistic to check whether multiple responses are correlated at the same observation time, and derive the asymptotic distribution of the proposed test statistic. Several simulated examples and a real data analysis are presented to illustrate the finite-sample performance of the proposed method.

12.
This paper studies statistical inference theory and methods for a class of semiparametric varying-coefficient partially linear models under several types of complex data. First, for complex data such as longitudinal data and measurement-error data, we study empirical likelihood inference for the semiparametric varying-coefficient partially linear model, proposing a grouped empirical likelihood method and a bias-corrected empirical likelihood method, respectively. These methods effectively handle the difficulty that within-group correlation in longitudinal data poses for constructing the empirical likelihood ratio function. Second, for complex data such as measurement-error data and missing data, we study variable selection for the model, proposing a bias-corrected variable selection method and an imputation-based variable selection method, respectively. These variable selection methods can simultaneously select important variables in both the parametric and nonparametric components, with variable selection and estimation of the regression coefficients carried out simultaneously. By choosing appropriate penalty parameters, we show that the variable selection methods consistently identify the true model and that the resulting regularized estimators possess the oracle property.

13.
The stock exchanges in China give a stock special treatment in order to indicate a risk warning if the corresponding listed company cannot meet certain requirements on financial performance. Correctly predicting the special treatment of stocks is therefore very important for investors. The performance of prediction models is mainly affected by the selection of explanatory variables and the modelling method. This paper compares multi-period hazard models with five widely used single-period static models by investigating a comprehensive category of variables including accounting variables, market variables, characteristic variables and macroeconomic variables. The empirical results show that the performance of the models is sensitive to the choice of explanatory variables, but that there is no significant difference in performance between the multi-period hazard models and the single-period static models.

14.
In this paper, we consider the problem of variable selection and model detection in varying coefficient models with longitudinal data. We propose a combined penalization procedure to select the significant variables, detect the true structure of the model and estimate the unknown regression coefficients simultaneously. With appropriate selection of the tuning parameters, we show that the proposed procedure is consistent in both variable selection and the separation of varying and constant coefficients, and that the penalized estimators have the oracle property. Finite-sample performance of the proposed method is illustrated by simulation studies and a real data analysis.

15.
Near infrared (NIR) spectroscopy has been extensively used in classification problems because it is fast, reliable, cost-effective, and non-destructive. However, NIR data often have several hundred or thousand variables (wavelengths) that are highly correlated with each other. Thus, it is critical to select a few important features or wavelengths that better explain NIR data. Wavelets are popular as preprocessing tools for spectral data. Many applications perform feature selection directly on high-dimensional wavelet coefficients, which can be computationally expensive. This paper proposes a two-stage scheme for the classification of NIR spectral data. In the first stage, the proposed multi-scale vertical energy thresholding procedure is used to reduce the dimension of the high-dimensional spectral data. In the second stage, a few important wavelet coefficients are selected using the proposed support vector machines gradient-recursive feature elimination. The proposed two-stage method produced better classification performance, with higher computational efficiency, when tested on four NIR data sets.

16.
This study provides operational guidance for building naïve Bayes Bayesian network (BN) models for bankruptcy prediction. First, we suggest a heuristic method that guides the selection of bankruptcy predictors. Based on the correlations and partial correlations among variables, the method aims at eliminating redundant and less relevant variables. A naïve Bayes model is developed using the proposed heuristic method and is found to perform well based on a 10-fold validation analysis. The developed naïve Bayes model consists of eight first-order variables, six of which are continuous. We also provide guidance on building a cascaded model by selecting second-order variables to compensate for missing values of first-order variables. Second, we analyze whether the number of states into which the six continuous variables are discretized has an impact on the model's performance. Our results show that the model's performance is best when the number of states for discretization is either two or three; starting from four states, the performance begins to deteriorate, probably due to over-fitting. Third, we test whether modeling continuous variables with continuous distributions instead of discretizing them can improve the model's performance. Our finding suggests that it does not; one possible reason is that the continuous distributions tested in the study do not represent well the underlying distributions of the empirical data. Finally, the results of this study could also be applicable to business decision-making contexts other than bankruptcy prediction.
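The spirit of the correlation-based elimination heuristic can be sketched as follows (an assumed simplification, not the paper's exact procedure, which also uses partial correlations): rank predictors by their correlation with the class, then drop any predictor that is highly correlated with one already kept.

```python
# Greedy redundancy filter: keep predictors in order of relevance to the
# target, skipping any that nearly duplicate an already-kept predictor.
import math

def pearson(x, y):
    n, mx, my = len(x), sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_predictors(variables, target, redundancy_cutoff=0.9):
    ranked = sorted(variables, key=lambda v: abs(pearson(variables[v], target)),
                    reverse=True)
    kept = []
    for v in ranked:
        if all(abs(pearson(variables[v], variables[k])) < redundancy_cutoff
               for k in kept):
            kept.append(v)
    return kept

X = {"debt":   [1.0, 2.0, 3.0, 4.0],
     "debt2":  [2.1, 4.0, 6.2, 8.0],   # nearly a rescaled copy of "debt"
     "profit": [4.0, 1.0, 3.0, 2.0]}
y = [0, 0, 1, 1]
print(select_predictors(X, y))
```

Pruning near-duplicates matters particularly for naïve Bayes, since its conditional-independence assumption is violated most severely by redundant predictors.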

17.
This paper introduces an artificial neural network (ANN) application to a hot strip mill to improve the model's ability to predict rolling force and rolling torque as a function of various process parameters. To obtain a data basis for training and validation of the neural network, numerous three-dimensional finite element simulations were carried out for different sets of process variables. Experimental data were compared with the finite element predictions to verify the model's accuracy. The input variables are selected to be rolling speed, percentage of thickness reduction, initial temperature of the strip and friction coefficient in the contact area. A comprehensive analysis of the prediction errors in roll force and roll torque made by the ANN is presented. A model response analysis is also conducted to enhance understanding of the behavior of the NN model. The resulting ANN model is feasible for on-line control and rolling-schedule optimization, and can be easily extended to cover different aluminum grades and strip sizes in a straightforward way by generating the corresponding training data from an FE model.

18.

The efficiency of banks has a critical role in the development of sound financial systems of countries. Data Envelopment Analysis (DEA) has grown in popularity for modeling the performance efficiency of banks. Such efficiency depends on the appropriate selection of input and output variables. In the literature, no agreement exists on the selection of relevant variables; the disagreement has been an ongoing debate among academic experts, and no diagnostic tools exist to identify variable misspecifications. A cognitive analytics management framework is proposed, using three processes to address misspecifications. The cognitive process conducts an extensive review to identify the most common set of variables. The analytics process integrates a random forest method, a simulation method with DEA measurement feedback, and Shannon entropy to select the best DEA model and its relevant variables. Finally, a management process discusses the managerial insights needed to manage performance and impacts. A sample of data on 303 top world banks from 49 countries is collected for the period 2013 to 2015. The experimental simulation results identified the best DEA model along with its associated variables, and addressed the misclassification of total deposits. The paper concludes with the limitations and future research directions.

19.
This paper deals with a BMAP/G/1 G-queue with second optional service and multiple vacations. Arrivals of positive customers and negative customers follow a batch Markovian arrival process (BMAP) and a Markovian arrival process (MAP), respectively. After completing the essential service, a customer may go on to a second phase of service. The arrival of a negative customer removes the customer currently in service. The server leaves for a vacation as soon as the system empties and is allowed to take repeated (multiple) vacations. By using the supplementary variables method and the censoring technique, we obtain the queue length distributions. We also obtain the mean busy period based on renewal theory.

20.
We propose a dynamic interaction model of customer relationships and marketing activities that optimizes a firm's marketing actions with the objective of maximizing long-term revenue. The model assumes that the customer relationship can be discretized into several levels, and that the level evolves dynamically under the influence of marketing activities, following a Markov decision process. The relationship level is not directly observable, but it is probabilistically related to the customer's purchase level. A maximum likelihood method is proposed for estimating the model parameters. Using customer relationship management data from a domestic Chinese firm as an example, we illustrate how the model variables are defined, estimate the model parameters from customer interaction history, and optimize the customer management policy. The results show that, under the optimal policy, customer value is expected to increase by 61%–82%.
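A toy sketch of the decision layer described above (all transition probabilities and rewards invented; the hidden-state aspect of the paper's model is omitted): relationship levels evolve as a Markov decision process driven by the marketing action, and value iteration finds the policy maximizing long-run discounted profit.

```python
# Value iteration over a two-level customer relationship MDP with a
# "wait" action and a costly "promote" action that shifts customers
# toward the high-value state.
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    V = {s: 0.0 for s in states}
    while True:
        V_new = {s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
                        for a in actions) for s in states}
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            policy = {s: max(actions, key=lambda a: R[s][a] + gamma *
                             sum(p * V_new[s2] for s2, p in P[s][a].items()))
                      for s in states}
            return V_new, policy
        V = V_new

states, actions = ["low", "high"], ["wait", "promote"]
P = {"low":  {"wait":    {"low": 0.9, "high": 0.1},
              "promote": {"low": 0.5, "high": 0.5}},
     "high": {"wait":    {"low": 0.3, "high": 0.7},
              "promote": {"low": 0.1, "high": 0.9}}}
R = {"low":  {"wait": 1.0, "promote": -1.0},   # promotion costs upfront
     "high": {"wait": 5.0, "promote": 3.0}}
print(value_iteration(states, actions, P, R)[1])
```

In this toy instance, promoting pays off only for low-value customers, where the state shift it buys outweighs its cost; the paper's full model additionally infers the unobserved level from purchase data before choosing the action.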


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)