Similar literature
 20 similar documents found; search took 141 ms
1.
A Tabu search method is proposed and analysed for selecting variables that are subsequently used in Logistic Regression Models. The aim is to find, from among a set of m variables, a smaller subset that enables the efficient classification of cases. Reducing dimensionality has well-known advantages that are summarized in the literature. The specific problem consists in finding, for a small integer value of p, a subset of size p of the original set of variables that yields the greatest percentage of hits in Logistic Regression. The proposed Tabu search method performs a deep search in the solution space that alternates between a basic phase (that uses simple moves) and a diversification phase (to explore regions not previously visited). Testing shows that it obtains significantly better results than the Stepwise, Backward or Forward methods used by classic statistical packages. Some results of applying these methods are presented.
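The alternation between a basic phase and a diversification phase described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the function name `tabu_select`, the swap move, the aspiration rule, the frequency-based restart and all parameter defaults are hypothetical, and `score` stands in for the percentage of hits of a fitted logistic regression.

```python
import random
from typing import Callable, FrozenSet, List, Tuple


def tabu_select(m: int, p: int, score: Callable[[FrozenSet[int]], float],
                iters: int = 100, tenure: int = 3, stall_limit: int = 10,
                seed: int = 0) -> Tuple[FrozenSet[int], float]:
    """Look for a size-p subset of range(m) with a high score (assumes p < m)."""
    rng = random.Random(seed)
    current = set(rng.sample(range(m), p))
    best, best_val = frozenset(current), score(frozenset(current))
    tabu_until: List[int] = [0] * m  # short-term memory: last iteration a variable is tabu
    freq: List[int] = [0] * m        # long-term memory: how often a variable was selected
    stall = 0
    for it in range(1, iters + 1):
        # Basic phase: try every swap of one selected and one unselected variable.
        moves = []
        for out in current:
            for inn in range(m):
                if inn in current:
                    continue
                val = score(frozenset((current - {out}) | {inn}))
                is_tabu = tabu_until[out] >= it or tabu_until[inn] >= it
                # Aspiration (assumed rule): a tabu move is allowed if it beats the best.
                if not is_tabu or val > best_val:
                    moves.append((val, out, inn))
        improved = False
        if moves:
            val, out, inn = max(moves)
            current.remove(out)
            current.add(inn)
            tabu_until[out] = tabu_until[inn] = it + tenure
            if val > best_val:
                best, best_val, improved = frozenset(current), val, True
        for v in current:
            freq[v] += 1
        stall = 0 if improved else stall + 1
        if stall >= stall_limit:
            # Diversification phase (assumed rule): restart from the least-visited variables.
            current = set(sorted(range(m), key=lambda v: freq[v])[:p])
            val = score(frozenset(current))
            if val > best_val:
                best, best_val = frozenset(current), val
            stall = 0
    return best, best_val
```

With a toy additive score such as `lambda s: sum(i * i for i in s)`, `m = 10` and `p = 3`, the search returns the three highest-weight variables. In a real application `score` would fit and evaluate a classifier on the candidate subset, which dominates the running time.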

2.
The feature selection problem is an interesting and important topic which is relevant for a variety of database applications. This paper utilizes the Tabu Search metaheuristic algorithm to implement a feature subset selection procedure while the nearest neighbor classification method is used for the classification task. Tabu Search is a general metaheuristic procedure that is used to guide the search towards good solutions in complex solution spaces. Several metrics are used in the nearest neighbor classification method, such as the Euclidean distance, the standardized Euclidean distance, the Mahalanobis distance, the city block metric, the cosine distance and the correlation distance, in order to identify the most significant metric for the nearest neighbor classifier. The performance of the proposed algorithms is tested using various benchmark datasets from the UCI Machine Learning Repository.

3.
Optimal subset selection among a general family of threshold autoregressive moving-average (TARMA) models is considered. The usual complexity of model/order selection is increased by capturing the uncertainty of unknown threshold levels and an unknown delay lag. The Monte Carlo method of Bayesian model averaging provides a possible way to overcome such model uncertainty. Incorporating the idea of Bayesian model averaging, a modified stochastic search variable selection method is adapted to consider subset selection in TARMA models, by adding latent indicator variables for all potential model lags as part of the proposed Markov chain Monte Carlo sampling scheme. Metropolis–Hastings methods are employed to deal with the well-known difficulty of including moving-average terms in the model and a novel proposal mechanism is designed for this purpose. Bayesian comparison of two hyper-parameter settings is carried out via a simulation study. The results demonstrate that the modified method has favourable performance under reasonable sample size and appropriate settings of the necessary hyper-parameters. Finally, the application to four real datasets illustrates that the proposed method can provide promising and parsimonious models from more than 16 million possible subsets.

4.
This study compares DEA (data envelopment analysis) with DEA–DA (discriminant analysis) in terms of bankruptcy assessment. Recently, many DEA researchers have proposed using DEA as a quick-and-easy tool to assess corporate bankruptcy. Meanwhile, other DEA researchers discuss the use of DEA–DA for bankruptcy-based financial analysis. Both lines of work differ markedly from the conventional use of DEA, which has long been applied to the measurement of operational performance, or productivity analysis. The two research groups open up a new application area (bankruptcy-based financial assessment) for DEA. This study discusses methodological strengths and weaknesses of DEA and DEA–DA from the perspective of corporate failure. The proposed comparative analysis has three main criteria: (a) how to handle negative data in financial variables, (b) how to handle data imbalance between default and non-default firms, and (c) how to identify a failure process over time. This study finds that DEA is a managerial tool for the initial assessment of corporate failure and is useful for busy corporate leaders and financial managers. In contrast, DEA–DA is useful for researchers and individuals who are interested in the detailed assessment of bankruptcy and its failure process over a time horizon.

5.
This article presents a Markov chain Monte Carlo algorithm for both variable and covariance selection in the context of logistic mixed effects models. This algorithm allows us to sample solely from standard densities with no additional tuning. We apply a stochastic search variable approach to select explanatory variables as well as to determine the structure of the random effects covariance matrix.

Prior determination of explanatory variables and random effects is not a prerequisite because the definite structure is chosen in a data-driven manner in the course of the modeling procedure. To illustrate the method, we give two bank data examples.

6.
This paper presents a metaheuristic solution approach based on Tabu search for the open-pit mine production scheduling problem with metal uncertainty. To search the feasible domain more extensively, two different diversification strategies are used to generate several initial solutions to be optimized by the Tabu search procedure. The first diversification strategy exploits a long-term memory of the search history. The second one relies on the variable neighborhood search method. Numerical results on realistic large-scale instances are provided to indicate the efficiency of the solution approach to produce very good solutions in relatively short computational times.

7.
8.
Spatial semiparametric varying coefficient models are a useful extension of the spatial linear model. Nevertheless, how to conduct variable selection for them has not been well investigated. In this paper, by combining a basis spline approximation with a general M-type loss function that treats mean, median, quantile and robust mean regressions in one setting, we propose a novel partially adaptive group \(L_{r} (r\ge 1)\) penalized M-type estimator, which can select variables and estimate coefficients simultaneously. Under mild conditions, the selection consistency and oracle property in estimation are established. The new method has several distinctive features: (1) it achieves robustness against outliers and heavy-tailed distributions; (2) it is more flexible in accommodating heterogeneity and allows the set of relevant variables to vary across quantiles; (3) it can keep a balance between efficiency and robustness. Simulation studies and real data analysis are included to illustrate our approach.

9.
10.
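Quantile varying-coefficient models are a robust nonparametric modelling approach. When analysing data with a varying-coefficient model, a natural question is how to simultaneously select the important variables and identify, among them, the variables with constant effects. This paper studies a robust and efficient estimation and variable selection procedure based on the quantile method. Using local smoothing and adaptive group variable selection, and imposing a double penalty on the quantile loss function, we obtain the penalized estimator. With the tuning parameters chosen appropriately by the BIC criterion, the proposed variable selection method has the oracle property, and its usefulness is illustrated through simulation studies and an analysis of real body-fat data. The numerical results show that, without requiring any information about the variables or the error distribution, the proposed method can identify the unimportant variables and at the same time distinguish the constant-effect variables.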

11.
This paper considers weighted composite quantile (WCQ) regression for the linear model with random censoring. An adaptive penalized procedure for variable selection in this model is proposed, and the consistency, asymptotic normality and oracle property of the resulting estimators are derived. Simulation studies and the analysis of an acute myocardial infarction data set are conducted to illustrate the finite sample performance of the proposed method.

12.
With advanced capability in data collection, applications of linear regression analysis now often involve a large number of predictors. Variable selection thus has become an increasingly important issue in building a linear regression model. For a given selection criterion, variable selection is essentially an optimization problem that seeks the optimal solution over \(2^{m}\) possible linear regression models, where m is the total number of candidate predictors. When m is large, exhaustive search becomes practically impossible. Simple suboptimal procedures such as forward addition, backward elimination, and backward-forward stepwise procedure are fast but can easily be trapped in a local solution. In this article we propose a relatively simple algorithm for selecting explanatory variables in a linear regression for a given variable selection criterion. Although the algorithm is still a suboptimal algorithm, it has been shown to perform well in extensive empirical study. The main idea of the procedure is to partition the candidate predictors into a small number of groups. Working with various combinations of the groups and iterating the search through random regrouping, the search space is substantially reduced, hence increasing the probability of finding the global optimum. By identifying and collecting “important” variables throughout the iterations, the algorithm finds increasingly better models until convergence. The proposed algorithm performs well in simulation studies with 60 to 300 predictors. As a by-product of the proposed procedure, we are able to study the behavior of variable selection criteria when the number of predictors is large. Such a study has not been possible with traditional search algorithms.

This article has supplementary material online.

13.
This paper investigates the feature subset selection problem for binary classification using a logistic regression model. We develop a modified discrete particle swarm optimization (PSO) algorithm for the feature subset selection problem. This approach embodies an adaptive feature selection procedure which dynamically accounts for the relevance and dependence of the features included in the feature subset. We compare the proposed methodology with the tabu search and scatter search algorithms using publicly available datasets. The results show that the proposed discrete PSO algorithm is competitive in terms of both classification accuracy and computational performance.

14.
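A portfolio of financial assets of a leveraged firm is constructed using a structural approach. Because corporate bankruptcy is irreversible and uncertain, it can be interpreted as a default on the bonds issued by the firm. By solving the parabolic stochastic partial differential equation satisfied by a lookback option, a stock pricing formula for the leveraged firm under a mixed fractional jump-diffusion model is derived, and the probability is given that, when the firm is in financial distress, shareholders cover operating losses and repay debts through capital injection without the firm going bankrupt,...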

15.
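In linear regression modelling, the selection of regressors is a widely studied problem with an extensive literature and considerable theoretical and practical significance. The consistency of the selected subset of regressors is one important issue: a selection method is desirable if the subset it selects is consistent as the sample size tends to infinity and the mean squared prediction error is small. The BIC criterion can select a consistent subset of regressors, but its computational cost is too high when the number of regressors is large; the adaptive lasso is computationally more efficient and can also find a consistent subset. This paper proposes a simpler selection method that requires only two ordinary linear regressions: first, a regression on the full set of regressors gives estimates of the full-set regression coefficients; these estimates are then used to select a subset; finally, one more ordinary linear regression on the selected subset gives the regression result. Consider the following regression model: […] where the index set of the nonzero components of the regression coefficients is […], […] is the index set of the subset of regressors selected by the proposed method, and […] is the vector of regression coefficients estimated by the proposed method (coefficients of unselected regressors are zero). This paper proves that, under suitable conditions, […] where […] denotes the vector formed by the components of […] whose indices lie in […], […] is the error variance, and […] are a matrix and a constant related to the limit of the […] matrix. Numerical simulation results show that the proposed method has good small- and medium-sample properties.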
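The two-regression procedure described in this abstract can be sketched as follows. The abstract does not state the rule used to pick the subset from the full-set coefficient estimates, so the absolute-value threshold `tau` below is purely an illustrative assumption, as are the function names `ols` and `two_pass_select`; the sketch uses plain normal equations and is not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    n, k = len(X), len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting (assumes X'X is nonsingular).
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def two_pass_select(X: Sequence[Sequence[float]], y: Sequence[float],
                    tau: float) -> Tuple[List[int], List[float]]:
    """Pass 1: full-set regression. Pass 2: refit on the selected subset."""
    full = ols(X, y)
    # Assumed selection rule: keep regressors whose full-set estimate exceeds tau.
    keep = [j for j, b in enumerate(full) if abs(b) > tau]
    coef = [0.0] * len(full)  # unselected regressors get coefficient zero
    if keep:
        sub = ols([[row[j] for j in keep] for row in X], y)
        for j, b in zip(keep, sub):
            coef[j] = b
    return keep, coef
```

On noise-free data generated with a zero coefficient on one regressor, the first pass estimates that coefficient as approximately zero, so any moderate `tau` drops it and the second pass refits the remaining regressors.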

16.
17.
Supervised clustering of variables   Total citations: 1, self-citations: 0, citations by others: 1
In predictive modelling, highly correlated predictors lead to unstable models that are often difficult to interpret. The selection of features, or the use of latent components that reduce the complexity among correlated observed variables, are common strategies. Our objective with the new procedure that we advocate here is to achieve both purposes: to highlight the group structure among the variables and to identify the most relevant groups of variables for prediction. The proposed procedure is an iterative adaptation of a method developed for the clustering of variables around latent variables (CLV). Modification of the standard CLV algorithm leads to a supervised procedure, in the sense that the variable to be predicted plays an active role in the clustering. The latent variables associated with the groups of variables, selected for their “proximity” to the variable to be predicted and their “internal homogeneity”, are progressively added in a predictive model. The features of the methodology are illustrated based on a simulation study and a real-world application.

18.
Bankruptcy prediction is a key part of corporate credit risk management. Traditional bankruptcy prediction models employ financial ratios or market prices to predict bankruptcy or financial distress prior to its occurrence. We investigate the predictive accuracy of corporate efficiency measures along with standard financial ratios in predicting corporate distress in Chinese companies. Data Envelopment Analysis (DEA) is used to measure corporate efficiency. In contrast to previous applications of DEA in credit risk modelling, where it was used to generate a single efficiency measure, Technical Efficiency (TE), we assume Variable Returns to Scale and decompose TE into Pure Technical Efficiency and Scale Efficiency. These measures are introduced into Logistic Regression to predict the probability of distress, along with the level of Returns to Scale. Effects of efficiency variables are allowed to vary across industries through the use of interaction terms, while the financial ratios are assumed to have the same effects across all sectors. The results show that the predictive power of the model is improved by this corporate efficiency information.
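The decomposition mentioned in this abstract is usually written as below. This is the standard DEA identity rather than a formula quoted from the paper; the subscripts CRS (constant returns to scale) and VRS (variable returns to scale), and the abbreviations PTE and SE, are notation assumed here for illustration.

```latex
\[
\mathrm{TE}_{\mathrm{CRS}} \;=\; \mathrm{PTE}_{\mathrm{VRS}} \times \mathrm{SE},
\qquad
\mathrm{SE} \;=\; \frac{\mathrm{TE}_{\mathrm{CRS}}}{\mathrm{PTE}_{\mathrm{VRS}}} \;\le\; 1 .
\]
```

A firm with \(\mathrm{SE} = 1\) operates at its most productive scale size, so any remaining inefficiency is attributed to pure technical inefficiency.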

19.
Two metaheuristic methods based on Tabu search are introduced to assign judges to individual competitions in a tournament. The complexity of the mathematical formulation accounting for the assignment rules leads us to use such an approach. The first metaheuristic includes two different Tabu searches that are combined with a diversification strategy. The second metaheuristic is applied to a penalized version of the original model formulated as an assignment problem. This metaheuristic is also based on a Tabu search procedure including a diversification strategy driven by the violated constraints. Numerical results are provided to indicate the efficiency of the methods in generating very good solutions.

20.
The p-median problem with positive and negative weights has been introduced by Burkard and Krarup [Computing 60 (1998) 193]. In this paper we discuss some special cases of this problem on trees and propose a variable neighborhood search procedure for general networks, which is in fact a modification of the one proposed by Hansen and Mladenovic [Locat. Sci. 5 (1997) 207] for the p-median. We also compare the results with those obtained by a Tabu search procedure.
