Similar Documents
20 similar documents found.
1.
Previous studies on financial distress prediction (FDP) have mostly constructed FDP models from balanced data sets, or applied only traditional classification methods to imbalanced data sets, which often overestimates an FDP model's ability to recognize distressed companies. Our study focuses on support vector machine (SVM) methods for FDP based on imbalanced data sets. We propose a new imbalance-oriented SVM method that combines the synthetic minority over-sampling technique (SMOTE) with the Bagging ensemble learning algorithm, using SVM as the base classifier. Named the SMOTE-Bagging-based SVM ensemble (SB-SVM-ensemble), it is theoretically better suited to FDP modelling on imbalanced data sets with a limited number of samples. For comparison, the traditional SVM method and three classical imbalance-oriented SVM methods (cost-sensitive SVM, SMOTE-SVM, and the data-set-partition-based SVM ensemble) are also introduced. We collect an imbalanced FDP data set from Chinese publicly traded companies and carry out 100 experiments to test the method's effectiveness empirically. The experimental results indicate that the new SB-SVM-ensemble method outperforms the traditional methods and is a useful tool for imbalanced FDP modelling.
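The SMOTE-plus-Bagging idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy data, the simplified `smote` helper, and the ensemble size of 11 are all assumptions made for the example.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def smote(X_min, n_new):
    """Toy SMOTE: interpolate between random pairs of minority samples."""
    i = rng.integers(0, len(X_min), n_new)
    j = rng.integers(0, len(X_min), n_new)
    lam = rng.random((n_new, 1))
    return X_min[i] + lam * (X_min[j] - X_min[i])

# Imbalanced toy data: 100 healthy firms (class 0) vs 10 distressed (class 1).
X0 = rng.normal(0.0, 1.0, (100, 4))
X1 = rng.normal(1.5, 1.0, (10, 4))

models = []
for _ in range(11):                                  # 11 bagged base SVMs
    b0 = X0[rng.integers(0, len(X0), len(X0))]       # bootstrap the majority class
    b1 = np.vstack([X1, smote(X1, len(X0) - len(X1))])  # SMOTE the minority class
    X = np.vstack([b0, b1])
    y = np.r_[np.zeros(len(b0)), np.ones(len(b1))]
    models.append(SVC(kernel="rbf", gamma="scale").fit(X, y))

def predict(Xq):
    """Majority vote over the SVM ensemble."""
    votes = np.mean([m.predict(Xq) for m in models], axis=0)
    return (votes >= 0.5).astype(int)

print(predict(np.array([[1.5, 1.5, 1.5, 1.5], [0.0, 0.0, 0.0, 0.0]])))
```

Each base classifier sees a rebalanced bootstrap sample, so no single SVM is dominated by the majority class, which is the point of combining SMOTE with Bagging.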

2.
This paper compares regression analysis and data envelopment analysis as two alternative methods for assessing the comparative performance of homogeneous organizational units such as bank branches or schools. The comparison is restricted to units using a single resource or securing a single output. It focuses on the estimates of relative efficiency, marginal input-output values and target input-output levels that the two methods offer. A set of hypothetical hospitals is used to illustrate the performance of the two methods. It is found that, in general, data envelopment analysis outperforms regression analysis on accuracy of estimates, while regression analysis offers greater stability of accuracy.
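In the single-input, single-output case the comparison reduces to something very compact: DEA efficiency becomes a normalized output/input ratio, and a regression benchmark scores each unit against an average-practice line. The hospital figures below are hypothetical and not taken from the paper.

```python
import numpy as np

# Hypothetical hospitals: one input (staff) and one output (patients treated).
x = np.array([20.0, 30.0, 40.0, 50.0])   # resource used
y = np.array([180., 300., 320., 450.])   # output secured

# DEA with one input and one output reduces to output/input ratios,
# normalized by the best ratio observed (efficiency 1 = on the frontier).
ratio = y / x
dea_eff = ratio / ratio.max()

# A regression-style benchmark: fit y = b*x through the origin and
# score each unit against the fitted "average practice" line.
b = (x @ y) / (x @ x)
reg_eff = y / (b * x)

print(np.round(dea_eff, 3), np.round(reg_eff, 3))
```

Note the characteristic difference: DEA scores are relative to the *best* unit (all ≤ 1), while regression scores are relative to the *average*, so roughly half the units score above 1.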

3.
When synthesizing indices from web search data, it is difficult to eliminate collinearity and to determine the indicator weights in a principled way. To address these problems, this study builds an index-synthesis method centred on principal component analysis, based on the idea of dimensionality reduction. Taking the synthesis of a consumer price index from web search data as an example, the proposed method is compared with the currently mainstream stepwise regression approach. The results show that the PCA-based synthesis method yields an index with higher stability and better goodness of fit.
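The core of the approach can be sketched in a few lines: standardize the (collinear) indicator series and let the first principal component's loadings replace hand-picked weights. The three simulated keyword series below are assumptions for illustration, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical monthly search-volume series for 3 CPI-related keywords;
# they are strongly collinear by construction, the case PCA handles well.
t = np.arange(36)
base = 0.5 * t + rng.normal(0, 1, 36)
X = np.column_stack([base + rng.normal(0, 0.3, 36) for _ in range(3)])

# Standardize, then take the first principal component as the index:
# its loadings act as data-driven weights and absorb the collinearity.
Z = (X - X.mean(0)) / X.std(0)
cov = np.cov(Z, rowvar=False)
vals, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
w = vecs[:, -1]                           # loadings of the largest eigenvalue
index = Z @ w

var_share = vals[-1] / vals.sum()         # variance explained by the index
print(round(var_share, 3))
```

With near-collinear indicators the first component captures almost all of the common variation, which is why a single synthesized index can stand in for the whole indicator set.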

4.
Bayes discriminant analysis takes into account the prior probabilities of each population, the prior probabilities of prediction, and the losses caused by misclassification, so its discriminating performance is superior to that of other discriminant methods. After a detailed introduction to the Bayes discriminant method, R software is used to apply Bayes discriminant analysis, Fisher discriminant analysis, and distance-based discriminant analysis to a set of diastolic blood pressure and cholesterol data. Comparing the results of the three methods shows that Bayes discriminant analysis achieves higher classification accuracy, and that it has promising applications in the medical field.
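The study uses R; as a rough equivalent, the sketch below performs Bayes discriminant analysis in Python via linear discriminant analysis with explicit class priors. The simulated blood-pressure/cholesterol readings and the prior values are assumptions for illustration, not the study's data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)

# Hypothetical (diastolic pressure, cholesterol) readings for two groups.
healthy = rng.multivariate_normal([75, 4.5], [[25, 2], [2, 0.5]], 60)
patient = rng.multivariate_normal([95, 6.0], [[25, 2], [2, 0.5]], 20)
X = np.vstack([healthy, patient])
y = np.r_[np.zeros(60), np.ones(20)]

# Bayes discriminant: LDA scoring with explicit prior probabilities,
# here set to the empirical class frequencies (0.75, 0.25).
lda = LinearDiscriminantAnalysis(priors=[0.75, 0.25]).fit(X, y)
acc = lda.score(X, y)
print(round(acc, 3))
```

Supplying the priors is what distinguishes the Bayes rule from plain Fisher or distance-based discrimination: an observation near the boundary is pulled toward the more probable population.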

5.
This article presents a method for visualization of multivariate functions. The method is based on a tree structure, called the level set tree, built from separated parts of level sets of a function. The method is applied to the visualization of estimates of multivariate density functions. With different graphical representations of level set trees we may visualize the number and location of modes, excess masses associated with the modes, and certain shape characteristics of the estimate. Simulation examples are presented where projecting data to two dimensions does not help to reveal the modes of the density, but with the help of level set trees one may detect the modes. I argue that level set trees provide a useful method for exploratory data analysis.

6.
Diagnostics for proportional hazards models based on case-cohort data  (Cited: 1; self-citations: 0, citations by others: 1)
余吉昌  曹永秀 《数学学报》2020,63(2):137-148
The case-cohort design is a widely used sampling method in survival analysis that reduces cost while improving efficiency. For case-cohort data, many statistical methods based on the proportional hazards model have been developed to estimate the effect of covariates on survival time. However, little work has addressed checking whether the model assumptions hold for case-cohort data. In this paper, we propose a class of test statistics based on asymptotically zero-mean stochastic processes, which can test the proportional hazards assumption with case-cohort data. We approximate the asymptotic distribution of these test statistics by a resampling method, study the finite-sample performance of the proposed method through simulations, and finally apply the method to a real data set from the National Wilms Tumor Study.

7.
Extended VIKOR method in comparison with outranking methods  (Cited: 1; self-citations: 0, citations by others: 1)
The VIKOR method was developed to solve MCDM problems with conflicting and noncommensurable (different units) criteria, assuming that compromising is acceptable for conflict resolution, that the decision maker wants a solution that is the closest to the ideal, and that the alternatives are evaluated according to all established criteria. The method focuses on ranking and selecting from a set of alternatives in the presence of conflicting criteria, and on proposing compromise solutions (one or more). The VIKOR method is extended with a stability analysis determining the weight stability intervals and with a trade-offs analysis. The extended VIKOR method is compared with three multicriteria decision making methods: TOPSIS, PROMETHEE, and ELECTRE. A numerical example illustrates an application of the VIKOR method, and the results of all four considered methods are compared.
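The core VIKOR scores can be sketched compactly: a group-utility measure S, an individual-regret measure R, and the compromise index Q that blends them. The decision matrix, weights, and the usual v = 0.5 blend below are assumptions for a toy example (all criteria treated as benefit criteria).

```python
import numpy as np

def vikor(F, w, v=0.5):
    """Core VIKOR scores for a benefit-criteria decision matrix F
    (rows = alternatives, columns = criteria) with weights w."""
    f_best, f_worst = F.max(0), F.min(0)
    d = w * (f_best - F) / (f_best - f_worst)   # weighted normalized regret
    S, R = d.sum(1), d.max(1)                   # group utility / individual regret
    Q = (v * (S - S.min()) / (S.max() - S.min())
         + (1 - v) * (R - R.min()) / (R.max() - R.min()))
    return S, R, Q

# Hypothetical alternatives scored on three benefit criteria.
F = np.array([[7., 8., 6.],
              [9., 6., 7.],
              [6., 9., 9.]])
w = np.array([0.4, 0.3, 0.3])
S, R, Q = vikor(F, w)
print(np.argmin(Q))   # lowest Q = best compromise alternative
```

The parameter v is the weight of the "majority rule" strategy; varying it (and checking whether the ranking holds) is exactly the kind of stability question the extended method formalizes with weight stability intervals.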

8.
This paper deals with the issue of estimating a production frontier and measuring efficiency from a panel data set. First, it proposes an alternative method for the estimation of a production frontier on a short panel data set. The method is based on the so-called mean-and-covariance structure analysis, which is closely related to the generalized method of moments. One advantage of the method is that it allows us to investigate the presence of correlations between individual effects and exogenous variables without requiring instruments uncorrelated with the individual effects, as instrumental variable estimation does. Another advantage is that the method is well suited to a panel data set with a small number of periods. Second, the paper considers the question of recovering individual efficiency levels from the estimates obtained from the mean-and-covariance structure analysis. Since individual effects are here viewed as latent variables, they can be estimated as factor scores, i.e., weighted sums of the observed variables. We illustrate the proposed methods with the estimation of a stochastic production frontier on a short panel data set of French fruit growers.

9.
A simple methodology is presented for sensitivity analysis of models that have been fitted to data by statistical methods. Such analysis is a decision support tool that can focus the effort of a modeller who wishes to further refine a model and/or to collect more data. A formula is given for the calculation of the proportional reduction in the variance of the model ‘output’ that would be achievable with perfect knowledge of a subset of the model parameters. This is a measure of the importance of the set of parameters, and is shown to be asymptotically equal to the squared correlation between the model output and its best predictor based on the omitted parameters. The methodology is illustrated with three examples of OR problems: an age-based equipment replacement model, an ARIMA forecasting model and a cancer screening model. The sampling error of the calculated percentage of variance reduction is studied theoretically, and a simulation study is then used to exemplify the accuracy of the method as a function of sample size.
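The variance-reduction identity can be checked by Monte Carlo on a toy model. The model y = a·x0 + b·x0², the parameter distributions, and the sample size below are assumptions for the sketch; it demonstrates that the achievable variance reduction from knowing parameter a equals the squared correlation between the output and its best predictor given a.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy fitted model: output depends on two parameters a, b whose
# estimation uncertainty we sample; evaluate y at x0 = 2.
n = 100_000
a = rng.normal(1.0, 0.2, n)          # sampled uncertainty in a
b = rng.normal(0.5, 0.1, n)          # sampled uncertainty in b
x0 = 2.0
y = a * x0 + b * x0**2

# Proportional variance reduction achievable with perfect knowledge
# of a: 1 - E[Var(y | a)] / Var(y), where Var(y | a) = x0^4 Var(b).
var_y = y.var()
red_a = 1 - (b.var() * x0**4) / var_y

# Squared correlation between y and its best predictor given a
# (a * x0 + E[b] * x0^2); the two quantities should agree.
pred = a * x0 + 0.5 * x0**2
r2 = np.corrcoef(y, pred)[0, 1] ** 2
print(round(red_a, 3), round(r2, 3))
```

In this toy setup both quantities come out near 0.5: the two parameters contribute equal variance at x0 = 2, so knowing a perfectly halves the output variance.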

10.
Comparison of three multicriteria methods to predict known outcomes  (Cited: 1; self-citations: 0, citations by others: 1)
Major approaches to selection decisions include multiattribute utility theory and outranking methods. One of the most frustrating aspects of research on the relative performance of these methods is the lack of data for which the final outcome is known. In the US, a great deal of effort has been devoted to statistically recording detailed performance characteristics of major league professional baseball. Every year there have been two to four seasonal competitions, each with a known outcome in terms of the proportion of contests won. Successful teams often have diverse profiles, emphasizing different characteristics. SMART, PROMETHEE, and a centroid method were applied to baseball data over the period 1901-1991. Baseball underwent a series of changes in style over that period, with different physical and administrative characteristics. Therefore the data were divided into decades, with the first five years of each decade used as a training set and the last five years used for evaluation. Regression was used to develop the input for preference selection in each method, and single-attribute utilities for criteria performance were generated from the first five years of data in each set. The relative accuracy of the multicriteria methods was compared over 114 competitive seasons, both for selecting the winning team and for rank-ordering all teams. All the methods have value in supporting human decision making. PROMETHEE II using Gaussian preference functions and SMART were found to be the most accurate. The centroid method and PROMETHEE II using ordinal data were found to involve little sacrifice in predictive accuracy.

11.
聂斌, 王曦, 胡雪. Operations Research and Management Science, 2019, 28(1): 101-107
In the field of quality control, identifying outliers in nonlinear profiles is a key research problem. This paper combines wavelet analysis, data depth, and cluster analysis to propose a new method for identifying outliers under non-normal variation. Simulation studies comparing the new method with the χ2 control chart show that it identifies outliers with higher accuracy and stability, exhibiting better outlier-detection performance. Finally, the method is validated on a real data set of vertical density profiles of wooden boards, where the analysis shows that it effectively identifies abnormal profile data.

12.
Threshold-based noise reduction methods for vibration signals have been widely researched and used. However, these methods are inefficient in some situations, requiring time-consuming and subjective manual editing, because signals with different degrees of noise require different characterizations for filtering. In this paper, an efficient PDE-based denoising method for the time-frequency distribution of mechanical vibration signals is investigated, in which a one-dimensional vibration signal is transformed into the 2D time-frequency domain using the Gabor transform. This enables (i) both time and frequency characteristics to be used simultaneously for effective multidimensional signal denoising and (ii) isotropic and anisotropic characteristics to be imposed through the PDE, which explicitly fits the local structure of the time-frequency signal. The paper analyzes the basic methods of isotropic and anisotropic diffusion filtering, investigates an anisotropic diffusion method based on the local feature structure of the 2D information, and conducts a set of comparative tests. Experiments show that the proposed method denoises better than thresholding, while being easier to apply than alternatives such as independent component analysis. Finally, remaining problems with, and ways of improving, the PDE-based filtering method are analyzed. Copyright © 2014 John Wiley & Sons, Ltd.

13.
Multidimensional scaling (MDS) is a set of techniques, used especially in the behavioral and social sciences, that enable a researcher to visualize proximity data in a multidimensional space. This article focuses on a particular class of MDS models proposed to deal with proximities that describe asymmetric relationships (e.g., trade indices for a set of countries, brand-switching data, occupational mobility tables, and so on). They are based on the decomposition of the relationships into a symmetric and a skew-symmetric part. In this way the objects are represented as points in a multidimensional space and the intensity of their relationships as scalar products (symmetry) or triangle areas (skew-symmetry). These models are seen as special cases of a general model, and their rotational indeterminacy is investigated. The aim is to propose a rotation method that makes the visual inspection of the graphical representation easier, highlighting the simple structure of the data. In particular, an orthomax-like family of rotation methods and a general algorithm are proposed. Advantages of the proposal are illustrated by an analysis of import-export data.
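The symmetric/skew-symmetric decomposition these models build on is unique and takes one line each. The 4x4 flow matrix below is a hypothetical stand-in for the kind of import-export data the article analyzes.

```python
import numpy as np

# Hypothetical asymmetric proximity matrix (e.g. flows from row
# country i to column country j).
A = np.array([[0., 5., 2., 1.],
              [3., 0., 4., 2.],
              [1., 6., 0., 3.],
              [2., 2., 5., 0.]])

# Unique decomposition into a symmetric part (visualized via scalar
# products) and a skew-symmetric part (visualized via triangle areas).
S = (A + A.T) / 2        # symmetric:       S == S.T
K = (A - A.T) / 2        # skew-symmetric:  K == -K.T
assert np.allclose(S + K, A)

# A real skew-symmetric matrix has singular values in equal pairs;
# the dominant pair gives the best planar (rank-2) representation of
# the asymmetries.
sv = np.linalg.svd(K, compute_uv=False)
print(np.round(sv, 3))
```

The paired singular values are what make a 2D "asymmetry plane" natural: each pair spans a plane in which triangle areas reproduce part of the skew-symmetry exactly.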

14.
Among the large number of genes present in microarray gene expression data, only a small fraction is effective for performing a given diagnostic test. In this regard, a new feature selection algorithm is presented based on rough set theory. It selects a set of genes from microarray data by maximizing both the relevance and the significance of the selected genes. A theoretical analysis is presented to justify the use of both relevance and significance criteria for selecting a reduced gene set with high predictive accuracy. The importance of rough set theory for computing both the relevance and the significance of the genes is also established. The performance of the proposed algorithm, along with a comparison with other related methods, is studied using the predictive accuracy of the K-nearest neighbor rule and support vector machines on five cancer and two arthritis microarray data sets. Among the seven data sets, the proposed algorithm attains 100% predictive accuracy on three cancer and two arthritis data sets, while the two existing rough-set-based algorithms attain this accuracy on only one cancer data set.

15.
The available methods to handle missing values in principal component analysis provide only point estimates of the parameters (axes and components) and estimates of the missing values. To take into account the variability due to missing values, a multiple imputation method is proposed. First, a method to generate multiple imputed data sets from a principal component analysis model is defined. Then, two ways to visualize the uncertainty due to missing values in the principal component analysis results are described. The first consists in projecting the imputed data sets onto a reference configuration as supplementary elements, to assess the stability of the individuals (respectively, of the variables). The second consists in performing a principal component analysis on each imputed data set and fitting each obtained configuration onto the reference one with a Procrustes rotation. The latter strategy allows one to assess the variability of the principal component analysis parameters induced by the missing values. The methodology is then evaluated on a real data set.
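The first visualization strategy (projecting imputed data sets as supplementary points) can be sketched as follows. Everything here is an assumption for illustration: the simulated data, the single missing cell, and the crude regression-plus-noise stand-in for a proper PCA-model imputation.

```python
import numpy as np

rng = np.random.default_rng(3)

# Complete reference data; column 2 is strongly predicted by column 0.
X = rng.normal(size=(30, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=30)
miss_row, miss_col = 5, 2          # pretend this one cell is missing

# Reference PCA (on the complete data) gives the axes to project onto.
Xc = X - X.mean(0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
axes = Vt[:2].T                    # first two principal axes

# Multiple imputation: draw several plausible values for the missing
# cell (here: column 0 plus noise, a crude imputation model), then
# project each imputed row as a supplementary point.
points = []
for _ in range(20):
    row = X[miss_row].copy()
    row[miss_col] = X[miss_row, 0] + 0.1 * rng.normal()
    points.append((row - X.mean(0)) @ axes)
points = np.array(points)

# The spread of the supplementary points visualizes the positional
# uncertainty that the missing value induces for this individual.
print(np.round(points.std(0), 3))
```

A well-predicted missing value yields a tight cloud of supplementary points; a poorly predicted one yields a wide cloud, which is exactly the stability information a single point estimate hides.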

16.
In this paper, we develop and compare two methods for solving the problem of determining the global maximum of a function over a feasible set. Both methods begin with a random sample of points over the feasible set, and both then seek to combine these points into "regions of attraction", subsets of points that yield the same local maximum when an optimization procedure is applied to any point in the subset. The first method constructs regions of attraction by approximating the function by a mixture of normal distributions over the feasible region; the second applies cluster analysis to form the regions. The two methods are then compared on a set of well-known test problems.

17.
In a knowledge-based system that aims at supporting persons interested in the analysis of special data, the problem can arise that a whole set of proposals is generated in answer to a user's question. Such proposals are based on appropriate interconnections between user wishes, the available original data as well as derived data obtained by applying adequate methods, the methods themselves, and the data analysis objectives. We use graphical visualizations of proposals to outline how the system would cope with the underlying situation. In this paper, special attention is paid to the concept of knowledge-based comparisons of proposals, where propagation of certainty factors is used for a priori judgments of the generated proposals (before suggested proposals are performed). Afterwards, a posteriori judgments of the considered proposals (after solutions have been computed by applying selected proposals) can be based on goodness-of-fit criteria derived from chosen outputs.

18.
Professionals in neuropsychology usually express diagnoses of patients' behaviour in verbal rather than numerical form. This fact generates interest in decision support systems that process verbal data, and it motivates us to develop methods for the classification of such data. In this paper, we describe ways of aiding the classification of a discrete set of objects, evaluated on a set of criteria that may have verbal estimations, into ordered decision classes. In some situations no explicit additional information is available, while in others it is possible to order the criteria lexicographically; we consider both cases. The proposed Dichotomic Classification (DC) method is based on the principles of Verbal Decision Analysis (VDA). Verbal Decision Analysis methods are especially helpful when verbal criterion values are to be handled. Compared to the previously developed Verbal Decision Analysis classification methods, the Dichotomic Classification method performs better on the same data sets and can cope with larger object sets. We present an interactive classification procedure, estimate the effectiveness and computational complexity of the new method, and compare it to one of the previously developed Verbal Decision Analysis methods. The developed methods are implemented in the framework of a decision support system, and the results of testing on artificial data sets are reported.

19.
Isotonic nonparametric least squares (INLS) is a regression method for estimating a monotonic function by fitting a step function to data. In the literature on frontier estimation, the free disposal hull (FDH) method is similarly based on the minimal assumption of monotonicity. In this paper, we link these two separately developed nonparametric methods by showing that FDH is a sign-constrained variant of INLS. We also discuss the connections to related methods such as data envelopment analysis (DEA) and convex nonparametric least squares (CNLS). Further, we examine alternative ways of applying isotonic regression to frontier estimation, analogous to the corrected and modified ordinary least squares (COLS/MOLS) methods known in the parametric stream of the frontier literature. We find that INLS is a useful extension to the toolbox of frontier estimation in both the deterministic and stochastic settings. In the absence of noise, the corrected INLS (CINLS) has higher discriminating power than FDH. In the case of noisy data, we propose to apply the method of non-convex stochastic envelopment of data (non-convex StoNED), which disentangles inefficiency from noise based on the skewness of the INLS residuals. The proposed methods are illustrated by means of simulated examples.
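The INLS/FDH relationship is easy to see numerically in the single-input case: INLS fits a monotone step function through the data, while the FDH frontier is the running maximum of outputs, a step function constrained to lie on or above every observation. The simulated production data below are an assumption for illustration.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(5)

# Single-input production data: output rises with input, plus noise.
x = np.sort(rng.uniform(1, 10, 40))
y = 2 * np.log(x) + rng.normal(0, 0.2, 40)

# INLS: monotone (isotonic) step function fitted by least squares;
# it runs through the middle of the data.
inls = IsotonicRegression(increasing=True).fit(x, y)
fit = inls.predict(x)

# FDH frontier at each x: the maximum output among units using no
# more input, i.e. a running maximum -- a sign-constrained step fit.
fdh = np.maximum.accumulate(y)

# FDH envelops the data from above; INLS does not.
print(round(float(np.mean(fdh - fit)), 3))
```

The gap `fdh - fit` is the frontier shift that corrected-OLS-style (COLS-like) adjustments of INLS aim to capture; with noisy data that gap mixes inefficiency and noise, which is what the StoNED-style skewness decomposition addresses.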

20.
The weighted linear support vector classification machine is a new data mining method that corresponds to an optimization problem. This paper establishes a data perturbation analysis for that optimization problem. Specifically, a fundamental theorem of data perturbation analysis is established for the primal problem of the weighted linear support vector classifier. From this theorem one can obtain the partial derivatives of the problem's solution and of the decision function with respect to the data parameters, and quantitatively analyze how errors in the input data, and other changes to the data, affect the solution and the decision-function values. This answers the stability and sensitivity-analysis questions for the weighted linear support vector classification problem.

