Similar Documents
 20 similar documents found (search time: 31 ms)
1.
In this paper we propose a new nonparametric regression method called composite support vector quantile regression (CSVQR) that combines the formulations of support vector regression and composite quantile regression. First the CSVQR using quadratic programming (QP) is proposed, and then the CSVQR utilizing the iteratively reweighted least squares (IRWLS) procedure is proposed to overcome the weakness of the QP-based method in terms of computation time. The IRWLS-based method enables us to derive a generalized cross validation (GCV) function that is easier and faster to evaluate than the conventional cross validation function. The GCV function facilitates choosing the hyperparameters that affect the performance of the CSVQR while saving computation time. Numerical experiment results are presented to illustrate the performance of the proposed method.
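The composite (multi-quantile) pinball loss that CSVQR builds on can be illustrated with a minimal sketch: a linear model with one common slope vector and one intercept per quantile level, fitted by minimizing the summed check losses. This is not the authors' QP or IRWLS algorithm, and all names (`pinball`, `composite_quantile_fit`) are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def pinball(r, tau):
    """Check (pinball) loss rho_tau(r) = r * (tau - 1{r < 0})."""
    return r * (tau - (r < 0))

def composite_quantile_fit(X, y, taus):
    """Minimize the composite quantile loss: summed pinball losses over all
    quantile levels, sharing one slope vector with one intercept per level."""
    n, p = X.shape
    K = len(taus)

    def loss(theta):
        b, w = theta[:K], theta[K:]
        return sum(pinball(y - X @ w - b[k], tau).mean()
                   for k, tau in enumerate(taus))

    res = minimize(loss, np.zeros(K + p), method="Nelder-Mead",
                   options={"maxiter": 20000, "xatol": 1e-8, "fatol": 1e-10})
    return res.x[:K], res.x[K:]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 2.0 * X[:, 0] + rng.normal(size=200)
b, w = composite_quantile_fit(X, y, taus=[0.25, 0.5, 0.75])
print(w, b)   # common slope near 2; intercepts increasing in tau
```

The fitted intercepts are ordered in tau while the slope is shared across levels, which is the structural feature that makes the composite loss more efficient than fitting each quantile separately.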

2.
Quantile regression provides a more complete statistical analysis of the stochastic relationships among random variables. Sometimes quantile regression functions estimated at different orders can cross each other. We propose a new non-crossing quantile regression method using a doubly penalized kernel machine (DPKM), which uses a heteroscedastic location-scale model as the basic model and estimates both location and scale functions simultaneously by kernel machines. The DPKM provides a satisfactory solution for estimating non-crossing quantile regression functions when multiple quantiles are needed for high-dimensional data. We also present a model selection method that employs cross validation techniques for choosing the parameters that affect the performance of the DPKM. One real example and two synthetic examples are provided to show the usefulness of the DPKM.

3.
We introduce a binary regression accounting-based model for bankruptcy prediction of small and medium enterprises (SMEs). The main advantage of the model lies in its predictive performance in identifying defaulted SMEs. Another advantage, which is especially relevant for banks, is that the relationship between the accounting characteristics of SMEs and the response is not assumed a priori (e.g., linear, quadratic or cubic) and can be determined from the data. The proposed approach uses the quantile function of the generalized extreme value distribution as the link function, as well as smooth functions of accounting characteristics to flexibly model covariate effects. Therefore, the usual assumptions in scoring models of a symmetric link function and linear or pre-specified covariate-response relationships are relaxed. Out-of-sample and out-of-time validation on Italian data shows that our proposal outperforms the commonly used (logistic) scoring model for different default horizons.
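A minimal sketch of the core idea, a GEV link for binary regression, can be written with `scipy.stats.genextreme`, whose shape argument `c` equals -xi in the standard GEV parameterization. This omits the smooth covariate functions of the full model; the fixed shape xi = 0.3 and all names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genextreme

def gev_inv_link(eta, xi):
    """Inverse link p = F_GEV(eta); SciPy's shape parameter c equals -xi."""
    return genextreme.cdf(eta, c=-xi)

def fit_gev_binary(X, y, xi):
    """Maximize the Bernoulli log-likelihood under a GEV link, shape xi fixed."""
    def nll(beta):
        p = np.clip(gev_inv_link(X @ beta, xi), 1e-10, 1 - 1e-10)
        return -(y * np.log(p) + (1 - y) * np.log(1 - p)).sum()
    return minimize(nll, np.zeros(X.shape[1]), method="Nelder-Mead").x

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.uniform(size=n) < gev_inv_link(X @ np.array([-0.5, 1.0]), xi=0.3)).astype(float)
beta_hat = fit_gev_binary(X, y, xi=0.3)
print(beta_hat)  # roughly [-0.5, 1.0]
```

Unlike the symmetric logit link, the GEV cdf is asymmetric, which is the property the abstract exploits for modeling rare defaults.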

4.
The cluster-weighted model (CWM) is a mixture model with random covariates that allows for flexible clustering/classification and distribution estimation of a random vector composed of a response variable and a set of covariates. Within this class of models, the generalized linear exponential CWM is introduced here especially for modeling bivariate data of mixed type. Its natural counterpart in the family of latent class models is also defined. Maximum likelihood parameter estimates are derived using the expectation-maximization algorithm, and some computational issues are detailed. Through Monte Carlo experiments, the classification performance of the proposed model is compared with other mixture-based approaches, consistency of the estimators of the regression coefficients is evaluated, and several likelihood-based information criteria are compared for selecting the number of mixture components. Finally, an application to real data is considered.

5.
In previous studies, a wrapper feature selection method was proposed for decision support in the steel sheet incremental cold shaping process (SSICS). The problem included both regression and classification, with neural networks and support vector machines as the learned models, respectively. SSICS is the type of problem in which the number of features is similar to the number of instances in the data set; this characterizes many real-world decision support problems found in industry. This study focuses on several questions and improvements that were left open, suggesting proposals for each of them. More specifically, it evaluates the relevance of the different cross validation methods for the learned models, and also proposes several improvements, such as allowing the number of chosen features as well as some of the parameters of the neural networks to evolve accordingly. Well-known data sets have been used in the experimentation, and an in-depth analysis of the experimental results is included. 5 $\times $ 2 CV has been found to be the most suitable cross validation method for this kind of problem. In addition, adapting the number of features and, consequently, the model parameters clearly improves the performance of the approach. The different enhancements have been applied to the real-world problem, and several conclusions have been drawn from the results obtained.

6.
Mixture cure models were originally proposed in medical statistics to model long-term survival of cancer patients in terms of two distinct subpopulations - those that are cured of the event of interest and will never relapse, along with those that are uncured and are susceptible to the event. In the present paper, we introduce mixture cure models to the area of credit scoring, where, similarly to the medical setting, a large proportion of the dataset may not experience the event of interest during the loan term, i.e. default. We estimate a mixture cure model predicting (time to) default on a UK personal loan portfolio, and compare its performance to the Cox proportional hazards method and standard logistic regression. Results for credit scoring at an account level and prediction of the number of defaults at a portfolio level are presented; model performance is evaluated through cross validation on discrimination and calibration measures. Discrimination performance for all three approaches was found to be high and competitive. Calibration performance for the survival approaches was found to be superior to logistic regression for intermediate time intervals and useful for fixed 12 month time horizon estimates, reinforcing the flexibility of survival analysis as both a risk ranking tool and for providing robust estimates of probability of default over time. Furthermore, the mixture cure model’s ability to distinguish between two subpopulations can offer additional insights by estimating the parameters that determine susceptibility to default in addition to parameters that influence time to default of a borrower.

7.
In this article, we consider the estimation problem of a tree model for multiple conditional quantile functions of the response. Using the generalized, unbiased interaction detection and estimation algorithm, the quantile regression tree (QRT) method has been developed to construct a tree model for an individual quantile function. However, QRT produces different tree models across quantile levels because it estimates several QRT models separately. Furthermore, the estimated quantile functions from QRT often cross each other and consequently violate the basic properties of quantiles. This undesirable phenomenon reduces prediction accuracy and makes it difficult to interpret the resulting tree models. To overcome such limitations, we propose the unified noncrossing multiple quantile regressions tree (UNQRT) method, which constructs a common tree structure across all interesting quantile levels for better data visualization and model interpretation. Furthermore, the UNQRT estimates noncrossing multiple quantile functions simultaneously by enforcing noncrossing constraints, resulting in the improvement of prediction accuracy. The numerical results are presented to demonstrate the competitive performance of the proposed UNQRT over QRT. Supplementary materials for this article are available online.

8.
Simulation experiments are often analyzed through a linear regression model of their input/output data. Such an analysis yields a metamodel or response surface for the underlying simulation model. This metamodel can be validated through various statistics; this article studies (1) the coefficient of determination (R-square) for generalized least squares, and (2) a lack-of-fit F-statistic originally formulated by Rao [Biometrika 46 (1959) 49], who assumed multivariate normality. To derive the distributions of these two validation statistics, this paper shows how to apply bootstrapping—without assuming normality. To illustrate the performance of these bootstrapped validation statistics, the paper uses Monte Carlo experiments with simple models. For these models (i) R-square is a conservative statistic (rejecting a valid metamodel relatively rarely), so its power is low; (ii) Rao’s original statistic may reject a valid metamodel too often; (iii) bootstrapping Rao’s statistic gives only slightly conservative results, so its power is relatively high.
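The bootstrapped R-square idea can be sketched in a simplified ordinary least squares setting (the article treats generalized least squares; all names are illustrative): resample (x_i, y_i) pairs with replacement, recompute R-square on each resample, and read off an empirical distribution for the statistic without any normality assumption.

```python
import numpy as np

def r_square(X, y):
    """OLS coefficient of determination."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def bootstrap_r2(X, y, B=1000, seed=0):
    """Pairs bootstrap: resample (x_i, y_i) rows with replacement and
    recompute R^2 on each resample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    return np.array([r_square(X[idx], y[idx])
                     for idx in (rng.integers(0, n, n) for _ in range(B))])

rng = np.random.default_rng(42)
n = 100
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)
r2 = r_square(X, y)
lo, hi = np.percentile(bootstrap_r2(X, y), [2.5, 97.5])
print(round(r2, 2), round(lo, 2), round(hi, 2))
```

The percentile interval (lo, hi) gives a distribution-free picture of how variable the validation statistic is on data like the observed sample.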

9.
An alternative to the accelerated failure time model is to regress the median of the failure time on the covariates. In recent years, censored median regression models have been shown to be useful for analyzing a variety of censored survival data, with the benefit of robustness. Based on the missing information principle, a semiparametric inference procedure for the regression parameter has been developed for the case where the censoring variable depends on a continuous covariate. In order to improve the low coverage accuracy of this procedure, we apply an empirical likelihood (EL) ratio method to the model and derive the limiting distributions of the estimated and adjusted empirical likelihood ratios for the vector of regression parameters. Two kinds of EL confidence regions for the unknown vector of regression parameters are obtained accordingly. We conduct an extensive simulation study to compare the performance of the proposed methods with that of the normal approximation based method. The simulation results suggest that the EL methods outperform the normal approximation based method in terms of coverage probability. Finally, we discuss our methods.

10.
Bankruptcy prediction by generalized additive models
We compare several accounting-based models for bankruptcy prediction. The models are developed and tested on large data sets containing annual financial statements for Norwegian limited liability firms. Out-of-sample and out-of-time validation shows that generalized additive models significantly outperform popular models like linear discriminant analysis, generalized linear models and neural networks at all levels of risk. Further, important issues like default horizon and performance depreciation are examined. We clearly see a performance depreciation as the default horizon is increased and as time goes by. Finally, a multi-year model, developed on all available data from three consecutive years, is compared with a one-year model, developed on data from the most recent year only. The multi-year model exhibits a desirable robustness to yearly fluctuations that is not present in the one-year model. Copyright © 2006 John Wiley & Sons, Ltd.

11.
We consider the use of B-spline nonparametric regression models estimated by the maximum penalized likelihood method for extracting information from data with complex nonlinear structure. Crucial points in B-spline smoothing are the choices of the smoothing parameter and the number of basis functions, for which several selectors have been proposed based on cross-validation and the Akaike information criterion, known as AIC. It should be noticed, however, that AIC is a criterion for evaluating models estimated by the maximum likelihood method, and it was derived under the assumption that the true distribution belongs to the specified parametric model. In this paper we derive information criteria for evaluating B-spline nonparametric regression models estimated by the maximum penalized likelihood method in the context of generalized linear models under model misspecification. We use Monte Carlo experiments and real data examples to examine the properties of our criteria, including various selectors proposed previously.

12.
Overdispersion in time series of counts is very common and has been well studied by many authors, but the opposite phenomenon of underdispersion may also be encountered in real applications and has received little attention. Based on the popularity of the generalized Poisson distribution in count regression models and of Poisson INGARCH models in time series analysis, we introduce a generalized Poisson INGARCH model, which can account for both overdispersion and underdispersion. Compared with the double Poisson INGARCH model, conditions for the existence and ergodicity of such a process are easily given. We analyze the autocorrelation structure and also derive expressions for moments of order 1 and 2. We consider the maximum likelihood estimators for the parameters and establish their consistency and asymptotic normality. We apply the proposed model to one overdispersed and one underdispersed real example, which indicates that the proposed methodology performs better than other conventional model-based methods in the literature.
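The dispersion behaviour of the generalized Poisson distribution, variance/mean = 1/(1-lam)^2, can be checked numerically from its pmf theta*(theta+lam*k)^(k-1)*exp(-theta-lam*k)/k!. This sketch covers only the marginal distribution, not the INGARCH dynamics of the abstract; all names are illustrative.

```python
import numpy as np
from math import lgamma, log, exp

def gp_pmf(k, theta, lam):
    """Generalized Poisson pmf, computed in log space; set to zero beyond the
    truncation point where theta + lam*k <= 0, which arises when lam < 0."""
    mu = theta + lam * k
    if mu <= 0:
        return 0.0
    return exp(log(theta) + (k - 1) * log(mu) - theta - lam * k - lgamma(k + 1))

def dispersion_index(theta, lam, kmax=400):
    """Variance-to-mean ratio computed from the (renormalized) pmf."""
    ks = np.arange(kmax)
    p = np.array([gp_pmf(k, theta, lam) for k in ks])
    p /= p.sum()
    m = (ks * p).sum()
    return ((ks - m) ** 2 * p).sum() / m

# lam < 0: underdispersion, lam = 0: Poisson, lam > 0: overdispersion
for lam in (-0.2, 0.0, 0.2):
    print(round(dispersion_index(theta=3.0, lam=lam), 3))
```

The single extra parameter lam thus moves the variance-to-mean ratio continuously through 1, which is what lets one model cover both dispersion regimes.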

13.
We propose a hybrid deep learning model that merges Variational Autoencoders and Convolutional LSTM Networks (VAE-ConvLSTM) to forecast inflation. Using a public macroeconomic database that comprises 134 monthly US time series from January 1978 to December 2019, the proposed model is compared against several popular econometric and machine learning benchmarks, including Ridge regression, LASSO regression, Random Forests, Bayesian methods, VECM, and the multilayer perceptron. We find that VAE-ConvLSTM outperforms the competing models in terms of consistency and out-of-sample performance. The robustness of this conclusion is ensured via cross-validation and Monte Carlo simulations using different training, validation, and test samples. Our results suggest that macroeconomic forecasting can take advantage of deep learning models when tackling nonlinearities and nonstationarity, potentially delivering superior performance compared to traditional econometric approaches based on linear, stationary models.

14.
On choosing “optimal” shape parameters for RBF approximation
Many radial basis function (RBF) methods contain a free shape parameter that plays an important role for the accuracy of the method. In most papers the authors end up choosing this shape parameter by trial and error or some other ad hoc means. The method of cross validation has long been used in the statistics literature, and the special case of leave-one-out cross validation forms the basis of the algorithm for choosing an optimal value of the shape parameter proposed by Rippa in the setting of scattered data interpolation with RBFs. We discuss extensions of this approach that can be applied in the setting of iterated approximate moving least squares approximation of function value data and for RBF pseudo-spectral methods for the solution of partial differential equations. The former method can be viewed as an efficient alternative to ridge regression or smoothing spline approximation, while the latter forms an extension of the classical polynomial pseudo-spectral approach. Numerical experiments illustrating the use of our algorithms are included.
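Rippa's leave-one-out shortcut avoids solving n separate interpolation problems: with interpolation matrix A and coefficients c = A^{-1}f, the leave-one-out errors are e_i = c_i / (A^{-1})_{ii}. A one-dimensional Gaussian RBF sketch follows; the node count, test function and candidate grid are illustrative assumptions.

```python
import numpy as np

def loocv_cost(eps, x, f):
    """Norm of Rippa's leave-one-out error vector e_i = c_i / (A^{-1})_{ii}
    for the Gaussian RBF interpolation matrix A_ij = exp(-(eps*|x_i - x_j|)^2)."""
    A = np.exp(-(eps * np.abs(x[:, None] - x[None, :])) ** 2)
    Ainv = np.linalg.inv(A)
    return np.linalg.norm((Ainv @ f) / np.diag(Ainv))

# choose the shape parameter by scanning a grid of candidate values
x = np.linspace(0.0, 1.0, 21)
f = np.sin(2 * np.pi * x)
grid = np.linspace(1.0, 20.0, 96)
costs = np.array([loocv_cost(e, x, f) for e in grid])
eps_opt = grid[int(np.argmin(costs))]
print(eps_opt)
```

For very small eps the matrix A becomes severely ill-conditioned, which is why the scan starts at a moderate value; in practice the whole cost curve is inspected rather than trusting the single minimizer blindly.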

15.
The semiparametric reproductive dispersion model generalizes both the reproductive dispersion model and the semiparametric regression model, and includes semiparametric generalized linear models and generalized partially linear models as special cases. We discuss Bayesian estimation of the parameters of this model, and model selection based on Bayes factors, when non-random missing data occur in both the response and the covariates. In the analysis, penalized splines are adopted to estimate the nonparametric component, and a hierarchical Bayesian model is established. To address the poor mixing caused by highly correlated parameters in the Gibbs sampling process, as well as the instability that arises as the dimension increases, latent variables are introduced as augmented data and a collapsed Gibbs sampling method is applied, which improves convergence. At the same time, to avoid computing multiple integrals, the M-H algorithm is used to estimate the marginal density functions before computing Bayes factors, providing a criterion for model selection and comparison. Finally, simulations and a real example verify the effectiveness of the proposed method.

16.
The support vector machine (SVM) is known for its good performance in two-class classification, but its extension to multiclass classification is still an ongoing research issue. In this article, we propose a new approach for classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). We show that the IVM not only performs as well as the SVM in two-class classification, but also can naturally be generalized to the multiclass case. Furthermore, the IVM provides an estimate of the underlying probability. Similar to the support points of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM. This gives the IVM a potential computational advantage over the SVM.
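Kernel logistic regression, the building block of the IVM, can be sketched with a plain gradient descent fit that uses every training point as a kernel basis function; the IVM's greedy selection of a small set of "import" points is omitted here. All names, the toy decision boundary, and the hyperparameters are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix K_ij = exp(-gamma * ||a_i - b_j||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_klr(X, y, lam=1e-2, gamma=1.0, iters=1000, lr=0.1):
    """Regularized kernel logistic regression via gradient descent on the
    mean log-loss of sigmoid(K @ alpha) plus (lam/2) * alpha' K alpha."""
    K = rbf_kernel(X, X, gamma)
    alpha = np.zeros(len(y))
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(K @ alpha)))
        alpha -= lr * (K @ (p - y) / len(y) + lam * (K @ alpha))
    return alpha, K

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0).astype(float)  # nonlinear boundary
alpha, K = fit_klr(X, y)
p = 1.0 / (1.0 + np.exp(-(K @ alpha)))
acc = ((p > 0.5) == (y > 0.5)).mean()
print(acc)
```

Because KLR yields class probabilities `p`, thresholding at 0.5 gives labels while the probability estimates themselves remain available, which is the advantage over the SVM noted in the abstract.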

17.
In this paper we combine the ideas of the ‘power steady model’, ‘discount factor’ and ‘power prior’ for a general class of filter models, more specifically within the class of dynamic generalized linear models (DGLM). We show an optimality property of our proposed method and present a particle filter algorithm for DGLMs as an alternative to Markov chain Monte Carlo methods. We also present two applications: one on dynamic Poisson models for hurricane count data in the Atlantic Ocean, and the other on a dynamic Poisson regression model for longitudinal count data.

18.
A numerical-analytical iterative method is proposed for solving generalized self-adjoint regular vector Sturm–Liouville problems with Dirichlet boundary conditions. The method is based on eigenvalue (spectral) correction. The matrix coefficients of the equations are assumed to be nonlinear functions of the spectral parameter. For a relatively close initial approximation, the method is shown to have second-order convergence with respect to a small parameter. Test examples are considered, and the model problem of transverse vibrations of a hinged rod with a variable cross section is solved taking into account its rotational inertia.

19.
Penalized estimation has become an established tool for regularization and model selection in regression models. A variety of penalties with specific features are available and effective algorithms for specific penalties have been proposed. But not much is available to fit models with a combination of different penalties. When modeling the rent data of Munich as in our application, various types of predictors call for a combination of a Ridge, a group Lasso and a Lasso-type penalty within one model. We propose to approximate penalties that are (semi-)norms of scalar linear transformations of the coefficient vector in generalized structured models—such that penalties of various kinds can be combined in one model. The approach is very general such that the Lasso, the fused Lasso, the Ridge, the smoothly clipped absolute deviation penalty, the elastic net and many more penalties are embedded. The computation is based on conventional penalized iteratively re-weighted least squares algorithms and hence, easy to implement. New penalties can be incorporated quickly. The approach is extended to penalties with vector based arguments. There are several possibilities to choose the penalty parameter(s). A software implementation is available. Some illustrative examples show promising results.
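For one concrete combination of penalties, a Ridge plus a Lasso term (the elastic net), the fit can be sketched with proximal gradient descent rather than the article's penalized IRLS; all names and penalty weights are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def elastic_net(X, y, lam1=0.05, lam2=0.01, iters=2000):
    """Proximal gradient (ISTA) for
    0.5/n * ||y - Xb||^2 + lam2/2 * ||b||^2 + lam1 * ||b||_1."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n + lam2   # Lipschitz constant of smooth part
    b = np.zeros(p)
    for _ in range(iters):
        grad = X.T @ (X @ b - y) / n + lam2 * b
        b = soft_threshold(b - grad / L, lam1 / L)
    return b

rng = np.random.default_rng(7)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:3] = [3.0, -2.0, 1.5]   # sparse truth
y = X @ beta + 0.1 * rng.normal(size=n)
b = elastic_net(X, y)
print(np.round(b, 2))
```

The soft-thresholding proximal step handles the nondifferentiable L1 part, while the Ridge term simply augments the smooth gradient; coefficients of irrelevant predictors are driven exactly to zero.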

20.
In this article, we propose and study a new class of semiparametric mixture of regression models, where the mixing proportions and variances are constants, but the component regression functions are smooth functions of a covariate. A one-step backfitting estimate and two EM-type algorithms have been proposed to achieve the optimal convergence rate for both the global parameters and the nonparametric regression functions. We derive the asymptotic property of the proposed estimates and show that both the proposed EM-type algorithms preserve the asymptotic ascent property. A generalized likelihood ratio test is proposed for semiparametric inferences. We prove that the test follows an asymptotic \(\chi ^2\)-distribution under the null hypothesis, which is independent of the nuisance parameters. A simulation study and two real data examples have been conducted to demonstrate the finite sample performance of the proposed model.
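The fully parametric special case, a two-component mixture of linear regressions with constant mixing proportions and variances, can be fitted with a plain EM sketch; the article's one-step backfitting for smooth component functions is omitted, and all names and the initialization are illustrative assumptions.

```python
import numpy as np

def em_two_regressions(x, y, iters=50):
    """EM for a two-component mixture of linear regressions with constant
    mixing proportions and variances (the parametric special case)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), x])
    # crude initialization: split by the sign of the overall OLS residual
    b_ols, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    r = y - Xd @ b_ols
    w = np.column_stack([(r > 0) * 1.0, (r <= 0) * 1.0])
    pi, b, s2 = np.array([0.5, 0.5]), np.zeros((2, 2)), np.ones(2)
    for _ in range(iters):
        for k in range(2):          # M-step: weighted least squares
            Wk = w[:, k]
            b[k] = np.linalg.solve((Xd.T * Wk) @ Xd, Xd.T @ (Wk * y))
            s2[k] = (Wk * (y - Xd @ b[k]) ** 2).sum() / Wk.sum()
        pi = w.mean(axis=0)
        for k in range(2):          # E-step: Gaussian responsibilities
            res = y - Xd @ b[k]
            w[:, k] = pi[k] * np.exp(-0.5 * res ** 2 / s2[k]) / np.sqrt(s2[k])
        w /= w.sum(axis=1, keepdims=True)
    return pi, b, s2

rng = np.random.default_rng(5)
n = 300
x = rng.uniform(-2, 2, n)
z = rng.integers(0, 2, n)                           # latent component label
y = 2.0 * x + np.where(z == 0, 2.0, -2.0) + 0.5 * rng.normal(size=n)
pi, b, s2 = em_two_regressions(x, y)
print(np.round(b[np.argsort(b[:, 0])], 1))  # intercepts near -2 and 2, slopes near 2
```

Replacing the per-component weighted least squares step with a kernel-weighted smoother is, in spirit, how the semiparametric version lets the component regression functions vary smoothly with the covariate.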
