Found 20 similar documents (search time: 9 ms)
1.
In this paper, we propose a new criterion, named PICa, to simultaneously select explanatory variables in the mean model and variance model in heteroscedastic linear models based on the model structure. We show that the new criterion can select the true mean model and a correct variance model with probability tending to 1 under mild conditions. Simulation studies and a real example are presented to evaluate the new criterion, and it turns out that the proposed approach performs well.
3.
We focus on the problem of simultaneous variable selection and estimation for nonlinear models based on modal regression (MR), when the number of coefficients diverges with sample size. With appropriate selection of the tuning parameters, the resulting estimator is shown to be consistent and to enjoy the oracle properties.
4.
A Tabu search method is proposed and analysed for selecting variables that are subsequently used in Logistic Regression Models. The aim is to find, from among a set of m variables, a smaller subset that enables the efficient classification of cases. Reducing dimensionality has some very well-known advantages that are summarized in the literature. The specific problem consists in finding, for a small integer value of p, a subset of size p of the original set of variables that yields the greatest percentage of hits in Logistic Regression. The proposed Tabu search method performs a deep search in the solution space that alternates between a basic phase (that uses simple moves) and a diversification phase (to explore regions not previously visited). Testing shows that it obtains significantly better results than the Stepwise, Backward or Forward methods used by classic statistical packages. Some results of applying these methods are presented.
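The subset search described in this abstract can be illustrated with a minimal, generic tabu search. This is only a sketch under stated assumptions, not the authors' algorithm: it uses swap moves only, omits the diversification phase, and the names `tabu_subset_search`, `score`, `n_iter` and `tenure` are hypothetical. In the paper's setting, `score` would presumably be the percentage of hits of a logistic regression fitted on the chosen variables; here it is any user-supplied function.

```python
from itertools import product


def tabu_subset_search(m, p, score, n_iter=50, tenure=3):
    """Look for a size-p subset of range(m) that maximizes score(subset).

    Generic swap-move tabu search (illustrative sketch only).
    """
    current = frozenset(range(p))      # arbitrary starting subset
    best, best_score = current, score(current)
    tabu = {}                          # variable -> last iteration at which it is tabu

    for it in range(n_iter):
        outside = [v for v in range(m) if v not in current]
        candidates = []
        # Neighborhood: swap one selected variable for one unselected variable.
        for out_v, in_v in product(sorted(current), outside):
            neighbor = (current - {out_v}) | {in_v}
            s = score(neighbor)
            is_tabu = tabu.get(in_v, -1) >= it
            # Aspiration rule: a tabu move is allowed if it beats the best so far.
            if not is_tabu or s > best_score:
                candidates.append((s, out_v, in_v, neighbor))
        if not candidates:
            break
        # Take the best admissible move, even if it worsens the current score.
        s, out_v, in_v, neighbor = max(candidates, key=lambda c: c[0])
        current = neighbor
        tabu[out_v] = it + tenure      # forbid re-adding the removed variable for a while
        if s > best_score:
            best, best_score = neighbor, s

    return sorted(best), best_score


# Toy usage: the score is a sum of made-up per-variable weights.
weights = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7]
subset, value = tabu_subset_search(6, 3, lambda s: sum(weights[v] for v in s))
```

Accepting the best admissible move even when it lowers the score is what lets a tabu search leave local optima; the tabu list keeps it from immediately undoing that move.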
5.
This paper proposes a robust procedure for solving multiphase regression problems that is efficient enough to deal with data contaminated by atypical observations due to measurement errors or those drawn from heavy-tailed distributions. Incorporating the expectation and maximization algorithm with the M-estimation technique, we simultaneously derive robust estimates of the change-points and regression parameters. Because the proposed method is still not resistant to high leverage outliers, we further suggest a modified version that first moderately trims those outliers and then applies the new procedure to the trimmed data. This study sets up two robust algorithms that use the Huber loss function and Tukey's biweight function, respectively, to replace the least squares criterion in the normality-based expectation and maximization algorithm, and illustrates the effectiveness and superiority of the proposed algorithms through extensive simulations and sensitivity analyses. Experimental results show the ability of the proposed method to withstand outliers and heavy-tailed distributions. Moreover, as resistance to high leverage outliers is particularly important due to their devastating effect on fitting a regression model to data, various real-world applications show the practicability of this approach.
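For readers unfamiliar with the two loss functions named in this abstract, the following sketch gives their standard textbook definitions. It does not reproduce the paper's EM-based procedure. The default tuning constants (1.345 and 4.685) are conventional choices for residuals on a unit scale and are an assumption here, not values taken from the abstract.

```python
def huber_loss(r, delta=1.345):
    """Huber loss: quadratic for small residuals, linear beyond delta."""
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)


def tukey_biweight_loss(r, c=4.685):
    """Tukey's biweight loss: smooth near zero, constant beyond c."""
    if abs(r) <= c:
        u = 1.0 - (r / c) ** 2
        return (c * c / 6.0) * (1.0 - u ** 3)
    return c * c / 6.0


# A gross outlier contributes far less than it would under squared error.
squared = 0.5 * 10.0 ** 2
huber = huber_loss(10.0)
tukey = tukey_biweight_loss(10.0)
```

The Huber loss still grows linearly with the residual, while Tukey's biweight is bounded, so very large residuals stop influencing the fit altogether.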
6.
A robust version of Akaike's model selection procedure for regression models is introduced and its relationship with robust testing procedures is discussed.
7.
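This paper proposes composite minimizing average check loss estimation (CMACLE) to implement composite quantile regression (CQR) for partial linear single-index models (PLSIM). First, a consistent estimator of the parametric part in the CQR sense is constructed using a high-dimensional kernel function. Building on this consistent estimator, an index kernel function is then used to obtain estimators of the parameters and the nonparametric function that attain the optimal convergence rate. The asymptotic normality of these estimators is established, and the asymptotic relative efficiency of the CQR estimator for PLSIM is compared with that of the minimum average variance estimator (MAVE). Furthermore, a variable selection method for PLSIM under the CQR framework is proposed and shown to have the oracle property. Simulations and a real data analysis examine the finite-sample performance of the proposed method and confirm its merits.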
8.
Semiparametric partially linear varying coefficient models (SPLVCM) are frequently used in statistical modeling. With high-dimensional covariates in both the parametric and nonparametric parts of SPLVCM, sparse modeling is often considered in practice. In this paper, we propose a new estimation and variable selection procedure based on modal regression, where the nonparametric functions are approximated by a $B$-spline basis. The outstanding merit of the proposed variable selection procedure is that it can achieve both robustness and efficiency by introducing an additional tuning parameter (i.e., bandwidth $h$). Its oracle property is also established for both the parametric and nonparametric parts. Moreover, we give a data-driven bandwidth selection method and propose an EM-type algorithm for the proposed method. A Monte Carlo simulation study and a real data example are conducted to examine the finite sample performance of the proposed method. Both the simulation results and the real data analysis confirm that the newly proposed method works very well.
11.
In this paper, we consider improved estimation strategies for the parameter vector in multiple regression models with first-order random coefficient autoregressive errors (RCAR(1)). We propose a shrinkage estimation strategy and implement variable selection methods such as lasso and adaptive lasso strategies. The simulation results reveal that the shrinkage estimators perform better than both lasso and adaptive lasso when and only when there are many nuisance variables in the model.
12.
Statistical Inference for Stochastic Processes - We consider the nonparametric robust estimation problem for regression models in continuous time with semi-Markov noises. An adaptive model...
13.
We study a flexible class of nonproportional hazard function regression models in which the influence of the covariates splits into the sum of a parametric part and a time-dependent nonparametric part. We develop a method of covariate selection for the parametric part by adjusting for the implicit fitting of the nonparametric part. Asymptotic consistency of the proposed covariate selection method is established, leading to asymptotically normal estimators of both parametric and nonparametric parts of the model in the presence of covariate selection. The approach is applied to a real data set and a simulation study is presented.
14.
With high-dimensional data, the number of covariates is considerably larger than the sample size. We propose a sound method for analyzing these data. It simultaneously performs clustering and variable selection. The method is inspired by the plaid model. It may be seen as a multiplicative mixture model that allows for overlapping clustering. Unlike conventional clustering, within this model an observation may be explained by several clusters. This characteristic makes it especially suitable for gene expression data. Parameter estimation is performed with the Monte Carlo expectation maximization algorithm and importance sampling. Using extensive simulations and comparisons with competing methods, we show the advantages of our methodology, in terms of both variable selection and clustering. An application of our approach to the gene expression data of kidney renal cell carcinoma taken from The Cancer Genome Atlas validates some previously identified cancer biomarkers.
15.
We propose a novel extension of nonparametric multivariate finite mixture models by dropping the standard conditional independence assumption and incorporating the independent component analysis (ICA) structure instead. This innovation extends nonparametric mixture model estimation methods to situations in which conditional independence, a necessary assumption for the unique identifiability of the parameters in such models, is clearly violated. We formulate an objective function in terms of penalized smoothed Kullback–Leibler distance and introduce the nonlinear smoothed majorization-minimization independent component analysis algorithm for optimizing this function and estimating the model parameters. Our algorithm does not require any labeled observations a priori; it may be used for fully unsupervised clustering problems in a multivariate setting. We have implemented a practical version of this algorithm, which utilizes the FastICA algorithm, in the R package icamix. We illustrate this new methodology using several applications in unsupervised learning and image processing.
16.
Advances in Data Analysis and Classification - A model is proposed to analyze longitudinal data where two response variables are available, one of which is a binary indicator of selection and the...
17.
We propose a robust estimation procedure based on local Walsh-average regression (LWR) for single-index models. Our novel method provides a root-n consistent estimate of the single-index parameter under some mild regularity conditions; the estimate of the unknown link function converges at the usual rate for the nonparametric estimation of a univariate covariate. We theoretically demonstrate that the new estimators show significant efficiency gain across a wide spectrum of non-normal error distributions and have almost no loss of efficiency for the normal error. Even in the worst case, the asymptotic relative efficiency (ARE) has a lower bound compared with the least squares (LS) estimates; the lower bounds of the AREs are 0.864 and 0.8896 for the single-index parameter and nonparametric function, respectively. Moreover, the ARE of the proposed LWR-based approach versus the ARE of the LS-based method has an expression that is closely related to the ARE of the signed-rank Wilcoxon test as compared with the t-test. In addition, to obtain a sparse estimate of the single-index parameter, we develop a variable selection procedure by combining the estimation method with the smoothly clipped absolute deviation penalty; this procedure is shown to possess the oracle property. We also propose a Bayes information criterion (BIC)-type criterion for selecting the tuning parameter and further prove its ability to consistently identify the true model. We conduct some Monte Carlo simulations and a real data analysis to illustrate the finite sample performance of the proposed methods.
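As background for the term "Walsh average" used in this abstract, the sketch below computes the Walsh averages of a one-dimensional sample and their median (the Hodges-Lehmann location estimate). It shows only the basic, non-local idea and is not the LWR estimator for single-index models; the function names are hypothetical.

```python
from itertools import combinations_with_replacement
from statistics import median


def walsh_averages(xs):
    """All pairwise averages (x_i + x_j) / 2 with i <= j."""
    return [(a + b) / 2.0 for a, b in combinations_with_replacement(xs, 2)]


def hodges_lehmann(xs):
    """Median of the Walsh averages, a robust estimate of location."""
    return median(walsh_averages(xs))


# One gross outlier moves the sample mean a lot but barely moves this estimate.
sample = [1.0, 2.0, 3.0, 100.0]
mean_estimate = sum(sample) / len(sample)
robust_estimate = hodges_lehmann(sample)
```

A sample of size n yields n(n+1)/2 Walsh averages, so this direct computation is quadratic in n.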
19.
The multinomial logit model is the most widely used model for unordered multi-category responses. However, applications are typically restricted to the use of few predictors because in the high-dimensional case maximum likelihood estimates frequently do not exist. In this paper we develop a boosting technique called multinomBoost that performs variable selection and fits the multinomial logit model even when predictors are high-dimensional. Since in multi-category models the effect of one predictor variable is represented by several parameters, one has to distinguish between variable selection and parameter selection. A special feature of the approach is that, in contrast to existing approaches, it selects variables, not parameters. The method can also distinguish between mandatory predictors and optional predictors. Moreover, it adapts to metric, binary, nominal and ordinal predictors. Regularization within the algorithm allows the inclusion of nominal and ordinal variables that have many categories. In the case of ordinal predictors the order information is used. The performance of the boosting technique with respect to mean squared error, prediction error and the identification of relevant variables is investigated in a simulation study. The method is applied to the national Indonesia contraceptive prevalence survey and the identification of glass. Results are also compared with the Lasso approach, which selects parameters.
20.
Mixture of Experts (MoE) regression models are widely studied in statistics and machine learning for modeling heterogeneity in data for regression, clustering and classification. The Laplace distribution is one of the most important statistical tools for analyzing thick-tailed data. Laplace Mixture of Linear Experts (LMoLE) regression models are based on the Laplace distribution, which is more robust. Similar to modelling the variance parameter in a homogeneous population, in this paper we propose and study a novel class of models, heteroscedastic Laplace mixture of experts regression models, to analyze heteroscedastic data coming from a heterogeneous population. The issues of maximum likelihood estimation are addressed. In particular, a Minorization-Maximization (MM) algorithm for estimating the regression parameters is developed. Properties of the estimators of the regression coefficients are evaluated through Monte Carlo simulations. Results from the analysis of two real data sets are presented.