Similar literature
1.
In this paper, the conditional distance correlation (CDC) is used as a measure of correlation to develop a conditional feature screening procedure, given some significant variables, for ultrahigh-dimensional data. The proposed procedure is model-free and is called conditional distance correlation sure independence screening (CDC-SIS for short). That is, we do not specify any model structure between the response and the predictors, which is appealing in some practical problems of ultrahigh-dimensional data analysis. The sure screening property of CDC-SIS is proved, and a simulation study is conducted to evaluate its finite-sample performance. A real data analysis is used to illustrate the proposed method. The results indicate that CDC-SIS performs well.
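The abstract does not give formulas, and the conditional version (which additionally conditions on the given variables) is not reproduced here. As a rough illustration of the underlying building block only, the following sketch computes the ordinary, unconditional sample distance correlation between two univariate samples and ranks features by it. The function names `distance_correlation` and `rank_features` are hypothetical, and the pure-Python loops are chosen for readability rather than speed.

```python
import math


def _centered_distances(v):
    """Double-centered matrix of pairwise absolute distances of a sample."""
    n = len(v)
    d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d]      # equals column means (d is symmetric)
    grand_mean = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample (V-statistic) distance correlation of two equal-length samples."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:                        # a constant sample carries no signal
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)


def rank_features(columns, y):
    """Indices of feature columns sorted by decreasing distance correlation with y."""
    scores = [distance_correlation(col, y) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
```

A screening step would keep the first few indices returned by `rank_features`. Unlike the Pearson correlation, this measure is nonzero for the symmetric quadratic relation `y = x**2`, which is one reason it is attractive for model-free screening.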

2.
The additive model is a flexible nonparametric statistical model that allows a data-analytic transformation of the covariates. When the number of covariates is large and grows exponentially with the sample size, the urgent issue is to reduce the dimensionality from high to a moderate scale. In this paper, we propose and investigate marginal empirical likelihood screening methods in ultra-high dimensional additive models. The proposed nonparametric screening method selects variables by ranking a measure of the marginal empirical likelihood ratio evaluated at zero, which differentiates the contribution of each covariate to the response variable. We show that, under some mild technical conditions, the proposed marginal empirical likelihood screening methods have the sure screening property, and the extent to which the dimensionality can be reduced is explicitly quantified. We also propose a data-driven thresholding method and an iterative marginal empirical likelihood method to enhance the finite-sample performance for fitting sparse additive models. Simulation results and a real data analysis demonstrate that the proposed methods are competitive and outperform competing methods in the heteroscedastic-error case.

3.
In this paper, we consider the ultra-high dimensional partially linear model, where the dimensionality p of the linear component is much larger than the sample size n, and p can be as large as an exponential of the sample size n. First, we transform the ultra-high dimensional partially linear model into an ultra-high dimensional linear model based on the profile technique used in semiparametric regression. Second, to carry out variable screening for the high-dimensional linear component, we propose a variable screening method called profile greedy forward regression (PGFR), which combines the greedy algorithm with the forward regression (FR) method. The proposed PGFR method not only takes the correlation between the covariates into account, but also identifies all relevant predictors consistently and possesses the screening consistency property under some regularity conditions. We further propose a BIC criterion to determine whether the selected model contains the true model with probability tending to one. Finally, some simulation studies and a real application are conducted to examine the finite-sample performance of the proposed PGFR procedure.

4.
Feature screening plays an important role in ultrahigh dimensional data analysis. This paper is concerned with conditional feature screening when one is interested in detecting the association between the response and ultrahigh dimensional predictors (e.g., genetic markers) given a low-dimensional exposure variable (such as clinical variables or environmental variables). To this end, we first propose a new index to measure conditional independence, and further develop a conditional screening procedure based on the newly proposed index. We systematically study the theoretical properties of the proposed procedure and establish the sure screening and ranking consistency properties under some very mild conditions. The newly proposed screening procedure enjoys some appealing properties: (a) it is model-free in that its implementation does not require a specification of the model structure; (b) it is robust to heavy-tailed distributions or outliers in both the response and the predictors; and (c) it can deal with both feature screening and conditional screening in a unified way. We study the finite-sample performance of the proposed procedure by Monte Carlo simulations and further illustrate the proposed method through two real data examples.

5.
The curse of high dimensionality emerges more and more frequently in statistics. Many techniques have been developed to address this challenge for classification problems. We propose a novel feature screening procedure for dichotomous response data. The new method can be implemented as easily as the t-test marginal screening approach; it is free of any subexponential tail probability conditions and moment requirements, and it is not restricted to a specific model structure. We prove that our method possesses the sure screening property, illustrate the effect of screening by Monte Carlo simulation, and apply it to a real data example.
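For orientation, the t-test marginal screening baseline that this abstract compares against can be sketched as follows. This is not the paper's proposed procedure, only a minimal illustration of the baseline, assuming a Welch-type two-sample statistic; the names `welch_t`, `t_screen` and `make_toy_data` are hypothetical, and the toy data are synthetic.

```python
import math
import random


def welch_t(x0, x1):
    """Welch two-sample t-statistic comparing the means of two samples."""
    n0, n1 = len(x0), len(x1)
    m0, m1 = sum(x0) / n0, sum(x1) / n1
    v0 = sum((v - m0) ** 2 for v in x0) / (n0 - 1)
    v1 = sum((v - m1) ** 2 for v in x1) / (n1 - 1)
    return (m1 - m0) / math.sqrt(v0 / n0 + v1 / n1)


def t_screen(X, y, d):
    """Keep the d features with the largest absolute t-statistic.

    X is a list of rows (one row per observation), y a list of 0/1 labels.
    """
    p = len(X[0])
    scores = []
    for j in range(p):
        x0 = [row[j] for row, lab in zip(X, y) if lab == 0]
        x1 = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(welch_t(x0, x1)))
    order = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return order[:d]


def make_toy_data(n=100, p=200, shift=2.0, seed=0):
    """Synthetic data: only features 0 and 1 differ in mean between classes."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(0.0, 1.0) + (shift * lab if j < 2 else 0.0)
          for j in range(p)] for lab in y]
    return X, y
```

Calling `t_screen(*make_toy_data(), 10)` retains ten candidate features from the 200 in the toy data. Because the statistic compares class means, it relies on moment conditions that the abstract says the proposed method avoids.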

6.
An essential feature of longitudinal data is the existence of autocorrelation among the observations from the same unit or subject. Two-stage random-effects linear models are commonly used to analyze longitudinal data. These models are not flexible enough, however, for exploring the underlying data structures and, especially, for describing time trends. Semi-parametric models have been proposed recently to accommodate general time trends. But these semi-parametric models do not provide a convenient way to explore interactions among time and other covariates, although such interactions exist in many applications. Moreover, semi-parametric models require specifying the design matrix of the covariates (time excluded). We propose nonparametric models to resolve these issues. To fit nonparametric models, we use the technique of multivariate adaptive regression splines to estimate the mean curve and then apply an EM-like iterative procedure for covariance estimation. After giving a general algorithm for model building, we show how to design a fast algorithm. We use both simulated and published data to illustrate the use of our proposed method.

7.
Finite mixture regression (FMR) models are frequently used in statistical modeling, often with many covariates of low significance. Variable selection techniques can be employed to identify the covariates with little influence on the response. The problem of variable selection in FMR models is studied here. Penalized likelihood-based approaches are sensitive to data contamination, and their efficiency may be significantly reduced when the model is slightly misspecified. We propose a new robust variable selection procedure for FMR models. The proposed method is based on minimum-distance techniques, which seem to have some automatic robustness to model misspecification. We show that the proposed estimator has the variable selection consistency and oracle properties. The finite-sample breakdown point of the estimator is established to demonstrate its robustness. We examine the small-sample and robustness properties of the estimator using a Monte Carlo study. We also analyze a real data set.

8.
Feature screening plays an important role in dimension reduction for ultrahigh-dimensional data. In this article, we introduce a new feature screening method and establish its sure independence screening property under the ultrahigh-dimensional setting. The proposed method is based on the nonparanormal transformation and the Henze–Zirkler test: it first transforms the response variable and the features to Gaussian random variables using the nonparanormal transformation and then tests the dependence between the response variable and the features using the Henze–Zirkler test. The proposed method enjoys at least two merits. First, it is model-free, which avoids the specification of a particular model structure. Second, it is condition-free, in that it does not require any extra conditions beyond some regularity conditions for high-dimensional feature screening. The numerical results indicate that, compared to existing methods, the proposed method is more robust to data generated from heavy-tailed distributions and/or complex models with interaction variables. The proposed method is applied to the screening of anticancer drug response genes. Supplementary material for this article is available online.

9.
With the rapid growth in the size of scientific data in various disciplines, feature screening plays an important role in reducing high dimensionality to a moderate scale in many scientific fields. In this paper, we introduce a unified and robust model-free feature screening approach for high-dimensional survival data with censoring, which has several advantages. It is a model-free approach under a general model framework, and hence avoids the complication of specifying an actual model form with a huge number of candidate variables. Under mild conditions, without requiring the existence of any moment of the response, it enjoys the ranking consistency and sure screening properties in ultra-high dimension. In particular, we impose a conditional independence assumption on the response and the censoring variable given each covariate, instead of assuming that the censoring variable is independent of the response and the covariates. Moreover, we also propose a more robust variant of the new procedure, which possesses desirable theoretical properties without any finite moment condition on the predictors and the response. The computation of the newly proposed methods does not require any complicated numerical optimization, and it is fast and easy to implement. Extensive numerical studies demonstrate that the proposed methods perform competitively for various configurations. The application is illustrated with an analysis of a genetic data set.

10.
An alternative to the accelerated failure time model is to regress the median of the failure time on the covariates. In recent years, censored median regression models have been shown to be useful for analyzing a variety of censored survival data because of their robustness. Based on the missing information principle, a semiparametric inference procedure for the regression parameter has been developed for the case where the censoring variable depends on a continuous covariate. To improve the low coverage accuracy of this procedure, we apply an empirical likelihood (EL) ratio method to the model and derive the limiting distributions of the estimated and adjusted empirical likelihood ratios for the vector of regression parameters. Two kinds of EL confidence regions for the unknown vector of regression parameters are obtained accordingly. We conduct an extensive simulation study to compare the performance of the proposed methods with that of the normal-approximation-based method. The simulation results suggest that the EL methods outperform the normal-approximation-based method in terms of coverage probability. Finally, we discuss our methods.

11.
This paper focuses on variable selection for semiparametric varying coefficient partially linear models when the covariates in the parametric and nonparametric components are all measured with errors. A bias-corrected variable selection procedure is proposed by combining basis function approximations with shrinkage estimation. With appropriate selection of the tuning parameters, the consistency of the variable selection procedure and the oracle property of the regularized estimators are established. A simulation study and a real data application are undertaken to evaluate the finite-sample performance of the proposed method.

12.
Semiparametric partially linear varying coefficient models (SPLVCM) are frequently used in statistical modeling. With high-dimensional covariates in both the parametric and nonparametric parts of SPLVCM, sparse modeling is often considered in practice. In this paper, we propose a new estimation and variable selection procedure based on modal regression, where the nonparametric functions are approximated by a $B$-spline basis. The outstanding merit of the proposed variable selection procedure is that it can achieve both robustness and efficiency by introducing an additional tuning parameter (i.e., the bandwidth $h$). Its oracle property is also established for both the parametric and nonparametric parts. Moreover, we give a data-driven bandwidth selection method and propose an EM-type algorithm for the proposed method. A Monte Carlo simulation study and a real data example are conducted to examine the finite-sample performance of the proposed method. Both the simulation results and the real data analysis confirm that the newly proposed method works very well.

13.
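This paper studies how to carry out parameter estimation and variable selection simultaneously for varying-coefficient partially linear models when the response is subject to missingness and some of the covariates are measured with error. We handle the missing data by imputation, and combine a corrected profile least-squares estimator with the SCAD penalty to estimate the parameters and select variables. We prove that the resulting estimator is asymptotically normal and has the oracle property. The finite-sample properties of the estimator are further examined through numerical simulation.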

14.
We consider the problem of variable selection for the single-index varying-coefficient model, and present a regularized variable selection procedure that combines basis function approximations with the SCAD penalty. The proposed procedure simultaneously selects significant covariates with functional coefficients and local significant variables with parametric coefficients. With appropriate selection of the tuning parameters, the consistency of the variable selection procedure and the oracle property of the estimators are established. The proposed method can naturally be applied to the pure single-index model and the varying-coefficient model. The finite-sample performance of the proposed method is illustrated by a simulation study and a real data analysis.
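The SCAD penalty named in this abstract is not written out there. As a reminder of its standard piecewise form (linear near zero, quadratic in a transition zone, constant beyond), here is a minimal sketch; the function name `scad_penalty` is hypothetical, and the default `a = 3.7` is the value conventionally used in the literature, not something stated in the abstract.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty value for a single coefficient theta.

    lam > 0 is the tuning parameter and a > 2 controls where the penalty
    levels off; both are assumptions supplied by the caller.
    """
    t = abs(theta)
    if t <= lam:
        # Small coefficients: same as the L1 (lasso) penalty.
        return lam * t
    if t <= a * lam:
        # Transition zone: the penalty grows more and more slowly.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so they are not shrunk further.
    return lam * lam * (a + 1) / 2
```

The flat tail for large coefficients is what distinguishes SCAD from the lasso penalty and is the usual explanation for why SCAD-type estimators can attain the oracle property mentioned above.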

15.
It is rather challenging for current variable selectors to handle situations where the number of covariates under consideration is ultra-high. Consider a motivating clinical trial of the drug bortezomib for the treatment of multiple myeloma, where overall survival and expression levels of 44760 probesets were measured for each of 80 patients, with the goal of identifying genes that predict survival after treatment. This dataset defies analysis even with regularized regression. Some remedies have been proposed for the linear model and for generalized linear models, but there are few solutions in the survival setting and, to our knowledge, no theoretical support. Furthermore, existing strategies often involve tuning parameters that are difficult to interpret. In this paper we propose and theoretically justify a principled method for reducing dimensionality in the analysis of censored data by selecting only the important covariates. Our procedure involves a tuning parameter that has a simple interpretation as the desired false positive rate of this selection. We present simulation results and apply the proposed procedure to analyze the aforementioned myeloma study.

16.
We study partial linear single index models when the response and the covariates in the parametric part are measured with errors and distorted by unknown functions of commonly observable confounding variables, and propose a semiparametric covariate-adjusted estimation procedure. We apply the minimum average variance estimation method to estimate the parameters of interest. This is different from all existing covariate-adjusted methods in the literature. Asymptotic properties of the proposed estimators are established. Moreover, we also study variable selection by adopting the coordinate-independent sparse estimation to select all relevant but distorted covariates in the parametric part. We show that the resulting sparse estimators can exclude all irrelevant covariates with probability approaching one. A simulation study is conducted to evaluate the performance of the proposed methods, and a real data set is analyzed for illustration.

17.
Feature selection with a relatively small sample size and an extremely high-dimensional feature space is common in many areas of contemporary statistics. The high dimensionality of the feature space causes serious difficulties: (i) the sample correlations between features become high even if the features are stochastically independent; (ii) the computation becomes intractable. These difficulties make conventional approaches either inapplicable or inefficient. Reducing the dimensionality of the feature space and then applying low-dimensional approaches appears to be the only feasible way to tackle the problem. Along this line, we develop in this article a tournament screening cum EBIC approach for feature selection with a high-dimensional feature space. The procedure of tournament screening mimics that of a tournament. It is shown theoretically that tournament screening has the sure screening property, a necessary property that should be satisfied by any valid screening procedure. It is demonstrated by numerical studies that the tournament screening cum EBIC approach enjoys desirable properties, such as a higher positive selection rate and a lower false discovery rate than other approaches. Zehua Chen was supported by the Singapore Ministry of Education's ACRF Tier 1 (Grant No. R-155-000-065-112). Jiahua Chen was supported by the Natural Sciences and Engineering Research Council of Canada and MITACS, Canada.
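The abstract names EBIC without stating it. A minimal sketch of the extended BIC in its commonly cited form for a Gaussian linear model follows, assuming the residual sum of squares of a fitted submodel is already available; the function names `log_binom` and `ebic_gaussian` and the default `gamma=1.0` are illustrative assumptions, not taken from the abstract.

```python
import math


def log_binom(p, k):
    """Natural log of the binomial coefficient C(p, k), via log-gamma."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def ebic_gaussian(rss, n, k, p, gamma=1.0):
    """Extended BIC of a Gaussian linear submodel with k of p features.

    rss   : residual sum of squares of the fitted submodel
    n     : sample size
    gamma : weight of the extra model-space penalty; gamma = 0 gives
            ordinary BIC (up to an additive constant)
    """
    fit = n * math.log(rss / n)            # -2 * profile log-likelihood, up to a constant
    size = k * math.log(n)                 # usual BIC complexity term
    space = 2.0 * gamma * log_binom(p, k)  # penalty for the number of size-k models
    return fit + size + space
```

Among candidate submodels, the one with the smallest value is preferred. The extra term grows with the number of possible models of the same size, which is why EBIC selects more conservatively than BIC when p is much larger than n.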

18.
This paper develops a covariate-adjusted precision matrix estimator using a two-stage estimation procedure. First, we identify the relevant covariates that affect the means by a joint $\ell_1$ penalization. Then, the estimated regression coefficients are used to estimate the mean values in a multivariate sub-Gaussian model, in order to estimate the sparse precision matrix through a Lasso-penalized D-trace loss. Under some assumptions, we establish the convergence rate of the precision matrix estimator under different norms and demonstrate the sparse recovery property with probability converging to one. Simulations show that our methods perform competitively in finite samples compared with other methods.

19.
Degradation data have been widely used to estimate product reliability. Because of technological advances, time-varying usage and environmental variables, which are called dynamic covariates, can now be easily recorded in addition to the traditional degradation measurements. The use of dynamic covariates is appealing because they have the potential to explain more variability in degradation paths. We propose a class of general path models to incorporate dynamic covariates for modeling degradation paths. Physically motivated nonlinear functions are used to describe the degradation paths, and random effects are used to describe unit-to-unit variability. The covariate effects are modeled by shape-restricted splines. The estimation of unknown model parameters is challenging because of the involvement of nonlinear relationships, random effects, and shape-restricted splines. We develop an efficient procedure for parameter estimation. The performance of the proposed method is evaluated by simulations. An outdoor coating weathering dataset is used to illustrate the proposed method. Copyright © 2015 John Wiley & Sons, Ltd.
