Similar Documents
20 similar documents found (search time: 15 ms)
1.
The accelerated failure time model provides a natural formulation of the effects of covariates on the failure time variable. The presence of censoring poses major challenges for semi-parametric analysis, and the existing semi-parametric estimators are computationally intractable. In this article we propose an unbiased transformation of the potentially censored response variable, so that least-squares estimators of the regression parameters can be obtained easily. The resulting estimators are consistent and asymptotically normal. Based on these, a strongly consistent Kaplan-Meier estimator of the distribution of the random error can be constructed. Extensive simulation studies show that the asymptotic approximations are accurate in practical situations.
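The abstract does not spell out the transformation; one classical unbiased transformation of this kind is the Koul-Susarla-Van Ryzin synthetic-data approach, in which each response is inverse-weighted by the Kaplan-Meier estimate of the censoring survival function before ordinary least squares is applied. A minimal sketch under a hypothetical simulation (the paper's actual transformation may differ):

```python
import numpy as np

def km_censoring_survival(t, delta):
    # Kaplan-Meier estimate of the censoring survival P(C >= t_i),
    # evaluated just before each observation's own time.
    # delta = 1 for events, 0 for censored, so censoring "events" are 1 - delta.
    order = np.argsort(t)
    d_s = 1.0 - delta[order]
    n = len(t)
    at_risk = n - np.arange(n)
    surv = np.cumprod(1.0 - d_s / at_risk)
    s_left = np.concatenate(([1.0], surv[:-1]))   # survival just before t_(i)
    out = np.empty(n)
    out[order] = s_left
    return out

def synthetic_ols(y, delta, x):
    # Synthetic responses y* = delta * y / S_C(y-), which have the same
    # conditional mean as the uncensored response under independent censoring;
    # then plain least squares on the synthetic responses.
    s = km_censoring_survival(y, delta)
    y_star = delta * y / np.clip(s, 1e-10, None)
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X, y_star, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0.0, 2.0, n)
t_true = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)   # intercept 1, slope 2
c = rng.uniform(0.0, 12.0, n)                      # independent censoring
y = np.minimum(t_true, c)
delta = (t_true <= c).astype(float)
beta = synthetic_ols(y, delta, x)
```

The fitted coefficients should be close to the true values (1, 2) despite roughly a quarter of the responses being censored.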

2.
The censored linear regression model, also referred to as the accelerated failure time (AFT) model when the logarithm of the survival time is used as the response variable, is widely seen as an alternative to the popular Cox model when the assumption of proportional hazards is questionable. Buckley and James [Linear regression with censored data, Biometrika 66 (1979) 429-436] extended the least squares estimator to the semiparametric censored linear regression model in which the error distribution is completely unspecified. The Buckley-James estimator performs well in many simulation studies and examples, and the direct interpretation of the AFT model is, as Cox has pointed out, more attractive than that of the Cox model in practical situations. However, application of Buckley-James estimation has been limited in practice, mainly because its variance is difficult to estimate. In this paper, we use the empirical likelihood method to derive a new test and confidence interval based on the Buckley-James estimator of the regression coefficient. A standard chi-square distribution is used to calculate the P-value and the confidence interval. The proposed empirical likelihood method does not involve variance estimation and shows much better small-sample performance than some existing methods in our simulation studies.
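For reference, the Buckley-James iteration itself (not the paper's empirical-likelihood inference, which is omitted here) replaces each censored residual by its conditional expectation under the Kaplan-Meier estimate of the residual distribution and refits least squares until the coefficients stabilize. A rough sketch with hypothetical simulated data:

```python
import numpy as np

def buckley_james_slopes(y, delta, X, n_iter=30):
    # Buckley-James slope estimates for the model y = a + X @ b + e (sketch).
    # y is the observed (possibly censored) response, delta = 1 if uncensored.
    n = len(y)
    Xc = X - X.mean(axis=0)
    beta = np.linalg.lstsq(Xc, y - y.mean(), rcond=None)[0]
    for _ in range(n_iter):
        e = y - X @ beta
        order = np.argsort(e)
        e_s = e[order]
        d_s = delta[order].astype(float).copy()
        d_s[-1] = 1.0                                  # convention: largest residual is an event
        at_risk = n - np.arange(n)
        surv = np.cumprod(1.0 - d_s / at_risk)         # Kaplan-Meier survival of residuals
        s_left = np.concatenate(([1.0], surv[:-1]))
        mass = s_left - surv                           # KM probability mass at each residual
        w = mass * e_s
        tail = np.cumsum(w[::-1])[::-1] - w            # sum of mass*residual strictly above each point
        cond = tail / np.clip(surv, 1e-12, None)       # E[e | e > e_i]
        e_star = np.where(d_s == 1.0, e_s, cond)       # impute censored residuals
        e_back = np.empty(n)
        e_back[order] = e_star
        y_star = X @ beta + e_back
        beta = np.linalg.lstsq(Xc, y_star - y_star.mean(), rcond=None)[0]
    return beta

rng = np.random.default_rng(1)
n = 1500
x = rng.uniform(0.0, 2.0, n)
X = x[:, None]
t = 0.5 + 1.5 * x + rng.normal(0.0, 0.5, n)   # true slope 1.5
c = rng.normal(3.0, 1.0, n)                   # censoring times
y = np.minimum(t, c)
delta = (t <= c).astype(float)
beta = buckley_james_slopes(y, delta, X)
```

Only the slopes are identified here, since the intercept is absorbed into the unspecified error distribution; the iteration may oscillate slightly rather than converge exactly, which is one reason variance estimation is hard.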

3.
Finite mixture regression (FMR) models are frequently used in statistical modeling, often with many covariates of low significance. Variable selection techniques can be employed to identify the covariates with little influence on the response. We study the problem of variable selection in FMR models. Penalized likelihood-based approaches are sensitive to data contamination, and their efficiency may be significantly reduced when the model is slightly misspecified. We propose a new robust variable selection procedure for FMR models based on minimum-distance techniques, which appear to have some automatic robustness to model misspecification. We show that the proposed estimator has variable-selection consistency and the oracle property, and we establish its finite-sample breakdown point to demonstrate its robustness. We examine the small-sample and robustness properties of the estimator in a Monte Carlo study and also analyze a real data set.

4.
Length-biased data arise frequently from prevalent cohort sampling in follow-up studies. Quantile regression provides great flexibility for assessing covariate effects on survival time and is a useful alternative to Cox's proportional hazards model and the accelerated failure time (AFT) model for survival analysis. In this paper, we develop a Buckley–James-type estimator for right-censored length-biased data under a quantile regression model; the informative right-censoring induced by prevalent cohort sampling must be handled. Building on the generalization of the Buckley–James-type estimator under the AFT model proposed by Ning et al. (Biometrics 67:1369–1378, 2011), we propose a Buckley–James-type estimating equation for the regression coefficients in the quantile regression model and develop an iterative algorithm to obtain the estimates. The resulting estimator is consistent and asymptotically normal. We evaluate the performance of the proposed estimator on finite samples using extensive simulation studies, and an analysis of real data illustrates the proposed methodology.

5.

In this paper, we investigate the quantile varying-coefficient model for longitudinal data, where the unknown nonparametric functions are approximated by polynomial splines and the estimators are obtained by minimizing a quadratic inference function. The theoretical properties of the resulting estimators are established; they achieve the optimal convergence rate for the nonparametric functions. Since the objective function is non-smooth, we propose an estimation procedure based on induced smoothing and prove that the smoothed estimator is asymptotically equivalent to the original one. Moreover, we propose a variable selection procedure based on the regularization method, which can simultaneously estimate and select important nonparametric components and has the asymptotic oracle property. Extensive simulations and a real data analysis show the usefulness of the proposed method.


6.
To address the problems of insufficient memory and long running time when performing regression analysis on large-scale data, two new regression methods are proposed: a screen-then-sample L1-penalized quantile regression method for large-scale data (FSSLQR) and a sample-then-screen L1-penalized quantile regression method for large-scale data (SFSLQR). Numerical simulations and a real application show that FSSLQR and SFSLQR not only substantially reduce memory usage and running time, but also produce estimation, prediction, and variable-selection results essentially identical to the full-data L1-penalized quantile regression. Moreover, compared with the L1-penalized quantile regression method for large-scale data (SLQR) proposed by Xu et al. (2018), FSSLQR and SFSLQR are superior in estimation, prediction, variable selection, and running time.
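The listing does not give FSSLQR's actual screening statistic; the sketch below only illustrates the screen-then-sample idea, using a simple marginal sign-correlation screen and the standard linear-programming form of L1-penalized quantile regression (all tuning choices and names are hypothetical):

```python
import numpy as np
from scipy.optimize import linprog

def l1_quantile_regression(X, y, tau=0.5, lam=0.1):
    # L1-penalized quantile regression via its linear-programming form.
    # Variables: beta+ (p), beta- (p), u (n), v (n), all nonnegative, with
    # X @ (beta+ - beta-) + u - v = y; objective lam*|beta| + check loss.
    n, p = X.shape
    c = np.concatenate([lam * np.ones(2 * p), tau * np.ones(n), (1.0 - tau) * np.ones(n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    z = res.x
    return z[:p] - z[p:2 * p]

def screen_then_sample(X, y, keep=10, m=400, tau=0.5, lam=0.05, seed=0):
    # Screen-then-sample pipeline (sketch): rank covariates by a marginal
    # sign-correlation score, keep the top `keep`, then fit the penalized
    # quantile regression on a random subsample of size m.
    s = np.sign(y - np.quantile(y, tau))
    score = np.abs(X.T @ s) / len(y)              # marginal screening statistic
    kept = np.argsort(score)[::-1][:keep]
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y), size=min(m, len(y)), replace=False)
    beta = np.zeros(X.shape[1])
    beta[kept] = l1_quantile_regression(X[np.ix_(idx, kept)], y[idx], tau, lam)
    return beta

rng = np.random.default_rng(4)
n, p = 2000, 50
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(0.0, 1.0, n)
beta = screen_then_sample(X, y, keep=10, m=400)
```

The full LP is only ever solved on the screened columns of a subsample, which is exactly where the memory and running-time savings come from.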

7.
Many problems in genomics involve variable selection where high-dimensional genomic data are treated as covariates. Such genomic covariates often have certain structures and can be represented as vertices of an undirected graph. Biological processes also vary as functions of some biological state, such as time. High-dimensional variable selection where the covariates are graph-structured and the underlying model is nonparametric presents an important but largely unaddressed statistical challenge. Motivated by the problem of regression-based motif discovery, we consider variable selection for high-dimensional nonparametric varying-coefficient models and introduce a sparse structured shrinkage (SSS) estimator based on basis function expansions and a novel smoothed penalty function. We present an efficient algorithm for computing the SSS estimator. Results on model selection consistency and estimation bounds are derived. Moreover, finite-sample performance is studied via simulations, with particular attention to the effects of high dimensionality and the structural information of the covariates. We apply our method to the motif-finding problem using a yeast cell-cycle gene expression dataset and word counts in genes' promoter sequences. Our results demonstrate that the proposed method can yield better variable selection and prediction for high-dimensional regression when the underlying model is nonparametric and the covariates are structured. Supplemental materials for the article are available online.


9.
The censored single-index model provides a flexible way of modelling the association between a response and a set of predictor variables when the response variable is randomly censored and the link function is unknown. It offers a technique for dimension reduction in semiparametric censored regression models and generalizes the existing accelerated failure time models for survival analysis. This paper proposes two methods for estimating single-index models with randomly censored samples. We first transform the censored data into synthetic data, or pseudo-responses, in an unbiased way, then obtain estimates of the index coefficients by the rOPG or rMAVE procedures of Xia (2006) [1]. Finally, we estimate the unknown nonparametric link function using techniques for univariate censored nonparametric regression. The estimators of the index coefficients are shown to be root-n consistent and asymptotically normal. In addition, the estimator of the unknown regression function is a local linear kernel regression estimator that achieves the same efficiency as if the index parameters were known. Monte Carlo simulations are conducted to illustrate the proposed methodologies.

10.
A cure model is a useful approach for analysing failure time data in which some subjects eventually experience, and others never experience, the event of interest. All subjects belong to one of two groups: the susceptible group and the non-susceptible group. There has been considerable progress in the development of semi-parametric models for regression analysis of time-to-event data, but most current work focuses on right-censored data, especially when the population contains a non-ignorable cured subgroup. In this paper, we propose a semi-parametric cure model for current status data. In general, treatments are developed both to increase the patients' chances of being cured and to prolong the survival time of non-cured patients. A logistic regression model is proposed for whether the subject is in the susceptible group, and an accelerated failure time regression model is proposed for the event time when the subject is in the non-susceptible group. An EM algorithm is used to maximize the log-likelihood of the observed data. Simulation results show that the proposed method yields efficient estimates.

11.
Regularization methods, including the Lasso, group Lasso, and SCAD, typically focus on selecting variables with strong effects while ignoring weak signals. This may result in biased prediction, especially when weak signals outnumber strong signals. This paper aims to incorporate weak signals into variable selection, estimation, and prediction. We propose a two-stage procedure consisting of variable selection and post-selection estimation. The variable selection stage involves a covariance-insured screening for detecting weak signals, whereas the post-selection estimation stage involves a shrinkage estimator for jointly estimating the strong and weak signals selected in the first stage. We term the proposed method the covariance-insured screening-based post-selection shrinkage estimator. We establish asymptotic properties for the proposed method and show, via simulations, that incorporating weak signals can improve estimation and prediction performance. We apply the proposed method to predict annual gross domestic product rates based on various socioeconomic indicators for 82 countries.

12.
For a censored response variable regressed on a projected covariate, a generalized linear model with an unknown link function covers almost all existing models under censoring; its special cases include the accelerated failure time model with censored data. In the uncensored case, such a model is called the single-index model in econometrics. In this paper, we systematically study its asymptotic properties: we derive the central limit theorem and the law of the iterated logarithm for an estimator of the direction parameter, and we obtain the optimal convergence rate of an estimator of the unknown link function.

13.
In this article, we propose a failure-rate-based step-stress accelerated life testing (SSALT) model, assuming that the time-to-event distribution belongs to a fairly general family of distributions and that the underlying population contains long-term survivors. As stress levels increase, the mean time to the event of interest is expected to shorten, leading to an order restriction among the mean times-to-event. We propose a method of obtaining order-restricted maximum likelihood estimators (MLEs) based on an expectation-maximization (EM) algorithm coupled with generalized isotonic regression. Additionally, we address the problem of testing for the presence of long-term survivors in the underlying population, using both asymptotic and non-asymptotic approaches. To illustrate the effectiveness of the proposed method, extensive simulation experiments are carried out and a real data set is analyzed.
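The isotonic-regression step can be illustrated with a pool-adjacent-violators (PAVA) sketch: for exponential lifetimes, for example, the order-restricted MLEs of the mean times-to-event are the weighted isotonic regression of the per-level sample means. The numbers below are hypothetical, and this sketches only the order-restriction step, not the EM algorithm for the cure fraction:

```python
import numpy as np

def pava_nonincreasing(v, w):
    # Weighted least-squares projection of v onto nonincreasing sequences
    # (pool-adjacent-violators): merge adjacent blocks that violate the order.
    vals = list(np.asarray(v, dtype=float))
    wts = list(np.asarray(w, dtype=float))
    blocks = [[i] for i in range(len(vals))]
    i = 0
    while i < len(vals) - 1:
        if vals[i] < vals[i + 1]:            # violation of nonincreasing order
            tot = wts[i] + wts[i + 1]
            merged = (wts[i] * vals[i] + wts[i + 1] * vals[i + 1]) / tot
            vals[i:i + 2] = [merged]
            wts[i:i + 2] = [tot]
            blocks[i:i + 2] = [blocks[i] + blocks[i + 1]]
            i = max(i - 1, 0)                # re-check the previous pair
        else:
            i += 1
    out = np.empty(len(v))
    for val, idx in zip(vals, blocks):
        out[idx] = val
    return out

# Sample mean lifetimes at increasing stress levels (hypothetical), with one
# violation of the expected nonincreasing order at levels 2 and 3.
raw = np.array([10.2, 8.9, 9.4, 6.1, 5.8])
n_per_level = np.array([30, 30, 30, 30, 30])
iso = pava_nonincreasing(raw, n_per_level)
```

The violating pair (8.9, 9.4) gets pooled into a common value, after which the whole sequence is nonincreasing.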

14.
Estimating the bivariate survival function has been a major goal of many researchers. For that purpose many methods and techniques have been published. However, most of these techniques and methods rely heavily on bivariate failure data. There are situations in which failure time data are difficult to obtain and thus there is a growing need to assess the bivariate survival function for such cases. In this paper we propose two techniques for generating families of bivariate processes for describing several variables that can be used to indirectly assess the bivariate survival function. An estimation procedure is provided and a simulation study is conducted to evaluate the performance of our proposed estimator.

15.
A commonly used semiparametric model is considered. We adopt two difference based estimators of the linear component of the model and propose corresponding thresholding estimators that can be used for variable selection. For each thresholding estimator, variable selection in the linear component is developed and consistency of the variable selection procedure is shown. We evaluate our method in a simulation study and implement it on a real data set.
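Neither the differencing order nor the threshold is specified in the abstract; a minimal sketch of the idea for a partially linear model y = Xb + g(t) + e, using first-order differencing to remove the smooth component and a simple hard threshold for variable selection (the simulation setup and threshold level are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 1000, 8
t = np.sort(rng.uniform(0.0, 1.0, n))
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0, 1.0, 0.0, 0.0])
y = X @ beta_true + np.sin(2 * np.pi * t) + rng.normal(0.0, 0.5, n)

# First-order differencing in t removes the smooth component g(t) up to O(1/n),
# leaving an approximate linear model in the differenced covariates.
dX, dy = np.diff(X, axis=0), np.diff(y)
beta_hat, *_ = np.linalg.lstsq(dX, dy, rcond=None)

# Hard-threshold the coefficients for variable selection.
resid = dy - dX @ beta_hat
sigma_hat = resid.std() / np.sqrt(2.0)    # differencing doubles the error variance
thr = 3.0 * sigma_hat / np.sqrt(n)        # a simple (hypothetical) threshold level
beta_sel = np.where(np.abs(beta_hat) > thr, beta_hat, 0.0)
selected = np.flatnonzero(beta_sel)       # should include the true signals 0, 2, 5
```

Differencing sidesteps estimating g(t) entirely, which is what makes the subsequent thresholding step so cheap.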

16.
We propose a penalized likelihood method that simultaneously fits the multinomial logistic regression model and combines subsets of the response categories. The penalty is nondifferentiable when pairs of columns in the optimization variable are equal. This encourages pairwise equality of these columns in the estimator, which corresponds to response category combination. We use an alternating direction method of multipliers algorithm to compute the estimator and we discuss the algorithm’s convergence. Prediction and model selection are also addressed. Supplemental materials for this article are available online.

17.
Semiparametric linear transformation models have received much attention due to their high flexibility in modeling survival data. A useful estimating equation procedure was recently proposed by Chen et al. (2002) [21] for linear transformation models to jointly estimate parametric and nonparametric terms; they showed that this procedure yields a consistent and robust estimator. However, variable selection for linear transformation models has been less studied, partly because a convenient loss function is not readily available in this context. In this paper, we propose a simple yet powerful approach to achieve both sparse and consistent estimation for linear transformation models. The main idea is to derive a profiled score from the estimating equation of Chen et al. [21], construct a loss function based on the profiled score and its variance, and then minimize the loss subject to a shrinkage penalty. Under regularity conditions, we show that the resulting estimator is consistent for both model estimation and variable selection. Furthermore, the estimated parametric terms are asymptotically normal and can achieve higher efficiency than those obtained from the estimating equations. For computation, we suggest a one-step approximation algorithm that takes advantage of LARS to build the entire solution path efficiently. Performance of the new procedure is illustrated through numerous simulations and real examples, including a microarray data set.

18.
In high-dimensional data settings where p ≫ n, many penalized regularization approaches have been studied for simultaneous variable selection and estimation. However, in the presence of covariates with weak effects, many existing variable selection methods, including the Lasso and its variants, cannot distinguish covariates with weak contributions from those with none, so prediction based only on a subset model of selected covariates can be inefficient. In this paper, we propose a post-selection shrinkage estimation strategy to improve the prediction performance of a selected subset model. The post-selection shrinkage estimator (PSE) is data-adaptive and is constructed by shrinking a post-selection weighted ridge estimator in the direction of a selected candidate subset. Under an asymptotic distributional quadratic risk criterion, its prediction performance is explored analytically. We show that the proposed PSE performs better than the post-selection weighted ridge estimator. More importantly, it significantly improves the prediction performance of any candidate subset model selected by most existing Lasso-type variable selection methods. The relative performance of the post-selection PSE is demonstrated by both simulation studies and real-data analysis. Copyright © 2016 John Wiley & Sons, Ltd.

19.
An additive hazards model with random effects is proposed for modelling correlated failure time data when the focus is on comparing failure times within clusters and on estimating the correlation between failure times from the same cluster, as well as the marginal regression parameters. A feature of our model is that, when marginalized over the random effect, it retains the structure of the additive hazards model. We develop estimating equations for inferring the regression parameters. The proposed estimators are shown to be consistent and asymptotically normal under appropriate regularity conditions. Furthermore, an estimator of the baseline hazard function is proposed and its asymptotic properties are established. We propose a class of diagnostic methods to assess the overall fitting adequacy of the additive hazards model with random effects. We conduct simulation studies to evaluate the finite-sample behavior of the proposed estimators in various scenarios. Analysis of the Diabetic Retinopathy Study is provided as an illustration of the proposed method.

20.
Plant genetics and genomics studies show that the loci influencing many important agronomic traits are not sparse: the traits are affected by a large number of genes of small effect, and gene-gene interactions also contribute. Based on flowering-time data for rapeseed, an important oil crop, this paper studies gene selection under moderate sparsity and proposes a two-step Bayesian model selection method. Once gene-gene interactions are considered, the model dimension grows rapidly, and because of the special data structure, standard variable-selection methods perform poorly. We propose a two-step variable-selection procedure: first, a Kolmogorov screening step filters out clearly unimportant variables to reduce the dimension; second, interactions among the selected loci are considered. To overcome the slow computation of Bayesian methods, indicator variables are introduced into the model, and models are selected by estimating the posterior distribution of the indicators. Simulation results show that the proposed method performs well in prediction accuracy and computational stability, with a large improvement in prediction accuracy over the Bayesian method without indicator variables. Finally, the proposed method is applied to a rapeseed flowering-time data set, and several interacting loci are identified.
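The first, screening stage can be sketched as follows: score each locus by the largest two-sample Kolmogorov-Smirnov distance between the phenotype distributions of its genotype groups, and keep the top-ranked loci. The genotype simulation below is hypothetical, and the Bayesian indicator-variable stage is omitted:

```python
import numpy as np

def ks_two_sample(a, b):
    # Two-sample Kolmogorov-Smirnov statistic: max distance between ECDFs.
    grid = np.sort(np.concatenate([a, b]))
    Fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    Fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.max(np.abs(Fa - Fb))

def kolmogorov_screen(G, y, keep):
    # Score each locus by the largest KS distance between the phenotype
    # distributions of its genotype groups; keep the top-scoring loci.
    p = G.shape[1]
    scores = np.zeros(p)
    for j in range(p):
        groups = [y[G[:, j] == g] for g in np.unique(G[:, j])]
        scores[j] = max(ks_two_sample(groups[i], groups[k])
                        for i in range(len(groups))
                        for k in range(i + 1, len(groups)))
    return np.argsort(scores)[::-1][:keep]

rng = np.random.default_rng(3)
n, p = 1200, 100
G = rng.integers(0, 3, size=(n, p))                 # genotypes coded 0/1/2
# Phenotype driven by loci 5 and 17 plus their interaction.
y = (1.0 * G[:, 5] - 1.5 * G[:, 17]
     + 1.0 * G[:, 5] * G[:, 17] + rng.normal(0.0, 1.0, n))
top = kolmogorov_screen(G, y, keep=20)
```

Because the KS statistic is distribution-free, the screen reacts to any shift in the phenotype distribution, not just a mean effect, which is convenient when interactions make effects non-additive.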

