Similar documents
1.
This paper is concerned with feature screening for ultrahigh-dimensional covariates under general varying-coefficient models. Based on the sparsity principle and the conditional distance correlation, we develop a new marginal feature screening procedure, CDC-SIS, to select significant covariates, and show that it possesses the sure screening and ranking consistency properties under some regularity conditions. The proposed procedure has two appealing merits. First, the model we consider is more flexible than traditional varying-coefficient regression models, so the method can be used in a wider range of applications. Second, CDC-SIS can be applied directly to grouped predictor variables and multivariate responses. We assess the finite-sample properties of the procedure through Monte Carlo studies and illustrate it with an empirical analysis of a real data set. Compared with similar existing procedures, ours performs better.
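CDC-SIS conditions on the index variable of the varying-coefficient model, and that conditioning is not reproduced here. As a minimal sketch of the marginal ranking step only, assuming plain (unconditional) distance correlation in place of the conditional version (that is, DC-SIS rather than CDC-SIS), the screening could look like this; `distance_correlation` and `dc_sis` are hypothetical names.

```python
import math
import random


def _double_centered(values):
    """Double-centered matrix of pairwise distances |v_k - v_l| for a 1-D sample."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row = [sum(r) / n for r in d]  # row means (equal to column means by symmetry)
    grand = sum(row) / n
    return [[d[k][l] - row[k] - row[l] + grand for l in range(n)] for k in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation of Szekely, Rizzo and Bakirov (2007)."""
    n = len(x)
    a, b = _double_centered(x), _double_centered(y)
    dcov2 = sum(a[k][l] * b[k][l] for k in range(n) for l in range(n)) / n**2
    dvar2_x = sum(v * v for r in a for v in r) / n**2
    dvar2_y = sum(v * v for r in b for v in r) / n**2
    if dvar2_x * dvar2_y == 0:
        return 0.0  # a constant sample carries no dependence information
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar2_x * dvar2_y))


def dc_sis(columns, y, d):
    """Hypothetical helper: keep the d columns with the largest distance correlation."""
    scores = [distance_correlation(col, y) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order[:d], scores


if __name__ == "__main__":
    rng = random.Random(1)
    n, p = 100, 20
    cols = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(p)]
    # Column 3 affects y only through a quadratic term, which Pearson correlation misses.
    y = [cols[3][i] ** 2 + 0.1 * rng.gauss(0, 1) for i in range(n)]
    top, _ = dc_sis(cols, y, 3)
    print(top)
```

Distance correlation is zero only under independence, which is why it can pick up the quadratic signal in the demo.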

2.
Using the so-called martingale difference correlation (MDC), we propose a novel censored conditional quantile screening approach for ultrahigh-dimensional survival data with heterogeneity, which is often present in such data. By incorporating a weighting scheme, the method naturally extends the MDC-based conditional quantile screening of Shao and Zhang (2014) to ultrahigh-dimensional survival data. The proposed screening procedure has the sure screening property under certain technical conditions and is well able to detect nonlinear relationships between the independent variables and the censored dependent variable. Both simulation results and a real-data analysis demonstrate the effectiveness of the new censored conditional quantile screening procedure.
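The uncensored MDC-based quantile screening that this approach extends can be sketched as follows. This is a minimal illustration under stated assumptions: the censoring weights that are this paper's contribution are omitted, the unnormalized martingale difference divergence (MDD) of Shao and Zhang (2014) is computed on standardized covariates instead of the normalized MDC, and `mdd_squared` and `quantile_screen` are hypothetical names.

```python
import math
import random
from statistics import pstdev


def _centered_abs_dist(u):
    """Double-centered matrix of pairwise distances |u_k - u_l|."""
    n = len(u)
    d = [[abs(a - b) for b in u] for a in u]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    return [[d[k][l] - row[k] - row[l] + grand for l in range(n)] for k in range(n)]


def mdd_squared(v, u):
    """Sample martingale difference divergence MDD_n^2(V | U), V-statistic form."""
    n = len(v)
    a = _centered_abs_dist(u)
    vbar = sum(v) / n
    c = [vi - vbar for vi in v]
    # Double-centering (1/2)|V_k - V_l|^2 gives -(V_k - Vbar)(V_l - Vbar).
    return -sum(a[k][l] * c[k] * c[l] for k in range(n) for l in range(n)) / n**2


def quantile_screen(columns, y, tau, d):
    """Hypothetical helper: rank X_j by MDD of V = tau - 1{Y <= q_tau} given X_j."""
    ys = sorted(y)
    q = ys[max(math.ceil(tau * len(ys)) - 1, 0)]  # empirical tau-quantile of Y
    v = [tau - (1.0 if yi <= q else 0.0) for yi in y]
    scores = []
    for col in columns:
        s = pstdev(col)
        # Standardize X_j so the unnormalized MDD is comparable across columns.
        scores.append(mdd_squared(v, [x / s for x in col]) if s > 0 else 0.0)
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order[:d], scores


if __name__ == "__main__":
    rng = random.Random(5)
    n, p = 100, 15
    cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    # The conditional median of y depends on column 2 only.
    y = [cols[2][i] + 0.5 * rng.gauss(0, 1) for i in range(n)]
    top, _ = quantile_screen(cols, y, tau=0.5, d=3)
    print(top)
```

MDD(V | X_j) is zero exactly when E[V | X_j] = 0, i.e. when X_j does not shift the conditional tau-quantile, which is what makes it a quantile-specific screening utility.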

3.
Feature screening plays an important role in dimension reduction for ultrahigh-dimensional data. In this article, we introduce a new feature screening method and establish its sure independence screening property in the ultrahigh-dimensional setting. The method is based on the nonparanormal transformation and the Henze–Zirkler test: it first transforms the response variable and the features to Gaussian random variables using the nonparanormal transformation, and then tests the dependence between the response and each feature using the Henze–Zirkler test. The method has at least two merits. First, it is model-free, so no particular model structure needs to be specified. Second, it is condition-free: it requires no conditions beyond the usual regularity conditions for high-dimensional feature screening. Numerical results indicate that, compared with existing methods, the proposed method is more robust to data generated from heavy-tailed distributions and/or complex models with interaction variables. The method is applied to screening anticancer drug response genes. Supplementary material for this article is available online.
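The two ingredients named in this abstract can be sketched separately. This is a minimal illustration under stated assumptions: the transformation uses plain normal scores Φ⁻¹(rank/(n+1)) rather than the truncated empirical CDF of the original nonparanormal estimator, the Henze–Zirkler statistic is written only for two variables (with the MLE covariance), and how the paper combines them into a screening utility is not reproduced. `normal_scores` and `henze_zirkler_2d` are hypothetical names.

```python
import math
import random
from statistics import NormalDist


def normal_scores(x):
    """Map a sample to Gaussian scores Phi^{-1}(rank / (n + 1)); ties broken by position."""
    n = len(x)
    inv_cdf = NormalDist().inv_cdf
    order = sorted(range(n), key=lambda k: x[k])
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        z[i] = inv_cdf(rank / (n + 1))
    return z


def henze_zirkler_2d(u, v):
    """Henze-Zirkler multivariate normality statistic for bivariate data (d = 2)."""
    n, d = len(u), 2
    mu, mv = sum(u) / n, sum(v) / n
    cu, cv = [a - mu for a in u], [b - mv for b in v]
    suu = sum(a * a for a in cu) / n
    svv = sum(b * b for b in cv) / n
    suv = sum(a * b for a, b in zip(cu, cv)) / n
    det = suu * svv - suv * suv  # assumed > 0 (no exact collinearity)
    iuu, ivv, iuv = svv / det, suu / det, -suv / det  # entries of S^{-1}

    def mahal(a, b):
        """Squared Mahalanobis length of the vector (a, b) under S^{-1}."""
        return iuu * a * a + 2 * iuv * a * b + ivv * b * b

    # Smoothing parameter beta^2, with beta = (1/sqrt 2) * (n(2d+1)/4)^(1/(d+4)).
    beta2 = 0.5 * (n * (2 * d + 1) / 4) ** (2 / (d + 4))
    pair = sum(
        math.exp(-beta2 / 2 * mahal(cu[i] - cu[j], cv[i] - cv[j]))
        for i in range(n)
        for j in range(n)
    ) / n
    single = sum(math.exp(-beta2 / (2 * (1 + beta2)) * mahal(cu[i], cv[i])) for i in range(n))
    return pair - 2 * (1 + beta2) ** (-d / 2) * single + n * (1 + 2 * beta2) ** (-d / 2)


if __name__ == "__main__":
    rng = random.Random(2)
    x = [rng.uniform(-1, 1) for _ in range(150)]
    w = [rng.uniform(-1, 1) for _ in range(150)]
    u = normal_scores(x)
    print(henze_zirkler_2d(u, normal_scores(w)))  # independent pair
    print(henze_zirkler_2d(u, normal_scores([xi * xi for xi in x])))  # V-shaped dependence
```

After the transformation both margins are close to Gaussian, so a large Henze–Zirkler statistic for the pair signals a joint structure that is not bivariate normal, as in the V-shaped example.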

4.
Feature screening plays an important role in ultrahigh-dimensional data analysis. This paper is concerned with conditional feature screening, where the goal is to detect the association between the response and ultrahigh-dimensional predictors (e.g., genetic markers) given a low-dimensional exposure variable (such as clinical or environmental variables). To this end, we first propose a new index to measure conditional independence and then develop a conditional screening procedure based on it. We systematically study the theoretical properties of the procedure and establish the sure screening and ranking consistency properties under very mild conditions. The new screening procedure has several appealing properties: (a) it is model-free, in that its implementation does not require specifying the model structure; (b) it is robust to heavy-tailed distributions and outliers in both the response and the predictors; and (c) it handles ordinary feature screening and conditional screening in a unified way. We study the finite-sample performance of the procedure by Monte Carlo simulations and further illustrate it with two real data examples.

5.
Acta Mathematicae Applicatae Sinica, English Series: In this paper, we study sure independence screening for ultrahigh-dimensional censored data under the varying-coefficient single-index model....

6.
High-dimensional data are frequently collected in many scientific areas, including genome-wide association studies, biomedical imaging, tomography, tumor classification, and finance. Analysis of high-dimensional data poses many challenges for statisticians, and feature selection and variable selection are fundamental to it. The sparsity principle, which assumes that only a small number of predictors contribute to the response, is frequently adopted and widely regarded as useful in the analysis of high-dimensional data. Following this principle, many variable selection approaches based on penalized least squares or penalized likelihood have been developed in the recent literature to estimate a sparse model and select significant variables simultaneously. While penalized variable selection methods have been applied successfully in many high-dimensional analyses, modern applications in areas such as genomics and proteomics push the dimensionality of data to an even larger scale, where the dimension may grow exponentially with the sample size; such data are called ultrahigh-dimensional in the literature. This work presents a selective overview of feature screening procedures for ultrahigh-dimensional data. We focus on how marginal utilities for feature screening are constructed for specific models and on the motivation for model-free feature screening procedures.
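As the simplest example of the kind of marginal utility this overview discusses, sure independence screening for the linear model (Fan and Lv, 2008) ranks predictors by absolute marginal Pearson correlation and keeps the top d. The sketch below assumes the common default d = ⌊n / log n⌋; `pearson` and `sis` are hypothetical names.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation; returns 0.0 if either sample is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def sis(columns, y, d=None):
    """Keep the d predictors with the largest |marginal correlation| with y."""
    n = len(y)
    if d is None:
        d = int(n / math.log(n))  # floor(n / log n)
    utility = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: utility[j], reverse=True)
    return sorted(ranked[:d])


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 50, 200  # p much larger than n
    cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    y = [3 * cols[0][i] - 2 * cols[7][i] + rng.gauss(0, 1) for i in range(n)]
    print(sis(cols, y))
```

Model-free utilities, such as the distance-correlation ranking sketched under item 1, replace `pearson` with a dependence measure that is not restricted to linear effects.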

7.
Lin and Zhang (J. Roy. Statist. Soc. Ser. B 61 (1999) 381) proposed the generalized additive mixed model (GAMM) as a framework for analyzing correlated data, in which normally distributed random effects account for correlation in the data. They proposed using double penalized quasi-likelihood (DPQL) to estimate the nonparametric functions in the model, and marginal likelihood to estimate the smoothing parameters and variance components simultaneously. However, the normality assumption for the random effects may not be realistic in many applications, and it is unclear how violating it affects subsequent inference for GAMMs. For a particular class of GAMMs, we propose a conditional estimation procedure built on the conditional likelihood of the response given a sufficient statistic for the random effect. The random effect is treated as a nuisance parameter, so the procedure should be robust to its distribution. In extensive simulation studies, we assess the performance of this estimator under a range of conditions and use it as a basis for comparison with DPQL to evaluate the impact of violating the normality assumption. The procedure is illustrated with data from the Multicenter AIDS Cohort Study (MACS).

8.
In this paper, we consider the ultra-high dimensional partially linear model, in which the dimensionality p of the linear component is much larger than the sample size n and can be as large as an exponential of n. First, we transform the ultra-high dimensional partially linear model into an ultra-high dimensional linear model using the profile technique from semiparametric regression. Second, to screen variables in the high-dimensional linear component, we propose a variable screening method called profile greedy forward regression (PGFR), which combines the greedy algorithm with the forward regression (FR) method. The PGFR method takes the correlation among the covariates into account, identifies all relevant predictors consistently, and possesses the screening consistency property under some regularity conditions. We further propose a BIC criterion under which the selected model contains the true model with probability tending to one. Finally, simulation studies and a real application are used to examine the finite-sample performance of the PGFR procedure.
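The forward-regression part of PGFR can be sketched as follows, assuming the profiling step has already been applied (so `columns` and `y` stand in for the profiled covariates and response) and assuming an extended BIC of the form log(RSS/n) + |M|(log n + 2 log p)/n for illustration; the paper's exact criterion may differ. `forward_regression` and `select_by_bic` are hypothetical names.

```python
import math
import random


def _dot(a, b):
    return sum(s * t for s, t in zip(a, b))


def forward_regression(columns, y, max_steps):
    """Greedy forward regression; returns [(selected indices, BIC)] after each step."""
    n, p = len(y), len(columns)
    basis = [[1 / math.sqrt(n)] * n]  # orthonormal basis, starting with the intercept
    ybar = sum(y) / n
    resid = [v - ybar for v in y]  # residual of the intercept-only model
    selected, path = [], []
    for _ in range(min(max_steps, p)):
        best_j, best_gain, best_r = None, 0.0, None
        for j in range(p):
            if j in selected:
                continue
            r = list(columns[j])
            for q in basis:  # Gram-Schmidt: remove the part explained by the current model
                c = _dot(r, q)
                r = [ri - c * qi for ri, qi in zip(r, q)]
            norm2 = _dot(r, r)
            if norm2 < 1e-12:
                continue  # column is (numerically) in the span of the current model
            gain = _dot(r, resid) ** 2 / norm2  # RSS reduction from adding column j
            if gain > best_gain:
                best_j, best_gain, best_r = j, gain, r
        if best_j is None:
            break
        norm = math.sqrt(_dot(best_r, best_r))
        q = [ri / norm for ri in best_r]
        basis.append(q)
        c = _dot(resid, q)
        resid = [ei - c * qi for ei, qi in zip(resid, q)]
        selected.append(best_j)
        rss = max(_dot(resid, resid), 1e-300)
        bic = math.log(rss / n) + len(selected) * (math.log(n) + 2 * math.log(p)) / n
        path.append((list(selected), bic))
    return path


def select_by_bic(path):
    """Return the model on the forward path with the smallest BIC."""
    return min(path, key=lambda step: step[1])[0]


if __name__ == "__main__":
    rng = random.Random(3)
    n, p = 100, 50
    cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    y = [2 * cols[1][i] + 1.5 * cols[4][i] - cols[9][i] + 0.5 * rng.gauss(0, 1) for i in range(n)]
    print(select_by_bic(forward_regression(cols, y, 10)))
```

Unlike marginal screening, each step scores a candidate on what it adds beyond the already-selected columns, which is how forward regression accounts for correlation among covariates.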

9.
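Variable selection in high-dimensional regression analysis is currently both a hot topic and a difficult problem in statistical research. We propose a dependence measure based on the conditional distribution function and, building on it, three variable selection methods. Compared with existing methods, the proposed methods do not rely on a statistical model and are applicable to both linear models and nonparametric additive models. Numerical simulations show that the methods perform satisfactorily even when the covariates are somewhat correlated.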

10.
Directional distance functions provide very flexible tools for investigating the performance of Decision Making Units (DMUs). Their flexibility stems from their ability to handle undesirable outputs and to account for non-discretionary inputs and/or outputs by setting some elements of the directional vector to zero.  and  indicate how the statistical properties of Farrell–Debreu type radial efficiency measures can be transferred to directional distances. Moreover, robust versions of these distances are available for both conditional and unconditional measures. Bădin, Daraio, and Simar (2012) have shown how conditional radial distances can be used to investigate the effect of environmental factors on the production process. In this paper we develop the operational aspects of computing conditional and unconditional directional distances and their robust versions, in particular when some elements of the directional vector are fixed at zero. We then show how the approach of Bădin et al. (2012) can be adapted to a directional distance framework, including bandwidth selection and two-stage regression of conditional efficiency scores. Finally, we suggest a bootstrap-based procedure for testing the significance of environmental factors on directional efficiency scores. The procedure is illustrated with simulated and real data.

11.
With the rapid growth in the size of scientific data in various disciplines, feature screening plays an important role in reducing high dimensionality to a moderate scale. In this paper, we introduce a unified and robust model-free feature screening approach for high-dimensional survival data with censoring, which has several advantages. It is model-free under a general model framework, and hence avoids the difficulty of specifying a model form with a huge number of candidate variables. Under mild conditions, and without requiring the existence of any moment of the response, it enjoys the ranking consistency and sure screening properties in ultra-high dimension. In particular, we assume that the response and the censoring variable are conditionally independent given each covariate, instead of assuming that the censoring variable is independent of both the response and the covariates. We also propose a more robust variant of the new procedure, which possesses desirable theoretical properties without any finite moment condition on the predictors and the response. The proposed methods require no complicated numerical optimization and are fast and easy to implement. Extensive numerical studies demonstrate that they perform competitively across various configurations. Their application is illustrated with an analysis of a genetic data set.

12.
The dynamic conditional correlation (DCC) model of Engle (2002) has been widely used to model the conditional correlation of multivariate time series. However, its stationarity conditions have been established only recently, and the asymptotic theory of parameter estimation for the DCC model has not yet been fully developed. In this paper, we propose an alternative model, the scalar dynamic conditional correlation (SDCC) model. Sufficient and easily checked conditions for stationarity, geometric ergodicity, and β-mixing with exponentially decaying rates are provided. We then show the strong consistency and asymptotic normality of the quasi-maximum-likelihood estimator (QMLE) of the model parameters under regularity conditions. The asymptotic results are illustrated by Monte Carlo experiments. As a real-data example, the SDCC model is applied to the daily returns of the FTSE (Financial Times Stock Exchange) 100 index and FTSE 100 futures. Our model improves on the DCC model in the sense that the Li–McLeod statistic of the SDCC model is much smaller and the hedging efficiency is higher.

13.
The first-order nonlinear autoregressive model is considered, and a semiparametric method is proposed to estimate the regression function. In the model, the errors are dependent and follow a first-order autoregressive, AR(1), process. The conditional least squares method is used for parametric estimation, and the nonparametric kernel approach is applied to estimate a regression adjustment. Some asymptotic properties and simulation results for the semiparametric method are presented. Furthermore, the method is applied to financial data from Iran’s Tejarat-Bank.
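A rough sketch of the two-stage idea, a parametric fit by conditional least squares followed by a kernel-smoothed correction of its residuals, is given below. It assumes a linear parametric part θ·y_{t-1} and a Gaussian-kernel Nadaraya–Watson smoother with a user-chosen bandwidth h, and it ignores the AR(1) error structure that the paper models explicitly; `cls_ar1`, `nadaraya_watson` and `semiparametric_fit` are hypothetical names.

```python
import math
import random


def cls_ar1(y):
    """Conditional least squares estimate of theta in y_t = theta * y_{t-1} + e_t."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den


def nadaraya_watson(x0, xs, ys, h):
    """Gaussian-kernel Nadaraya-Watson estimate of E[Y | X = x0] with bandwidth h."""
    w = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in xs]
    total = sum(w)
    return sum(wi * yi for wi, yi in zip(w, ys)) / total if total > 0 else 0.0


def semiparametric_fit(y, h):
    """Parametric CLS fit plus a kernel-smoothed adjustment estimated from its residuals."""
    theta = cls_ar1(y)
    lagged = y[:-1]
    resid = [y[t] - theta * y[t - 1] for t in range(1, len(y))]

    def m_hat(x):
        return theta * x + nadaraya_watson(x, lagged, resid, h)

    return theta, m_hat


if __name__ == "__main__":
    rng = random.Random(4)
    y = [0.0]
    for _ in range(500):
        # Nonlinear AR(1) with i.i.d. errors (the paper's AR(1) errors are not simulated).
        y.append(0.5 * y[-1] + 0.8 * math.sin(y[-1]) + rng.gauss(0, 1))
    theta, m_hat = semiparametric_fit(y, h=0.4)
    print(round(theta, 3), round(m_hat(1.0), 3))
```

When the parametric part is already correct, the residual smoother stays close to zero and m_hat reduces to the CLS fit; otherwise it absorbs the nonlinear part the linear term misses.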

14.
The missing-data mechanism often depends on the values of the responses, which leads to nonignorable nonresponse. In such a situation, inference based on approaches that ignore the missing-data mechanism may be invalid, so a crucial step is to model the nature of the missingness. We specify a parametric model for the missingness mechanism and then propose a conditional score function approach for estimation. This approach imputes the score function by taking the conditional expectation of the score function for the missing data given the available information. Inference then proceeds by replacing unknown terms with nonparametric estimators based on the observed data. The proposed score function does not suffer from the non-identifiability problem, and the proposed estimator is shown to be consistent and asymptotically normal. We also construct a confidence region for the parameter of interest using the empirical likelihood method. Simulation studies demonstrate that the proposed inference procedure performs well in many settings. We apply the method to a data set from a growth hormone and exercise intervention study.

15.
Statistical inference based on a selected model can be overoptimistic and even misleading because of the uncertainty of the model selection procedure, especially in high-dimensional data analysis. In this article, we propose a bootstrap-based tilted correlation screening learning (TCSL) algorithm to alleviate this uncertainty. The algorithm is inspired by the recently proposed TCS variable selection algorithm, which screens variables via tilted correlation. Our algorithm can reduce the prediction error and make interpretation more reliable. Another advantage is its lower computational cost compared with the TCS algorithm when the dimension is large. Extensive simulation examples and the analysis of a real dataset demonstrate the good performance of our algorithm. Supplementary materials for this article are available online.

16.
This paper introduces methods for outlier identification in the regression setting, motivated by the analysis of steelmaking process data. The proposed methodology extends the boxplot rule, commonly used for outlier screening with univariate data, to the regression setting. The focus here is on bivariate settings with a single covariate, but extensions are possible. The proposal is based on quantile regression, including an additional transformation parameter for selecting the scale on which the conditional quantiles are closest to linear. The resulting method labels potential outliers effectively at low computational cost, allowing simple implementation in statistical software as well as in commonly used spreadsheets. Simulation experiments have been carried out to study the swamping and masking properties of the proposal. The methodology is also illustrated with real-life examples in which the response variable is the energy consumed in the melting process. Copyright © 2015 John Wiley & Sons, Ltd.
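The univariate boxplot rule that this proposal generalizes flags observations outside the fences Q1 − k·IQR and Q3 + k·IQR, with k = 1.5 by convention. A minimal sketch follows; the quartiles come from `statistics.quantiles` with its default "exclusive" method, which may differ from the convention used in the paper, the regression extension through transformed conditional quantiles is not reproduced, and `boxplot_outliers` is a hypothetical name.

```python
from statistics import quantiles


def boxplot_outliers(values, k=1.5):
    """Return indices of values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # default method="exclusive"
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


if __name__ == "__main__":
    data = [10, 11, 12, 12, 13, 13, 14, 15, 40]
    print(boxplot_outliers(data))  # -> [8]
```

In the regression version, the unconditional quartiles are replaced by conditional quartiles estimated by quantile regression, so the fences vary with the covariate.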

17.
A D.C. optimization method for single facility location problems (cited 4 times: 0 self-citations, 4 by others)
The single facility location problem with general attraction and repulsion functions is considered. An algorithm based on representing the objective function as the difference of two convex (d.c.) functions is proposed. Convergence to a global solution of the problem is proven, and extensive computational experience with an implementation of the procedure is reported for problems with up to 100,000 points. The procedure is also extended to conditional and limited-distance location problems, and we report limited computational experiments on these extensions. This research was supported in part by National Science Foundation Grant DDM-91-14489.

18.
Using conditional directional distance functions as introduced by Simar and Vanhems [J. Econometrics 166 (2012) 342–354], this paper modifies the model of Färe and Grosskopf [Eur. J. Operat. Res. 157 (2004) 242–245] and examines the link between regional environmental efficiency and economic growth. The proposed model incorporates the effect of regional economic growth on regions’ environmental efficiency levels. Results from UK regional data reveal a negative relationship between regions’ GDP per capita and environmental inefficiency up to a certain GDP per capita level, beyond which the relationship appears to become positive. Overall, the relationship between regional environmental inefficiency and GDP per capita appears to be U-shaped.

19.
This paper considers the problem of estimating a periodic function in a continuous-time regression model with additive stationary Gaussian noise whose correlation function is unknown. A general model selection procedure based on arbitrary projective estimates, which does not require knowledge of the noise correlation function, is proposed. A non-asymptotic upper bound for the $\mathcal{L}_2$-risk (an oracle inequality) is derived under mild conditions on the noise. For Ornstein–Uhlenbeck noise, the risk upper bound is shown to be uniform in the nuisance parameter. In the case of Gaussian white noise, the constructed procedure has some advantages over the procedure based on least squares estimates (LSE). The asymptotic minimaxity of the estimates is proved. The proposed model selection scheme is also extended to estimation based on discrete data, applicable when high-frequency sampling cannot be provided.

20.
We consider the periodic generalized autoregressive conditional heteroskedasticity (P-GARCH) process and propose a robust estimator based on composite quantile regression. We study some useful properties of the P-GARCH model and, under mild conditions, establish the asymptotic properties of the proposed estimator. A Monte Carlo simulation is presented to assess its performance, and the numerical results show that the proposed estimator outperforms other existing methods under heavy-tailed distributions. The methodology is also illustrated with a value-at-risk (VaR) application to stock price data.


Copyright © Beijing Qinyun Technology Development Co., Ltd. ICP registration: 京ICP备09084417号