Similar Documents
20 similar documents found (search time: 31 ms)
1.
The work revisits autocovariance function estimation, a fundamental problem in statistical inference for time series. We convert the function estimation problem into constrained penalized regression with a generalized penalty that provides flexible and accurate estimation, and we study the asymptotic properties of the proposed estimator. In the case of a time series with nonzero mean, we apply the penalized regression technique to the differenced series, which does not require a separate detrending procedure. In penalized regression, the selection of tuning parameters is critical, and we propose four different data-driven criteria to determine them. A simulation study shows the effectiveness of the tuning parameter selection and that the proposed approach is superior to three existing methods. We also briefly discuss the extension of the proposed approach to interval-valued time series. Supplementary materials for this article are available online.
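As a point of reference for the quantity being estimated, the sketch below computes the classical (biased) sample autocovariance and illustrates the differencing step for a series with a nonzero (possibly drifting) mean; the constrained penalized regression, generalized penalty, and tuning-parameter criteria of the article are not reproduced here.

```python
import numpy as np

def sample_acov(x, max_lag):
    """Classical biased sample autocovariance gamma_hat(h), h = 0, ..., max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([np.dot(xc[: n - h], xc[h:]) / n for h in range(max_lag + 1)])

# The article applies its penalized regression to the differenced series when the
# mean is nonzero, avoiding a separate detrending step; differencing is shown here.
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=500)) + rng.normal(size=500)  # toy series with drift
dx = np.diff(x)
print(sample_acov(dx, max_lag=5))
```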

2.
In this article, we focus on the estimation of a high-dimensional inverse covariance (i.e., precision) matrix. We propose a simple improvement of the graphical Lasso (glasso) framework that attains better statistical performance without significantly increasing the computational cost. The proposed improvement is based on computing a root of the sample covariance matrix to reduce the spread of the associated eigenvalues. Through extensive numerical results, using both simulated and real datasets, we show that the proposed modification improves the glasso procedure. Our results reveal that the square-root improvement can be a reasonable choice in practice. Supplementary material for this article is available online.
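A minimal sketch of the square-root idea as we read it, assuming a symmetric square root of the sample covariance and scikit-learn's graphical_lasso; how the article actually constructs the root and maps the result back to a precision estimate may differ, so the final squaring step below is only a guess.

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def sqrt_glasso(X, alpha=0.2):
    """Run the glasso on a symmetric square root of the sample covariance,
    whose eigenvalues are less spread out than those of the covariance itself."""
    S = np.cov(X, rowvar=False)
    w, V = np.linalg.eigh(S)                              # eigendecomposition of S
    S_root = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T   # symmetric square root
    _, Theta_root = graphical_lasso(S_root, alpha=alpha)  # sparse inverse of the root
    return Theta_root @ Theta_root                        # since (S^{1/2})^{-2} = S^{-1}

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
Theta_hat = sqrt_glasso(X)
```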

3.
In this article, we use the cross-entropy method for noisy optimization to fit generalized linear multilevel models by maximum likelihood. We propose specifications of the instrumental distributions for positive and bounded parameters that improve the computational performance. We also introduce a new stopping criterion, which has the advantage of being problem-independent. In a second step, we find, by means of extensive Monte Carlo experiments, the most suitable values of the input parameters of the algorithm. Finally, we compare the method to the benchmark estimation technique based on numerical integration. The cross-entropy approach turns out to be preferable from both the statistical and the computational points of view. In the last part of the article, the method is used to model the probability of firm exits in the healthcare industry in Italy. Supplemental materials are available online.
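For readers unfamiliar with the cross-entropy (CE) method, the sketch below shows a generic CE maximization loop for a noisy objective, using a Gaussian instrumental distribution and an elite-fraction update; the article's tailored instrumental distributions for positive and bounded parameters and its problem-independent stopping rule are not reproduced.

```python
import numpy as np

def ce_maximize(noisy_objective, dim, n_samples=200, elite_frac=0.1, n_iter=50, seed=0):
    """Generic cross-entropy method: sample candidates from an instrumental
    Gaussian, keep the elite fraction with the highest noisy objective values,
    and refit the Gaussian to the elites."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(n_iter):
        candidates = rng.normal(mu, sigma, size=(n_samples, dim))
        scores = np.array([noisy_objective(c) for c in candidates])
        elites = candidates[np.argsort(scores)[-n_elite:]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-8
    return mu

# In the article the noisy objective is a simulated log-likelihood of a generalized
# linear multilevel model; here a toy noisy quadratic stands in for it.
rng = np.random.default_rng(0)
theta_hat = ce_maximize(lambda t: -np.sum((t - 2.0) ** 2) + rng.normal(scale=0.1), dim=3)
```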

4.
This paper considers the problem of parameter estimation in a general class of semiparametric models when observations are subject to missingness at random. The semiparametric models allow for estimating functions that are non-smooth with respect to the parameter. We propose a nonparametric imputation method for the missing values, which then leads to imputed estimating equations for the finite dimensional parameter of interest. The asymptotic normality of the parameter estimator is proved in a general setting, and is investigated in detail for a number of specific semiparametric models. Finally, we study the small sample performance of the proposed estimator via simulations.

5.
We propose an accelerated path-following iterative shrinkage thresholding algorithm (APISTA) for solving high-dimensional sparse nonconvex learning problems. The main difference between APISTA and the path-following iterative shrinkage thresholding algorithm (PISTA) is that APISTA exploits an additional coordinate descent subroutine to boost the computational performance. Such a modification, though simple, has a profound impact: APISTA not only enjoys the same theoretical guarantee as PISTA, that is, it attains a linear rate of convergence to a unique sparse local optimum with good statistical properties, but also significantly outperforms PISTA in empirical benchmarks. As an application, we apply APISTA to solve a family of nonconvex optimization problems motivated by estimating sparse semiparametric graphical models. APISTA allows us to obtain new statistical recovery results that are not available in the existing literature. Thorough numerical results are provided to back up our theory.
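The path-following schedule and the coordinate descent subroutine are specific to APISTA and PISTA, but both are built on the iterative shrinkage-thresholding step; a sketch of that shared building block for an l1-penalized least-squares problem is given below as background, not as the APISTA algorithm itself.

```python
import numpy as np

def soft_threshold(z, tau):
    """Elementwise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ista_step(beta, X, y, lam, step):
    """One shrinkage-thresholding step for 0.5 * ||y - X beta||^2 + lam * ||beta||_1:
    a gradient step on the smooth part followed by the proximal map of the penalty."""
    grad = X.T @ (X @ beta - y)
    return soft_threshold(beta - step * grad, step * lam)
```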

6.
Gaussian time-series models are often specified through their spectral density. Such models present several computational challenges, in particular because of the nonsparse nature of the covariance matrix. We derive a fast approximation of the likelihood for such models. We propose to sample from the approximate posterior (i.e., the prior times the approximate likelihood), and then to recover the exact posterior through importance sampling. We show that the variance of the importance sampling weights vanishes as the sample size goes to infinity. We explain why the approximate posterior may typically be multimodal, and we derive a sequential Monte Carlo sampler based on an annealing sequence to sample from that target distribution. The performance of the overall approach is evaluated on simulated and real datasets. In addition, for one real-world dataset, we provide some numerical evidence that a Bayesian approach to semiparametric estimation of the spectral density may provide more reasonable results than its frequentist counterparts. The article comes with supplementary materials, available online, that contain an appendix with a proof of our main theorem, a Python package that implements the proposed procedure, and the Ethernet dataset.
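The abstract does not spell out the fast likelihood approximation; as background, the sketch below evaluates the classical Whittle approximation to the Gaussian log-likelihood for a model with a given spectral density, using the periodogram at the Fourier frequencies. Whether the article uses exactly this approximation is an assumption, and the importance-sampling correction and SMC sampler are beyond this sketch.

```python
import numpy as np

def whittle_loglik(x, spec_density):
    """Whittle approximation: -sum_j [ log f(w_j) + I(w_j) / f(w_j) ] over the
    nonzero Fourier frequencies w_j, where I is the periodogram of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    w = 2 * np.pi * np.fft.rfftfreq(n)[1:]                            # angular frequencies in (0, pi]
    I = np.abs(np.fft.rfft(x - x.mean())[1:]) ** 2 / (2 * np.pi * n)  # periodogram
    f = spec_density(w)
    return -np.sum(np.log(f) + I / f)

# Example spectral density: AR(1) with parameter phi and innovation variance sigma2.
def ar1_spec(w, phi=0.6, sigma2=1.0):
    return sigma2 / (2 * np.pi * np.abs(1.0 - phi * np.exp(-1j * w)) ** 2)
```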

7.
Cross-validation (CV) is often used to select the regularization parameter in high-dimensional problems. However, when applied to the sparse modeling method Lasso, CV leads to models that are unstable in high dimensions and consequently not suited for reliable interpretation. In this article, we propose a model-free criterion, ESCV, based on a new estimation stability (ES) metric and CV. ESCV finds a locally ES-optimal model that is smaller than the CV choice, so that it fits the data well while also enjoying estimation stability. We demonstrate that ESCV is an effective alternative to CV at a similar, easily parallelizable computational cost. In particular, we compare the two approaches with respect to several performance measures when applied to the Lasso on both simulated and real datasets. For dependent predictors common in practice, our main finding is that ESCV cuts down false positive rates, often by a large margin, while sacrificing little of the true positive rates. ESCV usually outperforms CV in terms of parameter estimation while giving similar performance in terms of prediction. For the two real datasets from neuroscience and cell biology, the models found by ESCV are less than half the size of those chosen by CV, yet they preserve CV's predictive performance and corroborate subject knowledge and independent work. We also discuss some regularization parameter alignment issues that arise in both approaches. Supplementary materials are available online.
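A sketch of an estimation-stability-style criterion, assuming the ES metric compares lasso fits refitted on several data splits through the variance of their fitted values relative to the squared norm of the average fit; the article's precise definition of ES, and of how it is combined with CV, should be taken from the article itself.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

def es_metric(X, y, alpha, n_splits=5, seed=0):
    """Estimation-stability-style score for one regularization value: refit the
    lasso on several training splits and compare the resulting fitted values."""
    fits = []
    for train_idx, _ in KFold(n_splits=n_splits, shuffle=True, random_state=seed).split(X):
        model = Lasso(alpha=alpha).fit(X[train_idx], y[train_idx])
        fits.append(X @ model.coef_ + model.intercept_)
    fits = np.asarray(fits)
    mean_fit = fits.mean(axis=0)
    return fits.var(axis=0).mean() / (np.mean(mean_fit ** 2) + 1e-12)

# ESCV-style selection would then pick, among CV-screened alphas, one that is
# (locally) optimal for this stability score, typically a sparser model than CV's.
```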

8.
One of the main objectives of this article is to derive efficient nonparametric estimators for an unknown density f_X. It is well known that the ordinary kernel density estimator, despite several good properties, has some serious drawbacks. For example, it suffers from boundary bias and exhibits spurious bumps in the tails. We propose a semiparametric transformation kernel density estimator to overcome these defects. It is based on a new semiparametric transformation function that transforms data to normality. A generalized bandwidth adaptation procedure is also developed. The newly proposed semiparametric transformation kernel density estimator is found to perform well for unimodal densities of both low and high kurtosis. Moreover, it detects and estimates densities with excessive curvature (e.g., modes and valleys) more effectively than existing procedures. Finally, practical examples based on real-life data are presented.
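The change-of-variables identity behind any transformation kernel estimator is easy to state: if Z = T(X) for a smooth monotone T, then f_X(t) = f_Z(T(t)) |T'(t)|, so a KDE on the transformed scale can be mapped back. The sketch below uses a plain log transform on positive, right-skewed data purely as an illustration; the article's semiparametric transformation to normality and its bandwidth adaptation are not reproduced.

```python
import numpy as np
from scipy.stats import gaussian_kde

def transformation_kde(x, transform, d_transform):
    """Transformation kernel density estimate:
    f_X(t) = f_Z(T(t)) * |T'(t)|, with f_Z an ordinary KDE of Z = T(X)."""
    kde_z = gaussian_kde(transform(x))
    return lambda t: kde_z(transform(t)) * np.abs(d_transform(t))

# Illustration on positive, heavy-tailed data with T = log (not the article's transform).
rng = np.random.default_rng(2)
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)
f_hat = transformation_kde(x, np.log, lambda t: 1.0 / t)
grid = np.linspace(0.05, 5.0, 100)
density_on_grid = f_hat(grid)
```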

9.
The analysis of data generated by animal habitat selection studies, by family studies of genetic diseases, or by longitudinal follow-up of households often involves fitting a mixed conditional logistic regression model to longitudinal data composed of clusters of matched case-control strata. The estimation of model parameters by maximum likelihood is especially difficult when the number of cases per stratum is greater than one. In this case, the denominator of each cluster's contribution to the conditional likelihood involves a complex high-dimensional integral, which leads to convergence problems in the numerical maximization. In this article we show how these computational complexities can be bypassed using a global two-step analysis for nonlinear mixed effects models. The first step estimates the cluster-specific parameters and can be achieved with standard statistical methods and software based on maximum likelihood for independent data. The second step uses the EM algorithm in conjunction with conditional restricted maximum likelihood to estimate the population parameters. We use simulations to demonstrate that the method works well when the analysis is based on a large number of strata per cluster, as in many ecological studies. We apply the proposed two-step approach to evaluate habitat selection by pairs of bison roaming freely in their natural environment. This article has supplementary material online.

10.
The time-evolving precision matrix of a piecewise-constant Gaussian graphical model encodes the dynamic conditional dependency structure of a multivariate time series. Traditionally, graphical models are estimated under the assumption that data are drawn identically from a generating distribution. Introducing sparsity and sparse-difference inducing priors, we relax these assumptions and propose a novel regularized M-estimator to jointly estimate both the graph and the changepoint structure. The resulting estimator can therefore favor sparse dependency structures and/or smoothly evolving graph structures, as required. Moreover, our approach extends current methods to allow estimation of changepoints that are grouped across multiple dependencies in a system. An efficient algorithm for estimating the structure is proposed. We study the empirical recovery properties in a synthetic setting. The qualitative effect of grouped changepoint estimation is then demonstrated by applying the method to a genetic time-course dataset. Supplementary material for this article is available online.
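One convex estimator in this spirit can be written down directly with an off-the-shelf solver: per-block Gaussian log-likelihood terms plus an l1 penalty for sparsity and an l1 penalty on successive differences to induce piecewise-constant (changepoint) structure. This is a generic sketch assuming per-block sample covariances S_t; the article's grouped changepoint penalty and its dedicated algorithm are not reproduced.

```python
import cvxpy as cp
import numpy as np

def dynamic_glasso(S_list, lam_sparse=0.1, lam_change=0.5):
    """Jointly estimate precision matrices Theta_1, ..., Theta_T from per-block
    sample covariances, penalizing entrywise sparsity and successive differences."""
    p = S_list[0].shape[0]
    Thetas = [cp.Variable((p, p), PSD=True) for _ in S_list]
    nll = sum(cp.trace(S @ Th) - cp.log_det(Th) for S, Th in zip(S_list, Thetas))
    sparsity = lam_sparse * sum(cp.sum(cp.abs(Th)) for Th in Thetas)
    changes = lam_change * sum(cp.sum(cp.abs(Thetas[t + 1] - Thetas[t]))
                               for t in range(len(Thetas) - 1))
    cp.Problem(cp.Minimize(nll + sparsity + changes)).solve(solver=cp.SCS)
    return [Th.value for Th in Thetas]
```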

11.
Based on a partially linear measurement error model for longitudinal data, we study the estimation of the regression coefficients of interest in the parametric part of the model. The nonparametric function in the model is first approximated by B-splines, and a modified quadratic inference function (QIF) method is then proposed to estimate the regression coefficients in the parametric part; the proposed method improves estimation efficiency. Under certain regularity conditions, the resulting estimators are shown to be consistent and asymptotically normal. Finally, simulation studies and a real-data analysis are used to examine the finite-sample properties of the proposed estimation method.

12.
Many problems in genomics are related to variable selection where high-dimensional genomic data are treated as covariates. Such genomic covariates often have certain structures and can be represented as vertices of an undirected graph. Biological processes also vary as functions of some biological state, such as time. High-dimensional variable selection where the covariates are graph-structured and the underlying model is nonparametric presents an important but largely unaddressed statistical challenge. Motivated by the problem of regression-based motif discovery, we consider the problem of variable selection for high-dimensional nonparametric varying-coefficient models and introduce a sparse structured shrinkage (SSS) estimator based on basis function expansions and a novel smoothed penalty function. We present an efficient algorithm for computing the SSS estimator. Results on model selection consistency and estimation bounds are derived. Moreover, finite-sample performance is studied via simulations, and the effects of high dimensionality and of the structural information of the covariates are especially highlighted. We apply our method to the motif-finding problem using a yeast cell-cycle gene expression dataset and word counts in genes' promoter sequences. Our results demonstrate that the proposed method can yield better variable selection and prediction for high-dimensional regression when the underlying model is nonparametric and the covariates are structured. Supplemental materials for the article are available online.

13.
This article proposes simple estimation methods dedicated to a semiparametric family of bivariate copulas. These copulas can be estimated simply by estimating their univariate generating function. We use this result to estimate the associated measures of association as well as the high-probability regions of the copula. These procedures are illustrated using both simulations and real data.

14.
This paper extends the semiparametric linear mixed effects model to the study of a class of zero-inflated longitudinal or clustered data and proposes a new class of semiparametric mixed effects models. The smoothing parameter is selected by generalized cross-validation, and the parametric and nonparametric parts of the model are estimated by maximum penalized likelihood combined with an EM algorithm. Finally, simulations and a real-data example demonstrate the effectiveness of the proposed method.

15.
We consider the problem of nonparametric estimation of unknown smooth functions in the presence of restrictions on the shape of the estimator and on its support using polynomial splines. We provide a general computational framework that treats these estimation problems in a unified manner, without the limitations of the existing methods. Applications of our approach include computing optimal spline estimators for regression, density estimation, and arrival rate estimation problems in the presence of various shape constraints. Our approach can also handle multiple simultaneous shape constraints. The approach is based on a characterization of nonnegative polynomials that leads to semidefinite programming (SDP) and second-order cone programming (SOCP) formulations of the problems. These formulations extend and generalize a number of previous approaches in the literature, including those with piecewise linear and B-spline estimators. We also consider a simpler approach in which nonnegative splines are approximated by splines whose pieces are polynomials with nonnegative coefficients in a nonnegative basis. A condition is presented to test whether a given nonnegative basis gives rise to a spline cone that is dense in the space of nonnegative continuous functions. The optimization models formulated in the article are solvable with minimal running time using off-the-shelf software. We provide numerical illustrations for density estimation and regression problems. These examples show that the proposed approach requires minimal computational time, and that the estimators obtained using our approach often match and frequently outperform kernel methods and spline smoothing without shape constraints. Supplementary materials for this article are provided online.
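The "simpler approach" mentioned above is straightforward to sketch: since every B-spline basis function is nonnegative, a spline with nonnegative B-spline coefficients is itself nonnegative, so a nonnegative least-squares fit yields a shape-restricted regression estimate (a sufficient, not necessary, condition). The SDP/SOCP formulations based on the exact characterization of nonnegative polynomials are not reproduced here.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import nnls

def nonneg_spline_fit(x, y, knots, degree=3):
    """Nonnegative regression spline via nonnegative B-spline coefficients."""
    t = np.concatenate(([knots[0]] * degree, knots, [knots[-1]] * degree))  # clamped knots
    n_basis = len(t) - degree - 1
    # Design matrix: each column is one B-spline basis function evaluated at x.
    B = np.column_stack([
        BSpline.basis_element(t[j:j + degree + 2], extrapolate=False)(x)
        for j in range(n_basis)
    ])
    B = np.nan_to_num(B)        # basis elements return nan outside their local support
    coef, _ = nnls(B, y)        # coefficients constrained to be nonnegative
    return BSpline(t, coef, degree)

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.clip(np.sin(2 * np.pi * x) + 0.7 + rng.normal(scale=0.2, size=200), 0.0, None)
spline_hat = nonneg_spline_fit(x, y, knots=np.linspace(0.0, 1.0, 8))
```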

16.
We analyze a semiparametric model for data that suffer from the problems of sample selection, where some of the data are observed for only part of the sample with a probability that depends on a selection equation, and of endogeneity, where a covariate is correlated with the disturbance term. The introduction of nonparametric functions in the model permits great flexibility in the way covariates affect response variables. We present an efficient Bayesian method for the analysis of such models that allows us to consider general systems of outcome variables and endogenous regressors that are continuous, binary, censored, or ordered. Estimation is by Markov chain Monte Carlo (MCMC) methods. The algorithm we propose does not require simulation of the outcomes that are missing due to the selection mechanism, which reduces the computational load and improves the mixing of the MCMC chain. The approach is applied to a model of women’s labor force participation and log-wage determination. Data and computer code used in this article are available online.

17.
In multivariate categorical data, models based on conditional independence assumptions, such as latent class models, offer efficient estimation of complex dependencies. However, Bayesian versions of latent structure models for categorical data typically do not appropriately handle impossible combinations of variables, also known as structural zeros. Allowing nonzero probability for impossible combinations results in inaccurate estimates of joint and conditional probabilities, even for feasible combinations. We present an approach for estimating posterior distributions in Bayesian latent structure models with potentially many structural zeros. The basic idea is to treat the observed data as a truncated sample from an augmented dataset, thereby allowing us to exploit the conditional independence assumptions for computational expediency. As part of the approach, we develop an algorithm for collapsing a large set of structural zero combinations into a much smaller set of disjoint marginal conditions, which speeds up computation. We apply the approach to sample from a semiparametric version of the latent class model with structural zeros in the context of a key issue faced by national statistical agencies seeking to disseminate confidential data to the public: estimating the number of records in a sample that are unique in the population on a set of publicly available categorical variables. The latent class model offers remarkably accurate estimates of population uniqueness, even in the presence of a large number of structural zeros.

18.
For semiparametric survival models with interval-censored data and a cure fraction, it is often difficult to carry out nonparametric maximum likelihood estimation due to the challenge of maximizing the complex likelihood function. In this article, we propose a computationally efficient EM algorithm, facilitated by a gamma-Poisson data augmentation, for maximum likelihood estimation in a class of generalized odds rate mixture cure (GORMC) models with interval-censored data. The gamma-Poisson data augmentation greatly simplifies the EM estimation and enhances the convergence speed of the EM algorithm. The empirical properties of the proposed method are examined through extensive simulation studies and compared with numerical maximum likelihood estimates. An R package “GORCure” is developed to implement the proposed method, and its use is illustrated by an application to the Aerobic Center Longitudinal Study dataset. Supplementary material for this article is available online.

19.
In this paper, a semiparametric two-sample density ratio model is considered, and the empirical likelihood method is applied to obtain the parameter estimates. A commonly occurring computational problem is that the empirical likelihood function may be a concave-convex function. Here a simple Lagrange saddle-point algorithm is presented for computing the saddle point of the empirical likelihood function when the Lagrange multiplier has no explicit solution, so that the maximum empirical likelihood estimate (MELE) of the parameters can be obtained. Monte Carlo simulations are presented to illustrate the Lagrange saddle-point algorithm.
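The two-sample density ratio model is more involved, but the computational pattern described here (an inner numerical solve for the Lagrange multiplier nested inside an outer optimization over the parameters) can be illustrated with the classical empirical likelihood for a mean; this is a hedged background sketch, not the article's algorithm.

```python
import numpy as np
from scipy.optimize import brentq

def el_log_ratio(x, mu):
    """Empirical log-likelihood ratio for a mean mu: the Lagrange multiplier has
    no closed form and is found by a one-dimensional root search (the inner solve)."""
    z = np.asarray(x, dtype=float) - mu
    if z.min() >= 0 or z.max() <= 0:          # mu outside the convex hull of the data
        return -np.inf
    # lam must keep all weights positive: lam in (-1/max(z), -1/min(z)).
    lo = -(1.0 - 1e-8) / z.max()
    hi = -(1.0 - 1e-8) / z.min()
    lam = brentq(lambda l: np.sum(z / (1.0 + l * z)), lo, hi)
    return -np.sum(np.log1p(lam * z))

# The outer step profiles over the parameter; here we simply evaluate on a grid.
x = np.random.default_rng(4).normal(loc=1.0, size=100)
profile = [el_log_ratio(x, m) for m in np.linspace(0.5, 1.5, 21)]
```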

20.
Variable selection methods using a penalized likelihood have been widely studied in various statistical models. However, in semiparametric frailty models, these methods have received relatively little study because the marginal likelihood function involves analytically intractable integrals, particularly when modeling multicomponent or correlated frailties. In this article, we propose a simple but unified procedure via a penalized h-likelihood (HL) for variable selection of fixed effects in a general class of semiparametric frailty models, in which random effects may be shared, nested, or correlated. We consider three penalty functions (least absolute shrinkage and selection operator [LASSO], smoothly clipped absolute deviation [SCAD], and HL) in our variable selection procedure. We show that the proposed method can be easily implemented via a slight modification to existing HL estimation approaches. Simulation studies also show that the procedure using the SCAD or HL penalty performs well. The usefulness of the new method is also illustrated using three practical datasets. Supplementary materials for the article are available online.

