Similar articles
1.
One useful approach for fitting linear models with scalar outcomes and functional predictors involves transforming the functional data to the wavelet domain and converting the data-fitting problem into a variable selection problem. Applying the LASSO procedure in this situation has been shown to be efficient and powerful. In this article, we explore two potential directions for improving this method: techniques for prescreening and methods for weighting the LASSO-type penalty. For each direction we consider several strategies that have not previously been investigated, either numerically or theoretically, in a functional linear regression context. We compare the finite-sample performance of the proposed methods through both simulations and real-data applications with both 1D signals and 2D image predictors. We also discuss asymptotic aspects. We show that applying these procedures can lead to improved estimation and prediction as well as better stability. Supplementary materials for this article are available online.
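A minimal sketch of the two building blocks the abstract describes, assuming an orthonormal Haar transform (one choice of wavelet among many) and a weighted LASSO solved by plain cyclic coordinate descent. Each row of the design matrix would be `haar_dwt` applied to one observed curve; the per-coefficient `weights` stand in for the penalty-weighting schemes the article studies, whose exact form is not given here. The function names are hypothetical, and no prescreening step is shown.

```python
import math


def haar_dwt(x):
    """Orthonormal Haar decomposition; len(x) must be a power of two.

    Returns [approximation, coarsest details, ..., finest details].
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx, details = list(x), []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        level = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
        details = level + details  # coarser levels go in front
    return approx + details


def soft_threshold(z, t):
    return math.copysign(max(abs(z) - t, 0.0), z)


def weighted_lasso(X, y, lam, weights=None, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j| by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = weights if weights is not None else [1.0] * p
    b = [0.0] * p
    resid = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual (b_j added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam * w[j]) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new
    return b
```

Because the Haar transform is orthonormal, it preserves the squared norm of each curve, so the inner products that define the functional linear model carry over to the coefficient domain; a larger weight `w_j` makes coefficient `j` harder to select.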

2.
Applications of regression models for binary responses are very common, and models specific to these problems are widely used. Quantile regression for binary response data has recently attracted attention, and regularized quantile regression methods have been proposed for high-dimensional problems. When the predictors have a natural group structure, as when categorical predictors are converted into dummy variables, a group lasso penalty is used in regularized methods. In this paper, we present a Bayesian Gibbs sampling procedure to estimate the parameters of a quantile regression model under a group lasso penalty for classification problems with a binary response. Results on simulated and real data show good performance of the proposed method compared with mean-based approaches and with quantile-based approaches that do not exploit the group structure of the predictors.
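The Gibbs sampler itself is not reproduced here. As a hedged illustration, the sketch below shows only the two ingredients the abstract combines: the quantile check function (whose Bayesian counterpart is usually the asymmetric Laplace likelihood) and a group lasso penalty, with one group per categorical predictor's dummy columns. The `sqrt(|g|)` group-size scaling is a common convention assumed here, not taken from the paper, and the function names are hypothetical.

```python
import math


def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def group_lasso_penalty(beta, groups, lam):
    """lam * sum_g sqrt(|g|) * ||beta_g||_2, where groups lists coefficient indices."""
    return lam * sum(
        math.sqrt(len(g)) * math.sqrt(sum(beta[j] ** 2 for j in g)) for g in groups
    )
```

For example, with coefficients `[3, 4, 1]` where the first two are dummies of one categorical predictor, `groups=[[0, 1], [2]]` penalizes the pair jointly, so both dummies enter or leave the model together.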

3.
Researchers have long struggled to identify causal effects in nonexperimental settings. Many recently proposed strategies assume ignorability of the treatment assignment mechanism and require fitting two models—one for the assignment mechanism and one for the response surface. This article proposes a strategy that instead focuses on very flexibly modeling just the response surface using a Bayesian nonparametric modeling procedure, Bayesian Additive Regression Trees (BART). BART has several advantages: it is far simpler to use than many recent competitors, requires less guesswork in model fitting, handles a large number of predictors, yields coherent uncertainty intervals, and fluidly handles continuous treatment variables and missing data for the outcome variable. BART also naturally identifies heterogeneous treatment effects. BART produces more accurate estimates of average treatment effects compared to propensity score matching, propensity-weighted estimators, and regression adjustment in the nonlinear simulation situations examined. Further, it is highly competitive in linear settings with the “correct” model, linear regression. Supplemental materials including code and data to replicate simulations and examples from the article as well as methods for population inference are available online.
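The response-surface strategy reduces to: fit f(x, z) for the outcome, then average f(x_i, 1) - f(x_i, 0) over all units. The sketch below illustrates only that averaging step, and it deliberately swaps in a one-covariate k-nearest-neighbour regressor per treatment arm in place of BART; it therefore produces no posterior uncertainty intervals. Function names are hypothetical, and each arm must contain at least one unit.

```python
def knn_predict(xs, ys, x0, k):
    """Average response of the k training points whose covariate is closest to x0."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / len(nearest)


def ate_from_response_surface(x, z, y, k=3):
    """Estimate the ATE as the mean over units of f(x_i, 1) - f(x_i, 0)."""
    x1 = [xi for xi, zi in zip(x, z) if zi == 1]
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    x0 = [xi for xi, zi in zip(x, z) if zi == 0]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    # impute both potential outcomes for every unit, treated or not
    effects = [knn_predict(x1, y1, xi, k) - knn_predict(x0, y0, xi, k) for xi in x]
    return sum(effects) / len(effects)
```

Keeping the per-unit differences in `effects` instead of averaging them is the natural route to the heterogeneous treatment effects the abstract mentions.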

4.
We recognize Harada’s generalized categories of diagrams as a particular case of modules over a monad defined on a finite direct product of additive categories. We work in the dual (albeit formally equivalent) situation, that is, with comodules over comonads. With this conceptual tool at hand, we obtain several of Harada’s results with simpler proofs, some of them under more general hypotheses, together with a characterization of the normal triangular matrix comonads that are hereditary, that is, of homological dimension less than or equal to 1. Our methods rest on a matrix representation of additive functors and natural transformations, which allows us to adapt typical algebraic manipulations from linear algebra to the additive categorical setting.

5.
We present the first methodology for dimension reduction in regressions with predictors that, given the response, follow one-parameter exponential families. Our approach is based on modeling the conditional distribution of the predictors given the response, which allows us to derive and estimate a sufficient reduction of the predictors. We also propose a method of estimating the forward regression mean function without requiring an explicit forward regression model. Whereas nearly all existing estimators of the central subspace are limited to regressions with continuous predictors only, our proposed methodology extends estimation to regressions with all-categorical predictors or a mixture of categorical and continuous predictors. Supplementary materials including the proofs and the computer code are available from the JCGS website.

6.
The generalized partially linear additive model (GPLAM) is a flexible and interpretable approach to building predictive models. It combines features in an additive manner, allowing each to have either a linear or nonlinear effect on the response. However, the choice of which features to treat as linear or nonlinear is typically assumed known. Thus, to make a GPLAM a viable approach in situations in which little is known a priori about the features, one must overcome two primary model selection challenges: deciding which features to include in the model and determining which of these features to treat nonlinearly. We introduce the sparse partially linear additive model (SPLAM), which combines model fitting and both of these model selection challenges into a single convex optimization problem. SPLAM provides a bridge between the lasso and sparse additive models. Through a statistical oracle inequality and thorough simulation, we demonstrate that SPLAM can outperform other methods across a broad spectrum of statistical regimes, including the high-dimensional (p ≫ N) setting. We develop efficient algorithms that are applied to real datasets with half a million samples and over 45,000 features with excellent predictive performance. Supplementary materials for this article are available online.

7.
Penalized Functional Regression
We develop fast fitting methods for generalized functional linear models. The functional predictor is projected onto a large number of smooth eigenvectors and the coefficient function is estimated using penalized spline regression; confidence intervals based on the mixed model framework are obtained. Our method can be applied to many functional data designs including functions measured with and without error, sparsely or densely sampled. The methods also extend to the case of multiple functional predictors or functional predictors with a natural multilevel structure. The approach can be implemented using standard mixed effects software and is computationally fast. The methodology is motivated by a study of white-matter demyelination via diffusion tensor imaging (DTI). The aim of this study is to analyze differences between various cerebral white-matter tract property measurements of multiple sclerosis (MS) patients and controls. While the statistical developments proposed here were motivated by the DTI study, the methodology is designed and presented in generality and is applicable to many other areas of scientific research. An online appendix provides R implementations of all simulations.
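A generalized functional linear model links the response to the linear predictor eta = alpha + ∫X(t)β(t)dt. The eigenvector projection and penalized-spline estimation of β are not shown; this hedged sketch only evaluates that linear predictor for curves sampled on a common grid, using the trapezoidal rule (an assumed quadrature choice), followed by a logistic link as one example of a generalized model. Function names are hypothetical.

```python
import math


def trapezoid(values, grid):
    """Trapezoidal rule for the integral of sampled values over grid."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )


def functional_linear_predictor(x_curve, beta_curve, grid, alpha=0.0):
    """eta = alpha + integral of X(t) * beta(t) dt, both sampled on the same grid."""
    return alpha + trapezoid([xv * bv for xv, bv in zip(x_curve, beta_curve)], grid)


def logistic_mean(eta):
    """Mean response under a logit link."""
    return 1.0 / (1.0 + math.exp(-eta))
```

With X(t) = 1 and β(t) = t on [0, 1], the predictor is 0.5; the trapezoidal rule is exact here because the integrand is linear.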

8.
Variable and model selection are of major concern in many statistical applications, especially in high-dimensional regression models. Boosting is a convenient statistical method that combines model fitting with intrinsic model selection. We investigate the impact of base-learner specification on the performance of boosting as a model selection procedure. We show that variable selection may be biased if the covariates are of different nature. Important examples are models combining continuous and categorical covariates, especially if the number of categories is large. In this case, least squares base-learners offer increased flexibility for the categorical covariate and lead to a preference for it even if the categorical covariate is noninformative. Similar difficulties arise when comparing linear and nonlinear base-learners for a continuous covariate. The additional flexibility in the nonlinear base-learner again yields a preference for the more complex modeling alternative. We investigate these problems from a theoretical perspective and suggest a framework for bias correction based on a general class of penalized least squares base-learners. Making all base-learners comparable in terms of their degrees of freedom strongly reduces the selection bias observed in naive boosting specifications. The importance of unbiased model selection is demonstrated in simulations. Supplemental materials including an application to forest health models, additional simulation results, additional theorems, and proofs for the theorems are available online.
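To make "boosting as a model selection procedure" concrete, here is a minimal sketch of component-wise L2 boosting with one univariate least squares base-learner per covariate (no intercept, so covariates and response are assumed centered). Each iteration fits every base-learner to the current residuals, selects the one with the smallest residual sum of squares, and takes a step of size `nu`. All base-learners here have one degree of freedom, so the sketch does not reproduce the bias, or the penalized-base-learner correction, that the abstract discusses. Function and parameter names are hypothetical.

```python
def componentwise_l2_boost(X_cols, y, nu=0.1, m_stop=100):
    """Component-wise L2 boosting with simple linear base-learners.

    X_cols[j] holds covariate j; returns (coefficients, selected indices per step).
    """
    coef = [0.0] * len(X_cols)
    resid = list(y)
    selected = []
    for _ in range(m_stop):
        best = None
        for j, col in enumerate(X_cols):
            ss = sum(v * v for v in col)
            if ss == 0.0:
                continue
            beta = sum(c * r for c, r in zip(col, resid)) / ss  # univariate LS fit
            rss = sum((r - beta * c) ** 2 for c, r in zip(col, resid))
            if best is None or rss < best[0]:
                best = (rss, j, beta)
        if best is None:
            break
        _, j, beta = best
        coef[j] += nu * beta  # shrunken update of the selected component only
        resid = [r - nu * beta * c for r, c in zip(resid, X_cols[j])]
        selected.append(j)
    return coef, selected
```

Variables never chosen keep a coefficient of exactly zero, which is the intrinsic selection the abstract refers to; stopping early (small `m_stop`) acts as the regularization.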

9.
With the rapid growth in the size of scientific data in various disciplines, feature screening plays an important role in reducing high dimensionality to a moderate scale in many scientific fields. In this paper, we introduce a unified and robust model-free feature screening approach for high-dimensional survival data with censoring, which has several advantages: it is model-free under a general model framework, and hence avoids the complication of specifying an actual model form with a huge number of candidate variables; and, under mild conditions that do not require the existence of any moment of the response, it enjoys the ranking consistency and sure screening properties in ultra-high dimensions. In particular, we assume that the response and the censoring variable are conditionally independent given each covariate, instead of assuming that the censoring variable is independent of both the response and the covariates. Moreover, we propose a more robust variant of the new procedure, which possesses desirable theoretical properties without any finite moment condition on the predictors and the response. The proposed methods require no complicated numerical optimization and are fast and easy to implement. Extensive numerical studies demonstrate that the proposed methods perform competitively across various configurations. An application is illustrated with an analysis of a genetic data set.

10.
Fixed effects models are very flexible because they make no assumptions on the distribution of effects and can also be used if the heterogeneity component is correlated with explanatory variables. A disadvantage is the large number of effects that have to be estimated. A recursive partitioning (or tree-based) method is proposed that identifies clusters of units that share the same effect. The approach reduces the number of parameters to be estimated and is particularly useful if one is interested in identifying clusters with the same effect on a response variable. It is shown that the method performs well and outperforms competitors such as the finite mixture model, in particular if the heterogeneity component is correlated with explanatory variables. Two applications illustrate the usefulness of the approach for identifying clusters that share the same effect. Supplementary materials for this article are available online.
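As a rough illustration of how recursive partitioning can group units with the same effect, the sketch below works on already-estimated unit effects (a simplification: the proposed method partitions inside the regression model itself). It sorts the effects, recursively applies the binary split that most reduces the within-group sum of squares, and stops when the gain falls below `min_gain`, a hypothetical tuning threshold standing in for whatever stopping criterion the article uses.

```python
def cluster_effects(effects, min_gain):
    """Group units whose estimated effects are similar by recursive binary splitting.

    effects maps unit -> estimated effect; returns a list of unit clusters.
    """
    def sse(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    def recurse(block):
        if len(block) < 2:
            return [block]
        vals = [v for _, v in block]
        total = sse(vals)
        best_gain, best_k = 0.0, None
        for k in range(1, len(block)):  # candidate split points in sorted order
            gain = total - sse(vals[:k]) - sse(vals[k:])
            if gain > best_gain:
                best_gain, best_k = gain, k
        if best_k is None or best_gain < min_gain:
            return [block]
        return recurse(block[:best_k]) + recurse(block[best_k:])

    ordered = sorted(effects.items(), key=lambda kv: kv[1])
    return [[unit for unit, _ in block] for block in recurse(ordered)]
```

Each resulting cluster then needs only one shared effect parameter, which is how the approach reduces the number of parameters to be estimated.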

11.
We define symmetric and exterior powers of categories, fitting into categorified Koszul complexes. We discuss examples and calculate the effect of these power operations on the categorical characters of matrix 2-representations.

12.
An additive hazards model with random effects is proposed for modelling correlated failure time data when the focus is on comparing failure times within clusters and on estimating the correlation between failure times from the same cluster, as well as the marginal regression parameters. A key feature of our model is that, when marginalized over the random effect, it retains the structure of the additive hazards model. We develop estimating equations for inferring the regression parameters. The proposed estimators are shown to be consistent and asymptotically normal under appropriate regularity conditions. Furthermore, an estimator of the baseline hazard function is proposed and its asymptotic properties are established. We propose a class of diagnostic methods to assess the overall adequacy of fit of the additive hazards model with random effects. We conduct simulation studies to evaluate the finite-sample behavior of the proposed estimators in various scenarios. An analysis of the Diabetic Retinopathy Study is provided as an illustration of the proposed method.

13.
We present a technique for clustering categorical data by generating many dissimilarity matrices and combining them. We begin by demonstrating our technique on low-dimensional categorical data and comparing it to several other techniques that have been proposed. We show through simulations and examples that our method is both more accurate and more stable. Then we give conditions under which our method should yield good results in general. Our method extends to high-dimensional categorical data of equal lengths by ensembling over many choices of explanatory variables. In this context, we compare our method with two other methods. Finally, we extend our method to high-dimensional categorical data vectors of unequal length by using alignment techniques to equalize the lengths. We give an example to show that our method continues to provide useful results, in particular, providing a comparison with phylogenetic trees. Supplementary material for this article is available online.
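One simple way to "generate many dissimilarity matrices and combine them" for equal-length categorical vectors is sketched below, under stated assumptions: each draw picks a random subset of variables, computes the proportion of mismatching categories (Hamming dissimilarity) on that subset, and the matrices are averaged. The article's actual generation and combination scheme may differ; the function names and the `subset_size` parameter are hypothetical. The averaged matrix can be passed to any standard hierarchical clustering routine.

```python
import random


def hamming(a, b, idx):
    """Proportion of the selected variables on which two records disagree."""
    return sum(a[i] != b[i] for i in idx) / len(idx)


def ensemble_dissimilarity(data, n_draws=50, subset_size=2, seed=0):
    """Average Hamming dissimilarity matrices computed on random variable subsets."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    D = [[0.0] * n for _ in range(n)]
    for _ in range(n_draws):
        idx = rng.sample(range(p), subset_size)  # one random choice of variables
        for i in range(n):
            for j in range(i + 1, n):
                d = hamming(data[i], data[j], idx) / n_draws
                D[i][j] += d
                D[j][i] += d
    return D
```

Averaging over many variable subsets is what gives the ensemble its stability: no single noisy variable can dominate the combined dissimilarity.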

14.
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable takes varying values. However, some problems yield an identity output of constant value, which can also be modelled by a linear or logistic regression with this constant in place of a numerical or binary response. In a linear model with a positive response, dividing through by the response values yields a regression of the constant output on the relative shares of the individual predictors in the total response. Chemical reaction models use the agents' concentrations, which sum to a constant 100%. Another example is priority modelling by Thurstone scaling for ranked or paired comparison data. The Thurstone scale can be estimated by probit or logit models with identical output across all the responses. Models with a unitary output can be fitted with software for regular regressions, but their results have a different interpretation. For instance, the coefficient of multiple determination is not an estimate of the explained share of the total response variance (which is zero), but a measure of how well the constant is approximated by an aggregate of predictors.
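For reference, the classical closed-form Thurstone Case V estimate takes each item's scale value as the average standard-normal quantile of the proportions preferring it over every item (including itself at 0.5). This is a sketch of that textbook shortcut, not of the probit/logit-with-constant-output formulation the abstract describes; the matrix layout `p[i][j]` and the function name are assumptions.

```python
from statistics import NormalDist


def thurstone_case_v(p):
    """Scale values from p[i][j] = proportion of judges preferring item i over item j.

    Off-diagonal proportions must lie strictly between 0 and 1; the diagonal should be 0.5.
    """
    z = NormalDist().inv_cdf  # probit transform of each preference proportion
    n = len(p)
    return [sum(z(p[i][j]) for j in range(n)) / n for i in range(n)]
```

When the matrix is consistent (`p[j][i] == 1 - p[i][j]`), the scale values sum to zero, reflecting that a Thurstone scale is identified only up to location.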

15.
The modelling of many real-life phenomena for which either parameter estimation is difficult, or which are subject to random noisy perturbations, is often carried out by using stochastic ordinary differential equations (SODEs). For this reason, much attention has been devoted in recent years to deriving numerical methods for approximating their solutions. In particular, in this paper we consider the use of linear multistep formulae (LMF). Conditions for strong convergence up to order 1 are stated for both commutative and non-commutative problems. The case of additive noise is investigated further in order to obtain order improvements. The implementation of the methods is also considered, leading to a predictor-corrector approach. Some numerical tests on problems taken from the literature are also included.
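To show what a predictor-corrector step looks like for an SODE with additive noise, dX = f(X) dt + sigma dW, here is a hedged sketch pairing an explicit Euler–Maruyama predictor with a trapezoidal corrector for the drift, reusing the same Brownian increment in both stages. This is one common predictor-corrector pair, not necessarily one of the linear multistep formulae analysed in the paper, and the function name is hypothetical.

```python
import math
import random


def heun_additive_noise(f, sigma, x0, t_end, n_steps, seed=0):
    """Simulate dX = f(X) dt + sigma dW on [0, t_end] with a predictor-corrector scheme."""
    rng = random.Random(seed)
    h = t_end / n_steps
    x, path = x0, [x0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(h))  # Brownian increment with variance h
        pred = x + h * f(x) + sigma * dw  # explicit Euler-Maruyama predictor
        x = x + 0.5 * h * (f(x) + f(pred)) + sigma * dw  # trapezoidal corrector
        path.append(x)
    return path
```

Setting `sigma=0` reduces the scheme to Heun's method for the deterministic ODE, which gives a quick sanity check: for f(x) = -x and x0 = 1, the endpoint at t = 1 should be close to exp(-1).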

16.
In the present paper, we consider dimension reduction methods for functional regression with a scalar response and predictors that include a random curve and a categorical random variable. To deal with the categorical random variable, we propose three potential dimension reduction methods: partial functional sliced inverse regression, marginal functional sliced inverse regression and conditional functional sliced inverse regression. Furthermore, we investigate the relationships among the three methods. In addition, a new modified BIC criterion for determining the dimension of the effective dimension reduction space is developed. Real and simulated data examples are presented to show the effectiveness of the proposed methods.

17.
This paper examines the analysis of an extended finite mixture of factor analyzers (MFA) where both the continuous latent variable (common factor) and the categorical latent variable (component label) are assumed to be influenced by the effects of fixed observed covariates. A polytomous logistic regression model is used to link the categorical latent variable to its corresponding covariate, while a traditional linear model with normal noise is used to model the effect of the covariate on the continuous latent variable. The proposed model turns out to be an extension of many existing related models in various ways, and as such offers the potential to address some of the issues not fully handled by those previous models. A detailed derivation of an EM algorithm for parameter estimation is presented, and latent variable estimates are obtained as by-products of the overall estimation procedure.
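The polytomous logistic link for the component label is a softmax over linear scores. The sketch below computes the covariate-dependent mixing proportions that the EM algorithm would use in place of constant weights; it assumes one coefficient vector per component, with the reference component's coefficients fixed at zero for identifiability. The factor-analytic part of the model is not shown, and the function name is hypothetical.

```python
import math


def component_probabilities(x, gammas):
    """Polytomous (multinomial) logistic link: P(label = k | x) for each component k.

    gammas[k] is the coefficient vector of component k; fixing gammas[0] at zeros
    makes component 0 the reference category.
    """
    scores = [sum(g_j * x_j for g_j, x_j in zip(g, x)) for g in gammas]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Subtracting the maximum score leaves the probabilities unchanged but prevents overflow in `math.exp` when scores are large.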

18.
Many categorical axioms assert that a particular canonically defined natural transformation between certain functors is invertible. We give two examples of such axioms where the existence of any natural isomorphism between the functors implies the invertibility of the canonical natural transformation. The first example is distributive categories, the second (semi-)additive ones. We show that each follows from a general result about monoidal functors.

19.
Using a relative version of Auslander's formula, we give a functorial approach to show that the bounded derived category of every Artin algebra admits a categorical resolution. This, in particular, implies that the bounded derived categories of Artin algebras of finite global dimension determine bounded derived categories of all Artin algebras. Hence, this paper can be considered as a typical application of functor categories, introduced in representation theory by Auslander (1971), to categorical resolutions.

20.
Linear problems with inexact initial data are examined. The stopping rules for certain iterative methods designed for solving linear equations and a linear elimination problem are proposed and analysed. In particular, these methods are applicable to ill-conditioned and ill-posed problems. Numerical results are presented that demonstrate the efficiency of these methods.
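A representative example of a stopping rule for iterative solvers with inexact data is Morozov's discrepancy principle: iterate until the residual norm drops below tau times the known noise level delta, rather than running to convergence and amplifying the noise. The sketch below applies it to Landweber iteration; it illustrates the general idea and is not necessarily one of the rules proposed in the paper. The default step size uses the squared Frobenius norm as an upper bound for the squared spectral norm, which keeps the step below the convergence limit of 2/||A||². Function and parameter names are hypothetical.

```python
import math


def landweber_discrepancy(A, y, delta, tau=1.1, omega=None, max_iter=10000):
    """Landweber iteration x <- x - omega * A^T (A x - y), stopped by the
    discrepancy principle ||A x - y|| <= tau * delta. Returns (x, iterations)."""
    m, n = len(A), len(A[0])
    if omega is None:
        # 1 / ||A||_F^2 <= 1 / ||A||_2^2 < 2 / ||A||_2^2, so the iteration converges
        omega = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for k in range(max_iter):
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        if math.sqrt(sum(v * v for v in r)) <= tau * delta:
            return x, k  # residual is at the noise level: stop here
        grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        x = [x[j] - omega * grad[j] for j in range(n)]
    return x, max_iter
```

The iteration count acts as the regularization parameter: a larger noise level `delta` triggers an earlier stop and a smoother, less noise-driven solution.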

