Similar Articles
12 similar articles found.
1.
We propose a method for estimating nonstationary spatial covariance functions by representing a spatial process as a linear combination of some local basis functions with uncorrelated random coefficients and some stationary processes, based on spatial data sampled in space with repeated measurements. By incorporating a large collection of local basis functions with various scales at various locations and stationary processes with various degrees of smoothness, the model is flexible enough to represent a wide variety of nonstationary spatial features. The covariance estimation and model selection are formulated as a regression problem with the sample covariances as the response and the covariances corresponding to the local basis functions and the stationary processes as the predictors. A constrained least squares approach is applied to select appropriate basis functions and stationary processes as well as estimate parameters simultaneously. In addition, a constrained generalized least squares approach is proposed to further account for the dependencies among the response variables. A simulation experiment shows that our method performs well in both covariance function estimation and spatial prediction. The methodology is applied to a U.S. precipitation dataset for illustration. Supplemental materials relating to the application are available online.
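As a rough illustration of the regression formulation (not the authors' implementation), the sketch below treats the half-vectorized sample covariances as the response and a small dictionary of local-basis and stationary covariance components as the predictors, selecting components by nonnegative (constrained) least squares. The toy data, the Gaussian-bump local basis, the exponential stationary terms, and the use of scipy's nnls solver are all assumptions made for the example; the constrained generalized least squares refinement mentioned in the abstract is not reproduced.

```python
import numpy as np
from scipy.optimize import nnls  # nonnegativity keeps the fitted covariance valid

rng = np.random.default_rng(0)

# Toy setting: n locations on a line, T independent replicates of the field.
n, T = 40, 200
locs = np.linspace(0.0, 1.0, n)
Y = rng.standard_normal((T, n)) * (0.5 + locs)   # crude nonstationary toy data
S = np.cov(Y, rowvar=False)                      # sample covariance (the "response")

# Dictionary of candidate covariance components (the "predictors"):
# (a) local basis functions with uncorrelated coefficients -> rank-one terms b b^T
# (b) stationary exponential covariances with different ranges.
def local_basis(center, scale):
    b = np.exp(-0.5 * ((locs - center) / scale) ** 2)
    return np.outer(b, b)

def stationary_exp(range_par):
    d = np.abs(locs[:, None] - locs[None, :])
    return np.exp(-d / range_par)

components = [local_basis(c, s) for c in (0.25, 0.5, 0.75) for s in (0.05, 0.15)]
components += [stationary_exp(r) for r in (0.1, 0.3)]

# Regression with the half-vectorized sample covariances as response.
iu = np.triu_indices(n)
A = np.column_stack([C[iu] for C in components])
w, _ = nnls(A, S[iu])            # constrained least squares; zero weights drop components

Sigma_hat = sum(wi * Ci for wi, Ci in zip(w, components))
print("selected components:", np.flatnonzero(w))
```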

2.
Variable and model selection are of major concern in many statistical applications, especially in high-dimensional regression models. Boosting is a convenient statistical method that combines model fitting with intrinsic model selection. We investigate the impact of base-learner specification on the performance of boosting as a model selection procedure. We show that variable selection may be biased if the covariates are of different types. Important examples are models combining continuous and categorical covariates, especially if the number of categories is large. In this case, least squares base-learners offer increased flexibility for the categorical covariate and lead to its preferential selection even if it is noninformative. Similar difficulties arise when comparing linear and nonlinear base-learners for a continuous covariate. The additional flexibility in the nonlinear base-learner again yields a preference for the more complex modeling alternative. We investigate these problems from a theoretical perspective and suggest a framework for bias correction based on a general class of penalized least squares base-learners. Making all base-learners comparable in terms of their degrees of freedom strongly reduces the selection bias observed in naive boosting specifications. The importance of unbiased model selection is demonstrated in simulations. Supplemental materials including an application to forest health models, additional simulation results, additional theorems, and proofs for the theorems are available online.
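A minimal sketch of the degrees-of-freedom argument, under assumptions not taken from the article: an unpenalized least squares base-learner for a categorical covariate has as many effective degrees of freedom as categories, while a linear base-learner for a continuous covariate has one, and a ridge-type penalty can be tuned so the two become comparable. The toy data, the target of one degree of freedom, and the use of scipy's brentq root finder are illustrative choices, not the article's exact penalization.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
n = 200

def hat_trace(X, lam=0.0):
    # Effective degrees of freedom of a (penalized) least squares base-learner:
    # df = trace( X (X'X + lam I)^{-1} X' ).
    XtX = X.T @ X
    return np.trace(X @ np.linalg.solve(XtX + lam * np.eye(X.shape[1]), X.T))

# Continuous covariate: a one-column linear base-learner has df = 1.
x_cont = rng.standard_normal((n, 1))
print("continuous base-learner df:", hat_trace(x_cont))

# Categorical covariate with 10 levels: the dummy-coded least squares base-learner
# has df = 10, which is why it is preferred even when noninformative.
levels = rng.integers(0, 10, size=n)
X_cat = np.eye(10)[levels]
print("unpenalized categorical df:", hat_trace(X_cat))

# Bias correction: shrink the categorical base-learner with a ridge penalty chosen
# so that its degrees of freedom match the simpler competitor (here, df = 1).
lam = brentq(lambda l: hat_trace(X_cat, l) - 1.0, 1e-8, 1e6)
print("penalized categorical df:", hat_trace(X_cat, lam), "with lambda =", round(lam, 2))
```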

3.
We propose an objective Bayesian approach to the selection of covariates and their penalized spline transformations in generalized additive models. The methodology is based on a combination of continuous mixtures of g-priors for model parameters and a multiplicity-correction prior for the models themselves. We introduce our approach in the normal model and extend it to nonnormal exponential families. A simulation study and an application with a binary outcome are provided. An efficient implementation is available in the R package hypergsplines. Supplementary materials for this article are available online.
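The sketch below is only loosely related to the article's machinery: it computes the Bayes factor against the intercept-only null under a fixed-g Zellner g-prior, once for a linear design and once for a hypothetical truncated-power spline design, whereas the article uses continuous mixtures of g-priors together with a multiplicity-correction model prior (available via hypergsplines). The simulated data, knot grid, and the unit-information choice g = n are all assumptions for illustration.

```python
import numpy as np

def gprior_log_bf(y, X, g=None):
    """Log Bayes factor of {intercept + X} against the intercept-only null under a
    fixed-g Zellner g-prior (flat priors on the intercept and the error scale).
    A fixed g is a simplification of the continuous mixtures used in the article."""
    n, p = X.shape
    if g is None:
        g = float(n)                              # unit-information choice
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    r2 = 1.0 - np.sum((yc - Xc @ beta) ** 2) / np.sum(yc ** 2)
    return 0.5 * (n - 1 - p) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1 - r2))

# Compare a linear effect of x with a linear-plus-spline (truncated cubic power
# basis) transformation of x, as a stand-in for the covariate/spline selection.
rng = np.random.default_rng(2)
n = 150
x = rng.uniform(-2, 2, n)
y = np.sin(1.5 * x) + 0.3 * rng.standard_normal(n)

X_lin = x[:, None]
knots = np.linspace(-1.5, 1.5, 6)
X_spl = np.column_stack([x] + [np.clip(x - k, 0, None) ** 3 for k in knots])

print("log BF, linear vs null: ", round(gprior_log_bf(y, X_lin), 1))
print("log BF, splines vs null:", round(gprior_log_bf(y, X_spl), 1))
```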

4.
Linearizable regression prediction models are fitted by linear regression after a change of variables; the dependent variable before and after the transformation exhibits heteroscedasticity, which lowers the precision of the quasi-linear regression parameter estimates. Using the GL algorithm, a stable least squares solution for this class of models is derived, improving the precision of the parameter estimates. An application example is given.
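As a generic illustration of the heteroscedasticity problem (the GL algorithm itself is not reproduced here), the sketch below fits a hypothetical model y = a·exp(b·x) by ordinary least squares on log y and then by one step of weighted least squares, with weights proportional to the squared fitted mean, which is the approximate inverse variance on the log scale.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = np.linspace(0.0, 3.0, n)
a_true, b_true, sigma = 2.0, 0.8, 0.4
y = a_true * np.exp(b_true * x) + sigma * rng.standard_normal(n)  # homoscedastic on the original scale

Xd = np.column_stack([np.ones(n), x])
z = np.log(y)                       # change of variables: log y ~ log a + b x + error / mean

# Step 1: naive ("quasi-linear") least squares on the transformed data.
coef0, *_ = np.linalg.lstsq(Xd, z, rcond=None)

# Step 2: the log transform makes the error variance roughly sigma^2 / mean^2,
# so reweight by the squared fitted mean and refit (one weighted least squares step).
mu_hat = np.exp(Xd @ coef0)
sw = mu_hat                          # sqrt of weights mu_hat**2
coef1, *_ = np.linalg.lstsq(Xd * sw[:, None], z * sw, rcond=None)

for name, c in [("naive log-scale LS", coef0), ("weighted LS", coef1)]:
    print(f"{name}: a_hat = {np.exp(c[0]):.3f}, b_hat = {c[1]:.3f}")
```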

5.
Sufficient dimension reduction (SDR) is a paradigm for reducing the dimension of the predictors without losing regression information. Most SDR methods require inverting the covariance matrix of the predictors. This hinders their use in the analysis of contemporary datasets where the number of predictors exceeds the available sample size and the predictors are highly correlated. To this end, by incorporating the seeded SDR idea and the sequential dimension-reduction framework, we propose an SDR method for high-dimensional data with correlated predictors. The performance of the proposed method is studied via extensive simulations. To demonstrate its use, an application to microarray gene expression data where the response is the production rate of riboflavin (vitamin B2) is presented.

6.
In this work, we apply variographic techniques from spatial statistics to the problem of model selection in local polynomial regression with multivariate data. These techniques permit selection of the kernel and smoothing matrix with less computational load and interpretation of the regularity of the regression function in different directions. Moreover, they may represent the only feasible alternative for problems of a certain dimensionality.
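One possible reading of the idea in code: compute empirical directional semivariograms of the response in predictor space; a faster rise in one direction signals less regularity there and argues for a smaller bandwidth in that direction. The toy data, lag grid, and angular tolerance below are assumptions, and the paper's actual kernel and smoothing-matrix selection rule is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(4)

# Bivariate toy data: the regression function varies faster in x1 than in x2,
# so an anisotropic bandwidth matrix would be appropriate.
n = 500
X = rng.uniform(0, 1, size=(n, 2))
y = np.sin(8 * np.pi * X[:, 0]) + np.sin(2 * np.pi * X[:, 1]) + 0.2 * rng.standard_normal(n)

def directional_semivariogram(X, y, direction, lags, tol=0.05, angle_tol=0.1):
    """Empirical semivariogram of y along one direction of the predictor space."""
    d = np.asarray(direction, float)
    d = d / np.linalg.norm(d)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    with np.errstate(invalid="ignore", divide="ignore"):
        cosang = np.abs(diff @ d) / np.where(dist > 0, dist, np.nan)
    sq = 0.5 * (y[:, None] - y[None, :]) ** 2
    gamma = []
    for h in lags:
        # keep pairs whose separation is near lag h and roughly aligned with the direction
        mask = (np.abs(dist - h) < tol) & (cosang > 1.0 - angle_tol)
        gamma.append(sq[mask].mean())
    return np.array(gamma)

lags = np.linspace(0.05, 0.3, 6)
print("semivariogram along x1:", directional_semivariogram(X, y, [1, 0], lags).round(3))
print("semivariogram along x2:", directional_semivariogram(X, y, [0, 1], lags).round(3))
# A faster rise along x1 indicates less regularity in that direction, suggesting
# a smaller bandwidth for x1 than for x2.
```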

7.
Testing for nonindependence among the residuals from a regression or time series model is a common approach to evaluating the adequacy of a fitted model. This idea underlies the familiar Durbin–Watson statistic, and previous works illustrate how the spatial autocorrelation among residuals can be used to test a candidate linear model. We propose here that a version of Moran's I statistic for spatial autocorrelation, applied to residuals from a fitted model, is a practical general tool for selecting model complexity under the assumption of iid additive errors. The "space" is defined by the independent variables, and the presence of significant spatial autocorrelation in residuals is evidence that a more complex model is needed to capture all of the structure in the data. An advantage of this approach is its generality, which results from the fact that no properties of the fitted model are used other than consistency. The problem of smoothing parameter selection in nonparametric regression is used to illustrate the performance of model selection based on residual spatial autocorrelation (RSA). In simulation trials comparing RSA with established selection criteria based on minimizing mean square prediction error, smooths selected by RSA exhibit fewer spurious features such as minima and maxima. In some cases, at higher noise levels, RSA smooths achieved a lower average mean square error than smooths selected by generalized cross-validation (GCV). We also briefly describe a possible modification of the method for non-iid errors having short-range correlations, for example, time-series errors or spatial data. Some other potential applications are suggested, including variable selection in regression models.
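A minimal sketch of the RSA idea, with several assumptions not taken from the article (k-nearest-neighbour weights in the covariate space, polynomial candidate fits, no formal significance calibration): compute Moran's I of the residuals for each candidate model and look for the complexity at which residual autocorrelation disappears.

```python
import numpy as np

rng = np.random.default_rng(5)

def morans_i(resid, W):
    """Moran's I of residuals given a spatial weight matrix W with zero diagonal."""
    e = resid - resid.mean()
    return (len(e) / W.sum()) * (e @ W @ e) / (e @ e)

# "Space" is defined by the independent variable: neighbours are the k nearest x's.
def knn_weights(x, k=5):
    n = len(x)
    d = np.abs(x[:, None] - x[None, :])
    np.fill_diagonal(d, np.inf)
    idx = np.argsort(d, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return W

n = 300
x = np.sort(rng.uniform(-2, 2, n))
y = np.sin(2 * x) + 0.3 * rng.standard_normal(n)
W = knn_weights(x, k=5)

# Candidate models of increasing complexity: polynomial fits of degree 1..6.
for deg in range(1, 7):
    resid = y - np.polyval(np.polyfit(x, y, deg), x)
    print(f"degree {deg}: Moran's I of residuals = {morans_i(resid, W):+.3f}")
# Large positive I flags leftover structure (underfitting); I near its null
# expectation of about -1/(n-1) suggests the residuals behave like iid noise.
```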

8.
Proposed by Tibshirani, the least absolute shrinkage and selection operator (LASSO) estimates a vector of regression coefficients by minimizing the residual sum of squares subject to a constraint on the ℓ1-norm of the coefficient vector. The LASSO estimator typically has one or more zero elements and thus shares characteristics of both shrinkage estimation and variable selection. In this article we treat the LASSO as a convex programming problem and derive its dual. Consideration of the primal and dual problems together leads to important new insights into the characteristics of the LASSO estimator and to an improved method for estimating its covariance matrix. Using these results we also develop an efficient algorithm for computing LASSO estimates which is usable even in cases where the number of regressors exceeds the number of observations. An S-Plus library based on this algorithm is available from StatLib.
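The sketch below is not the article's primal-dual algorithm; it is a generic cyclic coordinate-descent solver for the penalized (Lagrangian) form of the LASSO, included only to make the estimator concrete. It runs even when the number of regressors exceeds the number of observations, and the simulated data and penalty level are assumptions for the example.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=500):
    """Cyclic coordinate descent for (1/2)||y - X b||^2 + lam * ||b||_1, the
    Lagrangian form that corresponds to the l1-constrained problem in the abstract."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    r = y - X @ b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                     # remove coordinate j from the fit
            b[j] = soft_threshold(X[:, j] @ r, lam) / col_ss[j]
            r -= X[:, j] * b[j]                     # put the updated coordinate back
    return b

rng = np.random.default_rng(6)
n, p = 50, 100                                      # more regressors than observations
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.5 * rng.standard_normal(n)

b_hat = lasso_cd(X, y, lam=10.0)
print("indices of nonzero coefficients:", np.flatnonzero(b_hat))
```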

9.
An Application of Multiple Comparison Techniques to Model Selection
Akaike's information criterion (AIC) is widely used to estimate the best model from a given candidate set of parameterized probabilistic models. In this paper, considering the sampling error of AIC, a set of good models is constructed rather than choosing a single model. This set is called a confidence set of models, which includes the minimum-AIC model at an error rate smaller than the specified significance level. The result is given as a P-value for each model, from which the confidence set is immediately obtained. A variant of Gupta's subset selection procedure is devised, in which a standardized difference of AIC is calculated for every pair of models. The critical constants are computed by the Monte Carlo method, where the asymptotic normal approximation of AIC is used. The proposed method neither requires the full model nor assumes a hierarchical structure of models, and it has higher power than similar existing methods.
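A simplified sketch under stated assumptions: AIC is computed for a family of Gaussian polynomial regressions, and models whose AIC exceeds the minimum by less than a cutoff are retained as a rough confidence set. The fixed cutoff of 2 is only a placeholder; the article instead calibrates critical constants by Monte Carlo through a normal approximation of AIC and Gupta-type subset selection, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(7)

def gaussian_aic(y, X):
    """AIC of a Gaussian linear model with design X (maximum-likelihood variance)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * (np.log(2 * np.pi * rss / n) + 1) + 2 * (p + 1)  # +1 for the variance

# Candidate set: polynomial regressions of degree 0..6 on a quadratic truth.
n = 120
x = rng.uniform(-1, 1, n)
y = 1.0 + 2.0 * x - 1.5 * x ** 2 + 0.4 * rng.standard_normal(n)
designs = {d: np.vander(x, d + 1, increasing=True) for d in range(7)}

aics = {d: gaussian_aic(y, X) for d, X in designs.items()}
best = min(aics, key=aics.get)
print("minimum-AIC model: degree", best)

# Models whose AIC difference from the minimum falls below a critical constant
# form the confidence set; the constant below is an ad hoc placeholder.
cutoff = 2.0
conf_set = sorted(d for d, a in aics.items() if a - aics[best] <= cutoff)
print("confidence set of models (degrees):", conf_set)
```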

10.
孙道德 《大学数学》2001,17(5):45-49
For linear regression model selection, many methods are introduced in [1], all of which are based on selection criteria built from the residual sum of squares. This paper instead develops a method based on the theory of parameter estimation, which we consider reasonable in view of the desirable properties of the parameter estimates. A computational procedure and an application example are also given.

11.
This paper presents a generalization of Rao's covariance structure. In a general linear regression model, we classify the error covariance structure into several categories and investigate the efficiency of the ordinary least squares estimator (OLSE) relative to the Gauss–Markov estimator (GME). The classification criterion considered here is the rank of the covariance matrix of the difference between the OLSE and the GME. Hence our classification includes Rao's covariance structure. The results are applied to models with special structures: a general multivariate analysis of variance model, a seemingly unrelated regression model, and a serial correlation model.
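A small numerical sketch under an assumed AR(1) serial correlation structure: compute the covariance matrices of the OLSE and the GME, form their difference (which equals Cov(OLSE − GME) because the GME is best linear unbiased), and report its rank, the classification criterion described above. The design, sample size, and correlation parameter are illustrative choices.

```python
import numpy as np

def estimator_covariances(X, Sigma):
    """Covariances of the OLSE and the GME (GLS) in y = X beta + e with Cov(e) = Sigma."""
    XtX_inv = np.linalg.inv(X.T @ X)
    cov_ols = XtX_inv @ X.T @ Sigma @ X @ XtX_inv
    cov_gme = np.linalg.inv(X.T @ np.linalg.solve(Sigma, X))
    return cov_ols, cov_gme

rng = np.random.default_rng(8)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])

# Serial correlation structure: AR(1) error covariance with parameter rho.
rho = 0.6
Sigma = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

cov_ols, cov_gme = estimator_covariances(X, Sigma)
# Since the GME is BLUE, Cov(OLSE - GME) = Cov(OLSE) - Cov(GME); its rank (0..p)
# is the classification criterion, with rank 0 meaning the OLSE equals the GME.
diff = cov_ols - cov_gme
print("rank of Cov(OLSE - GME):", np.linalg.matrix_rank(diff, tol=1e-10))
print("relative efficiency (trace ratio):", round(np.trace(cov_gme) / np.trace(cov_ols), 4))
```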

12.
High-dimensional data are prevalent across many application areas, and generate an ever-increasing demand for statistical methods of dimension reduction, such as cluster and significance analysis. One application area that has recently received much interest is the analysis of microarray gene expression data.

The results of cluster analysis are open to subjective interpretation. To facilitate the objective inference of such analyses, we use flexible parameterizations of the cluster means, paired with model selection, to generate sparse and easy-to-interpret representations of each cluster. Model selection in cluster analysis is combinatorial in the numbers of clusters and data dimensions, and thus presents a computationally challenging task.

In this article we introduce a model selection method based on rate-distortion theory, which allows us to turn the combinatorial model selection problem into a fast and simultaneous selection across clusters. The method is also applicable to model selection in significance analysis.

We show that simultaneous model selection for cluster analysis generates objectively interpretable cluster models, and that the selection performance is competitive with a combinatorial search, at a fraction of the computational cost. Moreover, we show that the rate-distortion based significance analysis substantially increases the power compared with standard methods.

This article has supplementary material online.
