Similar Documents
20 similar documents retrieved (search time: 296 ms).
1.
We focus on Bayesian variable selection in regression models. One challenge is to search the huge model space adequately, while identifying high posterior probability regions. In the past decades, the main focus has been on the use of Markov chain Monte Carlo (MCMC) algorithms for these purposes. In this article, we propose a new computational approach based on sequential Monte Carlo (SMC), which we refer to as particle stochastic search (PSS). We illustrate PSS through applications to linear regression and probit models.

2.
Having the ability to work with complex models can be highly beneficial. However, complex models often have intractable likelihoods, so methods that involve evaluation of the likelihood function are infeasible. In these situations, the benefits of working with likelihood-free methods become apparent. Likelihood-free methods, such as parametric Bayesian indirect likelihood, which uses the likelihood of an alternative parametric auxiliary model, have been explored throughout the literature as a viable alternative when the model of interest is complex. One of these methods is the synthetic likelihood (SL), which uses a multivariate normal approximation of the distribution of a set of summary statistics. This article explores the accuracy and computational efficiency of the Bayesian version of the synthetic likelihood (BSL) approach in comparison with a competitor known as approximate Bayesian computation (ABC), as well as its sensitivity to tuning parameters and assumptions. We relate BSL to pseudo-marginal methods and propose an alternative SL that is an unbiased estimator of the exact SL when the summary statistics have a multivariate normal distribution. Several applications of varying complexity are considered to illustrate the findings of this article. Supplemental materials are available online. Computer code for implementing the methods on all examples is available at https://github.com/cdrovandi/Bayesian-Synthetic-Likelihood.
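To make the SL construction concrete, here is a minimal sketch of estimating a synthetic log-likelihood in Python; the function name and the user-supplied simulator `simulate_summaries` are illustrative assumptions, not part of the cited code.

```python
import numpy as np
from scipy.stats import multivariate_normal

def synthetic_loglik(theta, s_obs, simulate_summaries, n_sim=100, rng=None):
    """Estimate the synthetic log-likelihood at theta.

    `simulate_summaries(theta, rng)` is a hypothetical user-supplied function
    that simulates one dataset from the model of interest and returns its
    vector of summary statistics.
    """
    rng = np.random.default_rng() if rng is None else rng
    S = np.array([simulate_summaries(theta, rng) for _ in range(n_sim)])
    mu = S.mean(axis=0)               # sample mean of simulated summaries
    Sigma = np.cov(S, rowvar=False)   # sample covariance of simulated summaries
    # Multivariate normal approximation of the summary-statistic distribution
    return multivariate_normal(mu, Sigma, allow_singular=True).logpdf(s_obs)
```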

3.
The development of credit risk assessment models is often considered within a classification context. Recent studies on the development of classification models have shown that a combination of methods often provides improved classification results compared to a single-method approach. Within this context, this study explores the combination of different classification methods in developing efficient models for credit risk assessment. A variety of methods are considered in the combination, including machine learning approaches and statistical techniques. The results illustrate that combined models can outperform individual models for credit risk analysis. The analysis also covers important issues such as the impact of using different parameters for the combined models, the effect of attribute selection, as well as the effects of combining strong or weak models.

4.
Genome-wide association studies (GWAS) aim to assess relationships between single nucleotide polymorphisms (SNPs) and diseases. They are one of the most widely studied problems in genetics and have some peculiarities, given the large number of SNPs compared to the number of subjects in the study. Individuals might not be independent, especially in animal breeding studies or in studies of genetic diseases in isolated populations with highly inbred individuals. We propose a family-based GWAS model in a two-stage approach comprising a dimension reduction and a subsequent model selection. The first stage, in which the genetic relatedness between the subjects is taken into account, selects the promising SNPs. The second stage uses Bayes factors for comparison among all candidate models and a random search strategy for exploring the space of all regression models in a fully Bayesian approach. A simulation study shows that our approach is superior to the Bayesian lasso for model selection in this setting. We also illustrate its performance in a study on the beta-thalassemia disorder in an isolated population from Sardinia. Supplementary material describing the implementation of the method proposed in this article is available online.

5.
Mixtures of linear mixed models (MLMMs) are useful for clustering grouped data and can be estimated by likelihood maximization through the Expectation–Maximization algorithm. A suitable number of components is then conventionally determined by comparing different mixture models using penalized log-likelihood criteria such as the Bayesian information criterion. We propose fitting MLMMs with variational methods, which can perform parameter estimation and model selection simultaneously. We describe a variational approximation for MLMMs in which the variational lower bound is in closed form, allowing for fast evaluation, and we develop a novel variational greedy algorithm for model selection and learning of the mixture components. This approach handles algorithm initialization and returns a plausible number of mixture components automatically. In cases of weak identifiability of certain model parameters, we use hierarchical centering to reparameterize the model and show empirically that there is a gain in efficiency in variational algorithms similar to that in Markov chain Monte Carlo (MCMC) algorithms. Related to this, we prove that the approximate rate of convergence of variational algorithms by Gaussian approximation is equal to that of the corresponding Gibbs sampler, which suggests that reparameterizations can lead to improved convergence in variational algorithms just as in MCMC algorithms. Supplementary materials for the article are available online.

6.
We propose a novel “tree-averaging” model that uses an ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian ensemble trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with other ensemble methods, BET requires far fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides a variable selection criterion and an interpretation for its subset. We develop an efficient estimation procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplementary materials for this article are available online.

7.
Gaussian process models have been widely used in spatial statistics but face tremendous modeling and computational challenges for very large nonstationary spatial datasets. To address these challenges, we develop a Bayesian modeling approach using a nonstationary covariance function constructed from adaptively selected partitions. The partitioned nonstationary class allows one to knit together local covariance parameters into a valid global nonstationary covariance for prediction, where the local covariance parameters are estimated within each partition to reduce computational cost. To further facilitate the computations in local covariance estimation and global prediction, we use the full-scale covariance approximation (FSA) approach for the Bayesian inference of our model. One of our contributions is to model the partitions stochastically by embedding a modified treed partitioning process into the hierarchical model, which leads to automated partitioning and substantial computational benefits. We illustrate the utility of our method with simulation studies and the global Total Ozone Mapping Spectrometer (TOMS) data. Supplementary materials for this article are available online.

8.
An essential feature of longitudinal data is the existence of autocorrelation among observations from the same unit or subject. Two-stage random-effects linear models are commonly used to analyze longitudinal data. These models are not flexible enough, however, for exploring the underlying data structures and, especially, for describing time trends. Semiparametric models have been proposed recently to accommodate general time trends, but they do not provide a convenient way to explore interactions between time and other covariates, although such interactions exist in many applications. Moreover, semiparametric models require specifying the design matrix of the covariates (time excluded). We propose nonparametric models to resolve these issues. To fit the nonparametric models, we use multivariate adaptive regression splines (MARS) to estimate the mean curve and then apply an EM-like iterative procedure for covariance estimation. After giving a general model-building algorithm, we show how to design a fast version. We use both simulated and published data to illustrate the use of our proposed method.

9.
We consider Bayesian inference when priors and likelihoods are both available for inputs and outputs of a deterministic simulation model. This problem is fundamentally related to the issue of aggregating (i.e., pooling) expert opinion. We survey alternative strategies for aggregation, then describe computational approaches for implementing pooled inference for simulation models. Our approach (1) numerically transforms all priors to the same space, (2) uses log pooling to combine priors, and (3) then carries out standard Bayesian inference. We use importance sampling methods, including an iterative, adaptive approach that is more flexible and, in some instances, has less bias than a simpler alternative. Our exploratory examples are the first steps toward extending the approach to highly complex and even noninvertible models.
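As a minimal sketch of the pooling step, assuming logarithmic (geometric) pooling of prior densities with fixed weights; the function names and the self-normalized importance-sampling wrapper are illustrative assumptions, not the article's implementation.

```python
import numpy as np

def log_pooled_prior(theta, log_priors, weights):
    """Logarithmic (geometric) pooling: p(theta) is proportional to prod_i p_i(theta)^{w_i}.

    `log_priors` is a list of callables returning log p_i(theta); `weights`
    are nonnegative pooling weights (typically summing to one).
    """
    return sum(w * lp(theta) for w, lp in zip(weights, log_priors))

def pooled_draws(draw_from_p0, log_priors, weights, n=10_000, rng=None):
    """Self-normalized importance sampling from the pooled prior, using the
    first expert prior p_0 as the proposal distribution."""
    rng = np.random.default_rng() if rng is None else rng
    thetas = [draw_from_p0(rng) for _ in range(n)]
    log_w = np.array([log_pooled_prior(t, log_priors, weights) - log_priors[0](t)
                      for t in thetas])
    w = np.exp(log_w - log_w.max())   # stabilize before normalizing
    return np.asarray(thetas), w / w.sum()
```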

10.
We consider the problem of learning the structure of a pairwise graphical model over continuous and discrete variables. We present a new pairwise model for graphical models with both continuous and discrete variables that is amenable to structure learning. Previous work has considered structure learning for Gaussian graphical models and for discrete models; our approach is a natural generalization of these two lines of work to the mixed case. The penalization scheme involves a novel symmetric use of the group-lasso norm and follows naturally from a particular parameterization of the model. Supplementary materials for this article are available online.
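For orientation, a generic group-lasso-penalized estimator for a pairwise model can be written as below; this is the standard form of such penalties, and the exact parameterization and grouping used in the article may differ.

\[
\hat{\Theta} \;=\; \arg\min_{\Theta}\; -\,\ell(\Theta) \;+\; \lambda \sum_{s<t} \bigl\lVert \Theta_{st} \bigr\rVert_2 ,
\]

where \(\Theta_{st}\) collects all parameters coupling variables \(s\) and \(t\) (a scalar for a continuous–continuous pair, a vector or matrix when a discrete variable is involved), so that an entire edge is selected or removed as a group.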

11.
This article suggests a method for variable and transformation selection based on posterior probabilities. Our approach allows for consideration of all possible combinations of untransformed and transformed predictors, along with transformed and untransformed versions of the response. To transform the predictors in the model, we use a change-point model, or “change-point transformation,” which can yield more interpretable models and transformations than the standard Box–Tidwell approach. We also address the problem of model uncertainty in the selection of models. By averaging over models, we account for the uncertainty inherent in inference based on a single model chosen from the set of models under consideration. We use a Markov chain Monte Carlo model composition (MC3) method, which allows us to average over linear regression models when the space of models under consideration is very large and to consider the selection of variables and transformations at the same time. In an example, we show that model averaging improves predictive performance compared with any single model that might reasonably be selected, both in terms of overall predictive score and in terms of coverage of prediction intervals. Software to apply the proposed methodology is available via StatLib.
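For reference, the averaging step referred to above is the standard Bayesian model averaging identity (not specific to this article): for a quantity of interest \(\Delta\) and candidate models \(M_k\),

\[
p(\Delta \mid D) \;=\; \sum_{k} p(\Delta \mid M_k, D)\, p(M_k \mid D),
\qquad
p(M_k \mid D) \;=\; \frac{p(D \mid M_k)\, p(M_k)}{\sum_{j} p(D \mid M_j)\, p(M_j)} ,
\]

with MC3 providing the stochastic search over the model space when the sums cannot be enumerated.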

12.
Inference for spatial generalized linear mixed models (SGLMMs) for high-dimensional non-Gaussian spatial data is computationally intensive. The computational challenge is due to the high-dimensional random effects and to the fact that Markov chain Monte Carlo (MCMC) algorithms for these models tend to be slow mixing. Moreover, spatial confounding inflates the variance of fixed-effect (regression coefficient) estimates. Our approach addresses both the computational and confounding issues by replacing the high-dimensional spatial random effects with a reduced-dimensional representation based on random projections. Standard MCMC algorithms mix well, and the reduced-dimensional setting speeds up computations per iteration. We show, via simulated examples, that Bayesian inference for this reduced-dimensional approach works well in terms of both inference and prediction; our methods also compare favorably to existing “reduced-rank” approaches. We also apply our methods to two real-world data examples, one on bird count data and the other on classifying rock types. Supplementary material for this article is available online.
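As an illustration of the dimension-reduction idea, here is a minimal randomized-projection sketch in Python for building a reduced-rank basis from an n x n spatial covariance; the exact projection used in the article may differ, and the names are illustrative.

```python
import numpy as np

def random_projection_basis(C, m, rng=None):
    """Reduced-rank basis for an n x n spatial covariance matrix C.

    A minimal randomized sketch: project C onto m Gaussian directions and
    orthonormalize, giving a basis Q so that the n-dimensional spatial random
    effect w can be approximated by Q @ delta with a low-dimensional delta.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = C.shape[0]
    Omega = rng.standard_normal((n, m))   # random test matrix
    Q, _ = np.linalg.qr(C @ Omega)        # orthonormal basis of the projected range
    return Q                              # n x m, with m much smaller than n
```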

13.
Variational approximations have the potential to scale Bayesian computations to large datasets and highly parameterized models. Gaussian approximations are popular, but can be computationally burdensome when an unrestricted covariance matrix is employed and the dimension of the model parameter is high. To circumvent this problem, we consider a factor covariance structure as a parsimonious representation. General stochastic gradient ascent methods are described for efficient implementation, with gradient estimates obtained using the so-called “reparameterization trick.” The end result is a flexible and efficient approach to high-dimensional Gaussian variational approximation. We illustrate using robust P-spline regression and logistic regression models. For the latter, we consider eight real datasets, including datasets with many more covariates than observations, and another with mixed effects. In all cases, our variational method provides fast and accurate estimates. Supplementary material for this article is available online.
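A minimal sketch of a factor-covariance “reparameterization trick” draw, assuming the parameterization Sigma = B Bᵀ + diag(d)²; the function name is an illustrative assumption rather than the article's code.

```python
import numpy as np

def sample_factor_gaussian(mu, B, d, rng=None):
    """One reparameterized draw from N(mu, B @ B.T + np.diag(d)**2).

    theta = mu + B z + d * eps, with z ~ N(0, I_k) and eps ~ N(0, I_p), so
    gradients of an expected log-posterior with respect to (mu, B, d) can be
    estimated by differentiating through the draw.
    """
    rng = np.random.default_rng() if rng is None else rng
    p, k = B.shape
    z = rng.standard_normal(k)     # factor-level noise
    eps = rng.standard_normal(p)   # idiosyncratic noise
    return mu + B @ z + d * eps
```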

14.
In count data regression there can be several problems that prevent the use of the standard Poisson log-linear model: overdispersion caused by unobserved heterogeneity or correlation, an excess of zeros, nonlinear effects of continuous covariates or of time scales, and spatial effects. We develop Bayesian count data models that can deal with these issues simultaneously and within a unified inferential approach. Models for overdispersed or zero-inflated data are combined with semiparametrically structured additive predictors, resulting in a rich class of count data regression models. Inference is fully Bayesian and is carried out by computationally efficient MCMC techniques. Simulation studies investigate performance, in particular how well different model components can be identified. Applications to patent data and to data from a car insurance company illustrate the potential and, to some extent, the limitations of our approach.
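As a small illustration of one model component mentioned above, here is a sketch of a zero-inflated Poisson log-likelihood in Python; the structured additive predictors and MCMC machinery of the article are not reproduced, and the names are illustrative.

```python
import numpy as np
from scipy.stats import poisson

def zip_loglik(y, lam, pi):
    """Log-likelihood of a zero-inflated Poisson model:
    P(Y=0) = pi + (1 - pi) * exp(-lam),  P(Y=k) = (1 - pi) * Poisson(k; lam), k >= 1.

    `lam` and `pi` may be scalars or arrays matched to the counts `y`, e.g.
    lam = exp(structured additive predictor) in a regression setting.
    """
    y = np.asarray(y)
    lam = np.broadcast_to(lam, y.shape).astype(float)
    pi = np.broadcast_to(pi, y.shape).astype(float)
    ll_zero = np.log(pi + (1.0 - pi) * np.exp(-lam))
    ll_pos = np.log(1.0 - pi) + poisson.logpmf(y, lam)
    return float(np.where(y == 0, ll_zero, ll_pos).sum())
```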

15.
In model-based analysis for comparative evaluation of strategies for disease treatment and management, the model of the disease is arguably the most critical element. A fundamental challenge in identifying model parameters arises from the limitations of available data, which challenges the ability to uniquely link model parameters to calibration targets. Consequently, the calibration of disease models leads to the discovery of multiple models that are similarly consistent with available data. This phenomenon is known as calibration uncertainty and its effect is transferred to the results of the analysis. Insufficient examination of the breadth of potential model parameters can create a false sense of confidence in the model recommendation, and ultimately cast doubt on the value of the analysis. This paper introduces a systematic approach to the examination of calibration uncertainty and its impact. We begin with a model of the calibration process as a constrained optimization problem and introduce the notion of plausible models which define the uncertainty region for model parameters. We illustrate the approach using a fictitious disease, and explore various methods for interpreting the outputs obtained.

16.
Some Problems in Linear Regression Diagnostics
This paper proposes several new models and methods for linear regression diagnostics. We are the first to study a mixed model that combines variance weighting and mean shift, and we derive the corresponding diagnostic statistics. We also introduce a penalty-function method and, using it as a tool, discuss influence measures for several biased estimators. Finally, we propose a centroid-based diagnostic statistic that performs well at identifying outliers.

17.
In profile monitoring, the quality characteristic of a product or process is represented by a specific functional relationship. If the functional form of the profile is known, parametric methods can be used to monitor it. When the profile shape is complex, however, continuing to use parametric methods may fail to correctly identify abnormal profiles because of model misspecification. This paper therefore proposes a new method based on nonparametric regression for the complex profile monitoring problems commonly encountered in manufacturing. The proposed method combines nonparametric B-spline regression with an iterative cluster analysis procedure and requires no restrictive assumptions on the form of the profile. A simulation study evaluates the performance of the monitoring method under different types of variation, and comparisons with existing methods verify its effectiveness and advantages. Finally, a classical case from the profile monitoring literature illustrates the practical application of the new method.

18.
Stochastic blockmodels and variants thereof are among the most widely used approaches to community detection for social networks and relational data. A stochastic blockmodel partitions the nodes of a network into disjoint sets, called communities. The approach is inherently related to clustering with mixture models, and it raises a similar model selection problem for the number of communities. The Bayesian information criterion (BIC) is a popular solution; however, for stochastic blockmodels, the assumption that different edges are conditionally independent given the communities of their endpoints is usually violated in practice. In this regard, we propose the composite likelihood BIC (CL-BIC) to select the number of communities, and we show it is robust against possible misspecifications in the underlying stochastic blockmodel assumptions. We derive the requisite methodology and illustrate the approach using both simulated and real data. Supplementary materials containing the relevant computer code are available online.

19.
The Minimum Classification Error (MCE) criterion is a well-known criterion in pattern classification systems. The aim of MCE training is to minimize the resulting classification error when classifying a new data set. Usually, these classification systems use some form of statistical model to describe the data, and they tend not to work very well when this underlying model is incorrect. Speech recognition systems traditionally use Hidden Markov Models (HMMs) with Gaussian (or Gaussian mixture) probability density functions as their basic model. It is well known that these models make some assumptions that are not correct. In example-based approaches, these statistical models are absent and are replaced by the pure data. The absence of statistical models creates the need for parameters that model the data space accurately. In this work, we use the MCE criterion to create a system that can work together with this example-based approach. Moreover, we extend the locally scaled distance measure with sparse, block-diagonal weight matrices, resulting in a better model for the data space and avoiding the computational load caused by using full matrices. We illustrate the approach with example experiments on databases from pattern recognition and from speech recognition.

20.
We consider the standard one-way ANOVA model; it is well known that classical statistical procedures are based on a scalar non-centrality parameter. In this paper we explore both the marginal likelihood and the integrated likelihood functions for this parameter, and we show that they lead to exactly the same answer. On the other hand, we prove that a fully Bayesian testing procedure may provide different conclusions, depending on what is considered the real quantity of interest in the model or, put differently, on what the competing hypotheses are. We illustrate these issues via a real data example.
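For reference, in the one-way ANOVA model \(y_{ij} = \mu_i + \varepsilon_{ij}\), \(\varepsilon_{ij} \sim N(0, \sigma^2)\), with group sizes \(n_i\), \(i = 1, \dots, k\), the scalar non-centrality parameter underlying the classical F test can be written as (a standard textbook expression, not quoted from the article)

\[
\lambda \;=\; \frac{1}{\sigma^2}\sum_{i=1}^{k} n_i\,(\mu_i - \bar{\mu})^2,
\qquad
\bar{\mu} \;=\; \frac{\sum_i n_i\,\mu_i}{\sum_i n_i}.
\]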
