Similar literature
20 similar articles found.
1.
In count data regression, several problems can prevent the use of the standard Poisson log-linear model: overdispersion caused by unobserved heterogeneity or correlation, excess zeros, non-linear effects of continuous covariates or of time scales, and spatial effects. We develop Bayesian count data models that can deal with these issues simultaneously and within a unified inferential approach. Models for overdispersed or zero-inflated data are combined with semiparametrically structured additive predictors, resulting in a rich class of count data regression models. Inference is fully Bayesian and is carried out by computationally efficient MCMC techniques. Simulation studies investigate performance, in particular how well different model components can be identified. Applications to patent data and to car insurance data illustrate the potential and, to some extent, the limitations of our approach. Copyright © 2006 John Wiley & Sons, Ltd.
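
As a small illustration of the two features named in this abstract, the sketch below evaluates a zero-inflated Poisson (ZIP) distribution and its mean and variance. It is not the authors' Bayesian model; the function names and the parameters `lam` (Poisson rate) and `pi` (extra zero probability) are assumptions chosen for illustration.

```python
import math

def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson.

    Assumed parametrization: with probability pi the count is a structural
    zero, otherwise it is drawn from Poisson(lam).
    """
    poisson = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    return pi * (k == 0) + (1.0 - pi) * poisson

def zip_mean_var(lam, pi):
    """Closed-form mean and variance of the ZIP distribution."""
    mean = (1.0 - pi) * lam
    var = mean * (1.0 + pi * lam)  # exceeds the mean whenever pi > 0
    return mean, var
```

Because the variance equals the mean times `1 + pi * lam`, any positive zero-inflation probability makes the counts overdispersed relative to a plain Poisson model.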

2.
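To improve the accuracy of port cargo throughput forecasts, a combined forecasting model based on ARIMAX-SVR is established. Taking Tianjin Port as an example, cargo throughput data from 1999 to 2018 are analyzed. First, a BP neural network is used to impute the missing data; Pearson correlation analysis is then used to screen the main factors affecting cargo throughput. An ARIMAX model is then built on the basis of the ARIMA model and, to further improve accuracy, an ARIMAX model corrected by an SVR model is finally established. The empirical results show that the combined model fits more accurately and forecasts better, is suitable for port throughput forecasting, and is to some degree more advanced than the individual models.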

3.
In this article, we propose and explore a multivariate logistic regression model for analyzing multiple binary outcomes with incomplete covariate data where auxiliary information is available. The auxiliary data are extraneous to the regression model of interest but predictive of the covariate with missing data. Horton and Laird [N.J. Horton, N.M. Laird, Maximum likelihood analysis of logistic regression models with incomplete covariate data and auxiliary information, Biometrics 57 (2001) 34–42] describe how the auxiliary information can be incorporated into a regression model for a single binary outcome with missing covariates, and hence the efficiency of the regression estimators can be improved. We consider extending the method of [9] to the case of a multivariate logistic regression model for multiple correlated outcomes, and with missing covariates and completely observed auxiliary information. We demonstrate that in the case of moderate to strong associations among the multiple outcomes, one can achieve considerable gains in efficiency from estimators in a multivariate model as compared to the marginal estimators of the same parameters.

4.
Mortality improvements pose a challenge for the planning of public retirement systems as well as for the private life annuities business. For public policy, as well as for the management of financial institutions, it is important to forecast future mortality rates. Standard models for mortality forecasting assume that the force of mortality at age $x$ in calendar year $t$ is of the form $\exp(\alpha_x + \beta_x \kappa_t)$. The log of the time series of age-specific death rates is thus expressed as the sum of an age-specific component $\alpha_x$ that is independent of time and another component that is the product of a time-varying parameter $\kappa_t$, reflecting the general level of mortality, and an age-specific component $\beta_x$ that represents how rapidly or slowly mortality at each age varies when the general level of mortality changes. The parameters are usually estimated via singular value decomposition or via maximum likelihood in a binomial or Poisson regression model. This paper demonstrates that it is possible to take into account the overdispersion present in the mortality data by estimating the parameters in a negative binomial regression model. Copyright © 2007 John Wiley & Sons, Ltd.
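
The log-bilinear form described in this abstract (log death rate = age effect + age sensitivity × period index) can be sketched as follows. This is only the classical least-squares route mentioned in the abstract, not the negative binomial estimator the paper proposes; the function name, the use of power iteration to obtain the leading singular pair, and the identification constraint that the age sensitivities sum to one are assumptions made for illustration.

```python
import math

def lee_carter_fit(log_rates, iters=200):
    """Fit log_rates[x][t] ~ alpha[x] + beta[x] * kappa[t] by least squares.

    alpha is the row (age) mean; beta and kappa come from the leading
    singular pair of the centred matrix, found here by power iteration.
    Assumed normalisation: sum(beta) == 1 (requires sum(beta) != 0).
    """
    n_ages, n_years = len(log_rates), len(log_rates[0])
    alpha = [sum(row) / n_years for row in log_rates]
    resid = [[log_rates[x][t] - alpha[x] for t in range(n_years)]
             for x in range(n_ages)]

    # Start from the column sums of the centred matrix.
    kappa = [sum(resid[x][t] for x in range(n_ages)) for t in range(n_years)]
    for _ in range(iters):
        beta = [sum(resid[x][t] * kappa[t] for t in range(n_years))
                for x in range(n_ages)]
        norm = math.sqrt(sum(b * b for b in beta))
        if norm == 0.0:
            raise ValueError("centred matrix has no rank-one structure to fit")
        beta = [b / norm for b in beta]
        kappa = [sum(resid[x][t] * beta[x] for x in range(n_ages))
                 for t in range(n_years)]

    # Rescale so that the age sensitivities sum to one.
    scale = sum(beta)
    beta = [b / scale for b in beta]
    kappa = [k * scale for k in kappa]
    return alpha, beta, kappa
```

Because `alpha` is taken as the row mean, the fitted period index automatically averages to zero over the observed years, which is the other constraint commonly imposed to identify this model.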

5.
Regression density estimation is the problem of flexibly estimating a response distribution as a function of covariates. An important approach to regression density estimation uses finite mixture models and our article considers flexible mixtures of heteroscedastic regression (MHR) models where the response distribution is a normal mixture, with the component means, variances, and mixture weights all varying as a function of covariates. Our article develops fast variational approximation (VA) methods for inference. Our motivation is that alternative computationally intensive Markov chain Monte Carlo (MCMC) methods for fitting mixture models are difficult to apply when it is desired to fit models repeatedly in exploratory analysis and model choice. Our article makes three contributions. First, a VA for MHR models is described where the variational lower bound is in closed form. Second, the basic approximation can be improved by using stochastic approximation (SA) methods to perturb the initial solution to attain higher accuracy. Third, the advantages of our approach for model choice and evaluation compared with MCMC-based approaches are illustrated. These advantages are particularly compelling for time series data where repeated refitting for one-step-ahead prediction in model choice and diagnostics and in rolling-window computations is very common. Supplementary materials for the article are available online.

6.
In this paper we consider the estimation of the error distribution in a heteroscedastic nonparametric regression model with multivariate covariates. As the estimator we consider the empirical distribution function of residuals, which are obtained from multivariate local polynomial fits of the regression and variance functions, respectively. Weak convergence of the empirical residual process to a Gaussian process is proved. We also consider various applications for testing model assumptions in nonparametric multiple regression. The model tests obtained are able to detect local alternatives that converge to zero at an $n^{-1/2}$ rate, independent of the covariate dimension. We consider in detail a test for additivity of the regression function.

7.
Conditional simulation is useful in connection with inference and prediction for a generalized linear mixed model. We consider random walk Metropolis and Langevin-Hastings algorithms for simulating the random effects given the observed data, when the joint distribution of the unobserved random effects is multivariate Gaussian. In particular we study the desirable property of geometric ergodicity, which ensures the validity of central limit theorems for Monte Carlo estimates.

8.
In this paper, we consider the problem of estimating the high-dimensional precision matrix of a Gaussian graphical model. Taking advantage of the connection between multivariate linear regression and the entries of the precision matrix, we propose combining the Bayesian Lasso with a neighborhood regression estimate for the Gaussian graphical model. This method performs parameter estimation and model selection simultaneously. Moreover, the proposed method can provide symmetric confidence intervals for all entries of the precision matrix.

9.
In multivariate time series analysis, dynamic principal component analysis (DPCA) is an effective method for dimensionality reduction. DPCA is an extension of the original PCA method which can be applied to an autocorrelated dynamic process. In this paper, we apply DPCA to a set of real oil data and use the principal components as covariates in condition-based maintenance (CBM) modeling. The CBM model (Model 1) is then compared with the CBM model which uses raw oil data as the covariates (Model 2). It is shown that the average maintenance cost corresponding to the optimal policy for Model 1 is considerably lower than that for Model 2, and when the optimal policies are applied to the oil data histories, the policy for Model 1 correctly indicates almost twice as many impending system failures as the policy for Model 2.

10.
Time series of counts have a wide variety of applications in real life. Analyzing time series of counts requires accommodations for serial dependence, discreteness, and overdispersion of data. In this paper, we extend blockwise empirical likelihood (Kitamura, 1997 [15]) to the analysis of time series of counts under a regression setting. In particular, our contribution is the extension of Kitamura’s (1997) [15] method to the analysis of nonstationary time series. Serial dependence among observations is treated nonparametrically using a blocking technique, and overdispersion in count data is accommodated by the specification of a variance-mean relationship. We establish consistency and asymptotic normality of the maximum blockwise empirical likelihood estimator. Simulation studies show that our method has good finite-sample performance. The method is also illustrated by analyzing two real data sets: monthly counts of poliomyelitis cases in the USA and daily counts of non-accidental deaths in Toronto, Canada.

11.
This paper presents an extension of the standard regression tree method to clustered data. Previous works extending tree methods to accommodate correlated data are mainly based on the multivariate repeated-measures approach. We propose a “mixed effects regression tree” method where the correlated observations are viewed as nested within clusters rather than as vectors of multivariate repeated responses. The proposed method can handle unbalanced clusters, allows observations within clusters to be split, and can incorporate random effects and observation-level covariates. We implemented the proposed method using a standard tree algorithm within the framework of the expectation-maximization (EM) algorithm. The simulation results show that the proposed regression tree method provides substantial improvements over standard trees when the random effects are non-negligible. A real data example is used to illustrate the method.

12.
Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of overdispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly because they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data. Supplementary materials for this article are available online.

13.
We propose a new variational Bayes (VB) estimator for high-dimensional copulas with discrete, or a combination of discrete and continuous, margins. The method is based on a variational approximation to a tractable augmented posterior and is faster than previous likelihood-based approaches. We use it to estimate drawable vine copulas for univariate and multivariate Markov ordinal and mixed time series. These have dimension rT, where T is the number of observations and r is the number of series, and are difficult to estimate using previous methods. The vine pair-copulas are carefully selected to allow for heteroscedasticity, which is a feature of most ordinal time series data. When combined with flexible margins, the resulting time series models also allow for other common features of ordinal data, such as zero inflation, multiple modes, and under- or overdispersion. Using six example series, we illustrate both the flexibility of the time series copula models and the efficacy of the VB estimator for copulas of up to 792 dimensions and 60 parameters. This far exceeds the size and complexity of copula models for discrete data that can be estimated using previous methods. An online appendix and MATLAB code implementing the method are available as supplementary materials.

14.
We introduce graphical time series models for the analysis of dynamic relationships among variables in multivariate time series. The modelling approach is based on the notion of strong Granger causality and can be applied to time series with non-linear dependences. The models are derived from ordinary time series models by imposing constraints that are encoded by mixed graphs. In these graphs each component series is represented by a single vertex, directed edges indicate possible Granger-causal relationships between variables, and undirected edges are used to map the contemporaneous dependence structure. We introduce various notions of Granger-causal Markov properties and discuss the relationships among them and to other Markov properties that can be applied in this context. Examples of graphical time series models include nonlinear autoregressive models and multivariate ARCH models.

15.
Count data frequently exhibit overdispersion, zero inflation and even heavy-tailedness (the tail probabilities are non-negligible or decrease very slowly) in practical applications. Many models have been proposed for modelling count data with overdispersion and zero inflation, but heavy-tailedness has received less attention. The proposed model, a new integer-valued autoregressive process with generalized Poisson-inverse Gaussian innovations, is capable of capturing these features. The generalized Poisson-inverse Gaussian family is very flexible and includes the Poisson distribution, the Poisson-inverse Gaussian distribution, the discrete stable distribution and so on. Stationarity and ergodicity of this model are investigated and expressions for the marginal mean and variance are provided. Conditional maximum likelihood is used for estimating the parameters, and consistency and asymptotic normality of the estimators are presented. Further, we consider the h-step forecast and diagnostics for the proposed model. The proposed model is applied to three real data examples. In the first example, we consider the monthly number of polio cases, which shows that the proposed model can accommodate count data with excess zeros. Then, we illustrate the use of the proposed model through an application to the numbers of National Science Foundation fundings. Finally, we apply the proposed model to the numbers of transactions in 5-min intervals for the stock of the Empire District Electric Company. The second and third examples show that the proposed model performs well in modelling heavy-tailed count data.
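
The integer-valued autoregressive mechanism this abstract builds on can be sketched with a first-order process driven by binomial thinning. As a loudly stated substitution, the sketch uses plain Poisson innovations rather than the generalized Poisson-inverse Gaussian innovations of the paper; the function names, parameters and the Knuth-style Poisson sampler are all assumptions made for illustration.

```python
import math
import random

def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate_inar1(n, alpha, lam, seed=0):
    """Simulate X_t = alpha o X_{t-1} + eps_t, with eps_t ~ Poisson(lam).

    'alpha o X' is binomial thinning: each of the X previous counts
    survives independently with probability alpha (0 <= alpha < 1).
    """
    rng = random.Random(seed)
    x = poisson_draw(rng, lam / (1.0 - alpha))  # start near the stationary mean
    path = []
    for _ in range(n):
        survivors = sum(1 for _ in range(x) if rng.random() < alpha)
        x = survivors + poisson_draw(rng, lam)
        path.append(x)
    return path
```

With Poisson innovations the stationary mean is `lam / (1 - alpha)`; replacing `poisson_draw` with a heavier-tailed innovation sampler is the point at which a model of the kind described in the abstract would differ.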

16.
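Stock time series forecasting has important application prospects in economics and management, and underpins the success of many commercial and financial institutions. First, singular spectrum analysis is used to reconstruct the stock market time series, reducing noise and extracting the trend series. The C-C algorithm is then used to determine the embedding dimension and delay order of the series, the phase space of the series is reconstructed, and the learning matrix for the neural networks is generated. Next, Boosting and different neural network models are used to generate the individual members of a neural network ensemble. Finally, a semiparametric regression model with a penalty term is used to combine the members, with a genetic algorithm selecting the optimal smoothing parameter; this yields a neural network ensemble model for stock market forecasting based on a genetic algorithm and semiparametric regression. A case study on the opening price of the Shanghai Composite Index, compared with traditional time series analysis and other ensemble methods, shows that the method obtains more accurate forecasts. The computational results indicate that the method fully reflects the trend of the stock price time series and provides an effective method for financial time series forecasting.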
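
The phase-space reconstruction step in this abstract can be illustrated with a minimal delay-embedding sketch. It assumes the embedding dimension `dim` and delay `tau` have already been chosen (the abstract uses the C-C algorithm for that, which is not implemented here); the function name is hypothetical.

```python
def delay_embed(series, dim, tau):
    """Build delay vectors [x_i, x_{i+tau}, ..., x_{i+(dim-1)*tau}].

    Each row is one point of the reconstructed phase space; stacked
    together, the rows form a matrix that could serve as network input.
    """
    n_vectors = len(series) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for this dim and tau")
    return [[series[i + j * tau] for j in range(dim)]
            for i in range(n_vectors)]
```

For example, `delay_embed([1, 2, 3, 4, 5, 6], dim=3, tau=2)` yields the two delay vectors `[1, 3, 5]` and `[2, 4, 6]`.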

17.
Overdispersion in time series of counts is very common and has been well studied by many authors, but the opposite phenomenon of underdispersion may also be encountered in real applications and has received little attention. Motivated by the popularity of the generalized Poisson distribution in count regression models and of Poisson INGARCH models in time series analysis, we introduce a generalized Poisson INGARCH model, which can account for both overdispersion and underdispersion. Compared with the double Poisson INGARCH model, conditions for the existence and ergodicity of such a process are easily given. We analyze the autocorrelation structure and also derive expressions for moments of order 1 and 2. We consider the maximum likelihood estimators for the parameters and establish their consistency and asymptotic normality. We apply the proposed model to one overdispersed real example and one underdispersed real example, and the results indicate that the proposed methodology performs better than other conventional model-based methods in the literature.
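
To make the dispersion behaviour of the generalized Poisson distribution concrete, the sketch below evaluates its probability mass function and closed-form moments. It assumes Consul's standard parametrization with rate `lam` and dispersion `kappa`, which may differ from the parametrization used in the paper, and it handles only `0 <= kappa < 1`; the underdispersed case (`kappa < 0`) needs a truncated support and is deliberately left out.

```python
import math

def generalized_poisson_pmf(x, lam, kappa):
    """P(X = x) = lam * (lam + kappa*x)**(x-1) * exp(-(lam + kappa*x)) / x!

    Assumed parametrization (Consul); kappa = 0 gives the ordinary Poisson.
    Computed on the log scale to avoid overflow for large x.
    """
    if lam <= 0 or not 0.0 <= kappa < 1.0:
        raise ValueError("this sketch requires lam > 0 and 0 <= kappa < 1")
    rate = lam + kappa * x
    log_p = math.log(lam) + (x - 1) * math.log(rate) - rate - math.lgamma(x + 1)
    return math.exp(log_p)

def generalized_poisson_mean_var(lam, kappa):
    """Closed-form mean and variance under the same parametrization."""
    mean = lam / (1.0 - kappa)
    var = lam / (1.0 - kappa) ** 3
    return mean, var
```

The variance-to-mean ratio is `1 / (1 - kappa) ** 2`, so positive `kappa` gives overdispersion and negative `kappa` would give underdispersion, which is why a single extra parameter can cover both cases.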

18.
Variational approximations have the potential to scale Bayesian computations to large datasets and highly parameterized models. Gaussian approximations are popular, but can be computationally burdensome when an unrestricted covariance matrix is employed and the dimension of the model parameter is high. To circumvent this problem, we consider a factor covariance structure as a parsimonious representation. General stochastic gradient ascent methods are described for efficient implementation, with gradient estimates obtained using the so-called “reparameterization trick.” The end result is a flexible and efficient approach to high-dimensional Gaussian variational approximation. We illustrate using robust P-spline regression and logistic regression models. For the latter, we consider eight real datasets, including datasets with many more covariates than observations, and another with mixed effects. In all cases, our variational method provides fast and accurate estimates. Supplementary material for this article is available online.

19.
The covariate-adjusted regression model was initially proposed for situations where neither the predictors nor the response variables are directly observed, but both are distorted by some common observable covariates. In this paper, we investigate a covariate-adjusted nonparametric regression (CANR) model and consider the proposed model in a time series setting. We develop a two-step estimation procedure to estimate the regression function. The asymptotic properties of the proposed estimator are investigated under the -mixing conditions. Both real data and simulated examples are provided for illustration.

20.
We propose a probability model for random partitions in the presence of covariates. In other words, we develop a model-based clustering algorithm that exploits available covariates. The motivating application is predicting time to progression for patients in a breast cancer trial. We proceed by reporting a weighted average of the responses of clusters of earlier patients. The weights should be determined by the similarity of the new patient’s covariate with the covariates of patients in each cluster. We achieve the desired inference by defining a random partition model that includes a regression on covariates. Patients with similar covariates are a priori more likely to be clustered together. Posterior predictive inference in this model formalizes the desired prediction.

We build on product partition models (PPM). We define an extension of the PPM that includes a regression on covariates by adding to the cohesion function a new factor that increases the probability that experimental units with similar covariates are placed in the same cluster. We discuss implementations suitable for any combination of continuous, categorical, count, and ordinal covariates.

An implementation of the proposed model as an R package is available for download.
