Similar Literature
20 similar documents retrieved.
1.
Generalized additive models for location, scale and shape define a flexible, semi-parametric class of regression models for analyzing insurance data in which the exponential family assumption for the response is relaxed. This approach allows the actuary to include risk factors not only in the mean but also in other key parameters governing the claiming behavior, such as the degree of residual heterogeneity or the no-claim probability. In this broader setting, the Negative Binomial regression with cell-specific heterogeneity and the zero-inflated Poisson regression with cell-specific additional probability mass at zero are applied to model claim frequencies. New models for claim severities that can be applied either per claim or aggregated per year are also presented. Bayesian inference is based on efficient Markov chain Monte Carlo simulation techniques and allows for the simultaneous estimation of linear effects as well as of possible nonlinear effects, spatial variations and interactions between risk factors within the data set. To illustrate the relevance of this approach, a detailed case study is proposed based on the Belgian motor insurance portfolio studied in Denuit and Lang (2004).
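As a point of reference for the zero-inflated Poisson regression mentioned in this abstract, a minimal sketch of the ZIP probability mass function and its log-likelihood is given below. The parameter names (`pi` for the extra mass at zero, `lam` for the Poisson mean) are illustrative and not taken from the paper.

```python
import numpy as np
from scipy.stats import poisson

def zip_logpmf(y, lam, pi):
    """Log-pmf of a zero-inflated Poisson: with probability pi the count is a
    structural zero, otherwise it is Poisson(lam)."""
    y = np.asarray(y)
    logp_pois = poisson.logpmf(y, lam)
    # P(Y=0) = pi + (1-pi) e^{-lam};  P(Y=y) = (1-pi) Poisson(y; lam) for y > 0
    p_zero = pi + (1.0 - pi) * np.exp(-lam)
    return np.where(y == 0, np.log(p_zero), np.log(1.0 - pi) + logp_pois)

def zip_loglik(y, lam, pi):
    """Total log-likelihood of a sample under the ZIP model."""
    return zip_logpmf(y, lam, pi).sum()

# toy usage: many zeros plus a few positive counts
y = np.array([0, 0, 0, 0, 1, 0, 2, 0, 0, 3])
print(zip_loglik(y, lam=1.2, pi=0.4))
```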

2.
The purpose of this paper is to explore and compare the credibility premiums in generalized zero-inflated count models for panel data. Predictive premiums based on quadratic loss and exponential loss are derived. It is shown that the credibility premiums of the zero-inflated model allow for more flexibility in the prediction. Indeed, the future premiums not only depend on the number of past claims, but also on the number of insured periods with at least one claim. The model also offers another way of analysing the hunger for bonus phenomenon. The accident distribution is obtained from the zero-inflated distribution used to model the claims distribution, which can in turn be used to evaluate the impact of various credibility premiums on the reported accident distribution. This way of analysing the claims data gives another point of view on the research conducted on the development of statistical models for predicting accidents. A numerical illustration supports this discussion.
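To make the two loss functions concrete, one common formulation of the predictive premiums mentioned above is sketched below in generic notation; the symbols (past claim counts \(N_1,\dots,N_T\), risk-aversion coefficient \(c>0\)) are illustrative and not the paper's own notation.

```latex
% Credibility premium under quadratic loss: the posterior expected next-period claim count
P^{\mathrm{quad}}_{T+1} = \mathbb{E}\!\left[ N_{T+1} \mid N_1, \dots, N_T \right]

% Credibility premium under exponential loss with risk-aversion coefficient c > 0
P^{\mathrm{exp}}_{T+1} = \frac{1}{c} \log \mathbb{E}\!\left[ e^{\,c\,N_{T+1}} \mid N_1, \dots, N_T \right]
```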

3.
In many applications involving spatial point patterns, we find evidence of inhibition or repulsion. The most commonly used class of models for such settings is the Gibbs point processes. A recent alternative, at least to the statistical community, is the determinantal point process. Here, we examine model fitting and inference for both of these classes of processes in a Bayesian framework. While standard MCMC model fitting is available, the algorithms are complex and not always well behaved. We propose using approximate Bayesian computation (ABC) for such fitting. This approach becomes attractive because, though likelihoods are very challenging to work with for these processes, generation of realizations given parameter values is relatively straightforward. As a result, the ABC fitting approach is well-suited for these models. In addition, such simulation makes them well-suited for posterior predictive inference as well as for model assessment. We provide details for all of the above, along with a simulation investigation and an illustrative analysis of a point pattern of tree data exhibiting repulsion. R code and datasets are included in the supplementary material.
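The sketch below is not the authors' algorithm; it is a generic ABC rejection sampler for a repulsive pattern, using a Matérn type-II hard-core process as the easily simulated model and two simple summary statistics. The priors, tolerance, and summaries are all assumptions made for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)

def simulate_matern2(lam, r, rng):
    """Matern type-II hard-core process on the unit square: a repulsive
    pattern obtained by dependent thinning of a Poisson process."""
    n = rng.poisson(lam)
    pts = rng.uniform(size=(n, 2))
    if n < 2:
        return pts
    marks = rng.uniform(size=n)
    d = squareform(pdist(pts))
    np.fill_diagonal(d, np.inf)
    # a point survives if no neighbour within distance r carries a smaller mark
    keep = np.array([np.all(marks[d[i] < r] > marks[i]) for i in range(n)])
    return pts[keep]

def summaries(pts):
    """Summary statistics: number of points and mean nearest-neighbour distance."""
    n = len(pts)
    if n < 2:
        return np.array([n, 1.0])
    d = squareform(pdist(pts))
    np.fill_diagonal(d, np.inf)
    return np.array([n, d.min(axis=1).mean()])

def abc_rejection(obs_pts, n_sims=5000, tol=0.15, rng=rng):
    """Plain ABC rejection: draw (lam, r) from the prior, simulate a pattern,
    and keep draws whose summaries are close to those of the observed pattern."""
    s_obs = summaries(obs_pts)
    scale = np.where(s_obs == 0, 1.0, np.abs(s_obs))
    accepted = []
    for _ in range(n_sims):
        lam = rng.uniform(50, 300)      # prior on the underlying intensity
        r = rng.uniform(0.0, 0.1)       # prior on the hard-core radius
        s_sim = summaries(simulate_matern2(lam, r, rng))
        if np.max(np.abs(s_sim - s_obs) / scale) < tol:
            accepted.append((lam, r))
    return np.array(accepted)

# toy usage: treat one simulated pattern as the "observed" data
obs = simulate_matern2(lam=150, r=0.05, rng=np.random.default_rng(1))
post = abc_rejection(obs)
print(post.shape, post.mean(axis=0) if len(post) else "no acceptances")
```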

4.
A complex sequence of tests on components and the system is a part of many manufacturing processes. Statistical imperfect test and repair models can be used to derive the properties of such test sequences but require model parameters to be specified. We describe a technique for estimating such parameters from typical data that are available from past testing. A Gaussian mixture model is used to illustrate the approach and as a model that can represent the wide variety of statistical properties of test data, including outliers, multimodality and skewness. Model fitting was carried out using a Bayesian approach, implemented by MCMC. Copyright © 2011 John Wiley & Sons, Ltd.
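As a rough illustration of fitting a Gaussian mixture model by MCMC (not the paper's imperfect test and repair model), here is a minimal Gibbs sampler for a K-component univariate mixture with conjugate priors; the hyperparameter choices are illustrative only.

```python
import numpy as np
from scipy.stats import norm

def gibbs_gmm(y, n_iter=2000, K=2, seed=0):
    """Gibbs sampler for a K-component univariate Gaussian mixture with
    conjugate priors (Dirichlet weights, normal means, inverse-gamma variances)."""
    rng = np.random.default_rng(seed)
    # weakly informative hyperparameters, chosen for illustration
    m0, s0sq = y.mean(), 10.0 * y.var()
    a0, b0 = 2.0, y.var()
    alpha = np.ones(K)
    # initial values
    mu = rng.choice(y, K)
    sig2 = np.full(K, y.var())
    w = np.full(K, 1.0 / K)
    draws = []
    for _ in range(n_iter):
        # 1) latent component labels z_i | rest
        dens = w * norm.pdf(y[:, None], mu, np.sqrt(sig2))      # (n, K)
        z = np.array([rng.choice(K, p=p / p.sum()) for p in dens])
        # 2) mixture weights | z
        counts = np.bincount(z, minlength=K)
        w = rng.dirichlet(alpha + counts)
        for k in range(K):
            yk = y[z == k]
            nk = len(yk)
            # 3) component mean | z, variance  (normal-normal update)
            prec = 1.0 / s0sq + nk / sig2[k]
            mean = (m0 / s0sq + yk.sum() / sig2[k]) / prec
            mu[k] = rng.normal(mean, np.sqrt(1.0 / prec))
            # 4) component variance | z, mean  (inverse-gamma update)
            a_n = a0 + 0.5 * nk
            b_n = b0 + 0.5 * np.sum((yk - mu[k]) ** 2)
            sig2[k] = 1.0 / rng.gamma(a_n, 1.0 / b_n)
        draws.append((w.copy(), mu.copy(), sig2.copy()))
    return draws

# toy usage: bimodal, skewed-looking test data
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0, 1, 150), rng.normal(5, 0.5, 50)])
print(gibbs_gmm(y, n_iter=500)[-1])
```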

5.
This paper develops a Bayesian spike and slab model for zero-inflated count models which are commonly used in health economics. We account for model uncertainty and allow for model averaging in situations with many potential regressors. The proposed techniques are applied to a German data set analyzing the demand for health care. An accompanying package for the free statistical software environment R is provided.

6.
Customer satisfaction data collected by a large cellular phone service provider are to be used to evaluate and improve the quality of their service. For this purpose, we propose a Bayesian treatment of a joint-response chain graph relating partial assessments of specific aspects of quality to an overall assessment of the service quality. The resulting Bayesian model can be used to render basic geographical and temporal differentiation, allowing the company to undertake direct corrective actions. Both normal and binary models are considered for our customer satisfaction data and are compared with other currently used methods in the study of customer satisfaction. Copyright © 2011 John Wiley & Sons, Ltd.

7.
Count data with excess zeros are often encountered in many medical, biomedical and public health applications. In this paper, an extension of zero-inflated Poisson mixed regression models is presented for dealing with multilevel data sets, referred to as hierarchical mixture zero-inflated Poisson mixed regression models. A stochastic EM algorithm is developed for obtaining the ML estimates of the parameters of interest, and a model comparison is also considered for comparing models with different latent classes through the BIC criterion. An application to the analysis of count data from a Shanghai Adolescence Fitness Survey and a simulation study illustrate the usefulness and effectiveness of our methodologies.
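The following is a deliberately simplified sketch of a stochastic EM iteration for a plain zero-inflated Poisson model, with no covariates, random effects, or latent classes; it only illustrates the S-step (simulate the structural-zero indicators) and M-step (complete-data maximum likelihood) pattern referred to above.

```python
import numpy as np

def stochastic_em_zip(y, n_iter=200, seed=0):
    """Stochastic EM for a zero-inflated Poisson without covariates: latent
    structural-zero indicators are imputed by simulation in the S-step, then
    pi and lambda are updated by complete-data maximum likelihood."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    pi, lam = 0.5, max(y.mean(), 0.1)
    for _ in range(n_iter):
        # S-step: impute structural-zero indicators for the observed zeros
        p_struct = pi / (pi + (1.0 - pi) * np.exp(-lam))
        u = np.where(y == 0, rng.random(len(y)) < p_struct, False)
        # M-step: complete-data maximum likelihood updates
        pi = u.mean()
        lam = y[~u].sum() / max((~u).sum(), 1)
    return pi, lam

# toy usage: counts with excess zeros
rng = np.random.default_rng(1)
n = 1000
struct = rng.random(n) < 0.3
y = np.where(struct, 0, rng.poisson(2.0, n))
print(stochastic_em_zip(y))   # estimates of (pi, lambda), roughly near (0.3, 2.0)
```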

8.
Expert knowledge in the form of mathematical models can be considered sufficient statistics of all prior experimentation in the domain, embodying generic or abstract knowledge of it. When used in a probabilistic framework, such models provide a sound foundation for data mining, inference, and decision making under uncertainty. We describe a methodology for encapsulating knowledge in the form of ordinary differential equations (ODEs) in dynamic Bayesian networks (DBNs). The resulting DBN framework can handle both data and model uncertainty in a principled manner, can be used for temporal data mining with noisy and missing data, and can be used to re-estimate model parameters automatically using data streams. A standard assumption when performing inference in DBNs is that time steps are fixed. Generally, the time step chosen is small enough to capture the dynamics of the most rapidly changing variable. This can result in DBNs having a natural time step that is very short, leading to inefficient inference; this is particularly an issue for DBNs derived from ODEs and for systems where the dynamics are not uniform over time. We propose an alternative to the fixed time step inference used in standard DBNs. In our algorithm, the DBN automatically adapts the time step lengths to suit the dynamics in each step. The resulting system allows us to efficiently infer probable values of hidden variables using multiple time series of evidence, some of which may be sparse, noisy or incomplete. We evaluate our approach with a DBN based on a variant of the van der Pol oscillator, and demonstrate an example where it gives more accurate results than the standard approach, but using only one tenth the number of time steps. We also apply our approach to a real-world example in critical care medicine. By incorporating knowledge in the form of an existing ODE model, we have built a DBN framework for efficiently predicting individualised patient responses using the available bedside and lab data.
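The paper's adaptive-time-step DBN is not reproduced here; the short sketch below only integrates the van der Pol oscillator with an adaptive step-size ODE solver to illustrate the underlying intuition, namely that small steps are needed only where the dynamics change quickly, so a uniform fine grid is wasteful. The stiffness parameter `mu` and the solver settings are illustrative choices.

```python
import numpy as np
from scipy.integrate import solve_ivp

def van_der_pol(t, state, mu=5.0):
    """Van der Pol oscillator: alternates between slow drifts and fast jumps."""
    x, v = state
    return [v, mu * (1.0 - x**2) * v - x]

# Adaptive-step integration: the solver concentrates short steps where the
# dynamics change rapidly and takes long steps elsewhere, the same intuition
# behind adapting the DBN time-step length to the local dynamics.
sol = solve_ivp(van_der_pol, t_span=(0.0, 30.0), y0=[2.0, 0.0],
                method="LSODA", rtol=1e-6, atol=1e-9, dense_output=True)

steps = np.diff(sol.t)
print(f"{len(sol.t)} adaptive steps; shortest {steps.min():.2e}, longest {steps.max():.2e}")

# For comparison, a fixed grid fine enough for the fastest phase would need
# roughly (30.0 / steps.min()) points to cover the same interval.
print(f"equivalent fixed-step grid: ~{int(30.0 / steps.min())} points")
```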

9.
This paper discusses the utilization of specified Operations Research (OR) techniques and the application areas of OR in Taiwan's companies. Since the data solicited from the business organizations are represented by fuzzy linguistic terms, a linguistic approach is used to transform the data to total utility values for ranking. The results show that decision making with computers and statistical forecasting are the two most frequently used OR techniques, while Markovian decision processes and queuing models are the two least used ones. Regarding the application areas, manufacturing firms consider production whereas service companies consider finance as the major area for application. Both agree that human resource is the area receiving the least attention. With the linguistic approach of this study, more valuable results are derived.

10.
ZI (zero-inflated) data are data that contain an excess of zeros. Since the 1990s, ZI data have received increasingly broad attention across research fields and remain one of the active topics in data analysis. This paper first illustrates the practical relevance of ZI data through two examples, and then reviews the state of research and recent advances in ZI data analysis. It also gives a systematic introduction to various ZI data models, ZI longitudinal data models and their parameter estimation methods, as well as statistical diagnostics for ZI data, including some of the authors' recent work. Finally, several problems deserving further study are listed.

11.
Loss Given Default (LGD) is the loss borne by the bank when a customer defaults on a loan. LGD for unsecured retail loans is often found difficult to model. In the frequentist (non-Bayesian) two-step approach, two separate regression models are estimated independently, which is potentially problematic when trying to combine them to make predictions about LGD. The result is a point estimate of LGD for each loan. Alternatively, LGD can be modelled using Bayesian methods. In the Bayesian framework, one can build a single, hierarchical model instead of two separate ones, which makes this a more coherent approach. In this paper, Bayesian methods as well as the frequentist approach are applied to data on personal loans provided by a large UK bank. As expected, the posterior means of parameters produced using Bayesian methods are very similar to the frequentist estimates. The most important advantage of the Bayesian model is that it generates an individual predictive distribution of LGD for each loan. Potential applications of such distributions include the downturn LGD and the stressed LGD under Basel II.

12.
This paper presents first- and second-order statistical models of the total emergency medical service (EMS) demand rate using socio-economic, demographic and other characteristics of an area as the exogenous variables. Due to the multicollinear or non-orthogonal nature of the exogenous data, the resulting parameter estimates are compared with those derived from the ordinary least squares method and a related Bayesian approach. These models are shown to provide highly significant fits to the empirical data. It is suggested that such models possess considerable utility as a quantitative tool for evaluating and planning EMS systems.
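To illustrate why multicollinearity in the exogenous data matters for the comparison above, here is a minimal sketch contrasting ordinary least squares with the posterior mean under an independent zero-mean normal prior (numerically a ridge-type shrinkage estimator); the data, prior variance, and coefficients are made up for illustration and are not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic exogenous data with strong multicollinearity: x2 is nearly a
# copy of x1, mimicking correlated socio-economic covariates.
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
beta_true = np.array([1.0, 2.0, -1.0])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

# Ordinary least squares: (X'X)^{-1} X'y, unstable when X'X is near-singular.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Bayesian posterior mean under beta ~ N(0, tau^2 I) with known noise variance
# sigma^2: (X'X + (sigma^2/tau^2) I)^{-1} X'y, i.e. ridge-type shrinkage.
sigma2, tau2 = 1.0, 10.0
lam = sigma2 / tau2
beta_bayes = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print("OLS:     ", np.round(beta_ols, 3))
print("Bayesian:", np.round(beta_bayes, 3))
```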

13.
Bayesian hierarchical models have been used for smoothing splines, thin-plate splines, and L-splines. In analyzing high dimensional data sets, additive models and backfitting methods are often used. A full Bayesian analysis for such models may include a large number of random effects, many of which are not intuitive, so researchers typically use noninformative improper or nearly improper priors. We investigate propriety of the posterior for these cases. Our findings extend known results for normal linear mixed models to certain cases with Bayesian additive smoothing spline models. Supported by National Science Foundation grant SES-0351523 and by National Institutes of Health grants R01-CA100760 and R01-MH071418.

14.
A flexible Bayesian periodic autoregressive model is used for the prediction of quarterly and monthly time series data. As the unknown autoregressive lag order, the occurrence of structural breaks and their respective break dates are common sources of uncertainty, these are treated as random quantities within the Bayesian framework. Since no analytical expressions for the corresponding marginal posterior predictive distributions exist, a Markov Chain Monte Carlo approach based on data augmentation is proposed. Its performance is demonstrated in Monte Carlo experiments. Instead of resorting to a model selection approach by choosing a particular candidate model for prediction, a forecasting approach based on Bayesian model averaging is used in order to account for model uncertainty and to improve forecasting accuracy. For model diagnosis, a Bayesian sign test is introduced to compare the predictive accuracy of different forecasting models in terms of statistical significance. In an empirical application using monthly unemployment rates for Germany, the performance of the model averaging prediction approach is compared to those of model-selected Bayesian and classical (non)periodic time series models.
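The paper's MCMC-based averaging over periodic AR specifications is not reproduced here; the sketch below only shows the general model-averaging idea with plain AR(p) candidates weighted by BIC-based approximate posterior model probabilities. All function names, priors on models, and the toy data are assumptions for illustration.

```python
import numpy as np

def fit_ar(y, p):
    """Least-squares fit of an AR(p) model with intercept; returns the
    coefficients, residual variance, and an (up-to-a-constant) BIC value."""
    Y = y[p:]
    X = np.column_stack([np.ones(len(Y))] + [y[p - k: len(y) - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    sigma2 = (Y - X @ coef).var()
    n = len(Y)
    bic = n * np.log(sigma2) + (p + 1) * np.log(n)
    return coef, sigma2, bic

def bma_forecast(y, max_p=4):
    """One-step-ahead forecast averaged over AR(1..max_p) models, with weights
    proportional to exp(-BIC/2) as a rough stand-in for posterior model probabilities."""
    fits = [fit_ar(y, p) for p in range(1, max_p + 1)]
    bics = np.array([f[2] for f in fits])
    w = np.exp(-0.5 * (bics - bics.min()))
    w /= w.sum()
    preds = []
    for p, (coef, _, _) in enumerate(fits, start=1):
        x = np.concatenate([[1.0], y[::-1][:p]])   # intercept + last p values, most recent first
        preds.append(x @ coef)
    return float(np.dot(w, preds)), w

# toy usage: a noisy AR(2)-like monthly series
rng = np.random.default_rng(0)
y = np.zeros(300)
for t in range(2, 300):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.normal(scale=0.5)
forecast, weights = bma_forecast(y)
print(forecast, np.round(weights, 3))
```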

15.
Finite mixture models have been used to fit data from heterogeneous populations in many applications. The Expectation-Maximization (EM) algorithm is the most popular method for estimating the parameters of a finite mixture model, and a Bayesian approach is another method for fitting such models. However, the EM algorithm often converges to a local maximum region and is sensitive to the choice of starting points, while in the Bayesian approach the Markov Chain Monte Carlo (MCMC) chain sometimes converges to a local mode and has difficulty moving to another mode. Hence, in this paper we propose a new method that addresses this limitation of the EM algorithm, so that EM can estimate the parameters in the global maximum region, and we develop a more effective Bayesian approach, so that the MCMC chain moves from one mode to another more easily in the mixture model. Our approach combines simulated annealing (SA) with adaptive rejection Metropolis sampling (ARMS). Although SA is a well-known approach for detecting distinct modes, its limitation is the difficulty of choosing sequences of proper proposal distributions for a target distribution. Since ARMS uses a piecewise linear envelope function as the proposal distribution, we incorporate ARMS into the SA approach so that we can start from a more suitable proposal distribution and detect separate modes. As a result, we can detect the global maximum region and estimate the parameters in this region. We refer to this approach as ARMS annealing. By combining ARMS annealing with the EM algorithm and with the Bayesian approach, respectively, we obtain two approaches: an EM-ARMS annealing algorithm and a Bayesian-ARMS annealing approach. We compare these two approaches with the traditional EM algorithm alone and the Bayesian approach alone using simulation, showing that the two approaches are comparable to each other but perform better than either the EM algorithm alone or the Bayesian approach alone. Both detect the global maximum region well and estimate the parameters in this region. We demonstrate the advantage of our approaches using an example of a mixture of two Poisson regression models, which is used to analyze survey data on the number of charitable donations.
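ARMS annealing itself is not sketched here; the code below only shows bare simulated annealing (Metropolis moves with a cooling temperature) on a deliberately bimodal, made-up target, to illustrate how annealing helps a chain escape a minor mode and settle near the dominant one. The target density, proposal scale, and cooling schedule are all illustrative assumptions.

```python
import numpy as np

def log_target(x):
    """Unnormalized log-density of a bimodal two-component normal mixture,
    standing in for a multimodal mixture-model likelihood surface."""
    comps = np.log(np.array([0.7, 0.3])) - 0.5 * ((x - np.array([-3.0, 4.0])) / 0.8) ** 2
    return np.logaddexp(comps[0], comps[1])

def simulated_annealing(n_iter=5000, seed=0):
    """Metropolis moves on log_target / T with a geometric cooling schedule.
    At high temperature the chain crosses between modes; as T -> 1 it settles
    around the dominant mode."""
    rng = np.random.default_rng(seed)
    x = rng.normal()
    T = 10.0
    best_x, best_lp = x, log_target(x)
    for _ in range(n_iter):
        prop = x + rng.normal(scale=1.0)
        if np.log(rng.random()) < (log_target(prop) - log_target(x)) / T:
            x = prop
        if log_target(x) > best_lp:
            best_x, best_lp = x, log_target(x)
        T = max(1.0, T * 0.999)   # geometric cooling, floored at T = 1
    return best_x

print(simulated_annealing())   # should end up near the heavier mode around -3
```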

16.
Non-negative matrix factorization (NMF) is a technique of multivariate analysis that approximates a given matrix of non-negative data by two non-negative factor matrices; it has been applied in a number of fields. However, when a matrix of non-negative data contains many zeroes, NMF encounters an approximation difficulty. This zero-inflated situation occurs often when a data matrix is given as count data, and becomes more challenging with matrices of increasing size. To solve this problem, we propose a new NMF model for zero-inflated non-negative matrices. Our model is based on the zero-inflated Tweedie distribution. The Tweedie distribution is a generalization of the normal, the Poisson, and the gamma distributions, and differs from each of the other distributions in the degree of robustness of its estimated parameters. In this paper, we show through numerical examples that the proposed model is superior to the basic NMF model in terms of the approximation of zero-inflated data. Furthermore, we show the differences between the estimated basis vectors found using the basic and the proposed NMF models for the β-divergence by applying them to real purchasing data.
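For readers unfamiliar with NMF, the sketch below shows only the basic model that the paper uses as its baseline, not the proposed zero-inflated Tweedie version: the standard Lee-Seung multiplicative updates for the squared Euclidean error, applied to a sparse count-like matrix. The rank and toy data are illustrative.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Basic NMF: approximate a non-negative matrix V (m x n) by W @ H with
    W (m x rank) and H (rank x n) non-negative, using Lee-Seung multiplicative
    updates for the squared Euclidean error."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# toy usage: a sparse count-like matrix with many zeros
rng = np.random.default_rng(1)
V = rng.poisson(0.3, size=(50, 30)).astype(float)
W, H = nmf_multiplicative(V, rank=3)
print("reconstruction error:", np.linalg.norm(V - W @ H))
```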

17.
Probabilistic Decision Graphs (PDGs) are a class of graphical models that can naturally encode some context-specific independencies that cannot always be efficiently captured by other popular models, such as Bayesian Networks. Furthermore, inference can be carried out efficiently over a PDG, in time linear in the size of the model. The problem of learning PDGs from data has been studied in the literature, but only for the case of complete data. We propose an algorithm for learning PDGs in the presence of missing data. The proposed method is based on the Expectation-Maximisation principle for estimating the structure of the model as well as the parameters. We test our proposal on both artificially generated data with different rates of missing cells and real incomplete data. We also compare the PDG models learnt by our approach to the commonly used Bayesian Network (BN) model. The results indicate that the PDG model is less sensitive to the rate of missing data than the BN model. Also, though the BN models usually attain a higher likelihood, the PDGs are close to them in size as well, which makes the learnt PDGs preferable for probabilistic inference purposes.

18.
To address the incidental-parameter problem in existing dynamic panel data analysis and the failure to account for the uncertainty risk in the model parameters, a Bayesian random-coefficient dynamic panel data model based on the Gibbs sampling algorithm is proposed. Under the assumptions that the initial values follow the stationary distribution and that the autoregressive coefficients follow a logit-normal distribution, a Markov chain Monte Carlo numerical computation procedure is designed and Bayesian estimates of the model parameters are obtained. The empirical results show that, based on the Gibb...

19.
This paper presents a novel approach to simulation metamodeling using dynamic Bayesian networks (DBNs) in the context of discrete event simulation. A DBN is a probabilistic model that represents the joint distribution of a sequence of random variables and enables the efficient calculation of their marginal and conditional distributions. In this paper, the construction of a DBN based on simulation data and its utilization in simulation analyses are presented. The DBN metamodel allows the study of the time evolution of simulation by tracking the probability distribution of the simulation state over the duration of the simulation. This feature is unprecedented among existing simulation metamodels. The DBN metamodel also enables effective what-if analysis which reveals the conditional evolution of the simulation. In such an analysis, the simulation state at a given time is fixed and the probability distributions representing the state at other time instants are updated. Simulation parameters can be included in the DBN metamodel as external random variables. Then, the DBN offers a way to study the effects of parameter values and their uncertainty on the evolution of the simulation. The accuracy of the analyses allowed by DBNs is studied by constructing appropriate confidence intervals. These analyses could be conducted based on raw simulation data but the use of DBNs reduces the duration of repetitive analyses and is expedited by available Bayesian network software. The construction and analysis capabilities of DBN metamodels are illustrated with two example simulation studies.  
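The paper's DBN construction is not reproduced here; the minimal sketch below only illustrates the core metamodeling idea in a heavily simplified form: estimate a discrete state-transition matrix from replicated simulation runs, then track the state distribution over time, with a "what-if" analysis approximated by restarting the propagation from a fixed state. The state discretization, the time-homogeneity assumption, and the toy transition matrix are all illustrative.

```python
import numpy as np

def estimate_transition_matrix(runs, n_states):
    """Estimate a time-homogeneous transition matrix from replicated
    discrete-state simulation trajectories (one list of states per replication)."""
    counts = np.zeros((n_states, n_states))
    for run in runs:
        for s, s_next in zip(run[:-1], run[1:]):
            counts[s, s_next] += 1
    counts += 1e-6                      # tiny smoothing to avoid empty rows
    return counts / counts.sum(axis=1, keepdims=True)

def propagate(p0, P, n_steps):
    """Track the marginal state distribution over time: p_{t+1} = p_t @ P."""
    dist = [np.asarray(p0, dtype=float)]
    for _ in range(n_steps):
        dist.append(dist[-1] @ P)
    return np.array(dist)

# toy usage: replications of a 3-state queueing-like simulation
rng = np.random.default_rng(0)
true_P = np.array([[0.8, 0.2, 0.0],
                   [0.1, 0.7, 0.2],
                   [0.0, 0.3, 0.7]])
runs = []
for _ in range(200):
    s, traj = 0, [0]
    for _ in range(50):
        s = rng.choice(3, p=true_P[s])
        traj.append(s)
    runs.append(traj)

P_hat = estimate_transition_matrix(runs, n_states=3)
marginals = propagate([1.0, 0.0, 0.0], P_hat, n_steps=50)   # unconditional evolution
what_if = propagate([0.0, 0.0, 1.0], P_hat, n_steps=20)     # condition on being in state 2 at some time t
print(np.round(marginals[-1], 3), np.round(what_if[-1], 3))
```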

20.
This paper addresses the problem of data fragmentation when incorporating imbalanced categorical covariates in nonparametric survival models. The problem arises in an application of demand forecasting where certain categorical covariates are important explanatory factors for the diversity of survival patterns but are severely imbalanced, in the sense that a large percentage of the data segments defined by these covariates have very small sample sizes. Two general approaches, called the class-based approach and the fusion-based approach, are proposed to handle the problem. Both rely on judicious utilization of a data segment hierarchy defined by the covariates. The class-based approach allows certain segments in the hierarchy to have their private survival functions and aggregates the others to share a common survival function. The fusion-based approach allows all survival functions to borrow and share information from all segments based on their positions in the hierarchy. A nonparametric Bayesian estimator with Dirichlet process priors provides the data-sharing mechanism in the fusion-based approach. The hyperparameters in the priors are treated as fixed quantities and learned from data by taking advantage of the data segment hierarchy. The proposed methods are motivated and validated by a case study with real-world data from a software development service operation.
