Similar Documents
20 similar documents found (search time: 74 ms)
1.
The objective of this paper is to explore different modeling strategies for generating high-dimensional Bernoulli vectors. We discuss the multivariate Bernoulli (MB) distribution, probe its properties and examine three models for generating random vectors. A latent multivariate normal model, whose bivariate distributions are approximated by Plackett distributions with univariate normal margins, is presented. A conditional mean model is examined in which the conditional probability of success depends on the previous history of successes. A mixture of beta distributions is also presented that expresses the probability of the MB vector as a product of correlated binary random variables. Each method has a domain of effectiveness. The latent model offers unpatterned correlation structures, while the conditional mean and mixture models provide computational feasibility for high-dimensional generation of MB vectors.
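A minimal sketch of the latent-normal idea described above: threshold a correlated Gaussian vector componentwise to obtain correlated Bernoulli components. The marginal probabilities `p`, the exchangeable latent correlation matrix `R`, and the function name are illustrative assumptions; the Plackett approximation used in the paper is not reproduced here.

```python
import numpy as np
from scipy.stats import norm

def sample_mb_latent_normal(p, R, size, rng=None):
    """Draw multivariate Bernoulli vectors by thresholding a latent
    multivariate normal vector: component j equals 1 iff Z_j exceeds
    the (1 - p_j) quantile of the standard normal."""
    rng = np.random.default_rng(rng)
    p = np.asarray(p)
    thresholds = norm.ppf(1.0 - p)                  # componentwise thresholds
    Z = rng.multivariate_normal(np.zeros(len(p)), R, size=size)
    return (Z > thresholds).astype(int)

# illustrative marginals and an exchangeable latent correlation
p = np.array([0.2, 0.5, 0.7])
R = 0.4 * np.ones((3, 3)) + 0.6 * np.eye(3)
X = sample_mb_latent_normal(p, R, size=10000, rng=1)
print(X.mean(axis=0))        # close to p
print(np.corrcoef(X.T))      # induced (attenuated) binary correlations
```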

2.
This paper focuses on nonparametric regression estimation for the parameters of a discrete or continuous distribution, such as the Poisson or Gamma distributions, when anomalous data are present. The proposal is a natural extension of robust methods developed in the setting of parametric generalized linear models. Robust estimators bounding either large values of the deviance or of the Pearson residuals are introduced, and their asymptotic behaviour is derived. Through a Monte Carlo study for the Poisson and Gamma distributions, the finite-sample properties of the proposed procedures are investigated and their performance is compared with that of the classical procedures. A resistant cross-validation method for choosing the smoothing parameter is also considered.
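A hedged sketch of the general idea for the Poisson case: a kernel-weighted M-estimate of the conditional mean in which the Pearson residuals are bounded by a Huber-type psi function. The Gaussian kernel, bandwidth, tuning constant, and root-finding solver are illustrative choices, not the paper's actual procedure.

```python
import numpy as np
from scipy.optimize import brentq

def huber_psi(r, c=1.345):
    """Bounded score applied to Pearson residuals."""
    return np.clip(r, -c, c)

def robust_local_poisson(x0, x, y, h=0.1, c=1.345):
    """Kernel-weighted robust estimate of E[Y | X = x0] for Poisson data:
    solve sum_i w_i * psi((y_i - mu) / sqrt(mu)) = 0 in mu."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)          # Gaussian kernel weights
    def score(mu):
        return np.sum(w * huber_psi((y - mu) / np.sqrt(mu), c))
    return brentq(score, 1e-6, y.max() + 10.0)      # score is monotone in mu

# illustrative Poisson regression data contaminated with a few gross outliers
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = rng.poisson(np.exp(1 + np.sin(2 * np.pi * x)))
y[:3] = 80                                          # anomalous observations
grid = np.linspace(0.05, 0.95, 10)
print(np.round([robust_local_poisson(g, x, y) for g in grid], 2))
```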

3.
We analyze the reliability of NASA composite pressure vessels by using a new Bayesian semiparametric model. The data set consists of lifetimes of pressure vessels, wrapped with a Kevlar fiber, grouped by spool, subject to different stress levels; 10% of the data are right censored. The model that we consider is a regression on the log-scale for the lifetimes, with fixed (stress) and random (spool) effects. The prior of the spool parameters is nonparametric, namely they are a sample from a normalized generalized gamma process, which encompasses the well-known Dirichlet process. The nonparametric prior is assumed to robustify inferences against misspecification of the parametric prior. Here, this choice of likelihood and prior yields a new Bayesian model in reliability analysis. Via a Bayesian hierarchical approach, it is easy to analyze the reliability of the Kevlar fiber by predicting quantiles of the failure time when a new spool is selected at random from the population of spools. Moreover, for comparative purposes, we review the most interesting frequentist and Bayesian models analyzing this data set. Our credibility intervals of the quantiles of interest for a new random spool are narrower than those derived by previous Bayesian parametric literature, although the predictive goodness-of-fit performances are similar. Finally, as an original feature of our model, by means of the discreteness of the random-effects distribution, we are able to cluster the spools into three different groups. Copyright © 2012 John Wiley & Sons, Ltd.

4.
We describe a Bayesian model for simultaneous linear quantile regression at several specified quantile levels. More specifically, we propose to model the conditional distributions by using random probability measures, known as quantile pyramids, introduced by Hjort and Walker. Unlike many existing approaches, this framework allows us to specify meaningful priors on the conditional distributions, while retaining the flexibility afforded by the nonparametric error distribution formulation. Simulation studies demonstrate the flexibility of the proposed approach in estimating diverse scenarios, generally outperforming other competitive methods. We also provide conditions for posterior consistency. The method is particularly promising for modeling the extremal quantiles. Applications to extreme value analysis and in higher dimensions are also explored through data examples. Supplemental material for this article is available online.

5.
Random coefficient regressions have been applied in a wide range of fields, from biology to economics, and constitute a common frame for several important statistical models. A nonparametric approach to inference in random coefficient models was initiated by Beran and Hall. In this paper we introduce and study goodness of fit tests for the coefficient distributions; their asymptotic behavior under the null hypothesis is obtained. We also propose bootstrap resampling strategies to approximate these distributions and prove their asymptotic validity using results by Giné and Zinn on bootstrap empirical processes. A simulation study illustrates the properties of these tests.

6.
To represent the high concentration of recovery rates at the boundaries, we propose to consider the recovery rate as a mixed random variable, obtained as the mixture of a Bernoulli random variable and a beta random variable. We suggest estimating the mixture weights and the Bernoulli parameter by two logistic regression models. For the recovery rates belonging to the interval (0,1), we jointly model the mean and the dispersion using two link functions, so we propose a joint beta regression model that accommodates skewness and heteroscedastic errors. This methodological proposal is applied to a comprehensive survey on the loan recovery process of Italian banks. In the regression model, we include some macroeconomic variables because they are relevant for explaining the recovery rate and allow it to be estimated in downturn conditions, as Basel II requires. Copyright © 2012 John Wiley & Sons, Ltd.
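A small sketch of the mixed random variable described above: with some probability the recovery rate lies exactly on a boundary (a Bernoulli draw decides 0 versus 1); otherwise it is drawn from a beta distribution on (0,1). The mixture weight, boundary probability, and beta shape parameters are illustrative constants here; in the paper they are driven by logistic and joint beta regressions on covariates.

```python
import numpy as np

def sample_recovery_rates(n, w_boundary=0.3, p_one=0.6, a=2.0, b=5.0, rng=None):
    """Mixture of a Bernoulli r.v. (mass at 0 and 1) and a Beta(a, b) r.v.
    w_boundary : probability that the rate lies on a boundary
    p_one      : P(rate = 1 | boundary), the Bernoulli parameter
    a, b       : shape parameters of the continuous beta component"""
    rng = np.random.default_rng(rng)
    on_boundary = rng.random(n) < w_boundary
    boundary_vals = rng.binomial(1, p_one, n).astype(float)
    interior_vals = rng.beta(a, b, n)
    return np.where(on_boundary, boundary_vals, interior_vals)

r = sample_recovery_rates(100000, rng=42)
print((r == 0).mean(), (r == 1).mean())   # point masses at the boundaries
print(r[(r > 0) & (r < 1)].mean())        # interior mean, roughly a / (a + b)
```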

7.
We apply nonparametric regression to current status data, which often arises in survival analysis and reliability analysis. While no parametric assumption on the distributions has been imposed, most authors have employed parametric models such as linear models to measure the covariate effects on failure times in regression analysis with current status data. We construct a nonparametric estimator of the regression function by modifying the maximum rank correlation (MRC) estimator. Our estimator can deal with cases where the other estimators do not work. We present the asymptotic bias and the asymptotic distribution of the estimator by adapting a result on the equicontinuity of degenerate U-processes to the setup of this paper.
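For orientation only, a sketch of the classical maximum rank correlation objective that the paper modifies for current status data: with fully observed responses it counts pairs that are concordant between the linear index x_i^T b and y_i. The grid search over a normalized coefficient is an illustrative device, not the paper's estimator.

```python
import numpy as np

def mrc_objective(b, X, y):
    """Count concordant pairs between the linear index X @ b and y."""
    idx = X @ b
    return np.sum((idx[:, None] > idx[None, :]) & (y[:, None] > y[None, :]))

rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 2))
y = X @ np.array([1.0, -2.0]) + rng.normal(size=n)   # true coefficient ratio -2

# normalize the first coefficient to 1 and grid-search the second
grid = np.linspace(-4, 1, 201)
scores = [mrc_objective(np.array([1.0, b2]), X, y) for b2 in grid]
print("estimated b2:", grid[int(np.argmax(scores))])
```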

8.
This paper describes a method by which a neural network learns to fit a distribution to sample data. The neural network may be used to replace the input distributions required in a simulation or mathematical model, and it allows random variates to be generated for subsequent use in the model. Results are given for several data sets, which indicate that the method is robust and can represent different families of continuous distributions. The neural network is a three-layer feed-forward network of size (1-3-3-1). This paper suggests that the method is an alternative approach to the problem of selecting suitable continuous distributions and random variate generation techniques for use in simulation and mathematical models.
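One way to realize this idea, sketched here with a small feed-forward network whose two hidden layers of three units echo the (1-3-3-1) architecture: train the network to map a probability level to the corresponding empirical quantile of the sample, then generate variates by pushing uniform random numbers through the fitted curve. The use of scikit-learn's MLPRegressor, the empirical-quantile targets, and the hyperparameters are illustrative assumptions, not the paper's training scheme.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
sample = rng.gamma(shape=2.0, scale=1.5, size=500)    # data whose distribution we fit

# train the network to map a probability level to the empirical quantile
p = np.linspace(0.01, 0.99, 99)
q = np.quantile(sample, p)
net = MLPRegressor(hidden_layer_sizes=(3, 3), activation="tanh",
                   solver="lbfgs", max_iter=5000, random_state=0)
net.fit(p.reshape(-1, 1), q)

# generate random variates by feeding uniforms through the fitted quantile curve
u = rng.uniform(0.01, 0.99, size=10000)
variates = net.predict(u.reshape(-1, 1))
print(sample.mean(), variates.mean())   # the two means should be roughly comparable
```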

9.
Accurate loss reserves are an important item in the financial statement of an insurance company and are mostly evaluated by macrolevel models with aggregate data in run-off triangles. In recent years, a new set of literature has considered individual claims data and proposed parametric reserving models based on claim history profiles. In this paper, we present a nonparametric and flexible approach for estimating outstanding liabilities using all the covariates associated with the policy, its policyholder, and all the information received by the insurance company on the individual claims since their reporting dates. We develop a machine learning-based method and explain how to build specific subsets of data for the machine learning algorithms to be trained and assessed on. The choice of a nonparametric model leads to new issues since the target variables (claim occurrence and claim severity) are right-censored most of the time. The performance of our approach is evaluated by comparing the predictive values of the reserve estimates with their true values on simulated data. We compare our individual approach with the most widely used aggregate-data method, namely chain ladder, with respect to the bias and the variance of the estimates. We also provide a short real case study based on a Dutch loan insurance portfolio.

10.
Much work has focused on developing exact tests for the analysis of discrete data using log linear or logistic regression models. A parametric model is tested for a dataset by conditioning on the value of a sufficient statistic and determining the probability of obtaining another dataset as extreme or more extreme relative to the general model, where extremeness is determined by the value of a test statistic such as the chi-square or the log-likelihood ratio. Exact determination of these probabilities can be infeasible for high dimensional problems, and asymptotic approximations to them are often inaccurate when there are small data entries and/or there are many nuisance parameters. In these cases Monte Carlo methods can be used to estimate exact probabilities by randomly generating datasets (tables) that match the sufficient statistic of the original table. However, naive Monte Carlo methods produce tables that are usually far from matching the sufficient statistic. The Markov chain Monte Carlo method used in this work (the regression/attraction approach) uses attraction to concentrate the distribution around the set of tables that match the sufficient statistic, and uses regression to take advantage of information in tables that “almost” match. It is also more general than others in that it does not require the sufficient statistic to be linear, and it can be adapted to problems involving continuous variables. The method is applied to several high dimensional settings including four-way tables with a model of no four-way interaction, and a table of continuous data based on beta distributions. It is powerful enough to deal with the difficult problem of four-way tables and flexible enough to handle continuous data with a nonlinear sufficient statistic.
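To illustrate what "generating tables that match the sufficient statistic" means in the simplest case, here is the standard Markov-basis swap sampler for a two-way table with fixed row and column sums. This is not the regression/attraction method of the paper, which handles far more general (including nonlinear and continuous) sufficient statistics; in practice a Metropolis acceptance step under the null model would also be layered on top of these moves.

```python
import numpy as np

def swap_step(table, rng):
    """One Markov-basis move for a two-way table with fixed row and column
    sums: pick two rows and two columns and add +1/-1 on the 2x2 subtable."""
    r, c = table.shape
    i, j = rng.choice(r, 2, replace=False)
    k, l = rng.choice(c, 2, replace=False)
    new = table.copy()
    new[i, k] += 1; new[j, l] += 1
    new[i, l] -= 1; new[j, k] -= 1
    return new if (new >= 0).all() else table     # reject moves leaving the support

rng = np.random.default_rng(0)
table = np.array([[5, 2, 3],
                  [1, 4, 6]])
for _ in range(5000):
    table = swap_step(table, rng)

# every visited table matches the original row and column margins exactly
print(table, table.sum(axis=1), table.sum(axis=0))
```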

11.
We consider Bayesian nonparametric regression through random partition models. Our approach involves the construction of a covariate-dependent prior distribution on partitions of individuals. Our goal is to use covariate information to improve predictive inference. To do so, we propose a prior on partitions based on the Potts clustering model associated with the observed covariates. Covariate proximity thus drives both the formation of clusters and the prior predictive distribution. The resulting prior model is flexible enough to support many different types of likelihood models. We focus the discussion on nonparametric regression. Implementation details are discussed for the specific case of multivariate multiple linear regression. The proposed model performs well in terms of model fitting and prediction when compared with alternative nonparametric regression approaches. We illustrate the methodology with an application to the health status of nations at the turn of the 21st century. Supplementary materials are available online.

12.
Using wavelet methods, we consider the semiparametric regression model y_i = X_i^T β + g(t_i) + ε_i (1 ≤ i ≤ n), where β ∈ R^d is an unknown parameter, g(t) is an unknown Borel-measurable function on [0,1], the X_i are random design points in R^d, the random errors {ε_i} form a martingale difference sequence, and {t_i} is a sequence of constants in [0,1]. The q-th moment consistency of the wavelet estimators of both the parametric and nonparametric components is obtained.
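A rough backfitting sketch of estimating this partially linear model with a wavelet smoother for g (soft-thresholded discrete wavelet coefficients via PyWavelets) and least squares for β. The wavelet family, universal threshold, equispaced design, and iteration scheme are illustrative assumptions, not the estimator analyzed in the paper.

```python
import numpy as np
import pywt

def wavelet_smooth(y, wavelet="db4", level=4):
    """Denoise a signal on an equispaced grid by soft-thresholding its
    discrete wavelet coefficients (universal threshold)."""
    coeffs = pywt.wavedec(y, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745          # noise scale from finest level
    thr = sigma * np.sqrt(2 * np.log(len(y)))
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(y)]

# simulated partially linear data on an equispaced design t_i = i/n
rng = np.random.default_rng(0)
n, beta = 512, np.array([1.5, -2.0])
t = np.arange(n) / n
X = rng.normal(size=(n, 2))
g = np.sin(4 * np.pi * t)
y = X @ beta + g + rng.normal(scale=0.3, size=n)

# backfitting: alternate least squares for beta and wavelet smoothing for g
beta_hat, g_hat = np.zeros(2), np.zeros(n)
for _ in range(10):
    beta_hat = np.linalg.lstsq(X, y - g_hat, rcond=None)[0]
    g_hat = wavelet_smooth(y - X @ beta_hat)
print(beta_hat)    # should be close to (1.5, -2.0)
```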

13.
The classic hierarchical linear model formulation provides considerable flexibility for modelling the random effects structure and a powerful tool for analyzing nested data that arise in various areas such as biology, economics and education. However, it assumes the within-group errors to be independently and identically distributed (i.i.d.) and the models at all levels to be linear. Most importantly, traditional hierarchical models (just like other ordinary mean regression methods) cannot characterize the entire conditional distribution of a dependent variable given a set of covariates and fail to yield robust estimators. In this article, we relax the aforementioned assumptions, along with normality, and develop so-called Hierarchical Semiparametric Quantile Regression Models, in which the within-group errors may be heteroscedastic and the models at some levels are allowed to be nonparametric. We present the ideas with a 2-level model. The level-1 model is specified as a nonparametric model, whereas the level-2 model is set as a parametric model. Under the proposed semiparametric setting, the vector of partial derivatives of the nonparametric function in level 1 becomes the response variable vector in level 2. The proposed method allows us to model the fixed effects in the innermost level (i.e., level 2) as a function of the covariates instead of a constant effect. We outline some mild regularity conditions required for convergence and asymptotic normality of our estimators. We illustrate our methodology with a real hierarchical data set from a laboratory study and some simulation studies.

14.
Decision trees are often used as a convenient way to visualize, and then solve, a decision problem. A standard problem approached in this way is a decision as to whether or not to sample, followed by a decision whether or not to engage in an activity. To remain practical, this approach has been limited to a small number of sample outcomes. This paper develops an approach which permits a continuous distribution of sample outcomes to be used. A new distribution, the "skewed parabolic distribution", is introduced as a flexible way of representing the judgemental (probabilistic) beliefs of the assessor. This new distribution is compared with the beta distribution. An algorithm, which is easily programmed and is practical for hand computation with a calculator and logarithm tables, is developed for the complete solution of a decision tree problem using this skewed parabolic distribution. An example is given. It is concluded that the skewed parabolic distribution makes the use of continuous distributions practical for decision trees, and has advantages over the beta distribution for this purpose.

15.
Trust region algorithms are well known in the field of local continuous optimization. They proceed by maintaining a confidence region in which a simple, most often quadratic, model is substituted for the criterion to be minimized. The minimum of the model in the trust region becomes the next starting point of the algorithm and, depending on the amount of progress made during this step, the confidence region is expanded, contracted or kept unchanged. In the field of global optimization, interval programming may be thought of as a kind of confidence region approach, with a binary confidence level: the region is guaranteed to contain the optimum or guaranteed not to contain it. A probabilistic version, known as branch and probability bound, is based on an approximate probability that a region of the search space contains the optimum, and has a confidence level in the interval [0,1]. The method introduced in this paper is an application of the trust region approach within the framework of evolutionary algorithms. Regions of the search space are endowed with a prospectiveness criterion obtained from random sampling, possibly coupled with a local continuous algorithm. The regions are treated as individuals in an evolutionary algorithm with mutation and crossover operators based on a transformation group. The performance of the algorithm on some standard benchmark functions is presented.
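A bare-bones illustration of the classical trust-region mechanics the paper builds on: minimize a local quadratic model inside a radius, then expand, contract, or keep the radius according to the ratio of actual to predicted reduction. The clipped Newton direction, the 0.25/0.75 thresholds, and the Rosenbrock test function are textbook choices; the evolutionary, regions-as-individuals machinery of the paper is not reproduced.

```python
import numpy as np

def trust_region_step(f, grad, hess, x, radius):
    """One step of a basic trust-region method with a quadratic model,
    using a (clipped) Newton direction as an approximate model minimizer."""
    g, H = grad(x), hess(x)
    p = np.linalg.solve(H, -g)                   # Newton direction
    if np.linalg.norm(p) > radius:               # stay inside the trust region
        p = p / np.linalg.norm(p) * radius
    predicted = -(g @ p + 0.5 * p @ H @ p)       # reduction promised by the model
    actual = f(x) - f(x + p)
    rho = actual / predicted if predicted > 0 else -1.0
    if rho < 0.25:
        radius *= 0.5                            # poor agreement: contract
    elif rho > 0.75 and np.isclose(np.linalg.norm(p), radius):
        radius *= 2.0                            # good agreement at the boundary: expand
    return (x + p if rho > 0 else x), radius     # accept only if progress was made

# illustrative run on the Rosenbrock function
f = lambda x: (1 - x[0])**2 + 100*(x[1] - x[0]**2)**2
grad = lambda x: np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                           200*(x[1] - x[0]**2)])
hess = lambda x: np.array([[2 - 400*(x[1] - 3*x[0]**2), -400*x[0]],
                           [-400*x[0], 200.0]])
x, radius = np.array([-1.2, 1.0]), 1.0
for _ in range(100):
    x, radius = trust_region_step(f, grad, hess, x, radius)
print(x)   # converges towards the minimizer (1, 1)
```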

16.
In this paper we consider the problem of estimating an unknown joint distribution which is defined over mixed discrete and continuous variables. A nonparametric kernel approach is proposed with smoothing parameters obtained from the cross-validated minimization of the estimator's integrated squared error. We derive the rate of convergence of the cross-validated smoothing parameters to their ‘benchmark’ optimal values, and we also establish the asymptotic normality of the resulting nonparametric kernel density estimator. Monte Carlo simulations illustrate that the proposed estimator performs substantially better than the conventional nonparametric frequency estimator in a range of settings. The simulations also demonstrate that the proposed approach does not suffer from known limitations of the likelihood cross-validation method which breaks down with commonly used kernels when the continuous variables are drawn from fat-tailed distributions. An empirical application demonstrates that the proposed method can yield superior predictions relative to commonly used parametric models.
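A hedged illustration using the closely related estimator available in statsmodels: a product kernel over one continuous and one ordered discrete variable with bandwidths chosen by least-squares cross-validation, i.e. by minimizing the integrated squared error. This reuses an existing implementation rather than the paper's own derivation, and the simulated data are purely illustrative.

```python
import numpy as np
from statsmodels.nonparametric.kernel_density import KDEMultivariate

rng = np.random.default_rng(0)
n = 400
z = rng.integers(0, 3, size=n)                   # discrete variable with 3 levels
x = rng.normal(loc=z, scale=0.7, size=n)         # continuous variable, mean depends on z

# 'c' = continuous, 'o' = ordered discrete; bw='cv_ls' selects the smoothing
# parameters by least-squares cross-validation
kde = KDEMultivariate(data=np.column_stack([x, z]), var_type="co", bw="cv_ls")
print("cross-validated bandwidths:", kde.bw)

# evaluate the estimated joint density at a few (x, z) points
pts = np.column_stack([np.array([0.0, 1.0, 2.0]), np.array([0, 1, 2])])
print(kde.pdf(pts))
```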

17.
We propose using minimum distance to obtain nonparametric estimates of the distributions of components in random effects models. A main setting considered is equivalent to having a large number of small datasets whose locations, and perhaps scales, vary randomly, but which otherwise have a common distribution. Interest focuses on estimating the distribution that is common to all datasets, knowledge of which is crucial in multiple testing problems where a location/scale invariant test is applied to every small dataset. A detailed algorithm for computing minimum distance estimates is proposed, and the usefulness of our methodology is illustrated by a simulation study and an analysis of microarray data. Supplemental materials for the article, including R-code and a dataset, are available online.

18.
During the past decade, a useful model for nonstationary random fields has been developed. It consists of reducing the random field of interest to isotropy via a bijective bi-continuous deformation of the index space. The problem then consists of estimating this space deformation together with the isotropic correlation in the deformed index space. We propose to estimate both the space deformation and the isotropic correlation using a constrained continuous version of simulated annealing based on a Metropolis-Hastings dynamic. This method provides a nonparametric estimate of the deformation that has the required property of being bijective; previous nonparametric methods do not guarantee this property. We illustrate our work with two examples, one concerning a precipitation dataset. We also outline how spatial prediction should proceed in the new coordinate space.

19.
We analyze a semiparametric model for data that suffer from the problems of sample selection, where some of the data are observed for only part of the sample with a probability that depends on a selection equation, and of endogeneity, where a covariate is correlated with the disturbance term. The introduction of nonparametric functions in the model permits great flexibility in the way covariates affect response variables. We present an efficient Bayesian method for the analysis of such models that allows us to consider general systems of outcome variables and endogenous regressors that are continuous, binary, censored, or ordered. Estimation is by Markov chain Monte Carlo (MCMC) methods. The algorithm we propose does not require simulation of the outcomes that are missing due to the selection mechanism, which reduces the computational load and improves the mixing of the MCMC chain. The approach is applied to a model of women’s labor force participation and log-wage determination. Data and computer code used in this article are available online.

20.
This paper develops a robust and efficient estimation procedure for quantile partially linear additive models with longitudinal data, where the nonparametric components are approximated by B-spline basis functions. The proposed approach can incorporate the correlation structure between repeated measures to improve estimation efficiency. Moreover, the new method is empirically shown to be much more efficient and robust than the popular generalized estimating equations method for non-normal correlated random errors. However, the proposed estimating functions are non-smooth and non-convex. In order to reduce the computational burden, we apply the induced smoothing method for fast and accurate computation of the parameter estimates and their asymptotic covariance. Under some regularity conditions, we establish the asymptotic normality of the estimators for the parametric components and the convergence rate of the estimators for the nonparametric functions. Furthermore, a variable selection procedure based on smooth-threshold estimating equations is developed to simultaneously identify non-zero parametric and nonparametric components. Finally, simulation studies are conducted to evaluate the finite-sample performance of the proposed method, and a real data example is analyzed to illustrate its application.
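A rough cross-sectional sketch of the B-spline device for a single nonparametric component: expand the covariate in a B-spline basis and fit a standard quantile regression to the augmented design. The patsy spline basis, the median quantile level, and the fact that within-subject correlation is ignored are illustrative simplifications of the paper's longitudinal, induced-smoothing procedure.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
z = rng.uniform(0, 1, n)                         # covariate entering nonparametrically
x = rng.normal(size=n)                           # covariate entering linearly
y = 1.0 * x + np.sin(2 * np.pi * z) + rng.standard_t(df=3, size=n)   # heavy-tailed errors
df = pd.DataFrame({"y": y, "x": x, "z": z})

# partially linear additive quantile regression at the median:
# a linear term in x plus a cubic B-spline expansion of z (bs() is supplied by patsy)
model = smf.quantreg("y ~ x + bs(z, df=6, degree=3)", data=df)
fit = model.fit(q=0.5)
print(fit.params["x"])     # slope of the parametric component, close to 1
```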
