Similar literature
20 similar documents found (search time: 31 ms)
1.
We consider Bayesian nonparametric regression through random partition models. Our approach involves the construction of a covariate-dependent prior distribution on partitions of individuals. Our goal is to use covariate information to improve predictive inference. To do so, we propose a prior on partitions based on the Potts clustering model associated with the observed covariates. Covariate proximity thus drives both the formation of clusters and the prior predictive distribution. The resulting prior model is flexible enough to support many different types of likelihood models. We focus the discussion on nonparametric regression. Implementation details are discussed for the specific case of multivariate multiple linear regression. The proposed model performs well in terms of model fitting and prediction when compared to alternative nonparametric regression approaches. We illustrate the methodology with an application to the health status of nations at the turn of the 21st century. Supplementary materials are available online.

2.
The multivariate response is commonly seen in longitudinal and cross-sectional designs. The marginal model is an important tool in discovering the average influence of the covariates on the response. A main feature of the marginal model is that even without specifying the inter-correlation among different components of the response, we still get consistent estimation of the regression parameters. This paper discusses the GMM estimation of the marginal model when the covariates are missing at random. Using inverse probability weighting and different basic working correlation matrices, we obtain a series of estimating equations. We estimate the parameters of interest by minimizing the corresponding quadratic inference function. Asymptotic normality of the proposed estimator is established. Simulation studies are conducted to investigate the finite sample performance of the new estimator. We also apply our proposal to a real data set on the mathematical achievement of middle school students.

3.
This article discusses inference on the order of dependence in binary sequences. The proposed approach is based on the notion of partial exchangeability of order k. A partially exchangeable binary sequence of order k can be represented as a mixture of Markov chains. The mixture is with respect to the unknown transition probability matrix θ. We use this defining property to construct a semiparametric model for binary sequences by assuming a nonparametric prior on the transition matrix θ. This enables us to consider inference on the order of dependence without constraint to a particular parametric model. Implementing posterior simulation in the proposed model is complicated by the fact that the dimension of θ changes with the order of dependence k. We discuss appropriate posterior simulation schemes based on a pseudo prior approach. We extend the model to include covariates by considering an alternative parameterization as an autologistic regression which allows for a straightforward introduction of covariates. The regression on covariates raises the additional inference problem of variable selection. We discuss appropriate posterior simulation schemes, focusing on inference about the order of dependence. We discuss and develop the model with covariates only to the extent needed for such inference.

4.

We consider a weighted local linear estimator based on the inverse selection probability for nonparametric regression with missing covariates at random. The asymptotic distribution of the maximal deviation between the estimator and the true regression function is derived and an asymptotically accurate simultaneous confidence band is constructed. The estimator for the regression function is shown to be oracally efficient in the sense that it is uniformly indistinguishable from that when the selection probabilities are known. Finite sample performance is examined via simulation studies which support our asymptotic theory. The proposed method is demonstrated via an analysis of a data set from the Canada 2010/2011 Youth Student Survey.
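As a rough illustration of the weighting idea in this abstract, the following sketch fits a local linear regression at a single point, weighting each complete case by a kernel weight times the inverse of its selection probability. It is a minimal sketch under stated assumptions, not the authors' estimator: the function name `ipw_local_linear`, the scalar covariate, the Epanechnikov kernel, and the treatment of the selection probabilities `pi` as already known or estimated are all choices made here for illustration. The confidence band construction is not attempted.

```python
import numpy as np

def ipw_local_linear(x0, x, y, observed, pi, h):
    """Local linear estimate of m(x0) using inverse-probability-weighted complete cases.

    x        : covariate values (may be NaN where the covariate is missing)
    y        : responses (always observed)
    observed : 1 if the covariate was observed, 0 otherwise
    pi       : selection probabilities P(observed = 1 | ...), assumed given
    h        : kernel bandwidth (hypothetical tuning parameter)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    keep = np.asarray(observed) > 0          # only complete cases carry weight

    x, y, pi = x[keep], y[keep], pi[keep]
    u = (x - x0) / h
    # Epanechnikov kernel (an illustrative choice of kernel)
    kernel = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    w = kernel / pi                          # kernel weight times inverse selection probability

    # Weighted least squares for intercept and slope centred at x0
    X = np.column_stack([np.ones_like(x), x - x0])
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta[0]                           # the intercept is the estimate of m(x0)
```

Because a local linear fit reproduces linear functions exactly, data generated from a straight line without noise give a quick sanity check on the weighting code.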


5.

We investigate semiparametric estimation of regression coefficients through generalized estimating equations with single-index models when some covariates are missing at random. Existing popular semiparametric estimators may run into difficulties when some selection probabilities are small or the dimension of the covariates is not low. We propose a new simple parameter estimator using a kernel-assisted estimator for the augmentation by a single-index model without using the inverse of selection probabilities. We show that under certain conditions the proposed estimator is as efficient as the existing methods based on standard kernel smoothing, which are often practically infeasible in the case of multiple covariates. A simulation study and a real data example are presented to illustrate the proposed method. The numerical results show that the proposed estimator avoids some numerical issues caused by estimated small selection probabilities that are needed in other estimators.


6.
It is very common in AIDS studies that the response variable (e.g., HIV viral load) may be subject to censoring due to detection limits while covariates (e.g., CD4 cell count) may be measured with error. Failure to take censoring in the response variable and measurement errors in covariates into account may introduce substantial bias in estimation and thus lead to unreliable inference. Moreover, with non-normal and/or heteroskedastic data, traditional mean regression models are not robust to tail reactions. In this case, one may find it attractive to estimate the extreme causal relationship of covariates to a dependent variable, which can be suitably studied in the quantile regression framework. In this paper, we consider joint inference of a mixed-effects quantile regression model with right-censored responses and errors in covariates. The inverse censoring probability weighted method and the orthogonal regression method are combined to reduce the biases of estimation caused by censored data and measurement errors. Under some regularity conditions, the consistency and asymptotic normality of the estimators are derived. Finally, some simulation studies are implemented and an HIV/AIDS clinical data set is analyzed to illustrate the proposed procedure.

7.
Many clustering methods, such as K-means, kernel K-means, and MNcut clustering, follow the same recipe: (i) choose a measure of similarity between observations; (ii) define a figure of merit assigning a large value to partitions of the data that put similar observations in the same cluster; and (iii) optimize this figure of merit over partitions. Potts model clustering represents an interesting variation on this recipe. Blatt, Wiseman, and Domany defined a new figure of merit for partitions that is formally similar to the Hamiltonian of the Potts model for ferromagnetism, extensively studied in statistical physics. For each temperature T, the Hamiltonian defines a distribution assigning a probability to each possible configuration of the physical system or, in the language of clustering, to each partition. Instead of searching for a single partition optimizing the Hamiltonian, they sampled a large number of partitions from this distribution for a range of temperatures. They proposed a heuristic for choosing an appropriate temperature and from the sample of partitions associated with this chosen temperature, they then derived what we call a consensus clustering: two observations are put in the same consensus cluster if they belong to the same cluster in the majority of the random partitions. In a sense, the consensus clustering is an “average” of plausible configurations, and we would expect it to be more stable (over different samples) than the configuration optimizing the Hamiltonian.

The goal of this article is to contribute to the understanding of Potts model clustering and to propose extensions and improvements: (1) We show that the Hamiltonian used in Potts model clustering is closely related to the kernel K-means and MNCut criteria. (2) We propose a modification of the Hamiltonian penalizing unequal cluster sizes and show that it can be interpreted as a weighted version of the kernel K-means criterion. (3) We introduce a new version of the Wolff algorithm to simulate configurations from the distribution defined by the penalized Hamiltonian, leading to penalized Potts model clustering. (4) We note a link between kernel based clustering methods and nonparametric density estimation and exploit it to automatically determine locally adaptive kernel bandwidths. (5) We propose a new simple rule for selecting a good temperature T.

As an illustration we apply Potts model clustering to gene expression data and compare our results to those obtained by model based clustering and a nonparametric dendrogram sharpening method.
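The consensus clustering rule described in this abstract (two observations share a consensus cluster if they share a cluster in the majority of sampled partitions) can be sketched as follows. This is only a sketch under assumptions: the function name `consensus_clusters` is hypothetical, the sampled partitions are taken as given rather than simulated from the Potts model, and because the majority relation need not be transitive, the code closes it by taking connected components, which is one possible reading and not necessarily the one used in the article.

```python
import numpy as np

def consensus_clusters(partitions):
    """Derive consensus cluster labels from a sample of partitions.

    partitions : array-like of shape (m, n); row s holds the cluster labels
                 of the n observations in the s-th sampled partition.
    """
    partitions = np.asarray(partitions)
    m, n = partitions.shape

    # co[i, j] = fraction of sampled partitions in which i and j share a cluster
    co = np.zeros((n, n))
    for labels in partitions:
        co += labels[:, None] == labels[None, :]
    co /= m

    linked = co > 0.5  # strict majority of the sampled partitions

    # Assumption: make the relation transitive via connected components (depth-first search)
    consensus = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if consensus[start] >= 0:
            continue
        consensus[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(linked[i]):
                if consensus[j] < 0:
                    consensus[j] = current
                    stack.append(j)
        current += 1
    return consensus
```

For example, with three sampled partitions of four observations, `[[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]`, observations 0 and 1 share a cluster in two of three partitions and observations 2 and 3 in all three, so the sketch groups them as two consensus clusters.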

8.
The Shadow Prior     
In this article we consider posterior simulation in models with constrained parameter or sampling spaces. Constraints on the support of sampling and prior distributions give rise to a normalization constant in the complete conditional posterior distribution for the (hyper-) parameters of the respective distribution, complicating posterior simulation.

To mitigate the problem of evaluating normalization constants, we propose a computational approach based on model augmentation. We include an additional level in the probability model to separate the (hyper-) parameter from the constrained probability model, and we refer to this additional level in the probability model as a shadow prior. This approach can significantly reduce the overall computational burden if the original (hyper-) prior includes a complicated structure but a simple form is chosen for the shadow prior; for example, the original prior may include a mixture model or multivariate distribution, while the shadow prior defines a set of shadow parameters that are iid given the (hyper-) parameters. Although introducing the shadow prior changes the posterior inference on the original parameters, we argue that by appropriate choices of the shadow prior, the change is minimal and posterior simulation in the augmented probability model provides a meaningful approximation to the desired inference. Data used in this article are available online.

9.

We present a theoretical and computational framework to compute the symmetry number of a flexible sphere cluster in \(\mathbb{R}^3\), using a definition of symmetry that arises naturally when calculating the equilibrium probability of a cluster of spheres in the sticky-sphere limit. We define the sticky symmetry group of the cluster as the set of permutations and inversions of the spheres which preserve adjacency and can be realized by continuous deformations of the cluster that do not change the set of contacts or cause particles to overlap. The symmetry number is the size of the sticky symmetry group. We introduce a numerical algorithm to compute the sticky symmetry group and symmetry number, and show it works well on several test cases. Furthermore, we show that once the sticky symmetry group has been calculated for indistinguishable spheres, the symmetry group for partially distinguishable spheres (those with nonidentical interactions) can be efficiently obtained without repeating the laborious parts of the computations. We use our algorithm to calculate the partition functions of every possible connected cluster of six identical sticky spheres, generating data that may be used to design interactions between spheres so they self-assemble into a desired structure.


10.

This paper considers estimation and inference in semiparametric quantile regression models when the response variable is subject to random censoring. The paper considers both the cases of independent and dependent censoring and proposes three iterative estimators based on inverse probability weighting, where the weights are estimated from the censoring distribution using the Kaplan–Meier, a fully parametric and the conditional Kaplan–Meier estimators. The paper proposes a computationally simple resampling technique that can be used to approximate the finite sample distribution of the parametric estimator. The paper also considers inference for both the parametric and nonparametric components of the quantile regression model. Monte Carlo simulations show that the proposed estimators and test statistics have good finite sample properties. Finally, the paper contains a real data application, which illustrates the usefulness of the proposed methods.


11.
A Bayesian inference for a linear Gaussian random coefficient regression model with inhomogeneous within-class variances is presented. The model is motivated by an application in metrology, but it may well find interest in other fields. We consider the selection of a noninformative prior for the Bayesian inference to address applications where the available prior knowledge is either vague or is to be ignored. The noninformative prior is derived by applying the Berger and Bernardo reference prior principle with the means of the random coefficients forming the parameters of interest. We show that the resulting posterior is proper and specify conditions for the existence of first and second moments of the marginal posterior. Simulation results are presented which suggest good frequentist properties of the proposed inference. The calibration of sonic nozzle data is considered as an application from metrology. The proposed inference is applied to these data and the results are compared to those obtained by alternative approaches.

12.
We propose a two-component graphical chain model, the discrete regression distribution, where a set of discrete random variables is modeled as a response to a set of categorical and continuous covariates. The proposed model is useful for modeling a set of discrete variables measured at multiple sites along with a set of continuous and/or discrete covariates. The proposed model allows for joint examination of the dependence structure of the discrete response and observed covariates and also accommodates site-to-site variability. We develop the graphical model properties and theoretical justifications of this model. Our model has several advantages over the traditional logistic normal model used to analyze similar compositional data, including site-specific random effect terms and the incorporation of discrete and continuous covariates.

13.
In change point problems in general we should answer three questions: how many changes are there? Where are they? And, what is the distribution of the data within the blocks? In this paper, we develop a new full predictivistic approach for modeling observations within the same block of observations and consider the product partition model (PPM) for treating the change point problem. The PPM brings more flexibility into the change point problem because it considers the number of changes and the instants when the changes occurred as random variables. A full predictivistic characterization of the model can provide a more tractable way to elicit the prior distribution of the parameters of interest, since prior opinions are required only about observable quantities. We also present an application to the problem of identifying multiple change points in the mean and variance of a stock market return time series.

14.
We propose a new model for cluster analysis in a Bayesian nonparametric framework. Our model combines two ingredients, species sampling mixture models of Gaussian distributions on one hand, and a deterministic clustering procedure (DBSCAN) on the other. Here, two observations from the underlying species sampling mixture model share the same cluster if the distance between the densities corresponding to their latent parameters is smaller than a threshold; this yields a random partition which is coarser than the one induced by the species sampling mixture. Since this procedure depends on the value of the threshold, we suggest a strategy to fix it. In addition, we discuss implementation and applications of the model; comparison with more standard clustering algorithms will be given as well. Supplementary materials for the article are available online.

15.
A new means of estimating the correlation coefficient for clustered binary data in regression settings is introduced. The method is founded upon the violation of Bartlett’s second identity when adopting binomial distributions to model binary data that are correlated. The new methodology applies to any sensible link function that connects the success probability and covariates. One can easily implement the procedure by using any statistical software providing the naïve and the sandwich covariance matrices for regression parameter estimates. Simulations and real data analyses are used to demonstrate the efficacy of our new procedure.

16.
There is an increasingly rich literature about Bayesian nonparametric models for clustering functional observations. Most recent proposals rely on infinite-dimensional characterizations that might lead to overly complex cluster solutions. In addition, while prior knowledge about the functional shapes is typically available, its practical exploitation might be a difficult modeling task. Motivated by an application in e-commerce, we propose a novel enriched Dirichlet mixture model for functional data. Our proposal accommodates the incorporation of functional constraints while bounding the model complexity. We characterize the prior process through an urn scheme to clarify the underlying partition mechanism. These features lead to a very interpretable clustering method compared to available techniques. Moreover, we employ a variational Bayes approximation for tractable posterior inference to overcome computational bottlenecks.

17.
This paper proposes a transformed random effects model for analyzing non-normal panel data where both the response and (some of) the covariates are subject to transformations for inducing flexible functional form, normality, homoscedasticity, and simple model structure. We develop a maximum likelihood procedure for model estimation and inference, along with a computational device which makes the estimation procedure feasible in cases of large panels. We provide model specification tests that take into account the fact that parameter values for error components cannot be negative. We illustrate the model and methods with two applications: state production and wage distribution. The empirical results strongly favor the new model over the standard ones where either linear or log-linear functional form is employed. Monte Carlo simulation shows that maximum likelihood inference is quite robust against mild departure from normality. Copyright © 2009 John Wiley & Sons, Ltd.

18.
The mixture of Dirichlet process (MDP) defines a flexible prior distribution on the space of probability measures. This study shows that the ordinary least-squares (OLS) estimator, as a functional of the MDP posterior distribution, has posterior mean given by weighted least-squares (WLS), and has posterior covariance matrix given by the (weighted) heteroscedasticity-consistent sandwich estimator. This is according to a pairs bootstrap distribution approximation of the posterior, using a Pólya urn scheme. Also, when the MDP prior baseline distribution is specified as a product of independent probability measures, this WLS solution provides a new type of generalized ridge regression estimator. Such an estimator can handle multicollinear or singular design matrices even when the number of covariates exceeds the sample size, and can shrink the coefficient estimates of irrelevant covariates towards zero, which makes it useful for nonlinear regressions via basis expansions. Also, this MDP/OLS functional methodology can be extended to methods for analyzing the sensitivity of the heteroscedasticity-consistent causal effect size over a range of hidden biases, due to missing covariates omitted from the regression; and more generally, can be extended to a Vibration of Effects analysis. The methodology is illustrated through the analysis of simulated and real data sets. Overall, this study establishes new connections between Dirichlet process functional inference, the bootstrap, consistent sandwich covariance estimation, ridge shrinkage regression, WLS, and sensitivity analysis, to provide regression methodology useful for inferences of the mean dependent response.

19.
We describe and contrast several different bootstrap procedures for penalized spline smoothers. The bootstrap methods considered are variations on existing methods, developed under two different probabilistic frameworks. Under the first framework, penalized spline regression is considered as an estimation technique to find an unknown smooth function. The smooth function is represented in a high-dimensional spline basis, with spline coefficients estimated in a penalized form. Under the second framework, the unknown function is treated as a realization of a set of random spline coefficients, which are then predicted in a linear mixed model. We describe how bootstrap methods can be implemented under both frameworks, and we show theoretically and through simulations and examples that bootstrapping provides valid inference in both cases. We compare the inference obtained under both frameworks, and conclude that the latter generally produces better results than the former. The bootstrap ideas are extended to hypothesis testing, where parametric components in a model are tested against nonparametric alternatives.

Datasets and computer code are available in the online supplements.

20.
Recurrent event data often arise in biomedical studies, and individuals within a cluster might not be independent. We propose a semiparametric additive rates model for clustered recurrent event data, wherein the covariates are assumed to add to the unspecified baseline rate. For the inference on the model parameters, estimating equation approaches are developed, and both large and finite sample properties of the proposed estimators are established.
