Similar Literature
20 similar documents found.
1.
An additive hazards model with random effects is proposed for modelling correlated failure time data when the focus is on comparing failure times within clusters and on estimating the correlation between failure times from the same cluster, as well as the marginal regression parameters. A key feature of the model is that, when marginalized over the random effect, it retains the structure of the additive hazards model. We develop estimating equations for the regression parameters, and the resulting estimators are shown to be consistent and asymptotically normal under appropriate regularity conditions. An estimator of the baseline hazard function is also proposed and its asymptotic properties established. We further propose a class of diagnostic methods for assessing the overall fit of the additive hazards model with random effects. Simulation studies evaluate the finite-sample behavior of the proposed estimators in various scenarios, and an analysis of the Diabetic Retinopathy Study illustrates the method.
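As a point of reference, the marginal regression part of an additive hazards model is typically estimated from an estimating equation of the Lin-Ying type; a minimal sketch of that classical form is given below (without the random-effect and diagnostic components that are specific to this paper, and with illustrative notation):

```latex
% Additive hazards: \lambda_i(t \mid Z_i) = \lambda_0(t) + \beta^{\top} Z_i(t)
U(\beta) \;=\; \sum_{i=1}^{n} \int_0^{\tau}
  \bigl\{ Z_i(t) - \bar{Z}(t) \bigr\}
  \bigl\{ \mathrm{d}N_i(t) - Y_i(t)\,\beta^{\top} Z_i(t)\,\mathrm{d}t \bigr\},
\qquad
\bar{Z}(t) \;=\; \frac{\sum_{j} Y_j(t)\,Z_j(t)}{\sum_{j} Y_j(t)}
```

Here N_i and Y_i denote the counting and at-risk processes; setting U(β) = 0 yields a closed-form estimator. The estimating equations developed in the paper play an analogous role for the random-effects model.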

2.
Developing models to predict tree mortality from long-term repeated-measurement data sets can be difficult and challenging because of the nature of mortality as well as the effects of dependence among observations. Marginal (population-averaged) generalized estimating equation (GEE) models and random-effects (subject-specific) models offer two possible ways to handle this dependence. For this study, standard logistic, marginal logistic (GEE-based), and random-effects logistic regression models were fitted and compared. In addition, four model evaluation statistics were calculated by K-fold cross-validation: the mean prediction error, the mean absolute prediction error, the variance of the prediction error, and the mean square error. Results from this study suggest that the random-effects model produced the smallest evaluation statistics among the three models. Although marginal logistic regression accounted for correlations between observations, it did not provide a noticeable improvement in performance over the standard logistic regression model that assumed independence. This study indicates that the random-effects model was able to increase the overall accuracy of mortality modeling and to capture both the correlation arising from the hierarchical data structure and the serial correlation generated by repeated measurements.
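For concreteness, a small Python sketch (scikit-learn assumed; variable names are hypothetical) of how the four K-fold cross-validation statistics named above can be computed for a plain logistic model; the GEE and random-effects fits compared in the study are not reproduced here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def cv_evaluation_stats(X, y, k=10, seed=0):
    """K-fold CV error summaries for a binary mortality indicator y (0/1)."""
    errors = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=seed).split(X):
        model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        p_hat = model.predict_proba(X[test_idx])[:, 1]   # predicted mortality probability
        errors.append(y[test_idx] - p_hat)               # observed minus predicted
    e = np.concatenate(errors)
    return {
        "mean_prediction_error": e.mean(),               # ME
        "mean_absolute_prediction_error": np.abs(e).mean(),  # MAE
        "variance_of_prediction_error": e.var(ddof=1),
        "mean_square_error": (e ** 2).mean(),            # MSE
    }

# Toy usage with simulated data
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * X[:, 0] - X[:, 1]))))
print(cv_evaluation_stats(X, y))
```

The same error vector would be computed from the GEE or random-effects predictions so that the three models are compared on the same footing.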

3.
We present a simple, efficient, and computationally cheap sampling method for exploring an un-normalized multivariate density on ℝ^d, such as a posterior density, called the Polya tree sampler. The algorithm constructs an independent proposal based on an approximation of the target density. The approximation is built from a set of (initial) support points - data that act as parameters for the approximation - and the predictive density of a finite multivariate Polya tree. In an initial "warming-up" phase, the support points are iteratively relocated to regions of higher support under the target distribution to minimize the distance between the target distribution and the Polya tree predictive distribution. In the "sampling" phase, samples from the final approximating mixture of finite Polya trees are used as candidates, which are accepted with a standard Metropolis-Hastings acceptance probability. Several illustrations are presented, including comparisons of the proposed approach with Metropolis-within-Gibbs and the delayed-rejection adaptive Metropolis algorithm.
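The sampling phase is an independence Metropolis-Hastings update. The sketch below keeps that accept/reject logic but uses a generic fitted proposal object in place of the finite Polya tree predictive density, which is the paper's actual proposal; the target and proposal used here are placeholders.

```python
import numpy as np
from scipy.stats import multivariate_normal

def independence_mh(log_target, proposal, x0, n_iter=5000, seed=0):
    """Independence Metropolis-Hastings with a fixed, fitted proposal.

    `proposal` must expose .rvs(rng) -> candidate and .logpdf(x) -> float.
    In the Polya tree sampler the proposal would be the Polya tree predictive
    density fitted to the support points; here it is generic.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, float)
    lp_x, lq_x = log_target(x), proposal.logpdf(x)
    samples, accepted = [], 0
    for _ in range(n_iter):
        y = proposal.rvs(rng)                        # candidate drawn independently of x
        lp_y, lq_y = log_target(y), proposal.logpdf(y)
        # acceptance probability: min(1, pi(y) q(x) / (pi(x) q(y)))
        if np.log(rng.uniform()) < (lp_y - lp_x) + (lq_x - lq_y):
            x, lp_x, lq_x = y, lp_y, lq_y
            accepted += 1
        samples.append(x)
    return np.array(samples), accepted / n_iter

# Toy usage: a Gaussian stand-in proposal targeting a correlated 2-D Gaussian
class GaussianProposal:
    def __init__(self, mean, cov):
        self.dist = multivariate_normal(mean, cov)
    def rvs(self, rng):
        return self.dist.rvs(random_state=rng)
    def logpdf(self, x):
        return self.dist.logpdf(x)

target = multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]])
draws, rate = independence_mh(target.logpdf, GaussianProposal([0, 0], 2 * np.eye(2)), [0.0, 0.0])
print("acceptance rate:", rate)
```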

4.
5.
Multiblock component methods are applied to data sets in which several blocks of variables are measured on the same set of observations, with the goal of analyzing the relationships between these blocks of variables. In this article, we focus on multiblock component methods that integrate the information found in several blocks of explanatory variables in order to describe and explain one set of dependent variables. Multiblock PLS and multiblock redundancy analysis are chosen as particular cases of multiblock component methods in which one set of variables is explained by a set of predictor variables organized into blocks. Because these multiblock techniques assume that the observations come from a homogeneous population, they will provide suboptimal results when the observations actually come from different populations. A strategy to alleviate this problem, presented in this article, is to use a technique such as clusterwise regression to identify homogeneous clusters of observations. This approach creates two new methods that provide clusters with their own sets of regression coefficients. The combination of clustering and regression improves the overall quality of the prediction and facilitates the interpretation. In addition, the minimization of a well-defined criterion by means of a sequential algorithm ensures that the algorithm converges monotonically. Finally, the proposed method is distribution-free and can be used when the explanatory variables outnumber the observations within clusters. The proposed clusterwise multiblock methods are illustrated with a simulation study and a (simulated) example from marketing.
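A minimal sketch of the clusterwise principle, with ordinary least squares standing in for the multiblock PLS / redundancy-analysis components (so this is not the authors' algorithm): observations are reassigned to the cluster whose coefficients fit them best, the cluster-specific regressions are refit, and the within-cluster residual criterion cannot increase, which is the source of the monotone convergence mentioned above.

```python
import numpy as np

def clusterwise_ols(X, Y, n_clusters=2, n_iter=100, seed=0):
    """Alternate per-cluster least-squares fits with reassignment of every
    observation to the cluster whose coefficients give the smallest squared
    residual; the total residual criterion decreases across full passes
    (up to the re-seeding guard for degenerate clusters)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])      # intercept + explanatory variables
    Y = Y.reshape(n, -1)                       # allow several dependent variables
    labels = rng.integers(n_clusters, size=n)
    for _ in range(n_iter):
        coefs = []
        for k in range(n_clusters):
            idx = np.flatnonzero(labels == k)
            if idx.size < Xd.shape[1] + 1:      # re-seed empty or tiny clusters
                idx = rng.choice(n, size=Xd.shape[1] + 1, replace=False)
            beta, *_ = np.linalg.lstsq(Xd[idx], Y[idx], rcond=None)
            coefs.append(beta)
        # squared residual of every observation under every cluster's coefficients
        resid = np.stack([((Y - Xd @ b) ** 2).sum(axis=1) for b in coefs])
        new_labels = resid.argmin(axis=0)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, coefs
```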

6.
Clustering is often useful for analyzing and summarizing information within large datasets. Model-based clustering methods have been found to be effective for determining the number of clusters, dealing with outliers, and selecting the best clustering method in datasets that are small to moderate in size. For large datasets, current model-based clustering methods tend to be limited by memory and time requirements and the increasing difficulty of maximum likelihood estimation. They may fit too many clusters in some portions of the data and/or miss clusters containing relatively few observations. We propose an incremental approach for data that can be processed as a whole in memory, which is relatively efficient computationally and has the ability to find small clusters in large datasets. The method starts by drawing a random sample of the data, selecting and fitting a clustering model to the sample, and extending the model to the full dataset by additional EM iterations. New clusters are then added incrementally, initialized with the observations that are poorly fit by the current model. We demonstrate the effectiveness of this method by applying it to simulated data, and to image data where its performance can be assessed visually.
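A rough Python sketch of the incremental strategy using scikit-learn Gaussian mixtures (the paper's model selection over covariance structures is not reproduced, and the 2% "poorly fit" cutoff is an assumed choice): fit a mixture to a random sample, extend it to the full data with further EM iterations, then repeatedly add a component initialized at the observations the current model fits worst, keeping it only if BIC improves.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def incremental_gmm(X, n_start=3, n_add=2, sample_frac=0.1, seed=0):
    rng = np.random.default_rng(seed)
    sample = X[rng.choice(len(X), size=int(sample_frac * len(X)), replace=False)]

    # 1) fit an initial mixture to the sample, 2) extend it to the full data set
    gmm = GaussianMixture(n_components=n_start, random_state=seed).fit(sample)
    gmm = GaussianMixture(n_components=n_start, random_state=seed,
                          means_init=gmm.means_, weights_init=gmm.weights_).fit(X)

    # 3) repeatedly add a component seeded at the poorly fit observations
    for _ in range(n_add):
        log_dens = gmm.score_samples(X)                    # per-observation log-likelihood
        worst = X[log_dens < np.quantile(log_dens, 0.02)]  # worst-fit 2% of the data
        means = np.vstack([gmm.means_, worst.mean(axis=0)])
        cand = GaussianMixture(n_components=means.shape[0], random_state=seed,
                               means_init=means).fit(X)
        if cand.bic(X) < gmm.bic(X):     # keep the extra cluster only if BIC improves
            gmm = cand
        else:
            break
    return gmm
```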

7.
A clustering method is presented for analysing multivariate binary data with missing values. When all values are observed, Govaert [3] studied the relations between clustering methods and statistical models, showing how the identification of a mixture of Bernoulli distributions with the same parameter for all clusters and all variables corresponds to a clustering criterion based on the L1 distance, which characterizes the MNDBIN method (Marchetti [8]). He first generalized this model by allowing the parameters to depend on the variables, and then by allowing them to depend on both the variables and the clusters. We use these models to derive a clustering method adapted to missing data. The method optimizes a criterion by a standard iterative partitioning algorithm, which removes the need either to discard objects or to impute the missing values. We study several versions of this algorithm and, finally, give a brief account of its application to simulated data.
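A compact EM sketch for a mixture of Bernoulli distributions in which missing entries (coded here as NaN) are simply skipped in both the E-step and the M-step, so that no object is discarded and no value is substituted. This is an illustrative reading of the model class discussed above, with cluster- and variable-dependent parameters, not the MNDBIN criterion itself.

```python
import numpy as np

def bernoulli_mixture_em(X, n_clusters, n_iter=200, seed=0, eps=1e-6):
    """EM for a Bernoulli mixture; X is 0/1 with np.nan marking missing cells."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    obs = ~np.isnan(X)                       # mask of observed cells
    Xf = np.nan_to_num(X)                    # NaN -> 0, only used where obs is True
    pi = np.full(n_clusters, 1.0 / n_clusters)
    theta = rng.uniform(0.25, 0.75, size=(n_clusters, d))
    for _ in range(n_iter):
        # E-step: log-responsibilities computed over observed coordinates only
        log_r = np.tile(np.log(pi), (n, 1))
        for k in range(n_clusters):
            ll = Xf * np.log(theta[k] + eps) + (1 - Xf) * np.log(1 - theta[k] + eps)
            log_r[:, k] += np.where(obs, ll, 0.0).sum(axis=1)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted frequencies over observed cells only
        pi = r.mean(axis=0)
        for k in range(n_clusters):
            w = r[:, k][:, None] * obs
            theta[k] = (r[:, k][:, None] * Xf * obs).sum(axis=0) / (w.sum(axis=0) + eps)
    return r.argmax(axis=1), pi, theta
```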

8.
In this paper, we discuss Bayesian joint quantile regression for mixed-effects models with censored responses and covariate measurement error, using Markov chain Monte Carlo methods. Under the assumption of an asymmetric Laplace error distribution, we establish a Bayesian hierarchical model and derive the posterior distributions of all unknown parameters via a Gibbs sampling algorithm. Three cases, namely a multivariate normal distribution and two heavy-tailed distributions, are considered for the random effects of the mixed-effects models. Finally, Monte Carlo simulations are performed and the proposed procedure is illustrated by analyzing an AIDS clinical data set.
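For orientation, the asymmetric Laplace working likelihood that underlies Bayesian quantile regression can be written as follows (standard background, not specific to this paper's treatment of censoring or measurement error):

```latex
f(y \mid \mu, \sigma, p) \;=\; \frac{p(1-p)}{\sigma}
  \exp\!\Bigl\{-\rho_p\!\Bigl(\frac{y-\mu}{\sigma}\Bigr)\Bigr\},
\qquad
\rho_p(u) \;=\; u\,\bigl(p - \mathbf{1}\{u<0\}\bigr)
```

With \mu_i = x_i^{\top}\beta, maximizing the working likelihood over \beta is equivalent to minimizing the usual quantile check loss \sum_i \rho_p(y_i - x_i^{\top}\beta), which is why the asymmetric Laplace assumption is the standard device for Bayesian quantile regression.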

9.
In recent years, many clustering methods have been proposed to extract information from networks. The principle is to look for groups of vertices with homogeneous connection profiles. Most of these techniques are suitable for static networks, that is, they do not take the temporal dimension into account. This work is motivated by the need to analyze evolving networks for which a decomposition into subgraphs is given. We therefore consider the random subgraph model (RSM), which was proposed recently to model networks through latent clusters built within known partitions. Using a state-space model to characterize the cluster proportions, RSM is extended to deal with dynamic networks; we call the result the dynamic random subgraph model (dRSM). A variational expectation-maximization (VEM) algorithm is proposed to perform inference. We show that the variational approximations lead to an update step involving a new state-space model, from which the parameters and the hidden states can be estimated using the standard Kalman filter and Rauch–Tung–Striebel smoother. Simulated data sets are used to assess the proposed methodology. Finally, dRSM and the corresponding VEM algorithm are applied to an original maritime network built from printed Lloyd's voyage records.
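The state-space computation mentioned above reduces to a standard linear-Gaussian Kalman filter followed by a Rauch-Tung-Striebel backward pass. A generic Python sketch is given below; the system matrices A, C, Q, R and the observations are placeholders, not the dRSM-specific quantities produced by the variational approximation.

```python
import numpy as np

def kalman_filter_rts(y, A, C, Q, R, x0, P0):
    """Kalman filter + RTS smoother for x_t = A x_{t-1} + w_t, y_t = C x_t + v_t."""
    T, dx = len(y), len(x0)
    xf = np.zeros((T, dx)); Pf = np.zeros((T, dx, dx))   # filtered estimates
    xp = np.zeros((T, dx)); Pp = np.zeros((T, dx, dx))   # one-step predictions
    x, P = np.asarray(x0, float), np.asarray(P0, float)
    for t in range(T):
        # predict (the prior x0, P0 serves as the first prediction)
        x_pred = A @ x if t > 0 else x
        P_pred = A @ P @ A.T + Q if t > 0 else P
        # update with observation y[t]
        S = C @ P_pred @ C.T + R
        K = P_pred @ C.T @ np.linalg.inv(S)
        x = x_pred + K @ (y[t] - C @ x_pred)
        P = (np.eye(dx) - K @ C) @ P_pred
        xp[t], Pp[t], xf[t], Pf[t] = x_pred, P_pred, x, P
    # Rauch-Tung-Striebel backward smoothing
    xs, Ps = xf.copy(), Pf.copy()
    for t in range(T - 2, -1, -1):
        J = Pf[t] @ A.T @ np.linalg.inv(Pp[t + 1])
        xs[t] = xf[t] + J @ (xs[t + 1] - xp[t + 1])
        Ps[t] = Pf[t] + J @ (Ps[t + 1] - Pp[t + 1]) @ J.T
    return xs, Ps
```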

10.
Multivariate failure time data are very common in biomedical research. This paper proposes a general marginal semiparametric hazards regression model for analyzing such data. The model includes three commonly used marginal models as submodels: the marginal proportional hazards model, the marginal accelerated failure time model, and the marginal accelerated hazards model. The regression coefficients are estimated by the estimating-equation approach, and an estimator of the baseline cumulative hazard function is also given. The resulting estimators are shown to be consistent and asymptotically normal.
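One common way to write a general hazards regression family that nests these three submodels (shown for orientation only; the paper's marginal, cluster-indexed formulation is not reproduced) is

```latex
\lambda(t \mid Z) \;=\; \lambda_0\!\bigl(t\, e^{\beta_1^{\top} Z}\bigr)\, e^{\beta_2^{\top} Z}
```

which reduces to the proportional hazards model when \beta_1 = 0, the accelerated failure time model when \beta_1 = \beta_2, and the accelerated hazards model when \beta_2 = 0.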

11.
An essential feature of longitudinal data is the autocorrelation among observations from the same unit or subject. Two-stage random-effects linear models are commonly used to analyze longitudinal data, but they are not flexible enough for exploring the underlying data structure and, especially, for describing time trends. Semi-parametric models have been proposed recently to accommodate general time trends, but they do not provide a convenient way to explore interactions between time and other covariates, although such interactions exist in many applications. Moreover, semi-parametric models require specifying the design matrix of the covariates (time excluded). We propose nonparametric models to resolve these issues. To fit them, we use multivariate adaptive regression splines to estimate the mean curve and then apply an EM-like iterative procedure for covariance estimation. After giving a general model-building algorithm, we show how to design a fast algorithm. Both simulated and published data are used to illustrate the proposed method.

12.
A random forest (RF) predictor is an ensemble of individual tree predictors. As part of their construction, RF predictors naturally lead to a dissimilarity measure between the observations. One can also define an RF dissimilarity measure between unlabeled data: the idea is to construct an RF predictor that distinguishes the “observed” data from suitably generated synthetic data. The observed data are the original unlabeled data and the synthetic data are drawn from a reference distribution. Here we describe the properties of the RF dissimilarity and make recommendations on how to use it in practice.

An RF dissimilarity can be attractive because it handles mixed variable types well, is invariant to monotonic transformations of the input variables, and is robust to outlying observations. The RF dissimilarity easily deals with a large number of variables due to its intrinsic variable selection; for example, the Addcl 1 RF dissimilarity weighs the contribution of each variable according to how dependent it is on other variables.

We find that the RF dissimilarity is useful for detecting tumor sample clusters on the basis of tumor marker expressions. In this application, biologically meaningful clusters can often be described with simple thresholding rules.
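A minimal scikit-learn sketch, roughly in the spirit of the Addcl 1 construction described above (assumptions: synthetic data are generated by independently permuting each column so the joint dependence is destroyed, and the dissimilarity is one minus the leaf-sharing proximity):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rf_dissimilarity(X, n_trees=500, seed=0):
    rng = np.random.default_rng(seed)
    # synthetic data: each column permuted independently (product of the marginals)
    X_syn = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
    Z = np.vstack([X, X_syn])
    labels = np.r_[np.ones(len(X)), np.zeros(len(X_syn))]   # observed vs synthetic
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed).fit(Z, labels)
    leaves = forest.apply(X)          # (n_obs, n_trees) terminal-node indices
    # proximity = fraction of trees in which two observations land in the same leaf
    # (O(n^2 * n_trees) memory; fine for moderate n)
    prox = np.mean(leaves[:, None, :] == leaves[None, :, :], axis=2)
    return 1.0 - prox                 # dissimilarity matrix for PAM, MDS, etc.
```

The square root of one minus the proximity is sometimes used instead before multidimensional scaling; either version can be fed to a standard clustering routine.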

13.
This paper develops a Bayesian approach to analyzing quantile regression models for censored dynamic panel data. We employ a likelihood-based approach using the asymmetric Laplace error distribution and introduce lagged observed responses into the conditional quantile function. We also address the initial-conditions problem in dynamic panel data models by introducing correlated random effects. For posterior inference, we propose a Gibbs sampling algorithm based on a location-scale mixture representation of the asymmetric Laplace distribution. The mixture representation provides fully tractable conditional posterior densities and considerably simplifies existing estimation procedures for quantile regression models. In addition, we explain how the proposed Gibbs sampler can be used to calculate the marginal likelihood and to obtain modal estimates. Our approach is illustrated with real data on medical expenditures.
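The location-scale mixture referred to above is commonly the representation epsilon = theta*z + tau*sqrt(z)*u, with z exponential and u standard normal, whose p-th quantile is zero; conditioning on z is what makes the Gibbs updates tractable. A quick simulation check of that representation (a standard result, stated here under the unit-scale parameterization, which may differ from the paper's):

```python
import numpy as np

def ald_draws(p, n=200_000, seed=0):
    """Draw asymmetric-Laplace errors via the location-scale mixture representation."""
    rng = np.random.default_rng(seed)
    theta = (1 - 2 * p) / (p * (1 - p))   # location coefficient of the latent weight
    tau2 = 2 / (p * (1 - p))              # scale coefficient of the latent weight
    z = rng.exponential(1.0, n)           # latent exponential mixing variables
    u = rng.standard_normal(n)
    return theta * z + np.sqrt(tau2 * z) * u

for p in (0.25, 0.5, 0.9):
    eps = ald_draws(p)
    # the p-th empirical quantile of the error should be approximately 0
    print(p, round(float(np.quantile(eps, p)), 3))
```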

14.
Treed Regression     
Given a data set consisting of n observations on p independent variables and a single dependent variable, treed regression creates a binary tree with a simple linear regression function at each of the leaves. Each internal node of the tree consists of an inequality condition on one of the independent variables. The tree is generated from the training data by a recursive partitioning algorithm. Treed regression models are more parsimonious than CART models because there are fewer splits. Additionally, monotonicity in some or all of the variables can be imposed.
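A stripped-down sketch of the recursive construction (illustrative only; it uses an assumed quantile grid of candidate thresholds and omits the monotonicity constraints and any pruning): each candidate split is scored by the residual sum of squares of simple linear fits in the two children, and the best split is kept only if it improves on the unsplit leaf.

```python
import numpy as np

def fit_leaf(X, y):
    """Simple linear regression (with intercept) at a leaf; returns (coef, SSE)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return beta, float(resid @ resid)

def grow_tree(X, y, min_leaf=20, depth=0, max_depth=3):
    beta, sse = fit_leaf(X, y)
    node = {"coef": beta, "n": len(y)}
    if depth == max_depth or len(y) < 2 * min_leaf:
        return node
    best = None
    for j in range(X.shape[1]):                           # candidate split variable
        for thr in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
            left = X[:, j] <= thr                         # inequality condition
            if left.sum() < min_leaf or (~left).sum() < min_leaf:
                continue
            _, sse_l = fit_leaf(X[left], y[left])
            _, sse_r = fit_leaf(X[~left], y[~left])
            if best is None or sse_l + sse_r < best[0]:
                best = (sse_l + sse_r, j, thr, left)
    if best is None or best[0] >= sse:                    # no split improves the fit
        return node
    _, j, thr, left = best
    node.update(var=j, thr=thr,
                left=grow_tree(X[left], y[left], min_leaf, depth + 1, max_depth),
                right=grow_tree(X[~left], y[~left], min_leaf, depth + 1, max_depth))
    return node

def predict(node, x):
    while "var" in node:                                  # descend the inequality conditions
        node = node["left"] if x[node["var"]] <= node["thr"] else node["right"]
    return float(np.r_[1.0, x] @ node["coef"])
```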

15.
16.
We use the local maxima of a redescending M-estimator to identify clusters, a method already proposed by Morgenthaler (in: H.D. Lawrence, S. Arthur (Eds.), Robust Regression, Dekker, New York, 1990, pp. 105–128) for finding regression clusters. We work out the method not only for classical regression but also for orthogonal regression and multivariate location, and show that all three approaches are special cases of a general approach that also covers other clustering problems. For the general case we show consistency for an asymptotic objective function which generalizes the density in the multivariate case. The orthogonal-regression variant is applied to the identification of edges in noisy images.
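For the multivariate-location case, a generic sketch of the idea (not the authors' implementation; the tuning constant c and the merging tolerance are assumed choices): iterate the weighted-mean fixed-point equation of a redescending M-estimator, here with Tukey biweight weights, from every observation, and treat the distinct fixed points, i.e. the local maxima of the associated objective, as cluster centres.

```python
import numpy as np

def biweight_location(X, start, c=2.5, tol=1e-6, max_iter=200):
    """Iterate mu <- sum(w_i x_i) / sum(w_i) with redescending Tukey biweight weights."""
    mu = np.asarray(start, float)
    for _ in range(max_iter):
        d = np.linalg.norm(X - mu, axis=1) / c
        w = np.where(d < 1, (1 - d ** 2) ** 2, 0.0)   # weight vanishes outside radius c
        if w.sum() == 0:
            return mu
        new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(new - mu) < tol:
            return new
        mu = new
    return mu

def m_estimator_clusters(X, c=2.5, merge_tol=1e-2):
    """Start the iteration at every observation and keep the distinct fixed points."""
    centres = []
    for x in X:
        m = biweight_location(X, x, c=c)
        if not any(np.linalg.norm(m - z) < merge_tol for z in centres):
            centres.append(m)
    labels = np.array([int(np.argmin([np.linalg.norm(x - z) for z in centres])) for x in X])
    return np.array(centres), labels
```

Because the biweight weight function is zero beyond the radius c, distant points do not pull the iteration away from a local mode, which is what allows one estimator to reveal several clusters.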

17.
Multivariate adaptive regression splines (MARS) is a popular nonparametric regression tool often used for prediction and for uncovering important data patterns between the response and predictor variables. The standard MARS algorithm assumes that responses are normally distributed and independent, but in this article we relax both of these assumptions by extending MARS to generalized estimating equations. We refer to this MARS-for-GEEs algorithm as "MARGE." Our algorithm makes use of fast forward selection techniques, so that in the univariate case MARGE has computation speed similar to a standard MARS implementation. Through simulation we show that the proposed algorithm has better predictive performance than the original MARS algorithm when using correlated and/or non-normal response data. MARGE is also competitive with alternatives in the literature, especially for problems with multiple interacting predictors. We apply MARGE to various ecological examples with different data types. Supplementary material for this article is available online.

18.
Given a set of data, very little is known about tests to determine the number of clusters and/or the elements of the clusters. Even in the simplest case of detecting between only one or two clusters with multivariate normal data, the number of tests needed seems, in theory, to be infinite. Alternatively, suppose N independent estimates of generalized variances (GVs) are computed from a given set of p-dimensional vector observations. Assuming multivariate normality, tests based on GVs are proposed which objectively and uniquely determine, simultaneously, the number of clusters and their corresponding elements. Only a reasonably small number of tests are required for this stepwise procedure. The exact percentage points are either available from existing tables or can be computed from a result presented here.

19.
In this article, we propose and explore a multivariate logistic regression model for analyzing multiple binary outcomes with incomplete covariate data where auxiliary information is available. The auxiliary data are extraneous to the regression model of interest but predictive of the covariate with missing data. Horton and Laird [N.J. Horton, N.M. Laird, Maximum likelihood analysis of logistic regression models with incomplete covariate data and auxiliary information, Biometrics 57 (2001) 34–42] describe how the auxiliary information can be incorporated into a regression model for a single binary outcome with missing covariates, so that the efficiency of the regression estimators can be improved. We extend this method to a multivariate logistic regression model for multiple correlated outcomes with missing covariates and completely observed auxiliary information. We demonstrate that, in the case of moderate to strong associations among the multiple outcomes, one can achieve considerable gains in efficiency from estimators in a multivariate model as compared to the marginal estimators of the same parameters.

20.
Tree-structured models have been widely used because they function as interpretable prediction models and offer easy data visualization. A number of tree algorithms have been developed for univariate response data and can be extended to analyze multivariate response data. We propose a tree algorithm that combines the merits of a tree-based model and a mixed-effects model for longitudinal data. We alleviate variable selection bias through residual analysis, which addresses problems that exhaustive search approaches suffer from, such as undue preference for split variables with more possible splits, expensive computational cost, and the end-cut preference. Most importantly, our tree algorithm discovers trends over time on each of the subspaces obtained from recursive partitioning, whereas other tree algorithms only predict responses. We investigate the performance of our algorithm with both simulation and real data studies. We also develop an R package, melt, that can be used conveniently and freely. Additional results are provided as online supplementary material.
