Similar literature
1.
While graphical models for continuous data (Gaussian graphical models) and discrete data (Ising models) have been extensively studied, there is little work on graphical models for datasets with both continuous and discrete variables (mixed data), which are common in many scientific applications. We propose a novel graphical model for mixed data, which is simple enough to be suitable for high-dimensional data, yet flexible enough to represent all possible graph structures. We develop a computationally efficient regression-based algorithm for fitting the model by focusing on the conditional log-likelihood of each variable given the rest. The parameters have a natural group structure, and sparsity in the fitted graph is attained by incorporating a group lasso penalty, approximated by a weighted lasso penalty for computational efficiency. We demonstrate the effectiveness of our method through an extensive simulation study and apply it to a music annotation dataset (CAL500), obtaining a sparse and interpretable graphical model relating the continuous features of the audio signal to binary variables such as genre, emotions, and usage associated with particular songs. While we focus on binary discrete variables for the main presentation, we also show that the proposed methodology can be easily extended to general discrete variables.

2.
We consider the problem of learning the structure of a pairwise graphical model over continuous and discrete variables. We present a new pairwise model for graphical models with both continuous and discrete variables that is amenable to structure learning. In previous work, authors have considered structure learning of Gaussian graphical models and structure learning of discrete models. Our approach is a natural generalization of these two lines of work to the mixed case. The penalization scheme involves a novel symmetric use of the group-lasso norm and follows naturally from a particular parameterization of the model. Supplementary materials for this article are available online.

3.
Prediction of Euclidean distances with discrete and continuous outcomes   Total citations: 1, self-citations: 0, citations by others: 1
The objective of this paper is first to predict generalized Euclidean distances in the context of discrete and quantitative variables and then to derive their statistical properties. We first consider the simultaneous modelling of discrete and continuous random variables with covariates and obtain the likelihood. We derive an important property useful for its practical maximization. We then study the prediction of any Euclidean distance and its statistical properties, especially for the Mahalanobis distance. The quality of distance estimation is analyzed through simulations. These results are applied to our motivating example: the official distinction procedure of rapeseed varieties.

4.
In this paper we present a discrete survival model with covariates and random effects, where the random effects may depend on the observed covariates. The dependence between the covariates and the random effects is modelled through correlation parameters, and these parameters can only be identified for time-varying covariates. For time-varying covariates, however, it is possible to separate regression effects and selection effects in the case of a certain dependence structure between the random effects and the time-varying covariates that are assumed to be conditionally independent given the initial level of the covariate. The proposed model is equivalent to a model with independent random effects and the initial level of the covariates as further covariates. The model is applied to simulated data that illustrate some identifiability problems and further indicate how the proposed model may be an approximation to retrospectively collected data with incorrect specification of the waiting times. The model is fitted by maximum likelihood estimation that is implemented as iteratively reweighted least squares. © 1998 John Wiley & Sons, Ltd.

5.
This paper develops two copula models for fitting insurance claim numbers with excess zeros and time-dependence. The joint distribution of the claims in two successive periods is modeled by a copula with discrete or continuous marginal distributions. The first model fits two successive claims by a bivariate copula with discrete marginal distributions. In the second model, a copula is used to model the random effects of the conjoint numbers of successive claims with continuous marginal distributions. The zero-inflation phenomenon is taken into account in both copula models. Maximum likelihood is applied to estimate the parameters of the discrete copula model. A two-step procedure is proposed to estimate the parameters in the second model: the first step estimates the marginals, and the second step estimates the unobserved random effect variables and the copula parameter. Simulations are performed to assess the proposed models and methodologies.

6.
Probabilistic Decision Graphs (PDGs) are probabilistic graphical models that represent a factorisation of a discrete joint probability distribution using a “decision graph”-like structure over local marginal parameters. The structure of a PDG enables the model to capture some context specific independence relations that are not representable in the structure of more commonly used graphical models such as Bayesian networks and Markov networks. This sometimes makes operations in PDGs more efficient than in alternative models. PDGs have previously been defined only in the discrete case, assuming a multinomial joint distribution over the variables in the model. We extend PDGs to incorporate continuous variables, by assuming a Conditional Gaussian (CG) joint distribution. We also show how inference can be carried out in an efficient way.

7.
Suppose that the failure times of the units placed on a life-testing experiment are independent but nonidentically distributed random variables. Under a progressive type II censoring scheme, distributional properties of the proposed random variables are presented and some inferences are made. Assuming that the random variables come from a proportional hazard rate model, the formulas are simplified and the amount of Fisher information about the common parameters of this family is calculated. The results are also extended to a fixed covariates model. The performance of the proposed procedure is investigated via a real data set. Some numerical computations are also presented to study the effect of the proportionality rates in view of the Fisher information criterion. Finally, some concluding remarks are stated.

8.
We propose a probability model for random partitions in the presence of covariates. In other words, we develop a model-based clustering algorithm that exploits available covariates. The motivating application is predicting time to progression for patients in a breast cancer trial. We proceed by reporting a weighted average of the responses of clusters of earlier patients. The weights should be determined by the similarity of the new patient’s covariate with the covariates of patients in each cluster. We achieve the desired inference by defining a random partition model that includes a regression on covariates. Patients with similar covariates are a priori more likely to be clustered together. Posterior predictive inference in this model formalizes the desired prediction.

We build on product partition models (PPM). We define an extension of the PPM to include a regression on covariates by including in the cohesion function a new factor that increases the probability of experimental units with similar covariates to be included in the same cluster. We discuss implementations suitable for any combination of continuous, categorical, count, and ordinal covariates.

An implementation of the proposed model as an R package is available for download.

9.
Various random effects models have been developed for clustered binary data; however, traditional approaches to these models generally rely heavily on the specification of a continuous random effect distribution such as a Gaussian or beta distribution. In this article, we introduce a new model that incorporates nonparametric unobserved random effects on the unit interval (0,1) into logistic regression multiplicatively with fixed effects. This new multiplicative model setup facilitates prediction of our nonparametric random effects and corresponding model interpretations. A distinctive feature of our approach is that a closed-form expression has been derived for the predictor of nonparametric random effects on the unit interval (0,1) in terms of known covariates and responses. A quasi-likelihood approach has been developed for the estimation of our model. Our results are robust against random effects distributions ranging from very discrete binary to continuous beta distributions. We illustrate our method by analyzing recent large stock crash data in China. The performance of our method is also evaluated through simulation studies.

10.
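Wang Xuewu (王学武). Mathematica Applicata (《应用数学》), 2012, 25(1): 105-109
This paper introduces the concept of the discrete exponential distribution and establishes strong deviation theorems and a strong law of large numbers for sequences of discrete exponential distributions. It also obtains a strong approximation of sequences of continuous exponential distributions by sequences of discrete exponential distributions.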

11.
Suppose that cause-effect relationships between variables can be described by a causal network with a linear structural equation model. Kuroki and Miyakawa proposed a graphical criterion for selecting covariates to identify the effect of a conditional plan with one control variable [J. Roy. Statist. Soc. Ser. B, 2003, 65: 209–222]. In this paper, we study a particular type of conditional plan with more than one control variable and propose a graphical criterion for selecting covariates to identify the effect of a conditional plan of the studied type.

12.
This paper presents a new algorithm for learning the structure of a special type of Bayesian network. The conditional phase-type (C-Ph) distribution is a Bayesian network that models the probabilistic causal relationships between a skewed continuous variable, modelled by the Coxian phase-type distribution, a special type of Markov model, and a set of interacting discrete variables. The algorithm takes a data set as input and produces the structure, parameters and graphical representations of the fit of the C-Ph distribution as output. The algorithm, which uses a greedy-search technique and has been implemented in MATLAB, is evaluated using a simulated data set consisting of 20,000 cases. The results show that the original C-Ph distribution is recaptured, and the fit of the network to the data is discussed.

13.
In this article we study a semiparametric generalized partially linear model when the covariates are missing at random. We propose combining local linear regression with the local quasilikelihood technique and a weighted estimating equation to estimate the parametric and nonparametric components when the missing probability is known or unknown. We establish normality of the estimators of the parameter and an asymptotic expansion for the estimators of the nonparametric part. We apply the proposed models and methods to a study of the relation between virologic and immunologic responses in AIDS clinical trials, in which virologic response is classified into binary variables. We also give simulation results to illustrate our approach.

14.
Cycle-transitive comparison of independent random variables   Total citations: 2, self-citations: 0, citations by others: 2
The discrete dice model, previously introduced by the present authors, essentially amounts to the pairwise comparison of a collection of independent discrete random variables that are uniformly distributed on finite integer multisets. This pairwise comparison results in a probabilistic relation that exhibits a particular type of transitivity, called dice-transitivity. In this paper, the discrete dice model is generalized to allow the pairwise comparison of independent discrete or continuous random variables with arbitrary probability distributions. It is shown that the probabilistic relation generated by a collection of arbitrary independent random variables is still dice-transitive. Interestingly, this probabilistic relation can be seen as a graded alternative to the concept of stochastic dominance. Furthermore, when the marginal distributions of the random variables belong to the same parametric family of distributions, the probabilistic relation exhibits interesting types of isostochastic transitivity, such as multiplicative transitivity. Finally, the probabilistic relation generated by a collection of independent normal random variables is proven to be moderately stochastic transitive.
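
As a rough illustration of the pairwise comparison described in this abstract, the following Python sketch computes a winning probability between two dice given as integer multisets. The scoring rule Q(X, Y) = P(X > Y) + 0.5 P(X = Y), the function name win_probability, and the three example dice are assumptions made for illustration; the abstract itself does not spell them out.

```python
from fractions import Fraction
from itertools import product


def win_probability(die_a, die_b):
    """Return P(A > B) + 0.5 * P(A = B) for independent uniform draws.

    die_a and die_b are finite integer multisets given as lists.
    The tie-splitting rule is an assumption, not taken from the abstract.
    """
    score = Fraction(0)
    for a, b in product(die_a, die_b):
        if a > b:
            score += 1
        elif a == b:
            score += Fraction(1, 2)
    return score / (len(die_a) * len(die_b))


# Hypothetical example dice chosen so that the comparison is cyclic.
DIE_A = [2, 4, 9]
DIE_B = [1, 6, 8]
DIE_C = [3, 5, 7]

if __name__ == "__main__":
    print(win_probability(DIE_A, DIE_B))  # A against B
    print(win_probability(DIE_B, DIE_C))  # B against C
    print(win_probability(DIE_C, DIE_A))  # C against A
```

Each die in this example beats the next one with probability above one half, so no die dominates the others; this is the kind of cyclic behaviour that a graded probabilistic relation can describe and a strict ordering cannot.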

15.
The cluster-weighted model (CWM) is a mixture model with random covariates that allows for flexible clustering/classification and distribution estimation of a random vector composed of a response variable and a set of covariates. Within this class of models, the generalized linear exponential CWM is introduced here, especially for modeling mixed-type bivariate data. Its natural counterpart in the family of latent class models is also defined. Maximum likelihood parameter estimates are derived using the expectation-maximization algorithm and some computational issues are detailed. Through Monte Carlo experiments, the classification performance of the proposed model is compared with other mixture-based approaches, consistency of the estimators of the regression coefficients is evaluated, and several likelihood-based information criteria are compared for selecting the number of mixture components. Finally, an application to real data is considered.

16.
The kinematics of a projectile in flight provides an ideal opportunity for an introduction to (and a comparison between) discrete and continuous methods in applied mathematics. We use a graphical method in the discrete approach, which provides good physical insight and serves as an introduction to finite difference methods. The continuous approach is better in the no‐drag case, but the discrete approach is found to be more effective when a nonlinear drag effect is included in the model.
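
The contrast between the two approaches can be sketched in Python as follows. This is a minimal illustration, not the graphical method of the abstract: it uses explicit Euler time steps as the discrete approach and the textbook no-drag range formula as the continuous one. The function names, the step size, and the quadratic drag coefficient `drag` (per unit mass) are all assumptions.

```python
import math


def closed_form_range(speed, angle_deg, g=9.81):
    """Continuous approach: no-drag range v^2 * sin(2*theta) / g on level ground."""
    return speed ** 2 * math.sin(2.0 * math.radians(angle_deg)) / g


def euler_range(speed, angle_deg, drag=0.0, g=9.81, dt=1e-3):
    """Discrete approach: explicit Euler steps until the projectile lands.

    drag is a hypothetical quadratic drag coefficient per unit mass;
    drag=0.0 reproduces the no-drag case.
    """
    angle = math.radians(angle_deg)
    x, y = 0.0, 0.0
    vx, vy = speed * math.cos(angle), speed * math.sin(angle)
    while True:
        v = math.hypot(vx, vy)
        ax = -drag * v * vx
        ay = -g - drag * v * vy
        x_new, y_new = x + vx * dt, y + vy * dt
        if y_new < 0.0:
            # Linear interpolation to the point where the height crosses zero.
            frac = y / (y - y_new)
            return x + frac * (x_new - x)
        x, y = x_new, y_new
        vx, vy = vx + ax * dt, vy + ay * dt


if __name__ == "__main__":
    print(closed_form_range(20.0, 45.0))
    print(euler_range(20.0, 45.0))
    print(euler_range(20.0, 45.0, drag=0.02))
```

Without drag the closed form is exact and the stepping scheme only approximates it; once the nonlinear drag term is switched on there is no comparably simple formula, while the discrete loop needs no change.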

17.
We study partial linear single index models when the response and the covariates in the parametric part are measured with errors and distorted by unknown functions of commonly observable confounding variables, and propose a semiparametric covariate-adjusted estimation procedure. We apply the minimum average variance estimation method to estimate the parameters of interest. This is different from all existing covariate-adjusted methods in the literature. Asymptotic properties of the proposed estimators are established. Moreover, we also study variable selection by adopting the coordinate-independent sparse estimation to select all relevant but distorted covariates in the parametric part. We show that the resulting sparse estimators can exclude all irrelevant covariates with probability approaching one. A simulation study is conducted to evaluate the performance of the proposed methods and a real data set is analyzed for illustration.

18.
Degradation data have been widely used to estimate product reliability. Because of technology advancement, time‐varying usage and environmental variables, which are called dynamic covariates, can be easily recorded nowadays, in addition to the traditional degradation measurements. The use of dynamic covariates is appealing because they have the potential to explain more variability in degradation paths. We propose a class of general path models to incorporate dynamic covariates for modeling of degradation paths. Physically motivated nonlinear functions are used to describe the degradation paths, and random effects are used to describe unit‐to‐unit variability. The covariate effects are modeled by shape‐restricted splines. The estimation of unknown model parameters is challenging because of the involvement of nonlinear relationships, random effects, and shape‐restricted splines. We develop an efficient procedure for parameter estimation. The performance of the proposed method is evaluated by simulations. An outdoor coating weathering dataset is used to illustrate the proposed method. Copyright © 2015 John Wiley & Sons, Ltd.

19.
In practice, quality characteristics do not always follow a normal distribution, and quality control processes sometimes generate non‐normal response outcomes, including continuous non‐normal data and discrete count data. Thus, achieving better results in such situations requires a new control chart derived from various types of response variables. This study proposes a procedure for monitoring response variables that uses control charts based on randomized quantile residuals obtained from a fitted regression model. Simulation studies demonstrate the performance of the proposed control charts under various situations. We illustrate the procedure using two real‐data examples, based on normal and negative binomial regression models, respectively. The simulation and real‐data results support our proposed procedure.

20.
A discrete time Markov chain assumes that the population is homogeneous: each individual in the population evolves according to the same transition matrix. In contrast, a discrete mover‐stayer (MS) model postulates a simple form of population heterogeneity; in each initial state, there is a proportion of individuals who never leave this state (stayers) and the complementary proportion of individuals who evolve according to a Markov chain (movers). The MS model was extended by specifying the stayer's probability to be a logistic function of an individual's covariates but leaving the same transition matrix for all movers. We further extend the MS model by allowing each mover to have her/his own covariate‐dependent transition matrix. The model for a mover's transition matrix is related to the extant Markov chains mixture model with mixing on the speed of movement of Markov chains. The proposed model is estimated using the expectation‐maximization algorithm and illustrated with a large data set on car loans and a simulation.
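
The basic mover-stayer structure described above, before any covariates are added, can be sketched in Python. The t-step transition probabilities are written here as a mixture: stayers keep their initial state, and movers follow the t-th power of a common transition matrix. The function names and the example numbers are assumptions for illustration; the covariate-dependent extensions of the abstract are not implemented.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def ms_transition(stayer_probs, mover_matrix, steps):
    """t-step transition matrix of a plain mover-stayer model.

    stayer_probs[i] is the proportion of stayers among individuals who
    start in state i; mover_matrix is the one-step matrix shared by all movers.
    """
    n = len(stayer_probs)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        power = mat_mul(power, mover_matrix)
    return [[(stayer_probs[i] if i == j else 0.0)
             + (1.0 - stayer_probs[i]) * power[i][j]
             for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Hypothetical two-state example: half of those starting in state 0 are stayers.
    for row in ms_transition([0.5, 0.0], [[0.5, 0.5], [0.5, 0.5]], steps=1):
        print(row)
```

Setting every stayer proportion to zero recovers an ordinary homogeneous Markov chain, which makes the added heterogeneity easy to see in small examples.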
