1.
A mixture approach to clustering is an important technique in cluster analysis. A mixture of multivariate multinomial distributions is usually used to analyze categorical data with a latent class model. Parameter estimation is an important step for a mixture distribution. Described here are four approaches to estimating the parameters of a mixture of multivariate multinomial distributions. The first approach is an extended maximum likelihood (ML) method. The second approach is based on the well-known expectation maximization (EM) algorithm. The third approach is the classification maximum likelihood (CML) algorithm. In this paper, we propose a new approach using the so-called fuzzy class model and then create the fuzzy classification maximum likelihood (FCML) approach for categorical data. The accuracy, robustness and effectiveness of these four types of algorithms for estimating the parameters of multivariate binomial mixtures are compared using real empirical data and samples drawn from two-class multivariate binomial mixtures. The results show that the proposed FCML algorithm achieves better accuracy, robustness and effectiveness. Overall, the FCML algorithm is superior to the ML, EM and CML algorithms. Thus, we recommend FCML as another good tool for estimating the parameters of multivariate multinomial mixture models.
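The second approach listed in this abstract, EM, can be sketched for the binary (multivariate binomial, i.e. Bernoulli) case used in the comparisons. This is a minimal sketch, assuming items are conditionally independent given the latent class (the usual latent class assumption); the function name `em_bernoulli_mixture`, the fixed iteration count, the random initialisation and the probability clipping are illustrative choices, not the paper's code, and the proposed FCML method is not shown.

```python
import math
import random


def em_bernoulli_mixture(data, k, n_iter=200, seed=0):
    """Fit a k-class mixture of independent Bernoulli vectors by EM.

    data: list of equal-length 0/1 lists. Returns (mixing weights,
    per-class probabilities that each binary item equals 1).
    Assumption: items are conditionally independent given the class.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    weights = [1.0 / k] * k
    # Random starting probabilities, kept away from 0 and 1.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    for _ in range(n_iter):
        # E-step: posterior class probabilities, computed in log space for stability.
        resp = []
        for x in data:
            logs = []
            for j in range(k):
                lp = math.log(weights[j])
                for v, p in zip(x, probs[j]):
                    lp += math.log(p if v else 1.0 - p)
                logs.append(lp)
            top = max(logs)
            w = [math.exp(l - top) for l in logs]
            s = sum(w)
            resp.append([wi / s for wi in w])
        # M-step: weighted proportions give new mixing weights and item probabilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            for t in range(d):
                pt = sum(r[j] * x[t] for r, x in zip(resp, data)) / nj
                probs[j][t] = min(max(pt, 1e-6), 1.0 - 1e-6)  # clip to keep logs finite
    return weights, probs
```

On binary data simulated from two well-separated classes, the fitted weights should land near the simulated class shares, with one class's item probabilities near the high simulated value and the other's near the low one. Because EM only reaches a local maximum, several random restarts are advisable in practice.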
2.
A bootstrap procedure, useful in latent class or more general mixture models, has been developed to determine the number of latent classes or components sufficient to account for systematic group differences in the data. The procedure is illustrated in the context of a multidimensional scaling latent class model, CLASCAL. Also presented is a bootstrap technique for determining standard errors for the estimates of the stimulus coordinates, which are parameters of the multidimensional scaling model. Real and artificial data are presented. The bootstrap procedure for selecting a sufficient number of classes appears to select the correct number of latent classes at both low and high error levels. At higher error levels it outperforms Hope's (J. Roy. Statist. Soc. Ser. B 1968; 30: 582) procedure. The bootstrap procedures for estimating parameter stability appear to reproduce the Monte Carlo results correctly. Copyright © 2002 John Wiley & Sons, Ltd.
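The standard-error part of this abstract rests on the ordinary nonparametric bootstrap: resample observations with replacement, recompute the estimate, and take the spread of the replicated estimates. The sketch below shows only that generic idea for an arbitrary estimator; the name `bootstrap_se`, the number of replicates and the choice to resample individual observations are assumptions, and refitting CLASCAL itself is not shown.

```python
import random
import statistics


def bootstrap_se(data, estimator, n_boot=2000, seed=0):
    """Nonparametric bootstrap standard error of estimator(data).

    Assumption: observations are i.i.d., so they are resampled one by one
    with replacement; the estimator is re-applied to each resample.
    """
    rng = random.Random(seed)
    n = len(data)
    reps = [estimator([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)]
    # The spread of the replicated estimates approximates the sampling SE.
    return statistics.stdev(reps)
```

For example, `bootstrap_se([0, 1] * 50, statistics.mean)` should come out close to the plug-in value sqrt(0.25/100) = 0.05 for the mean of 100 fair-coin outcomes.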
3.
In this paper, an ordinal multilevel latent Markov model based on separate random effects is proposed. In detail, two distinct second-level discrete effects are considered in the model, one affecting the initial probability vector and the other affecting the transition probability matrix of the first-level ordinal latent Markov process. To model these separate effects, we consider a bi-dimensional mixture specification that avoids unverifiable assumptions on the random-effect distribution and yields a two-way clustering of second-level units. Starting from a general model in which the two random effects are dependent, we also obtain the independence model as a special case. The proposal is applied to data on the physical health status of a sample of elderly residents grouped into nursing homes. A simulation study assessing the performance of the proposal is also included.
4.
For clustering objects, we often collect not only continuous variables but binary attributes as well. This paper proposes a model-based clustering approach for mixed binary and continuous variables, in which each binary attribute is generated by a latent continuous variable dichotomized at a suitable threshold value, and the scores of the latent variables are estimated from the binary data. In economics, such latent variables are called utility functions, and the assumption is that the binary attributes (the presence or absence of a public service or utility) are determined by low and high values of these functions. In genetics, the latent response is interpreted as the “liability” to develop a qualitative trait or phenotype. The estimated scores of the latent variables, together with the observed continuous variables, allow a multivariate Gaussian mixture model to be used for clustering instead of a mixture of discrete and continuous distributions. After describing the method, this paper presents results on both simulated and real data and compares the performance of the multivariate Gaussian mixture model with that of a mixture of joint multivariate and multinomial distributions. Results show that the former outperforms the latter for variables with different scales, both in terms of classification error rate and in reproducing the cluster means.
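The abstract does not say how the latent scores are estimated. One simple option, assumed here and not taken from the paper, treats each binary attribute as a standard normal latent variable cut at a threshold matched to the observed share of 1s, and scores each case by the conditional mean of the latent variable on its side of the threshold. The function name `latent_scores` is hypothetical.

```python
from statistics import NormalDist


def latent_scores(binary):
    """Replace a 0/1 column by scores of an assumed standard-normal latent variable.

    The threshold tau is chosen so that P(Z > tau) equals the observed share
    of 1s; each entry is scored by the conditional mean of Z on its side of tau.
    """
    z = NormalDist()  # standard normal latent variable (assumption)
    p1 = sum(binary) / len(binary)
    if not 0 < p1 < 1:
        raise ValueError("need both 0s and 1s to place a threshold")
    tau = z.inv_cdf(1 - p1)
    dens = z.pdf(tau)
    above = dens / (1 - z.cdf(tau))  # E[Z | Z > tau]
    below = -dens / z.cdf(tau)       # E[Z | Z <= tau]
    return tau, [above if b else below for b in binary]
```

With half 1s and half 0s the threshold is 0 and the two scores are about ±0.798 (that is, ±φ(0)/0.5). Scores of this kind could then be stacked with the continuous variables and passed to a Gaussian mixture fitter.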
6.
This paper proposes a new approach to dynamic market segmentation. Dynamic segmentation is of key importance in many markets where it is unrealistic to assume stationary segments, owing to the dynamics of consumers’ needs and product choices. The main goal of the study is to analyse the dynamic process of financial product ownership under the assumption of heterogeneous growth in different segments, taking into account significant determinants of the growth trajectories. Using data from 2002 to 2010 collected by the Survey of Household Income and Wealth conducted by the Bank of Italy, this article shows that the Italian market for financial products is segmented and that the trajectories of ownership over time are significantly influenced by the area of the country where the family lives and by the education and gender of the head of household.
7.
With high-dimensional data, the number of covariates is considerably larger than the sample size. We propose a sound method for analyzing these data. It performs clustering and variable selection simultaneously. The method is inspired by the plaid model. It may be seen as a multiplicative mixture model that allows for overlapping clustering. Unlike in conventional clustering, within this model an observation may be explained by several clusters. This characteristic makes it especially suitable for gene expression data. Parameter estimation is performed with the Monte Carlo expectation maximization algorithm and importance sampling. Using extensive simulations and comparisons with competing methods, we show the advantages of our methodology in terms of both variable selection and clustering. An application of our approach to gene expression data of kidney renal cell carcinoma from The Cancer Genome Atlas validates some previously identified cancer biomarkers.
8.
Block clustering aims to reveal homogeneous block structures in a data table. Among the different approaches to block clustering, we consider here a model-based method: the Gaussian latent block model for continuous data, which is an extension of the Gaussian mixture model for one-way clustering. For a given data table, several candidate models are usually examined, which differ, for example, in the number of clusters. Model selection then becomes a critical issue. To this end, we develop a criterion based on an approximation of the integrated classification likelihood for the Gaussian latent block model, and propose a Bayesian information criterion-like variant following the same pattern. We also propose a non-asymptotic exact criterion, thus circumventing the controversial definition of the asymptotic regime arising from the dual nature of the rows and columns in co-clustering. The experimental results show steady performance of these criteria for medium to large data tables.
9.
The finite mixture modeling approach is widely used for the analysis of bimodal or multimodal data that, in many situations, are observed individually. However, in some applications the analysis becomes substantially more challenging because the available data are grouped into categories. In this work, we assume that the observed data are grouped into distinct non-overlapping intervals and follow a finite mixture of normal distributions. For inference on the model parameters, we propose a parametric approach that accounts for the categorical features of the data. The main idea of our method is to impute the missing information on the original data within a Bayesian framework using Gibbs sampling. The proposed method was compared with the maximum likelihood approach, which uses the Expectation-Maximization algorithm to estimate the model parameters. It was also illustrated with an application to the Old Faithful geyser data.
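For interval-grouped data, each count contributes the probability mass that the normal mixture places on its interval. This is the grouped-data likelihood that a maximum likelihood (EM) fit of such data maximizes; the Bayesian Gibbs-sampling imputation proposed in the paper is not shown. A minimal sketch, with the hypothetical name `grouped_mixture_loglik` and intervals assumed to be half-open (lower, upper]:

```python
import math
from statistics import NormalDist


def grouped_mixture_loglik(edges, counts, weights, means, sds):
    """Log-likelihood of interval-grouped counts under a finite normal mixture.

    edges: increasing interval boundaries (may start/end with -inf/inf);
    counts[j] is the number of observations in (edges[j], edges[j+1]]
    (half-open intervals are an assumption of this sketch).
    """
    comps = [NormalDist(m, s) for m, s in zip(means, sds)]
    total = 0.0
    for lo, hi, n in zip(edges[:-1], edges[1:], counts):
        # Probability mass the mixture assigns to this interval.
        p = sum(w * (c.cdf(hi) - c.cdf(lo)) for w, c in zip(weights, comps))
        total += n * math.log(p)
    return total
```

For a single standard normal component and the split (-inf, 0], (0, inf) with 5 counts on each side, every interval has mass 0.5, so the log-likelihood is 10 log 0.5.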
10.
Advances in Data Analysis and Classification - Finite mixtures present a powerful tool for modeling complex heterogeneous data. One of their most important applications is model-based clustering....
11.
Advances in Data Analysis and Classification - We consider model-based clustering methods for continuous, correlated data that account for external information available in the presence of...
12.
Model-based clustering is a popular tool which is renowned for its probabilistic foundations and its flexibility. However, model-based clustering techniques usually perform poorly when dealing with high-dimensional data streams, which are nowadays a frequent data type. To overcome this limitation of model-based clustering, we propose an online inference algorithm for the mixture of probabilistic PCA model. The proposed algorithm relies on an EM-based procedure and on a probabilistic and incremental version of PCA. Model selection is also considered in the online setting through parallel computing. Numerical experiments on simulated and real data demonstrate the effectiveness of our approach and compare it to state-of-the-art online EM-based algorithms.
13.
This paper presents an extension of the standard Tobit model to simultaneously address segmental phases, subpopulation heterogeneity, a lower limit of detection, and skewness in outcomes of human immunodeficiency virus (HIV) or acquired immunodeficiency syndrome (AIDS) longitudinal data. A major problem often encountered in HIV/AIDS research is the development of resistance to an antiretroviral (ARV) drug or therapy. To deal with drug resistance, one usually seeks to estimate the time at which resistance develops. Following ARV treatment, each subject’s viral load profile tends to follow a ‘broken stick’-like growth trajectory, indicating multiple phases of decline and increase in viral load. Such multiple phases with multiple change-points are captured by subject-specific random parameters of growth curve models. To account for subpopulation heterogeneity of drug resistance among patients, the turning points are also allowed to differ by latent classes of patients, defined on the basis of the trajectories of observed viral loads. These features of viral longitudinal data are jointly modeled in a unified framework of segmental growth mixture Tobit mixed-effects models, with skew distributions for a left-censored, skewed response variable, under the Bayesian approach. The proposed methods are illustrated using real data from an AIDS clinical study.
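The Tobit idea of handling a lower limit of detection can be illustrated with a normal response: fully observed values contribute a density, while values reported as below the limit contribute the probability of falling below it. The sketch assumes normal errors and a per-observation mean supplied by some growth-curve model; the paper's skew distributions, change points, latent classes and Bayesian estimation are not reproduced, and `left_censored_loglik` is a hypothetical name.

```python
import math
from statistics import NormalDist


def left_censored_loglik(y, censored, means, sigma, lod):
    """Tobit-type log-likelihood for a response left-censored at a detection limit.

    Assumption: normal errors with common sd sigma. censored[i] is True when
    y[i] was reported as below lod; means[i] is the model-implied mean.
    """
    total = 0.0
    for v, c, m in zip(y, censored, means):
        dist = NormalDist(m, sigma)
        if c:
            # Only "below lod" is known, so contribute P(Y < lod).
            total += math.log(dist.cdf(lod))
        else:
            # Fully observed value: contribute the normal density.
            total += math.log(dist.pdf(v))
    return total
```

In a mixed-effects version, `means` would come from each subject's random-effect trajectory, so the censored contributions shift as those trajectories are updated.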
15.
In this paper, some SEIRS epidemiological models with vaccination and temporary immunity are considered. First, previously published work is reviewed. In the next section, a general model with a constant contact rate and a density-dependent death rate is examined. The model is reformulated in terms of the proportions of susceptible, incubating, infectious, and immune individuals. Next, the equilibrium and stability properties of this model are examined, assuming that the average duration of immunity exceeds the infectious period. There is a threshold parameter R₀, and the disease can persist if and only if R₀ exceeds one. The disease-free equilibrium always exists and is locally stable if R₀ < 1 and unstable if R₀ > 1. Conditions are derived for the global stability of the disease-free equilibrium. For R₀ > 1, the endemic equilibrium is unique and locally asymptotically stable. For the full model dealing with numbers of individuals, there are two critical contact rates. These give conditions, respectively, for the disease to drive to extinction a population that would otherwise persist at a finite level or explode, and for the disease to regulate at a finite level a population that would otherwise explode. If the contact rate β(N) is a monotone increasing function of the population size, then we find that there are now three threshold parameters that determine whether or not the disease can persist proportionally. Moreover, the endemic equilibrium need no longer be locally asymptotically stable. Instead, stable limit cycles can arise by supercritical Hopf bifurcation from the endemic equilibrium as this equilibrium loses its stability. This is confirmed numerically.
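The threshold role of R₀ can be illustrated numerically with a textbook SEIRS proportion model: constant population, births and deaths at rate μ, mass-action incidence βSI, incubation rate ε, recovery rate γ and loss of immunity at rate δ. For this simplified model R₀ = βε/((ε + μ)(γ + μ)). Vaccination and the density-dependent death rate studied in the paper are omitted, the forward-Euler integrator is a convenience, and all parameter values are arbitrary assumptions.

```python
def seirs_r0(beta, mu, eps, gamma):
    """Basic reproduction number of the simplified SEIRS proportion model."""
    return beta * eps / ((eps + mu) * (gamma + mu))


def simulate_seirs(beta, mu=0.02, eps=0.5, gamma=0.25, delta=0.1,
                   i0=0.01, t_end=2000.0, dt=0.1):
    """Forward-Euler integration of S, E, I, R proportions; returns the final state.

    Simplified SEIRS without vaccination (assumption): mu is the birth = death
    rate, eps the rate of leaving the incubating class, gamma the recovery
    rate, delta the rate at which immunity is lost.
    """
    s, e, i, r = 1.0 - i0, 0.0, i0, 0.0
    for _ in range(round(t_end / dt)):
        incidence = beta * s * i  # constant contact rate, mass-action mixing
        ds = mu - incidence - mu * s + delta * r  # newborns are susceptible
        de = incidence - (eps + mu) * e
        di = eps * e - (gamma + mu) * i
        dr = gamma * i - (delta + mu) * r
        s, e, i, r = s + dt * ds, e + dt * de, i + dt * di, r + dt * dr
    return s, e, i, r
```

With these defaults, β = 0.2 gives R₀ ≈ 0.71 and the infectious proportion dies out, whereas β = 0.8 gives R₀ ≈ 2.85 and the state settles near an endemic equilibrium with S = 1/R₀ and an infectious proportion of roughly 0.18.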
16.
This paper proposes particle swarm optimization with age-group topology (PSOAG), a novel age-based particle swarm optimization (PSO) algorithm. In this work, we present a new concept of age to measure the search ability of each particle in its local area. To maintain population diversity during the search, we divide particles into different age groups according to their age, and particles in each age group can select only particles in younger groups or in their own group as their neighbours. To allow the search to escape from local optima, aging particles are regularly replaced by new, randomly generated ones. In addition, we design an age-group-based parameter setting method, in which particles in different age groups have different parameters, to accelerate convergence. The algorithm is applied to nonlinear function optimization and data clustering problems for performance evaluation. In comparison with several PSO variants and other evolutionary algorithms (EAs), we find that the proposed algorithm performs significantly better on both the function optimization problems and the data clustering tasks.
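The neighbourhood rule described in this abstract (a particle may only learn from particles in its own or a younger age group) can be sketched in isolation. The abstract does not define age precisely, so this sketch assumes an integer age per particle and groups ages into fixed-width bands; `age_group_neighbours` and `group_width` are hypothetical, and minimisation is assumed.

```python
def age_group_neighbours(ages, pbest_vals, group_width):
    """Pick a neighbourhood leader for each particle under an age-group rule.

    ages: per-particle age (an integer count here, which is an assumption);
    pbest_vals: personal-best objective values (minimisation assumed);
    group_width: age units spanned by one age group (hypothetical parameter).
    A particle may only follow particles in its own or a younger age group.
    """
    groups = [a // group_width for a in ages]
    leaders = []
    for g in groups:
        # Candidates: own group plus all younger groups (own group always included).
        allowed = [j for j, gj in enumerate(groups) if gj <= g]
        leaders.append(min(allowed, key=lambda j: pbest_vals[j]))
    return leaders
```

In a full PSOAG loop, each particle's leader would take the place of the global best in the velocity update, and particles above some maximum age would be reinitialised at random; neither step, nor the age-group-specific parameter settings, is shown here.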
17.
We consider simultaneously identifying the membership and locations of point sources that are convolved with different band-limited point spread functions, from the observation of their superpositions. This problem arises in three-dimensional super-resolution single-molecule imaging, neural spike sorting, and multi-user channel identification, among other applications. We propose a novel algorithm, based on convex programming, and establish its near-optimal performance guarantee for exact recovery in the noise-free setting by exploiting the spectral sparsity of the point source models as well as the incoherence between point spread functions. Furthermore, robustness of the recovery algorithm in the presence of bounded noise is also established. Numerical examples are provided to demonstrate the effectiveness of the proposed approach.
18.
We study the combined effects of periodically varying carrying capacity and survival rate on populations. We show that our populations with constant recruitment functions do not experience either resonance or attenuance when either only the carrying capacity or the survival rate is fluctuating. However, when both carrying capacity and survival rate are fluctuating the populations experience either attenuance or resonance, depending on parameter regimes. In addition, we show that our populations with Beverton–Holt recruitment functions experience attenuance when only the carrying capacity is fluctuating.
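Attenuance and resonance compare the average population size over a cycle with the average carrying capacity. The sketch below uses the classic Beverton–Holt map x_{t+1} = mu*K_t*x_t / (K_t + (mu - 1)*x_t) with a periodically varying K_t and no separate survival-rate term, so it is a simplification of the models studied here; the growth rate, carrying capacities and the name `periodic_beverton_holt` are illustrative assumptions.

```python
def periodic_beverton_holt(mu, ks, x0=0.1, n_periods=1000):
    """Iterate x_{t+1} = mu*K_t*x_t / (K_t + (mu - 1)*x_t) with K_t cycling through ks.

    Classic Beverton-Holt map without a survival term (simplification).
    Runs n_periods full periods of K as burn-in, then returns the states
    visited during one more period.
    """
    x = x0
    for _ in range(n_periods):
        for k in ks:
            x = mu * k * x / (k + (mu - 1) * x)
    cycle = []
    for k in ks:
        x = mu * k * x / (k + (mu - 1) * x)
        cycle.append(x)
    return cycle
```

With mu = 2 and K alternating between 1.5 and 0.5 (average 1.0), the population settles into a 2-cycle between about 0.643 and 0.9, whose average of about 0.771 lies below the average carrying capacity, so the cycle is attenuant; with a constant K the population simply approaches K.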
20.
In this paper, a simplified model describing the stochastic process underlying the etiology of contagious and noncontagious diseases with mass screening is developed. Typical examples might include screening for tuberculosis in urban ghetto areas, venereal diseases among the sexually active, or AIDS in high-risk population groups. The model applies to diseases that have zero or negligible latent periods. In the model, it is assumed that the reliabilities of the screening tests are constant and independent of how long the population unit has had the disease. Tests with both perfect and imperfect reliability are considered. It is shown that most of the results of a 1978 study by W.P. Pierskalla and J.A. Voelker for noncontagious diseases can be generalized to contagious diseases. A mathematical program for computing the optimal test choice and screening periods is presented. It is shown that the optimal screening schedule is equally spaced for tests with perfect reliability. Other properties relating to the managerial problems of screening frequencies, test selection, and resource allocation are also presented.