Similar Literature
20 related articles found.
1.
A bootstrap procedure useful in latent class, or more general mixture, models has been developed to determine the number of latent classes or components required to account for systematic group differences in the data. The procedure is illustrated in the context of a multidimensional scaling latent class model, CLASCAL. Also presented is a bootstrap technique for determining standard errors for estimates of the stimulus coordinates, which are parameters of the multidimensional scaling model. Real and artificial data are presented. The bootstrap procedure appears to select the correct number of latent classes at both low and high error levels; at higher error levels it outperforms Hope's (J. Roy. Statist. Soc. Ser. B 1968; 30: 582) procedure. The bootstrap procedures for estimating parameter stability appear to reproduce Monte Carlo results correctly. Copyright © 2002 John Wiley & Sons, Ltd.
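The class-selection idea can be sketched outside the CLASCAL setting: the k-class model is refitted to data simulated from itself, and the observed gain from adding a class is compared with the bootstrap distribution of that gain. The sketch below is a generic illustration using a Gaussian mixture from scikit-learn, not the paper's multidimensional scaling model; the function name and all settings are illustrative assumptions.

```python
# Minimal sketch of a parametric bootstrap test for the number of mixture
# components (generic stand-in for the latent-class selection procedure).
import numpy as np
from sklearn.mixture import GaussianMixture

def bootstrap_lrt(X, k, n_boot=100):
    """Bootstrap p-value for H0: k components vs H1: k + 1 components."""
    null = GaussianMixture(n_components=k, random_state=0).fit(X)
    alt = GaussianMixture(n_components=k + 1, random_state=0).fit(X)
    lr_obs = 2 * len(X) * (alt.score(X) - null.score(X))   # observed LR statistic
    lr_boot = []
    for b in range(n_boot):
        Xb, _ = null.sample(len(X))                        # simulate under the null
        m0 = GaussianMixture(n_components=k, random_state=b).fit(Xb)
        m1 = GaussianMixture(n_components=k + 1, random_state=b).fit(Xb)
        lr_boot.append(2 * len(Xb) * (m1.score(Xb) - m0.score(Xb)))
    return float(np.mean(np.array(lr_boot) >= lr_obs))     # bootstrap p-value

# Usage: increase k from 1 upward and stop once the p-value is no longer small.
```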

2.
For clustering objects, we often collect not only continuous variables but binary attributes as well. This paper proposes a model-based clustering approach for mixed binary and continuous variables in which each binary attribute is generated by a latent continuous variable dichotomized at a suitable threshold, and the scores of the latent variables are estimated from the binary data. In economics, such latent variables are called utility functions, and the assumption is that the binary attributes (the presence or absence of a public service or utility) are determined by low and high values of these functions. In genetics, the latent response is interpreted as the "liability" to develop a qualitative trait or phenotype. The estimated scores of the latent variables, together with the observed continuous ones, allow a multivariate Gaussian mixture model to be used for clustering, instead of a mixture of discrete and continuous distributions. After describing the method, the paper presents results on both simulated and real data and compares the performance of the multivariate Gaussian mixture model with that of a mixture of joint multivariate and multinomial distributions. The results show that the former outperforms the latter for variables with different scales, both in terms of classification error rate and in the reproduction of the cluster means.
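A minimal sketch of the thresholding idea, under simplifying assumptions the paper does not necessarily make: each binary attribute is replaced by the conditional mean of a standard-normal latent variable truncated at a threshold matched to the observed proportion, and the resulting scores are stacked with the continuous variables before fitting a Gaussian mixture. The helper names are hypothetical.

```python
# Crude latent-score imputation for binary attributes, then GMM clustering.
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

def latent_scores(B):
    """B: (n, q) 0/1 matrix -> (n, q) matrix of truncated-normal means."""
    p = B.mean(axis=0).clip(0.01, 0.99)
    tau = norm.ppf(1 - p)                         # threshold with P(z > tau) = p
    upper = norm.pdf(tau) / (1 - norm.cdf(tau))   # E[z | z > tau]
    lower = -norm.pdf(tau) / norm.cdf(tau)        # E[z | z <= tau]
    return np.where(B == 1, upper, lower)

def cluster_mixed(X_cont, B, k):
    Z = np.hstack([X_cont, latent_scores(B)])     # continuous + latent scores
    return GaussianMixture(n_components=k, random_state=0).fit_predict(Z)
```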

3.
4.
This paper proposes a new approach to dynamically segmenting markets. Dynamic segmentation is of key importance in many markets where it is unrealistic to assume stationary segments, given the dynamics in consumers' needs and product choices. The main goal of the study is to analyse the dynamic process of financial product ownership under the assumption of heterogeneous growth in different segments, taking into account significant determinants of the growth trajectories. Using data from 2002 to 2010 collected by the Survey of Household Income and Wealth conducted by the Bank of Italy, the article shows that the Italian market for financial products is segmented and that ownership trajectories over time are significantly influenced by the area of the country where the family lives and by the head of household's education and gender.

5.
With high-dimensional data, the number of covariates is considerably larger than the sample size. We propose a sound method for analyzing such data that performs clustering and variable selection simultaneously. The method is inspired by the plaid model and may be seen as a multiplicative mixture model that allows overlapping clustering: unlike conventional clustering, an observation may be explained by several clusters. This characteristic makes it especially suitable for gene expression data. Parameter estimation is performed with the Monte Carlo expectation-maximization algorithm and importance sampling. Using extensive simulations and comparisons with competing methods, we show the advantages of our methodology in terms of both variable selection and clustering. An application of our approach to gene expression data for kidney renal cell carcinoma from The Cancer Genome Atlas validates some previously identified cancer biomarkers.

6.
Block clustering aims to reveal homogeneous block structures in a data table. Among the different approaches to block clustering, we consider here a model-based method: the Gaussian latent block model for continuous data, which is an extension of the Gaussian mixture model for one-way clustering. For a given data table, several candidate models are usually examined, differing for example in the number of clusters, so model selection becomes a critical issue. To this end, we develop a criterion based on an approximation of the integrated classification likelihood for the Gaussian latent block model, and propose a Bayesian information criterion-like variant following the same pattern. We also propose a non-asymptotic exact criterion, thus circumventing the controversial definition of the asymptotic regime arising from the dual nature of the rows and columns in co-clustering. The experimental results show steady performance of these criteria for medium to large data tables.
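The model-selection step can be illustrated with a BIC-like penalized criterion for a Gaussian latent block model. The sketch below only approximates the integrated-classification-likelihood criteria developed in the paper and assumes the block log-likelihood is supplied by a co-clustering fit not shown here.

```python
# BIC-like heuristic for a Gaussian latent block model on an n x d table with
# g row clusters and m column clusters; `loglik` comes from a block EM fit.
import numpy as np

def lbm_bic(loglik, n, d, g, m):
    penalty = 0.5 * ((g - 1) * np.log(n)            # row mixing proportions
                     + (m - 1) * np.log(d)          # column mixing proportions
                     + 2 * g * m * np.log(n * d))   # block means and variances
    return loglik - penalty

# Candidate (g, m) pairs are compared by this value; larger is better.
```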

7.
Advances in Data Analysis and Classification - Finite mixtures present a powerful tool for modeling complex heterogeneous data. One of their most important applications is model-based clustering....

8.
Advances in Data Analysis and Classification - We consider model-based clustering methods for continuous, correlated data that account for external information available in the presence of...

9.
The finite mixture modeling approach is widely used for the analysis of individually observed bimodal or multimodal data. In some applications, however, the analysis becomes substantially more challenging because the available data are grouped into categories. In this work, we assume that the observed data are grouped into distinct non-overlapping intervals and follow a finite mixture of normal distributions. For inference on the model parameters, we propose a parametric approach that accounts for the categorical features of the data. The main idea of our method is to impute the missing information of the original data within the Bayesian framework using Gibbs sampling techniques. The proposed method is compared with the maximum likelihood approach, which uses the Expectation-Maximization algorithm for estimating the model parameters, and is illustrated with an application to the Old Faithful geyser data.
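One Gibbs-style imputation sweep for binned data from a normal mixture might look like the sketch below: each observation is assigned a component in proportion to that component's probability mass on its interval and is then drawn from the corresponding truncated normal. This is a generic illustration of the data-augmentation step, not the paper's full sampler; parameter updates and priors are omitted.

```python
# One data-augmentation sweep for binned data from a K-component normal mixture.
import numpy as np
from scipy.stats import norm, truncnorm

def impute_binned(lo, hi, counts, w, mu, sd, rng=np.random.default_rng(0)):
    """lo/hi/counts describe the bins; w, mu, sd are current mixture
    weights, means and standard deviations (arrays of length K)."""
    draws = []
    for a, b, c in zip(lo, hi, counts):
        mass = w * (norm.cdf(b, mu, sd) - norm.cdf(a, mu, sd))  # mass on (a, b]
        prob = mass / mass.sum()
        for _ in range(int(c)):
            k = rng.choice(len(w), p=prob)                      # pick a component
            a_std, b_std = (a - mu[k]) / sd[k], (b - mu[k]) / sd[k]
            draws.append(truncnorm.rvs(a_std, b_std, loc=mu[k], scale=sd[k],
                                       random_state=rng))
    return np.array(draws)   # next: update w, mu, sd from these pseudo-data
```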

10.
Model-based clustering is a popular tool renowned for its probabilistic foundations and flexibility. However, model-based clustering techniques usually perform poorly when dealing with high-dimensional data streams, which are nowadays a frequent data type. To overcome this limitation, we propose an online inference algorithm for the mixture of probabilistic PCA model. The proposed algorithm relies on an EM-based procedure and on a probabilistic, incremental version of PCA. Model selection is also handled in the online setting through parallel computing. Numerical experiments on simulated and real data demonstrate the effectiveness of our approach and compare it to state-of-the-art online EM-based algorithms.
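The online inference idea can be sketched with a stochastic-approximation EM step for a spherical Gaussian mixture: sufficient statistics are updated with a decaying step size as each observation arrives. The paper's mixture of probabilistic PCA adds a per-component low-rank structure that this simplified sketch does not reproduce.

```python
# One online-EM step for a K-component spherical Gaussian mixture.
import numpy as np

def online_em_step(x, stats, w, mu, var, t):
    """x: (d,) observation; stats: dict with s0 (K,), s1 (K, d), s2 (K,)."""
    gamma = 1.0 / (t + 10)                              # decaying step size
    d = x.shape[0]
    # E-step: responsibilities under spherical components (constants dropped)
    logp = (np.log(w) - 0.5 * d * np.log(var)
            - 0.5 * np.sum((x - mu) ** 2, axis=1) / var)
    r = np.exp(logp - logp.max()); r /= r.sum()
    # stochastic update of the running sufficient statistics
    stats["s0"] = (1 - gamma) * stats["s0"] + gamma * r
    stats["s1"] = (1 - gamma) * stats["s1"] + gamma * r[:, None] * x
    stats["s2"] = (1 - gamma) * stats["s2"] + gamma * r * np.sum(x ** 2)
    # M-step: re-estimate parameters from the running statistics
    w = stats["s0"] / stats["s0"].sum()
    mu = stats["s1"] / stats["s0"][:, None]
    var = (stats["s2"] / stats["s0"] - np.sum(mu ** 2, axis=1)) / d
    return stats, w, mu, np.maximum(var, 1e-6)
```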

11.
12.
This paper proposes particle swarm optimization with age-group topology (PSOAG), a novel age-based variant of particle swarm optimization (PSO). We introduce the concept of age to measure the search ability of each particle in its local area. To maintain population diversity during the search, particles are separated into age groups by their age, and particles in each age group may select only members of younger groups or of their own group as neighbours. To allow the search to escape from local optima, aging particles are regularly replaced by new, randomly generated ones. In addition, we design an age-group-based parameter setting method, in which particles in different age groups use different parameters, to accelerate convergence. The algorithm is applied to nonlinear function optimization and data clustering problems for performance evaluation. In comparison with several PSO variants and other evolutionary algorithms, the proposed algorithm provides significantly better performance on both the function optimization problems and the data clustering tasks.
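A minimal sketch of the aging mechanism, with the age-group neighbourhood topology and per-group parameter schedule of PSOAG deliberately omitted: particles count the iterations since their last personal-best improvement and are reinitialised once they exceed a maximum age. All constants below are illustrative.

```python
# Basic global-best PSO with age-based replacement of stagnant particles.
import numpy as np

def pso_with_aging(f, dim, n=30, iters=200, max_age=20, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n, dim)); v = np.zeros((n, dim))
    pbest, pval = x.copy(), np.array([f(p) for p in x])
    age = np.zeros(n, dtype=int)
    for _ in range(iters):
        g = pbest[pval.argmin()]                       # global best position
        v = (0.7 * v + 1.5 * rng.random((n, dim)) * (pbest - x)
                     + 1.5 * rng.random((n, dim)) * (g - x))
        x = x + v
        fx = np.array([f(p) for p in x])
        improved = fx < pval
        pbest[improved], pval[improved] = x[improved], fx[improved]
        age = np.where(improved, 0, age + 1)           # age grows while stagnant
        old = age > max_age                            # replace aged particles
        x[old] = rng.uniform(-5, 5, (old.sum(), dim)); v[old] = 0; age[old] = 0
    return pbest[pval.argmin()], pval.min()

# Usage: pso_with_aging(lambda z: np.sum(z ** 2), dim=10)
```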

13.
In this paper, some SEIRS epidemiological models with vaccination and temporary immunity are considered. First, previously published work is reviewed. Next, a general model with a constant contact rate and a density-dependent death rate is examined. The model is reformulated in terms of the proportions of susceptible, incubating, infectious, and immune individuals, and its equilibrium and stability properties are examined under the assumption that the average duration of immunity exceeds the infectious period. There is a threshold parameter R0, and the disease can persist if and only if R0 exceeds one. The disease-free equilibrium always exists and is locally stable if R0 < 1 and unstable if R0 > 1. Conditions are derived for the global stability of the disease-free equilibrium. For R0 > 1, the endemic equilibrium is unique and locally asymptotically stable. For the full model dealing with numbers of individuals, there are two critical contact rates. These give conditions, respectively, for the disease to drive to extinction a population that would otherwise persist at a finite level or explode, and to regulate at a finite level a population that would otherwise explode. If the contact rate β(N) is a monotone increasing function of the population size, there are three threshold parameters that determine whether or not the disease can persist proportionally. Moreover, the endemic equilibrium need no longer be locally asymptotically stable; instead, stable limit cycles can arise by supercritical Hopf bifurcation from the endemic equilibrium as it loses its stability. This is confirmed numerically.
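A minimal numerical sketch of a proportion-based SEIRS model with vaccination and waning immunity is given below; the compartment equations, parameter values, and the R0 expression are generic textbook assumptions rather than the exact formulation analysed in the paper.

```python
# Generic SEIRS model in proportions with vaccination (rate p of susceptibles)
# and waning immunity (rate delta); births/deaths at rate mu keep s+e+i+r = 1.
import numpy as np
from scipy.integrate import solve_ivp

beta, sigma, gamma, mu, p, delta = 0.5, 0.2, 0.1, 0.01, 0.05, 0.02

def seirs(t, y):
    s, e, i, r = y
    ds = mu - beta * s * i - (mu + p) * s + delta * r
    de = beta * s * i - (sigma + mu) * e
    di = sigma * e - (gamma + mu) * i
    dr = gamma * i + p * s - (delta + mu) * r
    return [ds, de, di, dr]

# Basic reproduction number of the corresponding model without vaccination.
R0 = beta * sigma / ((sigma + mu) * (gamma + mu))
sol = solve_ivp(seirs, (0, 500), [0.9, 0.0, 0.1, 0.0])
print(f"R0 = {R0:.2f}, final infectious fraction = {sol.y[2, -1]:.4f}")
```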

14.
We consider simultaneously identifying the membership and locations of point sources that are convolved with different band-limited point spread functions, from the observation of their superpositions. This problem arises in three-dimensional super-resolution single-molecule imaging, neural spike sorting, and multi-user channel identification, among other applications. We propose a novel algorithm based on convex programming and establish its near-optimal performance guarantee for exact recovery in the noise-free setting by exploiting the spectral sparsity of the point source models as well as the incoherence between point spread functions. The robustness of the recovery algorithm in the presence of bounded noise is also established. Numerical examples demonstrate the effectiveness of the proposed approach.

15.
We study the combined effects of periodically varying carrying capacity and survival rate on populations. We show that populations with constant recruitment functions experience neither resonance nor attenuance when only the carrying capacity or only the survival rate is fluctuating. However, when both the carrying capacity and the survival rate fluctuate, the populations experience either attenuance or resonance, depending on the parameter regime. In addition, we show that populations with Beverton–Holt recruitment functions experience attenuance when only the carrying capacity is fluctuating.
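The attenuance-versus-resonance comparison can be sketched numerically: iterate a map with Beverton–Holt recruitment, period-2 survival, and period-2 carrying capacity, and compare the average population on the attractor with the constant-parameter case. The specific map below is a generic stand-in, not the paper's model.

```python
# Average attractor population for a survival-plus-Beverton-Holt-recruitment map
# under 2-periodic parameters, compared with the constant-parameter baseline.
import numpy as np

def average_attractor(K, s, r=2.0, burn=500, keep=500):
    x, period = 1.0, len(K)
    for t in range(burn):                              # discard the transient
        k = K[t % period]
        x = s[t % period] * x + r * k * x / (k + x)
    vals = []
    for t in range(keep):                              # sample the attractor
        k = K[t % period]
        x = s[t % period] * x + r * k * x / (k + x)
        vals.append(x)
    return np.mean(vals)

fluct = average_attractor(K=[8.0, 12.0], s=[0.5, 0.5])   # fluctuating K
const = average_attractor(K=[10.0, 10.0], s=[0.5, 0.5])  # constant K, same mean
print("attenuance" if fluct < const else "resonance", fluct, const)
```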

16.
17.
In this paper, a simplified model of the stochastic process underlying the etiology of contagious and noncontagious diseases with mass screening is developed. Typical examples include screening for tuberculosis in urban ghetto areas, venereal diseases in the sexually active, or AIDS in high-risk population groups. The model addresses diseases with zero or negligible latent periods. It is assumed that the reliabilities of the screening tests are constant and independent of how long the population unit has had the disease; tests with both perfect and imperfect reliability are considered. It is shown that most of the results of a 1978 study by W.P. Pierskalla and J.A. Voelker for noncontagious diseases can be generalized to contagious diseases. A mathematical program for computing the optimal test choice and screening periods is presented, and it is shown that the optimal screening schedule is equally spaced for tests with perfect reliability. Other properties relating to the managerial problems of screening frequency, test selection, and resource allocation are also presented.

18.
Fitting finite mixture models is an ill-defined estimation problem, as completely different parameterizations can induce similar mixture distributions. This leads to multiple modes in the likelihood, which is a problem for frequentist maximum likelihood estimation and complicates statistical inference from Markov chain Monte Carlo draws in Bayesian estimation. For the analysis of the posterior density of these draws, a suitable separation into different modes is desirable. In addition, a unique labelling of the component-specific estimates is necessary to solve the label switching problem. This paper presents and compares two approaches to achieving these goals: relabelling under multimodality and constrained clustering. The algorithmic details are discussed, and their application is demonstrated on artificial and real-world data.
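The simplest cure for label switching, shown below for contrast, is an ordering constraint that sorts the components of every draw by their means. This is cruder than the relabelling-under-multimodality and constrained-clustering approaches compared in the paper, but it illustrates what a relabelled chain looks like.

```python
# Relabel MCMC draws of a univariate mixture by ordering component means.
import numpy as np

def relabel_by_mean(mu_draws, w_draws):
    """mu_draws, w_draws: (n_draws, K) arrays; returns relabelled copies."""
    order = np.argsort(mu_draws, axis=1)            # permutation per draw
    rows = np.arange(mu_draws.shape[0])[:, None]
    return mu_draws[rows, order], w_draws[rows, order]

# After relabelling, component-specific posterior summaries (means, intervals)
# can be computed column by column.
```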

19.
We give nontrivial bounds in various ranges for exponential sums of the form

20.
We prove the existence of nontrivial functions in ℝ^n, n ≥ 2, with vanishing integrals over balls of a fixed radius and a given majorant of growth. Translated from Ukrains'kyi Matematychnyi Zhurnal, Vol. 60, No. 6, pp. 857–861, June 2008.

