Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models
Authors:Måns Magnusson  Leif Jonsson  Mattias Villani  David Broman
Institution:1. Department of Computer and Information Science, Linköping University, Linköping, Sweden;2. Ericsson AB, Stockholm, Sweden;3. School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
Abstract:Topic models, and more specifically the class of latent Dirichlet allocation (LDA) models, are widely used for probabilistic modeling of text. Markov chain Monte Carlo (MCMC) sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-the-art samplers for topic models on five well-known text corpora of differing sizes and properties. In particular, we propose and compare two different strategies for sampling the parameter block with latent topic indicators. The experiments show that the increase in statistical inefficiency from only partial collapsing is smaller than commonly assumed, and can be more than compensated for by the speedup from parallelization and sparsity on larger corpora. We also prove that the partially collapsed samplers scale well with the size of the corpus. The proposed algorithm is fast, efficient, exact, and can be used in more modeling situations than the ordinary collapsed sampler. Supplementary materials for this article are available online.
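To illustrate the core idea of partial collapsing described in the abstract, the following is a minimal sketch (not the authors' implementation) of a partially collapsed Gibbs sampler for LDA on a toy corpus. The topic-word matrix Φ is sampled explicitly from its Dirichlet full conditional rather than integrated out; given Φ, the topic indicators of different documents are conditionally independent, which is what enables parallelization across documents. The corpus, hyperparameters, and variable names here are illustrative assumptions, and the sparsity optimizations of the paper are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy corpus: each document is a list of word ids.
docs = [[0, 1, 2, 1], [2, 3, 3, 0], [1, 1, 4, 2]]
V, K = 5, 2              # vocabulary size, number of topics (assumed)
alpha, beta = 0.1, 0.01  # Dirichlet hyperparameters (assumed)

# Initialize topic indicators z and sufficient-statistic count matrices.
z = [rng.integers(K, size=len(doc)) for doc in docs]
n_dk = np.zeros((len(docs), K))  # document-topic counts
n_kv = np.zeros((K, V))          # topic-word counts
for d, doc in enumerate(docs):
    for i, v in enumerate(doc):
        n_dk[d, z[d][i]] += 1
        n_kv[z[d][i], v] += 1

for it in range(50):
    # Step 1: draw Phi | z from its Dirichlet full conditional
    # (this is the "partially collapsed" step: Phi is not integrated out).
    phi = np.array([rng.dirichlet(n_kv[k] + beta) for k in range(K)])

    # Step 2: sample z | Phi with theta (doc-topic proportions) collapsed.
    # Given Phi, documents are conditionally independent, so this outer
    # loop could run in parallel across documents.
    for d, doc in enumerate(docs):
        for i, v in enumerate(doc):
            k_old = z[d][i]
            n_dk[d, k_old] -= 1
            n_kv[k_old, v] -= 1
            p = (n_dk[d] + alpha) * phi[:, v]
            k_new = rng.choice(K, p=p / p.sum())
            z[d][i] = k_new
            n_dk[d, k_new] += 1
            n_kv[k_new, v] += 1
```

In contrast, the ordinary fully collapsed sampler integrates out both Φ and θ, which couples all topic indicators through the shared topic-word counts and makes exact parallelization across documents impossible.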
Keywords:Bayesian inference  Computational complexity  Gibbs sampling  Latent Dirichlet allocation  Massive datasets  Parallel computing