Pachinko allocation: DAG-structured mixture models of topic correlations

Wei Li, University of Massachusetts Amherst

Abstract

Statistical topic models are increasingly popular tools for summarization and manifold discovery in discrete data. However, most existing approaches capture only limited correlations between topics, or none at all. We propose the pachinko allocation model (PAM), which captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). We present various structures within this framework, different parameterizations of topic distributions, and an extension to capture dynamic patterns of topic correlations. We also introduce a non-parametric Bayesian prior to automatically learn the topic structure from data. The model is evaluated on document classification, likelihood of held-out data, the ability to support fine-grained topics, and topical keyword coherence. With a highly scalable approximation, PAM has also been applied to discover topic hierarchies in very large datasets.
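
To make the DAG idea in the abstract concrete, the following is a minimal sketch of how one document might be generated under a simple four-level layout (root, super-topics, sub-topics, words). It is an illustration under stated assumptions, not the dissertation's implementation: the function names, the symmetric Dirichlet priors, the hyperparameter values, and the fixed four-level structure are all hypothetical choices, and the abstract itself notes that other structures and parameterizations are possible.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(probs, rng):
    """Draw an index from a discrete distribution given as probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(n_words, n_super, n_sub, sub_topic_words, rng,
                      alpha_root=1.0, alpha_super=1.0):
    """Generate word ids and their (super-topic, sub-topic) paths.

    Assumed layout: root -> super-topic -> sub-topic -> word.
    The symmetric priors alpha_root and alpha_super are illustrative only.
    """
    # Per-document distribution at the root over super-topics.
    theta_root = sample_dirichlet([alpha_root] * n_super, rng)
    # Per-document distribution at each super-topic over sub-topics.
    # These mixtures over sub-topics are what encode topic correlations.
    theta_super = [sample_dirichlet([alpha_super] * n_sub, rng)
                   for _ in range(n_super)]

    words, paths = [], []
    for _ in range(n_words):
        s = sample_index(theta_root, rng)          # walk root -> super-topic
        t = sample_index(theta_super[s], rng)      # walk super-topic -> sub-topic
        w = sample_index(sub_topic_words[t], rng)  # emit a word from the sub-topic
        words.append(w)
        paths.append((s, t))
    return words, paths


# Small demonstration with made-up sizes.
rng = random.Random(0)
VOCAB_SIZE, N_SUPER, N_SUB = 20, 3, 5
# Each sub-topic is a distribution over the vocabulary, shared by all documents.
sub_topic_words = [sample_dirichlet([0.5] * VOCAB_SIZE, rng) for _ in range(N_SUB)]
doc_words, doc_paths = generate_document(50, N_SUPER, N_SUB, sub_topic_words, rng)
```

In this sketch each super-topic is a per-document mixture over sub-topics, so sub-topics that tend to co-occur can share a super-topic. Making the links between levels sparse, or adding further levels, would give other DAG structures of the kind the abstract mentions.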

Subject Area

Statistics, Artificial intelligence, Computer science

Recommended Citation

Li, Wei, "Pachinko allocation: DAG-structured mixture models of topic correlations" (2007). Doctoral Dissertations Available from Proquest. AAI3289214.
https://scholarworks.umass.edu/dissertations/AAI3289214
