Title
Topic Regression
Date of Award
2-2012
Document type
Dissertation
Access Type
Open Access Dissertation
Degree Name
Doctor of Philosophy (PhD)
Degree Program
Computer Science
First Advisor
Andrew McCallum
Second Advisor
David Blei
Third Advisor
David Jensen
Subject Categories
Computer Sciences
Abstract
Text documents are generally accompanied by non-textual information, such as authors, dates, publication sources, and, increasingly, automatically recognized named entities. Work in text analysis has often involved predicting these non-text values based on text data for tasks such as document classification and author identification. This thesis considers the opposite problem: predicting the textual content of documents based on non-text data. In this work I study several regression-based methods for estimating the influence of specific metadata elements in determining the content of text documents. Such topic regression methods allow users of document collections to test hypotheses about the underlying environments that produced those documents.
DOI
https://doi.org/10.7275/2646883
Recommended Citation
Mimno, David, "Topic Regression" (2012). Open Access Dissertations. 520.
https://doi.org/10.7275/2646883
https://scholarworks.umass.edu/open_access_dissertations/520