Topic Regression

Mimno, David

Abstract

Text documents are generally accompanied by non-textual information, such as authors, dates, publication sources, and, increasingly, automatically recognized named entities. Work in text analysis has often involved predicting these non-text values based on text data for tasks such as document classification and author identification. This thesis considers the opposite problem: predicting the textual content of documents based on non-text data. In this work I study several regression-based methods for estimating the influence of specific metadata elements in determining the content of text documents. Such topic regression methods allow users of document collections to test hypotheses about the underlying environments that produced those documents.

Type

dissertation
article

Date

2012-02-01
