Publication Date

2001

Abstract

Hierarchies have long been used to organize, summarize, and access information. In this proposal we define summarization in terms of a probabilistic language model and use the definition to explore new techniques for automatically generating topic hierarchies. One technique applies a graph-theoretic algorithm that approximates a solution to the Dominating Set Problem. Another technique uses an entropy-based approach to choose topic terms. Both techniques efficiently select terms according to a language model. We compare the new techniques to previously proposed methods for constructing topic hierarchies, including subsumption and lexical hierarchies, as well as to words found using TF.IDF. Our preliminary results show that the new techniques perform as well as or better than these other techniques. We plan to evaluate the two techniques further through user studies and computer simulations. We will also develop a demo for better interaction with users.
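
The abstract does not say which approximation of the Dominating Set Problem is used. As an illustration only, the sketch below shows the standard greedy heuristic, which may differ from the method in the proposal. The term graph, the function name `greedy_dominating_set`, and the example terms are all hypothetical. An edge between two terms stands in for whatever relatedness a language model would supply.

```python
def greedy_dominating_set(graph):
    """Return a list of vertices that together dominate the graph.

    graph: dict mapping each term to the set of terms it is linked to.
    This is the textbook greedy heuristic, assumed here for illustration;
    it is not taken from the proposal itself.
    """
    undominated = set(graph)  # terms not yet covered by a chosen topic
    chosen = []
    while undominated:
        # Pick the term that covers the most still-uncovered terms
        # (itself plus its neighbours). Sorting makes ties deterministic.
        best = max(
            sorted(graph),
            key=lambda v: len(({v} | graph[v]) & undominated),
        )
        chosen.append(best)
        undominated -= {best} | graph[best]
    return chosen


# Hypothetical, symmetric term graph.
example_graph = {
    "energy": {"solar", "wind", "nuclear"},
    "solar": {"energy", "panel"},
    "wind": {"energy"},
    "nuclear": {"energy"},
    "panel": {"solar"},
}

topics = greedy_dominating_set(example_graph)
print(topics)
```

In this toy graph every term is either a selected topic or linked to one, which is the property that makes a dominating set a plausible candidate for the top level of a topic hierarchy.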

Comments

This paper was harvested from CiteSeer
