Publication

Multi-Way Distributional Clustering via Pairwise Interactions

Abstract
We present a novel unsupervised learning scheme that simultaneously clusters variables of several types (e.g., documents, words and authors) based on pairwise interactions between the types, as observed in co-occurrence data. In this scheme, multiple clustering systems are generated aiming at maximizing an objective function that measures multiple pairwise mutual information between cluster variables. To implement this idea, we propose an algorithm that interleaves top-down clustering of some variables and bottom-up clustering of the other variables, with a local optimization correction routine. Focusing on document clustering we present an extensive empirical study of two-way, three-way and four-way applications of our scheme using six real-world datasets including the 20 Newsgroups (20NG) and the Enron email collection. Our multi-way distributional clustering (MDC) algorithms consistently and significantly outperform previous state-of-the-art information theoretic clustering algorithms.
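The objective described in the abstract can be illustrated with a minimal sketch. It assumes the objective is a plain, unweighted sum of mutual information over the interacting pairs of variable types. The abstract does not give the exact form, so any weighting used in the paper is not reproduced here. The function names, the dictionary layout, and the toy counts are all hypothetical, chosen only for illustration. This is not the authors' implementation, and it covers only the scoring step, not the interleaved top-down and bottom-up search.

```python
from collections import defaultdict
from math import log2


def clustered_mutual_information(counts, row_clusters, col_clusters):
    """Mutual information (in bits) between two cluster variables.

    counts[i][j] is the co-occurrence count of row item i and column item j;
    row_clusters[i] and col_clusters[j] are the cluster labels of those items.
    """
    joint = defaultdict(float)
    total = 0.0
    # Aggregate item-level counts into cluster-level counts.
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n:
                joint[(row_clusters[i], col_clusters[j])] += n
                total += n
    if total == 0:
        return 0.0

    # Marginal probabilities of each row cluster and each column cluster.
    row_marginal = defaultdict(float)
    col_marginal = defaultdict(float)
    for (r, c), n in joint.items():
        row_marginal[r] += n / total
        col_marginal[c] += n / total

    mi = 0.0
    for (r, c), n in joint.items():
        p = n / total
        mi += p * log2(p / (row_marginal[r] * col_marginal[c]))
    return mi


def multiway_objective(tables, clusterings):
    """Assumed objective: unweighted sum of pairwise mutual information.

    tables maps a pair of type names (row_type, col_type) to a count matrix;
    clusterings maps each type name to its list of cluster labels.
    """
    return sum(
        clustered_mutual_information(counts, clusterings[a], clusterings[b])
        for (a, b), counts in tables.items()
    )


# Toy three-way example (made-up counts): documents interact with words
# and with authors.
tables = {
    ("docs", "words"): [[2, 0], [0, 2]],
    ("docs", "authors"): [[1, 0], [0, 1]],
}
clusterings = {"docs": [0, 1], "words": [0, 1], "authors": [0, 1]}
score = multiway_objective(tables, clusterings)
```

In this toy setting, merging both documents into a single cluster drives every pairwise term to zero. A search procedure would therefore prefer clusterings that keep the cluster variables informative about one another.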
Type
article
Date
2005-01-01