Office workers everywhere are drowning in email: not only spam, but also large quantities of legitimate email to be read and organized for browsing. Although there have been extensive investigations of automatic document categorization, email gives rise to a number of unique challenges, and there has been relatively little study of classifying email into folders. This paper presents an extensive benchmark study of email foldering using two large corpora of real-world email messages and foldering schemes: one from former Enron employees, another from participants in an SRI research project. We discuss the challenges that arise from differences between email foldering and traditional document classification. We show experimental results from an array of automated classification methods and evaluation methodologies, including a new method for evaluating foldering results based on the email timeline, and enhancements to the exponential gradient method Winnow that provide top-tier accuracy with a fraction of the training time of alternative methods. We also establish that classification accuracy in many cases is relatively low, confirming the challenges of email data and pointing toward email foldering as an important area for further research.
Bekkerman, Ron, "Automatic Categorization of Email into Folders: Benchmark Experiments on Enron and SRI Corpora" (2004). Computer Science Department Faculty Publication Series. 218.
Retrieved from https://scholarworks.umass.edu/cs_faculty_pubs/218