Computer Science Department Faculty Publication Series

UMass at TREC 2004: Novelty and HARD

Nasreen Abdul-Jaleel, University of Massachusetts - Amherst
James Allan, University of Massachusetts - Amherst
W. Bruce Croft, University of Massachusetts - Amherst
Fernando Diaz, University of Massachusetts - Amherst
Leah Larkey, University of Massachusetts - Amherst
Xiaoyan Li, University of Massachusetts - Amherst
Mark D. Smucker, University of Massachusetts - Amherst
Courtney Wade, University of Massachusetts - Amherst

Publication Date

2004

Abstract

For the TREC 2004 Novelty track, UMass participated in all four tasks. Although finding relevant sentences was harder this year than last, we continue to show marked improvements over the baseline of calling all sentences relevant, with a variant of tfidf being the most successful approach. We achieve 5–9% improvements over the baseline in locating novel sentences, primarily by looking at the similarity of a sentence to earlier sentences and focusing on named entities. For the High Accuracy Retrieval from Documents (HARD) track, we investigated the use of clarification forms, fixed- and variable-length passage retrieval, and the use of metadata. Clarification form results indicate that passage-level feedback can provide improvements comparable to user-supplied related-text for document evaluation and outperforms related-text for passage evaluation. Document retrieval methods without a query expansion component show the most gains from related-text. We also found that displaying the top passages for feedback outperformed displaying centroid passages. Named entity feedback resulted in mixed performance. Our primary findings for passage retrieval are that document retrieval methods performed better than passage retrieval methods on the passage evaluation metric of binary preference at 12,000 characters, and that clarification forms improved passage retrieval for every retrieval method explored. We found no benefit to using variable-length passages over fixed-length passages for this corpus. Our use of geography and genre metadata resulted in no significant changes in retrieval performance.

Comments

This paper was harvested from CiteSeer

Recommended Citation

Abdul-Jaleel, Nasreen; Allan, James; Croft, W. Bruce; Diaz, Fernando; Larkey, Leah; Li, Xiaoyan; Smucker, Mark D.; and Wade, Courtney, "UMass at TREC 2004: Novelty and HARD" (2004). Computer Science Department Faculty Publication Series. 189.
Retrieved from https://scholarworks.umass.edu/cs_faculty_pubs/189

Included in

Computer Sciences Commons
