Word-sense disambiguation for large text databases

Robert Jeffrey Krovetz, University of Massachusetts Amherst

Abstract

Most retrieval systems represent documents and queries by the words they contain, and rank documents based on the words in common with the query. Because words are ambiguous, this can cause documents to be retrieved that are not relevant. In addition, a document can be relevant even if it does not mention the exact words used in the query. A user is generally not interested in the words, but in the concepts that those words represent. We report on an analysis of lexical ambiguity in information retrieval test collections, and on experiments to determine the utility of word meanings for separating relevant from non-relevant documents. Our research has examined different sources of evidence for distinguishing meanings. These sources can serve as a mechanism for splitting meanings apart, as well as bringing them together. For example, morphology separates author/authorize, and universe/university; it brings together burglar/burglarize and sincere/sincerity. Similar distinctions hold for phrases and part of speech. Any effort to deal with word meanings and information retrieval must take these distinctions into account. We discuss the results of experiments with these sources of evidence, and our proposals for future work.

Subject Area

Computer science; Linguistics

Recommended Citation

Krovetz, Robert Jeffrey, "Word-sense disambiguation for large text databases" (1995). Doctoral Dissertations Available from Proquest. AAI9541123.
https://scholarworks.umass.edu/dissertations/AAI9541123
