Maximum entropy, weight of evidence and information retrieval

Warren Richard Greiff, University of Massachusetts Amherst

Abstract

The central theme of this dissertation is the statistical analysis of retrieval data. Features commonly used in modern retrieval systems are studied and modeled. The product of this analysis is a methodology for the study of retrieval data and the construction of probabilistic retrieval models. Model building is based on the formal concept of weight of evidence, which is a measure of how much our belief in a hypothesis (such as the relevance of a document) is increased as a result of the observation of the value of a random variable (for example, the number of times a query term appears in the document). Application of the methodology results in the development of a probabilistic model from which a ranking formula is derived. The ranking status value assigned to each document is equal to the weight of evidence due to the combination of features that have been observed. The resulting formula has two important properties: (1) it is decomposable, with each component corresponding to observed statistical regularities of retrieval situations; and (2) the value produced has a precise, empirically verifiable probabilistic interpretation. Experimental evidence is reported indicating that the ranking formula derived from the data analysis is able to produce retrieval performance comparable to that of a state-of-the-art IR system. In conjunction with the study of empirical data, a formal framework is developed which supports the approach to modeling that is used. The formalism is founded on the Maximum Entropy Principle. This principle states that the probability distribution that we attribute to an unknown stochastic process should be the one that assumes the least while remaining consonant with the constraints embodying the knowledge we do possess. Guided by this principle, a theory of weight of evidence is developed. In this theory, additivity of weight of evidence is proved to be a characteristic of the maximum entropy distribution under general conditions on the form of the constraints. The theory serves as a justification for the modeling strategy adopted in the dissertation, and two classical probabilistic retrieval models are shown to follow from it.
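
The notion of weight of evidence described above can be illustrated with a minimal sketch. It assumes the standard log-likelihood-ratio definition (the change in the log-odds of the hypothesis after observing the evidence) and assumes that features are conditionally independent given relevance, which is one setting in which weights add. The function names and all probability values are hypothetical and are not taken from the dissertation.

```python
import math


def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


def weight_of_evidence(p_e_given_rel, p_e_given_nonrel):
    """Log-likelihood ratio of the evidence under relevance vs. non-relevance.

    Assumed definition: this equals log O(rel | e) - log O(rel).
    """
    return math.log(p_e_given_rel / p_e_given_nonrel)


def posterior_log_odds(prior_rel, weights):
    """Prior log-odds plus the sum of the individual weights of evidence.

    Summing is valid here only because the features are assumed to be
    conditionally independent given relevance and given non-relevance.
    """
    return math.log(odds(prior_rel)) + sum(weights)


def log_odds_to_probability(log_odds):
    """Map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical numbers for two observed features of one document.
prior = 0.01                    # assumed prior probability of relevance
feature_a = (0.6, 0.1)          # p(e_a | rel), p(e_a | non-rel)
feature_b = (0.3, 0.05)         # p(e_b | rel), p(e_b | non-rel)

weights = [weight_of_evidence(*feature_a), weight_of_evidence(*feature_b)]
score = posterior_log_odds(prior, weights)   # usable as a ranking status value
posterior = log_odds_to_probability(score)

# Cross-check against a direct application of Bayes' rule.
numerator = prior * feature_a[0] * feature_b[0]
denominator = numerator + (1.0 - prior) * feature_a[1] * feature_b[1]
direct_posterior = numerator / denominator
```

Under these assumptions the summed score and the direct Bayes computation agree, which is what gives the ranking value a probabilistic reading: each feature contributes its own additive component to the log-odds of relevance.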

Subject Area

Computer science

Recommended Citation

Greiff, Warren Richard, "Maximum entropy, weight of evidence and information retrieval" (1999). Doctoral Dissertations Available from Proquest. AAI9950157.
https://scholarworks.umass.edu/dissertations/AAI9950157
