Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

Year Degree Awarded

Spring 2014

First Advisor

James Allan

Subject Categories

Other Computer Engineering

Abstract

When users investigate a topic, they are often interested in results that are not only relevant but also strongly opinionated or that cover a range of times. To obtain such results, users are forced to formulate ambiguous, complex, or longer queries. This often becomes a burden, since users must issue several reformulated queries when the initial search results are not completely satisfactory. In this thesis, we focus on these two non-topical dimensions: opinionatedness and time. We develop measures for quantifying them in documents and incorporate them into search results.

To improve search results with respect to non-topical dimensions, we use diversification approaches. To achieve controlled variety in results, our methods are integrated with a general bias framework, which seamlessly unifies extreme biases for each dimension. Results can be diversified across one or several non-topical dimensions. Our experiments are performed on the TREC Blog Track.
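To make the idea of bias-controlled diversification concrete, the following is a minimal sketch and not the method developed in the thesis. It assumes each retrieved document already carries a relevance score and a label along one non-topical dimension (for example a sentiment class or a time period), and it greedily re-ranks so that the label shares in the result list move toward a chosen target bias. The function name `diversify`, the `target` dictionary, and the trade-off weight `lam` are all hypothetical names introduced for illustration.

```python
from collections import Counter


def diversify(candidates, target, k, lam=0.5):
    """Greedy re-ranking toward a target label distribution (illustrative only).

    candidates: list of (doc_id, relevance, label) tuples
    target:     dict mapping label -> desired share of the result list
    k:          number of results to return
    lam:        hypothetical weight; 0 ranks by relevance only,
                larger values push harder toward the target shares
    """
    selected = []
    counts = Counter()  # how many selected documents carry each label
    pool = list(candidates)

    while pool and len(selected) < k:
        n = len(selected) + 1  # size of the list after the next pick

        def score(candidate):
            _, relevance, label = candidate
            # Shortfall of this label relative to its target share,
            # measured against the size of the enlarged list.
            deficit = target.get(label, 0.0) - counts[label] / n
            return (1 - lam) * relevance + lam * deficit

        best = max(pool, key=score)
        pool.remove(best)
        selected.append(best)
        counts[best[2]] += 1

    return selected


# Toy data (made up): three positive and three negative documents.
docs = [
    ("d1", 0.9, "pos"), ("d2", 0.8, "pos"), ("d3", 0.7, "pos"),
    ("d4", 0.6, "neg"), ("d5", 0.5, "neg"), ("d6", 0.4, "neg"),
]
balanced = diversify(docs, {"pos": 0.5, "neg": 0.5}, k=4)
print([doc_id for doc_id, _, _ in balanced])
```

In this sketch the target distribution plays the role of the bias: an even split asks for balanced results, while a skewed split favors one end of the dimension. Handling several dimensions at once would require a combined label or a sum of per-dimension shortfall terms, which is left out here.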

As a result of this research, we can determine how temporal or opinionated a unit of text is. Through diversification, we provide users with a retrieval framework in which they can more easily find different kinds of opinionated or temporal results with a single submitted query. The burden of analyzing pre-existing biases for a query and discovering times at which important events happened is carried entirely by the system.

In contrast to prior work in this area, we analyze pre-existing biases in search results and perform diversification in a controlled manner for each dimension. We show how to combine several dimensions with individual biases for each, while also presenting approaches to time and sentiment diversification. The insights from this work will be valuable for next-generation search engines and retrieval systems.
