Integrating Non-Topical Aspects Into Information Retrieval

When users investigate a topic, they are often interested in results that are not just relevant, but also strongly opinionated or spread across a range of times. To get such results, users are forced to formulate ambiguous, complex, or longer queries. This often becomes a burden, since users must issue several reformulated queries when the initial search results are not completely satisfactory. In this thesis, we focus on these two non-topical dimensions: opinionatedness and time. We develop measures for quantifying them in documents and incorporate them into search results. To improve search results with respect to non-topical dimensions, we use diversification approaches. To achieve controlled variety in results, our methods are integrated with a general bias framework, which seamlessly unifies extreme biases for each dimension. Results can be diversified across a single non-topical dimension or across several. Our experiments are performed on the TREC Blog Track. As a result of this research, we can determine how temporal or opinionated a unit of text is. By means of diversification, we provide users with a retrieval framework in which they can more easily find different kinds of opinionated or temporal results with a single submitted query. The burden of analyzing pre-existing biases for a query and discovering the times at which important events happened is carried entirely by the system. In contrast to prior work in this area, we analyze pre-existing biases in search results and perform diversification in a controlled manner for each dimension. We present approaches to time and sentiment diversification and show how to combine several dimensions, each with its own bias. The insights from this work can inform next-generation search engines and retrieval systems.