Abstract
Information retrieval (IR) is a scientific discipline within the fields of computer and information sciences that enables billions of users to efficiently access the information they need. Applications of information retrieval include, but are not limited to, search engines, question answering, and recommender systems.
Decades of IR research have demonstrated that learning accurate query and document representations plays a vital role in the effectiveness of IR systems. State-of-the-art representation learning solutions for information retrieval rely heavily on deep neural networks. However, despite their effective performance, current approaches are not optimal for all IR settings. For example, information retrieval systems often deal with inputs that are not clear and self-sufficient, e.g., many queries submitted to search engines. In such cases, current state-of-the-art models cannot learn an optimal representation of the input or even an accurate set of all representations.
To address this major issue, we develop novel approaches by augmenting neural representation learning models using a retrieval module that guides the model towards learning more effective representations. We study our retrieval augmentation approaches in a diverse set of novel and emerging information retrieval applications. First, we introduce Guided Transformer, an extension to the Transformer network that adjusts the input representations using multiple documents provided by a retrieval module, and demonstrate its effectiveness in learning representations for conversational search problems. Next, we propose novel representation learning models that learn multiple representations for queries that may carry multiple intents, including ambiguous and faceted queries. To do so, we also introduce a novel optimization approach that enables encoder-decoder architectures to generate a permutation-invariant set of query intents.
Furthermore, we study retrieval-augmented data generation for domain adaptation in IR, which concerns applying a retrieval model trained on a source domain to a target domain that often lacks training data. We introduce a novel adaptive IR task, in which only a textual description of the target domain is available. We define a taxonomy of domain attributes in information retrieval to identify different properties of a source domain that can be adapted to a target domain. We introduce a novel automatic data construction pipeline for adapting dense retrieval models to the target domain.
We believe that the applications of the developed retrieval augmentation methods can be expanded to many more real-world IR tasks.
Type
Dissertation (Open Access)
Date
2024-05
License
Attribution 4.0 International
http://creativecommons.org/licenses/by/4.0/
Embargo Lift Date
2025-05-17