Off-campus UMass Amherst users: To download dissertations, please use the following link to log into our proxy server with your UMass Amherst user name and password.

Non-UMass Amherst users, please click the view more button below to purchase a copy of this dissertation from Proquest.

(Some titles may also be available free of charge in our Open Access Dissertation Collection, so please check there first.)

Exploring privacy and personalization in information retrieval applications

Henry A Feild, University of Massachusetts Amherst


A growing number of information retrieval applications rely on search behavior aggregated over many users. If aggregated data such as search query reformulations is not handled properly, it can allow users to be identified and their privacy compromised. Besides leveraging aggregate data, it is also common for applications to make use of user-specific behavior in order to provide a personalized experience for users. Unlike aggregate data, privacy is not an issue in individual personalization since users are the only consumers of their own data. The goal of this work is to explore the effects of personalization and privacy preservation methods on three information retrieval applications, namely search task identification, task-aware query recommendation, and searcher frustration detection. We pursue this goal by first introducing a novel framework called CrowdLogging for logging and aggregating data privately over a distributed set of users. We then describe several privacy mechanisms for sanitizing global data, including one novel mechanism based on differential privacy. We present a template for describing how local user data and global aggregate data are collected, processed, and used within an application, and apply this template to our three applications. We find that sanitizing feature vectors aggregated across users has a low impact on performance for classification applications (search task identification and searcher frustration detection). However, sanitizing free-text query reformulations is extremely detrimental to performance for the query recommendation application we consider. Personalization is useful to some degree in all the applications we explore when integrated with global information, achieving gains for search task identification, task-aware query recommendation, and searcher frustration detection. Finally we introduce an open source system called CrowdLogger that implements the CrowdLogging framework and also serves as a platform for conducting in-situ user studies of search behavior, prototyping and evaluating information retrieval applications, and collecting labeled data.

Subject Area

Computer science

Recommended Citation

Feild, Henry A, "Exploring privacy and personalization in information retrieval applications" (2013). Doctoral Dissertations Available from Proquest. AAI3603078.