Off-campus UMass Amherst users: To download campus access dissertations, please use the following link to log into our proxy server with your UMass Amherst user name and password.

Non-UMass Amherst users: Please talk to your librarian about requesting this dissertation through interlibrary loan.

Dissertations that have an embargo placed on them will not be available to anyone until the embargo expires.

Author ORCID Identifier

https://orcid.org/0000-0003-4788-7481

Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

Year Degree Awarded

2020

Month Degree Awarded

September

First Advisor

Andrew McCallum

Subject Categories

Artificial Intelligence and Robotics

Abstract

Intelligent, automated systems that are intertwined with everyday life---such as Google Search and virtual assistants like Amazon’s Alexa or Apple’s Siri---are often powered in part by knowledge bases (KBs), i.e., structured data repositories of entities, their attributes, and the relationships among them. Despite a wealth of research focused on automated KB construction methods, KBs are inevitably imperfect, with errors stemming from various points in the construction pipeline. Making matters more challenging, new data is created daily and must be integrated with existing KBs so that they remain up-to-date. As the primary consumers of KBs, human users have tremendous potential to aid in KB construction by contributing feedback that identifies spurious and missing entity attributes and relations. However, correctly integrating user feedback with an existing KB is complicated by the necessity to resolve identity uncertainty, i.e., uncertainty regarding to which real-world entity a piece of data refers. Identity uncertainty abounds in the collection of raw evidence from which a KB is built. Moreover, it also gives rise to identity uncertainty in user feedback, when KB entities, which were affected by user feedback, are split or merged.

In this dissertation, we present a continuous reasoning framework capable of integrating user feedback with a KB, under identity certainty. To begin, we introduce Grinch, an online entity resolution (ER) algorithm---with provable correctness guarantees---capable of merging and splitting KB entities as new data arrives. We show that Grinch is efficient and achieves state-of-the-art performance in ER as well as in clustering. Next, we propose a method for using Grinch to resolve identity uncertainty in a KB's underlying data as well as in user feedback. Our approach is based on representing user feedback as mentions, i.e., first class KB objects that participate in all parts of KB construction. Furthermore, we introduce a structured representation for feedback comprised of packaging and payload, which facilitates recovery from KB errors that stem from both identity uncertainty and noisy data. Finally, we evaluate our framework's efficacy using data from the KB that supports OpenReview.net---a deployed, conference management system that solicits feedback from users. The demands of OpenReview.net lead us to develop XGrinch-Shallow (XGS), a variant of Grinch that builds trees with arbitrary branching factors, and subsequently instantiates 60% fewer internal nodes than Grinch. Empirically, we show that XGS is efficient, and is able to effectively utilize user feedback to improve the correctness and completeness of the OpenReview.net KB. We conclude with 7 concrete suggestions for future research on this topic.

Creative Commons License

Creative Commons Attribution 4.0 License
This work is licensed under a Creative Commons Attribution 4.0 License.

Share

COinS