Author ORCID Identifier
N/A
Access Type
Open Access Dissertation
Document Type
dissertation
Degree Name
Doctor of Philosophy (PhD)
Degree Program
Computer Science
Year Degree Awarded
2019
Month Degree Awarded
February
First Advisor
Alexandra Meliou
Subject Categories
Data Storage Systems
Abstract
Systems and applications rely heavily on data, which makes data quality critical to their function. Low-quality data can be costly and disruptive, leading to loss of revenue, incorrect conclusions, and misguided policy decisions. Improving data quality involves far more than purging datasets of errors; it is more important to improve the processes that produce the data, to collect good data sources from which the data are generated, and to truly understand the quality of the data. The objective of this thesis is therefore to improve and understand data quality along these three dimensions. First, we develop two efficient and effective tools, DataXRay and QFix, that diagnose systematic errors in general data extraction systems and relational data systems, respectively. Second, we design a recommendation system, Midas, that identifies high-quality data sources for augmenting knowledge bases. Third, we implement an explanation system, Explain3D, that explains the disagreements in disjoint datasets.
DOI
https://doi.org/10.7275/12926893
Recommended Citation
Wang, Xiaolan, "Improving and understanding data quality in large-scale data systems" (2019). Doctoral Dissertations. 1530.
https://doi.org/10.7275/12926893
https://scholarworks.umass.edu/dissertations_2/1530