Author ORCID Identifier
https://orcid.org/0000-0003-2529-4036
AccessType
Open Access Dissertation
Document Type
dissertation
Degree Name
Doctor of Philosophy (PhD)
Degree Program
Computer Science
Year Degree Awarded
2021
Month Degree Awarded
September
First Advisor
Barna Saha
Second Advisor
Arya Mazumdar
Third Advisor
Gerome Miklau
Fourth Advisor
Divesh Srivastava
Subject Categories
Data Science
Abstract
A growing number of data-based applications are used to make decisions with far-reaching consequences and significant societal impact. Entity resolution, community detection, and taxonomy construction are some of the building blocks of these applications, and clustering is the fundamental concept underlying these methods. Therefore, the importance of accurate, robust, and scalable clustering methods cannot be overstated. We tackle the various facets of clustering with the multi-pronged approach described below.
1. While identifying clusters that refer to different entities is challenging for automated strategies, it is relatively easy for humans. We study the robustness of clustering methods that leverage supervision through an oracle, i.e., an abstraction of crowdsourcing. Additionally, we focus on scalability to handle web-scale datasets.
2. In community detection applications, a common setback in evaluating the quality of clustering techniques is the lack of ground truth data. We propose a generative model that considers dependent edge formation and devise techniques for efficient cluster recovery.
DOI
https://doi.org/10.7275/24169035
Recommended Citation
Galhotra, Sainyam, "Robust Algorithms for Clustering with Applications to Data Integration" (2021). Doctoral Dissertations. 2319.
https://doi.org/10.7275/24169035
https://scholarworks.umass.edu/dissertations_2/2319
Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.