Daniel Sheldon
SUN, TAO
2024-04-26
2019-02
10.7275/13489220
https://hdl.handle.net/20.500.14394/17706

Many real-world applications work directly with aggregate data. In this work, we study Learning with Aggregate Data from several perspectives and address the combinatorial challenges that arise. First, we study the problem of learning in Collective Graphical Models (CGMs), where only noisy aggregate observations are available. Inference in CGMs is NP-hard, and we propose an approximate inference algorithm. Solving the inference problem lets us build large-scale bird migration models, as well as models of human mobility under the differential privacy setting. Second, we consider problems in which we are given bags of instances and bag-level aggregate supervision. Specifically, we study the US presidential election and build a model to understand the voting preferences of individuals or demographic groups. The data consists of individual characteristics from the US Census together with voting tallies for each voting precinct. We propose a fully probabilistic Learning with Label Proportions (LLPs) model with exact inference to build an instance-level model. Third, we study distribution regression, which has a problem setting similar to LLPs but builds bag-level models. We experimentally evaluate different algorithms on three tasks and identify key factors in the problem setting that affect the choice of algorithm.

machine learning
graphical models
Artificial Intelligence and Robotics

Learning with Aggregate Data
dissertation