Optimization with Intrinsic Diversity: Towards Generalizable, Safe, and Open-ended Learning
Abstract
Building general-purpose artificial intelligence (AI) systems and safely aligning them with human values remains a critical, unsolved problem in AI. A significant challenge in this domain is the inherent diversity of human thoughts and demands, reflecting the complexities of the real world. This diversity is often not adequately captured in existing optimization processes, which typically aim to optimize aggregated objectives or average human preferences. This dissertation investigates intrinsic mechanisms for integrating diversity into optimization. First, we introduce Gradient Lexicase Selection and Probabilistic Lexicase Selection to promote diversity in goal-oriented tasks, enhancing model generalization and efficiency. Second, we address diversity in human preferences with Pareto Optimal Preferences Learning (POPL), a reinforcement learning from human feedback (RLHF) framework that learns policies and reward models catering to distinct groups, ensuring safe and fair alignment of AI agents. Finally, we propose Quality Diversity through Human Feedback (QDHF), a novel approach that learns notions of diversity from human judgment of difference to simultaneously optimize for quality and novelty, thereby enhancing the creativity and user satisfaction of model responses in open-ended generative tasks.
Type
Dissertation (Open Access)
Date
2024-09
License
http://creativecommons.org/licenses/by/4.0/