
Observer-Aware Planning Under Uncertainty

Abstract
As more autonomous agents share space with people, it is crucial that these agents are cognizant of how their behaviors are interpreted. Previous work has investigated various methods to communicate or conceal agents' goals, intentions, and capabilities through their behaviors. Many of these methods are based on the same principle: the agent reasons about how its actions will be interpreted by observers and selects actions that lead to desired beliefs in the observer. Despite the commonality of this principle, these methods are often developed in isolation, focusing on specific aspects of the problem. This fragmentation makes it difficult to compare and combine these methods to create more sophisticated behaviors. This thesis presents a unifying model for generating behaviors that achieve desired goals while considering how these behaviors are interpreted. The proposed model, referred to as the Observer-Aware Markov Decision Process (OAMDP), operates under the assumption that observers interpret the agent's actions to form beliefs about the agent's potential desires, goals, and intentions. Rewards in OAMDPs depend on the observer's beliefs in addition to the agent's state and action. Using belief-dependent rewards, we demonstrate how OAMDPs can generate behaviors that lead to desirable observer beliefs. Additionally, we present an online user study comparing the observer-aware behaviors generated by OAMDPs to those generated by a baseline model. The results show that observers interpret the OAMDP-generated behaviors as intended more often than the baseline's behaviors. OAMDPs provide an expressive framework capable of producing various kinds of observer-aware behaviors. However, reasoning about the observer's beliefs introduces a dependence on histories. We provide a proof that this dependence makes solving OAMDPs intractable in the worst case.
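The core mechanism described above — an observer forming beliefs from the agent's actions, and the agent being rewarded according to those beliefs — can be illustrated with a minimal sketch. All names and the specific Bayesian observer model below are illustrative assumptions for exposition, not the thesis's exact formulation.

```python
def update_belief(belief, action, policies):
    """One Bayesian observer update: P(g | a) ∝ P(a | g) * P(g).

    belief   -- dict mapping each candidate goal to its prior probability
    action   -- the action the observer just saw
    policies -- dict mapping each goal to an action-likelihood dict P(a | g)
    """
    posterior = {g: belief[g] * policies[g].get(action, 0.0) for g in belief}
    total = sum(posterior.values())
    if total == 0.0:
        # Action is inconsistent with every candidate goal: keep the prior.
        return dict(belief)
    return {g: p / total for g, p in posterior.items()}


def belief_dependent_reward(belief, true_goal, step_cost=0.1):
    """A belief-dependent reward: pay the agent for the probability the
    observer assigns to its true goal, minus a small per-step task cost."""
    return belief[true_goal] - step_cost


# Example: two candidate goals; moving "right" is more likely under goal A,
# so observing it shifts the observer's belief toward A.
belief = {"A": 0.5, "B": 0.5}
policies = {
    "A": {"right": 0.8, "up": 0.2},
    "B": {"right": 0.2, "up": 0.8},
}
belief = update_belief(belief, "right", policies)  # belief on A rises to 0.8
```

Because the reward depends on the belief, and the belief is a function of the whole action history, the agent's optimization inherits the history dependence noted in the abstract.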
To address this issue, we investigate several approximation algorithms and present error bounds for a special case where rewards and belief updates are Lipschitz-continuous. We then empirically evaluate these algorithms' performance on a variety of domains. Although belief-dependent rewards are useful for generating observer-aware behaviors, they can conflict with task efficiency. To achieve a principled balance between task efficiency and observer-awareness, we introduce Constrained OAMDPs (COAMDPs). The introduction of constraints allows us to specify the desired observer beliefs while enforcing constraints on task efficiency. Since the constraints make solution methods for unconstrained models inapplicable to COAMDPs, we propose new algorithms and demonstrate their effectiveness through empirical results.
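One standard device for the kind of efficiency/awareness trade-off that COAMDPs formalize is Lagrangian relaxation of a constrained objective. The sketch below shows only that generic device — maximize belief reward subject to a task-cost budget — and is not one of the algorithms the thesis proposes; the function names and step size are assumptions.

```python
def scalarized_return(belief_rewards, task_costs, lam):
    """Lagrangian of: maximize total belief reward subject to
    total task cost <= budget. Cost is penalized with weight lam."""
    return sum(belief_rewards) - lam * sum(task_costs)


def dual_ascent_lambda(lam, total_cost, budget, step=0.5):
    """Dual update: raise lambda when the cost budget is violated,
    decay it (clipped at zero) when there is slack."""
    return max(0.0, lam + step * (total_cost - budget))
```

Intuitively, lambda prices task inefficiency: as trajectories overspend the budget, the penalty grows until observer-aware behavior that respects the constraint becomes optimal under the scalarized objective.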
Type
Dissertation (Open Access)
Date
2025-05
License
Attribution-NonCommercial 4.0 International
http://creativecommons.org/licenses/by-nc/4.0/
Embargo Lift Date
2026-05-16