Optimal Interactive Learning on the Job via Facility Location Planning

¹Brown University, ²Carnegie Mellon University

Published at RSS 2025

Abstract

Collaborative robots have the ability to adapt and improve their behavior by learning from their human users. By interactively learning on the job, these robots can both acquire new motor skills and customize their behavior to personal user preferences. However, for this paradigm to be viable, there must be a balance between teaching the robot necessary skills, minimizing user burden, and maintaining task progress. We propose COIL, a novel polynomial-time interaction planner that explicitly minimizes human effort while ensuring the completion of a given sequence of tasks according to hidden user preferences. When user preferences are known, we formulate this planning-to-learn problem as an uncapacitated facility location problem. COIL utilizes efficient approximation algorithms for facility location to plan in the case of unknown preferences in polynomial time. In contrast, prior methods do not guarantee minimization of human effort nor consider the inherently collaborative nature of learning on the job, in which timely task execution may require the robot to forego learning and instead request human contributions. Simulated and physical experiments on manipulation tasks show that our framework significantly reduces the amount of work allocated to the human while maintaining successful task completion.

Video

Long Overview


Our key theoretical contribution is a novel formulation of multi-task interactive robot learning as an instance of the uncapacitated facility location (UFL) problem. Different from prior works, our formulation is cost-optimal and jointly minimizes human burden during learning and deployment.
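To make the UFL connection concrete, below is a minimal sketch of the classic greedy approximation for uncapacitated facility location, which repeatedly opens the facility-and-client set with the best cost-effectiveness (opening cost plus connection costs, divided by clients served). The function name and cost-matrix layout are illustrative, not taken from the paper; in the paper's setting, facilities correspond to interactive actions (e.g., asking the human, teaching a skill) and clients correspond to tasks.

```python
def greedy_ufl(open_cost, connect_cost):
    """Greedy approximation for uncapacitated facility location.

    open_cost[i]   : cost of opening facility i.
    connect_cost[i][j] : cost of serving client j from facility i.
    Returns the set of opened facilities and a client -> facility assignment.
    """
    n_clients = len(connect_cost[0])
    unserved = set(range(n_clients))
    opened, assignment = set(), {}
    while unserved:
        best = None  # (cost-effectiveness, facility, clients to serve)
        for i, f_cost in enumerate(open_cost):
            # Consider serving clients in order of increasing connection cost;
            # evaluate every prefix of that order.
            ranked = sorted(unserved, key=lambda j: connect_cost[i][j])
            total = 0 if i in opened else f_cost  # already-open facilities are free
            chosen = []
            for j in ranked:
                total += connect_cost[i][j]
                chosen.append(j)
                eff = total / len(chosen)
                if best is None or eff < best[0]:
                    best = (eff, i, list(chosen))
        _, i, clients = best
        opened.add(i)
        for j in clients:
            assignment[j] = i
            unserved.discard(j)
    return opened, assignment
```

For example, with opening costs `[3, 1]` and connection costs `[[1, 1], [4, 4]]`, the greedy rule opens facility 0 (effectiveness (3+1+1)/2 = 2.5) and serves both clients from it. This greedy scheme achieves a logarithmic approximation factor and runs in polynomial time, which is the property COIL exploits for planning under unknown preferences.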


Facility location formulation. Tasks τ are demands to be satisfied. Facilities correspond to interactive actions available for every task. We highlight facilities for τ₂: the Human facility can only service τ₂. The Skill facility can service similar future tasks τ₂, τ₃, τ₄. The Robot facility cannot service any task, as the robot hasn't learned a skill yet. Furthermore, none of the facilities can service past tasks.
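The constraints in the figure caption can be encoded directly in the connection-cost matrix: infeasible pairings (past tasks, tasks the human facility doesn't own, tasks outside a skill's coverage) get infinite cost. The helper below is a hypothetical illustration of that encoding; the function name and the cost parameters are not from the paper.

```python
import math

def service_costs(n_tasks, t, c_human, c_teach, c_skill, similar):
    """Connection costs for the facilities attached to task t.

    n_tasks : length of the task sequence.
    t       : index of the current task.
    c_human : cost of asking the human to do task t.
    c_teach : cost of teaching the skill on task t.
    c_skill : cost of the robot executing a learned skill.
    similar : indices of tasks the taught skill would also cover.
    Infinite cost marks an infeasible facility/task pairing.
    """
    INF = math.inf
    # Human facility: serves only its own task.
    human = [c_human if j == t else INF for j in range(n_tasks)]
    # Skill facility: serves similar tasks at or after t; teaching is paid on t.
    skill = [c_skill if (j in similar and j >= t) else INF
             for j in range(n_tasks)]
    skill[t] = c_teach
    # Robot facility: no skill learned yet, so it can serve nothing.
    robot = [INF] * n_tasks
    return human, skill, robot
```

With four tasks and the skill covering tasks {1, 2, 3}, the facilities for task 1 would see: human serves only task 1, the skill facility pays the teaching cost on task 1 and the cheaper execution cost on tasks 2 and 3, and past task 0 is infeasible for every facility.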


Results: Gridworld Domain

In the Gridworld domain, we find that COIL makes fewer preference queries than the confidence-based baselines because COIL only asks for human preferences if it believes that this information will be useful later. Values are reported as mean (standard deviation).


Results: Manipulation Domain

Results on the manipulation domain. On average, COIL plans interactions that yield a 7% to 18% reduction in cost compared to the best-performing baseline. The improvement over baselines is particularly marked when teaching is more expensive than assigning the task to the human, i.e., the medium- and high-cost profiles. The reported statistics are averaged over 10 interactions with 30 randomly sampled tasks each.


Results: Real World Conveyor Belt Domain

Results on a physical conveyor belt. We ran experiments with 5 different task sequences, each with 20 objects, with COIL and CADL. We observed a teaching failure on the white mug (possibly because its shiny surface made camera-based pose estimation difficult). Hence, we report the run with the teaching failure separately from the other 4 runs. COIL achieved significantly lower cost than the baseline in both situations. CADL especially struggled in the case of teaching failure, as it repeatedly requested to be taught the mug skill.

BibTeX



        @article{vats2025optimal,
          title={Optimal Interactive Learning on the Job via Facility Location Planning},
          author={Vats, Shivam and Zhao, Michelle and Callaghan, Patrick and Jia, Mingxi and Likhachev, Maxim and Kroemer, Oliver and Konidaris, George},
          journal={Robotics: Science and Systems},
          year={2025}
        }