Optimal Interactive Learning on the Job via Facility Location Planning

¹Brown University, ²Carnegie Mellon University

Published at RSS 2025

Abstract

Collaborative robots have the ability to adapt and improve their behavior by learning from their human users. By interactively learning on the job, these robots can both acquire new motor skills and customize their behavior to personal user preferences. However, for this paradigm to be viable, there must be a balance between teaching the robot necessary skills, minimizing user burden, and maintaining task progress. We propose COIL, a novel polynomial-time interaction planner that explicitly minimizes human effort while ensuring the completion of a given sequence of tasks according to hidden user preferences. When user preferences are known, we formulate this planning-to-learn problem as an uncapacitated facility location problem. COIL utilizes efficient approximation algorithms for facility location to plan in the case of unknown preferences in polynomial time. In contrast, prior methods do not guarantee minimization of human effort nor consider the inherently collaborative nature of learning on the job, in which timely task execution may require the robot to forego learning and instead request human contributions. Simulated and physical experiments on manipulation tasks show that our framework significantly reduces the amount of work allocated to the human while maintaining successful task completion.

Video

Long Overview


Our key theoretical contribution is a novel formulation of multi-task interactive robot learning as an instance of the uncapacitated facility location (UFL) problem. Different from prior works, our formulation is cost-optimal and jointly minimizes human burden during learning and deployment.
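To make the UFL connection concrete, below is a minimal sketch of the classic greedy approximation for uncapacitated facility location, which repeatedly opens the facility-and-client set with the best cost-effectiveness (opening cost plus connection costs, divided by clients served). The function name and cost-matrix layout are illustrative, not taken from the paper; in the paper's setting, facilities correspond to interactive actions (e.g., asking the human, teaching a skill) and clients correspond to tasks.

```python
def greedy_ufl(open_cost, connect_cost):
    """Greedy approximation for uncapacitated facility location.

    open_cost[i]   : cost of opening facility i.
    connect_cost[i][j] : cost of serving client j from facility i.
    Returns the set of opened facilities and a client -> facility assignment.
    """
    n_clients = len(connect_cost[0])
    unserved = set(range(n_clients))
    opened, assignment = set(), {}
    while unserved:
        best = None  # (cost-effectiveness, facility, clients to serve)
        for i, f_cost in enumerate(open_cost):
            # Consider serving clients in order of increasing connection cost;
            # evaluate every prefix of that order.
            ranked = sorted(unserved, key=lambda j: connect_cost[i][j])
            total = 0 if i in opened else f_cost  # already-open facilities are free
            chosen = []
            for j in ranked:
                total += connect_cost[i][j]
                chosen.append(j)
                eff = total / len(chosen)
                if best is None or eff < best[0]:
                    best = (eff, i, list(chosen))
        _, i, clients = best
        opened.add(i)
        for j in clients:
            assignment[j] = i
            unserved.discard(j)
    return opened, assignment
```

For example, with opening costs `[3, 1]` and connection costs `[[1, 1], [4, 4]]`, the greedy rule opens facility 0 (effectiveness (3+1+1)/2 = 2.5) and serves both clients from it. This greedy scheme achieves a logarithmic approximation factor and runs in polynomial time, which is the property COIL exploits for planning under unknown preferences.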


Facility location formulation. Tasks τ are demands to be satisfied. Facilities correspond to interactive actions available for every task. We highlight facilities for τ₂: the Human facility can only service τ₂. The Skill facility can service similar future tasks τ₂, τ₃, τ₄. The Robot facility cannot service any task, as the robot hasn't learned a skill yet. Furthermore, none of the facilities can service past tasks.
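The constraints in the figure caption can be encoded directly in the connection-cost matrix: infeasible pairings (past tasks, tasks the human facility doesn't own, tasks outside a skill's coverage) get infinite cost. The helper below is a hypothetical illustration of that encoding; the function name and the cost parameters are not from the paper.

```python
import math

def service_costs(n_tasks, t, c_human, c_teach, c_skill, similar):
    """Connection costs for the facilities attached to task t.

    n_tasks : length of the task sequence.
    t       : index of the current task.
    c_human : cost of asking the human to do task t.
    c_teach : cost of teaching the skill on task t.
    c_skill : cost of the robot executing a learned skill.
    similar : indices of tasks the taught skill would also cover.
    Infinite cost marks an infeasible facility/task pairing.
    """
    INF = math.inf
    # Human facility: serves only its own task.
    human = [c_human if j == t else INF for j in range(n_tasks)]
    # Skill facility: serves similar tasks at or after t; teaching is paid on t.
    skill = [c_skill if (j in similar and j >= t) else INF
             for j in range(n_tasks)]
    skill[t] = c_teach
    # Robot facility: no skill learned yet, so it can serve nothing.
    robot = [INF] * n_tasks
    return human, skill, robot
```

With four tasks and the skill covering tasks {1, 2, 3}, the facilities for task 1 would see: human serves only task 1, the skill facility pays the teaching cost on task 1 and the cheaper execution cost on tasks 2 and 3, and past task 0 is infeasible for every facility.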


Results: Gridworld Domain

In the Gridworld domain, we find that COIL makes fewer preference queries than the confidence-based baselines because COIL only asks for human preferences if it believes that this information will be useful later. Values are reported as mean (standard deviation).


Results: Manipulation Domain

Results on the manipulation domain. On average, COIL plans interactions that yield a 7% to 18% reduction in cost compared to the best-performing baseline. The improvement over baselines is particularly marked when teaching is more expensive than assigning the task to the human, i.e., the medium- and high-cost profiles. The reported statistics are averaged over 10 interactions with 30 randomly sampled tasks each.


Results: Real World Conveyor Belt Domain

Results on a physical conveyor belt. We ran experiments with 5 different task sequences, each with 20 objects, with COIL and CADL. We observed a teaching failure on the white mug (possibly because its shiny surface made camera-based pose estimation difficult). Hence, we report the run with the teaching failure separately from the other 4 runs. COIL achieved significantly lower cost than the baseline in both situations. CADL especially struggled in the case of teaching failure, as it repeatedly requested to be taught the mug skill.

BibTeX



        @article{vats2025optimal,
          title={Optimal Interactive Learning on the Job via Facility Location Planning},
          author={Vats, Shivam and Zhao, Michelle and Callaghan, Patrick and Jia, Mingxi and Likhachev, Maxim and Kroemer, Oliver and Konidaris, George},
          journal={Robotics: Science and Systems},
          year={2025}
        }