Anticipatory Action Selection for Human-Robot Table Tennis

Abstract: Anticipation can enhance the capability of a robot in its interaction with humans, where the robot predicts the human's intention in order to select its own action. We present a novel framework of anticipatory action selection for human-robot interaction, which is capable of handling nonlinear and stochastic human behaviors such as table tennis strokes and allows the robot to choose the optimal action based on a prediction of the human partner's intention under uncertainty. The presented framework is generic and can be used in many human-robot interaction scenarios, for example, in navigation and human-robot co-manipulation. In this article, we conduct a case study on human-robot table tennis. Due to the limited amount of time for executing hitting movements, a robot usually needs to initiate its hitting movement before the opponent hits the ball, which requires the robot to be anticipatory based on visual observation of the opponent's movement. Previous work on Intention-Driven Dynamics Models (IDDM) allowed the robot to predict the intended target of the opponent. In this article, we address the problem of action selection and optimal timing for initiating a chosen action by formulating anticipatory action selection as a Partially Observable Markov Decision Process (POMDP), where the transition and observation models are given by the IDDM framework. We present two approaches to anticipatory action selection based on the POMDP formulation: a model-free policy learning method based on Least-Squares Policy Iteration (LSPI) that employs the IDDM for belief updates, and a model-based Monte-Carlo Planning (MCP) method, which benefits from the transition and observation models provided by the IDDM. Experimental results using real data in a simulated environment show the importance of anticipatory action selection, and that POMDPs are suitable for formulating the anticipatory action selection problem because they take the uncertainties in prediction into account. We also show that existing algorithms for POMDPs, such as LSPI and MCP, can be applied to substantially improve the robot's performance in its interaction with humans.
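
To make the idea of acting on an uncertain intention estimate concrete, the following is a minimal Python sketch of a belief update followed by a "wait or commit" decision. It is not the method of the article: it swaps the IDDM for a plain discrete Bayes filter, and the LSPI and MCP policies for a fixed confidence threshold. The target labels (`forehand`, `backhand`), the per-frame likelihood values, and the threshold of 0.9 are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

Belief = Dict[str, float]


def update_belief(belief: Belief, likelihood: Dict[str, float]) -> Belief:
    """One Bayes step: posterior is proportional to prior times likelihood."""
    unnormalized = {t: belief[t] * likelihood[t] for t in belief}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero likelihood under every target")
    return {t: p / total for t, p in unnormalized.items()}


def select_action(belief: Belief, threshold: float) -> str:
    """Commit to the most probable target once confident, otherwise wait."""
    target, prob = max(belief.items(), key=lambda item: item[1])
    return target if prob >= threshold else "wait"


# Hypothetical per-frame observation likelihoods (assumed values, not real data).
observations: List[Dict[str, float]] = [
    {"forehand": 0.6, "backhand": 0.4},
    {"forehand": 0.7, "backhand": 0.3},
    {"forehand": 0.8, "backhand": 0.2},
]

belief: Belief = {"forehand": 0.5, "backhand": 0.5}  # uniform prior (assumption)
actions: List[str] = []
for likelihood in observations:
    belief = update_belief(belief, likelihood)
    actions.append(select_action(belief, threshold=0.9))

print(actions)
```

In this toy run the rule waits for two frames and commits on the third, once the belief in one target exceeds the threshold. A fixed threshold is only a stand-in: the POMDP formulation described in the abstract instead trades off the benefit of waiting for more evidence against the time left to execute the movement.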

Author(s): Wang, Z. and Boularias, A. and Mülling, K. and Schölkopf, B. and Peters, J.
Journal: Artificial Intelligence
Volume: 247
Pages: 399--414
Year: 2017

Department(s): Autonomous Motion, Empirical Inference
Bibtex Type: Article (article)

DOI: 10.1016/j.artint.2014.11.007
Note: Special Issue on AI and Robotics
State: Published


  title = {Anticipatory Action Selection for Human-Robot Table Tennis},
  author = {Wang, Z. and Boularias, A. and M{\"u}lling, K. and Sch{\"o}lkopf, B. and Peters, J.},
  journal = {Artificial Intelligence},
  volume = {247},
  pages = {399--414},
  year = {2017},
  note = {},
  crossref = {}