I study the geometry of robot motion for manipulation, especially the Riemannian geometry of the motion optimization problem, how it interacts with workspace geometry in the presence of obstacles, and how it relates to second-order methods for nonlinear constrained optimization. Earlier work in these areas include algorithms for motion planing using fast trajectory optimizers (CHOMP) and methods for learning local mappings to latent Euclidean spaces for control. I'm currently working to generalize these ideas into a unified optimization framework called Riemannian Motion Optimization (RieMO) with specific applications to the dual arm manipulation platform Apollo at the Max Planck Institute for Intelligent Systems Autonomous Motion Department.
I completed my Ph.D. work at Carnegie Mellon University’s Robotics Institute under Professor J. Andrew Bagnell in 2009. Since then I've been at TTI-C on the University of Chicago Campus building robots, Intel Labs in both Seattle and Pittsburgh studying trajectory optimization, and Google developing large scale learning systems to assess the quality of Ad Landing Pages. I'm currently part of Stefan Schaal's the Autonomous Motion Department (AMD) at the Max Planck Institute of Intelligent Systems in Tübingen, and I collaborate closely with Marc Toussaint's Machine Learning and Robotics lab at the University of Stuttgart where I have taught courses on Advanced Robotics and Mathematics for Intelligent Systems.
In Proceedings of Robotics: Science and Systems, Rome, Italy, Robotics: Science and Systems XI, July 2015 (inproceedings)
Inverse Optimal Control (IOC) has strongly impacted the systems engineering process, enabling automated planner tuning through straightforward and intuitive demonstration. The most successful and established applications, though, have been in lower dimensional problems such as navigation planning where exact optimal planning or control is feasible. In higher dimensional systems, such as humanoid robots, research has made substantial progress toward generalizing the ideas to model free or locally optimal settings, but these systems are complicated to the point where demonstration itself can be difficult. Typically, real-world applications are restricted to at best noisy or even partial or incomplete demonstrations that prove cumbersome in existing frameworks. This work derives a very flexible method of IOC based on a form of Structured Prediction known as Direct Loss Minimization. The resulting algorithm is essentially Policy Search on a reward function that rewards similarity to demonstrated behavior (using Covariance Matrix Adaptation (CMA) in our experiments). Our framework blurs the distinction between IOC, other forms of Imitation Learning, and Reinforcement Learning, enabling us to derive simple, versatile, and practical algorithms that blend imitation and reinforcement signals into a unified framework. Our experiments analyze various aspects of its performance and demonstrate its efficacy on conveying preferences for motion shaping and combined reach and grasp quality optimization.
In Reinforcement Learning and Decision Making, 2015 (inproceedings)
For robots to be able to manipulate in unknown and unstructured environments the robot should be capable of operating under partial observability of the environment. Object occlusions and unmodeled environments are some of the factors that result in partial observability. A common scenario where this is encountered is manipulation in clutter. In the case that the robot needs to locate an object of interest and manipulate it, it needs to perform a series of decluttering actions to accurately detect the object of interest. To perform such a series of actions, the robot also needs to account for the dynamics of objects in the environment and how they react to contact. This is a non trivial problem since one needs to reason not only about robot-object interactions but also object-object interactions in the presence of contact. In the example scenario of manipulation in clutter, the state vector would have to account for the pose of the object of interest and the structure of the surrounding environment. The process model would have to account for all the aforementioned robot-object, object-object interactions. The complexity of the process model grows exponentially as the number of objects in the scene increases. This is commonly the case in unstructured environments. Hence it is not reasonable to attempt to model all object-object and robot-object interactions explicitly. Under this setting we propose a hypothesis based action selection algorithm where we construct a hypothesis set of the possible poses of an object of interest given the current evidence in the scene and select actions based on our current set of hypothesis. This hypothesis set tends to represent the belief about the structure of the environment and the number of poses the object of interest can take. The agent's only stopping criterion is when the uncertainty regarding the pose of the object is fully resolved.
In Proceedings of the International Conference on Intelligent Robots and Systems, Chicago, IL, October 2014 (inproceedings)
Efficient manipulation requires contact to reduce uncertainty. The manipulation literature refers to this as funneling: a methodology for increasing reliability and robustness by leveraging haptic feedback and control of environmental interaction. However, there is a fundamental gap between traditional approaches to trajectory optimization and this concept of robustness by funneling: traditional trajectory optimizers do not discover force feedback strategies. From a POMDP perspective, these behaviors could be regarded as explicit obser- vation actions planned to sufficiently reduce uncertainty thereby enabling a task. While we are sympathetic to the full POMDP view, solving full continuous-space POMDPs in high-dimensions is hard. In this paper, we propose an alternative approach in which trajectory optimization objectives are augmented with new terms that reward uncertainty reduction through contacts, explicitly promoting funneling. This augmentation shifts the responsibility of robustness toward the actual execution of the optimized trajectories. Directly tracing trajectories through configuration space would lose all robustnessâ??dual execution achieves robustness by devising force controllers to reproduce the temporal interaction profile encoded in the dual solution of the optimization problem. This work introduces dual execution in depth and analyze its performance through robustness experiments in both simulation and on a real-world robotic platform.
Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments and to use this understanding to design future systems