

2017


Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets

Hausman, K., Chebotar, Y., Schaal, S., Sukhatme, G., Lim, J.

In Proceedings from the conference "Neural Information Processing Systems 2017", (Editors: Guyon I. and Luxburg U.v. and Bengio S. and Wallach H. and Fergus R. and Vishwanathan S. and Garnett R.), Curran Associates, Inc., Advances in Neural Information Processing Systems 30 (NIPS), December 2017 (inproceedings)

am

pdf video [BibTex]



On the Design of LQR Kernels for Efficient Controller Learning

Marco, A., Hennig, P., Schaal, S., Trimpe, S.

Proceedings of the 56th IEEE Annual Conference on Decision and Control (CDC), pages: 5193-5200, IEEE, IEEE Conference on Decision and Control, December 2017 (conference)

Abstract
Finding optimal feedback controllers for nonlinear dynamic systems from data is hard. Recently, Bayesian optimization (BO) has been proposed as a powerful framework for direct controller tuning from experimental trials. For selecting the next query point and finding the global optimum, BO relies on a probabilistic description of the latent objective function, typically a Gaussian process (GP). As is shown herein, GPs with a common kernel choice can, however, lead to poor learning outcomes on standard quadratic control problems. For a first-order system, we construct two kernels that specifically leverage the structure of the well-known Linear Quadratic Regulator (LQR), yet retain the flexibility of Bayesian nonparametric learning. Simulations of uncertain linear and nonlinear systems demonstrate that the LQR kernels yield superior learning performance.

am ics pn

arXiv PDF On the Design of LQR Kernels for Efficient Controller Learning - CDC presentation DOI Project Page [BibTex]



Synchronicity Trumps Mischief in Rhythmic Human-Robot Social-Physical Interaction

Fitter, N. T., Kuchenbecker, K. J.

In Proceedings of the International Symposium on Robotics Research (ISRR), Puerto Varas, Chile, December 2017 (inproceedings) In press

Abstract
Hand-clapping games and other forms of rhythmic social-physical interaction might help foster human-robot teamwork, but the design of such interactions has scarcely been explored. We leveraged our prior work to enable the Rethink Robotics Baxter Research Robot to competently play one-handed tempo-matching hand-clapping games with a human user. To understand how such a robot’s capabilities and behaviors affect user perception, we created four versions of this interaction: the hand clapping could be initiated by either the robot or the human, and the non-initiating partner could be either cooperative, yielding synchronous motion, or mischievously uncooperative. Twenty adults tested two clapping tempos in each of these four interaction modes in a random order, rating every trial on standardized scales. The study results showed that having the robot initiate the interaction gave it a more dominant perceived personality. Despite previous results on the intrigue of misbehaving robots, we found that moving synchronously with the robot almost always made the interaction more enjoyable, less mentally taxing, less physically demanding, and lower effort for users than asynchronous interactions caused by robot or human mischief. Taken together, our results indicate that cooperative rhythmic social-physical interaction has the potential to strengthen human-robot partnerships.

hi

[BibTex]



Optimizing Long-term Predictions for Model-based Policy Search

Doerr, A., Daniel, C., Nguyen-Tuong, D., Marco, A., Schaal, S., Toussaint, M., Trimpe, S.

Proceedings of 1st Annual Conference on Robot Learning (CoRL), 78, pages: 227-238, (Editors: Sergey Levine and Vincent Vanhoucke and Ken Goldberg), 1st Annual Conference on Robot Learning, November 2017 (conference)

Abstract
We propose a novel long-term optimization criterion to improve the robustness of model-based reinforcement learning in real-world scenarios. Learning a dynamics model to derive a solution promises much greater data-efficiency and reusability compared to model-free alternatives. In practice, however, model-based RL suffers from various imperfections such as noisy input and output data, delays, and unmeasured (latent) states. To achieve higher resilience against such effects, we propose to optimize a generative long-term prediction model directly with respect to the likelihood of observed trajectories as opposed to the common approach of optimizing a dynamics model for one-step-ahead predictions. We evaluate the proposed method on several artificial and real-world benchmark problems and compare it to PILCO, a model-based RL framework, in experiments on a manipulation robot. The results show that the proposed method is competitive compared to state-of-the-art model learning methods. In contrast to these more involved models, our model can directly be employed for policy search and outperforms a baseline method in the robot experiment.

am ics

PDF Project Page [BibTex]



A New Data Source for Inverse Dynamics Learning

Kappler, D., Meier, F., Ratliff, N., Schaal, S.

In Proceedings IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, Piscataway, NJ, USA, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 2017 (inproceedings)

am

[BibTex]



Bayesian Regression for Artifact Correction in Electroencephalography

Fiebig, K., Jayaram, V., Hesse, T., Blank, A., Peters, J., Grosse-Wentrup, M.

Proceedings of the 7th Graz Brain-Computer Interface Conference 2017 - From Vision to Reality, pages: 131-136, (Editors: Müller-Putz G.R., Steyrl D., Wriessnegger S. C., Scherer R.), Graz University of Technology, Austria, Graz Brain-Computer Interface Conference, September 2017 (conference)

am ei

DOI [BibTex]



Investigating Music Imagery as a Cognitive Paradigm for Low-Cost Brain-Computer Interfaces

Grossberger, L., Hohmann, M. R., Peters, J., Grosse-Wentrup, M.

Proceedings of the 7th Graz Brain-Computer Interface Conference 2017 - From Vision to Reality, pages: 160-164, (Editors: Müller-Putz G.R., Steyrl D., Wriessnegger S. C., Scherer R.), Graz University of Technology, Austria, Graz Brain-Computer Interface Conference, September 2017 (conference)

am ei

DOI [BibTex]



On the relevance of grasp metrics for predicting grasp success

Rubert, C., Kappler, D., Morales, A., Schaal, S., Bohg, J.

In Proceedings of the IEEE/RSJ International Conference of Intelligent Robots and Systems, September 2017 (inproceedings) Accepted

Abstract
We aim to reliably predict whether a grasp on a known object is successful before it is executed in the real world. An entire suite of grasp metrics has already been developed, and these rely on precisely known contact points between object and hand. However, it remains unclear whether and how they may be combined into a general-purpose grasp stability predictor. In this paper, we analyze these questions by leveraging a large-scale database of simulated grasps on a wide variety of objects. For each grasp, we compute the value of seven metrics. Each grasp is annotated by human subjects with ground truth stability labels. Given this data set, we train several classification methods to find out whether there is some underlying, non-trivial structure in the data that is difficult to model manually but can be learned. Quantitative and qualitative results show the complexity of the prediction problem. We found that a good prediction performance critically depends on using a combination of metrics as input features. Furthermore, non-parametric and non-linear classifiers best capture the structure in the data.

am

Project Page [BibTex]



Local Bayesian Optimization of Motor Skills

Akrour, R., Sorokin, D., Peters, J., Neumann, G.

Proceedings of the 34th International Conference on Machine Learning, 70, pages: 41-50, Proceedings of Machine Learning Research, (Editors: Doina Precup, Yee Whye Teh), PMLR, International Conference on Machine Learning (ICML), August 2017 (conference)

am ei

link (url) Project Page [BibTex]



Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning

Chebotar, Y., Hausman, K., Zhang, M., Sukhatme, G., Schaal, S., Levine, S.

Proceedings of the 34th International Conference on Machine Learning, 70, Proceedings of Machine Learning Research, (Editors: Doina Precup, Yee Whye Teh), PMLR, International Conference on Machine Learning (ICML), August 2017 (conference)

am

pdf video [BibTex]



Stiffness Perception during Pinching and Dissection with Teleoperated Haptic Forceps

Ng, C., Zareinia, K., Sun, Q., Kuchenbecker, K. J.

In Proceedings of the International Symposium on Robot and Human Interactive Communication (RO-MAN), pages: 456-463, Lisbon, Portugal, August 2017 (inproceedings)

hi

link (url) DOI [BibTex]



Towards quantifying dynamic human-human physical interactions for robot assisted stroke therapy

Mohan, M., Mendonca, R., Johnson, M. J.

In Proceedings of the IEEE International Conference on Rehabilitation Robotics (ICORR), London, UK, July 2017 (inproceedings)

Abstract
Human-Robot Interaction is a prominent field of robotics today. Knowledge of human-human physical interaction can prove vital in creating dynamic physical interactions between humans and robots. Most of the current work in studying this interaction has been from a haptic perspective. Through this paper, we present metrics that can be used to identify whether a physical interaction occurred between two people using kinematics. We present a simple Activity of Daily Living (ADL) task which involves a simple interaction. We show that we can use these metrics to successfully identify interactions.

hi

DOI [BibTex]



Design of a Parallel Continuum Manipulator for 6-DOF Fingertip Haptic Display

Young, E. M., Kuchenbecker, K. J.

In Proceedings of the IEEE World Haptics Conference (WHC), pages: 599-604, Munich, Germany, June 2017, Finalist for best poster paper (inproceedings)

Abstract
Despite rapid advancements in the field of fingertip haptics, rendering tactile cues with six degrees of freedom (6 DOF) remains an elusive challenge. In this paper, we investigate the potential of displaying fingertip haptic sensations with a 6-DOF parallel continuum manipulator (PCM) that mounts to the user's index finger and moves a contact platform around the fingertip. Compared to traditional mechanisms composed of rigid links and discrete joints, PCMs have the potential to be strong, dexterous, and compact, but they are also more complicated to design. We define the design space of 6-DOF parallel continuum manipulators and outline a process for refining such a device for fingertip haptic applications. Following extensive simulation, we obtain 12 designs that meet our specifications, construct a manually actuated prototype of one such design, and evaluate the simulation's ability to accurately predict the prototype's motion. Finally, we demonstrate the range of deliverable fingertip tactile cues, including a normal force into the finger and shear forces tangent to the finger at three extreme points on the boundary of the fingertip.

hi

DOI Project Page [BibTex]



High Magnitude Unidirectional Haptic Force Display Using a Motor/Brake Pair and a Cable

Hu, S., Kuchenbecker, K. J.

In Proceedings of the IEEE World Haptics Conference (WHC), pages: 394-399, Munich, Germany, June 2017 (inproceedings)

Abstract
Clever electromechanical design is required to make the force feedback delivered by a kinesthetic haptic interface both strong and safe. This paper explores a one-dimensional haptic force display that combines a DC motor and a magnetic particle brake on the same shaft. Rather than a rigid linkage, a spooled cable connects the user to the actuators to enable a large workspace, reduce the moving mass, and eliminate the sticky residual force from the brake. This design combines the high torque/power ratio of the brake and the active output capabilities of the motor to provide a wider range of forces than can be achieved with either actuator alone. A prototype of this device was built, its performance was characterized, and it was used to simulate constant force sources and virtual springs and dampers. Compared to the conventional design of using only a motor, the hybrid device can output higher unidirectional forces at the expense of free space feeling less free.

hi

DOI Project Page [BibTex]



A Stimulus-Response Model Of Therapist-Patient Interactions In Task-Oriented Stroke Therapy Can Guide Robot-Patient Interactions

Johnson, M., Mohan, M., Mendonca, R.

In Proceedings of the Annual Rehabilitation Engineering and Assistive Technology Society of North America (RESNA) Conference, New Orleans, USA, June 2017 (inproceedings)

Abstract
Current robot-patient interactions do not accurately model therapist-patient interactions in task-oriented stroke therapy. We analyzed patient-therapist interactions in task-oriented stroke therapy captured in 8 videos. We developed a model of the interaction between a patient and a therapist that can be overlaid on a stimulus-response paradigm where the therapist and the patient take on a set of acting states or roles and are motivated to move from one role to another when certain physical or verbal stimuli or cues are sensed and received. We examined how the model varies across 8 activities of daily living tasks and mapped this to a possible model for robot-patient interaction.

hi

link (url) [BibTex]



A Wrist-Squeezing Force-Feedback System for Robotic Surgery Training

Brown, J. D., Fernandez, J. N., Cohen, S. P., Kuchenbecker, K. J.

In Proceedings of the IEEE World Haptics Conference (WHC), pages: 107-112, Munich, Germany, June 2017 (inproceedings)

Abstract
Over time, surgical trainees learn to compensate for the lack of haptic feedback in commercial robotic minimally invasive surgical systems. Incorporating touch cues into robotic surgery training could potentially shorten this learning process if the benefits of haptic feedback were sustained after it was removed. In this paper, we develop a wrist-squeezing haptic feedback system and evaluate whether it holds the potential to train novice da Vinci users to reduce the force they exert on a bimanual inanimate training task. Subjects were randomly divided into two groups according to a multiple baseline experimental design. Each of the ten participants moved a ring along a curved wire nine times while the haptic feedback was conditionally withheld, provided, and withheld again. The real-time tactile feedback of applied force magnitude significantly reduced the integral of the force produced by the da Vinci tools on the task materials, and this result remained even when the haptic feedback was removed. Overall, our findings suggest that wrist-squeezing force feedback can play an essential role in helping novice trainees learn to minimize the force they exert with a surgical robot.

hi

DOI [BibTex]



Handling Scan-Time Parameters in Haptic Surface Classification

Burka, A., Kuchenbecker, K. J.

In Proceedings of the IEEE World Haptics Conference (WHC), pages: 424-429, Munich, Germany, June 2017 (inproceedings)

hi

DOI Project Page [BibTex]



Model-Based Policy Search for Automatic Tuning of Multivariate PID Controllers

Doerr, A., Nguyen-Tuong, D., Marco, A., Schaal, S., Trimpe, S.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages: 5295-5301, IEEE, Piscataway, NJ, USA, IEEE International Conference on Robotics and Automation (ICRA), May 2017 (inproceedings)

am ics

PDF arXiv DOI Project Page [BibTex]



Learning Feedback Terms for Reactive Planning and Control

Rai, A., Sutanto, G., Schaal, S., Meier, F.

Proceedings 2017 IEEE International Conference on Robotics and Automation (ICRA), IEEE, Piscataway, NJ, USA, IEEE International Conference on Robotics and Automation (ICRA), May 2017 (conference)

am

pdf video [BibTex]



Virtual vs. Real: Trading Off Simulations and Physical Experiments in Reinforcement Learning with Bayesian Optimization

Marco, A., Berkenkamp, F., Hennig, P., Schoellig, A. P., Krause, A., Schaal, S., Trimpe, S.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages: 1557-1563, IEEE, Piscataway, NJ, USA, IEEE International Conference on Robotics and Automation (ICRA), May 2017 (inproceedings)

am ics pn

PDF arXiv ICRA 2017 Spotlight presentation Virtual vs. Real - Video explanation DOI Project Page [BibTex]



Proton 2: Increasing the Sensitivity and Portability of a Visuo-haptic Surface Interaction Recorder

Burka, A., Rajvanshi, A., Allen, S., Kuchenbecker, K. J.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages: 439-445, Singapore, May 2017 (inproceedings)

Abstract
The Portable Robotic Optical/Tactile ObservatioN PACKage (PROTONPACK, or Proton for short) is a new handheld visuo-haptic sensing system that records surface interactions. We previously demonstrated system calibration and a classification task using external motion tracking. This paper details improvements in surface classification performance and removal of the dependence on external motion tracking, necessary before embarking on our goal of gathering a vast surface interaction dataset. Two experiments were performed to refine data collection parameters. After adjusting the placement and filtering of the Proton's high-bandwidth accelerometers, we recorded interactions between two differently-sized steel tooling ball end-effectors (diameter 6.35 and 9.525 mm) and five surfaces. Using features based on normal force, tangential force, end-effector speed, and contact vibration, we trained multi-class SVMs to classify the surfaces using 50 ms chunks of data from each end-effector. Classification accuracies of 84.5% and 91.5% respectively were achieved on unseen test data, an improvement over prior results. In parallel, we pursued on-board motion tracking, using the Proton's camera and fiducial markers. Motion tracks from the external and onboard trackers agree within 2 mm and 0.01 rad RMS, and the accuracy decreases only slightly to 87.7% when using onboard tracking for the 9.525 mm end-effector. These experiments indicate that the Proton 2 is ready for portable data collection.

hi

DOI Project Page [BibTex]



Pattern Generation for Walking on Slippery Terrains

Khadiv, M., Moosavian, S. A. A., Herzog, A., Righetti, L.

In 2017 5th International Conference on Robotics and Mechatronics (ICROM), Iran, August 2017 (inproceedings)

Abstract
In this paper, we extend state-of-the-art Model Predictive Control (MPC) approaches to generate safe bipedal walking on slippery surfaces. In this setting, we formulate walking as a trade-off between realizing a desired walking velocity and preserving robust foot-ground contact. Exploiting this formulation inside MPC, we show that safe walking on various flat terrains can be achieved by compromising three main attributes, i.e., walking velocity tracking, the Zero Moment Point (ZMP) modulation, and the Required Coefficient of Friction (RCoF) regulation. Simulation results show that increasing the walking velocity increases the possibility of slippage, while reducing the slippage possibility conflicts with reducing the tip-over possibility of the contact and vice versa.

mg

link (url) [BibTex]


2013


Probabilistic Object Tracking Using a Range Camera

Wüthrich, M., Pastor, P., Kalakrishnan, M., Bohg, J., Schaal, S.

In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 3195-3202, IEEE, November 2013 (inproceedings)

Abstract
We address the problem of tracking the 6-DoF pose of an object while it is being manipulated by a human or a robot. We use a dynamic Bayesian network to perform inference and compute a posterior distribution over the current object pose. Depending on whether a robot or a human manipulates the object, we employ a process model with or without knowledge of control inputs. Observations are obtained from a range camera. As opposed to previous object tracking methods, we explicitly model self-occlusions and occlusions from the environment, e.g., the human or robotic hand. This leads to a strongly non-linear observation model and additional dependencies in the Bayesian network. We employ a Rao-Blackwellised particle filter to compute an estimate of the object pose at every time step. In a set of experiments, we demonstrate the ability of our method to accurately and robustly track the object pose in real-time while it is being manipulated by a human or a robot.

am

arXiv Video Code Video DOI Project Page [BibTex]



Governance of Humanoid Robot Using Master Exoskeleton

Kumra, S., Mohan, M., Gupta, S., Vaswani, H.

In Proceedings of the IEEE International Symposium on Robotics (ISR), Seoul, South Korea, October 2013 (inproceedings)

Abstract
Dexto:Eka: is an adult-size humanoid robot being developed with the aim of achieving tele-presence. The paper sheds light on the control of this robot using a Master Exoskeleton, which consists of an Exo-Frame, a Control Column, and a Graphical User Interface. It further illuminates the processes and algorithms that have been utilized to make an efficient system that would effectively emulate a tele-operator.

hi

DOI [BibTex]



Design and development part 2 of Dexto:Eka: - The humanoid robot

Kumra, S., Mohan, M., Gupta, S., Vaswani, H.

In Proceedings of the International Conference on Mechatronics and Automation (ICMA), Takamatsu, Japan, August 2013 (inproceedings)

Abstract
Through this paper, we elucidate the second phase of the design and development of the tele-operated humanoid robot Dexto:Eka:. Phase one consisted of the development of a 6 DoF left anthropomorphic arm and left exo-frame. Here, we illustrate the development of the right arm, right exo-frame, torso, backbone, human machine interface, and omni-directional locomotion system. Dexto:Eka: will be able to communicate with a remote user through Wi-Fi. An exo-frame enables it to emulate human arms, and its locomotion is controlled by a joystick. A Graphical User Interface monitors and helps in controlling the system.

hi

DOI [BibTex]



Hypothesis Testing Framework for Active Object Detection

Sankaran, B., Atanasov, N., Le Ny, J., Koletschka, T., Pappas, G., Daniilidis, K.

In IEEE International Conference on Robotics and Automation (ICRA), May 2013, clmc (inproceedings)

Abstract
One of the central problems in computer vision is the detection of semantically important objects and the estimation of their pose. Most of the work in object detection has been based on single image processing and its performance is limited by occlusions and ambiguity in appearance and geometry. This paper proposes an active approach to object detection by controlling the point of view of a mobile depth camera. When an initial static detection phase identifies an object of interest, several hypotheses are made about its class and orientation. The sensor then plans a sequence of view-points, which balances the amount of energy used to move with the chance of identifying the correct hypothesis. We formulate an active M-ary hypothesis testing problem, which includes sensor mobility, and solve it using a point-based approximate POMDP algorithm. The validity of our approach is verified through simulation and experiments with real scenes captured by a Kinect sensor. The results suggest a significant improvement over static object detection.

am

pdf [BibTex]



Action and Goal Related Decision Variables Modulate the Competition Between Multiple Potential Targets

Enachescu, V., Christopoulos, V. N., Schrater, P. R., Schaal, S.

In Abstracts of Neural Control of Movement Conference (NCM 2013), February 2013 (inproceedings)

am

[BibTex]



Fusing visual and tactile sensing for 3-D object reconstruction while grasping

Ilonen, J., Bohg, J., Kyrki, V.

In IEEE International Conference on Robotics and Automation (ICRA), pages: 3547-3554, 2013 (inproceedings)

Abstract
In this work, we propose to reconstruct a complete 3-D model of an unknown object by fusion of visual and tactile information while the object is grasped. Assuming the object is symmetric, a first hypothesis of its complete 3-D shape is generated from a single view. This initial model is used to plan a grasp on the object which is then executed with a robotic manipulator equipped with tactile sensors. Given the detected contacts between the fingers and the object, the full object model including the symmetry parameters can be refined. This refined model will then allow the planning of more complex manipulation tasks. The main contribution of this work is an optimal estimation approach for the fusion of visual and tactile data applying the constraint of object symmetry. The fusion is formulated as a state estimation problem and solved with an iterative extended Kalman filter. The approach is validated experimentally using both artificial and real data from two different robotic platforms.

am

DOI Project Page [BibTex]



AGILITY – Dynamic Full Body Locomotion and Manipulation with Autonomous Legged Robots

Hutter, M., Bloesch, M., Buchli, J., Semini, C., Bazeille, S., Righetti, L., Bohg, J.

In 2013 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), pages: 1-4, IEEE, Linköping, Sweden, 2013 (inproceedings)

mg

link (url) DOI [BibTex]



Learning Objective Functions for Manipulation

Kalakrishnan, M., Pastor, P., Righetti, L., Schaal, S.

In 2013 IEEE International Conference on Robotics and Automation, IEEE, Karlsruhe, Germany, 2013 (inproceedings)

Abstract
We present an approach to learning objective functions for robotic manipulation based on inverse reinforcement learning. Our path integral inverse reinforcement learning algorithm can deal with high-dimensional continuous state-action spaces, and only requires local optimality of demonstrated trajectories. We use L1 regularization in order to achieve feature selection, and propose an efficient algorithm to minimize the resulting convex objective function. We demonstrate our approach by applying it to two core problems in robotic manipulation. First, we learn a cost function for redundancy resolution in inverse kinematics. Second, we use our method to learn a cost function over trajectories, which is then used in optimization-based motion planning for grasping and manipulation tasks. Experimental results show that our method outperforms previous algorithms in high-dimensional settings.

am mg

link (url) DOI [BibTex]



Learning Task Error Models for Manipulation

Pastor, P., Kalakrishnan, M., Binney, J., Kelly, J., Righetti, L., Sukhatme, G. S., Schaal, S.

In 2013 IEEE Conference on Robotics and Automation, IEEE, Karlsruhe, Germany, 2013 (inproceedings)

Abstract
Precise kinematic forward models are important for robots to successfully perform dexterous grasping and manipulation tasks, especially when visual servoing is rendered infeasible due to occlusions. A lot of research has been conducted to estimate geometric and non-geometric parameters of kinematic chains to minimize reconstruction errors. However, kinematic chains can include non-linearities, e.g., due to cable stretch and motor-side encoders, that result in significantly different errors for different parts of the state space. Previous work either does not consider such non-linearities or proposes to estimate non-geometric parameters of carefully engineered models that are robot specific. We propose a data-driven approach that learns task error models that account for such unmodeled non-linearities. We argue that in the context of grasping and manipulation, it is sufficient to achieve high accuracy in the task-relevant state space. We identify this relevant state space using previously executed joint configurations and learn error corrections for those. Therefore, our system is developed to generate subsequent executions that are similar to previous ones. The experiments show that our method successfully captures the non-linearities in the head kinematic chain (due to a counterbalancing spring) and the arm kinematic chains (due to cable stretch) of the considered experimental platform, see Fig. 1. The feasibility of the presented error learning approach has also been evaluated in independent DARPA ARM-S testing, contributing to the successful completion of 67 out of 72 grasping and manipulation tasks.

am mg

link (url) DOI [BibTex]


2009


Modelling the interplay of central pattern generation and sensory feedback in the neuromuscular control of running

Daley, M., Righetti, L., Ijspeert, A.

In Comparative Biochemistry and Physiology - Part A: Molecular & Integrative Physiology. Annual Main Meeting for the Society for Experimental Biology, 153, Glasgow, Scotland, 2009 (inproceedings)

mg

link (url) DOI [BibTex]



Path integral-based stochastic optimal control for rigid body dynamics

Theodorou, E. A., Buchli, J., Schaal, S.

In Adaptive Dynamic Programming and Reinforcement Learning, 2009. ADPRL ’09. IEEE Symposium on, pages: 219-225, 2009, clmc (inproceedings)

Abstract
Recent advances on path integral stochastic optimal control [1],[2] provide new insights in the optimal control of nonlinear stochastic systems which are linear in the controls, with state independent and time invariant control transition matrix. Under these assumptions, the Hamilton-Jacobi-Bellman (HJB) equation is formulated and linearized with the use of the logarithmic transformation of the optimal value function. The resulting HJB is a linear second order partial differential equation which is solved by an approximation based on the Feynman-Kac formula [3]. In this work we review the theory of path integral control and derive the linearized HJB equation for systems with state dependent control transition matrix. In addition we derive the path integral formulation for the general class of systems with state dimensionality that is higher than the dimensionality of the controls. Furthermore, by means of a modified inverse dynamics controller, we apply path integral stochastic optimal control over the new control space. Simulations illustrate the theoretical results. Future developments and extensions are discussed.

am

link (url) [BibTex]

Learning locomotion over rough terrain using terrain templates

Kalakrishnan, M., Buchli, J., Pastor, P., Schaal, S.

In Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, pages: 167-172, 2009, clmc (inproceedings)

Abstract
We address the problem of foothold selection in robotic legged locomotion over very rough terrain. The difficulty of the problem we address here is comparable to that of human rock-climbing, where foot/hand-hold selection is one of the most critical aspects. Previous work in this domain typically involves defining a reward function over footholds as a weighted linear combination of terrain features. However, a significant amount of effort needs to be spent in designing these features in order to model more complex decision functions, and hand-tuning their weights is not a trivial task. We propose the use of terrain templates, which are discretized height maps of the terrain under a foothold on different length scales, as an alternative to manually designed features. We describe an algorithm that can simultaneously learn a small set of templates and a foothold ranking function using these templates, from expert-demonstrated footholds. Using the LittleDog quadruped robot, we experimentally show that the use of terrain templates can produce complex ranking functions with higher performance than standard terrain features, and improved generalization to unseen terrain.

am

link (url) Project Page [BibTex]

CESAR: A lunar crater exploration and sample return robot

Schwendner, J., Grimminger, F., Bartsch, S., Kaupisch, T., Yüksel, M., Bresser, A., Akpo, J. B., Seydel, M. K. -., Dieterle, A., Schmidt, S., Kirchner, F.

In 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 3355-3360, October 2009 (inproceedings)

am

DOI [BibTex]

Concept Evaluation of a New Biologically Inspired Robot “Littleape”

Kühn, D., Römmermann, M., Sauthoff, N., Grimminger, F., Kirchner, F.

In Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 589–594, IROS’09, IEEE Press, 2009 (inproceedings)

am

DOI [BibTex]

Compact models of motor primitive variations for predictible reaching and obstacle avoidance

Stulp, F., Oztop, E., Pastor, P., Beetz, M., Schaal, S.

In IEEE-RAS International Conference on Humanoid Robots (Humanoids 2009), Paris, Dec.7-10, 2009, clmc (inproceedings)

Abstract
over and over again. This regularity allows humans and robots to reuse existing solutions for known recurring tasks. We expect that reusing a set of standard solutions to solve similar tasks will facilitate the design and on-line adaptation of the control systems of robots operating in human environments. In this paper, we derive a set of standard solutions for reaching behavior from human motion data. We also derive stereotypical reaching trajectories for variations of the task, in which obstacles are present. These stereotypical trajectories are then compactly represented with Dynamic Movement Primitives. On the humanoid robot Sarcos CB, this approach leads to reproducible, predictable, and human-like reaching motions.

am

link (url) [BibTex]

Human optimization strategies under reward feedback

Hoffmann, H., Theodorou, E., Schaal, S.

In Abstracts of Neural Control of Movement Conference (NCM 2009), Waikoloa, Hawaii, 2009, 2009, clmc (inproceedings)

Abstract
Many hypotheses on human movement generation have been cast into an optimization framework, implying that movements are adapted to optimize a single quantity, e.g., jerk, end-point variance, or control cost. However, we still do not understand how humans actually learn when given only a cost or reward feedback at the end of a movement. Such a reinforcement learning setting has been extensively explored theoretically in engineering and computer science, but in human movement control, hardly any experiment has studied movement learning under reward feedback. We present experiments probing which computational strategies humans use to optimize a movement under a continuous reward function. We present two experimental paradigms. The first paradigm mimics a ball-hitting task. Subjects (n=12) sat in front of a computer screen and moved a stylus on a tablet towards an unknown target. This target was located on a line that the subjects had to cross. During the movement, visual feedback was suppressed. After the movement, a reward was displayed graphically as a colored bar. As reward, we used a Gaussian function of the distance between the target location and the point of line crossing. We chose such a function since in sensorimotor tasks, the cost or loss function that humans seem to represent is close to an inverted Gaussian function (Koerding and Wolpert 2004). The second paradigm mimics pocket billiards. On the same experimental setup as above, the computer screen displayed a pocket (two bars), a white disk, and a green disk. The goal was to hit the green disk with the white disk (as in a billiard collision), such that the green disk moved into the pocket. Subjects (n=8) used the stylus to manipulate the white disk, effectively choosing the start point and movement direction. Reward feedback was implicitly given as hitting or missing the pocket with the green disk. In both paradigms, subjects increased the average reward over trials.
The surprising result was that in these experiments, humans seem to prefer a strategy that uses a reward-weighted average over previous movements instead of gradient ascent. The literature on reinforcement learning is dominated by gradient-ascent methods. However, our computer simulations and theoretical analysis revealed that reward-weighted averaging is the more robust choice given the amount of movement variance observed in humans. Apparently, humans choose an optimization strategy that is suitable for their own movement variance.
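The reward-weighted averaging strategy described in this abstract can be illustrated with a minimal sketch. The Gaussian reward follows the abstract's description of the first paradigm; everything else (the function names, the reward width, the motor-noise level, the memory length, and the trial count) is an assumption chosen for illustration and is not taken from the experiments.

```python
import math
import random


def gaussian_reward(crossing, target, width=1.0):
    """Reward as a Gaussian function of the distance to the target."""
    return math.exp(-((crossing - target) ** 2) / (2.0 * width ** 2))


def reward_weighted_average(history):
    """Next aim point: mean of past movements, weighted by their rewards.

    history is a list of (crossing_point, reward) pairs.
    """
    total = sum(reward for _, reward in history)
    if total == 0.0:
        # All rewards underflowed to zero: fall back to the plain mean.
        return sum(x for x, _ in history) / len(history)
    return sum(x * reward for x, reward in history) / total


def simulate(target=2.0, trials=200, motor_noise=0.5, memory=20, seed=0):
    """Hypothetical learner that only sees a scalar reward after each movement."""
    rng = random.Random(seed)
    aim = 0.0
    history = []
    for _ in range(trials):
        # The executed movement deviates from the intended aim point.
        crossing = aim + rng.gauss(0.0, motor_noise)
        history.append((crossing, gaussian_reward(crossing, target)))
        history = history[-memory:]  # keep only the most recent movements
        aim = reward_weighted_average(history)
    return aim


if __name__ == "__main__":
    print("final aim point:", simulate())
```

Note that the update needs no gradient estimate: movement variability itself supplies the exploration, which is consistent with the abstract's remark that the strategy suits the learner's own movement variance.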

am

[BibTex]

Concept evaluation of a new biologically inspired robot “LittleApe”

Kühn, D., Römmermann, M., Sauthoff, N., Grimminger, F., Kirchner, F.

In 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 589-594, October 2009 (inproceedings)

am

DOI [BibTex]

Proprioceptive control of a hybrid legged-wheeled robot

Eich, M., Grimminger, F., Kirchner, F.

In 2008 IEEE International Conference on Robotics and Biomimetics, pages: 774-779, February 2009 (inproceedings)

am

DOI [BibTex]

Learning and generalization of motor skills by learning from demonstration

Pastor, P., Hoffmann, H., Asfour, T., Schaal, S.

In International Conference on Robotics and Automation (ICRA2009), Kobe, Japan, May 12-19, 2009, 2009, clmc (inproceedings)

Abstract
We provide a general approach for learning robotic motor skills from human demonstration. To represent an observed movement, a non-linear differential equation is learned such that it reproduces this movement. Based on this representation, we build a library of movements by labeling each recorded movement according to task and context (e.g., grasping, placing, and releasing). Our differential equation is formulated such that generalization can be achieved simply by adapting a start and a goal parameter in the equation to the desired position values of a movement. For object manipulation, we present how our framework extends to the control of gripper orientation and finger position. The feasibility of our approach is demonstrated in simulation as well as on a real robot. The robot learned a pick-and-place operation and a water-serving task and could generalize these tasks to novel situations.
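The idea of generalizing a movement by changing only a start and a goal parameter can be sketched with a generic spring-damper system in the spirit of dynamic movement primitives. This is not necessarily the equation used in the paper: the gains `K` and `D`, the phase decay `alpha`, the explicit Euler integration, and the `forcing` callable (standing in for the learned non-linear term) are all assumptions made for illustration.

```python
import math


def rollout(start, goal, forcing=lambda s: 0.0, tau=1.0, K=100.0, D=20.0,
            alpha=4.0, dt=0.001, duration=3.0):
    """Integrate a one-dimensional movement from start towards goal.

    forcing(s) is a hypothetical stand-in for a learned shape term. It is
    multiplied by the phase s, which decays from 1 towards 0, so the
    spring-damper part dominates at the end and x converges to the goal.
    """
    x, v, s = start, 0.0, 1.0
    path = [x]
    for _ in range(int(duration / dt)):
        v_dot = (K * (goal - x) - D * v + (goal - start) * s * forcing(s)) / tau
        x_dot = v / tau
        s_dot = -alpha * s / tau
        # Explicit Euler step for velocity, position, and phase.
        v += v_dot * dt
        x += x_dot * dt
        s += s_dot * dt
        path.append(x)
    return path


if __name__ == "__main__":
    def shape(s):
        # Arbitrary illustrative forcing term, not a learned one.
        return 50.0 * math.sin(6.0 * s)

    # The same shape term is reused with two different goals.
    print(rollout(0.0, 1.0, forcing=shape)[-1])
    print(rollout(0.0, 2.5, forcing=shape)[-1])
```

In this sketch, adapting a movement to a new situation amounts to calling `rollout` with different `start` and `goal` values while the shape term is left unchanged.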

am

link (url) [BibTex]

Compliant quadruped locomotion over rough terrain

Buchli, J., Kalakrishnan, M., Mistry, M., Pastor, P., Schaal, S.

In Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, pages: 814-820, 2009, clmc (inproceedings)

Abstract
Many critical elements for statically stable walking for legged robots have been known for a long time, including stability criteria based on support polygons, good foothold selection, and recovery strategies, to name a few. All these criteria have to be accounted for in the planning as well as the control phase. Most legged robots employ high-gain position control, which means that it is crucially important that the planned reference trajectories are a good match for the actual terrain, and that tracking is accurate. Such an approach leads to conservative controllers, i.e., relatively low speed, ground speed matching, etc. Not surprisingly, such controllers are not very robust: they are not suited for real-world use outside of the laboratory, where knowledge of the world is limited and error prone. Thus, to achieve robust robotic locomotion in the archetypical domain of legged systems, namely complex rough terrain where the size of the obstacles is on the order of leg length, additional elements are required. A possible solution to improve the robustness of legged locomotion is to maximize the compliance of the controller. While compliance is trivially achieved by reduced feedback gains, for terrain requiring precise foot placement (e.g., climbing rocks, walking over pegs or cracks) compliance cannot be introduced at the cost of inferior tracking. Thus, model-based control and, in contrast to passive dynamic walkers, active balance control are required. To achieve these objectives, in this paper we add two crucial elements to legged locomotion, i.e., floating-base inverse dynamics control and predictive force control, and we show that these elements increase robustness in the face of unknown and unanticipated perturbations (e.g., obstacles).
Furthermore, we introduce a novel line-based COG trajectory planner, which yields a simpler algorithm than traditional polygon-based methods and creates the appropriate input to our control system. We show results from both simulation and the real world of a robotic dog walking over non-perceived obstacles and rocky terrain. The results demonstrate the effectiveness of the inverse dynamics/force controller. The presented results show that we have all elements needed for robust all-terrain locomotion, which should also generalize to other legged systems, e.g., humanoid robots.

am

link (url) [BibTex]

Inertial parameter estimation of floating-base humanoid systems using partial force sensing

Mistry, M., Schaal, S., Yamane, K.

In IEEE-RAS International Conference on Humanoid Robots (Humanoids 2009), Paris, Dec.7-10, 2009, clmc (inproceedings)

Abstract
Recently, several controllers have been proposed for humanoid robots which rely on full-body dynamic models. The estimation of inertial parameters from data is a critical component for obtaining accurate models for control. However, floating-base systems, such as humanoid robots, add challenges to this task (e.g., contact forces must be measured, contact states can change, etc.). In this work, we outline a theoretical framework for whole-body inertial parameter estimation, including the unactuated floating base. Using a least squares minimization approach, conducted within the nullspace of unmeasured degrees of freedom, we are able to use a partial force sensor set for full-body estimation, e.g., using only joint torque sensors, allowing for estimation when contact force measurement is unavailable or unreliable (e.g., due to slipping, rolling contacts, etc.). We also propose how to determine the theoretical minimum force sensor set for full-body estimation, and discuss the practical limitations of doing so.

am

link (url) [BibTex]

2008


Pattern generators with sensory feedback for the control of quadruped locomotion

Righetti, L., Ijspeert, A.

In 2008 IEEE International Conference on Robotics and Automation, pages: 819-824, IEEE, Pasadena, USA, 2008 (inproceedings)

Abstract
Central pattern generators (CPGs) are becoming a popular model for the control of locomotion of legged robots. Biological CPGs are neural networks responsible for the generation of rhythmic movements, especially locomotion. In robotics, a systematic way of designing such CPGs as artificial neural networks or systems of coupled oscillators that include sensory feedback is still missing. In this contribution, we present a way of designing CPGs with coupled oscillators in which we can independently control the ascending and descending phases of the oscillations (i.e., the swing and stance phases of the limbs). Using insights from dynamical systems theory, we construct generic networks of oscillators able to generate several gaits under simple parameter changes. Then we introduce a systematic way of adding sensory feedback from touch sensors in the CPG such that the controller is strongly coupled with the mechanical system it controls. Finally, we control three different simulated robots (iCub, Aibo and Ghostdog) using the same controller to show the effectiveness of the approach. Our simulations show the importance of independent control of swing and stance duration. The strong mutual coupling between the CPG and the robot allows for more robust locomotion, even with imprecise parameters and in non-flat environments.

mg

link (url) DOI [BibTex]

Experimental Study of Limit Cycle and Chaotic Controllers for the Locomotion of Centipede Robots

Matthey, L., Righetti, L., Ijspeert, A.

In 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 1860-1865, IEEE, Nice, France, sep 2008 (inproceedings)

Abstract
In this contribution we present a CPG (central pattern generator) controller based on coupled Rössler systems. It is able to generate both limit cycle and chaotic behaviors through bifurcation. We develop an experimental test bench to quantitatively measure the performance of different controllers on unknown terrains of increasing difficulty. First, we show that for flat terrains, open-loop limit cycle systems are the most efficient (in terms of speed of locomotion) but that they are quite sensitive to environmental changes. Second, we show that sensory feedback is a crucial addition for unknown terrains. Third, we show that the chaotic controller with sensory feedback outperforms the other controllers in very difficult terrains and actually promotes the emergence of short synchronized movement patterns. All of this is done using a unified framework for the generation of limit cycle and chaotic behaviors, where a simple parameter change can switch from one behavior to the other through bifurcation. Such flexibility would allow the automatic adaptation of the robot locomotion strategy to the terrain uncertainty.
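The switch between limit cycle and chaotic behavior through a single parameter can be illustrated on one uncoupled Rössler system. This is only a sketch: the paper uses coupled systems with sensory feedback, whereas the code below integrates the standard Rössler equations in isolation. The parameter values (a = b = 0.2 with c = 2.5 or c = 5.7), the initial state, the step size, and the use of trajectory divergence as an indicator of chaos are all assumptions made for illustration.

```python
import math


def rossler(state, a, b, c):
    """Right-hand side of the standard Rössler equations."""
    x, y, z = state
    return (-y - z, x + a * y, b + z * (x - c))


def rk4_step(state, dt, a, b, c):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(p + h * q for p, q in zip(base, slope))

    k1 = rossler(state, a, b, c)
    k2 = rossler(shifted(state, k1, dt / 2.0), a, b, c)
    k3 = rossler(shifted(state, k2, dt / 2.0), a, b, c)
    k4 = rossler(shifted(state, k3, dt), a, b, c)
    return tuple(
        p + dt / 6.0 * (q1 + 2.0 * q2 + 2.0 * q3 + q4)
        for p, q1, q2, q3, q4 in zip(state, k1, k2, k3, k4)
    )


def final_separation(c, a=0.2, b=0.2, dt=0.02, steps=20000, eps=1e-8):
    """Distance between two trajectories that start eps apart.

    A separation that stays tiny suggests regular (limit cycle) motion;
    a separation that grows by many orders of magnitude suggests chaos.
    """
    p = (1.0, 1.0, 1.0)
    q = (1.0 + eps, 1.0, 1.0)
    for _ in range(steps):
        p = rk4_step(p, dt, a, b, c)
        q = rk4_step(q, dt, a, b, c)
    return math.sqrt(sum((u - w) ** 2 for u, w in zip(p, q)))


if __name__ == "__main__":
    print("c = 2.5:", final_separation(2.5))
    print("c = 5.7:", final_separation(5.7))
```

Changing only `c` moves the system between the two regimes, which mirrors the single-parameter switch described in the abstract.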

mg

link (url) DOI [BibTex]

Human movement generation based on convergent flow fields: A computational model and a behavioral experiment

Hoffmann, H., Schaal, S.

In Advances in Computational Motor Control VII, Symposium at the Society for Neuroscience Meeting, Washington DC, 2008, 2008, clmc (inproceedings)

am

link (url) [BibTex]

Movement reproduction and obstacle avoidance with dynamic movement primitives and potential fields

Park, D., Hoffmann, H., Pastor, P., Schaal, S.

In IEEE International Conference on Humanoid Robots, 2008., 2008, clmc (inproceedings)

am

PDF [BibTex]

Wetting and premelting of triple junctions and grain boundaries in the Al-Zn alloys

Straumal, B., Kogtenkova, O., Protasova, S., Mazilkin, A., Zieba, P., Czeppe, T., Wojewoda-Budka, J., Faryna, M.

In 495, pages: 126-131, Alicante, Spain, 2008 (inproceedings)

mms

DOI [BibTex]

The dual role of uncertainty in force field learning

Mistry, M., Theodorou, E., Hoffmann, H., Schaal, S.

In Abstracts of the Eighteenth Annual Meeting of Neural Control of Movement (NCM), Naples, Florida, April 29-May 4, 2008, clmc (inproceedings)

Abstract
Force field experiments have been a successful paradigm for studying the principles of planning, execution, and learning in human arm movements. Subjects have been shown to cope with the disturbances generated by force fields by learning internal models of the underlying dynamics to predict disturbance effects or by increasing arm impedance (via co-contraction) if a predictive approach becomes infeasible. Several studies have addressed the issue of uncertainty in force field learning. Scheidt et al. demonstrated that subjects exposed to a viscous force field of fixed structure but varying strength (randomly changing from trial to trial) learn to adapt to the mean disturbance, regardless of the statistical distribution. Takahashi et al. additionally show a decrease in strength of after-effects after learning in the randomly varying environment. Thus they suggest that the nervous system adopts a dual strategy: learning an internal model of the mean of the random environment, while simultaneously increasing arm impedance to minimize the consequence of errors. In this study, we examine what role variance plays in the learning of uncertain force fields. We use a 7 degree-of-freedom exoskeleton robot as a manipulandum (Sarcos Master Arm, Sarcos, Inc.), and apply a 3D viscous force field of fixed structure and strength randomly selected from trial to trial. Additionally, in separate blocks of trials, we alter the variance of the randomly selected strength multiplier (while keeping a constant mean). In each block, after sufficient learning has occurred, we apply catch trials with no force field and measure the strength of after-effects. As expected in higher variance cases, results show increasingly smaller levels of after-effects as the variance is increased, thus implying subjects choose the robust strategy of increasing arm impedance to cope with higher levels of uncertainty.
Interestingly, however, subjects show an increase in after-effect strength with a small amount of variance as compared to the deterministic (zero variance) case. This result implies that a small amount of variability aids in internal model formation, presumably a consequence of the additional amount of exploration conducted in the workspace of the task.

am

[BibTex]

Dynamic movement primitives for movement generation motivated by convergent force fields in frog

Hoffmann, H., Pastor, P., Schaal, S.

In Adaptive Motion of Animals and Machines (AMAM), 2008, clmc (inproceedings)

am

PDF [BibTex]
