Estimating human pose, shape, and motion from images and video are fundamental challenges with many applications. Recent advances in 2D human pose estimation use large amounts of manually-labeled training data for learning convolutional neural networks (CNNs). Such data is time consuming to acquire and difficult to extend. Moreover, manual labeling of 3D pose, depth and motion is impractical. In this work we present SURREAL: a new large-scale dataset with synthetically-generated but realistic images of people rendered from 3D sequences of human motion capture data. We generate more than 6 million frames together with ground truth pose, depth maps, and segmentation masks. We show that CNNs trained on our synthetic dataset allow for accurate human depth estimation and human part segmentation in real RGB images. Our results and the new dataset open up new possibilities for advancing person analysis using cheap and large-scale synthetic data.
Organizers: Dimitris Tzionas
Human-centric robotic applications often require the robots to learn new skills by interacting with the end-users. From a machine learning perspective, the challenge is to acquire skills from only few interactions, with strong generalization demands. It requires: 1) the development of intuitive active learning interfaces to acquire meaningful demonstrations; 2) the development of models that can exploit the structure and geometry of the acquired data in an efficient way; 3) the development of adaptive control techniques that can exploit the learned task variations and coordination patterns. The developed models often need to serve several purposes (recognition, prediction, online synthesis), and be compatible with different learning strategies (imitation, emulation, exploration). For the reproduction of skills, these models need to be enriched with force and impedance information to enable human-robot collaboration and to generate safe and natural movements. I will present an approach combining model predictive control and statistical learning of movement primitives in multiple coordinate systems. The proposed approach will be illustrated in various applications, with robots either close to us (robot for dressing assistance), part of us (prosthetic hand with EMG and tactile sensing), or far from us (teleoperation of bimanual robot in deep water).
Organizers: Ludovic Righetti
This talk will survey recent work to achieve multi-contact locomotion control of humanoid and legged robots. I will start by presenting some results on robust optimization-based control. We exploited robust optimization techniques, either stochastic or worst-case, to improve the robustness of Task-Space Inverse Dynamics (TSID), a well-known control framework for legged robots. We modeled uncertainties in the joint torques, and we immunized the constraints of the system to any of the realizations of these uncertainties. We also applied the same methodology to ensure the balance of the robot despite bounded errors in the its inertial parameters. Extensive simulations in a realistic environment show that the proposed robust controllers greatly outperform the classic one. Then I will present preliminary results on a new capturability criterion for legged robots in multi-contact. "N-step capturability" is the ability of a system to come to a stop by taking N or fewer steps. Simplified models to compute N-step capturability already exist and are widely used, but they are limited to locomotion on flat terrains. We propose a new efficient algorithm to compute 0-step capturability for a robot in arbitrary contact scenarios. Finally, I will present our recent efforts to transfer the above-mentioned techniques to the real humanoid robot HRP-2, on which we recently implemented joint torque control.
Organizers: Ludovic Righetti
The retina in the eye performs complex computations, to transmit only behaviourally relevant information about our visual environment to the brain. These computations are implemented by numerous different cell types that form complex circuits. New experimental and computational methods make it possible to study the cellular diversity of the retina in detail – the goal of obtaining a complete list of all the cell types in the retina and, thus, its “building blocks”, is within reach. I will review our recent contributions in this area, showing how analyzing multimodal datasets from electron microscopy and functional imaging can yield insights into the cellular organization of retinal circuits.
Organizers: Philipp Hennig
From gait, dance to martial art, human movements provide rich, complex yet coherent spatiotemporal patterns reflecting characteristics of a group or an individual. We develop computer algorithms to automatically learn such quality discriminative features from multimodal data. In this talk, I present a trilogy on learning from human movements: (1) Gait analysis from video data: based on frieze patterns (7 frieze groups), a video sequence of silhouettes is mapped into a pair of spatiotemporal patterns that are near-periodic along the time axis. A group theoretical analysis of periodic patterns allows us to determine the dynamic time warping and affine scaling that aligns two gait sequences from similar viewpoints for human identification. (2) Dance analysis and synthesis (mocap, music, ratings from Mechanical Turks): we explore the complex relationship between perceived dance quality/dancer's gender and dance movements respectively. As a feasibility study, we construct a computational framework for an analysis-synthesis-feedback loop using a novel multimedia dance-texture representation for joint angular displacement, velocity and acceleration. Furthermore, we integrate crowd sourcing, music and motion-capture data, and machine learning-based methods for dance segmentation, analysis and synthesis of new dancers. A quantitative validation of this framework on a motion-capture dataset of 172 dancers evaluated by more than 400 independent on-line raters demonstrates significant correlation between human perception and the algorithmically intended dance quality or gender of the synthesized dancers. (3) Tai Chi performance evaluation (mocap + video): I shall also discuss the feasibility of utilizing spatiotemporal synchronization and, ultimately, machine learning to evaluate Tai Chi routines performed by different subjects in our current project of “Tai Chi + Advanced Technology for Smart Health”.
There has been significant prior work on learning realistic, articulated, 3D statistical shape models of the human body. In contrast, there are few such models for animals, despite their many applications in biology, neuroscience, agriculture, and entertainment. The main challenge is that animals are much less cooperative subjects than humans: the best human body models are learned from thousands of 3D scans of people in specific poses, which is infeasible with live animals. In the talk I will illustrate how we extend a state-of-the-art articulated 3D human body model (SMPL) to animals learning from toys a multi-family shape space that can represent lions, cats, dogs, horses, cows and hippos. The generalization of the model is illustrated by fitting it to images of real animals, where it captures realistic animal shapes, even for new species not seen in training.
In this talk we will give an overview of research efforts within autonomous manipulation at the AASS Research Center, Örebro University, Sweden. We intend to give a holistic view on the historically separated subjects of robot motion planning and control. In particular, viewing motion behavior generation as an optimal control problem allows for a unified formulation that is uncluttered by a-priori domain assumptions and simplified solution strategies. Furthermore, We will also discuss the problems of workspace modeling and perception and how to integrate them in the overarching problem of autonomous manipulation.
Organizers: Ludovic Righetti
As large tensor-variate data increasingly become the norm in applied machine learning and statistics, complex analysis methods similarly increase in prevalence. Such a trend offers the opportunity to understand more intricate features of the data that, ostensibly, could not be studied with simpler datasets or simpler methodologies. While promising, these advances are also perilous: these novel analysis techniques do not always consider the possibility that their results are in fact an expected consequence of some simpler, already-known feature of simpler data (for example, treating the tensor like a matrix or a univariate quantity) or simpler statistic (for example, the mean and covariance of one of the tensor modes). I will present two works that address this growing problem, the first of which uses Kronecker algebra to derive a tensor-variate maximum entropy distribution that shares modal moments with the real data. This distribution of surrogate data forms the basis of a statistical hypothesis test, and I use this method to answer a question of epiphenomenal tensor structure in populations of neural recordings in the motor and prefrontal cortex. In the second part, I will discuss how to extend this maximum entropy formulation to arbitrary constraints using deep neural network architectures in the flavor of implicit generative modeling, and I will use this method in a texture synthesis application.
Organizers: Philipp Hennig
In classical reinforcement learning agents accept arbitrary short term loss for long term gain when exploring their environment. This is infeasible for safety critical applications such as robotics, where even a single unsafe action may cause system failure or harm the environment. In this work, we address the problem of safely exploring finite Markov decision processes (MDP). We define safety in terms of an a priori unknown safety constraint that depends on states and actions and satisfies certain regularity conditions expressed via a Gaussian process prior. We develop a novel algorithm, SAFEMDP, for this task and prove that it completely explores the safely reachable part of the MDP without violating the safety constraint. Moreover, the algorithm explicitly considers reachability when exploring the MDP, ensuring that it does not get stuck in any state with no safe way out. We demonstrate our method on digital terrain models for the task of exploring an unknown map with a rover.
Organizers: Sebastian Trimpe
Organizers: Moritz Grosse-Wentrup
This is the story of the novel model predictive control (MPC) solution for ABB’s largest drive, the Megadrive LCI. LCI stands for load commutated inverter, a type of current source converter which powers large machineries in many industries such as marine, mining or oil & gas. Starting from a small software project at ABB Corporate Research, this novel control solution turned out to become the first time ever MPC was employed in a 48 MW commercial drive. Subsequently it was commissioned at Kollsnes, a key facility of the natural gas delivery chain, in order to increase the plant’s availability. In this presentation I will talk about the magic behind this success story, the so-called Embedded MPC algorithms, and my objective will be to demonstrate the possibilities when power meets computation.
Organizers: Sebastian Trimpe