This talk will survey recent work to achieve multi-contact locomotion control of humanoid and legged robots. I will start by presenting some results on robust optimization-based control. We exploited robust optimization techniques, either stochastic or worst-case, to improve the robustness of Task-Space Inverse Dynamics (TSID), a well-known control framework for legged robots. We modeled uncertainties in the joint torques, and we immunized the constraints of the system to any of the realizations of these uncertainties. We also applied the same methodology to ensure the balance of the robot despite bounded errors in its inertial parameters. Extensive simulations in a realistic environment show that the proposed robust controllers greatly outperform the classic one. Then I will present preliminary results on a new capturability criterion for legged robots in multi-contact. "N-step capturability" is the ability of a system to come to a stop by taking N or fewer steps. Simplified models to compute N-step capturability already exist and are widely used, but they are limited to locomotion on flat terrain. We propose a new efficient algorithm to compute 0-step capturability for a robot in arbitrary contact scenarios. Finally, I will present our recent efforts to transfer the above-mentioned techniques to the real humanoid robot HRP-2, on which we recently implemented joint torque control.
Organizers: Ludovic Righetti
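To make the capturability definition in the locomotion abstract above concrete, here is a minimal sketch of the kind of simplified flat-terrain model the abstract refers to. It assumes a linear inverted pendulum with constant center-of-mass height, for which the robot can stop without stepping when its instantaneous capture point lies inside the support polygon. This is not the multi-contact algorithm proposed in the talk; all function names, parameters, and the foot dimensions are illustrative assumptions.

```python
import math


def capture_point(com_xy, com_vel_xy, com_height, gravity=9.81):
    """Instantaneous capture point of a linear inverted pendulum (flat ground)."""
    omega = math.sqrt(gravity / com_height)  # natural frequency of the pendulum
    return (com_xy[0] + com_vel_xy[0] / omega,
            com_xy[1] + com_vel_xy[1] / omega)


def inside_convex_polygon(point, vertices):
    """True if point lies in a convex polygon given in counter-clockwise order."""
    px, py = point
    n = len(vertices)
    for i in range(n):
        ax, ay = vertices[i]
        bx, by = vertices[(i + 1) % n]
        # A negative cross product means the point is to the right of edge a->b,
        # i.e. outside a counter-clockwise polygon.
        if (bx - ax) * (py - ay) - (by - ay) * (px - ax) < 0:
            return False
    return True


def zero_step_capturable(com_xy, com_vel_xy, com_height, support_polygon):
    """0-step capturability test under the flat-terrain pendulum assumption."""
    cp = capture_point(com_xy, com_vel_xy, com_height)
    return inside_convex_polygon(cp, support_polygon)


# Hypothetical 20 cm x 10 cm foot centered at the origin (counter-clockwise).
foot = [(-0.1, -0.05), (0.1, -0.05), (0.1, 0.05), (-0.1, 0.05)]
slow = zero_step_capturable((0.0, 0.0), (0.2, 0.0), 0.8, foot)
fast = zero_step_capturable((0.0, 0.0), (0.5, 0.0), 0.8, foot)
print(slow, fast)
```

With these assumed numbers, the slower center-of-mass velocity keeps the capture point inside the foot, while the faster one pushes it past the toe, so a step would be needed. The constant-height, flat-ground assumption is exactly what prevents this test from applying to arbitrary contact scenarios.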
Estimating human pose, shape, and motion from images and video is a fundamental challenge with many applications. Recent advances in 2D human pose estimation use large amounts of manually-labeled training data for learning convolutional neural networks (CNNs). Such data is time-consuming to acquire and difficult to extend. Moreover, manual labeling of 3D pose, depth and motion is impractical. In this work we present SURREAL: a new large-scale dataset with synthetically-generated but realistic images of people rendered from 3D sequences of human motion capture data. We generate more than 6 million frames together with ground truth pose, depth maps, and segmentation masks. We show that CNNs trained on our synthetic dataset allow for accurate human depth estimation and human part segmentation in real RGB images. Our results and the new dataset open up new possibilities for advancing person analysis using cheap and large-scale synthetic data.
Organizers: Dimitris Tzionas
In order to avoid an expensive manual labeling process or to learn object classes autonomously without human intervention, object discovery techniques have been proposed that extract visually similar objects from weakly labelled videos. However, the problem of discovering small or medium-sized objects is largely unexplored. We observe that videos with activities involving human-object interactions can serve as weakly labelled data for such cases. Since neither object appearance nor motion is distinct enough to discover objects in these videos, we propose a framework that samples from a space of algorithms and their parameters to extract sequences of object proposals. Furthermore, we model similarity of objects based on appearance and functionality, which is derived from human and object motion. We show that functionality is an important cue for discovering objects from activities and demonstrate the generality of the model on three challenging RGB-D and RGB datasets.
Facebook serves close to a billion people every day, who are only able to consume a small subset of the information available to them. In this talk I will give some examples of how machine learning is used to personalize people’s Facebook experience. I will also present some data science experiments with fairly counter-intuitive results.
In this talk I will discuss two related problems in 3D reconstruction: (i) recovering the 3D shape of a temporally varying non-rigid 3D surface given a single video sequence and (ii) reconstructing different instances of the same object class category given a large collection of images from that category. In both cases we extract dense 3D shape information by analysing shape variation -- in one case of the same object instance over time and in the other across different instances of objects that belong to the same class.
First I will discuss the problem of dense capture of 3D non-rigid surfaces from a monocular video sequence. We take a purely model-free approach where no strong assumptions are made about the object we are looking at or the way it deforms. We apply low rank and spatial smoothness priors to obtain dense non-rigid models using a variational approach.
Second I will describe our recent approach to populating the Pascal VOC dataset with dense, per-object 3D reconstructions, bootstrapped from class labels, ground truth figure-ground segmentations and a small set of keypoint annotations. Our proposed algorithm first estimates camera viewpoint using rigid structure-from-motion, then reconstructs object shapes by optimizing over visual hull proposals guided by loose within-class shape similarity assumptions.
Stochastic differential equations (SDEs) arise naturally as descriptions of continuous time dynamical systems. My talk addresses the problem of inferring the dynamical state and parameters of such systems from observations taken at discrete times. I will discuss the application of approximate inference methods such as the variational method and expectation propagation and show how higher dimensional systems can be treated by a mean field approximation. In the second part of my talk I will discuss the nonparametric estimation of the drift (i.e. the deterministic part of the ‘force’ which governs the dynamics) as a function of the state using Gaussian process approaches.
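As a toy illustration of the inference setting in the abstract above (estimating the drift of an SDE from observations at discrete times), the sketch below simulates a one-dimensional Ornstein-Uhlenbeck process with the Euler-Maruyama scheme and recovers its drift parameter by least squares on the increments. The model, the parametric linear drift, and all names are assumptions made for illustration; the talk itself concerns variational, expectation propagation, and nonparametric Gaussian process approaches, which this simple estimator does not implement.

```python
import math
import random


def simulate_ou(theta, sigma, dt, n_steps, x0=0.0, seed=0):
    """Euler-Maruyama path of dX = -theta * X dt + sigma dW."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        xs.append(x - theta * x * dt + noise)
    return xs


def estimate_linear_drift(xs, dt):
    """Least-squares estimate of theta from increments, assuming drift -theta * x."""
    num = sum(x * (x_next - x) for x, x_next in zip(xs, xs[1:]))
    den = sum(x * x for x in xs[:-1])
    return -num / (den * dt)


# Assumed ground truth: theta = 1.0, sigma = 0.5, observed every dt = 0.01.
path = simulate_ou(theta=1.0, sigma=0.5, dt=0.01, n_steps=200_000)
theta_hat = estimate_linear_drift(path, dt=0.01)
print(f"estimated theta: {theta_hat:.3f}")
```

The estimator regresses each increment on the current state, so it only works when the drift is known to be linear. A nonparametric approach replaces that fixed form with a prior over functions of the state.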
Even though many challenges remain unsolved, in recent years computer graphics algorithms to render photo-realistic imagery have seen tremendous progress. An important prerequisite for high-quality renderings is the availability of good models of the scenes to be rendered, namely models of shape, motion and appearance. Unfortunately, the technology to create such models has not kept pace with the technology to render the imagery. In fact, we observe a content creation bottleneck, as it often takes man-months of tedious manual work by animation artists to craft models of moving virtual scenes.
To overcome this limitation, the research community has been developing techniques to capture models of dynamic scenes from real world examples, for instance methods that rely on footage recorded with cameras or other sensors. Examples include performance capture methods that measure detailed dynamic surface models, for example of actors or an actor's face, from multi-view video and without markers in the scene. Even though such 4D capture methods have made big strides, they are still at an early stage of their development. Their application is limited to scenes of moderate complexity in controlled environments, reconstructed detail is limited, and captured content cannot be easily modified, to name only a few restrictions.
In this talk, I will elaborate on some ideas on how to go beyond this limited scope of 4D reconstruction, and show some results from our recent work. For instance, I will show how we can capture more complex scenes with many objects or subjects in close interaction, as well as very challenging scenes of a smaller scale, such as hand motion. The talk will also show how we can capitalize on more sophisticated light transport models and inverse rendering to enable high-quality reconstruction in much more uncontrolled scenes, eventually also outdoors, and with very few cameras. I will also demonstrate how to represent captured scenes such that they can be conveniently modified. If time allows, the talk will cover some of our recent ideas on how to perform advanced edits of videos (e.g. removing or modifying dynamic objects in scenes) by exploiting reconstructed 4D models, as well as robustly found inter- and intra-frame correspondences.
Organizers: Gerard Pons-Moll
The recent theory of compressive sensing predicts that (approximately) sparse vectors can be recovered from vastly incomplete linear measurements using efficient algorithms. This principle has a large number of potential applications in signal and image processing, machine learning and more. Optimal measurement matrices in this context known so far are based on randomness. Recovery algorithms include convex optimization approaches (l1-minimization) as well as greedy methods. Gaussian and Bernoulli random matrices are provably optimal in the sense that the smallest possible number of samples is required. Such matrices, however, are of limited practical interest because of the lack of any structure. In fact, applications demand certain structure, so that there is only limited freedom to inject randomness. We present recovery results for various structured random matrices including random partial Fourier matrices and partial random circulant matrices. We will also review recent extensions of compressive sensing for recovering matrices of low rank from incomplete information via efficient algorithms such as nuclear norm minimization. This principle has recently found applications for phaseless estimation, i.e., in situations where only the magnitude of measurements is available. Another extension considers the recovery of low rank tensors (multi-dimensional arrays) from incomplete linear information. Several obstacles arise when passing from matrices to tensors, such as the lack of a singular value decomposition that shares all the nice properties of the matrix singular value decomposition. Although only partial theoretical results are available, we discuss algorithmic approaches for this problem.
Organizers: Michel Besserve
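The compressive sensing abstract above mentions greedy recovery methods and Gaussian random measurement matrices. The sketch below illustrates that combination with orthogonal matching pursuit (OMP), one common greedy algorithm, recovering a 2-sparse vector of length 100 from 40 Gaussian measurements. The choice of OMP, the problem sizes, the seed, and all helper names are assumptions for illustration; the abstract does not say which greedy method or parameters the talk uses.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(G, b):
    """Solve a small dense system G c = b by Gaussian elimination with pivoting."""
    k = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(G, b)]
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, k):
            f = M[r][i] / M[i][i]
            for c in range(i, k + 1):
                M[r][c] -= f * M[i][c]
    sol = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(M[i][j] * sol[j] for j in range(i + 1, k))
        sol[i] = (M[i][k] - tail) / M[i][i]
    return sol


def omp(cols, y, sparsity):
    """Orthogonal matching pursuit; cols is the measurement matrix as a list of columns."""
    support, coef, residual = [], [], y[:]
    for _ in range(sparsity):
        # Greedy step: pick the unused column most correlated with the residual.
        best = max((j for j in range(len(cols)) if j not in support),
                   key=lambda j: abs(dot(cols[j], residual)) / dot(cols[j], cols[j]) ** 0.5)
        support.append(best)
        # Least-squares fit of y on the selected columns (normal equations).
        G = [[dot(cols[i], cols[j]) for j in support] for i in support]
        coef = solve(G, [dot(cols[i], y) for i in support])
        residual = [yt - sum(c * cols[j][t] for c, j in zip(coef, support))
                    for t, yt in enumerate(y)]
    x = [0.0] * len(cols)
    for c, j in zip(coef, support):
        x[j] = c
    return x


rng = random.Random(0)
n, m = 100, 40  # signal length and number of measurements (assumed sizes)
cols = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]
x_true = [0.0] * n
x_true[7], x_true[42] = 1.0, -2.0
y = [sum(cols[j][t] * x_true[j] for j in (7, 42)) for t in range(m)]
x_hat = omp(cols, y, sparsity=2)
max_err = max(abs(a - b) for a, b in zip(x_hat, x_true))
print(f"max recovery error: {max_err:.2e}")
```

The unstructured Gaussian matrix used here is the case the abstract calls provably optimal but of limited practical interest. A structured variant would draw `cols` from, for example, randomly selected rows of a Fourier matrix instead.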
A goal in virtual reality is for the user to experience a synthetic environment as if it were real. Engagement with virtual actors is a big part of the sensory context, thus getting the people "right" is critical for success. Size, shape, gender, ethnicity, clothing, color, texture, movement, among other attributes must be layered and nuanced to provide an accurate encounter between an actor and a user. In this talk, I discuss the development of digital human models and how they may be improved to obtain the high realism for successful engagement in a virtual world.
Volumetric 3D modeling has attracted a lot of attention in the past. In this talk I will explain how the standard volumetric formulation can be extended to include semantic information by using a convex multi-label formulation. One of the strengths of our formulation is that it allows us to directly account for the expected surface orientations. I will focus on two applications. Firstly, I will introduce a method that allows for joint volumetric reconstruction and class segmentation. This is achieved by taking into account the expected orientations of object classes such as ground and building. Such a joint approach considerably improves the quality of the geometry while at the same time it gives a consistent semantic segmentation. In the second application I will present a method that allows for the reconstruction of challenging objects such as glass bottles. The main difficulty with reconstructing such objects is the texture-less, transparent and reflective areas in the input images. We propose to formulate a shape prior based on the locally expected surface orientation to account for the ambiguous input data. Our multi-label approach also directly enables us to segment the object from its surroundings.
This talk reviews differential equations on manifolds of matrices or tensors of low rank. They serve to approximate, in a low-rank format, large time-dependent matrices and tensors that are either given explicitly via their increments or are unknown solutions of differential equations. Furthermore, low-rank differential equations are used in novel algorithms for eigenvalue optimisation, for instance in robust-stability problems.
Organizers: Philipp Hennig
This talk shows how embedded optimization, i.e. autonomous optimization algorithms receiving data, solving problems, and sending answers continuously, is able to address challenging control problems. When nonlinear differential equation models are used to predict and optimize future system behaviour, one speaks of Nonlinear Model Predictive Control (NMPC). The talk presents experimental applications of NMPC to time- and energy-optimal control of mechatronic systems and discusses some of the algorithmic tricks that make NMPC optimization rates up to 1 MHz possible. Finally, we present one particularly challenging application, tethered flight for airborne wind energy systems.
Organizers: Sebastian Trimpe