Header logo is


2019


no image
EM-Fusion: Dynamic Object-Level SLAM With Probabilistic Data Association

Strecke, M., Stückler, J.

International Conference on Computer Vision, October 2019, arXiv:1904.11781 (conference) Accepted

ev

preprint [BibTex]

2019


preprint [BibTex]


Thumb xl occ flow
Occupancy Flow: 4D Reconstruction by Learning Particle Dynamics

Niemeyer, M., Mescheder, L., Oechsle, M., Geiger, A.

International Conference on Computer Vision, October 2019 (conference)

Abstract
Deep learning based 3D reconstruction techniques have recently achieved impressive results. However, while state-of-the-art methods are able to output complex 3D geometry, it is not clear how to extend these results to time-varying topologies. Approaches treating each time step individually lack continuity and exhibit slow inference, while traditional 4D reconstruction methods often utilize a template model or discretize the 4D space at fixed resolution. In this work, we present Occupancy Flow, a novel spatio-temporal representation of time-varying 3D geometry with implicit correspondences. Towards this goal, we learn a temporally and spatially continuous vector field which assigns a motion vector to every point in space and time. In order to perform dense 4D reconstruction from images or sparse point clouds, we combine our method with a continuous 3D representation. Implicitly, our model yields correspondences over time, thus enabling fast inference while providing a sound physical description of the temporal dynamics. We show that our method can be used for interpolation and reconstruction tasks, and demonstrate the accuracy of the learned correspondences. We believe that Occupancy Flow is a promising new 4D representation which will be useful for a variety of spatio-temporal reconstruction tasks.

avg

pdf poster suppmat code Project page video [BibTex]


Thumb xl tex felds
Texture Fields: Learning Texture Representations in Function Space

Oechsle, M., Mescheder, L., Niemeyer, M., Strauss, T., Geiger, A.

International Conference on Computer Vision, October 2019 (conference)

Abstract
In recent years, substantial progress has been achieved in learning-based reconstruction of 3D objects. At the same time, generative models were proposed that can generate highly realistic images. However, despite this success in these closely related tasks, texture reconstruction of 3D objects has received little attention from the research community and state-of-the-art methods are either limited to comparably low resolution or constrained experimental setups. A major reason for these limitations is that common representations of texture are inefficient or hard to interface for modern deep learning techniques. In this paper, we propose Texture Fields, a novel texture representation which is based on regressing a continuous 3D function parameterized with a neural network. Our approach circumvents limiting factors like shape discretization and parameterization, as the proposed texture representation is independent of the shape representation of the 3D object. We show that Texture Fields are able to represent high frequency texture and naturally blend with modern deep learning techniques. Experimentally, we find that Texture Fields compare favorably to state-of-the-art methods for conditional texture reconstruction of 3D objects and enable learning of probabilistic generative models for texturing unseen 3D models. We believe that Texture Fields will become an important building block for the next generation of generative 3D models.

avg

pdf suppmat video [BibTex]

pdf suppmat video [BibTex]


Thumb xl lv
Taking a Deeper Look at the Inverse Compositional Algorithm

Lv, Z., Dellaert, F., Rehg, J. M., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2019, June 2019 (inproceedings)

Abstract
In this paper, we provide a modern synthesis of the classic inverse compositional algorithm for dense image alignment. We first discuss the assumptions made by this well-established technique, and subsequently propose to relax these assumptions by incorporating data-driven priors into this model. More specifically, we unroll a robust version of the inverse compositional algorithm and replace multiple components of this algorithm using more expressive models whose parameters we train in an end-to-end fashion from data. Our experiments on several challenging 3D rigid motion estimation tasks demonstrate the advantages of combining optimization with learning-based techniques, outperforming the classic inverse compositional algorithm as well as data-driven image-to-pose regression approaches.

avg

pdf suppmat Video Project Page Poster [BibTex]

pdf suppmat Video Project Page Poster [BibTex]


Thumb xl mots
MOTS: Multi-Object Tracking and Segmentation

Voigtlaender, P., Krause, M., Osep, A., Luiten, J., Sekar, B. B. G., Geiger, A., Leibe, B.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2019, June 2019 (inproceedings)

Abstract
This paper extends the popular task of multi-object tracking to multi-object tracking and segmentation (MOTS). Towards this goal, we create dense pixel-level annotations for two existing tracking datasets using a semi-automatic annotation procedure. Our new annotations comprise 65,213 pixel masks for 977 distinct objects (cars and pedestrians) in 10,870 video frames. For evaluation, we extend existing multi-object tracking metrics to this new task. Moreover, we propose a new baseline method which jointly addresses detection, tracking, and segmentation with a single convolutional network. We demonstrate the value of our datasets by achieving improvements in performance when training on MOTS annotations. We believe that our datasets, metrics and baseline will become a valuable resource towards developing multi-object tracking approaches that go beyond 2D bounding boxes.

avg

pdf suppmat Project Page Poster Video Project Page [BibTex]

pdf suppmat Project Page Poster Video Project Page [BibTex]


Thumb xl behl
PointFlowNet: Learning Representations for Rigid Motion Estimation from Point Clouds

Behl, A., Paschalidou, D., Donne, S., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2019, June 2019 (inproceedings)

Abstract
Despite significant progress in image-based 3D scene flow estimation, the performance of such approaches has not yet reached the fidelity required by many applications. Simultaneously, these applications are often not restricted to image-based estimation: laser scanners provide a popular alternative to traditional cameras, for example in the context of self-driving cars, as they directly yield a 3D point cloud. In this paper, we propose to estimate 3D motion from such unstructured point clouds using a deep neural network. In a single forward pass, our model jointly predicts 3D scene flow as well as the 3D bounding box and rigid body motion of objects in the scene. While the prospect of estimating 3D scene flow from unstructured point clouds is promising, it is also a challenging task. We show that the traditional global representation of rigid body motion prohibits inference by CNNs, and propose a translation equivariant representation to circumvent this problem. For training our deep network, a large dataset is required. Because of this, we augment real scans from KITTI with virtual objects, realistically modeling occlusions and simulating sensor noise. A thorough comparison with classic and learning-based techniques highlights the robustness of the proposed approach.

avg

pdf suppmat Project Page Poster Video [BibTex]

pdf suppmat Project Page Poster Video [BibTex]


Thumb xl donne
Learning Non-volumetric Depth Fusion using Successive Reprojections

Donne, S., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2019, June 2019 (inproceedings)

Abstract
Given a set of input views, multi-view stereopsis techniques estimate depth maps to represent the 3D reconstruction of the scene; these are fused into a single, consistent, reconstruction -- most often a point cloud. In this work we propose to learn an auto-regressive depth refinement directly from data. While deep learning has improved the accuracy and speed of depth estimation significantly, learned MVS techniques remain limited to the planesweeping paradigm. We refine a set of input depth maps by successively reprojecting information from neighbouring views to leverage multi-view constraints. Compared to learning-based volumetric fusion techniques, an image-based representation allows significantly more detailed reconstructions; compared to traditional point-based techniques, our method learns noise suppression and surface completion in a data-driven fashion. Due to the limited availability of high-quality reconstruction datasets with ground truth, we introduce two novel synthetic datasets to (pre-)train our network. Our approach is able to improve both the output depth maps and the reconstructed point cloud, for both learned and traditional depth estimation front-ends, on both synthetic and real data.

avg

pdf suppmat Project Page Video Poster [BibTex]

pdf suppmat Project Page Video Poster [BibTex]


Thumb xl liao
Connecting the Dots: Learning Representations for Active Monocular Depth Estimation

Riegler, G., Liao, Y., Donne, S., Koltun, V., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2019, June 2019 (inproceedings)

Abstract
We propose a technique for depth estimation with a monocular structured-light camera, \ie, a calibrated stereo set-up with one camera and one laser projector. Instead of formulating the depth estimation via a correspondence search problem, we show that a simple convolutional architecture is sufficient for high-quality disparity estimates in this setting. As accurate ground-truth is hard to obtain, we train our model in a self-supervised fashion with a combination of photometric and geometric losses. Further, we demonstrate that the projected pattern of the structured light sensor can be reliably separated from the ambient information. This can then be used to improve depth boundaries in a weakly supervised fashion by modeling the joint statistics of image and depth edges. The model trained in this fashion compares favorably to the state-of-the-art on challenging synthetic and real-world datasets. In addition, we contribute a novel simulator, which allows to benchmark active depth prediction algorithms in controlled conditions.

avg

pdf suppmat Poster Project Page [BibTex]

pdf suppmat Poster Project Page [BibTex]


Thumb xl superquadrics parsing
Superquadrics Revisited: Learning 3D Shape Parsing beyond Cuboids

Paschalidou, D., Ulusoy, A. O., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2019, June 2019 (inproceedings)

Abstract
Abstracting complex 3D shapes with parsimonious part-based representations has been a long standing goal in computer vision. This paper presents a learning-based solution to this problem which goes beyond the traditional 3D cuboid representation by exploiting superquadrics as atomic elements. We demonstrate that superquadrics lead to more expressive 3D scene parses while being easier to learn than 3D cuboid representations. Moreover, we provide an analytical solution to the Chamfer loss which avoids the need for computational expensive reinforcement learning or iterative prediction. Our model learns to parse 3D objects into consistent superquadric representations without supervision. Results on various ShapeNet categories as well as the SURREAL human body dataset demonstrate the flexibility of our model in capturing fine details and complex poses that could not have been modelled using cuboids.

avg

Project Page Poster suppmat pdf Video handout [BibTex]

Project Page Poster suppmat pdf Video handout [BibTex]


no image
Visual-Inertial Mapping with Non-Linear Factor Recovery

Usenko, V., Demmel, N., Schubert, D., Stückler, J., Cremers, D.

2019, arXiv:1904.06504 (misc)

ev

[BibTex]

[BibTex]


no image
Learning to Disentangle Latent Physical Factors for Video Prediction

Zhu, D., Munderloh, M., Rosenhahn, B., Stückler, J.

In German Conference on Pattern Recognition (GCPR), 2019, to appear (inproceedings)

ev

dataset & evaluation code video preprint [BibTex]

dataset & evaluation code video preprint [BibTex]


Thumb xl teaser website
Occupancy Networks: Learning 3D Reconstruction in Function Space

Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2019, 2019 (inproceedings)

Abstract
With the advent of deep neural networks, learning-based approaches for 3D reconstruction have gained popularity. However, unlike for images, in 3D there is no canonical representation which is both computationally and memory efficient yet allows for representing high-resolution geometry of arbitrary topology. Many of the state-of-the-art learning-based 3D reconstruction approaches can hence only represent very coarse 3D geometry or are limited to a restricted domain. In this paper, we propose Occupancy Networks, a new representation for learning-based 3D reconstruction methods. Occupancy networks implicitly represent the 3D surface as the continuous decision boundary of a deep neural network classifier. In contrast to existing approaches, our representation encodes a description of the 3D output at infinite resolution without excessive memory footprint. We validate that our representation can efficiently encode 3D structure and can be inferred from various kinds of input. Our experiments demonstrate competitive results, both qualitatively and quantitatively, for the challenging tasks of 3D reconstruction from single images, noisy point clouds and coarse discrete voxel grids. We believe that occupancy networks will become a useful tool in a wide variety of learning-based 3D tasks.

avg

Code Video pdf suppmat Project Page [BibTex]

Code Video pdf suppmat Project Page [BibTex]


no image
3D Birds-Eye-View Instance Segmentation

Elich, C., Engelmann, F., Kontogianni, T., Leibe, B.

In German Conference on Pattern Recognition (GCPR), 2019, arXiv:1904.02199, to appear (inproceedings)

ev

[BibTex]

[BibTex]


Thumb xl nova
NoVA: Learning to See in Novel Viewpoints and Domains

Coors, B., Condurache, A. P., Geiger, A.

In 2019 International Conference on 3D Vision (3DV), 2019 International Conference on 3D Vision (3DV), 2019 (inproceedings)

Abstract
Domain adaptation techniques enable the re-use and transfer of existing labeled datasets from a source to a target domain in which little or no labeled data exists. Recently, image-level domain adaptation approaches have demonstrated impressive results in adapting from synthetic to real-world environments by translating source images to the style of a target domain. However, the domain gap between source and target may not only be caused by a different style but also by a change in viewpoint. This case necessitates a semantically consistent translation of source images and labels to the style and viewpoint of the target domain. In this work, we propose the Novel Viewpoint Adaptation (NoVA) model, which enables unsupervised adaptation to a novel viewpoint in a target domain for which no labeled data is available. NoVA utilizes an explicit representation of the 3D scene geometry to translate source view images and labels to the target view. Experiments on adaptation to synthetic and real-world datasets show the benefit of NoVA compared to state-of-the-art domain adaptation approaches on the task of semantic segmentation.

avg

pdf suppmat poster video [BibTex]

pdf suppmat poster video [BibTex]

2011


no image
Toward simple control for complex, autonomous robotic applications: combining discrete and rhythmic motor primitives

Degallier, S., Righetti, L., Gay, S., Ijspeert, A.

Autonomous Robots, 31(2-3):155-181, October 2011 (article)

Abstract
Vertebrates are able to quickly adapt to new environments in a very robust, seemingly effortless way. To explain both this adaptivity and robustness, a very promising perspective in neurosciences is the modular approach to movement generation: Movements results from combinations of a finite set of stable motor primitives organized at the spinal level. In this article we apply this concept of modular generation of movements to the control of robots with a high number of degrees of freedom, an issue that is challenging notably because planning complex, multidimensional trajectories in time-varying environments is a laborious and costly process. We thus propose to decrease the complexity of the planning phase through the use of a combination of discrete and rhythmic motor primitives, leading to the decoupling of the planning phase (i.e. the choice of behavior) and the actual trajectory generation. Such implementation eases the control of, and the switch between, different behaviors by reducing the dimensionality of the high-level commands. Moreover, since the motor primitives are generated by dynamical systems, the trajectories can be smoothly modulated, either by high-level commands to change the current behavior or by sensory feedback information to adapt to environmental constraints. In order to show the generality of our approach, we apply the framework to interactive drumming and infant crawling in a humanoid robot. These experiments illustrate the simplicity of the control architecture in terms of planning, the integration of different types of feedback (vision and contact) and the capacity of autonomously switching between different behaviors (crawling and simple reaching).

mg

link (url) DOI [BibTex]

2011


link (url) DOI [BibTex]


no image
Learning Force Control Policies for Compliant Manipulation

Kalakrishnan, M., Righetti, L., Pastor, P., Schaal, S.

In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 4639-4644, IEEE, San Francisco, USA, sep 2011 (inproceedings)

Abstract
Developing robots capable of fine manipulation skills is of major importance in order to build truly assistive robots. These robots need to be compliant in their actuation and control in order to operate safely in human environments. Manipulation tasks imply complex contact interactions with the external world, and involve reasoning about the forces and torques to be applied. Planning under contact conditions is usually impractical due to computational complexity, and a lack of precise dynamics models of the environment. We present an approach to acquiring manipulation skills on compliant robots through reinforcement learning. The initial position control policy for manipulation is initialized through kinesthetic demonstration. We augment this policy with a force/torque profile to be controlled in combination with the position trajectories. We use the Policy Improvement with Path Integrals (PI2) algorithm to learn these force/torque profiles by optimizing a cost function that measures task success. We demonstrate our approach on the Barrett WAM robot arm equipped with a 6-DOF force/torque sensor on two different manipulation tasks: opening a door with a lever door handle, and picking up a pen off the table. We show that the learnt force control policies allow successful, robust execution of the tasks.

am mg

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Control of legged robots with optimal distribution of contact forces

Righetti, L., Buchli, J., Mistry, M., Schaal, S.

In 2011 11th IEEE-RAS International Conference on Humanoid Robots, pages: 318-324, IEEE, Bled, Slovenia, 2011 (inproceedings)

Abstract
The development of agile and safe humanoid robots require controllers that guarantee both high tracking performance and compliance with the environment. More specifically, the control of contact interaction is of crucial importance for robots that will actively interact with their environment. Model-based controllers such as inverse dynamics or operational space control are very appealing as they offer both high tracking performance and compliance. However, while widely used for fully actuated systems such as manipulators, they are not yet standard controllers for legged robots such as humanoids. Indeed such robots are fundamentally different from manipulators as they are underactuated due to their floating-base and subject to switching contact constraints. In this paper we present an inverse dynamics controller for legged robots that use torque redundancy to create an optimal distribution of contact constraints. The resulting controller is able to minimize, given a desired motion, any quadratic cost of the contact constraints at each instant of time. In particular we show how this can be used to minimize tangential forces during locomotion, therefore significantly improving the locomotion of legged robots on difficult terrains. In addition to the theoretical result, we present simulations of a humanoid and a quadruped robot, as well as experiments on a real quadruped robot that demonstrate the advantages of the controller.

am mg

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Learning Motion Primitive Goals for Robust Manipulation

Stulp, F., Theodorou, E., Kalakrishnan, M., Pastor, P., Righetti, L., Schaal, S.

In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 325-331, IEEE, San Francisco, USA, sep 2011 (inproceedings)

Abstract
Applying model-free reinforcement learning to manipulation remains challenging for several reasons. First, manipulation involves physical contact, which causes discontinuous cost functions. Second, in manipulation, the end-point of the movement must be chosen carefully, as it represents a grasp which must be adapted to the pose and shape of the object. Finally, there is uncertainty in the object pose, and even the most carefully planned movement may fail if the object is not at the expected position. To address these challenges we 1) present a simplified, computationally more efficient version of our model-free reinforcement learning algorithm PI2; 2) extend PI2 so that it simultaneously learns shape parameters and goal parameters of motion primitives; 3) use shape and goal learning to acquire motion primitives that are robust to object pose uncertainty. We evaluate these contributions on a manipulation platform consisting of a 7-DOF arm with a 4-DOF hand.

am mg

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Inverse Dynamics Control of Floating-Base Robots with External Constraints: a Unified View

Righetti, L., Buchli, J., Mistry, M., Schaal, S.

In 2011 IEEE International Conference on Robotics and Automation, pages: 1085-1090, IEEE, Shanghai, China, 2011 (inproceedings)

Abstract
Inverse dynamics controllers and operational space controllers have proved to be very efficient for compliant control of fully actuated robots such as fixed base manipulators. However legged robots such as humanoids are inherently different as they are underactuated and subject to switching external contact constraints. Recently several methods have been proposed to create inverse dynamics controllers and operational space controllers for these robots. In an attempt to compare these different approaches, we develop a general framework for inverse dynamics control and show that these methods lead to very similar controllers. We are then able to greatly simplify recent whole-body controllers based on operational space approaches using kinematic projections, bringing them closer to efficient practical implementations. We also generalize these controllers such that they can be optimal under an arbitrary quadratic cost in the commands.

am mg

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Operational Space Control of Constrained and Underactuated Systems

Mistry, M., Righetti, L.

In Proceedings of Robotics: Science and Systems, Los Angeles, CA, USA, June 2011 (inproceedings)

Abstract
The operational space formulation (Khatib, 1987), applied to rigid-body manipulators, describes how to decouple task-space and null-space dynamics, and write control equations that correspond only to forces at the end-effector or, alternatively, only to motion within the null-space. We would like to apply this useful theory to modern humanoids and other legged systems, for manipulation or similar tasks, however these systems present additional challenges due to their underactuated floating bases and contact states that can dynamically change. In recent work, Sentis et al. derived controllers for such systems by implementing a task Jacobian projected into a space consistent with the supporting constraints and underactuation (the so called "support consistent reduced Jacobian"). Here, we take a new approach to derive operational space controllers for constrained underactuated systems, by first considering the operational space dynamics within "projected inverse-dynamics" (Aghili, 2005), and subsequently resolving underactuation through the addition of dynamically consistent control torques. Doing so results in a simplified control solution compared with previous results, and importantly yields several new insights into the underlying problem of operational space control in constrained environments: 1) Underactuated systems, such as humanoid robots, cannot in general completely decouple task and null-space dynamics. However, 2) there may exist an infinite number of control solutions to realize desired task-space dynamics, and 3) these solutions involve the addition of dynamically consistent null-space motion or constraint forces (or combinations of both). In light of these findings, we present several possible control solutions, with varying optimization criteria, and highlight some of their practical consequences.

mg

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Online movement adaptation based on previous sensor experiences

Pastor, P., Righetti, L., Kalakrishnan, M., Schaal, S.

In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 365-371, IEEE, San Francisco, USA, sep 2011 (inproceedings)

Abstract
Personal robots can only become widespread if they are capable of safely operating among humans. In uncertain and highly dynamic environments such as human households, robots need to be able to instantly adapt their behavior to unforseen events. In this paper, we propose a general framework to achieve very contact-reactive motions for robotic grasping and manipulation. Associating stereotypical movements to particular tasks enables our system to use previous sensor experiences as a predictive model for subsequent task executions. We use dynamical systems, named Dynamic Movement Primitives (DMPs), to learn goal-directed behaviors from demonstration. We exploit their dynamic properties by coupling them with the measured and predicted sensor traces. This feedback loop allows for online adaptation of the movement plan. Our system can create a rich set of possible motions that account for external perturbations and perception uncertainty to generate truly robust behaviors. As an example, we present an application to grasping with the WAM robot arm.

am mg

link (url) DOI [BibTex]

link (url) DOI [BibTex]