Header logo is


2018


Thumb xl screenshot 2018 05 18 16 38 40
Learning 3D Shape Completion under Weak Supervision

Stutz, D., Geiger, A.

Arxiv, May 2018 (article)

Abstract
We address the problem of 3D shape completion from sparse and noisy point clouds, a fundamental problem in computer vision and robotics. Recent approaches are either data-driven or learning-based: Data-driven approaches rely on a shape model whose parameters are optimized to fit the observations; Learning-based approaches, in contrast, avoid the expensive optimization step by learning to directly predict complete shapes from incomplete observations in a fully-supervised setting. However, full supervision is often not available in practice. In this work, we propose a weakly-supervised learning-based approach to 3D shape completion which neither requires slow optimization nor direct supervision. While we also learn a shape prior on synthetic data, we amortize, i.e., learn, maximum likelihood fitting using deep neural networks resulting in efficient shape completion without sacrificing accuracy. On synthetic benchmarks based on ShapeNet and ModelNet as well as on real robotics data from KITTI and Kinect, we demonstrate that the proposed amortized maximum likelihood approach is able to compete with fully supervised baselines and outperforms data-driven approaches, while requiring less supervision and being significantly faster.

avg

PDF Project Page Project Page [BibTex]


no image
Nonlinear decoding of a complex movie from the mammalian retina

Botella-Soler, V., Deny, S., Martius, G., Marre, O., Tkačik, G.

PLOS Computational Biology, 14(5):1-27, Public Library of Science, May 2018 (article)

Abstract
Author summary Neurons in the retina transform patterns of incoming light into sequences of neural spikes. We recorded from ∼100 neurons in the rat retina while it was stimulated with a complex movie. Using machine learning regression methods, we fit decoders to reconstruct the movie shown from the retinal output. We demonstrated that retinal code can only be read out with a low error if decoders make use of correlations between successive spikes emitted by individual neurons. These correlations can be used to ignore spontaneous spiking that would, otherwise, cause even the best linear decoders to “hallucinate” nonexistent stimuli. This work represents the first high resolution single-trial full movie reconstruction and suggests a new paradigm for separating spontaneous from stimulus-driven neural activity.

al

DOI [BibTex]

DOI [BibTex]


Thumb xl hassan teaser paper
Augmented Reality Meets Computer Vision: Efficient Data Generation for Urban Driving Scenes

Alhaija, H., Mustikovela, S., Mescheder, L., Geiger, A., Rother, C.

International Journal of Computer Vision (IJCV), 2018, 2018 (article)

Abstract
The success of deep learning in computer vision is based on the availability of large annotated datasets. To lower the need for hand labeled images, virtually rendered 3D worlds have recently gained popularity. Unfortunately, creating realistic 3D content is challenging on its own and requires significant human effort. In this work, we propose an alternative paradigm which combines real and synthetic data for learning semantic instance segmentation and object detection models. Exploiting the fact that not all aspects of the scene are equally important for this task, we propose to augment real-world imagery with virtual objects of the target category. Capturing real-world images at large scale is easy and cheap, and directly provides real background appearances without the need for creating complex 3D models of the environment. We present an efficient procedure to augment these images with virtual objects. In contrast to modeling complete 3D environments, our data augmentation approach requires only a few user interactions in combination with 3D models of the target object category. Leveraging our approach, we introduce a novel dataset of augmented urban driving scenes with 360 degree images that are used as environment maps to create realistic lighting and reflections on rendered objects. We analyze the significance of realistic object placement by comparing manual placement by humans to automatic methods based on semantic scene analysis. This allows us to create composite images which exhibit both realistic background appearance as well as a large number of complex object arrangements. Through an extensive set of experiments, we conclude the right set of parameters to produce augmented data which can maximally enhance the performance of instance segmentation models. Further, we demonstrate the utility of the proposed approach on training standard deep models for semantic instance segmentation and object detection of cars in outdoor driving scenarios. We test the models trained on our augmented data on the KITTI 2015 dataset, which we have annotated with pixel-accurate ground truth, and on the Cityscapes dataset. Our experiments demonstrate that the models trained on augmented imagery generalize better than those trained on fully synthetic data or models trained on limited amounts of annotated real data.

avg

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Thumb xl stutz
Learning 3D Shape Completion under Weak Supervision

Stutz, D., Geiger, A.

International Journal of Computer Vision (IJCV), 2018, 2018 (article)

Abstract
We address the problem of 3D shape completion from sparse and noisy point clouds, a fundamental problem in computer vision and robotics. Recent approaches are either data-driven or learning-based: Data-driven approaches rely on a shape model whose parameters are optimized to fit the observations; Learning-based approaches, in contrast, avoid the expensive optimization step by learning to directly predict complete shapes from incomplete observations in a fully-supervised setting. However, full supervision is often not available in practice. In this work, we propose a weakly-supervised learning-based approach to 3D shape completion which neither requires slow optimization nor direct supervision. While we also learn a shape prior on synthetic data, we amortize, i.e., learn, maximum likelihood fitting using deep neural networks resulting in efficient shape completion without sacrificing accuracy. On synthetic benchmarks based on ShapeNet and ModelNet as well as on real robotics data from KITTI and Kinect, we demonstrate that the proposed amortized maximum likelihood approach is able to compete with a fully supervised baseline and outperforms the data-driven approach of Engelmann et al., while requiring less supervision and being significantly faster.

avg

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Thumb xl objectflow
Object Scene Flow

Menze, M., Heipke, C., Geiger, A.

ISPRS Journal of Photogrammetry and Remote Sensing, 2018 (article)

Abstract
This work investigates the estimation of dense three-dimensional motion fields, commonly referred to as scene flow. While great progress has been made in recent years, large displacements and adverse imaging conditions as observed in natural outdoor environments are still very challenging for current approaches to reconstruction and motion estimation. In this paper, we propose a unified random field model which reasons jointly about 3D scene flow as well as the location, shape and motion of vehicles in the observed scene. We formulate the problem as the task of decomposing the scene into a small number of rigidly moving objects sharing the same motion parameters. Thus, our formulation effectively introduces long-range spatial dependencies which commonly employed local rigidity priors are lacking. Our inference algorithm then estimates the association of image segments and object hypotheses together with their three-dimensional shape and motion. We demonstrate the potential of the proposed approach by introducing a novel challenging scene flow benchmark which allows for a thorough comparison of the proposed scene flow approach with respect to various baseline models. In contrast to previous benchmarks, our evaluation is the first to provide stereo and optical flow ground truth for dynamic real-world urban scenes at large scale. Our experiments reveal that rigid motion segmentation can be utilized as an effective regularizer for the scene flow problem, improving upon existing two-frame scene flow methods. At the same time, our method yields plausible object segmentations without requiring an explicitly trained recognition model for a specific object class.

avg

Project Page [BibTex]

Project Page [BibTex]

2013


Thumb xl ijrr
Vision meets Robotics: The KITTI Dataset

Geiger, A., Lenz, P., Stiller, C., Urtasun, R.

International Journal of Robotics Research, 32(11):1231 - 1237 , Sage Publishing, September 2013 (article)

Abstract
We present a novel dataset captured from a VW station wagon for use in mobile robotics and autonomous driving research. In total, we recorded 6 hours of traffic scenarios at 10-100 Hz using a variety of sensor modalities such as high-resolution color and grayscale stereo cameras, a Velodyne 3D laser scanner and a high-precision GPS/IMU inertial navigation system. The scenarios are diverse, capturing real-world traffic situations and range from freeways over rural areas to inner-city scenes with many static and dynamic objects. Our data is calibrated, synchronized and timestamped, and we provide the rectified and raw image sequences. Our dataset also contains object labels in the form of 3D tracklets and we provide online benchmarks for stereo, optical flow, object detection and other tasks. This paper describes our recording platform, the data format and the utilities that we provide.

avg ps

pdf DOI [BibTex]

2013


pdf DOI [BibTex]


no image
Information Driven Self-Organization of Complex Robotic Behaviors

Martius, G., Der, R., Ay, N.

PLoS ONE, 8(5):e63400, Public Library of Science, 2013 (article)

al

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis

Zahedi, K., Martius, G., Ay, N.

Frontiers in Psychology, 4(801), 2013 (article)

Abstract
One of the main challenges in the field of embodied artificial intelligence is the open-ended autonomous learning of complex behaviours. Our approach is to use task-independent, information-driven intrinsic motivation(s) to support task-dependent learning. The work presented here is a preliminary step in which we investigate the predictive information (the mutual information of the past and future of the sensor stream) as an intrinsic drive, ideally supporting any kind of task acquisition. Previous experiments have shown that the predictive information (PI) is a good candidate to support autonomous, open-ended learning of complex behaviours, because a maximisation of the PI corresponds to an exploration of morphology- and environment-dependent behavioural regularities. The idea is that these regularities can then be exploited in order to solve any given task. Three different experiments are presented and their results lead to the conclusion that the linear combination of the one-step PI with an external reward function is not generally recommended in an episodic policy gradient setting. Only for hard tasks a great speed-up can be achieved at the cost of an asymptotic performance lost.

al

link (url) DOI [BibTex]


no image
Robustness of guided self-organization against sensorimotor disruptions

Martius, G.

Advances in Complex Systems, 16(02n03):1350001, 2013 (article)

Abstract
Self-organizing processes are crucial for the development of living beings. Practical applications in robots may benefit from the self-organization of behavior, e.g.~to increase fault tolerance and enhance flexibility, provided that external goals can also be achieved. We present results on the guidance of self-organizing control by visual target stimuli and show a remarkable robustness to sensorimotor disruptions. In a proof of concept study an autonomous wheeled robot is learning an object finding and ball-pushing task from scratch within a few minutes in continuous domains. The robustness is demonstrated by the rapid recovery of the performance after severe changes of the sensor configuration.

al

DOI [BibTex]

DOI [BibTex]