Header logo is


2020


A gamified app that helps people overcome self-limiting beliefs by promoting metacognition
A gamified app that helps people overcome self-limiting beliefs by promoting metacognition

Amo, V., Lieder, F.

Virtual, SIG 8 Meets SIG 16, September 2020 (conference) Accepted

Abstract
Previous research has shown that approaching learning with a growth mindset is key for maintaining motivation and overcoming setbacks. Mindsets are systems of beliefs that people hold to be true. They influence a person's attitudes, thoughts, and emotions when they learn something new or encounter challenges. In clinical psychology, metareasoning (reflecting on one's mental processes) and meta-awareness (recognizing thoughts as mental events instead of equating them to reality) have proven effective for overcoming maladaptive thinking styles. Hence, they are potentially an effective method for overcoming self-limiting beliefs in other domains as well. However, the potential of integrating assisted metacognition into mindset interventions has not been explored yet. Here, we propose that guiding and training people on how to leverage metareasoning and meta-awareness for overcoming self-limiting beliefs can significantly enhance the effectiveness of mindset interventions. To test this hypothesis, we develop a gamified mobile application that guides and trains people to use metacognitive strategies based on Cognitive Restructuring (CR) and Acceptance Commitment Therapy (ACT) techniques. The application helps users to identify and overcome self-limiting beliefs by working with aversive emotions when they are triggered by fixed mindsets in real-life situations. Our app aims to help people sustain their motivation to learn when they face inner obstacles (e.g. anxiety, frustration, and demotivation). We expect the application to be an effective tool for helping people better understand and develop the metacognitive skills of emotion regulation and self-regulation that are needed to overcome self-limiting beliefs and develop growth mindsets.

re

A gamified app that helps people overcome self-limiting beliefs by promoting metacognition [BibTex]


no image
How to navigate everyday distractions: Leveraging optimal feedback to train attention control

Wirzberger, M., Lado, A., Eckerstorfer, L., Oreshnikov, I., Passy, J., Stock, A., Shenhav, A., Lieder, F.

Annual Meeting of the Cognitive Science Society, July 2020 (conference) Accepted

Abstract
To stay focused on their chosen tasks, people have to inhibit distractions. The underlying attention control skills can improve through reinforcement learning, which can be accelerated by giving feedback. We applied the theory of metacognitive reinforcement learning to develop a training app that gives people optimal feedback on their attention control while they are working or studying. In an eight-day field experiment with 99 participants, we investigated the effect of this training on people's productivity, sustained attention, and self-control. Compared to a control condition without feedback, we found that participants receiving optimal feedback learned to focus increasingly better (f = .08, p < .01) and achieved higher productivity scores (f = .19, p < .01) during the training. In addition, they evaluated their productivity more accurately (r = .12, p < .01). However, due to asymmetric attrition problems, these findings need to be taken with a grain of salt.

re sf

How to navigate everyday distractions: Leveraging optimal feedback to train attention control DOI Project Page [BibTex]


no image
Leveraging Machine Learning to Automatically Derive Robust Planning Strategies from Biased Models of the Environment

Kemtur, A., Jain, Y. R., Mehta, A., Callaway, F., Consul, S., Stojcheski, J., Lieder, F.

Virtual, CogSci 2020, July 2020, Anirudha Kemtur and Yash Raj Jain contributed equally to this publication. (conference) Accepted

Abstract
Teaching clever heuristics is a promising approach to improve decision-making. We can leverage machine learning to discover clever strategies automatically. Current methods require an accurate model of the decision problems people face in real life. But most models are misspecified because of limited information and cognitive biases. To address this problem we develop strategy discovery methods that are robust to model misspecification. Robustness is achieved by model-ing model-misspecification and handling uncertainty about the real-world according to Bayesian inference. We translate our methods into an intelligent tutor that automatically discovers and teaches robust planning strategies. Our robust cognitive tutor significantly improved human decision-making when the model was so biased that conventional cognitive tutors were no longer effective. These findings highlight that our robust strategy discovery methods are a significant step towards leveraging artificial intelligence to improve human decision-making in the real world.

re

Leveraging Machine Learning to Automatically Derive Robust Planning Strategiesfrom Biased Models of the Environment [BibTex]


no image
Automatic Discovery of Interpretable Planning Strategies

Skirzyński, J., Becker, F., Lieder, F.

May 2020 (article) Submitted

Abstract
When making decisions, people often overlook critical information or are overly swayed by irrelevant information. A common approach to mitigate these biases is to provide decisionmakers, especially professionals such as medical doctors, with decision aids, such as decision trees and flowcharts. Designing effective decision aids is a difficult problem. We propose that recently developed reinforcement learning methods for discovering clever heuristics for good decision-making can be partially leveraged to assist human experts in this design process. One of the biggest remaining obstacles to leveraging the aforementioned methods for improving human decision-making is that the policies they learn are opaque to people. To solve this problem, we introduce AI-Interpret: a general method for transforming idiosyncratic policies into simple and interpretable descriptions. Our algorithm combines recent advances in imitation learning and program induction with a new clustering method for identifying a large subset of demonstrations that can be accurately described by a simple, high-performing decision rule. We evaluate our new AI-Interpret algorithm and employ it to translate information-acquisition policies discovered through metalevel reinforcement learning. The results of three large behavioral experiments showed that the provision of decision rules as flowcharts significantly improved people’s planning strategies and decisions across three different classes of sequential decision problems. Furthermore, a series of ablation studies confirmed that our AI-Interpret algorithm was critical to the discovery of interpretable decision rules and that it is ready to be applied to other reinforcement learning problems. We conclude that the methods and findings presented in this article are an important step towards leveraging automatic strategy discovery to improve human decision-making.

re

Automatic Discovery of Interpretable Planning Strategies The code for our algorithm and the experiments is available [BibTex]


no image
Advancing Rational Analysis to the Algorithmic Level

Lieder, F., Griffiths, T. L.

Behavioral and Brain Sciences, 43, E27, March 2020 (article)

Abstract
The commentaries raised questions about normativity, human rationality, cognitive architectures, cognitive constraints, and the scope or resource rational analysis (RRA). We respond to these questions and clarify that RRA is a methodological advance that extends the scope of rational modeling to understanding cognitive processes, why they differ between people, why they change over time, and how they could be improved.

re

Advancing rational analysis to the algorithmic level DOI [BibTex]

Advancing rational analysis to the algorithmic level DOI [BibTex]


no image
Learning to Overexert Cognitive Control in a Stroop Task

Bustamante, L., Lieder, F., Musslick, S., Shenhav, A., Cohen, J.

Febuary 2020, Laura Bustamante and Falk Lieder contributed equally to this publication. (article) In revision

Abstract
How do people learn when to allocate how much cognitive control to which task? According to the Learned Value of Control (LVOC) model, people learn to predict the value of alternative control allocations from features of a given situation. This suggests that people may generalize the value of control learned in one situation to other situations with shared features, even when the demands for cognitive control are different. This makes the intriguing prediction that what a person learned in one setting could, under some circumstances, cause them to misestimate the need for, and potentially over-exert control in another setting, even if this harms their performance. To test this prediction, we had participants perform a novel variant of the Stroop task in which, on each trial, they could choose to either name the color (more control-demanding) or read the word (more automatic). However only one of these tasks was rewarded, it changed from trial to trial, and could be predicted by one or more of the stimulus features (the color and/or the word). Participants first learned colors that predicted the rewarded task. Then they learned words that predicted the rewarded task. In the third part of the experiment, we tested how these learned feature associations transferred to novel stimuli with some overlapping features. The stimulus-task-reward associations were designed so that for certain combinations of stimuli the transfer of learned feature associations would incorrectly predict that more highly rewarded task would be color naming, which would require the exertion of control, even though the actually rewarded task was word reading and therefore did not require the engagement of control. Our results demonstrated that participants over-exerted control for these stimuli, providing support for the feature-based learning mechanism described by the LVOC model.

re

Learning to Overexert Cognitive Control in a Stroop Task DOI [BibTex]

Learning to Overexert Cognitive Control in a Stroop Task DOI [BibTex]


Toward a Formal Theory of Proactivity
Toward a Formal Theory of Proactivity

Lieder, F., Iwama, G.

January 2020 (article) Submitted

Abstract
Beyond merely reacting to their environment and impulses, people have the remarkable capacity to proactively set and pursue their own goals. But the extent to which they leverage this capacity varies widely across people and situations. The goal of this article is to make the mechanisms and variability of proactivity more amenable to rigorous experiments and computational modeling. We proceed in three steps. First, we develop and validate a mathematically precise behavioral measure of proactivity and reactivity that can be applied across a wide range of experimental paradigms. Second, we propose a formal definition of proactivity and reactivity, and develop a computational model of proactivity in the AX Continuous Performance Task (AX-CPT). Third, we develop and test a computational-level theory of meta-control over proactivity in the AX-CPT that identifies three distinct meta-decision-making problems: intention setting, resolving response conflict between intentions and automaticity, and deciding whether to recall context and intentions into working memory. People's response frequencies in the AX-CPT were remarkably well captured by a mixture between the predictions of our models of proactive and reactive control. Empirical data from an experiment varying the incentives and contextual load of an AX-CPT confirmed the predictions of our meta-control model of individual differences in proactivity. Our results suggest that proactivity can be understood in terms of computational models of meta-control. Our model makes additional empirically testable predictions. Future work will extend our models from proactive control in the AX-CPT to proactive goal creation and goal pursuit in the real world.

re

Toward a formal theory of proactivity DOI [BibTex]


Self-supervised motion deblurring
Self-supervised motion deblurring

Liu, P., Janai, J., Pollefeys, M., Sattler, T., Geiger, A.

IEEE Robotics and Automation Letters, 2020 (article)

Abstract
Motion blurry images challenge many computer vision algorithms, e.g., feature detection, motion estimation, or object recognition. Deep convolutional neural networks are state-of-the-art for image deblurring. However, obtaining training data with corresponding sharp and blurry image pairs can be difficult. In this paper, we present a differentiable reblur model for self-supervised motion deblurring, which enables the network to learn from real-world blurry image sequences without relying on sharp images for supervision. Our key insight is that motion cues obtained from consecutive images yield sufficient information to inform the deblurring task. We therefore formulate deblurring as an inverse rendering problem, taking into account the physical image formation process: we first predict two deblurred images from which we estimate the corresponding optical flow. Using these predictions, we re-render the blurred images and minimize the difference with respect to the original blurry inputs. We use both synthetic and real dataset for experimental evaluations. Our experiments demonstrate that self-supervised single image deblurring is really feasible and leads to visually compelling results.

avg

pdf Project Page Blog [BibTex]

pdf Project Page Blog [BibTex]


Learning Unsupervised Hierarchical Part Decomposition of 3D Objects from a Single RGB Image
Learning Unsupervised Hierarchical Part Decomposition of 3D Objects from a Single RGB Image

Paschalidou, D., Gool, L., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
Humans perceive the 3D world as a set of distinct objects that are characterized by various low-level (geometry, reflectance) and high-level (connectivity, adjacency, symmetry) properties. Recent methods based on convolutional neural networks (CNNs) demonstrated impressive progress in 3D reconstruction, even when using a single 2D image as input. However, the majority of these methods focuses on recovering the local 3D geometry of an object without considering its part-based decomposition or relations between parts. We address this challenging problem by proposing a novel formulation that allows to jointly recover the geometry of a 3D object as a set of primitives as well as their latent hierarchical structure without part-level supervision. Our model recovers the higher level structural decomposition of various objects in the form of a binary tree of primitives, where simple parts are represented with fewer primitives and more complex parts are modeled with more components. Our experiments on the ShapeNet and D-FAUST datasets demonstrate that considering the organization of parts indeed facilitates reasoning about 3D geometry.

avg

pdf suppmat Video 2 Project Page Slides Poster Video 1 [BibTex]

pdf suppmat Video 2 Project Page Slides Poster Video 1 [BibTex]


no image
ACTrain: Ein KI-basiertes Aufmerksamkeitstraining für die Wissensarbeit [ACTrain: An AI-based attention training for knowledge work]

Wirzberger, M., Oreshnikov, I., Passy, J., Lado, A., Shenhav, A., Lieder, F.

66th Spring Conference of the German Ergonomics Society, 2020 (conference)

Abstract
Unser digitales Zeitalter lebt von Informationen und stellt unsere begrenzte Verarbeitungskapazität damit täglich auf die Probe. Gerade in der Wissensarbeit haben ständige Ablenkungen erhebliche Leistungseinbußen zur Folge. Unsere intelligente Anwendung ACTrain setzt genau an dieser Stelle an und verwandelt Computertätigkeiten in eine Trainingshalle für den Geist. Feedback auf Basis maschineller Lernverfahren zeigt anschaulich den Wert auf, sich nicht von einer selbst gewählten Aufgabe ablenken zu lassen. Diese metakognitive Einsicht soll zum Durchhalten motivieren und das zugrunde liegende Fertigkeitsniveau der Aufmerksamkeitskontrolle stärken. In laufenden Feldexperimenten untersuchen wir die Frage, ob das Training mit diesem optimalen Feedback die Aufmerksamkeits- und Selbstkontrollfertigkeiten im Vergleich zu einer Kontrollgruppe ohne Feedback verbessern kann.

re sf

link (url) Project Page [BibTex]


Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis
Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis

Liao, Y., Schwarz, K., Mescheder, L., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
In recent years, Generative Adversarial Networks have achieved impressive results in photorealistic image synthesis. This progress nurtures hopes that one day the classical rendering pipeline can be replaced by efficient models that are learned directly from images. However, current image synthesis models operate in the 2D domain where disentangling 3D properties such as camera viewpoint or object pose is challenging. Furthermore, they lack an interpretable and controllable representation. Our key hypothesis is that the image generation process should be modeled in 3D space as the physical world surrounding us is intrinsically three-dimensional. We define the new task of 3D controllable image synthesis and propose an approach for solving it by reasoning both in 3D space and in the 2D image domain. We demonstrate that our model is able to disentangle latent 3D factors of simple multi-object scenes in an unsupervised fashion from raw images. Compared to pure 2D baselines, it allows for synthesizing scenes that are consistent wrt. changes in viewpoint or object pose. We further evaluate various 3D representations in terms of their usefulness for this challenging task.

avg

pdf suppmat Video 2 Project Page Video 1 Slides Poster [BibTex]

pdf suppmat Video 2 Project Page Video 1 Slides Poster [BibTex]


Exploring Data Aggregation in Policy Learning for Vision-based Urban Autonomous Driving
Exploring Data Aggregation in Policy Learning for Vision-based Urban Autonomous Driving

Prakash, A., Behl, A., Ohn-Bar, E., Chitta, K., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
Data aggregation techniques can significantly improve vision-based policy learning within a training environment, e.g., learning to drive in a specific simulation condition. However, as on-policy data is sequentially sampled and added in an iterative manner, the policy can specialize and overfit to the training conditions. For real-world applications, it is useful for the learned policy to generalize to novel scenarios that differ from the training conditions. To improve policy learning while maintaining robustness when training end-to-end driving policies, we perform an extensive analysis of data aggregation techniques in the CARLA environment. We demonstrate how the majority of them have poor generalization performance, and develop a novel approach with empirically better generalization performance compared to existing techniques. Our two key ideas are (1) to sample critical states from the collected on-policy data based on the utility they provide to the learned policy in terms of driving behavior, and (2) to incorporate a replay buffer which progressively focuses on the high uncertainty regions of the policy's state distribution. We evaluate the proposed approach on the CARLA NoCrash benchmark, focusing on the most challenging driving scenarios with dense pedestrian and vehicle traffic. Our approach improves driving success rate by 16% over state-of-the-art, achieving 87% of the expert performance while also reducing the collision rate by an order of magnitude without the use of any additional modality, auxiliary tasks, architectural modifications or reward from the environment.

avg

pdf suppmat Video 2 Project Page Slides Video 1 [BibTex]

pdf suppmat Video 2 Project Page Slides Video 1 [BibTex]


Learning Situational Driving
Learning Situational Driving

Ohn-Bar, E., Prakash, A., Behl, A., Chitta, K., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
Human drivers have a remarkable ability to drive in diverse visual conditions and situations, e.g., from maneuvering in rainy, limited visibility conditions with no lane markings to turning in a busy intersection while yielding to pedestrians. In contrast, we find that state-of-the-art sensorimotor driving models struggle when encountering diverse settings with varying relationships between observation and action. To generalize when making decisions across diverse conditions, humans leverage multiple types of situation-specific reasoning and learning strategies. Motivated by this observation, we develop a framework for learning a situational driving policy that effectively captures reasoning under varying types of scenarios. Our key idea is to learn a mixture model with a set of policies that can capture multiple driving modes. We first optimize the mixture model through behavior cloning, and show it to result in significant gains in terms of driving performance in diverse conditions. We then refine the model by directly optimizing for the driving task itself, i.e., supervised with the navigation task reward. Our method is more scalable than methods assuming access to privileged information, e.g., perception labels, as it only assumes demonstration and reward-based supervision. We achieve over 98% success rate on the CARLA driving benchmark as well as state-of-the-art performance on a newly introduced generalization benchmark.

avg

pdf suppmat Video 2 Project Page Video 1 Slides [BibTex]

pdf suppmat Video 2 Project Page Video 1 Slides [BibTex]


On Joint Estimation of Pose, Geometry and svBRDF from a Handheld Scanner
On Joint Estimation of Pose, Geometry and svBRDF from a Handheld Scanner

Schmitt, C., Donne, S., Riegler, G., Koltun, V., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
We propose a novel formulation for joint recovery of camera pose, object geometry and spatially-varying BRDF. The input to our approach is a sequence of RGB-D images captured by a mobile, hand-held scanner that actively illuminates the scene with point light sources. Compared to previous works that jointly estimate geometry and materials from a hand-held scanner, we formulate this problem using a single objective function that can be minimized using off-the-shelf gradient-based solvers. By integrating material clustering as a differentiable operation into the optimization process, we avoid pre-processing heuristics and demonstrate that our model is able to determine the correct number of specular materials independently. We provide a study on the importance of each component in our formulation and on the requirements of the initial geometry. We show that optimizing over the poses is crucial for accurately recovering fine details and that our approach naturally results in a semantically meaningful material segmentation.

avg

pdf Project Page Slides Video Poster [BibTex]

pdf Project Page Slides Video Poster [BibTex]


Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision
Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision

Niemeyer, M., Mescheder, L., Oechsle, M., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
Learning-based 3D reconstruction methods have shown impressive results. However, most methods require 3D supervision which is often hard to obtain for real-world datasets. Recently, several works have proposed differentiable rendering techniques to train reconstruction models from RGB images. Unfortunately, these approaches are currently restricted to voxel- and mesh-based representations, suffering from discretization or low resolution. In this work, we propose a differentiable rendering formulation for implicit shape and texture representations. Implicit representations have recently gained popularity as they represent shape and texture continuously. Our key insight is that depth gradients can be derived analytically using the concept of implicit differentiation. This allows us to learn implicit shape and texture representations directly from RGB images. We experimentally show that our single-view reconstructions rival those learned with full 3D supervision. Moreover, we find that our method can be used for multi-view 3D reconstruction, directly resulting in watertight meshes.

avg

pdf suppmat Video 2 Project Page Video 1 Video 3 Slides Poster [BibTex]

pdf suppmat Video 2 Project Page Video 1 Video 3 Slides Poster [BibTex]

2015


Exploiting Object Similarity in 3D Reconstruction
Exploiting Object Similarity in 3D Reconstruction

Zhou, C., Güney, F., Wang, Y., Geiger, A.

In International Conference on Computer Vision (ICCV), December 2015 (inproceedings)

Abstract
Despite recent progress, reconstructing outdoor scenes in 3D from movable platforms remains a highly difficult endeavor. Challenges include low frame rates, occlusions, large distortions and difficult lighting conditions. In this paper, we leverage the fact that the larger the reconstructed area, the more likely objects of similar type and shape will occur in the scene. This is particularly true for outdoor scenes where buildings and vehicles often suffer from missing texture or reflections, but share similarity in 3D shape. We take advantage of this shape similarity by locating objects using detectors and jointly reconstructing them while learning a volumetric model of their shape. This allows us to reduce noise while completing missing surfaces as objects of similar shape benefit from all observations for the respective category. We evaluate our approach with respect to LIDAR ground truth on a novel challenging suburban dataset and show its advantages over the state-of-the-art.

avg ps

pdf suppmat [BibTex]

2015


pdf suppmat [BibTex]


FollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory and Computation
FollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory and Computation

Lenz, P., Geiger, A., Urtasun, R.

In International Conference on Computer Vision (ICCV), International Conference on Computer Vision (ICCV), December 2015 (inproceedings)

Abstract
One of the most popular approaches to multi-target tracking is tracking-by-detection. Current min-cost flow algorithms which solve the data association problem optimally have three main drawbacks: they are computationally expensive, they assume that the whole video is given as a batch, and they scale badly in memory and computation with the length of the video sequence. In this paper, we address each of these issues, resulting in a computationally and memory-bounded solution. First, we introduce a dynamic version of the successive shortest-path algorithm which solves the data association problem optimally while reusing computation, resulting in faster inference than standard solvers. Second, we address the optimal solution to the data association problem when dealing with an incoming stream of data (i.e., online setting). Finally, we present our main contribution which is an approximate online solution with bounded memory and computation which is capable of handling videos of arbitrary length while performing tracking in real time. We demonstrate the effectiveness of our algorithms on the KITTI and PETS2009 benchmarks and show state-of-the-art performance, while being significantly faster than existing solvers.

avg ps

pdf suppmat video project [BibTex]

pdf suppmat video project [BibTex]


Towards Probabilistic Volumetric Reconstruction using Ray Potentials
Towards Probabilistic Volumetric Reconstruction using Ray Potentials

(Best Paper Award)

Ulusoy, A. O., Geiger, A., Black, M. J.

In 3D Vision (3DV), 2015 3rd International Conference on, pages: 10-18, Lyon, October 2015 (inproceedings)

Abstract
This paper presents a novel probabilistic foundation for volumetric 3-d reconstruction. We formulate the problem as inference in a Markov random field, which accurately captures the dependencies between the occupancy and appearance of each voxel, given all input images. Our main contribution is an approximate highly parallelized discrete-continuous inference algorithm to compute the marginal distributions of each voxel's occupancy and appearance. In contrast to the MAP solution, marginals encode the underlying uncertainty and ambiguity in the reconstruction. Moreover, the proposed algorithm allows for a Bayes optimal prediction with respect to a natural reconstruction loss. We compare our method to two state-of-the-art volumetric reconstruction algorithms on three challenging aerial datasets with LIDAR ground truth. Our experiments demonstrate that the proposed algorithm compares favorably in terms of reconstruction accuracy and the ability to expose reconstruction uncertainty.

avg ps

code YouTube pdf suppmat DOI Project Page [BibTex]

code YouTube pdf suppmat DOI Project Page [BibTex]


Displets: Resolving Stereo Ambiguities using Object Knowledge
Displets: Resolving Stereo Ambiguities using Object Knowledge

Güney, F., Geiger, A.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) 2015, pages: 4165-4175, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2015 (inproceedings)

Abstract
Stereo techniques have witnessed tremendous progress over the last decades, yet some aspects of the problem still remain challenging today. Striking examples are reflecting and textureless surfaces which cannot easily be recovered using traditional local regularizers. In this paper, we therefore propose to regularize over larger distances using object-category specific disparity proposals (displets) which we sample using inverse graphics techniques based on a sparse disparity estimate and a semantic segmentation of the image. The proposed displets encode the fact that objects of certain categories are not arbitrarily shaped but typically exhibit regular structures. We integrate them as non-local regularizer for the challenging object class 'car' into a superpixel based CRF framework and demonstrate its benefits on the KITTI stereo evaluation.

avg ps

pdf abstract suppmat [BibTex]

pdf abstract suppmat [BibTex]


Object Scene Flow for Autonomous Vehicles
Object Scene Flow for Autonomous Vehicles

Menze, M., Geiger, A.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) 2015, pages: 3061-3070, IEEE, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2015 (inproceedings)

Abstract
This paper proposes a novel model and dataset for 3D scene flow estimation with an application to autonomous driving. Taking advantage of the fact that outdoor scenes often decompose into a small number of independently moving objects, we represent each element in the scene by its rigid motion parameters and each superpixel by a 3D plane as well as an index to the corresponding object. This minimal representation increases robustness and leads to a discrete-continuous CRF where the data term decomposes into pairwise potentials between superpixels and objects. Moreover, our model intrinsically segments the scene into its constituting dynamic components. We demonstrate the performance of our model on existing benchmarks as well as a novel realistic dataset with scene flow ground truth. We obtain this dataset by annotating 400 dynamic scenes from the KITTI raw data collection using detailed 3D CAD models for all vehicles in motion. Our experiments also reveal novel challenges which can't be handled by existing methods.

avg ps

pdf abstract suppmat DOI [BibTex]

pdf abstract suppmat DOI [BibTex]


Joint 3D Object and Layout Inference from a single RGB-D Image
Joint 3D Object and Layout Inference from a single RGB-D Image

(Best Paper Award)

Geiger, A., Wang, C.

In German Conference on Pattern Recognition (GCPR), 9358, pages: 183-195, Lecture Notes in Computer Science, Springer International Publishing, 2015 (inproceedings)

Abstract
Inferring 3D objects and the layout of indoor scenes from a single RGB-D image captured with a Kinect camera is a challenging task. Towards this goal, we propose a high-order graphical model and jointly reason about the layout, objects and superpixels in the image. In contrast to existing holistic approaches, our model leverages detailed 3D geometry using inverse graphics and explicitly enforces occlusion and visibility constraints for respecting scene properties and projective geometry. We cast the task as MAP inference in a factor graph and solve it efficiently using message passing. We evaluate our method with respect to several baselines on the challenging NYUv2 indoor dataset using 21 object categories. Our experiments demonstrate that the proposed method is able to infer scenes with a large degree of clutter and occlusions.

avg ps

pdf suppmat video project DOI [BibTex]

pdf suppmat video project DOI [BibTex]


Discrete Optimization for Optical Flow
Discrete Optimization for Optical Flow

Menze, M., Heipke, C., Geiger, A.

In German Conference on Pattern Recognition (GCPR), 9358, pages: 16-28, Springer International Publishing, 2015 (inproceedings)

Abstract
We propose to look at large-displacement optical flow from a discrete point of view. Motivated by the observation that sub-pixel accuracy is easily obtained given pixel-accurate optical flow, we conjecture that computing the integral part is the hardest piece of the problem. Consequently, we formulate optical flow estimation as a discrete inference problem in a conditional random field, followed by sub-pixel refinement. Naive discretization of the 2D flow space, however, is intractable due to the resulting size of the label set. In this paper, we therefore investigate three different strategies, each able to reduce computation and memory demands by several orders of magnitude. Their combination allows us to estimate large-displacement optical flow both accurately and efficiently and demonstrates the potential of discrete optimization for optical flow. We obtain state-of-the-art performance on MPI Sintel and KITTI.

avg ps

pdf suppmat project DOI [BibTex]

pdf suppmat project DOI [BibTex]


Joint 3D Estimation of Vehicles and Scene Flow
Joint 3D Estimation of Vehicles and Scene Flow

Menze, M., Heipke, C., Geiger, A.

In Proc. of the ISPRS Workshop on Image Sequence Analysis (ISA), 2015 (inproceedings)

Abstract
Three-dimensional reconstruction of dynamic scenes is an important prerequisite for applications like mobile robotics or autonomous driving. While much progress has been made in recent years, imaging conditions in natural outdoor environments are still very challenging for current reconstruction and recognition methods. In this paper, we propose a novel unified approach which reasons jointly about 3D scene flow as well as the pose, shape and motion of vehicles in the scene. Towards this goal, we incorporate a deformable CAD model into a slanted-plane conditional random field for scene flow estimation and enforce shape consistency between the rendered 3D models and the parameters of all superpixels in the image. The association of superpixels to objects is established by an index variable which implicitly enables model selection. We evaluate our approach on the challenging KITTI scene flow dataset in terms of object and scene flow estimation. Our results provide a prove of concept and demonstrate the usefulness of our method.

avg ps

PDF [BibTex]

PDF [BibTex]