Header logo is


2018


no image
Reducing 3D Vibrations to 1D in Real Time

Park, G., Kuchenbecker, K. J.

Hands-on demonstration (4 pages) presented at AsiaHaptics, Incheon, South Korea, November 2018 (misc)

Abstract
For simple and realistic vibrotactile feedback, 3D accelerations from real contact interactions are usually rendered using a single-axis vibration actuator; this dimensional reduction can be performed in many ways. This demonstration implements a real-time conversion system that simultaneously measures 3D accelerations and renders corresponding 1D vibrations using a two-pen interface. In the demonstration, a user freely interacts with various objects using an In-Pen that contains a 3-axis accelerometer. The captured accelerations are converted to a single-axis signal, and an Out-Pen renders the reduced signal for the user to feel. We prepared seven conversion methods from the simple use of a single-axis signal to applying principal component analysis (PCA) so that users can compare the performance of each conversion method in this demonstration.

hi

Project Page [BibTex]

2018


Project Page [BibTex]


A Large-Scale Fabric-Based Tactile Sensor Using Electrical Resistance Tomography
A Large-Scale Fabric-Based Tactile Sensor Using Electrical Resistance Tomography

Lee, H., Park, K., Kim, J., Kuchenbecker, K. J.

Hands-on demonstration (3 pages) presented at AsiaHaptics, Incheon, South Korea, November 2018 (misc)

Abstract
Large-scale tactile sensing is important for household robots and human-robot interaction because contacts can occur all over a robot’s body surface. This paper presents a new fabric-based tactile sensor that is straightforward to manufacture and can cover a large area. The tactile sensor is made of conductive and non-conductive fabric layers, and the electrodes are stitched with conductive thread, so the resulting device is flexible and stretchable. The sensor utilizes internal array electrodes and a reconstruction method called electrical resistance tomography (ERT) to achieve a high spatial resolution with a small number of electrodes. The developed sensor shows that only 16 electrodes can accurately estimate single and multiple contacts over a square that measures 20 cm by 20 cm.

hi

Project Page [BibTex]

Project Page [BibTex]


Gait learning for soft microrobots controlled by light fields
Gait learning for soft microrobots controlled by light fields

Rohr, A. V., Trimpe, S., Marco, A., Fischer, P., Palagi, S.

In International Conference on Intelligent Robots and Systems (IROS) 2018, pages: 6199-6206, International Conference on Intelligent Robots and Systems 2018, October 2018 (inproceedings)

Abstract
Soft microrobots based on photoresponsive materials and controlled by light fields can generate a variety of different gaits. This inherent flexibility can be exploited to maximize their locomotion performance in a given environment and used to adapt them to changing environments. However, because of the lack of accurate locomotion models, and given the intrinsic variability among microrobots, analytical control design is not possible. Common data-driven approaches, on the other hand, require running prohibitive numbers of experiments and lead to very sample-specific results. Here we propose a probabilistic learning approach for light-controlled soft microrobots based on Bayesian Optimization (BO) and Gaussian Processes (GPs). The proposed approach results in a learning scheme that is highly data-efficient, enabling gait optimization with a limited experimental budget, and robust against differences among microrobot samples. These features are obtained by designing the learning scheme through the comparison of different GP priors and BO settings on a semisynthetic data set. The developed learning scheme is validated in microrobot experiments, resulting in a 115% improvement in a microrobot’s locomotion performance with an experimental budget of only 20 tests. These encouraging results lead the way toward self-adaptive microrobotic systems based on lightcontrolled soft microrobots and probabilistic learning control.

ics pf

arXiv IEEE Xplore DOI Project Page [BibTex]

arXiv IEEE Xplore DOI Project Page [BibTex]


On the Integration of Optical Flow and Action Recognition
On the Integration of Optical Flow and Action Recognition

Sevilla-Lara, L., Liao, Y., Güney, F., Jampani, V., Geiger, A., Black, M. J.

In German Conference on Pattern Recognition (GCPR), LNCS 11269, pages: 281-297, Springer, Cham, October 2018 (inproceedings)

Abstract
Most of the top performing action recognition methods use optical flow as a "black box" input. Here we take a deeper look at the combination of flow and action recognition, and investigate why optical flow is helpful, what makes a flow method good for action recognition, and how we can make it better. In particular, we investigate the impact of different flow algorithms and input transformations to better understand how these affect a state-of-the-art action recognition method. Furthermore, we fine tune two neural-network flow methods end-to-end on the most widely used action recognition dataset (UCF101). Based on these experiments, we make the following five observations: 1) optical flow is useful for action recognition because it is invariant to appearance, 2) optical flow methods are optimized to minimize end-point-error (EPE), but the EPE of current methods is not well correlated with action recognition performance, 3) for the flow methods tested, accuracy at boundaries and at small displacements is most correlated with action recognition performance, 4) training optical flow to minimize classification error instead of minimizing EPE improves recognition performance, and 5) optical flow learned for the task of action recognition differs from traditional optical flow especially inside the human body and at the boundary of the body. These observations may encourage optical flow researchers to look beyond EPE as a goal and guide action recognition researchers to seek better motion cues, leading to a tighter integration of the optical flow and action recognition communities.

avg ps

arXiv DOI [BibTex]

arXiv DOI [BibTex]


Towards Robust Visual Odometry with a Multi-Camera System
Towards Robust Visual Odometry with a Multi-Camera System

Liu, P., Geppert, M., Heng, L., Sattler, T., Geiger, A., Pollefeys, M.

In International Conference on Intelligent Robots and Systems (IROS) 2018, International Conference on Intelligent Robots and Systems, October 2018 (inproceedings)

Abstract
We present a visual odometry (VO) algorithm for a multi-camera system and robust operation in challenging environments. Our algorithm consists of a pose tracker and a local mapper. The tracker estimates the current pose by minimizing photometric errors between the most recent keyframe and the current frame. The mapper initializes the depths of all sampled feature points using plane-sweeping stereo. To reduce pose drift, a sliding window optimizer is used to refine poses and structure jointly. Our formulation is flexible enough to support an arbitrary number of stereo cameras. We evaluate our algorithm thoroughly on five datasets. The datasets were captured in different conditions: daytime, night-time with near-infrared (NIR) illumination and night-time without NIR illumination. Experimental results show that a multi-camera setup makes the VO more robust to challenging environments, especially night-time conditions, in which a single stereo configuration fails easily due to the lack of features.

avg

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Learning Priors for Semantic 3D Reconstruction
Learning Priors for Semantic 3D Reconstruction

Cherabier, I., Schönberger, J., Oswald, M., Pollefeys, M., Geiger, A.

In Computer Vision – ECCV 2018, Springer International Publishing, Cham, September 2018 (inproceedings)

Abstract
We present a novel semantic 3D reconstruction framework which embeds variational regularization into a neural network. Our network performs a fixed number of unrolled multi-scale optimization iterations with shared interaction weights. In contrast to existing variational methods for semantic 3D reconstruction, our model is end-to-end trainable and captures more complex dependencies between the semantic labels and the 3D geometry. Compared to previous learning-based approaches to 3D reconstruction, we integrate powerful long-range dependencies using variational coarse-to-fine optimization. As a result, our network architecture requires only a moderate number of parameters while keeping a high level of expressiveness which enables learning from very little data. Experiments on real and synthetic datasets demonstrate that our network achieves higher accuracy compared to a purely variational approach while at the same time requiring two orders of magnitude less iterations to converge. Moreover, our approach handles ten times more semantic class labels using the same computational resources.

avg

pdf suppmat Project Page Video DOI Project Page [BibTex]

pdf suppmat Project Page Video DOI Project Page [BibTex]


Unsupervised Learning of Multi-Frame Optical Flow with Occlusions
Unsupervised Learning of Multi-Frame Optical Flow with Occlusions

Janai, J., Güney, F., Ranjan, A., Black, M. J., Geiger, A.

In European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, vol 11220, pages: 713-731, Springer, Cham, September 2018 (inproceedings)

avg ps

pdf suppmat Video Project Page DOI Project Page [BibTex]

pdf suppmat Video Project Page DOI Project Page [BibTex]


SphereNet: Learning Spherical Representations for Detection and Classification in Omnidirectional Images
SphereNet: Learning Spherical Representations for Detection and Classification in Omnidirectional Images

Coors, B., Condurache, A. P., Geiger, A.

European Conference on Computer Vision (ECCV), September 2018 (conference)

Abstract
Omnidirectional cameras offer great benefits over classical cameras wherever a wide field of view is essential, such as in virtual reality applications or in autonomous robots. Unfortunately, standard convolutional neural networks are not well suited for this scenario as the natural projection surface is a sphere which cannot be unwrapped to a plane without introducing significant distortions, particularly in the polar regions. In this work, we present SphereNet, a novel deep learning framework which encodes invariance against such distortions explicitly into convolutional neural networks. Towards this goal, SphereNet adapts the sampling locations of the convolutional filters, effectively reversing distortions, and wraps the filters around the sphere. By building on regular convolutions, SphereNet enables the transfer of existing perspective convolutional neural network models to the omnidirectional case. We demonstrate the effectiveness of our method on the tasks of image classification and object detection, exploiting two newly created semi-synthetic and real-world omnidirectional datasets.

avg

pdf suppmat Project Page [BibTex]


Statistical Modelling of Fingertip Deformations and Contact Forces during Tactile Interaction
Statistical Modelling of Fingertip Deformations and Contact Forces during Tactile Interaction

Gueorguiev, D., Tzionas, D., Pacchierotti, C., Black, M. J., Kuchenbecker, K. J.

Extended abstract presented at the Hand, Brain and Technology conference (HBT), Ascona, Switzerland, August 2018 (misc)

Abstract
Little is known about the shape and properties of the human finger during haptic interaction, even though these are essential parameters for controlling wearable finger devices and deliver realistic tactile feedback. This study explores a framework for four-dimensional scanning (3D over time) and modelling of finger-surface interactions, aiming to capture the motion and deformations of the entire finger with high resolution while simultaneously recording the interfacial forces at the contact. Preliminary results show that when the fingertip is actively pressing a rigid surface, it undergoes lateral expansion and proximal/distal bending, deformations that cannot be captured by imaging of the contact area alone. Therefore, we are currently capturing a dataset that will enable us to create a statistical model of the finger’s deformations and predict the contact forces induced by tactile interaction with objects. This technique could improve current methods for tactile rendering in wearable haptic devices, which rely on general physical modelling of the skin’s compliance, by developing an accurate model of the variations in finger properties across the human population. The availability of such a model will also enable a more realistic simulation of virtual finger behaviour in virtual reality (VR) environments, as well as the ability to accurately model a specific user’s finger from lower resolution data. It may also be relevant for inferring the physical properties of the underlying tissue from observing the surface mesh deformations, as previously shown for body tissues.

hi

Project Page [BibTex]

Project Page [BibTex]


A machine from machines
A machine from machines

Fischer, P.

Nature Physics, 14, pages: 1072–1073, July 2018 (misc)

Abstract
Building spinning microrotors that self-assemble and synchronize to form a gear sounds like an impossible feat. However, it has now been achieved using only a single type of building block -- a colloid that self-propels.

pf

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Reducing 3D Vibrations to 1D in Real Time

Park, G., Kuchenbecker, K. J.

Hands-on demonstration presented at EuroHaptics, Pisa, Italy, June 2018 (misc)

Abstract
In this demonstration, you will hold two pen-shaped modules: an in-pen and an out-pen. The in-pen is instrumented with a high-bandwidth three-axis accelerometer, and the out-pen contains a one-axis voice coil actuator. Use the in-pen to interact with different surfaces; the measured 3D accelerations are continually converted into 1D vibrations and rendered with the out-pen for you to feel. You can test conversion methods that range from simply selecting a single axis to applying a discrete Fourier transform or principal component analysis for realistic and brisk real-time conversion.

hi

Project Page [BibTex]

Project Page [BibTex]


no image
Haptipedia: Exploring Haptic Device Design Through Interactive Visualizations

Seifi, H., Fazlollahi, F., Park, G., Kuchenbecker, K. J., MacLean, K. E.

Hands-on demonstration presented at EuroHaptics, Pisa, Italy, June 2018 (misc)

Abstract
How many haptic devices have been proposed in the last 30 years? How can we leverage this rich source of design knowledge to inspire future innovations? Our goal is to make historical haptic invention accessible through interactive visualization of a comprehensive library – a Haptipedia – of devices that have been annotated with designer-relevant metadata. In this demonstration, participants can explore Haptipedia’s growing library of grounded force feedback devices through several prototype visualizations, interact with 3D simulations of the device mechanisms and movements, and tell us about the attributes and devices that could make Haptipedia a useful resource for the haptic design community.

hi

Project Page [BibTex]

Project Page [BibTex]


no image
Delivering 6-DOF Fingertip Tactile Cues

Young, E., Kuchenbecker, K. J.

Work-in-progress paper (5 pages) presented at EuroHaptics, Pisa, Italy, June 2018 (misc)

hi

Project Page [BibTex]

Project Page [BibTex]


Designing a Haptic Empathetic Robot Animal for Children with Autism
Designing a Haptic Empathetic Robot Animal for Children with Autism

Burns, R., Kuchenbecker, K. J.

Workshop paper (4 pages) presented at the Robotics: Science and Systems Workshop on Robot-Mediated Autism Intervention: Hardware, Software and Curriculum, Pittsburgh, USA, June 2018 (misc)

Abstract
Children with autism often endure sensory overload, may be nonverbal, and have difficulty understanding and relaying emotions. These experiences result in heightened stress during social interaction. Animal-assisted intervention has been found to improve the behavior of children with autism during social interaction, but live animal companions are not always feasible. We are thus in the process of designing a robotic animal to mimic some successful characteristics of animal-assisted intervention while trying to improve on others. The over-arching hypothesis of this research is that an appropriately designed robot animal can reduce stress in children with autism and empower them to engage in social interaction.

hi

link (url) Project Page [BibTex]

link (url) Project Page [BibTex]


no image
Soft Multi-Axis Boundary-Electrode Tactile Sensors for Whole-Body Robotic Skin

Lee, H., Kim, J., Kuchenbecker, K. J.

Workshop paper (2 pages) presented at the RSS Pioneers Workshop, Pittsburgh, USA, June 2018 (misc)

hi

Project Page [BibTex]

Project Page [BibTex]


Soft Miniaturized Linear Actuators Wirelessly Powered by Rotating Permanent Magnets
Soft Miniaturized Linear Actuators Wirelessly Powered by Rotating Permanent Magnets

Qiu, T., Palagi, S., Sachs, J., Fischer, P.

In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages: 3595-3600, May 2018 (inproceedings)

Abstract
Wireless actuation by magnetic fields allows for the operation of untethered miniaturized devices, e.g. in biomedical applications. Nevertheless, generating large controlled forces over relatively large distances is challenging. Magnetic torques are easier to generate and control, but they are not always suitable for the tasks at hand. Moreover, strong magnetic fields are required to generate a sufficient torque, which are difficult to achieve with electromagnets. Here, we demonstrate a soft miniaturized actuator that transforms an externally applied magnetic torque into a controlled linear force. We report the design, fabrication and characterization of both the actuator and the magnetic field generator. We show that the magnet assembly, which is based on a set of rotating permanent magnets, can generate strong controlled oscillating fields over a relatively large workspace. The actuator, which is 3D-printed, can lift a load of more than 40 times its weight. Finally, we show that the actuator can be further miniaturized, paving the way towards strong, wirelessly powered microactuators.

pf

link (url) DOI [BibTex]

link (url) DOI [BibTex]


Robust Dense Mapping for Large-Scale Dynamic Environments
Robust Dense Mapping for Large-Scale Dynamic Environments

Barsan, I. A., Liu, P., Pollefeys, M., Geiger, A.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2018, IEEE, International Conference on Robotics and Automation, May 2018 (inproceedings)

Abstract
We present a stereo-based dense mapping algorithm for large-scale dynamic urban environments. In contrast to other existing methods, we simultaneously reconstruct the static background, the moving objects, and the potentially moving but currently stationary objects separately, which is desirable for high-level mobile robotic tasks such as path planning in crowded environments. We use both instance-aware semantic segmentation and sparse scene flow to classify objects as either background, moving, or potentially moving, thereby ensuring that the system is able to model objects with the potential to transition from static to dynamic, such as parked cars. Given camera poses estimated from visual odometry, both the background and the (potentially) moving objects are reconstructed separately by fusing the depth maps computed from the stereo input. In addition to visual odometry, sparse scene flow is also used to estimate the 3D motions of the detected moving objects, in order to reconstruct them accurately. A map pruning technique is further developed to improve reconstruction accuracy and reduce memory consumption, leading to increased scalability. We evaluate our system thoroughly on the well-known KITTI dataset. Our system is capable of running on a PC at approximately 2.5Hz, with the primary bottleneck being the instance-aware semantic segmentation, which is a limitation we hope to address in future work.

avg

pdf Video Project Page Project Page [BibTex]

pdf Video Project Page Project Page [BibTex]


no image
Arm-Worn Tactile Displays

Kuchenbecker, K. J.

Cross-Cutting Challenge Interactive Discussion presented at the IEEE Haptics Symposium, San Francisco, USA, March 2018 (misc)

Abstract
Fingertips and hands captivate the attention of most haptic interface designers, but humans can feel touch stimuli across the entire body surface. Trying to create devices that both can be worn and can deliver good haptic sensations raises challenges that rarely arise in other contexts. Most notably, tactile cues such as vibration, tapping, and squeezing are far simpler to implement in wearable systems than kinesthetic haptic feedback. This interactive discussion will present a variety of relevant projects to which I have contributed, attempting to pull out common themes and ideas for the future.

hi

[BibTex]

[BibTex]


Haptipedia: An Expert-Sourced Interactive Device Visualization for Haptic Designers
Haptipedia: An Expert-Sourced Interactive Device Visualization for Haptic Designers

Seifi, H., MacLean, K. E., Kuchenbecker, K. J., Park, G.

Work-in-progress paper (3 pages) presented at the IEEE Haptics Symposium, San Francisco, USA, March 2018 (misc)

Abstract
Much of three decades of haptic device invention is effectively lost to today’s designers: dispersion across time, region, and discipline imposes an incalculable drag on innovation in this field. Our goal is to make historical haptic invention accessible through interactive navigation of a comprehensive library – a Haptipedia – of devices that have been annotated with designer-relevant metadata. To build this open resource, we will systematically mine the literature and engage the haptics community for expert annotation. In a multi-year broad-based initiative, we will empirically derive salient attributes of haptic devices, design an interactive visualization tool where device creators and repurposers can efficiently explore and search Haptipedia, and establish methods and tools to manually and algorithmically collect data from the haptics literature and our community of experts. This paper outlines progress in compiling an initial corpus of grounded force-feedback devices and their attributes, and it presents a concept sketch of the interface we envision.

hi

Project Page [BibTex]

Project Page [BibTex]


no image
Exercising with Baxter: Design and Evaluation of Assistive Social-Physical Human-Robot Interaction

Fitter, N. T., Mohan, M., Kuchenbecker, K. J., Johnson, M. J.

Workshop paper (6 pages) presented at the HRI Workshop on Personal Robots for Exercising and Coaching, Chicago, USA, March 2018 (misc)

Abstract
The worldwide population of older adults is steadily increasing and will soon exceed the capacity of assisted living facilities. Accordingly, we aim to understand whether appropriately designed robots could help older adults stay active and engaged while living at home. We developed eight human-robot exercise games for the Baxter Research Robot with the guidance of experts in game design, therapy, and rehabilitation. After extensive iteration, these games were employed in a user study that tested their viability with 20 younger and 20 older adult users. All participants were willing to enter Baxter’s workspace and physically interact with the robot. User trust and confidence in Baxter increased significantly between pre- and post-experiment assessments, and one individual from the target user population supplied us with abundant positive feedback about her experience. The preliminary results presented in this paper indicate potential for the use of two-armed human-scale robots for social-physical exercise interaction.

hi

link (url) Project Page [BibTex]

link (url) Project Page [BibTex]


Emotionally Supporting Humans Through Robot Hugs
Emotionally Supporting Humans Through Robot Hugs

Block, A. E., Kuchenbecker, K. J.

Workshop paper (2 pages) presented at the HRI Pioneers Workshop, Chicago, USA, March 2018 (misc)

Abstract
Hugs are one of the first forms of contact and affection humans experience. Due to their prevalence and health benefits, we want to enable robots to safely hug humans. This research strives to create and study a high fidelity robotic system that provides emotional support to people through hugs. This paper outlines our previous work evaluating human responses to a prototype’s physical and behavioral characteristics, and then it lays out our ongoing and future work.

hi

link (url) DOI Project Page [BibTex]

link (url) DOI Project Page [BibTex]


Towards a Statistical Model of Fingertip Contact Deformations from 4{D} Data
Towards a Statistical Model of Fingertip Contact Deformations from 4D Data

Gueorguiev, D., Tzionas, D., Pacchierotti, C., Black, M. J., Kuchenbecker, K. J.

Work-in-progress paper (3 pages) presented at the IEEE Haptics Symposium, San Francisco, USA, March 2018 (misc)

Abstract
Little is known about the shape and properties of the human finger during haptic interaction even though this knowledge is essential to control wearable finger devices and deliver realistic tactile feedback. This study explores a framework for four-dimensional scanning and modeling of finger-surface interactions, aiming to capture the motion and deformations of the entire finger with high resolution. The results show that when the fingertip is actively pressing a rigid surface, it undergoes lateral expansion of about 0.2 cm and proximal/distal bending of about 30◦, deformations that cannot be captured by imaging of the contact area alone. This project constitutes a first step towards an accurate statistical model of the finger’s behavior during haptic interaction.

hi

link (url) Project Page [BibTex]

link (url) Project Page [BibTex]


no image
Can Humans Infer Haptic Surface Properties from Images?

Burka, A., Kuchenbecker, K. J.

Work-in-progress paper (3 pages) presented at the IEEE Haptics Symposium, San Francisco, USA, March 2018 (misc)

Abstract
Human children typically experience their surroundings both visually and haptically, providing ample opportunities to learn rich cross-sensory associations. To thrive in human environments and interact with the real world, robots also need to build models of these cross-sensory associations; current advances in machine learning should make it possible to infer models from large amounts of data. We previously built a visuo-haptic sensing device, the Proton Pack, and are using it to collect a large database of matched multimodal data from tool-surface interactions. As a benchmark to compare with machine learning performance, we conducted a human subject study (n = 84) on estimating haptic surface properties (here: hardness, roughness, friction, and warmness) from images. Using a 100-surface subset of our database, we showed images to study participants and collected 5635 ratings of the four haptic properties, which we compared with ratings made by the Proton Pack operator and with physical data recorded using motion, force, and vibration sensors. Preliminary results indicate weak correlation between participant and operator ratings, but potential for matching up certain human ratings (particularly hardness and roughness) with features from the literature.

hi

Project Page [BibTex]

Project Page [BibTex]


RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials
RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials

Paschalidou, D., Ulusoy, A. O., Schmitt, C., Gool, L., Geiger, A.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
In this paper, we consider the problem of reconstructing a dense 3D model using images captured from different views. Recent methods based on convolutional neural networks (CNN) allow learning the entire task from data. However, they do not incorporate the physics of image formation such as perspective geometry and occlusion. Instead, classical approaches based on Markov Random Fields (MRF) with ray-potentials explicitly model these physical processes, but they cannot cope with large surface appearance variations across different viewpoints. In this paper, we propose RayNet, which combines the strengths of both frameworks. RayNet integrates a CNN that learns view-invariant feature representations with an MRF that explicitly encodes the physics of perspective projection and occlusion. We train RayNet end-to-end using empirical risk minimization. We thoroughly evaluate our approach on challenging real-world datasets and demonstrate its benefits over a piece-wise trained baseline, hand-crafted models as well as other learning-based approaches.

avg

pdf suppmat Video Project Page code Poster Project Page [BibTex]

pdf suppmat Video Project Page code Poster Project Page [BibTex]


Deep Marching Cubes: Learning Explicit Surface Representations
Deep Marching Cubes: Learning Explicit Surface Representations

Liao, Y., Donne, S., Geiger, A.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
Existing learning based solutions to 3D surface prediction cannot be trained end-to-end as they operate on intermediate representations (eg, TSDF) from which 3D surface meshes must be extracted in a post-processing step (eg, via the marching cubes algorithm). In this paper, we investigate the problem of end-to-end 3D surface prediction. We first demonstrate that the marching cubes algorithm is not differentiable and propose an alternative differentiable formulation which we insert as a final layer into a 3D convolutional neural network. We further propose a set of loss functions which allow for training our model with sparse point supervision. Our experiments demonstrate that the model allows for predicting sub-voxel accurate 3D shapes of arbitrary topology. Additionally, it learns to complete shapes and to separate an object's inside from its outside even in the presence of sparse and incomplete ground truth. We investigate the benefits of our approach on the task of inferring shapes from 3D point clouds. Our model is flexible and can be combined with a variety of shape encoder and shape inference techniques.

avg

pdf suppmat Video Project Page Poster Project Page [BibTex]

pdf suppmat Video Project Page Poster Project Page [BibTex]


Semantic Visual Localization
Semantic Visual Localization

Schönberger, J., Pollefeys, M., Geiger, A., Sattler, T.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
Robust visual localization under a wide range of viewing conditions is a fundamental problem in computer vision. Handling the difficult cases of this problem is not only very challenging but also of high practical relevance, eg, in the context of life-long localization for augmented reality or autonomous robots. In this paper, we propose a novel approach based on a joint 3D geometric and semantic understanding of the world, enabling it to succeed under conditions where previous approaches failed. Our method leverages a novel generative model for descriptor learning, trained on semantic scene completion as an auxiliary task. The resulting 3D descriptors are robust to missing observations by encoding high-level 3D geometric and semantic information. Experiments on several challenging large-scale localization datasets demonstrate reliable localization under extreme viewpoint, illumination, and geometry changes.

avg

pdf suppmat Poster Project Page [BibTex]

pdf suppmat Poster Project Page [BibTex]


Which Training Methods for GANs do actually Converge?
Which Training Methods for GANs do actually Converge?

Mescheder, L., Geiger, A., Nowozin, S.

International Conference on Machine learning (ICML), 2018 (conference)

Abstract
Recent work has shown local convergence of GAN training for absolutely continuous data and generator distributions. In this paper, we show that the requirement of absolute continuity is necessary: we describe a simple yet prototypical counterexample showing that in the more realistic case of distributions that are not absolutely continuous, unregularized GAN training is not always convergent. Furthermore, we discuss regularization strategies that were recently proposed to stabilize GAN training. Our analysis shows that GAN training with instance noise or zero-centered gradient penalties converges. On the other hand, we show that Wasserstein-GANs and WGAN-GP with a finite number of discriminator updates per generator update do not always converge to the equilibrium point. We discuss these results, leading us to a new explanation for the stability problems of GAN training. Based on our analysis, we extend our convergence results to more general GANs and prove local convergence for simplified gradient penalties even if the generator and data distributions lie on lower dimensional manifolds. We find these penalties to work well in practice and use them to learn high-resolution generative image models for a variety of datasets with little hyperparameter tuning.

avg

code video paper supplement slides poster Project Page [BibTex]


Learning 3D Shape Completion from Laser Scan Data with Weak Supervision
Learning 3D Shape Completion from Laser Scan Data with Weak Supervision

Stutz, D., Geiger, A.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
3D shape completion from partial point clouds is a fundamental problem in computer vision and computer graphics. Recent approaches can be characterized as either data-driven or learning-based. Data-driven approaches rely on a shape model whose parameters are optimized to fit the observations. Learning-based approaches, in contrast, avoid the expensive optimization step and instead directly predict the complete shape from the incomplete observations using deep neural networks. However, full supervision is required which is often not available in practice. In this work, we propose a weakly-supervised learning-based approach to 3D shape completion which neither requires slow optimization nor direct supervision. While we also learn a shape prior on synthetic data, we amortize, ie, learn, maximum likelihood fitting using deep neural networks resulting in efficient shape completion without sacrificing accuracy. Tackling 3D shape completion of cars on ShapeNet and KITTI, we demonstrate that the proposed amortized maximum likelihood approach is able to compete with a fully supervised baseline and a state-of-the-art data-driven approach while being significantly faster. On ModelNet, we additionally show that the approach is able to generalize to other object categories as well.

avg

pdf suppmat Project Page Poster Project Page [BibTex]

pdf suppmat Project Page Poster Project Page [BibTex]


Learning Transformation Invariant Representations with Weak Supervision
Learning Transformation Invariant Representations with Weak Supervision

Coors, B., Condurache, A., Mertins, A., Geiger, A.

In International Conference on Computer Vision Theory and Applications, International Conference on Computer Vision Theory and Applications, 2018 (inproceedings)

Abstract
Deep convolutional neural networks are the current state-of-the-art solution to many computer vision tasks. However, their ability to handle large global and local image transformations is limited. Consequently, extensive data augmentation is often utilized to incorporate prior knowledge about desired invariances to geometric transformations such as rotations or scale changes. In this work, we combine data augmentation with an unsupervised loss which enforces similarity between the predictions of augmented copies of an input sample. Our loss acts as an effective regularizer which facilitates the learning of transformation invariant representations. We investigate the effectiveness of the proposed similarity loss on rotated MNIST and the German Traffic Sign Recognition Benchmark (GTSRB) in the context of different classification models including ladder networks. Our experiments demonstrate improvements with respect to the standard data augmentation approach for supervised and semi-supervised learning tasks, in particular in the presence of little annotated data. In addition, we analyze the performance of the proposed approach with respect to its hyperparameters, including the strength of the regularization as well as the layer where representation similarity is enforced.

avg

pdf [BibTex]

pdf [BibTex]