Header logo is


2017


On the Design of {LQR} Kernels for Efficient Controller Learning
On the Design of LQR Kernels for Efficient Controller Learning

Marco, A., Hennig, P., Schaal, S., Trimpe, S.

Proceedings of the 56th IEEE Annual Conference on Decision and Control (CDC), pages: 5193-5200, IEEE, IEEE Conference on Decision and Control, December 2017 (conference)

Abstract
Finding optimal feedback controllers for nonlinear dynamic systems from data is hard. Recently, Bayesian optimization (BO) has been proposed as a powerful framework for direct controller tuning from experimental trials. For selecting the next query point and finding the global optimum, BO relies on a probabilistic description of the latent objective function, typically a Gaussian process (GP). As is shown herein, GPs with a common kernel choice can, however, lead to poor learning outcomes on standard quadratic control problems. For a first-order system, we construct two kernels that specifically leverage the structure of the well-known Linear Quadratic Regulator (LQR), yet retain the flexibility of Bayesian nonparametric learning. Simulations of uncertain linear and nonlinear systems demonstrate that the LQR kernels yield superior learning performance.

am ics pn

arXiv PDF On the Design of LQR Kernels for Efficient Controller Learning - CDC presentation DOI Project Page [BibTex]

2017


arXiv PDF On the Design of LQR Kernels for Efficient Controller Learning - CDC presentation DOI Project Page [BibTex]


no image
Optimal gamification can help people procrastinate less

Lieder, F., Griffiths, T. L.

Annual Meeting of the Society for Judgment and Decision Making, Annual Meeting of the Society for Judgment and Decision Making, November 2017 (conference)

re

Project Page [BibTex]

Project Page [BibTex]


A Generative Model of People in Clothing
A Generative Model of People in Clothing

Lassner, C., Pons-Moll, G., Gehler, P. V.

In Proceedings IEEE International Conference on Computer Vision (ICCV), IEEE, Piscataway, NJ, USA, IEEE International Conference on Computer Vision (ICCV), October 2017 (inproceedings)

Abstract
We present the first image-based generative model of people in clothing in a full-body setting. We sidestep the commonly used complex graphics rendering pipeline and the need for high-quality 3D scans of dressed people. Instead, we learn generative models from a large image database. The main challenge is to cope with the high variance in human pose, shape and appearance. For this reason, pure image-based approaches have not been considered so far. We show that this challenge can be overcome by splitting the generating process in two parts. First, we learn to generate a semantic segmentation of the body and clothing. Second, we learn a conditional model on the resulting segments that creates realistic images. The full model is differentiable and can be conditioned on pose, shape or color. The result are samples of people in different clothing items and styles. The proposed model can generate entirely new people with realistic clothing. In several experiments we present encouraging results that suggest an entirely data-driven approach to people generation is possible.

ps

link (url) Project Page [BibTex]

link (url) Project Page [BibTex]


Semantic Video CNNs through Representation Warping
Semantic Video CNNs through Representation Warping

Gadde, R., Jampani, V., Gehler, P. V.

In Proceedings IEEE International Conference on Computer Vision (ICCV), IEEE, Piscataway, NJ, USA, IEEE International Conference on Computer Vision (ICCV), October 2017 (inproceedings) Accepted

Abstract
In this work, we propose a technique to convert CNN models for semantic segmentation of static images into CNNs for video data. We describe a warping method that can be used to augment existing architectures with very lit- tle extra computational cost. This module is called Net- Warp and we demonstrate its use for a range of network architectures. The main design principle is to use optical flow of adjacent frames for warping internal network repre- sentations across time. A key insight of this work is that fast optical flow methods can be combined with many different CNN architectures for improved performance and end-to- end training. Experiments validate that the proposed ap- proach incurs only little extra computational cost, while im- proving performance, when video streams are available. We achieve new state-of-the-art results on the standard CamVid and Cityscapes benchmark datasets and show reliable im- provements over different baseline networks. Our code and models are available at http://segmentation.is. tue.mpg.de

ps

pdf Supplementary Project Page [BibTex]

pdf Supplementary Project Page [BibTex]


A simple yet effective baseline for 3d human pose estimation
A simple yet effective baseline for 3d human pose estimation

Martinez, J., Hossain, R., Romero, J., Little, J. J.

In Proceedings IEEE International Conference on Computer Vision (ICCV), IEEE, Piscataway, NJ, USA, IEEE International Conference on Computer Vision (ICCV), October 2017 (inproceedings)

Abstract
Following the success of deep convolutional networks, state-of-the-art methods for 3d human pose estimation have focused on deep end-to-end systems that predict 3d joint locations given raw image pixels. Despite their excellent performance, it is often not easy to understand whether their remaining error stems from a limited 2d pose (visual) understanding, or from a failure to map 2d poses into 3-dimensional positions. With the goal of understanding these sources of error, we set out to build a system that given 2d joint locations predicts 3d positions. Much to our surprise, we have found that, with current technology, "lifting" ground truth 2d joint locations to 3d space is a task that can be solved with a remarkably low error rate: a relatively simple deep feed-forward network outperforms the best reported result by about 30\% on Human3.6M, the largest publicly available 3d pose estimation benchmark. Furthermore, training our system on the output of an off-the-shelf state-of-the-art 2d detector (\ie, using images as input) yields state of the art results -- this includes an array of systems that have been trained end-to-end specifically for this task. Our results indicate that a large portion of the error of modern deep 3d pose estimation systems stems from their visual analysis, and suggests directions to further advance the state of the art in 3d human pose estimation.

ps

video code arxiv pdf preprint Project Page [BibTex]

video code arxiv pdf preprint Project Page [BibTex]


 Effects of animation retargeting on perceived action outcomes
Effects of animation retargeting on perceived action outcomes

Kenny, S., Mahmood, N., Honda, C., Black, M. J., Troje, N. F.

Proceedings of the ACM Symposium on Applied Perception (SAP’17), pages: 2:1-2:7, September 2017 (conference)

Abstract
The individual shape of the human body, including the geometry of its articulated structure and the distribution of weight over that structure, influences the kinematics of a person's movements. How sensitive is the visual system to inconsistencies between shape and motion introduced by retargeting motion from one person onto the shape of another? We used optical motion capture to record five pairs of male performers with large differences in body weight, while they pushed, lifted, and threw objects. Based on a set of 67 markers, we estimated both the kinematics of the actions as well as the performer's individual body shape. To obtain consistent and inconsistent stimuli, we created animated avatars by combining the shape and motion estimates from either a single performer or from different performers. In a virtual reality environment, observers rated the perceived weight or thrown distance of the objects. They were also asked to explicitly discriminate between consistent and hybrid stimuli. Observers were unable to accomplish the latter, but hybridization of shape and motion influenced their judgements of action outcome in systematic ways. Inconsistencies between shape and motion were assimilated into an altered perception of the action outcome.

ps

pdf DOI [BibTex]

pdf DOI [BibTex]


Coupling Adaptive Batch Sizes with Learning Rates
Coupling Adaptive Batch Sizes with Learning Rates

Balles, L., Romero, J., Hennig, P.

In Proceedings Conference on Uncertainty in Artificial Intelligence (UAI) 2017, pages: 410-419, (Editors: Gal Elidan and Kristian Kersting), Association for Uncertainty in Artificial Intelligence (AUAI), Conference on Uncertainty in Artificial Intelligence (UAI), August 2017 (inproceedings)

Abstract
Mini-batch stochastic gradient descent and variants thereof have become standard for large-scale empirical risk minimization like the training of neural networks. These methods are usually used with a constant batch size chosen by simple empirical inspection. The batch size significantly influences the behavior of the stochastic optimization algorithm, though, since it determines the variance of the gradient estimates. This variance also changes over the optimization process; when using a constant batch size, stability and convergence is thus often enforced by means of a (manually tuned) decreasing learning rate schedule. We propose a practical method for dynamic batch size adaptation. It estimates the variance of the stochastic gradients and adapts the batch size to decrease the variance proportionally to the value of the objective function, removing the need for the aforementioned learning rate decrease. In contrast to recent related work, our algorithm couples the batch size to the learning rate, directly reflecting the known relationship between the two. On three image classification benchmarks, our batch size adaptation yields faster optimization convergence, while simultaneously simplifying learning rate tuning. A TensorFlow implementation is available.

ps pn

Code link (url) Project Page [BibTex]

Code link (url) Project Page [BibTex]


Joint Graph Decomposition and Node Labeling by Local Search
Joint Graph Decomposition and Node Labeling by Local Search

Levinkov, E., Uhrig, J., Tang, S., Omran, M., Insafutdinov, E., Kirillov, A., Rother, C., Brox, T., Schiele, B., Andres, B.

In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 1904-1912, IEEE, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

ps

PDF Supplementary DOI Project Page [BibTex]

PDF Supplementary DOI Project Page [BibTex]


Dynamic {FAUST}: Registering Human Bodies in Motion
Dynamic FAUST: Registering Human Bodies in Motion

Bogo, F., Romero, J., Pons-Moll, G., Black, M. J.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
While the ready availability of 3D scan data has influenced research throughout computer vision, less attention has focused on 4D data; that is 3D scans of moving nonrigid objects, captured over time. To be useful for vision research, such 4D scans need to be registered, or aligned, to a common topology. Consequently, extending mesh registration methods to 4D is important. Unfortunately, no ground-truth datasets are available for quantitative evaluation and comparison of 4D registration methods. To address this we create a novel dataset of high-resolution 4D scans of human subjects in motion, captured at 60 fps. We propose a new mesh registration method that uses both 3D geometry and texture information to register all scans in a sequence to a common reference topology. The approach exploits consistency in texture over both short and long time intervals and deals with temporal offsets between shape and texture capture. We show how using geometry alone results in significant errors in alignment when the motions are fast and non-rigid. We evaluate the accuracy of our registration and provide a dataset of 40,000 raw and aligned meshes. Dynamic FAUST extends the popular FAUST dataset to dynamic 4D data, and is available for research purposes at http://dfaust.is.tue.mpg.de.

ps

pdf video Project Page Project Page Project Page [BibTex]

pdf video Project Page Project Page Project Page [BibTex]


Learning from Synthetic Humans
Learning from Synthetic Humans

Varol, G., Romero, J., Martin, X., Mahmood, N., Black, M. J., Laptev, I., Schmid, C.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
Estimating human pose, shape, and motion from images and videos are fundamental challenges with many applications. Recent advances in 2D human pose estimation use large amounts of manually-labeled training data for learning convolutional neural networks (CNNs). Such data is time consuming to acquire and difficult to extend. Moreover, manual labeling of 3D pose, depth and motion is impractical. In this work we present SURREAL (Synthetic hUmans foR REAL tasks): a new large-scale dataset with synthetically-generated but realistic images of people rendered from 3D sequences of human motion capture data. We generate more than 6 million frames together with ground truth pose, depth maps, and segmentation masks. We show that CNNs trained on our synthetic dataset allow for accurate human depth estimation and human part segmentation in real RGB images. Our results and the new dataset open up new possibilities for advancing person analysis using cheap and large-scale synthetic data.

ps

arXiv project data Project Page Project Page [BibTex]

arXiv project data Project Page Project Page [BibTex]


On human motion prediction using recurrent neural networks
On human motion prediction using recurrent neural networks

Martinez, J., Black, M. J., Romero, J.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
Human motion modelling is a classical problem at the intersection of graphics and computer vision, with applications spanning human-computer interaction, motion synthesis, and motion prediction for virtual and augmented reality. Following the success of deep learning methods in several computer vision tasks, recent work has focused on using deep recurrent neural networks (RNNs) to model human motion, with the goal of learning time-dependent representations that perform tasks such as short-term motion prediction and long-term human motion synthesis. We examine recent work, with a focus on the evaluation methodologies commonly used in the literature, and show that, surprisingly, state-of-the-art performance can be achieved by a simple baseline that does not attempt to model motion at all. We investigate this result, and analyze recent RNN methods by looking at the architectures, loss functions, and training procedures used in state-of-the-art approaches. We propose three changes to the standard RNN models typically used for human motion, which result in a simple and scalable RNN architecture that obtains state-of-the-art performance on human motion prediction.

ps

arXiv Project Page [BibTex]

arXiv Project Page [BibTex]


Articulated Multi-person Tracking in the Wild
Articulated Multi-person Tracking in the Wild

Insafutdinov, E., Andriluka, M., Pishchulin, L., Tang, S., Levinkov, E., Andres, B., Schiele, B.

In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 1293-1301, IEEE, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017, Oral (inproceedings)

ps

DOI [BibTex]

DOI [BibTex]


Slow Flow: Exploiting High-Speed Cameras for Accurate and Diverse Optical Flow Reference Data
Slow Flow: Exploiting High-Speed Cameras for Accurate and Diverse Optical Flow Reference Data

Janai, J., Güney, F., Wulff, J., Black, M., Geiger, A.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, pages: 1406-1416, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
Existing optical flow datasets are limited in size and variability due to the difficulty of capturing dense ground truth. In this paper, we tackle this problem by tracking pixels through densely sampled space-time volumes recorded with a high-speed video camera. Our model exploits the linearity of small motions and reasons about occlusions from multiple frames. Using our technique, we are able to establish accurate reference flow fields outside the laboratory in natural environments. Besides, we show how our predictions can be used to augment the input images with realistic motion blur. We demonstrate the quality of the produced flow fields on synthetic and real-world datasets. Finally, we collect a novel challenging optical flow dataset by applying our technique on data from a high-speed camera and analyze the performance of the state-of-the-art in optical flow under various levels of motion blur.

avg ps

pdf suppmat Project page Video DOI Project Page [BibTex]

pdf suppmat Project page Video DOI Project Page [BibTex]


Optical Flow in Mostly Rigid Scenes
Optical Flow in Mostly Rigid Scenes

Wulff, J., Sevilla-Lara, L., Black, M. J.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, pages: 6911-6920, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
The optical flow of natural scenes is a combination of the motion of the observer and the independent motion of objects. Existing algorithms typically focus on either recovering motion and structure under the assumption of a purely static world or optical flow for general unconstrained scenes. We combine these approaches in an optical flow algorithm that estimates an explicit segmentation of moving objects from appearance and physical constraints. In static regions we take advantage of strong constraints to jointly estimate the camera motion and the 3D structure of the scene over multiple frames. This allows us to also regularize the structure instead of the motion. Our formulation uses a Plane+Parallax framework, which works even under small baselines, and reduces the motion estimation to a one-dimensional search problem, resulting in more accurate estimation. In moving regions the flow is treated as unconstrained, and computed with an existing optical flow method. The resulting Mostly-Rigid Flow (MR-Flow) method achieves state-of-the-art results on both the MPISintel and KITTI-2015 benchmarks.

ps

pdf SupMat video code Project Page [BibTex]

pdf SupMat video code Project Page [BibTex]


OctNet: Learning Deep 3D Representations at High Resolutions
OctNet: Learning Deep 3D Representations at High Resolutions

Riegler, G., Ulusoy, O., Geiger, A.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
We present OctNet, a representation for deep learning with sparse 3D data. In contrast to existing models, our representation enables 3D convolutional networks which are both deep and high resolution. Towards this goal, we exploit the sparsity in the input data to hierarchically partition the space using a set of unbalanced octrees where each leaf node stores a pooled feature representation. This allows to focus memory allocation and computation to the relevant dense regions and enables deeper networks without compromising resolution. We demonstrate the utility of our OctNet representation by analyzing the impact of resolution on several 3D tasks including 3D object classification, orientation estimation and point cloud labeling.

avg ps

pdf suppmat Project Page Video Project Page [BibTex]

pdf suppmat Project Page Video Project Page [BibTex]


Reflectance Adaptive Filtering Improves Intrinsic Image Estimation
Reflectance Adaptive Filtering Improves Intrinsic Image Estimation

Nestmeyer, T., Gehler, P. V.

In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 1771-1780, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

ps

pre-print DOI Project Page Project Page [BibTex]

pre-print DOI Project Page Project Page [BibTex]


Detailed, accurate, human shape estimation from clothed {3D} scan sequences
Detailed, accurate, human shape estimation from clothed 3D scan sequences

Zhang, C., Pujades, S., Black, M., Pons-Moll, G.

In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, Washington, DC, USA, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), July 2017, Spotlight (inproceedings)

Abstract
We address the problem of estimating human body shape from 3D scans over time. Reliable estimation of 3D body shape is necessary for many applications including virtual try-on, health monitoring, and avatar creation for virtual reality. Scanning bodies in minimal clothing, however, presents a practical barrier to these applications. We address this problem by estimating body shape under clothing from a sequence of 3D scans. Previous methods that have exploited statistical models of body shape produce overly smooth shapes lacking personalized details. In this paper we contribute a new approach to recover not only an approximate shape of the person, but also their detailed shape. Our approach allows the estimated shape to deviate from a parametric model to fit the 3D scans. We demonstrate the method using high quality 4D data as well as sequences of visual hulls extracted from multi-view images. We also make available a new high quality 4D dataset that enables quantitative evaluation. Our method outperforms the previous state of the art, both qualitatively and quantitatively.

ps

arxiv_preprint video dataset pdf supplemental DOI Project Page [BibTex]

arxiv_preprint video dataset pdf supplemental DOI Project Page [BibTex]


{3D} Menagerie: Modeling the {3D} Shape and Pose of Animals
3D Menagerie: Modeling the 3D Shape and Pose of Animals

Zuffi, S., Kanazawa, A., Jacobs, D., Black, M. J.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, pages: 5524-5532, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
There has been significant work on learning realistic, articulated, 3D models of the human body. In contrast, there are few such models of animals, despite many applications. The main challenge is that animals are much less cooperative than humans. The best human body models are learned from thousands of 3D scans of people in specific poses, which is infeasible with live animals. Consequently, we learn our model from a small set of 3D scans of toy figurines in arbitrary poses. We employ a novel part-based shape model to compute an initial registration to the scans. We then normalize their pose, learn a statistical shape model, and refine the registrations and the model together. In this way, we accurately align animal scans from different quadruped families with very different shapes and poses. With the registration to a common template we learn a shape space representing animals including lions, cats, dogs, horses, cows and hippos. Animal shapes can be sampled from the model, posed, animated, and fit to data. We demonstrate generalization by fitting it to images of real animals including species not seen in training.

ps

pdf video Project Page [BibTex]

pdf video Project Page [BibTex]


Optical Flow Estimation using a Spatial Pyramid Network
Optical Flow Estimation using a Spatial Pyramid Network

Ranjan, A., Black, M.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
We learn to compute optical flow by combining a classical spatial-pyramid formulation with deep learning. This estimates large motions in a coarse-to-fine approach by warping one image of a pair at each pyramid level by the current flow estimate and computing an update to the flow. Instead of the standard minimization of an objective function at each pyramid level, we train one deep network per level to compute the flow update. Unlike the recent FlowNet approach, the networks do not need to deal with large motions; these are dealt with by the pyramid. This has several advantages. First, our Spatial Pyramid Network (SPyNet) is much simpler and 96% smaller than FlowNet in terms of model parameters. This makes it more efficient and appropriate for embedded applications. Second, since the flow at each pyramid level is small (< 1 pixel), a convolutional approach applied to pairs of warped images is appropriate. Third, unlike FlowNet, the learned convolution filters appear similar to classical spatio-temporal filters, giving insight into the method and how to improve it. Our results are more accurate than FlowNet on most standard benchmarks, suggesting a new direction of combining classical flow methods with deep learning.

ps

pdf SupMat project/code [BibTex]

pdf SupMat project/code [BibTex]


Multiple People Tracking by Lifted Multicut and Person Re-identification
Multiple People Tracking by Lifted Multicut and Person Re-identification

Tang, S., Andriluka, M., Andres, B., Schiele, B.

In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 3701-3710, IEEE Computer Society, Washington, DC, USA, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

ps

DOI Project Page [BibTex]

DOI Project Page [BibTex]


Video Propagation Networks
Video Propagation Networks

Jampani, V., Gadde, R., Gehler, P. V.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

ps

pdf supplementary arXiv project page code Project Page [BibTex]

pdf supplementary arXiv project page code Project Page [BibTex]


no image
Dynamic Time-of-Flight

Schober, M., Adam, A., Yair, O., Mazor, S., Nowozin, S.

Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, pages: 170-179, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (conference)

ei pn

DOI [BibTex]

DOI [BibTex]


Generating Descriptions with Grounded and Co-Referenced People
Generating Descriptions with Grounded and Co-Referenced People

Rohrbach, A., Rohrbach, M., Tang, S., Oh, S. J., Schiele, B.

In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 4196-4206, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

ps

PDF DOI Project Page [BibTex]

PDF DOI Project Page [BibTex]


Semantic Multi-view Stereo: Jointly Estimating Objects and Voxels
Semantic Multi-view Stereo: Jointly Estimating Objects and Voxels

Ulusoy, A. O., Black, M. J., Geiger, A.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
Dense 3D reconstruction from RGB images is a highly ill-posed problem due to occlusions, textureless or reflective surfaces, as well as other challenges. We propose object-level shape priors to address these ambiguities. Towards this goal, we formulate a probabilistic model that integrates multi-view image evidence with 3D shape information from multiple objects. Inference in this model yields a dense 3D reconstruction of the scene as well as the existence and precise 3D pose of the objects in it. Our approach is able to recover fine details not captured in the input shapes while defaulting to the input models in occluded regions where image evidence is weak. Due to its probabilistic nature, the approach is able to cope with the approximate geometry of the 3D models as well as input shapes that are not present in the scene. We evaluate the approach quantitatively on several challenging indoor and outdoor datasets.

avg ps

YouTube pdf suppmat Project Page [BibTex]

YouTube pdf suppmat Project Page [BibTex]


Deep representation learning for human motion prediction and classification
Deep representation learning for human motion prediction and classification

Bütepage, J., Black, M., Kragic, D., Kjellström, H.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
Generative models of 3D human motion are often restricted to a small number of activities and can therefore not generalize well to novel movements or applications. In this work we propose a deep learning framework for human motion capture data that learns a generic representation from a large corpus of motion capture data and generalizes well to new, unseen, motions. Using an encoding-decoding network that learns to predict future 3D poses from the most recent past, we extract a feature representation of human motion. Most work on deep learning for sequence prediction focuses on video and speech. Since skeletal data has a different structure, we present and evaluate different network architectures that make different assumptions about time dependencies and limb correlations. To quantify the learned features, we use the output of different layers for action classification and visualize the receptive fields of the network units. Our method outperforms the recent state of the art in skeletal motion prediction even though these use action specific training data. Our results show that deep feedforward networks, trained from a generic mocap database, can successfully be used for feature extraction from human motion data and that this representation can be used as a foundation for classification and prediction.

ps

arXiv Project Page [BibTex]

arXiv Project Page [BibTex]


Unite the People: Closing the Loop Between 3D and 2D Human Representations
Unite the People: Closing the Loop Between 3D and 2D Human Representations

Lassner, C., Romero, J., Kiefel, M., Bogo, F., Black, M. J., Gehler, P. V.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
3D models provide a common ground for different representations of human bodies. In turn, robust 2D estimation has proven to be a powerful tool to obtain 3D fits “in-the-wild”. However, depending on the level of detail, it can be hard to impossible to acquire labeled data for training 2D estimators on large scale. We propose a hybrid approach to this problem: with an extended version of the recently introduced SMPLify method, we obtain high quality 3D body model fits for multiple human pose datasets. Human annotators solely sort good and bad fits. This procedure leads to an initial dataset, UP-3D, with rich annotations. With a comprehensive set of experiments, we show how this data can be used to train discriminative models that produce results with an unprecedented level of detail: our models predict 31 segments and 91 landmark locations on the body. Using the 91 landmark pose estimator, we present state-of-the art results for 3D human pose and shape estimation using an order of magnitude less training data and without assumptions about gender or pose in the fitting procedure. We show that UP-3D can be enhanced with these improved fits to grow in quantity and quality, which makes the system deployable on large scale. The data, code and models are available for research purposes.

ps

arXiv project/code/data Project Page [BibTex]

arXiv project/code/data Project Page [BibTex]


Locomotion of light-driven soft microrobots through a hydrogel via local melting
Locomotion of light-driven soft microrobots through a hydrogel via local melting

Palagi, S., Mark, A. G., Melde, K., Qiu, T., Zeng, H., Parmeggiani, C., Martella, D., Wiersma, D. S., Fischer, P.

In 2017 International Conference on Manipulation, Automation and Robotics at Small Scales (MARSS), pages: 1-5, July 2017 (inproceedings)

Abstract
Soft mobile microrobots whose deformation can be directly controlled by an external field can adapt to move in different environments. This is the case for the light-driven microrobots based on liquid-crystal elastomers (LCEs). Here we show that the soft microrobots can move through an agarose hydrogel by means of light-controlled travelling-wave motions. This is achieved by exploiting the inherent rise of the LCE temperature above the melting temperature of the agarose gel, which facilitates penetration of the microrobot through the hydrogel. The locomotion performance is investigated as a function of the travelling-wave parameters, showing that effective propulsion can be obtained by adapting the generated motion to the specific environmental conditions.

pf

DOI [BibTex]

DOI [BibTex]


Virtual vs. {R}eal: Trading Off Simulations and Physical Experiments in Reinforcement Learning with {B}ayesian Optimization
Virtual vs. Real: Trading Off Simulations and Physical Experiments in Reinforcement Learning with Bayesian Optimization

Marco, A., Berkenkamp, F., Hennig, P., Schoellig, A. P., Krause, A., Schaal, S., Trimpe, S.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages: 1557-1563, IEEE, Piscataway, NJ, USA, IEEE International Conference on Robotics and Automation (ICRA), May 2017 (inproceedings)

am ics pn

PDF arXiv ICRA 2017 Spotlight presentation Virtual vs. Real - Video explanation DOI Project Page [BibTex]

PDF arXiv ICRA 2017 Spotlight presentation Virtual vs. Real - Video explanation DOI Project Page [BibTex]


Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets
Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets

Klein, A., Falkner, S., Bartels, S., Hennig, P., Hutter, F.

Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017), 54, pages: 528-536, Proceedings of Machine Learning Research, (Editors: Sign, Aarti and Zhu, Jerry), PMLR, April 2017 (conference)

pn

pdf link (url) Project Page [BibTex]

pdf link (url) Project Page [BibTex]


Wireless micro-robots for endoscopic applications in urology
Wireless micro-robots for endoscopic applications in urology

Adams, F., Qiu, T., Mark, A. G., Melde, K., Palagi, S., Miernik, A., Fischer, P.

In Eur Urol Suppl, 16(3):e1914, March 2017 (inproceedings)

Abstract
Endoscopy is an essential and common method for both diagnostics and therapy in Urology. Current flexible endoscope is normally cable-driven, thus it is hard to be miniaturized and its reachability is restricted as only one bending section near the tip with one degree of freedom (DoF) is allowed. Recent progresses in micro-robotics offer a unique opportunity for medical inspections in minimally invasive surgery. Micro-robots are active devices that has a feature size smaller than one millimeter and can normally be actuated and controlled wirelessly. Magnetically actuated micro-robots have been demonstrated to propel through biological fluids.Here, we report a novel micro robotic arm, which is actuated wirelessly by ultrasound. It works as a miniaturized endoscope with a side length of ~1 mm, which fits through the 3 Fr. tool channel of a cystoscope, and successfully performs an active cystoscopy in a rabbit bladder.

pf

link (url) DOI [BibTex]


Towards Accurate Marker-less Human Shape and Pose Estimation over Time
Towards Accurate Marker-less Human Shape and Pose Estimation over Time

Huang, Y., Bogo, F., Lassner, C., Kanazawa, A., Gehler, P. V., Romero, J., Akhter, I., Black, M. J.

In International Conference on 3D Vision (3DV), pages: 421-430, 2017 (inproceedings)

Abstract
Existing markerless motion capture methods often assume known backgrounds, static cameras, and sequence specific motion priors, limiting their application scenarios. Here we present a fully automatic method that, given multiview videos, estimates 3D human pose and body shape. We take the recently proposed SMPLify method [12] as the base method and extend it in several ways. First we fit a 3D human body model to 2D features detected in multi-view images. Second, we use a CNN method to segment the person in each image and fit the 3D body model to the contours, further improving accuracy. Third we utilize a generic and robust DCT temporal prior to handle the left and right side swapping issue sometimes introduced by the 2D pose estimator. Validation on standard benchmarks shows our results are comparable to the state of the art and also provide a realistic 3D shape avatar. We also demonstrate accurate results on HumanEva and on challenging monocular sequences of dancing from YouTube.

ps

Code pdf DOI Project Page [BibTex]


no image
An automatic method for discovering rational heuristics for risky choice

Lieder, F., Krueger, P. M., Griffiths, T. L.

In Proceedings of the 39th Annual Meeting of the Cognitive Science Society. Austin, TX: Cognitive Science Society, 2017, Falk Lieder and Paul M. Krueger contributed equally to this publication. (inproceedings)

Abstract
What is the optimal way to make a decision given that your time is limited and your cognitive resources are bounded? To answer this question, we formalized the bounded optimal decision process as the solution to a meta-level Markov decision process whose actions are costly computations. We approximated the optimal solution and evaluated its predictions against human choice behavior in the Mouselab paradigm, which is widely used to study decision strategies. Our computational method rediscovered well-known heuristic strategies and the conditions under which they are used, as well as novel heuristics. A Mouselab experiment confirmed our model’s main predictions. These findings are a proof-of-concept that optimal cognitive strategies can be automatically derived as the rational use of finite time and bounded cognitive resources.

re

Project Page [BibTex]

Project Page [BibTex]


no image
A reward shaping method for promoting metacognitive learning

Lieder, F., Krueger, P. M., Callaway, F., Griffiths, T. L.

In Proceedings of the Third Multidisciplinary Conference on Reinforcement Learning and Decision-Making, 2017 (inproceedings)

re

Project Page [BibTex]

Project Page [BibTex]


no image
When does bounded-optimal metareasoning favor few cognitive systems?

Milli, S., Lieder, F., Griffiths, T. L.

In AAAI Conference on Artificial Intelligence, 31, 2017 (inproceedings)

re

[BibTex]

[BibTex]


no image
The Structure of Goal Systems Predicts Human Performance

Bourgin, D., Lieder, F., Reichman, D., Talmon, N., Griffiths, T.

In Proceedings of the 39th Annual Meeting of the Cognitive Science Society, 2017 (inproceedings)

re

[BibTex]

[BibTex]


no image
Learning to (mis) allocate control: maltransfer can lead to self-control failure

Bustamante, L., Lieder, F., Musslick, S., Shenhav, A., Cohen, J.

In The 3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making. Ann Arbor, Michigan, 2017 (inproceedings)

re

[BibTex]

[BibTex]


no image
Mouselab-MDP: A new paradigm for tracing how people plan

Callaway, F., Lieder, F., Krueger, P. M., Griffiths, T. L.

In The 3rd multidisciplinary conference on reinforcement learning and decision making, 2017 (inproceedings)

re

[BibTex]

[BibTex]


no image
Enhancing metacognitive reinforcement learning using reward structures and feedback

Krueger, P. M., Lieder, F., Griffiths, T. L.

In Proceedings of the 39th Annual Meeting of the Cognitive Science Society, 2017 (inproceedings)

re

Project Page Project Page [BibTex]

Project Page Project Page [BibTex]


no image
Helping people choose subgoals with sparse pseudo rewards

Callaway, F., Lieder, F., Griffiths, T. L.

In Proceedings of the Third Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2017 (inproceedings)

re

[BibTex]

[BibTex]

2014


Hough-based Object Detection with Grouped Features
Hough-based Object Detection with Grouped Features

Srikantha, A., Gall, J.

International Conference on Image Processing, pages: 1653-1657, Paris, France, IEEE International Conference on Image Processing , October 2014 (conference)

Abstract
Hough-based voting approaches have been successfully applied to object detection. While these methods can be efficiently implemented by random forests, they estimate the probability for an object hypothesis for each feature independently. In this work, we address this problem by grouping features in a local neighborhood to obtain a better estimate of the probability. To this end, we propose oblique classification-regression forests that combine features of different trees. We further investigate the benefit of combining independent and grouped features and evaluate the approach on RGB and RGB-D datasets.

ps

pdf poster DOI Project Page [BibTex]

2014


pdf poster DOI Project Page [BibTex]


Omnidirectional 3D Reconstruction in Augmented Manhattan Worlds
Omnidirectional 3D Reconstruction in Augmented Manhattan Worlds

Schoenbein, M., Geiger, A.

International Conference on Intelligent Robots and Systems, pages: 716 - 723, IEEE, Chicago, IL, USA, IEEE/RSJ International Conference on Intelligent Robots and System, October 2014 (conference)

Abstract
This paper proposes a method for high-quality omnidirectional 3D reconstruction of augmented Manhattan worlds from catadioptric stereo video sequences. In contrast to existing works we do not rely on constructing virtual perspective views, but instead propose to optimize depth jointly in a unified omnidirectional space. Furthermore, we show that plane-based prior models can be applied even though planes in 3D do not project to planes in the omnidirectional domain. Towards this goal, we propose an omnidirectional slanted-plane Markov random field model which relies on plane hypotheses extracted using a novel voting scheme for 3D planes in omnidirectional space. To quantitatively evaluate our method we introduce a dataset which we have captured using our autonomous driving platform AnnieWAY which we equipped with two horizontally aligned catadioptric cameras and a Velodyne HDL-64E laser scanner for precise ground truth depth measurements. As evidenced by our experiments, the proposed method clearly benefits from the unified view and significantly outperforms existing stereo matching techniques both quantitatively and qualitatively. Furthermore, our method is able to reduce noise and the obtained depth maps can be represented very compactly by a small number of image segments and plane parameters.

avg ps

pdf DOI [BibTex]

pdf DOI [BibTex]


{Image-based 4-d Reconstruction Using 3-d Change Detection}
Image-based 4-d Reconstruction Using 3-d Change Detection

Ulusoy, A. O., Mundy, J. L.

In Computer Vision – ECCV 2014, pages: 31-45, Lecture Notes in Computer Science, (Editors: D. Fleet and T. Pajdla and B. Schiele and T. Tuytelaars ), Springer International Publishing, 13th European Conference on Computer Vision, September 2014 (inproceedings)

Abstract
This paper describes an approach to reconstruct the complete history of a 3-d scene over time from imagery. The proposed approach avoids rebuilding 3-d models of the scene at each time instant. Instead, the approach employs an initial 3-d model which is continuously updated with changes in the environment to form a full 4-d representation. This updating scheme is enabled by a novel algorithm that infers 3-d changes with respect to the model at one time step from images taken at a subsequent time step. This algorithm can effectively detect changes even when the illumination conditions between image collections are significantly different. The performance of the proposed framework is demonstrated on four challenging datasets in terms of 4-d modeling accuracy as well as quantitative evaluation of 3-d change detection.

ps

video pdf supplementary DOI [BibTex]

video pdf supplementary DOI [BibTex]


Human Pose Estimation with Fields of Parts
Human Pose Estimation with Fields of Parts

Kiefel, M., Gehler, P.

In Computer Vision – ECCV 2014, LNCS 8693, pages: 331-346, Lecture Notes in Computer Science, (Editors: Fleet, David and Pajdla, Tomas and Schiele, Bernt and Tuytelaars, Tinne), Springer, 13th European Conference on Computer Vision, September 2014 (inproceedings)

Abstract
This paper proposes a new formulation of the human pose estimation problem. We present the Fields of Parts model, a binary Conditional Random Field model designed to detect human body parts of articulated people in single images. The Fields of Parts model is inspired by the idea of Pictorial Structures, it models local appearance and joint spatial configuration of the human body. However the underlying graph structure is entirely different. The idea is simple: we model the presence and absence of a body part at every possible position, orientation, and scale in an image with a binary random variable. This results into a vast number of random variables, however, we show that approximate inference in this model is efficient. Moreover we can encode the very same appearance and spatial structure as in Pictorial Structures models. This approach allows us to combine ideas from segmentation and pose estimation into a single model. The Fields of Parts model can use evidence from the background, include local color information, and it is connected more densely than a kinematic chain structure. On the challenging Leeds Sports Poses dataset we improve over the Pictorial Structures counterpart by 5.5% in terms of Average Precision of Keypoints (APK).

ei ps

website pdf DOI Project Page [BibTex]

website pdf DOI Project Page [BibTex]


Capturing Hand Motion with an RGB-D Sensor, Fusing a Generative Model with Salient Points
Capturing Hand Motion with an RGB-D Sensor, Fusing a Generative Model with Salient Points

Tzionas, D., Srikantha, A., Aponte, P., Gall, J.

In German Conference on Pattern Recognition (GCPR), pages: 1-13, Lecture Notes in Computer Science, Springer, GCPR, September 2014 (inproceedings)

Abstract
Hand motion capture has been an active research topic in recent years, following the success of full-body pose tracking. Despite similarities, hand tracking proves to be more challenging, characterized by a higher dimensionality, severe occlusions and self-similarity between fingers. For this reason, most approaches rely on strong assumptions, like hands in isolation or expensive multi-camera systems, that limit the practical use. In this work, we propose a framework for hand tracking that can capture the motion of two interacting hands using only a single, inexpensive RGB-D camera. Our approach combines a generative model with collision detection and discriminatively learned salient points. We quantitatively evaluate our approach on 14 new sequences with challenging interactions.

ps

pdf Supplementary pdf Supplementary Material Project Page DOI Project Page [BibTex]

pdf Supplementary pdf Supplementary Material Project Page DOI Project Page [BibTex]


{OpenDR}: An Approximate Differentiable Renderer
OpenDR: An Approximate Differentiable Renderer

Loper, M. M., Black, M. J.

In Computer Vision – ECCV 2014, 8695, pages: 154-169, Lecture Notes in Computer Science, (Editors: D. Fleet and T. Pajdla and B. Schiele and T. Tuytelaars ), Springer International Publishing, 13th European Conference on Computer Vision, September 2014 (inproceedings)

Abstract
Inverse graphics attempts to take sensor data and infer 3D geometry, illumination, materials, and motions such that a graphics renderer could realistically reproduce the observed scene. Renderers, however, are designed to solve the forward process of image synthesis. To go in the other direction, we propose an approximate di fferentiable renderer (DR) that explicitly models the relationship between changes in model parameters and image observations. We describe a publicly available OpenDR framework that makes it easy to express a forward graphics model and then automatically obtain derivatives with respect to the model parameters and to optimize over them. Built on a new autodiff erentiation package and OpenGL, OpenDR provides a local optimization method that can be incorporated into probabilistic programming frameworks. We demonstrate the power and simplicity of programming with OpenDR by using it to solve the problem of estimating human body shape from Kinect depth and RGB data.

ps

pdf Code Chumpy Supplementary video of talk DOI Project Page [BibTex]

pdf Code Chumpy Supplementary video of talk DOI Project Page [BibTex]


Discovering Object Classes from Activities
Discovering Object Classes from Activities

Srikantha, A., Gall, J.

In European Conference on Computer Vision, 8694, pages: 415-430, Lecture Notes in Computer Science, (Editors: D. Fleet and T. Pajdla and B. Schiele and T. Tuytelaars ), Springer International Publishing, 13th European Conference on Computer Vision, September 2014 (inproceedings)

Abstract
In order to avoid an expensive manual labeling process or to learn object classes autonomously without human intervention, object discovery techniques have been proposed that extract visual similar objects from weakly labelled videos. However, the problem of discovering small or medium sized objects is largely unexplored. We observe that videos with activities involving human-object interactions can serve as weakly labelled data for such cases. Since neither object appearance nor motion is distinct enough to discover objects in these videos, we propose a framework that samples from a space of algorithms and their parameters to extract sequences of object proposals. Furthermore, we model similarity of objects based on appearance and functionality, which is derived from human and object motion. We show that functionality is an important cue for discovering objects from activities and demonstrate the generality of the model on three challenging RGB-D and RGB datasets.

ps

pdf anno poster DOI Project Page [BibTex]

pdf anno poster DOI Project Page [BibTex]


Probabilistic Progress Bars
Probabilistic Progress Bars

Kiefel, M., Schuler, C., Hennig, P.

In Conference on Pattern Recognition (GCPR), 8753, pages: 331-341, Lecture Notes in Computer Science, (Editors: Jiang, X., Hornegger, J., and Koch, R.), Springer, GCPR, September 2014 (inproceedings)

Abstract
Predicting the time at which the integral over a stochastic process reaches a target level is a value of interest in many applications. Often, such computations have to be made at low cost, in real time. As an intuitive example that captures many features of this problem class, we choose progress bars, a ubiquitous element of computer user interfaces. These predictors are usually based on simple point estimators, with no error modelling. This leads to fluctuating behaviour confusing to the user. It also does not provide a distribution prediction (risk values), which are crucial for many other application areas. We construct and empirically evaluate a fast, constant cost algorithm using a Gauss-Markov process model which provides more information to the user.

ei ps pn

website+code pdf DOI [BibTex]

website+code pdf DOI [BibTex]


Optical Flow Estimation with Channel Constancy
Optical Flow Estimation with Channel Constancy

Sevilla-Lara, L., Sun, D., Learned-Miller, E. G., Black, M. J.

In Computer Vision – ECCV 2014, 8689, pages: 423-438, Lecture Notes in Computer Science, (Editors: D. Fleet and T. Pajdla and B. Schiele and T. Tuytelaars ), Springer International Publishing, 13th European Conference on Computer Vision, September 2014 (inproceedings)

Abstract
Large motions remain a challenge for current optical flow algorithms. Traditionally, large motions are addressed using multi-resolution representations like Gaussian pyramids. To deal with large displacements, many pyramid levels are needed and, if an object is small, it may be invisible at the highest levels. To address this we decompose images using a channel representation (CR) and replace the standard brightness constancy assumption with a descriptor constancy assumption. CRs can be seen as an over-segmentation of the scene into layers based on some image feature. If the appearance of a foreground object differs from the background then its descriptor will be different and they will be represented in different layers.We create a pyramid by smoothing these layers, without mixing foreground and background or losing small objects. Our method estimates more accurate flow than the baseline on the MPI-Sintel benchmark, especially for fast motions and near motion boundaries.

ps

pdf DOI [BibTex]

pdf DOI [BibTex]


Modeling Blurred Video with Layers
Modeling Blurred Video with Layers

Wulff, J., Black, M. J.

In Computer Vision – ECCV 2014, 8694, pages: 236-252, Lecture Notes in Computer Science, (Editors: D. Fleet and T. Pajdla and B. Schiele and T. Tuytelaars ), Springer International Publishing, 13th European Conference on Computer Vision, September 2014 (inproceedings)

Abstract
Videos contain complex spatially-varying motion blur due to the combination of object motion, camera motion, and depth variation with fi nite shutter speeds. Existing methods to estimate optical flow, deblur the images, and segment the scene fail in such cases. In particular, boundaries between di fferently moving objects cause problems, because here the blurred images are a combination of the blurred appearances of multiple surfaces. We address this with a novel layered model of scenes in motion. From a motion-blurred video sequence, we jointly estimate the layer segmentation and each layer's appearance and motion. Since the blur is a function of the layer motion and segmentation, it is completely determined by our generative model. Given a video, we formulate the optimization problem as minimizing the pixel error between the blurred frames and images synthesized from the model, and solve it using gradient descent. We demonstrate our approach on synthetic and real sequences.

ps

pdf Supplemental Video Data DOI Project Page Project Page [BibTex]

pdf Supplemental Video Data DOI Project Page Project Page [BibTex]


Intrinsic Video
Intrinsic Video

Kong, N., Gehler, P. V., Black, M. J.

In Computer Vision – ECCV 2014, 8690, pages: 360-375, Lecture Notes in Computer Science, (Editors: D. Fleet and T. Pajdla and B. Schiele and T. Tuytelaars ), Springer International Publishing, 13th European Conference on Computer Vision, September 2014 (inproceedings)

Abstract
Intrinsic images such as albedo and shading are valuable for later stages of visual processing. Previous methods for extracting albedo and shading use either single images or images together with depth data. Instead, we define intrinsic video estimation as the problem of extracting temporally coherent albedo and shading from video alone. Our approach exploits the assumption that albedo is constant over time while shading changes slowly. Optical flow aids in the accurate estimation of intrinsic video by providing temporal continuity as well as putative surface boundaries. Additionally, we find that the estimated albedo sequence can be used to improve optical flow accuracy in sequences with changing illumination. The approach makes only weak assumptions about the scene and we show that it substantially outperforms existing single-frame intrinsic image methods. We evaluate this quantitatively on synthetic sequences as well on challenging natural sequences with complex geometry, motion, and illumination.

ps

pdf Supplementary Video DOI Project Page Project Page [BibTex]

pdf Supplementary Video DOI Project Page Project Page [BibTex]