Header logo is


2015


Thumb xl objs2acts
Linking Objects to Actions: Encoding of Target Object and Grasping Strategy in Primate Ventral Premotor Cortex

Vargas-Irwin, C. E., Franquemont, L., Black, M. J., Donoghue, J. P.

Journal of Neuroscience, 35(30):10888-10897, July 2015 (article)

Abstract
Neural activity in ventral premotor cortex (PMv) has been associated with the process of matching perceived objects with the motor commands needed to grasp them. It remains unclear how PMv networks can flexibly link percepts of objects affording multiple grasp options into a final desired hand action. Here, we use a relational encoding approach to track the functional state of PMv neuronal ensembles in macaque monkeys through the process of passive viewing, grip planning, and grasping movement execution. We used objects affording multiple possible grip strategies. The task included separate instructed delay periods for object presentation and grip instruction. This approach allowed us to distinguish responses elicited by the visual presentation of the objects from those associated with selecting a given motor plan for grasping. We show that PMv continuously incorporates information related to object shape and grip strategy as it becomes available, revealing a transition from a set of ensemble states initially most closely related to objects, to a new set of ensemble patterns reflecting unique object-grip combinations. These results suggest that PMv dynamically combines percepts, gradually navigating toward activity patterns associated with specific volitional actions, rather than directly mapping perceptual object properties onto categorical grip representations. Our results support the idea that PMv is part of a network that dynamically computes motor plans from perceptual information. Significance Statement: The present work demonstrates that the activity of groups of neurons in primate ventral premotor cortex reflects information related to visually presented objects, as well as the motor strategy used to grasp them, linking individual objects to multiple possible grips. PMv could provide useful control signals for neuroprosthetic assistive devices designed to interact with objects in a flexible way.

ps

publisher link DOI Project Page [BibTex]

2015


publisher link DOI Project Page [BibTex]


Thumb xl silviateaser
The Stitched Puppet: A Graphical Model of 3D Human Shape and Pose

Zuffi, S., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2015), pages: 3537-3546, June 2015 (inproceedings)

Abstract
We propose a new 3D model of the human body that is both realistic and part-based. The body is represented by a graphical model in which nodes of the graph correspond to body parts that can independently translate and rotate in 3D as well as deform to capture pose-dependent shape variations. Pairwise potentials define a “stitching cost” for pulling the limbs apart, giving rise to the stitched puppet model (SPM). Unlike existing realistic 3D body models, the distributed representation facilitates inference by allowing the model to more effectively explore the space of poses, much like existing 2D pictorial structures models. We infer pose and body shape using a form of particle-based max-product belief propagation. This gives the SPM the realism of recent 3D body models with the computational advantages of part-based models. We apply the SPM to two challenging problems involving estimating human shape and pose from 3D data. The first is the FAUST mesh alignment challenge (http://faust.is.tue.mpg.de/), where ours is the first method to successfully align all 3D meshes. The second involves estimating pose and shape from crude visual hull representations of complex body movements.

ps

pdf Extended Abstract poster code/project video DOI Project Page [BibTex]

pdf Extended Abstract poster code/project video DOI Project Page [BibTex]


Thumb xl img displet
Displets: Resolving Stereo Ambiguities using Object Knowledge

Güney, F., Geiger, A.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) 2015, pages: 4165-4175, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2015 (inproceedings)

Abstract
Stereo techniques have witnessed tremendous progress over the last decades, yet some aspects of the problem still remain challenging today. Striking examples are reflecting and textureless surfaces which cannot easily be recovered using traditional local regularizers. In this paper, we therefore propose to regularize over larger distances using object-category specific disparity proposals (displets) which we sample using inverse graphics techniques based on a sparse disparity estimate and a semantic segmentation of the image. The proposed displets encode the fact that objects of certain categories are not arbitrarily shaped but typically exhibit regular structures. We integrate them as non-local regularizer for the challenging object class 'car' into a superpixel based CRF framework and demonstrate its benefits on the KITTI stereo evaluation.

avg ps

pdf abstract suppmat [BibTex]

pdf abstract suppmat [BibTex]


Thumb xl img sceneflow
Object Scene Flow for Autonomous Vehicles

Menze, M., Geiger, A.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) 2015, pages: 3061-3070, IEEE, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2015 (inproceedings)

Abstract
This paper proposes a novel model and dataset for 3D scene flow estimation with an application to autonomous driving. Taking advantage of the fact that outdoor scenes often decompose into a small number of independently moving objects, we represent each element in the scene by its rigid motion parameters and each superpixel by a 3D plane as well as an index to the corresponding object. This minimal representation increases robustness and leads to a discrete-continuous CRF where the data term decomposes into pairwise potentials between superpixels and objects. Moreover, our model intrinsically segments the scene into its constituting dynamic components. We demonstrate the performance of our model on existing benchmarks as well as a novel realistic dataset with scene flow ground truth. We obtain this dataset by annotating 400 dynamic scenes from the KITTI raw data collection using detailed 3D CAD models for all vehicles in motion. Our experiments also reveal novel challenges which can't be handled by existing methods.

avg ps

pdf abstract suppmat DOI [BibTex]

pdf abstract suppmat DOI [BibTex]


Thumb xl ijazteaser
Pose-Conditioned Joint Angle Limits for 3D Human Pose Reconstruction

Akhter, I., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2015), pages: 1446-1455, June 2015 (inproceedings)

Abstract
The estimation of 3D human pose from 2D joint locations is central to many vision problems involving the analysis of people in images and video. To address the fact that the problem is inherently ill posed, many methods impose a prior over human poses. Unfortunately these priors admit invalid poses because they do not model how joint-limits vary with pose. Here we make two key contributions. First, we collected a motion capture dataset that explores a wide range of human poses. From this we learn a pose-dependent model of joint limits that forms our prior. The dataset and the prior will be made publicly available. Second, we define a general parameterization of body pose and a new, multistage, method to estimate 3D pose from 2D joint locations that uses an over-complete dictionary of human poses. Our method shows good generalization while avoiding impossible poses. We quantitatively compare our method with recent work and show state-of-the-art results on 2D to 3D pose estimation using the CMU mocap dataset. We also show superior results on manual annotations on real images and automatic part-based detections on the Leeds sports pose dataset.

ps

pdf Extended Abstract video project/data/code poster DOI Project Page Project Page [BibTex]

pdf Extended Abstract video project/data/code poster DOI Project Page Project Page [BibTex]


Thumb xl jonasteaser
Efficient Sparse-to-Dense Optical Flow Estimation using a Learned Basis and Layers

Wulff, J., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2015), pages: 120-130, June 2015 (inproceedings)

Abstract
We address the elusive goal of estimating optical flow both accurately and efficiently by adopting a sparse-to-dense approach. Given a set of sparse matches, we regress to dense optical flow using a learned set of full-frame basis flow fields. We learn the principal components of natural flow fields using flow computed from four Hollywood movies. Optical flow fields are then compactly approximated as a weighted sum of the basis flow fields. Our new PCA-Flow algorithm robustly estimates these weights from sparse feature matches. The method runs in under 300ms/frame on the MPI-Sintel dataset using a single CPU and is more accurate and significantly faster than popular methods such as LDOF and Classic+NL. The results, however, are too smooth for some applications. Consequently, we develop a novel sparse layered flow method in which each layer is represented by PCA-flow. Unlike existing layered methods, estimation is fast because it uses only sparse matches. We combine information from different layers into a dense flow field using an image-aware MRF. The resulting PCA-Layers method runs in 3.6s/frame, is significantly more accurate than PCA-flow and achieves state-of-the-art performance in occluded regions on MPI-Sintel.

ps

pdf Extended Abstract Supplemental Material Poster Code Project Page Project Page [BibTex]


Thumb xl teaser
Permutohedral Lattice CNNs

Kiefel, M., Jampani, V., Gehler, P. V.

In ICLR Workshop Track, ICLR, May 2015 (inproceedings)

Abstract
This paper presents a convolutional layer that is able to process sparse input features. As an example, for image recognition problems this allows an efficient filtering of signals that do not lie on a dense grid (like pixel position), but of more general features (such as color values). The presented algorithm makes use of the permutohedral lattice data structure. The permutohedral lattice was introduced to efficiently implement a bilateral filter, a commonly used image processing operation. Its use allows for a generalization of the convolution type found in current (spatial) convolutional network architectures.

ei ps

pdf link (url) [BibTex]

pdf link (url) [BibTex]


Thumb xl jampani15aistats teaser
Consensus Message Passing for Layered Graphical Models

Jampani, V., Eslami, S. M. A., Tarlow, D., Kohli, P., Winn, J.

In Eighteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 38, pages: 425-433, JMLR Workshop and Conference Proceedings, Eighteenth International Conference on Artificial Intelligence and Statistics, May 2015 (inproceedings)

Abstract
Generative models provide a powerful framework for probabilistic reasoning. However, in many domains their use has been hampered by the practical difficulties of inference. This is particularly the case in computer vision, where models of the imaging process tend to be large, loopy and layered. For this reason bottom-up conditional models have traditionally dominated in such domains. We find that widely-used, general-purpose message passing inference algorithms such as Expectation Propagation (EP) and Variational Message Passing (VMP) fail on the simplest of vision models. With these models in mind, we introduce a modification to message passing that learns to exploit their layered structure by passing 'consensus' messages that guide inference towards good solutions. Experiments on a variety of problems show that the proposed technique leads to significantly more accurate inference results, not only when compared to standard EP and VMP, but also when compared to competitive bottom-up conditional models.

ps

online pdf supplementary link (url) [BibTex]

online pdf supplementary link (url) [BibTex]


Thumb xl silvia phd
Shape Models of the Human Body for Distributed Inference

Zuffi, S.

Brown University, May 2015 (phdthesis)

Abstract
In this thesis we address the problem of building shape models of the human body, in 2D and 3D, which are realistic and efficient to use. We focus our efforts on the human body, which is highly articulated and has interesting shape variations, but the approaches we present here can be applied to generic deformable and articulated objects. To address efficiency, we constrain our models to be part-based and have a tree-structured representation with pairwise relationships between connected parts. This allows the application of methods for distributed inference based on message passing. To address realism, we exploit recent advances in computer graphics that represent the human body with statistical shape models learned from 3D scans. We introduce two articulated body models, a 2D model, named Deformable Structures (DS), which is a contour-based model parameterized for 2D pose and projected shape, and a 3D model, named Stitchable Puppet (SP), which is a mesh-based model parameterized for 3D pose, pose-dependent deformations and intrinsic body shape. We have successfully applied the models to interesting and challenging problems in computer vision and computer graphics, namely pose estimation from static images, pose estimation from video sequences, pose and shape estimation from 3D scan data. This advances the state of the art in human pose and shape estimation and suggests that carefully de ned realistic models can be important for computer vision. More work at the intersection of vision and graphics is thus encouraged.

ps

PDF [BibTex]


Thumb xl screen shot 2015 10 14 at 08.57.57
Multi-view and 3D Deformable Part Models

Pepik, B., Stark, M., Gehler, P., Schiele, B.

Pattern Analysis and Machine Intelligence, 37(11):14, IEEE, March 2015 (article)

Abstract
As objects are inherently 3-dimensional, they have been modeled in 3D in the early days of computer vision. Due to the ambiguities arising from mapping 2D features to 3D models, 3D object representations have been neglected and 2D feature-based models are the predominant paradigm in object detection nowadays. While such models have achieved outstanding bounding box detection performance, they come with limited expressiveness, as they are clearly limited in their capability of reasoning about 3D shape or viewpoints. In this work, we bring the worlds of 3D and 2D object representations closer, by building an object detector which leverages the expressive power of 3D object representations while at the same time can be robustly matched to image evidence. To that end, we gradually extend the successful deformable part model [1] to include viewpoint information and part-level 3D geometry information, resulting in several different models with different level of expressiveness. We end up with a 3D object model, consisting of multiple object parts represented in 3D and a continuous appearance model. We experimentally verify that our models, while providing richer object hypotheses than the 2D object models, provide consistently better joint object localization and viewpoint estimation than the state-of-the-art multi-view and 3D object detectors on various benchmarks (KITTI [2], 3D object classes [3], Pascal3D+ [4], Pascal VOC 2007 [5], EPFL multi-view cars [6]).

ps

DOI Project Page [BibTex]

DOI Project Page [BibTex]


Thumb xl th teaser
From Scans to Models: Registration of 3D Human Shapes Exploiting Texture Information

Bogo, F.

University of Padova, March 2015 (phdthesis)

Abstract
New scanning technologies are increasing the importance of 3D mesh data, and of algorithms that can reliably register meshes obtained from multiple scans. Surface registration is important e.g. for building full 3D models from partial scans, identifying and tracking objects in a 3D scene, creating statistical shape models. Human body registration is particularly important for many applications, ranging from biomedicine and robotics to the production of movies and video games; but obtaining accurate and reliable registrations is challenging, given the articulated, non-rigidly deformable structure of the human body. In this thesis, we tackle the problem of 3D human body registration. We start by analyzing the current state of the art, and find that: a) most registration techniques rely only on geometric information, which is ambiguous on flat surface areas; b) there is a lack of adequate datasets and benchmarks in the field. We address both issues. Our contribution is threefold. First, we present a model-based registration technique for human meshes that combines geometry and surface texture information to provide highly accurate mesh-to-mesh correspondences. Our approach estimates scene lighting and surface albedo, and uses the albedo to construct a high-resolution textured 3D body model that is brought into registration with multi-camera image data using a robust matching term. Second, by leveraging our technique, we present FAUST (Fine Alignment Using Scan Texture), a novel dataset collecting 300 high-resolution scans of 10 people in a wide range of poses. FAUST is the first dataset providing both real scans and automatically computed, reliable "ground-truth" correspondences between them. Third, we explore possible uses of our approach in dermatology. By combining our registration technique with a melanocytic lesion segmentation algorithm, we propose a system that automatically detects new or evolving lesions over almost the entire body surface, thus helping dermatologists identify potential melanomas. We conclude this thesis investigating the benefits of using texture information to establish frame-to-frame correspondences in dynamic monocular sequences captured with consumer depth cameras. We outline a novel approach to reconstruct realistic body shape and appearance models from dynamic human performances, and show preliminary results on challenging sequences captured with a Kinect.

ps

[BibTex]


Thumb xl thesis teaser
Long Range Motion Estimation and Applications

Sevilla-Lara, L.

Long Range Motion Estimation and Applications, University of Massachusetts Amherst, University of Massachusetts Amherst, Febuary 2015 (phdthesis)

Abstract
Finding correspondences between images underlies many computer vision problems, such as optical flow, tracking, stereovision and alignment. Finding these correspondences involves formulating a matching function and optimizing it. This optimization process is often gradient descent, which avoids exhaustive search, but relies on the assumption of being in the basin of attraction of the right local minimum. This is often the case when the displacement is small, and current methods obtain very accurate results for small motions. However, when the motion is large and the matching function is bumpy this assumption is less likely to be true. One traditional way of avoiding this abruptness is to smooth the matching function spatially by blurring the images. As the displacement becomes larger, the amount of blur required to smooth the matching function becomes also larger. This averaging of pixels leads to a loss of detail in the image. Therefore, there is a trade-off between the size of the objects that can be tracked and the displacement that can be captured. In this thesis we address the basic problem of increasing the size of the basin of attraction in a matching function. We use an image descriptor called distribution fields (DFs). By blurring the images in DF space instead of in pixel space, we in- crease the size of the basin attraction with respect to traditional methods. We show competitive results using DFs both in object tracking and optical flow. Finally we demonstrate an application of capturing large motions for temporal video stitching.

ps

[BibTex]

[BibTex]


Thumb xl ssimssmall
Spike train SIMilarity Space (SSIMS): A framework for single neuron and ensemble data analysis

Vargas-Irwin, C. E., Brandman, D. M., Zimmermann, J. B., Donoghue, J. P., Black, M. J.

Neural Computation, 27(1):1-31, MIT Press, January 2015 (article)

Abstract
We present a method to evaluate the relative similarity of neural spiking patterns by combining spike train distance metrics with dimensionality reduction. Spike train distance metrics provide an estimate of similarity between activity patterns at multiple temporal resolutions. Vectors of pair-wise distances are used to represent the intrinsic relationships between multiple activity patterns at the level of single units or neuronal ensembles. Dimensionality reduction is then used to project the data into concise representations suitable for clustering analysis as well as exploratory visualization. Algorithm performance and robustness are evaluated using multielectrode ensemble activity data recorded in behaving primates. We demonstrate how Spike train SIMilarity Space (SSIMS) analysis captures the relationship between goal directions for an 8-directional reaching task and successfully segregates grasp types in a 3D grasping task in the absence of kinematic information. The algorithm enables exploration of virtually any type of neural spiking (time series) data, providing similarity-based clustering of neural activity states with minimal assumptions about potential information encoding models.

ps

pdf: publisher site pdf: author's proof DOI Project Page [BibTex]

pdf: publisher site pdf: author's proof DOI Project Page [BibTex]


Thumb xl untitled
Efficient Facade Segmentation using Auto-Context

Jampani, V., Gadde, R., Gehler, P. V.

In Applications of Computer Vision (WACV), 2015 IEEE Winter Conference on, pages: 1038-1045, IEEE, WACV,, January 2015 (inproceedings)

Abstract
In this paper we propose a system for the problem of facade segmentation. Building facades are highly structured images and consequently most methods that have been proposed for this problem, aim to make use of this strong prior information. We are describing a system that is almost domain independent and consists of standard segmentation methods. A sequence of boosted decision trees is stacked using auto-context features and learned using the stacked generalization technique. We find that this, albeit standard, technique performs better, or equals, all previous published empirical results on all available facade benchmark datasets. The proposed method is simple to implement, easy to extend, and very efficient at test time inference.

ps

website pdf supplementary IEEE page link (url) DOI Project Page [BibTex]

website pdf supplementary IEEE page link (url) DOI Project Page [BibTex]


Thumb xl flowcap im
FlowCap: 2D Human Pose from Optical Flow

Romero, J., Loper, M., Black, M. J.

In Pattern Recognition, Proc. 37th German Conference on Pattern Recognition (GCPR), LNCS 9358, pages: 412-423, Springer, GCPR, 2015 (inproceedings)

Abstract
We estimate 2D human pose from video using only optical flow. The key insight is that dense optical flow can provide information about 2D body pose. Like range data, flow is largely invariant to appearance but unlike depth it can be directly computed from monocular video. We demonstrate that body parts can be detected from dense flow using the same random forest approach used by the Microsoft Kinect. Unlike range data, however, when people stop moving, there is no optical flow and they effectively disappear. To address this, our FlowCap method uses a Kalman filter to propagate body part positions and ve- locities over time and a regression method to predict 2D body pose from part centers. No range sensor is required and FlowCap estimates 2D human pose from monocular video sources containing human motion. Such sources include hand-held phone cameras and archival television video. We demonstrate 2D body pose estimation in a range of scenarios and show that the method works with real-time optical flow. The results suggest that optical flow shares invariances with range data that, when complemented with tracking, make it valuable for pose estimation.

ps

video pdf preprint Project Page Project Page [BibTex]

video pdf preprint Project Page Project Page [BibTex]


Thumb xl thumb teaser mrg
Metric Regression Forests for Correspondence Estimation

Pons-Moll, G., Taylor, J., Shotton, J., Hertzmann, A., Fitzgibbon, A.

International Journal of Computer Vision, pages: 1-13, 2015 (article)

ps

springer PDF Project Page [BibTex]

springer PDF Project Page [BibTex]


Thumb xl geiger
Joint 3D Object and Layout Inference from a single RGB-D Image

(Best Paper Award)

Geiger, A., Wang, C.

In German Conference on Pattern Recognition (GCPR), 9358, pages: 183-195, Lecture Notes in Computer Science, Springer International Publishing, 2015 (inproceedings)

Abstract
Inferring 3D objects and the layout of indoor scenes from a single RGB-D image captured with a Kinect camera is a challenging task. Towards this goal, we propose a high-order graphical model and jointly reason about the layout, objects and superpixels in the image. In contrast to existing holistic approaches, our model leverages detailed 3D geometry using inverse graphics and explicitly enforces occlusion and visibility constraints for respecting scene properties and projective geometry. We cast the task as MAP inference in a factor graph and solve it efficiently using message passing. We evaluate our method with respect to several baselines on the challenging NYUv2 indoor dataset using 21 object categories. Our experiments demonstrate that the proposed method is able to infer scenes with a large degree of clutter and occlusions.

avg ps

pdf suppmat video project DOI [BibTex]

pdf suppmat video project DOI [BibTex]


Thumb xl screen shot 2015 05 07 at 11.56.54
3D Object Class Detection in the Wild

Pepik, B., Stark, M., Gehler, P., Ritschel, T., Schiele, B.

In Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE, Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015 (inproceedings)

ps

Project Page [BibTex]

Project Page [BibTex]


Thumb xl menze
Discrete Optimization for Optical Flow

Menze, M., Heipke, C., Geiger, A.

In German Conference on Pattern Recognition (GCPR), 9358, pages: 16-28, Springer International Publishing, 2015 (inproceedings)

Abstract
We propose to look at large-displacement optical flow from a discrete point of view. Motivated by the observation that sub-pixel accuracy is easily obtained given pixel-accurate optical flow, we conjecture that computing the integral part is the hardest piece of the problem. Consequently, we formulate optical flow estimation as a discrete inference problem in a conditional random field, followed by sub-pixel refinement. Naive discretization of the 2D flow space, however, is intractable due to the resulting size of the label set. In this paper, we therefore investigate three different strategies, each able to reduce computation and memory demands by several orders of magnitude. Their combination allows us to estimate large-displacement optical flow both accurately and efficiently and demonstrates the potential of discrete optimization for optical flow. We obtain state-of-the-art performance on MPI Sintel and KITTI.

avg ps

pdf suppmat project DOI [BibTex]

pdf suppmat project DOI [BibTex]


Thumb xl isa
Joint 3D Estimation of Vehicles and Scene Flow

Menze, M., Heipke, C., Geiger, A.

In Proc. of the ISPRS Workshop on Image Sequence Analysis (ISA), 2015 (inproceedings)

Abstract
Three-dimensional reconstruction of dynamic scenes is an important prerequisite for applications like mobile robotics or autonomous driving. While much progress has been made in recent years, imaging conditions in natural outdoor environments are still very challenging for current reconstruction and recognition methods. In this paper, we propose a novel unified approach which reasons jointly about 3D scene flow as well as the pose, shape and motion of vehicles in the scene. Towards this goal, we incorporate a deformable CAD model into a slanted-plane conditional random field for scene flow estimation and enforce shape consistency between the rendered 3D models and the parameters of all superpixels in the image. The association of superpixels to objects is established by an index variable which implicitly enables model selection. We evaluate our approach on the challenging KITTI scene flow dataset in terms of object and scene flow estimation. Our results provide a prove of concept and demonstrate the usefulness of our method.

avg ps

PDF [BibTex]

PDF [BibTex]


Thumb xl subimage
Smooth Loops from Unconstrained Video

Sevilla-Lara, L., Wulff, J., Sunkavalli, K., Shechtman, E.

In Computer Graphics Forum (Proceedings of EGSR), 34(4):99-107, Eurographics Symposium on Rendering, 2015 (inproceedings)

Abstract
Converting unconstrained video sequences into videos that loop seamlessly is an extremely challenging problem. In this work, we take the first steps towards automating this process by focusing on an important subclass of videos containing a single dominant foreground object. Our technique makes two novel contributions over previous work: first, we propose a correspondence-based similarity metric to automatically identify a good transition point in the video where the appearance and dynamics of the foreground are most consistent. Second, we develop a technique that aligns both the foreground and background about this transition point using a combination of global camera path planning and patch-based video morphing. We demonstrate that this allows us to create natural, compelling, loopy videos from a wide range of videos collected from the internet.

ps

pdf link (url) DOI Project Page [BibTex]

pdf link (url) DOI Project Page [BibTex]

2012


Thumb xl pengthesisteaser
Virtual Human Bodies with Clothing and Hair: From Images to Animation

Guan, P.

Brown University, Department of Computer Science, December 2012 (phdthesis)

ps

pdf [BibTex]

2012


pdf [BibTex]


Thumb xl coregtr
Coregistration: Supplemental Material

Hirshberg, D., Loper, M., Rachlin, E., Black, M. J.

(No. 4), Max Planck Institute for Intelligent Systems, October 2012 (techreport)

ps

pdf [BibTex]

pdf [BibTex]


Thumb xl paperfig
Lie Bodies: A Manifold Representation of 3D Human Shape

Freifeld, O., Black, M. J.

In European Conf. on Computer Vision (ECCV), pages: 1-14, Part I, LNCS 7572, (Editors: A. Fitzgibbon et al. (Eds.)), Springer-Verlag, October 2012 (inproceedings)

Abstract
Three-dimensional object shape is commonly represented in terms of deformations of a triangular mesh from an exemplar shape. Existing models, however, are based on a Euclidean representation of shape deformations. In contrast, we argue that shape has a manifold structure: For example, summing the shape deformations for two people does not necessarily yield a deformation corresponding to a valid human shape, nor does the Euclidean difference of these two deformations provide a meaningful measure of shape dissimilarity. Consequently, we define a novel manifold for shape representation, with emphasis on body shapes, using a new Lie group of deformations. This has several advantages. First we define triangle deformations exactly, removing non-physical deformations and redundant degrees of freedom common to previous methods. Second, the Riemannian structure of Lie Bodies enables a more meaningful definition of body shape similarity by measuring distance between bodies on the manifold of body shape deformations. Third, the group structure allows the valid composition of deformations. This is important for models that factor body shape deformations into multiple causes or represent shape as a linear combination of basis shapes. Finally, body shape variation is modeled using statistics on manifolds. Instead of modeling Euclidean shape variation with Principal Component Analysis we capture shape variation on the manifold using Principal Geodesic Analysis. Our experiments show consistent visual and quantitative advantages of Lie Bodies over traditional Euclidean models of shape deformation and our representation can be easily incorporated into existing methods.

ps

pdf supplemental material youtube poster eigenshape video code Project Page Project Page Project Page [BibTex]

pdf supplemental material youtube poster eigenshape video code Project Page Project Page Project Page [BibTex]


Thumb xl coregteaser
Coregistration: Simultaneous alignment and modeling of articulated 3D shape

Hirshberg, D., Loper, M., Rachlin, E., Black, M.

In European Conf. on Computer Vision (ECCV), pages: 242-255, LNCS 7577, Part IV, (Editors: A. Fitzgibbon et al. (Eds.)), Springer-Verlag, October 2012 (inproceedings)

Abstract
Three-dimensional (3D) shape models are powerful because they enable the inference of object shape from incomplete, noisy, or ambiguous 2D or 3D data. For example, realistic parameterized 3D human body models have been used to infer the shape and pose of people from images. To train such models, a corpus of 3D body scans is typically brought into registration by aligning a common 3D human-shaped template to each scan. This is an ill-posed problem that typically involves solving an optimization problem with regularization terms that penalize implausible deformations of the template. When aligning a corpus, however, we can do better than generic regularization. If we have a model of how the template can deform then alignments can be regularized by this model. Constructing a model of deformations, however, requires having a corpus that is already registered. We address this chicken-and-egg problem by approaching modeling and registration together. By minimizing a single objective function, we reliably obtain high quality registration of noisy, incomplete, laser scans, while simultaneously learning a highly realistic articulated body model. The model greatly improves robustness to noise and missing data. Since the model explains a corpus of body scans, it captures how body shape varies across people and poses.

ps

pdf publisher site poster supplemental material (400MB) Project Page Project Page [BibTex]

pdf publisher site poster supplemental material (400MB) Project Page Project Page [BibTex]


Thumb xl lietr
Lie Bodies: A Manifold Representation of 3D Human Shape. Supplemental Material

Freifeld, O., Black, M. J.

(No. 5), Max Planck Institute for Intelligent Systems, October 2012 (techreport)

ps

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Thumb xl posear
Coupled Action Recognition and Pose Estimation from Multiple Views

Yao, A., Gall, J., van Gool, L.

International Journal of Computer Vision, 100(1):16-37, October 2012 (article)

ps

publisher's site code pdf Project Page Project Page Project Page [BibTex]

publisher's site code pdf Project Page Project Page Project Page [BibTex]


Thumb xl sinteltr
MPI-Sintel Optical Flow Benchmark: Supplemental Material

Butler, D. J., Wulff, J., Stanley, G. B., Black, M. J.

(No. 6), Max Planck Institute for Intelligent Systems, October 2012 (techreport)

ps

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Thumb xl sintelworkshop
Lessons and insights from creating a synthetic optical flow benchmark

Wulff, J., Butler, D. J., Stanley, G. B., Black, M. J.

In ECCV Workshop on Unsolved Problems in Optical Flow and Stereo Estimation, pages: 168-177, Part II, LNCS 7584, (Editors: A. Fusiello et al. (Eds.)), Springer-Verlag, October 2012 (inproceedings)

ps

pdf dataset poster youtube Project Page [BibTex]

pdf dataset poster youtube Project Page [BibTex]


Thumb xl tripod seq 16 054 part 3d vis
3D2PM – 3D Deformable Part Models

Pepik, B., Gehler, P., Stark, M., Schiele, B.

In Proceedings of the European Conference on Computer Vision (ECCV), pages: 356-370, Lecture Notes in Computer Science, (Editors: Fitzgibbon, Andrew W. and Lazebnik, Svetlana and Perona, Pietro and Sato, Yoichi and Schmid, Cordelia), Springer, Firenze, October 2012 (inproceedings)

ps

pdf video poster Project Page [BibTex]

pdf video poster Project Page [BibTex]


Thumb xl sinteleccv2012crop
A naturalistic open source movie for optical flow evaluation

Butler, D. J., Wulff, J., Stanley, G. B., Black, M. J.

In European Conf. on Computer Vision (ECCV), pages: 611-625, Part IV, LNCS 7577, (Editors: A. Fitzgibbon et al. (Eds.)), Springer-Verlag, October 2012 (inproceedings)

Abstract
Ground truth optical flow is difficult to measure in real scenes with natural motion. As a result, optical flow data sets are restricted in terms of size, complexity, and diversity, making optical flow algorithms difficult to train and test on realistic data. We introduce a new optical flow data set derived from the open source 3D animated short film Sintel. This data set has important features not present in the popular Middlebury flow evaluation: long sequences, large motions, specular reflections, motion blur, defocus blur, and atmospheric effects. Because the graphics data that generated the movie is open source, we are able to render scenes under conditions of varying complexity to evaluate where existing flow algorithms fail. We evaluate several recent optical flow algorithms and find that current highly-ranked methods on the Middlebury evaluation have difficulty with this more complex data set suggesting further research on optical flow estimation is needed. To validate the use of synthetic data, we compare the image- and flow-statistics of Sintel to those of real films and videos and show that they are similar. The data set, metrics, and evaluation website are publicly available.

ps

pdf dataset youtube talk supplemental material Project Page Project Page [BibTex]

pdf dataset youtube talk supplemental material Project Page Project Page [BibTex]


Thumb xl embs2012
A framework for relating neural activity to freely moving behavior

Foster, J. D., Nuyujukian, P., Freifeld, O., Ryu, S., Black, M. J., Shenoy, K. V.

In 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC’12), pages: 2736 -2739 , IEEE, San Diego, August 2012 (inproceedings)

ps

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Thumb xl screen shot 2012 06 25 at 1.59.41 pm
Pottics – The Potts Topic Model for Semantic Image Segmentation

Dann, C., Gehler, P., Roth, S., Nowozin, S.

In Proceedings of 34th DAGM Symposium, pages: 397-407, Lecture Notes in Computer Science, (Editors: Pinz, Axel and Pock, Thomas and Bischof, Horst and Leberl, Franz), Springer, August 2012 (inproceedings)

ps

code pdf poster [BibTex]

code pdf poster [BibTex]


Thumb xl thumb hennigk2012
Quasi-Newton Methods: A New Direction

Hennig, P., Kiefel, M.

In Proceedings of the 29th International Conference on Machine Learning, pages: 25-32, ICML ’12, (Editors: John Langford and Joelle Pineau), Omnipress, New York, NY, USA, ICML, July 2012 (inproceedings)

Abstract
Four decades after their invention, quasi- Newton methods are still state of the art in unconstrained numerical optimization. Although not usually interpreted thus, these are learning algorithms that fit a local quadratic approximation to the objective function. We show that many, including the most popular, quasi-Newton methods can be interpreted as approximations of Bayesian linear regression under varying prior assumptions. This new notion elucidates some shortcomings of classical algorithms, and lights the way to a novel nonparametric quasi-Newton method, which is able to make more efficient use of available information at computational cost similar to its predecessors.

ei ps pn

website+code pdf link (url) [BibTex]

website+code pdf link (url) [BibTex]


Thumb xl representativecrop
DRAPE: DRessing Any PErson

Guan, P., Reiss, L., Hirshberg, D., Weiss, A., Black, M. J.

ACM Trans. on Graphics (Proc. SIGGRAPH), 31(4):35:1-35:10, July 2012 (article)

Abstract
We describe a complete system for animating realistic clothing on synthetic bodies of any shape and pose without manual intervention. The key component of the method is a model of clothing called DRAPE (DRessing Any PErson) that is learned from a physics-based simulation of clothing on bodies of different shapes and poses. The DRAPE model has the desirable property of "factoring" clothing deformations due to body shape from those due to pose variation. This factorization provides an approximation to the physical clothing deformation and greatly simplifies clothing synthesis. Given a parameterized model of the human body with known shape and pose parameters, we describe an algorithm that dresses the body with a garment that is customized to fit and possesses realistic wrinkles. DRAPE can be used to dress static bodies or animated sequences with a learned model of the cloth dynamics. Since the method is fully automated, it is appropriate for dressing large numbers of virtual characters of varying shape. The method is significantly more efficient than physical simulation.

ps

YouTube pdf talk Project Page Project Page [BibTex]

YouTube pdf talk Project Page Project Page [BibTex]


Thumb xl deqingthesisteaser
From Pixels to Layers: Joint Motion Estimation and Segmentation

Sun, D.

Brown University, Department of Computer Science, July 2012 (phdthesis)

ps

pdf [BibTex]

pdf [BibTex]


Thumb xl frompstods2
From pictorial structures to deformable structures

Zuffi, S., Freifeld, O., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 3546-3553, IEEE, June 2012 (inproceedings)

Abstract
Pictorial Structures (PS) define a probabilistic model of 2D articulated objects in images. Typical PS models assume an object can be represented by a set of rigid parts connected with pairwise constraints that define the prior probability of part configurations. These models are widely used to represent non-rigid articulated objects such as humans and animals despite the fact that such objects have parts that deform non-rigidly. Here we define a new Deformable Structures (DS) model that is a natural extension of previous PS models and that captures the non-rigid shape deformation of the parts. Each part in a DS model is represented by a low-dimensional shape deformation space and pairwise potentials between parts capture how the shape varies with pose and the shape of neighboring parts. A key advantage of such a model is that it more accurately models object boundaries. This enables image likelihood models that are more discriminative than previous PS likelihoods. This likelihood is learned using training imagery annotated using a DS “puppet.” We focus on a human DS model learned from 2D projections of a realistic 3D human body model and use it to infer human poses in images using a form of non-parametric belief propagation.

ps

pdf sup mat code poster Project Page Project Page Project Page Project Page [BibTex]

pdf sup mat code poster Project Page Project Page Project Page Project Page [BibTex]


Thumb xl screen shot 2012 03 22 at 17.51.07
Teaching 3D Geometry to Deformable Part Models

Pepik, B., Stark, M., Gehler, P., Schiele, B.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 3362 -3369, IEEE, Providence, RI, USA, June 2012, oral presentation (inproceedings)

ps

pdf DOI Project Page [BibTex]

pdf DOI Project Page [BibTex]


Thumb xl jneuroscicrop
Visual Orientation and Directional Selectivity Through Thalamic Synchrony

Stanley, G., Jin, J., Wang, Y., Desbordes, G., Wang, Q., Black, M., Alonso, J.

Journal of Neuroscience, 32(26):9073-9088, June 2012 (article)

Abstract
Thalamic neurons respond to visual scenes by generating synchronous spike trains on the timescale of 10–20 ms that are very effective at driving cortical targets. Here we demonstrate that this synchronous activity contains unexpectedly rich information about fundamental properties of visual stimuli. We report that the occurrence of synchronous firing of cat thalamic cells with highly overlapping receptive fields is strongly sensitive to the orientation and the direction of motion of the visual stimulus. We show that this stimulus selectivity is robust, remaining relatively unchanged under different contrasts and temporal frequencies (stimulus velocities). A computational analysis based on an integrate-and-fire model of the direct thalamic input to a layer 4 cortical cell reveals a strong correlation between the degree of thalamic synchrony and the nonlinear relationship between cortical membrane potential and the resultant firing rate. Together, these findings suggest a novel population code in the synchronous firing of neurons in the early visual pathway that could serve as the substrate for establishing cortical representations of the visual scene.

ps

preprint publisher's site Project Page [BibTex]

preprint publisher's site Project Page [BibTex]


Thumb xl facialfeature
Real-time Facial Feature Detection using Conditional Regression Forests

Dantone, M., Gall, J., Fanelli, G., van Gool, L.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 2578-2585, IEEE, Providence, RI, USA, 2012 (inproceedings)

ps

code pdf Project Page [BibTex]

code pdf Project Page [BibTex]


Thumb xl lht
Latent Hough Transform for Object Detection

Razavi, N., Gall, J., Kohli, P., van Gool, L.

In European Conference on Computer Vision (ECCV), 7574, pages: 312-325, LNCS, Springer, 2012 (inproceedings)

ps

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Thumb xl destflow
Destination Flow for Crowd Simulation

Pellegrini, S., Gall, J., Sigal, L., van Gool, L.

In Workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Streams, 7585, pages: 162-171, LNCS, Springer, 2012 (inproceedings)

ps

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Thumb xl soumyanips
From Deformations to Parts: Motion-based Segmentation of 3D Objects

Ghosh, S., Sudderth, E., Loper, M., Black, M.

In Advances in Neural Information Processing Systems 25 (NIPS), pages: 2006-2014, (Editors: P. Bartlett and F.C.N. Pereira and C.J.C. Burges and L. Bottou and K.Q. Weinberger), MIT Press, 2012 (inproceedings)

Abstract
We develop a method for discovering the parts of an articulated object from aligned meshes of the object in various three-dimensional poses. We adapt the distance dependent Chinese restaurant process (ddCRP) to allow nonparametric discovery of a potentially unbounded number of parts, while simultaneously guaranteeing a spatially connected segmentation. To allow analysis of datasets in which object instances have varying 3D shapes, we model part variability across poses via affine transformations. By placing a matrix normal-inverse-Wishart prior on these affine transformations, we develop a ddCRP Gibbs sampler which tractably marginalizes over transformation uncertainty. Analyzing a dataset of humans captured in dozens of poses, we infer parts which provide quantitatively better deformation predictions than conventional clustering methods.

ps

pdf supplemental code poster link (url) Project Page [BibTex]

pdf supplemental code poster link (url) Project Page [BibTex]


Thumb xl cells
Interactive Object Detection

Yao, A., Gall, J., Leistner, C., van Gool, L.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 3242-3249, IEEE, Providence, RI, USA, 2012 (inproceedings)

ps

video pdf Project Page [BibTex]

video pdf Project Page [BibTex]


Thumb xl hands
Motion Capture of Hands in Action using Discriminative Salient Points

Ballan, L., Taneja, A., Gall, J., van Gool, L., Pollefeys, M.

In European Conference on Computer Vision (ECCV), 7577, pages: 640-653, LNCS, Springer, 2012 (inproceedings)

ps

data video pdf supplementary Project Page [BibTex]

data video pdf supplementary Project Page [BibTex]


Thumb xl selfsimilarity small
Sparsity Potentials for Detecting Objects with the Hough Transform

Razavi, N., Alvar, N., Gall, J., van Gool, L.

In British Machine Vision Conference (BMVC), pages: 11.1-11.10, (Editors: Bowden, Richard and Collomosse, John and Mikolajczyk, Krystian), BMVA Press, 2012 (inproceedings)

ps

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Thumb xl multiclasshf
An Introduction to Random Forests for Multi-class Object Detection

Gall, J., Razavi, N., van Gool, L.

In Outdoor and Large-Scale Real-World Scene Analysis, 7474, pages: 243-263, LNCS, (Editors: Dellaert, Frank and Frahm, Jan-Michael and Pollefeys, Marc and Rosenhahn, Bodo and Leal-Taix’e, Laura), Springer, 2012 (incollection)

ps

code code for Hough forest publisher's site pdf Project Page [BibTex]

code code for Hough forest publisher's site pdf Project Page [BibTex]


Thumb xl metricpose
Metric Learning from Poses for Temporal Clustering of Human Motion

L’opez-M’endez, A., Gall, J., Casas, J., van Gool, L.

In British Machine Vision Conference (BMVC), pages: 49.1-49.12, (Editors: Bowden, Richard and Collomosse, John and Mikolajczyk, Krystian), BMVA Press, 2012 (inproceedings)

ps

video pdf Project Page Project Page [BibTex]

video pdf Project Page Project Page [BibTex]


Thumb xl objectproposal
Local Context Priors for Object Proposal Generation

Ristin, M., Gall, J., van Gool, L.

In Asian Conference on Computer Vision (ACCV), 7724, pages: 57-70, LNCS, Springer-Verlag, 2012 (inproceedings)

ps

pdf DOI Project Page [BibTex]

pdf DOI Project Page [BibTex]


Thumb xl kinectbookchap
Home 3D body scans from noisy image and range data

Weiss, A., Hirshberg, D., Black, M. J.

In Consumer Depth Cameras for Computer Vision: Research Topics and Applications, pages: 99-118, 6, (Editors: Andrea Fossati and Juergen Gall and Helmut Grabner and Xiaofeng Ren and Kurt Konolige), Springer-Verlag, 2012 (incollection)

ps

Project Page [BibTex]

Project Page [BibTex]