I am a visiting student from University of Maryland where I work with my advisor David Jacobs.
My work focuses on model-based 3D reconstruction of animals from a single image. Animals are exciting objects to reconstruct because they are articulated, deform non-rigidly and often come with heavy self-occlusion.
I believe the key to solving this problem is having a strong prior on object pose and shape, which can resolve many of the ambiguities. I am interested in how we can learn such models and how to use them efficiently. I am also interested in exploring how deep learning methods can be used for building such models.
In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)
There has been significant work on learning realistic, articulated, 3D models of the human body. In contrast, there are few such models of animals, despite many applications. The main challenge is that animals are much less cooperative than humans. The best human body models are learned from thousands of 3D scans of people in specific poses, which is infeasible with live animals. Consequently,
we learn our model from a small set of 3D scans of toy figurines in arbitrary poses. We employ a novel part-based shape model to compute an initial registration to the scans. We then normalize their pose, learn a statistical shape model, and refine the registrations and the model together. In this way, we accurately align animal scans from different quadruped families with very different shapes and poses. With the registration to a common template we learn a shape space representing animals including lions, cats, dogs, horses, cows and hippos. Animal shapes can be sampled from the model, posed, animated, and fit to data. We demonstrate generalization by fitting it to images of real animals including species not seen in training.
In Computer Vision – ECCV 2016, Lecture Notes in Computer Science, Springer International Publishing, 14th European Conference on Computer Vision, October 2016 (inproceedings)
We describe the first method to automatically estimate the 3D pose of the human body as well as its 3D shape from a single unconstrained image. We estimate a full 3D mesh and show that 2D joints alone carry a surprising amount of information about body shape. The problem is challenging because of the complexity of the human body, articulation, occlusion, clothing, lighting, and the inherent ambiguity in inferring 3D from 2D. To solve this, we first use a recently published CNN-based method, DeepCut, to predict (bottom-up) the 2D body joint locations. We then fit (top-down) a recently published statistical body shape model, called SMPL, to the 2D joints. We do so by minimizing an objective function that penalizes the error between the projected 3D model joints and detected 2D joints. Because SMPL captures correlations in human shape across the population, we are able to robustly fit it to very little data. We further leverage the 3D model to prevent solutions that cause interpenetration. We evaluate our method, SMPLify, on the Leeds Sports, HumanEva, and Human3.6M datasets, showing superior
pose accuracy with respect to the state of the art.
Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments and to use this understanding to design future systems