
Semi-Supervised Learning of Multi-Object 3D Scene Representations




Representing scenes at the granularity of objects is a prerequisite for scene understanding and decision making. We propose a novel approach for learning multi-object 3D scene representations from images. A recurrent encoder regresses a latent representation of the 3D shape, pose, and texture of each object from an input RGB image. The 3D shapes are represented continuously in function space as signed distance functions (SDFs), which we efficiently pre-train on example shapes in a supervised way. Using differentiable rendering, we then train our model to decompose scenes in a self-supervised manner from RGB-D images. Our approach learns to decompose images into the constituent objects of the scene and to infer their shape, pose, and texture from a single view. We evaluate the accuracy of our model in inferring the 3D scene layout and demonstrate its generative capabilities.
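The abstract builds on two ideas: objects represented as signed distance functions, and rendering depth from those functions. A small example may help readers unfamiliar with them.

The sketch below is illustrative only and makes several assumptions:

- Analytic spheres stand in for the learned per-object SDF decoders. Each sphere is given as a center and radius, and the `sphere_sdf`, `scene_sdf` and `sphere_trace` names are hypothetical.
- The scene SDF is formed as the minimum over the objects.
- Depth is recovered by plain, non-differentiable sphere tracing.

The abstract does not specify the paper's actual renderer. Recording which object the ray reaches at each hit gives a per-pixel object index, which loosely mirrors the idea of decomposing an image into its constituent objects.

```python
import math

# Hypothetical stand-in for a learned per-object SDF decoder: an analytic sphere.
def sphere_sdf(p, center, radius):
    """Signed distance from point p to a sphere: negative inside, positive outside."""
    return math.dist(p, center) - radius

def scene_sdf(p, objects):
    """Scene as the union of objects: the minimum over per-object SDFs.
    Returns (distance, index of the closest object)."""
    best_d, best_i = math.inf, -1
    for i, (center, radius) in enumerate(objects):
        d = sphere_sdf(p, center, radius)
        if d < best_d:
            best_d, best_i = d, i
    return best_d, best_i

def sphere_trace(origin, direction, objects, max_steps=128, eps=1e-4, max_depth=100.0):
    """March along a unit-length ray, stepping by the scene SDF value.
    Returns (depth, object index) on a hit, or (None, -1) on a miss."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        d, i = scene_sdf(p, objects)
        if d < eps:
            return t, i  # surface reached: depth along the ray and which object was hit
        t += d  # safe step: no surface lies closer than d
        if t > max_depth:
            break
    return None, -1

# Two unit spheres in front of a camera at the origin looking along +z.
objects = [((0.0, 0.0, 5.0), 1.0), ((3.0, 0.0, 5.0), 1.0)]
print(sphere_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), objects))  # → (4.0, 0)
```

In a learned model, the analytic `sphere_sdf` would be replaced by a network conditioned on each object's latent shape code and pose. Training from RGB-D images would then require a differentiable rendering step rather than this plain loop.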

Author(s): Cathrin Elich and Martin R Oswald and Marc Pollefeys and Joerg Stueckler
Journal: CoRR
Volume: abs/2010.04030
Year: 2020

Department(s): Embodied Vision
Bibtex Type: Article (article)

URL: https://arxiv.org/abs/2010.04030


  title = {Semi-Supervised Learning of Multi-Object 3D Scene Representations},
  author = {Elich, Cathrin and Oswald, Martin R and Pollefeys, Marc and Stueckler, Joerg},
  journal = {CoRR},
  volume = {abs/2010.04030},
  year = {2020},
  url = {https://arxiv.org/abs/2010.04030}