Publications | Max Planck Institute for Intelligent Systems

274 results

2023

Synchronizing Machine Learning Algorithms, Realtime Robotic Control and Simulated Environment with o80

Berenz, V., Widmaier, F., Guist, S., Schölkopf, B., Büchler, D.

Robot Software Architectures Workshop (RSA) 2023, ICRA, 2023 (techreport)

Abstract

Robotic applications require the integration of various modalities, encompassing perception, control of real robots and possibly the control of simulated environments. While the state-of-the-art robotic software solutions such as ROS 2 provide most of the required features, flexible synchronization between algorithms, data streams and control loops can be tedious. o80 is a versatile C++ framework for robotics which provides a shared memory model and a command framework for real-time critical systems. It enables expert users to set up complex robotic systems and generate Python bindings for scientists. o80's unique feature is its flexible synchronization between processes, including the traditional blocking commands and the novel "bursting mode", which allows user code to control the execution of the lower process control loop. This makes it particularly useful for setups that mix real and simulated environments.

arxiv poster link (url) [BibTex]

Challenging Common Assumptions in Multi-task Learning

Elich, C., Kirchdorfer, L., Köhler, J. M., Schott, L.

abs/2311.04698, CoRR/arxiv, 2023 (techreport)

paper link (url) [BibTex]

2022

Reconstructing Expressive 3D Humans from RGB Images

Choutas, V.

ETH Zurich, Max Planck Institute for Intelligent Systems and ETH Zurich, December 2022 (thesis)

Abstract

To interact with our environment, we need to adapt our body posture and grasp objects with our hands. During a conversation our facial expressions and hand gestures convey important non-verbal cues about our emotional state and intentions towards our fellow speakers. Thus, modeling and capturing 3D full-body shape and pose, hand articulation and facial expressions are necessary to create realistic human avatars for augmented and virtual reality. This is a complex task, due to the large number of degrees of freedom for articulation, body shape variance, occlusions from objects and self-occlusions from body parts, e.g. crossing our hands, and subject appearance. The community has thus far relied on expensive and cumbersome equipment, such as multi-view cameras or motion capture markers, to capture the 3D human body. While this approach is effective, it is limited to a small number of subjects and indoor scenarios.
Using monocular RGB cameras would greatly simplify the avatar creation process, thanks to their lower cost and ease of use. These advantages come at a price though, since RGB capture methods need to deal with occlusions, perspective ambiguity and large variations in subject appearance, in addition to all the challenges posed by full-body capture. In an attempt to simplify the problem, researchers generally adopt a divide-and-conquer strategy, estimating the body, face and hands with distinct methods using part-specific datasets and benchmarks. However, the hands and face constrain the body and vice-versa, e.g. the position of the wrist depends on the elbow, shoulder, etc.; the divide-and-conquer approach cannot utilize this constraint.
In this thesis, we aim to reconstruct the full 3D human body, using only readily accessible monocular RGB images. In a first step, we introduce a parametric 3D body model, called SMPL-X, that can represent full-body shape and pose, hand articulation and facial expression. Next, we present an iterative optimization method, named SMPLify-X, that fits SMPL-X to 2D image keypoints. While SMPLify-X can produce plausible results if the 2D observations are sufficiently reliable, it is slow and susceptible to initialization. To overcome these limitations, we introduce ExPose, a neural network regressor, that predicts SMPL-X parameters from an image using body-driven attention, i.e. by zooming in on the hands and face, after predicting the body. From the zoomed-in part images, dedicated part networks predict the hand and face parameters. ExPose combines the independent body, hand, and face estimates by trusting them equally. This approach, though, does not fully exploit the correlation between parts and fails in the presence of challenges such as occlusion or motion blur. Thus, we need a better mechanism to aggregate information from the full body and part images. PIXIE uses neural networks called moderators that learn to fuse information from these two image sets before predicting the final part parameters. Overall, the addition of the hands and face leads to noticeably more natural and expressive reconstructions.
Creating high-fidelity avatars from RGB images requires accurate estimation of 3D body shape. Although existing methods are effective at predicting body pose, they struggle with body shape. We identify the lack of proper training data as the cause. To overcome this obstacle, we propose to collect internet images from fashion model websites, together with anthropometric measurements. At the same time, we ask human annotators to rate images and meshes according to a pre-defined set of linguistic attributes. We then define mappings between measurements, linguistic shape attributes and 3D body shape. Equipped with these mappings, we train a neural network regressor, SHAPY, that predicts accurate 3D body shapes from a single RGB image. We observe that existing 3D shape benchmarks lack subject variety and/or ground-truth shape. Thus, we introduce a new benchmark, Human Bodies in the Wild (HBW), which contains images of humans and their corresponding 3D ground-truth body shape. SHAPY shows how we can overcome the lack of in-the-wild images with 3D shape annotations through easy-to-obtain anthropometric measurements and linguistic shape attributes.
Regressors that estimate 3D model parameters are robust and accurate, but often fail to tightly fit the observations. Optimization-based approaches tightly fit the data, by minimizing an energy function composed of a data term that penalizes deviations from the observations and priors that encode our knowledge of the problem. Finding the balance between these terms and implementing a performant version of the solver is a time-consuming and non-trivial task. Machine-learned continuous optimizers combine the benefits of both regression and optimization approaches. They learn the priors directly from data, avoiding the need for hand-crafted heuristics and loss term balancing, and benefit from optimized neural network frameworks for fast inference. Inspired by the classic Levenberg-Marquardt algorithm, we propose a neural optimizer that outperforms classic optimization, regression and hybrid optimization-regression approaches. Our proposed update rule uses a weighted combination of gradient descent and a network-predicted update. To show the versatility of the proposed method, we apply it to three other problems, namely full body estimation from (i) 2D keypoints, (ii) head and hand location from a head-mounted device and (iii) face tracking from dense 2D landmarks. Our method can easily be applied to new model fitting problems and offers a competitive alternative to well-tuned traditional model fitting pipelines, both in terms of accuracy and speed.
To summarize, we propose a new and richer representation of the human body, SMPL-X, that is able to jointly model the 3D human body pose and shape, facial expressions and hand articulation. We propose methods, SMPLify-X, ExPose and PIXIE, that estimate SMPL-X parameters from monocular RGB images, progressively improving the accuracy and realism of the predictions. To further improve reconstruction fidelity, we demonstrate how we can use easy-to-collect internet data and human annotations to overcome the lack of 3D shape data and train a model, SHAPY, that predicts accurate 3D body shape from a single RGB image. Finally, we propose a flexible learnable update rule for parametric human model fitting that outperforms both classic optimization and neural network approaches. This approach is easily applicable to a variety of problems, unlocking new applications in AR/VR scenarios.

pdf [BibTex]

Causality, causal digital twins, and their applications

Schölkopf, B.

Machine Learning for Science: Bridging Data-Driven and Mechanistic Modelling (Dagstuhl Seminar 22382), (Editors: Berens, Philipp and Cranmer, Kyle and Lawrence, Neil D. and von Luxburg, Ulrike and Montgomery, Jessica), September 2022 (talk)

link (url) DOI [BibTex]

Learning Plastic Matching of Robot Dynamics in Closed-Loop Central Pattern Generators: Data

Ruppert, F., Badri-Spröwitz, A.

Edmond, May 2022 (techreport)

link (url) DOI [BibTex]

Data for BirdBot Achieves Energy-Efficient Gait with Minimal Control Using Avian-Inspired Leg Clutching

Badri-Spröwitz, A., Sarvestani, A. A., Sitti, M., Daley, M. A.

Edmond, March 2022 (techreport)

DOI Project Page [BibTex]

Observability Analysis of Visual-Inertial Odometry with Online Calibration of Velocity-Control Based Kinematic Motion Models

Li, H., Stueckler, J.

abs/2204.06651, CoRR/arxiv, 2022 (techreport)

Abstract

In this paper, we analyze the observability of visual-inertial odometry (VIO) using stereo cameras with a velocity-control based kinematic motion model. Previous work shows that, in the general case, the global position and yaw are unobservable in a VIO system; additionally, roll and pitch also become unobservable if there is no rotation. We prove that, by integrating a planar motion constraint, roll and pitch become observable. We also show that the parameters of the motion model are observable.

link (url) [BibTex]

2021

Physically Plausible Tracking & Reconstruction of Dynamic Objects

Strecke, M., Stückler, J.

KIT Science Week Scientific Conference & DGR-Days 2021, October 2021 (talk)
