

2024


HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video

Fan, Z., Parelli, M., Kadoglou, M. E., Kocabas, M., Chen, X., Black, M. J., Hilliges, O.

Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2024 (conference)

ps

Paper Project Code [BibTex]

GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs

Gao, G., Liu, W., Chen, A., Geiger, A., Schölkopf, B.

The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024 (conference) Accepted

ei

[BibTex]

AMUSE: Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion

Chhatre, K., Daněček, R., Athanasiou, N., Becherini, G., Peters, C., Black, M. J., Bolkart, T.

Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2024 (conference) To be published

Abstract
Existing methods for synthesizing 3D human gestures from speech have shown promising results, but they do not explicitly model the impact of emotions on the generated gestures. Instead, these methods directly output animations from speech without control over the expressed emotion. To address this limitation, we present AMUSE, an emotional speech-driven body animation model based on latent diffusion. Our observation is that content (i.e., gestures related to speech rhythm and word utterances), emotion, and personal style are separable. To account for this, AMUSE maps the driving audio to three disentangled latent vectors: one for content, one for emotion, and one for personal style. A latent diffusion model, trained to generate gesture motion sequences, is then conditioned on these latent vectors. Once trained, AMUSE synthesizes 3D human gestures directly from speech with control over the expressed emotions and style by combining the content from the driving speech with the emotion and style of another speech sequence. Randomly sampling the noise of the diffusion model further generates variations of the gesture with the same emotional expressivity. Qualitative, quantitative, and perceptual evaluations demonstrate that AMUSE outputs realistic gesture sequences. Compared to the state of the art, the generated gestures are better synchronized with the speech content and better represent the emotion expressed by the input speech.
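
To make the conditioning scheme concrete, here is a minimal PyTorch-style sketch of a denoiser conditioned on the three disentangled audio latents; module sizes, names, and the conditioning-by-concatenation choice are illustrative assumptions, not the AMUSE architecture.

    import torch
    import torch.nn as nn

    class GestureDenoiser(nn.Module):
        """Toy latent-diffusion denoiser conditioned on disentangled audio latents."""
        def __init__(self, motion_dim=64, latent_dim=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(motion_dim + 3 * latent_dim + 1, 512),
                nn.SiLU(),
                nn.Linear(512, motion_dim),
            )

        def forward(self, noisy_motion, t, z_content, z_emotion, z_style):
            # Content comes from the driving speech; emotion and style may come
            # from a different speech sequence to transfer expressivity.
            cond = torch.cat([noisy_motion, z_content, z_emotion, z_style, t], dim=-1)
            return self.net(cond)  # predicted noise at diffusion step t

Sampling the diffusion noise with fixed latents would then yield gesture variations with the same emotional expressivity, as described above.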

ps

Project Paper Code link (url) [BibTex]

Out-of-Variable Generalization for Discriminative Models

Guo, S., Wildberger, J., Schölkopf, B.

Proceedings of the Twelfth International Conference on Learning Representations (ICLR), May 2024 (conference) Accepted

ei

arXiv [BibTex]

Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding

Pace, A., Yèche, H., Schölkopf, B., Rätsch, G., Tennenholtz, G.

Proceedings of the Twelfth International Conference on Learning Representations (ICLR), May 2024 (conference) Accepted

ei

arXiv [BibTex]

Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion

Meterez*, A., Joudaki*, A., Orabona, F., Immer, A., Rätsch, G., Daneshmand, H.

Proceedings of the Twelfth International Conference on Learning Representations (ICLR), May 2024, *equal contribution (conference) Accepted

ei

arXiv [BibTex]

The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks

Spieler, A., Rahaman, N., Martius, G., Schölkopf, B., Levina, A.

Proceedings of the Twelfth International Conference on Learning Representations (ICLR), May 2024 (conference) Accepted

ei al

arXiv [BibTex]

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Open X-Embodiment Collaboration (incl. Guist, S., Schneider, J., Schölkopf, B., Büchler, D.).

IEEE International Conference on Robotics and Automation (ICRA), May 2024 (conference) Accepted

ei

[BibTex]

Can Large Language Models Infer Causation from Correlation?

Jin, Z., Liu, J., Lyu, Z., Poff, S., Sachan, M., Mihalcea, R., Diab*, M., Schölkopf*, B.

Proceedings of the Twelfth International Conference on Learning Representations (ICLR), May 2024, *equal supervision (conference) Accepted

ei

arXiv [BibTex]

Certified private data release for sparse Lipschitz functions

Donhauser, K., Lokna, J., Sanyal, A., Boedihardjo, M., Hönig, R., Yang, F.

27th International Conference on Artificial Intelligence and Statistics (AISTATS), May 2024 (conference) Accepted

ei

[BibTex]

Ghost on the Shell: An Expressive Representation of General 3D Shapes

(Oral)

Liu, Z., Feng, Y., Xiu, Y., Liu, W., Paull, L., Black, M. J., Schölkopf, B.

In Proceedings of the Twelfth International Conference on Learning Representations (ICLR), May 2024 (inproceedings) Accepted

Abstract
The creation of photorealistic virtual worlds requires the accurate modeling of 3D surface geometry for a wide range of objects. For this, meshes are appealing since they 1) enable fast physics-based rendering with realistic material and lighting, 2) support physical simulation, and 3) are memory-efficient for modern graphics pipelines. Recent work on reconstructing and statistically modeling 3D shape, however, has critiqued meshes as being topologically inflexible. To capture a wide range of object shapes, any 3D representation must be able to model solid, watertight, shapes as well as thin, open, surfaces. Recent work has focused on the former, and methods for reconstructing open surfaces do not support fast reconstruction with material and lighting or unconditional generative modelling. Inspired by the observation that open surfaces can be seen as islands floating on watertight surfaces, we parameterize open surfaces by defining a manifold signed distance field on watertight templates. With this parameterization, we further develop a grid-based and differentiable representation that parameterizes both watertight and non-watertight meshes of arbitrary topology. Our new representation, called Ghost-on-the-Shell (G-Shell), enables two important applications: differentiable rasterization-based reconstruction from multiview images and generative modelling of non-watertight meshes. We empirically demonstrate that G-Shell achieves state-of-the-art performance on non-watertight mesh reconstruction and generation tasks, while also performing effectively for watertight meshes.
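
As a sketch of the parameterization described above (my notation, not the paper's): an open surface is the sub-level set of a scalar field defined on a watertight template,

\[ \mathcal{S}_{\text{open}} = \{\, x \in \mathcal{W} \;:\; m(x) \le 0 \,\}, \]

where \(\mathcal{W}\) is the watertight template surface and \(m\) is the manifold signed distance field defined on it; the boundary of the open surface is the zero level set of \(m\), and a watertight mesh is recovered when \(m\) is negative everywhere.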

ei ps

Home Code Video Project [BibTex]

Identifying Policy Gradient Subspaces

Schneider, J., Schumacher, P., Guist, S., Chen, L., Häufle, D., Schölkopf, B., Büchler, D.

Proceedings of the Twelfth International Conference on Learning Representations (ICLR), May 2024 (conference) Accepted

ei

arXiv [BibTex]

Emergent mechanisms for long timescales depend on training curriculum and affect performance in memory tasks

Khajehabdollahi, S., Zeraati, R., Giannakakis, E., Schäfer, T. J., Martius, G., Levina, A.

In The Twelfth International Conference on Learning Representations, ICLR 2024, May 2024 (inproceedings)

al

link (url) [BibTex]

Some Intriguing Aspects about Lipschitz Continuity of Neural Networks

Khromov*, G., Singh*, S. P.

Proceedings of the Twelfth International Conference on Learning Representations (ICLR), May 2024, *equal contribution (conference) Accepted

ei

arXiv [BibTex]

Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization

Liu, W., Qiu, Z., Feng, Y., Xiu, Y., Xue, Y., Yu, L., Feng, H., Liu, Z., Heo, J., Peng, S., Wen, Y., Black, M. J., Weller, A., Schölkopf, B.

In Proceedings of the Twelfth International Conference on Learning Representations (ICLR), May 2024 (inproceedings) Accepted

Abstract
Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter-efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language.
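
To illustrate the butterfly idea, the sketch below composes log2(d) sparse butterfly factors, each a set of disjoint 2x2 rotations, into a dense d x d orthogonal matrix using only (d/2)·log2(d) angles. This is a generic butterfly-orthogonal construction under assumed conventions, not BOFT's exact parameterization.

    import numpy as np

    def butterfly_orthogonal(thetas):
        """Compose log2(d) butterfly factors into a dense orthogonal matrix.

        thetas: (k, d//2) rotation angles with d = 2**k."""
        k, n_pairs = thetas.shape
        d = 2 * n_pairs
        assert d == 2 ** k
        Q = np.eye(d)
        for level in range(k):
            stride = 2 ** level
            B = np.eye(d)
            pair = 0
            for j in range(d):
                if (j // stride) % 2 == 0:            # "top" index of a pair
                    a, b = j, j + stride
                    c, s = np.cos(thetas[level, pair]), np.sin(thetas[level, pair])
                    B[a, a], B[a, b] = c, -s
                    B[b, a], B[b, b] = s, c
                    pair += 1
            Q = B @ Q                                 # product of sparse factors
        return Q

    Q = butterfly_orthogonal(np.random.randn(3, 4))   # d = 8
    print(np.allclose(Q @ Q.T, np.eye(8)))            # True: Q is orthogonal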

ei ps

Home Code HuggingFace project [BibTex]

Skill or Luck? Return Decomposition via Advantage Functions

Pan, H., Schölkopf, B.

Proceedings of the Twelfth International Conference on Learning Representations (ICLR), May 2024 (conference) Accepted

ei

[BibTex]

Transformer Fusion with Optimal Transport

Imfeld*, M., Graldi*, J., Giordano*, M., Hofmann, T., Anagnostidis, S., Singh, S. P.

Proceedings of the Twelfth International Conference on Learning Representations (ICLR), May 2024, *equal contribution (conference) Accepted

ei

arXiv [BibTex]

Learning Hierarchical World Models with Adaptive Temporal Abstractions from Discrete Latent Dynamics

Gumbsch, C., Sajid, N., Martius, G., Butz, M. V.

In The Twelfth International Conference on Learning Representations, ICLR 2024, May 2024 (inproceedings)

al

link (url) [BibTex]

Causal Modeling with Stationary Diffusions

Lorch, L., Krause*, A., Schölkopf*, B.

27th International Conference on Artificial Intelligence and Statistics (AISTATS), May 2024, *equal supervision (conference) Accepted

ei

[BibTex]

Multi-View Causal Representation Learning with Partial Observability

Yao, D., Xu, D., Lachapelle, S., Magliacane, S., Taslakian, P., Martius, G., von Kügelgen, J., Locatello, F.

Proceedings of the Twelfth International Conference on Learning Representations (ICLR), May 2024 (conference) Accepted

ei al

arXiv [BibTex]

Towards Meta-Pruning via Optimal Transport

Theus, A., Geimer, O., Wicke, F., Hofmann, T., Anagnostidis, S., Singh, S. P.

Proceedings of the Twelfth International Conference on Learning Representations (ICLR), May 2024 (conference) Accepted

ei

[BibTex]

Stochastic Gradient Descent for Gaussian Processes Done Right

Lin*, J. A., Padhy*, S., Antorán*, J., Tripp, A., Terenin, A., Szepesvari, C., Hernández-Lobato, J. M., Janz, D.

Proceedings of the Twelfth International Conference on Learning Representations (ICLR), May 2024, *equal contribution (conference) Accepted

ei

arXiv [BibTex]

PILLAR: How to make semi-private learning more effective

Hu, Y., Pinto, F., Yang, F., Sanyal, A.

2nd IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), April 2024 (conference) Accepted

ei

[BibTex]

Expert Perception of Teleoperated Social Exercise Robots

Mohan, M., Mat Husin, H., Kuchenbecker, K. J.

In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction (HRI), pages: 769-773, Boulder, USA, March 2024, Late-Breaking Report (LBR, 5 pages) (inproceedings)

Abstract
Social robots could help address the growing issue of physical inactivity by inspiring users to engage in interactive exercise. Nevertheless, the practical implementation of social exercise robots poses substantial challenges, particularly in terms of personalizing their activities to individuals. We propose that motion-capture-based teleoperation could serve as a viable solution to address these needs by enabling experts to record custom motions that could later be played back without their real-time involvement. To gather feedback about this idea, we conducted semi-structured interviews with eight exercise-therapy professionals. Our findings indicate that experts' attitudes toward social exercise robots become more positive when considering the prospect of teleoperation to record and customize robot behaviors.

hi

DOI Project Page [BibTex]

TADA! Text to Animatable Digital Avatars

Liao, T., Yi, H., Xiu, Y., Tang, J., Huang, Y., Thies, J., Black, M. J.

In International Conference on 3D Vision (3DV 2024), 3DV 2024, March 2024 (inproceedings) Accepted

Abstract
We introduce TADA, a simple-yet-effective approach that takes textual descriptions and produces expressive 3D avatars with high-quality geometry and lifelike textures, that can be animated and rendered with traditional graphics pipelines. Existing text-based character generation methods are limited in terms of geometry and texture quality, and cannot be realistically animated due to inconsistent alignment between the geometry and the texture, particularly in the face region. To overcome these limitations, TADA leverages the synergy of a 2D diffusion model and an animatable parametric body model. Specifically, we derive an optimizable high-resolution body model from SMPL-X with 3D displacements and a texture map, and use hierarchical rendering with score distillation sampling (SDS) to create high-quality, detailed, holistic 3D avatars from text. To ensure alignment between the geometry and texture, we render normals and RGB images of the generated character and exploit their latent embeddings in the SDS training process. We further introduce various expression parameters to deform the generated character during training, ensuring that the semantics of our generated character remain consistent with the original SMPL-X model, resulting in an animatable character. Comprehensive evaluations demonstrate that TADA significantly surpasses existing approaches on both qualitative and quantitative measures. TADA enables creation of large-scale digital character assets that are ready for animation and rendering, while also being easily editable through natural language. The code will be public for research purposes.
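
For reference, the score distillation sampling (SDS) objective mentioned above has the standard form

\[ \nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(x_t; y, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right], \]

where, in TADA's setting as described above, \(\theta\) are the avatar parameters (SMPL-X displacements and texture), \(x\) is a rendered normal or RGB image, \(x_t\) its noised version at timestep \(t\), \(y\) the text prompt, and \(\hat{\epsilon}_\phi\) the frozen 2D diffusion model's noise prediction; the weighting \(w(t)\) and conditioning details are assumptions here, not taken from the paper.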

ps

Home Code Video [BibTex]

POCO: 3D Pose and Shape Estimation using Confidence

Dwivedi, S. K., Schmid, C., Yi, H., Black, M. J., Tzionas, D.

In International Conference on 3D Vision (3DV 2024), 3DV 2024, March 2024 (inproceedings)

Abstract
The regression of 3D Human Pose and Shape (HPS) from an image is becoming increasingly accurate. This makes the results useful for downstream tasks like human action recognition or 3D graphics. Yet, no regressor is perfect, and accuracy can be affected by ambiguous image evidence or by poses and appearance that are unseen during training. Most current HPS regressors, however, do not report the confidence of their outputs, meaning that downstream tasks cannot differentiate accurate estimates from inaccurate ones. To address this, we develop POCO, a novel framework for training HPS regressors to estimate not only a 3D human body, but also their confidence, in a single feed-forward pass. Specifically, POCO estimates both the 3D body pose and a per-sample variance. The key idea is to introduce a Dual Conditioning Strategy (DCS) for regressing uncertainty that is highly correlated to pose reconstruction quality. The POCO framework can be applied to any HPS regressor and here we evaluate it by modifying HMR, PARE, and CLIFF. In all cases, training the network to reason about uncertainty helps it learn to more accurately estimate 3D pose. While this was not our goal, the improvement is modest but consistent. Our main motivation is to provide uncertainty estimates for downstream tasks; we demonstrate this in two ways: (1) We use the confidence estimates to bootstrap HPS training. Given unlabelled image data, we take the confident estimates of a POCO-trained regressor as pseudo ground truth. Retraining with this automatically-curated data improves accuracy. (2) We exploit uncertainty in video pose estimation by automatically identifying uncertain frames (e.g. due to occlusion) and inpainting these from confident frames.
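
As a generic illustration of regressing a per-sample variance alongside the pose (the standard heteroscedastic-regression idea; POCO's actual Dual Conditioning Strategy differs), a minimal confidence-aware loss could look like the sketch below. At test time, the predicted variance is the uncertainty used to select pseudo-ground-truth samples or to flag unreliable video frames.

    import torch

    def gaussian_nll_pose_loss(pred_pose, log_var, gt_pose):
        # The regressor outputs a per-sample log-variance in addition to the pose.
        # High-variance (low-confidence) samples are down-weighted, while the
        # log-variance penalty keeps the network from inflating uncertainty.
        sq_err = (pred_pose - gt_pose).pow(2).sum(dim=-1)
        return (0.5 * torch.exp(-log_var) * sq_err + 0.5 * log_var).mean()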

ps

Paper SupMat Poster link (url) [BibTex]

TECA: Text-Guided Generation and Editing of Compositional 3D Avatars

Zhang, H., Feng, Y., Kulits, P., Wen, Y., Thies, J., Black, M. J.

In International Conference on 3D Vision (3DV 2024), 3DV 2024, March 2024 (inproceedings) To be published

Abstract
Our goal is to create a realistic 3D facial avatar with hair and accessories using only a text description. While this challenge has attracted significant recent interest, existing methods either lack realism, produce unrealistic shapes, or do not support editing, such as modifications to the hairstyle. We argue that existing methods are limited because they employ a monolithic modeling approach, using a single representation for the head, face, hair, and accessories. Our observation is that the hair and face, for example, have very different structural qualities that benefit from different representations. Building on this insight, we generate avatars with a compositional model, in which the head, face, and upper body are represented with traditional 3D meshes, and the hair, clothing, and accessories with neural radiance fields (NeRF). The model-based mesh representation provides a strong geometric prior for the face region, improving realism while enabling editing of the person's appearance. By using NeRFs to represent the remaining components, our method is able to model and synthesize parts with complex geometry and appearance, such as curly hair and fluffy scarves. Our novel system synthesizes these high-quality compositional avatars from text descriptions. The experimental results demonstrate that our method, Text-guided generation and Editing of Compositional Avatars (TECA), produces avatars that are more realistic than those of recent methods while being editable because of their compositional nature. For example, our TECA enables the seamless transfer of compositional features like hairstyles, scarves, and other accessories between avatars. This capability supports applications such as virtual try-on.

ncs ps

arXiv project link (url) [BibTex]

TeCH: Text-guided Reconstruction of Lifelike Clothed Humans

Huang, Y., Yi, H., Xiu, Y., Liao, T., Tang, J., Cai, D., Thies, J.

In International Conference on 3D Vision (3DV 2024), 3DV 2024, March 2024 (inproceedings) Accepted

Abstract
Despite recent research advancements in reconstructing clothed humans from a single image, accurately restoring the "unseen regions" with high-level details remains an unsolved challenge that lacks attention. Existing methods often generate overly smooth back-side surfaces with a blurry texture. But how to effectively capture all visual attributes of an individual from a single image, which are sufficient to reconstruct unseen areas (e.g., the back view)? Motivated by the power of foundation models, TeCH reconstructs the 3D human by leveraging 1) descriptive text prompts (e.g., garments, colors, hairstyles) which are automatically generated via a garment parsing model and Visual Question Answering (VQA), 2) a personalized fine-tuned Text-to-Image diffusion model (T2I) which learns the "indescribable" appearance. To represent high-resolution 3D clothed humans at an affordable cost, we propose a hybrid 3D representation based on DMTet, which consists of an explicit body shape grid and an implicit distance field. Guided by the descriptive prompts + personalized T2I diffusion model, the geometry and texture of the 3D humans are optimized through multi-view Score Distillation Sampling (SDS) and reconstruction losses based on the original observation. TeCH produces high-fidelity 3D clothed humans with consistent & delicate texture, and detailed full-body geometry. Quantitative and qualitative experiments demonstrate that TeCH outperforms the state-of-the-art methods in terms of reconstruction accuracy and rendering quality.

ps

Code Home Video arXiv [BibTex]

ArtiGrasp: Physically Plausible Synthesis of Bi-Manual Dexterous Grasping and Articulation

Zhang, H., Christen, S., Fan, Z., Zheng, L., Hwangbo, J., Song, J., Hilliges, O.

In International Conference on 3D Vision (3DV 2024), 3DV 2024, March 2024 (inproceedings) Accepted

Abstract
We present ArtiGrasp, a novel method to synthesize bi-manual hand-object interactions that include grasping and articulation. This task is challenging due to the diversity of the global wrist motions and the precise finger control that are necessary to articulate objects. ArtiGrasp leverages reinforcement learning and physics simulations to train a policy that controls the global and local hand pose. Our framework unifies grasping and articulation within a single policy guided by a single hand pose reference. Moreover, to facilitate the training of the precise finger control required for articulation, we present a learning curriculum with increasing difficulty. It starts with single-hand manipulation of stationary objects and continues with multi-agent training including both hands and non-stationary objects. To evaluate our method, we introduce Dynamic Object Grasping and Articulation, a task that involves bringing an object into a target articulated pose. This task requires grasping, relocation, and articulation. We show our method's efficacy towards this task. We further demonstrate that our method can generate motions with noisy hand-object pose estimates from an off-the-shelf image-based regressor.

ps

pdf project code [BibTex]

Adapting a High-Fidelity Simulation of Human Skin for Comparative Touch Sensing in the Elephant Trunk

Schulz, A., Serhat, G., Kuchenbecker, K. J.

Abstract presented at the Society for Integrative and Comparative Biology Annual Meeting (SICB), Seattle, USA, January 2024 (misc)

Abstract
Skin is a complex biological composite consisting of layers with distinct mechanical properties, morphologies, and mechanosensory capabilities. This work seeks to expand the comparative biomechanics field to comparative haptics, analyzing elephant trunk touch by redesigning a previously published human finger-pad model with morphological parameters measured from an elephant trunk. The dorsal surface of the elephant trunk has a thick, wrinkled epidermis covered with whiskers at the distal tip and deep folds at the proximal base. We hypothesize that this thick dorsal skin protects the trunk from mechanical damage but significantly dulls its tactile sensing ability. To facilitate safe and dexterous motion, the distributed dorsal whiskers might serve as pre-touch antennae, transmitting an amplified version of impending contact to the mechanoreceptors beneath the elephant's armor. We tested these hypotheses by simulating soft tissue deformation through high-fidelity finite element analyses involving representative skin layers and whiskers, modeled based on frozen African elephant trunk (Loxodonta africana) morphology. For a typical contact force, quintupling the stratum corneum thickness to match dorsal trunk skin reduces the von Mises stress communicated to the dermis by 18%. However, adding a whisker offsets this dulled sensing, as hypothesized, amplifying the stress by more than 15 at the same location. We hope this work will motivate further investigations of mammalian touch using approaches and models from the ample literature on human touch.

hi

[BibTex]

Adversarial Likelihood Estimation With One-Way Flows

Ben-Dov, O., Gupta, P. S., Abrevaya, V., Black, M. J., Ghosh, P.

In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages: 3779-3788, January 2024 (inproceedings)

Abstract
Generative Adversarial Networks (GANs) can produce high-quality samples, but do not provide an estimate of the probability density around the samples. However, it has been noted that maximizing the log-likelihood within an energy-based setting can lead to an adversarial framework where the discriminator provides unnormalized density (often called energy). We further develop this perspective, incorporate importance sampling, and show that 1) Wasserstein GAN performs a biased estimate of the partition function, and we propose instead to use an unbiased estimator; and 2) when optimizing for likelihood, one must maximize generator entropy. This is hypothesized to provide a better mode coverage. Different from previous works, we explicitly compute the density of the generated samples. This is the key enabler to designing an unbiased estimator of the partition function and computation of the generator entropy term. The generator density is obtained via a new type of flow network, called one-way flow network, that is less constrained in terms of architecture, as it does not require a tractable inverse function. Our experimental results show that our method converges faster, produces comparable sample quality to GANs with similar architecture, successfully avoids over-fitting to commonly used datasets and produces smooth low-dimensional latent representations of the training data.

ps

pdf arXiv [BibTex]

MPI-10: Haptic-Auditory Measurements from Tool-Surface Interactions

Khojasteh, B., Shao, Y., Kuchenbecker, K. J.

Dataset published as a companion to the journal article "Robust Surface Recognition with the Maximum Mean Discrepancy: Degrading Haptic-Auditory Signals through Bandwidth and Noise" in IEEE Transactions on Haptics, January 2024 (misc)

hi

DOI Project Page [BibTex]

Whiskers That Don’t Whisk: Unique Structure From the Absence of Actuation in Elephant Whiskers

Schulz, A., Kaufmann, L., Brecht, M., Richter, G., Kuchenbecker, K. J.

Abstract presented at the Society for Integrative and Comparative Biology Annual Meeting (SICB), Seattle, USA, January 2024 (misc)

Abstract
Whiskers are so named because these hairs often actuate circularly, whisking, via collagen wrapping at the root of the hair follicle to increase their sensing volumes. Elephant trunks are a unique case study for whiskers, as the dorsal and lateral sections of the elephant proboscis have scattered sensory hairs that lack individual actuation. We hypothesize that the actuation limitations of these non-whisking whiskers led to anisotropic morphology and non-homogeneous composition to meet the animal's sensory needs. To test these hypotheses, we examined trunk whiskers from a 35-year-old female African savannah elephant (Loxodonta africana). Whisker morphology was evaluated through micro-CT and polarized light microscopy. The whiskers from the distal tip of the trunk were found to be axially asymmetric, with an ovular cross-section at the root, shifting to a near-square cross-section at the point. Nanoindentation and additional microscopy revealed that elephant whiskers have a composition unlike any other mammalian hair ever studied: we recorded an elastic modulus of 3 GPa at the root and 0.05 GPa at the point of a single 4-cm-long whisker. This work challenges the assumption that hairs have circular cross-sections and isotropic mechanical properties. With such striking differences compared to other mammals, including the mouse (Mus musculus), rat (Rattus norvegicus), and cat (Felis catus), we conclude that whisker morphology and composition play distinct and complementary roles in elephant trunk mechanosensing.

zwe-csfm hi

[BibTex]

Physics-Based Rigid Body Object Tracking and Friction Filtering From RGB-D Videos

Kandukuri, R. K., Strecke, M., Stueckler, J.

In International Conference on 3D Vision (3DV), 2024, preprint arXiv:2309.15703 (inproceedings) Accepted

Abstract
Physics-based understanding of object interactions from sensory observations is an essential capability in augmented reality and robotics. It enables capturing the properties of a scene for simulation and control. In this paper, we propose a novel approach for real-to-sim which tracks rigid objects in 3D from RGB-D images and infers physical properties of the objects. We use a differentiable physics simulation as state-transition model in an Extended Kalman Filter which can model contact and friction for arbitrary mesh-based shapes and in this way estimate physically plausible trajectories. We demonstrate that our approach can filter position, orientation, velocities, and concurrently can estimate the coefficient of friction of the objects. We analyse our approach on various sliding scenarios in synthetic image sequences of single objects and colliding objects. We also demonstrate and evaluate our approach on a real-world dataset. We will make our novel benchmark datasets publicly available to foster future research in this novel problem setting and comparison with our method.
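
A minimal sketch of the filtering idea: a generic EKF prediction step in which a black-box transition function stands in for one step of the physics simulator. The Jacobian is taken numerically here, whereas the paper's differentiable simulation provides it directly; this is an illustration, not the paper's implementation.

    import numpy as np

    def ekf_predict(mu, P, f, Q, eps=1e-5):
        # mu, P: state mean and covariance; f: transition function (e.g., one
        # simulator step with contact and friction); Q: process-noise covariance.
        n = mu.size
        F = np.zeros((n, n))
        for i in range(n):
            d = np.zeros(n); d[i] = eps
            F[:, i] = (f(mu + d) - f(mu - d)) / (2 * eps)   # numerical Jacobian
        return f(mu), F @ P @ F.T + Q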

ev

preprint supplemental video dataset [BibTex]

Natural and Robust Walking using Reinforcement Learning without Demonstrations in High-Dimensional Musculoskeletal Models
2024 (misc)

Abstract
Humans excel at robust bipedal walking in complex natural environments. In each step, they adequately tune the interaction of biomechanical muscle dynamics and neuronal signals to be robust against uncertainties in ground conditions. However, it is still not fully understood how the nervous system resolves the musculoskeletal redundancy to solve the multi-objective control problem considering stability, robustness, and energy efficiency. In computer simulations, energy minimization has been shown to be a successful optimization target, reproducing natural walking with trajectory optimization or reflex-based control methods. However, these methods focus on particular motions at a time and the resulting controllers are limited when compensating for perturbations. In robotics, reinforcement learning (RL) methods recently achieved highly stable (and efficient) locomotion on quadruped systems, but the generation of human-like walking with bipedal biomechanical models has required extensive use of expert data sets. This strong reliance on demonstrations often results in brittle policies and limits the application to new behaviors, especially considering the potential variety of movements for high-dimensional musculoskeletal models in 3D. Achieving natural locomotion with RL without sacrificing its incredible robustness might pave the way for a novel approach to studying human walking in complex natural environments. Videos: https://sites.google.com/view/naturalwalkingrl

al

link (url) [BibTex]


Online Calibration of a Single-Track Ground Vehicle Dynamics Model by Tight Fusion with Visual-Inertial Odometry

Li, H., Stueckler, J.

In IEEE International Conference on Robotics and Automation (ICRA), 2024, preprint arXiv:2309.11148 (inproceedings) Accepted

Abstract
Wheeled mobile robots need the ability to estimate their motion and the effect of their control actions for navigation planning. In this paper, we present ST-VIO, a novel approach which tightly fuses a single-track dynamics model for wheeled ground vehicles with visual-inertial odometry. Our method calibrates and adapts the dynamics model online and facilitates accurate forward prediction conditioned on future control inputs. The single-track dynamics model approximates wheeled vehicle motion under specific control inputs on flat ground using ordinary differential equations. We use a singularity-free and differentiable variant of the single-track model to enable seamless integration as a dynamics factor into VIO and to optimize the model parameters online together with the VIO state variables. We validate our method with real-world data in both indoor and outdoor environments with different terrain types and wheels. In our experiments, we demonstrate that ST-VIO can not only adapt to changes in the environment and achieve accurate prediction under new control inputs, but even improve the tracking accuracy.
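
For intuition, a toy kinematic single-track (bicycle) model is sketched below; the paper uses a singularity-free, differentiable dynamics formulation whose state layout and parameters differ, so the wheelbase and state ordering here are assumptions for illustration only.

    import numpy as np

    def single_track_step(state, control, dt, wheelbase=0.3):
        # state = [x, y, yaw, v]; control = [acceleration, steering angle].
        x, y, yaw, v = state
        acc, delta = control
        x += v * np.cos(yaw) * dt
        y += v * np.sin(yaw) * dt
        yaw += v / wheelbase * np.tan(delta) * dt
        v += acc * dt
        return np.array([x, y, yaw, v])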

ev

preprint supplemental video [BibTex]

Discrete Fourier transform three-to-one (DFT321): Code

Landin, N., Romano, J. M., McMahan, W., Kuchenbecker, K. J.

MATLAB code of the discrete Fourier transform three-to-one (DFT321), 2024 (misc)
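
A NumPy sketch of the DFT321 idea, written from the published description rather than transcribed from the released MATLAB code: the three orthogonal vibration axes are combined into one signal whose spectrum preserves the summed per-frequency energy, with phase taken from the summed spectrum.

    import numpy as np

    def dft321(a):
        # a: (N, 3) array of accelerations along three orthogonal axes.
        A = np.fft.rfft(a, axis=0)                      # per-axis spectra
        mag = np.sqrt(np.sum(np.abs(A) ** 2, axis=1))   # energy-preserving magnitude
        phase = np.angle(np.sum(A, axis=1))             # phase of the summed spectrum
        return np.fft.irfft(mag * np.exp(1j * phase), n=a.shape[0])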

hi

Code Project Page [BibTex]


2023


Transitivity Recovering Decompositions: Interpretable and Robust Fine-Grained Relationships

Chaudhuri, A., Mancini, M., Akata, Z., Dutta, A.

Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 37th Annual Conference on Neural Information Processing Systems, December 2023 (conference) Accepted

ei

[BibTex]

Spuriosity Didn’t Kill the Classifier: Using Invariant Predictions to Harness Spurious Features

Eastwood*, C., Singh*, S., Nicolicioiu, A. L., Vlastelica, M., von Kügelgen, J., Schölkopf, B.

Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36, pages: 18291-18324, (Editors: A. Oh and T. Neumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023, *equal contribution (conference)

ei

link (url) [BibTex]

CLadder: A Benchmark to Assess Causal Reasoning Capabilities of Language Models

Jin*, Z., Chen*, Y., Leeb*, F., Gresele*, L., Kamal, O., Lyu, Z., Blin, K., Gonzalez, F., Kleiman-Weiner, M., Sachan, M., Schölkopf, B.

Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36, pages: 31038-31065, (Editors: A. Oh and T. Neumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023, *main contributors (conference)

ei

link (url) [BibTex]

Online Learning under Adversarial Nonlinear Constraints

Kolev, P., Martius, G., Muehlebach, M.

In Advances in Neural Information Processing Systems 36, December 2023 (inproceedings)

al

link (url) [BibTex]

Generalized Bayesian Inference for Scientific Simulators via Amortized Cost Estimation

Gao*, R., Deistler*, M., Macke, J. H.

Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36, pages: 80191-80219, (Editors: A. Oh and T. Neumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023, *equal contribution (conference)

ei

link (url) [BibTex]

Meta-in-context learning in large language models

Coda-Forno, J., Binz, M., Akata, Z., Botvinick, M., Wang, J. X., Schulz, E.

Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 37th Annual Conference on Neural Information Processing Systems, December 2023 (conference) Accepted

ei

[BibTex]

On the Importance of Step-wise Embeddings for Heterogeneous Clinical Time-Series

Kuznetsova*, R., Pace*, A., Burger*, M., Yèche, H., Rätsch, G.

Proceedings of the 3rd Machine Learning for Health Symposium (ML4H), 225, pages: 268-291, Proceedings of Machine Learning Research, (Editors: Hegselmann, S. and Parziale, A. and Shanmugam, D. and Tang, S. and Asiedu, M. N. and Chang, S. and Hartvigsen, T. and Singh, H.), PMLR, December 2023, *equal contribution (conference)

ei

link (url) [BibTex]

Regularity as Intrinsic Reward for Free Play

Sancaktar, C., Piater, J., Martius, G.

In Advances in Neural Information Processing Systems 36, December 2023 (inproceedings)

al

Website Code link (url) [BibTex]

Causal de Finetti: On the Identification of Invariant Causal Structure in Exchangeable Data

Guo*, S., Tóth*, V., Schölkopf, B., Huszár, F.

Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36, pages: 36463-36475, (Editors: A. Oh and T. Neumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023, *equal contribution (conference)

ei

link (url) [BibTex]

Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent

Lin*, J. A., Antorán*, J., Padhy*, S., Janz, D., Hernández-Lobato, J. M., Terenin, A.

Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36, pages: 36886-36912, (Editors: A. Oh and T. Neumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023, *equal contribution (conference)

ei

link (url) [BibTex]

Controlling Text-to-Image Diffusion by Orthogonal Finetuning

Qiu*, Z., Liu*, W., Feng, H., Xue, Y., Feng, Y., Liu, Z., Zhang, D., Weller, A., Schölkopf, B.

Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 36, pages: 79320-79362, (Editors: A. Oh and T. Neumann and A. Globerson and K. Saenko and M. Hardt and S. Levine), Curran Associates, Inc., 37th Annual Conference on Neural Information Processing Systems, December 2023, *equal contribution (conference)

Abstract
Large text-to-image diffusion models have impressive capabilities in generating photorealistic images from text prompts. How to effectively guide or control these powerful models to perform different downstream tasks becomes an important open problem. To tackle this challenge, we introduce a principled finetuning method -- Orthogonal Finetuning (OFT), for adapting text-to-image diffusion models to downstream tasks. Unlike existing methods, OFT can provably preserve hyperspherical energy which characterizes the pairwise neuron relationship on the unit hypersphere. We find that this property is crucial for preserving the semantic generation ability of text-to-image diffusion models. To improve finetuning stability, we further propose Constrained Orthogonal Finetuning (COFT) which imposes an additional radius constraint to the hypersphere. Specifically, we consider two important finetuning text-to-image tasks: subject-driven generation where the goal is to generate subject-specific images given a few images of a subject and a text prompt, and controllable generation where the goal is to enable the model to take in additional control signals. We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
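
For intuition, one standard way to parameterize an orthogonal matrix for this kind of finetuning is the Cayley transform of a skew-symmetric matrix, sketched below. OFT additionally uses a block-diagonal structure and COFT adds a radius constraint, so treat this as a generic illustration rather than the paper's exact parameterization.

    import torch

    def cayley_orthogonal(S):
        # Cayley transform: R = (I + A)^{-1} (I - A) with A skew-symmetric,
        # so R is orthogonal and equals the identity when A = 0.
        A = S - S.T                     # make the free parameter skew-symmetric
        I = torch.eye(S.shape[0])
        return torch.linalg.solve(I + A, I - A)

    # The orthogonal matrix is applied multiplicatively to a pretrained weight:
    # W_finetuned = cayley_orthogonal(S) @ W_pretrained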

ei ps

Home Code link (url) [BibTex]

Object-Centric Learning for Real-World Videos by Predicting Temporal Feature Similarities

Zadaianchuk, A., Seitzer, M., Martius, G.

In Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023), Advances in Neural Information Processing Systems 36, December 2023 (inproceedings)

Abstract
Unsupervised video-based object-centric learning is a promising avenue to learn structured representations from large, unlabeled video collections, but previous approaches have only managed to scale to real-world datasets in restricted domains. Recently, it was shown that the reconstruction of pre-trained self-supervised features leads to object-centric representations on unconstrained real-world image datasets. Building on this approach, we propose a novel way to use such pre-trained features in the form of a temporal feature similarity loss. This loss encodes semantic and temporal correlations between image patches and is a natural way to introduce a motion bias for object discovery. We demonstrate that this loss leads to state-of-the-art performance on the challenging synthetic MOVi datasets. When used in combination with the feature reconstruction loss, our model is the first object-centric video model that scales to unconstrained video datasets such as YouTube-VIS.
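
A minimal sketch of the temporal feature similarity target described above: cosine similarities between self-supervised patch features of consecutive frames. The exact loss in the paper may differ; this only illustrates the kind of target involved.

    import torch
    import torch.nn.functional as F

    def temporal_patch_similarity(feat_t, feat_t1):
        # feat_t, feat_t1: (num_patches, dim) pre-trained self-supervised features
        # of two consecutive frames. The (num_patches, num_patches) cosine-similarity
        # matrix encodes semantic and temporal correlations and can serve as a
        # prediction target that biases object discovery toward moving regions.
        return F.normalize(feat_t, dim=-1) @ F.normalize(feat_t1, dim=-1).T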

al

arXiv Website OpenReview link (url) [BibTex]

In-Context Impersonation Reveals Large Language Models’ Strengths and Biases

Salewski, L., Alaniz, S., Rio-Torto, I., Schulz, E., Akata, Z.

Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 37th Annual Conference on Neural Information Processing Systems, December 2023 (conference) Accepted

ei

[BibTex]
