Header logo is


2019


no image
Limitations of the empirical Fisher approximation for natural gradient descent

Kunstner, F., Hennig, P., Balles, L.

Advances in Neural Information Processing Systems 32, pages: 4158-4169, (Editors: H. Wallach and H. Larochelle and A. Beygelzimer and F. d’Alché-Buc and E. Fox and R. Garnett), Curran Associates, Inc., 33rd Annual Conference on Neural Information Processing Systems, December 2019 (conference)

ei pn

link (url) [BibTex]

2019


link (url) [BibTex]


no image
On the Transfer of Inductive Bias from Simulation to the Real World: a New Disentanglement Dataset

Gondal, M. W., Wuthrich, M., Miladinovic, D., Locatello, F., Breidt, M., Volchkov, V., Akpo, J., Bachem, O., Schölkopf, B., Bauer, S.

Advances in Neural Information Processing Systems 32, pages: 15714-15725, (Editors: H. Wallach and H. Larochelle and A. Beygelzimer and F. d’Alché-Buc and E. Fox and R. Garnett), Curran Associates, Inc., 33rd Annual Conference on Neural Information Processing Systems, December 2019 (conference)

am ei sf

link (url) [BibTex]

link (url) [BibTex]


no image
Convergence Guarantees for Adaptive Bayesian Quadrature Methods

Kanagawa, M., Hennig, P.

Advances in Neural Information Processing Systems 32, pages: 6234-6245, (Editors: H. Wallach and H. Larochelle and A. Beygelzimer and F. d’Alché-Buc and E. Fox and R. Garnett), Curran Associates, Inc., 33rd Annual Conference on Neural Information Processing Systems, December 2019 (conference)

ei pn

link (url) [BibTex]

link (url) [BibTex]


Attacking Optical Flow
Attacking Optical Flow

Ranjan, A., Janai, J., Geiger, A., Black, M. J.

In Proceedings International Conference on Computer Vision (ICCV), pages: 2404-2413, IEEE, 2019 IEEE/CVF International Conference on Computer Vision (ICCV), November 2019, ISSN: 2380-7504 (inproceedings)

Abstract
Deep neural nets achieve state-of-the-art performance on the problem of optical flow estimation. Since optical flow is used in several safety-critical applications like self-driving cars, it is important to gain insights into the robustness of those techniques. Recently, it has been shown that adversarial attacks easily fool deep neural networks to misclassify objects. The robustness of optical flow networks to adversarial attacks, however, has not been studied so far. In this paper, we extend adversarial patch attacks to optical flow networks and show that such attacks can compromise their performance. We show that corrupting a small patch of less than 1% of the image size can significantly affect optical flow estimates. Our attacks lead to noisy flow estimates that extend significantly beyond the region of the attack, in many cases even completely erasing the motion of objects in the scene. While networks using an encoder-decoder architecture are very sensitive to these attacks, we found that networks using a spatial pyramid architecture are less affected. We analyse the success and failure of attacking both architectures by visualizing their feature maps and comparing them to classical optical flow techniques which are robust to these attacks. We also demonstrate that such attacks are practical by placing a printed pattern into real scenes.

avg ps

Video Project Page Paper Supplementary Material link (url) DOI [BibTex]

Video Project Page Paper Supplementary Material link (url) DOI [BibTex]


Markerless Outdoor Human Motion Capture Using Multiple Autonomous Micro Aerial Vehicles
Markerless Outdoor Human Motion Capture Using Multiple Autonomous Micro Aerial Vehicles

Saini, N., Price, E., Tallamraju, R., Enficiaud, R., Ludwig, R., Martinović, I., Ahmad, A., Black, M.

Proceedings 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages: 823-832, IEEE, International Conference on Computer Vision (ICCV), October 2019 (conference)

Abstract
Capturing human motion in natural scenarios means moving motion capture out of the lab and into the wild. Typical approaches rely on fixed, calibrated, cameras and reflective markers on the body, significantly limiting the motions that can be captured. To make motion capture truly unconstrained, we describe the first fully autonomous outdoor capture system based on flying vehicles. We use multiple micro-aerial-vehicles(MAVs), each equipped with a monocular RGB camera, an IMU, and a GPS receiver module. These detect the person, optimize their position, and localize themselves approximately. We then develop a markerless motion capture method that is suitable for this challenging scenario with a distant subject, viewed from above, with approximately calibrated and moving cameras. We combine multiple state-of-the-art 2D joint detectors with a 3D human body model and a powerful prior on human pose. We jointly optimize for 3D body pose and camera pose to robustly fit the 2D measurements. To our knowledge, this is the first successful demonstration of outdoor, full-body, markerless motion capture from autonomous flying vehicles.

ps

Code Data Video Paper Manuscript DOI Project Page [BibTex]

Code Data Video Paper Manuscript DOI Project Page [BibTex]


Resolving {3D} Human Pose Ambiguities with {3D} Scene Constraints
Resolving 3D Human Pose Ambiguities with 3D Scene Constraints

Hassan, M., Choutas, V., Tzionas, D., Black, M. J.

In Proceedings International Conference on Computer Vision, pages: 2282-2292, IEEE, International Conference on Computer Vision, October 2019 (inproceedings)

Abstract
To understand and analyze human behavior, we need to capture humans moving in, and interacting with, the world. Most existing methods perform 3D human pose estimation without explicitly considering the scene. We observe however that the world constrains the body and vice-versa. To motivate this, we show that current 3D human pose estimation methods produce results that are not consistent with the 3D scene. Our key contribution is to exploit static 3D scene structure to better estimate human pose from monocular images. The method enforces Proximal Relationships with Object eXclusion and is called PROX. To test this, we collect a new dataset composed of 12 different 3D scenes and RGB sequences of 20 subjects moving in and interacting with the scenes. We represent human pose using the 3D human body model SMPL-X and extend SMPLify-X to estimate body pose using scene constraints. We make use of the 3D scene information by formulating two main constraints. The interpenetration constraint penalizes intersection between the body model and the surrounding 3D scene. The contact constraint encourages specific parts of the body to be in contact with scene surfaces if they are close enough in distance and orientation. For quantitative evaluation we capture a separate dataset with 180 RGB frames in which the ground-truth body pose is estimated using a motion-capture system. We show quantitatively that introducing scene constraints significantly reduces 3D joint error and vertex error. Our code and data are available for research at https://prox.is.tue.mpg.de.

ps

pdf poster link (url) DOI [BibTex]

pdf poster link (url) DOI [BibTex]


Learning to Reconstruct {3D} Human Pose and Shape via Model-fitting in the Loop
Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop

Kolotouros, N., Pavlakos, G., Black, M. J., Daniilidis, K.

Proceedings International Conference on Computer Vision (ICCV), pages: 2252-2261, IEEE, 2019 IEEE/CVF International Conference on Computer Vision (ICCV), October 2019, ISSN: 2380-7504 (conference)

Abstract
Model-based human pose estimation is currently approached through two different paradigms. Optimization-based methods fit a parametric body model to 2D observations in an iterative manner, leading to accurate image-model alignments, but are often slow and sensitive to the initialization. In contrast, regression-based methods, that use a deep network to directly estimate the model parameters from pixels, tend to provide reasonable, but not pixel accurate, results while requiring huge amounts of supervision. In this work, instead of investigating which approach is better, our key insight is that the two paradigms can form a strong collaboration. A reasonable, directly regressed estimate from the network can initialize the iterative optimization making the fitting faster and more accurate. Similarly, a pixel accurate fit from iterative optimization can act as strong supervision for the network. This is the core of our proposed approach SPIN (SMPL oPtimization IN the loop). The deep network initializes an iterative optimization routine that fits the body model to 2D joints within the training loop, and the fitted estimate is subsequently used to supervise the network. Our approach is self-improving by nature, since better network estimates can lead the optimization to better solutions, while more accurate optimization fits provide better supervision for the network. We demonstrate the effectiveness of our approach in different settings, where 3D ground truth is scarce, or not available, and we consistently outperform the state-of-the-art model-based pose estimation approaches by significant margins.

ps

pdf code project DOI [BibTex]

pdf code project DOI [BibTex]


Three-D Safari: Learning to Estimate Zebra Pose, Shape, and Texture from Images "In the Wild"
Three-D Safari: Learning to Estimate Zebra Pose, Shape, and Texture from Images "In the Wild"

Zuffi, S., Kanazawa, A., Berger-Wolf, T., Black, M. J.

In International Conference on Computer Vision, pages: 5358-5367, IEEE, International Conference on Computer Vision, October 2019 (inproceedings)

Abstract
We present the first method to perform automatic 3D pose, shape and texture capture of animals from images acquired in-the-wild. In particular, we focus on the problem of capturing 3D information about Grevy's zebras from a collection of images. The Grevy's zebra is one of the most endangered species in Africa, with only a few thousand individuals left. Capturing the shape and pose of these animals can provide biologists and conservationists with information about animal health and behavior. In contrast to research on human pose, shape and texture estimation, training data for endangered species is limited, the animals are in complex natural scenes with occlusion, they are naturally camouflaged, travel in herds, and look similar to each other. To overcome these challenges, we integrate the recent SMAL animal model into a network-based regression pipeline, which we train end-to-end on synthetically generated images with pose, shape, and background variation. Going beyond state-of-the-art methods for human shape and pose estimation, our method learns a shape space for zebras during training. Learning such a shape space from images using only a photometric loss is novel, and the approach can be used to learn shape in other settings with limited 3D supervision. Moreover, we couple 3D pose and shape prediction with the task of texture synthesis, obtaining a full texture map of the animal from a single image. We show that the predicted texture map allows a novel per-instance unsupervised optimization over the network features. This method, SMALST (SMAL with learned Shape and Texture) goes beyond previous work, which assumed manual keypoints and/or segmentation, to regress directly from pixels to 3D animal shape, pose and texture. Code and data are available at https://github.com/silviazuffi/smalst

ps

code pdf supmat iccv19 presentation DOI Project Page [BibTex]

code pdf supmat iccv19 presentation DOI Project Page [BibTex]


no image
Energy Conscious Over-actuated Multi-Agent Payload Transport Robot: Simulations and Preliminary Physical Validation

Tallamraju, R., Verma, P., Sripada, V., Agrawal, S., Karlapalem, K.

28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pages: 1-7, IEEE, 2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), October 2019 (conference)

ps

DOI [BibTex]

DOI [BibTex]


Efficient Learning on Point Clouds With Basis Point Sets
Efficient Learning on Point Clouds With Basis Point Sets

Prokudin, S., Lassner, C., Romero, J.

International Conference on Computer Vision, pages: 4332-4341, October 2019 (conference)

Abstract
With an increased availability of 3D scanning technology, point clouds are moving into the focus of computer vision as a rich representation of everyday scenes. However, they are hard to handle for machine learning algorithms due to the unordered structure. One common approach is to apply voxelization, which dramatically increases the amount of data stored and at the same time loses details through discretization. Recently, deep learning models with hand-tailored architectures were proposed to handle point clouds directly and achieve input permutation invariance. However, these architectures use an increased number of parameters and are computationally inefficient. In this work we propose basis point sets as a highly efficient and fully general way to process point clouds with machine learning algorithms. Basis point sets are a residual representation that can be computed efficiently and can be used with standard neural network architectures. Using the proposed representation as the input to a relatively simple network allows us to match the performance of PointNet on a shape classification task while using three order of magnitudes less floating point operations. In a second experiment, we show how proposed representation can be used for obtaining high resolution meshes from noisy 3D scans. Here, our network achieves performance comparable to the state-of-the-art computationally intense multi-step frameworks, in one network pass that can be done in less than 1ms.

ps

code pdf [BibTex]

code pdf [BibTex]


End-to-end Learning for Graph Decomposition
End-to-end Learning for Graph Decomposition

Song, J., Andres, B., Black, M., Hilliges, O., Tang, S.

In International Conference on Computer Vision, pages: 10093-10102, October 2019 (inproceedings)

Abstract
Deep neural networks provide powerful tools for pattern recognition, while classical graph algorithms are widely used to solve combinatorial problems. In computer vision, many tasks combine elements of both pattern recognition and graph reasoning. In this paper, we study how to connect deep networks with graph decomposition into an end-to-end trainable framework. More specifically, the minimum cost multicut problem is first converted to an unconstrained binary cubic formulation where cycle consistency constraints are incorporated into the objective function. The new optimization problem can be viewed as a Conditional Random Field (CRF) in which the random variables are associated with the binary edge labels. Cycle constraints are introduced into the CRF as high-order potentials. A standard Convolutional Neural Network (CNN) provides the front-end features for the fully differentiable CRF. The parameters of both parts are optimized in an end-to-end manner. The efficacy of the proposed learning algorithm is demonstrated via experiments on clustering MNIST images and on the challenging task of real-world multi-people pose estimation.

ps

PDF [BibTex]

PDF [BibTex]


{AMASS}: Archive of Motion Capture as Surface Shapes
AMASS: Archive of Motion Capture as Surface Shapes

Mahmood, N., Ghorbani, N., Troje, N. F., Pons-Moll, G., Black, M. J.

Proceedings International Conference on Computer Vision, pages: 5442-5451, IEEE, International Conference on Computer Vision (ICCV), October 2019 (conference)

Abstract
Large datasets are the cornerstone of recent advances in computer vision using deep learning. In contrast, existing human motion capture (mocap) datasets are small and the motions limited, hampering progress on learning models of human motion. While there are many different datasets available, they each use a different parameterization of the body, making it difficult to integrate them into a single meta dataset. To address this, we introduce AMASS, a large and varied database of human motion that unifies 15 different optical marker-based mocap datasets by representing them within a common framework and parameterization. We achieve this using a new method, MoSh++, that converts mocap data into realistic 3D human meshes represented by a rigged body model. Here we use SMPL [26], which is widely used and provides a standard skeletal representation as well as a fully rigged surface mesh. The method works for arbitrary marker-sets, while recovering soft-tissue dynamics and realistic hand motion. We evaluate MoSh++ and tune its hyper-parameters using a new dataset of 4D body scans that are jointly recorded with marker-based mocap. The consistent representation of AMASS makes it readily useful for animation, visualization, and generating training data for deep learning. Our dataset is significantly richer than previous human motion collections, having more than 40 hours of motion data, spanning over 300 subjects, more than 11000 motions, and is available for research at https://amass.is.tue.mpg.de/.

ps

code pdf suppl arxiv project website video poster AMASS_Poster DOI [BibTex]

code pdf suppl arxiv project website video poster AMASS_Poster DOI [BibTex]


The Influence of Visual Perspective on Body Size Estimation in Immersive Virtual Reality
The Influence of Visual Perspective on Body Size Estimation in Immersive Virtual Reality

Thaler, A., Pujades, S., Stefanucci, J. K., Creem-Regehr, S. H., Tesch, J., Black, M. J., Mohler, B. J.

In ACM Symposium on Applied Perception, pages: 1-12, ACM, SAP '19: ACM Symposium on Applied Perception 2019, September 2019 (inproceedings)

Abstract
The creation of realistic self-avatars that users identify with is important for many virtual reality applications. However, current approaches for creating biometrically plausible avatars that represent a particular individual require expertise and are time-consuming. We investigated the visual perception of an avatar’s body dimensions by asking males and females to estimate their own body weight and shape on a virtual body using a virtual reality avatar creation tool. In a method of adjustment task, the virtual body was presented in an HTC Vive head-mounted display either co-located with (first-person perspective) or facing (third-person perspective) the participants. Participants adjusted the body weight and dimensions of various body parts to match their own body shape and size. Both males and females underestimated their weight by 10-20% in the virtual body, but the estimates of the other body dimensions were relatively accurate and within a range of ±6%. There was a stronger influence of visual perspective on the estimates for males, but this effect was dependent on the amount of control over the shape of the virtual body, indicating that the results might be caused by where in the body the weight changes expressed themselves. These results suggest that this avatar creation tool could be used to allow participants to make a relatively accurate self-avatar in terms of adjusting body part dimensions, but not weight, and that the influence of visual perspective and amount of control needed over the body shape are likely gender-specific.

ps

pdf DOI [BibTex]

pdf DOI [BibTex]


Learning to Train with Synthetic Humans
Learning to Train with Synthetic Humans

Hoffmann, D. T., Tzionas, D., Black, M. J., Tang, S.

In German Conference on Pattern Recognition (GCPR), pages: 609-623, Springer International Publishing, September 2019 (inproceedings)

Abstract
Neural networks need big annotated datasets for training. However, manual annotation can be too expensive or even unfeasible for certain tasks, like multi-person 2D pose estimation with severe occlusions. A remedy for this is synthetic data with perfect ground truth. Here we explore two variations of synthetic data for this challenging problem; a dataset with purely synthetic humans, as well as a real dataset augmented with synthetic humans. We then study which approach better generalizes to real data, as well as the influence of virtual humans in the training loss. We observe that not all synthetic samples are equally informative for training, while the informative samples are different for each training stage. To exploit this observation, we employ an adversarial student-teacher framework; the teacher improves the student by providing the hardest samples for its current state as a challenge. Experiments show that this student-teacher framework outperforms all our baselines.

ps

pdf suppl poster link (url) DOI Project Page [BibTex]

pdf suppl poster link (url) DOI Project Page [BibTex]


Motion Planning for Multi-Mobile-Manipulator Payload Transport Systems
Motion Planning for Multi-Mobile-Manipulator Payload Transport Systems

Tallamraju, R., Salunkhe, D., Rajappa, S., Ahmad, A., Karlapalem, K., Shah, S. V.

In 15th IEEE International Conference on Automation Science and Engineering, pages: 1469-1474, IEEE, 2019 IEEE 15th International Conference on Automation Science and Engineering (CASE), August 2019, ISSN: 2161-8089 (inproceedings)

ps

DOI [BibTex]

DOI [BibTex]


Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation
Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation

Ranjan, A., Jampani, V., Balles, L., Kim, K., Sun, D., Wulff, J., Black, M. J.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 12240-12249, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2019, June 2019 (inproceedings)

Abstract
We address the unsupervised learning of several interconnected problems in low-level vision: single view depth prediction, camera motion estimation, optical flow, and segmentation of a video into the static scene and moving regions. Our key insight is that these four fundamental vision problems are coupled through geometric constraints. Consequently, learning to solve them together simplifies the problem because the solutions can reinforce each other. We go beyond previous work by exploiting geometry more explicitly and segmenting the scene into static and moving regions. To that end, we introduce Competitive Collaboration, a framework that facilitates the coordinated training of multiple specialized neural networks to solve complex problems. Competitive Collaboration works much like expectation-maximization, but with neural networks that act as both competitors to explain pixels that correspond to static or moving regions, and as collaborators through a moderator that assigns pixels to be either static or independently moving. Our novel method integrates all these problems in a common framework and simultaneously reasons about the segmentation of the scene into moving objects and the static background, the camera motion, depth of the static scene structure, and the optical flow of moving objects. Our model is trained without any supervision and achieves state-of-the-art performance among joint unsupervised methods on all sub-problems.

ps

Paper link (url) Project Page Project Page [BibTex]

Paper link (url) Project Page Project Page [BibTex]


Local Temporal Bilinear Pooling for Fine-grained Action Parsing
Local Temporal Bilinear Pooling for Fine-grained Action Parsing

Zhang, Y., Tang, S., Muandet, K., Jarvers, C., Neumann, H.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2019, June 2019 (inproceedings)

Abstract
Fine-grained temporal action parsing is important in many applications, such as daily activity understanding, human motion analysis, surgical robotics and others requiring subtle and precise operations in a long-term period. In this paper we propose a novel bilinear pooling operation, which is used in intermediate layers of a temporal convolutional encoder-decoder net. In contrast to other work, our proposed bilinear pooling is learnable and hence can capture more complex local statistics than the conventional counterpart. In addition, we introduce exact lower-dimension representations of our bilinear forms, so that the dimensionality is reduced with neither information loss nor extra computation. We perform intensive experiments to quantitatively analyze our model and show the superior performances to other state-of-the-art work on various datasets.

ei ps

Code video demo pdf link (url) [BibTex]

Code video demo pdf link (url) [BibTex]


Learning to Regress 3D Face Shape and Expression from an Image without 3D Supervision
Learning to Regress 3D Face Shape and Expression from an Image without 3D Supervision

Sanyal, S., Bolkart, T., Feng, H., Black, M. J.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 7763-7772, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2019, June 2019 (inproceedings)

Abstract
The estimation of 3D face shape from a single image must be robust to variations in lighting, head pose, expression, facial hair, makeup, and occlusions. Robustness requires a large training set of in-the-wild images, which by construction, lack ground truth 3D shape. To train a network without any 2D-to-3D supervision, we present RingNet, which learns to compute 3D face shape from a single image. Our key observation is that an individual’s face shape is constant across images, regardless of expression, pose, lighting, etc. RingNet leverages multiple images of a person and automatically detected 2D face features. It uses a novel loss that encourages the face shape to be similar when the identity is the same and different for different people. We achieve invariance to expression by representing the face using the FLAME model. Once trained, our method takes a single image and outputs the parameters of FLAME, which can be readily animated. Additionally we create a new database of faces “not quite in-the-wild” (NoW) with 3D head scans and high-resolution images of the subjects in a wide variety of conditions. We evaluate publicly available methods and find that RingNet is more accurate than methods that use 3D supervision. The dataset, model, and results are available for research purposes.

ps

code pdf preprint link (url) Project Page [BibTex]

code pdf preprint link (url) Project Page [BibTex]


Learning Joint Reconstruction of Hands and Manipulated Objects
Learning Joint Reconstruction of Hands and Manipulated Objects

Hasson, Y., Varol, G., Tzionas, D., Kalevatykh, I., Black, M. J., Laptev, I., Schmid, C.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 11807-11816, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2019, June 2019 (inproceedings)

Abstract
Estimating hand-object manipulations is essential for interpreting and imitating human actions. Previous work has made significant progress towards reconstruction of hand poses and object shapes in isolation. Yet, reconstructing hands and objects during manipulation is a more challenging task due to significant occlusions of both the hand and object. While presenting challenges, manipulations may also simplify the problem since the physics of contact restricts the space of valid hand-object configurations. For example, during manipulation, the hand and object should be in contact but not interpenetrate. In this work, we regularize the joint reconstruction of hands and objects with manipulation constraints. We present an end-to-end learnable model that exploits a novel contact loss that favors physically plausible hand-object constellations. Our approach improves grasp quality metrics over baselines, using RGB images as input. To train and evaluate the model, we also propose a new large-scale synthetic dataset, ObMan, with hand-object manipulations. We demonstrate the transferability of ObMan-trained models to real data.

ps

pdf suppl poster link (url) DOI Project Page Project Page [BibTex]

pdf suppl poster link (url) DOI Project Page Project Page [BibTex]


Expressive Body Capture: 3D Hands, Face, and Body from a Single Image
Expressive Body Capture: 3D Hands, Face, and Body from a Single Image

Pavlakos, G., Choutas, V., Ghorbani, N., Bolkart, T., Osman, A. A. A., Tzionas, D., Black, M. J.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 10975-10985, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2019, June 2019 (inproceedings)

Abstract
To facilitate the analysis of human actions, interactions and emotions, we compute a 3D model of human body pose, hand pose, and facial expression from a single monocular image. To achieve this, we use thousands of 3D scans to train a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with fully articulated hands and an expressive face. Learning to regress the parameters of SMPL-X directly from images is challenging without paired images and 3D ground truth. Consequently, we follow the approach of SMPLify, which estimates 2D features and then optimizes model parameters to fit the features. We improve on SMPLify in several significant ways: (1) we detect 2D features corresponding to the face, hands, and feet and fit the full SMPL-X model to these; (2) we train a new neural network pose prior using a large MoCap dataset; (3) we define a new interpenetration penalty that is both fast and accurate; (4) we automatically detect gender and the appropriate body models (male, female, or neutral); (5) our PyTorch implementation achieves a speedup of more than 8x over Chumpy. We use the new method, SMPLify-X, to fit SMPL-X to both controlled images and images in the wild. We evaluate 3D accuracy on a new curated dataset comprising 100 images with pseudo ground-truth. This is a step towards automatic expressive human capture from monocular RGB data. The models, code, and data are available for research purposes at https://smpl-x.is.tue.mpg.de.

ps

video code pdf suppl poster link (url) DOI Project Page [BibTex]

video code pdf suppl poster link (url) DOI Project Page [BibTex]


Capture, Learning, and Synthesis of 3D Speaking Styles
Capture, Learning, and Synthesis of 3D Speaking Styles

Cudeiro, D., Bolkart, T., Laidlaw, C., Ranjan, A., Black, M. J.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 10101-10111, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2019, June 2019 (inproceedings)

Abstract
Audio-driven 3D facial animation has been widely explored, but achieving realistic, human-like performance is still unsolved. This is due to the lack of available 3D datasets, models, and standard evaluation metrics. To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers. We then train a neural network on our dataset that factors identity from facial motion. The learned model, VOCA (Voice Operated Character Animation) takes any speech signal as input—even speech in languages other than English—and realistically animates a wide range of adult faces. Conditioning on subject labels during training allows the model to learn a variety of realistic speaking styles. VOCA also provides animator controls to alter speaking style, identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball rotations) during animation. To our knowledge, VOCA is the only realistic 3D facial animation model that is readily applicable to unseen subjects without retargeting. This makes VOCA suitable for tasks like in-game video, virtual reality avatars, or any scenario in which the speaker, speech, or language is not known in advance. We make the dataset and model available for research purposes at http://voca.is.tue.mpg.de.

ps

code Project Page video paper [BibTex]

code Project Page video paper [BibTex]


no image
DeepOBS: A Deep Learning Optimizer Benchmark Suite

Schneider, F., Balles, L., Hennig, P.

7th International Conference on Learning Representations (ICLR), ICLR, 7th International Conference on Learning Representations (ICLR), May 2019 (conference)

ei pn

link (url) [BibTex]

link (url) [BibTex]


Accurate Vision-based Manipulation through Contact Reasoning
Accurate Vision-based Manipulation through Contact Reasoning

Kloss, A., Bauza, M., Wu, J., Tenenbaum, J. B., Rodriguez, A., Bohg, J.

In International Conference on Robotics and Automation, May 2019 (inproceedings) Accepted

Abstract
Planning contact interactions is one of the core challenges of many robotic tasks. Optimizing contact locations while taking dynamics into account is computationally costly and in only partially observed environments, executing contact-based tasks often suffers from low accuracy. We present an approach that addresses these two challenges for the problem of vision-based manipulation. First, we propose to disentangle contact from motion optimization. Thereby, we improve planning efficiency by focusing computation on promising contact locations. Second, we use a hybrid approach for perception and state estimation that combines neural networks with a physically meaningful state representation. In simulation and real-world experiments on the task of planar pushing, we show that our method is more efficient and achieves a higher manipulation accuracy than previous vision-based approaches.

am

Video link (url) [BibTex]

Video link (url) [BibTex]


Learning Latent Space Dynamics for Tactile Servoing
Learning Latent Space Dynamics for Tactile Servoing

Sutanto, G., Ratliff, N., Sundaralingam, B., Chebotar, Y., Su, Z., Handa, A., Fox, D.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2019, IEEE, International Conference on Robotics and Automation, May 2019 (inproceedings) Accepted

am

pdf video [BibTex]

pdf video [BibTex]


Leveraging Contact Forces for Learning to Grasp
Leveraging Contact Forces for Learning to Grasp

Merzic, H., Bogdanovic, M., Kappler, D., Righetti, L., Bohg, J.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2019, IEEE, International Conference on Robotics and Automation, May 2019 (inproceedings)

Abstract
Grasping objects under uncertainty remains an open problem in robotics research. This uncertainty is often due to noisy or partial observations of the object pose or shape. To enable a robot to react appropriately to unforeseen effects, it is crucial that it continuously takes sensor feedback into account. While visual feedback is important for inferring a grasp pose and reaching for an object, contact feedback offers valuable information during manipulation and grasp acquisition. In this paper, we use model-free deep reinforcement learning to synthesize control policies that exploit contact sensing to generate robust grasping under uncertainty. We demonstrate our approach on a multi-fingered hand that exhibits more complex finger coordination than the commonly used two- fingered grippers. We conduct extensive experiments in order to assess the performance of the learned policies, with and without contact sensing. While it is possible to learn grasping policies without contact sensing, our results suggest that contact feedback allows for a significant improvement of grasping robustness under object pose uncertainty and for objects with a complex shape.

am mg

video arXiv [BibTex]

video arXiv [BibTex]


no image
Fast and Robust Shortest Paths on Manifolds Learned from Data

Arvanitidis, G., Hauberg, S., Hennig, P., Schober, M.

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 89, pages: 1506-1515, (Editors: Kamalika Chaudhuri and Masashi Sugiyama), PMLR, April 2019 (conference)

ei pn

PDF link (url) [BibTex]

PDF link (url) [BibTex]


Active Probabilistic Inference on Matrices for Pre-Conditioning in Stochastic Optimization
Active Probabilistic Inference on Matrices for Pre-Conditioning in Stochastic Optimization

de Roos, F., Hennig, P.

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 89, pages: 1448-1457, (Editors: Kamalika Chaudhuri and Masashi Sugiyama), PMLR, April 2019 (conference)

Abstract
Pre-conditioning is a well-known concept that can significantly improve the convergence of optimization algorithms. For noise-free problems, where good pre-conditioners are not known a priori, iterative linear algebra methods offer one way to efficiently construct them. For the stochastic optimization problems that dominate contemporary machine learning, however, this approach is not readily available. We propose an iterative algorithm inspired by classic iterative linear solvers that uses a probabilistic model to actively infer a pre-conditioner in situations where Hessian-projections can only be constructed with strong Gaussian noise. The algorithm is empirically demonstrated to efficiently construct effective pre-conditioners for stochastic gradient descent and its variants. Experiments on problems of comparably low dimensionality show improved convergence. In very high-dimensional problems, such as those encountered in deep learning, the pre-conditioner effectively becomes an automatic learning-rate adaptation scheme, which we also empirically show to work well.

ei pn

PDF link (url) [BibTex]

PDF link (url) [BibTex]


no image
Distributed, Collaborative Virtual Reality Application for Product Development with Simple Avatar Calibration Method

Dixken, M., Diers, D., Wingert, B., Hatzipanayioti, A., Mohler, B. J., Riedel, O., Bues, M.

IEEE Conference on Virtual Reality and 3D User Interfaces, (VR), pages: 1299-1300, IEEE, March 2019 (conference)

ps

DOI [BibTex]

DOI [BibTex]


no image
Soft Sensors for Curvature Estimation under Water in a Soft Robotic Fish

Wright, Brian, Vogt, Daniel M., Wood, Robert J., Jusufi, Ardian

In 2019 2nd IEEE International Conference on Soft Robotics (RoboSoft 2019), pages: 367-371, IEEE, Piscataway, NJ, 2nd IEEE International Conference on Soft Robotics (RoboSoft 2019), 2019 (inproceedings)

bio

DOI [BibTex]

DOI [BibTex]


Resisting Adversarial Attacks using Gaussian Mixture Variational Autoencoders
Resisting Adversarial Attacks using Gaussian Mixture Variational Autoencoders

Ghosh, P., Losalka, A., Black, M. J.

In Proc. AAAI, 2019 (inproceedings)

Abstract
Susceptibility of deep neural networks to adversarial attacks poses a major theoretical and practical challenge. All efforts to harden classifiers against such attacks have seen limited success till now. Two distinct categories of samples against which deep neural networks are vulnerable, ``adversarial samples" and ``fooling samples", have been tackled separately so far due to the difficulty posed when considered together. In this work, we show how one can defend against them both under a unified framework. Our model has the form of a variational autoencoder with a Gaussian mixture prior on the latent variable, such that each mixture component corresponds to a single class. We show how selective classification can be performed using this model, thereby causing the adversarial objective to entail a conflict. The proposed method leads to the rejection of adversarial samples instead of misclassification, while maintaining high precision and recall on test data. It also inherently provides a way of learning a selective classifier in a semi-supervised scenario, which can similarly resist adversarial attacks. We further show how one can reclassify the detected adversarial samples by iterative optimization.

ps

link (url) Project Page [BibTex]


no image
Heads or Tails? Cranio-Caudal Mass Distribution for Robust Locomotion with Biorobotic Appendages Composed of 3D-Printed Soft Materials

Siddall, R., Schwab, F., Michel, J., Weaver, J., Jusufi, A.

In Biomimetic and Biohybrid Systems, 11556, pages: 240-253, Lecture Notes in Artificial Intelligence, (Editors: Martinez-Hernandez, Uriel and Vouloutsi, Vasiliki and Mura, Anna and Mangan, Michael and Asada, Minoru and Prescott, Tony J. and Verschure, Paul F. M. J.), Springer, Cham, Living Machines 2019: 8th International Conference on Biomimetic and Biohybrid Systems, 2019 (inproceedings)

bio

DOI [BibTex]

DOI [BibTex]

2006


no image
Learning operational space control

Peters, J., Schaal, S.

In Robotics: Science and Systems II (RSS 2006), pages: 255-262, (Editors: Gaurav S. Sukhatme and Stefan Schaal and Wolfram Burgard and Dieter Fox), Cambridge, MA: MIT Press, RSS , 2006, clmc (inproceedings)

Abstract
While operational space control is of essential importance for robotics and well-understood from an analytical point of view, it can be prohibitively hard to achieve accurate control in face of modeling errors, which are inevitable in complex robots, e.g., humanoid robots. In such cases, learning control methods can offer an interesting alternative to analytical control algorithms. However, the resulting learning problem is ill-defined as it requires to learn an inverse mapping of a usually redundant system, which is well known to suffer from the property of non-covexity of the solution space, i.e., the learning system could generate motor commands that try to steer the robot into physically impossible configurations. A first important insight for this paper is that, nevertheless, a physically correct solution to the inverse problem does exits when learning of the inverse map is performed in a suitable piecewise linear way. The second crucial component for our work is based on a recent insight that many operational space controllers can be understood in terms of a constraint optimal control problem. The cost function associated with this optimal control problem allows us to formulate a learning algorithm that automatically synthesizes a globally consistent desired resolution of redundancy while learning the operational space controller. From the view of machine learning, the learning problem corresponds to a reinforcement learning problem that maximizes an immediate reward and that employs an expectation-maximization policy search algorithm. Evaluations on a three degrees of freedom robot arm illustrate the feasability of our suggested approach.

am ei

link (url) [BibTex]

2006


link (url) [BibTex]


no image
Reinforcement Learning for Parameterized Motor Primitives

Peters, J., Schaal, S.

In Proceedings of the 2006 International Joint Conference on Neural Networks, pages: 73-80, IJCNN, 2006, clmc (inproceedings)

Abstract
One of the major challenges in both action generation for robotics and in the understanding of human motor control is to learn the "building blocks of movement generation", called motor primitives. Motor primitives, as used in this paper, are parameterized control policies such as splines or nonlinear differential equations with desired attractor properties. While a lot of progress has been made in teaching parameterized motor primitives using supervised or imitation learning, the self-improvement by interaction of the system with the environment remains a challenging problem. In this paper, we evaluate different reinforcement learning approaches for improving the performance of parameterized motor primitives. For pursuing this goal, we highlight the difficulties with current reinforcement learning methods, and outline both established and novel algorithms for the gradient-based improvement of parameterized policies. We compare these algorithms in the context of motor primitive learning, and show that our most modern algorithm, the Episodic Natural Actor-Critic outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm.

am ei

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Statistical Learning of LQG controllers

Theodorou, E.

Technical Report-2006-1, Computational Action and Vision Lab University of Minnesota, 2006, clmc (techreport)

am

PDF [BibTex]

PDF [BibTex]

2001


no image
Humanoid oculomotor control based on concepts of computational neuroscience

Shibata, T., Vijayakumar, S., Conradt, J., Schaal, S.

In Humanoids2001, Second IEEE-RAS International Conference on Humanoid Robots, 2001, clmc (inproceedings)

Abstract
Oculomotor control in a humanoid robot faces similar problems as biological oculomotor systems, i.e., the stabilization of gaze in face of unknown perturbations of the body, selective attention, the complexity of stereo vision and dealing with large information processing delays. In this paper, we suggest control circuits to realize three of the most basic oculomotor behaviors - the vestibulo-ocular and optokinetic reflex (VOR-OKR) for gaze stabilization, smooth pursuit for tracking moving objects, and saccades for overt visual attention. Each of these behaviors was derived from inspirations from computational neuroscience, which proves to be a viable strategy to explore novel control mechanisms for humanoid robotics. Our implementations on a humanoid robot demonstrate good performance of the oculomotor behaviors that appears natural and human-like.

am

link (url) [BibTex]

2001


link (url) [BibTex]


no image
Trajectory formation for imitation with nonlinear dynamical systems

Ijspeert, A., Nakanishi, J., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), pages: 752-757, Weilea, Hawaii, Oct.29-Nov.3, 2001, clmc (inproceedings)

Abstract
This article explores a new approach to learning by imitation and trajectory formation by representing movements as mixtures of nonlinear differential equations with well-defined attractor dynamics. An observed movement is approximated by finding a best fit of the mixture model to its data by a recursive least squares regression technique. In contrast to non-autonomous movement representations like splines, the resultant movement plan remains an autonomous set of nonlinear differential equations that forms a control policy which is robust to strong external perturbations and that can be modified by additional perceptual variables. This movement policy remains the same for a given target, regardless of the initial conditions, and can easily be re-used for new targets. We evaluate the trajectory formation system (TFS) in the context of a humanoid robot simulation that is part of the Virtual Trainer (VT) project, which aims at supervising rehabilitation exercises in stroke-patients. A typical rehabilitation exercise was collected with a Sarcos Sensuit, a device to record joint angular movement from human subjects, and approximated and reproduced with our imitation techniques. Our results demonstrate that multi-joint human movements can be encoded successfully, and that this system allows robust modifications of the movement policy through external variables.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Real-time statistical learning for robotics and human augmentation

Schaal, S., Vijayakumar, S., D’Souza, A., Ijspeert, A., Nakanishi, J.

In International Symposium on Robotics Research, (Editors: Jarvis, R. A.;Zelinsky, A.), Lorne, Victoria, Austrialia Nov.9-12, 2001, clmc (inproceedings)

Abstract
Real-time modeling of complex nonlinear dynamic processes has become increasingly important in various areas of robotics and human augmentation. To address such problems, we have been developing special statistical learning methods that meet the demands of on-line learning, in particular the need for low computational complexity, rapid learning, and scalability to high-dimensional spaces. In this paper, we introduce a novel algorithm that possesses all the necessary properties by combining methods from probabilistic and nonparametric learning. We demonstrate the applicability of our methods for three different applications in humanoid robotics, i.e., the on-line learning of a full-body inverse dynamics model, an inverse kinematics model, and imitation learning. The latter application will also introduce a novel method to shape attractor landscapes of dynamical system by means of statis-tical learning.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Robust learning of arm trajectories through human demonstration

Billard, A., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), Piscataway, NJ: IEEE, Maui, Hawaii, Oct.29-Nov.3, 2001, clmc (inproceedings)

Abstract
We present a model, composed of hierarchy of artificial neural networks, for robot learning by demonstration. The model is implemented in a dynamic simulation of a 41 degrees of freedom humanoid for reproducing 3D human motion of the arm. Results show that the model requires few information about the desired trajectory and learns on-line the relevant features of movement. It can generalize across a small set of data to produce a qualitatively good reproduction of the demonstrated trajectory. Finally, it is shown that reproduction of the trajectory after learning is robust against perturbations.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Overt visual attention for a humanoid robot

Vijayakumar, S., Conradt, J., Shibata, T., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), 2001, clmc (inproceedings)

Abstract
The goal of our research is to investigate the interplay between oculomotor control, visual processing, and limb control in humans and primates by exploring the computational issues of these processes with a biologically inspired artificial oculomotor system on an anthropomorphic robot. In this paper, we investigate the computational mechanisms for visual attention in such a system. Stimuli in the environment excite a dynamical neural network that implements a saliency map, i.e., a winner-take-all competition between stimuli while simultenously smoothing out noise and suppressing irrelevant inputs. In real-time, this system computes new targets for the shift of gaze, executed by the head-eye system of the robot. The redundant degrees-of- freedom of the head-eye system are resolved through a learned inverse kinematics with optimization criterion. We also address important issues how to ensure that the coordinate system of the saliency map remains correct after movement of the robot. The presented attention system is built on principled modules and generally applicable for any sensory modality.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Learning inverse kinematics

D’Souza, A., Vijayakumar, S., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), Piscataway, NJ: IEEE, Maui, Hawaii, Oct.29-Nov.3, 2001, clmc (inproceedings)

Abstract
Real-time control of the endeffector of a humanoid robot in external coordinates requires computationally efficient solutions of the inverse kinematics problem. In this context, this paper investigates learning of inverse kinematics for resolved motion rate control (RMRC) employing an optimization criterion to resolve kinematic redundancies. Our learning approach is based on the key observations that learning an inverse of a non uniquely invertible function can be accomplished by augmenting the input representation to the inverse model and by using a spatially localized learning approach. We apply this strategy to inverse kinematics learning and demonstrate how a recently developed statistical learning algorithm, Locally Weighted Projection Regression, allows efficient learning of inverse kinematic mappings in an incremental fashion even when input spaces become rather high dimensional. The resulting performance of the inverse kinematics is comparable to Liegeois ([1]) analytical pseudo inverse with optimization. Our results are illustrated with a 30 degree-of-freedom humanoid robot.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Biomimetic smooth pursuit based on fast learning of the target dynamics

Shibata, T., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), 2001, clmc (inproceedings)

Abstract
Following a moving target with a narrow-view foveal vision system is one of the essential oculomotor behaviors of humans and humanoids. This oculomotor behavior, called ``Smooth Pursuit'', requires accurate tracking control which cannot be achieved by a simple visual negative feedback controller due to the significant delays in visual information processing. In this paper, we present a biologically inspired and control theoretically sound smooth pursuit controller consisting of two cascaded subsystems. One is an inverse model controller for the oculomotor system, and the other is a learning controller for the dynamics of the visual target. The latter controller learns how to predict the target's motion in head coordinates such that tracking performance can be improved. We investigate our smooth pursuit system in simulations and experiments on a humanoid robot. By using a fast on-line statistical learning network, our humanoid oculomotor system is able to acquire high performance smooth pursuit after about 5 seconds of learning despite significant processing delays in the syste

am

link (url) [BibTex]

link (url) [BibTex]

2000


no image
Reciprocal excitation between biological and robotic research

Schaal, S., Sternad, D., Dean, W., Kotoska, S., Osu, R., Kawato, M.

In Sensor Fusion and Decentralized Control in Robotic Systems III, Proceedings of SPIE, 4196, pages: 30-40, Boston, MA, Nov.5-8, 2000, November 2000, clmc (inproceedings)

Abstract
While biological principles have inspired researchers in computational and engineering research for a long time, there is still rather limited knowledge flow back from computational to biological domains. This paper presents examples of our work where research on anthropomorphic robots lead us to new insights into explaining biological movement phenomena, starting from behavioral studies up to brain imaging studies. Our research over the past years has focused on principles of trajectory formation with nonlinear dynamical systems, on learning internal models for nonlinear control, and on advanced topics like imitation learning. The formal and empirical analyses of the kinematics and dynamics of movements systems and the tasks that they need to perform lead us to suggest principles of motor control that later on we found surprisingly related to human behavior and even brain activity.

am

link (url) [BibTex]

2000


link (url) [BibTex]


no image
Nonlinear dynamical systems as movement primitives

Schaal, S., Kotosaka, S., Sternad, D.

In Humanoids2000, First IEEE-RAS International Conference on Humanoid Robots, CD-Proceedings, Cambridge, MA, September 2000, clmc (inproceedings)

Abstract
This paper explores the idea to create complex human-like movements from movement primitives based on nonlinear attractor dynamics. Each degree-of-freedom of a limb is assumed to have two independent abilities to create movement, one through a discrete dynamic system, and one through a rhythmic system. The discrete system creates point-to-point movements based on internal or external target specifications. The rhythmic system can add an additional oscillatory movement relative to the current position of the discrete system. In the present study, we develop appropriate dynamic systems that can realize the above model, motivate the particular choice of the systems from a biological and engineering point of view, and present simulation results of the performance of such movement primitives. The model was implemented for a drumming task on a humanoid robot

am

link (url) [BibTex]

link (url) [BibTex]


no image
Real Time Learning in Humanoids: A challenge for scalability of Online Algorithms

Vijayakumar, S., Schaal, S.

In Humanoids2000, First IEEE-RAS International Conference on Humanoid Robots, CD-Proceedings, Cambridge, MA, September 2000, clmc (inproceedings)

Abstract
While recent research in neural networks and statistical learning has focused mostly on learning from finite data sets without stringent constraints on computational efficiency, there is an increasing number of learning problems that require real-time performance from an essentially infinite stream of incrementally arriving data. This paper demonstrates how even high-dimensional learning problems of this kind can successfully be dealt with by techniques from nonparametric regression and locally weighted learning. As an example, we describe the application of one of the most advanced of such algorithms, Locally Weighted Projection Regression (LWPR), to the on-line learning of the inverse dynamics model of an actual seven degree-of-freedom anthropomorphic robot arm. LWPR's linear computational complexity in the number of input dimensions, its inherent mechanisms of local dimensionality reduction, and its sound learning rule based on incremental stochastic leave-one-out cross validation allows -- to our knowledge for the first time -- implementing inverse dynamics learning for such a complex robot with real-time performance. In our sample task, the robot acquires the local inverse dynamics model needed to trace a figure-8 in only 60 seconds of training.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Synchronized robot drumming by neural oscillator

Kotosaka, S., Schaal, S.

In The International Symposium on Adaptive Motion of Animals and Machines, Montreal, Canada, August 2000, clmc (inproceedings)

Abstract
Sensory-motor integration is one of the key issues in robotics. In this paper, we propose an approach to rhythmic arm movement control that is synchronized with an external signal based on exploiting a simple neural oscillator network. Trajectory generation by the neural oscillator is a biologically inspired method that can allow us to generate a smooth and continuous trajectory. The parameter tuning of the oscillators is used to generate a synchronized movement with wide intervals. We adopted the method for the drumming task as an example task. By using this method, the robot can realize synchronized drumming with wide drumming intervals in real time. The paper also shows the experimental results of drumming by a humanoid robot.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Real-time robot learning with locally weighted statistical learning

Schaal, S., Atkeson, C. G., Vijayakumar, S.

In International Conference on Robotics and Automation (ICRA2000), San Francisco, April 2000, 2000, clmc (inproceedings)

Abstract
Locally weighted learning (LWL) is a class of statistical learning techniques that provides useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of robotic systems. This paper introduces several LWL algorithms that have been tested successfully in real-time learning of complex robot tasks. We discuss two major classes of LWL, memory-based LWL and purely incremental LWL that does not need to remember any data explicitly. In contrast to the traditional beliefs that LWL methods cannot work well in high-dimensional spaces, we provide new algorithms that have been tested in up to 50 dimensional learning problems. The applicability of our LWL algorithms is demonstrated in various robot learning examples, including the learning of devil-sticking, pole-balancing of a humanoid robot arm, and inverse-dynamics learning for a seven degree-of-freedom robot.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Fast learning of biomimetic oculomotor control with nonparametric regression networks

Shibata, T., Schaal, S.

In International Conference on Robotics and Automation (ICRA2000), pages: 3847-3854, San Francisco, April 2000, 2000, clmc (inproceedings)

Abstract
Accurate oculomotor control is one of the essential pre-requisites of successful visuomotor coordination. Given the variable nonlinearities of the geometry of binocular vision as well as the possible nonlinearities of the oculomotor plant, it is desirable to accomplish accurate oculomotor control through learning approaches. In this paper, we investigate learning control for a biomimetic active vision system mounted on a humanoid robot. By combining a biologically inspired cerebellar learning scheme with a state-of-the-art statistical learning network, our robot system is able to acquire high performance visual stabilization reflexes after about 40 seconds of learning despite significant nonlinearities and processing delays in the system.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Locally weighted projection regression: An O(n) algorithm for incremental real time learning in high dimensional spaces

Vijayakumar, S., Schaal, S.

In Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), 1, pages: 288-293, Stanford, CA, 2000, clmc (inproceedings)

Abstract
Locally weighted projection regression is a new algorithm that achieves nonlinear function approximation in high dimensional spaces with redundant and irrelevant input dimensions. At its core, it uses locally linear models, spanned by a small number of univariate regressions in selected directions in input space. This paper evaluates different methods of projection regression and derives a nonlinear function approximator based on them. This nonparametric local learning system i) learns rapidly with second order learning methods based on incremental training, ii) uses statistically sound stochastic cross validation to learn iii) adjusts its weighting kernels based on local information only, iv) has a computational complexity that is linear in the number of inputs, and v) can deal with a large number of - possibly redundant - inputs, as shown in evaluations with up to 50 dimensional data sets. To our knowledge, this is the first truly incremental spatially localized learning method to combine all these properties.

am

link (url) [BibTex]

link (url) [BibTex]


no image
Inverse kinematics for humanoid robots

Tevatia, G., Schaal, S.

In International Conference on Robotics and Automation (ICRA2000), pages: 294-299, San Fransisco, April 24-28, 2000, 2000, clmc (inproceedings)

Abstract
Real-time control of the endeffector of a humanoid robot in external coordinates requires computationally efficient solutions of the inverse kinematics problem. In this context, this paper investigates methods of resolved motion rate control (RMRC) that employ optimization criteria to resolve kinematic redundancies. In particular we focus on two established techniques, the pseudo inverse with explicit optimization and the extended Jacobian method. We prove that the extended Jacobian method includes pseudo-inverse methods as a special solution. In terms of computational complexity, however, pseudo-inverse and extended Jacobian differ significantly in favor of pseudo-inverse methods. Employing numerical estimation techniques, we introduce a computationally efficient version of the extended Jacobian with performance comparable to the original version . Our results are illustrated in simulation studies with a multiple degree-of-freedom robot, and were tested on a 30 degree-of-freedom robot. 

am

link (url) [BibTex]

link (url) [BibTex]


no image
Fast and efficient incremental learning for high-dimensional movement systems

Vijayakumar, S., Schaal, S.

In International Conference on Robotics and Automation (ICRA2000), San Francisco, April 2000, 2000, clmc (inproceedings)

Abstract
We introduce a new algorithm, Locally Weighted Projection Regression (LWPR), for incremental real-time learning of nonlinear functions, as particularly useful for problems of autonomous real-time robot control that re-quires internal models of dynamics, kinematics, or other functions. At its core, LWPR uses locally linear models, spanned by a small number of univariate regressions in selected directions in input space, to achieve piecewise linear function approximation. The most outstanding properties of LWPR are that it i) learns rapidly with second order learning methods based on incremental training, ii) uses statistically sound stochastic cross validation to learn iii) adjusts its local weighting kernels based on only local information to avoid interference problems, iv) has a computational complexity that is linear in the number of inputs, and v) can deal with a large number ofâ??possibly redundant and/or irrelevantâ??inputs, as shown in evaluations with up to 50 dimensional data sets for learning the inverse dynamics of an anthropomorphic robot arm. To our knowledge, this is the first incremental neural network learning method to combine all these properties and that is well suited for complex on-line learning problems in robotics.

am

link (url) [BibTex]

link (url) [BibTex]