Disney Research has been actively pushing the state of the art in digitizing humans over the past decade, impacting both academia and industry. In this talk I will give an overview of a few selected projects in this area, from research into production. I will be talking about photogrammetric shape acquisition and dense performance capture for faces, eye and teeth scanning and parameterization, as well as physically based capture and modelling for hair and volumetric tissues.
Organizers: Timo Bolkart
Abstract: Sequential Monte Carlo (SMC) methods (including particle filters and smoothers) allow us to compute probabilistic representations of the unknown objects in models used to represent, for example, nonlinear dynamical systems. This talk has three connected parts: 1. A (hopefully pedagogical) introduction to probabilistic modelling of dynamical systems and an explanation of the SMC method. 2. When learning unknown parameters appearing in nonlinear state-space models using maximum likelihood, it is natural to make use of SMC to compute unbiased estimates of the intractable likelihood. The challenge is that the resulting optimization problem is stochastic, which recently inspired us to construct a new solution to this problem. 3. A challenge with the above (and in fact with most uses of SMC) is that it all quickly becomes very technical. This is indeed the key challenge in spreading the use of SMC methods to a wider group of users. At the same time, there are many researchers who would benefit a lot from having access to these methods in their daily work, and for those of us already working with them it is essential to reduce the amount of time spent on new problems. We believe that the solution to this can be provided by probabilistic programming. We are currently developing a new probabilistic programming language that we call Birch. A pre-release is available from birch-lang.org/. It allows users to use SMC methods without having to implement the algorithms on their own.
Organizers: Philipp Hennig
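To make the SMC abstract above more concrete, here is a minimal sketch of a bootstrap particle filter that also returns an estimate of the log-likelihood. The linear-Gaussian toy model, the function names, and all parameter values are assumptions chosen for illustration; they are not taken from the talk, and this is not Birch code. The product of the per-step average weights is an unbiased estimate of the likelihood; its logarithm, which the code accumulates for numerical stability, is not unbiased.

```python
import math
import random


def gaussian_logpdf(x, mean, std):
    """Log-density of a univariate normal distribution."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def simulate(n_steps, a, q_std, r_std, rng):
    """Simulate the toy model x_t = a*x_{t-1} + v_t, y_t = x_t + e_t (illustrative only)."""
    x, xs, ys = 0.0, [], []
    for _ in range(n_steps):
        x = a * x + rng.gauss(0.0, q_std)
        xs.append(x)
        ys.append(x + rng.gauss(0.0, r_std))
    return xs, ys


def bootstrap_particle_filter(ys, a, q_std, r_std, n_particles=500, rng=None):
    """Bootstrap particle filter for the toy model above.

    Returns (filtered_means, log_likelihood_estimate).
    """
    rng = rng or random.Random(0)
    # Initial particles drawn from an assumed N(0, 1) prior on the state.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    log_lik = 0.0
    for y in ys:
        # Propagate every particle through the state dynamics.
        particles = [a * x + rng.gauss(0.0, q_std) for x in particles]
        # Weight by the observation density, working in the log domain.
        log_w = [gaussian_logpdf(y, x, r_std) for x in particles]
        max_lw = max(log_w)
        w = [math.exp(lw - max_lw) for lw in log_w]
        total = sum(w)
        # The average unnormalised weight estimates p(y_t | y_{1:t-1}).
        log_lik += max_lw + math.log(total / n_particles)
        w = [wi / total for wi in w]
        means.append(sum(wi * x for wi, x in zip(w, particles)))
        # Multinomial resampling according to the normalised weights.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means, log_lik


if __name__ == "__main__":
    xs, ys = simulate(50, 0.9, 1.0, 0.5, random.Random(1))
    means, log_lik = bootstrap_particle_filter(ys, 0.9, 1.0, 0.5)
    print("log-likelihood estimate:", log_lik)
```

Because the estimate changes with the random seed, maximizing it over the model parameters is a stochastic optimization problem, which is the difficulty the second part of the abstract refers to.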
Today’s advances in tactile sensing and wearable, IoT and context-aware computing are spurring new ideas about how to configure touch-centered interactions in terms of roles and utility, which in turn expose new technical and social design questions. But while haptic actuation, sensing and control are improving, incorporating them into a real-world design process is challenging and poses a major obstacle to adoption into everyday technology. Some classes of haptic devices, e.g., grounded force feedback, remain expensive and limited in range. I’ll describe some recent highlights of an ongoing effort to understand how to support haptic designers and end-users. These include a wealth of online experimental design tools, DIY open-source hardware, and accessible means of creating, for example, expressive physical robot motions and evolving physically sensed expressive tactile languages. Elsewhere, we are establishing the value of haptic force feedback in embodied learning environments, to help kids understand physics and math concepts. This has inspired the invention of a low-cost, handheld, large-motion force feedback device that can be used in online environments or collaborative scenarios, and could be suitable for K-12 school contexts; this is ongoing research with innovative education and technological elements. All our work is available online, where possible as web tools, and we plan to push our research into a broader openhaptics effort.
Organizers: Katherine Kuchenbecker
Why can't current robots act intelligently in real-world environments? A major challenge lies in the lack of adequate tactile sensing technologies. Robots need tactile sensing to understand the physical environment and to detect contact states during manipulation. Progress requires advances in the sensing hardware, but also advances in the software that can exploit the tactile signals. We developed a high-resolution tactile sensor, GelSight, which measures the geometry and traction field of the contact surface. For interpreting the high-resolution tactile signal, we utilize both traditional statistical models and deep neural networks. I will describe my research on both exploration and manipulation. For exploration, I use active touch to estimate the physical properties of objects. The work has included learning the hardness of artificial objects, as well as estimating the general properties of natural objects via autonomous tactile exploration. For manipulation, I study the robot’s ability to detect slip or incipient slip with tactile sensing during grasping. The research helps robots to better understand and flexibly interact with the physical world.
Organizers: Katherine Kuchenbecker
Gliding evolved at least nine times in mammals. Despite the abundance and diversity of gliding mammals, little is known about their convergent morphology and mechanisms of aerodynamic control. Many gliding animals are capable of impressive and agile aerial behaviors and their flight performance depends on the aerodynamic forces resulting from airflow interacting with a flexible, membranous wing (patagium). Although the mechanisms that gliders use to control dynamic flight are poorly understood, the shape of the gliding membrane (e.g., angle of attack, camber) is likely a primary factor governing the control of the interaction between aerodynamic forces and the animal’s body. Data from field studies of gliding behavior, lab experiments examining membrane shape changes during glides and morphological and materials testing data of gliding membranes will be presented that can aid our understanding of the mechanisms gliding mammals use to control their membranous wings and potentially provide insights into the design of man-made flexible wings.
Modern technology allows us to collect, process, and share more data than ever before. This data revolution opens up new ways to design control and learning algorithms, which will form the algorithmic foundation for future intelligent systems that shall act autonomously in the physical world. Starting from a discussion of the special challenges when combining machine learning and control, I will present some of our recent research in this exciting area. Using the example of the Apollo robot learning to balance a stick in its hand, I will explain how intelligent agents can learn new behavior from just a few experimental trials. I will also discuss the need for theoretical guarantees in learning-based control, and how we can obtain them by combining learning and control theory.
In 1995 Fraunhofer IPA embarked on a mission towards designing a personal robot assistant for everyday tasks. In the following years Care-O-bot developed into a long-term experiment for exploring and demonstrating new robot technologies and future product visions. The recent fourth generation of the Care-O-bot, introduced in 2014, aimed at designing an integrated system which addressed a number of innovations such as modularity, “low-cost” by making use of new manufacturing processes, and advanced human-user interaction. Some 15 systems were built, and the intellectual property (IP) generated by over 20 years of research was recently licensed to a start-up. The presentation will review the path from an experimental platform for building up expertise in various robotic disciplines to recent pilot applications based on the now commercial Care-O-bot hardware.
With the ubiquity of catalyzed reactions in manufacturing, the emergence of the device-laden internet of things, and global challenges with respect to water and energy, it has never been more important to understand atomic interactions in the functional materials that can provide solutions in these spaces.
Everyone in visual psychology seems to know what Biological Motion is. Yet, it is not easy to come up with a definition that is specific enough to justify a distinct label, but is also general enough to include the many different experiments to which the term has been applied in the past. I will present a number of tasks, stimuli, and experiments, including some of my own work, to demonstrate the diversity and the appeal of the field of biological motion perception. In trying to come up with a definition of the term, I will particularly focus on a type of motion that has been considered “non-biological” in some contexts, even though it might contain -- as more recent work shows -- one of the most important visual invariants used by the visual system to distinguish animate from inanimate motion.
We present an approach to creating 3D models of objects depicted in Web images, even when each object may only be shown in a single image. Our approach uses a comparatively small collection of existing 3D models to guide the reconstruction process. These existing shapes are used to derive information about shape structure. Our guiding idea is to jointly analyze the images and the available 3D models. Joint analysis of all images along with the available shapes regularizes the formulated optimization problems, stabilizes estimation of camera parameters and construction of dense pixel-level correspondences, and leads to reasonable reproduction of object appearance in the absence of traditional multi-view cues. Joint work with Qixing Huang and Hai Wang.
Image-based rendering was introduced in the 1990s as an alternative approach to photorealistic rendering. Its key idea is to create novel renderings by re-projecting pixels from nearby views. The basic approach works well for many scenes but breaks down if the scene contains “non-standard” elements such as reflective surfaces. In this talk, I will first show how we can extend image-based rendering to handle scenes with reflections. I will then discuss a novel gradient-based technique for image-based rendering that can intrinsically handle scenes with reflections.
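As a rough illustration of the pixel re-projection idea mentioned in the abstract above, the sketch below warps a single pixel from a source view into a nearby target view. It assumes a pinhole camera, a known per-pixel depth, and identical intrinsics for both views; the function name and all parameters are hypothetical and not taken from the talk. A scene point seen in a mirror violates the single-depth assumption, which is one way to see why reflective surfaces break the basic approach.

```python
def reproject_pixel(u, v, depth, focal, cx, cy, rotation, translation):
    """Warp pixel (u, v) with known depth from a source view into a target view.

    rotation is a 3x3 nested list and translation a length-3 list that map
    source-camera coordinates to target-camera coordinates (X' = R X + t).
    Returns the target pixel (u', v'), or None if the point lies behind
    the target camera.
    """
    # Back-project the pixel to a 3D point in the source camera frame.
    x = (u - cx) / focal * depth
    y = (v - cy) / focal * depth
    z = depth
    # Transform the point into the target camera frame.
    p = [
        rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z + translation[i]
        for i in range(3)
    ]
    if p[2] <= 0:
        return None
    # Project with the same (assumed shared) pinhole intrinsics.
    return (focal * p[0] / p[2] + cx, focal * p[1] / p[2] + cy)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Target camera shifted 0.1 units to the right of the source camera.
    print(reproject_pixel(400.0, 240.0, 2.0, 500.0, 320.0, 240.0, identity, [-0.1, 0.0, 0.0]))
```

For a pure sideways shift like the one in the usage example, the pixel moves horizontally by focal * baseline / depth, so nearer points move further than distant ones.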
When you touch objects in your surroundings, you can discern each item’s physical properties from the rich array of haptic cues you experience, including both the tactile sensations arising in your skin and the kinesthetic cues originating in your muscles and joints. Although physical interaction with the world is at the core of human experience, few computer and machine interfaces provide the operator with high-fidelity touch feedback, limiting their usability. Similarly, autonomous robots rarely take advantage of touch perception and thus struggle to match the manipulation capabilities of humans. This talk will describe several research projects from Professor Kuchenbecker's laboratory, including data-driven haptic texture rendering, vibrotactile feedback of tool vibrations for robotic surgery, and robotic learning of haptic adjectives.
Organizers: Jane Walters
The scenario approach is a broad methodology to deal with decision-making in an uncertain environment. By resorting to observations, or by sampling uncertainty from a given model, one obtains an optimization problem (the scenario problem), whose solution bears precise probabilistic guarantees in relation to new, unseen, situations. The scenario approach opens up new avenues to address data-based problems in learning, identification, finance, and other fields.
Organizers: Sebastian Trimpe
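The scenario approach described above can be illustrated with a deliberately tiny example: choose the smallest threshold gamma that is not exceeded by any of N sampled losses. This toy problem, the function names, and the Gaussian loss model are assumptions made for illustration and do not come from the abstract. The bound used is the standard binomial-tail result for convex scenario programs with d decision variables, which limits the probability that the solution's violation probability exceeds eps; here d = 1, so the bound reduces to (1 - eps)^N.

```python
import math
import random


def scenario_bound(n_scenarios, n_decision_vars, eps):
    """Binomial-tail bound on P{violation probability > eps} for a convex scenario program."""
    return sum(
        math.comb(n_scenarios, i) * eps**i * (1.0 - eps) ** (n_scenarios - i)
        for i in range(n_decision_vars)
    )


def solve_scenario_threshold(samples):
    """Toy scenario problem: minimise gamma subject to loss <= gamma for every sampled loss."""
    return max(samples)


def empirical_violation(gamma, fresh_samples):
    """Fraction of new, unseen samples that violate the constraint loss <= gamma."""
    return sum(1 for s in fresh_samples if s > gamma) / len(fresh_samples)


if __name__ == "__main__":
    rng = random.Random(0)
    scenarios = [rng.gauss(0.0, 1.0) for _ in range(200)]
    gamma = solve_scenario_threshold(scenarios)
    fresh = [rng.gauss(0.0, 1.0) for _ in range(10000)]
    print("gamma:", gamma)
    print("empirical violation rate:", empirical_violation(gamma, fresh))
    print("bound on P{violation > 5%}:", scenario_bound(200, 1, 0.05))
```

The guarantee does not depend on the distribution that generated the scenarios, which is why the same reasoning applies whether the samples come from observations or from a model.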
Driven by the increasing demand for photorealistic computer-generated images, graphics is currently undergoing a substantial transformation to physics-based approaches which accurately reproduce the interaction of light and matter. Progress on both sides of this transformation -- physical models and simulation techniques -- has been steady but mostly independent of one another. When combined, the resulting methods are in many cases impracticably slow and require unrealistic workarounds to process even simple everyday scenes. My research lies at the interface of these two research fields; my goal is to break down the barriers between simulation techniques and the underlying physical models, and to use the resulting insights to develop realistic methods that remain efficient over a wide range of inputs.
I will cover three areas of recent work: the first involves volumetric modeling approaches to create realistic images of woven and knitted cloth. Next, I will discuss reflectance models for glitter/sparkle effects and arbitrarily layered materials that are specially designed to allow for efficient simulations. In the last part of the talk, I will give an overview of Manifold Exploration, a Markov Chain Monte Carlo technique that is able to reason about the geometric structure of light paths in high dimensional configuration spaces defined by the underlying physical models, and which uses this information to compute images more efficiently.
I will present selected research projects of the Photogrammetry and Remote Sensing Group at ETH, including (i) 3D scene flow estimation for stereo video captured from a car; (ii) extraction of road networks from aerial images; and (iii) 3D reconstruction from large, unstructured (e.g. crowd-sourced) image collections.
The growing scale of image and video datasets in vision makes labeling and annotating such datasets for training recognition models difficult and time-consuming. Further, richer models often require richer labelings of the data, which are typically even more difficult to obtain. In this talk I will focus on two models that make use of different forms of supervision for two different vision tasks.
In the first part of this talk I will focus on object detection. The appearance of an object changes profoundly with pose, camera view and interactions of the object with other objects in the scene. This makes it challenging to learn detectors based on object-level labels (e.g., “car”). We postulate that having a richer set of labelings (at different levels of granularity) for an object can significantly alleviate these problems. Such labelings include finer-grained sub-categories, consistent in appearance and view, and higher-order composites (contextual groupings of objects consistent in their spatial layout and appearance). However, obtaining such a rich set of annotations, including annotation of an exponentially growing set of object groupings, is infeasible. To this end, we propose a weakly-supervised framework for object detection where we discover subcategories and the composites automatically with only traditional object-level category labels as input.
In the second part of the talk I will focus on a framework for large-scale image set and video summarization. Starting from the intuition that the characteristics of the two media types are different but complementary, we develop a fast and easily parallelizable approach for creating not only video summaries but also novel structural summaries of events in the form of storyline graphs. The storyline graphs can illustrate various events or activities associated with the topic in the form of a branching directed network. The video summarization is achieved by diversity ranking on the similarity graphs between images and video frames, thereby treating consumer images as essentially a form of weak supervision. The reconstruction of storyline graphs, on the other hand, is formulated as inference of sparse time-varying directed graphs from a set of photo streams with the assistance of consumer videos.
Time permitting I will also talk about a few other recent project highlights.
Abstract: I will present a general framework for modelling and recovering 3D shape and pose using subdivision surfaces. To demonstrate this framework's generality, I will show how to recover both a personalized rigged hand model from a sequence of depth images and a blend shape model of dolphin pose from a collection of 2D dolphin images. The core requirement is the formulation of a generative model in which the control vertices of a smooth subdivision surface are parameterized (e.g. with joint angles or blend weights) by a differentiable deformation function. The energy function that falls out of measuring the deviation between the surface and the observed data is also differentiable and can be minimized through standard, albeit tricky, gradient-based non-linear optimization from a reasonable initial guess. The latter can often be obtained using machine learning methods when manual intervention is undesirable. Satisfyingly, the "tricks" involved in the former are elegant and widen the applicability of these methods.
In order to avoid an expensive manual labeling process or to learn object classes autonomously without human intervention, object discovery techniques have been proposed that extract visually similar objects from weakly labelled videos. However, the problem of discovering small- or medium-sized objects is largely unexplored. We observe that videos with activities involving human-object interactions can serve as weakly labelled data for such cases. Since neither object appearance nor motion is distinct enough to discover objects in these videos, we propose a framework that samples from a space of algorithms and their parameters to extract sequences of object proposals. Furthermore, we model similarity of objects based on appearance and functionality, which is derived from human and object motion. We show that functionality is an important cue for discovering objects from activities and demonstrate the generality of the model on three challenging RGB-D and RGB datasets.