I will present three recent projects within the 3D Deep Learning research line from my team at Google Research: (1) a deep network for reconstructing the 3D shapes of multiple objects appearing in a single RGB image (ECCV'20); (2) a new conditioning scheme for normalizing flow models, which enables several applications, such as reconstructing an object's 3D point cloud from an image or the converse problem of rendering an image given a 3D point cloud, both within the same modeling framework (CVPR'20); and (3) a neural rendering framework that maps a voxelized object into a high-quality image. It realistically renders highly textured objects and illumination effects such as reflections and shadows, and it allows controllable rendering: geometric and appearance modifications in the input are accurately represented in the final rendering (CVPR'20).
Game development requires a vast array of tools, techniques, and expertise, ranging from game design and artistic content creation to data management and low-level engine programming. Yet all of these domains have one kind of task in common: the transformation of one kind of data into another. Meanwhile, advances in machine learning have fundamentally changed how we think about such data transformations, allowing for accurate and scalable function approximation and the ability to train such approximations on virtually unlimited amounts of data. In this talk I will present how these two developments affect game development, how they can be used to improve game technology as well as the way games are built, and the exciting new possibilities and challenges they bring along the way.
Organizers: Abhinanda Ranjit Punnakkal
Technological advances in laser and vacuum technology have allowed realizing a dream of the early days of quantum mechanics: controlling single, laser-cooled atoms at the quantum level. Interfacing individual atoms with ultracold gases offers new experimental approaches to unsolved problems of nonequilibrium quantum physics. Moreover, such systems allow experimentally addressing the question of whether, and how, quantum properties can boost the performance of atomic-scale devices. In this talk, I will discuss how single atoms can be controlled and probed in an ultracold gas. Understanding the impurity-gas interaction at the atomic level allows employing inelastic spin-exchange collisions, which are usually considered harmful, for quantum applications. First, I will show how inelastic spin exchange can map information about the gas temperature or the surrounding magnetic field onto the quantum-spin distribution of single impurity atoms. Interestingly, the nonequilibrium spin dynamics before reaching the steady state increases the sensitivity of the probe while reducing the perturbation of the gas compared to the steady state. Second, I will discuss how the quantized energy transfer during inelastic collisions allows operating a single-atom quantum engine. We overcome the limitations imposed by using thermal states and run a quantum-enhanced Otto cycle that operates at powers orders of magnitude larger than in the thermal case, alternating between positive- and negative-temperature regimes at maximum efficiency. I will discuss the properties of the engine as well as limitations originating from its quantum aspects, which result in fluctuations of power.
In a Born-Oppenheimer description, atomic motions evolve across a potential energy surface determined by the occupation of electronic states as a function of atomic positions. Ultrafast photo-induced phase transitions provide a test case for how the forces and the resulting nuclear motion along the reaction coordinate originate from a nonequilibrium population of excited electronic states. Here I discuss recent advances in time-resolved photoemission spectroscopy that allow direct probing of the underlying fundamental steps and of the transiently evolving band structure in the ultrafast phase transition of indium nanowires on Si(111). Furthermore, I will discuss some recent attempts to access the space-time limit in surface dynamics using local optical excitation of controlled plasmonic nano-junctions and tip-enhanced Raman scattering (TERS) [2,3].
Organizers: Joachim Gräfe
Accurate 3D human pose estimation has been a longstanding goal in computer vision. Until now, however, it has achieved only limited success, and mostly in easy scenarios such as studios with little occlusion. In this talk, I will present two of our works aimed at addressing the occlusion problem in realistic scenarios. In the first work, we present an approach to recover the absolute 3D pose of a single person from multi-view images by incorporating multi-view geometric priors in our model. It consists of two separate steps: (1) estimating the 2D poses in multi-view images and (2) recovering the 3D pose from the multi-view 2D poses. First, we introduce a cross-view fusion scheme into a CNN to jointly estimate 2D poses for multiple views, so that the 2D pose estimate for each view already benefits from the other views. Second, we present a recursive Pictorial Structure Model to recover the 3D pose from the multi-view 2D poses; it gradually improves the accuracy of the 3D pose at affordable computational cost. In the second work, we present a 3D pose estimator that allows us to reliably estimate and track people in crowded scenes. In contrast to previous efforts, which require establishing cross-view correspondences based on noisy and incomplete 2D pose estimates, we present an end-to-end solution that operates directly in 3D space and therefore avoids making incorrect hard decisions in the 2D space. To achieve this, the features from all camera views are warped and aggregated in a common 3D space and fed to a Cuboid Proposal Network (CPN) to coarsely localize all people. A Pose Regression Network (PRN) then estimates a detailed 3D pose for each proposal. The approach is robust to occlusion, which occurs frequently in practice. Without bells and whistles, it significantly outperforms the state of the art on the benchmark datasets.
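The core idea of warping per-view information into a common 3D space can be illustrated with a minimal numpy sketch. This is an illustrative simplification, not the paper's implementation: the function name, nearest-neighbor sampling, and plain averaging are my assumptions, whereas the real system aggregates learned CNN feature maps before the Cuboid Proposal Network.

```python
import numpy as np

def aggregate_heatmaps_to_volume(heatmaps, projections, grid, img_size):
    """Warp per-view 2D heatmaps into a shared 3D voxel volume by
    projecting each voxel center into every camera and averaging the
    sampled responses (nearest-neighbor sampling for simplicity)."""
    H, W = img_size
    volume = np.zeros(len(grid))
    for hm, P in zip(heatmaps, projections):
        pts = np.hstack([grid, np.ones((len(grid), 1))])  # homogeneous voxel centers
        uvw = pts @ P.T                                    # project into the view: (N, 3)
        uv = uvw[:, :2] / uvw[:, 2:3]                      # perspective divide
        u = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
        v = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
        volume += hm[v, u]                                 # sample this view's response
    return volume / len(heatmaps)
```

A voxel that projects onto a strong response in every view accumulates a high score, which is exactly the signal a proposal network can use to localize people without ever making hard decisions in 2D.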
Organizers: Chun-Hao Paul Huang
Tying a knot in a piece of string can be a hard practical problem.
Traditional voice conversion methods rely on parallel recordings of multiple speakers pronouncing the same sentences. For real-world applications, however, parallel data is rarely available. We propose MelGAN-VC, a voice conversion method that relies on non-parallel speech data and is able to convert audio signals of arbitrary length from a source voice to a target voice. We first compute spectrograms from waveform data and then perform a domain translation using a Generative Adversarial Network (GAN) architecture. An additional siamese network helps preserve speech information in the translation process, without sacrificing the ability to flexibly model the style of the target speaker. We test our framework with a dataset of clean speech recordings, as well as with a collection of noisy real-world speech examples. Finally, we apply the same method to perform music style transfer, translating arbitrarily long music samples from one genre to another, showing that our framework is flexible and can be used for audio manipulation applications beyond voice conversion.
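The role of the siamese network can be sketched as a content-preservation loss: the embedded *difference* between two source spectrograms should survive translation, so the generator may change style but not scramble content. This is a minimal numpy sketch under my own assumptions (function names, squared-error form, and plain callables standing in for the trained networks); the actual MelGAN-VC training objective may differ in detail.

```python
import numpy as np

def siamese_travel_loss(siamese, gen, src_a, src_b):
    """Content-preservation loss: the siamese embedding of the difference
    between two source spectrograms should match the embedding of the
    difference between their translations."""
    t_src = siamese(src_a) - siamese(src_b)            # relation in source domain
    t_gen = siamese(gen(src_a)) - siamese(gen(src_b))  # relation after translation
    return float(np.sum((t_src - t_gen) ** 2))
```

Because only pairwise relations are constrained, the generator remains free to move every sample toward the target speaker's style, which is what makes non-parallel training workable.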
In this talk I will present an overview of our recent works that learn deep geometric models of the 3D face from large datasets of scans. Priors for the 3D face are crucial for many applications: to constrain ill-posed problems such as 3D reconstruction from monocular input, to enable efficient generation and animation of 3D virtual avatars, and even in medical domains such as the recognition of craniofacial disorders. Generative models of the face have been widely used for this task, as have deep learning approaches, which have recently emerged as a robust alternative. Barring a few exceptions, most of these data-driven approaches were built either from a relatively limited number of samples (in the case of linear models of the shape) or by synthetic data augmentation (for deep-learning-based approaches), mainly due to the difficulty of obtaining large-scale and accurate 3D scans of the face. Yet there is a substantial amount of 3D information that can be gathered from publicly available datasets captured over the last decade. I will discuss our works that tackle the challenges of building rich geometric models out of these large and varied datasets, with the goal of modeling the facial shape, expression (i.e., motion), or geometric details. Concretely, I will talk about (1) an efficient and fully automatic approach for registration of large datasets of 3D faces in motion; (2) deep learning methods for modeling the facial geometry that can disentangle the shape and expression aspects of the face; and (3) a multi-modal learning approach for capturing geometric details from images in the wild, by simultaneously encoding both facial surface normal and natural image information.
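For readers unfamiliar with disentangled face models, the classical linear baseline that deep approaches build on can be written in a few lines. This is a generic morphable-model sketch, not the talk's specific architecture; the names and toy bases are my own, and disentanglement here holds by construction because identity and expression get separate bases.

```python
import numpy as np

def morphable_face(mean, shape_basis, expr_basis, alpha, beta):
    """Linear morphable model: flattened vertex positions = mean face
    + identity blend (shape_basis @ alpha) + expression blend
    (expr_basis @ beta). alpha and beta are low-dimensional codes."""
    return mean + shape_basis @ alpha + expr_basis @ beta
```

Deep variants replace the two linear blends with learned decoders, but keep the same principle: separate latent codes for who the person is and what the face is doing.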
Organizers: Jinlong Yang
Motivated by the low-voltage-driven actuation of ionic electroactive polymers (iEAPs) [1,2], we recently began investigating ionic elastomers. In this talk I will discuss the preparation, physical characterization, and electric bending actuation properties of two novel ionic elastomers: ionic polymer electrolyte membranes (iPEMs) and ionic liquid crystal elastomers (iLCEs). Both materials can be actuated by low-frequency AC or DC voltages of less than 1 V. The bending actuation properties of the iPEMs outperform most well-developed iEAPs, and the first, not yet optimized iLCEs are already comparable to them. Ionic liquid crystal elastomers also exhibit superior features, such as alignment-dependent actuation, which offers the possibility of pre-programming the actuation pattern during the cross-linking process. Additionally, multiple (thermal, optical, and electric) actuation modes are possible. I will also discuss issues with compliant electrodes and possible soft robotic applications.
[1] Y. Bar-Cohen, Electroactive Polymer Actuators as Artificial Muscles: Reality, Potential and Challenges, SPIE Press, Bellingham, 2004.
[2] O. Kim, S. J. Kim, M. J. Park, Chem. Commun. 2018, 54, 4895.
[3] C. P. H. Rajapaksha, C. Feng, C. Piedrahita, J. Cao, V. Kaphle, B. Lüssem, T. Kyu, A. Jákli, Macromol. Rapid Commun. 2020, in press.
[4] C. Feng, C. P. H. Rajapaksha, J. M. Cedillo, C. Piedrahita, J. Cao, V. Kaphle, B. Lüssem, T. Kyu, A. I. Jákli, Macromol. Rapid Commun. 2019, 1900299.
In this talk I will discuss the development of functional materials and their application in modulating the biological microenvironment during cellular sensing and signal transduction. First, I will briefly summarize the mechanical, biochemical, and physicochemical material properties that influence cellular sensing and the subsequent integration with tissues at the macroscale. Controlling signal transduction at the submicron scale, however, requires careful materials engineering to address the need for minimally invasive targeting of single proteins and for providing sufficient physical stimuli for cellular signaling. I will discuss an approach to fabricate anisotropic magnetite nanodiscs (MNDs), which can be used as torque transducers for mechanosensory cells under weak, slowly varying magnetic fields (MFs). When MNDs are coupled to MFs, their magnetization transitions between a vortex and an in-plane state, leading to torques on the pN scale, sufficient to activate mechanosensitive ion channels in neuronal cell membranes. This approach opens new avenues for studies of biological mechanoreception and provides new tools for minimally invasive neuromodulation technology.
Optoacoustic imaging is increasingly attracting the attention of the biomedical research community due to its excellent spatial and temporal resolution, centimeter-scale penetration into living tissues, and versatile endogenous and exogenous optical absorption contrast. State-of-the-art implementations of multi-spectral optoacoustic tomography (MSOT) are based on multi-wavelength excitation of tissues to visualize specific molecules within opaque tissues. As a result, the technology can noninvasively deliver structural, functional, metabolic, and molecular information from living tissues. The talk covers the most recent advances pertaining to ultrafast imaging instrumentation, multi-modal combinations with optical and ultrasound methods, and intelligent reconstruction algorithms, as well as smart optoacoustic contrast and sensing approaches. Our current efforts are also geared toward exploring the potential of the technique in studying multi-scale dynamics of the brain and heart, monitoring of therapies, fast tracking of cells, and targeted molecular imaging applications. MSOT further allows for handheld operation, thus offering a new level of precision for clinical diagnostics in a number of indications, such as breast and skin lesions, inflammatory diseases, and cardiovascular conditions.
Organizers: Metin Sitti
Machine learning allows automated systems to identify structures and physical laws from measured data, which is particularly useful in areas where an analytic derivation of a model is too tedious or not possible. Research in reinforcement learning has led to impressive results and superhuman performance in well-structured tasks and games. However, to this day, data-driven models are rarely employed in the control of safety-critical systems, because the success of a controller based on such models cannot be guaranteed. Therefore, the research presented in this talk analyzes the closed-loop behavior of learning control laws by means of rigorous proofs. More specifically, we propose a control law based on Gaussian process (GP) models, which actively avoids uncertain regions of the state space and favors trajectories along the training data, where the system is well known. We show that this behavior is optimal, as it maximizes the probability of asymptotic stability. Additionally, we consider an event-triggered online learning control law, which safely explores an initially unknown system. It takes new training data only when the uncertainty about the system becomes too large. As the control law requires only a locally precise model, this novel learning strategy is highly data-efficient and provides safety guarantees.
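The event-triggering idea can be sketched in a few lines: consult the GP's predictive variance at the current state, and ingest a new sample only when that variance crosses a threshold. This is a minimal 1-D illustration under my own assumptions (class and function names, RBF kernel with unit signal variance, fixed threshold); the talk's actual trigger condition and stability analysis are more involved.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Squared-exponential kernel between two 1-D sample arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

class EventTriggeredGP:
    """Tiny 1-D GP that only stores a sample when its predictive
    variance at that input exceeds a trigger threshold."""
    def __init__(self, threshold=0.1, noise=1e-4):
        self.X = np.empty(0)
        self.y = np.empty(0)
        self.threshold = threshold
        self.noise = noise

    def variance(self, x):
        if len(self.X) == 0:
            return 1.0  # prior variance k(x, x) with unit signal variance
        K = rbf(self.X, self.X) + self.noise * np.eye(len(self.X))
        k = rbf(np.array([x]), self.X)
        return float(1.0 - k @ np.linalg.solve(K, k.T))

    def maybe_add(self, x, y):
        """Event trigger: learn from (x, y) only if the model is uncertain there."""
        if self.variance(x) > self.threshold:
            self.X = np.append(self.X, x)
            self.y = np.append(self.y, y)
            return True
        return False
```

Near existing training data the variance stays small and no update fires, which is what keeps the data requirements low while exploration of genuinely unknown regions still triggers learning.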
Organizers: Sebastian Trimpe