There has been significant prior work on learning realistic, articulated, 3D statistical shape models of the human body. In contrast, there are few such models for animals, despite their many applications in biology, neuroscience, agriculture, and entertainment. The main challenge is that animals are much less cooperative subjects than humans: the best human body models are learned from thousands of 3D scans of people in specific poses, which is infeasible with live animals. In this talk I will illustrate how we extend a state-of-the-art articulated 3D human body model (SMPL) to animals by learning a multi-family shape space from toy animals. This shape space can represent lions, cats, dogs, horses, cows, and hippos. We illustrate the model's generalization by fitting it to images of real animals, where it captures realistic animal shapes, even for new species not seen in training.
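The following Python sketch illustrates what a "shape space" means here. It builds mesh vertices as a template plus a linear combination of shape directions weighted by coefficients. This is the linear form SMPL uses for identity-dependent body shape. The sketch rests on several assumptions. The tiny template, the two basis directions, and the coefficient values are made up, and the function name `shape_vertices` is hypothetical. The pose-dependent corrections and skinning of the full model are also omitted.

```python
import numpy as np


def shape_vertices(template, shape_basis, betas):
    """Linear shape space (illustrative sketch, not the talk's model).

    template:    (V, 3) mean mesh vertices
    shape_basis: (K, V, 3) shape directions, e.g. PCA components
    betas:       (K,) shape coefficients
    Returns template + sum_k betas[k] * shape_basis[k], shape (V, 3).
    """
    # tensordot contracts the K axis: (K,) x (K, V, 3) -> (V, 3)
    return template + np.tensordot(betas, shape_basis, axes=1)


# Toy example: a 4-vertex "mesh" and 2 hypothetical shape directions.
template = np.zeros((4, 3))
shape_basis = np.zeros((2, 4, 3))
shape_basis[0, :, 0] = 1.0  # direction 0: shift every vertex along x
shape_basis[1, :, 1] = 1.0  # direction 1: shift every vertex along y
betas = np.array([0.5, -2.0])
verts = shape_vertices(template, shape_basis, betas)
```

In a learned model, the basis would be estimated from registered 3D scans. Here it is written by hand so the arithmetic is easy to check. Setting all coefficients to zero returns the template itself.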
Organizers: Moritz Grosse-Wentrup
Directed acyclic graph models (DAG models, also called Bayesian networks) are widely used in causal inference, and they can be manipulated to represent the consequences of interventions in a causal system. However, DAGs cannot fully represent causal models with confounding. Other classes of graphs, such as ancestral graphs and acyclic directed mixed graphs (ADMGs), have been introduced to deal with this using additional kinds of edges, but we show that these are not rich enough to capture the full range of possible models. In fact, no mixed graph over the observed variables is rich enough, regardless of how many edges are used. Instead, we introduce mDAGs, a class of hypergraphs suited to representing causal models when some of the variables are unobserved. Results on the Markov equivalence of these marginal models show that, when interpreted causally, mDAGs are the minimal class of graphs that can be sensibly used. Understanding such equivalences is critical for automatic causal structure learning methods, a topic of considerable interest. We review the state of the art as well as some open problems.
Organizers: Sabrina Rehbaum
Human diseases show considerable heterogeneity at the molecular level. Such heterogeneity is central to personalized medicine efforts, which seek to exploit molecular data to better understand disease biology and inform clinical decision-making. An emerging notion is that diseases and disease subgroups may differ not only in mean molecular abundance but also in patterns of molecular interplay. I will discuss our ongoing efforts to develop methods for investigating such heterogeneity, with an emphasis on high-dimensional aspects.
Our eyes typically anticipate the next action module in a sequence by targeting the relevant object for the following step. Yet how the final goal, or the way we intend to achieve it, is reflected in the early visual exploration of each object has been less investigated. In a series of experiments, we considered how scan paths on real-world objects are affected by factors such as task, object orientation, familiarity, and low-level saliency, thereby revealing which components account for fixation target selection during eye-hand coordination. In each experiment, the fixation distribution differed significantly depending on the final task (e.g., lifting vs. opening). Already from the second fixation, before the object was reached, the eyes targeted the task-relevant regions, and these regions correlated significantly with salient features such as oriented edges. Familiarity had a significant effect when different tools were used as stimuli, with more fixations concentrating on the active end of unfamiliar tools. Object orientation (upright or inverted) and anticipation of the final comfort state determined the height of the fixations on the objects. Scan-path dynamics thus reveal how action is planned, offering indirect insight into the structuring of complex behaviour and into how task and affordance perception relate to motor control.
Organizers: Jeannette Bohg
Lilla and Bill are two artists returning to Perceiving Systems. Their talk will update us on the exciting projects they have been involved with since their last visit and present some of their current plans, which will unfold during the week (Sept 21st - 25th). They will be joining our department and working with professional dancers in the 4D scanner as part of an art project on mental health. For some time, Lilla and Bill have been using 3D captures as an artistic tool to visualize the human body in a contemporary form. They produce marionettes or avatars, figures that are anonymous yet universal. Through this medium they portray the prominent theme of human frailty.
Organizers: Emma-Jayne Holderness
Organizers: Jane Walters
In this talk, I will start by describing the pervasiveness of image and video content and how such content is growing with the ubiquity of cameras. I will use this to motivate the need for better tools for analyzing and enhancing video content. I will begin with some of our earlier work on temporal modeling of video, then lead up to our current work and describe two main projects: (1) our video stabilizer, currently implemented and running on YouTube, and its extensions; and (2) a robust and scalable method for video segmentation. I will describe our video stabilization method in some detail. It goes beyond conventional filtering, which only suppresses high-frequency jitter. It also removes the rolling shutter distortions common in modern CMOS cameras. These cameras capture the frame one scan line at a time, which produces non-rigid image distortions such as shear and wobble. Our method does not rely on a priori knowledge and works on video from any camera, including legacy footage. I will showcase examples of this approach and discuss how the method was launched and is running on YouTube, with millions of users. Then I will describe an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. This hierarchical approach generates high-quality segmentations. We demonstrate their use as users interact with the video, where they enable efficient annotation of objects within it. I will also show recent work on how this segmentation and annotation can be used for dynamic scene understanding. I will then follow up with recent work on image and video analysis in the mobile domain. Finally, I will make some observations about the ubiquity of imaging and video in general and the need for better tools for video analysis.
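The abstract does not spell out the segmentation algorithm. As an assumed stand-in, the Python sketch below implements one level of the well-known Felzenszwalb-Huttenlocher graph-based merge rule, which illustrates the general idea. Edges are processed in order of increasing weight. Two regions merge when the edge joining them is no heavier than either region's internal variation plus a size-dependent slack. A hierarchical method would typically repeat a similar merge on the graph of resulting regions, but only a single level is shown here. The names `segment_graph` and `DisjointSet` are hypothetical, as is the toy 1D "frame" of intensities.

```python
class DisjointSet:
    """Union-find that also tracks each component's size and internal difference."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        # Int(C): largest edge weight in the component's minimum spanning tree.
        self.internal = [0.0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b, w):
        # a and b must be roots; attach the smaller tree under the larger one.
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        # Edges arrive in ascending order, so w is the new maximum MST edge.
        self.internal[a] = w


def segment_graph(num_nodes, edges, k):
    """One level of Felzenszwalb-Huttenlocher style merging (assumed stand-in).

    edges: list of (weight, u, v); k: scale parameter, larger k -> larger segments.
    Returns the root label of each node.
    """
    ds = DisjointSet(num_nodes)
    for w, u, v in sorted(edges):
        a, b = ds.find(u), ds.find(v)
        if a == b:
            continue
        # Merge only if the connecting edge is no heavier than the internal
        # difference of either component plus the size-dependent slack k/|C|.
        if w <= min(ds.internal[a] + k / ds.size[a],
                    ds.internal[b] + k / ds.size[b]):
            ds.union(a, b, w)
    return [ds.find(i) for i in range(num_nodes)]


# Toy 1D "frame": two flat regions with a sharp intensity step between them.
intensities = [10, 11, 12, 50, 51, 52]
edges = [(abs(intensities[i] - intensities[i + 1]), i, i + 1)
         for i in range(len(intensities) - 1)]
labels = segment_graph(len(intensities), edges, k=5)
```

With k=5, the step of 38 between the two regions exceeds both regions' thresholds, so two segments remain. A much larger k would absorb the step and merge everything.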
Organizers: Naejin Kong
Long-focal-length optics have been used extensively for shooting 2D cinema and television, either to virtually get closer to the scene or to produce an aesthetic effect through the deformation of perspective. However, in 3D cinema or television, the use of long focal lengths either creates a "cardboard effect" or causes visual divergence. To overcome this problem, state-of-the-art methods use disparity mapping techniques, which generalize view interpolation, to generate new stereoscopic pairs from the two image sequences. We propose to use more than two cameras to solve the issues that remain in disparity mapping methods. In the first part of the talk, we briefly review the causes of visual fatigue and visual discomfort when viewing a stereoscopic film. We model depth perception from stereopsis of a 3D scene shot with two cameras and projected in a movie theater or on a 3DTV. We mathematically characterize this 3D distortion and derive the mathematical constraints associated with the causes of visual fatigue and discomfort. We illustrate these 3D distortions with new interactive software, "The Virtual Projection Room". To generate the desired stereoscopic images, we propose to use image-based rendering. These techniques usually proceed in two stages: first, the input images are warped into the target view; then, the warped images are blended together. The warps are usually computed with the help of a geometric proxy (either implicit or explicit). Image blending has been extensively addressed in the literature, and a few heuristics have proven to achieve very good performance. Yet combining these heuristics is not straightforward and requires manual adjustment of many parameters. We present a new Bayesian approach to novel view synthesis, based on a generative model that takes into account the uncertainty of the image warps in the image formation model.
The Bayesian formalism allows us to derive the energy of the generative model and to compute the desired images as the maximum a posteriori (MAP) estimate. The method outperforms state-of-the-art image-based rendering techniques on challenging datasets. Moreover, the energy equations provide a formalization of the heuristics widely used in image-based rendering techniques. The proposed generative model also addresses the problem of super-resolution, allowing images to be rendered at a higher resolution than the input images. In the last part of the presentation, we apply the new rendering technique to the case of the stereoscopic zoom.
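The Python sketch below illustrates how a Bayesian energy can formalize a blending heuristic. It uses a deliberately simplified model that is not the speaker's. Each warped view is assumed to observe the target pixel with independent Gaussian noise, and the noise's standard deviation grows with warp uncertainty. The prior is also assumed to be flat. Minimizing the resulting energy E(I) = sum_i (o_i - I)^2 / sigma_i^2 yields the inverse-variance weighted mean, which is the familiar weighted-average blend. The function name `map_blend` and the given per-pixel sigmas are assumptions for illustration.

```python
import numpy as np


def map_blend(warped, sigmas):
    """MAP blend of N warped views under a simplified toy model (assumption).

    warped: (N, H, W) input images already warped into the target view
    sigmas: (N, H, W) per-pixel noise std of each warped view, assumed given;
            larger values mean the warp is less reliable at that pixel.
    Model: o_i = I + n_i, n_i ~ N(0, sigma_i^2), flat prior on I.
    Minimizing E(I) = sum_i (o_i - I)^2 / sigma_i^2 gives the
    inverse-variance weighted mean computed below.
    """
    weights = 1.0 / np.square(sigmas)
    return np.sum(weights * warped, axis=0) / np.sum(weights, axis=0)


# Two constant 2x2 views; the second is three times noisier (std 3 vs 1).
warped = np.stack([np.full((2, 2), 100.0), np.full((2, 2), 200.0)])
sigmas = np.stack([np.full((2, 2), 1.0), np.full((2, 2), 3.0)])
blended = map_blend(warped, sigmas)
```

Here the weights are 1 and 1/9, so each blended pixel is (100 + 200/9) / (10/9) = 110. With equal sigmas, the blend reduces to a plain average.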
The visual effects and entertainment industries are now a fundamental part of the computer graphics and vision landscapes, and they have an impact across society in general. Key issues in this area include creating realistic characters, creating assets for production, and improving workflow. Advances in computer graphics, vision, and rendering, built on top of academic research, have underpinned much of the success of these industries. However, many problems remain unsolved. In this talk I will outline some of the challenges we have faced in transferring academic research into the visual effects industry. In particular, I will attempt to distinguish between the academic challenges and the industrial demands we have experienced, and explain how they have affected projects. This draws on experience from several themes involving leading visual effects and entertainment companies. Our work has covered several diverse areas, including on-set capture, digital doubles, real-time animation, and motion capture retargeting. I will describe how many of these problems led us to step back and first solve more fundamental computer vision research problems, particularly in optical flow, non-rigid tracking, and shadow removal. I will also describe how this work opened up other opportunities. Some of these projects are supported through our Centre for Digital Entertainment (CDE), which has 60 PhD-level students embedded across the creative industries in the UK. Others are more specific to partners at The Imaginarium and Double Negative Visual Effects. Drawing these experiences together, we are now starting a new Centre for the Analysis of Motion, Entertainment Research and Applications (CAMERA), with leading partners across entertainment, elite sport, and rehabilitation.
Organizers: Silvia Zuffi