Learnable representations, and deep convolutional neural networks (CNNs) in particular, have become the preferred way of extracting visual features for image understanding tasks, from object recognition to semantic segmentation. In this talk I will discuss several recent advances in deep representations for computer vision. After reviewing modern CNN architectures, I will give an example of a state-of-the-art network in text spotting; in particular, I will show that, by using only synthetic data and a sufficiently large deep model, it is possible to map image regions directly to English words, a classification problem with 90K classes, and in this manner obtain state-of-the-art performance in text spotting. I will also briefly touch on other applications of deep learning to object recognition and discuss feature universality and transfer learning. In the last part of the talk I will move to the problem of understanding deep networks, which remain largely black boxes, presenting two possible approaches to their analysis. The first consists of visualisation techniques that can investigate the information retained and learned by a visual representation. The second is a method for exploring how representations capture geometric notions such as image transformations, and for finding whether and how different representations are related.
Recent progress in computer-based visual recognition relies heavily on machine learning methods trained using large-scale annotated datasets. While such data has made advances in model design and evaluation possible, it does not necessarily provide insights or constraints into those intermediate levels of computation, or deep structure, perceived as ultimately necessary in order to design reliable computer vision systems. This is noticeable in the accuracy of state-of-the-art systems trained with such annotations, which still lag behind human performance in similar tasks. Nor does the existing data make it immediately possible to exploit insights from a working system - the human eye - to derive potentially better features, models or algorithms. In this talk I will present a mix of perceptual and computational insights resulting from the analysis of large-scale human eye movement and 3D body motion capture datasets, collected in the context of visual recognition tasks (Human3.6M available at http://vision.imar.ro/human3.6m/, and Actions in the Eye available at http://vision.imar.ro/eyetracking/). I will show that attention models (fixation detectors, scan-path estimators, weakly supervised object detector response functions and search strategies) can be learned from human eye movement data, and can produce state-of-the-art results when used in end-to-end automatic visual recognition systems. I will also describe recent work in large-scale human pose estimation, showing the feasibility of pixel-level body part labeling from RGB, and presenting promising 2D and 3D human pose estimation results in monocular images. In this context, I will discuss recent, perhaps surprising, quantitative perceptual experiments, revealing that humans may not be significantly better than computers at perceiving 3D articulated poses from monocular images. Such findings may challenge established definitions of computer vision 'tasks' and their expected levels of performance.
Organizers: Ludovic Righetti
In the age of large streaming data it seems appropriate to revisit the foundations of what we think of as data modelling. In this talk I'll argue that traditional statistical approaches based on parametric models and i.i.d. assumptions are inappropriate for the type of large-scale machine learning we need to do in the age of massive streaming data sets, particularly when we realise that, regardless of the size of the data we have, it pales in comparison to the data we could have. This is the domain of massively missing data. I'll be arguing for flexible non-parametric models as the answer. This presents a particular challenge: non-parametric models require storage of the entire data set, which is a problem for massive, streaming data. I will present a potential solution, but perhaps end with more questions than we started with.
Organizers: Jane Walters
The breast is not just a protruding gland situated on the front of the thorax in female bodies: behind biology lies an intricate symbolism that has taken on various and often contradictory meanings. We begin our journey by looking at prehistoric artifacts that revered the breast as the ultimate symbol of life; we then transition to the rich iconographical tradition centering on the so-called Virgo Lactans, when the breast became a metaphor of nourishment for the entire Christian community. Next, we look at how artists eroticized the breast in portraits of fifteenth-century French courtesans and how Enlightenment philosophers and revolutionary events transformed it into a symbol of the national community. Lastly, we analyze how contemporary society has medicalized the breast through cosmetic surgery and discourses around breast cancer, and has objectified it by making the breast a constant presence in advertisements and on magazine covers. Through twenty-five centuries of representations, I will talk about how the breast has been coded as both "good" and "bad," sacred and erotic, life-giving and life-destroying.
Autonomous micro aerial robots can operate in three-dimensional, indoor and outdoor environments, and have applications to search and rescue, first response and precision farming. I will describe the challenges in developing small, agile robots and the algorithmic challenges in the areas of (a) control and planning, (b) state estimation and mapping, and (c) coordinating large teams of robots.
It is a great pleasure to invite you to the talk of Ioannis Havoutis on Monday, March 9th at 11h in the AMD seminar room (TTR building, first floor).
Quadrupedal animals move with skill, grace and agility. Quadrupedal robots have made tremendous progress in the last few years. In this talk I will give an overview of our work with the Hydraulic Quadruped (HyQ) and present our latest framework for perception, planning and control of quadrupedal locomotion in challenging environments. In addition, I will give a short preview of our work on optimization of dynamic motions, and our future goals.
Organizers: Ludovic Righetti
How is it that biological systems can be so imprecise, so ad hoc, and so inefficient, yet accomplish (seemingly) simple tasks that still elude state-of-the-art artificial systems? In this context, I will introduce some of the themes central to CMU's new BrainHub Initiative by discussing: (1) The complexity and challenges of studying the mind and brain; (2) How the study of the mind and brain may benefit from considering contemporary artificial systems; (3) Why studying the mind and brain might be interesting (and possibly useful) to computer scientists.
In this talk I will give an overview of work I have done over the years exploring physically based simulation of contact, deformation, and articulated structures, where trade-offs can be made between computational speed and physical fidelity. I will also discuss examples that mix data-driven and physically based approaches in animation and control.
Paul Kry is an associate professor in the School of Computer Science at McGill University. He has a BMath from the University of Waterloo, and an MSc and PhD from the University of British Columbia. His research focuses on physically based simulation, motion capture, and control of character animation.
Everyone in visual psychology seems to know what Biological Motion is. Yet, it is not easy to come up with a definition that is specific enough to justify a distinct label, but is also general enough to include the many different experiments to which the term has been applied in the past. I will present a number of tasks, stimuli, and experiments, including some of my own work, to demonstrate the diversity and the appeal of the field of biological motion perception. In trying to come up with a definition of the term, I will particularly focus on a type of motion that has been considered “non-biological” in some contexts, even though it might contain -- as more recent work shows -- one of the most important visual invariants used by the visual system to distinguish animate from inanimate motion.