In Special Issue on Generative Models in Computer Vision and Medical Imaging, 136, pages: 32-44, Elsevier, July 2015 (inproceedings)
Computer vision is hard because of a large variability in lighting, shape, and texture; in addition, the image signal is non-additive due to occlusion. Generative models promised to account for this variability by accurately modelling the image formation process as a function of latent variables with prior beliefs. Bayesian posterior inference could then, in principle, explain the observation. While intuitively appealing, generative models for computer vision have largely failed to deliver on that promise due to the difficulty of posterior inference. As a result, the community has favored efficient discriminative approaches. We still believe in the usefulness of generative models in computer vision, but argue that we need to leverage existing discriminative or even heuristic computer vision methods.
We implement this idea in a principled way in our informed sampler and in careful experiments demonstrate it on challenging models which contain renderer programs as their components. The informed sampler, using simple discriminative proposals based on existing computer vision technology, achieves dramatic improvements in inference. Our approach enables a new richness in generative models that was out of reach with existing inference technology.
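The core mechanism can be sketched on a toy problem of our own construction (not the paper's renderer-based models): a Metropolis-Hastings sampler whose independence proposal is centered on a cheap heuristic point estimate, playing the role of the discriminative predictor. All names and numbers below are illustrative assumptions.

```python
import math
import random

# Toy posterior inference problem: latent mu ~ N(0, 5^2),
# observation y ~ N(mu, 1), with observed value y = 3.2.
y = 3.2

def log_post(mu):
    # unnormalized log posterior = log prior + log likelihood
    return -mu ** 2 / (2 * 5 ** 2) - (y - mu) ** 2 / 2

# "Informed" independence proposal: centered on a cheap heuristic
# estimate (here simply y itself), standing in for a discriminative
# predictor trained with existing computer vision technology.
est, tau = y, 1.5

def log_q(mu):
    # log density of the proposal, up to a constant
    return -(mu - est) ** 2 / (2 * tau ** 2)

def informed_mh(n_steps, seed=0):
    rng = random.Random(seed)
    mu, samples = 0.0, []
    for _ in range(n_steps):
        cand = rng.gauss(est, tau)
        # Metropolis-Hastings acceptance for an independence proposal
        log_a = (log_post(cand) - log_post(mu)) + (log_q(mu) - log_q(cand))
        if math.log(rng.random()) < log_a:
            mu = cand
        samples.append(mu)
    return samples

samples = informed_mh(5000)
posterior_mean = sum(samples) / len(samples)  # analytic value: 25/26 * 3.2 ~ 3.08
```

Because the proposal already sits near the posterior mode, the chain mixes far faster than a random walk started from the prior, which is the effect the informed sampler exploits at scale.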
Advanced Structured Prediction, pages: 432, Neural Information Processing Series, MIT Press, November 2014 (book)
The goal of structured prediction is to build machine learning models that predict relational information that itself has structure, such as being composed of multiple interrelated parts. These models, which reflect prior knowledge, task-specific relations, and constraints, are used in fields including computer vision, speech recognition, natural language processing, and computational biology. They can carry out such tasks as predicting a natural language sentence, or segmenting an image into meaningful components.
These models are expressive and powerful, but exact computation is often intractable. A broad research effort in recent years has aimed at designing structured prediction models and approximate inference and learning procedures that are computationally efficient. This volume offers an overview of this recent research in order to make the work accessible to a broader research community. The chapters, by leading researchers in the field, cover a range of topics, including research trends, the linear programming relaxation approach, innovations in probabilistic modeling, recent theoretical progress, and resource-aware learning.
In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 1314-1321, IEEE, IEEE International Conference on Computer Vision and Pattern Recognition, June 2014 (inproceedings)
Dynamic Bayesian networks such as Hidden Markov Models (HMMs) are successfully used as probabilistic models for human motion. The use of hidden variables makes them expressive models, but inference is only approximate and requires procedures such as particle filters or Markov chain Monte Carlo methods. In this work we propose to instead use simple Markov models that only model observed quantities. We retain a highly expressive dynamic model by using interactions that are nonlinear and non-parametric. A presentation of our approach in terms of latent variables shows that the computation of exact log-likelihoods grows only logarithmically in the number of latent states. We validate our model on human motion capture data and demonstrate state-of-the-art performance on action recognition and motion completion tasks.
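The idea of a nonlinear, non-parametric transition model over observed quantities can be illustrated with a minimal sketch of our own (a 1-D system, not the paper's motion model): the next state is predicted by Nadaraya-Watson kernel regression over observed transition pairs, with no latent variables at all.

```python
import math

# Ground-truth nonlinear dynamics (an assumption for this toy example)
f = lambda x: math.tanh(2.0 * x)

# Observed states on a grid over [-1, 1] and their observed successors
xs = [i / 50.0 - 1.0 for i in range(101)]
pairs = [(x, f(x)) for x in xs]  # (x_t, x_{t+1}) transition pairs

def predict_next(x, bandwidth=0.05):
    # Nadaraya-Watson regression: kernel-weighted average of the
    # successors of nearby observed states. The transition model is
    # nonlinear and non-parametric, yet uses only observed quantities.
    weights = [math.exp(-(x - a) ** 2 / (2 * bandwidth ** 2)) for a, _ in pairs]
    total = sum(weights)
    return sum(w * b for w, (_, b) in zip(weights, pairs)) / total

pred = predict_next(0.3)  # should be close to tanh(0.6) ~ 0.537
```

Prediction is a single smoothing pass over the training transitions, so no particle filter or MCMC procedure is needed at test time.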
In Proceedings IEEE Conf. on Computer Vision (ICCV), pages: 1281-1288, IEEE International Conference on Computer Vision, December 2013 (inproceedings)
Having a sensible prior of human pose is a vital ingredient for many computer vision applications, including tracking and pose estimation. While the application of global non-parametric approaches and parametric models has led to some success, finding the right balance in terms of flexibility and tractability, as well as estimating model parameters from data has turned out to be challenging. In this work, we introduce a sparse Bayesian network model of human pose that is non-parametric with respect to the estimation of both its graph structure and its local distributions. We describe an efficient sampling scheme for our model and show its tractability for the computation of exact log-likelihoods. We empirically validate our approach on the Human 3.6M dataset and demonstrate superior performance to global models and parametric networks. We further illustrate our model's ability to represent and compose poses not present in the training set (compositionality) and describe a speed-accuracy trade-off that allows realtime scoring of poses.
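Why a directed model makes exact log-likelihoods cheap can be shown with a tiny linear-Gaussian sketch of our own (three nodes standing in for body parts; the paper's model is non-parametric and learned from data): ancestral sampling draws parents before children, and the joint log-likelihood is simply the sum of local conditional terms.

```python
import math
import random

def log_gauss(x, mu, sigma):
    # log density of N(mu, sigma^2) at x
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

# Toy pose network: torso -> left arm, torso -> right arm (our assumption)
def sample_pose(rng):
    torso = rng.gauss(0.0, 1.0)
    left = rng.gauss(0.5 * torso, 0.3)    # child conditioned on its parent
    right = rng.gauss(-0.5 * torso, 0.3)
    return torso, left, right

def log_likelihood(pose):
    # exact joint log-likelihood = sum of local conditional log densities
    torso, left, right = pose
    return (log_gauss(torso, 0.0, 1.0)
            + log_gauss(left, 0.5 * torso, 0.3)
            + log_gauss(right, -0.5 * torso, 0.3))

rng = random.Random(0)
pose = sample_pose(rng)
score = log_likelihood(pose)
```

Scoring a pose is a constant number of local evaluations per node, which is what enables the real-time speed-accuracy trade-off the abstract describes.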
In Proceedings of 34th DAGM Symposium, pages: 397-407, Lecture Notes in Computer Science, (Editors: Pinz, Axel and Pock, Thomas and Bischof, Horst and Leberl, Franz), Springer, August 2012 (inproceedings)
pages: 494, Neural information processing series, MIT Press, Cambridge, MA, USA, December 2011 (book)
The interplay between optimization and machine learning is one of the most important developments in modern computational science. Optimization formulations and methods are proving to be vital in designing algorithms to extract essential knowledge from huge volumes of data. Machine learning, however, is not simply a consumer of optimization technology but a rapidly evolving field that is itself generating new optimization ideas. This book captures the state of the art of the interaction between optimization and machine learning in a way that is accessible to researchers in both fields.
Optimization approaches have enjoyed prominence in machine learning because of their wide applicability and attractive theoretical properties. The increasing complexity, size, and variety of today's machine learning models call for the reassessment of existing assumptions. This book starts the process of reassessment. It describes the resurgence in novel contexts of established frameworks such as first-order methods, stochastic approximations, convex relaxations, interior-point methods, and proximal methods. It also devotes attention to newer themes such as regularized optimization, robust optimization, gradient and subgradient methods, splitting techniques, and second-order methods. Many of these techniques draw inspiration from other fields, including operations research, theoretical computer science, and subfields of optimization. The book will enrich the ongoing cross-fertilization between the machine learning community and these other fields, and within the broader optimization community.
In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages: 2836-2843, IEEE Service Center, Piscataway, NJ, USA, CVPR, June 2009 (inproceedings)
Most modern computer vision systems for high-level tasks, such as image classification, object recognition and segmentation, are based on learning algorithms that are able to separate discriminative information from noise. In practice, however, the typical system consists of a long pipeline of pre-processing steps, such as extraction of different kinds of features, various kinds of normalizations, feature selection, and quantization into aggregated representations such as histograms. Along this pipeline, there are many parameters to set and choices to make, and their effect on the overall system performance is a priori unclear. In this work, we shorten the pipeline in a principled way. We move pre-processing steps into the learning system by means of kernel parameters, letting the learning algorithm decide upon suitable parameter values. Learning to optimize the pre-processing choices becomes learning the kernel parameters. We realize this paradigm by extending the recent Multiple Kernel Learning formulation from the finite case of having a fixed number of kernels which can be combined to the general infinite case where each possible parameter setting induces an associated kernel. We evaluate the new paradigm extensively on image classification and object classification tasks. We show that it is possible to learn optimal discriminative codebooks and optimal spatial pyramid schemes, consistently outperforming all previous state-of-the-art approaches.
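The "each parameter setting induces a kernel" idea can be sketched on a toy regression problem of our own making (this is a crude grid search standing in for the paper's MKL solver, and all data and bandwidths are assumptions): each candidate RBF bandwidth defines one kernel, and we score weighted combinations of them.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0])
Xtr, ytr, Xva, yva = X[:40], y[:40], X[40:], y[40:]

def rbf(A, B, gamma):
    # RBF kernel matrix; every gamma value induces a different kernel
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

gammas = [0.01, 0.1, 1.0, 10.0]  # candidate kernel parameters

def fit_predict(w):
    # kernel ridge regression with the weighted sum of candidate kernels
    Ktr = sum(wi * rbf(Xtr, Xtr, g) for wi, g in zip(w, gammas))
    alpha = np.linalg.solve(Ktr + 1e-3 * np.eye(len(Xtr)), ytr)
    Kva = sum(wi * rbf(Xva, Xtr, g) for wi, g in zip(w, gammas))
    return Kva @ alpha

# crude search over simplex corners plus the uniform mix,
# standing in for the actual Multiple Kernel Learning optimization
candidates = [np.eye(4)[i] for i in range(4)] + [np.full(4, 0.25)]
errs = [float(np.mean((fit_predict(w) - yva) ** 2)) for w in candidates]
best_w = candidates[int(np.argmin(errs))]
```

The infinite-MKL formulation replaces this finite grid with an optimization over the continuum of parameter settings, so pre-processing choices become learned kernel weights.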
In CVPR 2009, pages: 818-825, IEEE Service Center, Piscataway, NJ, USA, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 2009 (inproceedings)
Markov random field (MRF, CRF) models are popular in computer vision. However, in order to be computationally tractable they are limited to incorporating only local interactions and cannot model global properties, such as connectedness, which is a potentially useful high-level prior for object segmentation. In this work, we overcome this limitation by deriving a potential function that enforces the output labeling to be connected and that can naturally be used in the framework of recent MAP-MRF LP relaxations. Using techniques from polyhedral combinatorics, we show that a provably tight approximation to the MAP solution of the resulting MRF can still be found efficiently by solving a sequence of max-flow problems. The efficiency of the inference procedure also allows us to learn the parameters of an MRF with global connectivity potentials by means of a cutting-plane algorithm. We experimentally evaluate our algorithm on both synthetic data and on the challenging segmentation task of the PASCAL VOC 2008 data set. We show that in both cases the addition of a connectedness prior significantly reduces the segmentation error.
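The max-flow subroutine at the heart of this approach can be shown on a minimal example of our own (a 1-D binary MRF with Potts smoothing; the paper's global connectivity potential requires the LP machinery and is not reproduced here). The standard graph construction puts the unary cost of label 1 on the source edge and of label 0 on the sink edge, so the min cut equals the minimum energy.

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    # Edmonds-Karp on a residual-capacity dict of dicts (modified in place)
    while True:
        parent, q = {s: None}, deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push

def mrf_map(obs, lam):
    # Energy: sum_i (obs_i - x_i)^2 + lam * sum_i [x_i != x_{i+1}], x_i in {0,1}
    cap = defaultdict(lambda: defaultdict(float))
    s, t = "s", "t"
    for i, o in enumerate(obs):
        cap[s][i] += (o - 1.0) ** 2   # unary cost of choosing label 1
        cap[i][t] += (o - 0.0) ** 2   # unary cost of choosing label 0
    for i in range(len(obs) - 1):     # Potts smoothness terms
        cap[i][i + 1] += lam
        cap[i + 1][i] += lam
    max_flow(cap, s, t)
    # nodes still reachable from s in the residual graph take label 0
    seen, q = {s}, deque([s])
    while q:
        u = q.popleft()
        for v, c in cap[u].items():
            if c > 1e-12 and v not in seen:
                seen.add(v)
                q.append(v)
    return [0 if i in seen else 1 for i in range(len(obs))]

# with strong smoothing the outlier pixel is flipped to match its neighbors
labels = mrf_map([0.9, 0.8, 0.2, 0.7, 0.9], lam=0.5)
```

Raising the pairwise weight `lam` makes the prior dominate the noisy observation, which is the same mechanism, at a much smaller scale, as the connectedness prior reducing segmentation error.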
In ICML 2009, pages: 769-776, (Editors: Danyluk, A. , L. Bottou, M. Littman), ACM Press, New York, NY, USA, 26th International Conference on Machine Learning, June 2009 (inproceedings)
We propose a new method to quantify the solution stability of a large class of combinatorial optimization problems arising in machine learning. As practical examples, we apply the method to correlation clustering, clustering aggregation, modularity clustering, and relative performance significance clustering. Our method is motivated by the idea of linear programming relaxations. We prove that when a relaxation is used to solve the original clustering problem, the solution stability calculated by our method is conservative, that is, it never overestimates the solution stability of the true, unrelaxed problem. We also demonstrate how our method can be used to compute the entire path of optimal solutions as the optimization problem is increasingly perturbed. Experimentally, our method is shown to perform well on a number of benchmark problems.
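What "solution stability" means can be made concrete by brute force on a tiny correlation-clustering instance of our own construction (the paper's contribution is computing a conservative bound via the LP relaxation instead of this exhaustive enumeration, which only works for toy sizes).

```python
from itertools import combinations

# Pairwise weights: positive rewards putting a pair in the same cluster,
# negative penalizes it (an assumed toy instance, 4 items)
w = {(0, 1): 2.0, (2, 3): 2.0, (0, 2): -1.0, (0, 3): -1.0,
     (1, 2): -1.0, (1, 3): -1.0}

def partitions(items):
    # enumerate all partitions of a small item list
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [p[i] + [first]] + p[i + 1:]
        yield [[first]] + p

def objective(p):
    # total weight of pairs placed in the same cluster (to be maximized)
    return sum(w.get(pair, 0.0)
               for cluster in p
               for pair in combinations(sorted(cluster), 2))

ranked = sorted(partitions([0, 1, 2, 3]), key=objective, reverse=True)
best, runner_up = ranked[0], ranked[1]
# the objective gap to the runner-up measures how much the weights can be
# perturbed before a different clustering becomes optimal
gap = objective(best) - objective(runner_up)
```

The LP-based method of the paper yields a stability estimate that is guaranteed never to exceed the true gap-based stability computed this way.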
In ICML 2008, pages: 704-711, (Editors: Cohen, W. W., A. McCallum, S. Roweis), ACM Press, New York, NY, USA, 25th International Conference on Machine Learning, July 2008 (inproceedings)
A recent trend in exemplar-based unsupervised learning is to formulate the learning problem as a convex optimization problem. Convexity is achieved by restricting the set of possible prototypes to training exemplars. In particular, this has been done for clustering, vector quantization, and mixture model density estimation. In this paper we propose a novel algorithm that is theoretically and practically superior to these convex formulations. This is possible by posing the unsupervised learning problem as a single convex "master problem" with non-convex subproblems. We show that for the above learning tasks the subproblems are extremely well-behaved and can be solved efficiently.
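The cost of the exemplar restriction is easy to see in one dimension (a toy example of our own, not the paper's algorithm): under squared loss the unrestricted optimal prototype is the mean, which need not coincide with any data point.

```python
# Summarize three points with a single prototype under squared Euclidean loss
points = [0.0, 1.0, 5.0]

def cost(prototype):
    return sum((x - prototype) ** 2 for x in points)

best_exemplar = min(points, key=cost)   # prototype forced to be a data point
mean = sum(points) / len(points)        # unrestricted minimizer of the cost
# cost(mean) < cost(best_exemplar): lifting the exemplar restriction,
# at the price of non-convex subproblems, strictly improves the objective
```

This gap is exactly what the proposed convex-master/non-convex-subproblem decomposition recovers while keeping the subproblems tractable.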
(180), Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, November 2008 (techreport)
Discovery of knowledge from geometric graph databases is of particular importance in chemistry and biology, because chemical compounds and proteins are represented as graphs with 3D geometric coordinates. In such applications, scientists are not interested in the statistics of the whole database. Instead they need information about a novel drug candidate or protein at hand, represented as a query graph. We propose a polynomial-delay algorithm for geometric frequent subgraph retrieval. It enumerates all subgraphs of a single given query graph which are frequent geometric epsilon-subgraphs under the entire class of rigid geometric transformations in a database. By using geometric epsilon-subgraphs, we achieve tolerance against variations in geometry. We compare the proposed algorithm to gSpan on chemical compound data, and we show that for a given minimum support the total number of frequent patterns is substantially limited by requiring geometric matching. Although the computation time per pattern is larger than for non-geometric graph mining, the total time is within a reasonable level even for small minimum supports.
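A simplified version of epsilon-tolerant matching under rigid transformations can be sketched as follows (our own sketch, not the paper's algorithm): pairwise distances are invariant under rigid motions, so two equally ordered coordinate sets can only match if their distance matrices agree within epsilon. This is a necessary check; full congruence testing additionally distinguishes rotations from reflections.

```python
import math

def eps_congruent(P, Q, eps):
    # P, Q: equally sized, identically ordered lists of 3-D coordinates
    if len(P) != len(Q):
        return False
    for i in range(len(P)):
        for j in range(i + 1, len(P)):
            # compare the rigid-motion-invariant pairwise distances
            if abs(math.dist(P[i], P[j]) - math.dist(Q[i], Q[j])) > eps:
                return False
    return True

triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
# the same triangle rotated 90 degrees about the z axis
rotated = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-2.0, 0.0, 0.0)]
```

The tolerance `eps` is what absorbs the small geometric variations between, e.g., conformers of the same chemical substructure.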
Machine Learning, 75(1):69-89, November 2008 (article)
Graph mining methods enumerate frequently appearing subgraph patterns, which can be used as features for subsequent classification or regression. However, frequent patterns are not necessarily informative for the given learning problem. We propose a mathematical programming boosting method (gBoost) that progressively collects informative patterns. Compared to AdaBoost, gBoost can build the prediction rule with fewer iterations. To apply the boosting method to graph data, a branch-and-bound pattern search algorithm is developed based on the DFS code tree. The constructed search space is reused in later iterations to minimize the computation time. Our method can learn more efficiently than the simpler method based on frequent substructure mining, because the output labels are used as an extra information source for pruning the search space. Furthermore, by engineering the mathematical program, a wide range of machine learning problems can be solved without modifying the pattern search algorithm.
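The key point that frequent patterns need not be informative can be demonstrated on a toy dataset of our own (graphs reduced to the sets of candidate subgraph patterns they contain; gBoost itself selects such patterns inside a boosting loop using branch-and-bound over the DFS code tree, which is not reproduced here).

```python
# Six "graphs", each represented by its contained patterns, with class labels
data = [
    ({"A", "B"}, +1), ({"A", "B"}, +1), ({"A", "B"}, +1),
    ({"A"}, -1), ({"A"}, -1), ({"A", "C"}, -1),
]
patterns = ["A", "B", "C"]

def frequency(p):
    # support of a pattern across the whole database
    return sum(p in g for g, _ in data) / len(data)

def stump_accuracy(p):
    # accuracy of the better-signed rule "predict +1 iff pattern occurs"
    correct = sum((1 if p in g else -1) == y for g, y in data)
    return max(correct, len(data) - correct) / len(data)

most_frequent = max(patterns, key=frequency)        # "A": occurs everywhere
most_informative = max(patterns, key=stump_accuracy)  # "B": separates classes
```

Pattern "A" is maximally frequent yet useless for classification, while the rarer "B" is perfectly discriminative; exploiting the labels to prune the pattern search toward such informative patterns is exactly what lets gBoost outperform plain frequent-substructure mining.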