Header logo is


2009


no image
Influence of graph construction on graph-based clustering measures

Maier, M., von Luxburg, U., Hein, M.

In Advances in neural information processing systems 21, pages: 1025-1032, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Graph clustering methods such as spectral clustering are defined for general weighted graphs. In machine learning, however, data often is not given in form of a graph, but in terms of similarity (or distance) values between points. In this case, first a neighborhood graph is constructed using the similarities between the points and then a graph clustering algorithm is applied to this graph. In this paper we investigate the influence of the construction of the similarity graph on the clustering results. We first study the convergence of graph clustering criteria such as the normalized cut (Ncut) as the sample size tends to infinity. We find that the limit expressions are different for different types of graph, for example the r-neighborhood graph or the k-nearest neighbor graph. In plain words: Ncut on a kNN graph does something systematically different than Ncut on an r-neighborhood graph! This finding shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to. We also provide examples which show that these differences can be observed for toy and real data already for rather small sample sizes.

ei

PDF Web [BibTex]

2009


PDF Web [BibTex]


no image
Local Gaussian Process Regression for Real Time Online Model Learning and Control

Nguyen-Tuong, D., Seeger, M., Peters, J.

In Advances in neural information processing systems 21, pages: 1193-1200, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Learning in real-time applications, e.g., online approximation of the inverse dynamics model for model-based robot control, requires fast online regression techniques. Inspired by local learning, we propose a method to speed up standard Gaussian Process regression (GPR) with local GP models (LGP). The training data is partitioned in local regions, for each an individual GP model is trained. The prediction for a query point is performed by weighted estimation using nearby local models. Unlike other GP approximations, such as mixtures of experts, we use a distance based measure for partitioning of the data and weighted prediction. The proposed method achieves online learning and prediction in real-time. Comparisons with other nonparametric regression methods show that LGP has higher accuracy than LWPR and close to the performance of standard GPR and nu-SVR.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Detecting the Direction of Causal Time Series

Peters, J., Janzing, D., Gretton, A., Schölkopf, B.

In Proceedings of the 26th International Conference on Machine Learning, pages: 801-808, (Editors: A Danyluk and L Bottou and ML Littman), ACM Press, New York, NY, USA, ICML, June 2009 (inproceedings)

Abstract
We propose a method that detects the true direction of time series, by fitting an autoregressive moving average model to the data. Whenever the noise is independent of the previous samples for one ordering of the observations, but dependent for the opposite ordering, we infer the former direction to be the true one. We prove that our method works in the population case as long as the noise of the process is not normally distributed (for the latter case, the direction is not identificable). A new and important implication of our result is that it confirms a fundamental conjecture in causal reasoning - if after regression the noise is independent of signal for one direction and dependent for the other, then the former represents the true causal direction - in the case of time series. We test our approach on two types of data: simulated data sets conforming to our modeling assumptions, and real world EEG time series. Our method makes a decision for a significant fraction of both data sets, and these decisions are mostly correct. For real world data, our approach outperforms alternative solutions to the problem of time direction recovery.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Text Clustering with Mixture of von Mises-Fisher Distributions

Sra, S., Banerjee, A., Ghosh, J., Dhillon, I.

In Text mining: classification, clustering, and applications, pages: 121-161, Chapman & Hall/CRC data mining and knowledge discovery series, (Editors: Srivastava, A. N. and Sahami, M.), CRC Press, Boca Raton, FL, USA, June 2009 (inbook)

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer

Lampert, C.

IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), June 2009 (talk)

ei

Web [BibTex]

Web [BibTex]


no image
Learning Taxonomies by Dependence Maximization

Blaschko, M., Gretton, A.

In Advances in neural information processing systems 21, pages: 153-160, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously cluster data, and to learn a taxonomy that encodes the relationship between the clusters. The algorithms work by maximizing the dependence between the taxonomy and the original data. The resulting taxonomy is a more informative visualization of complex data than simple clustering; in addition, taking into account the relations between different clusters is shown to substantially improve the quality of the clustering, when compared with state-ofthe-art algorithms in the literature (both spectral clustering and a previous dependence maximization approach). We demonstrate our algorithm on image and text data.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Learning object-specific grasp affordance densities

Detry, R., Baseski, E., Popovic, M., Touati, Y., Krüger, N., Kroemer, O., Peters, J., Piater, J.

In 8th IEEE International Conference on Development and Learning, pages: 1-7, IEEE Service Center, Piscataway, NJ, USA, ICDL, June 2009 (inproceedings)

Abstract
This paper addresses the issue of learning and representing object grasp affordances, i.e. object-gripper relative configurations that lead to successful grasps. The purpose of grasp affordances is to organize and store the whole knowledge that an agent has about the grasping of an object, in order to facilitate reasoning on grasping solutions and their achievability. The affordance representation consists in a continuous probability density function defined on the 6D gripper pose space-3D position and orientation-, within an object-relative reference frame. Grasp affordances are initially learned from various sources, e.g. from imitation or from visual cues, leading to grasp hypothesis densities. Grasp densities are attached to a learned 3D visual object model, and pose estimation of the visual model allows a robotic agent to execute samples from a grasp hypothesis density under various object poses. Grasp outcomes are used to learn grasp empirical densities, i.e. grasps that have been confirmed through experience. We show the result of learning grasp hypothesis densities from both imitation and visual cues, and present grasp empirical densities learned from physical experience by a robot.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Understanding Brain Connectivity Patterns during Motor Imagery for Brain-Computer Interfacing

Grosse-Wentrup, M.

In Advances in neural information processing systems 21, pages: 561-568, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
EEG connectivity measures could provide a new type of feature space for inferring a subject‘s intention in Brain-Computer Interfaces (BCIs). However, very little is known on EEG connectivity patterns for BCIs. In this study, EEG connectivity during motor imagery (MI) of the left and right is investigated in a broad frequency range across the whole scalp by combining Beamforming with Transfer Entropy and taking into account possible volume conduction effects. Observed connectivity patterns indicate that modulation intentionally induced by MI is strongest in the gamma-band, i.e., above 35 Hz. Furthermore, modulation between MI and rest is found to be more pronounced than between MI of different hands. This is in contrast to results on MI obtained with bandpower features, and might provide an explanation for the so far only moderate success of connectivity features in BCIs. It is concluded that future studies on connectivity based BCIs should focus on high frequency bands and con side r ex peri mental paradigms that maximally vary cognitive demands between conditions.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Nonlinear causal discovery with additive noise models

Hoyer, P., Janzing, D., Mooij, J., Peters, J., Schölkopf, B.

In Advances in neural information processing systems 21, pages: 689-696, (Editors: D Koller and D Schuurmans and Y Bengio and L Bottou), Curran, Red Hook, NY, USA, 22nd Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
The discovery of causal relationships between a set of observed variables is a fundamental problem in science. For continuous-valued data linear acyclic causal models are often used because these models are well understood and there are well-known methods to fit them to data. In reality, of course, many causal relationships are more or less nonlinear, raising some doubts as to the applicability and usefulness of purely linear methods. In this contribution we show that in fact the basic linear framework can be generalized to nonlinear models with additive noise. In this extended framework, nonlinearities in the data-generating process are in fact a blessing rather than a curse, as they typically provide information on the underlying causal system and allow more aspects of the true data-generating mechanisms to be identified. In addition to theoretical results we show simulations and some simple real data experiments illustrating the identification power provided by nonlinearities.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Bounds on marginal probability distributions

Mooij, JM., Kappen, B.

In Advances in neural information processing systems 21, pages: 1105-1112, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
We propose a novel bound on single-variable marginal probability distributions in factor graphs with discrete variables. The bound is obtained by propagating local bounds (convex sets of probability distributions) over a subtree of the factor graph, rooted in the variable of interest. By construction, the method not only bounds the exact marginal probability distribution of a variable, but also its approximate Belief Propagation marginal ("belief"). Thus, apart from providing a practical means to calculate bounds on marginals, our contribution also lies in providing a better understanding of the error made by Belief Propagation. We show that our bound outperforms the state-of-the-art on some inference problems arising in medical diagnosis.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Convex variational Bayesian inference for large scale generalized linear models

Nickisch, H., Seeger, M.

In ICML 2009, pages: 761-768, (Editors: Danyluk, A. , L. Bottou, M. Littman), ACM Press, New York, NY, USA, 26th International Conference on Machine Learning, June 2009 (inproceedings)

Abstract
We show how variational Bayesian inference can be implemented for very large generalized linear models. Our relaxation is proven to be a convex problem for any log-concave model. We provide a generic double loop algorithm for solving this relaxation on models with arbitrary super-Gaussian potentials. By iteratively decoupling the criterion, most of the work can be done by solving large linear systems, rendering our algorithm orders of magnitude faster than previously proposed solvers for the same problem. We evaluate our method on problems of Bayesian active learning for large binary classification models, and show how to address settings with many candidates and sequential inclusion steps.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis

Schweikert, G., Widmer, C., Schölkopf, B., Rätsch, G.

In Advances in neural information processing systems 21, pages: 1433-1440, (Editors: D Koller and D Schuurmans and Y Bengio and L Bottou), Curran, Red Hook, NY, USA, 22nd Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
We study the problem of domain transfer for a supervised classification task in mRNA splicing. We consider a number of recent domain transfer methods from machine learning, including some that are novel, and evaluate them on genomic sequence data from model organisms of varying evolutionary distance. We find that in cases where the organisms are not closely related, the use of domain adaptation methods can help improve classification performance.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Diffeomorphic Dimensionality Reduction

Walder, C., Schölkopf, B.

In Advances in neural information processing systems 21, pages: 1713-1720, (Editors: D Koller and D Schuurmans and Y Bengio and L Bottou), Curran, Red Hook, NY, USA, 22nd Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
This paper introduces a new approach to constructing meaningful lower dimensional representations of sets of data points. We argue that constraining the mapping between the high and low dimensional spaces to be a diffeomorphism is a natural way of ensuring that pairwise distances are approximately preserved. Accordingly we develop an algorithm which diffeomorphically maps the data near to a lower dimensional subspace and then projects onto that subspace. The problem of solving for the mapping is transformed into one of solving for an Eulerian flow field which we compute using ideas from kernel methods. We demonstrate the efficacy of our approach on various real world data sets.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Combining appearance and motion for human action classification in videos

Dhillon, P., Nowozin, S., Lampert, C.

In 1st International Workshop on Visual Scene Understanding, pages: 22-29, IEEE Service Center, Piscataway, NJ, USA, ViSU, June 2009 (inproceedings)

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Let the Kernel Figure it Out: Principled Learning of Pre-processing for Kernel Classifiers

Gehler, P., Nowozin, S.

In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages: 2836-2843, IEEE Service Center, Piscataway, NJ, USA, CVPR, June 2009 (inproceedings)

Abstract
Most modern computer vision systems for high-level tasks, such as image classification, object recognition and segmentation, are based on learning algorithms that are able to separate discriminative information from noise. In practice, however, the typical system consists of a long pipeline of pre-processing steps, such as extraction of different kinds of features, various kinds of normalizations, feature selection, and quantization into aggregated representations such as histograms. Along this pipeline, there are many parameters to set and choices to make, and their effect on the overall system performance is a-priori unclear. In this work, we shorten the pipeline in a principled way. We move pre-processing steps into the learning system by means of kernel parameters, letting the learning algorithm decide upon suitable parameter values. Learning to optimize the pre-processing choices becomes learning the kernel parameters. We realize this paradigm by extending the recent Multiple Kernel Learning formulation from the finite case of having a fixed number of kernels which can be combined to the general infinite case where each possible parameter setting induces an associated kernel. We evaluate the new paradigm extensively on image classification and object classification tasks. We show that it is possible to learn optimal discriminative codebooks and optimal spatial pyramid schemes, consistently outperforming all previous state-of-the-art approaches.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Identifying confounders using additive noise models

Janzing, D., Peters, J., Mooij, J., Schölkopf, B.

In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, pages: 249-257, (Editors: J Bilmes and AY Ng), AUAI Press, Corvallis, OR, USA, UAI, June 2009 (inproceedings)

Abstract
We propose a method for inferring the existence of a latent common cause ("confounder") of two observed random variables. The method assumes that the two effects of the confounder are (possibly nonlinear) functions of the confounder plus independent, additive noise. We discuss under which conditions the model is identifiable (up to an arbitrary reparameterization of the confounder) from the joint distribution of the effects. We state and prove a theoretical result that provides evidence for the conjecture that the model is generically identifiable under suitable technical conditions. In addition, we propose a practical method to estimate the confounder from a finite i.i.d. sample of the effects and illustrate that the method works well on both simulated and real-world data.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Regression by dependence minimization and its application to causal inference in additive noise models

Mooij, J., Janzing, D., Peters, J., Schölkopf, B.

In Proceedings of the 26th International Conference on Machine Learning, pages: 745-752, (Editors: A Danyluk and L Bottou and M Littman), ACM Press, New York, NY, USA, ICML, June 2009 (inproceedings)

Abstract
Motivated by causal inference problems, we propose a novel method for regression that minimizes the statistical dependence between regressors and residuals. The key advantage of this approach to regression is that it does not assume a particular distribution of the noise, i.e., it is non-parametric with respect to the noise distribution. We argue that the proposed regression method is well suited to the task of causal inference in additive noise models. A practical disadvantage is that the resulting optimization problem is generally non-convex and can be difficult to solve. Nevertheless, we report good results on one of the tasks of the NIPS 2008 Causality Challenge, where the goal is to distinguish causes from effects in pairs of statistically dependent variables. In addition, we propose an algorithm for efficiently inferring causal models from observational data for more than two variables. The required number of regressions and independence tests is quadratic in the number of variables, which is a significant improvement over the simple method that tests all possible DAGs.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Fitted Q-iteration by Advantage Weighted Regression

Neumann, G., Peters, J.

In Advances in neural information processing systems 21, pages: 1177-1184, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Recently, fitted Q-iteration (FQI) based methods have become more popular due to their increased sample efficiency, a more stable learning process and the higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonly used for the policy improvement step is particularly problematic as it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using a soft-greedy action selection the policy improvement step used in FQI can be simplified to an inexpensive advantage-weighted regression. With this result, we are able to derive a new, computationally efficient FQI algorithm which can even deal with high dimensional action spaces.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Global Connectivity Potentials for Random Field Models

Nowozin, S., Lampert, C.

In CVPR 2009, pages: 818-825, IEEE Service Center, Piscataway, NJ, USA, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 2009 (inproceedings)

Abstract
Markov random field (MRF, CRF) models are popular in computer vision. However, in order to be computationally tractable they are limited to incorporate only local interactions and cannot model global properties, such as connectedness, which is a potentially useful high-level prior for object segmentation. In this work, we overcome this limitation by deriving a potential function that enforces the output labeling to be connected and that can naturally be used in the framework of recent MAP-MRF LP relaxations. Using techniques from polyhedral combinatorics, we show that a provably tight approximation to the MAP solution of the resulting MRF can still be found efficiently by solving a sequence of max-flow problems. The efficiency of the inference procedure also allows us to learn the parameters of a MRF with global connectivity potentials by means of a cutting plane algorithm. We experimentally evaluate our algorithm on both synthetic data and on the challenging segmentation task of the PASCAL VOC 2008 data set. We show that in both cases the addition of a connectedness prior significantly reduces the segmentation error.

ei

PDF PDF Web DOI [BibTex]

PDF PDF Web DOI [BibTex]


no image
Bayesian Experimental Design of Magnetic Resonance Imaging Sequences

Seeger, M., Nickisch, H., Pohmann, R., Schölkopf, B.

In Advances in neural information processing systems 21, pages: 1441-1448, (Editors: D Koller and D Schuurmans and Y Bengio and L Bottou), Curran, Red Hook, NY, USA, 22nd Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
We show how improved sequences for magnetic resonance imaging can be found through automated optimization of Bayesian design scores. Combining recent advances in approximate Bayesian inference and natural image statistics with high-performance numerical computation, we propose the first scalable Bayesian experimental design framework for this problem of high relevance to clinical and brain research. Our solution requires approximate inference for dense, non-Gaussian models on a scale seldom addressed before. We propose a novel scalable variational inference algorithm, and show how powerful methods of numerical mathematics can be modified to compute primitives in our framework. Our approach is evaluated on a realistic setup with raw data from a 3T MR scanner.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Characteristic Kernels on Groups and Semigroups

Fukumizu, K., Sriperumbudur, B., Gretton, A., Schölkopf, B.

In Advances in neural information processing systems 21, pages: 473-480, (Editors: D Koller and D Schuurmans and Y Bengio and L Bottou), Curran, Red Hook, NY, USA, 22nd Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Embeddings of random variables in reproducing kernel Hilbert spaces (RKHSs) may be used to conduct statistical inference based on higher order moments. For sufficiently rich (characteristic) RKHSs, each probability distribution has a unique embedding, allowing all statistical properties of the distribution to be taken into consideration. Necessary and sufficient conditions for an RKHS to be characteristic exist for Rn. In the present work, conditions are established for an RKHS to be characteristic on groups and semigroups. Illustrative examples are provided, including characteristic kernels on periodic domains, rotation matrices, and Rn+.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Policy Search for Motor Primitives in Robotics

Kober, J., Peters, J.

In Advances in neural information processing systems 21, pages: 849-856, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Many motor skills in humanoid robotics can be learned using parametrized motor primitives as done in imitation learning. However, most interesting motor learning problems are high-dimensional reinforcement learning problems often beyond the reach of current methods. In this paper, we extend previous work on policy learning from the immediate reward case to episodic reinforcement learning. We show that this results into a general, common framework also connected to policy gradient methods and yielding a novel algorithm for policy learning by assuming a form of exploration that is particularly well-suited for dynamic motor primitives. The resulting algorithm is an EM-inspired algorithm applicable in complex motor learning tasks. We compare this algorithm to alternative parametrized policy search methods and show that it outperforms previous methods. We apply it in the context of motor learning and show that it can learn a complex Ball-in-a-Cup task using a real Barrett WAM robot arm.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
The graphlet spectrum

Kondor, R., Shervashidze, N., Borgwardt, K.

In Proceedings of the 26th International Conference on Machine Learning (ICML 2009), pages: 529-536, (Editors: Danyluk, A. , L. Bottou, M. Littman), ACM Press, New York, NY, USA, 26th International Conference on Machine Learning (ICML), June 2009 (inproceedings)

Abstract
Current graph kernels suffer from two limitations: graph kernels based on counting particular types of subgraphs ignore the relative position of these subgraphs to each other, while graph kernels based on algebraic methods are limited to graphs without node labels. In this paper we present the graphlet spectrum, a system of graph invariants derived by means of group representation theory that capture information about the number as well as the position of labeled subgraphs in a given graph. In our experimental evaluation the graphlet spectrum outperforms state-of-the-art graph kernels.

ei

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Learning Similarity Measure for Multi-Modal 3D Image Registration

Lee, D., Hofmann, M., Steinke, F., Altun, Y., Cahill, N., Schölkopf, B.

In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages: 186-193, IEEE Service Center, Piscataway, NJ, USA, CVPR, June 2009 (inproceedings)

Abstract
Multi-modal image registration is a challenging problem in medical imaging. The goal is to align anatomically identical structures; however, their appearance in images acquired with different imaging devices, such as CT or MR, may be very different. Registration algorithms generally deform one image, the floating image, such that it matches with a second, the reference image, by maximizing some similarity score between the deformed and the reference image. Instead of using a universal, but a priori fixed similarity criterion such as mutual information, we propose learning a similarity measure in a discriminative manner such that the reference and correctly deformed floating images receive high similarity scores. To this end, we develop an algorithm derived from max-margin structured output learning, and employ the learned similarity measure within a standard rigid registration algorithm. Compared to other approaches, our method adapts to the specific registration problem at hand and exploits correlations between neighboring pixels in the reference and the floating image. Empirical evaluation on CT-MR/PET-MR rigid registration tasks demonstrates that our approach yields robust performance and outperforms the state of the art methods for multi-modal medical image registration.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Learning Complex Motions by Sequencing Simpler Motion Templates

Neumann, G., Maass, W., Peters, J.

In ICML 2009, pages: 753-760, (Editors: Danyluk, A. , L. Bottou, M. Littman), ACM Press, New York, NY, USA, 26th International Conference on Machine Learning, June 2009 (inproceedings)

Abstract
Abstraction of complex, longer motor tasks into simpler elemental movements enables humans and animals to exhibit motor skills which have not yet been matched by robots. Humans intuitively decompose complex motions into smaller, simpler segments. For example when describing simple movements like drawing a triangle with a pen, we can easily name the basic steps of this movement. Surprisingly, such abstractions have rarely been used in artificial motor skill learning algorithms. These algorithms typically choose a new action (such as a torque or a force) at a very fast time-scale. As a result, both policy and temporal credit assignment problem become unnecessarily complex - often beyond the reach of current machine learning methods. We introduce a new framework for temporal abstractions in reinforcement learning (RL), i.e. RL with motion templates. We present a new algorithm for this framework which can learn high-quality policies by making only few abstract decisions.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Solution Stability in Linear Programming Relaxations: Graph Partitioning and Unsupervised Learning

Nowozin, S., Jegelka, S.

In ICML 2009, pages: 769-776, (Editors: Danyluk, A. , L. Bottou, M. Littman), ACM Press, New York, NY, USA, 26th International Conference on Machine Learning, June 2009 (inproceedings)

Abstract
We propose a new method to quantify the solution stability of a large class of combinatorial optimization problems arising in machine learning. As practical example we apply the method to correlation clustering, clustering aggregation, modularity clustering, and relative performance significance clustering. Our method is extensively motivated by the idea of linear programming relaxations. We prove that when a relaxation is used to solve the original clustering problem, then the solution stability calculated by our method is conservative, that is, it never overestimates the solution stability of the true, unrelaxed problem. We also demonstrate how our method can be used to compute the entire path of optimal solutions as the optimization problem is increasingly perturbed. Experimentally, our method is shown to perform well on a number of benchmark problems.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Kernel Measures of Independence for Non-IID Data

Zhang, X., Song, L., Gretton, A., Smola, A.

In Advances in neural information processing systems 21, pages: 1937-1944, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Many machine learning algorithms can be formulated in the framework of statistical independence such as the Hilbert Schmidt Independence Criterion. In this paper, we extend this criterion to deal with structured and interdependent observations. This is achieved by modeling the structures using undirected graphical models and comparing the Hilbert space embeddings of distributions. We apply this new criterion to independent component analysis and sequence clustering.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Using Bayesian Dynamical Systems for Motion Template Libraries

Chiappa, S., Kober, J., Peters, J.

In Advances in neural information processing systems 21, pages: 297-304, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Motor primitives or motion templates have become an important concept for both modeling human motor control as well as generating robot behaviors using imitation learning. Recent impressive results range from humanoid robot movement generation to timing models of human motions. The automatic generation of skill libraries containing multiple motion templates is an important step in robot learning. Such a skill learning system needs to cluster similar movements together and represent each resulting motion template as a generative model which is subsequently used for the execution of the behavior by a robot system. In this paper, we show how human trajectories captured as multidimensional time-series can be clustered using Bayesian mixtures of linear Gaussian state-space models based on the similarity of their dynamics. The appropriate number of templates is automatically determined by enforcing a parsimonious parametrization. As the resulting model is intractable, we introduce a novel approximation method based on variational Bayes, which is especially designed to enable the use of efficient inference algorithms. On recorded human Balero movements, this method is not only capable of finding reasonable motion templates but also yields a generative model which works well in the execution of this complex task on a simulated anthropomorphic SARCOS arm.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Multi-way set enumeration in real-valued tensors

Georgii, E., Tsuda, K., Schölkopf, B.

In Proceedings of the 2nd Workshop on Data Mining using Matrices and Tensors (DMMT 2009), pages: 32-41, (Editors: C Ding and T Li), ACM Press, New York, NY, USA, 2nd Workshop on Data Mining using Matrices and Tensors (DMMT/KDD), June 2009 (inproceedings)

Abstract
The analysis of n-ary relations receives attention in many different fields, for instance biology, web mining, and social studies. In the basic setting, there are n sets of instances, and each observation associates n instances, one from each set. A common approach to explore these n-way data is the search for n-set patterns. An n-set pattern consists of specific subsets of the n instance sets such that all possible n- ary associations between the corresponding instances are observed. This provides a higher-level view of the data, revealing associative relationships between groups of instances. Here, we generalize this approach in two respects. First, we tolerate missing observations to a certain degree, that means we are also interested in n-sets where most (although not all) of the possible combinations have been recorded in the data. Second, we take association weights into account. More precisely, we propose a method to enumerate all n- sets that satisfy a minimum threshold with respect to the average association weight. Non-observed associations obtain by default a weight of zero. Technically, we solve the enumeration task using a reverse search strategy, which allows for effective pruning of the search space. In addition, our algorithm provides a ranking of the solutions and can consider further constraints. We show experimental results on artificial and real-world data sets from different domains.

ei

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Non-parametric Regression between Riemannian Manifolds

Steinke, F., Hein, M.

In Advances in neural information processing systems 21, pages: 1561-1568, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
This paper discusses non-parametric regression between Riemannian manifolds. This learning problem arises frequently in many application areas ranging from signal processing, computer vision, over robotics to computer graphics. We present a new algorithmic scheme for the solution of this general learning problem based on regularized empirical risk minimization. The regularization functional takes into account the geometry of input and output manifold, and we show that it implements a prior which is particularly natural. Moreover, we demonstrate that our algorithm performs well in a difficult surface registration problem.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Near-optimal supervised feature selection among frequent subgraphs

Thoma, M., Cheng, H., Gretton, A., Han, J., Kriegel, H., Smola, A., Song, L., Yu, P., Yan, X., Borgwardt, K.

In Proccedings of the 2009 SIAM Conference on Data Mining (SDM 2009), pages: 1076-1087, (Editors: Park, H. , S. Parthasarathy, H. Liu), Philadelphia, PA, USA, Society for Industrial and Applied Mathematics, 9th SIAM Conference on Data Mining (SDM), May 2009 (inproceedings)

Abstract
Graph classification is an increasingly important step in numerous application domains, such as function prediction of molecules and proteins, computerised scene analysis, and anomaly detection in program flows. Among the various approaches proposed in the literature, graph classification based on frequent subgraphs is a popular branch: Graphs are represented as (usually binary) vectors, with components indicating whether a graph contains a particular subgraph that is frequent across the dataset. On large graphs, however, one faces the enormous problem that the number of these frequent subgraphs may grow exponentially with the size of the graphs, but only few of them possess enough discriminative power to make them useful for graph classification. Efficient and discriminative feature selection among frequent subgraphs is hence a key challenge for graph mining. In this article, we propose an approach to feature selection on frequent subgraphs, called CORK, that combines two central advantages. First, it optimizes a submodular quality criterion, which means that we can yield a near-optimal solution using greedy feature selection. Second, our submodular quality function criterion can be integrated into gSpan, the state-of-the-art tool for frequent subgraph mining, and help to prune the search space for discriminative frequent subgraphs even during frequent subgraph mining.

ei

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Semi-supervised subspace analysis of human functional magnetic resonance imaging data

Shelton, J., Blaschko, M., Bartels, A.

(185), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, May 2009 (techreport)

Abstract
Kernel Canonical Correlation Analysis is a very general technique for subspace learning that incorporates PCA and LDA as special cases. Functional magnetic resonance imaging (fMRI) acquired data is naturally amenable to these techniques as data are well aligned. fMRI data of the human brain is a particularly interesting candidate. In this study we implemented various supervised and semi-supervised versions of KCCA on human fMRI data, with regression to single- and multi-variate labels (corresponding to video content subjects viewed during the image acquisition). In each variate condition, the semi-supervised variants of KCCA performed better than the supervised variants, including a supervised variant with Laplacian regularization. We additionally analyze the weights learned by the regression in order to infer brain regions that are important to different types of visual processing.

ei

PDF [BibTex]

PDF [BibTex]


no image
Center-surround patterns emerge as optimal predictors for human saccade targets

Kienzle, W., Franz, M., Schölkopf, B., Wichmann, F.

Journal of Vision, 9(5:7):1-15, May 2009 (article)

Abstract
The human visual system is foveated, that is, outside the central visual field resolution and acuity drop rapidly. Nonetheless much of a visual scene is perceived after only a few saccadic eye movements, suggesting an effective strategy for selecting saccade targets. It has been known for some time that local image structure at saccade targets influences the selection process. However, the question of what the most relevant visual features are is still under debate. Here we show that center-surround patterns emerge as the optimal solution for predicting saccade targets from their local image structure. The resulting model, a one-layer feed-forward network, is surprisingly simple compared to previously suggested models which assume much more complex computations such as multi-scale processing and multiple feature channels. Nevertheless, our model is equally predictive. Furthermore, our findings are consistent with neurophysiological hardware in the superior colliculus. Bottom-up visual saliency may thus not be computed cortically as has been thought previously.

ei

PDF DOI [BibTex]


no image
Link Propagation: A Fast Semi-supervised Learning Algorithm for Link Prediction

Kashima, H., Kato, T., Yamanishi, Y., Sugiyama, M., Tsuda, K.

In Proceedings of the 2009 SIAM International Conference on Data Mining, pages: 1099-1110, (Editors: Park, H. , S. Parthasarathy, H. Liu), Philadelphia, PA, USA, Society for Industrial and Applied Mathematics, SDM, May 2009 (inproceedings)

Abstract
We propose Link Propagation as a new semi-supervised learning method for link prediction problems, where the task is to predict unknown parts of the network structure by using auxiliary information such as node similarities. Since the proposed method can fill in missing parts of tensors, it is applicable to multi-relational domains, allowing us to handle multiple types of links simultaneously. We also give a novel efficient algorithm for Link Propagation based on an accelerated conjugate gradient method.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Data Mining for Biologists

Tsuda, K.

In Biological Data Mining in Protein Interaction Networks, pages: 14-27, (Editors: Li, X. and Ng, S.-K.), Medical Information Science Reference, Hershey, PA, USA, May 2009 (inbook)

Abstract
In this tutorial chapter, we review basics about frequent pattern mining algorithms, including itemset mining, association rule mining and graph mining. These algorithms can find frequently appearing substructures in discrete data. They can discover structural motifs, for example, from mutation data, protein structures and chemical compounds. As they have been primarily used for business data, biological applications are not so common yet, but their potential impact would be large. Recent advances in computers including multicore machines and ever increasing memory capacity support the application of such methods to larger datasets. We explain technical aspects of the algorithms, but do not go into details. Current biological applications are summarized and possible future directions are given.

ei

Web [BibTex]

Web [BibTex]


no image
Learning motor primitives for robotics

Kober, J., Peters, J.

In Proceedings of the 2009 IEEE International Conference on Robotics and Automation (ICRA 2009), pages: 2112-2118, IEEE Service Center, Piscataway, NJ, USA, IEEE International Conference on Robotics and Automation (ICRA '09), May 2009 (inproceedings)

Abstract
The acquisition and self-improvement of novel motor skills is among the most important problems in robotics. Motor primitives offer one of the most promising frameworks for the application of machine learning techniques in this context. Employing an improved form of the dynamic systems motor primitives originally introduced by Ijspeert et al. [2], we show how both discrete and rhythmic tasks can be learned using a concerted approach of both imitation and reinforcement learning. For doing so, we present both learning algorithms and representations targeted for the practical application in robotics. Furthermore, we show that it is possible to include a start-up phase in rhythmic primitives. We show that two new motor skills, i.e., Ball-in-a-Cup and Ball-Paddling, can be learned on a real Barrett WAM robot arm at a pace similar to human learning while achieving a significantly more reliable final performance.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
A robust Bayesian two-sample test for detecting intervals of differential gene expression in microarray time series

Stegle, O., Denby, K., Wild, DL., Ghahramani, Z., Borgwardt, KM.

In Research in Computational Molecular Biology, pages: 201-216, (Editors: Batzoglou, S. ), Springer, Berlin, Germany, 13th Annual International Conference on Research in Computational Molecular Biology (RECOMB), May 2009 (inproceedings)

Abstract
Understanding the regulatory mechanisms that are responsible for an organism’s response to environmental changes is an important question in molecular biology. A first and important step towards this goal is to detect genes whose expression levels are affected by altered external conditions. A range of methods to test for differential gene expression, both in static as well as in time-course experiments, have been proposed. While these tests answer the question whether a gene is differentially expressed, they do not explicitly address the question when a gene is differentially expressed, although this information may provide insights into the course and causal structure of regulatory programs. In this article, we propose a two-sample test for identifying intervals of differential gene expression in microarray time series. Our approach is based on Gaussian process regression, can deal with arbitrary numbers of replicates and is robust with respect to outliers. We apply our algorithm to study the response of Arabidopsis thaliana genes to an infection by a fungal pathogen using a microarray time series dataset covering 30,336 gene probes at 24 time points. In classification experiments our test compares favorably with existing methods and provides additional insights into time-dependent differential expression.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
A Bayesian Approach to Graph Regression with Relevant Subgraph Selection

Chiappa, S., Saigo, H., Tsuda, K.

In SIAM International Conference on Data Mining, pages: 295-304, (Editors: Park, H. , S. Parthasarathy, H. Liu), Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, SDM, May 2009 (inproceedings)

Abstract
Many real-world applications with graph data require the efficient solution of a given regression task as well as the identification of the subgraphs which are relevant for the task. In these cases graphs are commonly represented as binary vectors of indicators of subgraphs, giving rise to an intractable input dimensionality. An efficient solution to this problem was recently proposed by a Lasso-type method where the objective function optimization over an intractable number of variables is reformulated as a dual mathematical programming problem over a small number of variables but a large number of constraints. The dual problem is then solved by column generation where the subgraphs corresponding to the most violated constraints are found by weighted subgraph mining. This paper proposes an extension of this method to a fully Bayesian approach which defines a prior distribution on the parameters and integrate them out from the model, thus providing a posterior distribution on the target variable as opposed to a single estimate. The advantage of this approach is that the extra information given by the target posterior distribution can be used for improving the model in several ways. In this paper, we use the target posterior variance as a measure of uncertainty in the prediction and show that, by rejecting unconfident predictions, we can improve state-of-the-art performance on several molecular graph datasets.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Efficient data reuse in value function approximation

Hachiya, H., Akiyama, T., Sugiyama, M., Peters, J.

In IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning, pages: 8-15, IEEE Service Center, Piscataway, NJ, USA, IEEE ADPRL, May 2009 (inproceedings)

Abstract
Off-policy reinforcement learning is aimed at efficiently using data samples gathered from a policy that is different from the currently optimized policy. A common approach is to use importance sampling techniques for compensating for the bias of value function estimators caused by the difference between the data-sampling policy and the target policy. However, existing off-policy methods often do not take the variance of the value function estimators explicitly into account and therefore their performance tends to be unstable. To cope with this problem, we propose using an adaptive importance sampling technique which allows us to actively control the trade-off between bias and variance. We further provide a method for optimally determining the trade-off parameter based on a variant of cross-validation. The usefulness of the proposed approach is demonstrated through simulated swing-up inverted-pendulum problem.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Influence of Different Assignment Conditions on the Determination of Symmetric Homo-dimeric Structures with ARIA

Bardiaux, B., Bernard, A., Rieping, W., Habeck, M., Malliavin, TE., Nilges, M.

Proteins, 75(3):569-585, May 2009 (article)

Abstract
The ambiguous restraint for iterative assignment (ARIA) approach for NMR structure calculation is evaluated for symmetric homodimeric proteins by assessing the effect of several data analysis and assignment methods on the structure quality. In particular, we study the effects of network anchoring and spin-diffusion correction. The spin-diffusion correction improves the protein structure quality systematically, whereas network anchoring enhances the assignment efficiency by speeding up the convergence and coping with highly ambiguous data. For some homodimeric folds, network anchoring has been proved essential for unraveling both chain and proton assignment ambiguities.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Using reward-weighted imitation for robot Reinforcement Learning

Peters, J., Kober, J.

In IEEE ADPRL 2009, pages: 226-232, IEEE Service Center, Piscataway, NJ, USA, 2009 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning, May 2009 (inproceedings)

Abstract
Reinforcement Learning is an essential ability for robots to learn new motor skills. Nevertheless, few methods scale into the domain of anthropomorphic robotics. In order to improve in terms of efficiency, the problem is reduced onto reward-weighted imitation. By doing so, we are able to generate a framework for policy learning which both unifies previous reinforcement learning approaches and allows the derivation of novel algorithms. We show our two most relevant applications both for motor primitive learning (e.g., a complex Ball-in-a-Cup task using a real Barrett WAM robot arm) and learning task-space control.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Denoising photographs using dark frames optimized by quadratic programming

Gomez Rodriguez, M., Kober, J., Schölkopf, B.

In Proceedings of the First IEEE International Conference on Computational Photography (ICCP 2009), pages: 1-9, IEEE, Piscataway, NJ, USA, First IEEE International Conference on Computational Photography (ICCP), April 2009 (inproceedings)

Abstract
Photographs taken with long exposure or high ISO setting may contain substantial amounts of noise, drastically reducing the Signal-To-Noise Ratio (SNR). This paper presents a novel optimization approach for denoising. It is based on a library of dark frames previously taken under varying conditions of temperature, ISO setting and exposure time, and a quality measure or prior for the class of images to denoise. The method automatically computes a synthetic dark frame that, when subtracted from an image, optimizes the quality measure. For specific choices of the quality measure, the denoising problem reduces to a quadratic programming (QP) problem that can be solved efficiently. We show experimentally that it is sufficient to consider a limited subsample of pixels when evaluating the quality measure in the optimization, in which case the complexity of the procedure does not depend on the size of the images but only on the number of dark frames. We provide quantitative experimental results showing that our method automatically computes dark frames that are competitive with those taken under idealized conditions (controlled temperature, ISO setting, exposure time, and averaging of multiple exposures). We provide application examples in astronomical image denoising. The method is validated on two CMOS SLRs.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
On Pairwise Kernels: An Efficient Alternative and Generalization Analysis

Kashima, H., Oyama, S., Yamanishi, Y., Tsuda, K.

In Advances in Knowledge Discovery and Data Mining: 13th Pacific-Asia Conference, pages: 1030-1037, (Editors: Theeramunkong, T. , B. Kijsirikul, N. Cercone, T. B. Ho), Springer, Berlin, Germany, PAKDD, April 2009 (inproceedings)

Abstract
Pairwise classification has many applications including network prediction, entity resolution, and collaborative filtering. The pairwise kernel has been proposed for those purposes by several research groups independently, and become successful in various fields. In this paper, we propose an efficient alternative which we call Cartesian kernel. While the existing pairwise kernel (which we refer to as Kronecker kernel) can be interpreted as the weighted adjacency matrix of the Kronecker product graph of two graphs, the Cartesian kernel can be interpreted as that of the Cartesian graph which is more sparse than the Kronecker product graph. Experimental results show the Cartesian kernel is much faster than the existing pairwise kernel, and at the same time, competitive with the existing pairwise kernel in predictive performance.We discuss the generalization bounds by the two pairwise kernels by using eigenvalue analysis of the kernel matrices.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Convex Perturbations for Scalable Semidefinite Programming

Kulis, B., Sra, S., Dhillon, I.

In JMLR Workshop and Conference Proceedings Volume 5: AISTATS 2009, pages: 296-303, (Editors: van Dyk, D. , M. Welling), MIT Press, Cambridge, MA, USA, Twelfth International Conference on Artificial Intelligence and Statistics, April 2009 (inproceedings)

Abstract
Many important machine learning problems are modeled and solved via semidefinite programs; examples include metric learning, nonlinear embedding, and certain clustering problems. Often, off-the-shelf software is invoked for the associated optimization, which can be inappropriate due to excessive computational and storage requirements. In this paper, we introduce the use of convex perturbations for solving semidefinite programs (SDPs), and for a specific perturbation we derive an algorithm that has several advantages over existing techniques: a) it is simple, requiring only a few lines of Matlab, b) it is a first-order method, and thereby scalable, and c) it can easily exploit the structure of a given SDP (e.g., when the constraint matrices are low-rank, a situation common to several machine learning SDPs). A pleasant byproduct of our method is a fast, kernelized version of the large-margin nearest neighbor metric learning algorithm. We demonstrate that our algorithm is effective in finding fast approximations to large-scale SDPs arising in some machine learning applications.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Block Jacobi-type methods for non-orthogonal joint diagonalisation

Shen, H., Hüper, K.

In ICASSP09, pages: 3285-3288, IEEE Service Center, Piscataway, NJ, USA, 34th International Conference on Acoustics, Speech, and Signal Processing, April 2009 (inproceedings)

Abstract
In this paper, we study the problem of non-orthogonal joint diagonalisation of a set of real symmetric matrices via simultaneous conjugation. A family of block Jacobi-type methods are proposed to optimise two popular cost functions for the non-orthogonal joint diagonalisation, namely, the off-norm function and the log-likelihood function. By exploiting the appropriate underlying manifold, namely the so-called oblique manifold, rigorous analysis shows that, under the exact non-orthogonal joint diagonalisation setting, the proposed methods converge locally quadratically fast to a joint diagonaliser. Finally, performance of our methods is investigated by numerical experiments for both exact and approximate non-orthogonal joint diagonalisation.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Beamforming in Noninvasive Brain-Computer Interfaces

Grosse-Wentrup, M., Liefhold, C., Gramann, K., Buss, M.

IEEE Transactions on Biomedical Engineering, 56(4):1209-1219, April 2009 (article)

Abstract
Spatial filtering (SF) constitutes an integral part of building EEG-based brain–computer interfaces (BCIs). Algorithms frequently used for SF, such as common spatial patterns (CSPs) and independent component analysis, require labeled training data for identifying filters that provide information on a subject‘s intention, which renders these algorithms susceptible to overfitting on artifactual EEG components. In this study, beamforming is employed to construct spatial filters that extract EEG sources originating within predefined regions of interest within the brain. In this way, neurophysiological knowledge on which brain regions are relevant for a certain experimental paradigm can be utilized to construct unsupervised spatial filters that are robust against artifactual EEG components. Beamforming is experimentally compared with CSP and Laplacian spatial filtering (LP) in a two-class motor-imagery paradigm. It is demonstrated that beamforming outperforms CSP and LP on noisy datasets, while CSP and beamforming perform almost equally well on datasets with few artifactual trials. It is concluded that beamforming constitutes an alternative method for SF that might be particularly useful for BCIs used in clinical settings, i.e., in an environment where artifact-free datasets are difficult to obtain.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Constructing Sparse Kernel Machines Using Attractors

Lee, D., Jung, K., Lee, J.

IEEE Transactions on Neural Networks, 20(4):721-729, April 2009 (article)

Abstract
In this brief, a novel method that constructs a sparse kernel machine is proposed. The proposed method generates attractors as sparse solutions from a built-in kernel machine via a dynamical system framework. By readjusting the corresponding coefficients and bias terms, a sparse kernel machine that approximates a conventional kernel machine is constructed. The simulation results show that the constructed sparse kernel machine improves the efficiency of testing phase while maintaining comparable test error.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Optimal construction of k-nearest-neighbor graphs for identifying noisy clusters

Maier, M., Hein, M., von Luxburg, U.

Theoretical Computer Science, 410(19):1749-1764, April 2009 (article)

Abstract
We study clustering algorithms based on neighborhood graphs on a random sample of data points. The question we ask is how such a graph should be constructed in order to obtain optimal clustering results. Which type of neighborhood graph should one choose, mutual k-nearest-neighbor or symmetric k-nearest-neighbor? What is the optimal parameter k? In our setting, clusters are defined as connected components of the t-level set of the underlying probability distribution. Clusters are said to be identified in the neighborhood graph if connected components in the graph correspond to the true underlying clusters. Using techniques from random geometric graph theory, we prove bounds on the probability that clusters are identified successfully, both in a noise-free and in a noisy setting. Those bounds lead to several conclusions. First, k has to be chosen surprisingly high (rather of the order n than of the order logn) to maximize the probability of cluster identification. Secondly, the major difference between the mutual and the symmetric k-nearest-neighbor graph occurs when one attempts to detect the most significant cluster only.

ei

PDF PDF DOI [BibTex]


no image
An Expectation Maximization Algorithm for Continuous Markov Decision Processes with Arbitrary Reward

Hoffman, M., Freitas, N., Doucet, A., Peters, J.

In JMLR Workshop and Conference Proceedings Volume 5: AISTATS 2009, pages: 232-239, (Editors: van Dyk, D. , M. Welling), MIT Press, Cambridge, MA, USA, Twelfth International Conference on Artificial Intelligence and Statistics, April 2009 (inproceedings)

Abstract
We derive a new expectation maximization algorithm for policy optimization in linear Gaussian Markov decision processes, where the reward function is parameterised in terms of a flexible mixture of Gaussians. This approach exploits both analytical tractability and numerical optimization. Consequently, on the one hand, it is more flexible and general than closed-form solutions, such as the widely used linear quadratic Gaussian (LQG) controllers. On the other hand, it is more accurate and faster than optimization methods that rely on approximation and simulation. Partial analytical solutions (though costly) eliminate the need for simulation and, hence, avoid approximation error. The experiments will show that for the same cost of computation, policy optimization methods that rely on analytical tractability have higher value than the ones that rely on simulation.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Efficient Graphlet Kernels for Large Graph Comparison

Shervashidze, N., Vishwanathan, S., Petri, T., Mehlhorn, K., Borgwardt, K.

In JMLR Workshop and Conference Proceedings Volume 5: AISTATS 2009, pages: 488-495, (Editors: Van Dyk, D. , M. Welling), MIT Press, Cambridge, MA, USA, Twelfth International Conference on Artificial Intelligence and Statistics, April 2009 (inproceedings)

Abstract
State-of-the-art graph kernels do not scale to large graphs with hundreds of nodes and thousands of edges. In this article we propose to compare graphs by counting {it graphlets}, ie subgraphs with $k$ nodes where $k in { 3, 4, 5 }$. Exhaustive enumeration of all graphlets being prohibitively expensive, we introduce two theoretically grounded speedup schemes, one based on sampling and the second one specifically designed for bounded degree graphs. In our experimental evaluation, our novel kernels allow us to efficiently compare large graphs that cannot be tackled by existing graph kernels.

ei

PDF Web [BibTex]

PDF Web [BibTex]