2005

Implicit Wiener series for higher-order image analysis

Franz, M., Schölkopf, B.

In Advances in Neural Information Processing Systems 17, pages: 465-472, (Editors: LK Saul and Y Weiss and L Bottou), MIT Press, Cambridge, MA, USA, 18th Annual Conference on Neural Information Processing Systems (NIPS), July 2005 (inproceedings)

Abstract
The computation of classical higher-order statistics such as higher-order moments or spectra is difficult for images due to the huge number of terms to be estimated and interpreted. We propose an alternative approach in which multiplicative pixel interactions are described by a series of Wiener functionals. Since the functionals are estimated implicitly via polynomial kernels, the combinatorial explosion associated with the classical higher-order statistics is avoided. First results show that image structures such as lines or corners can be predicted correctly, and that pixel interactions up to the order of five play an important role in natural images.

ei

PDF Web [BibTex]
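
The abstract's key point is that a polynomial kernel implicitly covers every product of input variables up to a given order, so the huge set of monomials never has to be enumerated. A minimal Python sketch of that equivalence follows. It assumes the standard inhomogeneous polynomial kernel (1 + <x, y>)^p; the kernel used in the paper may be parameterized differently, and the 16x16 patch size in the last line is a hypothetical example.

```python
import math

def poly_kernel(x, y, degree):
    """Inhomogeneous polynomial kernel (1 + <x, y>)^degree."""
    return (1.0 + sum(a * b for a, b in zip(x, y))) ** degree

def explicit_features_deg2(x):
    """Explicit degree-2 feature map for 2-D input: constant, linear terms and
    all pairwise products, scaled so that <phi(x), phi(y)> = (1 + <x, y>)^2."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]

def n_monomials(d, p):
    """Number of monomials of total degree <= p in d variables, i.e. the
    dimension of the explicit feature space that the kernel never builds."""
    return math.comb(d + p, p)

x, y = (0.3, -1.2), (2.0, 1.5)
implicit = poly_kernel(x, y, 2)
explicit = sum(a * b for a, b in zip(explicit_features_deg2(x), explicit_features_deg2(y)))
print(implicit, explicit)
# Hypothetical 16x16 image patch with pixel interactions up to order 5
print(n_monomials(16 * 16, 5))
```

Even for this small patch the explicit feature space has roughly 9.7 billion dimensions, while each kernel evaluation costs only one inner product.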

Limits of Spectral Clustering

von Luxburg, U., Bousquet, O., Belkin, M.

In Advances in Neural Information Processing Systems 17, pages: 857-864, (Editors: Saul, L. K., Y. Weiss, L. Bottou), MIT Press, Cambridge, MA, USA, Eighteenth Annual Conference on Neural Information Processing Systems (NIPS), July 2005 (inproceedings)

Abstract
An important aspect of clustering algorithms is whether the partitions constructed on finite samples converge to a useful clustering of the whole data space as the sample size increases. This paper investigates this question for normalized and unnormalized versions of the popular spectral clustering algorithm. Surprisingly, the convergence of unnormalized spectral clustering is more difficult to handle than the normalized case. Although some first results on the convergence of normalized spectral clustering have recently been obtained, the unnormalized case requires a completely new approach combining tools from numerical integration, spectral and perturbation theory, and probability. It turns out that while normalized spectral clustering usually converges to a nice partition of the data space, unnormalized spectral clustering does so only under strong additional assumptions which are not always satisfied. We conclude that our analysis gives strong evidence for the superiority of normalized spectral clustering. It also provides a basis for future exploration of other Laplacian-based methods.

ei

PDF Web [BibTex]
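
Both variants compared in the paper start from a graph Laplacian of the similarity matrix; they differ in which Laplacian's eigenvectors are used. A minimal Python sketch of the two matrices, assuming Gaussian similarities and the common normalization L_sym = I - D^{-1/2} W D^{-1/2} (the paper's normalized operator may be written in another but related form). The point set and the helper names are illustrative.

```python
import math

def laplacians(W):
    """Return (L, L_sym, degrees) for a symmetric, non-negative affinity matrix W.

    L     = D - W                    (unnormalized graph Laplacian)
    L_sym = I - D^-1/2 W D^-1/2      (normalized graph Laplacian)
    Every vertex is assumed to have positive degree.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    L = [[(deg[i] if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]
    L_sym = [[(1.0 if i == j else 0.0) - W[i][j] / math.sqrt(deg[i] * deg[j])
              for j in range(n)] for i in range(n)]
    return L, L_sym, deg

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

# Gaussian affinities on five 1-D points forming two well-separated groups
points = [0.0, 0.1, 0.2, 5.0, 5.1]
W = [[0.0 if i == j else math.exp(-(a - b) ** 2) for j, b in enumerate(points)]
     for i, a in enumerate(points)]
L, L_sym, deg = laplacians(W)

# The constant vector lies in the null space of L; D^{1/2} 1 lies in that of L_sym.
print(matvec(L, [1.0] * len(points)))
print(matvec(L_sym, [math.sqrt(d) for d in deg]))
```

The differing null spaces are one visible sign that the two operators rescale vertices differently, which is where their limit behavior can diverge.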

Semi-supervised Learning on Directed Graphs

Zhou, D., Schölkopf, B., Hofmann, T.

In Advances in Neural Information Processing Systems 17, pages: 1633-1640, (Editors: LK Saul and Y Weiss and L Bottou), MIT Press, Cambridge, MA, USA, 18th Annual Conference on Neural Information Processing Systems (NIPS), July 2005 (inproceedings)

Abstract
Given a directed graph in which some of the nodes are labeled, we investigate the question of how to exploit the link structure of the graph to infer the labels of the remaining unlabeled nodes. To that end, we propose a regularization framework for functions defined over nodes of a directed graph that forces the classification function to change slowly on densely linked subgraphs. A powerful yet computationally simple classification algorithm is derived within the proposed framework. The experimental evaluation on real-world Web classification problems demonstrates encouraging results that validate our approach.

ei

PDF Web [BibTex]
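
A minimal Python sketch of this kind of directed-graph regularization follows. The construction is an assumption, not taken from the abstract: a teleporting random walk with out-link probability eta, its stationary distribution pi, the symmetrized matrix Theta = (Pi^{1/2} P Pi^{-1/2} + Pi^{-1/2} P^T Pi^{1/2}) / 2, and scores f = (I - alpha Theta)^{-1} y, approximated here by fixed-point iteration. The names, eta, alpha and the toy graph are illustrative choices.

```python
import math

def stationary_distribution(P, iters=2000):
    """Power iteration pi <- pi P for a row-stochastic matrix P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def directed_ssl(W, y, alpha=0.9, eta=0.99, iters=500):
    """Score nodes of a directed graph W (W[i][j] > 0 means an edge i -> j).
    y holds +1 / -1 for labeled nodes and 0 for unlabeled ones."""
    n = len(W)
    # Teleporting random walk: follow an out-link with probability eta,
    # otherwise (or at a node without out-links) jump to a uniformly random node.
    P = []
    for i in range(n):
        d = sum(W[i])
        P.append([eta * (W[i][j] / d if d > 0 else 1.0 / n) + (1.0 - eta) / n
                  for j in range(n)])
    s = [math.sqrt(p) for p in stationary_distribution(P)]
    # Theta = (Pi^1/2 P Pi^-1/2 + Pi^-1/2 P^T Pi^1/2) / 2, a symmetric matrix
    theta = [[0.5 * (s[i] * P[i][j] / s[j] + s[j] * P[j][i] / s[i]) for j in range(n)]
             for i in range(n)]
    # f <- alpha * Theta f + y converges to (I - alpha * Theta)^-1 y because
    # Theta has spectral norm at most 1 and alpha < 1.
    f = list(y)
    for _ in range(iters):
        f = [alpha * sum(theta[i][j] * f[j] for j in range(n)) + y[i] for i in range(n)]
    return f

# Two directed 3-cycles (0->1->2->0 and 5->4->3->5) joined by edges 2->3 and 3->2
W = [[0, 1, 0, 0, 0, 0],
     [0, 0, 1, 0, 0, 0],
     [1, 0, 0, 1, 0, 0],
     [0, 0, 1, 0, 0, 1],
     [0, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 1, 0]]
y = [1, 0, 0, 0, 0, -1]
scores = directed_ssl(W, y)
print([1 if v > 0 else -1 for v in scores])
```

Each unlabeled node takes the sign of its score, so the labels spread within each densely linked cycle rather than across the single bridge.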

Splines with non positive kernels

Canu, S., Ong, CS., Mary, X.

In 5th International ISAAC Congress, pages: 1-10, (Editors: Begehr, H. G.W., F. Nicolosi), World Scientific, Singapore, 5th International ISAAC Congress, July 2005 (inproceedings)

Abstract
Nonparametric regression methods fall into two main groups: smoothing spline methods, which require positive kernels, and Nonparametric Kernel Regression, which allows non-positive kernels such as the Epanechnikov kernel. We propose a generalization of the smoothing spline method to kernels that are still symmetric but not positive semidefinite (so-called indefinite kernels). With indefinite kernels, the general relationship between smoothing splines, Reproducing Kernel Hilbert Spaces (RKHS) and positive kernels no longer holds. Instead, such kernels are associated with function spaces called Reproducing Kernel Krein Spaces (RKKS), which are endowed with an indefinite inner product and are thus not directly associated with a norm. Smoothing splines in RKKS retain many of the interesting properties of splines in RKHS, such as orthogonality, projection, the representer theorem and generalization bounds. We show that smoothing splines can be defined in RKKS as the regularized solution of the interpolation problem. Since no norm is available in an RKKS, Tikhonov regularization cannot be defined. Instead, we propose to use iterative methods of conjugate gradient type with early stopping as the regularization mechanism. We collect several iterative algorithms that can be used to solve the optimization problems associated with learning in indefinite spaces. Some preliminary experiments with indefinite kernels for spline smoothing are reported, revealing the computational efficiency of the approach.

ei

PDF Web [BibTex]
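
A minimal Python sketch of early stopping as the regularizer, using CGLS (conjugate gradient applied to the normal equations) on an Epanechnikov Gram matrix. CGLS is one standard conjugate-gradient-type method that stays well defined for symmetric indefinite matrices because it only works with A^T A; the abstract does not say which algorithms the authors collected, so this choice, the grid and the target function are assumptions. On the 7-point grid below (spacing 0.5, h = 1) the Gram matrix is indefinite.

```python
import math

def epanechnikov_gram(xs, h=1.0):
    """Gram matrix of the Epanechnikov kernel 0.75 * max(0, 1 - ((x - y) / h)^2).
    The kernel is symmetric but not positive semidefinite in general."""
    return [[0.75 * max(0.0, 1.0 - ((a - b) / h) ** 2) for b in xs] for a in xs]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def rmatvec(A, v):
    """Compute A^T v."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cgls(A, b, n_iter):
    """Conjugate gradient on the normal equations A^T A c = A^T b (CGLS).
    Stopping after n_iter steps plays the role of the regularizer."""
    c = [0.0] * len(A[0])
    r = list(b)                 # residual b - A c
    s = rmatvec(A, r)           # A^T r
    p = list(s)
    gamma = dot(s, s)
    for _ in range(n_iter):
        q = matvec(A, p)
        qq = dot(q, q)
        if gamma == 0.0 or qq == 0.0:   # already converged
            break
        step = gamma / qq
        c = [ck + step * pk for ck, pk in zip(c, p)]
        r = [rk - step * qk for rk, qk in zip(r, q)]
        s = rmatvec(A, r)
        gamma_new = dot(s, s)
        p = [sk + (gamma_new / gamma) * pk for sk, pk in zip(s, p)]
        gamma = gamma_new
    return c

def residual_norm(A, b, c):
    return math.sqrt(sum((bk - ak) ** 2 for bk, ak in zip(b, matvec(A, c))))

xs = [0.5 * i for i in range(7)]
K = epanechnikov_gram(xs)
y = [math.sin(x) for x in xs]
for n_iter in (1, 3, 6):
    print(n_iter, residual_norm(K, y, cgls(K, y, n_iter)))
```

Fewer iterations give a smoother, more strongly regularized fit; the iteration count takes the place of the Tikhonov parameter that is unavailable without a norm.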

Kernel Methods for Implicit Surface Modeling

Schölkopf, B., Giesen, J., Spalinger, S.

In Advances in Neural Information Processing Systems 17, pages: 1193-1200, (Editors: LK Saul and Y Weiss and L Bottou), MIT Press, Cambridge, MA, USA, 18th Annual Conference on Neural Information Processing Systems (NIPS), July 2005 (inproceedings)

Abstract
We describe methods for computing an implicit model of a hypersurface that is given only by a finite sampling. The methods work by mapping the sample points into a reproducing kernel Hilbert space and then determining regions in terms of hyperplanes.

ei

PDF Web [BibTex]

Comparative evaluation of Independent Components Analysis algorithms for isolating target-relevant information in brain-signal classification

Hill, N., Schröder, M., Lal, T., Schölkopf, B.

Brain-Computer Interface Technology, 3, pages: 95, June 2005 (poster)

ei

PDF [BibTex]

Machine-Learning Approaches to BCI in Tübingen

Bensch, M., Bogdan, M., Hill, N., Lal, T., Rosenstiel, W., Schölkopf, B., Schröder, M.

Brain-Computer Interface Technology, June 2005, Talk given by NJH. (talk)

ei

[BibTex]

Image Reconstruction by Linear Programming

Tsuda, K., Rätsch, G.

IEEE Transactions on Image Processing, 14(6):737-744, June 2005 (article)

Abstract
One approach to image denoising is to project a noisy image onto the subspace of admissible images derived, for instance, by PCA. However, a major drawback of this method is that all pixels are updated by the projection, even when only a few pixels are corrupted by noise or occlusion. We propose a new method to identify the noisy pixels by ℓ1-norm penalization and to update the identified pixels only. The identification and updating of noisy pixels are formulated as one linear program which can be solved efficiently. In particular, one can apply the ν-trick to directly specify the fraction of pixels to be reconstructed. Moreover, we extend the linear program to exploit prior knowledge that occlusions often appear in contiguous blocks (e.g., sunglasses on faces). The basic idea is to penalize boundary points and interior points of the occluded area differently. We are also able to show the ν-property for this extended LP, leading to a method which is easy to use. Experimental results demonstrate the power of our approach.

ei

PDF DOI [BibTex]

RASE: recognition of alternatively spliced exons in C. elegans

Rätsch, G., Sonnenburg, S., Schölkopf, B.

Bioinformatics, 21(Suppl. 1):i369-i377, June 2005 (article)

ei

PDF Web DOI [BibTex]

Matrix Exponentiated Gradient Updates for On-line Learning and Bregman Projection

Tsuda, K., Rätsch, G., Warmuth, M.

Journal of Machine Learning Research, 6, pages: 995-1018, June 2005 (article)

Abstract
We address the problem of learning a symmetric positive definite matrix. The central issue is to design parameter updates that preserve positive definiteness. Our updates are motivated by the von Neumann divergence. Rather than treating the most general case, we focus on two key applications that exemplify our methods: on-line learning with a simple square loss, and finding a symmetric positive definite matrix subject to linear constraints. The updates generalize the exponentiated gradient (EG) update and AdaBoost, respectively: the parameter is now a symmetric positive definite matrix of trace one instead of a probability vector (which in this context is a diagonal positive definite matrix with trace one). The generalized updates use matrix logarithms and exponentials to preserve positive definiteness. Most importantly, we show how the derivation and the analyses of the original EG update and AdaBoost generalize to the non-diagonal case. We apply the resulting matrix exponentiated gradient (MEG) update and DefiniteBoost to the problem of learning a kernel matrix from distance measurements.

ei

PDF [BibTex]
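
The abstract describes MEG as the matrix generalization of the EG update, with a probability vector as the diagonal special case. A minimal Python sketch of that diagonal case for the on-line square loss follows. In the matrix setting the same step is, by assumption here, taken with matrix logarithm and exponential (roughly W <- exp(log W - eta G) normalized to trace one, for a symmetric gradient G); the exact matrix form is not stated in the abstract. The learning rate and toy data are illustrative.

```python
import math

def eg_update(w, x, y, eta=0.5):
    """One exponentiated-gradient step for the square loss (w.x - y)^2.

    w is a probability vector; multiplicative updates keep it positive and
    renormalization keeps it summing to one.
    """
    y_hat = sum(wi * xi for wi, xi in zip(w, x))
    grad = [2.0 * (y_hat - y) * xi for xi in x]
    unnorm = [wi * math.exp(-eta * gi) for wi, gi in zip(w, grad)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

w = [0.5, 0.5]
for _ in range(20):
    w = eg_update(w, x=[1.0, 0.0], y=1.0)  # only the first feature predicts the target
print(w)
```

The exponential keeps every weight strictly positive, which is the vector analogue of MEG preserving positive definiteness.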

Measuring Statistical Dependence with Hilbert-Schmidt Norms

Gretton, A., Bousquet, O., Smola, A., Schölkopf, B.

(140), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, June 2005 (techreport)

Abstract
We propose an independence criterion based on the eigenspectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator (we term this a Hilbert-Schmidt Independence Criterion, or HSIC). This approach has several advantages, compared with previous kernel-based independence criteria. First, the empirical estimate is simpler than any other kernel dependence test, and requires no user-defined regularisation. Second, there is a clearly defined population quantity which the empirical estimate approaches in the large sample limit, with exponential convergence guaranteed between the two: this ensures that independence tests based on HSIC do not suffer from slow learning rates. Finally, we show in the context of independent component analysis (ICA) that the performance of HSIC is competitive with that of previously published kernel-based criteria, and of other recently published ICA methods.

ei

PDF [BibTex]
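
A minimal Python sketch of an empirical HSIC computation on scalar data. It assumes Gaussian kernels with a fixed width sigma and the biased estimator (m - 1)^{-2} tr(K H L H), where H = I - (1/m) 11^T centers the Gram matrices; the abstract only states that HSIC is an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator, so the kernel choice and normalization are assumptions.

```python
import math

def gaussian_gram(values, sigma=1.0):
    return [[math.exp(-(a - b) ** 2 / (2.0 * sigma ** 2)) for b in values] for a in values]

def centre(M):
    """Return H M H with H = I - (1/m) 11^T (double centering)."""
    m = len(M)
    row_means = [sum(row) / m for row in M]
    col_means = [sum(M[i][j] for i in range(m)) / m for j in range(m)]
    grand = sum(row_means) / m
    return [[M[i][j] - row_means[i] - col_means[j] + grand for j in range(m)]
            for i in range(m)]

def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC: tr(K H L H) / (m - 1)^2."""
    m = len(xs)
    Kc = centre(gaussian_gram(xs, sigma))
    L = gaussian_gram(ys, sigma)
    # tr(H K H L) = sum_ij (HKH)_ij L_ji, and L is symmetric
    return sum(Kc[i][j] * L[i][j] for i in range(m) for j in range(m)) / (m - 1) ** 2

xs = [float(i) for i in range(10)]
print(hsic(xs, xs), hsic(xs, [1.0] * 10))
```

No regularization parameter appears anywhere, matching the abstract's point that the empirical estimate needs no user-defined regularisation; a constant second variable gives a value of zero up to rounding.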

Texture and haptic cues in slant discrimination: Reliability-based cue weighting without statistically optimal cue combination

Rosas, P., Wagemans, J., Ernst, M., Wichmann, F.

Journal of the Optical Society of America A, 22(5):801-809, May 2005 (article)

Abstract
A number of models of depth cue combination suggest that the final depth percept results from a weighted average of independent depth estimates based on the different cues available. The weight of each cue in such an average is thought to depend on the reliability of each cue. In principle, such a depth estimation could be statistically optimal in the sense of producing the minimum variance unbiased estimator that can be constructed from the available information. Here we test such models using visual and haptic depth information. Different texture types produce differences in slant discrimination performance, providing a means for testing a reliability-sensitive cue combination model using texture as one of the cues to slant. Our results show that the weights for the cues were generally sensitive to their reliability, but fell short of statistically optimal combination—we find reliability-based re-weighting, but not statistically optimal cue combination.

ei

PDF Web [BibTex]

Efficient Adaptive Sampling of the Psychometric Function by Maximizing Information Gain

Tanner, TG.

Biologische Kybernetik, Eberhard-Karls University Tübingen, Tübingen, Germany, May 2005 (diplomathesis)

Abstract
A common task in psychophysics is to measure the psychometric function. A psychometric function can be described by its shape and four parameters: offset or threshold, slope or width, false alarm rate or chance level, and miss or lapse rate. Depending on the parameters of interest, some points on the psychometric function may be more informative than others. Adaptive methods attempt to place trials on the most informative points based on the data collected in previous trials. A new Bayesian adaptive psychometric method is introduced that places trials by minimising the expected entropy of the posterior probability distribution over a set of possible stimuli. The method is more flexible and faster than the established method (Kontsevich and Tyler, 1999), and at least as efficient. Comparably accurate (2 dB) threshold and slope estimates can be obtained after about 30 and 500 trials, respectively. By using a dynamic termination criterion the efficiency can be further improved. The method can be applied to all experimental designs including yes/no designs and allows acquisition of any set of free parameters. By weighting the importance of parameters one can include nuisance parameters and adjust the relative expected errors. Use of nuisance parameters may lead to more accurate estimates than assuming a guessed fixed value. Block designs are supported and do not harm the performance if a sufficient number of trials are performed. The method was evaluated by computer simulations in which the role of parametric assumptions, its robustness, the quality of different point estimates, the effect of dynamic termination criteria and many other settings were investigated.

ei

[BibTex]
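
A minimal Python sketch of entropy-based trial placement. It assumes a common four-parameter form psi(x) = gamma + (1 - gamma - lambda) F(x) with a logistic F, treats only the threshold as unknown on a discrete grid, and fixes slope, chance level and lapse rate (the method described above can estimate any subset of the four parameters). All names, grids and the yes/no setting are illustrative assumptions.

```python
import math

def psychometric(x, alpha, beta, gamma=0.0, lam=0.0):
    """psi(x) = gamma + (1 - gamma - lam) * F(x), with a logistic F.
    alpha: threshold, beta: slope, gamma: chance level, lam: lapse rate."""
    f = 1.0 / (1.0 + math.exp(-beta * (x - alpha)))
    return gamma + (1.0 - gamma - lam) * f

def entropy(dist):
    return -sum(p * math.log(p) for p in dist if p > 0.0)

def posterior(prior, thresholds, x, said_yes, beta=1.0):
    """Bayes update of a discrete prior over thresholds after one trial at x."""
    lik = [psychometric(x, a, beta) for a in thresholds]
    if not said_yes:
        lik = [1.0 - p for p in lik]
    joint = [l * w for l, w in zip(lik, prior)]
    z = sum(joint)
    return [j / z for j in joint]

def expected_posterior_entropy(x, thresholds, prior, beta=1.0):
    """Posterior entropy averaged over the two possible responses at x."""
    p_yes = sum(psychometric(x, a, beta) * w for a, w in zip(thresholds, prior))
    return (p_yes * entropy(posterior(prior, thresholds, x, True, beta))
            + (1.0 - p_yes) * entropy(posterior(prior, thresholds, x, False, beta)))

def next_stimulus(candidates, thresholds, prior, beta=1.0):
    """Place the next trial where the expected posterior entropy is smallest."""
    return min(candidates,
               key=lambda x: expected_posterior_entropy(x, thresholds, prior, beta))

thresholds = [-2.0, -1.0, 0.0, 1.0, 2.0]
candidates = [float(v) for v in range(-3, 4)]
prior = [0.2] * 5
x = next_stimulus(candidates, thresholds, prior)
prior = posterior(prior, thresholds, x, said_yes=True)  # after observing a "yes"
print(x, prior)
```

In a full run this loop repeats, each trial updating the posterior that determines where the next trial goes.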

Bayesian inference for psychometric functions

Kuss, M., Jäkel, F., Wichmann, F.

Journal of Vision, 5(5):478-492, May 2005 (article)

Abstract
In psychophysical studies, the psychometric function is used to model the relation between physical stimulus intensity and the observer’s ability to detect or discriminate between stimuli of different intensities. In this study, we propose the use of Bayesian inference to extract the information contained in experimental data to estimate the parameters of psychometric functions. Because Bayesian inference cannot be performed analytically, we describe how a Markov chain Monte Carlo method can be used to generate samples from the posterior distribution over parameters. These samples are used to estimate Bayesian confidence intervals and other characteristics of the posterior distribution. In addition, we discuss the parameterization of psychometric functions and the role of prior distributions in the analysis. The proposed approach is exemplified using artificially generated data and in a case study for real experimental data. Furthermore, we compare our approach with traditional methods based on maximum likelihood parameter estimation combined with bootstrap techniques for confidence interval estimation and find the Bayesian approach to be superior.

ei

PDF PDF DOI [BibTex]
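
The abstract samples the posterior with a Markov chain Monte Carlo method. A minimal Python sketch using random-walk Metropolis follows. It assumes a logistic psychometric function with known slope, a flat prior on the threshold over [-5, 5], and binomial yes/no counts; the sampler choice, the prior and the data triples are hypothetical, not the paper's setup.

```python
import math
import random

def log_likelihood(alpha, data, beta=1.0):
    """Binomial log-likelihood of (stimulus, n_yes, n_trials) triples under a
    logistic psychometric function with threshold alpha and fixed slope beta."""
    ll = 0.0
    for x, k, n in data:
        p = 1.0 / (1.0 + math.exp(-beta * (x - alpha)))
        ll += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return ll

def metropolis(data, n_samples=5000, step=0.5, lo=-5.0, hi=5.0, seed=0):
    """Random-walk Metropolis sampler for alpha under a flat prior on [lo, hi]."""
    rng = random.Random(seed)
    alpha = 0.5 * (lo + hi)
    ll = log_likelihood(alpha, data)
    samples = []
    for _ in range(n_samples):
        proposal = alpha + rng.gauss(0.0, step)
        if lo <= proposal <= hi:  # proposals outside the prior's support are rejected
            ll_new = log_likelihood(proposal, data)
            if rng.random() < math.exp(min(0.0, ll_new - ll)):
                alpha, ll = proposal, ll_new
        samples.append(alpha)
    return samples

def credible_interval(samples, level=0.95, burn_in=1000):
    """Central Bayesian credible interval from the post-burn-in samples."""
    kept = sorted(samples[burn_in:])
    lo_idx = int((1.0 - level) / 2.0 * len(kept))
    hi_idx = int((1.0 + level) / 2.0 * len(kept)) - 1
    return kept[lo_idx], kept[hi_idx]

# Hypothetical counts: (stimulus level, number of "yes" responses, trials)
data = [(-2.0, 6, 50), (-1.0, 13, 50), (0.0, 25, 50), (1.0, 37, 50), (2.0, 44, 50)]
samples = metropolis(data)
kept = samples[1000:]
print(sum(kept) / len(kept), credible_interval(samples))
```

Unlike a bootstrap around a maximum-likelihood fit, the interval comes directly from the posterior samples.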

Classification of natural scenes using global image statistics

Drewes, J., Wichmann, F., Gegenfurtner, K.

47, pages: 88, 47. Tagung Experimentell Arbeitender Psychologen, April 2005 (poster)

ei

[BibTex]

A gene expression map of Arabidopsis thaliana development

Schmid, M., Davison, T., Henz, S., Pape, U., Demar, M., Vingron, M., Schölkopf, B., Weigel, D., Lohmann, J.

Nature Genetics, 37(5):501-506, April 2005 (article)

Abstract
Regulatory regions of plant genes tend to be more compact than those of animal genes, but the complement of transcription factors encoded in plant genomes is as large or larger than that found in those of animals. Plants therefore provide an opportunity to study how transcriptional programs control multicellular development. We analyzed global gene expression during development of the reference plant Arabidopsis thaliana in samples covering many stages, from embryogenesis to senescence, and diverse organs. Here, we provide a first analysis of this data set, which is part of the AtGenExpress expression atlas. We observed that the expression levels of transcription factor genes and signal transduction components are similar to those of metabolic genes. Examining the expression patterns of large gene families, we found that they are often more similar than would be expected by chance, indicating that many gene families have been co-opted for specific developmental processes.

ei

PDF DOI [BibTex]

Experimentally optimal ν in support vector regression for different noise models and parameter settings

Chalimourda, A., Schölkopf, B., Smola, A.

Neural Networks, 18(2):205-205, March 2005 (article)

ei

PDF DOI [BibTex]

Adhesive microstructure and method of forming same

Fearing, R. S., Sitti, M.

March 2005, US Patent 6,872,439 (misc)

pi

[BibTex]

Classification of Natural Scenes using Global Image Statistics

Drewes, J., Wichmann, F., Gegenfurtner, K.

8, pages: 88, 8th Tübingen Perception Conference (TWK), February 2005 (poster)

Abstract
The algorithmic classification of complex, natural scenes is generally considered a difficult task due to the large amount of information conveyed by natural images. Work by Simon Thorpe and colleagues showed that humans are capable of detecting animals within novel natural scenes with remarkable speed and accuracy. This suggests that the relevant information for classification can be extracted at comparatively limited computational cost. One hypothesis is that global image statistics such as the amplitude spectrum could underlie fast image classification (Johnson & Olshausen, Journal of Vision, 2003; Torralba & Oliva, Network: Comput. Neural Syst., 2003). We used linear discriminant analysis to classify a set of 11,000 images into animal and non-animal images. After applying a DFT to each image, we sorted its Fourier spectrum into 48 bins (8 orientations with 6 frequency bands). Using all of these bins, classification performance on the Fourier spectrum reached 70%. In an iterative procedure, we then removed the bins whose absence caused the smallest damage to the classification performance (one bin per iteration). Notably, performance stayed at about 70% until fewer than 6 bins were left. A detailed analysis of the classification weights showed that a comparatively high level of performance (67%) could also be obtained when only 2 bins were used, namely the vertical orientations at the highest spatial frequency band. When using only a single frequency band (8 bins), we found that 67% classification performance could be reached when only the high spatial frequency information was used; performance decreased steadily at lower spatial frequencies, reaching a minimum (50%) for the low spatial frequency information. Similar results were obtained when all bins were used on spatially pre-filtered images.
Our results show that in the absence of sophisticated machine learning techniques, animal detection in natural scenes is limited to rather modest levels of performance, far below those of human observers. When one is limited to global image statistics such as the DFT, mostly the information at the highest spatial frequencies is useful for the task. This is analogous to the results obtained with human observers on filtered images (Kirchner et al, VSS 2004).

ei

Web [BibTex]
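
A minimal Python sketch of the feature extraction step: a 2-D DFT followed by summing amplitudes into 8 orientation x 6 frequency-band bins. The abstract gives only the bin counts, so linear band spacing, skipping the DC term and summing amplitudes are assumptions; the linear discriminant analysis stage is omitted, and the stripe image is a toy input.

```python
import cmath
import math

def dft2(img):
    """Naive 2-D discrete Fourier transform (O(N^4); fine for tiny images)."""
    n, m = len(img), len(img[0])
    return [[sum(img[y][x] * cmath.exp(-2j * math.pi * (u * y / n + v * x / m))
                 for y in range(n) for x in range(m))
             for v in range(m)] for u in range(n)]

def spectral_bins(img, n_orient=8, n_bands=6):
    """Sum DFT amplitudes into n_orient x n_bands orientation/frequency bins."""
    n, m = len(img), len(img[0])
    F = dft2(img)
    r_max = math.hypot(n // 2, m // 2)
    bins = [0.0] * (n_orient * n_bands)
    for u in range(n):
        for v in range(m):
            fy = u if u <= n // 2 else u - n      # signed vertical frequency
            fx = v if v <= m // 2 else v - m      # signed horizontal frequency
            r = math.hypot(fx, fy)
            if r == 0.0:
                continue                          # skip the DC component
            theta = math.atan2(fy, fx) % math.pi  # orientation in [0, pi)
            o = min(int(theta / math.pi * n_orient), n_orient - 1)
            b = min(int(r / r_max * n_bands), n_bands - 1)
            bins[o * n_bands + b] += abs(F[u][v])
    return bins

# Vertical stripes: intensity alternates along x only
stripes = [[float(x % 2) for x in range(8)] for _ in range(8)]
features = spectral_bins(stripes)
print(len(features))
```

For the stripe image all spectral energy falls into a single orientation bin at a high frequency band, which is the kind of feature vector the classifier then receives.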

Efficient Adaptive Sampling of the Psychometric Function by Maximizing Information Gain

Tanner, T., Hill, N., Rasmussen, C., Wichmann, F.

8, pages: 109, (Editors: Bülthoff, H. H., H. A. Mallot, R. Ulrich and F. A. Wichmann), 8th Tübingen Perception Conference (TWK), February 2005 (poster)

Abstract
A psychometric function can be described by its shape and four parameters: position or threshold, slope or width, false alarm rate or chance level, and miss or lapse rate. Depending on the parameters of interest, some points on the psychometric function may be more informative than others. Adaptive methods attempt to place trials on the most informative points based on the data collected in previous trials. We introduce a new adaptive Bayesian psychometric method which collects data for any set of parameters with high efficiency. It places trials by minimizing the expected entropy [1] of the posterior pdf over a set of possible stimuli. In contrast to most other adaptive methods, it is limited neither to threshold measurement nor to forced-choice designs. Nuisance parameters can be included in the estimation and lead to less biased estimates. The method supports block designs, which do not harm the performance when a sufficient number of trials are performed. Block designs are useful for control of response bias and short-term performance shifts such as adaptation. We present the results of evaluations of the method by computer simulations and experiments with human observers. In the simulations we investigated the role of parametric assumptions, the quality of different point estimates, the effect of dynamic termination criteria and many other settings. [1] Kontsevich, L.L. and Tyler, C.W. (1999): Bayesian adaptive estimation of psychometric slope and threshold. Vis. Res. 39 (16), 2729-2737.

ei

Web [BibTex]

Bayesian Inference for Psychometric Functions

Kuss, M., Jäkel, F., Wichmann, F.

8, pages: 106, (Editors: Bülthoff, H. H., H. A. Mallot, R. Ulrich and F. A. Wichmann), 8th Tübingen Perception Conference (TWK), February 2005 (poster)

Abstract
In psychophysical studies of perception the psychometric function is used to model the relation between the physical stimulus intensity and the observer's ability to detect or discriminate between stimuli of different intensities. We propose the use of Bayesian inference to extract the information contained in experimental data to learn about the parameters of psychometric functions. Since Bayesian inference cannot be performed analytically, we use a Markov chain Monte Carlo method to generate samples from the posterior distribution over parameters. These samples can be used to estimate Bayesian confidence intervals and other characteristics of the posterior distribution. We compare our approach with traditional methods based on maximum-likelihood parameter estimation combined with parametric bootstrap techniques for confidence interval estimation. Experiments indicate that Bayesian inference methods are superior to bootstrap-based methods and are thus the method of choice for estimating the psychometric function and its confidence intervals.

ei

Web [BibTex]

Active Learning for Parzen Window Classifier

Chapelle, O.

In AISTATS 2005, pages: 49-56, (Editors: Cowell, R., Z. Ghahramani), Tenth International Workshop on Artificial Intelligence and Statistics (AI & Statistics), January 2005 (inproceedings)

Abstract
The problem of active learning is approached in this paper by directly minimizing an estimate of the expected test error. The main difficulty in this "optimal" strategy is that output probabilities need to be estimated accurately. We suggest here different methods for estimating them efficiently. In this context, the Parzen window classifier is considered because it is both simple and probabilistic. The analysis of experimental results highlights that regularization is a key ingredient for this strategy.

ei

Web [BibTex]
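
A minimal Python sketch of the "optimal" strategy with a Parzen window classifier: for each candidate query, average the estimated pool error after adding it with either label, weighted by the current class probability, and pick the smallest. The Gaussian window, 1-D inputs, binary labels and the pseudo-count regularizer reg are assumptions for illustration; the paper's own probability estimators are not reproduced here.

```python
import math

def parzen_posterior(x, labeled, width=1.0, reg=1.0):
    """Estimate P(y = 1 | x) with a Gaussian Parzen window.

    `reg` is a pseudo-count split evenly between the two classes (an assumed
    regularizer), so the estimate stays near 0.5 far away from labeled data.
    """
    w0 = w1 = 0.0
    for xi, yi in labeled:
        k = math.exp(-(x - xi) ** 2 / (2.0 * width ** 2))
        if yi == 1:
            w1 += k
        else:
            w0 += k
    return (w1 + 0.5 * reg) / (w0 + w1 + reg)

def expected_error(pool, labeled, **kw):
    """Estimated error rate on the pool: mean of 1 - max_y P(y | x)."""
    total = 0.0
    for x in pool:
        p = parzen_posterior(x, labeled, **kw)
        total += 1.0 - max(p, 1.0 - p)
    return total / len(pool)

def choose_query(pool, labeled, **kw):
    """Return the unlabeled pool point whose labeling minimizes the expected
    error, averaging over both labels weighted by the current P(y | x)."""
    seen = {x for x, _ in labeled}
    best, best_score = None, float("inf")
    for x in pool:
        if x in seen:
            continue
        p1 = parzen_posterior(x, labeled, **kw)
        score = (p1 * expected_error(pool, labeled + [(x, 1)], **kw)
                 + (1.0 - p1) * expected_error(pool, labeled + [(x, 0)], **kw))
        if score < best_score:
            best, best_score = x, score
    return best

pool = [float(i) for i in range(10)]
labeled = [(0.0, 0), (9.0, 1)]
query = choose_query(pool, labeled)
print(query)
```

Setting reg to 0 makes far-away probability estimates overconfident, which illustrates why the abstract singles out regularization as a key ingredient.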

Semi-Supervised Classification by Low Density Separation

Chapelle, O., Zien, A.

In AISTATS 2005, pages: 57-64, (Editors: Cowell, R., Z. Ghahramani), Tenth International Workshop on Artificial Intelligence and Statistics (AI & Statistics), January 2005 (inproceedings)

Abstract
We believe that the cluster assumption is key to successful semi-supervised learning. Based on this, we propose three semi-supervised algorithms: 1. deriving graph-based distances that emphasize low density regions between clusters, followed by training a standard SVM; 2. optimizing the Transductive SVM objective function, which places the decision boundary in low density regions, by gradient descent; 3. combining the first two to make maximum use of the cluster assumption. We compare with state-of-the-art algorithms and demonstrate superior accuracy for the latter two methods.

ei

PDF Web [BibTex]

Kernel Constrained Covariance for Dependence Measurement

Gretton, A., Smola, A., Bousquet, O., Herbrich, R., Belitski, A., Augath, M., Murayama, Y., Pauls, J., Schölkopf, B., Logothetis, N.

In Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, pages: 112-119, (Editors: R Cowell and Z Ghahramani), AISTATS, January 2005 (inproceedings)

Abstract
We discuss reproducing kernel Hilbert space (RKHS)-based measures of statistical dependence, with emphasis on constrained covariance (COCO), a novel criterion to test dependence of random variables. We show that COCO is a test for independence if and only if the associated RKHSs are universal. That said, no independence test exists that can distinguish dependent and independent random variables in all circumstances. Dependent random variables can result in a COCO which is arbitrarily close to zero when the source densities are highly non-smooth. All current kernel-based independence tests share this behaviour. We demonstrate exponential convergence between the population and empirical COCO. Finally, we use COCO as a measure of joint neural activity between voxels in MRI recordings of the macaque monkey, and compare the results to the mutual information and the correlation. We also show the effect of removing breathing artefacts from the MRI recording.

ei

PDF Web [BibTex]

Hilbertian Metrics and Positive Definite Kernels on Probability Measures

Hein, M., Bousquet, O.

In AISTATS 2005, pages: 136-143, (Editors: Cowell, R., Z. Ghahramani), Tenth International Workshop on Artificial Intelligence and Statistics (AI & Statistics), January 2005 (inproceedings)

Abstract
We investigate the problem of defining Hilbertian metrics, or correspondingly positive definite kernels, on probability measures, continuing previous work. This type of kernel has shown very good results in text classification and has a wide range of possible applications. In this paper we extend the two-parameter family of Hilbertian metrics of Topsøe so that it includes all commonly used Hilbertian metrics on probability measures. This allows us to do model selection among these metrics in an elegant and unified way. Second, we further investigate our approach to incorporating similarity information of the probability space into the kernel. The analysis provides a better understanding of these kernels and in some cases gives a more efficient way to compute them. Finally, we compare all proposed kernels on two text and two image classification problems.

ei

PDF Web [BibTex]
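
A small Python illustration of the metric/kernel correspondence behind this work, using the Hellinger distance (a commonly used Hilbertian metric on probability measures) and the Bhattacharyya coefficient as its positive definite kernel. It is not the extended Topsøe family from the paper, and the two distributions are made-up examples.

```python
import math

def bhattacharyya_kernel(p, q):
    """k(p, q) = sum_i sqrt(p_i * q_i) for discrete distributions p and q."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def hellinger_sq(p, q):
    """Squared Hellinger distance sum_i (sqrt(p_i) - sqrt(q_i))^2."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

p = [0.2, 0.5, 0.3]
q = [0.6, 0.1, 0.3]
# Hilbertian metric from the kernel: d^2 = k(p, p) + k(q, q) - 2 k(p, q)
via_kernel = (bhattacharyya_kernel(p, p) + bhattacharyya_kernel(q, q)
              - 2 * bhattacharyya_kernel(p, q))
print(hellinger_sq(p, q), via_kernel)
```

Because the kernel is an inner product of the vectors sqrt(p), the metric it induces embeds isometrically into a Hilbert space, which is exactly what "Hilbertian" requires.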

Kernel Constrained Covariance for Dependence Measurement

Gretton, A., Smola, A., Bousquet, O., Herbrich, R., Belitski, A., Augath, M., Murayama, Y., Schölkopf, B., Logothetis, N.

AISTATS, January 2005 (talk)

Abstract
We discuss reproducing kernel Hilbert space (RKHS)-based measures of statistical dependence, with emphasis on constrained covariance (COCO), a novel criterion to test dependence of random variables. We show that COCO is a test for independence if and only if the associated RKHSs are universal. That said, no independence test exists that can distinguish dependent and independent random variables in all circumstances. Dependent random variables can result in a COCO which is arbitrarily close to zero when the source densities are highly non-smooth. All current kernel-based independence tests share this behaviour. We demonstrate exponential convergence between the population and empirical COCO. Finally, we use COCO as a measure of joint neural activity between voxels in MRI recordings of the macaque monkey, and compare the results to the mutual information and the correlation. We also show the effect of removing breathing artefacts from the MRI recording.

ei

PostScript [BibTex]

Composite adaptive control with locally weighted statistical learning

Nakanishi, J., Farrell, J. A., Schaal, S.

Neural Networks, 18(1):71-90, January 2005, clmc (article)

Abstract
This paper introduces a provably stable learning adaptive control framework with statistical learning. The proposed algorithm employs nonlinear function approximation with automatic growth of the learning network according to the nonlinearities and the working domain of the control system. The unknown function in the dynamical system is approximated by piecewise linear models using a nonparametric regression technique. Local models are allocated as necessary and their parameters are optimized on-line. Inspired by composite adaptive control methods, the proposed learning adaptive control algorithm uses both the tracking error and the estimation error to update the parameters. We first discuss statistical learning of nonlinear functions, and motivate our choice of the locally weighted learning framework. Second, we begin with a class of first order SISO systems for theoretical development of our learning adaptive control framework, and present a stability proof including a parameter projection method that is needed to avoid potential singularities during adaptation. Then, we generalize our adaptive controller to higher order SISO systems, and discuss further extension to MIMO problems. Finally, we evaluate our theoretical control framework in numerical simulations to illustrate the effectiveness of the proposed learning adaptive controller for rapid convergence and high accuracy of control.

am

link (url) [BibTex]

Invariance of Neighborhood Relation under Input Space to Feature Space Mapping

Shin, H., Cho, S.

Pattern Recognition Letters, 26(6):707-718, 2005 (article)

Abstract
If the training pattern set is large, training a support vector machine (SVM) requires a large amount of memory and time. Recently, we proposed a neighborhood property based pattern selection algorithm (NPPS) which, ahead of SVM training, selects only the patterns that are likely to be near the decision boundary [Proc. of the 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Lecture Notes in Artificial Intelligence (LNAI 2637), Seoul, Korea, pp. 376–387]. NPPS tries to identify those patterns that are likely to become support vectors in feature space. Preliminary reports show its effectiveness: SVM training time was reduced by two orders of magnitude with almost no loss in accuracy on various datasets. It has to be noted, however, that the decision boundary of an SVM and its support vectors are all defined in feature space, while NPPS operates in input space. If the neighborhood relation in input space is not preserved in feature space, NPPS may not always be effective. In this paper, we show that the neighborhood relation is invariant under the input-to-feature-space mapping. This result ensures that the patterns selected by NPPS in input space are likely to be located near the decision boundary in feature space.

ei

PDF PDF [BibTex]

Intrinsic Dimensionality Estimation of Submanifolds in Euclidean space

Hein, M., Audibert, Y.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 289, (Editors: De Raedt, L., S. Wrobel), ICML Bonn, 2005 (inproceedings)

Abstract
We present a new method to estimate the intrinsic dimensionality of a submanifold M in Euclidean space from random samples. The method is based on the convergence rates of a certain U-statistic on the manifold. We at least partially solve the problem of choosing the scale of the data. Moreover, the proposed method is easy to implement, can handle large data sets and performs very well even for small sample sizes. We compare the proposed method to two standard estimators on several artificial as well as real data sets.

ei

PDF [BibTex]

Large Scale Genomic Sequence SVM Classifiers

Sonnenburg, S., Rätsch, G., Schölkopf, B.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 849-856, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005 (inproceedings)

Abstract
In genomic sequence analysis tasks such as splice site recognition or promoter identification, large amounts of training sequences are available, and indeed needed, to achieve sufficiently high classification performance. In this work we study two recently proposed and successfully used kernels, namely the Spectrum kernel and the Weighted Degree (WD) kernel. In particular, we suggest several extensions using suffix trees and modifications of an SMO-like SVM training algorithm in order to accelerate the training of the SVMs and their evaluation on test sequences. Our simulations show that for the Spectrum kernel and the WD kernel, large-scale SVM training can be accelerated by factors of 20 and 4, respectively, while using much less memory (e.g. no kernel caching). Evaluation on new sequences is often several thousand times faster using the new techniques (depending on the number of support vectors). Our method allows us to train on sets as large as one million sequences.
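For readers unfamiliar with the two kernels, here is a naive sketch of their definitions. The Spectrum kernel is the inner product of $k$-mer count vectors; the Weighted Degree kernel counts position-wise matching substrings of length $1,\dots,D$, weighted by $\beta_d = 2(D-d+1)/(D(D+1))$. That weighting is the one commonly cited for the WD kernel but should be treated as an assumption here, and these direct loops are exactly what the paper's suffix-tree techniques are designed to avoid; they are not the paper's implementation.

```python
from collections import Counter

def spectrum_kernel(s, t, k):
    """Spectrum kernel: inner product of the k-mer count vectors of s and t."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    # Counter returns 0 for k-mers absent from t, so only shared k-mers contribute.
    return sum(cnt * ct[kmer] for kmer, cnt in cs.items())

def weighted_degree_kernel(s, t, degree):
    """Weighted Degree kernel for equal-length sequences.

    Counts substrings of length d = 1..degree that match at the same position
    in s and t, weighting length d by beta_d = 2(D - d + 1) / (D(D + 1)).
    """
    if len(s) != len(t):
        raise ValueError("the WD kernel compares sequences of equal length")
    total = 0.0
    for d in range(1, degree + 1):
        beta = 2.0 * (degree - d + 1) / (degree * (degree + 1))
        matches = sum(1 for l in range(len(s) - d + 1) if s[l:l + d] == t[l:l + d])
        total += beta * matches
    return total

if __name__ == "__main__":
    print(spectrum_kernel("AAA", "AAA", 2))           # 4  ("AA" occurs twice in each)
    print(weighted_degree_kernel("ACGT", "ACGT", 1))  # 4.0
```

The Spectrum kernel ignores where a $k$-mer occurs, whereas the WD kernel is position-specific, which is why the latter suits fixed-length windows such as those around candidate splice sites.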

ei

PDF [BibTex]

Joint Kernel Maps

Weston, J., Schölkopf, B., Bousquet, O.

In Proceedings of the 8th International Work-Conference on Artificial Neural Networks, LNCS 3512, pages: 176-191, (Editors: J Cabestany and A Prieto and F Sandoval), Springer, Berlin Heidelberg, Germany, IWANN, 2005 (inproceedings)

Abstract
We develop a methodology for solving high-dimensional dependency estimation problems between pairs of data types, which remains viable when the output of interest has very high dimension, e.g., thousands of dimensions. This is achieved by mapping the objects into continuous or discrete spaces using joint kernels. Known correlations between input and output can be defined by such kernels, some of which can maintain linearity in the outputs to provide simple (closed-form) pre-images. We provide examples of such kernels and empirical results.

ei

PostScript DOI [BibTex]

Analysis of Some Methods for Reduced Rank Gaussian Process Regression

Quinonero Candela, J., Rasmussen, C.

In Switching and Learning in Feedback Systems, pages: 98-127, (Editors: Murray-Smith, R., R. Shorten), Springer, Berlin, Germany, European Summer School on Multi-Agent Control, 2005 (inproceedings)

Abstract
While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent proliferation of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced rank approximation. While GPs are in general equivalent to infinite linear models, we show that Reduced Rank Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training RRGPs consists of learning both the covariance function hyperparameters and the support set. We propose a method for learning hyperparameters for a given support set. We also review the Sparse Greedy GP (SGGP) approximation (Smola and Bartlett, 2001), which learns the support set for given hyperparameters by approximating the posterior. We propose an alternative to the SGGP that has better generalization capabilities. Finally, we perform experiments comparing the different ways of training an RRGP. We provide some Matlab code for learning RRGPs.
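The degeneracy mentioned above can be seen in a few lines. Assuming the standard subset-of-regressors form of the reduced rank model, $f(x) = \sum_i \alpha_i k(x, u_i)$ over a support set $\{u_i\}$ with prior $\alpha \sim \mathcal{N}(0, K_{mm}^{-1})$, the predictive mean is $k_*^\top (K_{mn}K_{nm} + \sigma^2 K_{mm})^{-1} K_{mn} y$ and the latent predictive variance is $\sigma^2 k_*^\top (K_{mn}K_{nm} + \sigma^2 K_{mm})^{-1} k_*$. Far from the support set $k_* \to 0$, so this variance collapses to zero, whereas a full GP with the same unit-variance covariance would revert to its prior variance of 1. The pure-Python solver, kernel, length scale and toy data are assumptions for illustration; this is neither the paper's Matlab code nor its proposed fix.

```python
import math

def rbf(a, b, ell=1.0):
    """Unit-variance squared-exponential covariance on scalars."""
    return math.exp(-0.5 * (a - b) ** 2 / ell ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

class ReducedRankGP:
    """Subset-of-regressors model f(x) = sum_i alpha_i k(x, u_i)."""

    def __init__(self, xs, ys, support, noise_var, ell=1.0):
        self.support, self.noise_var, self.ell = support, noise_var, ell
        Kmn = [[rbf(u, x, ell) for x in xs] for u in support]
        m = len(support)
        # A = K_mn K_nm + sigma^2 K_mm
        self.A = [[sum(p * q for p, q in zip(Kmn[i], Kmn[j]))
                   + noise_var * rbf(support[i], support[j], ell)
                   for j in range(m)] for i in range(m)]
        # Posterior mean of the weights: A^{-1} K_mn y
        self.alpha = solve(self.A, [sum(p * y for p, y in zip(row, ys)) for row in Kmn])

    def predict(self, x_star):
        """Return (predictive mean, latent predictive variance) at x_star."""
        k_star = [rbf(x_star, u, self.ell) for u in self.support]
        mean = sum(k * a for k, a in zip(k_star, self.alpha))
        v = solve(self.A, k_star)
        var = self.noise_var * sum(k * w for k, w in zip(k_star, v))
        return mean, var

if __name__ == "__main__":
    xs = [5.0 * i / 49 for i in range(50)]
    ys = [math.sin(x) for x in xs]
    model = ReducedRankGP(xs, ys, support=xs[::5], noise_var=0.01)
    print(model.predict(2.0))   # mean near sin(2), small positive variance
    print(model.predict(50.0))  # variance collapses to 0.0 far from the data
```

The shrinking variance away from the data is the unintuitive behaviour the abstract attributes to an inappropriate (degenerate) prior.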

ei

PDF PDF DOI [BibTex]

Approximate Inference for Robust Gaussian Process Regression

Kuss, M., Pfingsten, T., Csató, L., Rasmussen, C.

(136), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2005 (techreport)

Abstract
Gaussian process (GP) priors have been successfully used in non-parametric Bayesian regression and classification models. Inference can be performed analytically only for the regression model with Gaussian noise. For all other likelihood models, inference is intractable and various approximation techniques have been proposed. In recent years, expectation propagation (EP) has been developed as a general method for approximate inference. This article provides a general summary of how EP can be used for approximate inference in Gaussian process models. Furthermore, we present a case study describing its implementation for a new robust variant of Gaussian process regression. To gain further insight into the quality of the EP approximation, we present experiments comparing it with results obtained by Markov chain Monte Carlo (MCMC) sampling.

ei

PDF [BibTex]

Global image statistics of natural scenes

Drewes, J., Wichmann, F., Gegenfurtner, K.

Bioinspired Information Processing, 08, pages: 1, 2005 (poster)

ei

[BibTex]

Theory of Classification: A Survey of Some Recent Advances

Boucheron, S., Bousquet, O., Lugosi, G.

ESAIM: Probability and Statistics, 9, pages: 323, 2005 (article)

Abstract
The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these important recent developments.

ei

PDF DOI [BibTex]

From Graphs to Manifolds - Weak and Strong Pointwise Consistency of Graph Laplacians

Hein, M., Audibert, J., von Luxburg, U.

In Proceedings of the 18th Conference on Learning Theory (COLT), pages: 470-485, Conference on Learning Theory, 2005, Student Paper Award (inproceedings)

Abstract
In the machine learning community it is generally believed that graph Laplacians corresponding to a finite sample of data points converge to a continuous Laplace operator as the sample size increases. Even though this assertion serves as a justification for many Laplacian-based algorithms, so far only some aspects of this claim have been rigorously proved. In this paper we close this gap by establishing the strong pointwise consistency of a family of graph Laplacians with data-dependent weights to some weighted Laplace operator. Our investigation also includes the important case where the data lies on a submanifold of $\mathbb{R}^d$.
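To make the object of study concrete, here is a sketch of a graph Laplacian built from Gaussian weights $w_{ij} = \exp(-\|x_i - x_j\|^2/(2h^2))$ and applied to a function sampled at the data points, $(Lf)_i = \sum_j w_{ij}(f_i - f_j)$. The optional reweighting $\tilde w_{ij} = w_{ij}/(d_i d_j)^\lambda$, with degrees $d_i = \sum_j w_{ij}$, is one way to obtain data-dependent weights; whether it matches the paper's exact family is an assumption, and the $1/n$ and powers-of-$h$ scalings that a consistency result needs are deliberately left out.

```python
import math

def gaussian_weights(points, h):
    """Dense weight matrix w_ij = exp(-||x_i - x_j||^2 / (2 h^2)), zero diagonal."""
    n = len(points)
    return [[0.0 if i == j else
             math.exp(-math.dist(points[i], points[j]) ** 2 / (2.0 * h * h))
             for j in range(n)] for i in range(n)]

def reweight(W, lam):
    """Data-dependent weights w_ij / (d_i d_j)^lam with degrees d_i = sum_j w_ij."""
    deg = [sum(row) for row in W]
    n = len(W)
    return [[W[i][j] / (deg[i] * deg[j]) ** lam for j in range(n)] for i in range(n)]

def apply_laplacian(W, f):
    """Unnormalized graph Laplacian applied to f: (L f)_i = sum_j w_ij (f_i - f_j)."""
    return [sum(w * (f[i] - fj) for w, fj in zip(row, f)) for i, row in enumerate(W)]

if __name__ == "__main__":
    W = gaussian_weights([(0.0,), (1.0,)], h=1.0)
    print(apply_laplacian(W, [3.0, 3.0]))  # [0.0, 0.0]: constants lie in the kernel of L
    print(apply_laplacian(W, [0.0, 1.0]))  # [-exp(-0.5), exp(-0.5)]
```

Pointwise consistency concerns what happens to $(Lf)_i$, suitably rescaled, as the number of points grows and $h$ shrinks.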

ei

PDF [BibTex]

Propagating Distributions on a Hypergraph by Dual Information Regularization

Tsuda, K.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 921, (Editors: De Raedt, L., S. Wrobel), ICML, Bonn, 2005 (inproceedings)

Abstract
In the information regularization framework of Corduneanu and Jaakkola (2005), label distributions are propagated on a hypergraph for semi-supervised learning. Learning is done efficiently by a Blahut-Arimoto-like two-step algorithm, but, unfortunately, one of the steps cannot be solved in closed form. In this paper, we propose a dual version of information regularization, which we consider more natural in terms of information geometry. Our learning algorithm has two steps, each of which can be solved in closed form. It can also be naturally applied to exponential family distributions such as Gaussians. In experiments, our algorithm is applied to protein classification based on a metabolic network and known functional categories.

ei

[BibTex]

Support Vector Machines and Kernel Algorithms

Schölkopf, B., Smola, A.

In Encyclopedia of Biostatistics (2nd edition), Vol. 8, pages: 5328-5335, (Editors: P Armitage and T Colton), John Wiley & Sons, NY USA, 2005 (inbook)

ei

[BibTex]

Moment Inequalities for Functions of Independent Random Variables

Boucheron, S., Bousquet, O., Lugosi, G., Massart, P.

To appear in Annals of Probability, 33, pages: 514-560, 2005 (article)

Abstract
A general method for obtaining moment inequalities for functions of independent random variables is presented. It is a generalization of the entropy method, which has been used to derive concentration inequalities for such functions [BoLuMa01], and is based on a generalized tensorization inequality due to Latała and Oleszkiewicz [LaOl00]. The new inequalities prove to be a versatile tool in a wide range of applications. We illustrate the power of the method by showing how it can be used to effortlessly re-derive classical inequalities, including Rosenthal and Kahane-Khinchine-type inequalities for sums of independent random variables, moment inequalities for suprema of empirical processes, and moment inequalities for Rademacher chaos and $U$-statistics. Some of these corollaries are apparently new. In particular, we generalize Talagrand's exponential inequality for Rademacher chaos of order two to any order. We also discuss applications to other complex functions of independent random variables, such as suprema of Boolean polynomials, which include, as special cases, subgraph counting problems in random graphs.

ei

PDF [BibTex]

A Brain Computer Interface with Online Feedback based on Magnetoencephalography

Lal, T., Schröder, M., Hill, J., Preissl, H., Hinterberger, T., Mellinger, J., Bogdan, M., Rosenstiel, W., Hofmann, T., Birbaumer, N., Schölkopf, B.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 465-472, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005 (inproceedings)

Abstract
The aim of this paper is to show that machine learning techniques can be used to derive a classifying function for human brain signal data measured by magnetoencephalography (MEG), for use in a brain computer interface (BCI). This is especially helpful for quickly evaluating whether a BCI approach based on electroencephalography, for which training may be slower due to a lower signal-to-noise ratio, is likely to succeed. We apply recursive channel elimination and regularized SVMs to the experimental data of ten healthy subjects performing a motor imagery task. Four subjects were able to use a trained classifier together with a decision tree interface to write a short name. Further analysis gives evidence that the proposed imagination task is suboptimal for a possible extension to a multiclass interface. To the best of our knowledge, this paper presents the first working online BCI based on MEG recordings and is therefore a “proof of concept”.

ei

PDF PDF [BibTex]

Healing the Relevance Vector Machine through Augmentation

Rasmussen, C. E., Quinonero Candela, J.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 689, (Editors: De Raedt, L., S. Wrobel), ICML, 2005 (inproceedings)

Abstract
The Relevance Vector Machine (RVM) is a sparse approximate Bayesian kernel method. It provides full predictive distributions for test cases. However, the predictive uncertainties have the unintuitive property that they get smaller the further you move away from the training cases. We give a thorough analysis. Inspired by the analogy to non-degenerate Gaussian Processes, we suggest augmentation to solve the problem. The purpose of the resulting model, RVM*, is primarily to corroborate the theoretical and experimental analysis. Although RVM* could be used in practical applications, it is no longer a truly sparse model. Experiments show that sparsity comes at the expense of worse predictive distributions.

ei

PDF PostScript [BibTex]

Visual perception I: Basic principles

Wagemans, J., Wichmann, F., de Beeck, H.

In Handbook of Cognition, pages: 3-47, (Editors: Lamberts, K. , R. Goldstone), Sage, London, 2005 (inbook)

ei

[BibTex]

Maximum-Margin Feature Combination for Detection and Categorization

BakIr, G., Wu, M., Eichhorn, J.

Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2005 (techreport)

Abstract
In this paper we are concerned with the optimal combination of features of possibly different types for detection and estimation tasks in machine vision. We propose to combine features such that the resulting classifier maximizes the margin between classes. In contrast to existing approaches, which are non-convex and/or generative, we propose a discriminative model that leads to a convex problem formulation and allows complexity control. Furthermore, we assert that decision functions should not compare apples and oranges by comparing features of different types directly. Instead, we propose to combine different similarity measures, one for each feature type. Moreover, we argue that the question “Which feature type is more discriminative for task X?” is ill-posed and show empirically that the answer to this question might depend on the complexity of the decision function.

ei

PDF [BibTex]

Kernel-Methods, Similarity, and Exemplar Theories of Categorization

Jäkel, F., Wichmann, F.

ASIC, 4, 2005 (poster)

Abstract
Kernel methods are popular tools in machine learning and statistics that can be implemented in a simple feed-forward neural network. They have strong connections to several psychological theories. For example, Shepard's universal law of generalization can be given a kernel interpretation. This leads to an inner product and a metric on the psychological space that is different from the usual Minkowski norm. The metric has psychologically interesting properties: it is bounded from above and does not have additive segments. As categorization models often rely on Shepard's law as a model for psychological similarity, some of them can be recast as kernel methods. In particular, ALCOVE is shown to be closely related to kernel logistic regression. The relationship to the Generalized Context Model is also discussed. It is argued that functional analysis, which is routinely used in machine learning, also provides valuable insights for psychology.
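The two metric properties can be checked directly. Taking Shepard's exponential generalization gradient with a city-block distance, $k(x,y) = \exp(-c\|x-y\|_1)$, as the kernel (positive definite, since it is a product of one-dimensional Laplace kernels), the induced distance $\sqrt{k(x,x) + k(y,y) - 2k(x,y)} = \sqrt{2 - 2k(x,y)}$ can never exceed $\sqrt{2}$, and for three collinear points the triangle inequality is strict, i.e. there are no additive segments. The constant `c` and the choice of the city-block metric are assumptions for illustration, not taken from the poster.

```python
import math

def shepard_similarity(x, y, c=1.0):
    """Shepard-style similarity exp(-c * ||x - y||_1), used here as a kernel."""
    return math.exp(-c * sum(abs(a - b) for a, b in zip(x, y)))

def kernel_metric(x, y, k=shepard_similarity):
    """Distance induced by the kernel: sqrt(k(x,x) + k(y,y) - 2 k(x,y))."""
    return math.sqrt(max(k(x, x) + k(y, y) - 2.0 * k(x, y), 0.0))

if __name__ == "__main__":
    a, b, c = (0.0,), (1.0,), (2.0,)
    print(kernel_metric(a, (100.0,)))                         # about 1.41421, i.e. sqrt(2)
    print(kernel_metric(a, c) < kernel_metric(a, b) + kernel_metric(b, c))  # True
```

In the input space, $b$ lies exactly between $a$ and $c$, so the city-block distances add up; under the kernel-induced metric they do not, which is the "no additive segments" property.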

ei

Web [BibTex]


Rapid animal detection in natural scenes: critical features are local

Wichmann, F., Rosas, P., Gegenfurtner, K.

Experimentelle Psychologie. Beiträge zur 47. Tagung experimentell arbeitender Psychologen, 47, pages: 225, 2005 (poster)

ei

[BibTex]

Long Term Prediction of Product Quality in a Glass Manufacturing Process Using a Kernel Based Approach

Jung, T., Herrera, L., Schölkopf, B.

In Proceedings of the 8th International Work-Conference on Artificial Neural Networks (Computational Intelligence and Bioinspired Systems), Lecture Notes in Computer Science, LNCS 3512, pages: 960-967, (Editors: J Cabestany and A Prieto and F Sandoval), Springer, Berlin Heidelberg, Germany, IWANN, 2005 (inproceedings)

Abstract
In this paper we report the results obtained using a kernel-based approach to predict the temporal development of four response signals in the process control of a glass melting tank with 16 input parameters. The data set is a revised version of the one from the modelling challenge in EUNITE-2003. The central difficulties are large time delays between changes in the inputs and the outputs, a large amount of data, and a general lack of knowledge about the relevant variables that intervene in the process. The methodology proposed here comprises Support Vector Machines (SVM) and Regularization Networks (RN). We use the idea of sparse approximation both as a means of regularization and as a means of reducing the computational complexity. Furthermore, we use an incremental approach to add new training examples to the kernel-based method and efficiently update the current solution. This allows us to use a sophisticated learning scheme, in which we iterate between prediction and training, with good computational efficiency and satisfactory results.

ei

DOI [BibTex]

Object correspondence as a machine learning problem

Schölkopf, B., Steinke, F., Blanz, V.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 777-784, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005 (inproceedings)

Abstract
We propose machine learning methods for the estimation of deformation fields that transform two given objects into each other, thereby establishing a dense point-to-point correspondence. The fields are computed using a modified support vector machine containing a penalty enforcing that points of one object are mapped to “similar” points on the other one. Our system, which contains little engineering or domain knowledge, delivers state-of-the-art performance. We present application results including close to photorealistic morphs of 3D head models.

ei

PDF [BibTex]

Towards a Statistical Theory of Clustering

von Luxburg, U., Ben-David, S.

Presented at the PASCAL workshop on clustering, London, 2005 (techreport)

Abstract
The goal of this paper is to discuss statistical aspects of clustering in a framework where the data to be clustered has been sampled from some unknown probability distribution. Firstly, the clustering of the data set should reveal some structure of the underlying data rather than model artifacts due to the random sampling process. Secondly, the more sample points we have, the more reliable the clustering should be. We discuss which methods can and cannot be used to tackle those problems. In particular, we argue that generalization bounds as used in the statistical learning theory of classification are unsuitable in a general clustering framework. We suggest that generalization bounds should mainly be replaced by convergence proofs and stability considerations. This paper should be considered a road map that identifies important questions and potentially fruitful directions for future research on statistical clustering. We do not attempt to present a complete statistical theory of clustering.

ei

PDF [BibTex]

The human brain as large margin classifier

Graf, A., Wichmann, F., Bülthoff, H., Schölkopf, B.

Proceedings of the Computational & Systems Neuroscience Meeting (COSYNE), 2, pages: 1, 2005 (poster)

ei

[BibTex]
