Header logo is


2006


no image
Conformal Multi-Instance Kernels

Blaschko, M., Hofmann, T.

In NIPS 2006 Workshop on Learning to Compare Examples, pages: 1-6, NIPS Workshop on Learning to Compare Examples, December 2006 (inproceedings)

Abstract
In the multiple instance learning setting, each observation is a bag of feature vectors of which one or more vectors indicates membership in a class. The primary task is to identify if any vectors in the bag indicate class membership while ignoring vectors that do not. We describe here a kernel-based technique that defines a parametric family of kernels via conformal transformations and jointly learns a discriminant function over bags together with the optimal parameter settings of the kernel. Learning a conformal transformation effectively amounts to weighting regions in the feature space according to their contribution to classification accuracy; regions that are discriminative will be weighted higher than regions that are not. This allows the classifier to focus on regions contributing to classification accuracy while ignoring regions that correspond to vectors found both in positive and in negative bags. We show how parameters of this transformation can be learned for support vector machines by posing the problem as a multiple kernel learning problem. The resulting multiple instance classifier gives competitive accuracy for several multi-instance benchmark datasets from different domains.

ei

PDF Web [BibTex]

2006


PDF Web [BibTex]


no image
Some observations on the pedestal effect or dipper function

Henning, B., Wichmann, F.

Journal of Vision, 6(13):50, 2006 Fall Vision Meeting of the Optical Society of America, December 2006 (poster)

Abstract
The pedestal effect is the large improvement in the detectabilty of a sinusoidal “signal” grating observed when the signal is added to a masking or “pedestal” grating of the same spatial frequency, orientation, and phase. We measured the pedestal effect in both broadband and notched noise - noise from which a 1.5-octave band centred on the signal frequency had been removed. Although the pedestal effect persists in broadband noise, it almost disappears in the notched noise. Furthermore, the pedestal effect is substantial when either high- or low-pass masking noise is used. We conclude that the pedestal effect in the absence of notched noise results principally from the use of information derived from channels with peak sensitivities at spatial frequencies different from that of the signal and pedestal. The spatial-frequency components of the notched noise above and below the spatial frequency of the signal and pedestal prevent the use of information about changes in contrast carried in channels tuned to spatial frequencies that are very much different from that of the signal and pedestal. Thus the pedestal or dipper effect measured without notched noise is not a characteristic of individual spatial-frequency tuned channels.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
A Kernel Method for the Two-Sample-Problem

Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., Smola, A.

20th Annual Conference on Neural Information Processing Systems (NIPS), December 2006 (talk)

Abstract
We propose two statistical tests to determine if two samples are from different distributions. Our test statistic is in both cases the distance between the means of the two samples mapped into a reproducing kernel Hilbert space (RKHS). The first test is based on a large deviation bound for the test statistic, while the second is based on the asymptotic distribution of this statistic. We show that the test statistic can be computed in $O(m^2)$ time. We apply our approach to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where our test performs strongly. We also demonstrate excellent performance when comparing distributions over graphs, for which no alternative tests currently exist.

ei

PDF [BibTex]

PDF [BibTex]


no image
Ab-initio gene finding using machine learning

Schweikert, G., Zeller, G., Zien, A., Ong, C., de Bona, F., Sonnenburg, S., Phillips, P., Rätsch, G.

NIPS Workshop on New Problems and Methods in Computational Biology, December 2006 (talk)

ei

Web [BibTex]

Web [BibTex]


no image
Graph boosting for molecular QSAR analysis

Saigo, H., Kadowaki, T., Kudo, T., Tsuda, K.

NIPS Workshop on New Problems and Methods in Computational Biology, December 2006 (talk)

Abstract
We propose a new boosting method that systematically combines graph mining and mathematical programming-based machine learning. Informative and interpretable subgraph features are greedily found by a series of graph mining calls. Due to our mathematical programming formulation, subgraph features and pre-calculated real-valued features are seemlessly integrated. We tested our algorithm on a quantitative structure-activity relationship (QSAR) problem, which is basically a regression problem when given a set of chemical compounds. In benchmark experiments, the prediction accuracy of our method favorably compared with the best results reported on each dataset.

ei

Web [BibTex]

Web [BibTex]


no image
Inferring Causal Directions by Evaluating the Complexity of Conditional Distributions

Sun, X., Janzing, D., Schölkopf, B.

NIPS Workshop on Causality and Feature Selection, December 2006 (talk)

Abstract
We propose a new approach to infer the causal structure that has generated the observed statistical dependences among n random variables. The idea is that the factorization of the joint measure of cause and effect into P(cause)P(effect|cause) leads typically to simpler conditionals than non-causal factorizations. To evaluate the complexity of the conditionals we have tried two methods. First, we have compared them to those which maximize the conditional entropy subject to the observed first and second moments since we consider the latter as the simplest conditionals. Second, we have fitted the data with conditional probability measures being exponents of functions in an RKHS space and defined the complexity by a Hilbert-space semi-norm. Such a complexity measure has several properties that are useful for our purpose. We describe some encouraging results with both methods applied to real-world data. Moreover, we have combined constraint-based approaches to causal discovery (i.e., methods using only information on conditional statistical dependences) with our method in order to distinguish between causal hypotheses which are equivalent with respect to the imposed independences. Furthermore, we compare the performance to Bayesian approaches to causal inference.

ei

Web [BibTex]


no image
Learning Optimal EEG Features Across Time, Frequency and Space

Farquhar, J., Hill, J., Schölkopf, B.

NIPS Workshop on Current Trends in Brain-Computer Interfacing, December 2006 (talk)

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Semi-Supervised Learning

Zien, A.

Advanced Methods in Sequence Analysis Lectures, November 2006 (talk)

ei

Web [BibTex]

Web [BibTex]


no image
Adapting Spatial Filter Methods for Nonstationary BCIs

Tomioka, R., Hill, J., Blankertz, B., Aihara, K.

In IBIS 2006, pages: 65-70, 2006 Workshop on Information-Based Induction Sciences, November 2006 (inproceedings)

Abstract
A major challenge in applying machine learning methods to Brain-Computer Interfaces (BCIs) is to overcome the possible nonstationarity in the data from the datablock the method is trained on and that the method is applied to. Assuming the joint distributions of the whitened signal and the class label to be identical in two blocks, where the whitening is done in each block independently, we propose a simple adaptation formula that is applicable to a broad class of spatial filtering methods including ICA, CSP, and logistic regression classifiers. We characterize the class of linear transformations for which the above assumption holds. Experimental results on 60 BCI datasets show improved classification accuracy compared to (a) fixed spatial filter approach (no adaptation) and (b) fixed spatial pattern approach (proposed by Hill et al., 2006 [1]).

ei

PDF [BibTex]

PDF [BibTex]


no image
Optimizing Spatial Filters for BCI: Margin- and Evidence-Maximization Approaches

Farquhar, J., Hill, N., Schölkopf, B.

Challenging Brain-Computer Interfaces: MAIA Workshop 2006, pages: 1, November 2006 (poster)

Abstract
We present easy-to-use alternatives to the often-used two-stage Common Spatial Pattern + classifier approach for spatial filtering and classification of Event-Related Desychnronization signals in BCI. We report two algorithms that aim to optimize the spatial filters according to a criterion more directly related to the ability of the algorithms to generalize to unseen data. Both are based upon the idea of treating the spatial filter coefficients as hyperparameters of a kernel or covariance function. We then optimize these hyper-parameters directly along side the normal classifier parameters with respect to our chosen learning objective function. The two objectives considered are margin maximization as used in Support-Vector Machines and the evidence maximization framework used in Gaussian Processes. Our experiments assessed generalization error as a function of the number of training points used, on 9 BCI competition data sets and 5 offline motor imagery data sets measured in Tubingen. Both our approaches sho w consistent improvements relative to the commonly used CSP+linear classifier combination. Strikingly, the improvement is most significant in the higher noise cases, when either few trails are used for training, or with the most poorly performing subjects. This a reversal of the usual "rich get richer" effect in the development of CSP extensions, which tend to perform best when the signal is strong enough to accurately find their additional parameters. This makes our approach particularly suitable for clinical application where high levels of noise are to be expected.

ei

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
A Machine Learning Approach for Determining the PET Attenuation Map from Magnetic Resonance Images

Hofmann, M., Steinke, F., Judenhofer, M., Claussen, C., Schölkopf, B., Pichler, B.

IEEE Medical Imaging Conference, November 2006 (talk)

Abstract
A promising new combination in multimodality imaging is MR-PET, where the high soft tissue contrast of Magnetic Resonance Imaging (MRI) and the functional information of Positron Emission Tomography (PET) are combined. Although many technical problems have recently been solved, it is still an open problem to determine the attenuation map from the available MR scan, as the MR intensities are not directly related to the attenuation values. One standard approach is an atlas registration where the atlas MR image is aligned with the patient MR thus also yielding an attenuation image for the patient. We also propose another approach, which to our knowledge has not been tried before: Using Support Vector Machines we predict the attenuation value directly from the local image information. We train this well-established machine learning algorithm using small image patches. Although both approaches sometimes yielded acceptable results, they also showed their specific shortcomings: The registration often fails with large deformations whereas the prediction approach is problematic when the local image structure is not characteristic enough. However, the failures often do not coincide and integration of both information sources is promising. We therefore developed a combination method extending Support Vector Machines to use not only local image structure but also atlas registered coordinates. We demonstrate the strength of this combination approach on a number of examples.

ei

[BibTex]

[BibTex]


no image
Semi-Supervised Support Vector Machines and Application to Spam Filtering

Zien, A.

ECML Discovery Challenge Workshop, September 2006 (talk)

Abstract
After introducing the semi-supervised support vector machine (aka TSVM for "transductive SVM"), a few popular training strategies are briefly presented. Then the assumptions underlying semi-supervised learning are reviewed. Finally, two modern TSVM optimization techniques are applied to the spam filtering data sets of the workshop; it is shown that they can achieve excellent results, if the problem of the data being non-iid can be handled properly.

ei

PDF Web [BibTex]


no image
A Linear Programming Approach for Molecular QSAR analysis

Saigo, H., Kadowaki, T., Tsuda, K.

In MLG 2006, pages: 85-96, (Editors: Gärtner, T. , G. C. Garriga, T. Meinl), International Workshop on Mining and Learning with Graphs, September 2006, Best Paper Award (inproceedings)

Abstract
Small molecules in chemistry can be represented as graphs. In a quantitative structure-activity relationship (QSAR) analysis, the central task is to find a regression function that predicts the activity of the molecule in high accuracy. Setting a QSAR as a primal target, we propose a new linear programming approach to the graph-based regression problem. Our method extends the graph classification algorithm by Kudo et al. (NIPS 2004), which is a combination of boosting and graph mining. Instead of sequential multiplicative updates, we employ the linear programming boosting (LP) for regression. The LP approach allows to include inequality constraints for the parameter vector, which turns out to be particularly useful in QSAR tasks where activity values are sometimes unavailable. Furthermore, the efficiency is improved significantly by employing multiple pricing.

ei

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Incremental Aspect Models for Mining Document Streams

Surendran, A., Sra, S.

In PKDD 2006, pages: 633-640, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, September 2006 (inproceedings)

Abstract
In this paper we introduce a novel approach for incrementally building aspect models, and use it to dynamically discover underlying themes from document streams. Using the new approach we present an application which we call “query-line tracking” i.e., we automatically discover and summarize different themes or stories that appear over time, and that relate to a particular query. We present evaluation on news corpora to demonstrate the strength of our method for both query-line tracking, online indexing and clustering.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Learning Eye Movements

Kienzle, W., Wichmann, F., Schölkopf, B., Franz, M.

Sensory Coding And The Natural Environment, 2006, pages: 1, September 2006 (poster)

Abstract
The human visual system samples images through saccadic eye movements which rapidly change the point of fixation. Although the selection of eye movement targets depends on numerous top-down mechanisms, a number of recent studies have shown that low-level image features such as local contrast or edges play an important role. These studies typically used predefined image features which were afterwards experimentally verified. Here, we follow a complementary approach: instead of testing a set of candidate image features, we infer these hypotheses from the data, using methods from statistical learning. To this end, we train a non-linear classifier on fixated vs. randomly selected image patches without making any physiological assumptions. The resulting classifier can be essentially characterized by a nonlinear combination of two center-surround receptive fields. We find that the prediction performance of this simple model on our eye movement data is indistinguishable from the physiologically motivated model of Itti & Koch (2000) which is far more complex. In particular, we obtain a comparable performance without using any multi-scale representations, long-range interactions or oriented image features.

ei

Web [BibTex]

Web [BibTex]


no image
PALMA: Perfect Alignments using Large Margin Algorithms

Rätsch, G., Hepp, B., Schulze, U., Ong, C.

In GCB 2006, pages: 104-113, (Editors: Huson, D. , O. Kohlbacher, A. Lupas, K. Nieselt, A. Zell), Gesellschaft für Informatik, Bonn, Germany, German Conference on Bioinformatics, September 2006 (inproceedings)

Abstract
Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. We present a novel approach based on large margin learning that combines kernel based splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm -- called PALMA -- tunes the parameters of the model such that the true alignment scores higher than all other alignments. In an experimental study on the alignments of mRNAs containing artificially generated micro-exons, we show that our algorithm drastically outperforms all other methods: It perfectly aligns all 4358 sequences on an hold-out set, while the best other method misaligns at least 90 of them. Moreover, our algorithm is very robust against noise in the query sequence: when deleting, inserting, or mutating up to 50% of the query sequence, it still aligns 95% of all sequences correctly, while other methods achieve less than 36% accuracy. For datasets, additional results and a stand-alone alignment tool see http://www.fml.mpg.de/raetsch/projects/palma.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Graph Based Semi-Supervised Learning with Sharper Edges

Shin, H., Hill, N., Rätsch, G.

In ECML 2006, pages: 401-412, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 17th European Conference on Machine Learning (ECML), September 2006 (inproceedings)

Abstract
In many graph-based semi-supervised learning algorithms, edge weights are assumed to be fixed and determined by the data points‘ (often symmetric)relationships in input space, without considering directionality. However, relationships may be more informative in one direction (e.g. from labelled to unlabelled) than in the reverse direction, and some relationships (e.g. strong weights between oppositely labelled points) are unhelpful in either direction. Undesirable edges may reduce the amount of influence an informative point can propagate to its neighbours -- the point and its outgoing edges have been ``blunted.‘‘ We present an approach to ``sharpening‘‘ in which weights are adjusted to meet an optimization criterion wherever they are directed towards labelled points. This principle can be applied to a wide variety of algorithms. In the current paper, we present one ad hoc solution satisfying the principle, in order to show that it can improve performance on a number of publicly available benchmark data sets.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Finite-Horizon Optimal State-Feedback Control of Nonlinear Stochastic Systems Based on a Minimum Principle

Deisenroth, MP., Ohtsuka, T., Weissel, F., Brunn, D., Hanebeck, UD.

In MFI 2006, pages: 371-376, (Editors: Hanebeck, U. D.), IEEE Service Center, Piscataway, NJ, USA, 6th IEEE International Conference on Multisensor Fusion and Integration, September 2006 (inproceedings)

Abstract
In this paper, an approach to the finite-horizon optimal state-feedback control problem of nonlinear, stochastic, discrete-time systems is presented. Starting from the dynamic programming equation, the value function will be approximated by means of Taylor series expansion up to second-order derivatives. Moreover, the problem will be reformulated, such that a minimum principle can be applied to the stochastic problem. Employing this minimum principle, the optimal control problem can be rewritten as a two-point boundary-value problem to be solved at each time step of a shrinking horizon. To avoid numerical problems, the two-point boundary-value problem will be solved by means of a continuation method. Thus, the curse of dimensionality of dynamic programming is avoided, and good candidates for the optimal state-feedback controls are obtained. The proposed approach will be evaluated by means of a scalar example system.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Uniform Convergence of Adaptive Graph-Based Regularization

Hein, M.

In COLT 2006, pages: 50-64, (Editors: Lugosi, G. , H.-U. Simon), Springer, Berlin, Germany, 19th Annual Conference on Learning Theory, September 2006 (inproceedings)

Abstract
The regularization functional induced by the graph Laplacian of a random neighborhood graph based on the data is adaptive in two ways. First it adapts to an underlying manifold structure and second to the density of the data-generating probability measure. We identify in this paper the limit of the regularizer and show uniform convergence over the space of Hoelder functions. As an intermediate step we derive upper bounds on the covering numbers of Hoelder functions on compact Riemannian manifolds, which are of independent interest for the theoretical analysis of manifold-based learning methods.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Regularised CSP for Sensor Selection in BCI

Farquhar, J., Hill, N., Lal, T., Schölkopf, B.

In Proceedings of the 3rd International Brain-Computer Interface Workshop and Training Course 2006, pages: 14-15, (Editors: GR Müller-Putz and C Brunner and R Leeb and R Scherer and A Schlögl and S Wriessnegger and G Pfurtscheller), Verlag der Technischen Universität Graz, Graz, Austria, 3rd International Brain-Computer Interface Workshop and Training Course, September 2006 (inproceedings)

Abstract
The Common Spatial Pattern (CSP) algorithm is a highly successful method for efficiently calculating spatial filters for brain signal classification. Spatial filtering can improve classification performance considerably, but demands that a large number of electrodes be mounted, which is inconvenient in day-to-day BCI usage. The CSP algorithm is also known for its tendency to overfit, i.e. to learn the noise in the training set rather than the signal. Both problems motivate an approach in which spatial filters are sparsified. We briefly sketch a reformulation of the problem which allows us to do this, using 1-norm regularisation. Focusing on the electrode selection issue, we present preliminary results on EEG data sets that suggest that effective spatial filters may be computed with as few as 10--20 electrodes, hence offering the potential to simplify the practical realisation of BCI systems significantly.

ei

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Time-Dependent Demixing of Task-Relevant EEG Signals

Hill, N., Farquhar, J., Lal, T., Schölkopf, B.

In Proceedings of the 3rd International Brain-Computer Interface Workshop and Training Course 2006, pages: 20-21, (Editors: GR Müller-Putz and C Brunner and R Leeb and R Scherer and A Schlögl and S Wriessnegger and G Pfurtscheller), Verlag der Technischen Universität Graz, Graz, Austria, 3rd International Brain-Computer Interface Workshop and Training Course, September 2006 (inproceedings)

Abstract
Given a spatial filtering algorithm that has allowed us to identify task-relevant EEG sources, we present a simple approach for monitoring the activity of these sources while remaining relatively robust to changes in other (task-irrelevant) brain activity. The idea is to keep spatial *patterns* fixed rather than spatial filters, when transferring from training to test sessions or from one time window to another. We show that a fixed spatial pattern (FSP) approach, using a moving-window estimate of signal covariances, can be more robust to non-stationarity than a fixed spatial filter (FSF) approach.

ei

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Inferential Structure Determination: Probabilistic determination and validation of NMR structures

Habeck, M.

Gordon Research Conference on Computational Aspects of Biomolecular NMR, September 2006 (talk)

ei

Web [BibTex]

Web [BibTex]


no image
A Sober Look at Clustering Stability

Ben-David, S., von Luxburg, U., Pal, D.

In COLT 2006, pages: 5-19, (Editors: Lugosi, G. , H.-U. Simon), Springer, Berlin, Germany, 19th Annual Conference on Learning Theory, September 2006 (inproceedings)

Abstract
Stability is a common tool to verify the validity of sample based algorithms. In clustering it is widely used to tune the parameters of the algorithm, such as the number k of clusters. In spite of the popularity of stability in practical applications, there has been very little theoretical analysis of this notion. In this paper we provide a formal definition of stability and analyze some of its basic properties. Quite surprisingly, the conclusion of our analysis is that for large sample size, stability is fully determined by the behavior of the objective function which the clustering algorithm is aiming to minimize. If the objective function has a unique global minimizer, the algorithm is stable, otherwise it is unstable. In particular we conclude that stability is not a well-suited tool to determine the number of clusters - it is determined by the symmetries of the data which may be unrelated to clustering parameters. We prove our results for center-based clusterings and for spectral clustering, and support our conclusions by many examples in which the behavior of stability is counter-intuitive.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Information Marginalization on Subgraphs

Huang, J., Zhu, T., Rereiner, R., Zhou, D., Schuurmans, D.

In ECML/PKDD 2006, pages: 199-210, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, September 2006 (inproceedings)

Abstract
Real-world data often involves objects that exhibit multiple relationships; for example, ‘papers’ and ‘authors’ exhibit both paper-author interactions and paper-paper citation relationships. A typical learning problem requires one to make inferences about a subclass of objects (e.g. ‘papers’), while using the remaining objects and relations to provide relevant information. We present a simple, unified mechanism for incorporating information from multiple object types and relations when learning on a targeted subset. In this scheme, all sources of relevant information are marginalized onto the target subclass via random walks. We show that marginalized random walks can be used as a general technique for combining multiple sources of information in relational data. With this approach, we formulate new algorithms for transduction and ranking in relational data, and quantify the performance of new schemes on real world data—achieving good results in many problems.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Bayesian Active Learning for Sensitivity Analysis

Pfingsten, T.

In ECML 2006, pages: 353-364, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 17th European Conference on Machine Learning, September 2006 (inproceedings)

Abstract
Designs of micro electro-mechanical devices need to be robust against fluctuations in mass production. Computer experiments with tens of parameters are used to explore the behavior of the system, and to compute sensitivity measures as expectations over the input distribution. Monte Carlo methods are a simple approach to estimate these integrals, but they are infeasible when the models are computationally expensive. Using a Gaussian processes prior, expensive simulation runs can be saved. This Bayesian quadrature allows for an active selection of inputs where the simulation promises to be most valuable, and the number of simulation runs can be reduced further. We present an active learning scheme for sensitivity analysis which is rigorously derived from the corresponding Bayesian expected loss. On three fully featured, high dimensional physical models of electro-mechanical sensors, we show that the learning rate in the active learning scheme is significantly better than for passive learning.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Machine Learning Algorithms for Polymorphism Detection

Schweikert, G., Zeller, G., Clark, R., Ossowski, S., Warthmann, N., Shinn, P., Frazer, K., Ecker, J., Huson, D., Weigel, D., Schölkopf, B., Rätsch, G.

2nd ISCB Student Council Symposium, August 2006 (talk)

Abstract
Analyzing resequencing array data using machine learning, we obtain a genome-wide inventory of polymorphisms in 20 wild strains of Arabidopsis thaliana, including 750,000 single nucleotide poly- morphisms (SNPs) and thousands of highly polymorphic regions and deletions. We thus provide an unprecedented resource for the study of natural variation in plants.

ei

Web [BibTex]

Web [BibTex]


no image
Supervised Probabilistic Principal Component Analysis

Yu, S., Yu, K., Tresp, V., Kriegel, H., Wu, M.

In KDD 2006, pages: 464-473, (Editors: Ungar, L. ), ACM Press, New York, NY, USA, 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2006 (inproceedings)

Abstract
Principal component analysis (PCA) has been extensively applied in data mining, pattern recognition and information retrieval for unsupervised dimensionality reduction. When labels of data are available, e.g.,~in a classification or regression task, PCA is however not able to use this information. The problem is more interesting if only part of the input data are labeled, i.e.,~in a semi-supervised setting. In this paper we propose a supervised PCA model called SPPCA and a semi-supervised PCA model called S$^2$PPCA, both of which are extensions of a probabilistic PCA model. The proposed models are able to incorporate the label information into the projection phase, and can naturally handle multiple outputs (i.e.,~in multi-task learning problems). We derive an efficient EM learning algorithm for both models, and also provide theoretical justifications of the model behaviors. SPPCA and S$^2$PPCA are compared with other supervised projection methods on various learning tasks, and show not only promising performance but also good scalability.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Let It Roll – Emerging Sensorimotor Coordination in a Spherical Robot

Der, R., Martius, G., Hesse, F.

In Proc, Artificial Life X, pages: 192-198, Intl. Society for Artificial Life, MIT Press, August 2006 (inproceedings)

al

[BibTex]

[BibTex]


no image
Inferential structure determination: Overview and new developments

Habeck, M.

Sixth CCPN Annual Conference: Efficient and Rapid Structure Determination by NMR, July 2006 (talk)

ei

Web [BibTex]

Web [BibTex]


no image
A Continuation Method for Semi-Supervised SVMs

Chapelle, O., Chi, M., Zien, A.

In ICML 2006, pages: 185-192, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
Semi-Supervised Support Vector Machines (S3VMs) are an appealing method for using unlabeled data in classification: their objective function favors decision boundaries which do not cut clusters. However their main problem is that the optimization problem is non-convex and has many local minima, which often results in suboptimal performances. In this paper we propose to use a global optimization technique known as continuation to alleviate this problem. Compared to other algorithms minimizing the same objective function, our continuation method often leads to lower test errors.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Classification of natural scenes: Critical features revisited

Drewes, J., Wichmann, F., Gegenfurtner, K.

Journal of Vision, 6(6):561, 6th Annual Meeting of the Vision Sciences Society (VSS), June 2006 (poster)

Abstract
Human observers are capable of detecting animals within novel natural scenes with remarkable speed and accuracy. Despite the seeming complexity of such decisions it has been hypothesized that a simple global image feature, the relative abundance of high spatial frequencies at certain orientations, could underly such fast image classification (A. Torralba & A. Oliva, Network: Comput. Neural Syst., 2003). We successfully used linear discriminant analysis to classify a set of 11.000 images into “animal” and “non-animal” images based on their individual amplitude spectra only (Drewes, Wichmann, Gegenfurtner VSS 2005). We proceeded to sort the images based on the performance of our classifier, retaining only the best and worst classified 400 images (“best animals”, “best distractors” and “worst animals”, “worst distractors”). We used a Go/No-go paradigm to evaluate human performance on this subset of our images. Both reaction time and proportion of correctly classified images showed a significant effect of classification difficulty. Images more easily classified by our algorithm were also classified faster and better by humans, as predicted by the Torralba & Oliva hypothesis. We then equated the amplitude spectra of the 400 images, which, by design, reduced algorithmic performance to chance whereas human performance was only slightly reduced (cf. Wichmann, Rosas, Gegenfurtner, VSS 2005). Most importantly, the same images as before were still classified better and faster, suggesting that even in the original condition features other than specifics of the amplitude spectrum made particular images easy to classify, clearly at odds with the Torralba & Oliva hypothesis.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
MCMC inference in (Conditionally) Conjugate Dirichlet Process Gaussian Mixture Models

Rasmussen, C., Görür, D.

ICML Workshop on Learning with Nonparametric Bayesian Methods, June 2006 (talk)

Abstract
We compare the predictive accuracy of the Dirichlet Process Gaussian mixture models using conjugate and conditionally conjugate priors and show that better density models result from using the wider class of priors. We explore several MCMC schemes exploiting conditional conjugacy and show their computational merits on several multidimensional density estimation problems.

ei

Web [BibTex]

Web [BibTex]


no image
Trading Convexity for Scalability

Collobert, R., Sinz, F., Weston, J., Bottou, L.

In ICML 2006, pages: 201-208, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
Convex learning algorithms, such as Support Vector Machines (SVMs), are often seen as highly desirable because they offer strong practical properties and are amenable to theoretical analysis. However, in this work we show how non-convexity can provide scalability advantages over convexity. We show how concave-convex programming can be applied to produce (i) faster SVMs where training errors are no longer support vectors, and (ii) much faster Transductive SVMs.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Personalized handwriting recognition via biased regularization

Kienzle, W., Chellapilla, K.

In ICML 2006, pages: 457-464, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
We present a new approach to personalized handwriting recognition. The problem, also known as writer adaptation, consists of converting a generic (user-independent) recognizer into a personalized (user-dependent) one, which has an improved recognition rate for a particular user. The adaptation step usually involves user-specific samples, which leads to the fundamental question of how to fuse this new information with that captured by the generic recognizer. We propose adapting the recognizer by minimizing a regularized risk functional (a modified SVM) where the prior knowledge from the generic recognizer enters through a modified regularization term. The result is a simple personalization framework with very good practical properties. Experiments on a 100 class real-world data set show that the number of errors can be reduced by over 40% with as few as five user samples per character.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Sampling for non-conjugate infinite latent feature models

Görür, D., Rasmussen, C.

(Editors: Bernardo, J. M.), 8th Valencia International Meeting on Bayesian Statistics (ISBA), June 2006 (talk)

Abstract
Latent variable models are powerful tools to model the underlying structure in data. Infinite latent variable models can be defined using Bayesian nonparametrics. Dirichlet process (DP) models constitute an example of infinite latent class models in which each object is assumed to belong to one of the, mutually exclusive, infinitely many classes. Recently, the Indian buffet process (IBP) has been defined as an extension of the DP. IBP is a distribution over sparse binary matrices with infinitely many columns which can be used as a distribution for non-exclusive features. Inference using Markov chain Monte Carlo (MCMC) in conjugate IBP models has been previously described, however requiring conjugacy restricts the use of IBP. We describe an MCMC algorithm for non-conjugate IBP models. Modelling the choice behaviour is an important topic in psychology, economics and related fields. Elimination by Aspects (EBA) is a choice model that assumes each alternative has latent features with associated weights that lead to the observed choice outcomes. We formulate a non-parametric version of EBA by using IBP as the prior over the latent binary features. We infer the features of objects that lead to the choice data by using our sampling scheme for inference.

ei

PDF [BibTex]

PDF [BibTex]


no image
Deterministic annealing for semi-supervised kernel machines

Sindhwani, V., Keerthi, S., Chapelle, O.

In ICML 2006, pages: 841-848, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
An intuitive approach to utilizing unlabeled data in kernel-based classification algorithms is to simply treat the unknown labels as additional optimization variables. For margin-based loss functions, one can view this approach as attempting to learn low-density separators. However, this is a hard optimization problem to solve in typical semi-supervised settings where unlabeled data is abundant. The popular Transductive SVM algorithm is a label-switching-retraining procedure that is known to be susceptible to local minima. In this paper, we present a global optimization framework for semi-supervised Kernel machines where an easier problem is parametrically deformed to the original hard problem and minimizers are smoothly tracked. Our approach is motivated from deterministic annealing techniques and involves a sequence of convex optimization problems that are exactly and efficiently solved. We present empirical results on several synthetic and real world datasets that demonstrate the effectiveness of our approach.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Clustering Graphs by Weighted Substructure Mining

Tsuda, K., Kudo, T.

In ICML 2006, pages: 953-960, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
Graph data is getting increasingly popular in, e.g., bioinformatics and text processing. A main difficulty of graph data processing lies in the intrinsic high dimensionality of graphs, namely, when a graph is represented as a binary feature vector of indicators of all possible subgraphs, the dimensionality gets too large for usual statistical methods. We propose an efficient method for learning a binomial mixture model in this feature space. Combining the $ell_1$ regularizer and the data structure called DFS code tree, the MAP estimate of non-zero parameters are computed efficiently by means of the EM algorithm. Our method is applied to the clustering of RNA graphs, and is compared favorably with graph kernels and the spectral graph distance.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
A Choice Model with Infinitely Many Latent Features

Görür, D., Jäkel, F., Rasmussen, C.

In ICML 2006, pages: 361-368, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
Elimination by aspects (EBA) is a probabilistic choice model describing how humans decide between several options. The options from which the choice is made are characterized by binary features and associated weights. For instance, when choosing which mobile phone to buy the features to consider may be: long lasting battery, color screen, etc. Existing methods for inferring the parameters of the model assume pre-specified features. However, the features that lead to the observed choices are not always known. Here, we present a non-parametric Bayesian model to infer the features of the options and the corresponding weights from choice data. We use the Indian buffet process (IBP) as a prior over the features. Inference using Markov chain Monte Carlo (MCMC) in conjugate IBP models has been previously described. The main contribution of this paper is an MCMC algorithm for the EBA model that can also be used in inference for other non-conjugate IBP models---this may broaden the use of IBP priors considerably.

ei

PostScript PDF Web DOI [BibTex]

PostScript PDF Web DOI [BibTex]


no image
Learning High-Order MRF Priors of Color Images

McAuley, J., Caetano, T., Smola, A., Franz, MO.

In ICML 2006, pages: 617-624, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
In this paper, we use large neighborhood Markov random fields to learn rich prior models of color images. Our approach extends the monochromatic Fields of Experts model (Roth and Blackwell, 2005) to color images. In the Fields of Experts model, the curse of dimensionality due to very large clique sizes is circumvented by parameterizing the potential functions according to a product of experts. We introduce several simplifications of the original approach by Roth and Black which allow us to cope with the increased clique size (typically 3x3x3 or 5x5x3 pixels) of color images. Experimental results are presented for image denoising which evidence improvements over state-of-the-art monochromatic image priors.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Inference with the Universum

Weston, J., Collobert, R., Sinz, F., Bottou, L., Vapnik, V.

In ICML 2006, pages: 1009-1016, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
WIn this paper we study a new framework introduced by Vapnik (1998) and Vapnik (2006) that is an alternative capacity concept to the large margin approach. In the particular case of binary classification, we are given a set of labeled examples, and a collection of "non-examples" that do not belong to either class of interest. This collection, called the Universum, allows one to encode prior knowledge by representing meaningful concepts in the same domain as the problem at hand. We describe an algorithm to leverage the Universum by maximizing the number of observed contradictions, and show experimentally that this approach delivers accuracy improvements over using labeled data alone.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
The pedestal effect is caused by off-frequency looking, not nonlinear transduction or contrast gain-control

Wichmann, F., Henning, B.

Journal of Vision, 6(6):194, 6th Annual Meeting of the Vision Sciences Society (VSS), June 2006 (poster)

Abstract
The pedestal or dipper effect is the large improvement in the detectabilty of a sinusoidal grating observed when the signal is added to a pedestal or masking grating having the signal‘s spatial frequency, orientation, and phase. The effect is largest with pedestal contrasts just above the ‘threshold‘ in the absence of a pedestal. We measured the pedestal effect in both broadband and notched masking noise---noise from which a 1.5- octave band centered on the signal and pedestal frequency had been removed. The pedestal effect persists in broadband noise, but almost disappears with notched noise. The spatial-frequency components of the notched noise that lie above and below the spatial frequency of the signal and pedestal prevent the use of information about changes in contrast carried in channels tuned to spatial frequencies that are very much different from that of the signal and pedestal. We conclude that the pedestal effect in the absence of notched noise results principally from the use of information derived from channels with peak sensitivities at spatial frequencies that are different from that of the signal and pedestal. Thus the pedestal or dipper effect is not a characteristic of individual spatial-frequency tuned channels.

ei

Web DOI [BibTex]

Web DOI [BibTex]


no image
Statistical Convergence of Kernel CCA

Fukumizu, K., Bach, F., Gretton, A.

In Advances in neural information processing systems 18, pages: 387-394, (Editors: Weiss, Y. , B. Schölkopf, J. Platt), MIT Press, Cambridge, MA, USA, Nineteenth Annual Conference on Neural Information Processing Systems (NIPS), May 2006 (inproceedings)

Abstract
While kernel canonical correlation analysis (kernel CCA) has been applied in many problems, the asymptotic convergence of the functions estimated from a finite sample to the true functions has not yet been established. This paper gives a rigorous proof of the statistical convergence of kernel CCA and a related method (NOCCO), which provides a theoretical justification for these methods. The result also gives a sufficient condition on the decay of the regularization coefficient in the methods to ensure convergence.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Products of "Edge-perts"

Gehler, PV., Welling, M.

In Advances in neural information processing systems 18, pages: 419-426, (Editors: Weiss, Y. , B. Schölkopf, J. Platt), MIT Press, Cambridge, MA, USA, Nineteenth Annual Conference on Neural Information Processing Systems (NIPS), May 2006 (inproceedings)

Abstract
Images represent an important and abundant source of data. Understanding their statistical structure has important applications such as image compression and restoration. In this paper we propose a particular kind of probabilistic model, dubbed the “products of edge-perts model” to describe the structure of wavelet transformed images. We develop a practical denoising algorithm based on a single edge-pert and show state-ofthe-art denoising performance on benchmark images.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Assessing Approximations for Gaussian Process Classification

Kuss, M., Rasmussen, C.

In Advances in neural information processing systems 18, pages: 699-706, (Editors: Weiss, Y. , B. Schölkopf, J. Platt), MIT Press, Cambridge, MA, USA, Nineteenth Annual Conference on Neural Information Processing Systems (NIPS), May 2006 (inproceedings)

Abstract
Gaussian processes are attractive models for probabilistic classification but unfortunately exact inference is analytically intractable. We compare Laplace‘s method and Expectation Propagation (EP) focusing on marginal likelihood estimates and predictive performance. We explain theoretically and corroborate empirically that EP is superior to Laplace. We also compare to a sophisticated MCMC scheme and show that EP is surprisingly accurate.

ei

PDF Web [BibTex]

PDF Web [BibTex]


no image
Learning an Interest Operator from Human Eye Movements

Kienzle, W., Wichmann, F., Schölkopf, B., Franz, M.

In CVPWR 2006, pages: page 24, (Editors: C Schmid and S Soatto and C Tomasi), IEEE Computer Society, Los Alamitos, CA, USA, 2006 Conference on Computer Vision and Pattern Recognition Workshop, April 2006 (inproceedings)

Abstract
We present an approach for designing interest operators that are based on human eye movement statistics. In contrast to existing methods which use hand-crafted saliency measures, we use machine learning methods to infer an interest operator directly from eye movement data. That way, the operator provides a measure of biologically plausible interestingness. We describe the data collection, training, and evaluation process, and show that our learned saliency measure significantly accounts for human eye movements. Furthermore, we illustrate connections to existing interest operators, and present a multi-scale interest point detector based on the learned function.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Evaluating Predictive Uncertainty Challenge

Quinonero Candela, J., Rasmussen, C., Sinz, F., Bousquet, O., Schölkopf, B.

In Machine Learning Challenges: Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Tectual Entailment, pages: 1-27, (Editors: J Quiñonero Candela and I Dagan and B Magnini and F d’Alché-Buc), Springer, Berlin, Germany, First PASCAL Machine Learning Challenges Workshop (MLCW), April 2006 (inproceedings)

Abstract
This Chapter presents the PASCAL Evaluating Predictive Uncertainty Challenge, introduces the contributed Chapters by the participants who obtained outstanding results, and provides a discussion with some lessons to be learnt. The Challenge was set up to evaluate the ability of Machine Learning algorithms to provide good “probabilistic predictions”, rather than just the usual “point predictions” with no measure of uncertainty, in regression and classification problems. Parti-cipants had to compete on a number of regression and classification tasks, and were evaluated by both traditional losses that only take into account point predictions and losses we proposed that evaluate the quality of the probabilistic predictions.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
An Inventory of Sequence Polymorphisms For Arabidopsis

Clark, R., Ossowski, S., Schweikert, G., Rätsch, G., Shinn, P., Zeller, G., Warthmann, N., Fu, G., Hinds, D., Chen, H., Frazer, K., Huson, D., Schölkopf, B., Nordborg, M., Ecker, J., Weigel, D.

17th International Conference on Arabidopsis Research, April 2006 (talk)

Abstract
We have used high-density oligonucleotide arrays to characterize common sequence variation in 20 wild strains of Arabidopsis thaliana that were chosen for maximal genetic diversity. Both strands of each possible SNP of the 119 Mb reference genome were represented on the arrays, which were hybridized with whole genome, isothermally amplified DNA to minimize ascertainment biases. Using two complementary approaches, a model based algorithm, and a newly developed machine learning method, we identified over 550,000 SNPs with a false discovery rate of ~ 0.03 (average of 1 SNP for every 216 bp of the genome). A heuristic algorithm predicted in addition ~700 highly polymorphic or deleted regions per accession. Over 700 predicted polymorphisms with major functional effects (e.g., premature stop codons, or deletions of coding sequence) were validated by dideoxy sequencing. Using this data set, we provide the first systematic description of the types of genes that harbor major effect polymorphisms in natural populations at moderate allele frequencies. The data also provide an unprecedented resource for the study of genetic variation in an experimentally tractable, multicellular model organism.

ei

[BibTex]

[BibTex]


no image
Estimating Predictive Variances with Kernel Ridge Regression

Cawley, G., Talbot, N., Chapelle, O.

In MLCW 2005, pages: 56-77, (Editors: Quinonero-Candela, J. , I. Dagan, B. Magnini, F. D‘Alché-Buc), Springer, Berlin, Germany, First PASCAL Machine Learning Challenges Workshop, April 2006 (inproceedings)

Abstract
In many regression tasks, in addition to an accurate estimate of the conditional mean of the target distribution, an indication of the predictive uncertainty is also required. There are two principal sources of this uncertainty: the noise process contaminating the data and the uncertainty in estimating the model parameters based on a limited sample of training data. Both of them can be summarised in the predictive variance which can then be used to give confidence intervals. In this paper, we present various schemes for providing predictive variances for kernel ridge regression, especially in the case of a heteroscedastic regression, where the variance of the noise process contaminating the data is a smooth function of the explanatory variables. The use of leave-one-out cross-validation is shown to eliminate the bias inherent in estimates of the predictive variance. Results obtained on all three regression tasks comprising the predictive uncertainty challenge demonstrate the value of this approach.

ei

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
The Pedestal Effect is Caused by Off-Frequency Looking, not Nonlinear Transduction or Contrast Gain-Control

Wichmann, F., Henning, G.

9, pages: 174, 9th T{\"u}bingen Perception Conference (TWK), March 2006 (poster)

Abstract
The pedestal or dipper effect is the large improvement in the detectability of a sinusoidal grating observed when the signal is added to a pedestal or masking grating having the signal‘s spatial frequency, orientation, and phase. The effect is largest with pedestal contrasts just above the ‘threshold’ in the absence of a pedestal. We measured the pedestal effect in both broadband and notched masking noise---noise from which a 1.5-octave band centered on the signal and pedestal frequency had been removed. The pedestal effect persists in broadband noise, but almost disappears with notched noise. The spatial-frequency components of the notched noise that lie above and below the spatial frequency of the signal and pedestal prevent the use of information about changes in contrast carried in channels tuned to spatial frequencies that are very much different from that of the signal and pedestal. We conclude that the pedestal effect in the absence of notched noise results principally from the use of information derived from channels with peak sensitivities at spatial frequencies that are different from that of the signal and pedestal. Thus the pedestal or dipper effect is not a characteristic of individual spatial-frequency tuned channels.

ei

Web [BibTex]

Web [BibTex]


no image
Classification of Natural Scenes: Critical Features Revisited

Drewes, J., Wichmann, F., Gegenfurtner, K.

9, pages: 92, 9th T{\"u}bingen Perception Conference (TWK), March 2006 (poster)

Abstract
Human observers are capable of detecting animals within novel natural scenes with remarkable speed and accuracy. Despite the seeming complexity of such decisions it has been hypothesized that a simple global image feature, the relative abundance of high spatial frequencies at certain orientations, could underly such fast image classification [1]. We successfully used linear discriminant analysis to classify a set of 11.000 images into “animal” and “non-animal” images based on their individual amplitude spectra only [2]. We proceeded to sort the images based on the performance of our classifier, retaining only the best and worst classified 400 images ("best animals", "best distractors" and "worst animals", "worst distractors"). We used a Go/No-go paradigm to evaluate human performance on this subset of our images. Both reaction time and proportion of correctly classified images showed a significant effect of classification difficulty. Images more easily classified by our algorithm were also classified faster and better by humans, as predicted by the Torralba & Oliva hypothesis. We then equated the amplitude spectra of the 400 images, which, by design, reduced algorithmic performance to chance whereas human performance was only slightly reduced [3]. Most importantly, the same images as before were still classified better and faster, suggesting that even in the original condition features other than specifics of the amplitude spectrum made particular images easy to classify, clearly at odds with the Torralba & Oliva hypothesis.

ei

Web [BibTex]

Web [BibTex]