2005

Kernel ICA for Large Scale Problems

Jegelka, S., Gretton, A., Achlioptas, D.

In NIPS Workshop on Large Scale Kernel Machines, December 2005 (inproceedings)

ei

Infinite dimensional exponential families by reproducing kernel Hilbert spaces
In IGAIA 2005, pages: 324-333, 2nd International Symposium on Information Geometry and its Applications, December 2005 (inproceedings)

Abstract
The purpose of this paper is to propose a method of constructing exponential families of Hilbert manifold, on which estimation theory can be built. Although there have been works on infinite dimensional exponential families of Banach manifolds (Pistone and Sempi, 1995; Gibilisco and Pistone, 1998; Pistone and Rogantin, 1999), they are not appropriate for discussing statistical estimation with a finite number of samples; the likelihood function with finite samples is not continuous on the manifold. In this paper we use a reproducing kernel Hilbert space as a functional space for constructing an exponential manifold. A reproducing kernel Hilbert space is defined as a Hilbert space of functions such that evaluation of a function at an arbitrary point is a continuous functional on the Hilbert space. Since we can discuss the value of a function with this space, it is very natural to use a manifold associated with a reproducing kernel Hilbert space as a basis of estimation theory. We focus on maximum likelihood estimation (MLE) with the exponential manifold of a reproducing kernel Hilbert space. As in many non-parametric estimation methods, a straightforward extension of MLE to an infinite dimensional exponential manifold suffers from ill-posedness, because the estimator must be chosen from the infinite dimensional space with only a finite number of constraints given by the data. To solve this problem, a pseudo-maximum likelihood method is proposed by restricting the infinite dimensional manifold to a series of finite dimensional submanifolds, which enlarge as the number of samples increases. Some asymptotic results in the limit of infinite samples are shown, including the consistency of the pseudo-MLE.

ei

Method and device for detection of splice form and alternative splice forms in DNA or RNA sequences

Rätsch, G., Sonnenburg, S., Müller, K., Schölkopf, B.

European Patent Application, International No PCT/EP2005/005783, December 2005 (patent)

ei

Shortest-path kernels on graphs

Borgwardt, KM., Kriegel, H-P.

In pages: 74-81, IEEE Computer Society, Los Alamitos, CA, USA, Fifth International Conference on Data Mining (ICDM), November 2005 (inproceedings)

Abstract
Data mining algorithms face the challenge of dealing with an increasing number of complex objects. For graph data, a whole toolbox of data mining algorithms becomes available by defining a kernel function on instances of graphs. Graph kernels based on walks, subtrees and cycles in graphs have been proposed so far. As a general problem, these kernels are either computationally expensive or limited in their expressiveness. We try to overcome this problem by defining expressive graph kernels which are based on paths. As the computation of all paths and longest paths in a graph is NP-hard, we propose graph kernels based on shortest paths. These kernels are computable in polynomial time, retain expressivity and are still positive definite. In experiments on classification of graph models of proteins, our shortest-path kernels show significantly higher classification accuracy than walk-based kernels.
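
The idea of comparing graphs through their shortest paths can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes undirected, unweighted, unlabeled graphs and uses a delta kernel on path lengths, so the kernel value is just the number of pairs of shortest paths (one from each graph) that have equal length. The function names and the `(n, edges)` graph representation are hypothetical.

```python
from collections import Counter
from itertools import combinations

INF = float("inf")


def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall on an undirected, unweighted graph with nodes 0..n-1."""
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v in edges:
        dist[u][v] = dist[v][u] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def path_length_histogram(n, edges):
    """Count how often each finite shortest-path length occurs (unordered node pairs)."""
    dist = all_pairs_shortest_paths(n, edges)
    return Counter(
        dist[i][j] for i, j in combinations(range(n), 2) if dist[i][j] < INF
    )


def shortest_path_kernel(graph_a, graph_b):
    """Delta kernel on path lengths: an inner product of the two length histograms."""
    hist_a = path_length_histogram(*graph_a)
    hist_b = path_length_histogram(*graph_b)
    return sum(count * hist_b[length] for length, count in hist_a.items())


# Hypothetical toy graphs: a 3-node path and a 3-node triangle.
path = (3, [(0, 1), (1, 2)])
triangle = (3, [(0, 1), (1, 2), (0, 2)])
print(shortest_path_kernel(path, path))      # 5
print(shortest_path_kernel(path, triangle))  # 6
```

Because this simplified kernel is an inner product of histogram vectors, it is positive semidefinite by construction; Floyd-Warshall keeps the cost polynomial (cubic in the number of nodes).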

ei

Training Support Vector Machines with Multiple Equality Constraints
In Proceedings of the 16th European Conference on Machine Learning, Lecture Notes in Computer Science, Vol. 3720, pages: 182-193, (Editors: JG Carbonell and J Siekmann), Springer, Berlin, Germany, ECML, November 2005 (inproceedings)

Abstract
In this paper we present a primal-dual decomposition algorithm for support vector machine training. As with existing methods that use very small working sets (such as Sequential Minimal Optimization (SMO), Successive Over-Relaxation (SOR) or the Kernel Adatron (KA)), our method scales well, is straightforward to implement, and does not require an external QP solver. Unlike SMO, SOR and KA, the method is applicable to a large number of SVM formulations regardless of the number of equality constraints involved. The effectiveness of our algorithm is demonstrated on a more difficult SVM variant in this respect, namely semi-parametric support vector regression.

ei

Measuring Statistical Dependence with Hilbert-Schmidt Norms

Gretton, A., Bousquet, O., Smola, A., Schoelkopf, B.

In Algorithmic Learning Theory, Lecture Notes in Computer Science, Vol. 3734, pages: 63-78, (Editors: S Jain and H-U Simon and E Tomita), Springer, Berlin, Germany, 16th International Conference ALT, October 2005 (inproceedings)

Abstract
We propose an independence criterion based on the eigenspectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator (we term this a Hilbert-Schmidt Independence Criterion, or HSIC). This approach has several advantages, compared with previous kernel-based independence criteria. First, the empirical estimate is simpler than any other kernel dependence test, and requires no user-defined regularisation. Second, there is a clearly defined population quantity which the empirical estimate approaches in the large sample limit, with exponential convergence guaranteed between the two: this ensures that independence tests based on HSIC do not suffer from slow learning rates. Finally, we show in the context of independent component analysis (ICA) that the performance of HSIC is competitive with that of previously published kernel-based criteria, and of other recently published ICA methods.
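
As a rough illustration of how simple the empirical estimate is, the sketch below computes the biased estimator tr(KHLH) / (m - 1)^2, where K and L are Gram matrices of the two samples and H is the centering matrix. The Gaussian kernel, the fixed bandwidth, the restriction to scalar samples, and all function names are assumptions made for this example.

```python
import math


def gaussian_gram(values, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel on scalar samples (assumed kernel choice)."""
    return [
        [math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in values]
        for a in values
    ]


def center(gram):
    """Return H K H for the centering matrix H = I - (1/m) 1 1^T (K symmetric)."""
    m = len(gram)
    row_means = [sum(row) / m for row in gram]
    total_mean = sum(row_means) / m
    return [
        [gram[i][j] - row_means[i] - row_means[j] + total_mean for j in range(m)]
        for i in range(m)
    ]


def hsic(xs, ys, bandwidth=1.0):
    """Biased empirical HSIC: tr(K H L H) / (m - 1)^2."""
    m = len(xs)
    k_centered = center(gaussian_gram(xs, bandwidth))
    l_gram = gaussian_gram(ys, bandwidth)
    # tr(K H L H) = tr((H K H) L) = sum_ij (HKH)_ij * L_ij because L is symmetric.
    trace = sum(k_centered[i][j] * l_gram[i][j] for i in range(m) for j in range(m))
    return trace / (m - 1) ** 2


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
dependent = hsic(xs, xs)            # y is a deterministic function of x
degenerate = hsic(xs, [1.0] * 5)    # y carries no information about x
print(dependent > degenerate)       # True
```

No regularisation parameter appears anywhere in the computation; only the kernel (here, its bandwidth) has to be chosen.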

ei

An Analysis of the Anti-Learning Phenomenon for the Class Symmetric Polyhedron

Kowalczyk, A., Chapelle, O.

In Algorithmic Learning Theory: 16th International Conference, pages: 78-92, Algorithmic Learning Theory, October 2005 (inproceedings)

Abstract
This paper deals with an unusual phenomenon where most machine learning algorithms yield good performance on the training set but systematically worse than random performance on the test set. This has been observed so far for some natural data sets and demonstrated for some synthetic data sets when the classification rule is learned from a small set of training samples drawn from some high dimensional space. The initial analysis presented in this paper shows that anti-learning is a property of data sets and is quite distinct from overfitting of training data. Moreover, the analysis leads to a specification of some machine learning procedures which can overcome anti-learning and generate machines able to classify training and test data consistently.

ei

A quantitative evaluation of video-based 3D person tracking

Balan, A. O., Sigal, L., Black, M. J.

In The Second Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, VS-PETS, pages: 349-356, October 2005 (inproceedings)

ps

Perception of Curvature and Object Motion Via Contact Location Feedback

Provancher, W. R., Kuchenbecker, K. J., Niemeyer, G., Cutkosky, M. R.

In Proceedings of the International Symposium on Robotics Research (ISRR), 15, pages: 456-465, Springer Tracts in Advanced Robotics, Springer, Siena, Italy, 2005, Oral presentation given by Provancher in October of 2003 (inproceedings)

hi

Visual motion analysis method for detecting arbitrary numbers of moving objects in image sequences

Jepson, A. D., Fleet, D. J., Black, M. J.

US Pat. 6,954,544, October 2005 (patent)

ps

A new methodology for robot controller design

Peters, J., Mistry, M., Udwadia, F.

In Proceedings of the 5th ASME International Design Engineering Technical Conferences and Computers and Information in Engineering Conference (IDETC'05), 5, pages: 1067-1076, ASME, New York, NY, USA, 5th ASME International Design Engineering Technical Conferences and Computers and Information in Engineering Conference (IDETC-MSNDC), September 2005 (inproceedings)

Abstract
Gauss' principle of least constraint and its generalizations have provided useful insights for the development of tracking controllers for mechanical systems [1]. Using this concept, we present a novel methodology for the design of a specific class of robot controllers. We demonstrate that well-known as well as several novel nonlinear robot control laws can be derived from this generic framework, and show experimental verifications on a Sarcos Master Arm robot for some of these controllers. We believe that the suggested approach unifies and simplifies the design of optimal nonlinear control laws for robots obeying rigid body dynamics equations, with or without external constraints, holonomic or nonholonomic constraints, with over-actuation or underactuation, as well as open-chain and closed-chain kinematics.

ei

EEG-Based Mental Task Classification: Linear and Nonlinear Classification of Movement Imagery
In EMBS, 27th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), September 1-4, Shanghai, China (Accepted), September 2005 (inproceedings)

Abstract
AbstractUse of EEG signals as a channel of communication between men and machines represents one of the current challenges in signal theory research. The principal element of such a communication system, known as a Brain-Computer Interface, is the interpretation of the EEG signals related to the characteristic parameters of brain electrical activity. Our goal in this work was extracting quantitative changes in the EEG due to movement imagination. Subject&amp;lsquo;s EEG was recorded while he performed left or right hand movement imagination. Different feature sets extracted from EEG were used as inputs into linear, Neural Network and HMM classifiers for the purpose of imagery movement mental task classification. The results indicate that applying linear classifier to 5 frequency features of asymmetry signal produced from channel C3 and C4 can provide a very high classification accuracy percentage as a simple classifier with small number of features comparing to other feature sets.

ei

Inferring attentional state and kinematics from motor cortical firing rates

Wood, F., Prabhat, Donoghue, J. P., Black, M. J.

In Proc. IEEE Engineering in Medicine and Biology Society, pages: 1544-1547, September 2005 (inproceedings)

ps

Motor cortical decoding using an autoregressive moving average model

Fisher, J., Black, M. J.

In Proc. IEEE Engineering in Medicine and Biology Society, pages: 1469-1472, September 2005 (inproceedings)

ps

Building Sparse Large Margin Classifiers
In Proceedings of the 22nd International Conference on Machine Learning, pages: 996-1003, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, August 2005 (inproceedings)

Abstract
This paper presents an approach to building Sparse Large Margin Classifiers (SLMC) by adding one more constraint to the standard Support Vector Machine (SVM) training problem. The added constraint explicitly controls the sparseness of the classifier and an approach is provided to solve the formulated problem. When considering the dual of this problem, it can be seen that building an SLMC is equivalent to constructing an SVM with a modified kernel function. Further analysis of this kernel function indicates that the proposed approach essentially finds a discriminating subspace that can be spanned by a small number of vectors, and in this subspace different classes of data are linearly well separated. Experimental results over several classification benchmarks show that in most cases the proposed approach outperforms the state-of-the-art sparse learning algorithms.

ei

A unifying methodology for the control of robotic systems

Peters, J., Mistry, M., Udwadia, F., Cory, R., Nakanishi, J., Schaal, S.

In Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2005), pages: 1824-1831, IEEE Operations Center, Piscataway, NJ, USA, 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), August 2005 (inproceedings)

Abstract
Recently, F. E. Udwadia (2003) suggested deriving tracking controllers for mechanical systems using a generalization of Gauss' principle of least constraint. This method allows us to reformulate control problems as a special class of optimal control. We take this line of reasoning one step further and demonstrate that well-known and also several novel nonlinear robot control laws can be derived from this generic methodology. We show experimental verifications on a Sarcos Master Arm robot for some of the derived controllers. We believe that the suggested approach offers a promising unification and simplification of nonlinear control law design for robots obeying rigid body dynamics equations, with or without external constraints, with over-actuation or underactuation, as well as open-chain and closed-chain kinematics.

ei

Learning from Labeled and Unlabeled Data on a Directed Graph
In Proceedings of the 22nd International Conference on Machine Learning, pages: 1041-1048, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, August 2005 (inproceedings)

Abstract
We propose a general framework for learning from labeled and unlabeled data on a directed graph in which the structure of the graph including the directionality of the edges is considered. The time complexity of the algorithm derived from this framework is nearly linear due to recently developed numerical techniques. In the absence of labeled instances, this framework can be utilized as a spectral clustering method for directed graphs, which generalizes the spectral clustering approach for undirected graphs. We have applied our framework to real-world web classification problems and obtained encouraging results.

ei

Regularization on Discrete Spaces
In Pattern Recognition, Lecture Notes in Computer Science, Vol. 3663, pages: 361-368, (Editors: WG Kropatsch and R Sablatnig and A Hanbury), Springer, Berlin, Germany, 27th DAGM Symposium, August 2005 (inproceedings)

Abstract
We consider the classification problem on a finite set of objects. Some of them are labeled, and the task is to predict the labels of the remaining unlabeled ones. Such an estimation problem is generally referred to as transductive inference. It is well-known that many meaningful inductive or supervised methods can be derived from a regularization framework, which minimizes a loss function plus a regularization term. In the same spirit, we propose a general discrete regularization framework defined on finite object sets, which can be thought of as the discrete analogue of classical regularization theory. A family of transductive inference schemes is then systematically derived from the framework, including our earlier algorithm for transductive inference, with which we obtained encouraging results on many practical classification problems. The discrete regularization framework is built on the discrete analysis and geometry developed by ourselves, in which a number of discrete differential operators of various orders are constructed, which can be thought of as the discrete analogues of their counterparts in the continuous case.

ei

Large Margin Non-Linear Embedding

Zien, A., Candela, J.

In ICML 2005, pages: 1065-1072, (Editors: De Raedt, L., S. Wrobel), ACM Press, New York, NY, USA, 22nd International Conference on Machine Learning, August 2005 (inproceedings)

Abstract
It is common in classification methods to first place data in a vector space and then learn decision boundaries. We propose reversing that process: for fixed decision boundaries, we "learn" the location of the data. This way we (i) do not need a metric (or even stronger structure) -- pairwise dissimilarities suffice; and additionally (ii) produce low-dimensional embeddings that can be analyzed visually. We achieve this by combining an entropy-based embedding method with an entropy-based version of semi-supervised logistic regression. We present results for clustering and semi-supervised classification.

ei

Triangle Fixing Algorithms for the Metric Nearness Problem

Dhillon, I., Sra, S., Tropp, J.

In Advances in Neural Information Processing Systems 17, pages: 361-368, (Editors: Saul, L.K., Y. Weiss, L. Bottou), MIT Press, Cambridge, MA, USA, Eighteenth Annual Conference on Neural Information Processing Systems (NIPS), July 2005 (inproceedings)

Abstract
Various problems in machine learning, databases, and statistics involve pairwise distances among a set of objects. It is often desirable for these distances to satisfy the properties of a metric, especially the triangle inequality. Applications where metric data is useful include clustering, classification, metric-based indexing, and approximation algorithms for various graph problems. This paper presents the Metric Nearness Problem: Given a dissimilarity matrix, find the "nearest" matrix of distances that satisfy the triangle inequalities. For lp nearness measures, this paper develops efficient triangle fixing algorithms that compute globally optimal solutions by exploiting the inherent structure of the problem. Empirically, the algorithms have time and storage costs that are linear in the number of triangle constraints. The methods can also be easily parallelized for additional speed.
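
To make the constraint set concrete, the following sketch enumerates the triangle inequalities that a dissimilarity matrix violates. It is not the triangle fixing algorithm itself, only an illustration of the constraints such an algorithm has to repair; the function name and the toy matrix are hypothetical, and the matrix is assumed to be symmetric with a zero diagonal.

```python
from itertools import combinations


def triangle_violations(d, tol=1e-12):
    """List (a, b, c, excess) where d[a][b] exceeds the detour d[a][c] + d[c][b].

    Assumes d is a symmetric dissimilarity matrix with zero diagonal. Each
    unordered triple of objects contributes three constraints, one per side.
    """
    n = len(d)
    violations = []
    for i, j, k in combinations(range(n), 3):
        for a, b, c in ((i, j, k), (i, k, j), (j, k, i)):
            excess = d[a][b] - (d[a][c] + d[c][b])
            if excess > tol:
                violations.append((a, b, c, excess))
    return violations


# Hypothetical dissimilarities: the direct distance 0-2 is longer than the route via 1.
d = [
    [0, 1, 5],
    [1, 0, 1],
    [5, 1, 0],
]
print(triangle_violations(d))  # [(0, 2, 1, 3)]
```

For n objects there are 3 * C(n, 3) such constraints, which is the quantity the abstract's empirical cost statement refers to.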

ei

Face Detection: Efficient and Rank Deficient
In Advances in Neural Information Processing Systems 17, pages: 673-680, (Editors: LK Saul and Y Weiss and L Bottou), MIT Press, Cambridge, MA, USA, 18th Annual Conference on Neural Information Processing Systems (NIPS), July 2005 (inproceedings)

Abstract
This paper proposes a method for computing fast approximations to support vector decision functions in the field of object detection. In the present approach we are building on an existing algorithm where the set of support vectors is replaced by a smaller, so-called reduced set of synthesized input space points. In contrast to the existing method that finds the reduced set via unconstrained optimization, we impose a structural constraint on the synthetic points such that the resulting approximations can be evaluated via separable filters. For applications that require scanning an entire image, this decreases the computational complexity of a scan by a significant amount. We present experimental results on a standard face detection database.

ei

Methods Towards Invasive Human Brain Computer Interfaces

Lal, T., Hinterberger, T., Widman, G., Schröder, M., Hill, J., Rosenstiel, W., Elger, C., Schölkopf, B., Birbaumer, N.

In Advances in Neural Information Processing Systems 17, pages: 737-744, (Editors: LK Saul and Y Weiss and L Bottou), MIT Press, Cambridge, MA, USA, 18th Annual Conference on Neural Information Processing Systems (NIPS), July 2005 (inproceedings)

Abstract
During the last ten years there has been growing interest in the development of Brain Computer Interfaces (BCIs). The field has mainly been driven by the needs of completely paralyzed patients to communicate. With a few exceptions, most human BCIs are based on extracranial electroencephalography (EEG). However, reported bit rates are still low. One reason for this is the low signal-to-noise ratio of the EEG. We are currently investigating if BCIs based on electrocorticography (ECoG) are a viable alternative. In this paper we present the method and examples of intracranial EEG recordings of three epilepsy patients with electrode grids placed on the motor cortex. The patients were asked to repeatedly imagine movements of two kinds, e.g., tongue or finger movements. We analyze the classifiability of the data using Support Vector Machines (SVMs) and Recursive Channel Elimination (RCE).

ei

A Machine Learning Approach to Conjoint Analysis
In Advances in Neural Information Processing Systems 17, pages: 257-264, (Editors: Saul, L.K., Y. Weiss, L. Bottou), MIT Press, Cambridge, MA, USA, Eighteenth Annual Conference on Neural Information Processing Systems (NIPS), July 2005 (inproceedings)

Abstract
Choice-based conjoint analysis builds models of consumers' preferences over products from answers gathered in questionnaires. Our main goal is to bring tools from the machine learning community to solve this problem more efficiently. Thus, we propose two algorithms to estimate consumer preferences quickly and accurately.

ei

An Auditory Paradigm for Brain-Computer Interfaces

Hill, N., Lal, T., Bierig, K., Birbaumer, N., Schölkopf, B.

In Advances in Neural Information Processing Systems 17, pages: 569-576, (Editors: LK Saul and Y Weiss and L Bottou), MIT Press, Cambridge, MA, USA, 18th Annual Conference on Neural Information Processing Systems (NIPS), July 2005 (inproceedings)

Abstract
Motivated by the particular problems involved in communicating with "locked-in" paralysed patients, we aim to develop a brain-computer interface that uses auditory stimuli. We describe a paradigm that allows a user to make a binary decision by focusing attention on one of two concurrent auditory stimulus sequences. Using Support Vector Machine classification and Recursive Channel Elimination on the independent components of averaged event-related potentials, we show that an untrained user's EEG data can be classified with an encouragingly high level of accuracy. This suggests that it is possible for users to modulate EEG signals in a single trial by the conscious direction of attention, well enough to be useful in BCI.

ei

Tsuda, K., Rätsch, G., Warmuth, M.

In Advances in Neural Information Processing Systems 17, pages: 1425-1432, (Editors: Saul, L.K., Y. Weiss, L. Bottou), MIT Press, Cambridge, MA, USA, Eighteenth Annual Conference on Neural Information Processing Systems (NIPS), July 2005 (inproceedings)

Abstract
We address the problem of learning a symmetric positive definite matrix. The central issue is to design parameter updates that preserve positive definiteness. Our updates are motivated with the von Neumann divergence. Rather than treating the most general case, we focus on two key applications that exemplify our methods: On-line learning with a simple square loss and finding a symmetric positive definite matrix subject to symmetric linear constraints. The updates generalize the Exponentiated Gradient (EG) update and AdaBoost, respectively: the parameter is now a symmetric positive definite matrix of trace one instead of a probability vector (which in this context is a diagonal positive definite matrix with trace one). The generalized updates use matrix logarithms and exponentials to preserve positive definiteness. Most importantly, we show how the analysis of each algorithm generalizes to the non-diagonal case. We apply both new algorithms, called the Matrix Exponentiated Gradient (MEG) update and DefiniteBoost, to learn a kernel matrix from distance measurements.

ei

Machine Learning Applied to Perception: Decision Images for Classification

Wichmann, F., Graf, A., Simoncelli, E., Bülthoff, H., Schölkopf, B.

In Advances in Neural Information Processing Systems 17, pages: 1489-1496, (Editors: LK Saul and Y Weiss and L Bottou), MIT Press, Cambridge, MA, USA, 18th Annual Conference on Neural Information Processing Systems (NIPS), July 2005 (inproceedings)

Abstract
We study gender discrimination of human faces using a combination of psychophysical classification and discrimination experiments together with methods from machine learning. We reduce the dimensionality of a set of face images using principal component analysis, and then train a set of linear classifiers on this reduced representation (linear support vector machines (SVMs), relevance vector machines (RVMs), Fisher linear discriminant (FLD), and prototype (prot) classifiers) using human classification data. Because we combine a linear preprocessor with linear classifiers, the entire system acts as a linear classifier, allowing us to visualise the decision-image corresponding to the normal vector of the separating hyperplanes (SH) of each classifier. We predict that the female-to-maleness transition along the normal vector for classifiers closely mimicking human classification (SVM and RVM) should be faster than the transition along any other direction. A psychophysical discrimination experiment using the decision images as stimuli is consistent with this prediction.

ei

Breaking SVM Complexity with Cross-Training

Bakir, G., Bottou, L., Weston, J.

In Advances in Neural Information Processing Systems 17, pages: 81-88, (Editors: Saul, L.K., Y. Weiss, L. Bottou), MIT Press, Cambridge, MA, USA, Eighteenth Annual Conference on Neural Information Processing Systems (NIPS), July 2005 (inproceedings)

Abstract
We propose an algorithm for selectively removing examples from the training set using probabilistic estimates related to editing algorithms (Devijver and Kittler, 1982). The procedure creates a separable distribution of training examples with minimal impact on the decision boundary position. It breaks the linear dependency between the number of SVs and the number of training examples, and sharply reduces the complexity of SVMs during both the training and prediction stages.

ei

Implicit Wiener series for higher-order image analysis

Franz, M., Schölkopf, B.

In Advances in Neural Information Processing Systems 17, pages: 465-472, (Editors: LK Saul and Y Weiss and L Bottou), MIT Press, Cambridge, MA, USA, 18th Annual Conference on Neural Information Processing Systems (NIPS), July 2005 (inproceedings)

Abstract
The computation of classical higher-order statistics such as higher-order moments or spectra is difficult for images due to the huge number of terms to be estimated and interpreted. We propose an alternative approach in which multiplicative pixel interactions are described by a series of Wiener functionals. Since the functionals are estimated implicitly via polynomial kernels, the combinatorial explosion associated with the classical higher-order statistics is avoided. First results show that image structures such as lines or corners can be predicted correctly, and that pixel interactions up to the order of five play an important role in natural images.

ei

Limits of Spectral Clustering
In Advances in Neural Information Processing Systems 17, pages: 857-864, (Editors: Saul, L. K., Y. Weiss, L. Bottou), MIT Press, Cambridge, MA, USA, Eighteenth Annual Conference on Neural Information Processing Systems (NIPS), July 2005 (inproceedings)

Abstract
An important aspect of clustering algorithms is whether the partitions constructed on finite samples converge to a useful clustering of the whole data space as the sample size increases. This paper investigates this question for normalized and unnormalized versions of the popular spectral clustering algorithm. Surprisingly, the convergence of unnormalized spectral clustering is more difficult to handle than the normalized case. Even though recently some first results on the convergence of normalized spectral clustering have been obtained, for the unnormalized case we have to develop a completely new approach combining tools from numerical integration, spectral and perturbation theory, and probability. It turns out that while in the normalized case, spectral clustering usually converges to a nice partition of the data space, in the unnormalized case the same only holds under strong additional assumptions which are not always satisfied. We conclude that our analysis gives strong evidence for the superiority of normalized spectral clustering. It also provides a basis for future exploration of other Laplacian-based methods.

ei

Semi-supervised Learning on Directed Graphs
In Advances in Neural Information Processing Systems 17, pages: 1633-1640, (Editors: LK Saul and Y Weiss and L Bottou), MIT Press, Cambridge, MA, USA, 18th Annual Conference on Neural Information Processing Systems (NIPS), July 2005 (inproceedings)

Abstract
Given a directed graph in which some of the nodes are labeled, we investigate the question of how to exploit the link structure of the graph to infer the labels of the remaining unlabeled nodes. To that end we propose a regularization framework for functions defined over nodes of a directed graph that forces the classification function to change slowly on densely linked subgraphs. A powerful, yet computationally simple classification algorithm is derived within the proposed framework. The experimental evaluation on real-world Web classification problems demonstrates encouraging results that validate our approach.

ei

Splines with non positive kernels

Canu, S., Ong, CS., Mary, X.

In 5th International ISAAC Congress, pages: 1-10, (Editors: Begehr, H. G.W., F. Nicolosi), World Scientific, Singapore, 5th International ISAAC Congress, July 2005 (inproceedings)

Abstract
Nonparametric regression methods can be grouped into two main clusters: smoothing spline methods, which require positive kernels, and Nonparametric Kernel Regression, which allows the use of non-positive kernels such as the Epanechnikov kernel. We propose a generalization of the smoothing spline method to include kernels which are still symmetric but not positive semi-definite (they are called indefinite). The general relationship between smoothing splines, Reproducing Kernel Hilbert Spaces and positive kernels no longer exists with indefinite kernels. Instead they are associated with functional spaces called Reproducing Kernel Krein Spaces (RKKS), endowed with an indefinite inner product and thus not directly associated with a norm. Smoothing splines in RKKS have many of the interesting properties of splines in RKHS, such as orthogonality, projection, the representer theorem and generalization bounds. We show that smoothing splines can be defined in RKKS as the regularized solution of the interpolation problem. Since no norm is available in an RKKS, Tikhonov regularization cannot be defined. Instead, we propose to use iterative methods of conjugate gradient type with early stopping as the regularization mechanism. Several iterative algorithms were collected which can be used to solve the optimization problems associated with learning in indefinite spaces. Some preliminary experiments with indefinite kernels for spline smoothing are reported, revealing the computational efficiency of the approach.

ei

Kernel Methods for Implicit Surface Modeling

Schölkopf, B., Giesen, J., Spalinger, S.

In Advances in Neural Information Processing Systems 17, pages: 1193-1200, (Editors: LK Saul and Y Weiss and L Bottou), MIT Press, Cambridge, MA, USA, 18th Annual Conference on Neural Information Processing Systems (NIPS), July 2005 (inproceedings)

Abstract
We describe methods for computing an implicit model of a hypersurface that is given only by a finite sampling. The methods work by mapping the sample points into a reproducing kernel Hilbert space and then determining regions in terms of hyperplanes.

ei

Method and apparatus for measuring and monitoring distances, physical properties, and phase changes for light based on a ring-resonator
June 2005 (patent)

pf

Combining Local and Global Image Features for Object Class Recognition

Lisin, DA., Mattar, MA., Blaschko, MB., Benfield, MC., Learned-Miller, EG.

In CVPR, pages: 47, June 2005 (inproceedings)

ei

Fields of Experts: A framework for learning image priors

Roth, S., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition, 2, pages: 860-867, June 2005 (inproceedings)

ps

To apply score function difference based ICA algorithms to high-dimensional data

Zhang, K., Chan, L.

In Proceedings of the 13th European Symposium on Artificial Neural Networks (ESANN 2005), pages: 291-297, 13th European Symposium on Artificial Neural Networks (ESANN), April 2005 (inproceedings)

ei

Joint Regularization

Borgwardt, KM., Guttman, O., Vishwanathan, SVN., Smola, AJ.

In pages: 455-460, (Editors: Verleysen, M.), d-side, Evere, Belgium, 13th European Symposium on Artificial Neural Networks (ESANN), April 2005 (inproceedings)

Abstract
We present a principled method to combine kernels under joint regularization constraints. Central to our method is an extension of the representer theorem for handling multiple joint regularization constraints. Experimental evidence shows the feasibility of our approach.

ei

Modeling Induced Master Motion in Force-Reflecting Teleoperation

Kuchenbecker, K. J., Niemeyer, G.

In Proc. IEEE International Conference on Robotics and Automation, pages: 348-353, Barcelona, Spain, April 2005, Oral presentation given by Kuchenbecker (inproceedings)

hi

EEG Source Localization for Brain-Computer-Interfaces
In 2nd International IEEE EMBS Conference on Neural Engineering, pages: 128-131, IEEE, 2nd International IEEE EMBS Conference on Neural Engineering, March 2005 (inproceedings)

ei

Event-Based Haptics and Acceleration Matching: Portraying and Assessing the Realism of Contact

Kuchenbecker, K. J., Fiene, J. P., Niemeyer, G.

In Proc. IEEE World Haptics Conference, pages: 381-387, Pisa, Italy, March 2005, Oral presentation given by Kuchenbecker (inproceedings)

hi

On the receptive fields of Markov random fields: Predictions from a probabilistic model of scene statistics

Black, M. J., Roth, S.

COSYNE 2005, Salt Lake City, March 2005 (conference)

ps

Active Learning for Parzen Window Classifier
In AISTATS 2005, pages: 49-56, (Editors: Cowell, R., Z. Ghahramani), Tenth International Workshop on Artificial Intelligence and Statistics (AI & Statistics), January 2005 (inproceedings)

Abstract
The problem of active learning is approached in this paper by directly minimizing an estimate of the expected test error. The main difficulty in this "optimal" strategy is that output probabilities need to be estimated accurately. We suggest here different methods for estimating those efficiently. In this context, the Parzen window classifier is considered because it is both simple and probabilistic. The analysis of experimental results highlights that regularization is a key ingredient for this strategy.
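As a rough illustration of why a Parzen window classifier is "both simple and probabilistic", the sketch below estimates class posteriors by normalizing per-class sums of kernel values. The Gaussian kernel, the `bandwidth` parameter, and the function names are assumptions made for this example; they are not taken from the paper, and the paper's probability estimators and regularization are not reproduced here.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Gaussian window between two points (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def parzen_posterior(x, train_points, train_labels, bandwidth=1.0):
    """Return a dict mapping each class label to an estimated P(label | x).

    Each class score is the sum of kernel values between x and the
    training points of that class; scores are normalized to sum to one.
    """
    scores = {label: 0.0 for label in train_labels}
    for point, label in zip(train_points, train_labels):
        scores[label] += gaussian_kernel(x, point, bandwidth)
    total = sum(scores.values())
    if total == 0.0:
        # All kernel values underflowed: fall back to a uniform posterior.
        return {label: 1.0 / len(scores) for label in scores}
    return {label: s / total for label, s in scores.items()}


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.1, 0.0), (2.0, 2.0), (2.1, 2.0)]
    labels = ["a", "a", "b", "b"]
    print(parzen_posterior((0.0, 0.1), points, labels, bandwidth=0.5))
```

Because the output is a normalized score per class, it can be plugged directly into an expected-error estimate; how accurate those probabilities are depends strongly on the bandwidth, which is one place where regularization enters.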

ei

Semi-Supervised Classification by Low Density Separation
In AISTATS 2005, pages: 57-64, (Editors: Cowell, R., Z. Ghahramani), Tenth International Workshop on Artificial Intelligence and Statistics (AI & Statistics), January 2005 (inproceedings)

Abstract
We believe that the cluster assumption is key to successful semi-supervised learning. Based on this, we propose three semi-supervised algorithms: 1. deriving graph-based distances that emphasize low density regions between clusters, followed by training a standard SVM; 2. optimizing the Transductive SVM objective function, which places the decision boundary in low density regions, by gradient descent; 3. combining the first two to make maximum use of the cluster assumption. We compare with state of the art algorithms and demonstrate superior accuracy for the latter two methods.

ei

Automatic In Situ Identification of Plankton

Blaschko, MB., Holness, G., Mattar, MA., Lisin, D., Utgoff, PE., Hanson, AR., Schultz, H., Riseman, EM., Sieracki, ME., Balch, WM., Tupper, B.

In WACV, pages: 79, WACV, January 2005 (inproceedings)

ei

Kernel Constrained Covariance for Dependence Measurement

Gretton, A., Smola, A., Bousquet, O., Herbrich, R., Belitski, A., Augath, M., Murayama, Y., Pauls, J., Schölkopf, B., Logothetis, N.

In Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, pages: 112-119, (Editors: R Cowell and Z Ghahramani), AISTATS, January 2005 (inproceedings)

Abstract
We discuss reproducing kernel Hilbert space (RKHS)-based measures of statistical dependence, with emphasis on constrained covariance (COCO), a novel criterion to test dependence of random variables. We show that COCO is a test for independence if and only if the associated RKHSs are universal. That said, no independence test exists that can distinguish dependent and independent random variables in all circumstances. Dependent random variables can result in a COCO which is arbitrarily close to zero when the source densities are highly non-smooth. All current kernel-based independence tests share this behaviour. We demonstrate exponential convergence between the population and empirical COCO. Finally, we use COCO as a measure of joint neural activity between voxels in MRI recordings of the macaque monkey, and compare the results to the mutual information and the correlation. We also show the effect of removing breathing artefacts from the MRI recording.
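For orientation, the constrained covariance is commonly written as below in the related literature. The notation is an assumption introduced here and does not appear in the abstract: $F$ and $G$ are unit balls of the two RKHSs, $K$ and $L$ are the Gram matrices of the two samples of size $n$, and $\lVert\cdot\rVert_2$ is the largest singular value.

```latex
\mathrm{COCO}(P_{x,y}; F, G)
  = \sup_{f \in F,\; g \in G}
    \Bigl( \mathbf{E}_{x,y}\bigl[f(x)\,g(y)\bigr]
         - \mathbf{E}_{x}\bigl[f(x)\bigr]\,\mathbf{E}_{y}\bigl[g(y)\bigr] \Bigr),
\qquad
\mathrm{COCO}_{\mathrm{emp}}
  = \frac{1}{n} \sqrt{\bigl\lVert \tilde{K}\,\tilde{L} \bigr\rVert_{2}},
\quad
\tilde{K} = HKH,\;\; \tilde{L} = HLH,\;\; H = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top}.
```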

ei

Hilbertian Metrics and Positive Definite Kernels on Probability Measures
In AISTATS 2005, pages: 136-143, (Editors: Cowell, R., Z. Ghahramani), Tenth International Workshop on Artificial Intelligence and Statistics (AI & Statistics), January 2005 (inproceedings)

Abstract
We investigate the problem of defining Hilbertian metrics and, respectively, positive definite kernels on probability measures, continuing previous work. This type of kernel has shown very good results in text classification and has a wide range of possible applications. In this paper we extend the two-parameter family of Hilbertian metrics of Topsoe such that it now includes all commonly used Hilbertian metrics on probability measures. This allows us to do model selection among these metrics in an elegant and unified way. Second, we further investigate our approach to incorporating similarity information of the probability space into the kernel. The analysis provides a better understanding of these kernels and gives in some cases a more efficient way to compute them. Finally, we compare all proposed kernels in two text and two image classification problems.

ei

Intrinsic Dimensionality Estimation of Submanifolds in Euclidean space

Hein, M., Audibert, Y.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 289, (Editors: De Raedt, L., S. Wrobel), ICML Bonn, 2005 (inproceedings)

Abstract
We present a new method to estimate the intrinsic dimensionality of a submanifold M in Euclidean space from random samples. The method is based on the convergence rates of a certain U-statistic on the manifold. We solve at least partially the question of the choice of the scale of the data. Moreover the proposed method is easy to implement, can handle large data sets and performs very well even for small sample sizes. We compare the proposed method to two standard estimators on several artificial as well as real data sets.

ei

Large Scale Genomic Sequence SVM Classifiers

Sonnenburg, S., Rätsch, G., Schölkopf, B.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 849-856, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005 (inproceedings)

Abstract
In genomic sequence analysis tasks like splice site recognition or promoter identification, large amounts of training sequences are available, and indeed needed to achieve sufficiently high classification performances. In this work we study two recently proposed and successfully used kernels, namely the Spectrum kernel and the Weighted Degree kernel (WD). In particular, we suggest several extensions using Suffix Trees and modifications of an SMO-like SVM training algorithm in order to accelerate the training of the SVMs and their evaluation on test sequences. Our simulations show that for the spectrum kernel and WD kernel, large scale SVM training can be accelerated by factors of 20 and 4 times, respectively, while using much less memory (e.g. no kernel caching). The evaluation on new sequences is often several thousand times faster using the new techniques (depending on the number of Support Vectors). Our method allows us to train on sets as large as one million sequences.
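To make the two kernels named above concrete, here is a naive reference sketch. It assumes the usual textbook definitions: the spectrum kernel as the inner product of k-mer count vectors, and the weighted degree kernel as a weighted count of position-wise substring matches with weights 2(D - d + 1) / (D(D + 1)). The function names and parameters are hypothetical, and the code does not implement the suffix-tree or SMO accelerations the paper proposes.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every length-k substring of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """Inner product of the k-mer count vectors of x and y."""
    counts_x, counts_y = kmer_counts(x, k), kmer_counts(y, k)
    # Iterate over the smaller table; missing k-mers count as zero.
    if len(counts_x) > len(counts_y):
        counts_x, counts_y = counts_y, counts_x
    return sum(c * counts_y[kmer] for kmer, c in counts_x.items())


def weighted_degree_kernel(x, y, max_degree=3):
    """Weighted count of substrings that match at the same position.

    Assumes equal-length sequences and the commonly used weights
    beta_d = 2 * (D - d + 1) / (D * (D + 1)).
    """
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    big_d = max_degree
    total = 0.0
    for d in range(1, big_d + 1):
        beta = 2.0 * (big_d - d + 1) / (big_d * (big_d + 1))
        matches = sum(
            1 for i in range(len(x) - d + 1) if x[i:i + d] == y[i:i + d]
        )
        total += beta * matches
    return total


if __name__ == "__main__":
    print(spectrum_kernel("ACGTACGT", "CGTACGTA", k=3))
    print(weighted_degree_kernel("ACGTACGT", "ACGTTCGT", max_degree=3))
```

Both functions recompute substring statistics for every pair, which is exactly the cost that becomes prohibitive at the training-set sizes the abstract mentions.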

ei

Joint Kernel Maps
In Proceedings of the 8th International Work-Conference on Artificial Neural Networks, LNCS 3512, pages: 176-191, (Editors: J Cabestany and A Prieto and F Sandoval), Springer, Berlin Heidelberg, Germany, IWANN, 2005 (inproceedings)

Abstract
We develop a methodology for solving high dimensional dependency estimation problems between pairs of data types, which is viable in the case where the output of interest has very high dimension, e.g., thousands of dimensions. This is achieved by mapping the objects into continuous or discrete spaces, using joint kernels. Known correlations between input and output can be defined by such kernels, some of which can maintain linearity in the outputs to provide simple (closed form) pre-images. We provide examples of such kernels and empirical results.

ei

Analysis of Some Methods for Reduced Rank Gaussian Process Regression

Quinonero Candela, J., Rasmussen, C.

In Switching and Learning in Feedback Systems, pages: 98-127, (Editors: Murray Smith, R., R. Shorten), Springer, Berlin, Germany, European Summer School on Multi-Agent Control, 2005 (inproceedings)

Abstract
While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent proliferation of a number of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced rank approximation. While generally GPs are equivalent to infinite linear models, we show that Reduced Rank Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training RRGPs consists both in learning the covariance function hyperparameters and the support set. We propose a method for learning hyperparameters for a given support set. We also review the Sparse Greedy GP (SGGP) approximation (Smola and Bartlett, 2001), which is a way of learning the support set for given hyperparameters based on approximating the posterior. We propose an alternative method to the SGGP that has better generalization capabilities. Finally, we run experiments to compare the different ways of training an RRGP. We provide some Matlab code for learning RRGPs.

ei