My scientific interests are in the field of machine learning and inference from empirical data. In particular, I study kernel methods for extracting regularities from possibly high-dimensional data. These regularities are usually statistical ones; however, in recent years I have also become interested in methods for finding causal structures that underlie statistical dependences. I have worked on a number of different applications of machine learning; in our field, you get "to play in everyone's backyard." Most recently, I have been trying to play in the backyard of astronomers and photographers.
With the growing interest in (how to make money with) big data, machine learning has significantly gained in popularity. We have published an article in the German newspaper FAZ, discussing some of the implications. Disclaimer: the text that appears above our names was neither written nor approved by us.
M.Sc. in mathematics and Lionel Cooper Memorial Prize, University of London (1992)
Diplom in physics (Tübingen, 1994)
Doctorate in computer science from the Technical University Berlin (1997); thesis on Support Vector Learning (main advisor: V. Vapnik, AT&T Bell Labs) won the annual dissertation prize of the German Association for Computer Science (GI)
If you'd like to contact me, please consider these two notes:
1. I recently became co-editor-in-chief of JMLR. I work for JMLR because I believe in its open access model, but it takes a lot of time. During my JMLR term, please don't try to convince me to take on other journal or grant reviewing duties.
2. I am not very organized with my e-mail, so if you want to apply for a position in my lab, please send your application only to Sekretariat-Schoelkopf@tuebingen.mpg.de. Note that we do not respond to non-personalized applications that look like they are being sent to a large number of places simultaneously.
We are always happy to receive outstanding applications for PhD positions and postdocs.
Journal of Nuclear Medicine, 54(10):1768-1774, 2013 (article)
Hybrid PET/MR systems have recently entered clinical practice. Thus, the accuracy of MR-based attenuation correction in simultaneously acquired data can now be investigated. We assessed the accuracy of 4 methods of MR-based attenuation correction in lesions within soft tissue, bone, and MR susceptibility artifacts: 2 segmentation-based methods (SEG1, provided by the manufacturer, and SEG2, a method with atlas-based susceptibility artifact correction); an atlas- and pattern recognition–based method (AT&PR), which also used artifact correction; and a new method combining AT&PR and SEG2 (SEG2wBONE). Methods: Attenuation maps were calculated for the PET/MR datasets of 10 patients acquired on a whole-body PET/MR system, allowing for simultaneous acquisition of PET and MR data. Eighty percent iso-contour volumes of interest were placed on lesions in soft tissue (n = 21), in bone (n = 20), near bone (n = 19), and within or near MR susceptibility artifacts (n = 9). Relative mean volume-of-interest differences were calculated with CT-based attenuation correction as a reference. Results: For soft-tissue lesions, none of the methods revealed a significant difference in PET standardized uptake value relative to CT-based attenuation correction (SEG1, −2.6% ± 5.8%; SEG2, −1.6% ± 4.9%; AT&PR, −4.7% ± 6.5%; SEG2wBONE, 0.2% ± 5.3%). For bone lesions, underestimation of PET standardized uptake values was found for all methods, with minimized error for the atlas-based approaches (SEG1, −16.1% ± 9.7%; SEG2, −11.0% ± 6.7%; AT&PR, −6.6% ± 5.0%; SEG2wBONE, −4.7% ± 4.4%). For lesions near bone, underestimations of lower magnitude were observed (SEG1, −12.0% ± 7.4%; SEG2, −9.2% ± 6.5%; AT&PR, −4.6% ± 7.8%; SEG2wBONE, −4.2% ± 6.2%). For lesions affected by MR susceptibility artifacts, quantification errors could be reduced using the atlas-based artifact correction (SEG1, −54.0% ± 38.4%; SEG2, −15.0% ± 12.2%; AT&PR, −4.1% ± 11.2%; SEG2wBONE, 0.6% ± 11.1%). 
Conclusion: For soft-tissue lesions, none of the evaluated methods showed statistically significant errors. For bone lesions, significant underestimations of −16% and −11% occurred for methods in which bone tissue was ignored (SEG1 and SEG2). In the present attenuation correction schemes, uncorrected MR susceptibility artifacts typically result in reduced attenuation values, potentially leading to highly reduced PET standardized uptake values, rendering lesions indistinguishable from background. While AT&PR and SEG2wBONE show accurate results in both soft tissue and bone, SEG2wBONE uses a two-step approach for tissue classification, which increases the robustness of prediction and can be applied retrospectively if more precision in bone areas is needed.
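The evaluation above reports, per lesion class, the mean ± standard deviation of the relative difference between MR-AC and CT-AC standardized uptake values. A minimal sketch of that arithmetic, assuming the mean SUV of each 80% iso-contour volume of interest has already been extracted per lesion; the function names are hypothetical, and the sample values below are illustrative only, not data from the study:

```python
import statistics


def relative_differences(suv_mr, suv_ct):
    """Percent difference of each MR-AC value from its CT-AC reference value."""
    if len(suv_mr) != len(suv_ct):
        raise ValueError("need exactly one CT-AC reference per MR-AC value")
    return [100.0 * (mr - ct) / ct for mr, ct in zip(suv_mr, suv_ct)]


def summarize(diffs):
    """Mean and sample standard deviation, reported as 'mean ± SD'."""
    return statistics.mean(diffs), statistics.stdev(diffs)


# Illustrative values only (not from the study).
diffs = relative_differences([4.2, 5.5, 3.1], [5.0, 6.0, 3.3])
mean, sd = summarize(diffs)
print(f"{mean:.1f}% ± {sd:.1f}%")
```

A negative mean corresponds to the underestimation of PET standardized uptake values described for bone lesions.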
In Advances in Neural Information Processing Systems 26, pages: 2535-2543, (Editors: C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger), 27th Annual Conference on Neural Information Processing Systems (NIPS), 2013 (inproceedings)
In Proceedings of the Fifth International Brain-Computer Interface Meeting: Defining the Future, pages: Article ID: 086, (Editors: J.d.R. Millán, S. Gao, R. Müller-Putz, J.R. Wolpaw, and J.E. Huggins), Verlag der Technischen Universität Graz, 5th International Brain-Computer Interface Meeting, 2013 (inproceedings)
In Advances in Neural Information Processing Systems 26, pages: 154-162, (Editors: C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger), 27th Annual Conference on Neural Information Processing Systems (NIPS), 2013 (inproceedings)
Journal of Neural Engineering, 9(2):026011, February 2012 (article)
We report on the development and online testing of an electroencephalogram-based brain–computer interface (BCI) that aims to be usable by completely paralysed users—for whom visual or motor-system-based BCIs may not be suitable, and among whom reports of successful BCI use have so far been very rare. The current approach exploits covert shifts of attention to auditory stimuli in a dichotic-listening stimulus design. To compare the efficacy of event-related potentials (ERPs) and steady-state auditory evoked potentials (SSAEPs), the stimuli were designed such that they elicited both ERPs and SSAEPs simultaneously. Trial-by-trial feedback was provided online, based on subjects' modulation of N1 and P3 ERP components measured during single 5 s stimulation intervals. All 13 healthy subjects were able to use the BCI, with performance in a binary left/right choice task ranging from 75% to 96% correct across subjects (mean 85%). BCI classification was based on the contrast between stimuli in the attended stream and stimuli in the unattended stream, making use of every stimulus, rather than contrasting frequent standard and rare 'oddball' stimuli. SSAEPs were assessed offline: for all subjects, spectral components at the two exactly known modulation frequencies allowed discrimination of pre-stimulus from stimulus intervals, and of left-only stimuli from right-only stimuli when one side of the dichotic stimulus pair was muted. However, attention modulation of SSAEPs was not sufficient for single-trial BCI communication, even when the subject's attention was clearly focused well enough to allow classification of the same trials via ERPs. ERPs clearly provided a superior basis for BCI. The ERP results are a promising step towards the development of a simple-to-use, reliable yes/no communication system for users in the most severely paralysed states, as well as potential attention-monitoring and -training applications outside the context of assistive technology.
Journal of Machine Learning Research, 13, pages: 723-773, March 2012 (article)
We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distribution-free tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
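For reference, a minimal sketch of the unbiased quadratic-time estimate of the squared MMD with a Gaussian RBF kernel. The kernel choice and the bandwidth `sigma` are assumptions of this sketch. It computes only the statistic, not the test thresholds (large-deviation bounds or asymptotic null distribution) described above:

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian RBF kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_unbiased(xs, ys, sigma=1.0):
    """Unbiased estimate of MMD^2 between samples xs and ys (lists of vectors).

    Uses O((m + n)^2) kernel evaluations; within-sample sums exclude i == j.
    """
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("each sample needs at least two points")
    k_xx = sum(gaussian_kernel(xs[i], xs[j], sigma)
               for i in range(m) for j in range(m) if i != j)
    k_yy = sum(gaussian_kernel(ys[i], ys[j], sigma)
               for i in range(n) for j in range(n) if i != j)
    k_xy = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)
```

Samples from the same distribution give values near zero (the unbiased estimate can be slightly negative), while well-separated samples give clearly positive values.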
In Computer Vision - ECCV 2012, LNCS Vol. 7574, pages: 187-200, (Editors: A Fitzgibbon, S Lazebnik, P Perona, Y Sato, and C Schmid), Springer, Berlin, Germany, 12th IEEE European Conference on Computer Vision, ECCV, 2012 (inproceedings)
Camera lenses are a critical component of optical imaging systems, and lens imperfections compromise image quality. While traditionally, sophisticated lens design and quality control aim at limiting optical aberrations, recent works [1,2,3] promote the correction of optical flaws by computational means. These approaches rely on elaborate measurement procedures to characterize an optical system, and perform image correction by non-blind deconvolution.
In this paper, we present a method that utilizes physically plausible assumptions to estimate non-stationary lens aberrations blindly, and thus can correct images without knowledge of the specific camera and lens. The blur estimation features a novel preconditioning step that enables fast deconvolution. We obtain results that are competitive with state-of-the-art non-blind approaches.
In Advances in Neural Information Processing Systems 25, pages: 189-196, (Editors: P Bartlett, FCN Pereira, CJC. Burges, L Bottou, and KQ Weinberger), Curran Associates Inc., 26th Annual Conference on Neural Information Processing Systems (NIPS), 2012 (inproceedings)
(3), Max-Planck-Institut für Intelligente Systeme, Tübingen, February 2012 (techreport)
Subjects operating a brain-computer interface (BCI) based on sensorimotor rhythms exhibit large variations in performance over the course of an experimental session. Here, we show that high-frequency gamma-oscillations, originating in fronto-parietal networks, predict such variations on a trial-to-trial basis. We interpret this finding as empirical support for an influence of attentional networks on BCI performance via modulation of the sensorimotor rhythm.
In Proceedings of Robotics: Science and Systems VIII, pages: 8, R:SS, 2012 (inproceedings)
Inference of human intention may be an essential step towards understanding human actions and is hence important for realizing efficient human-robot interaction. In this paper, we propose the Intention-Driven Dynamics Model (IDDM), a latent variable model for inferring unknown human intentions. We train the model based on observed human behaviors/actions and introduce an approximate inference algorithm to efficiently infer the human’s intention from an ongoing action.
We verify the feasibility of the IDDM in two scenarios, namely target inference in robot table tennis and action recognition for interactive humanoid robots. In both tasks, the IDDM achieves substantial improvements over state-of-the-art regression and classification.
20th Annual Scientific Meeting ISMRM, May 2012 (poster)
Patient motion in the scanner is one of the most challenging problems in MRI. We propose a new retrospective motion correction method for which no tracking devices or specialized sequences are required. We seek the motion parameters such that the image gradients in the spatial domain become sparse. We then use these parameters to invert the motion and recover the sharp image. In our experiments we acquired 2D TSE images and 3D FLASH/MPRAGE volumes of the human head. Major quality improvements are possible in the 2D case and substantial improvements in the 3D case.
In Advances in Neural Information Processing Systems 25, pages: 10-18, (Editors: P Bartlett, FCN Pereira, CJC. Burges, L Bottou, and KQ Weinberger), Curran Associates Inc., 26th Annual Conference on Neural Information Processing Systems (NIPS), 2012 (inproceedings)
Artificial Intelligence, 182-183, pages: 1-31, May 2012 (article)
While conventional approaches to causal inference are mainly based on conditional (in)dependences, recent methods also account for the shape of (conditional) distributions. The idea is that the causal hypothesis “X causes Y” imposes that the marginal distribution P_X and the conditional distribution P_{Y|X} represent independent mechanisms of nature. Recently it has been postulated that the shortest description of the joint distribution P_{X,Y} should therefore be given by separate descriptions of P_X and P_{Y|X}. Since description length in the sense of Kolmogorov complexity is uncomputable, practical implementations rely on other notions of independence. Here we define independence via orthogonality in information space. This way, we can explicitly describe the kind of dependence that occurs between P_Y and P_{X|Y}, making the causal hypothesis “Y causes X” implausible. Remarkably, this asymmetry between cause and effect becomes particularly simple if X and Y are deterministically related. We present an inference method that works in this case. We also discuss some theoretical results for the non-deterministic case, although it is not clear how to employ them for a more general inference method.
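For the deterministic case, a commonly used estimator from this information-geometric (IGCI) line of work is the slope-based score: with a uniform reference measure, C_{X→Y} is the average log-slope of the relation when traversed along sorted X, and the direction with the smaller score is preferred. A minimal sketch, assuming both variables are rescaled to [0, 1], the relation is monotonic, and ties are simply skipped; it is an illustration, not a full reimplementation of the method above, and the function names are hypothetical:

```python
import math


def _normalize(values):
    """Rescale to [0, 1] (uniform reference measure on the observed range)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("variable is constant")
    return [(v - lo) / (hi - lo) for v in values]


def _slope_score(x, y):
    """Mean of log|dy/dx| over neighbouring points sorted by x."""
    pairs = sorted(zip(x, y))
    total, count = 0.0, 0
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx != 0 and dy != 0:  # skip ties, where the log-slope is undefined
            total += math.log(abs(dy / dx))
            count += 1
    return total / count


def igci_direction(x, y):
    """Return 'X->Y' if the slope score favours X causing Y, else 'Y->X'."""
    x, y = _normalize(x), _normalize(y)
    c_xy = _slope_score(x, y)
    c_yx = _slope_score(y, x)
    return "X->Y" if c_xy < c_yx else "Y->X"
```

For y = x³ with x uniform on [0, 1], C_{X→Y} ≈ E[log 3x²] = log 3 - 2 < 0, while C_{Y→X} is its negative, so the sketch prefers X→Y.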
Kam-Thong, T., Azencott, C., Cayton, L., Pütz, B., Altmann, A., Karbalai, N., Sämann, P., Schölkopf, B., Müller-Myhsok, B., Borgwardt, K.
Human Heredity, 73(4):220-236, September 2012 (article)
Due to recent advances in genotyping technologies, mapping phenotypes to single loci in the genome has become a standard technique in statistical genetics. However, one-locus mapping fails to explain much of the phenotypic variance in complex traits. Here, we present GLIDE, which maps phenotypes to pairs of genetic loci and systematically searches for the epistatic interactions expected to reveal part of this missing heritability. GLIDE makes use of the computational power of consumer-grade graphics cards to detect such interactions via linear regression. This enabled us to conduct a systematic two-locus mapping study on seven disease data sets from the Wellcome Trust Case Control Consortium and on in-house hippocampal volume data in 6 h per data set, while current single CPU-based approaches require more than a year’s time to complete the same task.
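The per-pair computation can be illustrated with an ordinary least-squares fit that includes a multiplicative interaction term, y ≈ b0 + b1·g1 + b2·g2 + b3·g1·g2, with genotypes g1, g2 coded 0/1/2. This is a minimal CPU sketch under that assumed coding and model; GLIDE's exact statistic and its GPU implementation are not reproduced here, and all names are hypothetical:

```python
import math


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a is not modified)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_two_locus(g1, g2, y):
    """Return ([b0, b1, b2, b3], t-statistic of the interaction coefficient b3)."""
    rows = [[1.0, a, b, a * b] for a, b in zip(g1, g2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(4)]
    coef = _solve(xtx, xty)
    # Residual variance with n - 4 degrees of freedom.
    rss = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2 for r, yi in zip(rows, y))
    sigma2 = rss / (len(y) - 4)
    # Diagonal entry of (X^T X)^{-1} belonging to the interaction coefficient.
    inv_33 = _solve(xtx, [0.0, 0.0, 0.0, 1.0])[3]
    se = math.sqrt(sigma2 * inv_33)
    # The t-statistic is undefined for a perfect fit; NaN is returned then.
    t = coef[3] / se if se > 0 else math.nan
    return coef, t
```

A two-locus scan would repeat this fit for every pair of loci, which is the quadratic workload the abstract moves onto graphics cards.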
In Computer Vision - ECCV 2012, LNCS Vol. 7578, pages: 27-40, (Editors: A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato, and C. Schmid), Springer, Berlin, Germany, 12th European Conference on Computer Vision, ECCV , 2012 (inproceedings)
Motion blur due to camera shake is one of the predominant sources of degradation in handheld photography. Single image blind deconvolution (BD) or motion deblurring aims at restoring a sharp latent image from the blurred recorded picture without knowing the camera motion that took place during the exposure. BD is a long-standing problem, but has attracted much attention recently, culminating in several algorithms able to restore photos degraded by real camera motion with high quality. In this paper, we present a benchmark dataset for motion deblurring that allows quantitative performance evaluation and comparison of recent approaches featuring non-uniform blur models. To this end, we record and analyse real camera motion, which is played back on a robot platform such that we can record a sequence of sharp images sampling the six-dimensional camera motion trajectory. The goal of deblurring is to recover one of these sharp images, and our dataset contains all information to assess how closely various algorithms approximate that goal. In a comprehensive comparison, we evaluate state-of-the-art single image BD algorithms incorporating uniform and non-uniform blur models.
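Because the dataset stores a sequence of sharp frames rather than a single ground truth, one natural way to score a deblurred result is to compare it against every recorded sharp frame and keep the best match. A minimal sketch using PSNR on images flattened to lists of intensities in [0, 1]; the choice of PSNR, the max-over-frames rule, and the omission of any alignment step are assumptions here, not necessarily the benchmark's exact protocol:

```python
import math


def psnr(estimate, reference, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized flat images."""
    if len(estimate) != len(reference):
        raise ValueError("images must have the same number of pixels")
    mse = sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def best_frame_psnr(estimate, sharp_frames, peak=1.0):
    """Score an estimate by its PSNR to the closest recorded sharp frame."""
    return max(psnr(estimate, frame, peak) for frame in sharp_frames)
```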
Journal of Neural Engineering, 9(4):046001, May 2012 (article)
Subjects operating a brain–computer interface (BCI) based on sensorimotor rhythms exhibit large variations in performance over the course of an experimental session. Here, we show that high-frequency γ-oscillations, originating in fronto-parietal networks, predict such variations on a trial-to-trial basis. We interpret this finding as empirical support for an influence of attentional networks on BCI performance via modulation of the sensorimotor rhythm.
Lopez-Paz, D., Hernandez-Lobato, J., Schölkopf, B. Semi-Supervised Domain Adaptation with Copulas
In Advances in Neural Information Processing Systems 25, pages: 674-682, (Editors: P Bartlett, FCN Pereira, CJC. Burges, L Bottou, and KQ Weinberger), Curran Associates Inc., 26th Annual Conference on Neural Information Processing Systems (NIPS), 2012 (inproceedings)
In Advances in Neural Information Processing Systems 24, pages: 765-773, (Editors: Shawe-Taylor, John and Zemel, Richard S. and Bartlett, Peter L. and Pereira, Fernando C. N. and Weinberger, Kilian Q.), Curran Associates, Inc., Red Hook, NY, USA, Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)
We address the challenging task of decoupling material properties from lighting properties given a single image. In the last two decades virtually all works have concentrated on exploiting edge information to address this problem. We take a different route by introducing a new prior on reflectance that models reflectance values as being drawn from a sparse set of basis colors. This results in a Random Field model with global, latent variables (basis colors) and pixel-accurate output reflectance values. We show that high-quality results can be achieved without edge information, on par with methods exploiting this source of information. Finally, we are able to improve on state-of-the-art results by integrating edge information into our model. We believe that our new approach is an excellent starting point for future developments in this field.
Neural Computation, 23(1):160-182, January 2011 (article)
We present a graphical model framework for decoding in the visual ERP-based speller system. The proposed framework allows researchers to build generative models from which the decoding rules are obtained in a straightforward manner. We suggest two models for generating brain signals conditioned on the stimulus events. Both models incorporate letter frequency information but assume different dependencies between brain signals and stimulus events. For both models, we derive decoding rules and perform a discriminative training. We show on real visual speller data how decoding performance improves by incorporating letter frequency information and using a more realistic graphical model for the dependencies between the brain signals and the stimulus events. Furthermore, we discuss how the standard approach to decoding can be seen as a special case of the graphical model framework. The latter also gives more insight into the discriminative approach for decoding in the visual speller system.
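As an illustration of how letter frequencies can enter a decoding rule, here is a minimal sketch for a row/column speller under simplifying assumptions that are not taken from the models above: each flash has a classifier score already summed over repetitions and interpreted as a log-likelihood ratio of target versus non-target, flashes are treated as conditionally independent, and `prior` is a hypothetical letter-frequency table. With a uniform prior it reduces to picking the highest-scoring row and column:

```python
import math


def decode_letter(grid, row_scores, col_scores, prior):
    """Return the letter maximizing log prior + row score + column score.

    grid: list of rows of letters; row_scores / col_scores: one summed
    log-likelihood-ratio score per row / column flash; prior: letter -> probability.
    """
    best_letter, best_score = None, -math.inf
    for r, row in enumerate(grid):
        for c, letter in enumerate(row):
            # Log posterior up to a constant: prior plus evidence from the
            # two flashes (its row and its column) that contain this letter.
            score = math.log(prior[letter]) + row_scores[r] + col_scores[c]
            if score > best_score:
                best_letter, best_score = letter, score
    return best_letter
```

When the brain-signal evidence is weak, the prior dominates and frequent letters win; strong evidence overrides it.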
In pages: 2080-2083, IEEE, Piscataway, NJ, USA, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2011 (inproceedings)
Cross-spectral density (CSD) is widely used to find linear dependency between two real- or complex-valued time series. We define a non-linear extension of this measure by mapping the time series into two Reproducing Kernel Hilbert Spaces. The dependency is quantified by the Hilbert-Schmidt norm of a cross-spectral density operator between these two spaces. We prove that, by choosing a characteristic kernel for the mapping, this quantity detects any pairwise dependency between the time series. Then we provide a fast estimator for the Hilbert-Schmidt norm based on the Fast Fourier Transform. We demonstrate the usefulness of this approach for quantifying non-linear dependencies between frequency bands of simulated signals and intra-cortical neural recordings.
Journal of Computational Biology, 18(3):335-346, March 2011 (article)
Cryo-electron microscopy (cryo-EM) plays an increasingly prominent role in structure elucidation of macromolecular assemblies. Advances in experimental instrumentation and computational power have spawned numerous cryo-EM studies of large biomolecular complexes resulting in the reconstruction of three-dimensional density maps at intermediate and low resolution. In this resolution range, identification and interpretation of structural elements and modeling of biomolecular structure with atomic detail becomes problematic. In this article, we present a novel algorithm that enhances the resolution of intermediate- and low-resolution density maps. Our underlying assumption is to model the low-resolution density map as a blurred and possibly noise-corrupted version of an unknown high-resolution map that we seek to recover by deconvolution. By exploiting the nonnegativity of both the high-resolution map and blur kernel, we derive multiplicative updates reminiscent of those used in nonnegative matrix factorization. Our framework allows for easy incorporation of additional prior knowledge such as smoothness and sparseness, on both the sharpened density map and the blur kernel. A probabilistic formulation enables us to derive updates for the hyperparameters; therefore, our approach has no parameter that needs adjustment. We apply the algorithm to simulated three-dimensional electron microscopic data. We show that our method provides better resolved density maps when compared with B-factor sharpening, especially in the presence of noise. Moreover, our method can use additional information provided by homologous structures, which helps to improve the resolution even further.
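The multiplicative update can be illustrated in its simplest form. The sketch below is the classic Richardson-Lucy (KL-divergence, NMF-style) update for a nonnegative 1D signal with a known, fixed blur kernel of odd length. Unlike the method described above, it does not update the kernel, uses no smoothness or sparsity priors, and has no hyperparameter updates; the names are hypothetical:

```python
def convolve(x, k):
    """'Same'-size 1D convolution with zero padding (odd-length kernel k)."""
    r, n = len(k) // 2, len(x)
    out = [0.0] * n
    for i in range(n):
        for j, kj in enumerate(k):
            idx = i + r - j
            if 0 <= idx < n:
                out[i] += kj * x[idx]
    return out


def correlate(y, k):
    """Adjoint of convolve(): 'same'-size correlation with zero padding."""
    r, n = len(k) // 2, len(y)
    out = [0.0] * n
    for i in range(n):
        for j, kj in enumerate(k):
            idx = i - r + j
            if 0 <= idx < n:
                out[i] += kj * y[idx]
    return out


def multiplicative_deconvolve(y, k, iterations=100, eps=1e-12):
    """Richardson-Lucy updates; x stays nonnegative because every factor is >= 0."""
    x = [max(sum(y) / len(y), eps)] * len(y)   # flat positive initial guess
    norm = correlate([1.0] * len(y), k)        # K^T 1, corrects for the borders
    for _ in range(iterations):
        blurred = convolve(x, k)
        ratio = [yi / (bi + eps) for yi, bi in zip(y, blurred)]
        back = correlate(ratio, k)
        x = [xi * bi / (ni + eps) for xi, bi, ni in zip(x, back, norm)]
    return x
```

Starting from a flat positive estimate, each update rescales x pointwise, so nonnegativity never has to be enforced separately; this is the property the abstract exploits for both the density map and the blur kernel.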
In Proceedings of the 58th World Statistics Congress, pages: 4456-4461, ISI, August 2011 (inproceedings)
We develop a novel method for detection of signals and reconstruction of images in the presence of random noise. The method uses results from percolation theory. We specifically address the problem of detection of multiple objects of unknown shapes in the case of nonparametric noise. The noise density is unknown and can be heavy-tailed. The objects of interest have unknown varying intensities. No boundary shape constraints are imposed on the objects, only a set of weak bulk conditions is required. We view the object detection problem as hypothesis testing for discrete statistical inverse problems. We present an algorithm that detects greyscale objects of various shapes in noisy images. We prove results on consistency and algorithmic complexity of our procedures. Applications to cryo-electron microscopy are presented.
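The percolation idea can be illustrated with a toy detector: threshold the image, find 4-connected clusters of above-threshold pixels, and declare a detection if some cluster is larger than pure noise would plausibly produce. In the method above, the threshold and the critical cluster size come from percolation theory so that false detections are controlled; in this sketch `threshold` and `min_cluster` are hypothetical inputs chosen by hand:

```python
from collections import deque


def largest_cluster(image, threshold):
    """Size of the largest 4-connected cluster of pixels >= threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or image[r0][c0] < threshold:
                continue
            # Breadth-first search over the cluster containing (r0, c0).
            size, queue = 0, deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                size += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols and not seen[rr][cc]
                            and image[rr][cc] >= threshold):
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            best = max(best, size)
    return best


def detect(image, threshold, min_cluster):
    """Declare an object present if some cluster reaches min_cluster pixels."""
    return largest_cluster(image, threshold) >= min_cluster
```

No shape constraint enters the test, only cluster size, which mirrors the weak bulk conditions mentioned in the abstract.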
In pages: 383-391, (Editors: FG Cozman and A Pfeffer), AUAI Press, Corvallis, OR, USA, 27th Conference on Uncertainty in Artificial Intelligence (UAI), July 2011 (inproceedings)
We describe a method that infers whether statistical dependences between two observed variables X and Y are due to a "direct" causal link or only due to a connecting causal path that contains an unobserved variable of low complexity, e.g., a binary variable. This problem is motivated by statistical genetics. Given a genetic marker that is correlated with a phenotype of interest, we want to detect whether this marker is causal or only correlates with a causal one. Our method is based on the analysis of the location of the conditional distributions P(Y|x) in the simplex of all distributions of Y. We report encouraging results on semi-empirical data.
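The geometric idea can be made concrete. If X influences Y only through a binary hidden variable Z (with Y independent of X given Z), then P(Y|x) = P(Z=0|x) P(Y|Z=0) + P(Z=1|x) P(Y|Z=1), so every conditional distribution lies on the segment between P(Y|Z=0) and P(Y|Z=1). A minimal sketch for a ternary Y measures how close the conditionals are to a straight line via the eigenvalues of their 2D scatter; it ignores finite-sample effects, is not the statistical procedure of the paper, and the function name is hypothetical:

```python
import math


def line_fit_ratio(points):
    """Smaller/larger eigenvalue of the scatter of distributions over 3 outcomes.

    Only the first two coordinates are used, since the third is determined by
    them. A ratio near 0 means the points lie (almost) on a line in the simplex.
    """
    n = len(points)
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) ** 2 for x in xs) / n
    c = sum((y - my) ** 2 for y in ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    # Closed-form eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]].
    half_trace = (a + c) / 2.0
    disc = math.sqrt(max(half_trace ** 2 - (a * c - b * b), 0.0))
    big, small = half_trace + disc, max(half_trace - disc, 0.0)
    return small / big if big > 0 else 0.0
```

Conditionals produced by a binary mediator give a ratio close to zero; a generic direct dependence spreads them over a two-dimensional region of the simplex.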
In pages: 332-337, (Editors: NM Amato), IEEE, Piscataway, NJ, USA, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 2011 (inproceedings)
Playing table tennis is a difficult task for robots, especially due to their limited acceleration. A key bottleneck is the amount of time needed to reach the desired hitting position and velocity of the racket for returning the incoming ball. Here, it often does not suffice to simply extrapolate the ball's trajectory after the opponent returns it; more information is needed. Humans are able to predict the ball's trajectory based on the opponent's moves and, thus, have a considerable advantage. Hence, we propose to incorporate an anticipation system into robot table tennis players, which enables the robot to react earlier while the opponent is performing the striking movement. Based on visual observation of the opponent's racket movement, the robot can predict the aim of the opponent and adjust its movement generation accordingly. The policies for deciding how and when to react are obtained by reinforcement learning. We conduct experiments with an existing robot player to show that the learned reaction policy can significantly improve the performance of the overall system.
2011(MIC18.M-96), 2011 IEEE Nuclear Science Symposium, Medical Imaging Conference (NSS-MIC), October 2011 (poster)
Combined PET/MR provides simultaneous molecular and functional information in an anatomical context with unique soft tissue contrast. However, PET/MR does not support direct derivation of attenuation maps of objects and tissues within the measured PET field-of-view. Valid attenuation maps are required for quantitative PET imaging, specifically for scientific brain studies. Therefore, several methods have been proposed for MR-based attenuation correction (MR-AC). Last year, we performed an evaluation of different MR-AC methods, including simple MR thresholding, atlas- and machine learning-based MR-AC. CT-based AC served as gold standard reference. RoIs from 2 anatomic brain atlases with different levels of detail were used for evaluation of correction accuracy. We now extend our evaluation of different MR-AC methods by using an enlarged dataset of 23 patients from the integrated BrainPET/MR (Siemens Healthcare). Further, we analyze options for improving the MR-AC performance in terms of speed and accuracy. Finally, we assess the impact of ignoring BrainPET positioning aids during the course of MR-AC. This extended study confirms the overall prediction accuracy evaluation results of the first evaluation in a larger patient population. Removing datasets affected by metal artifacts from the Atlas-Patch database helped to improve prediction accuracy, although the size of the database was halved. Significant improvement in prediction speed can be gained at a cost of only slightly reduced accuracy, while further optimizations are still possible.
Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments and to use this understanding to design future systems.