Felix Wichmann is Professor for Neural Information Processing at the Eberhard Karls Universität Tübingen and an adjunct senior research scientist at the MPI-IS. For details about his research go to the WWW-pages of the Wichmann-Lab at the Eberhard Karls Universität Tübingen.
Felix Wichmann's work explores synergies between machine learning and psychophysics. He investigates perceptual processes from the level of simple artificial stimuli up to complex object and scene perception, combining psychophysical experiments and computational modeling driven by methods of machine learning and inference. Currently he has three main research foci: First, to uncover the critical features observers use in complex perceptual tasks. Second, to develop a computationally efficient model of early spatial vision. Finally we have begun to investigate the visual perception of causality, particularly whether human observers are sensitive to "the arrow of time."
Barthelmé, S., Trukenbrod, H., Engbert, R., Wichmann, F.
Journal of Vision, 13(12):1-34, 2013 (article)
Whenever eye movements are measured, a central part of the analysis has to do with where subjects fixate and why they fixated where they fixated. To a first approximation, a set of fixations can be viewed as a set of points in space; this implies that fixations are spatial data and that the analysis of fixation locations can be beneficially thought of as a spatial statistics problem. We argue that thinking of fixation locations as arising from point processes is a very fruitful framework for eye-movement data, helping turn qualitative questions into quantitative ones. We provide a tutorial introduction to some of the main ideas of the field of spatial statistics, focusing especially on spatial Poisson processes. We show how point processes help relate image properties to fixation locations. In particular we show how point processes naturally express the idea that image features' predictability for fixations may vary from one image to another. We review other methods of analysis used in the literature, show how they relate to point process theory, and argue that thinking in terms of point processes substantially extends the range of analyses that can be performed and clarify their interpretation.
PLoS Computational Biology, 9(1):e1002873, January 2013 (article)
Several aspects of primate visual physiology have been identified as adaptations to local regularities of natural images. However, much less work has measured visual sensitivity to local natural image regularities. Most previous work focuses on global perception of large images and shows that observers are more sensitive to visual information when image properties resemble those of natural images. In this work we measure human sensitivity to local natural image regularities using stimuli generated by patch-based probabilistic natural image models that have been related to primate visual physiology. We find that human observers can learn to discriminate the statistical regularities of natural image patches from those represented by current natural image models after very few exposures and that discriminability depends on the degree of regularities captured by the model. The quick learning we observed suggests that the human visual system is biased for processing natural images, even at very fine spatial scales, and that it has a surprisingly large knowledge of the regularities in natural images, at least in comparison to the state-of-the-art statistical models of natural images.
PLoS Computational Biology, 8(4):1-13, April 2012 (article)
Several studies have reported optimal population decoding of sensory responses in two-alternative visual discrimination tasks. Such decoding involves integrating noisy neural responses into a more reliable representation of the likelihood that the stimuli under consideration evoked the observed responses. Importantly, an ideal observer must be able to evaluate likelihood with high precision and only consider the likelihood of the two relevant stimuli involved in the discrimination task. We report a new perceptual bias suggesting that observers read out the likelihood representation with remarkably low precision when discriminating grating spatial frequencies. Using spectrally filtered noise, we induced an asymmetry in the likelihood function of spatial frequency. This manipulation mainly affects the likelihood of spatial frequencies that are irrelevant to the task at hand. Nevertheless, we find a significant shift in perceived grating frequency, indicating that observers evaluate likelihoods of a broad range of irrelevant frequencies and discard prior knowledge of stimulus alternatives when performing two-alternative discrimination.
Journal of the Acoustical Society of America, 131(5):3953-3969, May 2012 (article)
As a prerequisite to quantitative psychophysical models of sensory processing it is necessary to learn to what extent decisions in behavioral tasks depend on specific stimulus features, the perceptual cues. Based on relative linear combination weights, this study demonstrates how stimulus-response data can be analyzed in this regard relying on an L1-regularized multiple logistic regression, a modern statistical procedure developed in machine learning. This method prevents complex models from over-fitting to noisy data. In addition, it enforces “sparse” solutions, a computational approximation to the postulate that a good model should contain the minimal set of predictors necessary to explain the data. In simulations, behavioral data from a classical auditory tone-in-noise detection task were generated. The proposed method is shown to precisely identify observer cues from a large set of covarying, interdependent stimulus features—a setting where standard correlational and regression methods fail. The proposed method succeeds for a wide range of signal-to-noise ratios and for deterministic as well as probabilistic observers. Furthermore, the detailed decision rules of the simulated observers were reconstructed from the estimated linear model weights allowing predictions of responses on the basis of individual stimuli.
Measuring sensitivity is at the heart of psychophysics. Often, sensitivity is derived from estimates of the psychometric function. This function relates response probability to stimulus intensity. In estimating these response probabilities, most studies assume stationary observers: Responses are expected to be dependent only on the intensity of a presented stimulus and not on other factors such as stimulus sequence, duration of the experiment, or the responses on previous trials. Unfortunately, a number of factors such as learning, fatigue, or fluctuations in attention and motivation will typically result in violations of this assumption. The severity of these violations is yet unknown. We use Monte Carlo simulations to show that violations of these assumptions can result in underestimation of confidence intervals for parameters of the psychometric function. Even worse, collecting more trials does not eliminate this misestimation of confidence intervals. We present a simple adjustment of the confidence intervals that corrects for the underestimation almost independently of the number of trials and the particular type of violation.
Journal of Vision, 10(4):1-27, April 2010 (article)
S. J. Thorpe, D. Fize, and C. Marlot (1996) showed how rapidly observers can detect animals in images of natural scenes, but it is still unclear which image features support this rapid detection. A. B. Torralba and A. Oliva (2003) suggested that a simple image statistic based on the power spectrum allows the absence or presence of objects in natural scenes to be predicted. We tested whether human observers make use of power spectral differences between image categories when detecting animals in natural scenes. In Experiments 1 and 2 we found performance to be essentially independent of the power spectrum. Computational analysis revealed that the ease of classification correlates with the proposed spectral cue without being caused by it. This result is consistent with the hypothesis that in commercial stock photo databases a majority of animal images are pre-segmented from the background by the photographers and this pre-segmentation causes the power spectral differences between image categories and may, furthermore, help rapid animal detection. Data from a third experiment are consistent with this hypothesis. Together, our results make it exceedingly unlikely that human observers make use of power spectral differences between animal- and no-animal images during rapid animal detection. In addition, our results point to potential confounds in the commercially available “natural image” databases whose statistics may be less natural than commonly presumed.
Journal of Vision, 10(5:22):1-24, May 2010 (article)
One major challenge in the sensory sciences is to identify the stimulus features on which sensory systems base their computations, and which are predictive of a behavioral decision: they are a prerequisite for computational models of perception. We describe a technique (decision images) for extracting predictive stimulus features using logistic regression. A decision image not only defines a region of interest within a stimulus but is a quantitative template which defines a direction in stimulus space. Decision images thus enable the development of predictive models, as well as the generation of optimized stimuli for subsequent psychophysical investigations. Here we describe our method and apply it to data from a human face classification experiment. We show that decision images are able to predict human responses not only in terms of overall percent correct but also in terms of the probabilities with which individual faces are (mis-) classified by individual observers. We show that the most predictive dimension for gender categorization is neither aligned with the axis defined by the two class-means, nor with the first principal component of all faces-two hypotheses frequently entertained in the literature. Our method can be applied to a wide range of binary classification tasks in vision or other psychophysical contexts.
Trends in Cognitive Sciences, 13(9):381-388, September 2009 (article)
Kernel methods are among the most successful tools in machine learning and are used in challenging data analysis problems in many disciplines. Here we provide examples where kernel methods have proven to be powerful tools for analyzing behavioral data, especially for identifying features in categorization experiments. We also demonstrate that kernel methods relate to perceptrons and exemplar models of categorization. Hence, we argue that kernel methods have neural and psychological plausibility, and theoretical results concerning their behavior are therefore potentially relevant for human category learning. In particular, we believe kernel methods have the potential to provide explanations ranging from the implementational via the algorithmic to the computational level.
Journal of Vision, 9(8):31, 9th Annual Meeting of the Vision Sciences Society (VSS), August 2009 (poster)
One of the main challenges in the sensory sciences is to identify the stimulus features on which the sensory systems base their computations: they are a pre-requisite for computational models of perception. We describe a technique---decision-images--- for extracting critical stimulus features based on logistic regression. Rather than embedding the stimuli in noise, as is done in classification image analysis, we want to infer the important features directly from physically heterogeneous stimuli. A Decision-image not only defines the critical region-of-interest within a stimulus but is a quantitative template which defines a direction in stimulus space. Decision-images thus enable the development of predictive models, as well as the generation of optimized stimuli for subsequent psychophysical investigations. Here we describe our method and apply it to data from a human face discrimination experiment. We show that decision-images are able to predict human responses not only in terms of overall percent correct but are able to predict, for individual observers, the probabilities with which individual faces are (mis-) classified. We then test the predictions of the models using optimized stimuli. Finally, we discuss possible generalizations of the approach and its relationships with other models.
Journal of Vision, 9(5:7):1-15, May 2009 (article)
The human visual system is foveated, that is, outside the central visual field resolution and acuity drop rapidly. Nonetheless much of a visual scene is perceived after only a few saccadic eye movements, suggesting an effective strategy for selecting saccade targets. It has been known for some time that local image structure at saccade targets influences the selection process. However, the question of what the most relevant visual features are is still under debate. Here we show that center-surround patterns emerge as the optimal solution for predicting saccade targets from their local image structure. The resulting model, a one-layer feed-forward network, is surprisingly simple compared to previously suggested models which assume much more complex computations such as multi-scale processing and multiple feature channels. Nevertheless, our model is equally predictive. Furthermore, our findings are consistent with neurophysiological hardware in the superior colliculus. Bottom-up visual saliency may thus not be computed cortically as has been thought previously.
Journal of Vision, 9(8):article 32, 2009 (article)
For simple visual patterns under the experimenter's control we impose which information, or features, an observer can use to solve a given perceptual task. For natural vision tasks, however, there are typically a multitude of potential features in a given visual scene which the visual system may be exploiting when analyzing it: edges, corners, contours, etc. Here we describe a novel non-linear system identification technique based on modern machine learning methods that allows the critical features an observer uses to be inferred directly from the observer's data. The method neither requires stimuli to be embedded in noise nor is it limited to linear perceptive fields (classification images). We demonstrate our technique by deriving the critical image features observers fixate in natural scenes (bottom-up visual saliency). Unlike previous studies where the relevant structure is determined manuallyâ€”e.g. by selecting Gabors as visual filtersâ€”we do not make any assumptions in this regard, but numerically infer number and properties them from the eye-movement data. We show that center-surround patterns emerge as the optimal solution for predicting saccade targets from local image structure. The resulting model, a one-layer feed-forward network with contrast gain-control, is surprisingly simple compared to previously suggested saliency models. Nevertheless, our model is equally predictive. Furthermore, our findings are consistent with neurophysiological hardware in the superior colliculus. Bottom-up visual saliency may thus not be computed cortically as has been thought previously.
Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments and to use this understanding to design future systems