2018


Probabilistic Solutions To Ordinary Differential Equations As Non-Linear Bayesian Filtering: A New Perspective

Tronarp, F., Kersting, H., Särkkä, S., Hennig, P.

arXiv preprint 2018, arXiv:1810.03440 [stat.ME], October 2018 (article)

Abstract
We formulate probabilistic numerical approximations to solutions of ordinary differential equations (ODEs) as problems in Gaussian process (GP) regression with non-linear measurement functions. This is achieved by defining the measurement sequence to consist of the observations of the difference between the derivative of the GP and the vector field evaluated at the GP---which are all identically zero at the solution of the ODE. When the GP has a state-space representation, the problem can be reduced to a Bayesian state estimation problem and all widely-used approximations to the Bayesian filtering and smoothing problems become applicable. Furthermore, all previous GP-based ODE solvers, which were formulated in terms of generating synthetic measurements of the vector field, come out as specific approximations. We derive novel solvers, both Gaussian and non-Gaussian, from the Bayesian state estimation problem posed in this paper and compare them with other probabilistic solvers in illustrative experiments.

pn

link (url) Project Page [BibTex]

Convergence Rates of Gaussian ODE Filters

Kersting, H., Sullivan, T. J., Hennig, P.

arXiv preprint 2018, arXiv:1807.09737 [math.NA], July 2018 (article)

Abstract
A recently-introduced class of probabilistic (uncertainty-aware) solvers for ordinary differential equations (ODEs) applies Gaussian (Kalman) filtering to initial value problems. These methods model the true solution $x$ and its first $q$ derivatives a priori as a Gauss--Markov process $\boldsymbol{X}$, which is then iteratively conditioned on information about $\dot{x}$. We prove worst-case local convergence rates of order $h^{q+1}$ for a wide range of versions of this Gaussian ODE filter, as well as global convergence rates of order $h^q$ in the case of $q=1$ and an integrated Brownian motion prior, and analyse how inaccurate information on $\dot{x}$ coming from approximate evaluations of $f$ affects these rates. Moreover, we present explicit formulas for the steady states and show that the posterior confidence intervals are well calibrated in all considered cases that exhibit global convergence---in the sense that they globally contract at the same rate as the truncation error.
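As a rough illustration of the filtering idea described in this abstract, the following is a minimal sketch for a scalar ODE with $q=1$ and a once-integrated Wiener process prior. It is not the authors' implementation. The function name `ode_filter_ibm1`, the diffusion parameter `sigma`, the zero initial covariance, and the choice to evaluate the vector field at the predicted mean with zero measurement noise are all assumptions made for this example.

```python
import numpy as np

def ode_filter_ibm1(f, x0, t_end, n_steps, sigma=1.0):
    """Kalman-filter sketch for x'(t) = f(x(t)), x(0) = x0 (scalar case).

    State is (x, x'); the prior is a once-integrated Wiener process.
    Hypothetical helper for illustration only.
    """
    h = t_end / n_steps
    # Discrete-time transition and process noise of the integrated Wiener process.
    A = np.array([[1.0, h], [0.0, 1.0]])
    Q = sigma**2 * np.array([[h**3 / 3, h**2 / 2], [h**2 / 2, h]])
    # Assumption: the initial value and its derivative are known exactly.
    m = np.array([x0, f(x0)])
    P = np.zeros((2, 2))
    means, variances = [m[0]], [0.0]
    for _ in range(n_steps):
        # Prediction step under the Gauss-Markov prior.
        m_pred = A @ m
        P_pred = A @ P @ A.T + Q
        # "Observe" the derivative: vector field at the predicted mean.
        residual = f(m_pred[0]) - m_pred[1]
        S = P_pred[1, 1]              # innovation variance (no measurement noise)
        K = P_pred[:, 1] / S          # Kalman gain for H = [0, 1]
        m = m_pred + K * residual
        P = P_pred - np.outer(K, P_pred[1, :])
        means.append(m[0])
        variances.append(max(P[0, 0], 0.0))  # clip tiny negative round-off
    return np.array(means), np.array(variances)

# Usage on the test problem x' = -x, x(0) = 1, whose solution is exp(-t).
means, variances = ode_filter_ibm1(lambda x: -x, x0=1.0, t_end=1.0, n_steps=100)
final_error = abs(means[-1] - np.exp(-1.0))
```

The returned variances give the filter's own uncertainty about $x$ at each grid point; how well such uncertainties match the true error is the calibration question the abstract refers to.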

pn

link (url) Project Page [BibTex]

Rational metareasoning and the plasticity of cognitive control

Lieder, F., Shenhav, A., Musslick, S., Griffiths, T. L.

PLOS Computational Biology, 14(4):e1006043, Public Library of Science, April 2018 (article)

Abstract
The human brain has the impressive capacity to adapt how it processes information to high-level goals. While it is known that these cognitive control skills are malleable and can be improved through training, the underlying plasticity mechanisms are not well understood. Here, we develop and evaluate a model of how people learn when to exert cognitive control, which controlled process to use, and how much effort to exert. We derive this model from a general theory according to which the function of cognitive control is to select and configure neural pathways so as to make optimal use of finite time and limited computational resources. The central idea of our Learned Value of Control model is that people use reinforcement learning to predict the value of candidate control signals of different types and intensities based on stimulus features. This model correctly predicts the learning and transfer effects underlying the adaptive control-demanding behavior observed in an experiment on visual attention and four experiments on interference control in Stroop and Flanker paradigms. Moreover, our model explained these findings significantly better than an associative learning model and a Win-Stay Lose-Shift model. Our findings elucidate how learning and experience might shape people’s ability and propensity to adaptively control their minds and behavior. We conclude by predicting under which circumstances these learning mechanisms might lead to self-control failure.

re

Rational metareasoning and the plasticity of cognitive control DOI Project Page Project Page [BibTex]

Over-Representation of Extreme Events in Decision Making Reflects Rational Use of Cognitive Resources

Lieder, F., Griffiths, T. L., Hsu, M.

Psychological Review, 125(1):1-32, January 2018 (article)

Abstract
People’s decisions and judgments are disproportionately swayed by improbable but extreme eventualities, such as terrorism, that come to mind easily. This article explores whether such availability biases can be reconciled with rational information processing by taking into account the fact that decision-makers value their time and have limited cognitive resources. Our analysis suggests that to make optimal use of their finite time decision-makers should over-represent the most important potential consequences relative to less important, but potentially more probable, outcomes. To evaluate this account we derive and test a model we call utility-weighted sampling. Utility-weighted sampling estimates the expected utility of potential actions by simulating their outcomes. Critically, outcomes with more extreme utilities have a higher probability of being simulated. We demonstrate that this model can explain not only people’s availability bias in judging the frequency of extreme events but also a wide range of cognitive biases in decisions from experience, decisions from description, and memory recall.

re

DOI [BibTex]

Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences

Kanagawa, M., Hennig, P., Sejdinovic, D., Sriperumbudur, B. K.

arXiv e-prints, arXiv:1805.08845v1 [stat.ML], 2018 (article)

Abstract
This paper is an attempt to bridge the conceptual gaps between researchers working on the two widely used approaches based on positive definite kernels: Bayesian learning or inference using Gaussian processes on the one side, and frequentist kernel methods based on reproducing kernel Hilbert spaces on the other. It is widely known in machine learning that these two formalisms are closely related; for instance, the estimator of kernel ridge regression is identical to the posterior mean of Gaussian process regression. However, they have been studied and developed almost independently by two essentially separate communities, and this makes it difficult to seamlessly transfer results between them. Our aim is to overcome this potential difficulty. To this end, we review several old and new results and concepts from either side, and juxtapose algorithmic quantities from each framework to highlight close similarities. We also provide discussions on subtle philosophical and theoretical differences between the two approaches.
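The equivalence mentioned in this abstract, between the kernel ridge regression estimator and the posterior mean of Gaussian process regression, can be checked numerically. The sketch below is an illustration only and is not taken from the paper. The RBF kernel, the toy data, and the helper name `rbf_kernel` are assumptions, and the match relies on setting the ridge parameter to `noise_var / n` under the convention that the squared loss is averaged over the `n` training points.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

# Toy regression data (assumed for illustration).
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(20)
x_test = np.linspace(0.0, 5.0, 50)
noise_var = 0.01
n = len(x_train)

K = rbf_kernel(x_train, x_train)
K_star = rbf_kernel(x_test, x_train)

# GP regression posterior mean: k_*^T (K + noise_var * I)^{-1} y.
gp_mean = K_star @ np.linalg.solve(K + noise_var * np.eye(n), y_train)

# Kernel ridge regression with loss (1/n) * sum of squared errors + lam * ||f||^2.
# The representer theorem gives coefficients alpha = (K + n * lam * I)^{-1} y.
lam = noise_var / n
alpha = np.linalg.solve(K + n * lam * np.eye(n), y_train)
krr_pred = K_star @ alpha

# The two predictors agree up to floating-point round-off.
max_abs_diff = float(np.max(np.abs(gp_mean - krr_pred)))
```

Only the point predictions coincide here; the GP additionally provides a posterior covariance, which has no direct counterpart in this plain ridge regression computation.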

pn ei

arXiv [BibTex]

Counterfactual Mean Embedding: A Kernel Method for Nonparametric Causal Inference

Muandet, K., Kanagawa, M., Saengkyongam, S., Marukatat, S.

arXiv e-prints, arXiv:1805.08845v1 [stat.ML], 2018 (article)

Abstract
This paper introduces a novel Hilbert space representation of a counterfactual distribution---called counterfactual mean embedding (CME)---with applications in nonparametric causal inference. Counterfactual prediction has become a ubiquitous tool in machine learning applications, such as online advertisement, recommendation systems, and medical diagnosis, whose performance relies on certain interventions. To infer the outcomes of such interventions, we propose to embed the associated counterfactual distribution into a reproducing kernel Hilbert space (RKHS) endowed with a positive definite kernel. Under appropriate assumptions, the CME allows us to perform causal inference over the entire landscape of the counterfactual distribution. The CME can be estimated consistently from observational data without requiring any parametric assumption about the underlying distributions. We also derive a rate of convergence which depends on the smoothness of the conditional mean and the Radon-Nikodym derivative of the underlying marginal distributions. Our framework can deal with not only real-valued outcomes, but potentially also more complex and structured outcomes such as images, sequences, and graphs. Lastly, our experimental results on off-policy evaluation tasks demonstrate the advantages of the proposed estimator.

ei pn

arXiv [BibTex]

Model-based Kernel Sum Rule: Kernel Bayesian Inference with Probabilistic Models

Nishiyama, Y., Kanagawa, M., Gretton, A., Fukumizu, K.

arXiv e-prints, arXiv:1409.5178v2 [stat.ML], 2018 (article)

Abstract
Kernel Bayesian inference is a powerful nonparametric approach to performing Bayesian inference in reproducing kernel Hilbert spaces or feature spaces. In this approach, kernel means are estimated instead of probability distributions, and these estimates can be used for subsequent probabilistic operations (as for inference in graphical models) or in computing the expectations of smooth functions, for instance. Various algorithms for kernel Bayesian inference have been obtained by combining basic rules such as the kernel sum rule (KSR), kernel chain rule, kernel product rule and kernel Bayes' rule. However, the current framework only deals with fully nonparametric inference (i.e., all conditional relations are learned nonparametrically), and it does not allow for flexible combinations of nonparametric and parametric inference, which are practically important. Our contribution is in providing a novel technique to realize such combinations. We introduce a new KSR referred to as the model-based KSR (Mb-KSR), which employs the sum rule in feature spaces under a parametric setting. Incorporating the Mb-KSR into the existing kernel Bayesian framework provides a richer framework for hybrid (nonparametric and parametric) kernel Bayesian inference. As a practical application, we propose a novel filtering algorithm for state space models based on the Mb-KSR, which combines the nonparametric learning of an observation process using kernel mean embedding and the additive Gaussian noise model for a state transition process. While we focus on additive Gaussian noise models in this study, the idea can be extended to other noise models, such as the Cauchy and alpha-stable noise models.

pn

arXiv [BibTex]

A probabilistic model for the numerical solution of initial value problems

Schober, M., Särkkä, S., Hennig, P.

Statistics and Computing, Springer US, 2018 (article)

Abstract
We study connections between ordinary differential equation (ODE) solvers and probabilistic regression methods in statistics. We provide a new view of probabilistic ODE solvers as active inference agents operating on stochastic differential equation models that estimate the unknown initial value problem (IVP) solution from approximate observations of the solution derivative, as provided by the ODE dynamics. Adding to this picture, we show that several multistep methods of Nordsieck form can be recast as Kalman filtering on q-times integrated Wiener processes. Doing so provides a family of IVP solvers that return a Gaussian posterior measure, rather than a point estimate. We show that some such methods have low computational overhead, nontrivial convergence order, and that the posterior has a calibrated concentration rate. Additionally, we suggest a step size adaptation algorithm which completes the proposed method to a practically useful implementation, which we experimentally evaluate using a representative set of standard codes in the DETEST benchmark set.

pn

PDF Code DOI Project Page [BibTex]


The Computational Challenges of Pursuing Multiple Goals: Network Structure of Goal Systems Predicts Human Performance

Reichman, D., Lieder, F., Bourgin, D. D., Talmon, N., Griffiths, T. L.

PsyArXiv, 2018 (article)

Abstract
Extant psychological theories attribute people’s failure to achieve their goals primarily to failures of self-control, insufficient motivation, or lacking skills. We develop a complementary theory specifying conditions under which the computational complexity of making the right decisions becomes prohibitive of goal achievement regardless of skill or motivation. We support our theory by predicting human performance from factors determining the computational complexity of selecting the optimal set of means for goal achievement. Following previous theories of goal pursuit, we express the relationship between goals and means as a bipartite graph where edges between means and goals indicate which means can be used to achieve which goals. This allows us to map two computational challenges that arise in goal achievement onto two classic combinatorial optimization problems: Set Cover and Maximum Coverage. While these problems are believed to be computationally intractable on general networks, their solution can be nevertheless efficiently approximated when the structure of the network resembles a tree. Thus, our initial prediction was that people should perform better with goal systems that are more tree-like. In addition, our theory predicted that people’s performance at selecting means should be a U-shaped function of the average number of goals each means is relevant to and the average number of means through which each goal could be accomplished. Here we report on six behavioral experiments which confirmed these predictions. Our results suggest that combinatorial parameters that are instrumental to algorithm design can also be useful for understanding when and why people struggle to pursue their goals effectively.

re

DOI [BibTex]

2015


Probabilistic Interpretation of Linear Solvers

Hennig, P.

SIAM Journal on Optimization, 25(1):234-260, 2015 (article)

ei pn

Web PDF link (url) DOI [BibTex]

Probabilistic numerics and uncertainty in computations

Hennig, P., Osborne, M. A., Girolami, M.

Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 471(2179), 2015 (article)

Abstract
We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations. Such uncertainties, arising from the loss of precision induced by numerical calculation with limited time or hardware, are important for much contemporary science and industry. Within applications such as climate science and astrophysics, the need to make decisions on the basis of computations with large and complex data has led to a renewed focus on the management of numerical uncertainty. We describe how several seminal classic numerical methods can be interpreted naturally as probabilistic inference. We then show that the probabilistic view suggests new algorithms that can flexibly be adapted to suit application specifics, while delivering improved empirical performance. We provide concrete illustrations of the benefits of probabilistic numeric algorithms on real scientific problems from astrometry and astronomical imaging, while highlighting open problems with these new algorithms. Finally, we describe how probabilistic numerical methods provide a coherent framework for identifying the uncertainty in calculations performed with a combination of numerical algorithms (e.g. both numerical optimizers and differential equation solvers), potentially allowing the diagnosis (and control) of error sources in computations.

ei pn

PDF DOI [BibTex]

Model-Based Strategy Selection Learning

Lieder, F., Griffiths, T. L.

The 2nd Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2015 (article)

Abstract
Humans possess a repertoire of decision strategies. This raises the question of how we decide how to decide. Behavioral experiments suggest that the answer includes metacognitive reinforcement learning: rewards reinforce not only our behavior but also the cognitive processes that lead to it. Previous theories of strategy selection, namely SSL and RELACS, assumed that model-free reinforcement learning identifies the cognitive strategy that works best on average across all problems in the environment. Here we explore the alternative: model-based reinforcement learning about how the differential effectiveness of cognitive strategies depends on the features of individual problems. Our theory posits that people learn a predictive model of each strategy’s accuracy and execution time and choose strategies according to their predicted speed-accuracy tradeoff for the problem to be solved. We evaluate our theory against previous accounts by fitting published data on multi-attribute decision making, conducting a novel experiment, and demonstrating that our theory can account for people’s adaptive flexibility in risky choice. We find that while SSL and RELACS are sufficient to explain people’s ability to adapt to a homogeneous environment in which all decision problems are of the same type, model-based strategy selection learning can also explain people’s ability to adapt to heterogeneous environments and flexibly switch to a different decision-strategy when the situation changes.

re

link (url) Project Page [BibTex]

The optimism bias may support rational action

Lieder, F., Goel, S., Kwan, R., Griffiths, T. L.

NIPS 2015 Workshop on Bounded Optimality and Rational Metareasoning, 2015 (article)

re

[BibTex]

Rational use of cognitive resources: Levels of analysis between the computational and the algorithmic

Griffiths, T. L., Lieder, F., Goodman, N. D.

Topics in Cognitive Science, 7(2):217-229, Wiley, 2015 (article)

re

[BibTex]
