Abstract

This thesis discusses Markov Chain Monte Carlo (MCMC) sampling algorithms which make use of differential geometry to improve performance. In particular, new sampling algorithms are introduced which exploit the dual geometry of statistical models developed by Amari, Chentsov and other authors in the field of Information Geometry.
Previous work on differential-geometric MCMC has only used Riemannian geometry, i.e. without the dual structures studied in Information Geometry.
Background chapters of the thesis discuss the development of Information Geometry by formulating statistical models as manifolds of data-generating measures. We review how Bregman divergences on such manifolds give rise to the dually-flat structure of Information Geometry, with examples provided by exponential families. Furthermore, Bayes' theorem is recast in a co-ordinate free form.
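As a brief illustration of the standard definitions behind this background material, the following sketch states the Bregman divergence and its textbook link to exponential families. The symbols $\varphi$, $\psi$, $\theta$, $\eta$, $T$ and $h$ are notation assumed here for illustration and are not necessarily the notation used in the thesis.

```latex
% Bregman divergence generated by a strictly convex, differentiable function \varphi
D_{\varphi}(\theta, \theta') = \varphi(\theta) - \varphi(\theta')
    - \langle \nabla\varphi(\theta'),\, \theta - \theta' \rangle

% Exponential family with natural (primal) parameter \theta and log-partition function \psi
p_{\theta}(x) = h(x)\exp\bigl( \langle \theta, T(x) \rangle - \psi(\theta) \bigr)

% Dual (expectation) parameter
\eta = \nabla\psi(\theta) = \mathbb{E}_{\theta}[T(x)]

% The Kullback-Leibler divergence is the Bregman divergence generated by \psi
\mathrm{KL}\bigl(p_{\theta'} \,\|\, p_{\theta}\bigr) = D_{\psi}(\theta, \theta')
```

In this standard picture the primal co-ordinates $\theta$ and the dual co-ordinates $\eta$ are related by the gradient of the convex function $\psi$, which is the pairing that a dually-flat structure formalises.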
Two novel sampling algorithms are introduced, called Orthogonal Gibbs and Orthogonal Gradient, which make use of dually-flat geometry. The Orthogonal Gibbs algorithm samples primal variables conditionally on dual variables and vice versa; similarly, Orthogonal Gradient moves along the primal geodesic whose tangent vector is proportional to the gradient of the target log-density, and samples recursively on an m-flat submanifold orthogonal to the e-flat geodesic. We show that both algorithms move in directions in parameter space which are orthogonal with respect to the Riemannian metric. Furthermore, we show that when applied to a multivariate Gaussian target distribution, the Orthogonal Gibbs and Orthogonal Gradient algorithms reduce, respectively, to the Gaussian Elimination and Conjugate Gradient solvers for linear systems, with added stochastic terms in the updates to the solution. The latter finding reflects work by previous authors on the connection between MCMC and optimization, including that of Fox and Parker. We show that the Orthogonal Gibbs algorithm produces independent samples for multivariate Gaussian targets, and present numerical results suggesting that the analogous result holds for Orthogonal Gradient. Finally, we present numerical results for a non-Gaussian target distribution for which the novel algorithms have favourable effective sample sizes compared to the No U-Turn Sampler (NUTS).
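The general link between MCMC and linear solvers mentioned above can be illustrated with a classical, well-known case. The sketch below is not the Orthogonal Gibbs or Orthogonal Gradient algorithm from the thesis. It is the ordinary component-wise Gibbs sampler for a Gaussian target N(A⁻¹b, A⁻¹), whose update coincides with a Gauss-Seidel sweep for the system Ax = b plus a noise term. The function name `gibbs_sweep`, its `noise` flag, and the small test matrix are assumptions introduced purely for illustration.

```python
import numpy as np


def gibbs_sweep(A, b, x, rng, noise=True):
    """One component-wise Gibbs sweep for the target N(A^{-1} b, A^{-1}).

    A is a symmetric positive-definite precision matrix. With noise=False
    the sweep is exactly one Gauss-Seidel iteration for A x = b.
    (Illustrative helper, not taken from the thesis.)
    """
    x = x.copy()
    for i in range(len(b)):
        # Conditional mean of x_i given the other components:
        # (b_i - sum_{j != i} A_ij x_j) / A_ii, i.e. the Gauss-Seidel update.
        residual = b[i] - A[i] @ x + A[i, i] * x[i]
        mean = residual / A[i, i]
        # The conditional variance is 1 / A_ii; adding this noise turns
        # the linear solver step into a sampling step.
        eps = rng.standard_normal() / np.sqrt(A[i, i]) if noise else 0.0
        x[i] = mean + eps
    return x


if __name__ == "__main__":
    A = np.array([[2.0, 1.0], [1.0, 2.0]])  # hypothetical precision matrix
    b = np.array([1.0, 0.0])
    rng = np.random.default_rng(0)

    # Without noise the sweeps converge to the solution of A x = b.
    x = np.zeros(2)
    for _ in range(60):
        x = gibbs_sweep(A, b, x, rng, noise=False)
    print("Gauss-Seidel limit:", x)

    # With noise the iterates are (correlated) draws from the Gaussian target.
    x = np.zeros(2)
    draws = []
    for _ in range(20000):
        x = gibbs_sweep(A, b, x, rng)
        draws.append(x)
    print("Sample mean:", np.mean(draws, axis=0))
```

Unlike the independent samples reported for Orthogonal Gibbs, the draws from this classical sampler are correlated across sweeps, in the same way that Gauss-Seidel needs many iterations to converge.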
Date of Award: 12 Oct 2022
Supervisors: Karim Anaya-Izquierdo and Robert Scheichl