Sparse multiscale Gaussian process regression

Christian Walder, Kwang In Kim, Bernhard Schölkopf

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

32 Citations (Scopus)

Abstract

Most existing sparse Gaussian process (g.p.) models seek computational advantages by basing their computations on a set of m basis functions that are the covariance function of the g.p. with one of its two inputs fixed. We generalise this for the case of the Gaussian covariance function, by basing our computations on m Gaussian basis functions with arbitrary diagonal covariance matrices (or length scales). For a fixed number of basis functions and any given criterion, this additional flexibility permits approximations no worse and typically better than was previously possible. We perform gradient-based optimisation of the marginal likelihood, which costs O(m²n) time, where n is the number of data points, and compare the method to various other sparse g.p. methods. Although we focus on g.p. regression, the central idea is applicable to all kernel-based algorithms, and we also provide some results for the support vector machine (s.v.m.) and kernel ridge regression (k.r.r.). Our approach outperforms the other methods, particularly for the case of very few basis functions, i.e. a very high sparsity ratio.
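The sketch below (Python/NumPy) illustrates the basis-function idea described in the abstract: each of the m Gaussian basis functions has its own centre and its own diagonal length scales, instead of all sharing the length scales of the covariance function. It is not the authors' implementation; for simplicity it fits the basis-expansion weights with a ridge-style least-squares solve rather than the gradient-based marginal-likelihood optimisation used in the paper, and all names (gaussian_features, fit_weights, centres, ells, noise) are illustrative assumptions.

import numpy as np

def gaussian_features(X, centres, ells):
    """Evaluate m Gaussian basis functions at the rows of X.

    X       : (n, d) inputs
    centres : (m, d) basis-function centres
    ells    : (m, d) per-basis, per-dimension length scales (arbitrary diagonal covariances)
    returns : (n, m) matrix Phi with Phi[j, i] = exp(-0.5 * sum_k ((x_jk - c_ik) / ell_ik)^2)
    """
    diff = X[:, None, :] - centres[None, :, :]            # (n, m, d)
    return np.exp(-0.5 * np.sum((diff / ells[None, :, :]) ** 2, axis=-1))

def fit_weights(X, y, centres, ells, noise=1e-2):
    """Ridge-style weights for the basis expansion.

    Forming the m x m normal equations from the (n, m) feature matrix costs
    O(m^2 n) time, mirroring the complexity quoted in the abstract.
    """
    Phi = gaussian_features(X, centres, ells)             # (n, m)
    A = Phi.T @ Phi + noise * np.eye(Phi.shape[1])        # (m, m)
    return np.linalg.solve(A, Phi.T @ y)

def predict(X_new, centres, ells, w):
    return gaussian_features(X_new, centres, ells) @ w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
    m = 10                                                # very sparse: m << n
    centres = rng.uniform(-3, 3, size=(m, 1))
    ells = np.full((m, 1), 1.0)                           # fixed here; optimised jointly in the paper
    w = fit_weights(X, y, centres, ells)
    print(predict(np.array([[0.0]]), centres, ells, w))

In the paper the centres and diagonal length scales (together with the covariance hyperparameters) are optimised jointly by gradient-based maximisation of the g.p. marginal likelihood rather than held fixed as above; the O(m²n) cost per evaluation is the same order as forming the normal equations in this sketch.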
Original language: English
Title of host publication: Proceedings of the 25th International Conference on Machine Learning (ICML), 2008
Place of Publication: New York, USA
Publisher: Association for Computing Machinery
Pages: 1112-1119
Number of pages: 8
ISBN (Print): 9781605582054
DOIs: https://doi.org/10.1145/1390156.1390296
Publication status: Published - 2008
Event: 25th International Conference on Machine Learning (ICML), 2008 - Helsinki, Finland
Duration: 5 Jun 2008 - 9 Jun 2008

Conference

Conference: 25th International Conference on Machine Learning (ICML), 2008
Country: Finland
City: Helsinki
Period: 5/06/08 - 9/06/08

Cite this

Walder, C., Kim, K. I., & Schölkopf, B. (2008). Sparse multiscale Gaussian process regression. In Proceedings of the 25th International Conference on Machine Learning (ICML), 2008 (pp. 1112-1119). New York, USA: Association for Computing Machinery. https://doi.org/10.1145/1390156.1390296

Sparse multiscale Gaussian process regression. / Walder, Christian; Kim, Kwang In; Schölkopf, Bernhard.

Proceedings of the 25th International Conference on Machine Learning (ICML), 2008. New York, USA: Association for Computing Machinery, 2008. p. 1112-1119.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Walder, C, Kim, KI & Schölkopf, B 2008, Sparse multiscale Gaussian process regression. in Proceedings of the 25th International Conference on Machine Learning (ICML), 2008. Association for Computing Machinery, New York, USA, pp. 1112-1119, 25th International Conference on Machine Learning (ICML), 2008, Helsinki, Finland, 5/06/08. https://doi.org/10.1145/1390156.1390296
Walder C, Kim KI, Schölkopf B. Sparse multiscale Gaussian process regression. In Proceedings of the 25th International Conference on Machine Learning (ICML), 2008. New York, USA: Association for Computing Machinery. 2008. p. 1112-1119. https://doi.org/10.1145/1390156.1390296
Walder, Christian ; Kim, Kwang In ; Schölkopf, Bernhard. / Sparse multiscale Gaussian process regression. Proceedings of the 25th International Conference on Machine Learning (ICML), 2008. New York, USA: Association for Computing Machinery, 2008. pp. 1112-1119
@inproceedings{61f2d6691b0e42ddb987242c76546c97,
title = "Sparse multiscale Gaussian process regression",
abstract = "Most existing sparse Gaussian process (g.p.) models seek computational advantages by basing their computations on a set of m basis functions that are the covariance function of the g.p. with one of its two inputs fixed. We generalise this for the case of Gaussian covariance function, by basing our computations on m Gaussian basis functions with arbitrary diagonal covariance matrices (or length scales). For a fixed number of basis functions and any given criteria, this additional flexibility permits approximations no worse and typically better than was previously possible. We perform gradient based optimisation of the marginal likelihood, which costs O(m2n) time where n is the number of data points, and compare the method to various other sparse g.p. methods. Although we focus on g.p. regression, the central idea is applicable to all kernel based algorithms, and we also provide some results for the support vector machine (s.v.m.) and kernel ridge regression (k.r.r.). Our approach outperforms the other methods, particularly for the case of very few basis functions, i. e. a very high sparsity ratio.",
author = "Christian Walder and Kim, {Kwang In} and Bernhard Sch{\"o}lkopf",
year = "2008",
doi = "10.1145/1390156.1390296",
language = "English",
isbn = "9781605582054",
pages = "1112--1119",
booktitle = "Proceedings of the.25th International Conference on Machine Learning (ICML), 2008",
publisher = "Association for Computing Machinery",
address = "USA United States",

}

TY - GEN

T1 - Sparse multiscale Gaussian process regression

AU - Walder, Christian

AU - Kim, Kwang In

AU - Schölkopf, Bernhard

PY - 2008

Y1 - 2008

N2 - Most existing sparse Gaussian process (g.p.) models seek computational advantages by basing their computations on a set of m basis functions that are the covariance function of the g.p. with one of its two inputs fixed. We generalise this for the case of the Gaussian covariance function, by basing our computations on m Gaussian basis functions with arbitrary diagonal covariance matrices (or length scales). For a fixed number of basis functions and any given criterion, this additional flexibility permits approximations no worse and typically better than was previously possible. We perform gradient-based optimisation of the marginal likelihood, which costs O(m²n) time, where n is the number of data points, and compare the method to various other sparse g.p. methods. Although we focus on g.p. regression, the central idea is applicable to all kernel-based algorithms, and we also provide some results for the support vector machine (s.v.m.) and kernel ridge regression (k.r.r.). Our approach outperforms the other methods, particularly for the case of very few basis functions, i.e. a very high sparsity ratio.

AB - Most existing sparse Gaussian process (g.p.) models seek computational advantages by basing their computations on a set of m basis functions that are the covariance function of the g.p. with one of its two inputs fixed. We generalise this for the case of the Gaussian covariance function, by basing our computations on m Gaussian basis functions with arbitrary diagonal covariance matrices (or length scales). For a fixed number of basis functions and any given criterion, this additional flexibility permits approximations no worse and typically better than was previously possible. We perform gradient-based optimisation of the marginal likelihood, which costs O(m²n) time, where n is the number of data points, and compare the method to various other sparse g.p. methods. Although we focus on g.p. regression, the central idea is applicable to all kernel-based algorithms, and we also provide some results for the support vector machine (s.v.m.) and kernel ridge regression (k.r.r.). Our approach outperforms the other methods, particularly for the case of very few basis functions, i.e. a very high sparsity ratio.

UR - http://dx.doi.org/10.1145/1390156.1390296

U2 - 10.1145/1390156.1390296

DO - 10.1145/1390156.1390296

M3 - Conference contribution

SN - 9781605582054

SP - 1112

EP - 1119

BT - Proceedings of the 25th International Conference on Machine Learning (ICML), 2008

PB - Association for Computing Machinery

CY - New York, USA

ER -