Generalized additive models for large data sets

Simon N. Wood, Yannig Goude, Simon Shaw

Research output: Contribution to journal › Article

74 Citations (Scopus)
53 Downloads (Pure)

Abstract

We consider an application in electricity grid load prediction, where generalized additive models are appropriate, but where the data set's size can make their use practically intractable with existing methods. We therefore develop practical generalized additive model fitting methods for large data sets in the case in which the smooth terms in the model are represented by using penalized regression splines. The methods use iterative update schemes to obtain factors of the model matrix while requiring only subblocks of the model matrix to be computed at any one time. We show that efficient smoothing parameter estimation can be carried out in a well‐justified manner. The grid load prediction problem requires updates of the model fit, as new data become available, and some means for dealing with residual auto‐correlation in grid load. Methods are provided for these problems and parallel implementation is covered. The methods allow estimation of generalized additive models for large data sets by using modest computer hardware, and the grid load prediction problem illustrates the utility of reduced rank spline smoothing methods for dealing with complex modelling problems.
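The abstract's central idea is to factor the model matrix while forming only one block of its rows at a time. QR updating is one standard way to do this, and the sketch below uses it to illustrate the idea. It is not the authors' implementation. It assumes NumPy, a Gaussian response, a hypothetical degree-1 B-spline basis (`tent_basis`), a second-difference penalty `D` and a fixed smoothing parameter `lam`. It also omits smoothing parameter estimation, non-Gaussian fitting, residual auto-correlation and parallel computation, all of which the abstract covers. Each block updates a small summary `(R, f)` such that `||y - X b||^2 = ||f - R b||^2 + const`, so the penalized fit needs only `R` and `f`, never the full `X`.

```python
import numpy as np


def tent_basis(x, n_knots=20):
    """Hypothetical illustration basis: degree-1 B-splines ("tent"
    functions) on equally spaced knots over [0, 1]."""
    knots = np.linspace(0.0, 1.0, n_knots)
    h = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - knots[None, :]) / h)


def qr_update(R, f, X_block, y_block):
    """Fold a block of rows into the running summary (R, f), where
    ||y - X b||^2 = ||f - R b||^2 + const over all rows seen so far."""
    if R is not None:
        X_block = np.vstack([R, X_block])
        y_block = np.concatenate([f, y_block])
    Q, R_new = np.linalg.qr(X_block)  # reduced QR: R_new has at most p rows
    return R_new, Q.T @ y_block


def penalized_fit(R, f, D, lam):
    """Minimize ||f - R b||^2 + lam * ||D b||^2 via least squares on an
    augmented system, for a fixed smoothing parameter lam."""
    A = np.vstack([R, np.sqrt(lam) * D])
    z = np.concatenate([f, np.zeros(D.shape[0])])
    beta, *_ = np.linalg.lstsq(A, z, rcond=None)
    return beta


rng = np.random.default_rng(0)
n, block_size, lam = 10_000, 1_000, 1.0
x = rng.uniform(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)

D = np.diff(np.eye(20), n=2, axis=0)  # second-difference penalty matrix (18 x 20)

R, f = None, None
for start in range(0, n, block_size):
    sl = slice(start, start + block_size)
    # Only this block of the model matrix is formed in memory.
    R, f = qr_update(R, f, tent_basis(x[sl]), y[sl])

beta_blockwise = penalized_fit(R, f, D, lam)
```

Data arriving later can be folded in with further `qr_update` calls without revisiting earlier rows. This mirrors, in simplified form, the fit updating the abstract describes for the grid load problem. Memory use depends on the block size and the number of coefficients, not on the total number of observations.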

Original language: English
Pages (from-to): 139-155
Number of pages: 17
Journal: Journal of the Royal Statistical Society Series C-Applied Statistics
Volume: 64
Issue number: 1
Early online date: 27 May 2014
DOI: 10.1111/rssc.12068
Publication status: Published - 1 Jan 2015

Cite this

Generalized additive models for large data sets. / Wood, Simon N.; Goude, Yannig; Shaw, Simon.

In: Journal of the Royal Statistical Society Series C-Applied Statistics, Vol. 64, No. 1, 01.01.2015, p. 139-155.


@article{183b31a383fe458e9f44e68a148a638b,
title = "Generalized additive models for large data sets",
abstract = "We consider an application in electricity grid load prediction, where generalized additive models are appropriate, but where the data set's size can make their use practically intractable with existing methods. We therefore develop practical generalized additive model fitting methods for large data sets in the case in which the smooth terms in the model are represented by using penalized regression splines. The methods use iterative update schemes to obtain factors of the model matrix while requiring only subblocks of the model matrix to be computed at any one time. We show that efficient smoothing parameter estimation can be carried out in a well‐justified manner. The grid load prediction problem requires updates of the model fit, as new data become available, and some means for dealing with residual auto‐correlation in grid load. Methods are provided for these problems and parallel implementation is covered. The methods allow estimation of generalized additive models for large data sets by using modest computer hardware, and the grid load prediction problem illustrates the utility of reduced rank spline smoothing methods for dealing with complex modelling problems.",
author = "Wood, {Simon N.} and Yannig Goude and Simon Shaw",
year = "2015",
month = "1",
day = "1",
doi = "10.1111/rssc.12068",
language = "English",
volume = "64",
pages = "139--155",
journal = "Journal of the Royal Statistical Society: Series C - Applied Statistics",
issn = "0035-9254",
publisher = "Wiley-Blackwell",
number = "1",

}

TY - JOUR

T1 - Generalized additive models for large data sets

AU - Wood, Simon N.

AU - Goude, Yannig

AU - Shaw, Simon

PY - 2015/1/1

Y1 - 2015/1/1

N2 - We consider an application in electricity grid load prediction, where generalized additive models are appropriate, but where the data set's size can make their use practically intractable with existing methods. We therefore develop practical generalized additive model fitting methods for large data sets in the case in which the smooth terms in the model are represented by using penalized regression splines. The methods use iterative update schemes to obtain factors of the model matrix while requiring only subblocks of the model matrix to be computed at any one time. We show that efficient smoothing parameter estimation can be carried out in a well‐justified manner. The grid load prediction problem requires updates of the model fit, as new data become available, and some means for dealing with residual auto‐correlation in grid load. Methods are provided for these problems and parallel implementation is covered. The methods allow estimation of generalized additive models for large data sets by using modest computer hardware, and the grid load prediction problem illustrates the utility of reduced rank spline smoothing methods for dealing with complex modelling problems.

AB - We consider an application in electricity grid load prediction, where generalized additive models are appropriate, but where the data set's size can make their use practically intractable with existing methods. We therefore develop practical generalized additive model fitting methods for large data sets in the case in which the smooth terms in the model are represented by using penalized regression splines. The methods use iterative update schemes to obtain factors of the model matrix while requiring only subblocks of the model matrix to be computed at any one time. We show that efficient smoothing parameter estimation can be carried out in a well‐justified manner. The grid load prediction problem requires updates of the model fit, as new data become available, and some means for dealing with residual auto‐correlation in grid load. Methods are provided for these problems and parallel implementation is covered. The methods allow estimation of generalized additive models for large data sets by using modest computer hardware, and the grid load prediction problem illustrates the utility of reduced rank spline smoothing methods for dealing with complex modelling problems.

U2 - 10.1111/rssc.12068

DO - 10.1111/rssc.12068

M3 - Article

VL - 64

SP - 139

EP - 155

JO - Journal of the Royal Statistical Society: Series C - Applied Statistics

JF - Journal of the Royal Statistical Society: Series C - Applied Statistics

SN - 0035-9254

IS - 1

ER -