On quantile quantile plots for generalized linear models

Nicole Augustin, Eric-Andre Sauleau, Simon Wood

Research output: Contribution to journal (Article)

Abstract

The distributional assumption for a generalized linear model is often checked by plotting the ordered deviance residuals against the quantiles of a standard normal distribution. Such plots can be difficult to interpret because, even when the model is correct, the plot often deviates substantially from a straight line. To rectify this problem, García Ben and Yohai (2004, J. Comput. Graph. Stat. 13: 36-47) proposed plotting the deviance residuals against their theoretical quantiles, computed under the assumption that the model is correct. Such plots are closer to a straight line when the model is correct, making them much more useful for model checking. However, the quantile computation proposed by García Ben and Yohai is, in general, relatively complicated to implement and computationally expensive, so that general purpose software for these plots is only available for the Poisson and binary cases, in the R package robust. As an alternative, the theoretical quantiles can be estimated simply and efficiently by repeatedly simulating new response data from the fitted model and computing the corresponding residuals. This method also provides reference bands for judging the significance of departures of QQ-plots from the ideal straight line form. A second alternative is to estimate the quantiles using quantiles of the response variable distribution according to the estimated model. This latter alternative generally has lower computational cost than the first, but does not yield QQ-plot reference bands. In simulations, the quantiles produced by the new methods give results indistinguishable from the original García Ben and Yohai quantile computations, but the scaling of computational cost with sample size is much improved, so that a 500-fold reduction in computation time was observed at a sample size of 50,000.
Application of the methods to generalized linear models fitted to prostate cancer incidence data suggests that they are particularly useful in large dataset cases that might otherwise be incorrectly viewed as zero-inflated. The new approaches are simple enough to implement for any exponential family distribution and for several alternative types of residual, and this has been done for all the families available for use with generalized linear models in the basic distribution of R.
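
The simulation-based alternative described in the abstract can be illustrated with a minimal Python sketch for the Poisson case. This is not the authors' implementation. It assumes the fitted means are already available, keeps them fixed rather than refitting the model on each simulated response, takes the mean of each simulated order statistic as the theoretical quantile, and uses pointwise empirical quantiles for the reference band. All function names (`poisson_deviance_residual`, `rpois`, `simulated_qq`) and the toy data are hypothetical.

```python
import math
import random


def poisson_deviance_residual(y, mu):
    """Deviance residual of one Poisson observation y with fitted mean mu > 0."""
    term = y * math.log(y / mu) if y > 0 else 0.0
    dev = 2.0 * (term - (y - mu))
    return math.copysign(math.sqrt(max(dev, 0.0)), y - mu)


def rpois(mu, rng):
    """Draw one Poisson(mu) variate with Knuth's method (adequate for small mu)."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulated_qq(y, mu, n_rep=200, level=0.95, seed=0):
    """Return (observed, theoretical, lower, upper), each a list of length len(y).

    observed    : sorted deviance residuals of the data
    theoretical : mean of each order statistic over the simulated replicates
    lower/upper : pointwise reference band at the requested level
    """
    rng = random.Random(seed)
    observed = sorted(poisson_deviance_residual(yi, mi) for yi, mi in zip(y, mu))
    # Simulate new responses from the fitted means and sort their residuals.
    sims = [
        sorted(poisson_deviance_residual(rpois(m, rng), m) for m in mu)
        for _ in range(n_rep)
    ]
    lo_idx = int(math.floor((1.0 - level) / 2.0 * (n_rep - 1)))
    hi_idx = n_rep - 1 - lo_idx
    theoretical, lower, upper = [], [], []
    for j in range(len(mu)):
        column = sorted(rep[j] for rep in sims)  # j-th order statistic, all replicates
        theoretical.append(sum(column) / n_rep)
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return observed, theoretical, lower, upper


# Toy usage: the "fitted" means are made up, and the data are drawn from them.
demo_rng = random.Random(1)
demo_mu = [math.exp(0.2 + 0.02 * i) for i in range(50)]
demo_y = [rpois(m, demo_rng) for m in demo_mu]
obs, theo, band_lo, band_hi = simulated_qq(demo_y, demo_mu)
```

Plotting `obs` against `theo`, with `band_lo` and `band_hi` drawn as a band, gives the QQ-plot. Observed points falling outside the band at many positions would suggest a departure from the assumed response distribution.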
Original language: English
Pages (from-to): 2404-2409
Number of pages: 6
Journal: Computational Statistics & Data Analysis
Volume: 56
Issue number: 8
DOI: 10.1016/j.csda.2012.01.026
Publication status: Published - Aug 2012

Cite this

On quantile quantile plots for generalized linear models. / Augustin, Nicole; Sauleau, Eric-Andre; Wood, Simon.

In: Computational Statistics & Data Analysis, Vol. 56, No. 8, 08.2012, p. 2404-2409.

Augustin, Nicole ; Sauleau, Eric-Andre ; Wood, Simon. / On quantile quantile plots for generalized linear models. In: Computational Statistics & Data Analysis. 2012 ; Vol. 56, No. 8. pp. 2404-2409.
@article{c4dc5114dceb478f98605fbc5c69eebf,
title = "On quantile quantile plots for generalized linear models",
author = "Nicole Augustin and Eric-Andre Sauleau and Simon Wood",
year = "2012",
month = "8",
doi = "10.1016/j.csda.2012.01.026",
language = "English",
volume = "56",
pages = "2404--2409",
journal = "Computational Statistics \& Data Analysis",
issn = "0167-9473",
publisher = "Elsevier",
number = "8",

}

TY - JOUR

T1 - On quantile quantile plots for generalized linear models

AU - Augustin, Nicole

AU - Sauleau, Eric-Andre

AU - Wood, Simon

PY - 2012/8

Y1 - 2012/8

UR - http://www.scopus.com/inward/record.url?scp=84859107033&partnerID=8YFLogxK

UR - http://dx.doi.org/10.1016/j.csda.2012.01.026

U2 - 10.1016/j.csda.2012.01.026

DO - 10.1016/j.csda.2012.01.026

M3 - Article

VL - 56

SP - 2404

EP - 2409

JO - Computational Statistics & Data Analysis

JF - Computational Statistics & Data Analysis

SN - 0167-9473

IS - 8

ER -