Abstract

A regression analysis usually consists of several stages, such as variable selection, transformation and residual diagnosis. Inference is often made from the selected model without regard to the model selection methods that preceded it. This can result in overoptimistic and biased inferences. We first characterize data-analytic actions as functions acting on regression models. We investigate the extent of the problem and test bootstrap, jackknife, and sample-splitting methods for ameliorating it. We also demonstrate an interactive LISP-STAT system for assessing the cost of the data analysis while it is taking place.

Original language: English
Pages (from-to): 213-229
Number of pages: 17
Journal: Journal of Computational and Graphical Statistics
Volume: 1
Issue number: 3
DOI: 10.1080/10618600.1992.10474582
Publication status: Published - 1 Jan 1992

Keywords

  • Bootstrap
  • Data splitting
  • Jackknife
  • Model selection
  • Regression analysis

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
  • Discrete Mathematics and Combinatorics

Cite this

On The Cost Of Data Analysis. / Faraway, Julian J.

In: Journal of Computational and Graphical Statistics, Vol. 1, No. 3, 01.01.1992, p. 213-229.

Research output: Contribution to journal (Article)

@article{edf12097f3c14610a37b6fc146e79990,
title = "On The Cost Of Data Analysis",
abstract = "A regression analysis usually consists of several stages, such as variable selection, transformation and residual diagnosis. Inference is often made from the selected model without regard to the model selection methods that preceded it. This can result in overoptimistic and biased inferences. We first characterize data-analytic actions as functions acting on regression models. We investigate the extent of the problem and test bootstrap, jackknife, and sample-splitting methods for ameliorating it. We also demonstrate an interactive LISP-STAT system for assessing the cost of the data analysis while it is taking place.",
keywords = "Bootstrap, Data splitting, Jackknife, Model selection, Regression analysis",
author = "Faraway, {Julian J.}",
year = "1992",
month = "1",
day = "1",
doi = "10.1080/10618600.1992.10474582",
language = "English",
volume = "1",
pages = "213--229",
journal = "Journal of Computational and Graphical Statistics",
issn = "1061-8600",
publisher = "American Statistical Association",
number = "3",

}

TY - JOUR

T1 - On The Cost Of Data Analysis

AU - Faraway, Julian J.

PY - 1992/1/1

Y1 - 1992/1/1

N2 - A regression analysis usually consists of several stages, such as variable selection, transformation and residual diagnosis. Inference is often made from the selected model without regard to the model selection methods that preceded it. This can result in overoptimistic and biased inferences. We first characterize data-analytic actions as functions acting on regression models. We investigate the extent of the problem and test bootstrap, jackknife, and sample-splitting methods for ameliorating it. We also demonstrate an interactive LISP-STAT system for assessing the cost of the data analysis while it is taking place.

AB - A regression analysis usually consists of several stages, such as variable selection, transformation and residual diagnosis. Inference is often made from the selected model without regard to the model selection methods that preceded it. This can result in overoptimistic and biased inferences. We first characterize data-analytic actions as functions acting on regression models. We investigate the extent of the problem and test bootstrap, jackknife, and sample-splitting methods for ameliorating it. We also demonstrate an interactive LISP-STAT system for assessing the cost of the data analysis while it is taking place.

KW - Bootstrap

KW - Data splitting

KW - Jackknife

KW - Model selection

KW - Regression analysis

UR - http://www.scopus.com/inward/record.url?scp=0000039381&partnerID=8YFLogxK

U2 - 10.1080/10618600.1992.10474582

DO - 10.1080/10618600.1992.10474582

M3 - Article

VL - 1

SP - 213

EP - 229

JO - Journal of Computational and Graphical Statistics

JF - Journal of Computational and Graphical Statistics

SN - 1061-8600

IS - 3

ER -