On The Cost Of Data Analysis

Research output: Contribution to journal > Article > peer-review


A regression analysis usually consists of several stages, such as variable selection, transformation, and residual diagnosis. Inference is often made from the selected model without regard to the model selection methods that preceded it. This can result in overoptimistic and biased inferences. We first characterize data-analytic actions as functions acting on regression models. We then investigate the extent of the problem and test bootstrap, jackknife, and sample-splitting methods for ameliorating it. We also demonstrate an interactive LISP-STAT system for assessing the cost of the data analysis while it is taking place.
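
As a rough illustration of the overoptimism described above, the following Python sketch simulates pure-noise data, selects the single predictor most correlated with the response, and compares the apparent R^2 (computed on the data used for selection) with the R^2 on a held-out half (sample splitting). It is not the authors' LISP-STAT system or their experimental design; the function names, sample size, number of candidate predictors, and replication count are all assumptions chosen for illustration.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def one_replication(rng, n=100, p=30):
    """Hypothetical setup: y is independent of all p candidate predictors."""
    y = [rng.gauss(0, 1) for _ in range(n)]
    X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    half = n // 2
    # Selection step: keep the predictor most correlated with y on the first half.
    best = max(range(p), key=lambda j: abs(corr(X[j][:half], y[:half])))
    # In simple linear regression, R^2 equals the squared correlation.
    apparent = corr(X[best][:half], y[:half]) ** 2  # same data selects and assesses
    holdout = corr(X[best][half:], y[half:]) ** 2   # data not used for selection
    return apparent, holdout


def simulate(reps=200, seed=0):
    """Average apparent and hold-out R^2 over independent replications."""
    rng = random.Random(seed)
    results = [one_replication(rng) for _ in range(reps)]
    mean_apparent = mean([a for a, _ in results])
    mean_holdout = mean([h for _, h in results])
    return mean_apparent, mean_holdout


if __name__ == "__main__":
    a, h = simulate()
    print(f"mean apparent R^2: {a:.3f}, mean hold-out R^2: {h:.3f}")
```

Because no predictor is truly related to the response in this setup, any R^2 reported for the selected predictor on the selection data reflects the selection step alone. The hold-out half gives an assessment that the selection did not influence, at the price of using only half the data for each task.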

Original language: English
Pages (from-to): 213-229
Number of pages: 17
Journal: Journal of Computational and Graphical Statistics
Issue number: 3
Publication status: Published - 1 Jan 1992


Keywords

  • Bootstrap
  • Data splitting
  • Jackknife
  • Model selection
  • Regression analysis

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
  • Discrete Mathematics and Combinatorics

