A regression analysis usually consists of several stages, such as variable selection, transformation, and residual diagnosis. Inference is often made from the selected model without regard to the model selection methods that preceded it. This can result in overoptimistic and biased inferences. We first characterize data-analytic actions as functions acting on regression models. We then investigate the extent of the problem and test bootstrap, jackknife, and sample-splitting methods for ameliorating it. We also demonstrate an interactive LISP-STAT system for assessing the cost of the data analysis while it is taking place.
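To make the overoptimism concrete, the following is a minimal simulation sketch, not the procedure used in the paper. It assumes a pure-noise setting in which the response is unrelated to every predictor, selects the single predictor most correlated with the response, and then compares a t-test run on the same data against one run on a held-out half (sample splitting). The function names (`t_stat`, `best_predictor`, `simulate`), the sample sizes, and the cutoff of 2.0 (roughly a two-sided 5% level) are all illustrative assumptions.

```python
import numpy as np


def t_stat(x, y):
    """t-statistic for the slope in a simple linear regression of y on x."""
    n = len(y)
    xc = x - x.mean()
    yc = y - y.mean()
    sxx = xc @ xc
    beta = (xc @ yc) / sxx
    resid = yc - beta * xc
    sigma2 = (resid @ resid) / (n - 2)  # residual variance estimate
    return beta / np.sqrt(sigma2 / sxx)


def best_predictor(X, y):
    """Index of the column of X with the largest absolute correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return int(np.argmax(np.abs(corr)))


def simulate(n_reps=500, n=100, p=10, crit=2.0, seed=0):
    """Return (naive, split) rejection rates when no predictor has any effect.

    crit=2.0 is an assumed approximate two-sided 5% cutoff, not an exact quantile.
    """
    rng = np.random.default_rng(seed)
    half = n // 2
    naive = 0
    split = 0
    for _ in range(n_reps):
        X = rng.standard_normal((n, p))
        y = rng.standard_normal(n)  # null model: y is independent of X

        # Naive: select and test on the same observations.
        j = best_predictor(X, y)
        naive += int(abs(t_stat(X[:, j], y)) > crit)

        # Sample splitting: select on the first half, test on the second.
        j = best_predictor(X[:half], y[:half])
        split += int(abs(t_stat(X[half:, j], y[half:])) > crit)
    return naive / n_reps, split / n_reps


if __name__ == "__main__":
    naive_rate, split_rate = simulate()
    print(f"naive rejection rate: {naive_rate:.3f}")
    print(f"split rejection rate: {split_rate:.3f}")
```

Under these assumptions the naive rate should sit well above the nominal level, because the test ignores the selection step, while the split rate should stay close to it at the price of testing on only half the data.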
Keywords
- Data splitting
- Model selection
- Regression analysis
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty
- Discrete Mathematics and Combinatorics