Multiple imputation of missing data in nested case-control and case-cohort studies

Ruth H Keogh, Shaun R Seaman, Jonathan W Bartlett, Angela M Wood

Research output: Contribution to journal, Article

Abstract

The nested case-control and case-cohort designs are two main approaches for carrying out a substudy within a prospective cohort. This article adapts multiple imputation (MI) methods for handling missing covariates in full-cohort studies for nested case-control and case-cohort studies. We consider data missing by design and data missing by chance. MI analyses that make use of full-cohort data and MI analyses based on substudy data only are described, alongside an intermediate approach in which the imputation uses full-cohort data but the analysis uses only the substudy. We describe adaptations to two imputation methods: the approximate method (MI-approx) of White and Royston and the "substantive model compatible" (MI-SMC) method of Bartlett et al. We also apply the "MI matched set" approach of Seaman and Keogh to nested case-control studies, which does not require any full-cohort information. The methods are investigated using simulation studies and all perform well when their assumptions hold. Substantial gains in efficiency can be made by imputing data missing by design using the full-cohort approach or by imputing data missing by chance in analyses using the substudy only. The intermediate approach brings greater gains in efficiency relative to the substudy approach and is more robust to imputation model misspecification than the full-cohort approach. The methods are illustrated using the ARIC Study cohort. Supplementary Materials provide R and Stata code.

Original language: English
Number of pages: 12
Journal: Biometrics
Early online date: 5 Jun 2018
DOI: 10.1111/biom.12910
Publication status: E-pub ahead of print - 5 Jun 2018

Fingerprint

Cohort Study
Multiple Imputation
Case-control
Missing Data
Cohort Studies
Imputation
Case-cohort Design
methodology
Missing Covariates
Model Misspecification
Case-control Study
Relative Efficiency
Case-Control Studies
data analysis
Simulation Study

Cite this

Multiple imputation of missing data in nested case-control and case-cohort studies. / Keogh, Ruth H; Seaman, Shaun R; Bartlett, Jonathan W; Wood, Angela M.

In: Biometrics, 05.06.2018.

@article{995175050ec64c11a1da76802c06aa04,
title = "Multiple imputation of missing data in nested case-control and case-cohort studies",
abstract = "The nested case-control and case-cohort designs are two main approaches for carrying out a substudy within a prospective cohort. This article adapts multiple imputation (MI) methods for handling missing covariates in full-cohort studies for nested case-control and case-cohort studies. We consider data missing by design and data missing by chance. MI analyses that make use of full-cohort data and MI analyses based on substudy data only are described, alongside an intermediate approach in which the imputation uses full-cohort data but the analysis uses only the substudy. We describe adaptations to two imputation methods: the approximate method (MI-approx) of White and Royston and the {"}substantive model compatible{"} (MI-SMC) method of Bartlett et al. We also apply the {"}MI matched set{"} approach of Seaman and Keogh to nested case-control studies, which does not require any full-cohort information. The methods are investigated using simulation studies and all perform well when their assumptions hold. Substantial gains in efficiency can be made by imputing data missing by design using the full-cohort approach or by imputing data missing by chance in analyses using the substudy only. The intermediate approach brings greater gains in efficiency relative to the substudy approach and is more robust to imputation model misspecification than the full-cohort approach. The methods are illustrated using the ARIC Study cohort. Supplementary Materials provide R and Stata code.",
author = "Keogh, {Ruth H} and Seaman, {Shaun R} and Bartlett, {Jonathan W} and Wood, {Angela M}",
note = "{\textcopyright} 2018, The International Biometric Society.",
year = "2018",
month = "6",
day = "5",
doi = "10.1111/biom.12910",
language = "English",
journal = "Biometrics",
issn = "0006-341X",
publisher = "Wiley-Blackwell",

}

TY - JOUR

T1 - Multiple imputation of missing data in nested case-control and case-cohort studies

AU - Keogh, Ruth H

AU - Seaman, Shaun R

AU - Bartlett, Jonathan W

AU - Wood, Angela M

N1 - © 2018, The International Biometric Society.

PY - 2018/6/5

Y1 - 2018/6/5

AB - The nested case-control and case-cohort designs are two main approaches for carrying out a substudy within a prospective cohort. This article adapts multiple imputation (MI) methods for handling missing covariates in full-cohort studies for nested case-control and case-cohort studies. We consider data missing by design and data missing by chance. MI analyses that make use of full-cohort data and MI analyses based on substudy data only are described, alongside an intermediate approach in which the imputation uses full-cohort data but the analysis uses only the substudy. We describe adaptations to two imputation methods: the approximate method (MI-approx) of White and Royston and the "substantive model compatible" (MI-SMC) method of Bartlett et al. We also apply the "MI matched set" approach of Seaman and Keogh to nested case-control studies, which does not require any full-cohort information. The methods are investigated using simulation studies and all perform well when their assumptions hold. Substantial gains in efficiency can be made by imputing data missing by design using the full-cohort approach or by imputing data missing by chance in analyses using the substudy only. The intermediate approach brings greater gains in efficiency relative to the substudy approach and is more robust to imputation model misspecification than the full-cohort approach. The methods are illustrated using the ARIC Study cohort. Supplementary Materials provide R and Stata code.

U2 - 10.1111/biom.12910

DO - 10.1111/biom.12910

M3 - Article

JO - Biometrics

JF - Biometrics

SN - 0006-341X

ER -