Inclusion of prevalent cohorts to study the causal impact of Systemic Sclerosis on cancer

  • Eleanor Barry

Student thesis: Doctoral Thesis (PhD)


Systemic Sclerosis (SSc) is a rare autoimmune disease. In this thesis we investigate the risk of cancer (the outcome of interest) in patients diagnosed with SSc (the exposure of interest) and compare it with the risk of cancer in people without SSc. A large UK primary care dataset provides the data for this study. The dataset contains 806 patients who were diagnosed with SSc during the chosen study period, 1998 to 2018; they form the incident cohort, termed ‘SSc’. An additional 780 patients, diagnosed with SSc before their entry to the UK database (which may be later than 1998) and cancer free at that time, form the prevalent cohort. This cohort is potentially a valuable additional resource for analysis, as its addition greatly increases sample size and length of follow-up. A prerequisite for including the prevalent cohort is that it is consistent with the incident cohort and does not distort the study population. The thesis examines how to account for prevalent patients in a competing risk framework, including in the presence of informative censoring, and the issues that can arise with left-truncated data with a long follow-up.

Each patient diagnosed with SSc, in both the incident and prevalent cohorts, is matched to 6 other patients who are used as comparators. These matches form a group termed ‘non-SSc’, which enables the risk of cancer in patients without an SSc diagnosis to be assessed. For both SSc and non-SSc patients there is a competing risk, death, and the thesis considers methods of analysis in a competing risk setting. As common nonparametric methods of analysis are often limited by confounding, we investigate the parametric g-formula, which permits a causal interpretation, rather than merely an association, of the relationship between SSc and cancer. The extension of the parametric g-formula to include prevalent patients is formulated, including the derivation of a weighted form. Methods to adjust for differences associated with the date of SSc diagnosis are developed; these are particularly relevant to adjusting patients in the prevalent cohort to provide a better estimate of current risk.
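
To illustrate how prevalent patients can enter a competing risk analysis, the following is a minimal sketch of a nonparametric cumulative incidence estimate (the Aalen-Johansen form) with delayed entry. It is not taken from the thesis: the time scale (years since SSc diagnosis), the record layout, the function name `cumulative_incidence` and the toy data are all assumptions made for illustration. Incident patients enter at time 0, while prevalent patients only join the risk set once they have entered the database.

```python
from typing import List, Tuple

# (entry, exit, status): times in years since SSc diagnosis (hypothetical scale).
# status: 0 = censored, 1 = cancer, 2 = death without cancer.
Record = Tuple[float, float, int]


def cumulative_incidence(records: List[Record], cause: int = 1) -> List[Tuple[float, float]]:
    """Aalen-Johansen cumulative incidence of `cause`, allowing delayed entry."""
    event_times = sorted({exit_ for _, exit_, status in records if status != 0})
    surv = 1.0  # probability of remaining free of any event just before t
    cif = 0.0
    curve = []
    for t in event_times:
        # Left truncation: a patient is at risk at t only after database entry.
        at_risk = [r for r in records if r[0] < t <= r[1]]
        if not at_risk:
            continue
        d_cause = sum(1 for r in at_risk if r[1] == t and r[2] == cause)
        d_any = sum(1 for r in at_risk if r[1] == t and r[2] != 0)
        cif += surv * d_cause / len(at_risk)
        surv *= 1.0 - d_any / len(at_risk)
        curve.append((t, cif))
    return curve


# Toy data: two incident patients (entry 0) and two prevalent patients (entry > 0).
toy = [(0.0, 2.0, 1), (0.0, 3.0, 2), (1.0, 4.0, 1), (2.0, 5.0, 0)]
for t, cif in cumulative_incidence(toy, cause=1):
    print(f"t = {t:.0f}: cumulative incidence of cancer = {cif:.3f}")
```

In this sketch a prevalent patient contributes nothing before their entry time, so early risk sets are made up of incident patients only. Death is treated as a competing event that removes patients from the risk set without counting towards the cancer incidence.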

The thesis concludes that including prevalent patients is often beneficial, after a pre-analysis of differences between the incident and prevalent cohorts and use of the Cox model to test for temporal trends. When prevalent patients are included and temporal trends are not accounted for, the analysis suggests an increased risk of cancer in those with SSc compared with those without (approximately 1.25 times the risk); however, once temporal trends are accounted for, there is no longer a statistically significant difference in the risk of cancer between SSc and non-SSc patients. Due to the increased mortality of SSc patients, the cause-specific risk ratio between SSc and non-SSc patients is not significant at the 5% level and decreases over calendar time. The parametric g-formula method allowing for prevalent patients was used for this study, but no change in outcome between this method and the prior nonparametric methods was observed, possibly because the matching had already accounted for the majority of confounding. We recommend the parametric g-formula when covariate distributions differ between the exposed and unexposed.
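
As a reminder of why differing covariate distributions matter, the standard g-formula for a point exposure with baseline covariates can be written as below. This is the textbook form, given under the usual assumptions of exchangeability, consistency and positivity. It is not the weighted form derived in the thesis for prevalent patients. The symbols are assumptions introduced here: A is exposure (1 = SSc, 0 = non-SSc), L the baseline covariates, T the time to the first event and J the event type (1 = cancer, 2 = death).

```latex
\[
F^{a}(t) \;=\; \sum_{l} \Pr\bigl(T \le t,\ J = 1 \mid A = a,\ L = l\bigr)\,\Pr(L = l),
\qquad
\mathrm{RR}(t) \;=\; \frac{F^{1}(t)}{F^{0}(t)} .
\]
```

Both exposure groups are standardised to the same covariate distribution Pr(L = l). If L is already distributed identically in the exposed and unexposed, for example as a result of matching, the standardised risk F^a(t) reduces to the crude risk Pr(T ≤ t, J = 1 | A = a), and the g-formula and unadjusted estimates would be expected to agree.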

The results of this study will support those who are interested in the epidemiology of SSc, those who are considering the inclusion of prevalent patients and how best to do this, and those who may be interested in using the g-formula with left-truncated data.
Date of Award: 22 Feb 2023
Original language: English
Awarding Institution
  • University of Bath
Supervisors: Anita McGrogan & Jonathan Bartlett


  • Left truncation
  • Prevalent cohorts
  • Competing risks
  • Epidemiology
  • G-formula
  • Systemic Sclerosis
