Investigating the genome of Bordetella pertussis using long-read sequencing

  • Natalie Ring

Student thesis: Doctoral Thesis (PhD)


Whooping cough, the respiratory disease caused by the bacterium Bordetella pertussis, has been resurgent for the last thirty years. Several reasons have been suggested for this resurgence, including increased awareness and improved diagnosis techniques, waning immunity conferred by the whooping cough vaccine, and genetic shifts of circulating bacteria away from the vaccine strains. These genetic changes may have been accelerated by the switch in many countries from a whole cell vaccine to an acellular vaccine containing just one to five B. pertussis antigens. Aside from certain key genes, however, variation between B. pertussis strains appears to be very limited on the level of single genes. Instead, in recent years, a picture of genome level inter-strain variation has been emerging, beginning with the revelation that genomic rearrangements, mediated by the numerous insertion sequence elements in the B. pertussis genome, are common. Investigations of whole genome variation, alongside classic molecular epidemiological studies, may therefore be important to our understanding of how genetic changes in B. pertussis are contributing to whooping cough resurgence.

Many genome level changes may be observable only in closed genome sequences. The highly repetitive B. pertussis genome, which contains up to 300 identical copies of a 1,000 bp insertion sequence, has traditionally been difficult to resolve to the single-contig level, with most genomes assembled using Illumina sequencing data consisting of at least as many contigs as there are insertion sequence copies. Here, I first define a sequencing and data processing pipeline, utilising nanopore long-read and Illumina short-read sequencing to enable the assembly of accurate, closed B. pertussis genome sequences. Using this hybrid sequencing pipeline, I then investigate the genomes of 66 B. pertussis strains isolated in New Zealand between 1982 and 2008. New Zealand commonly sees a higher incidence of whooping cough than most other countries, and no isolates from the country had previously been sequenced. Several of the genomic features of the New Zealand isolates match those observed in many other countries, including a selective sweep from strains carrying the ptxP1 allele to the ptxP3 allele and a recent rapid increase in the number of strains which are unable to produce pertactin, one of the antigens usually included in the acellular vaccine. Nonetheless, the data also indicate that the strains circulating in New Zealand might be more genetically similar than those circulating in other countries, particularly in recent years, and particularly during whooping cough outbreaks. This strain screen is the first of its kind to use nanopore sequencing, and to combine traditional analysis of genotypes with analysis of genome level variation, such as rearrangements and copy number variations. Next, I attempt to investigate an ultra-long genomic duplication identified whilst testing the hybrid assembly pipeline on five UK B. pertussis strains. This work ultimately shows that the complexity of the B. pertussis genome can make in vitro studies into the links between genotype and phenotype difficult. Finally, I use the closed genome sequences of every B. pertussis strain sequenced with long-read technologies to investigate any recent changes in filamentous haemagglutinin, another of the antigens included in the acellular vaccine. Studies of the gene coding for this antigen have typically been limited by its length and repetitive nature, which have hindered attempts to assemble its whole sequence. This work reveals a homopolymeric locus which may be prone to slippage and which, under selective pressure, could therefore lead to an increase in the number of strains which are deficient in this vaccine antigen.

Overall, the work in this thesis demonstrates how long-read sequencing can reveal previously unstudied or intractable aspects of B. pertussis biology, and defines an affordable method for using nanopore long-read sequencing to assemble and study closed B. pertussis genomes.
Date of Award: 4 Nov 2020
Original language: English
Awarding Institution
  • University of Bath
Supervisors: Stefan Bagby & Andrew Preston


  • Bordetella pertussis
  • Nanopore
  • Whole Genome Sequencing
  • Genomics
  • Long-read sequencing
