Resolving the complex Bordetella pertussis genome using barcoded nanopore sequencing

Research output: Contribution to journal › Article

Abstract

The genome of Bordetella pertussis is complex, with high G+C content and many repeats, each longer than 1000 bp. Long-read sequencing offers the opportunity to produce single-contig B. pertussis assemblies using sequencing reads which are longer than the repetitive sections, with the potential to reveal genomic features which were previously unobservable in multi-contig assemblies produced by short-read sequencing alone. We used an R9.4 MinION flow cell and barcoding to sequence five B. pertussis strains in a single sequencing run. We then trialled combinations of the many nanopore user community-built long-read analysis tools to establish the current optimal assembly pipeline for B. pertussis genome sequences. This pipeline produced closed genome sequences for four strains, allowing visualization of inter-strain genomic rearrangement. Read mapping to the Tohama I reference genome suggests that the remaining strain contains an ultra-long duplicated region (almost 200 kbp), which was not resolved by our pipeline; further investigation also revealed that a second strain that was seemingly resolved by our pipeline may contain an even longer duplication, albeit in a small subset of cells. We have therefore demonstrated the ability to resolve the structure of several B. pertussis strains per single barcoded nanopore flow cell, but the genomes with highest complexity (e.g. very large duplicated regions) remain only partially resolved using the standard library preparation and will require an alternative library preparation method. For full strain characterization, we recommend hybrid assembly of long and short reads together; for comparison of genome arrangement, assembly using long reads alone is sufficient.

Original language: English
Pages (from-to): 1-13
Number of pages: 13
Journal: Microbial Genomics
Volume: 4
Issue number: 11
DOIs: 10.1099/mgen.0.000234
Publication status: Published - 28 Nov 2018

Keywords

  • Bordetella pertussis
  • Oxford nanopore
  • benchmarking
  • duplications
  • genome assembly
  • long-read sequencing

ASJC Scopus subject areas

  • Medicine(all)

Cite this

Resolving the complex Bordetella pertussis genome using barcoded nanopore sequencing. / Ring, Natalie; Abrahams, Jonathan S; Jain, Miten; Olsen, Hugh; Preston, Andrew; Bagby, Stefan.

In: Microbial Genomics, Vol. 4, No. 11, 28.11.2018, p. 1-13.

@article{11c22e4d4d814a2886af85abbf42418a,
title = "Resolving the complex Bordetella pertussis genome using barcoded nanopore sequencing",
keywords = "Bordetella pertussis, Oxford nanopore, benchmarking, duplications, genome assembly, long-read sequencing",
author = "Natalie Ring and Abrahams, {Jonathan S} and Miten Jain and Hugh Olsen and Andrew Preston and Stefan Bagby",
year = "2018",
month = "11",
day = "28",
doi = "10.1099/mgen.0.000234",
language = "English",
volume = "4",
pages = "1--13",
journal = "Microbial Genomics",
issn = "2057-5858",
publisher = "Microbiology Society",
number = "11",

}

TY - JOUR
T1 - Resolving the complex Bordetella pertussis genome using barcoded nanopore sequencing
AU - Ring, Natalie
AU - Abrahams, Jonathan S
AU - Jain, Miten
AU - Olsen, Hugh
AU - Preston, Andrew
AU - Bagby, Stefan
PY - 2018/11/28
Y1 - 2018/11/28
KW - Bordetella pertussis
KW - Oxford nanopore
KW - benchmarking
KW - duplications
KW - genome assembly
KW - long-read sequencing
UR - http://www.scopus.com/inward/record.url?scp=85057550910&partnerID=8YFLogxK
U2 - 10.1099/mgen.0.000234
DO - 10.1099/mgen.0.000234
M3 - Article
VL - 4
SP - 1
EP - 13
JO - Microbial Genomics
JF - Microbial Genomics
SN - 2057-5858
IS - 11
ER -