GPS Pipeline: portable, scalable genomic pipeline for Streptococcus pneumoniae surveillance from Global Pneumococcal Sequencing Project

Harry C.H. Hung, Narender Kumar, Victoria Dyster, Corin Yeats, Benjamin Metcalf, Yuan Li, Paulina A. Hawkins, Lesley McGee, Stephen D. Bentley, Stephanie W. Lo

Research output: Contribution to journal › Article › peer-review

Abstract

Streptococcus pneumoniae (pneumococcus) is a major pathogen globally, responsible for an estimated one million deaths annually and contributing significantly to the global burden of antimicrobial resistance. Ongoing surveillance of its vaccine antigens (i.e. serotypes), antimicrobial resistance, and lineages is crucial for assessing the impact of vaccination programs and guiding future vaccine design. However, current bioinformatics tools have limitations that prevent them from generating all of these data simultaneously, at large scale, and independently within a single comprehensive analysis. Here, we present the GPS Pipeline, which enables reliable extraction of public health information from pneumococcal genomes using in silico methods. It can accurately identify 102 of 107 known serotypes, recognise 1053 pneumococcal lineages, and predict susceptibilities to 19 common antibiotics. Built on Nextflow and utilising containerisation technology, the GPS Pipeline minimises software setup requirements and the bioinformatics expertise needed while facilitating large-scale analysis of genomic data. The GPS Pipeline was applied to and validated on 20,924 pneumococcal genomes from around the world, demonstrating its effectiveness in enhancing responsiveness in pneumococcal genomic surveillance.

Original language: English
Article number: 8345
Journal: Nature Communications
Volume: 16
Issue number: 1
Early online date: 24 Sept 2025
Publication status: E-pub ahead of print - 24 Sept 2025

Data Availability Statement

Published data from the GPS Database is available on Monocle Data Viewer at data.monocle.sanger.ac.uk and associated sequence read files are searchable and downloadable in the European Nucleotide Archive at ebi.ac.uk/ena via their ERR accession numbers. The list of accession numbers is available in Supplementary Data 3. This study did not publish new data, and all data analysed were previously published.
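
As an illustration of the retrieval route described above, the following is a minimal sketch of how read-file locations for a single ERR accession might be looked up. It assumes the ENA Portal API file report endpoint and its `read_run` result type, neither of which is named in the statement. The helper names and the accession `ERR0000001` are hypothetical placeholders, not entries from Supplementary Data 3.

```python
import csv
import io
import re
from urllib.parse import urlencode

# Assumption: the ENA Portal API "filereport" endpoint; the statement above
# only says that reads are searchable in ENA by ERR accession.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"
ERR_PATTERN = re.compile(r"^ERR\d+$")


def filereport_url(accession: str) -> str:
    """Build a file report query URL for one ERR run accession (hypothetical helper)."""
    if not ERR_PATTERN.match(accession):
        raise ValueError(f"not an ERR run accession: {accession!r}")
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_fastq_links(tsv_text: str) -> dict[str, list[str]]:
    """Map each run accession in a TSV file report to its list of FASTQ paths."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    links: dict[str, list[str]] = {}
    for row in reader:
        # Paired-end runs list several files separated by semicolons.
        links[row["run_accession"]] = [p for p in row["fastq_ftp"].split(";") if p]
    return links


if __name__ == "__main__":
    # Placeholder accession; real ones are listed in Supplementary Data 3.
    print(filereport_url("ERR0000001"))
```

Fetching the printed URL (for example with `urllib.request.urlopen`) and passing the response text to `parse_fastq_links` would yield the download paths for that run. The network call is left out here so the sketch runs offline.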

ASJC Scopus subject areas

  • General Chemistry
  • General Biochemistry, Genetics and Molecular Biology
  • General
  • General Physics and Astronomy
