TY - JOUR
T1 - High-resolution sweep metagenomics using fast probabilistic inference [version 2; peer review: 2 approved]
AU - Mäklin, Tommi
AU - Kallonen, Teemu
AU - David, Sophia
AU - Boinett, Christine J.
AU - Pascoe, Ben
AU - Méric, Guillaume
AU - Aanensen, David M.
AU - Feil, Edward J.
AU - Baker, Stephen
AU - Parkhill, Julian
AU - Sheppard, Samuel K.
AU - Corander, Jukka
AU - Honkela, Antti
N1 - Funding Information:
Grant information: This work was supported by the Academy of Finland (grants no. 259440 and 310261; to TM and AH) as well as the Flagship programme (Finnish Center for Artificial Intelligence FCAI; to JC and AH). TK, JC, DA and EJF are supported by the JPI-AMR consortium SpARK (MR/R00241X/1). JC was funded by the ERC (grant no. 742158). TK was funded by the Norwegian Research Council JPIAMR (grant no. 144501). SB is a Sir Henry Dale Fellow, jointly funded by the Wellcome Trust and the Royal Society [100087]. Sequencing of the Vietnamese E. coli samples was supported by the Wellcome Trust [098051]. Computational resources were provided by the 'Finnish Grid and Cloud Infrastructure' (persistent identifier urn:nbn:fi:research-infras-2016072533).
Publisher Copyright:
© 2020. Mäklin T et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
PY - 2021/10/8
Y1 - 2021/10/8
AB - Determining the composition of bacterial communities beyond the level of a genus or species is challenging because of the considerable overlap between genomes representing close relatives. Here, we present the mSWEEP pipeline for identifying and estimating the relative sequence abundances of bacterial lineages from plate sweeps of enrichment cultures. mSWEEP leverages biologically grouped sequence assembly databases, applying probabilistic modelling, and provides controls for false positive results. Using sequencing data from major pathogens, we demonstrate significant improvements in lineage quantification and detection accuracy. Our pipeline facilitates investigating cultures comprising mixtures of bacteria, and opens up a new field of plate sweep metagenomics.
KW - bacterial strain identification
KW - metagenomics
KW - microbial communities
KW - plate sweeps
KW - probabilistic modeling
UR - http://www.scopus.com/inward/record.url?scp=85117353602&partnerID=8YFLogxK
U2 - https://doi.org/10.12688/wellcomeopenres.15639.2
DO - https://doi.org/10.12688/wellcomeopenres.15639.2
M3 - Article
SN - 2398-502X
VL - 5
JO - Wellcome Open Research
JF - Wellcome Open Research
M1 - 14
ER -