Phylogeny-Aware Analysis of Metagenome Community Ecology Based on Matched Reference Genomes while Bypassing Taxonomy

Qiyun Zhu, Shi Huang, Antonio Gonzalez, Imran McGrath, Daniel McDonald, Niina Haiminen, George Armstrong, Yoshiki Vázquez-Baeza, Julian Yu, Justin Kuczynski, Gregory D Sepich-Poore, Austin D Swafford, Promi Das, Justin P Shaffer, Franck Lejzerowicz, Pedro Belda-Ferre, Aki S Havulinna, Guillaume Méric, Teemu Niiranen, Leo Lahti, Veikko Salomaa, Ho-Cheol Kim, Mohit Jain, Michael Inouye, Jack A Gilbert, Rob Knight

Research output: Contribution to journal > Article > peer-review

63 Citations (SciVal)

Abstract

We introduce the operational genomic unit (OGU) method, a metagenome analysis strategy that directly exploits sequence alignment hits to individual reference genomes as the minimum unit for assessing the diversity of microbial communities and their relevance to environmental factors. This approach is independent of taxonomic classification, granting the possibility of maximal resolution of community composition, and organizes features into an accurate hierarchy using a phylogenomic tree. The outputs are suitable for contemporary analytical protocols for community ecology, differential abundance, and supervised learning while supporting phylogenetic methods, such as UniFrac and phylofactorization, that are seldom applied to shotgun metagenomics despite being prevalent in 16S rRNA gene amplicon studies. As demonstrated in two real-world case studies, the OGU method produces biologically meaningful patterns from microbiome data sets. Such patterns further remain detectable at very low metagenomic sequencing depths. Compared with taxonomic unit-based analyses implemented in currently adopted metagenomics tools, and the analysis of 16S rRNA gene amplicon sequence variants, this method shows superiority in informing biologically relevant insights, including stronger correlation with body environment and host sex on the Human Microbiome Project data set and more accurate prediction of human age by the gut microbiomes of Finnish individuals included in the FINRISK 2002 cohort. We provide Woltka, a bioinformatics tool to implement this method, with full integration with the QIIME 2 package and the Qiita web platform, to facilitate adoption of the OGU method in future metagenomics studies.

IMPORTANCE Shotgun metagenomics is a powerful, yet computationally challenging, technique compared to 16S rRNA gene amplicon sequencing for decoding the composition and structure of microbial communities. Current analyses of metagenomic data are primarily based on taxonomic classification, which is limited in feature resolution. To solve these challenges, we introduce operational genomic units (OGUs), which are the individual reference genomes derived from sequence alignment results, without further assigning them taxonomy. The OGU method advances current read-based metagenomics in two dimensions: (i) providing maximal resolution of community composition and (ii) permitting use of phylogeny-aware tools. Our analysis of real-world data sets shows that it is advantageous over currently adopted metagenomic analysis methods and the finest-grained 16S rRNA analysis methods in predicting biological traits. We thus propose the adoption of OGUs as an effective practice in metagenomic studies.
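
The core idea of the abstract, counting alignment hits per reference genome without assigning taxonomy, can be illustrated with a minimal Python sketch. This is not Woltka's implementation: the function name `build_ogu_table`, the input layout (per-sample lists of read/genome pairs), the genome identifiers, and the choice to split a multiply mapped read evenly among its matched genomes are all assumptions made for illustration.

```python
from collections import defaultdict


def build_ogu_table(alignments):
    """Build a sample-by-genome count table from alignment hits.

    `alignments` maps a sample ID to an iterable of (read_id, genome_id)
    pairs. This input layout is hypothetical, chosen for illustration.
    Each reference genome is treated as one feature (an OGU); no
    taxonomic label is consulted at any point.
    """
    table = {}
    for sample, hits in alignments.items():
        # Collect the set of genomes each read aligned to.
        read_to_genomes = defaultdict(set)
        for read_id, genome_id in hits:
            read_to_genomes[read_id].add(genome_id)

        # Assumption: a read matching k genomes adds 1/k to each of them.
        counts = defaultdict(float)
        for genomes in read_to_genomes.values():
            share = 1.0 / len(genomes)
            for genome_id in genomes:
                counts[genome_id] += share
        table[sample] = dict(counts)
    return table


# Toy input with hypothetical read and genome identifiers.
example = {
    "S1": [("r1", "G1"), ("r2", "G1"), ("r2", "G2"), ("r3", "G2")],
    "S2": [("r1", "G3")],
}
ogu_table = build_ogu_table(example)
```

In a phylogeny-aware analysis, the genome IDs in such a table would also be the tip labels of the reference phylogenomic tree, which is what allows tree-based methods such as UniFrac to operate on the table directly.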

Original language: English
Pages (from-to): e0016722
Journal: mSystems
Volume: 7
Issue number: 2
Early online date: 4 Apr 2022
Publication status: Published - 26 Apr 2022

Acknowledgements

We are grateful to Evguenia Kopylova, Stefan Janssen, Tomasz Kosciolek, Holly Lutz, Se Jin Song, Zachary Burcham, Shalisa Hansen, Emily Kobayashi, Gabriel Al-Ghalith, Cameron Martino, Siavash Mirarab, James Morton, Oriane Moyne, Wayne Pfeiffer, Daniel Roush, and Jeff DeReus for valuable testing of the methodology, insightful discussions on this study, and additional assistance.
This work was supported in part by an Arizona State University start-up grant (to Q.Z.), Sloan Foundation G-2017-9838, IBM Research AI through the AI Horizons Network-AI for Healthy Living A1770534, DARPA JUMP/CRISP, NIH P30DK120515, DP1AT010885, U19AG063744, U24CA248454, Emerald Foundation Distinguished Investigator Award, Crohn’s and Colitis Foundation 675191, NSF RAPID 2038509, IBM Research AI through the AI Horizons Network and the UC San Diego Center for Microbiome Innovation (to S.H., I.M., Y.V.-B., and R.K.). G.D.S.-P. is supported by a fellowship from the National Institutes of Health (F30 CA243480). T.N. was funded by the Emil Aaltonen Foundation, the Finnish Medical Foundation, the Finnish Foundation for Cardiovascular Disease, and the Academy of Finland (grant 321351). L.L. was funded by the Academy of Finland (grant 295741). V.S. was supported by the Finnish Foundation for Cardiovascular Research. J.P.S. was supported by NIH/NIGMS IRACDA K12 GM068524. This work used the Comet supercomputer at the San Diego Supercomputer Center through allocation BIO150043 through the Extreme Science and Engineering Discovery Environment (XSEDE).
Q.Z. and R.K. conceived the project. Q.Z. led the development of the methodology and software. S.H. and Q.Z. led the analysis and interpretation of the data sets presented in this article. S.H., A.G., D.M., and Y.V.-B. contributed to the design of the method. A.G., D.M., and G.A. contributed to the development and deployment of the software. G.D.S.-P., A.D.S., P.D., and F.L. contributed to testing the method. P.B.-F., A.S.H., G.M., T.N., L.L., and V.S. contributed to data collection. A.G., I.M., J.Y., Y.V.-B., and J.K. contributed to data analysis. N.H., G.D.S.-P., A.S.H., G.M., T.N., L.L., V.S., H.-C.K., M.J., M.I., J.A.G., and R.K. contributed to result interpretation. R.K. and Q.Z. managed the project. All the authors contributed to the discussion and writing of the manuscript.
We declare that we have no competing interests.

Keywords

  • Humans
  • Phylogeny
  • Metagenome
  • RNA, Ribosomal, 16S/genetics
  • Microbiota
  • Ecology
