Abstract
Motivation: Metagenome-Assembled Genomes (MAGs) or Single-cell Amplified Genomes (SAGs) are often incomplete, with sequences missing due to errors in assembly or low coverage. This presents a particular challenge for the identification of true gene frequencies within a microbial population, as core genes missing in only a few assemblies will be mischaracterized by current pangenome approaches.

Results: Here, we present CELEBRIMBOR, a Snakemake pangenome analysis pipeline that uses a measure of genome completeness to automatically adjust the frequency threshold at which core genes are identified, enabling accurate core gene identification in MAGs and SAGs.
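The idea of lowering the core gene frequency threshold as genome completeness drops can be illustrated with a minimal sketch. This is not taken from the CELEBRIMBOR or cgt code. It assumes that each genome's completeness estimate can be read as the independent probability that any given gene was recovered in that assembly, so the number of genomes in which a truly core gene is observed follows a Poisson binomial distribution. The function names and the `error_rate` parameter are hypothetical.

```python
from typing import List, Sequence


def observed_count_pmf(completeness: Sequence[float]) -> List[float]:
    """Distribution of the number of genomes in which a truly core gene is seen.

    Assumption (not from the source): genome i recovers the gene independently
    with probability completeness[i], giving a Poisson binomial distribution.
    """
    pmf = [1.0]  # with zero genomes, the count is 0 with certainty
    for c in completeness:
        nxt = [0.0] * (len(pmf) + 1)
        for k, p in enumerate(pmf):
            nxt[k] += p * (1.0 - c)  # gene missed in this genome
            nxt[k + 1] += p * c      # gene recovered in this genome
        pmf = nxt
    return pmf


def adjusted_core_threshold(completeness: Sequence[float],
                            error_rate: float = 0.05) -> float:
    """Return an illustrative core-gene frequency threshold in [0, 1].

    Picks the largest count k such that a truly core gene is observed in
    fewer than k genomes with probability at most error_rate (hypothetical
    parameter), then expresses k as a fraction of all genomes.
    """
    if not completeness:
        raise ValueError("at least one genome is required")
    pmf = observed_count_pmf(completeness)
    cumulative = 0.0  # P(observed count < k) at the top of each iteration
    threshold = 0
    for k, p in enumerate(pmf):
        if cumulative > error_rate:
            break
        threshold = k
        cumulative += p
    return threshold / len(completeness)


if __name__ == "__main__":
    complete = [1.0] * 10
    incomplete = [0.9] * 10
    print(adjusted_core_threshold(complete))
    print(adjusted_core_threshold(incomplete))
```

With fully complete genomes the sketch requires a gene to be present in every genome, and the threshold relaxes as completeness estimates fall, which is the behaviour the abstract describes.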
Original language | English |
---|---|
Article number | btae542 |
Journal | Bioinformatics |
Volume | 40 |
Issue number | 9 |
Early online date | 19 Sept 2024 |
DOIs | |
Publication status | Published - 30 Sept 2024 |
Data Availability Statement
Code for CELEBRIMBOR and pangenome simulations is available on GitHub (https://github.com/bacpop/CELEBRIMBOR). Code for cgt is also available on GitHub (https://github.com/bacpop/cgt).
ASJC Scopus subject areas
- Statistics and Probability
- Biochemistry
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics