Abstract
The use of k-mers to capture genetic variation in bacterial genome-wide association studies (bGWAS) has demonstrated its effectiveness in overcoming the plasticity of bacterial genomes by providing a comprehensive array of genetic variants in a genome set that is not confined to a single reference genome. However, little attempt has been made to interpret k-mers in the context of genome rearrangements, partly due to challenges in the exhaustive and high-throughput identification of genome structure and individual rearrangement events. Here, we present GWarrange, a pre- and post-bGWAS processing methodology that leverages the unique properties of k-mers to facilitate bGWAS for genome rearrangements. Repeat sequences are common instigators of genome rearrangements through intragenomic homologous recombination, and they are commonly found at rearrangement boundaries. Using whole-genome sequences, repeat sequences are replaced by short placeholder sequences, allowing the regions flanking repeats to be incorporated into relatively short k-mers. Then, locations of flanking regions in significant k-mers are mapped back to complete genome sequences to visualise genome rearrangements. Four case studies based on two bacterial species (Bordetella pertussis and Enterococcus faecium) and a simulated genome set are presented to demonstrate the ability to identify phenotype-associated rearrangements. GWarrange is available at https://github.com/DorothyTamYiLing/GWarrange.
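The repeat-replacement idea described in the abstract can be illustrated with a toy sketch. This is not the GWarrange implementation: the sequences, the `NNNNN` placeholder, the value of k and all function names below are hypothetical and chosen only for illustration. The sketch assumes a single known repeat sequence and models one rearrangement, an inversion of the segment between two repeat copies.

```python
# Toy illustration only; all names and sequences are hypothetical,
# not taken from GWarrange.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def mask_repeats(genome: str, repeat: str, placeholder: str) -> str:
    """Replace every exact copy of the repeat with a short placeholder."""
    return genome.replace(repeat, placeholder)

def kmers(seq: str, k: int) -> set:
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def spanning_kmers(seq: str, k: int, placeholder: str) -> set:
    """Keep k-mers holding the whole placeholder with at least
    one flanking base on each side."""
    found = set()
    for kmer in kmers(seq, k):
        idx = kmer.find(placeholder)
        if idx > 0 and idx + len(placeholder) < len(kmer):
            found.add(kmer)
    return found

REPEAT = "TTGACCGGTTAACCGGTCAA"   # assumed 20 bp repeat, longer than k
PLACEHOLDER = "NNNNN"             # assumed short placeholder
K = 11

left, middle, right = "ACGTAC", "GGCATC", "CTTGAA"

# Genome A: reference arrangement. Genome B: middle segment inverted
# between the two repeat copies.
genome_a = left + REPEAT + middle + REPEAT + right
genome_b = left + REPEAT + reverse_complement(middle) + REPEAT + right

masked_a = mask_repeats(genome_a, REPEAT, PLACEHOLDER)
masked_b = mask_repeats(genome_b, REPEAT, PLACEHOLDER)

span_a = spanning_kmers(masked_a, K, PLACEHOLDER)
span_b = spanning_kmers(masked_b, K, PLACEHOLDER)

# k-mers joining both flanks of a repeat that are specific to one arrangement
print(sorted(span_a - span_b))
print(sorted(span_b - span_a))
```

Because the repeat is longer than k, no 11-mer from the unmasked genomes can contain sequence from both sides of a repeat copy. After masking, a single short k-mer carries both flanks, so k-mers present in only one arrangement could in principle be tested for phenotype association and then mapped back to the complete genomes.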
Original language | English |
---|---|
Article number | 001268 |
Journal | Microbial Genomics |
Volume | 10 |
Issue number | 7 |
Early online date | 9 Jul 2024 |
Publication status | Published - 9 Jul 2024 |
Data Availability Statement
Tables S1, S2, S3 and S4, available in the online version of this article. Closed genome sequences and simulated genome sequences used in each of the four examples.
Funding
This work was funded by a Leverhulme Trust Research Project Grant, RPG-2019–373.
Funders | Funder number |
---|---|
Leverhulme Trust | RPG-2019–373 |
Keywords
- Bordetella pertussis
- Enterococcus faecium
- bacterial genome-wide association studies
- genome rearrangement
- k-mers
- repeat sequences
ASJC Scopus subject areas
- Genetics
- Molecular Biology
- Epidemiology
- Microbiology