GWarrange: a pre- and post- genome-wide association studies pipeline for detecting phenotype-associated genome rearrangement events

Research output: Contribution to journal › Article › peer-review

Abstract

The use of k-mers to capture genetic variation in bacterial genome-wide association studies (bGWAS) has demonstrated its effectiveness in overcoming the plasticity of bacterial genomes by providing a comprehensive array of genetic variants in a genome set that is not confined to a single reference genome. However, little attempt has been made to interpret k-mers in the context of genome rearrangements, partly due to challenges in the exhaustive and high-throughput identification of genome structure and individual rearrangement events. Here, we present GWarrange, a pre- and post-bGWAS processing methodology that leverages the unique properties of k-mers to facilitate bGWAS for genome rearrangements. Repeat sequences are common instigators of genome rearrangements through intragenomic homologous recombination, and they are commonly found at rearrangement boundaries. Using whole-genome sequences, repeat sequences are replaced by short placeholder sequences, allowing the regions flanking repeats to be incorporated into relatively short k-mers. Then, locations of flanking regions in significant k-mers are mapped back to complete genome sequences to visualise genome rearrangements. Four case studies based on two bacterial species (Bordetella pertussis and Enterococcus faecium) and a simulated genome set are presented to demonstrate the ability to identify phenotype-associated rearrangements. GWarrange is available at https://github.com/DorothyTamYiLing/GWarrange.
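The repeat-replacement step described above can be illustrated with a minimal Python sketch. It is not the GWarrange implementation: the function names, the `NNN` placeholder, the toy sequences and the assumption that repeat coordinates are already known as sorted, non-overlapping (start, end) intervals are all hypothetical choices made for illustration. It shows only why collapsing a long repeat into a short placeholder lets a single short k-mer contain both flanking regions.

```python
def mask_repeats(genome, repeats, placeholder="NNN"):
    """Replace each repeat interval with a short placeholder.

    `repeats` is assumed to hold non-overlapping, 0-based, half-open
    (start, end) coordinates; how repeats are located is out of scope here.
    """
    pieces = []
    pos = 0
    for start, end in sorted(repeats):
        pieces.append(genome[pos:start])  # sequence before the repeat
        pieces.append(placeholder)        # short stand-in for the repeat
        pos = end
    pieces.append(genome[pos:])           # sequence after the last repeat
    return "".join(pieces)


def spanning_kmers(masked, k, placeholder="NNN"):
    """Return k-mers that contain the placeholder with sequence on both sides."""
    found = set()
    for i in range(len(masked) - k + 1):
        kmer = masked[i:i + k]
        if (placeholder in kmer
                and not kmer.startswith(placeholder)
                and not kmer.endswith(placeholder)):
            found.add(kmer)
    return found


# Toy data (hypothetical): the same 20 bp repeat with different right flanks,
# standing in for two genome arrangements around one repeat copy.
REPEAT = "ACGTACGTACGTACGTACGT"
genome_a = "TTGCA" + REPEAT + "CCGAT"
genome_b = "TTGCA" + REPEAT + "GATTC"

masked_a = mask_repeats(genome_a, [(5, 25)])
masked_b = mask_repeats(genome_b, [(5, 25)])

kmers_a = spanning_kmers(masked_a, k=9)
kmers_b = spanning_kmers(masked_b, k=9)
```

In this toy case a 9-mer cannot reach across the 20 bp repeat in the raw sequences, but after masking every spanning 9-mer pairs the left flank with the right flank, so the two arrangements produce disjoint k-mer sets that an association test could distinguish.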
Original language: English
Article number: 001268
Journal: Microbial Genomics
Volume: 10
Issue number: 7
Early online date: 9 Jul 2024
Publication status: Published - 9 Jul 2024

Data Availability Statement

Tables S1, S2, S3 and S4 are available in the online version of this article, together with the closed genome sequences and simulated genome sequences used in each of the four examples.

Funding

This work was funded by a Leverhulme Trust Research Project Grant, RPG-2019–373.

Funder: Leverhulme Trust
Funder number: RPG-2019–373

    Keywords

    • Bordetella pertussis
    • Enterococcus faecium
    • bacterial genome-wide association studies
    • genome rearrangement
    • k-mers
    • repeat sequences

    ASJC Scopus subject areas

    • Genetics
    • Molecular Biology
    • Epidemiology
    • Microbiology
