A phylogeny-based sampling strategy and power calculator informs genome-wide associations study design for microbial pathogens

Maha R. Farhat, B. Jesse Shapiro, Samuel K. Sheppard, Caroline Colijn, Megan Murray

Research output: Contribution to journal (Article)

14 Citations (Scopus)

Abstract

Whole genome sequencing is increasingly used to study phenotypic variation among infectious pathogens and to evaluate their relative transmissibility, virulence, and immunogenicity. To date, relatively little has been published on how and how many pathogen strains should be selected for studies associating phenotype and genotype. There are specific challenges when identifying genetic associations in bacteria which often comprise highly structured populations. Here we consider general methodological questions related to sampling and analysis focusing on clonal to moderately recombining pathogens. We propose that a matched sampling scheme constitutes an efficient study design, and provide a power calculator based on phylogenetic convergence. We demonstrate this approach by applying it to genomic datasets for two microbial pathogens: Mycobacterium tuberculosis and Campylobacter species.

Original language: English
Article number: 101
Pages (from-to): 1-14
Number of pages: 14
Journal: Genome Medicine
Volume: 6
Issue number: 11
DOIs: 10.1186/s13073-014-0101-7
Publication status: Published - 15 Nov 2014

ASJC Scopus subject areas

  • Molecular Medicine
  • Molecular Biology
  • Genetics
  • Genetics (clinical)

Cite this

A phylogeny-based sampling strategy and power calculator informs genome-wide associations study design for microbial pathogens. / Farhat, Maha R.; Shapiro, B. Jesse; Sheppard, Samuel K.; Colijn, Caroline; Murray, Megan.

In: Genome Medicine, Vol. 6, No. 11, 101, 15.11.2014, p. 1-14.

Farhat, Maha R. ; Shapiro, B. Jesse ; Sheppard, Samuel K. ; Colijn, Caroline ; Murray, Megan. / A phylogeny-based sampling strategy and power calculator informs genome-wide associations study design for microbial pathogens. In: Genome Medicine. 2014 ; Vol. 6, No. 11. pp. 1-14.
@article{2081d5ce27964339838d758027acb10f,
title = "A phylogeny-based sampling strategy and power calculator informs genome-wide associations study design for microbial pathogens",
abstract = "Whole genome sequencing is increasingly used to study phenotypic variation among infectious pathogens and to evaluate their relative transmissibility, virulence, and immunogenicity. To date, relatively little has been published on how and how many pathogen strains should be selected for studies associating phenotype and genotype. There are specific challenges when identifying genetic associations in bacteria which often comprise highly structured populations. Here we consider general methodological questions related to sampling and analysis focusing on clonal to moderately recombining pathogens. We propose that a matched sampling scheme constitutes an efficient study design, and provide a power calculator based on phylogenetic convergence. We demonstrate this approach by applying it to genomic datasets for two microbial pathogens: Mycobacterium tuberculosis and Campylobacter species.",
author = "Farhat, {Maha R.} and Shapiro, {B. Jesse} and Sheppard, {Samuel K.} and Colijn, Caroline and Murray, Megan",
year = "2014",
month = "11",
day = "15",
doi = "10.1186/s13073-014-0101-7",
language = "English",
volume = "6",
pages = "1--14",
journal = "Genome Medicine",
issn = "1756-994X",
publisher = "BioMed Central",
number = "11",
}

TY  - JOUR
T1  - A phylogeny-based sampling strategy and power calculator informs genome-wide associations study design for microbial pathogens
AU  - Farhat, Maha R.
AU  - Shapiro, B. Jesse
AU  - Sheppard, Samuel K.
AU  - Colijn, Caroline
AU  - Murray, Megan
PY  - 2014/11/15
Y1  - 2014/11/15
AB  - Whole genome sequencing is increasingly used to study phenotypic variation among infectious pathogens and to evaluate their relative transmissibility, virulence, and immunogenicity. To date, relatively little has been published on how and how many pathogen strains should be selected for studies associating phenotype and genotype. There are specific challenges when identifying genetic associations in bacteria which often comprise highly structured populations. Here we consider general methodological questions related to sampling and analysis focusing on clonal to moderately recombining pathogens. We propose that a matched sampling scheme constitutes an efficient study design, and provide a power calculator based on phylogenetic convergence. We demonstrate this approach by applying it to genomic datasets for two microbial pathogens: Mycobacterium tuberculosis and Campylobacter species.
UR  - http://www.scopus.com/inward/record.url?scp=84925608282&partnerID=8YFLogxK
U2  - 10.1186/s13073-014-0101-7
DO  - 10.1186/s13073-014-0101-7
M3  - Article
VL  - 6
SP  - 1
EP  - 14
JO  - Genome Medicine
JF  - Genome Medicine
SN  - 1756-994X
IS  - 11
M1  - 101
ER  -