Using false discovery rates to benchmark SNP-callers in next-generation sequencing projects

Rhys A Farrer, Daniel A Henk, Dan MacLean, David J Studholme, Matthew C Fisher

Research output: Contribution to journal (Article)

Abstract

Sequence alignments form the basis for many comparative and population genomic studies. Alignment
tools provide a range of accuracies dependent on the divergence between the sequences and the alignment
methods. Despite widespread use, there is no standard method for assessing the accuracy of a dataset and
alignment strategy after resequencing. We present a framework and tool for determining the overall
accuracies of an input read dataset, alignment and SNP-calling method, provided that an isolate in that
dataset has a corresponding or closely related reference sequence available. In addition to this tool for
comparing False Discovery Rates (FDR), we include a method for determining homozygous and heterozygous
positions from an alignment using binomial probabilities for an expected error rate. We benchmark this
method against other SNP callers using our FDR method with three fungal genomes, finding that it was able
to achieve a high level of accuracy. These tools are available at http://cfdr.sourc
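
The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' tool. It assumes a truth set of known variant positions is available, takes FDR as false positives divided by all calls, and uses hypothetical function names, a hypothetical per-base error rate of 0.01, and a hypothetical significance cut-off of 0.001.

```python
from math import comb


def false_discovery_rate(called, truth):
    """FDR = FP / (FP + TP) for collections of (position, alt_base) calls."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # calls that match a known variant
    fp = len(called - truth)   # calls with no matching known variant
    return fp / (tp + fp) if (tp + fp) else 0.0


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_genotype(alt_reads, depth, error_rate=0.01, alpha=1e-3):
    """Illustrative genotype call at one position (assumed thresholds)."""
    if depth == 0:
        return "no_call"
    # Alternate reads are few enough to be explained by sequencing error.
    if binom_tail(alt_reads, depth, error_rate) >= alpha:
        return "hom_ref"
    # Reference reads are also too many to be errors: both alleles present.
    ref_reads = depth - alt_reads
    if binom_tail(ref_reads, depth, error_rate) < alpha:
        return "het"
    return "hom_alt"


if __name__ == "__main__":
    truth = {(10, "A"), (20, "T"), (30, "G"), (50, "C")}
    called = {(10, "A"), (20, "T"), (30, "G"), (40, "C")}
    print(false_discovery_rate(called, truth))
    for alt in (0, 1, 15, 30):
        print(alt, call_genotype(alt, 30))
```

Comparing callers then amounts to running each one on the same reads and truth set and ranking them by the resulting FDR, alongside the number of true variants each recovers.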
Original language: English
Article number: 1512
Journal: Scientific Reports
Volume: 3
DOI: 10.1038/srep01512
Publication status: Published - 21 Mar 2013

Cite this

Using false discovery rates to benchmark SNP-callers in next-generation sequencing projects. / Farrer, Rhys A; Henk, Daniel A; MacLean, Dan; Studholme, David J; Fisher, Matthew C.

In: Scientific Reports, Vol. 3, 1512, 21.03.2013.

@article{28de31264c274552a867436aefb1a045,
title = "Using false discovery rates to benchmark SNP-callers in next-generation sequencing projects",
abstract = "Sequence alignments form the basis for many comparative and population genomic studies. Alignment tools provide a range of accuracies dependent on the divergence between the sequences and the alignment methods. Despite widespread use, there is no standard method for assessing the accuracy of a dataset and alignment strategy after resequencing. We present a framework and tool for determining the overall accuracies of an input read dataset, alignment and SNP-calling method, provided that an isolate in that dataset has a corresponding or closely related reference sequence available. In addition to this tool for comparing False Discovery Rates (FDR), we include a method for determining homozygous and heterozygous positions from an alignment using binomial probabilities for an expected error rate. We benchmark this method against other SNP callers using our FDR method with three fungal genomes, finding that it was able to achieve a high level of accuracy. These tools are available at http://cfdr.sourc",
author = "Farrer, {Rhys A} and Henk, {Daniel A} and MacLean, Dan and Studholme, {David J} and Fisher, {Matthew C}",
year = "2013",
month = "3",
day = "21",
doi = "10.1038/srep01512",
language = "English",
volume = "3",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Nature Publishing Group",

}

TY - JOUR

T1 - Using false discovery rates to benchmark SNP-callers in next-generation sequencing projects

AU - Farrer, Rhys A

AU - Henk, Daniel A

AU - MacLean, Dan

AU - Studholme, David J

AU - Fisher, Matthew C

PY - 2013/3/21

Y1 - 2013/3/21

AB - Sequence alignments form the basis for many comparative and population genomic studies. Alignment tools provide a range of accuracies dependent on the divergence between the sequences and the alignment methods. Despite widespread use, there is no standard method for assessing the accuracy of a dataset and alignment strategy after resequencing. We present a framework and tool for determining the overall accuracies of an input read dataset, alignment and SNP-calling method, provided that an isolate in that dataset has a corresponding or closely related reference sequence available. In addition to this tool for comparing False Discovery Rates (FDR), we include a method for determining homozygous and heterozygous positions from an alignment using binomial probabilities for an expected error rate. We benchmark this method against other SNP callers using our FDR method with three fungal genomes, finding that it was able to achieve a high level of accuracy. These tools are available at http://cfdr.sourc

UR - http://www.scopus.com/inward/record.url?scp=84875767941&partnerID=8YFLogxK

UR - http://dx.doi.org/10.1038/srep01512

U2 - 10.1038/srep01512

DO - 10.1038/srep01512

M3 - Article

VL - 3

JO - Scientific Reports

JF - Scientific Reports

SN - 2045-2322

M1 - 1512

ER -