Publishing re-usable phylogenetic trees, in theory and practice

Arlin Stoltzfus, Brian O'Meara, Jamie Whitacre, Ross Mounce, Dan Rosauer, Rutger Vos

Research output: Contribution to conference (Other)

Abstract

Sharing and re-use of data are essential to the progressive and self-correcting nature of science. In recognition of this principle, journals and funding agencies have adopted policies to encourage sharing of information (‘data’), including empirical data as well as computed inferences such as phylogenetic trees.
Here we summarize an ongoing analysis of 1) current practices for sharing phylogenetic trees and associated data; 2) current barriers to effective sharing and re-use of such data; and 3) prospects for reducing these barriers to promote more widespread sharing and re-use. Currently, the technical infrastructure is available to support (with some limitations) rudimentary archiving in conjunction with manuscript publication. Yet most published trees are not archived, and there is no community standard governing the recommended format or content to ensure a re-usable phylogenetic record. Without a shift in emphasis toward re-usability, along with technology and standards to support such a shift, the value of trees (whether disseminated via public archives or by other means) will be limited. Interviews with actual or potential secondary consumers of phylogenetic results suggest that there is a considerable market for re-use, but that most attempts end in disappointment. Phylogenetic results available via author requests, journal web sites, archival repositories and project web sites rarely include the critical information that secondary consumers seek, such as unique identifiers for biological sources (including species sources and accession numbers), indicators of quality, and documentation of the analytical methods used to obtain the results.
Based on the analysis presented here, we suggest that enabling effective re-use entails a commitment by the research community to several changes from current practice: 1) using globally unique identifiers (GUIDs) to reference informational and material entities; 2) developing and using technology for documenting and exchanging the metadata that facilitate re-use; and 3) supporting development and use of a minimal reporting standard that indicates what data and metadata are considered essential for a re-usable phylogenetic record. We suggest that re-use may be catalyzed most rapidly by identifying and targeting (with appropriate technology) the most promising circumstances for re-use. These might include the extraction of sub-trees from large trees (for use in reconciliation, classification, and comparative analysis); the re-use of seed alignments, sub-alignments and homologized characters; the linking of phylogenies to geographic information (for use in ecology, phylogeography and biogeography); and the construction of supertrees and supermatrices.
Original language: English
DOI: https://doi.org/10.1038/npre.2011.6048.1
Publication status: Published - 2011
Event: iEvoBio 2011 - Norman, Oklahoma, United States
Duration: 20 Jun 2011 → …

Cite this

Stoltzfus, A., O'Meara, B., Whitacre, J., Mounce, R., Rosauer, D., & Vos, R. (2011). Publishing re-usable phylogenetic trees, in theory and practice. iEvoBio 2011, Norman, Oklahoma, United States. https://doi.org/10.1038/npre.2011.6048.1
