Abstract
Whole-genome sequencing projects are increasingly populating the tree of life and characterizing biodiversity (refs. 1–4). Sparse taxon sampling has previously been proposed to confound phylogenetic inference (ref. 5), and it captures only a fraction of genomic diversity. Here we report a substantial step towards the dense representation of avian phylogenetic and molecular diversity by analysing 363 genomes from 92.4% of bird families, including 267 newly sequenced genomes produced for phase II of the Bird 10,000 Genomes (B10K) Project. We combine this comparative genome dataset with a pipeline that leverages a reference-free whole-genome alignment. This lets us identify orthologous regions in greater numbers than was previously possible and recognize genomic novelties in particular bird lineages. The densely sampled alignment provides a single-base-pair map of selection, more than doubles the fraction of bases that are confidently predicted to be under conservation and reveals extensive patterns of weak selection in predominantly non-coding DNA. Our results demonstrate that increasing the diversity of genomes used in comparative studies can reveal more shared and lineage-specific variation and improve the investigation of genomic characteristics. We anticipate that this genomic resource will offer new perspectives on evolutionary processes in cross-species comparative analyses and assist in efforts to conserve species.
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 252-257 |
Number of pages | 6 |
Journal | Nature |
Volume | 587 |
Issue number | 7833 |
Early online date | 11 Nov 2020 |
DOIs | |
Publication status | Published 12 Nov 2020 |
Funding
Rowe (Museums Victoria), K. Winker (University of Alaska Museum) and the late A. Baker (Royal Ontario Museum) for providing tissue samples; B. J. Novak for sample coordination; Dovetail Genomics for the assembly of Caloenas nicobarica; T. Riede for helpful discussions of the mechanism and evolution of the vocal tract filter in songbirds; and China National Genebank at BGI for contributing to the sequencing for the B10K Project. The final version of the manuscript was approved by H. G. Spencer (University of Otago) in place of the late I.G.J. This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB31020000), the International Partnership Program of the Chinese Academy of Sciences (no. 152453KYSB20170002), the Carlsberg Foundation (CF16-0663) and the Villum Foundation (no. 25900) to G.Z. This work was also supported in part by National Natural Science Foundation of China no. 31901214 to S.F., ERC Consolidator Grant 681396 to M.T.P.G., Howard Hughes Medical Institute funds to E.D.J., and the National Institutes of Health (award numbers 5U54HG007990, 5T32HG008345-04, 1U01HL137183, R01HG010053, U01HL137183 and U54HG007990) to B. Paten. Supercomputing was partially performed using the DeiC National Life Science Supercomputer, Computerome, at the Technical University of Denmark. Portions of this research were also conducted with high-performance computing resources provided by Louisiana State University (http://www.hpc.lsu.edu). Parts of this work and its text were included in J.A.’s PhD thesis (ref. 18).
ASJC Scopus subject areas
- General