Probabilistic methods outperform parsimony in the phylogenetic analysis of data simulated without a probabilistic model

Mark N. Puttick, Joseph E. O'Reilly, Davide Pisani, Philip C.J. Donoghue

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

To understand patterns and processes of the diversification of life, we require an accurate understanding of taxon interrelationships. Recent studies have suggested that analyses of morphological character data using the Bayesian and maximum likelihood Mk model provide phylogenies of higher accuracy than parsimony methods. This has proved controversial, particularly for studies simulating morphological data under Markov models that assume shared branch lengths for characters, as it is claimed that this biases results in favour of the Bayesian or maximum likelihood Mk model over parsimony models, which do not explicitly make this assumption. We avoid these potential issues by employing a simulation protocol in which character states are randomly assigned to tips, but datasets are constrained to an empirically realistic distribution of homoplasy as measured by the consistency index. Datasets were analysed with equal weights and implied weights parsimony, and with the maximum likelihood and Bayesian Mk model. We find that method choice is largely irrelevant for high consistency (low homoplasy) datasets, on which all methods perform well; the largest discrepancies in accuracy occur with low consistency (high homoplasy) datasets. In such cases, the Bayesian Mk model is significantly more accurate than alternative models, and implied weights parsimony never significantly outperforms the Bayesian Mk model. When poorly supported branches are collapsed, the Bayesian Mk model recovers trees with higher resolution than other methods. As it is not possible to assess homoplasy independently of a tree estimate, the Bayesian Mk model emerges as the most reliable approach for categorical morphological analyses.
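
The consistency index mentioned in the abstract can be illustrated with a short sketch. It is not the authors' simulation protocol: it only shows, under stated assumptions, how an ensemble consistency index (minimum possible steps divided by steps observed on a tree) might be computed with Fitch parsimony, and how randomly assigned tip states could be filtered to a target range of homoplasy. The tree, the tip names, the function names, and the rejection-sampling step are all hypothetical choices made for illustration.

```python
import random

# Hypothetical rooted binary tree as nested tuples; leaves are tip names.
TREE = ((("A", "B"), "C"), ("D", "E"))
TIPS = ["A", "B", "C", "D", "E"]


def fitch_steps(tree, states):
    """Return (state_set, steps) for one unordered character under Fitch parsimony."""
    if isinstance(tree, str):  # tip: its observed state, zero steps
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    shared = left_set & right_set
    if shared:  # children share a state: no extra change needed here
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def consistency_index(tree, matrix):
    """Ensemble CI: summed minimum possible steps / summed observed steps."""
    min_steps = 0
    obs_steps = 0
    for states in matrix:  # one dict (tip -> state) per character
        min_steps += len(set(states.values())) - 1
        obs_steps += fitch_steps(tree, states)[1]
    # A matrix of constant characters has no steps; treat it as fully consistent.
    return min_steps / obs_steps if obs_steps else 1.0


def random_character(tips, n_states, rng):
    """Assign a state to each tip uniformly at random (no evolutionary model)."""
    return {tip: rng.randrange(n_states) for tip in tips}


def simulate_dataset(tree, tips, n_chars, ci_range, rng, n_states=2, max_tries=10000):
    """Assumed rejection step: redraw random matrices until the CI is in range."""
    low, high = ci_range
    for _ in range(max_tries):
        matrix = [random_character(tips, n_states, rng) for _ in range(n_chars)]
        if low <= consistency_index(tree, matrix) <= high:
            return matrix
    raise RuntimeError("no dataset found within the requested CI range")


# Two hand-made characters: the first fits the tree, the second is homoplastic.
clean = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1}
noisy = {"A": 0, "B": 1, "C": 0, "D": 1, "E": 0}
print(consistency_index(TREE, [clean]))         # one step needed, one observed
print(consistency_index(TREE, [clean, noisy]))  # two steps needed, three observed
```

A low consistency index here means that the characters require many more changes on the tree than their number of states strictly demands, which is the high homoplasy condition under which the abstract reports the largest differences between methods.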

Original language: English
Pages (from-to): 1-17
Number of pages: 17
Journal: Palaeontology
Volume: 62
Issue number: 1
Early online date: 19 Aug 2018
DOIs: 10.1111/pala.12388
Publication status: Published - 1 Jan 2019

Fingerprint

probabilistic models
phylogenetics
phylogeny
methodology
method
analysis

Keywords

  • Bayesian
  • likelihood
  • morphology
  • parsimony
  • phylogenetics
  • simulation

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Palaeontology

Cite this

Probabilistic methods outperform parsimony in the phylogenetic analysis of data simulated without a probabilistic model. / Puttick, Mark N.; O'Reilly, Joseph E.; Pisani, Davide; Donoghue, Philip C.J.

In: Palaeontology, Vol. 62, No. 1, 01.01.2019, p. 1-17.

@article{ecd8d094915d44f1807a0b5563edd988,
title = "Probabilistic methods outperform parsimony in the phylogenetic analysis of data simulated without a probabilistic model",
keywords = "Bayesian, likelihood, morphology, parsimony, phylogenetics, simulation",
author = "Puttick, {Mark N.} and O'Reilly, {Joseph E.} and Pisani, Davide and Donoghue, {Philip C.J.}",
year = "2019",
month = "1",
day = "1",
doi = "10.1111/pala.12388",
language = "English",
volume = "62",
pages = "1--17",
journal = "Palaeontology",
issn = "0031-0239",
publisher = "Wiley-Blackwell",
number = "1",
}

TY - JOUR
T1 - Probabilistic methods outperform parsimony in the phylogenetic analysis of data simulated without a probabilistic model
AU - Puttick, Mark N.
AU - O'Reilly, Joseph E.
AU - Pisani, Davide
AU - Donoghue, Philip C.J.
PY - 2019/1/1
Y1 - 2019/1/1
KW - Bayesian
KW - likelihood
KW - morphology
KW - parsimony
KW - phylogenetics
KW - simulation
UR - http://www.scopus.com/inward/record.url?scp=85053201479&partnerID=8YFLogxK
U2 - 10.1111/pala.12388
DO - 10.1111/pala.12388
M3 - Article
VL - 62
SP - 1
EP - 17
JO - Palaeontology
JF - Palaeontology
SN - 0031-0239
IS - 1
ER -