Exercise 5: Models of continuous character evolution

In this tutorial, we will learn how to fit & compare alternative models of continuous character evolution for a single trait.

We will start by estimating 'phylogenetic signal,' a very simple & commonly used measure of the tendency of related species to resemble one another. Then, we will learn how to fit & compare a series of alternative evolutionary models for continuous traits.

Phylogenetic signal

Start by loading the packages & data that we will use for the present exercise:

## packages
library(phytools)
library(geiger)

We will use data for body size & a phylogeny of 100 Anolis lizard species from the Caribbean. The files can be downloaded here:

  1. svl.csv
  2. Anolis.tre

First check to make sure the two files are in your current working directory, and then read them from file:

anole.tree<-read.tree("Anolis.tre")
obj<-read.csv("svl.csv",row.names=1,header=TRUE)

Let's examine the objects we have created. First, the tree:

anole.tree
## 
## Phylogenetic tree with 100 tips and 99 internal nodes.
## 
## Tip labels:
##  ahli, allogus, rubribarbus, imias, sagrei, bremeri, ...
## 
## Rooted; includes branch lengths.
plotTree(anole.tree,fsize=0.9,ftype="i",type="fan",lwd=1)


obj is a data frame. We will convert it to a simple named vector, with the species names as the names of its elements.

## convert to a vector:
svl<-setNames(obj$svl,rownames(obj))
svl
##            ahli         alayoni         alfaroi        aliniger 
##        4.039125        3.815705        3.526655        4.036557 
##        allisoni         allogus   altitudinalis         alumina 
##        4.375390        4.040138        3.842994        3.588941 
##       alutaceus     angusticeps     argenteolus     argillaceus 
##        3.554891        3.788595        3.971307        3.757869 
##         armouri   bahorucoensis        baleatus        baracoae 
##        4.121684        3.827445        5.053056        5.042780 
##       barahonae        barbatus        barbouri        bartschi 
##        5.076958        5.003946        3.663932        4.280547 
##         bremeri        breslini    brevirostris        caudalis 
##        4.113371        4.051111        3.874155        3.911743 
##       centralis  chamaeleonides    chlorocyanus     christophei 
##        3.697941        5.042349        4.275448        3.884652 
##       clivicola     coelestinus        confusus           cooki 
##        3.758726        4.297965        3.938442        4.091535 
##    cristatellus    cupeyalensis         cuvieri    cyanopleurus 
##        4.189820        3.462014        4.875012        3.630161 
##         cybotes     darlingtoni       distichus dolichocephalus 
##        4.210982        4.302036        3.928796        3.908550 
##       equestris      etheridgei   eugenegrahami       evermanni 
##        5.113994        3.657991        4.128504        4.165605 
##         fowleri         garmani         grahami           guafe 
##        4.288780        4.769473        4.154274        3.877457 
##       guamuhaya         guazuma       gundlachi       haetianus 
##        5.036953        3.763884        4.188105        4.316542 
##      hendersoni      homolechis           imias    inexpectatus 
##        3.859835        4.032806        4.099687        3.537439 
##       insolitus        isolepis           jubar           krugi 
##        3.800471        3.657088        3.952605        3.886500 
##      lineatopus   longitibialis        loysiana          lucius 
##        4.128612        4.242103        3.701240        4.198915 
##    luteogularis      macilentus        marcanoi          marron 
##        5.101085        3.715765        4.079485        3.831810 
##         mestrei       monticola          noblei        occultus 
##        3.987147        3.770613        5.083473        3.663049 
##         olssoni        opalinus      ophiolepis        oporinus 
##        3.793899        3.838376        3.637962        3.845670 
##        paternus        placidus       poncensis        porcatus 
##        3.802961        3.773967        3.820378        4.258991 
##          porcus      pulchellus         pumilis quadriocellifer 
##        5.038034        3.799022        3.466860        3.901619 
##      reconditus        ricordii     rubribarbus          sagrei 
##        4.482607        5.013963        4.078469        4.067162 
##    semilineatus        sheplani         shrevei      singularis 
##        3.696631        3.682924        3.983003        4.057997 
##      smallwoodi         strahmi       stratulus     valencienni 
##        5.035096        4.274271        3.869881        4.321524 
##       vanidicus    vermiculatus        websteri       whitemani 
##        3.626206        4.802849        3.916546        4.097479

Let's conduct two main tests for phylogenetic signal using the anole body size data. The first test is Blomberg's K, which compares the variance of phylogenetically independent contrasts (PICs) to what we would expect under a Brownian motion (BM) model. K = 1 means that relatives resemble one another as much as we should expect under BM; K < 1 means that there is less "phylogenetic signal" than expected under BM, while K > 1 means that there is more. A significant P-value returned by phylosig tells you that there is significant phylogenetic signal; that is, close relatives are more similar than random pairs of species.

phylosig(anole.tree,svl,test=TRUE)
## $K
## [1] 1.553678
## 
## $P
## [1] 0.001
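The calculation behind K can be sketched in a few lines of base R. This is a minimal sketch, assuming the tree has already been converted to its phylogenetic covariance matrix C (for example with ape's vcv.phylo), and using the standard ratio of raw to phylogenetically corrected mean squared error, divided by the value that ratio is expected to take under BM. The function names blomberg_K and K_test and the four-species matrix are hypothetical. The randomization test shuffles trait values across the tips, which we assume is similar in spirit to what phylosig does when test=TRUE.

```r
## Blomberg's K from a phylogenetic covariance matrix C and a trait
## vector x given in the same species order (hypothetical helper).
blomberg_K <- function(x, C) {
  n <- length(x)
  invC <- solve(C)
  ## GLS (phylogenetic) estimate of the root state
  a <- sum(invC %*% x) / sum(invC)
  e <- x - a
  ## observed ratio: raw MSE over phylogenetically corrected MSE
  observed <- sum(e^2) / as.numeric(t(e) %*% invC %*% e)
  ## expected value of that ratio under Brownian motion
  expected <- (sum(diag(C)) - n / sum(invC)) / (n - 1)
  observed / expected
}

## randomization test: shuffle trait values across the tips
K_test <- function(x, C, nsim = 999) {
  K <- blomberg_K(x, C)
  null_K <- replicate(nsim, blomberg_K(sample(x), C))
  list(K = K, P = (sum(null_K >= K) + 1) / (nsim + 1))
}

## hypothetical tree of depth 1: two pairs of close relatives
C <- matrix(c(1.0, 0.8, 0.0, 0.0,
              0.8, 1.0, 0.0, 0.0,
              0.0, 0.0, 1.0, 0.8,
              0.0, 0.0, 0.8, 1.0), 4, 4,
            dimnames = list(paste0("sp", 1:4), paste0("sp", 1:4)))
x <- c(sp1 = 3.9, sp2 = 4.0, sp3 = 5.0, sp4 = 5.1)
set.seed(1)
K_test(x, C)
```

Here close relatives have similar values, so K comes out above 1. With only four tips there are just 24 possible permutations, so the randomization P-value cannot get small; the toy example only illustrates the mechanics. On a star phylogeny (C proportional to the identity matrix), K equals exactly 1 for any data.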

Another method for testing phylogenetic signal is Pagel's λ. λ is a scaling coefficient for the expected covariances between species. One way to interpret this is as a tree transformation that shrinks internal branches (and thus the predicted covariances between species) relative to terminal branches, which are lengthened so that the predicted variance of each species is unchanged. λ close to zero means phylogenetic signal equivalent to that expected if the data arose on a star phylogeny (that is, no phylogenetic signal); λ = 1 corresponds to a Brownian motion model; and 0 < λ < 1 is in between.
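The covariance-matrix view of λ can be made concrete with a minimal base R sketch, assuming the tree has already been converted to its phylogenetic covariance matrix (for example with ape's vcv.phylo). The function name lambda_transform and the three-species matrix are hypothetical. The transformation itself follows the usual definition of Pagel's λ: multiply the off-diagonal covariances by λ and leave the diagonal alone.

```r
## Pagel's lambda applied to a BM covariance matrix (hypothetical helper):
## covariances (shared history) are scaled by lambda, while the
## variances on the diagonal (root-to-tip distances) are unchanged.
lambda_transform <- function(C, lambda) {
  C_lambda <- lambda * C
  diag(C_lambda) <- diag(C)
  C_lambda
}

## hypothetical 3-species tree of depth 1: A and B share 0.7 units
## of history, and C is unrelated to both below the root
C <- matrix(c(1.0, 0.7, 0.0,
              0.7, 1.0, 0.0,
              0.0, 0.0, 1.0), 3, 3,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

lambda_transform(C, 1)    # unchanged: ordinary Brownian motion
lambda_transform(C, 0.5)  # covariances halved
lambda_transform(C, 0)    # diagonal only: a star phylogeny
```

At λ = 0 the result is a diagonal matrix, which is exactly the covariance structure of a star phylogeny with the same root-to-tip distances.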

Viewed as a tree transformation, λ = 0 is equivalent to a star phylogeny, whereas λ = 1 leaves the covariances among related species exactly as implied by the original phylogeny. We can start by visualizing the implications of different values of λ for the predicted covariances between species, as represented by a tree transformation:

## set up three side-by-side panels
par(mfcol=c(1,3))
## lambdaTree is an internal phytools function, so it is called with :::
plotTree(phytools:::lambdaTree(anole.tree,1),ftype="i",
    mar=c(0.1,0.1,4.1,0.1),fsize=0.6,lwd=1)
title(main=expression(paste("Pagel's ",lambda," = 1.0",sep="")))
plotTree(phytools:::lambdaTree(anole.tree,0.5),ftype="i",
    mar=c(0.1,0.1,4.1,0.1),fsize=0.6,lwd=1)
title(main=expression(paste("Pagel's ",lambda," = 0.5",sep="")))
plotTree(phytools:::lambdaTree(anole.tree,0),ftype="i",
    mar=c(0.1,0.1,4.1,0.1),fsize=0.6,lwd=1)
title(main=expression(paste("Pagel's ",lambda," = 0.0",sep="")))


Now, let's estimate phylogenetic signal using Pagel's λ with the phylosig function in phytools:

phylosig(anole.tree,svl,method="lambda",test=TRUE)
## $lambda
## [1] 1.016502
## 
## $logL
## [1] -3.810016
## 
## $logL0
## [1] -60.01946
## 
## $P
## [1] 2.892589e-26
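The reported P is consistent with a likelihood-ratio test comparing the fitted λ (logL) against λ fixed at 0 (logL0), with one degree of freedom for the single extra parameter. Assuming that is how phylosig computes it, a minimal base R check using the printed values:

```r
## log-likelihoods printed by phylosig() above
logL  <- -3.810016  # lambda estimated (1.016502)
logL0 <- -60.01946  # lambda fixed at 0 (star phylogeny)

## likelihood-ratio statistic, compared to a chi-square with 1 df
LR <- 2 * (logL - logL0)
P  <- pchisq(LR, df = 1, lower.tail = FALSE)
LR
P
```

The tiny P-value reflects the very large drop in log-likelihood when the data are forced onto a star phylogeny.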

Fitting models using fitContinuous

In addition to these relatively simple measures of phylogenetic signal, we can also fit alternative univariate models of trait evolution using the fitContinuous function in the geiger package.

There are many possible models, but the most commonly considered models of trait evolution for a single continuous trait on a tree are the Brownian motion (BM) model; the 'early-burst' (EB) model, in which character change tends to be concentrated towards the base of the tree; and the Ornstein-Uhlenbeck (OU) model, which is used to model trait evolution with a tendency towards a central value, such as under constant stabilizing selection.
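To make the differences between the three models concrete, here is a small base R sketch of the expected trait variance that accumulates along a single lineage from the root to time t. It assumes a fixed root state, an EB rate that changes exponentially as sigsq * exp(a * t) (so a < 0 means a slowdown), and a single optimum for OU. The function names are hypothetical; the parameter names are borrowed from the fitContinuous output.

```r
## Expected variance accumulated from a fixed root state to time t.
bm_var <- function(t, sigsq) sigsq * t

## Early burst: integral of the rate sigsq * exp(a * t) from 0 to t
eb_var <- function(t, sigsq, a) {
  if (abs(a) < 1e-12) return(bm_var(t, sigsq))  # a = 0 reduces to BM
  sigsq * (exp(a * t) - 1) / a
}

## Ornstein-Uhlenbeck: variance levels off at sigsq / (2 * alpha)
ou_var <- function(t, sigsq, alpha) {
  if (alpha < 1e-12) return(bm_var(t, sigsq))   # alpha = 0 reduces to BM
  sigsq / (2 * alpha) * (1 - exp(-2 * alpha * t))
}

times <- c(0.25, 0.5, 1, 2, 4)
round(rbind(BM = bm_var(times, sigsq = 1),
            EB = eb_var(times, sigsq = 1, a = -1),
            OU = ou_var(times, sigsq = 1, alpha = 1)), 3)
```

Both EB (with a < 0) and OU accumulate less variance than BM over long times, but for different reasons. Under EB the rate itself declines. Under OU the rate stays constant, but trait values are pulled back toward an optimum. Setting a = 0 or alpha = 0 recovers plain BM in each case.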

Let's start by fitting each of these models:

## Brownian motion (model="BM" is the default)
fitBM<-fitContinuous(anole.tree,svl)
## Ornstein-Uhlenbeck
fitOU<-fitContinuous(anole.tree,svl,model="OU")
## early burst
fitEB<-fitContinuous(anole.tree,svl,model="EB")

Let's inspect the results of each of these models:

fitBM
## GEIGER-fitted comparative model of continuous data
##  fitted 'BM' model parameters:
##  sigsq = 0.136160
##  z0 = 4.065918
## 
##  model summary:
##  log-likelihood = -4.700404
##  AIC = 13.400807
##  AICc = 13.524519
##  free parameters = 2
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 1.00
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates
fitOU
## GEIGER-fitted comparative model of continuous data
##  fitted 'OU' model parameters:
##  alpha = 0.000000
##  sigsq = 0.136160
##  z0 = 4.065918
## 
##  model summary:
##  log-likelihood = -4.700404
##  AIC = 15.400807
##  AICc = 15.650807
##  free parameters = 3
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 0.80
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates
fitEB
## GEIGER-fitted comparative model of continuous data
##  fitted 'EB' model parameters:
##  a = -0.736272
##  sigsq = 0.233528
##  z0 = 4.066519
## 
##  model summary:
##  log-likelihood = -4.285970
##  AIC = 14.571939
##  AICc = 14.821939
##  free parameters = 3
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 0.31
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates
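fitContinuous finds sigsq and z0 by numerical optimization. For plain Brownian motion they also have a closed form, which helps show what they mean. z0 is the generalized-least-squares estimate of the root state, and sigsq is the rate at which variance accumulates per unit time. The base R sketch below assumes the tree has been converted to its covariance matrix C (for example with ape's vcv.phylo) and that sigsq is the maximum-likelihood (divide-by-n) estimate. The helper bm_fit and the four-species data are hypothetical.

```r
## Closed-form ML fit of Brownian motion given covariance matrix C
## and trait vector x (hypothetical helper, base R only).
bm_fit <- function(x, C) {
  n <- length(x)
  invC <- solve(C)
  one <- rep(1, n)
  ## GLS estimate of the root state z0
  z0 <- as.numeric((t(one) %*% invC %*% x) / (t(one) %*% invC %*% one))
  e <- x - z0
  ## ML estimate of the BM rate (divides by n, not n - 1)
  sigsq <- as.numeric(t(e) %*% invC %*% e) / n
  V <- sigsq * C
  ## multivariate normal log-likelihood of the tip data
  logL <- -0.5 * (n * log(2 * pi) +
                  as.numeric(determinant(V, logarithm = TRUE)$modulus) +
                  as.numeric(t(e) %*% solve(V) %*% e))
  c(sigsq = sigsq, z0 = z0, logL = logL)
}

## hypothetical 4-species tree (two pairs of close relatives)
C <- matrix(c(1.0, 0.8, 0.0, 0.0,
              0.8, 1.0, 0.0, 0.0,
              0.0, 0.0, 1.0, 0.8,
              0.0, 0.0, 0.8, 1.0), 4, 4)
x <- c(3.9, 4.0, 5.0, 5.1)
bm_fit(x, C)
```

On a star phylogeny (C equal to the identity matrix), these estimates reduce to the ordinary mean and the divide-by-n variance of the data.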

We can compare these models most easily using AIC (or the small-sample corrected AICc). AIC is an 'information criterion' that balances the fit of a model (its log-likelihood) against the number of parameters it uses, to help us measure the strength of evidence for each model. Lower AIC values indicate stronger support for a model. We can also compute Akaike weights, which rescale the AIC scores of the fitted alternative models so that they sum to 1. Each weight can then be read as the relative weight of evidence for that model, given our data.

aic.vals<-setNames(c(fitBM$opt$aicc,fitOU$opt$aicc,fitEB$opt$aicc),
    c("BM","OU","EB"))
aic.vals
##       BM       OU       EB 
## 13.52452 15.65081 14.82194
aic.w(aic.vals)
##        BM        OU        EB 
## 0.5353068 0.1848779 0.2798153

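In this case, none of the models can be ruled out, but the greatest weight of evidence (about 0.54) is for the simplest model, Brownian motion. Note that the ML estimate of alpha under OU is 0, which reduces OU to BM (the two log-likelihoods are identical). OU has a higher AICc only because it is penalized for its extra parameter.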
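AICc and the Akaike weights can be reproduced by hand from the log-likelihoods and parameter counts printed by fitContinuous, which makes both formulas explicit. The sketch below is base R only and assumes n = 100 species. It uses AIC = -2 logL + 2k, the small-sample correction 2k(k + 1)/(n - k - 1), and weights proportional to exp(-delta/2). These are the standard definitions, and they match the values printed above.

```r
## log-likelihoods and free-parameter counts from the fitContinuous output
logL <- c(BM = -4.700404, OU = -4.700404, EB = -4.285970)
k    <- c(BM = 2, OU = 3, EB = 3)
n    <- 100  # number of species (tips)

aic  <- -2 * logL + 2 * k
## small-sample correction
aicc <- aic + 2 * k * (k + 1) / (n - k - 1)

## Akaike weights: relative likelihoods exp(-delta / 2), normalized to sum to 1
delta <- aicc - min(aicc)
w <- exp(-delta / 2) / sum(exp(-delta / 2))
round(rbind(AIC = aic, AICc = aicc, weight = w), 4)
```

Because BM and OU have the same log-likelihood here, the gap in their AICc scores comes entirely from the penalty terms.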

Challenge Problem 3: Fitting models of continuous character evolution

Download the following two files for cyprinid fishes:

  1. Cyprinidae-data.csv
  2. Cyprinidae-tree.tre

Estimate phylogenetic signal using Blomberg's K and Pagel's λ, then fit the BM, OU, and EB models to the data. The second column of the data matrix contains the standard error of each species mean. Read the documentation for phylosig and fitContinuous to figure out how this uncertainty can be taken into account. Which model fits best?

Written by Liam J. Revell. Last updated 1 August 2017.