For this exercise we will fit models of discrete character evolution. We can use a dataset from a paper by Brandley et al. (2008 Evolution 62:2042)

First, load the data files into R. You can download the files here:

  1. brandley_table.csv
  2. squamate.tre
library(ape)
library(phytools)
## Loading required package: maps
## 
##  # maps v3.1: updated 'world': all lakes moved to separate new #
##  # 'lakes' database. Type '?world' or 'news(package="maps")'.  #
sqData<-read.csv("brandley_table.csv")
sqTree<-read.nexus("squamate.tre")


plotTree(sqTree,type="fan",lwd=1,fsize=0.5)

The data set that we have is composed of measurements for continuous characters. To get a character for “limbed” versus “limbless” we will have to create it by finding all species with all limbs of length 0.

hindLimbs<-sqData[,"HLL"]!=0
foreLimbs<-sqData[,"FLL"]!=0
limbless<-!hindLimbs & !foreLimbs
sum(limbless)
## [1] 51

Now we need to get species names that match our tree

speciesNames<-sqData[,1]
species<-sub(" ", "_", speciesNames)
names(limbless)<-species

Now we can fit models

library(geiger)

# now we can try to compare the fit of some models using fitDiscrete
erModel<-fitDiscrete(sqTree, limbless, model="ER")
## Warning in treedata(phy, dat): The following tips were not found in 'phy' and were dropped from 'data':
##  Gonatodes_albogularis
##  Lepidophyma_flavimaculatum
##  Trachyboa_boulengeri
ardModel<-fitDiscrete(sqTree, limbless, model="ARD")
## Warning in treedata(phy, dat): The following tips were not found in 'phy' and were dropped from 'data':
##  Gonatodes_albogularis
##  Lepidophyma_flavimaculatum
##  Trachyboa_boulengeri
im <- cbind(c(NA, 0), c(1, NA))
irrModel<-fitDiscrete(sqTree, limbless, model=im)
## Warning in treedata(phy, dat): The following tips were not found in 'phy' and were dropped from 'data':
##  Gonatodes_albogularis
##  Lepidophyma_flavimaculatum
##  Trachyboa_boulengeri
## Warning in fitDiscrete(sqTree, limbless, model = im): Parameter estimates appear at bounds:
##  q21
# are these different?
erModel$opt$aicc
## [1] 162.99
ardModel$opt$aicc
## [1] 162.8147
irrModel$opt$aicc
## [1] 169.8133
# Irreversible model is the best by far

aicc <- c(erModel$opt$aicc, ardModel$opt$aicc, irrModel$opt$aicc)
aiccD <- aicc - min(aicc)
aw <- exp(-0.5 * aiccD)
aiccW <- aw/sum(aw)
aiccW
## [1] 0.47067894 0.51379478 0.01552628

What happens if we try a character with more states? Let’s explore more.

table(foreLimbs, hindLimbs)
##          hindLimbs
## foreLimbs FALSE TRUE
##     FALSE    51   42
##     TRUE      3  165

So there are four different patterns out there. Let’s try to consider each as a separate state.

limbState<-paste(foreLimbs, hindLimbs)
names(limbState)<-species

erModel<-fitDiscrete(sqTree, limbState, model="ER")
## Warning in treedata(phy, dat): The following tips were not found in 'phy' and were dropped from 'data':
##  Gonatodes_albogularis
##  Lepidophyma_flavimaculatum
##  Trachyboa_boulengeri
symModel<-fitDiscrete(sqTree, limbState, model="SYM")
## Warning in treedata(phy, dat): The following tips were not found in 'phy' and were dropped from 'data':
##  Gonatodes_albogularis
##  Lepidophyma_flavimaculatum
##  Trachyboa_boulengeri
ardModel<-fitDiscrete(sqTree, limbState, model="ARD")
## Warning in treedata(phy, dat): The following tips were not found in 'phy' and were dropped from 'data':
##  Gonatodes_albogularis
##  Lepidophyma_flavimaculatum
##  Trachyboa_boulengeri
# are these different?
erModel$opt$aicc
## [1] 276.3895
symModel$opt$aicc
## [1] 261.7112
ardModel$opt$aicc
## [1] 267.3036
# SYM model is the best, and ER is substantially worse

aicc <- c(erModel$opt$aicc, symModel$opt$aicc, ardModel$opt$aicc)
aiccD <- aicc - min(aicc)
aw <- exp(-0.5 * aiccD)
aiccW <- aw/sum(aw)
aiccW
## [1] 0.0006118476 0.9418937231 0.0574944293
symModel
## GEIGER-fitted comparative model of discrete data
##  fitted Q matrix:
##                  FALSE FALSE    FALSE TRUE    TRUE FALSE     TRUE TRUE
##     FALSE FALSE -0.005122005  3.378297e-03  4.769400e-04  1.266768e-03
##     FALSE TRUE   0.003378297 -4.198246e-03  1.431625e-25  8.199491e-04
##     TRUE FALSE   0.000476940  1.431625e-25 -4.769400e-04  4.688964e-18
##     TRUE TRUE    0.001266768  8.199491e-04  4.688964e-18 -2.086717e-03
## 
##  model summary:
##  log-likelihood = -124.688269
##  AIC = 261.376537
##  AICc = 261.711199
##  free parameters = 6
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 0.05
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates

Challenge problem 4: Discrete character evolution

OK, now you are ready for a challenge problem! You have a particular hypothesis about the four-state character: limbs can be lost, either one at a time or both at once, but not regained. Formulate this model in terms of a matrix - as we did for the irreversible model above - and test your hypothesis. Then, modify your matrix to not allow both limbs to be lost at once, and test that hypothesis. Is limb loss irreversable? Can both limbs be lost at once?