For this exercise we will fit models of discrete character evolution. We can use a dataset from a paper by Brandley et al. (2008 Evolution 62:2042)
First, load the data files into R. You can download the files here:
library(ape)
library(phytools)
## Loading required package: maps
##
## # maps v3.1: updated 'world': all lakes moved to separate new #
## # 'lakes' database. Type '?world' or 'news(package="maps")'. #
sqData<-read.csv("brandley_table.csv")
sqTree<-read.nexus("squamate.tre")
plotTree(sqTree,type="fan",lwd=1,fsize=0.5)
The data set that we have is composed of measurements for continuous characters. To get a character for “limbed” versus “limbless” we will have to create it by finding all species with all limbs of length 0.
hindLimbs<-sqData[,"HLL"]!=0
foreLimbs<-sqData[,"FLL"]!=0
limbless<-!hindLimbs & !foreLimbs
sum(limbless)
## [1] 51
Now we need to get species names that match our tree
speciesNames<-sqData[,1]
species<-sub(" ", "_", speciesNames)
names(limbless)<-species
Now we can fit models
library(geiger)
# now we can try to compare the fit of some models using fitDiscrete
erModel<-fitDiscrete(sqTree, limbless, model="ER")
## Warning in treedata(phy, dat): The following tips were not found in 'phy' and were dropped from 'data':
## Gonatodes_albogularis
## Lepidophyma_flavimaculatum
## Trachyboa_boulengeri
ardModel<-fitDiscrete(sqTree, limbless, model="ARD")
## Warning in treedata(phy, dat): The following tips were not found in 'phy' and were dropped from 'data':
## Gonatodes_albogularis
## Lepidophyma_flavimaculatum
## Trachyboa_boulengeri
im <- cbind(c(NA, 0), c(1, NA))
irrModel<-fitDiscrete(sqTree, limbless, model=im)
## Warning in treedata(phy, dat): The following tips were not found in 'phy' and were dropped from 'data':
## Gonatodes_albogularis
## Lepidophyma_flavimaculatum
## Trachyboa_boulengeri
## Warning in fitDiscrete(sqTree, limbless, model = im): Parameter estimates appear at bounds:
## q21
# are these different?
erModel$opt$aicc
## [1] 162.99
ardModel$opt$aicc
## [1] 162.8147
irrModel$opt$aicc
## [1] 169.8133
# Irreversible model is the best by far
aicc <- c(erModel$opt$aicc, ardModel$opt$aicc, irrModel$opt$aicc)
aiccD <- aicc - min(aicc)
aw <- exp(-0.5 * aiccD)
aiccW <- aw/sum(aw)
aiccW
## [1] 0.47067894 0.51379478 0.01552628
What happens if we try a character with more states? Let’s explore more.
table(foreLimbs, hindLimbs)
## hindLimbs
## foreLimbs FALSE TRUE
## FALSE 51 42
## TRUE 3 165
So there are four different patterns out there. Let’s try to consider each as a separate state.
limbState<-paste(foreLimbs, hindLimbs)
names(limbState)<-species
erModel<-fitDiscrete(sqTree, limbState, model="ER")
## Warning in treedata(phy, dat): The following tips were not found in 'phy' and were dropped from 'data':
## Gonatodes_albogularis
## Lepidophyma_flavimaculatum
## Trachyboa_boulengeri
symModel<-fitDiscrete(sqTree, limbState, model="SYM")
## Warning in treedata(phy, dat): The following tips were not found in 'phy' and were dropped from 'data':
## Gonatodes_albogularis
## Lepidophyma_flavimaculatum
## Trachyboa_boulengeri
ardModel<-fitDiscrete(sqTree, limbState, model="ARD")
## Warning in treedata(phy, dat): The following tips were not found in 'phy' and were dropped from 'data':
## Gonatodes_albogularis
## Lepidophyma_flavimaculatum
## Trachyboa_boulengeri
# are these different?
erModel$opt$aicc
## [1] 276.3895
symModel$opt$aicc
## [1] 261.7112
ardModel$opt$aicc
## [1] 267.3036
# SYM model is the best, and ER is substantially worse
aicc <- c(erModel$opt$aicc, symModel$opt$aicc, ardModel$opt$aicc)
aiccD <- aicc - min(aicc)
aw <- exp(-0.5 * aiccD)
aiccW <- aw/sum(aw)
aiccW
## [1] 0.0006118476 0.9418937231 0.0574944293
symModel
## GEIGER-fitted comparative model of discrete data
## fitted Q matrix:
## FALSE FALSE FALSE TRUE TRUE FALSE TRUE TRUE
## FALSE FALSE -0.005122005 3.378297e-03 4.769400e-04 1.266768e-03
## FALSE TRUE 0.003378297 -4.198246e-03 1.431625e-25 8.199491e-04
## TRUE FALSE 0.000476940 1.431625e-25 -4.769400e-04 4.688964e-18
## TRUE TRUE 0.001266768 8.199491e-04 4.688964e-18 -2.086717e-03
##
## model summary:
## log-likelihood = -124.688269
## AIC = 261.376537
## AICc = 261.711199
## free parameters = 6
##
## Convergence diagnostics:
## optimization iterations = 100
## failed iterations = 0
## frequency of best fit = 0.05
##
## object summary:
## 'lik' -- likelihood function
## 'bnd' -- bounds for likelihood search
## 'res' -- optimization iteration summary
## 'opt' -- maximum likelihood parameter estimates
OK, now you are ready for a challenge problem! You have a particular hypothesis about the four-state character: limbs can be lost, either one at a time or both at once, but not regained. Formulate this model in terms of a matrix - as we did for the irreversible model above - and test your hypothesis. Then, modify your matrix to not allow both limbs to be lost at once, and test that hypothesis. Is limb loss irreversable? Can both limbs be lost at once?