RPANDA is an R package that can fit a set of complex diversification models to all or parts of a phylogenetic tree.
We will apply RPANDA to understand the diversification of whales. Note that we are going to carry out a simplified version of this analysis with models for only two parts of the clade - for a more complete version, you can go here.
Load the whale data that we will analyze.
whales <- read.tree("whaletree.tre")
plot(whales, cex = 0.35)
The RPANDA models we will use are the following.
lambda.cst <- function(x,y){y}
lambda.var <- function(x,y){y[1]*exp(y[2]*x)}
mu.cst <- function(x,y){y}
mu.var <- function(x,y){y[1]*exp(y[2]*x)}
These models look like this:
Basically, we can fit constant rate birth-death models, or one or both of those can vary through time. We can fit each of these four scenarios to all or part of a tree.
For this exercise, we will extract the dolphins (family Delphinidae) from the tree and fit separate rates to that section and to the rest of the whales.
This is why we are justified in doing that:
wm<-medusa(whales)
## Appropriate aicc-threshold for a tree of 87 tips is: 4.397737.
##
## Step 1: lnLik=-277.6942; aicc=557.4119; model=yule
## Step 2: lnLik=-266.7389; aicc=539.6197; shift at node 141; model=yule; cut=stem; # shifts=1
##
## No significant increase in aicc score. Disregarding subsequent piecewise models.
##
## Model.ID Shift.Node Cut.At Model Ln.Lik.part r epsilon r.low
## 1 1 88 node yule -185.5379 0.076686 NA 0.0576867
## 2 2 141 stem yule -81.20095 0.232089 NA 0.1616527
## r.high
## 1 0.0994802
## 2 0.3204790
wm
##
## Optimal MEDUSA model for tree with 87 taxa.
##
## Model.ID Shift.Node Cut.At Model Ln.Lik.part r epsilon r.low
## 1 1 88 node yule -185.5379 0.076686 NA 0.0576867
## 2 2 141 stem yule -81.20095 0.232089 NA 0.1616527
## r.high
## 1 0.0994802
## 2 0.3204790
##
## 95% confidence intervals on parameter values calculated from profile likelihoods
plot(wm)
First find the dolphins using node numbers.
plot(whales, cex = 0.35)
nodelabels(cex = 0.5)
Dephinidae is the group that descends from node 140 in the tree.
delphinidae.tree <- extract.clade(whales, 140)
othercetaceans.tree <- drop.tip(whales,delphinidae.tree$tip.label)
plot(delphinidae.tree, main = "Delphinidae")
plot(othercetaceans.tree, main = "Other cetaceans")
First we can fit models to the Delphinidae:
fit.multi.rpanda <- function(tree,par)
{
bcstdcst <- fit_bd(tree, max(branching.times(tree)), f.lamb=lambda.cst, f.mu=mu.cst, lamb_par=par[[1]][1],mu_par=par[[1]][2],cst.lamb=TRUE,cst.mu=TRUE,cond="crown",f=87/89,dt=1e-3)
bvardcst <- fit_bd(tree, max(branching.times(tree)), f.lamb=lambda.var, f.mu=mu.cst, lamb_par=par[[2]][c(1,2)],mu_par=par[[2]][3],expo.lamb=TRUE,cst.mu=TRUE,cond="crown",f=87/89,dt=1e-3)
bcstdvar <- fit_bd(tree, max(branching.times(tree)), f.lamb=lambda.cst, f.mu=mu.var, lamb_par=par[[3]][1],mu_par=par[[3]][c(2,3)],cst.lamb=TRUE,expo.mu=TRUE,cond="crown",f=87/89,dt=1e-3)
bvardvar <- fit_bd(tree, max(branching.times(tree)), f.lamb=lambda.var, f.mu=mu.var, lamb_par=par[[4]][c(1,2)],mu_par=par[[4]][c(3,4)],expo.lamb=TRUE,expo.mu=TRUE,cond="crown",f=87/89,dt=1e-3)
return(list("bcstdcst"=bcstdcst,"bvardcst"=bvardcst,"bcstdvar"=bcstdvar,"bvardvar"=bvardvar))
}
whales.par <- list(c(0.4,0),c(0.4,-0.05,0),c(0.4,0.1,0.05),c(0.4,-0.05,0.1,0.05))
dolphin_results<- fit.multi.rpanda(delphinidae.tree, whales.par)
other_results <- fit.multi.rpanda(othercetaceans.tree, whales.par)
Now summarize results using aic.
dolp_aic<-numeric(4)
other_aic<-numeric(4)
names(dolp_aic)<-c("bcstdcst", "bvardcst", "bcstdvar", "bvardvar")
names(other_aic)<-names(dolp_aic)
for(i in 1:4) {
dolp_aic[i]<-dolphin_results[[i]]$aicc
other_aic[i]<-other_results[[i]]$aicc
}
aic.w(dolp_aic)
## bcstdcst bvardcst bcstdvar bvardvar
## 0.1814937 0.3620626 0.2666788 0.1897649
aic.w(other_aic)
## bcstdcst bvardcst bcstdvar bvardvar
## 0.3995571 0.3413438 0.1293870 0.1297121
These results are completely ambiguous. We likely do not have enough power with such small trees to discriminate among these various hypotheses.
What about the whole tree?
whole_results <- fit.multi.rpanda(whales, whales.par)
whole_aic<-numeric(4)
names(whole_aic)<-c("bcstdcst", "bvardcst", "bcstdvar", "bvardvar")
for(i in 1:4) {
whole_aic[i]<-whole_results[[i]]$aicc
}
aic.w(whole_aic)
## bcstdcst bvardcst bcstdvar bvardvar
## 0.55623992 0.19020578 0.19019717 0.06335714
A little evidence in favor of a constant rate model. Did breaking the tree up improve model fit?
whole_results[[1]]$LH
## [1] -277.6658
dolphin_results[[1]]$LH + other_results[[1]]$LH
## [1] -262.4696
dolphin_results[[2]]$LH + other_results[[1]]$LH
## [1] -260.5794
whole_results[[1]]$aic
## [1] 559.4745
splitLH<-dolphin_results[[1]]$LH + other_results[[1]]$LH
splitAIC <- 2 * 4 - 2 * splitLH
splitAIC
## [1] 532.9392
aic.w(c(whole_results[[1]]$aic, splitAIC))
## [1] 0.00000173 0.99999827
Yes, breaking the tree improved model fit - but trees were too small to differentiate among the various RPANDA models.