Fitting complex diversification models with RPANDA

What is RPANDA?

RPANDA is an R package that can fit a set of complex diversification models to all or parts of a phylogenetic tree.

We will apply RPANDA to understand the diversification of whales. Note that we are going to carry out a simplified version of this analysis with models for only two parts of the clade - for a more complete version, you can go here.

Loading the data

Load the whale data that we will analyze.

whales <- read.tree("whaletree.tre")
plot(whales, cex = 0.35)

The models used in RPANDA

The RPANDA models we will use are the following.

lambda.cst <- function(x,y){y}
lambda.var <- function(x,y){y[1]*exp(y[2]*x)}
mu.cst <- function(x,y){y}
mu.var <- function(x,y){y[1]*exp(y[2]*x)}

These models look like this: rpanda

Basically, we can fit constant rate birth-death models, or one or both of those can vary through time. We can fit each of these four scenarios to all or part of a tree.

Extract the dolphins from the tree

For this exercise, we will extract the dolphins (family Delphinidae) from the tree and fit separate rates to that section and to the rest of the whales.

This is why we are justified in doing that:

wm<-medusa(whales)

## Appropriate  aicc-threshold for a tree of 87 tips is: 4.397737.
## 
## Step 1: lnLik=-277.6942; aicc=557.4119; model=yule
## Step 2: lnLik=-266.7389; aicc=539.6197; shift at node 141; model=yule; cut=stem; # shifts=1
## 
## No significant increase in aicc score. Disregarding subsequent piecewise models.
## 
##   Model.ID Shift.Node Cut.At Model Ln.Lik.part        r epsilon     r.low
## 1        1         88   node  yule   -185.5379 0.076686      NA 0.0576867
## 2        2        141   stem  yule   -81.20095 0.232089      NA 0.1616527
##      r.high
## 1 0.0994802
## 2 0.3204790

wm

## 
## Optimal MEDUSA model for tree with 87 taxa.
## 
##   Model.ID Shift.Node Cut.At Model Ln.Lik.part        r epsilon     r.low
## 1        1         88   node  yule   -185.5379 0.076686      NA 0.0576867
## 2        2        141   stem  yule   -81.20095 0.232089      NA 0.1616527
##      r.high
## 1 0.0994802
## 2 0.3204790
## 
## 95% confidence intervals on parameter values calculated from profile likelihoods

plot(wm)

First find the dolphins using node numbers.

plot(whales, cex = 0.35)
nodelabels(cex = 0.5)

Dephinidae is the group that descends from node 140 in the tree.

delphinidae.tree <- extract.clade(whales, 140)

othercetaceans.tree <- drop.tip(whales,delphinidae.tree$tip.label)
plot(delphinidae.tree, main = "Delphinidae")

plot(othercetaceans.tree, main = "Other cetaceans")

Fit models

First we can fit models to the Delphinidae:

fit.multi.rpanda <- function(tree,par)
    {
        bcstdcst <- fit_bd(tree, max(branching.times(tree)), f.lamb=lambda.cst, f.mu=mu.cst, lamb_par=par[[1]][1],mu_par=par[[1]][2],cst.lamb=TRUE,cst.mu=TRUE,cond="crown",f=87/89,dt=1e-3)
        bvardcst <- fit_bd(tree, max(branching.times(tree)), f.lamb=lambda.var, f.mu=mu.cst, lamb_par=par[[2]][c(1,2)],mu_par=par[[2]][3],expo.lamb=TRUE,cst.mu=TRUE,cond="crown",f=87/89,dt=1e-3)
        bcstdvar <- fit_bd(tree, max(branching.times(tree)), f.lamb=lambda.cst, f.mu=mu.var, lamb_par=par[[3]][1],mu_par=par[[3]][c(2,3)],cst.lamb=TRUE,expo.mu=TRUE,cond="crown",f=87/89,dt=1e-3)
        bvardvar <- fit_bd(tree, max(branching.times(tree)), f.lamb=lambda.var, f.mu=mu.var, lamb_par=par[[4]][c(1,2)],mu_par=par[[4]][c(3,4)],expo.lamb=TRUE,expo.mu=TRUE,cond="crown",f=87/89,dt=1e-3)
        return(list("bcstdcst"=bcstdcst,"bvardcst"=bvardcst,"bcstdvar"=bcstdvar,"bvardvar"=bvardvar))
    }
whales.par <- list(c(0.4,0),c(0.4,-0.05,0),c(0.4,0.1,0.05),c(0.4,-0.05,0.1,0.05))

dolphin_results<- fit.multi.rpanda(delphinidae.tree, whales.par)
other_results <- fit.multi.rpanda(othercetaceans.tree, whales.par)

Now summarize results using aic.

dolp_aic<-numeric(4)
other_aic<-numeric(4)
names(dolp_aic)<-c("bcstdcst", "bvardcst", "bcstdvar", "bvardvar")
names(other_aic)<-names(dolp_aic)
for(i in 1:4) {
  dolp_aic[i]<-dolphin_results[[i]]$aicc
  
  other_aic[i]<-other_results[[i]]$aicc

}

aic.w(dolp_aic)

##  bcstdcst  bvardcst  bcstdvar  bvardvar 
## 0.1814937 0.3620626 0.2666788 0.1897649

aic.w(other_aic)

##  bcstdcst  bvardcst  bcstdvar  bvardvar 
## 0.3995571 0.3413438 0.1293870 0.1297121

These results are completely ambiguous. We likely do not have enough power with such small trees to discriminate among these various hypotheses.

What about the whole tree?

whole_results <- fit.multi.rpanda(whales, whales.par)
whole_aic<-numeric(4)
names(whole_aic)<-c("bcstdcst", "bvardcst", "bcstdvar", "bvardvar")
for(i in 1:4) {
  whole_aic[i]<-whole_results[[i]]$aicc
}

aic.w(whole_aic)

##   bcstdcst   bvardcst   bcstdvar   bvardvar 
## 0.55623992 0.19020578 0.19019717 0.06335714

A little evidence in favor of a constant rate model. Did breaking the tree up improve model fit?

whole_results[[1]]$LH

## [1] -277.6658

dolphin_results[[1]]$LH + other_results[[1]]$LH

## [1] -262.4696

dolphin_results[[2]]$LH + other_results[[1]]$LH

## [1] -260.5794

whole_results[[1]]$aic

## [1] 559.4745

splitLH<-dolphin_results[[1]]$LH + other_results[[1]]$LH
splitAIC <- 2 * 4 - 2 * splitLH
splitAIC

## [1] 532.9392

aic.w(c(whole_results[[1]]$aic, splitAIC))

## [1] 0.00000173 0.99999827

Yes, breaking the tree improved model fit - but trees were too small to differentiate among the various RPANDA models.