1. Corrections to the text of Chapter 6

a) Footnote 13 (section 6.3)

A careful reader of the book reports that in footnote 13 (p. 152) we wrote fitContinuous when we should have written fitDiscrete.

This footnote should actually read:

The main difference between fitDiscrete and fitMk is the assumption that each function makes about the state at the root node of the tree. In most cases, this difference will be of little consequence, but occasionally, it can affect the inferred model to a larger extent. We recommend thinking carefully about this assumption rather than taking it for granted.

As an additional addendum to this footnote, the root node assumption can now be equalized using the argument pi="fitzjohn" in fitMk (and other related phytools functions, such as fitMk.parallel, as we’ll show below in Correction 2).

2. Better ML solution for ARD model (section 6.3.3)

An astute reader discovered that they were obtaining a different fitted model, with a slightly better likelihood, for the all-rates-different (“ARD”) model with the tree of 119 squamate reptile species and the number of digits in the hindfoot as shown in Chapter 6, section 6.3.3.

Even though (in the end) this model was not particularly well-supported by our data, it does turn out to be the case that there are a number of different fitted models – some similar, but others quite dissimilar one from the other – that have highly similar log-likelihoods, and a few of these are better (by up to about 0.5 log-likelihood units) than the one we reported as our “best-fitting” model in the book!

In retrospect, this is relatively easily explained: we’re fitting a a highly complex (30 parameter!) model to a relatively small (119 observation) dataset. It shouldn’t be surprising that that the likelihood surface is fairly flat in a variety of different parameter value combinations!

Nonetheless, we can try a couple of different approaches to try to improve optimization.

First, we’ll simply increase the geiger::fitDiscrete argument control=list(niter). This indicates the number of times (“iterations”) of optimization that the function will run to try to find the best solution. niter defaults 100, which in many cases will more than suffice. Due to the high complexity of the problem we’ll increase it by five-fold over the default (niter=100) to niter=500.

Next, I’ll also try out the relatively new phytools function fitMk.parallel. This function is a parallelized version of the phytools function fitMk; however, it also uses some other optimization tricks that I won’t get into here. (I’m going to use fitMk.parallel by calling fitMk and setting the optional function argument opt.method="optimParallel".)

To start, of course, let’s read in our data from file, just as we did in Chapter 6, section 6.3 of the book.

library(geiger)
## Loading required package: ape
## Loading required package: phytools
## Loading required package: maps
library(phytools)
## read data matrix
sqData<-read.csv("squamate-data.csv",row.names=1)
## read phylogenetic tree
sqTree<-read.nexus("squamate.tre")
## check name matching
chk<-name.check(sqTree,sqData)
## drop tips of tree that are missing from data matrix
sqTree.pruned<-drop.tip(sqTree,chk$tree_not_data)
## drop rows of matrix that are missing from tree
sqData.pruned<-sqData[!(rownames(sqData)%in%
    chk$data_not_tree),,drop=FALSE]
## extract discrete trait
toes<-setNames(as.factor(sqData.pruned[,"rear.toes"]),
    rownames(sqData.pruned))

Now, let’s run our two optimizations. Before I do, let me mention that the log-likelihood value we reported in the book for our “best-fitting” ARD model in the book was about -109.94 – so we’ll be interested see how these to alternative optimizations compare to that!

fitARD<-fitDiscrete(sqTree.pruned,toes,model="ARD",
    control=list(niter=500))
fitARD
## GEIGER-fitted comparative model of discrete data
##  fitted Q matrix:
##                   0             1             2             3             4             5
##     0 -1.683849e-18  7.540918e-21  1.731984e-19  7.781702e-23  1.500923e-18  2.108822e-21
##     1  1.315710e-02 -1.315710e-02  2.373944e-23  2.913888e-26  1.085139e-24  2.477939e-27
##     2  1.258307e-02  1.873641e-17 -2.352085e-02  1.093778e-02  3.295470e-20  1.085079e-18
##     3  1.743362e-02  5.192556e-18  2.322546e-02 -4.065908e-02  1.531188e-28  1.301789e-19
##     4  3.325086e-19  3.780470e-03  2.953549e-23  1.204786e-02 -4.962866e-02  3.380033e-02
##     5  1.065070e-20  1.648269e-03  2.368410e-04  2.953897e-20  2.795021e-03 -4.680131e-03
## 
##  model summary:
##  log-likelihood = -109.417002
##  AIC = 278.834004
##  AICc = 299.970367
##  free parameters = 30
## 
## Convergence diagnostics:
##  optimization iterations = 500
##  failed iterations = 0
##  number of iterations with same best fit = 1
##  frequency of best fit = 0.002
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates
fitARD.parallel<-fitMk(sqTree.pruned,toes,
    model="ARD",pi="fitzjohn",opt.method="optimParallel")
fitARD.parallel
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           0         1         2         3         4         5
## 0 -0.000003  0.000001  0.000001  0.000000  0.000000  0.000000
## 1  0.013129 -0.013131  0.000000  0.000001  0.000000  0.000000
## 2  0.012523  0.000003 -0.022217  0.009686  0.000002  0.000002
## 3  0.017446  0.000002  0.021999 -0.039452  0.000003  0.000003
## 4  0.000002  0.003536  0.000005  0.012066 -0.049368  0.033760
## 5  0.000000  0.001663  0.000237  0.000000  0.002770 -0.004670
## 
## Fitted (or set) value of pi:
##        0        1        2        3        4        5 
## 0.000000 0.000000 0.000000 0.000000 0.330107 0.669893 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -109.422074 
## 
## Optimization method used was "optimParallel"

As noted in Correction 1a., above, we set pi="fitzjohn" so that our root prior probability distribution in fitMk is the same as it was for fitDiscrete in the geiger package.

We can see that even though both of these optimized models fit better than the fitted model we report in the book, they are also not precisely the same as each other – although they differ by only about 1/200 of a log-likelihood unit in this case!

Likewise, fitDiscrete only converged on the reported “best” solution in 1 of 500 optimization iterations, which could give us some pause.

In addition to this analysis, we can also now set fitMk or fitMk.parallel to use random starting values for the transition rates between states in Q. (This could be done before – but by manually specifying random starting rates using the optional function argument q.init.)

Let’s do precisely this, but with 20 random starts, as follows.

parallel_fits<-replicate(30,
  fitMk(sqTree.pruned,toes,model="ARD",pi="fitzjohn",
  opt.method="optimParallel",rand_start=TRUE),
  simplify=FALSE)

This creates a long list of optimized Mk models. To pull our log-likelihoods out in the form of a vector, I’ll use sapply and the logLik method for the object class as follows.

logL_ard<-sapply(parallel_fits,logLik)
logL_ard
##  [1] -109.5718 -109.4174 -109.5238 -109.8295 -111.7919 -109.8036 -109.4189 -109.4963 -109.4183
## [10] -109.4295 -109.5275 -109.4196 -109.6185 -109.4556 -111.5627 -109.4355 -109.5285 -110.1122
## [19] -109.5822 -113.8775 -113.8863 -109.4339 -113.9989 -109.5780 -109.7992 -109.4211 -109.4974
## [28] -109.9741 -109.5290 -109.9505

We can see that these values span a very small range – but they are all better than the log-likelihood that we published for our optimized model in the book! To see if these small log-likelihood differences correspond to substantive differences in the fitted model, let’s plot our top 15 Q matrices as follows.

rank<-order(logL_ard,decreasing=TRUE)
parallel_fits<-parallel_fits[rank]
par(mfrow=c(3,5))
for(i in 1:15){
  plot(parallel_fits[[i]],width=TRUE,color=TRUE,
    text=FALSE,cex.traits=1.5)
  mtext(paste("log(L) =",round(logLik(parallel_fits[[i]]),
    5)),line=-1.5,adj=0.2,cex=1.5)
}
Top 15 estimated **Q** matrices from 30 random optimization starts using `fitMk.parallel`, sorted by log-likelihood.

Top 15 estimated Q matrices from 30 random optimization starts using fitMk.parallel, sorted by log-likelihood.

Plotting our top 15 here reveals that, even though their log-likelihoods are quite numerically similar, each fitted model is slightly different one from the other.

Readers may have taken note that the best log-likelihood from fitMk.parallel matches that of our fitted Mk model from fitDiscrete with niter=500. Let’s graph both fitted models to verify that the same Q matrix has been obtained in each case.

fitARD.parallel_best<-parallel_fits[[1]]
par(mfrow=c(1,2))
plot(fitARD,color=TRUE,width=TRUE,show.zeros=FALSE,tol=1e-4,
    mar=c(0.1,0.1,2.1,0.1),signif=5,offset=0.03)
mtext(paste("a) \"ARD\" optimized by geiger::fitDiscrete (lnL = ",
    round(logLik(fitARD),3),")",sep=""),
    line=-1,adj=0.2,cex=1.1)
plot(fitARD.parallel_best,color=TRUE,width=TRUE,show.zeros=FALSE,
    tol=1e-4,mar=c(0.1,0.1,2.1,0.1),signif=5,offset=0.03)
mtext(paste("b) \"ARD\" optimized by phytools::fitMk.parallel (lnL = ",
    round(logLik(fitARD.parallel_best),3),")",sep=""),
    line=-1,adj=0.2,cex=1.1)
Comparison of fitted ARD models for squamate digit number data using two different optimization methods.

Comparison of fitted ARD models for squamate digit number data using two different optimization methods.

Close inspection of the two panels reveals that (to quite a high degree of numerical precision, as a matter of fact), our two different estimates are more or less identical. This is good news – suggesting that even these two methods used different optimization routines, they nonetheless converged to identical solutions. (Keeping in mind, of course, that compared to our “directional” evolutionary model, the ARD model was quite poorly supported!)