A careful reader of the book reports that in footnote 13 (p. 152) we wrote fitContinuous when we should have written fitDiscrete.
This footnote should actually read:

The main difference between fitDiscrete and fitMk is the assumption that each function makes about the state at the root node of the tree. In most cases, this difference will be of little consequence, but occasionally, it can affect the inferred model to a larger extent. We recommend thinking carefully about this assumption rather than taking it for granted.
As an additional addendum to this footnote, the root node assumption can now be equalized using the argument pi="fitzjohn" in fitMk (and other related phytools functions, such as fitMk.parallel, as we’ll show below in Correction 2).
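To make this concrete, here is a minimal sketch of fitting the same model under the two root treatments. (The objects tree and x are hypothetical placeholders for a "phylo" object and a named factor of tip states; the pi="fitzjohn" argument is the one discussed above, while pi="equal" is the fitMk default.)

```r
library(phytools)

## 'tree' and 'x' are hypothetical: a phylogeny and a named factor of
## discrete tip states, as in the squamate example later in this post
fit_equal<-fitMk(tree,x,model="ER")              ## default: flat prior on root state
fit_fj<-fitMk(tree,x,model="ER",pi="fitzjohn")   ## root prior treated as a nuisance

## compare log-likelihoods under the two root assumptions
logLik(fit_equal)
logLik(fit_fj)
```

In most empirical cases these two fits will be very similar, but (as the footnote notes) they need not be identical.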
An astute reader discovered that they were obtaining a different fitted model, with a slightly better likelihood, for the all-rates-different (“ARD”) model with the tree of 119 squamate reptile species and the number of digits in the hindfoot as shown in Chapter 6, section 6.3.3.
Even though (in the end) this model was not particularly well-supported by our data, it does turn out to be the case that there are a number of different fitted models – some similar, but others quite dissimilar one from the other – that have highly similar log-likelihoods, and a few of these are better (by up to about 0.5 log-likelihood units) than the one we reported as our “best-fitting” model in the book!
In retrospect, this is relatively easily explained: we’re fitting a highly complex (30 parameter!) model to a relatively small (119 observation) dataset. It shouldn’t be surprising that the likelihood surface is fairly flat across a variety of different parameter value combinations!
Nonetheless, we can try a couple of different approaches to improve optimization.
First, we’ll simply increase the geiger::fitDiscrete argument control=list(niter). This sets the number of times (“iterations”) that the function will run its optimization to try to find the best solution. niter defaults to 100, which in many cases will more than suffice. Due to the high complexity of the problem, we’ll increase it five-fold over the default (niter=100) to niter=500.
Next, I’ll also try out the relatively new phytools function fitMk.parallel. This function is a parallelized version of the phytools function fitMk; however, it also uses some other optimization tricks that I won’t get into here. (I’m going to use fitMk.parallel by calling fitMk and setting the optional function argument opt.method="optimParallel".)
To start, of course, let’s read in our data from file, just as we did in Chapter 6, section 6.3 of the book.
library(geiger)
## Loading required package: ape
## Loading required package: phytools
## Loading required package: maps
library(phytools)
## read data matrix
sqData<-read.csv("squamate-data.csv",row.names=1)
## read phylogenetic tree
sqTree<-read.nexus("squamate.tre")
## check name matching
chk<-name.check(sqTree,sqData)
## drop tips of tree that are missing from data matrix
sqTree.pruned<-drop.tip(sqTree,chk$tree_not_data)
## drop rows of matrix that are missing from tree
sqData.pruned<-sqData[!(rownames(sqData)%in%
chk$data_not_tree),,drop=FALSE]
## extract discrete trait
toes<-setNames(as.factor(sqData.pruned[,"rear.toes"]),
rownames(sqData.pruned))
Now, let’s run our two optimizations. Before I do, let me mention that the log-likelihood value we reported in the book for our “best-fitting” ARD model was about -109.94 – so we’ll be interested to see how these two alternative optimizations compare to that!
fitARD<-fitDiscrete(sqTree.pruned,toes,model="ARD",
control=list(niter=500))
fitARD
## GEIGER-fitted comparative model of discrete data
## fitted Q matrix:
## 0 1 2 3 4 5
## 0 -1.683849e-18 7.540918e-21 1.731984e-19 7.781702e-23 1.500923e-18 2.108822e-21
## 1 1.315710e-02 -1.315710e-02 2.373944e-23 2.913888e-26 1.085139e-24 2.477939e-27
## 2 1.258307e-02 1.873641e-17 -2.352085e-02 1.093778e-02 3.295470e-20 1.085079e-18
## 3 1.743362e-02 5.192556e-18 2.322546e-02 -4.065908e-02 1.531188e-28 1.301789e-19
## 4 3.325086e-19 3.780470e-03 2.953549e-23 1.204786e-02 -4.962866e-02 3.380033e-02
## 5 1.065070e-20 1.648269e-03 2.368410e-04 2.953897e-20 2.795021e-03 -4.680131e-03
##
## model summary:
## log-likelihood = -109.417002
## AIC = 278.834004
## AICc = 299.970367
## free parameters = 30
##
## Convergence diagnostics:
## optimization iterations = 500
## failed iterations = 0
## number of iterations with same best fit = 1
## frequency of best fit = 0.002
##
## object summary:
## 'lik' -- likelihood function
## 'bnd' -- bounds for likelihood search
## 'res' -- optimization iteration summary
## 'opt' -- maximum likelihood parameter estimates
fitARD.parallel<-fitMk(sqTree.pruned,toes,
model="ARD",pi="fitzjohn",opt.method="optimParallel")
fitARD.parallel
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## 0 1 2 3 4 5
## 0 -0.000003 0.000001 0.000001 0.000000 0.000000 0.000000
## 1 0.013129 -0.013131 0.000000 0.000001 0.000000 0.000000
## 2 0.012523 0.000003 -0.022217 0.009686 0.000002 0.000002
## 3 0.017446 0.000002 0.021999 -0.039452 0.000003 0.000003
## 4 0.000002 0.003536 0.000005 0.012066 -0.049368 0.033760
## 5 0.000000 0.001663 0.000237 0.000000 0.002770 -0.004670
##
## Fitted (or set) value of pi:
## 0 1 2 3 4 5
## 0.000000 0.000000 0.000000 0.000000 0.330107 0.669893
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -109.422074
##
## Optimization method used was "optimParallel"
As noted in Correction 1a., above, we set pi="fitzjohn" so that our root prior probability distribution in fitMk is the same as it was for fitDiscrete in the geiger package.
We can see that even though both of these optimized models fit better than the fitted model we report in the book, they are also not precisely the same as each other – although they differ by only about 1/200 of a log-likelihood unit in this case!
Likewise, fitDiscrete only converged on the reported “best” solution in 1 of 500 optimization iterations, which could give us some pause.
In addition to this analysis, we can also now set fitMk or fitMk.parallel to use random starting values for the transition rates between states in Q. (This could be done before – but only by manually specifying random starting rates using the optional function argument q.init.)
Let’s do precisely this, but with 30 random starts, as follows.
parallel_fits<-replicate(30,
fitMk(sqTree.pruned,toes,model="ARD",pi="fitzjohn",
opt.method="optimParallel",rand_start=TRUE),
simplify=FALSE)
This creates a long list of optimized Mk models. To pull our log-likelihoods out in the form of a vector, I’ll use sapply and the logLik method for the object class as follows.
logL_ard<-sapply(parallel_fits,logLik)
logL_ard
## [1] -109.5718 -109.4174 -109.5238 -109.8295 -111.7919 -109.8036 -109.4189 -109.4963 -109.4183
## [10] -109.4295 -109.5275 -109.4196 -109.6185 -109.4556 -111.5627 -109.4355 -109.5285 -110.1122
## [19] -109.5822 -113.8775 -113.8863 -109.4339 -113.9989 -109.5780 -109.7992 -109.4211 -109.4974
## [28] -109.9741 -109.5290 -109.9505
We can see that most of these values fall within a fairly narrow range – and the majority are better than the log-likelihood that we published for our optimized model in the book, though a handful of runs converged on substantially worse solutions! To see if these small log-likelihood differences correspond to substantive differences in the fitted model, let’s plot our top 15 Q matrices as follows.
rank<-order(logL_ard,decreasing=TRUE)
parallel_fits<-parallel_fits[rank]
par(mfrow=c(3,5))
for(i in 1:15){
plot(parallel_fits[[i]],width=TRUE,color=TRUE,
text=FALSE,cex.traits=1.5)
mtext(paste("log(L) =",round(logLik(parallel_fits[[i]]),
5)),line=-1.5,adj=0.2,cex=1.5)
}