A careful reader of the book reports that in footnote 13 (p. 152) we wrote `fitContinuous` when we should have written `fitDiscrete`.

This footnote should *actually* read:

The main difference between `fitDiscrete` and `fitMk` is the assumption that each function makes about the state at the root node of the tree. In most cases, this difference will be of little consequence, but occasionally it can affect the inferred model to a larger extent. We recommend thinking carefully about this assumption rather than taking it for granted.

As an additional addendum to this footnote, the root node assumption can now be equalized using the argument `pi="fitzjohn"` in `fitMk` (and other related *phytools* functions, such as `fitMk.parallel`, as we’ll show below in Correction 2).
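As a brief sketch of what this looks like in practice (using hypothetical stand-in objects: `tree` for a `"phylo"` object and `x` for a named factor of tip states), the root prior in `fitMk` is controlled via its `pi` argument:

```r
library(phytools)
## `tree` and `x` are hypothetical stand-ins for your own data:
## a "phylo" object and a named factor of tip states, respectively
fit_fj<-fitMk(tree,x,model="ARD",pi="fitzjohn") ## FitzJohn root prior
fit_eq<-fitMk(tree,x,model="ARD",pi="equal")    ## flat root prior
```

Setting `pi="fitzjohn"` matches the root treatment used by `fitDiscrete`, whereas `pi="equal"` is the `fitMk` default.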

An astute reader discovered that they were obtaining a different fitted model, with a slightly better likelihood, for the all-rates-different (“ARD”) model with the tree of 119 squamate reptile species and the number of digits in the hindfoot as shown in Chapter 6, section 6.3.3.

Even though (in the end) this model was not particularly
well-supported by our data, it does turn out to be the case that there
are a number of different fitted models – some similar, but others quite
dissimilar one from the other – that have *highly* similar
log-likelihoods, and a few of these are better (by up to about 0.5
log-likelihood units) than the one we reported as our “best-fitting”
model in the book!

In retrospect, this is relatively easily explained: we’re fitting a highly complex (30 parameter!) model to a relatively small (119 observation) dataset. It shouldn’t be surprising that the likelihood surface is fairly flat across a variety of different parameter value combinations!

Nonetheless, we can try a couple of different approaches to improve optimization.

First, we’ll simply increase the `geiger::fitDiscrete` argument `control=list(niter)`. This sets the number of optimization iterations that the function will run to try to find the best solution. `niter` defaults to `100`, which in many cases will more than suffice. Due to the high complexity of the problem, we’ll increase it five-fold over the default (`niter=100`) to `niter=500`.

Next, I’ll also try out the relatively new *phytools* function `fitMk.parallel`. This function is a parallelized version of the *phytools* function `fitMk`; however, it also uses some other optimization tricks that I won’t get into here. (I’m going to use `fitMk.parallel` by calling `fitMk` and setting the optional function argument `opt.method="optimParallel"`.)

To start, of course, let’s read in our data from file, just as we did in Chapter 6, section 6.3 of the book.

```
library(geiger)
## Loading required package: ape
## Loading required package: phytools
## Loading required package: maps
```

```
library(phytools)
## read data matrix
sqData<-read.csv("squamate-data.csv",row.names=1)
## read phylogenetic tree
sqTree<-read.nexus("squamate.tre")
## check name matching
chk<-name.check(sqTree,sqData)
## drop tips of tree that are missing from data matrix
sqTree.pruned<-drop.tip(sqTree,chk$tree_not_data)
## drop rows of matrix that are missing from tree
sqData.pruned<-sqData[!(rownames(sqData)%in%
chk$data_not_tree),,drop=FALSE]
## extract discrete trait
toes<-setNames(as.factor(sqData.pruned[,"rear.toes"]),
rownames(sqData.pruned))
```

Now, let’s run our two optimizations. Before I do, let me mention that the log-likelihood value we reported in the book for our “best-fitting” ARD model was about -109.94 – so we’ll be interested to see how these two alternative optimizations compare to that!

```
## fit the ARD model using geiger's fitDiscrete with 500
## optimization iterations (five-fold more than the default)
fitARD<-fitDiscrete(sqTree.pruned,toes,model="ARD",
  control=list(niter=500))
fitARD
```

```
## GEIGER-fitted comparative model of discrete data
## fitted Q matrix:
## 0 1 2 3 4 5
## 0 -1.683849e-18 7.540918e-21 1.731984e-19 7.781702e-23 1.500923e-18 2.108822e-21
## 1 1.315710e-02 -1.315710e-02 2.373944e-23 2.913888e-26 1.085139e-24 2.477939e-27
## 2 1.258307e-02 1.873641e-17 -2.352085e-02 1.093778e-02 3.295470e-20 1.085079e-18
## 3 1.743362e-02 5.192556e-18 2.322546e-02 -4.065908e-02 1.531188e-28 1.301789e-19
## 4 3.325086e-19 3.780470e-03 2.953549e-23 1.204786e-02 -4.962866e-02 3.380033e-02
## 5 1.065070e-20 1.648269e-03 2.368410e-04 2.953897e-20 2.795021e-03 -4.680131e-03
##
## model summary:
## log-likelihood = -109.417002
## AIC = 278.834004
## AICc = 299.970367
## free parameters = 30
##
## Convergence diagnostics:
## optimization iterations = 500
## failed iterations = 0
## number of iterations with same best fit = 1
## frequency of best fit = 0.002
##
## object summary:
## 'lik' -- likelihood function
## 'bnd' -- bounds for likelihood search
## 'res' -- optimization iteration summary
## 'opt' -- maximum likelihood parameter estimates
```

```
## fit the ARD model using fitMk, matching fitDiscrete's
## root prior and using parallelized optimization
fitARD.parallel<-fitMk(sqTree.pruned,toes,
  model="ARD",pi="fitzjohn",opt.method="optimParallel")
fitARD.parallel
```

```
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## 0 1 2 3 4 5
## 0 -0.000003 0.000001 0.000001 0.000000 0.000000 0.000000
## 1 0.013129 -0.013131 0.000000 0.000001 0.000000 0.000000
## 2 0.012523 0.000003 -0.022217 0.009686 0.000002 0.000002
## 3 0.017446 0.000002 0.021999 -0.039452 0.000003 0.000003
## 4 0.000002 0.003536 0.000005 0.012066 -0.049368 0.033760
## 5 0.000000 0.001663 0.000237 0.000000 0.002770 -0.004670
##
## Fitted (or set) value of pi:
## 0 1 2 3 4 5
## 0.000000 0.000000 0.000000 0.000000 0.330107 0.669893
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -109.422074
##
## Optimization method used was "optimParallel"
```

As noted in Correction 1a., above, we set `pi="fitzjohn"` so that our root prior probability distribution in `fitMk` is the *same* as it was for `fitDiscrete` in the *geiger* package.

We can see that even though both of these optimized models fit
*better* than the fitted model we report in the book, they are
also not precisely the same as each other – although they differ by only
about 1/200 of a log-likelihood unit in this case!
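To compute that difference directly, here is a quick sketch using the two fitted objects from above. It assumes that geiger’s fitted object stores its log-likelihood in `$opt$lnL` and uses the `logLik` method for `"fitMk"` objects:

```r
## log-likelihood from geiger's fitDiscrete fit
lnL_geiger<-fitARD$opt$lnL
## log-likelihood from phytools' fitMk fit
lnL_phytools<-as.numeric(logLik(fitARD.parallel))
## absolute difference: about 0.005, i.e., roughly
## 1/200 of a log-likelihood unit
abs(lnL_geiger-lnL_phytools)
```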

Likewise, `fitDiscrete` only converged on the reported “best” solution in 1 of 500 optimization iterations, which could give us some pause.

In addition to this analysis, we can also now set `fitMk` or `fitMk.parallel` to use random starting values for the transition rates between states in **Q**. (This could be done before, but only by manually specifying random starting rates using the optional function argument `q.init`.)
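For instance, the manual approach might look something like the following sketch (assuming the pruned tree and data from above; with six states, the ARD model has 6 × 5 = 30 free rates):

```r
## manually supply random starting values for the 30 free rates
## of Q (one per off-diagonal element: 6 states -> 6*5 = 30)
fit_manual<-fitMk(sqTree.pruned,toes,model="ARD",
  pi="fitzjohn",q.init=rexp(n=30,rate=10))
```

The new `rand_start=TRUE` argument simply automates this kind of randomization for us.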

Let’s do precisely this, but with 30 random starts, as follows.

```
## fit the ARD model 30 times, each from random starting values
parallel_fits<-replicate(30,
  fitMk(sqTree.pruned,toes,model="ARD",pi="fitzjohn",
    opt.method="optimParallel",rand_start=TRUE),
  simplify=FALSE)
```

This creates a long list of optimized M*k* models. To pull our log-likelihoods out in the form of a vector, I’ll use `sapply` and the `logLik` method for the object class as follows.

```
logL_ard<-sapply(parallel_fits,logLik)
logL_ard
```

```
## [1] -109.5718 -109.4174 -109.5238 -109.8295 -111.7919 -109.8036 -109.4189 -109.4963 -109.4183
## [10] -109.4295 -109.5275 -109.4196 -109.6185 -109.4556 -111.5627 -109.4355 -109.5285 -110.1122
## [19] -109.5822 -113.8775 -113.8863 -109.4339 -113.9989 -109.5780 -109.7992 -109.4211 -109.4974
## [28] -109.9741 -109.5290 -109.9505
```
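A quick numeric summary of these values, using the `logL_ard` vector just computed, tells us how the random starts compare to the log-likelihood of about -109.94 that we reported in the book:

```r
## spread of log-likelihoods across the 30 random starts
range(logL_ard)  ## roughly -114.0 to -109.4
## how many of the 30 solutions beat the book's reported -109.94?
sum(logL_ard > -109.94)  ## 22 of 30
```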

We can see that most of these values fall within a very small range – and the majority of them are better than the log-likelihood that we published for our optimized model in the book! To see if these small log-likelihood differences correspond to substantive differences in the fitted model, let’s plot our top 15 **Q** matrices as follows.

```
## rank the fitted models from best to worst by log-likelihood
rank<-order(logL_ard,decreasing=TRUE)
parallel_fits<-parallel_fits[rank]
## plot the Q matrices of the top 15 models in a 3 x 5 grid
par(mfrow=c(3,5))
for(i in 1:15){
  plot(parallel_fits[[i]],width=TRUE,color=TRUE,
    text=FALSE,cex.traits=1.5)
  mtext(paste("log(L) =",round(logLik(parallel_fits[[i]]),
    5)),line=-1.5,adj=0.2,cex=1.5)
}
```