Here are a few more details about Brownian motion

BMlk is function that calculates the likelihood of BM. There are two steps to follow here .

  1. We can get the ML estimate of sigma^2 following the equation from O’Meara 2006.

Eq. 1

\[\hat{\sigma}^2 = \frac{[X - E(X)]'C^{-1}[X- E(X)]}{N}\]

Where \(X\) is a vector of the tip values, \(E(X)\) is the expected values at the root, \(C\) is the phylogenetic variance-covariance matrix, and \(N\) is the number of tips. Also…

  1. We can get the likelihood of any value of \(\sigma^2\) by using equation 1:

Eq. 2

\[log(L) = log\left [ \frac{exp\left \{ -\frac{1}{2}[X - E(X)]'V^{-1}[X- E(X)] \right \}}{((2\pi)^N*det(V)} \right ]\]

\(V\) here is called the rate matrix. It is simply the proposed value of \(\sigma^2\) multiplied by the phylogenetic variance-covariance matrix. So, Eq. 2 is saying that the likelihood (or rather the log of the likelihood) is a function of the difference between the tip and root values and the rate matrix, \(V\) (which is simply the vcv matrix multiplied by some value of \(\sigma^2\) that we wish to evaluate).

We use can use Eq 2 to write a function in R that will allow us to compare likelihoods given some tree and and a \(\sigma^2\). Our function will take the following arguments…

require(phytools)
## Loading required package: phytools
## Loading required package: ape
## Loading required package: maps
## 
##  # maps v3.1: updated 'world': all lakes moved to separate new #
##  # 'lakes' database. Type '?world' or 'news(package="maps")'.  #
BMlk <- function(C, inv.C, sigmasq, root.state, data) {
  
    N <- length(data); # the number of tips
    EX <- rep(root.state, N) # creates a vector of the expected trait value - which under BM is the root state
    V <- C * sigmasq; # multiply the entries in C by the BM rate
    inv.V <- inv.C * sigmasq ^-1; # do the same for the inverted matrix using the inverse of the rate
        
    lnlNum<- -0.5*(data - EX) %*% inv.V %*% (data - EX)
    lnlDen<- log(sqrt((2*pi)^N*det(V)))
    L<-lnlNum-lnlDen
    return(L);
}

Note that we are going to have to do a little work before we can use our function. We need to pass it a vcv matrix and the inverse of the matrix, we need to pass in a root state value, and we need to pass a value of \(\sigma^2\). Let’s try a fast example. We will…

  1. simulate a 5 taxon tree
  2. simulate BM on that tree with a known \(\sigma^2\) of 0.5
  3. compare the likelihood of the generating \(\sigma^2\) to a smaller \(\sigma^2\) of 0.005.
# we can check this works for a really simple example - let's simulate a tree and some data and try it out.
require(TreeSim);
## Loading required package: TreeSim
## Loading required package: laser
## Loading required package: geiger
require(phytools);


t <- 10  # total time
n <- 100  # total taxa
b <- (log(n) - log(2))/t
tree <- pbtree(b = b, n = n, t = t)
## simulating with both taxa-stop (n) and time-stop (t) is
## performed via rejection sampling & may be slow
## 
##   99 trees rejected before finding a tree
data <- fastBM(tree, sig2 = 0.5);
plotTree(ladderize(tree))

#data
# we'll need the covariance matrix
c <- vcvPhylo(tree, anc.nodes = F);
#c
inv.c <- solve(c);

Lets compare the likelihoods of the generating sigma square and a much lower value

#inv.c

## now we'll compute the likelihood of the true value and a different value ##

BMlk(c, inv.c, sigmasq = 0.5, root.state = 0, data  = data); # the true value for sigsq
##           [,1]
## [1,] -150.7476
#BMlk(c, inv.c, sigmasq = 0.05, root.state = 0, data  = data); # 
BMlk(c, inv.c, sigmasq = 0.005, root.state = 0, data  = data); # 
##           [,1]
## [1,] -5899.365

Remember when comparing likelihoods, the higher the number, the better the likelihood (because the likelihood of a hypothesis is proportional to the probability of the data given that the hypothesis is true). Smaller (more negative) log likelihoods are also smaller likelihoods (raise \(e\) to the power of the competing likelihoods to check this) so we can see that our data prefers the the generating \(\sigma^2\) value to the alternative.

We can also look at the likelihood surface around the generating value

vals <- rep(0,100)
sigmas <- 1:100/100
thetas <- 1:100/1000
for(ii in 1:100){
  
vals[ii] <- BMlk(c, inv.c, sigmasq = sigmas[ii], root.state = sigmas[ii], data  = data)
} 
vals
##   [1] -2913.9403 -1438.7831  -955.7866  -718.5371  -578.7152  -487.1794
##   [7]  -422.9934  -375.7506  -339.7030  -311.4221  -288.7390  -270.2161
##  [13]  -254.8642  -241.9808  -231.0539  -221.7016  -213.6338  -206.6262
##  [19]  -200.5028  -195.1237  -190.3762  -186.1688  -182.4263  -179.0865
##  [25]  -176.0975  -173.4156  -171.0037  -168.8305  -166.8689  -165.0957
##  [31]  -163.4909  -162.0369  -160.7186  -159.5225  -158.4370  -157.4516
##  [37]  -156.5572  -155.7455  -155.0093  -154.3421  -153.7381  -153.1920
##  [43]  -152.6992  -152.2552  -151.8564  -151.4992  -151.1804  -150.8972
##  [49]  -150.6469  -150.4271  -150.2357  -150.0705  -149.9299  -149.8120
##  [55]  -149.7154  -149.6385  -149.5801  -149.5389  -149.5138  -149.5037
##  [61]  -149.5077  -149.5249  -149.5545  -149.5956  -149.6475  -149.7096
##  [67]  -149.7812  -149.8618  -149.9508  -150.0476  -150.1519  -150.2631
##  [73]  -150.3808  -150.5047  -150.6344  -150.7695  -150.9097  -151.0547
##  [79]  -151.2042  -151.3579  -151.5157  -151.6771  -151.8421  -152.0104
##  [85]  -152.1818  -152.3562  -152.5332  -152.7128  -152.8948  -153.0791
##  [91]  -153.2655  -153.4539  -153.6441  -153.8360  -154.0296  -154.2247
##  [97]  -154.4211  -154.6189  -154.8179  -155.0180
#filled.contour(sigmas, sigmas, vals)
plot(sigmas, vals,type = "l", ylab = "ln(L)")

plot(sigmas,vals, xlim = c(0.2, 0.8), type = "l", ylab = "ln(L)")

# BMlk(c, inv.c, sigmasq = 0.5, root.state = 0, data  = data); # the true value for sigsq
# #BMlk(c, inv.c, sigmasq = 0.05, root.state = 0, data  = data); # 
# BMlk(c, inv.c, sigmasq = 0.49, root.state = 0, data  = data); # 
# BMlk(c, inv.c, sigmasq = 0.48, root.state = 0, data  = data); # 

The likelihood surface is quite flat around the generating value!

Another look at contrasts andBM rate…

library(phytools)
## simulate a coalescent shaped tree
tree<-rcoal(n=50)
plot(tree,ftype="off")
## Warning in plot.window(...): "ftype" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "ftype" is not a graphical parameter
## Warning in title(...): "ftype" is not a graphical parameter