Here are a few more details about Brownian motion
BMlk is function that calculates the likelihood of BM. There are two steps to follow here .
\[\hat{\sigma}^2 = \frac{[X - E(X)]'C^{-1}[X- E(X)]}{N}\]
Where \(X\) is a vector of the tip values, \(E(X)\) is the expected values at the root, \(C\) is the phylogenetic variance-covariance matrix, and \(N\) is the number of tips. Alsoā¦
\[log(L) = log\left [ \frac{exp\left \{ -\frac{1}{2}[X - E(X)]'V^{-1}[X- E(X)] \right \}}{((2\pi)^N*det(V)} \right ]\]
\(V\) here is called the rate matrix. It is simply the proposed value of \(\sigma^2\) multiplied by the phylogenetic variance-covariance matrix. So, Eq. 2 is saying that the likelihood (or rather the log of the likelihood) is a function of the difference between the tip and root values and the rate matrix, \(V\) (which is simply the vcv matrix multiplied by some value of \(\sigma^2\) that we wish to evaluate).
We use can use Eq 2 to write a function in R that will allow us to compare likelihoods given some tree and and a \(\sigma^2\). Our function will take the following argumentsā¦
require(phytools)
## Loading required package: phytools
## Loading required package: ape
## Loading required package: maps
##
## # maps v3.1: updated 'world': all lakes moved to separate new #
## # 'lakes' database. Type '?world' or 'news(package="maps")'. #
BMlk <- function(C, inv.C, sigmasq, root.state, data) {
N <- length(data); # the number of tips
EX <- rep(root.state, N) # creates a vector of the expected trait value - which under BM is the root state
V <- C * sigmasq; # multiply the entries in C by the BM rate
inv.V <- inv.C * sigmasq ^-1; # do the same for the inverted matrix using the inverse of the rate
lnlNum<- -0.5*(data - EX) %*% inv.V %*% (data - EX)
lnlDen<- log(sqrt((2*pi)^N*det(V)))
L<-lnlNum-lnlDen
return(L);
}
Note that we are going to have to do a little work before we can use our function. We need to pass it a vcv matrix and the inverse of the matrix, we need to pass in a root state value, and we need to pass a value of \(\sigma^2\). Letās try a fast example. We willā¦
# we can check this works for a really simple example - let's simulate a tree and some data and try it out.
require(TreeSim);
## Loading required package: TreeSim
## Loading required package: laser
## Loading required package: geiger
require(phytools);
t <- 10 # total time
n <- 100 # total taxa
b <- (log(n) - log(2))/t
tree <- pbtree(b = b, n = n, t = t)
## simulating with both taxa-stop (n) and time-stop (t) is
## performed via rejection sampling & may be slow
##
## 99 trees rejected before finding a tree
data <- fastBM(tree, sig2 = 0.5);
plotTree(ladderize(tree))
#data
# we'll need the covariance matrix
c <- vcvPhylo(tree, anc.nodes = F);
#c
inv.c <- solve(c);
Lets compare the likelihoods of the generating sigma square and a much lower value
#inv.c
## now we'll compute the likelihood of the true value and a different value ##
BMlk(c, inv.c, sigmasq = 0.5, root.state = 0, data = data); # the true value for sigsq
## [,1]
## [1,] -150.7476
#BMlk(c, inv.c, sigmasq = 0.05, root.state = 0, data = data); #
BMlk(c, inv.c, sigmasq = 0.005, root.state = 0, data = data); #
## [,1]
## [1,] -5899.365
Remember when comparing likelihoods, the higher the number, the better the likelihood (because the likelihood of a hypothesis is proportional to the probability of the data given that the hypothesis is true). Smaller (more negative) log likelihoods are also smaller likelihoods (raise \(e\) to the power of the competing likelihoods to check this) so we can see that our data prefers the the generating \(\sigma^2\) value to the alternative.
We can also look at the likelihood surface around the generating value
vals <- rep(0,100)
sigmas <- 1:100/100
thetas <- 1:100/1000
for(ii in 1:100){
vals[ii] <- BMlk(c, inv.c, sigmasq = sigmas[ii], root.state = sigmas[ii], data = data)
}
vals
## [1] -2913.9403 -1438.7831 -955.7866 -718.5371 -578.7152 -487.1794
## [7] -422.9934 -375.7506 -339.7030 -311.4221 -288.7390 -270.2161
## [13] -254.8642 -241.9808 -231.0539 -221.7016 -213.6338 -206.6262
## [19] -200.5028 -195.1237 -190.3762 -186.1688 -182.4263 -179.0865
## [25] -176.0975 -173.4156 -171.0037 -168.8305 -166.8689 -165.0957
## [31] -163.4909 -162.0369 -160.7186 -159.5225 -158.4370 -157.4516
## [37] -156.5572 -155.7455 -155.0093 -154.3421 -153.7381 -153.1920
## [43] -152.6992 -152.2552 -151.8564 -151.4992 -151.1804 -150.8972
## [49] -150.6469 -150.4271 -150.2357 -150.0705 -149.9299 -149.8120
## [55] -149.7154 -149.6385 -149.5801 -149.5389 -149.5138 -149.5037
## [61] -149.5077 -149.5249 -149.5545 -149.5956 -149.6475 -149.7096
## [67] -149.7812 -149.8618 -149.9508 -150.0476 -150.1519 -150.2631
## [73] -150.3808 -150.5047 -150.6344 -150.7695 -150.9097 -151.0547
## [79] -151.2042 -151.3579 -151.5157 -151.6771 -151.8421 -152.0104
## [85] -152.1818 -152.3562 -152.5332 -152.7128 -152.8948 -153.0791
## [91] -153.2655 -153.4539 -153.6441 -153.8360 -154.0296 -154.2247
## [97] -154.4211 -154.6189 -154.8179 -155.0180
#filled.contour(sigmas, sigmas, vals)
plot(sigmas, vals,type = "l", ylab = "ln(L)")
plot(sigmas,vals, xlim = c(0.2, 0.8), type = "l", ylab = "ln(L)")
# BMlk(c, inv.c, sigmasq = 0.5, root.state = 0, data = data); # the true value for sigsq
# #BMlk(c, inv.c, sigmasq = 0.05, root.state = 0, data = data); #
# BMlk(c, inv.c, sigmasq = 0.49, root.state = 0, data = data); #
# BMlk(c, inv.c, sigmasq = 0.48, root.state = 0, data = data); #
The likelihood surface is quite flat around the generating value!
Another look at contrasts andBM rateā¦
library(phytools)
## simulate a coalescent shaped tree
tree<-rcoal(n=50)
plot(tree,ftype="off")
## Warning in plot.window(...): "ftype" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "ftype" is not a graphical parameter
## Warning in title(...): "ftype" is not a graphical parameter