In this tutorial, we're going to cover two different analyses:
1. We're going to estimate the correlation between two binary
characters, as well as between a binary & a continuous character,
assuming a threshold model of character evolution for our discrete
character using both threshBayes in
the phytools package in Rthreshml
in Rphylip; and
2. We're going to reconstruct the ancestral states of a multistate,
naturally ordered character on the tree assuming the threshold model.
If you want to following along, but do not have a fast enough computer to run the MCMC, you can download and load the .Rdata file thresh.Rdata and then run everything except the MCMC.
First, we're going to use Bayesian MCMC method to sample the evolutionary parameters of our model from their joint posterior probability distribution. These parameters include evolutionary rates (σ2s) for each continuous character; and the correlation between our discrete and continuous character. Since this is a binary model with a single threshold - we don't have to worry about the relative positions of the thresholds.
We'll start by simulating a tree & two correlated Brownian variables on the tree.
For illustrative purposes, let's start by using threshBayes
on two
continuous characters & show that the result lines up closely with our MLE of
the correlation.
## first, load phytools library(phytools)
tree <- pbtree(n = 50, scale = 1) r <- 0.75 # simulate using a high correlation V <- matrix(c(1, r, r, 1), 2, 2) X <- sim.corrs(tree, V) ## set some parameters for the MCMC sample <- 500 ngen <- 2e+05 ## in 'real' studies this should be larger burnin <- 0.2 * ngen ## we can start by running our Bayesian MCMC on the original data AA <- threshBayes(tree, X, types = c("cont", "cont"), ngen = ngen, control = list(sample = sample))
## [1] "gen 1000" ## [1] "gen 2000" ## [1] "gen 3000" ## [1] "gen 4000" ## [1] "gen 5000" ## [1] "gen 6000" ## .... ## [1] "gen 200000"
## we can compute our MLE of the correlation V.mle <- phyl.vcv(X, vcv(tree), lambda = 1)$R V.mle[1, 2]/sqrt(V.mle[1, 1] * V.mle[2, 2])
## [1] 0.756
## here is our 'post burnin' mean from the posterior sample mean(AA$par[(burnin/sample + 1):nrow(AA$par), "r"])
## [1] 0.7467
## plot our likelihood profile plot(AA$par[, "logL"], type = "l", xlab = "generation", ylab = "logL")
## plot our posterior density for the correlation plot(density(AA$par[(burnin/sample + 1):nrow(AA$par), "r"], bw = 0.1), xlab = "r", main = "posterior density for r") lines(c(r, r), c(0, 1000), lty = "dashed")
OK, this was just a proof-of-concept exercise. Now let's try the same thing after having converted one of our trait values into a threshold character.
## convert X[,2] to a binary character X[, 2] <- as.numeric(sapply(X[, 2], threshState, thresholds = setNames(c(0, Inf), 0:1))) BB <- threshBayes(tree, X, types = c("cont", "disc"), ngen = ngen, control = list(sample = sample))
## [1] "gen 1000" ## [1] "gen 2000" ## [1] "gen 3000" ## .... ## [1] "gen 200000"
## what is our estimate of the correlation? mean(BB$par[(burnin/sample + 1):nrow(BB$par), "r"])
## [1] 0.7214
## plot our likelihood profile plot(BB$par[, "logL"], type = "l", xlab = "generation", ylab = "logL")
## plot our estimated posterior density for the correlation plot(density(BB$par[(burnin/sample + 1):nrow(BB$par), "r"], bw = 0.1), xlab = "r", main = "posterior density for r") lines(c(r, r), c(0, 1000), lty = "dashed")
Finally, let's analyze two binary characters under the same model. Remember, each time we're losing information about the underlying correlation between liabilities. Nonetheless....
## convert X[,2] to a binary character X[, 1] <- as.numeric(sapply(X[, 1], threshState, thresholds = setNames(c(0, Inf), 0:1))) CC <- threshBayes(tree, X, types = c("disc", "disc"), ngen = ngen, control = list(sample = sample))
## [1] "gen 1000" ## [1] "gen 2000" ## [1] "gen 3000" ## ... ## [1] "gen 200000"
## what is our estimate of the correlation? mean(CC$par[(burnin/sample + 1):nrow(CC$par), "r"])
## [1] 0.5179
## plot our likelihood profile plot(CC$par[, "logL"], type = "l", xlab = "generation", ylab = "logL")
## plot our posterior sample for the correlation plot(density(CC$par[(burnin/sample + 1):nrow(CC$par), "r"], bw = 0.1), xlab = "r", main = "posterior density for r") lines(c(r, r), c(0, 1000), lty = "dashed")
So we can see that we can see that threshBayes
does a
reasonable job at estimating the correlation between binary & continuous
characters under the threshold model.
Finally, in 'real' empirical studies, it is always wise to ensure that you have a 'good' sample from the posterior distribution - for example, by computing the effective sample size of your posterior sample.
## let's use coda to compute effective sample sizes & HPDs require(coda)
## first, analysis A rA <- AA$par[(burnin/sample + 1):nrow(AA$par), "r"] class(rA) <- "mcmc" effectiveSize(rA)
## var1 ## 313.2
HPDinterval(rA)
## lower upper ## var1 0.5998 0.8556 ## attr(,"Probability") ## [1] 0.9502
## second, analysis B rB <- BB$par[(burnin/sample + 1):nrow(AA$par), "r"] class(rB) <- "mcmc" effectiveSize(rB)
## var1 ## 29.75
HPDinterval(rB)
## lower upper ## var1 0.3822 0.9493 ## attr(,"Probability") ## [1] 0.9502
## finally, analysis C rC <- CC$par[(burnin/sample + 1):nrow(AA$par), "r"] class(rC) <- "mcmc" effectiveSize(rC)
## var1 ## 76.72
HPDinterval(rC)
## lower upper ## var1 0.1638 0.8465 ## attr(,"Probability") ## [1] 0.9502
In addition to this, we can also use Rthreshml in the Rphylip package to call THRESHML from the PHYLIP package internally. This is much faster & should produce similar etimates of the evolutionary covariances under most circumstances. It can also compute the covariances between multiple traits.
Since THRESHML is not in the current version of PHYLIP, it is first necessary to download & install Felsenstein's software. I would advise putting the executable file into the same folder as all your other PHYLIP executables, or (if you have not previously installed PHYLIP), put THRESHML into your current working directory.
Now let's try to run Rthreshml on our dataset from before. We'll just use the default conditions & compare our estimated covariances to those obtained by threshBayes.
require(Rphylip)
fit <- Rthreshml(tree, X, types = c("cont", "disc"))
## ## Threshold character Maximum Likelihood method version 3.7a ## ## In this output the first character is continuous, ## and the next character is discrete. ## ## Covariance matrix of continuous character ## and liability of the discrete character ## (the continuous characters are first) ## ## 1 2 ## ## 1 0.18911 0.22566 ## 2 0.22566 1.00000 ## ## ## Transform from independent variables (columns) ## to liabilities or characters (rows) ## ## 1 2 ## ## 1 0.43487 0.00000 ## 2 0.51891 0.85483 ## ## ## For Variance of ## variable its change ## -------- ----------- ## 1 1.058568 ## 2 0.130546 ## ## Transform from liabilities or characters (columns) ## to continuous variables or liabilities (rows) ## ## 1 2 ## ## 1 -0.25122 -0.96793 ## 2 -0.96793 0.25122
fit
## $Covariance_matrix ## [,1] [,2] ## [1,] 0.1891 0.2257 ## [2,] 0.2257 1.0000 ## ## $Transform_indepvar_liab ## [,1] [,2] ## [1,] 0.4349 0.0000 ## [2,] 0.5189 0.8548 ## ## $Var_change ## [1] 1.0586 0.1305 ## ## $Transform_liab_cont ## [,1] [,2] ## [1,] -0.2512 -0.9679 ## [2,] -0.9679 0.2512
We can also do ancestral character reconstruction for discrete characters using the threshold model.
## simulate tree & data n <- 50 ngen <- 1e+05 sample <- 500 tree <- pbtree(n = n, scale = 10) x <- fastBM(tree, sig2 = 1, a = 0.5, internal = TRUE) th <- sapply(x, threshState, thresholds = setNames(c(0, 0.5, 2, Inf), letters[1:4])) mcmc <- ancThresh(tree, th[1:n], ngen = ngen, sequence = letters[1:4], control = list(sample = sample, plot = FALSE))
plotTree(tree, ftype = "off") tiplabels(pie = to.matrix(th[1:n], letters[1:4]), piecol = palette()[1:4], cex = 0.3) nodelabels(pie = mcmc$ace, piecol = palette()[1:4], cex = 0.8) nodelabels(pie = to.matrix(th[1:tree$Nnode + n], letters[1:4]), piecol = palette()[1:4], cex = 0.4)
Theoretically, we can even use this method to identify the relative positions of thresholds:
colMeans(mcmc$par[(0.2 * ngen/sample):(ngen/sample) + 1, letters[1:4]])
## a b c d ## 0.0000 0.7229 2.3610 Inf
par(mfrow = c(2, 1)) par(mar = c(4.1, 5.1, 2.1, 2.1)) plot(density(mcmc$par[(0.2 * ngen/sample):(ngen/sample) + 1, letters[2]], bw = 0.5), xlab = "threshold a->b", main = "") lines(c(0.5, 0.5), c(0, 1000), lty = "dashed") plot(density(mcmc$par[(0.2 * ngen/sample):(ngen/sample) + 1, letters[3]], bw = 0.5), xlab = "threshold b->c", main = "") lines(c(2, 2), c(0, 1000), lty = "dashed")
Save the workspace:
save.image(file = "thresh.Rdata")
That's it!
Written by Liam J. Revell. Last updated May 27, 2014