Ancestral state reconstruction in R

For better or for worse, the estimation of phenotypic trait values for ancestral nodes in the tree continues to be an important goal in phylogenetic comparative biology.

In this tutorial, I focus on the following five topics:

1. Estimating the ancestral states of continuous characters.
2. Visualizing continuous character ancestral states for one or multiple traits.
3. Estimating ancestral character states for discrete characters under a continuous-time Markov chain.
4. Simulating stochastic character histories for a discrete character (stochastic character mapping).
5. Simultaneously plotting discrete & continuous character reconstructions.

Estimating the ancestral states of continuous characters

To estimate the states for a continuously valued character at ancestral nodes in the tree, we need to maximize the states at the internal nodes of tree which we expect to have a multivariate normal distribution.

This is relatively simple in principle. For instance, in the code below - I have taken advantage of the functions of "phytools" and another R package called "mnormt" to compute the joint likelihood of our Brownian rate (σ²)** and our states at all the internal nodes of the tree.

(**Note that the fitted Brownian rate is not the same as the MLE of σ² conditioning only on the states for the tips of the tree. The rate estimated here will be too low because it treats the states at internal nodes as known, and then conditions on these states.)

## set seed for reproducibility
set.seed(100)
## load phytools
library(phytools)
## simulate a tree & some data
tree <- pbtree(n = 26, scale = 1)
tree$tip.label <- LETTERS[26:1]
x <- fastBM(tree)
## let's try & write down the likelihood function we want
C <- vcvPhylo(tree)
pp <- rep(1, tree$Nnode + 1)
lik <- function(pp, C, x) -sum(dmnorm(c(x, pp[3:length(pp)]), rep(pp[2], nrow(C)), 
    pp[1] * C, log = TRUE))
tt <- setNames(c(1, rep(mean(x), tree$Nnode)), c("sig2", 1:tree$Nnode + length(tree$tip)))
RR <- optim(tt, lik, C = C, x = x, method = "L-BFGS-B", lower = c(1e-06, rep(-Inf, 
    tree$Nnode)))
print(RR)

## $par
##      sig2        27        28        29        30        31        32 
##  0.000001  0.360920  1.677493  1.472784  0.117096  0.051251  0.043723 
##        33        34        35        36        37        38        39 
##  0.067812 -0.063631 -1.014297 -1.141446 -0.159674 -0.271753 -0.559683 
##        40        41        42        43        44        45        46 
## -0.115334 -0.189465  0.202020  0.214177  0.435215  0.833868  1.109123 
##        47        48        49        50        51 
## -0.056934 -0.374485 -0.498910 -0.372938  0.116875 
## 
## $value
## [1] 18893643
## 
## $counts
## function gradient 
##       13       13 
## 
## $convergence
## [1] 0
## 
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

The problem with this method is that it will generally be inefficient at finding the ML solution as the number of nodes in our tree increases, and will even often fail to find the MLE. Luckily some R packages have more efficient methods for finding the same MLEs of ancestral nodes. Here is an example, and we can use it to "truth" our optimization, shown above:

aa <- fastAnc(tree, x)
plot(aa, RR$par[2:length(RR$par)], xlab = "fastAnc", ylab = "custom")

We can also use fastAnc to get 95% CIs on ancestral state values, given our model - as follows:

AA <- fastAnc(tree, x, CI = TRUE)
print(AA)

## $ace
##         27         28         29         30         31         32 
##  0.4038429  1.5060325  1.5131013  0.1696467  0.0092989 -0.0009378 
##         33         34         35         36         37         38 
##  0.0265075 -0.2029285 -0.6334198 -0.7574224 -0.1916646 -0.2394627 
##         39         40         41         42         43         44 
## -0.3508794 -0.0562375 -0.1880745  0.2740616  0.2787586  0.4038396 
##         45         46         47         48         49         50 
##  0.7171769  0.9055803 -0.0958945 -0.3002784 -0.5135732 -0.3683351 
##         51 
##  0.1456773 
## 
## $CI95
##       [,1]     [,2]
## 27 -0.6686  1.47629
## 28  1.2191  1.79292
## 29  1.2925  1.73371
## 30 -0.6300  0.96929
## 31 -0.5995  0.61814
## 32 -0.5759  0.57399
## 33 -0.5412  0.59421
## 34 -0.8298  0.42399
## 35 -1.1754 -0.09142
## 36 -1.2020 -0.31288
## 37 -0.7036  0.32023
## 38 -0.7401  0.26118
## 39 -0.7633  0.06153
## 40 -0.5202  0.40770
## 41 -0.5368  0.16066
## 42 -0.3209  0.86905
## 43 -0.3389  0.89645
## 44 -0.2178  1.02550
## 45  0.1164  1.31797
## 46  0.3304  1.48079
## 47 -0.7395  0.54768
## 48 -1.0083  0.40776
## 49 -0.6071 -0.42006
## 50 -0.4589 -0.27776
## 51 -0.7051  0.99646

These CIs could theoretically be used to test hypotheses about ancestral state estimates for internal nodes.

Visualizing continuous character ancestral states for one or multiple traits

There are some convenient ways to visualize ancestral state reconstructions on the phylogeny.

The simplest type of ancestral character visualization is a method called the traitgram. In this method, we project a tree into a space defined by time on one axis (say, the abscissa) and our phenotypic trait of interest on the other.

This visualization is so simple that it is easy to recreate it ourselves. Let's do so using the data from before:

## first estimate ancestral states
aa <- fastAnc(tree, x)
## now put these data into a matrix with the tip states
YY <- matrix(NA, nrow(tree$edge), ncol(tree$edge))
aa <- c(x[tree$tip.label], aa)
names(aa)[1:length(tree$tip)] <- 1:length(tree$tip)
YY[] <- aa[as.character(tree$edge)]
XX <- nodeHeights(tree)
plot(XX[1, 1], YY[1, 1], xlim = range(XX) + c(0, 0.1 * max(XX)), ylim = range(YY), 
    xlab = "time", ylab = "phenotype")
invisible(sapply(1:nrow(XX), function(i, x, y) lines(x[i, ], y[i, ]), x = XX, 
    y = YY))
text(x = max(XX), y = YY[sapply(1:length(tree$tip.label), function(x, y) which(x == 
    y), y = tree$edge[, 2]), 2], tree$tip.label, pos = 4)

This is not super pretty, and we can do much better with canned routines that also allow us to automatically space the labels of tip taxa on the tree:

phenogram(tree, x, spread.labels = TRUE)

This plot throws out all the information that we have about uncertainty in reconstructed ancestral trait values. "phytools" has a function which overlays a 95% confidence limit using transparency. Here is a demo of what that looks like:

fancyTree(tree, type = "phenogram95", x = x)

We can also project a tree into a two-trait morphospace. The general approach for this is simple, but "phytools" has canned routines to generate this phylomorphospace visualization:

## simulate data with a modest (0.5) correlation
XY <- sim.corrs(tree, vcv = matrix(c(1, 0.5, 0.5, 1), 2, 2))
colnames(XY) <- LETTERS[1:2]
## project into morphospace
phylomorphospace(tree, XY, label = "horizontal")

The final type of visualization that we can use with ancestral character reconstruction for continuous traits uses the function contMap in phytools which maps the observed and reconstructed phenotypic trait values onto the tree using a color gradient.

## create contMap
XX <- contMap(tree, x)

plot(XX, type = "fan")

One last thing for visualizing ancestral state values for continuous traits. "phytools now has a function which can be used to create a "phylogenetic scattergram" - containing contMap plots on the diagonal and bivariate projections of the tree into morphospace on off-diagonal positions. Here is a demo of this function:

## simulate a third character
z <- fastBM(tree, sig2 = 2)
XYZ <- cbind(XY, z)
fancyTree(tree, X = XYZ, type = "scattergram")

Estimating ancestral character states for discrete characters under a continuous-time Markov chain

Discrete character data - particular whose evolution is modeled by a continuous time Markov chain (i.e., a non-QG model) - are not the focus of this course. However it is nonetheless useful to know how to do this properly in R.

"phytools" has a function for ancestral character estimation under a couple of different models that uses the re-rooting method of Yang (1995). For other models of evolutionary change not covered by this function, users should try Rich Fitzjohn's diversitree package.

## simulate discrete character data
Q <- matrix(c(-2, 1, 1, 1, -2, 1, 1, 1, -2), 3, 3)
colnames(Q) <- rownames(Q) <- letters[1:3]
x <- sim.history(tree, Q, anc = "a")$states
print(x)

##   Z   Y   X   W   V   U   T   S   R   Q   P   O   N   M   L   K   J   I 
## "b" "b" "b" "a" "a" "c" "c" "a" "a" "a" "a" "a" "c" "c" "a" "b" "b" "b" 
##   H   G   F   E   D   C   B   A 
## "a" "b" "b" "b" "b" "b" "c" "a"

## estimate ancestral states under a ER model
fitER <- rerootingMethod(tree, x, model = "ER")
print(fitER)

## $loglik
## [1] -21
## 
## $Q
##         a       b       c
## a -1.7772  0.8886  0.8886
## b  0.8886 -1.7772  0.8886
## c  0.8886  0.8886 -1.7772
## 
## $marginal.anc
##            a         b         c
## 27 3.506e-01 0.4098640 2.396e-01
## 28 7.889e-04 0.9984445 7.665e-04
## 29 3.833e-05 0.9999241 3.758e-05
## 30 3.965e-01 0.4042201 1.993e-01
## 31 4.081e-01 0.4728409 1.190e-01
## 32 4.192e-01 0.4703570 1.105e-01
## 33 4.482e-01 0.4365387 1.153e-01
## 34 7.674e-01 0.1481032 8.449e-02
## 35 8.773e-01 0.0315238 9.117e-02
## 36 8.228e-01 0.0206660 1.565e-01
## 37 9.437e-01 0.0167132 3.956e-02
## 38 9.267e-01 0.0157189 5.755e-02
## 39 9.927e-01 0.0023788 4.894e-03
## 40 9.909e-01 0.0033804 5.673e-03
## 41 9.989e-01 0.0004878 5.868e-04
## 42 3.361e-01 0.4845388 1.793e-01
## 43 3.439e-01 0.2750909 3.810e-01
## 44 1.811e-01 0.1502825 6.686e-01
## 45 6.565e-02 0.8908456 4.350e-02
## 46 3.207e-02 0.9445953 2.334e-02
## 47 3.854e-01 0.5102367 1.043e-01
## 48 4.019e-01 0.4955680 1.026e-01
## 49 1.772e-05 0.9999707 1.158e-05
## 50 1.283e-05 0.9999766 1.057e-05
## 51 4.243e-01 0.2565491 3.192e-01

plotTree(tree, setEnv = TRUE, offset = 0.5)

## setEnv=TRUE is experimental. please be patient with bugs

nodelabels(node = as.numeric(rownames(fitER$marginal.anc)), pie = fitER$marginal.anc, 
    piecol = c("blue", "red", "yellow"), cex = 0.6)
tiplabels(pie = to.matrix(x, sort(unique(x))), piecol = c("blue", "red", "yellow"), 
    cex = 0.3)

Simulating stochastic character histories for a discrete character (stochastic character mapping)

An alternative approach to the one outline above is to use an MCMC approach to sample character histories from their posterior probability distribution. This is called stochastic character mapping (Huelsenbeck et al. 2003). The model is the same but in this case we get a sample of unambiguous histories for our discrete character's evolution on the tree - rather than a probability distribution for the character at nodes.

For instance, given the data simulated above - we can generate the stochastic character map as follows:

## simulate single stochastic character map using empirical Bayes method
mtree <- make.simmap(tree, x, model = "ER")

## make.simmap is sampling character histories conditioned on the transition
## matrix Q =

##         a       b       c
## a -1.7772  0.8886  0.8886
## b  0.8886 -1.7772  0.8886
## c  0.8886  0.8886 -1.7772

## (estimated using likelihood);

## and (mean) root node prior probabilities pi =

##      a      b      c 
## 0.3333 0.3333 0.3333

## Done.

cols <- setNames(c("blue", "red", "yellow"), sort(unique(x)))
plotSimmap(mtree, cols, pts = FALSE, lwd = 3)
add.simmap.legend(colors = cols, vertical = FALSE, prompt = FALSE, x = 0, y = 24)

A single stochastic character map does not mean a whole lot in isolation; however in aggregate, we can do quite a bit. For instance, we can estimate the number of changes of each type, the proportion of time spent in each state, and the posterior probabilities that each internal node is in each state, under our model. For example:

## simulate 200 stochastic character maps
mtrees <- make.simmap(tree, x, model = "ER", nsim = 200)

## make.simmap is sampling character histories conditioned on the transition
## matrix Q =

##         a       b       c
## a -1.7772  0.8886  0.8886
## b  0.8886 -1.7772  0.8886
## c  0.8886  0.8886 -1.7772

## (estimated using likelihood);

## and (mean) root node prior probabilities pi =

##      a      b      c 
## 0.3333 0.3333 0.3333

## Done.

XX <- describe.simmap(mtrees, plot = FALSE)

## 200 trees with a mapped discrete character with states:
##  a, b, c 
## 
## trees have 18.785 changes between states on average
## 
## changes are of the following types:
##       a,b   a,c   b,a   b,c  c,a  c,b
## x->y 3.89 4.365 3.115 2.515 2.58 2.32
## 
## mean total time spent in each state is:
##           a     b      c total
## raw  4.0828 3.571 1.8944 9.548
## prop 0.4276 0.374 0.1984 1.000

## now let's plot a random map, and overlay the posterior probabilities
plotSimmap(mtrees[[1]], cols, lwd = 3, pts = F, setEnv = T)

## setEnv=TRUE is experimental. please be patient with bugs

nodelabels(pie = XX$ace, piecol = cols, cex = 0.6)
add.simmap.legend(colors = cols, vertical = FALSE, prompt = FALSE, x = 0, y = 24)

We can also use stochastic mapping to plot the posterior probability that the edges & nodes of the tree are in a binary state. This function is somewhat primitive, so we first have to convert our mapped edge states to 0 & 1.

## convert mapped states
stateb <- mergeMappedStates(mtrees, c("a", "c"), "0")
stateb <- mergeMappedStates(stateb, "b", "1")
## now plot density map
XX <- densityMap(stateb, lwd = 5)

## sorry - this might take a while; please be patient

Simultaneously plotting discrete & continuous character reconstructions

Discrete & continuous character visualizations can be combined in a range of ways. For instance, we can overlay a density map onto a phylomorphospace plot, as follows: