Exercise 8: Ancestral state reconstruction for discrete characters

This tutorial focuses on the estimation of ancestral character states for discretely valued traits using a continuous-time Markov chain model commonly known as the Mk model.

"phytools" has a function for ancestral character estimation under a couple of different models that uses the re-rooting method of Yang (1995). For other models of evolutionary change not covered by this function, users should try Rich Fitzjohn's diversitree package, or the ace function in ape. In this tutorial we will primarily use ace.

We will use a tree & dataset of feeding mode in elopomorph eels.

The data & tree are in two files as follows:

  1. elopomorph.tre
  2. elopomorph.csv

Now, let's load packages as well as the data & tree:

## first load packages
require(phytools)
## read data
X<-read.csv("elopomorph.csv",row.names=1)
feed.mode<-setNames(X[,1],rownames(X))
feed.mode
##                 Albula_vulpes             Anguilla_anguilla 
##                       suction                       suction 
##              Anguilla_bicolor             Anguilla_japonica 
##                       suction                       suction 
##             Anguilla_rostrata                Ariosoma_anago 
##                       suction                       suction 
##           Ariosoma_balearicum           Ariosoma_shiroanago 
##                       suction                       suction 
##        Bathyuroconger_vicinus   Brachysomophis_crocodilinus 
##                          bite                          bite 
##              Conger_japonicus              Conger_myriaster 
##                       suction                       suction 
##               Conger_verreaxi                Conger_wilsoni 
##                       suction                       suction 
##        Congresox_talabonoides            Cynoponticus_ferox 
##                       suction                          bite 
##            Dysomma_anguillare                  Elops_saurus 
##                          bite                       suction 
##         Facciolella_gilbertii          Gavialiceps_taeniola 
##                          bite                          bite 
##         Gnathophis_longicauda          Gorgasia_taiwanensis 
##                       suction                       suction 
##         Gymnothorax_castaneus   Gymnothorax_flavimarginatus 
##                          bite                          bite 
##            Gymnothorax_kidako           Gymnothorax_moringa 
##                          bite                          bite 
## Gymnothorax_pseudothyrsoideus       Gymnothorax_reticularis 
##                          bite                          bite 
##            Heteroconger_hassi          Ichthyapus_ophioneus 
##                       suction                          bite 
##      Kaupichthys_hyoproroides          Kaupichthys_nuchalis 
##                          bite                          bite 
##          Megalops_cyprinoides             Moringua_edwardsi 
##                       suction                          bite 
##             Moringua_javanica              Muraenesox_bagio 
##                          bite                          bite 
##           Muraenesox_cinereus          Myrichthys_breviceps 
##                          bite                       suction 
##          Myrichthys_maculosus         Myrichthys_magnificus 
##                       suction                       suction 
##                Myrophis_vafer        Nemichthys_scolopaceus 
##                          bite                          bite 
##          Nettastoma_melanurum        Ophichthus_serpentinus 
##                          bite                       suction 
##          Ophichthus_zophochir        Oxyconger_leptognathus 
##                       suction                          bite 
## Parabathymyrus_macrophthalmus           Paraconger_notialis 
##                       suction                       suction 
##      Pisodonophis_cancrivorus          Poeciloconger_kapala 
##                          bite                       suction 
##         Rhinomuraena_quaesita          Rhynchoconger_flavus 
##                          bite                       suction 
##        Saurenchelys_fierasfer      Scolecenchelys_breviceps 
##                          bite                       suction 
##            Scuticaria_tigrina             Serrivomer_beanii 
##                          bite                          bite 
##             Serrivomer_sector        Simenchelys_parasitica 
##                          bite                       suction 
##            Uroconger_lepturus      Uropterygius_micropterus 
##                       suction                          bite 
##          Venefica_proboscidea 
##                          bite 
## Levels: bite suction
## read tree
eel.tree<-read.tree("elopomorph.tre")
eel.tree
## 
## Phylogenetic tree with 61 tips and 60 internal nodes.
## 
## Tip labels:
##  Moringua_edwardsi, Kaupichthys_nuchalis, Gorgasia_taiwanensis, Heteroconger_hassi, Venefica_proboscidea, Anguilla_rostrata, ...
## 
## Rooted; includes branch lengths.

In visual format, these are the data that we have:

plotTree(eel.tree,type="fan",fsize=0.7,ftype="i",lwd=1)
cols<-setNames(c("red","blue"),levels(feed.mode))
tiplabels(pie=to.matrix(feed.mode[eel.tree$tip.label],levels(feed.mode)),
    piecol=cols,cex=0.3)
add.simmap.legend(colors=cols,prompt=FALSE,x=0.9*par()$usr[1],
    y=0.8*par()$usr[3],fsize=0.8)

plot of chunk unnamed-chunk-2

Next, let's fit a single-rate model & reconstruct ancestral states at internal nodes in the tree.

## estimate ancestral states under a ER model
fitER<-ace(feed.mode,eel.tree,model="ER",type="discrete")
fitER
## 
##     Ancestral Character Estimation
## 
## Call: ace(x = feed.mode, phy = eel.tree, type = "discrete", model = "ER")
## 
##     Log-likelihood: -36.33992 
## 
## Rate index matrix:
##         bite suction
## bite       .       1
## suction    1       .
## 
## Parameter estimates:
##  rate index estimate std-err
##           1   0.0158  0.0045
## 
## Scaled likelihoods at the root (type '...$lik.anc' to get them for all nodes):
##      bite   suction 
## 0.5480037 0.4519963
fitER$lik.anc
##               bite      suction
##  [1,] 0.5480037210 0.4519962790
##  [2,] 0.6242342545 0.3757657455
##  [3,] 0.6560898907 0.3439101093
##  [4,] 0.6904137188 0.3095862812
##  [5,] 0.7066485827 0.2933514173
##  [6,] 0.7158097958 0.2841902042
##  [7,] 0.7151789283 0.2848210717
##  [8,] 0.0005280939 0.9994719061
##  [9,] 0.7709017973 0.2290982027
## [10,] 0.7691863614 0.2308136386
## [11,] 0.6440479559 0.3559520441
## [12,] 0.0442832132 0.9557167868
## [13,] 0.0037653340 0.9962346660
## [14,] 0.0172004443 0.9827995557
## [15,] 0.8217944284 0.1782055716
## [16,] 0.8569460714 0.1430539286
## [17,] 0.7393039669 0.2606960331
## [18,] 0.7174480970 0.2825519030
## [19,] 0.6974032138 0.3025967862
## [20,] 0.7051615816 0.2948384184
## [21,] 0.7765900337 0.2234099663
## [22,] 0.9679563345 0.0320436655
## [23,] 0.7763459919 0.2236540081
## [24,] 0.7355606936 0.2644393064
## [25,] 0.5593038528 0.4406961472
## [26,] 0.5493603890 0.4506396110
## [27,] 0.6441467852 0.3558532148
## [28,] 0.5530135261 0.4469864739
## [29,] 0.2983895523 0.7016104477
## [30,] 0.2715130074 0.7284869926
## [31,] 0.0184977763 0.9815022237
## [32,] 0.0425564134 0.9574435866
## [33,] 0.0093509902 0.9906490098
## [34,] 0.5714981738 0.4285018262
## [35,] 0.4984340090 0.5015659910
## [36,] 0.3238194112 0.6761805888
## [37,] 0.0317279739 0.9682720261
## [38,] 0.0115367770 0.9884632230
## [39,] 0.3773398135 0.6226601865
## [40,] 0.5502264549 0.4497735451
## [41,] 0.7532755531 0.2467244469
## [42,] 0.7565228047 0.2434771953
## [43,] 0.8568317500 0.1431682500
## [44,] 0.9665651556 0.0334348444
## [45,] 0.9844171015 0.0155828985
## [46,] 0.9800267702 0.0199732298
## [47,] 0.9939408074 0.0060591926
## [48,] 0.9993040631 0.0006959369
## [49,] 0.9994145342 0.0005854658
## [50,] 0.9990235521 0.0009764479
## [51,] 0.9803292025 0.0196707975
## [52,] 0.8689467273 0.1310532727
## [53,] 0.8126122235 0.1873877765
## [54,] 0.9847413814 0.0152586186
## [55,] 0.2057221815 0.7942778185
## [56,] 0.0438289116 0.9561710884
## [57,] 0.0454895580 0.9545104420
## [58,] 0.0032374280 0.9967625720
## [59,] 0.3028016114 0.6971983886
## [60,] 0.1537905871 0.8462094129

The element lik.anc gives us the marginal ancestral states, also known as the 'empirical Bayesian posterior probabilities.'

It is fairly straightforward to overlay these posterior probabilities on the tree:

plotTree(eel.tree,type="fan",fsize=0.7,ftype="i",lwd=1)
nodelabels(node=1:eel.tree$Nnode+Ntip(eel.tree),
    pie=fitER$lik.anc,piecol=cols,cex=0.4)
tiplabels(pie=to.matrix(feed.mode[eel.tree$tip.label],levels(feed.mode)),
    piecol=cols,cex=0.3)
add.simmap.legend(colors=cols,prompt=FALSE,x=0.9*par()$usr[1],
    y=0.8*par()$usr[3],fsize=0.8)

plot of chunk unnamed-chunk-4

We can also consider a model in which the backward & forward rates between states are permitted to have different values:

fitARD<-ace(feed.mode,eel.tree,model="ARD",type="discrete")
fitARD
## 
##     Ancestral Character Estimation
## 
## Call: ace(x = feed.mode, phy = eel.tree, type = "discrete", model = "ARD")
## 
##     Log-likelihood: -36.3105 
## 
## Rate index matrix:
##         bite suction
## bite       .       2
## suction    1       .
## 
## Parameter estimates:
##  rate index estimate std-err
##           1   0.0175  0.0064
##           2   0.0156  0.0045
## 
## Scaled likelihoods at the root (type '...$lik.anc' to get them for all nodes):
##      bite   suction 
## 0.5128531 0.4871469
fitARD$lik.anc
##               bite      suction
##  [1,] 0.5128531470 0.4871468530
##  [2,] 0.5733355632 0.4266644368
##  [3,] 0.5944505875 0.4055494125
##  [4,] 0.6114008937 0.3885991063
##  [5,] 0.6228121844 0.3771878156
##  [6,] 0.6393068928 0.3606931072
##  [7,] 0.6388472597 0.3611527403
##  [8,] 0.0005291536 0.9994708464
##  [9,] 0.7063971096 0.2936028904
## [10,] 0.7051501899 0.2948498101
## [11,] 0.5868748673 0.4131251327
## [12,] 0.0398637553 0.9601362447
## [13,] 0.0034675193 0.9965324807
## [14,] 0.0150635115 0.9849364885
## [15,] 0.7669399749 0.2330600251
## [16,] 0.8098761695 0.1901238305
## [17,] 0.6911268707 0.3088731293
## [18,] 0.6276060901 0.3723939099
## [19,] 0.6065205446 0.3934794554
## [20,] 0.6265919527 0.3734080473
## [21,] 0.7085102287 0.2914897713
## [22,] 0.9588188326 0.0411811674
## [23,] 0.7124120033 0.2875879967
## [24,] 0.6732869064 0.3267130936
## [25,] 0.5044910053 0.4955089947
## [26,] 0.4979517327 0.5020482673
## [27,] 0.5569160674 0.4430839326
## [28,] 0.4731280335 0.5268719665
## [29,] 0.2569412886 0.7430587114
## [30,] 0.2488441105 0.7511558895
## [31,] 0.0155916308 0.9844083692
## [32,] 0.0373381060 0.9626618940
## [33,] 0.0082942371 0.9917057629
## [34,] 0.5029130319 0.4970869681
## [35,] 0.4344194663 0.5655805337
## [36,] 0.2722698888 0.7277301112
## [37,] 0.0255397209 0.9744602791
## [38,] 0.0090402495 0.9909597505
## [39,] 0.3355699063 0.6644300937
## [40,] 0.5149233061 0.4850766939
## [41,] 0.6755804498 0.3244195502
## [42,] 0.6835200870 0.3164799130
## [43,] 0.8015608211 0.1984391789
## [44,] 0.9479614699 0.0520385301
## [45,] 0.9740573341 0.0259426659
## [46,] 0.9716065836 0.0283934164
## [47,] 0.9915293041 0.0084706959
## [48,] 0.9989614588 0.0010385412
## [49,] 0.9991175147 0.0008824853
## [50,] 0.9987969620 0.0012030380
## [51,] 0.9745835766 0.0254164234
## [52,] 0.8326704196 0.1673295804
## [53,] 0.7795210027 0.2204789973
## [54,] 0.9793006382 0.0206993618
## [55,] 0.1886626789 0.8113373211
## [56,] 0.0411733964 0.9588266036
## [57,] 0.0420256656 0.9579743344
## [58,] 0.0030730170 0.9969269830
## [59,] 0.3123802200 0.6876197800
## [60,] 0.1618081107 0.8381918893
plotTree(eel.tree,type="fan",fsize=0.6,ftype="i",lwd=1)
nodelabels(node=1:eel.tree$Nnode+Ntip(eel.tree),
    pie=fitARD$lik.anc,piecol=cols,cex=0.4)
tiplabels(pie=to.matrix(feed.mode[eel.tree$tip.label],levels(feed.mode)),
    piecol=cols,cex=0.3)
add.simmap.legend(colors=cols,prompt=FALSE,x=0.9*par()$usr[1],
    y=0.8*par()$usr[3],fsize=0.8)

plot of chunk unnamed-chunk-6

An alternative tactic to the one outline above is to use an MCMC approach to sample character histories from their posterior probability distribution. This is called stochastic character mapping (Huelsenbeck et al. 2003). The model is the same but in this case we get a sample of unambiguous histories for our discrete character's evolution on the tree - rather than a probability distribution for the character at nodes.

For instance, given the data simulated above - we can generate the stochastic character map as follows:

## simulate single stochastic character map using empirical Bayes method
mtree<-make.simmap(eel.tree,feed.mode,model="ER")
## make.simmap is sampling character histories conditioned on the transition matrix
## 
## Q =
##                bite     suction
## bite    -0.01582783  0.01582783
## suction  0.01582783 -0.01582783
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##    bite suction 
##     0.5     0.5
## Done.
plot(mtree,cols,type="fan",fsize=0.7,ftype="i")
add.simmap.legend(colors=cols,prompt=FALSE,x=0.9*par()$usr[1],
    y=0.8*par()$usr[3],fsize=0.8)

plot of chunk unnamed-chunk-7

A single stochastic character map does not mean a whole lot in isolation - we need to look at the whole distribution from a sample of stochastic maps. This can be a bit overwhelming. For instance, the following code generates 100 stochastic character maps from our dataset and plots them in a grid:

mtrees<-make.simmap(eel.tree,feed.mode,model="ER",nsim=100)
## make.simmap is sampling character histories conditioned on the transition matrix
## 
## Q =
##                bite     suction
## bite    -0.01582783  0.01582783
## suction  0.01582783 -0.01582783
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##    bite suction 
##     0.5     0.5
## Done.
par(mfrow=c(10,10))
null<-sapply(mtrees,plotSimmap,colors=cols,lwd=1,ftype="off")

plot of chunk unnamed-chunk-8

It's possible to summarize a set of stochastic maps in a much more meaningful way. For instance, we can estimate the number of changes of each type, the proportion of time spent in each state, and the posterior probabilities that each internal node is in each state, under our model. For example:

pd<-summary(mtrees)
pd
## 100 trees with a mapped discrete character with states:
##  bite, suction 
## 
## trees have 30.9 changes between states on average
## 
## changes are of the following types:
##      bite,suction suction,bite
## x->y        16.42        14.48
## 
## mean total time spent in each state is:
##             bite    suction    total
## raw  1097.481754 886.619487 1984.101
## prop    0.553138   0.446862    1.000
plot(pd,fsize=0.6,ftype="i",colors=cols,ylim=c(-2,Ntip(eel.tree)))
add.simmap.legend(colors=cols[2:1],prompt=FALSE,x=0,y=-4,vertical=FALSE)

plot of chunk unnamed-chunk-9

## now let's plot a random map, and overlay the posterior probabilities
plot(mtrees[[1]],cols,fsize=0.6,ftype="i",ylim=c(-2,Ntip(eel.tree)))
nodelabels(pie=pd$ace,piecol=cols,cex=0.5)
add.simmap.legend(colors=cols[2:1],prompt=FALSE,x=0,y=-4,vertical=FALSE)

plot of chunk unnamed-chunk-10

For binary discrete characters, we can also use a method in phytools called densityMap to visualize the posterior probability of being in each state across all the edges and nodes of the tree as follows:

obj<-densityMap(mtrees,states=levels(feed.mode)[2:1],plot=FALSE)
## sorry - this might take a while; please be patient
plot(obj,fsize=c(0.6,1))

plot of chunk unnamed-chunk-11

Finally, since we obtained these inferences under exactly the same model, let's compare the posterior probabilities from stochastic mapping with our marginal ancestral states. In the former case, our probabilities were obtained by sampling from the joint (rather than marginal) probability distribution for the ancestral states.

plot(fitER$lik.anc,pd$ace,xlab="marginal ancestral states",
    ylab="posterior probabilities from stochastic mapping")
lines(c(0,1),c(0,1),lty="dashed",col="red",lwd=2)

plot of chunk unnamed-chunk-12

This tells us that although joint & marginal reconstruction are not the same, the marginal probabilities from joint stochastic sampling and the marginal ancestral states are quite highly correlated - at least in this case study.

Challenge Problem 5: Stochastic mapping

Use the following files containing (1) a set of trees from the Bayesian posterior distribution for eels & (2) phenotypic data for two characters for a subset of the species in (1).

  1. anguila.nex.
  2. elopomorph.csv

Conduct stochastic mapping of feeding mode on this set of trees. Summarize the results using phytools summary methods, and plotting your findings.

Written by Liam J. Revell. Last updated 29 Jun. 2015