This tutorial focuses on the estimation of ancestral character states for discretely valued traits using a continuous-time Markov chain model commonly known as the Mk model.
"phytools"
has a function for ancestral character estimation under a couple of different models that uses the re-rooting method of Yang (1995). For other models of evolutionary change not covered by this function, users should try Rich Fitzjohn's diversitree package, or the ace
function in ape. In this tutorial we will primarily use ace
.
We will use a tree & dataset of feeding mode in elopomorph eels.
The data & tree are in two files as follows:
Now, let's load packages as well as the data & tree:
## first load packages
require(phytools)
## read data
X<-read.csv("elopomorph.csv",row.names=1)
feed.mode<-setNames(X[,1],rownames(X))
feed.mode
## Albula_vulpes Anguilla_anguilla
## suction suction
## Anguilla_bicolor Anguilla_japonica
## suction suction
## Anguilla_rostrata Ariosoma_anago
## suction suction
## Ariosoma_balearicum Ariosoma_shiroanago
## suction suction
## Bathyuroconger_vicinus Brachysomophis_crocodilinus
## bite bite
## Conger_japonicus Conger_myriaster
## suction suction
## Conger_verreaxi Conger_wilsoni
## suction suction
## Congresox_talabonoides Cynoponticus_ferox
## suction bite
## Dysomma_anguillare Elops_saurus
## bite suction
## Facciolella_gilbertii Gavialiceps_taeniola
## bite bite
## Gnathophis_longicauda Gorgasia_taiwanensis
## suction suction
## Gymnothorax_castaneus Gymnothorax_flavimarginatus
## bite bite
## Gymnothorax_kidako Gymnothorax_moringa
## bite bite
## Gymnothorax_pseudothyrsoideus Gymnothorax_reticularis
## bite bite
## Heteroconger_hassi Ichthyapus_ophioneus
## suction bite
## Kaupichthys_hyoproroides Kaupichthys_nuchalis
## bite bite
## Megalops_cyprinoides Moringua_edwardsi
## suction bite
## Moringua_javanica Muraenesox_bagio
## bite bite
## Muraenesox_cinereus Myrichthys_breviceps
## bite suction
## Myrichthys_maculosus Myrichthys_magnificus
## suction suction
## Myrophis_vafer Nemichthys_scolopaceus
## bite bite
## Nettastoma_melanurum Ophichthus_serpentinus
## bite suction
## Ophichthus_zophochir Oxyconger_leptognathus
## suction bite
## Parabathymyrus_macrophthalmus Paraconger_notialis
## suction suction
## Pisodonophis_cancrivorus Poeciloconger_kapala
## bite suction
## Rhinomuraena_quaesita Rhynchoconger_flavus
## bite suction
## Saurenchelys_fierasfer Scolecenchelys_breviceps
## bite suction
## Scuticaria_tigrina Serrivomer_beanii
## bite bite
## Serrivomer_sector Simenchelys_parasitica
## bite suction
## Uroconger_lepturus Uropterygius_micropterus
## suction bite
## Venefica_proboscidea
## bite
## Levels: bite suction
## read tree
eel.tree<-read.tree("elopomorph.tre")
eel.tree
##
## Phylogenetic tree with 61 tips and 60 internal nodes.
##
## Tip labels:
## Moringua_edwardsi, Kaupichthys_nuchalis, Gorgasia_taiwanensis, Heteroconger_hassi, Venefica_proboscidea, Anguilla_rostrata, ...
##
## Rooted; includes branch lengths.
In visual format, these are the data that we have:
plotTree(eel.tree,type="fan",fsize=0.7,ftype="i",lwd=1)
cols<-setNames(c("red","blue"),levels(feed.mode))
tiplabels(pie=to.matrix(feed.mode[eel.tree$tip.label],levels(feed.mode)),
piecol=cols,cex=0.3)
add.simmap.legend(colors=cols,prompt=FALSE,x=0.9*par()$usr[1],
y=0.8*par()$usr[3],fsize=0.8)
Next, let's fit a single-rate model & reconstruct ancestral states at internal nodes in the tree.
## estimate ancestral states under a ER model
fitER<-ace(feed.mode,eel.tree,model="ER",type="discrete")
fitER
##
## Ancestral Character Estimation
##
## Call: ace(x = feed.mode, phy = eel.tree, type = "discrete", model = "ER")
##
## Log-likelihood: -36.33992
##
## Rate index matrix:
## bite suction
## bite . 1
## suction 1 .
##
## Parameter estimates:
## rate index estimate std-err
## 1 0.0158 0.0045
##
## Scaled likelihoods at the root (type '...$lik.anc' to get them for all nodes):
## bite suction
## 0.5480037 0.4519963
fitER$lik.anc
## bite suction
## [1,] 0.5480037210 0.4519962790
## [2,] 0.6242342545 0.3757657455
## [3,] 0.6560898907 0.3439101093
## [4,] 0.6904137188 0.3095862812
## [5,] 0.7066485827 0.2933514173
## [6,] 0.7158097958 0.2841902042
## [7,] 0.7151789283 0.2848210717
## [8,] 0.0005280939 0.9994719061
## [9,] 0.7709017973 0.2290982027
## [10,] 0.7691863614 0.2308136386
## [11,] 0.6440479559 0.3559520441
## [12,] 0.0442832132 0.9557167868
## [13,] 0.0037653340 0.9962346660
## [14,] 0.0172004443 0.9827995557
## [15,] 0.8217944284 0.1782055716
## [16,] 0.8569460714 0.1430539286
## [17,] 0.7393039669 0.2606960331
## [18,] 0.7174480970 0.2825519030
## [19,] 0.6974032138 0.3025967862
## [20,] 0.7051615816 0.2948384184
## [21,] 0.7765900337 0.2234099663
## [22,] 0.9679563345 0.0320436655
## [23,] 0.7763459919 0.2236540081
## [24,] 0.7355606936 0.2644393064
## [25,] 0.5593038528 0.4406961472
## [26,] 0.5493603890 0.4506396110
## [27,] 0.6441467852 0.3558532148
## [28,] 0.5530135261 0.4469864739
## [29,] 0.2983895523 0.7016104477
## [30,] 0.2715130074 0.7284869926
## [31,] 0.0184977763 0.9815022237
## [32,] 0.0425564134 0.9574435866
## [33,] 0.0093509902 0.9906490098
## [34,] 0.5714981738 0.4285018262
## [35,] 0.4984340090 0.5015659910
## [36,] 0.3238194112 0.6761805888
## [37,] 0.0317279739 0.9682720261
## [38,] 0.0115367770 0.9884632230
## [39,] 0.3773398135 0.6226601865
## [40,] 0.5502264549 0.4497735451
## [41,] 0.7532755531 0.2467244469
## [42,] 0.7565228047 0.2434771953
## [43,] 0.8568317500 0.1431682500
## [44,] 0.9665651556 0.0334348444
## [45,] 0.9844171015 0.0155828985
## [46,] 0.9800267702 0.0199732298
## [47,] 0.9939408074 0.0060591926
## [48,] 0.9993040631 0.0006959369
## [49,] 0.9994145342 0.0005854658
## [50,] 0.9990235521 0.0009764479
## [51,] 0.9803292025 0.0196707975
## [52,] 0.8689467273 0.1310532727
## [53,] 0.8126122235 0.1873877765
## [54,] 0.9847413814 0.0152586186
## [55,] 0.2057221815 0.7942778185
## [56,] 0.0438289116 0.9561710884
## [57,] 0.0454895580 0.9545104420
## [58,] 0.0032374280 0.9967625720
## [59,] 0.3028016114 0.6971983886
## [60,] 0.1537905871 0.8462094129
The element lik.anc
gives us the marginal ancestral states, also known as the 'empirical Bayesian posterior probabilities.'
It is fairly straightforward to overlay these posterior probabilities on the tree:
plotTree(eel.tree,type="fan",fsize=0.7,ftype="i",lwd=1)
nodelabels(node=1:eel.tree$Nnode+Ntip(eel.tree),
pie=fitER$lik.anc,piecol=cols,cex=0.4)
tiplabels(pie=to.matrix(feed.mode[eel.tree$tip.label],levels(feed.mode)),
piecol=cols,cex=0.3)
add.simmap.legend(colors=cols,prompt=FALSE,x=0.9*par()$usr[1],
y=0.8*par()$usr[3],fsize=0.8)
We can also consider a model in which the backward & forward rates between states are permitted to have different values:
fitARD<-ace(feed.mode,eel.tree,model="ARD",type="discrete")
fitARD
##
## Ancestral Character Estimation
##
## Call: ace(x = feed.mode, phy = eel.tree, type = "discrete", model = "ARD")
##
## Log-likelihood: -36.3105
##
## Rate index matrix:
## bite suction
## bite . 2
## suction 1 .
##
## Parameter estimates:
## rate index estimate std-err
## 1 0.0175 0.0064
## 2 0.0156 0.0045
##
## Scaled likelihoods at the root (type '...$lik.anc' to get them for all nodes):
## bite suction
## 0.5128531 0.4871469
fitARD$lik.anc
## bite suction
## [1,] 0.5128531470 0.4871468530
## [2,] 0.5733355632 0.4266644368
## [3,] 0.5944505875 0.4055494125
## [4,] 0.6114008937 0.3885991063
## [5,] 0.6228121844 0.3771878156
## [6,] 0.6393068928 0.3606931072
## [7,] 0.6388472597 0.3611527403
## [8,] 0.0005291536 0.9994708464
## [9,] 0.7063971096 0.2936028904
## [10,] 0.7051501899 0.2948498101
## [11,] 0.5868748673 0.4131251327
## [12,] 0.0398637553 0.9601362447
## [13,] 0.0034675193 0.9965324807
## [14,] 0.0150635115 0.9849364885
## [15,] 0.7669399749 0.2330600251
## [16,] 0.8098761695 0.1901238305
## [17,] 0.6911268707 0.3088731293
## [18,] 0.6276060901 0.3723939099
## [19,] 0.6065205446 0.3934794554
## [20,] 0.6265919527 0.3734080473
## [21,] 0.7085102287 0.2914897713
## [22,] 0.9588188326 0.0411811674
## [23,] 0.7124120033 0.2875879967
## [24,] 0.6732869064 0.3267130936
## [25,] 0.5044910053 0.4955089947
## [26,] 0.4979517327 0.5020482673
## [27,] 0.5569160674 0.4430839326
## [28,] 0.4731280335 0.5268719665
## [29,] 0.2569412886 0.7430587114
## [30,] 0.2488441105 0.7511558895
## [31,] 0.0155916308 0.9844083692
## [32,] 0.0373381060 0.9626618940
## [33,] 0.0082942371 0.9917057629
## [34,] 0.5029130319 0.4970869681
## [35,] 0.4344194663 0.5655805337
## [36,] 0.2722698888 0.7277301112
## [37,] 0.0255397209 0.9744602791
## [38,] 0.0090402495 0.9909597505
## [39,] 0.3355699063 0.6644300937
## [40,] 0.5149233061 0.4850766939
## [41,] 0.6755804498 0.3244195502
## [42,] 0.6835200870 0.3164799130
## [43,] 0.8015608211 0.1984391789
## [44,] 0.9479614699 0.0520385301
## [45,] 0.9740573341 0.0259426659
## [46,] 0.9716065836 0.0283934164
## [47,] 0.9915293041 0.0084706959
## [48,] 0.9989614588 0.0010385412
## [49,] 0.9991175147 0.0008824853
## [50,] 0.9987969620 0.0012030380
## [51,] 0.9745835766 0.0254164234
## [52,] 0.8326704196 0.1673295804
## [53,] 0.7795210027 0.2204789973
## [54,] 0.9793006382 0.0206993618
## [55,] 0.1886626789 0.8113373211
## [56,] 0.0411733964 0.9588266036
## [57,] 0.0420256656 0.9579743344
## [58,] 0.0030730170 0.9969269830
## [59,] 0.3123802200 0.6876197800
## [60,] 0.1618081107 0.8381918893
plotTree(eel.tree,type="fan",fsize=0.6,ftype="i",lwd=1)
nodelabels(node=1:eel.tree$Nnode+Ntip(eel.tree),
pie=fitARD$lik.anc,piecol=cols,cex=0.4)
tiplabels(pie=to.matrix(feed.mode[eel.tree$tip.label],levels(feed.mode)),
piecol=cols,cex=0.3)
add.simmap.legend(colors=cols,prompt=FALSE,x=0.9*par()$usr[1],
y=0.8*par()$usr[3],fsize=0.8)
An alternative tactic to the one outline above is to use an MCMC approach to sample character histories from their posterior probability distribution. This is called stochastic character mapping (Huelsenbeck et al. 2003). The model is the same but in this case we get a sample of unambiguous histories for our discrete character's evolution on the tree - rather than a probability distribution for the character at nodes.
For instance, given the data simulated above - we can generate the stochastic character map as follows:
## simulate single stochastic character map using empirical Bayes method
mtree<-make.simmap(eel.tree,feed.mode,model="ER")
## make.simmap is sampling character histories conditioned on the transition matrix
##
## Q =
## bite suction
## bite -0.01582783 0.01582783
## suction 0.01582783 -0.01582783
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
## bite suction
## 0.5 0.5
## Done.
plot(mtree,cols,type="fan",fsize=0.7,ftype="i")
add.simmap.legend(colors=cols,prompt=FALSE,x=0.9*par()$usr[1],
y=0.8*par()$usr[3],fsize=0.8)
A single stochastic character map does not mean a whole lot in isolation - we need to look at the whole distribution from a sample of stochastic maps. This can be a bit overwhelming. For instance, the following code generates 100 stochastic character maps from our dataset and plots them in a grid:
mtrees<-make.simmap(eel.tree,feed.mode,model="ER",nsim=100)
## make.simmap is sampling character histories conditioned on the transition matrix
##
## Q =
## bite suction
## bite -0.01582783 0.01582783
## suction 0.01582783 -0.01582783
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
## bite suction
## 0.5 0.5
## Done.
par(mfrow=c(10,10))
null<-sapply(mtrees,plotSimmap,colors=cols,lwd=1,ftype="off")
It's possible to summarize a set of stochastic maps in a much more meaningful way. For instance, we can estimate the number of changes of each type, the proportion of time spent in each state, and the posterior probabilities that each internal node is in each state, under our model. For example:
pd<-summary(mtrees)
pd
## 100 trees with a mapped discrete character with states:
## bite, suction
##
## trees have 30.9 changes between states on average
##
## changes are of the following types:
## bite,suction suction,bite
## x->y 16.42 14.48
##
## mean total time spent in each state is:
## bite suction total
## raw 1097.481754 886.619487 1984.101
## prop 0.553138 0.446862 1.000
plot(pd,fsize=0.6,ftype="i",colors=cols,ylim=c(-2,Ntip(eel.tree)))
add.simmap.legend(colors=cols[2:1],prompt=FALSE,x=0,y=-4,vertical=FALSE)
## now let's plot a random map, and overlay the posterior probabilities
plot(mtrees[[1]],cols,fsize=0.6,ftype="i",ylim=c(-2,Ntip(eel.tree)))
nodelabels(pie=pd$ace,piecol=cols,cex=0.5)
add.simmap.legend(colors=cols[2:1],prompt=FALSE,x=0,y=-4,vertical=FALSE)
For binary discrete characters, we can also use a method in phytools called densityMap
to visualize the posterior probability of being in each state across all the edges and nodes of the tree as follows:
obj<-densityMap(mtrees,states=levels(feed.mode)[2:1],plot=FALSE)
## sorry - this might take a while; please be patient
plot(obj,fsize=c(0.6,1))
Finally, since we obtained these inferences under exactly the same model, let's compare the posterior probabilities from stochastic mapping with our marginal ancestral states. In the former case, our probabilities were obtained by sampling from the joint (rather than marginal) probability distribution for the ancestral states.
plot(fitER$lik.anc,pd$ace,xlab="marginal ancestral states",
ylab="posterior probabilities from stochastic mapping")
lines(c(0,1),c(0,1),lty="dashed",col="red",lwd=2)
This tells us that although joint & marginal reconstruction are not the same, the marginal probabilities from joint stochastic sampling and the marginal ancestral states are quite highly correlated - at least in this case study.
Use the following files containing (1) a set of trees from the Bayesian posterior distribution for eels & (2) phenotypic data for two characters for a subset of the species in (1).
Conduct stochastic mapping of feeding mode on this set of trees. Summarize the results using phytools summary methods, and plotting your findings.
Written by Liam J. Revell. Last updated 29 Jun. 2015