Ancestral state reconstruction & visualizing ancestral states on a phylogeny

For better or for worse, the estimation of phenotypic trait values for ancestral nodes in the tree continues to be an important goal in phylogenetic comparative biology.

This tutorial focuses on the estimation of ancestral character states for discrete & continuous character, by several methods. Along the way, we'll learn some approaches from my R package, phytools, for visualizing those states on the tree.

Continuous characters

Anole body size

Let's start by estimating ancestral states for a a continuous character, in this case overall body size, on a phylogeny of Anolis lizards from the Caribbean.

To estimate the states for a continuously valued character at ancestral nodes, we need find the states that have the maximum probability of having arisen under our model. These will be our maximum likelihood estimates.

Ancestral character esimation is implemented in a variety of different R functions. The most commonly used is ace. Here, we start with the phytools function fastAnc.

For these exercises we will use the phytools package, as well as dependent packages such as ape. To follow all parts of the exercise you can install the latest version of phytools - not from CRAN, but from the phytools page:

## if phytools is loaded
## detach("package:phytools",unload=TRUE)
install.packages("http://www.phytools.org/nonstatic/phytools_0.4-98.tar.gz",
    type="source",repos=NULL)
## Installing package into 'C:/Users/Liam/Documents/R/win-library/3.2'
## (as 'lib' is unspecified)
packageVersion("phytools")
## [1] '0.4.98'

The data files you will need are here: anole.tre svl.csv. If you have a good web connection, it is also always possible to read in your files directly from the web, too.

## load libraries
library(phytools)
## Loading required package: ape
## Loading required package: maps
## read tree from file
anole.tree<-read.tree("anole.tre") ## or
anole.tree<-read.tree("http://www.phytools.org/eqg2015/data/anole.tre")
## plot tree
plotTree(anole.tree,type="fan",ftype="i")

plot of chunk unnamed-chunk-2

## setEnv=TRUE for this type is experimental. please be patient with bugs
## read data
svl<-read.csv("svl.csv",row.names=1) ## or
svl<-read.csv("http://www.phytools.org/eqg2015/data/svl.csv",
    row.names=1)
## change this into a vector
svl<-as.matrix(svl)[,1]
svl
##            ahli         alayoni         alfaroi        aliniger 
##        4.039125        3.815705        3.526655        4.036557 
##        allisoni         allogus   altitudinalis         alumina 
##        4.375390        4.040138        3.842994        3.588941 
##       alutaceus     angusticeps     argenteolus     argillaceus 
##        3.554891        3.788595        3.971307        3.757869 
##         armouri   bahorucoensis        baleatus        baracoae 
##        4.121684        3.827445        5.053056        5.042780 
##       barahonae        barbatus        barbouri        bartschi 
##        5.076958        5.003946        3.663932        4.280547 
##         bremeri        breslini    brevirostris        caudalis 
##        4.113371        4.051111        3.874155        3.911743 
##       centralis  chamaeleonides    chlorocyanus     christophei 
##        3.697941        5.042349        4.275448        3.884652 
##       clivicola     coelestinus        confusus           cooki 
##        3.758726        4.297965        3.938442        4.091535 
##    cristatellus    cupeyalensis         cuvieri    cyanopleurus 
##        4.189820        3.462014        4.875012        3.630161 
##         cybotes     darlingtoni       distichus dolichocephalus 
##        4.210982        4.302036        3.928796        3.908550 
##       equestris      etheridgei   eugenegrahami       evermanni 
##        5.113994        3.657991        4.128504        4.165605 
##         fowleri         garmani         grahami           guafe 
##        4.288780        4.769473        4.154274        3.877457 
##       guamuhaya         guazuma       gundlachi       haetianus 
##        5.036953        3.763884        4.188105        4.316542 
##      hendersoni      homolechis           imias    inexpectatus 
##        3.859835        4.032806        4.099687        3.537439 
##       insolitus        isolepis           jubar           krugi 
##        3.800471        3.657088        3.952605        3.886500 
##      lineatopus   longitibialis        loysiana          lucius 
##        4.128612        4.242103        3.701240        4.198915 
##    luteogularis      macilentus        marcanoi          marron 
##        5.101085        3.715765        4.079485        3.831810 
##         mestrei       monticola          noblei        occultus 
##        3.987147        3.770613        5.083473        3.663049 
##         olssoni        opalinus      ophiolepis        oporinus 
##        3.793899        3.838376        3.637962        3.845670 
##        paternus        placidus       poncensis        porcatus 
##        3.802961        3.773967        3.820378        4.258991 
##          porcus      pulchellus         pumilis quadriocellifer 
##        5.038034        3.799022        3.466860        3.901619 
##      reconditus        ricordii     rubribarbus          sagrei 
##        4.482607        5.013963        4.078469        4.067162 
##    semilineatus        sheplani         shrevei      singularis 
##        3.696631        3.682924        3.983003        4.057997 
##      smallwoodi         strahmi       stratulus     valencienni 
##        5.035096        4.274271        3.869881        4.321524 
##       vanidicus    vermiculatus        websteri       whitemani 
##        3.626206        4.802849        3.916546        4.097479

Now we can estimate ancestral states. We will also compute variances & 95% confidence intervals for each node:

fit<-fastAnc(anole.tree,svl,vars=TRUE,CI=TRUE)
fit
## $ace
##      101      102      103      104      105      106      107      108 
## 4.065918 4.045601 4.047993 4.066923 4.036743 4.049514 4.054528 4.045218 
##      109      110      111      112      113      114      115      116 
## 3.979493 3.952440 3.926814 3.964414 3.995835 3.948034 3.955203 3.977842 
##      117      118      119      120      121      122      123      124 
## 4.213995 4.240867 4.248450 4.257574 4.239061 4.004120 4.007024 4.015249 
##      125      126      127      128      129      130      131      132 
## 4.006720 3.992617 3.925848 3.995759 4.049619 3.930793 3.908343 3.901518 
##      133      134      135      136      137      138      139      140 
## 3.885086 4.040356 4.060970 3.987789 3.951878 3.814014 3.707733 3.926147 
##      141      142      143      144      145      146      147      148 
## 3.988745 4.167983 4.157021 4.166146 4.146362 4.146132 4.159052 4.113267 
##      149      150      151      152      153      154      155      156 
## 4.133069 4.222223 4.461376 4.488496 4.437250 4.512071 4.953146 5.033253 
##      157      158      159      160      161      162      163      164 
## 4.962377 5.009025 5.018389 3.974900 3.880482 3.859268 3.859265 3.871895 
##      165      166      167      168      169      170      171      172 
## 3.899914 3.807943 3.826619 4.151598 3.760446 3.654324 3.676713 3.825559 
##      173      174      175      176      177      178      179      180 
## 3.747726 3.813974 3.803077 3.702349 3.566476 3.678236 3.675620 3.662452 
##      181      182      183      184      185      186      187      188 
## 3.563181 3.645426 4.017460 4.113689 4.229393 4.381609 4.243153 5.041281 
##      189      190      191      192      193      194      195      196 
## 5.051563 5.051719 5.057591 4.183653 4.151505 4.113279 3.969795 3.905122 
##      197      198      199 
## 4.191810 4.161419 4.092141 
## 
## $var
##         101         102         103         104         105         106 
## 0.011775189 0.008409641 0.009452076 0.011870028 0.014459672 0.018112713 
##         107         108         109         110         111         112 
## 0.011685860 0.007268889 0.014487712 0.012814461 0.010852276 0.010048935 
##         113         114         115         116         117         118 
## 0.006240382 0.009816255 0.008377720 0.006250414 0.015406961 0.015306088 
##         119         120         121         122         123         124 
## 0.008914737 0.008485896 0.016998370 0.014850114 0.012880197 0.011964472 
##         125         126         127         128         129         130 
## 0.010525923 0.010528040 0.013248168 0.012789504 0.013568841 0.013971730 
##         131         132         133         134         135         136 
## 0.010048937 0.009535820 0.008387455 0.008359158 0.010126605 0.012608132 
##         137         138         139         140         141         142 
## 0.013951936 0.018502968 0.014105896 0.018182598 0.017745869 0.012861889 
##         143         144         145         146         147         148 
## 0.014828558 0.011891297 0.008732027 0.008604618 0.010009986 0.007660375 
##         149         150         151         152         153         154 
## 0.006817879 0.012378867 0.015540878 0.014154484 0.012761730 0.011952420 
##         155         156         157         158         159         160 
## 0.005064274 0.002463170 0.006924118 0.003953467 0.003509809 0.011102798 
##         161         162         163         164         165         166 
## 0.011378573 0.011045003 0.011682628 0.011560333 0.012242267 0.011462597 
##         167         168         169         170         171         172 
## 0.008888509 0.014206315 0.014405121 0.006290501 0.005434707 0.014714164 
##         173         174         175         176         177         178 
## 0.010884130 0.014842361 0.011333759 0.012932194 0.007449222 0.011094079 
##         179         180         181         182         183         184 
## 0.011015666 0.012850744 0.005326252 0.012876076 0.021615405 0.014608411 
##         185         186         187         188         189         190 
## 0.013155037 0.021043877 0.012658226 0.004540141 0.003540634 0.002698421 
##         191         192         193         194         195         196 
## 0.001277775 0.011837709 0.012449062 0.013536624 0.015848988 0.008788712 
##         197         198         199 
## 0.015789566 0.013529210 0.009534335 
## 
## $CI95
##         [,1]     [,2]
## 101 3.853231 4.278604
## 102 3.865861 4.225341
## 103 3.857439 4.238548
## 104 3.853382 4.280465
## 105 3.801056 4.272430
## 106 3.785731 4.313298
## 107 3.842649 4.266406
## 108 3.878112 4.212323
## 109 3.743577 4.215408
## 110 3.730566 4.174314
## 111 3.722632 4.130995
## 112 3.767935 4.160893
## 113 3.841003 4.150668
## 114 3.753844 4.142225
## 115 3.775804 4.134601
## 116 3.822885 4.132799
## 117 3.970710 4.457279
## 118 3.998380 4.483354
## 119 4.063391 4.433509
## 120 4.077021 4.438127
## 121 3.983520 4.494601
## 122 3.765272 4.242968
## 123 3.784582 4.229467
## 124 3.800860 4.229639
## 125 3.805632 4.207808
## 126 3.791509 4.193725
## 127 3.700250 4.151445
## 128 3.774102 4.217417
## 129 3.821307 4.277930
## 130 3.699117 4.162469
## 131 3.711864 4.104822
## 132 3.710121 4.092915
## 133 3.705584 4.064589
## 134 3.861156 4.219555
## 135 3.863733 4.258207
## 136 3.767708 4.207869
## 137 3.720366 4.183390
## 138 3.547403 4.080624
## 139 3.474947 3.940519
## 140 3.661855 4.190439
## 141 3.727646 4.249843
## 142 3.945699 4.390267
## 143 3.918347 4.395695
## 144 3.952414 4.379879
## 145 3.963210 4.329515
## 146 3.964320 4.327943
## 147 3.962955 4.355150
## 148 3.941721 4.284813
## 149 3.971230 4.294907
## 150 4.004152 4.440293
## 151 4.217036 4.705715
## 152 4.255309 4.721682
## 153 4.215833 4.658667
## 154 4.297789 4.726352
## 155 4.813665 5.092627
## 156 4.935977 5.130528
## 157 4.799283 5.125471
## 158 4.885787 5.132263
## 159 4.902272 5.134507
## 160 3.768375 4.181425
## 161 3.671408 4.089556
## 162 3.653282 4.065255
## 163 3.647416 4.071113
## 164 3.661158 4.082632
## 165 3.683050 4.116777
## 166 3.598098 4.017787
## 167 3.641833 4.011406
## 168 3.917985 4.385211
## 169 3.525204 3.995688
## 170 3.498871 3.809777
## 171 3.532221 3.821205
## 172 3.587807 4.063310
## 173 3.543245 3.952207
## 174 3.575189 4.052760
## 175 3.594415 4.011738
## 176 3.479458 3.925240
## 177 3.397311 3.735642
## 178 3.471792 3.884680
## 179 3.469907 3.881332
## 180 3.440264 3.884639
## 181 3.420138 3.706225
## 182 3.423020 3.867833
## 183 3.729297 4.305622
## 184 3.876793 4.350585
## 185 4.004590 4.454196
## 186 4.097282 4.665937
## 187 4.022636 4.463670
## 188 4.909215 5.173347
## 189 4.934937 5.168189
## 190 4.949904 5.153533
## 191 4.987529 5.127654
## 192 3.970403 4.396903
## 193 3.932817 4.370192
## 194 3.885239 4.341319
## 195 3.723046 4.216545
## 196 3.721376 4.088869
## 197 3.945523 4.438096
## 198 3.933441 4.389397
## 199 3.900759 4.283523
## as we discussed in class, 95% CIs can be broad
fit$CI[1,]
## [1] 3.853231 4.278604
range(svl)
## [1] 3.462014 5.113994

phytools has several different methods for visualizing reconstructed ancestral states for a continuous trait on the tree. One is a color gradient projection onto the tree:

## projection of the reconstruction onto the edges of the tree
obj<-contMap(anole.tree,svl,plot=FALSE)
plot(obj,type="fan",legend=0.7*max(nodeHeights(anole.tree)),
    fsize=c(0.7,0.9))

plot of chunk unnamed-chunk-4

A second is a projection of the tree into a space defined by time on the horizontal axis, and phenotype on the vertical dimension.

## projection of the tree into phenotype space
phenogram(anole.tree,svl,fsize=0.6,spread.costs=c(1,0))
## Optimizing the positions of the tip labels...

plot of chunk unnamed-chunk-5

Properties of ancestral states of ancestral state reconstruction

Next, we can explore some of the properties of ancestral state reconstruction of continuous traits in general.

To do this, we will use some simulation functions in the phytools package.

First, let's simulate some data:

## simulate a tree & some data
tree<-pbtree(n=26,scale=1,tip.label=LETTERS)
## simulate with ancestral states
x<-fastBM(tree,internal=TRUE)
## ancestral states
a<-x[as.character(1:tree$Nnode+Ntip(tree))]
## tip data
x<-x[tree$tip.label]

Now, let's estimate ancestral states using fastAnc which uses the re-rooting algorithm discussed in class:

fit<-fastAnc(tree,x,CI=TRUE)
fit
## $ace
##          27          28          29          30          31          32 
##  0.30696159  0.12063831  0.08144035 -0.01186937 -0.01243673 -0.29561804 
##          33          34          35          36          37          38 
##  0.23583553  0.24737980 -0.66652606 -0.54690301 -0.72462221 -0.78856261 
##          39          40          41          42          43          44 
## -1.08249928 -1.09079143 -0.73420513 -0.85876797 -0.42923557  0.03567978 
##          45          46          47          48          49          50 
##  0.81080416  0.53173049  0.52707331  0.60533272  0.45219307  1.31083890 
##          51 
##  1.50164751 
## 
## $CI95
##           [,1]         [,2]
## 27 -0.82459808  1.438521250
## 28 -0.65030909  0.891585713
## 29 -0.67957462  0.842455325
## 30 -0.71970440  0.695965664
## 31 -0.71934891  0.694475446
## 32 -0.69663413  0.105398045
## 33 -0.19444810  0.666119157
## 34 -0.05226882  0.547028416
## 35 -1.36962520  0.036573067
## 36 -0.97939125 -0.114414768
## 37 -1.11004584 -0.339198588
## 38 -1.16023344 -0.416891786
## 39 -1.35348133 -0.811517238
## 40 -1.31900458 -0.862578280
## 41 -0.99830581 -0.470104458
## 42 -0.90319321 -0.814342726
## 43 -0.86623182  0.007760684
## 44 -0.10021594  0.171575502
## 45  0.16272764  1.458880673
## 46  0.01208649  1.051374499
## 47  0.04963634  1.004510269
## 48  0.42024250  0.790422937
## 49  0.15161200  0.752774142
## 50  0.74997549  1.871702298
## 51  1.00688690  1.996408112

We can easily compare these estimates to the (normally unknown) generating values for simulated data:

plot(a,fit$ace,xlab="true states",ylab="estimated states")
lines(range(c(x,a)),range(c(x,a)),lty="dashed",col="red") ## 1:1 line

plot of chunk unnamed-chunk-8

One of the most common critiques of ancestral state estimation is that the uncertainty around ancestral states can be large; and, furthermore, that this uncertainty is often ignored. Let's address this by obtaining 95% CIs on ancestral values, and then let's show that the 95% CIs include the generating values around 95% of the time:

## first, let's go back to our previous dataset
print(fit)
## $ace
##          27          28          29          30          31          32 
##  0.30696159  0.12063831  0.08144035 -0.01186937 -0.01243673 -0.29561804 
##          33          34          35          36          37          38 
##  0.23583553  0.24737980 -0.66652606 -0.54690301 -0.72462221 -0.78856261 
##          39          40          41          42          43          44 
## -1.08249928 -1.09079143 -0.73420513 -0.85876797 -0.42923557  0.03567978 
##          45          46          47          48          49          50 
##  0.81080416  0.53173049  0.52707331  0.60533272  0.45219307  1.31083890 
##          51 
##  1.50164751 
## 
## $CI95
##           [,1]         [,2]
## 27 -0.82459808  1.438521250
## 28 -0.65030909  0.891585713
## 29 -0.67957462  0.842455325
## 30 -0.71970440  0.695965664
## 31 -0.71934891  0.694475446
## 32 -0.69663413  0.105398045
## 33 -0.19444810  0.666119157
## 34 -0.05226882  0.547028416
## 35 -1.36962520  0.036573067
## 36 -0.97939125 -0.114414768
## 37 -1.11004584 -0.339198588
## 38 -1.16023344 -0.416891786
## 39 -1.35348133 -0.811517238
## 40 -1.31900458 -0.862578280
## 41 -0.99830581 -0.470104458
## 42 -0.90319321 -0.814342726
## 43 -0.86623182  0.007760684
## 44 -0.10021594  0.171575502
## 45  0.16272764  1.458880673
## 46  0.01208649  1.051374499
## 47  0.04963634  1.004510269
## 48  0.42024250  0.790422937
## 49  0.15161200  0.752774142
## 50  0.74997549  1.871702298
## 51  1.00688690  1.996408112
mean(((a>=fit$CI95[,1]) + (a<=fit$CI95[,2]))==2)
## [1] 1

One small tree doesn't tell us much, so let's repeat for a sample of trees & reconstructions:

## custom function that conducts a simulation, estimates ancestral
## states, & returns the fraction on 95% CI
foo<-function(){
    tree<-pbtree(n=100)
    x<-fastBM(tree,internal=TRUE)
    fit<-fastAnc(tree,x[1:length(tree$tip.label)],CI=TRUE)
    mean(((x[1:tree$Nnode+length(tree$tip.label)]>=fit$CI95[,1]) +
        (x[1:tree$Nnode+length(tree$tip.label)]<=fit$CI95[,2]))==2)
}
## conduct 100 simulations
pp<-replicate(100,foo())    
mean(pp)
## [1] 0.949899

Cool. Well, this shows us that although CIs can be large, when the model is correct they will include the generating value (1-α)% of the time.

Ancestral state estimates when some nodes are known

We can theoretically fix the state for any internal node during MLE of ancestral states. In practice, we would do this by attaching a terminal edge of zero length to the internal node we wanted to fix, and then treat it like another tip value in our analyses.

Another possibility, which also allows for the possibility that ancestral states are uncertain, is to use an informative prior probability density on one or multiple states at internal nodes, and then estimate ancestral states using Bayesian MCMC.

Let's try this using a really bad case for ancestral character estimation from contemporaneous tip data: Brownian evolution with a trend. Note that although we theoretically could do so - we are not fitting a trended model here.

tree<-pbtree(n=100,scale=1)
## simulate data with a trend
x<-fastBM(tree,internal=TRUE,mu=3)
phenogram(tree,x,ftype="off",spread.labels=FALSE)

plot of chunk unnamed-chunk-11

a<-x[as.character(1:tree$Nnode+Ntip(tree))]
x<-x[tree$tip.label]
## let's see how bad we do if we ignore the trend
plot(a,fastAnc(tree,x),xlab="true values",
    ylab="estimated states under BM")
lines(range(c(x,a)),range(c(x,a)),lty="dashed",col="red")
title("estimated without prior information")

plot of chunk unnamed-chunk-11

## incorporate prior knowledge
pm<-setNames(c(1000,rep(0,tree$Nnode)),
    c("sig2",1:tree$Nnode+length(tree$tip.label)))
## the root & two randomly chosen nodes
nn<-as.character(c(length(tree$tip.label)+1,
    sample(2:tree$Nnode+length(tree$tip.label),2)))
pm[nn]<-a[as.character(nn)]
## prior variance
pv<-setNames(c(1000^2,rep(1000,length(pm)-1)),names(pm))
pv[as.character(nn)]<-1e-100
## run MCMC
mcmc<-anc.Bayes(tree,x,ngen=100000,
    control=list(pr.mean=pm,pr.var=pv,
    a=pm[as.character(length(tree$tip.label)+1)],
    y=pm[as.character(2:tree$Nnode+length(tree$tip.label))]))
## Control parameters (set by user or default):
## List of 7
##  $ sig2   : num 1.1
##  $ a      : Named num 0
##   ..- attr(*, "names")= chr "101"
##  $ y      : Named num [1:98] 0 0 0 0 0 ...
##   ..- attr(*, "names")= chr [1:98] "102" "103" "104" "105" ...
##  $ pr.mean: Named num [1:100] 1000 0 0 0 0 ...
##   ..- attr(*, "names")= chr [1:100] "sig2" "101" "102" "103" ...
##  $ pr.var : Named num [1:100] 1e+06 1e-100 1e+03 1e+03 1e+03 ...
##   ..- attr(*, "names")= chr [1:100] "sig2" "101" "102" "103" ...
##  $ prop   : num [1:100] 0.011 0.011 0.011 0.011 0.011 ...
##  $ sample : num 100
## Starting MCMC...
## Done MCMC.
anc.est<-colMeans(mcmc[201:1001,
    as.character(1:tree$Nnode+length(tree$tip.label))])
plot(a[names(anc.est)],anc.est,xlab="true values",
    ylab="estimated states using informative prior")
lines(range(c(x,a)),range(c(x,a)),lty="dashed",col="red")
title("estimated using informative prior")

plot of chunk unnamed-chunk-11

Discrete characters

This part of the tutorial focuses on the estimation of ancestral character states for discretely valued traits using a continuous-time Markov chain model commonly known as the Mk model.

phytools" has a function for ancestral character estimation under a couple of different models that uses the re-rooting method of Yang (1995). For other models of evolutionary change not covered by this function, users should try Rich Fitzjohn's diversitree package, or the ace function in ape.

OK - let's pull our data from the Anolis ecomorph tree (packaged with phytools) to use in these analyses:

data(anoletree)
## this is just to pull out the tip states from the
## data object - normally we would read this from file
x<-getStates(anoletree,"tips")
tree<-anoletree
rm(anoletree)
tree
## 
## Phylogenetic tree with 82 tips and 81 internal nodes.
## 
## Tip labels:
##  Anolis_ahli, Anolis_allogus, Anolis_rubribarbus, Anolis_imias, Anolis_sagrei, Anolis_bremeri, ...
## 
## Rooted; includes branch lengths.
x
##            Anolis_ahli         Anolis_allogus     Anolis_rubribarbus 
##                   "TG"                   "TG"                   "TG" 
##           Anolis_imias          Anolis_sagrei         Anolis_bremeri 
##                   "TG"                   "TG"                   "TG" 
## Anolis_quadriocellifer      Anolis_ophiolepis         Anolis_mestrei 
##                   "TG"                   "GB"                   "TG" 
##           Anolis_jubar      Anolis_homolechis        Anolis_confusus 
##                   "TG"                   "TG"                   "TG" 
##           Anolis_guafe         Anolis_garmani        Anolis_opalinus 
##                   "TG"                   "CG"                   "TC" 
##         Anolis_grahami     Anolis_valencienni      Anolis_lineatopus 
##                   "TC"                   "Tw"                   "TG" 
##       Anolis_evermanni       Anolis_stratulus           Anolis_krugi 
##                   "TC"                   "TC"                   "GB" 
##      Anolis_pulchellus       Anolis_gundlachi       Anolis_poncensis 
##                   "GB"                   "TG"                   "GB" 
##           Anolis_cooki    Anolis_cristatellus    Anolis_brevirostris 
##                   "TG"                   "TG"                   "Tr" 
##        Anolis_caudalis          Anolis_marron        Anolis_websteri 
##                   "Tr"                   "Tr"                   "Tr" 
##       Anolis_distichus         Anolis_alumina    Anolis_semilineatus 
##                   "Tr"                   "GB"                   "GB" 
##         Anolis_olssoni       Anolis_insolitus       Anolis_whitemani 
##                   "GB"                   "Tw"                   "TG" 
##       Anolis_haetianus        Anolis_breslini         Anolis_armouri 
##                   "TG"                   "TG"                   "TG" 
##         Anolis_cybotes         Anolis_shrevei   Anolis_longitibialis 
##                   "TG"                   "TG"                   "TG" 
##         Anolis_strahmi        Anolis_marcanoi        Anolis_baleatus 
##                   "TG"                   "TG"                   "CG" 
##       Anolis_barahonae        Anolis_ricordii         Anolis_cuvieri 
##                   "CG"                   "CG"                   "CG" 
##   Anolis_altitudinalis        Anolis_oporinus        Anolis_isolepis 
##                   "TC"                   "TC"                   "TC" 
##        Anolis_allisoni        Anolis_porcatus        Anolis_loysiana 
##                   "TC"                   "TC"                   "Tr" 
##         Anolis_guazuma        Anolis_placidus        Anolis_sheplani 
##                   "Tw"                   "Tw"                   "Tw" 
##         Anolis_alayoni     Anolis_angusticeps        Anolis_paternus 
##                   "Tw"                   "Tw"                   "Tw" 
##       Anolis_alutaceus    Anolis_inexpectatus       Anolis_clivicola 
##                   "GB"                   "GB"                   "GB" 
##    Anolis_cupeyalensis    Anolis_cyanopleurus         Anolis_alfaroi 
##                   "GB"                   "GB"                   "GB" 
##      Anolis_macilentus       Anolis_vanidicus        Anolis_baracoae 
##                   "GB"                   "GB"                   "CG" 
##          Anolis_noblei      Anolis_smallwoodi    Anolis_luteogularis 
##                   "CG"                   "CG"                   "CG" 
##       Anolis_equestris   Anolis_bahorucoensis Anolis_dolichocephalus 
##                   "CG"                   "GB"                   "GB" 
##      Anolis_hendersoni     Anolis_darlingtoni        Anolis_aliniger 
##                   "GB"                   "Tw"                   "TC" 
##      Anolis_singularis    Anolis_chlorocyanus     Anolis_coelestinus 
##                   "TC"                   "TC"                   "TC" 
##        Anolis_occultus 
##                   "Tw"

In visual format, these are the data that we have:

plotTree(tree,type="fan",fsize=0.8,ftype="i")
## setEnv=TRUE for this type is experimental. please be patient with bugs
cols<-setNames(palette()[1:length(unique(x))],sort(unique(x)))
tiplabels(pie=to.matrix(x,sort(unique(x))),piecol=cols,cex=0.3)
add.simmap.legend(colors=cols,prompt=FALSE,x=0.9*par()$usr[1],
    y=-max(nodeHeights(tree)),fsize=0.8)

plot of chunk unnamed-chunk-13

Next, let's fit a single-rate model & reconstruct ancestral states at internal nodes in the tree.

## estimate ancestral states under a ER model
fitER<-ace(x,tree,model="ER",type="discrete")
fitER
## 
##     Ancestral Character Estimation
## 
## Call: ace(x = x, phy = tree, type = "discrete", model = "ER")
## 
##     Log-likelihood: -78.04604 
## 
## Rate index matrix:
##    CG GB TC TG Tr Tw
## CG  .  1  1  1  1  1
## GB  1  .  1  1  1  1
## TC  1  1  .  1  1  1
## TG  1  1  1  .  1  1
## Tr  1  1  1  1  .  1
## Tw  1  1  1  1  1  .
## 
## Parameter estimates:
##  rate index estimate std-err
##           1   0.0231   0.004
## 
## Scaled likelihoods at the root (type '...$lik.anc' to get them for all nodes):
##          CG          GB          TC          TG          Tr          Tw 
## 0.018197565 0.202238628 0.042841575 0.428609607 0.004383532 0.303729093
round(fitER$lik.anc,3)
##          CG    GB    TC    TG    Tr    Tw
##  [1,] 0.018 0.202 0.043 0.429 0.004 0.304
##  [2,] 0.008 0.197 0.020 0.501 0.001 0.272
##  [3,] 0.003 0.134 0.020 0.716 0.005 0.122
##  [4,] 0.002 0.042 0.019 0.858 0.002 0.077
##  [5,] 0.000 0.001 0.001 0.995 0.000 0.002
##  [6,] 0.000 0.000 0.000 0.998 0.000 0.001
##  [7,] 0.000 0.000 0.000 1.000 0.000 0.000
##  [8,] 0.000 0.000 0.000 1.000 0.000 0.000
##  [9,] 0.000 0.000 0.000 1.000 0.000 0.000
## [10,] 0.000 0.000 0.000 1.000 0.000 0.000
## [11,] 0.000 0.003 0.000 0.997 0.000 0.000
## [12,] 0.000 0.000 0.000 1.000 0.000 0.000
## [13,] 0.000 0.000 0.000 1.000 0.000 0.000
## [14,] 0.000 0.000 0.000 1.000 0.000 0.000
## [15,] 0.000 0.000 0.000 1.000 0.000 0.000
## [16,] 0.000 0.000 0.000 1.000 0.000 0.000
## [17,] 0.007 0.019 0.084 0.786 0.005 0.099
## [18,] 0.020 0.018 0.314 0.428 0.011 0.210
## [19,] 0.030 0.001 0.945 0.014 0.001 0.007
## [20,] 0.035 0.001 0.945 0.012 0.001 0.006
## [21,] 0.004 0.148 0.036 0.692 0.071 0.049
## [22,] 0.003 0.179 0.057 0.721 0.024 0.017
## [23,] 0.001 0.004 0.979 0.013 0.001 0.001
## [24,] 0.001 0.214 0.010 0.767 0.005 0.003
## [25,] 0.000 0.353 0.003 0.640 0.002 0.001
## [26,] 0.001 0.921 0.002 0.073 0.002 0.002
## [27,] 0.001 0.356 0.003 0.637 0.002 0.001
## [28,] 0.001 0.046 0.003 0.947 0.002 0.001
## [29,] 0.001 0.003 0.001 0.013 0.980 0.002
## [30,] 0.000 0.000 0.000 0.000 1.000 0.000
## [31,] 0.000 0.000 0.000 0.000 1.000 0.000
## [32,] 0.000 0.000 0.000 0.000 1.000 0.000
## [33,] 0.009 0.203 0.018 0.492 0.001 0.276
## [34,] 0.022 0.213 0.013 0.495 0.002 0.255
## [35,] 0.020 0.359 0.014 0.311 0.008 0.287
## [36,] 0.003 0.954 0.002 0.020 0.002 0.019
## [37,] 0.000 0.997 0.000 0.001 0.000 0.001
## [38,] 0.092 0.141 0.012 0.582 0.005 0.168
## [39,] 0.003 0.005 0.001 0.985 0.001 0.005
## [40,] 0.000 0.000 0.000 0.999 0.000 0.000
## [41,] 0.000 0.000 0.000 1.000 0.000 0.000
## [42,] 0.000 0.000 0.000 1.000 0.000 0.000
## [43,] 0.000 0.000 0.000 1.000 0.000 0.000
## [44,] 0.000 0.000 0.000 1.000 0.000 0.000
## [45,] 0.000 0.000 0.000 1.000 0.000 0.000
## [46,] 0.000 0.000 0.000 1.000 0.000 0.000
## [47,] 0.911 0.015 0.005 0.049 0.004 0.017
## [48,] 1.000 0.000 0.000 0.000 0.000 0.000
## [49,] 1.000 0.000 0.000 0.000 0.000 0.000
## [50,] 0.006 0.291 0.042 0.136 0.006 0.519
## [51,] 0.002 0.055 0.071 0.026 0.006 0.840
## [52,] 0.002 0.039 0.118 0.019 0.009 0.812
## [53,] 0.004 0.016 0.653 0.010 0.047 0.270
## [54,] 0.001 0.004 0.927 0.002 0.010 0.056
## [55,] 0.000 0.000 0.999 0.000 0.000 0.001
## [56,] 0.000 0.000 1.000 0.000 0.000 0.000
## [57,] 0.000 0.000 0.995 0.000 0.001 0.003
## [58,] 0.000 0.005 0.006 0.002 0.001 0.985
## [59,] 0.000 0.000 0.000 0.000 0.000 1.000
## [60,] 0.000 0.000 0.000 0.000 0.000 0.999
## [61,] 0.000 0.000 0.000 0.000 0.000 1.000
## [62,] 0.000 0.988 0.001 0.002 0.000 0.008
## [63,] 0.000 1.000 0.000 0.000 0.000 0.000
## [64,] 0.000 1.000 0.000 0.000 0.000 0.000
## [65,] 0.000 1.000 0.000 0.000 0.000 0.000
## [66,] 0.000 1.000 0.000 0.000 0.000 0.000
## [67,] 0.000 1.000 0.000 0.000 0.000 0.000
## [68,] 0.000 1.000 0.000 0.000 0.000 0.000
## [69,] 0.039 0.212 0.096 0.264 0.007 0.381
## [70,] 0.079 0.253 0.212 0.118 0.008 0.329
## [71,] 1.000 0.000 0.000 0.000 0.000 0.000
## [72,] 1.000 0.000 0.000 0.000 0.000 0.000
## [73,] 1.000 0.000 0.000 0.000 0.000 0.000
## [74,] 1.000 0.000 0.000 0.000 0.000 0.000
## [75,] 0.048 0.280 0.278 0.071 0.006 0.317
## [76,] 0.043 0.322 0.238 0.062 0.007 0.328
## [77,] 0.003 0.959 0.013 0.004 0.002 0.018
## [78,] 0.000 0.999 0.000 0.000 0.000 0.000
## [79,] 0.007 0.034 0.907 0.010 0.002 0.039
## [80,] 0.000 0.001 0.997 0.000 0.000 0.001
## [81,] 0.000 0.000 1.000 0.000 0.000 0.000

The element lik.anc gives us the marginal ancestral states, also known as the 'empirical Bayesian posterior probabilities.'

It is fairly straightforward to overlay these posterior probabilities on the tree:

plotTree(tree,type="fan",fsize=0.8,ftype="i")
## setEnv=TRUE for this type is experimental. please be patient with bugs
nodelabels(node=1:tree$Nnode+Ntip(tree),
    pie=fitER$lik.anc,piecol=cols,cex=0.5)
tiplabels(pie=to.matrix(x,sort(unique(x))),piecol=cols,cex=0.3)
add.simmap.legend(colors=cols,prompt=FALSE,x=0.9*par()$usr[1],
    y=-max(nodeHeights(tree)),fsize=0.8)

plot of chunk unnamed-chunk-15

An alternative approach to the one outline above is to use an MCMC approach to sample character histories from their posterior probability distribution. This is called stochastic character mapping (Huelsenbeck et al. 2003). The model is the same but in this case we get a sample of unambiguous histories for our discrete character's evolution on the tree - rather than a probability distribution for the character at nodes.

For instance, given the data simulated above - we can generate the stochastic character map as follows:

## simulate single stochastic character map using empirical Bayes method
mtree<-make.simmap(tree,x,model="ER")
## make.simmap is sampling character histories conditioned on the transition matrix
## Q =
##             CG          GB          TC          TG          Tr          Tw
## CG -0.11570723  0.02314145  0.02314145  0.02314145  0.02314145  0.02314145
## GB  0.02314145 -0.11570723  0.02314145  0.02314145  0.02314145  0.02314145
## TC  0.02314145  0.02314145 -0.11570723  0.02314145  0.02314145  0.02314145
## TG  0.02314145  0.02314145  0.02314145 -0.11570723  0.02314145  0.02314145
## Tr  0.02314145  0.02314145  0.02314145  0.02314145 -0.11570723  0.02314145
## Tw  0.02314145  0.02314145  0.02314145  0.02314145  0.02314145 -0.11570723
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##        CG        GB        TC        TG        Tr        Tw 
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## Done.
mtree
## 
## Phylogenetic tree with 82 tips and 81 internal nodes.
## 
## Tip labels:
##  Anolis_ahli, Anolis_allogus, Anolis_rubribarbus, Anolis_imias, Anolis_sagrei, Anolis_bremeri, ...
## 
## The tree includes a mapped, 6-state discrete character with states:
##  CG, GB, TC, TG, Tr, Tw
## 
## Rooted; includes branch lengths.
plot(mtree,cols,type="fan",fsize=0.8,ftype="i")
## setEnv=TRUE for this type is experimental. please be patient with bugs
add.simmap.legend(colors=cols,prompt=FALSE,x=0.9*par()$usr[1],
    y=-max(nodeHeights(tree)),fsize=0.8)

plot of chunk unnamed-chunk-16

A single stochastic character map does not mean a whole lot in isolation - we need to look at the whole distribution from a sample of stochastic maps. This can be a bit overwhelming. For instance, the following code generates 100 stochastic character maps from our dataset and plots them in a grid:

mtrees<-make.simmap(tree,x,model="ER",nsim=100)
## make.simmap is sampling character histories conditioned on the transition matrix
## Q =
##             CG          GB          TC          TG          Tr          Tw
## CG -0.11570723  0.02314145  0.02314145  0.02314145  0.02314145  0.02314145
## GB  0.02314145 -0.11570723  0.02314145  0.02314145  0.02314145  0.02314145
## TC  0.02314145  0.02314145 -0.11570723  0.02314145  0.02314145  0.02314145
## TG  0.02314145  0.02314145  0.02314145 -0.11570723  0.02314145  0.02314145
## Tr  0.02314145  0.02314145  0.02314145  0.02314145 -0.11570723  0.02314145
## Tw  0.02314145  0.02314145  0.02314145  0.02314145  0.02314145 -0.11570723
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##        CG        GB        TC        TG        Tr        Tw 
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## Done.
mtrees
## 100 phylogenetic trees with mapped discrete characters
par(mfrow=c(10,10))
null<-sapply(mtrees,plot,colors=cols,lwd=1,ftype="off")

plot of chunk unnamed-chunk-17

It's possible to summarize a set of stochastic maps in a much more meaningful way. For instance, we can estimate the number of changes of each type, the proportion of time spent in each state, and the posterior probabilities that each internal node is in each state, under our model. For example:

pd<-summary(mtrees,plot=FALSE)
pd
## 100 trees with a mapped discrete character with states:
##  CG, GB, TC, TG, Tr, Tw 
## 
## trees have 24.07 changes between states on average
## 
## changes are of the following types:
##      CG,GB CG,TC CG,TG CG,Tr CG,Tw GB,CG GB,TC GB,TG GB,Tr GB,Tw TC,CG
## x->y  0.27  0.25  0.22  0.09  0.23  0.56  0.97  1.26  0.41  1.52  1.41
##      TC,GB TC,TG TC,Tr TC,Tw TG,CG TG,GB TG,TC TG,Tr TG,Tw Tr,CG Tr,GB
## x->y  0.64  0.32   0.8  1.13  0.97  3.17  1.91  0.93  1.81   0.2  0.17
##      Tr,TC Tr,TG Tr,Tw Tw,CG Tw,GB Tw,TC Tw,TG Tw,Tr
## x->y  0.15  0.14  0.13  0.56  1.31  1.43   0.7  0.41
## 
## mean total time spent in each state is:
##               CG        GB         TC         TG          Tr         Tw
## raw  12.90191857 47.001692 33.6593396 69.1939136 12.53922339 30.3708734
## prop  0.06273209  0.228533  0.1636594  0.3364367  0.06096858  0.1476702
##        total
## raw  205.667
## prop   1.000
plot(pd,fsize=0.6,ftype="i")

plot of chunk unnamed-chunk-18

## now let's plot a random map, and overlay the posterior probabilities
plot(mtrees[[1]],cols,type="fan",fsize=0.8,ftype="i")
## setEnv=TRUE for this type is experimental. please be patient with bugs
nodelabels(pie=pd$ace,piecol=cols,cex=0.5)
add.simmap.legend(colors=cols,prompt=FALSE,x=0.9*par()$usr[1],
    y=-max(nodeHeights(tree)),fsize=0.8)

plot of chunk unnamed-chunk-19

Finally, since we obtained these inferences under exactly the same model, let's compare the posterior probabilities from stochastic mapping with our marginal ancestral states. In the former case, our probabilities were obtained by sampling from the joint (rather than marginal) probability distribution for the ancestral states.

plot(fitER$lik.anc,pd$ace,xlab="marginal ancestral states",
    ylab="posterior probabilities from stochastic mapping")
lines(c(0,1),c(0,1),lty="dashed",col="red",lwd=2)

plot of chunk unnamed-chunk-20

This tells us that although joint & marginal reconstruction are not the same, the marginal probabilities from joint stochastic sampling and the marginal ancestral states are quite highly correlated - at least in this case study.

Other visualization methods for ancestral states

In this final part of the tutorial I'm going to quickly overview a range of plotting methods for phylogenies & comparative data that are implemented in the phytools package.

Continuous character methods

The first plotting method I'll illustrate is called a 'traitgram' which is a projection of the tree into a space defined by trait values & time since the root. So, for example:

## simulate a tree
tree<-pbtree(n=26,tip.label=LETTERS)
plotTree(tree)

plot of chunk unnamed-chunk-21

## simulate data under Brownian motion
x<-fastBM(tree)
x
##           A           B           C           D           E           F 
## -1.31892549 -0.89247317 -1.50604164 -1.15746924 -1.29421210  0.20115505 
##           G           H           I           J           K           L 
## -0.65600380 -1.52414832  1.95074529  1.94129933  1.72489401  1.55762043 
##           M           N           O           P           Q           R 
##  2.25736463  2.29204689  1.86703510  1.64176548  0.77126487 -0.04813491 
##           S           T           U           V           W           X 
##  0.06931664  0.49297554  1.02772880  1.05365961  2.37696337  2.59637119 
##           Y           Z 
##  1.72738064  1.81238371
phenogram(tree,x,spread.labels=TRUE,spread.cost=c(1,0))

plot of chunk unnamed-chunk-22

We can also plot a traitgram with the uncertainty about ancestral character states visualized using transparent probability density. This is implemented in the phytools function fancyTree. So, for instance:

## plot traitgram with 95% CI
fancyTree(tree,type="phenogram95",x=x,spread.cost=c(1,0))
## Computing density traitgram...

plot of chunk unnamed-chunk-23

Next, we can explore the continuous character mapping function we've seen already in phytools called contMap. Here is a quick demo using the same data:

## plot contMap
obj<-contMap(tree,x)

plot of chunk unnamed-chunk-24

The function contMap returns an object of class "contMap" which we can then more easily replot using, for instance, different parameters:

## plot leftward
plot(obj,direction="leftwards")

plot of chunk unnamed-chunk-25

For large trees, we might want to use a circular or “fan” tree:

tree<-pbtree(n=200,scale=1)
x<-fastBM(tree)
obj<-contMap(tree,x,plot=FALSE)
plot(obj,type="fan",outline=FALSE,ftype="off")

plot of chunk unnamed-chunk-26

Note that the intense aliasing is just a byproduct of plotting to .png format. To get a high quality image we can plot to .pdf or other formats. For instance:

pdf(file="plot-contMap.pdf",width=10,height=10)
plot(obj,type="fan",outline=FALSE)
dev.off()
## png 
##   2

The result can be viewed here.

Finally, a relatively simple new plotting method in phytools is the function plotTree.wBars. That function pretty much does what it sounds like it does:

plotTree.wBars(anole.tree,exp(svl),type="fan",scale=0.002,
    fsize=0.7,tip.labels=TRUE)

plot of chunk unnamed-chunk-28

It is not too difficult to combine this with a contMap plot. For example:

obj<-contMap(anole.tree,exp(svl),plot=FALSE)
plotTree.wBars(obj$tree,exp(svl),method="plotSimmap",
    tip.labels=TRUE,fsize=0.7,colors=obj$cols,type="fan",scale=0.002)
add.color.bar(1.0,obj$cols,title="trait value",lims=obj$lims,prompt=FALSE,
    x=0.9*par()$usr[1],y=0.9*par()$usr[3])

plot of chunk unnamed-chunk-29

We can also do two dimensional visualizations of phylogenies in morphospace. The is called a 'phylomorphospace'. E.g.:

tree<-pbtree(n=26,tip.label=LETTERS)
X<-fastBM(tree,nsim=3) ## simulate 3 characters
colnames(X)<-paste("trait",1:3)
phylomorphospace(tree,X[,c(1,2)],xlab="trait 1",ylab="trait 2")

plot of chunk unnamed-chunk-30

With phytools we can do static or animated version of the same. I'll just do a static version here, but animated is the default:

phylomorphospace3d(tree,X,method="static")

plot of chunk unnamed-chunk-31

Try the animated version on your own.

Finally, it's possible to combine the continuous character mapping and 2D phylomorphospaces using a type of phylogenetic scatterplot. Here's a demo:

fancyTree(tree,type="scattergram",X=X)
## Computing multidimensional phylogenetic scatterplot matrix...

plot of chunk unnamed-chunk-32

Discrete character methods

Stochastic character maps are easy to plot using phytools. For instance:

data(anoletree)
plotSimmap(anoletree,fsize=0.6,ftype="i",ylim=c(-1,Ntip(anoletree)))
## no colors provided. using the following legend:
##        CG        GB        TC        TG        Tr        Tw 
##   "black"     "red"  "green3"    "blue"    "cyan" "magenta"
## we can use any color scheme, but this is the default
cols<-setNames(palette()[1:length(unique(getStates(anoletree,
    "tips")))],sort(unique(getStates(anoletree,"tips"))))
add.simmap.legend(colors=cols,x=0,y=-1,vertical=FALSE,prompt=FALSE)
## mark the changes on the tree
xy<-markChanges(anoletree,plot=FALSE)
points(xy,pch=19)

plot of chunk unnamed-chunk-33

For binary characters, we can also map the posterior density of stochastic maps onto the nodes & branches of a tree using the phytools function densityMap. Here's a demo of this using simulated data.

## first let's simulate data
Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-c(0,1)
tree<-sim.history(tree,Q)
## Done simulation(s).
x<-tree$states
x
##   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R 
## "1" "1" "1" "0" "0" "1" "1" "1" "1" "1" "0" "1" "0" "1" "1" "0" "0" "0" 
##   S   T   U   V   W   X   Y   Z 
## "1" "1" "1" "1" "0" "0" "0" "0"
## stochastic maps
trees<-make.simmap(tree,x,nsim=100)
## make.simmap is sampling character histories conditioned on the transition matrix
## Q =
##            0          1
## 0 -0.9914265  0.9914265
## 1  0.9914265 -0.9914265
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##   0   1 
## 0.5 0.5
## Done.
## make densityMap
obj<-densityMap(trees,lwd=4,outline=TRUE)
## sorry - this might take a while; please be patient

plot of chunk unnamed-chunk-34

Combining discrete & continuous characters

We can, for instance, overlay a discrete character history (or stochastic map) onto a traitgram or phylomorphospace.

## simulate data with a high rate on some branches
x<-sim.rates(tree,setNames(c(1,10),c(0,1)))
## plot traitgram   
phenogram(tree,x,colors=setNames(c("blue","red"),
    c(0,1)),spread.labels=TRUE,spread.cost=c(1,0))

plot of chunk unnamed-chunk-35

## we can do the same with phylomorphospaces
X<-cbind(sim.rates(tree,setNames(c(1,10),c(0,1))),
    sim.rates(tree,setNames(c(1,10),c(0,1))))
phylomorphospace(tree,X,colors=setNames(c("blue","red"),c(0,1)))

plot of chunk unnamed-chunk-35

## it is even possible to overlay a posterior density from stochastic mapping
phylomorphospace(obj$tree,X,colors=obj$cols,
    lwd=3,xlab="x",ylab="y")
add.color.bar(4,obj$cols,title="PP(state=1)",
    prompt=FALSE,x=0.9*par()$usr[1],y=0.9*par()$usr[3])

plot of chunk unnamed-chunk-35

Written by Liam J. Revell. Last updated August 14, 2015