For better or for worse, the estimation of phenotypic trait values for ancestral nodes in the tree continues to be an important goal in phylogenetic comparative biology.
This tutorial focuses on the estimation of ancestral character states for discrete & continuous character, by several methods. Along the way, we'll learn some approaches from my R package, phytools, for visualizing those states on the tree.
Let's start by estimating ancestral states for a a continuous character, in this case overall body size, on a phylogeny of Anolis lizards from the Caribbean.
To estimate the states for a continuously valued character at ancestral nodes, we need find the states that have the maximum probability of having arisen under our model. These will be our maximum likelihood estimates.
Ancestral character esimation is implemented in a variety of different R
functions. The most commonly used is ace
. Here, we start with
the phytools function fastAnc
.
For these exercises we will use the phytools package, as well as dependent packages such as ape. To follow all parts of the exercise you can install the latest version of phytools - not from CRAN, but from the phytools page:
## if phytools is loaded
## detach("package:phytools",unload=TRUE)
install.packages("http://www.phytools.org/nonstatic/phytools_0.4-98.tar.gz",
type="source",repos=NULL)
## Installing package into 'C:/Users/Liam/Documents/R/win-library/3.2'
## (as 'lib' is unspecified)
packageVersion("phytools")
## [1] '0.4.98'
The data files you will need are here: anole.tre svl.csv. If you have a good web connection, it is also always possible to read in your files directly from the web, too.
## load libraries
library(phytools)
## Loading required package: ape
## Loading required package: maps
## read tree from file
anole.tree<-read.tree("anole.tre") ## or
anole.tree<-read.tree("http://www.phytools.org/eqg2015/data/anole.tre")
## plot tree
plotTree(anole.tree,type="fan",ftype="i")
## setEnv=TRUE for this type is experimental. please be patient with bugs
## read data
svl<-read.csv("svl.csv",row.names=1) ## or
svl<-read.csv("http://www.phytools.org/eqg2015/data/svl.csv",
row.names=1)
## change this into a vector
svl<-as.matrix(svl)[,1]
svl
## ahli alayoni alfaroi aliniger
## 4.039125 3.815705 3.526655 4.036557
## allisoni allogus altitudinalis alumina
## 4.375390 4.040138 3.842994 3.588941
## alutaceus angusticeps argenteolus argillaceus
## 3.554891 3.788595 3.971307 3.757869
## armouri bahorucoensis baleatus baracoae
## 4.121684 3.827445 5.053056 5.042780
## barahonae barbatus barbouri bartschi
## 5.076958 5.003946 3.663932 4.280547
## bremeri breslini brevirostris caudalis
## 4.113371 4.051111 3.874155 3.911743
## centralis chamaeleonides chlorocyanus christophei
## 3.697941 5.042349 4.275448 3.884652
## clivicola coelestinus confusus cooki
## 3.758726 4.297965 3.938442 4.091535
## cristatellus cupeyalensis cuvieri cyanopleurus
## 4.189820 3.462014 4.875012 3.630161
## cybotes darlingtoni distichus dolichocephalus
## 4.210982 4.302036 3.928796 3.908550
## equestris etheridgei eugenegrahami evermanni
## 5.113994 3.657991 4.128504 4.165605
## fowleri garmani grahami guafe
## 4.288780 4.769473 4.154274 3.877457
## guamuhaya guazuma gundlachi haetianus
## 5.036953 3.763884 4.188105 4.316542
## hendersoni homolechis imias inexpectatus
## 3.859835 4.032806 4.099687 3.537439
## insolitus isolepis jubar krugi
## 3.800471 3.657088 3.952605 3.886500
## lineatopus longitibialis loysiana lucius
## 4.128612 4.242103 3.701240 4.198915
## luteogularis macilentus marcanoi marron
## 5.101085 3.715765 4.079485 3.831810
## mestrei monticola noblei occultus
## 3.987147 3.770613 5.083473 3.663049
## olssoni opalinus ophiolepis oporinus
## 3.793899 3.838376 3.637962 3.845670
## paternus placidus poncensis porcatus
## 3.802961 3.773967 3.820378 4.258991
## porcus pulchellus pumilis quadriocellifer
## 5.038034 3.799022 3.466860 3.901619
## reconditus ricordii rubribarbus sagrei
## 4.482607 5.013963 4.078469 4.067162
## semilineatus sheplani shrevei singularis
## 3.696631 3.682924 3.983003 4.057997
## smallwoodi strahmi stratulus valencienni
## 5.035096 4.274271 3.869881 4.321524
## vanidicus vermiculatus websteri whitemani
## 3.626206 4.802849 3.916546 4.097479
Now we can estimate ancestral states. We will also compute variances & 95% confidence intervals for each node:
fit<-fastAnc(anole.tree,svl,vars=TRUE,CI=TRUE)
fit
## $ace
## 101 102 103 104 105 106 107 108
## 4.065918 4.045601 4.047993 4.066923 4.036743 4.049514 4.054528 4.045218
## 109 110 111 112 113 114 115 116
## 3.979493 3.952440 3.926814 3.964414 3.995835 3.948034 3.955203 3.977842
## 117 118 119 120 121 122 123 124
## 4.213995 4.240867 4.248450 4.257574 4.239061 4.004120 4.007024 4.015249
## 125 126 127 128 129 130 131 132
## 4.006720 3.992617 3.925848 3.995759 4.049619 3.930793 3.908343 3.901518
## 133 134 135 136 137 138 139 140
## 3.885086 4.040356 4.060970 3.987789 3.951878 3.814014 3.707733 3.926147
## 141 142 143 144 145 146 147 148
## 3.988745 4.167983 4.157021 4.166146 4.146362 4.146132 4.159052 4.113267
## 149 150 151 152 153 154 155 156
## 4.133069 4.222223 4.461376 4.488496 4.437250 4.512071 4.953146 5.033253
## 157 158 159 160 161 162 163 164
## 4.962377 5.009025 5.018389 3.974900 3.880482 3.859268 3.859265 3.871895
## 165 166 167 168 169 170 171 172
## 3.899914 3.807943 3.826619 4.151598 3.760446 3.654324 3.676713 3.825559
## 173 174 175 176 177 178 179 180
## 3.747726 3.813974 3.803077 3.702349 3.566476 3.678236 3.675620 3.662452
## 181 182 183 184 185 186 187 188
## 3.563181 3.645426 4.017460 4.113689 4.229393 4.381609 4.243153 5.041281
## 189 190 191 192 193 194 195 196
## 5.051563 5.051719 5.057591 4.183653 4.151505 4.113279 3.969795 3.905122
## 197 198 199
## 4.191810 4.161419 4.092141
##
## $var
## 101 102 103 104 105 106
## 0.011775189 0.008409641 0.009452076 0.011870028 0.014459672 0.018112713
## 107 108 109 110 111 112
## 0.011685860 0.007268889 0.014487712 0.012814461 0.010852276 0.010048935
## 113 114 115 116 117 118
## 0.006240382 0.009816255 0.008377720 0.006250414 0.015406961 0.015306088
## 119 120 121 122 123 124
## 0.008914737 0.008485896 0.016998370 0.014850114 0.012880197 0.011964472
## 125 126 127 128 129 130
## 0.010525923 0.010528040 0.013248168 0.012789504 0.013568841 0.013971730
## 131 132 133 134 135 136
## 0.010048937 0.009535820 0.008387455 0.008359158 0.010126605 0.012608132
## 137 138 139 140 141 142
## 0.013951936 0.018502968 0.014105896 0.018182598 0.017745869 0.012861889
## 143 144 145 146 147 148
## 0.014828558 0.011891297 0.008732027 0.008604618 0.010009986 0.007660375
## 149 150 151 152 153 154
## 0.006817879 0.012378867 0.015540878 0.014154484 0.012761730 0.011952420
## 155 156 157 158 159 160
## 0.005064274 0.002463170 0.006924118 0.003953467 0.003509809 0.011102798
## 161 162 163 164 165 166
## 0.011378573 0.011045003 0.011682628 0.011560333 0.012242267 0.011462597
## 167 168 169 170 171 172
## 0.008888509 0.014206315 0.014405121 0.006290501 0.005434707 0.014714164
## 173 174 175 176 177 178
## 0.010884130 0.014842361 0.011333759 0.012932194 0.007449222 0.011094079
## 179 180 181 182 183 184
## 0.011015666 0.012850744 0.005326252 0.012876076 0.021615405 0.014608411
## 185 186 187 188 189 190
## 0.013155037 0.021043877 0.012658226 0.004540141 0.003540634 0.002698421
## 191 192 193 194 195 196
## 0.001277775 0.011837709 0.012449062 0.013536624 0.015848988 0.008788712
## 197 198 199
## 0.015789566 0.013529210 0.009534335
##
## $CI95
## [,1] [,2]
## 101 3.853231 4.278604
## 102 3.865861 4.225341
## 103 3.857439 4.238548
## 104 3.853382 4.280465
## 105 3.801056 4.272430
## 106 3.785731 4.313298
## 107 3.842649 4.266406
## 108 3.878112 4.212323
## 109 3.743577 4.215408
## 110 3.730566 4.174314
## 111 3.722632 4.130995
## 112 3.767935 4.160893
## 113 3.841003 4.150668
## 114 3.753844 4.142225
## 115 3.775804 4.134601
## 116 3.822885 4.132799
## 117 3.970710 4.457279
## 118 3.998380 4.483354
## 119 4.063391 4.433509
## 120 4.077021 4.438127
## 121 3.983520 4.494601
## 122 3.765272 4.242968
## 123 3.784582 4.229467
## 124 3.800860 4.229639
## 125 3.805632 4.207808
## 126 3.791509 4.193725
## 127 3.700250 4.151445
## 128 3.774102 4.217417
## 129 3.821307 4.277930
## 130 3.699117 4.162469
## 131 3.711864 4.104822
## 132 3.710121 4.092915
## 133 3.705584 4.064589
## 134 3.861156 4.219555
## 135 3.863733 4.258207
## 136 3.767708 4.207869
## 137 3.720366 4.183390
## 138 3.547403 4.080624
## 139 3.474947 3.940519
## 140 3.661855 4.190439
## 141 3.727646 4.249843
## 142 3.945699 4.390267
## 143 3.918347 4.395695
## 144 3.952414 4.379879
## 145 3.963210 4.329515
## 146 3.964320 4.327943
## 147 3.962955 4.355150
## 148 3.941721 4.284813
## 149 3.971230 4.294907
## 150 4.004152 4.440293
## 151 4.217036 4.705715
## 152 4.255309 4.721682
## 153 4.215833 4.658667
## 154 4.297789 4.726352
## 155 4.813665 5.092627
## 156 4.935977 5.130528
## 157 4.799283 5.125471
## 158 4.885787 5.132263
## 159 4.902272 5.134507
## 160 3.768375 4.181425
## 161 3.671408 4.089556
## 162 3.653282 4.065255
## 163 3.647416 4.071113
## 164 3.661158 4.082632
## 165 3.683050 4.116777
## 166 3.598098 4.017787
## 167 3.641833 4.011406
## 168 3.917985 4.385211
## 169 3.525204 3.995688
## 170 3.498871 3.809777
## 171 3.532221 3.821205
## 172 3.587807 4.063310
## 173 3.543245 3.952207
## 174 3.575189 4.052760
## 175 3.594415 4.011738
## 176 3.479458 3.925240
## 177 3.397311 3.735642
## 178 3.471792 3.884680
## 179 3.469907 3.881332
## 180 3.440264 3.884639
## 181 3.420138 3.706225
## 182 3.423020 3.867833
## 183 3.729297 4.305622
## 184 3.876793 4.350585
## 185 4.004590 4.454196
## 186 4.097282 4.665937
## 187 4.022636 4.463670
## 188 4.909215 5.173347
## 189 4.934937 5.168189
## 190 4.949904 5.153533
## 191 4.987529 5.127654
## 192 3.970403 4.396903
## 193 3.932817 4.370192
## 194 3.885239 4.341319
## 195 3.723046 4.216545
## 196 3.721376 4.088869
## 197 3.945523 4.438096
## 198 3.933441 4.389397
## 199 3.900759 4.283523
## as we discussed in class, 95% CIs can be broad
fit$CI[1,]
## [1] 3.853231 4.278604
range(svl)
## [1] 3.462014 5.113994
phytools has several different methods for visualizing reconstructed ancestral states for a continuous trait on the tree. One is a color gradient projection onto the tree:
## projection of the reconstruction onto the edges of the tree
obj<-contMap(anole.tree,svl,plot=FALSE)
plot(obj,type="fan",legend=0.7*max(nodeHeights(anole.tree)),
fsize=c(0.7,0.9))
A second is a projection of the tree into a space defined by time on the horizontal axis, and phenotype on the vertical dimension.
## projection of the tree into phenotype space
phenogram(anole.tree,svl,fsize=0.6,spread.costs=c(1,0))
## Optimizing the positions of the tip labels...
Next, we can explore some of the properties of ancestral state reconstruction of continuous traits in general.
To do this, we will use some simulation functions in the phytools package.
First, let's simulate some data:
## simulate a tree & some data
tree<-pbtree(n=26,scale=1,tip.label=LETTERS)
## simulate with ancestral states
x<-fastBM(tree,internal=TRUE)
## ancestral states
a<-x[as.character(1:tree$Nnode+Ntip(tree))]
## tip data
x<-x[tree$tip.label]
Now, let's estimate ancestral states using fastAnc
which uses
the re-rooting algorithm discussed in class:
fit<-fastAnc(tree,x,CI=TRUE)
fit
## $ace
## 27 28 29 30 31 32
## 0.30696159 0.12063831 0.08144035 -0.01186937 -0.01243673 -0.29561804
## 33 34 35 36 37 38
## 0.23583553 0.24737980 -0.66652606 -0.54690301 -0.72462221 -0.78856261
## 39 40 41 42 43 44
## -1.08249928 -1.09079143 -0.73420513 -0.85876797 -0.42923557 0.03567978
## 45 46 47 48 49 50
## 0.81080416 0.53173049 0.52707331 0.60533272 0.45219307 1.31083890
## 51
## 1.50164751
##
## $CI95
## [,1] [,2]
## 27 -0.82459808 1.438521250
## 28 -0.65030909 0.891585713
## 29 -0.67957462 0.842455325
## 30 -0.71970440 0.695965664
## 31 -0.71934891 0.694475446
## 32 -0.69663413 0.105398045
## 33 -0.19444810 0.666119157
## 34 -0.05226882 0.547028416
## 35 -1.36962520 0.036573067
## 36 -0.97939125 -0.114414768
## 37 -1.11004584 -0.339198588
## 38 -1.16023344 -0.416891786
## 39 -1.35348133 -0.811517238
## 40 -1.31900458 -0.862578280
## 41 -0.99830581 -0.470104458
## 42 -0.90319321 -0.814342726
## 43 -0.86623182 0.007760684
## 44 -0.10021594 0.171575502
## 45 0.16272764 1.458880673
## 46 0.01208649 1.051374499
## 47 0.04963634 1.004510269
## 48 0.42024250 0.790422937
## 49 0.15161200 0.752774142
## 50 0.74997549 1.871702298
## 51 1.00688690 1.996408112
We can easily compare these estimates to the (normally unknown) generating values for simulated data:
plot(a,fit$ace,xlab="true states",ylab="estimated states")
lines(range(c(x,a)),range(c(x,a)),lty="dashed",col="red") ## 1:1 line
One of the most common critiques of ancestral state estimation is that the uncertainty around ancestral states can be large; and, furthermore, that this uncertainty is often ignored. Let's address this by obtaining 95% CIs on ancestral values, and then let's show that the 95% CIs include the generating values around 95% of the time:
## first, let's go back to our previous dataset
print(fit)
## $ace
## 27 28 29 30 31 32
## 0.30696159 0.12063831 0.08144035 -0.01186937 -0.01243673 -0.29561804
## 33 34 35 36 37 38
## 0.23583553 0.24737980 -0.66652606 -0.54690301 -0.72462221 -0.78856261
## 39 40 41 42 43 44
## -1.08249928 -1.09079143 -0.73420513 -0.85876797 -0.42923557 0.03567978
## 45 46 47 48 49 50
## 0.81080416 0.53173049 0.52707331 0.60533272 0.45219307 1.31083890
## 51
## 1.50164751
##
## $CI95
## [,1] [,2]
## 27 -0.82459808 1.438521250
## 28 -0.65030909 0.891585713
## 29 -0.67957462 0.842455325
## 30 -0.71970440 0.695965664
## 31 -0.71934891 0.694475446
## 32 -0.69663413 0.105398045
## 33 -0.19444810 0.666119157
## 34 -0.05226882 0.547028416
## 35 -1.36962520 0.036573067
## 36 -0.97939125 -0.114414768
## 37 -1.11004584 -0.339198588
## 38 -1.16023344 -0.416891786
## 39 -1.35348133 -0.811517238
## 40 -1.31900458 -0.862578280
## 41 -0.99830581 -0.470104458
## 42 -0.90319321 -0.814342726
## 43 -0.86623182 0.007760684
## 44 -0.10021594 0.171575502
## 45 0.16272764 1.458880673
## 46 0.01208649 1.051374499
## 47 0.04963634 1.004510269
## 48 0.42024250 0.790422937
## 49 0.15161200 0.752774142
## 50 0.74997549 1.871702298
## 51 1.00688690 1.996408112
mean(((a>=fit$CI95[,1]) + (a<=fit$CI95[,2]))==2)
## [1] 1
One small tree doesn't tell us much, so let's repeat for a sample of trees & reconstructions:
## custom function that conducts a simulation, estimates ancestral
## states, & returns the fraction on 95% CI
foo<-function(){
tree<-pbtree(n=100)
x<-fastBM(tree,internal=TRUE)
fit<-fastAnc(tree,x[1:length(tree$tip.label)],CI=TRUE)
mean(((x[1:tree$Nnode+length(tree$tip.label)]>=fit$CI95[,1]) +
(x[1:tree$Nnode+length(tree$tip.label)]<=fit$CI95[,2]))==2)
}
## conduct 100 simulations
pp<-replicate(100,foo())
mean(pp)
## [1] 0.949899
Cool. Well, this shows us that although CIs can be large, when the model is correct they will include the generating value (1-α)% of the time.
We can theoretically fix the state for any internal node during MLE of ancestral states. In practice, we would do this by attaching a terminal edge of zero length to the internal node we wanted to fix, and then treat it like another tip value in our analyses.
Another possibility, which also allows for the possibility that ancestral states are uncertain, is to use an informative prior probability density on one or multiple states at internal nodes, and then estimate ancestral states using Bayesian MCMC.
Let's try this using a really bad case for ancestral character estimation from contemporaneous tip data: Brownian evolution with a trend. Note that although we theoretically could do so - we are not fitting a trended model here.
tree<-pbtree(n=100,scale=1)
## simulate data with a trend
x<-fastBM(tree,internal=TRUE,mu=3)
phenogram(tree,x,ftype="off",spread.labels=FALSE)
a<-x[as.character(1:tree$Nnode+Ntip(tree))]
x<-x[tree$tip.label]
## let's see how bad we do if we ignore the trend
plot(a,fastAnc(tree,x),xlab="true values",
ylab="estimated states under BM")
lines(range(c(x,a)),range(c(x,a)),lty="dashed",col="red")
title("estimated without prior information")
## incorporate prior knowledge
pm<-setNames(c(1000,rep(0,tree$Nnode)),
c("sig2",1:tree$Nnode+length(tree$tip.label)))
## the root & two randomly chosen nodes
nn<-as.character(c(length(tree$tip.label)+1,
sample(2:tree$Nnode+length(tree$tip.label),2)))
pm[nn]<-a[as.character(nn)]
## prior variance
pv<-setNames(c(1000^2,rep(1000,length(pm)-1)),names(pm))
pv[as.character(nn)]<-1e-100
## run MCMC
mcmc<-anc.Bayes(tree,x,ngen=100000,
control=list(pr.mean=pm,pr.var=pv,
a=pm[as.character(length(tree$tip.label)+1)],
y=pm[as.character(2:tree$Nnode+length(tree$tip.label))]))
## Control parameters (set by user or default):
## List of 7
## $ sig2 : num 1.1
## $ a : Named num 0
## ..- attr(*, "names")= chr "101"
## $ y : Named num [1:98] 0 0 0 0 0 ...
## ..- attr(*, "names")= chr [1:98] "102" "103" "104" "105" ...
## $ pr.mean: Named num [1:100] 1000 0 0 0 0 ...
## ..- attr(*, "names")= chr [1:100] "sig2" "101" "102" "103" ...
## $ pr.var : Named num [1:100] 1e+06 1e-100 1e+03 1e+03 1e+03 ...
## ..- attr(*, "names")= chr [1:100] "sig2" "101" "102" "103" ...
## $ prop : num [1:100] 0.011 0.011 0.011 0.011 0.011 ...
## $ sample : num 100
## Starting MCMC...
## Done MCMC.
anc.est<-colMeans(mcmc[201:1001,
as.character(1:tree$Nnode+length(tree$tip.label))])
plot(a[names(anc.est)],anc.est,xlab="true values",
ylab="estimated states using informative prior")
lines(range(c(x,a)),range(c(x,a)),lty="dashed",col="red")
title("estimated using informative prior")
This part of the tutorial focuses on the estimation of ancestral character states for discretely valued traits using a continuous-time Markov chain model commonly known as the Mk model.
phytools" has a function for ancestral character estimation
under a couple of different models that uses the re-rooting method of Yang
(1995). For other models of evolutionary change not covered by this function,
users should try Rich Fitzjohn's
diversitree
package, or the ace
function in ape.
OK - let's pull our data from the Anolis ecomorph tree (packaged with phytools) to use in these analyses:
data(anoletree)
## this is just to pull out the tip states from the
## data object - normally we would read this from file
x<-getStates(anoletree,"tips")
tree<-anoletree
rm(anoletree)
tree
##
## Phylogenetic tree with 82 tips and 81 internal nodes.
##
## Tip labels:
## Anolis_ahli, Anolis_allogus, Anolis_rubribarbus, Anolis_imias, Anolis_sagrei, Anolis_bremeri, ...
##
## Rooted; includes branch lengths.
x
## Anolis_ahli Anolis_allogus Anolis_rubribarbus
## "TG" "TG" "TG"
## Anolis_imias Anolis_sagrei Anolis_bremeri
## "TG" "TG" "TG"
## Anolis_quadriocellifer Anolis_ophiolepis Anolis_mestrei
## "TG" "GB" "TG"
## Anolis_jubar Anolis_homolechis Anolis_confusus
## "TG" "TG" "TG"
## Anolis_guafe Anolis_garmani Anolis_opalinus
## "TG" "CG" "TC"
## Anolis_grahami Anolis_valencienni Anolis_lineatopus
## "TC" "Tw" "TG"
## Anolis_evermanni Anolis_stratulus Anolis_krugi
## "TC" "TC" "GB"
## Anolis_pulchellus Anolis_gundlachi Anolis_poncensis
## "GB" "TG" "GB"
## Anolis_cooki Anolis_cristatellus Anolis_brevirostris
## "TG" "TG" "Tr"
## Anolis_caudalis Anolis_marron Anolis_websteri
## "Tr" "Tr" "Tr"
## Anolis_distichus Anolis_alumina Anolis_semilineatus
## "Tr" "GB" "GB"
## Anolis_olssoni Anolis_insolitus Anolis_whitemani
## "GB" "Tw" "TG"
## Anolis_haetianus Anolis_breslini Anolis_armouri
## "TG" "TG" "TG"
## Anolis_cybotes Anolis_shrevei Anolis_longitibialis
## "TG" "TG" "TG"
## Anolis_strahmi Anolis_marcanoi Anolis_baleatus
## "TG" "TG" "CG"
## Anolis_barahonae Anolis_ricordii Anolis_cuvieri
## "CG" "CG" "CG"
## Anolis_altitudinalis Anolis_oporinus Anolis_isolepis
## "TC" "TC" "TC"
## Anolis_allisoni Anolis_porcatus Anolis_loysiana
## "TC" "TC" "Tr"
## Anolis_guazuma Anolis_placidus Anolis_sheplani
## "Tw" "Tw" "Tw"
## Anolis_alayoni Anolis_angusticeps Anolis_paternus
## "Tw" "Tw" "Tw"
## Anolis_alutaceus Anolis_inexpectatus Anolis_clivicola
## "GB" "GB" "GB"
## Anolis_cupeyalensis Anolis_cyanopleurus Anolis_alfaroi
## "GB" "GB" "GB"
## Anolis_macilentus Anolis_vanidicus Anolis_baracoae
## "GB" "GB" "CG"
## Anolis_noblei Anolis_smallwoodi Anolis_luteogularis
## "CG" "CG" "CG"
## Anolis_equestris Anolis_bahorucoensis Anolis_dolichocephalus
## "CG" "GB" "GB"
## Anolis_hendersoni Anolis_darlingtoni Anolis_aliniger
## "GB" "Tw" "TC"
## Anolis_singularis Anolis_chlorocyanus Anolis_coelestinus
## "TC" "TC" "TC"
## Anolis_occultus
## "Tw"
In visual format, these are the data that we have:
plotTree(tree,type="fan",fsize=0.8,ftype="i")
## setEnv=TRUE for this type is experimental. please be patient with bugs
cols<-setNames(palette()[1:length(unique(x))],sort(unique(x)))
tiplabels(pie=to.matrix(x,sort(unique(x))),piecol=cols,cex=0.3)
add.simmap.legend(colors=cols,prompt=FALSE,x=0.9*par()$usr[1],
y=-max(nodeHeights(tree)),fsize=0.8)
Next, let's fit a single-rate model & reconstruct ancestral states at internal nodes in the tree.
## estimate ancestral states under a ER model
fitER<-ace(x,tree,model="ER",type="discrete")
fitER
##
## Ancestral Character Estimation
##
## Call: ace(x = x, phy = tree, type = "discrete", model = "ER")
##
## Log-likelihood: -78.04604
##
## Rate index matrix:
## CG GB TC TG Tr Tw
## CG . 1 1 1 1 1
## GB 1 . 1 1 1 1
## TC 1 1 . 1 1 1
## TG 1 1 1 . 1 1
## Tr 1 1 1 1 . 1
## Tw 1 1 1 1 1 .
##
## Parameter estimates:
## rate index estimate std-err
## 1 0.0231 0.004
##
## Scaled likelihoods at the root (type '...$lik.anc' to get them for all nodes):
## CG GB TC TG Tr Tw
## 0.018197565 0.202238628 0.042841575 0.428609607 0.004383532 0.303729093
round(fitER$lik.anc,3)
## CG GB TC TG Tr Tw
## [1,] 0.018 0.202 0.043 0.429 0.004 0.304
## [2,] 0.008 0.197 0.020 0.501 0.001 0.272
## [3,] 0.003 0.134 0.020 0.716 0.005 0.122
## [4,] 0.002 0.042 0.019 0.858 0.002 0.077
## [5,] 0.000 0.001 0.001 0.995 0.000 0.002
## [6,] 0.000 0.000 0.000 0.998 0.000 0.001
## [7,] 0.000 0.000 0.000 1.000 0.000 0.000
## [8,] 0.000 0.000 0.000 1.000 0.000 0.000
## [9,] 0.000 0.000 0.000 1.000 0.000 0.000
## [10,] 0.000 0.000 0.000 1.000 0.000 0.000
## [11,] 0.000 0.003 0.000 0.997 0.000 0.000
## [12,] 0.000 0.000 0.000 1.000 0.000 0.000
## [13,] 0.000 0.000 0.000 1.000 0.000 0.000
## [14,] 0.000 0.000 0.000 1.000 0.000 0.000
## [15,] 0.000 0.000 0.000 1.000 0.000 0.000
## [16,] 0.000 0.000 0.000 1.000 0.000 0.000
## [17,] 0.007 0.019 0.084 0.786 0.005 0.099
## [18,] 0.020 0.018 0.314 0.428 0.011 0.210
## [19,] 0.030 0.001 0.945 0.014 0.001 0.007
## [20,] 0.035 0.001 0.945 0.012 0.001 0.006
## [21,] 0.004 0.148 0.036 0.692 0.071 0.049
## [22,] 0.003 0.179 0.057 0.721 0.024 0.017
## [23,] 0.001 0.004 0.979 0.013 0.001 0.001
## [24,] 0.001 0.214 0.010 0.767 0.005 0.003
## [25,] 0.000 0.353 0.003 0.640 0.002 0.001
## [26,] 0.001 0.921 0.002 0.073 0.002 0.002
## [27,] 0.001 0.356 0.003 0.637 0.002 0.001
## [28,] 0.001 0.046 0.003 0.947 0.002 0.001
## [29,] 0.001 0.003 0.001 0.013 0.980 0.002
## [30,] 0.000 0.000 0.000 0.000 1.000 0.000
## [31,] 0.000 0.000 0.000 0.000 1.000 0.000
## [32,] 0.000 0.000 0.000 0.000 1.000 0.000
## [33,] 0.009 0.203 0.018 0.492 0.001 0.276
## [34,] 0.022 0.213 0.013 0.495 0.002 0.255
## [35,] 0.020 0.359 0.014 0.311 0.008 0.287
## [36,] 0.003 0.954 0.002 0.020 0.002 0.019
## [37,] 0.000 0.997 0.000 0.001 0.000 0.001
## [38,] 0.092 0.141 0.012 0.582 0.005 0.168
## [39,] 0.003 0.005 0.001 0.985 0.001 0.005
## [40,] 0.000 0.000 0.000 0.999 0.000 0.000
## [41,] 0.000 0.000 0.000 1.000 0.000 0.000
## [42,] 0.000 0.000 0.000 1.000 0.000 0.000
## [43,] 0.000 0.000 0.000 1.000 0.000 0.000
## [44,] 0.000 0.000 0.000 1.000 0.000 0.000
## [45,] 0.000 0.000 0.000 1.000 0.000 0.000
## [46,] 0.000 0.000 0.000 1.000 0.000 0.000
## [47,] 0.911 0.015 0.005 0.049 0.004 0.017
## [48,] 1.000 0.000 0.000 0.000 0.000 0.000
## [49,] 1.000 0.000 0.000 0.000 0.000 0.000
## [50,] 0.006 0.291 0.042 0.136 0.006 0.519
## [51,] 0.002 0.055 0.071 0.026 0.006 0.840
## [52,] 0.002 0.039 0.118 0.019 0.009 0.812
## [53,] 0.004 0.016 0.653 0.010 0.047 0.270
## [54,] 0.001 0.004 0.927 0.002 0.010 0.056
## [55,] 0.000 0.000 0.999 0.000 0.000 0.001
## [56,] 0.000 0.000 1.000 0.000 0.000 0.000
## [57,] 0.000 0.000 0.995 0.000 0.001 0.003
## [58,] 0.000 0.005 0.006 0.002 0.001 0.985
## [59,] 0.000 0.000 0.000 0.000 0.000 1.000
## [60,] 0.000 0.000 0.000 0.000 0.000 0.999
## [61,] 0.000 0.000 0.000 0.000 0.000 1.000
## [62,] 0.000 0.988 0.001 0.002 0.000 0.008
## [63,] 0.000 1.000 0.000 0.000 0.000 0.000
## [64,] 0.000 1.000 0.000 0.000 0.000 0.000
## [65,] 0.000 1.000 0.000 0.000 0.000 0.000
## [66,] 0.000 1.000 0.000 0.000 0.000 0.000
## [67,] 0.000 1.000 0.000 0.000 0.000 0.000
## [68,] 0.000 1.000 0.000 0.000 0.000 0.000
## [69,] 0.039 0.212 0.096 0.264 0.007 0.381
## [70,] 0.079 0.253 0.212 0.118 0.008 0.329
## [71,] 1.000 0.000 0.000 0.000 0.000 0.000
## [72,] 1.000 0.000 0.000 0.000 0.000 0.000
## [73,] 1.000 0.000 0.000 0.000 0.000 0.000
## [74,] 1.000 0.000 0.000 0.000 0.000 0.000
## [75,] 0.048 0.280 0.278 0.071 0.006 0.317
## [76,] 0.043 0.322 0.238 0.062 0.007 0.328
## [77,] 0.003 0.959 0.013 0.004 0.002 0.018
## [78,] 0.000 0.999 0.000 0.000 0.000 0.000
## [79,] 0.007 0.034 0.907 0.010 0.002 0.039
## [80,] 0.000 0.001 0.997 0.000 0.000 0.001
## [81,] 0.000 0.000 1.000 0.000 0.000 0.000
The element lik.anc
gives us the marginal ancestral
states, also known as the 'empirical Bayesian posterior probabilities.'
It is fairly straightforward to overlay these posterior probabilities on the tree:
plotTree(tree,type="fan",fsize=0.8,ftype="i")
## setEnv=TRUE for this type is experimental. please be patient with bugs
nodelabels(node=1:tree$Nnode+Ntip(tree),
pie=fitER$lik.anc,piecol=cols,cex=0.5)
tiplabels(pie=to.matrix(x,sort(unique(x))),piecol=cols,cex=0.3)
add.simmap.legend(colors=cols,prompt=FALSE,x=0.9*par()$usr[1],
y=-max(nodeHeights(tree)),fsize=0.8)
An alternative approach to the one outline above is to use an MCMC approach to sample character histories from their posterior probability distribution. This is called stochastic character mapping (Huelsenbeck et al. 2003). The model is the same but in this case we get a sample of unambiguous histories for our discrete character's evolution on the tree - rather than a probability distribution for the character at nodes.
For instance, given the data simulated above - we can generate the stochastic character map as follows:
## simulate single stochastic character map using empirical Bayes method
mtree<-make.simmap(tree,x,model="ER")
## make.simmap is sampling character histories conditioned on the transition matrix
## Q =
## CG GB TC TG Tr Tw
## CG -0.11570723 0.02314145 0.02314145 0.02314145 0.02314145 0.02314145
## GB 0.02314145 -0.11570723 0.02314145 0.02314145 0.02314145 0.02314145
## TC 0.02314145 0.02314145 -0.11570723 0.02314145 0.02314145 0.02314145
## TG 0.02314145 0.02314145 0.02314145 -0.11570723 0.02314145 0.02314145
## Tr 0.02314145 0.02314145 0.02314145 0.02314145 -0.11570723 0.02314145
## Tw 0.02314145 0.02314145 0.02314145 0.02314145 0.02314145 -0.11570723
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
## CG GB TC TG Tr Tw
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## Done.
mtree
##
## Phylogenetic tree with 82 tips and 81 internal nodes.
##
## Tip labels:
## Anolis_ahli, Anolis_allogus, Anolis_rubribarbus, Anolis_imias, Anolis_sagrei, Anolis_bremeri, ...
##
## The tree includes a mapped, 6-state discrete character with states:
## CG, GB, TC, TG, Tr, Tw
##
## Rooted; includes branch lengths.
plot(mtree,cols,type="fan",fsize=0.8,ftype="i")
## setEnv=TRUE for this type is experimental. please be patient with bugs
add.simmap.legend(colors=cols,prompt=FALSE,x=0.9*par()$usr[1],
y=-max(nodeHeights(tree)),fsize=0.8)
A single stochastic character map does not mean a whole lot in isolation - we need to look at the whole distribution from a sample of stochastic maps. This can be a bit overwhelming. For instance, the following code generates 100 stochastic character maps from our dataset and plots them in a grid:
mtrees<-make.simmap(tree,x,model="ER",nsim=100)
## make.simmap is sampling character histories conditioned on the transition matrix
## Q =
## CG GB TC TG Tr Tw
## CG -0.11570723 0.02314145 0.02314145 0.02314145 0.02314145 0.02314145
## GB 0.02314145 -0.11570723 0.02314145 0.02314145 0.02314145 0.02314145
## TC 0.02314145 0.02314145 -0.11570723 0.02314145 0.02314145 0.02314145
## TG 0.02314145 0.02314145 0.02314145 -0.11570723 0.02314145 0.02314145
## Tr 0.02314145 0.02314145 0.02314145 0.02314145 -0.11570723 0.02314145
## Tw 0.02314145 0.02314145 0.02314145 0.02314145 0.02314145 -0.11570723
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
## CG GB TC TG Tr Tw
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## Done.
mtrees
## 100 phylogenetic trees with mapped discrete characters
par(mfrow=c(10,10))
null<-sapply(mtrees,plot,colors=cols,lwd=1,ftype="off")
It's possible to summarize a set of stochastic maps in a much more meaningful way. For instance, we can estimate the number of changes of each type, the proportion of time spent in each state, and the posterior probabilities that each internal node is in each state, under our model. For example:
pd<-summary(mtrees,plot=FALSE)
pd
## 100 trees with a mapped discrete character with states:
## CG, GB, TC, TG, Tr, Tw
##
## trees have 24.07 changes between states on average
##
## changes are of the following types:
## CG,GB CG,TC CG,TG CG,Tr CG,Tw GB,CG GB,TC GB,TG GB,Tr GB,Tw TC,CG
## x->y 0.27 0.25 0.22 0.09 0.23 0.56 0.97 1.26 0.41 1.52 1.41
## TC,GB TC,TG TC,Tr TC,Tw TG,CG TG,GB TG,TC TG,Tr TG,Tw Tr,CG Tr,GB
## x->y 0.64 0.32 0.8 1.13 0.97 3.17 1.91 0.93 1.81 0.2 0.17
## Tr,TC Tr,TG Tr,Tw Tw,CG Tw,GB Tw,TC Tw,TG Tw,Tr
## x->y 0.15 0.14 0.13 0.56 1.31 1.43 0.7 0.41
##
## mean total time spent in each state is:
## CG GB TC TG Tr Tw
## raw 12.90191857 47.001692 33.6593396 69.1939136 12.53922339 30.3708734
## prop 0.06273209 0.228533 0.1636594 0.3364367 0.06096858 0.1476702
## total
## raw 205.667
## prop 1.000
plot(pd,fsize=0.6,ftype="i")
## now let's plot a random map, and overlay the posterior probabilities
plot(mtrees[[1]],cols,type="fan",fsize=0.8,ftype="i")
## setEnv=TRUE for this type is experimental. please be patient with bugs
nodelabels(pie=pd$ace,piecol=cols,cex=0.5)
add.simmap.legend(colors=cols,prompt=FALSE,x=0.9*par()$usr[1],
y=-max(nodeHeights(tree)),fsize=0.8)
Finally, since we obtained these inferences under exactly the same model, let's compare the posterior probabilities from stochastic mapping with our marginal ancestral states. In the former case, our probabilities were obtained by sampling from the joint (rather than marginal) probability distribution for the ancestral states.
plot(fitER$lik.anc,pd$ace,xlab="marginal ancestral states",
ylab="posterior probabilities from stochastic mapping")
lines(c(0,1),c(0,1),lty="dashed",col="red",lwd=2)
This tells us that although joint & marginal reconstruction are not the same, the marginal probabilities from joint stochastic sampling and the marginal ancestral states are quite highly correlated - at least in this case study.
In this final part of the tutorial I'm going to quickly overview a range of plotting methods for phylogenies & comparative data that are implemented in the phytools package.
The first plotting method I'll illustrate is called a 'traitgram' which is a projection of the tree into a space defined by trait values & time since the root. So, for example:
## simulate a tree
tree<-pbtree(n=26,tip.label=LETTERS)
plotTree(tree)
## simulate data under Brownian motion
x<-fastBM(tree)
x
## A B C D E F
## -1.31892549 -0.89247317 -1.50604164 -1.15746924 -1.29421210 0.20115505
## G H I J K L
## -0.65600380 -1.52414832 1.95074529 1.94129933 1.72489401 1.55762043
## M N O P Q R
## 2.25736463 2.29204689 1.86703510 1.64176548 0.77126487 -0.04813491
## S T U V W X
## 0.06931664 0.49297554 1.02772880 1.05365961 2.37696337 2.59637119
## Y Z
## 1.72738064 1.81238371
phenogram(tree,x,spread.labels=TRUE,spread.cost=c(1,0))
We can also plot a traitgram with the uncertainty about
ancestral character states visualized using transparent probability
density. This is implemented in the phytools function
fancyTree
. So, for instance:
## plot traitgram with 95% CI
fancyTree(tree,type="phenogram95",x=x,spread.cost=c(1,0))
## Computing density traitgram...
Next, we can explore the continuous character mapping function we've seen
already in phytools called contMap
. Here is a quick demo
using the same data:
## plot contMap
obj<-contMap(tree,x)
The function contMap
returns an object of class "contMap"
which we can then more easily replot using, for instance, different parameters:
## plot leftward
plot(obj,direction="leftwards")
For large trees, we might want to use a circular or “fan” tree:
tree<-pbtree(n=200,scale=1)
x<-fastBM(tree)
obj<-contMap(tree,x,plot=FALSE)
plot(obj,type="fan",outline=FALSE,ftype="off")
Note that the intense aliasing is just a byproduct of plotting to .png format. To get a high quality image we can plot to .pdf or other formats. For instance:
pdf(file="plot-contMap.pdf",width=10,height=10)
plot(obj,type="fan",outline=FALSE)
dev.off()
## png
## 2
The result can be viewed here.
Finally, a relatively simple new plotting method in phytools is
the function plotTree.wBars
. That function pretty much does what it
sounds like it does:
plotTree.wBars(anole.tree,exp(svl),type="fan",scale=0.002,
fsize=0.7,tip.labels=TRUE)
It is not too difficult to combine this with a contMap
plot. For
example:
obj<-contMap(anole.tree,exp(svl),plot=FALSE)
plotTree.wBars(obj$tree,exp(svl),method="plotSimmap",
tip.labels=TRUE,fsize=0.7,colors=obj$cols,type="fan",scale=0.002)
add.color.bar(1.0,obj$cols,title="trait value",lims=obj$lims,prompt=FALSE,
x=0.9*par()$usr[1],y=0.9*par()$usr[3])
We can also do two dimensional visualizations of phylogenies in morphospace. The is called a 'phylomorphospace'. E.g.:
tree<-pbtree(n=26,tip.label=LETTERS)
X<-fastBM(tree,nsim=3) ## simulate 3 characters
colnames(X)<-paste("trait",1:3)
phylomorphospace(tree,X[,c(1,2)],xlab="trait 1",ylab="trait 2")
With phytools we can do static or animated version of the same. I'll just do a static version here, but animated is the default:
phylomorphospace3d(tree,X,method="static")
Try the animated version on your own.
Finally, it's possible to combine the continuous character mapping and 2D phylomorphospaces using a type of phylogenetic scatterplot. Here's a demo:
fancyTree(tree,type="scattergram",X=X)
## Computing multidimensional phylogenetic scatterplot matrix...
Stochastic character maps are easy to plot using phytools. For instance:
data(anoletree)
plotSimmap(anoletree,fsize=0.6,ftype="i",ylim=c(-1,Ntip(anoletree)))
## no colors provided. using the following legend:
## CG GB TC TG Tr Tw
## "black" "red" "green3" "blue" "cyan" "magenta"
## we can use any color scheme, but this is the default
cols<-setNames(palette()[1:length(unique(getStates(anoletree,
"tips")))],sort(unique(getStates(anoletree,"tips"))))
add.simmap.legend(colors=cols,x=0,y=-1,vertical=FALSE,prompt=FALSE)
## mark the changes on the tree
xy<-markChanges(anoletree,plot=FALSE)
points(xy,pch=19)
For binary characters, we can also map the posterior density of
stochastic maps onto the nodes & branches of a tree using the
phytools function densityMap
. Here's a demo of this using
simulated data.
## first let's simulate data
Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-c(0,1)
tree<-sim.history(tree,Q)
## Done simulation(s).
x<-tree$states
x
## A B C D E F G H I J K L M N O P Q R
## "1" "1" "1" "0" "0" "1" "1" "1" "1" "1" "0" "1" "0" "1" "1" "0" "0" "0"
## S T U V W X Y Z
## "1" "1" "1" "1" "0" "0" "0" "0"
## stochastic maps
trees<-make.simmap(tree,x,nsim=100)
## make.simmap is sampling character histories conditioned on the transition matrix
## Q =
## 0 1
## 0 -0.9914265 0.9914265
## 1 0.9914265 -0.9914265
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
## 0 1
## 0.5 0.5
## Done.
## make densityMap
obj<-densityMap(trees,lwd=4,outline=TRUE)
## sorry - this might take a while; please be patient
We can, for instance, overlay a discrete character history (or stochastic map) onto a traitgram or phylomorphospace.
## simulate data with a high rate on some branches
x<-sim.rates(tree,setNames(c(1,10),c(0,1)))
## plot traitgram
phenogram(tree,x,colors=setNames(c("blue","red"),
c(0,1)),spread.labels=TRUE,spread.cost=c(1,0))
## we can do the same with phylomorphospaces
X<-cbind(sim.rates(tree,setNames(c(1,10),c(0,1))),
sim.rates(tree,setNames(c(1,10),c(0,1))))
phylomorphospace(tree,X,colors=setNames(c("blue","red"),c(0,1)))
## it is even possible to overlay a posterior density from stochastic mapping
phylomorphospace(obj$tree,X,colors=obj$cols,
lwd=3,xlab="x",ylab="y")
add.color.bar(4,obj$cols,title="PP(state=1)",
prompt=FALSE,x=0.9*par()$usr[1],y=0.9*par()$usr[3])
Written by Liam J. Revell. Last updated August 14, 2015