This tutorial is about fitting multi-rate Brownian motion models (using phytools), multi-regime OU models (using the OUwie package), and multivariate Brownian models (using phytools).
The first exercise is designed to explore a model developed by O'Meara et al. (2006; Evolution) in which the rate of Brownian evolution (σ²) takes different values in different parts of a phylogeny.
For this analysis we will use the function brownie.lite in the phytools package. This function allows us to input a tree with painted rate “regimes” (for instance, from a stochastically mapped discrete character) and then fit a model in which the rate differs depending on the mapped regime.
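To make the idea of regime-specific rates concrete, here is a minimal base-R sketch of how variance accumulates along a single root-to-tip path under multi-rate Brownian motion. The rates and the times spent in each regime are hypothetical values chosen only for illustration; they are not estimates from any dataset used in this tutorial.

```r
## hypothetical Brownian rates for two painted regimes (assumed values)
sig2<-c(terr=1,aqua=4)
## hypothetical time that one root-to-tip path spends in each regime
tt<-c(terr=0.7,aqua=0.3)
## under multi-rate BM the expected variance at the tip is the
## sum over regimes of (rate x time spent in that regime)
v.multi<-sum(sig2*tt[names(sig2)])
## under single-rate BM every part of the path shares one rate
v.single<-sig2[["terr"]]*sum(tt)
v.multi
v.single
```

The same path accumulates more variance when part of it is painted with the faster regime, which is the signal the multi-rate model is fitted to.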
First, let's load phytools. We will then read a simulated tree & dataset from file to illustrate fitting a two-rate Brownian model using brownie.lite. The tree is in simulated.tre and the data vector is in simulated.csv.
## load phytools
library(phytools)
Now, let's load the single tree with painted regimes from file. We can do this by downloading the file from the web, and then loading it using the phytools function read.simmap.
## read simulated tree from file
tree<-read.simmap("simulated.tre",format="phylip")
Now, let's plot the tree with painted regimes:
## plot tree with regimes
colors=setNames(c("brown","blue"),c("terr","aqua"))
plot(tree,colors,lwd=4)
add.simmap.legend(colors=colors,prompt=FALSE,
x=0.05*max(nodeHeights(tree)),y=0.1*Ntip(tree))
Next, we can read our data from file. These are data for a single continuous character:
## read data from file
x<-read.csv("simulated.csv",header=TRUE,row.names=1)
## convert to vector with names
x<-setNames(x[,1],rownames(x))
x
## A B C D E F
## 1.32028474 0.61662698 -1.00911612 -0.70287046 1.56931705 1.64972830
## G H I J K L
## 0.94406092 1.23534571 2.05738145 3.16707622 -0.39868954 -1.40144281
## M N O P Q R
## -0.89405959 -0.76103676 -0.76097347 -0.81459239 -0.19720860 -0.06206985
## S T U V W X
## 0.06946520 -0.18486903 -1.77787925 -0.79620907 -0.39166148 -0.63321831
## Y Z
## -0.48235455 -0.46151922
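Before fitting anything it is worth checking that the names of the data vector match the tip labels of the tree. The following is a small sketch using only base R: the inline CSV text and the vector tips are made-up stand-ins (with a real tree, tips would be tree$tip.label).

```r
## a tiny inline stand-in for a file like simulated.csv (made-up values)
csv<-"species,trait
A,1.3
B,0.6
C,-1.0"
dat<-read.csv(text=csv,header=TRUE,row.names=1)
## convert the one-column data frame to a vector with names
y<-setNames(dat[,1],rownames(dat))
## hypothetical tip labels; with a real tree use tree$tip.label
tips<-c("A","B","C","D")
## tips that have no data
setdiff(tips,names(y))
## data that have no matching tip
setdiff(names(y),tips)
```

If either call returns any names, the tree or the data should be pruned or corrected before model fitting.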
Now we are ready to fit our multi-rate Brownian model:
## fit multi-rate Brownian model
fitBM<-brownie.lite(tree,x)
fitBM
## ML single-rate model:
## s^2 se a k logL
## value 2.1036 0.5834 -0.1898 2 -31.5992
##
## ML multi-rate model:
## s^2(aqua) se(aqua) s^2(terr) se(terr) a k logL
## value 5.7837 2.7182 0.6659 0.2398 -0.4222 3 -26.2556
##
## P-value (based on X^2): 0.0011
##
## R thinks it has found the ML solution.
This result shows (1) that a two-rate model fits the data significantly better than a one-rate model (P = 0.0011); and (2) that the rate of evolution in the fitted model is much higher for state "aqua" than for state "terr".
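The P-value in the printout comes from a likelihood-ratio test. As a sketch, it can be recomputed by hand in base R from the log-likelihoods and parameter counts (k) printed above; the variable names here are my own, not part of the brownie.lite output object.

```r
## log-likelihoods and parameter counts as printed by brownie.lite above
logL<-c(single=-31.5992,multi=-26.2556)
k<-c(single=2,multi=3)
## likelihood-ratio statistic: twice the difference in log-likelihood
LR<-2*(logL[["multi"]]-logL[["single"]])
## degrees of freedom: difference in the number of parameters
df<-k[["multi"]]-k[["single"]]
## P-value from the upper tail of the chi-squared distribution
P<-pchisq(LR,df=df,lower.tail=FALSE)
LR
round(P,4)
```

Because the printed log-likelihoods are rounded, the result should agree with the reported P-value only to about the precision shown.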
In addition to this approach in which the rate of evolution differs between different parts of the tree, we can also fit an Ornstein-Uhlenbeck (OU) model in which the pull or selection regimes differ (i.e., have different optima) in different parts of the phylogeny.
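As a rough illustration of what the "pull" toward an optimum means, the sketch below computes the expected trait value under an OU process within a single regime. All parameter values are hypothetical, and the function ou.mean is defined here only for illustration; it is not part of OUwie or phytools.

```r
## hypothetical OU parameters for one regime (assumed values)
alpha<-2    ## strength of the pull toward the optimum
theta<-1    ## optimum for this regime
x0<-0       ## trait value at the start of the branch
## expected trait value after elapsed time tt under an OU process
ou.mean<-function(tt) theta+(x0-theta)*exp(-alpha*tt)
## phylogenetic half-life: time needed to move halfway to the optimum
half.life<-log(2)/alpha
ou.mean(c(0,half.life,10))
```

With larger alpha the half-life shortens and lineages are expected to track the optimum of their current regime more closely; with alpha near zero the process behaves like Brownian motion.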
To explore this model, let's read in the Anolis tree & data for body size, and then fit a multi-optimum OU model in which the regime shifts are associated with the ecomorph state of different anole species. Keep in mind that our data are merely for overall size, while the ecomorph convergence is multivariate.
In this case, our tree is in ecomorph.tre and our data are in ecomorph-data.csv. Finally, we need the file ecomorph.csv, which contains the ecomorph identities of each tip.
First, let's load the package “OUwie”. If you do not have OUwie, then you should first install it from CRAN.
## load OUwie
library(OUwie)
Now let's read our data from the input file:
## read anole data from file
X<-read.csv("ecomorph-data.csv",row.names=1)
## read anole tree
anolis.tree<-read.tree("ecomorph.tre")
plotTree(anolis.tree,type="fan",fsize=0.8)
Next, to work in OUwie we need to make a special data frame for our analysis, as follows:
## make analysis input data.frame
ecomorph<-read.csv("ecomorph.csv",row.names=1)
ecomorph<-setNames(ecomorph[,1],rownames(ecomorph))
pca<-phyl.pca(anolis.tree,X)
data<-data.frame(Genus_species=rownames(pca$S),Reg=ecomorph,
X=as.numeric(pca$S[,"PC2"]))
This data frame contains our trait and the regimes (in this case, "ecomorph") for the tips, but we also need a reconstruction of the regimes across the branches and nodes of the tree. For this, we will use the method of stochastic mapping. Here, I will just use one stochastic map, but normally we would want to integrate across a set of stochastic maps.
## perform & plot stochastic maps (we would normally do this x100)
smap.tree<-make.simmap(anolis.tree,ecomorph)
## make.simmap is sampling character histories conditioned on the transition matrix
##
## Q =
## CG GB TC TG Tr Tw
## CG -0.3933363 0.0000000 0.19321575 0.0000000 0.00000000 0.2001205
## GB 0.0000000 -0.5024037 0.00000000 0.1959993 0.00000000 0.3064044
## TC 0.1932158 0.0000000 -0.74117200 0.0000000 0.09188022 0.4560760
## TG 0.0000000 0.1959993 0.00000000 -0.4428236 0.00000000 0.2468243
## Tr 0.0000000 0.0000000 0.09188022 0.0000000 -0.23442779 0.1425476
## Tw 0.2001205 0.3064044 0.45607602 0.2468243 0.14254757 -1.3519729
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
## CG GB TC TG Tr Tw
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## Done.
plot(smap.tree,type="fan",fsize=0.8,ftype="i")
## no colors provided. using the following legend:
## CG GB TC TG Tr Tw
## "black" "red" "green3" "blue" "cyan" "magenta"
Now we are finally ready to fit our models. We will fit a single-rate Brownian model, a multi-rate Brownian model, and a multi-optimum OU model for comparison:
## fit Brownian & multi-optimum OU models
fitBM<-OUwie(smap.tree,data,model="BM1",simmap.tree=TRUE) ## single rate
## Initializing...
## Finished. Begin thorough search...
## Finished. Summarizing results.
fitBMS<-OUwie(smap.tree,data,model="BMS",simmap.tree=TRUE) ## multiple rates
## Warning: You might not have enough data to fit this model well
## Initializing...
## Finished. Begin thorough search...
## Finished. Summarizing results.
fitOUM<-OUwie(smap.tree,data,model="OUM",simmap.tree=TRUE) ## multiple optima
## Initializing...
## Finished. Begin thorough search...
## Finished. Summarizing results.
fitBM
##
## Fit
## lnL AIC AICc model ntax
## -47.39086 98.78172 98.93362 BM1 82
##
## Rates
## CG GB TC TG Tr Tw
## alpha NA NA NA NA NA NA
## sigma.sq 0.3948603 0.3948603 0.3948603 0.3948603 0.3948603 0.3948603
##
## Optima
## CG GB TC TG Tr Tw
## estimate 2.421034e-08 0 0 0 0 0
## se 1.919428e-01 0 0 0 0 0
##
## Arrived at a reliable solution
fitBMS
##
## Fit
## lnL AIC AICc model ntax
## -34.3755 92.75101 97.27275 BMS 82
##
## Rates
## CG GB TC TG Tr Tw
## alpha NA NA NA NA NA NA
## sigma.sq 1.292975 0.1878845 0.3951411 0.3794906 0.04817196 0.1537394
##
## Optima
## CG GB TC TG Tr Tw
## estimate -0.2602895 -0.1503886 0.1059660 0.2107981 -0.07376918 -0.0551778
## se 1.0378414 0.3184524 0.3810225 0.3481540 0.48921505 0.1587834
##
## Arrived at a reliable solution
fitOUM
##
## Fit
## lnL AIC AICc model ntax
## -13.34063 42.68125 44.65386 OUM 82
##
##
## Rates
## CG GB TC TG Tr Tw
## alpha 354.88054 354.88054 354.88054 354.88054 354.88054 354.88054
## sigma.sq 57.52627 57.52627 57.52627 57.52627 57.52627 57.52627
##
## Optima
## CG GB TC TG Tr
## estimate -0.08156343 -0.05837721 0.01970191 0.07110250 -0.005318659
## se 0.09003619 0.06710291 0.07895967 0.05742501 0.117621570
## Tw
## estimate 0.01316362
## se 0.08976092
##
## Arrived at a reliable solution
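One common way to compare the three fits is with AICc differences and Akaike weights. The sketch below uses the AICc values printed above, typed in by hand; with the fitted objects in memory the same values could be collected from each fit instead.

```r
## AICc values as printed in the OUwie output above
aicc<-c(BM1=98.93362,BMS=97.27275,OUM=44.65386)
## difference from the best (lowest AICc) model
dAICc<-aicc-min(aicc)
## Akaike weights: relative support for each model in the set
w<-exp(-0.5*dAICc)/sum(exp(-0.5*dAICc))
round(dAICc,2)
round(w,4)
```

For this single stochastic map, the multi-optimum OU model has by far the lowest AICc and takes essentially all of the weight. Keep in mind that this is conditional on one mapped history, so the comparison would normally be repeated across a set of stochastic maps.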
The last thing we are going to do is fit a multivariate Brownian model in which the evolutionary covariance (and thus correlation) between characters can be different in different parts of the tree. This is a method based on Revell & Collar (2009; Evolution).
For this example we will use data and a phylogeny for centrarchid fishes. The data are in Centrarchidae.csv and the tree is in Centrarchidae.tre.
Let's read our tree & data:
## read centrarchid tree
fish.tree<-read.tree("Centrarchidae.tre")
## read centrarchid data
fish.data<-read.csv("Centrarchidae.csv",header=TRUE,row.names=1)
fish.data
## feeding.mode gape.width buccal.length
## Acantharchus_pomotis pisc 0.114 -0.009
## Lepomis_gibbosus non -0.133 -0.009
## Lepomis_microlophus non -0.151 0.012
## Lepomis_punctatus non -0.103 -0.019
## Lepomis_miniatus non -0.134 0.001
## Lepomis_auritus non -0.222 -0.039
## Lepomis_marginatus non -0.187 -0.075
## Lepomis_megalotis non -0.073 -0.049
## Lepomis_humilis non 0.024 -0.027
## Lepomis_macrochirus non -0.191 0.002
## Lepomis_gulosus pisc 0.131 0.122
## Lepomis_symmetricus non 0.013 -0.025
## Lepomis_cyanellus pisc -0.002 -0.009
## Micropterus_coosae pisc 0.045 -0.009
## Micropterus_notius pisc 0.097 -0.009
## Micropterus_treculi pisc 0.056 0.001
## Micropterus_salmoides pisc 0.056 -0.059
## Micropterus_floridanus pisc 0.096 0.051
## Micropterus_punctulatus pisc 0.092 0.002
## Micropterus_dolomieu pisc 0.035 -0.069
## Centrarchus_macropterus non -0.007 -0.055
## Enneacantus_obesus non 0.016 -0.005
## Pomoxis_annularis pisc -0.004 -0.019
## Pomoxis_nigromaculatus pisc 0.105 0.041
## Archolites_interruptus pisc -0.024 0.051
## Ambloplites_ariommus pisc 0.135 0.123
## Ambloplites_rupestris pisc 0.086 0.041
## Ambloplites_cavifrons pisc 0.130 0.040
Now let's pull out the feeding mode from this data frame. This is the character that we are going to map on the tree for our different regimes. We can then use this character to generate a set of stochastic maps on the phylogeny:
fmode<-setNames(fish.data[,1],rownames(fish.data))
fmode
## Acantharchus_pomotis Lepomis_gibbosus Lepomis_microlophus
## pisc non non
## Lepomis_punctatus Lepomis_miniatus Lepomis_auritus
## non non non
## Lepomis_marginatus Lepomis_megalotis Lepomis_humilis
## non non non
## Lepomis_macrochirus Lepomis_gulosus Lepomis_symmetricus
## non pisc non
## Lepomis_cyanellus Micropterus_coosae Micropterus_notius
## pisc pisc pisc
## Micropterus_treculi Micropterus_salmoides Micropterus_floridanus
## pisc pisc pisc
## Micropterus_punctulatus Micropterus_dolomieu Centrarchus_macropterus
## pisc pisc non
## Enneacantus_obesus Pomoxis_annularis Pomoxis_nigromaculatus
## non pisc pisc
## Archolites_interruptus Ambloplites_ariommus Ambloplites_rupestris
## pisc pisc pisc
## Ambloplites_cavifrons
## pisc
## Levels: non pisc
## stochastic mapping of feeding mode on the tree (we would normally do x100)
fish.tree<-make.simmap(fish.tree,fmode,model="ARD")
## make.simmap is sampling character histories conditioned on the transition matrix
##
## Q =
## non pisc
## non -6.087789 6.087789
## pisc 3.048905 -3.048905
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
## non pisc
## 0.5 0.5
## Done.
Plot our tree & mapped regimes:
cols<-setNames(c("blue","red"),c("non","pisc"))
plot(fish.tree,colors=cols,ftype="i")
add.simmap.legend(colors=cols,prompt=FALSE,x=0,y=Ntip(fish.tree))
Next, we can use the two continuous characters to test our hypothesis that the feeding mode affects the evolutionary correlation/covariance between traits. For this analysis we will use evol.vcv in the phytools package.
## data
fish.X<-as.matrix(fish.data[,2:3])
## fit model
fitMV<-evol.vcv(fish.tree,fish.X)
fitMV
## ML single-matrix model:
## R[1,1] R[1,2] R[2,2] k log(L)
## fitted 0.114 0.033 0.0556 5 72.1893
##
## ML multi-matrix model:
## R[1,1] R[1,2] R[2,2] k log(L)
## non 0.1656 0.0041 0.0181 8 79.5525
## pisc 0.0607 0.0615 0.1043
##
## P-value (based on X^2): 0.0021
##
## R thinks it has found the ML solution.
This shows us that the model with two covariance matrices fits significantly better than the model with one. We can also easily extract the evolutionary correlations from these two alternative models:
## now let's look at the correlation matrices
cov2cor(fitMV$R.single)
## gape.width buccal.length
## gape.width 1.000000 0.414274
## buccal.length 0.414274 1.000000
cov2cor(fitMV$R.multiple[,,"non"])
## gape.width buccal.length
## gape.width 1.00000000 0.07554396
## buccal.length 0.07554396 1.00000000
cov2cor(fitMV$R.multiple[,,"pisc"])
## gape.width buccal.length
## gape.width 1.0000000 0.7732997
## buccal.length 0.7732997 1.0000000
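To see where these correlations come from, the sketch below rebuilds the two regime-specific matrices from the rounded values printed by evol.vcv above and applies the definition of a correlation directly. Because the inputs are rounded to four decimal places, the results should match the cov2cor output above only approximately.

```r
## fitted evolutionary covariance matrices, rounded as printed above
nn<-c("gape.width","buccal.length")
R.non<-matrix(c(0.1656,0.0041,0.0041,0.0181),2,2,dimnames=list(nn,nn))
R.pisc<-matrix(c(0.0607,0.0615,0.0615,0.1043),2,2,dimnames=list(nn,nn))
## correlation = covariance / sqrt(product of the two variances)
r.non<-R.non[1,2]/sqrt(R.non[1,1]*R.non[2,2])
r.pisc<-R.pisc[1,2]/sqrt(R.pisc[1,1]*R.pisc[2,2])
c(non=r.non,pisc=r.pisc)
## cov2cor does the same calculation for the whole matrix
cov2cor(R.pisc)
```

The evolutionary correlation between gape width and buccal length is close to zero in non-piscivorous lineages and strongly positive in piscivorous ones.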
That's it!
Download the following two data files (you may have them already).
Use stochastic character mapping to generate 10 stochastic character maps of feeding mode on the tree of elopomorphs, then use brownie.lite and OUwie to fit a multi-rate Brownian model & a multi-regime OU model to the continuous character, transformed to a log-scale. What do you find?
Written by Liam J. Revell. Last updated 30 Jun. 2016