Exercise 9: Testing for an evolutionary correlation between binary characters using Pagel’s (1994) method

This exercise is a short exercise designed to fit and test Pagel's (1994) method for testing for an evolutionary relationship between two binary characters.

To do this exercise, you will first need to download the following dataset & tree, although they can also be loaded directly from the web:

Let's load phytools and then our data and tree:

library(phytools)
tree<-read.tree("fitPagel-tree.tre")
tree

## 
## Phylogenetic tree with 300 tips and 299 internal nodes.
## 
## Tip labels:
##  t149, t150, t164, t165, t59, t60, ...
## 
## Rooted; includes branch lengths.

X<-read.csv("fitPagel-data.csv",row.names=1,header=TRUE)
## X is a big matrix, so let's just visualize the top part
dim(X)

## [1] 300   3

head(X)

##    x y z
## t1 b b a
## t2 a a a
## t3 b b b
## t4 a a a
## t5 b b a
## t6 a a a

OK, having done this, we will use the phytools function fitPagel to fit Pagel's (1994) model. To get a sense of our data before we do, we should first visualize the data on the tree. Let's try this:

par(mfrow=c(1,2))
plot(tree,show.tip.label=FALSE,no.margin=TRUE)
par(fg="transparent")
x<-setNames(X[[1]],rownames(X))
tiplabels(pie=to.matrix(x[tree$tip.label],c("a","b")),piecol=c("blue","red"),
    cex=0.3)
par(fg="black")
add.simmap.legend(colors=setNames(c("blue","red"),c("a","b")),prompt=FALSE,
    x=0,y=10,fsize=1)
par(fg="transparent")
plot(tree,show.tip.label=FALSE,no.margin=TRUE,direction="leftwards")
y<-setNames(X[[2]],rownames(X))
tiplabels(pie=to.matrix(y[tree$tip.label],c("a","b")),piecol=c("blue","red"),
    cex=0.3)

plot of chunk unnamed-chunk-2

par(fg="black")

OK, so we can repeat that also with the third column of X:

par(mfrow=c(1,2))
plot(tree,show.tip.label=FALSE,no.margin=TRUE)
par(fg="transparent")
x<-setNames(X[[1]],rownames(X))
tiplabels(pie=to.matrix(x[tree$tip.label],c("a","b")),piecol=c("blue","red"),
    cex=0.3)
par(fg="black")
add.simmap.legend(colors=setNames(c("blue","red"),c("a","b")),prompt=FALSE,
    x=0,y=10,fsize=1)
par(fg="transparent")
plot(tree,show.tip.label=FALSE,no.margin=TRUE,direction="leftwards")
z<-setNames(X[[3]],rownames(X))
tiplabels(pie=to.matrix(z[tree$tip.label],c("a","b")),piecol=c("blue","red"),
    cex=0.3)

plot of chunk unnamed-chunk-3

par(fg="black")

Now we're ready to run our model:

fit.xy<-fitPagel(tree,x,y)
fit.xy

## 
## Pagel's binary character correlation test:
## 
## Assumes "ARD" substitution model for both characters
## 
## Independent model rate matrix:
##            a|a        a|b        b|a        b|b
## a|a -1.6338807  0.7929193  0.8409613  0.0000000
## a|b  0.4595752 -1.3005366  0.0000000  0.8409613
## b|a  0.5940199  0.0000000 -1.3869392  0.7929193
## b|b  0.0000000  0.5940199  0.4595752 -1.0535951
## 
## Dependent (x & y) model rate matrix:
##            a|a        a|b        b|a        b|b
## a|a -1.0668076  0.6538038  0.4130038  0.0000000
## a|b  0.9094271 -4.6819194  0.0000000  3.7724923
## b|a  2.6064761  0.0000000 -4.2118594  1.6053833
## b|b  0.0000000  0.3624449  0.3440521 -0.7064971
## 
## Model fit:
##             log-likelihood      AIC
## independent      -233.0357 474.0715
## dependent        -206.5951 429.1903
## 
## Hypothesis test result:
##   likelihood-ratio:  52.88121 
##   p-value:  9.023704e-11 
## 
## Model fitting method used was fitMk

fit.xz<-fitPagel(tree,x,z)

## Warning in log(comp[1:M + N]): NaNs produced

fit.xz

## 
## Pagel's binary character correlation test:
## 
## Assumes "ARD" substitution model for both characters
## 
## Independent model rate matrix:
##            a|a        a|b        b|a        b|b
## a|a -2.1937423  1.3527810  0.8409613  0.0000000
## a|b  0.8686791 -1.7096404  0.0000000  0.8409613
## b|a  0.5940199  0.0000000 -1.9468009  1.3527810
## b|b  0.0000000  0.5940199  0.8686791 -1.4626990
## 
## Dependent (x & y) model rate matrix:
##            a|a       a|b        b|a        b|b
## a|a -2.0365545  1.928809  0.1077458  0.0000000
## a|b  0.7142413 -2.033737  0.0000000  1.3194958
## b|a  0.8505095  0.000000 -0.9654704  0.1149610
## b|b  0.0000000  0.402200  0.4931755 -0.8953755
## 
## Model fit:
##             log-likelihood      AIC
## independent      -248.4293 504.8585
## dependent        -245.7055 507.4111
## 
## Hypothesis test result:
##   likelihood-ratio:  5.44746 
##   p-value:  0.2443865 
## 
## Model fitting method used was fitMk

Since one of this two models (x & y are correlated) is significant, let's use the plot method to visualize the result:

plot(fit.xy)

plot of chunk unnamed-chunk-5

## or
plot(fit.xy,lwd.by.rate=TRUE)

plot of chunk unnamed-chunk-5

Finally, we can also fit a model in which x depends on y, but not the converse, or y depends on x, but not the converse. To do this we can run the following:

fit.x<-fitPagel(tree,x,y,dep.var="x")
fit.x

## 
## Pagel's binary character correlation test:
## 
## Assumes "ARD" substitution model for both characters
## 
## Independent model rate matrix:
##            a|a        a|b        b|a        b|b
## a|a -1.6338807  0.7929193  0.8409613  0.0000000
## a|b  0.4595752 -1.3005366  0.0000000  0.8409613
## b|a  0.5940199  0.0000000 -1.3869392  0.7929193
## b|b  0.0000000  0.5940199  0.4595752 -1.0535951
## 
## Dependent (x only) model rate matrix:
##            a|a        a|b        b|a        b|b
## a|a -1.1647920  0.7981942  0.3665978  0.0000000
## a|b  0.4506319 -5.1481648  0.0000000  4.6975329
## b|a  3.0144733  0.0000000 -3.8126675  0.7981942
## b|b  0.0000000  0.3205245  0.4506319 -0.7711564
## 
## Model fit:
##             log-likelihood      AIC
## independent      -233.0357 474.0715
## dependent        -207.6239 427.2478
## 
## Hypothesis test result:
##   likelihood-ratio:  50.8237 
##   p-value:  9.19968e-12 
## 
## Model fitting method used was fitMk

plot(fit.x,lwd.by.rate=TRUE)

plot of chunk unnamed-chunk-6

fit.y<-fitPagel(tree,x,y,dep.var="y")
fit.y

## 
## Pagel's binary character correlation test:
## 
## Assumes "ARD" substitution model for both characters
## 
## Independent model rate matrix:
##            a|a        a|b        b|a        b|b
## a|a -1.6338807  0.7929193  0.8409613  0.0000000
## a|b  0.4595752 -1.3005366  0.0000000  0.8409613
## b|a  0.5940199  0.0000000 -1.3869392  0.7929193
## b|b  0.0000000  0.5940199  0.4595752 -1.0535951
## 
## Dependent (y only) model rate matrix:
##            a|a        a|b        b|a        b|b
## a|a -1.2400499  0.3923515  0.8476983  0.0000000
## a|b  3.1664689 -4.0141672  0.0000000  0.8476983
## b|a  0.6916232  0.0000000 -4.6332749  3.9416517
## b|b  0.0000000  0.6916232  0.1515701 -0.8431933
## 
## Model fit:
##             log-likelihood      AIC
## independent      -233.0357 474.0715
## dependent        -213.2927 438.5854
## 
## Hypothesis test result:
##   likelihood-ratio:  39.48603 
##   p-value:  2.665125e-09 
## 
## Model fitting method used was fitMk

plot(fit.y,lwd.by.rate=TRUE)

plot of chunk unnamed-chunk-6

We can compare these different possibilities using AIC:

aic<-setNames(c(fit.xy$independent.AIC,
    fit.x$dependent.AIC,
    fit.y$dependent.AIC,
    fit.xy$dependent.AIC),
    c("independent","dependent x",
    "dependent y","dependent x&y"))
aic

##   independent   dependent x   dependent y dependent x&y 
##      474.0715      427.2478      438.5854      429.1903

aic.w<-function(aic,signif=4){
    d.aic<-aic-min(aic)
    round(exp(-1/2*d.aic)/sum(exp(-1/2*d.aic)),signif)
}
aic.w(aic)

##   independent   dependent x   dependent y dependent x&y 
##        0.0000        0.7236        0.0025        0.2739

This shows that the strongest weight of evidence supports the dependent x model, although interdependent x & y is also strongly supported.

Challenge Problem 6: Detecting the evolutionary correlation between binary traits

Download the following two files:

Use the phytools function fitPagel to fit Pagel's model for correlated binary-trait evolution on the tree. What do you find? Visualize the character data on the tree. What does this tell you about the basis for your result?

Written by Liam J. Revell. Last updated 30 Jun. 2016.