The challenge problem was as follows.
Download the following two files:
Use the phytools function fitPagel
to fit Pagel's model for correlated
binary-trait evolution on the tree. What do you find? Visualize the character data on
the tree. What does this tell you about the basis for your result?
First, let's load packages:
library(phytools)
Now load data & tree from file:
tree<-read.tree("fitPagel-challenge.tre")
tree
##
## Phylogenetic tree with 64 tips and 63 internal nodes.
##
## Tip labels:
## t1, t2, t3, t4, t5, t6, ...
##
## Rooted; includes branch lengths.
X<-read.csv("fitPagel-challenge.csv",row.names=1)
X
## x y
## t1 a 0
## t2 a 0
## t3 a 0
## t4 a 0
## t5 a 0
## t6 a 0
## t7 a 0
## t8 a 0
## t9 a 0
## t10 a 0
## t11 a 0
## t12 a 0
## t13 a 0
## t14 a 0
## t15 a 0
## t16 a 0
## t17 a 0
## t18 a 0
## t19 a 0
## t20 a 0
## t21 a 0
## t22 a 0
## t23 a 0
## t24 a 0
## t25 a 0
## t26 a 0
## t27 a 0
## t28 a 0
## t29 a 0
## t30 a 0
## t31 a 0
## t32 a 0
## t33 b 0
## t34 b 1
## t35 b 0
## t36 b 1
## t37 b 0
## t38 b 1
## t39 b 0
## t40 b 1
## t41 b 0
## t42 b 1
## t43 b 0
## t44 b 1
## t45 b 0
## t46 b 1
## t47 b 0
## t48 b 1
## t49 b 0
## t50 b 1
## t51 b 0
## t52 b 1
## t53 b 0
## t54 b 1
## t55 b 0
## t56 b 1
## t57 b 0
## t58 b 1
## t59 b 0
## t60 b 1
## t61 b 0
## t62 b 1
## t63 b 0
## t64 b 1
Now, let's apply Pagel's (1994) method using fitPagel
:
x<-setNames(X[,1],rownames(X))
y<-setNames(as.factor(X[,2]),rownames(X))
fit.xy<-fitPagel(tree,x,y)
fit.xy
##
## Pagel's binary character correlation test:
##
## Assumes "ARD" substitution model for both characters
##
## Independent model rate matrix:
## a|0 a|1 b|0 b|1
## a|0 -48.7883436 48.6862230 0.1021206 0.0000000
## a|1 146.0589047 -146.1610253 0.0000000 0.1021206
## b|0 0.1021176 0.0000000 -48.7883406 48.6862230
## b|1 0.0000000 0.1021176 146.0589047 -146.1610223
##
## Dependent (x & y) model rate matrix:
## a|0 a|1 b|0 b|1
## a|0 0.00000 0.0000 0.0000 0.00000
## a|1 58.08975 -116.1796 0.0000 58.08985
## b|0 0.00000 0.0000 -184.0401 184.04013
## b|1 0.00000 0.0000 184.0402 -184.04023
##
## Model fit:
## log-likelihood AIC
## independent -40.21542 88.43084
## dependent -24.95330 65.90660
##
## Hypothesis test result:
## likelihood-ratio: 30.52424
## p-value: 3.827566e-06
##
## Model fitting method used was fitMk
plot(fit.xy,lwd.by.rate=TRUE)
We can also fit the model with dependent "x"
or dependent
"y"
variables, e.g.:
fit.x<-fitPagel(tree,x,y,dep.var="x")
fit.x
##
## Pagel's binary character correlation test:
##
## Assumes "ARD" substitution model for both characters
##
## Independent model rate matrix:
## a|0 a|1 b|0 b|1
## a|0 -48.7883436 48.6862230 0.1021206 0.0000000
## a|1 146.0589047 -146.1610253 0.0000000 0.1021206
## b|0 0.1021176 0.0000000 -48.7883406 48.6862230
## b|1 0.0000000 0.1021176 146.0589047 -146.1610223
##
## Dependent (x only) model rate matrix:
## a|0 a|1 b|0 b|1
## a|0 -12.91098 12.91098 0.00000 0.0000000
## a|1 38.87035 -39.78071 0.00000 0.9103622
## b|0 0.00000 0.00000 -12.91098 12.9109836
## b|1 0.00000 0.00000 38.87035 -38.8703517
##
## Model fit:
## log-likelihood AIC
## independent -40.21542 88.43084
## dependent -40.10414 92.20828
##
## Hypothesis test result:
## likelihood-ratio: 0.2225532
## p-value: 0.8946912
##
## Model fitting method used was fitMk
plot(fit.x,lwd.by.rate=TRUE)
fit.y<-fitPagel(tree,x,y,dep.var="y")
fit.y
##
## Pagel's binary character correlation test:
##
## Assumes "ARD" substitution model for both characters
##
## Independent model rate matrix:
## a|0 a|1 b|0 b|1
## a|0 -48.7883436 48.6862230 0.1021206 0.0000000
## a|1 146.0589047 -146.1610253 0.0000000 0.1021206
## b|0 0.1021176 0.0000000 -48.7883406 48.6862230
## b|1 0.0000000 0.1021176 146.0589047 -146.1610223
##
## Dependent (y only) model rate matrix:
## a|0 a|1 b|0 b|1
## a|0 -0.1021302 0.0000000 0.1021302 0.0000000
## a|1 2666.7799470 -2666.8820773 0.0000000 0.1021302
## b|0 0.1021205 0.0000000 -8940.5025370 8940.4004165
## b|1 0.0000000 0.1021205 8936.3891783 -8936.4912988
##
## Model fit:
## log-likelihood AIC
## independent -40.21542 88.43084
## dependent -26.40668 64.81337
##
## Hypothesis test result:
## likelihood-ratio: 27.61747
## p-value: 1.006798e-06
##
## Model fitting method used was fitMk
plot(fit.y,lwd.by.rate=TRUE)
Invariably, these 'dependent' models seem to fit better than our independent models. In particular:
aic<-setNames(c(fit.xy$independent.AIC,
fit.x$dependent.AIC,fit.y$dependent.AIC,
fit.xy$dependent.AIC),c("ind","dep:x","dep:y",
"dep:xy"))
aic.w(aic)
## ind dep:x dep:y dep:xy
## 0.00000471 0.00000071 0.63334650 0.36664808
suggests that the weight of evidence favors a model in which either y depends on x or in which x & y are interdependent.
But are they?
Let's now examine the pattern of character variation on the tree:
dotTree(tree,X,fsize=0.6,ftype="i",pts=F)
Here we see that rather than showing a consistent relationship, character y
merely seems to change a lot in the red ("b"
) clade for character
x, but not at all in the black ("a"
) clade.
This shows that the model is sensitive to finding a significant result in the absence of a true correlation between the characters, merely because of a heterogeneous rate in one character or the other.
This is essentially the result identified by Maddisson & Fitzjohn (2015).
Written by Liam J. Revell. Last updated 2 August 2017.