Exercise 3: Phylogenetically independent contrasts

In this tutorial we will learn to apply the independent contrasts method of Felsenstein (1985) for estimating the evolutionary correlation between characters.

Felsenstein (1985) identified (and, to a large extent, solved) a problem that had previously recognized, but that was underappreciated: and that is, that the data for species cannot be treated as independent data points from the point of view of statistical analysis.

To learn how to use the contrasts method to fit linear models in R, we'll first need to learn some basics in linear model.

Here, I will load some data for gape width & buccal length (different attributes of the mouth) in different species of fish & we will fit a simple linear regression to the data.

The data files we will use can be downloaded here:

  1. Centrarchidae.csv.
  2. Centrarchidae.tre.
obj<-read.csv("Centrarchidae.csv",row.names=1)
obj
##                         feeding.mode gape.width buccal.length
## Acantharchus_pomotis            pisc      0.114        -0.009
## Lepomis_gibbosus                 non     -0.133        -0.009
## Lepomis_microlophus              non     -0.151         0.012
## Lepomis_punctatus                non     -0.103        -0.019
## Lepomis_miniatus                 non     -0.134         0.001
## Lepomis_auritus                  non     -0.222        -0.039
## Lepomis_marginatus               non     -0.187        -0.075
## Lepomis_megalotis                non     -0.073        -0.049
## Lepomis_humilis                  non      0.024        -0.027
## Lepomis_macrochirus              non     -0.191         0.002
## Lepomis_gulosus                 pisc      0.131         0.122
## Lepomis_symmetricus              non      0.013        -0.025
## Lepomis_cyanellus               pisc     -0.002        -0.009
## Micropterus_coosae              pisc      0.045        -0.009
## Micropterus_notius              pisc      0.097        -0.009
## Micropterus_treculi             pisc      0.056         0.001
## Micropterus_salmoides           pisc      0.056        -0.059
## Micropterus_floridanus          pisc      0.096         0.051
## Micropterus_punctulatus         pisc      0.092         0.002
## Micropterus_dolomieu            pisc      0.035        -0.069
## Centrarchus_macropterus          non     -0.007        -0.055
## Enneacantus_obesus               non      0.016        -0.005
## Pomoxis_annularis               pisc     -0.004        -0.019
## Pomoxis_nigromaculatus          pisc      0.105         0.041
## Archolites_interruptus          pisc     -0.024         0.051
## Ambloplites_ariommus            pisc      0.135         0.123
## Ambloplites_rupestris           pisc      0.086         0.041
## Ambloplites_cavifrons           pisc      0.130         0.040

Now we can fit a model in which gape.width is affected by buccal.length. Without taking phylogeny into account, these characters do seem to be somewhat weakly correlated:

plot(obj[,c("buccal.length","gape.width")],
    xlab="relative buccal length",
    ylab="relative gape width",pch=21,bg="grey",
    cex=1.4)

plot of chunk unnamed-chunk-2

Now let's fit our OLS regression model to see:

fit.ols<-lm(gape.width~buccal.length,data=obj)
fit.ols
## 
## Call:
## lm(formula = gape.width ~ buccal.length, data = obj)
## 
## Coefficients:
##   (Intercept)  buccal.length  
##    -3.817e-05      1.069e+00
summary(fit.ols)
## 
## Call:
## lm(formula = gape.width ~ buccal.length, data = obj)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.19310 -0.07951  0.03057  0.05653  0.12366 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   -3.817e-05  1.832e-02  -0.002  0.99835   
## buccal.length  1.069e+00  3.843e-01   2.781  0.00995 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09695 on 26 degrees of freedom
## Multiple R-squared:  0.2292, Adjusted R-squared:  0.1996 
## F-statistic: 7.733 on 1 and 26 DF,  p-value: 0.009951
plot(obj[,c("buccal.length","gape.width")],
    xlab="relative buccal length",
    ylab="relative gape width",pch=21,bg="grey",
    cex=1.4)
abline(fit.ols,lwd=2,lty="dashed",col="red")

plot of chunk unnamed-chunk-3

However, we cannot forget that when our data are phylogenetic, the assumption independent and identical distribution of the residual error does not hold. Consequently, we need to take the phylogeny into account. One way to do this is by using PICs:

library(ape)
cent.tree<-read.tree("Centrarchidae.tre")
buccal.length<-setNames(obj[,"buccal.length"],
    rownames(obj))
gape.width<-setNames(obj[,"gape.width"],rownames(obj))
pic.bl<-pic(buccal.length,cent.tree)
pic.gw<-pic(gape.width,cent.tree)
fit.pic<-lm(pic.gw~pic.bl+0)
fit.pic
## 
## Call:
## lm(formula = pic.gw ~ pic.bl + 0)
## 
## Coefficients:
## pic.bl  
## 0.5932
summary(fit.pic)
## 
## Call:
## lm(formula = pic.gw ~ pic.bl + 0)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.58401 -0.22945 -0.03814  0.12947  0.99784 
## 
## Coefficients:
##        Estimate Std. Error t value Pr(>|t|)  
## pic.bl   0.5932     0.2556   2.321   0.0284 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3189 on 26 degrees of freedom
## Multiple R-squared:  0.1716, Adjusted R-squared:  0.1398 
## F-statistic: 5.387 on 1 and 26 DF,  p-value: 0.0284
plot(pic.bl,pic.gw,xlab="PICs for buccal length",
    ylab="PICs for gape width",bg="grey",
    cex=1.4,pch=21)
abline(fit.pic,lwd=2,lty="dashed",col="red")

plot of chunk unnamed-chunk-4

So we can see that in this case, there is not a huge effect of taking phylogeny into consideration. However, it is easy to imagine (& simulate) conditions under which taking the phylogeny into account when fitting a regression model would be of great consequence. Let's try it:

## load phytools
library(phytools)
## Loading required package: maps
## set seed (so we all get the same result)
set.seed(21) 
## simulate a coalescent shaped tree
tree<-rcoal(n=100)
plotTree(tree,ftype="off")

plot of chunk unnamed-chunk-5

## simulate uncorrelated Brownian evolution
x<-fastBM(tree)
y<-fastBM(tree)
par(mar=c(5.1,4.1,2.1,2.1))
plot(x,y,cex=1.4,pch=21,bg="grey")
fit<-lm(y~x)
abline(fit,lwd=2,lty="dashed",col="red")

plot of chunk unnamed-chunk-5

fit
## 
## Call:
## lm(formula = y ~ x)
## 
## Coefficients:
## (Intercept)            x  
##      0.0322      -0.7496
summary(fit)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.90636 -0.68689  0.03199  0.53566  2.06929 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.03220    0.09378   0.343    0.732    
## x           -0.74963    0.08651  -8.665 9.46e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9294 on 98 degrees of freedom
## Multiple R-squared:  0.4338, Adjusted R-squared:  0.428 
## F-statistic: 75.09 on 1 and 98 DF,  p-value: 9.458e-14
anova(fit)
## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## x          1 64.857  64.857   75.09 9.458e-14 ***
## Residuals 98 84.644   0.864                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Remember, these data were simulated in the absence of an evolutionary correlation between x & y.

We can see from this example, that it is not difficult for phylogeny to induce a type I error.

Here it is because clusters of closely related taxa have highly similar phenotypes. In other words, they are not independent data points about the evolutionary process for x and y on the tree:

## this is a projection of the tree into morphospace
phylomorphospace(tree,cbind(x,y),label="off",node.size=c(0,0))
points(x,y,pch=21,bg="grey",cex=1.4)
abline(fit,lwd=2,lty="dashed",col="red")

plot of chunk unnamed-chunk-6

When we use Felsenstein's (1985) algorithm to substitute the contrasts between species (& nodes) for the original values. Let's see if it resolves our type I error in this case:

ix<-pic(x,tree)
iy<-pic(y,tree)
fit<-lm(iy~ix-1) ## we have to fit the model without an intercept term
fit
## 
## Call:
## lm(formula = iy ~ ix - 1)
## 
## Coefficients:
##       ix  
## -0.03771
summary(fit)
## 
## Call:
## lm(formula = iy ~ ix - 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.9265 -0.8937 -0.2124  0.4106  2.7002 
## 
## Coefficients:
##    Estimate Std. Error t value Pr(>|t|)
## ix -0.03771    0.09814  -0.384    0.702
## 
## Residual standard error: 1.043 on 98 degrees of freedom
## Multiple R-squared:  0.001504,   Adjusted R-squared:  -0.008685 
## F-statistic: 0.1476 on 1 and 98 DF,  p-value: 0.7017
anova(fit)
## Analysis of Variance Table
## 
## Response: iy
##           Df Sum Sq Mean Sq F value Pr(>F)
## ix         1   0.16 0.16046  0.1476 0.7017
## Residuals 98 106.53 1.08703

This is not an example of a “real” relationship that has been removed with contrasts! Remember, we know our data were simulated without a genuine evolutionary relationship between x & y. The relationship that we measure in the OLS case is a spurious relationship driven by the phylogeny.

Challenge Problem 1: Phylogenetically Independent Contrasts

Download the following two files:

  1. pic-exercise-data.csv
  2. pic-exercise-tree.tre

Fit a OLS (non-phylogenetic) regression model for y~x and a phylogenetic regression model using contrasts. What do you find? Why?

Written by Liam J. Revell. Last updated 31 July 2017