In this tutorial we will learn to apply the independent contrasts method of Felsenstein (1985) for estimating the evolutionary correlation between characters.
Felsenstein (1985) identified (and, to a large extent, solved) a problem that had previously recognized, but that was underappreciated: and that is, that the data for species cannot be treated as independent data points from the point of view of statistical analysis.
To learn how to use the contrasts method to fit linear models in R, we’ll first need to learn some basics in linear model.
Here, I will load some data for gape width & buccal length (different attributes of the mouth) in different species of fish & we will fit a simple linear regression to the data.
The data files can be downloaded here:
obj<-read.csv("Centrarchidae.csv",row.names=1)
obj
## feeding.mode gape.width buccal.length
## Acantharchus_pomotis pisc 0.114 -0.009
## Lepomis_gibbosus non -0.133 -0.009
## Lepomis_microlophus non -0.151 0.012
## Lepomis_punctatus non -0.103 -0.019
## Lepomis_miniatus non -0.134 0.001
## Lepomis_auritus non -0.222 -0.039
## Lepomis_marginatus non -0.187 -0.075
## Lepomis_megalotis non -0.073 -0.049
## Lepomis_humilis non 0.024 -0.027
## Lepomis_macrochirus non -0.191 0.002
## Lepomis_gulosus pisc 0.131 0.122
## Lepomis_symmetricus non 0.013 -0.025
## Lepomis_cyanellus pisc -0.002 -0.009
## Micropterus_coosae pisc 0.045 -0.009
## Micropterus_notius pisc 0.097 -0.009
## Micropterus_treculi pisc 0.056 0.001
## Micropterus_salmoides pisc 0.056 -0.059
## Micropterus_floridanus pisc 0.096 0.051
## Micropterus_punctulatus pisc 0.092 0.002
## Micropterus_dolomieu pisc 0.035 -0.069
## Centrarchus_macropterus non -0.007 -0.055
## Enneacantus_obesus non 0.016 -0.005
## Pomoxis_annularis pisc -0.004 -0.019
## Pomoxis_nigromaculatus pisc 0.105 0.041
## Archolites_interruptus pisc -0.024 0.051
## Ambloplites_ariommus pisc 0.135 0.123
## Ambloplites_rupestris pisc 0.086 0.041
## Ambloplites_cavifrons pisc 0.130 0.040
Now we can fit a model in which gape.width
is affected by buccal.length
. Without taking phylogeny into account, these characters do seem to be somewhat weakly correlated:
plot(obj[,c("buccal.length","gape.width")])
Now let’s fit our OLS regression model to see:
fit.ols<-lm(gape.width~buccal.length,data=obj)
fit.ols
##
## Call:
## lm(formula = gape.width ~ buccal.length, data = obj)
##
## Coefficients:
## (Intercept) buccal.length
## -3.817e-05 1.069e+00
summary(fit.ols)
##
## Call:
## lm(formula = gape.width ~ buccal.length, data = obj)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.19310 -0.07951 0.03057 0.05653 0.12366
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.817e-05 1.832e-02 -0.002 0.99835
## buccal.length 1.069e+00 3.843e-01 2.781 0.00995 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09695 on 26 degrees of freedom
## Multiple R-squared: 0.2292, Adjusted R-squared: 0.1996
## F-statistic: 7.733 on 1 and 26 DF, p-value: 0.009951
plot(obj[,c("buccal.length","gape.width")])
abline(fit.ols,lwd=2,lty="dashed",col="red")
However, we cannot forget that when our data are phylogenetic, the assumption independent and identical distribution of the residual error does not hold. Consequently, we need to take the phylogeny into account. One way to do this is by using PICs:
library(ape)
setwd("~/Dropbox/R-puerto-rico-2016/")
cent.tree<-read.tree("Centrarchidae.tre")
buccal.length<-setNames(obj[,"buccal.length"],rownames(obj))
gape.width<-setNames(obj[,"gape.width"],rownames(obj))
pic.bl<-pic(buccal.length,cent.tree)
pic.gw<-pic(gape.width,cent.tree)
fit.pic<-lm(pic.gw~pic.bl+0)
fit.pic
##
## Call:
## lm(formula = pic.gw ~ pic.bl + 0)
##
## Coefficients:
## pic.bl
## 0.5932
summary(fit.pic)
##
## Call:
## lm(formula = pic.gw ~ pic.bl + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.58401 -0.22945 -0.03814 0.12947 0.99784
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## pic.bl 0.5932 0.2556 2.321 0.0284 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3189 on 26 degrees of freedom
## Multiple R-squared: 0.1716, Adjusted R-squared: 0.1398
## F-statistic: 5.387 on 1 and 26 DF, p-value: 0.0284
plot(pic.bl,pic.gw,xlab="PICs for buccal length",ylab="PICs for gape width")
abline(fit.pic,lwd=2,lty="dashed",col="red")
So we can see that in this case, there is not a huge effect of taking phylogeny into consideration. However, it is easy to imagine (& simulate) conditions under which taking the phylogeny into account when fitting a regression model would be of great consequence.
## load phytools
library(phytools)
## Loading required package: maps
##
## # maps v3.1: updated 'world': all lakes moved to separate new #
## # 'lakes' database. Type '?world' or 'news(package="maps")'. #
## set seed (so we all get the same result
set.seed(21) ## 10 ok, 2 ok, 3 ok, 6 ok, 7 ok,
## simulate a coalescent shaped tree
tree<-rcoal(n=100)
plotTree(tree,ftype="off")
## simulate uncorrelated Brownian evolution
x<-fastBM(tree)
y<-fastBM(tree)
par(mar=c(5.1,4.1,2.1,2.1))
plot(x,y)
fit<-lm(y~x)
abline(fit,lwd=2,lty="dashed",col="red")