Exercise 4: Phylogenetic generalized least squares regression and phylogenetic generalized ANOVA

First, we need to load the data & tree in R.

As always, we we need certain packages to read the phylogeny & run the analyses.

library(ape)
library(nlme)
library(geiger)

Now we have to load the data & tree to memory in R.

You should be able to find the files on the course page:

datos<-read.csv("Barbetdata.csv",header=TRUE,row.names=1)
datos

##                                          wing    Lnalt     patch   colour
## Calorhamphus_fuliginosus_fuliginosus 4.388257 5.298317  2.000000 1.666667
## Calorhamphus_fuliginosus_hayi        4.427239 5.298317  2.000000 1.666667
## M_armillaris                         4.532599 7.170120  6.333333 4.000000
## M_asiatica_asiatica                  4.611152 6.802395  7.333333 5.000000
## M_asiatica_davisoni                  4.605170 7.003065  6.666667 3.333333
## M_australis_duvaucelli               4.282206 6.620073  9.000000 4.000000
## M_chrysopogon                        4.829113 6.620073  9.000000 5.333333
## M_corvina                            4.824306 7.313220  2.000000 2.000000
## M_eximia                             4.359270 6.620073  7.666667 5.000000
## M_faiostricta                        4.693181 5.298317  5.000000 5.000000
## M_flavifrons                         4.487512 6.907755  5.000000 3.666667
## M_franklinii_auricularis             4.550714 7.313220  8.000000 5.666667
## M_franklinii_franklinii              4.604170 7.313220  7.666667 5.000000
## M_franklinii_ramsayi                 4.570579 7.313220  7.333333 5.000000
## M_haemacephala_indica                4.387014 6.109248  6.666667 4.000000
## M_henricii                           4.557030 6.163315  7.000000 4.666667
## M_incognita                          4.548600 6.802395  9.000000 3.333333
## M_javensis                           4.695925 6.214608  8.666667 4.333333
## M_lagrandieri                        4.898586 7.003065  7.000000 4.333333
## M_lineata_hodgsoni                   4.857484 5.991465  3.666667 2.666667
## M_monticola                          4.616110 6.802395  7.000000 5.000000
## M_mystacophanos                      4.553877 6.745236 10.333333 5.000000
## M_oorti_annamensis                   4.558079 7.313220  9.333333 5.000000
## M_oorti_nuchalis                     4.588024 6.620073  9.666667 5.666667
## M_oorti_oorti                        4.514151 7.170120  9.000000 5.000000
## M_pulcherrima                        4.521789 7.467371  5.666667 3.666667
## M_rafflesi                           4.741448 4.605170  8.333333 5.000000
## M_rubricapillus_malabarica           4.387014 6.109248  6.333333 4.000000
## M_rubricapillus_rubricapillus        4.346399 6.109248  9.000000 5.000000
## M_virens                             4.953712 7.495542  2.000000 2.000000
## M_viridis                            4.587006 6.214608  5.666667 3.000000
## M_zeylanica                          4.682131 5.991465  2.000000 2.333333
## Psilopogon_pyrolophus                4.773224 7.130899 10.000000 6.333333
##                                       Frequency      Length Lnote
## Calorhamphus_fuliginosus_fuliginosus  20.468894   3.1207387 0.260
## Calorhamphus_fuliginosus_hayi         22.483670   3.1371727 0.230
## M_armillaris                          -7.135924 -11.9001412 0.030
## M_asiatica_asiatica                  -10.153448   0.3671016 0.025
## M_asiatica_davisoni                   -9.700025   0.4492719 0.030
## M_australis_duvaucelli                 1.206501  -2.5198843 0.030
## M_chrysopogon                        -17.961279   0.1493287 0.100
## M_corvina                            -16.777077   0.8625655 0.115
## M_eximia                              -2.308277   3.2467332 0.030
## M_faiostricta                        -13.550036   0.3944917 0.045
## M_flavifrons                          -5.997081   1.7302251 0.140
## M_franklinii_auricularis             -10.305768   1.7466592 0.140
## M_franklinii_franklinii               -8.885098   1.7411811 0.135
## M_franklinii_ramsayi                  -8.648623   1.7685712 0.115
## M_haemacephala_indica                -16.782037   3.2248211 0.070
## M_henricii                            -9.201584  -6.4335091 0.125
## M_incognita                           -8.166673   0.3616236 0.040
## M_javensis                           -17.204273  -5.6678766 0.050
## M_lagrandieri                         -5.910024   3.0440464 0.400
## M_lineata_hodgsoni                   -12.746571   0.3287555 0.070
## M_monticola                          -16.093079  -7.7148614 0.060
## M_mystacophanos                      -15.948289  -9.0316506 0.060
## M_oorti_annamensis                    -9.654623  -0.1503694 0.030
## M_oorti_nuchalis                     -11.342388  -6.2060987 0.045
## M_oorti_oorti                        -11.091419   0.7530051 0.140
## M_pulcherrima                         -7.183282  -5.5981665 0.300
## M_rafflesi                           -22.419596 -16.9434720 0.050
## M_rubricapillus_malabarica           -16.255444   3.2357771 0.050
## M_rubricapillus_rubricapillus        -17.699099  -0.6955755 0.030
## M_virens                              -6.799948   2.9892662 0.500
## M_viridis                             -8.780492  -0.0736771 0.040
## M_zeylanica                          -13.327782   0.4273598 0.060
## Psilopogon_pyrolophus                  5.990051 -10.1022207 0.300

These data are from a study in which the purpose was to identify factors that were associated with the evolucion of characteristics of song in an group of Asian bird species called “barbets.”

arbol<-read.nexus("BarbetTree.nex")

One useful step that we haven't covered so far is to ensure that all the species in the data are in the tree & vice versa.

Remeber that if there are any spelling differences, even small, then the tree & dataset will not coincide.

To establish that we have the same species in our data as in the tree, or vice versa, we can use a function in the geiger package called name.check.

obj<-name.check(arbol,datos)
obj

## $tree_not_data
## [1] "M_asiatica_chersonesus"    "M_australis_australis"    
## [3] "M_haemacephala_celestinoi" "M_haemacephala_delica"    
## [5] "M_haemacephala_intermedia" "M_haemacephala_rosea"     
## [7] "M_lineata_lineata"         "M_oorti_faber"            
## [9] "M_oorti_sini"             
## 
## $data_not_tree
## character(0)

We should see that there are 9 species in the tree for which we do not have data. We have to prune these species from the tree to proceed with the analysis. We can do this using the function drop.tip as we learned:

arbol.cortado<-drop.tip(arbol, obj$tree_not_data)

Now the data & the tree should coincide in both number of species and names. We can verify this using name.check:

name.check(arbol.cortado,datos)

## [1] "OK"

Now that we are sure that the tree and dataset match, we can start to explore the phylogenetic GLS.

There are two primary packages that can be used to conduct PGLS: ape (with nlme) and caper. There are differences between the two packages in how they work and the information each package and corresponding function returns.

First, we will explore phylogenetic GLS in ape.

Phylogenetic GLS is basically a linear model in which the covariance (correlation) structure between species is permitted to match that expected under a Brownian motion process* of evolution on the tree. (*Or other processes.) Consequently, the first step is to define this covariance structure. We do this as follows:

bm<-corBrownian(1, arbol.cortado)
bm

## Uninitialized correlation structure of class corBrownian

We have defined a variance-covariance structure based on the model of Brownian evolution that we learned earlier.

The first model we will fit is for a simple analysis investigating the relationship between altitude at which a species is found and the length of the note in its song.

modelo1<-gls(Lnote~Lnalt, data=datos, correlation=bm)
summary(modelo1)

## Generalized least squares fit by REML
##   Model: Lnote ~ Lnalt 
##   Data: datos 
##         AIC     BIC   logLik
##   -68.33596 -64.034 37.16798
## 
## Correlation Structure: corBrownian
##  Formula: ~1 
##  Parameter estimate(s):
## numeric(0)
## 
## Coefficients:
##                   Value  Std.Error   t-value p-value
## (Intercept) -0.17975456 0.14940859 -1.203107  0.2380
## Lnalt        0.05475373 0.02162382  2.532103  0.0166
## 
##  Correlation: 
##       (Intr)
## Lnalt -0.879
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -1.4790545 -1.0293796 -0.6849539 -0.4535629  2.0893357 
## 
## Residual standard error: 0.1289145 
## Degrees of freedom: 33 total; 31 residual

Look at the information tha this object contains. It should be very similar (in fact, almost identical) to what we see after performing a standard, OLS linear regression.

It is also possible to conduct this analysis while relaxing the assumption of Brownian motion. The most simple generalization of the Brownian model is one with one additional parameter to scale the expected covariance under pairs of species. This model is called the λ model of Pagel (Pagel's λ). When λ = 0 the covariance between species is zero and this corresponds to a non-phylogenetic regression. By contrast, when &lambda = 1, the evolution of the residual error is Brownian.

Let's try, then, to repeat our analysis with the λ model.

modelo2<-gls(Lnote~Lnalt, data=datos, correlation=corPagel(1,arbol.cortado))
summary(modelo2)

## Generalized least squares fit by REML
##   Model: Lnote ~ Lnalt 
##   Data: datos 
##         AIC       BIC   logLik
##   -67.82689 -62.09094 37.91345
## 
## Correlation Structure: corPagel
##  Formula: ~1 
##  Parameter estimate(s):
##   lambda 
## 1.015045 
## 
## Coefficients:
##                   Value  Std.Error   t-value p-value
## (Intercept) -0.16751498 0.15009041 -1.116094  0.2730
## Lnalt        0.05278484 0.02146686  2.458899  0.0197
## 
##  Correlation: 
##       (Intr)
## Lnalt -0.869
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -1.4063402 -0.9813780 -0.6620451 -0.4260508  2.0281622 
## 
## Residual standard error: 0.1340445 
## Degrees of freedom: 33 total; 31 residual

What value of λ did you obtain? Did the value of the slope change between the two models?

We can also conduct multiple regression with the same framework. Let's now look at the relationship between body size & various different parameters of song.

modelo3<-gls(Lnote~Lnalt+wing, data=datos, correlation=corPagel(1, arbol.cortado))
summary(modelo3)

## Generalized least squares fit by REML
##   Model: Lnote ~ Lnalt + wing 
##   Data: datos 
##        AIC       BIC   logLik
##   -68.5887 -61.58271 39.29435
## 
## Correlation Structure: corPagel
##  Formula: ~1 
##  Parameter estimate(s):
##   lambda 
## 1.018194 
## 
## Coefficients:
##                  Value Std.Error   t-value p-value
## (Intercept) -1.4726160 0.5303303 -2.776790  0.0094
## Lnalt        0.0665875 0.0208823  3.188697  0.0033
## wing         0.2723970 0.1062074  2.564766  0.0156
## 
##  Correlation: 
##       (Intr) Lnalt 
## Lnalt -0.487       
## wing  -0.964  0.276
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -1.7663313 -1.3894968 -0.9822076 -0.5451497  1.4419755 
## 
## Residual standard error: 0.1279261 
## Degrees of freedom: 33 total; 30 residual

Now let's try to repeat these analyses with caper. This package funcions in a manner that it is a little bit different. We need to combine our data and tree into an object with a special class: "comparative.data". In addition to the tree, this object contains a data frame in which one column contains the names of the species. The easiest way for us to get this data matrix is just to re-read our dataset from file and change the value of the argument row.names as follows:

library(caper)

## Warning: package 'caper' was built under R version 3.3.1

## Loading required package: MASS

## Loading required package: mvtnorm

datos<-read.csv("Barbetdata.csv",header=TRUE)
comp.data<-comparative.data(arbol.cortado, datos, names.col="Species", vcv.dim=2, warn.dropped=TRUE)
modelo4<-pgls(Lnote~Lnalt, data=comp.data)
summary(modelo4)

## 
## Call:
## pgls(formula = Lnote ~ Lnalt, data = comp.data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.04816 -0.01198  0.01042  0.02527  0.04437 
## 
## Branch length transformations:
## 
## kappa  [Fix]  : 1.000
## lambda [Fix]  : 1.000
## delta  [Fix]  : 1.000
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -0.179755   0.149409 -1.2031  0.23804  
## Lnalt        0.054754   0.021624  2.5321  0.01662 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02551 on 31 degrees of freedom
## Multiple R-squared: 0.1714,  Adjusted R-squared: 0.1446 
## F-statistic: 6.412 on 1 and 31 DF,  p-value: 0.01662

modelo5<-pgls(Lnote~Lnalt, data=comp.data, lambda="ML")
summary(modelo5)

## 
## Call:
## pgls(formula = Lnote ~ Lnalt, data = comp.data, lambda = "ML")
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.04816 -0.01198  0.01042  0.02527  0.04437 
## 
## Branch length transformations:
## 
## kappa  [Fix]  : 1.000
## lambda [ ML]  : 1.000
##    lower bound : 0.000, p = 1.3578e-08
##    upper bound : 1.000, p = 1    
##    95.0% CI   : (0.952, NA)
## delta  [Fix]  : 1.000
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -0.179755   0.149409 -1.2031  0.23804  
## Lnalt        0.054754   0.021624  2.5321  0.01662 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02551 on 31 degrees of freedom
## Multiple R-squared: 0.1714,  Adjusted R-squared: 0.1446 
## F-statistic: 6.412 on 1 and 31 DF,  p-value: 0.01662

modelo6<-pgls(Lnote~Lnalt+wing, data=comp.data, lambda="ML")
summary(modelo6)

## 
## Call:
## pgls(formula = Lnote ~ Lnalt + wing, data = comp.data, lambda = "ML")
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.03564 -0.01143  0.00644  0.01945  0.04477 
## 
## Branch length transformations:
## 
## kappa  [Fix]  : 1.000
## lambda [ ML]  : 1.000
##    lower bound : 0.000, p = 1.5e-08
##    upper bound : 1.000, p = 1    
##    95.0% CI   : (0.940, NA)
## delta  [Fix]  : 1.000
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -1.338053   0.547467 -2.4441 0.020617 * 
## Lnalt        0.067619   0.021240  3.1836 0.003378 **
## wing         0.240871   0.110005  2.1896 0.036464 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02408 on 30 degrees of freedom
## Multiple R-squared: 0.2856,  Adjusted R-squared: 0.2379 
## F-statistic: 5.995 on 2 and 30 DF,  p-value: 0.006449

modelo7<-pgls(patch~wing, data=comp.data, lambda="ML")
summary(modelo7)

## 
## Call:
## pgls(formula = patch ~ wing, data = comp.data, lambda = "ML")
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0708 -0.2744  0.0189  0.5531  1.1698 
## 
## Branch length transformations:
## 
## kappa  [Fix]  : 1.000
## lambda [ ML]  : 0.752
##    lower bound : 0.000, p = 0.017467
##    upper bound : 1.000, p = 0.019576
##    95.0% CI   : (0.183, 0.978)
## delta  [Fix]  : 1.000
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  35.5609    14.4246  2.4653  0.01943 *
## wing         -6.6440     3.1915 -2.0818  0.04571 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6115 on 31 degrees of freedom
## Multiple R-squared: 0.1227,  Adjusted R-squared: 0.09435 
## F-statistic: 4.334 on 1 and 31 DF,  p-value: 0.04571

An interesting feature of caper is that it also permits us to easily visualize, for instance, a likelihood surface for our parameters. Here is the likelihood surface for the parameter λ:

lm.lk<-pgls.profile(modelo7, which="lambda")
plot(lm.lk)

plot of chunk unnamed-chunk-12

Challenge problem 2: Comparing GLS with independent contrasts regression in R

Using contrasts regression and the method of phylogenetic GLS fit a linear model for the variables wing and Lnalt. Compare the two fitted models from these different methods. In what ways are they similar or different?

The files once again are:

Written by Alejandro Gonzalez-Voyer y Liam J. Revell. Updated 27 Jun. 2016.