R is at once a statistical software package, a scientific computing environment, and a programming language. R is free and open source.

R can be intimidating, but most users of R are not doing R programming. Using R does, however, require that you enter commands into a command-line interface.

The two main components of an R session are *objects* and *functions*.
Objects are variables, data, and results that we have input, uploaded, or
created in memory. Functions are special types of objects that take one or more
*arguments* and then do something (e.g., create a new object, make
a visualization, or write a file).

R is built on *contributed packages*, which mostly contain libraries
of new, related R functions. Most contributed packages are stored in a public
repository called CRAN (the *Comprehensive R Archive Network*).
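
As a minimal sketch of how packages are used in practice: `install.packages` downloads a package from CRAN, and `library` loads an installed package into the session. The package name "ape" below is only an illustrative assumption, and the install line is commented out because it needs an internet connection.

```r
## install a contributed package from CRAN (run once; needs internet access)
## "ape" is only an illustrative choice of package
# install.packages("ape")

## load an installed package into the current session;
## 'stats' ships with every R installation
library(stats)

## check whether a package is available without attaching it
has_stats <- requireNamespace("stats", quietly = TRUE)
has_stats
```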

The best way to learn how to use R is by doing, so let's open a new R
session and proceed. For all the *code chunks* below it should be
possible to copy & paste from this window into your R session to reproduce
the operations we have conducted here. Of course you can also retype the
commands if you prefer. This helps build a familiarity with the environment
if you are new to R.

We can create objects using the *assignment operator*, an arrow that points
towards the object that is being created:

```
n <- 15
n ## this tells us some information about the object, in this case its value
```

```
## [1] 15
```

```
## we can assign objects in either direction
5 -> m
m
```

```
## [1] 5
```

```
## objects are case sensitive
x <- 1
X <- 10
x
```

```
## [1] 1
```

```
X
```

```
## [1] 10
```

```
## all basic mathematical operations are available in R
(10 + 2) * 5
```

```
## [1] 60
```

```
(X / 500) * 12
```

```
## [1] 0.24
```

There are many functions that we can use to facilitate our session in R, but
one very useful one is `ls`, which lists all the R objects in the
current session:

```
name <- "Liam"
n1<-10
n2<-100
m<-0.5
ls()
```

```
## [1] "m" "n" "n1" "n2" "name" "x" "X"
```

We can also supply it with a pattern:

```
ls(pattern="m") ## lists all objects whose names contain "m"
```

```
## [1] "m" "name"
```
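
The pattern is interpreted as a regular expression, so it can be more selective than a plain substring. The sketch below uses a separate environment (the name `e` is hypothetical) so that the result does not depend on what else is in your session.

```r
## work in a separate environment so the example is self-contained
e <- new.env()
assign("m", 0.5, envir = e)
assign("n1", 10, envir = e)
assign("n2", 100, envir = e)
assign("name", "Liam", envir = e)

## "^n" matches names that start with "n"
starts_with_n <- ls(envir = e, pattern = "^n")
starts_with_n

## "[0-9]$" matches names that end in a digit
ends_in_digit <- ls(envir = e, pattern = "[0-9]$")
ends_in_digit
```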

```
## the function ls.str also displays some information about
## the objects we have in memory
ls.str()
```

```
## m : num 0.5
## n : num 15
## n1 : num 10
## n2 : num 100
## name : chr "Liam"
## x : num 1
## X : num 10
```

How did we know which arguments could be input into `ls` or `ls.str`?
Well, if these were functions that we were already
familiar with & we just wanted to remind ourselves, we could use the very
helpful function `args`:

```
args(ls)
```

```
## function (name, pos = -1L, envir = as.environment(pos), all.names = FALSE,
## pattern, sorted = TRUE)
## NULL
```

```
args(ls.str)
```

```
## function (pos = -1, name, envir, all.names = FALSE, pattern,
## mode = "any")
## NULL
```

```
args(args)
```

```
## function (name)
## NULL
```

However, more typically, we might want to examine the help pages for the function. All packages (at least all packages stored on CRAN) have a help page for each function in the package. In many cases (but not always) documentation is extensive, although it may look somewhat unfamiliar to new users. Let's look at a page:

```
help(ls)
```

```
## starting httpd help server ...
```

```
## done
```

```
help.search("anova")
help.search("phylogeny")
```
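
A few related tools can be sketched as follows. `?ls` is shorthand for `help(ls)`, `apropos` searches the names of available objects by regular expression, and `formals` returns a function's arguments as a list that we can inspect programmatically. The variable names `hits` and `arg_names` are only illustrative.

```r
## ?ls is shorthand for help(ls); commented out because it opens a viewer
# ?ls

## find objects on the search path whose names start with "ls"
hits <- apropos("^ls")
hits

## the argument names of ls, as a character vector
arg_names <- names(formals(ls))
arg_names
```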

There are five main types of data object in R: vector, factor, matrix, data frame, and list. All data objects have attributes and values.

Vector: a vector is a series of elements of the same type. It has two attributes: mode and length. Let's look at a few vectors of different types.

```
# mode "numeric"
x<-1:5
x
```

```
## [1] 1 2 3 4 5
```

```
mode(x)
```

```
## [1] "numeric"
```

```
length(x)
```

```
## [1] 5
```

```
# mode "logical"
y<-c(FALSE,TRUE)
y
```

```
## [1] FALSE TRUE
```

```
mode(y)
```

```
## [1] "logical"
```

```
length(y)
```

```
## [1] 2
```

```
# logical vectors can also result from logical operations
x>=3
```

```
## [1] FALSE FALSE TRUE TRUE TRUE
```

```
# mode "character"
z<-c("order","superfamily","family","genus","species")
mode(z)
```

```
## [1] "character"
```

```
length(z)
```

```
## [1] 5
```

```
z
```

```
## [1] "order" "superfamily" "family" "genus" "species"
```
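
Because all elements of a vector must share one type, combining values of different types with `c` silently converts them to the most flexible type present. A small sketch (the object names `a` and `b` are arbitrary):

```r
## logical values become numbers when mixed with numbers
a <- c(1, TRUE, FALSE)
a
mode(a)

## everything becomes a character string when a string is present
b <- c(1, "two", TRUE)
b
mode(b)
```

Here `a` has mode "numeric" and `b` has mode "character".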

We can access individual elements in a vector using numerical indexing. For example:

```
z[2]
```

```
## [1] "superfamily"
```

```
z[c(1,3)]
```

```
## [1] "order" "family"
```

```
i<-c(4,5)
z[i]
```

```
## [1] "genus" "species"
```

```
z[c(1,1,1)]
```

```
## [1] "order" "order" "order"
```

```
# negative index removes the corresponding element
z[-2]
```

```
## [1] "order" "family" "genus" "species"
```

We can also use logical indexing, for example:

```
z[c(TRUE,FALSE,TRUE,TRUE,TRUE)]
```

```
## [1] "order" "family" "genus" "species"
```

Logical indexing combined with the function `which` is a useful way to select
some data from a vector, but not others:

```
x<-runif(n=6,min=0,max=10)
x
```

```
## [1] 2.737364 3.270049 9.766799 5.868191 7.553235 7.995842
```

```
x>=5
```

```
## [1] FALSE FALSE TRUE TRUE TRUE TRUE
```

```
which(x>=5)
```

```
## [1] 3 4 5 6
```

```
x[x>=5]
```

```
## [1] 9.766799 5.868191 7.553235 7.995842
```

```
x[which(x>=5)]
```

```
## [1] 9.766799 5.868191 7.553235 7.995842
```
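
The two forms above give the same result when there are no missing values, but they differ when the vector contains `NA`. The sketch below uses fixed, made-up values (rather than `runif`) so that the results are reproducible.

```r
## a fixed vector so the result is reproducible
v <- c(2.7, 3.3, 9.8, 5.9, 7.6, 8.0)

## conditions can be combined with & (and) or | (or)
in_range <- v[v >= 5 & v < 8]
in_range

## which() gives positions; which.max() gives the position of the maximum
pos <- which(v >= 5 & v < 8)
pos
biggest <- which.max(v)
biggest

## with missing values, logical indexing keeps an NA but which() drops it
w <- c(1, NA, 7)
with_na <- w[w >= 5]
without_na <- w[which(w >= 5)]
with_na
without_na
```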

Vectors (and other R objects) sometimes, but not always, have a third attribute: names. In phylogenetic analysis, our vectors will very frequently have names!

```
x<-1:5
names(x)<-z
x
```

```
## order superfamily family genus species
## 1 2 3 4 5
```
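
Once a vector has names, we can index it by name as well as by position. A short self-contained sketch (the object name `ranks` is arbitrary):

```r
ranks <- c("order", "superfamily", "family", "genus", "species")
x <- setNames(1:5, ranks)

## index by a single name, or by several names at once
x["genus"]
x[c("order", "species")]

## double brackets return the bare value without its name
genus_value <- x[["genus"]]
genus_value
```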

A factor is derived from a vector, but it has the additional attribute of levels:

```
f<-c("Male","Male","Female","Female","Female")
f<-factor(f)
f
```

```
## [1] Male Male Female Female Female
## Levels: Female Male
```

```
# or we could build the same factor from numeric codes
# (note that the level order is now Male, Female)
f<-c(0,0,1,1,1)
f<-factor(f)
levels(f)<-c("Male","Female")
table(f)
```

```
## f
## Male Female
## 2 3
```

```
summary(f)
```

```
## Male Female
## 2 3
```
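
Internally a factor stores integer codes that point into its levels. By default the levels are sorted alphabetically, but we can fix their order with the `levels` argument of `factor`. A minimal sketch (the names `sex`, `f1`, and `f2` are arbitrary):

```r
sex <- c("Male", "Male", "Female", "Female", "Female")

## default: levels in alphabetical order
f1 <- factor(sex)
levels(f1)
as.integer(f1)   # the underlying integer codes

## explicit level order
f2 <- factor(sex, levels = c("Male", "Female"))
levels(f2)
as.integer(f2)
```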

A matrix is a vector arranged in a tabular way. It has the additional attribute dim. This can be seen using the following example:

```
X<-matrix(1:9,3,3)
X
```

```
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
```

```
X<-1:9
dim(X)<-c(3,3)
X
```

```
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
```

```
# this is also handy: fill the matrix by row instead of by column
X<-matrix(1:9,3,3,byrow=TRUE)
X
```

```
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
```

We can call elements of a matrix using numerical indexing as well, in row/column order:

```
X[3,2]
```

```
## [1] 8
```

```
X[,3] # the third column
```

```
## [1] 3 6 9
```

```
X[2,] # the second row
```

```
## [1] 4 5 6
```
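
The same indexing extends to several rows or columns at once. Note that selecting a single row or column returns a plain vector unless we ask R to keep the matrix structure with `drop=FALSE`. A short sketch that rebuilds the matrix so it runs on its own:

```r
X <- matrix(1:9, 3, 3, byrow = TRUE)

## rows 2 and 3, columns 1 and 3
sub <- X[2:3, c(1, 3)]
sub

## a single column as a vector, and as a one-column matrix
col2_vec <- X[, 2]
col2_mat <- X[, 2, drop = FALSE]
dim(col2_vec)   # NULL: a plain vector has no dim attribute
dim(col2_mat)
```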

A data frame is a very important type of object in R as well. It looks like a matrix (although it's actually stored as a list, see below). It is the type of data object that is created by reading (say) a spreadsheet from a file.

```
Y<-data.frame(z,y=1:5,x=5:1)
Y
```

```
## z y x
## 1 order 1 5
## 2 superfamily 2 4
## 3 family 3 3
## 4 genus 4 2
## 5 species 5 1
```
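
Columns of a data frame can be pulled out by name, and rows can be selected with the same indexing we used for matrices. The sketch below uses a smaller, made-up data frame and sets `stringsAsFactors=FALSE` explicitly so that it behaves the same across R versions.

```r
Y <- data.frame(z = c("order", "family", "genus"), y = 1:3, x = 3:1,
    stringsAsFactors = FALSE)

## a column by name, two ways
Y$y
Y[["x"]]

## the second row (still a data frame)
row2 <- Y[2, ]
row2

## values of z for the rows in which y is at least 2
z_sub <- Y[Y$y >= 2, "z"]
z_sub

## a compact summary of the structure
str(Y)
```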

Finally, a list is the most general data structure. It is the way we store all kinds of custom data types (including, most importantly for our purposes, phylogenetic trees). A list can be seen as a vector where each of the elements can be any kind of object. For example:

```
L<-list(z=z,1:2,Y)
length(L)
```

```
## [1] 3
```

```
names(L)
```

```
## [1] "z" "" ""
```

There are multiple ways that we can access the elements of a list:

```
L[[1]]
```

```
## [1] "order" "superfamily" "family" "genus" "species"
```

```
L$z
```

```
## [1] "order" "superfamily" "family" "genus" "species"
```
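
One point that often trips up new users is the difference between single and double brackets: `[[ ]]` extracts the element itself, whereas `[ ]` returns a shorter list. A self-contained sketch using a smaller, made-up list:

```r
L <- list(z = c("order", "family"), 1:2)

## unnamed elements can only be reached by position
second <- L[[2]]
second

## named elements can also be reached by name
L[["z"]]

## single brackets return a list; double brackets return the element
class(L[1])
class(L[[1]])

## new elements can be added by name
L$flag <- TRUE
names(L)
```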

Many utilities in R exist to read & write data from & to files. We'll cover reading & writing phylogenetic data in a subsequent section, but let's first just try reading & writing a couple of different input & output formats.

First, let's use `read.csv` to read in a .csv (comma delimited) file
from disk. To follow along, download the file
anole.data.csv.

```
## this is what comma delimited text looks like:
cat(readLines("anole.data.csv",10),sep="\n")
```

```
## "","SVL","HL","HLL","FLL","LAM","TL"
## "ahli",4.03913,2.88266,3.96202,3.34498,2.8662,4.504
## "alayoni",3.8157,2.70212,3.2795,2.80245,3.07527,4.07265
## "alfaroi",3.52665,2.37816,3.30542,2.48366,2.73387,4.41601
## "aliniger",4.03656,2.89884,3.64623,3.15908,3.15677,4.54173
## "allisoni",4.37539,3.35896,3.96069,3.4462,3.23921,5.05911
## "allogus",4.04014,2.86103,3.94018,3.33829,2.80827,4.52189
## "altitudinalis",3.84299,2.85273,3.25665,2.88466,3.19846,4.16762
## "alumina",3.58894,2.41783,3.44101,2.63466,2.69425,4.66676
## "alutaceus",3.55489,2.43405,3.3511,2.56994,2.78245,4.55473
```

```
## read it in
X<-read.csv("anole.data.csv",header=TRUE,row.names=1)
## let's look at a bit of this object
head(X)
```

```
## SVL HL HLL FLL LAM TL
## ahli 4.03913 2.88266 3.96202 3.34498 2.86620 4.50400
## alayoni 3.81570 2.70212 3.27950 2.80245 3.07527 4.07265
## alfaroi 3.52665 2.37816 3.30542 2.48366 2.73387 4.41601
## aliniger 4.03656 2.89884 3.64623 3.15908 3.15677 4.54173
## allisoni 4.37539 3.35896 3.96069 3.44620 3.23921 5.05911
## allogus 4.04014 2.86103 3.94018 3.33829 2.80827 4.52189
```
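
Writing works in much the same way. As a sketch, the code below writes a small, made-up data frame to a temporary file with `write.csv` and reads it back in; the species labels and values are invented for illustration and are not taken from anole.data.csv.

```r
## a small made-up data set with species names as row names
traits <- data.frame(SVL = c(4.04, 3.53), HL = c(2.88, 2.38),
    row.names = c("sp_a", "sp_b"))

## write to a temporary file so nothing is left behind in the working directory
out_file <- tempfile(fileext = ".csv")
write.csv(traits, out_file)

## read it back, using the first column as row names
back <- read.csv(out_file, header = TRUE, row.names = 1)
back

## check that the round trip preserved the data
same <- isTRUE(all.equal(traits, back))
same
unlink(out_file)
```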

Finally, we can explore some basics of R programming/scripting. For instance, let's use a for loop to compute the average value for each trait across species:

```
averages<-vector(mode="numeric",length=ncol(X))
for(i in 1:ncol(X)) averages[i]<-mean(X[,i])
names(averages)<-colnames(X)
averages
```

```
## SVL HL HLL FLL LAM TL
## 4.096289 2.961037 3.826542 3.253969 3.018961 4.723669
```

R also lets us do this in multiple ways. For instance, we can also use
`apply` family functions:

```
averages<-apply(X,2,mean)
averages
```

```
## SVL HL HLL FLL LAM TL
## 4.096289 2.961037 3.826542 3.253969 3.018961 4.723669
```

Finally, R has dedicated built-in functions that can be handy here:

```
averages<-colMeans(X)
averages
```

```
## SVL HL HLL FLL LAM TL
## 4.096289 2.961037 3.826542 3.253969 3.018961 4.723669
```

In the above example, we used a for loop around very simple code, but we can also loop over multiple lines of code. Just for demonstrative purposes, let's print out the averages we have computed after each iteration of the loop:

```
averages<-setNames(vector(mode="numeric",length=ncol(X)),colnames(X))
for(i in 1:ncol(X)){
    averages[i]<-mean(X[,i])
    print(averages[1:i])
}
```

```
## SVL
## 4.096289
## SVL HL
## 4.096289 2.961037
## SVL HL HLL
## 4.096289 2.961037 3.826542
## SVL HL HLL FLL
## 4.096289 2.961037 3.826542 3.253969
## SVL HL HLL FLL LAM
## 4.096289 2.961037 3.826542 3.253969 3.018961
## SVL HL HLL FLL LAM TL
## 4.096289 2.961037 3.826542 3.253969 3.018961 4.723669
```

```
averages
```

```
## SVL HL HLL FLL LAM TL
## 4.096289 2.961037 3.826542 3.253969 3.018961 4.723669
```
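
One caveat about loops written as `1:ncol(X)`: if the object has zero columns, `1:0` counts down from 1 to 0 and the loop body runs twice. A common safeguard, sketched here on a made-up empty matrix, is to use `seq_len` instead.

```r
## a matrix with three rows and no columns
empty <- matrix(numeric(0), nrow = 3, ncol = 0)

## 1:0 is the vector c(1, 0), so a loop over it would run twice
bad_index <- 1:ncol(empty)
bad_index

## seq_len(0) is empty, so the loop body never runs
good_index <- seq_len(ncol(empty))
count <- 0
for(i in good_index) count <- count + 1
count
```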

In addition to `for` loops, a family of functions that can be used
to iterate over columns or rows of a matrix, or elements of a vector or list, is
the `apply` family of functions. For instance:

```
averages<-apply(X,2,mean)
averages
```

```
## SVL HL HLL FLL LAM TL
## 4.096289 2.961037 3.826542 3.253969 3.018961 4.723669
```

The way we interpret this function call is: *apply* to `X`, over the
second dimension of `X` (the columns), the function `mean`.
`apply` family functions take some getting used to, but they are very
helpful.
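
Other members of the family follow the same idea. As a sketch on a small, made-up data frame: `apply` with `1` as its second argument works over rows, `lapply` applies a function to each element of a list (or each column of a data frame) and returns a list, and `sapply` does the same but simplifies the result to a vector where it can.

```r
D <- data.frame(SVL = c(4.0, 3.8, 3.5), HL = c(2.9, 2.7, 2.4))

## over the first dimension (the rows)
row_avgs <- apply(D, 1, mean)
row_avgs

## over the columns, returning a list
col_list <- lapply(D, mean)
col_list

## over the columns, simplified to a named numeric vector
col_avgs <- sapply(D, mean)
col_avgs
```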

Finally, it is straightforward to write custom functions within R to perform
idiosyncratic tasks. For instance, let's imagine that `colMeans` does
not exist, & create a new function `col_means` to duplicate its operation:

```
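col_means<-function(x,na.rm=TRUE){
    ## one slot per column of x
    obj<-vector(mode="numeric",length=ncol(x))
    ## column sum (dropping NAs if na.rm=TRUE) over the count of non-missing values
    for(i in 1:ncol(x))
        obj[i]<-sum(x[,i],na.rm=na.rm)/sum(!is.na(x[,i]))
    ## label the result with the column names
    setNames(obj,colnames(x))
}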
averages<-col_means(X)
averages
```

```
## SVL HL HLL FLL LAM TL
## 4.096289 2.961037 3.826542 3.253969 3.018961 4.723669
```
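
To check how the function handles missing data, we can compare it with the built-in `colMeans` on a small, made-up matrix that contains an `NA`. The function is repeated here (with `seq_len` in place of `1:ncol(x)`, which is my own substitution) only so that the block runs on its own.

```r
col_means <- function(x, na.rm = TRUE){
    obj <- vector(mode = "numeric", length = ncol(x))
    for(i in seq_len(ncol(x)))
        obj[i] <- sum(x[,i], na.rm = na.rm)/sum(!is.na(x[,i]))
    setNames(obj, colnames(x))
}

## column "a" has one missing value
M <- matrix(c(1, NA, 3, 4, 5, 6), 3, 2,
    dimnames = list(NULL, c("a", "b")))

ours <- col_means(M)
builtin <- colMeans(M, na.rm = TRUE)
ours
builtin
```

Both give 2 for column `a` (the mean of 1 and 3) and 5 for column `b`.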

Neat.

*Developed by Liam J. Revell. Last updated 27 Jun. 2016.*