make.simmap {phytools}

R Documentation

Simulate stochastic character maps on a phylogenetic tree or trees

Description

This function performs stochastic mapping using several methods.

For Q="empirical", it first fits a continuous-time reversible Markov model for the evolution of x and then simulates stochastic character histories using that model and the tip states on the tree. This is the same procedure that is described in Bollback (2006), except that simulation is performed using a fixed value of the transition matrix, Q, instead of by sampling Q from its posterior distribution.

For Q="mcmc", it first samples Q nsim times from the posterior probability distribution of Q using MCMC, then it simulates nsim stochastic maps conditioned on each sampled value of Q.

For Q set to a matrix, it samples stochastic mappings conditioned on the fixed input matrix.

Usage

make.simmap(tree, x, model="SYM", nsim=1, ...)

Arguments

`tree`	a phylogenetic tree as an object of class `"phylo"`, or a list of trees as an object of class `"multiPhylo"`.
`x`	a vector containing the tip states for a discretely valued character, or a matrix containing the prior probabilities of tip states in rows.
`model`	a character string containing the model - options as in `ace`.
`nsim`	number of simulations. If `tree` is an object of class `"multiPhylo"`, then `nsim` simulations will be conducted per tree.
`...`	optional arguments. So far these are:
`pi`	the prior distribution on the root node of the tree. Options are `"equal"`, `"estimated"`, or a vector with the frequencies. If `pi="estimated"`, the stationary distribution is estimated by numerically solving `pi*Q=0` for `pi`, and this is used as the prior on the root. Defaults to `pi="equal"`, which results in the root node being sampled from the conditional scaled likelihood distribution at the root.
`message`	whether or not to print a message containing the rate matrix, `Q`, and the state frequencies. Defaults to `message=TRUE`. For `Q="mcmc"`, the mean value of `Q` from the posterior sample is printed.
`tol`	the tolerance for zero elements in `Q`. Elements less than `tol` are reset to `tol`.
`Q`	a string (`"empirical"` or `"mcmc"`), or a fixed value of the transition matrix, `Q`. If `"empirical"`, then a single value of `Q`, the most likely value, is used for all simulations. If `"mcmc"`, then `nsim` values of `Q` are first obtained from the posterior distribution for `Q` using Bayesian MCMC, and a simulated stochastic character map is generated for each value of `Q`.
`vQ`	a single numeric value or a vector containing the (normal) sampling variances for the MCMC. The order of `vQ` is assumed to be the order of the `index.matrix` in `ace` for the chosen model.
`prior`	a list containing the `alpha` and `beta` parameters of the gamma prior distribution on the transition rates in `Q`. `alpha` and `beta` can be single values, or vectors if different priors are desired for each value in `Q`. As for `vQ`, the order of `prior` is assumed to be the order of `index.matrix` in `ace`. `prior` can also contain the optional logical value `use.empirical`, which tells the function whether or not to give the prior distribution the empirical mean for `Q`. If `TRUE`, then only `prior$beta` is used and `prior$alpha` is set equal to `prior$beta` times the empirical mean of `Q`.
`burnin`, `samplefreq`	the burn-in and the sample frequency for the MCMC, respectively.
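
To make the `pi="estimated"` option concrete, the following base R sketch solves `pi*Q=0` under the constraint that the frequencies sum to one. It is only an illustration of the linear algebra, not necessarily the numerical routine that phytools uses; the helper `stationary` and the two-state matrix `Q` are hypothetical names and values chosen for this example.

```r
# Stationary distribution of a continuous-time Markov chain:
# solve pi %*% Q = 0 subject to sum(pi) = 1.
# (Illustrative helper, not part of phytools.)
stationary <- function(Q) {
  k <- nrow(Q)
  A <- t(Q)                  # pi %*% Q = 0 is equivalent to t(Q) %*% pi = 0
  A[k, ] <- 1                # replace one redundant equation with sum(pi) = 1
  b <- c(rep(0, k - 1), 1)
  setNames(solve(A, b), rownames(Q))
}

# Hypothetical two-state rate matrix; each row sums to zero
Q <- matrix(c(-1,  1,
               2, -2), 2, 2, byrow = TRUE,
            dimnames = list(c("a", "b"), c("a", "b")))

pi_hat <- stationary(Q)      # named pi_hat to avoid masking R's constant pi
pi_hat
```

Because state `b` is left twice as fast as state `a` in this matrix, the chain spends more time in `a` at equilibrium, and the estimated root prior reflects that.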

Details

Uses code modified from ace (by Paradis et al.) to perform Felsenstein's pruning algorithm & compute the likelihood.

As of phytools 0.2-33, x can be either a vector of states or a matrix containing the prior probabilities of tip states in rows. In the latter case, the column names of x should contain the states, and the row names should contain the tip names.

Note that there was a small (but potentially significant) bug in how node states were simulated by make.simmap in versions of phytools<=0.2-26. Between phytools 0.2-26 and 0.2-36 there was also a bug for asymmetric models of character change (e.g., model="ARD"). Finally, between phytools 0.2-33 and phytools 0.2-47 there was an error in use of the conditional likelihoods for the root node, which caused the root node of the tree to be sampled incorrectly. All of these issues should be fixed in the present version.

Q="mcmc" and Q set to a fixed value were introduced in phytools 0.2-53. As of the present version of phytools, these options are still somewhat experimental and should be used with caution.
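
The two newer modes might be called as in the sketch below. The tree, the tip states, the prior settings, and the fixed matrix `Q_fixed` are all hypothetical values picked to keep the run short; `burnin` and `samplefreq` are set far lower than one would want in a real analysis.

```r
library(ape)
library(phytools)

set.seed(1)
tree <- rtree(20)   # hypothetical random 20-tip tree
# Hypothetical binary character; tip labels come out of rtree() in random order
x <- setNames(rep(c("a", "b"), length.out = 20), tree$tip.label)

# Q = "mcmc": one Q is sampled from the posterior for each stochastic map.
# With use.empirical = TRUE only beta is supplied; alpha is derived from it.
mcmc_maps <- make.simmap(tree, x, model = "ER", nsim = 5, Q = "mcmc",
                         prior = list(beta = 2, use.empirical = TRUE),
                         burnin = 200, samplefreq = 20, message = FALSE)

# Q as a matrix: every map is conditioned on the same fixed rates.
Q_fixed <- matrix(c(-1,  1,
                     1, -1), 2, 2, byrow = TRUE,
                  dimnames = list(c("a", "b"), c("a", "b")))
fixed_maps <- make.simmap(tree, x, nsim = 5, Q = Q_fixed, message = FALSE)

# Collect the Q attached to each MCMC-based map
Qs <- lapply(mcmc_maps, function(m) m$Q)
```

Comparing the matrices in `Qs` gives a rough sense of how much uncertainty in `Q` is being propagated into the sampled maps, which is the main reason to prefer `Q="mcmc"` over a single fixed value.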

If tree is an object of class "multiPhylo" then nsim stochastic maps are generated for each input tree.
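
A minimal sketch of the "multiPhylo" case, assuming a hypothetical set of three random trees that share the same ten tip labels:

```r
library(ape)
library(phytools)

set.seed(7)
trees <- rmtree(3, 10)   # three hypothetical random trees with tips t1..t10
# Hypothetical binary character, named by tip label
x <- setNames(rep(c("a", "b"), 5), paste0("t", 1:10))

maps <- make.simmap(trees, x, model = "ER", nsim = 2, message = FALSE)
n_maps <- length(maps)   # nsim maps for each of the 3 input trees
```

Running the simulation across a sample of trees in this way lets the resulting maps reflect uncertainty in the phylogeny as well as in the character history.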

Value

A modified phylogenetic tree of class "phylo" (or a modified "multiPhylo" object, for nsim > 1 or for a "multiPhylo" input) with the following additional elements:

`maps`	a list of named vectors containing the times spent in each state on each branch, in the order in which they occur.
`mapped.edge`	a matrix containing the total time spent in each state along each edge of the tree.
`Q`	the assumed or sampled value of `Q`.
`logL`	the log-likelihood of the assumed or sampled `Q`.
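
The sketch below shows one way to inspect these elements on a single simulated map. The tree and tip states are hypothetical; the final two lines check that `maps` and `mapped.edge` both account for the full length of every edge.

```r
library(ape)
library(phytools)

set.seed(42)
tree <- rtree(10)   # hypothetical random 10-tip tree
x <- setNames(rep(c("a", "b"), 5), tree$tip.label)

m <- make.simmap(tree, x, model = "ER", nsim = 1, message = FALSE)

m$maps[[1]]          # time spent in each state along the first edge, in order
head(m$mapped.edge)  # total time per state (columns) for each edge (rows)
m$Q                  # rate matrix used for the simulation
m$logL               # log-likelihood of that Q

# Per-edge totals recovered from each representation
edge_from_maps   <- sapply(m$maps, sum)
edge_from_matrix <- unname(rowSums(m$mapped.edge))
```

`maps` keeps the order of the state changes along each branch, while `mapped.edge` keeps only the totals, so the second is convenient for summaries and the first is needed for plotting or for counting transitions.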

Author(s)

Liam Revell liam.revell@umb.edu

References

Bollback, J. P. (2006) Stochastic character mapping of discrete traits on phylogenies. BMC Bioinformatics, 7, 88.

Huelsenbeck, J. P., R. Nielsen, and J. P. Bollback (2003) Stochastic mapping of morphological characters. Systematic Biology, 52, 131-138.

Paradis, E., J. Claude, and K. Strimmer (2004) APE: Analyses of phylogenetics and evolution in R language. Bioinformatics, 20, 289-290.

Revell, L. J. (2012) phytools: An R package for phylogenetic comparative biology (and other things). Methods in Ecology and Evolution, 3, 217-223.