\documentclass[a4paper]{article} %\VignetteIndexEntry{Running coalescentMCMC} %\VignettePackage{coalescentMCMC} \usepackage[utf8]{inputenc} \usepackage{fancyvrb} \usepackage{color,natbib} \newcommand{\code}{\texttt} \newcommand{\pkg}{\textsf} \newcommand{\coalmcmc}{\pkg{coalescentMCMC}} \newcommand{\ape}{\pkg{ape}} \author{Emmanuel Paradis} \title{Running Coalescent Analyses With \coalmcmc} \begin{document} \maketitle <>= options(width=60) @ \noindent Coalescent analysis is a powerful approach to investigate the demography of populations using genetic data. The coalescent is a random process describing the coalescent times of a genealogy with respect to population size and mutation rate. In the majority of cases, the genealogy of individuals within a population is unknown. So a coalescent analysis typically integrates over the likely genealogies to make inference on the dynamics of the population. This uses computer-intensive methods such as Monte Carlo simulations of Markov chains. Besides, if priors are defined on the distributions of the parameters, Bayesian inference can be done. Several methods have been proposed for such integrations, although currently there is no consensus on which method is the best, or which ones are the most appropriate in some circumstances \cite{Felsenstein2007}. \coalmcmc\ aims to provide a general framework to run coalescent analyses. In its current version, the package provides a simple MCMC algorithm based on Hastings's ratio. \coalmcmc\ has three main groups of functions that have different roles: \begin{itemize} \item the function \code{coalescentMCMC} itself which runs the chain; \item some functions doing operations on tree which are called by the previous one to move from one tree to another; \item some functions to infer demography from genealogies under various coalescent models which are typically used to analyse the output of a chain run. \end{itemize} The motivating idea behind \coalmcmc\ is that the user can have full control over the analysis. The options of the main function are: <<>>= library(coalescentMCMC) args(coalescentMCMC) @ where \code{ntrees} are the number of trees to output, and \code{tree0} is the initial tree (if not provided, a UPGMA tree from a JC69-based distance matrix is used). \code{model} is either \code{NULL} in which case $\Theta$ is assumed to be constant, or \code{"time"} in which case a model where $\Theta$ follows an exponential growth is used.\footnote{Other models will be implemented later. See the vignette ``CoalescentModels''.} Finally, \code{printevery} is an integer controlling the print frequency of the progress of the chain. The current implementation uses neighborhood rearrangement as proposed in \cite{Kuhner1995} calling the function \code{NeighborhoodRearrangement} at each step of the chain. Five other tree moves are availables taken from \cite{Drummond2002}. This can modified by using other functions described in \code{?treeOperators}. Let us now consider a very simple analysis with the woodmouse data available in \ape. For the purpose of this vignette, we run a very light analysis in order to produce small outputs in a reasonable time. <>= data(woodmouse) out <- coalescentMCMC(woodmouse, ntrees = 300, printevery = 0) @ \begin{Schunk} \begin{Sinput} > data(woodmouse) > out <- coalescentMCMC(woodmouse, ntrees = 300) \end{Sinput} \begin{Soutput} Running the Markov chain: Number of generations to run: 200 Generation Nb of accepted trees 300 162 Done. \end{Soutput} \end{Schunk} The output object is of class \code{"coda"}, so we can visualise it with the package of the same name (which has already been loaded): \setkeys{Gin}{width=\textwidth} <>= plot(out) @ The log-likelihood was relatively stable between $-1870$ and $-1880$. The trees are stored in a special place of the memory (an `environment' in R's jargon) from where they can be retrieved with a specific function: <<>>= TR <- getMCMCtrees() TR @ Note that the trees generated during the burn-in period are not output, but the corresponding values of log-likelihood and $\Theta$ are. Hence \code{out} has 400 rows. <<>>= dim(out) colnames(out) @ We now run a model of time-dependent coalescent where $\Theta$ follows an exponential change through time: <>= out2 <- coalescentMCMC(woodmouse, ntrees = 300, model = "time", printevery = 0) @ \begin{Schunk} \begin{Sinput} > out2 <- coalescentMCMC(woodmouse, ntrees = 300, model = "time") \end{Sinput} \begin{Soutput} Running the Markov chain: Number of generations to run: 300 Generation Nb of accepted trees 300 154 Done. \end{Soutput} \end{Schunk} <>= plot(out2) @ The change in log-likelihood along the chain is similar to what was observed above. The object \code{out2} has now three columns: <<>>= dim(out2) colnames(out2) @ If we try to extract the trees as previously done and R is running in interactive mode, we will be asked which list of trees to extract: \begin{Verbatim}[formatcom=\color{blue}] > getMCMCtrees() Several lists of MCMC trees are stored: 1 : TREES_001 2 : TREES_002 Return which number? \end{Verbatim} It is also possible to extract the trees of a specific chain with its number, which is useful when the R code is not run interactively (such as this vignette): <<>>= TR2 <- getMCMCtrees(2) @ The parameters of the MCMC runs are stored separately and can be extracted with: <<>>= getMCMCstats() @ We can now compare both coalescent models: the two hypotheses under consideration are: \begin{itemize} \item H$_0$: $\Theta$ is constant; \item H$_1$: $\Theta$ changes through time following an exponential model. \end{itemize} Other things that could be done with simple R commands include: \begin{itemize} \item Compute confidence intervals around $\hat{\Theta}_0$ and $\hat{\rho}$ (alternatively, posterior distributions of these parameters if a Bayesian sampling is done); \item Re-run the chain(s) with different initial trees, for instance to run branching chains taking a tree from \code{TR} or \code{TR2}. \item Use other models of time-dependent coalescent (see the other vignette in this package). \end{itemize} \bibliographystyle{plain} \bibliography{coalescentMCMC} \end{document}