%% How to compile by hand, not in package: % R --vanilla % library(knitr) % knit("intro.Rnw") %knit2pdf("intro.Rnw") %do not ever type .tex!!!! %after creating intro.pdf: %comment out the first code chunk (options) %delete .bbl and .blg files %copy the pdf to inst/doc %build %R CMD check --as-cran %ship \documentclass[11pt]{article} %\VignetteEngine{knitr::knitr} %\VignetteIndexEntry{Using mcmcse} %\VignettePackage{mcmcse} %\VignetteKeywords{Markov chain Monte Carlo, batch means} \usepackage{amsmath} \usepackage[sort,longnamesfirst]{natbib} \usepackage{verbatim} % useful for program listings \usepackage{amsfonts, framed} \usepackage[small,compact]{titlesec} \title{An Introduction to Estimating Monte Carlo Standard Errors with R Package \texttt{mcmcse} } \author{Dootika Vats, Kushagra Gupta} \begin{document} \maketitle \setlength\parindent{0pt} \tableofcontents \break \section{Introduction} The R package \texttt{mcmcse} provides estimates of Monte Carlo standard errors for Markov chain Monte Carlo (MCMC) when estimating means or quantiles of functions of the MCMC output. In addition to MCMC output, the package can be used for time series and other correlated processes. The package is predominantly useful after MCMC output has been obtained by the user. In addition to estimating the Monte Carlo standard errors, the package also provides univariate and multivariate estimates of effective sample size and tools to determine whether enough Monte Carlo samples have been obtained. The package also samples a stationary Markov chain from a bivariate normal target. There are also some graphical tools to ascertain the behavior of the Monte Carlo estimates. %<>= %library(knitr) %opts_chunk$set(comment = NA,background='white') %opts_knit$set(out.format = "latex") %knit_theme$set("seashell") %@ \bigskip \section{An MCMC Example} To illustrate the use of our package, we consider sampling from a bivariate normal distribution using a Gibbs sampler. For $\omega_1, \omega_2 > 0$ and $\rho$ such that $\rho^2 < \omega_1 \omega_2$, the target distribution is \[ \left(\begin{array}{c} X_1 \\ X_2 \end{array} \right) \sim N \begin{pmatrix} \begin{pmatrix} \mu_{1}\\ \mu_{2} \end{pmatrix}, \begin{pmatrix} \omega_1 & \rho \\ \rho & \omega_2 \end{pmatrix} \end{pmatrix} \; . \] The Gibbs sampler updates the Markov chain by using the following full conditional distributions transition equations: \[ X_{1} \mid X_{2} \sim N\left(\mu_{1} + \dfrac{\rho}{\omega_2}\left(X_{2} - \mu_{2}\right) , \omega_1 - \dfrac{\rho^{2}}{\omega_2}\right) \; , \] \[ X_{2} \mid X_{1} \sim N\left(\mu_{2} + \dfrac{\rho}{\omega_1}\left(X_{1} - \mu_{1}\right) , \omega_2 - \dfrac{\rho^{2}}{\omega_1}\right) \; . \] The function \texttt{BVN\_Gibbs} in the package draws samples from the above model. <>= library(mcmcse) mu = c(2, 50) sigma = matrix(c(1, 0.5, 0.5, 1), nrow = 2) # Monte Carlo sample size is N N <- 5e3 set.seed(100) chain <- BVN_Gibbs(n = N, mu = mu, sigma = sigma) @ For using the \texttt{mcmcse} package the rows of the MCMC output should store each iteration of the algorithm and so the output should have $n$ rows and $p$ columns. We will denote each row $i$ of the output as $y_i = (y^{(1)}_i, y^{(2)}_i)$. <>= colnames(chain) <- c("Y1", "Y2") @ <>= #Rows has observations (samples) and each comlumn is a component. head(chain) @ \bigskip This vignette will discuss estimating two sets of features of interest of $F$. \begin{itemize} \item $\text{E}_F y$: For estimating $\mu = \text{E}_Fy$, the estimator is the Monte Carlo sample mean \[ \mu_n = \dfrac{1}{n} \displaystyle \sum_{t=1}^{n} y_t.\] In \texttt{R}, $\mu_n$ is obtained using the usual \texttt{colMeans} function. If $p = 1$, then use \texttt{mean} instead of \texttt{colMeans}. <>= colMeans(chain) @ \item $\text{E}_F \left(y^{(1)2} + y^{(2)2} \right)$: When interested in estimating the sum of the second moments of each component of $y$, define the function $g: \mathbb{R}^2 \to \mathbb{R}$ as $g(x_1,x_2) = x_1^2 + x_2^2$. This is defined in \texttt{R} by creating a function that implements the function $g$, row-wise. <>= g <- function(x) { return(sum(x^2)) } @ The Monte Carlo estimator for $g$ is \[ \mu_{g,n} = \dfrac{1}{n} \displaystyle \sum_{t=1}^{n} g(y_t),\] <>= # Apply the function g to each row gofy <- apply(chain, 1, g) # Monte Carlo estimate mean(gofy) @ \end{itemize} Thus, to obtain Monte Carlo estimates from MCMC output, the base package is sufficient (generally). However, Monte Carlo estimates must be reported with Monte Carlo standard error. That is, if the following central limit theorems hold \begin{equation} \label{eq:clt} \sqrt{n}(\mu_n - \text{E}_F y) \overset{d}{\to} N_p(0, \Sigma)\,, \end{equation} and \begin{equation} \label{eq:g_clt} \sqrt{n}(\mu_{g,n} - \text{E}_F [||y||^2]) \overset{d}{\to} N_p(0, \Sigma_g)\,, \end{equation} then estimates of $\Sigma$ and $\Sigma_g$ must be reported, directly or indirectly. Since the samples obtained are correlated, these quantities require more sophisticated tools than usual sample estimators. (Note that a Markov chain CLT is not always guaranteed to hold. In fact, it depends on the rate of convergence of the Markov chain. Most of the functions in this package assume that a Markov chain CLT holds. Such an assumption is also made when using many of the convergence diagnostics). \bigskip \section{Estimating Monte Carlo Standard Error} In this package, the functions \texttt{mcse}, \texttt{mcse.mat}, \texttt{mcse.multi}, and \texttt{mcse.initseq} estimate the Monte Carlo standard error of $\mu_n$ (or $\mu_{g,n}$). \begin{itemize} \item \texttt{mcse}: consistent estimates of $\sqrt{\Sigma/n}$ (standard error) when $\Sigma$ is $1 \times 1$. \item \texttt{mcse.mat}: consistent estimates of the square root of the diagonals of $\Sigma/n$. \item \texttt{mcse.multi}: consistent estimates of $\Sigma$. \item \texttt{mcse.initseq}: asymptotically conservative estimates of $\Sigma$ using initial sequence estimators. \end{itemize} Using the \texttt{mcmcse} package we can estimate $\Sigma$ in \eqref{eq:clt} with the \texttt{mcse.multi} and \texttt{mcse.initseq} function. <>= # Batch means estimator mcerror_bm <- mcse.multi(x = chain, method = "bm", r = 1, size = NULL, g = NULL, adjust = TRUE, blather = TRUE) # Overlapping batch means estimator mcerror_obm <- mcse.multi(x = chain, method = "obm", r = 1, size = NULL, g = NULL, adjust = TRUE, blather = TRUE) # Spectral variance estimator with Bartlett window mcerror_bart <- mcse.multi(x = chain, method = "bartlett", r = 1, size = NULL, g = NULL, adjust = TRUE, blather = TRUE) # Spectral variance estimator with Tukey window mcerror_tuk <- mcse.multi(x = chain, method = "tukey", r = 1, size = NULL, g = NULL, adjust = TRUE, blather = TRUE) # Initial sequence estimator, unadjusted mcerror_is <- mcse.initseq(x = chain, g = NULL, adjust = FALSE, blather = TRUE) # Initial sequence estimator, adjusted mcerror_isadj <- mcse.initseq(x = chain, g = NULL, adjust = TRUE, blather = TRUE) @ \begin{itemize} \item \texttt{x} takes the $n \times p$ MCMC data. \texttt{x} can take only numeric entries in the form of a matrix or data frame. The rows of \texttt{x} are the iterations of the MCMC. \item \texttt{method = bm, obm, bartlett, tukey} calculates the estimate using the batch means method and spectral variance methods with the modified-Bartlett and Tukey-Hanning windows. \item \texttt{r} is the lugsail parameter that indicates how much to "lift" the lag window (this also applies to \texttt{bm} and \texttt{obm}). Higher values will increasingly remove underestimation of $\Sigma$ but may yield more variable estimators. Values more than 5 are not advised and negative values are not allowed. Reasonable choices are $r = 1, 2, 3$, where $r = 3$ yields the lugsail estimator, $r = 2$ is the flat-top estimator, and $r = 1$ is the vanilla estimator. \item \texttt{size} is the batch size for the \texttt{bm} method and the truncation point for \texttt{tukey} and \texttt{bartlett} methods. Default batch size is calulcated using the exported \texttt{batchSize} function. Other accepted values are \texttt{size = sqroot}, which sets the size as $\lfloor \sqrt{n} \rfloor$ and \texttt{size = cuberoot} which sets it at $\lfloor n^{1/3} \rfloor$. An integer value of \texttt{size} less than $n$ is also valid as long as $n/\texttt{size} > 1$. For reference on \texttt{batchSize} see \cite{liu:vats:2021}. For reference on \texttt{bm} (batch means estimators) see \cite{jones2006fixed} and \cite{vats:fleg:jones:2017b}. For reference on \texttt{bartlett} and \texttt{tukey} (spectral variance estimators)see \cite{flegal2010batch} and \cite{vats2015strong}. For reference on lugsail estimation see \cite{liu:fleg:2018} and \cite{vats:fleg:2018}. \item \texttt{g} is a function that is applied to each row of \texttt{x} and represents the features of interest of the process. Since here we are interested in only means, \texttt{g} is \texttt{NULL}. \texttt{g} will be explained in later examples. \item \texttt{adjust} is a logical argument indicating whether the resulting matrix should be adjusted in order to retain positive-definiteness. By default this is set to be \texttt{TRUE}. \item \texttt{blather} when TRUE outputs under the hood information about the estimation process. The default is set to \texttt{FALSE} since most users should be interested in only \texttt{cov} and \texttt{est}. For reference on \texttt{mcse.initseq} (initial sequence estimators) see \cite{dai2017multivariate}. \end{itemize} \texttt{mcse.multi} and \texttt{mcse.initseq} return an S3 class with multiple components. When \texttt{blather = FALSE}, \texttt{cov} stores the estimate of $\Sigma$ obtained using the method chosen, \texttt{est} stores the estimate of the mean of $g$ applied on the Markov chain. In addition, \texttt{nsim} stores the no. of Markov chain samples, \texttt{eigen\_values} stores the eigen values of the estimated $\Sigma$ and \texttt{cov.adj} stores the adjusted covariance matrix if \texttt{adjust = TRUE} (see \texttt{mcse.multi} for more details). When \texttt{blather = TRUE} the following are returned in addition to the above: \texttt{size} which indicates the size of batches/truncation, \texttt{method} used, \texttt{Adjustment-used} indicating whether an adjusted estimator was used (\texttt{adjust}) and \texttt{message} containing additional information about the estimation process (like the numerical adjustments possibly made to keep the estimate mathematically consistent). \bigskip \textbf{Note: } The Monte Carlo estimates of $\mu$ are not affected by the choice of the method. \bigskip \textbf{Note: } For consistent estimation, the batch means estimators are significantly faster to calculate than the spectral variance estimators. The user is advised to use the default \texttt{method = "bm"} for large input matrices. \bigskip \textbf{Note: }\texttt{cov} returns an estimate of $\Sigma$ and not $\Sigma/n$. \bigskip If the diagonals of $\Sigma$ are $\sigma_{ii}^2$, the function \texttt{mcse} and \texttt{mcse.mat} returns $\sigma_{ii}/\sqrt{n}$. \texttt{mcse} does it for one component and \texttt{mcse.mat} does it for all diagonals. <>= mcse(x = chain[,1], method = "bm", g = NULL) mcse.mat(x = chain, method = "bm", g = NULL) @ In order to estimate $\mu_{n,g}$ and $\Sigma_g$ as in \eqref{eq:g_clt}, we use the \texttt{R} function \texttt{g} we had defined before. Recall that \texttt{g} should be a function that takes vector inputs. <>= g mcerror_g_bm <- mcse.multi(x = chain, g = g, blather = TRUE) mcerror_g_is <- mcse.initseq(x = chain, g = g, blather = TRUE) mcerror_g_bm$cov # Initial Sequence error is larger than batch means, as expected. mcerror_g_is$cov # Returned value is asymptotic variance. # So we calculate the standard error here. sqrt(mcerror_g_bm$cov/N) sqrt(mcerror_g_is$cov/N) @ \bigskip \section{Confidence Regions} Using the function \texttt{confRegion} in the package, the user can create joint confidence regions for two parameters. The input for this function is the output list from the \texttt{mcse.multi} or \texttt{mcse.initseq} function. The function uses the attributes \text{cov}, \texttt{est}, and \texttt{nsim} from the output list. If the \texttt{mcse.initseq} is input and \texttt{adjust = TRUE} had been used, then \texttt{cov.adj} is used instead of \texttt{cov}. \text{mcse.multi} also uses the attribute \texttt{size}. <>= plot(confRegion(mcerror_bm, which = c(1,2), level = .90), type = 'l', asp = 1) lines(confRegion(mcerror_bart, which = c(1,2), level = .90), col = "red") @ \begin{itemize} \item \texttt{which} should be a vector of size 2 that indicates the two components for which the confidence ellipse is to be constructed. \item \texttt{level} is the confidence level of the confidence region. The default is .95 \end{itemize} \bigskip \textbf{NOTE: } \texttt{confRegion} calls on the function \texttt{ellipse} in package \texttt{ellipse} to draw the ellipse. \bigskip \textbf{NOTE: } Since the confidence region is created for two parameters only, the size of the ellipse is determined by setting $p = 2$ irrespective of the original dimension of the problem. \bigskip To determine the effect of the confidence level, we draw two regions with different confidence levels. We use \texttt{mcse.initseq} this time. <>= plot(confRegion(mcerror_is, which = c(1,2), level = .95), type = 'l', asp = 1) lines(confRegion(mcerror_is, which = c(1,2), level = .90), col = "red") @ \section{Effective Sample Size} Reporting $p \times p$ covariance matrix estimates is impractical and uninterpretable. The motivation of estimating Monte Carlo standard error is to ensure that said error is small. This is essentially the idea behind estimating effective sample size and ensuring that the estimated effective sample size is larger than a prespecified lower bound. Before sampling the Markov chain, the user is advised to use the function \texttt{minESS} to ascertain what is the minimum effective sample size needed for stable analysis. See \cite{vats:fleg:jones:2017b} for theoretical details. <>= # For mu minESS(p = 2, alpha = .05, eps = .05) #For mu_g minESS(p = 1, alpha = .05, eps = .05) @ \begin{itemize} \item \texttt{p} is the dimension of the estimation problem. \item \texttt{alpha} is the confidence level \item \texttt{eps} is the tolerance level. Default is .05. Reasonable levels are anywhere from .01 to .05. The smaller the tolerance, the larger the minimum effective samples. \texttt{eps} represents a tolerance level relative to the variability in the target distribution. It is akin to the idea of margin-of-error. \end{itemize} \texttt{minESS} is independent of the Markov chain or process, and is only a function of $p$, $\alpha$, and $\epsilon$. The user should find \texttt{minESS} and then sample their process until the required minimum samples are achieved. Alternatively, we often don't have the luxury of obtaining a lot of samples, and reaching a minimum effective sample size is not possible. In such a scenario, it is useful to know the $\epsilon$ tolerance level the number of estimated effective samples correspond to. So if we can only obtain 1000 effective samples, <>= # For mu minESS(p = 2, alpha = .05, ess = 1000) #For mu_g minESS(p = 1, alpha = .05, ess = 1000) @ Thus, if you obtained a sample with estimated effective sample size equaling 1000 for estimating $\mu_g$ and $\mu_{n,g}$, then the precision level of your estimate is $\epsilon = .137$ and $\epsilon = .124$, respectively. \texttt{multiESS} and \texttt{ess} are two functions that calculate the effective sample size of a correlated sample. \texttt{ess} calculations are based on \cite{gong2015practical} and is component-wise, and \texttt{multiESS} utilizes the multivariate nature of the problem. %<>= %ess(chain) %@ Since \texttt{ess} produces a different estimate for each component, conservative practice dictates choosing the smallest of the values. \texttt{multiESS} returns one estimate of the effective sample size based on the whole sample. The function calls \texttt{mcse.multi} function to obtain a batch means estimate of $\Sigma$. The user can provide another estimate of $\Sigma$ using the \texttt{covmat} argument. <>= multiESS(chain) # Using spectral variance estimators multiESS(chain, covmat = mcerror_bart$cov) # Using initial sequence estimators # Since this is a conservative estimator, ess will be smaller multiESS(chain, covmat = mcerror_is$cov) @ Since the effective sample size is less than the minimum effective samples, we should simulate more. Looking at the formula of ESS, we might need around $10,000$ Monte Carlo samples. <>= set.seed(100) chain <- BVN_Gibbs(1e4, mu, sigma) # larger than 7529 multiESS(chain) # larger than 7529 multiESS(chain, covmat = mcerror_bart$cov) # larger than 7529 multiESS(chain, covmat = mcerror_is$cov) @ So no matter which estimator we choose for the Monte Carlo standard error, $10,000$ Monte Carlo samples are sufficient to have $\epsilon = .05$ relative tolerance. \bigskip \textbf{NOTE:} Ideally, we want to get more samples using the last iteration of the previous Markov chain. However, \texttt{BVN\_Gibbs} does not allow user specified starting values and starts from stationarity itself, so to demonstrate the use of \texttt{minESS} and \texttt{multiESS}, we get a new sample altogether. \bigskip \section{Graphical Diagnostics} The function \texttt{estvssamp} plots the Monte Carlo estimates versus the sample size for a component of the MCMC output. This plot indicates whether the Monte Carlo estimate has stabilized. <>= estvssamp(chain[,1]) @ \bibliographystyle{apalike} \bibliography{mcse} \end{document}