--- title: "Geometric MCMC sampling with geomc" author: "Vivekananda Roy" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true pdf_document: toc: true toc_depth: 2 bibliography: references.bib link-citations: true vignette: > %\VignetteIndexEntry{Geometric MCMC sampling with geomc} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = FALSE, tidy = FALSE, comment = "#>", progress = FALSE ) ``` ## What is Geometric approach to MCMC? Geometric approach to MCMC (`geomc`) is a Riemannian manifold based method proposed by @roy2024 for constructing informative proposal distributions for Metropolis-Hastings (MH) algorithms. Unlike traditional random walk Metropolis algorithms, geometric MH sampling adapts to the Riemannian geometry of the target space. The geometric MH algorithm transports a base uninformed kernel (e.g., a random walk proposal) along geodesic paths informed by global and local approximations of the target density. `geomc` can be used to sample from both discrete and continuous distributions and this vignette illustrates its use for sampling from a target distribution of interest. The method can be effective for: - Multimodal distributions (can move between modes efficiently) - High-dimensional problems (adapts to local geometry) - Complex posterior distributions in Bayesian inference ```{r setup} # Install from GitHub (if not already installed) # devtools::install_github("vroys/geommc") library(geommc) ``` ## Basic Usage The main function is `geomc()`, which requires: - `logp`: A function that evaluates the log-target density (or a list with additional specifications) - `initial`: Starting values for the MCMC chain - `n.iter`: Number of iterations to run We denote the target desity by $\psi$. Following @roy2024, `geomc` constructs informative geometric MH proposals by perturbing a 'baseline' proposal density $f(y|x)$ along directions specified by $\mathcal{G} = \{g_1, g_2, \dots , g_k\}$, a set of $k$ pdfs representing suitable (local or global) approximations of the target density $\psi$. @roy2024 showed that the geometric MH chain dominates the MH chain corresponding the 'base' proposal with respect to different Markov chain orderings by at least a strictly positive factor. Let's start with two simple examples. ## Example 1: Sampling from a Multivariate Normal We'll sample from a bivariate normal distribution to demonstrate the basic workflow. Thus, the target density is $$\psi(y) = \phi_2(y; \mu, \Sigma),$$ where $\phi_d(y; \mu, \Sigma)$ denotes the pdf of the $d-$dimensional normal distribution with mean vector $\mu$, covariance matrix $\Sigma$, and evaluated at $y$. ```{r mvnorm-example} # Define the log-target density log_target_mvnorm <- function(x, target.mean, target.Sigma) { d <- length(x) xc <- x - target.mean Q <- solve(target.Sigma) -0.5 * drop(t(xc) %*% Q %*% xc) } # Target distribution parameters target_mean <- c(1, -2) target_Sigma <- matrix(c(1.5, 0.7, 0.7, 2.0), 2, 2) # Run geomc with default settings set.seed(3) result <- geomc( logp = log_target_mvnorm, initial = c(0, 0), n.iter = 1000, target.mean = target_mean, target.Sigma = target_Sigma ) ``` Note how we pass additional arguments (`target.mean` and `target.Sigma`) through `...` to the `logp` (log-target) function. ### Examining the Results The `geomc()` function returns a list with several components: ```{r examine-results} names(result) ``` The most important components are: - `samples`: The MCMC samples (matrix with `n.iter` rows) - `acceptance.rate`: The proportion of accepted proposals - `log.p`: Log-density values at each sample ```{r results-summary} # Sample means (should be close to target_mean) cat("Sample means:\n") print(colMeans(result$samples)) cat("\nTarget means:\n") print(target_mean) # Sample covariance (should be close to target_Sigma) cat("\nSample covariance:\n") print(cov(result$samples)) cat("\nTarget covariance:\n") print(target_Sigma) ``` ### Visualizing the Results Let's create trace, autocorrelation, density, and scatter plots to assess convergence and mixing. See @roy2020 for a review of MCMC convergence assessment methods. ```{r plots-mvnorm, fig.width=7, fig.height=6} par(mfrow = c(3, 2)) # Trace plots plot(result$samples[, 1], type = "l", main = "Trace plot", ylab = expression(y[1]), xlab = "Iteration") abline(h = target_mean[1], col = "red", lty = 2, lwd = 2) plot(result$samples[, 2], type = "l", main = "Trace plot", ylab = expression(y[2]), xlab = "Iteration") abline(h = target_mean[2], col = "red", lty = 2, lwd = 2) # Density plots plot(density(result$samples[, 1]), main = "Density", xlab = expression(y[1])) abline(v = target_mean[1], col = "red", lty = 2, lwd = 2) plot(density(result$samples[, 2]), main = "Density", xlab = expression(y[2])) abline(v = target_mean[2], col = "red", lty = 2, lwd = 2) ``` ```{r acf-plot, fig.width=6, fig.height=4} # Autocorrelation plot of samples par(mfrow = c(1, 2)) acf(result$samples[, 1], main = "Autocorrelation plot", xlab = expression(y[1]), ylab = "Autocorrelation") acf(result$samples[, 2], main = "Autocorrelation plot", xlab = expression(y[2]), ylab = "Autocorrelation") ``` ```{r scatter-plot, fig.width=6, fig.height=6} # Scatter plot of samples plot(result$samples[, 1], result$samples[, 2], pch = 16, col = rgb(0, 0, 0, 0.3), xlab = expression(y[1]), ylab = expression(y[2]), main = "MCMC Samples") points(target_mean[1], target_mean[2], col = "red", pch = 4, cex = 2, lwd = 3) legend("topright", legend = "True mean", col = "red", pch = 4, lwd = 2) ``` ## Example 2: Bayesian Inference for Normal Data Next, we consider an example involving Bayesian inference for the mean and variance of normal data. ### Model Setup Suppose we observe iid data $w_1, \ldots, w_n \sim N(\mu, \sigma^2)$ and want to infer $\mu$ and $\sigma^2$. Let us denote $(w_1, \ldots, w_n)$ by $w$. We use: - Prior on $\mu$: $\mu \sim N(\mu_0, \tau_0^2)$ - Prior on $\sigma^2$: $\sigma^2 \sim \text{Inverse-Gamma}(\alpha_0, \beta_0)$ The log-posterior (up to a constant) is: $$\log p(\mu, \sigma^2 | w) \propto -\frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n(w_i - \mu)^2 - \frac{(\mu - \mu_0)^2}{2\tau_0^2} - (\alpha_0 + 1)\log(\sigma^2) - \frac{\beta_0}{\sigma^2},$$ and the target density to sample from is $\psi(\mu, \sigma^2)= p(\mu, \sigma^2 | w)$. ```{r normal-inference} # Generate data set.seed(42) true_mu <- 5 true_sigma <- 2 n <- 100 w <- rnorm(n, mean = true_mu, sd = true_sigma) cat("True parameters:\n") cat(" mu =", true_mu, "\n") cat(" sigma =", true_sigma, "\n") cat(" sigma^2 =", true_sigma^2, "\n\n") cat("Sample statistics:\n") cat(" mean =", round(mean(w), 3), "\n") cat(" sd =", round(sd(w), 3), "\n") # Define log-posterior log_target <- function(par, x, mu0, tau0, alpha0, beta0) { mu <- par[1] sigma2 <- par[2] # Check constraint if (sigma2 <= 0) return(-Inf) n <- length(x) SSE <- sum((x - mu)^2) val <- -(n/2) * log(sigma2) - SSE / (2 * sigma2) val <- val - (mu - mu0)^2 / (2 * tau0^2) val <- val - (alpha0 + 1) * log(sigma2) - beta0 / sigma2 return(val) } # Hyperparameters (weakly informative priors) mu0 <- 0 # prior mean for mu tau0 <- 10 # prior sd for mu alpha0 <- 2.01 # shape parameter for inverse-gamma beta0 <- 1.01 # scale parameter for inverse-gamma # Run geomc set.seed(3) result <- geomc( logp = log_target, initial = c(0, 1), # Starting at mu=0, sigma^2=1 n.iter = 1000, x = w, mu0 = mu0, tau0 = tau0, alpha0 = alpha0, beta0 = beta0 ) burnin<-50 ``` ### Trace plots ```{r trace-plots-normal, fig.width=7, fig.height=6} par(mfrow = c(1, 2)) # Trace plots plot(result$samples[, 1], type = "l", main = expression(paste("Trace plot: ", mu)), ylab = expression(mu), xlab = "Iteration") abline(v = burnin, col = "gray", lty = 2) abline(h = true_mu, col = "red", lty = 2, lwd = 2) plot(result$samples[, 2], type = "l", main = expression(paste("Trace plot: ", sigma^2)), ylab = expression(sigma^2), xlab = "Iteration") abline(v = burnin, col = "gray", lty = 2) abline(h = true_sigma^2, col = "red", lty = 2, lwd = 2) ``` ### Posterior Summaries ```{r posterior-summaries} burnin <- 50 # Remove burn-in (first 500 samples) samples_post_burnin <- result$samples[-(1:burnin), ] # Posterior mean estimates mu_post_mean <- mean(samples_post_burnin[, 1]) sigma2_post_mean <- mean(samples_post_burnin[, 2]) sigma_post_mean <- mean(sqrt(samples_post_burnin[, 2])) cat("Posterior mean estimates:\n") cat(" mu:", round(mu_post_mean, 3), "\n") cat(" sigma^2:", round(sigma2_post_mean, 3), "\n") cat(" sigma:", round(sigma_post_mean, 3), "\n\n") # 95% credible intervals mu_ci <- quantile(samples_post_burnin[, 1], c(0.025, 0.975)) sigma2_ci <- quantile(samples_post_burnin[, 2], c(0.025, 0.975)) cat("95% Credible Intervals:\n") cat(" mu: [", round(mu_ci[1], 3), ",", round(mu_ci[2], 3), "]\n") cat(" sigma^2: [", round(sigma2_ci[1], 3), ",", round(sigma2_ci[2], 3), "]\n") ``` ### Visualizing the Posterior ```{r posterior-plots, fig.width=7, fig.height=6} layout(matrix(c(1,2,3), nrow = 1), widths = c(4, 4, 2)) par(mar = c(4, 4, 2, 1)) plot(density(samples_post_burnin[,1]), main=expression(mu), xlab=expression(mu)) abline(v = true_mu, col="red", lty=2, lwd=2) abline(v = mu_post_mean, col="blue", lty=2, lwd=2) plot(density(samples_post_burnin[,2]), main=expression(sigma^2), xlab=expression(sigma^2)) abline(v = true_sigma^2, col="red", lty=2, lwd=2) abline(v = sigma2_post_mean, col="blue", lty=2, lwd=2) par(mar = c(0,0,0,0)) plot.new() legend("center", legend=c("True value","Posterior mean est."), col=c("red","blue"), lty=2, lwd=2, bty="n") ``` ## Understanding the Output The `geomc()` function returns a list with these components: ```{r output-structure} #- **samples**: Matrix of MCMC samples (rows = iterations, columns = parameters) head(result$samples) #- **log.p**: Log-density values at each sample # Trace plot of log-density values plot(result$log.p[-(1:burnin)], type = "l", main = "Trace plot of log-density values", ylab = "log.p", xlab = "Iteration") #- **acceptance.rate**: Proportion of accepted proposals cat("\nAcceptance rate:", round(result$acceptance.rate, 3), "\n") #- **var.base**: Variance of the random walk base density used cat("Variance of the random walk base=", result$var.base, "\n") #- **mean.ap.tar**: Mean of the approximate target density cat("Mean of the approximate target:\n") print(result$mean.ap.tar) #- **var.ap.tar**: Variance of the approximate target density cat("\nVariance of the approximate target:\n") print(result$var.ap.tar) #- **model.case**: Which model configuration was used cat("model configuration:\n") print(result$model.case) #- **gaus**: Whether Gaussian densities for both base and ap.tar were assumed cat("Whether Gaussian densities for both base and ap.tar were assumed=", result$gaus, "\n") #- **ind**: Whether independence was assumed for both base and ap.tar cat("Whether independence was assumed for both base and ap.tar=", result$ind, "\n") ``` Next, we consider more examples to demonstrate different non-default settings and features of the `geomc` function, including: 1. **Custom base densities** - Specifying your own base density 2. **Custom approximate target densities** - Guiding proposals toward specific regions 3. **Bhattacharyya coefficient** - Providing a function to compute the Bhattacharyya coefficient when `gaus` is FALSE (for example, when the target is a discrete distribution) 4. **Importance sampling** - Alternative to **3.** and numerical integration to compute the Bhattacharyya coefficient when `gaus` is FALSE 5. **Mixture models** - Sampling from multimodal target distributions 6. **Discrete distributions** - Sampling from non-continuous targets 7. **Comparison with other methods** - Understanding `geomc`'s advantages over random walk Metropolis We will revisit the **Example 2** later in this document to discuss some of these advanced features of the `geomc` function. Now, we consider some multi-modal target examples. ## Example 3: Univariate Mixture of Normals Let's start with a bimodal distribution: a 50-50 mixture of two normal distributions. In particular, the target density is $$\psi(y) = 0.5 \cdot \phi_1(y; 0, 1) + 0.5 \cdot \phi_1(y; 10, 1.4^2).$$ This is a challenging target to sample from because the modes are well-separated. ## Univariate Mixture of Normals: Run geomc with Default Settings ```{r mixture-default} # Define the log-target log_target_mixture <- function(y) { log(0.5 * dnorm(y) + 0.5 * dnorm(y, mean = 10, sd = 1.4)) } # Run geomc with default settings set.seed(3) result <- geomc( logp = list(log.target = log_target_mixture), initial = 0, n.iter = 1000 ) ``` ### Visualizing the Results ```{r mixture-default-plot, fig.width=7, fig.height=8} par(mfrow = c(2, 1)) # Trace plot plot(result$samples, type = "l", main = "Trace plot: Mixture of Normals", ylab = "y", xlab = "Iteration") abline(h = c(0, 10), col = c("blue", "red"), lty = 2) # Histogram with true density overlay hist(result$samples, breaks = 50, freq = FALSE, main = "Posterior samples vs True density", xlab = "y", col = "lightblue", border = "white") # Add true density y_grid <- seq(-5, 15, length.out = 1000) true_density <- 0.5 * dnorm(y_grid) + 0.5 * dnorm(y_grid, mean = 10, sd = 1.4) lines(y_grid, true_density, col = "red", lwd = 2) legend("topright", legend = "True density", col = "red", lwd = 2) ``` **Observation**: Even with default settings, `geomc` successfully explores both modes! ## Univariate Mixture of Normals: Run geomc with Custom Random Walk Base Density Now let's explicitly specify a random walk base density. This gives us more control over the proposal mechanism. In particular, we use a random walk proposal density $\phi_1(y; x, 4)$ as the base density $f(y|x)$. ```{r mixture-custom-base} set.seed(123) result_custom <- geomc( logp = list( log.target = log_target_mixture, mean.base = function(x) x, # Centered at current state var.base = function(x) 4 # Variance = 4 ), initial = 0, n.iter = 1000 ) #Note the variance of the random walk base density of the default setting cat("Variance of the default geomc =", result$var.base, "\n") cat("The acceptance rate of the default geomc:\n") result$acceptance.rate cat("The acceptance rate of geomc with the custom base density:\n") result_custom$acceptance.rate ``` Note that the scale of the default random walk base is higher than that of the custom random walk base resulting in lower acceptance rates. ```{r mixture-custom-plot, fig.width=7, fig.height=4} par(mfrow = c(1, 2)) hist(result$samples, breaks = 50, freq = FALSE, main = "Default settings", xlab = "y", col = "lightblue", border = "white", ylim = c(0, 0.25)) lines(y_grid, true_density, col = "red", lwd = 2) hist(result_custom$samples, breaks = 50, freq = FALSE, main = "Custom random walk base", xlab = "y", col = "lightgreen", border = "white", ylim = c(0, 0.25)) lines(y_grid, true_density, col = "red", lwd = 2) ``` ## Univariate Mixture of Normals: Run geomc with Informed Approximate Targets We can dramatically improve performance by providing approximate target densities that match the true target structure. We will use two approximate denisites $\mathcal{G} =\{g_1, g_2\}$, where $g_1(y) = \phi(y; 0, 1)$ and $g_2(y) = \phi_1(y; 10, 1.4^2)$. Thus, the set of approximate denisites, $\mathcal{G} = \{g_1, g_2\}$ is a set of two normal densities centered at the different local modes of the target. ```{r mixture-informed} set.seed(123) result_informed <- geomc( logp = list( log.target = log_target_mixture, # Specify the two densities g_1 and g_2 mean.ap.tar = function(x) c(0, 10), # Means of the two densities var.ap.tar = function(x) c(1, 1.4^2) # Variances of the two densities ), initial = 0, n.iter = 1000 ) ``` Note that, in contrast to all earlier examples in which a single approximate target density is used by default, this case enforces two approximate Gaussian target densities. ```{r mixture-informed-plot, fig.width=7, fig.height=8} par(mfrow = c(3, 1)) plot(result$samples, type = "l", main = "Default settings", ylab = "y", xlab = "Iteration") abline(h = c(0, 10), col = c("blue", "red"), lty = 2) plot(result_custom$samples, type = "l", main = "Custom random walk", ylab = "y", xlab = "Iteration") abline(h = c(0, 10), col = c("blue", "red"), lty = 2) plot(result_informed$samples, type = "l", main = "Informed approximate targets", ylab = "y", xlab = "Iteration") abline(h = c(0, 10), col = c("blue", "red"), lty = 2) ``` **Key insight**: With informed approximate targets, the sampler moves between modes very efficiently! ## Univariate Mixture of Normals: Run geomc with Informed Approximate Targets along with a Custom Random Walk Base Density Users can provide both one or more approximate target densities, as well as a custom base density in `geomc`. Since the base and approximate target densities are normal, we can simply specify these by their means and variances. ```{r mixture-informed-apbase} set.seed(123) result_informed_apbase <- geomc( logp = list( log.target = log_target_mixture, mean.base = function(x) x, # specify the base mean var.base = function(x) 4, # specify the base variance # Specify the two densities g_1 and g_2 mean.ap.tar = function(x) c(0, 10), # Means of the two densities var.ap.tar = function(x) c(1, 1.4^2) # Variances of the two densities ), initial = 0, n.iter = 1000 ) # Compare the values of model.case cat("The model configuration under default setting=\n") print(result$model.case) cat("The model configuration under result_custom\n") print(result_custom$model.case) cat("The model configuration under result_informed\n") print(result_informed$model.case) cat("The model configuration under result_informed_apbase\n") print(result_informed_apbase$model.case) ``` ```{r mixture-informed-apbase-plot, fig.width=7, fig.height=8} par(mfrow = c(3, 1)) plot(result$samples, type = "l", main = "Default settings", ylab = "y", xlab = "Iteration") abline(h = c(0, 10), col = c("blue", "red"), lty = 2) plot(result_informed$samples, type = "l", main = "Informed approximate targets", ylab = "y", xlab = "Iteration") abline(h = c(0, 10), col = c("blue", "red"), lty = 2) plot(result_informed_apbase$samples, type = "l", main = "Informed approximate targets and custom base", ylab = "y", xlab = "Iteration") abline(h = c(0, 10), col = c("blue", "red"), lty = 2) ``` ## Univariate Mixture of Normals: Run geomc with other choices of **eps** `geomc` also allows tuning the parameter `eps`. As mentioned in @roy2024, `eps` controls the degree of perturbation of the base density along the directions defined by the approximate targets. In particular, larger `eps` puts less weight on the base density. On the other hand, as `eps` $\downarrow 0$, the transition density of the geometric MH chain converges to that of the MH chain with base proposal density. By default, `eps` is fixed at the moderate value of 0.5, but it can be set at any other proper fraction. ```{r mixture-eps} set.seed(123) result_informed_apbase_eps_large <- geomc( logp = list( log.target = log_target_mixture, mean.base = function(x) x, var.base = function(x) 4, # Specify the two densities g_1 and g_2 mean.ap.tar = function(x) c(0, 10), # Means of the two densities var.ap.tar = function(x) c(1, 1.4^2) # Variances of the two densities ), initial = 0, n.iter = 1000, eps = 0.9 ) set.seed(123) result_informed_apbase_eps_small <- geomc( logp = list( log.target = log_target_mixture, mean.base = function(x) x, var.base = function(x) 4, # Specify the two densities g_1 and g_2 mean.ap.tar = function(x) c(0, 10), # Means of the two densities var.ap.tar = function(x) c(1, 1.4^2) # Variances of the two densities ), initial = 0, n.iter = 1000, eps = 0.01 ) set.seed(123) result_rwm_mix <- geommc:::rwm( log_target_mixture, initial = 0, n_iter = 1000, sig = 4, return_sample = TRUE ) cat("Geometric MCMC:\n") cat("With default eps:\n") cat(" Acceptance rate:", round(result_informed_apbase$acceptance.rate, 3), "\n") cat("With eps =0.9:\n") cat(" Acceptance rate:", round(result_informed_apbase_eps_large$acceptance.rate, 3), "\n") cat("With eps=0.01:\n") cat(" Acceptance rate:", round(result_informed_apbase_eps_small$acceptance.rate, 3), "\n") cat("Random Walk Metropolis:\n") cat(" Acceptance rate:", round(result_rwm_mix$acceptance_rate, 3), "\n") ``` ```{r mixture-informed-apbase-eps-plot, fig.width=7, fig.height=11} par(mfrow = c(2, 1)) plot(result_informed_apbase$samples, type = "l", main = "Informed approximate targets and custom base (Default eps)", ylab = "y", xlab = "Iteration") abline(h = c(0, 10), col = c("blue", "red"), lty = 2) plot(result_informed_apbase_eps_large$samples, type = "l", main = "Informed approximate targets and custom base (eps=0.9)", ylab = "y", xlab = "Iteration") abline(h = c(0, 10), col = c("blue", "red"), lty = 2) plot(result_informed_apbase_eps_small$samples, type = "l", main = "Informed approximate targets and custom base (eps=0.01)", ylab = "y", xlab = "Iteration") abline(h = c(0, 10), col = c("blue", "red"), lty = 2) plot(result_rwm_mix$samples, type = "l", main = "Random walk Matropolis", ylab = "y", xlab = "Iteration") abline(h = c(0, 10), col = c("blue", "red"), lty = 2) ``` We observe that `geomc` with a small `eps` value has similar performance as the random walk Metropolis (the MH chain with the base proposal density). ## Univariate Mixture of Normals: Run geomc with another (non-Gaussian) Informed Approximate Target So far we have used one or two Gaussian approximate target densities. Next, we consider a non-normal approximate target $g=\psi,$ the target density itself. Since $\langle \sqrt{f}, \sqrt{g} \rangle$ is no more available in closed form, it is estimated by numerical integration. ```{r mixture-informed_another} # Sampler that draws from the mixture density g samp.ap.tar_mixture <- function(x, kk = 1) { component <- sample.int(2, 1, prob = c(0.5, 0.5)) if (component == 1) { return(rnorm(1)) } else { return(10 + 1.4 * rnorm(1)) } } set.seed(123) result_informed_another <- geomc( logp = list( log.target = log_target_mixture, dens.ap.tar=function(y,x) 0.5*dnorm(y)+0.5*dnorm(y,mean=10,sd=1.4), samp.ap.tar=samp.ap.tar_mixture ), initial = 0, n.iter = 1000, gaus = FALSE # Not using Gaussian assumption ) ``` Note that, the argument `gaus` is required to be set as FALSE (not the default value). ```{r mixture-informed-another-plot, fig.width=7, fig.height=8} par(mfrow = c(3, 1)) plot(result$samples, type = "l", main = "Default settings", ylab = "y", xlab = "Iteration") abline(h = c(0, 10), col = c("blue", "red"), lty = 2) plot(result_informed$samples, type = "l", main = "Informed normal approximate targets", ylab = "y", xlab = "Iteration") abline(h = c(0, 10), col = c("blue", "red"), lty = 2) plot(result_informed_another$samples, type = "l", main = "Informed mixture approximate target", ylab = "y", xlab = "Iteration") abline(h = c(0, 10), col = c("blue", "red"), lty = 2) ``` ## Univariate Mixture of Normals: Importance Sampling to estimate inner products for non-Gaussian densities Instead of numerical integration to compute $\langle \sqrt{f}, \sqrt{g} \rangle$, we can use importance sampling. We can also specify a custom base density. ```{r mixture-importance} set.seed(123) result_imp <- geomc( logp = list( log.target = log_target_mixture, dens.base = function(y, x) dnorm(y, mean = x, sd = 2), samp.base = function(x) x + 2 * rnorm(1), dens.ap.tar = function(y, x) 0.5 * dnorm(y) + 0.5 * dnorm(y, mean = 10, sd = 1.4), samp.ap.tar = samp.ap.tar_mixture ), initial = 0, n.iter = 1000, gaus = FALSE, # Not using Gaussian assumption imp = list( enabled = TRUE, n.samp = 50, samp.base = FALSE # Draw samples from approximate target density ), show.progress = FALSE ) ``` ```{r importance-plot, fig.width=7, fig.height=8} par(mfrow = c(3, 1)) plot(result_informed$samples, type = "l", main = "Informed normal approximate target", ylab = "y", xlab = "Iteration") abline(h = c(0, 10), col = c("blue", "red"), lty = 2) plot(result_informed_another$samples, type = "l", main = "Informed mixture approximate target (numerical integration)", ylab = "y", xlab = "Iteration") abline(h = c(0, 10), col = c("blue", "red"), lty = 2) plot(result_imp$samples, type = "l", main = "Informed mixture approximate target (importance sampling)", ylab = "y", xlab = "Iteration") abline(h = c(0, 10), col = c("blue", "red"), lty = 2) ``` As discussed in @roy2024, use of a non-normal approximate target density results in extra run time due to computation of $\langle \sqrt{f}, \sqrt{g} \rangle$ by numerical integration or importance sampling in every iteration of the Markov chain, unless the Bhattacharyya coefficient is provided via the argument `bhat.coef`. ## Example 4: Bivariate Mixture of Normals Next, we consider a bivariate multi-modal target density. In particular, we consider the following bivariate mixture of normals: $$\psi(y) = 0.5 \cdot \phi_2\Bigg(y; \begin{pmatrix}0\\0\end{pmatrix}, I\Bigg) + 0.5 \cdot \phi_2\Bigg(y; \begin{pmatrix}10\\10\end{pmatrix}, 2I\Bigg).$$ ## Bivariate Mixture of Normals: Run geomc with Informed Approximate Targets along with a Custom Random Walk Base Density We first run `geomc` with the base density $f(y|x) = \phi_2(y; x, 2I)$ and two approximate denisites: $g_1(y) = \phi_2(y; \mu_1, \Sigma_1)$ and $g_2(y) = \phi_2(y; \mu_2, \Sigma_2)$. Thus, we use a random walk proposal as the base density and the set of approximate denisites, $\mathcal{G} = \{g_1, g_2\}$ is a set of two Gaussian densities centered at the different local modes of the target. ```{r bivariate-mixture} # Define log-target for bivariate mixture log_target_mvnorm_mix <- function(x, mean1, Sigma1, mean2, Sigma2) { ldens1 <- geommc:::ldens_mvnorm(x, mean1, Sigma1) ldens2 <- geommc:::ldens_mvnorm(x, mean2, Sigma2) return(log(0.5 * exp(ldens1) + 0.5 * exp(ldens2))) } # Parameters for the two components mean1 <- c(0, 0) Sigma1 <- diag(2) mean2 <- c(10, 10) Sigma2 <- 2 * diag(2) # Run geomc with informed approximate targets set.seed(123) result_biv <- geomc( logp = list( log.target = log_target_mvnorm_mix, mean.base = function(x) x, var.base = function(x) 2 * diag(2), mean.ap.tar = function(x) c(0, 0, 10, 10), # Means of both densities var.ap.tar = function(x) cbind(diag(2), 2 * diag(2)) # Covariances specified by cbind ), initial = c(5, 5), # Start between the modes n.iter = 1000, show.progress = FALSE, mean1 = mean1, Sigma1 = Sigma1, mean2 = mean2, Sigma2 = Sigma2 ) cat("Acceptance rate:", round(result_biv$acceptance.rate, 3), "\n") cat("\nSample means:\n") print(colMeans(result_biv$samples)) cat("\nExpected mean: (5, 5)\n") ``` ### Visualizing Bivariate Mixture Results ```{r bivariate-plot, fig.width=8, fig.height=8} par(mfrow = c(2, 2)) # Trace plots plot(result_biv$samples[, 1], type = "l", main = "Trace plot: y1", ylab = expression(y[1]), xlab = "Iteration") abline(h = 5, col = "red", lty = 2) plot(result_biv$samples[, 2], type = "l", main = "Trace plot: y2", ylab = expression(y[2]), xlab = "Iteration") abline(h = 5, col = "red", lty = 2) # Marginal densities plot(density(result_biv$samples[, 1]), main = "Marginal density: y1", xlab = expression(y[1])) plot(density(result_biv$samples[, 2]), main = "Marginal density: y2", xlab = expression(y[2])) ``` ```{r bivariate-scatter, fig.width=7, fig.height=7} # Scatter plot plot(result_biv$samples[, 1], result_biv$samples[, 2], pch = 16, col = rgb(0, 0, 0, 0.3), xlab = expression(y[1]), ylab = expression(y[2]), main = "Bivariate Mixture Samples") points(mean1[1], mean1[2], col = "blue", pch = 4, cex = 3, lwd = 3) points(mean2[1], mean2[2], col = "red", pch = 4, cex = 3, lwd = 3) legend("topright", legend = c("Mode 1", "Mode 2"), col = c("blue", "red"), pch = 4, lwd = 2) ``` ## Bivariate Mixture of Normals: Run geomc with a diffuse $g$ Next, we run `geomc` with the default base density and a diffuse choice of $g$, namely $$g(y) = \phi_2\Bigg(y; \begin{pmatrix}0\\0\end{pmatrix}, 400I\Bigg).$$ ```{r bivariate-mixture-diffuse} # Run geomc with diffuse g set.seed(123) result_biv_diffuse <- geomc( logp = list( log.target = log_target_mvnorm_mix, mean.ap.tar = function(x) c(0, 0), var.ap.tar = function(x) 400*diag(2) ), initial = c(5, 5), # Start between the modes n.iter = 1000, mean1 = mean1, Sigma1 = Sigma1, mean2 = mean2, Sigma2 = Sigma2 ) cat("Acceptance rate:", round(result_biv_diffuse$acceptance.rate, 3), "\n") cat("\nSample means:\n") print(colMeans(result_biv_diffuse$samples)) cat("\nExpected mean: (5, 5)\n") ``` ### Visualizing Bivariate Mixture Results ```{r bivariate-plot-diffuse, fig.width=8, fig.height=8} par(mfrow = c(2, 2)) # Trace plots plot(result_biv_diffuse$samples[, 1], type = "l", main = "Trace plot: y1", ylab = expression(y[1]), xlab = "Iteration") abline(h = 5, col = "red", lty = 2) plot(result_biv_diffuse$samples[, 2], type = "l", main = "Trace plot: y2", ylab = expression(y[2]), xlab = "Iteration") abline(h = 5, col = "red", lty = 2) # Marginal densities plot(density(result_biv_diffuse$samples[, 1]), main = "Marginal density: y1", xlab = expression(y[1])) plot(density(result_biv_diffuse$samples[, 2]), main = "Marginal density: y2", xlab = expression(y[2])) ``` We see that even with a diffuse density for $g$, `geomc` successfully moves between the two local modes. ## Bivariate Mixture of Normals: Comparison with Random Walk Metropolis Let's compare the performance of `geomc` with that of a standard random walk Metropolis algorithm on this bivariate mixture target. For this comparison, we use the same scale for the random walk Metropolis as the base random walk density used while running `geomc` with a custom base. ```{r rwm-comparison} # Run random walk Metropolis set.seed(123) result_rwm <- geommc:::rwm( log_target = function(x) { log_target_mvnorm_mix(x, mean1, Sigma1, mean2, Sigma2) }, initial = c(5, 5), n_iter = 1000, sig = 2, return_sample = TRUE ) cat("Random Walk Metropolis:\n") cat(" Acceptance rate:", round(result_rwm$acceptance_rate, 3), "\n") cat(" Sample means:", round(colMeans(result_rwm$samples), 3), "\n\n") cat("Geometric MCMC:\n") cat(" Acceptance rate:", round(result_biv$acceptance.rate, 3), "\n") cat(" Sample means:", round(colMeans(result_biv$samples), 3), "\n") ``` ```{r rwm-plot, fig.width=7, fig.height=7} par(mfrow = c(1, 2)) # Random Walk Metropolis plot(result_rwm$samples[, 1], result_rwm$samples[, 2], pch = 16, col = rgb(0, 0, 0, 0.3), xlab = expression(y[1]), ylab = expression(y[2]), main = "Random Walk Metropolis", xlim = c(-5, 15), ylim = c(-5, 15)) points(mean1[1], mean1[2], col = "blue", pch = 4, cex = 3, lwd = 3) points(mean2[1], mean2[2], col = "red", pch = 4, cex = 3, lwd = 3) # Geometric MCMC plot(result_biv$samples[, 1], result_biv$samples[, 2], pch = 16, col = rgb(0, 0, 0, 0.3), xlab = expression(y[1]), ylab = expression(y[2]), main = "Geometric MCMC", xlim = c(-5, 15), ylim = c(-5, 15)) points(mean1[1], mean1[2], col = "blue", pch = 4, cex = 3, lwd = 3) points(mean2[1], mean2[2], col = "red", pch = 4, cex = 3, lwd = 3) ``` **Key observation**: Random walk Metropolis often gets trapped in one mode, while geomc successfully explores both modes, resulting in estimates of the means close to their true values. ## Example 2 Continued: Bayesian Inference for Normal Data with MALA Base Density So far, we have used only random-walk base densities for both the default and custom settings. We now revisit Example 2, which involves Bayesian inference for the mean and variance of normal data, to demonstrate the use of a custom Metropolis-Adjusted Langevin Algorithm (MALA) base density that leverages gradients of the target posterior density. We reparameterize: $(\mu, \sigma^2) \rightarrow \theta \equiv (\mu, \eta)$ with $\eta = \log \sigma^2$ and use the base density $$ f(\theta'|\theta) = \phi_2(\theta'; \theta + \delta \nabla \log \psi(\theta)/2, \delta I), $$ where $\psi(\theta)$ is the target posterior density of $\theta$. ### Model Setup ```{r normal-inference-MALA} # Generate data set.seed(42) true_mu <- 5 true_sigma <- 2 n <- 100 w <- rnorm(n, mean = true_mu, sd = true_sigma) # Define log-posterior # Log posterior for theta = (mu, eta = log(sigma2)) log_target_normal <- function(theta, x, mu0, tau0, alpha0, beta0) { mu <- theta[1] eta <- theta[2] sigma2 <- exp(eta) n <- length(x) SSE <- sum((x - mu)^2) val <- -(n/2)*eta - SSE/(2*sigma2) val <- val - (mu - mu0)^2/(2*tau0^2) val <- val - (alpha0 + 1)*eta - beta0/sigma2 return(val) } # Gradient of log posterior wrt (mu, eta) grad_log_target_normal <- function(theta, x, mu0, tau0, alpha0, beta0) { mu <- theta[1] eta <- theta[2] sigma2 <- exp(eta) n <- length(x) SSE <- sum((x - mu)^2) # d/dmu grad_mu <- (sum(x) - n*mu)/sigma2 - (mu - mu0)/tau0^2 # d/deta grad_eta <- -(n/2) - (alpha0 + 1) grad_eta <- grad_eta + SSE/(2*sigma2) + beta0/sigma2 return(c(grad_mu, grad_eta)) } # Hyperparameters set as before (weakly informative priors) mu0 <- 0 # prior mean for mu tau0 <- 10 # prior sd for mu alpha0 <- 2.01 # shape parameter for inverse-gamma beta0 <- 1.01 # scale parameter for inverse-gamma #step-size for MALA proposal step <- 0.001 mala_mean<- function(theta) theta+(step/2)*grad_log_target_normal(theta, w, mu0, tau0, alpha0, beta0) # Run geomc set.seed(123) result <- geomc( logp = list(log.target=log_target_normal, mean.base= function(theta) mala_mean(theta), var.base=function(theta) step*diag(2)), initial = c(0, log(1)), # Starting at mu=0, eta= log(1) n.iter = 1000, x = w, mu0 = mu0, tau0 = tau0, alpha0 = alpha0, beta0 = beta0 ) ``` ### Trace plots ```{r trace-plots-normal-mala, fig.width=7, fig.height=6} par(mfrow = c(1, 2)) # Trace plots plot(result$samples[, 1], type = "l", main = expression(paste("Trace plot: ", mu)), ylab = expression(mu), xlab = "Iteration") abline(v = burnin, col = "gray", lty = 2) abline(h = true_mu, col = "red", lty = 2, lwd = 2) plot(exp(result$samples[, 2]), type = "l", main = expression(paste("Trace plot: ", sigma^2)), ylab = expression(sigma^2), xlab = "Iteration") abline(v = burnin, col = "gray", lty = 2) abline(h = true_sigma^2, col = "red", lty = 2, lwd = 2) ``` ## Example 5: Discrete Distribution (Binomial) `geomc` can also sample from discrete target distributions. We consider an example with a Binomial target. Thus, the target pmf is $$ \psi(y) = \binom{m}{y} p^y (1-p)^{m-y}, \quad y = 0, 1, 2, \ldots, m. $$ ## Discrete Distribution (Binomial): Run geomc with Random Walk Base (and user defined `bhat.coef` function) For sampling from a discrete target distribution, we must provide a base pmf $f$ and one or more $g_i$'s. First, we consider a random walk base and a non-informative uniform approximate target pmf with $$ f(y|x) = \begin{cases} 0.5 & \text{if } |y - x| = 1 \\ 0.5 & \text{if } x \in \{0, m\} \text{ and } y = x \\ 0 & \text{otherwise} \end{cases} \quad \text{for } x, y \in \{0, 1, \ldots, m\}, $$ and $$ g(y) = \begin{cases} \frac{1}{m+1} & \text{if } y \in \{0, 1, \ldots, m\} \\ 0 & \text{otherwise}. \end{cases} $$ ```{r binomial-rw} # Binomial parameters size <- 5 #m=size true_prob <- 0.3 #p=0.3 # Define discrete random walk base density dens.base <- function(y, x){ if (x > 0 && x < size) { if (y == x - 1 || y == x + 1) { return(0.5) } else { return(0) } } if (x == 0) { if (y == 0 || y == 1) { return(0.5) } else { return(0) } } if (x == size) { if (y == size - 1 || y == size) { return(0.5) } else { return(0) } } return(0) } samp.base<- function(x) { if (x > 0 && x < size) { return(x + sample(c(-1, 1), 1)) } if (x == 0) { return(sample(c(0, 1), 1)) } if (x == size) { return(sample(c(size - 1, size), 1)) } } # Define approximate target as the discrete uniform distribution dens.ap.tar <- function(y, x) 1 / (size + 1) samp.ap.tar <- function(x, kk = 1) sample(0:size, 1) # Define the Bhattacharyya coefficients \langle \sqrt{f(\cdot|x)}, #\sqrt{g(\cdot|x)} \rangle bhat.coef=function(x) sqrt(2 / (size + 1)) set.seed(123) result_binom_rw <- geomc( logp = list( log.target = function(y) dbinom(y, size, true_prob, log = TRUE), dens.base = dens.base, samp.base = samp.base, dens.ap.tar = dens.ap.tar, samp.ap.tar = samp.ap.tar, bhat.coef=bhat.coef ), initial = 2, n.iter = 1000, gaus = FALSE, # Not Gaussian ) cat("Acceptance rate:", round(result_binom_rw$acceptance.rate, 3), "\n") ``` Note that for sampling from discrete target distributions, user must set `gaus` as FALSE and either provide the `bhat.coef` function or set `imp$enabled` as TRUE (these are not the default values of `gaus` and `imp$enabled`). This ensures that the algorithm uses importance sampling rather than numerical integration or closed-form Gaussian expressions for computing $\langle \sqrt{f}, \sqrt{g} \rangle$ for discrete targets. In this example, the Bhattacharyya coefficient is available in closed form, and is provided to the call to `geomc` through the `bhat.coef` function. ### Comparing with True Distribution ```{r binomial-plot-rw, fig.width=7, fig.height=5} # Observed frequencies obs_freq_rw <- table(factor(result_binom_rw$samples, levels = 0:size)) obs_prop_rw <- obs_freq_rw / sum(obs_freq_rw) # True probabilities true_prob_vals <- dbinom(0:size, size, true_prob) # Plot comparison barplot(rbind(obs_prop_rw, true_prob_vals), beside = TRUE, names.arg = 0:size, col = c("lightblue", "red"), main = "Binomial Distribution: Samples vs Truth", xlab = "Value", ylab = "Probability", legend.text = c("Sampled", "True"), args.legend = list(x = "topright")) ``` ```{r binomial-table-rw} # Detailed comparison comparison <- data.frame( Value = 0:size, Sampled = as.numeric(obs_prop_rw), True = true_prob_vals, Difference = as.numeric(obs_prop_rw) - true_prob_vals ) print(round(comparison, 4)) ``` ## Discrete Distribution (Binomial): Run geomc with Reflecting Random Walk Base (and user defined `bhat.coef` function) Next, we consider a reflecting random walk base pmf with $$ f(y|x) = \begin{cases} 0.5 & \text{if } x \in \{1, 2, \ldots, m-1\} \text{ and } |y - x| = 1 \\ 1 & \text{if } (x = 0 \text{ and } y = 1) \text{ or } (x = m \text{ and } y = m - 1) \\ 0 & \text{otherwise} \end{cases} $$ As before, we use the non-informative uniform approximate target pmf. As in the previous case, the Bhattacharyya coefficient is available in closed form, and is provided to the call to `geomc` through the `bhat.coef` function. ```{r binomial-rrw} # Binomial parameters size <- 5 true_prob <- 0.3 # Define discrete reflecting random walk base density dens.base <- function(y, x){ if (x > 0 && x < size) { if (y == x - 1 || y == x + 1) { return(0.5) } else { return(0) } } if (x == 0) { if (y == 1) { return(1) } else { return(0) } } if (x == size) { if (y == size - 1) { return(1) } else { return(0) } } return(0) } samp.base<- function(x) { if (x == 0) { return(1) } if (x == size) { return(size - 1) } return(x + sample(c(-1, 1), 1)) } # Define approximate target as the discrete uniform distribution dens.ap.tar <- function(y, x) 1 / (size + 1) samp.ap.tar <- function(x, kk = 1) sample(0:size, 1) # Define the Bhattacharyya coefficients \langle \sqrt{f(\cdot|x)}, #\sqrt{g(\cdot|x)} \rangle bhat.coef<- function(x) { ifelse(x == 0 || x == size, 1, sqrt(2)) / sqrt(size + 1) } set.seed(123) result_binom_rrw <- geomc( logp = list( log.target = function(y) dbinom(y, size, true_prob, log = TRUE), dens.base = dens.base, samp.base = samp.base, dens.ap.tar = dens.ap.tar, samp.ap.tar = samp.ap.tar, bhat.coef = bhat.coef ), initial = 2, n.iter = 1000, gaus = FALSE, # Not Gaussian ) cat("Acceptance rate:", round(result_binom_rrw$acceptance.rate, 3), "\n") ``` ### Comparing with True Distribution ```{r binomial-plot-rrw, fig.width=7, fig.height=5} # Observed frequencies obs_freq_rrw <- table(factor(result_binom_rrw$samples, levels = 0:size)) obs_prop_rrw <- obs_freq_rrw / sum(obs_freq_rrw) # True probabilities true_prob_vals <- dbinom(0:size, size, true_prob) # Plot comparison barplot(rbind(obs_prop_rrw, true_prob_vals), beside = TRUE, names.arg = 0:size, col = c("lightblue", "red"), main = "Binomial Distribution: Samples vs Truth", xlab = "Value", ylab = "Probability", legend.text = c("Sampled", "True"), args.legend = list(x = "topright")) ``` ```{r binomial-table-rrw} # Detailed comparison comparison <- data.frame( Value = 0:size, Sampled = as.numeric(obs_prop_rrw), True = true_prob_vals, Difference = as.numeric(obs_prop_rrw) - true_prob_vals ) print(round(comparison, 4)) ``` We observe that the estimated probabilities obtained by `geomc` with the proposal of either of the two random walks as its base pmf $f$ and the discrete uniform pmf as $g$ are close to the target probabilities. `geomc` with these choices of $f$ and $g$ can also be used for sampling from other discrete target distributions. ## Discrete Distribution (Binomial): Run geomc with Uniform Base (an example with `ind` set as TRUE) Finally, we consider a uniform base pmf, that is $$ f(y|x) = \begin{cases} \frac{1}{m+1} & \text{if } y \in \{0, 1, \ldots, m\} \\ 0 & \text{otherwise}, \end{cases} $$ and the approximate target pmf $g$ is another Binomial pmf. In particular, $$ g(y) = \binom{m}{y} p_1^y (1-p_1)^{m-y}, \quad y = 0, 1, 2, \ldots, m, $$ for some $p_1 \neq p$. ```{r binomial} # Binomial parameters size <- 5 true_prob <- 0.3 # Define discrete uniform base density dens.base <- function(y, x) 1 / (size + 1) samp.base <- function(x) sample(0:size, 1) # Define approximate target (another binomial with different probability) dens.ap.tar <- function(y, x) dbinom(y, size, 0.7) #p_1=0.7 samp.ap.tar <- function(x, kk = 1) rbinom(1, size, 0.7) set.seed(123) result_binom <- geomc( logp = list( log.target = function(y) dbinom(y, size, true_prob, log = TRUE), dens.base = dens.base, samp.base = samp.base, dens.ap.tar = dens.ap.tar, samp.ap.tar = samp.ap.tar ), initial = 2, n.iter = 1000, ind = TRUE, # Base and approximate target don't depend on current state gaus = FALSE, # Not Gaussian imp = list( enabled = TRUE, n.samp = 1000, samp.base = TRUE #samples from the base density is used in the importance sampling ) ) cat("Acceptance rate:", round(result_binom$acceptance.rate, 3), "\n") ``` Note that since the base and approximate target don't depend on current state, `ind` is set as TRUE implying $\langle f, g \rangle$ is computed only once instead of in every MCMC iteration. Also since the `bhat.coef` function is not provided (although the Bhattacharyya coefficient is available in closed form), we set `imp$enabled` as TRUE (not the default value). This ensures that the algorithm uses importance sampling rather than numerical integration or closed-form Gaussian expressions for computing $\langle \sqrt{f}, \sqrt{g} \rangle$ for discrete targets. ### Comparing with True Distribution ```{r binomial-plot, fig.width=7, fig.height=5} # Observed frequencies obs_freq<- table(factor(result_binom$samples, levels = 0:size)) obs_prop <- obs_freq / sum(obs_freq) # True probabilities true_prob_vals <- dbinom(0:size, size, true_prob) # Plot comparison barplot(rbind(obs_prop, true_prob_vals), beside = TRUE, names.arg = 0:size, col = c("lightblue", "red"), main = "Binomial Distribution: Samples vs Truth", xlab = "Value", ylab = "Probability", legend.text = c("Sampled", "True"), args.legend = list(x = "topright")) ``` ```{r binomial-table} # Detailed comparison comparison <- data.frame( Value = 0:size, Sampled = as.numeric(obs_prop), True = true_prob_vals, Difference = as.numeric(obs_prop) - true_prob_vals ) print(round(comparison, 4)) ``` ## Summary of Non-Default Settings ### When to Use Custom Specifications 1. **Custom base density**: When you have an idea of appropriate scale for random walk, or want to use a heavy-tailed or a non-random walk (e.g. MALA) proposal 2. **Custom approximate target**: When you know the structure of your target (e.g., mixture components, multimodality) 3. **Bhattacharyya coefficient**: When `gaus = FALSE`, the Bhattacharyya coefficient should be provided via the function `bhat.coef`, if it is available in closed form or if computing it without numerical integration or importance sampling is preferred. 4. **Importance sampling**: When `gaus = FALSE`, `bhat.coef` is not provided, and dimension is not small so that numerical integration is expensive or target is a discrete distribution 5. **Discrete distributions**: Set `gaus = FALSE` and provide appropriate base and ap.tar discrete densities. 6. **Base and approximate targets don't depend on the current state**: If both base and approximate targets don't depend on the current state, `ind` should be set TRUE to avoid unnecessary inner product computation in every Markov chain iteration. ## References