Version: | 0.7.9 |
License: | GPL-3 |
Title: | Markov Chain Monte Carlo Small Area Estimation |
Type: | Package |
LazyLoad: | yes |
Encoding: | UTF-8 |
Language: | en-GB |
Description: | Fit multi-level models with possibly correlated random effects using Markov Chain Monte Carlo simulation. Such models allow smoothing over space and time and are useful in, for example, small area estimation. |
Date: | 2025-06-04 |
Depends: | R (≥ 4.1.0) |
Imports: | Matrix (≥ 1.6.2), Rcpp (≥ 0.11.0), methods, GIGrvg (≥ 0.7), loo (≥ 2.0.0), collapse |
Suggests: | dbarts, BayesLogit, lintools, splines, spdep, sf, bayesplot, coda, posterior, parallel, testthat, roxygen2, knitr, rmarkdown, survey |
LinkingTo: | Rcpp, RcppEigen, Matrix, GIGrvg |
VignetteBuilder: | knitr |
RoxygenNote: | 7.3.2 |
Collate: | 'AR1_sampler.R' 'GMRF_extension.R' 'MCMCsim.R' 'MH.R' 'tabMatrix.R' 'MatrixUtils.R' 'RcppExports.R' 'TMVN_methods.R' 'TMVN_sampler.R' 'aux_closures.R' 'cMVN_sampler.R' 'cholesky.R' 'conjugate_gradients.R' 'f_binomial.R' 'f_gamma.R' 'f_gaussian.R' 'f_multinomial.R' 'f_negbinomial.R' 'f_poisson.R' 'formulas.R' 'kronprod.R' 'mc_bart.R' 'mc_block.R' 'mc_gen.R' 'mc_gl.R' 'mc_mec.R' 'mc_offset.R' 'mc_reg.R' 'mc_vfac.R' 'mc_vreg.R' 'mcmcsae.R' 'model_eval.R' 'modelmatrix.R' 'models.R' 'parallel.R' 'plots.R' 'prediction.R' 'priors.R' 'projection.R' 'random.R' 'samplers.R' 'sbc.R' 'set_constraints.R' 'sparse_template.R' 'utils.R' |
NeedsCompilation: | yes |
Packaged: | 2025-06-04 13:38:46 UTC; hbta |
Author: | Harm Jan Boonstra [aut, cre], Grzegorz Baltissen [ctb] |
Maintainer: | Harm Jan Boonstra <hjboonstra@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-06-04 14:10:02 UTC |
Markov Chain Monte Carlo Small Area Estimation
Description
Fit multi-level models with possibly correlated random effects using MCMC.
Details
Functions to fit multi-level models with Gaussian, binomial, multinomial, negative binomial or Poisson likelihoods using MCMC. Models with a linear predictor consisting of various possibly correlated random effects are supported, allowing flexible modelling of temporal, spatial or other kinds of dependence structures. For Gaussian models the variance can be modelled too. By modelling variances at the unit level the marginal distribution can be changed to a Student-t or Laplace distribution, which may account better for outliers. The package has been developed with applications to small area estimation in official statistics in mind. The posterior samples for the model parameters can be passed to a prediction function to generate samples from the posterior predictive distribution for user-defined quantities such as finite population domain means. For model assessment, posterior predictive checks and DIC/WAIC criteria can easily be computed.
Solve Ax=b by preconditioned conjugate gradients
Description
Solve Ax=b by preconditioned conjugate gradients
Usage
CG(
b,
env,
x = 0 * b,
max.it = length(b),
e = 1e+06 * length(b),
verbose = FALSE,
...
)
Arguments
b |
right hand side vector. |
env |
environment containing at least a function |
x |
start value for the conjugate gradient algorithm. |
max.it |
maximum number of iterations. |
e |
total squared error stop criterion. |
verbose |
whether progress information is shown. |
... |
any parameters passed to |
Value
The (approximated) solution to Ax=b.
References
M.R. Hestenes and E. Stiefel (1952). Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards 49(6), 409-436.
Set options for the conjugate gradient (CG) sampler
Description
Set options for the conjugate gradient (CG) sampler
Usage
CG_control(
max.it = NULL,
stop.criterion = NULL,
preconditioner = c("GMRF", "GMRF2", "GMRF3", "identity"),
scale = 1,
chol.control = chol_control(),
verbose = FALSE
)
Arguments
max.it |
maximum number of CG iterations. |
stop.criterion |
total squared error stop criterion for the CG algorithm. |
preconditioner |
one of "GMRF", "GMRF2", "GMRF3" and "identity". |
scale |
scale parameter; only used by the "GMRF3" preconditioner. |
chol.control |
options for Cholesky decomposition, see |
verbose |
whether diagnostic information about the CG sampler is shown. |
Value
A list of options used by the conjugate gradients algorithm.
Set up a GMRF structure for a generic model component
Description
This function is used to specify a (non-default) GMRF structure
to pass to argument strucA
of function gen
.
Usage
GMRF_structure(
type = c("default", "bym2", "leroux", "Leroux"),
scale.precision = (type == "bym2"),
prior = NULL,
control = NULL
)
Arguments
type |
one of "default", "bym2" or "leroux". The default choice
corresponds to the precision matrix |
scale.precision |
whether to scale the structured precision matrix. By default
set to |
prior |
prior for the parameter |
control |
options for the Metropolis-Hastings sampler used to sample
from the full conditional distribution of parameter |
Value
An environment defining the desired GMRF structure, for use by other package functions.
References
B. Leroux, X. Lei and N. Breslow (1999). Estimation of Disease Rates in Small Areas: A New Mixed Model for Spatial Dependence. In M. Halloran and D. Berry (Eds.), Statistical Models in Epidemiology, the Environment and Clinical Trials, 135-178.
A. Riebler, S.H. Sorbye, D. Simpson and H. Rue (2016). An intuitive Bayesian spatial model for disease mapping that accounts for scaling. Statistical methods in medical research, 25(4), 1145-1165.
Compute MCMC diagnostic measures
Description
R_hat
computes Gelman-Rubin convergence diagnostics based on the MCMC output
in a model component, and n_eff
computes the effective sample sizes, .i.e.
estimates for the number of independent samples from the posterior distribution.
Usage
R_hat(dc)
n_eff(dc, useFFT = TRUE, lag.max, cl = NULL)
Arguments
dc |
a draws component (dc) object corresponding to a model parameter. |
useFFT |
whether to use the Fast Fourier Transform algorithm. Default is |
lag.max |
the lag up to which autocorrelations are computed in case |
cl |
a cluster for parallel computation. |
Value
In case of R_hat
the split R-hat convergence diagnostic for each
component of the vector parameter, and in case of n_eff
the effective
number of independent samples for each component of the vector parameter.
References
A. Gelman and D. B. Rubin (1992). Inference from Iterative Simulation Using Multiple Sequences. Statistical Science 7, 457-511.
A. Gelman, J.B. Carlin, H.S. Stern, D.B. Dunson, A. Vehtari and D.B. Rubin (2013). Bayesian Data Analysis, 3rd edition. Chapman & Hall/CRC.
Examples
ex <- mcmcsae_example()
sampler <- create_sampler(ex$model, data=ex$dat)
sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, n.chain=4, store.all=TRUE)
n_eff(sim$beta)
n_eff(sim$v_sigma)
n_eff(sim$v_rho)
R_hat(sim$beta)
R_hat(sim$llh_)
R_hat(sim$v_sigma)
Convert a draws component object to another format
Description
Use to_mcmc
to convert a draws component to class mcmc.list
,
allowing one to use MCMC diagnostic functions provided by package coda.
Use as.array
to convert to an array of dimension (draws, chains, parameters)
.
The array format is supported by some packages for analysis or visualisation of MCMC
simulation results, e.g. bayesplot.
Use as.matrix
to convert to a matrix, concatenating the chains.
Finally, use to_draws_array
to convert either a draws component or
(a subset of components of) an mcdraws object to a draws_array
object
as defined in package posterior.
Usage
to_mcmc(x)
to_draws_array(x, components = NULL)
## S3 method for class 'dc'
as.array(x, ...)
## S3 method for class 'dc'
as.matrix(x, colnames = TRUE, ...)
Arguments
x |
a component of an mcdraws object corresponding to a scalar or vector model parameter. |
components |
optional character vector of names of draws components in an mcdraws object.
This can be used to select a subset of components to convert to
|
... |
currently ignored. |
colnames |
whether column names should be set. |
Value
The draws component(s) coerced to an mcmc.list
object,
a draws_array
object, an array, or a matrix.
Examples
data(iris)
sampler <- create_sampler(Sepal.Length ~ reg(~ Petal.Length + Species, name="beta"), data=iris)
sim <- MCMCsim(sampler, burnin=100, n.chain=2, n.iter=200)
summary(sim)
if (require("coda", quietly=TRUE)) {
mcbeta <- to_mcmc(sim$beta)
geweke.diag(mcbeta)
}
if (require("posterior", quietly=TRUE)) {
mcbeta <- to_draws_array(sim$beta)
mcbeta
draws <- to_draws_array(sim)
str(draws)
}
str(as.array(sim$beta))
str(as.matrix(sim$beta))
# generate some example data
n <- 250
dat <- data.frame(x=runif(n), f=as.factor(sample(1:5, n, replace=TRUE)))
gd <- generate_data(~ reg(~ x + f, prior=pr_normal(precision=1), name="beta"), data=dat)
dat$y <- gd$y
sampler <- create_sampler(y ~ reg(~ x + f, name="beta"), data=dat)
sim <- MCMCsim(sampler, burnin=100, n.chain=2, n.iter=300)
str(sim$beta)
str(as.array(sim$beta))
bayesplot::mcmc_hist(as.array(sim$beta))
bayesplot::mcmc_dens_overlay(as.array(sim$beta))
# fake data simulation check:
bayesplot::mcmc_recover_intervals(as.array(sim$beta), gd$pars$beta)
bayesplot::mcmc_recover_hist(as.array(sim$beta), gd$pars$beta)
ex <- mcmcsae_example()
plot(ex$dat$fT, ex$dat$y)
sampler <- create_sampler(ex$model, data=ex$dat)
sim <- MCMCsim(sampler, burnin=100, n.chain=2, n.iter=200, store.all=TRUE)
str(sim$beta)
str(as.matrix(sim$beta))
# fake data simulation check:
bayesplot::mcmc_recover_intervals(as.matrix(sim$beta), ex$pars$beta)
bayesplot::mcmc_recover_intervals(as.matrix(sim$u), ex$pars$u)
Run a Markov Chain Monte Carlo simulation
Description
Given a sampler object this function runs a MCMC simulation and stores the
posterior draws. A sampler object for a wide class of multilevel models
can be created using create_sampler
, but users can also define
their own sampler functions, see below.
MCMCsim
allows to choose the parameters for which simulation results
must be stored. It is possible to define derived quantities that will also
be stored. To save memory, it is also possible to only store Monte Carlo
means/standard errors for some large vector parameters, say. Another
way to use less memory is to save the simulation results of large vector
parameters to file.
For parameters specified in plot.trace
trace plots or pair plots of
multiple parameters are displayed during the simulation.
Usage
MCMCsim(
sampler,
from.prior = FALSE,
n.iter = 1000L,
n.chain = 3L,
thin = 1L,
burnin = if (from.prior) 0L else 250L,
start = NULL,
store,
store.all = FALSE,
pred = NULL,
store.mean,
store.sds = FALSE,
to.file = NULL,
filename = "MCdraws_",
write.single.prec = FALSE,
verbose = TRUE,
n.progress = n.iter%/%10L,
trace.convergence = NULL,
stop.on.convergence = FALSE,
convergence.bound = 1.05,
plot.trace = NULL,
add.to.plot = TRUE,
plot.type = "l",
n.cores = 1L,
cl = NULL,
seed = NULL,
export = NULL
)
Arguments
sampler |
sampler object created by |
from.prior |
whether to sample from the prior. By default |
n.iter |
number of draws after burnin. |
n.chain |
number of independent chains. |
thin |
only every |
burnin |
number of draws to discard at the beginning of each chain. |
start |
an optional function to generate starting values or a list containing for each chain a named list of starting values. It may be used to provide starting values for some or all parameters. The sampler object's own start function, if it exists, is called to generate any starting values not provided by the user. |
store |
vector of names of parameters to store MCMC draws for. By default, simulations are
stored for all parameters returned by |
store.all |
if |
pred |
list of character strings defining derived quantities to be computed (and stored) for each draw. |
store.mean |
vector of names of parameters for which only the mean (per chain) is to be stored.
This may be useful for large vector parameters (e.g. regression residuals) for which storing complete
MCMC output would use too much memory. The function |
store.sds |
if |
to.file |
vector of names of parameters to write to file. |
filename |
name of file to write parameter draws to.
Each named parameter is written to a separate file, named |
write.single.prec |
Whether to write to file in single precision. Default is |
verbose |
if |
n.progress |
update diagnostics and plots after so many iterations. |
trace.convergence |
vector of names of parameters for which Gelman-Rubin R-hat diagnostics are printed to the screen every |
stop.on.convergence |
if |
convergence.bound |
threshold used with |
plot.trace |
character vector of parameter names for which to plot draws
during the simulation. For one or two parameters trace plots will be shown,
and if more parameters are specified the results will be displayed in a pairs
plot. For vector parameters a specific component can be selected using brackets,
e.g. |
add.to.plot |
if |
plot.type |
default is "l" (lines). |
n.cores |
the number of cpu cores to use. Default is 1, i.e. no parallel computation.
If an existing cluster |
cl |
an existing cluster can be passed for parallel computation. If |
seed |
a random seed (integer). For parallel computation it is used to independently seed RNG streams for all workers. |
export |
a character vector with names of objects to export to the workers. This may
be needed for parallel execution if expressions in |
Details
A sampler object is an environment containing data and functions to use
for sampling. The following elements of the sampler object are used by
MCMCsim
:
- start
function to generate starting values.
- draw
function to draw samples, typically from a full conditional posterior distribution.
- rprior
function to draw from a prior distribution.
- coef.names
list of vectors of parameter coefficient names, for vector parameters.
- MHpars
vector of names of parameters that are sampled using a Metropolis-Hastings (MH) sampler; acceptance rates are kept for these parameters.
- adapt
function of acceptance rates of
MHpars
to adapt MH-kernel, called every 100 iterations during the burn-in period.
Value
An object of class mcdraws
containing posterior draws as well as some meta information.
Examples
# 1. create a sampler function
sampler <- new.env()
sampler$draw <- function(p) list(x=rnorm(1L), y=runif(1L))
# 2. do the simulation
sim <- MCMCsim(sampler, store=c("x", "y"))
str(sim)
summary(sim)
# example that requires start values or a start function
sampler$draw <- function(p) list(x=rnorm(1L), y=p$x * runif(1L))
sampler$start <- function(p) list(x=rnorm(1L), y=runif(1L))
sim <- MCMCsim(sampler, store=c("x", "y"))
summary(sim)
plot(sim, c("x", "y"))
# example using create_sampler; first generate some data
n <- 100
dat <- data.frame(x=runif(n), f=as.factor(sample(1:4, n, replace=TRUE)))
gd <- generate_data(~ reg(~ x + f, prior=pr_normal(precision=1), name="beta"), data=dat)
dat$y <- gd$y
sampler <- create_sampler(y ~ x + f, data=dat)
sim <- MCMCsim(sampler, burnin=100, n.iter=400, n.chain=2)
(summary(sim))
gd$pars
S4 methods for products of matrix objects
Description
Several methods for products of matrix objects. Here a matrix object can be
an ordinary (dense) matrix
or a (sparse) Matrix
of class
ddiMatrix-class
, tabMatrix
,
dgCMatrix-class
, or dsCMatrix-class
.
The return types are restricted to matrix
or numeric
in case of dense objects, and
dgCMatrix
, dsCMatrix
, ddiMatrix
or tabMatrix
in case of sparse objects.
Usage
## S4 method for signature 'ddiMatrix,matrix'
x %*% y
## S4 method for signature 'dgCMatrix,matrix'
x %*% y
## S4 method for signature 'dsCMatrix,matrix'
x %*% y
## S4 method for signature 'tabMatrix,matrix'
x %*% y
## S4 method for signature 'matrix,tabMatrix'
x %*% y
## S4 method for signature 'tabMatrix,numLike'
x %*% y
## S4 method for signature 'ddiMatrix,dgCMatrix'
x %*% y
## S4 method for signature 'ddiMatrix,tabMatrix'
x %*% y
## S4 method for signature 'CsparseMatrix,tabMatrix'
x %*% y
## S4 method for signature 'tabMatrix,CsparseMatrix'
x %*% y
## S4 method for signature 'matrix,dgCMatrix'
tcrossprod(x, y)
## S4 method for signature 'matrix,tabMatrix'
tcrossprod(x, y)
## S4 method for signature 'matrix,ddiMatrix'
tcrossprod(x, y)
## S4 method for signature 'tabMatrix,missing'
crossprod(x, y)
## S4 method for signature 'tabMatrix,matrix'
crossprod(x, y)
## S4 method for signature 'tabMatrix,dgCMatrix'
crossprod(x, y)
## S4 method for signature 'tabMatrix,tabMatrix'
crossprod(x, y)
## S4 method for signature 'dgCMatrix,matrix'
crossprod(x, y)
Arguments
x |
a matrix object. |
y |
a matrix object. |
Value
A matrix object. In case one of the arguments is a regular (dense) matrix
the result is a matrix
as well.
Simulation based calibration
Description
Simulation based calibration
Usage
SBC_test(
...,
pars,
n.draw = 25L,
n.sim = 20L * n.draw,
burnin = 25L,
thin = 2L,
show.progress = TRUE,
verbose = TRUE,
n.cores = 1L,
cl = NULL,
seed = NULL,
export = NULL
)
Arguments
... |
passed to |
pars |
named list with univariate functions of the parameters to use in test. This list
is passed to argument |
n.draw |
number of posterior draws to retain in posterior simulations. |
n.sim |
number of simulation iterations. |
burnin |
burnin to use in posterior simulations, passed to |
thin |
thinning to use in posterior simulations, passed to |
show.progress |
whether a progress bar should be shown. |
verbose |
set to |
n.cores |
the number of cpu cores to use. Default is one, i.e. no parallel computation.
If an existing cluster |
cl |
an existing cluster can be passed for parallel computation. If |
seed |
a random seed (integer). For parallel computation it is used to independently seed RNG streams for all workers. |
export |
a character vector with names of objects to export to the workers. This may be needed for parallel execution if expressions in the model formulae depend on global variables. |
Value
A matrix with ranks.
References
M. Modrak, A.H. Moon, S. Kim, P. Buerkner, N. Huurre, K. Faltejskova, A. Gelman and A. Vehtari (2023). Simulation-based calibration checking for Bayesian computation: The choice of test quantities shapes sensitivity. Bayesian Analysis, 1(1), 1-28.
Examples
## Not run:
# this example may take a long time
n <- 10L
dat <- data.frame(x=runif(n))
ranks <- SBC_test(~ reg(~ 1 + x, prior=pr_normal(mean=c(0.25, 1), precision=1), name="beta"),
family=f_gaussian(var.prior=pr_invchisq(df=1, scale=list(df=1, scale=1))),
data=dat,
pars=list(mu="beta[1]", beta_x="beta[2]", sigma="sigma_"),
n.draw=9L, n.sim=10L*20L, thin=2L, burnin=20L
)
ranks
## End(Not run)
Functions for specifying the method and corresponding options for sampling from a possibly truncated and degenerate multivariate normal distribution
Description
These functions are intended for use in the method
argument of create_TMVN_sampler
.
Usage
m_direct(use.cholV = NULL)
m_Gibbs(slice = FALSE, eps = sqrt(.Machine$double.eps), diagnostic = FALSE)
m_HMC(Tsim = pi/2, max.events = .Machine$integer.max, diagnostic = FALSE)
m_HMCZigZag(
Tsim = 1,
scale = 1,
prec.eq = NULL,
diagnostic = FALSE,
max.events = .Machine$integer.max,
adapt = FALSE
)
m_softTMVN(
sharpness = 100,
useV = FALSE,
CG = NULL,
PG.approx = TRUE,
PG.approx.m = -2L
)
Arguments
use.cholV |
whether to use the Cholesky factor of the variance instead
of precision matrix for sampling. If |
slice |
if |
eps |
small positive value to control numerical robustness of the algorithm. |
diagnostic |
whether information about violations of inequalities, bounces off inequality walls (for 'HMC' and 'HMCZigZag' methods) or gradient events (for 'HMCZigZag') is printed to the screen. |
Tsim |
the duration of a Hamiltonian Monte Carlo simulated particle trajectory. This can be specified as either a single positive numeric value for a fixed simulation time, or as a function that is applied in each MCMC iteration to generates a simulation time. |
max.events |
maximum number of events (reflections off inequality walls and for method 'HMCZigZag' also gradient events). Default is unlimited. Specifying a finite number may speed up the sampling but may also result in a biased sampling algorithm. |
scale |
vector of Laplace scale parameters for method 'HMCZigZag'. It must be a positive numeric vector of length equal to one or the number of variables. |
prec.eq |
positive numeric vector of length 1 or the number of equality restrictions,
to control the precision by which the equality restrictions are imposed; the larger
|
adapt |
experimental feature: if |
sharpness |
for method 'softTMVN', the sharpness of the soft inequalities; the larger the better the approximation of exact inequalities. It must be a positive numeric vector of length one or the number of inequality restrictions. |
useV |
for method 'softTMVN' whether to base computations on variance instead of precision matrices. |
CG |
use a conjugate gradient iterative algorithm instead of Cholesky updates for sampling
the model's coefficients. This must be a list with possible components |
PG.approx |
see |
PG.approx.m |
see |
Value
A method object, for internal use only.
Compute autocovariance or autocorrelation function via Wiener-Khinchin theorem using Fast Fourier Transform
Description
Compute autocovariance or autocorrelation function via Wiener-Khinchin theorem using Fast Fourier Transform
Usage
ac_fft(x, demean = TRUE)
Arguments
x |
matrix with time (iteration number) along the rows and variables along the columns. |
demean |
whether to subtract from each column its mean. |
Value
A matrix of the same size as x with autocovariances at all lags from 1 to the number of rows.
Return Metropolis-Hastings acceptance rates
Description
Return Metropolis-Hastings acceptance rates
Usage
acceptance_rates(obj, aggregate.chains = FALSE)
Arguments
obj |
an mcdraws object, i.e. the output of function |
aggregate.chains |
whether to return averages over chains or results per chain. |
Value
A list of acceptance rates.
Examples
ex <- mcmcsae_example()
# specify a model that requires MH sampling (in this case for a modelled
# degrees of freedom parameter in the variance part of the model)
sampler <- create_sampler(ex$model, data=ex$dat,
family = f_gaussian(var.model = ~vfac(factor="fA", prior=pr_invchisq(df="modeled")))
)
sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, n.chain=4, store.all=TRUE)
(summary(sim))
acceptance_rates(sim)
Utility function to construct a sparse aggregation matrix from a factor
Description
Utility function to construct a sparse aggregation matrix from a factor
Usage
aggrMatrix(fac, w = 1, mean = FALSE, facnames = FALSE)
Arguments
fac |
factor variable. |
w |
vector of weights associated with the levels of |
mean |
if |
facnames |
whether the factor levels should be used as column names for the aggregation matrix. |
Value
A sparse aggregation matrix of class tabMatrix
.
Examples
n <- 1000
f <- sample(1:100, n, replace=TRUE)
x <- runif(n)
M <- aggrMatrix(f)
all.equal(crossprod_mv(M, x), as.vector(tapply(x, f, sum)))
S4 method for generic 'anyNA' and signature 'tabMatrix'
Description
S4 method for generic 'anyNA' and signature 'tabMatrix'
Usage
## S4 method for signature 'tabMatrix'
anyNA(x, recursive = FALSE)
Arguments
x |
a tabMatrix object. |
recursive |
not used. |
Value
Whether the tabMatrix object contains missings or not.
Create a model component object for a BART (Bayesian Additive Regression Trees) component in the linear predictor
Description
This function is intended to be used on the right hand side of the
formula
argument to create_sampler
or
generate_data
. It creates a BART term in the
model's linear predictor. To use this model component one needs
to have R package dbarts installed.
Usage
brt(
formula,
X = NULL,
n.trees = 75L,
name = "",
debug = FALSE,
keepTrees = FALSE,
...
)
Arguments
formula |
a formula specifying the predictors to be used in the BART
model component. Variable names are looked up in the data frame
passed as |
X |
a design matrix can be specified directly, as an alternative
to the creation of one based on |
n.trees |
number of trees used in the BART ensemble. |
name |
the name of the model component. This name is used in the output of the
MCMC simulation function |
debug |
if |
keepTrees |
whether to store the trees ensemble for each Monte Carlo draw. This
is required for prediction based on new data. The default is |
... |
parameters passed to |
Value
An object with precomputed quantities and functions for sampling from prior or conditional posterior distributions for this model component. Intended for internal use by other package functions.
References
H.A. Chipman, E.I. Georgea and R.E. McCulloch (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics 4(1), 266-298.
J.H. Friedman (1991). Multivariate adaptive regression splines. The Annals of Statistics 19, 1-67.
Examples
# generate data, based on an example in Friedman (1991)
gendat <- function(n=150L, p=10L, sigma=1) {
x <- matrix(runif(n * p), n, p)
mu <- 10*sin(pi*x[, 1] * x[, 2]) + 20*(x[, 3] - 0.5)^2 + 10*x[, 4] + 5*x[, 5]
y <- mu + sigma * rnorm(n)
data.frame(x=x, mu=mu, y=y)
}
train <- gendat()
test <- gendat(n=25)
# keep trees for later prediction based on new data
sampler <- create_sampler(
y ~ brt(~ . - y, name="bart", keepTrees=TRUE),
family = f_gaussian(var.prior=pr_invchisq(df=3, scale=var(train$y))),
data = train
)
# increase burnin and n.iter below to improve MCMC convergence
sim <- MCMCsim(sampler, n.chain=2, burnin=100, n.iter=200, thin=2,
store.all=TRUE, verbose=FALSE)
(summ <- summary(sim))
plot(train$mu, summ$bart[, "Mean"]); abline(0, 1)
# NB prediction is currently slow
pred <- predict(sim, newdata=test,
iters=sample(seq_len(n_draws(sim)), 50),
show.progress=FALSE
)
(summpred <- summary(pred))
plot(test$mu, summpred[, "Mean"]); abline(0, 1)
Set options for Cholesky decomposition
Description
These options are only effective in case the matrix to be decomposed is sparse, i.p.
of class dsCMatrix-class
.
Usage
chol_control(perm = NULL, super = NA, ordering = 0L, inplace = TRUE)
Arguments
perm |
logical scalar, see |
super |
logical scalar, see |
ordering |
an integer scalar passed to CHOLMOD routines determining which reordering schemes are tried to limit sparse Cholesky fill-in. |
inplace |
whether sparse Cholesky updates should re-use the same memory location. |
Value
A list with specified options used for Cholesky decomposition.
Combine multiple mcdraws objects into a single one by combining their chains
Description
This function can be used to combine the results of parallel simulations.
Usage
combine_chains(...)
Arguments
... |
objects of class |
Value
A combined object of class mcdraws
where the number of stored
chains equals the sum of the numbers of chains in the input objects.
Combine multiple mcdraws objects into a single one by combining their draws
Description
This function is used to combine the results of parallel posterior predictive simulations.
Usage
combine_iters(...)
Arguments
... |
objects of class |
Value
A combined object of class mcdraws
where the number of stored
draws equals the sum of the numbers of draws in the input objects.
Compute a list of design matrices for all terms in a model formula, or based on a sampler environment
Description
If sampler
is provided instead of formula
, the design matrices
are based on the model used to create the sampler environment. In that case, if
data
is NULL
, the design matrices stored in sampler
are returned,
otherwise the design matrices are computed for the provided data based on the sampler's model.
The output is a list of dense or sparse design matrices for the model components
with respect to data
.
Usage
computeDesignMatrix(formula = NULL, data = NULL, labels = TRUE)
Arguments
formula |
model formula. |
data |
data frame to be used in deriving the design matrices. |
labels |
if |
Value
A list of design matrices.
Examples
n <- 1000
dat <- data.frame(
x = rnorm(n),
f = factor(sample(1:50, n, replace=TRUE))
)
str(computeDesignMatrix(~ x, dat)[[1]])
model <- ~ reg(~x, name="beta") + gen(~x, factor=~f, name="v")
X <- computeDesignMatrix(model, dat)
str(X)
Correlation factor structures in generic model components
Description
Element 'factor' of a model component created using function
gen
is a formula composed of several possible terms described
below. It is used to derive a (typically sparse) precision matrix for a set of
coefficients, and possibly a matrix representing a set of linear constraints
to be imposed on the coefficient vector.
- iid(f)
Independent effects corresponding to the levels of factor
f
.- RW1(f, circular=FALSE, w=NULL)
First-order random walk over the levels of factor
f
. The random walk can be made circular and different (fixed) weights can be attached to the innovations. If specified,w
must be a positive numeric vector of length one less than the number of factor levels. For example, if the levels correspond to different times, it would often be reasonable to choosew
proportional to the reciprocal time differences. For equidistant times there is generally no need to specifyw
.- RW2(f)
Second-order random walk.
- AR1(f, phi, w=NULL, control=NULL)
First-order autoregressive correlation structure among the levels of
f
. Argumentphi
can be a single numerical value of the autoregressive parameter, or an appropriate prior specification if phi should be inferred. If not supplied, a uniform prior on (-1, 1] is assumed. For irregularly spaced AR(1) processes weights can be specified, in the same way as forRW1
.- season(f, period)
Dummy seasonal with period
period
.- spatial(f, graph, snap, queen)
CAR spatial correlation. Argument
graph
can either be an object of (S4) classSpatialPolygonsDataFrame
or an object of (S3) classsf
. The latter can be obtained, e.g., by reading in a shape file using functionst_read
. Argumentssnap
andqueen
are passed topoly2nb
, which computes a neighbours list. Alternatively, a neighbours list object of classnb
can be passed directly as argumentgraph
.- splines(f, knots, degree)
P-splines, i.e. penalised B-splines structure over the domain of a quantitative variable f. Arguments knots and degree are passed to
splineDesign
. Ifknots
is a single value it is interpreted as the number of knots, otherwise as a vector of knot positions. By default 40 equally spaced knots are used, and a degree of 3.- custom(f, D=NULL, Q=NULL, R=NULL, derive.constraints=NULL)
Either a custom precision or incidence matrix associated with factor f can be passed to argument
Q
orD
. Optionally a constraint matrix can be supplied asR
, or constraints can be derived from the null space of the precision matrix by settingderive.constraints=TRUE
.
Usage
iid(name)
RW1(name, circular = FALSE, w = NULL)
RW2(name)
AR1(name, phi, w = NULL, control = NULL)
season(name, period)
spatial(
name,
graph = NULL,
snap = sqrt(.Machine$double.eps),
queen = TRUE,
poly.df = NULL,
derive.constraints = FALSE
)
splines(name, knots, degree)
custom(name, D = NULL, Q = NULL, R = NULL, derive.constraints = NULL)
Arguments
name |
name of a variable, unquoted. |
circular |
whether the random walk is circular. |
w |
a vector of weights. |
phi |
prior distribution, or fixed value, for an
autoregressive parameter. The default is a uniform prior over
the interval [-1, 1]. A single numeric value is interpreted as
a fixed value, corresponding to a degenerate prior, which can also
be specified as |
control |
options for Metropolis-Hastings sampling from the
conditional posterior for an autoregressive parameter. These options
can be set using function |
period |
a positive integer specifying the seasonal period. |
graph |
either a spatial object of class |
snap |
passed to |
queen |
passed to |
poly.df |
a spatial data frame. DEPRECATED, use argument
|
derive.constraints |
whether to derive the constraint matrix for an
IGMRF model component numerically from the precision matrix.
The use of |
knots |
passed to |
degree |
passed to |
D |
custom incidence matrix. |
Q |
custom precision matrix. |
R |
custom restriction matrix. |
References
B. Allevius (2018). On the precision matrix of an irregularly sampled AR(1) process. arXiv:1801.03791v2.
H. Rue and L. Held (2005). Gaussian Markov Random Fields. Chapman & Hall/CRC.
Examples
# example of CAR spatial random effects
if (requireNamespace("sf")) {
# 1. load a shape file of counties in North Carolina
nc <- sf::st_read(system.file("shape/nc.shp", package="sf"))
# 2. generate some data according to a model with a few regression
# effects, as well as spatial random effects
gd <- generate_data(
~ reg(~ AREA + BIR74, prior=pr_normal(precision=1), name="beta") +
gen(factor = ~ spatial(NAME, graph=nc), name="vs"),
family=f_gaussian(var.prior = pr_invchisq(df=10, scale=1)),
data = nc
)
# add the generated target variable and the spatial random effects to the
# spatial dataframe object
nc$y <- gd$y
nc$vs_true <- gd$pars$vs
# 3. fit a model to the generated data, and see to what extent the
# parameters used to generate the data, gd$pars, are reproduced
sampler <- create_sampler(
y ~ reg(~ AREA + BIR74, prior=pr_normal(precision=1), name="beta") +
gen(factor = ~ spatial(NAME, graph=nc), name="vs"),
data=nc
)
# increase burnin and n.iter below to improve MCMC convergence
sim <- MCMCsim(sampler, store.all=TRUE, burnin=100, n.iter=200, n.chain=2, verbose=FALSE)
(summ <- summary(sim))
nc$vs <- summ$vs[, "Mean"]
plot(nc[c("vs_true", "vs")])
plot(gd$pars$vs, summ$vs[, "Mean"]); abline(0, 1, col="red")
}
Set up a sampler object for sampling from a possibly truncated and degenerate multivariate normal distribution
Description
This function sets up an object for multivariate normal sampling based on a specified precision matrix. Linear equality and inequality restrictions are supported. For sampling under inequality restrictions four algorithms are available. The default in that case is an exact Hamiltonian Monte Carlo algorithm (Pakman and Paninski, 2014). A related algorithm is the zig-zag Hamiltonian Monte Carlo method (Nishimura et al., 2021) in which momentum is sampled from a Laplace instead of normal distribution. Alternatively, a Gibbs sampling algorithm can be used (Rodriguez-Yam et al., 2004). The fourth option is a data augmentation method that samples from a smooth approximation to the truncated multivariate normal distribution (Souris et al., 2018).
Usage
create_TMVN_sampler(
Q,
mu = NULL,
Xy = NULL,
update.Q = FALSE,
update.mu = update.Q,
name = "x",
coef.names = NULL,
constraints = NULL,
check.constraints = FALSE,
method = NULL,
reduce = NULL,
chol.control = chol_control(),
debug = FALSE
)
Arguments
Q |
precision matrix of the (unconstrained) multivariate normal distribution. |
mu |
mean of the (unconstrained) multivariate normal distribution. |
Xy |
alternative to specifying mu; in this case |
update.Q |
whether |
update.mu |
whether |
name |
name of the TMVN vector parameter. |
coef.names |
optional labels for the components of the vector parameter. |
constraints |
optional linear equality and/or inequality constraints.
Use function |
check.constraints |
if |
method |
sampling method. The options are "direct" for direct sampling from the
unconstrained or equality constrained multivariate normal (MVN). For inequality constrained
MVN sampling three methods are supported: "HMC" for (exact) Hamiltonian Monte Carlo,
"HMCZigZag" for (exact) Hamiltonian Monte Carlo with Laplace momentum, "Gibbs" for a
component-wise Gibbs sampling approach, and "softTMVN" for a data augmentation method that samples
from a smooth approximation to the truncated MVN. Alternatively, the method setting
functions |
reduce |
whether to a priori restrict the simulation to the subspace defined by the equality constraints. |
chol.control |
options for Cholesky decomposition, see |
debug |
if |
Details
The componentwise Gibbs sampler uses univariate truncated normal samplers as described in Botev and L'Ecuyer (2016). These samplers are implemented in R package TruncatedNormal, but here translated to C++ for an additional speed-up.
Value
An environment for sampling from a possibly degenerate and truncated multivariate normal distribution.
Author(s)
Harm Jan Boonstra, with help from Grzegorz Baltissen
References
Z.I. Botev and P. L'Ecuyer (2016). Simulation from the Normal Distribution Truncated to an Interval in the Tail. in VALUETOOLS.
Y. Cong, B. Chen and M. Zhou (2017). Fast simulation of hyperplane-truncated multivariate normal distributions. Bayesian Analysis 12(4), 1017-1037.
Y. Li and S.K. Ghosh (2015). Efficient sampling methods for truncated multivariate normal and student-t distributions subject to linear inequality constraints. Journal of Statistical Theory and Practice 9(4), 712-732.
A. Nishimura, Z. Zhang and M.A. Suchard (2024). Zigzag path connects two Monte Carlo samplers: Hamiltonian counterpart to a piecewise deterministic Markov process. Journal of the American Statistical Association, 1-13.
A. Pakman and L. Paninski (2014). Exact Hamiltonian Monte Carlo for truncated multivariate gaussians. Journal of Computational and Graphical Statistics 23(2), 518-542.
G. Rodriguez-Yam, R.A. Davis and L.L. Scharf (2004). Efficient Gibbs sampling of truncated multivariate normal with application to constrained linear regression. Unpublished manuscript.
H. Rue and L. Held (2005). Gaussian Markov Random Fields. Chapman & Hall/CRC.
A. Souris, A. Bhattacharya and P. Debdeep (2019). The Soft Multivariate Truncated Normal Distribution with Applications to Bayesian Constrained Estimation. arXiv:1807.09155v2.
K.A. Valeriano, C.E. Galarza and L.A. Matos (2023). Moments and random number generation for the truncated elliptical family of distributions. Statistics and Computing 33(1), 1-20.
Examples
S <- cbind(diag(2), c(-1, 1), c(1.1, -1)) # inequality matrix
# S'x >= 0 represents the wedge x1 <= x2 <= 1.1 x1
C <- set_constraints(S=S) # create a constraints object
# example taken from Pakman and Paninski (2014)
# 1. exact Hamiltonian Monte Carlo (Pakman and Paninski, 2014)
sampler <- create_TMVN_sampler(Q=diag(2), mu=c(4, 4), constraints=C, method="HMC")
sim <- MCMCsim(sampler, n.iter=600, verbose=FALSE)
summary(sim)
plot(as.matrix(sim$x), pch=".")
# 2. exact Hamiltonian Monte Carlo with Laplace momentum (Nishimura et al., 2021)
sampler <- create_TMVN_sampler(Q=diag(2), mu=c(4, 4), constraints=C, method="HMCZigZag")
sim <- MCMCsim(sampler, n.iter=600, verbose=FALSE)
summary(sim)
plot(as.matrix(sim$x), pch=".")
# 3. Gibbs sampling approach (Rodriguez-Yam et al., 2004)
sampler <- create_TMVN_sampler(Q=diag(2), mu=c(4, 4), constraints=C, method="Gibbs")
sim <- MCMCsim(sampler, burnin=500, n.iter=2000, verbose=FALSE)
summary(sim)
plot(as.matrix(sim$x), pch=".")
# 4. soft TMVN approximation (Souris et al., 2018)
sampler <- create_TMVN_sampler(Q=diag(2), mu=c(4, 4), constraints=C, method="softTMVN")
sim <- MCMCsim(sampler, n.iter=600, verbose=FALSE)
summary(sim)
plot(as.matrix(sim$x), pch=".")
Set up a a function for direct sampling from a constrained multivariate normal distribution
Description
Set up a a function for direct sampling from a constrained multivariate normal distribution
Usage
create_block_cMVN_sampler(
mbs,
X,
Q,
R = NULL,
r = NULL,
sampler,
name = "x",
chol.control
)
Arguments
mbs |
block component containing several model components. |
X |
design matrix. |
Q |
precision matrix. |
R |
equality restriction matrix. |
r |
rhs vector for equality constraints |
sampler |
sampler object as created by |
name |
name of the cMVN vector parameter. |
chol.control |
options for Cholesky decomposition, see |
Value
An environment with precomputed quantities and functions for sampling from a multivariate normal distribution subject to equality constraints.
Set up a function for direct sampling from a constrained multivariate normal distribution
Description
Set up a function for direct sampling from a constrained multivariate normal distribution
Usage
create_cMVN_sampler(
D = NULL,
Q = NULL,
update.Q = FALSE,
R = NULL,
r = NULL,
eps1 = sqrt(.Machine$double.eps),
eps2 = sqrt(.Machine$double.eps),
chol.control = chol_control(perm = TRUE)
)
Arguments
D |
factor of precision matrix Q such that Q=D'D. |
Q |
precision matrix. |
update.Q |
whether to update (D and) Q for each draw. |
R |
equality restriction matrix. |
r |
rhs vector for equality constraints |
eps1 |
scalar parameter to control numerical robustness against singularity of Q. |
eps2 |
scalar parameter associated with the constraint part to control numerical robustness. |
chol.control |
options for Cholesky decomposition, see |
Value
An environment with precomputed quantities and a method 'draw' for sampling from a multivariate normal distribution subject to equality constraints.
Create a sampler object
Description
This function sets up a sampler object, based on the specification of a model. The object contains functions to
draw a set of model parameters from their prior and conditional posterior distributions, and
to generate starting values for the MCMC simulation. The functions share a common environment
containing precomputed quantities such as design matrices based on the model and the data.
The sampler object is the main input for the MCMC simulation function MCMCsim
.
Usage
create_sampler(
formula,
data = NULL,
family = "gaussian",
ny = NULL,
ry = NULL,
r.mod = NULL,
sigma.fixed = NULL,
sigma.mod = NULL,
Q0 = NULL,
formula.V = NULL,
logJacobian = NULL,
linpred = NULL,
compute.weights = FALSE,
block = NULL,
prior.only = FALSE,
control = sampler_control()
)
Arguments
formula |
formula to specify the response variable and additive model components. The model components
form the linear predictor part of the model. A model component on the right hand side can be either
a regression term specified by |
data |
data frame with n rows in which the variables specified in model components can be found. |
family |
character string describing the data distribution. The default is 'gaussian'.
Other options are 'binomial', 'multinomial', 'negbinomial' for the negative binomial distribution,
'poisson', and 'gamma'. Alternatively, functions starting with 'f_' followed
by the family name can be used to specify the sampling distribution and possibly
further options. See |
ny |
NO LONGER USED. Please use |
ry |
NO LONGER USED. Please use |
r.mod |
NO LONGER USED. Please use |
sigma.fixed |
for Gaussian models, if |
sigma.mod |
prior for the variance parameter of a gaussian sampling distribution.
This can be specified by a call to one of the prior specification functions
|
Q0 |
n x n data-level precision matrix for a Gaussian model. It defaults to the unit matrix.
If an n-vector is provided it will be expanded to a (sparse) diagonal matrix with Q0 on its diagonal.
If a name is supplied it will be looked up in |
formula.V |
a formula specifying the terms of a variance model in the case of a Gaussian likelihood.
Currently two types of terms are supported: a regression term for the log-variance
specified with |
logJacobian |
if the data are transformed the logarithm of the Jacobian can be supplied so that it
is incorporated in all log-likelihood computations. This can be useful for comparing information criteria
for different transformations. It should be supplied as a vector of the same size as the response variable,
and is currently only supported if |
linpred |
a list of matrices defining (possibly out-of-sample) linear predictors to be simulated.
This allows inference on e.g. (sub)population totals or means. The list must be of the form
|
compute.weights |
if |
block |
DEPRECATED, please use argument |
prior.only |
whether a sampler is set up only for sampling from the prior or for sampling from both prior
and posterior distributions. Default |
control |
a list with further computational options. These options can
be specified using function |
Details
The right hand side of the formula
argument to create_sampler
can be used to specify
additive model components. Currently four model components are supported: reg(...)
for regression or 'fixed' effects, gen(...)
for generic random effects,
mec(...)
for measurement in covariates effects, and brt(...)
for a Bayesian additive regression trees component. Note that an offset can be added
separately, in the usual way using offset(...)
.
For gaussian models, formula.V
can be used to specify the variance structure of the model.
Currently two specialised variance model components are supported, vreg(...)
for regression
effects predicting the log-variance and vfac(...)
for modelled variance factors.
Value
A sampler object, which is the main input for the MCMC simulation
function MCMCsim
. The sampler object is an environment with
precomputed quantities and functions. The main functions are rprior
,
which returns a sample from the prior distributions, draw
,
which returns a sample from the full conditional posterior distributions,
and start
, which returns a list with starting values for the Gibbs
sampler. If prior.only
is TRUE
, functions draw
and
start
are not created.
References
J.H. Albert and S. Chib (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American statistical Association 88(422), 669-679.
D. Bates, M. Maechler, B. Bolker and S.C. Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software 67(1), 1-48.
S.W. Linderman, M.J. Johnson and R.P. Adams (2015). Dependent multinomial models made easy: Stick-breaking with the Polya-Gamma augmentation. Advances in Neural Information Processing Systems, 3456-3464.
P.A. Parker, S.H. Holan and R. Janicki (2024). Conjugate Modeling Approaches for Small Area Estimation with Heteroscedastic Structure. Journal of Survey Statistics and Methodology 12(4), 1061-1080.
N. Polson, J.G. Scott and J. Windle (2013). Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association 108(504), 1339-1349.
H. Rue and L. Held (2005). Gaussian Markov Random Fields. Chapman & Hall/CRC.
Examples
# first generate some data
n <- 200
x <- rnorm(n)
y <- 0.5 + 2*x + 0.3*rnorm(n)
# create a sampler for a simple linear regression model
sampler <- create_sampler(y ~ x)
sim <- MCMCsim(sampler)
(summary(sim))
y <- rbinom(n, 1, 1 / (1 + exp(-(0.5 + 2*x))))
# create a sampler for a binary logistic regression model
sampler <- create_sampler(y ~ x, family="binomial")
sim <- MCMCsim(sampler)
(summary(sim))
Specify a binomial sampling distribution
Description
This function can be used in the family
argument of create_sampler
or generate_data
to specify a binomial sampling distribution. This
includes the special case of binary (Bernoulli) data.
Usage
f_binomial(link = c("logit", "probit"), n.trial = NULL)
Arguments
link |
the name of a link function. Currently the only allowed link functions
for the binomial distribution are |
n.trial |
the number of binomial trials. This can be specified either as a formula for a variable number of trials, or as a scalar value for a common number of trials for all units. |
Value
A family object.
Specify a Gamma sampling distribution
Description
This function can be used in the family
argument of create_sampler
or generate_data
to specify a Gamma sampling distribution.
Usage
f_gamma(
link = "log",
shape.vec = ~1,
shape.prior = pr_gamma(0.1, 0.1),
control = set_MH(type = "RWLN", scale = 0.2, adaptive = TRUE)
)
Arguments
link |
the name of a link function. Currently the only allowed link function
for the gamma distribution is |
shape.vec |
optional formula specification of unequal shape parameter. |
shape.prior |
prior for gamma shape parameter. Supported prior distributions:
|
control |
options for the Metropolis-Hastings algorithm employed
in case the shape parameter is to be inferred. Function |
Value
A family object.
References
J.W. Miller (2019). Fast and Accurate Approximation of the Full Conditional for Gamma Shape Parameters. Journal of Computational and Graphical Statistics 28(2), 476-480.
Specify a Gaussian sampling distribution
Description
This function can be used in the family
argument of create_sampler
or generate_data
to specify a Gaussian sampling distribution.
Usage
f_gaussian(
link = "identity",
var.prior = pr_invchisq(df = 0, scale = 1),
var.vec = ~1,
prec.mat = NULL,
var.model = NULL,
logJacobian = NULL
)
Arguments
link |
the name of a link function. Currently the only allowed link functions
for the Gaussian distribution is |
var.prior |
prior for the variance parameter of a Gaussian sampling distribution.
This can be specified by a call to one of the prior specification functions
|
var.vec |
a formula to specify unequal variances, i.e. heteroscedasticity. The default corresponds to equal variances. |
prec.mat |
a possibly non-diagonal positive-definite symmetric matrix
interpreted as the precision matrix, i.e. inverse of the covariance matrix.
If this argument is specified |
var.model |
a formula specifying the terms of a variance model in the case of a Gaussian likelihood.
Several types of terms are supported: a regression term for the log-variance
specified with |
logJacobian |
if the data are transformed the logarithm of the Jacobian can be supplied so that it
is incorporated in all log-likelihood computations. This can be useful for comparing information criteria
for different transformations. It should be supplied as a vector of the same size as the response variable,
and is currently only supported if |
Value
A family object.
Specify a Gaussian-Gamma sampling distribution
Description
This function can be used in the family
argument of
create_sampler
or generate_data
to specify a
Gaussian-Gamma sampling distribution, i.e., a Gaussian sampling distribution
whose variances are observed subject to error according to a Gamma
distribution.
Usage
f_gaussian_gamma(link = "identity", var.model, ...)
Arguments
link |
the name of a link function. Currently the only allowed link function
for this distribution family is |
var.model |
a formula specifying the terms of a variance model.
The left-hand side of the formula should specify the observed variances,
unless the family object is used for data generation only.
Several types of model terms on the right-hand side of the formula are supported:
a regression term for the log-variance specified with |
... |
further arguments passed to |
Value
A family object.
Specify a multinomial sampling distribution
Description
This function can be used in the family
argument of create_sampler
or generate_data
to specify a multinomial sampling distribution. This
includes the special case of categorical (multinoulli) data.
Usage
f_multinomial(link = "logit", n.trial = NULL, K = NULL)
Arguments
link |
the name of a link function. Currently the only allowed link function
for the multinomial distribution is |
n.trial |
the number of multinomial trials. This can be specified either as a formula for a variable number of trials, or as a scalar value for a common number of trials for all units. |
K |
number of categories for multinomial model; only used for prior predictive sampling. |
Value
A family object.
Specify a negative binomial sampling distribution
Description
This function can be used in the family
argument of create_sampler
or generate_data
to specify a negative binomial sampling distribution.
Usage
f_negbinomial(
link = "log",
shape.vec = ~1,
inv.shape.prior = pr_invchisq(df = 1, scale = 1),
control = negbin_control()
)
Arguments
link |
the name of a link function. Currently the only allowed link function
for the negative binomial sampling distribution is |
shape.vec |
optional formula specification of unequal shape values. The
negative binomial (vector) shape parameter is then equal to this vector of
shape values, multiplied by the scalar shape parameter, whose prior is
specified through |
inv.shape.prior |
Prior on the (scalar) reciprocal shape parameter,
i.e. the overdispersion parameter. Supported prior distributions are
|
control |
a list with computational options. These options can
be specified using function |
Details
The negative binomial distribution with shape r and probability p has density
p(y|r, p) = {\Gamma(y + r)\over y!\Gamma(r)}(1-p)^r p^y
with mean \mu = E(y|r,p) = {rp\over 1-p}
and variance
V(y|r,p) = \mu(1 + \mu/r)
. The second term of the variance
can be interpreted as overdispersion with respect to a Poisson
distribution, which would correspond to the limit r \rightarrow \infty
.
So the reciprocal shape 1/r
is an overdispersion parameter,
which typically is inferred. It is assigned a default prior, which
may be changed through argument inv.shape.prior
.
The only supported link function is "log"
. Strictly speaking
the relation between mean \mu
and linear predictor \eta
is
\log\mu = \log r + \log{p\over 1-p} = \log r + \eta
This way the likelihood function has the same form as that of logistic
binomial regression, so that a Polya-Gamma data augmentation sampling
algorithm can be employed. Note that the fact that the linear predictor
\eta
does not include \log r
effectively changes the
interpretation of its intercept.
Value
A family object.
References
N. Polson, J.G. Scott and J. Windle (2013). Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association 108(504), 1339-1349.
M. Zhou and L. Carin (2015). Negative Binomial Process Count and Mixture Modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 37(2), 307-320.
Specify a Poisson sampling distribution
Description
This function can be used in the family
argument of create_sampler
or generate_data
to specify a Poisson sampling distribution.
Usage
f_poisson(link = "log", control = poisson_control())
Arguments
link |
the name of a link function. Currently the only allowed
link function for the Poisson distribution is |
control |
a list with computational options. These options can
be specified using function |
Value
A family object.
Create a model component object for a generic random effects component in the linear predictor
Description
This function is intended to be used on the right hand side of the formula
argument to
create_sampler
or generate_data
.
Usage
gen(
formula = ~1,
factor = NULL,
remove.redundant = FALSE,
drop.empty.levels = FALSE,
X = NULL,
var = NULL,
prior = NULL,
Q0 = NULL,
PX = NULL,
priorA = NULL,
strucA = GMRF_structure(),
GMRFconstr = NULL,
constraints0 = NULL,
constraintsA = NULL,
formula.gl = NULL,
a = 1000,
name = "",
sparse = NULL,
control = gen_control(),
debug = FALSE
)
Arguments
formula |
a model formula specifying the effects that vary over the levels of
the factor variable(s) specified by argument |
factor |
a formula with factors by which the effects specified in the |
remove.redundant |
whether redundant columns should be removed from the model matrix
associated with |
drop.empty.levels |
whether to remove factor levels without observations. |
X |
A (possibly sparse) design matrix. If |
var |
the (co)variance structure among the varying effects defined by |
prior |
the prior specification for the variance parameters of the random effects.
These can currently be specified by a call to |
Q0 |
precision matrix associated with |
PX |
whether parameter expansion should be used. Default is
|
priorA |
prior distribution for scale factors at the variance scale associated with |
strucA |
this option can be used to modify the default structure encoded by
|
GMRFconstr |
whether constraints corresponding to the null-vectors of the precision matrix
are to be imposed on the vector of coefficients. By default this is |
constraints0 |
an optional set of linear (in)equality restrictions acting on the coefficients defined
by |
constraintsA |
an optional set of linear (in)equality restrictions acting on the coefficients defined
by |
formula.gl |
a formula of the form |
a |
only used in case the effects are MLiG distributed, as
assumed in case of a gamma sampling distribution, or for
gaussian variance modelling. In those cases |
name |
the name of the model component. This name is used in the output of the MCMC simulation
function |
sparse |
whether the model matrix associated with |
control |
a list with further computational options. These options can
be specified using function |
debug |
if |
Value
An object with precomputed quantities and functions for sampling from prior or conditional posterior distributions for this model component. Intended for internal use by other package functions.
References
J. Besag and C. Kooperberg (1995). On Conditional and Intrinsic Autoregression. Biometrika 82(4), 733-746.
C.M. Carvalho, N.G. Polson and J.G. Scott (2010). The horseshoe estimator for sparse signals. Biometrika 97(2), 465-480.
L. Fahrmeir, T. Kneib and S. Lang (2004). Penalized Structured Additive Regression for Space-Time Data: a Bayesian Perspective. Statistica Sinica 14, 731-761.
A. Gelman (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis 1(3), 515-533.
A. Gelman, D.A. Van Dyk, Z. Huang and W.J. Boscardin (2008). Using Redundant Parameterizations to Fit Hierarchical Models. Journal of Computational and Graphical Statistics 17(1), 95-122.
T. Park and G. Casella (2008). The Bayesian Lasso. Journal of the American Statistical Association 103(482), 681-686.
H. Rue and L. Held (2005). Gaussian Markov Random Fields. Chapman & Hall/CRC.
Set computational options for the sampling algorithms used for a 'gen' model component
Description
Set computational options for the sampling algorithms used for a 'gen' model component
Usage
gen_control(MHprop = c("GiG", "LNRW"))
Arguments
MHprop |
MH proposal for the variance component in case of a MLiG prior
on the coefficients. The two options are "GiG" for a generalized inverse gamma
proposal, and "LNRW" for a log-normal random walk proposal. The former should
approximate the conditional posterior quite well provided MLiG parameter |
Value
A list of computational options regarding a 'gen' model component.
Generate a data vector according to a model
Description
This function generates draws from the prior predictive distribution. Parameter values are drawn from their priors, and consequently data is generated from the sampling distribution given these parameter values.
Usage
generate_data(
formula,
data = NULL,
family = "gaussian",
ny = NULL,
ry = NULL,
r.mod = NULL,
sigma.fixed = NULL,
sigma.mod = NULL,
Q0 = NULL,
formula.V = NULL,
linpred = NULL
)
Arguments
formula |
A model formula, see |
data |
see |
family |
sampling distribution family, see |
ny |
NO LONGER USED; see |
ry |
NO LONGER USED; see |
r.mod |
NO LONGER USED; see |
sigma.fixed |
DEPRECATED; see |
sigma.mod |
DEPRECATED; see |
Q0 |
DEPRECATED; see |
formula.V |
DEPRECATED; see |
linpred |
see |
Value
A list with a generated data vector and a list of prior means of the parameters. The parameters are drawn from their priors.
Examples
n <- 250
dat <- data.frame(
x = rnorm(n),
g = factor(sample(1:10, n, replace=TRUE)),
ny = 10
)
gd <- generate_data(
~ reg(~ 1 + x, prior=pr_normal(precision=10, mean=c(0, 1)), name="beta") +
gen(factor = ~ g, name="v"),
family=f_binomial(n.trial = ~ ny), data=dat
)
gd
plot(dat$x, gd$y)
Extract a list of parameter values for a single draw
Description
Extract a list of parameter values for a single draw
Usage
get_draw(obj, iter, chain)
Arguments
obj |
an object of class |
iter |
iteration number. |
chain |
chain number. |
Value
A list with all parameter values of draw iter
from chain chain
.
Examples
ex <- mcmcsae_example(n=50)
sampler <- create_sampler(ex$model, data=ex$dat)
sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, n.chain=4, store.all=TRUE)
get_draw(sim, iter=20, chain=3)
Create a model object for group-level regression effects within a generic random effects component.
Description
This function is intended to be used to specify the formula.gl
argument to
the gen
model component specification function.
Group-level predictors and hierarchical centring are
not used by default, and they currently cannot be used in a model component that is sampled
together with another model component in the same Gibbs block.
Usage
glreg(
formula = NULL,
remove.redundant = FALSE,
prior = NULL,
Q0 = NULL,
data = NULL,
name = ""
)
Arguments
formula |
a formula specifying the group-level predictors to be used within a model
component. If no |
remove.redundant |
whether redundant columns should be removed from the design matrix.
Default is |
prior |
prior specification for the group-level effects. Currently only
normal priors with mean 0 can be specified, using function |
Q0 |
prior precision matrix for the group-level effects. The default is a
zero matrix corresponding to a noninformative improper prior.
DEPRECATED, please use argument |
data |
group-level data frame in which the group-level variables specified in
|
name |
the name of the model component. This name is used in the output of the
MCMC simulation function |
Value
An object with precomputed quantities for sampling from prior or conditional posterior distributions for this model component. Only intended for internal use by other package functions.
Get and set the variable labels of a draws component object for a vector-valued parameter
Description
Get and set the variable labels of a draws component object for a vector-valued parameter
Usage
## S3 method for class 'dc'
labels(object, ...)
labels(object) <- value
Arguments
object |
a draws component object. |
... |
currently not used. |
value |
a vector of labels. |
Value
The extractor function returns the variable labels.
Examples
ex <- mcmcsae_example()
sampler <- create_sampler(ex$model, data=ex$dat)
sim <- MCMCsim(sampler, burnin=50, n.iter=100, n.chain=1, store.all=TRUE)
labels(sim$beta)
labels(sim$v)
labels(sim$beta) <- c("a", "b")
labels(sim$beta)
Fast matrix-vector multiplications
Description
Functions for matrix-vector multiplies like %*%
and crossprod
,
but often faster for the matrix types supported. The return value is always a
numeric vector.
Usage
M %m*v% v
crossprod_mv(M, v)
Arguments
M |
a matrix of class 'matrix', 'dgCMatrix', 'dsCMatrix', 'tabMatrix', or 'ddiMatrix'. |
v |
a numeric vector. |
Value
For %m*v%
the vector Mv
and for crossprod_mv
the vector
M'v
where M'
denotes the transpose of M
.
Examples
M <- matrix(rnorm(10*10), 10, 10)
x <- rnorm(10)
M %m*v% x
crossprod_mv(M, x)
M <- Matrix::rsparsematrix(100, 100, nnz=100)
x <- rnorm(100)
M %m*v% x
crossprod_mv(M, x)
Maximise the log-likelihood or log-posterior as defined by a sampler closure
Description
Maximise the log-likelihood or log-posterior as defined by a sampler closure
Usage
maximize_log_lh_p(
sampler,
type = c("llh", "lpost"),
method = "BFGS",
control = list(fnscale = -1),
...
)
Arguments
sampler |
sampler function closure, i.e. the return value of a call to |
type |
either "llh" (default) or "lpost", for optimization of the log-likelihood, or the log-posterior, respectively. |
method |
optimization method, passed to |
control |
control parameters, passed to |
... |
other parameters passed to |
Value
A list of parameter values that, provided the optimisation was successful, maximize the (log-)likelihood or (log-)posterior.
Examples
n <- 1000
dat <- data.frame(
x = rnorm(n),
f = factor(sample(1:50, n, replace=TRUE))
)
df <- generate_data(
~ reg(~x, name="beta", prior=pr_normal(precision=1)) + gen(~x, factor=~f, name="v"),
family=f_gaussian(var.prior=pr_fixed(value=1)), data=dat
)
dat$y <- df$y
sampler <- create_sampler(y ~ x + gen(~x, factor=~f, name="v"), data=dat)
opt <- maximize_log_lh_p(sampler)
str(opt)
plot(df$par$v, opt$par$v); abline(0, 1, col="red")
Create a model component object for an offset, i.e. fixed, non-parametrised term in the linear predictor
Description
This function is intended to be used on the right hand side of the formula
argument to
create_sampler
or generate_data
.
Usage
mc_offset(formula, value = NULL, name = "")
Arguments
formula |
model formula. |
value |
alternative specification of an offset as a single scalar value that is the same for each data unit. |
name |
the name of the model component. This name is used in the output of the MCMC simulation
function |
Value
An model component object with data and methods needed for dealing with an offset term in model estimation, and prior and posterior prediction. Intended for internal use by other package functions.
Generate artificial data according to an additive spatio-temporal model
Description
This function is used to generate data for several examples.
Usage
mcmcsae_example(n = 100L, family = "gaussian")
Arguments
n |
the size of the generated dataset. |
family |
sampling distribution family, see |
Value
A list
containing the generated dataset, the values of the model
parameters, and the model specification as a formula.
Examples
ex <- mcmcsae_example()
str(ex)
Create a model component object for a regression (fixed effects) component in the linear predictor with measurement errors in quantitative covariates
Description
This function is intended to be used on the right hand side of the
formula
argument to create_sampler
or
generate_data
. It creates an additive regression term in the
model's linear predictor. Covariates are assumed to be measured subject
to normally distributed errors with zero mean and variance specified using
the formula
or V
arguments. Note that this means that formula
should only contain quantitative variables, and no intercept.
By default, the prior for the regression
coefficients is improper uniform. A proper normal prior can be set up
using function pr_normal
, and passed to argument prior
.
It should be noted that pr_normal
expects a precision matrix
as input for its second argument, and that the prior variance (matrix) is
taken to be the inverse of this precision matrix, where in case the
model's family is "gaussian"
this matrix is additionally
multiplied by the residual scalar variance parameter sigma_^2
.
Usage
mec(
formula = ~1,
sparse = NULL,
X = NULL,
V = NULL,
prior = NULL,
Q0 = NULL,
b0 = NULL,
constraints = NULL,
name = "",
debug = FALSE
)
Arguments
formula |
a formula specifying the predictors subject to measurement error
and possibly their variances as well. In the latter case the formula syntax
|
sparse |
whether the model matrix associated with |
X |
a (possibly sparse) design matrix can be specified directly, as an
alternative to the creation of one based on |
V |
measurement error variance; can contain zeros |
prior |
prior specification for the regression coefficients. Currently only
normal priors are supported, specified using function |
Q0 |
prior precision matrix for the regression effects. The default is a
zero matrix corresponding to a noninformative improper prior.
It can be specified as a scalar value, as a numeric vector of appropriate
length, or as a matrix object. DEPRECATED, please use argument |
b0 |
prior mean for the regression effect. Defaults to a zero vector.
It can be specified as a scalar value or as a numeric vector of
appropriate length. DEPRECATED, please use argument |
constraints |
optional linear equality and/or inequality constraints
imposed on the vector of regression coefficients. Use function
|
name |
the name of the model component. This name is used in the output of the
MCMC simulation function |
debug |
if |
Value
An object with precomputed quantities and functions for sampling from prior or conditional posterior distributions for this model component. Intended for internal use by other package functions.
References
L.M. Ybarra and S.L. Lohr (2008). Small area estimation when auxiliary information is measured with error. Biometrika 95(4), 919-931.
S. Arima, G.S. Datta and B. Liseo (2015). Bayesian estimators for small area models when auxiliary information is measured with error. Scandinavian Journal of Statistics 42(2), 518-529.
Examples
# example of Ybarra and Lohr (2008)
m <- 50
X <- rnorm(m, mean=5, sd=3) # true covariate values
v <- rnorm(m, sd=2)
theta <- 1 + 3*X + v # true values
psi <- rgamma(m, shape=4.5, scale=2)
e <- rnorm(m, sd=sqrt(psi)) # sampling error
y <- theta + e # direct estimates
C <- c(rep(3, 10), rep(0, 40)) # measurement error for first 10 values
W <- X + rnorm(m, sd=sqrt(C)) # covariate subject to measurement error
# fit Ybarra-Lohr model
sampler <- create_sampler(
y ~ 1 + mec(~ 0 + W, V=C, name="ME") + gen(factor=~local_),
family=f_gaussian(var.prior=pr_fixed(1), var.vec=~psi),
linpred="fitted"
)
sim <- MCMCsim(sampler, n.iter=800, n.chain=2, store.all=TRUE, verbose=FALSE)
(summ <- summary(sim))
plot(X, W, xlab="true X", ylab="inferred X")
points(X, summ$ME_X[, "Mean"], col="green")
abline(0, 1, col="red")
legend("topleft", legend=c("prior mean", "posterior mean"), col=c("black", "green"), pch=c(1,1))
Compute DIC, WAIC and leave-one-out cross-validation model measures
Description
Compute the Deviance Information Criterion (DIC) or
Watanabe-Akaike Information Criterion (WAIC) from an
object of class mcdraws
output by MCMCsim
.
Method waic.mcdraws
computes WAIC using package loo.
Method loo.mcdraws
also depends on package loo to compute
a Pareto-smoothed importance sampling (PSIS) approximation
to leave-one-out cross-validation.
Usage
compute_DIC(x, use.pV = FALSE)
compute_WAIC(
x,
diagnostic = FALSE,
batch.size = NULL,
show.progress = TRUE,
cl = NULL,
n.cores = 1L
)
## S3 method for class 'mcdraws'
waic(x, by.unit = FALSE, ...)
## S3 method for class 'mcdraws'
loo(x, by.unit = FALSE, r_eff = FALSE, n.cores = 1L, ...)
Arguments
x |
an object of class |
use.pV |
whether half the posterior variance of the deviance should be used as an alternative estimate of the effective number of model parameters for DIC. |
diagnostic |
whether vectors of log-pointwise-predictive-densities and pointwise contributions to the WAIC effective number of model parameters should be returned. |
batch.size |
number of data units to process per batch. |
show.progress |
whether to show a progress bar. |
cl |
an existing cluster can be passed for parallel computation. If |
n.cores |
the number of cpu cores to use. Default is one, i.e. no parallel computation. |
by.unit |
if |
... |
Other arguments, passed to |
r_eff |
whether to compute relative effective sample size estimates
for the likelihood of each observation. This takes more time, but should
result in a better PSIS approximation. See |
Value
For compute_DIC
a vector with the deviance information criterion and
effective number of model parameters. For compute_WAIC
a vector with the
WAIC model selection criterion and WAIC effective number of model parameters.
Method waic
returns an object of class waic, loo
, see the
documentation for waic
in package loo.
Method loo
returns an object of class psis_loo
, see
loo
.
References
D. Spiegelhalter, N. Best, B. Carlin and A. van der Linde (2002). Bayesian Measures of Model Complexity and Fit. Journal of the Royal Statistical Society B 64 (4), 583-639.
S. Watanabe (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning 11, 3571-3594.
A. Gelman, J. Hwang and A. Vehtari (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing 24, 997-1016.
A. Vehtari, D. Simpson, A. Gelman, Y. Yao and J. Gabry (2024). Pareto smoothed importance sampling. arXiv:1507.02646v9.
A. Vehtari, A. Gelman and J. Gabry (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing 27, 1413-1432.
P.-C. Buerkner, J. Gabry and A. Vehtari (2021). Efficient leave-one-out cross-validation for Bayesian non-factorized normal and Student-t models. Computational Statistics 36, 1243-1261.
Examples
ex <- mcmcsae_example(n=100)
sampler <- create_sampler(ex$model, data=ex$dat)
sim <- MCMCsim(sampler, burnin=100, n.iter=300, n.chain=4, store.all=TRUE)
compute_DIC(sim)
compute_WAIC(sim)
if (require(loo)) {
waic(sim)
loo(sim, r_eff=TRUE)
}
Compute possibly sparse model matrix
Description
Compute possibly sparse model matrix
Usage
model_matrix(
formula,
data = NULL,
contrasts.arg = NULL,
drop.unused.levels = FALSE,
sparse = NULL,
drop0 = TRUE,
catsep = "",
by = NULL,
tabM = FALSE
)
Arguments
formula |
model formula. |
data |
data frame containing all variables used in |
contrasts.arg |
specification of contrasts for factor variables. Currently supported are "contr.none" (no contrasts applied), "contr.treatment" (first level removed) and "contr.SAS" (last level removed). Alternatively, a named list specifying a single level per factor variable can be passed. |
drop.unused.levels |
whether empty levels of individual factor variables should be removed. |
sparse |
if |
drop0 |
whether to drop any remaining explicit zeros in resulting sparse matrix. |
catsep |
separator for concatenating factor variable names and level names.
By default it is the empty string, reproducing the labels of |
by |
a vector by which to aggregate the result. |
tabM |
if |
Value
Design matrix X, either an ordinary matrix or a sparse dgCMatrix
.
Get the number of chains, samples per chain or the number of variables in a simulation object
Description
Get the number of chains, samples per chain or the number of variables in a simulation object
Usage
n_chains(obj)
n_draws(obj)
n_vars(dc)
Arguments
obj |
an mcdraws object or a draws component (dc) object. |
dc |
a draws component object. |
Value
The number of chains or retained samples per chain or the number of variables.
Examples
ex <- mcmcsae_example(n=50)
sampler <- create_sampler(ex$model, data=ex$dat)
sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, n.chain=5, store.all=TRUE)
n_chains(sim); n_chains(sim$beta)
n_draws(sim); n_draws(sim$beta)
n_vars(sim$beta); n_vars(sim$sigma_); n_vars(sim$llh_); n_vars(sim$v)
plot(sim, "beta")
n_chains(subset(sim$beta, chains=1:2))
n_draws(subset(sim$beta, draws=sample(1:n_draws(sim), 100)))
n_vars(subset(sim$u, vars=1:2))
Set computational options for the sampling algorithms
Description
Set computational options for the sampling algorithms
Usage
negbin_control(CRT.approx.m = 20L)
Arguments
CRT.approx.m |
scalar integer specifying the degree of approximation to sampling from a Chinese Restaurant Table distribution. The approximation is based on Le Cam's theorem. Larger values yield a slower but more accurate sampler. |
Value
A list with computational options for the sampling algorithm.
Get the parameter names from an mcdraws object
Description
Get the parameter names from an mcdraws object
Usage
par_names(obj)
Arguments
obj |
an mcdraws object. |
Value
The names of the parameters whose MCMC simulations are stored in obj
.
Examples
data(iris)
sampler <- create_sampler(Sepal.Length ~
reg(~ Petal.Length + Species, name="beta"), data=iris)
sim <- MCMCsim(sampler, burnin=100, n.iter=400)
(summary(sim))
par_names(sim)
Trace, density and autocorrelation plots for (parameters of a) draws component (dc) object
Description
Trace, density and autocorrelation plots for (parameters of a) draws component (dc) object
Usage
## S3 method for class 'dc'
plot(x, nrows, ncols, ask = FALSE, ...)
Arguments
x |
a draws component object. |
nrows |
number of rows in plot layout. |
ncols |
number of columns in plot layout. |
ask |
ask before plotting the next page; default is |
... |
arguments passed to |
Examples
ex <- mcmcsae_example(n=50)
sampler <- create_sampler(ex$model, data=ex$dat)
sim <- MCMCsim(sampler, store.all=TRUE)
plot(sim$u)
Trace, density and autocorrelation plots
Description
Trace, density and autocorrelation plots for selected components of an mcdraws
object.
Usage
## S3 method for class 'mcdraws'
plot(x, vnames, nrows, ncols, ask = FALSE, ...)
Arguments
x |
an object of class |
vnames |
optional character vector to select a subset of parameters. |
nrows |
number of rows in plot layout. |
ncols |
number of columns in plot layout. |
ask |
ask before plotting the next page; default is |
... |
arguments passed to |
Examples
ex <- mcmcsae_example(n=50)
sampler <- create_sampler(ex$model, data=ex$dat)
sim <- MCMCsim(sampler, store.all=TRUE)
plot(sim, c("beta", "u", "u_sigma", "v_sigma"), ask=TRUE)
Plot a set of model coefficients or predictions with uncertainty intervals based on summaries of simulation results or other objects.
Description
This function plots estimates with error bars. Multiple sets of
estimates can be compared. The error bars can either be based on
standard errors or on explicitly specified lower and upper bounds.
The function is adapted from function plot.sae
in package
hbsae, which in turn was adapted from function
coefplot.default
from package arm.
Usage
plot_coef(
...,
n.se = 1,
est.names,
sort.by = NULL,
decreasing = FALSE,
index = NULL,
maxrows = 50L,
maxcols = 6L,
offset = 0.1,
cex.var = 0.8,
mar = c(0.1, 2.1, 5.1, 0.1)
)
Arguments
... |
|
n.se |
number of standard errors below and above the point estimates
to use for error bars. By default equal to 1. This only refers to the
objects of class |
est.names |
labels to use in the legend for the components of the |
sort.by |
vector by which to sort the coefficients, referring to the first object passed. |
decreasing |
if |
index |
vector of names or indices of the selected areas to be plotted. |
maxrows |
maximum number of rows in a column. |
maxcols |
maximum number of columns of estimates on a page. |
offset |
space used between plots of multiple estimates for the same area. |
cex.var |
the font size for the variable names, default=0.8. |
mar |
a numeric vector of the form |
Examples
# create artificial data
set.seed(21)
n <- 100
dat <- data.frame(
x=runif(n),
f=factor(sample(1:20, n, replace=TRUE))
)
model <- ~ reg(~ x, prior=pr_normal(precision=1), name="beta") + gen(factor=~f, name="v")
gd <- generate_data(model, data=dat)
dat$y <- gd$y
# fit a base model
model0 <- y ~ reg(~ 1, name="beta") + gen(factor=~f, name="v")
sampler <- create_sampler(model0, data=dat)
sim <- MCMCsim(sampler, store.all=TRUE)
(summ0 <- summary(sim))
# fit 'true' model
model <- y ~ reg(~ x, name="beta") + gen(factor=~f, name="v")
sampler <- create_sampler(model, data=dat)
sim <- MCMCsim(sampler, store.all=TRUE)
(summ <- summary(sim))
# compare random effect estimates against true parameter values
plot_coef(summ0$v, summ$v, list(est=gd$pars$v), n.se=2, offset=0.2,
maxrows=10, est.names=c("base model", "true model", "true"))
Set computational options for the sampling algorithms
Description
Set computational options for the sampling algorithms
Usage
poisson_control(nb.shape = 100)
Arguments
nb.shape |
shape parameter of the negative binomial distribution used internally to approximate the Poisson distribution. This should be set to a relatively large value (default is 100), corresponding to negligible overdispersion, to obtain a good approximation to the Poisson sampling distribution. However, note that very large values may cause slow MCMC exploration of the posterior distribution. |
Value
A list with computational options for the sampling algorithm.
Get means or standard deviations of parameters from the MCMC output in an mcdraws object
Description
Get means or standard deviations of parameters from the MCMC output in an mcdraws object
Usage
get_means(obj, vnames = NULL)
get_sds(obj, vnames = NULL)
Arguments
obj |
an object of class |
vnames |
optional character vector to select a subset of parameters. |
Value
A list with simulation means or standard deviations.
Examples
ex <- mcmcsae_example(n=50)
sampler <- create_sampler(ex$model, data=ex$dat)
sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, n.chain=4)
get_means(sim)
get_means(sim, "e_")
sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, n.chain=4,
store.mean=c("beta", "u"), store.sds=TRUE)
summary(sim, "beta")
get_means(sim, "beta")
get_sds(sim, "beta")
get_means(sim, "u")
get_sds(sim, "u")
Create an object representing a Multivariate Log inverse Gamma (MLiG) prior distribution
Description
Create an object representing a Multivariate Log inverse Gamma (MLiG) prior distribution
Usage
pr_MLiG(mean = 0, precision = 0, labels = NULL, a = 1000)
Arguments
mean |
scalar or vector parameter for the mean in the large
|
precision |
scalar or vector parameter for the precision in the
large |
labels |
optional character vector with coefficient labels. If specified,
it should have the same length as at least one of |
a |
scalar parameter that controls how close the prior is to independent
normal priors with |
Value
An environment representing the specified prior, for internal use.
References
J.R. Bradley, S.H. Holan and C.K. Wikle (2018). Computationally efficient multivariate spatio-temporal models for high-dimensional count-valued data (with discussion). Bayesian Analysis 13(1), 253-310.
Create an object representing beta prior distributions
Description
Create an object representing beta prior distributions
Usage
pr_beta(a = 1, b = 1)
Arguments
a |
positive shape parameter. |
b |
positive shape parameter. |
Value
An environment representing the specified prior, for internal use.
Create an object representing exponential prior distributions
Description
Create an object representing exponential prior distributions
Usage
pr_exp(scale = 1)
Arguments
scale |
scalar or vector scale parameter. |
Value
An environment representing the specified prior, for internal use.
Create an object representing a degenerate prior fixing a parameter (vector) to a fixed value
Description
Create an object representing a degenerate prior fixing a parameter (vector) to a fixed value
Usage
pr_fixed(value = 1)
Arguments
value |
scalar or vector value parameter. |
Value
An environment representing the specified prior, for internal use.
Create an object representing gamma prior distributions
Description
Create an object representing gamma prior distributions
Usage
pr_gamma(shape = 1, rate = 1)
Arguments
shape |
scalar or vector shape parameter. |
rate |
scalar or vector rate, i.e. inverse scale, parameter. |
Value
An environment representing the specified prior, for internal use.
Create an object representing Generalised Inverse Gaussian (GIG) prior distributions
Description
Create an object representing Generalised Inverse Gaussian (GIG) prior distributions
Usage
pr_gig(a, b, p)
Arguments
a |
scalar or vector parameter. |
b |
scalar or vector parameter. |
p |
scalar or vector parameter. |
Value
An environment representing the specified prior, for internal use.
Create an object representing inverse chi-squared priors with possibly modelled degrees of freedom and scale parameters
Description
Create an object representing inverse chi-squared priors with possibly modelled degrees of freedom and scale parameters
Usage
pr_invchisq(df = 1, scale = 1)
Arguments
df |
degrees of freedom parameter. This can be a numeric scalar or
vector of length
|
scale |
scalar or vector scale parameter. Alternatively,
|
Value
An environment representing the specified prior, for internal use.
Create an object representing an inverse Wishart prior, possibly with modelled scale matrix
Description
Create an object representing an inverse Wishart prior, possibly with modelled scale matrix
Usage
pr_invwishart(df = NULL, scale = NULL)
Arguments
df |
Degrees of freedom parameter. This should be a scalar numeric value. Default value is the dimension plus one. |
scale |
Either a (known) scale matrix, or
|
Value
An environment representing the specified prior, for internal use.
References
A. Huang and M.P. Wand (2013). Simple marginally noninformative prior distributions for covariance matrices. Bayesian Analysis 8, 439-452.
Create an object representing a possibly multivariate normal prior distribution
Description
Create an object representing a possibly multivariate normal prior distribution
Usage
pr_normal(mean = 0, precision = 0, labels = NULL)
Arguments
mean |
scalar or vector mean parameter. |
precision |
scalar, vector or matrix precision parameter. |
labels |
optional character vector with coefficient labels. If specified,
it should have the same length as at least one of |
Value
An environment representing the specified prior, for internal use.
Create an object representing truncated normal prior distributions
Description
Create an object representing truncated normal prior distributions
Usage
pr_truncnormal(mean = 0, precision = 1, lower = 0, upper = Inf)
Arguments
mean |
scalar or vector mean parameter. |
precision |
scalar, vector or matrix precision parameter. |
lower |
lower limit of the truncated interval. |
upper |
lower limit of the truncated interval. |
Value
An environment representing the specified prior, for internal use.
Create an object representing uniform prior distributions
Description
Create an object representing uniform prior distributions
Usage
pr_unif(min = 0, max = 1)
Arguments
min |
lower limit. |
max |
upper limit. |
Value
An environment representing the specified prior, for internal use.
Generate draws from the predictive distribution
Description
Generate draws from the predictive distribution
Usage
## S3 method for class 'mcdraws'
predict(
object,
newdata = NULL,
X. = if (is.null(newdata)) "in-sample" else NULL,
type = c("data", "link", "response", "data_cat"),
weights = NULL,
fun. = identity,
labels = NULL,
ppcheck = FALSE,
iters = NULL,
to.file = FALSE,
filename,
write.single.prec = FALSE,
show.progress = TRUE,
verbose = TRUE,
n.cores = 1L,
cl = NULL,
seed = NULL,
export = NULL,
...
)
Arguments
object |
an object of class |
newdata |
data frame with auxiliary information to be used for prediction. |
X. |
a list of design matrices; alternatively, |
type |
the type of predictions. The default is |
weights |
an optional formula specifying a vector of nonnegative weights.
Weights are only used for data-generating prediction ( |
fun. |
function applied to the vector of posterior predictions to compute one or multiple summaries or test statistics. The function can have one or two arguments. The first argument is always the vector of posterior predictions. The optional second argument must be named 'p' and represents a list of model parameters, needed only when a test statistic depends on them. The function must return an integer or numeric vector. |
labels |
optional names for the output object. Must be a vector of the same length as the result of |
ppcheck |
if |
iters |
iterations in |
to.file |
if |
filename |
name of the file to write predictions to in case |
write.single.prec |
Whether to write to file in single precision. Default is |
show.progress |
whether to show a progress bar. |
verbose |
whether to show informative messages. |
n.cores |
the number of cpu cores to use. Default is one, i.e. no parallel computation.
If an existing cluster |
cl |
an existing cluster can be passed for parallel computation. If |
seed |
a random seed (integer). For parallel computation it is used to independently seed RNG streams for all workers. |
export |
a character vector with names of objects to export to the workers. This may
be needed for parallel execution if expressions in |
... |
currently not used. |
Value
An object of class dc
, containing draws from the posterior (or prior) predictive distribution.
If ppcheck=TRUE
posterior predictive p-values are returned as an additional attribute.
In case to.file=TRUE
the file name used is returned.
Examples
n <- 250
dat <- data.frame(x=runif(n))
dat$y <- 1 + dat$x + rnorm(n)
sampler <- create_sampler(y ~ x, data=dat)
sim <- MCMCsim(sampler)
summary(sim)
# in-sample prediction
pred <- predict(sim, ppcheck=TRUE)
hist(attr(pred, "ppp"))
# out-of-sample prediction
pred <- predict(sim, newdata=data.frame(x=seq(0, 1, by=0.1)))
summary(pred)
Display a summary of a dc
object
Description
Display a summary of a dc
object
Usage
## S3 method for class 'dc_summary'
print(
x,
digits = 3L,
max.lines = 1000L,
tail = FALSE,
sort = NULL,
max.label.length = NULL,
...
)
Arguments
x |
an object of class |
digits |
number of digits to use, defaults to 3. |
max.lines |
maximum number of lines to display.
If |
tail |
if |
sort |
column name on which to sort the output. |
max.label.length |
if specified, printed row labels will be abbreviated to at most this length. |
... |
passed on to |
Examples
ex <- mcmcsae_example()
sampler <- create_sampler(ex$model, data=ex$dat)
sim <- MCMCsim(sampler, store.all=TRUE)
print(summary(sim$u), sort="n_eff")
Print a summary of MCMC simulation results
Description
Display a summary of an mcdraws
object, as output by MCMCsim
.
Usage
## S3 method for class 'mcdraws_summary'
print(x, digits = 3L, max.lines = 10L, tail = FALSE, sort = NULL, ...)
Arguments
x |
an object of class |
digits |
number of digits to use, defaults to 3. |
max.lines |
maximum number of elements per vector parameter to display.
If |
tail |
if |
sort |
column name on which to sort the output. |
... |
passed on to |
Examples
ex <- mcmcsae_example()
sampler <- create_sampler(ex$model, data=ex$dat)
sim <- MCMCsim(sampler, store.all=TRUE)
print(summary(sim), sort="n_eff")
Read MCMC draws from a file
Description
Read draws written to file by MCMCsim
used with argument to.file
.
Usage
read_draws(name, filename = paste0("MCdraws_", name, ".dat"))
Arguments
name |
name of the parameter to load the corresponding file with posterior draws for. |
filename |
name of the file in which the draws are stored. |
Value
An object of class dc
containing MCMC draws for a (vector) parameter.
Examples
## Not run:
# NB this example creates a file "MCdraws_e_.dat" in the working directory
n <- 100
dat <- data.frame(x=runif(n), f=as.factor(sample(1:5, n, replace=TRUE)))
gd <- generate_data(~ reg(~ x + f, prior=pr_normal(precision=1), name="beta"), data=dat)
dat$y <- gd$y
sampler <- create_sampler(y ~ reg(~ x + f, name="beta"), data=dat)
# run the MCMC simulation and write draws of residuals to file:
sim <- MCMCsim(sampler, n.iter=500, to.file="e_")
summary(sim)
mcres <- read_draws("e_")
summary(mcres)
## End(Not run)
Create a model component object for a regression (fixed effects) component in the linear predictor
Description
This function is intended to be used on the right hand side of the
formula
argument to create_sampler
or
generate_data
. It creates an additive regression term in the
model's linear predictor. By default, the prior for the regression
coefficients is improper uniform. A proper normal prior can be set up
using function pr_normal
, and passed to argument prior
.
It should be noted that pr_normal
expects a precision matrix
as input for its second argument, and that the prior variance (matrix) is
taken to be the inverse of this precision matrix, where in case the
model's family is "gaussian"
this matrix is additionally
multiplied by the residual scalar variance parameter sigma_^2
.
Usage
reg(
formula = ~1,
remove.redundant = FALSE,
sparse = NULL,
X = NULL,
prior = NULL,
Q0 = NULL,
b0 = NULL,
constraints = NULL,
name = "",
debug = FALSE
)
Arguments
formula |
a formula specifying the predictors to be used in the model,
in the same way as the right hand side of the |
remove.redundant |
whether redundant columns should be removed from the
design matrix. Default is |
sparse |
whether the model matrix associated with |
X |
a (possibly sparse) design matrix can be specified directly, as an
alternative to the creation of one based on |
prior |
prior specification for the regression coefficients. Supported
priors can be specified using functions |
Q0 |
prior precision matrix for the regression effects. The default is a
zero matrix corresponding to a noninformative improper prior.
It can be specified as a scalar value, as a numeric vector of appropriate
length, or as a matrix object. DEPRECATED, please use argument |
b0 |
prior mean for the regression effect. Defaults to a zero vector.
It can be specified as a scalar value or as a numeric vector of
appropriate length. DEPRECATED, please use argument |
constraints |
optional linear equality and/or inequality constraints
imposed on the vector of regression coefficients. Use function
|
name |
the name of the model component. This name is used in the output of the
MCMC simulation function |
debug |
if |
Value
An object with precomputed quantities and functions for sampling from prior or conditional posterior distributions for this model component. Intended for internal use by other package functions.
Examples
data(iris)
# default: flat priors on regression coefficients
sampler <- create_sampler(Sepal.Length ~
reg(~ Petal.Length + Species, name="beta"),
data=iris
)
sim <- MCMCsim(sampler, burnin=100, n.iter=400)
summary(sim)
# (weakly) informative normal priors on regression coefficients
sampler <- create_sampler(Sepal.Length ~
reg(~ Petal.Length + Species, prior=pr_normal(precision=1e-2), name="beta"),
data=iris
)
sim <- MCMCsim(sampler, burnin=100, n.iter=400)
summary(sim)
# binary regression
sampler <- create_sampler(Species == "setosa" ~
reg(~ Sepal.Length, prior=pr_normal(precision=0.1), name="beta"),
family="binomial", data=iris)
sim <- MCMCsim(sampler, burnin=100, n.iter=400)
summary(sim)
pred <- predict(sim)
str(pred)
# example with equality constrained regression effects
n <- 500
df <- data.frame(x=runif(n))
df$y <- rnorm(n, 1 + 2*df$x)
R <- matrix(1, 2, 1)
r <- 3
C <- set_constraints(R=R, r=r)
sampler <- create_sampler(y ~ reg(~ 1 + x, constraints=C, name="beta"), data=df)
sim <- MCMCsim(sampler)
summary(sim)
plot(sim, "beta")
summary(transform_dc(sim$beta, fun=function(x) crossprod_mv(R, x) - r))
Extract draws of fitted values or residuals from an mcdraws object
Description
For a model created with create_sampler
and estimated using MCMCsim
,
these functions return the posterior draws of fitted values or residuals.
In the current implementation the fitted values correspond to the linear predictor
and the residuals are computed as the data vector minus the fitted values,
regardless of the model's distribution family.
For large datasets the returned object can become very large. One may therefore
select a subset of draws or chains or use mean.only=TRUE
to
return a vector of posterior means only.
Usage
## S3 method for class 'mcdraws'
fitted(
object,
mean.only = FALSE,
units = NULL,
chains = seq_len(n_chains(object)),
draws = seq_len(n_draws(object)),
matrix = FALSE,
type = c("link", "response"),
...
)
## S3 method for class 'mcdraws'
residuals(
object,
mean.only = FALSE,
units = NULL,
chains = seq_len(n_chains(object)),
draws = seq_len(n_draws(object)),
matrix = FALSE,
...
)
Arguments
object |
an object of class |
mean.only |
if |
units |
the data units (by default all) for which fitted values or residuals should be computed. |
chains |
optionally, a selection of chains. |
draws |
optionally, a selection of draws per chain. |
matrix |
whether a matrix should be returned instead of a dc object. |
type |
the type of fitted values: "link" for fitted values on the linear predictor scale (the default), and "response" for fitted values on the response scale. Returned residuals are always on the response scale. |
... |
currently not used. |
Value
Either a draws component object or a matrix with draws of fitted values or residuals.
The residuals are always on the response scale, whereas fitted values can
be on the scale of the linear predictor or the response depending on type
.
If mean.only=TRUE
, a vector of posterior means.
Examples
ex <- mcmcsae_example(n=50)
sampler <- create_sampler(ex$model, data=ex$dat)
sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, store.all=TRUE)
fitted(sim, mean.only=TRUE)
summary(fitted(sim))
residuals(sim, mean.only=TRUE)
summary(residuals(sim))
bayesplot::mcmc_intervals(as.matrix(subset(residuals(sim), vars=1:20)))
Set computational options for the sampling algorithms
Description
Set computational options for the sampling algorithms
Usage
sampler_control(
add.outer.R = TRUE,
add.eps.I = FALSE,
eps = sqrt(.Machine$double.eps),
recompute.e = TRUE,
cMVN.sampler = FALSE,
CG = NULL,
block = TRUE,
block.V = TRUE,
auto.order.block = TRUE,
chol.control = chol_control(),
max.size.cps.template = 100,
PG.approx = TRUE,
PG.approx.m = -2L
)
Arguments
add.outer.R |
whether to add the outer product of a constraint matrix to the
conditional posterior precision matrix of coefficients sampled in a block. This is used
to resolve singularity due to intrinsic GMRF components.
By default, |
add.eps.I |
whether to add a small positive multiple of the identity matrix
to the conditional posterior precision matrix of coefficients sampled in a block.
If needed, this can resolve singularity as an alternative to |
eps |
a positive scalar value, used only in case |
recompute.e |
when |
cMVN.sampler |
whether an extended linear system including dual variables is used
for equality constrained multivariate normal sampling. If set to |
CG |
use a conjugate gradient iterative algorithm instead of Cholesky updates for sampling
the model's coefficients. This must be a list with possible components |
block |
if |
block.V |
if |
auto.order.block |
whether Gibbs blocks should be ordered automatically in such a way that those with the most sparse design matrices come first. This way of ordering can make Cholesky updates more efficient. |
chol.control |
options for Cholesky decomposition, see |
max.size.cps.template |
maximum allowed size in MB of the sparse matrix serving as a template for the sparse symmetric crossproduct X'QX of a dgCMatrix X, where Q is a diagonal matrix subject to change. |
PG.approx |
whether Polya-Gamma draws for logistic binomial models are
approximated by a hybrid gamma convolution approach. If not, |
PG.approx.m |
if |
Value
A list with specified computational options used by various sampling functions.
References
D. Bates, M. Maechler, B. Bolker and S.C. Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software 67(1), 1-48.
Y. Chen, T.A. Davis, W.W. Hager and S. Rajamanickam (2008). Algorithm 887: CHOLMOD, supernodal sparse Cholesky factorization and update/downdate. ACM Transactions on Mathematical Software 35(3), 1-14.
Set options for Metropolis-Hastings sampling
Description
Set options for Metropolis-Hastings sampling
Usage
set_MH(type = "RWTN", scale = 0.025, adaptive = NULL, ...)
Arguments
type |
a character string defining the proposal distribution. Among supported types are random walk proposals "RWTN", "RWN" and "RWLN" with truncated normal, normal and log-normal proposal distributions. Other choices correspond to independence proposals: "TN" for a truncated normal proposal, "unif" for a uniform proposal, and "beta" and "gamma" for specific beta and gamma proposal distributions. Not all types are supported for a particular parameter; see the specific help of the function defining the model component of interest to see which proposal distribution types are supported. |
scale |
in case of the "RWTN" proposal, the (initial) scale of the distribution. |
adaptive |
in case of the random walk "RWTN" or "RWN"
proposals, whether the scale parameter is adapted based on
acceptance rates during the burnin phase of the MCMC simulation.
The default is |
... |
additional parameters depending on the proposal type. Supported arguments are 'l' and 'u' to pass the lower and upper limits of uniform or random walk truncated normal proposals (defaults l=0 and u=1), and 'a' and 'b' to pass the shape parameters of a beta proposal distribution (defaults a = b = 0.5). |
Value
An environment with variables and methods for Metropolis-Hastings sampling, for use by other package functions.
Set up a system of linear equality and/or inequality constraints
Description
Two-sided inequalities specified by S2, l2, u2
are currently
transformed into the one-sided form S'x >= s
, combined with
any directly specified constraints of this form. Some basic consistency
checks are carried out, notably regarding the dimensions of the inputs.
Usage
set_constraints(
R = NULL,
r = NULL,
S = NULL,
s = NULL,
S2 = NULL,
l2 = NULL,
u2 = NULL,
scale = FALSE
)
Arguments
R |
equality constraint matrix each column of which corresponds to a constraint. |
r |
right-hand side vector for equality constraints |
S |
inequality constraint matrix each column of which corresponds to an inequality constraint. |
s |
rhs vector for inequality constraints |
S2 |
inequality constraint matrix each column of which corresponds to a two-sided inequality constraint. |
l2 |
vector of lower bounds for two-sided inequality constraints |
u2 |
vector of upper bounds for two-sided inequality constraints |
scale |
whether to scale the columns of all constraint matrices to unit Euclidean norm. |
Value
An environment with constraint matrices and vectors and a method to check whether a numeric vector
satisfies all constraints. Returns NULL
in case of no constraints.
Set up conjugate gradient sampler
Description
Set up conjugate gradient sampler
Usage
setup_CG_sampler(mbs, X, sampler, control = CG_control())
Arguments
mbs |
block component containing several model components. |
X |
design matrix. |
sampler |
sampler object as created by |
control |
a list of options for the conjugate gradient algorithm that can be passed
using function |
Value
An environment with precomputed quantities and functions for multiplication by A and by an (inverse) preconditioning matrix.
References
A. Nishimura and M.A. Suchard (2022). Prior-preconditioned conjugate gradient method for accelerated Gibbs sampling in "large n, large p" Bayesian sparse regression. Journal of the American Statistical Association, 1-14.
Set up a cluster for parallel computing
Description
The cluster is set up for a number of workers by loading the mcmcsae package and setting up independent RNG streams.
Usage
setup_cluster(n.cores = NULL, seed = NULL, export = NULL)
Arguments
n.cores |
the number of cpu cores to use. |
seed |
optional random seed for reproducibility. |
export |
a character vector with names of objects to export to the workers. |
Value
An object representing the cluster.
Compute a Monte Carlo estimate of the marginal variances of a (I)GMRF
Description
Estimate marginal variances of a (I)GMRF prior defined in terms of a sparse precision matrix and possibly a set of equality constraints. The marginal variances might be used to rescale the precision matrix such that a default prior for a corresponding variance component is more appropriate.
Usage
sim_marg_var(
D,
Q = NULL,
R = NULL,
r = NULL,
eps1 = 1e-09,
eps2 = 1e-09,
nSim = 100L
)
Arguments
D |
factor of precision matrix Q such that Q=D'D. |
Q |
precision matrix. |
R |
equality restriction matrix. |
r |
rhs vector for equality constraints |
eps1 |
passed to |
eps2 |
passed to |
nSim |
number of Monte Carlo samples used to estimate the marginal variances. |
Value
A vector of Monte Carlo estimates of the marginal variances.
References
S.H. Sorbye and H. Rue (2014). Scaling intrinsic Gaussian Markov random field priors in spatial modelling. Spatial Statistics, 8, 39-51.
Stop a cluster
Description
Stop a cluster set up by setup_cluster
.
Usage
stop_cluster(cl)
Arguments
cl |
the cluster object. |
Value
NULL
.
Select a subset of chains, samples and parameters from a draws component (dc) object
Description
Select a subset of chains, samples and parameters from a draws component (dc) object
Usage
## S3 method for class 'dc'
subset(x, chains = NULL, draws = NULL, vars = NULL, ...)
Arguments
x |
a draws component (dc) object. |
chains |
an integer vector indicating which chains to select. |
draws |
an integer vector indicating which samples to select. |
vars |
an integer vector indicating which parameters to select. |
... |
not used. |
Value
The selected part of the draws component as an object of class dc
.
Examples
n <- 300
dat <- data.frame(x=runif(n), f=as.factor(sample(1:7, n, replace=TRUE)))
gd <- generate_data(~ reg(~ x + f, prior=pr_normal(precision=1), name="beta"), data=dat)
dat$y <- gd$y
sampler <- create_sampler(y ~ reg(~ x + f, name="beta"), data=dat)
sim <- MCMCsim(sampler)
(summary(sim$beta))
(summary(subset(sim$beta, chains=1)))
(summary(subset(sim$beta, chains=1, draws=sample(1:n_draws(sim), 100))))
(summary(subset(sim$beta, vars=1:2)))
Summarise a draws component (dc) object
Description
Summarise a draws component (dc) object
Usage
## S3 method for class 'dc'
summary(
object,
probs = c(0.05, 0.5, 0.95),
na.rm = FALSE,
time = NULL,
abbr = FALSE,
batch.size = 100L,
...
)
Arguments
object |
an object of class |
probs |
vector of probabilities at which to evaluate quantiles. |
na.rm |
whether to remove NA/NaN draws in computing the summaries. |
time |
MCMC computation time; if specified the effective sample size per unit of time is returned in an extra column labelled 'efficiency'. |
abbr |
if |
batch.size |
number of parameter columns to process simultaneously. A larger batch size may speed things up a little, but if an out of memory error occurs it may be a good idea to use a smaller number and try again. The default is 100. |
... |
arguments passed to |
Value
A matrix with summaries of class dc_summary
.
Examples
ex <- mcmcsae_example()
sampler <- create_sampler(ex$model, data=ex$dat)
sim <- MCMCsim(sampler, store.all=TRUE)
summary(sim$u)
Summarise an mcdraws object
Description
Summarise an mcdraws object
Usage
## S3 method for class 'mcdraws'
summary(
object,
vnames = NULL,
probs = c(0.05, 0.5, 0.95),
na.rm = FALSE,
efficiency = FALSE,
abbr = FALSE,
batch.size = 100L,
...
)
Arguments
object |
an object of class |
vnames |
optional character vector to select a subset of parameters. |
probs |
vector of probabilities at which to evaluate quantiles. |
na.rm |
whether to remove NA/NaN draws in computing the summaries. |
efficiency |
if |
abbr |
if |
batch.size |
number of parameter columns to process simultaneously for vector parameters. A larger batch size may speed things up a little, but if an out of memory error occurs it may be a good idea to use a smaller number and try again. The default is 100. |
... |
arguments passed to |
Value
A list of class mcdraws_summary
summarizing object
.
Examples
ex <- mcmcsae_example()
sampler <- create_sampler(ex$model, data=ex$dat)
sim <- MCMCsim(sampler, store.all=TRUE)
summary(sim)
par_names(sim)
summary(sim, c("beta", "v_sigma", "u_sigma"))
S4 method for row and column subsetting a 'tabMatrix'
Description
S4 method for row and column subsetting a 'tabMatrix'
Usage
## S4 method for signature 'tabMatrix,index,missing,logical'
x[i, j, ..., drop = TRUE]
## S4 method for signature 'tabMatrix,index,missing,missing'
x[i, j, ..., drop = TRUE]
## S4 method for signature 'tabMatrix,missing,index,logical'
x[i, j, ..., drop = TRUE]
## S4 method for signature 'tabMatrix,missing,index,missing'
x[i, j, ..., drop = TRUE]
Arguments
x |
a tabMatrix object. |
i |
integer vector indicating the rows to select. Not used for column subsetting. |
j |
integer vector indicating the columns to select. Not used for row subsetting. |
... |
not used. |
drop |
whether to return a vector in case of a single selected row or column. |
Value
The selected rows/columns as a tabMatrix or a vector.
Transform one or more draws component objects into a new one by applying a function
Description
Transform one or more draws component objects into a new one by applying a function
Usage
transform_dc(..., fun, to.matrix = FALSE, labels = NULL)
Arguments
... |
draws component object(s) of class |
fun |
a function to apply. This function should take as many arguments as there are input objects. The arguments can be arbitrarily named, but they are assumed to be in the same order as the input objects. The function should return a vector. |
to.matrix |
if |
labels |
optional labels for the output object. |
Value
Either a matrix or a draws component object.
Examples
ex <- mcmcsae_example(n=50)
sampler <- create_sampler(ex$model, data=ex$dat)
sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, n.chain=4, store.all=TRUE)
summary(sim$v_sigma)
summary(transform_dc(sim$v_sigma, fun=function(x) x^2))
summary(transform_dc(sim$u, sim$u_sigma, fun=function(x1, x2) abs(x1)/x2))
Create a model component object for a variance factor component in the variance function of a gaussian sampling distribution
Description
This function is intended to be used on the right hand side of the formula.V
argument to
create_sampler
or generate_data
.
Usage
vfac(
factor = "local_",
prior = pr_invchisq(df = 1, scale = 1),
name = "",
debug = FALSE
)
Arguments
factor |
The name of a factor variable. The name |
prior |
the prior assigned to the variance factors. Currently the prior can be inverse chi-squared
or exponential, specified by a call to |
name |
The name of the variance model component. This name is used in the output of the MCMC simulation
function |
debug |
If |
Value
An object with precomputed quantities and functions for sampling from prior or conditional posterior distributions for this model component. Intended for internal use by other package functions.
Create a model component object for a regression component in the variance function of a gaussian sampling distribution
Description
This function is intended to be used on the right hand side of the formula.V
argument to
create_sampler
or generate_data
.
Usage
vreg(
formula = NULL,
remove.redundant = FALSE,
sparse = NULL,
X = NULL,
prior = NULL,
Q0 = NULL,
b0 = NULL,
name = ""
)
Arguments
formula |
a formula for the regression effects explaining the log-variance.
Variable names are looked up in the data frame passed as |
remove.redundant |
whether redundant columns should be removed from the design matrix.
Default is |
sparse |
whether the model matrix associated with |
X |
a (possibly sparse) design matrix can be specified directly, as an alternative to the
creation of one based on |
prior |
prior specification for the coefficients. A normal prior can be
specified using function |
Q0 |
prior precision matrix for the regression effects. The default is a
zero matrix corresponding to a noninformative improper prior.
DEPRECATED, please use argument |
b0 |
prior mean for the regression effect. Defaults to a zero vector.
DEPRECATED, please use argument |
name |
the name of the model component. This name is used in the output of the
MCMC simulation function |
Value
An object with precomputed quantities and functions for sampling from prior or conditional posterior distributions for this model component. Intended for internal use by other package functions.
References
E. Cepeda and D. Gamerman (2000). Bayesian modeling of variance heterogeneity in normal regression models. Brazilian Journal of Probability and Statistics, 207-221.
T.I. Lin and W.L. Wang (2011). Bayesian inference in joint modelling of location and scale parameters of the t distribution for longitudinal data. Journal of Statistical Planning and Inference 141(4), 1543-1553.
Extract weights from an mcdraws object
Description
Extract weights from an mcdraws object
Usage
## S3 method for class 'mcdraws'
weights(object, ...)
Arguments
object |
an object of class |
... |
currently not used. |
Value
A vector with (simulation means of) weights.
Examples
# first create a population data frame
N <- 1000 # population size
pop <- data.frame(x=rnorm(N), area=factor(sample(1:10, N, replace=TRUE)))
pop$y <- 1 + 2*pop$x + seq(-1, to=1, length.out=10)[pop$area] + 0.5*rnorm(N)
pop$sample <- FALSE
pop$sample[sample(seq_len(N), 100)] <- TRUE
# a simple linear regression model:
sampler <- create_sampler(
y ~ reg(~ x, name="beta"),
linpred=list(beta=rowsum(model.matrix(~ x, pop), pop$area)), compute.weights=TRUE,
data=pop[pop$sample, ]
)
sim <- MCMCsim(sampler)
(summary(sim))
str(weights(sim))
crossprod_mv(weights(sim), pop$y[pop$sample])
summary(sim$linpred_)
# a multilevel model:
sampler <- create_sampler(
y ~ reg(~ x, name="beta") + gen(factor = ~ area, name="v"),
linpred=list(beta=rowsum(model.matrix(~ x, pop), pop$area), v=diag(10)), compute.weights=TRUE,
data=pop[pop$sample, ]
)
sim <- MCMCsim(sampler)
(summary(sim))
str(weights(sim))
crossprod_mv(weights(sim), pop$y[pop$sample])
summary(sim$linpred_)