Help for package mcp

Title:

Regression with Multiple Change Points

Version:

0.3.4

Date:

2024-03-14

URL:

https://lindeloev.github.io/mcp/

BugReports:

https://github.com/lindeloev/mcp/issues

Description:

Flexible and informed regression with Multiple Change Points. 'mcp' can infer change points in means, variances, autocorrelation structure, and any combination of these, as well as the parameters of the segments in between. All parameters are estimated with uncertainty and prediction intervals are supported - also near the change points. 'mcp' supports hypothesis testing via Savage-Dickey density ratio, posterior contrasts, and cross-validation. 'mcp' is described in Lindeløv (submitted) <doi:10.31219/osf.io/fzqxv> and generalizes the approach described in Carlin, Gelfand, & Smith (1992) <doi:10.2307/2347570> and Stephens (1994) <doi:10.2307/2986119>.

License:

GPL-2

Encoding:

UTF-8

Language:

en-US

LazyData:

true

RoxygenNote:

7.3.1

Depends:

R (≥ 3.5.0)

Imports:

parallel, future (≥ 1.16), future.apply (≥ 1.4), rjags (≥ 4.9), coda (≥ 0.19.3), loo (≥ 2.1.0), bayesplot (≥ 1.7.0), tidybayes (≥ 3.0.0), dplyr (≥ 1.1.1), magrittr (≥ 1.5), tidyr (≥ 1.0.0), tidyselect (≥ 0.2.5), tibble (≥ 2.1.3), stringr (≥ 1.4.0), ggplot2 (≥ 3.2.1), patchwork (≥ 1.0.0), stats, rlang (≥ 0.4.1)

Suggests:

hexbin, testthat (≥ 3.1.0), purrr (≥ 0.3.4), knitr, rmarkdown

NeedsCompilation:

Packaged:

2024-03-17 19:39:34 UTC; jonas

Author:

Jonas Kristoffer Lindeløv

[aut, cre]

Maintainer:

Jonas Kristoffer Lindeløv <jonas@lindeloev.dk>

Repository:

CRAN

Date/Publication:

2024-03-17 20:10:02 UTC

mcp: Regression with Multiple Change Points

Description

Flexible and informed regression with Multiple Change Points. 'mcp' can infer change points in means, variances, autocorrelation structure, and any combination of these, as well as the parameters of the segments in between. All parameters are estimated with uncertainty and prediction intervals are supported - also near the change points. 'mcp' supports hypothesis testing via Savage-Dickey density ratio, posterior contrasts, and cross-validation. 'mcp' is described in Lindeløv (submitted) doi:10.31219/osf.io/fzqxv and generalizes the approach described in Carlin, Gelfand, & Smith (1992) doi:10.2307/2347570 and Stephens (1994) doi:10.2307/2986119.

Author(s)

Maintainer: Jonas Kristoffer Lindeløv jonas@lindeloev.dk (ORCID)

Bernoulli family for mcp

Description

Bernoulli family for mcp

Usage

bernoulli(link = "logit")

Arguments

link

Link function.

Checks if all terms are in the data

Description

Checks if all terms are in the data

Usage

check_terms_in_data(form, data, i, n_terms = NULL)

Arguments

form

Formula or character (tilde will be prefixed if it isn't already)

data

A data.frame or tibble

i

The segment number (integer)

n_terms

Int >= 1. Number of expected terms. Will raise error if it doesn't match.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Compute information criteria for model comparison

Description

Takes an mcpfit as input and computes information criteria using loo or WAIC. Compare models using loo_compare and loo_model_weights. more in loo.

Usage

criterion(fit, criterion = "loo", ...)

## S3 method for class 'mcpfit'
loo(x, ...)

## S3 method for class 'mcpfit'
waic(x, ...)

Arguments

fit

An mcpfit object.

criterion

One of "loo" (calls loo) or "waic" (calls waic).

...

Currently ignored

x

An mcpfit object.

Value

a loo or psis_loo object.

Functions

loo(mcpfit): Computes loo on mcpfit objects
waic(mcpfit): Computes WAIC on mcpfit objects

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Examples


# Define two models and sample them
# options(mc.cores = 3)  # Speed up sampling
ex = mcp_example("intercepts")  # Get some simulated data.
model1 = list(y ~ 1 + x, ~ 1)
model2 = list(y ~ 1 + x)  # Without a change point
fit1 = mcp(model1, ex$data)
fit2 = mcp(model2, ex$data)

# Compute LOO for each and compare (works for waic(fit) too)
fit1$loo = loo(fit1)
fit2$loo = loo(fit2)
loo::loo_compare(fit1$loo, fit2$loo)

Cumulative pasting of character columns

Description

Cumulative pasting of character columns

Usage

cumpaste(x, .sep = " ")

Arguments

x

A column

.sep

A character to append between pastes

Value

string.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk but Inspired by https://stackoverflow.com/questions/24862046/cumulatively-paste-concatenate-values-grouped-by-another-variable

Example `mcpfit` for examples

Description

This was generated using mcp_examples("demo", sample = TRUE).

Usage

demo_fit

Format

An mcpfit object.

Exponential family for mcp

Description

Exponential family for mcp

Usage

exponential(link = "identity")

Arguments

link

Link function (Character).

Expected Values from the Posterior Predictive Distribution

Description

Expected Values from the Posterior Predictive Distribution

Usage

## S3 method for class 'mcpfit'
fitted(
  object,
  newdata = NULL,
  summary = TRUE,
  probs = TRUE,
  rate = TRUE,
  prior = FALSE,
  which_y = "ct",
  varying = TRUE,
  arma = TRUE,
  nsamples = NULL,
  samples_format = "tidy",
  scale = "response",
  ...
)

Arguments

object

An mcpfit object.

newdata

A tibble or a data.frame containing predictors in the model. If NULL (default), the original data is used.

summary

Summarise at each x-value

probs

Vector of quantiles. Only in effect when summary == TRUE.

rate

Boolean. For binomial models, plot on raw data (rate = FALSE) or response divided by number of trials (rate = TRUE). If FALSE, linear interpolation on trial number is used to infer trials at a particular x.

prior

TRUE/FALSE. Plot using prior samples? Useful for mcp(..., sample = "both")

which_y

What to plot on the y-axis. One of

"ct": The central tendency which is often the mean after applying the link function.
"sigma": The variance
"ar1", "ar2", etc. depending on which order of the autoregressive effects you want to plot.

varying

One of:

TRUE All varying effects (fit$pars$varying).
FALSE No varying effects (c()).
Character vector: Only include specified varying parameters - see fit$pars$varying.

arma

Whether to include autoregressive effects.

TRUE Compute autoregressive residuals. Requires the response variable in newdata.
FALSE Disregard the autoregressive effects. For family = gaussian(), predict() just use sigma for residuals.

nsamples

Integer or NULL. Number of samples to return/summarise. If there are varying effects, this is the number of samples from each varying group. NULL means "all". Ignored if both are FALSE. More samples trade speed for accuracy.

samples_format

One of "tidy" or "matrix". Controls the output format when summary == FALSE. See more under "value"

scale

One of

"response": return on the observed scale, i.e., after applying the inverse link function.
"linear": return on the parameter scale (where the linear trends are modelled).

...

Currently unused

Value

If summary = TRUE: A tibble with the posterior mean for each row in newdata, If newdata is NULL, the data in fit$data is used.
If summary = FALSE and samples_format = "tidy": A tidybayes tibble with all the posterior samples (Ns) evaluated at each row in newdata (Nn), i.e., with ⁠Ns x Nn⁠ rows. If there are varying effects, the returned data is expanded with the relevant levels for each row.

The return columns are:
- Predictors from newdata.
- Sample descriptors: ".chain", ".iter", ".draw" (see the tidybayes package for more), and "data_row" (newdata rownumber)
- Sample values: one column for each parameter in the model.
- The estimate. Either "predict" or "fitted", i.e., the name of the type argument.
If summary = FALSE and samples_format = "matrix": An N_draws X nrows(newdata) matrix with fitted/predicted values (depending on type). This format is used by brms and it's useful as yrep in ⁠bayesplot::ppc_*⁠ functions.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Examples


fitted(demo_fit)
fitted(demo_fit, probs = c(0.1, 0.5, 0.9))  # With median and 80% credible interval.
fitted(demo_fit, summary = FALSE)  # Samples instead of summary.
fitted(demo_fit,
       newdata = data.frame(time = c(-5, 20, 300)),  # New data
       probs = c(0.025, 0.5, 0.975))

Format code with one or multiple terms

Description

Take a value like "a + b" and (1) replace it with NA if na_col == NA. (2) Change to "(a + b)" if there is a "+" (3) Return itself otherwise, e.g., "a" –> "a".

Usage

format_code(col, na_col)

Arguments

col

A column

na_col

If this column is NA, return NA

Value

string

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Density geom for `plot.mcpfit()`

Description

Density geom for plot.mcpfit()

Usage

geom_cp_density(fit, facet_by, limits_y)

Arguments

fit

An mcpfit object

facet_by

NULL or a a string, like ⁠plot.mcpfit(..., facet_by = "id").⁠

Value

A ggplot2::stat_density geom representing the change point densities.

Return a geom_line representing the quantiles

Description

Called by plot.mcpfit.

Usage

geom_quantiles(samples, quantiles, xvar, yvar, facet_by, ...)

Arguments

samples

A tidybayes tibble

quantiles

Vector of quantiles (0.0 to 1.0)

xvar

An rlang::sym() with the name of the x-col in samples

yvar

An rlang::sym() with the name of the response col in samples

facet_by

String. Name of a varying group.

...

Arguments passed to geom_line

Value

A ggplot2::geom_line object.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Call `get_formula_str` for multiple ytypes and paste strings

Description

Currently used to differentiate between the JAGS model (use all) and the fit$simulate model (do not include arma).

Usage

get_all_formulas(ST, prior, par_x, ytypes = c("ct", "sigma", "arma"))

Arguments

ST

Tibble. Returned by get_segment_table.

prior

Named list. Names are parameter names (cp_i, int_i, xvar_i, 'sigma“) and the values are either

A JAGS distribution (e.g., int_1 = "dnorm(0, 1) T(0,)") indicating a conventional prior distribution. Uninformative priors based on data properties are used where priors are not specified. This ensures good parameter estimations, but it is a questionable for hypothesis testing. mcp uses SD (not precision) for dnorm, dt, dlogis, etc. See details. Change points are forced to be ordered through the priors using truncation, except for uniform priors where the lower bound should be greater than the previous change point, dunif(cp_1, MAXX).
A numerical value (e.g., int_1 = -2.1) indicating a fixed value.
A model parameter name (e.g., int_2 = "int_1"), indicating that this parameter is shared - typically between segments. If two varying effects are shared this way, they will need to have the same grouping variable.
A scaled Dirichlet prior is supported for change points if they are all set to ⁠cp_i = "dirichlet(N)⁠ where N is the alpha for this change point and N = 1 is most often used. This prior is less informative about the location of the change points than the default uniform prior, but it samples less efficiently, so you will often need to set iter higher. It is recommended for hypothesis testing and for the estimation of more than 5 change points. Read more.

par_x

String (default: NULL). Only relevant if no segments contains slope (no hint at what x is). Set this, e.g., par_x = "time".

ytypes

A character vector of ytypes to including in model building

Value

A string with JAGS code.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Gets code for ARMA terms, resulting in a "resid_"

Description

Developer note: Ensuring that this can be used in both simulate() and JAGS got quite messy with a lot of if-statements. It works but some refactoring may be good in the future.

Usage

get_ar_code(ar_order, family, is_R, xvar, yvar = NA)

Arguments

ar_order

Positive integer. The order of ARMA

family

An mcpfamily object

is_R

Bool. Is this R code (TRUE) or JAGS code (FALSE)?

Value

String with JAGS code for AR.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Extracts the order from ARMA parameter name(s)

Description

If several names are provided (vector), it returns the maximum. If pars_arma is an empty string, it returns 0.

Usage

get_arma_order(pars_arma)

Arguments

pars_arma

Character vector

Value

integer

Compute the density at a specific point.

Description

Used in hypothesis

Usage

get_density(samples, LHS, value)

Arguments

samples

An mcmc.list

LHS

Expression to compute posterior

value

What value to evaluate the density at

Value

A float

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Get a list of x-coordinates to evaluate fit$simulate at

Description

Solves two problems: if setting the number of points too high, the function becomes slow. If setting it too low, the posterior at large intercept- changes at change points look discrete, because they are evaluated at very few x in that interval.

Usage

get_eval_at(fit, facet_by)

Arguments

fit

An mcpfit object.

facet_by

String. Name of a varying group.

Details

This function makes a vector of x-values with large spacing in general, but finer resolution at change points.

Value

A vector of x-values to evaluate at.

Build an R formula (as string) given a segment table (ST)

Description

You will need to replace PAR_X for whatever your x-axis observation column is called. In JAGS typically x[i_]. In R just x.

Usage

get_formula_str(ST, par_x, ytype = "ct", init = FALSE)

Arguments

ST

Tibble. Returned by get_segment_table.

par_x

String (default: NULL). Only relevant if no segments contains slope (no hint at what x is). Set this, e.g., par_x = "time".

ytype

One of "ct" (central tendency), "sigma", "ar1" (or another order), or "ma1" (or another order)

init

TRUE/FALSE. Set to TRUE for the first call. Adds segment-relative X-codings and verbose commenting of one formula

Value

A string with JAGS code.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Adds helper variables for use in `run_jags`

Description

Returns the relevant data columns as a list and add elements with unique varying group levels.

Usage

get_jags_data(data, ST, jags_code, sample)

Arguments

data

A tibble

ST

A segment table (tibble), returned by get_segment_table.

jags_code

A string. JAGS model, usually returned by make_jagscode().

sample

One of

"post": Sample the posterior.
"prior": Sample only the prior. Plots, summaries, etc. will use the prior. This is useful for prior predictive checks.
"both": Sample both prior and posterior. Plots, summaries, etc. will default to using the posterior. The prior only has effect when doing Savage-Dickey density ratios in hypothesis.
"none" or FALSE: Do not sample. Returns an mcpfit object without sample. This is useful if you only want to check prior strings (fit$prior), the JAGS model (fit$jags_code), etc.

Make JAGS code for Multiple Change Point model

Description

Make JAGS code for Multiple Change Point model

Usage

get_jagscode(prior, ST, formula_str, arma_order, family, sample)

Arguments

prior

Named list. Names are parameter names (cp_i, int_i, xvar_i, 'sigma“) and the values are either

A JAGS distribution (e.g., int_1 = "dnorm(0, 1) T(0,)") indicating a conventional prior distribution. Uninformative priors based on data properties are used where priors are not specified. This ensures good parameter estimations, but it is a questionable for hypothesis testing. mcp uses SD (not precision) for dnorm, dt, dlogis, etc. See details. Change points are forced to be ordered through the priors using truncation, except for uniform priors where the lower bound should be greater than the previous change point, dunif(cp_1, MAXX).
A numerical value (e.g., int_1 = -2.1) indicating a fixed value.
A model parameter name (e.g., int_2 = "int_1"), indicating that this parameter is shared - typically between segments. If two varying effects are shared this way, they will need to have the same grouping variable.
A scaled Dirichlet prior is supported for change points if they are all set to ⁠cp_i = "dirichlet(N)⁠ where N is the alpha for this change point and N = 1 is most often used. This prior is less informative about the location of the change points than the default uniform prior, but it samples less efficiently, so you will often need to set iter higher. It is recommended for hypothesis testing and for the estimation of more than 5 change points. Read more.

ST

Segment table. Returned by get_segment_table().

formula_str

String. The formula string returned by build_formula_str.

arma_order

Positive integer. The autoregressive order.

family

One of gaussian(), binomial(), bernoulli(), or poission(). Only default link functions are currently supported.

sample

One of

"post": Sample the posterior.
"prior": Sample only the prior. Plots, summaries, etc. will use the prior. This is useful for prior predictive checks.
"both": Sample both prior and posterior. Plots, summaries, etc. will default to using the posterior. The prior only has effect when doing Savage-Dickey density ratios in hypothesis.
"none" or FALSE: Do not sample. Returns an mcpfit object without sample. This is useful if you only want to check prior strings (fit$prior), the JAGS model (fit$jags_code), etc.

Value

String. A JAGS model.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

pp_check for loo statistics

Description

pp_check for loo statistics

Usage

get_ppc_plot(fit, type, y, yrep, nsamples, draws = NULL, ...)

Arguments

type

One of bayesplot::available_ppc("grouped", invert = TRUE) %>% stringr::str_remove("ppc_")

y

Response vector

yrep

S X N matrix of predicted responses

nsamples

Number of draws. Note that you may want to use all data for summary geoms. e.g., pp_check(fit, type = "ribbon", nsamples = NULL).

draws

(required for loo-type plots) Indices of draws to use.

...

Arguments passed to bayesplot::ppc_type(y, yrep, ...)

Value

A ggplot2 object returned by ⁠tidybayes::ppc_*(y, yrep, ...)⁠.

A string

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Get priors for all parameters in a segment table.

Description

Starts by finding all default priors. Then replace them with user priors. User priors for change points are truncated appropriately using 'truncate_prior_cp“, if not done manually by the user already.

Usage

get_prior(ST, family, prior = list())

Arguments

ST

Tibble. A segment table returned by get_segment_table.

family

One of gaussian(), binomial(), bernoulli(), or poission(). Only default link functions are currently supported.

prior

A list of user-defined priors. Will overwrite the relevant default priors.

Value

A named list of strings. The names correspond to the parameter names and the strings are the JAGS code for the prior (before converting SD to precision).

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Get JAGS code for a prior

Description

Get JAGS code for a prior

Usage

get_prior_str(prior, i, varying_group = NULL)

Arguments

prior

Named list. Names are parameter names (cp_i, int_i, xvar_i, 'sigma“) and the values are either

A JAGS distribution (e.g., int_1 = "dnorm(0, 1) T(0,)") indicating a conventional prior distribution. Uninformative priors based on data properties are used where priors are not specified. This ensures good parameter estimations, but it is a questionable for hypothesis testing. mcp uses SD (not precision) for dnorm, dt, dlogis, etc. See details. Change points are forced to be ordered through the priors using truncation, except for uniform priors where the lower bound should be greater than the previous change point, dunif(cp_1, MAXX).
A numerical value (e.g., int_1 = -2.1) indicating a fixed value.
A model parameter name (e.g., int_2 = "int_1"), indicating that this parameter is shared - typically between segments. If two varying effects are shared this way, they will need to have the same grouping variable.
A scaled Dirichlet prior is supported for change points if they are all set to ⁠cp_i = "dirichlet(N)⁠ where N is the alpha for this change point and N = 1 is most often used. This prior is less informative about the location of the change points than the default uniform prior, but it samples less efficiently, so you will often need to set iter higher. It is recommended for hypothesis testing and for the estimation of more than 5 change points. Read more.

i

The index in prior to get code for

varying_group

String or NULL. Null indicates a population- level prior. String indicates a varying-effects prior (one for each group level).

Value

A string

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Expand samples with quantiles

Description

TO DO: implement using fitted() and predict() but avoid double-computing the samples? E.g.: ⁠get_quantiles2 = function(fit, quantiles, facet_by = NULL) {⁠ ⁠fitted(fit, probs = c(0.1, 0.5, 0.9), newdata = data.frame(x = c(11, 50, 100))) %>%⁠ ⁠tidyr::pivot_longer(tidyselect::starts_with("Q")) %>%⁠ dplyr::mutate(quantile = stringr::str_remove(name, "Q") %>% as.numeric() / 100) ⁠}⁠

Usage

get_quantiles(samples, quantiles, xvar, yvar, facet_by = NULL)

Arguments

samples

A tidybayes tibble

quantiles

Vector of quantiles (0.0 to 1.0)

xvar

An rlang::sym() with the name of the x-col in samples

yvar

An rlang::sym() with the name of the response col in samples

facet_by

String. Name of a varying group.

Value

A tidybayes long format tibble with the column "quantile"

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Build a table describing a list of segments

Description

Used internally for most mcp functions.

Usage

get_segment_table(model, data = NULL, family = gaussian(), par_x = NULL)

Arguments

model

A list of formulas - one for each segment. The first formula has the format response ~ predictors while the following formulas have the format response ~ changepoint ~ predictors. The response and change points can be omitted (changepoint ~ predictors assumes same response. ~ predictors assumes an intercept-only change point). The following can be modeled:

Regular formulas: e.g., ~ 1 + x). Read more.
Extended formulas:, e.g., ~ I(x^2) + exp(x) + sin(x). Read more.
Variance: e.g., ~sigma(1) for a simple variance change or ~sigma(rel(1) + I(x^2))) for more advanced variance structures. Read more
Autoregression: e.g., ~ar(1) for a simple onset/change in AR(1) or ⁠ar(2, 0 + x⁠) for an AR(2) increasing by x. Read more

data

Data.frame or tibble in long format.

family

One of gaussian(), binomial(), bernoulli(), or poission(). Only default link functions are currently supported.

par_x

String (default: NULL). Only relevant if no segments contains slope (no hint at what x is). Set this, e.g., par_x = "time".

Value

A tibble with one row describing each segment and the corresponding code.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Examples

model = list(
  y ~ 1 + x,
  1 + (1|id) ~ 1
)
get_segment_table(model)

Turn formula_str into a proper R function

Description

Turn formula_str into a proper R function

Usage

get_simulate(formula_str, pars, nsegments, family)

Arguments

formula_str

string. Returned by get_formula.

pars

List of user-provided parameters, in the format of fit$pars.

nsegments

Positive integer. Number of segments, typically nrow(ST).

family

One of gaussian(), binomial(), bernoulli(), or poission(). Only default link functions are currently supported.

Value

A string with R code for the fit$simulate() function corresponding to the model.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Internal function for summary.mcpfit, fixef.mcpfit, and ranef.mcpfit

Description

Internal function for summary.mcpfit, fixef.mcpfit, and ranef.mcpfit

Usage

get_summary(fit, width, varying = FALSE, prior = FALSE)

Arguments

fit

An mcpfit' object.

width

Float. The width of the highest posterior density interval (between 0 and 1).

varying

Boolean. Get results for varying (TRUE) or population (FALSE)?

prior

TRUE/FALSE. Summarise prior instead of posterior?

Value

A data.frame with summaries for each model parameter.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Get formula inside a wrapper

Description

Get formula inside a wrapper

Usage

get_term_content(term)

Arguments

term

E.g., "ct(1 + x)", "sigma(0 + rel(x) + I(x^2))", etc.

Value

char formula with the content inside the brackets.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Test hypotheses on mcp objects.

Description

Returns posterior probabilities and Bayes Factors for flexible hypotheses involving model parameters. The documentation for the argument hypotheses below shows examples of how to specify hypotheses, and read worked examples on the mcp website. For directional hypotheses, ⁠hypothesis`` executes the hypothesis string in a ⁠tidybayes“ environment and summerises the proportion of samples where the expression evaluates to TRUE. For equals-hypothesis, a Savage-Dickey ratio is computed. Savage-Dickey requires a prior too, so remember mcp(..., sample = "both"). This function is heavily inspired by the 'hypothesis' function from the 'brms' package.

Usage

hypothesis(fit, hypotheses, width = 0.95, digits = 3)

Arguments

fit

An mcpfit object.

hypotheses

String representation of a logical test involving model parameters. Takes R code that evaluates to TRUE or FALSE in a vectorized way.

Directional hypotheses are specified using <, >, <=, or >=. hypothesis returns the posterior probability and odds in favor of the stated hypothesis. The odds can be interpreted as a Bayes Factor. For example:

"cp_1 > 30": the first change point is above 30.
"int_1 > int_2": the intercept is greater in segment 1 than 2.
"x_2 - x_1 <= 3": the difference between slope 1 and 2 is less than or equal to 3.
"int_1 > -2 & int_1 < 2": int_1 is between -2 and 2 (an interval hypothesis). This can be useful as a Region Of Practical Equivalence test (ROPE).
"cp_1^2 < 30 | (log(x_1) + log(x_2)) > 5": be creative.
"`cp_1_id[1]` > `cp_1_id[2]`": id1 is greater than id2, as estimated through the varying-by-"id" change point in segment 1. Note that `` required for varying effects.

Hypotheses can also test equality using the equal sign (=). This runs a Savage-Dickey test, i.e., the proportion by which the probability density has increased from the prior to the posterior at a given value. Therefore, it requires mcp(sample = "both"). There are two requirements: First, there can only be one equal sign, so don't use and (&) or or (|). Second, the point to test has to be on the right, and the variables on the left.

"cp_1 = 30": is the first change point at 30? Or to be more precise: by what factor has the credence in cp_1 = 30 risen/fallen when conditioning on the data, relative to the prior credence?
"int_1 + int_2 = 0": Is the sum of two intercepts zero?
"`cp_1_id[John]`/`cp_1_id[Erin]` = 2": is the varying change point for John (which is relative to 'cp_1“) double that of Erin?

width

Float. The width of the highest posterior density interval (between 0 and 1).

digits

a non-null value for digits specifies the minimum number of significant digits to be printed in values. The default, NULL, uses getOption("digits"). (For the interpretation for complex numbers see signif.) Non-integer values will be rounded down, and only values greater than or equal to 1 and no greater than 22 are accepted.

Value

A data.frame with a row per hypothesis and the following columns:

hypothesis is the hypothesis; often re-arranged to test against zero.
mean is the posterior mean of the left-hand side of the hypothesis.
lower is the lower bound of the (two-sided) highest-density interval of width width.
upper is the upper bound of ditto.
p Posterior probability. For "=" (Savage-Dickey), it is the BF converted to p. For directional hypotheses, it is the proportion of samples that returns TRUE.
BF Bayes Factor in favor of the hypothesis. For "=" it is the Savage-Dickey density ratio. For directional hypotheses, it is p converted to odds.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Inverse logit function

Description

Inverse logit function

Usage

ilogit(eta)

Arguments

eta

A vector of logits

Value

A vector with same length as eta

Checks if argument is an `mcpfit` object

Description

Checks if argument is an mcpfit object

Usage

is.mcpfit(x)

Arguments

x

An R object.

Logit function

Description

Logit function

Usage

logit(mu)

Arguments

mu

A vector of probabilities (0.0 to 1.0)

Value

A vector with same length as mu

Internal function to get samples.

Description

Returns posterior samples, if available. If not, then prior samples. If not, then throw an informative error. This is useful for summary and plotting, that works on both.

Usage

mcmclist_samples(fit, prior = FALSE, message = TRUE, error = TRUE)

Arguments

fit

An mcpfit object

prior

TRUE/FALSE. Summarise prior instead of posterior?

message

TRUE: gives a message if returning prior samples. FALSE = no message

error

TRUE: err if there are no samples. FALSE: return NULL

Fit Multiple Linear Segments And Their Change Points

Description

Given a model (a list of segment formulas), mcp infers the posterior distributions of the parameters of each segment as well as the change points between segments. See more details and worked examples on the mcp website. All segments must regress on the same x-variable. Change points are forced to be ordered using truncation of the priors. You can run fit = mcp(model, sample=FALSE) to avoid sampling and the need for data if you just want to get the priors (fit$prior), the JAGS code fit$jags_code, or the R function to simulate data (fit$simulate).

Usage

mcp(
  model,
  data = NULL,
  prior = list(),
  family = gaussian(),
  par_x = NULL,
  sample = "post",
  cores = 1,
  chains = 3,
  iter = 3000,
  adapt = 1500,
  inits = NULL,
  jags_code = NULL
)

Arguments

model

Regular formulas: e.g., ~ 1 + x). Read more.
Extended formulas:, e.g., ~ I(x^2) + exp(x) + sin(x). Read more.
Variance: e.g., ~sigma(1) for a simple variance change or ~sigma(rel(1) + I(x^2))) for more advanced variance structures. Read more
Autoregression: e.g., ~ar(1) for a simple onset/change in AR(1) or ⁠ar(2, 0 + x⁠) for an AR(2) increasing by x. Read more

data

Data.frame or tibble in long format.

prior

Named list. Names are parameter names (cp_i, int_i, xvar_i, 'sigma“) and the values are either

A JAGS distribution (e.g., int_1 = "dnorm(0, 1) T(0,)") indicating a conventional prior distribution. Uninformative priors based on data properties are used where priors are not specified. This ensures good parameter estimations, but it is a questionable for hypothesis testing. mcp uses SD (not precision) for dnorm, dt, dlogis, etc. See details. Change points are forced to be ordered through the priors using truncation, except for uniform priors where the lower bound should be greater than the previous change point, dunif(cp_1, MAXX).
A numerical value (e.g., int_1 = -2.1) indicating a fixed value.
A model parameter name (e.g., int_2 = "int_1"), indicating that this parameter is shared - typically between segments. If two varying effects are shared this way, they will need to have the same grouping variable.
A scaled Dirichlet prior is supported for change points if they are all set to ⁠cp_i = "dirichlet(N)⁠ where N is the alpha for this change point and N = 1 is most often used. This prior is less informative about the location of the change points than the default uniform prior, but it samples less efficiently, so you will often need to set iter higher. It is recommended for hypothesis testing and for the estimation of more than 5 change points. Read more.

family

One of gaussian(), binomial(), bernoulli(), or poission(). Only default link functions are currently supported.

par_x

String (default: NULL). Only relevant if no segments contains slope (no hint at what x is). Set this, e.g., par_x = "time".

sample

One of

"post": Sample the posterior.
"prior": Sample only the prior. Plots, summaries, etc. will use the prior. This is useful for prior predictive checks.
"both": Sample both prior and posterior. Plots, summaries, etc. will default to using the posterior. The prior only has effect when doing Savage-Dickey density ratios in hypothesis.
"none" or FALSE: Do not sample. Returns an mcpfit object without sample. This is useful if you only want to check prior strings (fit$prior), the JAGS model (fit$jags_code), etc.

cores

Positive integer or "all". Number of cores.

1: serial sampling. options(mc.cores = 3) will dominate cores = 1 but not larger values of cores.
⁠>1⁠: parallel sampling on this number of cores. Ideally set chains to the same value. Note: cores > 1 takes a few extra seconds the first time it's called but subsequent calls will start sampling immediately.
"all": use all cores but one and sets chains to the same value. This is a convenient way to maximally use your computer's power.

chains

Positive integer. Number of chains to run.

iter

Positive integer. Number of post-warmup draws from each chain. The total number of draws is iter * chains.

adapt

Positive integer. Also sometimes called "burnin", this is the number of samples used to reach convergence. Set lower for greater speed. Set higher if the chains haven't converged yet or look at tips, tricks, and debugging.

inits

A list if initial values for the parameters. This can be useful if a model fails to converge. Read more in jags.model. Defaults to NULL, i.e., no inits.

jags_code

String. Pass JAGS code to mcp to use directly. This is useful if you want to tweak the code in fit$jags_code and run it within the mcp framework.

Details

Notes on priors:

Order restriction is automatically applied to cp_\* parameters using truncation (e.g., T(cp_1, )) so that they are in the correct order on the x-axis UNLESS you do it yourself. The one exception is for dunif distributions where you have to do it as above.
In addition to the model parameters, MINX (minimum x-value), MAXX (maximum x-value), SDX (etc...), MINY, MAXY, and SDY are also available when you set priors. They are used to set uninformative default priors.
Use SD when you specify priors for dt, dlogis, etc. JAGS uses precision but mcp converts to precision under the hood via the sd_to_prec() function. So you will see SDs in fit$prior but precision ($1/SD^2) in fit$jags_code

Value

An mcpfit object.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Examples


# Define the segments using formulas. A change point is estimated between each formula.
model = list(
  response ~ 1,  # Plateau in the first segment (int_1)
  ~ 0 + time,    # Joined slope (time_2) at cp_1
  ~ 1 + time     # Disjoined slope (int_3, time_3) at cp_2
)

# Fit it and sample the prior too.
# options(mc.cores = 3)  # Uncomment to speed up sampling
ex = mcp_example("demo")  # Simulated data example
demo_fit = mcp(model, data = ex$data, sample = "both")

# See parameter estimates
summary(demo_fit)

# Visual inspection of the results
plot(demo_fit)  # Visualization of model fit/predictions
plot_pars(demo_fit)  # Parameter distributions
pp_check(demo_fit)  # Prior/Posterior predictive checks

# Test a hypothesis
hypothesis(demo_fit, "cp_1 > 10")

# Make predictions
fitted(demo_fit)
predict(demo_fit)
predict(demo_fit, newdata = data.frame(time = c(55.545, 80, 132)))

# Compare to a one-intercept-only model (no change points) with default prior
model_null = list(response ~ 1)
fit_null = mcp(model_null, data = ex$data, par_x = "time")  # fit another model here
demo_fit$loo = loo(demo_fit)
fit_null$loo = loo(fit_null)
loo::loo_compare(demo_fit$loo, fit_null$loo)

# Inspect the prior. Useful for prior predictive checks.
summary(demo_fit, prior = TRUE)
plot(demo_fit, prior = TRUE)

# Show all priors. Default priors are added where you don't provide any
print(demo_fit$prior)

# Set priors and re-run
prior = list(
  int_1 = 15,
  time_2 = "dt(0, 2, 1) T(0, )",  # t-dist slope. Truncated to positive.
  cp_2 = "dunif(cp_1, 80)",    # change point to segment 2 > cp_1 and < 80.
  int_3 = "int_1"           # Shared intercept between segment 1 and 3
)

fit3 = mcp(model, data = ex$data, prior = prior)

# Show the JAGS model
demo_fit$jags_code

Get example models and data

Description

Get example models and data

Usage

mcp_example(name, sample = FALSE)

Arguments

name

Name of the example. One of:

"demo": Two change points between intercepts and joined/disjoined slopes.
"ar": One change point in autoregressive residuals.
"binomial": Binomial with two change points. Much like "demo" on a logit scale.
"intercepts": An intercept-only change point.
rel_prior: Relative parameterization and informative priors.
"quadratic": A change point to a quadratic segment.
"trigonometric": Trigonometric/seasonal data and model.
"varying": Varying / hierarchical change points.
"variance": A change in variance, including a variance slope.

sample

TRUE (run fit = mcp(model, data, ...)) or FALSE.

Value

List with

model: A list of formulas
data: The simulated data
simulated: The parameters used for simulating the data.
fit: an mcpfit if sample = TRUE,
call: the code to run the above.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Examples


ex = mcp_example("demo")
plot(ex$data)  # Plot data
print(ex$simulated)  # See true parameters used to simulate
print(ex$call)  # See how the data was simulated

# Fit the model. Either...
fit = mcp(ex$model, ex$data)
plot(fit)

ex_with_fit = mcp_example("demo", sample = TRUE)
plot(ex_with_fit$fit)

Add A family object to store link functions between R and JAGS.

Description

This will make more sense once more link functions / families are added.

Usage

mcpfamily(family)

Arguments

family

A family object, e.g., binomial(link = "identity").

Class `mcpfit` of models fitted with the mcp package

Description

Models fitted with the mcp function are represented as an mcpfit object which contains the user input (model, data, family), derived model characteristics (prior, parameter names, and jags code), and the fit (prior and/or posterior mcmc samples).

Details

See methods(class = "mcpfit") for an overview of available methods.

User-provided information (see mcp for more details):

Slots

model: A list of formulas, making up the model. Provided by user. See mcp for more details.
data: A data frame. Provided by user. See mcp for more details.
family: An mcpfamily object. Provided by user. See mcp for more details.
prior: A named list. Provided by user. See mcp for more details.
mcmc_post: An mcmc.list object with posterior samples.
mcmc_prior: An mcmc.list object with prior samples.
mcmc_loglik: An mcmc.list object with samples of log-likelihood.
pars: A list of character vectors of model parameter names.
jags_code: A string with jags code. Use cat(fit$jags_code) to show it.
simulate: A method to simulate and predict data.
.other: Information that is used internally by mcp.

Negative binomial for mcp

Description

Parameterized as mu (mean; poisson lambda) and size (a shape parameter), so you can do rnbinom(10, mu = 10, size = 1). Read more in the doc for rnbinom,

Usage

negbinomial(link = "log")

Arguments

link

Link function (Character).

Inverse probit function

Description

Inverse probit function

Usage

phi(eta)

Arguments

eta

A vector of probits

Value

A vector with same length as mu

Plot full fits

Description

Plot prior or posterior model draws on top of data. Use plot_pars to plot individual parameter estimates.

Usage

## S3 method for class 'mcpfit'
plot(
  x,
  facet_by = NULL,
  lines = 25,
  geom_data = "point",
  cp_dens = TRUE,
  q_fit = FALSE,
  q_predict = FALSE,
  rate = TRUE,
  prior = FALSE,
  which_y = "ct",
  arma = TRUE,
  nsamples = 2000,
  scale = "response",
  ...
)

Arguments

x

An mcpfit object

facet_by

String. Name of a varying group.

lines

Positive integer or FALSE. Number of lines (posterior draws). FALSE or lines = 0 plots no lines. Note that lines always plot fitted values - not predicted. For prediction intervals, see the q_predict argument.

geom_data

String. One of "point", "line" (good for time-series), or FALSE (don not plot).

cp_dens

TRUE/FALSE. Plot posterior densities of the change point(s)? Currently does not respect facet_by. This will be added in the future.

q_fit

Whether to plot quantiles of the posterior (fitted value).

TRUE Add 2.5% and 97.5% quantiles. Corresponds to q_fit = c(0.025, 0.975).
FALSE No quantiles
A vector of quantiles. For example, quantiles = 0.5 plots the median and quantiles = c(0.2, 0.8) plots the 20% and 80% quantiles.

q_predict

Same as q_fit, but for the prediction interval.

rate

prior

TRUE/FALSE. Plot using prior samples? Useful for mcp(..., sample = "both")

which_y

What to plot on the y-axis. One of

"ct": The central tendency which is often the mean after applying the link function.
"sigma": The variance
"ar1", "ar2", etc. depending on which order of the autoregressive effects you want to plot.

arma

Whether to include autoregressive effects.

TRUE Compute autoregressive residuals. Requires the response variable in newdata.
FALSE Disregard the autoregressive effects. For family = gaussian(), predict() just use sigma for residuals.

nsamples

scale

One of

"response": return on the observed scale, i.e., after applying the inverse link function.
"linear": return on the parameter scale (where the linear trends are modelled).

...

Currently ignored.

Details

plot() uses fit$simulate() on posterior samples. These represent the (joint) posterior distribution.

Value

A ggplot2 object.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Examples

# Typical usage. demo_fit is an mcpfit object.
plot(demo_fit)

plot(demo_fit, prior = TRUE)  # The prior

plot(demo_fit, lines = 0, q_fit = TRUE)  # 95% HDI without lines
plot(demo_fit, q_predict = c(0.1, 0.9))  # 80% prediction interval
plot(demo_fit, which_y = "sigma", lines = 100)  # The variance parameter on y

# Show a panel for each varying effect
# plot(fit, facet_by = "my_column")

# Customize plots using regular ggplot2
library(ggplot2)
plot(demo_fit) + theme_bw(15) + ggtitle("Great plot!")

Plot individual parameters

Description

Plot many types of plots of parameter estimates. See examples for typical use cases.

Usage

plot_pars(
  fit,
  pars = "population",
  regex_pars = character(0),
  type = "combo",
  ncol = 1,
  prior = FALSE
)

Arguments

fit

An mcpfit object.

pars

Character vector. One of:

Vector of parameter names.
"population" plots all population parameters.
"varying" plots all varying effects. To plot a particular varying effect, use regex_pars = "^name".

regex_pars

Vector of regular expressions. This will typically just be the beginning of the parameter name(s), i.e., "^cp_" plots all change points, "^my_varying" plots all levels of a particular varying effect, and "^cp_|^my_varying" plots both.

type

String or vector of strings. Calls ⁠bayesplot::mcmc_>>type<<()⁠. Common calls are "combo", "trace", and "dens_overlay". Current options include 'acf', 'acf_bar', 'areas', 'areas_ridges', 'combo', 'dens', 'dens_chains', 'dens_overlay', 'hist', 'intervals', 'rank_hist', 'rank_overlay', 'trace', 'trace_highlight', and 'violin".

ncol

Number of columns in plot. This is useful when you have many parameters and only one plot type.

prior

TRUE/FALSE. Plot using prior samples? Useful for mcp(..., sample = "both")

Details

For other type, it calls bayesplot::mcmc_type(). Use these directly on fit$mcmc_post or fit$mcmc_prior if you want finer control of plotting, e.g., bayesplot::mcmc_dens(fit$mcmc_post). There are also a number of useful plots in the coda package, i.e., coda::gelman.plot(fit$mcmc_post) and coda::crosscorr.plot(fit$mcmc_post)

In any case, if you see a few erratic lines or parameter estimates, this is a sign that you may want to increase argument 'adapt' and 'iter' in mcp.

Value

A ggplot2 object.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Examples

# Typical usage. demo_fit is an mcpfit object.
plot_pars(demo_fit)

## Not run: 
# More options
plot_pars(demo_fit, regex_pars = "^cp_")  # Plot only change points
plot_pars(demo_fit, pars = c("int_3", "time_3"))  # Plot these parameters
plot_pars(demo_fit, type = c("trace", "violin"))  # Combine plots
# Some plots only take pairs. hex is good to assess identifiability
plot_pars(demo_fit, type = "hex", pars = c("cp_1", "time_2"))

# Visualize the priors:
plot_pars(demo_fit, prior = TRUE)

# Useful for varying effects:
# plot_pars(my_fit, pars = "varying", ncol = 3)  # plot all varying effects
# plot_pars(my_fit, regex_pars = "my_varying", ncol = 3)  # plot all levels of a particular varying

# Customize multi-column ggplots using "*" instead of "+" (patchwork)
library(ggplot2)
plot_pars(demo_fit, type = c("trace", "dens_overlay")) * theme_bw(10)

## End(Not run)

Posterior Predictive Checks For Mcpfit Objects

Description

Plot posterior (default) or prior (prior = TRUE) predictive checks. This is convenience wrapper around the ⁠bayesplot::ppc_*()⁠ methods.

Usage

pp_check(
  object,
  type = "dens_overlay",
  facet_by = NULL,
  newdata = NULL,
  prior = FALSE,
  varying = TRUE,
  arma = TRUE,
  nsamples = 100,
  ...
)

Arguments

object

An mcpfit object.

type

One of bayesplot::available_ppc("grouped", invert = TRUE) %>% stringr::str_remove("ppc_")

facet_by

Name of a column in data modeled as varying effect(s).

newdata

A tibble or a data.frame containing predictors in the model. If NULL (default), the original data is used.

prior

TRUE/FALSE. Plot using prior samples? Useful for mcp(..., sample = "both")

varying

One of:

TRUE All varying effects (fit$pars$varying).
FALSE No varying effects (c()).
Character vector: Only include specified varying parameters - see fit$pars$varying.

arma

Whether to include autoregressive effects.

TRUE Compute autoregressive residuals. Requires the response variable in newdata.
FALSE Disregard the autoregressive effects. For family = gaussian(), predict() just use sigma for residuals.

nsamples

Number of draws. Note that you may want to use all data for summary geoms. e.g., pp_check(fit, type = "ribbon", nsamples = NULL).

...

Further arguments passed to bayesplot::ppc_type(y, yrep, ...)

Value

A ggplot2 object for single plots. Enriched by patchwork for faceted plots.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Examples


pp_check(demo_fit)
pp_check(demo_fit, type = "ecdf_overlay")
#pp_check(some_varying_fit, type = "loo_intervals", facet_by = "id")

Fits and predictions from samples and newdata

Description

Fits and predictions from samples and newdata

Usage

pp_eval(
  object,
  newdata = NULL,
  summary = TRUE,
  type = "fitted",
  probs = TRUE,
  rate = TRUE,
  prior = FALSE,
  which_y = "ct",
  varying = TRUE,
  arma = TRUE,
  nsamples = NULL,
  samples_format = "tidy",
  scale = "response",
  ...
)

Arguments

object

An mcpfit object.

newdata

A tibble or a data.frame containing predictors in the model. If NULL (default), the original data is used.

summary

Summarise at each x-value

type

One of:

"fitted": return fitted values. See also fitted()
"predict": return predicted values, using random dispersion around the central tendency (e.g., y_predict = rnorm(N, y_fitted, sigma_fitted) for family = gaussian()). See also predict().
"residuals": same as "predict" but the observed y-values are subtracted. See also residuals()

probs

Vector of quantiles. Only in effect when summary == TRUE.

rate

prior

TRUE/FALSE. Plot using prior samples? Useful for mcp(..., sample = "both")

which_y

What to plot on the y-axis. One of

"ct": The central tendency which is often the mean after applying the link function.
"sigma": The variance
"ar1", "ar2", etc. depending on which order of the autoregressive effects you want to plot.

varying

One of:

TRUE All varying effects (fit$pars$varying).
FALSE No varying effects (c()).
Character vector: Only include specified varying parameters - see fit$pars$varying.

arma

Whether to include autoregressive effects.

TRUE Compute autoregressive residuals. Requires the response variable in newdata.
FALSE Disregard the autoregressive effects. For family = gaussian(), predict() just use sigma for residuals.

nsamples

samples_format

One of "tidy" or "matrix". Controls the output format when summary == FALSE. See more under "value"

scale

One of

"response": return on the observed scale, i.e., after applying the inverse link function.
"linear": return on the parameter scale (where the linear trends are modelled).

...

Currently unused

Value

If summary = TRUE: A tibble with the posterior mean for each row in newdata, If newdata is NULL, the data in fit$data is used.
If summary = FALSE and samples_format = "tidy": A tidybayes tibble with all the posterior samples (Ns) evaluated at each row in newdata (Nn), i.e., with ⁠Ns x Nn⁠ rows. If there are varying effects, the returned data is expanded with the relevant levels for each row.

The return columns are:
- Predictors from newdata.
- Sample descriptors: ".chain", ".iter", ".draw" (see the tidybayes package for more), and "data_row" (newdata rownumber)
- Sample values: one column for each parameter in the model.
- The estimate. Either "predict" or "fitted", i.e., the name of the type argument.
If summary = FALSE and samples_format = "matrix": An N_draws X nrows(newdata) matrix with fitted/predicted values (depending on type). This format is used by brms and it's useful as yrep in ⁠bayesplot::ppc_*⁠ functions.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Samples from the Posterior Predictive Distribution

Description

Samples from the Posterior Predictive Distribution

Usage

## S3 method for class 'mcpfit'
predict(
  object,
  newdata = NULL,
  summary = TRUE,
  probs = TRUE,
  rate = TRUE,
  prior = FALSE,
  which_y = "ct",
  varying = TRUE,
  arma = TRUE,
  nsamples = NULL,
  samples_format = "tidy",
  ...
)

Arguments

object

An mcpfit object.

newdata

A tibble or a data.frame containing predictors in the model. If NULL (default), the original data is used.

summary

Summarise at each x-value

probs

Vector of quantiles. Only in effect when summary == TRUE.

rate

prior

TRUE/FALSE. Plot using prior samples? Useful for mcp(..., sample = "both")

which_y

What to plot on the y-axis. One of

"ct": The central tendency which is often the mean after applying the link function.
"sigma": The variance
"ar1", "ar2", etc. depending on which order of the autoregressive effects you want to plot.

varying

One of:

TRUE All varying effects (fit$pars$varying).
FALSE No varying effects (c()).
Character vector: Only include specified varying parameters - see fit$pars$varying.

arma

Whether to include autoregressive effects.

TRUE Compute autoregressive residuals. Requires the response variable in newdata.
FALSE Disregard the autoregressive effects. For family = gaussian(), predict() just use sigma for residuals.

nsamples

samples_format

One of "tidy" or "matrix". Controls the output format when summary == FALSE. See more under "value"

...

Currently unused

Value

If summary = TRUE: A tibble with the posterior mean for each row in newdata, If newdata is NULL, the data in fit$data is used.
If summary = FALSE and samples_format = "tidy": A tidybayes tibble with all the posterior samples (Ns) evaluated at each row in newdata (Nn), i.e., with ⁠Ns x Nn⁠ rows. If there are varying effects, the returned data is expanded with the relevant levels for each row.

The return columns are:
- Predictors from newdata.
- Sample descriptors: ".chain", ".iter", ".draw" (see the tidybayes package for more), and "data_row" (newdata rownumber)
- Sample values: one column for each parameter in the model.
- The estimate. Either "predict" or "fitted", i.e., the name of the type argument.
If summary = FALSE and samples_format = "matrix": An N_draws X nrows(newdata) matrix with fitted/predicted values (depending on type). This format is used by brms and it's useful as yrep in ⁠bayesplot::ppc_*⁠ functions.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Examples


predict(demo_fit)  # Evaluate at each demo_fit$data
predict(demo_fit, probs = c(0.1, 0.5, 0.9))  # With median and 80% credible interval.
predict(demo_fit, summary = FALSE)  # Samples instead of summary.
predict(
  demo_fit,
  newdata = data.frame(time = c(-5, 20, 300)),  # Evaluate
  probs = c(0.025, 0.5, 0.975)
)

Print mcplist

Description

Shows a list in a more condensed format using str(list).

Usage

## S3 method for class 'mcplist'
print(x, ...)

Arguments

x

An mcpfit object.

...

Currently ignored

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Nice printing texts

Description

Useful for print(fit$jags_code), print(mcp_demo$call), etc.

Usage

## S3 method for class 'mcptext'
print(x, ...)

Arguments

x

Character, often with newlines.

...

Currently ignored.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Examples

mytext = "line1 = 2\n line2 = 'horse'"
class(mytext) = "mcptext"
print(mytext)

Probit function

Description

Probit function

Usage

probit(mu)

Arguments

mu

A vector of probabilities (0.0 to 1.0)

Value

A vector with same length as mu

Recover the levels of varying effects in mcmc.list

Description

Jags uses 1, 2, 3, ..., etc. for indexing of varying effects. This function adds back the original levels, whether numeric or string

Usage

recover_levels(samples, data, mcmc_col, data_col)

Arguments

samples

An mcmc.list with varying columns starting in mcmc_col.

data

A tibble or data.frame with the cols in data_col.

mcmc_col

A vector of strings.

data_col

A vector of strings. Has to be same length as mcmc_col.'

Remove varying or population terms from a formula

Description

WARNING: removes response side from the formula

Usage

remove_terms(form, remove)

Arguments

form

A formula

remove

Either "varying" or "population". These are removed.

Value

A formula

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Compute Residuals From Mcpfit Objects

Description

Equivalent to fitted(fit, ...) - fit$data[, fit$data$yvar] (or fitted(fit, ...) - newdata[, fit$data$yvar]), but with fixed arguments for fitted: ⁠rate = FALSE, which_y = 'ct', samples_format = 'tidy'⁠.

Usage

## S3 method for class 'mcpfit'
residuals(
  object,
  newdata = NULL,
  summary = TRUE,
  probs = TRUE,
  prior = FALSE,
  varying = TRUE,
  arma = TRUE,
  nsamples = NULL,
  ...
)

Arguments

object

An mcpfit object.

newdata

A tibble or a data.frame containing predictors in the model. If NULL (default), the original data is used.

summary

Summarise at each x-value

probs

Vector of quantiles. Only in effect when summary == TRUE.

prior

TRUE/FALSE. Plot using prior samples? Useful for mcp(..., sample = "both")

varying

One of:

TRUE All varying effects (fit$pars$varying).
FALSE No varying effects (c()).
Character vector: Only include specified varying parameters - see fit$pars$varying.

arma

Whether to include autoregressive effects.

TRUE Compute autoregressive residuals. Requires the response variable in newdata.
FALSE Disregard the autoregressive effects. For family = gaussian(), predict() just use sigma for residuals.

nsamples

...

Currently unused

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Examples


residuals(demo_fit)
residuals(demo_fit, probs = c(0.1, 0.5, 0.9))  # With median and 80% credible interval.
residuals(demo_fit, summary = FALSE)  # Samples instead of summary.

Run parallel MCMC sampling using JAGS.

Description

Run parallel MCMC sampling using JAGS.

Usage

run_jags(
  data,
  jags_code,
  pars,
  ST,
  cores,
  sample,
  n.chains,
  n.iter,
  n.adapt,
  inits
)

Arguments

data

Data.frame or tibble in long format.

jags_code

A string. JAGS model, usually returned by make_jagscode().

pars

Character vector of parameters to save/monitor.

ST

A segment table (tibble), returned by get_segment_table. Only really used when the model contains varying effects.

cores

Positive integer or "all". Number of cores.

1: serial sampling. options(mc.cores = 3) will dominate cores = 1 but not larger values of cores.
⁠>1⁠: parallel sampling on this number of cores. Ideally set chains to the same value. Note: cores > 1 takes a few extra seconds the first time it's called but subsequent calls will start sampling immediately.
"all": use all cores but one and sets chains to the same value. This is a convenient way to maximally use your computer's power.

sample

One of

"post": Sample the posterior.
"prior": Sample only the prior. Plots, summaries, etc. will use the prior. This is useful for prior predictive checks.
"both": Sample both prior and posterior. Plots, summaries, etc. will default to using the posterior. The prior only has effect when doing Savage-Dickey density ratios in hypothesis.
"none" or FALSE: Do not sample. Returns an mcpfit object without sample. This is useful if you only want to check prior strings (fit$prior), the JAGS model (fit$jags_code), etc.

n.chains

the number of parallel chains for the model

n.iter

number of iterations to monitor

n.adapt

the number of iterations for adaptation. See adapt for details. If n.adapt = 0 then no adaptation takes place.

inits

A list if initial values for the parameters. This can be useful if a model fails to converge. Read more in jags.model. Defaults to NULL, i.e., no inits.

Value

'mcmc.list“

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Transform a prior from SD to precision.

Description

JAGS uses precision rather than SD. This function converts dnorm(4.2, 1.3) into dnorm(4.2, 1/1.3^2). It allows users to specify priors using SD and then it's transformed for the JAGS code. It works for the following distributions: dnorm|dt|dcauchy|ddexp|dlogis|dlnorm. In all of these, tau/sd is the second parameter.

Usage

sd_to_prec(prior_str)

Arguments

prior_str

String. A JAGS prior. Can be truncated, e.g. ⁠dt(3, 2, 1) T(my_var, )⁠.

Value

A string

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Summarise mcpfit objects

Description

Summarise parameter estimates and model diagnostics.

Usage

## S3 method for class 'mcpfit'
summary(object, width = 0.95, digits = 2, prior = FALSE, ...)

fixef(object, width = 0.95, prior = FALSE, ...)

ranef(object, width = 0.95, prior = FALSE, ...)

## S3 method for class 'mcpfit'
print(x, ...)

Arguments

object

An mcpfit object.

width

Float. The width of the highest posterior density interval (between 0 and 1).

digits

prior

TRUE/FALSE. Summarise prior instead of posterior?

...

Currently ignored

x

An mcpfit object.

Value

A data frame with parameter estimates and MCMC diagnostics. OBS: The change point distributions are often not unimodal and symmetric so the intervals can be deceiving Plot them using plot_pars(fit).

mean is the posterior mean
lower is the lower quantile of the highest-density interval (HDI) given in width.
upper is the upper quantile.
Rhat is the Gelman-Rubin convergence diagnostic which is often taken to be acceptable if < 1.1. It is computed using gelman.diag.
n.eff is the effective sample size computed using effectiveSize. Low effective sample sizes are also obvious as poor mixing in trace plots (see plot_pars(fit)). Read how to deal with such problems here
ts_err is the time-series error, taking autoregressive correlation into account. It is computed using spectrum0.ar.

For simulated data, the summary contains two additional columns so that it is easy to inspect whether the model can recover the parameters. Run simulation and summary multiple times to get a sense of the robustness.

sim is the value used to generate the data.
match is "OK" if sim is contained in the HDI interval (lower to upper).

Functions

fixef(): Get population-level ("fixed") effects of an mcpfit object.
ranef(): Get varying ("random") effects of an mcpfit object.
print(mcpfit): Print the posterior summary of an mcpfit object.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Examples

# Typical usage
summary(demo_fit)
summary(demo_fit, width = 0.8, digits = 4)  # Set HDI width

# Get the results as a data frame
results = summary(demo_fit)

# Varying (random) effects
# ranef(my_fit)

# Summarise prior
summary(demo_fit, prior = TRUE)

Get tidy samples with or without varying effects

Description

Returns in a format useful for fit$simulate() with population parameters in wide format and varying effects in long format (the number of rows will be nsamples * n_levels_in_varying).

Usage

tidy_samples(
  fit,
  population = TRUE,
  varying = TRUE,
  absolute = FALSE,
  prior = FALSE,
  nsamples = NULL
)

Arguments

fit

An mcpfit object

population

TRUE All population effects. Same as fit$pars$population.
- FALSE No population effects. Same as c().
- Character vector: Only include specified population parameters - see fit$pars$population.

varying

One of:

TRUE All varying effects (fit$pars$varying).
FALSE No varying effects (c()).
Character vector: Only include specified varying parameters - see fit$pars$varying.

absolute

TRUE Returns the absolute location of all varying change points.
- FALSE Just returns the varying effects.
- Character vector: Only do absolute transform for these varying parameters - see fit$pars$varying.
OBS: This currently only applies to varying change points. It is not implemented for rel() regressors yet.

prior

TRUE/FALSE. Summarise prior instead of posterior?

nsamples

Value

tibble of posterior draws in tidybayes format.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Convert from tidy to matrix

Description

Converts from the output of tidy_samples() or pp_eval(fit, samples_format = "tidy") to an N_draws X nrows(newdata) matrix with fitted/predicted values. This format is used y brms and it's useful as yrep in ⁠bayesplot::ppc_*⁠ functions.

Usage

tidy_to_matrix(samples, returnvar)

Arguments

samples

Samples in tidy format

returnvar

An rlang::sym() object.

Value

An N_draws X nrows(newdata) matrix.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Takes any formula-like input and returns a formula

Description

Takes any formula-like input and returns a formula

Usage

to_formula(form)

Arguments

form

Formula or character (with or without initial tilde/"~")

Value

A formula

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Unpack arma order and formula

Description

Unpack arma order and formula

Usage

unpack_arma(form_str_in)

Arguments

form_str_in

A character. These are allowed: "ar(number)" or "ar(number, formula)"

Value

A list with $order and $form_str (e.g., "ar(formula)"). The formula is ar(1) or ma(1) if no formula is given

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Takes a cp formula (as a string) and returns its properties

Description

Takes a cp formula (as a string) and returns its properties

Usage

unpack_cp(form_cp, i)

Arguments

form_cp

Segment formula as string.

i

The segment number (integer)

Value

A one-row tibble with columns:

cp_int: bool. Whether there is an intercept change in the change point.
cp_in_rel: bool. Is this intercept change relative?
cp_ran_int: bool or NA. Is there a random intercept on the change point?
cp_group_col: char or NA. Which column in data define the random intercept?

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Get the intercept of a formula

Description

Get the intercept of a formula

Usage

unpack_int(form, i, ttype)

Arguments

form

A formula

i

Segment number (integer)

ttype

The term type. One of "ct" (central tendency), "sigma" (variance), or "ar" (autoregressive)

Value

A one-row tibble describing the intercept.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Unpack right-hand side

Description

This is a pretty big function. It includes unpacking sigma(), ar(), etc.

Usage

unpack_rhs(form_rhs, i, family, data, last_segment)

Arguments

form_rhs

A character representation of a formula

i

The segment number (integer)

family

An mcpfamily object returned by mcpfamily().

data

A data.frame or tibble

last_segment

The last row in the segment table, made in get_segment_table()

Value

A one-row tibble with three columns for each of ct. sigma, ar, and ma:

⁠_int⁠: NA or a one-row tibble describing the intercept.
⁠_slope⁠: NA or a tibble with a row for each slope term.
⁠_code⁠: NA or a char with the JAGS/R code to implement the slope.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Unpack the slope of a formula

Description

Makes A list of terms and applies unpack_slope_term() to each of them. Then builds the code for this segment's slope

Usage

unpack_slope(form, i, ttype, last_slope)

Arguments

form

A formula

i

Segment number (integer)

ttype

The term type. One of "ct" (central tendency), "sigma" (variance), or "ar" (autoregressive)

last_slope

The element in the slope column for this ttype in the previous segment. I.e., probably what this function returned last time it was called!

Value

A "one-row" list with code (char) and a tibble of slopes.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Unpacks a single term

Description

Returns a row for unpack_slope().

Usage

unpack_slope_term(term, i, last_slope, ttype = "")

Arguments

term

A character, e.g., "x", "I(x^2)", or "log(x)".

i

Segment number (integer)

last_slope

The element in the slope column for this ttype in the previous segment. I.e., probably what this function returned last time it was called!

ttype

The term type. One of "ct" (central tendency), "sigma" (variance), or "ar" (autoregressive)

Value

A "one-row" list describing a slope term.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Takes a formula and returns a string representation of y, cp, and rhs

Description

Takes a formula and returns a string representation of y, cp, and rhs

Usage

unpack_tildes(segment, i)

Arguments

segment

A formula

i

The segment number (integer)

Value

A one-row tibble with columns:

form: String. The full formula for this segment.
form_y: String. The expression for y (without tilde)
form_cp: String. The formula for the change point
form_rhs: String. The formula for the right-hand side

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Get relevant info about varying parameters

Description

Returns parameters, data columns, and implicated segments given parameter name(s) or column(s).

Usage

unpack_varying(fit, pars = NULL, cols = NULL)

Arguments

pars

NULL/FALSE for nothing. TRUE for all. A vector of varying parameter names for specifics.

cols

NULL/FALSE for nothing. TRUE for all. A vector of varying column names for specifics. Usually provided via "facet_by" argument in other functions.

Details

Returns a list with

Value

A list. See details.

Slots

pars: Character vector of parameter names. NULL if empty.
cols: Character vector of data column names. NULL if empty.
indices: Logical vector of segments in the segment table that contains the varying effect

Unpack varying effects

Description

Unpack varying effects

Usage

unpack_varying_term(term, i)

Arguments

term

A character, e.g., "x", "I(x^2)", or "log(x)".

i

Segment number (integer)

Value

A "one-row" list describing a varying intercept.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Unpacks y variable name

Description

Unpacks y variable name

Usage

unpack_y(form_y, i, family)

Arguments

form_y

Character representation of formula

i

The segment number (integer)

family

An mcpfamily object returned by mcpfamily().

Value

A one-row tibble with the columns

y: string. The y variable name.
trials: string. The trials variable name.
weights: string. The weights variable name.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

Add loo if not already present

Description

Add loo if not already present

Usage

with_loo(fit, save_psis = FALSE, info = NULL)

Arguments

fit

An mcpfit object

save_psis

Logical. See documentation of loo::loo

info

Optional message if adding loo

Value

An mcpfit object with loo.

Author(s)

Jonas Kristoffer Lindeløv jonas@lindeloev.dk

mcp: Regression with Multiple Change Points

Description

Author(s)

See Also

Bernoulli family for mcp

Description

Usage

Arguments

Checks if all terms are in the data

Description

Usage

Arguments

Author(s)

Compute information criteria for model comparison

Description

Usage

Arguments

Value

Functions

Author(s)

See Also

Examples

Cumulative pasting of character columns

Description

Usage

Arguments

Value

Author(s)

Example mcpfit for examples

Description

Usage

Format

Exponential family for mcp

Description

Usage

Arguments

Expected Values from the Posterior Predictive Distribution

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Format code with one or multiple terms

Description

Usage

Arguments

Value

Author(s)

Density geom for plot.mcpfit()

Description

Usage

Arguments

Value

Return a geom_line representing the quantiles

Description

Usage

Arguments

Value

Author(s)

Call get_formula_str for multiple ytypes and paste strings

Description

Usage

Arguments

Value

Author(s)

Gets code for ARMA terms, resulting in a "resid_"

Description

Usage

Arguments

Value

Author(s)

Extracts the order from ARMA parameter name(s)

Description

Usage

Arguments

Value

Compute the density at a specific point.

Description

Example `mcpfit` for examples

Density geom for `plot.mcpfit()`

Call `get_formula_str` for multiple ytypes and paste strings

Adds helper variables for use in `run_jags`