---
title: "Introduction to heterogeneous effects analysis of conjoint experiments using **cjbart**"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to cjbart}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7, fig.height = 5
)

library(randomForestSRC)
options(rf.cores = 2)
```

```{r setup}
set.seed(89)
library(ggplot2)
library(cjbart)
```

This vignette provides a brief example of how to estimate heterogeneous effects in conjoint experiments, using \code{cjbart}.

We first simulate a basic conjoint design for the purpose of illustration. Suppose we conducted a conjoint experiment on 100 individuals, where each individual makes 5 discrete choices between two profiles. Each profile has three attributes (A-E), and we also record two covariates for each subject.  The resulting data contains 1000 observations. 

```{r data, include=TRUE}
subjects <- 100
rounds <- 5
profiles <- 2
obs <- subjects*rounds*profiles

fake_data <- data.frame(A = sample(c("a1","a2","a3"), obs, replace = TRUE),
                        B = sample(c("b1","b2","b3"), obs, replace = TRUE),
                        C = sample(c("c1","c2","c3"), obs, replace = TRUE),
                        D = sample(c("d1","d2","d3"), obs, replace = TRUE),
                        E = sample(c("e1","e2","e3"), obs, replace = TRUE),

                        covar1 = rep(runif(subjects, 0 ,1),
                                     each = rounds),
                        covar2 = rep(sample(c(1,0),
                                            subjects,
                                            replace = TRUE),
                                     each = rounds),

                        id1 = rep(1:subjects, each=rounds),
                        stringsAsFactors = TRUE)

fake_data$Y <- ifelse(fake_data$E == "e2",
                      rbinom(obs, 1, fake_data$covar1),
                      sample(c(0,1), obs, replace = TRUE))
```

## Train a probit BART model on conjoint data

To estimate a conjoint model on this data we will use the `cjbart()` function, which uses Bayesian Additive Regression Trees (BART) to approximate the relationships between the outcome (a binary choice), the randomized attribute-levels, and the included covariates. For the sake of efficiency in this example, we reduce the number of trees (`ntree`) and cuts (`numcut`) used to estimate the BART model. More generally, users can pass arguments to the underlying BART algorithm (see `?BART::pbart`) directly from the `cjbart` function to adjust the model hyperparameters:

```{r train, include=TRUE}
cj_model <- cjbart(data = fake_data,
                   Y = "Y", 
                   id = "id1",
                   ntree = 15, numcut = 20)

```

To generate heterogeneity estimates at the observation- and individual-level, we use the `IMCE()` function. We pass in both the original experimental data, and the conjoint model generated in the last step. At this point, we also declare the names of the conjoint attribute variables (`attribs`) and the reference category we want to use for each attribute when calculating the marginal component effects (`ref_levels`). The vector of reference levels must be of the same length as `attribs`, with the level at position `i` corresponding to the `attrib[i]`:

```{r OMCE, include=TRUE}
het_effects <- IMCE(data = fake_data, 
                    model = cj_model, 
                    attribs = c("A","B","C","D","E"),
                    ref_levels = c("a1","b1","c1","d1","e1"),
                    cores = 1)
```

`IMCE()` returns an object of class "cjbart", which contains the individual-level marginal component effects (IMCEs), matrices reflecting the upper and lower values of each estimate's confidence interval respectively, and optionally the underlying observation-level marginal component effects (OMCEs) when `keep_omces = TRUE`.

It is likely that users will want to compare the conjoint estimation strategy to other estimation strategies, such as logistic regression models. We can recover the average marginal component effect (AMCE) for the BART model using the `summary()` command:

```{r summary}
summary(het_effects)
```

We can plot the IMCEs, color coding the points by some covariate value, using the in-built `plot()` function:

```{r plot_imces}
plot(het_effects, covar = "covar1")
```

To aid presentation, the plot function can restrict which attribute-levels are displayed by using the `plot_levels` argument:

```{r plot_imces2}
plot(het_effects, covar = "covar1", plot_levels = c("a2","a3","e2","e3"))
```
We can estimate the importance of covariates to the model using the `het_vimp()` function, which calculates standardized importance scores using random forest-based permutation tests. Calling `plot()` on the result will return a heatmap of these scores for each combination of attribute-level and covariate (users can restrict which attribute levels and covariates are considered, see the documentation for more details):


```{r plot_vimp1}
vimp_estimates <- het_vimp(imces = het_effects, cores = 1)
plot(vimp_estimates)
```

Supplying a single attribute-level, the same plot function will display the importance estimates with corresponding 95 percent confidence intervals:

```{r plot_vimp2}
plot(vimp_estimates, att_levels = "d3")
```

Finally, it is possible to estimate population IMCEs (pIMCEs) using `pIMCE()` This function requires a list of the marginal probabilities for each attribute in the population of interest, and can only be estimated for one attribute-level comparison at a time (due to the computational requirements):

```{r pimces}
fake_marginals <- list()
fake_marginals[["A"]] <- c("a1" = 0.33,"a2" = 0.33,"a3"=0.34)
fake_marginals[["B"]] <- c("b1" = 0.33,"b2" = 0.33,"b3" = 0.34)
fake_marginals[["C"]] <- c("c1" = 0.33,"c2" = 0.33, "c3" = 0.34)
fake_marginals[["D"]] <- c("d1" = 0.75,"d2" = 0.2,"d3" = 0.05)
fake_marginals[["E"]] <- c("e1" = 0.33,"e2" = 0.33,"e3" = 0.34)

# Reduced number of covariate data for sake of speed
fake_pimces <- pIMCE(covar_data = fake_data[fake_data$id1 %in% 1:3,
                                            c("id1","covar1","covar2")],
                     model = cj_model,
                     attribs = c("A","B","C","D","E"),
                     l = "E",
                     l_1 = "e2",
                     l_0 = "e1",
                     marginals = fake_marginals,
                     method = "bayes",
                     cores = 1)

head(fake_pimces)
```