Title: | Difference-in-Differences with a Continuous Treatment |
Version: | 0.1.0 |
Description: | Provides methods for difference-in-differences with a continuous treatment and staggered treatment adoption. Includes estimation of treatment effects and causal responses as a function of the dose, event studies indexed by length of exposure to the treatment, and aggregation into overall average effects. Uniform inference procedures are included, along with both parametric and nonparametric models for treatment effects. The methods are based on Callaway, Goodman-Bacon, and Sant'Anna (2025) <doi:10.48550/arXiv.2107.02637>. |
Depends: | R (≥ 4.1.0) |
License: | GPL-3 |
Encoding: | UTF-8 |
Imports: | BMisc (≥ 1.4.8), ptetools, checkmate, splines2, sandwich, ggplot2, MASS, npiv |
RoxygenNote: | 7.3.2 |
URL: | https://bcallaway11.github.io/contdid/, https://github.com/bcallaway11/contdid |
BugReports: | https://github.com/bcallaway11/contdid/issues |
Suggests: | testthat (≥ 3.0.0), tidyr |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2025-06-28 03:19:14 UTC; bmc43193 |
Author: | Brantly Callaway [aut, cre], Andrew Goodman-Bacon [aut], Pedro H. C. Sant'Anna [aut] |
Maintainer: | Brantly Callaway <brantly.callaway@uga.edu> |
Repository: | CRAN |
Date/Publication: | 2025-07-03 15:10:02 UTC |
Difference-in-differences with a continuous treatment
Description
contdid is a package for estimating the effect of a continuous treatment in a difference-in-differences framework.
Author(s)
Maintainer: Brantly Callaway brantly.callaway@uga.edu
Authors:
Andrew Goodman-Bacon andrew@goodman-bacon.com
Pedro H. C. Sant'Anna pedro.h.santanna@emory.edu
See Also
Useful links:
Report bugs at https://github.com/bcallaway11/contdid/issues
Choose Evenly Spaced Knots
Description
A function to place equally spaced knots for fitting B-splines
Usage
choose_knots_even(x, num_knots)
Arguments
x |
vector of treatment doses |
num_knots |
the number of knots to use |
Value
a vector containing the locations of the knots
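As a rough illustration of what equally spaced knots look like, the sketch below places num_knots interior knots between the smallest and largest dose using base R. The helper name even_knots_sketch is hypothetical, and dropping the two boundary points is an assumption; it is not taken from the package source, so choose_knots_even may differ in detail.

```r
# Hypothetical helper (not part of contdid): interior knots equally spaced
# between min(x) and max(x). Dropping the boundary points is an assumption.
even_knots_sketch <- function(x, num_knots) {
  if (num_knots <= 0) {
    return(numeric(0))
  }
  grid <- seq(min(x), max(x), length.out = num_knots + 2)
  grid[-c(1, length(grid))]
}

dose <- c(0, 0.2, 0.9, 1)
even_knots_sketch(dose, 3)
```

Because only the range of x matters here, a few extreme doses can pull the knots away from where most of the data sits.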
Choose Knots at Quantiles
Description
A function to choose knots for fitting B-splines at quantiles of x
Usage
choose_knots_quantile(x, num_knots)
Arguments
x |
vector of treatment doses |
num_knots |
the number of knots to use |
Value
a vector containing the locations of the knots
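A minimal sketch of quantile-based knot placement, assuming the knots sit at evenly spaced interior probabilities (for example, the 25th, 50th, and 75th percentiles when num_knots is 3). The helper quantile_knots_sketch is hypothetical and the exact probabilities used by choose_knots_quantile are not documented here, so treat this as an illustration only.

```r
# Hypothetical helper (not part of contdid): knots at evenly spaced interior
# quantiles of x. The choice of probabilities is an assumption.
quantile_knots_sketch <- function(x, num_knots) {
  if (num_knots <= 0) {
    return(numeric(0))
  }
  probs <- seq(0, 1, length.out = num_knots + 2)
  probs <- probs[-c(1, length(probs))]
  unname(stats::quantile(x, probs = probs))
}

set.seed(1)
skewed_dose <- rexp(500)  # right-skewed doses
quantile_knots_sketch(skewed_dose, 3)
```

With skewed doses, quantile knots cluster where observations are dense, whereas equally spaced knots would spread across the full range.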
Difference-in-differences with a Continuous Treatment
Description
A function for difference-in-differences with a continuous treatment in a staggered treatment adoption setting.
cont_did currently supports staggered treatment adoption with a continuous treatment, using B-splines under the hood.
Usage
cont_did(
yname,
dname,
gname = NULL,
tname,
idname,
xformula = ~1,
data,
target_parameter = c("level", "slope"),
aggregation = c("dose", "eventstudy", "none"),
treatment_type = c("continuous", "discrete"),
dose_est_method = c("parametric", "cck"),
dvals = NULL,
degree = 3,
num_knots = 0,
allow_unbalanced_panel = FALSE,
control_group = c("notyettreated", "nevertreated", "eventuallytreated"),
anticipation = 0,
weightsname = NULL,
alp = 0.05,
bstrap = TRUE,
cband = FALSE,
boot_type = "multiplier",
biters = 1000,
clustervars = NULL,
est_method = NULL,
base_period = "varying",
print_details = FALSE,
cl = 1,
...
)
Arguments
yname |
The name of the outcome variable |
dname |
The name of the treatment variable in the data. The functionality of
|
gname |
The name of the timing-group variable, i.e., when treatment starts for a particular unit. The value of this variable should be set to be 0 for units that do not participate in the treatment in any time period. |
tname |
The name of the column containing the time periods |
idname |
The individual (cross-sectional unit) id name |
xformula |
A formula for additional covariates. This is not currently supported. |
data |
The name of the data.frame that contains the data |
target_parameter |
Two options are "level" and "slope". In the first case, the function will report level effects, i.e., ATT's. In the second case, the function will report slope effects, i.e., ACRT's |
aggregation |
"dose" averages across timing-groups and time periods and provides results as a function of the dose. "eventstudy" averages across timing-groups and doses and reports results as a function of the length of exposure to the treatment. "none" is a stub for reporting fully disaggregated results that can be processed as desired by the user. This is not currently supported though. The combination of the arguments |
treatment_type |
"continuous" or "discrete" depending on the nature of the treatment. Default is "continuous". "discrete" is not yet supported. |
dose_est_method |
The method used to estimate the dose-specific effects. The default
is "parametric", where the user needs to specify the number of knots and degree for
a B-spline which is assumed to be correctly specified. The other option is "cck"
which uses a data-driven nonparametric method to estimate the dose-specific effects
based on the |
dvals |
The values of the treatment at which to compute dose-specific effects. If it is not specified, the default choice will be to use the percentiles of the dose among all ever-treated units. |
degree |
The degree of the B-Spline used in estimation. The default is 3, which in
combination with the default choice for the |
num_knots |
The number of knots to include for the B-Spline. The default is 0 so that the spline is global (i.e., this will amount to fitting a global polynomial). There is a bias-variance tradeoff for including more or fewer knots. |
allow_unbalanced_panel |
Whether or not the function should
"balance" the panel with respect to time and id. The default
values if |
control_group |
Which units to use as the control group.
The default is "nevertreated" which sets the control group
to be the group of units that never participate in the
treatment. This group does not change across groups or
time periods. The other option is to set
|
anticipation |
The number of time periods before participating in the treatment where units can anticipate participating in the treatment and therefore it can affect their untreated potential outcomes |
weightsname |
The name of the column containing the sampling weights. If not set, all observations have same weight. |
alp |
the significance level, default is 0.05 |
bstrap |
Boolean for whether or not to compute standard errors using
the multiplier bootstrap. If standard errors are clustered, then one
must set |
cband |
Boolean for whether or not to compute a uniform confidence
band that covers all of the group-time average treatment effects
with fixed probability |
boot_type |
should be one of "multiplier" (the default) or "empirical".
The multiplier bootstrap is generally much faster, but |
biters |
The number of bootstrap iterations to use. The default is 1000,
and this is only applicable if |
clustervars |
A vector of variable names to cluster on. At most, there
can be two variables (otherwise will throw an error) and one of these
must be the same as idname which allows for clustering at the individual
level. By default, we cluster at individual level (when |
est_method |
the method to compute group-time average treatment effects. The default is "dr" which uses the doubly robust
approach in the |
base_period |
Whether to use a "varying" base period or a "universal" base period. Either choice results in the same post-treatment estimates of ATT(g,t)'s. In pre-treatment periods, using a varying base period amounts to computing a pseudo-ATT in each treatment period by comparing the change in outcomes for a particular group relative to its comparison group in the pre-treatment periods (i.e., in pre-treatment periods this setting computes changes from period t-1 to period t, but repeatedly changes the value of t). A universal base period fixes the base period to always be (g-anticipation-1). This does not compute pseudo-ATT(g,t)'s in pre-treatment periods, but rather reports average changes in outcomes from period t to (g-anticipation-1) for a particular group relative to its comparison group. This is analogous to what is often reported in event study regressions. Using a varying base period results in an estimate of ATT(g,t) being reported in the period immediately before treatment. Using a universal base period normalizes the estimate in the period right before treatment (or earlier when the user allows for anticipation) to be equal to 0, but reports one extra estimate in an earlier period. |
print_details |
Whether or not to show details/progress of computations.
Default is |
cl |
number of clusters to be used when bootstrapping; default is 1 |
... |
extra arguments that can be passed to create the correct subsets
of the data (depending on |
Value
cont_did_obj
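To make the roles of degree and num_knots more concrete, the sketch below builds B-spline bases with splines::bs from the splines package that ships with R. This is a stand-in chosen for illustration (contdid itself imports splines2), and the simulated dose and outcome are assumptions. It shows that a cubic spline with no interior knots spans the same fit as a global cubic polynomial, and that each interior knot adds one basis function.

```r
library(splines)  # ships with R; used here as a stand-in for splines2

set.seed(1)
dose <- runif(200)
y <- sin(2 * dose) + rnorm(200, sd = 0.1)  # illustrative outcome

# degree = 3 and no interior knots: a global cubic (3 basis columns)
basis_global <- bs(dose, degree = 3)

# one interior knot at the median adds one basis column
basis_one_knot <- bs(dose, degree = 3, knots = median(dose))

ncol(basis_global)
ncol(basis_one_knot)

# the no-knot spline fit coincides with an ordinary cubic polynomial fit
fit_spline <- lm(y ~ basis_global)
fit_poly <- lm(y ~ poly(dose, 3))
all.equal(unname(fitted(fit_spline)), unname(fitted(fit_poly)))
```

More knots make the fitted dose-response curve more flexible at the cost of noisier estimates, which is the bias-variance tradeoff mentioned for num_knots.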
Examples
# build small simulated data
set.seed(1234)
df <- simulate_contdid_data(
n = 1000,
num_time_periods = 4,
num_groups = 4,
dose_linear_effect = 0,
dose_quadratic_effect = 0
)
# estimate effects of continuous treatment
cd_res <- cont_did(
yname = "Y",
tname = "time_period",
idname = "id",
dname = "D",
data = df,
gname = "G",
target_parameter = "slope",
aggregation = "dose",
treatment_type = "continuous",
control_group = "notyettreated",
biters = 50,
cband = TRUE,
num_knots = 1,
degree = 3
)
summary(cd_res)
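The "eventstudy" aggregation described in the arguments reports results by length of exposure. The sketch below illustrates that idea in base R on a made-up table of group-time estimates; the numbers, the column names, and the choice to weight timing groups by their size are all assumptions for illustration, not output or internals of cont_did.

```r
# Hypothetical group-time estimates (made-up numbers for illustration only)
gt <- data.frame(
  group = c(2, 2, 2, 3, 3, 4),
  time  = c(2, 3, 4, 3, 4, 4),
  est   = c(1.0, 1.5, 2.0, 0.8, 1.2, 1.1),
  n_g   = c(100, 100, 100, 200, 200, 100)  # units in each timing group
)

# Length of exposure: 0 in the period in which treatment starts
gt$event_time <- gt$time - gt$group

# Weighted average of the estimates at each length of exposure,
# weighting each timing group by its size (an assumed weighting scheme)
es <- sapply(split(gt, gt$event_time), function(df) {
  weighted.mean(df$est, w = df$n_g)
})
es
```

Note that longer exposure lengths are identified from fewer timing groups: here only group 2 contributes at exposure length 2.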
Compute ACRT's for a Timing Group and Time Period
Description
This is the main function for computing dose-specific effects of a continuous treatment, given a particular timing group and time period.
Usage
cont_did_acrt(gt_data, dvals = NULL, degree = 1, knots = numeric(0), ...)
Arguments
gt_data |
data that is "local" to a particular group-time average treatment effect |
dvals |
The values of the treatment at which to compute dose-specific effects. If it is not specified, the default choice will be to use the percentiles of the dose among all ever-treated units. |
degree |
The degree of the B-Spline used in estimation. The default is 3, which in
combination with the default choice for the |
knots |
A vector of placements of knots for b-splines. Since this function is typically called internally, this would typically be set by the calling function. |
... |
additional arguments |
Value
ptetools::attgt_if object
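The following sketch illustrates the distinction between level effects, ATT(d), and slope effects, ACRT(d), for a single group-time cell. It swaps in a plain quadratic fitted with lm in place of a B-spline, and the simulated data and true effect function are assumptions; it is not the implementation used by cont_did_acrt.

```r
set.seed(1234)
n_treated <- 500
n_comparison <- 500

# Simulated outcome changes (post minus pre) for one group-time cell.
# Assumed truth: ATT(d) = 2 * d + 3 * d^2, so ACRT(d) = 2 + 6 * d.
dose <- runif(n_treated)
dy_treated <- 1 + 2 * dose + 3 * dose^2 + rnorm(n_treated, sd = 0.1)
dy_comparison <- 1 + rnorm(n_comparison, sd = 0.1)

# Regress the change in outcomes on a quadratic in the dose (treated units)
fit <- lm(dy_treated ~ dose + I(dose^2))
b <- unname(coef(fit))

dvals <- c(0.25, 0.5, 0.75)

# Level effect: predicted change at dose d minus the average change
# in the comparison group
att_d <- b[1] + b[2] * dvals + b[3] * dvals^2 - mean(dy_comparison)

# Slope effect: derivative of the fitted dose-response curve at d
acrt_d <- b[2] + 2 * b[3] * dvals

cbind(dvals, att_d, acrt_d)
```

The level effect needs the comparison group to difference out the common trend, while the slope effect depends only on how the fitted curve changes with the dose.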
Continuous Two-by-Two Subset
Description
A function for computing a 2x2 subset of the original data. This function is adapted from ptetools::two_by_two_subset and allows for the treatment to be continuous.
The subset contains post-treatment periods, separately for the treated group and the comparison group, and pre-treatment periods from the period immediately before the treated group became treated.
Usage
cont_two_by_two_subset(
data,
g,
tp,
control_group = "notyettreated",
anticipation = 0,
base_period = "varying",
...
)
Arguments
data |
the full dataset |
g |
the current group |
tp |
the current time period |
control_group |
whether to use "notyettreated" (default) or "nevertreated" |
anticipation |
the number of periods of anticipation (i.e., number of periods before the treatment happens where the treatment can "already" affect the outcome) |
base_period |
The type of base period to use. This only affects the numeric value of results in pre-treatment periods. Results in post-treatment periods are not affected by this choice. The default is "varying", where the base period will "back up" to the immediately preceding period in pre-treatment periods. The other option is "universal" where the base period is fixed in pre-treatment periods to be the period right before the treatment starts. "Universal" is commonly used in difference-in-differences applications, but can be unnatural for other identification strategies. |
... |
extra arguments to get the subset correct |
Value
A list that contains the correct subset of the data, n1 (the number of observations in this subset), and disidx (a vector of the correct ids for this subset).
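The sketch below illustrates, in base R, how a base period and a 2x2 subset could be picked for one timing group and time period, following the descriptions of base_period and control_group above. The helper base_period_sketch and the toy panel are hypothetical, the treatment of periods inside an anticipation window is an assumption, and none of this is the code used by cont_two_by_two_subset.

```r
# Hypothetical helper (not part of contdid or ptetools): base period for
# group g and period tp. Periods inside an anticipation window are treated
# as pre-treatment here, which is an assumption.
base_period_sketch <- function(g, tp, anticipation = 0,
                               base_period = c("varying", "universal")) {
  base_period <- match.arg(base_period)
  if (tp >= g || base_period == "universal") {
    return(g - anticipation - 1)  # period right before treatment starts
  }
  tp - 1  # "varying" in pre-treatment periods: the preceding period
}

# Toy balanced panel: 6 units, 4 periods; G = 0 means never treated
panel <- expand.grid(id = 1:6, time_period = 1:4)
panel$G <- c(0, 0, 2, 3, 3, 4)[panel$id]

g <- 3
tp <- 3
bp <- base_period_sketch(g, tp)

# "notyettreated" comparison units: never treated, or first treated after tp
in_group <- panel$G == g
in_comparison <- panel$G == 0 | panel$G > tp
keep <- (in_group | in_comparison) & panel$time_period %in% c(bp, tp)
two_by_two <- panel[keep, ]
two_by_two
```

Units already treated before g (here the unit with G = 2) are excluded, since they belong to neither the current group nor the not-yet-treated comparison group.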
Plot Results with a Continuous Treatment
Description
A function to plot results with a continuous treatment
Usage
ggcont_did(dose_obj, type = "att")
Arguments
dose_obj |
a result from running |
type |
whether to plot ATT(d) or ACRT(d), defaults to |
Value
A ggplot object
Examples
# build small simulated data
set.seed(1234)
df <- simulate_contdid_data(
n = 5000,
num_time_periods = 4,
num_groups = 4,
dose_linear_effect = 0,
dose_quadratic_effect = 0
)
# estimate effects of continuous treatment
cd_res <- cont_did(
yname = "Y",
tname = "time_period",
idname = "id",
dname = "D",
data = df,
gname = "G",
target_parameter = "slope",
aggregation = "dose",
treatment_type = "continuous",
control_group = "notyettreated",
biters = 50,
cband = TRUE,
num_knots = 1,
degree = 3
)
# plot ATT as a function of the dose
ggcont_did(cd_res, type = "att")
# plot ACRT as a function of the dose
ggcont_did(cd_res, type = "acrt")
Setup for DiD with a Continuous Treatment
Description
A function that creates a pte_params object, adding several different variables that are needed when there is a continuous treatment.
Usage
setup_pte_cont(
yname,
gname,
tname,
idname,
data,
xformula = ~1,
target_parameter,
aggregation,
treatment_type,
required_pre_periods = 1,
anticipation = 0,
base_period = "varying",
cband = TRUE,
alp = 0.05,
boot_type = "multiplier",
weightsname = NULL,
gt_type = "att",
biters = 100,
cl = 1,
dname,
dvals = NULL,
degree = 1,
num_knots = 0,
...
)
Arguments
yname |
Name of outcome in |
gname |
Name of group in |
tname |
Name of time period in |
idname |
Name of id in |
data |
balanced panel data |
xformula |
A formula for additional covariates. This is not currently supported. |
target_parameter |
Two options are "level" and "slope". In the first case, the function will report level effects, i.e., ATT's. In the second case, the function will report slope effects, i.e., ACRT's |
aggregation |
"dose" averages across timing-groups and time periods and provides results as a function of the dose. "eventstudy" averages across timing-groups and doses and reports results as a function of the length of exposure to the treatment. "none" is a stub for reporting fully disaggregated results that can be processed as desired by the user. This is not currently supported though. The combination of the arguments |
treatment_type |
"continuous" or "discrete" depending on the nature of the treatment. Default is "continuous". "discrete" is not yet supported. |
required_pre_periods |
The number of required pre-treatment periods to implement the estimation strategy. Default is 1. |
anticipation |
how many periods before the treatment actually takes place that it can have an effect on outcomes |
base_period |
The type of base period to use. This only affects the numeric value of results in pre-treatment periods. Results in post-treatment periods are not affected by this choice. The default is "varying", where the base period will "back up" to the immediately preceding period in pre-treatment periods. The other option is "universal" where the base period is fixed in pre-treatment periods to be the period right before the treatment starts. "Universal" is commonly used in difference-in-differences applications, but can be unnatural for other identification strategies. |
cband |
whether or not to report a uniform (instead of pointwise) confidence band (default is TRUE) |
alp |
significance level; default is 0.05 |
boot_type |
which type of bootstrap to use |
weightsname |
The name of the column that contains sampling weights. The default is NULL, in which case no sampling weights are used. |
gt_type |
which type of group-time effects are computed.
The default is "att". Different estimation strategies can implement
their own choices for |
biters |
number of bootstrap iterations; default is 100 |
cl |
number of clusters to be used when bootstrapping; default is 1 |
dname |
The name of the treatment variable in the data. The functionality of
|
dvals |
an optional argument specifying which values of the
treatment to evaluate ATT(d) and/or ACRT(d). If no values are
supplied, then the default behavior is to set
|
degree |
The degree of the B-Spline used in estimation. The default is 3, which in
combination with the default choice for the |
num_knots |
The number of knots to include for the B-Spline. The default is 0 so that the spline is global (i.e., this will amount to fitting a global polynomial). There is a bias-variance tradeoff for including more or fewer knots. |
... |
additional arguments |
Value
A pte_params object
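The cband, alp, biters, and boot_type = "multiplier" arguments relate to uniform confidence bands. The sketch below shows the general idea of a multiplier bootstrap critical value for a uniform band in base R. The stand-in influence-function matrix, the Rademacher multipliers, and the use of a plain standard deviation for scaling are assumptions for illustration and are not taken from the package internals.

```r
set.seed(1234)
n <- 400  # number of units
k <- 10   # number of dose values at which an effect is estimated

# Stand-in influence-function matrix (n x k); in practice this would come
# from the estimator rather than from rnorm
inf_func <- matrix(rnorm(n * k), nrow = n, ncol = k)
se <- apply(inf_func, 2, sd) / sqrt(n)

biters <- 1000
sup_t <- replicate(biters, {
  v <- sample(c(-1, 1), size = n, replace = TRUE)  # Rademacher multipliers
  boot_draw <- colMeans(v * inf_func)              # perturbed, centered at 0
  max(abs(boot_draw) / se)                         # largest |t| across doses
})

alp <- 0.05
crit_uniform <- unname(quantile(sup_t, 1 - alp))
crit_pointwise <- qnorm(1 - alp / 2)
c(uniform = crit_uniform, pointwise = crit_pointwise)
```

A uniform band uses estimate ± crit_uniform * se at every dose, so it is wider than a pointwise interval but is designed to cover the whole curve with the stated probability.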
Simulate data for DiD with a Continuous Treatment
Description
A function that simulates panel data when there is a continuous treatment.
Besides the parameters that can be passed to the function, some values are hard coded. The individual fixed effect is drawn from a normal distribution with mean equal to the group. The time effects are hard coded to be equal to the time period. The dose is drawn from a uniform distribution between 0 and 1.
Usage
simulate_contdid_data(
n = 5000,
num_time_periods = 4,
num_groups = num_time_periods,
pg = rep(1/num_groups, num_groups - 1),
pu = 1/(num_groups),
dose_linear_effect = 0,
dose_quadratic_effect = 0
)
Arguments
n |
The number of cross-sectional units. Default is 5000. |
num_time_periods |
The number of time periods. Default is 4. |
num_groups |
The number of groups. Default is the number of time periods. In this case, the groups will consist of a never-treated group and groups that become treated in every period starting in the second period. |
pg |
A vector of probabilities that a unit will be in a particular treated group. The default is equal probabilities. |
pu |
The probability that a unit will be in the never-treated group. The default is that it is 1/num_groups. |
dose_linear_effect |
The linear effect of the treatment. Default is 0. |
dose_quadratic_effect |
The quadratic effect of the treatment. Default is 0. |
Value
A balanced panel data frame with the following columns:
id: unit id
time_period: time period
Y: outcome
G: unit's group
D: amount of the treatment
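To make the description of the data-generating process more tangible, the sketch below builds a small balanced panel in base R with the same columns. It follows the stated ingredients (unit fixed effect centered at the group, time effect equal to the time period, dose drawn from a uniform distribution on 0 to 1), but the noise term, the dose of 0 for never-treated units, and the rule that effects switch on from the treatment period onward are assumptions; this is not the code of simulate_contdid_data.

```r
set.seed(1234)
n <- 200
num_time_periods <- 4

# Timing groups: 0 = never treated, otherwise the first treated period
G <- sample(c(0, 2:num_time_periods), size = n, replace = TRUE)
dose <- runif(n)           # dose drawn from Uniform(0, 1)
dose[G == 0] <- 0          # assumption: never-treated units have dose 0
eta <- rnorm(n, mean = G)  # unit fixed effect centered at the group

dose_linear_effect <- 0.5
dose_quadratic_effect <- 0

panel <- expand.grid(id = seq_len(n), time_period = seq_len(num_time_periods))
panel$G <- G[panel$id]
panel$D <- dose[panel$id]

# Treatment effect applies from the unit's treatment period onward (assumed)
treated_now <- panel$G > 0 & panel$time_period >= panel$G
effect <- dose_linear_effect * panel$D + dose_quadratic_effect * panel$D^2

# Outcome: time effect + unit effect + treatment effect + noise
panel$Y <- panel$time_period + eta[panel$id] + treated_now * effect +
  rnorm(nrow(panel))

head(panel[order(panel$id, panel$time_period), ])
```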