Type: | Package |
Title: | Precision Medicine |
Version: | 1.1.0 |
Description: | A doubly robust precision medicine approach to fit, cross-validate, and visualize prediction models for the conditional average treatment effect (CATE). It implements the doubly robust estimation and semiparametric modeling approach for treatment-covariate interactions proposed by Yadlowsky et al. (2020) <doi:10.1080/01621459.2020.1772080>. |
Depends: | R (≥ 3.5.0) |
License: | Apache License (== 2.0) |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Imports: | dplyr, gbm, gam, ggplot2, glmnet, graphics, MASS, mgcv, rlang, stringr, tidyr, survival, randomForestSRC |
NeedsCompilation: | no |
BugReports: | https://github.com/smartdata-analysis-and-statistics/precmed/issues |
URL: | https://github.com/smartdata-analysis-and-statistics/precmed, https://smartdata-analysis-and-statistics.github.io/precmed/ |
Packaged: | 2024-10-05 09:19:43 UTC; Thomas |
Author: | Lu Tian |
Maintainer: | Thomas Debray <tdebray@fromdatatowisdom.com> |
Repository: | CRAN |
Date/Publication: | 2024-10-05 17:30:02 UTC |
Compute the area between curves from the "precmed"
object
Description
Compute the area between curves (ABC) for each scoring method in the "precmed"
object.
This should be run only after results of catecv()
have been obtained.
Usage
abc(x, ...)
## Default S3 method:
abc(x, ...)
## S3 method for class 'precmed'
abc(x, ...)
Arguments
x |
An object of class "precmed". |
... |
Additional arguments (currently unused). |
Details
The ABC is the area between a validation curve and the overall ATE in the validation set.
It is calculated for each scoring method separately. Higher ABC values are preferable as they
indicate that more treatment effect heterogeneity is captured by the scoring method.
Negative values of ABC are possible if segments of the validation curve cross the overall ATE line.
The ABC is calculated with the auc() function in utility.R using natural cubic spline interpolation.
The calculation is always based on validation curves evaluated at 100 proportions equally spaced
from min(prop.cutoff) to max(prop.cutoff).
The ABC is a metric to help users select the scoring method that best captures treatment effect
heterogeneity in the data. It should be used as a complement to visual inspection of the
validation curves in the validation set produced by plot().
Value
Returns a matrix of numeric values with the number of columns equal to the number of cross-validation
iterations and the number of rows equal to the number of scoring methods in x.
References
Zhao, L., Tian, L., Cai, T., Claggett, B., & Wei, L. J. (2013). Effectively selecting a target population for a future comparative study. Journal of the American Statistical Association, 108(502), 527-539.
See Also
catecv()
function and plot()
, boxplot()
methods for
"precmed"
objects.
Examples
# Count outcome
cv_count <- catecv(response = "count",
data = countExample,
score.method = "poisson",
cate.model = y ~ age + female + previous_treatment +
previous_cost + previous_number_relapses +
offset(log(years)),
ps.model = trt ~ age + previous_treatment,
higher.y = FALSE, cv.n = 5, verbose = 1)
# ABC of the validation curves for each method and each CV iteration
abc(cv_count)
# Survival outcome
library(survival)
cv_surv <- catecv(response = "survival",
data = survivalExample,
score.method = c("poisson", "randomForest"),
cate.model = Surv(y, d) ~ age + female + previous_cost +
previous_number_relapses,
ps.model = trt ~ age + previous_treatment,
higher.y = FALSE,
cv.n = 5)
# ABC of the validation curves for each method and each CV iteration
abc(cv_surv)
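# Illustrative sketch only (not precmed code): the Details above describe the ABC as the
# area between the validation curve and the overall ATE, obtained via natural cubic spline
# interpolation over 100 equally spaced proportions. The inputs below (props, ate.valid,
# ate.overall) are made up, and the sign convention in the package depends on higher.y.
props <- seq(0.5, 1, length.out = 100)        # proportions, as in prop.cutoff
ate.valid <- 0.70 + 0.25 * (1 - props)        # hypothetical subgroup ATEs (e.g., rate ratios)
ate.overall <- 0.85                           # hypothetical overall ATE in the validation set
curve.fun <- splinefun(props, ate.valid, method = "natural")
abc.value <- integrate(function(p) ate.overall - curve.fun(p),
                       lower = min(props), upper = max(props))$value
abc.value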
Check arguments for all types of outcomes
Applied at the beginning of pmcount(), cvcount(), drcount.inference(), catefitsurv(), catecvsurv(), and drsurv.inference()
Description
Check arguments for all types of outcomes. Applied at the beginning of pmcount(), cvcount(), drcount.inference(), catefitsurv(), catecvsurv(), and drsurv.inference().
Usage
arg.checks(
fun,
response,
data,
followup.time = NULL,
tau0 = NULL,
surv.min = NULL,
ipcw.method = NULL,
ps.method,
minPS,
maxPS,
higher.y = NULL,
score.method = NULL,
abc = NULL,
prop.cutoff = NULL,
prop.multi = NULL,
train.prop = NULL,
cv.n = NULL,
error.max = NULL,
max.iter = NULL,
initial.predictor.method = NULL,
tree.depth = NULL,
n.trees.rf = NULL,
n.trees.boosting = NULL,
B = NULL,
Kfold = NULL,
plot.gbmperf = NULL,
error.maxNR = NULL,
max.iterNR = NULL,
tune = NULL,
n.boot = NULL,
plot.boot = NULL,
interactions = NULL
)
Arguments
fun |
A function for which argument check is needed; "catefit" for |
response |
The type of response. Always 'survival' for this function. |
data |
A data frame containing the variables in the outcome and propensity score models;
a data frame with |
followup.time |
Follow-up time, interpreted as the potential censoring time. If the potential censoring time is known,
followup.time is the name of a corresponding column in the data. Otherwise, set |
tau0 |
The truncation time for defining restricted mean time lost. |
surv.min |
Lower truncation limit for probability of being censored (positive and very close to 0). |
ipcw.method |
The censoring model. Allowed values are: |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
higher.y |
A logical value indicating whether higher ( |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
abc |
A logical value indicating whether the area between curves (ABC) should be calculated
at each cross-validation iteration, for each |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated log CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
prop.multi |
A vector of numerical values (in '[0, 1]') specifying percentiles of the
estimated log CATE scores to define mutually exclusive subgroups.
It should start with 0, end with 1, and be of |
train.prop |
A numerical value (in '(0, 1)') indicating the proportion of total data used
for training. Default is |
cv.n |
A positive integer value indicating the number of cross-validation iterations.
Default is |
error.max |
A numerical value > 0 indicating the tolerance (maximum value of error)
for the largest standardized absolute difference in the covariate distributions or in the
doubly robust estimated rate ratios between the training and validation sets. This is used
to define a balanced training-validation splitting. Default is |
max.iter |
A positive integer value indicating the maximum number of iterations when
searching for a balanced training-validation split. Default is |
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates in |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.rf |
A positive integer specifying the maximum number of trees in random forest.
Used if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used if |
B |
A positive integer specifying the number of times cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds (parts) used in cross-fitting
to partition the data in |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in the Newton Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
n.boot |
A numeric value indicating the number of bootstrap samples used. This is only relevant
if |
plot.boot |
A logical value indicating whether histograms of the bootstrapped log(rate ratio) should
be produced at every |
interactions |
A logical value indicating whether the outcome model should assume interactions
between |
Value
Nothing. Will stop if arguments are incorrect.
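As an illustration only (assumed helper name, not the package's actual implementation), checks of this kind typically validate ranges and stop with an informative message, e.g. for minPS and maxPS:
check_ps_bounds <- function(minPS, maxPS) {
  if (!is.numeric(minPS) || length(minPS) != 1 || minPS < 0 || minPS > 1)
    stop("minPS must be a single number in [0, 1].")
  if (!is.numeric(maxPS) || length(maxPS) != 1 || maxPS > 1 || maxPS <= minPS)
    stop("maxPS must be a single number in (0, 1] and strictly greater than minPS.")
  invisible(NULL)
}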
Check arguments that are common to all types of outcomes
Used inside arg.checks()
Description
Check arguments that are common to all types of outcomes. Used inside arg.checks().
Usage
arg.checks.common(
fun,
ps.method,
minPS,
maxPS,
higher.y = NULL,
abc = NULL,
prop.cutoff = NULL,
prop.multi = NULL,
B = NULL,
Kfold = NULL,
plot.gbmperf = NULL,
tree.depth = NULL,
n.trees.boosting = NULL,
error.maxNR = NULL,
max.iterNR = NULL,
tune = NULL,
train.prop = NULL,
cv.n = NULL,
error.max = NULL,
max.iter = NULL,
n.boot = NULL,
plot.boot = NULL
)
Arguments
fun |
A function for which argument check is needed; "pm" for |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
higher.y |
A logical value indicating whether higher ( |
abc |
A logical value indicating whether the area between curves (ABC) should be calculated
at each cross-validation iteration, for each |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated log CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
prop.multi |
A vector of numerical values (in '[0, 1]') specifying percentiles of the
estimated log CATE scores to define mutually exclusive subgroups.
It should start with 0, end with 1, and be of |
B |
A positive integer specifying the number of times cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds (parts) used in cross-fitting
to partition the data in |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used only if |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in the Newton Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
train.prop |
A numerical value (in '(0, 1)') indicating the proportion of total data used
for training. Default is |
cv.n |
A positive integer value indicating the number of cross-validation iterations.
Default is |
error.max |
A numerical value > 0 indicating the tolerance (maximum value of error)
for the largest standardized absolute difference in the covariate distributions or in the
doubly robust estimated rate ratios between the training and validation sets. This is used
to define a balanced training-validation splitting. Default is |
max.iter |
A positive integer value indicating the maximum number of iterations when
searching for a balanced training-validation split. Default is |
n.boot |
A numeric value indicating the number of bootstrap samples used. This is only relevant
if |
plot.boot |
A logical value indicating whether histograms of the bootstrapped log(rate ratio) should
be produced at every |
Value
Nothing. Will stop if arguments are incorrect.
Doubly robust estimator of and inference for the average treatment effect for count, survival and continuous data
Description
Doubly robust estimator of the average treatment effect between two treatments, which is the rate ratio for count outcomes, the restricted mean time lost ratio for survival outcomes, and the mean difference for continuous outcomes. Bootstrap is used for inference.
Usage
atefit(
response,
data,
cate.model,
ps.model,
ps.method = "glm",
ipcw.model = NULL,
ipcw.method = "breslow",
minPS = 0.01,
maxPS = 0.99,
followup.time = NULL,
tau0 = NULL,
surv.min = 0.025,
interactions = TRUE,
n.boot = 500,
seed = NULL,
verbose = 0
)
Arguments
response |
A string describing the type of outcome in the data. Allowed values include
"count" (see |
data |
A data frame containing the variables in the outcome, propensity score, and inverse
probability of censoring models (if specified); a data frame with |
cate.model |
A formula describing the outcome model to be fitted.
The outcome must appear on the left-hand side. For survival outcomes,
a |
ps.model |
A formula describing the propensity score (PS) model to be fitted. The treatment must
appear on the left-hand side. The treatment must be a numeric vector coded as 0/1.
If data are from a randomized controlled trial, specify |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
ipcw.model |
A formula describing the inverse probability of censoring weighting (IPCW)
model to be fitted. The left-hand side must be empty. Only applies for survival outcomes.
Default is |
ipcw.method |
A character value for the censoring model. Only applies for survival
outcomes. Allowed values are: |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
followup.time |
A column name in |
tau0 |
The truncation time for defining restricted mean time lost. Only applies for
survival outcomes. Default is |
surv.min |
Lower truncation limit for the probability of being censored.
It must be a positive value and should be chosen close to 0. Only applies for survival
outcomes. Default is |
interactions |
A logical value indicating whether the outcome model should assume interactions
between |
n.boot |
A numeric value indicating the number of bootstrap samples used. Default is |
seed |
An optional integer specifying an initial randomization seed for reproducibility.
Default is |
verbose |
An integer value indicating whether intermediate progress messages and histograms should
be printed. |
Details
For count response, see details in atefitcount()
.
For survival response, see details in atefitsurv()
.
Value
For count response, see description of outputs in atefitcount()
.
For survival response, see description of outputs in atefitsurv()
.
Examples
# Count outcome
output <- atefit(response = "count",
data = countExample,
cate.model = y ~ age + female + previous_treatment +
previous_cost + previous_number_relapses +
offset(log(years)),
ps.model = trt ~ age + previous_treatment,
n.boot = 50,
seed = 999)
output
plot(output)
# Survival outcome
tau0 <- with(survivalExample,
min(quantile(y[trt == "drug1"], 0.95), quantile(y[trt == "drug0"], 0.95)))
output2 <- atefit(response = "survival",
data = survivalExample,
cate.model = survival::Surv(y, d) ~ age + female +
previous_cost + previous_number_relapses,
ps.model = trt ~ age + previous_treatment,
tau0 = tau0,
seed = 999)
output2
plot(output2)
Doubly robust estimator of and inference for the average treatment effect for count data
Description
Doubly robust estimator of the average treatment effect between two treatments, which is the rate ratio for count outcomes. Bootstrap is used for inference.
Usage
atefitcount(
data,
cate.model,
ps.model,
ps.method = "glm",
minPS = 0.01,
maxPS = 0.99,
interactions = TRUE,
n.boot = 500,
seed = NULL,
verbose = 0
)
Arguments
data |
A data frame containing the variables in the outcome, propensity
score, and inverse probability of censoring models (if specified); a data
frame with |
cate.model |
A formula describing the outcome model to be fitted. The outcome must appear on the left-hand side. |
ps.model |
A formula describing the propensity score (PS) model to be
fitted. The treatment must appear on the left-hand side. The treatment must
be a numeric vector coded as 0 or 1. If data are from a randomized controlled
trial, specify |
ps.method |
A character value for the method to estimate the propensity
score. Allowed values include one of: |
minPS |
A numerical value between 0 and 1 below which estimated
propensity scores should be truncated. Default is |
maxPS |
A numerical value between 0 and 1 above which estimated
propensity scores should be truncated. Must be strictly greater than
|
interactions |
A logical value indicating whether the outcome model
should assume treatment-covariate interaction by |
n.boot |
A numeric value indicating the number of bootstrap samples
used. Default is |
seed |
An optional integer specifying an initial randomization seed for
reproducibility. Default is |
verbose |
An integer value indicating whether intermediate progress
messages should be printed. |
Details
This helper function estimates the average treatment effect (ATE) between two treatment groups in a given dataset. The ATE is estimated with a doubly robust estimator that accounts for imbalances in covariate distributions between the two treatment groups with inverse probability treatment weighting. For count outcomes, the estimated ATE is the estimated rate ratio of treatment 1 versus treatment 0.
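For intuition only, the sketch below shows a generic augmented inverse probability weighting (AIPW) estimator for a mean difference. It is not the rate-ratio estimator implemented in atefitcount (see Yadlowsky et al., 2020, for that); the object names y (outcome vector), trt (0/1 treatment vector), and x (covariate data frame) are hypothetical.
# Hypothetical inputs: numeric outcome y, 0/1 treatment trt, covariate data frame x
fit1 <- glm(y ~ ., data = cbind(y = y, x), subset = trt == 1)   # outcome model, arm 1
fit0 <- glm(y ~ ., data = cbind(y = y, x), subset = trt == 0)   # outcome model, arm 0
mu1  <- predict(fit1, newdata = x)
mu0  <- predict(fit0, newdata = x)
ps.fit <- glm(trt ~ ., data = cbind(trt = trt, x), family = binomial())
ps   <- pmin(pmax(predict(ps.fit, type = "response"), 0.01), 0.99)  # truncation as with minPS/maxPS
# Doubly robust (AIPW) estimates of the two potential-outcome means and their difference
ey1 <- mean(trt * (y - mu1) / ps + mu1)
ey0 <- mean((1 - trt) * (y - mu0) / (1 - ps) + mu0)
ey1 - ey0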
Value
Return an object of class atefit with the following elements:
log.rate.ratio: A vector of numeric values of the estimated ATE (expressed as a log rate ratio of trt=1 over trt=0), the bootstrap standard error, the lower and upper limits of the 95% confidence interval, and the p-value.
rate0: A numeric value of the estimated rate in the group trt=0.
rate1: A numeric value of the estimated rate in the group trt=1.
trt.boot: Estimated log rate ratios in each bootstrap sample.
warning: A warning message produced if the treatment variable was not coded as 0 or 1. The key to map the original coding of the variable to a 0-1 coding is displayed in the warning to facilitate the interpretation of the remainder of the output.
Examples
output <- atefitcount(data = countExample,
cate.model = y ~ age + female + previous_treatment +
previous_cost + previous_number_relapses +
offset(log(years)),
ps.model = trt ~ age + previous_treatment,
verbose = 1, n.boot = 50, seed = 999)
output
plot(output)
Doubly robust estimator of and inference for the average treatment effect for continuous data
Description
Doubly robust estimator of the average treatment effect between two treatments, which is the mean difference between treatment 1 and treatment 0 for continuous outcomes. Bootstrap is used for inference.
Usage
atefitmean(
data,
cate.model,
ps.model,
ps.method = "glm",
minPS = 0.01,
maxPS = 0.99,
interactions = TRUE,
n.boot = 500,
plot.boot = FALSE,
seed = NULL,
verbose = 0
)
Arguments
data |
A data frame containing the variables in the outcome and
propensity score models; a data frame with |
cate.model |
A formula describing the outcome model to be fitted. The outcome must appear on the left-hand side. |
ps.model |
A formula describing the propensity score model to be fitted.
The treatment must appear on the left-hand side. The treatment must be a
numeric vector coded as 0/1. If data are from a RCT, specify |
ps.method |
A character value for the method to estimate the propensity
score. Allowed values include one of: |
minPS |
A numerical value between 0 and 1 below which estimated
propensity scores should be truncated. Default is |
maxPS |
A numerical value between 0 and 1 above which estimated
propensity scores should be truncated. Must be strictly greater than
|
interactions |
A logical value indicating whether the outcome model
should be fitted separately by treatment arm with the variables in
|
n.boot |
A numeric value indicating the number of bootstrap samples
used. Default is |
plot.boot |
A logical value indicating whether histograms of the
bootstrapped treatment effect estimates should be produced at every
|
seed |
An optional integer specifying an initial randomization seed for
reproducibility. Default is |
verbose |
An integer value indicating whether intermediate progress
messages and histograms should be printed. |
Details
This helper function estimates the average treatment effect (ATE) between two
treatment groups in a given dataset specified by y, trt, x.cate, x.ps, time
. The ATE is
estimated with a doubly robust estimator that accounts for imbalances in covariate distributions
between the two treatment groups with inverse probability treatment weighting.
For count outcomes, the estimated ATE is the estimated
rate ratio between treatment 1 versus treatment 0. Both original and log-transformed ATEs are
returned, as well as the rate in either treatment group.
If inference = TRUE
, the variability of the estimated rate ratio is also calculated
using bootstrap. Additional variability outputs include standard error of the log rate ratio,
95% confidence interval of the rate ratio, p-value, and a histogram of the log rate ratio.
Value
Return a list of 8 elements:
log.rate.ratio: A numeric value of the estimated log rate ratio.
se.boot.log.rate.ratio: A numeric value of the bootstrap standard error of the log rate ratio.
rate.ratio: A numeric value of the estimated rate ratio.
rate.ratio0: A numeric value of the estimated rate in the group trt=0.
rate.ratio1: A numeric value of the estimated rate in the group trt=1.
rate.ratio.CIl: A numeric value of the lower limit of the 95% bootstrap confidence interval for the estimated rate ratio.
rate.ratio.CIu: A numeric value of the upper limit of the 95% bootstrap confidence interval for the estimated rate ratio.
pvalue: A numeric value of the p-value derived from the bootstrapped values based on a Chi-squared distribution.
warning: A warning message produced if the treatment variable was not coded as 0/1. The key to map the original coding of the variable to a 0/1 coding is displayed in the warning to facilitate the interpretation of the remainder of the output.
plot: If plot.boot is TRUE, a histogram displaying the distribution of the bootstrapped log rate ratios. The red vertical reference line in the histogram represents the estimated log rate ratio.
Examples
# This module is not implemented yet!
Doubly robust estimator of and inference for the average treatment effect for survival data
Description
Doubly robust estimator of the average treatment effect between two treatments, which is the restricted mean time lost ratio for survival outcomes. Bootstrap is used for inference.
Usage
atefitsurv(
data,
cate.model,
ps.model,
ps.method = "glm",
ipcw.model = NULL,
ipcw.method = "breslow",
minPS = 0.01,
maxPS = 0.99,
followup.time = NULL,
tau0 = NULL,
surv.min = 0.025,
n.boot = 500,
seed = NULL,
verbose = 0
)
Arguments
data |
A data frame containing the variables in the outcome, propensity score, and inverse
probability of censoring models (if specified); a data frame with |
cate.model |
A formula describing the outcome model to be fitted.
The outcome must appear on the left-hand side. For survival outcomes, a |
ps.model |
A formula describing the propensity score (PS) model to be fitted. The treatment must
appear on the left-hand side. The treatment must be a numeric vector coded as 0/1.
If data are from a randomized controlled trial, specify |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
ipcw.model |
A formula describing the inverse probability of censoring weighting (IPCW)
model to be fitted. The left-hand side must be empty. Only applies for survival outcomes.
Default is |
ipcw.method |
A character value for the censoring model. Only applies for survival
outcomes. Allowed values are: |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
followup.time |
A column name in |
tau0 |
The truncation time for defining restricted mean time lost. Only applies for
survival outcomes. Default is |
surv.min |
Lower truncation limit for the probability of being censored.
It must be a positive value and should be chosen close to 0. Only applies for survival
outcomes. Default is |
n.boot |
A numeric value indicating the number of bootstrap samples used. Default is |
seed |
An optional integer specifying an initial randomization seed for reproducibility.
Default is |
verbose |
An integer value indicating whether intermediate progress messages should
be printed. |
Details
This helper function estimates the average treatment effect (ATE) for survival data between two treatment groups in a given dataset. The ATE is estimated with a doubly robust estimator that accounts for imbalances in covariate distributions between the two treatment groups with inverse probability treatment and censoring weighting. For survival outcomes, the estimated ATE is the estimated RMTL ratio of treatment 1 versus treatment 0. The log-transformed ATEs and log-transformed adjusted hazard ratios are returned, as well as the estimated RMST in either treatment group. The variability of the estimated RMTL ratio is calculated using bootstrap. Additional outputs include the standard error of the log RMTL ratio, the 95% confidence interval, the p-value, and a histogram of the bootstrap estimates.
Value
Return an object of class atefit
with 6 elements:
rmst1: A vector of numeric values of the estimated RMST, bootstrap standard error, lower and upper limits of the 95% confidence interval, and the p-value in the group trt=1.
rmst0: A vector of numeric values of the estimated RMST, bootstrap standard error, lower and upper limits of the 95% confidence interval, and the p-value in the group trt=0.
log.rmtl.ratio: A vector of numeric values of the estimated log RMTL ratio of trt=1 over trt=0, bootstrap standard error, lower and upper limits of the 95% confidence interval, and the p-value.
log.hazard.ratio: A vector of numeric values of the estimated adjusted log hazard ratio of trt=1 over trt=0, bootstrap standard error, lower and upper limits of the 95% confidence interval, and the p-value.
trt.boot: Estimates of rmst1, rmst0, log.rmtl.ratio and log.hazard.ratio in each bootstrap sample.
warning: A warning message produced if the treatment variable was not coded as 0/1. The key to map the original coding of the variable to a 0/1 coding is displayed in the warning to facilitate the interpretation of the remainder of the output.
Examples
library(survival)
tau0 <- with(survivalExample,
min(quantile(y[trt == "drug1"], 0.95), quantile(y[trt == "drug0"], 0.95)))
output <- atefitsurv(data = survivalExample,
cate.model = Surv(y, d) ~ age + female +
previous_cost + previous_number_relapses,
ps.model = trt ~ age + previous_treatment,
tau0 = tau0,
n.boot = 50,
seed = 999,
verbose = 1)
output
plot(output)
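# As an unadjusted cross-check only (not the doubly robust estimator used by atefitsurv),
# the per-arm RMST up to tau0 (as defined above) can be read off a Kaplan-Meier fit;
# the RMTL compared by atefitsurv is tau0 minus the RMST.
km <- survfit(Surv(y, d) ~ trt, data = survivalExample)
summary(km, rmean = tau0)$table   # per-arm restricted mean survival time up to tau0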
Compute the area under the curve using linear or natural spline interpolation
Description
This function computes the area under the curve for two vectors where one corresponds to the x values and the other corresponds to the y values. It supports both linear and spline interpolation.
Usage
auc(
x,
y,
from = min(x, na.rm = TRUE),
to = max(x, na.rm = TRUE),
type = c("linear", "spline"),
subdivisions = 100,
...
)
Arguments
x |
A numeric vector of x values. |
y |
A numeric vector of y values of the same length as x. |
from |
The value from where to start calculating the area under the curve. Defaults to the smallest x value. |
to |
The value from where to end the calculation of the area under the curve. Defaults to the greatest x value. |
type |
The type of interpolation: "linear" or "spline". Defaults to "linear". |
subdivisions |
An integer indicating how many subdivisions to use for 'integrate' (for spline-based approximations). |
... |
Additional arguments passed on to 'approx' (for linear interpolations). |
Value
A numeric value representing the area under the curve.
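auc() is an internal helper, so the snippet below only illustrates the documented interface (the precmed::: call is an assumption); the base-R equivalent of the spline option is shown as well.
x <- seq(0, 1, length.out = 11)
y <- x^2
# Documented interface (internal; would require precmed:::auc if not exported)
# precmed:::auc(x, y, type = "spline")
# Same idea in base R: integrate a natural-spline interpolant of (x, y)
f <- splinefun(x, y, method = "natural")
integrate(f, lower = min(x), upper = max(x))$value   # approximately 1/3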
Split the given dataset into balanced training and validation sets (within a pre-specified tolerance). Balanced means: 1) the ratio of treated and controls is maintained in the training and validation sets; 2) the covariate distributions are balanced between the training and validation sets.
Description
Split the given dataset into balanced training and validation sets (within a pre-specified tolerance). Balanced means: 1) the ratio of treated and controls is maintained in the training and validation sets; 2) the covariate distributions are balanced between the training and validation sets.
Usage
balance.split(
y,
trt,
x.cate,
x.ps,
time,
minPS = 0.01,
maxPS = 0.99,
train.prop = 3/4,
error.max = 0.1,
max.iter = 5000
)
Arguments
y |
Observed outcome; vector of size |
trt |
Treatment received; vector of size |
x.cate |
Matrix of |
x.ps |
Matrix of |
time |
Log-transformed person-years of follow-up; vector of size |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
train.prop |
A numerical value (in '(0, 1)') indicating the proportion of total data used
for training. Default is |
error.max |
A numerical value > 0 indicating the tolerance (maximum value of error)
for the largest standardized absolute difference in the covariate distributions or in the
doubly robust estimated rate ratios between the training and validation sets. This is used
to define a balanced training-validation splitting. Default is |
max.iter |
A positive integer value indicating the maximum number of iterations when
searching for a balanced training-validation split. Default is |
Value
A list of 10 objects, 5 training and 5 validation of y, trt, x.cate, x.ps, time:
y.train - observed outcome in the training set; vector of size m (observations in the training set)
trt.train - treatment received in the training set; vector of size m coded as 0/1
x.cate.train - baseline covariates for the outcome model in the training set; matrix of dimension m by p.cate
x.ps.train - baseline covariates (plus intercept) for the propensity score model in the training set; matrix of dimension m by p.ps + 1
time.train - log-transformed person-years of follow-up in the training set; vector of size m
y.valid - observed outcome in the validation set; vector of size n-m
trt.valid - treatment received in the validation set; vector of size n-m coded as 0/1
x.cate.valid - baseline covariates for the outcome model in the validation set; matrix of dimension n-m by p.cate
x.ps.valid - baseline covariates (plus intercept) for the propensity score model in the validation set; matrix of dimension n-m by p.ps + 1
time.valid - log-transformed person-years of follow-up in the validation set; vector of size n-m
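A hedged sketch of the balanced-splitting idea (assumed helper names, not precmed internals): draw random candidate splits and accept the first whose largest standardized absolute covariate mean difference between training and validation stays below error.max.
std_abs_diff <- function(x.train, x.valid) {
  # standardized absolute difference in covariate means between the two sets
  abs(colMeans(x.train) - colMeans(x.valid)) /
    sqrt((apply(x.train, 2, var) + apply(x.valid, 2, var)) / 2)
}
balanced_split_sketch <- function(x, train.prop = 3/4, error.max = 0.1, max.iter = 5000) {
  n <- nrow(x)
  for (i in seq_len(max.iter)) {
    idx <- sample(n, size = floor(train.prop * n))
    if (max(std_abs_diff(x[idx, , drop = FALSE], x[-idx, , drop = FALSE])) < error.max)
      return(idx)                       # row indices of an acceptable training set
  }
  warning("no balanced split found within max.iter iterations; returning last split")
  idx
}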
Split the given dataset into balanced training and validation sets (within a pre-specified tolerance). Balanced means: 1) the ratio of treated and controls is maintained in the training and validation sets; 2) the covariate distributions are balanced between the training and validation sets.
Description
Split the given dataset into balanced training and validation sets (within a pre-specified tolerance). Balanced means: 1) the ratio of treated and controls is maintained in the training and validation sets; 2) the covariate distributions are balanced between the training and validation sets.
Usage
balancemean.split(
y,
trt,
x.cate,
x.ps,
minPS = 0.01,
maxPS = 0.99,
train.prop = 3/4,
error.max = 0.1,
max.iter = 5000
)
Arguments
y |
Observed outcome; vector of size |
trt |
Treatment received; vector of size |
x.cate |
Matrix of |
x.ps |
Matrix of |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
train.prop |
A numerical value (in '(0, 1)') indicating the proportion of total data used
for training. Default is |
error.max |
A numerical value > 0 indicating the tolerance (maximum value of error)
for the largest standardized absolute difference in the covariate distributions or in the
doubly robust estimated rate ratios between the training and validation sets. This is used
to define a balanced training-validation splitting. Default is |
max.iter |
A positive integer value indicating the maximum number of iterations when
searching for a balanced training-validation split. Default is |
Value
A list of 10 objects, 5 training and 5 validation of y, trt, x.cate, x.ps, time:
y.train - observed outcome in the training set; vector of size m (observations in the training set)
trt.train - treatment received in the training set; vector of size m coded as 0/1
x.cate.train - baseline covariates for the outcome model in the training set; matrix of dimension m by p.cate
x.ps.train - baseline covariates (plus intercept) for the propensity score model in the training set; matrix of dimension m by p.ps + 1
y.valid - observed outcome in the validation set; vector of size n-m
trt.valid - treatment received in the validation set; vector of size n-m coded as 0/1
x.cate.valid - baseline covariates for the outcome model in the validation set; matrix of dimension n-m by p.cate
x.ps.valid - baseline covariates (plus intercept) for the propensity score model in the validation set; matrix of dimension n-m by p.ps + 1
bestid.valid - id for the validation set by the best split; vector of size n-m
Split the given time-to-event dataset into balanced training and validation sets (within a pre-specified tolerance). Balanced means: 1) the ratio of treated and controls is maintained in the training and validation sets; 2) the covariate distributions are balanced between the training and validation sets.
Description
Split the given time-to-event dataset into balanced training and validation sets (within a pre-specified tolerance). Balanced means: 1) the ratio of treated and controls is maintained in the training and validation sets; 2) the covariate distributions are balanced between the training and validation sets.
Usage
balancesurv.split(
y,
d,
trt,
x.cate,
x.ps,
x.ipcw,
yf = NULL,
train.prop = 3/4,
error.max = 0.1,
max.iter = 5000
)
Arguments
y |
Observed survival or censoring time; vector of size |
d |
The event indicator, normally |
trt |
Treatment received; vector of size |
x.cate |
Matrix of |
x.ps |
Matrix of |
x.ipcw |
Matrix of |
yf |
Follow-up time, interpreted as the potential censoring time; vector of size |
train.prop |
A numerical value (in '(0, 1)') indicating the proportion of total data used
for training. Default is |
error.max |
A numerical value > 0 indicating the tolerance (maximum value of error)
for the largest standardized absolute difference in the covariate distributions or in the
doubly robust estimated rate ratios between the training and validation sets. This is used
to define a balanced training-validation splitting. Default is |
max.iter |
A positive integer value indicating the maximum number of iterations when
searching for a balanced training-validation split. Default is |
Value
A list of 14 objects, 7 training and 7 validation of y, trt, x.cate, x.ps, x.ipcw, time, yf:
y.train - observed survival or censoring time in the training set; vector of size m (observations in the training set)
d.train - event indicator in the training set; vector of size m coded as 0/1
trt.train - treatment received in the training set; vector of size m coded as 0/1
x.cate.train - baseline covariates for the outcome model in the training set; matrix of dimension m by p.cate
x.ps.train - baseline covariates (plus intercept) for the propensity score model in the training set; matrix of dimension m by p.ps + 1
x.ipcw.train - baseline covariates for inverse probability of censoring in the training set; matrix of dimension m by p.ipw
yf.train - follow-up time in the training set; if known, vector of size m; if unknown, yf == NULL
y.valid - observed survival or censoring time in the validation set; vector of size n-m
d.valid - event indicator in the validation set; vector of size n-m coded as 0/1
trt.valid - treatment received in the validation set; vector of size n-m coded as 0/1
x.cate.valid - baseline covariates for the outcome model in the validation set; matrix of dimension n-m by p.cate
x.ps.valid - baseline covariates (plus intercept) for the propensity score model in the validation set; matrix of dimension n-m by p.ps + 1
x.ipcw.valid - baseline covariates for inverse probability of censoring in the validation set; matrix of dimension n-m by p.ipw
yf.valid - follow-up time in the validation set; if known, vector of size n-m; if unknown, yf == NULL
A set of box plots of estimated ATEs from the "precmed"
object
Description
Provides box plots which depict distributions of estimated ATEs for each multi-category subgroup in
the validation set across all cross-validation iterations. The subgroups are mutually exclusive and
are categorized by the CATE score percentiles (prop.multi
specified in catecv()
or
catecvmean()
). Box plots of mutually exclusive subgroups are constructed separately by scoring
method specified in catecv()
. This should be run only after results of catecv()
or
catecvmean()
have been obtained.
Usage
## S3 method for class 'precmed'
boxplot(
x,
ylab = NULL,
plot.hr = FALSE,
title = waiver(),
theme = theme_classic(),
...
)
Arguments
x |
An object of class |
ylab |
A character value for the y-axis label to describe what the ATE is. Default is |
plot.hr |
A logical value indicating whether the hazard ratios should be plotted in the
validation curves ( |
title |
The text for the title |
theme |
Defaults to |
... |
Other parameters |
Details
boxplot()
takes in outputs from catecv()
and generates
the box plots of estimated ATEs for multi-category subgroups of the validation set.
The box plots together with the overall ATE reference line can help compare the scoring methods'
ability to distinguish subgroups of patients with different treatment effects.
For a given scoring method, box plots showing increasing or decreasing trends across the multi-category subgroups indicate presence of treatment effect heterogeneity (and the ability of the scoring method to capture it). On the contrary, box plots which are relatively aligned across the multi-category subgroups indicate absence of treatment effect heterogeneity (or the inability of the scoring method to capture it).
Value
Returns sets of box plots, one set for each scoring method, over each of the multi-category subgroups. A gray horizontal dashed line of the overall ATE is included as a reference.
References
Yadlowsky, S., Pellegrini, F., Lionetto, F., Braune, S., & Tian, L. (2020). Estimation and validation of ratio-based conditional average treatment effects using observational data. Journal of the American Statistical Association, 1-18. DOI: 10.1080/01621459.2020.1772080.
See Also
plot
and abc()
for "precmed"
objects.
Examples
# Count outcome
eval_1 <- catecv(response = "count",
data = countExample,
score.method = "poisson",
cate.model = y ~ age + female + previous_treatment +
previous_cost + previous_number_relapses +
offset(log(years)),
ps.model = trt ~ age + previous_treatment,
higher.y = FALSE,
cv.n = 5)
boxplot(eval_1, ylab = "Rate ratio of drug1 vs drug0 in each subgroup")
# Survival outcome
library(survival)
tau0 <- with(survivalExample,
min(quantile(y[trt == "drug1"], 0.95), quantile(y[trt == "drug0"], 0.95)))
eval_2 <- catecv(response = "survival",
data = survivalExample,
score.method = c("poisson", "randomForest"),
cate.model = Surv(y, d) ~ age + female + previous_cost +
previous_number_relapses,
ps.model = trt ~ age + previous_treatment,
initial.predictor.method = "randomForest",
ipcw.model = ~ age + previous_cost + previous_treatment,
tau0 = tau0,
higher.y = TRUE,
cv.n = 5,
seed = 999)
boxplot(eval_2, ylab = "RMTL ratio of drug1 vs drug0 in each subgroup")
Cross-validation of the conditional average treatment effect (CATE) score for count, survival or continuous outcomes
Description
Provides (doubly robust) estimation of the average treatment effect (ATE) for count, survival or continuous outcomes in nested and mutually exclusive subgroups of patients defined by an estimated conditional average treatment effect (CATE) score via cross-validation (CV).
Usage
catecv(
response,
data,
score.method,
cate.model,
ps.model,
ps.method = "glm",
init.model = NULL,
initial.predictor.method = NULL,
ipcw.model = NULL,
ipcw.method = "breslow",
minPS = 0.01,
maxPS = 0.99,
followup.time = NULL,
tau0 = NULL,
higher.y = TRUE,
prop.cutoff = seq(0.5, 1, length = 6),
prop.multi = c(0, 1/3, 2/3, 1),
abc = TRUE,
train.prop = 3/4,
cv.n = 10,
error.max = 0.1,
max.iter = 5000,
surv.min = 0.025,
xvar.smooth.score = NULL,
xvar.smooth.init = NULL,
tree.depth = 2,
n.trees.rf = 1000,
n.trees.boosting = 200,
B = 3,
Kfold = 5,
error.maxNR = 0.001,
max.iterNR = 150,
tune = c(0.5, 2),
seed = NULL,
plot.gbmperf = TRUE,
verbose = 0
)
Arguments
response |
A string describing the type of outcome in the data.
Allowed values include "count" (see |
data |
A data frame containing the variables in the outcome, propensity score, and inverse
probability of censoring models (if specified); a data frame with |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
cate.model |
A formula describing the outcome model to be fitted.
The outcome must appear on the left-hand side. For survival outcomes, a
|
ps.model |
A formula describing the propensity score (PS) model to be fitted. The treatment must
appear on the left-hand side. The treatment must be a numeric vector coded as 0/1.
If data are from a randomized controlled trial, specify |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
init.model |
A formula describing the initial predictor model. The outcome must appear on the left-hand side.
It must be specified when |
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates specified in |
ipcw.model |
A formula describing the inverse probability of censoring weighting (IPCW)
model to be fitted. The left-hand side must be empty. Only applies for survival outcomes.
Default is |
ipcw.method |
A character value for the censoring model. Only applies for survival
outcomes. Allowed values are: |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
followup.time |
A column name in |
tau0 |
The truncation time for defining restricted mean time lost. Only applies for
survival outcomes. Default is |
higher.y |
A logical value indicating whether higher ( |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated log CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
prop.multi |
A vector of numerical values (in '[0, 1]') specifying percentiles of the
estimated log CATE scores to define mutually exclusive subgroups.
It should start with 0, end with 1, and be of |
abc |
A logical value indicating whether the area between curves (ABC) should be calculated
at each cross-validation iteration, for each |
train.prop |
A numerical value (in '(0, 1)') indicating the proportion of total data used
for training. Default is |
cv.n |
A positive integer value indicating the number of cross-validation iterations.
Default is |
error.max |
A numerical value > 0 indicating the tolerance (maximum value of error)
for the largest standardized absolute difference in the covariate distributions or in the
doubly robust estimated rate ratios between the training and validation sets. This is used
to define a balanced training-validation splitting. Default is |
max.iter |
A positive integer value indicating the maximum number of iterations when
searching for a balanced training-validation split. Default is |
surv.min |
Lower truncation limit for the probability of being censored.
It must be a positive value and should be chosen close to 0. Only applies for survival
outcomes. Default is |
xvar.smooth.score |
A vector of characters indicating the name of the variables used as
the smooth terms if |
xvar.smooth.init |
A vector of characters indicating the name of the variables used as
the smooth terms if |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.rf |
A positive integer specifying the maximum number of trees in random forest.
Used if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used if |
B |
A positive integer specifying the number of times cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds used in cross-fitting
to partition the data in |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in the Newton Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
seed |
An optional integer specifying an initial randomization seed for reproducibility.
Default is |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
verbose |
An integer value indicating what kind of intermediate progress messages should
be printed. |
Details
For count response, see details in catecvcount()
.
For survival response, see details in catecvsurv()
.
For continuous response, see details in catecvmean()
.
Value
For count response, see description of outputs in catecvcount()
.
For survival response, see description of outputs in catecvsurv()
.
For continuous response, see description of outputs in catecvmean()
.
References
Yadlowsky, S., Pellegrini, F., Lionetto, F., Braune, S., & Tian, L. (2020). Estimation and validation of ratio-based conditional average treatment effects using observational data. Journal of the American Statistical Association, 1-18. DOI: 10.1080/01621459.2020.1772080.
See Also
catefit()
function and boxplot()
, abc
methods for
"precmed"
objects.
Examples
cate_1 <- catecv(response = "count",
data = countExample,
score.method = "poisson",
cate.model = y ~ age + female + previous_treatment +
previous_cost + previous_number_relapses +
offset(log(years)),
ps.model = trt ~ age + previous_treatment,
higher.y = FALSE, cv.n = 5, seed = 999, verbose = 1)
plot(cate_1, ylab = "Rate ratio of drug1 vs drug0 in each subgroup")
boxplot(cate_1, ylab = "Rate ratio of drug1 vs drug0 in each subgroup")
abc(cate_1)
# Survival outcome
library(survival)
tau0 <- with(survivalExample,
min(quantile(y[trt == "drug1"], 0.95), quantile(y[trt == "drug0"], 0.95)))
cate_2 <- catecv(response = "survival",
data = survivalExample,
score.method = c("poisson", "randomForest"),
cate.model = Surv(y, d) ~ age + female + previous_cost +
previous_number_relapses,
ps.model = trt ~ age + previous_treatment,
initial.predictor.method = "randomForest",
ipcw.model = ~ age + previous_cost + previous_treatment,
tau0 = tau0,
higher.y = TRUE,
surv.min = 0.025,
cv.n = 5,
seed = 999)
plot(cate_2, ylab = "RMTL ratio of drug1 vs drug0 in each subgroup")
boxplot(cate_2, ylab = "RMTL ratio of drug1 vs drug0 in each subgroup")
abc(cate_2)
Cross-validation of the conditional average treatment effect (CATE) score for count outcomes
Description
Provides doubly robust estimation of the average treatment effect (ATE) in
nested and mutually exclusive subgroups of patients defined by an estimated
conditional average treatment effect (CATE) score via cross-validation (CV).
The CATE score can be estimated with up to 5 methods among the following:
Poisson regression, boosting, two regressions, contrast regression, and
negative binomial (see score.method
).
Usage
catecvcount(
data,
score.method,
cate.model,
ps.model,
ps.method = "glm",
initial.predictor.method = "boosting",
minPS = 0.01,
maxPS = 0.99,
higher.y = TRUE,
prop.cutoff = seq(0.5, 1, length = 6),
prop.multi = c(0, 1/3, 2/3, 1),
abc = TRUE,
train.prop = 3/4,
cv.n = 10,
error.max = 0.1,
max.iter = 5000,
xvar.smooth = NULL,
tree.depth = 2,
n.trees.boosting = 200,
B = 3,
Kfold = 5,
error.maxNR = 0.001,
max.iterNR = 150,
tune = c(0.5, 2),
seed = NULL,
plot.gbmperf = TRUE,
verbose = 0,
...
)
Arguments
data |
A data frame containing the variables in the outcome and
propensity score model; a data frame with |
score.method |
A vector of one or multiple methods to estimate the CATE
score. Allowed values are: |
cate.model |
A formula describing the outcome model to be fitted. The outcome must appear on the left-hand side. |
ps.model |
A formula describing the propensity score model to be fitted.
The treatment must appear on the left-hand side. The treatment must be a
numeric vector coded as 0 or 1. If data are from a randomized trial, specify
|
ps.method |
A character value for the method to estimate the propensity
score. Allowed values include one of: |
initial.predictor.method |
A character vector for the method used to get
initial outcome predictions conditional on the covariates in
|
minPS |
A numerical value between 0 and 1 below which estimated
propensity scores should be truncated. Default is |
maxPS |
A numerical value between 0 and 1 above which estimated
propensity scores should be truncated. Must be strictly greater than
|
higher.y |
A logical value indicating whether higher ( |
prop.cutoff |
A vector of numerical values between 0 and 1 specifying
percentiles of the estimated log CATE scores to define nested subgroups. Each
element represents the cutoff to separate observations in nested subgroups
(below vs above cutoff). The length of |
prop.multi |
A vector of numerical values between 0 and 1 specifying
percentiles of the estimated log CATE scores to define mutually exclusive
subgroups. It should start with 0, end with 1, and be of
|
abc |
A logical value indicating whether the area between curves (ABC)
should be calculated at each cross-validation iteration, for each
|
train.prop |
A numerical value between 0 and 1 indicating the proportion
of total data used for training. Default is |
cv.n |
A positive integer value indicating the number of
cross-validation iterations. Default is |
error.max |
A numerical value > 0 indicating the tolerance (maximum
value of error) for the largest standardized absolute difference in the
covariate distributions or in the doubly robust estimated rate ratios between
the training and validation sets. This is used to define a balanced
training-validation splitting. Default is |
max.iter |
A positive integer value indicating the maximum number of
iterations when searching for a balanced training-validation split. Default
is |
xvar.smooth |
A vector of characters indicating the name of the variables used as
the smooth terms if |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used if |
B |
A positive integer specifying the number of times cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds used in cross-fitting
to partition the data in |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in the Newton Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
seed |
An optional integer specifying an initial randomization seed for reproducibility.
Default is |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
verbose |
An integer value indicating what kind of intermediate progress messages should
be printed. |
... |
Additional arguments for |
Details
The CATE score represents an individual-level treatment effect expressed as a rate ratio for count outcomes. It can be estimated with boosting, Poisson regression, negative binomial regression, and the doubly robust estimator two regressions (Yadlowsky, 2020) applied separately by treatment group or with the other doubly robust estimator contrast regression (Yadlowsky, 2020) applied to the entire data set.
Internal CV is applied to reduce optimism in choosing the CATE estimation method that
captures the most treatment effect heterogeneity. The CV is applied by repeating the
following steps cv.n
times:
Split the data into a training and validation set according to
train.prop
. The training and validation sets must be balanced with respect to covariate distributions and doubly robust rate ratio estimates (seeerror.max
).Estimate the CATE score in the training set with the specified scoring method.
Predict the CATE score in the validation set using the scoring model fitted from the training set.
Build nested subgroups of treatment responders in the training and validation sets, separately, and estimate the ATE within each nested subgroup. For each element i of
prop.cutoff
(e.g.,prop.cutoff[i]
= 0.6), take the following steps:Identify high responders as observations with the 60% (i.e.,
prop.cutoff[i]
x100%) highest (ifhigher.y = TRUE
) or lowest (ifhigher.y = FALSE
) estimated CATE scores.Estimate the ATE in the subgroup of high responders using a doubly robust estimator.
Conversely, identify low responders as observations with the 40% (i.e., 1 -
prop.cutoff[i]
x100%) lowest (ifhigher.y
= TRUE) or highest (ifhigher.y
= FALSE) estimated CATE scores.Estimate the ATE in the subgroup of low responders using a doubly robust estimator.
If
abc
= TRUE, calculate the area between the ATE and the series of ATEs in nested subgroups of high responders in the validation set.Build mutually exclusive subgroups of treatment responders in the training and validation sets, separately, and estimate the ATE within each subgroup. Mutually exclusive subgroups are built by splitting the estimated CATE scores according to
prop.multi
.
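To make the nested-subgroup steps above concrete, here is a minimal illustrative sketch (not part of the package API; score is a placeholder vector of estimated CATE scores created only for this example):
# Placeholder CATE scores and example cutoffs (illustration only)
set.seed(1)
score <- rnorm(200)
prop.cutoff <- seq(0.5, 1, length = 6)
higher.y <- FALSE   # lower estimated CATE scores identify high responders
high.responders <- lapply(prop.cutoff, function(p) {
  if (higher.y) {
    which(score >= quantile(score, 1 - p))   # top p x 100% highest scores
  } else {
    which(score <= quantile(score, p))       # bottom p x 100% lowest scores
  }
})
sapply(high.responders, length)   # sizes of the nested subgroups of high responders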
Value
Returns a list containing the following components saved as a "precmed"
object:
ate.poisson
: A list of results output ifscore.method
includes'poisson'
:ate.est.train.high.cv
: A matrix of numerical values withlength(prop.cutoff)
rows andcv.n
columns. The ith row/jth column cell contains the estimated ATE in the nested subgroup of high responders defined by CATE score above (ifhigher.y = TRUE
) or below (ifhigher.y = FALSE
) theprop.cutoff[i]
x100% percentile of the estimated CATE score in the training set in the jth cross-validation iteration.ate.est.train.low.cv
: A matrix of numerical values withlength(prop.cutoff) - 1
rows andcv.n
columns. The ith row/jth column cell contains the estimated ATE in the nested subgroup of low responders defined by CATE score below (ifhigher.y = TRUE
) or above (ifhigher.y = FALSE
) theprop.cutoff[i]
x100% percentile of the estimated CATE score in the training set in the jth cross-validation iteration.ate.est.valid.high.cv
: Same asate.est.train.high.cv
, but in the validation set.ate.est.valid.low.cv
: Same asate.est.train.low.cv
, but in the validation set.ate.est.train.group.cv
: A matrix of numerical values withlength(prop.multi) - 1
rows andcv.n
columns. The jth column contains the estimated ATE inlength(prop.multi) - 1
mutually exclusive subgroups defined byprop.multi
in the training set in jth cross-validation iteration.ate.est.valid.group.cv
: Same asate.est.train.group.cv
, but in the validation set.abc.valid
: A vector of numerical values of lengthcv.n
. The ith element returns the ABC of the validation curve in the ith cross-validation iteration. Only returned ifabc = TRUE
.
ate.boosting
: A list of results similar toate.poisson
output ifscore.method
includes'boosting'
.ate.twoReg
: A list of results similar toate.poisson
output ifscore.method
includes'twoReg'
.ate.contrastReg
: A list of results similar toate.poisson
output ifscore.method
includes'contrastReg'
. This method has an additional element in the list of results:converge.contrastReg.cv
: A vector of logical values of length cv.n
. The ith element indicates whether the algorithm converged in the ith cross-validation iteration.
ate.negBin
: A list of results similar toate.poisson
output ifscore.method
includes'negBin'
.props
: A list of 3 elements:prop.onlyhigh
: The original argumentprop.cutoff
, reformatted as necessary.prop.bi
: The original argumentprop.cutoff
, similar toprop.onlyhigh
but reformatted to exclude 1.prop.multi
: The original argumentprop.multi
, reformatted as necessary to include 0 and 1.
overall.ate.valid
: A vector of numerical values of lengthcv.n
. The ith element contains the ATE in the validation set of the ith cross-validation iteration, estimated with the doubly robust estimator.overall.ate.train
: A vector of numerical values of lengthcv.n
. The ith element contains the ATE in the training set of the ith cross-validation iteration, estimated with the doubly robust estimator.fgam
: The formula used in GAM ifinitial.predictor.method = 'gam'
.higher.y
: The originalhigher.y
argument.abc
: The originalabc
argument.cv.n
: The originalcv.n
argument.response
: The type of response. Always 'count' for this function.formulas
:A list of 3 elements: (1)cate.model
argument, (2)ps.model
argument and (3) original labels of the left-hand side variable inps.model
(treatment) if it was not 0/1.
References
Yadlowsky, S., Pellegrini, F., Lionetto, F., Braune, S., & Tian, L. (2020). Estimation and validation of ratio-based conditional average treatment effects using observational data. Journal of the American Statistical Association, 1-18. DOI: 10.1080/01621459.2020.1772080.
See Also
plot.precmed()
, boxplot.precmed()
,
abc()
methods for "precmed"
objects,
and catefitcount()
function.
Examples
catecv <- catecvcount(data = countExample,
score.method = "poisson",
cate.model = y ~ age + female + previous_treatment +
previous_cost + previous_number_relapses +
offset(log(years)),
ps.model = trt ~ age + previous_treatment,
higher.y = FALSE,
cv.n = 5,
seed = 999,
plot.gbmperf = FALSE,
verbose = 1)
plot(catecv, ylab = "Rate ratio of drug1 vs drug0 in each subgroup")
boxplot(catecv, ylab = "Rate ratio of drug1 vs drug0 in each subgroup")
abc(catecv)
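# A brief sketch of inspecting the returned object (component names as documented
# in the Value section; 'catecv' is the object created above)
catecv$ate.poisson$ate.est.valid.high.cv   # ATEs in nested subgroups of high responders (validation sets)
catecv$ate.poisson$abc.valid               # ABC of the validation curve per CV iteration
catecv$overall.ate.valid                   # overall doubly robust ATE per validation set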
Cross-validation of the conditional average treatment effect (CATE) score for continuous outcomes
Description
Provides doubly robust estimation of the average treatment effect (ATE) in nested and
mutually exclusive subgroups of patients defined by an estimated conditional average
treatment effect (CATE) score via cross-validation (CV). The CATE score can be estimated
with up to 6 methods among the following: Linear regression, boosting, two regressions,
contrast regression, random forest and generalized additive model (see score.method
).
Usage
catecvmean(
data,
score.method,
cate.model,
ps.model,
ps.method = "glm",
init.model = NULL,
initial.predictor.method = "boosting",
minPS = 0.01,
maxPS = 0.99,
higher.y = TRUE,
prop.cutoff = seq(0.5, 1, length = 6),
prop.multi = c(0, 1/3, 2/3, 1),
abc = TRUE,
train.prop = 3/4,
cv.n = 10,
error.max = 0.1,
max.iter = 5000,
xvar.smooth.score = NULL,
xvar.smooth.init = NULL,
tree.depth = 2,
n.trees.rf = 1000,
n.trees.boosting = 200,
B = 3,
Kfold = 6,
plot.gbmperf = TRUE,
error.maxNR = 0.001,
tune = c(0.5, 2),
seed = NULL,
verbose = 0,
...
)
Arguments
data |
A data frame containing the variables in the outcome and propensity score models;
a data frame with |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
cate.model |
A formula describing the outcome model to be fitted. The outcome must appear on the left-hand side. |
ps.model |
A formula describing the propensity score model to be fitted.
The treatment must appear on the left-hand side. The treatment must be a numeric vector
coded as 0/1. If data are from a RCT, specify |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
init.model |
A formula describing the initial predictor model. The outcome must appear on the left-hand side.
It must be specified when |
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates in |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
higher.y |
A logical value indicating whether higher ( |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
prop.multi |
A vector of numerical values (in '[0, 1]') specifying percentiles of the
estimated CATE scores to define mutually exclusive subgroups.
It should start with 0, end with 1, and be of |
abc |
A logical value indicating whether the area between curves (ABC) should be calculated
at each cross-validation iteration, for each |
train.prop |
A numerical value (in '(0, 1)') indicating the proportion of total data used
for training. Default is |
cv.n |
A positive integer value indicating the number of cross-validation iterations.
Default is |
error.max |
A numerical value > 0 indicating the tolerance (maximum value of error)
for the largest standardized absolute difference in the covariate distributions or in the
doubly robust estimated rate ratios between the training and validation sets. This is used
to define a balanced training-validation splitting. Default is |
max.iter |
A positive integer value indicating the maximum number of iterations when
searching for a balanced training-validation split. Default is |
xvar.smooth.score |
A vector of characters indicating the name of the variables used as
the smooth terms if |
xvar.smooth.init |
A vector of characters indicating the name of the variables used as
the smooth terms if |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.rf |
A positive integer specifying the maximum number of trees in random forest.
Used if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used only if |
B |
A positive integer specifying the number of times cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds (parts) used in cross-fitting
to partition the data in |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
seed |
An optional integer specifying an initial randomization seed for reproducibility.
Default is |
verbose |
An integer value indicating what kind of intermediate progress messages should
be printed. |
... |
Additional arguments for |
Details
The CATE score represents an individual-level treatment effect for continuous data. It can be estimated with boosting, linear regression, random forest, generalized additive model, or the doubly robust estimator two regressions (Yadlowsky, 2020), each applied separately by treatment group, or with the other doubly robust estimator, contrast regression (Yadlowsky, 2020), applied to the entire data set.
Internal CV is applied to reduce optimism in choosing the CATE estimation method that
captures the most treatment effect heterogeneity. The CV is applied by repeating the
following steps cv.n
times:
Split the data into a training and validation set according to
train.prop
. The training and validation sets must be balanced with respect to covariate distributions and doubly robust rate ratio estimates (seeerror.max
).Estimate the CATE score in the training set with the specified scoring method.
Predict the CATE score in the validation set using the scoring model fitted from the training set.
Build nested subgroups of treatment responders in the training and validation sets, separately, and estimate the ATE within each nested subgroup. For each element i of
prop.cutoff
(e.g.,prop.cutoff[i]
= 0.6), take the following steps:Identify high responders as observations with the 60% (i.e.,
prop.cutoff[i]
x100%) highest (ifhigher.y = TRUE
) or lowest (ifhigher.y = FALSE
) estimated CATE scores.Estimate the ATE in the subgroup of high responders using a doubly robust estimator.
Conversely, identify low responders as observations with the 40% (i.e., 1 -
prop.cutoff[i]
x100%) lowest (ifhigher.y
= TRUE) or highest (ifhigher.y
= FALSE) estimated CATE scores.Estimate the ATE in the subgroup of low responders using a doubly robust estimator.
Build mutually exclusive subgroups of treatment responders in the training and validation sets, separately, and estimate the ATE within each subgroup. Mutually exclusive subgroups are built by splitting the estimated CATE scores according to
prop.multi
.If
abc
= TRUE, calculate the area between the ATE and the series of ATEs in nested subgroups of high responders in the validation set.
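As an illustration of how mutually exclusive subgroups can be formed from prop.multi (a sketch only, not package code; score is a placeholder vector of estimated CATE scores):
# Placeholder CATE scores cut at the quantiles given by prop.multi (illustration only)
set.seed(1)
score <- rnorm(200)
prop.multi <- c(0, 1/3, 2/3, 1)   # default of catecvmean()
subgroup <- cut(score,
                breaks = quantile(score, probs = prop.multi),
                include.lowest = TRUE,
                labels = paste0("G", seq_len(length(prop.multi) - 1)))
table(subgroup)   # length(prop.multi) - 1 mutually exclusive subgroups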
Value
Returns a list containing the following components saved as a "precmed"
object:
ate.gaussian
: A list of results output ifscore.method
includes'gaussian'
:ate.est.train.high.cv
: A matrix of numerical values withlength(prop.cutoff)
rows andcv.n
columns. The ith column/jth row cell contains the estimated ATE in the nested subgroup of high responders defined by CATE score above (ifhigher.y = TRUE
) or below (ifhigher.y = FALSE
) theprop.cutoff[j]
x100% percentile of the estimated CATE score in the training set in the ith cross-validation iteration.ate.est.train.low.cv
: A matrix of numerical values withlength(prop.cutoff) - 1
rows andcv.n
columns. The ith column/jth row cell contains the estimated ATE in the nested subgroup of low responders defined by CATE score below (ifhigher.y = TRUE
) or above (ifhigher.y = FALSE
) theprop.cutoff[j]
x100% percentile of the estimated CATE score in the training set in the ith cross-validation iteration.ate.est.valid.high.cv
: Same asate.est.train.high.cv
, but in the validation set.ate.est.valid.low.cv
: Same asate.est.train.low.cv
, but in the validation set.ate.est.train.group.cv
: A matrix of numerical values withlength(prop.multi) - 1
rows andcv.n
columns. The ith column contains the estimated ATE inlength(prop.multi) - 1
mutually exclusive subgroups defined byprop.multi
in the training set in ith cross-validation iteration.ate.est.valid.group.cv
: Same asate.est.train.group.cv
, but in the validation set.abc.valid
: A vector of numerical values of lengthcv.n
. The ith element returns the ABC of the validation curve in the ith cross-validation iteration. Only returned if abc = TRUE
.
ate.boosting
: A list of results similar toate.gaussian
output ifscore.method
includes'boosting'
.ate.twoReg
: A list of results similar toate.gaussian
output ifscore.method
includes'twoReg'
.ate.contrastReg
: A list of results similar toate.gaussian
output ifscore.method
includes'contrastReg'
.ate.randomForest
: A list of results similar to ate.gaussian output if score.method includes 'randomForest'.
ate.gam
: A list of results similar toate.gaussian
output ifscore.method
includes'gam'
.props
: A list of 3 elements:prop.onlyhigh
: The original argumentprop.cutoff
, reformatted as necessary.prop.bi
: The original argumentprop.cutoff
, similar toprop.onlyhigh
but reformatted to exclude 1.prop.multi
: The original argumentprop.multi
, reformatted as necessary.
overall.ate.train
: A vector of numerical values of lengthcv.n
. The ith element contains the ATE in the training set of the ith cross-validation iteration, estimated with the doubly robust estimator.overall.ate.valid
: A vector of numerical values of lengthcv.n
. The ith element contains the ATE in the validation set of the ith cross-validation iteration, estimated with the doubly robust estimator.higher.y
: The originalhigher.y
argument.abc
: The originalabc
argument.cv.n
: The originalcv.n
argument.response
: The type of response. Always 'continuous' for this function.formulas
:A list of 3 elements: (1)cate.model
argument, (2)ps.model
argument and (3) original labels of the left-hand side variable inps.model
(treatment) if it was not 0/1.
References
Yadlowsky, S., Pellegrini, F., Lionetto, F., Braune, S., & Tian, L. (2020). Estimation and validation of ratio-based conditional average treatment effects using observational data. Journal of the American Statistical Association, 1-18. DOI: 10.1080/01621459.2020.1772080.
See Also
plot.precmed()
, boxplot.precmed()
, abc()
methods for "precmed"
objects,
and catefitmean()
function.
Examples
# Not implemented yet!
Cross-validation of the conditional average treatment effect (CATE) score for survival outcomes
Description
Provides doubly robust estimation of the average treatment effect (ATE) by the
RMTL (restricted mean time lost) ratio in nested and mutually exclusive subgroups of patients
defined by an estimated conditional average treatment effect (CATE) score via
cross-validation (CV). The CATE score can be estimated with up to 5 methods among the following:
Random forest, boosting, poisson regression, two regressions, and contrast regression
(see score.method
).
Usage
catecvsurv(
data,
score.method,
cate.model,
ps.model,
ps.method = "glm",
initial.predictor.method = "randomForest",
ipcw.model = NULL,
ipcw.method = "breslow",
minPS = 0.01,
maxPS = 0.99,
followup.time = NULL,
tau0 = NULL,
higher.y = TRUE,
prop.cutoff = seq(0.5, 1, length = 6),
prop.multi = c(0, 1/3, 2/3, 1),
abc = TRUE,
train.prop = 3/4,
cv.n = 10,
error.max = 0.1,
max.iter = 5000,
surv.min = 0.025,
tree.depth = 2,
n.trees.rf = 1000,
n.trees.boosting = 200,
B = 3,
Kfold = 5,
error.maxNR = 0.001,
max.iterNR = 150,
tune = c(0.5, 2),
seed = NULL,
plot.gbmperf = TRUE,
verbose = 0
)
Arguments
data |
A data frame containing the variables in the outcome, propensity score, and inverse
probability of censoring models (if specified); a data frame with |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
cate.model |
A standard |
ps.model |
A formula describing the propensity score (PS) model to be fitted. The treatment must
appear on the left-hand side. The treatment must be a numeric vector coded as 0/1.
If data are from a randomized controlled trial, specify |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates specified in |
ipcw.model |
A formula describing the inverse probability of censoring weighting (IPCW)
model to be fitted. The left-hand side must be empty. Default is |
ipcw.method |
A character value for the censoring model. Allowed values are:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
followup.time |
A column name in |
tau0 |
The truncation time for defining restricted mean time lost. Default is |
higher.y |
A logical value indicating whether higher ( |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated log CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
prop.multi |
A vector of numerical values (in '[0, 1]') specifying percentiles of the
estimated log CATE scores to define mutually exclusive subgroups.
It should start with 0, end with 1, and be of |
abc |
A logical value indicating whether the area between curves (ABC) should be calculated
at each cross-validation iteration, for each |
train.prop |
A numerical value (in '(0, 1)') indicating the proportion of total data used
for training. Default is |
cv.n |
A positive integer value indicating the number of cross-validation iterations.
Default is |
error.max |
A numerical value > 0 indicating the tolerance (maximum value of error)
for the largest standardized absolute difference in the covariate distributions or in the
doubly robust estimated rate ratios between the training and validation sets. This is used
to define a balanced training-validation splitting. Default is |
max.iter |
A positive integer value indicating the maximum number of iterations when
searching for a balanced training-validation split. Default is |
surv.min |
Lower truncation limit for the probability of being censored.
It must be a positive value and should be chosen close to 0. Default is |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.rf |
A positive integer specifying the maximum number of trees in random forest.
Used if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used if |
B |
A positive integer specifying the number of times cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds used in cross-fitting
to partition the data in |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in Newton Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
seed |
An optional integer specifying an initial randomization seed for reproducibility.
Default is |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
verbose |
An integer value indicating what kind of intermediate progress messages should
be printed. |
Details
The CATE score represents an individual-level treatment effect expressed as the restricted mean time lost (RMTL) ratio for survival outcomes. It can be estimated with boosting, Poisson regression, random forest, or the doubly robust estimator two regressions (Yadlowsky, 2020), each applied separately by treatment group, or with the other doubly robust estimator, contrast regression (Yadlowsky, 2020), applied to the entire data set.
Internal CV is applied to reduce optimism in choosing the CATE estimation method that
captures the most treatment effect heterogeneity. The CV is applied by repeating the
following steps cv.n
times:
Split the data into a training and validation set according to
train.prop
. The training and validation sets must be balanced with respect to covariate distributions and doubly robust RMTL ratio estimates (seeerror.max
).Estimate the CATE score in the training set with the specified scoring method.
Predict the CATE score in the validation set using the scoring model fitted from the training set.
Build nested subgroups of treatment responders in the training and validation sets, separately, and estimate the ATE within each nested subgroup. For each element i of
prop.cutoff
(e.g.,prop.cutoff[i]
= 0.6), take the following steps:Identify high responders as observations with the 60% (i.e.,
prop.cutoff[i]
x100%) highest (ifhigher.y = FALSE
) or lowest (ifhigher.y = TRUE
) estimated CATE scores.Estimate the ATE in the subgroup of high responders using a doubly robust estimator.
Conversely, identify low responders as observations with the 40% (i.e., 1 -
prop.cutoff[i]
x100%) lowest (ifhigher.y
= FALSE) or highest (ifhigher.y
= TRUE) estimated CATE scores.Estimate the ATE in the subgroup of low responders using a doubly robust estimator.
If
abc
= TRUE, calculate the area between the ATE and the series of ATEs in nested subgroups of high responders in the validation set.Build mutually exclusive subgroups of treatment responders in the training and validation sets, separately, and estimate the ATE within each subgroup. Mutually exclusive subgroups are built by splitting the estimated CATE scores according to
prop.multi
.
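The RMTL truncated at tau0 equals tau0 minus the restricted mean survival time (RMST) truncated at tau0. As a rough, unadjusted illustration of the RMTL ratio estimand only (the package itself uses doubly robust, covariate-adjusted estimators), crude per-arm RMTLs can be computed with survival::survfit, assuming trt is coded "drug0"/"drug1" as in survivalExample:
library(survival)
tau0 <- with(survivalExample,
             min(quantile(y[trt == "drug1"], 0.95), quantile(y[trt == "drug0"], 0.95)))
km  <- survfit(Surv(y, d) ~ trt, data = survivalExample)
tbl <- summary(km, rmean = tau0)$table
rmst <- tbl[, grep("rmean", colnames(tbl), fixed = TRUE)[1]]   # RMST by arm, truncated at tau0
rmtl <- tau0 - rmst                                            # restricted mean time lost by arm
unname(rmtl[2] / rmtl[1])                                      # crude RMTL ratio, drug1 vs drug0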
Value
Returns a list containing the following components saved as a "precmed"
object:
ate.randomForest
: A list of ATE output measured by the RMTL ratio ifscore.method
includes'randomForest'
:ate.est.train.high.cv
: A matrix of numerical values withlength(prop.cutoff)
rows andcv.n
columns. The ith column/jth row cell contains the estimated ATE in the nested subgroup of high responders defined by CATE score above (ifhigher.y = FALSE
) or below (ifhigher.y = TRUE
) theprop.cutoff[j]
x100% percentile of the estimated CATE score in the training set in the ith cross-validation iteration.ate.est.train.low.cv
: A matrix of numerical values withlength(prop.cutoff) - 1
rows andcv.n
columns. The ith column/jth row cell contains the estimated ATE in the nested subgroup of low responders defined by CATE score below (if higher.y = FALSE
) or above (ifhigher.y = TRUE
) theprop.cutoff[j]
x100% percentile of the estimated CATE score in the training set in the ith cross-validation iteration.ate.est.valid.high.cv
: Same asate.est.train.high.cv
, but in the validation set.ate.est.valid.low.cv
: Same asate.est.train.low.cv
, but in the validation set.ate.est.train.group.cv
: A matrix of numerical values withlength(prop.multi) - 1
rows andcv.n
columns. The ith column contains the estimated ATE inlength(prop.multi) - 1
mutually exclusive subgroups defined byprop.multi
in the training set in ith cross-validation iteration.ate.est.valid.group.cv
: Same asate.est.train.group.cv
, but in the validation set.abc.valid
: A vector of numerical values of lengthcv.n
. The ith element returns the ABC of the validation curve in the ith cross-validation iteration. Only returned if abc = TRUE
.
ate.boosting
: A list of results similar toate.randomForest
output ifscore.method
includes'boosting'
.ate.poisson
: A list of results similar toate.randomForest
output ifscore.method
includes'poisson'
.ate.twoReg
: A list of results similar toate.randomForest
output ifscore.method
includes'twoReg'
.ate.contrastReg
: A list of results similar toate.randomForest
output ifscore.method
includes'contrastReg'
. This method has an additional element in the list of results:converge.contrastReg.cv
: A vector of logical values of length cv.n
. The ith element indicates whether the algorithm converged in the ith cross-validation iteration.
hr.randomForest
: A list of adjusted hazard ratio ifscore.method
includes'randomForest'
:hr.est.train.high.cv
: A matrix of numerical values withlength(prop.cutoff)
rows andcv.n
columns. The ith column/jth row cell contains the estimated HR in the nested subgroup of high responders defined by CATE score above (ifhigher.y = FALSE
) or below (ifhigher.y = TRUE
) theprop.cutoff[j]
x100% percentile of the estimated CATE score in the training set in the ith cross-validation iteration.hr.est.train.low.cv
: A matrix of numerical values withlength(prop.cutoff) - 1
rows andcv.n
columns. The ith column/jth row cell contains the estimated HR in the nested subgroup of low responders defined by CATE score below (if higher.y = FALSE
) or above (ifhigher.y = TRUE
) theprop.cutoff[j]
x100% percentile of the estimated CATE score in the training set in the ith cross-validation iteration.hr.est.valid.high.cv
: Same ashr.est.train.high.cv
, but in the validation set.hr.est.valid.low.cv
: Same ashr.est.train.low.cv
, but in the validation set.hr.est.train.group.cv
: A matrix of numerical values withlength(prop.multi) - 1
rows andcv.n
columns. The ith column contains the estimated HR inlength(prop.multi) - 1
mutually exclusive subgroups defined byprop.multi
in the training set in ith cross-validation iteration.hr.est.valid.group.cv
: Same ashr.est.train.group.cv
, but in the validation set.
hr.boosting
: A list of results similar tohr.randomForest
output ifscore.method
includes'boosting'
.hr.poisson
: A list of results similar tohr.randomForest
output ifscore.method
includes'poisson'
.hr.twoReg
: A list of results similar tohr.randomForest
output ifscore.method
includes'twoReg'
.hr.contrastReg
: A list of results similar tohr.randomForest
output ifscore.method
includes'contrastReg'
.props
: A list of 3 elements:prop.onlyhigh
: The original argumentprop.cutoff
, reformatted as necessary.prop.bi
: The original argumentprop.cutoff
, similar toprop.onlyhigh
but reformatted to exclude 1.prop.multi
: The original argumentprop.multi
, reformatted as necessary to include 0 and 1.
overall.ate.train
: A vector of numerical values of lengthcv.n
. The ith element contains the ATE (RMTL ratio) in the training set of the ith cross-validation iteration, estimated with the doubly robust estimator.overall.hr.train
: A vector of numerical values of lengthcv.n
. The ith element contains the ATE (HR) in the training set of the ith cross-validation iteration.overall.ate.valid
: A vector of numerical values of lengthcv.n
. The ith element contains the ATE (RMTL ratio) in the validation set of the ith cross-validation iteration, estimated with the doubly robust estimator.overall.hr.valid
: A vector of numerical values of lengthcv.n
. The ith element contains the ATE (HR) in the validation set of the ith cross-validation iteration.errors/warnings
: A nested list of errors and warnings that were wrapped during the calculation of ATE. Errors and warnings are organized byscore.method
and position in the CV flow.higher.y
: The originalhigher.y
argument.abc
: The originalabc
argument.cv.n
: The originalcv.n
argument.response
: The type of response. Always 'survival' for this function.formulas
:A list of 3 elements: (1)cate.model
argument, (2)ps.model
argument and (3) original labels of the left-hand side variable inps.model
(treatment) if it was not 0/1.
References
Yadlowsky, S., Pellegrini, F., Lionetto, F., Braune, S., & Tian, L. (2020). Estimation and validation of ratio-based conditional average treatment effects using observational data. Journal of the American Statistical Association, 1-18. DOI: 10.1080/01621459.2020.1772080.
See Also
catefitsurv()
function and boxplot()
, abc()
methods for
"precmed"
objects.
Examples
library(survival)
tau0 <- with(survivalExample,
min(quantile(y[trt == "drug1"], 0.95), quantile(y[trt == "drug0"], 0.95)))
catecv <- catecvsurv(data = survivalExample,
score.method = "poisson",
cate.model = Surv(y, d) ~ age + female + previous_cost +
previous_number_relapses,
ps.model = trt ~ age + previous_treatment,
initial.predictor.method = "logistic",
ipcw.model = ~ age + previous_cost + previous_treatment,
tau0 = tau0,
higher.y = TRUE,
cv.n = 5, seed = 999, verbose = 1)
# Try:
plot(catecv, ylab = "RMTL ratio of drug1 vs drug0 in each subgroup")
boxplot(catecv, ylab = "RMTL ratio of drug1 vs drug0 in each subgroup")
abc(catecv)
Estimation of the conditional average treatment effect (CATE) score for count, survival and continuous data
Description
Provides singly robust and doubly robust estimation of the CATE score for count, survival, and continuous data with the following scoring methods: Random forest (survival and continuous only), boosting, Poisson regression (count and survival only), two regressions, contrast regression, negative binomial regression (count only), linear regression (continuous only), and generalized additive model (continuous only).
Usage
catefit(
response,
data,
score.method,
cate.model,
ps.model,
ps.method = "glm",
init.model = NULL,
initial.predictor.method = NULL,
ipcw.model = NULL,
ipcw.method = "breslow",
minPS = 0.01,
maxPS = 0.99,
followup.time = NULL,
tau0 = NULL,
higher.y = TRUE,
prop.cutoff = seq(0.5, 1, length = 6),
surv.min = 0.025,
xvar.smooth.score = NULL,
xvar.smooth.init = NULL,
tree.depth = 2,
n.trees.rf = 1000,
n.trees.boosting = 200,
B = 3,
Kfold = 5,
error.maxNR = 0.001,
max.iterNR = 150,
tune = c(0.5, 2),
seed = NULL,
plot.gbmperf = TRUE,
verbose = 0,
...
)
Arguments
response |
A string describing the type of outcome in the data. Allowed values include
"count" (see |
data |
A data frame containing the variables in the outcome, propensity score, and inverse
probability of censoring models (if specified); a data frame with |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
cate.model |
A formula describing the outcome model to be fitted.
The outcome must appear on the left-hand side. For survival outcomes, a |
ps.model |
A formula describing the propensity score (PS) model to be fitted. The treatment must
appear on the left-hand side. The treatment must be a numeric vector coded as 0/1.
If data are from a randomized controlled trial, specify |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
init.model |
A formula describing the initial predictor model. The outcome must appear on the left-hand side.
It must be specified when |
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates specified in |
ipcw.model |
A formula describing the inverse probability of censoring weighting (IPCW)
model to be fitted. The left-hand side must be empty. Only applies for survival outcomes.
Default is |
ipcw.method |
A character value for the censoring model. Only applies for survival
outcomes. Allowed values are: |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
followup.time |
A column name in |
tau0 |
The truncation time for defining restricted mean time lost. Only applies for
survival outcomes. Default is |
higher.y |
A logical value indicating whether higher ( |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated log CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
surv.min |
Lower truncation limit for the probability of being censored.
It must be a positive value and should be chosen close to 0. Only applies for survival
outcomes. Default is |
xvar.smooth.score |
A vector of characters indicating the name of the variables used as
the smooth terms if |
xvar.smooth.init |
A vector of characters indicating the name of the variables used as
the smooth terms if |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.rf |
A positive integer specifying the maximum number of trees in random forest.
Used if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used if |
B |
A positive integer specifying the number of times cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds used in cross-fitting
to partition the data in |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in Newton Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
seed |
An optional integer specifying an initial randomization seed for reproducibility.
Default is |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
verbose |
An integer value indicating whether intermediate progress messages and histograms should
be printed. |
... |
Additional arguments for |
Details
For count response, see details in catefitcount()
.
For survival response, see details in catefitsurv()
.
Value
For count response, see description of outputs in catefitcount()
.
For survival response, see description of outputs in catefitsurv()
.
References
Yadlowsky, S., Pellegrini, F., Lionetto, F., Braune, S., & Tian, L. (2020). Estimation and validation of ratio-based conditional average treatment effects using observational data. Journal of the American Statistical Association, 1-18. DOI: 10.1080/01621459.2020.1772080.
See Also
catecv()
Examples
# Count outcome
fit_1 <- catefit(response = "count",
data = countExample,
score.method = "poisson",
cate.model = y ~ age + female + previous_treatment +
previous_cost + previous_number_relapses +
offset(log(years)),
ps.model = trt ~ age + previous_treatment,
higher.y = TRUE,
seed = 999)
coef(fit_1)
# Survival outcome
library(survival)
tau0 <- with(survivalExample,
min(quantile(y[trt == "drug1"], 0.95), quantile(y[trt == "drug0"], 0.95)))
fit_2 <- catefit(response = "survival",
data = survivalExample,
score.method = c("poisson", "boosting", "randomForest"),
cate.model = Surv(y, d) ~ age + female + previous_cost +
previous_number_relapses,
ps.model = trt ~ age + previous_treatment,
initial.predictor.method = "logistic",
ipcw.model = ~ age + previous_cost + previous_treatment,
tau0 = tau0, higher.y = TRUE, seed = 999, n.cores = 1)
coef(fit_2)
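# A brief sketch of inspecting the survival fit (component names as documented
# in the catefitsurv() Value section; 'fit_2' is the object created above)
head(fit_2$score.randomForest)   # estimated log-CATE scores
fit_2$ate.randomForest           # ATE (RMTL ratio) in nested subgroups of high responders
fit_2$hr.randomForest            # adjusted hazard ratios in the same nested subgroups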
Estimation of the conditional average treatment effect (CATE) score for count data
Description
Provides singly robust and doubly robust estimation of CATE score with up to 5 scoring methods among the following: Poisson regression, boosting, two regressions, contrast regression, and negative binomial.
Usage
catefitcount(
data,
score.method,
cate.model,
ps.model,
ps.method = "glm",
initial.predictor.method = "boosting",
minPS = 0.01,
maxPS = 0.99,
higher.y = TRUE,
prop.cutoff = seq(0.5, 1, length = 6),
xvar.smooth = NULL,
tree.depth = 2,
n.trees.boosting = 200,
B = 3,
Kfold = 5,
error.maxNR = 0.001,
max.iterNR = 150,
tune = c(0.5, 2),
seed = NULL,
plot.gbmperf = FALSE,
verbose = 0,
...
)
Arguments
data |
A data frame containing the variables in the outcome and propensity score model; a data frame with |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
cate.model |
A formula describing the outcome model to be fitted. The outcome must appear on the left-hand side. |
ps.model |
A formula describing the propensity score model to be fitted.
The treatment must appear on the left-hand side. The treatment must be a numeric vector
coded as 0/1. If data are from a randomized trial, specify |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates in |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
higher.y |
A logical value indicating whether higher ( |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated log CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
xvar.smooth |
A vector of characters indicating the name of the variables used as
the smooth terms if |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used if |
B |
A positive integer specifying the number of times cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds used in cross-fitting
to partition the data in |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in Newton Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
seed |
An optional integer specifying an initial randomization seed for reproducibility.
Default is |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
verbose |
An integer value indicating what kind of intermediate progress messages should
be printed. |
... |
Additional arguments for |
Details
The CATE score represents an individual-level treatment effect, estimated with Poisson regression, boosting, or negative binomial regression applied separately by treatment group, or with two doubly robust estimators, two regressions and contrast regression (Yadlowsky, 2020), applied to the entire data set.
catefitcount()
provides the coefficients of the CATE score for each scoring method requested
through score.method
. Currently, contrast regression is the only method which allows
for inference of the CATE coefficients by providing standard errors of the coefficients.
The coefficients can be used to learn the effect size of each variable and predict the
CATE score for a new observation.
catefitcount()
also provides the predicted CATE score of each observation in the data set,
for each scoring method. The predictions allow ranking the observations from potentially
high responders to the treatment to potentially low or standard responders.
The estimated ATEs among nested subgroups of high responders are also provided for each scoring method.
Note that the ATEs in catefitcount()
are derived based on the CATE score which is estimated
using the same data sample. Therefore, overfitting may be an issue. catecvcount()
is more
suitable to inspect the estimated ATEs across scoring methods as it implements internal cross
validation to reduce optimism.
Value
Returns a list containing the following components:
ate.poisson
: A vector of numerical values of lengthprop.cutoff
containing the estimated ATE in nested subgroups (defined byprop.cutoff
) constructed based on the estimated CATE scores with poisson regression. Only provided ifscore.method
includes'poisson'
.ate.boosting
: Same asate.poisson
, but with the nested subgroups based on the estimated CATE scores with boosting. Only provided if score.method
includes'boosting'
.ate.twoReg
: Same asate.poisson
, but with the nested subgroups based on the estimated CATE scores with two regressions. Only provided if score.method
includes'twoReg'
.ate.contrastReg
: Same asate.poisson
, but with the nested subgroups based on the estimated CATE scores with contrast regression. Only provided if score.method
includes'contrastReg'
.ate.negBin
: Same asate.poisson
, but with the nested subgroups based on the estimated CATE scores with negative binomial regression. Only provided if score.method
includes'negBin'
.score.poisson
: A vector of numerical values of length n (number of observations indata
) containing the estimated log-CATE scores according to the Poisson regression. Only provided ifscore.method
includes'poisson'
.score.boosting
: Same asscore.poisson
, but with estimated log-CATE score according to boosting. Only provided ifscore.method
includes'boosting'
.score.twoReg
: Same asscore.poisson
, but with estimated log-CATE score according to two regressions. Only provided ifscore.method
includes'twoReg'
.score.contrastReg
: Same asscore.poisson
, but with estimated log-CATE score according to contrast regression. Only provided ifscore.method
includes'contrastReg'
.score.negBin
: Same asscore.poisson
, but with estimated log-CATE score according to negative binomial regression. Only provided ifscore.method
includes'negBin'
.fit
: Additional details on model fitting ifscore.method
includes 'boosting' or 'contrastReg':result.boosting
: Details on the boosting model fitted to observations with treatment = 0($fit0.boosting)
and to observations with treatment = 1($fit1.boosting)
. Only provided ifscore.method
includes'boosting'
.result.contrastReg$sigma.contrastReg
: Variance-covariance matrix of the estimated log-CATE coefficients in contrast regression. Only provided ifscore.method
includes'contrastReg'
.
coefficients
: A data frame with the coefficients of the estimated log-CATE score byscore.method
. The data frame has number of rows equal to the number of covariates incate.model
and number of columns equal tolength(score.method)
. Ifscore.method
includes'contrastReg'
, the data frame has an additional column containing the standard errors of the coefficients estimated with contrast regression.'boosting'
does not have coefficient results because tree-based methods typically do not express the log-CATE as a linear combination of coefficients and covariates.
References
Yadlowsky, S., Pellegrini, F., Lionetto, F., Braune, S., & Tian, L. (2020). Estimation and validation of ratio-based conditional average treatment effects using observational data. Journal of the American Statistical Association, 1-18. DOI: 10.1080/01621459.2020.1772080.
See Also
Examples
fit <- catefitcount(data = countExample,
score.method = "poisson",
cate.model = y ~ age + female + previous_treatment +
previous_cost + previous_number_relapses +
offset(log(years)),
ps.model = trt ~ age + previous_treatment,
higher.y = FALSE,
seed = 999, verbose = 1)
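# A brief sketch of inspecting the fitted object (component names as documented
# in the Value section; 'fit' is the object created above)
fit$coefficients                  # coefficients of the estimated log-CATE score
head(fit$score.poisson)           # estimated log-CATE score per observation
fit$ate.poisson                   # doubly robust ATEs in nested subgroups of high responders
# Rank observations from potential high to low responders
# (lower scores are preferable here because higher.y = FALSE)
head(countExample[order(fit$score.poisson), ])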
Estimation of the conditional average treatment effect (CATE) score for continuous data
Description
Provides singly robust and doubly robust estimation of CATE score with up to 6 scoring methods among the following: Linear regression, boosting, two regressions, contrast regression, random forest and generalized additive model.
Usage
catefitmean(
data,
score.method,
cate.model,
ps.model,
ps.method = "glm",
init.model = NULL,
initial.predictor.method = "boosting",
minPS = 0.01,
maxPS = 0.99,
higher.y = TRUE,
prop.cutoff = seq(0.5, 1, length = 6),
xvar.smooth.score = NULL,
xvar.smooth.init = NULL,
tree.depth = 2,
n.trees.rf = 1000,
n.trees.boosting = 200,
B = 3,
Kfold = 6,
plot.gbmperf = FALSE,
error.maxNR = 0.001,
tune = c(0.5, 2),
seed = NULL,
verbose = 0,
...
)
Arguments
data |
A data frame containing the variables in the outcome and propensity score models;
a data frame with |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
cate.model |
A formula describing the outcome model to be fitted. The outcome must appear on the left-hand side. |
ps.model |
A formula describing the propensity score model to be fitted.
The treatment must appear on the left-hand side.
The treatment must be a numeric vector coded as 0/1.
If data are from a RCT, specify |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of: |
init.model |
A formula describing the initial predictor model. The outcome must appear on the left-hand side.
It must be specified when |
initial.predictor.method |
A character vector for the method used to get initial outcome
predictions conditional on the covariates in |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
higher.y |
A logical value indicating whether higher ( |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of
the estimated log CATE scores to define nested subgroups. Each element represents the
cutoff to separate observations in nested subgroups (below vs above cutoff).
The length of |
xvar.smooth.score |
A vector of characters indicating the name of the variables used as the
smooth terms if |
xvar.smooth.init |
A vector of characters indicating the name of the variables used as the
smooth terms if |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.rf |
A positive integer specifying the number of trees. Used only if
|
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used only if |
B |
A positive integer specifying the number of times cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds (parts) used in cross-fitting
to partition the data in |
plot.gbmperf |
A logical value indicating whether to plot the performance measures
in boosting. Used only if |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
seed |
An optional integer specifying an initial randomization seed for reproducibility.
Default is |
verbose |
An integer value indicating what kind of intermediate progress messages should
be printed. |
... |
Additional arguments for |
Details
The CATE score represents an individual-level treatment effect, estimated with linear regression, boosting, random forest, or generalized additive model applied separately by treatment group, or with two doubly robust estimators, two regressions and contrast regression (Yadlowsky, 2020), applied to the entire data set.
catefitmean()
provides the coefficients of the CATE score for each scoring method requested
through score.method
. Currently, contrast regression is the only method which allows
for inference of the CATE coefficients by providing standard errors of the coefficients.
The coefficients can be used to learn the effect size of each variable and predict the
CATE score for a new observation.
catefitmean()
also provides the predicted CATE score of each observation in the data set,
for each scoring method. The predictions allow ranking the observations from potentially
high responders to the treatment to potentially low or standard responders.
The estimated ATEs among nested subgroups of high responders are also provided for each scoring method.
Note that the ATEs in catefitmean()
are derived based on the CATE score which is estimated
using the same data sample. Therefore, overfitting may be an issue. catecvmean()
is more
suitable to inspect the estimated ATEs across scoring methods as it implements internal cross
validation to reduce optimism.
Value
Returns a list containing the following components:
ate.gaussian
: A vector of numerical values of lengthprop.cutoff
containing the estimated ATE in nested subgroups (defined byprop.cutoff
) constructed based on the estimated CATE scores with linear regression. Only provided if score.method
includes'gaussian'
.ate.boosting
: Same asate.gaussian
, but with the nested subgroups based on the estimated CATE scores with boosting. Only provided if score.method
includes'boosting'
.ate.twoReg
: Same asate.gaussian
, but with the nested subgroups based on the estimated CATE scores with two regressions. Only provided if score.method
includes'twoReg'
.ate.contrastReg
: Same asate.gaussian
, but with the nested subgroups based on the estimated CATE scores with contrast regression. Only provided if score.method
includes'contrastReg'
.ate.randomForest
: Same asate.gaussian
, but with the nested subgroups based on the estimated CATE scores with random forest. Only provided if score.method
includes 'randomForest'
.ate.gam
: Same asate.gaussian
, but with the nested subgroups based on the estimated CATE scores with generalized additive model. Only provided if score.method
includes'gam'
.score.gaussian
: A vector of numerical values of length n (number of observations indata
) containing the estimated CATE scores according to the linear regression. Only provided ifscore.method
includes'gaussian'
.score.boosting
: Same asscore.gaussian
, but with estimated CATE score according to boosting. Only provided ifscore.method
includes'boosting'
.score.twoReg
: Same asscore.gaussian
, but with estimated CATE score according to two regressions. Only provided ifscore.method
includes'twoReg'
.score.contrastReg
: Same asscore.gaussian
, but with estimated CATE score according to contrast regression. Only provided ifscore.method
includes'contrastReg'
.score.randomForest
: Same asscore.gaussian
, but with estimated CATE score according to random forest. Only provided ifscore.method
includes'randomForest'
.score.gam
: Same asscore.gaussian
, but with estimated CATE score according to generalized additive model. Only provided ifscore.method
includes'gam'
.fit
: Additional details on model fitting ifscore.method
includes 'boosting' or 'contrastReg':result.boosting
: Details on the boosting model fitted to observations with treatment = 0($fit0.boosting)
and to observations with treatment = 1($fit1.boosting)
. Only provided ifscore.method
includes'boosting'
.result.randomForest
: Details on the random forest model fitted to observations with treatment = 0 ($fit0.randomForest)
and to observations with treatment = 1($fit1.randomForest)
. Only provided ifscore.method
includes'randomForest'
.result.gam
: Details on the generalized additive model fitted to observations with treatment = 0 ($fit0.gam)
and to observations with treatment = 1($fit1.gam)
. Only provided ifscore.method
includes'gam'
.result.contrastReg$sigma.contrastReg
: Variance-covariance matrix of the estimated CATE coefficients in contrast regression. Only provided ifscore.method
includes'contrastReg'
.
coefficients
: A data frame with the coefficients of the estimated CATE score byscore.method
. The data frame has number of rows equal to the number of covariates incate.model
and number of columns equal tolength(score.method)
. Ifscore.method
includes'contrastReg'
, the data frame has an additional column containing the standard errors of the coefficients estimated with contrast regression.'boosting'
,'randomForest'
,'gam'
do not have coefficient results because these methods do not express the CATE as a linear combination of coefficients and covariates.
References
Yadlowsky, S., Pellegrini, F., Lionetto, F., Braune, S., & Tian, L. (2020). Estimation and validation of ratio-based conditional average treatment effects using observational data. Journal of the American Statistical Association, 1-18. DOI: 10.1080/01621459.2020.1772080.
See Also
catecvmean()
function
Estimation of the conditional average treatment effect (CATE) score for survival data
Description
Provides singly robust and doubly robust estimation of CATE score for survival data with up to 5 scoring methods among the following: Random forest, boosting, poisson regression, two regressions, and contrast regression.
Usage
catefitsurv(
data,
score.method,
cate.model,
ps.model,
ps.method = "glm",
initial.predictor.method = "randomForest",
ipcw.model = NULL,
ipcw.method = "breslow",
minPS = 0.01,
maxPS = 0.99,
followup.time = NULL,
tau0 = NULL,
higher.y = TRUE,
prop.cutoff = seq(0.5, 1, length = 6),
surv.min = 0.025,
tree.depth = 2,
n.trees.rf = 1000,
n.trees.boosting = 200,
B = 3,
Kfold = 5,
plot.gbmperf = TRUE,
error.maxNR = 0.001,
max.iterNR = 100,
tune = c(0.5, 2),
seed = NULL,
verbose = 0,
...
)
Arguments
data |
A data frame containing the variables in the outcome, propensity score, and inverse
probability of censoring models (if specified); a data frame with |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
cate.model |
A standard |
ps.model |
A formula describing the propensity score (PS) model to be fitted. The treatment must
appear on the left-hand side. The treatment must be a numeric vector coded as 0/1.
If data are from a randomized controlled trial, specify |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates specified in |
ipcw.model |
A formula describing the inverse probability of censoring weighting (IPCW)
model to be fitted. The left-hand side must be empty. Default is |
ipcw.method |
A character value for the censoring model. Allowed values are:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
followup.time |
A column name in |
tau0 |
The truncation time for defining restricted mean time lost. Default is |
higher.y |
A logical value indicating whether higher ( |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated log CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
surv.min |
Lower truncation limit for the probability of being censored.
It must be a positive value and should be chosen close to 0. Default is |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.rf |
A positive integer specifying the maximum number of trees in random forest.
Used if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used if |
B |
A positive integer specifying the number of times cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds used in cross-fitting
to partition the data in |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in Newton Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
seed |
An optional integer specifying an initial randomization seed for reproducibility.
Default is |
verbose |
An integer value indicating what kind of intermediate progress messages should
be printed. |
... |
Additional arguments for |
Details
The CATE score represents an individual-level treatment effect for survival data, estimated with random forest, boosting, Poisson regression, and the doubly robust estimator (two regressions, Yadlowsky, 2020) applied separately by treatment group, or with the other doubly robust estimator (contrast regression, Yadlowsky, 2020) applied to the entire data set.
catefitsurv()
provides the coefficients of the CATE score for each scoring method requested
through score.method
. Currently, contrast regression is the only method which allows
for inference of the CATE coefficients by providing standard errors of the coefficients.
The coefficients can be used to learn the effect size of each variable and predict the
CATE score for a new observation.
catefitsurv()
also provides the predicted CATE score of each observation in the data set,
for each scoring method. The predictions allow ranking the observations from potentially
high responders to potentially low or standard responders to the treatment.
The estimated ATEs among nested subgroups of high responders are also provided by scoring method.
Note that the ATEs in catefitsurv()
are derived based on the CATE score which is estimated
using the same data sample. Therefore, overfitting may be an issue. catecvsurv()
is more
suitable to inspect the estimated ATEs across scoring methods as it implements internal cross-validation to reduce optimism.
Value
Returns an object of the class catefit
containing the following components:
- ate.randomForest: A vector of numerical values of length prop.cutoff containing the estimated ATEs (RMTL ratios) in nested subgroups (defined by prop.cutoff) constructed based on the estimated CATE scores with the random forest method. Only provided if score.method includes 'randomForest'.
- ate.boosting: Same as ate.randomForest, but with the nested subgroups based on the estimated CATE scores with boosting. Only provided if score.method includes 'boosting'.
- ate.poisson: Same as ate.randomForest, but with the nested subgroups based on the estimated CATE scores with Poisson regression. Only provided if score.method includes 'poisson'.
- ate.twoReg: Same as ate.randomForest, but with the nested subgroups based on the estimated CATE scores with two regressions. Only provided if score.method includes 'twoReg'.
- ate.contrastReg: Same as ate.randomForest, but with the nested subgroups based on the estimated CATE scores with contrast regression. Only provided if score.method includes 'contrastReg'.
- hr.randomForest: A vector of numerical values of length prop.cutoff containing the adjusted hazard ratios in nested subgroups (defined by prop.cutoff) constructed based on the estimated CATE scores with the random forest method. Only provided if score.method includes 'randomForest'.
- hr.boosting: Same as hr.randomForest, but with the nested subgroups based on the estimated CATE scores with boosting. Only provided if score.method includes 'boosting'.
- hr.poisson: Same as hr.randomForest, but with the nested subgroups based on the estimated CATE scores with Poisson regression. Only provided if score.method includes 'poisson'.
- hr.twoReg: Same as hr.randomForest, but with the nested subgroups based on the estimated CATE scores with two regressions. Only provided if score.method includes 'twoReg'.
- hr.contrastReg: Same as hr.randomForest, but with the nested subgroups based on the estimated CATE scores with contrast regression. Only provided if score.method includes 'contrastReg'.
- score.randomForest: A vector of numerical values of length n (number of observations in data) containing the estimated log-CATE scores according to random forest. Only provided if score.method includes 'randomForest'.
- score.boosting: Same as score.randomForest, but with the estimated log-CATE scores according to boosting. Only provided if score.method includes 'boosting'.
- score.poisson: Same as score.randomForest, but with the estimated log-CATE scores according to Poisson regression. Only provided if score.method includes 'poisson'.
- score.twoReg: Same as score.randomForest, but with the estimated log-CATE scores according to two regressions. Only provided if score.method includes 'twoReg'.
- score.contrastReg: Same as score.randomForest, but with the estimated log-CATE scores according to contrast regression. Only provided if score.method includes 'contrastReg'.
- fit: Additional details on model fitting if score.method includes 'randomForest', 'boosting' or 'contrastReg':
  - result.randomForest: Details on the random forest models fitted to observations with treatment = 0 ($fit0.rf) and to observations with treatment = 1 ($fit1.rf). Only provided if score.method includes 'randomForest'.
  - result.boosting: Details on the boosting models fitted to observations with treatment = 0 ($fit0.boosting and $fit0.gam) and to observations with treatment = 1 ($fit1.boosting and $fit1.gam). Only provided if score.method includes 'boosting'.
  - result.contrastReg$converge.contrastReg: Whether the contrast regression algorithm converged or not. Only provided if score.method includes 'contrastReg'.
- coefficients: A data frame with the coefficients of the estimated log-CATE score by score.method. The data frame has a number of rows equal to the number of covariates in cate.model and a number of columns equal to length(score.method). If score.method includes 'contrastReg', the data frame has an additional column containing the standard errors of the coefficients estimated with contrast regression. 'randomForest' and 'boosting' do not have coefficient results because tree-based methods typically do not express the log-CATE as a linear combination of coefficients and covariates.
- errors/warnings: A nested list of errors and warnings that were wrapped during the calculation of the ATEs. Errors and warnings are organized by score.method.
References
Yadlowsky, S., Pellegrini, F., Lionetto, F., Braune, S., & Tian, L. (2020). Estimation and validation of ratio-based conditional average treatment effects using observational data. Journal of the American Statistical Association, 1-18. DOI: 10.1080/01621459.2020.1772080.
See Also
Examples
library(survival)
tau0 <- with(survivalExample, min(quantile(y[trt == "drug1"], 0.95),
quantile(y[trt == "drug0"], 0.95)))
fit <- catefitsurv(data = survivalExample,
score.method = "randomForest",
cate.model = Surv(y, d) ~ age + female + previous_cost +
previous_number_relapses,
ps.model = trt ~ age + previous_treatment,
ipcw.model = ~ age + previous_cost + previous_treatment,
tau0 = tau0,
seed = 999)
coef(fit)
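The fitted object can be explored further. A brief, hedged sketch (component names follow the Value section above; available components depend on score.method):
# estimated log-CATE scores, one per patient, used to rank potential responders
head(fit$score.randomForest)
# estimated ATEs (RMTL ratios) in the nested subgroups defined by prop.cutoff
fit$ate.randomForest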
Simulated data with count outcome
Description
A dataset containing a count outcome, a length of follow-up and 6 baseline covariates
Usage
data(countExample)
Format
A dataframe with 4000 rows (patients) and 9 variables:
- age
age at baseline, centered to 48 years old, in years
- female
sex, 0 for male, 1 for female
- previous_treatment
previous treatment, "drugA", "drugB", or "drugC"
- previous_cost
previous medical cost, in US dollars
- previous_number_symptoms
previous number of symptoms, "0", "1", or ">=2"
- previous_number_relapses
previous number of relapses
- trt
current treatment, "drug0" or "drug1"
- y
count outcome, current number of relapses
- years
length of follow-up, in years
Examples
data(countExample)
str(countExample)
rate <- countExample$y / countExample$years
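An additional illustrative summary: crude relapse rates (events per person-year) by treatment arm.
tapply(countExample$y, countExample$trt, sum) /
  tapply(countExample$years, countExample$trt, sum)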
Estimate restricted mean survival time (RMST) based on Cox regression model
Description
Estimate restricted mean survival time (RMST) based on Cox regression model
Usage
cox.rmst(y, d, x.cate, xnew, tau0)
Arguments
y |
Observed survival or censoring time; vector of size |
d |
The event indicator, normally |
x.cate |
Matrix of |
xnew |
Matrix of |
tau0 |
The truncation time for defining restricted mean time lost. |
Value
The estimated RMST for new subjects with covariates xnew
; vector of size m
.
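For intuition, a minimal sketch of the underlying idea (not the internal cox.rmst() code): fit a Cox model, predict the survival curve for a covariate profile, and integrate it from 0 to tau0. The covariates, profile, and tau0 below are illustrative choices.
library(survival)
data(survivalExample)
fit <- coxph(Surv(y, d) ~ age + female, data = survivalExample)
sf  <- survfit(fit, newdata = data.frame(age = 0, female = 1))
tau0 <- 5                                # illustrative truncation time
tt <- c(0, pmin(sf$time, tau0))          # event times truncated at tau0
ss <- c(1, sf$surv)                      # survival curve, starting at S(0) = 1
rmst <- sum(diff(tt) * ss[-length(ss)])  # area under S(t) from 0 to tau0
rmst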
Data preprocessing
Apply at the beginning of pmcount()
and cvcount()
, after arg.checks()
Description
Data preprocessing
Apply at the beginning of pmcount()
and cvcount()
, after arg.checks()
Usage
data.preproc(
fun,
cate.model,
ps.model,
data,
prop.cutoff = NULL,
prop.multi = NULL,
ps.method,
initial.predictor.method = NULL
)
Arguments
fun |
A function for which argument check is needed; "pm" for |
cate.model |
A formula describing the outcome model to be fitted. The outcome must appear on the left-hand side. |
ps.model |
A formula describing the propensity score model to be fitted.
The treatment must appear on the left-hand side. The treatment must be a numeric vector
coded as 0/1. If data are from a RCT, specify |
data |
A data frame containing the variables in the outcome and propensity score models;
a data frame with |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated log CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
prop.multi |
A vector of numerical values (in '[0, 1]') specifying percentiles of the
estimated log CATE scores to define mutually exclusive subgroups.
It should start with 0, end with 1, and be of |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates. Only applies when |
Value
A list of 6 elements:
- y: outcome; vector of length n
(observations)
- trt: binary treatment; vector of length n
- x.ps: matrix of p.ps
baseline covariates (plus intercept); dimension n
by p.ps + 1
- x.cate: matrix of p.cate
baseline covariates; dimension n
by p.cate
- time: offset; vector of length n
- if fun = "pm"
:
- prop: formatted prop.cutoff
- if fun = "cv"
- prop.onlyhigh: formatted prop.cutoff
with 0 removed if applicable
- prop.bi; formatted prop.cutoff
with 0 and 1 removed if applicable
- prop.multi: formatted prop.multi
, starting with 0 and ending with 1
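For intuition, the returned elements roughly correspond to the following construction (an illustrative sketch, not the internal code; the model formulas and the 0/1 coding of trt are assumptions):
data(countExample)
y      <- countExample$y
trt    <- as.numeric(countExample$trt == "drug1")                        # assumed 0/1 coding
x.cate <- model.matrix(~ age + female, data = countExample)[, -1]        # covariates, no intercept
x.ps   <- model.matrix(~ age + previous_treatment, data = countExample)  # intercept plus covariates
time   <- log(countExample$years)                                        # offset for count outcomes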
Data preprocessing
Apply at the beginning of catefitmean()
and catecvmean()
, after arg.checks()
Description
Data preprocessing
Apply at the beginning of catefitmean()
and catecvmean()
, after arg.checks()
Usage
data.preproc.mean(
fun,
cate.model,
init.model,
ps.model,
data,
prop.cutoff = NULL,
prop.multi = NULL,
ps.method,
score.method = NULL,
initial.predictor.method = NULL
)
Arguments
fun |
A function for which argument check is needed; "pm" for |
cate.model |
A formula describing the outcome model to be fitted. The outcome must appear on the left-hand side. |
init.model |
A formula describing the initial predictor model. The outcome must appear on the left-hand side.
It must be specified when |
ps.model |
A formula describing the propensity score model to be fitted.
The treatment must appear on the left-hand side. The treatment must be a numeric vector
coded as 0/1. If data are from a RCT, specify |
data |
A data frame containing the variables in the outcome and propensity score models;
a data frame with |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated log CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
prop.multi |
A vector of numerical values (in '[0, 1]') specifying percentiles of the
estimated log CATE scores to define mutually exclusive subgroups.
It should start with 0, end with 1, and be of |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates. Only applies when |
Value
A list of 6 elements:
- y: outcome; vector of length n
(observations)
- trt: binary treatment; vector of length n
- x.ps: matrix of p.ps
baseline covariates (plus intercept); dimension n
by p.ps + 1
- x.cate: matrix of p.cate
baseline covariates; dimension n
by p.cate
- x.init: matrix of p.init
baseline covariates; dimension n
by p.init
- if fun = "pm"
:
- prop: formatted prop.cutoff
- if fun = "cv"
- prop.onlyhigh: formatted prop.cutoff
with 0 removed if applicable
- prop.bi; formatted prop.cutoff
with 0 and 1 removed if applicable
- prop.multi: formatted prop.multi
, starting with 0 and ending with 1
Data preprocessing
Apply at the beginning of catefitcount()
, catecvcount()
, catefitsurv()
, and catecvsurv()
, after arg.checks()
Description
Data preprocessing
Apply at the beginning of catefitcount()
, catecvcount()
, catefitsurv()
, and catecvsurv()
, after arg.checks()
Usage
data.preproc.surv(
fun,
cate.model,
ps.model,
ipcw.model = NULL,
tau0 = NULL,
data,
prop.cutoff = NULL,
prop.multi = NULL,
ps.method,
initial.predictor.method = NULL,
response = "count"
)
Arguments
fun |
A function for which argument check is needed; "catefit" for |
cate.model |
A formula describing the outcome model to be fitted. The outcome must appear on the left-hand side. |
ps.model |
A formula describing the propensity score model to be fitted.
The treatment must appear on the left-hand side. The treatment must be a numeric vector
coded as 0/1. If data are from a RCT, specify |
ipcw.model |
A formula describing the inverse probability of censoring weighting (IPCW) model to be fitted.
If the covariates are the same as in the outcome model, set |
tau0 |
The truncation time for defining restricted mean time lost. Default is |
data |
A data frame containing the variables in the outcome, propensity score, and IPCW models;
a data frame with |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated log CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
prop.multi |
A vector of numerical values (in '[0, 1]') specifying percentiles of the
estimated log CATE scores to define mutually exclusive subgroups.
It should start with 0, end with 1, and be of |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates. Only applies when |
response |
The type of response variables; |
Value
A list of elements:
- y: outcome; vector of length n
(observations)
- d: the event indicator; vector of length n; only if response = "survival"
- trt: binary treatment; vector of length n
- x.ps: matrix of p.ps
baseline covariates specified in the propensity score model (plus intercept); dimension n
by p.ps + 1
- x.cate: matrix of p.cate
baseline covariates specified in the outcome model; dimension n
by p.cate
- x.ipcw: matrix of p.ipw baseline covariates specified in the inverse probability of censoring weighting model; dimension n by p.ipw
- time: offset; vector of length n
; only if response = "count"
- if fun = "catefit"
:
- prop: formatted prop.cutoff
- prop.no1: formatted prop.cutoff
with 1 removed if applicable; otherwise prop.no1 is the same as prop
- if fun = "crossv"
- prop.onlyhigh: formatted prop.cutoff
with 0 removed if applicable
- prop.bi; formatted prop.cutoff
with 0 and 1 removed if applicable
- prop.multi: formatted prop.multi
, starting with 0 and ending with 1
Doubly robust estimator of the average treatment effect for count data
Description
Doubly robust estimator of the average treatment effect between two treatments, which is the rate ratio of treatment 1 over treatment 0 for count outcomes.
Usage
drcount(
y,
trt,
x.cate,
x.ps,
time,
ps.method = "glm",
minPS = 0.01,
maxPS = 0.99,
interactions = TRUE
)
Arguments
y |
A numeric vector of size |
trt |
A numeric vector (in {0, 1}) of size |
x.cate |
A numeric matrix of dimension |
x.ps |
A numeric matrix of dimension |
time |
A numeric vector of size |
ps.method |
A character value for the method to estimate the propensity
score. Allowed values include one of: |
minPS |
A numerical value between 0 and 1 below which estimated
propensity scores should be truncated. Default is |
maxPS |
A numerical value between 0 and 1 above which estimated
propensity scores should be truncated. Must be strictly greater than
|
interactions |
A logical value indicating whether the outcome model
should allow for treatment-covariate interaction by |
Value
Returns a list of 4 elements:
- log.rate.ratio: A numeric value of the estimated log rate ratio.
- rate0: A numeric value of the estimated rate in the group trt = 0.
- rate1: A numeric value of the estimated rate in the group trt = 1.
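A hedged sketch of a direct call (matrix conventions follow data.preproc(); ::: is used in case the function is not exported; covariate choices are illustrative):
data(countExample)
y      <- countExample$y
trt    <- as.numeric(countExample$trt == "drug1")
x.cate <- model.matrix(~ age + female + previous_number_relapses, data = countExample)[, -1]
x.ps   <- model.matrix(~ age + previous_treatment, data = countExample)
time   <- log(countExample$years)
est <- precmed:::drcount(y = y, trt = trt, x.cate = x.cate, x.ps = x.ps, time = time)
exp(est$log.rate.ratio)  # doubly robust rate ratio of trt = 1 vs trt = 0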
Doubly robust estimator of the average treatment effect for continuous data
Description
Doubly robust estimator of the average treatment effect between two treatments, which is the mean difference of treatment 1 over treatment 0 for continuous outcomes.
Usage
drmean(
y,
trt,
x.cate,
x.ps,
ps.method = "glm",
minPS = 0.01,
maxPS = 0.99,
interactions = TRUE
)
Arguments
y |
A numeric vector of size |
trt |
A numeric vector (in {0, 1}) of size |
x.cate |
A numeric matrix of dimension |
x.ps |
A numeric matrix of dimension |
ps.method |
A character value for the method to estimate the propensity
score. Allowed values include one of:
|
minPS |
A numerical value between 0 and 1 below which estimated propensity
scores should be truncated. Default is |
maxPS |
A numerical value between 0 and 1 above which estimated propensity
scores should be truncated. Must be strictly greater than |
interactions |
A logical value indicating whether the outcome model
should assume interactions between |
Value
Returns a list of 4 elements:
- mean.diff: A numeric value of the estimated mean difference.
- mean.diff0: A numeric value of the estimated mean difference in treatment group 0.
- mean.diff1: A numeric value of the estimated mean difference in treatment group 1.
Doubly robust estimator of the average treatment effect with Cox model for survival data
Description
Doubly robust estimator of the average treatment effect between two treatments, which is the restricted mean time lost (RMTL) ratio of treatment 1 over treatment 0 for survival outcomes.
Usage
drsurv(
y,
d,
x.cate,
x.ps,
x.ipcw,
trt,
yf = NULL,
tau0,
surv.min = 0.025,
ps.method = "glm",
minPS = 0.01,
maxPS = 0.99,
ipcw.method = "breslow"
)
Arguments
y |
Observed survival or censoring time; vector of size |
d |
The event indicator, normally |
x.cate |
Matrix of |
x.ps |
Matrix of |
x.ipcw |
Matrix of |
trt |
Treatment received; vector of size |
yf |
Follow-up time, interpreted as the potential censoring time; vector of size |
tau0 |
The truncation time for defining restricted mean time lost. |
surv.min |
Lower truncation limit for probability of being censored (positive and very close to 0). |
ps.method |
A character value for the method to estimate the propensity score. Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
ipcw.method |
The censoring model. Allowed values are: |
Value
Returns a list of 4 elements:
- rmst1: A numeric value of the estimated restricted mean survival time in the group trt = 1.
- rmst0: A numeric value of the estimated restricted mean survival time in the group trt = 0.
- log.rmtl.ratio: A numeric value of the estimated log RMTL ratio.
- log.hazard.ratio: A numeric value of the estimated log hazard ratio.
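A hedged sketch of a direct call (matrix conventions follow data.preproc.surv(); ::: is used in case the function is not exported; covariate choices are illustrative):
data(survivalExample)
tau0   <- with(survivalExample, min(quantile(y[trt == "drug1"], 0.95),
                                    quantile(y[trt == "drug0"], 0.95)))
y      <- survivalExample$y
d      <- survivalExample$d
trt    <- as.numeric(survivalExample$trt == "drug1")
x.cate <- model.matrix(~ age + female, data = survivalExample)[, -1]
x.ps   <- model.matrix(~ age + previous_treatment, data = survivalExample)
x.ipcw <- model.matrix(~ age + previous_cost, data = survivalExample)[, -1]
est <- precmed:::drsurv(y = y, d = d, x.cate = x.cate, x.ps = x.ps,
                        x.ipcw = x.ipcw, trt = trt, tau0 = tau0)
exp(est$log.rmtl.ratio)    # RMTL ratio of trt = 1 vs trt = 0
exp(est$log.hazard.ratio)  # hazard ratio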
Estimate the Average Treatment Effect of the log risk ratio in multiple bi-level subgroups defined by the proportions
Description
If only the higher subgroup (above the cutoff) is of interest, only trt.est.high is needed, so set onlyhigh to TRUE.
Scores are adjusted to the opposite sign if higher.y == FALSE and stay the same if higher.y == TRUE.
This is because estcount.bilevel.subgroups() always takes the subgroup with the highest adjusted scores, and higher adjusted scores should always represent high responders to trt = 1.
Usage
estcount.bilevel.subgroups(
y,
x.cate,
x.ps,
time,
trt,
score,
higher.y,
prop,
onlyhigh,
ps.method = "glm",
minPS = 0.01,
maxPS = 0.99
)
Arguments
y |
Observed outcome; vector of size |
x.cate |
Matrix of |
x.ps |
Matrix of |
time |
Log-transformed person-years of follow-up; vector of size |
trt |
Treatment received; vector of size |
score |
Estimated log CATE scores for all |
higher.y |
A logical value indicating whether higher ( |
prop |
Proportions corresponding to percentiles in the estimated log CATE scores that define subgroups to calculate ATE for;
vector of floats in '(0, 1]' (if onlyhigh=T) or in '(0, 1)' (if onlyhigh=F):
Each element of |
onlyhigh |
Indicator of returning only the ATEs in the higher-than-cutoff category of the bi-level subgroups; boolean |
ps.method |
A character value for the method to estimate the propensity score. Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
Value
ate.est.high: estimated ATEs in the multiple bi-level subgroups that are in the higher-than-cutoff category; vector of size equal to the length of prop; always returned.
ate.est.low: estimated ATEs in the multiple bi-level subgroups that are in the lower-than-cutoff category; vector of size equal to the length of prop; returned only when onlyhigh == FALSE.
Estimate the ATE of the log RR ratio in one multilevel subgroup defined by the proportions
Description
Scores are adjusted to the opposite sign if higher.y
== FALSE; scores stay the same if higher.y
== TRUE;
this is because subgroups defined in estcount.multilevel.subgroup() start from the lowest to the highest adjusted scores,
and higher adjusted scores should always represent high responders of trt=1
Usage
estcount.multilevel.subgroup(
y,
x.cate,
x.ps,
time,
trt,
score,
higher.y,
prop,
ps.method = "glm",
minPS = 0.01,
maxPS = 0.99
)
Arguments
y |
Observed outcome; vector of size |
x.cate |
Matrix of |
x.ps |
Matrix of |
time |
Log-transformed person-years of follow-up; vector of size |
trt |
Treatment received; vector of size |
score |
Estimated log CATE scores for all |
higher.y |
A logical value indicating whether higher ( |
prop |
Proportions corresponding to percentiles in the estimated log CATE scores that define subgroups to calculate ATE for;
vector of floats in '[0, 1]' always starting with 0 and ending with 1:
Each element of |
ps.method |
A character value for the method to estimate the propensity score. Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
Value
estimated ATEs of all categories in the one multilevel subgroup; vector of size equal to the length of categories in the multilevel subgroup
Estimate the ATE of the mean difference in multiple bi-level subgroups defined by the proportions
Description
If only the higher subgroup (above the cutoff) is of interest, only trt.est.high is needed, so set onlyhigh to TRUE.
Scores are adjusted to the opposite sign if higher.y == FALSE and stay the same if higher.y == TRUE.
This is because estmean.bilevel.subgroups() always takes the subgroup with the highest adjusted scores, and higher adjusted scores should always represent high responders in treatment group 1.
Usage
estmean.bilevel.subgroups(
y,
x.cate,
x.ps,
trt,
score,
higher.y,
prop,
onlyhigh,
ps.method = "glm",
minPS = 0.01,
maxPS = 0.99
)
Arguments
y |
Observed outcome; vector of size |
x.cate |
Matrix of |
x.ps |
Matrix of |
trt |
Treatment received; vector of size |
score |
Estimated CATE scores for all |
higher.y |
A logical value indicating whether higher ( |
prop |
Proportions corresponding to percentiles in the estimated CATE scores that define subgroups to calculate ATE for;
vector of floats in '(0, 1]' (if onlyhigh=T) or in '(0, 1)' (if onlyhigh=F):
Each element of |
onlyhigh |
Indicator of returning only the ATEs in the higher-than-cutoff category of the bi-level subgroups; boolean |
ps.method |
A character value for the method to estimate the propensity score. Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
Value
ate.est.high: estimated ATEs in the multiple bi-level subgroups that are in the higher-than-cutoff category; vector of size equal to the length of prop; always returned.
ate.est.low: estimated ATEs in the multiple bi-level subgroups that are in the lower-than-cutoff category; vector of size equal to the length of prop; returned only when onlyhigh == FALSE.
Estimate the ATE of the mean difference in one multilevel subgroup defined by the proportions
Description
Scores are adjusted to the opposite sign if higher.y
== FALSE; scores stay the same if higher.y
== TRUE;
this is because subgroups defined in estmean.multilevel.subgroup() start from the lowest to the highest adjusted scores,
and higher adjusted scores should always represent high responders of trt=1
Usage
estmean.multilevel.subgroup(
y,
x.cate,
x.ps,
trt,
score,
higher.y,
prop,
ps.method = "glm",
minPS = 0.01,
maxPS = 0.99
)
Arguments
y |
Observed outcome; vector of size |
x.cate |
Matrix of |
x.ps |
Matrix of |
trt |
Treatment received; vector of size |
score |
Estimated CATE scores for all |
higher.y |
A logical value indicating whether higher ( |
prop |
Proportions corresponding to percentiles in the estimated CATE scores that define subgroups to calculate ATE for;
vector of floats in '[0, 1]' always starting with 0 and ending with 1:
Each element of |
ps.method |
A character value for the method to estimate the propensity score. Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
Value
estimated ATEs of all categories in the one multilevel subgroup; vector of size equal to the length of categories in the multilevel subgroup
Estimate the ATE of the RMTL ratio and unadjusted hazard ratio in multiple bi-level subgroups defined by the proportions
Description
If only the higher subgroup (above the cutoff) is of interest, only ate.rmtl.high and hr.high are needed, so set onlyhigh to TRUE.
Scores are adjusted to the opposite sign if higher.y == FALSE and stay the same if higher.y == TRUE.
This is because estsurv.bilevel.subgroups() always takes the subgroup with the highest adjusted scores, and higher adjusted scores should always represent high responders to trt = 1.
Usage
estsurv.bilevel.subgroups(
y,
d,
x.cate,
x.ps,
x.ipcw,
trt,
yf,
tau0 = tau0,
score,
higher.y,
prop,
onlyhigh,
surv.min = 0.025,
ps.method = "glm",
minPS = 0.01,
maxPS = 0.99,
ipcw.method = "breslow"
)
Arguments
y |
Observed survival or censoring time; vector of size |
d |
The event indicator, normally |
x.cate |
Matrix of |
x.ps |
Matrix of |
x.ipcw |
Matrix of |
trt |
Treatment received; vector of size |
yf |
Follow-up time, interpreted as the potential censoring time; vector of size |
tau0 |
The truncation time for defining restricted mean time lost. |
score |
Estimated log CATE scores for all |
higher.y |
A logical value indicating whether higher ( |
prop |
Proportions corresponding to percentiles in the estimated log CATE scores that define subgroups to calculate ATE for;
vector of floats in '(0, 1]' (if |
onlyhigh |
Indicator of returning only the ATEs in the higher-than-cutoff category of the bi-level subgroups; boolean. |
surv.min |
Lower truncation limit for probability of being censored (positive and very close to 0). |
ps.method |
A character value for the method to estimate the propensity score. Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
ipcw.method |
The censoring model. Allowed values are: |
Value
ate.rmtl.high: estimated ATEs (ratio of RMTL) in the multiple bi-level subgroups that are in the higher-than-cutoff category; vector of size equal to the length of prop; always returned.
ate.rmtl.low: estimated ATEs (ratio of RMTL) in the multiple bi-level subgroups that are in the lower-than-cutoff category; vector of size equal to the length of prop; returned only when onlyhigh = FALSE.
hr.high: unadjusted hazard ratio in the multiple bi-level subgroups that are in the higher-than-cutoff category; vector of size equal to the length of prop; always returned.
hr.low: unadjusted hazard ratio in the multiple bi-level subgroups that are in the lower-than-cutoff category; vector of size equal to the length of prop; returned only when onlyhigh = FALSE.
Estimate the ATE of the RMTL ratio and unadjusted hazard ratio in one multilevel subgroup defined by the proportions
Description
Scores are adjusted to the opposite sign if higher.y == FALSE and stay the same if higher.y == TRUE.
This is because the subgroups defined in estsurv.multilevel.subgroups() start from the lowest to the highest adjusted scores, and higher adjusted scores should always represent high responders to trt = 1.
Usage
estsurv.multilevel.subgroups(
y,
d,
x.cate,
x.ps,
x.ipcw,
trt,
yf,
tau0 = tau0,
score,
higher.y,
prop,
surv.min = 0.025,
ps.method = "glm",
minPS = 0.01,
maxPS = 0.99,
ipcw.method = "breslow"
)
Arguments
y |
Observed survival or censoring time; vector of size |
d |
The event indicator, normally |
x.cate |
Matrix of |
x.ps |
Matrix of |
x.ipcw |
Matrix of |
trt |
Treatment received; vector of size |
yf |
Follow-up time, interpreted as the potential censoring time; vector of size |
tau0 |
The truncation time for defining restricted mean time lost. |
score |
Estimated log CATE scores for all |
higher.y |
A logical value indicating whether higher ( |
prop |
Proportions corresponding to percentiles in the estimated log CATE scores that define subgroups to calculate ATE for;
vector of floats in '[0, 1]' always starting with 0 and ending with 1:
Each element of |
surv.min |
Lower truncation limit for probability of being censored (positive and very close to 0). |
ps.method |
A character value for the method to estimate the propensity score. Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
ipcw.method |
The censoring model. Allowed values are: |
Value
ate.rmtl: estimated ATEs (ratio of RMTL) of all categories in the one multilevel subgroup; vector of size equal to the length of categories in the multilevel subgroup.
ate.hr: unadjusted hazard ratio of all categories in the one multilevel subgroup; vector of size equal to the length of categories in the multilevel subgroup.
Generate K-fold Indices for Cross-Validation
Description
This function generates indices for K-fold cross-validation based on the total sample size 'N' and the number of folds 'Kfold'. If 'reverse = TRUE', the remainder indices will be assigned in reverse order.
Usage
generate_kfold_indices(N, Kfold, reverse = FALSE)
Arguments
N |
Integer. Total sample size (number of observations). |
Kfold |
Integer. The number of folds to split the data into. |
reverse |
Logical. Whether to reverse the remainder indices when 'N' is not divisible by 'Kfold'. Defaults to 'FALSE'. |
Value
A vector of length 'N' containing the fold assignments (from 1 to 'Kfold').
Author(s)
Thomas Debray
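An illustrative call (hedged; ::: is used in case the helper is not exported):
folds <- precmed:::generate_kfold_indices(N = 103, Kfold = 5)
table(folds)  # number of observations assigned to each of the 5 folds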
Propensity score estimation with LASSO
Description
Propensity score based on a multivariate logistic regression with LASSO penalization on the two-way interactions
Usage
glm.ps(trt, x.ps, xnew = NULL, minPS = 0.01, maxPS = 0.99)
Arguments
trt |
Treatment received; vector of size |
x.ps |
Matrix of |
xnew |
Matrix of |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
Value
The trimmed propensity score for each unit; vector of size n
(if xnew
is NULL) or m
Propensity score estimation with a linear model
Description
Propensity score based on a multivariate logistic regression with main effects only
Usage
glm.simplereg.ps(trt, x.ps, xnew = NULL, minPS = 0.01, maxPS = 0.99)
Arguments
trt |
Treatment received; vector of size |
x.ps |
A matrix of |
xnew |
A matrix of |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
Value
The estimated propensity score for each unit; vector of size n
(if xnew
is NULL) or m
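Conceptually, this corresponds to a main-effects logistic regression followed by truncation of the fitted probabilities; a rough glm() sketch (illustrative, not the internal code):
data(countExample)
ps.fit <- glm(I(trt == "drug1") ~ age + previous_treatment,
              family = binomial(), data = countExample)
ps <- pmin(pmax(fitted(ps.fit), 0.01), 0.99)  # truncate at minPS = 0.01 and maxPS = 0.99
summary(ps)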
Estimate the CATE model using specified scoring methods
Description
Coefficients of the CATE estimated with boosting, naive Poisson, two regression, contrast regression, negative binomial
Usage
intxcount(
y,
trt,
x.cate,
x.ps,
time,
score.method = c("boosting", "poisson", "twoReg", "contrastReg", "negBin"),
ps.method = "glm",
minPS = 0.01,
maxPS = 0.99,
initial.predictor.method = "boosting",
xvar.smooth = NULL,
tree.depth = 2,
n.trees.boosting = 200,
B = 3,
Kfold = 6,
plot.gbmperf = TRUE,
error.maxNR = 0.001,
max.iterNR = 150,
tune = c(0.5, 2),
...
)
Arguments
y |
Observed outcome; vector of size |
trt |
Treatment received; vector of size |
x.cate |
Matrix of |
x.ps |
Matrix of |
time |
Log-transformed person-years of follow-up; vector of size |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A number above which estimated propensity scores should be trimmed; scalar |
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates in |
xvar.smooth |
A vector of characters indicating the name of the variables used as
the smooth terms if |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used only if |
B |
A positive integer specifying the number of time cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds (parts) used in cross-fitting
to partition the data in |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in Newton Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
... |
Additional arguments for |
Value
Depending on score.method, the output is a combination of the following:
result.boosting: Results of boosting fit and best iteration, for trt = 0 and trt = 1 separately
result.poisson: Naive Poisson estimator (beta1 - beta0); vector of length p.cate
+ 1
result.twoReg: Two regression estimator (beta1 - beta0); vector of length p.cate
+ 1
result.contrastReg: A list of the contrast regression results with 3 elements:
$delta.contrastReg: Contrast regression DR estimator; vector of length p.cate
+ 1
$sigma.contrastReg: Variance covariance matrix for delta.contrastReg; matrix of size p.cate
+ 1 by p.cate
+ 1
$converge.contrastReg: Indicator that the Newton Raphson algorithm converged for delta_0
; boolean
result.negBin: Negative binomial estimator (beta1 - beta0); vector of length p.cate
+ 1
best.iter: Largest best iterations for boosting (if used)
fgam: Formula applied in GAM (if used)
Estimate the CATE model using specified scoring methods
Description
Coefficients of the CATE estimated with boosting, linear regression, two regression, contrast regression, random forest, generalized additive model
Usage
intxmean(
y,
trt,
x.cate,
x.init,
x.ps,
score.method = c("boosting", "gaussian", "twoReg", "contrastReg", "gam",
"randomForest"),
ps.method = "glm",
minPS = 0.01,
maxPS = 0.99,
initial.predictor.method = "boosting",
xvar.smooth.init,
xvar.smooth.score,
tree.depth = 2,
n.trees.rf = 1000,
n.trees.boosting = 200,
B = 1,
Kfold = 2,
plot.gbmperf = TRUE,
...
)
Arguments
y |
Observed outcome; vector of size |
trt |
Treatment received; vector of size |
x.cate |
Matrix of |
x.init |
Matrix of |
x.ps |
Matrix of |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A number above which estimated propensity scores should be trimmed; scalar |
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates in |
xvar.smooth.init |
A vector of characters indicating the name of the variables used as
the smooth terms if |
xvar.smooth.score |
A vector of characters indicating the name of the variables used as
the smooth terms if |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.rf |
A positive integer specifying the number of trees. Used only if
|
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used only if |
B |
A positive integer specifying the number of time cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds (parts) used in cross-fitting
to partition the data in |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
... |
Additional arguments for |
Value
Depending on score.method, the output is a combination of the following:
result.boosting: Results of boosting fit and best iteration, for trt = 0 and trt = 1 separately
result.gaussian: Linear regression estimator (beta1 - beta0); vector of length p.cate
+ 1
result.twoReg: Two regression estimator (beta1 - beta0); vector of length p.cate
+ 1
result.contrastReg: A list of the contrast regression results with 3 elements:
$delta.contrastReg: Contrast regression DR estimator; vector of length p.cate
+ 1
$sigma.contrastReg: Variance covariance matrix for delta.contrastReg; matrix of size p.cate
+ 1 by p.cate
+ 1
result.randomForest: Results of random forest fit and best iteration, for trt = 0 and trt = 1 separately
result.gam: Results of generalized additive model fit and best iteration, for trt = 0 and trt = 1 separately
best.iter: Largest best iterations for boosting (if used)
fgam: Formula applied in GAM when initial.predictor.method = 'gam'
warn.fit: Warnings that occurred when fitting score.method
err.fit: Errors that occurred when fitting score.method
Estimate the CATE model using specified scoring methods for survival outcomes
Description
Coefficients of the CATE estimated with random forest, boosting, naive Poisson, two regression, and contrast regression
Usage
intxsurv(
y,
d,
trt,
x.cate,
x.ps,
x.ipcw,
yf = NULL,
tau0,
surv.min = 0.025,
score.method = c("randomForest", "boosting", "poisson", "twoReg", "contrastReg"),
ps.method = "glm",
minPS = 0.01,
maxPS = 0.99,
ipcw.method = "breslow",
initial.predictor.method = "randomForest",
tree.depth = 3,
n.trees.rf = 1000,
n.trees.boosting = 150,
B = 3,
Kfold = 5,
plot.gbmperf = TRUE,
error.maxNR = 0.001,
max.iterNR = 100,
tune = c(0.5, 2),
...
)
Arguments
y |
Observed survival or censoring time; vector of size |
d |
The event indicator, normally |
trt |
Treatment received; vector of size |
x.cate |
Matrix of |
x.ps |
Matrix of |
x.ipcw |
Matrix of |
yf |
Follow-up time, interpreted as the potential censoring time; vector of size |
tau0 |
The truncation time for defining restricted mean time lost. |
surv.min |
Lower truncation limit for probability of being censored (positive and very close to 0). |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
ps.method |
A character vector for the method to estimate the propensity score.
Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A number above which estimated propensity scores should be trimmed; scalar |
ipcw.method |
The censoring model. Allowed values are: |
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates in |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.rf |
A positive integer specifying the maximum number of trees in random forest.
Used if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used if |
B |
A positive integer specifying the number of time cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds (parts) used in cross-fitting
to partition the data in |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in Newton Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
... |
Additional arguments for |
Value
Depending on score.method, the output is a combination of the following:
result.randomForest: Results of random forest fit, for trt = 0 and trt = 1 separately
result.boosting: Results of boosting fit, for trt = 0 and trt = 1 separately
result.poisson: Naive Poisson estimator (beta1 - beta0); vector of length p.cate
+ 1
result.twoReg: Two regression estimator (beta1 - beta0); vector of length p.cate
+ 1
result.contrastReg: A list of the contrast regression results with 2 elements:
$delta.contrastReg: Contrast regression DR estimator; vector of length p.cate
+ 1
$converge.contrastReg: Indicator that the Newton Raphson algorithm converged for delta_0
; boolean
Probability of being censored
Description
Probability of being censored which is used to correct the effect of right censoring.
Usage
ipcw.surv(
y,
d,
x.ipcw,
yf = NULL,
ipcw.method = "breslow",
tau0,
surv.min = 0.025
)
Arguments
y |
Observed survival or censoring time; vector of size |
d |
The event indicator, normally |
x.ipcw |
Matrix of |
yf |
Follow-up time, interpreted as the potential censoring time; vector of size |
ipcw.method |
The censoring model. Allowed values are: |
tau0 |
The truncation time for defining restricted mean time lost. |
surv.min |
Lower truncation limit for probability of being censored (positive and very close to 0). |
Value
A vector of size n
with the estimated probabilities Pr(C > min(y, tau0) | x.ipcw)
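For intuition only, a rough sketch of the idea behind these censoring probabilities using a Cox model for the censoring time (akin to the 'breslow' option); this is not the internal ipcw.surv() code, and the covariates and tau0 are arbitrary choices:
library(survival)
data(survivalExample)
tau0 <- 5
# model the censoring time by flipping the event indicator, so censoring is the "event"
cens.fit <- coxph(Surv(y, 1 - d) ~ age + previous_cost, data = survivalExample)
bh <- basehaz(cens.fit, centered = FALSE)              # baseline cumulative hazard of censoring
lp <- drop(model.matrix(cens.fit) %*% coef(cens.fit))  # linear predictor per subject
tt <- pmin(survivalExample$y, tau0)                    # evaluate at min(y, tau0)
H0 <- stepfun(bh$time, c(0, bh$hazard))(tt)
G  <- exp(-H0 * exp(lp))                               # Pr(C > min(y, tau0) | x.ipcw)
G  <- pmax(G, 0.025)                                   # lower truncation (surv.min)
summary(G)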
Catch errors and warnings when estimating the ATEs in the nested subgroup for continuous data
Description
Storing the errors and warnings that occurred when estimating the ATEs in the nested subgroups. If there are no errors and no warnings, the estimated mean difference is provided. If there are warnings but no errors, the estimated mean difference is provided with a warning attribute set. If there are errors, NA values are returned for the mean difference and an error attribute is set.
Usage
meanCatch(fun)
Arguments
fun |
The drsurv function... |
Value
A list containing
Simulated data with a continuous outcome
Description
A dataset containing a continuous outcome and 6 baseline covariates
Usage
data(meanExample)
Format
A dataframe with 4000 rows (patients) and 9 variables:
- age
age at baseline, centered to 48 years old, in years
- female
sex, 0 for male, 1 for female
- previous_treatment
previous treatment, "drugA", "drugB", or "drugC"
- previous_cost
previous medical cost, in US dollars
- previous_number_symptoms
previous number of symptoms, "0", "1", or ">=2"
- previous_number_relapses
previous number of relapses
- trt
current treatment, "drug0" or "drug1"
- y
continuous outcome
Examples
data(meanExample)
str(meanExample)
Doubly robust estimators of the coefficients in the two regression
Description
Doubly robust estimators of the coefficients in the two regression
Usage
onearmglmcount.dr(y, x.cate, time, trt, ps, f.predictor)
Arguments
y |
Observed outcome; vector of size |
x.cate |
Matrix of |
time |
Log-transformed person-years of follow-up; vector of size |
trt |
Treatment received; vector of size |
ps |
Estimated propensity scores for all observations; vector of size |
f.predictor |
Initial prediction of the outcome (expected number of relapses for one unit of exposure time) conditioned
on the covariates |
Value
Doubly robust estimators of the regression coefficients beta_r
in the doubly robust estimating equation
where r = 0, 1
is treatment received; vector of size p
+ 1 (intercept included)
Doubly robust estimators of the coefficients in the two regression
Description
Doubly robust estimators of the coefficients in the two regression
Usage
onearmglmmean.dr(y, x.cate, trt, ps, f.predictor)
Arguments
y |
Observed outcome; vector of size |
x.cate |
Matrix of |
trt |
Treatment received; vector of size |
ps |
Estimated propensity scores for all observations; vector of size |
f.predictor |
Initial prediction of the outcome (expected number of relapses for one unit of exposure time) conditioned
on the covariates |
Value
Doubly robust estimators of the regression coefficients beta_r
in the doubly robust estimating equation
where r = 0, 1
is treatment received; vector of size p
+ 1 (intercept included)
Doubly robust estimators of the coefficients in the two regression
Description
Doubly robust estimators of the coefficients in the two regression
Usage
onearmsurv.dr(ynew, dnew, trt, x.cate, tau0, weightsurv, ps, f.predictor)
Arguments
ynew |
Truncated survival or censoring time; vector of size |
dnew |
The event indicator after truncation, |
trt |
Treatment received; vector of size |
x.cate |
Matrix of |
tau0 |
The truncation time for defining restricted mean time lost. |
weightsurv |
Estimated inverse probability of censoring weights with truncation for all observations; vector of size |
ps |
Estimated propensity scores for all observations; vector of size |
f.predictor |
Initial prediction of the outcome (restricted mean time loss) conditioned on the covariates |
Value
Doubly robust estimators of the two regression coefficients beta_r
where r = 0, 1
is treatment received; vector of size p.cate
+ 1 (intercept included)
Histogram of bootstrap estimates
Description
Histogram of bootstrap estimates
Usage
## S3 method for class 'atefit'
plot(x, bins, alpha = 0.7, title = waiver(), theme = theme_classic(), ...)
Arguments
x |
An object of class |
bins |
Number of bins |
alpha |
Opacity |
title |
The text for the title |
theme |
Defaults to |
... |
Other parameters |
Details
Create a histogram displaying the distribution of the bootstrap estimates. The red vertical reference line represents the final estimate.
Value
A plot of the class ggplot
, displaying the estimated ATE across
the bootstrap samples
Author(s)
Thomas Debray
Two side-by-side line plots of validation curves from the "precmed"
object
Description
Provides validation curves in two side-by-side plots, visualizing the estimated ATEs in a series
of nested subgroups in the training set and validation set separately, where each line represents
one scoring method specified in catecv()
or catecvmean()
. This should be run
only after results of catecv()
or catecvmean()
have been obtained.
Usage
## S3 method for class 'precmed'
plot(
x,
cv.i = NULL,
combine = "mean",
show.abc = TRUE,
valid.only = FALSE,
plot.hr = FALSE,
ylab = NULL,
legend.position = "bottom",
xlim = NULL,
title = waiver(),
theme = theme_classic(),
...
)
Arguments
x |
An object of class |
cv.i |
A positive integer indicating the index of the CV iteration results to be plotted.
Allowed values are: a positive integer |
combine |
A character value indicating how to combine the estimated ATEs across all CV
iterations into a validation curve for each nested subgroup, separately for the training and
validation results. Allowed values are: |
show.abc |
A logical value indicating whether to show the ABC statistics in the validation set. Used
only if |
valid.only |
A logical value indicating whether only the validation curves in the validation set
should be plotted ( |
plot.hr |
A logical value indicating whether the hazard ratios should be plotted in the
validation curves ( |
ylab |
A character value for the y-axis label to describe what the ATE is. Default is |
legend.position |
A character value for the legend position argument to be passed to |
xlim |
A numeric value for the range of the x-axis. Default is |
title |
The text for the title |
theme |
Defaults to |
... |
Other parameters |
Details
plot()
takes in outputs from catecv()
and generates two plots
of validation curves side-by-side, one for the training set and one for validation set.
Separate validation curves are produced for each scoring method specified via score.method
in catecv()
or catecvmean()
.
The validation curves (and ABC statistics, if applicable) can help compare the performance of different scoring methods in terms of discerning potential treatment heterogeneity in subgroups with internal validation. Steeper validation curves in the validation set suggest presence of treatment effect heterogeneity (and the ability of the scoring methods to capture it) while flat validation curves indicate absence of treatment effect heterogeneity (or inability of the scoring method to capture it).
Value
Returns two side-by-side line plots, one of which shows the validation curves of the training
sets and the other the validation curves in the validation sets. A gray horizontal dashed line of
overall ATE is included as a reference. ABC statistics will be added to the legend if
show.abc = TRUE
.
References
Yadlowsky, S., Pellegrini, F., Lionetto, F., Braune, S., & Tian, L. (2020). Estimation and validation of ratio-based conditional average treatment effects using observational data. Journal of the American Statistical Association, 1-18. DOI: 10.1080/01621459.2020.1772080.
See Also
abc()
and boxplot()
for "precmed"
objects.
Examples
# Count outcome
eval_1 <- catecv(response = "count",
data = countExample,
score.method = "poisson",
cate.model = y ~ age + female + previous_treatment +
previous_cost + previous_number_relapses + offset(log(years)),
ps.model = trt ~ age + previous_treatment,
higher.y = FALSE,
cv.n = 5)
# default setting
plot(eval_1)
# turn off ABC annotation
plot(eval_1, show.abc = FALSE)
# use a different theme
plot(eval_1, theme = ggplot2::theme_bw())
# plot the validation curves from the 2nd CV iteration instead of the mean
# of all validation curves
plot(eval_1, cv.i = 2)
# median of the validation curves
plot(eval_1, combine = "median")
# plot validation curves in validation set only
plot(eval_1, valid.only = TRUE)
# Survival outcome
library(survival)
tau0 <- with(survivalExample,
min(quantile(y[trt == "drug1"], 0.95), quantile(y[trt == "drug0"], 0.95)))
eval_2 <- catecv(response = "survival",
data = survivalExample,
score.method = c("poisson", "randomForest"),
cate.model = Surv(y, d) ~ age + female + previous_cost +
previous_number_relapses,
ps.model = trt ~ age + previous_treatment,
initial.predictor.method = "randomForest",
ipcw.model = ~ age + previous_cost + previous_treatment,
tau0 = tau0,
cv.n = 5,
seed = 999)
# default setting, plot RMTL ratios in both training and validation sets
plot(eval_2)
# plot hazard ratio
plot(eval_2, plot.hr = TRUE)
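A further illustrative variation using the documented arguments:
# show only the validation set and use a custom y-axis label
plot(eval_2, valid.only = TRUE, ylab = "RMTL ratio (drug1 vs drug0)")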
Print function for atefit
Description
Print function for atefit
Usage
## S3 method for class 'atefit'
print(x, ...)
Arguments
x |
An object of class |
... |
Other parameters |
Details
Display the estimated treatment effects for survival outcomes (log restricted mean time lost ratio and log hazard ratio) and count outcomes (the log rate ratio).
Value
No return value
Author(s)
Thomas Debray
Print function for catefit
Description
Print function for catefit
Usage
## S3 method for class 'catefit'
print(x, ...)
Arguments
x |
An object of class |
... |
Other parameters |
Details
Display the estimated treatment effects for survival outcomes (log restricted mean time lost ratio and log hazard ratio) and count outcomes (the log rate ratio).
Value
No return value
Author(s)
Thomas Debray
Calculate the log CATE score given the baseline covariates and follow-up time for specified scoring methods
Description
Based on intxcount results of the CATE coefficients estimated with boosting, naive Poisson, two regression, contrast regression, negative binomial
Usage
scorecount(
fit,
x.cate,
time,
score.method = c("boosting", "poisson", "twoReg", "contrastReg", "negBin")
)
Arguments
fit |
List of objects generated from intxcount: outputs of boosting, naive Poisson, two regression, contrast regression, negative binomial |
x.cate |
Matrix of |
time |
Log-transformed person-years of follow-up; vector of size |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
Value
score.boosting: Estimated log CATE score for all n
observations with the boosting method; vector of size n
score.poisson: Estimated log CATE score for all n
observations with the naive Poisson method; vector of size n
score.twoReg: Estimated log CATE score for all n
observations with the two regression method; vector of size n
score.contrastReg: Estimated log CATE score for all n
observations with the contrast regression method; vector of size n
score.negBin: Estimated log CATE score for all n
observations with the negative binomial method; vector of size n
score = NA if the corresponding method is not called
Calculate the CATE score given the baseline covariates for specified scoring methods
Description
Based on intxmean results of the CATE coefficients estimated with boosting, linear regression, two regression, contrast regression, random forest, generalized additive model
Usage
scoremean(
fit,
x.cate,
score.method = c("boosting", "gaussian", "twoReg", "contrastReg", "randomForest",
"gam")
)
Arguments
fit |
List of objects generated from intxmean: outputs of boosting, linear regression, two regression, contrast regression, random forest, generalized additive model |
x.cate |
Matrix of |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
Value
score.boosting: Estimated CATE score for all n
observations with the boosting method; vector of size n
score.gaussian: Estimated CATE score for all n
observations with the linear regression method; vector of size n
score.twoReg: Estimated CATE score for all n
observations with the two regression method; vector of size n
score.contrastReg: Estimated CATE score for all n
observations with the contrast regression method; vector of size n
score.randomForest: Estimated CATE score for all n
observations with the random forest method; vector of size n
score.gam: Estimated CATE score for all n
observations with the generalized additive model; vector of size n
score = NA if the corresponding method is not called
Calculate the log CATE score given the baseline covariates and follow-up time for specified scoring methods for survival outcomes
Description
Based on intxsurv results of the CATE coefficients estimated with random forest, boosting, naive Poisson, two regression, contrast regression
Usage
scoresurv(
fit,
x.cate,
tau0,
score.method = c("randomForest", "boosting", "poisson", "twoReg", "contrastReg")
)
Arguments
fit |
List of objects generated from intxsurv: outputs of random forest, boosting, naive Poisson, two regressions, and contrast regression |
x.cate |
Matrix of baseline covariates specified in the outcome model; dimension n by p |
tau0 |
The truncation time for defining restricted mean time lost. |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: 'randomForest', 'boosting', 'poisson', 'twoReg', 'contrastReg' |
Value
score.randomForest: Estimated log CATE score for all n observations with the random forest method; vector of size n
score.boosting: Estimated log CATE score for all n observations with the boosting method; vector of size n
score.poisson: Estimated log CATE score for all n observations with the naive Poisson method; vector of size n
score.twoReg: Estimated log CATE score for all n observations with the two regressions method; vector of size n
score.contrastReg: Estimated log CATE score for all n observations with the contrast regression method; vector of size n
score = NA if the corresponding method is not called
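The quantity the survival score is built around, the restricted mean time lost (RMTL) up to tau0, can be illustrated from a Kaplan-Meier fit on the bundled survival data. The code below is purely illustrative; tau0 = 5 is an assumed truncation time, not a value prescribed by the package.
library(survival)
library(precmed)
tau0 <- 5                                          # assumed truncation time
km <- survfit(Surv(y, d) ~ trt, data = survivalExample)
# Restricted mean survival time (RMST) per arm up to tau0
print(km, rmean = tau0)
# RMTL per arm is tau0 minus the reported restricted mean; the log of the
# ratio of the two RMTLs is the treatment-effect scale used for survival outcomes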
Catch errors and warnings when estimating the ATEs in the nested subgroups
Description
Stores the errors and warnings that occur when estimating the ATEs in the nested subgroups. If there are no errors and no warnings, the estimated log.rmtl.ratio and log.hazard.ratio are returned. If there are warnings but no errors, the estimates are returned with a warning attribute attached. If there are errors, NA values are returned for log.rmtl.ratio and log.hazard.ratio, with an error attribute attached.
Usage
survCatch(fun)
Arguments
fun |
The drsurv function... |
Value
A list containing the estimated log.rmtl.ratio and log.hazard.ratio (NA values if an error occurred), with warning or error attributes attached as described above.
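The catch-and-attribute behaviour described above can be sketched with base R condition handling. The helper below is a generic illustration, not the package's internal survCatch() implementation.
# Generic sketch: return estimates with a warning attribute if the computation
# warns, or NA estimates with an error attribute if it fails.
catch_sketch <- function(expr) {
  warn <- NULL
  out <- withCallingHandlers(
    tryCatch(expr,
             error = function(e) {
               res <- list(log.rmtl.ratio = NA, log.hazard.ratio = NA)
               attr(res, "error") <- conditionMessage(e)
               res
             }),
    warning = function(w) {
      warn <<- conditionMessage(w)
      invokeRestart("muffleWarning")
    })
  if (!is.null(warn)) attr(out, "warning") <- warn
  out
}
# A computation that warns but still returns estimates
catch_sketch({ warning("small subgroup"); list(log.rmtl.ratio = -0.1, log.hazard.ratio = -0.2) })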
Simulated data with survival outcome
Description
A dataset containing a time-to-event outcome, an event indicator, treatment group, and 6 baseline covariates
Usage
data(survivalExample)
Format
A data frame with 4000 rows (patients) and 9 variables:
- age
age at baseline, in years, centered at 48 years
- female
sex, 0 for male, 1 for female
- previous_treatment
previous treatment, "drugA", "drugB", or "drugC"
- previous_cost
previous medical cost, in US dollars
- previous_number_symptoms
previous number of symptoms, "0", "1", or ">=2"
- previous_number_relapses
previous number of relapses
- trt
current treatment, "drug0" or "drug1"
- y
time to first relapse or censoring
- d
event indicator, 1: relapse, 0: censored
Examples
data(survivalExample)
str(survivalExample)
Doubly robust estimators of the coefficients in the contrast regression as well as their covariance matrix and convergence information
Description
The Newton-Raphson algorithm is used to solve the estimating equation S̄_n(δ) = 0
Usage
twoarmglmcount.dr(
y,
x.cate,
time,
trt,
ps,
f1.predictor,
f0.predictor,
error.maxNR = 0.001,
max.iterNR = 150,
tune = c(0.5, 2)
)
Arguments
y |
Observed outcome; vector of size n |
x.cate |
Matrix of baseline covariates specified in the outcome model; dimension n by p |
time |
Log-transformed person-years of follow-up; vector of size n |
trt |
Treatment received; vector of size n |
ps |
Estimated propensity scores for all observations; vector of size n |
f1.predictor |
Initial predictions of the outcome (expected number of relapses for one unit of exposure time)
conditioned on the covariates x.cate in the treated group (trt = 1); vector of size n |
f0.predictor |
Initial predictions of the outcome (expected number of relapses for one unit of exposure time)
conditioned on the covariates x.cate in the control group (trt = 0); vector of size n |
error.maxNR |
A numerical value > 0 specifying the tolerance on the mean absolute error used to declare convergence of the Newton-Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations of the Newton-Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the Newton-Raphson algorithm. |
Value
coef: Doubly robust estimators of the regression coefficients delta_0; vector of size p + 1 (intercept included)
vcov: Variance-covariance matrix of the estimated coefficients delta_0; matrix of size (p + 1) by (p + 1)
converge: Indicator of whether the Newton-Raphson algorithm converged for delta_0; boolean
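A generic Newton-Raphson sketch is given below to show how error.maxNR and max.iterNR govern the iterations; it solves an arbitrary smooth estimating equation and is not the solver used inside twoarmglmcount.dr().
# Generic Newton-Raphson sketch: stop when the mean absolute value of the
# estimating function falls below error.maxNR, or after max.iterNR iterations.
newton_raphson_sketch <- function(S, jacobian, delta0,
                                  error.maxNR = 0.001, max.iterNR = 150) {
  delta <- delta0
  for (iter in seq_len(max.iterNR)) {
    s <- S(delta)
    if (mean(abs(s)) < error.maxNR) {
      return(list(coef = delta, converge = TRUE, iter = iter))
    }
    delta <- delta - solve(jacobian(delta), s)   # Newton update
  }
  list(coef = delta, converge = FALSE, iter = max.iterNR)
}
# Example: solve S(delta) = delta^3 - 8 = 0 (root at delta = 2)
newton_raphson_sketch(S = function(d) d^3 - 8,
                      jacobian = function(d) matrix(3 * d^2),
                      delta0 = 1)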
Doubly robust estimators of the coefficients in the contrast regression as well as their covariance matrix
Description
Solving the estimating equation S̄_n(δ) = 0
Usage
twoarmglmmean.dr(y, x.cate, trt, ps, f1.predictor, f0.predictor)
Arguments
y |
Observed outcome; vector of size n |
x.cate |
Matrix of baseline covariates specified in the outcome model; dimension n by p |
trt |
Treatment received; vector of size n |
ps |
Estimated propensity scores for all observations; vector of size n |
f1.predictor |
Initial predictions of the outcome (expected number of relapses for one unit of exposure time)
conditioned on the covariates x.cate in the treated group (trt = 1); vector of size n |
f0.predictor |
Initial predictions of the outcome (expected number of relapses for one unit of exposure time)
conditioned on the covariates x.cate in the control group (trt = 0); vector of size n |
Value
coef: Doubly robust estimators of the regression coefficients delta_0; vector of size p + 1 (intercept included)
vcov: Variance-covariance matrix of the estimated coefficients delta_0; matrix of size (p + 1) by (p + 1)
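The returned coef and vcov can be combined in the usual way into Wald-type confidence intervals for the contrast-regression coefficients; the fit object below is a hypothetical placeholder standing in for the output of twoarmglmmean.dr().
# Hypothetical output standing in for twoarmglmmean.dr()
fit <- list(coef = c(0.50, -0.20, 0.10),
            vcov = diag(c(0.040, 0.010, 0.020)))
se <- sqrt(diag(fit$vcov))                 # standard errors of delta_0
lower <- fit$coef - qnorm(0.975) * se      # 95% Wald lower bounds
upper <- fit$coef + qnorm(0.975) * se      # 95% Wald upper bounds
cbind(estimate = fit$coef, lower, upper)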
Doubly robust estimators of the coefficients in the contrast regression as well as their covariance matrix and convergence information
Description
The Newton-Raphson algorithm is used to solve the estimating equation S̄_n(δ) = 0
Usage
twoarmsurv.dr(
ynew,
dnew,
trt,
x.cate,
tau0,
weightsurv,
ps,
f1.predictor,
f0.predictor,
error.maxNR = 0.001,
max.iterNR = 100,
tune = c(0.5, 2)
)
Arguments
ynew |
Truncated survival time; vector of size n |
dnew |
Event indicator after truncation; vector of size n |
trt |
Treatment received; vector of size n |
x.cate |
Matrix of baseline covariates specified in the outcome model; dimension n by p |
tau0 |
The truncation time for defining restricted mean time lost. |
weightsurv |
Estimated inverse probability of censoring weights with truncation for all observations; vector of size n |
ps |
Estimated propensity scores for all observations; vector of size n |
f1.predictor |
Initial predictions of the outcome (restricted mean time lost) conditioned on the covariates x.cate in the treated group (trt = 1); vector of size n |
f0.predictor |
Initial predictions of the outcome (restricted mean time lost) conditioned on the covariates x.cate in the control group (trt = 0); vector of size n |
error.maxNR |
A numerical value > 0 specifying the tolerance on the mean absolute error used to declare convergence of the Newton-Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations of the Newton-Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the Newton-Raphson algorithm. |
Value
coef: Doubly robust estimators of the contrast regression coefficients delta_0; vector of size p.cate + 1 (intercept included)
converge: Indicator of whether the Newton-Raphson algorithm converged for delta_0; boolean
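A hedged sketch of how the truncated inputs ynew and dnew could be constructed from the bundled survival data is shown below; tau0 = 5 is an assumed truncation time, and the construction is only illustrative and may differ from what the package does internally.
library(precmed)
tau0 <- 5                                   # assumed truncation time
y <- survivalExample$y
d <- survivalExample$d
ynew <- pmin(y, tau0)                       # follow-up truncated at tau0
dnew <- ifelse(y >= tau0, 1, d)             # reaching tau0 counts as an observed endpoint
head(cbind(ynew, dnew))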