Type: | Package |
Title: | CURE (Cumulative Residual) Plots |
Version: | 1.1.1 |
Description: | Creates 'ggplot2' Cumulative Residual (CURE) plots to check the goodness-of-fit of a count model; or the tables to create a customized version. A dataset of crashes in Washington state is available for illustrative purposes. |
License: | AGPL (≥ 3) |
Encoding: | UTF-8 |
LazyData: | true |
URL: | https://github.com/gbasulto/cureplots, https://gbasulto.github.io/cureplots/ |
BugReports: | https://github.com/gbasulto/cureplots/issues |
Imports: | dplyr, ggplot2, glue |
RoxygenNote: | 7.3.2 |
Depends: | R (≥ 2.10) |
Suggests: | testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
Language: | en-US |
NeedsCompilation: | no |
Packaged: | 2024-10-30 18:19:56 UTC; basulto |
Author: | Jonathan Wood |
Maintainer: | Guillermo Basulto-Elias <basulto@iastate.edu> |
Repository: | CRAN |
Date/Publication: | 2024-10-30 18:30:02 UTC |
Calculate CURE Dataframe
Description
Calculate CURE Dataframe
Usage
calculate_cure_dataframe(covariate_values, residuals)
Arguments
covariate_values |
Name of the covariate to be plotted, with or without quotes. |
residuals |
Residuals. |
Value
A data frame with five columns: independent variable, residuals, cumulative residuals, lower confidence interval limit, and upper confidence interval limit.
Examples
set.seed(2000)
## Define parameters
beta <- c(-1, 0.3, 3)
## Simulate independent variables
n <- 900
AADT <- c(runif(n, min = 2000, max = 150000))
nlanes <- sample(x = c(2, 3, 4), size = n, replace = TRUE)
LNAADT <- log(AADT)
## Simulate dependent variable
theta <- exp(beta[1] + beta[2] * LNAADT + beta[3] * nlanes)
y <- rpois(n, theta)
## Fit model
mod <- glm(y ~ LNAADT + nlanes, family = poisson)
## Calculate residuals
res <- residuals(mod, type = "response")
## Calculate CURE plot data
cure_df <- calculate_cure_dataframe(AADT, res)
head(cure_df)
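The five columns described under Value can be illustrated with a small base-R sketch. This is not the package's implementation: `sketch_cure_table()` is a hypothetical helper, the output column names `covariate` and `residual` are placeholders, and the variance adjustment and the 1.96 multiplier follow a commonly used CURE-plot formulation that is assumed here rather than taken from this manual.

```r
## Hypothetical helper, NOT part of cureplots: a base-R sketch of a CURE table.
sketch_cure_table <- function(covariate_values, residuals) {
  ## Sort the residuals by increasing covariate value
  ord <- order(covariate_values)
  x <- covariate_values[ord]
  res <- residuals[ord]

  ## Cumulative residuals and cumulative squared residuals
  cumres <- cumsum(res)
  cum_sq <- cumsum(res^2)

  ## Assumed adjustment: the band widens and then closes at the last point
  sd_adj <- sqrt(cum_sq * (1 - cum_sq / cum_sq[length(cum_sq)]))

  data.frame(
    covariate = x,
    residual = res,
    cumres = cumres,
    lower = -1.96 * sd_adj,  # assumed multiplier
    upper = 1.96 * sd_adj
  )
}

set.seed(1)
x <- runif(50, min = 1, max = 10)
res <- rnorm(50)
sketch <- sketch_cure_table(x, res)
head(sketch)
```

Under these assumptions the limits are symmetric around zero and meet at the largest covariate value, so a cumulative residual curve that leaves the band points to a possible lack of fit over that covariate range.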
CURE Plot
Description
CURE Plot
Usage
cure_plot(x, covariate = NULL, n_resamples = 0)
Arguments
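x |
Either a data frame produced with calculate_cure_dataframe() or a fitted count model such as a glm object, as in the examples below. |
covariate |
Required when x is a fitted model. Name of the covariate to plot against, in quotes (e.g., "LNAADT"). |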
n_resamples |
Number of resamples to overlay on CURE plot. Zero is the default. |
Value
A CURE plot generated with ggplot2.
Examples
## basic example code
set.seed(2000)
## Define parameters
beta <- c(-1, 0.3, 3)
## Simulate independent variables
n <- 900
AADT <- c(runif(n, min = 2000, max = 150000))
nlanes <- sample(x = c(2, 3, 4), size = n, replace = TRUE)
LNAADT <- log(AADT)
## Simulate dependent variable
theta <- exp(beta[1] + beta[2] * LNAADT + beta[3] * nlanes)
y <- rpois(n, theta)
## Fit model
mod <- glm(y ~ LNAADT + nlanes, family = poisson)
## Calculate residuals
res <- residuals(mod, type = "response")
## Calculate CURE plot data
cure_df <- calculate_cure_dataframe(AADT, res)
head(cure_df)
## Providing CURE data frame
cure_plot(cure_df)
## Providing glm object
cure_plot(mod, "LNAADT")
## Providing glm object and overlaying resampled cumulative residuals
cure_plot(mod, "LNAADT", n_resamples = 3)
Resample residuals
Description
Resample residuals to compute several cumulative residual curves. The function receives the covariate values, the residuals, and the number of resamples; shuffles the residuals (i.e., draws a sample of the same size without replacement); and returns a stacked data frame.
Usage
resample_residuals(covariate_values, residuals, n_resamples)
Arguments
covariate_values |
Covariate values. |
residuals |
Residuals. |
n_resamples |
Number of times to sample the residuals. |
Value
Data frame of stacked
Examples
library(cureplots)
library(ggplot2)
## basic example
set.seed(2000)
## Define parameters.
beta <- c(-1, 0.3, 3)
## Simulate independent variables
n <- 900
AADT <- c(runif(n, min = 2000, max = 150000))
nlanes <- sample(x = c(2, 3, 4), size = n, replace = TRUE)
LNAADT <- log(AADT)
## Simulate dependent variable
theta <- exp(beta[1] + beta[2] * LNAADT + beta[3] * nlanes)
y <- rpois(n, theta)
## Fit model
mod <- glm(y ~ LNAADT + nlanes, family = poisson)
## Calculate residuals
res <- residuals(mod, type = "response")
## Calculate CURE plot data
cure_df <- calculate_cure_dataframe(AADT, res)
resampled_residuals_tbl <- resample_residuals(AADT, res, n_resamples = 3)
ggplot(data = cure_df) +
  aes(AADT, cumres) +
  geom_line(
    data = resampled_residuals_tbl,
    aes(group = sample),
    col = "grey"
  ) +
  geom_line(color = "darkgreen", linewidth = 0.8) +
  geom_line(
    aes(y = lower),
    color = "magenta",
    linetype = "dashed",
    linewidth = 0.8
  ) +
  geom_line(
    aes(y = upper),
    color = "blue",
    linetype = "dashed",
    linewidth = 0.8
  ) +
  theme_bw()
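The shuffling step described above can be sketched in base R. `sketch_resample()` is a hypothetical helper written for illustration, not the package's code; the `sample` and `cumres` column names mirror those used in the plotting example, while `covariate` is a placeholder name.

```r
## Hypothetical helper, NOT part of cureplots: shuffle residuals and stack curves.
sketch_resample <- function(covariate_values, residuals, n_resamples) {
  ## Covariate values in increasing order, shared by every resampled curve
  x <- sort(covariate_values)

  curves <- lapply(seq_len(n_resamples), function(i) {
    ## sample() with no size argument returns a random permutation,
    ## i.e. a sample of the same size drawn without replacement
    shuffled <- sample(residuals)
    data.frame(sample = i, covariate = x, cumres = cumsum(shuffled))
  })

  ## Stack the per-resample data frames into one long data frame
  do.call(rbind, curves)
}

set.seed(42)
x <- runif(20)
res <- rnorm(20)
stacked <- sketch_resample(x, res, n_resamples = 3)
table(stacked$sample)
```

Because each resample is a permutation of the same residuals, every curve in this sketch ends at the same value, the sum of the residuals; only the path taken to get there differs.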
Washington Road Crashes
Description
Crashes on Washington primary roads from 2016, 2017, and 2018. Data acquired from Washington Department of Transportation through the Highway Safety Information System (HSIS).
Usage
washington_roads
Format
The data frame washington_roads has 1,501 rows and 9 columns:
- ID
Anonymized road ID. Factor.
- Year
Year. Integer.
- AADT
Annual Average Daily Traffic (AADT). Double.
- Length
Segment length in miles. Double.
- Total_crashes
Total crashes. Integer.
- lnaadt
Natural logarithm of AADT. Double.
- lnlength
Natural logarithm of length in miles. Double.
- speed50
Indicator of whether the speed limit is 50 mph or greater. Binary.
- ShouldWidth04
Indicator of whether the shoulder is 4 feet or wider. Binary.
Source
<https://highways.dot.gov/research/safety/hsis>
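As an illustration of how this dataset might be used with the functions documented above, the following sketch fits a Poisson model and draws a CURE plot against lnaadt. It assumes cureplots is installed and that the column names match the Format list; the model formula is an arbitrary illustrative choice, not a recommended specification.

```r
library(cureplots)

## Illustrative model only: the choice of predictors is an assumption
mod <- glm(
  Total_crashes ~ lnaadt + lnlength,
  data = washington_roads,
  family = poisson
)

## Response residuals, as in the examples for calculate_cure_dataframe()
res <- residuals(mod, type = "response")

## Pass the covariate as a plain vector, mirroring the documented usage
lnaadt <- washington_roads$lnaadt
cure_df <- calculate_cure_dataframe(lnaadt, res)
head(cure_df)

## Plot from the CURE data frame
cure_plot(cure_df)
```

Building the data frame first and then calling cure_plot() on it keeps the intermediate table available for inspection or for a customized ggplot2 figure.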