Title: An Elegant Approach to Summarizing Clinical Data
Version: 0.1.0
Description: Streamlines the analysis of clinical data by automatically selecting appropriate statistical descriptions and inference methods based on variable types. For method details see Motulsky H J (2016) <https://www.graphpad.com/guides/prism/10/statistics/index.htm> and D'Agostino R B (1971) <doi:10.1093/biomet/58.2.341>.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.2
Imports: car, cli, dplyr, fBasics, glue, qqplotr, rlang, stats, stringr, tibble, tidyplots, tidyr
Suggests: knitr, rmarkdown
VignetteBuilder: knitr
Depends: R (≥ 4.1.0)
NeedsCompilation: no
Packaged: 2025-07-10 07:33:20 UTC; Lixiang
Author: Xiang Li [aut, cre]
Maintainer: Xiang Li <htqqdd@126.com>
Repository: CRAN
Date/Publication: 2025-07-15 07:00:02 UTC

Add statistical test results to summary data

Description

Calculates and appends p-values with optional statistical details to a summary table based on variable types and group comparisons. Handles both continuous and categorical variables with appropriate statistical tests.

Usage

add_p(
  summary,
  digit = 3,
  asterisk = FALSE,
  add_method = FALSE,
  add_statistic_name = FALSE,
  add_statistic_value = FALSE
)

Arguments

summary

A data frame that has been processed by add_summary().

digit

A numeric value specifying the number of decimal places. Accepts:

  • 3: format to 3 decimal places (default)

  • 4: format to 4 decimal places

asterisk

Logical indicating whether to show asterisk significance markers.

add_method

Control parameter for display of statistical methods. Accepts:

  • 'code': Show methods as codes assigned in order of appearance

  • TRUE/'true': Show method text

  • FALSE/'false': Do not show method text

add_statistic_name

Logical indicating whether to include test statistic names.

add_statistic_value

Logical indicating whether to include test statistic values.

Value

A data frame merged with statistical test results, containing:

  • Variable names

  • Summary

  • Formatted p-values

  • Optional method names/codes

  • Optional statistic names/values

Examples

# `summary` is a data frame processed by `add_var()` and `add_summary()`:
data <- add_var(iris, var = c("Sepal.Length", "Species"), group = "Species")
summary <- add_summary(data)

# Add statistical test results
result <- add_p(summary)
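
The idea of choosing a test from a variable's properties can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `p_continuous_sketch` is a hypothetical helper, and the `normal` and `equal_var` flags stand in for the outcome of checks such as normal_test() and equal_test(). The mapping from flags to tests shown here is an assumed, conventional one.

```r
# Hypothetical helper: pick a test for one continuous variable.
# The flag-to-test mapping below is an assumption, not taken from the package.
p_continuous_sketch <- function(data, var, group, normal, equal_var) {
  f <- reformulate(group, response = var)  # builds e.g. Sepal.Length ~ Species
  if (!normal) {
    # Rank-based comparison when normality is not assumed
    test <- kruskal.test(f, data = data)
  } else {
    # One-way ANOVA, or Welch's version when variances are unequal
    test <- oneway.test(f, data = data, var.equal = equal_var)
  }
  list(method = test$method, p_value = test$p.value)
}

res_p <- p_continuous_sketch(iris, "Sepal.Length", "Species",
                             normal = TRUE, equal_var = FALSE)
res_p$method
```

Returning the method name alongside the p-value mirrors the optional method column described above.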


Add summary statistics to an add_var object

Description

This function generates summary statistics for variables from a data frame that has been processed by add_var(), with options to format outputs.

Usage

add_summary(
  data,
  add_overall = TRUE,
  continuous_format = NULL,
  norm_continuous_format = "{mean} ± {SD}",
  unnorm_continuous_format = "{median} ({Q1}, {Q3})",
  categorical_format = "{n} ({pct})",
  binary_show = "last",
  digit = 2
)

Arguments

data

A data frame that has been processed by add_var().

add_overall

Logical indicating whether to include an "Overall" summary column. Defaults to TRUE.

continuous_format

Format string that overrides both the normal and non-normal continuous formats. Accepted placeholders are {mean}, {SD}, {median}, {Q1}, {Q3}.

norm_continuous_format

Format string for normally distributed continuous variables. Default is "{mean} ± {SD}". Accepted placeholders are the same as for continuous_format.

unnorm_continuous_format

Format string for non-normal continuous variables. Default is "{median} ({Q1}, {Q3})". Accepted placeholders are the same as for continuous_format.

categorical_format

Format string for categorical variables. Default is "{n} ({pct})". Accepted placeholders are {n} and {pct}.

binary_show

Display option for binary variables:

  • "first": show only the first level

  • "last": show only the last level (default)

  • "all": show all levels

digit

A numeric value specifying the number of decimal places.

Value

A data frame containing summary statistics with the following columns:

Examples

# `data` is a data frame processed by `add_var()`:
data <- add_var(iris, var = c("Sepal.Length", "Species"), group = "Species")
# Add summary statistics
result <- add_summary(data, add_overall = TRUE)
result <- add_summary(data, continuous_format = "{mean}, ({SD})")
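
To make the placeholder format strings concrete, here is a minimal base R sketch of how {mean}, {SD}, {median}, {Q1} and {Q3} could be filled in. `fill_format` and `vals` are hypothetical names introduced for illustration; how the package itself performs the substitution is not shown in this manual.

```r
# Hypothetical helper: replace each {name} placeholder with a formatted value.
fill_format <- function(fmt, values, digit = 2) {
  for (name in names(values)) {
    shown <- formatC(values[[name]], format = "f", digits = digit)
    fmt <- gsub(paste0("{", name, "}"), shown, fmt, fixed = TRUE)
  }
  fmt
}

x <- iris$Sepal.Length
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
vals <- list(mean = mean(x), SD = sd(x), median = median(x), Q1 = q[1], Q3 = q[2])

fill_format("{median} ({Q1}, {Q3})", vals)  # "5.80 (5.10, 6.40)"
fill_format("{mean} ({SD})", vals)          # "5.84 (0.83)"
```

Any text outside the braces is kept as written, which is why separators such as "±" or parentheses can be chosen freely in the format string.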


Prepare variables for add_summary

Description

This function prepares a dataset for statistical analysis by categorizing variables into continuous and categorical types. It automatically checks normality, equality of variances, and expected-frequency assumptions.

Usage

add_var(data, var = NULL, group = "group", norm = "auto", center = "median")

Arguments

data

A data frame containing the variables to analyze, with variables in columns and observations in rows.

var

A character vector of variable names to include. If NULL (the default), all columns except the group column are used.

group

A character string specifying the grouping variable in data. Defaults to 'group'.

norm

Control parameter for normality tests. Accepts:

  • 'auto' (default): Decide automatically based on p-values; behaves like 'ask' when n > 1000

  • 'ask': Show p-values, draw Q-Q plots, and prompt for a decision

  • TRUE/'true': Always assume the data are normally distributed

  • FALSE/'false': Always assume the data are not normally distributed

center

A character string specifying the center to use in Levene's test for equality of variances. Default is 'median', which is more robust than the mean.

Value

A modified data frame with an attribute 'add_var' containing a list of categorized variables and their properties:

Examples

data <- add_var(iris, var = c("Sepal.Length", "Species"), group = "Species")
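
The first step described above, splitting variables into continuous and categorical, can be approximated in base R as follows. `classify_sketch` is a hypothetical helper, and treating every numeric column as continuous is an assumption made for this sketch; the package may apply additional rules.

```r
# Hypothetical helper: split variables by column class.
classify_sketch <- function(data, var = NULL, group = "group") {
  # Mirror the documented default: use all columns except the group column
  if (is.null(var)) var <- setdiff(names(data), group)
  is_num <- vapply(data[var], is.numeric, logical(1))
  list(continuous = var[is_num], categorical = var[!is_num])
}

vars <- classify_sketch(iris, group = "Species")
vars$continuous
```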


Test for Equality of Variances

Description

Performs Levene's test to assess equality of variances between groups.

Usage

equal_test(data, var, group, center = "median")

Arguments

data

A data frame containing the variables to be tested.

var

A character string specifying the numeric variable in data to test.

group

A character string specifying the grouping variable in data.

center

A character string specifying the center to use in Levene's test. Default is 'median', which is more robust than the mean.

Value

Logical value:

Methodology for Equality of Variances

Levene's test is the default method in SPSS. The original Levene's test uses center = mean; here center = median is the default because it gives a more robust test.

Examples

equal_test(iris, "Sepal.Length", "Species")
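
For readers who want to see what the center argument changes, Levene's test can be written in base R as a one-way ANOVA on absolute deviations from each group's center. This is a sketch, not the package's code: `levene_sketch` is a hypothetical helper, and the 0.05 cut-off used to turn the p-value into a logical is an assumption.

```r
# Hypothetical helper, base R only.
levene_sketch <- function(data, var, group, center = c("median", "mean")) {
  center <- match.arg(center)
  y <- data[[var]]
  g <- factor(data[[group]])
  center_fun <- if (center == "median") median else mean
  # Absolute deviation of each observation from its own group's center
  z <- abs(y - ave(y, g, FUN = center_fun))
  # Levene's test is a one-way ANOVA on those deviations
  fit <- anova(lm(z ~ g))
  list(statistic = fit[["F value"]][1], p_value = fit[["Pr(>F)"]][1])
}

lev <- levene_sketch(iris, "Sepal.Length", "Species")
# Assumed decision rule: variances are treated as equal when p > 0.05
lev$p_value > 0.05
```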


Format p-values with significance markers

Description

Formats p-values as strings with specified precision and optional significance asterisks.

Usage

format_p(p, digit = 3, asterisk = FALSE)

Arguments

p

A numeric p-value between 0 and 1.

digit

A numeric value specifying the number of decimal places. Accepts:

  • 3: format to 3 decimal places (default)

  • 4: format to 4 decimal places

asterisk

Logical indicating whether to return significance asterisks.

Value

A character string containing the formatted p-value or the significance asterisks.

Examples

format_p(0.00009, 4)
format_p(0.03, 3)
format_p(0.02, asterisk = TRUE)
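
A rough base R equivalent of this kind of formatting is sketched below. `format_p_sketch` is a hypothetical helper; both the "<" notation for values below the display precision and the asterisk cut-offs (0.05, 0.01, 0.001) are conventional assumptions and may differ from what format_p() actually returns.

```r
# Hypothetical helper, not the package's implementation.
format_p_sketch <- function(p, digit = 3, asterisk = FALSE) {
  stopifnot(is.numeric(p), length(p) == 1, p >= 0, p <= 1)
  if (asterisk) {
    # Assumed conventional cut-offs: *** < 0.001, ** < 0.01, * < 0.05
    if (p < 0.001) return("***")
    if (p < 0.01) return("**")
    if (p < 0.05) return("*")
    return("")
  }
  threshold <- 10^(-digit)
  if (p < threshold) {
    # Report values below the display precision as an inequality
    return(paste0("<", formatC(threshold, format = "f", digits = digit)))
  }
  formatC(p, format = "f", digits = digit)
}

format_p_sketch(0.00009, 4)             # "<0.0001"
format_p_sketch(0.03, 3)                # "0.030"
format_p_sketch(0.02, asterisk = TRUE)  # "*"
```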


Perform normality test on a variable

Description

Conducts normality tests for a specified variable, optionally by group. Supports automatic testing and interactive visualization.

Usage

normal_test(data = NULL, var = NULL, group = NULL, norm = "auto")

Arguments

data

A data frame containing the variables to be tested.

var

A character string specifying the numeric variable in data to test.

group

A character string specifying the grouping variable in data. If NULL, all observations are treated as one group.

norm

Control parameter for test behavior. Accepts:

  • 'auto' (default): Decide automatically based on p-values; behaves like 'ask' when n > 1000

  • 'ask': Show p-values, draw Q-Q plots, and prompt for a decision

  • TRUE/'true': Always returns TRUE

  • FALSE/'false': Always returns FALSE

Value

A logical value:

Methodology for p-values

Automatically selects test based on sample size per group:

Examples

normal_test(iris, "Sepal.Length", "Species", norm = "auto")
normal_test(iris, "Sepal.Length", "Species", norm = TRUE)
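
A per-group normality check can be sketched in base R as shown here. The sketch deliberately uses only the Shapiro-Wilk test; the package's rule for choosing a test by sample size is not reproduced. `normal_sketch` is a hypothetical helper, and the rule "normal only if every group has p > 0.05" is an assumption.

```r
# Hypothetical helper: Shapiro-Wilk p-value for each group.
normal_sketch <- function(data, var, group = NULL, alpha = 0.05) {
  y <- data[[var]]
  # With no grouping variable, treat all observations as one group
  g <- if (is.null(group)) rep("all", length(y)) else data[[group]]
  # shapiro.test() requires between 3 and 5000 non-missing values per group
  p_values <- tapply(y, g, function(v) shapiro.test(v)$p.value)
  # Assumed decision rule: every group must pass
  list(p_values = p_values, normal = all(p_values > alpha))
}

nt <- normal_sketch(iris, "Sepal.Length", "Species")
nt$p_values
nt$normal
```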


Check Sample Size Adequacy for Chi-Squared Test

Description

This function determines if a contingency table meets the expected frequency assumptions for a valid chi-squared test. It categorizes the data into "not_small", "small", or "very_small" based on sample size and expected frequencies.

Usage

small_test(data, var, group)

Arguments

data

A data frame containing the variables to be tested.

var

A character string specifying the factor variable in data to test.

group

A character string specifying the grouping variable in data.

Value

A character string with one of three values:

Examples

df <- data.frame(
  category = factor(c("A", "B", "A", "B")),
  group    = factor(c("X", "X", "Y", "Y"))
)
small_test(data = df, var = "category", group = "group")
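
The expected-frequency check behind this classification can be illustrated in base R. `small_sketch` is a hypothetical helper, and its cut-offs (total n of at least 40, expected counts of at least 5 or at least 1) follow a common textbook convention; they are assumptions and may not match the thresholds used by small_test().

```r
# Hypothetical helper with assumed cut-offs.
small_sketch <- function(data, var, group) {
  tab <- table(data[[var]], data[[group]])
  n <- sum(tab)
  # Expected count per cell under independence: row total * column total / n
  expected <- outer(rowSums(tab), colSums(tab)) / n
  if (n >= 40 && all(expected >= 5)) {
    "not_small"
  } else if (n >= 40 && all(expected >= 1)) {
    "small"
  } else {
    "very_small"
  }
}

df <- data.frame(
  category = factor(c("A", "B", "A", "B")),
  group    = factor(c("X", "X", "Y", "Y"))
)
small_sketch(df, "category", "group")  # "very_small" under these cut-offs (n = 4)
```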