Title: Tools for Conditional Probability
Version: 0.4.1
Description: Streamlines the calculation of conditional probabilities over numeric ranges in an R data frame. It automates the conversion of numeric data into categorical data for conditional probability calculation, which makes it suited to quick, preliminary data analysis.
License: GPL-3
Depends: R (≥ 3.0.0), dplyr
Encoding: UTF-8
RoxygenNote: 7.3.3
URL: https://github.com/kimmanlui/calc_cond_prob
BugReports: https://github.com/kimmanlui/calc_cond_prob/issues
NeedsCompilation: no
Packaged: 2026-02-24 06:53:58 UTC; admin
Author: Kim Man Lui [aut, cre]
Maintainer: Kim Man Lui <cskmlui@gmail.com>
Repository: CRAN
Date/Publication: 2026-03-02 22:00:02 UTC

Calculate Conditional Probability

Description

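Calculates the conditional probability of an event given one or more columns of a data frame, after dividing each numeric column into ranges.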

Usage

calc_cond_prob(
  clean_data,
  formula_string = NULL,
  range_list,
  cond_evaluation = NULL,
  col_name_list = NULL,
  verbose = FALSE
)

Arguments

clean_data

A data.frame containing input data for analysis. It must include all variables referenced in 'formula_string' and 'col_name_list'.

formula_string

An optional formula string in the format 'y ~ x1 + x2 + ...', for example 'exam_lang_score >= 80 ~ age + height'. It specifies the relationship between the dependent and independent variables. If provided, 'cond_evaluation' and 'col_name_list' are derived from it.

range_list

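A list of ranges or boundaries, one entry per column in 'col_name_list'. As the examples show, an entry can be a single number giving the number of groups (e.g. 3), a vector of boundaries (e.g. c(5, 6.5, 10)), or a list of explicit ranges (e.g. list(c(5, 6.5), c(6.5, 10))). This parameter is mandatory.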

cond_evaluation

An optional string for the conditional evaluation expression. If not provided, it defaults to the left-hand side of 'formula_string'.

col_name_list

An optional list of column names for the analysis. If not specified, it is derived from 'formula_string'.

verbose

A logical flag controlling verbosity. When set to TRUE, additional messages are printed to help trace the computation steps.
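
The way 'cond_evaluation' and 'col_name_list' can be derived from 'formula_string' may be easier to see in code. The following is a minimal base R sketch, not the package's implementation; the helper name split_formula_sketch is hypothetical, and it assumes the string contains exactly one '~' and that column names are separated by '+'.

```r
# Hypothetical helper, not part of the package: split a formula string into
# its condition (left-hand side) and its column names (right-hand side).
split_formula_sketch <- function(formula_string) {
  parts <- strsplit(formula_string, "~", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 2)  # assumption: exactly one '~'
  cond_evaluation <- trimws(parts[1])
  col_name_list <- trimws(strsplit(parts[2], "+", fixed = TRUE)[[1]])
  list(cond_evaluation = cond_evaluation, col_name_list = col_name_list)
}

parsed <- split_formula_sketch("exam_lang_score >= 80 ~ age + height")
parsed$cond_evaluation  # "exam_lang_score >= 80"
parsed$col_name_list    # "age" "height"
```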

Value

A list containing the results of the conditional probability calculation, the good chance evaluation, and the adjusted range list.

Examples

## Prepare some sample data
df <- data.frame(
  exam_lang_score = c(80, 88, 85, 82, 34, 34),
  age = c(6, 7, 8, 6, 7, 8),
  height = c(5, 6, 6, 7, 5, 7)
)
## Find P(exam_lang_score >= 80 ~ age), where age is divided into 3 groups.
calc_cond_prob(df, "exam_lang_score >= 80  ~ age ", range_list = list(3))

## Find P(exam_lang_score >= 80 ~ age),
## where age is divided into two groups, (5, 6.5) and (6.5, 10).
calc_cond_prob(df, "exam_lang_score >= 80  ~ age ", range_list = list(list(c(5, 6.5), c(6.5, 10))))
## The above is the same as below.
calc_cond_prob(df, "exam_lang_score >= 80  ~ age ", range_list = list(c(5, 6.5, 10)))
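
For readers who want to see the underlying idea, the conversion of a numeric column into ranges followed by a per-range probability can be sketched in base R. This is an illustration only and does not call the package. Whether calc_cond_prob treats range ends as open or closed is not stated here, so the sketch simply uses the default of cut(), which is right-closed intervals.

```r
df <- data.frame(
  exam_lang_score = c(80, 88, 85, 82, 34, 34),
  age = c(6, 7, 8, 6, 7, 8)
)

# Explicit boundaries 5, 6.5, 10 give two age groups.
# Assumption: right-closed intervals (a, b], the cut() default.
age_group <- cut(df$age, breaks = c(5, 6.5, 10))

# The event of interest, evaluated row by row.
hit <- df$exam_lang_score >= 80

# Estimated P(exam_lang_score >= 80 | age group): share of hits per group.
cond_prob <- tapply(hit, age_group, mean)
cond_prob  # 1 for the lower age group, 0.5 for the upper one
```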

Filter Data Based on Odds

Description

Filter Data Based on Odds

Usage

goodchance(df, col_name = "odd", upper = 0.75, lower = 0.25)

Arguments

df

A data.frame containing the data to be evaluated. It must include the column named by 'col_name', which holds the odds values.

col_name

A string giving the name of the column in 'df' that contains the odds values. The default is 'odd'.

upper

A numeric threshold for the upper bound of the odds. Rows whose odds value is greater than or equal to this threshold are included in the output. The default value is 0.75.

lower

A numeric threshold for the lower bound of the odds. Rows whose odds value is less than or equal to this threshold and greater than 0 are also included in the output. The default value is 0.25.

Value

A filtered data.frame containing only the rows whose odds meet the specified conditions: at or above the upper threshold, or at or below the lower threshold and greater than 0.
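
The documented filter rule can be restated in a few lines of base R. This is a sketch of the rule as described in the arguments, not the package source; the function name filter_by_odds_sketch and the toy data are hypothetical, and dropping rows with missing odds is an assumption.

```r
# Hypothetical restatement of the documented rule; not the package source.
filter_by_odds_sketch <- function(df, col_name = "odd", upper = 0.75, lower = 0.25) {
  odds <- df[[col_name]]
  # Keep high odds, or low odds that are still strictly positive.
  keep <- odds >= upper | (odds <= lower & odds > 0)
  # which() drops rows where the comparison is NA (an assumption).
  df[which(keep), , drop = FALSE]
}

toy <- data.frame(
  group = c("a", "b", "c", "d"),
  odd = c(0.9, 0.5, 0.1, 0),
  stringsAsFactors = FALSE
)
filtered <- filter_by_odds_sketch(toy)
filtered$group  # "a" "c": 0.5 is in the middle band, 0 is excluded
```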

Examples

## Prepare some sample data
df <- data.frame(
  exam_lang_score = c(80, 88, 85, 82, 34, 34),
  age = c(6, 7, 8, 6, 7, 8),
  height = c(5, 6, 6, 7, 5, 7)
)
## Find P(exam_lang_score >= 80 ~ age + height),
## where age is divided into 3 groups and height into 4 groups.
res <- calc_cond_prob(df, "exam_lang_score >= 80 ~ age + height", range_list = list(3, 4))
## Use the results to calculate the conditional probability for
## the power set of 'age' and 'height'.
summary_result_list <- shortSummary(res[[1]], "age + height ", combination = 1)
## Extract the rows with high and low odds from the results
## produced by calc_cond_prob().
lapply(summary_result_list, goodchance, upper = 0.7, lower = 0.25)

Generate a Short Summary of Grouped Data from the Output of calc_cond_prob

Description

Generate a Short Summary of Grouped Data from the Output of calc_cond_prob

Usage

shortSummary(df, coln = "Weekday , wkhwk.c.bp", combination = 1)

Arguments

df

A data.frame containing the data to be summarized. It must include the columns specified in the 'coln' parameter for grouping.

coln

A comma-separated string of column names to be used for grouping in the summary. Whitespace will be trimmed from the specified column names. For example: "Weekday, wkhwk.c.bp".

combination

An integer indicating how to handle column combinations for the summary. When set to 1, the function generates all possible combinations of the columns specified in 'coln'. For any other value, only the specified columns will be used.

Value

A list of data.frames containing the summarized results for each group based on the specified columns. Each data.frame includes the specified grouping columns, the sum of hits ('hit_sum'), the sum of totals ('total_sum'), and the calculated odds ('odd').
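
The shape of such a result can be illustrated with base R. The sketch below is not the package's implementation and rests on several assumptions that this manual does not confirm: the input has per-cell count columns named hit and total (hypothetical names), 'odd' is taken to be hit_sum / total_sum, and "all possible combinations" is read as every non-empty subset of the grouping columns.

```r
# Toy input with hypothetical per-cell counts; the real columns of res[[1]]
# are not documented in this manual.
cells <- data.frame(
  age = c("young", "young", "old", "old"),
  height = c("short", "tall", "short", "tall"),
  hit = c(1, 2, 0, 1),
  total = c(2, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Split a comma-separated column string and trim whitespace.
cols <- trimws(strsplit("age , height", ",", fixed = TRUE)[[1]])

# Every non-empty subset of the grouping columns: age; height; age + height.
subsets <- unlist(
  lapply(seq_along(cols), function(k) combn(cols, k, simplify = FALSE)),
  recursive = FALSE
)

# One summary data.frame per subset.
summaries <- lapply(subsets, function(by_cols) {
  out <- aggregate(cells[c("hit", "total")], by = cells[by_cols], FUN = sum)
  names(out)[names(out) == "hit"] <- "hit_sum"
  names(out)[names(out) == "total"] <- "total_sum"
  out$odd <- out$hit_sum / out$total_sum  # assumed definition of 'odd'
  out
})
names(summaries) <- vapply(subsets, paste, character(1), collapse = " + ")

summaries[["age"]]
```

Each element of the resulting list groups by a different subset of columns, so a single call yields the marginal summaries as well as the joint one.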

Examples

## Prepare some sample data
df <- data.frame(
  exam_lang_score = c(80, 88, 85, 82, 34, 34),
  age = c(6, 7, 8, 6, 7, 8),
  height = c(5, 6, 6, 7, 5, 7)
)
## Find P(exam_lang_score >= 80 ~ age + height),
## where age is divided into 3 groups and height into 4 groups.
res <- calc_cond_prob(df, "exam_lang_score >= 80 ~ age + height", range_list = list(3, 4))
## Use the results to calculate the conditional probability for
## the power set of 'age' and 'height'.
shortSummary(res[[1]], "age + height ", combination = 1)