Type: Package
Title: Tools for Population Health Management Analytics
Version: 1.0.2
Maintainer: Asif Laldin <laldin.asif@gmail.com>
Description: Created for population health analytics and monitoring. The functions in this package work best with patient-level, Master Patient Index-like datasets. Built to be used by NHS bodies and other health service providers.
License: AGPL (≥ 3)
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.1.2
Imports: ggplot2, dplyr, scales, janitor, readr, utils, ggthemes, magrittr, readxl, ggtext, DBI, odbc, rlang, tibble
Suggests: knitr, rmarkdown, testthat (≥ 3.0.0), kableExtra
Config/testthat/edition: 3
VignetteBuilder: knitr
Depends: R (≥ 2.10)
Language: en-gb
NeedsCompilation: no
Packaged: 2022-02-20 12:52:44 UTC; ald04
Author: Asif Laldin [aut, cre], Gary Hutson [aut]
Repository: CRAN
Date/Publication: 2022-02-20 13:10:02 UTC

Pipe operator

Description

See magrittr::%>% for details.

Usage

lhs %>% rhs

Arguments

lhs

A value or the magrittr placeholder.

rhs

A function call using the magrittr semantics.

Value

The result of calling 'rhs(lhs)'.
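
As a minimal illustration of the semantics described above, the sketch below checks that a piped expression gives the same result as the equivalent nested call. It assumes the magrittr package is installed; the vector x is made up for the example.

```r
library(magrittr)

# Illustrative input only
x <- c(1, 4, 9)

# lhs %>% rhs passes lhs as the first argument of rhs
piped <- x %>% sqrt() %>% sum()

# The same computation written as nested calls
nested <- sum(sqrt(x))

print(identical(piped, nested))
```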


PopHealthData - Population health data for testing functions

Description

Population health NHS data for use with the package; it allows the various metrics to be calculated.

Usage

PopHealthData

Format

A small dataset with 1000 observations (rows) and 8 columns, as described hereunder:

Sex

The identifiable sex of the patient

Smoker

Indicates if the patient is a smoker

Diabetes

Flag to indicate if patient has a type of diabetes

AgeBand

The age band of the patient when they came into contact with the service

IMD_Decile

The decile of indices of multiple deprivation: https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019

Ethnicity

The identifiable ethnicity of the patient

Locality

The region where the patient lives - sampled from Gloucestershire Clinical Commissioning Group

PrimaryCareNetwork

The primary care network of the patient


Age Band Creation: Create a new column of 5 year Age Bands from an integer column

Description

Creates a new column of 5-year age bands from an integer column.

Usage

age_bandizer(df, Age_col)

Arguments

df

A tidy dataframe in standard Master Patient Index format, e.g. SangerTools::master_patient_index

Age_col

An integer column within df. NAs must be removed or imputed prior to running this function.

Value

A dataframe with ncol(df) + 1 columns; the new column is named Ageband and is a factor with defined levels.

Examples

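library(SangerTools)
library(dplyr)
health_data <- SangerTools::master_patient_index
# Add a factor column of 5-year age bands derived from the integer Age column
health_data <- SangerTools::age_bandizer(df = health_data, Age_col = Age)
glimpse(health_data)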

Create age bands from a numerical column

Description

An alternative age banding function that gives users greater flexibility in defining band size. This function uses base R standard evaluation. It currently supports band sizes of 2, 5, 10 and 20. The input column, Age_col, should be numeric and must not contain NAs; if either of these conditions is violated the function will terminate.

Usage

age_bandizer_2(df, Age_col, Age_band_size = 5)

Arguments

df

A dataframe with a numerical column denoting Age.

Age_col

A numerical column within 'df'; passed with quotation marks.

Age_band_size

The size of the age band to use. Defaults to 5; accepts the values 2, 5, 10 or 20.

Value

A dataframe containing a new column 'Ageband' which has factor levels defined.

Examples

## Not run: 
library(SangerTools)
df <- data.frame(Age = sample(x = 0:120, size = 100, replace = TRUE))
df_agebanded <- age_bandizer_2(
  df = df,
  Age_col = "Age",
  Age_band_size = 5
)
print(df_agebanded)

## End(Not run)
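
To make the banding logic concrete, here is a minimal base R sketch of fixed-width age banding with input validation. It is not the SangerTools source: the helper name band_ages, the "lower-upper" label format and the assumption that ages are non-negative are all illustrative choices.

```r
# Hypothetical helper, not part of SangerTools
band_ages <- function(df, age_col, band_size = 5) {
  ages <- df[[age_col]]
  # Stop early if the column is not numeric, contains NAs,
  # or the band size is not one of the supported values
  stopifnot(is.numeric(ages), !anyNA(ages), band_size %in% c(2, 5, 10, 20))
  # Lower bound of each band, assuming ages are non-negative
  lower <- seq(0, max(ages), by = band_size)
  labels <- paste0(lower, "-", lower + band_size - 1)
  # Left-closed intervals: [0, 10), [10, 20), ... ; the last band is open-ended
  df$Ageband <- cut(ages, breaks = c(lower, Inf), labels = labels, right = FALSE)
  df
}

df <- data.frame(Age = c(0, 4, 5, 17, 42, 89))
banded <- band_ages(df, "Age", band_size = 10)
print(banded)
```

Using cut() returns a factor whose levels follow the order of the breaks, which keeps the bands in age order when they are later plotted or tabulated.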

Plot Counts of Categorical Variables

Description

Create a ggplot2 column chart of categorical variables with labels, in ascending order. The plot is customised using the provided theme, theme_sanger, and y-axis labels use a comma as the thousands separator. If the column provided to 'grouping_var' has more than approximately 5 values, consider rotating the x-axis labels using ggplot2's theme function.

A comprehensive explanation of ggplot2 customisation is available here

Usage

categorical_col_chart(df, grouping_var)

Arguments

df

A dataframe with categorical variables

grouping_var

A categorical variable by which to group the count

Value

a ggplot2 object

Examples

library(SangerTools)
library(dplyr)
library(ggplot2)
# Group by Age Band
health_data <- SangerTools::PopHealthData
health_data %>%
  dplyr::filter(Smoker == 1) %>%
  SangerTools::categorical_col_chart(AgeBand) +
  labs(
    title = "Smoking Population by Age Band",
    subtitle = "Majority of Smokers are Working Aged ",
    x = NULL,
    y = "Patient Number"
  )
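
The label rotation mentioned in the description can be sketched with plain ggplot2 as follows. This does not call categorical_col_chart; the counts data frame is made-up illustrative data and the 45 degree angle is an arbitrary choice.

```r
library(ggplot2)

# Illustrative counts only, not taken from PopHealthData
counts <- data.frame(
  group = c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34"),
  n = c(1200, 1850, 2400, 3100, 4050, 5200, 6100)
)

p <- ggplot(counts, aes(x = reorder(group, n), y = n)) +
  geom_col() +
  # Comma as thousands separator on the y axis
  scale_y_continuous(labels = scales::comma) +
  # Rotate the x-axis labels so that many categories remain readable
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = NULL, y = "Patient Number")

print(p)
```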

Patient Cohort Re-Identification Processing

Description

Population Health Management commonly leads practitioners to identify a cohort to which an intervention will be applied. As a rule of thumb, most analysts work with pseudonymised data sets. For targeted interventions, patients require re-identification; this process is generally carried out by a third-party organisation. As third-party organisations work with many health care providers, they have a strict set of requirements. This function is based on SW CSU's required formatting.

Usage

cohort_processing(
  df,
  Split_by,
  path,
  prefix = "DSCRO",
  com_code = "11M",
  date_format = "%Y%m%d",
  suffix = "_REID_V01"
)

Arguments

df

A tidy dataframe in standard Master Patient Index format, e.g. SangerTools::PopHealthData.

Split_by

A column within df that will be used to split the patients and will also appear in the file name. Ideally this should be a health organisation code such as a GP Practice Code or Hospital Trust Code. It should contain only alphanumeric values.

path

A file path to which the CSV files will be written

prefix

File name prefix; default is "DSCRO". See NHS DSCRO for more information.

com_code

Commissioner Code, default is "11M"; Gloucestershire.

date_format

A date format passed internally to 'format(Sys.Date())'; it forms part of the file name to denote the date of generation. You can read more about date formatting in the R documentation.

suffix

A file name suffix; default is "_REID_V01". To leave it blank use "", without spaces.

Value

n CSV files written to the location specified by the path argument.
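
The following sketch shows one way the file name arguments could combine into a single name. The ordering of the parts, the underscore separators, the ".csv" extension, the helper name build_file_name and the organisation code "ABC123" are all assumptions made for illustration; the source does not specify the exact naming scheme.

```r
# Default argument values taken from the Usage section
prefix <- "DSCRO"
com_code <- "11M"
date_format <- "%Y%m%d"
suffix <- "_REID_V01"

# Hypothetical helper: assumed ordering and separators, not the package's own
build_file_name <- function(split_value, as_of = Sys.Date()) {
  stem <- paste(prefix, com_code, split_value, format(as_of, date_format), sep = "_")
  paste0(stem, suffix, ".csv")
}

# "ABC123" is a made-up organisation code; the date is fixed for reproducibility
file_name <- build_file_name("ABC123", as_of = as.Date("2022-02-20"))
print(file_name)
```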


Crude Prevalence Calculator

Description

Calculate the crude prevalence of a health condition from a Master Patient Index like dataset

Usage

crude_rates(df, Condition, ...)

Arguments

df

A tidy dataframe in standard Master Patient Index format, e.g. SangerTools::PopHealthData

Condition

A Health condition flag denoted by 1 & 0; where 1 denotes the patient being positive for the health condition

...

Variables used to standardise by; Must always have Ageband, additional variables are optional

Value

A tibble with crude prevalence rates (rate per 1,000) for each value included in ...

Examples

library(SangerTools)
library(dplyr)
health_data <- SangerTools::PopHealthData
glimpse(health_data)
# Generate crude prevalence rate stats
crude_prevalence <- SangerTools::crude_rates(health_data, Diabetes, Locality)
print(crude_prevalence)
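
For readers who want to see the arithmetic, the sketch below computes a crude prevalence rate per 1,000 for each group in base R: the number of patients flagged 1 divided by the number of patients in the group, multiplied by 1,000. The patients data frame is made up and this is not the crude_rates implementation.

```r
# Illustrative data only: 4 patients in locality A, 5 in locality B
patients <- data.frame(
  Locality = c("A", "A", "A", "A", "B", "B", "B", "B", "B"),
  Diabetes = c(1, 0, 0, 1, 0, 0, 1, 0, 0)
)

# The mean of a 0/1 flag is the proportion of positive patients
crude <- aggregate(
  Diabetes ~ Locality,
  data = patients,
  FUN = function(flag) 1000 * mean(flag)
)
names(crude)[2] <- "Rate_per_1000"

print(crude)
```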

Dataframe to SQL

Description

Write your data frame or tibble directly to SQL from R. This wrapper function allows for the easy movement of your computed results in R to a SQL database for saving. The function uses an ODBC driver to establish a connection. You will need to select a database that your user has write access to. The user credentials are the same as your OS login details; as such, this function will most likely only work from your work computer.

Usage

df_to_sql(df, driver, server, database, sql_table_name, overwrite = FALSE, ...)

Arguments

df

A data frame or tibble, e.g. PopHealthData.

driver

A driver for the database, e.g. "SQL Server"; must be passed in quotation marks.

server

The unique name of your database server; must be passed in quotation.

database

The name of the database to which you will write 'df'; must be passed in quotation.

sql_table_name

The name that 'df' will be referred to in SQL database; must be passed in quotation.

overwrite

Whether an existing SQL table with the same name will be overwritten; defaults to FALSE.

...

Function forwarding for additional functionality.

Value

A message confirming that a new table has been created in a SQL 'database'.

Examples

## Not run: 
library(odbc)
library(DBI)
health_data <- SangerTools::PopHealthData
df_to_sql(
  df = health_data,
  driver = "SQL SERVER",
  server = "SERVER",
  database = "DATABASE",
  sql_table_name = "New Table Name",
  overwrite = FALSE
)

## End(Not run)
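
The connect-then-write pattern described above can be sketched with DBI and odbc as below. This is a hypothetical wrapper, not the SangerTools source; in particular, the use of Trusted_Connection to pick up the OS login is an assumption that applies to SQL Server style drivers. Defining the function does not open a connection; calling it requires a reachable server and the DBI and odbc packages.

```r
# Hypothetical wrapper illustrating the pattern; not the package's own code
write_df_to_sql <- function(df, driver, server, database,
                            sql_table_name, overwrite = FALSE) {
  con <- DBI::dbConnect(
    odbc::odbc(),
    Driver = driver,
    Server = server,
    Database = database,
    Trusted_Connection = "True" # assumed: authenticate with the OS login
  )
  # Always release the connection, even if the write fails
  on.exit(DBI::dbDisconnect(con), add = TRUE)

  DBI::dbWriteTable(con, name = sql_table_name, value = df, overwrite = overwrite)
  message("Table '", sql_table_name, "' written to database '", database, "'.")
  invisible(TRUE)
}
```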

Dataframe or Tibble to Clipboard

Description

This function copies a data frame or tibble to your clipboard in a format that allows for a simple paste into Excel whilst maintaining column and row structure. By default, row_names is set to FALSE.

Usage

excel_clip(df, row_names = FALSE, col_names = TRUE, ...)

Arguments

df

A dataframe or tibble

row_names

Set to FALSE for row.names not to be included

col_names

Set to TRUE for col.names to be included

...

function forwarding for additional write.table functionality

Value

a data frame copied to your clipboard
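
The reason a paste into Excel keeps its structure is that the data is written as tab-delimited text. The sketch below captures that text instead of sending it to the clipboard, so it runs on any platform; on Windows, base R's write.table can target the clipboard by passing file = "clipboard". The data frame is made up, and whether excel_clip uses exactly these settings is an assumption.

```r
# Illustrative data only
df <- data.frame(Locality = c("A", "B"), Patients = c(10, 20))

# Tab-separated output, header row kept, row names dropped
tsv <- capture.output(
  write.table(df, sep = "\t", row.names = FALSE, col.names = TRUE, quote = FALSE)
)

print(tsv)
```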


Master Patient Index

Description

A fabricated Master Patient Index (MPI) inspired by Gloucestershire's population to be used with functions included in SangerTools

Usage

master_patient_index

Format

A tibble with 10,000 rows and 11 variables:

PseudoNHSNumber

A Pseudonymised NHS Patient Identifier

Sex

The identifiable sex of the patient

Smoker

Health Condition Flag: 1 denotes if the patient is a smoker

Diabetes

Health Condition Flag: 1 denotes if the patient has diabetes

Dementia

Health Condition Flag: 1 denotes if the patient has dementia

Obesity

Health Condition Flag: 1 denotes if the patient is Obese

Age

Age of the patient

IMD_Decile

The decile of indices of multiple deprivation: https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019

Ethnicity

The identifiable ethnicity of the patient

Locality

The region where the patient lives - sampled from Gloucestershire Clinical Commissioning Group

PrimaryCareNetwork

The network of General Practitioners that the patient is registered with - sampled from Gloucestershire Clinical Commissioning Group

Source

Generated by Asif Laldin a.laldin@nhs.net, Feb-2022

Examples

library(dplyr)
data(master_patient_index)
# Inspect the structure of the data
master_patient_index %>%
 glimpse()

Read Multiple CSV files into R

Description

This function reads multiple CSV files from a directory into R, after which all files are aggregated into a single data frame. All files must have the same structure.

There are assumptions about the underlying files:

Usage

multiple_csv_reader(file_path, sheet = 1, rows_to_skip = 0, col_names = TRUE)

Arguments

file_path

The Directory in which the files are located

sheet

Sheet to read. Either a string (the name of a sheet), or an integer (the position of the sheet). Defaults to the first sheet

rows_to_skip

The number of rows from the top to be excluded

col_names

Whether the files contain column names; defaults to TRUE

Value

A data frame containing the combined contents of the files read from file_path

Examples

library(SangerTools)
file_path <- "my_file_path_where_csvs_are_stored"
if (length(SangerTools::multiple_csv_reader(file_path)) == 0) {
  message("This won't work without changing the variable input to a local file path with CSVs in")
}
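
The read-then-aggregate behaviour can be sketched in base R as follows. This is not the multiple_csv_reader implementation; it writes two small made-up CSV files to a temporary directory so that the example is self-contained.

```r
# Create a scratch directory holding two CSV files with the same structure
dir_path <- tempfile("csv_demo")
dir.create(dir_path)
write.csv(data.frame(id = 1:2, value = c("a", "b")),
          file.path(dir_path, "part1.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:5, value = c("c", "d", "e")),
          file.path(dir_path, "part2.csv"), row.names = FALSE)

# Find every CSV in the directory, read each one, then stack the rows
files <- list.files(dir_path, pattern = "\\.csv$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, read.csv))

print(combined)
```

Stacking with rbind only works when every file has the same column names, which is why the files must share a structure.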

Read Multiple Excel files into R

Description

This function reads multiple excel files into R after which all files are aggregated into a single data frame.

There are assumptions about the underlying files:

To understand more about the underlying function that 'multiple_excel_reader' wraps around Click Here

Usage

multiple_excel_reader(
  file_path,
  pattern = "*.xlsx",
  sheet = 1,
  rows_to_skip = 0,
  col_names = TRUE
)

Arguments

file_path

The Directory in which the files are located

pattern

The file extension of the files you are going to read. Defaults to "*.xlsx"

sheet

Sheet to read. Either a string (the name of a sheet), or an integer (the position of the sheet). Defaults to the first sheet

rows_to_skip

The number of rows from the top to be excluded

col_names

A boolean value indicating whether column header names are present in the files. Currently only accepts TRUE

Value

A data frame containing the combined contents of the files read from file_path

Examples

## Not run: 
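# Name the arguments: passed positionally, 1 would be matched to 'pattern' rather than 'sheet'
combined_excel_files <- multiple_excel_reader(file_path = "Inputs/", sheet = 1, col_names = TRUE)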

## End(Not run)

Branded discrete colour scale

Description

This function allows you to apply the Sanger Theme colours to the fill aesthetic of your ggplot2 plot

Usage

scale_fill_sanger()

Value

A custom colour filled ggplot2 plot

Examples

library(SangerTools)
library(dplyr)
library(ggplot2)
# Group by Age Band
health_data <- SangerTools::PopHealthData
health_data %>%
  dplyr::filter(Smoker == 1) %>%
  SangerTools::categorical_col_chart(AgeBand) +
  labs(
    title = "Smoking Population by Age Band",
    subtitle = "Majority of Smokers are Working Aged ",
    x = NULL,
    y = "Patient Number"
  )+
  scale_fill_sanger()
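
A branded discrete fill scale of this kind is typically a thin wrapper around ggplot2's scale_fill_manual. The sketch below shows the idea; the function name scale_fill_brand, the hex codes and the counts data are all hypothetical and are not the actual Sanger palette.

```r
library(ggplot2)

# Hypothetical hex codes; NOT the real Sanger palette
brand_colours <- c("#003087", "#0072CE", "#41B6E6", "#00A499")

# Hypothetical wrapper: forwards any extra arguments to scale_fill_manual
scale_fill_brand <- function(...) {
  scale_fill_manual(values = brand_colours, ...)
}

# Illustrative counts only
counts <- data.frame(group = c("A", "B", "C"), n = c(12, 7, 3))

p <- ggplot(counts, aes(x = group, y = n, fill = group)) +
  geom_col() +
  scale_fill_brand()

print(p)
```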

Brand Colour Palette

Description

Displays the brand colour palette, showing the hex codes associated with the brand

Usage

show_brand_palette()

Value

a Base R plot object

Examples

library(scales)
library(SangerTools)
show_brand_palette()

Extended Brand Colour Palette

Description

Displays extended brand colour palette for charting

Usage

show_extended_palette()

Value

a Base R plot object

Examples

library(scales)
library(SangerTools)
show_extended_palette()

Split & Save

Description

A simpler alternative to cohort_processing. Splits a data frame by a column and saves each part as a CSV file.

Usage

split_and_save(df, Split_by, path, prefix = NULL)

Arguments

df

A data frame or tibble, e.g. PopHealthData.

Split_by

A column within df that will be used to split the patients and will also appear in the file name. Ideally should be a health organisation code such as GP Practice Code or Hospital Trust Code. Should only have alpha-numeric values

path

A file path to which the CSV files will be written

prefix

File name prefix

Value

n CSV files written to the location specified by the path argument.

Examples

## Not run: 
split_and_save(
  df = pseudo_data,
  Split_by = "Locality",
  path = "Inputs/",
  prefix = NULL
)

## End(Not run)
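
The split-and-write step can be sketched in base R as below. This is not the split_and_save implementation; the patients data frame and the choice to name each file after the split value are illustrative assumptions, and the files go to a temporary directory.

```r
# Scratch output directory so the example leaves no files behind in the project
out_dir <- tempfile("split_demo")
dir.create(out_dir)

# Illustrative data only
patients <- data.frame(
  Locality = c("North", "South", "North"),
  PseudoID = c(101, 102, 103)
)

# One data frame per unique value of the splitting column
pieces <- split(patients, patients$Locality)

# Write each piece to its own CSV, named after the split value
for (value in names(pieces)) {
  write.csv(pieces[[value]],
            file.path(out_dir, paste0(value, ".csv")),
            row.names = FALSE)
}

written <- list.files(out_dir)
print(written)
```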




Standardised Prevalence Rates.

Description

Standardisation will be performed for all unique values in the column passed to 'Split_by'. If the input data frame does not contain age bands, or the age bands are not of class factor, it is recommended to use age_bandizer or age_bandizer_2 first. After the function has run, the output can be copied using excel_clip or written to a database using df_to_sql. Alternatively, if you are interested in seeing the effects of age confounding, consider joining the output of this function with the output of crude_rates using a left_join.

Usage

standardised_rates_df(
  df,
  Split_by,
  Condition,
  Population_Standard,
  Granular = FALSE,
  ...
)

Arguments

df

A tidy data frame in standard Master Patient Index format, e.g. SangerTools::master_patient_index.

Split_by

A column name within df for which the standardised rates will be calculated.

Condition

A Health condition flag denoted by 1 & 0; where 1 denotes the patient being positive for the health condition.

Population_Standard

Population standard weights used for standardising; pass NULL to use the age structure of df.

Granular

Takes a boolean value. If set to TRUE, outputs a tibble with standardised rates using the values provided in 'Split_by' and '...'. Defaults to FALSE.

...

Variables used to standardise by; Must always have Age band for age standardisation, additional variables are optional and should be passed separated by commas.

Value

A tibble containing standardised Prevalence Rates by specified group.

Examples

library(SangerTools)
health_data <- SangerTools::age_bandizer(df = SangerTools::master_patient_index,
                                         Age_col=Age)
df_rates <- standardised_rates_df(
  df = health_data,
  Split_by = Locality,
  Condition = Diabetes,
  Population_Standard = NULL,
  Granular = TRUE,
  Ageband
)
print(df_rates)
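
To show what age standardisation does numerically, the sketch below applies direct standardisation in base R: each group's age-band-specific prevalence is weighted by the share of that age band in a standard population, here taken to be the age structure of the whole data frame. That this matches the method used inside standardised_rates_df is an assumption, and the patients data is made up.

```r
# Illustrative data only: locality B has an older population than locality A
patients <- data.frame(
  Locality = rep(c("A", "B"), each = 4),
  Ageband = c("0-49", "0-49", "50+", "50+", "0-49", "50+", "50+", "50+"),
  Diabetes = c(0, 0, 1, 0, 0, 1, 0, 1)
)

# Standard weights: share of each age band across the whole data frame
weights <- prop.table(table(patients$Ageband))

# Prevalence within each Locality x Ageband cell (NA if a cell has no patients)
band_rates <- tapply(
  patients$Diabetes,
  list(patients$Locality, patients$Ageband),
  mean
)

# Weighted sum of band-specific rates, expressed per 1,000
standardised <- as.vector(band_rates %*% as.vector(weights[colnames(band_rates)])) * 1000
names(standardised) <- rownames(band_rates)

# Crude rates per 1,000 for comparison
crude <- tapply(patients$Diabetes, patients$Locality, mean) * 1000

print(round(rbind(crude, standardised), 1))
```

In this toy data the gap between the two localities is narrower after standardisation than in the crude rates, because part of the crude difference is explained by locality B having more patients in the older age band.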

Customised ggplot2 Theme

Description

A customised ggplot2 theme for the SangerTools package

Usage

theme_sanger()

Value

A customised ggplot2 plot

Examples

library(SangerTools)
library(ggthemes)
library(ggplot2)
library(ggtext)
categorical_col_chart(SangerTools::PopHealthData, Locality) +
  theme_sanger()+
  labs(title = "Categorical Column Chart",
  x = "Locality",
  y = "Number of Patients")+
  scale_fill_sanger()

Data set of 2018 UK Population

Description

Data is taken from the ONS and is split into 5-year age bands

Usage

uk_pop_standard

Format

A tibble with 29 rows and 2 variables:

UK_Population

dbl UK population count for the age band

Ageband

5 Year age band for population

Source

https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates