Help for package emcAdr

Type: Package
Title: Evolutionary Version of the Metropolis-Hastings Algorithm
Version: 1.2
Date: 2025-01-31
Author: Jules Bangard [aut, cre]
Maintainer: Jules Bangard <jules.bangard@etu.unistra.fr>
Description: Provides computational methods for detecting adverse high-order drug interactions from individual case safety reports using statistical techniques, allowing the exploration of higher-order interactions among drug cocktails.
License: GPL-3
Imports: Rcpp (≥ 1.0.7), ggplot2, dplyr, umap, dbscan, stats
LinkingTo: Rcpp, RcppArmadillo
RoxygenNote: 7.2.3
LazyData: true
Suggests: knitr, rmarkdown, gridExtra
VignetteBuilder: knitr
NeedsCompilation: yes
Packaged: 2025-02-26 12:36:14 UTC; julesbangard
Depends: R (≥ 3.5.0)
Repository: CRAN
Date/Publication: 2025-02-27 17:00:02 UTC

Evolutionary Version of the Metropolis-Hastings Algorithm

Description

Author(s)

Jules Bangard [aut, cre] (<https://orcid.org/0009-0007-4670-7860>)

Maintainer: Jules Bangard <jules.bangard@etu.unistra.fr>

ATC Tree Upper Bound 2024

Description

Example dataset representing the ATC tree structure, sourced from the WHO website (2024-02-23). This dataset is provided for demonstration and testing purposes with the package.

Usage

ATC_Tree_UpperBound_2024

Format

A data frame with 4 variables:

ATCCode: The code of ATC nodes
Name: The name of ATC nodes
ATC_length: The number of characters in the ATCCode
upperBound: The index of the last child node in the tree
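
The upperBound column can be read as follows, assuming rows are numbered in depth-first order so that the descendants of row i occupy rows i + 1 through upperBound[i]. This is a minimal sketch: the toy tree and the helper is_descendant are hypothetical and not part of the package.

```r
# Toy ATC-like tree in depth-first order (made-up rows, not the real dataset).
toy_tree <- data.frame(
  ATCCode    = c("A", "A01", "A01A", "A10", "B"),
  ATC_length = c(1, 3, 4, 3, 1),
  upperBound = c(4, 3, 3, 4, 5)  # a leaf's upper bound is its own row number
)

# Hypothetical helper: TRUE if row `child` lies in the subtree rooted at row `parent`.
is_descendant <- function(tree, parent, child) {
  child > parent && child <= tree$upperBound[parent]
}

is_descendant(toy_tree, 1, 3)  # A01A is under A
is_descendant(toy_tree, 1, 5)  # B is not under A
```

Under this reading, a subtree membership test is a pair of integer comparisons, with no need to walk the tree.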

Source

World Health Organization, ATC classification register

Convert each patient's ATC codes to the corresponding DFS numbers of the ATC tree

Description

Convert each patient's ATC codes to the corresponding DFS numbers of the ATC tree

Usage

ATCtoNumeric(patientATC, tree)

Arguments

patientATC

: patient observations; for each patient, a string containing the medications taken (ATC codes)

tree

: ATC tree (assumed to have a column 'ATCCode')

Value

A matrix of the same size as patientATC, containing the integer index of each corresponding ATC code.
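
The lookup can be sketched in base R as below. This assumes the codes in a patient's string are separated by spaces and that an index is the 1-based row number in the tree. The helper codes_to_index and the three-row tree are hypothetical, and the package's own indexing convention may differ.

```r
# Toy tree with only the column the conversion relies on (made-up subset).
toy_tree <- data.frame(ATCCode = c("A01AA30", "A01AB03", "A10AC30"))

# Hypothetical stand-in for the conversion: split each patient's string on
# spaces and look up the row number of each code in the tree.
codes_to_index <- function(patientATC, tree) {
  lapply(strsplit(patientATC, " ", fixed = TRUE),
         function(codes) match(codes, tree$ATCCode))
}

indices <- codes_to_index(c("A01AA30 A01AB03", "A10AC30"), toy_tree)
indices
```

A code that is absent from the tree yields NA with match(), which makes unknown medications easy to spot.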

Examples

 ATC_code <- c('A01AA30 A01AB03', 'A10AC30')
 ATCtoNumeric(ATC_code, ATC_Tree_UpperBound_2024)

MCMC method that runs a random walk on a single cocktail to estimate the distribution of scores among cocktails of size Smax.

Description

MCMC method that runs a random walk on a single cocktail to estimate the distribution of scores among cocktails of size Smax.

Usage

DistributionApproximation(
  epochs,
  ATCtree,
  observations,
  temperature = 1L,
  nbResults = 5L,
  Smax = 2L,
  p_type1 = 0.01,
  beta = 4L,
  max_score = 500L,
  num_thread = 1L,
  verbose = FALSE
)

Arguments

epochs

: number of steps of the MCMC algorithm

ATCtree

: ATC tree with the upper bound of the DFS (without the root; see the GitHub repository for an example)

observations

: observations of the AE based on each patient's medications (a data frame with the medications in the first column and the ADR (boolean) in the second)

temperature

: starting temperature (denoted T in the article). Default is 1

nbResults

: number of returned solutions (the cocktails of size Smax with the best observed scores during the run). Default is 5

Smax

: size of the cocktails whose score distribution is approximated

p_type1

: probability of applying a type 1 mutation. The probability of applying a type 2 mutation is then 1 - p_type1. p_type1 must be in [0;1]. Default is 0.01

beta

: minimum number of patients who must have taken a cocktail for its risk to be counted in the filtered distribution (FilteredDistribution in the returned list). Default is 4

max_score

: maximum value the score can take. Scores greater than max_score are added to the distribution as the value max_score. Default is 500

num_thread

: number of threads to run in parallel if OpenMP is available. Default is 1

verbose

: whether to print a summary. Default is FALSE

Value

If there is no problem, returns a list containing:
- ScoreDistribution: the distribution of the score, as an array in which the cell at a given index holds the number of sampled risks equal to (index - 1) / 10.
- Outstanding_score: an array of the scores greater than max_score.
- Best_cocktails: the nbResults best cocktails encountered during the run.
- Best_scores: the scores corresponding to Best_cocktails.
- FilteredDistribution: the distribution of scores for cocktails taken by at least beta patients.
- Best_cocktails_beta: the nbResults best cocktails taken by at least beta patients encountered during the run.
- Best_scores_beta: the scores corresponding to Best_cocktails_beta.
- cocktailSize: the Smax parameter used during the run.

Otherwise, the list is empty.
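
The indexing convention of ScoreDistribution, together with the max_score cap, can be illustrated with a small base R sketch. The helper bin_scores is hypothetical and only mirrors the description above (a step of 0.1, scores above max_score recorded at max_score); it is not the package's implementation.

```r
# Hypothetical helper: bin scores with a step of 0.1 so that cell i counts
# the scores equal to (i - 1) / 10.
bin_scores <- function(scores, max_score) {
  clamped <- pmin(scores, max_score)   # scores above max_score are recorded as max_score
  index <- round(clamped * 10) + 1     # score 0 -> cell 1, score 0.1 -> cell 2, ...
  tabulate(index, nbins = max_score * 10 + 1)
}

# Made-up scores; the value 7 exceeds max_score = 1 and lands in the last cell.
hist_counts <- bin_scores(c(0, 0.1, 0.1, 0.3, 7), max_score = 1)
hist_counts
```

With max_score = 1 the histogram has 11 cells, covering the scores 0, 0.1, ..., 1.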

Examples


data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")

estimation = DistributionApproximation(epochs = 10, ATCtree = ATC_Tree_UpperBound_2024,
            observations = FAERS_myopathy)

FAERS Myopathy Dataset

Description

Example dataset representing drug intake and adverse event reports from FAERS. This dataset is provided to demonstrate the functionality of genetic and MCMC algorithms in the package.

Usage

FAERS_myopathy

Format

A data frame with 2 columns:

patientATC: Drug intake for each patient as a vector of ATC tree indices
patientADR: Indicates if the patient experienced myopathy as an adverse event

Source

Food & Drug Administration Event Reporting System (FAERS)

Genetic algorithm that searches for the riskiest cocktails (those that maximize the fitness function, here the hypergeometric score)

Description

Genetic algorithm that searches for the riskiest cocktails (those that maximize the fitness function, here the hypergeometric score)

Usage

GeneticAlgorithm(
  epochs,
  nbIndividuals,
  ATCtree,
  observations,
  num_thread = 1L,
  diversity = FALSE,
  p_crossover = 0.8,
  p_mutation = 0.01,
  nbElite = 0L,
  tournamentSize = 2L,
  alpha = 1,
  summary = TRUE
)

Arguments

epochs

: number of steps of the algorithm

nbIndividuals

: size of the population

ATCtree

: ATC tree with upper bound of the DFS (without the root)

observations

: observations of the AE based on each patient's medications (a data frame with the medications in the first column and the ADR (boolean) in the second)

num_thread

: number of threads to run in parallel if OpenMP is available. Default is 1

diversity

: whether to enable the diversity mechanism of the algorithm (favors the diversity of cocktails in the population). Default is FALSE

p_crossover

: probability of applying a crossover during the crossover phase. Default is 0.8

p_mutation

: probability of applying a mutation after the crossover phase. Default is 0.01

nbElite

: number of best individuals kept from one generation to the next. Default is 0

tournamentSize

: size of the tournament (the best individual among tournamentSize sampled individuals is selected). Default is 2

alpha

: when a type 1 mutation is applied, the chance of adding a drug is alpha / (size of the cocktail). Default is 1

summary

: whether to print a summary of the population at each step. Default is TRUE
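
The tournamentSize argument refers to tournament selection, which can be sketched as below. The helper tournament_select is hypothetical, and sampling without replacement is an assumption; the package may draw the contenders differently.

```r
# Hypothetical helper: sample `tournamentSize` individuals and return the
# index of the one with the highest fitness.
tournament_select <- function(fitness, tournamentSize = 2L) {
  contenders <- sample.int(length(fitness), size = tournamentSize)
  contenders[which.max(fitness[contenders])]
}

set.seed(1)
fitness <- c(0.2, 3.1, 1.4, 0.0, 2.7)   # made-up fitness values
winner <- tournament_select(fitness, tournamentSize = 3L)
winner
```

A larger tournament increases selection pressure: when tournamentSize equals the population size, the best individual always wins.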

Value

If there is no problem, returns a list containing:
- meanFitnesses: the mean score of the population at each epoch of the algorithm.
- BestFitnesses: the best score of the population at each epoch of the algorithm.
- FinalPopulation: the final population when the algorithm finishes (medications and corresponding scores).

Examples


data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")

results = GeneticAlgorithm(epochs = 10, nbIndividuals = 10, 
            ATCtree = ATC_Tree_UpperBound_2024,
            observations = FAERS_myopathy)

Convert the outstanding scores (Outstanding_score) output by the MCMC algorithm to the ScoreDistribution format

Description

Convert the outstanding scores (Outstanding_score) output by the MCMC algorithm to the ScoreDistribution format

Usage

OutsandingScoreToDistribution(outstanding_score, max_score)

Arguments

outstanding_score

: Outstanding_score output by the MCMC algorithm, to be converted to the ScoreDistribution format

max_score

: max_score parameter used during the MCMC algorithm

Value

outstanding_score in a format compatible with the MCMC algorithm output

Examples


 data("ATC_Tree_UpperBound_2024")
 data("FAERS_myopathy")

  DistributionApproximationResults = DistributionApproximation(epochs = 10,
            ATCtree = ATC_Tree_UpperBound_2024, observations = FAERS_myopathy)
  # max_score must match the value used in the MCMC run (500 by default)
  OutsandingScoreToDistribution(DistributionApproximationResults$Outstanding_score, max_score = 500)

Calculate the divergence between two distributions (the true distribution and the learned one)

Description

Calculate the divergence between two distributions (the true distribution and the learned one)

Usage

calculate_divergence(
  empirical_distribution,
  true_distribution,
  method = "TV",
  Filtered = FALSE
)

Arguments

empirical_distribution

A numeric vector of values representing the empirical distribution (return value of the DistributionApproximation function)

true_distribution

A numeric vector of values representing the true distribution computed by the trueDistributionSizeTwoCocktail function

method

A string, either "TV" or "KL", to use the total variation distance or the Kullback-Leibler divergence respectively (default = "TV")

Filtered

Whether to use the filtered distribution (TRUE) or the unfiltered one (FALSE, the default)

Value

A numeric value representing the divergence between the two distributions
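
For reference, the two measures named by the method argument can be written in base R as below. This sketch uses the textbook definitions on two histograms of equal length; the helper names are hypothetical, and the direction of the Kullback-Leibler divergence used by the package is not stated here, so the (estimated, true) order is an assumption.

```r
# Hypothetical helpers on two histograms of equal length (raw counts).
total_variation <- function(p_counts, q_counts) {
  p <- p_counts / sum(p_counts)
  q <- q_counts / sum(q_counts)
  0.5 * sum(abs(p - q))
}

kl_divergence <- function(p_counts, q_counts) {
  p <- p_counts / sum(p_counts)
  q <- q_counts / sum(q_counts)
  keep <- p > 0                       # terms with p = 0 contribute 0
  sum(p[keep] * log(p[keep] / q[keep]))  # Inf if q = 0 where p > 0
}

estimated <- c(5, 3, 2)   # made-up counts
truth     <- c(4, 4, 2)
tv <- total_variation(estimated, truth)
kl <- kl_divergence(estimated, truth)
```

The total variation distance is symmetric and bounded by 1, whereas the Kullback-Leibler divergence is asymmetric and unbounded.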

Examples


data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")

estimated_score_distribution = DistributionApproximation(epochs = 10,
            ATCtree = ATC_Tree_UpperBound_2024,
            observations = FAERS_myopathy[1:100,], Smax =2)
            
true_score_distribution = trueDistributionSizeTwoCocktail(ATCtree = ATC_Tree_UpperBound_2024,
            observations = FAERS_myopathy[1:100,], beta = 4)

divergence <- calculate_divergence(empirical_distribution = estimated_score_distribution,
                true_distribution = true_score_distribution)

Clustering of the solutions of the genetic algorithm using the hclust algorithm

Description

Clustering of the solutions of the genetic algorithm using the hclust algorithm

Usage

clustering_genetic_algorithm(
  genetic_results,
  ATCtree,
  dist.normalize = TRUE,
  umap_config = NULL
)

Arguments

genetic_results

A list of cocktails, each in the form of an integer vector

ATCtree

ATC tree with upper bound of the DFS

dist.normalize

Whether to normalize the distance so that it lies in [0;1]

umap_config

The configuration used to project the cocktails into a smaller space (umap::umap.defaults by default)

Value

A data frame containing the two UMAP coordinates of each cocktail in the plane, as well as the cluster number of each cocktail

Examples


 data("ATC_Tree_UpperBound_2024")

 results = GeneticAlgorithm(epochs = 10, nbIndividuals = 10, 
            ATCtree = ATC_Tree_UpperBound_2024,
            observations = FAERS_myopathy)

 hclust_genetic_solution(genetic_results = results,
                 ATCtree = ATC_Tree_UpperBound_2024)

Function used in the reference article to compare various disproportionality analysis (DA) metrics

Description

Function used in the reference article to compare various disproportionality analysis (DA) metrics

Usage

computeMetrics_size2(CocktailList, ATCtree, observations, num_thread = 1L)

Arguments

CocktailList

: A list of cocktails on which the disproportionality analysis metrics should be computed

ATCtree

: ATC tree with upper bound of the DFS (without the root)

observations

: observations of the AE based on each patient's medications (a data frame with the medications in the first column and the ADR (boolean) in the second), on which we want to compute the risk distribution

num_thread

: number of threads to run in parallel if OpenMP is available. Default is 1

Value

Multiple DA metrics computed on the cocktails of CocktailList

Examples


data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")

cocktails = list(c(561, 904),
               c(1902, 4585)) # only size 2 cocktails allowed for this function

scores_of_cocktails = computeMetrics_size2(CocktailList = cocktails,
                              ATCtree = ATC_Tree_UpperBound_2024, 
                              observations = FAERS_myopathy[1:100,])

Function used to compute the Relative Risk on a list of cocktails

Description

Function used to compute the Relative Risk on a list of cocktails

Usage

compute_RR_on_list(cocktails, ATCtree, observations, num_thread = 1L)

Arguments

cocktails

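: A list containing cocktails in the form of vectors of integers (ATC indices)

ATCtree

: ATC tree with upper bound of the DFS (without the root)

observations

: observations of the AE based on each patient's medications (a data frame with the medications in the first column and the ADR (boolean) in the second)

num_thread

: number of threads to run in parallel if OpenMP is available. Default is 1

Value

The RR score of each cocktail in cocktails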
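
As a reminder of what a relative risk measures, here is a minimal base R sketch on made-up observations. The helper relative_risk is hypothetical; it treats a patient as exposed only if every drug index of the cocktail appears in that patient's list, and it ignores the ATC hierarchy, which the package may take into account.

```r
# Toy observations (made up): drug indices per patient and an ADR flag.
patientATC <- list(c(1, 2), c(1, 2, 3), c(1), c(3), c(2, 3), c(1, 2))
patientADR <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)

# Hypothetical helper: risk of the ADR among patients exposed to every drug
# in `cocktail`, divided by the risk among all other patients.
relative_risk <- function(cocktail, patientATC, patientADR) {
  exposed <- vapply(patientATC, function(drugs) all(cocktail %in% drugs), logical(1))
  risk_exposed   <- mean(patientADR[exposed])
  risk_unexposed <- mean(patientADR[!exposed])
  risk_exposed / risk_unexposed
}

rr <- relative_risk(c(1, 2), patientATC, patientADR)
rr  # (2/3) / (1/3) = 2
```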

Examples


data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")

cocktails = list(c(561, 904),
               c(1902, 4585))

RR_of_cocktails = compute_RR_on_list(cocktails = cocktails,
                              ATCtree = ATC_Tree_UpperBound_2024, 
                              observations = FAERS_myopathy)

Function used to compute the Hypergeometric score on a list of cocktails

Description

Function used to compute the Hypergeometric score on a list of cocktails

Usage

compute_hypergeom_on_list(cocktails, ATCtree, observations, num_thread = 1L)

Arguments

cocktails

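: A list containing cocktails in the form of vectors of integers (ATC indices)

ATCtree

: ATC tree with upper bound of the DFS (without the root)

observations

: observations of the AE based on each patient's medications (a data frame with the medications in the first column and the ADR (boolean) in the second)

num_thread

: number of threads to run in parallel if OpenMP is available. Default is 1

Value

The hypergeometric score of each cocktail in cocktails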
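
One common way to build a score from the hypergeometric distribution is sketched below: the probability of seeing at least as many ADR cases among the exposed patients as observed, on a -log10 scale. Both the -log10 transform and the helper hypergeom_score are assumptions made for illustration; the exact definition used by the package is given in the reference article, not here.

```r
# Toy observations (made up): drug indices per patient and an ADR flag.
patientATC <- list(c(1, 2), c(1, 2, 3), c(1), c(3), c(2, 3), c(1, 2))
patientADR <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)

# Hypothetical helper: -log10 of the upper-tail hypergeometric probability.
hypergeom_score <- function(cocktail, patientATC, patientADR) {
  exposed <- vapply(patientATC, function(drugs) all(cocktail %in% drugs), logical(1))
  k <- sum(exposed & patientADR)   # exposed patients with the ADR
  m <- sum(patientADR)             # all patients with the ADR
  n <- sum(!patientADR)            # all patients without the ADR
  K <- sum(exposed)                # all exposed patients
  # P(X >= k) for X ~ Hypergeometric(m, n, K)
  p_upper <- stats::phyper(k - 1, m, n, K, lower.tail = FALSE)
  -log10(p_upper)
}

score <- hypergeom_score(c(1, 2), patientATC, patientADR)
score
```

On this toy data, 2 of the 3 exposed patients have the ADR and the upper-tail probability is 0.5, so the score is -log10(0.5).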

Examples


data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")

cocktails = list(c(561, 904),
               c(1902, 4585))

Hypergeom_of_cocktails = compute_hypergeom_on_list(cocktails = cocktails,
                              ATCtree = ATC_Tree_UpperBound_2024, 
                              observations = FAERS_myopathy)

Convert genetic algorithm results stored in a .csv file to a data structure that can be used by the clustering algorithm

Description

Convert genetic algorithm results stored in a .csv file to a data structure that can be used by the clustering algorithm

Usage

csv_to_population(ATC_name, filename, sep = ";")

Arguments

ATC_name

the ATC_name column of the ATC tree

filename

Name of the file where the results are located

sep

the separator to use when opening the csv file (';' by default)

Value

An R List that can be used by other algorithms (e.g. clustering algorithm)

Examples


  data("ATC_Tree_UpperBound_2024")
  genetic_results = csv_to_population(ATC_Tree_UpperBound_2024$Name,
                    "path/to/output.csv")

Compute the square matrix of distances between cocktails, where entry (i,j) of the matrix is the distance between cocktails i and j of an arbitrary cocktail list

Description

Compute the square matrix of distances between cocktails, where entry (i,j) of the matrix is the distance between cocktails i and j of an arbitrary cocktail list

Usage

get_dissimilarity_from_cocktail_list(cocktails, ATCtree, normalization = TRUE)

Arguments

cocktails

: A list of cocktails, each in the form of an integer vector

ATCtree

: ATC tree with upper bound of the DFS (without the root)

normalization

: whether to keep the distances between cocktails in the range [0;1]

Value

The square matrix of distances between cocktails

Examples


data("ATC_Tree_UpperBound_2024")

cocktails = list(c(561, 904),
               c(1902, 4585)) # only size 2 cocktails allowed for this function

distance_matrix = get_dissimilarity_from_cocktail_list(cocktails = cocktails,
                              ATCtree = ATC_Tree_UpperBound_2024, 
                              normalization = TRUE)

Compute the square matrix of distances between cocktails, where entry (i,j) of the matrix is the distance between cocktails i and j of the genetic_results list

Description

Compute the square matrix of distances between cocktails, where entry (i,j) of the matrix is the distance between cocktails i and j of the genetic_results list

Usage

get_dissimilarity_from_genetic_results(genetic_results, ATCtree, normalization)

Arguments

genetic_results

: the list returned by the genetic algorithm

ATCtree

: ATC tree with upper bound of the DFS (without the root)

normalization

: whether to keep the distances between cocktails in the range [0;1]

Value

The square matrix of distances between cocktails

Examples


 data("ATC_Tree_UpperBound_2024")
 data("FAERS_myopathy")
 
 genetic_results = GeneticAlgorithm(epochs = 10, nbIndividuals = 10,
            ATCtree = ATC_Tree_UpperBound_2024,
            observations = FAERS_myopathy)
 distance_matrix = get_dissimilarity_from_genetic_results(genetic_results = genetic_results,
                        ATCtree = ATC_Tree_UpperBound_2024, normalization = TRUE)

Compute the square matrix of distances between cocktails, where entry (i,j) of the matrix is the distance between cocktails i and j of the file containing the results of the genetic algorithm

Description

Compute the square matrix of distances between cocktails, where entry (i,j) of the matrix is the distance between cocktails i and j of the file containing the results of the genetic algorithm

Usage

get_dissimilarity_from_txt_file(filename, ATCtree, normalization = TRUE)

Arguments

filename

: the name of the file returned by the print_csv function.

ATCtree

: ATC tree with upper bound of the DFS (without the root)

normalization

: whether to keep the distances between cocktails in the range [0;1]

Value

The square matrix of distances between cocktails

Examples


 data("ATC_Tree_UpperBound_2024")
 
 distance_matrix = get_dissimilarity_from_txt_file(filename = '250e_700ind_0.2mr_0ne_2alpha.txt',
                        ATCtree = ATC_Tree_UpperBound_2024, normalization = TRUE)

Clustering of the solutions of the genetic algorithm using the hclust algorithm

Description

Clustering of the solutions of the genetic algorithm using the hclust algorithm

Usage

hclust_genetic_solution(
  genetic_results,
  ATCtree,
  dist.normalize = TRUE,
  method = "complete"
)

Arguments

genetic_results

The return value of the genetic algorithm

ATCtree

ATC tree with upper bound of the DFS

dist.normalize

Whether to normalize the distance so that it lies in [0;1]

method

(from hclust function) the agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).

Value

the hierarchical clustering of the results of the genetic algorithm
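
The method argument is passed on to hclust, so the clustering step can be pictured with base R on a made-up dissimilarity matrix. This is only a sketch: how the package builds the matrix from genetic_results and what it returns exactly are not shown here.

```r
# Made-up symmetric dissimilarity matrix between four cocktails.
d <- matrix(c(0.0, 0.1, 0.8, 0.9,
              0.1, 0.0, 0.7, 0.8,
              0.8, 0.7, 0.0, 0.2,
              0.9, 0.8, 0.2, 0.0), nrow = 4, byrow = TRUE)

# Hierarchical clustering with the default agglomeration method of this function.
tree <- stats::hclust(stats::as.dist(d), method = "complete")

# Cut the dendrogram into two groups.
clusters <- stats::cutree(tree, k = 2)
clusters
```

Here cocktails 1 and 2 end up in one cluster and cocktails 3 and 4 in the other, since the within-pair distances are much smaller than the between-pair ones.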

Examples


data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")

results = GeneticAlgorithm(epochs = 10, nbIndividuals = 10, 
            ATCtree = ATC_Tree_UpperBound_2024,
            observations = FAERS_myopathy)

hclust_genetic_solution(genetic_results = results,
                 ATCtree = ATC_Tree_UpperBound_2024)

Convert the histogram returned by the DistributionApproximation function to a distribution of real numbers (which can be used in a test, for example)

Description

Convert the histogram returned by the DistributionApproximation function to a distribution of real numbers (which can be used in a test, for example)

Usage

histogramToDitribution(vec)

Arguments

vec

: distribution returned by the DistributionApproximation function

Value

A vector containing the risks sampled during the MCMC algorithm
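
Given the ScoreDistribution convention (cell i counts the samples with score (i - 1) / 10), the conversion amounts to repeating each score by its count. The helper histogram_to_samples below is a hypothetical base R re-implementation for illustration, not the package's code.

```r
# Hypothetical re-implementation: repeat each score (i - 1) / 10 as many
# times as the count stored in cell i.
histogram_to_samples <- function(vec) {
  rep((seq_along(vec) - 1) / 10, times = vec)
}

# Made-up histogram: two samples at 0, one at 0.2, three at 0.3.
samples <- histogram_to_samples(c(2, 0, 1, 3))
samples
```

The resulting vector can be passed to functions that expect raw samples, such as quantile() or a two-sample test.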

Examples


data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")

 DistributionApproximationResults = DistributionApproximation(epochs = 10,
            ATCtree = ATC_Tree_UpperBound_2024, observations = FAERS_myopathy)
  histogramToDitribution(DistributionApproximationResults$ScoreDistribution)

Run the genetic algorithm over several sets of parameters in a convenient way. Each possible combination of mutation_rate, nb_elite and alphas is run nb_test_desired times. For each set of parameters, the results are saved in a file named after that set. The results of the runs can then be gathered in a csv file with the print_csv function, by specifying the names of the files to process and the number of runs performed on each parameter set.

Description

Run the genetic algorithm over several sets of parameters in a convenient way. Each possible combination of mutation_rate, nb_elite and alphas is run nb_test_desired times. For each set of parameters, the results are saved in a file named after that set. The results of the runs can then be gathered in a csv file with the print_csv function, by specifying the names of the files to process and the number of runs performed on each parameter set.

Usage

hyperparam_test_genetic_algorithm(
  epochs,
  nb_individuals,
  ATCtree,
  observations,
  nb_test_desired,
  mutation_rate,
  nb_elite,
  alphas,
  path = "./",
  num_thread = 1L
)

Arguments

epochs

: the number of epochs for the genetic algorithm

nb_individuals

: the size of the population in the genetic algorithm

ATCtree

: ATC tree with upper bound of the DFS (without the root)

observations

: observations of the AE based on each patient's medications (a data frame with the medications in the first column and the ADR (boolean) in the second)

nb_test_desired

: number of genetic algorithm runs on each set of parameters

mutation_rate

: a vector of the mutation_rate values to test

nb_elite

: a vector of the nb_elite values to test

alphas

: a vector of the alpha values to test

path

: the path where the resulting files should be written

num_thread

: number of threads to run in parallel if OpenMP is available. Default is 1

Value

No return value; this function writes the results of the genetic algorithm runs in a specific format supported by the print_csv and p_value_csv_file functions. The files are written to path, which is the current directory by default.
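
To gauge the cost of a test campaign before launching it, the parameter grid can be enumerated with base R. This sketch only counts combinations; it does not call the package, and the variable grid is introduced for illustration.

```r
mutation_rate <- c(0.1, 0.2, 0.3)
nb_elite <- c(0, 1, 2)
alphas <- c(0.5, 1, 2)
nb_test_desired <- 5

# Every combination of the three parameter vectors (one row per parameter set).
grid <- expand.grid(mutation_rate = mutation_rate,
                    nb_elite = nb_elite,
                    alpha = alphas)

nrow(grid)                    # 27 parameter sets
nrow(grid) * nb_test_desired  # 135 genetic algorithm runs in total
```

The number of runs grows multiplicatively with the length of each vector, so short vectors are advisable for a first pass.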

Examples


data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")

# parameter values to test
mutation_rate = c(.1,.2,.3)
nb_elite = c(0,1,2)
alphas = c(0.5,1,2)
hyperparam_test_genetic_algorithm(epochs = 2, nb_individuals = 2,
                              ATCtree = ATC_Tree_UpperBound_2024, 
                              observations = FAERS_myopathy,
                              nb_test_desired = 5, mutation_rate = mutation_rate,
                              nb_elite = nb_elite, alphas = alphas)

Convert integer cocktails (such as those output by the DistributionApproximation function) to string cocktails to make them more readable

Description

Convert integer cocktails (such as those output by the DistributionApproximation function) to string cocktails to make them more readable

Usage

int_cocktail_to_string_cocktail(cocktails, ATC_name)

Arguments

cocktails

Vector of cocktails to convert (indices in the ATC tree)

ATC_name

The ATC_name column of the ATC tree

Value

The string form (drug names) of the integer cocktails in cocktails

Examples


  data("ATC_Tree_UpperBound_2024")
  int_list = list(c(561, 904),
               c(1902, 4585))
  int_cocktail_to_string_cocktail(int_list, ATC_Tree_UpperBound_2024$Name)

Add the p-value to each cocktail of a cocktail list

Description

Add the p-value to each cocktail of a cocktail list

Usage

p_value_cocktails(
  distribution_outputs,
  cocktails,
  ATCtree,
  observations,
  num_thread = 1L,
  filtred_distribution = FALSE
)

Arguments

distribution_outputs

A list of distributions of cocktails of different sizes, used to compute the p-value for multiple cocktail sizes

cocktails

A list containing cocktails in the form of vectors of integers (ATC indices)

ATCtree

ATC tree with upper bound of the DFS (without the root)

observations

Observations of the AE based on each patient's medications (a data frame with the medications in the first column and the ADR (boolean) in the second), on which we want to compute the risk distribution

num_thread

Number of threads to run in parallel if OpenMP is available. Default is 1

filtred_distribution

Whether the p-values are computed using the filtered distribution (TRUE) or the unfiltered distribution (FALSE, the default)

Value

A vector of real numbers representing the p-values of the input cocktails, computed on the distribution_outputs list.

Examples


 data("ATC_Tree_UpperBound_2024")
 data("FAERS_myopathy")
 
  DistributionApproximationResults_size2 = DistributionApproximation(epochs = 10,
            ATCtree = ATC_Tree_UpperBound_2024, observations = FAERS_myopathy, Smax = 2)
            
  DistributionApproximationResults_size3 = DistributionApproximation(epochs = 10,
            ATCtree = ATC_Tree_UpperBound_2024, observations = FAERS_myopathy, Smax = 3)
            
  score_distribution_list = list(DistributionApproximationResults_size2,
                              DistributionApproximationResults_size3)

  cocktails = list(c(561, 904),
               c(1902, 4585))
 
  p_value_cocktails(score_distribution_list, cocktails, ATC_Tree_UpperBound_2024,
                    FAERS_myopathy)

Add the p-value to each cocktail of a csv file that is an output of the genetic algorithm

Description

Add the p-value to each cocktail of a csv file that is an output of the genetic algorithm

Usage

p_value_csv_file(
  distribution_outputs,
  filename,
  filtred_distribution = FALSE,
  sep = ";"
)

Arguments

distribution_outputs

A list of distributions of cocktails of different sizes, used to compute the p-value for multiple cocktail sizes

filename

The name of the .csv file containing the output

filtred_distribution

Whether the p-values are computed using the filtered distribution (TRUE) or the unfiltered distribution (FALSE, the default)

sep

The separator used in the csv file (';' by default)

Value

A vector of real numbers representing the p-values of the cocktails in the input csv file filename, computed on the distribution_outputs list.

Examples


 data("ATC_Tree_UpperBound_2024")
 data("FAERS_myopathy")

  DistributionApproximationResults_size2 = DistributionApproximation(epochs = 10,
            ATCtree = ATC_Tree_UpperBound_2024, observations = FAERS_myopathy, Smax = 2)
            
  DistributionApproximationResults_size3 = DistributionApproximation(epochs = 10,
            ATCtree = ATC_Tree_UpperBound_2024, observations = FAERS_myopathy, Smax = 3)
            
  score_distribution_list = list(DistributionApproximationResults_size2,
                              DistributionApproximationResults_size3)
  p_value_csv_file(score_distribution_list, "path/to/output.csv")

Add the p-value to each cocktail of an output of the genetic algorithm

Description

Add the p-value to each cocktail of an output of the genetic algorithm

Usage

p_value_genetic_results(
  distribution_outputs,
  genetic_results,
  filtred_distribution = FALSE
)

Arguments

distribution_outputs

A list of distributions of cocktails of different sizes, used to compute the p-value for multiple cocktail sizes

genetic_results

Outputs of the genetic algorithm

filtred_distribution

Whether the p-values are computed using the filtered distribution (TRUE) or the unfiltered distribution (FALSE, the default)

Value

A vector of real numbers representing the p-values of the input genetic algorithm results (genetic_results), computed on the distribution_outputs list.

Examples


 data("ATC_Tree_UpperBound_2024")
 data("FAERS_myopathy")
  DistributionApproximationResults_size2 = DistributionApproximation(epochs = 10,
            ATCtree = ATC_Tree_UpperBound_2024, observations = FAERS_myopathy, Smax = 2)
            
  DistributionApproximationResults_size3 = DistributionApproximation(epochs = 10,
            ATCtree = ATC_Tree_UpperBound_2024, observations = FAERS_myopathy, Smax = 3)
            
  score_distribution_list = list(DistributionApproximationResults_size2,
                              DistributionApproximationResults_size3)
  genetic_results = GeneticAlgorithm(epochs = 10, nbIndividuals = 20, 
            ATCtree = ATC_Tree_UpperBound_2024,
            observations = FAERS_myopathy)
  p_value_genetic_results(score_distribution_list, genetic_results)

Calculate the p-value of sampled values

Description

Calculate the p-value of sampled values

Usage

p_value_on_sampled(
  empirical_distribution,
  sampled_values,
  isFiltered = FALSE,
  includeZeroValue = FALSE
)

Arguments

empirical_distribution

A numeric vector of values representing the empirical distribution (return value of the DistributionApproximation function)

sampled_values

A scalar or a vector of real numbers representing the sampled values (scores to be tested)

isFiltered

A boolean indicating whether to use the filtered distribution or the distribution as is (FALSE by default)

includeZeroValue

A boolean indicating whether to take the null score into account (FALSE by default)

Value

A numeric value representing the empirical p-value
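
An empirical p-value of this kind can be sketched as the share of sampled scores at least as large as the tested value. The helper empirical_p_value is hypothetical; the use of >= rather than >, and the way the zero cell is dropped, are assumptions about the intended behaviour rather than the package's code.

```r
# Hypothetical helper: share of sampled scores that are at least as large as
# `value`, read from a histogram whose cell i counts scores (i - 1) / 10.
empirical_p_value <- function(hist_counts, value, includeZeroValue = FALSE) {
  scores <- (seq_along(hist_counts) - 1) / 10
  if (!includeZeroValue) {
    hist_counts[1] <- 0              # drop the cell for score 0
  }
  sum(hist_counts[scores >= value]) / sum(hist_counts)
}

hist_counts <- c(10, 4, 3, 2, 1)    # made-up counts for scores 0, 0.1, ..., 0.4
p_without_zero <- empirical_p_value(hist_counts, 0.3)
p_with_zero <- empirical_p_value(hist_counts, 0.3, includeZeroValue = TRUE)
```

Including the zero scores enlarges the denominator, so the same tested value gets a smaller p-value (0.15 instead of 0.3 on this toy histogram).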

Examples


data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")

cocktails = list(c(561, 904),
               c(1902, 4585))
               
estimated_score_distribution = DistributionApproximation(epochs = 10,
            ATCtree = ATC_Tree_UpperBound_2024,
            observations = FAERS_myopathy)
            
Hypergeom_of_cocktails = compute_hypergeom_on_list(cocktails = cocktails,
                              ATCtree = ATC_Tree_UpperBound_2024, 
                              observations = FAERS_myopathy)
            
p_value = p_value_on_sampled(empirical_distribution = estimated_score_distribution,
      sampled_values = Hypergeom_of_cocktails)

Plot the evolution of the mean and the best value of the population used by the GeneticAlgorithm

Description

Plot the evolution of the mean and the best value of the population used by the GeneticAlgorithm

Usage

plot_evolution(
  list,
  mean_color = "#F2A900",
  best_color = "#008080",
  xlab = "Epochs",
  ylab = "Score"
)

Arguments

list

A list with 2 elements returned by the GeneticAlgorithm: "mean" and "best", containing the numeric vectors representing the mean and best fitness of the population

mean_color

A string specifying the color of the mean values

best_color

A string specifying the color of the best values

xlab

A string specifying the label for the x-axis

ylab

A string specifying the label for the y-axis

Value

No return value; plots the evolution of the genetic algorithm results (mean/best score at each epoch).

Examples


data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")

results = GeneticAlgorithm(epochs = 10, nbIndividuals = 10, 
            ATCtree = ATC_Tree_UpperBound_2024,
            observations = FAERS_myopathy)

plot_evolution(list = results)

Plot the histogram of the approximation of the RR distribution

Description

Plot the histogram of the approximation of the RR distribution

Usage

plot_frequency(
  estimated,
  sqrt = FALSE,
  binwidth = 0.1,
  hist_color = "#69b3a2",
  density_color = "#FF5733",
  xlab = "Score"
)

Arguments

estimated

The ScoreDistribution element in the list returned by the DistributionApproximation function

sqrt

A Boolean specifying whether to normalize the estimated distribution; recommended for long random walks.

binwidth

The width of the histogram bins

hist_color

The fill color for the histogram bars

density_color

The color for the density curve

xlab

Label of X axis

Value

No return value; plots the histogram of the estimated distribution (estimated).

Examples


data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")

estimation = DistributionApproximation(epochs = 10, ATCtree = ATC_Tree_UpperBound_2024,
            observations = FAERS_myopathy)

plot_frequency(estimated = estimation$ScoreDistribution)

Print every cocktails found during the genetic algorithm when used with the hyperparam_test_genetic_algorithm function. This enables to condense the solutions found in each files by collapsing similar cocktail in a single row by cocktail.

Description

Print every cocktail found by the genetic algorithm when used with the hyperparam_test_genetic_algorithm function. This condenses the solutions found in each file by collapsing similar cocktails into a single row per cocktail.

Usage

print_csv(
  input_filenames,
  observations,
  repetition,
  ATCtree,
  csv_filename = "solutions.csv"
)

Arguments

input_filenames

: A list containing the filenames of the hyperparam_test_genetic_algorithm output files

observations

repetition

: The parameter nb_test_desired used in the hyperparam test function

ATCtree

: ATC tree with upper bound of the DFS (without the root)

csv_filename

: Name of the output file, "solutions.csv" by default

Value

No return value; processes the genetic algorithm output stored in the files produced by hyperparam_test_genetic_algorithm and writes a summary CSV file. The CSV file is written to the current directory and named after the csv_filename argument (solutions.csv by default).

Examples


 data("ATC_Tree_UpperBound_2024")
 data("FAERS_myopathy")
 files = c('250e_700ind_0.2mr_0ne_2alpha.txt') # results of hyperparam_test_genetic_algorithm

 print_csv(input_filenames = files, observations = FAERS_myopathy,
          repetition = 5, ATCtree = ATC_Tree_UpperBound_2024)
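
The collapsing step described above can be sketched in base R. This is a minimal illustration under stated assumptions: the cocktail strings are made up, "similar" is taken to mean "same set of drugs regardless of order", and the column name n_found is hypothetical. The rule actually used by print_csv may differ.

```r
# Hypothetical cocktails found across repeated runs (made-up names)
found <- c("metformin:prasugrel", "prasugrel:metformin",
           "metformin:prasugrel", "aspirin:metformin")

# Put the drugs of each cocktail in a canonical (sorted) order
canonical <- vapply(strsplit(found, ":", fixed = TRUE),
                    function(drugs) paste(sort(drugs), collapse = ":"),
                    character(1))

# One row per distinct cocktail, with the number of times it was found
summary_df <- as.data.frame(table(cocktail = canonical),
                            responseName = "n_found",
                            stringsAsFactors = FALSE)
summary_df
```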

Make a quantile-quantile plot from the output of the MCMC algorithm (DistributionApproximation) and the algorithm that exhaustively calculates the distribution

Description

Make a quantile-quantile plot from the output of the MCMC algorithm (DistributionApproximation) and the algorithm that exhaustively calculates the distribution

Usage

qq_plot_output(estimated, true, filtered = FALSE, color = "steelblue")

Arguments

estimated

Object returned by the DistributionApproximation function

true

Object returned by either the DistributionApproximation function or a true distribution computation function

filtered

Whether to use the classic distribution estimation or the filtered one (number of patients taking the cocktail > beta)

color

The color of the dashed line of the qq-plot

Value

No return value; called for its side effect of drawing the quantile-quantile plot of the estimated distribution (estimated) against the true distribution (true).

Examples


data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")

estimated_score_distribution = DistributionApproximation(epochs = 10,
            ATCtree = ATC_Tree_UpperBound_2024,
            observations = FAERS_myopathy[1:100,], Smax = 2)

true_score_distribution = trueDistributionSizeTwoCocktail(ATCtree = ATC_Tree_UpperBound_2024,
            observations = FAERS_myopathy[1:100,], beta = 4)

qq_plot_output(estimated = estimated_score_distribution,
                true = true_score_distribution)
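
To make the idea of comparing two binned score distributions concrete, here is a small base R sketch. It assumes both distributions are count vectors whose cell i counts the cocktails with score (i - 1) / 10; the counts and the helper counts_to_scores are hypothetical, and this is not the code used by qq_plot_output.

```r
# Hypothetical binned counts: cell i counts cocktails with score (i - 1) / 10
estimated_counts <- c(5, 10, 20, 10, 5)
true_counts      <- c(4, 12, 18, 11, 5)

# Expand counts into one score value per cocktail
counts_to_scores <- function(counts) {
  rep((seq_along(counts) - 1) / 10, times = counts)
}

probs  <- seq(0.1, 0.9, by = 0.1)
q_est  <- quantile(counts_to_scores(estimated_counts), probs = probs, type = 1)
q_true <- quantile(counts_to_scores(true_counts), probs = probs, type = 1)

# Points near the dashed identity line indicate similar distributions
plot(q_true, q_est, xlab = "True quantiles", ylab = "Estimated quantiles")
abline(a = 0, b = 1, lty = 2, col = "steelblue")
```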

Function used to convert a string vector of drug cocktails in the form "drug1:drug2" to vectors of ATC tree indices, e.g. c(ATC_index(drug1), ATC_index(drug2))

Description

Function used to convert a string vector of drug cocktails in the form "drug1:drug2" to vectors of ATC tree indices, e.g. c(ATC_index(drug1), ATC_index(drug2))

Usage

string_list_to_int_cocktails(ATC_name, lines)

Arguments

ATC_name

The name column of the ATC tree (ATC_Tree_UpperBound_2024$Name for the example dataset)

lines

A string vector of drug cocktails in the form "drug1:drug2:...:drug_n"

Value

An R List that can be used by other algorithms (e.g. clustering algorithm)

Examples


  data("ATC_Tree_UpperBound_2024")
  string_list = c('hmg coa reductase inhibitors:nervous system',
                  'metformin:prasugrel')
  string_list_to_int_cocktails(ATC_Tree_UpperBound_2024$Name,
                              string_list)
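
The conversion can be pictured with a few lines of base R. The sketch below is only an approximation under stated assumptions: atc_names is a tiny made-up stand-in for the Name column, to_indices is a hypothetical helper, and positions are 1-based positions in that vector, which may not match the indexing used internally by the package.

```r
# Hypothetical, tiny stand-in for the Name column of an ATC tree
atc_names <- c("nervous system", "metformin", "prasugrel",
               "hmg coa reductase inhibitors")

cocktails <- c("hmg coa reductase inhibitors:nervous system",
               "metformin:prasugrel")

# Split each "drug1:drug2:..." string on ":" and look up each name's position
to_indices <- function(lines, names) {
  lapply(strsplit(lines, ":", fixed = TRUE),
         function(drugs) match(drugs, names))
}

to_indices(cocktails, atc_names)
```

In this sketch a name that is absent from atc_names yields NA, so unmatched drugs are easy to spot.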

The true distribution of the score among all single nodes of the ATC tree

Description

The true distribution of the score among all single nodes of the ATC tree

Usage

trueDistributionDrugs(
  ATCtree,
  observations,
  beta,
  max_score = 1000L,
  nbResults = 100L,
  num_thread = 1L
)

Arguments

ATCtree

: ATC tree with upper bound of the DFS (without the root)

observations

beta

: minimum number of patients taking a cocktail for it to be included in the beta score distribution

max_score

: maximum value the score can take. Scores greater than this are added to the distribution as the value max_score. Default is 1000

nbResults

: Number of returned solutions (cocktails with the best observed score during the run), 100 by default

num_thread

: Number of threads to run in parallel if OpenMP is available, 1 by default

Value

Return a List containing:

ScoreDistribution: the distribution of the score as an array, each cell holding the number of scores equal to (index - 1) / 10
Filtered_score_distribution: distribution containing the scores of cocktails taken by at least beta patients
Outstanding_score: an array of the scores greater than max_score
Best_cocktails: the nbResults best cocktails encountered during the run
Best_cocktails_beta: the nbResults best cocktails taken by at least beta patients encountered during the run
Best_scores: scores corresponding to Best_cocktails
Best_scores_beta: scores corresponding to Best_cocktails_beta

Examples


data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")

size_1_score_distribution = trueDistributionDrugs(ATCtree = ATC_Tree_UpperBound_2024,
            observations = FAERS_myopathy[1:100,], beta = 4)
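
The binning convention described under Value (cell index maps to score (index - 1) / 10, scores above max_score are counted in the max_score cell, and the filtered distribution keeps cocktails taken by at least beta patients) can be sketched as follows. All values and the helper bin_scores are hypothetical, and max_score is set to 1 here only to keep the vectors short; this is not the package's implementation.

```r
# Hypothetical scores and patient counts for six cocktails (made-up values)
scores     <- c(0.0, 0.2, 0.2, 1.5, 3.0, 0.1)
n_patients <- c(10,  2,   7,   5,   1,   4)
max_score  <- 1   # small cap so the clamping is visible
beta       <- 4

# Scores above max_score are counted in the max_score cell
bin_scores <- function(s, max_score) {
  idx <- round(pmin(s, max_score) * 10) + 1   # score = (index - 1) / 10
  tabulate(idx, nbins = max_score * 10 + 1)
}

score_distribution    <- bin_scores(scores, max_score)
filtered_distribution <- bin_scores(scores[n_patients >= beta], max_score)
outstanding_score     <- scores[scores > max_score]
```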

The true distribution of the score among every size-two cocktails

Description

The true distribution of the score among every size-two cocktails

Usage

trueDistributionSizeTwoCocktail(
  ATCtree,
  observations,
  beta,
  max_score = 100L,
  nbResults = 100L,
  num_thread = 1L
)

Arguments

ATCtree

: ATC tree with upper bound of the DFS (without the root)

observations

beta

: minimum number of patients taking a cocktail for it to be included in the beta score distribution

max_score

: maximum value the score can take. Scores greater than this are added to the distribution as the value max_score. Default is 100

nbResults

: Number of returned solutions (cocktails with the best observed score during the run), 100 by default

num_thread

: Number of threads to run in parallel if OpenMP is available, 1 by default

Value

Examples


data("ATC_Tree_UpperBound_2024")
data("FAERS_myopathy")

size_2_score_distribution = trueDistributionSizeTwoCocktail(ATCtree = ATC_Tree_UpperBound_2024,
            observations = FAERS_myopathy[1:100,], beta = 4)
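
An exhaustive pass over size-two cocktails amounts to enumerating pairs of nodes. The base R sketch below shows that enumeration on a toy set of five hypothetical node indices with a placeholder score; it assumes every unordered pair of distinct nodes is a valid cocktail, which the package may restrict further, and it does not compute the package's actual score.

```r
# Hypothetical node indices standing in for ATC tree nodes
nodes <- 1:5

# Every unordered pair of distinct nodes: choose(5, 2) = 10 cocktails
pairs <- combn(nodes, 2)

# Placeholder score (sum of the indices), used only to show the loop shape
pair_scores <- apply(pairs, 2, function(cocktail) sum(cocktail))

ncol(pairs)
```

The number of pairs grows quadratically with the number of nodes, which is presumably why the function exposes num_thread.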
