diemr implements the Diagnostic Index Expectation
Maximization (diem) algorithm for genome
polarization in R. It estimates which alleles of
single nucleotide variant (SNV) sites belong to either side of a barrier
to gene flow, co-estimates individual assignment, and infers barrier
strength and divergence. These tools are designed for studies of
hybridization, speciation, and
population divergence, and extend the methods described in Baird et
al. (2023) Genome polarisation for detecting barriers to geneflow.
Methods in Ecology and Evolution 14, 512-528, doi:10.1111/2041-210X.14010. For the original algorithm
description and implementations in Python and
Mathematica, see the diem repository at https://github.com/StuartJEBaird/diem. For a
step-by-step explanation of the functions and their outputs, see
the documentation for the diemr package.
To start using diemr, load the package or install it from CRAN if it is not yet available:
    if(!require("diemr", character.only = TRUE)){
        install.packages("diemr", dependencies = TRUE)
        library("diemr", character.only = TRUE)
    }
    # Loading required package: diemr

The development version can be installed directly from this repository
using the devtools package.

    devtools::install_github("https://github.com/nmartinkova/diemr")

Set the working directory to a location with read and write privileges.
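One way to confirm read and write access before starting is sketched below in base R. The folder name `diem_analysis` and its location under `tempdir()` are placeholders chosen for illustration only; substitute the path of your own project directory.

```r
# Hypothetical project folder; replace with your own path.
work_dir <- file.path(tempdir(), "diem_analysis")
dir.create(work_dir, showWarnings = FALSE, recursive = TRUE)

# file.access() returns 0 on success; mode 4 tests read, mode 2 tests write.
can_read <- unname(file.access(work_dir, mode = 4) == 0)
can_write <- unname(file.access(work_dir, mode = 2) == 0)
stopifnot(can_read, can_write)

# Only switch to the folder once both checks have passed.
setwd(work_dir)
```

Checking access up front avoids discovering a permissions problem only after the analysis has started writing its output.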
Next, assemble paths to all files containing the data to be used by diemr. Here, we use a tiny example dataset included in the package for illustration. It is good practice to check that all files contain data in the correct format for all individuals and markers.
    filepaths <- system.file("extdata", "data7x3.txt",
                             package = "diemr")
    CheckDiemFormat(filepaths, ploidy = list(rep(2, 6)), ChosenInds = 1:6)
    # File check passed: TRUE
    # Ploidy check passed: TRUE

If the CheckDiemFormat() function fails, work through
the error messages and fix the stored input files accordingly. The
algorithm repeatedly accesses data from the hard disk, so confirming that
the file check passes before analysis is critical.
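For intuition about what a format problem can look like, the base R sketch below runs an informal pre-check on a made-up file. It assumes the diem input encoding as we understand it from the package documentation: one marker per line, a leading `S`, then one character per individual (`0`, `1`, `2`, or `_`/`U` for unknown). The toy genotypes, the variable names, and the regular expression are illustrative assumptions, and the sketch is not a substitute for CheckDiemFormat().

```r
# Hypothetical three-marker, six-individual input written to a temporary file.
# Assumed encoding: leading "S", then one character per individual:
# 0 or 2 = homozygote, 1 = heterozygote, _ or U = unknown genotype.
toy_lines <- c("S012201", "S20_102", "S1122U0")
toy_file <- tempfile(fileext = ".txt")
writeLines(toy_lines, toy_file)

n_inds <- 6
markers <- readLines(toy_file)

# Each line should be "S" followed by exactly n_inds allowed characters.
pattern <- paste0("^S[012_U]{", n_inds, "}$")
line_ok <- grepl(pattern, markers)

# Report the line numbers that would need fixing before running the analysis.
bad_lines <- which(!line_ok)
if (length(bad_lines) > 0) {
  message("Malformed marker lines: ", paste(bad_lines, collapse = ", "))
} else {
  message("All ", length(markers), " marker lines look well formed.")
}
```

A line with the wrong number of genotype characters, or with a character outside the allowed set, would appear in `bad_lines`, which is the kind of problem to fix in the stored files before rerunning the package's own check.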
    diem.res <- diem(files = filepaths,
                     ploidy = list(rep(2, 6)),
                     ChosenInds = 1:6,
                     nCores = 1)

The results, including marker polarisation, the marker diagnostic index,
and its support, are stored in the list element
diem.res$DI. Additional elements in the results list
contain basic tracking information about the expectation maximisation
iterations. The key results are saved to the file
MarkerDiagnosticsWithOptimalPolarities.txt in the working
directory. Check the diemr documentation
for further information.
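A quick way to look at both result locations is sketched below. It uses only base R and makes no assumption about the column layout of the output; the guards simply skip a step when the analysis has not been run in the current session or the file is not in the current working directory.

```r
# Name of the output file as stated above; it is written to the working directory.
out_file <- "MarkerDiagnosticsWithOptimalPolarities.txt"

# Look at the in-memory results only if the analysis object exists in this session.
if (exists("diem.res")) {
  str(diem.res$DI)
}

# Preview the first lines of the saved file without assuming its column layout.
if (file.exists(out_file)) {
  print(head(readLines(out_file), n = 5))
} else {
  message("Output file not found in ", getwd())
}
```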