Type: | Package |
Title: | Simulating Realistic Microbiome Data using 'MIDASim' |
Version: | 2.0 |
Description: | The 'MIDASim' package is a microbiome data simulator for generating realistic microbiome datasets by adapting a user-provided template. It supports the controlled introduction of experimental signals-such as shifts in taxon relative abundances, prevalence, and sample library sizes-to create distinct synthetic populations under diverse simulation scenarios. For more details, see He et al. (2024) <doi:10.1186/s40168-024-01822-z>. |
Imports: | psych, MASS, pracma, scam, stats |
Suggests: | vegan, phyloseq, rmarkdown, knitr, roxygen2, testthat |
URL: | https://github.com/mengyu-he/MIDASim |
BugReports: | https://github.com/mengyu-he/MIDASim/issues |
LazyData: | true |
License: | GPL-2 |
Encoding: | UTF-8 |
Depends: | R (≥ 3.5.0) |
VignetteBuilder: | knitr |
RoxygenNote: | 7.3.1 |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2025-07-01 18:50:50 UTC; mhe |
Author: | Mengyu He [aut, cre] |
Maintainer: | Mengyu He <mhe44@emory.edu> |
Repository: | CRAN |
Date/Publication: | 2025-07-05 19:00:03 UTC |
Simulating Realistic Microbiome Data using MIDASim
Description
Generate microbiome datasets using parameters from MIDASim.modify.
Usage
MIDASim(fitted.modified, only.rel = FALSE)
Arguments
fitted.modified |
Output from MIDASim.modify. |
only.rel |
A logical indicating whether to only simulate relative-
abundance data. If |
Value
Returns a list that has components:
sim_01 |
Matrix of simulated presence-absence data |
sim_rel |
Matrix of simulated relative-abundance data |
sim_count |
Matrix of simulated count data |
Author(s)
Mengyu He
Examples
data("throat.otu.tab")
otu.tab = throat.otu.tab[,colSums(throat.otu.tab>0)>1]
fitted = MIDASim.setup(otu.tab)
fitted.modified = MIDASim.modify(fitted)
sim = MIDASim(fitted.modified, only.rel = FALSE)
Modifying MIDASim model
Description
MIDASim.modify() modifies the fitted MIDASim.setup model according to user specification that one or multiple of the following characteristics, such as the library sizes, taxa relative abundances, location parameters of the parametric model can be changed. This is useful if the users wants to introduce an 'effect' in simulation studies.
Usage
MIDASim.modify(
fitted,
lib.size = NULL,
mean.rel.abund = NULL,
gengamma.mu = NULL,
sample.1.prop = NULL,
taxa.1.prop = NULL,
individual.rel.abund = NULL,
...
)
Arguments
fitted |
Output from MIDASim.setup. |
lib.size |
Numeric vector of pre-specified library sizes (length should
be equal to |
mean.rel.abund |
Numeric vector of specified mean relative abundances for
taxa. Length should be equal to |
gengamma.mu |
Numeric vector of specified location parameters for the
parametric model (generalized gamma model). Specify either |
sample.1.prop |
Numeric vector of specified proportion of non-zeros for
subjects (the length should be equal to |
taxa.1.prop |
Numeric vector of specified proportion of non-zeros for
taxa (the length should be equal to |
individual.rel.abund |
Numeric matrix of expected relative abundances
with |
... |
Additional arguments. If SCAM model is chosen for parameter changes
under the non-parametric mode, specify |
Details
The parametric model in MIDASim is a location-scale model, specifically, a
generalized gamma model for relative abundances \pi
of a taxon. Denote t = 1/\pi
.
The generalized gamma distribution for t
is chosen so that
ln(t)\ =\ \mu\ +\ \sigma \cdot w
where w
follows a log gamma distribution with a shape parameter 1/Q
.
MIDASim fits the model to the template data and estimates parameters \mu
, \sigma
and Q
by matching the first two moments of \pi
and maximizing the likelihood.
Value
Returns an updated list with different elements depending on the value
of fitted$mode
:
n.sample |
Target sample size in the simulation. |
lib.size |
Target library sizes in the simulation. |
taxa.1.prop |
Updated proportions of non-zero values for each taxon. |
sample.1.prop |
Updated proportion of non-zero cells for each subject. |
theta |
Mean values of the multivariate normal distribution in generating presence-absence data. |
eta |
Adjustment to be applied to samples in generating presence- absence data. |
Author(s)
Mengyu He
Examples
data("throat.otu.tab")
otu.tab = throat.otu.tab[,colSums(throat.otu.tab>0)>1]
fitted = MIDASim.setup(otu.tab, mode = 'parametric')
# modify library sizes
fitted.modified <- MIDASim.modify(fitted,
lib.size = sample(fitted$lib.size, 2*nrow(otu.tab),
replace = TRUE) )
# modify mean relative abundances
fitted.modified <- MIDASim.modify(fitted,
mean.rel.abund = fitted$mean.rel.abund * runif(fitted$n.taxa))
Fitting MIDAS model to microbiome data
Description
Midas.setup estimates parameters from a template microbiome count dataset for downstream data simulation.
Usage
MIDASim.setup(otu.tab, n.break.ties = 100, mode = "nonparametric")
Arguments
otu.tab |
Numeric matrix of template microbiome count dataset. Rows are samples, columns are taxa. |
n.break.ties |
Number of replicates to break ties when ranking relative
abundances. Defaults to |
mode |
A character indicating the modeling approach for relative abundances.
If |
Value
Returns a list that has components:
mat01 |
Presence-absence matrix of the template data. |
lib.size |
Observed library sizes of the template data. |
n.taxa |
Number of taxa in the template data. |
n.sample |
Sample size in the template data. |
ids |
Taxa ids present in all samples in the template. |
tetra.corr |
Estimated tetrachoric correlation of the presence-absence matrix of the template. |
corr.rel.corrected |
Estimated Pearson correlation of relative abundances, transformed from Spearman's rank correlation. |
sample.1.prop |
Proportion of non-zero cells for each subject. |
taxa.1.prop |
Proportion of non-zeros for each taxon. |
mean.rel.abund |
Observed mean relative abundances of each taxon. |
rel.abund.1 |
Observed non-zero relative abundances of each taxon. |
taxa.names |
Names of taxa in the template. |
Author(s)
Mengyu He
Examples
data("throat.otu.tab")
otu.tab = throat.otu.tab[,colSums(throat.otu.tab>0)>1]
# use nonparametric model
fitted = MIDASim.setup(otu.tab)
# use parametric model
fitted = MIDASim.setup(otu.tab, mode = 'parametric')
IBD microbiome dataset
Description
A filtered microbiome dataset of patients with IBD(Inflammatory Bowel Disease) in Human Microbiome Project 2 (HMP2).
Usage
data(count.ibd)
Format
An object of class matrix
(inherits from array
) with 146 rows and 614 columns.
References
Lloyd-Price, J., Arze, C., Ananthakrishnan, A.N. *et al*. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. *Nature* 569, 655–662 (2019). https://doi.org/10.1038/s41586-019-1237-9.
Examples
data(count.ibd)
MIDASim.setup(otu.tab = count.ibd, mode = "nonparametric")
MOMS-PI microbiome dataset
Description
A filtered microbiome dataset of Multi-Omic Microbiome Study-Pregnancy Initiative (MOMS-PI) in Human Microbiome Project 2 (HMP2).
Usage
data(count.vaginal)
Format
An object of class matrix
(inherits from array
) with 517 rows and 1146 columns.
References
Fettweis, J.M., Serrano, M.G., Brooks, J.P. et al. The vaginal microbiome and preterm birth. Nat Med 25, 1012–1021 (2019). https://doi.org/10.1038/s41591-019-0450-2
Examples
data(count.vaginal)
MIDASim.setup(otu.tab = count.vaginal, mode = "nonparametric")
throat microbiome dataset
Description
A microbiome dataset of 60 subjects with 856 OTUs. The data were collected from right and left nasopharynx and oropharynx region.
Usage
data(throat.otu.tab)
Format
An object of class data.frame
with 60 rows and 856 columns.
References
Charlson, E. S., Chen, J., Custers-Allen, R., Bittinger, K., Li, H., Sinha, R., Hwang, J., Bushman, F. D., & Collman, R. G. (2010). Disordered microbial communities in the upper respiratory tract of cigarette smokers. PloS one, 5(12), e15216. https://doi.org/10.1371/journal.pone.0015216
Examples
data(throat.otu.tab)
MIDASim.setup(otu.tab = throat.otu.tab, mode = "nonparametric")