Help for package csmGmm

Type:

Package

Title:

Conditionally Symmetric Multidimensional Gaussian Mixture Model

Version:

0.3.0

Description:

Implements the conditionally symmetric multidimensional Gaussian mixture model (csmGmm) for large-scale testing of composite null hypotheses in genetic association applications such as mediation analysis, pleiotropy analysis, and replication analysis. In such analyses, we typically have J sets of K test statistics where K is a small number (e.g. 2 or 3) and J is large (e.g. 1 million). For each one of the J sets, we want to know if we can reject all K individual nulls. Please see the vignette for a quickstart guide. The paper describing these methods is "Testing a Large Number of Composite Null Hypotheses Using Conditionally Symmetric Multidimensional Gaussian Mixtures in Genome-Wide Studies" by Sun R, McCaw Z, & Lin X (2024, <doi:10.1080/01621459.2024.2422124>). The paper is accepted and published online (but not yet in print) in the Journal of the American Statistical Association as of Dec 1 2024.

License:

GPL-3

Encoding:

UTF-8

RoxygenNote:

7.3.1

Imports:

dplyr, mvtnorm, rlang, magrittr

Suggests:

knitr, rmarkdown

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2024-12-02 20:45:12 UTC; rsun3

Author:

Ryan Sun [aut, cre]

Maintainer:

Ryan Sun <ryansun.work@gmail.com>

Repository:

CRAN

Date/Publication:

2024-12-03 19:30:02 UTC

calc_dens_cor.R

Description

For J*2 matrix of J sets, calculate the density of bivariate normal under fitted c-csmGmm.

Usage

calc_dens_cor(x, Zmat, corMat, log = FALSE)

Arguments

x

2*1 vector of means.

Zmat

J*2 matrix of test statistics.

corMat

2*2 matrix describing correlation structure of test statistics.

log

return logarithm of density

Value

A J*1 vector of densities for each row of Zmat.

Examples

x <- c(0, 0)
Zmat <- cbind(rnorm(10^5), rnorm(10^5))
calc_dens_cor(x, Zmat, corMat = cor(Zmat))

calc_dens_ind.R

Description

Calculate J bivariate normal densities (both dimensions are independent) under fitted csmGmm.

Usage

calc_dens_ind_2d(x, Zmat)

Arguments

x

2*1 vector of means.

Zmat

J*2 matrix of test statistics.

Value

A J*1 vector of densities for each row of Zmat.

Examples

x <- c(0, 0)
Zmat <- cbind(rnorm(10^5), rnorm(10^5))
calc_dens_ind_2d(x, Zmat)

Calculate J trivariate normal densities (all dimensions are independent) under fitted csmGmm.

Description

Calculate J trivariate normal densities (all dimensions are independent) under fitted csmGmm.

Usage

calc_dens_ind_3d(x, Zmat)

Arguments

x

3*1 vector of means.

Zmat

J*3 matrix of test statistics.

Value

A J*1 vector of densities for each row of Zmat.

Examples

x <- c(0, 0)
Zmat <- cbind(rnorm(10^5), rnorm(10^5), rnorm(10^5))
calc_dens_ind_3d(x, Zmat)

Calculate the density of K-dimensional multivariate normal (all dimensions are independent) under fitted acsGmm.

Description

Calculate the density of K-dimensional multivariate normal (all dimensions are independent) under fitted acsGmm.

Usage

calc_dens_ind_multiple(x, Zmat)

Arguments

x

K*1 vector of means.

Zmat

J*K matrix of test statistics.

Value

A J*1 vector of densities for each row of Zmat.

Examples

x <- c(0, 0)
Zmat <- cbind(rnorm(10^5), rnorm(10^5), rnorm(10^5), rnorm(10^5))
calc_dens_ind_multiple(x, Zmat)

check_incongruous.R

Description

Check the number of sets of test statistics that have a higher (less significant) lfdr value than other sets with test statistics of uniformly smaller magnitudes.

Usage

check_incongruous(zMatrix, lfdrVec)

Arguments

zMatrix

J*K vector of all test statistics.

lfdrVec

J*1 vector of lfdr values corresponding to each set of test statistics.

Value

A vector with all the indices of all sets that have a higher lfdr value those a set with smaller test statistic magnitudes.

Examples

zMatrix <- cbind(rnorm(10^4), rnorm(10^4))
lfdrVec <- runif(10^4)
check_incongruous(zMatrix = zMatrix, lfdrVec = lfdrVec)

Tells if row x if allTestStats is an incongruous result (has a higher lfdr than a set of test statistics with lower magnitudes). For K=2 case.

Description

Tells if row x if allTestStats is an incongruous result (has a higher lfdr than a set of test statistics with lower magnitudes). For K=2 case.

Usage

find_2d(x, allTestStats)

Arguments

x

Scalar, which row of allTestStats to check.

allTestStats

J*K vector of all test statistics.

Value

A scalar denoting the number of sets with lower lfdr and test statistics of lower magnitude. 0 means congruous result.

Examples

zMatrix <- cbind(rnorm(10^4), rnorm(10^4))
find_2d(x = 5, allTestStats = zMatrix)

Tells if row x if allTestStats is an incongruous result (has a higher lfdr than a set of test statistics with lower magnitudes). For K=3 case.

Description

Tells if row x if allTestStats is an incongruous result (has a higher lfdr than a set of test statistics with lower magnitudes). For K=3 case.

Usage

find_3d(x, allTestStats)

Arguments

x

Scalar, which row of allTestStats to check.

allTestStats

J*K vector of all test statistics.

Value

A scalar denoting the number of sets with lower lfdr and test statistics of lower magnitude. 0 means congruous result.

Examples

zMatrix <- cbind(rnorm(10^4), rnorm(10^4),  rnorm(10^4))
find_3d(x = 5, allTestStats = zMatrix)

find_max_means.R

Description

Find maximum means for each dimension in null settings.

Usage

find_max_means(muInfo)

Arguments

muInfo

A list with 2^K elements, where each element is a matrix with K rows and Mb columns.

Value

A K*1 vector of the maximum means for each dimension under the null.

Examples

initMuList <- list(matrix(data=0, nrow=2, ncol=1), matrix(data=c(0, 3, 0, 6), nrow=2),
matrix(data=c(3, 0, 6, 0), nrow=2), matrix(data=c(8, 8), nrow=2))
find_max_means(initMuList)

symm_fit_cor.R

Description

Fit the correlated csmGmm for sets of correlated elements. Currently restricted to K=2.

Usage

symm_fit_cor_EM(
  testStats,
  corMat,
  initMuList,
  initPiList,
  eps = 10^(-5),
  checkpoint = TRUE
)

Arguments

testStats

J*K matrix of test statistics where J is the number of sets and K is number of elements in each set.

corMat

K*K matrix that describes the correlation structure of each set.

initMuList

List of 2^K elements where each element is a matrix with K rows and number of columns equal to the number of possible mean vectors for that binary group.

initPiList

List of 2^K elements where each element is a vector with number of elements equal to the number of possible mean vectors for that binary group.

eps

Scalar, stop the EM algorithm when L2 norm of difference in parameters is less than this value.

checkpoint

Boolean, set to TRUE to print iterations of EM.

Value

A list with the elements:

muInfo

List with same dimensions as initMuList, holds the final mean parameters.

piInfo

List with same dimensions as initPiList, holds the final probability parameters.

iter

Number of iterations run in EM algorithm.

lfdrResults

J*1 vector of all lfdr statistics.

Examples

set.seed(0)
corMat <- matrix(data=c(1, 0.3, 0.3, 1), nrow=2)
testStats <- rbind(mvtnorm::rmvnorm(n=200, mean=c(3, 0), sigma=corMat),
mvtnorm::rmvnorm(n=200, mean=c(0, 4), sigma=corMat),
mvtnorm::rmvnorm(n=100, mean=c(7, 7), sigma=corMat),
mvtnorm::rmvnorm(n=10^4 - 500, mean=c(0, 0), sigma=corMat))
initMuList <- list(matrix(data=0, nrow=2, ncol=1), matrix(data=c(0, 3), nrow=2),
matrix(data=c(4, 0), nrow=2), matrix(data=c(5, 5), nrow=2))
initPiList <- list(c(0.9), c(0.04), c(0.04), c(0.02))
results <- symm_fit_cor_EM(testStats = testStats, corMat = cor(testStats),
initMuList = initMuList, initPiList = initPiList)

symm_fit_cor_fulllik.R

Description

Full likelihood, block correlation, blocks of size 2

Usage

symm_fit_cor_EM_fulllik(
  testStats,
  corMat,
  initMuList,
  initPiList,
  eps = 10^(-5),
  checkpoint = TRUE
)

Arguments

testStats

J*K matrix of test statistics where J is the number of sets and K is number of elements in each set.

corMat

K*K matrix that describes the correlation structure of each 2 by 2 block.

initMuList

List of 2^K elements where each element is a matrix with K rows and number of columns equal to the number of possible mean vectors for that binary group.

initPiList

List of 2^K elements where each element is a vector with number of elements equal to the number of possible mean vectors for that binary group.

eps

Scalar, stop the EM algorithm when L2 norm of difference in parameters is less than this value.

checkpoint

Boolean, set to TRUE to print iterations of EM.

Value

A list with the elements:

muInfo

List with same dimensions as initMuList, holds the final mean parameters.

piInfo

List with same dimensions as initPiList, holds the final probability parameters.

iter

Number of iterations run in EM algorithm.

lfdrResults

J*1 vector of all lfdr statistics.

Examples

set.seed(0)
testStats <- cbind(rnorm(10^4), rnorm(10^4))
testStats[1:100, 1] <- rnorm(100, mean=3)
testStats[101:200, 1] <- rnorm(100, mean=5)
testStats[201:300, 2] <- rnorm(100, mean=4)
testStats[301:400, 1:2] <- rnorm(200, mean=7)
initMuList <- list(matrix(data=0, nrow=2, ncol=1), matrix(data=c(0, 3), nrow=2, ncol=1),
matrix(data=c(3, 0), nrow=2, ncol=1), matrix(data=c(6, 6), nrow=2, ncol=1))
initPiList <- list(c(0.9), c(0.04),c(0.04), c(0.02))
results <- symm_fit_cor_EM_fulllik(testStats = testStats, corMat=diag(c(1,1)),
initMuList = initMuList, initPiList = initPiList)

symm_fit_cor_noAssumption.R

Description

Fit the correlated csmGmm for sets of correlated elements, but we don't assume that the means in the composite alternative are greater in magnitude than those in the composite null.

Usage

symm_fit_cor_EM_noAssumption(
  testStats,
  corMat,
  initMuList,
  initPiList,
  eps = 10^(-5),
  checkpoint = TRUE
)

Arguments

testStats

J*K matrix of test statistics where J is the number of sets and K is number of elements in each set.

corMat

K*K matrix that describes the correlation structure of each set.

initMuList

List of 2^K elements where each element is a matrix with K rows and number of columns equal to the number of possible mean vectors for that binary group.

initPiList

List of 2^K elements where each element is a vector with number of elements equal to the number of possible mean vectors for that binary group.

eps

Scalar, stop the EM algorithm when L2 norm of difference in parameters is less than this value.

checkpoint

Boolean, set to TRUE to print iterations of EM.

Value

A list with the elements:

muInfo

List with same dimensions as initMuList, holds the final mean parameters.

piInfo

List with same dimensions as initPiList, holds the final probability parameters.

iter

Number of iterations run in EM algorithm.

lfdrResults

J*1 vector of all lfdr statistics.

Examples

set.seed(0)
corMat <- matrix(data=c(1, 0.3, 0.3, 1), nrow=2)
testStats <- rbind(mvtnorm::rmvnorm(n=200, mean=c(3, 0), sigma=corMat),
mvtnorm::rmvnorm(n=200, mean=c(0, 4), sigma=corMat),
mvtnorm::rmvnorm(n=100, mean=c(7, 7), sigma=corMat),
mvtnorm::rmvnorm(n=10^4 - 500, mean=c(0, 0), sigma=corMat))
initMuList <- list(matrix(data=0, nrow=2, ncol=1), matrix(data=c(0, 3), nrow=2),
matrix(data=c(4, 0), nrow=2), matrix(data=c(5, 5), nrow=2))
initPiList <- list(c(0.9), c(0.04), c(0.04), c(0.02))
results <- symm_fit_cor_EM_noAssumption(testStats = testStats,
corMat = cor(testStats), initMuList = initMuList, initPiList = initPiList)

symm_fit_cor_rho.R

Description

Fit the correlated csmGmm for sets of correlated elements. Also fits the correlation parameter in EM algorithm.

Usage

symm_fit_cor_EM_rho(
  testStats,
  initRho,
  initMuList,
  initPiList,
  eps = 10^(-5),
  checkpoint = TRUE
)

Arguments

testStats

J*K matrix of test statistics where J is the number of sets and K is number of elements in each set.

initRho

Initial value of rho, any reasonable guess should be ok.

initMuList

List of 2^K elements where each element is a matrix with K rows and number of columns equal to the number of possible mean vectors for that binary group.

initPiList

List of 2^K elements where each element is a vector with number of elements equal to the number of possible mean vectors for that binary group.

eps

Scalar, stop the EM algorithm when L2 norm of difference in parameters is less than this value.

checkpoint

Boolean, set to TRUE to print iterations of EM.

Value

A list with the elements:

muInfo

List with same dimensions as initMuList, holds the final mean parameters.

piInfo

List with same dimensions as initPiList, holds the final probability parameters.

iter

Number of iterations run in EM algorithm.

lfdrResults

J*1 vector of all lfdr statistics.

Examples

set.seed(0)
corMat <- matrix(data=c(1, 0.3, 0.3, 1), nrow=2)
testStats <- rbind(mvtnorm::rmvnorm(n=200, mean=c(3, 0), sigma=corMat),
mvtnorm::rmvnorm(n=200, mean=c(0, 4), sigma=corMat),
mvtnorm::rmvnorm(n=100, mean=c(7, 7), sigma=corMat),
mvtnorm::rmvnorm(n=10^4 - 500, mean=c(0, 0), sigma=corMat))
initMuList <- list(matrix(data=0, nrow=2, ncol=1), matrix(data=c(0, 3), nrow=2),
matrix(data=c(4, 0), nrow=2), matrix(data=c(5, 5), nrow=2))
initPiList <- list(c(0.9), c(0.04), c(0.04), c(0.02))
results <- symm_fit_cor_EM_rho(testStats = testStats,
initRho = 0.1, initMuList = initMuList, initPiList = initPiList)

symm_fit_ind.R

Description

Fit the conditionally symmetric multidimensional Gaussian mixture model for sets of independent elements

Usage

symm_fit_ind_EM(
  testStats,
  initMuList,
  initPiList,
  sameDirAlt = FALSE,
  eps = 10^(-5),
  checkpoint = TRUE
)

Arguments

testStats

J*K matrix of test statistics where J is the number of sets and K is number of elements in each set.

initMuList

List of 2^K elements where each element is a matrix with K rows and number of columns equal to the number of possible mean vectors for that binary representation.

initPiList

List of 2^K elements where each element is a vector with number of elements equal to the number of possible mean vectors for that binary representation.

sameDirAlt

Boolean, set to TRUE for replication testing, which uses a smaller alternative space.

eps

Scalar, stop the EM algorithm when L2 norm of difference in parameters is less than this value.

checkpoint

Boolean, set to TRUE to print iterations of EM

Value

A list with the elements:

muInfo

List with same dimensions as initMuList, holds the final mean parameters.

piInfo

List with same dimensions as initPiList, holds the final mixture proportions

iter

Number of iterations run in EM algorithm.

lfdrResults

J*1 vector of all lfdr statistics.

Examples

set.seed(0)
testStats <- cbind(rnorm(10^4), rnorm(10^4))
testStats[1:200, 1] <- rnorm(100, mean=3)
testStats[201:400, 1] <- rnorm(100, mean=5)
testStats[401:600, 2] <- rnorm(100, mean=3)
testStats[601:800, 2] <- rnorm(100, mean=5)
testStats[801:1000, 1:2] <- rnorm(200, mean=7)
initMuList <- list(matrix(data=0, nrow=2, ncol=1), matrix(data=c(0, 3, 0, 5), nrow=2, ncol=2),
matrix(data=c(3, 0, 5, 0), nrow=2, ncol=2), matrix(data=c(7, 7), nrow=2, ncol=1))
initPiList <- list(c(0.9), c(0.02, 0.02),c(0.02, 0.02), c(0.02))
results <- symm_fit_ind_EM(testStats = testStats, initMuList = initMuList, initPiList = initPiList)

symm_fit_ind_noAssumption.R

Description

Fit the conditionally symmetric multidimensional Gaussian mixture model for sets of independent elements, but we don't assume that the means in the composite alternative are greater in magnitude than those in the composite null.

Usage

symm_fit_ind_EM_noAssumption(
  testStats,
  initMuList,
  initPiList,
  sameDirAlt = FALSE,
  eps = 10^(-5),
  checkpoint = TRUE
)

Arguments

testStats

J*K matrix of test statistics where J is the number of sets and K is number of elements in each set.

initMuList

List of 2^K elements where each element is a matrix with K rows and number of columns equal to the number of possible mean vectors for that binary representation.

initPiList

List of 2^K elements where each element is a vector with number of elements equal to the number of possible mean vectors for that binary representation.

sameDirAlt

Boolean, set to TRUE for replication testing, which uses a smaller alternative space.

eps

Scalar, stop the EM algorithm when L2 norm of difference in parameters is less than this value.

checkpoint

Boolean, set to TRUE to print iterations of EM

Value

A list with the elements:

muInfo

List with same dimensions as initMuList, holds the final mean parameters.

piInfo

List with same dimensions as initPiList, holds the final mixture proportions

iter

Number of iterations run in EM algorithm.

lfdrResults

J*1 vector of all lfdr statistics.

Examples

set.seed(0)
testStats <- cbind(rnorm(10^4), rnorm(10^4))
testStats[1:200, 1] <- rnorm(100, mean=3)
testStats[201:400, 1] <- rnorm(100, mean=5)
testStats[401:600, 2] <- rnorm(100, mean=3)
testStats[601:800, 2] <- rnorm(100, mean=5)
testStats[801:1000, 1:2] <- rnorm(200, mean=7)
initMuList <- list(matrix(data=0, nrow=2, ncol=1), matrix(data=c(0, 3, 0, 5), nrow=2, ncol=2),
matrix(data=c(3, 0, 5, 0), nrow=2, ncol=2), matrix(data=c(7, 7), nrow=2, ncol=1))
initPiList <- list(c(0.9), c(0.02, 0.02),c(0.02, 0.02), c(0.02))
results <- symm_fit_ind_EM_noAssumption(testStats = testStats,
initMuList = initMuList, initPiList = initPiList)