Type: | Package |
Title: | Noisy Stochastic Block Mode: Graph Inference by Multiple Testing |
Version: | 0.1.4 |
Author: | Tabea Rebafka [aut, cre], Etienne Roquain [ctb], Fanny Villers [aut] |
Maintainer: | Tabea Rebafka <tabea.rebafka@sorbonne-universite.fr> |
Description: | Variational Expectation-Maximization algorithm to fit the noisy stochastic block model to an observed dense graph and to perform a node clustering. Moreover, a graph inference procedure to recover the underlying binary graph. This procedure comes with a control of the false discovery rate. The method is described in the article "Powerful graph inference with false discovery rate control" by T. Rebafka, E. Roquain, F. Villers (2020) <doi:10.48550/arXiv.1907.10176>. |
License: | GPL-2 |
Encoding: | UTF-8 |
LazyData: | true |
Imports: | parallel, gtools, ggplot2, RColorBrewer |
RoxygenNote: | 7.1.1 |
Suggests: | knitr, rmarkdown |
VignetteBuilder: | knitr |
Depends: | R (≥ 2.10) |
NeedsCompilation: | no |
Packaged: | 2020-12-14 21:10:07 UTC; tabea |
Repository: | CRAN |
Date/Publication: | 2020-12-16 10:40:06 UTC |
Evalute the adjusted Rand index
Description
Compute the adjusted Rand index to compare two partitions
Usage
ARI(x, y)
Arguments
x |
vector (of length n) or matrix (with n columns) providing a partition |
y |
vector or matrix providing a partition |
Details
the partitions may be provided as n-vectors containing the cluster memeberships of n entities, or by Qxn - matrices whose entries are all 0 and 1 where 1 indicates the cluster membership
Value
the value of the adjusted Rand index
Examples
clust1 <- c(1,2,1,2)
clust2 <- c(2,1,2,1)
ARI(clust1, clust2)
clust3 <- matrix(c(1,1,0,0, 0,0,1,1), nrow=2, byrow=TRUE)
clust4 <- matrix(c(1,0,0,0, 0,1,0,0, 0,0,1,1), nrow=3, byrow=TRUE)
ARI(clust3, clust4)
computation of the Integrated Classification Likelihood criterion
Description
computation of the Integrated Classification Likelihood criterion for a result provided by mainVEM_Q()
Usage
ICL_Q(solutionThisRun, model)
Arguments
solutionThisRun |
result provided by mainVEM_Q() |
model |
Implemented models:
|
Value
value of the ICL criterion
evaluate the objective in the Gamma model
Description
evaluate the objective in the Gamma model
Usage
J.gamma(param, L, M)
Arguments
param |
parameters of the Gamma distribution |
L |
weighted mean of log(data) |
M |
weighted mean of the data |
Value
value of the lower bound of the log-likelihood function
evaluation of the objective in the Gauss model
Description
evaluation of the objective in the Gauss model
Usage
JEvalMstep(VE, mstep, data, modelFamily, directed)
Arguments
VE |
list with variational parameters tau and rho |
mstep |
list with current model parameters and additional auxiliary terms |
data |
data vector in the undirected model, data matrix in the directed model |
modelFamily |
probability distribution for the edges. Possible values:
|
directed |
booelan to indicate whether the model is directed or undirected |
Value
value of the ELBO and the complete log likelihood function
M-step
Description
performs one M-step, that is, update of pi, w, nu, nu0
Usage
Mstep(VE, mstep, model, data, modelFamily, directed)
Arguments
VE |
list with variational parameters tau and rho |
mstep |
list with current model parameters and additional auxiliary terms |
model |
Implemented models:
|
data |
data vector in the undirected model, data matrix in the directed model |
modelFamily |
probability distribution for the edges. Possible values:
|
directed |
booelan to indicate whether the model is directed or undirected |
Value
updated list mstep
with current model parameters and additional auxiliary terms
VE-step
Description
performs one VE-step, that is, update of tau and rho
Usage
VEstep(VE, mstep, data, modelFamily, directed, fix.iter = 5)
Arguments
VE |
list with variational parameters tau and rho |
mstep |
list with current model parameters and additional auxiliary terms |
data |
data vector in the undirected model, data matrix in the directed model |
modelFamily |
probability distribution for the edges. Possible values:
|
directed |
booelan to indicate whether the model is directed or undirected |
fix.iter |
maximal number of iterations for fixed point equation |
Value
updated list VE
with variational parameters tau and rho
split group q of provided tau randomly into two into
Description
split group q of provided tau randomly into two into
Usage
addRowToTau(tau, q)
Arguments
tau |
provided tau |
q |
indice of group to split |
Value
new tau
convert a clustering into a 0-1-matrix
Description
convert a clustering into a 0-1-matrix
Usage
classInd(cl, nbClusters)
Arguments
cl |
cluster in vector form |
nbClusters |
number of clusters |
Value
a 0-1-matrix encoding the clustering
transform a pair of block identifiers (q,l) into an identifying integer
Description
this is the inverse function of convertGroupPairIdentifier()
Usage
convertGroupPair(q, l, Q, directed)
Arguments
q |
indicator of a latent block |
l |
indicator of a latent block |
Q |
number of latent blocks |
directed |
indicates if the graph is directed |
takes a scalar indice of a group pair (q,l) and returns the values q and l
Description
this is the inverse function of convertGroupPair()
Usage
convertGroupPairIdentifier(ind_ql, Q)
Arguments
ind_ql |
indicator for a pair of latent blocks |
Q |
number of latent blocks |
transform a pair of nodes (i,j) into an identifying integer
Description
Associates an identifying integer with a pair of nodes (i,j)
Usage
convertNodePair(i, j, n, directed)
Arguments
i |
scalar or vector |
j |
scalar or vector, same length as i |
n |
number of vertices |
directed |
booelan to indicate whether the model is directed or undirected |
Details
returns the row number of the matrix build by listNodePairs(n) containing the pair (i,j)
corrects values of the variational parameters tau that are too close to the 0 or 1
Description
corrects values of the variational parameters tau that are too close to the 0 or 1
Usage
correctTau(tau)
Arguments
tau |
variational parameters |
compute the MLE in the Gamma model using the Newton-Raphson method
Description
compute the MLE in the Gamma model using the Newton-Raphson method
Usage
emv_gamma(L, M, param.old, epsilon = 0.001, nb.iter.max = 10)
Arguments
L |
weighted mean of log(data) |
M |
weighted mean of the data |
param.old |
parameters of the Gamma distribution |
epsilon |
threshold for the stopping criterion |
nb.iter.max |
maximum number of iterations |
Value
updated parameters of the Gamma distribution
VEM algorithm to adjust the noisy stochastic block model to an observed dense adjacency matrix
Description
fitNSBM()
estimates model parameters of the noisy stochastic block model and provides a clustering of the nodes
Usage
fitNSBM(
dataMatrix,
model = "Gauss0",
sbmSize = list(Qmin = 1, Qmax = NULL, explor = 1.5),
filename = NULL,
initParam = list(nbOfTau = NULL, nbOfPointsPerTau = NULL, maxNbOfPasses = NULL,
minNbOfPasses = 1),
nbCores = parallel::detectCores()
)
Arguments
dataMatrix |
observed dense adjacency matrix |
model |
Implemented models:
|
sbmSize |
list of parameters determining the size of SBM (the number of latent blocks) to be expored
|
filename |
results are saved in a file with this name (if provided) |
initParam |
list of parameters that fix the number of initializations
|
nbCores |
number of cores used for parallelization |
Details
fitNSBM()
supports different probability distributions for the edges and can estimate the number of node blocks
Value
Returns a list of estimation results for all numbers of latent blocks considered by the algorithm. Every element is a list composed of:
theta
estimated parameters of the noisy stochastic block model; a list with the following elements:
pi
parameter estimate of pi
w
parameter estimate of w
nu0
parameter estimate of nu0
nu
parameter estimate of nu
clustering
node clustering obtained by the noisy stochastic block model, more precisely, a hard clustering given by the maximum aposterior estimate of the variational parameters
sbmParam$edgeProba
sbmParam
further results concerning the latent binary stochastic block model. A list with the following elements:
Q
number of latent blocks in the noisy stochastic block model
clusterProba
soft clustering given by the conditional probabilities of a node to belong to a given latent block. In other words, these are the variational parameters
tau
; (Q x n)-matrixedgeProba
conditional probabilities
rho
of an edges given the node memberships of the interacting nodes; (N_Q x N)-matrixICL
value of the ICL criterion at the end of the algorithm
convergence
a list of convergence indicators:
J
value of the lower bound of the log-likelihood function at the end of the algorithm
complLogLik
value of the complete log-likelihood function at the end of the algorithm
converged
indicates if algorithm has converged
nbIter
number of iterations performed
Examples
n <- 10
theta <- list(pi= c(0.5,0.5), nu0=c(0,.1),
nu=matrix(c(-2,10,-2, 1,1,1),3,2), w=c(.5, .9, .3))
obs <- rnsbm(n, theta, modelFamily='Gauss')
res <- fitNSBM(obs$dataMatrix, sbmSize = list(Qmax=3),
initParam=list(nbOfTau=1, nbOfPointsPerTau=1), nbCores=1)
optimal number of SBM blocks
Description
returns the number of SBM blocks that maximizes the ICL
Usage
getBestQ(bestSolutionAtQ)
Arguments
bestSolutionAtQ |
output of |
Value
a list the maximal value of the ICL criterion among the provided solutions along with the best number of latent blocks
Examples
# res_gauss is the output of a call of fitNSBM()
getBestQ(res_gauss)
compute rho associated with given values of w, nu0 and nu
Description
compute rho associated with given values of w, nu0 and nu
Usage
getRho(Q, w, nu0, nu, data, modelFamily)
Arguments
Q |
number of latent blocks in the noisy stochastic block model |
w |
weight parameter in the noisy stochastic block model |
nu0 |
null parameter in the noisy stochastic block model |
nu |
alternative parameter in the noisy stochastic block model |
data |
data vector in the undirected model, data matrix in the directed model |
modelFamily |
probability distribution for the edges. Possible values:
|
Value
a matrix of conditional probabilities of an edge given the node memberships of the interacting nodes
Evaluate tau_q*tau_l in the noisy stochastic block model
Description
Evaluate tau_q*tau_l in the noisy stochastic block model
Usage
getTauql(q, l, tau, n, directed)
Arguments
q |
indicator of a latent block |
l |
indicator of a latent block |
tau |
variational parameters |
n |
number of vertices |
directed |
booelan to indicate whether the model is directed or undirected |
new graph inference procedure
Description
new graph inference procedure
Usage
graphInference(
dataMatrix,
nodeClustering,
theta,
alpha = 0.05,
modelFamily = "Gauss"
)
Arguments
dataMatrix |
observed adjacency matrix, nxn matrix |
nodeClustering |
n-vector of hard node Clustering |
theta |
parameter of the noisy stochastic block model |
alpha |
confidence level |
modelFamily |
probability distribution for the edges. Possible values:
|
Details
graph inference procedure based on conditional q-values in the noisy stochastic block model. It works in the Gaussian model, and also in the Gamma model, but only if the shape parameters of the Gamma distributions under the null and the alternatives are identical (e.g. when all distributions are exponentials).
Value
a list with:
A
resulting binary adjacency matrix
qvalues
vector with conditional q-values in the noisy stochastic block model
Examples
set.seed(1)
theta <- list(pi=c(.5,.5), w=c(.8,.1,.2), nu0=c(0,1), nu=matrix(c(-1,5,10, 1,1,1), ncol=2))
obs <- rnsbm(n=30, theta)
# res_gauss <- fitNSBM(obs$dataMatrix, nbCores=1)
resGraph <- graphInference(obs$dataMatrix, res_gauss[[2]]$clustering, theta, alpha=0.05)
sum((resGraph$A))/2 # nb of derived edges
sum(obs$latentAdj)/2 # correct nb of edges
compute a list of initial points for the VEM algorithm
Description
compute a list of initial points of tau and rhofor the VEM algorithm for a given number of blocks; returns nbOfTau*nbOfPointsPerTau inital points
Usage
initialPoints(
Q,
dataMatrix,
nbOfTau,
nbOfPointsPerTau,
modelFamily,
model,
directed
)
Arguments
Q |
number of latent blocks in the noisy stochastic block model |
dataMatrix |
observed dense adjacency matrix |
nbOfTau |
number of initializations for the latent block memberships |
nbOfPointsPerTau |
number of initializations of the latent binary graph associated with each initial latent block memberships |
modelFamily |
probability distribution for the edges. Possible values:
|
model |
Implemented models:
|
directed |
booelan to indicate whether the model is directed or undirected |
Value
list of inital points of tau and rho of length nbOfTau*nbOfPointsPerTau
Construct initial values with Q groups by meging groups of a solution obtained with Q+1 groups
Description
Construct initial values with Q groups by meging groups of a solution obtained with Q+1 groups
Usage
initialPointsByMerge(
tau_Qp1,
nbOfTau,
nbOfPointsPerTau,
data,
modelFamily,
model,
directed
)
Arguments
tau_Qp1 |
tau for a model with Q+1 latent blocks |
nbOfTau |
number of initializations for the latent block memberships |
nbOfPointsPerTau |
number of initializations of the latent binary graph associated with each initial latent block memberships |
data |
data vector in the undirected model, data matrix in the directed model |
modelFamily |
probability distribution for the edges. Possible values:
|
model |
Implemented models:
|
directed |
booelan to indicate whether the model is directed or undirected |
Value
list of inital points of tau and rho of length nbOfTau*nbOfPointsPerTau
Construct initial values with Q groups by splitting groups of a solution obtained with Q-1 groups
Description
Construct initial values with Q groups by splitting groups of a solution obtained with Q-1 groups
Usage
initialPointsBySplit(
tau_Qm1,
nbOfTau,
nbOfPointsPerTau,
data,
modelFamily,
model,
directed
)
Arguments
tau_Qm1 |
tau for a model with Q-1 latent blocks |
nbOfTau |
number of initializations for the latent block memberships |
nbOfPointsPerTau |
number of initializations of the latent binary graph associated with each initial latent block memberships |
data |
data vector in the undirected model, data matrix in the directed model |
modelFamily |
probability distribution for the edges. Possible values:
|
model |
Implemented models:
|
directed |
booelan to indicate whether the model is directed or undirected |
Value
list of inital points of tau and rho of length nbOfTau*nbOfPointsPerTau
compute initial values of rho
Description
for every provided initial point of tau nbOfPointsPerTau initial values of rho are computed in the Gamma model also initial values of nu are computed
Usage
initialRho(listOfTau, nbOfPointsPerTau, data, modelFamily, model, directed)
Arguments
listOfTau |
output of initialTau() |
nbOfPointsPerTau |
number of initializations of the latent binary graph associated with each initial latent block memberships |
data |
data vector in the undirected model, data matrix in the directed model |
modelFamily |
probability distribution for the edges. Possible values:
|
model |
Implemented models:
|
directed |
booelan to indicate whether the model is directed or undirected |
Value
list of inital points of tau and rho
compute intial values for tau
Description
returns a list of length nbOfTau of initial points for tau using spectral clustering with absolute values, kmeans and random perturbations of these points
Usage
initialTau(Q, dataMatrix, nbOfTau, percentageOfPerturbation, directed)
Arguments
Q |
number of latent blocks in the noisy stochastic block model |
dataMatrix |
observed dense adjacency matrix |
nbOfTau |
number of initializations for the latent block memberships |
percentageOfPerturbation |
percentage of node labels that are perturbed to obtain further initial points |
directed |
booelan to indicate whether the model is directed or undirected |
Value
a list of length nbOfTau of initial points for tau
returns a list of all possible node pairs (i,j)
Description
returns a list of all possible node pairs (i,j)
Usage
listNodePairs(n, directed = FALSE)
Arguments
n |
number of nodes |
directed |
indicates if the graph is directed |
Value
a 2-column matrix with all possible node pairs (i,j)
compute conditional l-values in the noisy stochastic block model
Description
compute conditional l-values in the noisy stochastic block model
Usage
lvaluesNSBM(dataVec, Z, theta, directed = FALSE, modelFamily = "Gauss")
Arguments
dataVec |
data vector |
Z |
a node clustering |
theta |
list of parameters for a noisy stochastic block model |
directed |
indicates if the graph is directed |
modelFamily |
probability distribution for the edges. Possible values:
|
Value
conditional l-values in the noisy stochastic block model
main function of VEM algorithm with fixed number of SBM blocks
Description
main function of VEM algorithm with fixed number of SBM blocks
Usage
mainVEM_Q(init, modelFamily, model, data, directed)
Arguments
init |
list of initial points for the algorithm |
modelFamily |
probability distribution for the edges. Possible values:
|
model |
Implemented models:
|
data |
data vector in the undirected model, data matrix in the directed model |
directed |
booelan to indicate whether the model is directed or undirected |
Value
list of estimated model parameters and a node clustering; like the output of fitNSBM()
main function of VEM algorithm for fixed number of latent blocks in parallel computing
Description
runs the VEM algorithm the provided initial point
Usage
mainVEM_Q_par(s, ListOfTauRho, modelFamily, model, data, directed)
Arguments
s |
indice of initial point in ListOfTauRho to be used for this run |
ListOfTauRho |
a list of initial points |
modelFamily |
probability distribution for the edges. Possible values:
|
model |
Implemented models:
|
data |
data vector in the undirected model, data matrix in the directed model |
directed |
booelan to indicate whether the model is directed or undirected |
Value
list of estimated model parameters and a node clustering; like the output of fitNSBM()
evaluate the density in the current model
Description
evaluate the density in the current model
Usage
modelDensity(x, nu, modelFamily = "Gauss")
Arguments
x |
vector with points where to evaluate the density |
nu |
distribution parameter |
modelFamily |
probability distribution for the edges. Possible values:
|
plot the data matrix, the inferred graph and/or the true binary graph
Description
plot the data matrix, the inferred graph and/or the true binary graph
Usage
plotGraphs(dataMatrix = NULL, inferredGraph = NULL, binaryTruth = NULL)
Arguments
dataMatrix |
observed data matrix |
inferredGraph |
graph inferred by the multiple testing procedure via graphInference() |
binaryTruth |
true binary graph |
Value
a list of FDR and TDR values, if possible
plot ICL curve
Description
plot ICL curve
Usage
plotICL(res)
Arguments
res |
output of fitNSBM() |
Value
figure of ICL curve
Examples
# res_gauss is the output of a call of fitNSBM()
plotICL(res_gauss)
auxiliary function for the computation of q-values
Description
auxiliary function for the computation of q-values
Usage
q_delta_ql(theta, ind, t, modelFamily = "Gauss")
Arguments
theta |
list of parameters for a noisy stochastic block model |
ind |
indicator for a pair of latent blocks |
t |
l-values |
modelFamily |
probability distribution for the edges. Possible values:
|
compute q-values in the noisy stochastic block model
Description
compute q-values in the noisy stochastic block model
Usage
qvaluesNSBM(
dataVec,
Z,
theta,
lvalues,
modelFamily = "Gauss",
directed = FALSE
)
Arguments
dataVec |
data vector |
Z |
a node clustering |
theta |
list of parameters for a noisy stochastic block model |
lvalues |
conditional l-values in the noisy stochastic block model |
modelFamily |
probability distribution for the edges. Possible values:
|
directed |
indicates if the graph is directed |
Value
q-values in the noisy stochastic block model
Output of fitNSBM() on a dataset applied in the exponential NSBM
Description
Parameter estimates fitted on a dataset given in the vignette
Usage
res_exp
Format
List with estimation results for different number of SBM blocks. Output of fitNSBM()
Output of fitNSBM() on a dataset applied in the Gamma NSBM
Description
Parameter estimates fitted on a dataset given in the vignette
Usage
res_gamma
Format
List with estimation results for different number of SBM blocks. Output of fitNSBM()
Output of fitNSBM() on a dataset applied in the Gaussian NSBM
Description
Parameter estimates fitted on a dataset given in the vignette
Usage
res_gauss
Format
List with estimation results for different number of SBM blocks. Output of fitNSBM()
simulation of a graph according the noisy stochastic block model
Description
simulation of a graph according the noisy stochastic block model
Usage
rnsbm(n, theta, modelFamily = "Gauss", directed = FALSE)
Arguments
n |
number of nodes |
theta |
model parameters of the noisy stochastic block model
|
modelFamily |
probability distribution for the edges. Possible values:
|
directed |
indicates if the graph is directed (boolean) |
Value
a list with:
- dataMatrix
simulated matrix from the noisy stochastic block model
- theta
model parameters of the noisy stochastic block model
- latentZ
underlying latent node memberships
- latentAdj
underlying latent binary graph
Examples
n <- 10
Q <- 2
theta <- list(pi= rep(1/Q,Q), nu0=c(0,1))
theta$nu <- matrix(c(-2,10,-2, 1,1,1),nrow=Q*(Q+1)/2,ncol=2)
theta$w <- c(.5, .9, .3)
obs <- rnsbm(n, theta, modelFamily='Gauss')
obs
spectral clustering with absolute values
Description
performs absolute spectral clustering of an adjacency matrix
Usage
spectralClustering(A, K)
Arguments
A |
adjacency matrix |
K |
number of desired clusters |
Value
a vector containing a node clustering into K groups
Create new initial values by merging pairs of groups of provided tau
Description
Create nbOfMerges new initial values by merging nbOfMerges (or all possible) pairs of groups of provided tau
Usage
tauDown(tau, nbOfMerges)
Arguments
tau |
soft node clustering |
nbOfMerges |
number of required merges of blocks |
Value
a list of length nbOfMerges (at most) of initial points for tau
Create new values of tau by splitting groups of provided tau
Description
Create nbOfSplits (or all) new values of tau by splitting nbOfSplits (or all) groups of provided tau
Usage
tauUp(tau, nbOfSplits = 1)
Arguments
tau |
soft node clustering |
nbOfSplits |
number of required splits of blocks |
Value
a list of length nbOfSplits (at most) of initial points for tau
Compute one iteration to solve the fixed point equation in the VE-step
Description
Compute one iteration to solve the fixed point equation in the VE-step
Usage
tauUpdate(tau, log.w, log.1mw, data, VE, mstep, modelFamily, directed)
Arguments
tau |
current value of tau |
log.w |
value of log(w) |
log.1mw |
value of log(1-w) |
data |
data vector in the undirected model, data matrix in the directed model |
VE |
list with variational parameters tau and rho |
mstep |
list with current model parameters and additional auxiliary terms |
modelFamily |
probability distribution for the edges. Possible values:
|
directed |
booelan to indicate whether the model is directed or undirected |
Value
updated value of tau
Perform one iteration of the Newton-Raphson to compute the MLE of the parameters of the Gamma distribution
Description
Perform one iteration of the Newton-Raphson to compute the MLE of the parameters of the Gamma distribution
Usage
update_newton_gamma(param, L, M)
Arguments
param |
current parameters of the Gamma distribution |
L |
weighted mean of log(data) |
M |
weighted mean of the data |
Value
updated parameters of the Gamma distribution