Title: | Single Cell Analysis and Differential Expression |
Version: | 1.0.12 |
Description: | Analyzing and interactively exploring large-scale single-cell RNA-seq datasets. 'pagoda2' primarily performs normalization and differential gene expression analysis, with an interactive application for exploring single-cell RNA-seq datasets. It performs basic tasks such as cell size normalization, gene variance normalization, and can be used to identify subpopulations and run differential expression within individual samples. 'pagoda2' was written to rapidly process modern large-scale scRNAseq datasets of approximately 1e6 cells. The companion web application allows users to explore which gene expression patterns form the different subpopulations within your data. The package also serves as the primary method for preprocessing data for conos, https://github.com/kharchenkolab/conos. This package interacts with data available through the 'p2data' package, which is available in a 'drat' repository. To access this data package, see the instructions at https://github.com/kharchenkolab/pagoda2. The size of the 'p2data' package is approximately 6 MB. |
License: | GPL-3 |
Copyright: | See the file COPYRIGHTS for various pagoda2 copyright details |
Encoding: | UTF-8 |
Depends: | R (≥ 3.5.0), Matrix, igraph |
Imports: | dendsort, drat, fastcluster, graphics, grDevices, irlba, magrittr, MASS, mgcv, methods, N2R, parallel, plyr, R.utils, Rcpp, rjson, rlang, R6, RMTstat, Rook, Rtsne, sccore (≥ 0.1.1), stats, urltools, utils |
RoxygenNote: | 7.1.1 |
Suggests: | AnnotationDbi, base64enc, BiocGenerics, BiocParallel, colorRamps, data.table, dbscan, dplyr, ggplot2, GO.db, gridExtra, KernSmooth, knitr, org.Dr.eg.db, org.Hs.eg.db, org.Mm.eg.db, pcaMethods, pheatmap, rgl, rmarkdown, robustbase, scde, testthat, uwot |
URL: | https://github.com/kharchenkolab/pagoda2 |
BugReports: | https://github.com/kharchenkolab/pagoda2/issues |
NeedsCompilation: | yes |
LinkingTo: | Rcpp, RcppArmadillo, RcppProgress, RcppEigen |
Author: | Nikolas Barkas [aut], Viktor Petukhov [aut], Peter Kharchenko [aut], Simon Steiger [ctb], Rasmus Rydbirk [ctb], Evan Biederstedt [cre, aut] |
Maintainer: | Evan Biederstedt <evan.biederstedt@gmail.com> |
Packaged: | 2024-02-26 20:22:22 UTC; evanbiederstedt |
Repository: | CRAN |
Date/Publication: | 2024-02-27 00:50:02 UTC |
Correct unloading of the library
Description
Correct unloading of the library
Usage
.onUnload(libpath)
Arguments
libpath |
library path |
Return the mode of a vector
Description
Return the mode of a vector
Usage
Mode(x)
Arguments
x |
the vector to return the mode of |
Value
the mode elements
Pagoda2 R6 class
Description
The class encompasses gene count matrices, providing methods for normalization, calculating embeddings, and differential expression.
Public fields
counts
Gene count matrix, normalized on total counts (default=NULL)
modelType
string Model used to normalize count matrices. Only supported values are 'raw', 'plain', and 'linearObs'. – 'plain': Normalize by regressing out on the non-zero observations of each gene (default). – 'raw': Use the raw count matrices, without normalization. The expression matrix taken "as is" without normalization, although log.scale still applies. – 'linearObs': Fit a linear model of pooled counts across all genes against depth. This approach isn't recommened, as the depth dependency is not completely normalized out.
clusters
Results of clustering (default=list())
graphs
Graph representations of the dataset (default=list())
reductions
Results of reductions, e.g. PCA (default=list())
embeddings
Results of visualization algorithms, t-SNE or largeVis (default=list())
diffgenes
Lists of differentially expressed genes (default=list())
n.cores
number of cores (default=1)
misc
list with additional info (default=list())
batch
Batch factor for the dataset (default=NULL)
genegraphs
Slot to store graphical representations in gene space (i.e. gene kNN graphs) (default=list())
depth
Number of molecules measured per cell (default=NULL)
Methods
Public methods
Method new()
Initialize Pagoda2 class
Usage
Pagoda2$new( x, modelType = "plain", n.cores = parallel::detectCores(logical = FALSE), verbose = TRUE, min.cells.per.gene = 0, trim = round(min.cells.per.gene/2), min.transcripts.per.cell = 10, batch = NULL, lib.sizes = NULL, log.scale = TRUE, keep.genes = NULL )
Arguments
x
input count matrix
modelType
Model used to normalize count matrices (default='plain'). Only supported values are 'raw', 'plain', and 'linearObs'.
n.cores
numeric Number of cores to use (default=1)
verbose
boolean Whether to give verbose output (default=TRUE)
min.cells.per.gene
integer Minimum number of cells per gene, used to subset counts for coverage (default=0)
trim
numeric Parameter used for winsorizing count data (default=round(min.cells.per.gene/2)). If value>0, will winsorize counts in normalized space in the hopes of getting a more stable depth estimates. If value<=0, ignored.
min.transcripts.per.cell
integer Minimum number of transcripts per cells, used to subset counts for coverage (default=10)
batch
fctor Batch factor for the dataset (default=NULL)
lib.sizes
character vector of library sizes (default=NULL)
log.scale
boolean If TRUE, scale counts by log() (default=TRUE)
keep.genes
list of genes to keep in count matrix after filtering out by coverage but before normalization (default=NULL)
Returns
new Pagoda2 object
Examples
\donttest{ ## Load pre-generated a dataset of 50 bone marrow cells as matrix cm <- readRDS(system.file("extdata", "sample_BM1_50.rds", package="pagoda2")) ## Perform QC, i.e. filter any cells that counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=500) rownames(counts) <- make.unique(rownames(counts)) ## Generate Pagoda2 object p2_object <- Pagoda2$new(counts, log.scale=TRUE, min.cells.per.gene=10, n.cores=1) }
Method setCountMatrix()
Provide the initial count matrix, and estimate deviance residual matrix (correcting for depth and batch)
Usage
Pagoda2$setCountMatrix( countMatrix, depthScale = 1000, min.cells.per.gene = 0, trim = round(min.cells.per.gene/2), min.transcripts.per.cell = 10, lib.sizes = NULL, log.scale = FALSE, keep.genes = NULL, verbose = TRUE )
Arguments
countMatrix
input count matrix
depthScale
numeric Scaling factor for normalizing counts (defaul=1e3). If 'plain', counts are scaled by counts = counts/as.numeric(depth/depthScale).
min.cells.per.gene
integer Minimum number of cells per gene, used to subset counts for coverage (default=0)
trim
numeric Parameter used for winsorizing count data (default=round(min.cells.per.gene/2)). If value>0, will winsorize counts in normalized space in the hopes of getting a more stable depth estimates. If value<=0, ignored.
min.transcripts.per.cell
integer Minimum number of transcripts per cells, used to subset counts for coverage (default=10)
lib.sizes
character vector of library sizes (default=NULL)
log.scale
boolean If TRUE, scale counts by log() (default=TRUE)
keep.genes
list of genes to keep in count matrix after filtering out by coverage but before normalization (default=NULL)
verbose
boolean Whether to give verbose output (default=TRUE)
Returns
normalized count matrix (or if modelTye='raw', the unnormalized count matrix)
Method adjustVariance()
Adjust variance of the residual matrix, determine overdispersed sites This is done to normalize the extent to which genes with (very) different expression magnitudes will contribute to the downstream anlaysis.
Usage
Pagoda2$adjustVariance( gam.k = 5, alpha = 0.05, plot = FALSE, use.raw.variance = FALSE, use.unadjusted.pvals = FALSE, do.par = TRUE, max.adjusted.variance = 1000, min.adjusted.variance = 0.001, cells = NULL, verbose = TRUE, min.gene.cells = 0, persist = is.null(cells), n.cores = self$n.cores )
Arguments
gam.k
integer The k used for the generalized additive model 'v ~ s(m, k =gam.k)' (default=5). If gam.k<2, linear regression is used 'lm(v ~ m)'.
alpha
numeric The Type I error probability or the significance level (default=5e-2). This is the criterion used to measure statistical significance, i.e. if the p-value < alpha, then it is statistically significant.
plot
boolean Whether to output the plot (default=FALSE)
use.raw.variance
(default=FALSE). If modelType='raw', then this conditional will be used as TRUE.
use.unadjusted.pvals
boolean Whether to use Benjamini-Hochberg adjusted p-values (default=FALSE).
do.par
boolean Whether to put multiple graphs into a signle plot with par() (default=TRUE)
max.adjusted.variance
numeric Maximum adjusted variance (default=1e3). The gene scale factor is defined as sqrt(pmax(min.adjusted.variance,pmin(max.adjusted.variance,df$qv))/exp(df$v))
min.adjusted.variance
numeric Minimum adjusted variance (default=1e-3). The gene scale factor is defined as sqrt(pmax(min.adjusted.variance,pmin(max.adjusted.variance,df$qv))/exp(df$v))
cells
character vector Subset of cells upon which to perform variance normalization with adjustVariance() (default=NULL)
verbose
boolean Whether to give verbose output (default=TRUE)
min.gene.cells
integer Minimum number of genes per cells (default=0). This parameter is used to filter counts.
persist
boolean Whether to save results (default=TRUE, i.e. is.null(cells)).
n.cores
numeric Number of cores to use (default=1)
Returns
residual matrix with adjusted variance
Examples
\donttest{ ## Load pre-generated a dataset of 50 bone marrow cells as matrix cm <- readRDS(system.file("extdata", "sample_BM1_50.rds", package="pagoda2")) ## Perform QC, i.e. filter any cells that counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=500) rownames(counts) <- make.unique(rownames(counts)) ## Generate Pagoda2 object p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1) ## Normalize gene expression variance p2_object$adjustVariance(plot=TRUE, gam.k=10) }
Method makeKnnGraph()
Create k-nearest neighbor graph
Usage
Pagoda2$makeKnnGraph( k = 30, nrand = 1000, type = "counts", weight.type = "1m", odgenes = NULL, n.cores = self$n.cores, distance = "cosine", center = TRUE, x = NULL, p = NULL, var.scale = (type == "counts"), verbose = TRUE )
Arguments
k
integer Number of k clusters for k-NN (default=30)
nrand
numeric Number of randomizations i.e. the gene sets (of the same size) to be evaluated in parallel with each gene set (default=1e3)
type
string Data type of the reduction (default='counts'). If type='counts', this will access the raw counts. Otherwise, 'type' must be name of the reductions.
weight.type
string 'cauchy', 'normal', 'constant', '1m' (default='1m')
odgenes
character vector Overdispersed genes to retrieve (default=NULL)
n.cores
numeric Number of cores to use (default=1)
distance
string Distance metric used: 'cosine', 'L2', 'L1', 'cauchy', 'euclidean' (default='cosine')
center
boolean Whether to use centering when distance='cosine' (default=TRUE). The parameter is ignored otherwise.
x
counts or reduction to use (default=NULL). If NULL, uses counts. Otherwise, checks for the reduction in self$reductions[[type]]
p
(default=NULL)
var.scale
boolean Apply scaling if using raw counts (default=TRUE). If type="counts", var.scale is TRUE by default.
verbose
boolean Whether to give verbose output (default=TRUE)
Returns
kNN graph, stored in self$graphs
Examples
\donttest{ ## Load pre-generated a dataset of 50 bone marrow cells as matrix cm <- readRDS(system.file("extdata", "sample_BM1_50.rds", package="pagoda2")) ## Perform QC, i.e. filter any cells that counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=300) rownames(counts) <- make.unique(rownames(counts)) ## Generate Pagoda2 object p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1) ## Normalize gene expression variance p2_object$adjustVariance(plot=TRUE, gam.k=10) ## Generate a kNN graph of cells that will allow us to identify clusters of cells p2_object$makeKnnGraph(k=20, center=FALSE, distance='L2') }
Method getKnnClusters()
Calculate clusters based on the kNN graph
Usage
Pagoda2$getKnnClusters( type = "counts", method = igraph::multilevel.community, name = "community", n.cores = self$n.cores, g = NULL, min.cluster.size = 1, persist = TRUE, ... )
Arguments
type
string Data type (default='counts'). Currently only 'counts' supported.
method
Method to use (default=igraph::multilevel.community). Accepted methods are either 'igraph::infomap.community' or 'igraph::multilevel.community'. If NULL, if the number of vertices of the graph is greater than or equal to 2000, 'igraph::multilevel.community' will be used. Otherwise, 'igraph::infomap.community' will be used.
name
string Name of the community structure calculated from 'method' (default='community')
n.cores
numeric Number of cores to use (default=1)
g
Input graph (default=NULL). If NULL, access graph from self$graphs[[type]].
min.cluster.size
Minimum size of clusters (default=1). This parameter is primarily used to remove very small clusters.
persist
boolean Whether to save the clusters and community structure (default=TRUE)
...
Additional parameters to pass to 'method'
Returns
the community structure calculated from 'method'
Method geneKnnbyPCA()
Deprecated function. Use makeGeneKnnGraph() instead.
Usage
Pagoda2$geneKnnbyPCA()
Method getHierarchicalDiffExpressionAspects()
Take a given clustering and generate a hierarchical clustering
Usage
Pagoda2$getHierarchicalDiffExpressionAspects( type = "counts", groups = NULL, clusterName = NULL, method = "ward.D", dist = "pearson", persist = TRUE, z.threshold = 2, n.cores = self$n.cores, min.set.size = 5, verbose = TRUE )
Arguments
type
string Data type of the reduction (default='counts'). If type='counts', this will access the raw counts. Otherwise, 'type' must be name of the reductions.
groups
factor named with cell names specifying the clusters of cells to be compared (one against all) (default=NULL). To compare two cell clusters against each other, simply pass a factor containing only two levels.
clusterName
string Cluster name to access (default=NULL)
method
string The agglomeration method to be used in stats::hcust(method=method) (default='ward.D'). Accepted values are: "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). For more information, see stats::hclust().
dist
string 'pearson', 'spearman', 'euclidean', 'L2', 'JS' (default='pearson')
persist
boolean Whether to save the clusters and community structure (default=TRUE)
z.threshold
numeric Threshold of z-scores to filter, >=z.threshold are kept (default=2)
n.cores
numeric Number of cores to use (default=1)
min.set.size
integer Minimum threshold of sets to keep (default=5)
verbose
boolean Whether to give verbose output (default=TRUE)
Returns
hierarchical clustering
Method makeGeneKnnGraph()
Calculates gene Knn network for gene similarity
Usage
Pagoda2$makeGeneKnnGraph( nPcs = 100, center = TRUE, fastpath = TRUE, maxit = 1000, k = 30, n.cores = self$n.cores, verbose = TRUE )
Arguments
nPcs
integer Number of principal components (default=100). This is the parameter 'nv' in irlba::irlba(), the number of right singular vectors to estimate.
center
boolean Whether to center the PCA (default=TRUE)
fastpath
boolean Whether to try a (fast) C algorithm implementation if possible (default=TRUE). This parameter is equivalent to 'fastpath' in irlba::irlba().
maxit
integer Maximum number of iterations (default=1000). This parameter is equivalent to 'maxit' in irlba::irlba().
k
integer Number of k clusters for calculating k-NN on the resulting principal components (default=30).
n.cores
numeric Number of cores to use (default=1)
verbose
boolean Whether to give verbose output (default=TRUE)
Returns
graph with gene similarity
Method getDensityClusters()
Calculate density-based clusters
Usage
Pagoda2$getDensityClusters( type = "counts", embeddingType = NULL, name = "density", eps = 0.5, v = 0.7, s = 1, verbose = TRUE, ... )
Arguments
type
string Data type (default='counts'). Currently only 'counts' supported.
embeddingType
The type of embedding used when calculating with ‘getEmbedding()' (default=NULL). Accepted values are: ’largeVis', 'tSNE', 'FR', 'UMAP', 'UMAP_graph'
name
string Name fo the clustering (default='density').
eps
numeric value of the eps parameter, fed into dbscan::dbscan(x=emb, eps=eps, ...)
v
numeric The “value” to be used to complete the HSV color descriptions (default=0.7). Equivalent to the 'v' parameter in grDevices::rainbow().
s
numeric The “saturation” to be used to complete the HSV color descriptions (default=1). Equivalent to the 's' parameter in grDevices::rainbow().
verbose
boolean Whether to give verbose output (default=TRUE)
...
Additional parameters passed to dbscan::dbscan(emb, ...)
Returns
density-based clusters
Method getDifferentialGenes()
Determine differentially expressed genes, comparing each group against all others using Wilcoxon rank sum test
Usage
Pagoda2$getDifferentialGenes( type = "counts", clusterType = NULL, groups = NULL, name = "customClustering", z.threshold = 3, upregulated.only = FALSE, verbose = FALSE, append.specificity.metrics = TRUE, append.auc = FALSE )
Arguments
type
string Data type (default='counts'). Currently only 'counts' supported.
clusterType
Optional cluster type to use as a group-defining factor (default=NULL)
groups
factor named with cell names specifying the clusters of cells to be compared (one against all) (default=NULL). To compare two cell clusters against each other, simply pass a factor containing only two levels.
name
string Slot to store the results in (default='customClustering')
z.threshold
numeric Minimal absolute Z score (adjusted) to report (default=3)
upregulated.only
boolean Whether to report only genes that are expressed significantly higher in each group (default=FALSE)
verbose
boolean Whether to give verbose output (default=FALSE)
append.specificity.metrics
boolean Whether to append specifity metrics (default=TRUE). Uses the function sccore::appendSpecificityMetricsToDE().
append.auc
boolean If TRUE, append AUC values (default=FALSE). Parameter ignored if append.specificity.metrics is FALSE.
Returns
List with each element of the list corresponding to a cell group in the provided/used factor (i.e. factor levels) Each element of a list is a data frame listing the differentially epxressed genes (row names), with the following columns: Z - adjusted Z score, with positive values indicating higher expression in a given group compare to the rest M - log2 fold change highest- a boolean flag indicating whether the expression of a given gene in a given vcell group was on average higher than in every other cell group fe - fraction of cells in a given group having non-zero expression level of a given gene
Method plotDiffGeneHeatmap()
Plot heatmap of DE results
Usage
Pagoda2$plotDiffGeneHeatmap( type = "counts", clusterType = NULL, groups = NULL, n.genes = 100, z.score = 2, gradient.range.quantile = 0.95, inner.clustering = FALSE, gradientPalette = NULL, v = 0.8, s = 1, box = TRUE, drawGroupNames = FALSE, ... )
Arguments
type
string Data type (default='counts'). Currently only 'counts' supported.
clusterType
Optional cluster type to use as a group-defining factor (default=NULL)
groups
factor named with cell names specifying the clusters of cells to be compared (one against all) (default=NULL). To compare two cell clusters against each other, simply pass a factor containing only two levels.
n.genes
integer Number of genes to plot (default=100)
z.score
numeric Threshold of z-scores to filter (default=2). Only greater than or equal to this value are kept.
gradient.range.quantile
numeric Trimming quantile (default=0.95)
inner.clustering
boolean Whether to cluster cells within each cluster (default=FALSE)
gradientPalette
palette of colors to use (default=NULL). If NULL, uses 'colorRampPalette(c('gray90','red'), space = "Lab")(1024)'
v
numeric The “value” to be used to complete the HSV color descriptions (default=0.7). Equivalent to the 'v' parameter in grDevices::rainbow().
s
numeric The “saturation” to be used to complete the HSV color descriptions (default=1). Equivalent to the 's' parameter in grDevices::rainbow().
box
boolean Whether to draw a box around the current plot in the given color and linetype (default=TRUE)
drawGroupNames
boolean Whether to draw group names (default=FALSE)
...
Additional parameters passed to internal function used for heatmap plotting, my.heatmap2()
Returns
heatmap of DE results
Method getRefinedLibSizes()
Recalculate library sizes using robust regression within clusters
Usage
Pagoda2$getRefinedLibSizes( clusterType = NULL, groups = NULL, type = "counts", n.cores = self$n.cores )
Arguments
clusterType
Name of cluster to access (default=NULL). If NULL, takes the most recently generated clustering. Parameter ignored if groups is not NULL.
groups
factor named with cell names specifying the clusters of cells (default=NULL)
type
string Either 'counts' or the name of a stored embedding, names(self$embeddings) (default=NULL)
n.cores
numeric Number of cores to use (default=self$n.cores=1)
Returns
recalculated library sizes
Method plotGeneHeatmap()
Plot heatmap for a given set of genes
Usage
Pagoda2$plotGeneHeatmap( genes, type = "counts", clusterType = NULL, groups = NULL, gradient.range.quantile = 0.95, cluster.genes = FALSE, inner.clustering = FALSE, gradientPalette = NULL, v = 0.8, s = 1, box = TRUE, drawGroupNames = FALSE, useRaster = TRUE, smooth.span = max(1, round(nrow(self$counts)/1024)), ... )
Arguments
genes
character vector Gene names
type
string Data type (default='counts'). Currently only 'counts' supported.
clusterType
Optional cluster type to use as a group-defining factor (default=NULL)
groups
factor named with cell names specifying the clusters of cells to be compared (one against all) (default=NULL). To compare two cell clusters against each other, simply pass a factor containing only two levels.
gradient.range.quantile
numeric Trimming quantile (default=0.95)
cluster.genes
boolean Whether to cluster genes within each cluster using stats::hclust() (default=FALSE)
inner.clustering
boolean Whether to cluster cells within each cluster (default=FALSE)
gradientPalette
palette of colors to use (default=NULL). If NULL, uses 'colorRampPalette(c('gray90','red'), space = "Lab")(1024)'
v
numeric The “value” to be used to complete the HSV color descriptions (default=0.7). Equivalent to the 'v' parameter in grDevices::rainbow().
s
numeric The “saturation” to be used to complete the HSV color descriptions (default=1). Equivalent to the 's' parameter in grDevices::rainbow().
box
boolean Whether to draw a box around the current plot in the given color and linetype (default=TRUE)
drawGroupNames
boolean Whether to draw group names (default=FALSE)
useRaster
boolean If TRUE a bitmap raster is used to plot the image instead of polygons (default=TRUE). The grid must be regular in that case, otherwise an error is raised. For more information, see graphics::image().
smooth.span
(default=max(1,round(nrow(self$counts)/1024)))
...
Additional parameters passed to internal function used for heatmap plotting, my.heatmap2()
Returns
plot of gene heatmap
Method plotEmbedding()
Show embedding
Usage
Pagoda2$plotEmbedding( type = NULL, embeddingType = NULL, clusterType = NULL, groups = NULL, colors = NULL, gene = NULL, plot.theme = ggplot2::theme_bw(), ... )
Arguments
type
string Either 'counts' or the name of a stored embedding, names(self$embeddings) (default=NULL)
embeddingType
string Embedding type (default=NULL). If NULL, takes the most recently generated embedding.
clusterType
Name of cluster to access (default=NULL). If NULL, takes the most recently generated clustering. Parameter ignored if groups is not NULL.
groups
factor named with cell names specifying the clusters of cells (default=NULL)
colors
character vector List of gene names (default=NULL)
gene
(default=NULL)
plot.theme
(default=ggplot2::theme_bw())
...
Additional parameters passed to sccore::embeddingPlot()
Returns
plot of the embedding
Method getOdGenes()
Get overdispersed genes
Usage
Pagoda2$getOdGenes( n.odgenes = NULL, alpha = 0.05, use.unadjusted.pvals = FALSE )
Arguments
n.odgenes
integer Number of overdispersed genes to retrieve (default=NULL). If NULL, will return all.
alpha
numeric The Type I error probability or the significance level (default=5e-2). This is the criterion used to measure statistical significance, i.e. if the p-value < alpha, then it is statistically significant.
use.unadjusted.pvals
boolean Whether to use Benjamini-Hochberg adjusted p-values (default=FALSE).
Returns
vector of overdispersed genes
Method getNormalizedExpressionMatrix()
Return variance-normalized matrix for specified genes or a number of OD genes
Usage
Pagoda2$getNormalizedExpressionMatrix(genes = NULL, n.odgenes = NULL)
Arguments
genes
vector of gene names to explicitly return (default=NULL)
n.odgenes
integer Number of overdispersed genes to retrieve (default=NULL). If NULL, will return all.
Returns
cell by gene matrix
Method calculatePcaReduction()
Calculate PCA reduction of the data
Usage
Pagoda2$calculatePcaReduction( nPcs = 20, type = "counts", name = "PCA", use.odgenes = TRUE, n.odgenes = NULL, odgenes = NULL, center = TRUE, cells = NULL, fastpath = TRUE, maxit = 100, verbose = TRUE, var.scale = (type == "counts"), ... )
Arguments
nPcs
numeric Number of principal components (PCs) (default=20)
type
string Dataset view to reduce (counts by default, but can specify a name of an existing reduction) (default='counts')
name
string Name for the PCA reduction to be created (default='PCA')
use.odgenes
boolean Whether pre-calculated set of overdispersed genes should be used (default=TRUE)
n.odgenes
integer Number of overdispersed genes to retrieve (default=NULL). If NULL, will return all.
odgenes
Explicitly specify a set of overdispersed genes to use for the reduction (default=NULL)
center
boolean Whether data should be centered prior to PCA (default=TRUE)
cells
optional subset of cells on which PCA should be run (default=NULL)
fastpath
boolean Use C implementation for speedup (default=TRUE)
maxit
numeric Maximum number of iterations (default=100). For more information, see 'maxit' parameter in irlba::irlba().
verbose
boolean Whether to give verbose output (default=TRUE)
var.scale
boolean Apply scaling if using raw counts (default=TRUE). If type="counts", var.scale is TRUE by default.
...
additional arguments forwarded to irlba::irlba
Returns
Invisible PCA result (the reduction itself is saved in self$reductions[[name]])"
Method expandOdGenes()
Reset overdispersed genes 'odgenes' to be a superset of the standard odgene selection (guided by n.odgenes or alpha), and a set of recursively determined odgenes based on a given group (or a cluster info)
Usage
Pagoda2$expandOdGenes( type = "counts", clusterType = NULL, groups = NULL, min.group.size = 30, od.alpha = 0.1, use.odgenes = FALSE, n.odgenes = NULL, odgenes = NULL, n.odgene.multiplier = 1, gam.k = 10, verbose = FALSE, n.cores = self$n.cores, min.odgenes = 10, max.odgenes = Inf, recursive = TRUE )
Arguments
type
string Data type (default='counts'). Currently only 'counts' supported.
clusterType
Optional cluster type to use as a group-defining factor (default=NULL)
groups
factor named with cell names specifying the clusters of cells to be compared (one against all) (default=NULL). To compare two cell clusters against each other, simply pass a factor containing only two levels.
min.group.size
integer Number of minimum cells for filtering out group size (default=30)
od.alpha
numeric The Type I error probability or the significance level for calculating overdispersed genes (default=1e-1). This is the criterion used to measure statistical significance, i.e. if the p-value < alpha, then it is statistically significant.
use.odgenes
boolean Whether pre-calculated set of overdispersed genes should be used (default=FALSE)
n.odgenes
integer Number of overdispersed genes to retrieve (default=NULL). If NULL, will return all.
odgenes
Explicitly specify a set of overdispersed genes to use for the reduction (default=NULL) #' @param odgenes (default=NULL)
n.odgene.multiplier
numeric (default=1)
gam.k
integer The k used for the generalized additive model 'v ~ s(m, k =gam.k)' (default=10). If gam.k<2, linear regression is used 'lm(v ~ m)'.
verbose
boolean Whether to give verbose output (default=TRUE)
n.cores
numeric Number of cores to use (default=1)
min.odgenes
integer Minimum number of overdispersed genes to use (default=10)
max.odgenes
integer Maximum number of overdispersed genes to use (default=Inf)
recursive
boolean Whether to determine groups for which variance normalization will be rerun (default=TRUE)
Returns
List of overdispersed genes
Method localPcaKnn()
local PCA implementation
Usage
Pagoda2$localPcaKnn( nPcs = 5, type = "counts", clusterType = NULL, groups = NULL, k = 30, b = 1, a = 1, min.group.size = 30, name = "localPCA", od.alpha = 0.1, n.odgenes = NULL, gam.k = 10, verbose = FALSE, n.cores = self$n.cores, min.odgenes = 5, take.top.odgenes = FALSE, recursive = TRUE, euclidean = FALSE, perplexity = k, return.pca = FALSE, skip.pca = FALSE )
Arguments
nPcs
integer Number of principal components (default=5)
type
string Data type (default='counts'). Currently only 'counts' supported.
clusterType
Optional cluster type to use as a group-defining factor (default=NULL)
groups
factor named with cell names specifying the clusters of cells to be compared (one against all) (default=NULL). To compare two cell clusters against each other, simply pass a factor containing only two levels.
k
integer Number of components for kNN graph (default=30)
b
numeric Constant within exp(-b*(ncid/cldsd)^2), used for calculating cell relevance per cluster (default=1)
a
numeric Constant within "(1-exp(-a*(dsq)/(p$pcs$trsd^2)))*(pk /outerproduct pk)" (default=1)
min.group.size
integer Number of minimum cells for filtering out group size (default=30)
name
string Title (default='localPCA')
od.alpha
numeric Significance level for calculating overdispersed genes (default=1e-1). P-values will be filtered by <log(od.alpha).
n.odgenes
integer Number of overdispersed genes to retrieve (default=NULL). If NULL, will return all.
gam.k
integer The k used for the generalized additive model 'v ~ s(m, k =gam.k)' (default=10). If gam.k<2, linear regression is used 'lm(v ~ m)'.
verbose
boolean Whether to give verbose output (default=TRUE)
n.cores
numeric Number of cores to use (default=1)
min.odgenes
integer Minimum number of overdispersed genes to use (default=5)
take.top.odgenes
boolean Take top overdispersed genes in decreasing order (default=FALSE)
recursive
boolean Whether to recursively determine groups for which variance normalization will be rerun (default=FALSE)
euclidean
boolean Whether to applied euclidean-based distance similarity during variance normalization (default=FALSE)
perplexity
integer Perplexity parameter within Rtsne::Rtsne() (default=k). Please see Rtsne for more details.
return.pca
boolean Whether to return the PCs (default=FALSE)
skip.pca
boolean If TRUE and return.pca=TRUE, will return a list of scale factors, cells, and overdispersed genes, i.e. list(sf=sf, cells=cells, odgenes=odgenes) (default=FALSE). Otherwise, ignored.
Returns
localPcaKnn return here
Method testPathwayOverdispersion()
Test pathway overdispersion Note: this is a compressed version of the PAGODA1 approach in SCDE <https://hms-dbmi.github.io/scde/>
Usage
Pagoda2$testPathwayOverdispersion( setenv, type = "counts", max.pathway.size = 1000, min.pathway.size = 10, n.randomizations = 5, verbose = FALSE, n.cores = self$n.cores, score.alpha = 0.05, plot = FALSE, cells = NULL, adjusted.pvalues = TRUE, z.score = qnorm(0.05/2, lower.tail = FALSE), use.oe.scale = FALSE, return.table = FALSE, name = "pathwayPCA", correlation.distance.threshold = 0.2, loading.distance.threshold = 0.01, top.aspects = Inf, recalculate.pca = FALSE, save.pca = TRUE )
Arguments
setenv
Specific environment for pathway analysis
type
string Data type (default='counts'). Currently only 'counts' supported.
max.pathway.size
integer Maximum number of observed genes in a valid gene set (default=1e3)
min.pathway.size
integer Minimum number of observed genes that should be contained in a valid gene set (default=10)
n.randomizations
numeric Number of random gene sets (of the same size) to be evaluated in parallel with each gene set (default=5). (This can be kept at 5 or 10, but should be increased to 50-100 if the significance of pathway overdispersion will be determined relative to random gene set models.)
verbose
boolean Whether to give verbose output (default=TRUE)
n.cores
numeric Number of cores to use (default=1)
score.alpha
numeric Significance level of the confidence interval for determining upper/lower bounds (default=0.05)
plot
boolean Whether to output the plot (default=FALSE)
cells
character vector Specific cells to investigate (default=NULL)
adjusted.pvalues
boolean Whether to use adjusted p-values (default=TRUE)
z.score
numeric Z-score to be used as a cutoff for statistically significant patterns (default=qnorm(0.05/2, lower.tail = FALSE))
use.oe.scale
boolean Whether the variance of the returned aspect patterns should be normalized using observed/expected value instead of the default chi-squared derived variance corresponding to overdispersion Z-score (default=FALSE)
return.table
boolean Whether to return a text table with results (default=FALSE)
name
string Title (default='pathwayPCA')
correlation.distance.threshold
numeric Similarity threshold for grouping interdependent aspects in pagoda.reduce.redundancy() (default=0.2)
loading.distance.threshold
numeric Similarity threshold for grouping interdependent aspects in pagoda.reduce.loading.redundancy() (default=0.2)
top.aspects
Restrict output to the top N aspects of heterogeneity (default=Inf)
recalculate.pca
boolean Whether to recalculate PCA (default=FALSE)
save.pca
boolean Whether to save the PCA results (default=TRUE). If TRUE, caches them in self$misc[['pwpca']].
Returns
pathway output
Method getEmbedding()
Return embedding
Usage
Pagoda2$getEmbedding( type = "counts", embeddingType = "largeVis", name = NULL, dims = 2, M = 1, gamma = 1/M, perplexity = 50, verbose = TRUE, sgd_batches = NULL, diffusion.steps = 0, diffusion.power = 0.5, distance = "pearson", n.cores = self$n.cores, n.sgd.cores = n.cores, ... )
Arguments
type
string Data type (default='counts'). Currently only 'counts' supported.
embeddingType
string Type of embedding to construct (default='largeVis'). Possible values are: 'largeVis', 'tSNE', 'FR' (Fruchterman–Reingold), 'UMAP', 'UMAP_graph'
name
string Name of the embedding (default=NULL). If NULL, the name = embeddingType.
dims
integer Parameter 'dims' Matrix::sparseMatrix(); a non-negative, integer, dimensions vector of length 2 (default=2). See Matrix package documentation for more details.
M
numeric (largeVis) The number of negative edges to sample for each positive edge (default=5). Parameter only used if embeddingType is 'largeVis'.
gamma
numeric (largeVis) The strength of the force pushing non-neighbor nodes apart (default=7). Parameter only used if embeddingType is 'largeVis'.
perplexity
numeric Parameter 'perplexity' within largeVis::buildWijMatrix() (default=50). Please see the largeVis documentation for more details.
verbose
boolean Whether to give verbose output (default=TRUE)
sgd_batches
numeric The number of edges to process during SGD (default=NULL). Passed to projectKNNs(). Defaults to a value set based on the size of the dataset. If the parameter given is between
0
and1
, the default value will be multiplied by the parameter.diffusion.steps
integer Iteration steps to use. If 0, no steps are run. (default=0)
diffusion.power
numeric Factor to be used when calculating diffusion, (default=0.5)
distance
string 'pearson', 'spearman', 'euclidean', 'L2', 'JS' (default='pearson')
n.cores
numeric Number of cores to use (default=1)
n.sgd.cores
numeric Number of cores to use (default=n.cores)
...
Additional parameters passed to embedding functions, Rtsne::Rtsne() if 'L2', uwot::umap() if 'UMAP', embedKnnGraphUmap() if 'UMAP_graph'
Returns
embedding stored in self$embedding
Method clone()
The objects of this class are cloneable with this method.
Usage
Pagoda2$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
Author(s)
Simon Steiger
Examples
## ------------------------------------------------
## Method `Pagoda2$new`
## ------------------------------------------------
## Load pre-generated a dataset of 50 bone marrow cells as matrix
cm <- readRDS(system.file("extdata", "sample_BM1_50.rds", package="pagoda2"))
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=500)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object
p2_object <- Pagoda2$new(counts, log.scale=TRUE, min.cells.per.gene=10, n.cores=1)
## ------------------------------------------------
## Method `Pagoda2$adjustVariance`
## ------------------------------------------------
## Load pre-generated a dataset of 50 bone marrow cells as matrix
cm <- readRDS(system.file("extdata", "sample_BM1_50.rds", package="pagoda2"))
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=500)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object
p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1)
## Normalize gene expression variance
p2_object$adjustVariance(plot=TRUE, gam.k=10)
## ------------------------------------------------
## Method `Pagoda2$makeKnnGraph`
## ------------------------------------------------
## Load pre-generated a dataset of 50 bone marrow cells as matrix
cm <- readRDS(system.file("extdata", "sample_BM1_50.rds", package="pagoda2"))
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=300)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object
p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1)
## Normalize gene expression variance
p2_object$adjustVariance(plot=TRUE, gam.k=10)
## Generate a kNN graph of cells that will allow us to identify clusters of cells
p2_object$makeKnnGraph(k=20, center=FALSE, distance='L2')
Quick utility to check if given character vector is colors Thanks to Stackoverflow: http://stackoverflow.com/questions/13289009/check-if-character-string-is-a-valid-color-representation
Description
Quick utility to check if given character vector is colors Thanks to Stackoverflow: http://stackoverflow.com/questions/13289009/check-if-character-string-is-a-valid-color-representation
Usage
areColors(x)
Arguments
x |
character vector to check |
Value
boolean whether character vector is colors
armaCor - matrix column correlations. Allows faster matrix correlations with armadillo. Similar to cor() call, will calculate correlation between matrix columns
Description
armaCor - matrix column correlations. Allows faster matrix correlations with armadillo. Similar to cor() call, will calculate correlation between matrix columns
Usage
armaCor(mat)
Arguments
mat |
matrix |
Value
matrix with columns as correlations
Perform basic 'pagoda2' processing, i.e. adjust variance, calculate pca reduction, make knn graph, identify clusters with multilevel, and generate largeVis and tSNE embeddings.
Description
Perform basic 'pagoda2' processing, i.e. adjust variance, calculate pca reduction, make knn graph, identify clusters with multilevel, and generate largeVis and tSNE embeddings.
Usage
basicP2proc(
cd,
n.cores = 1,
n.odgenes = 3000,
nPcs = 100,
k = 30,
perplexity = 50,
log.scale = TRUE,
trim = 10,
keep.genes = NULL,
min.cells.per.gene = 0,
min.transcripts.per.cell = 100,
get.largevis = TRUE,
get.tsne = TRUE,
make.geneknn = TRUE
)
Arguments
cd |
count matrix whereby rows are genes, columns are cells. |
n.cores |
numeric Number of cores to use (default=1) |
n.odgenes |
numeric Number of top overdispersed genes to use (dfault=3e3) |
nPcs |
numeric Number of PCs to use (default=100) |
k |
numeric Default number of neighbors to use in kNN graph (default=30) |
perplexity |
numeric Perplexity to use in generating tSNE and largeVis embeddings (default=50) |
log.scale |
boolean Whether to use log scale normalization (default=TRUE) |
trim |
numeric Number of cells to trim in winsorization (default=10) |
keep.genes |
optional set of genes to keep from being filtered out (even at low counts) (default=NULL) |
min.cells.per.gene |
numeric Minimal number of cells required for gene to be kept (unless listed in keep.genes) (default=0) |
min.transcripts.per.cell |
numeric Minimumal number of molecules/reads for a cell to be admitted (default=100) |
get.largevis |
boolean Whether to caluclate largeVis embedding (default=TRUE) |
get.tsne |
boolean Whether to calculate tSNE embedding (default=TRUE) |
make.geneknn |
boolean Whether pre-calculate gene kNN (for gene search) (default=TRUE) |
Value
a new 'Pagoda2' object
Generate a 'pagoda2' web application from a 'Pagoda2' object
Description
Generate a 'pagoda2' web application from a 'Pagoda2' object
Usage
basicP2web(p2, app.title = "Pagoda2", extraWebMetadata = NULL, n.cores = 4)
Arguments
p2 |
a 'Pagoda2' object |
app.title |
name of application as displayed in the browser title (default='Pagoda2') |
extraWebMetadata |
additional metadata generated by p2.metadata.from.fractor (default=NULL) |
n.cores |
numeric Number of cores to use for differential expression calculation (default=4) |
Value
a 'pagoda2' web object
Rescale the weights in an edge matrix to match a given perplexity. From 'largeVis', <https://github.com/elbamos/largeVis>
Description
Rescale the weights in an edge matrix to match a given perplexity. From 'largeVis', <https://github.com/elbamos/largeVis>
Usage
buildWijMatrix(x, threads = NULL, perplexity = 50)
Arguments
x |
An edgematrix, either an 'edgematrix' object or a sparse matrix. |
threads |
numeric The maximum number of threads to spawn (default=NULL). Determined automatically if NULL (default=NULL) |
perplexity |
numeric Given perplexity (default=50) |
Value
A list
with the following components:
- 'dist'
An [N,K] matrix of the distances to the nearest neighbors.
- 'id'
An [N,K] matrix of the node indexes of the neartest neighbors. Note that this matrix is 1-indexed, unlike most other matrices in this package.
- 'k'
The number of nearest neighbors.
Returns a list vector with the number of cells that are present in more than one selections in the provided p2 selection object
Description
Returns a list vector with the number of cells that are present in more than one selections in the provided p2 selection object
Usage
calcMulticlassified(sel)
Arguments
sel |
a pagoda2 selection as genereated by readPagoda2SelectionFile |
Value
list vector with the number of cells that are present in more than one selections in the provided p2 selection object
Get the number of cells in each selection group
Description
Get the number of cells in each selection group
Usage
cellsPerSelectionGroup(selection)
Arguments
selection |
a pagoda2 selection list |
Value
a named vector of cell numbers in each groups
Translate cell cluster dendrogram to an array, one row per node with 1/0 cluster membership
Description
Translate cell cluster dendrogram to an array, one row per node with 1/0 cluster membership
Usage
cldend2array(d, cells = NULL)
Arguments
d |
cell cluster dendrogram |
cells |
character vector of cells (default=NULL). If NULL, determine the total order of cells with unlist(dendrapply(d, attr, 'cells')) |
Value
array with one row per node with 1/0 cluster membership
Collapse aspect patterns into clusters
Description
Collapse aspect patterns into clusters
Usage
collapse.aspect.clusters(d, dw, ct, scale = TRUE, pick.top = FALSE)
Arguments
d |
matrix of normalized aspect patterns (rows: significant aspects, columns: cells), normally the output $xv in 'tamr', the combined pathways that show similar expression patterns |
dw |
corresponding weight matrix to parameter 'd' |
ct |
clusters, the output of fastcluster::hclust() |
scale |
boolean Whether to scale aspects (default=TRUE) |
pick.top |
boolean Whether to pick top aspects (default=FALSE) |
Value
list of clusters from matrix of normalized aspect patterns and clusters from the corresponding weight matrix
Compare two different clusterings provided as factors by plotting a normalised heatmap
Description
Compare two different clusterings provided as factors by plotting a normalised heatmap
Usage
compareClusterings(cl1, cl2, filename = NA)
Arguments
cl1 |
clustering 1, a named factor |
cl2 |
clustering 2, a named factor |
filename |
an optional filename to save the plot instead of displaying it, will be passed to pheatmap (default=NA) |
Value
invisible summary table that gets plotted
Perform differential expression on a p2 object given a set of web selections and two groups to compare
Description
Perform differential expression on a p2 object given a set of web selections and two groups to compare
Usage
diffExprOnP2FromWebSelection(p2, sel, group1name, group2name)
Arguments
p2 |
a pagoda2 object |
sel |
a web selection read with readPagoda2SelectionFile() |
group1name |
the name of the first selection |
group2name |
the names of the second selection |
Perform differential expression on a p2 object given a set of web selections and one group to compare against everything else
Description
Perform differential expression on a p2 object given a set of web selections and one group to compare against everything else
Usage
diffExprOnP2FromWebSelectionOneGroup(p2, sel, groupname)
Arguments
p2 |
a pagoda2 object |
sel |
a web selection read with readPagoda2SelectionFile() |
groupname |
name of group to compare to everything else |
Perform extended 'Pagoda2' processing. Generate organism specific GO environment and calculate pathway overdispersion.
Description
Perform extended 'Pagoda2' processing. Generate organism specific GO environment and calculate pathway overdispersion.
Usage
extendedP2proc(p2, organism = "hs")
Arguments
p2 |
the 'Pagoda2' object |
organism |
character Organisms hs (Homo Sapiens), mm (M. Musculus, mouse) or dr (D. Rerio, zebrafish) (default='hs') |
Value
list of a 'Pagoda2' object and go.env
Returns a factor of cell membership from a p2 selection object the factor only includes cells present in the selection. If the selection contains multiclassified cells an error is raised
Description
Returns a factor of cell membership from a p2 selection object the factor only includes cells present in the selection. If the selection contains multiclassified cells an error is raised
Usage
factorFromP2Selection(sel, use.internal.name = FALSE, flatten = FALSE)
Arguments
sel |
a pagoda2 selection as genereated by readPagoda2SelectionFile |
use.internal.name |
boolean Whether to use field 'internal.name' as factor names (default=FALSE) |
flatten |
boolean Whether to ignore multiclassified cells, overwriting randomly (default=FALSE) |
Value
factor of cell membership from a p2 selection object. The factor only includes cells present in the selection.
Converts a list of factors into 'pagoda2' metadata optionally filtering down to the cells present in the provided 'pagoda2' app.
Description
Converts a list of factors into 'pagoda2' metadata optionally filtering down to the cells present in the provided 'pagoda2' app.
Usage
factorListToMetadata(factor.list, p2 = NULL)
Arguments
factor.list |
list of factors named by the cell identifier |
p2 |
'pagoda2' app to filter the factors by, optional (default=NULL) |
Value
'pagoda2' web metadata object
Converts a names factor to a p2 selection object if colors are provided it assigns those, otherwise uses a rainbow palette
Description
Converts a names factor to a p2 selection object if colors are provided it assigns those, otherwise uses a rainbow palette
Usage
factorToP2selection(cl, col = NULL)
Arguments
cl |
factor |
col |
names vector of colors (default=NULL) |
Value
a p2 selection object (list)
Filter cells based on gene/molecule dependency
Description
Filter cells based on gene/molecule dependency
Usage
gene.vs.molecule.cell.filter(
countMatrix,
min.cell.size = 500,
max.cell.size = 50000,
p.level = min(0.001, 1/ncol(countMatrix)),
alpha = 0.1,
plot = TRUE,
do.par = TRUE
)
Arguments
countMatrix |
input count matrix to be filtered |
min.cell.size |
numeric Min allowed cell size (default=500) |
max.cell.size |
numeric Max allowed cell size (default=5e4) |
p.level |
numeric Statistical confidence level for deviation from the main trend, used for cell filtering (default=min(1e-3,1/ncol(countMatrix))) |
alpha |
numeric Shading of the confidence band (default=0.1) |
plot |
boolean Plot the molecule distribution and the gene/molecule dependency fit (default=TRUE) |
do.par |
boolean Reset graphical parameters prior to plotting (default=TRUE) |
Value
a filtered matrix
Given a cell clustering (partitioning) and a set of user provided selections generate a cleaned up annotation of cluster groups that can be used for classification
Description
Given a cell clustering (partitioning) and a set of user provided selections generate a cleaned up annotation of cluster groups that can be used for classification
Usage
generateClassificationAnnotation(clustering, selections)
Arguments
clustering |
a factor that provides the clustering |
selections |
a p2 selection object that provided by the web interfact user |
Value
a named factor that can be used for classification
Get a control geneset for cell scoring using the method described in Puram, Bernstein (Cell, 2018)
Description
Get a control geneset for cell scoring using the method described in Puram, Bernstein (Cell, 2018)
Usage
get.control.geneset(data, signature, n.bins = 25, n.genes.per.bin = 100)
Arguments
data |
matrix of expression, rows are cell, columns are genes |
signature |
character vector The signature to evaluate, a character vector of genes |
n.bins |
numeric Number of bins to put the genes in (default=25) |
n.genes.per.bin |
numeric Number of genes to get from each bin (default=100) |
Value
a character vector that can be used as a background signature
Generate differential expression genesets for the web app given a cell grouping by calculating DE sets between each cell set and everything else
Description
Generate differential expression genesets for the web app given a cell grouping by calculating DE sets between each cell set and everything else
Usage
get.de.geneset(pagObj, groups, prefix = "de_")
Arguments
pagObj |
pagoda object |
groups |
named factor to do the de by |
prefix |
chararcter Prefix to assign to genesets generated (default="de_") |
Value
a 'pagoda2' web object
Returns all the cells that are in the designated selections. Given a pagoda2 selections object and the names of some selections in it returns the names of the cells that are in these selections removed any duplicates
Description
Returns all the cells that are in the designated selections. Given a pagoda2 selections object and the names of some selections in it returns the names of the cells that are in these selections removed any duplicates
Usage
getCellsInSelections(p2selections, selectionNames)
Arguments
p2selections |
a p2 selections object |
selectionNames |
the names of some selections in th p2 object |
Value
a character vector of cell names
Assign names to the clusters, given a clustering vector and a set of selections. This function will use a set of pagoda2 cell seletcion to identify the clusters in a a named factor. It is meant to be used to import user defined annotations that are defined as selections into a more formal categorization of cells that are defined by cluster. To help with this the function allows a percent of cells to have been classified in the selections into multiple groups, something which may be the result of the users making wrong selections. The percent of cells allows to be multiselected in any given group is defined by multiClassCutoff. Furthermore the method will assign each cluster to a selection only if the most popular cluster to the next most popular exceed the ambiguous.ratio in terms of cell numbers. If a cluster does not satisfy this condtiion it is not assigned.
Description
Assign names to the clusters, given a clustering vector and a set of selections. This function will use a set of pagoda2 cell seletcion to identify the clusters in a a named factor. It is meant to be used to import user defined annotations that are defined as selections into a more formal categorization of cells that are defined by cluster. To help with this the function allows a percent of cells to have been classified in the selections into multiple groups, something which may be the result of the users making wrong selections. The percent of cells allows to be multiselected in any given group is defined by multiClassCutoff. Furthermore the method will assign each cluster to a selection only if the most popular cluster to the next most popular exceed the ambiguous.ratio in terms of cell numbers. If a cluster does not satisfy this condtiion it is not assigned.
Usage
getClusterLabelsFromSelection(
clustering,
selections,
multiClassCutoff = 0.3,
ambiguous.ratio = 0.5
)
Arguments
clustering |
a named factor of clusters, where every entry is a cell |
selections |
a pagoda2 selection object |
multiClassCutoff |
numeric Percent of cells in any one cluster that can be multiassigned (default=0.3) |
ambiguous.ratio |
numeric Ratio of first and second cell numbers for any cluster to produce a valid clustering (default=0.5) |
Value
a data.frame with two columns, one for cluster and one for selections, each cluster appears only once
Retrieves the colors of each selection from a p2 selection object as a names vector of strings
Description
Retrieves the colors of each selection from a p2 selection object as a names vector of strings
Usage
getColorsFromP2Selection(sel)
Arguments
sel |
pagoda2 selection object |
Value
a named vector of hex colours
Get a mapping form internal to external names for the specified selection object
Description
Get a mapping form internal to external names for the specified selection object
Usage
getIntExtNamesP2Selection(x)
Arguments
x |
p2 selection object |
Value
list of names from the specified selection object
Converts the output of hierarchical differential expression aspects into genesets that can be loaded into a 'pagoda2' web app to retrive the genes that make the geneset interactively
Description
Converts the output of hierarchical differential expression aspects into genesets that can be loaded into a 'pagoda2' web app to retrive the genes that make the geneset interactively
Usage
hierDiffToGenesets(output)
Arguments
output |
output of getHierarchicalDiffExpressionAspects |
Value
a geneset that can be loaded into p2 web genesets
Generate a Rook Server app from a 'Pagoda2' object. This generates a 'pagoda2' web object from a 'Pagoda2' object by automating steps that most users will want to run. This function is a wrapper about the 'pagoda2' web constructor. (Advanced users may wish to use that constructor directly.)
Description
Generate a Rook Server app from a 'Pagoda2' object. This generates a 'pagoda2' web object from a 'Pagoda2' object by automating steps that most users will want to run. This function is a wrapper about the 'pagoda2' web constructor. (Advanced users may wish to use that constructor directly.)
Usage
make.p2.app(
r,
dendrogramCellGroups,
additionalMetadata = list(),
geneSets,
show.depth = TRUE,
show.batch = TRUE,
show.clusters = TRUE,
appname = "Pagoda2 Application",
innerOrder = NULL,
orderDend = FALSE,
appmetadata = NULL
)
Arguments
r |
a 'Pagoda2' object |
dendrogramCellGroups |
a named factor of cell groups, used to generate the main dendrogram, limits zoom in |
additionalMetadata |
a list of metadata other than depth, batch and cluster that are automatically added (default=list()) |
geneSets |
a list of genesets to show |
show.depth |
boolean Include depth as a metadata row (default=TRUE) |
show.batch |
boolean Include batch as a metadata row (default=TRUE) |
show.clusters |
boolean Include clusters as a metadata row (default=TRUE) |
appname |
character Application name (default="Pagoda2 Application") |
innerOrder |
Ordering of cells inside the clusters provided in dendrogramCellGroups (default=NULL). This should be one of "odPCA", "reductdist", "graphbased", "knn". Defaults to NULL |
orderDend |
boolean Whether to order dendrogram (default=FALSE) |
appmetadata |
a 'pagoda2' web application metadata (default=NULL) |
Value
a 'pagoda2' web object that presents a Rook compatible interface
Scale the designated values between the range of 0 and 1
Description
Scale the designated values between the range of 0 and 1
Usage
minMaxScale(x)
Arguments
x |
values to scale |
Value
the scaled values
Examples
example_matrix = matrix(rep(c(1:5), 3), 5)
minMaxScale(example_matrix)
Get a vector of the names of an object named by the names themselves. This is useful with lapply when passing names of objects as it ensures that the output list is also named.
Description
Get a vector of the names of an object named by the names themselves. This is useful with lapply when passing names of objects as it ensures that the output list is also named.
Usage
namedNames(g)
Arguments
g |
an objects on which we can call names() |
Value
vector with names of object
Generate a GO environment for human for overdispersion analysis for the the back end
Description
Generate a GO environment for human for overdispersion analysis for the the back end
Usage
p2.generate.dr.go(r)
Arguments
r |
a 'Pagoda2' object |
Value
a GO environment object
Generates zebrafish (Danio rerio) GO annotation for the web object
Description
Generates zebrafish (Danio rerio) GO annotation for the web object
Usage
p2.generate.dr.go.web(gene.names, n.cores = 1)
Arguments
gene.names |
a character vector of genes to include |
n.cores |
numeric Number of cores to use (default=1) |
Generate a GO environment for the organism specified
Description
Generate a GO environment for the organism specified
Usage
p2.generate.go(
r,
organism = NULL,
go2all.egs = NULL,
eg.alias2eg = NULL,
min.env.length = 5
)
Arguments
r |
a 'Pagoda2' object |
organism |
the organism (default=NULL). Currently 'hs' (human), 'mm' (mouse) and 'dr' (zebrafish) are supported. |
go2all.egs |
mappings between a given GO identifier and all of the Entrez Gene identifiers annotated at that GO term or to one of its child nodes in the GO ontology (default=NULL) |
eg.alias2eg |
mappings between common gene symbol identifiers and entrez gene identifiers (default=NULL) |
min.env.length |
numeric Minimum environment length (default=5) |
Generates GO annotation for the web object for any species
Description
Generates GO annotation for the web object for any species
Usage
p2.generate.go.web(
gene.names,
egALIAS2EG = NULL,
egGO2ALLEGS = NULL,
n.cores = 1
)
Arguments
gene.names |
a character vector of genes to include |
egALIAS2EG |
(default=NULL) |
egGO2ALLEGS |
(default=NULL) |
n.cores |
numeric Number of cores to use (default=1) |
Generates GO annotation for the web object from the GO environment used for enrichment analysis
Description
Generates GO annotation for the web object from the GO environment used for enrichment analysis
Usage
p2.generate.go.web.fromGOEnv(go.env)
Arguments
go.env |
GO enviroment generated with p2.generate.go |
Generate a GO environment for human for overdispersion analysis for the the back end
Description
Generate a GO environment for human for overdispersion analysis for the the back end
Usage
p2.generate.human.go(r)
Arguments
r |
a 'Pagoda2' object |
Value
a GO environment object
Generates human GO annotation for the web object
Description
Generates human GO annotation for the web object
Usage
p2.generate.human.go.web(gene.names, n.cores = 1)
Arguments
gene.names |
a character vector of genes to include |
n.cores |
numeric Number of cores to use (default=1) |
Generate a GO environment for mouse for overdispersion analysis for the the back end
Description
Generate a GO environment for mouse for overdispersion analysis for the the back end
Usage
p2.generate.mouse.go(r)
Arguments
r |
a 'Pagoda2' object |
Value
a GO environment object
Generates mouse (Mus musculus) GO annotation for the web object
Description
Generates mouse (Mus musculus) GO annotation for the web object
Usage
p2.generate.mouse.go.web(gene.names, n.cores = 1)
Arguments
gene.names |
a character vector of genes to include |
n.cores |
numeric Number of cores to use (default=1) |
Create 'PAGODA1' web application from a 'Pagoda2' object 'PAGODA1' found here, with 'SCDE': <https://www.bioconductor.org/packages/release/bioc/html/scde.html>
Description
Create 'PAGODA1' web application from a 'Pagoda2' object 'PAGODA1' found here, with 'SCDE': <https://www.bioconductor.org/packages/release/bioc/html/scde.html>
Usage
p2.make.pagoda1.app(
p2,
col.cols = NULL,
row.clustering = NULL,
title = "pathway clustering",
zlim = NULL,
embedding = NULL,
inner.clustering = TRUE,
groups = NULL,
clusterType = NULL,
embeddingType = NULL,
veloinfo = NULL,
type = "PCA",
min.group.size = 1,
batch.colors = NULL,
n.cores = 10
)
Arguments
p2 |
'Pagoda2' object |
col.cols |
Matrix of column colors (default=NULL). Useful for visualizing cell annotations such as batch labels. |
row.clustering |
Row dendrogram (default=NULL) |
title |
character Title to use (default="pathway clustering") |
zlim |
Range of the normalized gene expression levels (default=NULL). Input as a list: c(lower_bound, upper_bound). Values outside this range will be Winsorized. Useful for increasing the contrast of the heatmap visualizations. If NULL, set to the 5th and 95th percentiles. |
embedding |
A 2-D embedding of the cells (PCA, tSNE, etc.), passed as a data frame with two columns (two dimensions) and rows corresponding to cells (row names have to match cell names) (default=NULL). |
inner.clustering |
boolean Whether to get overall cell clustering (default=TRUE). |
groups |
factor describing grouping of different cells. If provided, the cross-fits and the expected expression magnitudes will be determined separately within each group. The factor should have the same length as ncol(counts) (default=NULL). |
clusterType |
cluster type (default=NULL). If NULL, takes the latest cluster in the 'Pagoda2' object using 'p2$clusters[[type]][[1]]' |
embeddingType |
embedding type (default=NULL). If NULL, takes the latest embedding in the 'Pagoda2' object using p2$embeddings[[type]][[1]] |
veloinfo |
cell velocity information, cell velocities (grid and cell) (default=NULL) |
type |
character Either 'counts' or a name of a 'reduction' in the 'Pagoda2' object (default='PCA') |
min.group.size |
integer Minimum group size (default=1) |
batch.colors |
colors of the batches, i.e. the factor (corresponding to rows of the model matrix) specifying batch assignment of each cell(default=NULL) |
n.cores |
numeric Number of cores (default=10) |
Value
'PAGODA1' web application
Generate a list metadata structure that can be passed to a 'pagoda2' web object constructor as additional metadata given a named factor
Description
Generate a list metadata structure that can be passed to a 'pagoda2' web object constructor as additional metadata given a named factor
Usage
p2.metadata.from.factor(
metadata,
displayname = NULL,
s = 1,
v = 1,
start = 0,
end = NULL,
pal = NULL
)
Arguments
metadata |
named factor with metadata for individual cells, names must correspond to cells |
displayname |
character Name to display for the metadata (default=NULL) |
s |
numeric Value for rainbow palette (default=1) |
v |
numeric Value for rainbow palette (default=1) |
start |
numeric Starting value (default=0) |
end |
numeric Ending value (default=NULL) |
pal |
optional vector of colours to use, if provided overrides s,v,start and end parameters (default=NULL) |
Value
list of data, levels, palette to be passed to 'pagoda2' web object constructor
Generate a 'pagoda2' web object from a 'Pagoda2' object using hierarchical differential expression
Description
Generate a 'pagoda2' web object from a 'Pagoda2' object using hierarchical differential expression
Usage
p2.toweb.hdea(p2, title = "")
Arguments
p2 |
p2 object |
title |
character Name of the pagoda object (default="") |
Value
a 'pagoda2' web object
p2ViewPagodaApp R6 class
Description
Modified 'PAGODA1' app (from 'SCDE') for browsing 'pagoda2' results. Refer to 'ViewPagodaAppOld' and 'make.pagoda.app()' in 'SCDE'
Public fields
results
Result object returned by
scde.expression.difference()
(default=NULL). Note to browse group posterior levels, usereturn.posteriors = TRUE
in thescde.expression.difference()
call.type
Either 'counts' or a name of a 'reduction' in the 'Pagoda2' object
genes
List of genes to display in the Detailed clustering panel (default=list())
batch
Any batch or other known confounders to be included in the visualization as a column color track (default=NULL)
pathways
character vector Pathway or gene names (default=NULL)
name
App name (needs to be altered only if adding more than one app to the server using the 'server' parameter) (default=NULL)
trim
Trim quantity used for Winsorization for visualization
embedding
Embedding information (default=NULL)
veloinfo
Velocity information (default=NULL)
goenv
environment mapping pathways to genes (default=NULL)
renv
Global environment (default=NULL)
Methods
Public methods
Method new()
Initialize p2ViewPagodaApp class
Usage
p2ViewPagodaApp$new( results, pathways, genes, goenv, batch = NULL, name = "pathway overdispersion", trim = 1.1/nrow(p2$counts), embedding = NULL, type, veloinfo = NULL )
Arguments
results
Result object returned by
scde.expression.difference()
. Note to browse group posterior levels, usereturn.posteriors = TRUE
in thescde.expression.difference()
call.pathways
character vector Pathway or gene names (default=NULL)
genes
list Genes to display in the Detailed clustering panel (default=list())
goenv
Environment mapping pathways to genes (default=NULL)
batch
Any batch or other known confounders to be included in the visualization as a column color track (default=NULL)
name
string App name (needs to be altered only if adding more than one app to the server using the 'server' parameter) (default="pathway overdispersion")
trim
numeric Trim quantity used for Winsorization for visualization (default=1.1/nrow(p2$counts) whereby the 'counts' from the 'Pagoda2' object is the gene count matrix, normalized on total counts (default=NULL)
embedding
Embedding information (default=NULL)
type
Either 'counts' or a name of a 'reduction' in the 'pagoda2' object
veloinfo
Velocity information (default=NULL)
Returns
new 'p2ViewPagodaApp' object
Method getgenecldata()
Helper function to get the heatmap data for a given set of genes
Usage
p2ViewPagodaApp$getgenecldata(genes = NULL, gcl = NULL, ltrim = 0)
Arguments
genes
character vector Gene names (default=NULL)
gcl
pathway or gene-weighted PCA (default=NULL). If NULL, uses tp2c.view.pathways(self$genes, self$results$p2, goenv=goenv, vhc=self$results$hvc, plot=FALSE, trim=ltrim, n.genes=Inf).
ltrim
numeric Winsorization trim that should be applied (default=0)
Returns
heatmap data for a given set of genes
Method call()
Call Rook application. Using client-side ExtJS framework and Inchlib HTML5 canvas libraries to create the graphical user interface for PAGODA
Usage
p2ViewPagodaApp$call(env)
Arguments
env
The environment argument is a true R environment object which the application is free to modify. Please see the Rook documentation for more details.
Returns
modified 'PAGODA1' app
Method clone()
The objects of this class are cloneable with this method.
Usage
p2ViewPagodaApp$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
Collapse aspects driven by the same combinations of genes. (Aspects are some pattern across cells e.g. sequencing depth, or PC corresponding to an undesired process such as ribosomal pathway variation.) Examines PC loading vectors underlying the identified aspects and clusters of aspects based on a product of loading and score correlation (raised to corr.power). Clusters of aspects driven by the same genes are determined based on the parameter "distance.threshold".
Description
Collapse aspects driven by the same combinations of genes. (Aspects are some pattern across cells e.g. sequencing depth, or PC corresponding to an undesired process such as ribosomal pathway variation.) Examines PC loading vectors underlying the identified aspects and clusters of aspects based on a product of loading and score correlation (raised to corr.power). Clusters of aspects driven by the same genes are determined based on the parameter "distance.threshold".
Usage
pagoda.reduce.loading.redundancy(
tam,
pwpca,
clpca = NULL,
plot = FALSE,
cluster.method = "complete",
distance.threshold = 0.01,
corr.power = 4,
abs = TRUE,
n.cores = 1,
...
)
Arguments
tam |
output of pagoda.top.aspects(), i.e. a list structure containing the following items: xv: a matrix of normalized aspect patterns (rows: significant aspects, columns: cells) xvw: corresponding weight matrix gw: set of genes driving the significant aspects df: text table with the significance testing results |
pwpca |
output of pagoda.pathway.wPCA(), i.e. a list of weighted PCA info for each valid gene set |
clpca |
output of pagoda.gene.clusters() (optional) (default=NULL). The output of pagoda.gene.clusters() is a list structure containing the following fields: clusters: alist of genes in each cluster values xf: extreme value distribution fit for the standardized lambda1 of a randomly generated pattern tci: index of a top cluster in each random iteration cl.goc: weighted PCA info for each real gene cluster varm: standardized lambda1 values for each randomly generated matrix cluster clvlm: a linear model describing dependency of the cluster lambda1 on a Tracy-Widom lambda1 expectation |
plot |
boolean Whether to plot the resulting clustering (default=FALSE) |
cluster.method |
string One of the standard clustering methods to be used (default="complete") |
distance.threshold |
numeric Similarity threshold for grouping interdependent aspects (default=0.01) |
corr.power |
numeric Power to which the product of loading and score correlation is raised (default=4) |
abs |
boolean Whether to use absolute correlation (default=TRUE) |
n.cores |
numeric Number of cores to use during processing (default=1) |
... |
additional arguments are passed to the pagoda.view.aspects() method during plotting |
Value
a list structure analogous to that returned by pagoda.top.aspects(), but with addition of a $cnam element containing a list of aspects summarized by each row of the new (reduced) $xv and $xvw
Collapse aspects driven by similar patterns (i.e. separate the same sets of cells) Examines PC loading vectors underlying the identified aspects and clusters aspects based on score correlation. Clusters of aspects driven by the same patterns are determined based on the distance.threshold.
Description
Collapse aspects driven by similar patterns (i.e. separate the same sets of cells) Examines PC loading vectors underlying the identified aspects and clusters aspects based on score correlation. Clusters of aspects driven by the same patterns are determined based on the distance.threshold.
Usage
pagoda.reduce.redundancy(
tamr,
distance.threshold = 0.2,
cluster.method = "complete",
distance = NULL,
weighted.correlation = TRUE,
plot = FALSE,
top = Inf,
trim = 0,
abs = FALSE,
...
)
Arguments
tamr |
Combined pathways that show similar expression patterns, output of pagoda.reduce.loading.redundancy() |
distance.threshold |
numeric Similarity threshold for grouping interdependent aspects (default=0.2) |
cluster.method |
character One of the standard clustering methods to be used (default="complete") |
distance |
distance matrix (default=NULL) |
weighted.correlation |
boolean Whether to use a weighted correlation in determining the similarity of patterns (default=TRUE) |
plot |
boolean Whether to show plot (default=FALSE) |
top |
bololean Restrict output to the top N aspects of heterogeneity (default=Inf, i.e. no restriction) |
trim |
numeric Winsorization trim to use prior to determining the top aspects (default=0) |
abs |
boolean Whether to use absolute correlation (default=FALSE) |
... |
additional arguments are passed to the pagoda.view.aspects() method during plotting |
Value
List structure analogous to that returned by pagoda.top.aspects(), but with addition of a $cnam element containing a list of aspects summarized by each row of the new (reduced) $xv and $xvw
pagoda2WebApp class to create 'pagoda2' web applications via a Rook server
Description
pagoda2WebApp class to create 'pagoda2' web applications via a Rook server
Fields
originalP2object
Input 'Pagoda2' object
name
string Display name for the application
mat
Embedding
cellmetadata
Metadata associated with 'Pagoda2' object
mainDendrogram
Dendrogram from hclust() of all cells in the 'Pagoda2' object
geneSets
Gene sets in the 'Pagoda2' object
rookRoot
Rook server root directory
appmetadata
pagoda2 web application metadata
pagoda2WebApp_arrayToJSON
Description
Serialise an R array to a JSON object
Arguments
arr |
An array (default=NULL) |
Value
Serialised version of the array in JSON, which includes dimension information as separate fields
pagoda2WebApp_availableAspectsJSON
Description
Parse pathways from originalP2object$misc$pathwayOD$xv into JSON
Value
JSON with parsed cell order from mainDendrogram$cellorder
pagoda2WebApp_call
Description
Handle httpd server calls
Arguments
env |
The environment argument is a true R environment object which the application is free to modify. Please see the Rook documentation for more details. |
pagoda2WebApp_cellOrderJSON
Description
Parse mainDendrogram$cellorder into JSON
Value
JSON with parsed cell order from mainDendrogram$cellorder
pagoda2WebApp_cellmetadataJSON
Description
Parse cellmetadata into JSON
Value
JSON with parsed cellmetadata
pagoda2WebApp_geneInformationJSON
Description
Parse originalP2object$misc$varinfo[,c("m","qv")] into JSON
Value
JSON with parsed information from genename, dispersion, mean gene expression
Generate a dendrogram of groups
Description
Generate a dendrogram of groups
Arguments
dendrogramCellGroups |
Cell groups to input into hclust() |
Value
List of hcGroups, cellorder, and cluster.sizes
pagoda2WebApp_generateEmbeddingStructure
Description
Generate information about the embeddings we are exporting
Value
List with embeddings
pagoda2WebApp_generateGeneKnnJSON
Description
Generate a JSON list representation of the gene kNN network
Arguments
graph |
Input graph |
Value
JSON with gene kNN network
pagoda2WebApp_getCompressedEmbedding
Description
Compress the embedding
Arguments
reduc |
reduction |
embed |
embedding |
Value
compressed embedding as JSON
pagoda2WebApp_packCompressFloat64Array
Description
Compress float64 array
Arguments
v |
float64 array |
Value
compressed array
pagoda2WebApp_packCompressInt32Array
Description
Compress int32 array
Arguments
v |
int32 array |
Value
compressed array
pagoda2WebApp_readStaticFile
Description
Read a static file from the filesystem, and put in the response
Arguments
filename |
path to filename |
Value
Content to display or error page
pagoda2WebApp_reducedDendrogramJSON
Description
Parse dendrogram into JSON
Value
JSON with parsed dendrogram
pagoda2WebApp_serializeToStaticFast
Description
Convert serialized file to static file
Arguments
binary.filename |
path to binary file (default=NULL) |
verbose |
boolean Whether to give verbose output (default=FALSE) |
Value
static file written by WriteListToBinary(expL=exportList, outfile=binary.filename, verbose=verbose)
pagoda2WebApp_serverLog
Description
Logging function for console
Arguments
message |
Message to output for the console |
Value
printed message
pagoda2WebApp_sparseMatList
Description
Create simple List from sparse Matrix with Dimnames as JSON
Arguments
matsparse |
Sparse matrix |
Value
List with slots i, p, x
Parallel, optionally verbose lapply. See ?parallel::mclapply for more info.
Description
Parallel, optionally verbose lapply. See ?parallel::mclapply for more info.
Usage
papply(..., n.cores = parallel::detectCores(), mc.preschedule = FALSE)
Arguments
... |
Additional arguments passed to mclapply(), lapply(), or BiocParallel::bplapply() |
n.cores |
Number of cores to use (default=parallel::detectCores()) |
mc.preschedule |
See ?parallel::mclapply (default=FALSE). If TRUE then the computation is first divided to (at most) as many jobs are there are cores and then the jobs are started, each job possibly covering more than one value. If FALSE, then one job is forked for each value of X. The former is better for short computations or large number of values in X, the latter is better for jobs that have high variance of completion time and not too many values of X compared to mc.cores. |
Value
list, as returned by lapply
Calculate correlation distance between PC magnitudes given a number of target dimensions
Description
Calculate correlation distance between PC magnitudes given a number of target dimensions
Usage
pathway.pc.correlation.distance(pcc, xv, n.cores = 1, target.ndf = NULL)
Arguments
pcc |
weighted PC magnitudes e.g. scde::pagoda.pathway.wPCA() gives the weighted PC magnitudes for each gene provided; e.g. scde::pagoda.gene.clusters() gives the weighted PC magnitudes for de novo gene sets identified by clustering on expression |
xv |
a matrix of normalized aspect patterns (rows: significant aspects, columns: cells) |
n.cores |
numeric Number of cores to use (default=1) |
target.ndf |
numeric Target dimensions (default=NULL) |
Value
correlation distance matrix, akin to stats dist
Plot multiclassified cells per selection as a percent barplot
Description
Plot multiclassified cells per selection as a percent barplot
Usage
plotMulticlassified(sel)
Arguments
sel |
pagoda2 selection object |
Value
ggplot2 object
Plot the embedding of a 'Pagoda2' object with the given values
Description
Plot the embedding of a 'Pagoda2' object with the given values
Usage
plotOneWithValues(
p2obj,
values,
title = "",
type = "PCA",
embeddingType = "tSNE"
)
Arguments
p2obj |
the 'Pagoda2' object |
values |
the values to plot, fed into p2obj$plotEmbedding(colors=values) |
title |
character Title for the plot (default="") |
type |
character Type reduction on which the embedding is based on (default="PCA") |
embeddingType |
character Type of embedding to plot (default="tSNE") |
Value
NULL, simply updates p2obj$plotEmbedding()
Get a dataframe and plot summarising overlaps between selection of a pagoda2 selection object ignore self overlaps
Description
Get a dataframe and plot summarising overlaps between selection of a pagoda2 selection object ignore self overlaps
Usage
plotSelectionOverlaps(sel)
Arguments
sel |
a pagoda2 selection object |
Value
a list that contains a ggplot2 object and a datatable with the overlaps data
Project a distance matrix into a lower-dimensional space. (from elbamos/largeVis)
Description
Takes as input a sparse matrix of the edge weights connecting each node to its nearest neighbors, and outputs a matrix of coordinates embedding the inputs in a lower-dimensional space.
Usage
projectKNNs(
wij,
dim = 2,
sgd_batches = NULL,
M = 5,
gamma = 7,
alpha = 1,
rho = 1,
coords = NULL,
useDegree = FALSE,
momentum = NULL,
seed = NULL,
threads = NULL,
verbose = getOption("verbose", TRUE)
)
Arguments
wij |
A symmetric sparse matrix of edge weights, in C-compressed format, as created with the |
dim |
numeric The number of dimensions for the projection space (default=2) |
sgd_batches |
numeric The number of edges to process during SGD (default=NULL). Defaults to a value set based on the size of the dataset. If the parameter given is
between |
M |
numeric (largeVis) The number of negative edges to sample for each positive edge (default=5). |
gamma |
numeric (largeVis) The strength of the force pushing non-neighbor nodes apart (default=7). |
alpha |
numeric (largeVis) The hyperparameter in the distance function (default=1). The default distance function, |
rho |
(largeVis) numeric Initial learning rate (default=1) |
coords |
An initialized coordinate matrix (default=NULL) |
useDegree |
boolean Whether to use vertex degree to determine weights in negative sampling (if TRUE) or the sum of the vertex's edges (if FALSE) (default=FALSE) |
momentum |
If not NULL, SGD with momentum is used, with this multiplier, which must be between 0 and 1 (default=NULL). Note that momentum can drastically speed-up training time, at the cost of additional memory consumed. |
seed |
numeric Random seed to be passed to the C++ functions (default=NULL). Sampled from hardware entropy pool if |
threads |
numeric The maximum number of threads to spawn (default=NULL). Determined automatically if |
verbose |
boolean Verbosity (default=getOption("verbose", TRUE)) |
Details
The algorithm attempts to estimate a dim
-dimensional embedding using stochastic gradient descent and
negative sampling.
The objective function is:
O = \sum_{(i,j)\in E} w_{ij} (\log f(||p(e_{ij} = 1||) + \sum_{k=1}^{M} E_{jk~P_{n}(j)} \gamma \log(1 - f(||p(e_{ij_k} - 1||)))
where f()
is a probabilistic function relating the distance between two points in the low-dimensional projection space,
and the probability that they are nearest neighbors.
The default probabilistic function is 1 / (1 + \alpha \dot ||x||^2)
. If \alpha
is set to zero,
an alternative probabilistic function, 1 / (1 + \exp(x^2))
will be used instead.
Note that the input matrix should be symmetric. If any columns in the matrix are empty, the function will fail.
Value
A dense [N,D] matrix of the coordinates projecting the w_ij matrix into the lower-dimensional space.
Note
If specified, seed
is passed to the C++ and used to initialize the random number generator. This will not, however, be
sufficient to ensure reproducible results, because the initial coordinate matrix is generated using the R
random number generator.
To ensure reproducibility, call set.seed
before calling this function, or pass it a pre-allocated coordinate matrix.
The original paper called for weights in negative sampling to be calculated according to the degree of each vertex, the number of edges connecting to the vertex. The reference implementation, however, uses the sum of the weights of the edges to each vertex. In experiments, the difference was imperceptible with small (MNIST-size) datasets, but the results seems aesthetically preferrable using degree. The default is to use the edge weights, consistent with the reference implementation.
Quick loading of 10X CellRanger count matrices
Description
Quick loading of 10X CellRanger count matrices
Usage
read.10x.matrices(matrixPaths, version = "V3", n.cores = 1, verbose = TRUE)
Arguments
matrixPaths |
a single path to the folder containing matrix.mtx, genes.tsv and barcodes.tsv files, OR a named list of such paths |
version |
string Version of 10x output to read (default='V3'). Must be one of 'V2' or 'V3'. |
n.cores |
numeric Cores to utilize in parallel (default=1) |
verbose |
boolean Whether to output verbose output (default=TRUE) |
Value
a sparse matrix representation of the data (or a list of sparse matrices if a list of paths was passed)
This function reads a matrix generated by the 10x processing pipeline from the specified directory and returns it. It aborts if one of the required files in the specified directory do not exist.
Description
This function reads a matrix generated by the 10x processing pipeline from the specified directory and returns it. It aborts if one of the required files in the specified directory do not exist.
Usage
read10xMatrix(path, version = "V3", transcript.id = "SYMBOL", verbose = TRUE)
Arguments
path |
string Location of 10x output |
version |
string Version of 10x output to read (default='V3'). Must be one of 'V2' or 'V3'. |
transcript.id |
string Transcript identifier to use (default='SYMBOL'). Must be either 'SYMBOL' (e.g. "Sox17") or 'ENSEMBL' (e.g. "ENSMUSG00000025902"). This value is case-sensitive. |
verbose |
boolean Whether to return verbose output |
Value
parsed 10x outputs into a matrix
Read a pagoda2 cell selection file and return it as a factor while removing any mutliclassified cells
Description
Read a pagoda2 cell selection file and return it as a factor while removing any mutliclassified cells
Usage
readPagoda2SelectionAsFactor(filepath, use.internal.name = FALSE)
Arguments
filepath |
name of the selection file |
use.internal.name |
boolean Use field 'internal.name' as factor names (default=FALSE). Passed to factorFromP2Selection |
Value
a name factor with the membership of all the cells that are not multiclassified
Reads a 'pagoda2' web app exported cell selection file exported as a list of list objects that contain the name of the selection, the color (as a hex string) and the identifiers of the individual cells
Description
Reads a 'pagoda2' web app exported cell selection file exported as a list of list objects that contain the name of the selection, the color (as a hex string) and the identifiers of the individual cells
Usage
readPagoda2SelectionFile(filepath)
Arguments
filepath |
the path of the file load |
Remove cells that are present in more than one selection from all the selections they are in
Description
Remove cells that are present in more than one selection from all the selections they are in
Usage
removeSelectionOverlaps(selections)
Arguments
selections |
a pagoda2 selections list |
Value
a new list with the duplicated cells removed
Score cells by getting mean expression of genes in signatures
Description
Score cells by getting mean expression of genes in signatures
Usage
score.cells.nb0(data, signature)
Arguments
data |
matrix |
signature |
the genes in the signature |
Value
cell scores
Score cells after standardising the expression of each gene removing outliers
Description
Score cells after standardising the expression of each gene removing outliers
Usage
score.cells.nb1(data, signature, quantile.cutoff = 0.01)
Arguments
data |
matrix of expression, rows are cell, columns are genes |
signature |
a character vector of genes to use in the signature |
quantile.cutoff |
numeric The quantile extremes to trim before plotting (default=0.0.1) |
Value
The filtered subset of gene signatures
Puram, Bernstein (Cell, 2018) Score cells as described in Puram, Bernstein (Cell, 2018)
Description
Puram, Bernstein (Cell, 2018) Score cells as described in Puram, Bernstein (Cell, 2018)
Usage
score.cells.puram(data, signature, correct = TRUE, show.plot = FALSE, ...)
Arguments
data |
matrix of expression, rows are cell, columns are genes |
signature |
character vector The signature to evaluate, a character vector of genes |
correct |
boolean Perform background correction by getting a semi-random geneset (default=TRUE) |
show.plot |
boolean If corrected values are calculated show plot of corrected vs original scores (default=FALSE) |
... |
options for get.control.geneset() |
Value
a score for each cell
Calculate the default number of batches for a given number of vertices and edges.
The formula used is the one used by the 'largeVis' reference implementation. This is substantially less than the recommendation E * 10000
in the original paper.
Description
Calculate the default number of batches for a given number of vertices and edges.
The formula used is the one used by the 'largeVis' reference implementation. This is substantially less than the recommendation E * 10000
in the original paper.
Usage
sgdBatches(N, E = 150 * N/2)
Arguments
N |
Number of vertices |
E |
Number of edges (default = 150*N/2) |
Value
The recommended number of sgd batches.
Examples
# Observe that increasing K has no effect on processing time
N <- 70000 # MNIST
K <- 10:250
plot(K, sgdBatches(rep(N, length(K)), N * K / 2))
# Observe that processing time scales linarly with N
N <- c(seq(from = 1, to = 10000, by = 100), seq(from = 10000, to = 10000000, by = 1000))
plot(N, sgdBatches(N))
Directly open the 'pagoda2' web application and view the 'p2web' application object from our R session
Description
Directly open the 'pagoda2' web application and view the 'p2web' application object from our R session
Usage
show.app(app, name, port, ip, browse = TRUE, server = NULL)
Arguments
app |
'pagoda2' application object |
name |
character Name of the application to view |
port |
numeric Port number |
ip |
numeric IP address |
browse |
boolean Whether to load the app into an HTML browser (default=TRUE) |
server |
server If NULL, will grab server with get.scde.server(port=port, ip=ip) (derfault=NULL) |
Value
application within browser
Set names equal to values, a stats::setNames wrapper function
Description
Set names equal to values, a stats::setNames wrapper function
Usage
sn(x)
Arguments
x |
an object for which names attribute will be meaningful |
Value
An object with names assigned equal to values
Subset a gene signature to the genes in the given matrix with optional warning if genes are missing
Description
Subset a gene signature to the genes in the given matrix with optional warning if genes are missing
Usage
subsetSignatureToData(data, signature, raise.warning = TRUE)
Arguments
data |
matrix |
signature |
character vector The gene signature from which to subset a character vector of genes |
raise.warning |
boolean Warn if genes are missing (default=TRUE) |
Value
The filtered subset of gene signatures
View pathway or gene-weighted PCA 'Pagoda2' version of the function pagoda.show.pathways() Takes in a list of pathways (or a list of genes), runs weighted PCA, optionally showing the result.
Description
View pathway or gene-weighted PCA 'Pagoda2' version of the function pagoda.show.pathways() Takes in a list of pathways (or a list of genes), runs weighted PCA, optionally showing the result.
Usage
tp2c.view.pathways(
pathways,
p2,
goenv = NULL,
batch = NULL,
n.genes = 20,
two.sided = TRUE,
n.pc = rep(1, length(pathways)),
colcols = NULL,
zlim = NULL,
labRow = NA,
vhc = NULL,
cexCol = 1,
cexRow = 1,
nstarts = 50,
row.order = NULL,
show.Colv = TRUE,
plot = TRUE,
trim = 1.1/nrow(p2$counts),
showPC = TRUE,
...
)
Arguments
pathways |
character vector of pathway or gene names |
p2 |
'Pagoda2' object |
goenv |
environment mapping pathways to genes (default=NULL) |
batch |
factor (corresponding to rows of the model matrix) specifying batch assignment of each cell, to perform batch correction (default=NULL). |
n.genes |
integer Number of genes to show (default=20) |
two.sided |
boolean If TRUE, the set of shown genes should be split among highest and lowest loading (default=TRUE). If FALSE, genes with highest absolute loading should be shown. |
n.pc |
integer vector Number of principal component to show for each listed pathway(default=rep(1, length(pathways))) |
colcols |
column color matrix (default=NULL) |
zlim |
numeric z color limit (default=NULL) |
labRow |
row labels (default=NA) |
vhc |
cell clustering (default=NULL) |
cexCol |
positive numbers, used as cex.axis in for the row or column axis labeling(default=1) |
cexRow |
positive numbers, used as cex.axis in for the row or column axis labeling(default=1) |
nstarts |
integer Number of random starts to use (default=50) |
row.order |
row order (default=NULL). If NULL, uses order from hclust. |
show.Colv |
boolean Whether to show cell dendrogram (default=TRUE) |
plot |
boolean Whether to plot (default=TRUE) |
trim |
numeric Winsorization trim that should be applied (default=1.1/nrow(p2$counts)). Note that p2 is a 'Pagoda2' object. |
showPC |
boolean (default=TRUE) |
... |
parameters to pass to my.heatmap2. Only if plot is TRUE. |
Value
cell scores along the first principal component of shown genes (returned as invisible)
Validates a pagoda2 selection object
Description
Validates a pagoda2 selection object
Usage
validateSelectionsObject(selections)
Arguments
selections |
the pagoda2 selection object to be validated |
Value
a logical value indicating if the object is valid
Internal function to visualize aspects of transcriptional heterogeneity as a heatmap.
Description
Internal function to visualize aspects of transcriptional heterogeneity as a heatmap.
Usage
view.aspects(
mat,
row.clustering = NA,
cell.clustering = NA,
zlim = c(-1, 1) * quantile(mat, p = 0.95),
row.cols = NULL,
col.cols = NULL,
cols = colorRampPalette(c("darkgreen", "white", "darkorange"), space = "Lab")(1024),
show.row.var.colors = TRUE,
top = Inf,
...
)
Arguments
mat |
Numeric matrix |
row.clustering |
Row dendrogram (default=NA) |
cell.clustering |
Column dendrogram (default=NA) |
zlim |
numeric Range of the normalized gene expression levels, inputted as a list: c(lower_bound, upper_bound) (default=c(-1, 1)*quantile(mat, p = 0.95)). Values outside this range will be Winsorized. Useful for increasing the contrast of the heatmap visualizations. Default, set to the 5th and 95th percentiles. |
row.cols |
Matrix of row colors (default=NULL) |
col.cols |
Matrix of column colors (default=NULL). Useful for visualizing cell annotations such as batch labels. |
cols |
Heatmap colors (default=colorRampPalette(c("darkgreen", "white", "darkorange"), space = "Lab")(1024)) |
show.row.var.colors |
boolean Whether to show row variance as a color track (default=TRUE) |
top |
integer Restrict output to the top n aspects of heterogeneity (default=Inf) |
... |
additional arguments for heatmap plotting |
Value
A heatmap
Generate a 'pagoda2' web object
Description
Generate a 'pagoda2' web object
Usage
webP2proc(
p2,
additionalMetadata = NULL,
title = "Pagoda2",
make.go.sets = TRUE,
make.de.sets = TRUE,
go.env = NULL,
make.gene.graph = TRUE,
appmetadata = NULL
)
Arguments
p2 |
a 'Pagoda2' object |
additionalMetadata |
'pagoda2' web metadata object (default=NULL) |
title |
character string Title for the web app (default='Pagoda2') |
make.go.sets |
boolean Whether GO sets should be made (default=TRUE) |
make.de.sets |
boolean Whether differential expression sets should be made (default=TRUE) |
go.env |
the GO environment used for the overdispersion analysis (default=NULL) |
make.gene.graph |
logical specifying if the gene graph should be make, if FALSE the find similar genes functionality will be disabled on the web app |
appmetadata |
'pagoda2' web application metadata (default=NULL) |
Value
a 'pagoda2' web application
Sets the ncol(mat)*trim top outliers in each row to the next lowest value same for the lowest outliers
Description
Sets the ncol(mat)*trim top outliers in each row to the next lowest value same for the lowest outliers
Usage
winsorize.matrix(mat, trim)
Arguments
mat |
Numeric matrix |
trim |
numeric Fraction of outliers (on each side) that should be Winsorized, or (if the value is >= 1) the number of outliers to be trimmed on each side |
Value
Winsorized matrix
Examples
set.seed(0)
mat <- matrix( c(rnorm(5*10,mean=0,sd=1), rnorm(5*10,mean=5,sd=1)), 10, 10) # random matrix
mat[1,1] <- 1000 # make outlier
range(mat) # look at range of values
win.mat <- winsorize.matrix(mat, 0.1)
range(win.mat) # note outliers removed
Writes a list of genes as a gene selection that can be loaded in the web interface
Description
Writes a list of genes as a gene selection that can be loaded in the web interface
Usage
writeGenesAsPagoda2Selection(name, genes, filename)
Arguments
name |
the name of the selection |
genes |
a string vector of the gene names |
filename |
the filename to save to |
Value
NULL, writes to filepath the list of genes as a gene selection that can be loaded in the web interface
Writes a pagoda2 selection object as a p2 selection file that be be loaded to the web interface
Description
Writes a pagoda2 selection object as a p2 selection file that be be loaded to the web interface
Usage
writePagoda2SelectionFile(sel, filepath)
Arguments
sel |
pagoda2 selection object |
filepath |
name of file to which to write |
Value
NULL, writes to filepath the pagoda2 selection object as a p2 selection file that be be loaded to the web interface