Title: Haplotype-Aware CNV Analysis from scRNA-Seq
URL: https://github.com/kharchenkolab/numbat/, https://kharchenkolab.github.io/numbat/
Version: 1.4.2
Description: A computational method that infers copy number variations (CNVs) in cancer scRNA-seq data and reconstructs the tumor phylogeny. 'numbat' integrates signals from gene expression, allelic ratio, and population haplotype structures to accurately infer allele-specific CNVs in single cells and reconstruct their lineage relationship. 'numbat' can be used to: 1. detect allele-specific copy number variations from single cells; 2. differentiate tumor versus normal cells in the tumor microenvironment; 3. infer the clonal architecture and evolutionary history of profiled tumors. 'numbat' does not require tumor/normal-paired DNA or genotype data, but operates solely on the donor scRNA-seq data (for example, 10x Cell Ranger output). Additional examples and documentation are available at https://kharchenkolab.github.io/numbat/. For details on the method please see Gao et al. Nature Biotechnology (2022) <doi:10.1038/s41587-022-01468-y>.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Depends: R (≥ 4.1.0), Matrix
Imports: ape, caTools, data.table, dendextend, dplyr (≥ 1.1.1), GenomicRanges, ggplot2, ggraph, ggtree, glue, hahmmr, igraph, IRanges, logger, magrittr, methods, optparse, parallel, parallelDist, patchwork, pryr, purrr, Rcpp, RhpcBLASctl, R.utils, scales, scistreer (≥ 1.1.0), stats4, stringr, tibble, tidygraph, tidyr (≥ 1.3.0), vcfR, zoo
Suggests: ggrastr, ggrepel, knitr, matrixStats, testthat (≥ 3.0.0)
Config/testthat/edition: 3
LinkingTo: Rcpp, RcppArmadillo, roptim
NeedsCompilation: yes
SystemRequirements: GNU make
Author: Teng Gao [cre, aut], Ruslan Soldatov [aut], Hirak Sarkar [aut], Evan Biederstedt [aut], Peter Kharchenko [aut]
Maintainer: Teng Gao <tgaoteng@gmail.com>
RoxygenNote: 7.2.3
Packaged: 2024-09-19 20:45:19 UTC; tenggao
Repository: CRAN
Date/Publication: 2024-09-20 12:20:07 UTC

Get the modes of a vector

Description

Get the modes of a vector

Usage

Modes(x)
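
As an illustration of what "modes" means here, the following base-R sketch returns every value tied for the highest frequency. `modes_sketch` is a hypothetical helper written for this note, not the package's `Modes()` implementation, whose tie handling is not documented above.

```r
# Hypothetical helper: return all most-frequent values of a vector.
modes_sketch <- function(x) {
  ux <- unique(x)                   # distinct values, in order of appearance
  counts <- tabulate(match(x, ux))  # frequency of each distinct value
  ux[counts == max(counts)]         # keep every value tied for the maximum
}

modes_sketch(c(1, 2, 2, 3, 3))
```

In this toy call both 2 and 3 occur twice, so both are returned.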

Numbat R6 class

Description

Used to allow users to plot results

Value

a new 'Numbat' object

Public fields

label

character Sample name

gtf

dataframe Transcript annotation

joint_post

dataframe Joint posterior

exp_post

dataframe Expression posterior

allele_post

dataframe Allele posterior

bulk_subtrees

dataframe Bulk profiles of lineage subtrees

bulk_clones

dataframe Bulk profiles of clones

segs_consensus

dataframe Consensus segments

tree_post

list Tree posterior

mut_graph

igraph Mutation history graph

gtree

tbl_graph Single-cell phylogeny

clone_post

dataframe Clone posteriors

gexp_roll_wide

matrix Smoothed expression of single cells

P

matrix Genotype probability matrix

treeML

phylo Maximum likelihood tree

hc

hclust Initial hierarchical clustering

Methods

Public methods


Method new()

initialize Numbat class

Usage
Numbat$new(out_dir, i = 2, gtf = gtf_hg38, verbose = TRUE)
Arguments
out_dir

character string Output directory

i

integer Get results from which iteration (default=2)

gtf

dataframe Transcript gtf (default=gtf_hg38)

verbose

logical Whether to output verbose results (default=TRUE)

Returns

a new 'Numbat' object


Method plot_phylo_heatmap()

Plot the single-cell CNV calls in a heatmap and the corresponding phylogeny

Usage
Numbat$plot_phylo_heatmap(...)
Arguments
...

additional parameters passed to plot_phylo_heatmap()


Method plot_exp_roll()

Plot window-smoothed expression profiles

Usage
Numbat$plot_exp_roll(k = 3, n_sample = 300, ...)
Arguments
k

integer Number of clusters

n_sample

integer Number of cells to subsample

...

additional parameters passed to plot_exp_roll()


Method plot_mut_history()

Plot the mutation history of the tumor

Usage
Numbat$plot_mut_history(...)
Arguments
...

additional parameters passed to plot_mut_history()


Method plot_sc_tree()

Plot the single cell phylogeny

Usage
Numbat$plot_sc_tree(...)
Arguments
...

additional parameters passed to plot_sc_tree()


Method plot_consensus()

Plot consensus segments

Usage
Numbat$plot_consensus(...)
Arguments
...

additional parameters passed to plot_consensus()


Method plot_clone_profile()

Plot clone CNV profiles

Usage
Numbat$plot_clone_profile(...)
Arguments
...

additional parameters passed to plot_clone_profile()


Method cutree()

Re-define subclones on the phylogeny.

Usage
Numbat$cutree(max_cost = 0, n_cut = 0)
Arguments
max_cost

numeric Likelihood threshold to collapse internal branches

n_cut

integer Number of cuts on the phylogeny to define subclones


Method clone()

The objects of this class are cloneable with this method.

Usage
Numbat$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


centromere regions (hg19)

Description

centromere regions (hg19)

Usage

acen_hg19

Format

An object of class tbl_df (inherits from tbl, data.frame) with 22 rows and 3 columns.


centromere regions (hg38)

Description

centromere regions (hg38)

Usage

acen_hg38

Format

An object of class tbl_df (inherits from tbl, data.frame) with 22 rows and 3 columns.


Utility function to make reference gene expression profiles

Description

Utility function to make reference gene expression profiles

Usage

aggregate_counts(count_mat, annot, normalized = TRUE, verbose = TRUE)

Arguments

count_mat

matrix/dgCMatrix Gene expression counts

annot

dataframe Cell annotation with columns "cell" and "group"

normalized

logical Whether to return normalized expression values

verbose

logical Verbosity

Value

matrix Reference gene expression levels

Examples

ref_custom = aggregate_counts(count_mat_ref, annot_ref, verbose = FALSE)
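
To make the aggregation idea concrete without the package data, here is a minimal base-R sketch on a toy matrix. It sums counts over the cells of each group and then divides each group column by its total. That normalization is an assumption about what `normalized = TRUE` means, and the names `counts`, `annot`, `agg`, and `ref_sketch` are made up for illustration.

```r
# Toy inputs (made up for illustration): 3 genes x 4 cells.
counts <- matrix(
  c(1, 0, 3,  2, 2, 0,  0, 4, 4,  1, 1, 2),
  nrow = 3,
  dimnames = list(c("g1", "g2", "g3"), c("c1", "c2", "c3", "c4"))
)
annot <- data.frame(
  cell  = c("c1", "c2", "c3", "c4"),
  group = c("T", "T", "B", "B"),
  stringsAsFactors = FALSE
)

# Sum counts over the cells of each group (one column per group).
agg <- sapply(
  split(annot$cell, annot$group),
  function(cells) rowSums(counts[, cells, drop = FALSE])
)

# Assumed normalization: divide each group column by its total count.
ref_sketch <- sweep(agg, 2, colSums(agg), "/")
ref_sketch
```

The result has one row per gene and one column per group, which matches the shape described for the reference matrix above.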

Call CNVs in a pseudobulk profile using the Numbat joint HMM

Description

Call CNVs in a pseudobulk profile using the Numbat joint HMM

Usage

analyze_bulk(
  bulk,
  t = 1e-05,
  gamma = 20,
  theta_min = 0.08,
  logphi_min = 0.25,
  nu = 1,
  min_genes = 10,
  exp_only = FALSE,
  allele_only = FALSE,
  bal_cnv = TRUE,
  retest = TRUE,
  find_diploid = TRUE,
  diploid_chroms = NULL,
  classify_allele = FALSE,
  run_hmm = TRUE,
  prior = NULL,
  exclude_neu = TRUE,
  phasing = TRUE,
  verbose = TRUE
)

Arguments

bulk

dataframe Pseudobulk profile

t

numeric Transition probability

gamma

numeric Dispersion parameter for the Beta-Binomial allele model

theta_min

numeric Minimum imbalance threshold

logphi_min

numeric Minimum log expression deviation threshold

nu

numeric Phase switch rate

min_genes

integer Minimum number of genes to call an event

exp_only

logical Whether to run expression-only HMM

allele_only

logical Whether to run allele-only HMM

bal_cnv

logical Whether to call balanced amplifications/deletions

retest

logical Whether to retest CNVs after Viterbi decoding

find_diploid

logical Whether to run diploid region identification routine

diploid_chroms

character vector User-given chromosomes that are known to be in diploid state

classify_allele

logical Whether to only classify allele (internal use only)

run_hmm

logical Whether to run HMM (internal use only)

prior

numeric vector Prior probabilities of states (internal use only)

exclude_neu

logical Whether to exclude neutral segments from retesting (internal use only)

phasing

logical Whether to use phasing information (internal use only)

verbose

logical Verbosity

Value

a pseudobulk profile dataframe with called CNV information

Examples

bulk_analyzed = analyze_bulk(bulk_example, t = 1e-5, find_diploid = FALSE, retest = FALSE)

Annotate consensus segments on a pseudobulk dataframe

Description

Annotate consensus segments on a pseudobulk dataframe

Usage

annot_consensus(bulk, segs_consensus, join_mode = "inner")

Arguments

bulk

dataframe Pseudobulk profile

segs_consensus

dataframe Consensus segment dataframe

Value

dataframe Pseudobulk profile


Annotate haplotype segments after HMM decoding

Description

Annotate haplotype segments after HMM decoding

Usage

annot_haplo_segs(bulk)

example reference cell annotation

Description

example reference cell annotation

Usage

annot_ref

Format

An object of class data.frame with 50 rows and 2 columns.


Annotate copy number segments after HMM decoding

Description

Annotate copy number segments after HMM decoding

Usage

annot_segs(bulk, var = "cnv_state")

Arguments

bulk

dataframe Pseudobulk profile

Value

a pseudobulk dataframe


Annotate the theta parameter for each segment

Description

Annotate the theta parameter for each segment

Usage

annot_theta_mle(bulk)

Arguments

bulk

dataframe Pseudobulk profile

Value

dataframe Pseudobulk profile


Annotate rolling estimate of imbalance level theta

Description

Annotate rolling estimate of imbalance level theta

Usage

annot_theta_roll(bulk)

Arguments

bulk

a pseudobulk dataframe

Value

a pseudobulk dataframe


Annotate genes on allele dataframe

Description

Annotate genes on allele dataframe

Usage

annotate_genes(df, gtf)

Arguments

df

dataframe Allele count dataframe

gtf

dataframe Gene gtf

Value

dataframe Allele dataframe with gene column


Laplace approximation of the posterior of expression fold change phi

Description

Laplace approximation of the posterior of expression fold change phi

Usage

approx_phi_post(
  Y_obs,
  lambda_ref,
  d,
  alpha = NULL,
  beta = NULL,
  mu = NULL,
  sig = NULL,
  lower = 0.2,
  upper = 10,
  start = 1
)

Arguments

Y_obs

numeric vector Gene expression counts

lambda_ref

numeric vector Reference expression levels

d

numeric Total library size

alpha

numeric Shape parameter of the gamma distribution

beta

numeric Rate parameter of the gamma distribution

mu

numeric Mean of the normal distribution

sig

numeric Standard deviation of the normal distribution

lower

numeric Lower bound of phi

upper

numeric Upper bound of phi

start

numeric Starting value of phi

Value

numeric MLE of phi and its standard deviation


Laplace approximation of the posterior of allelic imbalance theta

Description

Laplace approximation of the posterior of allelic imbalance theta

Usage

approx_theta_post(
  pAD,
  DP,
  p_s,
  lower = 0.001,
  upper = 0.499,
  start = 0.25,
  gamma = 20
)

Arguments

pAD

numeric vector Variant allele depth

DP

numeric vector Total allele depth

p_s

numeric vector Phase switch probabilities

lower

numeric Lower bound of theta

upper

numeric Upper bound of theta

start

numeric Starting value of theta

gamma

numeric Gamma parameter of the beta-binomial distribution
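
The Laplace approximation named in the title can be sketched in base R: find the mode of the log-likelihood, then use the curvature at the mode as the inverse variance of a Gaussian. This sketch makes several assumptions that are not stated above: a flat prior on theta, perfectly phased counts (phase switches are ignored, so `p_s` is not used), and a Beta-Binomial with alpha = gamma * (0.5 + theta) and beta = gamma * (0.5 - theta). `dbbinom_log` and `laplace_theta_sketch` are hypothetical helpers.

```r
# Beta-Binomial log-pmf (standard formula).
dbbinom_log <- function(x, size, alpha, beta) {
  lchoose(size, x) + lbeta(x + alpha, size - x + beta) - lbeta(alpha, beta)
}

# Hypothetical helper: Laplace approximation for theta, ignoring phase switches.
laplace_theta_sketch <- function(pAD, DP, gamma = 20,
                                 lower = 0.001, upper = 0.499, start = 0.25) {
  # Negative log-likelihood under the assumed parametrization.
  neg_loglik <- function(theta) {
    -sum(dbbinom_log(pAD, DP, gamma * (0.5 + theta), gamma * (0.5 - theta)))
  }
  fit <- optim(start, neg_loglik, method = "L-BFGS-B",
               lower = lower, upper = upper, hessian = TRUE)
  # Curvature at the mode gives the variance of the Gaussian approximation.
  c(theta_hat = fit$par, sd = sqrt(1 / fit$hessian[1, 1]))
}

laplace_theta_sketch(pAD = c(70, 65, 72), DP = c(100, 100, 100))
```

The bounds and starting value mirror the defaults shown in the usage above; the toy counts are invented.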


calculate entropy for a binary variable

Description

calculate entropy for a binary variable

Usage

binary_entropy(p)
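
For reference, the entropy of a binary variable with success probability p is H(p) = -p log(p) - (1 - p) log(1 - p). The base-R sketch below uses log base 2, which is an assumption because the logarithm base used by `binary_entropy()` is not stated here. `binary_entropy_sketch` is a hypothetical name.

```r
# Hypothetical helper: binary entropy in bits (log base 2 assumed).
binary_entropy_sketch <- function(p) {
  h <- -p * log2(p) - (1 - p) * log2(1 - p)
  h[p == 0 | p == 1] <- 0  # use the limit x * log(x) -> 0 at the endpoints
  h
}

binary_entropy_sketch(c(0, 0.5, 1))
```

Entropy is largest at p = 0.5 and zero when the outcome is certain.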

example pseudobulk dataframe

Description

example pseudobulk dataframe

Usage

bulk_example

Format

An object of class tbl_df (inherits from tbl, data.frame) with 3935 rows and 83 columns.


Calculate LLR for an allele HMM

Description

Calculate LLR for an allele HMM

Usage

calc_allele_LLR(pAD, DP, p_s, theta_mle, theta_0 = 0, gamma = 20)

Arguments

pAD

numeric vector Phased allele depth

DP

numeric vector Total allele depth

p_s

numeric vector Phase switch probabilities

theta_mle

numeric MLE of imbalance level theta (alternative hypothesis)

theta_0

numeric Imbalance level in the null hypothesis

gamma

numeric Dispersion parameter for the Beta-Binomial allele model

Value

numeric Log-likelihood ratio
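
A simplified version of this log-likelihood ratio can be written in base R. The sketch assumes perfectly phased counts (so `p_s` is ignored and no HMM is run) and a Beta-Binomial with alpha = gamma * (0.5 + theta) and beta = gamma * (0.5 - theta). Both are assumptions, and `dbbinom_log`, `allele_loglik_sketch`, and `allele_llr_sketch` are hypothetical helpers rather than package functions.

```r
# Beta-Binomial log-pmf (standard formula).
dbbinom_log <- function(x, size, alpha, beta) {
  lchoose(size, x) + lbeta(x + alpha, size - x + beta) - lbeta(alpha, beta)
}

# Log-likelihood of phased allele counts at imbalance level theta.
allele_loglik_sketch <- function(pAD, DP, theta, gamma = 20) {
  sum(dbbinom_log(pAD, DP, gamma * (0.5 + theta), gamma * (0.5 - theta)))
}

# LLR of the alternative (theta_mle) against the null (theta_0).
allele_llr_sketch <- function(pAD, DP, theta_mle, theta_0 = 0, gamma = 20) {
  allele_loglik_sketch(pAD, DP, theta_mle, gamma) -
    allele_loglik_sketch(pAD, DP, theta_0, gamma)
}

DP <- c(20, 20, 20, 20)
allele_llr_sketch(c(14, 16, 15, 17), DP, theta_mle = 0.25)  # imbalanced counts
allele_llr_sketch(c(10, 9, 11, 10), DP, theta_mle = 0.25)   # balanced counts
```

With these invented counts the first call favors the imbalanced hypothesis (positive LLR) and the second favors the null (negative LLR).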


Calculate allele likelihoods

Description

Calculate allele likelihoods

Usage

calc_allele_lik(pAD, DP, p_s, theta, gamma = 20)

Arguments

pAD

integer vector Paternal allele counts

DP

integer vector Total allele counts

p_s

numeric vector Phase switch probabilities

theta

numeric Haplotype imbalance

gamma

numeric Overdispersion in the allele-specific expression


Calculate expression distance matrix between cell populations

Description

Calculate expression distance matrix between cell populations

Usage

calc_cluster_dist(count_mat, cell_annot)

Arguments

count_mat

dgCMatrix Gene expression counts

cell_annot

dataframe specifying the cell ID and cluster memberships

Value

a distance matrix


Calculate LLR for an expression HMM

Description

Calculate LLR for an expression HMM

Usage

calc_exp_LLR(
  Y_obs,
  lambda_ref,
  d,
  phi_mle,
  mu = NULL,
  sig = NULL,
  alpha = NULL,
  beta = NULL
)

Arguments

Y_obs

numeric vector Gene expression counts

lambda_ref

numeric vector Reference expression levels

d

numeric vector Total library size

phi_mle

numeric MLE of expression fold change phi (alternative hypothesis)

mu

numeric Mean parameter for the PLN expression model

sig

numeric Dispersion parameter for the PLN expression model

alpha

numeric Hyperparameter for the gamma poisson model (not used)

beta

numeric Hyperparameter for the gamma poisson model (not used)

Value

numeric Log-likelihood ratio


Calculate the MLE of expression fold change phi

Description

Calculate the MLE of expression fold change phi

Usage

calc_phi_mle_lnpois(Y_obs, lambda_ref, d, mu, sig, lower = 0.1, upper = 10)

Check the format of a allele dataframe

Description

Check the format of a allele dataframe

Usage

check_allele_df(df)

Arguments

df

dataframe Allele dataframe

Value

dataframe Allele dataframe


check inter-individual contamination

Description

check inter-individual contamination

Usage

check_contam(bulk)

Arguments

bulk

dataframe Pseudobulk profile


check noise level

Description

check noise level

Usage

check_exp_noise(bulk)

Arguments

bulk

dataframe Pseudobulk profile


check the format of lambdas_ref

Description

check the format of lambdas_ref

Usage

check_exp_ref(lambdas_ref)

Arguments

lambdas_ref

matrix Expression reference profile

Value

matrix Expression reference profile


Check the format of a count matrix

Description

Check the format of a count matrix

Usage

check_matrix(count_mat)

Arguments

count_mat

matrix Count matrix

Value

matrix Count matrix


check the format of a given consensus segment dataframe

Description

check the format of a given consensus segment dataframe

Usage

check_segs_fix(segs_consensus_fix)

Arguments

segs_consensus_fix

dataframe Consensus segment dataframe

Value

dataframe Consensus segment dataframe


Check the format of a given clonal LOH segment dataframe

Description

Check the format of a given clonal LOH segment dataframe

Usage

check_segs_loh(segs_loh)

Arguments

segs_loh

dataframe Clonal LOH segment dataframe

Value

dataframe Clonal LOH segment dataframe


choose best reference for each cell based on correlation

Description

choose best reference for each cell based on correlation

Usage

choose_ref_cor(count_mat, lambdas_ref, gtf)

Arguments

count_mat

dgCMatrix Gene expression counts

lambdas_ref

matrix Reference expression profiles

gtf

dataframe Transcript gtf

Value

named vector Best references for each cell


chromosome sizes (hg19)

Description

chromosome sizes (hg19)

Usage

chrom_sizes_hg19

Format

An object of class data.table (inherits from data.frame) with 22 rows and 2 columns.


chromosome sizes (hg38)

Description

chromosome sizes (hg38)

Usage

chrom_sizes_hg38

Format

An object of class data.table (inherits from data.frame) with 22 rows and 2 columns.


classify alleles using viterbi and forward-backward

Description

classify alleles using viterbi and forward-backward

Usage

classify_alleles(bulk)

Arguments

bulk

dataframe Pseudobulk profile

Value

dataframe Pseudobulk profile


Plot CNV heatmap

Description

Plot CNV heatmap

Usage

cnv_heatmap(
  segs,
  var = "group",
  label_group = TRUE,
  legend = TRUE,
  exclude_gap = TRUE,
  genome = "hg38"
)

Arguments

segs

dataframe Segments to plot. Need columns "seg_start", "seg_end", "cnv_state"

var

character Column to facet by

label_group

logical Label the groups

legend

logical Display the legend

exclude_gap

logical Whether to mark gap regions

genome

character Genome build, either 'hg38' or 'hg19'

Value

ggplot Heatmap of CNVs along the genome

Examples

p = cnv_heatmap(segs_example)

Combine allele and expression pseudobulks

Description

Combine allele and expression pseudobulks

Usage

combine_bulk(allele_bulk, exp_bulk)

Arguments

allele_bulk

dataframe Bulk allele profile

exp_bulk

dataframe Bulk expression profile

Value

dataframe Pseudobulk allele and expression profile


Do Bayesian averaging to get posteriors

Description

Do Bayesian averaging to get posteriors

Usage

compute_posterior(PL)

Arguments

PL

dataframe Likelihoods and priors

Value

dataframe Posteriors
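
The general computation behind turning likelihoods and priors into posteriors can be sketched as follows. The sketch assumes the likelihoods are on the log scale and normalizes with the log-sum-exp trick to avoid underflow. The column layout expected in `PL` is not documented above, so the sketch works on plain vectors, and `posterior_sketch` is a hypothetical helper.

```r
# Hypothetical helper: posterior over states from log-likelihoods and priors.
posterior_sketch <- function(loglik, prior) {
  log_joint <- loglik + log(prior)
  # log-sum-exp: subtract the maximum before exponentiating.
  log_norm <- max(log_joint) + log(sum(exp(log_joint - max(log_joint))))
  exp(log_joint - log_norm)
}

posterior_sketch(loglik = c(neu = -10, del = -12), prior = c(0.5, 0.5))
```

The same call still works when the log-likelihoods are very negative, for example `c(-1000, -1002)`, where naive exponentiation would underflow to zero.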


Merge adjacent set of nodes

Description

Merge adjacent set of nodes

Usage

contract_nodes(G, vset, node_tar = NULL, debug = FALSE)

Arguments

G

igraph Mutation graph

vset

vector Set of adjacent vertices to merge

Value

igraph Mutation graph


example gene expression count matrix

Description

example gene expression count matrix

Usage

count_mat_example

Format

An object of class dgCMatrix with 1024 rows and 173 columns.


example reference count matrix

Description

example reference count matrix

Usage

count_mat_ref

Format

An object of class dgCMatrix with 1000 rows and 50 columns.


Call clonal LOH using SNP density. Recommended for cell lines or tumor samples with no normal cells.

Description

Call clonal LOH using SNP density. Recommended for cell lines or tumor samples with no normal cells.

Usage

detect_clonal_loh(bulk, t = 1e-05, snp_rate_loh = 5, min_depth = 0)

Arguments

bulk

dataframe Pseudobulk profile

t

numeric Transition probability

snp_rate_loh

numeric The assumed SNP density in clonal LOH regions

min_depth

integer Minimum coverage to filter SNPs

Value

dataframe LOH segments

Examples

segs_loh = detect_clonal_loh(bulk_example)

example allele count dataframe

Description

example allele count dataframe

Usage

df_allele_example

Format

An object of class data.frame with 41167 rows and 11 columns.


Run smoothed expression-based hclust

Description

Run smoothed expression-based hclust

Usage

exp_hclust(
  count_mat,
  lambdas_ref,
  gtf,
  sc_refs = NULL,
  window = 101,
  ncores = 1,
  verbose = TRUE
)

Arguments

count_mat

dgCMatrix Gene counts

lambdas_ref

matrix Reference expression profiles

gtf

dataframe Transcript GTF

sc_refs

named list Reference choices for single cells

window

integer Sliding window size

ncores

integer Number of cores

verbose

logical Verbosity


expand multi-allelic CNVs into separate entries in the single-cell posterior dataframe

Description

expand multi-allelic CNVs into separate entries in the single-cell posterior dataframe

Usage

expand_states(sc_post, segs_consensus)

Arguments

sc_post

dataframe Single-cell posteriors

segs_consensus

dataframe Consensus segments

Value

dataframe Single-cell posteriors with multi-allelic CNVs split into different entries


Fill neutral regions into consensus segments

Description

Fill neutral regions into consensus segments

Usage

fill_neu_segs(segs_consensus, segs_neu)

Arguments

segs_consensus

dataframe CNV segments from multiple samples

segs_neu

dataframe Neutral segments

Value

dataframe Collections of neutral and aberrant segments with no gaps


filter for mutually expressed genes

Description

filter for mutually expressed genes

Usage

filter_genes(count_mat, lambdas_ref, gtf, verbose = FALSE)

Arguments

count_mat

dgCMatrix Gene expression counts

lambdas_ref

named numeric vector A reference expression profile

gtf

dataframe Transcript gtf

Value

vector Genes that are kept after filtering


Find the common diploid region in a group of pseudobulks

Description

Find the common diploid region in a group of pseudobulks

Usage

find_common_diploid(
  bulks,
  grouping = "clique",
  gamma = 20,
  theta_min = 0.08,
  t = 1e-05,
  fc_min = 2^0.25,
  alpha = 1e-04,
  min_genes = 10,
  ncores = 1,
  debug = FALSE,
  verbose = TRUE
)

Arguments

bulks

dataframe Pseudobulk profiles (differentiated by "sample" column)

grouping

character Whether to use cliques or components in the graph to find the diploid cluster

gamma

numeric Dispersion parameter for the Beta-Binomial allele model

theta_min

numeric Minimum imbalance threshold

t

numeric Transition probability

fc_min

numeric Minimum fold change to call quadruploid cluster

alpha

numeric FDR cut-off for q values to determine edges

ncores

integer Number of cores to use

Value

list Ploidy information


fit a Beta-Binomial model by maximum likelihood

Description

fit a Beta-Binomial model by maximum likelihood

Usage

fit_bbinom(AD, DP)

Arguments

AD

numeric vector Variant allele depth

DP

numeric vector Total allele depth

Value

MLE of alpha and beta


fit gamma maximum likelihood

Description

fit gamma maximum likelihood

Usage

fit_gamma(AD, DP, start = 20)

Arguments

AD

numeric vector Variant allele depth

DP

numeric vector Total allele depth

Value

a fit


fit a PLN model by maximum likelihood

Description

fit a PLN model by maximum likelihood

Usage

fit_lnpois(Y_obs, lambda_ref, d)

Arguments

Y_obs

numeric vector Gene expression counts

lambda_ref

numeric vector Reference expression levels

d

numeric Total library size

Value

numeric MLE of mu and sig
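
For readers unfamiliar with the Poisson lognormal (PLN) model, the sketch below evaluates its probability mass by numerical integration in base R. It assumes the form Y ~ Poisson(rate * exp(x)) with x ~ Normal(mu, sig), where `rate` plays the role of `lambda_ref * d`. That mapping to the arguments above is an assumption, and `dpln_sketch` is a hypothetical helper, not the package's likelihood code.

```r
# Hypothetical helper: PLN probability mass via numerical integration.
dpln_sketch <- function(y, rate, mu, sig) {
  vapply(y, function(yi) {
    # Poisson mass at yi, averaged over the lognormal multiplier exp(x).
    integrand <- function(x) dpois(yi, rate * exp(x)) * dnorm(x, mu, sig)
    integrate(integrand, mu - 8 * sig, mu + 8 * sig, rel.tol = 1e-6)$value
  }, numeric(1))
}

p <- dpln_sketch(0:60, rate = 5, mu = 0, sig = 0.5)
sum(p)           # close to 1 (mass beyond 60 is negligible here)
sum(0:60 * p)    # close to rate * exp(mu + sig^2 / 2)
```

A maximum likelihood fit of `mu` and `sig` would maximize the sum of the log of such masses over genes. The sketch only shows the likelihood itself.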


Fit a reference profile from multiple references using constrained least square

Description

Fit a reference profile from multiple references using constrained least square

Usage

fit_ref_sse(Y_obs, lambdas_ref, gtf, min_lambda = 2e-06, verbose = FALSE)

Arguments

Y_obs

vector

lambdas_ref

named vector

gtf

dataframe

Value

fitted expression profile
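
One generic way to set up a constrained least-squares fit of mixture weights is shown below. The weights are kept non-negative and summing to one through a softmax reparametrization, and the sum of squared errors is minimized with `optim()`. This is a sketch under assumptions: the exact objective, scale (raw versus log), and constraints used by `fit_ref_sse()` are not documented above, and `fit_ref_weights_sketch` is a hypothetical helper.

```r
# Hypothetical helper: mixture weights minimizing squared error.
fit_ref_weights_sketch <- function(y, refs) {
  # Softmax keeps weights positive and summing to one.
  softmax <- function(par) exp(par) / sum(exp(par))
  sse <- function(par) sum((y - as.vector(refs %*% softmax(par)))^2)
  fit <- optim(rep(0, ncol(refs)), sse, method = "BFGS")
  w <- softmax(fit$par)
  names(w) <- colnames(refs)
  w
}

# Toy references (made up): 3 genes, 2 reference profiles.
refs <- cbind(A = c(0.5, 0.3, 0.2), B = c(0.1, 0.2, 0.7))
y <- 0.75 * refs[, "A"] + 0.25 * refs[, "B"]
fit_ref_weights_sketch(y, refs)
```

Because `y` was built as a 75/25 mixture, the recovered weights should be close to 0.75 and 0.25.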


negative binomial model

Description

negative binomial model

Usage

fit_snp_rate(gene_snps, gene_length)

genome gap regions (hg19)

Description

genome gap regions (hg19)

Usage

gaps_hg19

Format

An object of class data.table (inherits from data.frame) with 28 rows and 3 columns.


genome gap regions (hg38)

Description

genome gap regions (hg38)

Usage

gaps_hg38

Format

An object of class data.table (inherits from data.frame) with 30 rows and 3 columns.


Generate alphabetical postfixes

Description

Generate alphabetical postfixes

Usage

generate_postfix(n)

Arguments

n

vector of integers

Value

vector of alphabetical postfixes
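
One plausible mapping from integers to alphabetical postfixes is the spreadsheet-column scheme (1 gives "a", 26 gives "z", 27 gives "aa"). The sketch below implements that scheme in base R. Whether `generate_postfix()` uses exactly this mapping is an assumption, and `postfix_sketch` is a hypothetical name.

```r
# Hypothetical helper: 1 -> "a", 26 -> "z", 27 -> "aa", ...
postfix_sketch <- function(n) {
  vapply(n, function(i) {
    out <- character(0)
    while (i > 0) {
      out <- c(letters[(i - 1) %% 26 + 1], out)  # least significant letter
      i <- (i - 1) %/% 26                        # move to the next position
    }
    paste(out, collapse = "")
  }, character(1))
}

postfix_sketch(c(1, 26, 27, 52, 53))
```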


Genotyping main function

Description

Genotyping main function

Usage

genotype(label, samples, vcfs, outdir, het_only = FALSE, chr_prefix = TRUE)

Arguments

label

character Individual/sample label

samples

vector Sample names

vcfs

list of vcfR VCFs from cellsnp-lite pileup

outdir

character Output directory

het_only

logical Whether to only use heterozygous SNPs

chr_prefix

logical Whether to add chr prefix

Value

integer Status code


Aggregate into pseudobulk allele profile

Description

Aggregate into pseudobulk allele profile

Usage

get_allele_bulk(df_allele, nu = 1, min_depth = 0)

Arguments

df_allele

dataframe Single-cell allele counts

nu

numeric Phase switch rate

min_depth

integer Minimum coverage to filter SNPs

Value

dataframe Pseudobulk allele profile


Get an allele HMM

Description

Get an allele HMM

Usage

get_allele_hmm(pAD, DP, p_s, theta, gamma = 20)

Arguments

pAD

integer vector Paternal allele counts

DP

integer vector Total allele counts

p_s

numeric vector Phase switch probabilities

theta

numeric Haplotype imbalance

gamma

numeric Overdispersion in the allele-specific expression

Value

HMM object


get CNV allele posteriors

Description

get CNV allele posteriors

Usage

get_allele_post(df_allele, haplotypes, segs_consensus)

Arguments

df_allele

dataframe Allele counts

haplotypes

dataframe Haplotype classification

segs_consensus

dataframe Consensus CNV segments

Value

dataframe Allele posteriors


Aggregate single-cell data into combined bulk expression and allele profile

Description

Aggregate single-cell data into combined bulk expression and allele profile

Usage

get_bulk(
  count_mat,
  lambdas_ref,
  df_allele,
  gtf,
  subset = NULL,
  min_depth = 0,
  nu = 1,
  segs_loh = NULL,
  verbose = TRUE
)

Arguments

count_mat

dgCMatrix Gene expression counts

lambdas_ref

matrix Reference expression profiles

df_allele

dataframe Single-cell allele counts

gtf

dataframe Transcript gtf

subset

vector Subset of cells to aggregate

min_depth

integer Minimum coverage to filter SNPs

nu

numeric Phase switch rate

segs_loh

dataframe Segments with clonal LOH to be excluded

verbose

logical Verbosity

Value

dataframe Pseudobulk gene expression and allele profile

Examples

bulk_example = get_bulk(
    count_mat = count_mat_example,
    lambdas_ref = ref_hca,
    df_allele = df_allele_example,
    gtf = gtf_hg38)

Map cells to the phylogeny (or genotypes) based on CNV posteriors

Description

Map cells to the phylogeny (or genotypes) based on CNV posteriors

Usage

get_clone_post(gtree, exp_post, allele_post)

Arguments

gtree

tbl_graph A cell lineage tree

exp_post

dataframe Expression posteriors

allele_post

dataframe Allele posteriors

Value

dataframe Clone posteriors


Aggregate into bulk expression profile

Description

Aggregate into bulk expression profile

Usage

get_exp_bulk(count_mat, lambdas_ref, gtf, verbose = FALSE)

Arguments

count_mat

dgCMatrix Gene expression counts

lambdas_ref

matrix Reference expression profiles

gtf

dataframe Transcript gtf

Value

dataframe Pseudobulk gene expression profile


get the single cell expression likelihoods

Description

get the single cell expression likelihoods

Usage

get_exp_likelihoods(
  exp_counts,
  diploid_chroms = NULL,
  use_loh = FALSE,
  depth_obs = NULL,
  mu = NULL,
  sigma = NULL
)

Arguments

exp_counts

dataframe Single-cell expression counts (CHROM, seg, cnv_state, gene, Y_obs, lambda_ref)

diploid_chroms

character vector Known diploid chromosomes

use_loh

logical Whether to include CNLOH regions in baseline

Value

dataframe Single-cell CNV likelihood scores


compute single-cell expression posteriors

Description

compute single-cell expression posteriors

Usage

get_exp_post(
  segs_consensus,
  count_mat,
  gtf,
  lambdas_ref,
  sc_refs = NULL,
  diploid_chroms = NULL,
  use_loh = NULL,
  segs_loh = NULL,
  ncores = 30,
  verbose = TRUE,
  debug = FALSE
)

Arguments

segs_consensus

dataframe Consensus segments

count_mat

dgCMatrix gene expression count matrix

gtf

dataframe transcript gtf

lambdas_ref

matrix Reference expression profiles

Value

dataframe Expression posteriors


get the single cell expression dataframe

Description

get the single cell expression dataframe

Usage

get_exp_sc(segs_consensus, count_mat, gtf, segs_loh = NULL)

Arguments

segs_consensus

dataframe Consensus segments

count_mat

dgCMatrix gene expression count matrix

gtf

dataframe Transcript gtf

Value

dataframe single cell expression counts annotated with segments


Get a tidygraph tree with simplified mutational history.

Description

Specify either max_cost or n_cut. max_cost works similarly to h, and n_cut works similarly to k, in stats::cutree. The top-level normal diploid clone is always included.

Usage

get_gtree(tree, P, n_cut = 0, max_cost = 0)

Arguments

tree

phylo Single-cell phylogenetic tree

P

matrix Genotype probability matrix

n_cut

integer Number of cuts on the phylogeny to define subclones

max_cost

numeric Likelihood threshold to collapse internal branches

Value

tbl_graph Phylogeny annotated with branch lengths and mutation events
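
The analogy to `stats::cutree` can be seen on a toy hierarchical clustering: `k` fixes the number of groups, while `h` fixes a height threshold and lets the number of groups follow. The sketch uses only base R and invented data. It illustrates `cutree` itself, not `get_gtree()`.

```r
# Toy data: two tight pairs and one outlier.
x <- c(1, 2, 10, 11, 50)
hc <- hclust(dist(x))  # complete linkage by default

cutree(hc, k = 2)  # ask for exactly two groups
cutree(hc, h = 5)  # cut at height 5; the number of groups follows
```

Here `k = 2` separates the outlier from everything else, while `h = 5` keeps the two tight pairs apart and yields three groups.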


Get phased haplotypes

Description

Get phased haplotypes

Usage

get_haplotype_post(bulks, segs_consensus, naive = FALSE)

Arguments

bulks

dataframe Subtree pseudobulk profiles

segs_consensus

dataframe Consensus CNV segments

naive

logical Whether to use naive haplotype classification

Value

dataframe Posterior haplotypes


Helper function to get inter-SNP distance

Description

Helper function to get inter-SNP distance

Usage

get_inter_cm(d)

Arguments

d

numeric vector Genetic positions in centimorgan (cM)

Value

numeric vector Inter-SNP genetic distances
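
A minimal sketch of the idea: given cumulative genetic positions, the inter-SNP distance is the difference between consecutive positions, with no distance defined for the first SNP. The second helper converts a distance into a switch probability with a Haldane-type map function scaled by a rate `nu`. Whether numbat uses exactly this form, and in which units, is an assumption. Both `inter_cm_sketch` and `switch_prob_sketch` are hypothetical helpers.

```r
# Hypothetical helper: distance from each SNP to the previous one.
inter_cm_sketch <- function(d) {
  if (length(d) <= 1) return(NA_real_)
  c(NA_real_, diff(d))  # first SNP has no predecessor
}

# Assumed Haldane-type map: 0 at distance 0, approaching 0.5 at large distance.
switch_prob_sketch <- function(dist, nu = 1) {
  (1 - exp(-2 * nu * dist)) / 2
}

d <- inter_cm_sketch(c(0, 0.5, 2))
d
switch_prob_sketch(d)
```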


Helper function to get the internal nodes of a dendrogram and the leafs in each subtree

Description

Helper function to get the internal nodes of a dendrogram and the leafs in each subtree

Usage

get_internal_nodes(den, node, labels)

Arguments

den

dendrogram

node

character Node name

labels

character vector Leaf labels


get joint posteriors

Description

get joint posteriors

Usage

get_joint_post(exp_post, allele_post, segs_consensus)

Arguments

exp_post

dataframe Expression single-cell CNV posteriors

allele_post

dataframe Allele single-cell CNV posteriors

segs_consensus

dataframe Consensus CNV segments

Value

dataframe Joint single-cell CNV posteriors


Get average reference expression profile based on single-cell ref choices

Description

Get average reference expression profile based on single-cell ref choices

Usage

get_lambdas_bar(lambdas_ref, sc_refs, verbose = TRUE)

Arguments

lambdas_ref

matrix Reference expression profiles

sc_refs

vector Single-cell reference choices

verbose

logical Print messages


Get the cost of a mutation reassignment

Description

Get the cost of a mutation reassignment

Usage

get_move_cost(muts, node_ori, node_tar, l_matrix)

Arguments

muts

character Mutations delimited by comma

node_ori

character Name of the "from" node

node_tar

character Name of the "to" node

Value

numeric Likelihood cost of the mutation reassignment


Get the least costly mutation reassignment

Description

Get the least costly mutation reassignment

Usage

get_move_opt(G, l_matrix)

Arguments

G

igraph Mutation graph

l_matrix

matrix Likelihood matrix of mutation placements

Value

numeric Likelihood cost of performing the mutation move


Get the internal nodes of a dendrogram and the leafs in each subtree

Description

Get the internal nodes of a dendrogram and the leafs in each subtree

Usage

get_nodes_celltree(hc, clusters)

Arguments

hc

hclust Clustering results

clusters

named vector Cutree output specifying the terminal clusters

Value

list Internal node subtrees with leaf memberships


Get ordered tips from a tree

Description

Get ordered tips from a tree

Usage

get_ordered_tips(tree)

Extract consensus CNV segments

Description

Extract consensus CNV segments

Usage

get_segs_consensus(bulks, min_LLR = 5, min_overlap = 0.45, retest = TRUE)

Arguments

bulks

dataframe Pseudobulks

min_LLR

numeric LLR threshold to filter CNVs

min_overlap

numeric Minimum overlap fraction to count two events as overlapping

Value

dataframe Consensus segments


get neutral segments from multiple pseudobulks

Description

get neutral segments from multiple pseudobulks

Usage

get_segs_neu(bulks)

process VCFs into SNP dataframe

Description

process VCFs into SNP dataframe

Usage

get_snps(vcf)

Arguments

vcf

vcfR object

Value

dataframe SNP information


Find maximum likelihood assignment of mutations on a tree

Description

Find maximum likelihood assignment of mutations on a tree

Usage

get_tree_post(tree, P)

Arguments

tree

phylo Single-cell phylogenetic tree

P

matrix Genotype probability matrix

Value

list Mutation


example smoothed gene expression dataframe

Description

example smoothed gene expression dataframe

Usage

gexp_roll_example

Format

An object of class data.frame with 10 rows and 2000 columns.


gene model (hg19)

Description

gene model (hg19)

Usage

gtf_hg19

Format

An object of class data.table (inherits from data.frame) with 26841 rows and 5 columns.


gene model (hg38)

Description

gene model (hg38)

Usage

gtf_hg38

Format

An object of class data.table (inherits from data.frame) with 26807 rows and 5 columns.


gene model (mm10)

Description

gene model (mm10)

Usage

gtf_mm10

Format

An object of class data.table (inherits from data.frame) with 30336 rows and 5 columns.


example hclust tree

Description

example hclust tree

Usage

hc_example

Format

An object of class hclust of length 7.


example joint single-cell CNV posterior dataframe

Description

example joint single-cell CNV posterior dataframe

Usage

joint_post_example

Format

An object of class data.table (inherits from data.frame) with 3806 rows and 71 columns.


Annotate the direct upstream or downstream mutations on the edges

Description

Annotate the direct upstream or downstream mutations on the edges

Usage

label_edges(G)

Arguments

G

igraph Mutation graph

Value

igraph Mutation graph


Label the genotypes on a mutation graph

Description

Label the genotypes on a mutation graph

Usage

label_genotype(G)

Arguments

G

igraph Mutation graph

Value

igraph Mutation graph


Log memory usage

Description

Log memory usage

Usage

log_mem()

Log a message

Description

Log a message

Usage

log_message(msg, verbose = TRUE)

Arguments

msg

string Message to log

verbose

boolean Whether to print message to console


Make a group of pseudobulks

Description

Make a group of pseudobulks

Usage

make_group_bulks(
  groups,
  count_mat,
  df_allele,
  lambdas_ref,
  gtf,
  min_depth = 0,
  nu = 1,
  segs_loh = NULL,
  ncores = NULL
)

Arguments

groups

list Contains fields named "sample", "cells", "size", "members"

count_mat

dgCMatrix Gene counts

df_allele

dataframe Allele counts

lambdas_ref

matrix Reference expression profiles

gtf

dataframe Transcript GTF

min_depth

integer Minimum allele depth to include

segs_loh

dataframe Segments with clonal LOH to be excluded

ncores

integer Number of cores

Value

dataframe Pseudobulk profiles


Mark the tumor lineage of a phylogeny

Description

Mark the tumor lineage of a phylogeny

Usage

mark_tumor_lineage(gtree)

Arguments

gtree

tbl_graph Single-cell phylogeny

Value

tbl_graph Phylogeny annotated with tumor versus normal compartment


example mutation graph

Description

example mutation graph

Usage

mut_graph_example

Format

An object of class igraph of length 5.


Rolling estimate of expression fold change phi

Description

Rolling estimate of expression fold change phi

Usage

phi_hat_roll(Y_obs, lambda_ref, d_obs, mu, sig, h)

Estimate of expression fold change phi in a segment

Description

Estimate of expression fold change phi in a segment

Usage

phi_hat_seg(Y_obs, lambda_ref, d, mu, sig)

example single-cell phylogeny

Description

example single-cell phylogeny

Usage

phylogeny_example

Format

An object of class tbl_graph (inherits from igraph) of length 345.


Plot a group of pseudobulk HMM profiles

Description

Plot a group of pseudobulk HMM profiles

Usage

plot_bulks(bulks, ..., ncol = 1, title = TRUE, title_size = 8)

Arguments

bulks

dataframe Pseudobulk profiles annotated with "sample" column

...

additional parameters passed to plot_psbulk()

ncol

integer Number of columns

title

logical Whether to add titles to individual plots

title_size

numeric Size of titles

Value

a ggplot object

Examples

p = plot_bulks(bulk_example)

Plot consensus CNVs

Description

Plot consensus CNVs

Usage

plot_consensus(segs)

Arguments

segs

dataframe Consensus segments

Value

ggplot object

Examples

p = plot_consensus(segs_example)

Plot single-cell smoothed expression magnitude heatmap

Description

Plot single-cell smoothed expression magnitude heatmap

Usage

plot_exp_roll(
  gexp_roll_wide,
  hc,
  k,
  gtf,
  lim = 0.8,
  n_sample = 300,
  reverse = TRUE,
  plot_tree = TRUE
)

Arguments

gexp_roll_wide

matrix Cell x gene smoothed expression magnitudes

hc

hclust Hierarchical clustering result

k

integer Number of clusters

gtf

dataframe Transcript GTF

lim

numeric Limit for expression magnitudes

n_sample

integer Number of cells to subsample

reverse

logical Whether to reverse the cell order

plot_tree

logical Whether to plot the dendrogram

Value

ggplot A single-cell heatmap of window-smoothed expression CNV signals

Examples

p = plot_exp_roll(gexp_roll_example, gtf = gtf_hg38, hc = hc_example, k = 3)

Plot mutational history

Description

Plot mutational history

Usage

plot_mut_history(
  G,
  clone_post = NULL,
  edge_label_size = 4,
  node_label_size = 6,
  node_size = 10,
  arrow_size = 2,
  show_clone_size = TRUE,
  show_distance = TRUE,
  legend = TRUE,
  edge_label = TRUE,
  node_label = TRUE,
  horizontal = TRUE,
  pal = NULL
)

Arguments

G

igraph Mutation history graph

clone_post

dataframe Clone assignment posteriors

edge_label_size

numeric Size of edge label

node_label_size

numeric Size of node label

node_size

numeric Size of nodes

arrow_size

numeric Size of arrows

show_clone_size

logical Whether to show clone size

show_distance

logical Whether to show evolutionary distance between clones

legend

logical Whether to show legend

edge_label

logical Whether to label edges

node_label

logical Whether to label nodes

horizontal

logical Whether to use horizontal layout

pal

named vector Node colors

Value

ggplot object

Examples

p = plot_mut_history(mut_graph_example)

Plot single-cell CNV calls along with the clonal phylogeny

Description

Plot single-cell CNV calls along with the clonal phylogeny

Usage

plot_phylo_heatmap(
  gtree,
  joint_post,
  segs_consensus,
  clone_post = NULL,
  p_min = 0.9,
  annot = NULL,
  pal_annot = NULL,
  annot_title = "Annotation",
  annot_scale = NULL,
  clone_dict = NULL,
  clone_bar = TRUE,
  clone_stack = TRUE,
  pal_clone = NULL,
  clone_title = "Genotype",
  clone_legend = TRUE,
  line_width = 0.1,
  tree_height = 1,
  branch_width = 0.2,
  tip_length = 0.2,
  annot_bar_width = 0.25,
  clone_bar_width = 0.25,
  bar_label_size = 7,
  tvn_line = TRUE,
  clone_line = FALSE,
  exclude_gap = FALSE,
  root_edge = TRUE,
  raster = FALSE,
  show_phylo = TRUE
)

Arguments

gtree

tbl_graph The single-cell phylogeny

joint_post

dataframe Joint single cell CNV posteriors

segs_consensus

dataframe Consensus segments

clone_post

dataframe Clone assignment posteriors

p_min

numeric Probability threshold to display CNV calls

annot

dataframe Cell annotations, dataframe with 'cell' and additional annotation columns

pal_annot

named vector Colors for cell annotations

annot_title

character Legend title for the annotation bar

annot_scale

ggplot scale Color scale for the annotation bar

clone_dict

named vector Clone annotations, mapping from cell name to clones

clone_bar

logical Whether to display clone bar plot

clone_stack

logical Whether to plot clone assignment probabilities as a stacked bar

pal_clone

named vector Clone colors

clone_title

character Legend title for the clone bar

clone_legend

logical Whether to display the clone legend

line_width

numeric Line width for CNV heatmap

tree_height

numeric Relative height of the phylogeny plot

branch_width

numeric Line width in the phylogeny

tip_length

numeric Length of tips in the phylogeny

annot_bar_width

numeric Width of annotation bar

clone_bar_width

numeric Width of clone genotype bar

bar_label_size

numeric Size of sidebar text labels

tvn_line

logical Whether to draw line separating tumor and normal cells

clone_line

logical Whether to display borders for clones in the heatmap

exclude_gap

logical Whether to mark gap regions

root_edge

logical Whether to plot root edge

raster

logical Whether to raster images

show_phylo

logical Whether to display phylogeny on y axis

Value

ggplot panel

Examples

p = plot_phylo_heatmap(
    gtree = phylogeny_example,
    joint_post = joint_post_example,
    segs_consensus = segs_example)

Plot a pseudobulk HMM profile

Description

Plot a pseudobulk HMM profile

Usage

plot_psbulk(
  bulk,
  use_pos = TRUE,
  allele_only = FALSE,
  min_LLR = 5,
  min_depth = 8,
  exp_limit = 2,
  phi_mle = TRUE,
  theta_roll = FALSE,
  dot_size = 0.8,
  dot_alpha = 0.5,
  legend = TRUE,
  exclude_gap = TRUE,
  genome = "hg38",
  text_size = 10,
  raster = FALSE
)

Arguments

bulk

dataframe Pseudobulk profile

use_pos

logical Use marker position instead of index as x coordinate

allele_only

logical Only plot alleles

min_LLR

numeric LLR threshold for event filtering

min_depth

numeric Minimum coverage depth for a SNP to be plotted

exp_limit

numeric Expression logFC axis limit

phi_mle

logical Whether to plot estimates of segmental expression fold change

theta_roll

logical Whether to plot rolling estimates of allele imbalance

dot_size

numeric Size of marker dots

dot_alpha

numeric Transparency of the marker dots

legend

logical Whether to show legend

exclude_gap

logical Whether to mark gap regions and centromeres

genome

character Genome build, either 'hg38' or 'hg19'

text_size

numeric Size of text in the plot

raster

logical Whether to raster images

Value

ggplot Plot of pseudobulk HMM profile

Examples

p = plot_psbulk(bulk_example)

Plot a single-cell phylogeny with mutation history labeled

Description

Plot a single-cell phylogeny with mutation history labeled

Usage

plot_sc_tree(
  gtree,
  label_mut = TRUE,
  label_size = 3,
  dot_size = 2,
  branch_width = 0.5,
  tip = TRUE,
  tip_length = 0.5,
  pal_clone = NULL
)

Arguments

gtree

tbl_graph The single-cell phylogeny

label_mut

logical Whether to label mutations

label_size

numeric Size of mutation labels

dot_size

numeric Size of mutation nodes

branch_width

numeric Width of branches in tree

tip

logical Whether to plot tip point

tip_length

numeric Length of the tips

pal_clone

named vector Clone colors

Value

ggplot A single-cell phylogeny with mutation history labeled

Examples

p = plot_sc_tree(phylogeny_example)

Get the total probability from a region of a normal pdf

Description

Get the total probability from a region of a normal pdf

Usage

pnorm.range.log(lower, upper, mu, sd)
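
The sketch below shows one numerically stable way to obtain the log of the probability mass a normal distribution places between two bounds. It is an illustration in base R, not the package's source; the name pnorm_range_log_sketch is hypothetical, and returning the result on the log scale is an assumption suggested by the ".log" suffix.

```r
# Hypothetical stand-in; assumes the result is wanted on the log scale.
pnorm_range_log_sketch <- function(lower, upper, mu, sd) {
  stopifnot(all(upper > lower))
  lp_upper <- pnorm(upper, mean = mu, sd = sd, log.p = TRUE)
  lp_lower <- pnorm(lower, mean = mu, sd = sd, log.p = TRUE)
  # log(exp(a) - exp(b)) = a + log(1 - exp(b - a)) for a > b
  lp_upper + log1p(-exp(lp_lower - lp_upper))
}

# Mass within about two standard deviations of the mean
prob <- exp(pnorm_range_log_sketch(-1.96, 1.96, mu = 0, sd = 1))
print(prob)
```

Working with log.p = TRUE avoids underflow when both bounds lie far in a tail.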

HMM object for unit tests

Description

HMM object for unit tests

Usage

pre_likelihood_hmm

Format

An object of class list of length 10.


Preprocess allele data

Description

Preprocess allele data

Usage

preprocess_allele(sample, vcf_pu, vcf_phased, AD, DP, barcodes, gtf, gmap)

Arguments

sample

character Sample label

vcf_pu

dataframe Pileup VCF from cellsnp-lite

vcf_phased

dataframe Phased VCF from Eagle2

AD

dgTMatrix Alt allele depth matrix from pileup

DP

dgTMatrix Total allele depth matrix from pileup

barcodes

vector List of barcodes from pileup

gtf

dataframe Transcript GTF

gmap

dataframe Genetic map

Value

dataframe Allele counts by cell


reference expression magnitudes from HCA

Description

reference expression magnitudes from HCA

Usage

ref_hca

Format

An object of class matrix (inherits from array) with 24756 rows and 12 columns.


reference expression counts from HCA

Description

reference expression counts from HCA

Usage

ref_hca_counts

Format

An object of class matrix (inherits from array) with 24857 rows and 12 columns.


Relevel chromosome column

Description

Relevel chromosome column

Usage

relevel_chrom(df)

Arguments

df

dataframe Dataframe with chromosome column


Get unique CNVs from set of segments

Description

Get unique CNVs from set of segments

Usage

resolve_cnvs(segs_all, min_overlap = 0.5, debug = FALSE)

Arguments

segs_all

dataframe CNV segments from multiple samples

min_overlap

numeric scalar Minimum overlap fraction to count two events as overlapping

Value

dataframe Consensus CNV segments
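
To make the min_overlap argument concrete, the sketch below computes the overlap fraction between two segments on the same chromosome. The helper overlap_frac is hypothetical, and both the definition (overlap length divided by each segment's own length) and the rule of merging when either fraction reaches the threshold are assumptions made for illustration; the manual does not spell out the rule the package applies.

```r
# Hypothetical helper: fraction of each segment covered by their overlap.
overlap_frac <- function(start1, end1, start2, end2) {
  len_overlap <- max(min(end1, end2) - max(start1, start2), 0)
  c(frac1 = len_overlap / (end1 - start1),
    frac2 = len_overlap / (end2 - start2))
}

min_overlap <- 0.5
f <- overlap_frac(100, 200, 150, 400)
print(f)

# Assumed rule: treat the events as the same CNV if either fraction
# reaches the threshold.
same_event <- any(f >= min_overlap)
print(same_event)
```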


retest consensus segments on pseudobulks

Description

retest consensus segments on pseudobulks

Usage

retest_bulks(
  bulks,
  segs_consensus = NULL,
  t = 1e-05,
  min_genes = 10,
  gamma = 20,
  nu = 1,
  use_loh = FALSE,
  diploid_chroms = NULL,
  ncores = 1,
  exclude_neu = TRUE,
  min_LLR = 5
)

Arguments

bulks

dataframe Pseudobulk profiles

segs_consensus

dataframe Consensus segments

use_loh

logical Whether to use LOH in the baseline

diploid_chroms

vector User-provided diploid chromosomes

Value

dataframe Retested pseudobulks


retest CNVs in a pseudobulk

Description

retest CNVs in a pseudobulk

Usage

retest_cnv(
  bulk,
  theta_min = 0.08,
  logphi_min = 0.25,
  gamma = 20,
  allele_only = FALSE,
  exclude_neu = TRUE
)

Arguments

bulk

dataframe Pseudobulk profile

gamma

numeric Dispersion parameter for the Beta-Binomial allele model

allele_only

logical Whether to retest using only allele data

Value

a dataframe of segments with CNV posterior information


Check the format of a given file

Description

Check the format of a given file

Usage

return_missing_columns(file, expected_colnames = NULL)

Run multiple HMMs

Description

Run multiple HMMs

Usage

run_group_hmms(
  bulks,
  t = 1e-04,
  gamma = 20,
  alpha = 1e-04,
  min_genes = 10,
  nu = 1,
  common_diploid = TRUE,
  diploid_chroms = NULL,
  allele_only = FALSE,
  retest = TRUE,
  run_hmm = TRUE,
  exclude_neu = TRUE,
  ncores = 1,
  verbose = FALSE,
  debug = FALSE
)

Arguments

bulks

dataframe Pseudobulk profiles

t

numeric Transition probability

gamma

numeric Dispersion parameter for the Beta-Binomial allele model

alpha

numeric P value cut-off to determine segment clusters in find_diploid

common_diploid

logical Whether to find common diploid regions between pseudobulks

diploid_chroms

character vector Known diploid chromosomes to use as baseline

allele_only

logical Whether to use only allele data to run the HMM

retest

logical Whether to retest CNVs

run_hmm

logical Whether to run HMM segmentation or only retest

ncores

integer Number of cores


Run workflow to decompose tumor subclones

Description

Run workflow to decompose tumor subclones

Usage

run_numbat(
  count_mat,
  lambdas_ref,
  df_allele,
  genome = "hg38",
  out_dir = tempdir(),
  max_iter = 2,
  max_nni = 100,
  t = 1e-05,
  gamma = 20,
  min_LLR = 5,
  alpha = 1e-04,
  eps = 1e-05,
  max_entropy = 0.5,
  init_k = 3,
  min_cells = 50,
  tau = 0.3,
  nu = 1,
  max_cost = ncol(count_mat) * tau,
  n_cut = 0,
  min_depth = 0,
  common_diploid = TRUE,
  min_overlap = 0.45,
  ncores = 1,
  ncores_nni = ncores,
  random_init = FALSE,
  segs_loh = NULL,
  call_clonal_loh = FALSE,
  verbose = TRUE,
  diploid_chroms = NULL,
  segs_consensus_fix = NULL,
  use_loh = NULL,
  min_genes = 10,
  skip_nj = FALSE,
  multi_allelic = TRUE,
  p_multi = 1 - alpha,
  plot = TRUE,
  check_convergence = FALSE,
  exclude_neu = TRUE
)

Arguments

count_mat

dgCMatrix Raw count matrix where row names are genes and column names are cells

lambdas_ref

matrix Either a named vector with gene names as names and normalized expression as values, or a matrix where rownames are genes and columns are pseudobulk names

df_allele

dataframe Allele counts per cell, produced by preprocess_allele

genome

character Genome version (hg38, hg19, or mm10)

out_dir

string Output directory

max_iter

integer Maximum number of iterations to run the phylogeny optimization

max_nni

integer Maximum number of iterations to run NNI in the ML phylogeny inference

t

numeric Transition probability

gamma

numeric Dispersion parameter for the Beta-Binomial allele model

min_LLR

numeric Minimum LLR to filter CNVs

alpha

numeric P value cutoff for diploid finding

eps

numeric Convergence threshold for ML tree search

max_entropy

numeric Entropy threshold to filter CNVs

init_k

integer Number of clusters in the initial clustering

min_cells

integer Minimum number of cells to run HMM on

tau

numeric Factor to determine max_cost as a function of the number of cells (0-1)

nu

numeric Phase switch rate

max_cost

numeric Likelihood threshold to collapse internal branches

n_cut

integer Number of cuts on the phylogeny to define subclones

min_depth

integer Minimum allele depth

common_diploid

logical Whether to find common diploid regions in a group of pseudobulks

min_overlap

numeric Minimum CNV overlap threshold

ncores

integer Number of threads to use

ncores_nni

integer Number of threads to use for NNI

random_init

logical Whether to initialize the phylogeny using a random tree (internal use only)

segs_loh

dataframe Segments of clonal LOH to be excluded

call_clonal_loh

logical Whether to call segments with clonal LOH

verbose

logical Verbosity

diploid_chroms

vector Known diploid chromosomes

segs_consensus_fix

dataframe Pre-determined segmentation of consensus CNVs

use_loh

logical Whether to include LOH regions in the expression baseline

min_genes

integer Minimum number of genes to call a segment

skip_nj

logical Whether to skip NJ tree construction and only use UPGMA

multi_allelic

logical Whether to call multi-allelic CNVs

p_multi

numeric P value cutoff for calling multi-allelic CNVs

plot

logical Whether to plot results

check_convergence

logical Whether to terminate iterations based on consensus CNV convergence

exclude_neu

logical Whether to exclude neutral segments from CNV retesting (internal use only)

Value

a status code
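
Two defaults in the usage above are derived from other arguments: max_cost = ncol(count_mat) * tau and p_multi = 1 - alpha. The arithmetic below shows what they evaluate to for a hypothetical dataset of 1000 cells. The cell count is an assumption for illustration, and no numbat function is called.

```r
n_cells <- 1000   # hypothetical value of ncol(count_mat)
tau     <- 0.3    # default tau
alpha   <- 1e-04  # default alpha

max_cost <- n_cells * tau  # default max_cost scales with the number of cells
p_multi  <- 1 - alpha      # default p_multi

print(c(max_cost = max_cost, p_multi = p_multi))
```

Because max_cost grows with the number of cells, the same tau gives a larger absolute threshold for collapsing internal branches on larger datasets.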


example CNV segments dataframe

Description

example CNV segments dataframe

Usage

segs_example

Format

An object of class data.table (inherits from data.frame) with 27 rows and 30 columns.


Calculate Simes' p

Description

Calculate Simes' p

Usage

simes_p(p.vals, n_dim)
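
For reference, the textbook Simes combination of n p-values is the minimum over i of n * p_(i) / i, where p_(i) is the i-th smallest p-value. The base R sketch below implements that formula. The name simes_sketch is hypothetical, and it takes n from the length of the input, whereas the documented function receives n_dim as a separate argument.

```r
# Textbook Simes combined p-value (hypothetical helper, not the package code).
simes_sketch <- function(p_vals) {
  n <- length(p_vals)
  min(n * sort(p_vals) / seq_len(n))
}

p_combined <- simes_sketch(c(0.01, 0.04, 0.03))
print(p_combined)
```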

Simplify the mutational history based on likelihood evidence

Description

Simplify the mutational history based on likelihood evidence

Usage

simplify_history(G, l_matrix, max_cost = 150, n_cut = 0, verbose = TRUE)

Arguments

G

igraph Mutation graph

l_matrix

matrix Mutation placement likelihood matrix (node by mutation)

Value

igraph Mutation graph


filtering, normalization and capping

Description

filtering, normalization and capping

Usage

smooth_expression(count_mat, lambdas_ref, gtf, window = 101, verbose = FALSE)

Arguments

count_mat

dgCMatrix Gene expression counts

lambdas_ref

matrix Reference expression profiles

gtf

dataframe Transcript gtf

Value

dataframe Log(x+1) transformed normalized expression values for single cells


Smooth the segments after HMM decoding

Description

Smooth the segments after HMM decoding

Usage

smooth_segs(bulk, min_genes = 10)

Arguments

bulk

dataframe Pseudobulk profile

min_genes

integer Minimum number of genes to call a segment

Value

dataframe Pseudobulk profile


predict phase switch probability as a function of genetic distance

Description

predict phase switch probability as a function of genetic distance

Usage

switch_prob_cm(d, nu = 1, min_p = 1e-10)

Arguments

d

numeric vector Genetic distance in cM

nu

numeric Phase switch rate

min_p

numeric Minimum phase switch probability

Value

numeric vector Phase switch probability
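
One common way to map genetic distance to a switch probability is a Haldane-type function, p = (1 - exp(-2 * nu * d)) / 2, floored at min_p. The sketch below uses that form to illustrate how d, nu and min_p could interact. The functional form, the handling of units, and the name switch_prob_sketch are assumptions, not a statement of what the package computes.

```r
# Haldane-type sketch (assumed form): probability rises from min_p toward 0.5.
switch_prob_sketch <- function(d, nu = 1, min_p = 1e-10) {
  p <- (1 - exp(-2 * nu * d)) / 2
  pmax(p, min_p)
}

d <- c(0, 0.01, 0.1, 1, 100)
p_switch <- switch_prob_sketch(d)
print(p_switch)
```

Under this form, adjacent SNPs at zero distance receive the floor min_p rather than exactly zero, and widely separated SNPs approach a switch probability of 0.5.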


T-test wrapper, handles error for insufficient observations

Description

T-test wrapper, handles error for insufficient observations

Usage

t_test_pval(x, y)
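
A wrapper of this kind can be sketched in base R as follows. The name t_test_pval_sketch is hypothetical, and returning a p-value of 1 when the test cannot be computed is an assumption for illustration.

```r
# Hypothetical wrapper: fall back to p = 1 when t.test() cannot run.
t_test_pval_sketch <- function(x, y) {
  if (length(x) <= 1 || length(y) <= 1) {
    return(1)
  }
  tryCatch(t.test(x, y)$p.value, error = function(e) 1)
}

p_ok    <- t_test_pval_sketch(c(1, 2, 3, 4, 5), c(6, 7, 8, 9, 10))
p_short <- t_test_pval_sketch(c(1, 2, 3), 5)              # y has one observation
p_const <- t_test_pval_sketch(c(1, 1, 1), c(1, 1, 1))     # t.test() raises an error
print(c(p_ok, p_short, p_const))
```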

test for multi-allelic CNVs

Description

test for multi-allelic CNVs

Usage

test_multi_allelic(bulks, segs_consensus, min_LLR = 5, p_min = 0.999)

Arguments

bulks

dataframe Pseudobulk profiles

segs_consensus

dataframe Consensus segments

min_LLR

numeric CNV LLR threshold to filter events

p_min

numeric Probability threshold to call multi-allelic events

Value

dataframe Consensus segments annotated with multi-allelic events


Rolling estimate of imbalance level theta

Description

Rolling estimate of imbalance level theta

Usage

theta_hat_roll(major_count, minor_count, h)

Arguments

major_count

vector of major allele count

minor_count

vector of minor allele count

h

window size

Value

rolling estimate of theta
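
A rolling imbalance estimate can be sketched as the pooled major-allele fraction in a window around each SNP, minus 0.5. In the base R sketch below, theta_hat_roll_sketch is a hypothetical name, and treating h as the number of SNPs on each side of the center is an assumption about the window definition.

```r
# Hypothetical sketch: pooled major-allele fraction minus 0.5 in a sliding window.
theta_hat_roll_sketch <- function(major_count, minor_count, h) {
  n <- length(major_count)
  sapply(seq_len(n), function(i) {
    idx <- max(i - h, 1):min(i + h, n)   # window truncated at the ends
    major <- sum(major_count[idx])
    minor <- sum(minor_count[idx])
    major / (major + minor) - 0.5
  })
}

major <- c(8, 9, 7, 5, 5, 5)
minor <- c(2, 1, 3, 5, 5, 5)
theta_roll <- theta_hat_roll_sketch(major, minor, h = 1)
print(round(theta_roll, 3))
```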


Estimate of imbalance level theta in a segment

Description

Estimate of imbalance level theta in a segment

Usage

theta_hat_seg(major_count, minor_count)

Arguments

major_count

vector of major allele count

minor_count

vector of minor allele count

Value

estimate of theta
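
A simple point estimate of the imbalance level in a segment is the pooled major-allele fraction minus 0.5, so a balanced segment gives a value near 0. The sketch below shows that calculation in base R. The name theta_hat_seg_sketch is hypothetical, and the package may use a different estimator.

```r
# Hypothetical sketch: pooled major-allele fraction minus 0.5.
theta_hat_seg_sketch <- function(major_count, minor_count) {
  major_total <- sum(major_count)
  minor_total <- sum(minor_count)
  major_total / (major_total + minor_total) - 0.5
}

theta_imbalanced <- theta_hat_seg_sketch(c(8, 9, 7), c(2, 1, 3))  # 24 of 30 reads on the major haplotype
theta_balanced   <- theta_hat_seg_sketch(c(5, 5, 5), c(5, 5, 5))
print(c(theta_imbalanced, theta_balanced))
```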


Annotate the direct upstream or downstream node on the edges

Description

Annotate the direct upstream or downstream node on the edges

Usage

transfer_links(G)

Arguments

G

igraph Mutation graph

Value

igraph Mutation graph


UPGMA and WPGMA clustering

Description

UPGMA and WPGMA clustering

Usage

upgma(D, method = "average", ...)

Arguments

D

A distance matrix.

method

The agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward", "single", "complete", "average", "mcquitty", "median" or "centroid". The default is "average".

...

Further arguments passed to or from other methods.
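
The method values listed above match the agglomeration methods of stats::hclust, where "average" corresponds to UPGMA and "mcquitty" to WPGMA. The base R sketch below shows both on a toy distance matrix. It calls hclust directly, and assumes (without confirmation from this manual) that upgma wraps similar clustering and returns a tree object.

```r
# Four points forming two well-separated pairs
pts <- matrix(c(0, 0,
                0, 1,
                5, 5,
                5, 6),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("a", "b", "c", "d"), NULL))
D <- dist(pts)

hc_upgma <- hclust(D, method = "average")   # UPGMA
hc_wpgma <- hclust(D, method = "mcquitty")  # WPGMA

clusters <- cutree(hc_upgma, k = 2)
print(clusters)
```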


example VCF header

Description

example VCF header

Usage

vcf_meta

Format

An object of class character of length 65.


Viterbi for clonal LOH detection

Description

Viterbi for clonal LOH detection

Usage

viterbi_loh(hmm, ...)

Arguments

hmm

HMM object; expect variables x (SNP count), snp_sig (snp rate standard deviation), pm (snp density for ref and loh states), pn (gene lengths), d (total expression depth), y (expression count), lambda_star (reference expression rate), mu (global expression mean), sig (global expression standard deviation), Pi (transition prob matrix), delta (prior for each state), phi (expression fold change for each state)
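
For orientation, the sketch below is a generic log-space Viterbi decoder for a discrete-state HMM. It reuses the names Pi (transition matrix) and delta (state prior) from the argument description, but it takes precomputed emission log-likelihoods instead of reproducing the package's SNP and expression emission model. The function viterbi_sketch and the toy inputs are hypothetical.

```r
# Generic Viterbi in log space (hypothetical helper, not the package code).
# log_emit: n x k matrix of emission log-likelihoods (observations x states)
# log_Pi:   k x k log transition matrix (rows = from, columns = to)
# log_delta: length-k log prior over states
viterbi_sketch <- function(log_emit, log_Pi, log_delta) {
  n <- nrow(log_emit)
  k <- ncol(log_emit)
  score <- matrix(-Inf, n, k)
  back  <- matrix(0L, n, k)
  score[1, ] <- log_delta + log_emit[1, ]
  if (n > 1) {
    for (i in 2:n) {
      for (j in seq_len(k)) {
        cand <- score[i - 1, ] + log_Pi[, j]
        back[i, j]  <- which.max(cand)
        score[i, j] <- max(cand) + log_emit[i, j]
      }
    }
  }
  # Trace back the most likely state path
  path <- integer(n)
  path[n] <- which.max(score[n, ])
  if (n > 1) {
    for (i in (n - 1):1) path[i] <- back[i + 1, path[i + 1]]
  }
  path
}

# Toy input: state 1 = neutral, state 2 = LOH (labels are illustrative)
emit <- rbind(matrix(rep(c(0.9, 0.1), 3), ncol = 2, byrow = TRUE),
              matrix(rep(c(0.1, 0.9), 5), ncol = 2, byrow = TRUE))
Pi    <- matrix(c(0.99, 0.01,
                  0.01, 0.99), 2, 2, byrow = TRUE)
delta <- c(0.5, 0.5)

path <- viterbi_sketch(log(emit), log(Pi), log(delta))
print(path)
```

With a small transition probability the decoder switches state only when enough consecutive observations support the change, which is the behavior that yields contiguous segments.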