Title: | Tree-Based Discriminant Analysis |
Version: | 0.0.5 |
Description: | Performs sparse discriminant analysis on a combination of node and leaf predictors when the predictor variables are structured according to a tree, as described in Fukuyama et al. (2017) <doi:10.1371/journal.pcbi.1005706>. |
Depends: | R (≥ 3.4.0) |
Imports: | sparseLDA (≥ 0.1.9), Matrix (≥ 1.2.10), mvtnorm (≥ 1.0.6), reshape2 (≥ 1.4.2), gtable (≥ 0.2.0), phyloseq (≥ 1.22.3), ggplot2 (≥ 2.2.1), ape (≥ 5.1), grid, stats |
Suggests: | adaptiveGPCA (≥ 0.1), knitr (≥ 1.16), testthat (≥ 2.0.0), markdown, rmarkdown |
VignetteBuilder: | knitr |
License: | GPL-2 |
URL: | https://github.com/jfukuyama/treeda |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.1.0 |
NeedsCompilation: | no |
Packaged: | 2021-05-14 20:15:48 UTC; jfukuyam |
Author: | Julia Fukuyama [aut, cre] |
Maintainer: | Julia Fukuyama <julia.fukuyama@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2021-05-14 21:30:03 UTC |
Tree-based discriminant analysis
Description
A package for performing sparse, tree-based discriminant analysis.
Details
This package contains functions for building sparse, tree-structured models for classification. The method is based on the idea that when our predictors are structured according to a tree, we can create an expanded feature space containing both the original leaf predictors and node predictors, which correspond to sums or averages across the leaves descending from them. Without some sort of regularization this problem would be unidentifiable, but the regularization provided by sparse discriminant analysis yields stable solutions.
The package fits a sparse discriminant model in the expanded feature space and translates the results back to the leaf space, so that the interpretation can be purely in terms of the original predictors. The package also includes functions to perform cross validation to pick the sparsity level and plotting commands to visualize the tree and the fitted coefficient vectors.
The main function in this package is treeda, which fits a sparse tree-based discriminant model. Additional functions provided are treedacv, which performs cross-validation to determine the correct sparsity level, and functions to plot the resulting coefficient vectors along the tree (plot_coefficients).
Author(s)
Maintainer: Julia Fukuyama julia.fukuyama@gmail.com
See Also
Useful links: https://github.com/jfukuyama/treeda
Check predictors
Description
Checks whether the predictors are consistent with the tree structure.
Usage
checkPredictorsAndTree(predictors, tree)
Coefficients from treeda fit
Description
Returns the coefficients from a treeda fit either in terms of the leaves only or in terms of the nodes and leaves.
Usage
## S3 method for class 'treeda'
coef(object, type = c("leaves", "nodes"), ...)
Arguments
object |
An object of class treeda. |
type |
Should the coefficients be in the leaf space or the node space? |
... |
Not used. |
Value
A Matrix object containing the coefficients.
Examples
data(treeda_example)
out.treeda = treeda(response = treeda_example$response,
                    predictors = treeda_example$predictors,
                    tree = treeda_example$tree,
                    p = 1)
coef(out.treeda, type = "leaves")
coef(out.treeda, type = "nodes")
Method for combining two ggplots
Description
This method takes a ggplot of some data along the tips of the tree and a ggplot of a tree and combines them. It assumes that you are putting the tree on top and that the x axis for the plot has the leaves in the correct position (this can be found using the function get_leaf_position).
Usage
combine_plot_and_tree(plot, tree.plot, tree.height = 5, print = TRUE)
Arguments
plot |
A plot of data about the leaves with the x axis corresponding to leaves. |
tree.plot |
A plot of the tree. |
tree.height |
The relative amount of space in the plot the tree should take up. |
print |
If true, the function will print the combined plot to a graphics device, otherwise it will just return the gtable object without printing. |
Value
Returns a gtable object.
Makes a hash table with nodes and their children
Description
Takes the edge matrix from a phylo-class object and turns it into a list where the entries are nodes and the elements are vectors with the children of the nodes.
Usage
edgesToChildren(edges)
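As a rough illustration of the idea (not the package's internal code), the base R sketch below builds such a list from a hand-written edge matrix in the ape convention, where column 1 holds parent nodes and column 2 holds child nodes. The helper name edges_to_children_sketch and the toy tree are assumptions made for this example.

```r
# Toy edge matrix for the tree ((t1,t2),(t3,t4)): tips are 1-4, the root is 5.
edges <- matrix(c(5, 6,
                  6, 1,
                  6, 2,
                  5, 7,
                  7, 3,
                  7, 4), ncol = 2, byrow = TRUE)

# Hypothetical helper: group the child ids (column 2) by parent id (column 1).
edges_to_children_sketch <- function(edges) {
  split(edges[, 2], edges[, 1])
}

children <- edges_to_children_sketch(edges)
children[["6"]]  # the children of internal node 6
```

The list is keyed by the parent node id as a character string, so lookups use children[["6"]] rather than children[[6]].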
Expand the background of a gtable.
Description
Expand the background of a gtable.
Usage
expand_background(gtable)
Arguments
gtable |
A gtable object whose background needs to be expanded to fill the whole space. |
Value
A gtable object with a bigger background.
Make branch length vector
Description
Gets the branch lengths of the tree, with order the same as the columns in makeDescendantMatrix.
Usage
getBranchLengths(tree)
Arguments
tree |
A tree object of class phylo |
Value
A vector of length ntips + nnodes, with the ith element giving the length of the branch above node i.
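A minimal base R sketch of this indexing, assuming an ape-style edge matrix and edge length vector. The toy values are invented, and assigning 0 to the root (which has no branch above it) is an assumption of this example, not documented behavior of the package.

```r
# Toy tree ((t1,t2),(t3,t4)): tips are 1-4, root is 5, internal nodes are 6 and 7.
edges <- matrix(c(5, 6,
                  6, 1,
                  6, 2,
                  5, 7,
                  7, 3,
                  7, 4), ncol = 2, byrow = TRUE)
edge_length <- c(0.5, 1.0, 1.2, 0.3, 0.8, 0.9)  # one length per row of edges

n_total <- max(edges)               # ntips + nnodes
branch_lengths <- rep(0, n_total)   # assumption: root gets 0
# Each edge's length is stored at the index of its child node,
# i.e. the branch directly above that node.
branch_lengths[edges[, 2]] <- edge_length
branch_lengths
```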
Get leaf positions from a tree layout
Description
Takes a tree, returns a vector with names describing the leaves and entries giving the position of that leaf in the tree layout.
Usage
get_leaf_position(tree, ladderize)
Arguments
tree |
A tree of class phylo. |
ladderize |
FALSE for a non-ladderized layout, TRUE or "right" for a ladderized layout, "left" for a layout ladderized the other way. |
Compute properties of the classes
Description
For each class, computes the prior probabilities, means, and variances for that class.
Usage
makeClassProperties(response, projections)
Arguments
response |
A vector containing the response for each observation. |
projections |
A matrix giving the projections of each observation onto the discriminating axes. |
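The computation described here can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper name class_properties_sketch, the returned list layout, and the use of per-axis variances (rather than a full covariance matrix) are choices made for this example and may differ from what makeClassProperties returns.

```r
# Toy data: five observations, two classes, one discriminating axis.
response <- factor(c("a", "a", "a", "b", "b"))
projections <- matrix(c(1, 2, 3, 7, 9), ncol = 1)

# Hypothetical helper: per-class prior, mean, and variance of the projections.
class_properties_sketch <- function(response, projections) {
  response <- as.factor(response)
  lapply(split(seq_along(response), response), function(idx) {
    proj <- projections[idx, , drop = FALSE]
    list(prior = length(idx) / length(response),  # class frequency
         mean  = colMeans(proj),                  # mean on each axis
         var   = apply(proj, 2, var))             # variance on each axis
  })
}

props <- class_properties_sketch(response, projections)
props$a
```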
Make descendant matrix
Description
Make a matrix describing the ancestry structure of a tree. Element (i,j) indicates whether leaf i is a descendant of node j.
Usage
makeDescendantMatrix(tree)
Arguments
tree |
A tree object of class phylo |
Value
A matrix describing the ancestry structure of a tree.
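To make the (i,j) convention concrete, here is a small base R sketch that builds such a matrix by walking from each leaf up to the root. It assumes an ape-style edge matrix with tips numbered first. The helper name descendant_matrix_sketch is hypothetical, and treating each leaf as a descendant of itself and of the root is an assumption of this example.

```r
# Toy tree ((t1,t2),(t3,t4)): tips are 1-4, root is 5, internal nodes are 6 and 7.
edges <- matrix(c(5, 6,
                  6, 1,
                  6, 2,
                  5, 7,
                  7, 3,
                  7, 4), ncol = 2, byrow = TRUE)

# Hypothetical helper: rows are leaves, columns are leaves followed by nodes.
descendant_matrix_sketch <- function(edges, n_tips) {
  n_total <- max(edges)
  A <- matrix(0, nrow = n_tips, ncol = n_total)
  parent <- rep(NA_real_, n_total)   # parent[i] is the parent of node i
  parent[edges[, 2]] <- edges[, 1]   # the root keeps NA
  for (leaf in seq_len(n_tips)) {
    node <- leaf
    while (!is.na(node)) {           # climb until we step past the root
      A[leaf, node] <- 1
      node <- parent[node]
    }
  }
  A
}

A <- descendant_matrix_sketch(edges, n_tips = 4)
A
```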
Make leaf coefficients
Description
For a set of coefficients defined on a matrix of (potentially centered and scaled) leaf and node predictors, find the equivalent set of coefficients on just the leaves.
Usage
makeLeafCoefficients(sda.out, descendantMatrix, means, sds)
Arguments
sda.out |
A fitted sda object |
descendantMatrix |
A matrix describing the tree which was used, element (i,j) indicates whether leaf i is a descendant of node j. |
means |
If the original predictor matrix was centered, the means of the original predictor matrix, otherwise NULL. |
sds |
If the original predictor matrix was scaled, the sds of the original predictor matrix, otherwise NULL. |
Value
A list giving the coefficients on the leaves for each of the discriminating axes and the intercepts for each of the discriminating axes.
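The algebra behind this translation can be sketched in base R. The example assumes that centering and scaling were applied column-wise to the expanded leaf + node matrix, and that the columns are ordered leaves first, then nodes; both are assumptions of this sketch, and the variable names are illustrative rather than the package's internals.

```r
# Descendant matrix for the toy tree ((t1,t2),(t3,t4)):
# rows are leaves 1-4, columns are leaves 1-4 followed by nodes 5-7.
A <- cbind(diag(4), c(1, 1, 1, 1), c(1, 1, 0, 0), c(0, 0, 1, 1))

X <- matrix(c(1, 2, 3, 4,
              0, 1, 0, 5,
              2, 2, 1, 1), nrow = 3, byrow = TRUE)  # leaf predictors
Z <- X %*% A                                         # leaf + node predictors
means <- colMeans(Z)
sds <- apply(Z, 2, sd)
Zs <- scale(Z, center = means, scale = sds)          # standardized predictors

beta <- c(0, 0, 0.5, 0, 0, 2, 0)   # coefficients on the standardized columns

# Undo the scaling, then push node coefficients down to their leaves.
leaf.coef <- as.vector(A %*% (beta / sds))
intercept <- -sum(means * beta / sds)

# Both parameterizations give the same scores.
all.equal(as.vector(Zs %*% beta), as.vector(X %*% leaf.coef + intercept))
```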
Make a matrix with predictors for each leaf and node
Description
Make a matrix with one predictor for each leaf and node in the tree, where the node predictors are the sum of the leaf predictors descending from them.
Usage
makeNodeAndLeafPredictors(leafPredictors, tree)
Arguments
leafPredictors |
A predictor matrix for the leaves: rows are samples, columns are leaves. |
tree |
A phylogenetic tree describing the relationships between the species/leaves. |
Value
A predictor matrix for leaves and nodes together: rows are samples, columns are leaf/node predictors.
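Given a descendant matrix, the expanded predictor matrix is a single matrix product. The sketch below writes the descendant matrix for a four-leaf toy tree by hand. The column ordering (leaves first, then nodes) and the inclusion of the root are assumptions of this example.

```r
# Descendant matrix for the toy tree ((t1,t2),(t3,t4)):
# rows are leaves 1-4, columns are leaves 1-4 followed by nodes 5-7.
A <- cbind(diag(4),
           c(1, 1, 1, 1),   # node 5 (root): all four leaves
           c(1, 1, 0, 0),   # node 6: leaves 1 and 2
           c(0, 0, 1, 1))   # node 7: leaves 3 and 4

# Two samples (rows) measured on four leaves (columns).
leafPredictors <- matrix(c(1, 2, 3, 4,
                           0, 1, 0, 5), nrow = 2, byrow = TRUE)

# Each node column is the sum of the leaf columns descending from that node.
expanded <- leafPredictors %*% A
expanded
```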
Make response matrix
Description
Create a dummy variable matrix for the response.
Usage
makeResponseMatrix(response, class.names = NULL)
Arguments
response |
A factor or character vector containing the classes. |
class.names |
A character vector giving the possible levels of the factor. If NULL, it will be generated from the levels of response. |
Value
A dummy variable matrix with column names giving the class names.
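A dummy variable matrix of this kind can be sketched in base R as follows. The helper name response_matrix_sketch is hypothetical and the 0/1 coding is an assumption; the package function may construct the matrix differently.

```r
# Hypothetical helper: one row per observation, one 0/1 column per class.
response_matrix_sketch <- function(response, class.names = NULL) {
  if (is.null(class.names)) {
    class.names <- levels(as.factor(response))
  }
  # Compare every observation with every class name, then convert to 0/1.
  Y <- outer(as.character(response), class.names, "==") * 1
  colnames(Y) <- class.names
  Y
}

Y <- response_matrix_sketch(c("a", "b", "a"))
Y
```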
Node coefficients to leaf coefficients
Description
General-purpose function for going from a coefficient vector on the nodes to a coefficient vector on the leaves.
Usage
nodeToLeafCoefficients(coef.vec, tree)
Arguments
coef.vec |
A vector containing coefficients on internal nodes plus leaves. |
tree |
The phylogenetic tree. |
Value
A vector containing coefficients on the leaves.
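The reason this translation is possible is that a node predictor is a sum of leaf predictors, so a coefficient on a node can be added to the coefficient of every leaf below it. A minimal base R sketch, assuming the coefficient vector is ordered leaves first and then nodes (an assumption of this example) and using a hand-written descendant matrix:

```r
# Descendant matrix for the toy tree ((t1,t2),(t3,t4)):
# rows are leaves 1-4, columns are leaves 1-4 followed by nodes 5-7.
A <- cbind(diag(4), c(1, 1, 1, 1), c(1, 1, 0, 0), c(0, 0, 1, 1))

# Sparse coefficients: 0.5 on leaf 3 and 2 on node 6 (parent of leaves 1 and 2).
coef.vec <- c(0, 0, 0.5, 0, 0, 2, 0)

# Each leaf collects its own coefficient plus those of all its ancestors.
leaf.coef <- as.vector(A %*% coef.vec)

# The score of a sample is the same in either parameterization.
x <- c(1, 2, 3, 4)
score_expanded <- sum((x %*% A) * coef.vec)
score_leaf <- sum(x * leaf.coef)
c(score_expanded, score_leaf)
```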
Plot a treedacv object
Description
Plots the cross-validation error with standard error bars.
Usage
## S3 method for class 'treedacv'
plot(x, ...)
Arguments
x |
An object of class treedacv. |
... |
Not used. |
Examples
data(treeda_example)
out.treedacv = treedacv(response = treeda_example$response,
                        predictors = treeda_example$predictors,
                        tree = treeda_example$tree,
                        pvec = 1:10)
plot(out.treedacv)
Plot the discriminating axes from treeda
Description
Plots the leaf coefficients for the discriminating axes in a fitted treeda model aligned under the tree.
Usage
plot_coefficients(
  out.treeda,
  remove.bl = TRUE,
  ladderize = TRUE,
  tree.height = 2
)
Arguments
out.treeda |
The object resulting from a call to treeda. |
remove.bl |
A logical, |
ladderize |
Layout parameter for the tree. |
tree.height |
The height of the tree relative to the height of the plot below. |
Value
A plot of the tree and the coefficients.
Examples
data(treeda_example)
out.treeda = treeda(response = treeda_example$response,
                    predictors = treeda_example$predictors,
                    tree = treeda_example$tree,
                    p = 1)
plot_coefficients(out.treeda)
Predict using new data
Description
Given a fitted treeda model, get the predicted classes and projections onto the discriminating axes for new data.
Usage
## S3 method for class 'treeda'
predict(object, newdata, newresponse = NULL, check.consist = TRUE, ...)
Arguments
object |
Output from treeda. |
newdata |
New predictor matrix in the same format as the |
newresponse |
New response vector, not required. |
check.consist |
Check the consistency between the tree and predictor matrix? |
... |
Not used. |
Value
A list containing the projections of the new data onto the discriminating axes (projections), the predicted classes (classes), and the rss (rss, only included if the ground truth for the responses is available).
Examples
data(treeda_example)
out.treeda = treeda(response = treeda_example$response,
                    predictors = treeda_example$predictors,
                    tree = treeda_example$tree,
                    p = 1)
## Here we are predicting on the training data, in general this
## would be done on a held out test set
preds = predict(out.treeda, newdata = treeda_example$predictors,
                newresponse = treeda_example$response)
## make a confusion matrix
table(preds$classes, treeda_example$response)
Print a treeda object
Description
Print a treeda object
Usage
## S3 method for class 'treeda'
print(x, ...)
Arguments
x |
|
... |
Not used. |
Print treedacv objects
Description
Print treedacv objects
Usage
## S3 method for class 'treedacv'
print(x, ...)
Arguments
x |
|
... |
Not used. |
Tree-based sparse discriminant analysis
Description
Performs tree-structured sparse discriminant analysis using an augmented predictor matrix with additional predictors corresponding to the nodes and then translating the parameters back in terms of only the leaves.
Usage
treeda(
  response,
  predictors,
  tree,
  p,
  k = nclasses - 1,
  center = TRUE,
  scale = TRUE,
  class.names = NULL,
  check.consist = TRUE,
  A = NULL,
  ...
)
Arguments
response |
A factor or character vector giving the class to be predicted. |
predictors |
A matrix of predictor variables corresponding to the leaves of the tree and in the same order as the leaves of the tree. |
tree |
A tree of class phylo. |
p |
The number of predictors to use. |
k |
The number of components to use. |
center |
Center the predictor variables? |
scale |
Scale the predictor variables? |
class.names |
Optional argument giving the class names. |
check.consist |
Check consistency of the predictor matrix and the tree. |
A |
A matrix describing the tree structure. If it has been computed before it can be passed in here and will not be recomputed. |
... |
Additional arguments to be passed to sda |
Value
An object of class treeda. Contains the coefficients in the original predictor space (leafCoefficients), the number of predictors used in the node + leaf space (nPredictors), number of leaf predictors used (nLeafPredictors), the projections of the samples onto the discriminating axes (projections), and the sparse discriminant analysis object that was used in the fit (sda).
Examples
data(treeda_example)
out.treeda = treeda(response = treeda_example$response,
                    predictors = treeda_example$predictors,
                    tree = treeda_example$tree,
                    p = 1)
out.treeda
Example dataset
Description
A small example dataset with three components, stored as a list with a vector containing the classes (response), a matrix containing the predictor variables (predictors), and a tree describing the relationships between the predictor variables (tree). The dataset consists of 50 samples divided into two classes and 100 taxa/predictor variables, related to each other by a random tree (generated with ape::rtree). A set of 42 taxa descending from one internal node are all over-represented in one class and under-represented in the other. The predictors element in the list contains real numbers, not counts, and is supposed to reflect normalized taxon abundances (e.g., normalization using the variance-stabilizing transformation in DESeq2).
Format
A list containing response variables, predictor variables, and a tree describing the relationship between the predictor variables.
treeda cross validation
Description
Performs cross-validation of a treeda fit.
Usage
treedacv(
  response,
  predictors,
  tree,
  folds = 5,
  pvec = 1:tree$Nnode,
  k = nclasses - 1,
  center = TRUE,
  scale = TRUE,
  class.names = NULL,
  ...
)
Arguments
response |
The classes to be predicted. |
predictors |
A matrix of predictors corresponding to the tips of the tree. |
tree |
A tree object of class phylo. |
folds |
Either a single number corresponding to the number of folds of cross-validation to perform or a vector of integers ranging from 1 to the number of folds desired giving the partition of the dataset. |
pvec |
The values of p to use. |
k |
The number of discriminating axes to keep. |
center |
Center the predictors? |
scale |
Scale the predictors? |
class.names |
A vector giving the names of the classes. |
... |
Additional arguments to be passed to |
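When folds is given as a vector rather than a single number, it assigns each sample to a fold. A minimal base R sketch of building a balanced, randomly ordered partition, assuming 50 samples and 5 folds (the sample size, fold count, and seed are arbitrary choices for illustration):

```r
set.seed(1)     # arbitrary seed, only so the example is reproducible
n <- 50         # assumed number of samples (rows of the predictor matrix)
n_folds <- 5

# Repeat the labels 1..n_folds until there are n of them, then shuffle,
# so each sample gets exactly one fold label and the folds are balanced.
folds <- sample(rep(seq_len(n_folds), length.out = n))
table(folds)
```

A vector like this could then be passed as folds = folds in the call to treedacv, presumably with one entry per row of predictors.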
Value
A list with the value of p with minimum cv error (p.min), the minimum value of p within 1 se of the minimum cv error (p.1se), and a data frame containing the loss for each fold, mean loss, and standard error of the loss for each value of p (loss.df).
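The selection of p.min and p.1se can be sketched in base R on a made-up loss table. The column names p, mean.loss, and se.loss and all the numbers are assumptions for illustration and need not match the columns of loss.df.

```r
# Invented cross-validation summary: one row per candidate value of p.
loss.df <- data.frame(p         = 1:5,
                      mean.loss = c(0.40, 0.22, 0.20, 0.21, 0.25),
                      se.loss   = c(0.05, 0.03, 0.03, 0.03, 0.04))

# p.min: the value of p with the smallest mean loss.
best <- which.min(loss.df$mean.loss)
p.min <- loss.df$p[best]

# p.1se: the smallest p whose mean loss is within one standard error
# of the minimum mean loss.
threshold <- loss.df$mean.loss[best] + loss.df$se.loss[best]
p.1se <- min(loss.df$p[loss.df$mean.loss <= threshold])

c(p.min = p.min, p.1se = p.1se)
```

Choosing p.1se rather than p.min trades a small increase in estimated error for a sparser model.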
Examples
data(treeda_example)
out.treedacv = treedacv(response = treeda_example$response,
                        predictors = treeda_example$predictors,
                        tree = treeda_example$tree,
                        pvec = 1:10)
out.treedacv