Type: | Package |
Title: | Linear Optimal Low-Rank Projection |
Version: | 2.1 |
Date: | 2020-06-20 |
Maintainer: | Eric Bridgeford <ericwb95@gmail.com> |
Description: | Supervised learning techniques designed for settings where the dimensionality exceeds the sample size tend to overfit as the dimensionality of the data increases. To remedy this high-dimensionality, low-sample-size (HDLSS) situation, we attempt to learn a lower-dimensional representation of the data before learning a classifier. That is, we project the data to a more manageable dimensionality, where standard classification or clustering techniques can be applied with fewer dimensions to overfit. A number of previous works have focused on how to strategically reduce dimensionality in the unsupervised case, yet in the supervised HDLSS regime, few works have attempted to devise dimensionality reduction techniques that leverage the labels associated with the data. In this package and the associated manuscript Vogelstein et al. (2017) <doi:10.48550/arXiv.1709.01233>, we provide several methods for feature extraction, some utilizing labels and some not, along with easily extensible utilities to simplify cross-validative efforts to identify the best feature extraction method. Additionally, we include a series of adaptable benchmark simulations to serve as a standard for future investigative efforts into supervised HDLSS. Finally, we produce a comprehensive comparison of the included algorithms across a range of benchmark simulations and real data applications. |
Depends: | R (≥ 3.4.0) |
License: | GPL-2 |
URL: | https://github.com/neurodata/lol |
Imports: | ggplot2, abind, MASS, irlba, pls, robust, robustbase |
Encoding: | UTF-8 |
LazyData: | true |
VignetteBuilder: | knitr |
RoxygenNote: | 7.1.0 |
Suggests: | knitr, rmarkdown, parallel, randomForest, latex2exp, testthat, covr |
NeedsCompilation: | no |
Packaged: | 2020-06-25 18:56:31 UTC; eric |
Author: | Eric Bridgeford [aut, cre], Minh Tang [ctb], Jason Yim [ctb], Joshua Vogelstein [ths] |
Repository: | CRAN |
Date/Publication: | 2020-06-26 22:30:03 UTC |
Nearest Centroid Classifier Training
Description
A function that trains a classifier based on the nearest centroid.
Usage
lol.classify.nearestCentroid(X, Y, ...)
Arguments
X |
|
Y |
|
... |
optional args. |
Value
A list of class nearestCentroid, with the following attributes:
centroids |
|
ylabs |
|
priors |
|
Details
For more details see the help vignette:
vignette("centroid", package = "lolR")
Author(s)
Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
model <- lol.classify.nearestCentroid(X, Y)
Random Classifier Utility
Description
A function for random classifiers.
Usage
lol.classify.rand(X, Y, ...)
Arguments
X |
|
Y |
|
... |
optional args. |
Value
A structure, with the following attributes:
ylabs |
|
priors |
|
Author(s)
Eric Bridgeford
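Examples
This section of the manual does not include an example, so the following is a minimal sketch, assuming lol.classify.rand follows the same (X, Y) fitting interface as the other classifiers in this package:
```r
library(lolR)

# simulate 200 examples of 30 dimensions, as in the other examples
data <- lol.sims.rtrunk(n=200, d=30)
X <- data$X; Y <- data$Y

# fit the random classifier; per the Value section, the result carries
# the unique class labels (ylabs) and their prior probabilities (priors)
model <- lol.classify.rand(X, Y)
```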
Random Chance Classifier Training
Description
A function that always predicts the most frequently occurring class in the dataset. Functionality consistent with the standard R prediction interface so that one can compute the "chance" accuracy with minimal modification of other classification scripts.
Usage
lol.classify.randomChance(X, Y, ...)
Arguments
X |
|
Y |
|
... |
optional args. |
Value
A list of class randomGuess, with the following attributes:
ylabs |
|
priors |
|
Author(s)
Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
model <- lol.classify.randomChance(X, Y)
Random Guessing Classifier Training
Description
A function that predicts by randomly guessing based on the pmf of the class priors. Functionality consistent with the standard R prediction interface so that one can compute the "guess" accuracy with minimal modification of other classification scripts.
Usage
lol.classify.randomGuess(X, Y, ...)
Arguments
X |
|
Y |
|
... |
optional args. |
Value
A list of class randomGuess, with the following attributes:
ylabs |
|
priors |
|
Author(s)
Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
model <- lol.classify.randomGuess(X, Y)
Embedding
Description
A function that embeds points in high dimensions to a lower dimensionality.
Usage
lol.embed(X, A, ...)
Arguments
X |
|
A |
|
... |
optional args. |
Value
an array [n, r], the original n points embedded into r dimensions.
Author(s)
Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
model <- lol.project.lol(X=X, Y=Y, r=5) # use lol to project into 5 dimensions
Xr <- lol.embed(X, model$A)
Bayes Optimal
Description
A function for recovering the Bayes Optimal Projection, which optimizes Bayes classification.
Usage
lol.project.bayes_optimal(X, Y, mus, Sigmas, priors, ...)
Arguments
X |
|
Y |
|
mus |
|
Sigmas |
|
priors |
|
... |
optional args. |
Value
A list of class embedding containing the following:
A |
|
d |
the eigen values associated with the eigendecomposition. |
ylabs |
|
centroids |
|
priors |
|
Xr |
|
cr |
|
Author(s)
Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
# obtain bayes-optimal projection of the data
model <- lol.project.bayes_optimal(X=X, Y=Y, mus=data$mus,
                                   Sigmas=data$Sigmas, priors=data$priors)
Data Piling
Description
A function for implementing the Maximal Data Piling (MDP) Algorithm.
Usage
lol.project.dp(X, Y, ...)
Arguments
X |
|
Y |
|
... |
optional args. |
Value
A list containing the following:
A |
|
ylabs |
|
centroids |
|
priors |
|
Xr |
|
cr |
|
Details
For more details see the help vignette:
vignette("dp", package = "lolR")
Author(s)
Minh Tang and Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
model <- lol.project.dp(X=X, Y=Y) # use mdp to project into maximal data piling
Linear Optimal Low-Rank Projection (LOL)
Description
A function for implementing the Linear Optimal Low-Rank Projection (LOL) Algorithm. This algorithm allows users to find an optimal projection from 'd' to 'r' dimensions, where 'r << d', by combining information from the first and second moments of the data.
Usage
lol.project.lol(
X,
Y,
r,
second.moment.xfm = FALSE,
second.moment.xfm.opts = list(),
first.moment = "delta",
second.moment = "linear",
orthogonalize = FALSE,
robust = FALSE,
...
)
Arguments
X |
|
Y |
|
r |
the rank of the projection. Note that |
second.moment.xfm |
whether to use extraneous options in estimation of the second moment component. The transforms specified should be a numbered list of transforms you wish to apply, and will be applied in accordance with |
second.moment.xfm.opts |
optional arguments to pass to the |
first.moment |
the function to capture the first moment. Defaults to
|
second.moment |
the function to capture the second moment. Defaults to
|
orthogonalize |
whether to orthogonalize the projection matrix. Defaults to |
robust |
whether to perform PCA on a robust estimate of the covariance matrix or not. Defaults to |
... |
trailing args. |
Value
A list containing the following:
A |
|
ylabs |
|
centroids |
|
priors |
|
Xr |
|
cr |
|
second.moment |
the method used to estimate the second moment. |
first.moment |
the method used to estimate the first moment. |
Details
For more details see the help vignette:
vignette("lol", package = "lolR")
Author(s)
Eric Bridgeford
References
Joshua T. Vogelstein, et al. "Supervised Dimensionality Reduction for Big Data." arXiv (2020).
Examples
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
model <- lol.project.lol(X=X, Y=Y, r=5) # use lol to project into 5 dimensions
# use lol to project into 5 dimensions, and produce an orthogonal basis for the projection matrix
model <- lol.project.lol(X=X, Y=Y, r=5, orthogonalize=TRUE)
# use LRQDA to estimate the second moment by performing PCA on each class
model <- lol.project.lol(X=X, Y=Y, r=5, second.moment='quadratic')
# use PLS to estimate the second moment
model <- lol.project.lol(X=X, Y=Y, r=5, second.moment='pls')
# use LRLDA to estimate the second moment, and apply a unit transformation
# (according to scale function) with no centering
model <- lol.project.lol(X=X, Y=Y, r=5, second.moment='linear', second.moment.xfm='unit',
second.moment.xfm.opts=list(center=FALSE))
Low-rank Canonical Correlation Analysis (LR-CCA)
Description
A function for implementing the Low-rank Canonical Correlation Analysis (LR-CCA) Algorithm.
Usage
lol.project.lrcca(X, Y, r, ...)
Arguments
X |
[n, d] the data with |
Y |
[n] the labels of the samples with |
r |
the rank of the projection. |
... |
trailing args. |
Value
A list containing the following:
A |
|
d |
the eigen values associated with the eigendecomposition. |
ylabs |
|
centroids |
|
priors |
|
Xr |
|
cr |
|
Details
For more details see the help vignette:
vignette("lrcca", package = "lolR")
Author(s)
Eric Bridgeford and Minh Tang
Examples
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
model <- lol.project.lrcca(X=X, Y=Y, r=5) # use lrcca to project into 5 dimensions
Low-Rank Linear Discriminant Analysis (LRLDA)
Description
A function that performs LRLDA on the class-centered data; this is equivalent to class-conditional PCA.
Usage
lol.project.lrlda(X, Y, r, xfm = FALSE, xfm.opts = list(), robust = FALSE, ...)
Arguments
X |
|
Y |
|
r |
the rank of the projection. |
xfm |
whether to transform the variables before taking the SVD.
|
xfm.opts |
optional arguments to pass to the |
robust |
whether to use a robust estimate of the covariance matrix when taking PCA. Defaults to |
... |
trailing args. |
Value
A list containing the following:
A |
|
d |
the eigen values associated with the eigendecomposition. |
ylabs |
|
centroids |
|
priors |
|
Xr |
|
cr |
|
Details
For more details see the help vignette:
vignette("lrlda", package = "lolR")
Author(s)
Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
model <- lol.project.lrlda(X=X, Y=Y, r=2) # use lrlda to project into 2 dimensions
Principal Component Analysis (PCA)
Description
A function that performs PCA on data.
Usage
lol.project.pca(X, r, xfm = FALSE, xfm.opts = list(), robust = FALSE, ...)
Arguments
X |
|
r |
the rank of the projection. |
xfm |
whether to transform the variables before taking the SVD.
|
xfm.opts |
optional arguments to pass to the |
robust |
whether to perform PCA on a robust estimate of the covariance matrix or not. Defaults to |
... |
trailing args. |
Value
A list containing the following:
A |
|
d |
the eigen values associated with the eigendecomposition. |
Xr |
|
Details
For more details see the help vignette:
vignette("pca", package = "lolR")
Author(s)
Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
model <- lol.project.pca(X=X, r=2) # use pca to project into 2 dimensions
Partial Least-Squares (PLS)
Description
A function for implementing the Partial Least-Squares (PLS) Algorithm.
Usage
lol.project.pls(X, Y, r, ...)
Arguments
X |
[n, d] the data with |
Y |
[n] the labels of the samples with |
r |
the rank of the projection. |
... |
trailing args. |
Value
A list containing the following:
A |
|
ylabs |
|
centroids |
|
priors |
|
Xr |
|
cr |
|
Details
For more details see the help vignette:
vignette("pls", package = "lolR")
Author(s)
Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
model <- lol.project.pls(X=X, Y=Y, r=5) # use pls to project into 5 dimensions
Random Projections (RP)
Description
A function for implementing Gaussian random projections (RP).
Usage
lol.project.rp(X, r, scale = TRUE, ...)
Arguments
X |
|
r |
the rank of the projection. Note that |
scale |
whether to scale the random projection by the sqrt(1/d). Defaults to |
... |
trailing args. |
Value
A list containing the following:
A |
|
Xr |
|
Details
For more details see the help vignette:
vignette("rp", package = "lolR")
Author(s)
Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
model <- lol.project.rp(X=X, r=5) # use lol to project into 5 dimensions
Stacked Cigar
Description
A simulation for the stacked cigar experiment.
Usage
lol.sims.cigar(n, d, rotate = FALSE, priors = NULL, a = 0.15, b = 4)
Arguments
n |
the number of samples of the simulated data. |
d |
the dimensionality of the simulated data. |
rotate |
whether to apply a random rotation to the mean and covariance. With random rotation matrix |
priors |
the priors for each class. If |
a |
scalar for all of the mu1 but 2nd dimension. Defaults to |
b |
scalar for 2nd dimension value of mu2 and the 2nd variance term of S. Defaults to |
Value
A list of class simulation with the following:
X |
|
Y |
|
mus |
|
Sigmas |
|
priors |
|
simtype |
The name of the simulation. |
params |
Any extraneous parameters the simulation was created with. |
Details
For more details see the help vignette:
vignette("sims", package = "lolR")
Author(s)
Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.cigar(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
Cross
Description
A simulation for the cross experiment, in which the two classes have orthogonal covariant dimensions and the same means.
Usage
lol.sims.cross(n, d, rotate = FALSE, priors = NULL, a = 1, b = 0.25, K = 2)
Arguments
n |
the number of samples of simulated data. |
d |
the dimensionality of the simulated data. |
rotate |
With random rotation matrix |
priors |
the priors for each class. If |
a |
scalar for the magnitude of the variance that is high within the particular class. Defaults to |
b |
scalar for the magnitude of the variance that is not high within the particular class. Defaults to |
K |
the number of classes. Defaults to |
Value
A list of class simulation with the following:
X |
|
Y |
|
mus |
|
Sigmas |
|
priors |
|
simtype |
The name of the simulation. |
params |
Any extraneous parameters the simulation was created with. |
Details
For more details see the help vignette:
vignette("sims", package = "lolR")
Author(s)
Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.cross(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
Fat Tails Simulation
Description
A function for simulating from 2 classes with differing means each with 2 sub-clusters, where one sub-cluster has a narrow tail and the other sub-cluster has a fat tail.
Usage
lol.sims.fat_tails(
n,
d,
rotate = FALSE,
f = 15,
s0 = 10,
rho = 0.2,
t = 0.8,
priors = NULL
)
Arguments
n |
the number of samples of the simulated data. |
d |
the dimensionality of the simulated data. |
rotate |
whether to apply a random rotation to the mean and covariance. With random rotation matrix |
f |
the fatness scaling of the tail. S2 = f*S1, where S1_ij = rho if i != j, and 1 if i == j. Defaults to |
s0 |
the number of dimensions with a difference in the means. s0 should be < d. Defaults to |
rho |
the scaling of the off-diagonal covariance terms, should be < 1. Defaults to |
t |
the fraction of each class from the narrower-tailed distribution. Defaults to |
priors |
the priors for each class. If |
Value
A list of class simulation with the following:
X |
|
Y |
|
mus |
|
Sigmas |
|
priors |
|
simtype |
The name of the simulation. |
params |
Any extraneous parameters the simulation was created with. |
Details
For more details see the help vignette:
vignette("sims", package = "lolR")
Author(s)
Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.fat_tails(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
Mean Difference Simulation
Description
A function for simulating data in which a difference in the means is present only in a subset of dimensions, and equal covariance.
Usage
lol.sims.mean_diff(
n,
d,
rotate = FALSE,
priors = NULL,
K = 2,
md = 1,
subset = c(1),
offdiag = 0,
s = 1
)
Arguments
n |
the number of samples of the simulated data. |
d |
the dimensionality of the simulated data. |
rotate |
whether to apply a random rotation to the mean and covariance. With random rotation matrix |
priors |
the priors for each class. If |
K |
the number of classes. Defaults to |
md |
the magnitude of the difference in the means in the specified subset of dimensions. Defaults to |
subset |
the dimensions to have a difference in the means. Defaults to only the first dimension. |
offdiag |
the off-diagonal elements of the covariance matrix. Should be < 1. |
s |
the scaling parameter of the covariance matrix. S_ij = scaling*1 if i == j, or scaling*offdiag if i != j. Defaults to |
Value
A list of class simulation with the following:
X |
|
Y |
|
mus |
|
Sigmas |
|
priors |
|
simtype |
The name of the simulation. |
params |
Any extraneous parameters the simulation was created with. |
Details
For more details see the help vignette:
vignette("sims", package = "lolR")
Author(s)
Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.mean_diff(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
Quadratic Discriminant Toeplitz Simulation
Description
A function for simulating data generalizing the Toeplitz setting, where each class has a different covariance matrix. This results in a Quadratic Discriminant.
Usage
lol.sims.qdtoep(
n,
d,
rotate = FALSE,
priors = NULL,
D1 = 10,
b = 0.4,
rho = 0.5
)
Arguments
n |
the number of samples of the simulated data. |
d |
the dimensionality of the simulated data. |
rotate |
whether to apply a random rotation to the mean and covariance. With random rotation matrix |
priors |
the priors for each class. If |
D1 |
the dimensionality for the non-equal covariance terms. Defaults to |
b |
a scaling parameter for the means. Defaults to |
rho |
the scaling of the covariance terms, should be < 1. Defaults to |
Value
A list of class simulation with the following:
X |
|
Y |
|
mus |
|
Sigmas |
|
priors |
|
simtype |
The name of the simulation. |
params |
Any extraneous parameters the simulation was created with. |
Details
For more details see the help vignette:
vignette("sims", package = "lolR")
Author(s)
Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.qdtoep(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
Random Rotation
Description
A helper function for applying a random rotation to a Gaussian parameter set.
Usage
lol.sims.random_rotate(mus, Sigmas, Q = NULL)
Arguments
mus |
means per class. |
Sigmas |
covariances per class. |
Q |
rotation to use, if any |
Author(s)
Eric Bridgeford
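Examples
This helper's manual entry does not include an example, so here is a minimal sketch, assuming mus is organized as [d, K] and Sigmas as [d, d, K] as elsewhere in this package (the exact return structure should be checked against the package source):
```r
library(lolR)

d <- 3; K <- 2
mus <- cbind(rep(0, d), rep(1, d))       # [d, K] per-class means
Sigmas <- array(0, dim=c(d, d, K))       # [d, d, K] per-class covariances
Sigmas[,,1] <- diag(d); Sigmas[,,2] <- diag(d)

# apply a random rotation to the means and covariances;
# a fixed rotation can be supplied via the Q argument
rotated <- lol.sims.random_rotate(mus, Sigmas)
```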
Reverse Random Trunk
Description
A simulation for the reversed random trunk experiment, in which the maximal covariant directions are the same as the directions with the maximal mean difference.
Usage
lol.sims.rev_rtrunk(
n,
d,
robust = FALSE,
rotate = FALSE,
priors = NULL,
b = 4,
K = 2,
maxvar = b^3,
maxvar.outlier = maxvar^3
)
Arguments
n |
the number of samples of the simulated data. |
d |
the dimensionality of the simulated data. |
robust |
the number of outlier points to add, where outliers have opposite covariance of inliers. Defaults to |
rotate |
whether to apply a random rotation to the mean and covariance. With random rotation matrix |
priors |
the priors for each class. If |
b |
scalar for mu scaling. Default to |
K |
number of classes, should be <4. Defaults to |
maxvar |
the maximum covariance between the two classes. Defaults to |
maxvar.outlier |
the maximum covariance for the outlier points. Defaults to |
Value
A list of class simulation with the following:
X |
|
Y |
|
mus |
|
Sigmas |
|
priors |
|
simtype |
The name of the simulation. |
params |
Any extraneous parameters the simulation was created with. |
robust |
If robust is not false, a list containing |
Details
For more details see the help vignette:
vignette("sims", package = "lolR")
Author(s)
Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
Sample Random Rotation
Description
A helper function for estimating a random rotation matrix.
Usage
lol.sims.rotation(d)
Arguments
d |
dimensions to generate a rotation matrix for. |
Value
the rotation matrix
Author(s)
Eric Bridgeford
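Examples
This entry has no example; a short sketch follows. Since the returned matrix is a rotation, Q^T Q should be the identity up to floating-point error:
```r
library(lolR)

Q <- lol.sims.rotation(5)  # sample a random 5 x 5 rotation matrix

# sanity check: a rotation matrix is orthogonal, so t(Q) %*% Q
# should be (numerically) the 5 x 5 identity matrix
max(abs(t(Q) %*% Q - diag(5)))  # near zero
```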
Random Trunk
Description
A simulation for the random trunk experiment, in which the maximal covariant dimensions are the reverse of the maximal mean differences.
Usage
lol.sims.rtrunk(
n,
d,
rotate = FALSE,
priors = NULL,
b = 4,
K = 2,
maxvar = 100
)
Arguments
n |
the number of samples of the simulated data. |
d |
the dimensionality of the simulated data. |
rotate |
whether to apply a random rotation to the mean and covariance. With random rotation matrix |
priors |
the priors for each class. If |
b |
scalar for mu scaling. Default to |
K |
number of classes, should be <4. Defaults to |
maxvar |
the maximum covariance between the two classes. Defaults to |
Value
A list of class simulation with the following:
X |
|
Y |
|
mus |
|
Sigmas |
|
priors |
|
simtype |
The name of the simulation. |
params |
Any extraneous parameters the simulation was created with. |
robust |
If robust is not false, a list containing |
Details
For more details see the help vignette:
vignette("sims", package = "lolR")
Author(s)
Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
GMM Simulate
Description
A helper function for simulating from a Gaussian mixture model.
Usage
lol.sims.sim_gmm(mus, Sigmas, n, priors)
Arguments
mus |
|
Sigmas |
|
n |
the number of examples. |
priors |
|
Value
A list with the following:
X |
|
Y |
|
priors |
|
Author(s)
Eric Bridgeford
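Examples
This helper's entry has no example, so the following is a sketch, assuming the [d, K] mean and [d, d, K] covariance conventions used by the package's simulation parameter sets:
```r
library(lolR)

d <- 2; K <- 2; n <- 100
mus <- cbind(c(0, 0), c(3, 3))         # [d, K] per-class means
Sigmas <- array(0, dim=c(d, d, K))     # [d, d, K] per-class covariances
Sigmas[,,1] <- diag(d); Sigmas[,,2] <- diag(d)
priors <- c(0.5, 0.5)                  # [K] class priors

# draw n examples from the 2-class Gaussian mixture
sim <- lol.sims.sim_gmm(mus, Sigmas, n, priors)
X <- sim$X; Y <- sim$Y                 # [n, d] data and [n] labels
```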
Toeplitz Simulation
Description
A function for simulating data in which the covariance is a non-symmetric Toeplitz matrix.
Usage
lol.sims.toep(n, d, rotate = FALSE, priors = NULL, D1 = 10, b = 0.4, rho = 0.5)
Arguments
n |
the number of samples of the simulated data. |
d |
the dimensionality of the simulated data. |
rotate |
whether to apply a random rotation to the mean and covariance. With random rotation matrix |
priors |
the priors for each class. If |
D1 |
the dimensionality for the non-equal covariance terms. Defaults to |
b |
a scaling parameter for the means. Defaults to |
rho |
the scaling of the covariance terms, should be < 1. Defaults to |
Value
A list of class simulation with the following:
X |
|
Y |
|
mus |
|
Sigmas |
|
priors |
|
simtype |
The name of the simulation. |
params |
Any extraneous parameters the simulation was created with. |
Details
For more details see the help vignette:
vignette("sims", package = "lolR")
Author(s)
Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.toep(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
XOR Problem
Description
A function to simulate from the 2-class XOR problem.
Usage
lol.sims.xor2(n, d, priors = NULL, fall = 100)
Arguments
n |
the number of samples of the simulated data. |
d |
the dimensionality of the simulated data. |
priors |
the priors for each class. If |
fall |
the falloff for the covariance structuring. Sigma declines by ndim/fall across the variance terms. Defaults to |
Value
A list of class simulation with the following:
X |
|
Y |
|
mus |
|
Sigmas |
|
priors |
|
simtype |
The name of the simulation. |
params |
Any extraneous parameters the simulation was created with. |
Details
For more details see the help vignette:
vignette("sims", package = "lolR")
Author(s)
Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.xor2(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
A utility to use irlba when necessary
Description
A utility to use irlba when necessary
Usage
lol.utils.decomp(
X,
xfm = FALSE,
xfm.opts = list(),
ncomp = 0,
t = 0.05,
robust = FALSE
)
Arguments
X |
the data to compute the svd of. |
xfm |
whether to transform the variables before taking the SVD.
|
xfm.opts |
optional arguments to pass to the |
ncomp |
the number of left singular vectors to retain. |
t |
the threshold of percent of singular vals/vecs to use irlba. |
robust |
whether to use a robust estimate of the covariance matrix when taking PCA. Defaults to |
Value
the svd of X.
Author(s)
Eric Bridgeford
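Examples
This utility's entry has no example; a minimal sketch with default options follows. Whether the full SVD or the truncated (irlba) path is taken internally depends on how ncomp compares against the threshold t:
```r
library(lolR)

data <- lol.sims.rtrunk(n=200, d=30)  # 200 examples of 30 dimensions
# retain 3 left singular vectors; with ncomp a small fraction of d,
# the truncated (irlba) decomposition may be used internally
decomp <- lol.utils.decomp(data$X, ncomp=3)
```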
A utility function that computes information about the differences between the class centroids.
Description
A utility function that computes information about the differences between the class centroids.
Usage
lol.utils.deltas(centroids, priors, ...)
Arguments
centroids |
|
priors |
|
... |
optional args. |
Value
deltas [d, K]
the K difference vectors.
Author(s)
Eric Bridgeford
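Examples
No example accompanies this entry, so here is a sketch, assuming centroids is [d, K] and priors is [K], consistent with the Value section's deltas [d, K]:
```r
library(lolR)

d <- 4; K <- 2
centroids <- cbind(rep(0, d), rep(1, d))  # [d, K] class centroids
priors <- c(0.5, 0.5)                     # [K] class priors

# deltas: [d, K] difference vectors derived from the class centroids
deltas <- lol.utils.deltas(centroids, priors)
```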
A utility function that computes basic summary information about the data.
Description
A utility function that computes basic summary information about the data.
Usage
lol.utils.info(X, Y, robust = FALSE, ...)
Arguments
X |
|
Y |
|
robust |
whether to perform PCA on a robust estimate of the covariance matrix or not. Defaults to |
... |
optional args. |
Value
n
the number of samples.
d
the number of dimensions.
ylabs [K]
vector containing the unique, ordered class labels.
priors [K]
vector containing prior probability for the unique, ordered classes.
Author(s)
Eric Bridgeford
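Examples
This entry has no example; a minimal sketch follows, assuming the returned quantities match the Value section (n, d, the ordered class labels, and their priors):
```r
library(lolR)

data <- lol.sims.rtrunk(n=200, d=30)  # 200 examples of 30 dimensions
# summarize the data: sample count, dimensionality, ordered class
# labels, and empirical class priors
info <- lol.utils.info(data$X, data$Y)
```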
A function for one-hot encoding categorical response vectors.
Description
A function for one-hot encoding categorical response vectors.
Usage
lol.utils.ohe(Y)
Arguments
Y |
[n] a vector of the categorical responses, with |
Value
a list containing the following:
Yh |
[n, K] the one-hot encoded Y response variable. |
ylabs |
[K] a vector of the y names corresponding to each response column. |
Author(s)
Eric Bridgeford
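Examples
No example accompanies this entry; a short sketch based on the documented return values (Yh and ylabs) follows:
```r
library(lolR)

Y <- c("a", "b", "a", "c")   # n = 4 responses over K = 3 classes
enc <- lol.utils.ohe(Y)
enc$Yh     # [n, K] one-hot encoding: each row has a single 1
enc$ylabs  # [K] the class label corresponding to each column of Yh
```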
Embedding Cross Validation
Description
A function for performing leave-one-out cross-validation for a given embedding model. This function produces fold-wise
cross-validated misclassification rates for standard embedding techniques. Users can optionally specify custom embedding techniques
with proper configuration of alg.*
parameters and hyperparameters. Optional classifiers implementing the S3 predict
function can be used
for classification, with hyperparameters to classifiers for determining misclassification rate specified in classifier.*
parameters and hyperparameters.
Usage
lol.xval.eval(
X,
Y,
r,
alg,
sets = NULL,
alg.dimname = "r",
alg.opts = list(),
alg.embedding = "A",
classifier = lda,
classifier.opts = list(),
classifier.return = "class",
k = "loo",
rank.low = FALSE,
...
)
Arguments
X |
|
Y |
|
r |
the number of embedding dimensions desired, where |
alg |
the algorithm to use for embedding. Should be a function that accepts inputs |
sets |
a user-defined cross-validation set. Defaults to
|
alg.dimname |
the name of the parameter accepted by |
alg.opts |
the hyper-parameter options you want to pass into your algorithm, as a keyworded list. Defaults to |
alg.embedding |
the attribute returned by
|
classifier |
the classifier to use for assessing performance. The classifier should accept |
classifier.opts |
any extraneous options to be passed to the classifier function, as a list. Defaults to an empty list. |
classifier.return |
if the return type is a list,
|
k |
the cross-validated method to perform. Defaults to
|
rank.low |
whether to force the training set to low-rank. Defaults to
|
... |
trailing args. |
Value
Returns a list containing:
lhat |
the mean cross-validated error. |
model |
The model returned by |
classifier |
The classifier trained on all of the embedded data. |
lhats |
the cross-validated error for each of the |
Details
For more details see the help vignette:
vignette("xval", package = "lolR")
For extending cross-validation techniques shown here to arbitrary embedding algorithms, see the vignette:
vignette("extend_embedding", package = "lolR")
For extending cross-validation techniques shown here to arbitrary classification algorithms, see the vignette:
vignette("extend_classification", package = "lolR")
Author(s)
Eric Bridgeford
Examples
# train model and analyze with loo validation using lda classifier
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
r=5 # embed into r=5 dimensions
# run cross-validation with the nearestCentroid method and
# leave-one-out cross-validation, which returns only
# prediction labels so we specify classifier.return as NaN
xval.fit <- lol.xval.eval(X, Y, r, lol.project.lol,
classifier=lol.classify.nearestCentroid,
classifier.return=NaN, k='loo')
# train model and analyze with 5-fold validation using lda classifier
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
xval.fit <- lol.xval.eval(X, Y, r, lol.project.lol, k=5)
# pass in existing cross-validation sets
sets <- lol.xval.split(X, Y, k=2)
xval.fit <- lol.xval.eval(X, Y, r, lol.project.lol, sets=sets)
Optimal Cross-Validated Number of Embedding Dimensions
Description
A function for performing leave-one-out cross-validation for a given embedding model, that allows users to determine the optimal number of embedding dimensions for
their algorithm-of-choice. This function produces fold-wise cross-validated misclassification rates for standard embedding techniques across a specified selection of
embedding dimensions. Optimal embedding dimension is selected as the dimension with the lowest average misclassification rate across all folds.
Users can optionally specify custom embedding techniques with proper configuration of alg.*
parameters and hyperparameters.
Optional classifiers implementing the S3 predict
function can be used for classification, with hyperparameters to classifiers for
determining misclassification rate specified in classifier.*
.
Usage
lol.xval.optimal_dimselect(
X,
Y,
rs,
alg,
sets = NULL,
alg.dimname = "r",
alg.opts = list(),
alg.embedding = "A",
alg.structured = TRUE,
classifier = lda,
classifier.opts = list(),
classifier.return = "class",
k = "loo",
rank.low = FALSE,
...
)
Arguments
X |
|
Y |
|
rs |
|
alg |
the algorithm to use for embedding. Should be a function that accepts inputs |
sets |
a user-defined cross-validation set. Defaults to
|
alg.dimname |
the name of the parameter accepted by |
alg.opts |
the hyper-parameter options to pass to your algorithm as a keyworded list. Defaults to |
alg.embedding |
the attribute returned by
|
alg.structured |
a boolean to indicate whether the embedding matrix is structured. Provides performance increase by not having to compute the embedding matrix
|
classifier |
the classifier to use for assessing performance. The classifier should accept |
classifier.opts |
any extraneous options to be passed to the classifier function, as a list. Defaults to an empty list. |
classifier.return |
if the return type is a list,
|
k |
the cross-validated method to perform. Defaults to
|
rank.low |
whether to force the training set to low-rank. Defaults to
|
... |
trailing args. |
Value
Returns a list containing:
folds.data |
the results, as a data-frame, of the per-fold classification accuracy. |
foldmeans.data |
the results, as a data-frame, of the average classification accuracy for each |
optimal.lhat |
the classification error of the optimal |
.
optimal.r |
the optimal number of embedding dimensions from |
.
model |
the model trained on all of the data at the optimal number of embedding dimensions. |
classifier |
the classifier trained on all of the data at the optimal number of embedding dimensions. |
Details
For more details see the help vignette:
vignette("xval", package = "lolR")
For extending cross-validation techniques shown here to arbitrary embedding algorithms, see the vignette:
vignette("extend_embedding", package = "lolR")
For extending cross-validation techniques shown here to arbitrary classification algorithms, see the vignette:
vignette("extend_classification", package = "lolR")
Author(s)
Eric Bridgeford
Examples
# train model and analyze with loo validation
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
# run cross-validation with the nearestCentroid method and
# leave-one-out cross-validation, which returns only
# prediction labels so we specify classifier.return as NaN
xval.fit <- lol.xval.optimal_dimselect(X, Y, rs=c(5, 10, 15), lol.project.lol,
classifier=lol.classify.nearestCentroid,
classifier.return=NaN, k='loo')
# train model and analyze with 5-fold validation using lda classifier
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
xval.fit <- lol.xval.optimal_dimselect(X, Y, rs=c(5, 10, 15), lol.project.lol, k=5)
# pass in existing cross-validation sets
sets <- lol.xval.split(X, Y, k=2)
xval.fit <- lol.xval.optimal_dimselect(X, Y, rs=c(5, 10, 15), lol.project.lol, sets=sets)
Cross-Validation Data Splitter
Description
A function to split a dataset into training and testing sets for cross-validation. The procedure is to split the data into k folds; each fold is rotated in turn into the held-out testing set on which the model will be validated, while the remaining (k-1) folds are used for training. Note that this function also supports low-rank cross-validation: in that case, instead of using the full (k-1) folds for training, we subsample min((k-1)/k*n, d) samples to ensure that the resulting training sets are all low-rank. We still rotate properly over the held-out fold so that the resulting testing sets do not share any examples, which would add a complicated dependence structure to any inference we attempt on the testing sets.
Usage
lol.xval.split(X, Y, k = "loo", rank.low = FALSE, ...)
Arguments
X | [n, d] the data with n samples in d dimensions. |
Y | [n] the labels of the samples, with K unique labels. |
k | the cross-validation method to perform. Defaults to "loo", leave-one-out cross-validation; if k is an integer, k-fold cross-validation is performed instead. |
rank.low | whether to force the training sets to be low-rank. Defaults to FALSE. |
... | optional args. |
Value
sets, the cross-validation sets, as an object of class "XV" containing the following for each fold:
train | the training set for the fold, of length [(k-1)/k*n] (at most min((k-1)/k*n, d) samples if rank.low = TRUE). |
test | the held-out testing set for the fold, of length [n/k]. |
Author(s)
Eric Bridgeford
Examples
# prepare data for 10-fold validation
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
sets.xval.10fold <- lol.xval.split(X, Y, k=10)
# prepare data for loo validation
sets.xval.loo <- lol.xval.split(X, Y, k='loo')
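The fold-rotation logic described above can be sketched in a few lines of base R. kfold_indices below is a hypothetical helper named for illustration only, not the package's lol.xval.split implementation; it shows how disjoint held-out folds are produced by rotating each fold into the testing role exactly once:

```r
# Hypothetical helper illustrating k-fold index rotation (not part of lolR).
kfold_indices <- function(n, k) {
  # assign each of the n samples to one of k folds, as evenly as possible
  fold.id <- sample(rep(seq_len(k), length.out = n))
  # rotate each fold into the held-out role exactly once
  lapply(seq_len(k), function(i) {
    list(train = which(fold.id != i), test = which(fold.id == i))
  })
}

sets <- kfold_indices(n = 200, k = 10)
# across the 10 folds, every sample is held out exactly once
```

Because the folds partition the sample indices, the testing sets are pairwise disjoint, which is the property the Description above relies on for valid inference.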
Nearest Centroid Classifier Prediction
Description
A function that predicts the class of points based on the nearest centroid
Usage
## S3 method for class 'nearestCentroid'
predict(object, X, ...)
Arguments
object | an object of class nearestCentroid, as returned by lol.classify.nearestCentroid. |
X | [n, d] the data to classify, with n samples in d dimensions. |
... | optional args. |
Value
Yhat | [n] the predicted class of each of the n data points in X. |
Details
For more details see the help vignette:
vignette("centroid", package = "lolR")
Author(s)
Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
model <- lol.classify.nearestCentroid(X, Y)
Yh <- predict(model, X)
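For intuition, the nearest-centroid rule itself fits in a few lines of base R. centroid_fit and centroid_predict are hypothetical names used for illustration; they are not the package's implementation:

```r
# Minimal base-R sketch of the nearest-centroid rule (illustrative only).
centroid_fit <- function(X, Y) {
  ylabs <- sort(unique(Y))
  # per-class mean vectors, one row per class
  centroids <- t(sapply(ylabs, function(y) colMeans(X[Y == y, , drop = FALSE])))
  list(centroids = centroids, ylabs = ylabs)
}

centroid_predict <- function(model, X) {
  # squared Euclidean distance from every sample to every centroid
  D <- sapply(seq_along(model$ylabs),
              function(i) rowSums(sweep(X, 2, model$centroids[i, ])^2))
  # label of the closest centroid for each sample
  model$ylabs[max.col(-as.matrix(D))]
}
```

Each test point is simply assigned the label of the class whose training mean it is closest to.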
Random Chance Classifier Prediction
Description
A function that predicts the most frequently occurring class in the training dataset. Functionality is consistent with the standard R prediction interface so that one can compute the "chance" accuracy with minimal modification of other classification scripts.
Usage
## S3 method for class 'randomChance'
predict(object, X, ...)
Arguments
object | an object of class randomChance, as returned by lol.classify.randomChance. |
X | [n, d] the data to classify, with n samples in d dimensions. |
... | optional args. |
Value
Yhat | [n] the predicted class of each of the n data points in X. |
Author(s)
Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
model <- lol.classify.randomChance(X, Y)
Yh <- predict(model, X)
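The "chance" baseline reduces to always predicting the modal training class. A base-R sketch with hypothetical names (chance_fit and chance_predict are not part of lolR):

```r
# Illustrative sketch of the chance baseline (not lolR's implementation).
chance_fit <- function(Y) {
  # the maximally present (modal) class among the training labels
  names(which.max(table(Y)))
}

chance_predict <- function(modal.class, X) {
  # predict the modal class for every sample, regardless of its features
  rep(modal.class, nrow(X))
}
```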
Random Guessing Classifier Prediction
Description
A function that predicts by guessing randomly according to the pmf of the class priors. Functionality is consistent with the standard R prediction interface so that one can compute the "guess" accuracy with minimal modification of other classification scripts.
Usage
## S3 method for class 'randomGuess'
predict(object, X, ...)
Arguments
object |
An object of class
|
X |
|
... |
optional args. |
Value
Yhat | [n] the predicted class of each of the n data points in X. |
Author(s)
Eric Bridgeford
Examples
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
model <- lol.classify.randomGuess(X, Y)
Yh <- predict(model, X)
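Similarly, the "guess" baseline just samples labels i.i.d. from the empirical class priors. A base-R sketch with hypothetical names (guess_fit and guess_predict are not part of lolR):

```r
# Illustrative sketch of the guessing baseline (not lolR's implementation).
guess_fit <- function(Y) {
  tab <- table(Y)
  # empirical class priors: the pmf of the training labels
  list(ylabs = names(tab), priors = as.numeric(tab) / length(Y))
}

guess_predict <- function(model, X) {
  # draw one label per sample, i.i.d. from the prior pmf
  sample(model$ylabs, size = nrow(X), replace = TRUE, prob = model$priors)
}
```

The expected accuracy of this baseline is sum(priors^2), which is why it serves as a useful floor when comparing classifiers on imbalanced data.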