Title: | Cross-Validated Covariance Matrix Estimation |
Version: | 1.2.2 |
Description: | An efficient cross-validated approach for covariance matrix estimation, particularly useful in high-dimensional settings. This method relies upon the theory of high-dimensional loss-based covariance matrix estimator selection developed by Boileau et al. (2022) <doi:10.1080/10618600.2022.2110883> to identify the optimal estimator from among a prespecified set of candidates. |
Depends: | R (≥ 4.0.0) |
Imports: | matrixStats, Matrix, stats, methods, origami, coop, Rdpack, rlang, dplyr, stringr, purrr, tibble, assertthat, RSpectra, ggplot2, ggpubr, RColorBrewer, RMTstat |
Suggests: | future, future.apply, MASS, testthat, knitr, rmarkdown, covr, spelling |
License: | MIT + file LICENSE |
URL: | https://github.com/PhilBoileau/cvCovEst |
BugReports: | https://github.com/PhilBoileau/cvCovEst/issues |
Encoding: | UTF-8 |
VignetteBuilder: | knitr |
RoxygenNote: | 7.3.1 |
RdMacros: | Rdpack |
Language: | en-US |
NeedsCompilation: | no |
Packaged: | 2024-02-17 19:57:20 UTC; philippe |
Author: | Philippe Boileau |
Maintainer: | Philippe Boileau <philippe_boileau@berkeley.edu> |
Repository: | CRAN |
Date/Publication: | 2024-02-17 20:20:02 UTC |
Adaptive LASSO Estimator
Description
adaptiveLassoEst()
applies the adaptive LASSO to the
entries of the sample covariance matrix. The thresholding function is
inspired by the penalized regression introduced by
Zou (2006). The thresholding function assigns
a weight to each entry of the sample covariance matrix based on its
initial value. This weight then determines the relative size of the penalty
resulting in larger values being penalized less and reducing bias
(Rothman et al. 2009).
Usage
adaptiveLassoEst(dat, lambda, n)
Arguments
dat |
A numeric |
lambda |
A non-negative |
n |
A non-negative |
Value
A matrix
corresponding to the estimate of the covariance
matrix.
References
Rothman AJ, Levina E, Zhu J (2009).
“Generalized Thresholding of Large Covariance Matrices.”
Journal of the American Statistical Association, 104(485), 177-186.
doi:10.1198/jasa.2009.0101, https://doi.org/10.1198/jasa.2009.0101.
Zou H (2006).
“The Adaptive Lasso and Its Oracle Properties.”
Journal of the American Statistical Association, 101(476), 1418-1429.
doi:10.1198/016214506000000735, https://doi.org/10.1198/016214506000000735.
Examples
adaptiveLassoEst(dat = mtcars, lambda = 0.9, n = 0.9)
Adaptive LASSO Thresholding Function
Description
adaptiveLassoThreshold()
applies the adaptive LASSO
thresholding function to the entries of a matrix
. In particular, it
is meant to be applied to the sample covariance matrix.
Usage
adaptiveLassoThreshold(entry, lambda, n)
Arguments
entry |
A |
lambda |
A non-negative |
n |
A non-negative |
Value
A regularized numeric
.
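To make the weighting idea concrete, the sketch below follows the generalized-thresholding form given by Rothman et al. (2009), s(z) = sign(z) * max(|z| - lambda^(n + 1) * |z|^(-n), 0). The name adaptive_lasso_threshold is hypothetical and the function is vectorized for convenience; it illustrates the rule under that assumption and is not the package's internal code.

```r
# Hypothetical, vectorized sketch of adaptive LASSO thresholding.
# z: numeric vector or matrix of entries; lambda: penalty; n: adaptive weight.
adaptive_lasso_threshold <- function(z, lambda, n) {
  # Entry-specific penalty: larger |z| gives a smaller penalty when n > 0.
  shrunk <- abs(z) - lambda^(n + 1) * abs(z)^(-n)
  out <- sign(z) * pmax(shrunk, 0)
  # Exact zeros stay zero (guards against 0 * Inf when lambda is 0).
  out[z == 0] <- 0
  out
}

# With n = 0 the rule reduces to soft thresholding.
adaptive_lasso_threshold(c(-2, 0.5, 0, 2), lambda = 1, n = 1)
adaptive_lasso_threshold(2, lambda = 1, n = 0)
```

Comparing the two calls shows the effect described above: for the entry 2, the adaptive weight (n = 1) shrinks it less than plain soft thresholding (n = 0) does.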
Banding Estimator
Description
bandingEst()
estimates the covariance matrix of data with
ordered variables by forcing off-diagonal entries to be zero for indices
that are far removed from one another. The {i, j} entry of the estimated
covariance matrix will be zero if the absolute value of {i - j} is greater
than some non-negative constant k
. This estimator was proposed by
Bickel and Levina (2008).
Usage
bandingEst(dat, k)
Arguments
dat |
A numeric |
k |
A non-negative, |
Value
A matrix
corresponding to the estimate of the covariance
matrix.
References
Bickel PJ, Levina E (2008). “Regularized estimation of large covariance matrices.” Annals of Statistics, 36(1), 199–227. doi:10.1214/009053607000000758.
Examples
bandingEst(dat = mtcars, k = 2L)
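The banding rule itself is simple enough to sketch in base R. The helper band_matrix below is a hypothetical illustration that takes an already computed covariance matrix and zeroes every entry whose row and column indices differ by more than k; it does not reproduce any preprocessing that bandingEst() may perform.

```r
# Hypothetical sketch: keep entries with |i - j| <= k, zero out the rest.
band_matrix <- function(S, k) {
  idx <- seq_len(nrow(S))
  keep <- abs(outer(idx, idx, "-")) <= k
  S * keep
}

# Band the sample covariance of the first five mtcars variables.
S <- cov(mtcars[, 1:5])
band_matrix(S, k = 1)
```

Because the rule depends only on index distance, it is meaningful only when the variables have a natural ordering, as the description notes.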
Showing Best Estimator Within Each Class of Estimators
Description
bestInClass()
finds the best performing estimator within
each class of estimator passed to cvCovEst()
and
finds the associated hyperparameters if applicable.
Usage
bestInClass(dat, worst = FALSE)
Arguments
dat |
The |
worst |
This facilitates the option to choose the worst performing
estimator in each class. Default is |
Value
tibble
with rows corresponding to estimator
classes and columns for hyperparameter values, cross-validated risk, and
other summary metrics for the best (or worst) estimator in that class.
Check Arguments Passed to cvCovEst
Description
checkArgs()
verifies that all arguments
passed to the cvCovEst()
function meet its specifications.
Usage
checkArgs(
dat,
estimators,
estimator_params,
cv_loss,
cv_scheme,
mc_split,
v_folds,
parallel
)
Arguments
dat |
A numeric |
estimators |
A |
estimator_params |
A named |
cv_loss |
A |
cv_scheme |
A |
mc_split |
A |
v_folds |
An |
parallel |
A |
Value
Whether all argument conditions are satisfied
Check Arguments Passed to plot.cvCovEst and summary.cvCovEst
Description
The checkPlotSumArgs()
function verifies that all
arguments passed to the plot.cvCovEst()
and
summary.cvCovEst()
functions meet their specifications. Some
additional arguments may be checked at the individual function level.
Usage
checkPlotSumArgs(
dat,
dat_orig,
which_fun,
estimator,
plot_type,
summ_fun,
stat,
k,
leading,
abs_v
)
Arguments
dat |
An object of class |
dat_orig |
The |
which_fun |
A |
estimator |
A |
plot_type |
A |
summ_fun |
A |
stat |
A |
k |
A |
leading |
A |
abs_v |
A |
Value
Whether all argument conditions are satisfied.
Estimate C of Spiked Covariance Matrix Estimator
Description
computeC()
computes the c(ell) value described in
Donoho et al. (2018).
Usage
computeC(ell, p_n_ratio)
Arguments
ell |
A |
p_n_ratio |
A |
Value
A numeric
vector.
References
Donoho D, Gavish M, Johnstone I (2018). “Optimal shrinkage of eigenvalues in the spiked covariance model.” The Annals of Statistics, 46(4), 1742 – 1778.
Estimate Ell of Spiked Covariance Matrix Estimator
Description
computeEll()
computes the ell value described in
Donoho et al. (2018).
Usage
computeEll(scaled_eig_vals, p, p_n_ratio)
Arguments
scaled_eig_vals |
A |
p |
A |
p_n_ratio |
A |
Value
A numeric
vector.
References
Donoho D, Gavish M, Johnstone I (2018). “Optimal shrinkage of eigenvalues in the spiked covariance model.” The Annals of Statistics, 46(4), 1742 – 1778.
Estimate S of Spiked Covariance Matrix Estimator
Description
computeS()
computes the s(ell) value described in
Donoho et al. (2018).
Usage
computeS(c_donoho)
Arguments
c_donoho |
A |
Value
A numeric
vector.
References
Donoho D, Gavish M, Johnstone I (2018). “Optimal shrinkage of eigenvalues in the spiked covariance model.” The Annals of Statistics, 46(4), 1742 – 1778.
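For orientation, the three quantities computed by computeEll(), computeC(), and computeS() can be sketched from the closed-form expressions usually quoted for the spiked model of Donoho et al. (2018). The function names below are hypothetical, the eigenvalues are assumed to have already been scaled by the noise level, and the formulas are only valid above the bulk edge (1 + sqrt(p_n_ratio))^2; check the paper before relying on them.

```r
# Hypothetical sketches of the spiked-model quantities.
# lambda: scaled sample eigenvalue above the bulk edge; p_n_ratio: p / n.
ell_from_lambda <- function(lambda, p_n_ratio) {
  b <- lambda + 1 - p_n_ratio
  (b + sqrt(b^2 - 4 * lambda)) / 2
}

# Cosine between sample and population eigenvectors.
c_from_ell <- function(ell, p_n_ratio) {
  sqrt((1 - p_n_ratio / (ell - 1)^2) / (1 + p_n_ratio / (ell - 1)))
}

# Corresponding sine.
s_from_c <- function(c_val) sqrt(1 - c_val^2)

ell <- ell_from_lambda(3.75, p_n_ratio = 0.5)
c_val <- c_from_ell(ell, p_n_ratio = 0.5)
s_val <- s_from_c(c_val)
c(ell = ell, c = c_val, s = s_val)
```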
Cross-Validated Covariance Matrix Estimator Selector
Description
cvCovEst()
identifies the optimal covariance matrix
estimator from among a set of candidate estimators.
Usage
cvCovEst(
dat,
estimators = c(linearShrinkEst, thresholdingEst, sampleCovEst),
estimator_params = list(linearShrinkEst = list(alpha = 0), thresholdingEst = list(gamma
= 0)),
cv_loss = cvMatrixFrobeniusLoss,
cv_scheme = "v_fold",
mc_split = 0.5,
v_folds = 10L,
parallel = FALSE,
...
)
Arguments
dat |
A numeric |
estimators |
A |
estimator_params |
A named |
cv_loss |
A |
cv_scheme |
A |
mc_split |
A |
v_folds |
An |
parallel |
A |
... |
Not currently used. Permits backward compatibility. |
Value
A list
of results containing the following elements:
- estimate - A matrix corresponding to the estimate of the optimal covariance matrix estimator.
- estimator - A character indicating the optimal estimator and corresponding hyperparameters, if any.
- risk_df - A tibble providing the cross-validated risk estimates of each estimator.
- cv_df - A tibble providing each estimator's loss over the folds of the cross-validated procedure.
- args - A named list containing arguments passed to cvCovEst.
Examples
cvCovEst(
dat = mtcars,
estimators = c(
linearShrinkLWEst, thresholdingEst, sampleCovEst
),
estimator_params = list(
thresholdingEst = list(gamma = seq(0.1, 0.3, 0.1))
)
)
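The selection idea behind cvCovEst() can be sketched in base R: fit each candidate on the training rows of every fold, compare it with the validation rows' sample covariance, and keep the candidate with the lowest average loss. The helper cv_cov_risk and the two candidate estimators are hypothetical, and the squared Frobenius distance used here is a simplification rather than the package's exact loss.

```r
# Hypothetical sketch of V-fold cross-validated estimator selection.
cv_cov_risk <- function(dat, estimators, v_folds = 5L) {
  dat <- as.matrix(dat)
  # Randomly assign each row to one of v_folds folds.
  fold_id <- sample(rep_len(seq_len(v_folds), nrow(dat)))
  sapply(estimators, function(est_fun) {
    mean(sapply(seq_len(v_folds), function(v) {
      train <- dat[fold_id != v, , drop = FALSE]
      valid <- dat[fold_id == v, , drop = FALSE]
      # Squared Frobenius distance to the validation sample covariance.
      sum((est_fun(train) - cov(valid))^2)
    }))
  })
}

set.seed(1)
candidates <- list(
  sample = function(x) cov(x),
  diagonal = function(x) diag(diag(cov(x)))
)
risks <- cv_cov_risk(scale(mtcars), candidates)
risks
names(which.min(risks))  # candidate with the lowest cross-validated risk
```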
Eigenvalue Plot
Description
cvEigenPlot()
plots the eigenvalues of one or more
estimators produced by cvCovEst()
.
Usage
cvEigenPlot(
dat,
estimator,
stat = "min",
dat_orig,
k,
leading = TRUE,
plot_type = "eigen",
cv_details,
has_hypers
)
Arguments
dat |
A named |
estimator |
A |
stat |
A |
dat_orig |
The |
k |
A |
leading |
A |
plot_type |
A |
cv_details |
A |
has_hypers |
A |
Value
A plot, or grid of plots, showing the k
leading or trailing
eigenvalues of the specified estimators and associated summary statistics of
the cross-validated risk.
Cross-Validation Function for Aggregated Frobenius Loss
Description
cvFrobeniusLoss()
evaluates the aggregated Frobenius loss
over a fold
object (from 'origami'
(Coyle and Hejazi 2018)).
Usage
cvFrobeniusLoss(fold, dat, estimator_funs, estimator_params = NULL)
Arguments
fold |
A |
dat |
A |
estimator_funs |
An |
estimator_params |
A named |
Value
A tibble
providing information on estimators,
their hyperparameters (if any), and their scaled Frobenius loss evaluated
on a given fold
.
References
Coyle J, Hejazi N (2018). “origami: A Generalized Framework for Cross-Validation in R.” Journal of Open Source Software, 3(21), 512. doi:10.21105/joss.00512.
Examples
library(MASS)
library(origami)
library(rlang)
# generate 10x10 covariance matrix with unit variances and off-diagonal
# elements equal to 0.5
Sigma <- matrix(0.5, nrow = 10, ncol = 10) + diag(0.5, nrow = 10)
# sample 50 observations from multivariate normal with mean = 0, var = Sigma
dat <- mvrnorm(n = 50, mu = rep(0, 10), Sigma = Sigma)
# generate a single fold using 2-fold cross-validation
resub <- make_folds(dat,
fold_fun = folds_vfold,
V = 2
)[[1]]
cvFrobeniusLoss(
fold = resub,
dat = dat,
estimator_funs = rlang::quo(c(
linearShrinkEst, thresholdingEst, sampleCovEst
)),
estimator_params = list(
linearShrinkEst = list(alpha = c(0, 1)),
thresholdingEst = list(gamma = c(0, 1))
)
)
Cross-Validation Function for Matrix Frobenius Loss
Description
cvMatrixFrobeniusLoss()
evaluates the matrix Frobenius
loss over a fold
object (from 'origami'
(Coyle and Hejazi 2018)). This loss function is equivalent to that
presented in cvFrobeniusLoss()
in terms of estimator
selections, but is more computationally efficient.
Usage
cvMatrixFrobeniusLoss(fold, dat, estimator_funs, estimator_params = NULL)
Arguments
fold |
A |
dat |
A |
estimator_funs |
An |
estimator_params |
A named |
Value
A tibble
providing information on estimators,
their hyperparameters (if any), and their matrix Frobenius loss evaluated
on a given fold
.
References
Coyle J, Hejazi N (2018). “origami: A Generalized Framework for Cross-Validation in R.” Journal of Open Source Software, 3(21), 512. doi:10.21105/joss.00512.
Examples
library(MASS)
library(origami)
library(rlang)
# generate 10x10 covariance matrix with unit variances and off-diagonal
# elements equal to 0.5
Sigma <- matrix(0.5, nrow = 10, ncol = 10) + diag(0.5, nrow = 10)
# sample 50 observations from multivariate normal with mean = 0, var = Sigma
dat <- mvrnorm(n = 50, mu = rep(0, 10), Sigma = Sigma)
# generate a single fold using 2-fold cross-validation
resub <- make_folds(dat,
fold_fun = folds_vfold,
V = 2
)[[1]]
cvMatrixFrobeniusLoss(
fold = resub,
dat = dat,
estimator_funs = rlang::quo(c(
linearShrinkEst, thresholdingEst, sampleCovEst
)),
estimator_params = list(
linearShrinkEst = list(alpha = c(0, 1)),
thresholdingEst = list(gamma = c(0, 1))
)
)
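The claim that the two Frobenius losses select the same estimator can be checked numerically. The sketch below assumes the aggregated loss averages the squared Frobenius distance between each validation observation's outer product and the estimate, and that the matrix loss compares the estimate with the mean of those outer products; under these assumptions the two losses differ by a constant that does not depend on the estimate. All names are hypothetical.

```r
set.seed(42)
valid <- matrix(rnorm(40), nrow = 10)        # 10 validation rows, 4 variables
S_valid <- crossprod(valid) / nrow(valid)    # mean of the row outer products

# Aggregated loss: average over observations of ||x x' - est||_F^2.
aggregated_loss <- function(est) {
  mean(apply(valid, 1, function(x) sum((tcrossprod(x) - est)^2)))
}

# Matrix loss: ||S_valid - est||_F^2, computed once per fold.
matrix_loss <- function(est) sum((S_valid - est)^2)

est_a <- diag(4)
est_b <- diag(diag(S_valid))
gap_a <- aggregated_loss(est_a) - matrix_loss(est_a)
gap_b <- aggregated_loss(est_b) - matrix_loss(est_b)
all.equal(gap_a, gap_b)  # the gap is the same for both estimates
```

Since the gap is constant across estimates, ranking candidates by either loss gives the same ordering, while the matrix version avoids looping over observations.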
Matrix Metrics for cvCovEst Object
Description
cvMatrixMetrics
computes various metrics and properties
for each covariance matrix estimator candidate's estimate.
Usage
cvMatrixMetrics(object, dat_orig)
Arguments
object |
A named list of class |
dat_orig |
The |
Value
A named list of class "cvCovEst"
whose cross-validated risk
assessment is now a tibble
containing the
corresponding metrics for each estimate. The tibble
is grouped by estimator and ordered by the primary hyperparameter if
applicable.
Multiple Heat Map Plot
Description
cvMultiMelt()
visualizes the structure of one or more
covariance matrix estimators through a grid of heat maps, where each heat
map corresponds to a different estimator.
Usage
cvMultiMelt(
dat,
estimator,
stat = "min",
dat_orig,
plot_type = "heatmap",
cv_details,
has_hypers,
abs_v = TRUE
)
Arguments
dat |
A named |
estimator |
A |
stat |
A |
dat_orig |
The |
plot_type |
A |
cv_details |
A |
has_hypers |
A |
abs_v |
A |
Value
A grid of heat map plots comparing the desired covariance matrix estimators.
Summary Statistics of Cross-Validated Risk by Estimator Class
Description
cvRiskByClass()
calculates the following
summary statistics for the cross-validated risk within each class of
estimator passed to cvCovEst()
: minimum, Q1, median, mean, Q3,
and maximum. The results are output as a tibble
.
Usage
cvRiskByClass(dat)
Arguments
dat |
The |
Value
tibble
with rows corresponding to estimator
classes and columns corresponding to each summary statistic.
Cross-Validated Risk Plot
Description
cvRiskPlot()
plots the cross-validated risk for a given
estimator, or set of estimators, as a function of the hyperparameters.
Usage
cvRiskPlot(
dat,
est,
plot_type = "risk",
cv_details,
switch_vars = FALSE,
min_max = FALSE
)
Arguments
dat |
A named |
est |
A |
plot_type |
A |
cv_details |
A |
switch_vars |
A |
min_max |
A |
Value
A single plot or grid of plots for each estimator specified.
Cross-Validation Function for Scaled Matrix Frobenius Loss
Description
cvScaledMatrixFrobeniusLoss()
evaluates the scaled matrix
Frobenius loss over a fold
object (from 'origami'
(Coyle and Hejazi 2018)). The squared error loss computed for each
entry of the estimated covariance matrix is scaled by the training set's
sample variances of the variable associated with that entry's row and
column variables. This loss should be used instead of
cvMatrixFrobeniusLoss()
when a dataset's variables' values
are of different magnitudes.
Usage
cvScaledMatrixFrobeniusLoss(fold, dat, estimator_funs, estimator_params = NULL)
Arguments
fold |
A |
dat |
A |
estimator_funs |
An |
estimator_params |
A named |
Value
A tibble
providing information on estimators,
their hyperparameters (if any), and their scaled matrix Frobenius loss
evaluated on a given fold
.
References
Coyle J, Hejazi N (2018). “origami: A Generalized Framework for Cross-Validation in R.” Journal of Open Source Software, 3(21), 512. doi:10.21105/joss.00512.
Examples
library(MASS)
library(origami)
library(rlang)
# generate 10x10 covariance matrix with unit variances and off-diagonal
# elements equal to 0.5
Sigma <- matrix(0.5, nrow = 10, ncol = 10) + diag(0.5, nrow = 10)
# sample 50 observations from multivariate normal with mean = 0, var = Sigma
dat <- mvrnorm(n = 50, mu = rep(0, 10), Sigma = Sigma)
# generate a single fold using 2-fold cross-validation
resub <- make_folds(dat,
fold_fun = folds_vfold,
V = 2
)[[1]]
cvScaledMatrixFrobeniusLoss(
fold = resub,
dat = dat,
estimator_funs = rlang::quo(c(
linearShrinkEst, thresholdingEst, sampleCovEst
)),
estimator_params = list(
linearShrinkEst = list(alpha = c(0, 1)),
thresholdingEst = list(gamma = c(0, 1))
)
)
Summary Plot
Description
cvSummaryPlot()
combines plots of the eigenvalues and the
covariance heatmap for the optimal estimator selected by
cvCovEst()
, and also provides a table showing the best
estimator within each class. A plot of the risk of the optimal estimator's
class is also provided if applicable.
Usage
cvSummaryPlot(
dat,
estimator,
dat_orig,
stat,
k,
leading,
plot_type = "summary",
cv_details,
has_hypers,
multi_hypers,
abs_v,
switch_vars,
min_max
)
Arguments
dat |
A named |
estimator |
A |
dat_orig |
The |
plot_type |
A |
cv_details |
Character vector summarizing key arguments passed to
|
has_hypers |
A |
multi_hypers |
A |
abs_v |
A |
switch_vars |
A |
min_max |
A |
Value
A collection of plots and summary statistics for the optimal
estimator selected by cvCovEst
.
Linear Shrinkage Estimator, Dense Target
Description
denseLinearShrinkEst()
computes the asymptotically
optimal convex combination of the sample covariance matrix and a dense
target matrix. This target matrix's diagonal elements are equal to the
average of the sample covariance matrix estimate's diagonal elements, and
its off-diagonal elements are equal to the average of the sample covariance
matrix estimate's off-diagonal elements. For information on this
estimator's derivation, see Ledoit and Wolf (2020) and
Schäfer and Strimmer (2005).
Usage
denseLinearShrinkEst(dat)
Arguments
dat |
A numeric |
Value
A matrix
corresponding to the estimate of the covariance
matrix.
References
Ledoit O, Wolf M (2020).
“The Power of (Non-)Linear Shrinking: A Review and Guide to Covariance Matrix Estimation.”
Journal of Financial Econometrics.
ISSN 1479-8409, doi:10.1093/jjfinec/nbaa007, nbaa007, https://academic.oup.com/jfec/advance-article-pdf/doi/10.1093/jjfinec/nbaa007/33416890/nbaa007.pdf.
Schäfer J, Strimmer K (2005).
“A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics.”
Statistical Applications in Genetics and Molecular Biology, 4(1).
doi:10.2202/1544-6115.1175, https://www.degruyter.com/view/journals/sagmb/4/1/article-sagmb.2005.4.1.1175.xml.xml.
Examples
denseLinearShrinkEst(dat = mtcars)
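The dense target described above is easy to construct by hand. The helper dense_target below is a hypothetical sketch that builds only the target matrix from a covariance matrix; it does not compute the asymptotically optimal shrinkage weight that denseLinearShrinkEst() uses to combine the target with the sample covariance.

```r
# Hypothetical sketch: build the dense shrinkage target from a covariance matrix.
dense_target <- function(S) {
  p <- ncol(S)
  off_mean <- mean(S[row(S) != col(S)])  # average off-diagonal entry
  target <- matrix(off_mean, nrow = p, ncol = p)
  diag(target) <- mean(diag(S))          # average variance on the diagonal
  target
}

S <- matrix(c(2, 1, 1, 4), nrow = 2)
dense_target(S)
```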
Estimator Attributes Function
Description
estAttributes()
returns a named list corresponding to the
attributes of a specific estimator implemented in the cvCovEst
package.
Usage
estAttributes(estimator)
Arguments
estimator |
A |
Value
A named list
containing the attributes of the indicated
estimator.
Estimate Noise in Spiked Covariance Matrix Model
Description
estimateNoise()
estimates the unknown noise term in a
Gaussian spiked covariance matrix model, where the covariance matrix is
assumed to be the identity matrix multiplied by the unknown noise, save for
a few "spiked" entries. This procedure is described in
Donoho et al. (2018).
Usage
estimateNoise(eig_vals, p_n_ratio)
Arguments
eig_vals |
A |
p_n_ratio |
A |
Value
A numeric
estimate of the noise term in a spiked covariance
matrix model.
References
Donoho D, Gavish M, Johnstone I (2018). “Optimal shrinkage of eigenvalues in the spiked covariance model.” The Annals of Statistics, 46(4), 1742 – 1778.
Hyperparameter Retrieval Function
Description
getHypers()
retrieves the names and values of all
hyperparameters associated with an estimator passed to cvCovEst()
.
Usage
getHypers(dat, summ_stat, new_df = FALSE)
Arguments
dat |
A |
summ_stat |
A character vector specifying the summary statistic of interest. |
new_df |
A |
Value
A named list
containing the names of all hyperparameters and
their associated values, or a new wider data.frame
.
Summarize Cross-Validated Risks by Class with Hyperparameter
Description
hyperRisk()
groups together estimators of the
same class and parses the hyperparameter values over quantiles of the risk.
Usage
hyperRisk(dat)
Arguments
dat |
The |
Value
A named list
of data frames. Each list element corresponds to
a tibble
of summary statistics for a specific
estimator class. If no estimators have hyper-parameters, a message is
returned.
Check for cvCovEst Class
Description
is.cvCovEst()
provides a generic method for checking if
input is of class cvCovEst
.
Usage
is.cvCovEst(x)
Arguments
x |
The specific object to test. |
Value
A logical
indicating TRUE
if x
inherits from
class cvCovEst
.
Examples
cv_dat <- cvCovEst(
dat = mtcars,
estimators = c(
thresholdingEst, sampleCovEst
),
estimator_params = list(
thresholdingEst = list(gamma = seq(0.1, 0.3, 0.1))
),
center = TRUE,
scale = TRUE
)
is.cvCovEst(cv_dat)
Linear Shrinkage Estimator
Description
linearShrinkEst()
computes the linear shrinkage estimate
of the covariance matrix for a given value of alpha
. The linear
shrinkage estimator is defined as the convex combination of the sample
covariance matrix and the identity matrix. The choice of alpha
determines the bias-variance tradeoff of the estimators in this class:
values near 1 are more likely to exhibit high variance but low bias, and
values near 0 are more likely to be very biased but have low variance.
Usage
linearShrinkEst(dat, alpha)
Arguments
dat |
A numeric |
alpha |
A |
Value
A matrix
corresponding to the estimate of the covariance
matrix.
Examples
linearShrinkEst(dat = mtcars, alpha = 0.1)
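The convex combination can be written in one line. The sketch below assumes, consistently with the bias-variance description above, that alpha weights the sample covariance matrix and 1 - alpha weights the identity; the name linear_shrink is hypothetical.

```r
# Hypothetical sketch: alpha * S + (1 - alpha) * I.
linear_shrink <- function(S, alpha) {
  alpha * S + (1 - alpha) * diag(ncol(S))
}

S <- cov(scale(mtcars[, 1:4]))
linear_shrink(S, alpha = 1)    # returns S unchanged: low bias, high variance
linear_shrink(S, alpha = 0)    # returns the identity: high bias, no variance
linear_shrink(S, alpha = 0.5)  # halfway between the two
```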
Ledoit-Wolf Linear Shrinkage Estimator
Description
linearShrinkLWEst()
computes an asymptotically optimal
convex combination of the sample covariance matrix and the identity matrix.
This convex combination effectively shrinks the eigenvalues of the sample
covariance matrix towards the identity. This estimator is more accurate
than the sample covariance matrix in high-dimensional settings under fairly
loose assumptions. For more information, consider reviewing the manuscript
by Ledoit and Wolf (2004).
Usage
linearShrinkLWEst(dat)
Arguments
dat |
A numeric |
Value
A matrix
corresponding to the Ledoit-Wolf linear shrinkage
estimate of the covariance matrix.
References
Ledoit O, Wolf M (2004). “A well-conditioned estimator for large-dimensional covariance matrices.” Journal of Multivariate Analysis, 88(2), 365 - 411. ISSN 0047-259X, doi:10.1016/S0047-259X(03)00096-4, https://www.sciencedirect.com/science/article/pii/S0047259X03000964.
Examples
linearShrinkLWEst(dat = mtcars)
General Matrix Metrics
Description
matrixMetrics
computes the condition number, sparsity,
and sign of a covariance matrix estimate.
Usage
matrixMetrics(estimate)
Arguments
estimate |
A |
Value
A named list
containing the three values.
Multi-Hyperparameter Risk Plots
Description
multiHyperRisk()
produces plots of the cross-validated
risk for estimators with more than one hyperparameter. The function
transforms one of the hyperparameters into a factor and uses it to
distinguish between the risk of various estimators. If one of the
hyperparameters has only one unique value, that hyperparameter is used as
the factor variable. If all hyperparameters have only one unique value, a
plot is not generated for that estimator class.
Usage
multiHyperRisk(dat, estimator, switch_vars = FALSE, min_max = FALSE)
Arguments
dat |
A |
estimator |
A |
switch_vars |
A |
min_max |
A |
Value
A named list
of plots.
Analytical Non-Linear Shrinkage Estimator
Description
nlShrinkLWEst()
invokes the analytical estimator
presented by Ledoit and Wolf (2018) for applying a
nonlinear shrinkage function to the sample eigenvalues of the covariance
matrix. The shrinkage function relies on an application of the Hilbert
Transform to an estimate of the sample eigenvalues' limiting spectral
density. This estimated density is computed with the Epanechnikov kernel
using a global bandwidth parameter of n^(-1/3)
. The resulting
shrinkage function pulls eigenvalues towards the nearest mode of their
empirical distribution, thus creating a localized shrinkage effect rather
than a global one.
We do not recommend that this estimator be employed when the estimand is the correlation matrix. The diagonal entries of the resulting estimate are not guaranteed to be equal to one.
Usage
nlShrinkLWEst(dat)
Arguments
dat |
A numeric |
Value
A matrix
corresponding to the estimate of the covariance
matrix.
References
Ledoit O, Wolf M (2018). “Analytical nonlinear shrinkage of large-dimensional covariance matrices.” Technical Report 264, Department of Economics - University of Zurich. https://EconPapers.repec.org/RePEc:zur:econwp:264.
Examples
nlShrinkLWEst(dat = mtcars)
Generic Plot Method for cvCovEst
Description
The plot
method is a generic method for plotting objects
of class, "cvCovEst"
. The method is designed as a tool for diagnostic
and exploratory analysis purposes when selecting a covariance matrix
estimator using cvCovEst
.
Usage
## S3 method for class 'cvCovEst'
plot(
x,
dat_orig,
estimator = NULL,
plot_type = c("summary"),
stat = c("min"),
k = NULL,
leading = TRUE,
abs_v = TRUE,
switch_vars = FALSE,
min_max = FALSE,
...
)
Arguments
x |
An object of class, |
dat_orig |
The |
estimator |
A |
plot_type |
A |
stat |
A |
k |
A |
leading |
A |
abs_v |
A |
switch_vars |
A |
min_max |
A |
... |
Additional arguments passed to the plot method. These are not explicitly used and should be ignored by the user. |
Details
This plot method is designed to aid users in understanding the
estimation procedure carried out in cvCovEst()
. There are
currently four different values for plot_type
that can be called:
- "eigen" - Plots the eigenvalues associated with the specified estimator and stat arguments in decreasing order.
- "risk" - Plots the cross-validated risk of the specified estimator as a function of the hyperparameter values passed to cvCovEst(). This type of plot is only compatible with estimators which take hyperparameters as arguments.
- "heatmap" - Plots a covariance heat map associated with the specified estimator and stat arguments. Multiple estimators and performance stats may be specified to produce grids of heat maps.
- "summary" - Specifying this plot type will run all of the above plots for the best performing estimator selected by cvCovEst(). These plots are then combined into a single panel along with a table containing the best performing estimator within each class. If the optimal estimator selected by cvCovEst() does not have hyperparameters, then the risk plot is replaced with a table displaying the minimum, first quartile, median, third quartile, and maximum of the cross-validated risk associated with each class of estimator.
The stat
argument accepts five values. They each correspond to a
summary statistic of the cross-validated risk distribution within a class
of estimator. Possible values are:
- "min" - minimum
- "Q1" - first quartile
- "median" - median
- "Q3" - third quartile
- "max" - maximum
Value
A plot object
Examples
cv_dat <- cvCovEst(
dat = mtcars,
estimators = c(
thresholdingEst, sampleCovEst
),
estimator_params = list(
thresholdingEst = list(gamma = seq(0.1, 0.9, 0.1))
)
)
plot(x = cv_dat, dat_orig = mtcars)
Plot adaptiveLassoEst
Description
plotAdaptiveLassoEst()
performs actions specific to
plotting the cross-validated risk of the Adaptive LASSO estimator.
Usage
plotAdaptiveLassoEst(dat, switch_vars = FALSE, min_max = FALSE)
Arguments
dat |
A data table of cross-validated risks. Specifically, this is the
|
switch_vars |
A |
min_max |
A |
Value
A plot object
Plot poetEst
Description
plotPoetEst()
performs actions specific to plotting
the cross-validated risk of the POET estimator.
Usage
plotPoetEst(dat, switch_vars = FALSE, min_max = FALSE)
Arguments
dat |
A data table of cross-validated risks. Specifically, this is the
|
switch_vars |
A |
min_max |
A |
Value
A plot object
Plot robustPoetEst
Description
plotRobustPoetEst()
performs actions specific to plotting
the cross-validated risk of the Robust POET estimator.
Usage
plotRobustPoetEst(dat, switch_vars = FALSE, min_max = FALSE)
Arguments
dat |
A data table of cross-validated risks. Specifically, this is the
|
switch_vars |
A |
min_max |
A |
Value
A list of plots
POET Estimator
Description
poetEst()
implements the Principal Orthogonal complEment
Thresholding (POET) estimator, a nonparametric, unobserved-factor-based
estimator of the covariance matrix (Fan et al. 2013). The
estimator is defined as the sum of the sample covariance matrix's
rank-k
approximation and its post-thresholding principal orthogonal
complement. The hard thresholding function is used here, though others
could be used instead.
Usage
poetEst(dat, k, lambda)
Arguments
dat |
A numeric |
k |
An |
lambda |
A non-negative |
Value
A matrix
corresponding to the estimate of the covariance
matrix.
References
Fan J, Liao Y, Mincheva M (2013). “Large covariance estimation by thresholding principal orthogonal complements.” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 75(4), 603–680. ISSN 13697412, 14679868, https://www.jstor.org/stable/24772450.
Examples
poetEst(dat = mtcars, k = 2L, lambda = 0.1)
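The two-part structure of POET can be sketched with base R's eigen(). The helper poet_sketch is hypothetical: it takes a precomputed covariance matrix, and it leaves the diagonal of the orthogonal complement unthresholded, which is an assumption of this sketch rather than something stated above.

```r
# Hypothetical sketch: rank-k approximation plus hard-thresholded complement.
poet_sketch <- function(S, k, lambda) {
  eig <- eigen(S, symmetric = TRUE)
  V <- eig$vectors[, seq_len(k), drop = FALSE]
  # Rank-k approximation built from the k leading eigenpairs.
  low_rank <- V %*% diag(eig$values[seq_len(k)], nrow = k) %*% t(V)
  # Principal orthogonal complement.
  resid <- S - low_rank
  # Hard thresholding: zero out entries smaller than lambda in absolute value.
  thresholded <- resid * (abs(resid) >= lambda)
  # Assumption: diagonal entries of the complement are never thresholded.
  diag(thresholded) <- diag(resid)
  low_rank + thresholded
}

S <- cov(scale(mtcars))
est <- poet_sketch(S, k = 2, lambda = 0.1)
dim(est)
```

With lambda = 0 nothing is thresholded and the sketch returns the input matrix, which is a quick sanity check on the decomposition.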
Robust POET Estimator for Elliptical Distributions
Description
robustPoetEst()
implements the robust version of the
Principal Orthogonal complEment Thresholding (POET) estimator, a
nonparametric, unobserved-factor-based estimator of the covariance matrix
when the underlying distribution is elliptical
(Fan et al. 2018). The estimator is defined as the sum of the
sample covariance matrix's rank-k
approximation and its
post-thresholding principal orthogonal complement. The rank-k
approximation is constructed from the sample covariance matrix, its leading
eigenvalues, and its leading eigenvectors. The sample covariance matrix and
leading eigenvalues are initially estimated via an M-estimation procedure
and the marginal Kendall's tau estimator. The leading eigenvectors are
estimated using the spatial Kendall's tau estimator. The hard thresholding
function is used to regularize the idiosyncratic errors' estimated
covariance matrix, though other regularization schemes could be used.
We do not recommend that this estimator be employed when the estimand is the correlation matrix. The diagonal entries of the resulting estimate are not guaranteed to be equal to one.
Usage
robustPoetEst(dat, k, lambda, var_est = c("sample", "mad", "huber"))
Arguments
dat |
A numeric |
k |
An |
lambda |
A non-negative |
var_est |
A |
Value
A matrix
corresponding to the estimate of the covariance
matrix.
References
Fan J, Liao Y, Mincheva M (2013).
“Large covariance estimation by thresholding principal orthogonal complements.”
Journal of the Royal Statistical Society. Series B (Statistical Methodology), 75(4), 603–680.
ISSN 13697412, 14679868, https://www.jstor.org/stable/24772450.
Fan J, Liu H, Wang W (2018).
“Large covariance estimation through elliptical factor models.”
Ann. Statist., 46(4), 1383–1414.
doi:10.1214/17-AOS1588.
Examples
robustPoetEst(dat = mtcars, k = 2L, lambda = 0.1, var_est = "sample")
Safe Centering and Scaling of Columns
Description
safeColScale()
is a safe utility for centering and
scaling an input matrix X
. It is intended to avoid the drawback of
using scale()
on data with constant variance by adding a
small perturbation to truncate the values in such columns. Also, this is
faster than scale()
through relying on
'matrixStats' for a key internal computation.
Usage
safeColScale(
X,
center = TRUE,
scale = TRUE,
tol = .Machine$double.eps,
eps = 0.01
)
Arguments
X |
An input |
center |
A |
scale |
A |
tol |
A tolerance level for the lowest column variance (or standard
deviation) value to be tolerated when scaling is desired. The default is
set to |
eps |
The desired lower bound of the estimated variance for a given
column. When the lowest estimate falls below |
Value
A centered and/or scaled version of the input data.
Note
This is an un-exported function borrowed directly from scPCA.
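The problem that safeColScale() guards against is easy to reproduce: scale() divides by each column's standard deviation, so a constant column yields NaN. The base R sketch below, with the hypothetical name safe_col_scale, replaces standard deviations below tol with eps before dividing. It is a simplified illustration that always centers and scales, not the package's implementation.

```r
# Hypothetical sketch: center and scale columns without dividing by zero.
safe_col_scale <- function(X, tol = .Machine$double.eps, eps = 0.01) {
  X <- as.matrix(X)
  col_sds <- apply(X, 2, sd)
  # Replace (near-)zero standard deviations with a small positive bound.
  col_sds[col_sds < tol] <- eps
  centered <- sweep(X, 2, colMeans(X), "-")
  sweep(centered, 2, col_sds, "/")
}

X <- cbind(a = c(1, 2, 3), b = c(5, 5, 5))
scale(X)[, "b"]           # NaN: the constant column has zero variance
safe_col_scale(X)[, "b"]  # zeros: centered, then divided by eps
```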
Sample Covariance Matrix
Description
sampleCovEst()
computes the sample covariance matrix.
This function is a simple wrapper around covar()
.
Usage
sampleCovEst(dat)
Arguments
dat |
A numeric |
Value
A matrix
corresponding to the estimate of the covariance
matrix.
Examples
sampleCovEst(dat = mtcars)
Smoothly Clipped Absolute Deviation Estimator
Description
scadEst()
applies the SCAD thresholding function of
Fan and Li (2001) to each entry of the sample
covariance matrix. This penalized estimator constitutes a compromise
between hard and soft thresholding of the sample covariance matrix: it is
a linear interpolation between soft thresholding up to 2 * lambda
and hard thresholding after 3.7 * lambda
(Rothman et al. 2009).
Usage
scadEst(dat, lambda)
Arguments
dat |
A numeric |
lambda |
A non-negative |
Value
A matrix
corresponding to the estimate of the covariance
matrix.
References
Fan J, Li R (2001).
“Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties.”
Journal of the American Statistical Association, 96(456), 1348-1360.
doi:10.1198/016214501753382273, https://doi.org/10.1198/016214501753382273.
Rothman AJ, Levina E, Zhu J (2009).
“Generalized Thresholding of Large Covariance Matrices.”
Journal of the American Statistical Association, 104(485), 177-186.
doi:10.1198/jasa.2009.0101, https://doi.org/10.1198/jasa.2009.0101.
Examples
scadEst(dat = mtcars, lambda = 0.2)
Smoothly Clipped Absolute Deviation Thresholding Function
Description
scadThreshold()
applies the smoothly clipped absolute
deviation thresholding function to the entries of a matrix
.
In particular, it is meant to be applied to the sample covariance matrix.
Usage
scadThreshold(entry, lambda, a)
Arguments
entry |
A |
lambda |
A non-negative |
a |
A |
Value
A regularized numeric
.
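The SCAD rule described for scadEst() can be sketched in base R as follows. This assumes the generalized-thresholding form of Rothman et al. (2009): soft thresholding for entries up to 2 * lambda in absolute value, linear interpolation up to a * lambda, and no shrinkage beyond that. The helper name scad_threshold_sketch and the default a = 3.7 are assumptions made for illustration, not the package's code.

```r
# Hypothetical helper (not cvCovEst's scadThreshold()): elementwise SCAD rule.
scad_threshold_sketch <- function(entry, lambda, a = 3.7) {
  abs_entry <- abs(entry)
  # Soft thresholding, used when |entry| <= 2 * lambda.
  soft <- sign(entry) * pmax(abs_entry - lambda, 0)
  # Linear interpolation, used when 2 * lambda < |entry| <= a * lambda.
  interp <- ((a - 1) * entry - sign(entry) * a * lambda) / (a - 2)
  # Entries larger than a * lambda are left untouched.
  ifelse(abs_entry <= 2 * lambda, soft,
         ifelse(abs_entry <= a * lambda, interp, entry))
}

# Apply the rule to every entry of the sample covariance matrix.
scad_cov <- scad_threshold_sketch(stats::cov(mtcars), lambda = 0.2)
```

The three pieces meet continuously at 2 * lambda and a * lambda, which is why the rule sits between soft and hard thresholding.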
Extract Estimated Scaled Eigenvalues in Spiked Covariance Matrix Model
Description
scaleEigVals()
computes the scaled eigenvalues, and
filters out all eigenvalues that do not need to be shrunk.
Usage
scaleEigVals(eig_vals, noise, p_n_ratio, num_spikes)
Arguments
eig_vals |
A |
noise |
|
p_n_ratio |
A |
num_spikes |
|
Value
A numeric
vector of the scaled eigenvalues to be shrunk.
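A minimal sketch of the scaling and filtering step, under two assumptions that are not stated in this entry: eigenvalues are scaled by dividing by the noise level, and, when num_spikes is not supplied, only scaled eigenvalues above the bulk edge (1 + sqrt(p_n_ratio))^2 are kept. The helper scale_eig_vals_sketch is hypothetical.

```r
# Hypothetical helper (not cvCovEst's scaleEigVals()).
scale_eig_vals_sketch <- function(eig_vals, noise, p_n_ratio,
                                  num_spikes = NULL) {
  # Assumption: scaling means dividing by the noise level.
  scaled <- eig_vals / noise
  if (is.null(num_spikes)) {
    # Assumption: eigenvalues at or below the bulk edge are not shrunk.
    bulk_edge <- (1 + sqrt(p_n_ratio))^2
    scaled[scaled > bulk_edge]
  } else {
    # Otherwise keep the num_spikes largest scaled eigenvalues.
    sort(scaled, decreasing = TRUE)[seq_len(num_spikes)]
  }
}

kept <- scale_eig_vals_sketch(c(10, 5, 1.2, 0.8), noise = 1, p_n_ratio = 0.25)
```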
Frobenius Norm Shrinkage Estimator, Spiked Covariance Model
Description
spikedFrobeniusShrinkEst()
implements the asymptotically
optimal shrinkage estimator with respect to the Frobenius loss in a spiked
covariance matrix model. Informally, this model admits Gaussian
data-generating processes whose covariance matrix is a scalar multiple of
the identity, save for a small number of large "spikes". A thorough review of
this estimator, or more generally spiked covariance matrix estimation, is
provided in Donoho et al. (2018).
Usage
spikedFrobeniusShrinkEst(dat, p_n_ratio, num_spikes = NULL, noise = NULL)
Arguments
dat |
A numeric |
p_n_ratio |
A |
num_spikes |
A |
noise |
A |
Value
A matrix
corresponding to the covariance matrix estimate.
References
Donoho D, Gavish M, Johnstone I (2018). “Optimal shrinkage of eigenvalues in the spiked covariance model.” The Annals of Statistics, 46(4), 1742 – 1778.
Examples
spikedFrobeniusShrinkEst(dat = mtcars, p_n_ratio = 0.1, num_spikes = 2L)
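A sketch of the eigenvalue shrinkers that underlie spiked-model estimators, assuming the closed forms reported by Donoho et al. (2018) for a scaled sample eigenvalue lying above the bulk edge. The helper name spiked_shrinkers_sketch and its arguments are hypothetical, and the package's internals may differ, for example in how the noise level is estimated and how the result is rescaled.

```r
# Hypothetical helper: shrink one or more scaled sample eigenvalues.
# `scaled_eig` must exceed (1 + sqrt(gamma))^2; `gamma` plays the role of p_n_ratio.
spiked_shrinkers_sketch <- function(scaled_eig, gamma) {
  # Debiased population eigenvalue ell, obtained by inverting
  # scaled_eig = ell * (1 + gamma / (ell - 1)).
  b <- scaled_eig + 1 - gamma
  ell <- (b + sqrt(b^2 - 4 * scaled_eig)) / 2
  # Squared cosine and sine of the angle between sample and population
  # eigenvectors.
  c2 <- (1 - gamma / (ell - 1)^2) / (1 + gamma / (ell - 1))
  s2 <- 1 - c2
  list(
    frobenius = ell * c2 + s2,
    operator = ell,
    stein = ell / (c2 + ell * s2)
  )
}

shrunk <- spiked_shrinkers_sketch(scaled_eig = 3.75, gamma = 0.5)
```

All three losses share the same debiasing step and differ only in how the debiased eigenvalue is combined with the eigenvector angle.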
Operator Norm Shrinkage Estimator, Spiked Covariance Model
Description
spikedOperatorShrinkEst()
implements the asymptotically
optimal shrinkage estimator with respect to the operator loss in a spiked
covariance matrix model. Informally, this model admits Gaussian
data-generating processes whose covariance matrix is a scalar multiple of
the identity, save for a small number of large "spikes". A thorough review of
this estimator, or more generally spiked covariance matrix estimation, is
provided in Donoho et al. (2018).
Usage
spikedOperatorShrinkEst(dat, p_n_ratio, num_spikes = NULL, noise = NULL)
Arguments
dat |
A numeric |
p_n_ratio |
A |
num_spikes |
A |
noise |
A |
Value
A matrix
corresponding to the covariance matrix estimate.
References
Donoho D, Gavish M, Johnstone I (2018). “Optimal shrinkage of eigenvalues in the spiked covariance model.” The Annals of Statistics, 46(4), 1742 – 1778.
Examples
spikedOperatorShrinkEst(dat = mtcars, p_n_ratio = 0.1, num_spikes = 2L)
Stein Loss Shrinkage Estimator, Spiked Covariance Model
Description
spikedSteinShrinkEst()
implements the asymptotically
optimal shrinkage estimator with respect to the Stein loss in a spiked
covariance matrix model. Informally, this model admits Gaussian
data-generating processes whose covariance matrix is a scalar multiple of
the identity, save for a small number of large "spikes". A thorough review of
this estimator, or more generally spiked covariance matrix estimation, is
provided in Donoho et al. (2018).
Usage
spikedSteinShrinkEst(dat, p_n_ratio, num_spikes = NULL, noise = NULL)
Arguments
dat |
A numeric |
p_n_ratio |
A |
num_spikes |
A |
noise |
A |
Value
A matrix
corresponding to the covariance matrix estimate.
References
Donoho D, Gavish M, Johnstone I (2018). “Optimal shrinkage of eigenvalues in the spiked covariance model.” The Annals of Statistics, 46(4), 1742 – 1778.
Examples
spikedSteinShrinkEst(dat = mtcars, p_n_ratio = 0.1, num_spikes = 2L)
Convert String to Numeric or Integer When Needed
Description
Convert String to Numeric or Integer When Needed
Usage
strToNumber(x)
Arguments
x |
A |
Value
x
converted to the appropriate type.
Generic Summary Method for cvCovEst
Description
summary()
provides summary statistics regarding
the performance of cvCovEst()
and can be used for diagnostic
plotting.
Usage
## S3 method for class 'cvCovEst'
summary(
object,
dat_orig,
summ_fun = c("cvRiskByClass", "bestInClass", "worstInClass", "hyperRisk"),
...
)
Arguments
object |
A named |
dat_orig |
The |
summ_fun |
A |
... |
Additional arguments passed to |
Details
summary()
accepts four different choices for the
summ_fun
argument. The choices are:
- "cvRiskByClass" - Returns the minimum, first quartile, median, third quartile, and maximum of the cross-validated risk associated with each class of estimator passed to cvCovEst().
- "bestInClass" - Returns the specific hyperparameters, if applicable, of the best performing estimator within each class along with other metrics.
- "worstInClass" - Returns the specific hyperparameters, if applicable, of the worst performing estimator within each class along with other metrics.
- "hyperRisk" - For estimators that take hyperparameters as arguments, this function returns the hyperparameters associated with the minimum, first quartile, median, third quartile, and maximum of the cross-validated risk within each class of estimator. Each class has its own tibble, and these tibbles are returned as a list.
Value
A named list
in which each element is the output of one
of the requested summaries.
Examples
cv_dat <- cvCovEst(
dat = mtcars,
estimators = c(
linearShrinkEst, thresholdingEst, sampleCovEst
),
estimator_params = list(
linearShrinkEst = list(alpha = seq(0.1, 0.9, 0.1)),
thresholdingEst = list(gamma = seq(0.1, 0.9, 0.1))
),
center = TRUE,
scale = TRUE
)
summary(cv_dat, mtcars)
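The kind of per-class summary that "cvRiskByClass" describes can be sketched in base R. The risk values below are invented purely for illustration, and the object and column names (risk_tbl, estimator, cv_risk) are hypothetical; this does not reproduce the structure of the object returned by summary().

```r
# Toy cross-validated risks; these numbers are invented for illustration only.
risk_tbl <- data.frame(
  estimator = c(rep("linearShrinkEst", 5), rep("thresholdingEst", 2)),
  cv_risk = c(1, 2, 3, 4, 5, 10, 20)
)

# Minimum, quartiles, and maximum of a numeric vector.
five_num <- function(x) {
  stats::quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
}

# One row per estimator class, one column per summary statistic.
risk_by_class <- t(sapply(split(risk_tbl$cv_risk, risk_tbl$estimator), five_num))
colnames(risk_by_class) <- c("min", "Q1", "median", "Q3", "max")
```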
Tapering Estimator
Description
taperingEst()
estimates the covariance matrix of a
data.frame
-like object with ordered variables by gradually shrinking
the bands of the sample covariance matrix towards zero. The estimator is
defined as the Hadamard product of the sample covariance matrix and a
weight matrix. The amount of shrinkage is dictated by the weight matrix
and is specified by a hyperparameter k
. This estimator is attributed
to Cai et al. (2010).
The weight matrix is a Toeplitz matrix with entries defined as follows. Let
i and j index the rows and columns of the weight matrix, respectively. If
abs(i - j) <= k / 2
, then entry {i, j} in the weight matrix is
equal to 1. If k / 2 < abs(i - j) < k
, then entry {i, j} is equal
to 2 - 2 * abs(i - j) / k
. Otherwise, entry {i, j} is equal to 0.
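The weight matrix defined above, and its Hadamard product with the sample covariance matrix, can be sketched in base R as follows. The helper names are hypothetical and the code is a direct transcription of the three cases in the text rather than the package's implementation.

```r
# Hypothetical helper: p x p Toeplitz weight matrix for bandwidth k.
taper_weights_sketch <- function(p, k) {
  # Distance of each entry from the main diagonal.
  d <- abs(outer(seq_len(p), seq_len(p), "-"))
  w <- matrix(0, nrow = p, ncol = p)
  # Entries within k / 2 of the diagonal are kept as is.
  w[d <= k / 2] <- 1
  # Entries strictly between k / 2 and k are shrunk linearly.
  in_taper <- d > k / 2 & d < k
  w[in_taper] <- 2 - 2 * d[in_taper] / k
  w
}

# Hypothetical helper: elementwise (Hadamard) product with the sample covariance.
tapering_sketch <- function(dat, k) {
  s_hat <- stats::cov(as.matrix(dat))
  s_hat * taper_weights_sketch(ncol(s_hat), k)
}

w <- taper_weights_sketch(p = 5, k = 4)
tapered <- tapering_sketch(mtcars, k = 4)
```

Because the weights depend only on abs(i - j), the ordering of the variables matters: reordering the columns changes which covariances are shrunk.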
Usage
taperingEst(dat, k)
Arguments
dat |
A numeric |
k |
A non-negative, even |
Value
A matrix
corresponding to the estimate of the covariance
matrix.
References
Cai TT, Zhang C, Zhou HH (2010). “Optimal rates of convergence for covariance matrix estimation.” Ann. Statist., 38(4), 2118–2144. doi:10.1214/09-AOS752.
Examples
taperingEst(dat = mtcars, k = 2)
cvCovEst Plot Theme
Description
theme_cvCovEst()
defines the overall theme of the
cvCovEst
package plotting functions and makes changes depending on
which plot function is being called.
Usage
theme_cvCovEst(plot_type, ...)
Arguments
plot_type |
A |
Value
A ggplot
theme.
Hard Thresholding Estimator
Description
thresholdingEst()
computes the hard thresholding estimate
of the covariance matrix for a given value of gamma
. The threshold
estimator of the covariance matrix applies a hard thresholding operator to
each element of the sample covariance matrix. For more information on this
estimator, see Bickel and Levina (2008).
Usage
thresholdingEst(dat, gamma)
Arguments
dat |
A numeric |
gamma |
A non-negative |
Value
A matrix
corresponding to the estimate of the covariance
matrix.
References
Bickel PJ, Levina E (2008). “Covariance regularization by thresholding.” Annals of Statistics, 36(6), 2577–2604. doi:10.1214/08-AOS600.
Examples
thresholdingEst(dat = mtcars, gamma = 0.2)
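A minimal base R sketch of hard thresholding, assuming the operator sets to zero every entry of the sample covariance matrix whose absolute value is below gamma and leaves the rest unchanged. The helper name hard_threshold_sketch is hypothetical, and the treatment of entries exactly equal to gamma is an assumption.

```r
# Hypothetical helper (not cvCovEst's thresholdingEst()).
hard_threshold_sketch <- function(dat, gamma) {
  s_hat <- stats::cov(as.matrix(dat))
  # Zero out small entries; larger entries are kept without shrinkage.
  s_hat[abs(s_hat) < gamma] <- 0
  s_hat
}

thresholded <- hard_threshold_sketch(mtcars, gamma = 0.2)
```

Unlike the SCAD and adaptive LASSO rules, the surviving entries are not shrunk at all, so the estimate is discontinuous in the entries of the sample covariance matrix.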