Help for package cvCovEst

Title:

Cross-Validated Covariance Matrix Estimation

Version:

1.2.2

Description:

An efficient cross-validated approach for covariance matrix estimation, particularly useful in high-dimensional settings. This method relies upon the theory of high-dimensional loss-based covariance matrix estimator selection developed by Boileau et al. (2022) <doi:10.1080/10618600.2022.2110883> to identify the optimal estimator from among a prespecified set of candidates.

Depends:

R (≥ 4.0.0)

Imports:

matrixStats, Matrix, stats, methods, origami, coop, Rdpack, rlang, dplyr, stringr, purrr, tibble, assertthat, RSpectra, ggplot2, ggpubr, RColorBrewer, RMTstat

Suggests:

future, future.apply, MASS, testthat, knitr, rmarkdown, covr, spelling

License:

MIT + file LICENSE

URL:

https://github.com/PhilBoileau/cvCovEst

BugReports:

https://github.com/PhilBoileau/cvCovEst/issues

Encoding:

UTF-8

VignetteBuilder:

knitr

RoxygenNote:

7.3.1

RdMacros:

Rdpack

Language:

en-US

NeedsCompilation:

Packaged:

2024-02-17 19:57:20 UTC; philippe

Author:

Philippe Boileau [aut, cre, cph], Nima Hejazi [aut], Brian Collica [aut], Jamarcus Liu [ctb], Mark van der Laan [ctb, ths], Sandrine Dudoit [ctb, ths]

Maintainer:

Philippe Boileau <philippe_boileau@berkeley.edu>

Repository:

CRAN

Date/Publication:

2024-02-17 20:20:02 UTC

Adaptive LASSO Estimator

Description

adaptiveLassoEst() applies the adaptive LASSO to the entries of the sample covariance matrix. The thresholding function is inspired by the penalized regression introduced by Zou (2006). It assigns a weight to each entry of the sample covariance matrix based on that entry's initial value. This weight determines the relative size of the penalty, so that larger entries are penalized less, which reduces bias (Rothman et al. 2009).

Usage

adaptiveLassoEst(dat, lambda, n)

Arguments

dat

A numeric data.frame, matrix, or similar object.

lambda

A non-negative numeric defining the amount of thresholding applied to each element of dat's sample covariance matrix.

n

A non-negative numeric defining the exponent of the adaptive weight applied to each element of dat's sample covariance matrix.

Value

A matrix corresponding to the estimate of the covariance matrix.

References

Rothman AJ, Levina E, Zhu J (2009). “Generalized Thresholding of Large Covariance Matrices.” Journal of the American Statistical Association, 104(485), 177-186. doi:10.1198/jasa.2009.0101, https://doi.org/10.1198/jasa.2009.0101.

Zou H (2006). “The Adaptive Lasso and Its Oracle Properties.” Journal of the American Statistical Association, 101(476), 1418-1429. doi:10.1198/016214506000000735, https://doi.org/10.1198/016214506000000735.

Examples

adaptiveLassoEst(dat = mtcars, lambda = 0.9, n = 0.9)

Adaptive LASSO Thresholding Function

Description

adaptiveLassoThreshold() applies the adaptive LASSO thresholding function to the entries of a matrix. In particular, it is meant to be applied to the entries of a sample covariance matrix.

Usage

adaptiveLassoThreshold(entry, lambda, n)

Arguments

entry

A numeric entry in a covariance matrix estimate.

lambda

A non-negative numeric defining the amount of thresholding applied to entry.

n

A non-negative numeric defining the exponent of the adaptive weight applied to entry.

Value

A regularized numeric.
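As an illustration, the thresholding rule of Rothman et al. (2009) can be sketched in base R. This is a minimal sketch, not the package's source. The helper name adaptive_lasso_threshold() is hypothetical, and the rule sign(z) * max(|z| - lambda^(n + 1) * |z|^(-n), 0) is assumed to be the one that adaptiveLassoThreshold() implements. Applying it to every entry of the sample covariance matrix, diagonal included, is also an assumption.

```r
# Hypothetical helper (not part of cvCovEst): adaptive LASSO thresholding,
# vectorized over the entries z.
adaptive_lasso_threshold <- function(z, lambda, n) {
  shrunk <- abs(z) - lambda^(n + 1) * abs(z)^(-n)
  # 0 * Inf yields NaN when z == 0 and lambda == 0; treat it as zero
  shrunk[is.nan(shrunk)] <- 0
  # entries whose shrunken magnitude is negative are set to zero
  sign(z) * pmax(shrunk, 0)
}

# Small entries are zeroed out, large entries are shrunk only slightly
entries <- c(-2, -0.5, 0, 0.5, 2)
thresholded <- adaptive_lasso_threshold(entries, lambda = 1, n = 1)

# Entrywise application to a sample covariance matrix
sample_cov <- cov(as.matrix(mtcars))
thresholded_cov <- adaptive_lasso_threshold(sample_cov, lambda = 0.9, n = 0.9)
```

With n = 0 the rule reduces to soft thresholding, since the penalty no longer depends on the size of the entry.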

Banding Estimator

Description

bandingEst() estimates the covariance matrix of data with ordered variables by forcing off-diagonal entries to be zero for indices that are far removed from one another. The {i, j} entry of the estimated covariance matrix will be zero if the absolute value of {i - j} is greater than some non-negative constant k. This estimator was proposed by Bickel and Levina (2008).

Usage

bandingEst(dat, k)

Arguments

dat

A numeric data.frame, matrix, or similar object.

k

A non-negative, numeric integer.

Value

A matrix corresponding to the estimate of the covariance matrix.

References

Bickel PJ, Levina E (2008). “Regularized estimation of large covariance matrices.” Annals of Statistics, 36(1), 199–227. doi:10.1214/009053607000000758.

Examples

bandingEst(dat = mtcars, k = 2L)
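The banding rule described above can be sketched in base R as follows. The helper band_cov() is hypothetical and only illustrates the rule; it assumes the sample covariance matrix is computed with stats::cov(), which may differ from the routine the package uses internally.

```r
# Hypothetical helper (not part of cvCovEst): band the sample covariance matrix
band_cov <- function(dat, k) {
  sample_cov <- cov(as.matrix(dat))
  idx <- seq_len(ncol(sample_cov))
  # zero out entries whose row and column indices differ by more than k
  sample_cov[abs(outer(idx, idx, "-")) > k] <- 0
  sample_cov
}

banded <- band_cov(mtcars, k = 2L)
```

Because the rule depends on |i - j|, the result is only meaningful when the column order of dat reflects a natural ordering of the variables.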

Showing Best Estimator Within Each Class of Estimators

Description

bestInClass() finds the best performing estimator within each class of estimator passed to cvCovEst() and finds the associated hyperparameters if applicable.

Usage

bestInClass(dat, worst = FALSE)

Arguments

dat

The tibble of cross-validated risks which is output by cvCovEst().

worst

This facilitates the option to choose the worst performing estimator in each class. Default is FALSE.

Value

A tibble with rows corresponding to estimator classes and columns for hyperparameter values, cross-validated risk, and other summary metrics for the best (or worst) estimator in that class.

Check Arguments Passed to cvCovEst

Description

checkArgs() verifies that all arguments passed to the cvCovEst() function meet its specifications.

Usage

checkArgs(
  dat,
  estimators,
  estimator_params,
  cv_loss,
  cv_scheme,
  mc_split,
  v_folds,
  parallel
)

Arguments

dat

A numeric data.frame, matrix, or similar object.

estimators

A list of estimator functions to be considered in the cross-validated estimator selection procedure.

estimator_params

A named list of arguments corresponding to the hyperparameters of covariance matrix estimators in estimators. The name of each list element should match the name of an estimator passed to estimators. Each element of the estimator_params is itself a named list, with the names corresponding to a given estimator's hyperparameter(s). These hyperparameters may be in the form of a single numeric or a numeric vector. If no hyperparameter is needed for a given estimator, then the estimator need not be listed.

cv_loss

A function indicating the loss function to be used. This defaults to the Frobenius loss, cvMatrixFrobeniusLoss(). An observation-based version, cvFrobeniusLoss(), is also made available. Additionally, cvScaledMatrixFrobeniusLoss() is included for situations in which dat's variables are of different scales.

cv_scheme

A character indicating the cross-validation scheme to be employed. There are two options: (1) V-fold cross-validation, via "v_fold"; and (2) Monte Carlo cross-validation, via "mc". In cvCovEst(), the default is V-fold cross-validation.

mc_split

A numeric between 0 and 1 indicating the proportion of observations to be included in the validation set of each Monte Carlo cross-validation fold.

v_folds

An integer larger than or equal to 1 indicating the number of folds to use for cross-validation. The default is 10, regardless of the choice of cross-validation scheme.

parallel

A logical option indicating whether to run the main cross-validation loop with future_lapply(). This is passed directly to cross_validate().

Value

Whether all argument conditions are satisfied.

Check Arguments Passed to plot.cvCovEst and summary.cvCovEst

Description

The checkPlotSumArgs() function verifies that all arguments passed to the plot.cvCovEst() and summary.cvCovEst() functions meet their specifications. Some additional arguments may be checked at the individual function level.

Usage

checkPlotSumArgs(
  dat,
  dat_orig,
  which_fun,
  estimator,
  plot_type,
  summ_fun,
  stat,
  k,
  leading,
  abs_v
)

Arguments

dat

An object of class "cvCovEst". Specifically, this is the standard output of cvCovEst().

dat_orig

The numeric data.frame, matrix, or similar object originally passed to cvCovEst().

which_fun

A character indicating a choice of "plot" or "summary" depending on which function is being checked.

estimator

A character vector specifying one or more classes of estimators to compare.

plot_type

A character vector specifying one of four choices of diagnostic plots.

summ_fun

A character vector specifying which summaries to output.

stat

A character vector of one or more summary statistics to use when comparing estimators.

k

An integer indicating the number of leading/trailing eigenvalues to plot.

leading

A logical indicating if the leading eigenvalues should be used.

abs_v

A logical determining if the absolute value of the matrix entries should be used for plotting the matrix heatmap.

Value

Whether all argument conditions are satisfied.

Estimate C of Spiked Covariance Matrix Estimator

Description

computeC() computes the c(ell) value described in Donoho et al. (2018).

Usage

computeC(ell, p_n_ratio)

Arguments

ell

A numeric vector output by computeEll().

p_n_ratio

A numeric indicating the asymptotic ratio of the number of features, p, and the number of observations, n. This ratio is assumed to be between 0 and 1.

Value

A numeric vector.

References

Donoho D, Gavish M, Johnstone I (2018). “Optimal shrinkage of eigenvalues in the spiked covariance model.” The Annals of Statistics, 46(4), 1742 – 1778.

Estimate Ell of Spiked Covariance Matrix Estimator

Description

computeEll() computes the ell value described in Donoho et al. (2018).

Usage

computeEll(scaled_eig_vals, p, p_n_ratio)

Arguments

scaled_eig_vals

A numeric vector of scaled estimated eigenvalues.

p

A numeric integer indicating the number of features in the data.

p_n_ratio

A numeric indicating the asymptotic ratio of the number of features, p, and the number of observations, n. This ratio is assumed to be between 0 and 1.

Value

A numeric vector.

References

Donoho D, Gavish M, Johnstone I (2018). “Optimal shrinkage of eigenvalues in the spiked covariance model.” The Annals of Statistics, 46(4), 1742 – 1778.

Estimate S of Spiked Covariance Matrix Estimator

Description

computeS() computes the s(ell) value described in Donoho et al. (2018).

Usage

computeS(c_donoho)

Arguments

c_donoho

A numeric vector output by computeC().

Value

A numeric vector.

References

Donoho D, Gavish M, Johnstone I (2018). “Optimal shrinkage of eigenvalues in the spiked covariance model.” The Annals of Statistics, 46(4), 1742 – 1778.

Cross-Validated Covariance Matrix Estimator Selector

Description

cvCovEst() identifies the optimal covariance matrix estimator from among a set of candidate estimators.

Usage

cvCovEst(
  dat,
  estimators = c(linearShrinkEst, thresholdingEst, sampleCovEst),
  estimator_params = list(linearShrinkEst = list(alpha = 0), thresholdingEst = list(gamma
    = 0)),
  cv_loss = cvMatrixFrobeniusLoss,
  cv_scheme = "v_fold",
  mc_split = 0.5,
  v_folds = 10L,
  parallel = FALSE,
  ...
)

Arguments

dat

A numeric data.frame, matrix, or similar object.

estimators

A list of estimator functions to be considered in the cross-validated estimator selection procedure.

estimator_params

A named list of arguments corresponding to the hyperparameters of covariance matrix estimators in estimators. The name of each list element should match the name of an estimator passed to estimators. Each element of the estimator_params is itself a named list, with the names corresponding to a given estimator's hyperparameter(s). The hyperparameter(s) may be in the form of a single numeric or a numeric vector. If no hyperparameter is needed for a given estimator, then the estimator need not be listed.

cv_loss

A function indicating the loss function to be used. This defaults to the Frobenius loss, cvMatrixFrobeniusLoss(). An observation-based version, cvFrobeniusLoss(), is also made available. Additionally, the cvScaledMatrixFrobeniusLoss() is included for situations in which dat's variables are of different scales.

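cv_scheme

A character indicating the cross-validation scheme to be employed. There are two options: (1) V-fold cross-validation, via "v_fold"; and (2) Monte Carlo cross-validation, via "mc". Defaults to V-fold cross-validation.
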
mc_split

A numeric between 0 and 1 indicating the proportion of observations to be included in the validation set of each Monte Carlo cross-validation fold.

v_folds

An integer larger than or equal to 1 indicating the number of folds to use for cross-validation. The default is 10, regardless of the choice of cross-validation scheme.

parallel

A logical option indicating whether to run the main cross-validation loop with future_lapply(). This is passed directly to cross_validate().

...

Not currently used. Permits backward compatibility.

Value

A list of results containing the following elements:

estimate - A matrix corresponding to the estimate of the optimal covariance matrix estimator.
estimator - A character indicating the optimal estimator and corresponding hyperparameters, if any.
risk_df - A tibble providing the cross-validated risk estimates of each estimator.
cv_df - A tibble providing each estimator's loss over the folds of the cross-validated procedure.
args - A named list containing arguments passed to cvCovEst.

Examples

cvCovEst(
  dat = mtcars,
  estimators = c(
    linearShrinkLWEst, thresholdingEst, sampleCovEst
  ),
  estimator_params = list(
    thresholdingEst = list(gamma = seq(0.1, 0.3, 0.1))
  )
)
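To make the selection procedure concrete, the following base R sketch mimics V-fold cross-validated selection over a small grid of linear shrinkage estimators. It is a simplified stand-in and does not use cvCovEst(): the helper linear_shrink(), the fold assignment, and the loss (the squared Frobenius distance between the training-set estimate and the validation-set sample covariance matrix) are all assumptions made for illustration.

```r
set.seed(1)

# Standardize so that shrinking towards the identity is sensible
dat <- scale(as.matrix(mtcars))
v_folds <- 5L
fold_id <- sample(rep(seq_len(v_folds), length.out = nrow(dat)))

# Hypothetical candidate family: convex combinations with the identity
linear_shrink <- function(x, alpha) {
  sample_cov <- cov(x)
  alpha * sample_cov + (1 - alpha) * diag(ncol(sample_cov))
}

alphas <- c(0, 0.25, 0.5, 0.75, 1)

# Cross-validated risk: average validation loss over the folds
cv_risk <- sapply(alphas, function(alpha) {
  fold_loss <- sapply(seq_len(v_folds), function(v) {
    train <- dat[fold_id != v, , drop = FALSE]
    valid <- dat[fold_id == v, , drop = FALSE]
    sum((cov(valid) - linear_shrink(train, alpha))^2)
  })
  mean(fold_loss)
})

risk_df <- data.frame(alpha = alphas, cv_risk = cv_risk)

# The selected candidate minimizes the cross-validated risk
best_alpha <- risk_df$alpha[which.min(risk_df$cv_risk)]
final_estimate <- linear_shrink(dat, best_alpha)
```

The final estimate is refit on the full data with the selected hyperparameter, which parallels the roles of the risk_df and estimate elements described above.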

Eigenvalue Plot

Description

cvEigenPlot() plots the eigenvalues of one or more estimators produced by cvCovEst().

Usage

cvEigenPlot(
  dat,
  estimator,
  stat = "min",
  dat_orig,
  k,
  leading = TRUE,
  plot_type = "eigen",
  cv_details,
  has_hypers
)

Arguments

dat

A named list. Specifically, this is the standard output of cvCovEst().

estimator

A character vector specifying one or more classes of estimators to compare.

stat

A character vector containing the names of various cross-validated risk summary statistics. Within each class of estimator, eigenvalues will be plotted for the estimators corresponding to each stat.

dat_orig

The numeric data.frame, matrix, or similar object originally passed to cvCovEst().

k

A numeric indicating the number of eigenvalues to plot. Must be less than or equal to the number of columns of the original data matrix.

leading

A logical indicating if the leading eigenvalues should be used. Default is TRUE. If FALSE, the trailing eigenvalues will be used instead.

plot_type

A character detailing the type of plot. Passed to theme_cvCovEst(). Defaults to "eigen".

cv_details

A character vector summarizing key arguments passed to cvCovEst().

has_hypers

A character vector containing the names of current estimators with hyperparameters.

Value

A plot, or grid of plots, showing the k leading or trailing eigenvalues of the specified estimators and associated summary statistics of the cross-validated risk.
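For reference, the quantities being plotted can be obtained in base R as sketched below. This is only an illustration of what "k leading or trailing eigenvalues" means for a single estimate, here the sample covariance matrix of mtcars; it does not reproduce the plotting code, and the variable names are hypothetical.

```r
sample_cov <- cov(as.matrix(mtcars))

# eigen() returns the eigenvalues in decreasing order
eig_vals <- eigen(sample_cov, symmetric = TRUE, only.values = TRUE)$values

k <- 3L
leading_vals <- head(eig_vals, k)   # analogous to leading = TRUE
trailing_vals <- tail(eig_vals, k)  # analogous to leading = FALSE
```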

Cross-Validation Function for Aggregated Frobenius Loss

Description

cvFrobeniusLoss() evaluates the aggregated Frobenius loss over a fold object (from 'origami' (Coyle and Hejazi 2018)).

Usage

cvFrobeniusLoss(fold, dat, estimator_funs, estimator_params = NULL)

Arguments

fold

A fold object (from make_folds()) over which the estimation procedure is to be performed.

dat

A data.frame containing the full (non-sample-split) data, on which the cross-validated procedure is performed.

estimator_funs

An expression corresponding to a vector of covariance matrix estimator functions to be applied to the training data.

estimator_params

A named list of arguments corresponding to the hyperparameters of covariance matrix estimators in estimator_funs. The name of each list element should be the name of an estimator passed to estimator_funs. Each element of estimator_params is itself a named list, with names corresponding to an estimator's hyperparameter(s). These hyperparameters may be in the form of a single numeric or a numeric vector. If no hyperparameter is needed for a given estimator, then the estimator need not be listed.

Value

A tibble providing information on estimators, their hyperparameters (if any), and their aggregated Frobenius loss evaluated on a given fold.

References

Coyle J, Hejazi N (2018). “origami: A Generalized Framework for Cross-Validation in R.” Journal of Open Source Software, 3(21), 512. doi:10.21105/joss.00512.

Examples

library(MASS)
library(origami)
library(rlang)

# generate 10x10 covariance matrix with unit variances and off-diagonal
# elements equal to 0.5
Sigma <- matrix(0.5, nrow = 10, ncol = 10) + diag(0.5, nrow = 10)

# sample 50 observations from multivariate normal with mean = 0, var = Sigma
dat <- mvrnorm(n = 50, mu = rep(0, 10), Sigma = Sigma)

# generate a single fold using 2-fold cross-validation
resub <- make_folds(dat,
  fold_fun = folds_vfold,
  V = 2
)[[1]]
cvFrobeniusLoss(
  fold = resub,
  dat = dat,
  estimator_funs = rlang::quo(c(
    linearShrinkEst, thresholdingEst, sampleCovEst
  )),
  estimator_params = list(
    linearShrinkEst = list(alpha = c(0, 1)),
    thresholdingEst = list(gamma = c(0, 1))
  )
)

Cross-Validation Function for Matrix Frobenius Loss

Description

cvMatrixFrobeniusLoss() evaluates the matrix Frobenius loss over a fold object (from 'origami' (Coyle and Hejazi 2018)). This loss function is equivalent to that presented in cvFrobeniusLoss() in terms of estimator selections, but is more computationally efficient.

Usage

cvMatrixFrobeniusLoss(fold, dat, estimator_funs, estimator_params = NULL)

Arguments

fold

A fold object (from make_folds()) over which the estimation procedure is to be performed.

dat

A data.frame containing the full (non-sample-split) data, on which the cross-validated procedure is performed.

estimator_funs

An expression corresponding to a vector of covariance matrix estimator functions to be applied to the training data.

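estimator_params

A named list of arguments corresponding to the hyperparameters of covariance matrix estimators in estimator_funs. The name of each list element should be the name of an estimator passed to estimator_funs. Each element of estimator_params is itself a named list, with names corresponding to an estimator's hyperparameter(s). These hyperparameters may be in the form of a single numeric or a numeric vector. If no hyperparameter is needed for a given estimator, then the estimator need not be listed.
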
Value

A tibble providing information on estimators, their hyperparameters (if any), and their matrix Frobenius loss evaluated on a given fold.

References

Coyle J, Hejazi N (2018). “origami: A Generalized Framework for Cross-Validation in R.” Journal of Open Source Software, 3(21), 512. doi:10.21105/joss.00512.

Examples

library(MASS)
library(origami)
library(rlang)

# generate 10x10 covariance matrix with unit variances and off-diagonal
# elements equal to 0.5
Sigma <- matrix(0.5, nrow = 10, ncol = 10) + diag(0.5, nrow = 10)

# sample 50 observations from multivariate normal with mean = 0, var = Sigma
dat <- mvrnorm(n = 50, mu = rep(0, 10), Sigma = Sigma)

# generate a single fold using 2-fold cross-validation
resub <- make_folds(dat,
  fold_fun = folds_vfold,
  V = 2
)[[1]]
cvMatrixFrobeniusLoss(
  fold = resub,
  dat = dat,
  estimator_funs = rlang::quo(c(
    linearShrinkEst, thresholdingEst, sampleCovEst
  )),
  estimator_params = list(
    linearShrinkEst = list(alpha = c(0, 1)),
    thresholdingEst = list(gamma = c(0, 1))
  )
)

Matrix Metrics for cvCovEst Object

Description

cvMatrixMetrics() computes various metrics and properties for each covariance matrix estimator candidate's estimate.

Usage

cvMatrixMetrics(object, dat_orig)

Arguments

object

A named list of class "cvCovEst" containing the cross-validated risk assessment.

dat_orig

The numeric data.frame, matrix, or similar object originally passed to cvCovEst().

Value

A named list of class "cvCovEst" whose cross-validated risk assessment is now a tibble containing the corresponding metrics for each estimate. The tibble is grouped by estimator and ordered by the primary hyperparameter if applicable.

Multiple Heat Map Plot

Description

cvMultiMelt() visualizes the structure of one or more covariance matrix estimators through a grid of heat maps, where each heat map corresponds to a different estimator.

Usage

cvMultiMelt(
  dat,
  estimator,
  stat = "min",
  dat_orig,
  plot_type = "heatmap",
  cv_details,
  has_hypers,
  abs_v = TRUE
)

Arguments

dat

A named list. Specifically, this is the standard output of cvCovEst.

estimator

A character vector specifying one or more classes of estimators to visualize and compare.

stat

A character vector containing the names of various cross-validated risk summary statistics. Estimators corresponding to each statistic will be visualized with a different heatmap. Default is 'min' for minimum cross-validated risk.

dat_orig

The numeric data.frame, matrix, or similar object originally passed to cvCovEst().

plot_type

A character detailing the type of plot. Passed to theme_cvCovEst(). Defaults to "heatmap".

cv_details

A character vector summarizing key arguments passed to cvCovEst().

has_hypers

A character vector containing the names of current estimators with hyperparameters.

abs_v

A logical determining if the absolute value of the matrix entries should be displayed versus the signed value. Default is TRUE.

Value

A grid of heat map plots comparing the desired covariance matrix estimators.

Summary Statistics of Cross-Validated Risk by Estimator Class

Description

cvRiskByClass() calculates the following summary statistics for the cross-validated risk within each class of estimator passed to cvCovEst(): minimum, Q1, median, mean, Q3, and maximum. The results are output as a tibble.

Usage

cvRiskByClass(dat)

Arguments

dat

The tibble of cross-validated risk calculations which is output by cvCovEst().

Value

A tibble with rows corresponding to estimator classes and columns corresponding to each summary statistic.
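The six summary statistics can be illustrated on a toy risk table in base R. The table below is made up for the example, and its column names, estimator and cv_risk, are assumptions rather than a guaranteed description of the risk_df returned by cvCovEst(). The helper summarize_risk() is hypothetical.

```r
# Toy table of cross-validated risks (values invented for illustration)
risk_df <- data.frame(
  estimator = c(
    "thresholdingEst", "thresholdingEst", "thresholdingEst",
    "linearShrinkEst", "linearShrinkEst", "sampleCovEst"
  ),
  cv_risk = c(4, 2, 3, 5, 1, 6)
)

# Hypothetical helper: the six statistics for one class of estimator
summarize_risk <- function(risk) {
  c(
    min = min(risk),
    Q1 = unname(quantile(risk, 0.25)),
    median = median(risk),
    mean = mean(risk),
    Q3 = unname(quantile(risk, 0.75)),
    max = max(risk)
  )
}

# One row per estimator class, one column per statistic
risk_by_class <- do.call(
  rbind,
  lapply(split(risk_df$cv_risk, risk_df$estimator), summarize_risk)
)
```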

Cross-Validated Risk Plot

Description

cvRiskPlot() plots the cross-validated risk for a given estimator, or set of estimators, as a function of the hyperparameters.

Usage

cvRiskPlot(
  dat,
  est,
  plot_type = "risk",
  cv_details,
  switch_vars = FALSE,
  min_max = FALSE
)

Arguments

dat

A named list. Specifically, this is the standard output of cvCovEst().

est

A character vector specifying one or more classes of estimators to compare.

plot_type

A character detailing the type of plot. Passed to theme_cvCovEst(). Defaults to "risk".

cv_details

A character vector summarizing key arguments passed to cvCovEst().

switch_vars

A logical. If TRUE, the hyperparameters used for the x-axis and factor variables are switched. Only applies to estimators with more than one hyperparameter. Defaults to FALSE.

min_max

A logical. If TRUE, only the minimum and maximum values of the factor hyperparameter will be used. Only applies to estimators with more than one hyperparameter. Defaults to FALSE.

Value

A single plot or grid of plots for each estimator specified.

Cross-Validation Function for Scaled Matrix Frobenius Loss

Description

cvScaledMatrixFrobeniusLoss() evaluates the scaled matrix Frobenius loss over a fold object (from 'origami' (Coyle and Hejazi 2018)). The squared error loss computed for each entry of the estimated covariance matrix is scaled by the training set's sample variances of the variable associated with that entry's row and column variables. This loss should be used instead of cvMatrixFrobeniusLoss() when a dataset's variables' values are of different magnitudes.

Usage

cvScaledMatrixFrobeniusLoss(fold, dat, estimator_funs, estimator_params = NULL)

Arguments

fold

A fold object (from make_folds()) over which the estimation procedure is to be performed.

dat

A data.frame containing the full (non-sample-split) data, on which the cross-validated procedure is performed.

estimator_funs

An expression corresponding to a vector of covariance matrix estimator functions to be applied to the training data.

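estimator_params

A named list of arguments corresponding to the hyperparameters of covariance matrix estimators in estimator_funs. The name of each list element should be the name of an estimator passed to estimator_funs. Each element of estimator_params is itself a named list, with names corresponding to an estimator's hyperparameter(s). These hyperparameters may be in the form of a single numeric or a numeric vector. If no hyperparameter is needed for a given estimator, then the estimator need not be listed.
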
Value

A tibble providing information on estimators, their hyperparameters (if any), and their scaled matrix Frobenius loss evaluated on a given fold.

References

Coyle J, Hejazi N (2018). “origami: A Generalized Framework for Cross-Validation in R.” Journal of Open Source Software, 3(21), 512. doi:10.21105/joss.00512.

Examples

library(MASS)
library(origami)
library(rlang)

# generate 10x10 covariance matrix with unit variances and off-diagonal
# elements equal to 0.5
Sigma <- matrix(0.5, nrow = 10, ncol = 10) + diag(0.5, nrow = 10)

# sample 50 observations from multivariate normal with mean = 0, var = Sigma
dat <- mvrnorm(n = 50, mu = rep(0, 10), Sigma = Sigma)

# generate a single fold using 2-fold cross-validation
resub <- make_folds(dat,
  fold_fun = folds_vfold,
  V = 2
)[[1]]
cvScaledMatrixFrobeniusLoss(
  fold = resub,
  dat = dat,
  estimator_funs = rlang::quo(c(
    linearShrinkEst, thresholdingEst, sampleCovEst
  )),
  estimator_params = list(
    linearShrinkEst = list(alpha = c(0, 1)),
    thresholdingEst = list(gamma = c(0, 1))
  )
)
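The effect of the scaling can be illustrated with a base R sketch. It encodes one reading of the description above, namely that each entry's squared error is divided by the product of the training-set sample variances of its row and column variables. Whether the package uses exactly this form is an assumption, and the helper scaled_frobenius_loss() is hypothetical.

```r
# Hypothetical helper: squared errors scaled by training-set variances
scaled_frobenius_loss <- function(train, valid, estimator_fun) {
  estimate <- estimator_fun(train)
  train_var <- apply(train, 2, var)
  sum((cov(valid) - estimate)^2 / outer(train_var, train_var))
}

set.seed(1)
dat <- as.matrix(mtcars)
in_train <- sample(rep(c(TRUE, FALSE), length.out = nrow(dat)))

loss <- scaled_frobenius_loss(dat[in_train, ], dat[!in_train, ], cov)

# Expressing a variable in different units leaves the scaled loss unchanged
dat_rescaled <- dat
dat_rescaled[, "hp"] <- dat_rescaled[, "hp"] / 100
loss_rescaled <- scaled_frobenius_loss(
  dat_rescaled[in_train, ], dat_rescaled[!in_train, ], cov
)
```

Under this reading, variables measured on large scales no longer dominate the loss, which is the motivation given for preferring it over cvMatrixFrobeniusLoss() on such data.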

Summary Plot

Description

cvSummaryPlot() combines plots of the eigenvalues and the covariance heatmap for the optimal estimator selected by cvCovEst(), and also provides a table showing the best estimator within each class. A plot of the risk of the optimal estimator's class is also provided if applicable.

Usage

cvSummaryPlot(
  dat,
  estimator,
  dat_orig,
  stat,
  k,
  leading,
  plot_type = "summary",
  cv_details,
  has_hypers,
  multi_hypers,
  abs_v,
  switch_vars,
  min_max
)

Arguments

dat

A named list. Specifically, this is the standard output of cvCovEst.

estimator

A character vector specifying which class of estimator to plot.

dat_orig

The numeric data.frame, matrix, or similar object originally passed to cvCovEst.

plot_type

A character detailing the type of plot. Passed to theme_cvCovEst(). Defaults to "summary".

cv_details

A character vector summarizing key arguments passed to cvCovEst().

has_hypers

A character vector containing the names of current estimators with hyperparameters.

multi_hypers

A character vector containing the names of current estimators with multiple hyperparameters.

abs_v

A logical determining if the absolute value of the matrix entries should be used for plotting the matrix heatmap. Default is TRUE.

switch_vars

A logical. If TRUE, the hyperparameters used for the x-axis and factor variables are switched. Only applies to estimators with more than one hyperparameter. Defaults to FALSE.

min_max

A logical. If TRUE, only the minimum and maximum values of the factor hyperparameter will be used. Only applies to estimators with more than one hyperparameter. Defaults to FALSE.

Value

A collection of plots and summary statistics for the optimal estimator selected by cvCovEst.

Linear Shrinkage Estimator, Dense Target

Description

denseLinearShrinkEst() computes the asymptotically optimal convex combination of the sample covariance matrix and a dense target matrix. This target matrix's diagonal elements are equal to the average of the sample covariance matrix estimate's diagonal elements, and its off-diagonal elements are equal to the average of the sample covariance matrix estimate's off-diagonal elements. For information on this estimator's derivation, see Ledoit and Wolf (2020) and Schäfer and Strimmer (2005).

Usage

denseLinearShrinkEst(dat)

Arguments

dat

A numeric data.frame, matrix, or similar object.

Value

A matrix corresponding to the estimate of the covariance matrix.

References

Ledoit O, Wolf M (2020). “The Power of (Non-)Linear Shrinking: A Review and Guide to Covariance Matrix Estimation.” Journal of Financial Econometrics. ISSN 1479-8409, doi:10.1093/jjfinec/nbaa007, nbaa007, https://academic.oup.com/jfec/advance-article-pdf/doi/10.1093/jjfinec/nbaa007/33416890/nbaa007.pdf.

Schäfer J, Strimmer K (2005). “A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics.” Statistical Applications in Genetics and Molecular Biology, 4(1). doi:10.2202/1544-6115.1175, https://www.degruyter.com/view/journals/sagmb/4/1/article-sagmb.2005.4.1.1175.xml.xml.

Examples

denseLinearShrinkEst(dat = mtcars)
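The dense target matrix described above can be constructed in base R as sketched below. The helper dense_target() is hypothetical and builds only the target; it does not compute the asymptotically optimal shrinkage weight, for which see the references.

```r
# Hypothetical helper (not part of cvCovEst): build the dense shrinkage target
dense_target <- function(dat) {
  sample_cov <- cov(as.matrix(dat))
  p <- ncol(sample_cov)
  off_diag <- row(sample_cov) != col(sample_cov)
  # off-diagonal entries: average off-diagonal element of the sample covariance
  target <- matrix(mean(sample_cov[off_diag]), nrow = p, ncol = p)
  # diagonal entries: average sample variance
  diag(target) <- mean(diag(sample_cov))
  target
}

target <- dense_target(mtcars)
```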

Estimator Attributes Function

Description

estAttributes() returns a named list corresponding to the attributes of a specific estimator implemented in the cvCovEst package.

Usage

estAttributes(estimator)

Arguments

estimator

A character vector specifying a class of estimator.

Value

A named list containing the attributes of the indicated estimator.

Estimate Noise in Spiked Covariance Matrix Model

Description

estimateNoise() estimates the unknown noise term in a Gaussian spiked covariance matrix model, where the covariance matrix is assumed to be the identity matrix multiplied by the unknown noise, save for a few "spiked" entries. This procedure is described in Donoho et al. (2018).

Usage

estimateNoise(eig_vals, p_n_ratio)

Arguments

eig_vals

A numeric vector of estimated eigenvalues.

p_n_ratio

A numeric indicating the asymptotic ratio of the number of features, p, and the number of observations, n. This ratio is assumed to be between 0 and 1.

Value

A numeric estimate of the noise term in a spiked covariance matrix model.

References

Donoho D, Gavish M, Johnstone I (2018). “Optimal shrinkage of eigenvalues in the spiked covariance model.” The Annals of Statistics, 46(4), 1742 – 1778.

Hyperparameter Retrieval Function

Description

getHypers() retrieves the names and values of all hyperparameters associated with an estimator passed to cvCovEst().

Usage

getHypers(dat, summ_stat, new_df = FALSE)

Arguments

dat

A data.frame of estimators and their hyperparameter values. Specifically, this is one of the outputs of summary.cvCovEst() or cvCovEst().

summ_stat

A character vector specifying the summary statistic of interest.

new_df

A logical indicating whether a new data.frame should be returned with columns for individual hyperparameters. Default is FALSE.

Value

A named list containing the names of all hyperparameters and their associated values, or a new wider data.frame.

Summarize Cross-Validated Risks by Class with Hyperparameter

Description

hyperRisk() groups together estimators of the same class and parses the hyperparameter values over quantiles of the risk.

Usage

hyperRisk(dat)

Arguments

dat

The tibble of cross-validated risk calculations which is output by cvCovEst().

Value

A named list of data frames. Each list element corresponds to a tibble of summary statistics for a specific estimator class. If no estimators have hyperparameters, a message is returned.

Check for cvCovEst Class

Description

is.cvCovEst() provides a generic method for checking if input is of class cvCovEst.

Usage

is.cvCovEst(x)

Arguments

x

The specific object to test.

Value

A logical indicating TRUE if x inherits from class cvCovEst.

Examples

cv_dat <- cvCovEst(
  dat = mtcars,
  estimators = c(
    thresholdingEst, sampleCovEst
  ),
  estimator_params = list(
    thresholdingEst = list(gamma = seq(0.1, 0.3, 0.1))
  ),
  center = TRUE,
  scale = TRUE
)

is.cvCovEst(cv_dat)

Linear Shrinkage Estimator

Description

linearShrinkEst() computes the linear shrinkage estimate of the covariance matrix for a given value of alpha. The linear shrinkage estimator is defined as the convex combination of the sample covariance matrix and the identity matrix. The choice of alpha determines the bias-variance tradeoff of the estimators in this class: values near 1 are more likely to exhibit high variance but low bias, and values near 0 are more likely to be very biased but have low variance.

Usage

linearShrinkEst(dat, alpha)

Arguments

dat

A numeric data.frame, matrix, or similar object.

alpha

A numeric between 0 and 1 defining convex combinations of the sample covariance matrix and the identity. alpha = 1 produces the sample covariance matrix, and alpha = 0 returns the identity.

Value

A matrix corresponding to the estimate of the covariance matrix.

Examples

linearShrinkEst(dat = mtcars, alpha = 0.1)
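The convex combination can be written out in base R as a minimal sketch. The helper linear_shrink() is hypothetical, and it assumes the sample covariance matrix is computed with stats::cov(); the package may compute it differently.

```r
# Hypothetical helper (not part of cvCovEst):
# alpha * (sample covariance) + (1 - alpha) * identity
linear_shrink <- function(dat, alpha) {
  sample_cov <- cov(as.matrix(dat))
  alpha * sample_cov + (1 - alpha) * diag(ncol(sample_cov))
}

# The two endpoints recover the sample covariance matrix and the identity
at_one <- linear_shrink(mtcars, alpha = 1)
at_zero <- linear_shrink(mtcars, alpha = 0)
shrunk <- linear_shrink(mtcars, alpha = 0.1)
```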

Ledoit-Wolf Linear Shrinkage Estimator

Description

linearShrinkLWEst() computes an asymptotically optimal convex combination of the sample covariance matrix and the identity matrix. This convex combination effectively shrinks the eigenvalues of the sample covariance matrix towards the identity. This estimator is more accurate than the sample covariance matrix in high-dimensional settings under fairly loose assumptions. For more information, consider reviewing the manuscript by Ledoit and Wolf (2004).

Usage

linearShrinkLWEst(dat)

Arguments

dat

A numeric data.frame, matrix, or similar object.

Value

A matrix corresponding to the Ledoit-Wolf linear shrinkage estimate of the covariance matrix.

References

Ledoit O, Wolf M (2004). “A well-conditioned estimator for large-dimensional covariance matrices.” Journal of Multivariate Analysis, 88(2), 365 - 411. ISSN 0047-259X, doi:10.1016/S0047-259X(03)00096-4, https://www.sciencedirect.com/science/article/pii/S0047259X03000964.

Examples

linearShrinkLWEst(dat = mtcars)

General Matrix Metrics

Description

matrixMetrics() computes the condition number, sparsity, and sign of a covariance matrix estimate.

Usage

matrixMetrics(estimate)

Arguments

estimate

A matrix corresponding to a covariance matrix estimate.

Value

A named list containing the three values.
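The three metrics can be illustrated with a short base R sketch. The helper matrix_metrics_sketch(), its list names, the tolerance tol, and the exact definitions used here (ratio of extreme absolute eigenvalues, proportion of zero entries, definiteness label) are assumptions made for illustration; the package may define these quantities differently.

```r
# Hypothetical helper (not part of cvCovEst): one plausible reading of the
# condition number, sparsity, and sign of a covariance matrix estimate.
matrix_metrics_sketch <- function(estimate, tol = 1e-10) {
  eig_vals <- eigen(estimate, symmetric = TRUE, only.values = TRUE)$values
  abs_vals <- abs(eig_vals)
  # ratio of the largest to the smallest absolute eigenvalue
  cond_num <- max(abs_vals) / min(abs_vals)
  # proportion of entries that are exactly zero
  sparsity <- mean(estimate == 0)
  # definiteness label derived from the signs of the eigenvalues
  definiteness <- if (all(eig_vals > tol)) {
    "positive definite"
  } else if (all(eig_vals >= -tol)) {
    "positive semi-definite"
  } else {
    "not positive semi-definite"
  }
  list(cond_num = cond_num, sparsity = sparsity, sign = definiteness)
}

metrics <- matrix_metrics_sketch(diag(c(4, 2, 1)))
```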

Multi-Hyperparameter Risk Plots

Description

multiHyperRisk() produces plots of the cross-validated risk for estimators with more than one hyperparameter. The function transforms one of the hyperparameters into a factor and uses it to distinguish between the risk of various estimators. If one of the hyperparameters has only one unique value, that hyperparameter is used as the factor variable. If all hyperparameters have only one unique value, a plot is not generated for that estimator class.

Usage

multiHyperRisk(dat, estimator, switch_vars = FALSE, min_max = FALSE)

Arguments

dat

A data.frame of cross-validated risks. Specifically, this is the risk_df table output by cvCovEst().

estimator

A character vector specifying one or more classes of estimators to compare.

switch_vars

A logical indicating if the x-axis and factor variables should be switched. Default is FALSE.

min_max

A logical. If TRUE, only the minimum and maximum values of the factor hyperparameter will be used. Defaults to FALSE.

Value

A named list of plots.

Analytical Non-Linear Shrinkage Estimator

Description

nlShrinkLWEst() invokes the analytical estimator presented by Ledoit and Wolf (2018) for applying a nonlinear shrinkage function to the sample eigenvalues of the covariance matrix. The shrinkage function relies on an application of the Hilbert Transform to an estimate of the sample eigenvalues' limiting spectral density. This estimated density is computed with the Epanechnikov kernel using a global bandwidth parameter of n^(-1/3). The resulting shrinkage function pulls eigenvalues towards the nearest mode of their empirical distribution, thus creating a localized shrinkage effect rather than a global one.

We do not recommend that this estimator be employed when the estimand is the correlation matrix. The diagonal entries of the resulting estimate are not guaranteed to be equal to one.

Usage

nlShrinkLWEst(dat)

Arguments

dat

A numeric data.frame, matrix, or similar object.

Value

A matrix corresponding to the estimate of the covariance matrix.

References

Ledoit O, Wolf M (2018). “Analytical nonlinear shrinkage of large-dimensional covariance matrices.” Technical Report 264, Department of Economics - University of Zurich. https://EconPapers.repec.org/RePEc:zur:econwp:264.

Examples

nlShrinkLWEst(dat = mtcars)

Generic Plot Method for cvCovEst

Description

The plot method is a generic method for plotting objects of class "cvCovEst". The method is designed as a tool for diagnostic and exploratory analysis when selecting a covariance matrix estimator using cvCovEst.

Usage

## S3 method for class 'cvCovEst'
plot(
  x,
  dat_orig,
  estimator = NULL,
  plot_type = c("summary"),
  stat = c("min"),
  k = NULL,
  leading = TRUE,
  abs_v = TRUE,
  switch_vars = FALSE,
  min_max = FALSE,
  ...
)

Arguments

x

An object of class "cvCovEst". Specifically, this is the standard output of the function cvCovEst.

dat_orig

The numeric data.frame, matrix, or similar object originally passed to cvCovEst.

estimator

A character vector specifying one or more classes of estimators to compare. If NULL, the class of estimator associated with optimal cvCovEst selection is used.

plot_type

A character vector specifying one of four choices of diagnostic plots. Default is "summary". See Details for more about each plotting choice.

stat

A character vector of one or more summary statistics to use when comparing estimators. Default is "min" for minimum cross-validated risk. See Details for more options.

k

An integer indicating the number of leading/trailing eigenvalues to plot. If NULL, will default to the number of columns in dat_orig.

leading

A logical indicating if the leading eigenvalues should be used. Default is TRUE. If FALSE, the trailing eigenvalues are used instead.

abs_v

A logical determining if the absolute value of the matrix entries should be used for plotting the matrix heat map. Default is TRUE.

switch_vars

A logical. If TRUE, the hyperparameters used for the x-axis and factor variables are switched in the plot of the cross-validated risk. Only applies to estimators with more than one hyperparameter. Default is FALSE.

min_max

A logical. If TRUE, only the minimum and maximum values of the factor hyperparameter will be used. Only applies to estimators with more than one hyperparameter. Default is FALSE.

...

Additional arguments passed to the plot method. These are not explicitly used and should be ignored by the user.

Details

This plot method is designed to aid users in understanding the estimation procedure carried out in cvCovEst(). There are currently four different values for plot_type that can be called:

"eigen" - Plots the eigenvalues associated with the specified estimator and stat arguments in decreasing order.
"risk" - Plots the cross-validated risk of the specified estimator as a function of the hyperparameter values passed to cvCovEst(). This type of plot is only compatible with estimators which take hyperparameters as arguments.
"heatmap" - Plots a covariance heat map associated with the specified estimator and stat arguments. Multiple estimators and performance stats may be specified to produce grids of heat maps.
"summary" - Specifying this plot type will run all of the above plots for the best performing estimator selected by cvCovEst(). These plots are then combined into a single panel along with a table containing the best performing estimator within each class. If the optimal estimator selected by cvCovEst() does not have hyperparameters, then the risk plot is replaced with a table displaying the minimum, first quartile, median, third quartile, and maximum of the cross-validated risk associated with each class of estimator.

The stat argument accepts five values. They each correspond to a summary statistic of the cross-validated risk distribution within a class of estimator. Possible values are:

"min" - minimum
"Q1" - first quartile
"median" - median
"Q3" - third quartile
"max" - maximum

Value

A plot object

Examples

cv_dat <- cvCovEst(
  dat = mtcars,
  estimators = c(
    thresholdingEst, sampleCovEst
  ),
  estimator_params = list(
    thresholdingEst = list(gamma = seq(0.1, 0.9, 0.1))
  )
)

plot(x = cv_dat, dat_orig = mtcars)

Plot adaptiveLassoEst

Description

plotAdaptiveLassoEst() performs actions specific to plotting the cross-validated risk of the Adaptive LASSO estimator.

Usage

plotAdaptiveLassoEst(dat, switch_vars = FALSE, min_max = FALSE)

Arguments

dat

A data table of cross-validated risks. Specifically, this is the risk_df table output by cvCovEst().

switch_vars

A logical indicating if the x-axis and factor variables should be switched. Default is FALSE.

min_max

A logical. Default is FALSE. If TRUE, only the minimum and maximum values of the factor hyperparameter will be used.

Value

A plot object

Plot poetEst

Description

plotPoetEst() performs actions specific to plotting the cross-validated risk of the POET estimator.

Usage

plotPoetEst(dat, switch_vars = FALSE, min_max = FALSE)

Arguments

dat

A data table of cross-validated risks. Specifically, this is the risk_df table output by cvCovEst().

switch_vars

A logical indicating if the x-axis and factor variables should be switched. Default is FALSE.

min_max

A logical. Default is FALSE. If TRUE, only the minimum and maximum values of the factor hyperparameter will be used.

Value

A plot object

Plot robustPoetEst

Description

plotRobustPoetEst() performs actions specific to plotting the cross-validated risk of the Robust POET estimator.

Usage

plotRobustPoetEst(dat, switch_vars = FALSE, min_max = FALSE)

Arguments

dat

A data table of cross-validated risks. Specifically, this is the risk_df table output by cvCovEst().

switch_vars

A logical indicating if the x-axis and factor variables should be switched. Default is FALSE.

min_max

A logical. Default is FALSE. If TRUE, only the minimum and maximum values of the factor hyperparameter will be used.

Value

A list of plots

POET Estimator

Description

poetEst() implements the Principal Orthogonal complEment Thresholding (POET) estimator, a nonparametric, unobserved-factor-based estimator of the covariance matrix (Fan et al. 2013). The estimator is defined as the sum of the sample covariance matrix's rank-k approximation and its post-thresholding principal orthogonal complement. The hard thresholding function is used here, though others could be used instead.

Usage

poetEst(dat, k, lambda)

Arguments

dat

A numeric data.frame, matrix, or similar object.

k

An integer indicating the number of unobserved latent factors. Empirical evidence suggests that the POET estimator is robust to overestimation of this hyperparameter (Fan et al. 2013). In practice, it is therefore preferable to use larger values.

lambda

A non-negative numeric defining the amount of thresholding applied to each element of the sample covariance matrix's orthogonal complement.

Value

A matrix corresponding to the estimate of the covariance matrix.

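References

Fan J, Liao Y, Mincheva M (2013). “Large covariance estimation by thresholding principal orthogonal complements.” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 75(4), 603–680. ISSN 13697412, 14679868, https://www.jstor.org/stable/24772450.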

Examples

poetEst(dat = mtcars, k = 2L, lambda = 0.1)
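The decomposition in the description (rank-k approximation plus hard-thresholded principal orthogonal complement) can be sketched in base R as follows. The helper poet_sketch() is hypothetical, and leaving the diagonal of the complement unthresholded is an assumption of this sketch rather than a documented property of poetEst().

```r
# Hypothetical helper (not part of cvCovEst): POET-style estimate built from
# the spectral decomposition of the sample covariance matrix.
poet_sketch <- function(dat, k, lambda) {
  sample_cov <- stats::cov(as.matrix(dat))
  eig <- eigen(sample_cov, symmetric = TRUE)
  lead_vecs <- eig$vectors[, seq_len(k), drop = FALSE]
  lead_vals <- eig$values[seq_len(k)]
  # rank-k approximation from the k leading eigenpairs
  rank_k <- lead_vecs %*% diag(lead_vals, nrow = k) %*% t(lead_vecs)
  # principal orthogonal complement
  poc <- sample_cov - rank_k
  # hard thresholding of the complement; the diagonal is kept as is
  # (assumption of this sketch)
  poc_thresh <- replace(poc, abs(poc) < lambda, 0)
  diag(poc_thresh) <- diag(poc)
  est <- rank_k + poc_thresh
  dimnames(est) <- dimnames(sample_cov)
  est
}

est_no_thresh <- poet_sketch(mtcars, k = 2L, lambda = 0)
est_full_thresh <- poet_sketch(mtcars, k = 2L, lambda = Inf)
```

With lambda = 0 nothing is thresholded and the sample covariance matrix is recovered; with lambda = Inf only the rank-k part and the diagonal of the complement remain.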

Robust POET Estimator for Elliptical Distributions

Description

robustPoetEst() implements the robust version of the Principal Orthogonal complEment Thresholding (POET) estimator, a nonparametric, unobserved-factor-based estimator of the covariance matrix when the underlying distribution is elliptical (Fan et al. 2018). The estimator is defined as the sum of the sample covariance matrix's rank-k approximation and its post-thresholding principal orthogonal complement. The rank-k approximation is constructed from the sample covariance matrix, its leading eigenvalues, and its leading eigenvectors. The sample covariance matrix and leading eigenvalues are initially estimated via an M-estimation procedure and the marginal Kendall's tau estimator. The leading eigenvectors are estimated using the spatial Kendall's tau estimator. The hard thresholding function is used to regularize the idiosyncratic errors' estimated covariance matrix, though other regularization schemes could be used.

We do not recommend that this estimator be employed when the estimand is the correlation matrix. The diagonal entries of the resulting estimate are not guaranteed to be equal to one.

Usage

robustPoetEst(dat, k, lambda, var_est = c("sample", "mad", "huber"))

Arguments

dat

A numeric data.frame, matrix, or similar object.

k

An integer indicating the number of unobserved latent factors.

lambda

A non-negative numeric defining the amount of thresholding applied to each element of the sample covariance matrix's orthogonal complement.

var_est

A character dictating which variance estimator to use. This must be one of the strings "sample", "mad", or "huber". "sample" uses sample variances; "mad" estimates variances via median absolute deviation; "huber" uses an M-estimator for variance under the Huber loss.

Value

A matrix corresponding to the estimate of the covariance matrix.

References

Fan J, Liao Y, Mincheva M (2013). “Large covariance estimation by thresholding principal orthogonal complements.” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 75(4), 603–680. ISSN 13697412, 14679868, https://www.jstor.org/stable/24772450.

Fan J, Liu H, Wang W (2018). “Large covariance estimation through elliptical factor models.” Ann. Statist., 46(4), 1383–1414. doi:10.1214/17-AOS1588.

Examples

robustPoetEst(dat = mtcars, k = 2L, lambda = 0.1, var_est = "sample")

Safe Centering and Scaling of Columns

Description

safeColScale() is a safe utility for centering and scaling an input matrix X. It is intended to avoid the drawback of using scale() on data containing columns with (near-)zero variance: the estimated variance of such columns is truncated to a small positive value before scaling. It is also faster than scale() because it relies on matrixStats for a key internal computation.

Usage

safeColScale(
  X,
  center = TRUE,
  scale = TRUE,
  tol = .Machine$double.eps,
  eps = 0.01
)

Arguments

X

An input matrix to be centered and/or scaled. If X is not of class matrix, then it must be coercible to such.

center

A logical indicating whether to re-center the columns of the input X.

scale

A logical indicating whether to re-scale the columns of the input X.

tol

A tolerance level for the lowest column variance (or standard deviation) value to be tolerated when scaling is desired. The default is .Machine$double.eps, the machine's double-precision epsilon.

eps

The desired lower bound of the estimated variance for a given column. When the lowest estimate falls below tol, it is truncated to the value specified in this argument. The default is 0.01.

Value

A centered and/or scaled version of the input data.

Note

This is an un-exported function borrowed directly from scPCA.
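The truncation controlled by tol and eps can be illustrated with a base R sketch. The helper safe_col_scale_sketch() is hypothetical; it uses stats::sd() instead of matrixStats and applies tol to column standard deviations, both of which are assumptions made to keep the example self-contained.

```r
# Hypothetical helper (not the scPCA/cvCovEst implementation): center and
# scale columns, replacing near-zero standard deviations by eps.
safe_col_scale_sketch <- function(X, center = TRUE, scale = TRUE,
                                  tol = .Machine$double.eps, eps = 0.01) {
  X <- as.matrix(X)
  col_means <- if (center) colMeans(X) else rep(0, ncol(X))
  col_sds <- if (scale) apply(X, 2, stats::sd) else rep(1, ncol(X))
  # truncate standard deviations below tol to avoid dividing by zero
  col_sds[col_sds < tol] <- eps
  sweep(sweep(X, 2, col_means, "-"), 2, col_sds, "/")
}

# Column "b" is constant, so base::scale() would return NaN for it.
X <- cbind(a = c(1, 2, 3), b = c(5, 5, 5))
scaled <- safe_col_scale_sketch(X)
```

Here the constant column is mapped to zeros rather than NaN, which is the behavior the description aims for.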

Sample Covariance Matrix

Description

sampleCovEst() computes the sample covariance matrix. This function is a simple wrapper around covar().

Usage

sampleCovEst(dat)

Arguments

dat

A numeric data.frame, matrix, or similar object.

Value

A matrix corresponding to the estimate of the covariance matrix.

Examples

sampleCovEst(dat = mtcars)

Smoothly Clipped Absolute Deviation Estimator

Description

scadEst() applies the SCAD thresholding function of Fan and Li (2001) to each entry of the sample covariance matrix. This penalized estimator constitutes a compromise between hard and soft thresholding of the sample covariance matrix: it is a linear interpolation between soft thresholding up to 2 * lambda and hard thresholding after 3.7 * lambda (Rothman et al. 2009).

Usage

scadEst(dat, lambda)

Arguments

dat

A numeric data.frame, matrix, or similar object.

lambda

A non-negative numeric defining the degree of thresholding applied to each element of dat's sample covariance matrix.

Value

A matrix corresponding to the estimate of the covariance matrix.

References

Fan J, Li R (2001). “Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties.” Journal of the American Statistical Association, 96(456), 1348-1360. doi:10.1198/016214501753382273, https://doi.org/10.1198/016214501753382273.

Rothman AJ, Levina E, Zhu J (2009). “Generalized Thresholding of Large Covariance Matrices.” Journal of the American Statistical Association, 104(485), 177-186. doi:10.1198/jasa.2009.0101, https://doi.org/10.1198/jasa.2009.0101.

Examples

scadEst(dat = mtcars, lambda = 0.2)

Smoothly Clipped Absolute Deviation Thresholding Function

Description

scadThreshold() applies the smoothly clipped absolute deviation thresholding function to the entries of a matrix. In particular, it is meant to be applied to the sample covariance matrix.

Usage

scadThreshold(entry, lambda, a)

Arguments

entry

A numeric entry in a covariance matrix estimate.

lambda

A non-negative numeric defining the degree of thresholding applied to each element of dat's sample covariance matrix.

a

A numeric larger than or equal to 2 defining the point at which the SCAD thresholding function becomes equal to the hard thresholding function.

Value

A regularized numeric.
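For reference, the SCAD thresholding rule as given by Rothman et al. (2009) can be sketched in a vectorized form. The helper scad_threshold_sketch() is hypothetical, and the default a = 3.7 is an assumption taken from the value mentioned in the scadEst() description; scadThreshold() itself has no default for a.

```r
# Hypothetical helper (not part of cvCovEst): SCAD thresholding of numeric
# entries, following the piecewise rule of Rothman et al. (2009).
scad_threshold_sketch <- function(entry, lambda, a = 3.7) {
  abs_entry <- abs(entry)
  # soft thresholding, used while |entry| <= 2 * lambda
  soft <- sign(entry) * pmax(abs_entry - lambda, 0)
  # linear interpolation, used while 2 * lambda < |entry| <= a * lambda
  interp <- ((a - 1) * entry - sign(entry) * a * lambda) / (a - 2)
  # beyond a * lambda the entry is returned unchanged (hard thresholding)
  ifelse(abs_entry <= 2 * lambda, soft,
         ifelse(abs_entry <= a * lambda, interp, entry))
}

scad_vals <- scad_threshold_sketch(c(0.05, 0.15, 0.3, 1), lambda = 0.1)
```

The four inputs fall, in order, in the zeroed region, the soft thresholding region, the interpolation region, and the untouched region.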

Extract Estimated Scaled Eigenvalues in Spiked Covariance Matrix Model

Description

scaleEigVals() computes the scaled eigenvalues, and filters out all eigenvalues that do not need to be shrunk.

Usage

scaleEigVals(eig_vals, noise, p_n_ratio, num_spikes)

Arguments

eig_vals

A numeric vector of estimated eigenvalues.

noise

A numeric representing the known scalar multiple of the identity matrix giving the approximate population covariance matrix.

p_n_ratio

A numeric indicating the asymptotic ratio of the number of features, p, and the number of observations, n. This ratio is assumed to be between 0 and 1.

num_spikes

A numeric integer equal to or larger than one providing the known number of spikes in the population covariance matrix. If set to NULL, the number of spikes is estimated.

Value

A numeric vector of the scaled eigenvalues to be shrunk.

Frobenius Norm Shrinkage Estimator, Spiked Covariance Model

Description

spikedFrobeniusShrinkEst() implements the asymptotically optimal shrinkage estimator with respect to the Frobenius loss in a spiked covariance matrix model. Informally, this model admits Gaussian data-generating processes whose covariance matrix is a scalar multiple of the identity, save for a small number of large "spikes". A thorough review of this estimator, or more generally spiked covariance matrix estimation, is provided in Donoho et al. (2018).

Usage

spikedFrobeniusShrinkEst(dat, p_n_ratio, num_spikes = NULL, noise = NULL)

Arguments

dat

A numeric data.frame, matrix, or similar object.

p_n_ratio

A numeric between 0 and 1 representing the asymptotic ratio of the number of features, p, and the number of observations, n.

num_spikes

A numeric integer equal to or larger than one providing the known number of spikes in the population covariance matrix. Defaults to NULL, indicating that this value is not known and must be estimated.

noise

A numeric representing the known scalar multiple of the identity matrix giving the approximate population covariance matrix. Defaults to NULL, indicating that this value is not known and must be estimated.

Value

A matrix corresponding to the covariance matrix estimate.

References

Donoho D, Gavish M, Johnstone I (2018). “Optimal shrinkage of eigenvalues in the spiked covariance model.” The Annals of Statistics, 46(4), 1742 – 1778.

Examples

spikedFrobeniusShrinkEst(dat = mtcars, p_n_ratio = 0.1, num_spikes = 2L)

Operator Norm Shrinkage Estimator, Spiked Covariance Model

Description

spikedOperatorShrinkEst() implements the asymptotically optimal shrinkage estimator with respect to the operator loss in a spiked covariance matrix model. Informally, this model admits Gaussian data-generating processes whose covariance matrix is a scalar multiple of the identity, save for a small number of large "spikes". A thorough review of this estimator, or more generally spiked covariance matrix estimation, is provided in Donoho et al. (2018).

Usage

spikedOperatorShrinkEst(dat, p_n_ratio, num_spikes = NULL, noise = NULL)

Arguments

dat

A numeric data.frame, matrix, or similar object.

p_n_ratio

A numeric between 0 and 1 representing the asymptotic ratio of the number of features, p, and the number of observations, n.

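num_spikes

A numeric integer equal to or larger than one providing the known number of spikes in the population covariance matrix. Defaults to NULL, indicating that this value is not known and must be estimated.

noise

A numeric representing the known scalar multiple of the identity matrix giving the approximate population covariance matrix. Defaults to NULL, indicating that this value is not known and must be estimated.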

Value

A matrix corresponding to the covariance matrix estimate.

References

Donoho D, Gavish M, Johnstone I (2018). “Optimal shrinkage of eigenvalues in the spiked covariance model.” The Annals of Statistics, 46(4), 1742 – 1778.

Examples

spikedOperatorShrinkEst(dat = mtcars, p_n_ratio = 0.1, num_spikes = 2L)

Stein Loss Shrinkage Estimator, Spiked Covariance Model

Description

spikedSteinShrinkEst() implements the asymptotically optimal shrinkage estimator with respect to the Stein loss in a spiked covariance matrix model. Informally, this model admits Gaussian data-generating processes whose covariance matrix is a scalar multiple of the identity, save for a small number of large "spikes". A thorough review of this estimator, or more generally spiked covariance matrix estimation, is provided in Donoho et al. (2018).

Usage

spikedSteinShrinkEst(dat, p_n_ratio, num_spikes = NULL, noise = NULL)

Arguments

dat

A numeric data.frame, matrix, or similar object.

p_n_ratio

A numeric between 0 and 1 representing the asymptotic ratio of the number of features, p, and the number of observations, n.

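num_spikes

A numeric integer equal to or larger than one providing the known number of spikes in the population covariance matrix. Defaults to NULL, indicating that this value is not known and must be estimated.

noise

A numeric representing the known scalar multiple of the identity matrix giving the approximate population covariance matrix. Defaults to NULL, indicating that this value is not known and must be estimated.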

Value

A matrix corresponding to the covariance matrix estimate.

References

Donoho D, Gavish M, Johnstone I (2018). “Optimal shrinkage of eigenvalues in the spiked covariance model.” The Annals of Statistics, 46(4), 1742 – 1778.

Examples

spikedSteinShrinkEst(dat = mtcars, p_n_ratio = 0.1, num_spikes = 2L)

Convert String to Numeric or Integer When Needed

Description

strToNumber() converts a character string representing a number to a numeric or an integer, as appropriate.

Usage

strToNumber(x)

Arguments

x

A character representing a number or an integer.

Value

x converted to the appropriate type.

Generic Summary Method for cvCovEst

Description

summary() provides summary statistics regarding the performance of cvCovEst() and can be used for diagnostic plotting.

Usage

## S3 method for class 'cvCovEst'
summary(
  object,
  dat_orig,
  summ_fun = c("cvRiskByClass", "bestInClass", "worstInClass", "hyperRisk"),
  ...
)

Arguments

object

A named list of class "cvCovEst".

dat_orig

The numeric data.frame, matrix, or similar object originally passed to cvCovEst().

summ_fun

A character vector specifying which summaries to output. See Details for function descriptions.

...

Additional arguments passed to summary(). These are not explicitly used and should be ignored by the user.

Details

summary() accepts four different choices for the summ_fun argument. The choices are:

"cvRiskByClass" - Returns the minimum, first quartile, median, third quartile, and maximum of the cross-validated risk associated with each class of estimator passed to cvCovEst().
"bestInClass" - Returns the specific hyperparameters, if applicable, of the best performing estimator within each class along with other metrics.
"worstInClass" - Returns the specific hyperparameters, if applicable, of the worst performing estimator within each class along with other metrics.
"hyperRisk" - For estimators that take hyperparameters as arguments, this function returns the hyperparameters associated with the minimum, first quartile, median, third quartile, and maximum of the cross-validated risk within each class of estimator. Each class has its own tibble, which are returned as a list.

Value

A named list where each element corresponds to the output of the requested summaries.

Examples

cv_dat <- cvCovEst(
  dat = mtcars,
  estimators = c(
    linearShrinkEst, thresholdingEst, sampleCovEst
  ),
  estimator_params = list(
    linearShrinkEst = list(alpha = seq(0.1, 0.9, 0.1)),
    thresholdingEst = list(gamma = seq(0.1, 0.9, 0.1))
  ),
  center = TRUE,
  scale = TRUE
)

summary(cv_dat, mtcars)

Tapering Estimator

Description

taperingEst() estimates the covariance matrix of a data.frame-like object with ordered variables by gradually shrinking the bands of the sample covariance matrix towards zero. The estimator is defined as the Hadamard product of the sample covariance matrix and a weight matrix. The amount of shrinkage is dictated by the weight matrix and is specified by a hyperparameter k. This estimator is attributed to Cai et al. (2010).

The weight matrix is a Toeplitz matrix with entries defined as follows. Let i and j index the rows and columns of the weight matrix, respectively. If abs(i - j) <= k / 2, then entry {i, j} in the weight matrix is equal to 1. If k / 2 < abs(i - j) < k, then entry {i, j} is equal to 2 - 2 * abs(i - j) / k. Otherwise, entry {i, j} is equal to 0.

Usage

taperingEst(dat, k)

Arguments

dat

A numeric data.frame, matrix, or similar object.

k

A non-negative, even numeric integer.

Value

A matrix corresponding to the estimate of the covariance matrix.

References

Cai TT, Zhang C, Zhou HH (2010). “Optimal rates of convergence for covariance matrix estimation.” Ann. Statist., 38(4), 2118–2144. doi:10.1214/09-AOS752.

Examples

taperingEst(dat = mtcars, k = 2L)
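The Toeplitz weight matrix defined in the description can be built explicitly, which makes the effect of k easier to see. The helper tapering_weights_sketch() is hypothetical and follows the three cases stated above; the final line applies the weights as a Hadamard product with a sample covariance matrix computed by stats::cov(), which is an assumption of this sketch.

```r
# Hypothetical helper (not part of cvCovEst): tapering weight matrix for p
# ordered variables and an even, non-negative k.
tapering_weights_sketch <- function(p, k) {
  stopifnot(k >= 0, k %% 2 == 0)
  # dist[i, j] = abs(i - j)
  dist <- abs(outer(seq_len(p), seq_len(p), "-"))
  weights <- matrix(0, nrow = p, ncol = p)
  # entries within k / 2 of the diagonal keep full weight
  weights[dist <= k / 2] <- 1
  # entries between k / 2 and k decay linearly towards zero
  taper <- dist > k / 2 & dist < k
  weights[taper] <- 2 - 2 * dist[taper] / k
  weights
}

w <- tapering_weights_sketch(p = 5, k = 4)
# Hadamard (element-wise) product with the sample covariance matrix
tapered <- stats::cov(as.matrix(mtcars)) * tapering_weights_sketch(ncol(mtcars), k = 4)
```

For p = 5 and k = 4, the first row of w is 1, 1, 1, 0.5, 0: full weight up to distance 2, half weight at distance 3, and zero from distance 4 onwards.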

cvCovEst Plot Theme

Description

theme_cvCovEst() defines the overall theme of the cvCovEst package plotting functions and makes changes depending on which plot function is being called.

Usage

theme_cvCovEst(plot_type, ...)

Arguments

plot_type

A character vector specifying which plot is to be displayed. Can contain more than one value in the case of plot_type = c("heatmap", "summary").

Value

A ggplot theme.

Hard Thresholding Estimator

Description

thresholdingEst() computes the hard thresholding estimate of the covariance matrix for a given value of gamma. The threshold estimator of the covariance matrix applies a hard thresholding operator to each element of the sample covariance matrix. For more information on this estimator, review Bickel and Levina (2008).

Usage

thresholdingEst(dat, gamma)

Arguments

dat

A numeric data.frame, matrix, or similar object.

gamma

A non-negative numeric defining the degree of hard thresholding applied to each element of dat's sample covariance matrix.

Value

A matrix corresponding to the estimate of the covariance matrix.

References

Bickel PJ, Levina E (2008). “Covariance regularization by thresholding.” Annals of Statistics, 36(6), 2577–2604. doi:10.1214/08-AOS600.

Examples

thresholdingEst(dat = mtcars, gamma = 0.2)
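The hard thresholding operator itself is a one-liner in base R. The helper hard_threshold_sketch() is hypothetical; it assumes that entries whose absolute value is strictly below gamma are set to zero and that the sample covariance matrix is computed with stats::cov().

```r
# Hypothetical helper (not part of cvCovEst): set entries of the sample
# covariance matrix with absolute value below gamma to zero.
hard_threshold_sketch <- function(dat, gamma) {
  sample_cov <- stats::cov(as.matrix(dat))
  replace(sample_cov, abs(sample_cov) < gamma, 0)
}

est_none <- hard_threshold_sketch(mtcars, gamma = 0)
est_hard <- hard_threshold_sketch(mtcars, gamma = 0.2)
```

Every surviving entry of est_hard has absolute value at least gamma, while gamma = 0 leaves the sample covariance matrix unchanged.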