Title: Mixed, Low-Rank, and Sparse Multivariate Regression on High-Dimensional Data
Version: 0.1.0
Description: Mixed, low-rank, and sparse multivariate regression ('mixedLSR') provides tools for performing mixture regression when the coefficient matrix is low-rank and sparse. 'mixedLSR' allows subgroup identification by alternating optimization with simulated annealing to encourage global optimum convergence. This method is data-adaptive, automatically performing parameter selection to identify low-rank substructures in the coefficient matrix.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.2.1
Depends: R (≥ 4.1.0)
Imports: grpreg, purrr, MASS, stats, ggplot2
Suggests: knitr, rmarkdown, mclust
VignetteBuilder: knitr
BugReports: https://github.com/alexanderjwhite/mixedLSR
URL: https://alexanderjwhite.github.io/mixedLSR/
NeedsCompilation: no
Packaged: 2022-11-04 10:33:31 UTC; whitealj
Author: Alexander White [aut, cre], Sha Cao [aut], Yi Zhao [ctb], Chi Zhang [ctb]
Maintainer: Alexander White <whitealj@iu.edu>
Repository: CRAN
Date/Publication: 2022-11-04 20:00:02 UTC

Compute Bayesian information criterion for a mixedLSR model

Description

Computes the Bayesian information criterion (BIC) for a fitted mixedLSR model from its coefficient matrices, sample size, and log-likelihood.

Usage

bic_lsr(a, n, llik)

Arguments

a

A list of coefficient matrices.

n

The sample size.

llik

The log-likelihood of the model.

Value

The BIC.

Examples

n <- 50
simulate <- simulate_lsr(n)
model <- mixed_lsr(simulate$x, simulate$y, k = 2, init_lambda = c(1,1), alt_iter = 0)
bic_lsr(model$A, n = n, model$llik)
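
As a rough illustration of how a BIC can be assembled from the three arguments above, the sketch below uses the textbook form -2 * llik + log(n) * df. Counting the nonzero coefficients as the degrees of freedom is an assumption made here for illustration. bic_sketch is a hypothetical helper and may not match what bic_lsr computes internally.

```r
# Hypothetical helper, not part of mixedLSR: textbook BIC with the number of
# nonzero coefficients used as the degrees of freedom (an assumption).
bic_sketch <- function(a, n, llik) {
  # Count nonzero entries across all coefficient matrices in the list.
  df <- sum(vapply(a, function(mat) sum(mat != 0), integer(1)))
  -2 * llik + log(n) * df
}

# Two toy coefficient matrices with 3 nonzero entries in total.
a_toy <- list(
  matrix(c(1, 0, 0, 2), nrow = 2),
  matrix(c(0, 0, 0.5, 0), nrow = 2)
)
bic_toy <- bic_sketch(a_toy, n = 50, llik = -120)
```

Under this convention a lower value indicates a better trade-off between fit and the number of nonzero coefficients.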

Internal Alternating Optimization Function

Description

Internal Alternating Optimization Function

Usage

fct_alt_optimize(
  x,
  y,
  k,
  clust_assign,
  lambda,
  alt_iter,
  anneal_iter,
  em_iter,
  temp,
  mu,
  eps,
  accept_prob,
  sim_N,
  verbose
)

Arguments

x

A matrix of predictors.

y

A matrix of responses.

k

The number of groups.

clust_assign

The current clustering assignment.

lambda

A vector of penalization parameters.

alt_iter

The maximum number of times to alternate between the classification expectation maximization algorithm and the simulated annealing algorithm.

anneal_iter

The maximum number of simulated annealing iterations.

em_iter

The maximum number of EM iterations.

temp

The initial simulated annealing temperature, temp > 0.

mu

The fraction by which the simulated annealing temperature is decreased. Once the best configuration cannot be improved, the temperature T is reduced to mu * T, where 0 < mu < 1.

eps

The final simulated annealing temperature, eps > 0.

accept_prob

The simulated annealing probability of accepting a new assignment, 0 < accept_prob < 1. When closer to 1, trial assignments are only small perturbations of the current assignment. When closer to 0, trial assignments are closer to random.

sim_N

The number of simulated annealing iterations for reaching equilibrium.

verbose

A boolean indicating whether to print to screen.

Value

A final fit of mixedLSR.
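
The temp, mu, and eps arguments together describe a geometric cooling schedule. The sketch below lists the temperature levels visited when starting at temp and multiplying by mu until the temperature is no longer above eps. It assumes the temperature is reduced at every step, which simplifies the documented rule (reduction only once the best configuration cannot be improved). cooling_schedule is a hypothetical helper, not a package function.

```r
# Illustrative only: geometric cooling from temp down to eps with factor mu.
# Assumes the temperature is reduced at every step, which is a simplification.
cooling_schedule <- function(temp, mu, eps) {
  stopifnot(temp > 0, mu > 0, mu < 1, eps > 0)
  temps <- numeric(0)
  current <- temp
  while (current > eps) {
    temps <- c(temps, current)
    current <- mu * current
  }
  temps
}

# Values taken from the mixed_lsr defaults documented in this manual.
temps <- cooling_schedule(temp = 1000, mu = 0.95, eps = 1e-06)
length(temps)  # number of temperature levels strictly above eps
```

A value of mu closer to 1 produces more temperature levels and therefore a slower, more thorough search.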


Internal Double Penalized Projection Function

Description

Internal Double Penalized Projection Function

Usage

fct_dpp(
  y,
  x,
  rank,
  lambda = NULL,
  alpha = 2 * sqrt(3),
  beta = 1,
  sigma,
  ptype = "grLasso",
  y_sparse = TRUE
)

Arguments

y

A matrix of responses.

x

A matrix of predictors.

rank

The rank, if known.

lambda

A vector of penalization parameters.

alpha

A positive constant DPP parameter.

beta

A positive constant DPP parameter.

sigma

An estimated standard deviation.

ptype

A group penalized regression penalty type. See grpreg.

y_sparse

Should Y coefficients be treated as sparse?

Value

A list containing estimated coefficients, covariance, and penalty parameters.


Internal EM Algorithm

Description

Internal EM Algorithm

Usage

fct_em(x, y, k, lambda, clust_assign, lik_track, em_iter, verbose)

Arguments

x

A matrix of predictors.

y

A matrix of responses.

k

The number of groups.

lambda

A vector of penalization parameters.

clust_assign

The current clustering assignment.

lik_track

A vector storing the log-likelihood by iteration.

em_iter

The maximum number of EM iterations.

verbose

A boolean indicating whether to print to screen.

Value

A mixedLSR model.


Internal Posterior Calculation

Description

Internal Posterior Calculation

Usage

fct_gamma(
  x,
  y,
  k,
  N,
  clust_assign,
  pi_vec,
  lambda,
  alpha,
  beta,
  y_sparse,
  rank,
  max_rank
)

Arguments

x

A matrix of predictors.

y

A matrix of responses.

k

The number of groups.

N

The sample size.

clust_assign

The current clustering assignment.

pi_vec

A vector of mixing probabilities for each cluster label.

lambda

A vector of penalization parameters.

alpha

A positive constant DPP parameter.

beta

A positive constant DPP parameter.

y_sparse

Should Y coefficients be treated as sparse?

rank

The rank, if known.

max_rank

The maximum allowed rank.

Value

A list with the posterior, coefficients, and estimated covariance.


Internal Partition Initialization Function

Description

Internal Partition Initialization Function

Usage

fct_initialize(k, N)

Arguments

k

The number of groups.

N

The sample size.

Value

A vector of assignments.
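
For orientation, one simple way to draw a random partition of N samples into k groups is sketched below. Guaranteeing that every label appears at least once is a choice made for this example. random_partition is a hypothetical stand-in and is not claimed to reproduce fct_initialize.

```r
# Hypothetical stand-in for a random partition of N samples into k groups.
# Each label appears at least once; remaining labels are drawn uniformly.
random_partition <- function(k, N) {
  stopifnot(k >= 1, N >= k)
  labels <- c(seq_len(k), sample.int(k, N - k, replace = TRUE))
  # Shuffle so the guaranteed labels are not always the first k samples.
  labels[sample.int(N)]
}

set.seed(1)
init_toy <- random_partition(k = 2, N = 10)
table(init_toy)
```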


Internal Likelihood Function

Description

Internal Likelihood Function

Usage

fct_j_lik(
  x,
  y,
  k,
  clust_assign,
  lambda,
  alpha = 2 * sqrt(3),
  beta = 1,
  y_sparse = TRUE,
  max_rank = 3,
  rank = NULL
)

Arguments

x

A matrix of predictors.

y

A matrix of responses.

k

The number of groups.

clust_assign

A vector of cluster labels.

lambda

A vector of penalization parameters.

alpha

A positive constant DPP parameter.

beta

A positive constant DPP parameter.

y_sparse

Should Y coefficients be treated as sparse?

max_rank

The maximum allowed rank.

rank

The rank, if known.

Value

The weighted log-likelihood.


Internal Log-Likelihood Function

Description

Internal Log-Likelihood Function

Usage

fct_log_lik(mu_mat, sig_vec, y, N, m)

Arguments

mu_mat

The mean matrix.

sig_vec

A vector of sigma values.

y

The output matrix.

N

The sample size.

m

The number of y features.

Value

A posterior matrix.


Internal Perturb Function

Description

Internal Perturb Function

Usage

fct_new_assign(assign, k, p)

Arguments

assign

The current clustering assignments.

k

The number of groups.

p

The acceptance probability.

Value

A perturbed assignment.
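
The documented behavior of the acceptance probability (values near 1 give small perturbations, values near 0 give nearly random assignments) is consistent with keeping each label with probability p and redrawing it otherwise. The sketch below implements that reading. The mechanism is an assumption, and perturb_assign is a hypothetical helper rather than the package's fct_new_assign.

```r
# Hypothetical perturbation: keep each label with probability p, otherwise
# redraw it uniformly from 1..k. This mirrors the documented behavior of p
# but is an assumption about the mechanism.
perturb_assign <- function(assign, k, p) {
  stopifnot(p > 0, p < 1)
  n <- length(assign)
  redraw <- stats::runif(n) > p
  assign[redraw] <- sample.int(k, sum(redraw), replace = TRUE)
  assign
}

set.seed(42)
current <- rep(1:2, each = 10)
near <- perturb_assign(current, k = 2, p = 0.95)  # most labels are kept
far <- perturb_assign(current, k = 2, p = 0.05)   # most labels are redrawn
```

Note that a redrawn label can land on its previous value, so the number of changed positions is at most the number of redraws.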


Internal Pi Function

Description

Internal Pi Function

Usage

fct_pi_vec(clust_assign, k, N)

Arguments

clust_assign

The current clustering assignment.

k

The number of groups.

N

The sample size.

Value

A mixing vector.
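
A mixing vector is commonly taken to be the share of samples carrying each label. The sketch below computes it that way from the three arguments above. pi_sketch is a hypothetical helper, and whether fct_pi_vec uses exactly this formula is an assumption.

```r
# Illustrative: mixing proportion of each of the k labels as count / N.
pi_sketch <- function(clust_assign, k, N) {
  # tabulate() counts labels 1..k and returns 0 for empty groups.
  tabulate(clust_assign, nbins = k) / N
}

pi_toy <- pi_sketch(c(1, 1, 2, 1, 2, 2, 2, 1, 1, 1), k = 3, N = 10)
```

Using tabulate rather than table keeps a zero entry for any group that currently has no members, so the result always has length k.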


Internal Rank Estimation Function

Description

Internal Rank Estimation Function

Usage

fct_rank(x, y, sigma, eta)

Arguments

x

A matrix of predictors.

y

A matrix of responses.

sigma

An estimated noise level.

eta

A rank selection parameter.

Value

The estimated rank.


Internal Penalty Parameter Selection Function

Description

Internal Penalty Parameter Selection Function

Usage

fct_select_lambda(
  x,
  y,
  k,
  clust_assign = NULL,
  initial = FALSE,
  type = "all",
  verbose
)

Arguments

x

A matrix of predictors.

y

A matrix of responses.

k

The number of groups.

clust_assign

The current clustering assignment.

initial

An initial penalty parameter.

type

A type.

verbose

A boolean indicating whether to print to screen.

Value

A selected penalty parameter.


Internal Sigma Estimation Function

Description

Internal Sigma Estimation Function

Usage

fct_sigma(y, N, m)

Arguments

y

A matrix of responses.

N

The sample size.

m

The number of outcome variables.

Value

The estimated sigma.


Internal Simulated Annealing Function

Description

Internal Simulated Annealing Function

Usage

fct_sim_anneal(
  x,
  y,
  k,
  init_assign,
  lambda,
  temp,
  mu,
  eps,
  accept_prob,
  sim_N,
  track,
  anneal_iter = 1000,
  verbose
)

Arguments

x

A matrix of predictors.

y

A matrix of responses.

k

The number of groups.

init_assign

An initial clustering assignment.

lambda

A vector of penalization parameters.

temp

The initial simulated annealing temperature, temp > 0.

mu

The fraction by which the simulated annealing temperature is decreased. Once the best configuration cannot be improved, the temperature T is reduced to mu * T, where 0 < mu < 1.

eps

The final simulated annealing temperature, eps > 0.

accept_prob

The simulated annealing probability of accepting a new assignment, 0 < accept_prob < 1. When closer to 1, trial assignments are only small perturbations of the current assignment. When closer to 0, trial assignments are closer to random.

sim_N

The number of simulated annealing iterations for reaching equilibrium.

track

A likelihood tracking vector.

anneal_iter

The maximum number of simulated annealing iterations.

verbose

A boolean indicating whether to print to screen.

Value

An updated clustering vector.
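
To show how the temperature influences which trial assignments survive, the sketch below applies the standard Metropolis acceptance rule to a pair of log-likelihood values. This manual does not state the exact rule used by fct_sim_anneal, so the rule, the helper name accept_move, and its u argument are assumptions made for illustration.

```r
# Illustrative Metropolis rule for maximizing a log-likelihood at temperature temp.
# An improvement is always accepted; a worse candidate is accepted with
# probability exp((cand_ll - cur_ll) / temp).
accept_move <- function(cur_ll, cand_ll, temp, u = stats::runif(1)) {
  stopifnot(temp > 0)
  if (cand_ll >= cur_ll) return(TRUE)
  u < exp((cand_ll - cur_ll) / temp)
}

accept_move(-100, -90, temp = 10)              # improvement: accepted
accept_move(-100, -105, temp = 1000, u = 0.5)  # hot: exp(-0.005) > 0.5, accepted
accept_move(-100, -105, temp = 0.01, u = 0.5)  # cold: exp(-500) < 0.5, rejected
```

At high temperature nearly every move is accepted, which lets the search leave poor local optima. As the temperature approaches eps, only improvements are accepted in practice.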


Internal Weighted Log Likelihood Function

Description

Internal Weighted Log Likelihood Function

Usage

fct_weighted_ll(gamma)

Arguments

gamma

A posterior matrix.

Value

A weighted log-likelihood vector.


Mixed Low-Rank and Sparse Multivariate Regression for High-Dimensional Data

Description

Mixed Low-Rank and Sparse Multivariate Regression for High-Dimensional Data

Usage

mixed_lsr(
  x,
  y,
  k,
  nstart = 1,
  init_assign = NULL,
  init_lambda = NULL,
  alt_iter = 5,
  anneal_iter = 1000,
  em_iter = 1000,
  temp = 1000,
  mu = 0.95,
  eps = 1e-06,
  accept_prob = 0.95,
  sim_N = 200,
  verbose = TRUE
)

Arguments

x

A matrix of predictors.

y

A matrix of responses.

k

The number of groups.

nstart

The number of random initializations; the result with the maximum likelihood is returned.

init_assign

A vector of initial assignments, NULL by default.

init_lambda

A vector with the values to initialize the penalization parameter for each group, e.g., c(1,1,1). Set to NULL by default.

alt_iter

The maximum number of times to alternate between the classification expectation maximization algorithm and the simulated annealing algorithm.

anneal_iter

The maximum number of simulated annealing iterations.

em_iter

The maximum number of EM iterations.

temp

The initial simulated annealing temperature, temp > 0.

mu

The fraction by which the simulated annealing temperature is decreased. Once the best configuration cannot be improved, the temperature T is reduced to mu * T, where 0 < mu < 1.

eps

The final simulated annealing temperature, eps > 0.

accept_prob

The simulated annealing probability of accepting a new assignment, 0 < accept_prob < 1. When closer to 1, trial assignments are only small perturbations of the current assignment. When closer to 0, trial assignments are closer to random.

sim_N

The number of simulated annealing iterations for reaching equilibrium.

verbose

A boolean indicating whether to print to screen.

Value

A list containing the likelihood, the partition, the coefficient matrices, and the BIC.

Examples

simulate <- simulate_lsr(50)
mixed_lsr(simulate$x, simulate$y, k = 2, init_lambda = c(1,1), alt_iter = 0)

Heatmap Plot of the mixedLSR Coefficient Matrices

Description

Heatmap Plot of the mixedLSR Coefficient Matrices

Usage

plot_lsr(a, abs = TRUE)

Arguments

a

A coefficient matrix from a mixed_lsr model.

abs

A boolean for taking the absolute value of the coefficient matrix.

Value

A ggplot2 heatmap of the coefficient matrix, separated by subgroup.

Examples

simulate <- simulate_lsr()
plot_lsr(simulate$a)

Simulate Heterogeneous, Low-Rank, and Sparse Data

Description

Simulate Heterogeneous, Low-Rank, and Sparse Data

Usage

simulate_lsr(
  N = 100,
  k = 2,
  p = 30,
  m = 35,
  b = 1,
  d = 20,
  h = 0.2,
  case = "independent"
)

Arguments

N

The sample size, default = 100.

k

The number of groups, default = 2.

p

The number of predictor features, default = 30.

m

The number of response features, default = 35.

b

The signal-to-noise ratio, default = 1.

d

The singular value, default = 20.

h

The lower bound for the singular matrix simulation, default = 0.2.

case

The covariance case, "independent" or "dependent", default = "independent".

Value

A list of simulation values, including the x matrix, the y matrix, the coefficients, and the true clustering assignments.

Examples

simulate_lsr()
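
To make the phrase "low-rank and sparse" concrete, the sketch below builds a single p x m coefficient matrix whose rank is r and whose nonzero rows are limited to the first s predictors. The construction, the rank r, and the sparsity level s are assumptions chosen for illustration. Only p, m, and d are taken from the defaults above, and this is not claimed to be how simulate_lsr generates its coefficients.

```r
# Illustrative construction of a p x m coefficient matrix that is both
# row-sparse and low-rank. All names and choices here are assumptions.
set.seed(123)
p <- 30; m <- 35; r <- 2; d <- 20
s <- 10  # number of active predictors (hypothetical)

# Left factor: orthonormal columns supported on the first s rows only.
u <- matrix(0, nrow = p, ncol = r)
u[seq_len(s), ] <- qr.Q(qr(matrix(stats::rnorm(s * r), s, r)))
# Right factor: m x r matrix with orthonormal columns.
v <- qr.Q(qr(matrix(stats::rnorm(m * r), m, r)))
# Both nonzero singular values are set to d.
a_toy <- u %*% diag(d, r) %*% t(v)

qr(a_toy)$rank                 # numerical rank
sum(rowSums(a_toy != 0) > 0)   # number of predictors with nonzero coefficients
```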