Type: | Package |
Title: | The Induced Smoothed Lasso |
Version: | 1.5.2 |
Date: | 2024-01-22 |
Depends: | glmnet (≥ 4.0), Matrix (≥ 1.0-6), R (≥ 4.0.0) |
Imports: | stats, utils, graphics |
Suggests: | knitr, lars, xfun, rmarkdown |
Description: | An implementation of the induced smoothing (IS) idea to lasso regularization models to allow estimation and inference on the model coefficients (currently hypothesis testing only). Linear, logistic, Poisson and gamma regressions with several link functions are implemented. The algorithm is described in the original paper; see <doi:10.1177/0962280219842890> and discussed in a tutorial <doi:10.13140/RG.2.2.16360.11521>. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Encoding: | UTF-8 |
NeedsCompilation: | yes |
Packaged: | 2024-01-23 09:00:46 UTC; gianlucasottile |
Author: | Gianluca Sottile [aut, cre], Giovanna Cilluffo [aut, ctb], Vito MR Muggeo [aut, ctb] |
Maintainer: | Gianluca Sottile <gianluca.sottile@unipa.it> |
LazyLoad: | yes |
Repository: | CRAN |
Date/Publication: | 2024-01-23 15:20:05 UTC |
The Induced Smoothed lasso: A practical framework for hypothesis testing in high dimensional regression
Description
This package implements an induced smoothed approach for hypothesis testing in lasso regression.
Details
Package: | islasso |
Type: | Package |
Version: | 1.5.w |
Date: | 2024-01-22 |
License: | GPL-2 |
islasso
is used to fit generalized linear models with a L1-penalty on (some) regression coefficients. Along with point estimates, the main advantage is to return the full covariance matrix of estimate. The resulting standard errors can be used to make inference in the lasso framework. The main function is islasso
and the correspoinding fitter function islasso.fit
, and many auxiliary functions are implemented to summarize and visualize results: summary.islasso
, predict.islasso
, logLik.islasso
, deviance.islasso
, residuals.islasso
.
islasso.path
is used to fit Fit a generalized linear model via the Induced Smoothed Lasso. The regularization path is computed for the lasso or elasticnet penalty at a grid of values for the regularization parameter lambda. Along with coefficients profile, the main advantage is to return also the standard errors profile. The resulting standard errors can be used to make inference in the lasso framework. The main function is islasso.path
and the correspoinding fitter function islasso.path.fit
, and many auxiliary functions are implemented to summarize and visualize results: summary.islasso.path
, predict.islasso.path
, logLik.islasso.path
, deviance.islasso.path
, residuals.islasso.path
, coef.islasso.path
, fitted.islasso.path
.
Author(s)
Gianluca Sottile based on some preliminary functions by Vito Muggeo.
Maintainer: Gianluca Sottile <gianluca.sottile@unipa.it>
References
Cilluffo, G, Sottile, G, S, La Grutta, S and Muggeo, VMR (2019). The Induced Smoothed lasso: A practical framework for hypothesis testing in high dimensional regression. Statistical Methods in Medical Research, DOI: 10.1177/0962280219842890.
Sottile, G, Cilluffo, G, Muggeo, VMR (2019). The R package islasso: estimation and hypothesis testing in lasso regression. Technical Report on ResearchGate. doi:10.13140/RG.2.2.16360.11521.
Examples
set.seed(1)
n <- 100
p <- 30
p1 <- 10 #number of nonzero coefficients
coef.veri <- sort(round(c(seq(.5, 3, l=p1/2), seq(-1, -2, l=p1/2)), 2))
sigma <- 1
coef <- c(coef.veri, rep(0, p-p1))
X <- matrix(rnorm(n*p), n, p)
mu <- drop(X%*%coef)
y <- mu + rnorm(n, 0,sigma)
o <- islasso.path(y ~ ., data = data.frame(y = y, X),
family = gaussian())
temp <- GoF.islasso.path(o)
lambda <- temp$lambda.min["BIC"]
o <- islasso(y ~ ., data = data.frame(y = y, X),
family = gaussian(), lambda = lambda)
o
summary(o, pval = .05)
Optimization for the selection of the tuning parameter
Description
This function extracts the value of the tuning parameter which minimizes the AIC/BIC/AICc/eBIC/GCV/GIC criterion in “islasso.path
”.
Usage
GoF.islasso.path(object, plot = TRUE, ...)
Arguments
object |
a fitted model object of class |
plot |
a logical flag indicating if each criterion have to be plotted |
... |
further arguments passed to or from other methods. |
Details
Minimization of the Akaike Information Criterion (AIC), or Bayesian Information Criterion (BIC) or several other criteria are sometimes employed to select the tuning parameter as an alternative to the cross validation. The model degrees of freedom (not necessarly integers as in the plain lasso) used in all methods are computed as trace of the hat matrix at convergence.
Value
A list of
gof |
the goodness of fit measures |
minimum |
the position of the optimal lambda values |
lambda.min |
the optimal lambda values |
Author(s)
Maintainer: Gianluca Sottile <gianluca.sottile@unipa.it>
See Also
islasso.path
, islasso.path.fit
, coef.islasso.path
, residuals.islasso.path
, summary.islasso.path
, logLik.islasso.path
, fitted.islasso.path
, predict.islasso.path
and deviance.islasso.path
methods.
Examples
set.seed(1)
n <- 100
p <- 30
p1 <- 10 #number of nonzero coefficients
coef.veri <- sort(round(c(seq(.5, 3, l=p1/2), seq(-1, -2, l=p1/2)), 2))
sigma <- 1
coef <- c(coef.veri, rep(0, p-p1))
X <- matrix(rnorm(n*p), n, p)
mu <- drop(X%*%coef)
y <- mu + rnorm(n, 0, sigma)
o <- islasso.path(y ~ ., data = data.frame(y = y, X))
GoF.islasso.path(o)
Prostate Cancer Data
Description
These data come from a study that examined the correlation between the level of prostate specific antigen and a number of clinical measures in men who were about to receive a radical prostatectomy. It is data frame with 97 rows and 9 columns.
Usage
data(Prostate)
Format
The data frame has the following components:
lcavol
log(cancer volume)
lweight
log(prostate weight)
age
age
lbph
log(benign prostatic hyperplasia amount)
svi
seminal vesicle invasion
lcp
log(capsular penetration)
gleason
Gleason score
pgg45
percentage Gleason scores 4 or 5
lpsa
log(prostate specific antigen)
Source
Stamey, T.A., Kabalin, J.N., McNeal, J.E., Johnstone, I.M., Freiha,
F., Redwine, E.A. and Yang, N. (1989)
Prostate specific antigen in the diagnosis and treatment of
adenocarcinoma of the prostate: II. radical prostatectomy treated
patients,
Journal of Urology 141(5), 1076–1083.
Optimization for the selection of the tuning parameter
Description
This function performs a minimization of the AIC/BIC criterion for selecting the tuning parameter in “islasso
”.
Usage
aic.islasso(object, method = c("AIC", "BIC", "AICc", "GCV", "GIC"),
interval, g = 0, y, X,
intercept = FALSE, family = gaussian(), alpha = 1, offset,
weights, unpenalized, control = is.control(), trace = TRUE)
Arguments
object |
a fitted model object of class "islasso". |
method |
the criterion to optimize, AIC, BIC, AICc, GCV, GIC. |
interval |
the lower and upper limits of |
g |
a value belonging to the interval [0, 1]. Classical BIC is returned by letting g = 0 (default value), whereas extended BIC corresponds to the case g = 0.5. |
y |
if |
X |
if |
intercept |
if |
family |
if |
alpha |
The elasticnet mixing parameter, with
|
offset |
this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. |
weights |
observation weights. Default is 1 for each observation. |
unpenalized |
a vector used to specify the unpenalized estimators; unpenalized has to be a vector of logicals. |
control |
a list of parameters for controlling the fitting process (see |
trace |
Should the iterative procedure be printed? TRUE is the default value. |
Details
Minimization of the Akaike Information Criterion (AIC), or Bayesian Information Criterion (BIC) or several other criteria are sometimes employed to select the tuning parameter as an alternative to the cross validation. The model degrees of freedom (not necessarly integers as in the plain lasso) used in all methods are computed as trace of the hat matrix at convergence.
Value
the optimal lambda value is returned
Author(s)
Maintainer: Gianluca Sottile <gianluca.sottile@unipa.it>
See Also
islasso.fit
, summary.islasso
, residuals.islasso
, logLik.islasso
, predict.islasso
and deviance.islasso
methods.
Examples
set.seed(1)
n <- 100
p <- 100
p1 <- 20 #number of nonzero coefficients
coef.veri <- sort(round(c(seq(.5, 3, l=p1/2), seq(-1, -2, l=p1/2)), 2))
sigma <- 1
coef <- c(coef.veri, rep(0, p-p1))
X <- matrix(rnorm(n*p), n, p)
mu <- drop(X%*%coef)
y <- mu + rnorm(n, 0, sigma)
o <- islasso(y ~ ., data = data.frame(y = y, X))
## Not run:
#use the evaluation interval of the fit
lambda_aic <- aic.islasso(o, method = "AIC")
#overwrites the evaluation interval for lambda
lambda_bic <- aic.islasso(o, interval = c(.1, 30), method = "BIC")
#overwrites the evaluation interval for lambda using eBIC criterion
lambda_ebic <- aic.islasso(o, interval = c(.1, 30), method = "BIC", g = .5)
## End(Not run)
General Linear Combination method for islasso
objects
Description
General linear hypotheses and confidence intervals estimation for linear combinantions of the regression coefficients in islasso
fits
Usage
## S3 method for class 'islasso'
anova(object, A, b = NULL, ci, ...)
Arguments
object |
a fitted model object of class "islasso". |
A |
matrix (or vector) giving linear combinations of coefficients by rows, or a character vector giving the hypothesis in symbolic form (see Details). |
b |
right-hand-side vector for hypothesis, with as many entries as rows in the hypothesis matrix A; can be omitted, in which case it defaults to a vector of zeroes. |
ci |
optionally, a two columns matrix of estimated confidence intervals for the estimated coefficients. |
... |
not used. |
Details
For the islasso regression model with coefficients \beta
, the null hypothesis is
H_0:A\beta=b
where A and b are known matrix and vector.
The hypothesis matrix A can be supplied as a numeric matrix (or vector), the rows of which specify linear combinations of the model coefficients, which are tested equal to the corresponding entries in the right-hand-side vector b, which defaults to a vector of zeroes.
Alternatively, the hypothesis can be specified symbolically as a character vector with one or more elements, each of which gives either a linear combination of coefficients, or a linear equation in the coefficients (i.e., with both a left and right side separated by an equals sign). Components of a linear expression or linear equation can consist of numeric constants, or numeric constants multiplying coefficient names (in which case the number precedes the coefficient, and may be separated from it by spaces or an asterisk); constants of 1 or -1 may be omitted. Spaces are always optional. Components are separated by plus or minus signs. Newlines or tabs in hypotheses will be treated as spaces. See the examples below.
Value
An object of class "anova.islasso" which contains the estimates, the standard errors, the Wald statistics and corresponding p value of each linear combination and of the restriced model.
Author(s)
The main function of the same name was inspired by the R function previously implemented by Vito MR Muggeo.
Maintainer: Gianluca Sottile <gianluca.sottile@unipa.it>
Examples
set.seed(1)
n <- 100
p <- 100
p1 <- 10 #number of nonzero coefficients
coef.true <- sort(round(c(seq(.5, 3, l=p1/2), seq(-1, -2, l=p1/2)), 2))
sigma <- 1
coef <- c(coef.true, rep(0, p-p1))
X <- matrix(rnorm(n*p), n, p)
eta <- drop(X %*% coef)
mu <- eta
y <- mu + rnorm(n, 0, sigma)
o <- islasso(y ~ . - 1, data = data.frame(y = y, X),
family = gaussian())
anova(o, A = diag(p), b = coef)
anova(o, A = c("X1 + X2 + X3 + X4 + X5 = -7.5"))
anova(o, A = c("X1 + X2 + X3 + X4 + X5 = 0"))
anova(o, A = c("X6 + X7 + X8 + X9 + X10"), b = 8.75)
anova(o, A = c("X6 + X7 + X8 + X9 + X10"), b = 0)
anova(o, A = c("X1 + X2 + X3 + X4 + X5 = -7.5",
"X6 + X7 + X8 + X9 + X10 = 8.75"))
anova(o, A = c("X1 + X2 + X3 + X4 + X5",
"X6 + X7 + X8 + X9 + X10"), b = c(-7.5, 8.75))
anova(o, A = c("X1 + X2 + X3 + X4 + X5",
"X6 + X7 + X8 + X9 + X10"))
Breast Cancer microarray experiment
Description
This data set details microarray experiment for 52 breast cancer patients. The binary variable status
is used to indicate whether or not the patient has died of breast cancer (status = 0
= did not die of breast cancer, status = 1
= died of breast cancer). The other variables contain the amplification or deletion of the considered genes.
Rather than measuring gene expression, this experiment aims to measure gene amplification or deletion, which refers to the number of copies of a particular DNA sequence within the genome. The aim of the experiment is to find out the key genomic factors involved in agressive and non-agressive forms of breast cancer.
The experiment was conducted by the Dr.\ John Bartlett and Dr.\ Caroline Witton in the Division of Cancer Sciences and Molecular Pathology of the University of Glasgow at the city's Royal Infirmary.
Usage
data(breast)
Source
Dr. John Bartlett and Dr. Caroline Witton, Division of Cancer Sciences and Molecular Pathology, University of Glasgow, Glasgow Royal Infirmary.
References
Augugliaro L., Mineo A.M. and Wit E.C. (2013) dgLARS: a differential geometric approach to sparse generalized linear models, Journal of the Royal Statistical Society. Series B., Vol 75(3), 471-498.
Wit E.C. and McClure J. (2004) "Statistics for Microarrays: Design, Analysis and Inference" Chichester: Wiley.
confint method for islasso
objects
Description
confint method for islasso
objects
Usage
## S3 method for class 'islasso'
confint(object, parm, level = 0.95,
type.ci = "wald", trace = TRUE, ...)
Arguments
object |
a fitted model object of class "islasso". |
parm |
a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level |
the confidence level required. |
type.ci |
Only Wald-type confidence intervals are implemented yet! type.ci = "wald" estimates and standard errors are used to build confidence interval |
trace |
if TRUE (default) a bar shows the iterations status. |
... |
additional argument(s) for methods. |
Author(s)
Maintainer: Gianluca Sottile <gianluca.sottile@unipa.it>
See Also
islasso.fit
, summary.islasso
, residuals.islasso
, logLik.islasso
, predict.islasso
and deviance.islasso
methods.
Examples
set.seed(1)
n <- 100
p <- 100
p1 <- 10 #number of nonzero coefficients
coef.veri <- sort(round(c(seq(.5, 3, l=p1/2), seq(-1, -2, l=p1/2)), 2))
sigma <- 1
coef <- c(coef.veri, rep(0, p-p1))
X <- matrix(rnorm(n*p), n, p)
eta <- drop(X%*%coef)
##### gaussian ######
mu <- eta
y <- mu + rnorm(n, 0, sigma)
o <- islasso(y ~ ., data = data.frame(y = y, X),
family = gaussian())
ci <- confint(o, type.ci = "wald", parm = 1:10)
ci
plot(ci)
Blood and other measurements in diabetics
Description
The diabetes
data frame has 442 rows and 3 columns.
These are the data used in the Efron et al "Least Angle Regression" paper.
Format
This data frame contains the following columns:
- x
a matrix with 10 columns
- y
a numeric vector
- x2
a matrix with 64 columns
Details
The x matrix has been standardized to have unit L2 norm in each column and zero mean. The matrix x2 consists of x plus certain interactions.
Source
https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.ps
References
Efron, Hastie, Johnstone and Tibshirani (2003) "Least Angle Regression" (with discussion) Annals of Statistics
Auxiliary for controlling islasso model fitting
Description
Auxiliary function for controlling the islasso model fitting.
Usage
is.control(sigma2 = -1, tol = 1E-05, itmax = 1E+3, stand = TRUE,
trace = 0, nfolds = 5, seed = NULL, adaptive = FALSE, g = .5,
b0 = NULL, V0 = NULL, c = .5)
Arguments
sigma2 |
optional. The fixed value of dispersion parameter. If -1 (default) it is estimated from the data |
tol |
tollerance value to declare convergence, dafault to 1e-5 |
itmax |
maximum number of iterations, default to 1000 |
stand |
if TRUE (default), the covariates are standardized prior to fitting the model. However the coefficients are always returned on the original scale. |
trace |
Should the iterative procedure be printed? 0: no printing, 1 = compact printing, 2 = enlarged printing, 3 = compact printing including Fisher scoring information (only used in glm family). |
nfolds |
if |
seed |
optional, the seed to be used to split the dataframe and to perform cross validation. Useful to make reproducible the results. |
adaptive |
experimental, if TRUE the adaptive LASSO is implemented. |
g |
a value belonging to the interval [0, 1]. Classical BIC is returned by letting g = 0 (default value), whereas extended BIC corresponds to the case g = 0.5. |
b0 |
optional, starting values for the regression coefficients. If NULL, the point estimates from |
V0 |
optional, starting value for the estimates covariance matrix, If NULL, the identity matrix is used. |
c |
the weight of the mixture in the induced smoothed lasso, the default is |
Author(s)
Maintainer: Gianluca Sottile <gianluca.sottile@unipa.it>
The Induced Smoothed lasso
Description
islasso
is used to fit lasso regression models wherein the nonsmooth L_1
norm penalty is replaced by a smooth approximation justified under the induced smoothing paradigm. Simple lasso-type or elastic-net penalties are permitted and Linear, Logistic, Poisson and Gamma responses are allowed.
Usage
islasso(formula, family = gaussian, lambda, alpha = 1, data, weights, subset,
offset, unpenalized, contrasts = NULL, control = is.control())
Arguments
formula |
an object of class “formula” (or one that can be coerced to that class): the ‘usual’ symbolic description of the model to be fitted. |
family |
the assumed response distribution. Gaussian, (quasi) Binomial, (quasi) Poisson, and Gamma are allowed. |
lambda |
Value of the tuning parameter in the objective. If missing, the optimal lambda is computed using |
alpha |
The elastic-net mixing parameter, with
|
data |
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which |
weights |
observation weights. Default is 1 for each observation. |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
offset |
this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. |
unpenalized |
optional. A vector of integers or characters indicating any covariate (in the formula) with coefficients not to be penalized. The intercept, if included in the model, is always unpenalized. |
contrasts |
an optional list. See the contrasts.arg of |
control |
a list of parameters for controlling the fitting process (see |
Details
islasso
estimates regression models by imposing a lasso-type penalty on some or all regression coefficients. However the nonsmooth L_1
norm penalty is replaced by a smooth approximation justified under the induced smoothing paradigm. The advantage is that reliable standard errors are returned as model output and hypothesis testing on linear combinantions of the regression parameters can be carried out straightforwardly via the Wald statistic. Simulation studies provide evidence that the proposed approach controls type-I errors and exhibits good power in different scenarios.
Value
A list of
coefficients |
a named vector of coefficients |
se |
a named vector of standard errors |
residuals |
the working residuals |
fitted.values |
the fitted values |
rank |
the estimated degrees of freedom |
family |
the family object used |
linear.predictors |
the linear predictors |
deviance |
the family deviance |
aic |
the Akaike Information Criterion |
null.deviance |
the family null deviance |
iter |
the number of iterations of IWLS used |
weights |
the working weights, that is the weights in the final iteration of the IWLS fit |
df.residual |
the residual degrees of freedom |
df.null |
the degrees of freedom of a null model |
converged |
logical. Was the IWLS algorithm judged to have converged? |
model |
if requested (the default), the model frame used. |
call |
the matched call |
formula |
the formula supplied |
terms |
the terms object used |
data |
he data argument. |
offset |
the offset vector used. |
control |
the value of the control argument used |
xlevels |
(where relevant) a record of the levels of the factors used in fitting. |
lambda |
the lambda value used in the islasso algorithm |
alpha |
the elasticnet mixing parameter |
dispersion |
the estimated dispersion parameter |
internal |
internal elements |
contrasts |
(only where relevant) the contrasts used. |
Author(s)
The main function of the same name was inspired by the R function previously implemented by Vito MR Muggeo.
Maintainer: Gianluca Sottile <gianluca.sottile@unipa.it>
References
Cilluffo, G, Sottile, G, S, La Grutta, S and Muggeo, VMR (2019). The Induced Smoothed lasso: A practical framework for hypothesis testing in high dimensional regression. Statistical Methods in Medical Research, DOI: 10.1177/0962280219842890.
Sottile, G, Cilluffo, G, Muggeo, VMR (2019). The R package islasso: estimation and hypothesis testing in lasso regression. Technical Report on ResearchGate. doi:10.13140/RG.2.2.16360.11521.
See Also
islasso.fit
, summary.islasso
, residuals.islasso
, logLik.islasso
, predict.islasso
and deviance.islasso
methods.
Examples
set.seed(1)
n <- 100
p <- 100
p1 <- 10 #number of nonzero coefficients
coef.veri <- sort(round(c(seq(.5, 3, l=p1/2), seq(-1, -2, l=p1/2)), 2))
sigma <- 1
coef <- c(coef.veri, rep(0, p-p1))
X <- matrix(rnorm(n*p), n, p)
eta <- drop(X%*%coef)
##### gaussian ######
mu <- eta
y <- mu + rnorm(n, 0, sigma)
o <- islasso(y ~ ., data = data.frame(y = y, X),
family = gaussian())
o
summary(o)
coef(o)
fitted(o)
predict(o, type="response")
plot(o)
residuals(o)
deviance(o)
AIC(o)
logLik(o)
## Not run:
# for the interaction
o <- islasso(y ~ X1 * X2, data = data.frame(y = y, X),
family = gaussian())
##### binomial ######
coef <- c(c(1,1,1), rep(0, p-3))
X <- matrix(rnorm(n*p), n, p)
eta <- drop(cbind(1, X)%*%c(-1, coef))
mu <- binomial()$linkinv(eta)
y <- rbinom(n, 100, mu)
y <- cbind(y, 100-y)
o <- islasso(cbind(y1, y2) ~ .,
data = data.frame(y1 = y[,1], y2 = y[,2], X),
family = binomial())
summary(o, pval = .05)
##### poisson ######
coef <- c(c(1,1,1), rep(0, p-3))
X <- matrix(rnorm(n*p), n, p)
eta <- drop(cbind(1, X)%*%c(1, coef))
mu <- poisson()$linkinv(eta)
y <- rpois(n, mu)
o <- islasso(y ~ ., data = data.frame(y = y, X),
family = poisson())
summary(o, pval = .05)
##### Gamma ######
coef <- c(c(1,1,1), rep(0, p-3))
X <- matrix(rnorm(n*p), n, p)
eta <- drop(cbind(1, X)%*%c(-1, coef))
mu <- Gamma(link="log")$linkinv(eta)
shape <- 10
phi <- 1 / shape
y <- rgamma(n, scale = mu / shape, shape = shape)
o <- islasso(y ~ ., data = data.frame(y = y, X),
family = Gamma(link = "log"))
summary(o, pval = .05)
## End(Not run)
Internal Functions
Description
Internal islasso functions.
Usage
fmt.perc(probs, digits)
qqNorm(x, probs=seq(.005, .995, l=200), centre=FALSE, scale=FALSE,
leg=TRUE, mean=0, sd=1, add=FALSE, p.col=1, dF=FALSE, ylab, ...)
ginv2(X, tol = sqrt(.Machine$double.eps))
modelX(n, p, rho=.5, scale=TRUE)
printCoefmat2(x, digits = max(3L, getOption("digits") - 2L),
signif.stars = getOption("show.signif.stars"),
signif.legend = signif.stars, dig.tst = max(1L, min(5L, digits - 1L)),
cs.ind = 1:k, tst.ind = k + 1, zap.ind = integer(), P.values = NULL,
has.Pvalue = nc >= 4L && length(cn <- colnames(x)) &&
substr(cn[nc], 1L, 3L) %in% c("Pr(", "p-v"),
eps.Pvalue = .Machine$double.eps, na.print = "NA", quote = FALSE,
right = TRUE, ...)
.checkinput(X, y, family, alpha, intercept, weights, offset,
unpenalized, control)
.startpoint(X, y, lambda, alpha, weights, offset, mustart,
family, intercept, setting)
.islasso(prep, start, Lambda, fam, link)
checkinput.islasso.path(X, y, family, lambda, nlambda, lambda.min.ratio,
alpha, intercept, weights, offset, unpenalized, control)
startpoint.islasso.path(X, y, lambda, alpha, weights, offset, mustart,
family, intercept, setting)
islasso.path.fit.glm(prep, start, lambda, fam, link)
islasso.path.fitter2(b, se, c, fix.c, l, alpha, x, y, weights, sigma2,
cov.unscaled, n, intercept, tol, unpenalized, offset, family, maxIter)
interpolate(y1, y2, x1, x2, x.new)
makeHyp(cnames, hypothesis, rhs = NULL)
printHyp(L, b, nms)
cislasso(object, a, ci)
ci.fitted.islasso(object, newx, ci = NULL, type.ci = "wald",
conf.level=.95, only.ci = FALSE)
grad.islasso(object, ...)
hess.islasso(object, ...)
is.influence(model, do.coef = TRUE)
islasso.diag(glmfit)
islasso.diag.plots(glmfit, glmdiag = islasso.diag(glmfit),
subset = NULL, iden = FALSE, labels = NULL, ret = FALSE)
predislasso(object, newdata, type = c("response", "terms"),
terms = NULL, na.action = na.pass, ...)
Details
These functions are not intended for users.
Author(s)
Gianluca Sottile (gianluca.sottile@unipa.it)
The Induced Smoothed lasso path
Description
islasso.path
is used to fit a generalized linear model via induced smoothed lasso method. The regularization path is computed for the lasso or elasticnet penalty at a grid of values for the regularization parameter lambda. Fits linear, logistic, poisson and gamma regression models.
Usage
islasso.path(formula, family = gaussian(), lambda = NULL, nlambda = 100,
lambda.min.ratio = ifelse(nobs < nvars, 1E-2, 1E-03), alpha = 1, data,
weights, subset, offset, contrasts = NULL, unpenalized, control = is.control())
Arguments
formula |
an object of class “formula” (or one that can be coerced to that class): the ‘usual’ symbolic description of the model to be fitted. |
family |
the assumed response distribution. Gaussian, (quasi) Binomial, (quasi) Poisson, and Gamma are allowed. |
lambda |
A user supplied lambda sequence. Typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this. |
nlambda |
The number of lambda values - default is 100. |
lambda.min.ratio |
Smallest value for lambda, as a fraction of lambda.max, the (data derived) entry value (i.e. the smallest value for which all coefficients are zero). The default depends on the sample size |
alpha |
The elastic-net mixing parameter, with
|
data |
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which |
weights |
observation weights. Default is 1 for each observation. |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
offset |
this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. |
contrasts |
an optional list. See the contrasts.arg of |
control |
a list of parameters for controlling the fitting process (see |
unpenalized |
optional. A vector of integers or characters indicating any covariate (in the formula) with coefficients not to be penalized. The intercept, if included in the model, is always unpenalized. |
Details
The sequence of models implied by lambda is fit the islasso method. islasso
estimates regression models by imposing a lasso-type penalty on some or all regression coefficients. However the nonsmooth L_1
norm penalty is replaced by a smooth approximation justified under the induced smoothing paradigm. The advantage is that reliable standard errors are returned as model output and hypothesis testing on linear combinantions of the regression parameters can be carried out straightforwardly via the Wald statistic. Simulation studies provide evidence that the proposed approach controls type-I errors and exhibits good power in different scenarios.
Value
A list of
call |
the matched call. |
Info |
a named matrix containing information about lambda values, estimated degrees of freedom, estimated dispersion parameters, deviance, loglikelhood, number of iterations and convergence criteria. |
GoF |
a named matrix containing information criteria, i.e., AIC, BIC, AICc, eBIC, GCV, GIC. |
Coef |
a |
SE |
a |
Weights |
a |
Linear.predictors |
a |
Fitted.values |
a |
Residuals |
a |
Input |
a named list containing several input arguments, i.e., the numbers of observations and predictors, if an intercept ha to be estimated, the model matrix and the response vector, the observation weights, the offset, the family object used, The elasticnet mixing parameter and the vector used to specify the unpenalized estimators. |
control |
the value of the control argument used. |
formula |
the formula supplied. |
model |
if requested (the default), the model frame used. |
terms |
the terms object used. |
data |
he data argument. |
xlevels |
(where relevant) a record of the levels of the factors used in fitting. |
contrasts |
(only where relevant) the contrasts used. |
Author(s)
Maintainer: Gianluca Sottile <gianluca.sottile@unipa.it>
References
Cilluffo, G, Sottile, G, S, La Grutta, S and Muggeo, VMR (2019). The Induced Smoothed lasso: A practical framework for hypothesis testing in high dimensional regression. Statistical Methods in Medical Research, DOI: 10.1177/0962280219842890.
Sottile, G, Cilluffo, G, Muggeo, VMR (2019). The R package islasso: estimation and hypothesis testing in lasso regression. Technical Report on ResearchGate. doi:10.13140/RG.2.2.16360.11521.
See Also
islasso.path.fit
, coef.islasso.path
, summary.islasso.path
, residuals.islasso.path
, GoF.islasso.path
, logLik.islasso.path
, fitted.islasso.path
, predict.islasso.path
and deviance.islasso.path
methods.
Examples
set.seed(1)
n <- 100
p <- 30
p1 <- 10 #number of nonzero coefficients
coef.veri <- sort(round(c(seq(.5, 3, l=p1/2), seq(-1, -2, l=p1/2)), 2))
sigma <- 1
coef <- c(coef.veri, rep(0, p-p1))
X <- matrix(rnorm(n*p), n, p)
eta <- drop(X%*%coef)
##### gaussian ######
mu <- eta
y <- mu + rnorm(n, 0, sigma)
o <- islasso.path(y ~ ., data = data.frame(y = y, X),
family = gaussian(), nlambda = 30L)
o
summary(o, lambda = 10)
coef(o, lambda = 10)
fitted(o, lambda = 10)
predict(o, type="response", lambda = 10)
plot(o, xvar = "coef")
residuals(o, lambda = 10)
deviance(o, lambda = 10)
logLik(o, lambda = 10)
GoF.islasso.path(o)
## Not run:
##### binomial ######
coef <- c(c(1,1,1), rep(0, p-3))
X <- matrix(rnorm(n*p), n, p)
eta <- drop(cbind(1, X)%*%c(-1, coef))
mu <- binomial()$linkinv(eta)
y <- rbinom(n, 100, mu)
y <- cbind(y, 100-y)
o <- islasso.path(cbind(y1, y2) ~ .,
data = data.frame(y1 = y[,1], y2 = y[,2], X),
family = binomial(), nlambda = 30L)
temp <- GoF.islasso.path(o)
summary(o, pval = .05, lambda = temp$lambda.min["BIC"])
##### poisson ######
coef <- c(c(1,1,1), rep(0, p-3))
X <- matrix(rnorm(n*p), n, p)
eta <- drop(cbind(1, X)%*%c(1, coef))
mu <- poisson()$linkinv(eta)
y <- rpois(n, mu)
o <- islasso.path(y ~ ., data = data.frame(y = y, X),
family = poisson(), nlambda = 30L)
temp <- GoF.islasso.path(o)
summary(o, pval = .05, lambda = temp$lambda.min["BIC"])
##### Gamma ######
coef <- c(c(1,1,1), rep(0, p-3))
X <- matrix(rnorm(n*p), n, p)
eta <- drop(cbind(1, X)%*%c(-1, coef))
mu <- Gamma(link="log")$linkinv(eta)
shape <- 10
phi <- 1 / shape
y <- rgamma(n, scale = mu / shape, shape = shape)
o <- islasso.path(y ~ ., data = data.frame(y = y, X),
family = Gamma(link = "log"), nlambda = 30L)
temp <- GoF.islasso.path(o)
summary(o, pval = .05, lambda = temp$lambda.min["BIC"])
## End(Not run)
Diagnostics plots for Induced Smoothing Lasso Model
Description
Diagnostics plots for Induced Smoothing Lasso Model
Usage
## S3 method for class 'islasso'
plot(x, ...)
Arguments
x |
an object of class |
... |
other graphical parameters for the plot |
Details
The plot on the top left is a plot of the standard deviance residuals against the fitted values. The plot on the top right is a normal QQ plot of the standardized deviance residuals. The red line is the expected line if the standardized residuals are normally distributed, i.e. it is the line with intercept 0 and slope 1. The bottom two panels are plots of link and variance functions. On the left is squared standardized Pearson residuals against the fitted values. On the right working vector against the linear predictor.
Author(s)
Maintainer: Gianluca Sottile <gianluca.sottile@unipa.it>
See Also
islasso.fit
, summary.islasso
, residuals.islasso
, logLik.islasso
, predict.islasso
and deviance.islasso
methods.
Examples
## Not run:
set.seed(1)
n <- 100
p <- 100
p1 <- 20 #number of nonzero coefficients
coef.veri <- sort(round(c(seq(.5, 3, l=p1/2), seq(-1, -2, l=p1/2)), 2))
sigma <- 1
coef <- c(coef.veri, rep(0, p-p1))
X <- matrix(rnorm(n*p), n, p)
mu <- drop(X%*%coef)
y <- mu + rnorm(n, 0,sigma)
lambda <- 2
o <- islasso(y ~ ., data = data.frame(y = y, X),
family = gaussian(), lambda = lambda)
plot(o)
## End(Not run)
plot coefficient profile from a fitted "islasso.path" object.
Description
Produces a coefficient profile plot of the coefficient paths for a fitted "islasso.path" object.
Usage
## S3 method for class 'islasso.path'
plot(x,
yvar = c("coefficients", "se", "gradient", "weight", "gof"),
gof = c("none", "AIC", "BIC", "AICc", "eBIC", "GCV", "GIC"),
label = FALSE, legend = FALSE, ...)
Arguments
x |
an object of class |
yvar |
What is on the Y-axis. "coef" plot the log-lambda sequence against the coefficients; "se" plot the log-lambda sequence against the standard deviations; "gradient" plot the log-lambda sequence against the gradient; "weight" plot the log-lambda sequence against the mixture weight of the islasso method; "gof" plot the log-lambda sequence against the chosen criterion. |
gof |
the chosen criterion to highlight the active variables. "none" doesn't highlight active variables. |
label |
a logical flag indicating if some labels have to be added. |
legend |
a logical flag indicating if legend has to be shown. |
... |
other graphical parameters for the plot, i.e., main, xlab, ylab, xlim, ylim, lty, col, lwd, cex.axis, cex.lab, cex.main, gof_lty, gof_col and gof_lwd. The last three parameters are used to modify aspects of the legend, and of the goodness of fit measure used. |
Details
A coefficient profile plot is produced for Induced Smoothing Lasso Model path.
Examples
## Not run:
set.seed(1)
n <- 100
p <- 30
p1 <- 10 #number of nonzero coefficients
coef.veri <- sort(round(c(seq(.5, 3, l=p1/2), seq(-1, -2, l=p1/2)), 2))
sigma <- 1
coef <- c(coef.veri, rep(0, p-p1))
X <- matrix(rnorm(n*p), n, p)
mu <- drop(X%*%coef)
y <- mu + rnorm(n, 0,sigma)
o <- islasso.path(y ~ ., data = data.frame(y = y, X),
family = gaussian())
par(mfrow = c(2, 2))
plot(o, yvar = "coefficients", gof = "AICc", label = TRUE)
plot(o, yvar = "se", gof = "AICc")
plot(o, yvar = "gradient", gof = "AICc")
plot(o, yvar = "gof", gof = "AICc")
## End(Not run)
Prediction method for islasso fitted objects
Description
Prediction method for islasso fitted objects
Usage
## S3 method for class 'islasso'
predict(object, newdata = NULL,
type = c("link", "response", "coefficients", "class", "terms"),
se.fit = FALSE, ci = NULL, type.ci = "wald",
level = .95, terms = NULL, na.action = na.pass, ...)
Arguments
object |
a fitted object of class "islasso". |
newdata |
optionally, a data frame in which to look for variables with which to predict. If omitted, the fitted linear predictors are used. |
type |
the type of prediction required. The default is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable.
Thus for a default binomial model the default predictions are of log-odds (probabilities on logit scale) and type = "response" gives the predicted probabilities. The |
se.fit |
logical switch indicating if confidence intervals are required. |
ci |
optionally, a two columns matrix of estimated confidence intervals for the estimated coefficients. |
type.ci |
Only Wald-type confidence intervals are implemented yet! type.ci = "wald" estimates and standard errors are used to build confidence interval |
level |
the confidence level required. |
terms |
with type = "terms" by default all terms are returned. A character vector specifies which terms are to be returned. |
na.action |
function determining what should be done with missing values in newdata. The default is to predict NA. |
... |
further arguments passed to or from other methods. |
Value
An object depending on the type argument
Author(s)
Maintainer: Gianluca Sottile <gianluca.sottile@unipa.it>
See Also
islasso.fit
, summary.islasso
, residuals.islasso
, logLik.islasso
, predict.islasso
and deviance.islasso
methods.
Examples
set.seed(1)
n <- 100
p <- 100
p1 <- 20 #number of nonzero coefficients
coef.veri <- sort(round(c(seq(.5, 3, l=p1/2), seq(-1, -2, l=p1/2)), 2))
sigma <- 1
coef <- c(coef.veri, rep(0, p-p1))
X <- matrix(rnorm(n*p), n, p)
mu <- drop(X%*%coef)
y <- mu + rnorm(n, 0,sigma)
lambda <- 2
o <- islasso(y ~ ., data = data.frame(y = y, X), lambda = lambda)
predict(o, type = "response")
Prediction method for islasso.path fitted objects
Description
Prediction method for islasso fitted objects
Usage
## S3 method for class 'islasso.path'
predict(object, newdata, type = c("link", "response",
"coefficients", "class"), lambda, ...)
Arguments
object |
a fitted object of class |
newdata |
optionally, a data frame in which to look for variables with which to predict. If omitted, the fitted linear predictors are used. |
type |
the type of prediction required. The default is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable.
Thus for a default binomial model the default predictions are of log-odds (probabilities on logit scale) and type = "response" gives the predicted probabilities. The |
lambda |
Value(s) of the penalty parameter lambda at which predictions are required. Default is the entire sequence used to create the model. |
... |
further arguments passed to or from other methods. |
Value
An object depending on the type argument
Author(s)
Maintainer: Gianluca Sottile <gianluca.sottile@unipa.it>
See Also
islasso.path
, islasso.path.fit
, coef.islasso.path
, residuals.islasso.path
, GoF.islasso.path
, logLik.islasso.path
, fitted.islasso.path
, summary.islasso.path
and deviance.islasso.path
methods.
Examples
## Not run:
set.seed(1)
n <- 100
p <- 30
p1 <- 10 #number of nonzero coefficients
coef.veri <- sort(round(c(seq(.5, 3, l=p1/2), seq(-1, -2, l=p1/2)), 2))
sigma <- 1
coef <- c(coef.veri, rep(0, p-p1))
X <- matrix(rnorm(n*p), n, p)
mu <- drop(X%*%coef)
y <- mu + rnorm(n, 0,sigma)
o <- islasso.path(y ~ ., data = data.frame(y = y, X),
family = gaussian())
temp <- GoF.islasso.path(o)
predict(o, type = "response", lambda = temp$lambda.min)
## End(Not run)
Simulate model matrix and response
Description
Simulate model matrix and response from a specified distribution.
Usage
simulXy(n, p, interc = 0, beta, family = gaussian(), prop =
0.1, lim.b = c(-3, 3), sigma = 1, size = 1, rho = 0,
scale = TRUE, seed, X)
Arguments
n |
number of observations. |
p |
total number of covariates in the model matrix. |
interc |
the model intercept. |
beta |
the vector of p coefficients in the linear predictor. |
family |
a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. Only gaussian, binomial or poisson are allowed. |
prop |
if |
lim.b |
if |
sigma |
if family is 'gaussian', the standard deviation of the response. The default is 1. |
size |
if family is 'binomial', the number of trials to build the response vector. The default is 1. |
rho |
correlation value to define the variance covariance matrix to build the model matrix, i.e., rho^|i-j| i,j = 1,...,p and i different from j. The default is 0. |
scale |
Should the columns of the mdoel matrix be scaled? The default is TRUE |
seed |
optional, the seed to generate the data. |
X |
optional, the model matrix. |
Examples
n <- 100
p <- 100
beta <- c(runif(10, -3, 3), rep(0, p-10))
dat <- simulXy(n, p, beta = beta, seed=1234)
summary method for islasso fitted objects
Description
summary method for islasso fitted objects
Usage
## S3 method for class 'islasso'
summary(object, pval = 1, which, use.t = FALSE,
type.pval = "wald", ...)
Arguments
object |
fitted |
pval |
a threshold p-value value indicating which coefficients should be printed. If |
which |
a specification of which parameters are to be given p-values. If missing, all parameters are considered. |
use.t |
if |
type.pval |
Only Wald-type confidence intervals are implemented yet! type.pval = "wald" (default) estimates and standard errors are used to build confidence interval |
... |
not used |
Author(s)
Maintainer: Gianluca Sottile <gianluca.sottile@unipa.it>
See Also
islasso.fit
, summary.islasso
, residuals.islasso
, logLik.islasso
, predict.islasso
and deviance.islasso
methods.
Examples
## Not run:
#continues example from ?islasso
summary(o, pval = .1) #print just the "borderline" significant coefficients
## End(Not run)
summary method for islasso.path fitted objects
Description
summary method for islasso.path fitted objects
Usage
## S3 method for class 'islasso.path'
summary(object, pval = 1, use.t = FALSE, lambda, ...)
Arguments
object |
fitted |
pval |
a threshold p-value value indicating which coefficients should be printed. If |
use.t |
if |
lambda |
Value of the penalty parameter lambda at which summary are required. |
... |
not used |
Author(s)
Maintainer: Gianluca Sottile <gianluca.sottile@unipa.it>
See Also
islasso.path
, islasso.path.fit
, coef.islasso.path
, residuals.islasso.path
, GoF.islasso.path
, logLik.islasso.path
, fitted.islasso.path
, predict.islasso.path
and deviance.islasso.path
methods.
Examples
## Not run:
#continues example from ?islasso.path
summary(o, pval = .1, lambda = 5) #print just the "borderline" significant coefficients
## End(Not run)