Title: | Focused Information Criteria for Model Comparison |
Version: | 1.0.1 |
Date: | 2025-03-24 |
Description: | Compares how well different models estimate a quantity of interest (the "focus") so that different models may be preferred for different purposes. Comparisons within any class of models fitted by maximum likelihood are supported, with shortcuts for commonly-used classes such as generalised linear models and parametric survival models. The methods originate from Claeskens and Hjort (2003) <doi:10.1198/016214503000000819> and Claeskens and Hjort (2008, ISBN:9780521852258). |
Maintainer: | Christopher Jackson <chris.jackson@mrc-bsu.cam.ac.uk> |
Depends: | R (≥ 2.10) |
Imports: | stats, numDeriv, mvtnorm, ggplot2, scales, survival, tensor, abind, mgcv |
URL: | https://github.com/chjackson/fic |
BugReports: | https://github.com/chjackson/fic/issues |
License: | GPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Suggests: | knitr, rmarkdown, testthat, msm(≥ 1.6.6), flexsurv, sn, gapminder, GGally |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2025-03-27 16:44:41 UTC; Chris J |
Author: | Christopher Jackson [cre, aut] (package design and programming), Gerda Claeskens [aut] (method development and design advice), Howard Thom [ctb] |
Repository: | CRAN |
Date/Publication: | 2025-03-27 17:20:02 UTC |
Focused Information Criteria for Model Comparison
Description
For a full explanation of the methods behind the fic package and worked examples of using it, see the main package vignette "Different models for different purposes: focused model comparison in R"
Author(s)
Maintainer: Christopher Jackson chris.jackson@mrc-bsu.cam.ac.uk (package design and programming)
Authors:
Gerda Claeskens gerda.claeskens@kuleuven.be (method development and design advice)
Other contributors:
Howard Thom howard.thom@bristol.ac.uk [contributor]
See Also
Useful links:
Form indicator matrix describing all submodels of a general linear wide model
Description
Form indicator matrix describing all possible submodels of a general linear wide model, where the submodels are defined by selected covariates.
Usage
## Default S3 method:
all_inds(wide, inds0 = NULL, auxpars = NULL, ...)
all_inds(wide, inds0, ...)
## S3 method for class 'lm'
all_inds(wide, inds0, ...)
## S3 method for class 'glm'
all_inds(wide, inds0, ...)
## S3 method for class 'coxph'
all_inds(wide, inds0, ...)
Arguments
wide |
A fitted model of standard R format, such that |
inds0 |
Narrow model indicators, in format described in |
auxpars |
Names of parameters in the wide model other than the covariate effects being selected from. By default, for linear and generalised linear models this is |
... |
Other arguments. Currently unused. |
Value
A matrix in the format required by the inds
argument of fic()
, representing all possible submodels of the wide model.
The number of rows is the number of models, and the number of columns is the number of parameters in the wide model. The r,s
entry of the matrix is a 1 if the r
th submodel includes parameter s
, and 0 otherwise.
If a factor is included (excluded) from the submodel, then all corresponding parameters are included (excluded).
Examples
bwt.glm <- glm(low ~ lwtkg + age + smoke, data=birthwt, family="binomial")
all_inds(bwt.glm, inds0=c(1,0,0,0))
# note no intercept term in Cox models, so inds0 has two elements here
library(survival)
wide <- coxph(Surv(years, death==1) ~ sex + thick_centred, data=melanoma)
all_inds(wide, inds0=c(0,0))
Risk factors associated with low infant birth weight
Description
Risk factors associated with low infant birth weight
Usage
birthwt
Format
A data frame with 189 rows and 19 variables. The first 10 columns are included in the dataset of the same name in the MASS package. The remaining 9 columns are defined in Claeskens and Hjort (2008), and are included in this dataset for convenience.
- low
indicator of birth weight less than 2.5 kg
- age
mother's age in years
- lwt
mother's weight in pounds at last menstrual period
- race
mother's race (‘1’ = white, ‘2’ = black, ‘3’ = other)
- smoke
smoking status during pregnancy
- ptl
number of previous premature labours
- ht
history of hypertension
- ui
presence of uterine irritability
- ftv
number of physician visits during the first trimester
- bwt
birth weight in grams
- smokeui
Binary indicator for both smoking and uterine irritation
- smokeage
Interaction between age and binary (0/1) smoking status, that is,
age
for smokers and zero for non-smokers.- intercpt
Intercept term (all 1)
- raceother
Binary indicator for
race
"other"- raceblack
Binary indicator for
race
"black"- ftv2p
Binary indicator for
ftv
, number of physician visits during the first trimester, 2 or more- ftv1
Binary indicator for
ftv
1- ptd
Binary indicator for
ptl
, number of previous premature labours, 1 or more- lwtkg
Weight measured in kg, as used in Claeskens and Hjort. Note
lwt
, as used in MASS, is in pounds.
Source
MASS package (Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer); originally from Hosmer, D.W. and Lemeshow, S. (1989) Applied Logistic Regression. New York: Wiley
References
Claeskens, G., & Hjort, N. L. (2008). Model selection and model averaging (Vol. 330). Cambridge: Cambridge University Press.
Form 'fic' model indicator argument in presence of factors
Description
Given a model indicator inds
identifying terms in a regression
model, convert this to the format needed for fic
by converting
indicators for regression terms to indicators for inclusion of
parameters. Only required if there are factors.
Usage
expand_inds(inds, wide)
Arguments
inds |
Matrix or vector of indicators for which parameters are included in the submodel or submodels to be assessed. A matrix should be supplied if there are multiple submodels. This should have number of rows equal to the number of submodels, and number of columns equal to the total number of parameters in the wide model. It contains 1s in the positions where the parameter is included in the submodel, and 0s in positions where the parameter is excluded. This should always be 1 in the positions defining the narrow model, as specified in |
wide |
Fitted model object containing the wide model. |
Details
If a regression term is a factor, then the 0 or 1 indicating its inclusion/exclusion is replicated to a length given by the number of factor levels minus 1, that is, the number of parameters pertaining to that factor.
This function only works for classes of models for which the
model.matrix
function is understood and returns objects
with an "assign"
attribute. This includes all the
commonly-used models in base R.
Examples
# Five terms in this model: intercept and four covariates,
# but the covariate "ftv" is a factor with 3 levels,
# so there are six parameters
bwt.glm <- glm(low ~ lwtkg + age + smoke + ftv, data=birthwt, family="binomial")
## Convert indicator for terms to indicator for parameters
inds <- rbind(c(1,1,1,0,0),
c(1,1,1,1,1))
expand_inds(inds, bwt.glm)
Focused information criteria for Cox proportional hazard regression models
Description
Focused model comparison for Cox models fitted with coxph
from the survival
package. Built-in focuses include the hazard ratio, survival and cumulative hazard.
Usage
## S3 method for class 'coxph'
fic(
wide,
inds,
inds0 = NULL,
gamma0 = 0,
focus,
X = NULL,
t = NULL,
sub = "auto",
tidy = TRUE,
...
)
Arguments
wide |
Fitted model object containing the wide model. |
inds |
Matrix or vector of indicators for which parameters are included in the submodel or submodels to be assessed. A matrix should be supplied if there are multiple submodels. This should have number of rows equal to the number of submodels, and number of columns equal to the total number of parameters in the wide model. It contains 1s in the positions where the parameter is included in the submodel, and 0s in positions where the parameter is excluded. This should always be 1 in the positions defining the narrow model, as specified in |
inds0 |
Vector of indicators specifying the narrow model, in the same format as |
gamma0 |
Vector of special values taken by the parameters This defaults to 0, as in covariate selection, where "excluded" coefficients are fixed to 0. This should either be a scalar, assumed to be the same for all parameters fixed in the narrow model, or a vector of length equal to the number of parameters from the wide model which are fixed in the narrow model, that is, the number of entries of |
focus |
Three built-in focus quantities are supported:
Alternatively, a list of three R functions can be supplied, with components named |
X |
Covariate values to evaluate the focus at. See |
t |
times to evaluate the focus at. Only relevant for survival and cumulative hazard focuses, as the hazard ratio is constant through time. |
sub |
If |
tidy |
If |
... |
Other arguments to the focus function can be supplied here. The built-in focus functions If just one focus is needed, then To compute focused model comparison statistics for multiple focuses defined by the same focus function evaluated at multiple covariate values, For a typical regression model, the first parameter will denote an intercept, so the first value of Arguments to the focus function other than |
Details
Stratified Cox models are not currently supported.
User-defined focuses
Each function should have four arguments:
par
Vector of estimated coefficients, the log hazard ratios in the Cox model.
H0
Cumulative hazard estimate at a set of times, in the form of the output from basehaz
. The function get_H0
can be used on this estimate to obtain the estimate at any other times by interpolation.
X
Matrix of covariates, with ncov
rows and npar
columns, where ncov
is the number of alternative covariate values definining alternative focuses we want to compare models for, and npar
is the number of coefficients in the model.
t
Vector of times defining alternative focus quantities (such as the survival)
For examples, examine the source for the built-in functions
fic:::cox_hr,fic:::cox_hr_deriv,fic:::cox_hr_dH
for the hazard ratio between X
and 0
fic:::cox_cumhaz,fic:::cox_cumhaz_deriv,fic:::cox_cumhaz_dH
for the cumulative hazard
fic:::cox_survival,fic:::cox_survival_deriv,fic:::cox_survival_dH
for the survival.
Examples
## Example of focused covariate selection in a Cox model
## For more information see the main package vignette.
library(survival)
wide <- coxph(Surv(years, death==1) ~ sex + thick_centred + infilt +
epith + ulcer + depth + age, data=melanoma)
## Define models to be selected from
## Sex included in all models
## Select between models including all combinations of other covariates
inds0 <- expand_inds(c(1,0,0,0,0,0,0), wide)
combs <- all_inds(wide, inds0)
## Covariate values defining focus
newdata <- with(melanoma,
data.frame(sex = c("female","male"),
thick_centred = tapply(thick_centred, sex, mean),
infilt=4, epith=1, ulcer=1, depth=2,
age = tapply(age, sex, mean)))
X <- newdata_to_X(newdata, wide, intercept=FALSE)
## Focus is 5-year survival for these covariate values
ficall <- fic(wide, inds=combs, inds0=inds0, focus="survival", X=X, t=5)
ggplot_fic(ficall)
summary(ficall)
Focused information criteria: main user interface
Description
Focused information criteria for general models. These methods estimate the bias and variance of estimates of a quantity of interest (the "focus") when smaller submodels are used in place of a "wide" model that is assumed to generate the data but may not give precise enough estimates.
Usage
## Default S3 method:
fic(
wide,
inds,
inds0 = NULL,
gamma0 = 0,
focus = NULL,
focus_deriv = NULL,
wt = NULL,
sub = NULL,
fns = NULL,
FIC = FALSE,
B = 0,
loss = loss_mse,
tidy = TRUE,
...
)
fic(wide, ...)
Arguments
wide |
Fitted model object containing the wide model. |
inds |
Matrix or vector of indicators for which parameters are included in the submodel or submodels to be assessed. A matrix should be supplied if there are multiple submodels. This should have number of rows equal to the number of submodels, and number of columns equal to the total number of parameters in the wide model. It contains 1s in the positions where the parameter is included in the submodel, and 0s in positions where the parameter is excluded. This should always be 1 in the positions defining the narrow model, as specified in |
inds0 |
Vector of indicators specifying the narrow model, in the same format as |
gamma0 |
Vector of special values taken by the parameters This defaults to 0, as in covariate selection, where "excluded" coefficients are fixed to 0. This should either be a scalar, assumed to be the same for all parameters fixed in the narrow model, or a vector of length equal to the number of parameters from the wide model which are fixed in the narrow model, that is, the number of entries of |
focus |
An R function with:
The function should return the focus quantity of interest. If additional arguments are supplied which are vectors or matrices, e.g. Not required if Alternatively,
See |
focus_deriv |
Vector of partial derivatives of the focus function with respect to the parameters in the wide model. This is not usually needed, as it can generally be computed automatically and accurately from the function supplied in |
wt |
Vector of weights to apply to different covariate values in |
sub |
List of fitted model objects corresponding to each submodel to be assessed. For some classes of models with built in methods for Otherwise, this argument can be omitted, but it is required if you want the estimate of the focus function under each submodel to be included in the results, which is usually the case. |
fns |
Named list of functions to extract the quantities from the fitted model object that are required to calculate the focused model comparison statistics. By default this is
Suppose the fitted model object is called
If one or more of these functions does not work for
If less than three components are specified in |
FIC |
If |
B |
If |
loss |
A function returning an estimated loss for a submodel estimate under the sampling distribution of the wide model. Only applicable when using bootstrapping. This should have two arguments |
tidy |
If |
... |
Other arguments to the focus function can be supplied here. The built-in focus functions If just one focus is needed, then To compute focused model comparison statistics for multiple focuses defined by the same focus function evaluated at multiple covariate values, For a typical regression model, the first parameter will denote an intercept, so the first value of Arguments to the focus function other than |
Value
The returned data frame or array contains the following components, describing characteristics of the defined submodel. See the package vignette for full, formal definitions, and Chapter 6 of Claeskens and Hjort, 2008.
rmse |
The root mean square error of the estimate of the focus quantity. Defined as the square root of (squared unadjusted bias plus variance). This is an asymptotically unbiased estimator, but may occasionally be indeterminate if the estimate of the squared bias plus variance is negative. |
rmse.adj |
The root mean square error, based on a bias estimator which is adjusted to avoid negative squared bias. Defined on page 157 of Claeskens and Hjort as the sum of the variance and the squared adjusted bias. |
bias |
The estimated bias of the focus quantity, adjusted to avoid negative squared bias. This is defined as the square root of the quantity |
se |
The estimated standard error (root variance) of the focus quantity. Defined on page 157. |
FIC |
The focused information criterion (equation 6.1 from Claeskens and Hjort), if |
The object returned by fic
also has the following attributes, which can be extracted with the attr
function.
iwide |
Index of the wide model in the vector of submodels, or |
inarr |
Index of the narrow model in the vector of submodels, or |
sub |
List of fitted submodel objects. |
parnames |
Vector of names of parameters in the wide model. |
inds |
Submodel indicators, as supplied in the |
References
Claeskens, G., & Hjort, N. L. (2008). Model selection and model averaging (Vol. 330). Cambridge: Cambridge University Press.
Claeskens, G., & Hjort, N. L. (2003). The focused information criterion. Journal of the American Statistical Association, 98(464), 900-916.
Examples
wide.glm <- glm(low ~ lwtkg + age + smoke + ht + ui + smokeage + smokeui,
data=birthwt, family=binomial)
inds <- rbind(
narrow = c(1,1,0,0,0,0,0,0),
mod1 = c(1,1,1,1,0,0,0,0),
wide = c(1,1,1,1,1,1,1,1)
)
vals.smoke <- c(1, 58.24, 22.95, 1, 0, 0, 22.95, 0)
vals.nonsmoke <- c(1, 59.50, 23.43, 0, 0, 0, 0, 0)
X <- rbind("Smokers"=vals.smoke, "Non-smokers"=vals.nonsmoke)
fic(wide=wide.glm, inds=inds, focus="prob_logistic", X=X)
focus <- function(par, X)plogis(X %*% par)
fic(wide=wide.glm, inds=inds, focus=focus, X=X) # equivalent
Focused information criteria for flexible parametric survival models
Description
Focused information criteria for parametric survival models fitted with the flexsurv package.
Usage
## S3 method for class 'flexsurvreg'
fic(
wide,
inds,
inds0 = NULL,
gamma0 = 0,
focus = NULL,
focus_deriv = NULL,
wt = NULL,
sub = NULL,
B = 0,
loss = loss_mse,
...
)
Arguments
wide |
Object of class |
inds |
Matrix or vector of indicators for which parameters are included in the submodel or submodels to be assessed. A matrix should be supplied if there are multiple submodels. This should have number of rows equal to the number of submodels, and number of columns equal to the total number of parameters in the wide model. It contains 1s in the positions where the parameter is included in the submodel, and 0s in positions where the parameter is excluded. This should always be 1 in the positions defining the narrow model, as specified in |
inds0 |
Vector of indicators specifying the narrow model, in the same format as |
gamma0 |
Vector of special values taken by the parameters This defaults to 0, as in covariate selection, where "excluded" coefficients are fixed to 0. This should either be a scalar, assumed to be the same for all parameters fixed in the narrow model, or a vector of length equal to the number of parameters from the wide model which are fixed in the narrow model, that is, the number of entries of |
focus |
An R function with:
The function should return the focus quantity of interest. If additional arguments are supplied which are vectors or matrices, e.g. Not required if Alternatively,
See |
focus_deriv |
Vector of partial derivatives of the focus function with respect to the parameters in the wide model. This is not usually needed, as it can generally be computed automatically and accurately from the function supplied in |
wt |
Vector of weights to apply to different covariate values in |
sub |
List of objects of class |
B |
If |
loss |
A function returning an estimated loss for a submodel estimate under the sampling distribution of the wide model. Only applicable when using bootstrapping. This should have two arguments |
... |
Other arguments to the focus function can be supplied here. The built-in focus functions If just one focus is needed, then To compute focused model comparison statistics for multiple focuses defined by the same focus function evaluated at multiple covariate values, For a typical regression model, the first parameter will denote an intercept, so the first value of Arguments to the focus function other than |
Details
Any situation where all models being compared are special cases of a single "wide" model are supported. Examples include covariate selection, selection between models for the baseline hazard/survival with different levels of flexibility (e.g. comparing exponential, Weibull and generalized gamma). Some of these are illustrated in the fic package vignette "Examples of focused model comparison: parametric survival models".
The choice between flexsurvspline
models with different numbers of knots is not supported, unless perhaps if the knot locations are defined manually so that models are nested within each other, but this has not been investigated.
Examples
## Simulated example from the "fic" package vignette on
## parametric survival modelling.
## See this vignette for more details and more examples.
set.seed(1)
if (requireNamespace("flexsurv", quietly=TRUE)){
## Simulate from an exponential
y <- rexp(50); cen <- rep(1,50)
## Fit wide generalized gamma, and compare
## exponential, weibull and generalized gamma models
indmat <- rbind(exp = c(1,0,0),
weib = c(1,1,0),
ggamma = c(1,1,1))
gge <- flexsurv::flexsurvreg(survival::Surv(y, cen) ~ 1, dist="gengamma")
## Focus is restricted mean survival over 8 time units
focus <- function(par){
flexsurv::rmst_gengamma(8, par[1], exp(par[2]), par[3])
}
## Weibull model actually has lowest FIC and RMSE even though it's
## not true: extra variability is deemed worth alleviating the
## risk of bias.
fic(gge, inds=indmat, gamma0=c(0,1), focus=focus)
}
Focused information criteria for generalized linear models
Description
Focused information criteria for generalized linear models fitted with glm
.
Used to compare models with different covariates (more generally,
different linear terms).
Usage
## S3 method for class 'glm'
fic(
wide,
inds,
inds0 = NULL,
gamma0 = 0,
focus = NULL,
focus_deriv = NULL,
wt = NULL,
sub = "auto",
B = 0,
loss = loss_mse,
...
)
Arguments
wide |
Fitted model object containing the wide model. |
inds |
Matrix or vector of indicators for which parameters are included in the submodel or submodels to be assessed. A matrix should be supplied if there are multiple submodels. This should have number of rows equal to the number of submodels, and number of columns equal to the total number of parameters in the wide model. It contains 1s in the positions where the parameter is included in the submodel, and 0s in positions where the parameter is excluded. This should always be 1 in the positions defining the narrow model, as specified in |
inds0 |
Vector of indicators specifying the narrow model, in the same format as |
gamma0 |
Vector of special values taken by the parameters This defaults to 0, as in covariate selection, where "excluded" coefficients are fixed to 0. This should either be a scalar, assumed to be the same for all parameters fixed in the narrow model, or a vector of length equal to the number of parameters from the wide model which are fixed in the narrow model, that is, the number of entries of |
focus |
An R function with:
The function should return the focus quantity of interest. If additional arguments are supplied which are vectors or matrices, e.g. Not required if Alternatively,
See |
focus_deriv |
Vector of partial derivatives of the focus function with respect to the parameters in the wide model. This is not usually needed, as it can generally be computed automatically and accurately from the function supplied in |
wt |
Vector of weights to apply to different covariate values in |
sub |
If |
B |
If |
loss |
A function returning an estimated loss for a submodel estimate under the sampling distribution of the wide model. Only applicable when using bootstrapping. This should have two arguments |
... |
Other arguments to the focus function can be supplied here. The built-in focus functions If just one focus is needed, then To compute focused model comparison statistics for multiple focuses defined by the same focus function evaluated at multiple covariate values, For a typical regression model, the first parameter will denote an intercept, so the first value of Arguments to the focus function other than |
Details
The model parameters include the intercept, followed by the coefficients of any covariates. Any "dispersion" parameter is excluded from the parameters indicated by inds
and inds0
.
Only covariate selection problems are supported in this function.
To compare between models with a fixed and unknown dispersion,
glm
would have to be replaced by maximum likelihood
estimation routines written by hand, along the lines described in
the "skew-normal models" vignette.
The focus function can however depend on the value of the
dispersion parameter. The focus function should then have
an argument called dispersion
. See the example of a gamma
GLM below, where the focus is the mean outcome for some covariate
value.
Examples of covariate selection in logistic regression are given in the main package vignette.
Examples
# Gamma regression with one binary covariate
# Simulated data
set.seed(1)
simx <- rbinom(1000, 1, 0.5)
mean0 <- 1.1
simy <- rgamma(1000, shape=2, scale=exp(log(mean0) + 0.2*simx))
mod <- glm(simy ~ simx, family=Gamma(link="log"))
# Check the parameter estimates are close to true
# values used for simulation
(shape <- 1 / summary(mod)$dispersion)
coef(mod)[2] # log mean ratio associated with covariate. true value 0.2
exp(coef(mod)[1] - log(shape)) # mean with x=0, true value 1.1
exp(coef(mod)[1] + coef(mod)[2] - log(shape)) # mean with x=1
focus_mean <- function(par, X, dispersion){
exp(X %*% par - log(1/dispersion))
}
X <- rbind("x0" = c(1,0), "x1" = c(1,1))
inds <- rbind("no_covariate"=c(1,0), "covariate"=c(1,1))
fic(mod, inds=inds, focus=focus_mean, X=X)
# The focus need not depend on X or the dispersion
focus_base_scale <- function(par, dispersion){
exp(par[1])
}
fic(mod, inds=inds, focus=focus_base_scale)
# ...equivalently,
focus_base_scale <- function(par){
exp(par[1])
}
fic(mod, inds=inds, focus=focus_base_scale)
Focused information criteria for linear models
Description
Focused information criteria for linear models fitted with lm
.
Typically used to compare models with different covariates (more generally,
different linear terms).
Usage
## S3 method for class 'lm'
fic(
wide,
inds,
inds0 = NULL,
gamma0 = 0,
focus = NULL,
focus_deriv = NULL,
wt = NULL,
sub = "auto",
B = 0,
loss = loss_mse,
...
)
Arguments
wide |
Fitted model object containing the wide model. |
inds |
Matrix or vector of indicators for which parameters are included in the submodel or submodels to be assessed. A matrix should be supplied if there are multiple submodels. This should have number of rows equal to the number of submodels, and number of columns equal to the total number of parameters in the wide model. It contains 1s in the positions where the parameter is included in the submodel, and 0s in positions where the parameter is excluded. This should always be 1 in the positions defining the narrow model, as specified in |
inds0 |
Vector of indicators specifying the narrow model, in the same format as |
gamma0 |
Vector of special values taken by the parameters This defaults to 0, as in covariate selection, where "excluded" coefficients are fixed to 0. This should either be a scalar, assumed to be the same for all parameters fixed in the narrow model, or a vector of length equal to the number of parameters from the wide model which are fixed in the narrow model, that is, the number of entries of |
focus |
An R function with:
The function should return the focus quantity of interest. If additional arguments are supplied which are vectors or matrices, e.g. Not required if Alternatively,
See |
focus_deriv |
Vector of partial derivatives of the focus function with respect to the parameters in the wide model. This is not usually needed, as it can generally be computed automatically and accurately from the function supplied in |
wt |
Vector of weights to apply to different covariate values in |
sub |
If The model parameters include the intercept, followed by the coefficients of any covariates. The standard deviation is excluded. Only covariate selection problems are supported in this function. To compare between models with a fixed and unknown standard deviation, hand-written maximum likelihood estimation routines would be needed, along the lines described in the "skew-normal models" vignette. The focus can depend on the standard deviation. The focus function should then have an argument See the vignette "Using the fic R package for focused model comparison: linear regression" for some examples. |
B |
If |
loss |
A function returning an estimated loss for a submodel estimate under the sampling distribution of the wide model. Only applicable when using bootstrapping. This should have two arguments |
... |
Other arguments to the focus function can be supplied here. The built-in focus functions If just one focus is needed, then To compute focused model comparison statistics for multiple focuses defined by the same focus function evaluated at multiple covariate values, For a typical regression model, the first parameter will denote an intercept, so the first value of Arguments to the focus function other than |
Examples
## Covariate selection in Motor Trend cars data
## See the "fic" package vignette on linear models for more details
wide.lm <- lm(mpg ~ am + wt + qsec + disp + hp, data=mtcars)
## Select between all submodels
ncovs_wide <- length(coef(wide.lm)) - 1
inds0 <- c(1, rep(0, ncovs_wide))
inds <- all_inds(wide.lm, inds0)
## Two focuses: mean MPG for automatic and manual transmission,
## given mean values of the other covariates
cmeans <- colMeans(model.frame(wide.lm)[,c("wt","qsec","disp","hp")])
X <- rbind(
"auto" = c(intercept=1, am=0, cmeans),
"manual" = c(intercept=1, am=1, cmeans)
)
ficres <- fic(wide.lm, inds=inds, focus=mean_normal, X=X)
summary(ficres)
ggplot_fic(ficres)
Focused information criteria for multi-state models for panel data
Description
Focused information criteria for multi-state models fitted with msm
from the msm package.
Usage
## S3 method for class 'msm'
fic(
wide,
inds,
inds0 = NULL,
gamma0 = 0,
focus = NULL,
focus_deriv = NULL,
wt = NULL,
sub = NULL,
B = 0,
loss = loss_mse,
...
)
Arguments
wide |
Object returned by |
inds |
Matrix or vector of indicators for which parameters are included in the submodel or submodels to be assessed. A matrix should be supplied if there are multiple submodels. This should have number of rows equal to the number of submodels, and number of columns equal to the total number of parameters in the wide model. It contains 1s in the positions where the parameter is included in the submodel, and 0s in positions where the parameter is excluded. This should always be 1 in the positions defining the narrow model, as specified in |
inds0 |
Vector of indicators specifying the narrow model, in the same format as |
gamma0 |
Vector of special values taken by the parameters This defaults to 0, as in covariate selection, where "excluded" coefficients are fixed to 0. This should either be a scalar, assumed to be the same for all parameters fixed in the narrow model, or a vector of length equal to the number of parameters from the wide model which are fixed in the narrow model, that is, the number of entries of |
focus |
An R function with:
The function should return the focus quantity of interest. If additional arguments are supplied which are vectors or matrices, e.g. Not required if Alternatively,
See |
focus_deriv |
Vector of partial derivatives of the focus function with respect to the parameters in the wide model. This is not usually needed, as it can generally be computed automatically and accurately from the function supplied in |
wt |
Vector of weights to apply to different covariate values in |
sub |
List of objects returned by |
B |
If |
loss |
A function returning an estimated loss for a submodel estimate under the sampling distribution of the wide model. Only applicable when using bootstrapping. This should have two arguments |
... |
Other arguments to the focus function can be supplied here. The built-in focus functions If just one focus is needed, then To compute focused model comparison statistics for multiple focuses defined by the same focus function evaluated at multiple covariate values, For a typical regression model, the first parameter will denote an intercept, so the first value of Arguments to the focus function other than |
Details
This might be used for covariate selection, or comparing models with different constraints on the covariate effects or intensities. An example is given in the fic package vignette "Examples of focused model comparison: multi-state models". Note in particular in this example how the parameters are ordered in the inds
argument, and how the various msm output functions can be used as focuses.
Examples
## Covariate selection in psoriatic arthritis model.
## See the "fic" package vignette on multi-state models for
## more details and examples.
if (requireNamespace("msm",quietly=TRUE)){
Qind <- rbind(c(0, 1, 0, 0),
c(0, 0, 1, 0),
c(0, 0, 0, 1),
c(0, 0, 0, 0))
psor.wide.msm <- msm::msm(state ~ months, subject=ptnum, data=msm::psor,
qmatrix = Qind, gen.inits=TRUE,
covariates = ~ollwsdrt+hieffusn)
inds <- rbind(
c(1,1,1,0,0,0,0,0,0),
c(1,1,1,0,0,0,0,0,1),
c(1,1,1,0,0,0,0,1,1),
c(1,1,1,0,0,0,1,1,1),
c(1,1,1,0,0,1,1,1,1),
c(1,1,1,0,1,1,1,1,1),
c(1,1,1,1,1,1,1,1,1)
)
focus_tlos <- function(par){
x.new <- msm::updatepars.msm(psor.wide.msm, par)
msm::totlos.msm(x.new, covariates=0, tot=10)["State 4"]
}
fres <- fic(wide=psor.wide.msm, inds=inds, focus=focus_tlos)
fres
}
Focused information criteria for parametric survival models
Description
Focused information criteria for parametric survival models fitted with the survival package.
Usage
## S3 method for class 'survreg'
fic(
wide,
inds,
inds0 = NULL,
gamma0 = 0,
focus = NULL,
focus_deriv = NULL,
wt = NULL,
sub = NULL,
B = 0,
loss = loss_mse,
...
)
Arguments
wide |
Object returned by the |
inds |
Matrix or vector of indicators for which parameters are included in the submodel or submodels to be assessed. A matrix should be supplied if there are multiple submodels. This should have number of rows equal to the number of submodels, and number of columns equal to the total number of parameters in the wide model. It contains 1s in the positions where the parameter is included in the submodel, and 0s in positions where the parameter is excluded. This should always be 1 in the positions defining the narrow model, as specified in |
inds0 |
Vector of indicators specifying the narrow model, in the same format as |
gamma0 |
Vector of special values taken by the parameters This defaults to 0, as in covariate selection, where "excluded" coefficients are fixed to 0. This should either be a scalar, assumed to be the same for all parameters fixed in the narrow model, or a vector of length equal to the number of parameters from the wide model which are fixed in the narrow model, that is, the number of entries of |
focus |
An R function with:
The function should return the focus quantity of interest. If additional arguments are supplied which are vectors or matrices, e.g. Not required if Alternatively,
See |
focus_deriv |
Vector of partial derivatives of the focus function with respect to the parameters in the wide model. This is not usually needed, as it can generally be computed automatically and accurately from the function supplied in |
wt |
Vector of weights to apply to different covariate values in |
sub |
List of fitted model objects of class |
B |
If |
loss |
A function returning an estimated loss for a submodel estimate under the sampling distribution of the wide model. Only applicable when using bootstrapping. This should have two arguments |
... |
Other arguments to the focus function can be supplied here. The built-in focus functions If just one focus is needed, then To compute focused model comparison statistics for multiple focuses defined by the same focus function evaluated at multiple covariate values, For a typical regression model, the first parameter will denote an intercept, so the first value of Arguments to the focus function other than |
Details
Any situation where all models being compared are special cases of a single "wide" model are supported. Examples include covariate selection, selection between models for the baseline hazard/survival with different levels of flexibility (e.g. comparing exponential and Weibull). An example of the latter is in the fic package vignette "Examples of focused model comparison: parametric survival models".
Parameters par
of the focus function should be on the scale reported by the icoef
component of the results of survreg
, that is, with any positive-valued parameters log transformed.
Examples
library(survival)
## Fit exponential and Weibull models and plot fitted survival curves
ex <- survreg(Surv(futime, fustat) ~ 1, data=ovarian, dist="exponential")
we <- survreg(Surv(futime, fustat) ~ 1, data=ovarian, dist="weibull")
## Plot fitted survival curves, highlighting 1 year survival
plot(survfit(Surv(futime, fustat) ~ 1, data=ovarian))
t <- seq(0, 1200)
lines(t, pweibull(q=t, shape=exp(we$icoef[2]),
scale=exp(we$icoef[1]), lower.tail=FALSE))
lines(t, pexp(q=t, rate=1/exp(ex$icoef[1]), lower.tail=FALSE), lty=2)
abline(v=365, col="gray")
## Focused model comparison for focus of 1-year survival probability
indmat <- rbind(exp = c(1,0),
weib = c(1,1))
surv1yr <- function(par){
pweibull(q=365, shape=exp(par[2]), scale=exp(par[1]), lower.tail=FALSE)
}
fic(we, inds=indmat, focus=surv1yr, sub=list(ex, we))
## Exponential model has lower expected error, given such a small dataset
Focused information criteria: core calculation functions
Description
Core FIC calculation functions underlying the user interface in fic
.
fic_core
just handles one submodel, while fic_multi
can assess multiple submodels of the same wide model. For fic_multi
, inds
and parsub
can be matrices with one row per submodel, while for fic_core
they must be vectors.
Usage
fic_core(par, J, inds, inds0, gamma0 = 0, n, focus_deriv = NULL)
fic_multi(
par,
J,
inds,
inds0,
gamma0 = 0,
n,
focus = NULL,
focus_deriv = NULL,
wt = NULL,
parsub = NULL,
auxpar = NULL,
auxsub = NULL,
...
)
Arguments
par |
Vector of maximum likelihood estimates from the wide model |
J |
Information matrix from the wide model, evaluated at the maximum likelihood estimates (note that this definition differs from Claeskens and Hjort, where |
inds |
Matrix or vector of indicators for which parameters are included in the submodel or submodels to be assessed. A matrix should be supplied if there are multiple submodels. This should have number of rows equal to the number of submodels, and number of columns equal to the total number of parameters in the wide model. It contains 1s in the positions where the parameter is included in the submodel, and 0s in positions where the parameter is excluded. This should always be 1 in the positions defining the narrow model, as specified in |
inds0 |
Vector of indicators specifying the narrow model, in the same format as |
gamma0 |
Vector of special values taken by the parameters This defaults to 0, as in covariate selection, where "excluded" coefficients are fixed to 0. This should either be a scalar, assumed to be the same for all parameters fixed in the narrow model, or a vector of length equal to the number of parameters from the wide model which are fixed in the narrow model, that is, the number of entries of |
n |
Number of observations in the data used to fit the wide model. |
focus_deriv |
Vector of partial derivatives of the focus function with respect to the parameters in the wide model. This is required by If there are multiple submodels, this should be a matrix with number of rows equal to the number of submodels, and number of columns equal to the number of parameters in the wide model. If there is a single submodel, this should be a vector with number of columns equal to the number of parameters in the wide model. This should take the value given by |
focus |
An R function with:
The function should return the focus quantity of interest. If additional arguments are supplied which are vectors or matrices, e.g. Not required if Alternatively,
See |
wt |
Vector of weights to apply to different covariate values in |
parsub |
Vector of maximum likelihood estimates from the submodel, or a matrix if there are multiple submodels. Only required to return the estimate of the focus quantity alongside the model assessment statistics for the submodel. If omitted, the estimate is omitted. |
auxpar |
Estimates of auxiliary parameters from the wide model. The only built-in example is the dispersion parameter for GLMs. |
auxsub |
List of estimates of auxiliary parameters from the submodel. The only built-in example is the dispersion parameter for GLMs. |
... |
Other arguments to the focus function can be supplied here. The built-in focus functions If just one focus is needed, then To compute focused model comparison statistics for multiple focuses defined by the same focus function evaluated at multiple covariate values, For a typical regression model, the first parameter will denote an intercept, so the first value of Arguments to the focus function other than |
Value
See fic
See Also
Examples
## Lower-level implementation of the example in the main vignette
wide.glm <- glm(low ~ lwtkg + age + smoke + ht + ui + smokeage + smokeui,
data=birthwt, family=binomial)
mod1.glm <- glm(low ~ lwtkg + age + smoke, data=birthwt, family=binomial)
inds0 <- c(1,1,0,0,0,0,0,0)
inds1 <- c(1,1,1,1,0,0,0,0)
focus_plogis <- function(par, X)plogis(X %*% par)
vals.smoke <- c(1, 58.24, 22.95, 1, 0, 0, 22.95, 0)
vals.nonsmoke <- c(1, 59.50, 23.43, 0, 0, 0, 0, 0)
X <- rbind(vals.smoke, vals.nonsmoke)
par <- coef(wide.glm)
n <- nrow(birthwt)
J <- solve(vcov(wide.glm))
fic_multi(par=par, J=J, inds=inds1, inds0=inds0, n=n, focus="prob_logistic",
X=X, parsub=c(coef(mod1.glm), 0, 0, 0, 0))
## Even lower-level implementation, requiring derivatives of the focus
## These are available analytically in this example, but would normally
## need to be calculated using numerical differentiation
focus_deriv <- prob_logistic_deriv(par=par, X=X)
fic_core(par=par, J=J, inds=inds1, inds0=inds0, gamma0=0, n=n,
focus_deriv=focus_deriv)
Fit submodels of a general linear wide model, defined by a matrix of indicators for inclusion of covariates
Description
Fit the submodels of a wide model wide
which are defined by
inds
. This can only be used for covariate selection
problems, where the submodels contain different subsets of
covariates.
Usage
fit_submodels(wide, inds, ...)
Arguments
wide |
Fitted model object containing the wide model. |
inds |
Matrix or vector of indicators for which parameters are included in the submodel or submodels to be assessed. A matrix should be supplied if there are multiple submodels. This should have number of rows equal to the number of submodels, and number of columns equal to the total number of parameters in the wide model. It contains 1s in the positions where the parameter is included in the submodel, and 0s in positions where the parameter is excluded. This should always be 1 in the positions defining the narrow model, as specified in |
... |
Other arguments to the focus function can be supplied here. The built-in focus functions If just one focus is needed, then To compute focused model comparison statistics for multiple focuses defined by the same focus function evaluated at multiple covariate values, For a typical regression model, the first parameter will denote an intercept, so the first value of Arguments to the focus function other than |
Details
Requires wide
to have a component named
call
giving the function call used to produce wide
.
This call should include a formula
component, which this
function updates in order to define and fit the submodel. This
should work for most standard linear-type models in common R packages.
Value
List of all fitted submodel objects.
Examples
bwt.glm <- glm(low ~ lwtkg + age + smoke,
data=birthwt, family="binomial")
inds <- rbind(c(1,1,1,0), c(1,1,0,0))
fit_submodels(bwt.glm, inds=inds)
Built-in focus functions and their derivatives
Description
Built-in focus functions and their derivatives
Usage
prob_logistic(par, X)
prob_logistic_deriv(par, X)
mean_normal(par, X)
mean_normal_deriv(par, X)
Arguments
par |
Vector of parameter estimates, including the intercept. |
X |
Vector or matrix of covariate values, including the intercept. This can either be a vector of length |
Value
prob_logistic
returns the probability of the outcome in a logistic regression model, and mean_normal
returns the mean outcome in a normal linear regression. The _deriv
functions return the vector of partial derivatives of the focus with respect to each parameter (or matrix, if there are multiple foci).
See Also
Examples
## Model and focus from the main vignette
wide.glm <- glm(low ~ lwtkg + age + smoke + ht + ui +
smokeage + smokeui, data=birthwt, family=binomial)
vals.smoke <- c(1, 58.24, 22.95, 1, 0, 0, 22.95, 0)
vals.nonsmoke <- c(1, 59.50, 23.43, 0, 0, 0, 0, 0)
X <- rbind("Smokers" = vals.smoke, "Non-smokers" = vals.nonsmoke)
prob_logistic(coef(wide.glm), X=X)
prob_logistic_deriv(coef(wide.glm), X=X)
## Mean mpg for a particular covariate category in the Motor Trend data
## See the "fic" linear models vignette for more detail
wide.lm <- lm(mpg ~ am + wt + qsec + disp + hp, data=mtcars)
cmeans <- colMeans(model.frame(wide.lm)[,c("wt","qsec","disp","hp")])
X <- rbind(
"auto" = c(intercept=1, am=0, cmeans),
"manual" = c(intercept=1, am=1, cmeans)
)
mean_normal(coef(wide.lm), X)
mean_normal_deriv(coef(wide.lm), X)
Interpolate cumulative hazard function from a fitted Cox model
Description
Returns the baseline cumulative hazard, at the requested times,
from a Cox model fitted by coxph
. Linear
interpolation is used, assuming the hazard is piecewise constant,
thus the cumulative hazard is piecewise linear.
Usage
get_H0(H0, t)
Arguments
H0 |
output from |
t |
vector of times for which cumulative hazard estimates are required. |
Details
This does not extrapolate. If t
is outside the observed event times, then NA
will be returned.
Value
Fitted cumulative hazard at t
.
Examples
library(survival)
wide <- coxph(Surv(years, death==1) ~ sex + thick_centred +
infilt + epith + ulcer + depth + age, data=melanoma)
basehaz(wide)
get_H0(basehaz(wide), c(0,1,5,10,100))
Plot focused model comparison statistics: ggplot2 method
Description
This only works if the focus estimates are available. The focus estimates are plotted against the root MSE. One plot is made for each covariate value defining different focuses. If the wide model estimate is available, this is illustrated as a solid line on the plot, and if the narrow model estimate is available, this is showm as a dashed line.
Usage
ggplot_fic(
x,
ci = TRUE,
adj = TRUE,
legend = TRUE,
ylab = NULL,
xlab = NULL,
xlim = NULL,
ylim = NULL
)
Arguments
x |
Output from |
ci |
Plot interval estimates? ( |
adj |
The optimal model is the one with the lowest root mean square error (RMSE). If |
legend |
Show a legend, identifying a) the line types for the wide and narrow models b) the names of the terms of the wide model. This is used when
the |
ylab |
y-axis label. |
xlab |
x-axis label. |
xlim |
x-axis limits (pair of numbers) |
ylim |
y-axis limits |
See Also
plot.fic, summary.fic
Examples
## Example from the main vignette, see there for more details
wide.glm <- glm(low ~ lwtkg + age + smoke + ht + ui + smokeage + smokeui,
data=birthwt, family=binomial)
vals.smoke <- c(1, 58.24, 22.95, 1, 0, 0, 22.95, 0)
vals.nonsmoke <- c(1, 59.50, 23.43, 0, 0, 0, 0, 0)
X <- rbind("Smokers" = vals.smoke, "Non-smokers" = vals.nonsmoke)
inds0 <- c(1,1,0,0,0,0,0,0)
combs <- all_inds(wide.glm, inds0)
ficres <- fic(wide = wide.glm, inds = combs, inds0 = inds0,
focus = prob_logistic, X = X)
ggplot_fic(ficres)
summary(ficres)
Malignant melanoma survival data
Description
Data originally analysed by Andersen et al (1993) on the survival of 205 patients in Denmark with malignant melanoma, and used by Claeskens and Hjort (2008) to illustrate focused model selection in Cox regression.
Usage
melanoma
Format
A data frame with 205 rows and the following columns:
- ptno
Patient identification number
- death
Survival status: 1 = dead from illness, 2 = censored, 4 = dead from other causes
- days
Survival time in days
- depth
Invasion depth: factor with levels 1, 2, 3
- infilt
Infection infiltration level, a measure of resistance to the tumour: factor with levels 1 (high resistance), 2, 3, 4 (low resistance)
- epith
Indicator for epithelioid cells present
- ulcer
Indicator for ulceration
- thick
Thickness of the tumour in 1/100 mm
- sex
Sex. Factor with levels
"female","male"
and reference level"female"
- age
Age in years
- years
Survival time in years (instead of days)
- thick_centred
Version of
thick
centred around its mean and rescaled, defined as (thick
- 292)/100.
Source
The supporting material from Claeskens and Hjort (2008), at https://feb.kuleuven.be/public/u0043181/modelselection/datasets/melanoma_data.txt. Versions of this dataset are also given in the MASS and boot packages.
References
Claeskens, G., & Hjort, N. L. (2008). Model selection and model averaging (Vol. 330). Cambridge: Cambridge University Press.
Andersen, P. K., Borgan, O., Gill, R. D., & Keiding, N. (2012). Statistical models based on counting processes. Springer.
Convert data frame of covariate values to a design matrix
Description
Convert data frame of covariate values to a design matrix
Usage
newdata_to_X(newdata, wide, intercept = TRUE)
Arguments
newdata |
Data frame where each row is a vector of covariate values defining an alternative focus quantity. |
wide |
Wide model which includes these covariates. |
intercept |
Include an intercept as the first column. |
Details
Numeric values can be supplied for factor levels that are character strings denoting numbers (like "1"
or "2"
).
Value
"Design" matrix of covariate values defining alternative focuses, with factors expanded to their contrasts. This is in the form required by the X
argument of fic
, with one row per alternative focus. The columns correspond to coefficients in a linear-type model. For the built-in focus functions such as mean_normal
and prob_logistic
, these coefficients include an intercept, but user-written focuses may be written in such a way as not to require an intercept (as in the example in the "skew normal" vignette).
Examples
bwt.glm <- glm(low ~ lwtkg + age + smoke + ftv, data=birthwt, family="binomial")
newdata <- data.frame(lwtkg=1, age=60, smoke=0, ftv="2+")
newdata_to_X(newdata, bwt.glm)
## See the Cox regression section of the main package vignette for another example.
Plot focused model comparison statistics: base graphics method
Description
Plot focused model comparison statistics: base graphics method
Usage
## S3 method for class 'fic'
plot(
x,
ci = TRUE,
adj = TRUE,
xlab = NULL,
ylab = NULL,
xlim = NULL,
ylim = NULL,
pch = 19,
mfrow = NULL,
...
)
Arguments
x |
Output from |
ci |
Plot interval estimates? ( |
adj |
The optimal model is the one with the lowest root mean square error (RMSE). If |
xlab |
x-axis label. |
ylab |
y-axis label. |
xlim |
x-axis limits (pair of numbers) |
ylim |
y-axis limits |
pch |
Plot point character, by default 19 (solid circle). |
mfrow |
Vector of two numbers giving the number of rows and number of columns respectively in the plot grid, if there are multiple focuses. |
... |
Other options to pass to |
Details
If the focus estimates are available, then the focus estimates are plotted against the root MSE. One plot is made for each covariate value defining different focuses. If the wide model estimate is available, this is illustrated as a solid line on the plot, and if the narrow model estimate is available, this is shown as a dashed line.
If the focus estimates are unavailable, then the standard errors of the focus estimate are plotted against the corresponding bias. The plot points are shaded with darkness proportional to the RMSE, with the point of maximum RMSE in black.
The ggplot2-based plot method, ggplot_fic
, is
slightly nicer.
See Also
Examples
## Example from the main vignette, see there for more details
wide.glm <- glm(low ~ lwtkg + age + smoke + ht + ui + smokeage + smokeui,
data=birthwt, family=binomial)
vals.smoke <- c(1, 58.24, 22.95, 1, 0, 0, 22.95, 0)
vals.nonsmoke <- c(1, 59.50, 23.43, 0, 0, 0, 0, 0)
X <- rbind("Smokers" = vals.smoke, "Non-smokers" = vals.nonsmoke)
inds0 <- c(1,1,0,0,0,0,0,0)
combs <- all_inds(wide.glm, inds0)
ficres <- fic(wide = wide.glm, inds = combs, inds0 = inds0,
focus = prob_logistic, X = X)
plot(ficres)
Summarise focused model comparison results
Description
Summarise focused model comparison results
Usage
## S3 method for class 'fic'
summary(object, tidy = TRUE, adj = FALSE, ...)
Arguments
object |
Object returned by |
tidy |
If |
adj |
The optimal model is the one with the lowest root mean square error (RMSE). If |
... |
Other arguments, currently unused. |
Value
A list of two components, one for the optimal model per focus, and one for the range of focus and RMSE estimates over models.
See Also
ggplot_fic
, plot.fic
for a more detailed visual representation of the focused comparison
Examples
## Example from the main vignette, see there for more details
wide.glm <- glm(low ~ lwtkg + age + smoke + ht + ui + smokeage + smokeui,
data=birthwt, family=binomial)
vals.smoke <- c(1, 58.24, 22.95, 1, 0, 0, 22.95, 0)
vals.nonsmoke <- c(1, 59.50, 23.43, 0, 0, 0, 0, 0)
X <- rbind("Smokers" = vals.smoke, "Non-smokers" = vals.nonsmoke)
inds0 <- c(1,1,0,0,0,0,0,0)
combs <- all_inds(wide.glm, inds0)
ficres <- fic(wide = wide.glm, inds = combs, inds0 = inds0,
focus = prob_logistic, X = X)
ggplot_fic(ficres)
summary(ficres)