Type: | Package |
Title: | K-Fold Cross Validation for Factor Analysis |
Version: | 0.2.2 |
Author: | Kyle Nickodem [aut, cre] and Peter Halpin [aut] |
Maintainer: | Kyle Nickodem <kyle.nickodem@gmail.com> |
Description: | Provides functions to identify plausible and replicable factor structures for a set of variables via k-fold cross validation. The process combines the exploratory and confirmatory factor analytic approach to scale development (Flora & Flake, 2017) <doi:10.1037/cbs0000069> with a cross validation technique that maximizes the available data (Hastie, Tibshirani, & Friedman, 2009) <isbn:978-0-387-21606-5>. Also available are functions to determine k by drawing on power analytic techniques for covariance structures (MacCallum, Browne, & Sugawara, 1996) <doi:10.1037/1082-989X.1.2.130>, generate model syntax, and summarize results in a report. |
Depends: | R (≥ 3.6) |
Imports: | caret, doParallel, flextable (≥ 0.6.3), foreach, GPArotation, knitr, lavaan (≥ 0.6.9), officer, parallel, rmarkdown, semTools (≥ 0.5.5), simstandard |
Suggests: | semPlot |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
LazyData: | true |
URL: | https://github.com/knickodem/kfa |
BugReports: | https://github.com/knickodem/kfa/issues |
RoxygenNote: | 7.2.3 |
NeedsCompilation: | no |
Packaged: | 2023-07-08 23:18:57 UTC; kylenick |
Repository: | CRAN |
Date/Publication: | 2023-07-09 09:00:02 UTC |
Aggregated factor correlations
Description
The factor correlations aggregated over k-folds
Usage
agg_cors(models, flag = 0.9, type = "factor")
Arguments
models |
An object returned from |
flag |
threshold above which a factor correlation will be flagged |
type |
currently ignored; |
Value
data.frame of mean factor correlations for each factor model and vector with count of folds with a flagged correlation
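The aggregation described above can be illustrated with a minimal base R sketch. It assumes a plain arithmetic mean across folds and uses made-up correlation matrices; the object names (fold_cors, mean_cors, flag_count) are hypothetical and this is not the package's internal code.

```r
# Hypothetical factor correlations for one 3-factor model in each of 2 folds
# (made-up numbers, not output from kfa)
fnames <- paste0("f", 1:3)
fold_cors <- list(
  fold1 = matrix(c(1.00, 0.25, 0.92,
                   0.25, 1.00, 0.18,
                   0.92, 0.18, 1.00), nrow = 3,
                 dimnames = list(fnames, fnames)),
  fold2 = matrix(c(1.00, 0.21, 0.88,
                   0.21, 1.00, 0.22,
                   0.88, 0.22, 1.00), nrow = 3,
                 dimnames = list(fnames, fnames))
)
flag <- 0.9

# Element-wise mean of the correlation matrices across folds
# (assumption: a simple arithmetic mean)
mean_cors <- Reduce(`+`, fold_cors) / length(fold_cors)

# For each factor, count the folds in which any of its correlations
# with another factor exceeds the flag threshold
flagged <- sapply(fold_cors, function(r) {
  diag(r) <- NA                          # ignore the unit diagonal
  apply(r > flag, 1, any, na.rm = TRUE)
})
flag_count <- rowSums(flagged)
```

Here f1 and f3 are each flagged in one fold because their correlation exceeds 0.9 in fold1 only.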
Examples
data(example.kfa)
agg_cors(example.kfa)
Aggregated factor loadings
Description
The factor loadings aggregated over k-folds
Usage
agg_loadings(models, flag = 0.3, digits = 2)
Arguments
models |
An object returned from |
flag |
threshold below which loading will be flagged |
digits |
integer; number of decimal places to display in the report. |
Value
data.frame of mean factor loadings for each factor model and vector with count of folds with a flagged loading
Examples
data(example.kfa)
agg_loadings(example.kfa)
Summary table of model fit
Description
Summary table of model fit aggregated over k-folds
Usage
agg_model_fit(kfits, index = "all", digits = 2)
Arguments
kfits |
an object returned from |
index |
character; one or more fit indices to summarize. Indices
must be present in the |
digits |
integer; number of decimal places to display in the report |
Value
data.frame of aggregated model fit statistics
Examples
data(example.kfa)
fits <- k_model_fit(example.kfa, by.fold = TRUE)
agg_model_fit(fits)
Aggregated scale reliabilities
Description
The factor reliabilities aggregated over k-folds
Usage
agg_rels(models, flag = 0.6, digits = 2)
Arguments
models |
An object returned from |
flag |
threshold below which reliability will be flagged |
digits |
integer; number of decimal places to display in the report. |
Value
data.frame of mean factor (scale) reliabilities for each factor model and vector with count of folds with a flagged reliability
Examples
data(example.kfa)
agg_rels(example.kfa)
Write confirmatory factor analysis syntax
Description
Uses the factor loadings matrix, presumably from an exploratory factor analysis, to generate lavaan-compatible confirmatory factor analysis syntax.
Usage
efa_cfa_syntax(
loadings,
simple = TRUE,
min.loading = NA,
single.item = c("keep", "drop", "none"),
identified = TRUE,
constrain0 = FALSE
)
Arguments
loadings |
matrix of factor loadings |
simple |
logical; Should the perfect simple structure be returned (default) when converting EFA results to CFA syntax?
If |
min.loading |
numeric between 0 and 1 indicating the minimum (absolute) value of the loading for a variable on a factor
when converting EFA results to CFA syntax. Must be specified when |
single.item |
character indicating how single-item factors should be treated.
Use |
identified |
logical; Should identification check for rotational uniqueness a la Millsap (2001) be performed?
If the model is not identified |
constrain0 |
logical; Should variable(s) with all loadings below |
References
Millsap, R. E. (2001). When trivial constraints are not trivial: The choice of uniqueness constraints in confirmatory factor analysis. Structural Equation Modeling, 8(1), 1-17. doi:10.1207/S15328007SEM0801_1
Examples
loadings <- matrix(c(rep(.2, 3), rep(.6, 3), rep(.8, 3), rep(.3, 3)), ncol = 2)
# simple structure
efa_cfa_syntax(loadings)
# allow cross-loadings and check if model is identified
efa_cfa_syntax(loadings, simple = FALSE, min.loading = .25)
# allow cross-loadings and ignore identification check
efa_cfa_syntax(loadings, simple = FALSE, min.loading = .25, identified = FALSE)
kfa results from simulated data example
Description
Simulated responses for 900 observations on 20 variables loading onto a 3-factor structure (see example in kfa documentation for model).
The simulated data was run through kfa with the call kfa(sim.data, k = 2, m = 3), which tested 1-, 2-, and 3-factor structures over 2 folds.
Usage
data(example.kfa)
Format
An object of class "kfa", which is a four-element list:
- cfas: lavaan CFA objects for each k fold
- cfa.syntax: syntax used to produce CFA objects
- model.names: vector of names for CFA objects
- efa.structures: all factor structures identified in the EFA
Examples
data(example.kfa)
agg_cors(example.kfa)
Find k for k-fold cross-validation
Description
This function is specifically for determining k in the context of factor analysis using change in RMSEA as the criterion for identifying the optimal factor model.
Usage
find_k(
variables,
n,
p,
m = NULL,
max.k = 10,
min.n = 200,
rmsea0 = 0.05,
rmseaA = 0.08,
...
)
Arguments
variables |
a |
n |
integer; number of observations. Ignored if |
p |
integer; number of variables to factor analyze. Ignored if |
m |
integer; maximum number of factors expected to be extracted from |
max.k |
integer; maximum number of folds. Default is 10. |
min.n |
integer; minimum sample size per fold. Default is 200 based on simulations from Curran et al. (2003). |
rmsea0 |
numeric; RMSEA under the null hypothesis. |
rmseaA |
numeric; RMSEA under the alternative hypothesis. |
... |
other arguments passed to |
Value
named vector with the number of folds (k), sample size suggested by the power analysis (power.n), and the actual sample size used for determining k (actual.n).
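As a rough illustration of how the three returned values might relate, the base R sketch below derives a fold count from a power-based sample size. The arithmetic is an assumption made for illustration, not the package's documented algorithm, and power_n is a made-up stand-in for the output of a power analysis.

```r
# Hypothetical inputs: power_n is a made-up value standing in for the
# sample size a power analysis might suggest
n       <- 900    # total number of observations
power_n <- 320    # assumed result of the power analysis
min_n   <- 200    # minimum sample size per fold
max_k   <- 10     # maximum number of folds

# Assumption: each fold needs at least the larger of the power-based
# and minimum sample sizes
actual_n <- max(power_n, min_n)

# Largest number of folds that keeps every fold at or above actual_n,
# capped at max_k
k <- min(max_k, floor(n / actual_n))

c(k = k, power.n = power_n, actual.n = actual_n)
```

Under these assumed numbers, 900 observations support 2 folds of at least 320 observations each.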
References
Curran, P. J., Bollen, K. A., Chen, F., Paxton, P., & Kirby, J. B. (2003). Finite sampling properties of the point estimates and confidence intervals of the RMSEA. Sociological Methods & Research, 32(2), 208-252. doi:10.1177/0049124103256130
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1(2), 130–149. doi:10.1037/1082-989X.1.2.130
Examples
find_k(n = 900, p = 20, m = 3)
# adjust precision
find_k(n = 900, p = 20, m = 3, rmsea0 = .03, rmseaA = .10)
Standardized factor loadings matrix
Description
Extract standardized factor loadings from lavaan object
Usage
get_std_loadings(object, type = "std.all", df = FALSE)
Arguments
object |
a |
type |
standardize on the latent variables ( |
df |
should loadings be returned as a |
Value
A matrix or data.frame of factor loadings
Examples
data(HolzingerSwineford1939, package = "lavaan")
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
fit <- lavaan::cfa(HS.model, data = HolzingerSwineford1939)
get_std_loadings(fit)
Available Fit Indices
Description
Shows the fit indices available from a kfa object to report in kfa_report
Usage
index_available(models)
Arguments
models |
an object returned from |
Value
character vector of index names
Examples
data(example.kfa)
index_available(example.kfa)
Extract model fit
Description
Model fit indices extracted from k-folds
Usage
k_model_fit(models, index = "default", by.fold = TRUE)
Arguments
models |
an object returned from |
index |
character; one or more fit indices to summarize in the report. Use |
by.fold |
Should each element in the returned lists be a fold (default) or a factor model? |
Value
list of data.frames with average model fit for each factor model
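The difference between organizing results by fold and by factor model can be sketched in base R. The fit values below are made up, and the object names (by_fold, by_model, avg_fit) are hypothetical; the sketch only illustrates the two layouts, not the function's internals.

```r
# Hypothetical fit values (made-up numbers), one data.frame per fold
by_fold <- list(
  fold1 = data.frame(model = c("1-factor", "2-factor"),
                     cfi = c(0.81, 0.95), rmsea = c(0.09, 0.04)),
  fold2 = data.frame(model = c("1-factor", "2-factor"),
                     cfi = c(0.79, 0.96), rmsea = c(0.10, 0.05))
)

# Stack the folds, tagging each row with the fold it came from
stacked <- do.call(rbind, Map(function(d, f) cbind(fold = f, d),
                              by_fold, names(by_fold)))

# Split by factor model so each element holds one model's fit across folds
by_model <- split(stacked[, c("fold", "cfi", "rmsea")], stacked$model)

# Average fit per model across folds
avg_fit <- aggregate(cbind(cfi, rmsea) ~ model, data = stacked, FUN = mean)
```

The by_fold layout makes it easy to compare models within a fold, while by_model makes it easy to see how stable one model's fit is across folds.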
Examples
data(example.kfa)
# customize fit indices to report
k_model_fit(example.kfa, index = c("chisq", "cfi", "rmsea", "srmr"))
# organize results by factor model rather than by fold
k_model_fit(example.kfa, by.fold = FALSE)
Conducts k-fold cross validation for factor analysis
Description
The function splits the data into k folds where each fold contains training data and test data.
For each fold, exploratory factor analyses (EFAs) are run on the training data. The structure for each model is transformed into lavaan-compatible confirmatory factor analysis (CFA) syntax.
The CFAs are then run on the test data.
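The train/test split described above can be sketched in base R as follows. The values of n and k are hypothetical, and the fold assignment shown here is one common approach rather than the package's own implementation.

```r
set.seed(101)   # arbitrary seed so the split is reproducible
n <- 900        # hypothetical number of observations
k <- 3          # hypothetical number of folds

# Randomly assign each row to one of k roughly equal folds
fold_id <- sample(rep(seq_len(k), length.out = n))

# For each fold, the held-out rows are the test data and the
# remaining rows are the training data
splits <- lapply(seq_len(k), function(f) {
  list(train = which(fold_id != f), test = which(fold_id == f))
})

# Size of the test set in each fold
test_sizes <- sapply(splits, function(s) length(s$test))
```

Every row appears in exactly one test set, so each observation contributes to both the exploratory (training) and confirmatory (test) steps across the folds.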
Usage
kfa(
data,
variables = names(data),
k = NULL,
m = floor(length(variables)/4),
seed = 101,
cores = NULL,
custom.cfas = NULL,
power.args = list(rmsea0 = 0.05, rmseaA = 0.08),
rotation = "oblimin",
simple = TRUE,
min.loading = NA,
single.item = "none",
ordered = FALSE,
estimator = NULL,
missing = "listwise",
...
)
Arguments
data |
a |
variables |
character vector of column names in |
k |
number of folds in which to split the data. Default is |
m |
integer; maximum number of factors to extract. Default is 4 items per factor. |
seed |
integer passed to |
cores |
integer; number of CPU cores to use for parallel processing. Default is |
custom.cfas |
a single object or named |
power.args |
named |
rotation |
character (case-sensitive); any rotation method listed in
|
simple |
logical; Should the perfect simple structure be returned (default) when converting EFA results to CFA syntax?
If |
min.loading |
numeric between 0 and 1 indicating the minimum (absolute) value of the loading for a variable on a factor
when converting EFA results to CFA syntax. Must be specified when |
single.item |
character indicating how single-item factors should be treated.
Use |
ordered |
logical; Should items be treated as ordinal and the
polychoric correlations used in the factor analysis? When |
estimator |
if |
missing |
default is "listwise". See |
... |
other arguments passed to |
Details
In order for custom.cfas to be tested along with the EFA-identified structures, each model supplied in custom.cfas must include all variables in lavaan-compatible syntax.
Deciding an appropriate m can be difficult, but is consequential for the possible factor structures to examine, the power analysis to determine k, and overall computation time.
The n_factors function in the parameters package can assist with this decision.
When converting EFA results to CFA syntax (via efa_cfa_syntax), the simple structure is defined as each variable loading onto a single factor. This is determined using the largest factor loading for each variable.
When simple = FALSE, variables are allowed to cross-load on multiple factors. In this case, all pathways with loadings above the min.loading are retained. However, allowing cross-loading variables can result in model under-identification.
The efa_cfa_syntax function conducts an identification check (i.e., identified = TRUE) and under-identified models are not run in the CFA portion of the analysis.
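The distinction between simple structure and cross-loadings can be sketched in base R using the loadings matrix from the efa_cfa_syntax examples. The helper to_syntax and the variable and factor names are hypothetical, introduced only for illustration; the sketch omits the identification check and is not the package's own code.

```r
# Loadings from the efa_cfa_syntax example, with illustrative dimnames added
loadings <- matrix(c(rep(.2, 3), rep(.6, 3), rep(.8, 3), rep(.3, 3)), ncol = 2,
                   dimnames = list(paste0("x", 1:6), c("f1", "f2")))

# Simple structure: assign each variable to the factor with its
# largest absolute loading
simple_assign <- colnames(loadings)[max.col(abs(loadings), ties.method = "first")]
names(simple_assign) <- rownames(loadings)

# Cross-loadings allowed: keep every variable-factor path whose
# absolute loading exceeds min_loading
min_loading <- 0.25
keep <- abs(loadings) > min_loading

# Hypothetical helper: turn a logical variable-by-factor matrix into
# lavaan-style "factor =~ items" lines
to_syntax <- function(keep) {
  lines <- vapply(colnames(keep), function(f) {
    paste0(f, " =~ ", paste(rownames(keep)[keep[, f]], collapse = " + "))
  }, character(1))
  paste(lines, collapse = "\n")
}

cat(to_syntax(keep))
```

With these loadings, x1 to x3 load only on f2 under either rule, while x4 to x6 load on f1 under simple structure and on both factors once loadings above .25 are retained.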
Value
An object of class "kfa", which is a four-element list:
- cfas: lavaan CFA objects for each k fold
- cfa.syntax: syntax used to produce CFA objects
- model.names: vector of names for CFA objects
- efa.structures: all factor structures identified in the EFA
Examples
# simulate data based on a 3-factor model with standardized loadings
sim.mod <- "f1 =~ .7*x1 + .8*x2 + .3*x3 + .7*x4 + .6*x5 + .8*x6 + .4*x7
f2 =~ .8*x8 + .7*x9 + .6*x10 + .5*x11 + .5*x12 + .7*x13 + .6*x14
f3 =~ .6*x15 + .5*x16 + .9*x17 + .4*x18 + .7*x19 + .5*x20
f1 ~~ .2*f2
f2 ~~ .2*f3
f1 ~~ .2*f3
x9 ~~ .2*x10"
set.seed(1161)
sim.data <- simstandard::sim_standardized(sim.mod, n = 900,
latent = FALSE,
errors = FALSE)[c(2:9,1,10:20)]
# include a custom 2-factor model
custom2f <- paste0("f1 =~ ", paste(colnames(sim.data)[1:10], collapse = " + "),
"\nf2 =~ ",paste(colnames(sim.data)[11:20], collapse = " + "))
mods <- kfa(data = sim.data,
k = NULL, # prompts power analysis to determine number of folds
cores = 2,
custom.cfas = custom2f)
Creates summary report from a k-fold factor analysis
Description
Generates a report summarizing the factor analytic results over k-folds.
Usage
kfa_report(
models,
file.name,
report.title = file.name,
path = NULL,
report.format = "html_document",
word.template = NULL,
index = "default",
plots = FALSE,
load.flag = 0.3,
cor.flag = 0.9,
rel.flag = 0.6,
digits = 2
)
Arguments
models |
an object returned from |
file.name |
character; file name to create on disk. |
report.title |
character; title of the report |
path |
character; path of the directory where summary report will be saved. Default is working directory. |
report.format |
character; file format of the report. Default is HTML ("html_document"). See |
word.template |
character; file path to word document to use as a formatting template when |
index |
character; one or more fit indices to summarize in the report. Use |
plots |
logical; should plots of the factor models be included in the report? |
load.flag |
numeric; factor loadings of variables below this value will be flagged. Default is .30 |
cor.flag |
numeric; factor correlations above this value will be flagged. Default is .90 |
rel.flag |
numeric; factor (scale) reliabilities below this value will be flagged. Default is .60. |
digits |
integer; number of decimal places to display in the report. |
Value
A summary report of factor structures and model fit within and between folds.
Examples
# simulate data based on a 3-factor model with standardized loadings
sim.mod <- "f1 =~ .7*x1 + .8*x2 + .3*x3 + .7*x4 + .6*x5 + .8*x6 + .4*x7
f2 =~ .8*x8 + .7*x9 + .6*x10 + .5*x11 + .5*x12 + .7*x13 + .6*x14
f3 =~ .6*x15 + .5*x16 + .9*x17 + .4*x18 + .7*x19 + .5*x20
f1 ~~ .2*f2
f2 ~~ .2*f3
f1 ~~ .2*f3
x9 ~~ .2*x10"
set.seed(1161)
sim.data <- simstandard::sim_standardized(sim.mod, n = 900,
latent = FALSE,
errors = FALSE)[c(2:9,1,10:20)]
# include a custom 2-factor model
custom2f <- paste0("f1 =~ ", paste(colnames(sim.data)[1:10], collapse = " + "),
"\nf2 =~ ",paste(colnames(sim.data)[11:20], collapse = " + "))
mods <- kfa(data = sim.data,
k = NULL, # prompts power analysis to determine number of folds
cores = 2,
custom.cfas = custom2f)
## Not run:
kfa_report(mods, file.name = "example_sim_kfa_report",
report.format = "html_document",
report.title = "K-fold Factor Analysis - Example Sim")
## End(Not run)
Unique factor structures
Description
Extract unique factor structures across the k-folds
Usage
model_structure(models)
Arguments
models |
An object returned from |
Value
data.frame with the number of folds in which each unique factor structure was tested for each factor model.
Examples
data(example.kfa)
model_structure(example.kfa)
Conducts exploratory factor analysis
Description
This function is intended for use on independent samples rather than integrated with k-fold cross-validation.
Usage
run_efa(
data,
variables = names(data),
m = floor(ncol(data)/4),
rotation = "oblimin",
simple = TRUE,
min.loading = NA,
single.item = c("keep", "drop", "none"),
identified = TRUE,
constrain0 = FALSE,
ordered = FALSE,
estimator = NULL,
missing = "listwise",
...
)
Arguments
data |
a |
variables |
character vector of column names in |
m |
integer; maximum number of factors to extract. Default is 4 items per factor. |
rotation |
character (case-sensitive); any rotation method listed in
|
simple |
logical; Should the perfect simple structure be returned (default) when converting EFA results to CFA syntax?
If |
min.loading |
numeric between 0 and 1 indicating the minimum (absolute) value of the loading for a variable on a factor
when converting EFA results to CFA syntax. Must be specified when |
single.item |
character indicating how single-item factors should be treated.
Use |
identified |
logical; Should identification check for rotational uniqueness a la Millsap (2001) be performed?
If the model is not identified |
constrain0 |
logical; Should variable(s) with all loadings below |
ordered |
logical; Should items be treated as ordinal and the
polychoric correlations used in the factor analysis? When |
estimator |
if |
missing |
default is "listwise". See |
... |
other arguments passed to |
Details
When converting EFA results to CFA syntax (via efa_cfa_syntax), the simple structure is defined as each variable loading onto a single factor. This is determined using the largest factor loading for each variable.
When simple = FALSE, variables are allowed to cross-load on multiple factors. In this case, all pathways with loadings above the min.loading are retained. However, allowing cross-loading variables can result in model under-identification.
An identification check is run by default, but can be turned off by setting identified = FALSE.
Value
A three-element list:
- efas: lavaan object for each m model
- loadings: (rotated) factor loading matrix for each m model
- cfa.syntax: CFA syntax generated from loadings
References
Millsap, R. E. (2001). When trivial constraints are not trivial: The choice of uniqueness constraints in confirmatory factor analysis. Structural Equation Modeling, 8(1), 1-17. doi:10.1207/S15328007SEM0801_1
Examples
# simulate data based on a 3-factor model with standardized loadings
sim.mod <- "f1 =~ .7*x1 + .8*x2 + .3*x3 + .7*x4 + .6*x5 + .8*x6 + .4*x7
f2 =~ .8*x8 + .7*x9 + .6*x10 + .5*x11 + .5*x12 + .7*x13 + .6*x14
f3 =~ .6*x15 + .5*x16 + .9*x17 + .4*x18 + .7*x19 + .5*x20
f1 ~~ .2*f2
f2 ~~ .2*f3
f1 ~~ .2*f3
x9 ~~ .2*x10"
set.seed(1161)
sim.data <- simstandard::sim_standardized(sim.mod, n = 900,
latent = FALSE,
errors = FALSE)[c(2:9,1,10:20)]
# Run 1-, 2-, and 3-factor models
efas <- run_efa(sim.data, m = 3)
Write exploratory factor analysis syntax
Description
Converts variable names to lavaan-compatible exploratory factor analysis syntax
Usage
write_efa(nf, vnames)
Arguments
nf |
integer; number of factors |
vnames |
character vector; names of variables to include in the efa |
Value
character. Use cat() to best examine the returned syntax.
Examples
vnames <- paste0("x", 1:10)
syntax <- write_efa(nf = 2, vnames = vnames)
cat(syntax)