run_lda_models.Rd
This function fits a collection of LDA models to a dataset, fitting one model
for each hyperparameter setting specified by the lda_varying_params_lists
argument. Its output can be used directly by align_topics.
run_lda_models(
data,
lda_varying_params_lists,
lda_fixed_params_list = list(),
dir = NULL,
reset = FALSE,
verbose = FALSE,
seed = 1L
)
data
(required) a matrix, data.frame or slam::simple_triplet_matrix containing the
counts (integers) of each feature (e.g. words) in each sample (or document).
If data is provided as a matrix or data.frame, each row is a sample and each
column is a feature.
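As a rough illustration of the expected layout, the sketch below builds a small count matrix with samples in rows and features in columns. The document and word names are made up for the example, and the sparse conversion only runs if the slam package happens to be installed.

```r
# Hypothetical toy data: 3 documents (rows) x 4 words (columns).
counts <- matrix(
  c(2L, 0L, 1L, 3L,
    0L, 4L, 1L, 0L,
    5L, 1L, 0L, 2L),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    paste0("doc", 1:3),
    c("apple", "banana", "cherry", "date")
  )
)

# The same counts as a data.frame (still one row per sample).
counts_df <- as.data.frame(counts)

# Optional sparse representation, only attempted when slam is available.
if (requireNamespace("slam", quietly = TRUE)) {
  counts_stm <- slam::as.simple_triplet_matrix(counts)
}
```

Any of these three objects should fit the description of the data argument above.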
lda_varying_params_lists
(required) a list specifying the parameters for each model that needs to be
run. Currently supported parameters are "k" (the number of topics), "method"
("VEM" or "Gibbs"), and "control", a list of type LDAcontrol. See
topicmodels::LDA for details and below for examples.
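A minimal sketch of what such a list might look like when it mixes methods and control settings. The element names (vem_k3 and so on) are arbitrary labels chosen for this example, and it assumes "control" can be passed as a plain named list, as topicmodels::LDA itself accepts; iter and burnin are Gibbs control parameters in topicmodels.

```r
# One named entry per model to fit; names are hypothetical labels.
lda_varying_params_lists <- list(
  # Only k is given, so the method falls back to its default.
  vem_k3 = list(k = 3),
  # Same number of topics, fitted by Gibbs sampling instead.
  gibbs_k3 = list(k = 3, method = "Gibbs"),
  # Gibbs sampling with explicit control settings (assumed to be
  # forwarded to topicmodels::LDA as its control argument).
  gibbs_k5 = list(
    k = 5,
    method = "Gibbs",
    control = list(iter = 500, burnin = 100)
  )
)

# Quick sanity check: every entry specifies the number of topics.
all_have_k <- all(vapply(
  lda_varying_params_lists,
  function(params) "k" %in% names(params),
  logical(1)
))
```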
lda_fixed_params_list
(optional) a list specifying the parameters common to all models to be
fitted. Values provided by lda_fixed_params_list are overwritten by those
provided by lda_varying_params_lists.
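The override rule can be pictured with base R's utils::modifyList. This is only a sketch of the stated precedence for top-level parameters, not the package's actual implementation; in particular, how nested "control" lists are combined is not specified here.

```r
# Parameters shared by all models (hypothetical values).
lda_fixed_params_list <- list(k = 2, method = "VEM")

# Per-model parameters for one model; k conflicts with the fixed value.
varying_params <- list(k = 5)

# Entries in the second list take precedence over the first.
effective_params <- utils::modifyList(lda_fixed_params_list, varying_params)

effective_params$k       # taken from the varying parameters
effective_params$method  # inherited from the fixed parameters
```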
dir
(optional) a character specifying the directory in which individual LDA
models should be stored. If not specified, individual LDA models are not
stored. This option is especially useful for data exploration, as it saves
execution time when adding models to an existing model list (see examples).
reset
(optional, default = FALSE) Should any cached models in the save directory be
cleared?
verbose
(optional, default = FALSE) Print verbose output while running models?
seed
(optional, default = 1) Seed to use in topicmodels::LDAcontrol. Necessary
because LDA's VEM routine uses an external (non-R) random number generator.
a list of LDA models (see package topicmodels).
? or a lda_models object, which would be a list of
1. a list of models;
2. some metadata about the alignment
set.seed(1)
data = matrix(sample(0:1000, size = 24), 4, 6)
lda_varying_params_lists = list(K2 = list(k = 2), K3 = list(k = 3))
lda_models =
  run_lda_models(
    data = data,
    lda_varying_params_lists = lda_varying_params_lists,
    dir = "test_lda_models/"
  )
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
additional_lda_varying_params_list =
  list(K4 = list(k = 4))
updated_lda_models =
  run_lda_models(
    data = data,
    lda_varying_params_lists =
      append(
        lda_varying_params_lists,
        additional_lda_varying_params_list
      ),
    dir = "test_lda_models/"
  )
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
# Because we specified the "dir" option, the stored models for k = 2 and
# k = 3 are reused and LDA is only run for k = 4.
unlink("test_lda_models/", recursive = TRUE)