This function fits a collection of LDA models to a dataset, fitting one model
for each hyperparameter setting specified by the
lda_varying_params_lists argument. Its output can be used directly by
align_topics.
run_lda_models(
data,
lda_varying_params_lists,
lda_fixed_params_list = list(),
dir = NULL,
reset = FALSE,
verbose = FALSE,
seed = 1L
)
data
(required) a matrix, data.frame or
slam::simple_triplet_matrix containing the counts (integers) of each
feature (e.g. words) in each sample (or document). If data is provided as a
matrix or data.frame, each row is a sample and each column is a
feature.
lda_varying_params_lists
(required) a list specifying the
parameters for each model that needs to be run. Currently supported
parameters are "k" (the number of topics), "method" ("VEM" or "Gibbs"), and
"control", a list of type LDAcontrol. See topicmodels::LDA for
details and below for examples.
lda_fixed_params_list
(optional) a list specifying the
parameters common to all models to be fitted. Values provided by
lda_fixed_params_list are overwritten by those provided by
lda_varying_params_lists.
dir
(optional) a character string specifying the directory in which
individual LDA models should be stored. If not specified, individual LDA
models are not stored. This option is especially useful for data exploration
because it saves execution time when adding models to an
existing model list (see examples).
reset
(optional, default = FALSE) Should any cached models in
the save directory be cleared?
verbose
(optional, default = FALSE) Print verbose output while
running models?
seed
(optional, default = 1) Seed to use in the
topicmodels LDAcontrol settings. Necessary because LDA's VEM routine uses an
external (non-R) random number generator.
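As a rough illustration of how the two parameter lists relate, the sketch below builds a varying list that also sets "method" and "control", then merges each entry with a fixed list using base R's modifyList so that the varying values take precedence. It only mirrors the override rule documented above; that the function combines the lists in a comparable way is an assumption, and the names fixed, varying and merged are hypothetical.

```r
# Parameters shared by all models (hypothetical values for illustration).
fixed <- list(method = "VEM")

# One entry per model to fit; an entry may override the shared parameters.
varying <- list(
  K2 = list(k = 2),
  K3_gibbs = list(k = 3, method = "Gibbs", control = list(iter = 500))
)

# Assumed override rule: values in the varying entry replace fixed values.
merged <- lapply(varying, function(v) modifyList(fixed, v))

merged$K2$method        # "VEM"   (inherited from the fixed list)
merged$K3_gibbs$method  # "Gibbs" (overridden by the varying list)
```

In this sketch, lists shaped like fixed and varying would be the values passed as lda_fixed_params_list and lda_varying_params_lists.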
a list of LDA models (see package topicmodels).
? or an lda_models object, which would be a list of
1. a list of models;
2. some metadata about the alignment
set.seed(1)
data = matrix(sample(0:1000, size = 24), 4, 6)
lda_varying_params_lists = list(K2 = list(k = 2), K3 = list(k = 3))
lda_models =
  run_lda_models(
    data = data,
    lda_varying_params_lists = lda_varying_params_lists,
    dir = "test_lda_models/"
  )
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
additional_lda_varying_params_list =
  list(K4 = list(k = 4))
updated_lda_models =
  run_lda_models(
    data = data,
    lda_varying_params_lists =
      append(
        lda_varying_params_lists,
        additional_lda_varying_params_list
      ),
    dir = "test_lda_models/"
  )
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
# because we specified the "dir" option, it only runs LDA for k = 4
unlink("test_lda_models/", recursive = TRUE)
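The data argument expects one row per sample and one column per feature. As a minimal sketch of that layout, using made-up toy documents (docs, vocab and counts are hypothetical names, not part of the package), an integer count matrix can be assembled in base R:

```r
# Toy tokenized documents (invented purely for illustration).
docs <- list(
  doc1 = c("a", "b", "a"),
  doc2 = c("b", "c"),
  doc3 = c("c", "c", "a")
)

# Shared vocabulary, so every document is counted over the same features.
vocab <- sort(unique(unlist(docs)))

# Count each feature per document; transpose so rows are samples
# and columns are features.
counts <- t(sapply(docs, function(words) table(factor(words, levels = vocab))))
storage.mode(counts) <- "integer"

dim(counts)          # 3 rows (samples) by 3 columns (features)
counts["doc1", "a"]  # 2
```

A matrix shaped like counts matches the layout described for the data argument.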