This function fits a collection of LDA models to a dataset, one model for each hyperparameter setting specified by the lda_varying_params_lists argument. Its output can be passed directly to align_topics.

run_lda_models(
  data,
  lda_varying_params_lists,
  lda_fixed_params_list = list(),
  dir = NULL,
  reset = FALSE,
  verbose = FALSE,
  seed = 1L
)

Arguments

data

(required) a matrix, data.frame, or slam::simple_triplet_matrix containing the integer counts of each feature (e.g. words) in each sample (or document). If data is provided as a matrix or data.frame, each row is a sample and each column is a feature.
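As an illustration of the expected layout, the following base R sketch builds a small count matrix from a toy corpus, with one row per sample and one column per feature. The corpus and the variable names (docs, tokens, vocab, counts) are made up for this example and are not part of the package.

```r
# Toy corpus (hypothetical): each element is one document, i.e. one sample.
docs <- c(
  d1 = "apple banana apple",
  d2 = "banana cherry",
  d3 = "cherry cherry apple"
)

# Split each document into words and fix a shared vocabulary (the features).
tokens <- strsplit(docs, " ", fixed = TRUE)
vocab <- sort(unique(unlist(tokens)))

# Count every vocabulary word in every document.
# vapply() returns features in rows, so transpose to get samples in rows.
counts <- t(vapply(
  tokens,
  function(words) as.integer(table(factor(words, levels = vocab))),
  integer(length(vocab))
))
colnames(counts) <- vocab

print(counts)
```

A matrix shaped like counts (integer entries, samples in rows, features in columns) matches the layout described for the data argument.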

lda_varying_params_lists

(required) a list specifying the parameters for each model that needs to be run. Currently supported parameters are "k" (the number of topics), "method" ("VEM" or "Gibbs"), and "control", a list of type LDAcontrol. See topicmodels::LDA for details and below for examples.
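A minimal sketch of what such a list might look like when more than "k" varies. The element names (K2_VEM, K3_Gibbs) are arbitrary labels chosen for this example, and passing "control" as a plain named list is an assumption based on what topicmodels::LDA itself accepts; iter and burnin are Gibbs control settings from topicmodels.

```r
# One inner list per model to fit; the outer names only label the models.
lda_varying_params_lists <- list(
  # Two topics, fitted with variational EM.
  K2_VEM = list(k = 2, method = "VEM"),
  # Three topics, fitted with Gibbs sampling and custom sampler settings
  # (assumption: a plain named list is accepted for "control").
  K3_Gibbs = list(
    k = 3,
    method = "Gibbs",
    control = list(iter = 500, burnin = 100)
  )
)

str(lda_varying_params_lists)
```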

lda_fixed_params_list

(optional) a list specifying the parameters common to all models to be fitted. Values provided in lda_fixed_params_list are overridden by those provided in lda_varying_params_lists.
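The precedence rule can be pictured with utils::modifyList from base R. This is only an illustration of the documented behaviour (varying values win over fixed ones), not necessarily how run_lda_models merges the lists internally; effective_params is a name introduced for this sketch.

```r
# Parameters shared by all models.
lda_fixed_params_list <- list(method = "VEM", k = 2)

# Per-model parameters; these take precedence over the fixed ones.
lda_varying_params_lists <- list(
  K3 = list(k = 3),
  K4_Gibbs = list(k = 4, method = "Gibbs")
)

# For each model, start from the fixed parameters and overlay the varying ones.
effective_params <- lapply(
  lda_varying_params_lists,
  function(varying) utils::modifyList(lda_fixed_params_list, varying)
)

str(effective_params)
```

Here K3 keeps the fixed method "VEM" but uses k = 3, while K4_Gibbs replaces both the fixed k and the fixed method.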

dir

(optional) a character string specifying the directory in which individual LDA models should be stored. If not specified, individual LDA models are not stored. This option is especially useful for data exploration because it saves execution time when adding models to an existing model list (see examples).
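The idea behind this kind of caching can be sketched in base R as follows. Everything here is hypothetical: the helper fit_or_load, the one-.rds-file-per-model naming, and the stand-in fake_fit are assumptions made for illustration and do not describe the files run_lda_models actually writes.

```r
# Hypothetical helper: reuse a cached model if one exists, otherwise fit and save it.
fit_or_load <- function(name, params, dir, fit_fun) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(dir, paste0(name, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))
  }
  model <- fit_fun(params)
  saveRDS(model, path)
  model
}

# Stand-in for an expensive model fit; it only records how often it is called.
n_fits <- 0
fake_fit <- function(params) {
  n_fits <<- n_fits + 1
  list(k = params$k)
}

cache_dir <- file.path(tempdir(), "lda_cache_sketch")
unlink(cache_dir, recursive = TRUE)

# First run: both models are fitted and cached.
params_lists <- list(K2 = list(k = 2), K3 = list(k = 3))
first_run <- Map(
  function(name, params) fit_or_load(name, params, cache_dir, fake_fit),
  names(params_lists), params_lists
)

# Second run with one extra model: only K4 triggers a new fit.
params_lists$K4 <- list(k = 4)
second_run <- Map(
  function(name, params) fit_or_load(name, params, cache_dir, fake_fit),
  names(params_lists), params_lists
)

# Remove the temporary cache directory.
unlink(cache_dir, recursive = TRUE)
```

After the second run, the stand-in fit function has been called three times in total rather than five, which is the time saving the dir option is meant to provide.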

reset

(optional, default = FALSE) Should any cached models in the save directory be cleared?

verbose

(optional, default = FALSE) Print verbose output while running models?

seed

(optional, default = 1) Seed to use in the topicmodels LDA control object (class LDAcontrol). This is necessary because LDA's VEM routine uses an external (non-R) random number generator, so set.seed alone does not make results reproducible.

Value

A list of LDA models (see package topicmodels), one for each element of lda_varying_params_lists.

Examples

set.seed(1)
data = matrix(sample(0:1000, size = 24), 4, 6)
lda_varying_params_lists = list(K2 = list(k = 2), K3 = list(k = 3))
lda_models =
   run_lda_models(
      data = data,
      lda_varying_params_lists = lda_varying_params_lists,
      dir = "test_lda_models/"
      )
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.

additional_lda_varying_params_list =
   list(K4 = list(k = 4))
updated_lda_models =
   run_lda_models(
      data = data,
      lda_varying_params_lists =
         append(
            lda_varying_params_lists,
            additional_lda_varying_params_list),
      dir = "test_lda_models/"
      )
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.

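# Because the "dir" option was specified, LDA is only run for k = 4;
# the k = 2 and k = 3 models are reused from the directory.

# Remove the directory created for this example.
unlink("test_lda_models/", recursive = TRUE)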