This vignette measures the runtime of the main steps in the alignment workflow: fitting the LDA models and aligning their topics. Running it with a vocabulary of size \(V = 1000\) and \(N = 250\) samples gives the estimates reported in the accompanying manuscript.

library(MCMCpack)
#> Loading required package: coda
#> Loading required package: MASS
#> ##
#> ## Markov Chain Monte Carlo Package (MCMCpack)
#> ## Copyright (C) 2003-2023 Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park
#> ##
#> ## Support provided by the U.S. National Science Foundation
#> ## (Grants SES-0350646 and SES-0350613)
#> ##
library(alto)
#> 
#> Attaching package: 'alto'
#> The following object is masked from 'package:stats':
#> 
#>     weights
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:MASS':
#> 
#>     select
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)
library(stringr)
library(tictoc)
source("https://raw.githubusercontent.com/krisrs1128/topic_align/main/simulations/simulation_functions.R")

For this simulation, we work with simulated LDA data, as in the “Identifying True Topics” vignette.

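# Expose the vignette parameters as variables; K, V, N, and n_models are used below.
attach(params)
lambdas <- list(beta = 0.1, gamma = .5, count = 1e4)
# Topics: K rows, each a distribution over the V vocabulary entries.
betas <- rdirichlet(K, rep(lambdas$beta, V))
# Memberships: N rows, each a distribution over the K topics.
gammas <- rdirichlet(N, rep(lambdas$gamma, K))
x <- simulate_lda(betas, gammas, lambda = lambdas$count)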
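
To make the simulation step concrete, here is a minimal base R sketch of the LDA generative process. It is not the `simulate_lda` function from the sourced script, whose implementation may differ. The helper names `rdirichlet_sketch` and `simulate_lda_sketch`, the `_toy` variables, and the assumption that document lengths are Poisson with mean `lambda` are ours and are used only for illustration.

```r
# Hypothetical helpers for illustration; not part of alto or the sourced script.

# Draw n Dirichlet(alpha) vectors by normalizing independent gamma draws.
rdirichlet_sketch <- function(n, alpha) {
  g <- matrix(rgamma(n * length(alpha), shape = alpha), nrow = n, byrow = TRUE)
  g / rowSums(g)
}

# Simulate a samples-by-vocabulary count matrix from topics and memberships.
simulate_lda_sketch <- function(betas, gammas, lambda = 1e4) {
  n_docs <- nrow(gammas)
  # Each sample's word distribution is a mixture of the topics (rows sum to 1).
  word_probs <- gammas %*% betas
  # Assumption: total counts per sample are Poisson with mean lambda.
  doc_lengths <- rpois(n_docs, lambda)
  # Draw one multinomial count vector per sample, then stack them as rows.
  t(vapply(
    seq_len(n_docs),
    function(i) as.numeric(rmultinom(1, doc_lengths[i], word_probs[i, ])),
    numeric(ncol(betas))
  ))
}

set.seed(1)
K_toy <- 3   # number of topics
V_toy <- 20  # vocabulary size
N_toy <- 5   # number of samples
betas_toy <- rdirichlet_sketch(K_toy, rep(0.1, V_toy))
gammas_toy <- rdirichlet_sketch(N_toy, rep(0.5, K_toy))
x_toy <- simulate_lda_sketch(betas_toy, gammas_toy, lambda = 100)
dim(x_toy)
```

The toy sizes are deliberately small so the sketch runs instantly. The `_toy` suffix keeps these objects from overwriting the `betas`, `gammas`, and `x` used in the rest of the vignette.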

We run model fitting and alignment as separate steps so that each can be timed on its own, using the tictoc package. Fitting the LDA models typically accounts for most of the time in an alignment workflow when the sample or vocabulary size is large. In the run shown here, however, each alignment takes longer (about 11 to 13 seconds) than fitting the models (about 7 seconds).

# One parameter list per model, with k = 1, ..., n_models topics, named K1, K2, ...
lda_params <- map(1:n_models, ~ list(k = .))
names(lda_params) <- str_c("K", 1:n_models)

tic()
lda_models <- run_lda_models(x, lda_params, reset = TRUE)
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
toc()
#> 6.955 sec elapsed

tic()
align_topics(lda_models, method = "product")
#> # An alignment: 10 models, 55 topics:
#> # A tibble: 6 × 8
#>   m     m_next     k k_next weight document_mass bw_weight fw_weight
#>   <fct> <fct>  <int>  <int>  <dbl>         <dbl>     <dbl>     <dbl>
#> 1 K1    K2         1      1  0.602         18.1          1     0.602
#> 2 K1    K2         1      2  0.398         11.9          1     0.398
#> 3 K1    K3         1      1  0.396         11.9          1     0.396
#> 4 K1    K3         1      2  0.232          6.96         1     0.232
#> 5 K1    K3         1      3  0.372         11.2          1     0.372
#> 6 K1    K4         1      1  0.164          4.91         1     0.164
#> # ... with 1314 more rows
toc()
#> 13.01 sec elapsed

tic()
align_topics(lda_models, method = "transport")
#> # An alignment: 10 models, 55 topics:
#> # A tibble: 6 × 8
#>   m     m_next     k k_next weight document_mass bw_weight fw_weight
#>   <fct> <fct>  <int>  <int>  <dbl>         <dbl>     <dbl>     <dbl>
#> 1 K1    K2         1      1  0.602         18.1          1     0.602
#> 2 K1    K2         1      2  0.398         11.9          1     0.398
#> 3 K1    K3         1      1  0.396         11.9          1     0.396
#> 4 K1    K3         1      2  0.232          6.96         1     0.232
#> 5 K1    K3         1      3  0.372         11.2          1     0.372
#> 6 K1    K4         1      1  0.262          7.86         1     0.262
#> # ... with 1314 more rows
toc()
#> 11.314 sec elapsed
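
The `tic()` / `toc()` calls above print each timing but do not store it. When the timings need to be compared or tabulated, one option is to capture them as values. The sketch below does this with base R's `system.time()` rather than tictoc. The helper `time_step` and the two stand-in workloads are hypothetical and used only so that the block runs without the vignette's objects.

```r
# Hypothetical helper: evaluate an expression and return elapsed seconds.
time_step <- function(expr) {
  unname(system.time(expr)["elapsed"])
}

# Stand-in workloads; in this vignette they would be replaced by, e.g.,
#   time_step(run_lda_models(x, lda_params, reset = TRUE))
#   time_step(align_topics(lda_models, method = "product"))
timings <- c(
  sort_step = time_step(sort(runif(1e5))),
  crossprod_step = time_step(crossprod(matrix(rnorm(1e4), nrow = 100)))
)

# Collect the results in a data frame for easy comparison.
timings_df <- data.frame(step = names(timings), seconds = unname(timings))
timings_df
```

Because R evaluates function arguments lazily, the expression passed to `time_step` is only evaluated inside `system.time()`, so the reported time covers the full computation. Elapsed times depend on the machine and the parameter settings.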