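This vignette measures the runtime of a few steps in the alignment workflow: fitting the LDA models and aligning their topics with the product and transport methods. Running this vignette with a vocabulary of \(V = 1000\) terms and \(N = 250\) samples gives the estimates reported in the accompanying manuscript.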

# MCMCpack provides rdirichlet(); tictoc (loaded below) provides tic() and toc()
library(MCMCpack)
#> Loading required package: coda
#> Loading required package: MASS
library(alto)
#> 
#> Attaching package: 'alto'
#> The following object is masked from 'package:stats':
#> 
#>     weights
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:MASS':
#> 
#>     select
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)
library(stringr)
library(tictoc)
# simulation helpers from the topic_align repository
source("https://raw.githubusercontent.com/krisrs1128/topic_align/main/simulations/simulation_functions.R")

As in the “Identifying True Topics” vignette, we simulate data from an LDA model: \(K\) topics over a vocabulary of \(V\) terms, mixed in different proportions within each of \(N\) samples.

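# params supplies the vignette parameters used below (K, V, N, n_models)
attach(params)
# Dirichlet concentrations for topics (beta) and topic mixtures (gamma),
# plus the count scale passed to simulate_lda()
lambdas <- list(beta = 0.1, gamma = 0.5, count = 1e4)
betas <- rdirichlet(K, rep(lambdas$beta, V))    # K x V topic-word distributions
gammas <- rdirichlet(N, rep(lambdas$gamma, K))  # N x K sample-topic mixtures
x <- simulate_lda(betas, gammas, lambda = lambdas$count)  # simulated counts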
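simulate_lda() comes from the sourced simulation script, so its internals are not shown here. As a rough guide to the generative process it stands for, the sketch below mixes the topic-word distributions with each sample's topic proportions and then draws multinomial counts. The helper names are hypothetical, and the Poisson document lengths with mean `lambda` are our assumption; the real simulate_lda() may differ. Dirichlet draws are made by normalizing gamma variates, so the block needs only base R.

```r
# Hypothetical sketch of LDA count simulation in base R; not the source of simulate_lda().
rdirichlet_base <- function(n, alpha) {
  # Normalized independent gamma draws are Dirichlet(alpha) distributed
  draws <- matrix(rgamma(n * length(alpha), shape = alpha), nrow = n, byrow = TRUE)
  draws / rowSums(draws)
}

simulate_lda_sketch <- function(betas, gammas, lambda) {
  probs <- gammas %*% betas               # N x V word probabilities per sample
  n_words <- rpois(nrow(gammas), lambda)  # assumption: Poisson document lengths
  counts <- sapply(seq_len(nrow(probs)), function(i) {
    rmultinom(1, size = n_words[i], prob = probs[i, ])
  })
  t(counts)                               # N x V matrix of word counts
}

set.seed(1)
n_topics <- 3; n_terms <- 20; n_docs <- 5
betas_demo <- rdirichlet_base(n_topics, rep(0.1, n_terms))
gammas_demo <- rdirichlet_base(n_docs, rep(0.5, n_topics))
x_demo <- simulate_lda_sketch(betas_demo, gammas_demo, lambda = 100)
dim(x_demo)  # 5 20
```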

We time model fitting and topic alignment separately, using tic() and toc() from the tictoc package. Fitting the LDA models usually accounts for most of the runtime of an alignment workflow, especially when the number of samples or the vocabulary size is large; in the run below, fitting took about 2.7 seconds, while each alignment took under one second.

# one parameter list per model: K1 fits 1 topic, K2 fits 2 topics, ...
lda_params <- map(1:n_models, ~ list(k = .))
names(lda_params) <- str_c("K", 1:n_models)
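run_lda_models() receives a named list with one parameter list per model. For readers less familiar with purrr's formula shorthand, a base R construction of the same structure is sketched below for a hypothetical `n_models_demo` of 3; the `_demo` names are illustrative only.

```r
# Base R construction of the same named list, for a hypothetical n_models of 3
n_models_demo <- 3
lda_params_demo <- setNames(
  lapply(seq_len(n_models_demo), function(k) list(k = k)),  # one list(k = ...) per model
  paste0("K", seq_len(n_models_demo))                      # names K1, K2, K3
)
str(lda_params_demo)
```

Each element is a one-entry list such as `list(k = 2L)`, matching what `map(1:n_models, ~ list(k = .))` produces.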

tic()
# fit one LDA model per entry of lda_params
lda_models <- run_lda_models(x, lda_params, reset = TRUE)
#> Using default value 'VEM' for 'method' LDA parameter.
toc()
#> 2.674 sec elapsed
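A single wall-clock timing like the one above can vary from run to run. A minimal sketch of a steadier measurement uses only base R's system.time() and takes the median elapsed time over a few repetitions. `time_repeated()` is a hypothetical helper, not part of alto or tictoc, and the matrix product is a stand-in workload.

```r
# Hypothetical helper: median wall-clock time of f() over several runs (base R only)
time_repeated <- function(f, times = 5) {
  elapsed <- vapply(seq_len(times), function(i) {
    system.time(f())[["elapsed"]]  # seconds, comparable to toc()'s "sec elapsed"
  }, numeric(1))
  median(elapsed)
}

# Stand-in workload; in this vignette f could be
# function() align_topics(lda_models, method = "product")
set.seed(1)
m <- matrix(runif(250 * 1000), nrow = 250)
t_med <- time_repeated(function() crossprod(m), times = 3)
t_med
```

The median is less sensitive than a single run to one-off delays such as garbage collection, which matters when comparing steps that take around a second.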

tic()
# align topics across all fitted models with the product method
align_topics(lda_models, method = "product")
#> # An alignment: 10 models, 55 topics:
#> # A tibble: 6 × 8
#>   m     m_next     k k_next weight document_mass bw_weight fw_weight
#>   <fct> <fct>  <int>  <int>  <dbl>         <dbl>     <dbl>     <dbl>
#> 1 K1    K2         1      1  0.439         13.2          1     0.439
#> 2 K1    K2         1      2  0.561         16.8          1     0.561
#> 3 K1    K3         1      1  0.266          7.99         1     0.266
#> 4 K1    K3         1      2  0.463         13.9          1     0.463
#> 5 K1    K3         1      3  0.271          8.12         1     0.271
#> 6 K1    K4         1      1  0.248          7.45         1     0.248
#> # ... with 1314 more rows
toc()
#> 0.919 sec elapsed
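The printed columns suggest one way to read a product-style alignment. The sketch below assumes that the document mass linking topic k of one model to topic l of another is the sum over samples of \(\gamma_{ik}\gamma_{il}\); that `weight` divides this mass by the number of samples; and that `fw_weight` and `bw_weight` normalize it over the outgoing edges of k and the incoming edges of l. These are assumptions consistent with the K1 rows above (for example, 13.2 + 16.8 = 30 for K1's only topic), not alto's documented formulas, and `product_alignment_sketch()` is a hypothetical name.

```r
# Hypothetical sketch of product-style alignment weights between two models.
# gamma_a: N x K_a and gamma_b: N x K_b sample-topic matrices (rows sum to 1).
product_alignment_sketch <- function(gamma_a, gamma_b) {
  mass <- crossprod(gamma_a, gamma_b)  # [k, l] = sum over samples of gamma_a[i, k] * gamma_b[i, l]
  list(
    document_mass = mass,
    weight = mass / nrow(gamma_a),                  # assumed: normalized by number of samples
    fw_weight = mass / rowSums(mass),               # share of topic k's mass sent to each l
    bw_weight = sweep(mass, 2, colSums(mass), "/")  # share of topic l's mass coming from each k
  )
}

set.seed(2)
n_docs <- 30
gamma_1 <- matrix(1, nrow = n_docs, ncol = 1)  # one-topic model: every sample in topic 1
draws <- matrix(rgamma(n_docs * 2, shape = 0.5), ncol = 2)
gamma_2 <- draws / rowSums(draws)              # two-topic model
alignment <- product_alignment_sketch(gamma_1, gamma_2)
alignment$weight  # a 1 x 2 matrix whose entries sum to 1
```

With a one-topic source model, every `bw_weight` is 1 and `fw_weight` equals `weight`, the same pattern as the K1 rows printed by align_topics().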

tic()
# align topics across all fitted models with the transport method
align_topics(lda_models, method = "transport")
#> # An alignment: 10 models, 55 topics:
#> # A tibble: 6 × 8
#>   m     m_next     k k_next weight document_mass bw_weight fw_weight
#>   <fct> <fct>  <int>  <int>  <dbl>         <dbl>     <dbl>     <dbl>
#> 1 K1    K2         1      1  0.439         13.2          1     0.439
#> 2 K1    K2         1      2  0.561         16.8          1     0.561
#> 3 K1    K3         1      1  0.266          7.99         1     0.266
#> 4 K1    K3         1      2  0.463         13.9          1     0.463
#> 5 K1    K3         1      3  0.271          8.12         1     0.271
#> 6 K1    K4         1      1  0.248          7.45         1     0.248
#> # ... with 1314 more rows
toc()
#> 0.901 sec elapsed
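The transport method instead treats alignment as moving mass between the topics of two models at minimum cost. The sketch below swaps in entropic optimal transport solved by Sinkhorn iterations, with a Euclidean cost between topic-word vectors and given topic masses as marginals; the solver, the cost, the regularization `epsilon`, and the name `sinkhorn_sketch()` are all assumptions for illustration, not alto's implementation.

```r
# Hypothetical sketch of entropic optimal transport (Sinkhorn iterations), a
# stand-in for alto's transport solver, which may use a different method and cost.
# a, b: nonnegative masses with equal totals; cost: length(a) x length(b) matrix.
sinkhorn_sketch <- function(a, b, cost, epsilon = 0.1, n_iter = 1000) {
  kernel <- exp(-cost / epsilon)
  u <- rep(1, length(a))
  v <- rep(1, length(b))
  for (iter in seq_len(n_iter)) {
    u <- a / as.vector(kernel %*% v)          # rescale rows toward marginal a
    v <- b / as.vector(crossprod(kernel, u))  # rescale columns to marginal b
  }
  sweep(u * kernel, 2, v, "*")                # plan[k, l] = u[k] * kernel[k, l] * v[l]
}

set.seed(3)
n_terms <- 50
beta_a <- matrix(runif(2 * n_terms), nrow = 2)
beta_a <- beta_a / rowSums(beta_a)            # two topics over n_terms words
beta_b <- matrix(runif(3 * n_terms), nrow = 3)
beta_b <- beta_b / rowSums(beta_b)            # three topics over n_terms words
cost <- as.matrix(dist(rbind(beta_a, beta_b)))[1:2, 3:5]  # assumed: Euclidean cost
cost <- cost / max(cost)                      # rescale so exp(-cost / epsilon) stays well above 0
plan <- sinkhorn_sketch(c(0.4, 0.6), c(0.2, 0.3, 0.5), cost)
round(plan, 3)
```

With a single source topic, the target marginal alone fixes the plan (each target receives exactly its own mass), which is consistent with the identical K1 rows printed for the product and transport methods above.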