This vignette measures the runtime of a few steps in the alignment workflow. Running this vignette with \(V = 1000\) and \(N = 250\) gives the estimates reported in the accompanying manuscript.
library(MCMCpack)
#> Loading required package: coda
#> Loading required package: MASS
#> ##
#> ## Markov Chain Monte Carlo Package (MCMCpack)
#> ## Copyright (C) 2003-2024 Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park
#> ##
#> ## Support provided by the U.S. National Science Foundation
#> ## (Grants SES-0350646 and SES-0350613)
#> ##
library(alto)
#>
#> Attaching package: 'alto'
#> The following object is masked from 'package:stats':
#>
#> weights
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:MASS':
#>
#> select
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(purrr)
library(stringr)
library(tictoc)
source("https://raw.githubusercontent.com/krisrs1128/topic_align/main/simulations/simulation_functions.R")
For this simulation, we work with simulated LDA data, as in the “Identifying True Topics” vignette.
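To make the simulation setup concrete, here is a minimal base-R sketch of the LDA generative process. It is not the sourced `simulate_lda()` function: the helper `rdirichlet_sketch()`, the small sizes for `K`, `V`, and `N`, and the Poisson document length are all assumptions made for illustration.

```r
# Hypothetical base-R stand-in for the simulation step; not the sourced simulate_lda().
set.seed(1)
K <- 3; V <- 20; N <- 5   # small illustrative sizes (assumed, not the vignette's)

# Dirichlet draws via normalized gamma variables, one row per draw.
rdirichlet_sketch <- function(n, alpha) {
  g <- matrix(rgamma(n * length(alpha), shape = alpha), nrow = n, byrow = TRUE)
  g / rowSums(g)
}

betas  <- rdirichlet_sketch(K, rep(0.1, V))   # K x V topic-word distributions
gammas <- rdirichlet_sketch(N, rep(0.5, K))   # N x K document-topic memberships

# Each document's word probabilities are a mixture of the topics.
word_probs <- gammas %*% betas                # N x V, rows sum to 1

# Draw word counts per document; the Poisson document length is an assumption.
x <- t(apply(word_probs, 1, function(p) {
  rmultinom(1, size = rpois(1, 1e4), prob = p)
}))
dim(x)
```

The resulting `x` is a documents-by-words count matrix, which is the shape of input that the model-fitting step below expects.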
attach(params)
lambdas <- list(beta = 0.1, gamma = .5, count = 1e4)
betas <- rdirichlet(K, rep(lambdas$beta, V))
gammas <- rdirichlet(N, rep(lambdas$gamma, K))
x <- simulate_lda(betas, gammas, lambda = lambdas$count)
We run the LDA models and the alignment as separate steps so that we can time each one on its own, using the tictoc package. In general, fitting the LDA models accounts for most of the time in an alignment workflow, especially when the sample or vocabulary size is large.
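The same step-by-step timing can also be sketched with base R's `system.time()` in place of tictoc, which is convenient when the elapsed times should be collected into a table. The helper `time_steps()` and the toy `fit` and `align` functions are hypothetical stand-ins, not part of alto or the vignette.

```r
# Hypothetical helper: time each zero-argument function with system.time().
time_steps <- function(steps) {
  elapsed <- vapply(
    steps,
    function(f) unname(system.time(f())["elapsed"]),
    numeric(1)
  )
  data.frame(step = names(steps), seconds = elapsed, row.names = NULL)
}

# Toy stand-ins for an expensive and a cheap stage (not the real LDA or alignment).
timings <- time_steps(list(
  fit   = function() for (i in 1:20) svd(matrix(rnorm(1e4), 100)),
  align = function() crossprod(matrix(rnorm(1e4), 100))
))
timings
```

Wrapping each stage in a function keeps the timed code separate from the bookkeeping, so stages can be added or reordered without touching the timer.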
lda_params <- map(1:n_models, ~ list(k = .))
names(lda_params) <- str_c("K", 1:n_models)
tic()
lda_models <- run_lda_models(x, lda_params, reset = TRUE)
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
toc()
#> 2.674 sec elapsed
tic()
align_topics(lda_models, method = "product")
#> # An alignment: 10 models, 55 topics:
#> # A tibble: 6 × 8
#> m m_next k k_next weight document_mass bw_weight fw_weight
#> <fct> <fct> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 K1 K2 1 1 0.439 13.2 1 0.439
#> 2 K1 K2 1 2 0.561 16.8 1 0.561
#> 3 K1 K3 1 1 0.266 7.99 1 0.266
#> 4 K1 K3 1 2 0.463 13.9 1 0.463
#> 5 K1 K3 1 3 0.271 8.12 1 0.271
#> 6 K1 K4 1 1 0.248 7.45 1 0.248
#> # ... with 1314 more rows
toc()
#> 0.919 sec elapsed
tic()
align_topics(lda_models, method = "transport")
#> # An alignment: 10 models, 55 topics:
#> # A tibble: 6 × 8
#> m m_next k k_next weight document_mass bw_weight fw_weight
#> <fct> <fct> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 K1 K2 1 1 0.439 13.2 1 0.439
#> 2 K1 K2 1 2 0.561 16.8 1 0.561
#> 3 K1 K3 1 1 0.266 7.99 1 0.266
#> 4 K1 K3 1 2 0.463 13.9 1 0.463
#> 5 K1 K3 1 3 0.271 8.12 1 0.271
#> 6 K1 K4 1 1 0.248 7.45 1 0.248
#> # ... with 1314 more rows
toc()
#> 0.901 sec elapsed
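As a quick check on how the time is divided in this run, the elapsed values printed above can be compared directly. The snippet below only does arithmetic on those three numbers. Treating a workflow as one LDA fitting step followed by a single alignment method is our assumption.

```r
# Elapsed times (seconds) copied from the toc() output above.
elapsed <- c(lda = 2.674, product = 0.919, transport = 0.901)

# Share of a fit-then-align workflow spent fitting the LDA models,
# computed separately for each alignment method.
align <- elapsed[c("product", "transport")]
lda_share <- elapsed[["lda"]] / (elapsed[["lda"]] + align)
round(lda_share, 2)
```

For either alignment method, LDA fitting takes roughly three quarters of the combined time in this small run.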