timing.Rmd
This vignette measures the runtime of a few steps in the alignment workflow. Running this vignette with \(V = 1000\) and \(N = 250\) gives the estimates reported in the accompanying manuscript.
library(MCMCpack)
#> Loading required package: coda
#> Loading required package: MASS
#> ##
#> ## Markov Chain Monte Carlo Package (MCMCpack)
#> ## Copyright (C) 2003-2023 Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park
#> ##
#> ## Support provided by the U.S. National Science Foundation
#> ## (Grants SES-0350646 and SES-0350613)
#> ##
library(alto)
#>
#> Attaching package: 'alto'
#> The following object is masked from 'package:stats':
#>
#> weights
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:MASS':
#>
#> select
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(purrr)
library(stringr)
library(tictoc)
source("https://raw.githubusercontent.com/krisrs1128/topic_align/main/simulations/simulation_functions.R")
For this simulation, we work with simulated LDA data, as in the “Identifying True Topics” vignette.
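# K, V, N, and n_models are not defined in this file; they are expected to come from the vignette's `params`
attach(params)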
lambdas <- list(beta = 0.1, gamma = .5, count = 1e4)
betas <- rdirichlet(K, rep(lambdas$beta, V))
gammas <- rdirichlet(N, rep(lambdas$gamma, K))
x <- simulate_lda(betas, gammas, lambda = lambdas$count)
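For readers who want to see what this simulation step does without the sourced helper file, here is a minimal base-R sketch of the standard LDA generative process. The names `rdirichlet_sketch` and `simulate_lda_sketch` are hypothetical, and the sourced `simulate_lda` may differ in its details. The sizes used here are small illustrative values, not the vignette's parameters.

```r
# Hypothetical stand-in for MCMCpack::rdirichlet: normalize independent gamma draws.
rdirichlet_sketch <- function(n, alpha) {
  # Column j of g holds n draws with shape alpha[j] (matrix fills column-wise).
  g <- matrix(rgamma(n * length(alpha), shape = rep(alpha, each = n)), nrow = n)
  g / rowSums(g)
}

# Hypothetical sketch of an LDA simulator (an assumption, not the sourced simulate_lda).
simulate_lda_sketch <- function(betas, gammas, lambda) {
  n_docs <- nrow(gammas)
  n_words <- rpois(n_docs, lambda)   # document lengths
  probs <- gammas %*% betas          # per-document word probabilities (N x V)
  x <- vapply(
    seq_len(n_docs),
    function(i) rmultinom(1, n_words[i], probs[i, ])[, 1],
    integer(ncol(betas))
  )
  t(x)                               # documents in rows, words in columns
}

set.seed(1)
K_demo <- 3; V_demo <- 20; N_demo <- 5   # illustrative sizes only
betas_demo <- rdirichlet_sketch(K_demo, rep(0.1, V_demo))
gammas_demo <- rdirichlet_sketch(N_demo, rep(0.5, K_demo))
x_demo <- simulate_lda_sketch(betas_demo, gammas_demo, lambda = 1e4)
dim(x_demo)  # 5 documents by 20 words
```

Each row of `betas_demo` is one topic's distribution over words and each row of `gammas_demo` is one document's mixture over topics, so their product gives the word probabilities from which counts are drawn.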
We run model fitting and alignment separately so that their computation times can be measured independently, using the tictoc package. In general, fitting the LDA models consumes the majority of the time in an alignment workflow, especially when the sample or vocabulary size is large. In the small run shown below, however, each alignment step takes longer than fitting the models.
lda_params <- map(1:n_models, ~ list(k = .))
names(lda_params) <- str_c("K", 1:n_models)
tic()
lda_models <- run_lda_models(x, lda_params, reset = TRUE)
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
#> Using default value 'VEM' for 'method' LDA parameter.
toc()
#> 6.955 sec elapsed
tic()
align_topics(lda_models, method = "product")
#> # An alignment: 10 models, 55 topics:
#> # A tibble: 6 × 8
#> m m_next k k_next weight document_mass bw_weight fw_weight
#> <fct> <fct> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 K1 K2 1 1 0.602 18.1 1 0.602
#> 2 K1 K2 1 2 0.398 11.9 1 0.398
#> 3 K1 K3 1 1 0.396 11.9 1 0.396
#> 4 K1 K3 1 2 0.232 6.96 1 0.232
#> 5 K1 K3 1 3 0.372 11.2 1 0.372
#> 6 K1 K4 1 1 0.164 4.91 1 0.164
#> # ... with 1314 more rows
toc()
#> 13.01 sec elapsed
tic()
align_topics(lda_models, method = "transport")
#> # An alignment: 10 models, 55 topics:
#> # A tibble: 6 × 8
#> m m_next k k_next weight document_mass bw_weight fw_weight
#> <fct> <fct> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 K1 K2 1 1 0.602 18.1 1 0.602
#> 2 K1 K2 1 2 0.398 11.9 1 0.398
#> 3 K1 K3 1 1 0.396 11.9 1 0.396
#> 4 K1 K3 1 2 0.232 6.96 1 0.232
#> 5 K1 K3 1 3 0.372 11.2 1 0.372
#> 6 K1 K4 1 1 0.262 7.86 1 0.262
#> # ... with 1314 more rows
toc()
#> 11.314 sec elapsed
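The `tic()` / `toc()` pairs above print elapsed times to the console. To keep the timings as numbers, for example to assemble a table, one option is base R's `system.time()`. The sketch below swaps in that technique in place of tictoc. The helper `time_step` is a hypothetical name, and the `Sys.sleep()` calls are placeholders for the model-fitting and alignment steps.

```r
# Hypothetical helper: evaluate an expression and return its elapsed wall-clock seconds.
time_step <- function(expr) {
  system.time(expr)[["elapsed"]]
}

# Placeholder workloads stand in for run_lda_models() and align_topics().
timings <- data.frame(
  step = c("placeholder_short", "placeholder_long"),
  seconds = c(time_step(Sys.sleep(0.1)), time_step(Sys.sleep(0.2)))
)
timings
```

In the vignette's setting, the placeholders would be replaced by calls such as `time_step(align_topics(lda_models, method = "product"))`.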