VIBRANT Trial - Primary Analyses

Author

Laura Symul (UCLouvain), Laura Vermeren (UCLouvain), Susan Holmes (Stanford University)

Published

January 9, 2026

Code
library(tidyverse)
library(magrittr)
library(gt)
library(patchwork)
library(SummarizedExperiment)
library(tidySummarizedExperiment)
library(brms)
library(gtsummary)
library(labelled)
library(scales)
# library(gridExtra)
library(ggpubr)
library(ggh4x)
library(ggtext)
# library(mia) # BiocManager::install("mia")

#theme_set(theme_light())
tmp <- fs::dir_map("R", source)
rm(tmp)
Code
tmp <- fs::dir_map("../VIBRANT-99-utils/R/", source)
rm(tmp)

This document details the analyses and provides results for the primary analyses of the VIBRANT trial.

Trial data

The data used for these analyses are stored in an R Bioconductor MultiAssayExperiment object which contains the QCed and pre-processed data. The code and description of how raw data have been processed and QCed is available on the VIBRANT-0-QC-and-preprocessing GitHub repository.

Code
# Loading MAE_1
mae_file <- 
  fs::dir_ls(str_c(data_dir(), "06 subsetted MAEs/"), regexp = "mae_1_.*\\.rds$") |> 
  sort(decreasing = TRUE) |>
  magrittr::extract(1)

cat(str_c("Loading the RDS file containing the unblinded VIBRANT MAE: /", mae_file |> str_remove(data_dir()), "\n"))
Loading the RDS file containing the unblinded VIBRANT MAE: /06 subsetted MAEs/mae_1_20260109.rds
Code
mae <- readRDS(mae_file)

rm(mae_file)

mae
A MultiAssayExperiment object of 4 listed
 experiments with user-defined names and respective classes.
 Containing an ExperimentList class object of length 4:
 [1] col_LBP_mg: SummarizedExperiment with 2 rows and 890 columns
 [2] col_LBP_qPCR: SummarizedExperiment with 4 rows and 895 columns
 [3] col_crispatus_mg: SummarizedExperiment with 2 rows and 890 columns
 [4] mg: SummarizedExperiment with 779 rows and 893 columns
Functionality:
 experiments() - obtain the ExperimentList instance
 colData() - the primary/phenotype DataFrame
 sampleMap() - the sample coordination DataFrame
 `$`, `[`, `[[` - extract colData columns, subset, or experiment
 *Format() - convert into a long or wide DataFrame
 assays() - convert ExperimentList to a SimpleList of matrices
 exportClass() - save data to flat files

Demographics (Table 1 & suppl. tables by sites)

Summary table

The demographics table is constructed as follows.

We first create a table1_data data.frame from the participant_crfs_merged table stored in the @metadata slot of the mae object.

This table contains most of the variables required for the demographics summary, including: site, age, race, ethn, cut_size_meals, eat_less, hungry_did_not_eat, sexual_partners_lifetime, sexual_partners_past_month and contraception_method.

Additional variables such as arm and study population flags are retrieved from mae@colData. For this analysis, we restrict the dataset to participants in the mITT population.

From the food-related variables (cut_size_meals, eat_less, and hungry_did_not_eat), we derive a new variable food_insecurity. A participant is considered food insecure if she answered “Yes” to any of the three questions.

Code
table1_data <- 
  metadata(mae)$participant_crfs_merged |> 
  dplyr::left_join(
    colData(mae) |> as_tibble() |> select(pid, site, arm, ITT, mITT, PP) |> distinct(), 
    by = join_by(pid, site)
    )  |> 
  filter(mITT) |> 
  select(pid, site, arm, age, race, ethn, cut_size_meals, eat_less, hungry_did_not_eat, sexual_partners_lifetime, sexual_partners_past_month, contraception_method) |> 
  mutate(
    food_insecurity = ifelse(
      cut_size_meals == "Yes" | eat_less == "Yes" | hungry_did_not_eat == "Yes",
      "Yes",
      "No"
    ),
    food_insecurity = factor(food_insecurity, levels = c("No", "Yes"))
  ) |> 
  select(-c(eat_less, cut_size_meals, hungry_did_not_eat))
Code
table1_data |> 
  ggplot(aes(x = food_insecurity)) +
  geom_bar() +
  labs(x= "Food insecurity", y = "Count")+
  theme_classic()

In addition to these variables, we also include information about about the gender of participants’ sexual partners.

Participants were asked about their partner’s gender at each weekly visit (weeks 0 through 12: visits 1000, 1100, 1200, 1300, 1400, 1500, 1700, 1900, and 2120) using Case Report Form (CRF) 19. Responses are recorded in the past_week_sex_partner column of the mae@metadata$visits_crfs_merged table.

Based on participants’ answers across all weekly visits, we categorized their reported partner gender into one of four groups:

  • No sex
  • Only male
  • Only female
  • Both

This derived variable is included in the table1_data table.

Code
gender_sexual_partner <- 
  metadata(mae)$visits_crfs_merged |>
  select(pid, past_week_sex_partner) |>
  filter(!is.na(past_week_sex_partner)) |>
  group_by(pid) |>
  summarise(
    gender_sexual_partner = 
      case_when(
        all(past_week_sex_partner == "No sex in the past week") ~ "No sex",
        any(past_week_sex_partner %in% c("A single male partner", "Multiple male partners")) &
          !any(past_week_sex_partner %in% c("A single female partner", "Multiple female partners")) ~ "Only man",
        any(past_week_sex_partner %in% c("A single female partner", "Multiple female partners")) &
          !any(past_week_sex_partner %in% c("A single male partner", "Multiple male partners")) ~ "Only female",
        any(past_week_sex_partner %in% c("A single male partner", "Multiple male partners")) &
          any(past_week_sex_partner %in% c("A single female partner", "Multiple female partners")) ~ "Both",
        TRUE ~ NA_character_
      ),
    .groups = "drop"
  )

table1_data <- 
  table1_data |> 
  left_join(gender_sexual_partner, by = "pid")

We also recoded the race variable to simplify and harmonize categories:

  • “Asian or South Asian” is re-coded to “Asian” (for brevity)
  • “Black, African American, or African” and “African” are re-coded to “Black/African” (for brevity)
  • “Coloured”, “White”, “Other”, and “Prefered not to answer” are left as is.
Code
table1_data <-
  table1_data |>
  
  mutate(
    across(where(is.character), as.factor),
    
    race = fct_recode(
      race,
      "Asian" = "Asian or South Asian",
      "Black/African" = "Black, African American, or African",
      "Black/African" = "African",
      "Coloured" = "Coloured",
      "White" = "White",
      "Other" = "Other",
      "Prefer not to answer" = "Prefer not to answer"
    ),
    
    race = fct_relevel(race, "Asian", "Black/African", "Coloured", "White", "Other", "Prefer not to answer"),
    race = race |> fct_drop(),
    arm = arm |> fct_drop()
  ) |>
  
  set_variable_labels(
    food_insecurity = "Report food insecurity in past 12 months ",
    sexual_partners_past_month = "Number of partners in past month",
    sexual_partners_lifetime = "Number of partners in lifetime",
    ethn = "Ethnicity", 
    contraception_method = "Contraception", 
    gender_sexual_partner = "Gender of sexual partners"
  )

Table 1 (Population characteristics by arm)

Table 1 summarizes baseline demographic and behavioral characteristics of the participants described above, stratified by study arm and restricted to the mITT population.. Continuous variables are presented as median and range (minimum–maximum), while categorical variables are shown as counts and percentages.

Comparisons across study arms are performed using the Kruskal-Wallis rank sum test for continuous variables. For categorical variables, either Fisher’s exact test or Pearson’s Chi-squared test is used, depending on cell counts and expected frequencies.

Code
table1_data |> 
  select(-pid) |>
  rename_with(~ str_to_title(.x)) |>  
  tbl_summary(
    by = Arm,
    type =
      list(
        Sexual_partners_past_month ~ "continuous",
        Sexual_partners_lifetime ~ "continuous"
      ),
    statistic = list(
      all_continuous() ~ "{median} ({min}-{max})",
      all_categorical() ~ "{n} ({p}%)"
    )
  )  |> 
  add_p(include = -Site)
Characteristic Pl
N = 191
LC106-7
N = 211
LC106-3
N = 151
LC106-o
N = 151
LC115
N = 201
p-value2
Site





    CAP 14 (74%) 14 (67%) 14 (93%) 14 (93%) 14 (70%)
    MGH 5 (26%) 7 (33%) 1 (6.7%) 1 (6.7%) 6 (30%)
Age 29 (20-38) 29 (18-40) 26 (21-34) 25 (19-35) 30 (20-40) 0.5
Race




0.5
    Asian 1 (5.3%) 0 (0%) 1 (6.7%) 0 (0%) 0 (0%)
    Black/African 14 (74%) 18 (86%) 14 (93%) 15 (100%) 18 (90%)
    White 2 (11%) 3 (14%) 0 (0%) 0 (0%) 1 (5.0%)
    Other 1 (5.3%) 0 (0%) 0 (0%) 0 (0%) 1 (5.0%)
    Prefer not to answer 1 (5.3%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Ethnicity




0.8
    Not hispanic/latino/spanish 3 (60%) 6 (86%) 1 (100%) 1 (100%) 4 (67%)
    hispanic/latino/spanish 2 (40%) 1 (14%) 0 (0%) 0 (0%) 2 (33%)
    Unknown 14 14 14 14 14
Number of partners in lifetime 5 (1-10) 5 (1-90) 4 (1-10) 4 (1-20) 5 (1-10) 0.8
Number of partners in past month 1 (1-2) 1 (0-2) 1 (0-2) 1 (0-3) 1 (0-2) 0.4
Contraception




>0.9
    Continuous combined oral contraceptive pills (skip placebo) 3 (16%) 5 (24%) 1 (6.7%) 1 (6.7%) 3 (15%)
    Contraceptive vaginal ring 0 (0%) 1 (4.8%) 0 (0%) 0 (0%) 0 (0%)
    Cyclic combined oral contraceptive pills 0 (0%) 0 (0%) 0 (0%) 0 (0%) 1 (5.0%)
    Depo provera 11 (58%) 10 (48%) 12 (80%) 11 (73%) 11 (55%)
    Hormonal implant 1 (5.3%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Hormone containing IUD 2 (11%) 2 (9.5%) 0 (0%) 0 (0%) 2 (10%)
    Net-EN 2 (11%) 2 (9.5%) 2 (13%) 3 (20%) 2 (10%)
    Other 0 (0%) 1 (4.8%) 0 (0%) 0 (0%) 1 (5.0%)
Report food insecurity in past 12 months 7 (37%) 7 (33%) 6 (40%) 5 (33%) 6 (30%) >0.9
Gender of sexual partners




0.8
    No sex 1 (5.3%) 3 (14%) 1 (6.7%) 1 (6.7%) 3 (15%)
    Only man 18 (95%) 18 (86%) 14 (93%) 14 (93%) 17 (85%)
1 n (%); Median (Min-Max)
2 Kruskal-Wallis rank sum test; Fisher’s exact test; Pearson’s Chi-squared test

Study population characteristics appear balanced across arms.

Population characteristics by site

An additional version of Table 1 is generated using the same variables and summary methods, but stratified by study site instead of study arm.

Code
table1_data |> 
  select(arm, age, race, site, food_insecurity, contraception_method, sexual_partners_past_month, sexual_partners_lifetime) |>
  rename_with(~ str_to_title(.x)) |>  
  tbl_summary(
    by = Site,
    type =
      list(
        Sexual_partners_past_month ~ "continuous",
        Sexual_partners_lifetime ~ "continuous"
      ),
    statistic = list(
      all_continuous() ~ "{median} ({min}-{max})",
      all_categorical() ~ "{n} ({p}%)"
    )
  )  |> 
  add_p()
Characteristic CAP
N = 701
MGH
N = 201
p-value2
Arm

0.2
    Pl 14 (20%) 5 (25%)
    LC106-7 14 (20%) 7 (35%)
    LC106-3 14 (20%) 1 (5.0%)
    LC106-o 14 (20%) 1 (5.0%)
    LC115 14 (20%) 6 (30%)
Age 26 (18-37) 33 (22-40) <0.001
Race

<0.001
    Asian 0 (0%) 2 (10%)
    Black/African 70 (100%) 9 (45%)
    White 0 (0%) 6 (30%)
    Other 0 (0%) 2 (10%)
    Prefer not to answer 0 (0%) 1 (5.0%)
Report food insecurity in past 12 months 28 (40%) 3 (15%) 0.038
Contraception

<0.001
    Continuous combined oral contraceptive pills (skip placebo) 5 (7.1%) 8 (40%)
    Contraceptive vaginal ring 0 (0%) 1 (5.0%)
    Cyclic combined oral contraceptive pills 0 (0%) 1 (5.0%)
    Depo provera 54 (77%) 1 (5.0%)
    Hormonal implant 0 (0%) 1 (5.0%)
    Hormone containing IUD 0 (0%) 6 (30%)
    Net-EN 11 (16%) 0 (0%)
    Other 0 (0%) 2 (10%)
Number of partners in past month 1 (0-3) 1 (0-2) 0.4
Number of partners in lifetime 4 (1-20) 10 (4-90) <0.001
1 n (%); Median (Min-Max)
2 Fisher’s exact test; Wilcoxon rank sum test; Pearson’s Chi-squared test

There are, unsurprisingly, striking differences in study population characteristics by site.

Population characteristics by arm, split by site

Separate versions of Table 1 are also generated for each study site, with participant characteristics summarized by study arm within each site.

Code
table1_data |> 
  filter(site == "CAP") |> 
  select(-c(pid, site)) |>
  rename_with(~ str_to_title(.x)) |>  
  tbl_summary(
    by = Arm,
    type =
      list(
        Sexual_partners_past_month ~ "continuous",
        Sexual_partners_lifetime ~ "continuous"
      ),
    statistic = list(
      all_continuous() ~ "{median} ({min}-{max})",
      all_categorical() ~ "{n} ({p}%)"
    )
  ) |> 
  add_p()|>
  modify_caption("**Site = CAP**")
Site = CAP
Characteristic Pl
N = 141
LC106-7
N = 141
LC106-3
N = 141
LC106-o
N = 141
LC115
N = 141
p-value2
Age 26 (20-36) 26 (18-36) 26 (21-34) 25 (19-35) 25 (20-37) >0.9
Race




>0.9
    Asian 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Black/African 14 (100%) 14 (100%) 14 (100%) 14 (100%) 14 (100%)
    White 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Other 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Prefer not to answer 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Ethnicity





    Not hispanic/latino/spanish 0 (NA%) 0 (NA%) 0 (NA%) 0 (NA%) 0 (NA%)
    hispanic/latino/spanish 0 (NA%) 0 (NA%) 0 (NA%) 0 (NA%) 0 (NA%)
    Unknown 14 14 14 14 14
Number of partners in lifetime 4 (1-10) 3 (1-7) 4 (1-10) 4 (1-20) 4 (1-9) 0.8
Number of partners in past month 1 (1-2) 1 (1-2) 1 (0-2) 1 (0-3) 1 (1-2) 0.2
Contraception




0.9
    Continuous combined oral contraceptive pills (skip placebo) 1 (7.1%) 2 (14%) 0 (0%) 0 (0%) 2 (14%)
    Contraceptive vaginal ring 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Cyclic combined oral contraceptive pills 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Depo provera 11 (79%) 10 (71%) 12 (86%) 11 (79%) 10 (71%)
    Hormonal implant 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Hormone containing IUD 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Net-EN 2 (14%) 2 (14%) 2 (14%) 3 (21%) 2 (14%)
    Other 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Report food insecurity in past 12 months 7 (50%) 5 (36%) 6 (43%) 5 (36%) 5 (36%) >0.9
Gender of sexual partners




>0.9
    No sex 1 (7.1%) 2 (14%) 1 (7.1%) 1 (7.1%) 2 (14%)
    Only man 13 (93%) 12 (86%) 13 (93%) 13 (93%) 12 (86%)
1 Median (Min-Max); n (%)
2 Kruskal-Wallis rank sum test; Fisher’s exact test; NA; Pearson’s Chi-squared test
Code
table1_data |> 
  filter(site == "MGH") |> 
  select(-c(pid, site)) |>
  rename_with(~ str_to_title(.x)) |>  
  tbl_summary(
    by = Arm,
    type =
      list(
        Sexual_partners_past_month ~ "continuous",
        Sexual_partners_lifetime ~ "continuous"
      ),
    statistic = list(
      all_continuous() ~ "{median} ({min}-{max})",
      all_categorical() ~ "{n} ({p}%)"
    )
  ) |> 
  add_p()|>
  modify_caption("**Site = MGH**")
Site = MGH
Characteristic Pl
N = 51
LC106-7
N = 71
LC106-3
N = 11
LC106-o
N = 11
LC115
N = 61
p-value2
Age 32 (26-38) 33 (24-40) 25 (25-25) 35 (35-35) 33 (22-40) 0.7
Race




0.13
    Asian 1 (20%) 0 (0%) 1 (100%) 0 (0%) 0 (0%)
    Black/African 0 (0%) 4 (57%) 0 (0%) 1 (100%) 4 (67%)
    White 2 (40%) 3 (43%) 0 (0%) 0 (0%) 1 (17%)
    Other 1 (20%) 0 (0%) 0 (0%) 0 (0%) 1 (17%)
    Prefer not to answer 1 (20%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Ethnicity




0.8
    Not hispanic/latino/spanish 3 (60%) 6 (86%) 1 (100%) 1 (100%) 4 (67%)
    hispanic/latino/spanish 2 (40%) 1 (14%) 0 (0%) 0 (0%) 2 (33%)
Number of partners in lifetime 9 (7-10) 15 (6-90) 7 (7-7) 10 (10-10) 6 (4-10) 0.059
Number of partners in past month 1 (1-2) 1 (0-2) 1 (1-1) 1 (1-1) 1 (0-2) >0.9
Contraception




>0.9
    Continuous combined oral contraceptive pills (skip placebo) 2 (40%) 3 (43%) 1 (100%) 1 (100%) 1 (17%)
    Contraceptive vaginal ring 0 (0%) 1 (14%) 0 (0%) 0 (0%) 0 (0%)
    Cyclic combined oral contraceptive pills 0 (0%) 0 (0%) 0 (0%) 0 (0%) 1 (17%)
    Depo provera 0 (0%) 0 (0%) 0 (0%) 0 (0%) 1 (17%)
    Hormonal implant 1 (20%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Hormone containing IUD 2 (40%) 2 (29%) 0 (0%) 0 (0%) 2 (33%)
    Net-EN 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Other 0 (0%) 1 (14%) 0 (0%) 0 (0%) 1 (17%)
Report food insecurity in past 12 months 0 (0%) 2 (29%) 0 (0%) 0 (0%) 1 (17%) 0.8
Gender of sexual partners




>0.9
    No sex 0 (0%) 1 (14%) 0 (0%) 0 (0%) 1 (17%)
    Only man 5 (100%) 6 (86%) 1 (100%) 1 (100%) 5 (83%)
1 Median (Min-Max); n (%)
2 Kruskal-Wallis rank sum test; Fisher’s exact test

Study population characteristics appear balanced across arms at the CAPRISA site. Given the low number of participants at the MGH site in two of the five arms, we create an additional table excluding these two arms.

Code
table1_data |> 
  filter(site == "MGH", !(arm %in% c("LC106-o", "LC106-3"))) |> 
  mutate(arm = arm |> fct_drop()) |> 
  select(-c(pid, site)) |>
  rename_with(~ str_to_title(.x)) |>  
  tbl_summary(
    by = Arm,
    type =
      list(
        Sexual_partners_past_month ~ "continuous",
        Sexual_partners_lifetime ~ "continuous"
      ),
    statistic = list(
      all_continuous() ~ "{median} ({min}-{max})",
      all_categorical() ~ "{n} ({p}%)"
    )
  ) |> 
  add_p()|>
  modify_caption("**Site = MGH**")
Site = MGH
Characteristic Pl
N = 51
LC106-7
N = 71
LC115
N = 61
p-value2
Age 32 (26-38) 33 (24-40) 33 (22-40) >0.9
Race


0.14
    Asian 1 (20%) 0 (0%) 0 (0%)
    Black/African 0 (0%) 4 (57%) 4 (67%)
    White 2 (40%) 3 (43%) 1 (17%)
    Other 1 (20%) 0 (0%) 1 (17%)
    Prefer not to answer 1 (20%) 0 (0%) 0 (0%)
Ethnicity


0.7
    Not hispanic/latino/spanish 3 (60%) 6 (86%) 4 (67%)
    hispanic/latino/spanish 2 (40%) 1 (14%) 2 (33%)
Number of partners in lifetime 9 (7-10) 15 (6-90) 6 (4-10) 0.017
Number of partners in past month 1 (1-2) 1 (0-2) 1 (0-2) 0.8
Contraception


>0.9
    Continuous combined oral contraceptive pills (skip placebo) 2 (40%) 3 (43%) 1 (17%)
    Contraceptive vaginal ring 0 (0%) 1 (14%) 0 (0%)
    Cyclic combined oral contraceptive pills 0 (0%) 0 (0%) 1 (17%)
    Depo provera 0 (0%) 0 (0%) 1 (17%)
    Hormonal implant 1 (20%) 0 (0%) 0 (0%)
    Hormone containing IUD 2 (40%) 2 (29%) 2 (33%)
    Net-EN 0 (0%) 0 (0%) 0 (0%)
    Other 0 (0%) 1 (14%) 1 (17%)
Report food insecurity in past 12 months 0 (0%) 2 (29%) 1 (17%) 0.7
Gender of sexual partners


>0.9
    No sex 0 (0%) 1 (14%) 1 (17%)
    Only man 5 (100%) 6 (86%) 5 (83%)
1 Median (Min-Max); n (%)
2 Kruskal-Wallis rank sum test; Fisher’s exact test

Besides the number of lifetime sexual partners, arms appear balanced across arm at MGH.

Primary outcomes (Table 2) & sensitivity analyses

LBP colonization by week 5

The primary outcome is defined as “Colonization by any of the LBP strains after product administration by week 5”, and colonization by any of the LBP strain is defined as a relative abundance of 5% by any single strain or 10% total relative abundance of the LBP strains. The relative abundance of LBP strain is estimated using metagenomics (taxonomic composition estimated using VIRGO2 and strain proportion of total L. crispatus is estimated using kSanity), and includes samples from weekly visits 1200 to 1500 (week 2 (post-INT) to week 5 (3 weeks post-INT)).

The computation itself has been done at the QC/Pre-processing stage (see the VIBRANT-0-QC-and-preprocessing GitHub repository).

Code
col_by_week5 <- 
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_mg"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "colonized_LBP_by_mg")
  ) |> 
  mutate(LBP_colonization_by_week5 = outcome |> replace_na(FALSE)) 
Code
summary_table <- 
  col_by_week5 |> 
  group_by(site, arm) |> 
  summarize(
    n = n(), 
    n_success = sum(LBP_colonization_by_week5),
    .groups = "drop"
  ) |> 
  mutate(
    p = n_success / n,
    CI = binom::binom.confint(n_success, n, method = "wilson") #exact
  )
Code
table_2 <- 
  summary_table |> 
  mutate(
    N_per_site_and_arm = str_c("N = ", n),
    LBP_strain_detected = str_c(n_success, " (", round(p * 100)," %)"),
    CI_text = str_c(round(CI$lower * 100), "% - ", round(CI$upper * 100), "%")
  ) |> 
  dplyr::select(
    site, arm, N_per_site_and_arm, LBP_strain_detected, CI_text
  ) |> 
  pivot_longer(-c(site, arm)) |>
  mutate(
    name = 
      case_when(
        name == "N_per_site_and_arm" ~ "N participants",
        name == "LBP_strain_detected" ~ "n (%) participants\nwith LBP strain detected",
        name == "CI_text" ~ "95% CI"
      )
  ) |> 
  pivot_wider(
    id_cols = c(site, name), 
    names_from = arm,
    values_from = value, values_fill = ""
    )
Code
table_2 |> 
  group_by(site) |> 
  gt(caption = "Table 1: Colonization by week 5 by arm and site", row_group_as_column = TRUE) |> 
  cols_width(
    "name" ~ px(200),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    # Blinded = "All blinded arms",
    Pl = "Placebo",
    `LC106-7` = "LC106<br>7 days",
    `LC106-3` = "LC106<br>3 days",
    `LC106-o` = "LC106<br>overlap",
    `LC115` = "LC115<br>7 days",
    .fn = md
  )
Table 1: Colonization by week 5 by arm and site
Placebo LC106
7 days
LC106
3 days
LC106
overlap
LC115
7 days
CAP N participants N = 14 N = 14 N = 14 N = 14 N = 14
n (%) participants with LBP strain detected 0 (0 %) 9 (64 %) 7 (50 %) 7 (50 %) 10 (71 %)
95% CI 0% - 22% 39% - 84% 27% - 73% 27% - 73% 45% - 88%
MGH N participants N = 5 N = 7 N = 1 N = 1 N = 6
n (%) participants with LBP strain detected 1 (20 %) 6 (86 %) 1 (100 %) 1 (100 %) 6 (100 %)
95% CI 4% - 62% 49% - 97% 21% - 100% 21% - 100% 61% - 100%

Visualization of the primary outcome

Code
col_by_week5 |> 
  arrange(site, arm, LBP_colonization_by_week5) |>
  group_by(site, arm) |> 
  mutate(participant_number = row_number() |> factor()) |> 
  ungroup() |> 
  ggplot(aes(x = participant_number, y = site |> fct_rev(), col = site, fill = site, shape = LBP_colonization_by_week5)) +
  geom_point(size = 3) +
  facet_grid(. ~ arm, scales = "free_x", space = "free_x") +
  guides(color = "none", fill = "none") +
  scale_shape_manual("Colonized by LBP by week 5", values = c(1, 16)) +
  scale_color_manual(values = site_colors) + 
  xlab("Participant number by arm and site") + ylab("") +
  theme(
    legend.position = "top",
    legend.text = element_text(hjust = 0.5),
    legend.title = element_text(hjust = 0.5),
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)
  ) +
  guides(shape = guide_legend(ncol = 2))

Test for differences between the placebo and each active arm

We test for differences between the placebo arm and each active arm for the primary outcome using a one-sided unconditional exact test.

Data from both site are merged for each arm given the low numbers of participants at MGH.

The \(p\)-values are adjusted for multiple testing (multiple active arms against the Placebo) using the Benjamini-Hochberg method.

Code
arms_to_test <- col_by_week5$arm |> unique() |> setdiff("Pl")

exact_test_results <-
  map(
  arms_to_test,
  ~ {
    arm_to_test <- .x
    tmp <- col_by_week5 |> filter(arm %in% c("Pl", arm_to_test))
    table_for_test <- table(tmp$LBP_colonization_by_week5, tmp$arm |> fct_drop())
    
    test_res <- 
      exact2x2::uncondExact2x2(
        x1 = table_for_test[2, 1], n1 = sum(table_for_test[, 1]),
        x2 = table_for_test[2, 2], n2 = sum(table_for_test[, 2]),
        alternative = "greater", parmtype = "ratio",
        conf.level = 0.95, conf.int = TRUE, method = "FisherAdj"
        )

    tibble(
      arm = arm_to_test,
      p_value = test_res$p.value,
      ratio = test_res$estimate,
      lower_CI = test_res$conf.int[1],
      upper_CI = test_res$conf.int[2]
    )
  }
) |> 
  bind_rows() |> 
  mutate(
    adj_p_value = p_value |> p.adjust(method = "BH")
  )
Code
test_results <- 
  exact_test_results |> 
  mutate(
    adj_p_value = adj_p_value |> scales::pvalue(accuracy = 0.001),
    ratio = ratio |> round(2) |> as.character(),
    CI = str_c(lower_CI |> round(2), " - ", upper_CI |> round(2))
    ) |> 
  select(arm, adj_p_value, ratio, CI) |>
  pivot_longer(cols = -arm, names_to = "name", values_to = "value") |>
  pivot_wider(names_from = arm, values_from = value) |> 
  mutate(
    name = 
      case_when(
        name == "adj_p_value" ~ "Adj. p-value",
        name == "ratio" ~ "Benefit ratio",
        name == "CI" ~ "95% CI",
        TRUE ~ name
      )
  )
Code
table_2_with_model_results <- 
  bind_rows(
    table_2,
    test_results
  ) |> 
  mutate(
    site = site |> str_replace_na(""),
    Pl = 
      case_when(
        name == "Adj. p-value" ~ "Ref.",
        name == "Benefit ratio" ~ "",
        str_detect(name, "CI") ~ "",
        TRUE ~ Pl
      )
  )
Code
table_2_with_model_results |> 
  group_by(site) |>
  gt(caption = "Table 2: Colonization by week 5 by arm and site", row_group_as_column = TRUE) |> 
  cols_width(
    "name" ~ px(150),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC106-7` = "LC106<br>7 days",
    `LC106-3` = "LC106<br>3 days",
    `LC106-o` = "LC106<br>overlap",
    `LC115` = "LC115<br>7 days",
    .fn = md
  )
Table 2: Colonization by week 5 by arm and site
Placebo LC106
7 days
LC106
3 days
LC106
overlap
LC115
7 days
CAP N participants N = 14 N = 14 N = 14 N = 14 N = 14
n (%) participants with LBP strain detected 0 (0 %) 9 (64 %) 7 (50 %) 7 (50 %) 10 (71 %)
95% CI 39% - 84% 27% - 73% 27% - 73% 45% - 88%
MGH N participants N = 5 N = 7 N = 1 N = 1 N = 6
n (%) participants with LBP strain detected 1 (20 %) 6 (86 %) 1 (100 %) 1 (100 %) 6 (100 %)
95% CI 49% - 97% 21% - 100% 21% - 100% 61% - 100%
Adj. p-value Ref. <0.001 <0.001 <0.001 <0.001
Benefit ratio 13.57 10.13 10.13 15.2
95% CI 1.72 - Inf 1.37 - Inf 1.37 - Inf 2.1 - Inf

Sensitivity analyses (Supplementary tables and figures)

LBP colonization by week 5 excluding immediately post-product visits

Defining the alternative primary outcome

We defined an alternative primary outcome that excluded visits conducted immediately after the last product administration. Specifically, for participants in the 7-dose arms (LC107-7 and LC115) and placebo, we eclude visit 1200, while for participants in the 3-dose arm (LC106-3) and overlap arm (LC106-o) only visits prior to 1200 (1000 and 1100) were excluded.

Code
new_outcome_def <- 
  # we start from the "colonized at"
  mae[["col_LBP_mg"]] |> 
  as_tibble() |> 
  dplyr::filter(.feature == "colonized_LBP_at_mg") |> 
  dplyr::left_join(
    mae@colData |> as_tibble() |> select(uid, pid, visit_code, arm),
    by = join_by(.sample == uid)
  ) |> 
  filter(!is.na(arm))

# we fill the missing visits
new_outcome_def <- 
  new_outcome_def |> 
  select(-c(arm)) |> 
  dplyr::full_join(
    new_outcome_def |> 
      select(pid, arm) |> 
      distinct() |> 
      expand_grid(
        visit_code = 
          unique(new_outcome_def$visit_code) |> sort() |> setdiff(c("0001","0010"))
        ),
    by = join_by(pid, visit_code)
  ) |> 
  arrange(pid, visit_code) |> 
  mutate(outcome = outcome |> replace_na(FALSE)) |> 
  dplyr::filter(!is.na(arm))
  
# we compute the new outcome
new_outcome_def <- 
  new_outcome_def |>  
  arrange(pid, visit_code) |> 
  group_by(pid) |> 
  mutate(
    tmp = 
      case_when(
        (arm %in% c("Pl", "LC106-7", "LC115")) & (as.numeric(visit_code) <= 1200) ~ FALSE,
        (arm %in% c("LC106-3", "LC106-o")) & (as.numeric(visit_code) < 1200) ~ FALSE,
        TRUE ~ outcome
      ),
    colonized_LBP_by_mg = cummax(tmp) |> as.logical()
  ) |> 
  ungroup()
Code
new_outcome_def |> 
  left_join(mae@colData |> as_tibble() |> select(pid, mITT) |> distinct()) |>
  ggplot() +
  aes(y = visit_code |> fct_rev(), x = pid, fill = colonized_LBP_by_mg) +
  facet_grid(. ~ arm + ifelse(mITT, "mITT", "ITT"), scales = "free", space = "free") +
  geom_tile() + 
  scale_fill_manual(
    "Colonized by LBP by MG by each visit",
    values = c("gray", "orangered"), labels = c("No", "Yes")
    ) +
  ylab("Visit code") + xlab("Participant IDs") +
  labs(
    caption = "Colonization at missed visits was considered negative."
  ) +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
    legend.position = "bottom"
  )

Code
new_outcome_def <- 
  new_outcome_def |> 
  dplyr::rename(colonized_LBP_by_mg_alt = colonized_LBP_by_mg) |> 
  left_join(
    mae[["col_LBP_mg"]] |> 
      as_tibble() |> 
      dplyr::filter(.feature == "colonized_LBP_by_mg") |> 
      select(.sample, outcome) |> 
      dplyr::rename(colonized_LBP_by_mg = outcome)
  )
Code
new_outcome_def |> 
  dplyr::filter(!is.na(colonized_LBP_by_mg)) |> 
  dplyr::count(colonized_LBP_by_mg_alt, colonized_LBP_by_mg)
new_outcome_def |> 
  dplyr::filter(visit_code == "1500", !is.na(colonized_LBP_by_mg)) |>
  dplyr::count(colonized_LBP_by_mg_alt, colonized_LBP_by_mg)
Code
col_by_week5 <- 
  new_outcome_def |> 
  select(pid, visit_code, colonized_LBP_by_mg_alt) |> 
  left_join(
    mae@colData |> as_tibble() |> 
      select(uid, pid, visit_code, site, randomized, arm, ITT, mITT, PP)
    ) |> 
  dplyr::filter(randomized, mITT) |> 
  dplyr::filter(visit_code == "1500") |> 
  mutate(LBP_colonization_by_week5 = colonized_LBP_by_mg_alt |> replace_na(FALSE)) 

Visualization of the alternative primary outcome

Code
col_by_week5 |> 
  arrange(site, arm, LBP_colonization_by_week5) |>
  group_by(site, arm) |> 
  mutate(participant_number = row_number() |> factor()) |> 
  ungroup() |> 
  ggplot(aes(x = participant_number, y = site |> fct_rev(), col = site, fill = site, shape = LBP_colonization_by_week5)) +
  geom_point(size = 3) +
  facet_grid(. ~ arm, scales = "free_x", space = "free_x") +
  guides(color = "none", fill = "none") +
  scale_shape_manual("Colonized by LBP by week 5 (excluding immediately post-product visit)", values = c(1, 16)) +
  scale_color_manual(values = site_colors) + 
  xlab("Participant number") + ylab("") +
  theme(
    legend.position = "top",
    legend.text = element_text(hjust = 0.5),
    legend.title = element_text(hjust = 0.5),
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)
  ) +
  guides(shape = guide_legend(ncol = 2))

Table

Code
summary_table <- 
  col_by_week5 |> 
  group_by(site, arm) |> 
  summarize(
    n = n(), 
    n_success = sum(LBP_colonization_by_week5),
    .groups = "drop"
  ) |> 
  mutate(
    p = n_success / n,
    CI = binom::binom.confint(n_success, n, method = "wilson")
  )
Code
table_2 <- 
  summary_table |> 
  mutate(
    N_per_site_and_arm = str_c("N = ", n),
    LBP_strain_detected = str_c(n_success, " (", round(p * 100)," %)"),
    CI_text = str_c(round(CI$lower * 100), "% - ", round(CI$upper * 100), "%")
  ) |> 
  dplyr::select(
    site, arm, N_per_site_and_arm, LBP_strain_detected, CI_text
  ) |> 
  pivot_longer(-c(site, arm)) |>
  mutate(
    name = 
      case_when(
        name == "N_per_site_and_arm" ~ "N participants",
        name == "LBP_strain_detected" ~ "n (%) participants\nwith LBP strain detected",
        name == "CI_text" ~ "95% CI"
      )
  ) |> 
  pivot_wider(
    id_cols = c(site, name), 
    names_from = arm,
    values_from = value, values_fill = ""
    )
Code
arms_to_test <- col_by_week5$arm |> unique() |> setdiff("Pl")

exact_test_results <-
  map(
  arms_to_test,
  ~ {
    arm_to_test <- .x
    tmp <- col_by_week5 |> filter(arm %in% c("Pl", arm_to_test))
    table_for_test <- table(tmp$LBP_colonization_by_week5, tmp$arm |> fct_drop())
    test_res <- 
      exact2x2::uncondExact2x2(
        x1 = table_for_test[2, 1], n1 = sum(table_for_test[, 1]),
        x2 = table_for_test[2, 2], n2 = sum(table_for_test[, 2]),
        alternative = "greater", parmtype = "ratio",
        conf.level = 0.95, conf.int = TRUE, method = "FisherAdj"
      )
    tibble(
      arm = arm_to_test,
      p_value = test_res$p.value,
      ratio = test_res$estimate,
      lower_CI = test_res$conf.int[1],
      upper_CI = test_res$conf.int[2]
    )
  }
) |> 
  bind_rows() |> 
  mutate(
    adj_p_value = p_value |> p.adjust(method = "BH")
  )
Code
test_results <- 
  exact_test_results |> 
  mutate(
    adj_p_value = adj_p_value |> scales::pvalue(accuracy = 0.001),
    ratio = ratio |> round(2) |> as.character(),
    CI = str_c(lower_CI |> round(2), " - ", upper_CI |> round(2))
    ) |> 
  select(arm, adj_p_value, ratio, CI) |>
  pivot_longer(cols = -arm, names_to = "name", values_to = "value") |>
  pivot_wider(names_from = arm, values_from = value) |> 
  mutate(
    name = 
      case_when(
        name == "adj_p_value" ~ "Adj. p-value",
        name == "ratio" ~ "Benefit ratio",
        name == "CI" ~ "95% CI",
        TRUE ~ name
      )
  )
Code
table_2_with_model_results <- 
  bind_rows(
    table_2,
    test_results # |> filter(name != "Adj. p-value")
  ) |> 
  mutate(
    site = site |> str_replace_na(""),
    Pl = 
      case_when(
        name == "Adj. p-value" ~ "Ref.",
        name == "Benefit ratio" ~ "-",
        str_detect(name, "CI") ~ "",
        TRUE ~ Pl
      )
  )
Code
table_2_with_model_results |> 
  group_by(site) |>
  gt(
    caption = "LBP Colonization by week 5 by arm and site excluding immediately post-product visit.", 
    row_group_as_column = TRUE
    ) |> 
  cols_width(
    "name" ~ px(150),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC106-7` = "LC106<br>7 days",
    `LC106-3` = "LC106<br>3 days",
    `LC106-o` = "LC106<br>overlap",
    `LC115` = "LC115<br>7 days",
    .fn = md
  )
LBP Colonization by week 5 by arm and site excluding immediately post-product visit.
Placebo LC106
7 days
LC106
3 days
LC106
overlap
LC115
7 days
CAP N participants N = 13 N = 14 N = 14 N = 14 N = 14
n (%) participants with LBP strain detected 0 (0 %) 8 (57 %) 7 (50 %) 7 (50 %) 6 (43 %)
95% CI 33% - 79% 27% - 73% 27% - 73% 21% - 67%
MGH N participants N = 4 N = 7 N = 1 N = 1 N = 6
n (%) participants with LBP strain detected 1 (25 %) 4 (57 %) 1 (100 %) 1 (100 %) 4 (67 %)
95% CI 25% - 84% 21% - 100% 21% - 100% 30% - 90%
Adj. p-value Ref. 0.002 0.002 0.002 0.002
Benefit ratio - 9.71 9.07 9.07 8.5
95% CI 1.35 - Inf 1.35 - Inf 1.35 - Inf 1.27 - Inf

LBP colonization by week 5, qPCR-estimated relative abundances

For this sensitivity analysis, we define colonization by any of the LBP strains using relative abundances estimated from qPCR data. For each participant and visit, the relative abundance of each LBP strain was computed as the ratio of the strain-specific copies per swab to the total 16S rRNA gene copies per swab. In line with the metagenomic definition, colonization at a given visit was considered present if the total relative abundance of all LBP strains exceeded 10% or if the maximum relative abundance of any single LBP strain exceeded 5%. These analyses included the immediate post-product visits.

The computation itself has been done at the QC/Pre-processing stage (see the VIBRANT-0-QC-and-preprocessing GitHub repository).

Code
# by qPCR 

col_by_week5_qPCR <-
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_qPCR"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "colonized_LBP_by_qpcr")
  ) |> 
  mutate(LBP_colonization_by_week5 = outcome |> replace_na(FALSE)) 
Code
col_by_week5_qPCR |> 
  arrange(site, arm, LBP_colonization_by_week5) |>
  group_by(site, arm) |> 
  mutate(participant_number = row_number() |> factor()) |> 
  ungroup() |> 
  ggplot(aes(x = participant_number, y = site |> fct_rev(), col = site, fill = site, shape = LBP_colonization_by_week5)) +
  geom_point(size = 3) +
  facet_grid(. ~ arm, scales = "free_x", space = "free_x") +
  guides(color = "none", fill = "none") +
  scale_shape_manual("Colonized by LBP by week 5", values = c(1, 16)) +
  scale_color_manual(values = site_colors) + 
  xlab("Participant number") + ylab("") +
  ggtitle("Colonization based on the relative abundances estimated by qPCR") +
  theme(
    legend.position = "top",
    legend.text = element_text(hjust = 0.5),
    legend.title = element_text(hjust = 0.5),
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5), 
    plot.title = element_text(hjust = 0.5)
  ) +
  guides(shape = guide_legend(ncol = 2))

Table

Code
summary_table <- 
  col_by_week5_qPCR |> 
  group_by(site, arm) |> 
  summarize(
    n = n(), 
    n_success = sum(LBP_colonization_by_week5),
    .groups = "drop"
  ) |> 
  mutate(
    p = n_success / n,
    CI = binom::binom.confint(n_success, n, method = "wilson") #exact
  )
Code
table_2 <- 
  summary_table |> 
  mutate(
    N_per_site_and_arm = str_c("N = ", n),
    LBP_strain_detected = str_c(n_success, " (", round(p * 100)," %)"),
    CI_text = str_c(round(CI$lower * 100), "% - ", round(CI$upper * 100), "%")
  ) |> 
  dplyr::select(
    site, arm, N_per_site_and_arm, LBP_strain_detected, CI_text
  ) |> 
  pivot_longer(-c(site, arm)) |>
  mutate(
    name = 
      case_when(
        name == "N_per_site_and_arm" ~ "N participants",
        name == "LBP_strain_detected" ~ "n (%) participants\nwith LBP strain detected",
        name == "CI_text" ~ "95% CI"
      )
  ) |> 
  pivot_wider(
    id_cols = c(site, name), 
    names_from = arm,
    values_from = value, values_fill = ""
    )
Code
arms_to_test <- col_by_week5$arm |> unique() |> setdiff("Pl")

exact_test_results <-
  map(
  arms_to_test,
  ~ {
    arm_to_test <- .x
    tmp <- col_by_week5_qPCR |> filter(arm %in% c("Pl", arm_to_test))
    table_for_test <- table(tmp$LBP_colonization_by_week5, tmp$arm |> fct_drop())
    test_res <- 
      exact2x2::uncondExact2x2(
        x1 = table_for_test[2, 1], n1 = sum(table_for_test[, 1]),
        x2 = table_for_test[2, 2], n2 = sum(table_for_test[, 2]),
        alternative = "greater", parmtype = "ratio",
        conf.level = 0.95, conf.int = TRUE, method = "FisherAdj"
        )
    tibble(
      arm = arm_to_test,
      p_value = test_res$p.value,
      ratio = test_res$estimate,
      lower_CI = test_res$conf.int[1],
      upper_CI = test_res$conf.int[2]
    )
  }
) |> 
  bind_rows() |> 
  mutate(
    adj_p_value = p_value |> p.adjust(method = "BH")
  )
Code
test_results <- 
  exact_test_results |> 
  mutate(
    adj_p_value = adj_p_value |> scales::pvalue(accuracy = 0.001),
    ratio = ratio |> round(2) |> as.character(),
    CI = str_c(lower_CI |> round(2), " - ", upper_CI |> round(2))
    ) |> 
  select(arm, adj_p_value, ratio, CI) |>
  pivot_longer(cols = -arm, names_to = "name", values_to = "value") |>
  pivot_wider(names_from = arm, values_from = value) |> 
  mutate(
    name = 
      case_when(
        name == "adj_p_value" ~ "Adj. p-value",
        name == "ratio" ~ "Benefit ratio",
        name == "CI" ~ "95% CI",
        TRUE ~ name
      )
  )
Code
table_2_with_model_results <- 
  bind_rows(
    table_2,
    test_results # |> filter(name != "Adj. p-value")
  ) |> 
  mutate(
    site = site |> str_replace_na(""),
    Pl = 
      case_when(
        name == "Adj. p-value" ~ "Ref.",
        name == "Benefit ratio" ~ "-",
        str_detect(name, "CI") ~ "",
        TRUE ~ Pl
      )
  )
Code
table_2_with_model_results |> 
  group_by(site) |>
  gt(
    caption = "LBP Colonization by week 5 by arm, with colonization definition based on qPCR-estimated relative abundance.", 
    row_group_as_column = TRUE
    ) |> 
  cols_width(
    "name" ~ px(150),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC106-7` = "LC106<br>7 days",
    `LC106-3` = "LC106<br>3 days",
    `LC106-o` = "LC106<br>overlap",
    `LC115` = "LC115<br>7 days",
    .fn = md
  )
LBP Colonization by week 5 by arm, with colonization definition based on qPCR-estimated relative abundance.
Placebo LC106
7 days
LC106
3 days
LC106
overlap
LC115
7 days
CAP N participants N = 14 N = 14 N = 14 N = 14 N = 14
n (%) participants with LBP strain detected 0 (0 %) 8 (57 %) 7 (50 %) 8 (57 %) 10 (71 %)
95% CI 33% - 79% 27% - 73% 33% - 79% 45% - 88%
MGH N participants N = 5 N = 7 N = 1 N = 1 N = 6
n (%) participants with LBP strain detected 1 (20 %) 6 (86 %) 1 (100 %) 1 (100 %) 6 (100 %)
95% CI 49% - 97% 21% - 100% 21% - 100% 61% - 100%
Adj. p-value Ref. <0.001 <0.001 <0.001 <0.001
Benefit ratio - 12.67 10.13 11.4 15.2
95% CI 1.58 - Inf 1.37 - Inf 1.47 - Inf 2.1 - Inf

LBP detection by week 5, qPCR-based detection

In this analysis, LBP is considered to be detected if at least 2 LBP strains had observed Cq values at the same visit by week 5. “By week 5” includes all post-product visits from visit_code = 1200 (no excluding the immediate post-product visits.

Code
detected_by_week5_qPCR <-
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_qPCR"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "LBP_detected_by_qpcr")
  ) |> 
  mutate(LBP_detected_by_week5 = outcome |> replace_na(FALSE)) 
Code
detected_by_week5_qPCR |> 
  arrange(site, arm, LBP_detected_by_week5) |>
  group_by(site, arm) |> 
  mutate(participant_number = row_number() |> factor()) |> 
  ungroup() |> 
  ggplot(aes(x = participant_number, y = site |> fct_rev(), col = site, fill = site, shape = LBP_detected_by_week5)) +
  geom_point(size = 3) +
  facet_grid(. ~ arm, scales = "free_x", space = "free_x") +
  guides(color = "none", fill = "none") +
  scale_shape_manual("Colonized by LBP by week 5", values = c(1, 16)) +
  scale_color_manual(values = site_colors) + 
  xlab("Participant number") + ylab("") +
  ggtitle("Colonization based on qPCR-based detection of at least 2 LBP strains detected at the same visit by week 5") +
  theme(
    legend.position = "top",
    legend.text = element_text(hjust = 0.5),
    legend.title = element_text(hjust = 0.5),
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5), 
    plot.title = element_text(hjust = 0.5)
  ) +
  guides(shape = guide_legend(ncol = 2))

Table

Code
summary_table <- 
  detected_by_week5_qPCR |> 
  group_by(site, arm) |> 
  summarize(
    n = n(), 
    n_success = sum(LBP_detected_by_week5),
    .groups = "drop"
  ) |> 
  mutate(
    p = n_success / n,
    CI = binom::binom.confint(n_success, n, method = "wilson") #exact
  )
Code
table_2 <- 
  summary_table |> 
  mutate(
    N_per_site_and_arm = str_c("N = ", n),
    LBP_strain_detected = str_c(n_success, " (", round(p * 100)," %)"),
    CI_text = str_c(round(CI$lower * 100), "% - ", round(CI$upper * 100), "%")
  ) |> 
  dplyr::select(
    site, arm, N_per_site_and_arm, LBP_strain_detected, CI_text
  ) |> 
  pivot_longer(-c(site, arm)) |>
  mutate(
    name = 
      case_when(
        name == "N_per_site_and_arm" ~ "N participants",
        name == "LBP_strain_detected" ~ "n (%) participants\nwith LBP strain detected",
        name == "CI_text" ~ "95% CI"
      )
  ) |> 
  pivot_wider(
    id_cols = c(site, name), 
    names_from = arm,
    values_from = value, values_fill = ""
    )
Code
arms_to_test <- col_by_week5$arm |> unique() |> setdiff("Pl")

exact_test_results <-
  map(
  arms_to_test,
  ~ {
    arm_to_test <- .x
    tmp <- detected_by_week5_qPCR |> filter(arm %in% c("Pl", arm_to_test))
    table_for_test <- table(tmp$LBP_detected_by_week5, tmp$arm |> fct_drop())
    test_res <- 
      exact2x2::uncondExact2x2(
        x1 = table_for_test[2, 1], n1 = sum(table_for_test[, 1]),
        x2 = table_for_test[2, 2], n2 = sum(table_for_test[, 2]),
        alternative = "greater", parmtype = "ratio",
        conf.level = 0.95, conf.int = TRUE, method = "FisherAdj"
        )
    tibble(
      arm = arm_to_test,
      p_value = test_res$p.value,
      ratio = test_res$estimate,
      lower_CI = test_res$conf.int[1],
      upper_CI = test_res$conf.int[2]
    )
  }
) |> 
  bind_rows() |> 
  mutate(
    adj_p_value = p_value |> p.adjust(method = "BH")
  )
Code
test_results <- 
  exact_test_results |> 
  mutate(
    adj_p_value = adj_p_value |> scales::pvalue(accuracy = 0.001),
    ratio = ratio |> round(2) |> as.character(),
    CI = str_c(lower_CI |> round(2), " - ", upper_CI |> round(2))
    ) |> 
  select(arm, adj_p_value, ratio, CI) |>
  pivot_longer(cols = -arm, names_to = "name", values_to = "value") |>
  pivot_wider(names_from = arm, values_from = value) |> 
  mutate(
    name = 
      case_when(
        name == "adj_p_value" ~ "Adj. p-value",
        name == "ratio" ~ "Benefit ratio",
        name == "CI" ~ "95% CI",
        TRUE ~ name
      )
  )
Code
table_2_with_model_results <- 
  bind_rows(
    table_2,
    test_results # |> filter(name != "Adj. p-value")
  ) |> 
  mutate(
    site = site |> str_replace_na(""),
    Pl = 
      case_when(
        name == "Adj. p-value" ~ "Ref.",
        name == "Benefit ratio" ~ "-",
        str_detect(name, "CI") ~ "",
        TRUE ~ Pl
      )
  )
Code
table_2_with_model_results |> 
  group_by(site) |>
  gt(
    caption = "LBP detection by week 5 by arm, with LBP detection defined as at least two LBP strains simultaneously detected by qPCR.", 
    row_group_as_column = TRUE
    ) |> 
  cols_width(
    "name" ~ px(150),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC106-7` = "LC106<br>7 days",
    `LC106-3` = "LC106<br>3 days",
    `LC106-o` = "LC106<br>overlap",
    `LC115` = "LC115<br>7 days",
    .fn = md
  )
LBP detection by week 5 by arm, with LBP detection defined as at least two LBP strains simultaneously detected by qPCR.
Placebo LC106
7 days
LC106
3 days
LC106
overlap
LC115
7 days
CAP N participants N = 14 N = 14 N = 14 N = 14 N = 14
n (%) participants with LBP strain detected 2 (14 %) 12 (86 %) 8 (57 %) 9 (64 %) 11 (79 %)
95% CI 60% - 96% 33% - 79% 39% - 84% 52% - 92%
MGH N participants N = 5 N = 7 N = 1 N = 1 N = 6
n (%) participants with LBP strain detected 3 (60 %) 6 (86 %) 1 (100 %) 1 (100 %) 6 (100 %)
95% CI 49% - 97% 21% - 100% 21% - 100% 61% - 100%
Adj. p-value Ref. <0.001 0.028 0.015 <0.001
Benefit ratio - 3.26 2.28 2.53 3.23
95% CI 1.47 - Inf 1.07 - Inf 1.15 - Inf 1.47 - Inf

L. crispatus dominance by week 5

In this analysis, L. crispatus dominance is defined as a metagenomic-estimated relative abundance ≥ 50%. “By week 5” includes all post-product visits from visit_code = 1200 (no excluding the immediate post-product visits.

Code
# crispatus dominance 

crisp_dom_by_5week <- 
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_crispatus_mg"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "crispatus_dominance_by_mg")
  ) |> 
  mutate(crisp_dom_by_5week = outcome |> replace_na(FALSE)) 
Code
crisp_dom_by_5week |> 
  arrange(site, arm, crisp_dom_by_5week) |>
  group_by(site, arm) |> 
  mutate(participant_number = row_number() |> factor()) |> 
  ungroup() |> 
  ggplot(aes(x = participant_number, y = site |> fct_rev(), col = site, fill = site, shape = crisp_dom_by_5week)) +
  geom_point(size = 3) +
  facet_grid(. ~ arm, scales = "free_x", space = "free_x") +
  guides(color = "none", fill = "none") +
  scale_shape_manual("Colonized by LBP by week 5", values = c(1, 16)) +
  scale_color_manual(values = site_colors) + 
  xlab("Participant number") + ylab("") +
  ggtitle("L. crispatus dominance (≥ 50% by metagenomics)") +
  theme(
    legend.position = "top",
    legend.text = element_text(hjust = 0.5),
    legend.title = element_text(hjust = 0.5),
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5), 
    plot.title = element_text(hjust = 0.5)
  ) +
  guides(shape = guide_legend(ncol = 2))

Table

Code
summary_table <- 
  crisp_dom_by_5week |> 
  group_by(site, arm) |> 
  summarize(
    n = n(), 
    n_success = sum(crisp_dom_by_5week),
    .groups = "drop"
  ) |> 
  mutate(
    p = n_success / n,
    CI = binom::binom.confint(n_success, n, method = "wilson") #exact
  )
Code
table_2 <- 
  summary_table |> 
  mutate(
    N_per_site_and_arm = str_c("N = ", n),
    Lc_dom = str_c(n_success, " (", round(p * 100)," %)"),
    CI_text = str_c(round(CI$lower * 100), "% - ", round(CI$upper * 100), "%")
  ) |> 
  dplyr::select(
    site, arm, N_per_site_and_arm, Lc_dom, CI_text
  ) |> 
  pivot_longer(-c(site, arm)) |>
  mutate(
    name = 
      case_when(
        name == "N_per_site_and_arm" ~ "N participants",
        name == "Lc_dom" ~ "n (%) participants\n≥ 50% L. crispatus",
        name == "CI_text" ~ "95% CI"
      )
  ) |> 
  pivot_wider(
    id_cols = c(site, name), 
    names_from = arm,
    values_from = value, values_fill = ""
    )
Code
arms_to_test <- col_by_week5$arm |> unique() |> setdiff("Pl")

exact_test_results <-
  map(
  arms_to_test,
  ~ {
    arm_to_test <- .x
    tmp <- crisp_dom_by_5week |> filter(arm %in% c("Pl", arm_to_test))
    table_for_test <- table(tmp$crisp_dom_by_5week, tmp$arm |> fct_drop())
    test_res <- 
      exact2x2::uncondExact2x2(
        x1 = table_for_test[2, 1], n1 = sum(table_for_test[, 1]),
        x2 = table_for_test[2, 2], n2 = sum(table_for_test[, 2]),
        alternative = "greater", parmtype = "ratio",
        conf.level = 0.95, conf.int = TRUE, method = "FisherAdj"
        )
    tibble(
      arm = arm_to_test,
      p_value = test_res$p.value,
      ratio = test_res$estimate,
      lower_CI = test_res$conf.int[1],
      upper_CI = test_res$conf.int[2]
    )
  }
) |> 
  bind_rows() |> 
  mutate(
    adj_p_value = p_value |> p.adjust(method = "BH")
  )
Code
test_results <- 
  exact_test_results |> 
  mutate(
    adj_p_value = adj_p_value |> scales::pvalue(accuracy = 0.001),
    ratio = ratio |> round(2) |> as.character(),
    CI = str_c(lower_CI |> round(2), " - ", upper_CI |> round(2))
    ) |> 
  select(arm, adj_p_value, ratio, CI) |>
  pivot_longer(cols = -arm, names_to = "name", values_to = "value") |>
  pivot_wider(names_from = arm, values_from = value) |> 
  mutate(
    name = 
      case_when(
        name == "adj_p_value" ~ "Adj. p-value",
        name == "ratio" ~ "Benefit ratio",
        name == "CI" ~ "95% CI",
        TRUE ~ name
      )
  )
Code
table_2_with_model_results <- 
  bind_rows(
    table_2,
    test_results # |> filter(name != "Adj. p-value")
  ) |> 
  mutate(
    site = site |> str_replace_na(""),
    Pl = 
      case_when(
        name == "Adj. p-value" ~ "Ref.",
        name == "Benefit ratio" ~ "-",
        str_detect(name, "CI") ~ "",
        TRUE ~ Pl
      )
  )
Code
table_2_with_model_results |> 
  group_by(site) |>
  gt(
    caption = "Ever ≥50% L. crispatus by week 5 by arm.", 
    row_group_as_column = TRUE
    ) |> 
  cols_width(
    "name" ~ px(150),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC106-7` = "LC106<br>7 days",
    `LC106-3` = "LC106<br>3 days",
    `LC106-o` = "LC106<br>overlap",
    `LC115` = "LC115<br>7 days",
    .fn = md
  )
Ever ≥50% L. crispatus by week 5 by arm.
Placebo LC106
7 days
LC106
3 days
LC106
overlap
LC115
7 days
CAP N participants N = 14 N = 14 N = 14 N = 14 N = 14
n (%) participants ≥ 50% L. crispatus 0 (0 %) 8 (57 %) 5 (36 %) 6 (43 %) 10 (71 %)
95% CI 33% - 79% 16% - 61% 21% - 67% 45% - 88%
MGH N participants N = 5 N = 7 N = 1 N = 1 N = 6
n (%) participants ≥ 50% L. crispatus 1 (20 %) 6 (86 %) 1 (100 %) 1 (100 %) 5 (83 %)
95% CI 49% - 97% 21% - 100% 21% - 100% 44% - 97%
Adj. p-value Ref. <0.001 0.008 0.004 <0.001
Benefit ratio - 12.67 7.6 8.87 14.25
95% CI 1.58 - Inf 1.21 - Inf 1.29 - Inf 1.89 - Inf

Secondary outcomes (Figures 2-4)

Code
fig_title <- mae_title(mae)

Kinetics of LBP colonization

Total relative abundances of LBP strains

To investigate the kinetics of colonization, we estimated at each clinic visit the total relative abundance of all LBP strains combined from metagenomic sequencing data. For each participant, the relative abundances of the individual LBP strains were summed, and trajectories were visualized over time stratified by study arm and site.

Code
total_rel_ab_of_LBP <- 
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(!is.na(LBP), !poor_quality_mg_data) |> 
  left_join(
    mae |> colData() |> as_tibble(), 
    by = join_by(uid)
  ) |>
  filter(randomized, mITT) |> 
  group_by(.sample, uid) |>
  summarize(
    total_LBP_rel_ab = sum(rel_ab, na.rm = TRUE),
    .groups = "drop"
  )

total_rel_ab_of_LBP <- 
  total_rel_ab_of_LBP |> 
  dplyr::left_join(mae |> colData() |> as_tibble(), by = join_by(uid))


total_rel_ab_of_LBP <- 
  total_rel_ab_of_LBP |> 
  left_join(
    total_rel_ab_of_LBP |> 
      dplyr::select(pid, arm, site) |> 
      distinct() |> 
      group_by(arm, site) |> 
      arrange(pid) |> 
      mutate(pid_nb = row_number()) |> 
      ungroup()
  )
Code
g_tot_prop_LBP_strains <- 
  total_rel_ab_of_LBP |>
  filter(visit_type == "Clinic", !is.na(arm), PIPV) |> 
  ggplot(aes(x = study_week, y = total_LBP_rel_ab, col = arm)) +
  geom_line(aes(group = pid), alpha = 0.5) +
  geom_point(size = 0.5, alpha = 0.5) +
  facet_grid(site ~ arm) + 
  scale_x_continuous("Study weeks", breaks = c(-1:5,7,9,12), minor_breaks = 0:12) +
  scale_y_continuous(
    "Total relative abundances\nof LBP strains", labels = scales::percent,
    breaks = seq(0, 1, by = 0.2)
  ) +
  expand_limits(y = 0) +
  scale_color_manual(values = arm_colors) +
  guides(col = "none") +
  ggtitle(fig_title) 
Code
# g_tot_prop_LBP_strains

To facilitate interpretation, each participant within the same study arm and site is labeled by a unique number. These numbers allow visual tracking of individual trajectories across visits within each stratified panel.

Code
g_tot_prop_LBP_strains +
  geom_label(aes(label = pid_nb), size = 2) +
  labs(
    caption = "Participants are labeled by their enrollment order within each arm and site."
  )

Proportions of participants who colonized in each arm and each visit

Code
get_longitudinal_proportions_of_participants_who_colonized <- 
  function(total_rel_ab_of_LBP) {
    total_rel_ab_of_LBP |> 
      filter(!is.na(arm), PIPV) |> 
      group_by(site, arm, study_week) |>
      summarize(
        prop_who_colonized_at = mean(total_LBP_rel_ab > 0.05), 
        CI_low = 
          prop.test(
            x = sum(total_LBP_rel_ab > 0.05), n = n(), 
            conf.level = 0.95)$conf.int[1],
        CI_high = 
          prop.test(
            x = sum(total_LBP_rel_ab > 0.05), n = n(), 
            conf.level = 0.95)$conf.int[2],
        .groups = "drop"
      )
  }

arm_colors <- c(
  "LC106-7" = "darkorange", 
  "LC115"   = "#8E2708",
  "LC106-3" = "dodgerblue1",
  "LC106-o" = "deeppink",
  "Pl"       = "#B3B3B3"
)
Code
plot_longitudinal_proportions_of_participants_who_colonized <- function(total_rel_ab_of_LBP){
    
    get_longitudinal_proportions_of_participants_who_colonized(total_rel_ab_of_LBP) |>
      ggplot() +
      aes(x = study_week, y = prop_who_colonized_at, col = arm, shape = arm) +
      facet_grid(site ~ .) +
      geom_ribbon(
        aes(ymin = CI_low, ymax = CI_high, fill = arm, colour = arm), 
        alpha = 0.05, size = 0.2 #, color = NA
      ) +
      geom_line(linewidth = 0.75) + #size = 0.35
      geom_point(size = 3) + 
      scale_x_continuous("Study weeks", breaks = c(-1:5,7,9,12), minor_breaks = 0:12) +
      scale_y_continuous(
        "Proportion of participants\nwith detected LBP strains", labels = scales::percent,
        breaks = seq(0, 1, by = 0.2)
      )  +
      expand_limits(y = 0) +
    scale_color_manual("Arm",values = arm_colors) +
    scale_fill_manual("Arm",values = arm_colors) + 
    scale_shape_manual("Arm", values = c(1, 19, 15, 17, 8)) 
  }
Code
g_prop_colonized_by <- plot_longitudinal_proportions_of_participants_who_colonized(total_rel_ab_of_LBP)
g_prop_colonized_by # + ggtitle(fig_title) 

For the sake of clarity, we remove the two arms that only one participant in the MGH is involved in.

Code
g_prop_colonized_by_final <- 
  plot_longitudinal_proportions_of_participants_who_colonized(
    total_rel_ab_of_LBP |> filter(!((arm %in% c("LC106-3", "LC106-o")) & (site == "MGH")))
    )
g_prop_colonized_by_final + ggtitle(fig_title) 

Code
plot_longitudinal_proportions_of_participants_who_colonized_v2 <- 
  function(total_rel_ab_of_LBP){
    
    get_longitudinal_proportions_of_participants_who_colonized(total_rel_ab_of_LBP) |>
      ggplot(aes(x = study_week, y = prop_who_colonized_at, col = arm)) +
      geom_linerange(
        aes(ymin = CI_low, ymax = CI_high, col = arm), 
        alpha = 0.4, lineend = "round", linewidth = 1,
        position = position_dodge(width = 0.5)
      ) +
      geom_line(position = position_dodge(width = 0.5), linewidth = 0.8) +
      geom_point(aes(shape = arm), position = position_dodge(width = 0.5), size = 2) +
      facet_grid(site ~ .) + 
      scale_shape_manual("Arm", values = c(1, 19, 15, 17, 8)) +
      scale_x_continuous("Study weeks", breaks = c(-1:5,7,9,12), minor_breaks = 0:12) +
      scale_y_continuous(
        "Proportion of participants\nwith detected LBP strains", labels = scales::percent,
        breaks = seq(0, 1, by = 0.2)
      ) +
      expand_limits(y = 0) +
      scale_color_manual("Arm", values = arm_colors) +
      scale_fill_manual("Arm", values = arm_colors)
    
  }

Alternative visualization:

Code
g_prop_colonized_by <- plot_longitudinal_proportions_of_participants_who_colonized_v2(total_rel_ab_of_LBP)
g_prop_colonized_by +  ggtitle(fig_title) 

Code
g_prop_colonized_by <- 
  plot_longitudinal_proportions_of_participants_who_colonized_v2(
    total_rel_ab_of_LBP |> filter(!((arm %in% c("LC106-3", "LC106-o")) & (site == "MGH")))
    )

g_prop_colonized_by +  ggtitle(fig_title) 

Code
total_rel_ab_of_LBP |> 
  filter(!is.na(study_week), study_week >= 0) |> 
  mutate(
    arm_group = case_when(
      arm == "Pl" ~ "Placebo",
      TRUE ~ "Active arms"
    ) |> factor(levels = c("Placebo", "Active arms"))
  ) |> 
  group_by(
    arm_group, visit_code, study_week
  ) |> 
  summarize(
    n = n(), 
    n_colonized = sum(total_LBP_rel_ab >= 0.05),
    prop_colonized = n_colonized/n,
    CI_low = 
      prop.test(
        x = n_colonized, n = n, 
        conf.level = 0.95)$conf.int[1],
    CI_high = 
      prop.test(
        x = n_colonized, n = n, 
        conf.level = 0.95)$conf.int[2],
    .groups = "drop"
  ) |> 
  mutate(
    n_N = str_c(n_colonized, " / ", n),
    perc = str_c(round(prop_colonized * 100, 1), "% (", round(CI_low * 100, 1), " - ", round(CI_high * 100, 1), "%)"),
  ) |> 
  select(arm_group, study_week, n_N, perc) |> 
  pivot_wider(
    names_from = arm_group,
    values_from = c(n_N, perc),
    names_vary = "slowest"
  ) |> 
gt() |> 
  cols_label(
    study_week = "Study week",
    n_N_Placebo = "N colonized (Placebo)",
    `n_N_Active arms` = "N colonized (Active arms)",
    perc_Placebo = "Proportion colonized (Placebo)",
    `perc_Active arms` = "Proportion colonized (Active arms)"
  ) 
Study week N colonized (Placebo) Proportion colonized (Placebo) N colonized (Active arms) Proportion colonized (Active arms)
0 0 / 19 0% (0 - 20.9%) 0 / 71 0% (0 - 6.4%)
1 0 / 19 0% (0 - 20.9%) 13 / 70 18.6% (10.6 - 30%)
2 0 / 18 0% (0 - 21.9%) 43 / 68 63.2% (50.6 - 74.4%)
3 0 / 18 0% (0 - 21.9%) 26 / 67 38.8% (27.4 - 51.5%)
4 0 / 16 0% (0 - 24.1%) 34 / 70 48.6% (36.6 - 60.7%)
5 1 / 17 5.9% (0.3 - 30.8%) 26 / 71 36.6% (25.7 - 49%)
7 1 / 17 5.9% (0.3 - 30.8%) 26 / 67 38.8% (27.4 - 51.5%)
9 0 / 18 0% (0 - 21.9%) 22 / 69 31.9% (21.5 - 44.3%)
12 0 / 17 0% (0 - 22.9%) 24 / 68 35.3% (24.4 - 47.9%)

Kinetics of L. crispatus colonization

Total relative abundances of L. crispatus

We further examined colonization dynamics by focusing on Lactobacillus crispatus, irrespective of whether strains belonged to the investigational product.

For each participant and visit, the total relative abundance of L. crispatus was calculated by summing all metagenomic features annotated to this species. Trajectories were visualized over time stratified by study arm and site.

Code
total_rel_ab_of_Lc <- 
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(str_detect(species, "crispatus"), !poor_quality_mg_data) |> 
  left_join(
    mae |> colData() |> as_tibble(), 
    by = join_by(uid)
  ) |>
  filter(randomized, mITT) |> 
  group_by(.sample, uid) |>
  summarize(
    total_Lc_rel_ab = sum(rel_ab, na.rm = TRUE),
    .groups = "drop"
  )

total_rel_ab_of_Lc <- 
  total_rel_ab_of_Lc |> 
  dplyr::left_join(mae |> colData() |> as_tibble(), by = join_by(uid))


# we add a pid_nb 
total_rel_ab_of_Lc <- 
  total_rel_ab_of_Lc |> 
  left_join(
    total_rel_ab_of_Lc |> 
      dplyr::select(pid, arm, site) |> 
      distinct() |> 
      group_by(arm, site) |> 
      arrange(pid) |> 
      mutate(pid_nb = row_number()) |> 
      ungroup()
  )
Code
g_tot_prop_Lc <- 
  total_rel_ab_of_Lc |>
  filter(visit_type == "Clinic", !is.na(arm), PIPV) |> 
  ggplot(aes(x = study_week, y = total_Lc_rel_ab, col = arm)) +
  geom_line(aes(group = pid), alpha = 0.5) +
  geom_point(size = 0.5, alpha = 0.5) +
  facet_grid(site ~ arm) + 
  scale_x_continuous("Study weeks", breaks = c(-1:5,7,9,12), minor_breaks = 0:12) +
  scale_y_continuous(
    "Total relative abundances\nof L. crispatus", labels = scales::percent,
    breaks = seq(0, 1, by = 0.2)
  ) +
  expand_limits(y = 0) +
  scale_color_manual(values = arm_colors) +
  guides(col = "none") +
  ggtitle(mae_title(mae)) 
Code
# g_tot_prop_Lc
Code
g_tot_prop_Lc +
  geom_label(aes(label = pid_nb), size = 2) +
  labs(
    caption = "Participants are labeled by their enrollment order within each arm and site."
  )

Proportions of participants who colonized with 50% L. crispatus in each arm and each visit

We estimated at each visit the proportion of participants whose vaginal microbiota was dominated by L. crispatus (defined as a relative abundance >50%). For each study arm and site, proportions were calculated together with 95% confidence intervals derived from binomial tests.

Code
get_longitudinal_proportions_of_participants_who_colonized <- 
  function(total_rel_ab_of_Lc) {
    total_rel_ab_of_Lc |> 
      filter(!is.na(arm), PIPV) |> 
      group_by(site, arm, study_week) |>
      summarize(
        prop_who_colonized_at = mean(total_Lc_rel_ab > 0.5), 
        CI_low = 
          prop.test(
            x = sum(total_Lc_rel_ab > 0.5), n = n(), 
            conf.level = 0.95)$conf.int[1],
        CI_high = 
          prop.test(
            x = sum(total_Lc_rel_ab > 0.5), n = n(), 
            conf.level = 0.95)$conf.int[2],
        .groups = "drop"
      )
  }
Code
plot_longitudinal_proportions_of_participants_who_colonized <- function(total_rel_ab_of_Lc){
    
    get_longitudinal_proportions_of_participants_who_colonized(total_rel_ab_of_Lc) |>
      ggplot() +
      aes(x = study_week, y = prop_who_colonized_at, col = arm, shape = arm) +
      facet_grid(site ~ .) +
      geom_ribbon(
        aes(ymin = CI_low, ymax = CI_high, fill = arm, colour = arm), 
        alpha = 0.05, size = 0.2 #, color = NA
      ) +
      geom_line(size = 0.35) +
      geom_point() + 
      scale_x_continuous("Study weeks", breaks = c(-1:5,7,9,12), minor_breaks = 0:12) +
      scale_y_continuous(
        "Proportion of participants\n≥ 50% L. crispatus", labels = scales::percent,
        breaks = seq(0, 1, by = 0.2)
      )  +
      expand_limits(y = 0) +
    scale_shape_manual("Arm", values = c(1, 19, 15, 17, 8)) + 
    scale_color_manual(
      "Arm",
      values = arm_colors
    ) +
    scale_fill_manual(
      "Arm",
      values = arm_colors
    )
  }
Code
g_prop_colonized_by <- plot_longitudinal_proportions_of_participants_who_colonized(total_rel_ab_of_Lc)
g_prop_colonized_by # + ggtitle(fig_title)

Code
g_prop_colonized_by <- 
  plot_longitudinal_proportions_of_participants_who_colonized(
    total_rel_ab_of_Lc |> filter(!((arm %in% c("LC106-3", "LC106-o")) & (site == "MGH")))
    )

g_prop_colonized_by + ggtitle(fig_title)

Alternative visualization:

Code
plot_longitudinal_proportions_of_participants_who_colonized_v2 <- 
  function(total_rel_ab_of_Lc){
    
    get_longitudinal_proportions_of_participants_who_colonized(total_rel_ab_of_Lc) |>
      ggplot(aes(x = study_week, y = prop_who_colonized_at, col = arm)) +
      geom_linerange(
        aes(ymin = CI_low, ymax = CI_high, col = arm), 
        alpha = 0.4, lineend = "round", linewidth = 1,
        position = position_dodge(width = 0.5)
      ) +
      geom_line(position = position_dodge(width = 0.5), linewidth = 0.8) +
      geom_point(aes(shape = arm), position = position_dodge(width = 0.5), size = 2) +
      facet_grid(site ~ .) + 
      scale_shape_manual("Arm", values = c(1, 19, 15, 17, 8)) +
      scale_x_continuous("Study weeks", breaks = c(-1:5,7,9,12), minor_breaks = 0:12) +
      scale_y_continuous(
        "Proportion of participants\n≥ 50% L. crispatus", labels = scales::percent,
        breaks = seq(0, 1, by = 0.2)
      ) +
      expand_limits(y = 0) +
      scale_color_manual("Arm", values = arm_colors) +
      scale_fill_manual("Arm", values = arm_colors)
    
  }
Code
g_prop_colonized_by <- plot_longitudinal_proportions_of_participants_who_colonized_v2(total_rel_ab_of_Lc)
g_prop_colonized_by + ggtitle(fig_title)

Code
g_prop_colonized_by <- 
  plot_longitudinal_proportions_of_participants_who_colonized_v2(
    total_rel_ab_of_Lc |> filter(!((arm %in% c("LC106-3", "LC106-o")) & (site == "MGH")))
    )

g_prop_colonized_by + ggtitle(fig_title)

Durability of colonization

We assessed the persistence of colonization among participants who had achieved LBP colonization by week 5. Specifically, we compared colonization status at week 12 to determine the proportion of initially colonized participants who remained colonized at the end of follow-up.

Code
col_by_week5 <- 
  mae@colData |> as_tibble() |> 
  dplyr::select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_mg"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> dplyr::select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "colonized_LBP_by_mg")
  ) |> 
  mutate(LBP_colonization_by_week5 = outcome |> replace_na(FALSE)) 


col_at_week12 <- 
  mae@colData |> as_tibble() |> 
  dplyr::select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_mg"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> dplyr::select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "2120", .feature == "colonized_LBP_at_mg")
  ) |> 
  mutate(LBP_colonization_at_week12 = outcome |> replace_na(FALSE)) 


col_still_col <- 
  col_by_week5 |> 
  dplyr::select(pid, site, arm, randomized, ITT, mITT, PP, LBP_colonization_by_week5) |> 
  left_join(
    col_at_week12 |> dplyr::select(pid, LBP_colonization_at_week12 ),
    by = c("pid")
  )
Code
summary_col_still_col <-
  bind_rows(
    # by site and arms
    col_still_col |> 
      filter(LBP_colonization_by_week5) |>
      group_by(site, arm) |> 
      summarize(
        n = n(),  # total with colonization at week 5
        n_success = sum(LBP_colonization_at_week12),  # still colonized at week 12
        .groups = "drop"
      ),
    # combining sites, by arm
    col_still_col |> 
      filter(LBP_colonization_by_week5) |>
      group_by(arm) |> 
      summarize(
        n = n(),  # total with colonization at week 5
        n_success = sum(LBP_colonization_at_week12),  # still colonized at week 12
        .groups = "drop"
      ) |> 
      mutate(site = "CAP & MGH")
  ) |> 
  mutate(
    site = site |> fct_inorder(),
    arm = arm |> fct_expand("All arms", "All active arms")
    )

summary_col_still_col <- 
  summary_col_still_col |> 
  bind_rows(
    # overall by site, all arms
    summary_col_still_col |> 
      group_by(site) |> 
      summarize(
        n = sum(n),  # total with colonization at week 5
        n_success = sum(n_success),  # still colonized at week 12
        .groups = "drop"
      ) |> 
      mutate(arm = "All arms" |> factor(levels = summary_col_still_col$arm |> levels())),
    # overall by site, all active arms
    summary_col_still_col |>
      filter(arm != "Pl") |>
      group_by(site) |> 
      summarize(
        n = sum(n),  # total with colonization at week 5
        n_success = sum(n_success),  # still colonized at week 12
        .groups = "drop"
      )  |> 
      mutate(arm = "All active arms" |> factor(levels = summary_col_still_col$arm |> levels()))
  )

summary_col_still_col <- 
  summary_col_still_col |> 
  mutate(
    p_col_stil_col = n_success / n,
    CI = binom::binom.confint(n_success, n, method = "exact")
  )
Code
table_still_col <- 
  summary_col_still_col |> 
  mutate(
    N_per_site_and_arm = str_c("N = ", n),
    LBP_strain_detected = str_c(n_success, " (", round(p_col_stil_col * 100)," %)"),
    CI_text = str_c(round(CI$lower * 100), "% - ", round(CI$upper * 100), "%")
  ) |> 
  dplyr::select(
    site, arm, N_per_site_and_arm, LBP_strain_detected, CI_text
  ) |> 
  pivot_longer(-c(site, arm)) |>
  mutate(
    name = 
      case_when(
        name == "N_per_site_and_arm" ~ "N participants colonized by week 5",
        name == "LBP_strain_detected" ~ "n (%) participants\nwith LBP strain detected at week 12",
        name == "CI_text" ~ "95% CI"
      )
  ) |>
  arrange(arm) |> 
  pivot_wider(
    id_cols = c(site, name), 
    names_from = arm,
    values_from = value, values_fill = ""
    ) |> 
  arrange(site)
Code
table_still_col |> 
  group_by(site) |> 
  gt(
    caption = "Colonization at week 12 among participants colonized by week 5 by arm", 
    row_group_as_column = TRUE
    ) |> 
  cols_width(
    "name" ~ px(200),
    everything() ~ px(120)
  ) |> 
  cols_align(
    align = "center", 
    columns = c(Pl, `LC106-7`, `LC106-3`, `LC106-o`, `LC115`, `All arms`, `All active arms`)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC106-7` = "LC106<br>7 days",
    `LC106-3` = "LC106<br>3 days",
    `LC106-o` = "LC106<br>overlap",
    `LC115` = "LC115<br>7 days",
    `All active arms` = "All active<br>arms",
    .fn = md
  )
Colonization at week 12 among participants colonized by week 5 by arm
Placebo LC106
7 days
LC106
3 days
LC106
overlap
LC115
7 days
All arms All active
arms
CAP N participants colonized by week 5 N = 9 N = 7 N = 7 N = 10 N = 33 N = 33
n (%) participants with LBP strain detected at week 12 6 (67 %) 3 (43 %) 4 (57 %) 4 (40 %) 17 (52 %) 17 (52 %)
95% CI 30% - 93% 10% - 82% 18% - 90% 12% - 74% 34% - 69% 34% - 69%
MGH N participants colonized by week 5 N = 1 N = 6 N = 1 N = 1 N = 6 N = 15 N = 14
n (%) participants with LBP strain detected at week 12 0 (0 %) 3 (50 %) 1 (100 %) 1 (100 %) 1 (17 %) 6 (40 %) 6 (43 %)
95% CI 0% - 98% 12% - 88% 3% - 100% 3% - 100% 0% - 64% 16% - 68% 18% - 71%
CAP & MGH N participants colonized by week 5 N = 1 N = 15 N = 8 N = 8 N = 16 N = 48 N = 47
n (%) participants with LBP strain detected at week 12 0 (0 %) 9 (60 %) 4 (50 %) 5 (62 %) 5 (31 %) 23 (48 %) 23 (49 %)
95% CI 0% - 98% 32% - 84% 16% - 84% 24% - 91% 11% - 59% 33% - 63% 34% - 64%
Code
g_long_terme_col <- 
  summary_col_still_col |> 
  filter(!(arm %in% c("Pl", "All arms"))) |> 
  arrange(arm) |> 
  mutate(arm = arm |> str_replace_all(" ", "\n") |> fct_inorder()) |> 
  ggplot() +
  aes(x = arm, y = p_col_stil_col, col = site) +
  geom_linerange(
    aes(ymin = CI$lower, ymax = CI$upper), 
    linewidth = 1.5, alpha = 0.5, lineend = "round",
    position = position_dodge(width = 0.5)
    ) +
  geom_point(position = position_dodge(width = 0.5), size = 3) +
  scale_color_manual("Study site", values = c(site_colors, c("CAP & MGH" = "gray30"))) +
  xlab("") + 
  scale_y_continuous(
    name = "% participants colonized by week 5\nwith LBP strain detected at week 12",
    labels = scales::percent_format(accuracy = 1)
  ) +
  theme(
    panel.grid.major.x = element_blank()
    # axis.text.x = element_text(angle = 45, hjust = 1)
  )

g_long_terme_col

Microbiota trajectories

Longitudinal stack relative abundances plot:

Code
rel_abs <- 
  mae[["mg"]] |> 
  as_tibble() |> 
  dplyr::select(
    .feature, .sample, uid, rel_ab, poor_quality_mg_data,
    taxon_label, genus, LBP, strain_origin
    ) 
Code
rel_abs <- 
  rel_abs |> 
  group_by(.feature) |> 
  mutate(max_rel_ab = max(rel_ab, na.rm = TRUE), mean_rel_ab = mean(rel_ab, na.rm = TRUE)) |> 
  ungroup() |> 
  arrange(is.na(LBP), -mean_rel_ab) |> 
  mutate(.feature = .feature |> fct_inorder()) |> 
  mutate(
    taxon = 
      case_when(
        as.numeric(.feature) <= 29 ~ taxon_label,
        (genus == "Lactobacillus") ~ "Other Lactobacillus",
        TRUE ~ "Other non-Lacto."
      )
  ) |> 
  arrange(is.na(LBP), LBP,genus != "Lactobacillus", genus, -mean_rel_ab) |> 
  mutate(taxon = taxon |> fct_inorder() |> fct_relevel("Other Lactobacillus", "Other non-Lacto.", "Other", after = Inf))
Code
rel_abs_agg <- 
  rel_abs |> 
  group_by(uid, taxon, LBP, strain_origin, poor_quality_mg_data) |> 
  summarize(rel_ab = sum(rel_ab), .groups = "drop") |> 
  dplyr::left_join(mae |> colData() |> as_tibble(), by = join_by(uid)) 
Code
tmp <- 
  rel_abs_agg |> 
  mutate(pid_label = ifelse(PP, "", ifelse(mITT, "*", ""))) |> 
  dplyr::select(pid, pid_label) |> 
  distinct() 
pid_label <- setNames(tmp$pid_label, tmp$pid)
rm(tmp)

trajectories_data <- 
  rel_abs_agg |>
  filter(mITT, randomized != 0, PIPV, !poor_quality_mg_data) |>
  group_by(pid) |> 
  mutate(
    tot_LC115 = sum(rel_ab[LBP == "LC115"]),
    tot_LBP = sum(rel_ab[!is.na(LBP)])
    ) |> 
  ungroup() |> 
  arrange(desc(PP), -tot_LC115, -tot_LBP) |> 
  mutate(
    pid = pid |> fct_inorder(),
    visit_label = visit |> as.character() |> fct_reorder(visit |> as.numeric())
    )
Code
g_trajectories <-
  trajectories_data |> 
  ggplot() + 
  aes(y = rel_ab, x = pid, fill = taxon) +
  geom_col() +
  ggh4x::facet_nested(
    rows = vars(visit_label),
    cols = vars(arm, site),
    scales = "free_x", space = "free_x", 
    strip = 
      ggh4x::strip_nested(
        background_x = 
          ggh4x::elem_list_rect(
            fill = c(rep("gray90",5), 
                     rep(c("#FF0000", "#00868B"), 5))
          ), 
        text_x = 
          ggh4x::elem_list_text(
            face = rep("bold", 15),
            colour = c(rep("black", 5), rep("white", 5), "transparent", "white", "transparent", rep("white", 3))
            )              
      )
  ) +
  scale_fill_manual(
    "LBP strain or taxon",
    values = get_taxa_colors(rel_abs_agg$taxon |> levels()),
    labels = get_taxa_labels(rel_abs_agg$taxon |> levels()),
    guide = guide_legend(ncol = 1)
    ) +
  scale_y_continuous(
    "Relative abundance", labels = scales::percent,
    breaks = seq(0, 0.5, by = 0.5), minor_breaks = seq(0, 1, by = 0.25)
  ) +
  scale_x_discrete("Participants", labels = pid_label) +
  theme(
    strip.text.x = element_text(hjust = 0.5),
    axis.text.x = element_text(hjust = 0.5, vjust = 1, size = 15, margin = margin(t = -1)), 
    axis.ticks.x = element_blank(),
    legend.text = element_markdown(),
    legend.key.height = unit(15, "pt")
  )  
Code
g_trajectories 

Microbiota trajectories of mITT participants in the VIBRANT study. The relative abundances of the LBP strains and top taxa are shown. Participants are ordered by their total relative abundance of LBP strains, and the study weeks are shown in the rows. Stars at the bottom of trajectories indicates participant that were not part of the per-protocol (PP) population.
Code
# checking the data for the placebo participant that has the LBP strains at MGH

mae[["mg"]] |> 
  as_tibble() |> 
  dplyr::left_join(mae@colData |> as_tibble()) |> 
  filter(pid == "068100062", study_week == 5) 


mae[["qPCR"]] |> 
  as_tibble() |> 
  dplyr::left_join(mae@colData |> as_tibble()) |> 
  filter(pid == "068100062", study_week == 5) 

mae[["amplicon"]] |> 
  as_tibble() |> 
  dplyr::left_join(mae@colData |> as_tibble()) |> 
  filter(pid == "068100062", study_week == 5) |> 
  dplyr::select(.feature, .sample, rel_ab, exclude_sample)


mae[["amplicon"]] |> 
  as_tibble() |> 
  dplyr::left_join(mae@colData |> as_tibble()) |> 
  filter(pid == "068100062", .feature == "Lactobacillus_crispatus") |> 
  dplyr::select(.feature, .sample, visit_code, rel_ab, exclude_sample)

Individual LBP strains by sites

Relative abundance of each strain at each visit as estimated by metagenomic sequencing. Strains are organized according to geographical origin, and product(s) in which they are included.

Code
LBP_strains <- 
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(!is.na(LBP)) |> 
  dplyr::select(.feature, .sample, uid, rel_ab, LBP, strain_origin, poor_quality_mg_data) |> 
  left_join(mae |> colData() |> as_tibble()) |> 
  filter(mITT, !poor_quality_mg_data, randomized)
Code
LBP_strains |> 
   mutate(
    visit_label = visit |> as.character() |> fct_reorder(visit |> as.numeric())
    ) |> 
  filter(!is.na(arm), !is.na(study_week), study_week >= 2) |> 
  ggplot() +
  aes(x = .feature, y = rel_ab, col = site, fill = site) +
  ggbeeswarm::geom_quasirandom(alpha = 0.5, size = 0.5, dodge.width = 0.5) +
  ggh4x::facet_nested(
    rows = vars(visit_label),
    cols = vars(LBP, strain_origin),
    scales = "free_x", space = "free_x") + 
  xlab("LBP strain") +
  scale_y_continuous(
    "Relative abundance", labels = scales::percent,
    breaks = seq(0, 1, by = 0.2)
  ) +
  scale_color_manual("Study\nsite", values = site_colors) +
  scale_fill_manual("Study\nsite", values = site_colors) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  ) 

Code
g_relative_abundance <- 
  LBP_strains |> 
  mutate(
    visit_label = visit |> as.character() |> fct_reorder(visit |> as.numeric()), 
    strain_origin = recode(strain_origin, "SA" = "ZA strains", "US" = "US strains")
    ) |>
  filter(!is.na(arm), !is.na(study_week), study_week >= 2) |> 
  ggplot() +
  aes(x = .feature, y = rel_ab, col = site, fill = site) +
  geom_boxplot(outlier.size = 0.7, alpha = 0.4) +
  ggh4x::facet_nested(
    rows = vars(visit_label),
    cols = vars(LBP, strain_origin),
    scales = "free_x", space = "free_x") +
  xlab("") +
  scale_y_continuous(
    "Relative abundance", labels = scales::percent,
    breaks = seq(0, 1, by = 0.2)
  ) +
  scale_color_manual(values = site_colors) +
  scale_fill_manual(values = site_colors) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  ) 
g_relative_abundance

Relative abundances of individual LBP strains estimated by metagenomic sequencing across study weeks, stratified by site, strain origin, and strain identity. Each line represents the longitudinal trajectory of one participant; points show visit-specific measurements.

Code
LBP_strains |> 
  filter(!is.na(arm), !is.na(study_week), study_week >= 2) |> 
  mutate(
   strain_origin = recode(strain_origin, "SA" = "ZA strains", "US" = "US strains")
  ) |> 
  ggplot() +
  aes(x = study_week, y = rel_ab, col = strain_origin) +
  geom_point(alpha = 0.5, size = 0.5) +
  geom_line(aes(group = pid), alpha = 0.5) +
  ggh4x::facet_nested(
    rows = vars(site),
    cols = vars(LBP, strain_origin, .feature),
    scales = "free_x", space = "free_x") +
  xlab("Study week") +
  scale_y_continuous(
    "Relative abundance bacteria", labels = scales::percent,
    breaks = seq(0, 1, by = 0.2)
  ) +
  scale_color_manual("Strain origin", values = strain_origin_colors) +
  scale_x_continuous("Study weeks", breaks = c(0:5,7,9,12), minor_breaks = 0:12) +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
    legend.position = "bottom"
  ) 

Proportion colonized among exposed

We estimated the proportion of exposed participants who ever colonized with each LBP strain during follow-up (visits after the first 2 weeks), based on metagenomic relative abundances.

Colonization was defined at the participant level as detection of a given strain at ≥5% relative abundance in at least one post-week 2 sample. Placebo participants were excluded, since they were not exposed to any of the strains.

For LC115 strains, only participants in the LC115 arm were considered exposed, while for LC106 strains all participants in active arms were considered exposed. For each strain and site, we calculated the number of exposed participants, the number who colonized, the corresponding proportion, and 95% confidence intervals using Wilson’s method.

Code
tmp <- 
  # we use the metagenomics data for this
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(!is.na(LBP), !poor_quality_mg_data) |> 
  dplyr::select(.feature, LBP, strain_origin, .sample, uid, rel_ab) |> 
  # add the arm, mITT, etc.
  left_join(
    mae |> colData() |> as_tibble() |> dplyr::select(uid, pid, site, randomized, mITT, arm, visit_code, visit, PIPV), 
    by = join_by(uid)
  ) |>
  # only consider the mITT
  filter(randomized, mITT) |> 
  filter(PIPV, as.numeric(visit_code) >= 1200) |> # only include visits after the first 2 weeks
  # define the outcome of interest for each person and strain
  group_by(.feature, LBP, strain_origin, pid, arm, site) |>
  summarize(
    ever_colonized = any(rel_ab >= 0.05, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  # exclude Placebo participants because, they were, in theory, not exposed to any of the strains.
  filter(arm != "Pl") |> 
  # define the total number of participants exposed for each strain.
  # For LC115 strains, this is only participants of arm LC115
  # For LC106 strains, this is participants from all active arms.
  mutate(
    exposed = case_when(
      LBP == "LC115" ~ (arm == "LC115"),
      LBP == "LC106 & LC115" ~ TRUE,
      TRUE ~ FALSE
    )
  ) |> 
  # we compute the statistics of interest
  group_by(.feature, LBP, strain_origin, site) |> 
  summarize(
    n_exposed = sum(exposed, na.rm = TRUE),
    n_colonized = sum(ever_colonized & exposed, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  mutate(
    prop_colonized = n_colonized / n_exposed,
    CI_low = binom::binom.confint(n_colonized, n_exposed, method = "wilson")$lower,
    CI_high = binom::binom.confint(n_colonized, n_exposed, method = "wilson")$upper
  )
Code
g_colonized_among_exposed <- 
  tmp |> 
  mutate(strain_origin = recode(strain_origin, "SA" = "ZA strains", "US" = "US strains")) |>
  ggplot() +
  aes(x = .feature, y = prop_colonized, col = site) +
   ggh4x::facet_nested(
    cols = vars(LBP, strain_origin),
    scales = "free_x", space = "free_x") +
  geom_errorbar(
    aes(ymin = CI_low, ymax = CI_high), 
    linewidth = 1, alpha = 0.7,
    width = 0.3, position = position_dodge(width = 0.5)
    ) +
  geom_point(position = position_dodge(width = 0.5), size = 2) +
  scale_color_manual("Study site", values = site_colors) +
  xlab("") + 
  ylab("Proportion of participants colonized\namong those exposed") +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

g_colonized_among_exposed

Code
tmp |> 
  mutate(
    strain_origin = recode(strain_origin, "SA" = "ZA strains", "US" = "US strains"),
    CI = str_c(
      "[", 
      scales::percent(CI_low, accuracy = 1), 
      " - ", 
      scales::percent(CI_high, accuracy = 1), 
      "]"
    ),
    value = 
      str_c(
        scales::percent(prop_colonized, accuracy = 1), 
        " (", n_colonized, "/", n_exposed,")<br>",
       CI
      ),
    strain_info = str_c(LBP, "<br>(", strain_origin, " strains)")
  ) |> 
  dplyr::select(-n_exposed, -n_colonized, -prop_colonized, -CI_low, -CI_high, -CI, -LBP, -strain_origin) |> 
  pivot_wider(
    names_from = c(site), 
    values_from = value
  ) |>
  arrange(strain_info) |> 
  group_by(strain_info) |> 
  gt(
    row_group_as_column = TRUE, 
    process_md = TRUE,
    caption = "Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure."
  ) |> 
  cols_label(.feature = "LBP strain") |> 
  cols_align(columns = .feature, align = "left") |> 
  cols_align(columns = c("CAP", "MGH"), align = "center") |>
  fmt_markdown(columns = everything()) 
Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure.
LBP strain CAP MGH
LC106 & LC115
(US strains strains)
C0022A1 25% (14/56)
[16% - 38%]
47% (7/15)
[25% - 70%]
C0059E1 61% (34/56)
[48% - 72%]
87% (13/15)
[62% - 96%]
C0175A1 39% (22/56)
[28% - 52%]
67% (10/15)
[42% - 85%]
LC106 & LC115
(ZA strains strains)
FF00018 7% (4/56)
[3% - 17%]
20% (3/15)
[7% - 45%]
FF00051 30% (17/56)
[20% - 43%]
67% (10/15)
[42% - 85%]
UC101 25% (14/56)
[16% - 38%]
73% (11/15)
[48% - 89%]
LC115
(US strains strains)
C0006A1 50% (7/14)
[27% - 73%]
67% (4/6)
[30% - 90%]
C0028A1 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
C0112A1 64% (9/14)
[39% - 84%]
83% (5/6)
[44% - 97%]
LC115
(ZA strains strains)
122010 57% (8/14)
[33% - 79%]
33% (2/6)
[10% - 70%]
185329 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
FF00004 50% (7/14)
[27% - 73%]
83% (5/6)
[44% - 97%]
FF00064 50% (7/14)
[27% - 73%]
83% (5/6)
[44% - 97%]
FF00072 7% (1/14)
[1% - 31%]
50% (3/6)
[19% - 81%]
UC119 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]

Excluding immediate post-product visit

Same analyze as above but excluding the week 2 visit for the LC106-7 and LC115 arms, since this visit is immediately after product use.

Code
tmp <- 
  # we use the metagenomics data for this
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(!is.na(LBP), !poor_quality_mg_data) |> 
  dplyr::select(.feature, LBP, strain_origin, .sample, uid, rel_ab) |> 
  # add the arm, mITT, etc.
  left_join(
    mae |> colData() |> as_tibble() |> 
      dplyr::select(uid, pid, site, randomized, mITT, arm, visit_code, visit, PIPV), 
    by = join_by(uid)
  ) |>
  # only consider the mITT
  filter(randomized, mITT) |> 
  mutate(
    include_visit = 
      case_when(
        (arm %in% c("LC106-7", "LC115")) & as.numeric(visit_code) > 1200 ~ TRUE,
        (arm %in% c("LC106-3", "LC106-o")) & as.numeric(visit_code) >= 1200 ~TRUE,
        TRUE ~ FALSE
      )
  ) |> 
  filter(PIPV, include_visit) |> # only include visits after the first 2 weeks
  # define the outcome of interest for each person and strain
  group_by(.feature, LBP, strain_origin, pid, arm, site) |>
  summarize(
    ever_colonized = any(rel_ab >= 0.05, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  # exclude Placebo participants because, they were, in theory, not exposed to any of the strains.
  filter(arm != "Pl") |> 
  # define the total number of participants exposed for each strain.
  # For LC115 strains, this is only participants of arm LC115
  # For LC106 strains, this is participants from all active arms.
  mutate(
    exposed = case_when(
      LBP == "LC115" ~ (arm == "LC115"),
      LBP == "LC106 & LC115" ~ TRUE,
      TRUE ~ FALSE
    )
  ) |> 
  # we compute the statistics of interest
  group_by(.feature, LBP, strain_origin, site) |> 
  summarize(
    n_exposed = sum(exposed, na.rm = TRUE),
    n_colonized = sum(ever_colonized & exposed, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  mutate(
    prop_colonized = n_colonized / n_exposed,
    CI_low = binom::binom.confint(n_colonized, n_exposed, method = "wilson")$lower,
    CI_high = binom::binom.confint(n_colonized, n_exposed, method = "wilson")$upper
  )
Code
g_colonized_among_exposed_supp <- 
  tmp |> 
  mutate(strain_origin = recode(strain_origin, "SA" = "ZA strains", "US" = "US strains")) |>
  ggplot() +
  aes(x = .feature, y = prop_colonized, col = site) +
  facet_grid(. ~ LBP + strain_origin, scales = "free", space = "free") +
  geom_errorbar(
    aes(ymin = CI_low, ymax = CI_high), 
    linewidth = 1, alpha = 0.7,
    width = 0.3, position = position_dodge(width = 0.5)
    ) +
  geom_point(position = position_dodge(width = 0.5), size = 2) +
  scale_color_manual("Study site", values = site_colors) +
  xlab("LBP strain") + 
  ylab("Proportion of participants colonized\namong those exposed") +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

g_colonized_among_exposed_supp

Code
tmp |> 
  mutate(
    strain_origin = recode(strain_origin, "SA" = "ZA strains", "US" = "US strains"),
    CI = str_c(
      "[", 
      scales::percent(CI_low, accuracy = 1), 
      " - ", 
      scales::percent(CI_high, accuracy = 1), 
      "]"
    ),
    value = 
      str_c(
        scales::percent(prop_colonized, accuracy = 1), 
        " (", n_colonized, "/", n_exposed,")<br>",
       CI
      ),
    strain_info = str_c(LBP, "<br>(", strain_origin, " strains)")
  ) |> 
  dplyr::select(-n_exposed, -n_colonized, -prop_colonized, -CI_low, -CI_high, -CI, -LBP, -strain_origin) |> 
  pivot_wider(
    names_from = c(site), 
    values_from = value
  ) |>
  arrange(strain_info) |> 
  group_by(strain_info) |> 
  gt(
    row_group_as_column = TRUE, 
    process_md = TRUE,
    caption = "Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure but excluding immediate post-product visit (i.e., excluding week 2 visit for the LC106-7 and LC115 arms)."
  ) |> 
  cols_label(.feature = "LBP strain") |> 
  cols_align(columns = .feature, align = "left") |> 
  cols_align(columns = c("CAP", "MGH"), align = "center") |>
  fmt_markdown(columns = everything()) 
Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure but excluding immediate post-product visit (i.e., excluding week 2 visit for the LC106-7 and LC115 arms).
LBP strain CAP MGH
LC106 & LC115
(US strains strains)
C0022A1 21% (12/56)
[13% - 34%]
33% (5/15)
[15% - 58%]
C0059E1 43% (24/56)
[31% - 56%]
40% (6/15)
[20% - 64%]
C0175A1 39% (22/56)
[28% - 52%]
67% (10/15)
[42% - 85%]
LC106 & LC115
(ZA strains strains)
FF00018 2% (1/56)
[0% - 9%]
7% (1/15)
[1% - 30%]
FF00051 9% (5/56)
[4% - 19%]
20% (3/15)
[7% - 45%]
UC101 2% (1/56)
[0% - 9%]
7% (1/15)
[1% - 30%]
LC115
(US strains strains)
C0006A1 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
C0028A1 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
C0112A1 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
LC115
(ZA strains strains)
122010 36% (5/14)
[16% - 61%]
17% (1/6)
[3% - 56%]
185329 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
FF00004 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
FF00064 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
FF00072 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
UC119 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]

Supplementary analyses

Applicator staining

We analyze returned applicators and staining results to assess adherence to study product use. We use data from participant_crfs_merged stored in the @metadata slot of the mae object, which contains data from CRF23 (returned applicators) and CRF46 (staining results). We also use the exposures table to obtain information on study product use.

From these, we create the applicator table, which includes only participants from the mITT population.

Code
tmp <- 
   metadata(mae)$participant_crfs_merged |> 
   as_tibble() |> 
   select(pid, used_applicators, number_applicators, applicator_stain_positive, applicator_stain_negative, applicator_stain_indeterminant, total_applicators_stained, proportion_positive_applicators, total) |> 
   dplyr::rename(
     returned_used_applicators = used_applicators,
     number_of_returned_applicators = number_applicators
   )

study_product <- 
  mae@colData |> 
  as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae@metadata$calc_vars_pid |> select(pid, n_product_doses),  
    by = "pid"
  )

applicator <- 
  mae@colData |> as_tibble() |>
  select(pid, site, randomized, arm, ITT, mITT, PP) |>
  distinct() |>
  filter(randomized, mITT) |>
  left_join(tmp, by = c("pid")) |>
  left_join(study_product, by = "pid") 

We note that two participants returned eight applicators. In both cases, comments explain that they had received a spare applicator:

  • Participant 068-20-0425 reported taking the spare applicator out of its wrapping and placing it in the return bag without using or washing any applicators.

  • Participant 068-20-0439 reported inserting the spare applicator on the day of the visit, thinking all applicators had to be used, but also did not wash any of them.

Returned applicator statistics

Here, we want to document the proportions of mITT participants who

  • returned any used applicators

  • returned their complete set of applicators (i.e., 7 or 8 applicators)

Code
applicator <- 
  applicator |> 
  mutate(
    returned_complete_set = (number_of_returned_applicators >= 7),
    return_category = case_when(
      (returned_used_applicators == "Yes") & returned_complete_set ~ "Returned complete set",
      (returned_used_applicators == "Yes") & !returned_complete_set ~ "Returned incomplete set",
      returned_used_applicators == "No" ~ "Did not return applicators",
      TRUE ~ "Missing"
    ) |> fct_infreq()
  ) 

applicator |> 
  dplyr::count(return_category) |> 
  mutate(`%` = round(n / sum(n), 3) * 100) |> 
  gt() |> 
  cols_label(return_category = "")
n %
Returned complete set 77 85.6
Returned incomplete set 7 7.8
Did not return applicators 6 6.7

Staining statistics

Here, we want to document, for those who returned their complete set of applicators, how many got their full set stained.

Code
applicator <- 
  applicator |> 
  mutate(
    staining_done_category = 
      case_when(
        (number_of_returned_applicators == 0) | is.na(number_of_returned_applicators) ~ "No applicators returned",
        is.na(total_applicators_stained) ~ "No staining done",
        total_applicators_stained == number_of_returned_applicators ~ "All returned applicators stained",
        total_applicators_stained < number_of_returned_applicators ~ "Partial staining",
        total_applicators_stained > number_of_returned_applicators ~ "???",
        TRUE ~ "???"
      )
  )
Code
applicator |> 
  dplyr::count(return_category, staining_done_category) |> 
  group_by(return_category) |>
  mutate(
    `%` = round(n / sum(n), 3) * 100
  ) |>
  gt(row_group_as_column = TRUE)
staining_done_category n %
Returned complete set All returned applicators stained 77 100.0
Returned incomplete set All returned applicators stained 6 85.7
Partial staining 1 14.3
Did not return applicators No applicators returned 6 100.0
Code
applicator |> 
  dplyr::count(return_category, staining_done_category) |> 
  group_by(return_category) |>
  mutate(
    `%` = round(n / sum(n), 3) * 100
  ) |>
  gt(row_group_as_column = TRUE)
staining_done_category n %
Returned complete set All returned applicators stained 77 100.0
Returned incomplete set All returned applicators stained 6 85.7
Partial staining 1 14.3
Did not return applicators No applicators returned 6 100.0

Positive staining statistics

Here, we document, for the mITT participants who returned their complete set and got all of their applicators stained, the number of positively stained applicators.

Code
applicator |> 
  mutate(
    positive_staining_category = 
      ifelse(
      applicator_stain_positive == total_applicators_stained,
      "All stained applicators are positive",
      "Some stained applicators are not positive"
      )
  ) |> 
  dplyr::count(return_category, staining_done_category, positive_staining_category) |> 
  group_by(return_category, staining_done_category) |> 
  mutate(`%` = round(n / sum(n), 3) * 100) |>
  gt(row_group_as_column = TRUE) |>
  cols_label(
    positive_staining_category = "Staining result"
  )
Staining result n %
Returned complete set - All returned applicators stained All stained applicators are positive 61 79.2
Some stained applicators are not positive 16 20.8
Returned incomplete set - All returned applicators stained All stained applicators are positive 5 83.3
Some stained applicators are not positive 1 16.7
Returned incomplete set - Partial staining All stained applicators are positive 1 100.0
Did not return applicators - No applicators returned NA 6 100.0
Code
applicator |> 
  filter(
    return_category == "Returned complete set",
    staining_done_category == "All returned applicators stained"
  ) |> 
  dplyr::count(
    return_category,
    proportion_positive_applicators, 
    str_c(100*proportion_positive_applicators, "%")
  ) |> 
  mutate(`%` = round(n / sum(n), 3) * 100) |>
  select(-proportion_positive_applicators) |> 
  group_by(return_category) |> 
  gt(row_group_as_column = TRUE) |> 
  cols_label(
    `str_c(100 * proportion_positive_applicators, "%")` = "% of positive applicators"
  )
% of positive applicators n %
Returned complete set 0% 1 1.3
57% 1 1.3
71% 5 6.5
86% 8 10.4
88% 1 1.3
100% 61 79.2
Code
applicator |> 
  filter(
    return_category == "Returned complete set",
    staining_done_category == "All returned applicators stained"
  ) |> 
  mutate(
    number_of_non_positive_applicators = 
      total_applicators_stained - applicator_stain_positive
  ) |> 
  dplyr::count(
    return_category, number_of_non_positive_applicators
  ) |> 
  mutate(`%` = round(n / sum(n), 3) * 100) |>
  group_by(return_category) |> 
  gt(row_group_as_column = TRUE) |> 
  cols_label(
    `number_of_non_positive_applicators` = "n non-positive"
  )
n non-positive n %
Returned complete set 0 61 79.2
1 9 11.7
2 5 6.5
3 1 1.3
7 1 1.3

Comparison with declared number of doses taken

Code
applicator |> 
  filter(
    return_category == "Returned complete set",
    staining_done_category == "All returned applicators stained"
  ) |>
  mutate(
    applicator_stain_positive_fct = applicator_stain_positive |> as.factor() |> fct_expand("Missing") |> replace_na("Missing"),
    n_product_doses_fct = n_product_doses |> as.factor()
  ) |>
  dplyr::count(
    n_product_doses, n_product_doses_fct, 
    applicator_stain_positive, applicator_stain_positive_fct
    ) |>
  mutate(
    label = str_c(n, " (", round(100 * n / sum(n)), "%)") 
  ) |> 
  ggplot() + 
  aes(
    x = n_product_doses_fct , 
    y = applicator_stain_positive_fct, 
    fill = applicator_stain_positive >= n_product_doses,
    alpha = n
  ) +
  geom_tile() +
  geom_label(aes(label = label), size = 3) +
  scale_fill_manual(na.value = "gray70", values = c("tomato", "steelblue2")) +
  # scale_fill_gradient(low = "lightblue1", high = "lightblue4") +
  guides(fill = "none", alpha = "none") +
  xlab("Number of doses taken (self-report)") +
  ylab("Number of positive applicators") +
  labs(
    caption = "Participants who returned their complete set of applicators were included in this analysis"
  ) 

Interval between last dose and week 2 clinic visit in active arms

Code
tmp <- 
  mae@metadata$calc_vars_pid |>
  left_join(
    mae@colData |>
      as.data.frame() |> 
      select(pid, site, arm, randomized, mITT) |> 
      distinct(),
    by = join_by(pid)
  ) |> 
  filter(randomized, mITT) 

tmp |> 
  filter(arm != "Pl") |> 
  dplyr::count(site, arm, last_dose_at_W2) |> 
  pivot_wider(
    id_cols = c(site, last_dose_at_W2), 
    names_from = arm,
    values_from = n, values_fill = 0
  ) |> 
  group_by(site) |> 
  gt() 
last_dose_at_W2 LC106-7 LC106-3 LC106-o LC115
CAP
same day 1 0 0 0
day before 10 12 0 13
more than one day 3 2 14 1
MGH
same day 1 0 0 0
day before 5 1 0 5
more than one day 1 0 1 1
Code
tmp |> 
  filter(arm != "Pl") |>
  dplyr::count(last_dose_at_W2) |>
  gt()
last_dose_at_W2 n
same day 2
day before 46
more than one day 23
Code
rm(tmp)

Dominance of LBP strains at visits remote from dosing

To assess whether colonization typically reflected dominance, we examined the composition of samples from participants in active arms who achieved the primary outcome at visits conducted after product use (weeks 3–5 for LC106-7 and LC115; weeks 2–5 for LC106-o and LC106-3).

For each colonized sample, we calculated the total relative abundance of LBP strains based on metagenomic data. We then determined the proportion of samples in which LBP strains accounted for at least 50% of the community.

Code
col_at <- 
  mae@colData |> as_tibble() |>
  select(pid, site, randomized, arm, ITT, mITT, PP) |>
  distinct() |>
  filter(randomized, mITT, arm != "Pl") |>
  left_join(
    mae[["col_LBP_mg"]] |>
      as_tibble() |>
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      )
  ) |>
  dplyr::filter(
    .feature == "colonized_LBP_at_mg" &
      (arm %in% c("LC106-7", "LC115") & visit_code %in% c("1300", "1400", "1500") |
       arm %in% c("LC106-o", "LC106-3") & visit_code %in% c("1200", "1300", "1400", "1500")
      )
  ) |>
  filter(outcome) # only primary outcome achieved 

col_at |> 
  left_join(mae[["mg"]] |> as_tibble(), by = ".sample") |> 
  filter(!is.na(LBP)) |> # only LBP strain
  group_by(.sample) |> 
  summarise(
    prop_tot_LBP = sum(rel_ab, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  dplyr::count(prop_tot_LBP >= 0.5) |> 
  mutate(prop = 100 * round(n / sum(n), 3)) |> 
  gt() |> 
  cols_label(
    `prop_tot_LBP >= 0.5` = "LBP strains ≥ 50% total rel. ab",
    prop = "%"
  )
LBP strains ≥ 50% total rel. ab n %
FALSE 13 13.3
TRUE 85 86.7

LBP colonization in participants with BV post-MTZ

We assessed treatment response by Nugent score at the first visit following oral metronidazole treatment (visit 1100). Nugent scores were extracted from visits_crfs_merged and a binary variable was created to classify participants as BV positive (nugent_total_score ≥ 7) or BV negative (< 7).

We then estimated the number and proportion of participants who remained BV positive post-metronidazole, overall and by treatment arm.

Code
# nugent score for all visits 

nugent_score <- 
  mae@colData |> 
  as_tibble() |> 
  select(uid, pid, visit_code, study_day, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    metadata(mae)$visits_crfs_merged |> 
      as_tibble() |> 
      select(pid, visit_code, nugent_total_score),
    by = c("pid", "visit_code")
  ) |> 
  # some pid x visit are duplicate but there is no data for the nugent score, so we remove the duplicate visit with no data for nugent score
  group_by(pid, visit_code) |> 
  arrange(is.na(nugent_total_score)) |>  
  dplyr::slice(1) |>  
  ungroup() |>
  group_by(visit_code) |> 
  mutate(any_Nugent = any(!is.na(nugent_total_score))) |> 
  ungroup() |> 
  filter(any_Nugent) |> 
  group_by(pid) |>
  mutate(
    active_arm = arm != "Pl",
    nugent_positive = nugent_total_score >= 7
  ) |> 
  ungroup()
Code
nugent_score |> 
  ggplot()+ 
  aes(x = visit_code, y = pid, fill = nugent_total_score) +
  geom_tile() +
  scale_fill_gradient(
    name = "Nugent score",
    low = "lightblue1", high = "red3"
  ) + 
  labs(x = "Visit Code", y = "Participant ID") + 
  theme(axis.text.y = element_text(size = 4))

Code
nugent_score |>
  filter(visit_code == "1100") |> 
  dplyr::count(nugent_positive) |> 
  gt(caption = "Number of mITT participant with BV (nugent_positive == TRUE) post-MTZ (visit 1100)")
Number of mITT participant with BV (nugent_positive == TRUE) post-MTZ (visit 1100)
nugent_positive n
FALSE 68
TRUE 18
NA 4
Code
nugent_score |>
  filter(visit_code == "1100") |> 
  dplyr::count(active_arm, nugent_positive) |> 
  gt(caption = "Number of mITT participant with BV (nugent_positive == TRUE) post-MTZ (visit 1100) by active arm")
Number of mITT participant with BV (nugent_positive == TRUE) post-MTZ (visit 1100) by active arm
active_arm nugent_positive n
FALSE FALSE 11
FALSE TRUE 6
FALSE NA 2
TRUE FALSE 57
TRUE TRUE 12
TRUE NA 2

At visit 1100, among participants with BV who were exposed to the LBP (i.e., in the active arm), 5 participants had a detectable LBP strain identified after treatment by metagenomics.

Code
# among the 12 of these people exposed to LBP

tmp <- 
  nugent_score |>
  filter(visit_code == "1100") |> 
  filter(active_arm) |>
  left_join(
    # add mg data to see how many had detectable LBP strain post-MTZ
    mae[["mg"]] |> as_tibble() |>  # add mg data
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |>
      filter(visit_code == "1200", !poor_quality_mg_data)|>
      filter(!is.na(LBP)) |> #only LBP strain
      group_by(pid) |>
      summarise(prop_tot_LBP = sum(rel_ab)) |>
      arrange(prop_tot_LBP |> desc()) |> 
      mutate(LBP_detected_post_INT = prop_tot_LBP >= 0.05),
    by = c("pid")
  )  

tmp |> 
  dplyr::count(nugent_positive, LBP_detected_post_INT) |> 
  gt(caption = "Number of mITT participant in one of the active arms with BV diagnosis post-MTZ (visit 1100) and LBP detection (>= 5% total LBP rel. ab) post-INT (visit 1200)")
Number of mITT participant in one of the active arms with BV diagnosis post-MTZ (visit 1100) and LBP detection (>= 5% total LBP rel. ab) post-INT (visit 1200)
nugent_positive LBP_detected_post_INT n
FALSE FALSE 19
FALSE TRUE 36
FALSE NA 2
TRUE FALSE 6
TRUE TRUE 5
TRUE NA 1
NA TRUE 2

rBV by colonization status

In this section, we are investigating whether LBP colonization is protective of rBV.

We only include participants who were BV-cured post-MTZ, then count the number and proportions of participants who had rBV among those who met the primary outcome and those who did not.

Code
rBV_stats <- 
  # we first define for each mITT participant whether they were BV-cured or not post-MTZ by Nugent
  nugent_score |> 
  filter(visit_code == "1100", mITT) |> 
  select(uid, pid, visit_code, site, mITT, active_arm, arm, nugent_total_score) |> 
  mutate(BV_cure_post_MTZ = nugent_total_score < 7) |> 
  # we then specify for each of these participants whether they met the criteria for the primary endpoint of LBP colonization by week 5
  left_join(
    mae[["col_LBP_mg"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "colonized_LBP_by_mg") |> 
      mutate(LBP_colonization_by_week5 = outcome |> replace_na(FALSE)) |> 
      select(pid, LBP_colonization_by_week5),
    by = join_by(pid)
  ) |> 
  # we next check who had BV (or rBV) by week 12
  left_join(
    nugent_score |> 
      select(pid, visit_code, nugent_positive) |> 
      filter(as.numeric(visit_code) > 1100, as.numeric(visit_code) <= 2120) |> 
      group_by(pid) |> 
      summarize(
        BV_by_week12 = any(nugent_positive, na.rm = TRUE),
        BV_after_week5 = any(nugent_positive & as.numeric(visit_code) > 1500, na.rm = TRUE), # any BV after week 5
        ),
    by = join_by(pid)
  ) |> 
  mutate(
    rBV_by_week12 = BV_by_week12 & BV_cure_post_MTZ # rBV if BV by week 12 and cured post-MTZ
  )
  
  
rBV_stats |> 
  filter(active_arm) |> 
  dplyr::count(active_arm, BV_cure_post_MTZ, LBP_colonization_by_week5, BV_by_week12)
# A tibble: 10 × 5
   active_arm BV_cure_post_MTZ LBP_colonization_by_week5 BV_by_week12     n
   <lgl>      <lgl>            <lgl>                     <lgl>        <int>
 1 TRUE       FALSE            FALSE                     FALSE            1
 2 TRUE       FALSE            FALSE                     TRUE             6
 3 TRUE       FALSE            TRUE                      FALSE            2
 4 TRUE       FALSE            TRUE                      TRUE             3
 5 TRUE       TRUE             FALSE                     FALSE            5
 6 TRUE       TRUE             FALSE                     TRUE            12
 7 TRUE       TRUE             TRUE                      FALSE           27
 8 TRUE       TRUE             TRUE                      TRUE            13
 9 TRUE       NA               TRUE                      FALSE            1
10 TRUE       NA               TRUE                      TRUE             1
Code
rBV_stats |> 
  filter(active_arm, BV_cure_post_MTZ) |> 
  dplyr::count(LBP_colonization_by_week5, BV_by_week12) |> 
  group_by(LBP_colonization_by_week5) |> 
  mutate(
    N = sum(n),
    `%` = round(n / N, 3) * 100,
    CI = binom::binom.confint(n, N, method = "exact"),
    `95% CI` = str_c(round(CI$lower * 100, 1), "%–", round(CI$upper * 100, 1), "%")
  ) |> 
  select(-CI)
# A tibble: 4 × 6
# Groups:   LBP_colonization_by_week5 [2]
  LBP_colonization_by_week5 BV_by_week12     n     N   `%` `95% CI`   
  <lgl>                     <lgl>        <int> <int> <dbl> <chr>      
1 FALSE                     FALSE            5    17  29.4 10.3%–56%  
2 FALSE                     TRUE            12    17  70.6 44%–89.7%  
3 TRUE                      FALSE           27    40  67.5 50.9%–81.4%
4 TRUE                      TRUE            13    40  32.5 18.6%–49.1%
Code
rBV_stats <- 
  # we first define for each mITT participant whether they were BV-cured or not post-MTZ by Nugent
  nugent_score |> 
  select(-any_Nugent, -nugent_positive) |> 
  filter(visit_code == "1100", mITT) |> 
  mutate(BV_cure_post_MTZ = nugent_total_score < 7) |> 
  # we then specify for each of these participants whether they met the criteria for the primary endpoint of LBP colonization by week 5
  left_join(
    mae[["col_LBP_mg"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "colonized_LBP_by_mg") |> 
      mutate(LBP_colonization_by_week5 = outcome |> replace_na(FALSE)) |> 
      select(pid, LBP_colonization_by_week5),
    by = join_by(pid)
  ) |> 
  # we next check who had BV (or rBV) by week 12
  left_join(
    nugent_score |> 
      select(pid, visit_code, nugent_positive) |> 
      filter(as.numeric(visit_code) > 1100, as.numeric(visit_code) <= 2120) |> 
      group_by(pid) |> 
      summarize(
        BV_by_week12 = any(nugent_positive, na.rm = TRUE),
        BV_after_week5 = any(nugent_positive & as.numeric(visit_code) > 1500, na.rm = TRUE), # any BV after week 5
        ),
    by = join_by(pid)
  ) |> 
  mutate(
    rBV_by_week12 = BV_by_week12 & BV_cure_post_MTZ, # rBV if BV by week 12 and cured post-MTZ
  )
  
  
rBV_stats |> filter(active_arm) |> dplyr::count(active_arm, LBP_colonization_by_week5, BV_after_week5)
# A tibble: 4 × 4
  active_arm LBP_colonization_by_week5 BV_after_week5     n
  <lgl>      <lgl>                     <lgl>          <int>
1 TRUE       FALSE                     FALSE             11
2 TRUE       FALSE                     TRUE              13
3 TRUE       TRUE                      FALSE             37
4 TRUE       TRUE                      TRUE              10
Code
col_by_week5 <- 
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_mg"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "colonized_LBP_by_mg")
  ) |> 
  mutate(LBP_colonization_by_week5 = outcome |> replace_na(FALSE)) 


nugent_score |> 
  left_join(col_by_week5 |> select(pid, LBP_colonization_by_week5), 
            by = "pid") |> 
  arrange(pid, visit_code) |>
  group_by(pid, active_arm) |>
  summarise(
    LBP_colonization_by_week5 = any(LBP_colonization_by_week5, na.rm = TRUE),
    BV_recurrence = {
      cured <- cumsum(!nugent_positive & !is.na(nugent_positive)) > 0
      any(nugent_positive & cured, na.rm = TRUE)
    },
    .groups = "drop"
  ) |> 
  dplyr::count(active_arm, LBP_colonization_by_week5, BV_recurrence)
# A tibble: 7 × 4
  active_arm LBP_colonization_by_week5 BV_recurrence     n
  <lgl>      <lgl>                     <lgl>         <int>
1 FALSE      FALSE                     FALSE             9
2 FALSE      FALSE                     TRUE              9
3 FALSE      TRUE                      TRUE              1
4 TRUE       FALSE                     FALSE             8
5 TRUE       FALSE                     TRUE             16
6 TRUE       TRUE                      FALSE            32
7 TRUE       TRUE                      TRUE             15
Code
nugent_score |> 
  filter(visit_code == "1500") |> 
  group_by(active_arm) |> 
  summarise(
    n = n(),
    n_positive = sum(nugent_positive, na.rm = TRUE),
    p = n_positive / n,
    CI = binom::binom.confint(n_positive, n, method = "exact")
  ) |>
  mutate(
    pct = round(p * 100, 1),
    CI_text = str_c("(", round(CI$lower * 100, 1), "%–", round(CI$upper * 100, 1), "%)")
  ) |> 
  select(-CI, -p) |> 
  gt() |> 
  tab_header(
    title = "Proportion of Participants with Nugent Score ≥ 7 at Visit 1500 (week 5)"
  ) |> 
  cols_label(
    n = "N participants",
    n_positive = "n (%) participants with nugent score >= 7",
    pct = "Percentage",
    CI_text = "95% CI"
  )
Proportion of Participants with Nugent Score ≥ 7 at Visit 1500 (week 5)
active_arm N participants n (%) participants with nugent score >= 7 Percentage 95% CI
FALSE 17 13 76.5 (50.1%–93.2%)
TRUE 71 25 35.2 (24.2%–47.5%)
Code
  nugent_score |> 
  filter(visit_code == "2120") |> 
  group_by(active_arm) |> 
  summarise(
    n = n(),
    n_positive = sum(nugent_positive, na.rm = TRUE),
    p = n_positive / n,
    CI = binom::binom.confint(n_positive, n, method = "exact")
  ) |>
  mutate(
    pct = round(p * 100, 1),
    CI_text = str_c("(", round(CI$lower * 100, 1), "%–", round(CI$upper * 100, 1), "%)")
  ) |> 
  select(-CI, -p) |> 
  gt() |> 
  tab_header(
    title = "Proportion of Participants with Nugent Score ≥ 7 at Visit 2120 (week12)"
  ) |> 
  cols_label(
    n = "N participants",
    n_positive = "n (%) participants with nugent score >= 7",
    pct = "Percentage",
    CI_text = "95% CI"
  )
Proportion of Participants with Nugent Score ≥ 7 at Visit 2120 (week12)
active_arm N participants n (%) participants with nugent score >= 7 Percentage 95% CI
FALSE 17 12 70.6 (44%–89.7%)
TRUE 69 23 33.3 (22.4%–45.7%)
Code
nugent_score |> 
  filter(visit_code == "2120") |> 
  left_join(col_by_week5 |> select(pid, LBP_colonization_by_week5), by="pid") |> 
  group_by(LBP_colonization_by_week5) |> 
  summarise(
    n = n(),
    n_positive = sum(nugent_positive, na.rm = TRUE),
    p = n_positive / n,
    CI = binom::binom.confint(n_positive, n, method = "exact")
  ) |>
  mutate(
    pct = round(p * 100, 1),
    CI_text = str_c("(", round(CI$lower * 100, 1), "%–", round(CI$upper * 100, 1), "%)")
  ) |> 
  select(-CI, -p) |> 
  gt() |> 
  tab_header(
    title = "Proportion of Participants with Nugent Score ≥ 7 at Visit 2120 (week12)"
  ) |> 
  cols_label(
    n = "N participants",
    n_positive = "n (%) participants with nugent score >= 7",
    pct = "Percentage",
    CI_text = "95% CI"
  )
Proportion of Participants with Nugent Score ≥ 7 at Visit 2120 (week12)
LBP_colonization_by_week5 N participants n (%) participants with nugent score >= 7 Percentage 95% CI
FALSE 39 25 64.1 (47.2%–78.8%)
TRUE 47 10 21.3 (10.7%–35.7%)

LBP Colonization dominance

We first assessed whether LBP strains, when detected by metagenomic sequencing, typically established dominance in the vaginal microbiota. For each colonized sample, L. crispatus relative abundance was calculated, and dominance was defined as L. crispatus representing more than 50% of the total community.

We then investigated whether dominance by LBP strains was associated with a reduced risk of recurrent BV. Colonization outcomes were linked to Nugent scores collected at follow-up visits, and we compared the prevalence of BV (Nugent ≥ 7) among participants with or without LBP dominance at week 5. Results suggest that participants who achieved LBP strain dominance were less likely to experience recurrent BV.

Code
rel_abs <- 
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(!poor_quality_mg_data) |> 
  left_join(
    mae |> colData() |> as_tibble(), 
    by = join_by(uid)
  ) |>
  filter(randomized, mITT)

rel_abs |> 
  group_by(.sample) |> 
  summarise(
    detected = any(!is.na(LBP) & rel_ab > 0, na.rm = TRUE),
    crispatus_abundance = sum(rel_ab[species == "crispatus"], na.rm = TRUE), 
    crispatus_dominance = ifelse(crispatus_abundance > 0.5, TRUE, FALSE), 
    .groups = "drop"
  ) |> 
  filter(detected) |> 
  dplyr::count(crispatus_dominance, name = "n") |> 
  mutate(
    crispatus_dominance = if_else(crispatus_dominance, "Yes", "No"),
    total = sum(n),
    pct = round(100 * n / total, 1),
    CI = binom::binom.confint(n, total, method = "exact"),
    CI_text = str_c("(", round(CI$lower * 100, 1), "%–", round(CI$upper * 100, 1), "%)")
  ) |> 
  select(crispatus_dominance, n, pct, CI_text) |> 
  gt() |> 
  tab_header(
    title = md("Dominance of *L. crispatus* when LBP strains are detected")
  ) |> 
  cols_label(
    crispatus_dominance = "L. crispatus dominance",
    n = "N samples with LBP detected",
    pct = "Percentage",
    CI_text = "95% CI"
  )
Dominance of L. crispatus when LBP strains are detected
L. crispatus dominance N samples with LBP detected Percentage 95% CI
No 39 17.2 (12.5%–22.7%)
Yes 188 82.8 (77.3%–87.5%)
Code
tmp <- 
  rel_abs |> 
  group_by(.sample) |> 
  summarise(
    detected = any(!is.na(LBP) & rel_ab > 0, na.rm = TRUE),
    crispatus_abundance = sum(rel_ab[species == "crispatus"], na.rm = TRUE), 
    LBP_abundance = sum(rel_ab[!is.na(LBP)], na.rm = TRUE),
    LBP_dominance = ifelse(LBP_abundance>0.5, TRUE, FALSE), 
    crispatus_dominance = ifelse(crispatus_abundance>0.5, TRUE, FALSE), 
    .groups = "drop"
  ) |> 
  left_join(nugent_score |> 
              select(uid, pid, visit_code, nugent_total_score, nugent_positive), 
            by = c(".sample"="uid")) 

tmp |>  
  filter(!is.na(visit_code)) |> 
  ggplot() + 
  aes(x=visit_code, y = pid, 
      shape = nugent_positive |> as.character(), 
      col = LBP_dominance |> as.character()) + 
  geom_point() + 
  scale_shape_manual(
    name = "Nugent Postive", 
    values = c("TRUE" = 16, "FALSE" = 1)
  ) + 
  scale_color_manual(
    name = "LPB Dominance", 
    values = c("TRUE" = "green3", "FALSE" = "orchid2" )
  ) + 
  labs(x = "Visit code", y = "Pid") + 
  theme(axis.text.y = element_text(size = 4))

Code
tmp |> 
  filter(!is.na(LBP_dominance), !is.na(nugent_positive)) |> 
  filter(visit_code == "1500") |> 
  dplyr::count(LBP_dominance, nugent_positive) |> 
  group_by(LBP_dominance) |> 
  mutate(
    pct = round(100 * n / sum(n), 1)
  ) |> 
  ungroup() |> 
  gt()
LBP_dominance nugent_positive n pct
FALSE FALSE 25 40.3
FALSE TRUE 37 59.7
TRUE FALSE 23 95.8
TRUE TRUE 1 4.2

LBP detectable at week 12

To evaluate the durability of colonization, we assessed the presence of LBP strains at the week 12 visit (visit 2120) among participants in the active arms. Detection was defined by metagenomic sequencing as the presence of any LBP strain with non-zero relative abundance.

Code
rel_abs |> 
  filter(visit_code == "2120") |> 
  group_by(.sample) |> 
  summarise(detected = any(!is.na(LBP) & rel_ab > 0, na.rm = TRUE)) |> 
  ungroup() |> 
  dplyr::count(detected) |>
  mutate(
    pct = round(100 * n / sum(n), 1)
  ) |> 
  gt()
detected n pct
FALSE 61 71.8
TRUE 24 28.2

LBP detection at week 3 and 12 by qPCR

As a complementary and more sensitive assay, we assessed LBP strain detection by qPCR. Detection was defined as the concurrent observation of Cq values for at least two LBP strains at the same visit.

Code
detected_at_week2_qPCR <-
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_qPCR"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1200", .feature == "LBP_detected_at_qpcr")
  ) |> 
  mutate(LBP_detected_at_week2 = outcome |> replace_na(FALSE)) 
Code
detected_at_week2_qPCR |> 
  filter(arm != "Pl") |> 
  dplyr::count(LBP_detected_at_week2) |> 
  gt()
LBP_detected_at_week2 n
FALSE 25
TRUE 46
Code
detected_at_week12_qPCR <-
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_qPCR"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "2120", .feature == "LBP_detected_at_qpcr")
  ) |> 
  mutate(LBP_detected_at_week12 = outcome |> replace_na(FALSE)) 
Code
detected_at_week12_qPCR |> 
  filter(arm != "Pl") |> 
  dplyr::count(LBP_detected_at_week12) |> 
  gt()
LBP_detected_at_week12 n
FALSE 51
TRUE 20

LBP detection week 3 (1300) and week 4_12 - qPCR

LBP detection was assessed by qPCR, defined as the presence of Cq values for at least two LBP strains at the same visit. We computed the proportion of participants with detectable LBP strains at week 3 (visit 1300, one week after the end of dosing) and at week 9 (visit 1900, after 7 weeks off product), both overall and excluding placebo participants.

Code
# one week after LBP = 1300

detected_at_week3_qPCR <-
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_qPCR"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1300", .feature == "LBP_detected_at_qpcr")
  ) |> 
  mutate(LBP_detected_at_week3 = outcome |> replace_na(FALSE)) 

detected_at_week3_qPCR |> 
  filter(arm != "Pl") |> 
  dplyr::count(LBP_detected_at_week3) |> 
  mutate(p = n/sum(n) * 100) |> 
  gt(caption = "Without placebo")
Without placebo
LBP_detected_at_week3 n p
FALSE 36 50.70423
TRUE 35 49.29577
Code
detected_at_week3_qPCR |> 
  dplyr::count(LBP_detected_at_week3) |>
  mutate(p = n/sum(n) * 100) |> 
  gt(caption = "With placebo")
With placebo
LBP_detected_at_week3 n p
FALSE 53 58.88889
TRUE 37 41.11111
Code
detected_at_week9_qPCR <-
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_qPCR"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1900", .feature == "LBP_detected_at_qpcr")
  ) |> 
  mutate(LBP_detected_at_week9 = outcome |> replace_na(FALSE)) 

detected_at_week9_qPCR |> 
  filter(arm != "Pl") |> 
  dplyr::count(LBP_detected_at_week9) |> 
  mutate(p = n/sum(n) * 100) |> 
  gt(caption = "Without placebo")
Without placebo
LBP_detected_at_week9 n p
FALSE 55 77.46479
TRUE 16 22.53521
Code
detected_at_week9_qPCR |> 
  dplyr::count(LBP_detected_at_week9) |> 
  mutate(p = n/sum(n) * 100) |> 
  gt(caption = "With placebo")
With placebo
LBP_detected_at_week9 n p
FALSE 74 82.22222
TRUE 16 17.77778

Effective LBP exposure from daily qPCR data during dosing period

Here we look at the proportion of days exposed by qPCR (daily home swabs), among expected dosing days based on participants’ arm.

Code
eff_exp_qPCR <- 
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |>
  left_join(
    mae@metadata$calc_vars_pid |> 
      select(pid, pid_exposed_by_qPCR, n_days_exposed_by_qPCR, n_days_exp_dosed),
    by = join_by(pid)
  ) |>
  mutate(
    prop_days_exposed = 
      case_when(
        n_days_exp_dosed > 0 ~ n_days_exposed_by_qPCR / n_days_exp_dosed,
        TRUE ~ 0
      )
  )
Code
eff_exp_qPCR |> 
  filter(mITT) |> 
  group_by(arm, site) |> 
  summarize(
    N = n(),
    n = sum(pid_exposed_by_qPCR, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  mutate(
    value = str_c(n, " / ", N, " (", round(100*n/N, 1), "%)")
  ) |>  
  select(-n, -N) |>
  pivot_wider(names_from = arm, values_from = value) |> 
  gt(caption = "Participants exposed by qPCR (daily home swabs): n/N (%)") 
Participants exposed by qPCR (daily home swabs): n/N (%)
site Pl LC106-7 LC106-3 LC106-o LC115
CAP 1 / 14 (7.1%) 13 / 14 (92.9%) 13 / 14 (92.9%) 13 / 14 (92.9%) 11 / 14 (78.6%)
MGH 0 / 5 (0%) 6 / 7 (85.7%) 1 / 1 (100%) 1 / 1 (100%) 6 / 6 (100%)
Code
bind_rows(
  eff_exp_qPCR |> filter(arm != "Pl"),
  eff_exp_qPCR |> filter(arm != "Pl") |> mutate(site = "both"),
  eff_exp_qPCR |> filter(arm != "Pl", arm != "LC106-3") |> mutate(arm = "all active arms excl. LC106-3"),
  eff_exp_qPCR |> filter(arm != "Pl", arm != "LC106-3") |> mutate(site = "both", arm = "all active arms excl. LC106-3")
) |> 
  filter(mITT) |> 
  group_by(arm, site) |> 
  summarize(
    n = sum(!is.na(prop_days_exposed)),
    mean_prop_days_exposed = mean(prop_days_exposed, na.rm = TRUE),
    sd_prop = sd(prop_days_exposed, na.rm = TRUE),
    se = sd_prop / sqrt(n), 
    ci_lower = pmax(0, mean_prop_days_exposed - qt(0.975, df = pmax(n - 1, 1)) * se),
    ci_upper = pmin(1, mean_prop_days_exposed + qt(0.975, df = pmax(n - 1, 1)) * se), 
    mean_with_ci = str_c(percent(mean_prop_days_exposed, accuracy = 1), " (", percent(ci_lower, accuracy = 1), "–", percent(ci_upper, accuracy = 1), ")"),
    .groups = "drop"
  ) |>  
  select(arm, site, mean_with_ci) |>
  pivot_wider(names_from = arm, values_from = mean_with_ci) |> 
  gt(caption = "Proportion of days exposed by qPCR (daily home swabs), among dosing days: mean proportion (in %) and associated 95% CI.") 
Proportion of days exposed by qPCR (daily home swabs), among dosing days: mean proportion (in %) and associated 95% CI.
site LC106-3 LC106-7 LC106-o LC115 all active arms excl. LC106-3
CAP 71% (53%–90%) 65% (49%–82%) 69% (53%–85%) 63% (40%–86%) 66% (56%–76%)
MGH NA 64% (27%–100%) NA 72% (42%–100%) 70% (50%–89%)
both 71% (54%–88%) 65% (50%–79%) 71% (56%–86%) 66% (49%–83%) 67% (58%–75%)

We group participants by the proportion of days exposed by qPCR (daily home swabs), among dosing days: >50%, >0% to ≤50%, and 0%. We then summarize the number and percentage of participants in each group, along with 95% confidence intervals, stratified by study arm.

Code
summary_exposed_qpcr <-
  eff_exp_qPCR |>
  filter(arm != "Pl") |>
  filter(mITT) |> 
  mutate(
    expo_grp = case_when(
      prop_days_exposed > 0.5                     ~ ">50%",
      prop_days_exposed > 0 & prop_days_exposed <= 0.5 ~ ">0%, ≤50%",
      prop_days_exposed == 0                      ~ "0%",
      .default = NA_character_
    ) |> factor(levels = c(">50%", ">0%, ≤50%", "0%"))
  ) |>
  dplyr::count(arm, expo_grp, name = "n") |>
  group_by(arm) |>
  mutate(tot_participant = sum(n), p = n / tot_participant |> round(2)) |>
  rowwise() |>
  mutate(
    CI = list(binom::binom.confint(n, tot_participant, method = "exact")[, c("lower","upper")])
  ) |>
  ungroup() |>
  unnest_wider(CI, names_sep = "_")

summary_exposed_qpcr |> 
  group_by(arm) |> 
  gt(row_group_as_column = TRUE)|>
  fmt_number(
    columns = where(is.numeric),
    decimals = 2
  )
expo_grp n tot_participant p CI_lower CI_upper
LC106-7 >50% 14.00 21.00 0.67 0.43 0.85
>0%, ≤50% 5.00 21.00 0.24 0.08 0.47
0% 2.00 21.00 0.10 0.01 0.30
LC106-3 >50% 12.00 15.00 0.80 0.52 0.96
>0%, ≤50% 2.00 15.00 0.13 0.02 0.40
0% 1.00 15.00 0.07 0.00 0.32
LC106-o >50% 13.00 15.00 0.87 0.60 0.98
>0%, ≤50% 1.00 15.00 0.07 0.00 0.32
0% 1.00 15.00 0.07 0.00 0.32
LC115 >50% 14.00 20.00 0.70 0.46 0.88
>0%, ≤50% 3.00 20.00 0.15 0.03 0.38
0% 3.00 20.00 0.15 0.03 0.38
Code
# supplementary figure 3

exposed_qpcr <- 
  summary_exposed_qpcr |> 
  arrange(arm) |> 
  mutate(arm = arm |> str_replace_all(" ", "\n") |> fct_inorder()) |> 
  ggplot() +
  aes(x = arm, y = p, col = expo_grp) +
  geom_linerange(
    aes(ymin = CI_lower, ymax = CI_upper), 
    linewidth = 1.5, alpha = 0.5, lineend = "round",
    position = position_dodge(width = 0.5)
    ) +
  geom_point(position = position_dodge(width = 0.5), size = 3) +
  scale_color_manual("% of dosing days\nwith evidence of\nLBP exposure by qPCR", values = c(
    ">50%" = "grey50",
    ">0%, ≤50%" = "steelblue", 
    "0%" = "gold1"
  )) +
  xlab("") + 
  scale_y_continuous(
    name = "% of participants",
    labels = scales::percent_format(accuracy = 1), 
    breaks = seq(0, 1, by = 0.2)
  ) +
  theme(
    panel.grid.major.x = element_blank()
    # axis.text.x = element_text(angle = 45, hjust = 1)
  )

exposed_qpcr

Code
supplementary_figure3 <- exposed_qpcr
Code
# alternative supplementary figure 3

exposed_qpcr <- 
  eff_exp_qPCR |> 
  filter(arm != "Pl") |> 
  arrange(arm) |> 
  mutate(arm = arm |> str_replace_all(" ", "\n") |> fct_inorder()) |> 
  ggplot() +
  aes(x = arm, y = prop_days_exposed) +
  geom_violin(col = NA, fill = "steelblue1", alpha = 0.5) +
  ggbeeswarm::geom_quasirandom(alpha = 0.7) +
  scale_y_continuous(
    name = "% of dosing days with\nevidence of LBP exposure by qPCR",
    labels = scales::percent_format(accuracy = 1), 
    breaks = seq(0, 1, by = 0.2)
  ) 

exposed_qpcr

Microbiota trajectories highlighting people exposed by qPCR

Longitudinal stack relative abundances plot:

Code
rel_abs <- 
  mae[["mg"]] |> 
  as_tibble() |> 
  dplyr::select(
    .feature, .sample, uid, rel_ab, poor_quality_mg_data,
    taxon_label, genus, LBP, strain_origin
    ) 
Code
rel_abs <- 
  rel_abs |> 
  group_by(.feature) |> 
  mutate(max_rel_ab = max(rel_ab, na.rm = TRUE), mean_rel_ab = mean(rel_ab, na.rm = TRUE)) |> 
  ungroup() |> 
  arrange(is.na(LBP), -mean_rel_ab) |> 
  mutate(.feature = .feature |> fct_inorder()) |> 
  mutate(
    taxon = 
      case_when(
        as.numeric(.feature) <= 29 ~ taxon_label,
        (genus == "Lactobacillus") ~ "Other Lactobacillus",
        TRUE ~ "Other non-Lacto."
      )
  ) |> 
  arrange(is.na(LBP), LBP,genus != "Lactobacillus", genus, -mean_rel_ab) |> 
  mutate(taxon = taxon |> fct_inorder() |> fct_relevel("Other Lactobacillus", "Other non-Lacto.", "Other", after = Inf))
Code
rel_abs_agg <- 
  rel_abs |> 
  group_by(uid, taxon, LBP, strain_origin, poor_quality_mg_data) |> 
  summarize(rel_ab = sum(rel_ab), .groups = "drop") |> 
  dplyr::left_join(mae |> colData() |> as_tibble(), by = join_by(uid)) 
Code
tmp <- 
  eff_exp_qPCR |> 
  right_join(rel_abs_agg |> select(pid) |> distinct()) |> 
  mutate(
    pid_label = 
      case_when(
        prop_days_exposed > 0.5 ~ "E",
        prop_days_exposed > 0 ~ "e",
        is.na(prop_days_exposed) ~ "?",
        TRUE ~ "."
      ) 
    ) |> 
  dplyr::select(pid, pid_label) |> 
  distinct() 
pid_label <- setNames(tmp$pid_label, tmp$pid)

 
g_trajectories_qpcr_exp <-
  rel_abs_agg |>
  filter(mITT, randomized != 0, PIPV, !poor_quality_mg_data) |>
  group_by(pid) |> 
  mutate(
    tot_LC115 = sum(rel_ab[LBP == "LC115"]),
    tot_LBP = sum(rel_ab[!is.na(LBP)])
    ) |> 
  ungroup() |> 
  arrange(desc(PP), -tot_LC115, -tot_LBP) |> 
  mutate(
    pid = pid |> fct_inorder(),
    visit_label = visit |> as.character() |> fct_reorder(visit |> as.numeric())
    ) |> 
  ggplot(aes(y = rel_ab, x = pid, fill = taxon)) + #, alpha = ifelse(PP, "PP", "mITT")
  geom_col() +
  ggh4x::facet_nested(
    rows = vars(visit_label),
    cols = vars(arm, site),
    scales = "free_x", space = "free_x", 
    strip = 
      ggh4x::strip_nested(
        background_x = 
          ggh4x::elem_list_rect(
            fill = c(rep("gray90",5), 
                     rep(c("#FF0000", "#00868B"), 5))
          ), 
        text_x = 
          ggh4x::elem_list_text(
            face = rep("bold", 15),
            colour = c(rep("black", 5), rep("white", 5), "transparent", "white", "transparent", rep("white", 3))
            )              
      )
  ) +
  scale_fill_manual(
    "LBP strain or taxon",
    values = get_taxa_colors(rel_abs_agg$taxon |> levels()),
    labels = get_taxa_labels(rel_abs_agg$taxon |> levels()),
    guide = guide_legend(ncol = 1)
    ) +
  #scale_alpha_discrete("Population", range = c(0.5, 1)) +
  scale_y_continuous(
    "Relative abundance", labels = scales::percent,
    breaks = seq(0, 0.5, by = 0.5), minor_breaks = seq(0, 1, by = 0.25)
  ) +
  scale_x_discrete("Participants", labels = pid_label) +
  theme(
    #strip.background.y = element_rect(fill = "gray90"),
    #strip.text.y = element_text(color = "black", angle = 0, hjust = 0), # face = "bold", 
    strip.text.x = element_text(hjust = 0.5),
    axis.text.x = element_text(hjust = 0.5, vjust = 1, size = 10, margin = margin(t = 0)), 
    axis.ticks.x = element_blank(),
    legend.key.height = unit(15, "pt"),
    legend.text = element_markdown()
  )  


# theme_bw() + theme(axis.ticks.x = element_blank(), strip.text.y = element_text(angle = 0))
Code
g_trajectories_qpcr_exp 

Microbiota trajectories of mITT participants in the VIBRANT study. The relative abundances of the LBP strains and top taxa are shown. Participants are ordered by their total relative abundance of LBP strains, and the study weeks are shown in the rows.
Code
tmp <- 
  applicator |> 
  select(pid, number_of_returned_applicators, p_pos_applicator = proportion_positive_applicators) |>
  distinct() |> 
  right_join(rel_abs_agg |> select(pid) |> distinct()) |> 
  mutate(
    pid_label = 
      case_when(
        is.na(number_of_returned_applicators) ~ "?",
        number_of_returned_applicators == 0 ~ "?",
        p_pos_applicator > 0.75 ~ "A",
        p_pos_applicator > 0 ~ "a",
        TRUE ~ "."
      ) 
    ) |> 
  dplyr::select(pid, pid_label) |> 
  distinct() 
pid_label <- setNames(tmp$pid_label, tmp$pid)

 
g_trajectories_appl_exp <-
  rel_abs_agg |>
  filter(mITT, randomized != 0, PIPV, !poor_quality_mg_data) |>
  group_by(pid) |> 
  mutate(
    tot_LC115 = sum(rel_ab[LBP == "LC115"]),
    tot_LBP = sum(rel_ab[!is.na(LBP)])
    ) |> 
  ungroup() |> 
  arrange(desc(PP), -tot_LC115, -tot_LBP) |> 
  mutate(
    pid = pid |> fct_inorder(),
    visit_label = visit |> as.character() |> fct_reorder(visit |> as.numeric())
    ) |> 
  ggplot(aes(y = rel_ab, x = pid, fill = taxon)) + #, alpha = ifelse(PP, "PP", "mITT")
  geom_col() +
  ggh4x::facet_nested(
    rows = vars(visit_label),
    cols = vars(arm, site),
    scales = "free_x", space = "free_x", 
    strip = 
      ggh4x::strip_nested(
        background_x = 
          ggh4x::elem_list_rect(
            fill = c(rep("gray90",5), 
                     rep(c("#FF0000", "#00868B"), 5))
          ), 
        text_x = 
          ggh4x::elem_list_text(
            face = rep("bold", 15),
            colour = c(rep("black", 5), rep("white", 5), "transparent", "white", "transparent", rep("white", 3))
            )              
      )
  ) +
  scale_fill_manual(
    "LBP strain or taxon",
    values = get_taxa_colors(rel_abs_agg$taxon |> levels()),
    labels = get_taxa_labels(rel_abs_agg$taxon |> levels()),
    guide = guide_legend(ncol = 1)
    ) +
  #scale_alpha_discrete("Population", range = c(0.5, 1)) +
  scale_y_continuous(
    "Relative abundance", labels = scales::percent,
    breaks = seq(0, 0.5, by = 0.5), minor_breaks = seq(0, 1, by = 0.25)
  ) +
  scale_x_discrete("Participants", labels = pid_label) +
  theme(
    #strip.background.y = element_rect(fill = "gray90"),
    #strip.text.y = element_text(color = "black", angle = 0, hjust = 0), # face = "bold", 
    strip.text.x = element_text(hjust = 0.5),
    axis.text.x = element_text(hjust = 0.5, vjust = 1, size = 10, margin = margin(t = 0)), 
    axis.ticks.x = element_blank(),
    legend.key.height = unit(15, "pt"), 
    legend.text = element_markdown()
  )  


# theme_bw() + theme(axis.ticks.x = element_blank(), strip.text.y = element_text(angle = 0))
Code
g_trajectories_appl_exp 

Microbiota trajectories of mITT participants in the VIBRANT study. The relative abundances of the LBP strains and top taxa are shown. Participants are ordered by their total relative abundance of LBP strains, and the study weeks are shown in the rows.

Figures (export)

Code
ggsave(
  g_trajectories, filename = "figures/Figure2_trajectories_plot.pdf",
  width = 12, height = 7.5, dpi = 300, units = "in"
)
Code
# figure 3

figure3 <- 
  g_prop_colonized_by_final + theme(legend.position = "bottom", legend.direction = "horizontal") +
  g_long_terme_col + theme(legend.position = "bottom", legend.direction = "horizontal") +
  plot_layout(ncol = 2, widths = c(2, 1.4)) +
  plot_annotation(tag_level = "A") + 
  theme(
    plot.tag = element_text(
      face = "bold",
      hjust = -0.5,  
      vjust = 2     
    ))

figure3

ggsave(
  figure3, filename = "figures/Figure3_colonization.pdf",
  width = 12, height = 5, dpi = 300, units = "in"
)
Code
# figure 4 
ggsave(
  g_relative_abundance, filename = "figures/Figure4_relative_abundance.pdf",
  width = 9, height = 6, dpi = 300, units = "in"
)
Code
# figure 5 
ggsave(
  g_colonized_among_exposed, filename = "figures/Figure5_colonized_among_exposed.pdf",
  width = 9, height = 6, dpi = 300, units = "in"
)
Code
# Supp figure 4 
ggsave(
  g_colonized_among_exposed_supp, filename = "figures/Suppl_Figure4_colonized_among_exposed.pdf",
  width = 9, height = 6, dpi = 300, units = "in"
)
Code
# supplementary figure 3 
#| fig-height: 4
#| fig-width: 12
#| eval: false

ggsave(
  supplementary_figure3, filename = "figures/Suppl_Figure3_colonized_among_exposed.pdf",
  width = 9, height = 6, dpi = 300, units = "in"
)