VIBRANT Trial - Primary Analyses

Author

Laura Symul (UCLouvain), Laura Vermeren (UCLouvain), Susan Holmes (Stanford University)

Published

January 9, 2026

Code

library(tidyverse)
library(magrittr)
library(gt)
library(patchwork)
library(SummarizedExperiment)
library(tidySummarizedExperiment)
library(brms)
library(gtsummary)
library(labelled)
library(scales)
# library(gridExtra)
library(ggpubr)
library(ggh4x)
library(ggtext)
# library(mia) # BiocManager::install("mia")

#theme_set(theme_light())
tmp <- fs::dir_map("R", source)
rm(tmp)

Code

tmp <- fs::dir_map("../VIBRANT-99-utils/R/", source)
rm(tmp)

This document details the analyses and provides results for the primary analyses of the VIBRANT trial.

Trial data

The data used for these analyses are stored in an R Bioconductor MultiAssayExperiment object which contains the QCed and pre-processed data. The code and description of how raw data have been processed and QCed is available on the VIBRANT-0-QC-and-preprocessing GitHub repository.

Code

# Loading MAE_1
mae_file <- 
  fs::dir_ls(str_c(data_dir(), "06 subsetted MAEs/"), regexp = "mae_1_.*\\.rds$") |> 
  sort(decreasing = TRUE) |>
  magrittr::extract(1)

cat(str_c("Loading the RDS file containing the unblinded VIBRANT MAE: /", mae_file |> str_remove(data_dir()), "\n"))

Loading the RDS file containing the unblinded VIBRANT MAE: /06 subsetted MAEs/mae_1_20260109.rds

Code

mae <- readRDS(mae_file)

rm(mae_file)

mae

A MultiAssayExperiment object of 4 listed
 experiments with user-defined names and respective classes.
 Containing an ExperimentList class object of length 4:
 [1] col_LBP_mg: SummarizedExperiment with 2 rows and 890 columns
 [2] col_LBP_qPCR: SummarizedExperiment with 4 rows and 895 columns
 [3] col_crispatus_mg: SummarizedExperiment with 2 rows and 890 columns
 [4] mg: SummarizedExperiment with 779 rows and 893 columns
Functionality:
 experiments() - obtain the ExperimentList instance
 colData() - the primary/phenotype DataFrame
 sampleMap() - the sample coordination DataFrame
 `$`, `[`, `[[` - extract colData columns, subset, or experiment
 *Format() - convert into a long or wide DataFrame
 assays() - convert ExperimentList to a SimpleList of matrices
 exportClass() - save data to flat files

Demographics (Table 1 & suppl. tables by sites)

Summary table

The demographics table is constructed as follows.

We first create a table1_data data.frame from the participant_crfs_merged table stored in the @metadata slot of the mae object.

This table contains most of the variables required for the demographics summary, including: site, age, race, ethn, cut_size_meals, eat_less, hungry_did_not_eat, sexual_partners_lifetime, sexual_partners_past_month and contraception_method.

Additional variables such as arm and study population flags are retrieved from mae@colData. For this analysis, we restrict the dataset to participants in the mITT population.

From the food-related variables (cut_size_meals, eat_less, and hungry_did_not_eat), we derive a new variable food_insecurity. A participant is considered food insecure if she answered “Yes” to any of the three questions.

Code

table1_data <- 
  metadata(mae)$participant_crfs_merged |> 
  dplyr::left_join(
    colData(mae) |> as_tibble() |> select(pid, site, arm, ITT, mITT, PP) |> distinct(), 
    by = join_by(pid, site)
    )  |> 
  filter(mITT) |> 
  select(pid, site, arm, age, race, ethn, cut_size_meals, eat_less, hungry_did_not_eat, sexual_partners_lifetime, sexual_partners_past_month, contraception_method) |> 
  mutate(
    food_insecurity = ifelse(
      cut_size_meals == "Yes" | eat_less == "Yes" | hungry_did_not_eat == "Yes",
      "Yes",
      "No"
    ),
    food_insecurity = factor(food_insecurity, levels = c("No", "Yes"))
  ) |> 
  select(-c(eat_less, cut_size_meals, hungry_did_not_eat))

Code

table1_data |> 
  ggplot(aes(x = food_insecurity)) +
  geom_bar() +
  labs(x= "Food insecurity", y = "Count")+
  theme_classic()

In addition to these variables, we also include information about about the gender of participants’ sexual partners.

Participants were asked about their partner’s gender at each weekly visit (weeks 0 through 12: visits 1000, 1100, 1200, 1300, 1400, 1500, 1700, 1900, and 2120) using Case Report Form (CRF) 19. Responses are recorded in the past_week_sex_partner column of the mae@metadata$visits_crfs_merged table.

Based on participants’ answers across all weekly visits, we categorized their reported partner gender into one of four groups:

No sex
Only male
Only female
Both

This derived variable is included in the table1_data table.

Code

gender_sexual_partner <- 
  metadata(mae)$visits_crfs_merged |>
  select(pid, past_week_sex_partner) |>
  filter(!is.na(past_week_sex_partner)) |>
  group_by(pid) |>
  summarise(
    gender_sexual_partner = 
      case_when(
        all(past_week_sex_partner == "No sex in the past week") ~ "No sex",
        any(past_week_sex_partner %in% c("A single male partner", "Multiple male partners")) &
          !any(past_week_sex_partner %in% c("A single female partner", "Multiple female partners")) ~ "Only man",
        any(past_week_sex_partner %in% c("A single female partner", "Multiple female partners")) &
          !any(past_week_sex_partner %in% c("A single male partner", "Multiple male partners")) ~ "Only female",
        any(past_week_sex_partner %in% c("A single male partner", "Multiple male partners")) &
          any(past_week_sex_partner %in% c("A single female partner", "Multiple female partners")) ~ "Both",
        TRUE ~ NA_character_
      ),
    .groups = "drop"
  )

table1_data <- 
  table1_data |> 
  left_join(gender_sexual_partner, by = "pid")

We also recoded the race variable to simplify and harmonize categories:

“Asian or South Asian” is re-coded to “Asian” (for brevity)
“Black, African American, or African” and “African” are re-coded to “Black/African” (for brevity)
“Coloured”, “White”, “Other”, and “Prefered not to answer” are left as is.

Code

table1_data <-
  table1_data |>
  
  mutate(
    across(where(is.character), as.factor),
    
    race = fct_recode(
      race,
      "Asian" = "Asian or South Asian",
      "Black/African" = "Black, African American, or African",
      "Black/African" = "African",
      "Coloured" = "Coloured",
      "White" = "White",
      "Other" = "Other",
      "Prefer not to answer" = "Prefer not to answer"
    ),
    
    race = fct_relevel(race, "Asian", "Black/African", "Coloured", "White", "Other", "Prefer not to answer"),
    race = race |> fct_drop(),
    arm = arm |> fct_drop()
  ) |>
  
  set_variable_labels(
    food_insecurity = "Report food insecurity in past 12 months ",
    sexual_partners_past_month = "Number of partners in past month",
    sexual_partners_lifetime = "Number of partners in lifetime",
    ethn = "Ethnicity", 
    contraception_method = "Contraception", 
    gender_sexual_partner = "Gender of sexual partners"
  )

Table 1 (Population characteristics by arm)

Table 1 summarizes baseline demographic and behavioral characteristics of the participants described above, stratified by study arm and restricted to the mITT population.. Continuous variables are presented as median and range (minimum–maximum), while categorical variables are shown as counts and percentages.

Comparisons across study arms are performed using the Kruskal-Wallis rank sum test for continuous variables. For categorical variables, either Fisher’s exact test or Pearson’s Chi-squared test is used, depending on cell counts and expected frequencies.

Code

table1_data |> 
  select(-pid) |>
  rename_with(~ str_to_title(.x)) |>  
  tbl_summary(
    by = Arm,
    type =
      list(
        Sexual_partners_past_month ~ "continuous",
        Sexual_partners_lifetime ~ "continuous"
      ),
    statistic = list(
      all_continuous() ~ "{median} ({min}-{max})",
      all_categorical() ~ "{n} ({p}%)"
    )
  )  |> 
  add_p(include = -Site)

Characteristic	Pl N = 19¹	LC106-7 N = 21¹	LC106-3 N = 15¹	LC106-o N = 15¹	LC115 N = 20¹	p-value²
Site
CAP	14 (74%)	14 (67%)	14 (93%)	14 (93%)	14 (70%)
MGH	5 (26%)	7 (33%)	1 (6.7%)	1 (6.7%)	6 (30%)
Age	29 (20-38)	29 (18-40)	26 (21-34)	25 (19-35)	30 (20-40)	0.5
Race						0.5
Asian	1 (5.3%)	0 (0%)	1 (6.7%)	0 (0%)	0 (0%)
Black/African	14 (74%)	18 (86%)	14 (93%)	15 (100%)	18 (90%)
White	2 (11%)	3 (14%)	0 (0%)	0 (0%)	1 (5.0%)
Other	1 (5.3%)	0 (0%)	0 (0%)	0 (0%)	1 (5.0%)
Prefer not to answer	1 (5.3%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
Ethnicity						0.8
Not hispanic/latino/spanish	3 (60%)	6 (86%)	1 (100%)	1 (100%)	4 (67%)
hispanic/latino/spanish	2 (40%)	1 (14%)	0 (0%)	0 (0%)	2 (33%)
Unknown	14	14	14	14	14
Number of partners in lifetime	5 (1-10)	5 (1-90)	4 (1-10)	4 (1-20)	5 (1-10)	0.8
Number of partners in past month	1 (1-2)	1 (0-2)	1 (0-2)	1 (0-3)	1 (0-2)	0.4
Contraception						>0.9
Continuous combined oral contraceptive pills (skip placebo)	3 (16%)	5 (24%)	1 (6.7%)	1 (6.7%)	3 (15%)
Contraceptive vaginal ring	0 (0%)	1 (4.8%)	0 (0%)	0 (0%)	0 (0%)
Cyclic combined oral contraceptive pills	0 (0%)	0 (0%)	0 (0%)	0 (0%)	1 (5.0%)
Depo provera	11 (58%)	10 (48%)	12 (80%)	11 (73%)	11 (55%)
Hormonal implant	1 (5.3%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
Hormone containing IUD	2 (11%)	2 (9.5%)	0 (0%)	0 (0%)	2 (10%)
Net-EN	2 (11%)	2 (9.5%)	2 (13%)	3 (20%)	2 (10%)
Other	0 (0%)	1 (4.8%)	0 (0%)	0 (0%)	1 (5.0%)
Report food insecurity in past 12 months	7 (37%)	7 (33%)	6 (40%)	5 (33%)	6 (30%)	>0.9
Gender of sexual partners						0.8
No sex	1 (5.3%)	3 (14%)	1 (6.7%)	1 (6.7%)	3 (15%)
Only man	18 (95%)	18 (86%)	14 (93%)	14 (93%)	17 (85%)
¹ n (%); Median (Min-Max)
² Kruskal-Wallis rank sum test; Fisher’s exact test; Pearson’s Chi-squared test

Study population characteristics appear balanced across arms.

Population characteristics by site

An additional version of Table 1 is generated using the same variables and summary methods, but stratified by study site instead of study arm.

Code

table1_data |> 
  select(arm, age, race, site, food_insecurity, contraception_method, sexual_partners_past_month, sexual_partners_lifetime) |>
  rename_with(~ str_to_title(.x)) |>  
  tbl_summary(
    by = Site,
    type =
      list(
        Sexual_partners_past_month ~ "continuous",
        Sexual_partners_lifetime ~ "continuous"
      ),
    statistic = list(
      all_continuous() ~ "{median} ({min}-{max})",
      all_categorical() ~ "{n} ({p}%)"
    )
  )  |> 
  add_p()

Characteristic	CAP N = 70¹	MGH N = 20¹	p-value²
Arm			0.2
Pl	14 (20%)	5 (25%)
LC106-7	14 (20%)	7 (35%)
LC106-3	14 (20%)	1 (5.0%)
LC106-o	14 (20%)	1 (5.0%)
LC115	14 (20%)	6 (30%)
Age	26 (18-37)	33 (22-40)	<0.001
Race			<0.001
Asian	0 (0%)	2 (10%)
Black/African	70 (100%)	9 (45%)
White	0 (0%)	6 (30%)
Other	0 (0%)	2 (10%)
Prefer not to answer	0 (0%)	1 (5.0%)
Report food insecurity in past 12 months	28 (40%)	3 (15%)	0.038
Contraception			<0.001
Continuous combined oral contraceptive pills (skip placebo)	5 (7.1%)	8 (40%)
Contraceptive vaginal ring	0 (0%)	1 (5.0%)
Cyclic combined oral contraceptive pills	0 (0%)	1 (5.0%)
Depo provera	54 (77%)	1 (5.0%)
Hormonal implant	0 (0%)	1 (5.0%)
Hormone containing IUD	0 (0%)	6 (30%)
Net-EN	11 (16%)	0 (0%)
Other	0 (0%)	2 (10%)
Number of partners in past month	1 (0-3)	1 (0-2)	0.4
Number of partners in lifetime	4 (1-20)	10 (4-90)	<0.001
¹ n (%); Median (Min-Max)
² Fisher’s exact test; Wilcoxon rank sum test; Pearson’s Chi-squared test

There are, unsurprisingly, striking differences in study population characteristics by site.

Population characteristics by arm, split by site

Separate versions of Table 1 are also generated for each study site, with participant characteristics summarized by study arm within each site.

Code

table1_data |> 
  filter(site == "CAP") |> 
  select(-c(pid, site)) |>
  rename_with(~ str_to_title(.x)) |>  
  tbl_summary(
    by = Arm,
    type =
      list(
        Sexual_partners_past_month ~ "continuous",
        Sexual_partners_lifetime ~ "continuous"
      ),
    statistic = list(
      all_continuous() ~ "{median} ({min}-{max})",
      all_categorical() ~ "{n} ({p}%)"
    )
  ) |> 
  add_p()|>
  modify_caption("**Site = CAP**")

**Site = CAP**
Characteristic	Pl N = 14¹	LC106-7 N = 14¹	LC106-3 N = 14¹	LC106-o N = 14¹	LC115 N = 14¹	p-value²
Age	26 (20-36)	26 (18-36)	26 (21-34)	25 (19-35)	25 (20-37)	>0.9
Race						>0.9
Asian	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
Black/African	14 (100%)	14 (100%)	14 (100%)	14 (100%)	14 (100%)
White	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
Other	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
Prefer not to answer	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
Ethnicity
Not hispanic/latino/spanish	0 (NA%)	0 (NA%)	0 (NA%)	0 (NA%)	0 (NA%)
hispanic/latino/spanish	0 (NA%)	0 (NA%)	0 (NA%)	0 (NA%)	0 (NA%)
Unknown	14	14	14	14	14
Number of partners in lifetime	4 (1-10)	3 (1-7)	4 (1-10)	4 (1-20)	4 (1-9)	0.8
Number of partners in past month	1 (1-2)	1 (1-2)	1 (0-2)	1 (0-3)	1 (1-2)	0.2
Contraception						0.9
Continuous combined oral contraceptive pills (skip placebo)	1 (7.1%)	2 (14%)	0 (0%)	0 (0%)	2 (14%)
Contraceptive vaginal ring	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
Cyclic combined oral contraceptive pills	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
Depo provera	11 (79%)	10 (71%)	12 (86%)	11 (79%)	10 (71%)
Hormonal implant	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
Hormone containing IUD	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
Net-EN	2 (14%)	2 (14%)	2 (14%)	3 (21%)	2 (14%)
Other	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
Report food insecurity in past 12 months	7 (50%)	5 (36%)	6 (43%)	5 (36%)	5 (36%)	>0.9
Gender of sexual partners						>0.9
No sex	1 (7.1%)	2 (14%)	1 (7.1%)	1 (7.1%)	2 (14%)
Only man	13 (93%)	12 (86%)	13 (93%)	13 (93%)	12 (86%)
¹ Median (Min-Max); n (%)
² Kruskal-Wallis rank sum test; Fisher’s exact test; NA; Pearson’s Chi-squared test

Code

table1_data |> 
  filter(site == "MGH") |> 
  select(-c(pid, site)) |>
  rename_with(~ str_to_title(.x)) |>  
  tbl_summary(
    by = Arm,
    type =
      list(
        Sexual_partners_past_month ~ "continuous",
        Sexual_partners_lifetime ~ "continuous"
      ),
    statistic = list(
      all_continuous() ~ "{median} ({min}-{max})",
      all_categorical() ~ "{n} ({p}%)"
    )
  ) |> 
  add_p()|>
  modify_caption("**Site = MGH**")

**Site = MGH**
Characteristic	Pl N = 5¹	LC106-7 N = 7¹	LC106-3 N = 1¹	LC106-o N = 1¹	LC115 N = 6¹	p-value²
Age	32 (26-38)	33 (24-40)	25 (25-25)	35 (35-35)	33 (22-40)	0.7
Race						0.13
Asian	1 (20%)	0 (0%)	1 (100%)	0 (0%)	0 (0%)
Black/African	0 (0%)	4 (57%)	0 (0%)	1 (100%)	4 (67%)
White	2 (40%)	3 (43%)	0 (0%)	0 (0%)	1 (17%)
Other	1 (20%)	0 (0%)	0 (0%)	0 (0%)	1 (17%)
Prefer not to answer	1 (20%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
Ethnicity						0.8
Not hispanic/latino/spanish	3 (60%)	6 (86%)	1 (100%)	1 (100%)	4 (67%)
hispanic/latino/spanish	2 (40%)	1 (14%)	0 (0%)	0 (0%)	2 (33%)
Number of partners in lifetime	9 (7-10)	15 (6-90)	7 (7-7)	10 (10-10)	6 (4-10)	0.059
Number of partners in past month	1 (1-2)	1 (0-2)	1 (1-1)	1 (1-1)	1 (0-2)	>0.9
Contraception						>0.9
Continuous combined oral contraceptive pills (skip placebo)	2 (40%)	3 (43%)	1 (100%)	1 (100%)	1 (17%)
Contraceptive vaginal ring	0 (0%)	1 (14%)	0 (0%)	0 (0%)	0 (0%)
Cyclic combined oral contraceptive pills	0 (0%)	0 (0%)	0 (0%)	0 (0%)	1 (17%)
Depo provera	0 (0%)	0 (0%)	0 (0%)	0 (0%)	1 (17%)
Hormonal implant	1 (20%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
Hormone containing IUD	2 (40%)	2 (29%)	0 (0%)	0 (0%)	2 (33%)
Net-EN	0 (0%)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
Other	0 (0%)	1 (14%)	0 (0%)	0 (0%)	1 (17%)
Report food insecurity in past 12 months	0 (0%)	2 (29%)	0 (0%)	0 (0%)	1 (17%)	0.8
Gender of sexual partners						>0.9
No sex	0 (0%)	1 (14%)	0 (0%)	0 (0%)	1 (17%)
Only man	5 (100%)	6 (86%)	1 (100%)	1 (100%)	5 (83%)
¹ Median (Min-Max); n (%)
² Kruskal-Wallis rank sum test; Fisher’s exact test

Study population characteristics appear balanced across arms at the CAPRISA site. Given the low number of participants at the MGH site in two of the five arms, we create an additional table excluding these two arms.

Code

table1_data |> 
  filter(site == "MGH", !(arm %in% c("LC106-o", "LC106-3"))) |> 
  mutate(arm = arm |> fct_drop()) |> 
  select(-c(pid, site)) |>
  rename_with(~ str_to_title(.x)) |>  
  tbl_summary(
    by = Arm,
    type =
      list(
        Sexual_partners_past_month ~ "continuous",
        Sexual_partners_lifetime ~ "continuous"
      ),
    statistic = list(
      all_continuous() ~ "{median} ({min}-{max})",
      all_categorical() ~ "{n} ({p}%)"
    )
  ) |> 
  add_p()|>
  modify_caption("**Site = MGH**")

**Site = MGH**
Characteristic	Pl N = 5¹	LC106-7 N = 7¹	LC115 N = 6¹	p-value²
Age	32 (26-38)	33 (24-40)	33 (22-40)	>0.9
Race				0.14
Asian	1 (20%)	0 (0%)	0 (0%)
Black/African	0 (0%)	4 (57%)	4 (67%)
White	2 (40%)	3 (43%)	1 (17%)
Other	1 (20%)	0 (0%)	1 (17%)
Prefer not to answer	1 (20%)	0 (0%)	0 (0%)
Ethnicity				0.7
Not hispanic/latino/spanish	3 (60%)	6 (86%)	4 (67%)
hispanic/latino/spanish	2 (40%)	1 (14%)	2 (33%)
Number of partners in lifetime	9 (7-10)	15 (6-90)	6 (4-10)	0.017
Number of partners in past month	1 (1-2)	1 (0-2)	1 (0-2)	0.8
Contraception				>0.9
Continuous combined oral contraceptive pills (skip placebo)	2 (40%)	3 (43%)	1 (17%)
Contraceptive vaginal ring	0 (0%)	1 (14%)	0 (0%)
Cyclic combined oral contraceptive pills	0 (0%)	0 (0%)	1 (17%)
Depo provera	0 (0%)	0 (0%)	1 (17%)
Hormonal implant	1 (20%)	0 (0%)	0 (0%)
Hormone containing IUD	2 (40%)	2 (29%)	2 (33%)
Net-EN	0 (0%)	0 (0%)	0 (0%)
Other	0 (0%)	1 (14%)	1 (17%)
Report food insecurity in past 12 months	0 (0%)	2 (29%)	1 (17%)	0.7
Gender of sexual partners				>0.9
No sex	0 (0%)	1 (14%)	1 (17%)
Only man	5 (100%)	6 (86%)	5 (83%)
¹ Median (Min-Max); n (%)
² Kruskal-Wallis rank sum test; Fisher’s exact test

Besides the number of lifetime sexual partners, arms appear balanced across arm at MGH.

Primary outcomes (Table 2) & sensitivity analyses

LBP colonization by week 5

The primary outcome is defined as “Colonization by any of the LBP strains after product administration by week 5”, and colonization by any of the LBP strain is defined as a relative abundance of 5% by any single strain or 10% total relative abundance of the LBP strains. The relative abundance of LBP strain is estimated using metagenomics (taxonomic composition estimated using VIRGO2 and strain proportion of total L. crispatus is estimated using kSanity), and includes samples from weekly visits 1200 to 1500 (week 2 (post-INT) to week 5 (3 weeks post-INT)).

The computation itself has been done at the QC/Pre-processing stage (see the VIBRANT-0-QC-and-preprocessing GitHub repository).

Code

col_by_week5 <- 
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_mg"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "colonized_LBP_by_mg")
  ) |> 
  mutate(LBP_colonization_by_week5 = outcome |> replace_na(FALSE))

Code

summary_table <- 
  col_by_week5 |> 
  group_by(site, arm) |> 
  summarize(
    n = n(), 
    n_success = sum(LBP_colonization_by_week5),
    .groups = "drop"
  ) |> 
  mutate(
    p = n_success / n,
    CI = binom::binom.confint(n_success, n, method = "wilson") #exact
  )

Code

table_2 <- 
  summary_table |> 
  mutate(
    N_per_site_and_arm = str_c("N = ", n),
    LBP_strain_detected = str_c(n_success, " (", round(p * 100)," %)"),
    CI_text = str_c(round(CI$lower * 100), "% - ", round(CI$upper * 100), "%")
  ) |> 
  dplyr::select(
    site, arm, N_per_site_and_arm, LBP_strain_detected, CI_text
  ) |> 
  pivot_longer(-c(site, arm)) |>
  mutate(
    name = 
      case_when(
        name == "N_per_site_and_arm" ~ "N participants",
        name == "LBP_strain_detected" ~ "n (%) participants\nwith LBP strain detected",
        name == "CI_text" ~ "95% CI"
      )
  ) |> 
  pivot_wider(
    id_cols = c(site, name), 
    names_from = arm,
    values_from = value, values_fill = ""
    )

Code

table_2 |> 
  group_by(site) |> 
  gt(caption = "Table 1: Colonization by week 5 by arm and site", row_group_as_column = TRUE) |> 
  cols_width(
    "name" ~ px(200),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    # Blinded = "All blinded arms",
    Pl = "Placebo",
    `LC106-7` = "LC106<br>7 days",
    `LC106-3` = "LC106<br>3 days",
    `LC106-o` = "LC106<br>overlap",
    `LC115` = "LC115<br>7 days",
    .fn = md
  )

Table 1: Colonization by week 5 by arm and site
		Placebo	LC106 7 days	LC106 3 days	LC106 overlap	LC115 7 days
CAP	N participants	N = 14	N = 14	N = 14	N = 14	N = 14
	n (%) participants with LBP strain detected	0 (0 %)	9 (64 %)	7 (50 %)	7 (50 %)	10 (71 %)
	95% CI	0% - 22%	39% - 84%	27% - 73%	27% - 73%	45% - 88%
MGH	N participants	N = 5	N = 7	N = 1	N = 1	N = 6
	n (%) participants with LBP strain detected	1 (20 %)	6 (86 %)	1 (100 %)	1 (100 %)	6 (100 %)
	95% CI	4% - 62%	49% - 97%	21% - 100%	21% - 100%	61% - 100%

Visualization of the primary outcome

Code

col_by_week5 |> 
  arrange(site, arm, LBP_colonization_by_week5) |>
  group_by(site, arm) |> 
  mutate(participant_number = row_number() |> factor()) |> 
  ungroup() |> 
  ggplot(aes(x = participant_number, y = site |> fct_rev(), col = site, fill = site, shape = LBP_colonization_by_week5)) +
  geom_point(size = 3) +
  facet_grid(. ~ arm, scales = "free_x", space = "free_x") +
  guides(color = "none", fill = "none") +
  scale_shape_manual("Colonized by LBP by week 5", values = c(1, 16)) +
  scale_color_manual(values = site_colors) + 
  xlab("Participant number by arm and site") + ylab("") +
  theme(
    legend.position = "top",
    legend.text = element_text(hjust = 0.5),
    legend.title = element_text(hjust = 0.5),
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)
  ) +
  guides(shape = guide_legend(ncol = 2))

Test for differences between the placebo and each active arm

We test for differences between the placebo arm and each active arm for the primary outcome using a one-sided unconditional exact test.

Data from both site are merged for each arm given the low numbers of participants at MGH.

The $p$-values are adjusted for multiple testing (multiple active arms against the Placebo) using the Benjamini-Hochberg method.

Code

arms_to_test <- col_by_week5$arm |> unique() |> setdiff("Pl")

exact_test_results <-
  map(
  arms_to_test,
  ~ {
    arm_to_test <- .x
    tmp <- col_by_week5 |> filter(arm %in% c("Pl", arm_to_test))
    table_for_test <- table(tmp$LBP_colonization_by_week5, tmp$arm |> fct_drop())
    
    test_res <- 
      exact2x2::uncondExact2x2(
        x1 = table_for_test[2, 1], n1 = sum(table_for_test[, 1]),
        x2 = table_for_test[2, 2], n2 = sum(table_for_test[, 2]),
        alternative = "greater", parmtype = "ratio",
        conf.level = 0.95, conf.int = TRUE, method = "FisherAdj"
        )

    tibble(
      arm = arm_to_test,
      p_value = test_res$p.value,
      ratio = test_res$estimate,
      lower_CI = test_res$conf.int[1],
      upper_CI = test_res$conf.int[2]
    )
  }
) |> 
  bind_rows() |> 
  mutate(
    adj_p_value = p_value |> p.adjust(method = "BH")
  )

Code

test_results <- 
  exact_test_results |> 
  mutate(
    adj_p_value = adj_p_value |> scales::pvalue(accuracy = 0.001),
    ratio = ratio |> round(2) |> as.character(),
    CI = str_c(lower_CI |> round(2), " - ", upper_CI |> round(2))
    ) |> 
  select(arm, adj_p_value, ratio, CI) |>
  pivot_longer(cols = -arm, names_to = "name", values_to = "value") |>
  pivot_wider(names_from = arm, values_from = value) |> 
  mutate(
    name = 
      case_when(
        name == "adj_p_value" ~ "Adj. p-value",
        name == "ratio" ~ "Benefit ratio",
        name == "CI" ~ "95% CI",
        TRUE ~ name
      )
  )

Code

table_2_with_model_results <- 
  bind_rows(
    table_2,
    test_results
  ) |> 
  mutate(
    site = site |> str_replace_na(""),
    Pl = 
      case_when(
        name == "Adj. p-value" ~ "Ref.",
        name == "Benefit ratio" ~ "",
        str_detect(name, "CI") ~ "",
        TRUE ~ Pl
      )
  )

Code

table_2_with_model_results |> 
  group_by(site) |>
  gt(caption = "Table 2: Colonization by week 5 by arm and site", row_group_as_column = TRUE) |> 
  cols_width(
    "name" ~ px(150),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC106-7` = "LC106<br>7 days",
    `LC106-3` = "LC106<br>3 days",
    `LC106-o` = "LC106<br>overlap",
    `LC115` = "LC115<br>7 days",
    .fn = md
  )

Table 2: Colonization by week 5 by arm and site
		Placebo	LC106 7 days	LC106 3 days	LC106 overlap	LC115 7 days
CAP	N participants	N = 14	N = 14	N = 14	N = 14	N = 14
	n (%) participants with LBP strain detected	0 (0 %)	9 (64 %)	7 (50 %)	7 (50 %)	10 (71 %)
	95% CI		39% - 84%	27% - 73%	27% - 73%	45% - 88%
MGH	N participants	N = 5	N = 7	N = 1	N = 1	N = 6
	n (%) participants with LBP strain detected	1 (20 %)	6 (86 %)	1 (100 %)	1 (100 %)	6 (100 %)
	95% CI		49% - 97%	21% - 100%	21% - 100%	61% - 100%
	Adj. p-value	Ref.	<0.001	<0.001	<0.001	<0.001
	Benefit ratio		13.57	10.13	10.13	15.2
	95% CI		1.72 - Inf	1.37 - Inf	1.37 - Inf	2.1 - Inf

Sensitivity analyses (Supplementary tables and figures)

LBP colonization by week 5 excluding immediately post-product visits

Defining the alternative primary outcome

We defined an alternative primary outcome that excluded visits conducted immediately after the last product administration. Specifically, for participants in the 7-dose arms (LC107-7 and LC115) and placebo, we eclude visit 1200, while for participants in the 3-dose arm (LC106-3) and overlap arm (LC106-o) only visits prior to 1200 (1000 and 1100) were excluded.

Code

new_outcome_def <- 
  # we start from the "colonized at"
  mae[["col_LBP_mg"]] |> 
  as_tibble() |> 
  dplyr::filter(.feature == "colonized_LBP_at_mg") |> 
  dplyr::left_join(
    mae@colData |> as_tibble() |> select(uid, pid, visit_code, arm),
    by = join_by(.sample == uid)
  ) |> 
  filter(!is.na(arm))

# we fill the missing visits
new_outcome_def <- 
  new_outcome_def |> 
  select(-c(arm)) |> 
  dplyr::full_join(
    new_outcome_def |> 
      select(pid, arm) |> 
      distinct() |> 
      expand_grid(
        visit_code = 
          unique(new_outcome_def$visit_code) |> sort() |> setdiff(c("0001","0010"))
        ),
    by = join_by(pid, visit_code)
  ) |> 
  arrange(pid, visit_code) |> 
  mutate(outcome = outcome |> replace_na(FALSE)) |> 
  dplyr::filter(!is.na(arm))
  
# we compute the new outcome
new_outcome_def <- 
  new_outcome_def |>  
  arrange(pid, visit_code) |> 
  group_by(pid) |> 
  mutate(
    tmp = 
      case_when(
        (arm %in% c("Pl", "LC106-7", "LC115")) & (as.numeric(visit_code) <= 1200) ~ FALSE,
        (arm %in% c("LC106-3", "LC106-o")) & (as.numeric(visit_code) < 1200) ~ FALSE,
        TRUE ~ outcome
      ),
    colonized_LBP_by_mg = cummax(tmp) |> as.logical()
  ) |> 
  ungroup()

Code

new_outcome_def |> 
  left_join(mae@colData |> as_tibble() |> select(pid, mITT) |> distinct()) |>
  ggplot() +
  aes(y = visit_code |> fct_rev(), x = pid, fill = colonized_LBP_by_mg) +
  facet_grid(. ~ arm + ifelse(mITT, "mITT", "ITT"), scales = "free", space = "free") +
  geom_tile() + 
  scale_fill_manual(
    "Colonized by LBP by MG by each visit",
    values = c("gray", "orangered"), labels = c("No", "Yes")
    ) +
  ylab("Visit code") + xlab("Participant IDs") +
  labs(
    caption = "Colonization at missed visits was considered negative."
  ) +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
    legend.position = "bottom"
  )

Code

new_outcome_def <- 
  new_outcome_def |> 
  dplyr::rename(colonized_LBP_by_mg_alt = colonized_LBP_by_mg) |> 
  left_join(
    mae[["col_LBP_mg"]] |> 
      as_tibble() |> 
      dplyr::filter(.feature == "colonized_LBP_by_mg") |> 
      select(.sample, outcome) |> 
      dplyr::rename(colonized_LBP_by_mg = outcome)
  )

Code

new_outcome_def |> 
  dplyr::filter(!is.na(colonized_LBP_by_mg)) |> 
  dplyr::count(colonized_LBP_by_mg_alt, colonized_LBP_by_mg)
new_outcome_def |> 
  dplyr::filter(visit_code == "1500", !is.na(colonized_LBP_by_mg)) |>
  dplyr::count(colonized_LBP_by_mg_alt, colonized_LBP_by_mg)

Code

col_by_week5 <- 
  new_outcome_def |> 
  select(pid, visit_code, colonized_LBP_by_mg_alt) |> 
  left_join(
    mae@colData |> as_tibble() |> 
      select(uid, pid, visit_code, site, randomized, arm, ITT, mITT, PP)
    ) |> 
  dplyr::filter(randomized, mITT) |> 
  dplyr::filter(visit_code == "1500") |> 
  mutate(LBP_colonization_by_week5 = colonized_LBP_by_mg_alt |> replace_na(FALSE))

Visualization of the alternative primary outcome

Code

col_by_week5 |> 
  arrange(site, arm, LBP_colonization_by_week5) |>
  group_by(site, arm) |> 
  mutate(participant_number = row_number() |> factor()) |> 
  ungroup() |> 
  ggplot(aes(x = participant_number, y = site |> fct_rev(), col = site, fill = site, shape = LBP_colonization_by_week5)) +
  geom_point(size = 3) +
  facet_grid(. ~ arm, scales = "free_x", space = "free_x") +
  guides(color = "none", fill = "none") +
  scale_shape_manual("Colonized by LBP by week 5 (excluding immediately post-product visit)", values = c(1, 16)) +
  scale_color_manual(values = site_colors) + 
  xlab("Participant number") + ylab("") +
  theme(
    legend.position = "top",
    legend.text = element_text(hjust = 0.5),
    legend.title = element_text(hjust = 0.5),
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)
  ) +
  guides(shape = guide_legend(ncol = 2))

Table

Code

summary_table <- 
  col_by_week5 |> 
  group_by(site, arm) |> 
  summarize(
    n = n(), 
    n_success = sum(LBP_colonization_by_week5),
    .groups = "drop"
  ) |> 
  mutate(
    p = n_success / n,
    CI = binom::binom.confint(n_success, n, method = "wilson")
  )

Code

table_2 <- 
  summary_table |> 
  mutate(
    N_per_site_and_arm = str_c("N = ", n),
    LBP_strain_detected = str_c(n_success, " (", round(p * 100)," %)"),
    CI_text = str_c(round(CI$lower * 100), "% - ", round(CI$upper * 100), "%")
  ) |> 
  dplyr::select(
    site, arm, N_per_site_and_arm, LBP_strain_detected, CI_text
  ) |> 
  pivot_longer(-c(site, arm)) |>
  mutate(
    name = 
      case_when(
        name == "N_per_site_and_arm" ~ "N participants",
        name == "LBP_strain_detected" ~ "n (%) participants\nwith LBP strain detected",
        name == "CI_text" ~ "95% CI"
      )
  ) |> 
  pivot_wider(
    id_cols = c(site, name), 
    names_from = arm,
    values_from = value, values_fill = ""
    )

Code

arms_to_test <- col_by_week5$arm |> unique() |> setdiff("Pl")

exact_test_results <-
  map(
  arms_to_test,
  ~ {
    arm_to_test <- .x
    tmp <- col_by_week5 |> filter(arm %in% c("Pl", arm_to_test))
    table_for_test <- table(tmp$LBP_colonization_by_week5, tmp$arm |> fct_drop())
    test_res <- 
      exact2x2::uncondExact2x2(
        x1 = table_for_test[2, 1], n1 = sum(table_for_test[, 1]),
        x2 = table_for_test[2, 2], n2 = sum(table_for_test[, 2]),
        alternative = "greater", parmtype = "ratio",
        conf.level = 0.95, conf.int = TRUE, method = "FisherAdj"
      )
    tibble(
      arm = arm_to_test,
      p_value = test_res$p.value,
      ratio = test_res$estimate,
      lower_CI = test_res$conf.int[1],
      upper_CI = test_res$conf.int[2]
    )
  }
) |> 
  bind_rows() |> 
  mutate(
    adj_p_value = p_value |> p.adjust(method = "BH")
  )

Code

test_results <- 
  exact_test_results |> 
  mutate(
    adj_p_value = adj_p_value |> scales::pvalue(accuracy = 0.001),
    ratio = ratio |> round(2) |> as.character(),
    CI = str_c(lower_CI |> round(2), " - ", upper_CI |> round(2))
    ) |> 
  select(arm, adj_p_value, ratio, CI) |>
  pivot_longer(cols = -arm, names_to = "name", values_to = "value") |>
  pivot_wider(names_from = arm, values_from = value) |> 
  mutate(
    name = 
      case_when(
        name == "adj_p_value" ~ "Adj. p-value",
        name == "ratio" ~ "Benefit ratio",
        name == "CI" ~ "95% CI",
        TRUE ~ name
      )
  )

Code

table_2_with_model_results <- 
  bind_rows(
    table_2,
    test_results # |> filter(name != "Adj. p-value")
  ) |> 
  mutate(
    site = site |> str_replace_na(""),
    Pl = 
      case_when(
        name == "Adj. p-value" ~ "Ref.",
        name == "Benefit ratio" ~ "-",
        str_detect(name, "CI") ~ "",
        TRUE ~ Pl
      )
  )

Code

table_2_with_model_results |> 
  group_by(site) |>
  gt(
    caption = "LBP Colonization by week 5 by arm and site excluding immediately post-product visit.", 
    row_group_as_column = TRUE
    ) |> 
  cols_width(
    "name" ~ px(150),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC106-7` = "LC106<br>7 days",
    `LC106-3` = "LC106<br>3 days",
    `LC106-o` = "LC106<br>overlap",
    `LC115` = "LC115<br>7 days",
    .fn = md
  )

LBP Colonization by week 5 by arm and site excluding immediately post-product visit.
		Placebo	LC106 7 days	LC106 3 days	LC106 overlap	LC115 7 days
CAP	N participants	N = 13	N = 14	N = 14	N = 14	N = 14
	n (%) participants with LBP strain detected	0 (0 %)	8 (57 %)	7 (50 %)	7 (50 %)	6 (43 %)
	95% CI		33% - 79%	27% - 73%	27% - 73%	21% - 67%
MGH	N participants	N = 4	N = 7	N = 1	N = 1	N = 6
	n (%) participants with LBP strain detected	1 (25 %)	4 (57 %)	1 (100 %)	1 (100 %)	4 (67 %)
	95% CI		25% - 84%	21% - 100%	21% - 100%	30% - 90%
	Adj. p-value	Ref.	0.002	0.002	0.002	0.002
	Benefit ratio	-	9.71	9.07	9.07	8.5
	95% CI		1.35 - Inf	1.35 - Inf	1.35 - Inf	1.27 - Inf

LBP colonization by week 5, qPCR-estimated relative abundances

For this sensitivity analysis, we define colonization by any of the LBP strains using relative abundances estimated from qPCR data. For each participant and visit, the relative abundance of each LBP strain was computed as the ratio of the strain-specific copies per swab to the total 16S rRNA gene copies per swab. In line with the metagenomic definition, colonization at a given visit was considered present if the total relative abundance of all LBP strains exceeded 10% or if the maximum relative abundance of any single LBP strain exceeded 5%. These analyses included the immediate post-product visits.

The computation itself has been done at the QC/Pre-processing stage (see the VIBRANT-0-QC-and-preprocessing GitHub repository).

Code

# by qPCR 

col_by_week5_qPCR <-
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_qPCR"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "colonized_LBP_by_qpcr")
  ) |> 
  mutate(LBP_colonization_by_week5 = outcome |> replace_na(FALSE))

Code

col_by_week5_qPCR |> 
  arrange(site, arm, LBP_colonization_by_week5) |>
  group_by(site, arm) |> 
  mutate(participant_number = row_number() |> factor()) |> 
  ungroup() |> 
  ggplot(aes(x = participant_number, y = site |> fct_rev(), col = site, fill = site, shape = LBP_colonization_by_week5)) +
  geom_point(size = 3) +
  facet_grid(. ~ arm, scales = "free_x", space = "free_x") +
  guides(color = "none", fill = "none") +
  scale_shape_manual("Colonized by LBP by week 5", values = c(1, 16)) +
  scale_color_manual(values = site_colors) + 
  xlab("Participant number") + ylab("") +
  ggtitle("Colonization based on the relative abundances estimated by qPCR") +
  theme(
    legend.position = "top",
    legend.text = element_text(hjust = 0.5),
    legend.title = element_text(hjust = 0.5),
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5), 
    plot.title = element_text(hjust = 0.5)
  ) +
  guides(shape = guide_legend(ncol = 2))

Table

Code

summary_table <- 
  col_by_week5_qPCR |> 
  group_by(site, arm) |> 
  summarize(
    n = n(), 
    n_success = sum(LBP_colonization_by_week5),
    .groups = "drop"
  ) |> 
  mutate(
    p = n_success / n,
    CI = binom::binom.confint(n_success, n, method = "wilson") #exact
  )

Code

table_2 <- 
  summary_table |> 
  mutate(
    N_per_site_and_arm = str_c("N = ", n),
    LBP_strain_detected = str_c(n_success, " (", round(p * 100)," %)"),
    CI_text = str_c(round(CI$lower * 100), "% - ", round(CI$upper * 100), "%")
  ) |> 
  dplyr::select(
    site, arm, N_per_site_and_arm, LBP_strain_detected, CI_text
  ) |> 
  pivot_longer(-c(site, arm)) |>
  mutate(
    name = 
      case_when(
        name == "N_per_site_and_arm" ~ "N participants",
        name == "LBP_strain_detected" ~ "n (%) participants\nwith LBP strain detected",
        name == "CI_text" ~ "95% CI"
      )
  ) |> 
  pivot_wider(
    id_cols = c(site, name), 
    names_from = arm,
    values_from = value, values_fill = ""
    )

Code

arms_to_test <- col_by_week5$arm |> unique() |> setdiff("Pl")

exact_test_results <-
  map(
  arms_to_test,
  ~ {
    arm_to_test <- .x
    tmp <- col_by_week5_qPCR |> filter(arm %in% c("Pl", arm_to_test))
    table_for_test <- table(tmp$LBP_colonization_by_week5, tmp$arm |> fct_drop())
    test_res <- 
      exact2x2::uncondExact2x2(
        x1 = table_for_test[2, 1], n1 = sum(table_for_test[, 1]),
        x2 = table_for_test[2, 2], n2 = sum(table_for_test[, 2]),
        alternative = "greater", parmtype = "ratio",
        conf.level = 0.95, conf.int = TRUE, method = "FisherAdj"
        )
    tibble(
      arm = arm_to_test,
      p_value = test_res$p.value,
      ratio = test_res$estimate,
      lower_CI = test_res$conf.int[1],
      upper_CI = test_res$conf.int[2]
    )
  }
) |> 
  bind_rows() |> 
  mutate(
    adj_p_value = p_value |> p.adjust(method = "BH")
  )

Code

test_results <- 
  exact_test_results |> 
  mutate(
    adj_p_value = adj_p_value |> scales::pvalue(accuracy = 0.001),
    ratio = ratio |> round(2) |> as.character(),
    CI = str_c(lower_CI |> round(2), " - ", upper_CI |> round(2))
    ) |> 
  select(arm, adj_p_value, ratio, CI) |>
  pivot_longer(cols = -arm, names_to = "name", values_to = "value") |>
  pivot_wider(names_from = arm, values_from = value) |> 
  mutate(
    name = 
      case_when(
        name == "adj_p_value" ~ "Adj. p-value",
        name == "ratio" ~ "Benefit ratio",
        name == "CI" ~ "95% CI",
        TRUE ~ name
      )
  )

Code

table_2_with_model_results <- 
  bind_rows(
    table_2,
    test_results # |> filter(name != "Adj. p-value")
  ) |> 
  mutate(
    site = site |> str_replace_na(""),
    Pl = 
      case_when(
        name == "Adj. p-value" ~ "Ref.",
        name == "Benefit ratio" ~ "-",
        str_detect(name, "CI") ~ "",
        TRUE ~ Pl
      )
  )

Code

table_2_with_model_results |> 
  group_by(site) |>
  gt(
    caption = "LBP Colonization by week 5 by arm, with colonization definition based on qPCR-estimated relative abundance.", 
    row_group_as_column = TRUE
    ) |> 
  cols_width(
    "name" ~ px(150),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC106-7` = "LC106<br>7 days",
    `LC106-3` = "LC106<br>3 days",
    `LC106-o` = "LC106<br>overlap",
    `LC115` = "LC115<br>7 days",
    .fn = md
  )

LBP Colonization by week 5 by arm, with colonization definition based on qPCR-estimated relative abundance.
		Placebo	LC106 7 days	LC106 3 days	LC106 overlap	LC115 7 days
CAP	N participants	N = 14	N = 14	N = 14	N = 14	N = 14
	n (%) participants with LBP strain detected	0 (0 %)	8 (57 %)	7 (50 %)	8 (57 %)	10 (71 %)
	95% CI		33% - 79%	27% - 73%	33% - 79%	45% - 88%
MGH	N participants	N = 5	N = 7	N = 1	N = 1	N = 6
	n (%) participants with LBP strain detected	1 (20 %)	6 (86 %)	1 (100 %)	1 (100 %)	6 (100 %)
	95% CI		49% - 97%	21% - 100%	21% - 100%	61% - 100%
	Adj. p-value	Ref.	<0.001	<0.001	<0.001	<0.001
	Benefit ratio	-	12.67	10.13	11.4	15.2
	95% CI		1.58 - Inf	1.37 - Inf	1.47 - Inf	2.1 - Inf

LBP detection by week 5, qPCR-based detection

In this analysis, LBP is considered to be detected if at least 2 LBP strains had observed Cq values at the same visit by week 5. “By week 5” includes all post-product visits from visit_code = 1200 (no excluding the immediate post-product visits.

Code

detected_by_week5_qPCR <-
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_qPCR"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "LBP_detected_by_qpcr")
  ) |> 
  mutate(LBP_detected_by_week5 = outcome |> replace_na(FALSE))

Code

detected_by_week5_qPCR |> 
  arrange(site, arm, LBP_detected_by_week5) |>
  group_by(site, arm) |> 
  mutate(participant_number = row_number() |> factor()) |> 
  ungroup() |> 
  ggplot(aes(x = participant_number, y = site |> fct_rev(), col = site, fill = site, shape = LBP_detected_by_week5)) +
  geom_point(size = 3) +
  facet_grid(. ~ arm, scales = "free_x", space = "free_x") +
  guides(color = "none", fill = "none") +
  scale_shape_manual("Colonized by LBP by week 5", values = c(1, 16)) +
  scale_color_manual(values = site_colors) + 
  xlab("Participant number") + ylab("") +
  ggtitle("Colonization based on qPCR-based detection of at least 2 LBP strains detected at the same visit by week 5") +
  theme(
    legend.position = "top",
    legend.text = element_text(hjust = 0.5),
    legend.title = element_text(hjust = 0.5),
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5), 
    plot.title = element_text(hjust = 0.5)
  ) +
  guides(shape = guide_legend(ncol = 2))

Table

Code

summary_table <- 
  detected_by_week5_qPCR |> 
  group_by(site, arm) |> 
  summarize(
    n = n(), 
    n_success = sum(LBP_detected_by_week5),
    .groups = "drop"
  ) |> 
  mutate(
    p = n_success / n,
    CI = binom::binom.confint(n_success, n, method = "wilson") #exact
  )

Code

table_2 <- 
  summary_table |> 
  mutate(
    N_per_site_and_arm = str_c("N = ", n),
    LBP_strain_detected = str_c(n_success, " (", round(p * 100)," %)"),
    CI_text = str_c(round(CI$lower * 100), "% - ", round(CI$upper * 100), "%")
  ) |> 
  dplyr::select(
    site, arm, N_per_site_and_arm, LBP_strain_detected, CI_text
  ) |> 
  pivot_longer(-c(site, arm)) |>
  mutate(
    name = 
      case_when(
        name == "N_per_site_and_arm" ~ "N participants",
        name == "LBP_strain_detected" ~ "n (%) participants\nwith LBP strain detected",
        name == "CI_text" ~ "95% CI"
      )
  ) |> 
  pivot_wider(
    id_cols = c(site, name), 
    names_from = arm,
    values_from = value, values_fill = ""
    )

Code

arms_to_test <- col_by_week5$arm |> unique() |> setdiff("Pl")

exact_test_results <-
  map(
  arms_to_test,
  ~ {
    arm_to_test <- .x
    tmp <- detected_by_week5_qPCR |> filter(arm %in% c("Pl", arm_to_test))
    table_for_test <- table(tmp$LBP_detected_by_week5, tmp$arm |> fct_drop())
    test_res <- 
      exact2x2::uncondExact2x2(
        x1 = table_for_test[2, 1], n1 = sum(table_for_test[, 1]),
        x2 = table_for_test[2, 2], n2 = sum(table_for_test[, 2]),
        alternative = "greater", parmtype = "ratio",
        conf.level = 0.95, conf.int = TRUE, method = "FisherAdj"
        )
    tibble(
      arm = arm_to_test,
      p_value = test_res$p.value,
      ratio = test_res$estimate,
      lower_CI = test_res$conf.int[1],
      upper_CI = test_res$conf.int[2]
    )
  }
) |> 
  bind_rows() |> 
  mutate(
    adj_p_value = p_value |> p.adjust(method = "BH")
  )

Code

test_results <- 
  exact_test_results |> 
  mutate(
    adj_p_value = adj_p_value |> scales::pvalue(accuracy = 0.001),
    ratio = ratio |> round(2) |> as.character(),
    CI = str_c(lower_CI |> round(2), " - ", upper_CI |> round(2))
    ) |> 
  select(arm, adj_p_value, ratio, CI) |>
  pivot_longer(cols = -arm, names_to = "name", values_to = "value") |>
  pivot_wider(names_from = arm, values_from = value) |> 
  mutate(
    name = 
      case_when(
        name == "adj_p_value" ~ "Adj. p-value",
        name == "ratio" ~ "Benefit ratio",
        name == "CI" ~ "95% CI",
        TRUE ~ name
      )
  )

Code

table_2_with_model_results <- 
  bind_rows(
    table_2,
    test_results # |> filter(name != "Adj. p-value")
  ) |> 
  mutate(
    site = site |> str_replace_na(""),
    Pl = 
      case_when(
        name == "Adj. p-value" ~ "Ref.",
        name == "Benefit ratio" ~ "-",
        str_detect(name, "CI") ~ "",
        TRUE ~ Pl
      )
  )

Code

table_2_with_model_results |> 
  group_by(site) |>
  gt(
    caption = "LBP detection by week 5 by arm, with LBP detection defined as at least two LBP strains simultaneously detected by qPCR.", 
    row_group_as_column = TRUE
    ) |> 
  cols_width(
    "name" ~ px(150),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC106-7` = "LC106<br>7 days",
    `LC106-3` = "LC106<br>3 days",
    `LC106-o` = "LC106<br>overlap",
    `LC115` = "LC115<br>7 days",
    .fn = md
  )

LBP detection by week 5 by arm, with LBP detection defined as at least two LBP strains simultaneously detected by qPCR.
		Placebo	LC106 7 days	LC106 3 days	LC106 overlap	LC115 7 days
CAP	N participants	N = 14	N = 14	N = 14	N = 14	N = 14
	n (%) participants with LBP strain detected	2 (14 %)	12 (86 %)	8 (57 %)	9 (64 %)	11 (79 %)
	95% CI		60% - 96%	33% - 79%	39% - 84%	52% - 92%
MGH	N participants	N = 5	N = 7	N = 1	N = 1	N = 6
	n (%) participants with LBP strain detected	3 (60 %)	6 (86 %)	1 (100 %)	1 (100 %)	6 (100 %)
	95% CI		49% - 97%	21% - 100%	21% - 100%	61% - 100%
	Adj. p-value	Ref.	<0.001	0.028	0.015	<0.001
	Benefit ratio	-	3.26	2.28	2.53	3.23
	95% CI		1.47 - Inf	1.07 - Inf	1.15 - Inf	1.47 - Inf

L. crispatus dominance by week 5

In this analysis, L. crispatus dominance is defined as a metagenomic-estimated relative abundance ≥ 50%. “By week 5” includes all post-product visits from visit_code = 1200 (no excluding the immediate post-product visits.

Code

# crispatus dominance 

crisp_dom_by_5week <- 
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_crispatus_mg"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "crispatus_dominance_by_mg")
  ) |> 
  mutate(crisp_dom_by_5week = outcome |> replace_na(FALSE))

Code

crisp_dom_by_5week |> 
  arrange(site, arm, crisp_dom_by_5week) |>
  group_by(site, arm) |> 
  mutate(participant_number = row_number() |> factor()) |> 
  ungroup() |> 
  ggplot(aes(x = participant_number, y = site |> fct_rev(), col = site, fill = site, shape = crisp_dom_by_5week)) +
  geom_point(size = 3) +
  facet_grid(. ~ arm, scales = "free_x", space = "free_x") +
  guides(color = "none", fill = "none") +
  scale_shape_manual("Colonized by LBP by week 5", values = c(1, 16)) +
  scale_color_manual(values = site_colors) + 
  xlab("Participant number") + ylab("") +
  ggtitle("L. crispatus dominance (≥ 50% by metagenomics)") +
  theme(
    legend.position = "top",
    legend.text = element_text(hjust = 0.5),
    legend.title = element_text(hjust = 0.5),
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5), 
    plot.title = element_text(hjust = 0.5)
  ) +
  guides(shape = guide_legend(ncol = 2))

Table

Code

summary_table <- 
  crisp_dom_by_5week |> 
  group_by(site, arm) |> 
  summarize(
    n = n(), 
    n_success = sum(crisp_dom_by_5week),
    .groups = "drop"
  ) |> 
  mutate(
    p = n_success / n,
    CI = binom::binom.confint(n_success, n, method = "wilson") #exact
  )

Code

table_2 <- 
  summary_table |> 
  mutate(
    N_per_site_and_arm = str_c("N = ", n),
    Lc_dom = str_c(n_success, " (", round(p * 100)," %)"),
    CI_text = str_c(round(CI$lower * 100), "% - ", round(CI$upper * 100), "%")
  ) |> 
  dplyr::select(
    site, arm, N_per_site_and_arm, Lc_dom, CI_text
  ) |> 
  pivot_longer(-c(site, arm)) |>
  mutate(
    name = 
      case_when(
        name == "N_per_site_and_arm" ~ "N participants",
        name == "Lc_dom" ~ "n (%) participants\n≥ 50% L. crispatus",
        name == "CI_text" ~ "95% CI"
      )
  ) |> 
  pivot_wider(
    id_cols = c(site, name), 
    names_from = arm,
    values_from = value, values_fill = ""
    )

Code

arms_to_test <- col_by_week5$arm |> unique() |> setdiff("Pl")

exact_test_results <-
  map(
  arms_to_test,
  ~ {
    arm_to_test <- .x
    tmp <- crisp_dom_by_5week |> filter(arm %in% c("Pl", arm_to_test))
    table_for_test <- table(tmp$crisp_dom_by_5week, tmp$arm |> fct_drop())
    test_res <- 
      exact2x2::uncondExact2x2(
        x1 = table_for_test[2, 1], n1 = sum(table_for_test[, 1]),
        x2 = table_for_test[2, 2], n2 = sum(table_for_test[, 2]),
        alternative = "greater", parmtype = "ratio",
        conf.level = 0.95, conf.int = TRUE, method = "FisherAdj"
        )
    tibble(
      arm = arm_to_test,
      p_value = test_res$p.value,
      ratio = test_res$estimate,
      lower_CI = test_res$conf.int[1],
      upper_CI = test_res$conf.int[2]
    )
  }
) |> 
  bind_rows() |> 
  mutate(
    adj_p_value = p_value |> p.adjust(method = "BH")
  )

Code

test_results <- 
  exact_test_results |> 
  mutate(
    adj_p_value = adj_p_value |> scales::pvalue(accuracy = 0.001),
    ratio = ratio |> round(2) |> as.character(),
    CI = str_c(lower_CI |> round(2), " - ", upper_CI |> round(2))
    ) |> 
  select(arm, adj_p_value, ratio, CI) |>
  pivot_longer(cols = -arm, names_to = "name", values_to = "value") |>
  pivot_wider(names_from = arm, values_from = value) |> 
  mutate(
    name = 
      case_when(
        name == "adj_p_value" ~ "Adj. p-value",
        name == "ratio" ~ "Benefit ratio",
        name == "CI" ~ "95% CI",
        TRUE ~ name
      )
  )

Code

table_2_with_model_results <- 
  bind_rows(
    table_2,
    test_results # |> filter(name != "Adj. p-value")
  ) |> 
  mutate(
    site = site |> str_replace_na(""),
    Pl = 
      case_when(
        name == "Adj. p-value" ~ "Ref.",
        name == "Benefit ratio" ~ "-",
        str_detect(name, "CI") ~ "",
        TRUE ~ Pl
      )
  )

Code

table_2_with_model_results |> 
  group_by(site) |>
  gt(
    caption = "Ever ≥50% L. crispatus by week 5 by arm.", 
    row_group_as_column = TRUE
    ) |> 
  cols_width(
    "name" ~ px(150),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC106-7` = "LC106<br>7 days",
    `LC106-3` = "LC106<br>3 days",
    `LC106-o` = "LC106<br>overlap",
    `LC115` = "LC115<br>7 days",
    .fn = md
  )

Ever ≥50% L. crispatus by week 5 by arm.
		Placebo	LC106 7 days	LC106 3 days	LC106 overlap	LC115 7 days
CAP	N participants	N = 14	N = 14	N = 14	N = 14	N = 14
	n (%) participants ≥ 50% L. crispatus	0 (0 %)	8 (57 %)	5 (36 %)	6 (43 %)	10 (71 %)
	95% CI		33% - 79%	16% - 61%	21% - 67%	45% - 88%
MGH	N participants	N = 5	N = 7	N = 1	N = 1	N = 6
	n (%) participants ≥ 50% L. crispatus	1 (20 %)	6 (86 %)	1 (100 %)	1 (100 %)	5 (83 %)
	95% CI		49% - 97%	21% - 100%	21% - 100%	44% - 97%
	Adj. p-value	Ref.	<0.001	0.008	0.004	<0.001
	Benefit ratio	-	12.67	7.6	8.87	14.25
	95% CI		1.58 - Inf	1.21 - Inf	1.29 - Inf	1.89 - Inf

Secondary outcomes (Figures 2-4)

Code

fig_title <- mae_title(mae)

Kinetics of LBP colonization

Total relative abundances of LBP strains

To investigate the kinetics of colonization, we estimated at each clinic visit the total relative abundance of all LBP strains combined from metagenomic sequencing data. For each participant, the relative abundances of the individual LBP strains were summed, and trajectories were visualized over time stratified by study arm and site.

Code

total_rel_ab_of_LBP <- 
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(!is.na(LBP), !poor_quality_mg_data) |> 
  left_join(
    mae |> colData() |> as_tibble(), 
    by = join_by(uid)
  ) |>
  filter(randomized, mITT) |> 
  group_by(.sample, uid) |>
  summarize(
    total_LBP_rel_ab = sum(rel_ab, na.rm = TRUE),
    .groups = "drop"
  )

total_rel_ab_of_LBP <- 
  total_rel_ab_of_LBP |> 
  dplyr::left_join(mae |> colData() |> as_tibble(), by = join_by(uid))


total_rel_ab_of_LBP <- 
  total_rel_ab_of_LBP |> 
  left_join(
    total_rel_ab_of_LBP |> 
      dplyr::select(pid, arm, site) |> 
      distinct() |> 
      group_by(arm, site) |> 
      arrange(pid) |> 
      mutate(pid_nb = row_number()) |> 
      ungroup()
  )

Code

g_tot_prop_LBP_strains <- 
  total_rel_ab_of_LBP |>
  filter(visit_type == "Clinic", !is.na(arm), PIPV) |> 
  ggplot(aes(x = study_week, y = total_LBP_rel_ab, col = arm)) +
  geom_line(aes(group = pid), alpha = 0.5) +
  geom_point(size = 0.5, alpha = 0.5) +
  facet_grid(site ~ arm) + 
  scale_x_continuous("Study weeks", breaks = c(-1:5,7,9,12), minor_breaks = 0:12) +
  scale_y_continuous(
    "Total relative abundances\nof LBP strains", labels = scales::percent,
    breaks = seq(0, 1, by = 0.2)
  ) +
  expand_limits(y = 0) +
  scale_color_manual(values = arm_colors) +
  guides(col = "none") +
  ggtitle(fig_title)

Code

# g_tot_prop_LBP_strains

To facilitate interpretation, each participant within the same study arm and site is labeled by a unique number. These numbers allow visual tracking of individual trajectories across visits within each stratified panel.

Code

g_tot_prop_LBP_strains +
  geom_label(aes(label = pid_nb), size = 2) +
  labs(
    caption = "Participants are labeled by their enrollment order within each arm and site."
  )

Proportions of participants who colonized in each arm and each visit

Code

get_longitudinal_proportions_of_participants_who_colonized <- 
  function(total_rel_ab_of_LBP) {
    total_rel_ab_of_LBP |> 
      filter(!is.na(arm), PIPV) |> 
      group_by(site, arm, study_week) |>
      summarize(
        prop_who_colonized_at = mean(total_LBP_rel_ab > 0.05), 
        CI_low = 
          prop.test(
            x = sum(total_LBP_rel_ab > 0.05), n = n(), 
            conf.level = 0.95)$conf.int[1],
        CI_high = 
          prop.test(
            x = sum(total_LBP_rel_ab > 0.05), n = n(), 
            conf.level = 0.95)$conf.int[2],
        .groups = "drop"
      )
  }

arm_colors <- c(
  "LC106-7" = "darkorange", 
  "LC115"   = "#8E2708",
  "LC106-3" = "dodgerblue1",
  "LC106-o" = "deeppink",
  "Pl"       = "#B3B3B3"
)

Code

plot_longitudinal_proportions_of_participants_who_colonized <- function(total_rel_ab_of_LBP){
    
    get_longitudinal_proportions_of_participants_who_colonized(total_rel_ab_of_LBP) |>
      ggplot() +
      aes(x = study_week, y = prop_who_colonized_at, col = arm, shape = arm) +
      facet_grid(site ~ .) +
      geom_ribbon(
        aes(ymin = CI_low, ymax = CI_high, fill = arm, colour = arm), 
        alpha = 0.05, size = 0.2 #, color = NA
      ) +
      geom_line(linewidth = 0.75) + #size = 0.35
      geom_point(size = 3) + 
      scale_x_continuous("Study weeks", breaks = c(-1:5,7,9,12), minor_breaks = 0:12) +
      scale_y_continuous(
        "Proportion of participants\nwith detected LBP strains", labels = scales::percent,
        breaks = seq(0, 1, by = 0.2)
      )  +
      expand_limits(y = 0) +
    scale_color_manual("Arm",values = arm_colors) +
    scale_fill_manual("Arm",values = arm_colors) + 
    scale_shape_manual("Arm", values = c(1, 19, 15, 17, 8)) 
  }

Code

g_prop_colonized_by <- plot_longitudinal_proportions_of_participants_who_colonized(total_rel_ab_of_LBP)
g_prop_colonized_by # + ggtitle(fig_title)

For the sake of clarity, we remove the two arms that only one participant in the MGH is involved in.

Code

g_prop_colonized_by_final <- 
  plot_longitudinal_proportions_of_participants_who_colonized(
    total_rel_ab_of_LBP |> filter(!((arm %in% c("LC106-3", "LC106-o")) & (site == "MGH")))
    )
g_prop_colonized_by_final + ggtitle(fig_title)

Code

plot_longitudinal_proportions_of_participants_who_colonized_v2 <- 
  function(total_rel_ab_of_LBP){
    
    get_longitudinal_proportions_of_participants_who_colonized(total_rel_ab_of_LBP) |>
      ggplot(aes(x = study_week, y = prop_who_colonized_at, col = arm)) +
      geom_linerange(
        aes(ymin = CI_low, ymax = CI_high, col = arm), 
        alpha = 0.4, lineend = "round", linewidth = 1,
        position = position_dodge(width = 0.5)
      ) +
      geom_line(position = position_dodge(width = 0.5), linewidth = 0.8) +
      geom_point(aes(shape = arm), position = position_dodge(width = 0.5), size = 2) +
      facet_grid(site ~ .) + 
      scale_shape_manual("Arm", values = c(1, 19, 15, 17, 8)) +
      scale_x_continuous("Study weeks", breaks = c(-1:5,7,9,12), minor_breaks = 0:12) +
      scale_y_continuous(
        "Proportion of participants\nwith detected LBP strains", labels = scales::percent,
        breaks = seq(0, 1, by = 0.2)
      ) +
      expand_limits(y = 0) +
      scale_color_manual("Arm", values = arm_colors) +
      scale_fill_manual("Arm", values = arm_colors)
    
  }

Alternative visualization:

Code

g_prop_colonized_by <- plot_longitudinal_proportions_of_participants_who_colonized_v2(total_rel_ab_of_LBP)
g_prop_colonized_by +  ggtitle(fig_title)

Code

g_prop_colonized_by <- 
  plot_longitudinal_proportions_of_participants_who_colonized_v2(
    total_rel_ab_of_LBP |> filter(!((arm %in% c("LC106-3", "LC106-o")) & (site == "MGH")))
    )

g_prop_colonized_by +  ggtitle(fig_title)

Code

total_rel_ab_of_LBP |> 
  filter(!is.na(study_week), study_week >= 0) |> 
  mutate(
    arm_group = case_when(
      arm == "Pl" ~ "Placebo",
      TRUE ~ "Active arms"
    ) |> factor(levels = c("Placebo", "Active arms"))
  ) |> 
  group_by(
    arm_group, visit_code, study_week
  ) |> 
  summarize(
    n = n(), 
    n_colonized = sum(total_LBP_rel_ab >= 0.05),
    prop_colonized = n_colonized/n,
    CI_low = 
      prop.test(
        x = n_colonized, n = n, 
        conf.level = 0.95)$conf.int[1],
    CI_high = 
      prop.test(
        x = n_colonized, n = n, 
        conf.level = 0.95)$conf.int[2],
    .groups = "drop"
  ) |> 
  mutate(
    n_N = str_c(n_colonized, " / ", n),
    perc = str_c(round(prop_colonized * 100, 1), "% (", round(CI_low * 100, 1), " - ", round(CI_high * 100, 1), "%)"),
  ) |> 
  select(arm_group, study_week, n_N, perc) |> 
  pivot_wider(
    names_from = arm_group,
    values_from = c(n_N, perc),
    names_vary = "slowest"
  ) |> 
gt() |> 
  cols_label(
    study_week = "Study week",
    n_N_Placebo = "N colonized (Placebo)",
    `n_N_Active arms` = "N colonized (Active arms)",
    perc_Placebo = "Proportion colonized (Placebo)",
    `perc_Active arms` = "Proportion colonized (Active arms)"
  )

Study week	N colonized (Placebo)	Proportion colonized (Placebo)	N colonized (Active arms)	Proportion colonized (Active arms)
0	0 / 19	0% (0 - 20.9%)	0 / 71	0% (0 - 6.4%)
1	0 / 19	0% (0 - 20.9%)	13 / 70	18.6% (10.6 - 30%)
2	0 / 18	0% (0 - 21.9%)	43 / 68	63.2% (50.6 - 74.4%)
3	0 / 18	0% (0 - 21.9%)	26 / 67	38.8% (27.4 - 51.5%)
4	0 / 16	0% (0 - 24.1%)	34 / 70	48.6% (36.6 - 60.7%)
5	1 / 17	5.9% (0.3 - 30.8%)	26 / 71	36.6% (25.7 - 49%)
7	1 / 17	5.9% (0.3 - 30.8%)	26 / 67	38.8% (27.4 - 51.5%)
9	0 / 18	0% (0 - 21.9%)	22 / 69	31.9% (21.5 - 44.3%)
12	0 / 17	0% (0 - 22.9%)	24 / 68	35.3% (24.4 - 47.9%)

Kinetics of L. crispatus colonization

Total relative abundances of L. crispatus

We further examined colonization dynamics by focusing on Lactobacillus crispatus, irrespective of whether strains belonged to the investigational product.

For each participant and visit, the total relative abundance of L. crispatus was calculated by summing all metagenomic features annotated to this species. Trajectories were visualized over time stratified by study arm and site.

Code

total_rel_ab_of_Lc <- 
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(str_detect(species, "crispatus"), !poor_quality_mg_data) |> 
  left_join(
    mae |> colData() |> as_tibble(), 
    by = join_by(uid)
  ) |>
  filter(randomized, mITT) |> 
  group_by(.sample, uid) |>
  summarize(
    total_Lc_rel_ab = sum(rel_ab, na.rm = TRUE),
    .groups = "drop"
  )

total_rel_ab_of_Lc <- 
  total_rel_ab_of_Lc |> 
  dplyr::left_join(mae |> colData() |> as_tibble(), by = join_by(uid))


# we add a pid_nb 
total_rel_ab_of_Lc <- 
  total_rel_ab_of_Lc |> 
  left_join(
    total_rel_ab_of_Lc |> 
      dplyr::select(pid, arm, site) |> 
      distinct() |> 
      group_by(arm, site) |> 
      arrange(pid) |> 
      mutate(pid_nb = row_number()) |> 
      ungroup()
  )

Code

g_tot_prop_Lc <- 
  total_rel_ab_of_Lc |>
  filter(visit_type == "Clinic", !is.na(arm), PIPV) |> 
  ggplot(aes(x = study_week, y = total_Lc_rel_ab, col = arm)) +
  geom_line(aes(group = pid), alpha = 0.5) +
  geom_point(size = 0.5, alpha = 0.5) +
  facet_grid(site ~ arm) + 
  scale_x_continuous("Study weeks", breaks = c(-1:5,7,9,12), minor_breaks = 0:12) +
  scale_y_continuous(
    "Total relative abundances\nof L. crispatus", labels = scales::percent,
    breaks = seq(0, 1, by = 0.2)
  ) +
  expand_limits(y = 0) +
  scale_color_manual(values = arm_colors) +
  guides(col = "none") +
  ggtitle(mae_title(mae))

Code

# g_tot_prop_Lc

Code

g_tot_prop_Lc +
  geom_label(aes(label = pid_nb), size = 2) +
  labs(
    caption = "Participants are labeled by their enrollment order within each arm and site."
  )

Proportions of participants who colonized with 50% L. crispatus in each arm and each visit

We estimated at each visit the proportion of participants whose vaginal microbiota was dominated by L. crispatus (defined as a relative abundance >50%). For each study arm and site, proportions were calculated together with 95% confidence intervals derived from binomial tests.

Code

get_longitudinal_proportions_of_participants_who_colonized <- 
  function(total_rel_ab_of_Lc) {
    total_rel_ab_of_Lc |> 
      filter(!is.na(arm), PIPV) |> 
      group_by(site, arm, study_week) |>
      summarize(
        prop_who_colonized_at = mean(total_Lc_rel_ab > 0.5), 
        CI_low = 
          prop.test(
            x = sum(total_Lc_rel_ab > 0.5), n = n(), 
            conf.level = 0.95)$conf.int[1],
        CI_high = 
          prop.test(
            x = sum(total_Lc_rel_ab > 0.5), n = n(), 
            conf.level = 0.95)$conf.int[2],
        .groups = "drop"
      )
  }

Code

plot_longitudinal_proportions_of_participants_who_colonized <- function(total_rel_ab_of_Lc){
    
    get_longitudinal_proportions_of_participants_who_colonized(total_rel_ab_of_Lc) |>
      ggplot() +
      aes(x = study_week, y = prop_who_colonized_at, col = arm, shape = arm) +
      facet_grid(site ~ .) +
      geom_ribbon(
        aes(ymin = CI_low, ymax = CI_high, fill = arm, colour = arm), 
        alpha = 0.05, size = 0.2 #, color = NA
      ) +
      geom_line(size = 0.35) +
      geom_point() + 
      scale_x_continuous("Study weeks", breaks = c(-1:5,7,9,12), minor_breaks = 0:12) +
      scale_y_continuous(
        "Proportion of participants\n≥ 50% L. crispatus", labels = scales::percent,
        breaks = seq(0, 1, by = 0.2)
      )  +
      expand_limits(y = 0) +
    scale_shape_manual("Arm", values = c(1, 19, 15, 17, 8)) + 
    scale_color_manual(
      "Arm",
      values = arm_colors
    ) +
    scale_fill_manual(
      "Arm",
      values = arm_colors
    )
  }

Code

g_prop_colonized_by <- plot_longitudinal_proportions_of_participants_who_colonized(total_rel_ab_of_Lc)
g_prop_colonized_by # + ggtitle(fig_title)

Code

g_prop_colonized_by <- 
  plot_longitudinal_proportions_of_participants_who_colonized(
    total_rel_ab_of_Lc |> filter(!((arm %in% c("LC106-3", "LC106-o")) & (site == "MGH")))
    )

g_prop_colonized_by + ggtitle(fig_title)

Alternative visualization:

Code

plot_longitudinal_proportions_of_participants_who_colonized_v2 <- 
  function(total_rel_ab_of_Lc){
    
    get_longitudinal_proportions_of_participants_who_colonized(total_rel_ab_of_Lc) |>
      ggplot(aes(x = study_week, y = prop_who_colonized_at, col = arm)) +
      geom_linerange(
        aes(ymin = CI_low, ymax = CI_high, col = arm), 
        alpha = 0.4, lineend = "round", linewidth = 1,
        position = position_dodge(width = 0.5)
      ) +
      geom_line(position = position_dodge(width = 0.5), linewidth = 0.8) +
      geom_point(aes(shape = arm), position = position_dodge(width = 0.5), size = 2) +
      facet_grid(site ~ .) + 
      scale_shape_manual("Arm", values = c(1, 19, 15, 17, 8)) +
      scale_x_continuous("Study weeks", breaks = c(-1:5,7,9,12), minor_breaks = 0:12) +
      scale_y_continuous(
        "Proportion of participants\n≥ 50% L. crispatus", labels = scales::percent,
        breaks = seq(0, 1, by = 0.2)
      ) +
      expand_limits(y = 0) +
      scale_color_manual("Arm", values = arm_colors) +
      scale_fill_manual("Arm", values = arm_colors)
    
  }

Code

g_prop_colonized_by <- plot_longitudinal_proportions_of_participants_who_colonized_v2(total_rel_ab_of_Lc)
g_prop_colonized_by + ggtitle(fig_title)

Code

g_prop_colonized_by <- 
  plot_longitudinal_proportions_of_participants_who_colonized_v2(
    total_rel_ab_of_Lc |> filter(!((arm %in% c("LC106-3", "LC106-o")) & (site == "MGH")))
    )

g_prop_colonized_by + ggtitle(fig_title)

Durability of colonization

We assessed the persistence of colonization among participants who had achieved LBP colonization by week 5. Specifically, we compared colonization status at week 12 to determine the proportion of initially colonized participants who remained colonized at the end of follow-up.

Code

col_by_week5 <- 
  mae@colData |> as_tibble() |> 
  dplyr::select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_mg"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> dplyr::select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "colonized_LBP_by_mg")
  ) |> 
  mutate(LBP_colonization_by_week5 = outcome |> replace_na(FALSE)) 


col_at_week12 <- 
  mae@colData |> as_tibble() |> 
  dplyr::select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_mg"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> dplyr::select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "2120", .feature == "colonized_LBP_at_mg")
  ) |> 
  mutate(LBP_colonization_at_week12 = outcome |> replace_na(FALSE)) 


col_still_col <- 
  col_by_week5 |> 
  dplyr::select(pid, site, arm, randomized, ITT, mITT, PP, LBP_colonization_by_week5) |> 
  left_join(
    col_at_week12 |> dplyr::select(pid, LBP_colonization_at_week12 ),
    by = c("pid")
  )

Code

summary_col_still_col <-
  bind_rows(
    # by site and arms
    col_still_col |> 
      filter(LBP_colonization_by_week5) |>
      group_by(site, arm) |> 
      summarize(
        n = n(),  # total with colonization at week 5
        n_success = sum(LBP_colonization_at_week12),  # still colonized at week 12
        .groups = "drop"
      ),
    # combining sites, by arm
    col_still_col |> 
      filter(LBP_colonization_by_week5) |>
      group_by(arm) |> 
      summarize(
        n = n(),  # total with colonization at week 5
        n_success = sum(LBP_colonization_at_week12),  # still colonized at week 12
        .groups = "drop"
      ) |> 
      mutate(site = "CAP & MGH")
  ) |> 
  mutate(
    site = site |> fct_inorder(),
    arm = arm |> fct_expand("All arms", "All active arms")
    )

summary_col_still_col <- 
  summary_col_still_col |> 
  bind_rows(
    # overall by site, all arms
    summary_col_still_col |> 
      group_by(site) |> 
      summarize(
        n = sum(n),  # total with colonization at week 5
        n_success = sum(n_success),  # still colonized at week 12
        .groups = "drop"
      ) |> 
      mutate(arm = "All arms" |> factor(levels = summary_col_still_col$arm |> levels())),
    # overall by site, all active arms
    summary_col_still_col |>
      filter(arm != "Pl") |>
      group_by(site) |> 
      summarize(
        n = sum(n),  # total with colonization at week 5
        n_success = sum(n_success),  # still colonized at week 12
        .groups = "drop"
      )  |> 
      mutate(arm = "All active arms" |> factor(levels = summary_col_still_col$arm |> levels()))
  )

summary_col_still_col <- 
  summary_col_still_col |> 
  mutate(
    p_col_stil_col = n_success / n,
    CI = binom::binom.confint(n_success, n, method = "exact")
  )

Code

table_still_col <- 
  summary_col_still_col |> 
  mutate(
    N_per_site_and_arm = str_c("N = ", n),
    LBP_strain_detected = str_c(n_success, " (", round(p_col_stil_col * 100)," %)"),
    CI_text = str_c(round(CI$lower * 100), "% - ", round(CI$upper * 100), "%")
  ) |> 
  dplyr::select(
    site, arm, N_per_site_and_arm, LBP_strain_detected, CI_text
  ) |> 
  pivot_longer(-c(site, arm)) |>
  mutate(
    name = 
      case_when(
        name == "N_per_site_and_arm" ~ "N participants colonized by week 5",
        name == "LBP_strain_detected" ~ "n (%) participants\nwith LBP strain detected at week 12",
        name == "CI_text" ~ "95% CI"
      )
  ) |>
  arrange(arm) |> 
  pivot_wider(
    id_cols = c(site, name), 
    names_from = arm,
    values_from = value, values_fill = ""
    ) |> 
  arrange(site)

Code

table_still_col |> 
  group_by(site) |> 
  gt(
    caption = "Colonization at week 12 among participants colonized by week 5 by arm", 
    row_group_as_column = TRUE
    ) |> 
  cols_width(
    "name" ~ px(200),
    everything() ~ px(120)
  ) |> 
  cols_align(
    align = "center", 
    columns = c(Pl, `LC106-7`, `LC106-3`, `LC106-o`, `LC115`, `All arms`, `All active arms`)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC106-7` = "LC106<br>7 days",
    `LC106-3` = "LC106<br>3 days",
    `LC106-o` = "LC106<br>overlap",
    `LC115` = "LC115<br>7 days",
    `All active arms` = "All active<br>arms",
    .fn = md
  )

Colonization at week 12 among participants colonized by week 5 by arm
		Placebo	LC106 7 days	LC106 3 days	LC106 overlap	LC115 7 days	All arms	All active arms
CAP	N participants colonized by week 5		N = 9	N = 7	N = 7	N = 10	N = 33	N = 33
	n (%) participants with LBP strain detected at week 12		6 (67 %)	3 (43 %)	4 (57 %)	4 (40 %)	17 (52 %)	17 (52 %)
	95% CI		30% - 93%	10% - 82%	18% - 90%	12% - 74%	34% - 69%	34% - 69%
MGH	N participants colonized by week 5	N = 1	N = 6	N = 1	N = 1	N = 6	N = 15	N = 14
	n (%) participants with LBP strain detected at week 12	0 (0 %)	3 (50 %)	1 (100 %)	1 (100 %)	1 (17 %)	6 (40 %)	6 (43 %)
	95% CI	0% - 98%	12% - 88%	3% - 100%	3% - 100%	0% - 64%	16% - 68%	18% - 71%
CAP & MGH	N participants colonized by week 5	N = 1	N = 15	N = 8	N = 8	N = 16	N = 48	N = 47
	n (%) participants with LBP strain detected at week 12	0 (0 %)	9 (60 %)	4 (50 %)	5 (62 %)	5 (31 %)	23 (48 %)	23 (49 %)
	95% CI	0% - 98%	32% - 84%	16% - 84%	24% - 91%	11% - 59%	33% - 63%	34% - 64%

Code

g_long_terme_col <- 
  summary_col_still_col |> 
  filter(!(arm %in% c("Pl", "All arms"))) |> 
  arrange(arm) |> 
  mutate(arm = arm |> str_replace_all(" ", "\n") |> fct_inorder()) |> 
  ggplot() +
  aes(x = arm, y = p_col_stil_col, col = site) +
  geom_linerange(
    aes(ymin = CI$lower, ymax = CI$upper), 
    linewidth = 1.5, alpha = 0.5, lineend = "round",
    position = position_dodge(width = 0.5)
    ) +
  geom_point(position = position_dodge(width = 0.5), size = 3) +
  scale_color_manual("Study site", values = c(site_colors, c("CAP & MGH" = "gray30"))) +
  xlab("") + 
  scale_y_continuous(
    name = "% participants colonized by week 5\nwith LBP strain detected at week 12",
    labels = scales::percent_format(accuracy = 1)
  ) +
  theme(
    panel.grid.major.x = element_blank()
    # axis.text.x = element_text(angle = 45, hjust = 1)
  )

g_long_terme_col

Microbiota trajectories

Longitudinal stack relative abundances plot:

Code

rel_abs <- 
  mae[["mg"]] |> 
  as_tibble() |> 
  dplyr::select(
    .feature, .sample, uid, rel_ab, poor_quality_mg_data,
    taxon_label, genus, LBP, strain_origin
    )

Code

rel_abs <- 
  rel_abs |> 
  group_by(.feature) |> 
  mutate(max_rel_ab = max(rel_ab, na.rm = TRUE), mean_rel_ab = mean(rel_ab, na.rm = TRUE)) |> 
  ungroup() |> 
  arrange(is.na(LBP), -mean_rel_ab) |> 
  mutate(.feature = .feature |> fct_inorder()) |> 
  mutate(
    taxon = 
      case_when(
        as.numeric(.feature) <= 29 ~ taxon_label,
        (genus == "Lactobacillus") ~ "Other Lactobacillus",
        TRUE ~ "Other non-Lacto."
      )
  ) |> 
  arrange(is.na(LBP), LBP,genus != "Lactobacillus", genus, -mean_rel_ab) |> 
  mutate(taxon = taxon |> fct_inorder() |> fct_relevel("Other Lactobacillus", "Other non-Lacto.", "Other", after = Inf))

Code

rel_abs_agg <- 
  rel_abs |> 
  group_by(uid, taxon, LBP, strain_origin, poor_quality_mg_data) |> 
  summarize(rel_ab = sum(rel_ab), .groups = "drop") |> 
  dplyr::left_join(mae |> colData() |> as_tibble(), by = join_by(uid))

Code

tmp <- 
  rel_abs_agg |> 
  mutate(pid_label = ifelse(PP, "", ifelse(mITT, "*", ""))) |> 
  dplyr::select(pid, pid_label) |> 
  distinct() 
pid_label <- setNames(tmp$pid_label, tmp$pid)
rm(tmp)

trajectories_data <- 
  rel_abs_agg |>
  filter(mITT, randomized != 0, PIPV, !poor_quality_mg_data) |>
  group_by(pid) |> 
  mutate(
    tot_LC115 = sum(rel_ab[LBP == "LC115"]),
    tot_LBP = sum(rel_ab[!is.na(LBP)])
    ) |> 
  ungroup() |> 
  arrange(desc(PP), -tot_LC115, -tot_LBP) |> 
  mutate(
    pid = pid |> fct_inorder(),
    visit_label = visit |> as.character() |> fct_reorder(visit |> as.numeric())
    )

Code

g_trajectories <-
  trajectories_data |> 
  ggplot() + 
  aes(y = rel_ab, x = pid, fill = taxon) +
  geom_col() +
  ggh4x::facet_nested(
    rows = vars(visit_label),
    cols = vars(arm, site),
    scales = "free_x", space = "free_x", 
    strip = 
      ggh4x::strip_nested(
        background_x = 
          ggh4x::elem_list_rect(
            fill = c(rep("gray90",5), 
                     rep(c("#FF0000", "#00868B"), 5))
          ), 
        text_x = 
          ggh4x::elem_list_text(
            face = rep("bold", 15),
            colour = c(rep("black", 5), rep("white", 5), "transparent", "white", "transparent", rep("white", 3))
            )              
      )
  ) +
  scale_fill_manual(
    "LBP strain or taxon",
    values = get_taxa_colors(rel_abs_agg$taxon |> levels()),
    labels = get_taxa_labels(rel_abs_agg$taxon |> levels()),
    guide = guide_legend(ncol = 1)
    ) +
  scale_y_continuous(
    "Relative abundance", labels = scales::percent,
    breaks = seq(0, 0.5, by = 0.5), minor_breaks = seq(0, 1, by = 0.25)
  ) +
  scale_x_discrete("Participants", labels = pid_label) +
  theme(
    strip.text.x = element_text(hjust = 0.5),
    axis.text.x = element_text(hjust = 0.5, vjust = 1, size = 15, margin = margin(t = -1)), 
    axis.ticks.x = element_blank(),
    legend.text = element_markdown(),
    legend.key.height = unit(15, "pt")
  )

Code

g_trajectories

Microbiota trajectories of mITT participants in the VIBRANT study. The relative abundances of the LBP strains and top taxa are shown. Participants are ordered by their total relative abundance of LBP strains, and the study weeks are shown in the rows. Stars at the bottom of trajectories indicates participant that were not part of the per-protocol (PP) population.

Code

# checking the data for the placebo participant that has the LBP strains at MGH

mae[["mg"]] |> 
  as_tibble() |> 
  dplyr::left_join(mae@colData |> as_tibble()) |> 
  filter(pid == "068100062", study_week == 5) 


mae[["qPCR"]] |> 
  as_tibble() |> 
  dplyr::left_join(mae@colData |> as_tibble()) |> 
  filter(pid == "068100062", study_week == 5) 

mae[["amplicon"]] |> 
  as_tibble() |> 
  dplyr::left_join(mae@colData |> as_tibble()) |> 
  filter(pid == "068100062", study_week == 5) |> 
  dplyr::select(.feature, .sample, rel_ab, exclude_sample)


mae[["amplicon"]] |> 
  as_tibble() |> 
  dplyr::left_join(mae@colData |> as_tibble()) |> 
  filter(pid == "068100062", .feature == "Lactobacillus_crispatus") |> 
  dplyr::select(.feature, .sample, visit_code, rel_ab, exclude_sample)

Individual LBP strains by sites

Relative abundance of each strain at each visit as estimated by metagenomic sequencing. Strains are organized according to geographical origin, and product(s) in which they are included.

Code

LBP_strains <- 
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(!is.na(LBP)) |> 
  dplyr::select(.feature, .sample, uid, rel_ab, LBP, strain_origin, poor_quality_mg_data) |> 
  left_join(mae |> colData() |> as_tibble()) |> 
  filter(mITT, !poor_quality_mg_data, randomized)

Code

LBP_strains |> 
   mutate(
    visit_label = visit |> as.character() |> fct_reorder(visit |> as.numeric())
    ) |> 
  filter(!is.na(arm), !is.na(study_week), study_week >= 2) |> 
  ggplot() +
  aes(x = .feature, y = rel_ab, col = site, fill = site) +
  ggbeeswarm::geom_quasirandom(alpha = 0.5, size = 0.5, dodge.width = 0.5) +
  ggh4x::facet_nested(
    rows = vars(visit_label),
    cols = vars(LBP, strain_origin),
    scales = "free_x", space = "free_x") + 
  xlab("LBP strain") +
  scale_y_continuous(
    "Relative abundance", labels = scales::percent,
    breaks = seq(0, 1, by = 0.2)
  ) +
  scale_color_manual("Study\nsite", values = site_colors) +
  scale_fill_manual("Study\nsite", values = site_colors) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

Code

g_relative_abundance <- 
  LBP_strains |> 
  mutate(
    visit_label = visit |> as.character() |> fct_reorder(visit |> as.numeric()), 
    strain_origin = recode(strain_origin, "SA" = "ZA strains", "US" = "US strains")
    ) |>
  filter(!is.na(arm), !is.na(study_week), study_week >= 2) |> 
  ggplot() +
  aes(x = .feature, y = rel_ab, col = site, fill = site) +
  geom_boxplot(outlier.size = 0.7, alpha = 0.4) +
  ggh4x::facet_nested(
    rows = vars(visit_label),
    cols = vars(LBP, strain_origin),
    scales = "free_x", space = "free_x") +
  xlab("") +
  scale_y_continuous(
    "Relative abundance", labels = scales::percent,
    breaks = seq(0, 1, by = 0.2)
  ) +
  scale_color_manual(values = site_colors) +
  scale_fill_manual(values = site_colors) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  ) 
g_relative_abundance

Relative abundances of individual LBP strains estimated by metagenomic sequencing across study weeks, stratified by site, strain origin, and strain identity. Each line represents the longitudinal trajectory of one participant; points show visit-specific measurements.

Code

LBP_strains |> 
  filter(!is.na(arm), !is.na(study_week), study_week >= 2) |> 
  mutate(
   strain_origin = recode(strain_origin, "SA" = "ZA strains", "US" = "US strains")
  ) |> 
  ggplot() +
  aes(x = study_week, y = rel_ab, col = strain_origin) +
  geom_point(alpha = 0.5, size = 0.5) +
  geom_line(aes(group = pid), alpha = 0.5) +
  ggh4x::facet_nested(
    rows = vars(site),
    cols = vars(LBP, strain_origin, .feature),
    scales = "free_x", space = "free_x") +
  xlab("Study week") +
  scale_y_continuous(
    "Relative abundance bacteria", labels = scales::percent,
    breaks = seq(0, 1, by = 0.2)
  ) +
  scale_color_manual("Strain origin", values = strain_origin_colors) +
  scale_x_continuous("Study weeks", breaks = c(0:5,7,9,12), minor_breaks = 0:12) +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
    legend.position = "bottom"
  )

Proportion colonized among exposed

We estimated the proportion of exposed participants who ever colonized with each LBP strain during follow-up (visits after the first 2 weeks), based on metagenomic relative abundances.

Colonization was defined at the participant level as detection of a given strain at ≥5% relative abundance in at least one post-week 2 sample. Placebo participants were excluded, since they were not exposed to any of the strains.

For LC115 strains, only participants in the LC115 arm were considered exposed, while for LC106 strains all participants in active arms were considered exposed. For each strain and site, we calculated the number of exposed participants, the number who colonized, the corresponding proportion, and 95% confidence intervals using Wilson’s method.

Code

tmp <- 
  # we use the metagenomics data for this
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(!is.na(LBP), !poor_quality_mg_data) |> 
  dplyr::select(.feature, LBP, strain_origin, .sample, uid, rel_ab) |> 
  # add the arm, mITT, etc.
  left_join(
    mae |> colData() |> as_tibble() |> dplyr::select(uid, pid, site, randomized, mITT, arm, visit_code, visit, PIPV), 
    by = join_by(uid)
  ) |>
  # only consider the mITT
  filter(randomized, mITT) |> 
  filter(PIPV, as.numeric(visit_code) >= 1200) |> # only include visits after the first 2 weeks
  # define the outcome of interest for each person and strain
  group_by(.feature, LBP, strain_origin, pid, arm, site) |>
  summarize(
    ever_colonized = any(rel_ab >= 0.05, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  # exclude Placebo participants because, they were, in theory, not exposed to any of the strains.
  filter(arm != "Pl") |> 
  # define the total number of participants exposed for each strain.
  # For LC115 strains, this is only participants of arm LC115
  # For LC106 strains, this is participants from all active arms.
  mutate(
    exposed = case_when(
      LBP == "LC115" ~ (arm == "LC115"),
      LBP == "LC106 & LC115" ~ TRUE,
      TRUE ~ FALSE
    )
  ) |> 
  # we compute the statistics of interest
  group_by(.feature, LBP, strain_origin, site) |> 
  summarize(
    n_exposed = sum(exposed, na.rm = TRUE),
    n_colonized = sum(ever_colonized & exposed, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  mutate(
    prop_colonized = n_colonized / n_exposed,
    CI_low = binom::binom.confint(n_colonized, n_exposed, method = "wilson")$lower,
    CI_high = binom::binom.confint(n_colonized, n_exposed, method = "wilson")$upper
  )

Code

g_colonized_among_exposed <- 
  tmp |> 
  mutate(strain_origin = recode(strain_origin, "SA" = "ZA strains", "US" = "US strains")) |>
  ggplot() +
  aes(x = .feature, y = prop_colonized, col = site) +
   ggh4x::facet_nested(
    cols = vars(LBP, strain_origin),
    scales = "free_x", space = "free_x") +
  geom_errorbar(
    aes(ymin = CI_low, ymax = CI_high), 
    linewidth = 1, alpha = 0.7,
    width = 0.3, position = position_dodge(width = 0.5)
    ) +
  geom_point(position = position_dodge(width = 0.5), size = 2) +
  scale_color_manual("Study site", values = site_colors) +
  xlab("") + 
  ylab("Proportion of participants colonized\namong those exposed") +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

g_colonized_among_exposed

Code

tmp |> 
  mutate(
    strain_origin = recode(strain_origin, "SA" = "ZA strains", "US" = "US strains"),
    CI = str_c(
      "[", 
      scales::percent(CI_low, accuracy = 1), 
      " - ", 
      scales::percent(CI_high, accuracy = 1), 
      "]"
    ),
    value = 
      str_c(
        scales::percent(prop_colonized, accuracy = 1), 
        " (", n_colonized, "/", n_exposed,")<br>",
       CI
      ),
    strain_info = str_c(LBP, "<br>(", strain_origin, " strains)")
  ) |> 
  dplyr::select(-n_exposed, -n_colonized, -prop_colonized, -CI_low, -CI_high, -CI, -LBP, -strain_origin) |> 
  pivot_wider(
    names_from = c(site), 
    values_from = value
  ) |>
  arrange(strain_info) |> 
  group_by(strain_info) |> 
  gt(
    row_group_as_column = TRUE, 
    process_md = TRUE,
    caption = "Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure."
  ) |> 
  cols_label(.feature = "LBP strain") |> 
  cols_align(columns = .feature, align = "left") |> 
  cols_align(columns = c("CAP", "MGH"), align = "center") |>
  fmt_markdown(columns = everything())

Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure.
	LBP strain	CAP	MGH
LC106 & LC115 (US strains strains)	C0022A1	25% (14/56) [16% - 38%]	47% (7/15) [25% - 70%]
	C0059E1	61% (34/56) [48% - 72%]	87% (13/15) [62% - 96%]
	C0175A1	39% (22/56) [28% - 52%]	67% (10/15) [42% - 85%]
LC106 & LC115 (ZA strains strains)	FF00018	7% (4/56) [3% - 17%]	20% (3/15) [7% - 45%]
	FF00051	30% (17/56) [20% - 43%]	67% (10/15) [42% - 85%]
	UC101	25% (14/56) [16% - 38%]	73% (11/15) [48% - 89%]
LC115 (US strains strains)	C0006A1	50% (7/14) [27% - 73%]	67% (4/6) [30% - 90%]
	C0028A1	0% (0/14) [0% - 22%]	0% (0/6) [0% - 39%]
	C0112A1	64% (9/14) [39% - 84%]	83% (5/6) [44% - 97%]
LC115 (ZA strains strains)	122010	57% (8/14) [33% - 79%]	33% (2/6) [10% - 70%]
	185329	0% (0/14) [0% - 22%]	0% (0/6) [0% - 39%]
	FF00004	50% (7/14) [27% - 73%]	83% (5/6) [44% - 97%]
	FF00064	50% (7/14) [27% - 73%]	83% (5/6) [44% - 97%]
	FF00072	7% (1/14) [1% - 31%]	50% (3/6) [19% - 81%]
	UC119	0% (0/14) [0% - 22%]	0% (0/6) [0% - 39%]

Excluding immediate post-product visit

Same analyze as above but excluding the week 2 visit for the LC106-7 and LC115 arms, since this visit is immediately after product use.

Code

tmp <- 
  # we use the metagenomics data for this
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(!is.na(LBP), !poor_quality_mg_data) |> 
  dplyr::select(.feature, LBP, strain_origin, .sample, uid, rel_ab) |> 
  # add the arm, mITT, etc.
  left_join(
    mae |> colData() |> as_tibble() |> 
      dplyr::select(uid, pid, site, randomized, mITT, arm, visit_code, visit, PIPV), 
    by = join_by(uid)
  ) |>
  # only consider the mITT
  filter(randomized, mITT) |> 
  mutate(
    include_visit = 
      case_when(
        (arm %in% c("LC106-7", "LC115")) & as.numeric(visit_code) > 1200 ~ TRUE,
        (arm %in% c("LC106-3", "LC106-o")) & as.numeric(visit_code) >= 1200 ~TRUE,
        TRUE ~ FALSE
      )
  ) |> 
  filter(PIPV, include_visit) |> # only include visits after the first 2 weeks
  # define the outcome of interest for each person and strain
  group_by(.feature, LBP, strain_origin, pid, arm, site) |>
  summarize(
    ever_colonized = any(rel_ab >= 0.05, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  # exclude Placebo participants because, they were, in theory, not exposed to any of the strains.
  filter(arm != "Pl") |> 
  # define the total number of participants exposed for each strain.
  # For LC115 strains, this is only participants of arm LC115
  # For LC106 strains, this is participants from all active arms.
  mutate(
    exposed = case_when(
      LBP == "LC115" ~ (arm == "LC115"),
      LBP == "LC106 & LC115" ~ TRUE,
      TRUE ~ FALSE
    )
  ) |> 
  # we compute the statistics of interest
  group_by(.feature, LBP, strain_origin, site) |> 
  summarize(
    n_exposed = sum(exposed, na.rm = TRUE),
    n_colonized = sum(ever_colonized & exposed, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  mutate(
    prop_colonized = n_colonized / n_exposed,
    CI_low = binom::binom.confint(n_colonized, n_exposed, method = "wilson")$lower,
    CI_high = binom::binom.confint(n_colonized, n_exposed, method = "wilson")$upper
  )

Code

g_colonized_among_exposed_supp <- 
  tmp |> 
  mutate(strain_origin = recode(strain_origin, "SA" = "ZA strains", "US" = "US strains")) |>
  ggplot() +
  aes(x = .feature, y = prop_colonized, col = site) +
  facet_grid(. ~ LBP + strain_origin, scales = "free", space = "free") +
  geom_errorbar(
    aes(ymin = CI_low, ymax = CI_high), 
    linewidth = 1, alpha = 0.7,
    width = 0.3, position = position_dodge(width = 0.5)
    ) +
  geom_point(position = position_dodge(width = 0.5), size = 2) +
  scale_color_manual("Study site", values = site_colors) +
  xlab("LBP strain") + 
  ylab("Proportion of participants colonized\namong those exposed") +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

g_colonized_among_exposed_supp

Code

tmp |> 
  mutate(
    strain_origin = recode(strain_origin, "SA" = "ZA strains", "US" = "US strains"),
    CI = str_c(
      "[", 
      scales::percent(CI_low, accuracy = 1), 
      " - ", 
      scales::percent(CI_high, accuracy = 1), 
      "]"
    ),
    value = 
      str_c(
        scales::percent(prop_colonized, accuracy = 1), 
        " (", n_colonized, "/", n_exposed,")<br>",
       CI
      ),
    strain_info = str_c(LBP, "<br>(", strain_origin, " strains)")
  ) |> 
  dplyr::select(-n_exposed, -n_colonized, -prop_colonized, -CI_low, -CI_high, -CI, -LBP, -strain_origin) |> 
  pivot_wider(
    names_from = c(site), 
    values_from = value
  ) |>
  arrange(strain_info) |> 
  group_by(strain_info) |> 
  gt(
    row_group_as_column = TRUE, 
    process_md = TRUE,
    caption = "Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure but excluding immediate post-product visit (i.e., excluding week 2 visit for the LC106-7 and LC115 arms)."
  ) |> 
  cols_label(.feature = "LBP strain") |> 
  cols_align(columns = .feature, align = "left") |> 
  cols_align(columns = c("CAP", "MGH"), align = "center") |>
  fmt_markdown(columns = everything())

Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure but excluding immediate post-product visit (i.e., excluding week 2 visit for the LC106-7 and LC115 arms).
	LBP strain	CAP	MGH
LC106 & LC115 (US strains strains)	C0022A1	21% (12/56) [13% - 34%]	33% (5/15) [15% - 58%]
	C0059E1	43% (24/56) [31% - 56%]	40% (6/15) [20% - 64%]
	C0175A1	39% (22/56) [28% - 52%]	67% (10/15) [42% - 85%]
LC106 & LC115 (ZA strains strains)	FF00018	2% (1/56) [0% - 9%]	7% (1/15) [1% - 30%]
	FF00051	9% (5/56) [4% - 19%]	20% (3/15) [7% - 45%]
	UC101	2% (1/56) [0% - 9%]	7% (1/15) [1% - 30%]
LC115 (US strains strains)	C0006A1	0% (0/14) [0% - 22%]	0% (0/6) [0% - 39%]
	C0028A1	0% (0/14) [0% - 22%]	0% (0/6) [0% - 39%]
	C0112A1	0% (0/14) [0% - 22%]	0% (0/6) [0% - 39%]
LC115 (ZA strains strains)	122010	36% (5/14) [16% - 61%]	17% (1/6) [3% - 56%]
	185329	0% (0/14) [0% - 22%]	0% (0/6) [0% - 39%]
	FF00004	0% (0/14) [0% - 22%]	0% (0/6) [0% - 39%]
	FF00064	0% (0/14) [0% - 22%]	0% (0/6) [0% - 39%]
	FF00072	0% (0/14) [0% - 22%]	0% (0/6) [0% - 39%]
	UC119	0% (0/14) [0% - 22%]	0% (0/6) [0% - 39%]

Supplementary analyses

Applicator staining

We analyze returned applicators and staining results to assess adherence to study product use. We use data from participant_crfs_merged stored in the @metadata slot of the mae object, which contains data from CRF23 (returned applicators) and CRF46 (staining results). We also use the exposures table to obtain information on study product use.

From these, we create the applicator table, which includes only participants from the mITT population.

Code

tmp <- 
   metadata(mae)$participant_crfs_merged |> 
   as_tibble() |> 
   select(pid, used_applicators, number_applicators, applicator_stain_positive, applicator_stain_negative, applicator_stain_indeterminant, total_applicators_stained, proportion_positive_applicators, total) |> 
   dplyr::rename(
     returned_used_applicators = used_applicators,
     number_of_returned_applicators = number_applicators
   )

study_product <- 
  mae@colData |> 
  as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae@metadata$calc_vars_pid |> select(pid, n_product_doses),  
    by = "pid"
  )

applicator <- 
  mae@colData |> as_tibble() |>
  select(pid, site, randomized, arm, ITT, mITT, PP) |>
  distinct() |>
  filter(randomized, mITT) |>
  left_join(tmp, by = c("pid")) |>
  left_join(study_product, by = "pid")

We note that two participants returned eight applicators. In both cases, comments explain that they had received a spare applicator:

Participant 068-20-0425 reported taking the spare applicator out of its wrapping and placing it in the return bag without using or washing any applicators.
Participant 068-20-0439 reported inserting the spare applicator on the day of the visit, thinking all applicators had to be used, but also did not wash any of them.

Returned applicator statistics

Here, we want to document the proportions of mITT participants who

returned any used applicators
returned their complete set of applicators (i.e., 7 or 8 applicators)

Code

applicator <- 
  applicator |> 
  mutate(
    returned_complete_set = (number_of_returned_applicators >= 7),
    return_category = case_when(
      (returned_used_applicators == "Yes") & returned_complete_set ~ "Returned complete set",
      (returned_used_applicators == "Yes") & !returned_complete_set ~ "Returned incomplete set",
      returned_used_applicators == "No" ~ "Did not return applicators",
      TRUE ~ "Missing"
    ) |> fct_infreq()
  ) 

applicator |> 
  dplyr::count(return_category) |> 
  mutate(`%` = round(n / sum(n), 3) * 100) |> 
  gt() |> 
  cols_label(return_category = "")

	n	%
Returned complete set	77	85.6
Returned incomplete set	7	7.8
Did not return applicators	6	6.7

Staining statistics

Here, we want to document, for those who returned their complete set of applicators, how many got their full set stained.

Code

applicator <- 
  applicator |> 
  mutate(
    staining_done_category = 
      case_when(
        (number_of_returned_applicators == 0) | is.na(number_of_returned_applicators) ~ "No applicators returned",
        is.na(total_applicators_stained) ~ "No staining done",
        total_applicators_stained == number_of_returned_applicators ~ "All returned applicators stained",
        total_applicators_stained < number_of_returned_applicators ~ "Partial staining",
        total_applicators_stained > number_of_returned_applicators ~ "???",
        TRUE ~ "???"
      )
  )

Code

applicator |> 
  dplyr::count(return_category, staining_done_category) |> 
  group_by(return_category) |>
  mutate(
    `%` = round(n / sum(n), 3) * 100
  ) |>
  gt(row_group_as_column = TRUE)

	staining_done_category	n	%
Returned complete set	All returned applicators stained	77	100.0
Returned incomplete set	All returned applicators stained	6	85.7
Returned incomplete set	Partial staining	1	14.3
Did not return applicators	No applicators returned	6	100.0

Code

applicator |> 
  dplyr::count(return_category, staining_done_category) |> 
  group_by(return_category) |>
  mutate(
    `%` = round(n / sum(n), 3) * 100
  ) |>
  gt(row_group_as_column = TRUE)

	staining_done_category	n	%
Returned complete set	All returned applicators stained	77	100.0
Returned incomplete set	All returned applicators stained	6	85.7
Returned incomplete set	Partial staining	1	14.3
Did not return applicators	No applicators returned	6	100.0

Positive staining statistics

Here, we document, for the mITT participants who returned their complete set and got all of their applicators stained, the number of positively stained applicators.

Code

applicator |> 
  mutate(
    positive_staining_category = 
      ifelse(
      applicator_stain_positive == total_applicators_stained,
      "All stained applicators are positive",
      "Some stained applicators are not positive"
      )
  ) |> 
  dplyr::count(return_category, staining_done_category, positive_staining_category) |> 
  group_by(return_category, staining_done_category) |> 
  mutate(`%` = round(n / sum(n), 3) * 100) |>
  gt(row_group_as_column = TRUE) |>
  cols_label(
    positive_staining_category = "Staining result"
  )

	Staining result	n	%
Returned complete set - All returned applicators stained	All stained applicators are positive	61	79.2
Returned complete set - All returned applicators stained	Some stained applicators are not positive	16	20.8
Returned incomplete set - All returned applicators stained	All stained applicators are positive	5	83.3
Returned incomplete set - All returned applicators stained	Some stained applicators are not positive	1	16.7
Returned incomplete set - Partial staining	All stained applicators are positive	1	100.0
Did not return applicators - No applicators returned	NA	6	100.0

Code

applicator |> 
  filter(
    return_category == "Returned complete set",
    staining_done_category == "All returned applicators stained"
  ) |> 
  dplyr::count(
    return_category,
    proportion_positive_applicators, 
    str_c(100*proportion_positive_applicators, "%")
  ) |> 
  mutate(`%` = round(n / sum(n), 3) * 100) |>
  select(-proportion_positive_applicators) |> 
  group_by(return_category) |> 
  gt(row_group_as_column = TRUE) |> 
  cols_label(
    `str_c(100 * proportion_positive_applicators, "%")` = "% of positive applicators"
  )

	% of positive applicators	n	%
Returned complete set	0%	1	1.3
	57%	1	1.3
	71%	5	6.5
	86%	8	10.4
	88%	1	1.3
	100%	61	79.2

Code

applicator |> 
  filter(
    return_category == "Returned complete set",
    staining_done_category == "All returned applicators stained"
  ) |> 
  mutate(
    number_of_non_positive_applicators = 
      total_applicators_stained - applicator_stain_positive
  ) |> 
  dplyr::count(
    return_category, number_of_non_positive_applicators
  ) |> 
  mutate(`%` = round(n / sum(n), 3) * 100) |>
  group_by(return_category) |> 
  gt(row_group_as_column = TRUE) |> 
  cols_label(
    `number_of_non_positive_applicators` = "n non-positive"
  )

	n non-positive	n	%
Returned complete set	0	61	79.2
	1	9	11.7
	2	5	6.5
	3	1	1.3
	7	1	1.3

Comparison with declared number of doses taken

Code

applicator |> 
  filter(
    return_category == "Returned complete set",
    staining_done_category == "All returned applicators stained"
  ) |>
  mutate(
    applicator_stain_positive_fct = applicator_stain_positive |> as.factor() |> fct_expand("Missing") |> replace_na("Missing"),
    n_product_doses_fct = n_product_doses |> as.factor()
  ) |>
  dplyr::count(
    n_product_doses, n_product_doses_fct, 
    applicator_stain_positive, applicator_stain_positive_fct
    ) |>
  mutate(
    label = str_c(n, " (", round(100 * n / sum(n)), "%)") 
  ) |> 
  ggplot() + 
  aes(
    x = n_product_doses_fct , 
    y = applicator_stain_positive_fct, 
    fill = applicator_stain_positive >= n_product_doses,
    alpha = n
  ) +
  geom_tile() +
  geom_label(aes(label = label), size = 3) +
  scale_fill_manual(na.value = "gray70", values = c("tomato", "steelblue2")) +
  # scale_fill_gradient(low = "lightblue1", high = "lightblue4") +
  guides(fill = "none", alpha = "none") +
  xlab("Number of doses taken (self-report)") +
  ylab("Number of positive applicators") +
  labs(
    caption = "Participants who returned their complete set of applicators were included in this analysis"
  )

Interval between last dose and week 2 clinic visit in active arms

Code

tmp <- 
  mae@metadata$calc_vars_pid |>
  left_join(
    mae@colData |>
      as.data.frame() |> 
      select(pid, site, arm, randomized, mITT) |> 
      distinct(),
    by = join_by(pid)
  ) |> 
  filter(randomized, mITT) 

tmp |> 
  filter(arm != "Pl") |> 
  dplyr::count(site, arm, last_dose_at_W2) |> 
  pivot_wider(
    id_cols = c(site, last_dose_at_W2), 
    names_from = arm,
    values_from = n, values_fill = 0
  ) |> 
  group_by(site) |> 
  gt()

last_dose_at_W2	LC106-7	LC106-3	LC106-o	LC115
CAP
same day	1	0	0	0
day before	10	12	0	13
more than one day	3	2	14	1
MGH
same day	1	0	0	0
day before	5	1	0	5
more than one day	1	0	1	1

Code

tmp |> 
  filter(arm != "Pl") |>
  dplyr::count(last_dose_at_W2) |>
  gt()

last_dose_at_W2	n
same day	2
day before	46
more than one day	23

Code

rm(tmp)

Dominance of LBP strains at visits remote from dosing

To assess whether colonization typically reflected dominance, we examined the composition of samples from participants in active arms who achieved the primary outcome at visits conducted after product use (weeks 3–5 for LC106-7 and LC115; weeks 2–5 for LC106-o and LC106-3).

For each colonized sample, we calculated the total relative abundance of LBP strains based on metagenomic data. We then determined the proportion of samples in which LBP strains accounted for at least 50% of the community.

Code

col_at <- 
  mae@colData |> as_tibble() |>
  select(pid, site, randomized, arm, ITT, mITT, PP) |>
  distinct() |>
  filter(randomized, mITT, arm != "Pl") |>
  left_join(
    mae[["col_LBP_mg"]] |>
      as_tibble() |>
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      )
  ) |>
  dplyr::filter(
    .feature == "colonized_LBP_at_mg" &
      (arm %in% c("LC106-7", "LC115") & visit_code %in% c("1300", "1400", "1500") |
       arm %in% c("LC106-o", "LC106-3") & visit_code %in% c("1200", "1300", "1400", "1500")
      )
  ) |>
  filter(outcome) # only primary outcome achieved 

col_at |> 
  left_join(mae[["mg"]] |> as_tibble(), by = ".sample") |> 
  filter(!is.na(LBP)) |> # only LBP strain
  group_by(.sample) |> 
  summarise(
    prop_tot_LBP = sum(rel_ab, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  dplyr::count(prop_tot_LBP >= 0.5) |> 
  mutate(prop = 100 * round(n / sum(n), 3)) |> 
  gt() |> 
  cols_label(
    `prop_tot_LBP >= 0.5` = "LBP strains ≥ 50% total rel. ab",
    prop = "%"
  )

LBP strains ≥ 50% total rel. ab	n	%
FALSE	13	13.3
TRUE	85	86.7

LBP colonization in participants with BV post-MTZ

We assessed treatment response by Nugent score at the first visit following oral metronidazole treatment (visit 1100). Nugent scores were extracted from visits_crfs_merged and a binary variable was created to classify participants as BV positive (nugent_total_score ≥ 7) or BV negative (< 7).

We then estimated the number and proportion of participants who remained BV positive post-metronidazole, overall and by treatment arm.

Code

# nugent score for all visits 

nugent_score <- 
  mae@colData |> 
  as_tibble() |> 
  select(uid, pid, visit_code, study_day, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    metadata(mae)$visits_crfs_merged |> 
      as_tibble() |> 
      select(pid, visit_code, nugent_total_score),
    by = c("pid", "visit_code")
  ) |> 
  # some pid x visit are duplicate but there is no data for the nugent score, so we remove the duplicate visit with no data for nugent score
  group_by(pid, visit_code) |> 
  arrange(is.na(nugent_total_score)) |>  
  dplyr::slice(1) |>  
  ungroup() |>
  group_by(visit_code) |> 
  mutate(any_Nugent = any(!is.na(nugent_total_score))) |> 
  ungroup() |> 
  filter(any_Nugent) |> 
  group_by(pid) |>
  mutate(
    active_arm = arm != "Pl",
    nugent_positive = nugent_total_score >= 7
  ) |> 
  ungroup()

Code

nugent_score |> 
  ggplot()+ 
  aes(x = visit_code, y = pid, fill = nugent_total_score) +
  geom_tile() +
  scale_fill_gradient(
    name = "Nugent score",
    low = "lightblue1", high = "red3"
  ) + 
  labs(x = "Visit Code", y = "Participant ID") + 
  theme(axis.text.y = element_text(size = 4))

Code

nugent_score |>
  filter(visit_code == "1100") |> 
  dplyr::count(nugent_positive) |> 
  gt(caption = "Number of mITT participant with BV (nugent_positive == TRUE) post-MTZ (visit 1100)")

Number of mITT participant with BV (nugent_positive == TRUE) post-MTZ (visit 1100)
nugent_positive	n
FALSE	68
TRUE	18
NA	4

Code

nugent_score |>
  filter(visit_code == "1100") |> 
  dplyr::count(active_arm, nugent_positive) |> 
  gt(caption = "Number of mITT participant with BV (nugent_positive == TRUE) post-MTZ (visit 1100) by active arm")

Number of mITT participant with BV (nugent_positive == TRUE) post-MTZ (visit 1100) by active arm
active_arm	nugent_positive	n
FALSE	FALSE	11
FALSE	TRUE	6
FALSE	NA	2
TRUE	FALSE	57
TRUE	TRUE	12
TRUE	NA	2

At visit 1100, among participants with BV who were exposed to the LBP (i.e., in the active arm), 5 participants had a detectable LBP strain identified after treatment by metagenomics.

Code

# among the 12 of these people exposed to LBP

tmp <- 
  nugent_score |>
  filter(visit_code == "1100") |> 
  filter(active_arm) |>
  left_join(
    # add mg data to see how many had detectable LBP strain post-MTZ
    mae[["mg"]] |> as_tibble() |>  # add mg data
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |>
      filter(visit_code == "1200", !poor_quality_mg_data)|>
      filter(!is.na(LBP)) |> #only LBP strain
      group_by(pid) |>
      summarise(prop_tot_LBP = sum(rel_ab)) |>
      arrange(prop_tot_LBP |> desc()) |> 
      mutate(LBP_detected_post_INT = prop_tot_LBP >= 0.05),
    by = c("pid")
  )  

tmp |> 
  dplyr::count(nugent_positive, LBP_detected_post_INT) |> 
  gt(caption = "Number of mITT participant in one of the active arms with BV diagnosis post-MTZ (visit 1100) and LBP detection (>= 5% total LBP rel. ab) post-INT (visit 1200)")

Number of mITT participant in one of the active arms with BV diagnosis post-MTZ (visit 1100) and LBP detection (>= 5% total LBP rel. ab) post-INT (visit 1200)
nugent_positive	LBP_detected_post_INT	n
FALSE	FALSE	19
FALSE	TRUE	36
FALSE	NA	2
TRUE	FALSE	6
TRUE	TRUE	5
TRUE	NA	1
NA	TRUE	2

rBV by colonization status

In this section, we are investigating whether LBP colonization is protective of rBV.

We only include participants who were BV-cured post-MTZ, then count the number and proportions of participants who had rBV among those who met the primary outcome and those who did not.

Code

rBV_stats <- 
  # we first define for each mITT participant whether they were BV-cured or not post-MTZ by Nugent
  nugent_score |> 
  filter(visit_code == "1100", mITT) |> 
  select(uid, pid, visit_code, site, mITT, active_arm, arm, nugent_total_score) |> 
  mutate(BV_cure_post_MTZ = nugent_total_score < 7) |> 
  # we then specify for each of these participants whether they met the criteria for the primary endpoint of LBP colonization by week 5
  left_join(
    mae[["col_LBP_mg"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "colonized_LBP_by_mg") |> 
      mutate(LBP_colonization_by_week5 = outcome |> replace_na(FALSE)) |> 
      select(pid, LBP_colonization_by_week5),
    by = join_by(pid)
  ) |> 
  # we next check who had BV (or rBV) by week 12
  left_join(
    nugent_score |> 
      select(pid, visit_code, nugent_positive) |> 
      filter(as.numeric(visit_code) > 1100, as.numeric(visit_code) <= 2120) |> 
      group_by(pid) |> 
      summarize(
        BV_by_week12 = any(nugent_positive, na.rm = TRUE),
        BV_after_week5 = any(nugent_positive & as.numeric(visit_code) > 1500, na.rm = TRUE), # any BV after week 5
        ),
    by = join_by(pid)
  ) |> 
  mutate(
    rBV_by_week12 = BV_by_week12 & BV_cure_post_MTZ # rBV if BV by week 12 and cured post-MTZ
  )
  
  
rBV_stats |> 
  filter(active_arm) |> 
  dplyr::count(active_arm, BV_cure_post_MTZ, LBP_colonization_by_week5, BV_by_week12)

# A tibble: 10 × 5
   active_arm BV_cure_post_MTZ LBP_colonization_by_week5 BV_by_week12     n
   <lgl>      <lgl>            <lgl>                     <lgl>        <int>
 1 TRUE       FALSE            FALSE                     FALSE            1
 2 TRUE       FALSE            FALSE                     TRUE             6
 3 TRUE       FALSE            TRUE                      FALSE            2
 4 TRUE       FALSE            TRUE                      TRUE             3
 5 TRUE       TRUE             FALSE                     FALSE            5
 6 TRUE       TRUE             FALSE                     TRUE            12
 7 TRUE       TRUE             TRUE                      FALSE           27
 8 TRUE       TRUE             TRUE                      TRUE            13
 9 TRUE       NA               TRUE                      FALSE            1
10 TRUE       NA               TRUE                      TRUE             1

Code

rBV_stats |> 
  filter(active_arm, BV_cure_post_MTZ) |> 
  dplyr::count(LBP_colonization_by_week5, BV_by_week12) |> 
  group_by(LBP_colonization_by_week5) |> 
  mutate(
    N = sum(n),
    `%` = round(n / N, 3) * 100,
    CI = binom::binom.confint(n, N, method = "exact"),
    `95% CI` = str_c(round(CI$lower * 100, 1), "%–", round(CI$upper * 100, 1), "%")
  ) |> 
  select(-CI)

# A tibble: 4 × 6
# Groups:   LBP_colonization_by_week5 [2]
  LBP_colonization_by_week5 BV_by_week12     n     N   `%` `95% CI`   
  <lgl>                     <lgl>        <int> <int> <dbl> <chr>      
1 FALSE                     FALSE            5    17  29.4 10.3%–56%  
2 FALSE                     TRUE            12    17  70.6 44%–89.7%  
3 TRUE                      FALSE           27    40  67.5 50.9%–81.4%
4 TRUE                      TRUE            13    40  32.5 18.6%–49.1%

Code

rBV_stats <- 
  # we first define for each mITT participant whether they were BV-cured or not post-MTZ by Nugent
  nugent_score |> 
  select(-any_Nugent, -nugent_positive) |> 
  filter(visit_code == "1100", mITT) |> 
  mutate(BV_cure_post_MTZ = nugent_total_score < 7) |> 
  # we then specify for each of these participants whether they met the criteria for the primary endpoint of LBP colonization by week 5
  left_join(
    mae[["col_LBP_mg"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "colonized_LBP_by_mg") |> 
      mutate(LBP_colonization_by_week5 = outcome |> replace_na(FALSE)) |> 
      select(pid, LBP_colonization_by_week5),
    by = join_by(pid)
  ) |> 
  # we next check who had BV (or rBV) by week 12
  left_join(
    nugent_score |> 
      select(pid, visit_code, nugent_positive) |> 
      filter(as.numeric(visit_code) > 1100, as.numeric(visit_code) <= 2120) |> 
      group_by(pid) |> 
      summarize(
        BV_by_week12 = any(nugent_positive, na.rm = TRUE),
        BV_after_week5 = any(nugent_positive & as.numeric(visit_code) > 1500, na.rm = TRUE), # any BV after week 5
        ),
    by = join_by(pid)
  ) |> 
  mutate(
    rBV_by_week12 = BV_by_week12 & BV_cure_post_MTZ, # rBV if BV by week 12 and cured post-MTZ
  )
  
  
rBV_stats |> filter(active_arm) |> dplyr::count(active_arm, LBP_colonization_by_week5, BV_after_week5)

# A tibble: 4 × 4
  active_arm LBP_colonization_by_week5 BV_after_week5     n
  <lgl>      <lgl>                     <lgl>          <int>
1 TRUE       FALSE                     FALSE             11
2 TRUE       FALSE                     TRUE              13
3 TRUE       TRUE                      FALSE             37
4 TRUE       TRUE                      TRUE              10

Code

col_by_week5 <- 
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_mg"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "colonized_LBP_by_mg")
  ) |> 
  mutate(LBP_colonization_by_week5 = outcome |> replace_na(FALSE)) 


nugent_score |> 
  left_join(col_by_week5 |> select(pid, LBP_colonization_by_week5), 
            by = "pid") |> 
  arrange(pid, visit_code) |>
  group_by(pid, active_arm) |>
  summarise(
    LBP_colonization_by_week5 = any(LBP_colonization_by_week5, na.rm = TRUE),
    BV_recurrence = {
      cured <- cumsum(!nugent_positive & !is.na(nugent_positive)) > 0
      any(nugent_positive & cured, na.rm = TRUE)
    },
    .groups = "drop"
  ) |> 
  dplyr::count(active_arm, LBP_colonization_by_week5, BV_recurrence)

# A tibble: 7 × 4
  active_arm LBP_colonization_by_week5 BV_recurrence     n
  <lgl>      <lgl>                     <lgl>         <int>
1 FALSE      FALSE                     FALSE             9
2 FALSE      FALSE                     TRUE              9
3 FALSE      TRUE                      TRUE              1
4 TRUE       FALSE                     FALSE             8
5 TRUE       FALSE                     TRUE             16
6 TRUE       TRUE                      FALSE            32
7 TRUE       TRUE                      TRUE             15

Code

nugent_score |> 
  filter(visit_code == "1500") |> 
  group_by(active_arm) |> 
  summarise(
    n = n(),
    n_positive = sum(nugent_positive, na.rm = TRUE),
    p = n_positive / n,
    CI = binom::binom.confint(n_positive, n, method = "exact")
  ) |>
  mutate(
    pct = round(p * 100, 1),
    CI_text = str_c("(", round(CI$lower * 100, 1), "%–", round(CI$upper * 100, 1), "%)")
  ) |> 
  select(-CI, -p) |> 
  gt() |> 
  tab_header(
    title = "Proportion of Participants with Nugent Score ≥ 7 at Visit 1500 (week 5)"
  ) |> 
  cols_label(
    n = "N participants",
    n_positive = "n (%) participants with nugent score >= 7",
    pct = "Percentage",
    CI_text = "95% CI"
  )

Proportion of Participants with Nugent Score ≥ 7 at Visit 1500 (week 5)
active_arm	N participants	n (%) participants with nugent score >= 7	Percentage	95% CI
FALSE	17	13	76.5	(50.1%–93.2%)
TRUE	71	25	35.2	(24.2%–47.5%)

Code

  nugent_score |> 
  filter(visit_code == "2120") |> 
  group_by(active_arm) |> 
  summarise(
    n = n(),
    n_positive = sum(nugent_positive, na.rm = TRUE),
    p = n_positive / n,
    CI = binom::binom.confint(n_positive, n, method = "exact")
  ) |>
  mutate(
    pct = round(p * 100, 1),
    CI_text = str_c("(", round(CI$lower * 100, 1), "%–", round(CI$upper * 100, 1), "%)")
  ) |> 
  select(-CI, -p) |> 
  gt() |> 
  tab_header(
    title = "Proportion of Participants with Nugent Score ≥ 7 at Visit 2120 (week12)"
  ) |> 
  cols_label(
    n = "N participants",
    n_positive = "n (%) participants with nugent score >= 7",
    pct = "Percentage",
    CI_text = "95% CI"
  )

Proportion of Participants with Nugent Score ≥ 7 at Visit 2120 (week12)
active_arm	N participants	n (%) participants with nugent score >= 7	Percentage	95% CI
FALSE	17	12	70.6	(44%–89.7%)
TRUE	69	23	33.3	(22.4%–45.7%)

Code

nugent_score |> 
  filter(visit_code == "2120") |> 
  left_join(col_by_week5 |> select(pid, LBP_colonization_by_week5), by="pid") |> 
  group_by(LBP_colonization_by_week5) |> 
  summarise(
    n = n(),
    n_positive = sum(nugent_positive, na.rm = TRUE),
    p = n_positive / n,
    CI = binom::binom.confint(n_positive, n, method = "exact")
  ) |>
  mutate(
    pct = round(p * 100, 1),
    CI_text = str_c("(", round(CI$lower * 100, 1), "%–", round(CI$upper * 100, 1), "%)")
  ) |> 
  select(-CI, -p) |> 
  gt() |> 
  tab_header(
    title = "Proportion of Participants with Nugent Score ≥ 7 at Visit 2120 (week12)"
  ) |> 
  cols_label(
    n = "N participants",
    n_positive = "n (%) participants with nugent score >= 7",
    pct = "Percentage",
    CI_text = "95% CI"
  )

Proportion of Participants with Nugent Score ≥ 7 at Visit 2120 (week12)
LBP_colonization_by_week5	N participants	n (%) participants with nugent score >= 7	Percentage	95% CI
FALSE	39	25	64.1	(47.2%–78.8%)
TRUE	47	10	21.3	(10.7%–35.7%)

LBP Colonization dominance

We first assessed whether LBP strains, when detected by metagenomic sequencing, typically established dominance in the vaginal microbiota. For each colonized sample, L. crispatus relative abundance was calculated, and dominance was defined as L. crispatus representing more than 50% of the total community.

We then investigated whether dominance by LBP strains was associated with a reduced risk of recurrent BV. Colonization outcomes were linked to Nugent scores collected at follow-up visits, and we compared the prevalence of BV (Nugent ≥ 7) among participants with or without LBP dominance at week 5. Results suggest that participants who achieved LBP strain dominance were less likely to experience recurrent BV.

Code

rel_abs <- 
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(!poor_quality_mg_data) |> 
  left_join(
    mae |> colData() |> as_tibble(), 
    by = join_by(uid)
  ) |>
  filter(randomized, mITT)

rel_abs |> 
  group_by(.sample) |> 
  summarise(
    detected = any(!is.na(LBP) & rel_ab > 0, na.rm = TRUE),
    crispatus_abundance = sum(rel_ab[species == "crispatus"], na.rm = TRUE), 
    crispatus_dominance = ifelse(crispatus_abundance > 0.5, TRUE, FALSE), 
    .groups = "drop"
  ) |> 
  filter(detected) |> 
  dplyr::count(crispatus_dominance, name = "n") |> 
  mutate(
    crispatus_dominance = if_else(crispatus_dominance, "Yes", "No"),
    total = sum(n),
    pct = round(100 * n / total, 1),
    CI = binom::binom.confint(n, total, method = "exact"),
    CI_text = str_c("(", round(CI$lower * 100, 1), "%–", round(CI$upper * 100, 1), "%)")
  ) |> 
  select(crispatus_dominance, n, pct, CI_text) |> 
  gt() |> 
  tab_header(
    title = md("Dominance of *L. crispatus* when LBP strains are detected")
  ) |> 
  cols_label(
    crispatus_dominance = "L. crispatus dominance",
    n = "N samples with LBP detected",
    pct = "Percentage",
    CI_text = "95% CI"
  )

Dominance of L. crispatus when LBP strains are detected
L. crispatus dominance	N samples with LBP detected	Percentage	95% CI
No	39	17.2	(12.5%–22.7%)
Yes	188	82.8	(77.3%–87.5%)

Code

tmp <- 
  rel_abs |> 
  group_by(.sample) |> 
  summarise(
    detected = any(!is.na(LBP) & rel_ab > 0, na.rm = TRUE),
    crispatus_abundance = sum(rel_ab[species == "crispatus"], na.rm = TRUE), 
    LBP_abundance = sum(rel_ab[!is.na(LBP)], na.rm = TRUE),
    LBP_dominance = ifelse(LBP_abundance>0.5, TRUE, FALSE), 
    crispatus_dominance = ifelse(crispatus_abundance>0.5, TRUE, FALSE), 
    .groups = "drop"
  ) |> 
  left_join(nugent_score |> 
              select(uid, pid, visit_code, nugent_total_score, nugent_positive), 
            by = c(".sample"="uid")) 

tmp |>  
  filter(!is.na(visit_code)) |> 
  ggplot() + 
  aes(x=visit_code, y = pid, 
      shape = nugent_positive |> as.character(), 
      col = LBP_dominance |> as.character()) + 
  geom_point() + 
  scale_shape_manual(
    name = "Nugent Postive", 
    values = c("TRUE" = 16, "FALSE" = 1)
  ) + 
  scale_color_manual(
    name = "LPB Dominance", 
    values = c("TRUE" = "green3", "FALSE" = "orchid2" )
  ) + 
  labs(x = "Visit code", y = "Pid") + 
  theme(axis.text.y = element_text(size = 4))

Code

tmp |> 
  filter(!is.na(LBP_dominance), !is.na(nugent_positive)) |> 
  filter(visit_code == "1500") |> 
  dplyr::count(LBP_dominance, nugent_positive) |> 
  group_by(LBP_dominance) |> 
  mutate(
    pct = round(100 * n / sum(n), 1)
  ) |> 
  ungroup() |> 
  gt()

LBP_dominance	nugent_positive	n	pct
FALSE	FALSE	25	40.3
FALSE	TRUE	37	59.7
TRUE	FALSE	23	95.8
TRUE	TRUE	1	4.2

LBP detectable at week 12

To evaluate the durability of colonization, we assessed the presence of LBP strains at the week 12 visit (visit 2120) among participants in the active arms. Detection was defined by metagenomic sequencing as the presence of any LBP strain with non-zero relative abundance.

Code

rel_abs |> 
  filter(visit_code == "2120") |> 
  group_by(.sample) |> 
  summarise(detected = any(!is.na(LBP) & rel_ab > 0, na.rm = TRUE)) |> 
  ungroup() |> 
  dplyr::count(detected) |>
  mutate(
    pct = round(100 * n / sum(n), 1)
  ) |> 
  gt()

detected	n	pct
FALSE	61	71.8
TRUE	24	28.2

LBP detection at week 3 and 12 by qPCR

As a complementary and more sensitive assay, we assessed LBP strain detection by qPCR. Detection was defined as the concurrent observation of Cq values for at least two LBP strains at the same visit.

Code

detected_at_week2_qPCR <-
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_qPCR"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1200", .feature == "LBP_detected_at_qpcr")
  ) |> 
  mutate(LBP_detected_at_week2 = outcome |> replace_na(FALSE))

Code

detected_at_week2_qPCR |> 
  filter(arm != "Pl") |> 
  dplyr::count(LBP_detected_at_week2) |> 
  gt()

LBP_detected_at_week2	n
FALSE	25
TRUE	46

Code

detected_at_week12_qPCR <-
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_qPCR"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "2120", .feature == "LBP_detected_at_qpcr")
  ) |> 
  mutate(LBP_detected_at_week12 = outcome |> replace_na(FALSE))

Code

detected_at_week12_qPCR |> 
  filter(arm != "Pl") |> 
  dplyr::count(LBP_detected_at_week12) |> 
  gt()

LBP_detected_at_week12	n
FALSE	51
TRUE	20

LBP detection week 3 (1300) and week 4_12 - qPCR

LBP detection was assessed by qPCR, defined as the presence of Cq values for at least two LBP strains at the same visit. We computed the proportion of participants with detectable LBP strains at week 3 (visit 1300, one week after the end of dosing) and at week 9 (visit 1900, after 7 weeks off product), both overall and excluding placebo participants.

Code

# one week after LBP = 1300

detected_at_week3_qPCR <-
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_qPCR"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1300", .feature == "LBP_detected_at_qpcr")
  ) |> 
  mutate(LBP_detected_at_week3 = outcome |> replace_na(FALSE)) 

detected_at_week3_qPCR |> 
  filter(arm != "Pl") |> 
  dplyr::count(LBP_detected_at_week3) |> 
  mutate(p = n/sum(n) * 100) |> 
  gt(caption = "Without placebo")

Without placebo
LBP_detected_at_week3	n	p
FALSE	36	50.70423
TRUE	35	49.29577

Code

detected_at_week3_qPCR |> 
  dplyr::count(LBP_detected_at_week3) |>
  mutate(p = n/sum(n) * 100) |> 
  gt(caption = "With placebo")

With placebo
LBP_detected_at_week3	n	p
FALSE	53	58.88889
TRUE	37	41.11111

Code

detected_at_week9_qPCR <-
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_qPCR"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1900", .feature == "LBP_detected_at_qpcr")
  ) |> 
  mutate(LBP_detected_at_week9 = outcome |> replace_na(FALSE)) 

detected_at_week9_qPCR |> 
  filter(arm != "Pl") |> 
  dplyr::count(LBP_detected_at_week9) |> 
  mutate(p = n/sum(n) * 100) |> 
  gt(caption = "Without placebo")

Without placebo
LBP_detected_at_week9	n	p
FALSE	55	77.46479
TRUE	16	22.53521

Code

detected_at_week9_qPCR |> 
  dplyr::count(LBP_detected_at_week9) |> 
  mutate(p = n/sum(n) * 100) |> 
  gt(caption = "With placebo")

With placebo
LBP_detected_at_week9	n	p
FALSE	74	82.22222
TRUE	16	17.77778

Effective LBP exposure from daily qPCR data during dosing period

Here we look at the proportion of days exposed by qPCR (daily home swabs), among expected dosing days based on participants’ arm.

Code

eff_exp_qPCR <- 
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |>
  left_join(
    mae@metadata$calc_vars_pid |> 
      select(pid, pid_exposed_by_qPCR, n_days_exposed_by_qPCR, n_days_exp_dosed),
    by = join_by(pid)
  ) |>
  mutate(
    prop_days_exposed = 
      case_when(
        n_days_exp_dosed > 0 ~ n_days_exposed_by_qPCR / n_days_exp_dosed,
        TRUE ~ 0
      )
  )

Code

eff_exp_qPCR |> 
  filter(mITT) |> 
  group_by(arm, site) |> 
  summarize(
    N = n(),
    n = sum(pid_exposed_by_qPCR, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  mutate(
    value = str_c(n, " / ", N, " (", round(100*n/N, 1), "%)")
  ) |>  
  select(-n, -N) |>
  pivot_wider(names_from = arm, values_from = value) |> 
  gt(caption = "Participants exposed by qPCR (daily home swabs): n/N (%)")

Participants exposed by qPCR (daily home swabs): n/N (%)
site	Pl	LC106-7	LC106-3	LC106-o	LC115
CAP	1 / 14 (7.1%)	13 / 14 (92.9%)	13 / 14 (92.9%)	13 / 14 (92.9%)	11 / 14 (78.6%)
MGH	0 / 5 (0%)	6 / 7 (85.7%)	1 / 1 (100%)	1 / 1 (100%)	6 / 6 (100%)

Code

bind_rows(
  eff_exp_qPCR |> filter(arm != "Pl"),
  eff_exp_qPCR |> filter(arm != "Pl") |> mutate(site = "both"),
  eff_exp_qPCR |> filter(arm != "Pl", arm != "LC106-3") |> mutate(arm = "all active arms excl. LC106-3"),
  eff_exp_qPCR |> filter(arm != "Pl", arm != "LC106-3") |> mutate(site = "both", arm = "all active arms excl. LC106-3")
) |> 
  filter(mITT) |> 
  group_by(arm, site) |> 
  summarize(
    n = sum(!is.na(prop_days_exposed)),
    mean_prop_days_exposed = mean(prop_days_exposed, na.rm = TRUE),
    sd_prop = sd(prop_days_exposed, na.rm = TRUE),
    se = sd_prop / sqrt(n), 
    ci_lower = pmax(0, mean_prop_days_exposed - qt(0.975, df = pmax(n - 1, 1)) * se),
    ci_upper = pmin(1, mean_prop_days_exposed + qt(0.975, df = pmax(n - 1, 1)) * se), 
    mean_with_ci = str_c(percent(mean_prop_days_exposed, accuracy = 1), " (", percent(ci_lower, accuracy = 1), "–", percent(ci_upper, accuracy = 1), ")"),
    .groups = "drop"
  ) |>  
  select(arm, site, mean_with_ci) |>
  pivot_wider(names_from = arm, values_from = mean_with_ci) |> 
  gt(caption = "Proportion of days exposed by qPCR (daily home swabs), among dosing days: mean proportion (in %) and associated 95% CI.")

Proportion of days exposed by qPCR (daily home swabs), among dosing days: mean proportion (in %) and associated 95% CI.
site	LC106-3	LC106-7	LC106-o	LC115	all active arms excl. LC106-3
CAP	71% (53%–90%)	65% (49%–82%)	69% (53%–85%)	63% (40%–86%)	66% (56%–76%)
MGH	NA	64% (27%–100%)	NA	72% (42%–100%)	70% (50%–89%)
both	71% (54%–88%)	65% (50%–79%)	71% (56%–86%)	66% (49%–83%)	67% (58%–75%)

We group participants by the proportion of days exposed by qPCR (daily home swabs), among dosing days: >50%, >0% to ≤50%, and 0%. We then summarize the number and percentage of participants in each group, along with 95% confidence intervals, stratified by study arm.

Code

summary_exposed_qpcr <-
  eff_exp_qPCR |>
  filter(arm != "Pl") |>
  filter(mITT) |> 
  mutate(
    expo_grp = case_when(
      prop_days_exposed > 0.5                     ~ ">50%",
      prop_days_exposed > 0 & prop_days_exposed <= 0.5 ~ ">0%, ≤50%",
      prop_days_exposed == 0                      ~ "0%",
      .default = NA_character_
    ) |> factor(levels = c(">50%", ">0%, ≤50%", "0%"))
  ) |>
  dplyr::count(arm, expo_grp, name = "n") |>
  group_by(arm) |>
  mutate(tot_participant = sum(n), p = n / tot_participant |> round(2)) |>
  rowwise() |>
  mutate(
    CI = list(binom::binom.confint(n, tot_participant, method = "exact")[, c("lower","upper")])
  ) |>
  ungroup() |>
  unnest_wider(CI, names_sep = "_")

summary_exposed_qpcr |> 
  group_by(arm) |> 
  gt(row_group_as_column = TRUE)|>
  fmt_number(
    columns = where(is.numeric),
    decimals = 2
  )

	expo_grp	n	tot_participant	p	CI_lower	CI_upper
LC106-7	>50%	14.00	21.00	0.67	0.43	0.85
	>0%, ≤50%	5.00	21.00	0.24	0.08	0.47
	0%	2.00	21.00	0.10	0.01	0.30
LC106-3	>50%	12.00	15.00	0.80	0.52	0.96
	>0%, ≤50%	2.00	15.00	0.13	0.02	0.40
	0%	1.00	15.00	0.07	0.00	0.32
LC106-o	>50%	13.00	15.00	0.87	0.60	0.98
	>0%, ≤50%	1.00	15.00	0.07	0.00	0.32
	0%	1.00	15.00	0.07	0.00	0.32
LC115	>50%	14.00	20.00	0.70	0.46	0.88
	>0%, ≤50%	3.00	20.00	0.15	0.03	0.38
	0%	3.00	20.00	0.15	0.03	0.38

Code

# supplementary figure 3

exposed_qpcr <- 
  summary_exposed_qpcr |> 
  arrange(arm) |> 
  mutate(arm = arm |> str_replace_all(" ", "\n") |> fct_inorder()) |> 
  ggplot() +
  aes(x = arm, y = p, col = expo_grp) +
  geom_linerange(
    aes(ymin = CI_lower, ymax = CI_upper), 
    linewidth = 1.5, alpha = 0.5, lineend = "round",
    position = position_dodge(width = 0.5)
    ) +
  geom_point(position = position_dodge(width = 0.5), size = 3) +
  scale_color_manual("% of dosing days\nwith evidence of\nLBP exposure by qPCR", values = c(
    ">50%" = "grey50",
    ">0%, ≤50%" = "steelblue", 
    "0%" = "gold1"
  )) +
  xlab("") + 
  scale_y_continuous(
    name = "% of participants",
    labels = scales::percent_format(accuracy = 1), 
    breaks = seq(0, 1, by = 0.2)
  ) +
  theme(
    panel.grid.major.x = element_blank()
    # axis.text.x = element_text(angle = 45, hjust = 1)
  )

exposed_qpcr

Code

supplementary_figure3 <- exposed_qpcr

Code

# alternative supplementary figure 3

exposed_qpcr <- 
  eff_exp_qPCR |> 
  filter(arm != "Pl") |> 
  arrange(arm) |> 
  mutate(arm = arm |> str_replace_all(" ", "\n") |> fct_inorder()) |> 
  ggplot() +
  aes(x = arm, y = prop_days_exposed) +
  geom_violin(col = NA, fill = "steelblue1", alpha = 0.5) +
  ggbeeswarm::geom_quasirandom(alpha = 0.7) +
  scale_y_continuous(
    name = "% of dosing days with\nevidence of LBP exposure by qPCR",
    labels = scales::percent_format(accuracy = 1), 
    breaks = seq(0, 1, by = 0.2)
  ) 

exposed_qpcr

Microbiota trajectories highlighting people exposed by qPCR

Longitudinal stack relative abundances plot:

Code

rel_abs <- 
  mae[["mg"]] |> 
  as_tibble() |> 
  dplyr::select(
    .feature, .sample, uid, rel_ab, poor_quality_mg_data,
    taxon_label, genus, LBP, strain_origin
    )

Code

rel_abs <- 
  rel_abs |> 
  group_by(.feature) |> 
  mutate(max_rel_ab = max(rel_ab, na.rm = TRUE), mean_rel_ab = mean(rel_ab, na.rm = TRUE)) |> 
  ungroup() |> 
  arrange(is.na(LBP), -mean_rel_ab) |> 
  mutate(.feature = .feature |> fct_inorder()) |> 
  mutate(
    taxon = 
      case_when(
        as.numeric(.feature) <= 29 ~ taxon_label,
        (genus == "Lactobacillus") ~ "Other Lactobacillus",
        TRUE ~ "Other non-Lacto."
      )
  ) |> 
  arrange(is.na(LBP), LBP,genus != "Lactobacillus", genus, -mean_rel_ab) |> 
  mutate(taxon = taxon |> fct_inorder() |> fct_relevel("Other Lactobacillus", "Other non-Lacto.", "Other", after = Inf))

Code

rel_abs_agg <- 
  rel_abs |> 
  group_by(uid, taxon, LBP, strain_origin, poor_quality_mg_data) |> 
  summarize(rel_ab = sum(rel_ab), .groups = "drop") |> 
  dplyr::left_join(mae |> colData() |> as_tibble(), by = join_by(uid))

Code

tmp <- 
  eff_exp_qPCR |> 
  right_join(rel_abs_agg |> select(pid) |> distinct()) |> 
  mutate(
    pid_label = 
      case_when(
        prop_days_exposed > 0.5 ~ "E",
        prop_days_exposed > 0 ~ "e",
        is.na(prop_days_exposed) ~ "?",
        TRUE ~ "."
      ) 
    ) |> 
  dplyr::select(pid, pid_label) |> 
  distinct() 
pid_label <- setNames(tmp$pid_label, tmp$pid)

 
g_trajectories_qpcr_exp <-
  rel_abs_agg |>
  filter(mITT, randomized != 0, PIPV, !poor_quality_mg_data) |>
  group_by(pid) |> 
  mutate(
    tot_LC115 = sum(rel_ab[LBP == "LC115"]),
    tot_LBP = sum(rel_ab[!is.na(LBP)])
    ) |> 
  ungroup() |> 
  arrange(desc(PP), -tot_LC115, -tot_LBP) |> 
  mutate(
    pid = pid |> fct_inorder(),
    visit_label = visit |> as.character() |> fct_reorder(visit |> as.numeric())
    ) |> 
  ggplot(aes(y = rel_ab, x = pid, fill = taxon)) + #, alpha = ifelse(PP, "PP", "mITT")
  geom_col() +
  ggh4x::facet_nested(
    rows = vars(visit_label),
    cols = vars(arm, site),
    scales = "free_x", space = "free_x", 
    strip = 
      ggh4x::strip_nested(
        background_x = 
          ggh4x::elem_list_rect(
            fill = c(rep("gray90",5), 
                     rep(c("#FF0000", "#00868B"), 5))
          ), 
        text_x = 
          ggh4x::elem_list_text(
            face = rep("bold", 15),
            colour = c(rep("black", 5), rep("white", 5), "transparent", "white", "transparent", rep("white", 3))
            )              
      )
  ) +
  scale_fill_manual(
    "LBP strain or taxon",
    values = get_taxa_colors(rel_abs_agg$taxon |> levels()),
    labels = get_taxa_labels(rel_abs_agg$taxon |> levels()),
    guide = guide_legend(ncol = 1)
    ) +
  #scale_alpha_discrete("Population", range = c(0.5, 1)) +
  scale_y_continuous(
    "Relative abundance", labels = scales::percent,
    breaks = seq(0, 0.5, by = 0.5), minor_breaks = seq(0, 1, by = 0.25)
  ) +
  scale_x_discrete("Participants", labels = pid_label) +
  theme(
    #strip.background.y = element_rect(fill = "gray90"),
    #strip.text.y = element_text(color = "black", angle = 0, hjust = 0), # face = "bold", 
    strip.text.x = element_text(hjust = 0.5),
    axis.text.x = element_text(hjust = 0.5, vjust = 1, size = 10, margin = margin(t = 0)), 
    axis.ticks.x = element_blank(),
    legend.key.height = unit(15, "pt"),
    legend.text = element_markdown()
  )  


# theme_bw() + theme(axis.ticks.x = element_blank(), strip.text.y = element_text(angle = 0))

Code

g_trajectories_qpcr_exp

Code

tmp <- 
  applicator |> 
  select(pid, number_of_returned_applicators, p_pos_applicator = proportion_positive_applicators) |>
  distinct() |> 
  right_join(rel_abs_agg |> select(pid) |> distinct()) |> 
  mutate(
    pid_label = 
      case_when(
        is.na(number_of_returned_applicators) ~ "?",
        number_of_returned_applicators == 0 ~ "?",
        p_pos_applicator > 0.75 ~ "A",
        p_pos_applicator > 0 ~ "a",
        TRUE ~ "."
      ) 
    ) |> 
  dplyr::select(pid, pid_label) |> 
  distinct() 
pid_label <- setNames(tmp$pid_label, tmp$pid)

 
g_trajectories_appl_exp <-
  rel_abs_agg |>
  filter(mITT, randomized != 0, PIPV, !poor_quality_mg_data) |>
  group_by(pid) |> 
  mutate(
    tot_LC115 = sum(rel_ab[LBP == "LC115"]),
    tot_LBP = sum(rel_ab[!is.na(LBP)])
    ) |> 
  ungroup() |> 
  arrange(desc(PP), -tot_LC115, -tot_LBP) |> 
  mutate(
    pid = pid |> fct_inorder(),
    visit_label = visit |> as.character() |> fct_reorder(visit |> as.numeric())
    ) |> 
  ggplot(aes(y = rel_ab, x = pid, fill = taxon)) + #, alpha = ifelse(PP, "PP", "mITT")
  geom_col() +
  ggh4x::facet_nested(
    rows = vars(visit_label),
    cols = vars(arm, site),
    scales = "free_x", space = "free_x", 
    strip = 
      ggh4x::strip_nested(
        background_x = 
          ggh4x::elem_list_rect(
            fill = c(rep("gray90",5), 
                     rep(c("#FF0000", "#00868B"), 5))
          ), 
        text_x = 
          ggh4x::elem_list_text(
            face = rep("bold", 15),
            colour = c(rep("black", 5), rep("white", 5), "transparent", "white", "transparent", rep("white", 3))
            )              
      )
  ) +
  scale_fill_manual(
    "LBP strain or taxon",
    values = get_taxa_colors(rel_abs_agg$taxon |> levels()),
    labels = get_taxa_labels(rel_abs_agg$taxon |> levels()),
    guide = guide_legend(ncol = 1)
    ) +
  #scale_alpha_discrete("Population", range = c(0.5, 1)) +
  scale_y_continuous(
    "Relative abundance", labels = scales::percent,
    breaks = seq(0, 0.5, by = 0.5), minor_breaks = seq(0, 1, by = 0.25)
  ) +
  scale_x_discrete("Participants", labels = pid_label) +
  theme(
    #strip.background.y = element_rect(fill = "gray90"),
    #strip.text.y = element_text(color = "black", angle = 0, hjust = 0), # face = "bold", 
    strip.text.x = element_text(hjust = 0.5),
    axis.text.x = element_text(hjust = 0.5, vjust = 1, size = 10, margin = margin(t = 0)), 
    axis.ticks.x = element_blank(),
    legend.key.height = unit(15, "pt"), 
    legend.text = element_markdown()
  )  


# theme_bw() + theme(axis.ticks.x = element_blank(), strip.text.y = element_text(angle = 0))

Code

g_trajectories_appl_exp

Figures (export)

Code

ggsave(
  g_trajectories, filename = "figures/Figure2_trajectories_plot.pdf",
  width = 12, height = 7.5, dpi = 300, units = "in"
)

Code

# figure 3

figure3 <- 
  g_prop_colonized_by_final + theme(legend.position = "bottom", legend.direction = "horizontal") +
  g_long_terme_col + theme(legend.position = "bottom", legend.direction = "horizontal") +
  plot_layout(ncol = 2, widths = c(2, 1.4)) +
  plot_annotation(tag_level = "A") + 
  theme(
    plot.tag = element_text(
      face = "bold",
      hjust = -0.5,  
      vjust = 2     
    ))

figure3

ggsave(
  figure3, filename = "figures/Figure3_colonization.pdf",
  width = 12, height = 5, dpi = 300, units = "in"
)

Code

# figure 4 
ggsave(
  g_relative_abundance, filename = "figures/Figure4_relative_abundance.pdf",
  width = 9, height = 6, dpi = 300, units = "in"
)

Code

# figure 5 
ggsave(
  g_colonized_among_exposed, filename = "figures/Figure5_colonized_among_exposed.pdf",
  width = 9, height = 6, dpi = 300, units = "in"
)

Code

# Supp figure 4 
ggsave(
  g_colonized_among_exposed_supp, filename = "figures/Suppl_Figure4_colonized_among_exposed.pdf",
  width = 9, height = 6, dpi = 300, units = "in"
)

Code

# supplementary figure 3 
#| fig-height: 4
#| fig-width: 12
#| eval: false

ggsave(
  supplementary_figure3, filename = "figures/Suppl_Figure3_colonized_among_exposed.pdf",
  width = 9, height = 6, dpi = 300, units = "in"
)