This document details the analyses and provides results for the primary analyses of the VIBRANT trial.
Trial data
The data used for these analyses are stored in an R Bioconductor MultiAssayExperiment object which contains the QCed and pre-processed data. The code and description of how raw data have been processed and QCed is available on the VIBRANT-0-QC-and-preprocessing GitHub repository.
Loading the RDS file containing the unblinded VIBRANT MAE: /06 subsetted MAEs/mae_1_20260109.rds
Code
mae <-readRDS(mae_file)rm(mae_file)mae
A MultiAssayExperiment object of 4 listed
experiments with user-defined names and respective classes.
Containing an ExperimentList class object of length 4:
[1] col_LBP_mg: SummarizedExperiment with 2 rows and 890 columns
[2] col_LBP_qPCR: SummarizedExperiment with 4 rows and 895 columns
[3] col_crispatus_mg: SummarizedExperiment with 2 rows and 890 columns
[4] mg: SummarizedExperiment with 779 rows and 893 columns
Functionality:
experiments() - obtain the ExperimentList instance
colData() - the primary/phenotype DataFrame
sampleMap() - the sample coordination DataFrame
`$`, `[`, `[[` - extract colData columns, subset, or experiment
*Format() - convert into a long or wide DataFrame
assays() - convert ExperimentList to a SimpleList of matrices
exportClass() - save data to flat files
Demographics (Table 1 & suppl. tables by sites)
Summary table
The demographics table is constructed as follows.
We first create a table1_datadata.frame from the participant_crfs_merged table stored in the @metadata slot of the mae object.
This table contains most of the variables required for the demographics summary, including: site, age, race, ethn, cut_size_meals, eat_less, hungry_did_not_eat, sexual_partners_lifetime, sexual_partners_past_month and contraception_method.
Additional variables such as arm and study population flags are retrieved from mae@colData. For this analysis, we restrict the dataset to participants in the mITT population.
From the food-related variables (cut_size_meals, eat_less, and hungry_did_not_eat), we derive a new variable food_insecurity. A participant is considered food insecure if she answered “Yes” to any of the three questions.
table1_data |>ggplot(aes(x = food_insecurity)) +geom_bar() +labs(x="Food insecurity", y ="Count")+theme_classic()
In addition to these variables, we also include information about about the gender of participants’ sexual partners.
Participants were asked about their partner’s gender at each weekly visit (weeks 0 through 12: visits 1000, 1100, 1200, 1300, 1400, 1500, 1700, 1900, and 2120) using Case Report Form (CRF) 19. Responses are recorded in the past_week_sex_partner column of the mae@metadata$visits_crfs_merged table.
Based on participants’ answers across all weekly visits, we categorized their reported partner gender into one of four groups:
No sex
Only male
Only female
Both
This derived variable is included in the table1_data table.
Code
gender_sexual_partner <-metadata(mae)$visits_crfs_merged |>select(pid, past_week_sex_partner) |>filter(!is.na(past_week_sex_partner)) |>group_by(pid) |>summarise(gender_sexual_partner =case_when(all(past_week_sex_partner =="No sex in the past week") ~"No sex",any(past_week_sex_partner %in%c("A single male partner", "Multiple male partners")) &!any(past_week_sex_partner %in%c("A single female partner", "Multiple female partners")) ~"Only man",any(past_week_sex_partner %in%c("A single female partner", "Multiple female partners")) &!any(past_week_sex_partner %in%c("A single male partner", "Multiple male partners")) ~"Only female",any(past_week_sex_partner %in%c("A single male partner", "Multiple male partners")) &any(past_week_sex_partner %in%c("A single female partner", "Multiple female partners")) ~"Both",TRUE~NA_character_ ),.groups ="drop" )table1_data <- table1_data |>left_join(gender_sexual_partner, by ="pid")
We also recoded the race variable to simplify and harmonize categories:
“Asian or South Asian” is re-coded to “Asian” (for brevity)
“Black, African American, or African” and “African” are re-coded to “Black/African” (for brevity)
“Coloured”, “White”, “Other”, and “Prefered not to answer” are left as is.
Code
table1_data <- table1_data |>mutate(across(where(is.character), as.factor),race =fct_recode( race,"Asian"="Asian or South Asian","Black/African"="Black, African American, or African","Black/African"="African","Coloured"="Coloured","White"="White","Other"="Other","Prefer not to answer"="Prefer not to answer" ),race =fct_relevel(race, "Asian", "Black/African", "Coloured", "White", "Other", "Prefer not to answer"),race = race |>fct_drop(),arm = arm |>fct_drop() ) |>set_variable_labels(food_insecurity ="Report food insecurity in past 12 months ",sexual_partners_past_month ="Number of partners in past month",sexual_partners_lifetime ="Number of partners in lifetime",ethn ="Ethnicity", contraception_method ="Contraception", gender_sexual_partner ="Gender of sexual partners" )
Table 1 (Population characteristics by arm)
Table 1 summarizes baseline demographic and behavioral characteristics of the participants described above, stratified by study arm and restricted to the mITT population.. Continuous variables are presented as median and range (minimum–maximum), while categorical variables are shown as counts and percentages.
Comparisons across study arms are performed using the Kruskal-Wallis rank sum test for continuous variables. For categorical variables, either Fisher’s exact test or Pearson’s Chi-squared test is used, depending on cell counts and expected frequencies.
2 Kruskal-Wallis rank sum test; Fisher’s exact test
Study population characteristics appear balanced across arms at the CAPRISA site. Given the low number of participants at the MGH site in two of the five arms, we create an additional table excluding these two arms.
2 Kruskal-Wallis rank sum test; Fisher’s exact test
Besides the number of lifetime sexual partners, arms appear balanced across arm at MGH.
Primary outcomes (Table 2) & sensitivity analyses
LBP colonization by week 5
The primary outcome is defined as “Colonization by any of the LBP strains after product administration by week 5”, and colonization by any of the LBP strain is defined as a relative abundance of 5% by any single strain or 10% total relative abundance of the LBP strains. The relative abundance of LBP strain is estimated using metagenomics (taxonomic composition estimated using VIRGO2 and strain proportion of total L. crispatus is estimated using kSanity), and includes samples from weekly visits 1200 to 1500 (week 2 (post-INT) to week 5 (3 weeks post-INT)).
table_2_with_model_results <-bind_rows( table_2, test_results ) |>mutate(site = site |>str_replace_na(""),Pl =case_when( name =="Adj. p-value"~"Ref.", name =="Benefit ratio"~"",str_detect(name, "CI") ~"",TRUE~ Pl ) )
Code
table_2_with_model_results |>group_by(site) |>gt(caption ="Table 2: Colonization by week 5 by arm and site", row_group_as_column =TRUE) |>cols_width("name"~px(150),everything() ~px(120) ) |>cols_label(name ="",Pl ="Placebo",`LC106-7`="LC106<br>7 days",`LC106-3`="LC106<br>3 days",`LC106-o`="LC106<br>overlap",`LC115`="LC115<br>7 days",.fn = md )
Table 2: Colonization by week 5 by arm and site
Placebo
LC106 7 days
LC106 3 days
LC106 overlap
LC115 7 days
CAP
N participants
N = 14
N = 14
N = 14
N = 14
N = 14
n (%) participants with LBP strain detected
0 (0 %)
9 (64 %)
7 (50 %)
7 (50 %)
10 (71 %)
95% CI
39% - 84%
27% - 73%
27% - 73%
45% - 88%
MGH
N participants
N = 5
N = 7
N = 1
N = 1
N = 6
n (%) participants with LBP strain detected
1 (20 %)
6 (86 %)
1 (100 %)
1 (100 %)
6 (100 %)
95% CI
49% - 97%
21% - 100%
21% - 100%
61% - 100%
Adj. p-value
Ref.
<0.001
<0.001
<0.001
<0.001
Benefit ratio
13.57
10.13
10.13
15.2
95% CI
1.72 - Inf
1.37 - Inf
1.37 - Inf
2.1 - Inf
Sensitivity analyses (Supplementary tables and figures)
LBP colonization by week 5 excluding immediately post-product visits
Defining the alternative primary outcome
We defined an alternative primary outcome that excluded visits conducted immediately after the last product administration. Specifically, for participants in the 7-dose arms (LC107-7 and LC115) and placebo, we eclude visit 1200, while for participants in the 3-dose arm (LC106-3) and overlap arm (LC106-o) only visits prior to 1200 (1000 and 1100) were excluded.
Code
new_outcome_def <-# we start from the "colonized at" mae[["col_LBP_mg"]] |>as_tibble() |> dplyr::filter(.feature =="colonized_LBP_at_mg") |> dplyr::left_join( mae@colData |>as_tibble() |>select(uid, pid, visit_code, arm),by =join_by(.sample == uid) ) |>filter(!is.na(arm))# we fill the missing visitsnew_outcome_def <- new_outcome_def |>select(-c(arm)) |> dplyr::full_join( new_outcome_def |>select(pid, arm) |>distinct() |>expand_grid(visit_code =unique(new_outcome_def$visit_code) |>sort() |>setdiff(c("0001","0010")) ),by =join_by(pid, visit_code) ) |>arrange(pid, visit_code) |>mutate(outcome = outcome |>replace_na(FALSE)) |> dplyr::filter(!is.na(arm))# we compute the new outcomenew_outcome_def <- new_outcome_def |>arrange(pid, visit_code) |>group_by(pid) |>mutate(tmp =case_when( (arm %in%c("Pl", "LC106-7", "LC115")) & (as.numeric(visit_code) <=1200) ~FALSE, (arm %in%c("LC106-3", "LC106-o")) & (as.numeric(visit_code) <1200) ~FALSE,TRUE~ outcome ),colonized_LBP_by_mg =cummax(tmp) |>as.logical() ) |>ungroup()
Code
new_outcome_def |>left_join(mae@colData |>as_tibble() |>select(pid, mITT) |>distinct()) |>ggplot() +aes(y = visit_code |>fct_rev(), x = pid, fill = colonized_LBP_by_mg) +facet_grid(. ~ arm +ifelse(mITT, "mITT", "ITT"), scales ="free", space ="free") +geom_tile() +scale_fill_manual("Colonized by LBP by MG by each visit",values =c("gray", "orangered"), labels =c("No", "Yes") ) +ylab("Visit code") +xlab("Participant IDs") +labs(caption ="Colonization at missed visits was considered negative." ) +theme(axis.text.x =element_text(angle =90, hjust =1, vjust =0.5),legend.position ="bottom" )
table_2_with_model_results <-bind_rows( table_2, test_results # |> filter(name != "Adj. p-value") ) |>mutate(site = site |>str_replace_na(""),Pl =case_when( name =="Adj. p-value"~"Ref.", name =="Benefit ratio"~"-",str_detect(name, "CI") ~"",TRUE~ Pl ) )
Code
table_2_with_model_results |>group_by(site) |>gt(caption ="LBP Colonization by week 5 by arm and site excluding immediately post-product visit.", row_group_as_column =TRUE ) |>cols_width("name"~px(150),everything() ~px(120) ) |>cols_label(name ="",Pl ="Placebo",`LC106-7`="LC106<br>7 days",`LC106-3`="LC106<br>3 days",`LC106-o`="LC106<br>overlap",`LC115`="LC115<br>7 days",.fn = md )
LBP Colonization by week 5 by arm and site excluding immediately post-product visit.
Placebo
LC106 7 days
LC106 3 days
LC106 overlap
LC115 7 days
CAP
N participants
N = 13
N = 14
N = 14
N = 14
N = 14
n (%) participants with LBP strain detected
0 (0 %)
8 (57 %)
7 (50 %)
7 (50 %)
6 (43 %)
95% CI
33% - 79%
27% - 73%
27% - 73%
21% - 67%
MGH
N participants
N = 4
N = 7
N = 1
N = 1
N = 6
n (%) participants with LBP strain detected
1 (25 %)
4 (57 %)
1 (100 %)
1 (100 %)
4 (67 %)
95% CI
25% - 84%
21% - 100%
21% - 100%
30% - 90%
Adj. p-value
Ref.
0.002
0.002
0.002
0.002
Benefit ratio
-
9.71
9.07
9.07
8.5
95% CI
1.35 - Inf
1.35 - Inf
1.35 - Inf
1.27 - Inf
LBP colonization by week 5, qPCR-estimated relative abundances
For this sensitivity analysis, we define colonization by any of the LBP strains using relative abundances estimated from qPCR data. For each participant and visit, the relative abundance of each LBP strain was computed as the ratio of the strain-specific copies per swab to the total 16S rRNA gene copies per swab. In line with the metagenomic definition, colonization at a given visit was considered present if the total relative abundance of all LBP strains exceeded 10% or if the maximum relative abundance of any single LBP strain exceeded 5%. These analyses included the immediate post-product visits.
table_2_with_model_results <-bind_rows( table_2, test_results # |> filter(name != "Adj. p-value") ) |>mutate(site = site |>str_replace_na(""),Pl =case_when( name =="Adj. p-value"~"Ref.", name =="Benefit ratio"~"-",str_detect(name, "CI") ~"",TRUE~ Pl ) )
Code
table_2_with_model_results |>group_by(site) |>gt(caption ="LBP Colonization by week 5 by arm, with colonization definition based on qPCR-estimated relative abundance.", row_group_as_column =TRUE ) |>cols_width("name"~px(150),everything() ~px(120) ) |>cols_label(name ="",Pl ="Placebo",`LC106-7`="LC106<br>7 days",`LC106-3`="LC106<br>3 days",`LC106-o`="LC106<br>overlap",`LC115`="LC115<br>7 days",.fn = md )
LBP Colonization by week 5 by arm, with colonization definition based on qPCR-estimated relative abundance.
Placebo
LC106 7 days
LC106 3 days
LC106 overlap
LC115 7 days
CAP
N participants
N = 14
N = 14
N = 14
N = 14
N = 14
n (%) participants with LBP strain detected
0 (0 %)
8 (57 %)
7 (50 %)
8 (57 %)
10 (71 %)
95% CI
33% - 79%
27% - 73%
33% - 79%
45% - 88%
MGH
N participants
N = 5
N = 7
N = 1
N = 1
N = 6
n (%) participants with LBP strain detected
1 (20 %)
6 (86 %)
1 (100 %)
1 (100 %)
6 (100 %)
95% CI
49% - 97%
21% - 100%
21% - 100%
61% - 100%
Adj. p-value
Ref.
<0.001
<0.001
<0.001
<0.001
Benefit ratio
-
12.67
10.13
11.4
15.2
95% CI
1.58 - Inf
1.37 - Inf
1.47 - Inf
2.1 - Inf
LBP detection by week 5, qPCR-based detection
In this analysis, LBP is considered to be detected if at least 2 LBP strains had observed Cq values at the same visit by week 5. “By week 5” includes all post-product visits from visit_code = 1200 (no excluding the immediate post-product visits.
table_2_with_model_results <-bind_rows( table_2, test_results # |> filter(name != "Adj. p-value") ) |>mutate(site = site |>str_replace_na(""),Pl =case_when( name =="Adj. p-value"~"Ref.", name =="Benefit ratio"~"-",str_detect(name, "CI") ~"",TRUE~ Pl ) )
Code
table_2_with_model_results |>group_by(site) |>gt(caption ="LBP detection by week 5 by arm, with LBP detection defined as at least two LBP strains simultaneously detected by qPCR.", row_group_as_column =TRUE ) |>cols_width("name"~px(150),everything() ~px(120) ) |>cols_label(name ="",Pl ="Placebo",`LC106-7`="LC106<br>7 days",`LC106-3`="LC106<br>3 days",`LC106-o`="LC106<br>overlap",`LC115`="LC115<br>7 days",.fn = md )
LBP detection by week 5 by arm, with LBP detection defined as at least two LBP strains simultaneously detected by qPCR.
Placebo
LC106 7 days
LC106 3 days
LC106 overlap
LC115 7 days
CAP
N participants
N = 14
N = 14
N = 14
N = 14
N = 14
n (%) participants with LBP strain detected
2 (14 %)
12 (86 %)
8 (57 %)
9 (64 %)
11 (79 %)
95% CI
60% - 96%
33% - 79%
39% - 84%
52% - 92%
MGH
N participants
N = 5
N = 7
N = 1
N = 1
N = 6
n (%) participants with LBP strain detected
3 (60 %)
6 (86 %)
1 (100 %)
1 (100 %)
6 (100 %)
95% CI
49% - 97%
21% - 100%
21% - 100%
61% - 100%
Adj. p-value
Ref.
<0.001
0.028
0.015
<0.001
Benefit ratio
-
3.26
2.28
2.53
3.23
95% CI
1.47 - Inf
1.07 - Inf
1.15 - Inf
1.47 - Inf
L. crispatus dominance by week 5
In this analysis, L. crispatus dominance is defined as a metagenomic-estimated relative abundance ≥ 50%. “By week 5” includes all post-product visits from visit_code = 1200 (no excluding the immediate post-product visits.
table_2_with_model_results <-bind_rows( table_2, test_results # |> filter(name != "Adj. p-value") ) |>mutate(site = site |>str_replace_na(""),Pl =case_when( name =="Adj. p-value"~"Ref.", name =="Benefit ratio"~"-",str_detect(name, "CI") ~"",TRUE~ Pl ) )
Code
table_2_with_model_results |>group_by(site) |>gt(caption ="Ever ≥50% L. crispatus by week 5 by arm.", row_group_as_column =TRUE ) |>cols_width("name"~px(150),everything() ~px(120) ) |>cols_label(name ="",Pl ="Placebo",`LC106-7`="LC106<br>7 days",`LC106-3`="LC106<br>3 days",`LC106-o`="LC106<br>overlap",`LC115`="LC115<br>7 days",.fn = md )
Ever ≥50% L. crispatus by week 5 by arm.
Placebo
LC106 7 days
LC106 3 days
LC106 overlap
LC115 7 days
CAP
N participants
N = 14
N = 14
N = 14
N = 14
N = 14
n (%) participants ≥ 50% L. crispatus
0 (0 %)
8 (57 %)
5 (36 %)
6 (43 %)
10 (71 %)
95% CI
33% - 79%
16% - 61%
21% - 67%
45% - 88%
MGH
N participants
N = 5
N = 7
N = 1
N = 1
N = 6
n (%) participants ≥ 50% L. crispatus
1 (20 %)
6 (86 %)
1 (100 %)
1 (100 %)
5 (83 %)
95% CI
49% - 97%
21% - 100%
21% - 100%
44% - 97%
Adj. p-value
Ref.
<0.001
0.008
0.004
<0.001
Benefit ratio
-
12.67
7.6
8.87
14.25
95% CI
1.58 - Inf
1.21 - Inf
1.29 - Inf
1.89 - Inf
Secondary outcomes (Figures 2-4)
Code
fig_title <-mae_title(mae)
Kinetics of LBP colonization
Total relative abundances of LBP strains
To investigate the kinetics of colonization, we estimated at each clinic visit the total relative abundance of all LBP strains combined from metagenomic sequencing data. For each participant, the relative abundances of the individual LBP strains were summed, and trajectories were visualized over time stratified by study arm and site.
To facilitate interpretation, each participant within the same study arm and site is labeled by a unique number. These numbers allow visual tracking of individual trajectories across visits within each stratified panel.
Code
g_tot_prop_LBP_strains +geom_label(aes(label = pid_nb), size =2) +labs(caption ="Participants are labeled by their enrollment order within each arm and site." )
Proportions of participants who colonized in each arm and each visit
We further examined colonization dynamics by focusing on Lactobacillus crispatus, irrespective of whether strains belonged to the investigational product.
For each participant and visit, the total relative abundance of L. crispatus was calculated by summing all metagenomic features annotated to this species. Trajectories were visualized over time stratified by study arm and site.
g_tot_prop_Lc +geom_label(aes(label = pid_nb), size =2) +labs(caption ="Participants are labeled by their enrollment order within each arm and site." )
Proportions of participants who colonized with 50% L. crispatus in each arm and each visit
We estimated at each visit the proportion of participants whose vaginal microbiota was dominated by L. crispatus (defined as a relative abundance >50%). For each study arm and site, proportions were calculated together with 95% confidence intervals derived from binomial tests.
We assessed the persistence of colonization among participants who had achieved LBP colonization by week 5. Specifically, we compared colonization status at week 12 to determine the proportion of initially colonized participants who remained colonized at the end of follow-up.
summary_col_still_col <-bind_rows(# by site and arms col_still_col |>filter(LBP_colonization_by_week5) |>group_by(site, arm) |>summarize(n =n(), # total with colonization at week 5n_success =sum(LBP_colonization_at_week12), # still colonized at week 12.groups ="drop" ),# combining sites, by arm col_still_col |>filter(LBP_colonization_by_week5) |>group_by(arm) |>summarize(n =n(), # total with colonization at week 5n_success =sum(LBP_colonization_at_week12), # still colonized at week 12.groups ="drop" ) |>mutate(site ="CAP & MGH") ) |>mutate(site = site |>fct_inorder(),arm = arm |>fct_expand("All arms", "All active arms") )summary_col_still_col <- summary_col_still_col |>bind_rows(# overall by site, all arms summary_col_still_col |>group_by(site) |>summarize(n =sum(n), # total with colonization at week 5n_success =sum(n_success), # still colonized at week 12.groups ="drop" ) |>mutate(arm ="All arms"|>factor(levels = summary_col_still_col$arm |>levels())),# overall by site, all active arms summary_col_still_col |>filter(arm !="Pl") |>group_by(site) |>summarize(n =sum(n), # total with colonization at week 5n_success =sum(n_success), # still colonized at week 12.groups ="drop" ) |>mutate(arm ="All active arms"|>factor(levels = summary_col_still_col$arm |>levels())) )summary_col_still_col <- summary_col_still_col |>mutate(p_col_stil_col = n_success / n,CI = binom::binom.confint(n_success, n, method ="exact") )
Microbiota trajectories of mITT participants in the VIBRANT study. The relative abundances of the LBP strains and top taxa are shown. Participants are ordered by their total relative abundance of LBP strains, and the study weeks are shown in the rows. Stars at the bottom of trajectories indicates participant that were not part of the per-protocol (PP) population.
Code
# checking the data for the placebo participant that has the LBP strains at MGHmae[["mg"]] |>as_tibble() |> dplyr::left_join(mae@colData |>as_tibble()) |>filter(pid =="068100062", study_week ==5) mae[["qPCR"]] |>as_tibble() |> dplyr::left_join(mae@colData |>as_tibble()) |>filter(pid =="068100062", study_week ==5) mae[["amplicon"]] |>as_tibble() |> dplyr::left_join(mae@colData |>as_tibble()) |>filter(pid =="068100062", study_week ==5) |> dplyr::select(.feature, .sample, rel_ab, exclude_sample)mae[["amplicon"]] |>as_tibble() |> dplyr::left_join(mae@colData |>as_tibble()) |>filter(pid =="068100062", .feature =="Lactobacillus_crispatus") |> dplyr::select(.feature, .sample, visit_code, rel_ab, exclude_sample)
Individual LBP strains by sites
Relative abundance of each strain at each visit as estimated by metagenomic sequencing. Strains are organized according to geographical origin, and product(s) in which they are included.
Relative abundances of individual LBP strains estimated by metagenomic sequencing across study weeks, stratified by site, strain origin, and strain identity. Each line represents the longitudinal trajectory of one participant; points show visit-specific measurements.
We estimated the proportion of exposed participants who ever colonized with each LBP strain during follow-up (visits after the first 2 weeks), based on metagenomic relative abundances.
Colonization was defined at the participant level as detection of a given strain at ≥5% relative abundance in at least one post-week 2 sample. Placebo participants were excluded, since they were not exposed to any of the strains.
For LC115 strains, only participants in the LC115 arm were considered exposed, while for LC106 strains all participants in active arms were considered exposed. For each strain and site, we calculated the number of exposed participants, the number who colonized, the corresponding proportion, and 95% confidence intervals using Wilson’s method.
Code
tmp <-# we use the metagenomics data for this mae[["mg"]] |>as_tibble() |>filter(!is.na(LBP), !poor_quality_mg_data) |> dplyr::select(.feature, LBP, strain_origin, .sample, uid, rel_ab) |># add the arm, mITT, etc.left_join( mae |>colData() |>as_tibble() |> dplyr::select(uid, pid, site, randomized, mITT, arm, visit_code, visit, PIPV), by =join_by(uid) ) |># only consider the mITTfilter(randomized, mITT) |>filter(PIPV, as.numeric(visit_code) >=1200) |># only include visits after the first 2 weeks# define the outcome of interest for each person and straingroup_by(.feature, LBP, strain_origin, pid, arm, site) |>summarize(ever_colonized =any(rel_ab >=0.05, na.rm =TRUE),.groups ="drop" ) |># exclude Placebo participants because, they were, in theory, not exposed to any of the strains.filter(arm !="Pl") |># define the total number of participants exposed for each strain.# For LC115 strains, this is only participants of arm LC115# For LC106 strains, this is participants from all active arms.mutate(exposed =case_when( LBP =="LC115"~ (arm =="LC115"), LBP =="LC106 & LC115"~TRUE,TRUE~FALSE ) ) |># we compute the statistics of interestgroup_by(.feature, LBP, strain_origin, site) |>summarize(n_exposed =sum(exposed, na.rm =TRUE),n_colonized =sum(ever_colonized & exposed, na.rm =TRUE),.groups ="drop" ) |>mutate(prop_colonized = n_colonized / n_exposed,CI_low = binom::binom.confint(n_colonized, n_exposed, method ="wilson")$lower,CI_high = binom::binom.confint(n_colonized, n_exposed, method ="wilson")$upper )
tmp |>mutate(strain_origin =recode(strain_origin, "SA"="ZA strains", "US"="US strains"),CI =str_c("[", scales::percent(CI_low, accuracy =1), " - ", scales::percent(CI_high, accuracy =1), "]" ),value =str_c( scales::percent(prop_colonized, accuracy =1), " (", n_colonized, "/", n_exposed,")<br>", CI ),strain_info =str_c(LBP, "<br>(", strain_origin, " strains)") ) |> dplyr::select(-n_exposed, -n_colonized, -prop_colonized, -CI_low, -CI_high, -CI, -LBP, -strain_origin) |>pivot_wider(names_from =c(site), values_from = value ) |>arrange(strain_info) |>group_by(strain_info) |>gt(row_group_as_column =TRUE, process_md =TRUE,caption ="Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure." ) |>cols_label(.feature ="LBP strain") |>cols_align(columns = .feature, align ="left") |>cols_align(columns =c("CAP", "MGH"), align ="center") |>fmt_markdown(columns =everything())
Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure.
LBP strain
CAP
MGH
LC106 & LC115 (US strains strains)
C0022A1
25% (14/56) [16% - 38%]
47% (7/15) [25% - 70%]
C0059E1
61% (34/56) [48% - 72%]
87% (13/15) [62% - 96%]
C0175A1
39% (22/56) [28% - 52%]
67% (10/15) [42% - 85%]
LC106 & LC115 (ZA strains strains)
FF00018
7% (4/56) [3% - 17%]
20% (3/15) [7% - 45%]
FF00051
30% (17/56) [20% - 43%]
67% (10/15) [42% - 85%]
UC101
25% (14/56) [16% - 38%]
73% (11/15) [48% - 89%]
LC115 (US strains strains)
C0006A1
50% (7/14) [27% - 73%]
67% (4/6) [30% - 90%]
C0028A1
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
C0112A1
64% (9/14) [39% - 84%]
83% (5/6) [44% - 97%]
LC115 (ZA strains strains)
122010
57% (8/14) [33% - 79%]
33% (2/6) [10% - 70%]
185329
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
FF00004
50% (7/14) [27% - 73%]
83% (5/6) [44% - 97%]
FF00064
50% (7/14) [27% - 73%]
83% (5/6) [44% - 97%]
FF00072
7% (1/14) [1% - 31%]
50% (3/6) [19% - 81%]
UC119
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
Excluding immediate post-product visit
Same analyze as above but excluding the week 2 visit for the LC106-7 and LC115 arms, since this visit is immediately after product use.
Code
tmp <-# we use the metagenomics data for this mae[["mg"]] |>as_tibble() |>filter(!is.na(LBP), !poor_quality_mg_data) |> dplyr::select(.feature, LBP, strain_origin, .sample, uid, rel_ab) |># add the arm, mITT, etc.left_join( mae |>colData() |>as_tibble() |> dplyr::select(uid, pid, site, randomized, mITT, arm, visit_code, visit, PIPV), by =join_by(uid) ) |># only consider the mITTfilter(randomized, mITT) |>mutate(include_visit =case_when( (arm %in%c("LC106-7", "LC115")) &as.numeric(visit_code) >1200~TRUE, (arm %in%c("LC106-3", "LC106-o")) &as.numeric(visit_code) >=1200~TRUE,TRUE~FALSE ) ) |>filter(PIPV, include_visit) |># only include visits after the first 2 weeks# define the outcome of interest for each person and straingroup_by(.feature, LBP, strain_origin, pid, arm, site) |>summarize(ever_colonized =any(rel_ab >=0.05, na.rm =TRUE),.groups ="drop" ) |># exclude Placebo participants because, they were, in theory, not exposed to any of the strains.filter(arm !="Pl") |># define the total number of participants exposed for each strain.# For LC115 strains, this is only participants of arm LC115# For LC106 strains, this is participants from all active arms.mutate(exposed =case_when( LBP =="LC115"~ (arm =="LC115"), LBP =="LC106 & LC115"~TRUE,TRUE~FALSE ) ) |># we compute the statistics of interestgroup_by(.feature, LBP, strain_origin, site) |>summarize(n_exposed =sum(exposed, na.rm =TRUE),n_colonized =sum(ever_colonized & exposed, na.rm =TRUE),.groups ="drop" ) |>mutate(prop_colonized = n_colonized / n_exposed,CI_low = binom::binom.confint(n_colonized, n_exposed, method ="wilson")$lower,CI_high = binom::binom.confint(n_colonized, n_exposed, method ="wilson")$upper )
tmp |>mutate(strain_origin =recode(strain_origin, "SA"="ZA strains", "US"="US strains"),CI =str_c("[", scales::percent(CI_low, accuracy =1), " - ", scales::percent(CI_high, accuracy =1), "]" ),value =str_c( scales::percent(prop_colonized, accuracy =1), " (", n_colonized, "/", n_exposed,")<br>", CI ),strain_info =str_c(LBP, "<br>(", strain_origin, " strains)") ) |> dplyr::select(-n_exposed, -n_colonized, -prop_colonized, -CI_low, -CI_high, -CI, -LBP, -strain_origin) |>pivot_wider(names_from =c(site), values_from = value ) |>arrange(strain_info) |>group_by(strain_info) |>gt(row_group_as_column =TRUE, process_md =TRUE,caption ="Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure but excluding immediate post-product visit (i.e., excluding week 2 visit for the LC106-7 and LC115 arms)." ) |>cols_label(.feature ="LBP strain") |>cols_align(columns = .feature, align ="left") |>cols_align(columns =c("CAP", "MGH"), align ="center") |>fmt_markdown(columns =everything())
Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure but excluding immediate post-product visit (i.e., excluding week 2 visit for the LC106-7 and LC115 arms).
LBP strain
CAP
MGH
LC106 & LC115 (US strains strains)
C0022A1
21% (12/56) [13% - 34%]
33% (5/15) [15% - 58%]
C0059E1
43% (24/56) [31% - 56%]
40% (6/15) [20% - 64%]
C0175A1
39% (22/56) [28% - 52%]
67% (10/15) [42% - 85%]
LC106 & LC115 (ZA strains strains)
FF00018
2% (1/56) [0% - 9%]
7% (1/15) [1% - 30%]
FF00051
9% (5/56) [4% - 19%]
20% (3/15) [7% - 45%]
UC101
2% (1/56) [0% - 9%]
7% (1/15) [1% - 30%]
LC115 (US strains strains)
C0006A1
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
C0028A1
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
C0112A1
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
LC115 (ZA strains strains)
122010
36% (5/14) [16% - 61%]
17% (1/6) [3% - 56%]
185329
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
FF00004
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
FF00064
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
FF00072
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
UC119
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
Supplementary analyses
Applicator staining
We analyze returned applicators and staining results to assess adherence to study product use. We use data from participant_crfs_merged stored in the @metadata slot of the mae object, which contains data from CRF23 (returned applicators) and CRF46 (staining results). We also use the exposures table to obtain information on study product use.
From these, we create the applicator table, which includes only participants from the mITT population.
We note that two participants returned eight applicators. In both cases, comments explain that they had received a spare applicator:
Participant 068-20-0425 reported taking the spare applicator out of its wrapping and placing it in the return bag without using or washing any applicators.
Participant 068-20-0439 reported inserting the spare applicator on the day of the visit, thinking all applicators had to be used, but also did not wash any of them.
Returned applicator statistics
Here, we want to document the proportions of mITT participants who
returned any used applicators
returned their complete set of applicators (i.e., 7 or 8 applicators)
Here, we document, for the mITT participants who returned their complete set and got all of their applicators stained, the number of positively stained applicators.
Dominance of LBP strains at visits remote from dosing
To assess whether colonization typically reflected dominance, we examined the composition of samples from participants in active arms who achieved the primary outcome at visits conducted after product use (weeks 3–5 for LC106-7 and LC115; weeks 2–5 for LC106-o and LC106-3).
For each colonized sample, we calculated the total relative abundance of LBP strains based on metagenomic data. We then determined the proportion of samples in which LBP strains accounted for at least 50% of the community.
We assessed treatment response by Nugent score at the first visit following oral metronidazole treatment (visit 1100). Nugent scores were extracted from visits_crfs_merged and a binary variable was created to classify participants as BV positive (nugent_total_score ≥ 7) or BV negative (< 7).
We then estimated the number and proportion of participants who remained BV positive post-metronidazole, overall and by treatment arm.
Code
# nugent score for all visits nugent_score <- mae@colData |>as_tibble() |>select(uid, pid, visit_code, study_day, site, randomized, arm, ITT, mITT, PP) |>distinct() |>filter(randomized, mITT) |>left_join(metadata(mae)$visits_crfs_merged |>as_tibble() |>select(pid, visit_code, nugent_total_score),by =c("pid", "visit_code") ) |># some pid x visit are duplicate but there is no data for the nugent score, so we remove the duplicate visit with no data for nugent scoregroup_by(pid, visit_code) |>arrange(is.na(nugent_total_score)) |> dplyr::slice(1) |>ungroup() |>group_by(visit_code) |>mutate(any_Nugent =any(!is.na(nugent_total_score))) |>ungroup() |>filter(any_Nugent) |>group_by(pid) |>mutate(active_arm = arm !="Pl",nugent_positive = nugent_total_score >=7 ) |>ungroup()
Code
nugent_score |>ggplot()+aes(x = visit_code, y = pid, fill = nugent_total_score) +geom_tile() +scale_fill_gradient(name ="Nugent score",low ="lightblue1", high ="red3" ) +labs(x ="Visit Code", y ="Participant ID") +theme(axis.text.y =element_text(size =4))
Code
nugent_score |>filter(visit_code =="1100") |> dplyr::count(nugent_positive) |>gt(caption ="Number of mITT participant with BV (nugent_positive == TRUE) post-MTZ (visit 1100)")
Number of mITT participant with BV (nugent_positive == TRUE) post-MTZ (visit 1100)
nugent_positive
n
FALSE
68
TRUE
18
NA
4
Code
nugent_score |>filter(visit_code =="1100") |> dplyr::count(active_arm, nugent_positive) |>gt(caption ="Number of mITT participant with BV (nugent_positive == TRUE) post-MTZ (visit 1100) by active arm")
Number of mITT participant with BV (nugent_positive == TRUE) post-MTZ (visit 1100) by active arm
active_arm
nugent_positive
n
FALSE
FALSE
11
FALSE
TRUE
6
FALSE
NA
2
TRUE
FALSE
57
TRUE
TRUE
12
TRUE
NA
2
At visit 1100, among participants with BV who were exposed to the LBP (i.e., in the active arm), 5 participants had a detectable LBP strain identified after treatment by metagenomics.
Code
# among the 12 of these people exposed to LBPtmp <- nugent_score |>filter(visit_code =="1100") |>filter(active_arm) |>left_join(# add mg data to see how many had detectable LBP strain post-MTZ mae[["mg"]] |>as_tibble() |># add mg data dplyr::left_join( mae@colData |>as_tibble() |>select(uid, pid, visit_code),by =join_by(.sample == uid) ) |>filter(visit_code =="1200", !poor_quality_mg_data)|>filter(!is.na(LBP)) |>#only LBP straingroup_by(pid) |>summarise(prop_tot_LBP =sum(rel_ab)) |>arrange(prop_tot_LBP |>desc()) |>mutate(LBP_detected_post_INT = prop_tot_LBP >=0.05),by =c("pid") ) tmp |> dplyr::count(nugent_positive, LBP_detected_post_INT) |>gt(caption ="Number of mITT participant in one of the active arms with BV diagnosis post-MTZ (visit 1100) and LBP detection (>= 5% total LBP rel. ab) post-INT (visit 1200)")
Number of mITT participant in one of the active arms with BV diagnosis post-MTZ (visit 1100) and LBP detection (>= 5% total LBP rel. ab) post-INT (visit 1200)
nugent_positive
LBP_detected_post_INT
n
FALSE
FALSE
19
FALSE
TRUE
36
FALSE
NA
2
TRUE
FALSE
6
TRUE
TRUE
5
TRUE
NA
1
NA
TRUE
2
rBV by colonization status
In this section, we are investigating whether LBP colonization is protective of rBV.
We only include participants who were BV-cured post-MTZ, then count the number and proportions of participants who had rBV among those who met the primary outcome and those who did not.
Code
rBV_stats <-# we first define for each mITT participant whether they were BV-cured or not post-MTZ by Nugent nugent_score |>filter(visit_code =="1100", mITT) |>select(uid, pid, visit_code, site, mITT, active_arm, arm, nugent_total_score) |>mutate(BV_cure_post_MTZ = nugent_total_score <7) |># we then specify for each of these participants whether they met the criteria for the primary endpoint of LBP colonization by week 5left_join( mae[["col_LBP_mg"]] |>as_tibble() |> dplyr::left_join( mae@colData |>as_tibble() |>select(uid, pid, visit_code),by =join_by(.sample == uid) ) |> dplyr::filter(visit_code =="1500", .feature =="colonized_LBP_by_mg") |>mutate(LBP_colonization_by_week5 = outcome |>replace_na(FALSE)) |>select(pid, LBP_colonization_by_week5),by =join_by(pid) ) |># we next check who had BV (or rBV) by week 12left_join( nugent_score |>select(pid, visit_code, nugent_positive) |>filter(as.numeric(visit_code) >1100, as.numeric(visit_code) <=2120) |>group_by(pid) |>summarize(BV_by_week12 =any(nugent_positive, na.rm =TRUE),BV_after_week5 =any(nugent_positive &as.numeric(visit_code) >1500, na.rm =TRUE), # any BV after week 5 ),by =join_by(pid) ) |>mutate(rBV_by_week12 = BV_by_week12 & BV_cure_post_MTZ # rBV if BV by week 12 and cured post-MTZ )rBV_stats |>filter(active_arm) |> dplyr::count(active_arm, BV_cure_post_MTZ, LBP_colonization_by_week5, BV_by_week12)
rBV_stats <-# we first define for each mITT participant whether they were BV-cured or not post-MTZ by Nugent nugent_score |>select(-any_Nugent, -nugent_positive) |>filter(visit_code =="1100", mITT) |>mutate(BV_cure_post_MTZ = nugent_total_score <7) |># we then specify for each of these participants whether they met the criteria for the primary endpoint of LBP colonization by week 5left_join( mae[["col_LBP_mg"]] |>as_tibble() |> dplyr::left_join( mae@colData |>as_tibble() |>select(uid, pid, visit_code),by =join_by(.sample == uid) ) |> dplyr::filter(visit_code =="1500", .feature =="colonized_LBP_by_mg") |>mutate(LBP_colonization_by_week5 = outcome |>replace_na(FALSE)) |>select(pid, LBP_colonization_by_week5),by =join_by(pid) ) |># we next check who had BV (or rBV) by week 12left_join( nugent_score |>select(pid, visit_code, nugent_positive) |>filter(as.numeric(visit_code) >1100, as.numeric(visit_code) <=2120) |>group_by(pid) |>summarize(BV_by_week12 =any(nugent_positive, na.rm =TRUE),BV_after_week5 =any(nugent_positive &as.numeric(visit_code) >1500, na.rm =TRUE), # any BV after week 5 ),by =join_by(pid) ) |>mutate(rBV_by_week12 = BV_by_week12 & BV_cure_post_MTZ, # rBV if BV by week 12 and cured post-MTZ )rBV_stats |>filter(active_arm) |> dplyr::count(active_arm, LBP_colonization_by_week5, BV_after_week5)
Proportion of Participants with Nugent Score ≥ 7 at Visit 2120 (week12)
LBP_colonization_by_week5
N participants
n (%) participants with nugent score >= 7
Percentage
95% CI
FALSE
39
25
64.1
(47.2%–78.8%)
TRUE
47
10
21.3
(10.7%–35.7%)
LBP Colonization dominance
We first assessed whether LBP strains, when detected by metagenomic sequencing, typically established dominance in the vaginal microbiota. For each colonized sample, L. crispatus relative abundance was calculated, and dominance was defined as L. crispatus representing more than 50% of the total community.
We then investigated whether dominance by LBP strains was associated with a reduced risk of recurrent BV. Colonization outcomes were linked to Nugent scores collected at follow-up visits, and we compared the prevalence of BV (Nugent ≥ 7) among participants with or without LBP dominance at week 5. Results suggest that participants who achieved LBP strain dominance were less likely to experience recurrent BV.
To evaluate the durability of colonization, we assessed the presence of LBP strains at the week 12 visit (visit 2120) among participants in the active arms. Detection was defined by metagenomic sequencing as the presence of any LBP strain with non-zero relative abundance.
As a complementary and more sensitive assay, we assessed LBP strain detection by qPCR. Detection was defined as the concurrent observation of Cq values for at least two LBP strains at the same visit.
LBP detection was assessed by qPCR, defined as the presence of Cq values for at least two LBP strains at the same visit. We computed the proportion of participants with detectable LBP strains at week 3 (visit 1300, one week after the end of dosing) and at week 9 (visit 1900, after 7 weeks off product), both overall and excluding placebo participants.
Participants exposed by qPCR (daily home swabs): n/N (%)
site
Pl
LC106-7
LC106-3
LC106-o
LC115
CAP
1 / 14 (7.1%)
13 / 14 (92.9%)
13 / 14 (92.9%)
13 / 14 (92.9%)
11 / 14 (78.6%)
MGH
0 / 5 (0%)
6 / 7 (85.7%)
1 / 1 (100%)
1 / 1 (100%)
6 / 6 (100%)
Code
bind_rows( eff_exp_qPCR |>filter(arm !="Pl"), eff_exp_qPCR |>filter(arm !="Pl") |>mutate(site ="both"), eff_exp_qPCR |>filter(arm !="Pl", arm !="LC106-3") |>mutate(arm ="all active arms excl. LC106-3"), eff_exp_qPCR |>filter(arm !="Pl", arm !="LC106-3") |>mutate(site ="both", arm ="all active arms excl. LC106-3")) |>filter(mITT) |>group_by(arm, site) |>summarize(n =sum(!is.na(prop_days_exposed)),mean_prop_days_exposed =mean(prop_days_exposed, na.rm =TRUE),sd_prop =sd(prop_days_exposed, na.rm =TRUE),se = sd_prop /sqrt(n), ci_lower =pmax(0, mean_prop_days_exposed -qt(0.975, df =pmax(n -1, 1)) * se),ci_upper =pmin(1, mean_prop_days_exposed +qt(0.975, df =pmax(n -1, 1)) * se), mean_with_ci =str_c(percent(mean_prop_days_exposed, accuracy =1), " (", percent(ci_lower, accuracy =1), "–", percent(ci_upper, accuracy =1), ")"),.groups ="drop" ) |>select(arm, site, mean_with_ci) |>pivot_wider(names_from = arm, values_from = mean_with_ci) |>gt(caption ="Proportion of days exposed by qPCR (daily home swabs), among dosing days: mean proportion (in %) and associated 95% CI.")
Proportion of days exposed by qPCR (daily home swabs), among dosing days: mean proportion (in %) and associated 95% CI.
site
LC106-3
LC106-7
LC106-o
LC115
all active arms excl. LC106-3
CAP
71% (53%–90%)
65% (49%–82%)
69% (53%–85%)
63% (40%–86%)
66% (56%–76%)
MGH
NA
64% (27%–100%)
NA
72% (42%–100%)
70% (50%–89%)
both
71% (54%–88%)
65% (50%–79%)
71% (56%–86%)
66% (49%–83%)
67% (58%–75%)
We group participants by the proportion of days exposed by qPCR (daily home swabs), among dosing days: >50%, >0% to ≤50%, and 0%. We then summarize the number and percentage of participants in each group, along with 95% confidence intervals, stratified by study arm.
Microbiota trajectories of mITT participants in the VIBRANT study. The relative abundances of the LBP strains and top taxa are shown. Participants are ordered by their total relative abundance of LBP strains, and the study weeks are shown in the rows.
Microbiota trajectories of mITT participants in the VIBRANT study. The relative abundances of the LBP strains and top taxa are shown. Participants are ordered by their total relative abundance of LBP strains, and the study weeks are shown in the rows.