Wastewater sequencing reveals community and variant dynamics of … – Nature.com
October 29, 2023
Probe-based capture drives viral enrichment
We developed a comprehensive viral capture approach using a diverse probe set and applied it to wastewater from ten different sites, sampled weekly for nearly 1 year. The probes (TWIST Comprehensive Viral Research Panel) target a panel of 3,153 different human and animal virus genomes. As part of an initiative from the Texas Epidemic Public Health Institute²³, composite 24-h wastewater influent was collected from six treatment plants in Houston, Texas, USA and four plants in El Paso, Texas, USA from May 2022 through February 2023 (Fig. 1A). Wastewater treatment plant catchment areas ranged from 10,000 to 400,000 people (an estimated 618,148 people served in Houston and 751,982 in El Paso County). These sites were chosen because they allowed us to examine the breadth and robustness of our approach across two large cities with different characteristics. Houston and El Paso differ in size and diversity, have contrasting climate and rainfall (El Paso dry, Houston humid), are geographically distant (almost 1,200 kilometers apart), and have different patterns of human travel (El Paso is a border city with thousands of daily cross-border commuters; Houston is a coastal city with one of the largest ports in the world).
Fig. 1. A Map of wastewater catchment areas in Houston and El Paso, TX. The colored areas refer to the sites in each city (El Paso = 4, Houston = 6). B The tree-like object was drawn with hierarchical taxonomic labels (kingdom, phylum, class, order, family, genus, species) rather than multiple sequence alignments because different virus phyla have independent origins. Tip point size corresponds to the number of wastewater samples in which the virus was detected, and color corresponds to the skew of the species toward Houston (red) or El Paso (blue). C Number of distinct virus strains detected per sample from each wastewater treatment plant. D Rarefaction curves measuring distinct virus strains detected as more samples were analyzed. Lines represent the average number of strains detected, while shaded bands represent minimum and maximum values from 50 permutations. E Genome coverage of detected virus genomes/segments for each sample. F Percentage of reads aligned to the virus pathogen genome database in paired control (no-probe) and treatment (capture with the TWIST Comprehensive Viral Research Panel) groups. n = 18 biologically independent samples. Boxplots are defined as: center line = median, lower and upper box bounds = 25th and 75th percentiles, and whiskers extend to the minimum and maximum values.
The efficacy of probe-based enrichment was tested on 18 pilot samples. After clearing solids and extracting nucleic acids using methods designed for SARS-CoV-2 detection²⁴, we first sequenced unenriched samples and examined their viral read numbers. The unenriched samples yielded low proportions of viral reads (4–78 aligned reads out of 9.8–18.0 million total reads), with 0 to 1 mammalian viruses detected. In contrast, applying the TWIST Comprehensive Viral Research Panel probes to the same extractions produced a 3,374-fold enrichment in the proportion of viral reads (Fig. 1F; 14.9–407.0 thousand aligned reads out of 11.6–24.2 million total reads), with 42 to 128 mammalian viruses detected.
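The fold enrichment above is a ratio of two proportions: the fraction of reads aligning to viruses after capture, divided by the same fraction in the paired no-probe library from the same extraction. The sketch below computes it per pair. How the study aggregated the 18 pairs into the single 3,374-fold figure is not stated in this section, so summarizing by the median is an assumption, and the read counts are made up for illustration.

```python
from statistics import median


def viral_fraction(aligned_reads: int, total_reads: int) -> float:
    """Proportion of all sequenced reads that aligned to the virus database."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return aligned_reads / total_reads


def fold_enrichment(control: tuple[int, int], capture: tuple[int, int]) -> float:
    """Capture-library viral fraction divided by the paired no-probe fraction.

    Each argument is (aligned_reads, total_reads) for one library built
    from the same nucleic acid extraction.
    """
    control_fraction = viral_fraction(*control)
    if control_fraction == 0:
        raise ValueError("control library has no viral reads; ratio is undefined")
    return viral_fraction(*capture) / control_fraction


# Hypothetical paired libraries: ((control aligned, control total), (capture aligned, capture total)).
pairs = [
    ((20, 10_000_000), (100_000, 20_000_000)),
    ((50, 12_000_000), (300_000, 18_000_000)),
]
folds = [fold_enrichment(control, capture) for control, capture in pairs]
summary = median(folds)  # aggregating by the median is an assumption
print([round(f) for f in folds], round(summary))
```

Comparing proportions rather than raw counts matters here because the paired libraries were sequenced to different depths.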
Read-mapping-based virus detection and abundance measurement were performed with EsViritu, a bioinformatics tool we developed for this purpose (Fig. S1). EsViritu uses sequence information to sensitively detect mammalian viruses and filter out false positives (see Materials and Methods; https://github.com/cmmr/EsViritu).
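EsViritu's actual filtering rules are described in the paper's methods and repository, not in this section. As a hedged illustration of one common idea behind read-mapping detection, the sketch below computes breadth of coverage (the fraction of a reference covered by at least one read) and calls a virus present only when both a read-count threshold and a breadth threshold are met. The names `Alignment`, `breadth_of_coverage`, and `is_detected`, and the thresholds `min_reads=10` and `min_breadth=0.10`, are hypothetical and are not EsViritu's API or defaults.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """One read's aligned span on a reference (0-based, half-open)."""
    start: int
    end: int


def breadth_of_coverage(alignments: list[Alignment], genome_length: int) -> float:
    """Fraction of reference positions covered by at least one aligned read."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    run_start = run_end = None
    # Merge overlapping spans so positions covered by several reads count once.
    for aln in sorted(alignments, key=lambda a: a.start):
        start, end = max(0, aln.start), min(genome_length, aln.end)
        if end <= start:
            continue  # span lies outside the reference after clipping
        if run_end is None or start > run_end:
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = start, end
        else:
            run_end = max(run_end, end)
    if run_end is not None:
        covered += run_end - run_start
    return covered / genome_length


def is_detected(alignments: list[Alignment], genome_length: int,
                min_reads: int = 10, min_breadth: float = 0.10) -> bool:
    """HYPOTHETICAL detection rule: enough reads AND reads spread across the genome.

    Thresholds are illustrative only. A breadth requirement rejects hits where
    many reads pile onto one short region shared with other organisms.
    """
    return (len(alignments) >= min_reads
            and breadth_of_coverage(alignments, genome_length) >= min_breadth)
```

The same breadth measure is the kind of quantity behind statements such as reads covering over 90% of a genome's length.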
Applying these methods to 363 longitudinal wastewater samples, we detected 28 viral families, 77 genera, 191 species, and 465 distinct virus strains in total (Fig. 1B), with a median of 54 to 98 strains detected per sample depending on the wastewater treatment plant (Fig. 1C). Rarefaction analysis of virus strains showed that detections had not reached saturation, so additional virus strains are likely to be detected in future samples (Fig. 1D). Per sample, a median of 28.5 reference genomes or segments had sequencing reads covering over 90% of their length, and a median of 41 more genomes or segments had over 50% coverage (Fig. 1E). Methodologically, this depth of coverage opens the door to in-depth analysis of circulating viruses beyond abundance measurements.
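A rarefaction curve like the one described for Fig. 1D can be sketched by shuffling the sample order many times and counting the cumulative number of distinct strains. The sketch assumes each sample is represented as a set of detected strain identifiers; the function name, the fixed seed, and the example detections are illustrative, not taken from the study.

```python
import random
from statistics import mean


def rarefaction(samples: list[set[str]], permutations: int = 50, seed: int = 0):
    """Cumulative distinct strains as samples are added in random order.

    Returns one (mean, min, max) tuple per number of samples analyzed,
    summarizing `permutations` random orderings.
    """
    rng = random.Random(seed)
    curves = []
    for _ in range(permutations):
        order = list(samples)
        rng.shuffle(order)
        seen: set[str] = set()
        curve = []
        for detected in order:
            seen |= detected  # add this sample's strains to the running union
            curve.append(len(seen))
        curves.append(curve)
    summary = []
    for k in range(len(samples)):
        counts = [curve[k] for curve in curves]
        summary.append((mean(counts), min(counts), max(counts)))
    return summary


# Hypothetical detections for four samples.
samples = [{"strainA", "strainB"}, {"strainB", "strainC"}, {"strainD"}, {"strainA"}]
for n, (avg, low, high) in enumerate(rarefaction(samples), start=1):
    print(n, round(avg, 2), low, high)
```

If the mean curve is still rising at the last sample, as reported above, new samples are still contributing strains not seen before.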
To estimate the quantitative dynamic range of pathogen detection for this assay, we spiked lab-grown respiratory syncytial virus A (RSV) virions into real wastewater samples that had previously been determined to have no detectable RSV. Across a stepwise dilution series, we could accurately detect and quantify RSV from 51 to 4 million spiked-in genome copies, with a Pearson correlation of at least R = 0.975 (Fig. S2).
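The dilution-series check reduces to a Pearson correlation between the known spike-in amount and the measured abundance. This section does not say whether values were log-transformed; the sketch below assumes log10 on both axes, a common choice when inputs span about five orders of magnitude (51 to 4 million copies). The helper `dilution_series_r` and the input and measured values are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def dilution_series_r(spiked_copies: list[float], measured: list[float]) -> float:
    """Correlate log10 spike-in input with log10 measured abundance (assumed transform)."""
    return pearson_r([math.log10(c) for c in spiked_copies],
                     [math.log10(m) for m in measured])


# Hypothetical 10-fold series and noisy measurements (not the study's data).
spiked = [50, 500, 5_000, 50_000, 500_000, 5_000_000]
measured = [0.4, 6.1, 48.0, 530.0, 4_700.0, 52_000.0]
print(round(dilution_series_r(spiked, measured), 3))
```

Working on log scales keeps the highest dilution points from dominating the correlation, which they would on a linear scale.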
Having established a capture-based approach that offers the prospect of comprehensive virome analysis of complex wastewater samples, we next asked whether signals from the sequencing data mirror trends in publicly available clinical datasets. Case data for select viral pathogens (SARS-CoV-2, influenza virus, and monkeypox virus) were obtained from local or state government sources for Houston and, when available, El Paso. We started with SARS-CoV-2, as its wastewater levels have previously been correlated with case data²⁵. Using reads per kilobase of transcript per million filtered reads (RPKMF) as a proxy for relative virus levels in a given sample, the Houston wastewater signal correlated positively with both case counts and test positivity rates across the summer and winter SARS-CoV-2 waves (Fig. 2A, Fig. S3A, B; R = 0.5–0.78), and the El Paso signal correlated positively with El Paso case data (Fig. 2B; R = 0.59–0.73). This finding is strengthened by the fact that qPCR, the current standard and an orthogonal technique for measuring SARS-CoV-2 levels in wastewater, was also closely correlated with RPKMF (Fig. S3C, D) for both Houston (R = 0.64) and El Paso (R = 0.84).
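RPKMF normalizes the reads mapped to a virus by both reference length and sequencing depth: RPKMF = mapped reads / (reference length in kb × filtered reads in millions). A minimal sketch, assuming "filtered reads" means the reads retained after the pipeline's filtering steps (the precise filter is defined in the paper's methods, not here); the function name and example numbers are illustrative.

```python
def rpkmf(mapped_reads: int, reference_length_bp: int, filtered_reads: int) -> float:
    """Reads per kilobase of reference per million filtered reads.

    RPKMF = mapped_reads / (reference_length_bp / 1e3) / (filtered_reads / 1e6)
    """
    if reference_length_bp <= 0 or filtered_reads <= 0:
        raise ValueError("reference length and filtered read count must be positive")
    kilobases = reference_length_bp / 1_000
    million_filtered = filtered_reads / 1_000_000
    return mapped_reads / kilobases / million_filtered


# Hypothetical sample: 300 reads on a ~30 kb genome, 15 million filtered reads.
print(round(rpkmf(300, 30_000, 15_000_000), 3))
```

For a single virus tracked over time the length term is constant, so the RPKMF series behaves like a depth-normalized read count; the length term matters when comparing different viruses within a sample.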
Fig. 2. A SARS-CoV-2 wastewater sequencing abundance compared to reported cases (top) and scatter plot with Pearson correlation coefficients and p-value for a two-sided test between wastewater sequencing abundance and reported SARS-CoV-2 cases (bottom) in Houston, TX. B As in (A), for El Paso, TX. C Influenza wastewater sequencing abundance compared to the reported Weekly Percentage of Visits with Discharge Diagnosed Influenza (top) and scatter plot with Pearson correlation coefficients and p-value for a two-sided test between the two measures (bottom) in Houston, TX. D Monkeypox virus wastewater sequencing abundance compared to reported Mpox cases (top) and scatter plot with Pearson correlation coefficients and p-value for a two-sided test between wastewater sequencing abundance and reported Mpox cases (bottom). E Heatmap of presence/absence and abundance at all ten wastewater sites for 11 pathogens of major concern (y-axis) across the entire study period (x-axis).
Similarly, influenza A virus abundance in the virome sequencing data was highly concordant with the reported Weekly Percentage of Visits with Discharge Diagnosed Influenza in the Houston area (Fig. 2C; R = 0.9). Influenza A subtypes H3N2 and H1N1 were also resolved in our data, concordant with clinical subtyping for this flu season in Texas (see Data and Materials Availability). Again, the virome sequencing data correlated with qPCR measurements from the same samples (Fig. S3E, F; R = 0.57–0.73). Finally, a monkeypox (Mpox) outbreak occurred in several U.S. cities in the summer of 2022. Strikingly, monkeypox virus was detected numerous times at low abundance in Houston wastewater samples in our virome dataset (Fig. 2D; R = 0.46), even though only 1,050 cases were reported in the entire Houston area between July and November 2022. Meanwhile, monkeypox virus was never detected in El Paso wastewater samples, consistent with only 10 total reported clinical cases in that metro area.
Encouragingly from a detection and possibly a public health standpoint, 11 categories of major viral pathogens were routinely detected and could be tracked over the sampling period (Fig. 2E), including noroviruses, rotavirus A, hepatitis A virus, RSV, parainfluenza viruses, and enterovirus D68. Interestingly, virus levels at times followed different trends in the two cities and at different periods of the year (Fig. S4).
We next examined how the human wastewater virome changed over space and time. To identify the main variables structuring virome communities, we generated t-distributed stochastic neighbor embedding (t-SNE) plots from the virus abundance data of each sample. Samples separated starkly by city and by date of collection (Fig. 3A, B). Virus species from several families were unevenly distributed between Houston and El Paso (Fig. S5A). Although we expect most viruses to show a prevalence bias toward El Paso because of its higher median strain detection per site (Fig. 1C), El Paso had especially strong signals from many Parvoviridae and Sedoreoviridae, whereas Houston samples had a higher prevalence of many Caliciviridae and Astroviridae, for reasons that are currently unknown (Fig. S5A).
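The city-level skew discussed above compares how often a virus is detected in each city. The exact statistic behind the figures is not given in this section; the sketch below uses one plausible, hypothetical definition: the log2 ratio of detection prevalence (fraction of samples with the virus) in Houston versus El Paso, with a small pseudocount so that a virus seen in only one city still gets a finite score. The function names and the `pseudocount=0.5` default are assumptions.

```python
import math


def detection_count(samples: list[set[str]], virus: str) -> int:
    """Number of samples in which `virus` was detected."""
    return sum(virus in detected for detected in samples)


def prevalence_skew(houston: list[set[str]], el_paso: list[set[str]],
                    virus: str, pseudocount: float = 0.5) -> float:
    """HYPOTHETICAL skew score: log2(Houston prevalence / El Paso prevalence).

    Positive values lean toward Houston, negative toward El Paso. The
    pseudocount keeps the score finite when a virus is seen in only one city.
    """
    p_hou = (detection_count(houston, virus) + pseudocount) / (len(houston) + 2 * pseudocount)
    p_elp = (detection_count(el_paso, virus) + pseudocount) / (len(el_paso) + 2 * pseudocount)
    return math.log2(p_hou / p_elp)


# Hypothetical detections per sample in each city.
houston = [{"virusX", "virusY"}, {"virusX"}, {"virusY"}]
el_paso = [{"virusY"}, {"virusY"}, set()]
print(round(prevalence_skew(houston, el_paso, "virusX"), 2))
```

Because prevalence is a fraction of samples, the score is not inflated simply because one city contributed more samples than the other.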
Fig. 3. A t-SNE of wastewater samples using virome abundance data, showing different cities/sites. B t-SNE of wastewater samples using virome abundance data, showing samples over time. C Temporal analysis of intra-site community changes. Each dot is a comparison between two samples; the x-axis measures days between sampling and the y-axis measures Bray-Curtis dissimilarity between the samples. D Bray-Curtis dissimilarity between samples taken within ±7 days of each other, comparing samples from the same site, different sites in the same city, and different cities. **** represents t-test p < 1e-04. Different City vs Same Site, p = 6.2e-164; Different City vs Same City, p = 2.1e-214; Same City vs Same Site, p = 2.9e-30.
To assess community dynamics over time, all samples from each site were compared with each other using the Bray-Curtis dissimilarity statistic (Fig. 3C). In general, virome composition diverged over time: samples taken close together in time were quite similar, whereas those separated by many months were very different. A possible exception to this temporal divergence is the wastewater treatment plant serving Houston's large intercontinental airport, which likely reflects a transient population of world travelers (Fig. 3C, HOU R5). There, compositional dissimilarity was poorly correlated with elapsed time, possibly because of virome flux from incoming people. Conversely, as data collection approached 1 year and the seasons repeated, samples from 3 of 4 El Paso sites appeared to be re-converging on their community structures from the previous year. Overall, samples from different cities were more dissimilar than samples from different sites within the same city, and samples from the same site were more similar to each other than to any others (Fig. 3D, Fig. S5B). Finally, we assessed the impact of human population size on virome diversity. Alpha diversity (Shannon's index) was measured for each sample (Fig. S5C), and each site's average diversity was plotted against the population of its service area (Fig. S5D). Average diversity increases from catchment populations of 10,000 to 100,000 inhabitants but levels off at larger populations. Collectively, these data confirm that the structure of wastewater virome communities is substantially determined by temporal and geospatial factors.
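Both statistics used above have standard definitions. Bray-Curtis dissimilarity between abundance profiles u and v is Σ|uᵢ − vᵢ| / Σ(uᵢ + vᵢ), ranging from 0 (identical) to 1 (no viruses in common); Shannon's index is H = −Σ pᵢ ln pᵢ over relative abundances pᵢ. The sketch assumes each sample is a mapping from virus name to a non-negative abundance (for example, RPKMF values); which abundance measure and log base the study used is not stated in this section.

```python
import math


def bray_curtis(u: dict[str, float], v: dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    viruses = u.keys() | v.keys()  # union of viruses seen in either sample
    numerator = sum(abs(u.get(k, 0.0) - v.get(k, 0.0)) for k in viruses)
    denominator = sum(u.get(k, 0.0) + v.get(k, 0.0) for k in viruses)
    return numerator / denominator if denominator > 0 else 0.0


def shannon(abundances: dict[str, float]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over relative abundances."""
    total = sum(abundances.values())
    if total <= 0:
        return 0.0
    return -sum((a / total) * math.log(a / total) for a in abundances.values() if a > 0)


# Hypothetical abundance profiles for two samples.
sample_a = {"virus1": 2.0, "virus2": 1.0}
sample_b = {"virus1": 1.0, "virus2": 1.0}
print(round(bray_curtis(sample_a, sample_b), 3), round(shannon(sample_b), 3))
```

Treating a virus missing from one sample as zero abundance is what lets Bray-Curtis reach 1 when two samples share no viruses.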
A handful of viruses had high or complete genome coverage across many wastewater samples and were therefore suitable for variant analysis. Although a single lineage seemed to dominate the read abundance for some virus strains, many samples contained a mix of two or more lineages. Allelic variants were therefore measured as the frequency of non-synonymous mutations relative to the reference genome, which accommodates mixed lineages within a sample. We focused on three examples.
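Classifying a substitution as non-synonymous means translating the reference codon and the mutated codon and comparing the amino acids. The sketch below assumes a forward-strand, in-frame coding sequence, single-nucleotide substitutions considered one at a time, and the standard genetic code; real pipelines must also handle reverse-strand genes, overlapping reading frames, and multiple variants within one codon. Allele frequency is taken as alternate-supporting reads over total depth at the position. The function names are illustrative.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in TCAG order
# (first position slowest), matched to one-letter amino acids; '*' is a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def is_nonsynonymous(cds: str, pos: int, alt: str) -> bool:
    """True if replacing the base at 0-based `pos` of an in-frame CDS with `alt`
    changes the encoded amino acid (including gains or losses of stop codons)."""
    cds, alt = cds.upper(), alt.upper()
    codon_start = pos - pos % 3
    ref_codon = cds[codon_start:codon_start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position falls in an incomplete trailing codon")
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    return CODON_TABLE[ref_codon] != CODON_TABLE[alt_codon]


def allele_frequency(alt_reads: int, depth: int) -> float:
    """Fraction of reads at a position that support the alternate base."""
    return alt_reads / depth if depth > 0 else 0.0


# Hypothetical 2-codon CDS: ATG (Met) GAA (Glu).
print(is_nonsynonymous("ATGGAA", 2, "A"))  # ATA = Ile, so True
print(is_nonsynonymous("ATGGAA", 5, "G"))  # GAG = Glu, so False
```

Tracking frequencies rather than a single consensus base per position is what allows a sample containing two lineages to show both sets of variants at intermediate frequencies.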
Astrovirus MLB1, which has a seroprevalence close to 100% in Americans²⁶, was recovered at high genome coverage in more samples than any other virus in our dataset. The variant landscape of Astrovirus MLB1 was largely dictated by the sample's city of origin (Fig. 4E), with gene-specific mutations in the capsid, ORF1a, and ORF1b showing strong regional localization in time (Fig. 4B, Fig. S6B). Human Adenovirus 41 is an enteric virus associated with diarrhea and, possibly, hepatitis²⁷ in children, and it was also common in wastewater. This virus splits into two major lineages (Fig. 4A, D, Fig. S6A), with the hypervariable capsid (hexon) gene showing high diversity²⁸. Although each lineage dominated some samples in both cities, one lineage was more common in each city. JC Polyomavirus, which is shed in urine, commonly establishes long, asymptomatic infections in a high proportion of the population²⁹. Consistent with non-acute, rarely transmitted infections, the variant landscape of this virus seems to lack meaningful spatiotemporal structure, and most samples appear to contain a diversity of lineages (Fig. 4C, F, Fig. S6C).
Fig. 4. A Genome map of Human Adenovirus 41 (middle) with non-synonymous variants displayed above (Houston, TX) and below (El Paso, TX) according to genome position (x-axis) and date (y-axis). B As in (A), for Astrovirus MLB1. C As in (A), for JC Polyomavirus. D t-SNE of non-synonymous variant frequencies of Human Adenovirus 41. E As in (D), for Astrovirus MLB1. F As in (D), for JC Polyomavirus.