Surveillance methods for avian influenza virus (AIV) based upon collecting and testing samples from individual wild birds have several significant limitations primarily related to the difficulties associated with obtaining samples. Because AIVs are shed in waterfowl feces, the use of environmental substrates where waterfowl feces accumulate may overcome some of these limitations. However, these substrates are difficult to analyze using traditional diagnostic techniques, such as virus culture and PCR, because of virus inactivation, RNA degradation, low concentration of target RNA, microbial complexity, presence of inhibitory substances, and other factors. We investigated the use of a genomics-based approach called targeted resequencing to detect and characterize AIVs in wetland sediments during the 2014–15 North American highly pathogenic avian influenza outbreak. We identified AIV in 20.6% (71/345) sediment samples obtained from wetlands (n=15) and outdoor waterbodies on AIV-infected poultry farms (n=10) in British Columbia, Canada (the first area affected during the outbreak). Thirteen hemagglutinin (HA) and nine neuraminidase (NA) subtypes were detected, including H5, N1, and N8 sequences that clustered with other sequences associated with the North American outbreak. Additionally, as many as eight HA and eight NA subtypes could be detected in a single sediment sample. This proof-of-concept study shows the potential utility of sediment sampling coupled with genomics-based analysis as a tool for AIV surveillance.
Avian influenza (AI) is a viral disease that can cause significant morbidity and mortality in domestic poultry. Certain strains of avian influenza virus (AIV) can also cause disease in people and wildlife (Stallknecht and Brown 2007; Lai et al. 2016). Wild waterfowl, particularly Anseriformes, are the natural reservoir for all AIV subtypes (Olsen et al. 2006; Stallknecht and Brown 2007). They are generally asymptomatic carriers of AIV and can spread viruses among geographic locations during migration (Olsen et al. 2006). Overlap between migratory routes facilitates virus reassortment and emergence of novel subtypes and strains (Olsen et al. 2006). Most AIV surveillance programs in wildlife are centered around the testing of individual wild birds. While these methods have advantages (e.g., the ability to obtain live virus and to link specific AIV subtypes to specific bird species), it can be very difficult to obtain a representative sample of animals (Deliberto et al. 2009).
Waterfowl shed AIV in their feces (Olsen et al. 2006; Stallknecht and Brown 2007); therefore, testing substrates containing fecal material could provide an alternative strategy for AIV surveillance (Deliberto et al. 2009). Sediment (i.e., organic and inorganic matter that settles to the bottom of a wetland) has a number of advantages compared to other environmental samples. It can be easier to obtain than feces and may contain AIVs from a number of individual birds encompassing a range of species. AIVs may be more likely to accumulate and persist in sediment than water, and sediment does not require concentrating techniques (Khalenkov et al. 2008; Zhang et al. 2014). Previous studies have demonstrated the presence of AIV in sediment (Lang et al. 2008); however, there are several impediments to sediment analysis using traditional diagnostic techniques (e.g., virus isolation and PCR). These include viral degradation, low concentration of target RNA, biotic complexity, and the presence of inhibitory substances (Lang et al. 2008). Also, given that novel, unexpected, and multiple AIV strains may be encountered in these samples, more robust methods must be employed to detect and differentiate all AIV subtypes (Lin et al. 2009).
The challenges associated with analyzing sediment samples may be overcome with recent advancements in genomics technology. Targeted resequencing (i.e., the use of probes to target specific regions of the viral genome, allowing those regions to be enriched and sequenced) is particularly promising for the detection and characterization of AIVs in environmental substrates. Compared to other genomics approaches, targeted resequencing increases the probability of detecting AIV RNA, even at very low concentrations within a complex matrix. It also allows for higher sequencing coverage for regions of interest and enables the characterization of a variety of AIV subtypes and strains (including novel strains) within a single sample.
The objective of this study was to determine whether targeted resequencing of wetland sediment samples could be used to detect and characterize AIVs during the 2014–15 highly pathogenic avian influenza (HPAI) outbreak in the Fraser Valley of British Columbia (49°3′0″N, 122°19′0″W), Canada. During this outbreak (December 2014 to March 2015), 12 poultry farms were infected with highly pathogenic avian influenza HPAI H5N2 (Pasick et al. 2015). The H5 segment originated from Eurasian HPAI H5N8, which was likely introduced into North America by migratory waterfowl and subsequently underwent reassortment with a North American N2 virus (Pasick et al. 2015). One farm was infected with HPAI H5N1 (a second reassortment between a virus carrying the Eurasian H5 segment and one carrying a North American N1 segment; Berhane et al. 2016).
MATERIALS AND METHODS
Sediment sample and data collection
Samples (n=300) were collected from 15 wetlands from 19 January to 13 February, 2015. Wetlands were selected based on accessibility and relative waterfowl abundance and diversity as determined by a local waterfowl biologist using data from eBird (eBird 2018), personal experience, and site visits. For each wetland, five sampling sites were selected based on accessibility and observed waterfowl activity. Within each sampling site, four 50 mL samples of superficial sediment were collected about 1–5 m from each other and about 1–2 m from the shoreline to target areas where dabbling waterfowl might defecate (Olsen et al. 2006). The average water depth at sampling was 10–30 cm. We used 50 mL conical centrifuge tubes to scoop sediment and sealed them immediately to prevent cross-contamination (Supplementary Material Fig. 1). Arm-length rubber gloves were used for sample collection. Gloves were disinfected between wetlands but not between samples. The majority of samples were collected without touching the sediment being sampled. We recorded which samples had to be pushed into the tube with a gloved finger.
For each wetland, data collected included wetland area (ha), distance (km) to the nearest infected poultry farm, and anthropogenic modification (highly modified, semimodified, unmodified). Anthropogenic modification was assessed by an observer at the time of sample collection and was based on a combination of public access to the wetland and modification of land adjacent to the waterfront (e.g., use of land for agricultural fields, residential housing, mowed grass lawns). Where available, the average abundances of 16 dabbling duck species observed on each wetland from 1 October 2014 to 28 February 2015 were obtained from eBird. The eBird Basic Dataset (eBird 2018) was filtered by species, time frame, and location using auk v0.3.0 (Strimas-Mackey et al. 2017) and tidyverse v1.2.1 (Wickham 2017) and then used to calculate the numerical species richness and Shannon-Wiener diversity index using vegan v2.5-3 (Oksanen et al. 2018) in R v3.5.1 (R Core Team 2018).
For each sampling site, presence or absence of waterfowl, waterfowl feces, and feathers was recorded, and one 200 mL water sample was collected and analyzed for total coliforms and Escherichia coli (colony-forming units, CFU/mL) by Colilert-24 (IDEXX, Westbrook, Maine, USA). Several sampling sites were transient water bodies that were grasslands or agricultural fields during the summer, which was also recorded (flooded field: yes or no). For each sample, data collected included the presence of reeds and aquatic plants (yes or no), the amount of leaves and roots (none, low, high), sediment type (fine, mud, gravel, rocks), water depth (≤10 cm, 11–30 cm, >30 cm), distance from shoreline (≤1 m, >1 m), and sterility of sampling (sample touched with a gloved finger: yes or no).
The Canadian Food Inspection Agency provided 45 sediment samples from outdoor water bodies that were, or could be, used by waterfowl on 10 of the 13 infected poultry farms (referred to as farm samples). There was a median of two sampling sites per farm (range=1–4) with two to four samples collected per site for a median of four samples per farm (range=2–8). No additional data were available for these samples.
Total RNA was extracted using the RNA PowerSoil™ Total RNA Isolation kit (MO BIO Laboratories, Carlsbad, California, USA). Briefly, 2.0 g of sediment was homogenized by vortexing in a tube containing silica carbide beads, kit buffers, and phenol:chloroform:isoamyl alcohol (pH 6.5–8.0; Sigma-Aldrich, St. Louis, Missouri, USA). After centrifugation, sample supernatants were mixed with equal volumes of chloroform and subjected to two consecutive chloroform extractions, followed by sample precipitation and resuspension in designated buffers. Samples were loaded into the flow column for final elution and precipitation. Sample pellets were resuspended in 30 µL of RNase-free water and stored at –80 C. We included RNA from the following influenza strains as positive controls: A/Victoria/3/1975(H3N2), A/Hong Kong/1968(H3N2), A/Denver/1957(H1N1), A/California/2009(H1N1), and A/chicken/BC/FAV25/2014(H5N2).
Targeted resequencing and bioinformatics analysis
The RNA samples were enriched and sequenced using the ONETest™ Influenza Envrioscreen (Fusion Genomics, Burnaby, British Columbia, Canada). Briefly, RNA samples were reversed transcribed into cDNA using random primers, and dual-indexed Illumina-compatible libraries were constructed. The libraries were enriched for HA and NA gene sequences using a liquid phase hybridization and target capture protocol with QUANTUMProbes™ (Fusion Genomics). The enriched libraries were pair-end sequenced (2×150 base pairs) on an Illumina NextSeq 550 platform (Illumina, San Diego, California, USA) according to manufacturer's instructions.
The quality of the raw sequencing reads was checked using FastQC (Andrews 2010) prior to removing the adaptor sequences and trimming low-quality regions with the Trimmomatic v0.36 tool using the following parameters: leading: 20, trailing: 20, slidingwindow: 4:20 (Bolger et al. 2014). Only reads >60 base pairs were retained for subsequent analyses using a custom bioinformatics pipeline (AIV_seeker; Supplementary Material Fig. 2). Briefly, putative AIV reads were identified using Diamond (v0.7.9.58, Buchfink et al. 2015) to perform alignments against a custom protein database curated from Influenza Research Database (Zhang et al. 2017) and GISAID (Shu and McCauley 2017) with an E-value of 1E-5. To detect chimeric reads, putative AIV reads were queried against a custom nucleotide database using the BLASTn (E-value 1E-20) in GenBank (Clark et al. 2016). Both the protein and nucleotide databases contain representative sequences from Influenza Research Database and GISAID that were selected by removing sequences that are more than 99% identical using VSEARCH (Rognes et al. 2016). The alignment to the top sequence match in the database was used to calculate the coverage for each read, and only reads with >75% coverage were kept. Subsequently, a customized, highly conservative procedure was used to remove reads potentially attributable to index hopping (i.e., the assignment of reads to the incorrect sample). The probability of identifying identical reads in multiple samples is low given that AIV is rare in environmental samples. However, due to PCR duplication artifacts produced during library construction, redundant sequences may be observed in the same sample. Based on this observation, we reasoned that the index-hopped reads can be identified in the form of redundant reads in multiple samples. This approach has the added benefit of removing possible carryover contamination between runs. Reads were clustered together using VSEARCH (sorted by abundance and target coverage >0.6), and an operational taxonomic unit table was generated based on the read distribution in different samples within each clustering group (Supplementary Material Fig. 3). The groups in which all samples contained unique reads were kept. For groups where reads were found in more than one sample, a ratio was used to identify the dominant source of the reads (≥10 reads and ratio ≥3). Reads found in samples other than the dominant source were considered to arise from index hopping and were eliminated. A cluster criterion of 0.6 was used to better account for artifacts from PCR, sequencing, and post-sequencing trimming, while still achieving a highly conservative filtering result. For remaining reads, a second round of alignment using BLASTn was performed to infer HA and NA subtype. BLAST hits that had a BLAST Score Ratio>0.4 and fell within the top 30% BLAST Score Ratio group were used to identify subtypes, but only reads with BLAST hits that were >90% concordant (to allow for curation errors in the public databases) to a reference subtype were assigned a final subtype. All sequence data were deposited into the National Center for Biotechnology Information BioProject database (PRJNA353856).
Phylogenetic analysis (Supplementary Material Fig. 5) was focused on the subtypes involved in the 2014–15 North American AIV outbreak (H5, N1, N2, N8). The final set of reads was searched against the custom nucleotide database described earlier using BLASTn (E-value 1e-20). For each read, the top hit sequence from the database was selected as the guide sequence for mapping that read. Where the same guide sequence was selected for multiple reads from the same sample, those reads were aligned to the guide sequence using MAFFT (v7.147b; Katoh et al. 2013). The multiple alignment results were visualized by MView (Brown et al. 1998), and a custom script (Duan 2019) was used to create a consensus sequence from each set of overlapping reads. Consensus sequences from each sample that were >200 base pairs were retained, and, together with selected reference sequences from GenBank, multiple sequence alignment was performed using MAFFT (Supplementary Material File 1). Single reads were excluded from further analysis. Reference sequences included those obtained from wild birds and poultry in North America and Eurasia in 2014–15 as well those from publications related to the 2014–15 outbreak (Pasick et al. 2015; Ramey et al. 2017; Supplementary Material Table 2). If an aligned base position contained a gap in >40% of the sequences, that position was removed using trimAI (Capella-Gutiérrez et al. 2009). Maximum likelihood phylogenetic trees were constructed for HA and NA subtypes using RAxML v8.2.12 (Stamatakis 2014) with 100 bootstrap replications.
Any sample in which ≥1 HA or NA subtype was detected was considered AIV-positive. The proportion of AIV-positive samples was calculated for each sampling site, and a Wilcoxon rank sum test was used to assess whether AIV positivity was distributed differently between farms and wetlands. For each AIV-positive sample, the average (mean) number of HA and NA subtypes and the average (mean) number of HA and NA reads were calculated. Wilcoxon rank sum tests were used to assess the distribution of these variables between farms and wetlands. The Bonferroni adjustment was used to establish a P value threshold of 0.0125 to account for multiple comparisons among AIV-positive samples. The proportion of AIV-positive samples containing each HA and NA subtype in each sampling site was calculated, and jackknifing was used to calculate sampling bias-corrected estimates of the relative frequency of each HA and NA subtype in farm and wetland samples (with 95% confidence intervals).
For wetland samples only, mixed effects logistic regression using lme4 (Bates et al. 2015) in R v3.5.0 (R Core Team 2018) was used to identify the most parsimonious set of the aforementioned wetland, sampling site, and sample variables that best predicted an AIV-positive sample while controlling for clustering by wetland and sampling site. Collinear variables were identified using a correlation matrix, and an automated stepwise selection procedure was used to identify a final model based on Akaike Information Criterion (Burnham and Anderson 2002).
A total of 1.4 billion high-quality reads were obtained with an average of 4.2 million reads per sample (Supplementary Material Table 1). At least one HA or NA subtype was identified by targeted resequencing in 20.6% (71/345) of samples (henceforth referred to as AIV-positive). This included 15.7% (47/300) of wetland samples and 53.3% (24/45) of farm samples. A median of 10.0% and 43.8% of samples were positive for each wetland and farm, respectively. Farm samples were significantly more likely to be AIV-positive compared to wetland samples (P=0.002). Farm samples had a median of one HA subtype, 47 nonredundant HA reads, 0.5 NA subtypes, and four nonredundant NA reads. Wetland samples had a median of one HA subtype, two nonredundant HA reads, one NA subtype, and five nonredundant NA reads. Farm samples were more likely to contain a greater number of HA subtypes compared to wetland samples (P=0.010) and marginally more likely to contain a greater number of HA reads (P=0.015). Farm and wetland samples were not significantly different with regard to the number of NA subtypes and reads (P=0.839 and 0.624, respectively).
A total of 13 HA and 9 NA subtypes were detected (Fig. 1 and Supplementary Material Fig. 4). Among the 71 AIV-positive samples, 18 (25.4%) contained more than one HA subtype and 18 (25.4%) contained more than one NA subtype, with a maximum of eight HA subtypes and eight NA subtypes in a single sample. Based on the 95% confidence intervals, there was no significant difference in subtype frequency between farms and wetlands except for H1 and H9, which were observed in farm but not wetland samples.
We detected H5 segments in 13 (3.8%) samples. Two of these samples (GL4c and ML5d) contained H5 sequences that clustered with the H5 HPAI outbreak clade 22.214.171.124 (Fig. 2). Further analysis showed that the sequences in ML5d contained the typical cleavage site for HPAI H5 clade 126.96.36.199 (PLRERRRKR/GLF). In addition to the HPAI H5, LPAI H5 was found on one farm (IP1) with sequences that clustered with other LPAI H5 strains identified in North America in 2014 (Fig. 2, top branch). Both the N2 and N1 sequences clustered with other sequences obtained from North America (Figs. 3, 4). However, while the N2 sequences did not cluster closely with any outbreak-associated sequences (i.e., any N1, N2, or N8 associated with an clade 188.8.131.52 H5 segment and obtained in 2014–15), four samples from one farm (IP1) contained N1 sequences that clustered closely with outbreak-associated sequences obtained from a British Columbia poultry farm and a wild duck in Washington, US (adjacent to British Columbia). Interestingly, the farm from which the sediment samples were obtained was infected with HPAI H5N2 and not H5N1. The N8 sequences clustered with those obtained from both North America and Europe (including outbreak-associated sequences), which is not surprising given that this subtype is believed to have been translocated from Eurasia to North America in migratory birds, and therefore similar sequences can be found in all three continents (Fig. 5).
In the final regression model, AIV-positive samples were more likely to be obtained from highly modified wetlands, from sampling sites where waterfowl feces were observed, and >1 m from shore (Table 1). Sampling within a flooded field, presence of waterfowl at the sampling site, presence of aquatic plants, and collection of the sample in a sterile manner were also included but were not statistically significant. The variance associated with the random effect of sampling site and wetland was 0.
Using targeted resequencing, we were able to identify AIVs in 20.6% of sediment samples, with a total of 13 HA and nine NA subtypes being detected. As many as eight HA or NA subtypes were identifed in a single sample, and, in some cases, multiple strains of a single subtype were detected (e.g., two distinct subgroups of N1 sequences were identified in IP1-1C; Fig. 4).
Two wetland samples contained H5 sequences that fell within HPAI H5 clade 184.108.40.206, and clustered with 2014/2015 HPAI H5 outbreak sequences obtained from infected poultry. This is likely an underestimate of the true prevalence of HPAI H5 in these samples because of the highly conservative bioinformatics pipeline that was used to avoid false positives. This pipeline included steps to aggressively address the issue of index hopping, or assignment of a read to an incorrect sample, which is a documented issue with the Illumina sequencing platform (Griffiths et al. 2018). The majority of index hopping in this study was associated with the positive controls, which contained a far greater concentration of AIV RNA compared to the sediment samples. These controls included a strain of 2014–15 HPAI H5N2; therefore filtering out misassigned reads associated with the positive control likely also resulted in the removal of true positive reads present within the sediment. Indeed, the lack of outbreak-associated N2 reads may have been a result of this filtering, whereas outbreak-associated N1 reads were more readily detected. In the future, target strains should not be included as positive controls.
Unfortunately, positive results are difficult to confirm using virus culture or PCR because sediments are complex samples where there may be no live virus and viral RNA is present at a very low concentration or is significantly degraded (Stallknecht et al. 2010; Deboosere et al. 2012; Horm et al. 2012). It is also difficult to establish strict cutoffs regarding the number of reads that constitute a positive identification, because reads are generally present in very low numbers, and even a single read of a significant strain could be an important finding. For this reason, care must be taken in interpreting and acting on genomics data, and it will be important to establish substrate- and target-specific bioinformatics standards in the future.
Sediment samples obtained from farms had a higher rate of positivity (53.3%) compared to wetland samples (15.7%). Given that only one virus subtype (H5N2 or H5N1) was identified in poultry on each farm, the diversity of AIV subtypes detected in farm sediment samples (Fig. 1) suggested that they originated from wild waterfowl (vs. poultry). However, we suggest that the tool should be primarily used for wetland-based surveillance as only farms that had experienced an AI outbreak were sampled; therefore, it may not be valid to extrapolate results to all poultry farms. Wetlands are more theoretically appealing as sentinel sites because they are locations where large and diverse waterfowl populations will more reliably congregate.
Wetland factors that increased the odds of obtaining an AIV-positive sediment sample included anthropogenic modification, the presence of waterfowl feces, and sampling >1 m from shore. Previous studies have shown that birds residing in modified wetlands tend to have higher site fidelity, leading to increased contamination of the environment with their excreta and associated pathogens (Murray et al. 2016). The presence of waterfowl feces on shore may be an indicator of greater fecal (and AIV) contamination of nearby superficial sediments; however, as in previous studies (Nazir et al. 2010), there was no association between E. coli or coliform counts (a measure of fecal contamination in the water column) and sediment positivity. This may be because AIV is degraded by environmental microbes (Henaux et al. 2012) or because AIV RNA remains in the sediment longer than it does in the overlying water column. Observation of waterfowl at the sampling site was not strongly linked to AIV positivity, potentially because this variable was the result of a single observation on the day of sampling. The eBird data could have provided a long-term metric; however, these data were available for only half of the wetlands included in the study. Therefore, the lack of an association between AIV positivity and waterfowl species richness and diversity may be a due to low statistical power. The association between distance from shore and the odds of obtaining an AI positive sediment sample is likely a result of the fact that sampling was conducted during the winter, when increased precipitation causes rapid wetland expansion; therefore, shallow sampling sites may be locations that were not submerged days or weeks earlier. In the future, these characteristics could be used to guide sediment sampling.
Sediment-based AIV surveillance has a number of advantages. Sediment samples are easier to obtain, have a higher rate of AIV positivity, and contain a greater diversity of AIV subtypes or both compared to wild bird and fecal samples (Kang et al. 2010; Shin et al. 2015; Bevins et al. 2016). Compared to water, AIVs in sediment are more concentrated and persistent (Khalenkov et al. 2008; Henaux et al. 2012). However, the microbial complexity of sediment, relatively low concentration of AIV, AIV inactivation or degradation, and presence of inhibitory substances can present significant impediments for traditional diagnostic approaches (Lang et al. 2008; Deboosere et al. 2012). Targeted resequencing can overcome many of these obstacles with the added benefit of being capable of identifying all AIVs within a sample. This is important because it is now recognized that AIV surveillance systems should aim for multitype detection rather than focusing on specific subtypes (Machalaba et al. 2015).
It is equally important to note the limitations of a sediment surveillance approach. It is not clear how AIV persistence in sediment is influenced by viral subtype and abiotic parameters such as pH, temperature, etc. (Brown et al. 2009; Stallknecht and Brown 2009). Additionally, data obtained from sediment cannot be used to determine either AIV prevalence in waterfowl or which species are infected, or to identify novel gene reassortments because it cannot be determined if HA and NA sequences originate from the same virion. That being said, from a regulatory standpoint, the implementation of appropriate monitoring and mitigation strategies is based on the detection and characterization of specific HA subtypes (World Organization for Animal Health 2016)—data that genomic analysis of sediment can provide.
Much work needs to be done to translate this tool from the present proof of concept to its implementation. The validity of the sediment surveillance approach is based on the assumption that AIV subtypes identified in sediment within a wetland are a representative and contemporaneous reflection of waterfowl using the watershed—a research question that we are currently addressing using a 2 yr longitudinal study. That study will also further examine the impact of environmental factors (e.g., temperature, pH, specific conductance, chemical composition of sediment, abundance and diversity of waterfowl using the wetlands) on sediment surveillance, and it will develop and refine field and laboratory methods (e.g., establishing sample size for surveillance).
We would like to thank Genome British Columbia and Genome Canada (E02AFV), the British Columbia Ministry of Agriculture, including the Ministry's Growing Forward 2 Program, the Sustainable Poultry Farming Group, and the Canadian Food Inspection Agency for funding this study. Computational infrastructure was supported by Compute Canada (computational infrastructure). Development of the ONETest™ Influenza Enviroscreen was supported by Simon Fraser University VentureLabs and the National Research Council's Industrial Research Assistance Program.
Supplementary material for this article is online at http://dx.doi.org/10.7589/2019-05-135.