Phenotypic as well as molecular subtyping methods have been key tools in food safety and have played important roles in foodborne disease outbreak detection, identification of the pathogen sources responsible for food contamination through the food chain and source attribution. Traditional phenotypic subtyping methods most prominently include serotyping, phage typing and biotyping. The development of molecular, nucleic acid-based subtyping methods has revolutionized the field of subtyping; molecular subtyping methods used for foodborne pathogens can be divided into banding pattern-based methods [e.g., pulsed-field gel electrophoresis (PFGE), ribotyping and repetitive extragenic palindromic sequence (REP)-PCR] and sequence-based methods [e.g., multilocus sequence typing (MLST) and multiple-locus variable-number tandem repeat analysis (MLVA)]. Importantly, many molecular subtyping methods allow for more sensitive discrimination than traditional phenotypic methods (e.g., a single Salmonella serotype may be differentiated into 20+ PFGE types). These methods also typically allow for more reproducible subtyping than traditional methods.[1]

A turning point for the use of molecular subtyping in bacterial foodborne disease surveillance was the establishment of PulseNet in the U.S. in 1996. This network initially focused on subtype characterization of Escherichia coli O157:H7 but was subsequently expanded to other pathogens.[2] The system has also expanded internationally as “PulseNet International.” Key innovations of PulseNet include the development and implementation of a highly standardized subtyping method for bacterial pathogens, based on PFGE separation of whole-genome restriction digests, as well as rapid Web-based exchange of the resulting PFGE patterns. This approach has provided tremendous improvements in the ability to detect temporally and spatially distributed foodborne disease outbreaks.

While the tremendous food safety impact of PulseNet and other molecular subtyping methods is well recognized, there is no doubt that the rapidly emerging use of whole-genome sequencing (WGS) for foodborne pathogen subtyping will provide another major improvement in our ability to detect foodborne disease outbreaks and define pathogen sources throughout the food chain. Importantly, even though WGS provides for virtually complete characterization of bacterial isolates and maximum resolution for DNA-based characterization, data interpretation can and will still be challenging, particularly if one aims to establish whether two isolates that are genetically identical (or have only one or a few genetic differences) share a recent enough common ancestor to establish a cause-and-effect-type relationship. To illustrate, WGS of Listeria monocytogenes isolates obtained 12 years apart, but from foods produced in a single facility (as well as associated human cases), indicated that an L. monocytogenes strain persisted in this plant for 12 years without any detectable genetic changes in the core genome.[3] This suggests that L. monocytogenes transfer from one location (e.g., a farm) to at least one other location (e.g., a processing plant or retail environment) may lead to situations where isolates from different potential outbreak sources show virtually identical genomes, which may complicate trace-back. This also illustrates the need for good epidemiological data to facilitate appropriate interpretation of WGS data.

As the WGS revolution in food safety has started to gain momentum, it is essential for everyone involved in food safety to understand both the basics of this technology as well as its already existing and future applications and uses. While this article will provide an introduction to the application of WGS in food safety, this field is constantly changing and new technologies are rapidly being developed and improved. It is therefore essential for food safety professionals to ensure that they continue to stay informed on advances in this field, which will have significant impact in food safety and beyond (e.g., food spoilage, food authenticity and fraud detection).

Whole-Genome Sequencing: The Basics
While traditional sequencing methods have been used to sequence the complete genomes of bacteria, these methods are too time consuming and expensive to allow for routine use of bacterial WGS as part of surveillance systems or for bacterial characterization and subtyping. As described in detail in a number of review articles,[4,5] the development and commercial introduction of new rapid-sequencing methods (often referred to as “next-generation sequencing” methods) have made it possible to perform routine WGS of bacterial isolates at costs and turnaround times that make these tools competitive with more traditional molecular subtyping methods. Development of these new genome-sequencing methods was initially driven by the desire to develop tools for sequencing of a complete human genome for less than $1,000. As bacterial genomes are roughly 1,000 times smaller than the human genome (the human genome contains about 3 billion base pairs, while the L. monocytogenes genome contains almost exactly 3 million base pairs), it is easy to see how development of tools to sequence a human genome for less than $1,000 will also yield tools that facilitate affordable bacterial genome sequencing. There are several commercially available platforms for bacterial genome sequencing that allow one to complete the actual sequencing of a bacterial genome for less than $50/isolate and with turnaround times, starting from a single bacterial colony, of fewer than 5 days. Typically, to achieve sequencing at costs under $50/isolate, a considerable number of isolates must be sequenced at the same time on a given instrument to achieve maximum economy of scale. WGS of a single or a few isolates typically is an order of magnitude more expensive. This is important, as it means that in-house sequencing, for example, by a food company or food testing lab, will only be cost effective if large numbers of isolates are sequenced at the same time. 
Practically, this may mean longer turnaround times, as labs that receive few isolates may need to batch them into a single run and therefore may wait until they have accumulated enough isolates for WGS to be cost effective. With the current status of WGS, in-house sequencing capabilities are likely to be cost effective only if large numbers of isolates are being sequenced, or if sequencing equipment is used for multiple applications (e.g., WGS of pathogens, starter cultures and spoilage organisms, and metagenomic sequencing). On the other hand, public health laboratories involved in foodborne disease surveillance typically will receive enough isolates to make WGS cost effective, particularly since WGS for different pathogens can be performed in the same run (unlike methods like PFGE, where different gels may be needed for different pathogens).
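The batching economics described above can be sketched with a toy cost model (the run and library-prep cost figures below are invented assumptions for illustration, not actual vendor pricing):

```python
# Illustrative back-of-the-envelope model of WGS batching economics.
# All dollar figures are hypothetical assumptions, not vendor pricing.

def cost_per_isolate(n_isolates, run_cost=2000.0, library_prep=25.0):
    """Per-isolate cost when n_isolates share one sequencing run.

    run_cost: assumed fixed cost of one instrument run (flow cell,
    reagents, labor); library_prep: assumed per-isolate prep cost.
    """
    if n_isolates < 1:
        raise ValueError("need at least one isolate")
    return run_cost / n_isolates + library_prep

# The fixed run cost is amortized across the batch, so per-isolate
# cost drops roughly an order of magnitude from 1 to 100 isolates.
for n in (1, 10, 50, 100):
    print(f"{n:>3} isolates -> ${cost_per_isolate(n):,.2f} per isolate")
```

Under these assumed numbers, a single isolate costs over $2,000 to sequence, while a full batch of 100 drops below the $50/isolate figure cited above, which is why batching (and the associated waiting) matters for smaller labs.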

Advantages of WGS over Other Molecular Subtyping Methods
While PFGE and other molecular methods have had tremendous positive impacts on food safety, these methods have shortcomings and challenges that can, and will, be overcome by WGS. For example, PFGE and other methods have shown limited discriminatory ability for some highly clonal pathogen populations, such as specific Salmonella serovars (e.g., Enteritidis[6] and Montevideo[7]). As WGS provides significantly improved subtype discrimination and can distinguish isolates that share identical PFGE types, it improves outbreak detection. For example, more than 50 percent of Salmonella Enteritidis isolates show identical PFGE types, but WGS can further differentiate isolates that share this common PFGE type and thus identify outbreaks that would not be detected by PFGE alone or even by a combination of PFGE and MLVA.[6] PFGE also sometimes yields different patterns for isolates that are closely related. This occurs because a large part of a bacterial genome can change rapidly through acquisition or loss of plasmids or chromosome-integrated prophages; typically, these types of changes yield isolates that differ by three or fewer bands in their PFGE patterns with a given enzyme. This can cause practical challenges, for example, when pathogen isolates from human patients and from a food epidemiologically implicated as an outbreak source differ by one to three PFGE bands; such findings complicate high-confidence assignment of an outbreak source. WGS, on the other hand, can easily and rapidly determine whether isolates that differ by specific genetic elements, or by a few PFGE bands, are otherwise genetically closely related, as shown by the use of WGS to clarify the genetic relatedness of isolates in a large listeriosis outbreak that occurred in Canada in 2008.[8] Overall, WGS allows for greatly improved discriminatory power as well as characterization of the evolutionary relatedness of isolates, which is not possible with PFGE.
In addition, WGS has technical advantages over PFGE and many other subtyping methods such as the potential for a higher level of automation, a simpler integrated work flow, reduced time of analysis and generation of highly standardized and compatible data even with different sequencing platforms.
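To illustrate the kind of comparison WGS enables, the sketch below counts single-nucleotide differences between two aligned core-genome sequences. The sequences are invented toy data; real analyses map millions of sequencing reads against a reference genome before calling variants, so this is a conceptual illustration only:

```python
# Minimal sketch of SNP-distance comparison between aligned core-genome
# sequences (toy data; real pipelines align reads to a reference first).

def snp_distance(seq_a, seq_b):
    """Count positions that differ between two equal-length aligned
    sequences, ignoring ambiguous bases ('N')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(1 for a, b in zip(seq_a, seq_b)
               if a != b and a != "N" and b != "N")

outbreak_isolate = "ATGGCTAGCTAGGTACCTAG"
food_isolate     = "ATGGCTAGCTAGGTACCTAG"   # identical core genome
unrelated        = "ATGACTAGCTCGGTACTTAG"   # several SNPs away

print(snp_distance(outbreak_isolate, food_isolate))  # 0
print(snp_distance(outbreak_isolate, unrelated))     # 3
```

A distance of zero (or a handful of SNPs) is consistent with a recent common ancestor, whereas many SNPs argue against a direct link; as noted above, interpreting small distances still requires epidemiological context.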

While rapid analysis of WGS data remains somewhat of a challenge, and may in some situations represent a bottleneck, easy-to-use, high-throughput bioinformatics tools for bacterial WGS data have been developed and are rapidly being improved. Currently, reliable WGS data analysis still requires a trained bioinformatician to select, properly run and maintain the necessary pipeline of different analysis tools. Typical bioinformatics pipelines can now provide an initial WGS-based classification of isolates in less than 1 hour after raw data are downloaded from the sequencing hardware. Alternative approaches, such as the whole-genome MLST approach currently used by the U.S. Centers for Disease Control and Prevention (CDC), further simplify analyses and allow for initial data analyses in a matter of minutes (e.g., 5 minutes, as communicated in a CDC presentation[9]). These initial WGS data analyses do not, however, provide detailed genomic information, such as identification of specific genes, prophages or plasmids; more detailed and lengthier analyses are required to gain this type of additional information, which can provide valuable data on the genomic content of isolates, such as the presence of novel antibiotic resistance or virulence genes. Here, too, rapid tools to extract and identify specific genomic elements and genes are being developed,[10] for example, for antibiotic-resistance genes.[11]
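The whole-genome MLST idea mentioned above can be illustrated with a simplified allele-calling sketch. The locus names, sequences and hashing scheme below are invented for illustration; real wgMLST schemes rely on curated allele databases rather than ad hoc hashes:

```python
# Hedged sketch of the allele-calling idea behind whole-genome MLST
# (wgMLST): each locus sequence is mapped to an allele identifier
# (here via hashing), and isolates are compared by counting loci whose
# alleles differ. Real schemes use curated allele databases; this is
# a simplified illustration with invented locus names and sequences.
import hashlib

def allele_id(sequence):
    """Derive a stable allele identifier from a locus sequence."""
    return hashlib.sha256(sequence.encode()).hexdigest()[:8]

def allele_profile(loci):
    """Map {locus_name: sequence} to {locus_name: allele_id}."""
    return {locus: allele_id(seq) for locus, seq in loci.items()}

def allele_distance(profile_a, profile_b):
    """Count shared loci whose allele identifiers differ."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in shared if profile_a[locus] != profile_b[locus])

isolate_1 = allele_profile({"lmo0001": "ATGAAC", "lmo0002": "GGCTTA"})
isolate_2 = allele_profile({"lmo0001": "ATGAAC", "lmo0002": "GGCTTG"})
print(allele_distance(isolate_1, isolate_2))  # prints 1
```

Because each locus is reduced to an identifier, comparing two isolates is a fast profile lookup rather than a fresh sequence alignment, which is one reason allele-based approaches can return an initial answer in minutes.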

Use of WGS for Foodborne Disease Surveillance
With its advantages and rapidly decreasing costs, WGS has been integrated into routine foodborne disease surveillance. While retrospective studies on the use of WGS for foodborne disease surveillance have been conducted since about 2010, routine use of WGS was initiated by the CDC in 2013. Specifically, all L. monocytogenes isolates obtained from human disease cases in the U.S. have been characterized by WGS since fall of 2013.[12] Capabilities to perform WGS for this type of surveillance exist at some state public health laboratories as well as at CDC; the impact of WGS implementation has been seen in the detection of a number of smaller listeriosis outbreaks,[13] at least some of which would likely have gone undetected with sole use of traditional subtyping methods such as PFGE. As WGS is implemented beyond the U.S., it will also facilitate detection of multi-country outbreaks, as supported by an exchange of L. monocytogenes genome sequences between CDC and Canadian investigators, which showed a perfect match between the genome sequences of a lettuce isolate obtained in Canada and a human isolate obtained in Ohio.[14] While L. monocytogenes is a highly suitable model for initial implementation of WGS-based foodborne disease surveillance, due both to a relatively small number of human isolates per year and to its relatively small and easy-to-sequence genome, WGS also is increasingly used by public health and regulatory agencies to characterize other foodborne pathogens, in particular Salmonella. Importantly, U.S. government laboratories are moving toward open, real-time release of isolate WGS data through GenomeTrakr, even though associated metadata are still embargoed for a time. This will facilitate improved utilization of WGS data generated for foodborne disease surveillance by groups other than public health laboratories.

Use of WGS for Source Trace-Back
In addition to routine WGS of human clinical isolates, routine WGS characterization of foodborne pathogen isolates obtained from food and environmental samples collected by regulatory agencies is increasingly common and has been spearheaded by the U.S. Food and Drug Administration (FDA). At this point, it is probably appropriate for the food industry to assume that any isolate obtained from a food or environmental sample collected by FDA undergoes characterization by WGS with subsequent comparison of the genome sequence to available human clinical isolates. This approach has started to lead to the identification of human cases and outbreaks likely linked to a contaminated food. For example, in 2014, genome sequences of L. monocytogenes isolated from recalled Hispanic-style cheese produced by Oasis Brands Inc. were found to be highly related to sequences of L. monocytogenes isolated from five ill people, one each in Georgia, New York, and Texas, and two in Tennessee; all of these individuals reported consuming Hispanic-style soft cheese, suggesting that these illnesses could have been related to products from Oasis Brands.[15] Importantly, however, WGS is not a magic bullet that allows for accurate and reliable identification of outbreaks and outbreak sources in the absence of appropriate food consumption history and epidemiological data. Continued investment in epidemiological data collection and analysis capabilities is critical to take full advantage of WGS-based subtyping data for foodborne pathogens.

The Future of WGS in Food Safety
The use of WGS-based characterization of foodborne pathogens by both public health and regulatory agencies will likely expand very quickly and may replace PFGE in the not-too-distant future. The technologies for WGS will also continue to develop and become increasingly simple, with a highly streamlined work flow that will facilitate more widespread application of these tools. With the rapid development of genome-sequencing technologies, food safety applications of sequencing beyond WGS will also rapidly grow. For example, metagenomic applications may have a major impact on food safety, particularly since these tools will allow for detection and identification of nonculturable and previously unknown pathogens, including bacteria, viruses and parasites, in both food specimens and clinical samples. With estimates that around 80 percent of foodborne disease cases in the U.S. are caused by unspecified agents, including known agents not yet recognized as causing foodborne illness, substances known to be in food but of unproven pathogenicity, and unknown agents,[16] these tools likely will reveal the identity of some of these agents, which will provide opportunities to further reduce foodborne illnesses. Analysis of short-read metagenomic data may not always provide accurate identification of the bacteria present, though, and may yield potentially misleading results. For example, short DNA pieces from a nonpathogen could be misidentified as representing pathogen DNA;[17] some of these issues will likely be overcome with new platforms that sequence larger DNA fragments. Industry adoption of WGS and metagenomic approaches for the detection and characterization of foodborne pathogens and disease agents may be slow and hampered by liability concerns. In addition to the potential for misidentification, metagenomics-based approaches may detect and sequence DNA from dead organisms, which are expected in any foods that undergo kill steps such as heat treatment.
This may lead to false positives and associated misleading results when DNA from dead pathogens is detected in a properly processed, safe product. A key challenge will be to create a regulatory environment that facilitates broad industry use of WGS, which will help ensure widespread application of these tools and consequently improve food safety, for example, through improved trace-back to contamination sources. In the future, integration of WGS and other genomics-based tools with other large datasets (big data) will likely drive a big data paradigm shift in food safety, which has the potential for even larger food safety improvements.[18]
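The short-read ambiguity problem noted above can be illustrated with a toy exact-match classifier. The genomes and reads below are invented for illustration; real metagenomic classifiers use k-mer or alignment-based matching against large reference databases, but the underlying ambiguity is the same:

```python
# Toy illustration of why very short metagenomic reads can be
# ambiguous: a short fragment matches both a pathogen and a harmless
# relative, while a longer fragment resolves the ambiguity.
# Genomes and reads are invented for illustration only.

genomes = {
    "pathogen":    "ATGGCTAGCTAGGTACCGATCGTTAACG",
    "nonpathogen": "TTGGCTAGCTAGGTACCGAACCTTGGCA",
}

def classify(read):
    """Return the names of all genomes containing the read exactly."""
    return [name for name, seq in genomes.items() if read in seq]

print(classify("GCTAGCTAGG"))         # short read: matches both genomes
print(classify("GCTAGCTAGGTACCGAT"))  # longer read: matches only the pathogen
```

The short read is consistent with both genomes, so calling it "pathogen" would be a misidentification; the longer read spans a region that differs between the two, which is why platforms producing longer fragments should mitigate some of these issues.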

Martin Wiedmann, Ph.D., D.V.M., is a professor in the department of food science at Cornell University and a member of the Cornell Institute for Food Systems.
 
References
1. Wiedmann, M. 2002. Subtyping technologies for bacterial foodborne pathogens. Nutr Rev 60:201–208.
2. Swaminathan, B et al. 2001. PulseNet: The molecular subtyping network for foodborne bacterial disease surveillance, United States. Emerg Infect Dis 7:382–389.
3. Orsi, RH et al. 2008. Short-term genome evolution of Listeria monocytogenes in a non-controlled environment. BMC Genomics 9:539.
4. Wiedmann, M et al. 2011. Next-generation sequencing methods revolutionize food microbiology. Food Technol 65(6):62–73.
5. Bergholz, TM et al. 2014. ‘Omics’ approaches in food safety: Fulfilling the promise? Trends Microbiol 22:275–281.
6. den Bakker, HC et al. 2014. Rapid whole-genome sequencing for surveillance of Salmonella enterica serovar Enteritidis. Emerg Infect Dis 20:1306–1314.
7. den Bakker, HC et al. 2011. A whole genome SNP based approach to trace and identify outbreaks linked to a common Salmonella enterica subsp. enterica serovar Montevideo pulsed field gel electrophoresis type. Appl Environ Microbiol 77:8648–8655.
8. Gilmour, MW et al. 2010. High-throughput genome sequencing of two Listeria monocytogenes clinical isolates during a large foodborne outbreak. BMC Genomics 11:120.
9. www.efsa.europa.eu/en/events/documents/140616-p06.pdf.
10. Inouye, M et al. 2014. SRST2: Rapid genomic surveillance for public health and hospital microbiology labs. Genome Med 6.
11. McArthur, AG et al. 2013. The comprehensive antibiotic resistance database. Antimicrob Agents Chemother 57:3348–3357.
12. http://aphltech.org/2013/11/25/cdc-begins-real-time-whole-genome-sequencing-of-listeria-monocytogenes/.
13. www.cdc.gov/listeria/outbreaks/cheese-02-14/.
14. www.aphl.org/aphlprograms/food/pulsenet/PulseNetTraining/Listeria_state_call_6-12-14.pdf.
15. www.cdc.gov/listeria/outbreaks/cheese-10-14/index.html.
16. Scallan, E et al. 2011. Foodborne illness acquired in the United States—unspecified agents. Emerg Infect Dis 17:16–22.
17. nickloman.github.io/2015/02/11/metagenomics-best-hit-analysis-caveat-emptor/.
18. Strawn, LK et al. 2015. Big data in food safety and quality. Food Technol 69(2):42–49.
