New Software From University of Georgia Improves Accuracy of DNA Sequence Analysis

Researchers from the University of Georgia’s Center for Food Safety have developed software that marks an important step toward more accurate DNA sequence analysis when testing for microbial contamination.

Sepia, a cutting-edge read classifier written by Henk den Bakker, an assistant professor in the College of Agricultural and Environmental Sciences, is available as open-source software. It is expected to make the analysis of genome sequencing data much faster for researchers studying bacteria.

Bacterial chromosomes typically range from about 1.5 million to roughly 9.5 million base pairs in length. However, when researchers want to “read” the individual bases of a genome (the genome sequencing process), modern technology requires them to do it in pieces of 150 to 10,000 base pairs. These pieces are called “reads.”

When researchers want to determine which microorganisms and viruses are present in a sample, such as a nasal swab, and sequence the DNA of those organisms, they use a tool called a “read classifier” to quickly sort through the reads and determine which microorganisms they most likely belong to.

Like other read classifiers, den Bakker’s new software works by cross-referencing the information from the sample against existing databases. It is designed, however, to address challenges posed by potential errors in the taxonomic information available for some microorganisms, or by a switch to an entirely new taxonomic system.
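
Read classifiers differ in their internals, and Sepia’s own algorithm is described in its Journal of Open Source Software paper rather than here. As a rough illustration of the cross-referencing step, the Python sketch below uses exact k-mer matching, the general approach popularized by tools such as Kraken: every substring of length k in each reference genome is indexed, and a read is assigned to the reference that shares the most k-mers with it. The genome names, toy sequences, k = 4, and the majority-vote rule are all hypothetical choices for readability; real classifiers use k-mers roughly 30 bases long and databases of complete genomes.

```python
from collections import Counter
from typing import Optional


def kmers(seq: str, k: int):
    """Yield every overlapping substring of length k (a k-mer) in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of reference taxa whose genomes contain it."""
    index: dict[str, set[str]] = {}
    for taxon, genome in references.items():
        for kmer in kmers(genome, k):
            index.setdefault(kmer, set()).add(taxon)
    return index


def classify_read(read: str, index: dict[str, set[str]], k: int) -> Optional[str]:
    """Return the taxon sharing the most k-mers with the read, or None if none match."""
    votes: Counter = Counter()
    for kmer in kmers(read, k):
        for taxon in index.get(kmer, ()):
            votes[taxon] += 1
    if not votes:
        return None  # unclassified: nothing in the database resembles this read
    return votes.most_common(1)[0][0]


# Hypothetical toy references; a real database holds complete genomes.
references = {"genome_A": "ACGTACGTTGCA", "genome_B": "TTGACCGATGAC"}
K = 4
index = build_index(references, K)
print(classify_read("ACGTTGCA", index, K))  # genome_A
print(classify_read("GACCGATG", index, K))  # genome_B
print(classify_read("CCCCCCCC", index, K))  # None
```

A read whose k-mers match nothing in the database comes back unclassified, which is exactly the situation where the quality and coverage of the reference database matter most.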

Because bacteria are mostly single-celled microorganisms with few distinguishing physical features, they are more difficult to classify than more complex organisms such as mammals or reptiles. Researchers have only recently begun using DNA to determine the taxonomy of microorganisms. As a result, the taxonomies in some databases that read classifiers reference do not always agree with what DNA similarities show.

“Only recently, in the last decade, we began sequencing these organisms and using the genetic data to build taxonomies. That’s very important because when we know things are genetically similar, a read classifier can use that information to make predictions,” den Bakker said.

Using these predictions, when the read classifier encounters an organism that is missing from the database, it can help researchers determine which known microorganisms the unidentified organism is most closely related to by comparing its genetic material with theirs, he said.
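
One common way a classifier can place an unfamiliar organism is the lowest common ancestor (LCA) rule used by Kraken-style tools: when a read’s k-mers match several known taxa, the read is assigned to the deepest node of the taxonomy that all of those taxa share. A read from a strain missing from the database may match k-mers from several of its relatives and so land on their shared genus. The Python sketch below assumes the taxonomy is stored as a simple child-to-parent dictionary; the small tree and the function names are hypothetical, and this is not presented as Sepia’s implementation.

```python
def lineage(taxon: str, parent: dict[str, str]) -> list[str]:
    """Return the path from taxon up to the root, e.g. [species, genus, family, root]."""
    path = [taxon]
    while taxon in parent:  # assumes the parent links form a tree (no cycles)
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lowest_common_ancestor(taxa: set[str], parent: dict[str, str]) -> str:
    """Return the deepest taxonomy node shared by the lineages of all given taxa."""
    if not taxa:
        raise ValueError("need at least one taxon")
    ordered = sorted(taxa)  # any order works; sorting just makes it deterministic
    shared = set(lineage(ordered[0], parent))
    for taxon in ordered[1:]:
        shared &= set(lineage(taxon, parent))
    # Walking up from the first taxon, the first shared node is the deepest one.
    for node in lineage(ordered[0], parent):
        if node in shared:
            return node
    raise ValueError("taxa do not share a common root")


# Hypothetical, heavily reduced taxonomy stored as child -> parent links.
parent = {
    "Salmonella enterica": "Salmonella",
    "Salmonella bongori": "Salmonella",
    "Escherichia coli": "Escherichia",
    "Salmonella": "Enterobacteriaceae",
    "Escherichia": "Enterobacteriaceae",
    "Enterobacteriaceae": "root",
}
print(lowest_common_ancestor({"Salmonella enterica", "Salmonella bongori"}, parent))  # Salmonella
print(lowest_common_ancestor({"Salmonella enterica", "Escherichia coli"}, parent))  # Enterobacteriaceae
```

Because the answer depends entirely on the `parent` links, a species filed under the wrong genus would pull every LCA computed through it toward the wrong node; correcting that single entry is enough to fix the downstream assignments.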

When writing the software, den Bakker deliberately made it simple for end users to edit and correct the taxonomy used in its databases. Given the software’s wide range of applications, he focused on making it user-friendly, so that researchers who find an error in a database’s taxonomy can fix it easily.

To test the software, den Bakker enlisted the help of Lee Katz, a bioinformatician with the U.S. Centers for Disease Control and Prevention (CDC) and an adjunct faculty member with the UGA Center for Food Safety. Katz tested how well the software detects genome contamination, a quality check in which researchers confirm that they have sequenced only the organism they are interested in and not a mixture of organisms. Based on his findings, Katz has recommended it to CDC colleagues for metagenomics analysis.

Den Bakker anticipates that the current version of the software will serve as a base onto which he will build additional features. One planned feature is designed to help protect patient confidentiality by removing human DNA from sequencing results, allowing researchers to share their results while complying with health information privacy laws.
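
Removing human DNA before sharing data is often called host read removal or host depletion. How den Bakker’s planned feature will work is not described here; the Python sketch below shows one simple, assumed approach: discard any read in which at least a chosen fraction of its k-mers also occur in a human reference sequence. The threshold `min_fraction`, the function names, and the toy “human” sequence are hypothetical placeholders.

```python
def kmer_set(seq: str, k: int) -> set[str]:
    """Return the set of all overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def remove_host_reads(reads: list[str], host_kmers: set[str], k: int,
                      min_fraction: float = 0.5) -> list[str]:
    """Keep only reads in which fewer than min_fraction of k-mers match the host reference."""
    kept = []
    for read in reads:
        n_kmers = len(read) - k + 1
        if n_kmers <= 0:
            kept.append(read)  # too short to judge; a stricter pipeline might drop it instead
            continue
        hits = sum(1 for i in range(n_kmers) if read[i:i + k] in host_kmers)
        if hits / n_kmers < min_fraction:
            kept.append(read)
    return kept


K = 4
human_reference = "GGGCCCAAATTT"  # hypothetical stand-in for a human reference genome
host = kmer_set(human_reference, K)
reads = ["GCCCAAAT", "ACGTACGT"]
print(remove_host_reads(reads, host, K))  # ['ACGTACGT']
```

Raising `min_fraction` keeps more reads but lets more human sequence through; for privacy purposes a conservative (low) threshold is usually the safer direction.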

“For me, writing software is also exploring new data structures on a data science level—how to make these things more efficient. Writing it is more or less like starting an experiment in the lab,” den Bakker said.

The software is available now and is free to download on GitHub. More information on Sepia can be found in The Journal of Open Source Software.