Milk safety advances: Scientists untangle microbial challenges with AI and metagenomics
Scientists in the US have combined DNA sequencing and AI to detect anomalies in milk production, such as contamination or unauthorized additives. The “proof of concept” study could enhance dairy safety measures and has wider implications for the food industry, according to researchers from Penn State, Cornell University and IBM Research.
According to studies, milk is the food second most susceptible to counterfeiting after olive oil. Adulteration increases milk volume but significantly lowers its quality. Moreover, counterfeiting methods have become more complex, which experts say calls for more sophisticated detection techniques.
The team used an “untargeted approach” to identify antibiotic-treated milk that had been experimentally added, at random, to the bulk milk samples they collected.
“We can look at the data from the microbes in the raw milk and using artificial intelligence, see if the microbes that are present reveal characteristics such as whether it is pre-pasteurization, post-pasteurization, or is from a cow that has been treated with antibiotics,” says study lead Erika Ganda, assistant professor of food animal microbiomes at Penn State College of Agricultural Sciences.
The scientists selected milk as the study’s model because raw milk is the sole ingredient used to produce fluid milk, a high-volume food that raises considerable concern about fraud, particularly in developing countries, explains Ganda.

The findings suggest that AI has the potential to significantly enhance the detection of anomalies in food production, providing a more comprehensive method that can be added to scientists’ toolkits for ensuring food safety.
Enhancing traditional analysis with AI
The team also applied their “explainable AI tool” to publicly available sequencing datasets from bulk milk samples to demonstrate the method’s robustness and corroborate their findings. The study is published in mSystems, a journal of the American Society for Microbiology.
The researchers collected 58 bulk tank milk samples and applied various AI algorithms to differentiate between reference samples and those representing potential anomalies, such as milk from an outside farm or milk containing antibiotics.
According to the researchers, the study characterized raw milk metagenomes (collections of genomes from the many individual microbes within a sample) at greater sequencing depth than any previously published work, and showed that a consensus set of microbes remains stable across samples.
“Traditional analysis of microbial sequencing data, such as alpha and beta diversity metrics and clustering, were not as effective in differentiating between baseline and anomalous samples,” notes Ganda.
“However, the integration of AI allowed for accurate classification and identification of microbial drivers associated with anomalies.”
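For readers unfamiliar with the “traditional” metrics Ganda mentions: alpha diversity summarizes how diverse the microbes within a single sample are, while beta diversity measures how different two samples are in composition. The sketch below assumes two common choices, the Shannon index for alpha diversity and Bray-Curtis dissimilarity for beta diversity; the article does not say which specific metrics the study used, and the read counts are invented for illustration.

```python
import math

def shannon_index(counts):
    """Alpha diversity: Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(a, b):
    """Beta diversity: Bray-Curtis dissimilarity between two count vectors
    over the same taxa (0 = identical composition, 1 = no taxa shared)."""
    if len(a) != len(b):
        raise ValueError("samples must cover the same taxa")
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))

# Hypothetical read counts for the same four taxa in two bulk milk samples.
baseline = [50, 30, 15, 5]
suspect = [20, 30, 40, 10]

print(round(shannon_index(baseline), 3))
print(round(bray_curtis(baseline, suspect), 3))
```

Both functions take raw counts for simplicity; in practice, metagenomic counts are usually normalized before such comparisons, a step omitted here.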
Additionally, microbial systems and the food supply chain are well suited to AI because the interactions between microbes are complex and dynamic, underscores Kristen Beck, senior research scientist at IBM Research and the study’s first author.
“There are also a multitude of variables in the food supply chain that affect the signal we’re seeking to observe. AI can help us untangle the signal from the noise.”
Targeting food fraud
Issues in food quality and safety can have rippling effects through the supply chain, leading to health and economic damage, explains Ganda. She believes there is substantial interest in applying both targeted and untargeted methods to identify ingredients or food products at increased risk of fraud or of quality and safety issues.
Moreover, when key ingredients become less readily available and the industry faces pressure to limit retail price increases, the global supply chain becomes more susceptible to food fraud and adulteration, Kerry previously told Food Ingredients First.
“Untargeted methods characterize all molecules that can be identified to identify ingredients or products that deviate from a ‘baseline state’ that would be considered normal or under control,” says Ganda.
“Importantly, these untargeted methods are screening methods that do not define an ingredient or product as unsafe or adulterated, rather they suggest an aberration from the normal state that should trigger follow-up actions or investigations.”
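The “baseline state” idea can be illustrated with a deliberately simple screen: profile every measured feature across known-normal samples, then flag a new sample whose features fall far outside that range. This is a minimal sketch assuming a per-feature z-score with a hypothetical cutoff of 3; it is a generic illustration, not the explainable AI pipeline the researchers used, and the function name and abundance values are invented.

```python
from statistics import mean, stdev

def deviates_from_baseline(baseline_samples, new_sample, cutoff=3.0):
    """Hypothetical screen: return indices of features in new_sample whose
    z-score against the baseline samples exceeds the cutoff. An empty list
    means no aberration; a non-empty list should prompt follow-up, not a verdict."""
    flagged = []
    for i, value in enumerate(new_sample):
        column = [s[i] for s in baseline_samples]
        sd = stdev(column)  # needs at least two baseline samples
        if sd == 0:
            # Feature never varies in the baseline, so any change is a deviation.
            if value != column[0]:
                flagged.append(i)
            continue
        if abs(value - mean(column)) / sd > cutoff:
            flagged.append(i)
    return flagged

# Hypothetical relative abundances (%) of three taxa in four normal samples.
normal = [[40, 35, 25], [42, 33, 25], [38, 37, 25], [41, 34, 25]]
print(deviates_from_baseline(normal, [40, 35, 25]))  # []
print(deviates_from_baseline(normal, [40, 20, 40]))  # [1, 2]
```

Consistent with Ganda’s point, the function only reports where a sample departs from the baseline; deciding whether that departure means adulteration is left to follow-up investigation.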
Amalgamation of scientific expertise
Ganda points out that the collaboration used IBM’s open-source AI technology, Automated Explainable AI for Omics, to process the metagenomic data and analyze all the microbes in the bulk milk samples. This allowed the team to identify microbial signatures that traditional methods often miss.
The Cornell researchers’ expertise in dairy science enhanced the research’s practical relevance to the dairy industry, while Penn State’s One Health Microbiome Center in the Huck Institutes of the Life Sciences helped integrate microbial data for broader health and safety applications.
The US Department of Agriculture (USDA) supported the work through Penn State.
Future work
The scientists acknowledge that larger sample sizes are needed before the results can be applied in industry, and they envision extending the approach to larger datasets as these become available.
“Future challenges will include the need to define the appropriate specificity and sensitivity for models that can identify abnormalities in products or ingredients.”
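Sensitivity and specificity have standard definitions: sensitivity is the share of truly anomalous samples a model flags, and specificity is the share of normal samples it correctly leaves unflagged. A minimal sketch with invented labels (1 = anomalous, 0 = baseline); it does not reproduce the study’s models or results.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = anomalous, 0 = baseline."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal samples passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity

# Hypothetical screening results for ten samples.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
flags = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(sensitivity_specificity(truth, flags))  # (0.75, 0.8333333333333334)
```

Making a model flag samples more readily generally raises sensitivity at the cost of specificity, which is why the appropriate balance has to be chosen for each application.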