Friday, June 26, 2026
Environmental Science | Human Exposure and Public Health | Sustainability

Can AI be Used to Predict Water Sanitary Risk in Real Time?

Featured Image Caption: Panoramic view of Ubrique, Spain, by Malopez 21, CC BY-SA 4.0, via Wikimedia Commons

Primary article:

Fernández-Ortega, J., Barberá, J. A., & Andreo, B. (2026). New insights into machine learning prediction techniques for real-time sanitary risk assessment in karst drinking water sources affected by faecal contamination. Water Research (Oxford), 290, Article 125060. https://doi.org/10.1016/j.watres.2025.125060 

Secondary articles:

Levin, R., Villanueva, C.M., Beene, D. et al. US drinking water quality: exposure risk profiles for seven legacy and emerging contaminants. J Expo Sci Environ Epidemiol 34, 3–22 (2024). https://doi.org/10.1038/s41370-023-00597-z 

Tao, Y., & Gao, P. (2025). Global data center expansion and human health: A call for empirical research. Eco-Environment & Health, 4(3), 100157. https://doi.org/10.1016/j.eehl.2025.100157 

Ward, J. S. T., Lapworth, D. J., Read, D. S., Pedley, S., Banda, S. T., Monjerezi, M., Gwengweya, G., & MacDonald, A. M. (2021). Tryptophan-like fluorescence as a high-level screening tool for detecting microbial contamination in drinking water. The Science of the Total Environment, 750, Article 141284. https://doi.org/10.1016/j.scitotenv.2020.141284 

Zerga, B. (2024). Karst topography: Formation, processes, characteristics, landforms, degradation and restoration: A systematic review. Watershed Ecology and the Environment, 6, 252–269. https://doi.org/10.1016/j.wsee.2024.10.003

https://www.coursera.org/articles/what-is-machine-learning

https://www.epa.gov/report-environment/drinking-water

https://www.usgs.gov/publications/chemical-tracer-methods


How do you know that the water you drink is safe? Most drinking water in the US comes from fresh surface water and groundwater aquifers, and most Americans receive it after it has been treated by a public water system. However, water can still be contaminated by chemicals, microbes, and radionuclides. Contamination can come from industry, agriculture, human and animal waste, byproducts of the treatment used to remove contaminants, natural sources in local soil and rock, sewer overflows, and cracked pipes.

It is important to monitor water quality, but traditional water testing can be costly and labor-intensive. It also misses changes in water quality that can happen within as little as an hour. For this reason, researchers propose machine learning as a new way to indirectly assess the safety of water.

Using two springs that drain a karst aquifer in Spain, researchers set out to determine whether machine learning could provide real-time insights into water quality and predict E. coli risk levels from water measurements.

The researchers developed effective predictions for both springs and suggest that their method could be integrated into an early-warning tool for protecting drinking water.

What is Machine Learning and Where Does it Fit with AI?

ML is a subset of AI that can make predictions with or without supervised learning methods. Image Source: Unraveling AI Complexity by PopovaZhuhadar, CC BY-SA 4.0, via Wikimedia Commons

Before diving into the study, it helps to understand machine learning. Machine learning (ML) is a type of artificial intelligence that predicts outcomes. Essentially, a program uses a set of mathematical procedures called algorithms to learn patterns from past data, and is then asked to make predictions about new data based on what it has learned.
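
To make the "learn from past data, then predict" idea concrete, here is a minimal Python sketch. It is not the study's method: it uses a simple nearest-centroid rule (an assumption chosen for readability), the helper names fit_centroids and predict are made up, and the turbidity and TLF values and "safe"/"unsafe" labels are invented for illustration.

```python
from statistics import mean

# Hypothetical training data: (turbidity in NTU, TLF in arbitrary units) -> label.
# These numbers are invented for illustration; they are not from the study.
training = [
    ((0.3, 1.0), "safe"),
    ((0.5, 1.5), "safe"),
    ((0.4, 1.2), "safe"),
    ((8.0, 9.0), "unsafe"),
    ((12.0, 11.0), "unsafe"),
    ((10.0, 8.5), "unsafe"),
]


def fit_centroids(samples):
    """'Learn' from past data by averaging the features of each label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Predict the label whose average (centroid) is closest to a new sample."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], features))


centroids = fit_centroids(training)
print(predict(centroids, (9.5, 10.0)))  # → unsafe (cloudy, highly fluorescent sample)
print(predict(centroids, (0.6, 1.1)))   # → safe
```

Real studies compare many algorithms and far more data; this toy version only shows the two phases of training on labeled examples and then predicting for new ones.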

It sounds simple, but the process requires statistical thinking: careful choice of parameters and data sets, and refinement of those data sets based on what the researcher hopes to study and predict. Various forms of machine learning have been around for about 80 years, and you interact with machine learning every day.

Whenever you get a product recommendation, your bank flags a possible fraud, or you use speech recognition software, you are using machine learning. In water quality research, ML has been used to predict arsenic and fluoride in aquifers, but it has been applied far less often to faecal bacteria. In this study, ML was applied to E. coli contamination levels.

Understanding the Test Site: Go with the Flow and Uncover the Water Source

The geology of the Ubrique aquifer includes anticlines (arch-shaped folds), synclines (trough- or basin-shaped folds), limestone, Tertiary clay, and sandstone formations. The areas where scientists collected their water samples are highlighted in reddish brown. Image source: adapted from Fernández-Ortega, J., Barberá, J. A., & Andreo, B. (2026). New insights into machine learning prediction techniques for real-time sanitary risk assessment in karst drinking water sources affected by faecal contamination. Water Research (Oxford), 290, Article 125060. https://doi.org/10.1016/j.watres.2025.125060

When researchers tackle a problem as big as water sanitation risk, they don’t just grab water samples, plug in some numbers, and call it a day.  They have to understand the hydrogeology of how water is being fed into a system they want to study before deciding where they want to collect water samples.  

Researchers are detectives as much as they are scientists. Using water tracer tests, an understanding of the regional geology, weather patterns, and human activity relevant to the test site, scientists put together Ubrique’s water supply story. 

In the northeastern part of Cadiz province is a pueblo called Ubrique. It is a rural town known for leather tanning, goat and sheep farming, and cheese making, and it has a wastewater treatment plant (WWTP) located 150 m upstream of a neighboring village. Most residents get their water supply from the Sierra de Ubrique binary karst aquifer. Essentially, a binary karst aquifer is a groundwater system that receives water both from rain falling directly on the karst and from runoff draining off adjacent, non-karst rocks.

The aquifer is recharged from multiple sources: rainfall, runoff from a flysch catchment, and spring overflow near Algarrobal. In addition, livestock activity and the WWTP put contamination pressure on the karst.

Based on the results of a field tracer test they had conducted a few years earlier, the researchers decided to focus sampling on two springs: Algarrobal and Cornicabra. Both springs drain the main aquifer.

Water Pollution: If it Doesn’t Show Up in the Wash, it Shows up in the Rinse!

Image Source:  Photo by Eric Erbe, digital colorization by Christopher Pooley, both of USDA, ARS, EMU., Public domain, via Wikimedia Commons

If it doesn’t show up in the wash, it shows up in the rinse. This adage fits this study. The quality of groundwater (the wash) is easier to measure once the water exits at a spring (the rinse), especially after rain events. In this case, the discharge (spring flow) of the Algarrobal and Cornicabra springs provides microbial clues that help determine the sanitary risk of the Ubrique aquifer.

After researchers understood which water sources were feeding the aquifer, they were ready to collect their samples. Between 2020 and 2023, a total of 194 groundwater samples were collected from the two springs. The researchers recorded electrical conductivity, turbidity, and tryptophan-like fluorescence (TLF), along with E. coli measurements used to assess sanitary risk. Each of these variables provides information on whether the water is safe to drink.

E. coli can cause enteric (intestinal) diseases and is easily transported in water. Together, the parameters chosen in this study provide details about microbial contamination. For example, the electrical conductivity of water provides information on dissolved salts and other inorganic chemicals. A change in conductivity from its baseline can signal a disturbance in the water. Turbidity is a measure of how cloudy a liquid is. Cloudier water indicates that pollutants may be present: the higher the turbidity, the higher the likelihood of pollution. Finally, TLF is used to track bacterial contamination by detecting the fluorescence of tryptophan (an amino acid) and similar compounds.
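
The idea of watching for departures from a baseline can be sketched in a few lines of Python. This is an illustrative assumption, not the study's procedure: the baseline is taken as the median of the first few readings, the 10% flagging threshold is arbitrary, flag_ec_deviation is a made-up name, and the hourly conductivity values (in µS/cm) are invented.

```python
from statistics import median


def flag_ec_deviation(readings_us_cm, baseline_window=5, threshold_pct=10.0):
    """Flag electrical conductivity readings that depart from a baseline.

    Assumptions (not from the study): baseline = median of the first
    `baseline_window` readings; a reading is flagged when it differs from
    the baseline by at least `threshold_pct` percent.
    Returns a list of (index, reading, percent_change) tuples.
    """
    baseline = median(readings_us_cm[:baseline_window])
    if baseline == 0:
        raise ValueError("baseline conductivity must be non-zero")
    flagged = []
    for i, value in enumerate(readings_us_cm):
        pct_change = 100.0 * (value - baseline) / baseline
        if abs(pct_change) >= threshold_pct:
            flagged.append((i, value, round(pct_change, 1)))
    return flagged


# Hypothetical hourly EC record (µS/cm); the dip mimics dilution by fresh rainwater.
ec = [520, 518, 522, 519, 521, 505, 450, 430, 470, 515]
print(flag_ec_deviation(ec))  # → [(6, 450, -13.5), (7, 430, -17.3)]
```

With these made-up numbers, the sharp drops at readings 6 and 7 are flagged, while the smaller dip at reading 8 (about 9.6%) stays under the assumed threshold.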

The researchers used analysis of variance (ANOVA) to test whether each measured variable differed significantly between the E. coli risk categories. Variables with statistically significant differences were then used in the ML models.
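
ANOVA compares how far apart the group averages are with how much the values scatter within each group. A minimal pure-Python sketch of the one-way ANOVA F statistic is shown below. It is not the study's code: one_way_anova_f is a made-up name, and the turbidity values and their grouping by risk category are invented.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups of numbers."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: scatter of values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)  # mean square between groups, k - 1 degrees of freedom
    ms_within = ss_within / (n - k)    # mean square within groups, n - k degrees of freedom
    return ms_between / ms_within


# Hypothetical turbidity values (NTU) grouped by E. coli risk category.
low_risk = [0.4, 0.6, 0.5, 0.7]
medium_risk = [1.8, 2.2, 2.0, 2.4]
high_risk = [5.1, 6.3, 5.8, 6.0]

print(round(one_way_anova_f(low_risk, medium_risk, high_risk), 1))  # → 254.3
```

A large F (far above 1) suggests the group means differ by more than within-group scatter alone would explain. Turning F into a p-value requires the F distribution with (k - 1, n - k) degrees of freedom; in practice, a library routine such as SciPy's scipy.stats.f_oneway returns both values.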

Model Selection for Predictive Modeling

Once the parameters were determined (electrical conductivity, spring discharge, turbidity, thresholds for safe/unsafe E. coli levels, etc.), it was time to train the ML models for predictive modeling.

Using the data sets collected from water sampling, the researchers trained ten different ML modeling methods to recognize patterns in the data and predict the E. coli risk level from the other variables in the data set. To reduce bias and obtain a more reliable estimate of performance, they used stratified cross-validation, and they compared receiver operating characteristic (ROC) curves to select the best-performing method with the lowest error rate.
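
The two evaluation ideas in this paragraph can be sketched in plain Python. Stratified k-fold cross-validation splits the samples into k groups (folds) while keeping the share of each class roughly the same in every fold, so rare high-risk samples are not all lumped into one fold; each fold then takes a turn as the test set. The area under the ROC curve (AUC) summarizes how well a model's scores rank positive samples above negative ones. This is a simplified sketch under stated assumptions: the function names, labels, and scores are made up, the ROC part assumes a binary safe/unsafe outcome (a five-class problem is typically handled one class versus the rest), and the ten models the researchers compared are not reproduced here.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        rng.shuffle(indices)  # randomize order within each class
        # Deal samples out round-robin so each fold gets its share of every class.
        for idx in indices:
            folds[position % k].append(idx)
            position += 1
    return folds


def roc_auc(y_true, scores):
    """AUC: chance that a random positive (1) outscores a random negative (0).

    Ties count as half a win; this pairwise form equals the area under the ROC curve.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical, imbalanced labels: 15 "safe" and 5 "unsafe" samples.
labels = ["safe"] * 15 + ["unsafe"] * 5
for i, fold in enumerate(stratified_k_fold(labels, k=5)):
    n_unsafe = sum(labels[j] == "unsafe" for j in fold)
    print(f"fold {i}: {len(fold)} samples, {n_unsafe} unsafe")  # each: 4 samples, 1 unsafe

# Hypothetical model scores for four test samples (1 = unsafe, 0 = safe).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

An AUC of 1.0 means the model ranks every unsafe sample above every safe one, while 0.5 is no better than chance.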

An easier way to think about the process: the ML was trained and then asked to perform. Here are the data from the water samples we collected. Learn their patterns. Now, based on new input and the mathematical rules (algorithms) you have been given, what are the patterns between the different categories studied? Using your best-performing test run, estimate the level of E. coli contamination, according to what we have classified as none, low, medium, high, and very high E. coli counts per mL.
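
Assigning a measured count to one of the five categories is a simple binning step, sketched below in Python. The cut-offs (1, 10, 100, 1000) are hypothetical order-of-magnitude limits chosen for illustration; the post does not list the study's actual class limits, and risk_class is a made-up helper name.

```python
from bisect import bisect_right

# Hypothetical class limits (not from the study):
# [0, 1) none, [1, 10) low, [10, 100) medium, [100, 1000) high, 1000+ very high.
RISK_CUTOFFS = [1, 10, 100, 1000]
RISK_LABELS = ["none", "low", "medium", "high", "very high"]


def risk_class(e_coli_count):
    """Map an E. coli count to one of five sanitary-risk classes."""
    if e_coli_count < 0:
        raise ValueError("E. coli count cannot be negative")
    # bisect_right counts how many cut-offs are <= the count,
    # which is exactly the index of the matching label.
    return RISK_LABELS[bisect_right(RISK_CUTOFFS, e_coli_count)]


for count in [0, 4, 35, 250, 5000]:
    print(count, risk_class(count))
```

Framed this way, the prediction task becomes classification: a model predicts the category label from the other measurements rather than the exact count.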

Although training ML models involves more nuance, this was the overall goal of the study.

Tracer tests, a common diagnostic tool that hydrologists use to follow the path of water, were also used to check whether the ML results matched what was actually occurring at the spring sites.

Not All Karst Springs Are Considered Equal

The two springs responded differently to rain events even though both drain the same aquifer. At Cornicabra, spring discharge, turbidity, and TLF rose and fell together (a positive relationship), while electrical conductivity moved in the opposite direction (an inverse relationship). At Algarrobal, all parameters showed a positive relationship. Algarrobal also had nearly double the E. coli levels of Cornicabra.
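
A positive versus inverse relationship is usually quantified with a correlation coefficient. The Python sketch below computes Pearson's r, which is near +1 when two variables rise and fall together and near -1 when one rises as the other falls. The rain-event readings are invented to mimic the Cornicabra pattern described above; they are not the study's data, pearson_r is a made-up name, and the post does not say which statistic the researchers used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical readings during one rain event at a spring (invented values).
discharge = [0.2, 0.5, 1.1, 0.9, 0.6]      # m³/s
turbidity = [0.3, 1.2, 4.5, 3.0, 1.4]      # NTU, rises with discharge
conductivity = [540, 510, 455, 470, 500]   # µS/cm, falls as rainwater dilutes the spring

print(f"discharge vs turbidity:    r = {pearson_r(discharge, turbidity):+.2f}")
print(f"discharge vs conductivity: r = {pearson_r(discharge, conductivity):+.2f}")
```

A strongly positive r for turbidity and a strongly negative r for conductivity would correspond to the "positive" and "inverse" relationships in the text; a correlation alone does not show cause.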

Not all karst systems respond to environmental pressures and conditions the same way, so each karst will have different parameters for ML to study.

Although the tracer test results were consistent with the ML predictions, the researchers recommend proper validation through occasional water sampling and traditional culture methods at each site of interest. They further assert that these advances could improve water source protection strategies.

Is AI Good or Bad: The Intersection of Social and Environmental Pressure

Although AI has been a driving force in disease diagnosis, energy management, and biodiversity monitoring, it has also increased social and environmental pressure. Data centers that house larger models place demands on electricity, water, and mineral supplies, and they cause noise and air pollution. Communities near these data centers are the most vulnerable. Can anyone guess where these data centers are calling home and expecting to expand? Primarily rural areas in the South and the Midwest. It is estimated that in 2030, U.S. data centers could contribute to nearly 1,300 deaths annually, resulting in a public health burden that exceeds $20 billion. Data centers will have to rethink how they power and run their infrastructure. A water monitoring system that itself strains water resources is not a sustainable strategy.


Interested in learning about machine learning, AI, and geology? Check out the videos below!

Machine Learning Goals

Video on how AI uses drinking water.

How AI uses our drinking water – BBC World Service

What is a flysch?

Geopark of the Basque Coast: the Earth’s history book


Christina Andrea Alvear

I'm a freelance writer in San Antonio, Texas. I earned an MS in Biology at the University of Texas at San Antonio. My goal is to make primary research fun and accessible to everyone while connecting with other science writing enthusiasts. I've explored a variety of careers in research, education, and nonprofit mental health, substance abuse, and healthcare programs. When I am not writing or working, I like to lounge at a coffee shop on the weekend or enjoy a board game with friends.
