
Building a digital twin of the ocean: 4D ecological monitoring with robotics, AI-powered analytics, and Grafana

Marine ecosystems face accelerating pressures from climate change, overexploitation, and habitat degradation, demanding continuous, high-resolution environmental intelligence. To achieve that, DIGI4ECO, a Horizon Europe project, is developing a 4D ecological monitoring system built on a digital twin of the ocean, integrating advanced robotics, AI-powered analytics, and unified data observability pipelines with Grafana. 

In this session, Jacopo Aguzzi, Senior Research Scientist at the Institute of Marine Sciences, and Enoc Martinez, Assistant Professor at the Electronic Engineering Department of UPC, discuss how DIGI4ECO's coordinated network of fixed and mobile robotic platforms forms a distributed "physical twin of the ocean." These platforms capture biological, environmental, opto-acoustic, and molecular (eDNA/eRNA) information at unprecedented temporal frequency and spatial resolution.

At the Digital Twin Ocean (DTO) layer, AI algorithms — including state-of-the-art convolutional neural networks and object-detection architectures (e.g., YOLO) — automatically identify species, compute ecological indicators, and characterize biodiversity trends. These indicators inform restoration strategies and support policy-relevant assessments aligned with EU biodiversity and restoration goals.

Jacopo and Enoc also share the interactive Grafana dashboards that visualize both physical and biological data streams, from oceanographic time series and acoustic profiles to AI-processed underwater imagery and ecological indicators. These dashboards demonstrate how observability tools can bridge robotics, environmental monitoring, and marine policy needs, turning complex ocean data into actionable intelligence.

Jacopo Aguzzi:

Thank you very much. Good morning, everyone. I'm Jacopo Aguzzi from the Marine Science Institute of Barcelona, one of the five marine institutes in Spain belonging to the Spanish National Research Council, which is the largest academic institution on the planet. Sorry, in the country of Spain. Okay.

Let me introduce myself properly. My background is in marine ecology, and I work in fishery management. For more than two decades I have specialized in the development of technological solutions to monitor the marine environment, from coastal areas to the deep sea. Enoc?

Enoc Martínez:

Hi, everyone. My name is Enoc Martinez. I work at UPC, the Technical University of Catalonia. My background is electronic engineering, although I have spent the last 10 years doing scientific data management in scientific data infrastructures, specifically focused on the ocean. Here you can see me on a boat, but I actually spend 99% of my time behind a desk.

Jacopo Aguzzi:

Okay. Let's start with the context where DIGI4ECO was born, which is ecological monitoring. Why do we need ecological monitoring today? As you all surely know, we have a problem in the sea: like in terrestrial environments, there is an increasing pace of anthropic activity that damages different ecosystems, and there is an interest in monitoring them. Here I chose two iconic examples. The first refers to the industrial extractive activity of trawl hauling, occurring on almost all continental margins of the planet, which sweeps entire ecosystems on a daily basis. And then we have the omnipresent effect of plastic pollution.

These are two out of many impacts we face. That's why we need to monitor the environment, and why environmental policy and management agencies, including the European Community, are interested in developing metrics to assess the GES, the good environmental status, of marine ecosystems within the European Community's biodiversity agenda.

GES is something that belongs to the Marine Strategy Framework Directive, a policy ruling body that aims to measure the status, the health, of those ecosystems through 11 descriptors, or ecological indicators. Descriptors like, for example, the seabed status, the biological diversity, the presence of contaminants in the seafood we extract from the ecosystems and eat, or the status of commercially exploited marine species.

But those descriptors are semantically open, so they require a thoughtful dialogue between scientists and policymakers on which types of data we need to compute them. It's very important: environmental and biological data, spatial and temporal scales, and in the end the discussion always comes down to technologies. Why technologies? Because monitoring the ocean costs a lot of money. Consider that most of what we know about the sea is because we have been collecting data with vessels for the past 200 years, and vessels are very expensive technologies. For example, the flagship vessel of the Spanish fleet is named Sarmiento de Gamboa, a vessel of average length, I would say around 70 meters. It's not so big, it's not so small.

Among the world's fleets, it's about average size, and it costs up to 25,000 euros per day. That's a lot of money. So for us, collecting data means money, and we need a strong budget in support. Plus, there are many more people willing to use the vessels than there are vessels, so we have a calendar problem: they're always booked. That's why DIGI4ECO wants to drive ecological monitoring toward the use of robotic networks: autonomous platforms being developed to bring scientists back from the decks of the vessels, from the field to their offices on shore.

Sorry, Enoc, you will be more in the office. We use three major types of robotic platforms in DIGI4ECO. The first are cabled observatories, which are platforms tethered to shore through telephone or internet cables. They stream data continuously in real time, and they are fed power so they can operate over consecutive years.

And then we have their equivalent, the landers. They are a standalone version of cabled observatories: no cables, easy to redeploy adaptively and strategically, but with much shorter autonomy because they run on battery packs, in general up to one year. These two kinds of platforms have a limited data collection capability because they are fixed, and the environment is more heterogeneous than the single pixel of space in which you place them. So we are also working with robotic solutions to automatically sweep the environment around those platforms at a hectare scale. I'm talking about docked crawlers, which are tracked vehicles that I'll describe in a few slides.

So why is this important? Because the European Community is trying to change the paradigm of research, and one of the lines of this paradigm change is the digital twin of the ocean: the creation of virtual replicas of the ecosystems. The more data we have, the better the replica is. Which types of data? The data we stream from our platforms, which are endowed with multi-parametric sensor assets. There are three major types of data that we stream into the digital twin. Biological, and here I'm talking about AI to process images. Then we have AI to process acoustic output, because we listen to the seascape looking for biological sounds; that's something Enoc will describe in more detail later on.

And then we have geochemical data, provided by sensors that characterize dissolved oxygen, dissolved methane, CO2, nitrate and phosphate from agriculture, pH, turbidity, and so on, plus the classical oceanographic sensors that characterize the water masses in terms of salinity, temperature, and water speed and direction for currents.

With these, we intend to create not only virtual replicas of the environments, but replicas that are updated in real time. Why? Because we want to know precisely the present status in order to run "what if" scenarios with artificial intelligence. On one side we have the IPCC, the Intergovernmental Panel on Climate Change, telling us, on the basis of present knowledge, what the status of the sea will be in the next 50 years in relation to temperature, salinity, and so on. I won't go into that.

But we want to model the biological response of the living components of the ecosystem. So we want to know, for example, in relation to increasing temperature and salinity, what will happen to the species we see now: whether they will move away, go extinct, or reduce their numbers of individuals, and things like this.

Plus, all of this needs to be synthetically collected into a web framework in which citizens and policymakers can log on and extract information as a meaningful ecological output that they can understand and use. That's web visualization. So in this respect, DIGI4ECO has the following aims. The first is to virtually recreate digital replicas of four coastal ecosystems, at around 20 meters depth: two in the Atlantic, two in the Mediterranean. Why the coast? Because humanity developed along the coast. Along the rivers as well, but that is not the target of our project. Coastlines are the first line of impact of humanity. We have master platforms in this robotic network for these environments: two cabled observatories, one in Galway Bay in Ireland, the other close to Barcelona, which I will describe in a bit, and then we'll go into more detail.

Then we have crawlers operating in Kristineberg, above Gothenburg in Sweden, and we will have landers in the Mediterranean, off the Italian coast at Ancona. We will build a standardized digital twin for these four different environments. Its heart will be an online data bank that will remotely store all the data, pre-process them for homogenization and standardization, and embed AI routines for flagging data quality, eliminating outliers, and using different models to fill the gaps in data acquisition due to sensor malfunctioning and so on, depending on the length of the missing values.
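The quality-control routines described here (flagging, outlier removal, short-gap filling) can be sketched in a few lines. This is a minimal illustration, not DIGI4ECO's actual pipeline: it flags outliers with a robust median/MAD z-score and fills only short gaps by linear interpolation, leaving longer gaps for model-based reconstruction.

```python
import statistics

def qc_series(values, z_thresh=3.5, max_gap=3):
    """Flag outliers with a robust (median/MAD) z-score, then fill short
    gaps by linear interpolation. `values` may contain None for missing
    samples; gaps longer than `max_gap` are left untouched."""
    present = [v for v in values if v is not None]
    med = statistics.median(present)
    mad = statistics.median([abs(v - med) for v in present])
    flags, cleaned = [], []
    for v in values:
        if v is None:
            flags.append("missing"); cleaned.append(None)
        elif mad > 0 and 0.6745 * abs(v - med) / mad > z_thresh:
            flags.append("outlier"); cleaned.append(None)  # treat as a gap
        else:
            flags.append("good"); cleaned.append(v)
    i = 0
    while i < len(cleaned):
        if cleaned[i] is None:
            j = i
            while j < len(cleaned) and cleaned[j] is None:
                j += 1  # find the end of the gap
            if 0 < i and j < len(cleaned) and (j - i) <= max_gap:
                left, right = cleaned[i - 1], cleaned[j]
                for k in range(i, j):  # linear interpolation across the gap
                    t = (k - i + 1) / (j - i + 1)
                    cleaned[k] = left + t * (right - left)
            i = j
        else:
            i += 1
    return cleaned, flags
```

The median/MAD test is used here instead of a plain z-score because the very outliers being hunted would otherwise inflate the standard deviation and mask themselves.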

That is the backend of the digital twin we are building. The front end, the phenotype of DIGI4ECO as it will appear online to end users, will be this one. This is a mock-up, because the project runs for four years and we started only two years ago, so we have another two years to go.

Okay. This mock-up shows where we're going in terms of web interaction with the people who will log on to the project website. They will select the area, the platforms and the sensors, so the biological and environmental variables they want to analyze together, and the time window in which they want to run those analyses. Then, based on a screening of data quality, artificial intelligence will advise them on the best multivariate statistical tools for those analyses. And in the end, they will get some sort of metrics, like the descriptors I told you about before.

Which type of biological data from marine observation platforms are we talking about? I will speak about artificial intelligence in image processing. In marine ecology today we are, of course, reinventing the wheel; industry is probably ahead, and you guys are much more expert than us, but we are interested in overcoming the human bottleneck in processing the terabytes of information we get every week.

Artificial intelligence in image monitoring is something we developed over the past decade thanks to our connection with Ocean Networks Canada, which operates a network of cabled observatories in British Columbia. At one kilometer depth, as an example, we monitor sablefish. Sablefish is a relevant fishery item for three countries: Canada itself, the US, and Japan. But because this fixed sampling is limited, we are also using a crawler. This is the only one, at one kilometer depth, docked to that cabled node, and we drive it remotely, with the panel on the right, from Barcelona. But because we don't want to drive it ourselves, we are endowing the crawler, through edge computing routines, with the capability to operate by itself back and forth along transects and to count animals of different species. Okay, that's what we're doing.

Bringing together all this experience, what are we doing? We are elevating the intelligence of OBSEA, the master platform that belongs to us and will serve technological development in DIGI4ECO. It is located a one-hour drive south of Barcelona. Very nice place; please visit if you have time. It sits four miles off the harbor of Vilanova, in a Natura 2000 area, which is a marine protected area. Ten years of continuous streaming of biological and environmental data, okay? And these are the results: the AI automatically classifying animals in a very complex operational scenario, with several species of fish appearing together.

To conclude, we are interested in increasing the virtual representation capability of our digital models by ingesting any type of data we can, including very different ones: satellites and fishing vessels. The long-liners and the trawlers that work within 20 kilometers of our deployment areas will provide us with their logbooks, in which they annotate the species caught per day, so we have huge time series of biological data; they annotate what is happening to the biology. We can feed this into the forecasting strategies of DIGI4ECO, providing also information about the socio-economic output. Enoc will now explain the [inaudible 00:13:41] application in more detail. Thank you very much.

Enoc Martínez:

Thank you, Jacopo. Now the engineer has to speak about data flows, servers, and these kinds of things. Jacopo explained what we aim to do: we want to monitor ecosystems and create this virtual copy. We have these platforms that gather data, and that's what we have here: the observatories, with sensors and platforms gathering scientific data. We also want to use socio-economic data. Is there some aquaculture in the region? Is there any fisheries data? Do we have industrial impact in the area? All of this needs to be sent to what we call the digital twin, where we have all the AI tools, the data repositories, the simulations, the filtering and curation, et cetera.

Users can get this data, do the simulations, run these "what if" scenarios. And we want to replicate this four times, because we are deploying it at four different sites: in Ireland, Sweden, Italy, and Spain. I'm going to focus specifically on OBSEA, which is where I work, and which is, I would say, the most advanced. In the top right you can see a picture of this observatory at the seafloor. It has been running for the last 17 years, and we have been acquiring data from different sources: acoustics, video cameras, classical oceanographic sensors, et cetera.

In terms of IT, what do we have? We have real-time data flowing from the seafloor to our infrastructure, where we acquire it and ingest everything that is a time series into a Postgres database with the TimescaleDB extension. We also have everything that is file-based, so videos, pictures, sound, whatever, exposed publicly through an Nginx instance. And here you can see some images of the kind of data we are working with.

And why are we here? Because we are also using Grafana. For what? To visualize this data, for sure. We have all this data flowing in, which may be a bit exotic to you since it comes from the seafloor; we store it in a traditional database, and then we display everything in Grafana. So of course, there will be some Grafana dashboards. What kind of dashboards are we using? How are we using Grafana to provide scientific services to the community?

First thing, simple enough: displaying time series data. We have this sensor you can see here, what we call a CTD. It measures conductivity, temperature, and depth; it's a classical physical oceanography sensor. We can plot our data to see if everything is okay. Do we have a cold water mass or a warm water mass? We also have some quality metrics about our data. Simple enough.

We can do things a bit more complicated. We also have, for instance, wave data: wave height, wave period, direction. We can plot this in a windrose-like panel, in this case, I think, the Operato Windrose plugin. We have several different kinds of dashboards in Grafana. Let's make it a bit more complex. We also have underwater acoustics, and for that we use hydrophones, which are basically underwater microphones, simple enough. Since we are sampling sound at 100 kilohertz, 24/7, all year long, we cannot really display this data as a time series. So we do something a bit more elaborate: for instance, we pre-compute spectrograms, which are basically visual representations of periods of sound, so we can try to see what's in the sound. For instance, here in the middle we can see a ship approaching our [inaudible 00:17:24] and then leaving.
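Pre-computing a spectrogram like the ones shown is conceptually simple: slice the hydrophone signal into short overlapping frames, window each frame, and take the magnitude of its Fourier transform. A minimal pure-Python sketch (illustrative only; a real pipeline would use an FFT library, a much larger frame size, and proper dB scaling):

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Return one magnitude spectrum per frame for a 1-D signal."""
    # A Hann window tapers frame edges to reduce spectral leakage
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    spectra = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive O(N^2) DFT for clarity; production code would use an FFT
        spectrum = []
        for k in range(frame_size // 2 + 1):  # non-negative frequencies only
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            spectrum.append(abs(acc))
        spectra.append(spectrum)
    return spectra
```

Each spectrum's bin k corresponds to frequency k * fs / frame_size; stacking the rows over time and color-coding the magnitudes gives the spectrogram image in which events like a passing ship become visible.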

We can also pre-compute some of the data, like sound pressure levels, and display statistics, like in this case a histogram. We have this data, we pre-compute it, and then we can use all this pre-computed data to show somewhat more complex graphical dashboards.

Of course, as Jacopo mentioned, we are also using AI, and specifically we're using AI to detect fish species. In this case, we are running some YOLO object detection models that detect species; then we can extract the time series, and we can also look at individual pictures to see the detections. We can filter them, select different species, and look at the source image and the processed image. Here, for instance, you can see that we are comparing two different species to see if there is a prey and a predator, to see if they appear together or one avoids the other, et cetera.
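Turning raw detections into the species time series mentioned here is, at its core, a grouping operation. A hedged sketch follows; the record format, the example species names, and the hourly binning are illustrative, not the project's actual schema. Each detection carries a timestamp, a species label, and a confidence score; low-confidence hits are discarded and the rest counted per hour.

```python
from collections import Counter
from datetime import datetime

def detections_to_timeseries(detections, min_conf=0.5):
    """Aggregate per-frame detections into hourly counts per species.

    detections: iterable of (iso_timestamp, species, confidence) tuples.
    Returns a dict keyed by (hour_iso, species) with detection counts.
    """
    counts = Counter()
    for ts, species, conf in detections:
        if conf < min_conf:
            continue  # drop uncertain detections
        # Truncate the timestamp to the hour to form the time bin
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0,
                                                  microsecond=0)
        counts[(hour.isoformat(), species)] += 1
    return dict(counts)
```

A table in this shape (time bucket, species, count) maps directly onto a Grafana time-series query grouped by species.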

Okay, this was just about scientific data. But we need to understand that the ocean is a very harsh environment. You have all sorts of issues: corrosion, storms. Everything in the water will break eventually; it's just a matter of two days, two weeks, or two years, but it will break and fail at some point, and we need to monitor that. We have physical issues, like your cable breaking because you have 20-meter waves, and other physical issues like corrosion, where your sensor degrades. But you also have what we call biofouling, which is basically some shellfish deciding to live on your sensor.

These sensors are quite expensive; they can cost tens of thousands of euros. You buy your new shiny sensor, you deploy it, you have excellent data, and after some time, you have this. This is the same sensor deployed for one year, with all sorts of shellfish attached to it. You have your conductivity cell, which has a perfect geometry, and then a mussel sits on top of it, distorting your measurements. We need to monitor that and see when it happens, so we can go there, take out the equipment, clean it, and deploy it again. And for that, we're also using Grafana.

In this case, at the top we have salinity, a classical physical oceanography parameter, and we see a significant drop at some point, specifically when there is some shellfish in our sensor, in our conductivity cell. And for that, we are also using Grafana.
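The salinity-drop symptom lends itself to a simple automated check: compare recent readings against a longer-term baseline and alert when the deviation persists. A minimal sketch, with illustrative window lengths and threshold (in practice this can equally well be a Grafana alert rule on the same query):

```python
import statistics

def biofouling_alert(readings, baseline_n=24, recent_n=6, drop_threshold=0.5):
    """Return True if the mean of the last `recent_n` readings has dropped
    below the mean of the preceding `baseline_n` readings by more than
    `drop_threshold` (e.g. in PSU for salinity)."""
    if len(readings) < baseline_n + recent_n:
        return False  # not enough history to judge yet
    baseline = statistics.mean(readings[-(baseline_n + recent_n):-recent_n])
    recent = statistics.mean(readings[-recent_n:])
    return (baseline - recent) > drop_threshold
```

Averaging over a window rather than testing single samples keeps a momentary freshwater intrusion from triggering a maintenance dive.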

Since sensors are so expensive and difficult to manage, we need to keep track of sensor history, and for that, of course, we are also using Grafana. We have a database with all our metadata stored as JSON blobs, with controlled vocabularies and all these tools that we use in the scientific community, and we can display all this information in Grafana. We have pictures of sensors before and after deployment, we have the sensor history, we have a timeline, and we can get a relatively quick overview of the sensors and their operational life.

So we can say, "Oh, this sensor failed three times during the last year. Maybe it's time to send it to the manufacturer or to buy a new one." We have dashboards for individual sensors, but we can also create dashboards for groups of sensors. We are managing tens or even hundreds of sensors, so we cannot go one by one. For that, we also have lifetime dashboards in Grafana, where we can see the operation and lifetime of all of them.
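The "this sensor failed three times, maybe replace it" judgment can be backed by a trivial aggregation over a maintenance log. A sketch with an invented event-log format (timestamp, sensor id, event type); the field names and event vocabulary are assumptions for illustration:

```python
from collections import Counter

def flag_unreliable_sensors(events, max_failures=2):
    """Count 'failure' events per sensor and return the ids of sensors
    exceeding the limit, sorted for stable output.

    events: iterable of (timestamp, sensor_id, event_type) tuples, where
    event_type might be 'deployed', 'failure', 'recovered', or 'cleaned'.
    """
    failures = Counter(sid for _, sid, etype in events
                       if etype == "failure")
    return sorted(sid for sid, n in failures.items() if n > max_failures)
```

The same grouping, expressed as a SQL query over the metadata database, is what a Grafana table panel would render.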

And finally, we are doing this kind of research that I'm pretty sure looks exotic to most of you, but at the end of the day it's data flowing from some sensors into a database where we do some processing. So we need servers, we need databases, we need IT infrastructure, and of course we are also monitoring that IT infrastructure. How? With Grafana. In this case, we use Zabbix to provide the metrics, and we visualize all of this in Grafana dashboards, so at a quick glance we can see all our servers and their status. Do we have enough space? Are we having issues with too many requests? Et cetera.

To wrap up, some conclusions. We want to build a digital twin of the ocean, a virtual representation of the ecosystem where we have multi-parametric data flows and need to manage underwater assets. We use tools that are more or less common, but in a very specific context: data simulation, modeling, AI-based products, et cetera. And the idea we had at the beginning is what you can see here, a sort of old-school dashboard where we can see all this data at once.

With Grafana, what do we have? Something like this, which looks very much the same. So we really can use Grafana as a front end for our vision of a digital twin. We can take the data we acquire and gather from the ocean and display it in a nice format using Grafana, which is an excellent open source tool that we're delighted with.

Specifically, what do we use Grafana for? To visualize real-time data and historical scientific data, to visualize our operations, and to monitor infrastructure health. What is missing in Grafana for us? I know it's very niche, but it would be nice for us, as scientists working in Earth observation, to also be able to display four-dimensional data in Grafana, meaning time, latitude, longitude, and depth: gridded data like satellite data or model outputs. Of course, this is a niche; I know it's not of interest to most of the audience, but that's what scientists really need. That is the last missing piece: if we can solve it, we could use Grafana as a complete front end for digital twins of the ocean.

DIGI4ECO is a four-year European project, and we are now at its midpoint. We did a lot of housekeeping: cleaning the data, harmonizing the data, putting everything in the same place. During the next two years we will work on creating scientific products: computing statistics from the ocean, visualizing, running "what if" scenarios, et cetera. Stay tuned for more. Over those two years, we hope to deliver this complete digital virtualization of the ocean. Thank you very much for your attention.

Jacopo Aguzzi:

Thank you.
