How River Island unified end-to-end retail observability while reducing costs with Grafana Cloud
River Island, a major UK-based fashion retailer, operates a complex, end-to-end digital and physical retail platform spanning ecommerce, stores, logistics, and supply chain systems. As the company scaled its AWS-based microservices architecture, its technology operations team struggled with fragmented observability, limited visibility into critical systems, and rising costs from legacy tooling.
“We did it badly,” said Tonino Greco, Head of Cloud, Infrastructure, and Operations at River Island. “Everything was spread across multiple monitoring tools, and when something went wrong, it took us tens of minutes just to figure out where to start.”
By consolidating observability into Grafana Cloud, River Island brought together metrics, logs, and telemetry from across its entire estate into a single platform. As a result, the team reduced time to identify issues from tens of minutes to minutes, improved proactive detection of performance degradation, and significantly lowered total cost of ownership compared to its previous solution.
Tonino recently spoke with Grafana Labs about River Island’s observability journey.
Can you start by introducing yourself, your role at River Island, and what your team is responsible for?
Sure. I’m Tonino Greco. I’m the Head of Cloud, Infrastructure, and Operations at River Island. The team I manage is pretty much everything from service desk to service delivery, to networks, to infrastructure, to cloud, cloud engineering, and AI platform.
So we’re pretty much the foundation of all the services at River Island from a tech perspective. Everything operational and foundational, technology-wise. I’ve been at River Island for just over six years, coming on seven in the next couple of months.
There are 50 people in the team, split across seven squads. We cover operations for pretty much everything in the business, including stores.
What does your technology ecosystem look like today?
Today it’s made up mostly of microservices that sit within AWS, so a very distributed platform. We do still have some legacy systems, especially around backend financial systems, but the core platform is modern and AWS-centric.
The front end is still a bit monolithic to a small degree, but everything behind it (product listing, product detail pages, basket, checkout) is all built on microservices.
Our biggest challenge across that is observability. How do you find that bug, that needle in that haystack of microservices?
Before Grafana, how were you handling observability, and what challenges did you face?
Before Grafana, we did it badly.
It was all distributed across a handful of monitoring solutions: New Relic for front end, CloudWatch, Zabbix, and others. When something went wrong, the biggest challenge was knowing where to start looking.
Triage and defining the problem used to take us quite a long time: tens of minutes, rather than the minutes it takes us now. And the longer you take, the more money the company is losing.
What ultimately drove your decision to move away from your previous tools?
There were two main factors. One was visibility. We weren’t getting proper visibility into our serverless functions, particularly AWS Lambda. The data just wasn’t appearing in a way that we could use it.
The second was cost. We wanted to expand how much data we were sending, but the pricing model made that difficult. It became a hindrance.
The cost of using New Relic just for the digital platform was the same as what we now pay for the entire Grafana stack across all platforms. That made the conversation easy. We needed a platform that could give us full observability across the estate and allow us to scale without cost becoming a blocker.
How did you approach the migration to Grafana Cloud?
Our journey really came from developer frustration with our existing tools.
Grafana was already well known across the team because of its open source roots. When we said we were going to use Grafana, nobody asked what it was; everyone already knew it.
We started experimenting with it, and once we saw how easy it was to integrate different data sources using plugins, it just grew organically. We kept adding more systems (“there’s a plugin for this, let’s try it”) and everything got bundled in.
That led us to formalize it as our standard platform. The implementation was actually mostly seamless because of that familiarity. Around 90% of the team had used Grafana before.
What impact has Grafana had on your ability to detect and resolve issues?
Now that everything is in one place (logs, metrics, everything), it’s much easier to find problems quickly.
We’ve gone from taking tens of minutes to identify issues to just minutes.
We also now have predictive capabilities. Because we have full visibility across the estate, we can start seeing problems before they become critical.
For example, just before Easter, we noticed a small increase, around 3% to 5%, in page load times in the basket. At the same time, we saw order value starting to drop. We investigated and found a backend process causing additional load on the front end. It was a bug. We were able to fix it and deploy a change within 30 minutes, and immediately saw performance improve.
Previously, we wouldn’t have seen that at all. We would only have noticed the impact the next day in reporting.
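The kind of check described in that example, a small rise in page load time coinciding with a drop in order value, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not River Island's actual alerting: the function names, the sample values, and the 3% cutoff (taken from the low end of the range quoted above) are all hypothetical, and in practice the comparison would more likely live in an alert rule than in a script.

```python
from statistics import mean


def relative_change(baseline, recent):
    """Fractional change of the recent mean versus the baseline mean."""
    base = mean(baseline)
    return (mean(recent) - base) / base


def flag_degradation(load_baseline, load_recent,
                     order_baseline, order_recent,
                     load_increase=0.03):
    """Flag when page load time rises by at least `load_increase`
    (hypothetical 3% cutoff) while average order value falls."""
    load_delta = relative_change(load_baseline, load_recent)
    order_delta = relative_change(order_baseline, order_recent)
    return load_delta >= load_increase and order_delta < 0


# Made-up sample data: basket page load times in seconds and
# average order values in pounds, before and during the window.
load_before = [2.00, 2.02, 1.98, 2.00]
load_now = [2.08, 2.10, 2.06]
orders_before = [50.0, 52.0, 51.0]
orders_now = [48.0, 49.0]

print(flag_degradation(load_before, load_now, orders_before, orders_now))
```

Requiring both signals together is a design choice for the sketch: a small latency change on its own is easy to dismiss as noise, while the pairing with a business metric is what made the issue in the example worth investigating.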
How has having end-to-end visibility changed how you understand the customer experience?
Because we now have the full journey, we can spot small changes that have a big downstream impact. A small issue in checkout can snowball into a much bigger problem across the business. We also monitor in-store systems, like self-checkout and store devices, so we can see issues there as well and act quickly.
We’ve built dashboards that represent this visually, including a business-facing dashboard that non-technical users can understand. It shows both operational and business metrics in one place.
Without that centralized observability, you miss a lot of the picture, both the good and the bad.
How does Grafana help your team stay confident during peak events like Black Friday?
Because most of our platform is serverless, everything auto-scales. We track what we call end-user metrics: things like page load times, which directly reflect customer experience. We define thresholds where performance becomes unacceptable.
During peak periods like Black Friday, we can see traffic and sales increasing while performance metrics stay within acceptable limits. Our CEO asks every year if we’re ready, and we can confidently say yes and show the dashboards to prove it.
You can literally watch the system scaling and performing in real time, which gives everyone confidence.
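As a rough illustration of checking end-user metrics against thresholds, the Python sketch below compares the 95th percentile of page load times per page with a limit. Everything in it is an assumption made for the example: the page names, the threshold values, the choice of the 95th percentile, and the helper names are hypothetical, and the interview does not say how River Island defines or evaluates its thresholds.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def find_breaches(samples_by_page, thresholds, pct=95):
    """Return {page: percentile value} for pages whose percentile
    load time exceeds the configured threshold."""
    breaches = {}
    for page, samples in samples_by_page.items():
        if page not in thresholds:
            continue  # no limit defined for this page
        value = percentile(samples, pct)
        if value > thresholds[page]:
            breaches[page] = value
    return breaches


# Hypothetical limits in seconds; real values would come from the team.
THRESHOLDS_SECONDS = {"basket": 2.5, "checkout": 3.0}

# Made-up load time samples in seconds.
samples = {
    "basket": [1.0] * 18 + [4.0, 4.0],
    "checkout": [1.0] * 20,
}

print(find_breaches(samples, THRESHOLDS_SECONDS))
```

A percentile is used here rather than a mean so that a slow tail of page loads is not hidden by a large number of fast ones.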
How has working with Grafana Labs influenced your journey?
It became a partnership rather than just buying a tool. We had constant back-and-forth with the Grafana team. Our engineers were asking questions all the time, and the support was always there.
Without that partnership, the journey would have been much harder. It’s not just about the product; it’s about the people behind it. Having strong support made a huge difference for us.
What’s next for your observability strategy?
The next step for us is expanding into business observability.
We want to visualize the full journey from order placement to delivery, including design, manufacturing, and logistics. The goal is to show not just what’s happening in technology, but how the entire business operates and performs.
We think that will give a much better picture of the effort across all teams, not just tech.


