
Cost-efficient and open observability: why Ocado migrated to Grafana Cloud

2025-12-29 · 6 min read

As Ocado Technology’s online grocery platform expanded to 12 retail partners across four continents, its observability stack ballooned.

At one point, the team was using 13 tools from seven vendors, plus another three in-house systems, to observe over 100 applications. Combined, those applications sent 18 TB of metrics and 15 TB of logs every day.

At that scale, the patchwork of tools for application performance monitoring, logging, alerting, and dashboarding increased costs and complexity, and created friction that slowed teams down. For example, to troubleshoot an incident, engineers often had to switch tools multiple times, copying and pasting request IDs between inconsistent user interfaces.

“We wanted to unify observability in one single platform, and we wanted to standardize on open source to get a little bit more control and flexibility in the future,” said Mirek Wierzba, an engineering director at Ocado.

At his ObservabilityCON 2025 session, Wierzba shared how the company tackled the challenge by consolidating onto a single platform: Grafana Cloud.


From fragmented to focused

Ocado's metrics and logs flowed primarily into New Relic, where closed-source agents offered limited control over the data. Supporting this stack required close coordination between teams and frequent troubleshooting within the observability pipeline itself.

“We had to put a lot of effort and time and energy to keep the whole architecture and whole observability ecosystem up and running,” Wierzba said. “It was sometimes resulting in operational fragility.”

Instead of solving these issues piecemeal, his team set out to simplify the entire architecture — all while improving cost efficiency, reducing incident impact, and giving engineers the tools needed to move faster.

After running tests, trials, and POCs, Ocado selected Grafana Cloud. With its commitment to open standards, Grafana was one of the few platforms that brought together metrics, logs, traces, load testing, and incident response tools. And its cost governance features aligned well with the company’s internal finance practices.

Migrating from New Relic to Grafana Cloud

To migrate at scale, Ocado relied on three key internal platforms:

  • Its developer portal, the Ocado Technology Platform (OTP), provided shared pipelines and deployment automation. Engineers could easily access Grafana Cloud services through OTP configurations or Terraform modules, while the observability team could roll out changes centrally and monitor adoption.
  • Its FinOps framework aggregated cost data from across the observability stack, then mapped costs to individual services and identified optimization opportunities. Real-time alerts flagged anomalies, and the platform’s recommendation engine highlighted savings opportunities from eliminating unused data, overprovisioned resources, and duplicated tools.
  • Its organizational warehouse, OT Data, stored operational data from systems such as Jira, GitLab, Workday, and ServiceNow. Teams used it to build dashboards, track migration timelines, and measure outcomes so engineering leadership could make more informed decisions.
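To make the FinOps idea concrete, here is a minimal sketch of mapping observability costs to individual services and flagging a cost anomaly. It is not Ocado's implementation: the record shape, the function names, and the 1.5x threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean


def cost_by_service(records):
    """Sum cost per service from (service, tool, cost) records.

    The record shape is hypothetical; a real framework would read
    billing exports from each observability tool.
    """
    totals = defaultdict(float)
    for service, _tool, cost in records:
        totals[service] += cost
    return dict(totals)


def flag_anomalies(daily_costs, threshold=1.5):
    """Return services whose latest daily cost exceeds
    `threshold` times the mean of their earlier days.

    The threshold is an assumed value, not one from the article.
    """
    flagged = []
    for service, series in daily_costs.items():
        if len(series) < 2:
            continue  # not enough history to form a baseline
        baseline = mean(series[:-1])
        if baseline > 0 and series[-1] > threshold * baseline:
            flagged.append(service)
    return sorted(flagged)
```

A simple baseline comparison like this is only a starting point; a production system would likely account for seasonality and deploy events before alerting.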

With that foundation in place, Ocado started by migrating metrics from New Relic to Grafana Cloud, using OpenTelemetry and Micrometer for instrumentation. Because the New Relic agents were embedded in application code, each team had to make manual changes. The observability team supported these efforts by publishing clear migration guides, providing centralized configuration options, and holding regular check-ins with stakeholders. Some teams moved over quickly, while others ran both stacks in parallel until they were confident in the new setup.

The logs migration followed a different approach. Previously, logs were shipped to OpenSearch clusters provisioned across 70 different environments. Ocado replaced that with a Fluent Bit sidecar model, using lightweight logging agents deployed alongside each application to ship logs directly to Grafana Cloud Logs, powered by Loki.
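For readers unfamiliar with what a log shipper hands to Loki, the sketch below builds a request body in the JSON shape accepted by Loki's HTTP push endpoint (/loki/api/v1/push). In Ocado's setup Fluent Bit does this work; the function name and the example labels here are hypothetical, and nothing is actually sent over the network.

```python
import json
import time


def build_loki_push_payload(labels, lines, now_ns=None):
    """Build a Loki push payload for one stream of log lines.

    labels: dict of stream labels (hypothetical names such as "app").
    lines:  list of log line strings.
    Loki expects each entry as [timestamp, line], with the timestamp
    a string of nanoseconds since the Unix epoch.
    """
    if now_ns is None:
        now_ns = time.time_ns()
    # Offset each line by 1 ns so entries keep their order.
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


if __name__ == "__main__":
    payload = build_loki_push_payload(
        {"app": "checkout", "env": "staging"},  # assumed labels
        ["order received", "payment authorized"],
    )
    print(json.dumps(payload, indent=2))
```

Keeping the label set small and stable matters here, because each unique label combination becomes a separate stream in Loki.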

To support engineers during the transition, Ocado hosted workshops on how to use Loki’s query language and interface effectively. The migration was tied to app redeployments, and since many apps underwent frequent releases for patching and library updates, adoption was fast. In some environments, more than 50% of apps switched over within a day. Most reached full adoption in less than two weeks.
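As an illustration of the kind of query such a workshop might cover, the sketch below assembles a LogQL query that selects streams by label and filters for lines containing a request ID. The helper name, the label names, and the request ID format are assumptions; only the `{label="value"} |= "text"` syntax comes from LogQL itself.

```python
def logql_for_request(labels, request_id):
    """Return a LogQL query: a stream selector plus a line-contains filter.

    labels:     dict of label names to values (hypothetical examples).
    request_id: the string to search for in log lines.
    """

    def esc(value):
        # Escape backslashes and double quotes for a LogQL string literal.
        return value.replace("\\", "\\\\").replace('"', '\\"')

    selector = ", ".join(
        f'{name}="{esc(value)}"' for name, value in sorted(labels.items())
    )
    return f'{{{selector}}} |= "{esc(request_id)}"'


if __name__ == "__main__":
    print(logql_for_request({"app": "checkout", "env": "prod"}, "req-123"))
```

Because the same request ID can be searched across every application's logs from one interface, a query like this replaces the copy-and-paste between tools described earlier.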

Benefits of consolidation

Even with this careful planning, the migration faced some unexpected challenges, from an early spike in support requests to unusually large log lines for certain services. But the benefits have far outweighed the growing pains:

  • Simplified architecture: Ocado significantly reduced the number of systems it had to manage, eliminating the overhead of maintaining OpenSearch clusters and multiple vendor pipelines.
  • Cost savings: Consolidation led to meaningful reductions in vendor spend and duplicated tooling. Grafana Cloud's transparent pricing and FinOps alignment gave engineering leaders real levers to optimize usage and justify investments.
  • Faster troubleshooting: With centralized logs, metrics, and traces, engineers no longer waste time switching between interfaces or recreating search queries. Integrated workflows have helped teams respond to issues faster and reduce mean time to resolve.
  • Developer empowerment: Engineers now rely on a shared, modern toolkit instead of fragmented interfaces. Many are already experimenting with Grafana’s machine learning-based alerting to detect anomalies before they affect users.
  • Future-proofed stack: By eliminating closed-source agents and standardizing on OpenTelemetry and Micrometer, Ocado has unlocked long-term flexibility. Its observability strategy can evolve without depending on proprietary vendor implementations.

“Having all signals in one place is much easier and convenient for users,” Wierzba said. “We empowered our engineers to be more proactive, rather than reactive firefighting.”

What’s next

Ocado is currently migrating incident response from PagerDuty to Grafana Cloud IRM. It’s also exploring whether the Fluent Bit sidecar model used for logs could help eliminate additional architectural complexity for metrics as well.

Ocado’s migration to Grafana Cloud started as a way to reduce vendor sprawl. It became an opportunity for the company to operate at scale, move faster, and build the future of observability on its own terms.

“It's not the end of our journey,” Wierzba said. “It's actually the first step. We are now working together with Grafana to see what we can do to leverage it even more and get more value out of it.”

