Add observability to cart: How online retailer ASOS reduces MTTR with Grafana Cloud
Like the fit your friend got on ASOS? There’s a good chance Grafana Cloud had something to do with that.
Each year, more than 20 million customers come to the UK-based online retailer to fill their digital carts, and they expect a seamless online experience as they shop and check out. ASOS can consistently provide that with the help of Grafana Cloud.
“There are alerts set up for many of our customer-facing journeys,” said Dylan Morley, Lead Principal Engineer at ASOS. “We know pretty instantly as soon as something starts going wrong.”
That was not always the case. About 10 years ago, “observability didn’t really exist as a concept. Data was spread out. It was difficult to find what you needed,” says Morley. Without a centralized data store, “when there was an incident, you had to go and find all of the information you wanted.”
To solve that problem, ASOS began to use Grafana OSS in 2017, in large part due to Grafana Labs’ “big tent” philosophy, which allows organizations to choose their own tools and bring together all of their disparate data into one dynamic dashboard.
Grafana offers a growing catalog of 100+ plugins and 65+ Grafana Cloud integrations for monitoring third-party tools. “The data source model that Grafana gives us means that we can quickly plot any data from any source that we want alongside existing telemetry data that we might already have,” says Morley. “It gives us this holistic view to understand our systems from a user perspective.”
Given the immediate benefits of Grafana, ASOS’ adoption of the open source tool grew quickly. By 2022, the team had moved to the fully hosted Grafana Cloud platform so that, instead of updating and maintaining Grafana themselves, they could focus on scaling their observability strategy and investing in other Grafana Cloud tools, such as Grafana SLO and Kubernetes Monitoring.
Access to Grafana Enterprise plugins was another key benefit of moving to Grafana Cloud.
“When we were running Grafana ourselves, we needed to query New Relic and ServiceNow, so we built some in-house versions of plugins — but all that came with the overhead of maintenance,” Morley said. “One of the benefits of the Enterprise plugins was that we could deprecate [our existing plugins], just take advantage of the Grafana Cloud offering, and have less code that we need to own and maintain.”
Now when an issue occurs, “all the monitoring in one place helps,” says Fahri Ulucay, Site Reliability Engineer at ASOS. “We have high-level dashboards. If we see something fluctuating there, we can all go into Grafana, zoom in, look into details and that helps to identify [the issue] quicker and then resolve it quicker as well.”
“When something shows red on our SLOs or our error budgets, we know that that’s a real problem … We are much more confident that the metrics that we are using accurately represent and describe the systems that ASOS relies on.”
— Adam Watson, Lead Site Reliability Engineer
Sounds too good to be true? There are about 800 other ASOS engineers using Grafana who would back this up.
“We have quite a few tools actually in SRE that we try and get people to use in the platforms, but I have never seen people have a problem with Grafana,” says Adam Watson, Lead Site Reliability Engineer. “I have to say one of the nice things about Grafana is, it’s just never been a blocker.”
As a result, “we are more prepared and more confident,” continues Watson. “We understand not only what metrics matter most to us but we understand how they sit in the holistic picture of the systems and the services that we support.”
To learn more about ASOS’ observability stack and how Grafana Cloud has reduced MTTR and their overall costs, check out the ASOS success story.