Reduce MTTR with Grafana, Grafana k6, and Prometheus: Inside DHL’s observability stack

16 Aug, 2023 5 min

Each year, more than 296 million packages are shipped around the world via DHL and their premium service, Time Definite International. And at DHL Express Switzerland, a local unit of the international logistics and shipping company, the IT team provides solutions for tracking customs clearance progress, analytics, mobile and optical character recognition (OCR) scanning, and warehouse management on every package that moves through Switzerland. 

It’s a complex operation that requires a multi-layered business and IT framework where every minute and every shipment counts. Translation: There’s very little room for downtime, false alarms, errors, and failed requests. 

In their recent GrafanaCON 2023 talk, “Transforming IT and business flows at DHL Express with Grafana, k6, and Prometheus” (now available on demand), Head of IT Djamel Djedid and Lead Architect Michael Lerch shared how their phased approach to implementing a Grafana-centric observability solution has helped DHL Express Switzerland resolve issues faster, save manpower, and expand its observability beyond traditional IT monitoring.

Phase 1: POC with Prometheus + Grafana

In early 2020, the team identified the need for a more modern and scalable SRE solution. Their legacy monitoring system was siloed, which meant that their teams were often reactive when issues arose. 

They ran a proof of concept with Grafana and Prometheus and ultimately decided to migrate critical legacy watchers to Prometheus. With the support of Grafana training and self-learning, the team rolled out the new stack at a larger scale just in time to tackle a busy period for the business, with great success.

They developed Grafana dashboards like the one below, which monitors global customs clearance data. These dashboards allow the team to quickly see and share where bottlenecks are occurring and identify how to resolve them.

Grafana dashboard from DHL GrafanaCON 2023 presentation.
Users from different applications can easily pinpoint issues and identify ways to resolve them via this comprehensive dashboard on package flow through customs clearance.

Phase 2: Full speed ahead with Grafana Alerting 

In 2021, the team was ready to move to a more robust implementation of Grafana dashboards and Grafana Alerting. They built additional dashboards and integrated alerts with Microsoft Teams and their internal Wiki. 

“Our alerts contain the name of the application, a description of the issue, a link to the Grafana dashboard, as well as a link to a wiki in Microsoft Teams containing remediation instructions,” said Lerch. The result? No matter who is on duty, they can quickly address issues that arise and resolve them faster than ever before.
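To make the alert structure Lerch describes concrete, here is a minimal sketch of a message carrying those four pieces of information. It is not DHL's implementation or Grafana's notification template format; the `AlertContext` class, the `format_alert` function, the application name, and both URLs are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertContext:
    """The four fields the talk says every alert carries (names are hypothetical)."""
    application: str
    description: str
    dashboard_url: str
    runbook_url: str


def format_alert(ctx: AlertContext) -> str:
    """Render the alert as a short plain-text message for a chat channel."""
    return "\n".join([
        f"[ALERT] {ctx.application}",
        ctx.description,
        f"Dashboard: {ctx.dashboard_url}",
        f"Remediation: {ctx.runbook_url}",
    ])


# Hypothetical values for illustration only.
example = AlertContext(
    application="customs-clearance-tracker",
    description="Clearance queue has not drained for 15 minutes.",
    dashboard_url="https://grafana.example.com/d/abc123",
    runbook_url="https://wiki.example.com/runbooks/customs-clearance",
)
message = format_alert(example)
print(message)
```

Keeping the dashboard link and the remediation link in every message is what lets whoever is on duty act without prior knowledge of the application.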

Grafana dashboard from the DHL presentation at GrafanaCON 2023.
Detailed dashboards for a single application provide an in-depth look at performance and show how proactive alerting results in fast issue resolution.

At the time, the team also made a huge shift. “We decided that SRE and observability would be a default attribute of every new application,” said Djedid. From that time on, every new application had to come with its own Grafana dashboard and monitoring. This approach delivered a huge improvement during the subsequent surge in business for DHL Express. Most issues were proactively detected and resolved, driving higher customer satisfaction.

Phase 3: Load testing with Grafana k6 for a smooth cloud migration

After implementing Grafana, the team started an infrastructure modernization project in 2022 to move some of their servers from on-premises data centers to the public cloud. “We needed to monitor performance and ensure the migration wasn’t negatively impacting performance for end users,” said Djedid. “We wanted a tool that could measure the latency between the user and the on-prem server and the cloud server.” Enter Grafana k6.

Graphic showing how DHL used Grafana k6 load testing.
Performance testing with Grafana k6 took the guesswork out of moving from on-prem servers to the public cloud for DHL Express Switzerland.

By developing k6 scripts to measure the main trends of their business-critical applications, the team could test performance for different user scenarios in both the on-prem environment and the cloud environment. Load testing revealed that the cloud servers were much more stable for a larger number of users. “Grafana k6 really helped us be confident that the solution we were implementing was reliable and scalable,” said Lerch. 
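The kind of on-prem versus cloud comparison described above can be sketched as follows. k6 tests themselves are scripted in JavaScript; this Python snippet only illustrates how latency samples exported from two test runs might be summarized and compared. The function names and the sample values are made up for illustration and are not DHL's measurements.

```python
import math
from statistics import median


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples: list[float]) -> dict[str, float]:
    """Median shows typical latency; p95 shows how bad the slow tail gets."""
    return {"median_ms": median(samples), "p95_ms": percentile(samples, 95)}


def compare(on_prem: list[float], cloud: list[float]) -> dict[str, float]:
    """Return cloud minus on-prem for each statistic (negative means cloud is faster)."""
    before, after = summarize(on_prem), summarize(cloud)
    return {key: after[key] - before[key] for key in before}


# Made-up response times in milliseconds, for illustration only.
on_prem_ms = [120, 135, 128, 410, 131, 126, 390, 129, 133, 127]
cloud_ms = [140, 142, 139, 145, 141, 143, 138, 144, 140, 142]

delta = compare(on_prem_ms, cloud_ms)
print(delta)
```

In this invented data set the cloud run has a slightly higher median but a much lower p95, which is one way "more stable" can show up: fewer slow outliers even if the typical request is not faster.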

Phase 4: Adding Grafana Loki, Grafana OnCall, and beyond

As of 2023, the team has grown their Grafana implementation to 80 alerts, 40 dashboards, and 60 active users. They’re also planning to add Grafana OnCall for incident management, and they’re exploring Grafana Loki for logs as well. 

A quick look at Grafana dashboards gives the DHL Express team a comprehensive view of the status of their systems and processes – and the ability to enjoy that first cup of coffee each morning.

“This observability stack provides so many benefits,” said Djedid. But his favorite part is the clean and comprehensive view he gets of all DHL Express systems each morning at his desk.

“I love arriving each morning and looking at the Grafana dashboards showing everything is OK,” he said. “You don’t need to scroll emails or hope that no one will ring or ping you — you already know.”

To see more dashboards from the DHL Express team and learn more about their cloud migration load testing, watch the full GrafanaCON talk. All sessions from GrafanaCON 2023 are now available on demand.