Webinar

Optimizing observability at scale: The Trade Desk’s Adaptive Metrics journey

Company: The Trade Desk
Industry: Software & Technology (Ad tech)

The Trade Desk is a global leader in programmatic advertising technology, helping brands and agencies reach audiences through real-time bidding across digital channels. Operating at massive scale, they function more like a high-frequency trading platform than a traditional ad tech company.

Challenge

Handling nearly 19 million queries per second, The Trade Desk consistently pushes their observability stack to the limit. With over 210 million time series, managing metrics across bare metal, Kubernetes, and cloud infrastructure became costly and complex. Despite using Prometheus and a custom alerting platform, cardinality often spiked due to developer-generated metrics, making costs unpredictable. Their previous incident response system, OpsGenie, and aging tools like PushGateway added to the operational burden.

Solution

Eventually, The Trade Desk migrated to Grafana Cloud, adopted Adaptive Metrics, and moved from OpsGenie to Grafana IRM. This involved migrating dashboards and global alerts to Grafana Cloud and educating developers on metric visibility and retention. Once enabled, Adaptive Metrics quickly revealed major savings potential, upwards of $1.74M. The team mapped metrics to owning teams, allowing for targeted cleanup without disrupting visibility. They also transitioned from PushGateway to OpenTelemetry collectors for modernized ingestion and introduced templated Kubernetes alerting via custom resources.

Impact

Adopting Adaptive Metrics delivered significantly greater cost savings (50%) than The Trade Desk anticipated (5%), quickly validating the investment. Developers retained control over their metrics while gaining better visibility into usage and retention. The move to Grafana IRM streamlined incident response and alert ownership, while templated alerts and modernized ingestion pipelines reduced noise and improved operational consistency. Overall, observability became more cost-efficient, scalable, and aligned with developer needs—without sacrificing performance or visibility.

“I’m super excited for [the unified Grafana Cloud IRM app] because right now we have hundreds of thousands of alerts going through our system and a good amount of them are noise. I’d like us to move closer to a system where every alert is an incident, and having the tools right next to each other, in the same app, will help us make that change quicker.”

– Paul Givens, Head of Observability


Your guide

Paul Givens
Head of Observability
The Trade Desk