Company: The Trade Desk
Industry: Software & Technology (Ad tech)
The Trade Desk is a global leader in programmatic advertising technology, helping brands and agencies reach audiences through real-time bidding across digital channels. Operating at massive scale, they function more like a high-frequency trading platform than a traditional ad tech company.
Challenge
Handling nearly 19 million queries per second, The Trade Desk consistently pushes their observability stack to the limit. With over 210 million time series, managing metrics across bare metal, Kubernetes, and cloud infrastructure became costly and complex. Although the team ran Prometheus and a custom alerting platform, cardinality often spiked because of developer-generated metrics, making costs unpredictable. Their previous incident response system, OpsGenie, and aging tools like PushGateway added to the operational burden.
Solution
The Trade Desk migrated to Grafana Cloud, adopted Adaptive Metrics, and moved from OpsGenie to Grafana IRM. This involved migrating dashboards and global alerts to Grafana Cloud and educating developers on metric visibility and retention. Once enabled, Adaptive Metrics quickly revealed major savings potential of upwards of $1.74M. The team mapped metrics to owning teams, allowing for targeted cleanup without disrupting visibility. They also replaced PushGateway with OpenTelemetry collectors to modernize ingestion and introduced templated Kubernetes alerting via custom resources.
Impact
Adopting Adaptive Metrics delivered cost savings of 50%, far more than the 5% The Trade Desk anticipated, quickly validating the investment. Developers retained control over their metrics while gaining better visibility into usage and retention. The move to Grafana IRM streamlined incident response and alert ownership, while templated alerts and modernized ingestion pipelines reduced noise and improved operational consistency. Overall, observability became more cost-efficient, scalable, and aligned with developer needs, without sacrificing performance or visibility.
“I’m super excited for [the unified Grafana Cloud IRM app] because right now we have hundreds of thousands of alerts going through our system and a good amount of them are noise. I’d like us to move closer to a system where every alert is an incident, and having the tools right next to each other, in the same app, will help us make that change quicker.”
– Paul Givens, Head of Observability