Webinar

Aurora’s observability in motion: Adaptive profiling and cost-efficient monitoring with Grafana Cloud

November 18, 2025

Company: Aurora
Industry: Travel & Transportation

Aurora is a leader in autonomous vehicle technology, focused on delivering safe and scalable self-driving solutions for the commercial trucking industry. Their ecosystem spans real-time onboard compute, cloud-based machine-learning pipelines, large-scale Kubernetes platforms, and a network of safety-critical services enabling autonomous freight movement across Texas and the Southwest. Maintaining observability across this highly distributed, multi-tenant, latency-sensitive environment is essential for safety, performance, and operational scale.

Challenge

Aurora’s observability footprint expanded dramatically when the company acquired a division of Uber ATG in early 2021—instantly growing its engineering organization by ~200%. This rapid expansion strained an already fragmented monitoring stack that included:

A self-hosted OSS toolchain (Prometheus + Thanos + Grafana) coupled with a separate logging vendor and additional “best-of-breed” telemetry tools.
Multiple dashboards, alerting systems, and time-zone inconsistencies that slowed troubleshooting.
Divergent vendor billing models, making cost forecasting difficult.
Disparate instrumentation patterns across 30+ Kubernetes clusters and numerous service types.

This fragmentation made troubleshooting across metrics, logs, traces, and profiling both time-consuming and expensive—particularly for developers working in a safety-critical autonomous vehicle environment.

Solution

Aurora consolidated telemetry onto Grafana Cloud as its unified observability platform. Key elements included:

Consolidation: Migrating from Chronosphere, Honeycomb, and self-hosted Grafana OSS into one Grafana Cloud platform supporting PromQL, logs, traces, and continuous profiling via Pyroscope.
Phased migration: Metrics migrated first (~30–45 days), followed by logs and traces (~11 months) across 30+ clusters with standardized pipelines and alerting.
Adaptive telemetry: Dynamic control of metric and profiling volume; opt-in profiling to manage cost; deployment and feature-flag annotations added directly to telemetry.
Developer enablement: A single pane of glass with consistent time zones, unified dashboards, and reduced context switching for teams with diverse skill sets.
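
The adaptive-telemetry ideas above (opt-in profiling and trimming metric volume) can be illustrated with a minimal sketch. It is not Aurora's implementation or a Grafana Cloud API: the annotation key `example.com/profiling`, the service names, and both helper functions are hypothetical and exist only to show the shape of the logic.

```python
from collections import defaultdict

# Hypothetical opt-in annotation key (an assumption, not a real Grafana or Aurora name).
PROFILING_OPT_IN = "example.com/profiling"


def profiling_targets(services):
    """Return the names of services that explicitly opted in to profiling."""
    return [
        s["name"]
        for s in services
        if s.get("annotations", {}).get(PROFILING_OPT_IN) == "enabled"
    ]


def aggregate_unused_labels(samples, used_labels):
    """Sum sample values after dropping labels that nothing queries.

    `samples` is a list of (labels_dict, value) pairs; `used_labels` is the
    set of label names that dashboards or alerts actually reference.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        kept = tuple(sorted((k, v) for k, v in labels.items() if k in used_labels))
        totals[kept] += value
    return dict(totals)


# Illustrative data only.
services = [
    {"name": "planner", "annotations": {PROFILING_OPT_IN: "enabled"}},
    {"name": "batch-trainer", "annotations": {}},
]
samples = [
    ({"service": "planner", "pod": "planner-0"}, 3.0),
    ({"service": "planner", "pod": "planner-1"}, 4.0),
]

targets = profiling_targets(services)
reduced = aggregate_unused_labels(samples, used_labels={"service"})
```

Here only `planner` is selected for profiling, and the two per-pod series collapse into one per-service series. Making profiling opt-in keeps the default cost at zero, while aggregating away unqueried labels reduces series count without losing the totals that dashboards depend on.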

“A single pane of glass was one of the ways … we really wanted to make this simple for people.”
– Craig Sebenik, Observability Lead

Impact

By unifying their telemetry into a single platform, Aurora unlocked dramatic improvements in speed, efficiency, and operational scale:

Faster incident resolution: Issues that previously took days to resolve now take hours, or even minutes, with all telemetry in one place.
Cost control: Adaptive metrics prevented runaway ingestion, with adaptive profiling planned for the future; opt-in profiling reduced spend while preserving visibility.
Higher developer productivity: Fewer vendor pivots, standardized telemetry, and lower cognitive load across teams.
Scalable operations: Unified observability now supports 30+ Kubernetes clusters spanning core infrastructure, ML/batch workloads, R&D systems, and customer-visible autonomous trucking services.

“There have been cases where teams have reported that a given kind of incident or issue that might’ve [taken] days to resolve… now takes them hours or even potentially minutes because all the data’s in one place.”
– Craig Sebenik, Observability Lead

Your guide

Craig Sebenik

Lead for Observability

Aurora
