Why is my GPU idle? Observability for LLM inference in Grafana

  • Start date
    Tuesday, 21 April
  • Time
    16:30
  • Duration
    30 minutes
  • Spaces
    Main

Large Language Model (LLM) inference has rapidly become one of the most expensive and performance-critical workloads running in the cloud today. GPUs are costly, inference latency directly impacts user experience, and traditional observability stacks often fail to explain why an LLM endpoint slows down, burns excess GPU hours, or silently degrades under load.

In this talk, Rudraksh Karpe, FDE at Simplismart, and Satyam Soni, Software Engineer at Devtron.ai, explore how to build end-to-end observability for LLM inference workloads using Grafana, with first-class visibility into GPUs, Kubernetes, and model-level performance. Going beyond basic metrics, they show how inference latency, token throughput, GPU utilization, memory pressure, and cost efficiency can be correlated on a single pane of glass.

The session covers:

  • What makes LLM inference observability different from traditional microservices monitoring
  • Methods for correlating GPU metrics with inference signals like time-to-first-token and tokens/sec (a minimal instrumentation sketch follows this list)
  • Dashboard patterns in Grafana that reveal GPU waste and scaling bottlenecks
  • How to use observability data to improve autoscaling, cost efficiency, and platform reliability
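
The speakers' own examples aren't published on this page, but as a rough sketch of the second bullet: the snippet below uses Python's prometheus_client to expose time-to-first-token and token-throughput metrics that Grafana can plot alongside GPU metrics from NVIDIA's dcgm-exporter. The metric and label names (llm_time_to_first_token_seconds, llm_generated_tokens_total, model) are illustrative assumptions, not names used in the talk.

    import time
    from prometheus_client import Counter, Histogram, start_http_server

    # Time from request arrival to the first generated token (seconds).
    TTFT = Histogram(
        "llm_time_to_first_token_seconds",
        "Time to first token per request",
        ["model"],
        buckets=(0.05, 0.1, 0.25, 0.5, 1.0, 2.0, 5.0),
    )

    # Monotonic token count; rate() over it yields tokens/sec in Grafana.
    TOKENS = Counter(
        "llm_generated_tokens_total",
        "Total generated tokens",
        ["model"],
    )

    def instrumented_stream(model, token_stream):
        """Wrap a streaming token generator, recording TTFT and token counts."""
        start = time.monotonic()
        first = True
        for token in token_stream:
            if first:
                TTFT.labels(model=model).observe(time.monotonic() - start)
                first = False
            TOKENS.labels(model=model).inc()
            yield token

    if __name__ == "__main__":
        start_http_server(8000)  # expose /metrics for Prometheus to scrape

With signals like these in place, a single Grafana dashboard can, for example, graph rate(llm_generated_tokens_total[1m]) next to dcgm-exporter's DCGM_FI_DEV_GPU_UTIL, making an allocated-but-idle GPU immediately visible.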

Register now to join us in Barcelona

Tickets are going fast! Group bookings available — save up to 40%.

By registering for this event you agree to the event terms and conditions and the code of conduct. You also agree to be emailed about event details and related product-level information. Paid hands-on labs are non-refundable, but may be transferred.