Why is my GPU idle? Observability for LLM inference in Grafana

  • Start date
    Tuesday, 21 April
  • Time
    16:30
  • Duration
    30 minutes
  • Spaces
    Main

Large Language Model (LLM) inference has rapidly become one of the most expensive and performance-critical workloads running in the cloud today. GPUs are costly, inference latency directly impacts user experience, and traditional observability stacks often fail to explain why an LLM endpoint slows down, burns excess GPU hours, or silently degrades under load.

In this talk, Rudraksh Karpe, FDE at Simplismart, and Satyam Soni, Software Engineer at Devtron.ai, explore how to build end-to-end observability for LLM inference workloads using Grafana, with first-class visibility into GPUs, Kubernetes, and model-level performance. Going beyond basic metrics, they show how inference latency, token throughput, GPU utilization, memory pressure, and cost efficiency can be correlated on a single pane of glass.

The session covers:

  • What makes LLM inference observability different from traditional microservices monitoring
  • Methods for correlating GPU metrics with inference signals like time-to-first-token and tokens/sec (a minimal instrumentation sketch follows this list)
  • Dashboard patterns in Grafana that reveal GPU waste and scaling bottlenecks
  • How to use observability data to improve autoscaling, cost efficiency, and platform reliability
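
The speakers' own examples aren't published on this page, but as a rough sketch of the second bullet: the snippet below uses Python's prometheus_client to expose time-to-first-token and token-throughput metrics that Grafana can plot alongside GPU metrics from NVIDIA's dcgm-exporter. The metric and label names (llm_time_to_first_token_seconds, llm_generated_tokens_total, model) are illustrative assumptions, not names used in the talk.

    import time
    from prometheus_client import Counter, Histogram, start_http_server

    # Time from request arrival to the first generated token (seconds).
    TTFT = Histogram(
        "llm_time_to_first_token_seconds",
        "Time to first token per request",
        ["model"],
        buckets=(0.05, 0.1, 0.25, 0.5, 1.0, 2.0, 5.0),
    )

    # Monotonic token count; rate() over it yields tokens/sec in Grafana.
    TOKENS = Counter(
        "llm_generated_tokens_total",
        "Total generated tokens",
        ["model"],
    )

    def instrumented_stream(model, token_stream):
        """Wrap a streaming token generator, recording TTFT and token counts."""
        start = time.monotonic()
        first = True
        for token in token_stream:
            if first:
                TTFT.labels(model=model).observe(time.monotonic() - start)
                first = False
            TOKENS.labels(model=model).inc()
            yield token

    if __name__ == "__main__":
        start_http_server(8000)  # expose /metrics for Prometheus to scrape

With signals like these in place, a single Grafana dashboard can, for example, graph rate(llm_generated_tokens_total[1m]) next to dcgm-exporter's DCGM_FI_DEV_GPU_UTIL, making an allocated-but-idle GPU immediately visible.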

Register now to join us in Barcelona

Tickets are going fast! Group bookings available — save up to 40%.

By registering for this event you agree to the event terms and conditions and the code of conduct. You also agree to be emailed about event details and related product-level information. Paid hands-on labs are non-refundable, but may be transferred.