GPU observability
GPU Observability provides hardware-level monitoring for the GPU infrastructure that runs AI workloads. It helps you maintain performance and catch hardware issues before they cause failures.
Overview
The GPU Monitoring dashboard covers the following areas of your AI infrastructure:
- Hardware utilization - Real-time GPU usage and performance tracking
- Thermal management - Temperature monitoring and cooling system analysis
- Performance tracking - Compute efficiency and throughput metrics
- Resource management - Multi-GPU coordination and resource allocation
Key features
Resource optimization
- GPU instance tracking - Individual GPU performance across infrastructure
- Resource allocation - GPU resource distribution across workloads
- Capacity planning - Usage trend analysis for scaling decisions
- Cost optimization - GPU usage efficiency monitoring for cost management
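To make the capacity planning and cost optimization items more concrete, here is a minimal sketch of the kind of analysis they describe. It is not how the dashboard itself computes anything. The sample data, the `summarize` and `trend_per_sample` helpers, and the 20% idle threshold are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical utilization samples (percent), one list per GPU, oldest first.
# In practice these values would come from whatever metrics backend you query.
samples = {
    "gpu-0": [82.0, 85.0, 88.0, 91.0],
    "gpu-1": [12.0, 10.0, 9.0, 11.0],
}


def trend_per_sample(values):
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den if den else 0.0


def summarize(samples, idle_below=20.0):
    """Return mean utilization, trend, and an 'underused' flag per GPU.

    idle_below is an assumed threshold, not a recommended value.
    """
    report = {}
    for gpu, values in samples.items():
        avg = mean(values)
        report[gpu] = {
            "mean_utilization": avg,
            "trend": trend_per_sample(values),
            "underused": avg < idle_below,
        }
    return report


report = summarize(samples)
for gpu, stats in report.items():
    print(gpu, stats)
```

A rising trend on a GPU that is already heavily used suggests a scaling decision is coming, while a low mean with a flat trend points to capacity that could be reclaimed or reassigned.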
Hardware health
- Power consumption - GPU power usage and efficiency tracking
- Hardware error rates - GPU hardware failure and error monitoring
- Driver stability - GPU driver performance and stability metrics
- Device availability - GPU device status and accessibility monitoring
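The hardware health signals above can be combined into a simple per-device check. The following is a rough sketch under stated assumptions: the `GpuSample` fields, the `health_issues` helper, and both thresholds are hypothetical and do not reflect the metric names or alert rules used by the product.

```python
from dataclasses import dataclass


@dataclass
class GpuSample:
    """One hypothetical snapshot of a GPU; field names are illustrative."""
    gpu_id: str
    power_watts: float
    power_limit_watts: float
    error_count: int   # hardware errors seen in the sampling window
    available: bool    # whether the device responded to the collector


def health_issues(sample, power_ratio_limit=0.95, max_errors=0):
    """Return a list of human-readable issues for one sample.

    Both thresholds are assumed defaults chosen for the example.
    """
    if not sample.available:
        # Other readings are not meaningful for an unreachable device.
        return ["device unavailable"]
    issues = []
    if (sample.power_limit_watts > 0
            and sample.power_watts / sample.power_limit_watts > power_ratio_limit):
        issues.append("power draw near limit")
    if sample.error_count > max_errors:
        issues.append("hardware errors reported")
    return issues


fleet = [
    GpuSample("gpu-0", 250.0, 400.0, 0, True),
    GpuSample("gpu-1", 390.0, 400.0, 3, True),
    GpuSample("gpu-2", 0.0, 400.0, 0, False),
]
for sample in fleet:
    print(sample.gpu_id, health_issues(sample) or "healthy")
```

Returning early for an unavailable device keeps stale power or error readings from producing misleading secondary findings.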
Getting started


