Dashboards
CUDA Op Profiler: per-op latency p50/p95/p99 (single host, eBPF) by ingero
Single-host CUDA Runtime + Driver operation profiler. Per-op p50/p95/p99 latency from eBPF uprobes on libcudart and libcuda. Find your slowest kernel: top-10 slowest p99 table, per-op timeseries, op rate. Covers cudaMemcpy, cudaLaunchKernel, cuLaunchKernel, cuMemAlloc_v2, cuCtxSynchronize, cuda graphs, host tracepoints.
uploaded on May 10, 2026
Downloads: 0
Reviews: 0
GPU Data Movement: CUDA memcpy + NCCL collectives (single host) by ingero
Single-host CUDA memcpy + NCCL data-movement dashboard. Per-direction memcpy throughput + p50/p95/p99 (from per-event histogram), local NCCL collective rates by op_type, libnccl version roster. Answer: is data movement my bottleneck on this box? eBPF uprobes on libcudart and libnccl.
uploaded on May 10, 2026
Downloads: 0
Reviews: 0
GPU Memcpy Bandwidth: h2d / d2h / d2d, latency percentiles (multi-node) by ingero
Multi-node CUDA memcpy bandwidth dashboard. Per-direction throughput (h2d, d2h, d2d, peer, default), per-direction latency p50/p95/p99 from per-event histogram. Surfaces data-pipeline bottlenecks (h2d-d2h skew = CPU-bound prep; d2d dominance = peer-copy hot path). eBPF uprobes on libcudart cudaMemcpy*.
uploaded on May 10, 2026
Downloads: 0
Reviews: 0
GPU Memory & Throttle: OOM, thermal, power debug (single host) by ingero
Single-host NVIDIA GPU resource state dashboard. Memory used/free/total/fragmentation, top PIDs by allocation, throttle bitmask + rising-edge event counters (power/thermal/sw/hw), experimental memfrag IOCTL counter. Why is my GPU running hot, throttled, or full?
uploaded on May 10, 2026
Downloads: 0
Reviews: 0
GPU Memory Fragmentation: OOM on half-full GPU debug (multi-node) by ingero
Multi-node NVIDIA GPU memory fragmentation dashboard. Per-GPU used/free/total, fragmentation estimate (heuristic), top processes by allocation, IOCTL event volume per cmd code (experimental kprobe on nvidia_unlocked_ioctl). For OOM-on-half-full-GPU debugging across the cluster.
uploaded on May 10, 2026
Downloads: 2
Reviews: 0
NVIDIA GPU Cluster Overview: NCCL, memcpy, throttle (multi-node) by ingero
Multi-node NVIDIA GPU cluster dashboard. One-glance health: active GPUs, NCCL collective rates by op_type, memcpy bandwidth by direction (h2d/d2h/d2d), throttle event counters, aggregate GPU memory. eBPF-collected agent metrics aggregated across N nodes. Pair with cluster-wide Prometheus + Ingero Fleet collector.
uploaded on May 10, 2026
Downloads: 0
Reviews: 0
NVIDIA GPU Trace Overview: CUDA, NCCL, memcpy, throttle by ingero
Single-host NVIDIA GPU dashboard. Per-operation CUDA latency (p50/p95/p99) from eBPF uprobes on libcudart + libcuda. NCCL collective rates, memcpy bandwidth by direction (h2d/d2h/d2d), GPU memory, throttle state. amd64 + arm64.
uploaded on May 10, 2026
Downloads: 0
Reviews: 0
Per-Node GPU Drill-Down: CUDA, NCCL, memcpy, throttle, kernel launch by ingero
Per-node NVIDIA GPU drill-down dashboard. Pick a node from the variable; everything filters to that host. Per-GPU memory + throttle, NCCL collective rates, memcpy duration percentiles, IOCTL event rates per cmd, kernel launch threads-per-block percentiles. Use after cluster overview surfaces a candidate.
uploaded on May 10, 2026
Downloads: 0
Reviews: 0