Published Plugins

No plugins published yet

Dashboards

CUDA Op Profiler: per-op latency p50/p95/p99 (single host, eBPF) by ingero
Single-host profiler for CUDA Runtime and Driver API operations. Reports per-op p50/p95/p99 latency measured with eBPF uprobes on libcudart and libcuda. Find your slowest operations with a top-10 table of the slowest ops by p99, per-op time series, and op rate. Covers cudaMemcpy, cudaLaunchKernel, cuLaunchKernel, cuMemAlloc_v2, cuCtxSynchronize, CUDA graphs, and host tracepoints.
uploaded on May 10, 2026
Downloads: 0
Reviews: 0
GPU Data Movement: CUDA memcpy + NCCL collectives (single host) by ingero
Single-host CUDA memcpy and NCCL data-movement dashboard. Shows per-direction memcpy throughput and p50/p95/p99 latency (from a per-event histogram), local NCCL collective rates by op_type, and a libnccl version roster. Answers the question: is data movement the bottleneck on this box? Data comes from eBPF uprobes on libcudart and libnccl.
uploaded on May 10, 2026
Downloads: 0
Reviews: 0
GPU Memcpy Bandwidth: h2d / d2h / d2d, latency percentiles (multi-node) by ingero
Multi-node CUDA memcpy bandwidth dashboard. Shows per-direction throughput (h2d, d2h, d2d, peer, default) and per-direction p50/p95/p99 latency from a per-event histogram. Surfaces data-pipeline bottlenecks: h2d/d2h skew points to CPU-bound data preparation, and d2d dominance points to a peer-copy hot path. Data comes from eBPF uprobes on the libcudart cudaMemcpy* functions.
uploaded on May 10, 2026
Downloads: 0
Reviews: 0
GPU Memory & Throttle: OOM, thermal, power debug (single host) by ingero
Single-host NVIDIA GPU resource-state dashboard. Shows memory used/free/total and fragmentation, the top PIDs by allocation, the throttle bitmask with rising-edge event counters (power, thermal, SW, HW), and an experimental memfrag IOCTL counter. Answers the question: why is my GPU running hot, throttled, or full?
uploaded on May 10, 2026
Downloads: 0
Reviews: 0
GPU Memory Fragmentation: OOM on half-full GPU debug (multi-node) by ingero
Multi-node NVIDIA GPU memory fragmentation dashboard. Shows per-GPU used/free/total memory, a heuristic fragmentation estimate, the top processes by allocation, and IOCTL event volume per command code (from an experimental kprobe on nvidia_unlocked_ioctl). Built for debugging out-of-memory errors on GPUs that are only half full, across the whole cluster.
uploaded on May 10, 2026
Downloads: 2
Reviews: 0
View all 8 dashboards by ingero
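Three of the dashboards above report p50/p95/p99 latency derived from a per-event histogram rather than from raw samples. The listings do not say how those percentiles are computed, so the following is only a minimal sketch under an assumption: cumulative, Prometheus-style buckets (each count includes every smaller bucket) and linear interpolation inside the target bucket, in the style of PromQL's `histogram_quantile`. The function name `histogram_quantile` and the example bucket values are hypothetical and are not taken from the dashboards.

```python
import math
from bisect import bisect_left


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    ASSUMPTION: buckets is a list of (upper_bound, cumulative_count) pairs
    sorted by upper_bound, Prometheus-style; the last bound may be math.inf.
    Observations are assumed to be spread uniformly inside each bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded
    rank = q * total
    counts = [count for _, count in buckets]
    # First bucket whose cumulative count reaches the target rank.
    i = bisect_left(counts, rank)
    upper, cum = buckets[i]
    if math.isinf(upper):
        # Rank lands in the open-ended bucket: report the largest finite bound.
        return buckets[i - 1][0] if i > 0 else math.nan
    lower, prev_cum = (0.0, 0) if i == 0 else buckets[i - 1]
    in_bucket = cum - prev_cum
    if in_bucket == 0:
        return upper
    # Linear interpolation between the bucket's lower and upper bounds.
    return lower + (upper - lower) * (rank - prev_cum) / in_bucket


if __name__ == "__main__":
    # Hypothetical memcpy latency buckets in microseconds.
    example = [(10, 50), (100, 90), (1000, 99), (math.inf, 100)]
    for q in (0.50, 0.95, 0.99):
        print(f"p{round(q * 100)}: {histogram_quantile(q, example):.1f} us")
```

With the hypothetical buckets this prints 10.0, 600.0, and 1000.0 us. Interpolation keeps the estimate inside the bucket that contains the true quantile, but the result can be no more precise than the bucket boundaries allow, so a wide bucket around the p99 makes the tail estimate coarse.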