Query Analysis and Insights
Query Analysis and Insights
Introduction
This dashboard is built to extend Percona Monitoring and Management (PMM) Query Analytics by providing deeper visibility into query behavior, performance, and contention patterns using Grafana and ClickHouse. It focuses on practical, DBA-driven questions such as why queries are slow, where contention is coming from, and how workloads are distributed across users and clients.
The panels are designed to be used together as a troubleshooting workflow, starting from identifying slow or unstable queries, drilling into efficiency and locking issues, and finally attributing load to users or hosts.
Author
- Kedar V (https://kedar.nitty-witty.com/blog) Consulting at Percona.com
Summary
By combining PMM’s rich Query Analytics data with custom Grafana visualizations, this dashboard moves beyond simple slow-query listings to expose latency distribution, inefficiency, contention, and multi-tenant behavior. It enables faster root-cause analysis and more informed tuning decisions during both incident response and routine performance reviews.
Use this dashboard as a foundation: adapt thresholds, filters, and queries to match your environment and workload. With PMM’s underlying data model, these views can be extended further to support proactive monitoring and performance optimization.
Dashboard Panels
1. Slow Query Analysis
Shows average execution time over time for the top slowest MySQL query fingerprints.
Use this panel to identify latency spikes, unstable queries, and performance regressions.
2. Latency Distribution Heatmap
Visualizes the distribution of query execution times over time using a heatmap.
Helps detect fat tails and outliers that are often hidden by averages.
3. Query Volatility (P99 vs Average)
Compares average latency against tail latency (P99) for queries.
Useful to spot jitter, contention, or resource saturation even when averages look stable.
4. Lock Wait Ratio Over Time (Top Contended Queries)
Shows the fraction of execution time spent waiting on locks for the most contended queries.
Ratios consistently above ~20% usually indicate lock contention or concurrency issues.
5. Temporary Table Usage (Disk & Memory)
Highlights queries creating large or frequent temporary tables, especially on disk.
Helps identify queries that cause memory pressure, disk I/O, or inefficient sorting/grouping.
6. Query Efficiency (Rows Examined vs Rows Sent)
Displays queries with high read amplification (rows examined compared to rows returned).
Commonly used to detect missing indexes and inefficient access patterns.
7. Error Rate vs Throughput
Compares the number of queries with errors against overall query volume.
Use this panel to catch error spikes, validate deployments, and correlate failures with load.
8. Workload Distribution by User
Shows the relative share of total execution time consumed by each database user.
Useful for identifying noisy neighbors and understanding multi-tenant workload distribution.
9. Query Volume by Client Host
Displays the number of queries executed per client host over time.
Primarily used to spot traffic spikes and imbalances; host information may be unavailable in some setups.
10. Execution Time vs Lock Wait Time
Correlates total execution time with lock wait time for top queries.
Helps distinguish between CPU/I/O-bound queries and those slowed down by locking and contention.
Data source config
Collector config:
Upload an updated version of an exported dashboard.json file from Grafana
| Revision | Description | Created | |
|---|---|---|---|
| Download |