Query Analysis and Insights

Query Analysis and Insights

Introduction

This dashboard is built to extend Percona Monitoring and Management (PMM) Query Analytics by providing deeper visibility into query behavior, performance, and contention patterns using Grafana and ClickHouse. It focuses on practical, DBA-driven questions such as why queries are slow, where contention is coming from, and how workloads are distributed across users and clients.

The panels are designed to be used together as a troubleshooting workflow, starting from identifying slow or unstable queries, drilling into efficiency and locking issues, and finally attributing load to users or hosts.

Author

Summary

By combining PMM’s rich Query Analytics data with custom Grafana visualizations, this dashboard moves beyond simple slow-query listings to expose latency distribution, inefficiency, contention, and multi-tenant behavior. It enables faster root-cause analysis and more informed tuning decisions during both incident response and routine performance reviews.

Use this dashboard as a foundation: adapt thresholds, filters, and queries to match your environment and workload. With PMM’s underlying data model, these views can be extended further to support proactive monitoring and performance optimization.

Dashboard Panels

1. Slow Query Analysis

Shows average execution time over time for the top slowest MySQL query fingerprints.
Use this panel to identify latency spikes, unstable queries, and performance regressions.


2. Latency Distribution Heatmap

Visualizes the distribution of query execution times over time using a heatmap.
Helps detect fat tails and outliers that are often hidden by averages.


3. Query Volatility (P99 vs Average)

Compares average latency against tail latency (P99) for queries.
Useful to spot jitter, contention, or resource saturation even when averages look stable.


4. Lock Wait Ratio Over Time (Top Contended Queries)

Shows the fraction of execution time spent waiting on locks for the most contended queries.
Ratios consistently above ~20% usually indicate lock contention or concurrency issues.


5. Temporary Table Usage (Disk & Memory)

Highlights queries creating large or frequent temporary tables, especially on disk.
Helps identify queries that cause memory pressure, disk I/O, or inefficient sorting/grouping.


6. Query Efficiency (Rows Examined vs Rows Sent)

Displays queries with high read amplification (rows examined compared to rows returned).
Commonly used to detect missing indexes and inefficient access patterns.


7. Error Rate vs Throughput

Compares the number of queries with errors against overall query volume.
Use this panel to catch error spikes, validate deployments, and correlate failures with load.


8. Workload Distribution by User

Shows the relative share of total execution time consumed by each database user.
Useful for identifying noisy neighbors and understanding multi-tenant workload distribution.


9. Query Volume by Client Host

Displays the number of queries executed per client host over time.
Primarily used to spot traffic spikes and imbalances; host information may be unavailable in some setups.


10. Execution Time vs Lock Wait Time

Correlates total execution time with lock wait time for top queries.
Helps distinguish between CPU/I/O-bound queries and those slowed down by locking and contention.

Revisions
RevisionDescriptionCreated

Get this dashboard

Import the dashboard template

or

Download JSON

Datasource
Dependencies