Improve slow queries

This guide walks application developers, DBAs, and SREs through a systematic workflow for finding, prioritizing, diagnosing, and fixing slow queries using Database Observability. You’ll use explain plans, query samples, wait events, and table schema details to identify the root cause, then apply targeted optimizations and verify the results.

Tip

This guide assumes you already have at least one database connected. If you haven’t set up collection yet, start with Set up MySQL or Set up PostgreSQL.

Optimization workflow overview

Follow this workflow to systematically improve query performance:

  1. Identify slow queries by using performance metrics.
  2. Prioritize queries based on impact.
  3. Diagnose why the query is slow.
  4. Apply the right optimization.
  5. Verify improvements in metrics.

Identify slow queries

Start in the Queries Overview dashboard to find problematic queries.

Sort by duration

Use query duration to find the biggest opportunities.

  1. Navigate to Database Observability in Grafana Cloud.
  2. In the query table, click the Duration column header to sort.
  3. Review queries with the highest average or total duration.

Look for error patterns

Use error rate to spot timeouts and resource pressure.

  1. Click the Errors column header to sort by error rate.
  2. Identify queries with high error percentages.
  3. Investigate these first: errors often indicate timeouts or resource exhaustion.

Check recent changes

Compare time ranges to spot regressions.

  1. Set time range to the last hour or day.
  2. Compare with a previous period (using time shift).
  3. Look for queries with increased duration.

Use the wait events chart

Use wait events to understand what the query waits on.

  1. Review the cumulative wait events bar chart.
  2. Note queries with high wait times: they spend much of their runtime blocked rather than executing.
  3. Click through to investigate specific wait patterns.

Prioritize which queries to optimize

Not all slow queries need optimization. Prioritize based on impact.

Calculate optimization impact

Estimate impact so you focus on the highest-return work.

In the query table, the Duration and Calls columns provide the two inputs for estimating impact:

Impact = Average Duration × Execution Frequency

A query taking 100ms that runs 10,000 times per hour has more impact than a query taking 10 seconds that runs once per day. Focus on queries where high duration and high call frequency overlap, factoring in whether the query is user-facing or on a critical path.
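Worked through, the comparison in the example above is:

0.1 s × 10,000 calls/hour = 1,000 seconds of query time per hour
10 s × 1 call/day = 10 seconds of query time per day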

Diagnose the problem

After you’ve selected a query to optimize, click the query to open its details and use the investigation tools to understand why it’s slow. Each tool reveals a different dimension of the problem:

Tool | What it reveals | Guide
--- | --- | ---
Explain Plan tab | Scan types, join strategies, cost bottlenecks, missing indexes | Analyze explain plans
Query Samples tab | Execution variability, outliers, parameter-dependent performance | Examine query samples
Wait Events tab | Lock contention, I/O bottlenecks, CPU pressure | Understand wait events
Table Schema Details tab | Missing indexes, column types, foreign key relationships | Explore table schemas

Check infrastructure metrics

Check infrastructure metrics to rule out capacity constraints.

For cloud-managed databases, check if infrastructure constraints are causing slowness:

  1. Click the Instance name to open the infrastructure panel.
  2. Look for these patterns:

Metric | Indicator | Action
--- | --- | ---
High CPU | Query is compute-bound | Optimize query complexity
Memory pressure | Insufficient buffer cache | Increase instance size
High IOPS | Disk throughput limit | Add indexes to reduce scans
Network saturation | Large result sets | Reduce data transferred

If infrastructure is the bottleneck, query optimization alone may not be sufficient. Consider scaling the database instance or optimizing application access patterns.

Note

Requires cloud_provider configuration. Refer to Add cloud provider integrations.

Apply optimization

Based on your diagnosis, apply the appropriate fix.

Add or modify indexes

Add indexes that match your filters, joins, and sort order.

Adding an index is the most common optimization. Add indexes for:

  • Columns in WHERE clauses
  • Columns in JOIN conditions
  • Columns in ORDER BY clauses

PostgreSQL:

SQL
-- Single column index
CREATE INDEX idx_orders_status ON orders(status);

-- Composite index (order matters!)
CREATE INDEX idx_orders_status_date ON orders(status, created_at);

-- Covering index (includes non-key columns)
CREATE INDEX idx_orders_covering ON orders(status) INCLUDE (total, customer_id);

-- Partial index (for selective conditions)
CREATE INDEX idx_orders_pending ON orders(created_at) WHERE status = 'pending';

MySQL:

SQL
-- Single column index
ALTER TABLE orders ADD INDEX idx_status (status);

-- Composite index
ALTER TABLE orders ADD INDEX idx_status_date (status, created_at);

-- Create index with specific type
ALTER TABLE orders ADD INDEX idx_name USING BTREE (customer_name);

After creating the index, return to the Explain Plan tab to confirm the database uses it, and check the Table Schema Details tab to verify the new index appears.

Rewrite the query

Change query shape to reduce work and improve index use. Common rewrites include selecting only needed columns instead of SELECT *, adding LIMIT for large result sets, and avoiding functions on indexed columns. For example, wrapping a column in a function prevents the database from using an index on that column:

SQL
-- Can't use index on created_at:
SELECT * FROM orders WHERE YEAR(created_at) = 2024;

-- Rewrite as a range to use the index:
SELECT * FROM orders
WHERE created_at >= '2024-01-01'
  AND created_at < '2025-01-01';

After rewriting, compare the Duration column in the query table to confirm the change reduced execution time.

Fix N+1 query patterns

Batch similar requests to reduce repeated database round trips.

Instead of (N+1):

SQL
-- Application loops through customer IDs
SELECT * FROM orders WHERE customer_id = 1;
SELECT * FROM orders WHERE customer_id = 2;
SELECT * FROM orders WHERE customer_id = 3;
-- ... repeated N times

Use batch query:

SQL
SELECT * FROM orders WHERE customer_id IN (1, 2, 3, ...);

Or use JOIN to fetch related data in one query. After batching, check the Calls and Duration columns in the query table to verify fewer, faster executions.

Address lock contention

Reduce lock time and conflicts to improve concurrency.

If wait events show lock issues, try these approaches:

  • Reduce transaction scope (shorter transactions)
  • Use appropriate isolation levels
  • Implement optimistic locking in application
  • Reorder operations to acquire locks consistently
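If explicit locks are causing the waits, optimistic locking replaces them with a conditional update. A minimal sketch, assuming a hypothetical orders table that carries a version column:

SQL
-- Optimistic locking: the update succeeds only if the row is unchanged
-- (the orders table and its version column are hypothetical)
UPDATE orders
SET status = 'shipped',
    version = version + 1
WHERE id = 42
  AND version = 7;
-- Zero rows updated means another transaction changed the row first:
-- re-read the row and retry, instead of holding a lock for the whole operation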

After changes, return to the Wait Events tab to confirm that lock wait times have decreased.

Address I/O contention

Reduce data scanned and improve cache efficiency to lower I/O waits.

If wait events show I/O issues, try these approaches:

  • Add more memory to increase cache hit rate
  • Add indexes to reduce data scanned
  • Consider read replicas for read-heavy workloads
  • Review storage performance (IOPS, throughput)
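To see which tables miss the cache most, PostgreSQL exposes per-table I/O counters in the pg_statio_user_tables view; tables with a low hit ratio are candidates for new indexes or more memory:

SQL
-- PostgreSQL: per-table buffer cache hit ratio (1.000 = fully cached)
SELECT relname,
       heap_blks_hit,
       heap_blks_read,
       round(heap_blks_hit::numeric
             / nullif(heap_blks_hit + heap_blks_read, 0), 3) AS cache_hit_ratio
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 10;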

After changes, return to the Wait Events tab to confirm that I/O wait times have decreased.

Use AI Helper

Select the AI Helper tab to get AI-powered explanations of query structure and optimization suggestions. Validate any recommendation against your explain plan and query samples before applying it.

Note

AI Helper is being replaced by the Grafana AI Assistant in a future release. Functionality may change during this transition.

Verify improvement

After applying changes, confirm they worked.

Monitor in Database Observability

Compare key query metrics before and after the change.

  1. Return to Queries Overview.
  2. Find the optimized query.
  3. Compare metrics before and after:

Metric | Check
--- | ---
Duration (avg) | Should decrease
Duration (p99) | Tail latency should improve
Errors | Should not increase
Calls | Should remain stable
Resource utilization | CPU and I/O should decrease

Continue tracking these metrics over time to confirm the improvement holds under varying load.

Check explain plan

Confirm that the optimizer uses your intended plan and indexes.

  1. View the new explain plan.
  2. Verify the index is being used.
  3. Confirm row estimates decreased.
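You can also generate the plan directly in the database. A sketch using the orders table from the earlier examples:

SQL
-- PostgreSQL: execute the query and report the actual plan, timings, and buffer usage
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE status = 'pending';

-- MySQL 8.0 and later: show the optimizer's plan as a tree
EXPLAIN FORMAT=TREE
SELECT * FROM orders WHERE status = 'pending';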

If the plan hasn’t changed after adding an index, update statistics (ANALYZE for PostgreSQL, ANALYZE TABLE for MySQL) and check again.
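Using the orders table from the earlier examples:

SQL
-- PostgreSQL: refresh planner statistics for one table
ANALYZE orders;

-- MySQL: refresh table and index statistics
ANALYZE TABLE orders;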

Review samples

Validate that improvements are consistent across executions.

  1. Check the Query Samples tab.
  2. Verify consistent improvement.
  3. Watch for any regression in specific cases.

Use the guides linked above to dig deeper into the investigation workflow.