Improve slow queries

This guide walks application developers, DBAs, and SREs through a systematic workflow for finding, prioritizing, diagnosing, and fixing slow queries using Database Observability. You’ll use explain plans, query samples, wait events, and table schema details to identify the root cause, then apply targeted optimizations and verify the results.

Tip

This guide assumes you already have at least one database connected. If you haven’t set up collection yet, start with Set up MySQL or Set up PostgreSQL.

Optimization workflow overview

Follow this workflow to systematically improve query performance:

  1. Identify slow queries by using performance metrics.
  2. Prioritize queries based on impact.
  3. Diagnose why the query is slow.
  4. Apply the right optimization.
  5. Verify improvements in metrics.

Identify slow queries

Start in the Queries Overview dashboard to find problematic queries.

Sort by duration

Use query duration to find the biggest opportunities.

  1. Navigate to Database Observability in Grafana Cloud.
  2. In the query table, click the Duration column header to sort.
  3. Review queries with the highest average or total duration.

Look for error patterns

Use error rate to spot timeouts and resource pressure.

  1. Click the Errors column header to sort by error rate.
  2. Identify queries with high error percentages.
  3. Investigate these first: errors often indicate timeouts or resource exhaustion.

Check recent changes

Compare time ranges to spot regressions.

  1. Set time range to the last hour or day.
  2. Compare with a previous period (using time shift).
  3. Look for queries with increased duration.

Use the wait events chart

Use wait events to understand what the query waits on.

  1. Review the cumulative wait events bar chart.
  2. Note queries with high wait times: they spend much of their runtime blocked rather than executing.
  3. Click through to investigate specific wait patterns.

Prioritize which queries to optimize

Not all slow queries need optimization. Prioritize based on impact.

Calculate optimization impact

Estimate impact so you focus on the highest-return work.

In the query table, the Duration and Calls columns provide the two inputs for estimating impact:

Impact = Average Duration × Execution Frequency

A query taking 100ms that runs 10,000 times per hour has more impact than a query taking 10 seconds that runs once per day. Focus on queries where high duration and high call frequency overlap, factoring in whether the query is user-facing or on a critical path.
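Worked through, the comparison in the example above is:

0.1 s × 10,000 calls/hour = 1,000 seconds of query time per hour
10 s × 1 call/day = 10 seconds of query time per day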

Diagnose the problem

After you’ve selected a query to optimize, click the query to open its details and use the investigation tools to understand why it’s slow. Each tool reveals a different dimension of the problem:

Tool | What it reveals | Guide
--- | --- | ---
Explain Plan tab | Scan types, join strategies, cost bottlenecks, missing indexes | Analyze explain plans
Query Samples tab | Execution variability, outliers, parameter-dependent performance | Examine query samples
Wait Events tab | Lock contention, I/O bottlenecks, CPU pressure | Understand wait events
Table Schema Details tab | Missing indexes, column types, foreign key relationships | Explore table schemas

Check infrastructure metrics

Check infrastructure metrics to rule out capacity constraints.

For cloud-managed databases, check if infrastructure constraints are causing slowness:

  1. Click the Instance name to open the infrastructure panel.
  2. Look for these patterns:

Metric | Indicator | Action
--- | --- | ---
High CPU | Query is compute-bound | Optimize query complexity
Memory pressure | Insufficient buffer cache | Increase instance size
High IOPS | Disk throughput limit | Add indexes to reduce scans
Network saturation | Large result sets | Reduce data transferred

If infrastructure is the bottleneck, query optimization alone may not be sufficient. Consider scaling the database instance or optimizing application access patterns.

Note

Requires cloud_provider configuration. Refer to Add cloud provider integrations.

Apply optimization

Based on your diagnosis, apply the appropriate fix.

Add or modify indexes

Add indexes that match your filters, joins, and sort order.

Adding an index is the most common optimization. Add indexes for:

  • Columns in WHERE clauses
  • Columns in JOIN conditions
  • Columns in ORDER BY clauses

PostgreSQL:

SQL
-- Single column index
CREATE INDEX idx_orders_status ON orders(status);

-- Composite index (order matters!)
CREATE INDEX idx_orders_status_date ON orders(status, created_at);

-- Covering index (includes non-key columns)
CREATE INDEX idx_orders_covering ON orders(status) INCLUDE (total, customer_id);

-- Partial index (for selective conditions)
CREATE INDEX idx_orders_pending ON orders(created_at) WHERE status = 'pending';

MySQL:

SQL
-- Single column index
ALTER TABLE orders ADD INDEX idx_status (status);

-- Composite index
ALTER TABLE orders ADD INDEX idx_status_date (status, created_at);

-- Create index with specific type
ALTER TABLE orders ADD INDEX idx_name USING BTREE (customer_name);

After creating the index, return to the Explain Plan tab to confirm the database uses it, and check the Table Schema Details tab to verify the new index appears.

Rewrite the query

Change query shape to reduce work and improve index use. Common rewrites include selecting only needed columns instead of SELECT *, adding LIMIT for large result sets, and avoiding functions on indexed columns. For example, wrapping a column in a function prevents the database from using an index on that column:

SQL
-- Can't use index on created_at:
SELECT * FROM orders WHERE YEAR(created_at) = 2024;

-- Rewrite as a range to use the index:
SELECT * FROM orders
WHERE created_at >= '2024-01-01'
  AND created_at < '2025-01-01';

After rewriting, compare the Duration column in the query table to confirm the change reduced execution time.

Fix N+1 query patterns

Batch similar requests to reduce repeated database round trips.

Instead of (N+1):

SQL
-- Application loops through customer IDs
SELECT * FROM orders WHERE customer_id = 1;
SELECT * FROM orders WHERE customer_id = 2;
SELECT * FROM orders WHERE customer_id = 3;
-- ... repeated N times

Use batch query:

SQL
SELECT * FROM orders WHERE customer_id IN (1, 2, 3, ...);

Or use JOIN to fetch related data in one query. After batching, check the Calls and Duration columns in the query table to verify fewer, faster executions.

Address lock contention

Reduce lock time and conflicts to improve concurrency.

If wait events show lock issues, try these approaches:

  • Reduce transaction scope (shorter transactions)
  • Use appropriate isolation levels
  • Implement optimistic locking in application
  • Reorder operations to acquire locks consistently
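If explicit locks are causing the waits, optimistic locking replaces them with a conditional update. A minimal sketch, assuming a hypothetical orders table that carries a version column:

SQL
-- Optimistic locking: the update succeeds only if the row is unchanged
-- (the orders table and its version column are hypothetical)
UPDATE orders
SET status = 'shipped',
    version = version + 1
WHERE id = 42
  AND version = 7;
-- Zero rows updated means another transaction changed the row first:
-- re-read the row and retry, instead of holding a lock for the whole operation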

After changes, return to the Wait Events tab to confirm that lock wait times have decreased.

Address I/O contention

Reduce data scanned and improve cache efficiency to lower I/O waits.

If wait events show I/O issues, try these approaches:

  • Add more memory to increase cache hit rate
  • Add indexes to reduce data scanned
  • Consider read replicas for read-heavy workloads
  • Review storage performance (IOPS, throughput)
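To see which tables miss the cache most, PostgreSQL exposes per-table I/O counters in the pg_statio_user_tables view; tables with a low hit ratio are candidates for new indexes or more memory:

SQL
-- PostgreSQL: per-table buffer cache hit ratio (1.000 = fully cached)
SELECT relname,
       heap_blks_hit,
       heap_blks_read,
       round(heap_blks_hit::numeric
             / nullif(heap_blks_hit + heap_blks_read, 0), 3) AS cache_hit_ratio
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 10;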

After changes, return to the Wait Events tab to confirm that I/O wait times have decreased.

Use AI Helper

Select the AI Helper tab to get AI-powered explanations of query structure and optimization suggestions. Validate any recommendation against your explain plan and query samples before applying it.

Note

AI Helper is being replaced by the Grafana AI Assistant in a future release. Functionality may change during this transition.

Verify improvement

After applying changes, confirm they worked.

Monitor in Database Observability

Compare key query metrics before and after the change.

  1. Return to Queries Overview.
  2. Find the optimized query.
  3. Compare metrics before and after:

Metric | Check
--- | ---
Duration (avg) | Should decrease
Duration (p99) | Tail latency should improve
Errors | Should not increase
Calls | Should remain stable
Resource utilization | CPU and I/O should decrease

Continue tracking these metrics over time to confirm the improvement holds under varying load.

Check explain plan

Confirm that the optimizer uses your intended plan and indexes.

  1. View the new explain plan.
  2. Verify the index is being used.
  3. Confirm row estimates decreased.
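You can also generate the plan directly in the database. A sketch using the orders table from the earlier examples:

SQL
-- PostgreSQL: execute the query and report the actual plan, timings, and buffer usage
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE status = 'pending';

-- MySQL 8.0 and later: show the optimizer's plan as a tree
EXPLAIN FORMAT=TREE
SELECT * FROM orders WHERE status = 'pending';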

If the plan hasn’t changed after adding an index, update statistics (ANALYZE for PostgreSQL, ANALYZE TABLE for MySQL) and check again.
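Using the orders table from the earlier examples:

SQL
-- PostgreSQL: refresh planner statistics for one table
ANALYZE orders;

-- MySQL: refresh table and index statistics
ANALYZE TABLE orders;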

Review samples

Validate that improvements are consistent across executions.

  1. Check the Query Samples tab.
  2. Verify consistent improvement.
  3. Watch for any regression in specific cases.

Use the guides linked above to dig deeper into the investigation workflow.