Operational practices overview

Why a separate section?

Some capabilities don’t fit neatly into the hierarchy. They span all four levels:

Practice	How it spans levels
Alerting	Alert on infrastructure (L1), services (L2), transactions (L3), or custom metrics (L4)
Proactive testing	Test infrastructure uptime (L1), service endpoints (L2), or user journeys (L3)
Platform management	Govern access, costs, and scale across all levels
Grafana Assistant	Query, troubleshoot, and build dashboards using natural language at any level
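
To make the Alerting row concrete, here is a minimal Python sketch of a single rule shape applied at each level. It is an illustration under stated assumptions, not Grafana's alerting API: the `AlertRule` class, the metric names, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AlertRule:
    """One threshold rule; only the scope (level and metric) differs per rule."""
    name: str
    level: int                    # observability level the rule targets (1-4)
    metric: str                   # hypothetical metric name
    threshold: float
    fire_when_above: bool = True  # False means "fire when the value drops below"

    def fires(self, samples: Dict[str, float]) -> bool:
        """Return True when the metric is present and breaches the threshold."""
        value = samples.get(self.metric)
        if value is None:
            return False  # no data: this sketch stays silent rather than guessing
        if self.fire_when_above:
            return value > self.threshold
        return value < self.threshold


# Hypothetical rules, one per level. Names and limits are illustrative only.
RULES: List[AlertRule] = [
    AlertRule("High CPU", 1, "node_cpu_utilization", 0.90),
    AlertRule("High error rate", 2, "service_error_rate", 0.05),
    AlertRule("Slow checkout", 3, "checkout_p95_latency_seconds", 2.0),
    AlertRule("Low order volume", 4, "orders_per_minute", 10.0, fire_when_above=False),
]


def firing_rules(samples: Dict[str, float]) -> List[str]:
    """Return the names of all rules that fire for the given metric samples."""
    return [rule.name for rule in RULES if rule.fires(samples)]


if __name__ == "__main__":
    samples = {
        "node_cpu_utilization": 0.95,
        "service_error_rate": 0.01,
        "checkout_p95_latency_seconds": 3.2,
        "orders_per_minute": 4.0,
    }
    print(firing_rules(samples))  # ['High CPU', 'Slow checkout', 'Low order volume']
```

The evaluation logic is identical for every rule. Moving from Level 1 to Level 4 changes only which metric the rule watches.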

The three operational areas (plus an AI accelerator)

When to focus on operations

If you’re at…	Operational priority
Level 1	Basic alerting on infrastructure metrics
Level 2	Service-level alerting, SLOs
Level 3	Synthetic tests for critical user journeys
Level 4	Custom metric alerting, cost optimization
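
The Level 3 priority, synthetic tests for critical user journeys, can be sketched as an ordered list of probes that stops at the first failure. This is a rough Python illustration, not how any particular synthetic monitoring product works: `run_journey`, the step names, and the stub probes are hypothetical, and a real probe would make an actual request instead of returning a fixed value.

```python
from typing import Callable, List, Optional, Tuple

Probe = Callable[[], bool]  # a probe returns True when its step succeeds


def run_journey(steps: List[Tuple[str, Probe]]) -> Tuple[bool, Optional[str]]:
    """Run steps in order.

    Returns (True, None) if every step passes, otherwise
    (False, name_of_first_failing_step). A probe that raises counts as a failure.
    """
    for name, probe in steps:
        try:
            ok = probe()
        except Exception:
            ok = False
        if not ok:
            return False, name
    return True, None


if __name__ == "__main__":
    # Stub probes standing in for real checks (assumed for illustration).
    uptime_check = [("server responds", lambda: True)]
    checkout_journey = [
        ("load home page", lambda: True),
        ("log in", lambda: True),
        ("add to cart", lambda: True),
        ("check out", lambda: False),  # simulated failure
    ]
    print(run_journey(uptime_check))
    print(run_journey(checkout_journey))
```

A one-step journey behaves like a basic uptime or endpoint check, so the same function covers the narrower tests at Levels 1 and 2.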

We’ve covered the four levels of observability. But there’s a set of capabilities that don’t fit neatly into any single level. They apply across all of them.

These are operational practices: alerting, incident response, proactive testing, and platform management. They’re the tools that turn observability data into action.

Think about alerting. At Level 1, you alert on infrastructure metrics. At Level 2, you alert on service health. At Level 3, you alert on transaction latency. At Level 4, you alert on custom metrics.

The concept is the same; the scope changes.

Same with proactive testing. You can test that servers respond, that service endpoints work, or that complete user journeys succeed.

And platform management (access control, cost management, scaling) applies across everything.

We’ll cover three operational areas. Alerting and Incident Response Management (IRM) is about detecting problems and responding effectively. Proactive Testing is about finding problems before your users do. Platform Management is about governing your observability practice at scale.

And there’s one capability that accelerates all of them: Grafana Assistant. It’s an AI-powered assistant that helps you query data, troubleshoot issues, and build dashboards using natural language.

Whether you’re exploring metrics at Level 1 or debugging traces at Level 3, you can ask questions instead of writing complex queries. We’ll cover it after Alerting and IRM.

What you focus on depends on where you are in the hierarchy, but these capabilities are available to you from day one.
