Plan your alert rules

Before you create an alert rule, take a moment to plan. Good alerts are specific, actionable, and appropriately urgent. Poor alerts create noise that leads to alert fatigue—and eventually, ignored pages.

Effective alert planning answers three questions: What should I monitor? When should it fire? Who should be notified?

To plan your alert rules, consider the following:

  1. Choose what to monitor. Start with metrics or logs that directly indicate user impact or system health.

     | Data type | What to monitor                             | Why it matters                           |
     |-----------|---------------------------------------------|------------------------------------------|
     | Metrics   | CPU, memory, disk, network                  | Resource exhaustion affects all services |
     | Logs      | Error patterns, exceptions, failed requests | Application health and user impact       |

  2. Define meaningful thresholds. Base thresholds on what “normal” looks like in your environment, not arbitrary numbers.

     | Data type | Example threshold | Reasoning                                |
     |-----------|-------------------|------------------------------------------|
     | Metrics   | CPU > 80%         | Normal is 40-60%, gives time to respond  |
     | Logs      | Errors > 10/min   | Normal is 1-2/min, catches real spikes   |

  3. Set appropriate urgency. Not every alert needs to page someone at 3 AM.

     | Alert type | Metrics example     | Logs example         | Urgency            |
     |------------|---------------------|----------------------|--------------------|
     | Critical   | Disk 95% full       | FATAL or panic logs  | Page immediately   |
     | Warning    | CPU elevated 15 min | Error rate 5x normal | Slack notification |
     | Info       | Memory trending up  | Unusual log pattern  | Email digest       |

  4. Identify the responders. Who should receive this alert? The platform team? Database team? On-call engineer? The second sketch after this list shows how severity and team labels can route an alert to the right people.

  5. Consider the “for” duration. How long should the condition persist before firing? Brief spikes during deployments shouldn’t page anyone; the first sketch after this list shows how a pending period filters them out.
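
The pending behavior in steps 2 and 5 is easier to see with a concrete example. The following Python sketch is not Grafana code; it is a minimal illustration, with made-up sample values and an assumed evaluation interval, of how a threshold combined with a “for” duration keeps brief spikes from firing an alert.

```python
# Minimal illustration (not Grafana's implementation) of a threshold plus a
# "for" duration: the condition must hold for every evaluation inside the
# pending window before the alert fires.

CPU_THRESHOLD = 80.0        # percent; assumes "normal" is roughly 40-60%
EVAL_INTERVAL_SECONDS = 60  # how often the rule is evaluated (assumption)
FOR_DURATION_SECONDS = 300  # condition must persist this long before firing


def is_firing(samples: list[float]) -> bool:
    """Return True when the most recent FOR_DURATION worth of samples all
    breach the threshold. `samples` holds one CPU reading per evaluation
    interval, oldest first."""
    evals_needed = FOR_DURATION_SECONDS // EVAL_INTERVAL_SECONDS  # 5 evaluations
    if len(samples) < evals_needed:
        return False  # not enough history yet; the alert can only be pending
    return all(value > CPU_THRESHOLD for value in samples[-evals_needed:])


# A short spike during a deployment does not fire...
print(is_firing([55, 58, 92, 57, 54, 56]))   # False
# ...but sustained saturation does.
print(is_firing([62, 85, 88, 91, 90, 93]))   # True
```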

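Steps 3 and 4 amount to attaching labels to the rule and letting notification routing act on them. The second sketch below is a simplified, hypothetical stand-in for that routing, not Grafana's notification policy engine; the `severity` and `team` label names and the channels are assumptions chosen for illustration.

```python
# Hypothetical, simplified routing: choose who is notified and how urgently,
# based on the alert's labels. In Grafana, this mapping lives in notification
# policies and contact points rather than in code you write yourself.

ROUTES = {
    "critical": "page the on-call engineer immediately",
    "warning": "post to the owning team's Slack channel",
    "info": "include in the daily email digest",
}


def route_alert(labels: dict[str, str]) -> str:
    severity = labels.get("severity", "info")  # default to the lowest urgency
    team = labels.get("team", "on-call")       # the responders identified in step 4
    return f"[{team}] {ROUTES.get(severity, ROUTES['info'])}"


# A full-disk alert owned by the platform team pages immediately...
print(route_alert({"severity": "critical", "team": "platform"}))
# ...while an elevated error rate only notifies the database team's Slack.
print(route_alert({"severity": "warning", "team": "database"}))
```
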
In the next milestone, you’ll use Grafana’s exploration tools to find the specific metrics or logs you want to alert on.

More to explore (optional)

At this point in your journey, you can explore the following paths:

Alerting best practices

