Best practices for labels

A well-designed labeling strategy is fundamental to effective incident response. Labels serve three primary purposes: routing alerts to the right teams, providing context for triage, and enabling analytics.

Types of labels

Design your labels with these three purposes in mind:

Labels for routing

Routing labels direct alerts to the appropriate team or escalation chain. These are the most critical labels and should be present on every alert.

Examples:

  • team_name or owner: Identifies the team responsible for the service.
  • domain: Specifies the business domain (payments, authentication, data).
  • namespace: Groups by Kubernetes namespace or environment.
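
For example, if your alert sources don't emit a team label directly, an alert group label template can derive one from another routing label. A minimal sketch, assuming namespace is present in the payload; the namespace prefixes and team names are hypothetical:

jinja2
{%- set ns = payload.commonLabels.namespace | default('') -%}
{#- Map namespace prefixes to owning teams; prefixes and team names are hypothetical -#}
{%- if ns.startswith('payments') -%}
team_name=payments
{%- elif ns.startswith('auth') -%}
team_name=identity
{%- else -%}
team_name=platform
{%- endif -%}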

Labels for triage and investigation

Investigation labels help responders understand context quickly without digging through dashboards.

Examples:

  • cluster: Identifies which cluster is affected.
  • region: Specifies the geographic region.
  • pod or instance: Identifies the specific resource.
  • runbook_url: Links to relevant documentation.
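
If your alert rules already carry a runbook link as an annotation, you can surface it as a label. A minimal sketch, assuming the integration payload exposes Alertmanager-style commonAnnotations:

jinja2
runbook_url={{ payload.commonAnnotations.runbook_url or None }}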

Labels for analytics

Analytics labels enable reporting and trend analysis across incidents.

Examples:

  • severity: Indicates the alert severity level.
  • service_name: Identifies the service for Service Center integration.
  • category: Classifies the type of issue (performance, availability, security).
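
A category label can be set statically by the rule author, or derived from the alert name when your naming is consistent. A sketch with hypothetical keyword rules:

jinja2
{%- set name = payload.commonLabels.alertname | default('') | lower -%}
{#- Classify by keywords in the alert name; the keyword rules are hypothetical -#}
{%- if 'latency' in name or 'slow' in name -%}
category=performance
{%- elif 'down' in name or 'error' in name -%}
category=availability
{%- else -%}
category=other
{%- endif -%}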

Coarse-grained vs fine-grained routing

Use a layered approach to routing:

Type             Examples                                Purpose
Coarse-grained   team_name, domain, namespace            Route to a broad team or area
Fine-grained     severity, service_name, time of day     Further refine within the team

Example label sets

Grafana Engineering uses 4 labels:

  • namespace (coarse-grain routing)
  • severity (fine-grain routing)
  • service_name (fine-grain routing and Service Center)
  • cluster (investigation)

General pattern: 4-5 labels total is sufficient for most organizations. More labels add complexity without proportional benefit.

Static vs dynamic labels

You can define labels statically or extract them dynamically from alert data:

Static labels

Hardcoded in alert rules. Use for:

  • Team ownership that doesn’t change.
  • Environment classification.
  • Severity levels defined by the alert rule author.

Dynamic labels

Extracted from query results or payload data. Use for:

  • service_name from metrics labels.
  • namespace and cluster from Kubernetes metadata.
  • Any value that varies per alert instance.
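
For instance, an alert group label template can pull these values straight from the alert's common labels. A minimal sketch; k8s_cluster is a hypothetical fallback key name:

jinja2
namespace={{ payload.commonLabels.namespace or None }}
cluster={{ payload.commonLabels.cluster or payload.commonLabels.k8s_cluster or None }}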

Important behavior

  • Static labels override dynamic labels: When a static and a dynamic label use the same key, the static value takes precedence.
  • Labels are additive: Both static and dynamic labels coexist on the same alert.
  • Plan your label keys: Avoid unintended overrides by using distinct key names.

Label patterns in Jinja2 templates

Use these patterns in IRM alert group label templates:

Coalesce pattern

Use fallback values when label names vary across sources:

jinja2
service_name={{ payload.commonLabels.service_name or payload.commonLabels.service or payload.commonLabels.container or payload.commonLabels.job or None }}

This pattern tries multiple possible label names and uses the first one found.

Normalize pattern

Rename or clean up inconsistent label values:

jinja2
{% set raw_service = payload.commonLabels.service_name | default(none) %}
{% set service_name = 'myRealServiceName' if raw_service == 'JankyServiceName' else raw_service %}
{{ service_name }}

This pattern standardizes label values across different alert sources.
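
The two patterns compose naturally: coalesce across possible keys first, then normalize the result. A sketch with hypothetical service names:

jinja2
{%- set raw = payload.commonLabels.service_name
      or payload.commonLabels.service
      or payload.commonLabels.job
      or None -%}
{#- 'legacy-checkout-v2' is a hypothetical value to be renamed -#}
service_name={{ 'checkout' if raw == 'legacy-checkout-v2' else raw }}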

Labels across the lifecycle

Labels flow through the entire alert lifecycle:

Alert Rule → Alert Instance → Alert Group → Incident

At each stage:

  • Alert Rule: Static and dynamic labels defined.
  • Alert Instance: Labels from the firing alert.
  • Alert Group: Labels extracted via templates, plus manual labels.
  • Incident: Labels transferred from the alert group when the incident is declared, plus manual labels.

Manual labels can be added at the Alert Group and Incident levels for additional context discovered during investigation.

Audit your labels

Regularly audit your labeling to maintain quality:

Check for missing labels

Review alerts that are delivered to default routes; these typically indicate missing or incorrect routing labels.

To find these alerts:

  1. Check the default route in your IRM integration.
  2. Review recent alert groups that matched the default route.
  3. Identify which labels are missing and update the alert rules.

Validate label standards

Ensure alerts match your organization’s label standards:

  • Use consistent label keys across teams.
  • Verify expected values for routing labels.
  • Confirm required labels are present on all alerts.
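
One way to enforce value standards is in the label template itself: map anything outside the agreed set to a known fallback. A sketch assuming an allowed severity set of critical, warning, and info:

jinja2
{%- set sev = payload.commonLabels.severity | default('') | lower -%}
severity={{ sev if sev in ['critical', 'warning', 'info'] else 'unknown' }}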

Monitor label coverage

Track what percentage of alerts have proper routing labels. Set a target (for example, 95%) and work to improve coverage over time.

Best practices summary

  • Start with routing labels: Ensure every alert can be routed to the right team
  • Keep it simple: 4-5 labels is usually sufficient
  • Use consistent keys: Standardize label names across teams
  • Document your strategy: Share labeling guidelines with alert rule authors
  • Audit regularly: Review alerts going to default routes
  • Leverage Service Center: Use service_name consistently for unified views

Next steps