Best practices for incidents
Incidents represent significant events that require coordinated response. Effective incident management improves response times and enables learning from past events.
Understand incidents
Before creating and managing incidents in IRM, understand what they are and when to use them.
What is an incident
An incident is a formal record of a significant event affecting your services. Incidents provide a coordination point for response activities and a record for post-incident review.
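Incidents differ from alert groups: incidents are for coordinated response, while alert groups cover routine alerts that on-call can handle independently.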
When to use incidents
Create incidents for events that need:
- Coordinated response across multiple people or teams.
- Formal tracking for compliance or reporting.
- Stakeholder communication beyond the on-call team.
- Post-incident review and learning.
Not every alert needs an incident. Use alert groups for routine alerts that on-call can handle independently.
Relationship to alert groups
Incidents can have alert groups attached to them:
- Alert groups provide technical context for the incident.
- Up to 5 alert groups can be attached per incident.
- Labels flow from alert groups to incidents when an incident is declared automatically.
For large-scale incidents, group related alerts into as few alert groups as practical before attaching them, so that you stay within this limit.
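The attachment limit can be checked before you attach anything. The following is a minimal sketch, not an IRM API: the function name and the alert group IDs are hypothetical, and only the limit of 5 comes from this guide.

```python
from __future__ import annotations

MAX_ALERT_GROUPS_PER_INCIDENT = 5  # limit stated in this guide


def plan_attachments(attached: list[str], candidates: list[str]) -> tuple[list[str], list[str]]:
    """Split candidates into (to_attach, leftover) without exceeding the limit."""
    seen = set(attached)
    unique: list[str] = []
    for group_id in candidates:
        if group_id not in seen:  # skip duplicates and already-attached groups
            seen.add(group_id)
            unique.append(group_id)
    free_slots = max(0, MAX_ALERT_GROUPS_PER_INCIDENT - len(attached))
    return unique[:free_slots], unique[free_slots:]


# Hypothetical IDs: three groups are attached, four more are candidates.
to_attach, leftover = plan_attachments(
    ["ag-1", "ag-2", "ag-3"], ["ag-3", "ag-4", "ag-5", "ag-6"]
)
print(to_attach)  # ['ag-4', 'ag-5']
print(leftover)   # ['ag-6']
```

Anything left over is a signal to regroup related alerts rather than to attach more groups.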
Creating incidents
Incidents can be created automatically through escalation chains or manually by responders.
Automatic creation
Use the Declare incident escalation step for automatic creation:
1. Notify on-call
2. Wait 10 minutes
3. Declare incident (severity: major)
4. Notify incident commander
Best for:
- Critical alerts that always warrant incidents.
- Alerts matching specific patterns (high severity, production impact).
- Standardizing incident creation across teams.
Limitation: Incident declaration only works on non-default routes. Configure specific routes for alerts that should create incidents.
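To make the route limitation concrete, the sketch below models the example chain as plain data and flags default routes that contain a declare step. The `Step` and `Route` classes and the action names are hypothetical illustrations, not IRM's configuration schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    action: str  # hypothetical action names: "notify", "wait", "declare_incident"
    detail: str = ""


@dataclass
class Route:
    name: str
    is_default: bool
    steps: list[Step] = field(default_factory=list)


def routes_with_ineffective_declare(routes: list[Route]) -> list[str]:
    """Return default routes whose chain has a declare step that would not take effect."""
    return [
        route.name
        for route in routes
        if route.is_default and any(s.action == "declare_incident" for s in route.steps)
    ]


# The four-step chain from the example above.
chain = [
    Step("notify", "on-call"),
    Step("wait", "10 minutes"),
    Step("declare_incident", "severity: major"),
    Step("notify", "incident commander"),
]
routes = [Route("critical-production", False, chain), Route("default", True, chain)]
print(routes_with_ineffective_declare(routes))  # ['default']
```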
Manual creation
Create incidents manually when:
- Multiple related alert groups need coordination.
- Customer-reported issues aren’t detected by monitoring.
- Security incidents require formal tracking.
- Business events affect operations.
Attaching alert groups
When you attach alert groups to incidents:
- Related alerts are correlated together.
- Technical context is preserved with the incident.
- You can track which alerts contributed to the incident.
Attach alert groups during incident creation or add them later as you identify related alerts.
Configure your incident workflow
Customize severity levels, statuses, and labels to match your organization’s process.
Severity levels
IRM lets you define custom severity levels. Design them based on your SLAs and team capacity.
Refer to the following configuration example:
This is just an example. Create severity levels that reflect your operational requirements.
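As one way to picture a severity scheme, the sketch below models three levels with an acknowledgment target each. The level names, ranks, and targets are all hypothetical placeholders, not IRM defaults. Replace them with values derived from your own SLAs.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Severity:
    name: str
    rank: int              # higher means more severe
    ack_target: timedelta  # hypothetical acknowledgment target


# Hypothetical levels and targets, for illustration only.
SEVERITIES = {
    s.name: s
    for s in (
        Severity("minor", 1, timedelta(hours=1)),
        Severity("major", 2, timedelta(minutes=15)),
        Severity("critical", 3, timedelta(minutes=5)),
    )
}


def ack_deadline(declared_at: datetime, severity_name: str) -> datetime:
    """Return the time by which the incident should be acknowledged."""
    return declared_at + SEVERITIES[severity_name].ack_target


print(ack_deadline(datetime(2024, 1, 1, 10, 0), "major"))  # 2024-01-01 10:15:00
```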
Status progression
Define statuses that reflect your incident management process.
For example:
- Declared: Incident created, initial response starting.
- Acknowledged: Responders engaged, investigation underway.
- Mitigated: Impact reduced, full resolution pending.
- Resolved: Incident fully resolved.
- Closed: Post-incident activities complete.
Design statuses based on your team’s workflow and reporting needs.
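A status progression like the example above can be treated as an ordered list. The sketch below assumes that incidents only move forward and may skip statuses. That rule is an assumption for illustration, not documented IRM behavior.

```python
# Example statuses from this guide, in order.
STATUSES = ["Declared", "Acknowledged", "Mitigated", "Resolved", "Closed"]


def can_transition(current: str, target: str) -> bool:
    """Allow forward moves only (assumption); raises ValueError for unknown statuses."""
    return STATUSES.index(target) > STATUSES.index(current)


def advance(current: str, target: str) -> str:
    """Return the new status, or raise if the move goes backwards or nowhere."""
    if not can_transition(current, target):
        raise ValueError(f"cannot move from {current} to {target}")
    return target


print(can_transition("Declared", "Mitigated"))     # True
print(can_transition("Resolved", "Acknowledged"))  # False
```

If your process allows reopening incidents, replace the ordering check with an explicit map of allowed transitions.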
Incident labels
Labels enable filtering, routing, and analytics for incidents.
Label sources:
- Static labels: Set at the integration level, applied to all incidents from that source.
- Dynamic labels: Transferred from alert groups when declaring an incident.
- Manual labels: Added during the incident lifecycle as new information emerges.
Essential labels:
- service_name: The affected service (required for Service Center).
- severity: Incident severity level.
- team: Responsible team.
- environment: Production, staging, and so on.
Label flow:
Alert Rule Labels → Alert Group Labels → Incident Labels
        ↓                   ↓                   ↓
   (automatic)         (templates)          (manual)
Labels flow through the lifecycle, with each stage able to add or modify labels.
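The label flow can be pictured as merging three dictionaries in order. The sketch below assumes that a later stage overrides an earlier one when both set the same key. The function name and sample values are hypothetical.

```python
from __future__ import annotations


def resolve_incident_labels(
    alert_rule_labels: dict[str, str],
    template_labels: dict[str, str],
    manual_labels: dict[str, str],
) -> dict[str, str]:
    """Merge labels stage by stage; later stages win on conflicts (assumption)."""
    labels = dict(alert_rule_labels)  # automatic: carried over from the alert rule
    labels.update(template_labels)    # templates: set on the alert group
    labels.update(manual_labels)      # manual: added during the incident
    return labels


labels = resolve_incident_labels(
    {"service_name": "checkout", "severity": "minor"},
    {"team": "payments", "environment": "production"},
    {"severity": "major"},
)
print(labels)
# {'service_name': 'checkout', 'severity': 'major', 'team': 'payments', 'environment': 'production'}
```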
Service Center integration
Service Center provides a unified view of operational health by connecting alerts, alert groups, incidents, and SLOs.
The service_name label
The service_name label ties everything together in Service Center:
- Alerts with service_name appear in that service’s view.
- Alert groups inherit service_name from alerts.
- Incidents inherit service_name from alert groups.
- SLOs are associated with services.
Best practice: Ensure service_name is consistently applied across all alerts.
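A simple audit can catch alerts that lack the label. The sketch below assumes alert definitions are available as dictionaries with `name` and `labels` keys. That shape is hypothetical, so adapt it to however you export your alert rules.

```python
from __future__ import annotations


def alerts_missing_service_name(alerts: list[dict]) -> list[str]:
    """Return names of alerts with a missing or blank service_name label."""
    missing = []
    for alert in alerts:
        value = alert.get("labels", {}).get("service_name")
        if not isinstance(value, str) or not value.strip():
            missing.append(alert["name"])
    return missing


# Hypothetical alert definitions.
alerts = [
    {"name": "HighErrorRate", "labels": {"service_name": "checkout"}},
    {"name": "DiskAlmostFull", "labels": {"team": "platform"}},
    {"name": "SlowQueries", "labels": {"service_name": "  "}},
]
print(alerts_missing_service_name(alerts))  # ['DiskAlmostFull', 'SlowQueries']
```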
Benefits
- Unified view: See all operational activity for a service in one place.
- On-call handoffs: Review recent incidents during shift changes.
- Operational reviews: Analyze trends and patterns per service.
- SLO correlation: Connect incidents to SLO impact.
Enabling Service Center
- Define services in Service Center.
- Ensure alerts include service_name labels.
- Configure label templates to preserve service_name.
- Verify incidents appear in Service Center views.
During an incident
Keep stakeholders informed and coordinate response throughout the incident lifecycle.
Manage incidents from Slack
The Slack integration helps your teams coordinate incident response. Benefits include:
- Dedicated channels: Create incident-specific channels for coordination.
- Channel naming: Use consistent prefixes like #inc- for easy identification.
- Automated updates: Post status changes to incident channels.
- Timeline sync: Activity in Slack appears in the incident timeline.
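A consistent naming convention is easier to keep when names are generated. The sketch below builds a channel name from a hypothetical incident ID and title, using the inc- prefix mentioned above. It assumes Slack's channel naming rules (lowercase letters, digits, hyphens, and underscores, up to 80 characters). The helper is an illustration, not part of the integration.

```python
import re


def incident_channel_name(incident_id: int, title: str, prefix: str = "inc-") -> str:
    """Build a Slack-safe channel name such as inc-42-checkout-api-5xx-spike."""
    # Collapse every run of characters outside a-z and 0-9 into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    name = f"{prefix}{incident_id}-{slug}" if slug else f"{prefix}{incident_id}"
    # Truncate to the assumed 80-character limit and drop any trailing hyphen.
    return name[:80].rstrip("-")


print(incident_channel_name(42, "Checkout API: 5xx spike!"))
# inc-42-checkout-api-5xx-spike
```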
Communication and announcements
Configure incident announcements to:
- Notify stakeholders when incidents are declared.
- Provide status updates during response.
- Communicate resolution to affected parties.
Best practice: Define announcement templates for consistency across incidents.
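One way to keep announcements consistent is to render them from a single template. The sketch below uses Python's standard `string.Template`. The wording and field names are hypothetical and not tied to any IRM announcement format.

```python
from string import Template

# Hypothetical template; adjust wording and fields to your process.
ANNOUNCEMENT = Template("[$severity] $title is now $status. $summary")


def render_announcement(severity: str, title: str, status: str, summary: str) -> str:
    """Fill the shared template so every update has the same shape."""
    return ANNOUNCEMENT.substitute(
        severity=severity.upper(), title=title, status=status, summary=summary
    )


print(render_announcement(
    "major", "Checkout API errors", "Mitigated",
    "Error rate is back to normal; monitoring continues.",
))
# [MAJOR] Checkout API errors is now Mitigated. Error rate is back to normal; monitoring continues.
```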
Status updates
Update incident status as the situation evolves:
- Change severity if impact assessment changes.
- Progress through statuses as you move from investigation to mitigation to resolution.
- Add timeline entries to document key decisions and actions.
After resolution
Complete post-incident activities to improve future response.
Resolution notes
Add resolution notes to document:
- Root cause of the incident.
- Steps taken to resolve.
- Lessons learned.
Resolution notes build institutional knowledge and improve future response.
Incident review
After resolution, complete the incident record:
- Finalize the incident timeline.
- Add resolution notes.
- Attach all relevant alert groups.
- Update labels for accurate analytics.
Analytics and reporting
Use incident data for operational insights:
- Trend analysis: Identify recurring issues.
- Response metrics: Track MTTR (Mean Time to Resolve) and MTTA (Mean Time to Acknowledge).
- Service health: Correlate incidents with SLO performance.
- Capacity planning: Understand incident frequency and impact.
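The response metrics can be computed from incident timestamps. The sketch below assumes that both MTTA and MTTR are measured from the moment the incident is declared. The `IncidentTimes` record and the sample data are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class IncidentTimes:
    declared_at: datetime
    acknowledged_at: datetime
    resolved_at: datetime


def mtta_minutes(incidents: list[IncidentTimes]) -> float | None:
    """Mean Time to Acknowledge in minutes, or None when there is no data."""
    gaps = [(i.acknowledged_at - i.declared_at).total_seconds() / 60 for i in incidents]
    return mean(gaps) if gaps else None


def mttr_minutes(incidents: list[IncidentTimes]) -> float | None:
    """Mean Time to Resolve in minutes, or None when there is no data."""
    gaps = [(i.resolved_at - i.declared_at).total_seconds() / 60 for i in incidents]
    return mean(gaps) if gaps else None


# Hypothetical incidents: acknowledged after 5 and 15 minutes, resolved after 60 and 40.
history = [
    IncidentTimes(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 5), datetime(2024, 1, 1, 11, 0)),
    IncidentTimes(datetime(2024, 1, 2, 12, 0), datetime(2024, 1, 2, 12, 15), datetime(2024, 1, 2, 12, 40)),
]
print(mtta_minutes(history))  # 10.0
print(mttr_minutes(history))  # 50.0
```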
Continuous improvement
Leverage incident insights to improve your systems:
- Alert quality: Reduce noise by tuning thresholds and grouping.
- Escalation chains: Speed response with better notification paths.
- Runbooks: Improve documentation based on resolution patterns.
- Monitoring: Enable earlier detection of similar issues.
Best practices summary
- Understand the difference: Use incidents for coordination, alert groups for routine alerts.
- Automate when appropriate: Use escalation steps for critical alerts that always need incidents.
- Apply consistent labels: Especially service_name for Service Center integration.
- Configure your workflow: Design severity levels and statuses for your organization.
- Communicate proactively: Keep stakeholders informed throughout the lifecycle.
- Document resolutions: Add resolution notes for future learning.
- Review and improve: Use incident data to drive continuous improvement.
Next steps
- Configure incident settings for your organization
- Incident management workflows
- Slack integration for chat-based response
- Configure labels for incident tracking



