Sumo Logic alerting
Grafana alerting lets you define rules that continuously evaluate your Sumo Logic data and send notifications when conditions are met. For example, you can create a rule that fires when CPU usage exceeds a threshold or when error rates spike beyond normal levels. The Sumo Logic data source supports alerting through its backend query implementation, which allows Grafana to evaluate queries server-side on a schedule.
For general information about Grafana alerting, refer to Alerting.
Before you begin
- Configure the Sumo Logic data source.
- Familiarize yourself with Grafana alert rules.
- Ensure you have configured at least one contact point for notifications.
Supported query types
Alert rules require queries that return numeric data so Grafana can evaluate them against threshold conditions.
- Metrics queries: Fully supported. Metrics queries return numeric time-series data.
- Aggregated logs queries: Supported. Logs queries that use aggregation operators (such as count, sum, or avg) return numeric results that can be evaluated by alert rules.
- Raw logs queries: Not supported. Non-aggregated log searches return log messages rather than numeric data, which Grafana can't evaluate as alert conditions.
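To illustrate the distinction, the two hypothetical queries below search the same source category (prod/app is an assumed name, not a required value). Only the second, which ends in an aggregation, returns a number that an alert rule can evaluate:

_sourceCategory=prod/app "ERROR"

This raw search returns matching log messages and can't be used in an alert rule.

_sourceCategory=prod/app "ERROR" | count

This aggregated search returns a single numeric count, which Grafana can compare against a threshold.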
Create an alert rule
To create an alert rule using the Sumo Logic data source:
- Navigate to Alerting > Alert rules.
- Click New alert rule.
- Enter a name for the alert rule.
- Select the Sumo Logic data source.
- Write a query that returns numeric data. Use either a metrics query or an aggregated logs query.
- Configure the alert condition by selecting a reducer (for example, Last) and setting a threshold.
- Set the evaluation interval and pending period.
- Configure notification policies and contact points as needed.
- Click Save rule.
For more details on configuring alert rules, refer to Create alert rules.
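Conceptually, the alert condition in the steps above reduces the numeric series returned by your query to a single value, then compares that value to the threshold on every evaluation interval. The following Python sketch illustrates that logic; the function and reducer names are illustrative only, not Grafana internals:

```python
def evaluate_alert(series, reducer, threshold):
    """Reduce a numeric series to one value and test it against a threshold.

    Illustrative only: mimics how a reducer such as Last, Mean, or Max
    turns a query result into a single number for the alert condition.
    """
    reducers = {
        "last": lambda s: s[-1],
        "mean": lambda s: sum(s) / len(s),
        "max": max,
    }
    value = reducers[reducer](series)
    # The rule fires when the reduced value exceeds the threshold.
    return value > threshold

# Example: error counts from an aggregated logs query over three intervals.
fired = evaluate_alert([3, 7, 12], "last", 10)
print(fired)  # True: the last value, 12, exceeds the threshold of 10
```

In Grafana itself, this reduce-and-compare step is configured in the alert rule's expression section rather than written as code.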
Alert rule query examples
The following examples show queries suitable for alert rules.
Metrics query examples
Alert when average CPU idle time drops below a threshold:
metric=cpu_idle | avg by host

Monitor memory usage across hosts:

metric=mem_used_percent | max by host

Track the rate of HTTP errors:

metric=http_errors | rate increasing | sum

Monitor disk usage:

metric=disk_used_percent mount_point=/ | max by host

Detect network latency spikes:

metric=http_response_time | avg by service | where _value > 500

Aggregated logs query examples

Alert when error count exceeds a threshold:

_sourceCategory=prod/app "ERROR" | count

Monitor failed login attempts:

_sourceCategory=auth action=login status=failure | count by _sourceHost

Track 5xx HTTP response rates:

_sourceCategory=prod/web status_code >= 500 | count

Alert on high average response times from parsed log fields:

_sourceCategory=prod/api | parse "duration=*ms" as duration | avg(duration)

Monitor queue depth from application logs:

_sourceCategory=prod/worker | parse "queue_size=*" as queue_size | max(queue_size) by queue_name

Detect unusual volumes of specific log patterns:
_sourceCategory=prod/app "OutOfMemoryError" | count by _sourceHost


