Meta monitoring for Cloud
Meta monitoring is the process of monitoring your monitoring system (or alerting system).
Monitor your alerting implementation to understand its health, detect potential issues, and troubleshoot them.
Grafana provides predefined metrics and logs to enable you to meta monitor Grafana Alerting. You can monitor this data in different ways, such as:
- [Optional] Create a Grafana dashboard with a panel that uses these metrics, similar to Alerting Insights.
- [Optional] Create an alert rule in Grafana that checks a metric regularly, just like any other alert rule (see the example query after this list).
- [Optional] Use Explore to query the metrics or logs.
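For example, a minimal sketch of the alert rule approach is a query that fires whenever the failed evaluation rate for your stack is nonzero, using one of the metrics described in the following sections (replace <your_grafana_stack_id> with your own stack identifier):
grafanacloud_grafana_instance_alerting_rule_evaluation_failures_total:rate5m{id="<your_grafana_stack_id>"} > 0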
Before you begin
To explore your alerting metrics and logs, you must:
- Have Admin or Editor user permissions for the managed Grafana Cloud instance.
- Log in to your instance and click the Explore (compass) icon in the sidebar menu.
Explore insights metrics
Alerting meta-monitoring metrics are stored in Prometheus data sources, which are part of your Grafana Cloud stack and are accessible from your Grafana instance.
Note
A single Grafana Cloud account can run multiple Grafana Cloud stacks, all using the same grafanacloud-usage data source. When querying meta-monitoring metrics in the grafanacloud-usage data source, filter by your Grafana stack identifier (id).
For Grafana-managed alerts
Available in the grafanacloud-usage Prometheus data source.
grafanacloud_grafana_instance_alerting_rule_group_rules
The number of alert rules, labeled by Grafana stack (id) and alert rule state (state).
sum by(state) (grafanacloud_grafana_instance_alerting_rule_group_rules{id="<your_grafana_stack_id>"})
The state label can be active or paused.
grafanacloud_grafana_instance_alerting_alerts
The number of alert instances, labeled by Grafana stack (id) and alert instance state (state).
sum by(state) (grafanacloud_grafana_instance_alerting_alerts{id="<your_grafana_stack_id>"})
The state label can be alerting, error, nodata, normal, or pending.
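For example, to focus only on alert instances that are currently failing to evaluate, you can filter on the state label before summing:
sum(grafanacloud_grafana_instance_alerting_alerts{id="<your_grafana_stack_id>", state="error"})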
grafanacloud_grafana_instance_alerting_rule_evaluations_total:rate5m
The per-second rate of alert rule evaluations over the last 5 minutes, labeled by Grafana stack (id).
grafanacloud_grafana_instance_alerting_rule_evaluations_total:rate5m{id="<your_grafana_stack_id>"}
grafanacloud_grafana_instance_alerting_rule_evaluation_failures_total:rate5m
The per-second rate of failed alert rule evaluations over the last 5 minutes, labeled by Grafana stack (id).
grafanacloud_grafana_instance_alerting_rule_evaluation_failures_total:rate5m{id="<your_grafana_stack_id>"}
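The two rates can also be combined. As a sketch, assuming both recorded series carry the same labels for your stack, dividing the failure rate by the total evaluation rate approximates the fraction of evaluations that fail:
grafanacloud_grafana_instance_alerting_rule_evaluation_failures_total:rate5m{id="<your_grafana_stack_id>"} / grafanacloud_grafana_instance_alerting_rule_evaluations_total:rate5m{id="<your_grafana_stack_id>"}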
grafanacloud_grafana_instance_alerting_alertmanager_alerts
The number of alerts received by the Grafana Alertmanager for notification processing, labeled by Grafana stack (id) and alert notification state (state).
sum by(state) (grafanacloud_grafana_instance_alerting_alertmanager_alerts{id="<your_grafana_stack_id>"})
The state label can be active, suppressed, or unprocessed.
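For example, to inspect only alerts that the Alertmanager has received but not yet processed, filter on the state label:
grafanacloud_grafana_instance_alerting_alertmanager_alerts{id="<your_grafana_stack_id>", state="unprocessed"}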
grafanacloud_grafana_instance_alerting_silences
The number of silences, labeled by Grafana stack (id) and silence state (state).
sum by(state) (grafanacloud_grafana_instance_alerting_silences{id="<your_grafana_stack_id>"})
The state label can be active, expired, or pending.
For Mimir alerts
Meta-monitoring metrics for Mimir alert rules are stored in the grafanacloud-usage and grafanacloud-<yourstackname>-prom Prometheus data sources.
You can find these metrics in Alerting insights.
- In your Grafana Cloud stack, click Alerts & IRM in the left-side menu.
- Click Alerting.
- On the Alerting landing page, view the Insights tab.
- Select a panel from the Mimir sections.
- Click the menu icon (three-dots).
- Click Explore to view the metrics and the data source queried by the panel.
Explore alerting logs
Alerting logs are stored in Loki data sources, which are part of your Grafana Cloud stack and are accessible from your Grafana instance.
For Grafana-managed alert state changes
Logs related to state changes in Grafana-managed alerts are stored in the grafanacloud-<yourstackname>-alert-state-history Loki data source.
To explore these logs, complete the following steps.
- In Explore, select the grafanacloud-<yourstackname>-alert-state-history Loki data source.
- Use the Loki query editor to find logs. For example:
{from="state-history"} | json
- Click Run query.
- In the Logs section, review specific details about alerts by selecting relevant fields:
- previous: previous alert instance state.
- current: current alert instance state.
- ruleTitle: alert rule title.
- ruleID and ruleUID.
- labels_alertname, labels_new_label, and labels_grafana_folder.
- Additional available fields.
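As a sketch, you can also filter on these fields directly in the query; for example, to show only transitions into the Alerting state (the exact state value in your logs may differ), add a label filter after the json parser:
{from="state-history"} | json | current="Alerting"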
Alternatively, you can access the History page in Grafana to visualize and filter state changes for individual alerts or all alerts.
For Mimir alerts
Logs for Mimir-managed alerts are stored in the grafanacloud-<yourstackname>-usage-insights Loki data source.
These logs help you troubleshoot alerts by providing insight into their notification status, and they display error messages for failing alerts.
To explore these logs, complete the following steps.
- In Explore, select the grafanacloud-<yourstackname>-usage-insights Loki data source.
- Use the Loki query editor to find logs. The following query retrieves all alert logs:
{instance_type="alerts"} | logfmt
- Click Run query.
- In the Logs section, review specific details about alert logs by selecting relevant fields such as msg, alert, or alerts.
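As a sketch, assuming these logfmt lines include a level field (label names can vary), you can narrow the results to error messages only:
{instance_type="alerts"} | logfmt | level="error"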