Check the health status of a collector
If you’ve discovered a degraded collector in your fleet, Grafana Fleet Management can help you diagnose the problem.
Grafana Fleet Management provides a health status indicator so you can see at a glance if your collectors are healthy. You can find a collector’s health status in the Status column on the Inventory tab in the Fleet Management interface.
The health status indicator reflects the current state of the collector:
- Green (Healthy) means the collector is healthy.
- Yellow (Warning) means the collector is potentially unhealthy.
- Red (Error) means the collector is unhealthy.
- Gray (Unknown) means the collector is not reporting data.
To filter your fleet by status, select a status from the Status dropdown on the Inventory tab.
How health status is determined
The health status is controlled by three factors:
- Has the collector made a GetConfig API request in the last 30 minutes?
- Is the collector reporting an up metric with the correct collector_id label?
- Does the collector have any active alerts?
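The three checks can be expressed as simple predicates. The following is a minimal Python sketch, not Fleet Management's implementation. The input shapes (a timestamp for the last GetConfig request, up series as label dictionaries, alerts as label dictionaries) and the helper names recent_getconfig, has_up_metric, and active_alerts are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# From the text: a GetConfig request counts if it happened in the last 30 minutes.
# The input shapes below are illustrative assumptions, not a real API.
GETCONFIG_WINDOW = timedelta(minutes=30)


def recent_getconfig(last_getconfig, now):
    """Check 1: was there a GetConfig request in the last 30 minutes?"""
    # None means the collector has never made a GetConfig request.
    return last_getconfig is not None and now - last_getconfig <= GETCONFIG_WINDOW


def has_up_metric(series, collector_id):
    """Check 2: is there an up series carrying this collector_id label?"""
    return any(
        labels.get("__name__") == "up" and labels.get("collector_id") == collector_id
        for labels in series
    )


def active_alerts(alerts, collector_id):
    """Check 3: active alerts whose collector_id label matches this collector."""
    return [a for a in alerts if a.get("collector_id") == collector_id]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
series = [{"__name__": "up", "collector_id": "edge-01", "job": "alloy"}]
alerts = [{"alertname": "HighMemory", "collector_id": "edge-02"}]

print(recent_getconfig(now - timedelta(minutes=10), now))  # True
print(has_up_metric(series, "edge-01"))                    # True
print(active_alerts(alerts, "edge-01"))                    # []
```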
Note
The health status does not check for configuration errors.
The Fleet Management service fetches active alerts from the Grafana Prometheus instance. You must create alerts in your stack’s Prometheus Alertmanager to make them discoverable by the health status check. Refer to Create new alerts for guidance on labeling alerts and the Alertmanager documentation for tips on using Mimirtool to configure Alertmanager.
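To check which active alerts carry a given collector_id label, one option is the Alertmanager v2 HTTP API. Its GET /api/v2/alerts endpoint accepts an active flag and filter query parameters that take label matchers. This sketch only builds the request URL. The base URL is a placeholder, authentication is omitted, and the helper name active_alerts_url is hypothetical. Mimir-based stacks typically serve this API under an /alertmanager prefix. The text does not say whether Fleet Management itself uses this endpoint, so treat this as a debugging aid only.

```python
from urllib.parse import urlencode


def active_alerts_url(base_url, collector_id):
    """Build an Alertmanager v2 URL that lists active alerts for one collector.

    base_url is a placeholder: use your own stack's Alertmanager endpoint,
    including any path prefix it is served under.
    """
    # The filter value is an Alertmanager label matcher, e.g. collector_id="edge-01".
    params = [("filter", f'collector_id="{collector_id}"'), ("active", "true")]
    return f"{base_url.rstrip('/')}/api/v2/alerts?{urlencode(params)}"


print(active_alerts_url("https://alertmanager.example.com/alertmanager", "edge-01"))
```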
Green status
A green health status indicates that the collector is operational. At minimum, an operational collector:
- Has no active alerts.
- Made a GetConfig API request in the last 30 minutes or reported an up metric.
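The minimum conditions for a green status combine an "and" with an "or". The following hypothetical Python predicate, meets_green_minimum, shows that combination. The argument shapes are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# From the text: a GetConfig request counts if it happened in the last 30 minutes.
GETCONFIG_WINDOW = timedelta(minutes=30)


def meets_green_minimum(active_alert_count, last_getconfig, has_up_metric, now):
    """True if the collector meets the minimum conditions for a green status."""
    # None means the collector has never made a GetConfig request.
    recent_getconfig = (
        last_getconfig is not None and now - last_getconfig <= GETCONFIG_WINDOW
    )
    # No active alerts AND (a recent GetConfig request OR an up metric).
    return active_alert_count == 0 and (recent_getconfig or has_up_metric)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
# No alerts and no recent GetConfig, but an up metric is present: still green.
print(meets_green_minimum(0, now - timedelta(hours=2), True, now))  # True
```

Either signal alone is enough to satisfy the liveness half of the check, but a single active alert rules out green.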
Note
A false-positive healthy status can result if the collector_id label in an alert does not match the id argument in the collector's remotecfg block. If the values do not match, Fleet Management cannot attribute the alerts to the collector. When it checks the state of a mismatched collector, it finds no active alerts.
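The false positive follows from exact label matching. In the following hypothetical scenario, the collector's remotecfg id is edge-01, but the alert was labeled Edge-01. An exact string comparison attributes no alerts to the collector. The alert shape and the alerts_for_collector helper are assumptions that are consistent with the note but are not Fleet Management's code.

```python
def alerts_for_collector(alerts, remotecfg_id):
    """Return the alerts whose collector_id label exactly equals the remotecfg id."""
    # Label values are compared as exact strings: differences in case,
    # whitespace, or punctuation all break the match.
    return [a for a in alerts if a.get("labels", {}).get("collector_id") == remotecfg_id]


firing = [
    {"labels": {"alertname": "CollectorDown", "collector_id": "Edge-01", "severity": "critical"}}
]

print(alerts_for_collector(firing, "edge-01"))  # [] -> nothing attributed, so no alert affects the status
```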
Yellow status
A yellow health status warns that there might be an issue with the collector. It can have several causes:
- Warning: The collector is in either or both of the following states:
  - It has an active, non-critical alert.
  - It has not made a GetConfig request in the last 30 minutes and is not reporting an up metric.
- Warning (Inactive): The collector has not made a GetConfig request in the last 3 hours and is not reporting an up metric. The inactive collector might or might not also have an active, non-critical alert.
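The two yellow variants differ only in how long the collector has been silent. This hypothetical Python helper, yellow_state, checks the 3-hour inactive window first and then the 30-minute warning window. It assumes the collector has no critical alert, because red and gray are evaluated separately. It treats a collector that has never made a GetConfig request as silent. The argument shapes are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

# Windows taken from the text.
WARNING_WINDOW = timedelta(minutes=30)
INACTIVE_WINDOW = timedelta(hours=3)


def yellow_state(last_getconfig, has_up_metric, has_noncritical_alert, now):
    """Return "Warning (Inactive)", "Warning", or None if no yellow state applies."""
    # Time since the last GetConfig request; None means no request was ever seen.
    silence = None if last_getconfig is None else now - last_getconfig

    def silent_for(window):
        # Silent = no up metric and no GetConfig request within the window.
        return not has_up_metric and (silence is None or silence > window)

    if silent_for(INACTIVE_WINDOW):
        return "Warning (Inactive)"  # with or without a non-critical alert
    if has_noncritical_alert or silent_for(WARNING_WINDOW):
        return "Warning"
    return None


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(yellow_state(now - timedelta(minutes=45), False, False, now))  # Warning
print(yellow_state(now - timedelta(hours=4), False, True, now))      # Warning (Inactive)
```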
Note
If your collector is not reporting its own metrics with the collector_id label, you might see a yellow health status even if the collector is healthy. Collectors in your Fleet Management Inventory should automatically be assigned the self_monitoring_metrics pipeline. If you see a yellow health status, make sure this pipeline is active.
Red status
A red health status indicates that the collector has an active, critical alert.
Gray status
A gray health status indicates that the collector has not reported telemetry. It has never had a heartbeat, doesn’t have an up metric within the data retention period, and has no active alerts.
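The red and gray conditions can each be written as a single predicate. The following Python sketch uses the hypothetical helpers is_red and is_gray. It assumes that an alert's criticality comes from a severity label with the value critical. The text does not say how Fleet Management decides whether an alert is critical, so treat that rule as an assumption.

```python
def is_red(alert_severities):
    """Red (Error): at least one active alert is critical."""
    # Assumption: criticality is read from a severity label value of "critical".
    return "critical" in alert_severities


def is_gray(ever_had_heartbeat, up_in_retention, alert_severities):
    """Gray (Unknown): no heartbeat ever, no up metric in retention, no active alerts."""
    return not ever_had_heartbeat and not up_in_retention and not alert_severities


print(is_red(["warning", "critical"]))  # True
print(is_gray(False, False, []))        # True
```

An active alert of any severity is enough to rule out gray, which is why a never-seen collector with a non-critical alert is not reported as Unknown.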
In the next milestone, you’ll learn how to interpret dashboards that show a collector’s internal metrics.