Grafana Enterprise Metrics runbooks
This document contains runbooks specific to Grafana Enterprise Metrics (GEM), extending the Mimir runbooks. These runbooks provide troubleshooting procedures for GEM-specific alerts and components.
Alerts
GEMFederationFrontendRemoteClusterErrors
This alert fires when the federation-frontend is receiving a high error rate (>1%) from a remote cluster over a 15-minute period.
How it works:
- The federation-frontend delegates queries to remote clusters, which execute them
- The alert triggers when more than 1% of requests to a specific remote cluster result in server errors (5xx) over a 15-minute window
- If partial responses are disabled (default configuration), clients querying the federation-frontend receive errors
- If partial responses are enabled, responses are incomplete but still returned to clients
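The threshold logic described above can be sketched as follows. This is a minimal illustration, not the actual alert expression, which this runbook does not show. The `ClusterWindow` type and its field names are hypothetical, and the counts are assumed to have been collected over the 15-minute window already.

```python
from dataclasses import dataclass

ERROR_RATIO_THRESHOLD = 0.01  # 1%, as stated in the alert description


@dataclass
class ClusterWindow:
    """Request counts for one remote cluster over a 15-minute window (hypothetical shape)."""
    cluster: str
    total_requests: int
    server_errors: int  # responses with a 5xx status


def error_ratio(window: ClusterWindow) -> float:
    """Fraction of requests that ended in a 5xx; 0.0 when there was no traffic."""
    if window.total_requests == 0:
        return 0.0
    return window.server_errors / window.total_requests


def clusters_over_threshold(windows, threshold=ERROR_RATIO_THRESHOLD):
    """Names of clusters whose error ratio is strictly above the threshold."""
    return [w.cluster for w in windows if error_ratio(w) > threshold]
```

For example, a cluster with 20 server errors out of 1,000 requests (2%) would be flagged, while one with 5 out of 1,000 (0.5%) would not. The ratio is evaluated per remote cluster, so one failing cluster is not masked by healthy traffic to the others.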
How to investigate:
- Check the federation-frontend logs for detailed error messages about the failing requests to the remote cluster
- Check the health of the remote cluster:
  - Look for any ongoing alerts in the remote cluster
  - Check resource utilization (CPU, memory, disk)
  - Verify that the remote cluster’s query path components are healthy
- Check network connectivity:
  - Verify network connectivity between clusters
  - Check for any firewall or security group changes
  - Ensure DNS resolution is working correctly
- Monitor the error rate and request patterns on the Mimir / Federation-frontend dashboard:
  - Look at the “Remote requests / sec by request type” panel
  - Check the error rates by remote cluster
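The DNS and connectivity checks above can be approximated with a short script run from a host or pod on the same network as the federation-frontend. This is a rough sketch using only the Python standard library. The `check_endpoint` helper is hypothetical, and the host and port you pass in are whatever your remote cluster endpoint uses.

```python
import socket


def check_endpoint(host: str, port: int, timeout: float = 3.0) -> dict:
    """Resolve `host`, then attempt a TCP connection to `host:port`.

    Returns a dict describing what worked, so a DNS failure can be told
    apart from a blocked or closed port.
    """
    result = {"host": host, "port": port, "resolved": [], "reachable": False, "error": None}

    # Step 1: DNS resolution.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        result["resolved"] = sorted({info[4][0] for info in infos})
    except socket.gaierror as exc:
        result["error"] = f"DNS resolution failed: {exc}"
        return result

    # Step 2: TCP connection (fails on firewall drops, closed ports, timeouts).
    try:
        with socket.create_connection((host, port), timeout=timeout):
            result["reachable"] = True
    except OSError as exc:
        result["error"] = f"TCP connect failed: {exc}"

    return result
```

Calling `check_endpoint("gem-remote.example.internal", 443)` (a placeholder hostname) returns a result whose `error` field indicates whether the failure happened at the DNS step or the TCP step. A successful TCP connection only shows that the port is open; it does not prove that the remote cluster is answering queries correctly.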
Common causes and solutions:
Remote cluster is overloaded:
- Check the remote cluster’s resource utilization
- Consider scaling up the remote cluster’s query path components
Network connectivity issues:
- Verify network paths between clusters
- Check for any recent network infrastructure changes
- Ensure all required ports are open between clusters
Authentication/Authorization issues:
- Verify that the federation-frontend’s credentials for the remote cluster are valid. See Cross-cluster query federation for setting up authentication with GEM.
- Check if any authentication tokens or certificates have expired
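For the certificate part of the expiry check, a sketch along these lines may help. It assumes the remote cluster endpoint is served over TLS and covers only server certificates, not authentication tokens. The helper names are hypothetical; the `ssl` and `socket` calls are from the Python standard library.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining until a certificate's notAfter date (negative if expired).

    `not_after` uses the format returned by getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter field of the certificate presented by host:port.

    With the default context, an already expired or untrusted certificate
    makes the handshake raise ssl.SSLCertVerificationError, which is itself
    a useful signal during an investigation.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Passing the result of `fetch_not_after(host)` to `days_until_expiry` gives the remaining validity in days. A small or negative value suggests the certificate is a likely cause of the errors.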