Grafana Enterprise Metrics runbooks
This document contains runbooks specific to Grafana Enterprise Metrics (GEM), extending the Mimir runbooks. These runbooks provide troubleshooting procedures for GEM-specific alerts and components.
Alerts
GEMFederationFrontendRemoteClusterErrors
This alert fires when the federation-frontend is receiving a high error rate (>1%) from a remote cluster over a 15-minute period.
How it works:
- The federation-frontend delegates queries to remote clusters, which execute them and return the results
 - The alert triggers when more than 1% of requests to a specific remote cluster result in server errors (5xx) over a 15-minute window
 - If partial responses are disabled (default configuration), clients querying the federation-frontend receive errors
 - If partial responses are enabled, responses are incomplete but still returned to clients
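
The threshold logic described above can be sketched in a few lines. This is only an illustration of the arithmetic, not the alert's actual rule expression: the function names, the cluster names, and the shape of the input (request counts per status code over one 15-minute window) are assumptions made for the example.

```python
from typing import Dict, List, Mapping

# 1% threshold, as stated in the alert description above.
ERROR_RATIO_THRESHOLD = 0.01


def server_error_ratio(requests_by_status: Mapping[int, int]) -> float:
    """Return the fraction of requests that ended in a 5xx status code."""
    total = sum(requests_by_status.values())
    if total == 0:
        # No traffic in the window means there is nothing to alert on.
        return 0.0
    errors = sum(
        count
        for status, count in requests_by_status.items()
        if 500 <= status <= 599
    )
    return errors / total


def clusters_over_threshold(
    window_counts: Mapping[str, Mapping[int, int]],
) -> List[str]:
    """Return the remote clusters whose 5xx ratio exceeds the threshold."""
    return sorted(
        cluster
        for cluster, counts in window_counts.items()
        if server_error_ratio(counts) > ERROR_RATIO_THRESHOLD
    )


# Hypothetical request counts for one 15-minute window.
example_window: Dict[str, Dict[int, int]] = {
    "cluster-a": {200: 995, 503: 5},           # 0.5% server errors
    "cluster-b": {200: 960, 400: 10, 502: 30}, # 3% server errors
}
```

With these hypothetical counts, only `cluster-b` exceeds the threshold. Client errors (4xx) count toward the total but not toward the error numerator, matching the "server errors (5xx)" wording above.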
 
How to investigate:
- Check the federation-frontend logs for detailed error messages about the failing requests to the remote cluster
 - Check the health of the remote cluster:
   - Look for any ongoing alerts in the remote cluster
   - Check resource utilization (CPU, memory, disk)
   - Verify that the remote cluster’s query path components are healthy
 
 - Check network connectivity:
   - Verify network connectivity between clusters
   - Check for any firewall or security group changes
   - Ensure DNS resolution is working correctly
 
 - Monitor the error rate and request patterns on the Mimir / Federation-frontend dashboard:
   - Look at the “Remote requests / sec by request type” panel
   - Check the error rates by remote cluster
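
For the DNS and connectivity checks above, a small probe run from a host in the same network as the federation-frontend can help separate name-resolution failures from blocked or closed ports. The following is a minimal sketch using only the Python standard library. The function name `check_endpoint` is hypothetical, and the host and port to probe depend on how the remote cluster is exposed in your environment.

```python
import socket


def check_endpoint(host: str, port: int, timeout: float = 3.0) -> str:
    """Resolve host, then attempt a TCP connection to each resolved address.

    Returns "ok", "dns-failure: ..." or "connect-failure: ...".
    """
    try:
        # Step 1: DNS resolution (may return several IPv4/IPv6 addresses).
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as err:
        return f"dns-failure: {err}"

    last_error = None
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            # Step 2: plain TCP connect; no HTTP or TLS is attempted here.
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            return "ok"
        except OSError as err:
            # Covers refused connections, timeouts and unreachable networks.
            last_error = err
    return f"connect-failure: {last_error}"
```

A `dns-failure` result points at name resolution, while a `connect-failure` (refused or timed out) is more consistent with a firewall, security group, or routing change. An `ok` result only shows that the port accepts TCP connections; it does not confirm that the remote cluster answers queries correctly.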
 
 
Common causes and solutions:
Remote cluster is overloaded:
- Check the remote cluster’s resource utilization
 - Consider scaling up the remote cluster’s query path components
 
Network connectivity issues:
- Verify network paths between clusters
 - Check for any recent network infrastructure changes
 - Ensure all required ports are open between clusters
 
Authentication/Authorization issues:
- Verify that the federation-frontend’s credentials for the remote cluster are valid. See Cross-cluster query federation for setting up authentication with GEM.
 - Check if any authentication tokens or certificates have expired
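
If the remote cluster is reached over TLS, one way to check for an expiring server certificate is to read its `notAfter` field. The sketch below uses only the Python standard library. The helper names are hypothetical, and it covers only the certificate presented by the remote endpoint; it does not validate GEM access tokens, which need to be checked separately.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return the days left before a certificate's notAfter timestamp.

    not_after uses the format returned by SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT". A negative result means
    the certificate has already expired.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host:port over TLS and return the certificate's notAfter.

    The default context verifies the certificate, so an already expired
    certificate raises ssl.SSLCertVerificationError instead of returning,
    which is itself a useful signal.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Calling `days_until_expiry(fetch_not_after(host))` with the remote cluster's hostname gives the remaining validity in days. A small or negative value suggests that certificate renewal is worth investigating before looking at other causes.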
 



