Important: This documentation is about an older version. It's relevant only to the release noted; many of the features and functions have been updated or replaced. Please view the current version.
Grafana Enterprise Metrics runbooks
This document contains runbooks specific to Grafana Enterprise Metrics (GEM), extending the Mimir runbooks. These runbooks provide troubleshooting procedures for GEM-specific alerts and components.
Alerts
GEMFederationFrontendRemoteClusterErrors
This alert fires when the federation-frontend is receiving a high error rate (>1%) from a remote cluster over a 15-minute period.
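The alert condition is a ratio check for each remote cluster. The following Python sketch only illustrates that check. It is not the alert's actual rule expression. It assumes you already have the 5xx count and the total request count for each remote cluster, aggregated over the 15-minute window. The `RemoteClusterWindow` type and the function names are hypothetical.

```python
from dataclasses import dataclass

# Threshold from the alert description: more than 1% of requests fail.
ERROR_RATIO_THRESHOLD = 0.01


@dataclass
class RemoteClusterWindow:
    """Hypothetical request counts for one remote cluster over a 15-minute window."""
    cluster: str
    total_requests: int
    server_errors: int  # responses with a 5xx status code


def error_ratio(window: RemoteClusterWindow) -> float:
    """Fraction of requests that failed with a server error (0.0 if there was no traffic)."""
    if window.total_requests == 0:
        return 0.0
    return window.server_errors / window.total_requests


def should_fire(window: RemoteClusterWindow) -> bool:
    # Strictly greater than 1%, matching ">1%" in the alert description.
    return error_ratio(window) > ERROR_RATIO_THRESHOLD


windows = [
    RemoteClusterWindow("cluster-a", total_requests=10_000, server_errors=50),   # 0.5%
    RemoteClusterWindow("cluster-b", total_requests=10_000, server_errors=250),  # 2.5%
]
firing = [w.cluster for w in windows if should_fire(w)]
print(firing)  # ['cluster-b']
```

Because the ratio is evaluated separately for each remote cluster, a single unhealthy cluster can trigger the alert even when overall federated traffic looks mostly healthy.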
How it works:
- The federation-frontend delegates queries to remote clusters, which execute them
- The alert triggers when more than 1% of requests to a specific remote cluster result in server errors (5xx) over a 15-minute window
- If partial responses are disabled (default configuration), clients querying the federation-frontend receive errors
- If partial responses are enabled, responses are incomplete but still returned to clients
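The following Python sketch shows how the two partial-response modes differ when one remote cluster fails. It is not the federation-frontend's implementation. `federate`, `RemoteClusterError`, and the shape of the per-cluster results are hypothetical, and each remote result is simplified to a list of series names.

```python
from __future__ import annotations


class RemoteClusterError(Exception):
    """Hypothetical error surfaced to the client when a remote cluster fails."""


def federate(
    results: dict[str, list[str] | Exception],
    partial_responses_enabled: bool = False,
) -> tuple[list[str], list[str]]:
    """Merge per-cluster results into one response plus a list of warnings."""
    merged: list[str] = []
    warnings: list[str] = []
    for cluster, result in results.items():
        if isinstance(result, Exception):
            if not partial_responses_enabled:
                # Default behavior: one failing remote cluster fails the whole query.
                raise RemoteClusterError(
                    f"remote cluster {cluster!r} failed: {result}"
                ) from result
            # Partial responses: drop the failing cluster's data and warn instead.
            warnings.append(f"partial response: remote cluster {cluster!r} failed")
            continue
        merged.extend(result)
    return merged, warnings


results = {
    "cluster-a": ["series_1", "series_2"],
    "cluster-b": RuntimeError("HTTP 503 Service Unavailable"),
}
data, warnings = federate(results, partial_responses_enabled=True)
print(data)      # ['series_1', 'series_2']
print(warnings)  # ["partial response: remote cluster 'cluster-b' failed"]
```

Calling `federate(results)` with the default `partial_responses_enabled=False` raises `RemoteClusterError` instead. This matches the case where clients see errors while the alert is firing.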
How to investigate:
- Check the federation-frontend logs for detailed error messages about the failing requests to the remote cluster
- Check the health of the remote cluster:
  - Look for any ongoing alerts in the remote cluster
  - Check resource utilization (CPU, memory, disk)
  - Verify that the remote cluster’s query path components are healthy
- Check network connectivity:
  - Verify network connectivity between clusters
  - Check for any firewall or security group changes
  - Ensure DNS resolution is working correctly
- Monitor the error rate and request patterns on the Mimir / Federation-frontend dashboard:
  - Look at the “Remote requests / sec by request type” panel
  - Check the error rates by remote cluster
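A small script can help separate DNS failures from TCP connectivity failures. Run it from the same network location as the federation-frontend. This Python sketch uses only the standard library `socket` module. The function names are hypothetical, and you supply the remote cluster's host and port from your own federation configuration.

```python
from __future__ import annotations

import socket


def resolve(host: str, port: int) -> list[str]:
    """Return the distinct IP addresses that `host` resolves to (DNS check)."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def check_remote(host: str, port: int, timeout: float = 3.0) -> str:
    """Summarize DNS and TCP reachability for a remote cluster endpoint."""
    try:
        addresses = resolve(host, port)
    except socket.gaierror as err:
        return f"DNS resolution failed for {host}: {err}"
    if can_connect(host, port, timeout):
        return f"{host}:{port} reachable via {', '.join(addresses)}"
    return f"{host}:{port} resolves to {', '.join(addresses)} but TCP connect failed"
```

For example, `print(check_remote("mimir-remote.example.com", 443))` reports which layer is failing. Replace the placeholder hostname with your remote cluster's address. A successful TCP connection does not rule out HTTP-level or authentication errors, so continue with the federation-frontend logs if both checks pass.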
Common causes and solutions:
Remote cluster is overloaded:
- Check the remote cluster’s resource utilization
- Consider scaling up the remote cluster’s query path components
Network connectivity issues:
- Verify network paths between clusters
- Check for any recent network infrastructure changes
- Ensure all required ports are open between clusters
Authentication/Authorization issues:
- Verify that the federation-frontend’s credentials for the remote cluster are valid. See Cross-cluster query federation for setting up authentication with GEM.
- Check if any authentication tokens or certificates have expired
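If the connection to the remote cluster uses TLS, the following Python sketch shows one way to check how long the served certificate remains valid. It uses the standard library `ssl` module. The function names are hypothetical, and the host is whatever endpoint your federation configuration points at. The sketch does not check GEM access tokens. With default verification, an already-expired certificate fails the handshake with `ssl.SSLCertVerificationError`, which is itself a strong signal.

```python
from __future__ import annotations

import socket
import ssl
import time


def seconds_until_expiry(not_after: str, now: float | None = None) -> float:
    """Seconds until a certificate's notAfter timestamp; negative means expired."""
    expires_at = ssl.cert_time_to_seconds(not_after)  # e.g. "Jan  5 09:34:43 2018 GMT"
    if now is None:
        now = time.time()
    return expires_at - now


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter field of the certificate served by host:port.

    Uses default verification, so an expired or untrusted certificate raises
    ssl.SSLCertVerificationError during the handshake instead of returning.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return cert["notAfter"]
```

For example, `seconds_until_expiry(fetch_not_after("mimir-remote.example.com")) / 86400` gives the remaining validity in days. Replace the placeholder hostname with your remote cluster's address.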