Important: This documentation is about an older version. It's relevant only to the release noted; many of the features and functions have been updated or replaced. Please view the current version.

Enterprise

Use the overrides exporter

Note

Self-monitoring is an experimental feature. As such, the configuration settings, command line flags, or specifics of the implementation are subject to change.

Overview

Since version 1.4, Grafana Enterprise Metrics (GEM) can record self-monitoring metrics directly, so you can monitor the health and stability of GEM itself. A key use of self-monitoring is seeing which resources the tenants in your cluster are using and when they are close to exceeding their limits on those resources. Since version 2.0.0, GEM enables self-monitoring by default. Some additional configuration is required when running in microservices mode.

To tell how close tenants are to exceeding their limits, you need to run the GEM overrides-exporter component. This component exposes per-tenant limits as metrics, whether they are set in the runtime configuration file or through the Admin API. If you use Jsonnet or Helm to run GEM, the overrides-exporter already runs by default. Otherwise, follow the instructions below to run it.

Configuration

The overrides-exporter requires the following settings in the GEM configuration file (or their corresponding CLI flags). If you are already using self-monitoring, the overrides-exporter should be able to use the same configuration file as your other GEM components.

enterprise-metrics.yaml:

# NOTE: This is a minimal configuration file; more settings are needed to run GEM correctly!

# Enable self-monitoring so that the overrides-exporter can write metrics directly to GEM
instrumentation:
  distributor_client:
    address: "dns:///enterprise-metrics:9095"

# Limits set via the "runtime config" file are a potential source of limits for the exporter to expose as metrics
runtime_config:
  file: /etc/enterprise-metrics/runtime-config.yaml

# Limits set via the Admin API are a potential source of limits for the exporter to expose as metrics
admin_client:
  storage:
    type: s3
    s3:
      endpoint: s3.example.com
      bucket_name: admin
      access_key_id: enterprise-metrics
      secret_access_key: supersecret
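
For reference, a runtime configuration file with a single per-tenant override could be created as in the sketch below. This is an illustration rather than a complete file: it assumes the runtime configuration uses a top-level overrides map keyed by tenant ID, the output path is hypothetical, and the tenant name team-b and the value 300001 mirror the sample output in the Testing section.

```shell
#!/bin/sh
# Hypothetical output path for this sketch. The configuration above points
# runtime_config.file at /etc/enterprise-metrics/runtime-config.yaml.
RUNTIME_CONFIG="${RUNTIME_CONFIG:-./runtime-config.yaml}"

# Assumed layout: a top-level "overrides" map keyed by tenant ID, with limit
# names matching the limit_name labels exported by the overrides-exporter.
cat > "$RUNTIME_CONFIG" <<'EOF'
overrides:
  team-b:
    max_global_series_per_user: 300001
EOF
```

Limits that are not listed for a tenant fall back to the defaults, which is why the exporter publishes defaults and overrides as separate metrics.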

After confirming that your GEM configuration has the sections required for the overrides-exporter (in addition to the other sections required to run GEM), start the overrides-exporter with the following command:

enterprise-metrics -target=overrides-exporter -config.file=/etc/enterprise-metrics/enterprise-metrics.yaml

Testing

You can verify that the overrides-exporter is making tenant limits available by sending an HTTP request to its metrics endpoint.

Verify that default limits are being exported:

$ curl http://example/metrics | grep cortex_limits_defaults
# HELP cortex_limits_defaults Resource limit defaults for tenants without overrides
# TYPE cortex_limits_defaults gauge
cortex_limits_defaults{limit_name="ingestion_burst_size"} 350000
cortex_limits_defaults{limit_name="ingestion_rate"} 350000
cortex_limits_defaults{limit_name="max_fetched_chunk_bytes_per_query"} 0
cortex_limits_defaults{limit_name="max_fetched_series_per_query"} 0
cortex_limits_defaults{limit_name="max_global_series_per_metric"} 300000
cortex_limits_defaults{limit_name="max_global_series_per_user"} 300000
cortex_limits_defaults{limit_name="max_global_exemplars_per_user"} 0
cortex_limits_defaults{limit_name="max_local_series_per_metric"} 0
cortex_limits_defaults{limit_name="max_local_series_per_user"} 0
cortex_limits_defaults{limit_name="max_series_per_query"} 100000
cortex_limits_defaults{limit_name="ruler_max_rule_groups_per_tenant"} 0
cortex_limits_defaults{limit_name="ruler_max_rules_per_rule_group"} 0

Verify that per-tenant limit overrides are being exported:

$ curl http://example/metrics | grep cortex_limits_overrides
# HELP cortex_limits_overrides Resource limit overrides applied to tenants
# TYPE cortex_limits_overrides gauge
cortex_limits_overrides{limit_name="ingestion_burst_size",user="team-b"} 350000
cortex_limits_overrides{limit_name="ingestion_rate",user="team-b"} 350000
cortex_limits_overrides{limit_name="max_fetched_chunk_bytes_per_query",user="team-b"} 0
cortex_limits_overrides{limit_name="max_fetched_series_per_query",user="team-b"} 0
cortex_limits_overrides{limit_name="max_global_series_per_metric",user="team-b"} 300000
cortex_limits_overrides{limit_name="max_global_series_per_user",user="team-b"} 300001
cortex_limits_overrides{limit_name="max_global_exemplars_per_user",user="team-b"} 0
cortex_limits_overrides{limit_name="max_local_series_per_metric",user="team-b"} 0
cortex_limits_overrides{limit_name="max_local_series_per_user",user="team-b"} 0
cortex_limits_overrides{limit_name="max_series_per_query",user="team-b"} 100000
cortex_limits_overrides{limit_name="ruler_max_rule_groups_per_tenant",user="team-b"} 0
cortex_limits_overrides{limit_name="ruler_max_rules_per_rule_group",user="team-b"} 0
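
The two checks above can be combined to show only the overrides that actually differ from the defaults. The following is a minimal sketch, not part of GEM: the function name changed_overrides is hypothetical, and it assumes awk is available and that the metric lines have the shape shown in the sample output above.

```shell
#!/bin/sh
# changed_overrides: read Prometheus text-format metrics on stdin and print
# every per-tenant override whose value differs from the matching default.
changed_overrides() {
  awk '
    # Remember each default value, keyed by its limit_name label.
    /^cortex_limits_defaults[{]/ {
      name = $0
      sub(/^.*limit_name="/, "", name)
      sub(/".*$/, "", name)
      defaults[name] = $NF
      next
    }
    # Collect overrides; compare later, because defaults may appear after them.
    /^cortex_limits_overrides[{]/ {
      n++
      name = $0
      sub(/^.*limit_name="/, "", name)
      sub(/".*$/, "", name)
      user = $0
      sub(/^.*user="/, "", user)
      sub(/".*$/, "", user)
      o_name[n] = name
      o_user[n] = user
      o_value[n] = $NF
    }
    END {
      for (i = 1; i <= n; i++) {
        name = o_name[i]
        # Compare numerically so that 300000 and 3e+05 count as equal.
        if ((name in defaults) && (o_value[i] + 0 != defaults[name] + 0)) {
          printf "%s %s default=%s override=%s\n", o_user[i], name, defaults[name], o_value[i]
        }
      }
    }
  '
}
```

To use it against a running exporter, pipe the metrics endpoint into the function, for example: curl -s http://example/metrics | changed_overrides. With the sample output above, the only line printed would be: team-b max_global_series_per_user default=300000 override=300001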

After the overrides-exporter is running, your per-tenant self-monitoring dashboards should be fully functional!