Important: This documentation is about an older version. It's relevant only to the release noted; many of the features and functions have been updated or replaced. Please view the current version.

Recommended limits for GEM

You can configure GEM limits to protect a system from any data sender (be it a machine or a person) who inadvertently sends or queries too much data, which can cause the system to run out of resources and crash. The limits also help ensure quality of service and fairness across your user base.

GEM has the concept of tenants and supports setting limits for tenants using runtime configuration in Grafana Mimir. This is the preferred way to manage your limits and keep your system stable. The GEM Plugin for Grafana allows you to easily set these limits from inside your Grafana installation.

Since GEM 1.5.0, a self-monitoring __system__ tenant is available to help determine some of these limit values. For instructions on how to enable self-monitoring, refer to self-monitoring.

If you do not have self-monitoring enabled or are using an earlier version of GEM, you need to query an external monitoring Prometheus to determine these values.

Note

The terms tenant and user are synonymous within the context of limit names; both terms refer to the same concept.

Global limits and configuration

You can set the following options in the configuration file for all relevant GEM components, or you can set them via CLI flags. Before changing the limits of individual tenants, restart the system so that these settings take effect. This stabilizes the system before you configure it using the API or plugin.

Max inflight ingester push requests

This option is a last-resort limit. If this limit is reached, remote_write requests sent to the write path return 5xx errors, which the sender (Prometheus or Grafana Agent) retries. A value of 5000 is conservative but effective, based on internal testing. If you change this setting on a running system, you need to perform a rolling restart of the ingesters, as you would for any other ingester configuration change.

YAML
ingester:
  instance_limits:
    # CLI flag: -ingester.instance-limits.max-inflight-push-requests=5000
    max_inflight_push_requests: 5000 # default is 0 (unlimited) in GEM 1.5.1 and earlier

When you reach this limit, requests to POST /api/v1/push fail with the message:

cannot push: too many inflight push requests in ingester

Querier max concurrent

This option sets the number of queries that a querier process can run at once. A value of 8 is recommended. If you change this setting on a running system, you need to perform a rolling restart of the queriers, as you would for any other querier configuration change.

YAML
querier:
  # CLI flag: -querier.max-concurrent
  max_concurrent: 8

Default minimal global settings

You can set limits for tenants above these minimal defaults. However, these limits help to keep the system stable if a new tenant is created and its limits have not been explicitly set. To keep things simple, apply these limits on every component, because some components embed other components and can otherwise be missed.

To override them based on the needs of individual tenants, see Tenant limits.

YAML
limits:
  max_global_series_per_user: 1000000
  max_global_series_per_metric: 30000
  ingestion_rate: 142857
  ingestion_burst_size: 142857
  ingestion_rate_strategy: global
  ruler_max_rules_per_rule_group: 20
  ruler_max_rule_groups_per_tenant: 70

Tenant limits

To set the following limits on each of your tenants, use the GEM plugin. Alternatively, you can use runtime configuration in Grafana Mimir.

Max global series per user

max_global_series_per_user

Set the maximum number of series per tenant, which is enforced at the ingester level. This is generally described as the “number of metrics” that a tenant has. Note: The term user in the limit name does not mean a person using the system; in this context, the term is synonymous with tenant.

  • If you are onboarding a new tenant, set this value to a low number such as 10000 to ensure that the system has enough memory and CPU headroom before it starts accepting metrics. To determine if CPU and memory metrics for each component are below the limits that you have set, refer to the GEM system monitoring / Reads resources and GEM system monitoring / Writes resources dashboards. If a customer’s metrics are being discarded, slowly raise this value until the discarded samples go to zero, while checking to ensure that the system still has enough memory and CPU headroom to handle the additional load.
  • If you are retroactively applying limits for a tenant that is sending data, and the system is stable, use the GEM system monitoring / Per-tenant usage dashboard for each tenant to determine the value to set this to. For each tenant, choose its name in the Tenant dropdown at the top of the dashboard, set the dashboard date in the upper right-hand corner to a preferred time frame, and choose a value higher than the maximum value of the Number of active time series graph, which ensures that the senders’ data is not discarded. You might want to add a 10% buffer on top of the maximum number, because imperfect load balancing can leave some ingesters holding slightly more series than others.
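As a minimal sketch of both cases above, assuming the Grafana Mimir runtime configuration format in which per-tenant limits are nested under an overrides key. The tenant IDs and the observed peak of 800000 active series are hypothetical values chosen for illustration.

```yaml
# Runtime configuration sketch; tenant IDs are hypothetical.
overrides:
  # Newly onboarded tenant: start low and raise the value gradually
  # while watching CPU and memory headroom.
  new-tenant:
    max_global_series_per_user: 10000
  # Existing tenant: hypothetical observed peak of 800000 active series.
  # 800000 * 1.10 (10% buffer) = 880000
  existing-tenant:
    max_global_series_per_user: 880000
```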

To determine if you are discarding samples for a tenant, use the Explore functionality in Grafana to perform the following query against the GEM system monitoring datasource:

promql
sum by (reason) (rate(cortex_discarded_samples_total{user="TENANT-ID"}[1m]))

where TENANT-ID is the ID of the tenant.

The rate for the reason label per_user_series_limit should be 0; otherwise, GEM is rejecting samples because the tenant is sending more series than the limit allows.

When this limit is reached, GEM responds with a 400 HTTP status code on POST /api/v1/push and with the message:

per-user series limit of 1000000 exceeded, please contact administrator to raise it (per-ingester local limit: 100000)

Max global series per metric

max_global_series_per_metric

This limit controls the maximum number of active series per metric name, across the cluster before replication.

When this limit is reached, GEM responds with a 400 HTTP status code on POST /api/v1/push and with the message:

per-metric series limit of 30000 exceeded, please contact administrator to raise it (per-ingester local limit: 1200)

Max fetched chunk bytes per query

max_fetched_chunk_bytes_per_query

This limit helps guard against querier processes utilizing too much memory and causing crashes.

  • For microservices deployments, Grafana Labs recommends leaving this limit unset. If you observe querier processes crashing frequently because they run out of memory, consider setting this limit; infrequent querier crashes are tolerated by the distributed system. Alternatively, you can lower per-querier memory usage by setting the querier limit -querier.max-concurrent to a value lower than the default.
  • For single binary deployments, Grafana Labs recommends determining the value for this limit by taking the amount of memory, in bytes, that you are willing to allocate to queries and dividing it by the value of the querier limit -querier.max-concurrent (see Querier max concurrent). Set the final value of max_fetched_chunk_bytes_per_query to 50% of the calculated value to account for various factors that increase memory overhead in the system. For example, if you are allocating 20GB of memory to queries (counted as 20 × 1024^3 bytes) and -querier.max-concurrent is set to 8, set the value to 1342177280.
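The single binary calculation above can be written out step by step. This is a sketch that reuses the example figures from the text and treats 20GB as 20 × 1024^3 bytes; the tenant ID tenant-a is hypothetical, and the overrides layout assumes the Grafana Mimir runtime configuration format.

```yaml
# Runtime configuration sketch; "tenant-a" is a hypothetical tenant ID.
overrides:
  tenant-a:
    # Memory budget for queries: 20 * 1024^3       = 21474836480 bytes
    # Per concurrent query:      21474836480 / 8   =  2684354560 bytes
    # 50% overhead margin:       2684354560 * 0.5  =  1342177280 bytes
    max_fetched_chunk_bytes_per_query: 1342177280
```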

Note

This limit is only applied when the runtime configuration setting ingester_stream_chunks_when_using_blocks (or -ingester.stream-chunks-when-using-blocks) is true.

When this limit is reached, GEM responds with a 422 HTTP status code on query endpoints and with the message:

the query hit the aggregated chunks size limit (limit: 1048576 bytes)

Ingestion rate

ingestion_rate

This limit helps prevent a surge of high-frequency incoming samples from using too much memory on the system. The series and series-per-metric limits described above are also important, but a sender that updates a series every second uses more memory than one that updates it every minute.

  • If you are onboarding a new tenant, Grafana Labs recommends setting this value to the value determined for max_global_series_per_user, divided by the scraping interval in seconds, plus a 10% buffer. For example, if max_global_series_per_user is set to 1000000 and the scraping interval of a sender is 15s, set ingestion_rate to 73333 (1000000 / 15, plus 10%). If you have higher or lower frequency data, you may want to increase or decrease this value, being sure to watch memory and CPU utilization on the system.
  • If you are retroactively applying limits for a tenant that is sending data, and the system is stable, use the GEM system monitoring / Per-tenant usage dashboard for each tenant to determine the value to set this to. For each tenant, choose its name in the Tenant dropdown at the top of the dashboard, set the dashboard date in the upper right-hand corner to a preferred time frame, and choose a value higher than the maximum value of the Ingestion rate graph, which will ensure the senders’ data is not discarded. You might want to add a 10% buffer on top of the maximum number to allow for some variability in the rate of data being sent.
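The onboarding calculation above, written out as a sketch. The tenant ID tenant-a is hypothetical, and the overrides layout assumes the Grafana Mimir runtime configuration format; the numbers are the ones used in the example in the text.

```yaml
# Runtime configuration sketch; "tenant-a" is a hypothetical tenant ID.
overrides:
  tenant-a:
    max_global_series_per_user: 1000000
    # 1000000 series / 15 s scraping interval = 66666.67 samples per second
    # 66666.67 * 1.10 (10% buffer)            = 73333.33, rounded down
    ingestion_rate: 73333
```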

To determine if senders are hitting the ingestion rate limit, use the Explore functionality in Grafana to perform the following query against the GEM system monitoring datasource:

promql
sum by (reason) (rate(cortex_discarded_samples_total{user="TENANT-ID"}[1m]))

where TENANT-ID is the ID of the tenant.

The rate for the reason label rate_limited should be 0; otherwise, GEM is rejecting samples because the tenant is sending data faster than the limit allows.

When this limit is reached, GEM responds with a 429 HTTP status code on POST /api/v1/push and with the message:

ingestion rate limit (142857) exceeded while adding 1000 samples and 1000 metadata

Ingestion burst size

ingestion_burst_size

This limit allows a temporary influx of data above the normal ingestion rate for short periods of time. To be cautious, set this value to the same value you have set for ingestion_rate, which allows for no bursting. However, if you have enough headroom in your system and want senders to hit the limit less often when they temporarily burst above the ingestion rate, you can set this value higher than ingestion_rate. How much higher depends on your headroom, but a relatively safe value is double the ingestion_rate.

For customers deploying using single binary or in a scenario where the queriers are part of the write path, Grafana Labs recommends setting this value equal to ingestion_rate, which disables bursting.
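The two approaches described above might look like the following sketch. The tenant IDs are hypothetical, the ingestion_rate of 73333 is only an illustrative value, and the overrides layout assumes the Grafana Mimir runtime configuration format.

```yaml
# Runtime configuration sketch; tenant IDs are hypothetical.
overrides:
  # Cautious setting (also the recommendation for single binary
  # deployments): burst size equals the rate, so bursting is disabled.
  tenant-a:
    ingestion_rate: 73333
    ingestion_burst_size: 73333
  # System with spare headroom: burst size is double the rate.
  # 73333 * 2 = 146666
  tenant-b:
    ingestion_rate: 73333
    ingestion_burst_size: 146666
```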

When this limit is reached, GEM responds with a 429 HTTP status code on POST /api/v1/push and with the message:

ingestion rate limit (142857) exceeded while adding 1000 samples and 1000 metadata

Ruler max rules per rule group

ruler_max_rules_per_rule_group

If you are using the ruler component, you should set limits around how many rules can exist in a rule group. A smaller number of rules per rule group allows for more even distribution across the rulers. Grafana Labs recommends setting this value to 20.

When this limit is reached, GEM responds with a 400 HTTP status code on POST rules endpoints and with the message:

per-user rules per rule group limit (limit: 20 actual: 24) exceeded

Ruler max rule groups per tenant

ruler_max_rule_groups_per_tenant

If you are using the ruler component, you should limit how many rule groups a tenant is allowed. This number can be higher or lower depending on how many ruler instances you have. The effective number of rules a tenant can have is ruler_max_rule_groups_per_tenant multiplied by ruler_max_rules_per_rule_group.

Grafana Labs recommends setting this to a lower number, such as 10, and increasing it if tenants need more ruler capacity.
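As a sketch of how the two ruler limits combine, using the recommended starting values from this page. The tenant ID tenant-a is hypothetical, and the overrides layout assumes the Grafana Mimir runtime configuration format.

```yaml
# Runtime configuration sketch; "tenant-a" is a hypothetical tenant ID.
overrides:
  tenant-a:
    ruler_max_rules_per_rule_group: 20
    ruler_max_rule_groups_per_tenant: 10
    # Effective maximum number of rules for this tenant:
    # 10 rule groups * 20 rules per group = 200 rules
```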

To determine if the rulers have capacity for more rule groups, refer to the Ruler section of the GEM system monitoring / Reads resources dashboard and check whether CPU and memory metrics are below the hard limits you have set.

When this limit is reached, GEM responds with a 400 HTTP status code on POST rules endpoints and with the message:

per-user rule groups limit (limit: 70 actual: 71) exceeded

Migrate your limits configuration

If you configured tenant limits using the deprecated limits field in the Tenant API, you need to migrate to runtime configuration in Grafana Mimir.

Note

In addition to the limits that you set for a tenant using the Tenant API, the limits field also contains duplicates of the cluster-wide GEM limits. If you want to avoid applying these duplicate limits to every tenant, follow this guidance on how to only migrate limits that are unique to a given tenant.

  1. Set up runtime configuration to manage tenant limits.
    • If you’re using Kubernetes through the Grafana Mimir Helm chart, use the built-in runtime configuration support in the values.yaml file. Refer to the runtimeConfig section in the example values.yaml file.
    • If you’re not using the Helm chart, distribute a YAML file to the nodes running the software. Use a well-known path in the main configuration file that points to this other configuration file. Use the deployment strategy that you have in place to distribute binaries and files to nodes. Because Mimir and GEM check the configuration file for updates every minute by default, you don’t need to restart the binary after each change.
  2. To determine which limits were set using the Tenant API, compare the Tenant API overrides against the cluster-wide limits configuration reference for your deployment type.
    • If you’re using the Grafana Mimir Helm chart, reference the cluster-wide limits configuration support in the values.yaml file. Refer to the limits section of mimir.config in the example values.yaml file.
    • If you’re not using the Helm chart, refer to the limits section of the main configuration file.
  3. For each tenant, copy the limits applied with the API to the runtime configuration file.
    • These limits have the same names in both systems.
    • Make sure to nest these settings one level under the tenant ID.
  4. (Optional) To avoid setting unnecessary per-tenant limit overrides that you’d rather manage at the cluster level, remove any limits that match the cluster-wide limits.
  5. Deploy the runtime configuration file.
  6. For each tenant, delete the limits configured with the API.
    • If you’re using the GEM plugin, click the X next to each limit.
    • If you’re using the API, set each limit to a value of null.

You’ve configured GEM to read tenant limits from the runtime configuration file. Don’t use the Tenant API to update limits, as this overrides the values set in the runtime configuration file.
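For a deployment that does not use the Helm chart, the result of the migration might look like the following sketch. It assumes the runtime_config.file option and the overrides layout of Grafana Mimir; the file path, the tenant ID, and the tenant's series limit are hypothetical. The two fragments belong in two separate files and are shown together only for brevity.

```yaml
# Fragment of the main configuration file.
runtime_config:
  # Hypothetical well-known path to the runtime configuration file.
  file: /etc/gem/runtime.yaml
---
# Contents of /etc/gem/runtime.yaml (a separate file).
overrides:
  # Limits are nested one level under the tenant ID.
  tenant-a:
    # Kept: differs from the cluster-wide value of 1000000.
    max_global_series_per_user: 2000000
    # An ingestion_rate of 142857 would match the cluster-wide limit,
    # so it is omitted here and managed at the cluster level.
```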

For more information, refer to About Grafana Mimir runtime configuration in the Mimir documentation.