This is archived documentation for v1.6.0.


GEM limits recommendations

You can configure GEM limits to protect the system from any data sender, whether a machine or a person, that inadvertently sends or queries too much data, which can cause the system to run out of resources and crash. Limits also help ensure quality of service and fairness across your user base.

GEM has the concept of tenants, and it provides an API for administering each tenant's limits. Using this API is the preferred way to manage limits and keep your system stable. The GEM Plugin for Grafana lets you set these limits from inside your Grafana installation.

Since GEM 1.5.0, a self-monitoring __system__ tenant is available to help you determine some of these limit values. For instructions on enabling self-monitoring, see the self-monitoring page.

If you have not enabled self-monitoring, or you are using an earlier version of GEM, query an external Prometheus instance that monitors GEM to determine these values.

> Note: Within limit names, the terms tenant and user are synonymous; both refer to the same concept.

Global limits and configuration

You can set the following options in the configuration file of every relevant GEM component, or you can set them with CLI flags. After you set them, restart the system before you change limits for individual tenants. This ensures that the system is stable before you configure it using the API or plugin.

Max inflight ingester push requests

This option is a last-resort limit. When it is reached, remote_write requests sent to the write path fail with 5xx errors, which the sender (Prometheus or Grafana Agent) retries. Based on internal testing, a value of 5000 is conservative but effective. If you change this setting on a running system, perform a rolling restart of the ingesters, as you would for any other ingester configuration change.

ingester:
  instance_limits:
    # CLI flag: -ingester.instance-limits.max-inflight-push-requests=5000
    max_inflight_push_requests: 5000 # default is 0 or unlimited in GEM 1.5.1 and earlier

Shard by all labels

This option distributes series more evenly across the pool of ingesters, because incoming data is hashed by the metric name and all of its labels rather than by the metric name alone. If you do not enable this option, some configured limits do not work correctly. If you change this setting on a running system, perform a rolling restart of the distributors, as you would for any other distributor configuration change.

distributor:
  # CLI flag: -distributor.shard-by-all-labels=true
  shard_by_all_labels: true
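To illustrate why hashing by all labels spreads load more evenly, the following Python sketch assigns series to ingesters by hashing either the metric name alone or the metric name plus its labels. It is a simplified model for illustration only: it uses SHA-256 modulo the ingester count, whereas GEM places series on a hash ring with tokens, and the helper names are hypothetical.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_ingesters: int) -> int:
    """Map a key to an ingester index (simplified: hash modulo N, not a token ring)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_ingesters


def series_key(metric: str, labels: dict, by_all_labels: bool) -> str:
    """Build the hashing key: metric name only, or metric name plus sorted labels."""
    if not by_all_labels:
        return metric
    label_part = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
    return f"{metric}{{{label_part}}}"


def distribution(series, num_ingesters: int, by_all_labels: bool) -> Counter:
    """Count how many series each ingester would receive."""
    return Counter(
        shard_for(series_key(metric, labels, by_all_labels), num_ingesters)
        for metric, labels in series
    )


# 300 series of a single metric that differ only by the "pod" label.
series = [("http_requests_total", {"pod": f"pod-{i}"}) for i in range(300)]

print(distribution(series, 3, by_all_labels=False))  # every series lands on one ingester
print(distribution(series, 3, by_all_labels=True))   # series spread across all three
```

With the metric name as the only hash input, one large metric can overload a single ingester; including the labels breaks that metric's series apart.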

Querier max concurrent

This option sets the number of queries a single querier process can execute concurrently. A value of 8 is recommended. If you change this setting on a running system, perform a rolling restart of the queriers, as you would for any other querier configuration change.

querier:
  # CLI flag: -querier.max-concurrent
  max_concurrent: 8

Default minimal global settings

You can set tenant limits above these minimal defaults. The defaults keep the system stable when a new tenant is created before its limits have been explicitly set. To keep things simple, apply these limits to every component, because some components embed other components and are otherwise easily missed.

To override these defaults based on the needs of individual tenants, see Tenant limits.

limits:
  max_global_series_per_user: 1000000
  ingestion_rate: 142857
  ingestion_burst_size: 142857
  ingestion_rate_strategy: global
  ruler_max_rules_per_rule_group: 20
  ruler_max_rule_groups_per_tenant: 70
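Tenants without explicit overrides fall back to these global defaults. The following Python sketch models that fallback as a dictionary merge. It only illustrates the behavior described above and is not GEM's implementation; the `OVERRIDES` structure and the `effective_limits` helper are hypothetical.

```python
# Global defaults from the limits block above.
DEFAULT_LIMITS = {
    "max_global_series_per_user": 1_000_000,
    "ingestion_rate": 142_857,
    "ingestion_burst_size": 142_857,
    "ingestion_rate_strategy": "global",
    "ruler_max_rules_per_rule_group": 20,
    "ruler_max_rule_groups_per_tenant": 70,
}

# Hypothetical per-tenant overrides, such as those set through the GEM plugin or Admin API.
OVERRIDES = {
    "team-a": {"max_global_series_per_user": 2_000_000},
}


def effective_limits(tenant_id: str) -> dict:
    """Return the defaults with any tenant-specific overrides applied on top."""
    return {**DEFAULT_LIMITS, **OVERRIDES.get(tenant_id, {})}


print(effective_limits("team-a")["max_global_series_per_user"])      # 2000000
print(effective_limits("new-tenant")["max_global_series_per_user"])  # 1000000
```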

Tenant limits

To set the following limits for each of your tenants, use the GEM plugin. Alternatively, you can set them manually with the Tenant API, which is part of the Admin API.

Max global series per user

max_global_series_per_user

This limit sets the maximum number of series per tenant and is enforced at the ingester level. It is generally described as the “number of metrics” that a tenant has. Note: The term user in the limit name does not mean a person using the system; in this context, it is synonymous with tenant.

  • If you are onboarding a new tenant, set this value to a low number, such as 10000, to ensure that the system has enough memory and CPU headroom before it starts accepting the tenant's metrics. To determine whether CPU and memory usage for each component is below the limits you have set, refer to the GEM system monitoring / GEM / Reads resources and GEM system monitoring / GEM / Writes resources dashboards. If the tenant's metrics are being discarded, raise this value gradually until discarded samples drop to zero, checking along the way that the system still has enough memory and CPU headroom to handle the additional load.
  • If you are retroactively applying limits to a tenant that is already sending data, and the system is stable, use the GEM system monitoring / GEM / Per-tenant usage dashboard to determine the value for each tenant. Choose the tenant's name in the Tenant dropdown at the top of the dashboard, set the dashboard time range in the upper right-hand corner to a preferred time frame, and choose a value higher than the maximum of the Number of active time series graph so that the sender's data is not discarded. Consider adding a 10% buffer on top of that maximum, because imperfect load balancing can leave some ingesters holding slightly more series than others.
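For retroactive limits, the arithmetic is the peak of the Number of active time series graph plus a buffer. A minimal Python sketch, assuming a 10% buffer and rounding up to a whole series; the function name is hypothetical. It uses integer arithmetic so that floating-point error cannot push the result up by one.

```python
def series_limit_from_observed(max_active_series: int, buffer_percent: int = 10) -> int:
    """Observed peak plus buffer_percent, rounded up (ceiling division on integers)."""
    return (max_active_series * (100 + buffer_percent) + 99) // 100


print(series_limit_from_observed(850_000))  # 935000
print(series_limit_from_observed(123_457))  # 135803
```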

To determine if you are discarding samples for a tenant, use the Explore functionality in Grafana to perform the following query against the GEM System Monitoring datasource:

sum by (reason) (rate(cortex_discarded_samples_total{user="TENANT-ID"}[1m]))

where TENANT-ID is the ID of the tenant.

The reason="per_user_series_limit" series should be 0; otherwise, GEM is rejecting samples because the tenant is sending more series than its limit allows.
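To run this check from a script instead of Explore, the following Python sketch sends the query to a Prometheus-compatible HTTP API (`/api/v1/query`) and reports every discard reason with a non-zero rate. The base URL, any authentication your GEM System Monitoring datasource requires, and the helper names are assumptions; adapt them to your environment.

```python
import json
import urllib.parse
import urllib.request

# Doubled braces are literal braces in str.format.
QUERY = 'sum by (reason) (rate(cortex_discarded_samples_total{{user="{tenant}"}}[1m]))'


def nonzero_discard_reasons(response: dict) -> dict:
    """Return {reason: rate} for every result in an instant-query response with a rate above zero."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error')}")
    reasons = {}
    for sample in response["data"]["result"]:
        reason = sample["metric"].get("reason", "")
        rate = float(sample["value"][1])  # value is [timestamp, "string value"]
        if rate > 0:
            reasons[reason] = rate
    return reasons


def query_discards(base_url: str, tenant: str) -> dict:
    """Run the discarded-samples query against base_url (hypothetical endpoint) and parse it."""
    params = urllib.parse.urlencode({"query": QUERY.format(tenant=tenant)})
    with urllib.request.urlopen(f"{base_url}/api/v1/query?{params}") as resp:
        return nonzero_discard_reasons(json.load(resp))


# Offline example using the standard instant-query response shape.
example = {
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"reason": "per_user_series_limit"}, "value": [1700000000, "12.5"]},
        {"metric": {"reason": "rate_limited"}, "value": [1700000000, "0"]},
    ]},
}
print(nonzero_discard_reasons(example))  # {'per_user_series_limit': 12.5}
```

An empty result means no discards in the queried window; any key in the returned dictionary names a limit the tenant is hitting.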

Max fetched chunk bytes per query

max_fetched_chunk_bytes_per_query

This limit helps prevent querier processes from using too much memory and crashing.

  • For microservices deployments, Grafana Labs recommends leaving this limit unset. If querier processes frequently crash because they run out of memory, consider setting it; occasional querier crashes are tolerated by the distributed system. Alternatively, you can lower per-querier memory usage by setting -querier.max-concurrent to a value lower than the default.
  • For single binary deployments, Grafana Labs recommends calculating this limit by dividing the number of bytes of memory you are willing to allocate to queries by the value of -querier.max-concurrent (see Querier max concurrent). Set max_fetched_chunk_bytes_per_query to 50% of the result to account for other sources of memory overhead in the system. For example, if you allocate 20 GiB (21474836480 bytes) of memory to queries and max-concurrent is set to 8, set the value to 1342177280.
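The single binary calculation can be written out as follows. This Python sketch assumes memory is measured in binary units (1 GiB = 1024³ bytes), which is what the 1342177280 figure implies; the function name is hypothetical.

```python
GIB = 1024 ** 3


def max_fetched_chunk_bytes(query_memory_bytes: int, max_concurrent: int) -> int:
    """Split query memory across concurrent queries, then keep 50% as headroom."""
    per_query = query_memory_bytes // max_concurrent
    return per_query // 2


print(max_fetched_chunk_bytes(20 * GIB, 8))  # 1342177280
```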

Ingestion rate

ingestion_rate

This limit prevents a surge of high-frequency incoming samples from using too much memory on the system. The series limits described above also matter, but the sample rate matters independently: a sender that updates a series every second uses more memory than one that updates it every minute.

  • If you are onboarding a new tenant, Grafana Labs recommends setting this value to max_global_series_per_user divided by the scrape interval in seconds, plus a 10% buffer. For example, if max_global_series_per_user is 1000000 and the sender's scrape interval is 15s, set ingestion_rate to 73333 (1000000 / 15 × 1.1 ≈ 73333). If the tenant sends higher- or lower-frequency data, increase or decrease this value accordingly while watching memory and CPU utilization on the system.
  • If you are retroactively applying limits to a tenant that is already sending data, and the system is stable, use the GEM system monitoring / GEM / Per-tenant usage dashboard to determine the value for each tenant. Choose the tenant's name in the Tenant dropdown at the top of the dashboard, set the dashboard time range in the upper right-hand corner to a preferred time frame, and choose a value higher than the maximum of the Ingestion rate graph so that the sender's data is not discarded. Consider adding a 10% buffer on top of that maximum to allow for variability in the rate of data being sent.
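The onboarding formula for a new tenant can be sketched in Python as follows. It uses integer arithmetic and rounds down, which reproduces the 73333 in the example; the function name is hypothetical.

```python
def ingestion_rate(max_series: int, scrape_interval_s: int, buffer_percent: int = 10) -> int:
    """Samples per second for max_series scraped every scrape_interval_s, plus a buffer (rounded down)."""
    return max_series * (100 + buffer_percent) // (scrape_interval_s * 100)


print(ingestion_rate(1_000_000, 15))  # 73333
print(ingestion_rate(1_000_000, 60))  # 18333
```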

To determine if senders are hitting the ingestion rate limit, use the Explore functionality in Grafana to perform the following query against the GEM System Monitoring datasource:

sum by (reason) (rate(cortex_discarded_samples_total{user="TENANT-ID"}[1m]))

where TENANT-ID is the ID of the tenant.

The reason="rate_limited" series should be 0; otherwise, GEM is rejecting samples because the tenant is sending data faster than its ingestion rate allows.

Ingestion burst size

ingestion_burst_size

This limit allows a sender to exceed the normal ingestion rate for short periods of time. The most cautious setting is the same value as ingestion_rate, which allows no bursting. If your system has enough headroom and you want senders to hit limits less often when they burst temporarily, set a higher value. Depending on your headroom, you can choose a higher or lower number, but double the ingestion_rate is a relatively safe choice.
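Conceptually, a rate limit with a burst allowance behaves like a token bucket that refills at the ingestion rate and holds at most the burst size. The following Python class is a simplified, single-process model for illustration only. GEM's actual limiter, including how the global strategy divides limits across distributors, is not shown, and the class is hypothetical.

```python
class TokenBucket:
    """Simplified token bucket: refills at `rate` tokens/s and holds at most `burst` tokens."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start with a full bucket
        self.last = 0.0

    def allow(self, n: int, now: float) -> bool:
        """Refill for the elapsed time, then accept n samples only if enough tokens remain."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if n <= self.tokens:
            self.tokens -= n
            return True
        return False


steady = TokenBucket(rate=100, burst=100)  # burst equal to ingestion_rate
bursty = TokenBucket(rate=100, burst=200)  # burst double ingestion_rate
print(steady.allow(150, now=0.0))  # False: 150 samples exceed the 100-token bucket
print(bursty.allow(150, now=0.0))  # True: the larger bucket absorbs the burst
```

A larger bucket only helps for short spikes: a sender that stays above `rate` drains the bucket and is then limited to the refill rate.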

For single binary deployments, or for deployments where the queriers are part of the write path, Grafana Labs recommends setting this value equal to ingestion_rate, which disables bursting.

Ruler max rules per rule group

ruler_max_rules_per_rule_group

If you use the ruler component, set a limit on how many rules can exist in a rule group. Fewer rules per rule group allow rule groups to be distributed more evenly across the rulers. Grafana Labs recommends setting this value to 20.

Ruler max rule groups per tenant

ruler_max_rule_groups_per_tenant

If you use the ruler component, set a limit on how many rule groups each tenant is allowed. The appropriate number can be higher or lower depending on how many ruler instances you have. The effective number of rules a tenant can have is ruler_max_rule_groups_per_tenant multiplied by ruler_max_rules_per_rule_group.

Grafana Labs recommends setting this to a lower number, such as 10, and increasing it if tenants need more ruler capacity.
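With the recommended values, a tenant can define at most 10 × 20 = 200 rules. The following Python sketch is a hypothetical pre-flight check that reports which of the two ruler limits a tenant's rule groups would exceed; it is not part of GEM, and the input format (group name mapped to rule count) is an assumption.

```python
def effective_rule_capacity(max_groups: int, max_rules_per_group: int) -> int:
    """Maximum number of rules a tenant can define under both ruler limits."""
    return max_groups * max_rules_per_group


def rule_limit_violations(group_sizes: dict, max_groups: int = 10, max_rules_per_group: int = 20) -> list:
    """List the ways a tenant's rule groups would exceed the ruler limits.

    group_sizes maps rule group name -> number of rules in that group.
    """
    problems = []
    if len(group_sizes) > max_groups:
        problems.append(
            f"{len(group_sizes)} rule groups exceeds ruler_max_rule_groups_per_tenant={max_groups}"
        )
    for name, size in sorted(group_sizes.items()):
        if size > max_rules_per_group:
            problems.append(
                f"group {name!r} has {size} rules, exceeds ruler_max_rules_per_rule_group={max_rules_per_group}"
            )
    return problems


print(effective_rule_capacity(10, 20))  # 200
print(rule_limit_violations({"api": 12, "db": 25}))
```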

To determine whether the rulers have capacity for more rule groups, check the Ruler section of the GEM system monitoring / GEM / Reads resources dashboard and confirm that CPU and memory usage is below the hard limits you have set.