Important: This documentation is about an older version. It's relevant only to the release noted; many of the features and functions have been updated or replaced. Please view the current version.
GEM limits recommendations
You can configure GEM limits to protect a system from any data sender (be it a machine or a person) who inadvertently sends or queries too much data, which can cause the system to run out of resources and crash. The limits also help ensure quality of service and fairness across your user base.
GEM has the concept of tenants and supports setting limits for tenants using runtime configuration in Grafana Mimir. This is the preferred way to manage your limits and keep your system stable. The GEM Plugin for Grafana allows you to easily set these limits from inside your Grafana installation.
Since GEM 1.5.0, a self-monitoring __system__ tenant is available to help determine some of these limit values. For instructions on how to enable self-monitoring, see the self-monitoring page.
If you do not have self-monitoring enabled, or you are using an earlier version of GEM, you need to query an external Prometheus instance that monitors GEM to determine these values.
> Note: Within the context of limit names, the terms tenant and user are synonymous; both refer to the same concept.
Global limits and configuration
You can set the following options in the configuration file of every relevant GEM component, or you can set them with CLI flags. Apply them and restart the system before you change limits for individual tenants. This ensures the system is stable before you configure it using the API or plugin.
Max inflight ingester push requests
This option is a last-resort limit. If this limit is reached, remote_write samples sent to the write path receive 5xx errors, which the sender (Prometheus or Grafana Agent) retries. Based on internal testing, a value of 5000 is conservative but effective. If you change this setting on an already running system, perform a rolling restart of the ingesters, as you would for any other ingester configuration change.
```yaml
ingester:
  instance_limits:
    # CLI flag: -ingester.instance-limits.max-inflight-push-requests=5000
    max_inflight_push_requests: 5000 # default is 0 (unlimited) in GEM 1.5.1 and earlier
```
When you reach this limit, requests to POST /api/v1/push fail with the message:
cannot push: too many inflight push requests in ingester
Querier max concurrent
This option sets the maximum number of queries that a single querier process can execute concurrently. A value of 8 is recommended. If you change this setting on an already running system, perform a rolling restart of the queriers, as you would for any other querier configuration change.
```yaml
querier:
  # CLI flag: -querier.max-concurrent
  max_concurrent: 8
```
Default minimal global settings
You can set per-tenant limits above these minimal defaults. The defaults keep the system stable when a new tenant is created and its limits have not been explicitly set. To keep things simple, apply these limits to every component. Some components embed other components and are therefore easily missed.
To override them based on the needs of individual tenants, see Tenant limits.
```yaml
limits:
  max_global_series_per_user: 1000000
  max_global_series_per_metric: 30000
  ingestion_rate: 142857
  ingestion_burst_size: 142857
  ingestion_rate_strategy: global
  ruler_max_rules_per_rule_group: 20
  ruler_max_rule_groups_per_tenant: 70
```
Tenant limits
To set the following limits on each of your tenants, use the GEM plugin. Alternatively, you can use runtime configuration in Grafana Mimir.
Max global series per user
max_global_series_per_user
This limit sets the maximum number of active series per tenant and is enforced at the ingester level. It is commonly described as the "number of metrics" that a tenant has. Note: The term user in the limit name does not mean a person using the system; in this context, the term is synonymous with tenant.
- If you are onboarding a new tenant, set this value to a low number such as 10000 to ensure that the system has enough memory and CPU headroom before it accepts more metrics. To check that CPU and memory usage for each component is below the limits you have set, refer to the GEM system monitoring / Reads resources and GEM system monitoring / Writes resources dashboards. If the tenant's metrics are being discarded, slowly raise this value until discarded samples drop to zero. While you do, check that the system still has enough memory and CPU headroom to handle the additional load.
- If you are retroactively applying limits for a tenant that is already sending data, and the system is stable, use the GEM system monitoring / Per-tenant usage dashboard to determine the value to set. For each tenant:
  - Choose its name in the Tenant dropdown at the top of the dashboard.
  - Set the dashboard time range in the upper right-hand corner to a preferred period.
  - Choose a value higher than the maximum of the Number of active time series graph, so that the senders' data is not discarded.
  You might want to add a 10% buffer on top of the maximum, because imperfect load balancing can leave some ingesters with slightly more series than others.
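As a sketch of the onboarding step, the following runtime configuration fragment assumes the Grafana Mimir runtime configuration format. In that format, per-tenant limits are nested under a top-level overrides key, one level below the tenant ID. The tenant ID tenant-a is a hypothetical placeholder.

```yaml
# Runtime configuration file (sketch). "tenant-a" is a hypothetical tenant ID.
overrides:
  tenant-a:
    # Start low while onboarding; raise gradually while discarded samples
    # are non-zero and the system still has CPU and memory headroom.
    max_global_series_per_user: 10000
```

For a tenant that is already sending data, replace 10000 with a value above the peak of the Number of active time series graph, optionally with the 10% buffer added.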
To determine whether GEM is discarding samples for a tenant, use the Explore feature in Grafana to run the following query against the GEM system monitoring data source, where TENANT-ID is the ID of the tenant:

```promql
sum by (reason) (rate(cortex_discarded_samples_total{user="TENANT-ID"}[1m]))
```

The result for the reason label value per_user_series_limit should be 0. Otherwise, GEM is rejecting samples because the tenant is sending more series than allowed.
When this limit is reached, GEM responds to POST /api/v1/push requests with a 400 HTTP status code and the message:
per-user series limit of 1000000 exceeded, please contact administrator to raise it (per-ingester local limit: 100000)
Max global series per metric
max_global_series_per_metric
This limit controls the maximum number of active series per metric name, across the cluster before replication.
When this limit is reached, GEM responds to POST /api/v1/push requests with a 400 HTTP status code and the message:
per-metric series limit of 30000 exceeded, please contact administrator to raise it (per-ingester local limit: 1200)
Max fetched chunk bytes per query
max_fetched_chunk_bytes_per_query
This limit helps prevent querier processes from using too much memory and crashing.
- For microservices deployments, Grafana Labs recommends leaving this limit unset. If you observe querier processes frequently crashing because they run out of memory, you can consider setting this limit. However, the distributed system tolerates infrequent querier crashes. Alternatively, you can lower per-querier memory usage by setting -querier.max-concurrent to a value lower than the default.
- For single binary deployments, Grafana Labs recommends the following calculation:
  - Divide the number of bytes of memory you are willing to allocate to queries by the value of -querier.max-concurrent (see Querier max concurrent).
  - Set max_fetched_chunk_bytes_per_query to 50% of the result, to account for various factors that increase memory overhead in the system.
  For example, if you allocate 20 GiB (21474836480 bytes) of memory to queries and max_concurrent is set to 8, set the value to 1342177280.
> Note: This limit is only applied when the runtime configuration setting ingester_stream_chunks_when_using_blocks (or the CLI flag -ingester.stream-chunks-when-using-blocks) is true.
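The single binary calculation above can be recorded next to the value it produces. This is a minimal sketch in the Grafana Mimir runtime configuration format, with the arithmetic kept as comments. It assumes that ingester_stream_chunks_when_using_blocks sits at the top level of the runtime configuration file, alongside the overrides key. The tenant ID tenant-a is hypothetical.

```yaml
# Runtime configuration file (sketch). "tenant-a" is a hypothetical tenant ID.
# Assumed top-level placement: required for the chunk bytes limit to apply.
ingester_stream_chunks_when_using_blocks: true
overrides:
  tenant-a:
    # 20 GiB allocated to queries        = 21474836480 bytes
    # 21474836480 / 8 (max_concurrent)   = 2684354560 bytes per query
    # 2684354560 * 50% (overhead margin) = 1342177280 bytes
    max_fetched_chunk_bytes_per_query: 1342177280
```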
When this limit is reached, GEM responds to requests on query endpoints with a 422 HTTP status code and the message:
the query hit the aggregated chunks size limit (limit: 1048576 bytes)
Ingestion rate
ingestion_rate
This limit, expressed in samples per second, helps prevent a surge of high-frequency incoming samples from using too much memory on the system. The series-per-tenant and series-per-metric limits set above are also important, but they do not account for how often samples arrive. A sender that updates a series every second uses more memory than one that updates it every minute.
- If you are onboarding a new tenant, Grafana Labs recommends setting this value to the value of max_global_series_per_user divided by the scrape interval in seconds, plus a 10% buffer. For example, if max_global_series_per_user is 1000000 and the sender's scrape interval is 15s, set ingestion_rate to 73333 (1000000 / 15 ≈ 66667, plus 10%). If you have higher- or lower-frequency data, you may want to increase or decrease this value. Watch memory and CPU utilization on the system as you do.
- If you are retroactively applying limits for a tenant that is already sending data, and the system is stable, use the GEM system monitoring / Per-tenant usage dashboard to determine the value to set. For each tenant:
  - Choose its name in the Tenant dropdown at the top of the dashboard.
  - Set the dashboard time range in the upper right-hand corner to a preferred period.
  - Choose a value higher than the maximum of the Ingestion rate graph, so that the senders' data is not discarded.
  You might want to add a 10% buffer on top of the maximum to allow for some variability in the rate of data being sent.
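The onboarding rule of thumb can be sketched as a per-tenant override that keeps the series limit and the derived rate side by side. This assumes the Grafana Mimir runtime configuration format, with limits nested under overrides and the tenant ID. The tenant ID tenant-a and the 15s scrape interval are illustrative assumptions.

```yaml
# Runtime configuration file (sketch). "tenant-a" is a hypothetical tenant ID.
overrides:
  tenant-a:
    max_global_series_per_user: 1000000
    # 1000000 series / 15s scrape interval ≈ 66667 samples/s
    # plus a 10% buffer                    ≈ 73333 samples/s
    ingestion_rate: 73333
```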
To determine whether senders are hitting the ingestion rate limit, use the Explore feature in Grafana to run the following query against the GEM system monitoring data source, where TENANT-ID is the ID of the tenant:

```promql
sum by (reason) (rate(cortex_discarded_samples_total{user="TENANT-ID"}[1m]))
```

The result for the reason label value rate_limited should be 0. Otherwise, GEM is rejecting samples because the tenant is sending data faster than allowed.
When this limit is reached, GEM responds to POST /api/v1/push requests with a 429 HTTP status code and the message:
ingestion rate limit (142857) exceeded while adding 1000 samples and 1000 metadata
Ingestion burst size
ingestion_burst_size
This limit allows senders to exceed the normal ingestion rate temporarily, for short periods of time. To be cautious, set this value to the same value as ingestion_rate, which allows no bursting. If your system has enough headroom, you can instead set a higher value so that senders hit the limit less often during brief bursts. How high to go depends on your headroom, but double the ingestion_rate is a relatively safe value.
For single binary deployments, or deployments where the queriers are part of the write path, Grafana Labs recommends setting this value equal to ingestion_rate, which disables bursting.
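The two burst settings described above look like this as a per-tenant override. This is a sketch that assumes the Grafana Mimir runtime configuration format and reuses the illustrative rate of 73333. The tenant ID tenant-a is hypothetical.

```yaml
# Runtime configuration file (sketch). "tenant-a" is a hypothetical tenant ID.
overrides:
  tenant-a:
    ingestion_rate: 73333
    # Equal to ingestion_rate: no bursting (recommended for single binary).
    ingestion_burst_size: 73333
    # With enough headroom, a relatively safe alternative is double the rate:
    # ingestion_burst_size: 146666
```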
When this limit is reached, GEM responds to POST /api/v1/push requests with a 429 HTTP status code and the message:
ingestion rate limit (142857) exceeded while adding 1000 samples and 1000 metadata
Ruler max rules per rule group
ruler_max_rules_per_rule_group
If you are using the ruler component, set a limit on how many rules can exist in a rule group. Fewer rules per rule group allow the work to be distributed more evenly across the rulers. Grafana Labs recommends setting this value to 20.
When this limit is reached, GEM responds to POST requests on the rules endpoints with a 400 HTTP status code and the message:
per-user rules per rule group limit (limit: 20 actual: 24) exceeded
Ruler max rule groups per tenant
ruler_max_rule_groups_per_tenant
If you are using the ruler component, set a limit on how many rule groups a tenant is allowed. This number can be higher or lower depending on how many ruler instances you have. The effective maximum number of rules a tenant can have is ruler_max_rule_groups_per_tenant multiplied by ruler_max_rules_per_rule_group.
Grafana Labs recommends setting this to a lower number, such as 10, and increasing it if tenants need more ruler capacity.
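The following sketch shows how the two ruler limits combine. It assumes the Grafana Mimir runtime configuration format, and the tenant ID tenant-a is hypothetical. With the recommended starting values, a tenant can have at most 10 × 20 = 200 rules.

```yaml
# Runtime configuration file (sketch). "tenant-a" is a hypothetical tenant ID.
overrides:
  tenant-a:
    ruler_max_rules_per_rule_group: 20
    # Start low and raise it if the tenant needs more ruler capacity.
    ruler_max_rule_groups_per_tenant: 10
    # Effective maximum: 10 groups * 20 rules per group = 200 rules.
```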
To determine whether the rulers have capacity for more rule groups, check the Ruler section of the GEM system monitoring / Reads resources dashboard. Confirm that CPU and memory usage is below the hard limits you have set.
When this limit is reached, GEM responds to POST requests on the rules endpoints with a 400 HTTP status code and the message:
per-user rule groups limit (limit: 70 actual: 71) exceeded
Migrate your limits configuration
If you configured tenant limits using the deprecated limits field in the Tenant API, you need to migrate to runtime configuration in Grafana Mimir.
- Set up runtime configuration to manage tenant limits.
  - If you're using Kubernetes through the Grafana Mimir Helm chart, use the built-in runtime configuration support in the values.yaml file. Refer to the runtimeConfig section in the example values.yaml file.
  - If you're not using the Helm chart, distribute a YAML file to the nodes running the software. In the main configuration file, point to this file using a well-known path. Use the deployment strategy you already have in place to distribute binaries and files to nodes. Mimir and GEM check the runtime configuration file for updates every minute by default, so you don't need to restart the binary after each change.
- For each tenant, copy the limits applied with the API to the runtime configuration file.
  - These limits have the same names in both systems.
  - Nest these settings one level under the tenant ID.
- Deploy the runtime configuration file.
- For each tenant, delete the limits configured with the API.
  - If you're using the GEM plugin, click the X next to each limit.
  - If you're using the API, set each limit to a value of null.
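A migrated runtime configuration file might look like the following sketch. It assumes the Grafana Mimir runtime configuration format: limits sit under a top-level overrides key, one level below each tenant ID. The main configuration file points to the file through the runtime_config.file setting (CLI flag -runtime-config.file). The file path and tenant IDs below are hypothetical, and the values stand in for whatever each tenant had set through the API.

```yaml
# runtime-config.yaml (sketch). In the main configuration file, reference it with:
#   runtime_config:
#     file: /etc/gem/runtime-config.yaml   # hypothetical path
overrides:
  tenant-a:                                # hypothetical tenant ID
    # Limits copied from the Tenant API keep the same names.
    max_global_series_per_user: 1000000
    ingestion_rate: 142857
    ingestion_burst_size: 142857
  tenant-b:                                # hypothetical tenant ID
    max_global_series_per_user: 10000
```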
You’ve configured GEM to read tenant limits from the runtime configuration file. Don’t use the Tenant API to update limits, as this overrides the values set in the runtime configuration file.
For more information, refer to About Grafana Mimir runtime configuration in the Mimir documentation.