Important: This documentation is about an older version. It's relevant only to the release noted; many of the features and functions have been updated or replaced. Please view the current version.
Grafana Enterprise Metrics downloads
Releases
v1.5.1 – September 21st 2021
Links
Binary (Linux AMD64)
Deb (Linux AMD64)
RPM (Linux AMD64)
Docker image: run
docker pull grafana/metrics-enterprise:v1.5.1
(digest: sha256:079ed9d61a7ab0953afbfa76de8ab2d38d44ac17e630446bab4084b4aba0c2e4)
License: Grafana Labs license
Changelog
- [ENHANCEMENT] Add ADFS compatibility to our OIDC auth.
- [BUGFIX] Ruler: Use predictable names for Ruler WALs, ensuring that they are reused after crashes and cleaned up.
v1.5.0 – August 24th 2021
Links
Binary (Linux AMD64)
Deb (Linux AMD64)
RPM (Linux AMD64)
Docker image: run
docker pull grafana/metrics-enterprise:v1.5.0
(digest: sha256:b0d98ffe49df461a524743a49dca26952a59c9c007231035e52f0a06e5003fff)
License: Grafana Labs license
Changelog
- [CHANGE] Alertmanager: the experimental receivers firewall can now be configured on a per-tenant basis. The following CLI flags (and their respective YAML config options) have been renamed and moved to the limits config section:
-alertmanager.receivers-firewall.block.cidr-networks renamed to -alertmanager.receivers-firewall-block-cidr-networks
-alertmanager.receivers-firewall.block.private-addresses renamed to -alertmanager.receivers-firewall-block-private-addresses
- [CHANGE] Memberlist: Expose default configuration values to the command line options. Note that setting these explicitly to zero will no longer cause the default to be used. If the default is desired, leave the option unset. The following are affected:
-memberlist.stream-timeout
-memberlist.retransmit-factor
-memberlist.pull-push-interval
-memberlist.gossip-interval
-memberlist.gossip-nodes
-memberlist.gossip-to-dead-nodes-time
-memberlist.dead-node-reclaim-time
- [CHANGE] Authentication: Access Policy names passed via a JWT token in the OIDC auth flow will be converted to lowercase before being matched against Access Policies in GEM. This improves interoperability between GEM and other systems, since GEM only allows lowercase characters in Access Policy names.
- [CHANGE] Change the default value of -server.grpc.keepalive.min-time-between-pings from 5m to 10s, and of -server.grpc.keepalive.ping-without-stream-allowed to true.
- [CHANGE] Changed the -alertmanager.storage.type default value from configdb to local.
- [CHANGE] Changed the -ruler.storage.type default value from configdb to local.
- [CHANGE] Cortex chunks storage has been deprecated and it’s now in maintenance mode: all Cortex users are encouraged to migrate to the blocks storage. No new features will be added to the chunks storage. The default Cortex configuration still runs the chunks engine; refer to the blocks storage documentation for how to configure Cortex to run with the blocks storage.
- [CHANGE] Dependency: update go-redis from v8.2.3 to v8.9.0.
- [CHANGE] Deprecated the bootstrap target in favor of the tokengen target.
- [CHANGE] Enable strict JSON unmarshal for the pkg/util/validation.Limits struct. The custom UnmarshalJSON() will now fail if the input has unknown fields.
- [CHANGE] Graphite: proxy no longer generates generic metrics metadata. This helps to reduce the ingestion rate as counted by Cortex and used for limits.
- [CHANGE] Ingester: Change the default value of -ingester.active-series-metrics-enabled to true. This incurs a small increase in memory usage, between 1.2% and 1.6% as measured on ingesters with 1.3M active series.
- [CHANGE] License: Flag -bootstrap.license.path has been deprecated in favor of -license.path.
- [CHANGE] Memberlist: the memberlist_kv_store_value_bytes metric has been removed because values are no longer stored in memory as encoded bytes.
- [CHANGE] Querier / ruler: Change -querier.max-fetched-chunks-per-query configuration to limit the maximum number of chunks that can be fetched in a single query. The number of chunks fetched by ingesters AND long-term storage combined should not exceed the value configured on -querier.max-fetched-chunks-per-query.
- [CHANGE] Querier / ruler: deprecated -store.query-chunk-limit CLI flag (and its respective YAML config option max_chunks_per_query) in favour of -querier.max-fetched-chunks-per-query (and its respective YAML config option max_fetched_chunks_per_query). The new limit specifies the maximum number of chunks that can be fetched in a single query from ingesters and long-term storage: the total number of chunks actually fetched could be twice the limit, because the limit is applied independently when querying ingesters and long-term storage.
- [CHANGE] Query-frontend: Enable query stats by default. They can still be disabled with -frontend.query-stats-enabled=false.
- [CHANGE] Removed configdb support from Ruler and Alertmanager backend storages.
- [CHANGE] Removed log_messages_total metric.
- [CHANGE] Removed query sharding for the chunks storage. Query sharding is now only supported for blocks storage.
- [CHANGE] Renamed metric deprecated_flags_inuse_total as deprecated_flags_used_total.
- [CHANGE] Renamed metric experimental_features_in_use_total as experimental_features_used_total.
- [CHANGE] Some files and directories on local disk now have stricter permissions, and are only readable by the owner, not by group or others.
- [CHANGE] The example Kubernetes manifests (stored at k8s/) have been removed due to a lack of proper support and maintenance.
- [CHANGE] Update Go version to 1.16.6.
- [FEATURE] Added flag -debug.block-profile-rate to enable goroutine blocking events profiling.
- [FEATURE] Alertmanager: Added -alertmanager.max-config-size-bytes limit to control the size of configuration files that Cortex users can upload to Alertmanager via API. This limit is configurable per-tenant.
- [FEATURE] Alertmanager: Added -alertmanager.max-templates-count and -alertmanager.max-template-size-bytes options to control the number and size of templates uploaded to Alertmanager via API. These limits are configurable per-tenant.
- [FEATURE] Alertmanager: Added rate limits to notifiers. Rate limits used by all integrations can be configured using -alertmanager.notification-rate-limit, while per-integration rate limits can be specified via the -alertmanager.notification-rate-limit-per-integration parameter. Both shared and per-integration limits can be overridden using the overrides mechanism. These limits are applied on individual (per-tenant) alertmanagers. Rate-limited notifications are treated as failed notifications. You can monitor rate-limited notifications via the new cortex_alertmanager_notification_rate_limited_total metric.
- [FEATURE] Alertmanager: support negative matchers and time-based muting (see the upstream release notes).
- [FEATURE] Allow reporting CPU time usage to the AWS Marketplace metering service when GEM is running as an AWS Marketplace container product.
- [FEATURE] Collect and store CPU time usage reports in the Admin store, which can later be submitted to metering services, such as the AWS Marketplace API.
- [FEATURE] Querier/Ruler: Added new -querier.max-fetched-chunk-bytes-per-query flag. When Cortex is running with blocks storage, the max chunk bytes limit is enforced in the querier and ruler and limits the total size, in bytes, of all aggregated chunks returned from ingesters and storage for a query.
- [FEATURE] Querier: Added new -querier.max-fetched-series-per-query flag. When Cortex is running with blocks storage, the max series per query limit is enforced in the querier and applies to unique series received from ingesters and store-gateway (long-term storage).
- [FEATURE] Query Frontend: Add cortex_query_fetched_chunks_total per-user counter to expose the number of chunks fetched as part of queries. This metric can be enabled with the -frontend.query-stats-enabled flag (or its respective YAML config option query_stats_enabled).
- [FEATURE] Query Frontend: Add cortex_query_fetched_series_total and cortex_query_fetched_chunks_bytes_total per-user counters to expose the number of series and bytes fetched as part of queries. These metrics can be enabled with the -frontend.query-stats-enabled flag (or its respective YAML config option query_stats_enabled).
- [FEATURE] Query Frontend: Add experimental query sharding for the blocks storage. You can now enable query sharding for the blocks storage (-store.engine) by setting -querier.parallelise-shardable-queries to true.
- [FEATURE] Ruler Storage: S3 header extensions were added to the new ruler storage S3 config block.
- [FEATURE] Ruler: Add new -ruler.query-stats-enabled flag which, when enabled, reports cortex_ruler_query_seconds_total as a per-user metric that tracks the sum of the wall time, in seconds, spent executing queries in the ruler.
- [FEATURE] When GEM runs as an AWS Marketplace container product, the Go runtime variable GOMAXPROCS is automatically set to match the container CPU quota if Kubernetes CPU resource limits are set.
- [FEATURE] Alertmanager: The experimental sharding feature is now considered complete. Detailed information about the configuration options can be found in the configuration reference for the alertmanager and for the alertmanager storage. To use the feature:
- Ensure that a remote storage backend is configured for Alertmanager to store state using -alertmanager-storage.backend, and set the flags related to that backend. Note that the local and configdb storage backends are not supported.
- Ensure that a ring store is configured using -alertmanager.sharding-ring.store, and set the flags relevant to the chosen store type.
- Enable the feature using -alertmanager.sharding-enabled.
- Note the prior addition of a new configuration option, -alertmanager.persist-interval. This sets the interval between persisting the current alertmanager state (notification log and silences) to object storage. See the configuration file reference for more information.
- [ENHANCEMENT] Add Cassandra support.
- [ENHANCEMENT] Add timeout for waiting on compactor to become ACTIVE in the ring.
- [ENHANCEMENT] Added tenant_ids tag to tracing spans.
- [ENHANCEMENT] Added option -distributor.excluded-zones to exclude ingesters running in specific zones on both the write and read paths.
- [ENHANCEMENT] Added zone-awareness support to alertmanager for use when sharding is enabled. When zone-awareness is enabled, alerts will be replicated across availability zones.
- [ENHANCEMENT] Admin-API: Add a new endpoint for returning product and feature information at /admin/api/v1/features.
- [ENHANCEMENT] Admin-API: Allow admin-api to operate for read-only requests when no license is present.
- [ENHANCEMENT] Alertmanager: Added -alertmanager.max-alerts-count and -alertmanager.max-alerts-size-bytes to control the maximum number of alerts and the total size of alerts that a single user can have in Alertmanager’s memory. Adding more alerts will fail with a log message and increment the cortex_alertmanager_alerts_insert_limited_total metric (per-user). These limits can be overridden by using per-tenant overrides. Current values are tracked in the cortex_alertmanager_alerts_limiter_current_alerts and cortex_alertmanager_alerts_limiter_current_alerts_size_bytes metrics.
- [ENHANCEMENT] Alertmanager: Added -alertmanager.max-dispatcher-aggregation-groups option to control the maximum number of active dispatcher groups in Alertmanager (per tenant, also overridable). When the limit is reached, the dispatcher logs a message and increments the cortex_alertmanager_dispatcher_aggregation_group_limit_reached_total metric.
- [ENHANCEMENT] Alertmanager: Clean up persisted state objects from remote storage when a tenant configuration is deleted.
- [ENHANCEMENT] Authentication: OIDC integration now supports a JWT with multiple roles. When present, these roles will be rolled up into a “virtual” access policy that provides metrics read access to the union of instances contained in those roles.
- [ENHANCEMENT] Blocks storage: support ingesting and querying exemplars. This is enabled by setting the new CLI flag -blocks-storage.tsdb.max-exemplars=<n> or the config option blocks_storage.tsdb.max_exemplars to a positive value.
- [ENHANCEMENT] Distributor: Added a distributors ring status section to the admin page.
- [ENHANCEMENT] Etcd: Added username and password to etcd config.
- [ENHANCEMENT] Expose CPU quota information (number of cores, cgroup quota) as Prometheus metrics.
- [ENHANCEMENT] Expose error counters and timestamps of CPU usage reporting as Prometheus metrics when AWS Marketplace metering is enabled.
- [ENHANCEMENT] Expose value of GOMAXPROCS as Prometheus metrics.
- [ENHANCEMENT] Facilitate running GEM Docker image as a non-root user. Usage is documented in the Kubernetes deployment documentation.
- [ENHANCEMENT] Ingester: Added option -ingester.ignore-series-limit-for-metric-names, which takes a comma-separated list of metric names that will be ignored by the max series per metric limit.
- [ENHANCEMENT] Ingester: added option -ingester.readiness-check-ring-health to disable the ring health check in the readiness endpoint.
- [ENHANCEMENT] License: Added flag -license.type, which is used to specify that the app is running through AWS Marketplace.
- [ENHANCEMENT] License: Implemented a /licenses endpoint that responds with a static list of licenses, replacing the default implementation if the app is running through AWS Marketplace.
- [ENHANCEMENT] License: Implemented logic to check whether the AWS Marketplace subscription is active, instead of checking the license file, if the app is running through AWS Marketplace.
- [ENHANCEMENT] Memberlist: expose configuration of memberlist packet compression via -memberlist.compression=enabled.
- [ENHANCEMENT] Memberlist: optimized receive path for processing ring state updates, to help reduce CPU utilization in large clusters.
- [ENHANCEMENT] Node-API: Added TSDB block metadata to the exportable debug archive.
- [ENHANCEMENT] Node-API: Register a new endpoint for fetching a compressed debug file containing config and version information at /node/api/v1/debug-export.
- [ENHANCEMENT] Node-API: Register a new endpoint for fetching version information about the nodes at /node/api/v1/version.
- [ENHANCEMENT] Querier can now use the LabelNames call with matchers, if matchers are provided in the /labels API call, instead of using the more expensive MetricsForLabelMatchers call as before. This can be enabled with the -querier.query-label-names-with-matchers-enabled flag once the ingesters are updated to this version. In the future this is expected to become the default behavior.
- [ENHANCEMENT] Reduce memory used by streaming queries, particularly in ruler.
- [ENHANCEMENT] Ring, query-frontend: Avoid using automatic private IPs (APIPA) when discovering the IP address from the interface during the registration of the instance in the ring, or by query-frontend when used with query-scheduler. APIPA is still used as a last resort, with a log message indicating its use.
- [ENHANCEMENT] Ruler: added rule_group label to metrics cortex_prometheus_rule_group_iterations_total and cortex_prometheus_rule_group_iterations_missed_total.
- [ENHANCEMENT] Scanner: add support for DynamoDB (v9 schema only).
- [ENHANCEMENT] Scanner: retry failed uploads.
- [ENHANCEMENT] Storage: Added the ability to disable OpenCensus within the GCS client (e.g. -gcs.enable-opencensus=false).
- [ENHANCEMENT] Store-gateway: added -store-gateway.sharding-ring.wait-stability-min-duration and -store-gateway.sharding-ring.wait-stability-max-duration support to the store-gateway, to wait for ring stability at startup.
- [ENHANCEMENT] Wildcard Datasource: Wildcard “*” datasources are now supported in datasource URLs for GEM. This allows an action to have access to all instances in all access policies associated with the provided token. If that set of instances includes a wildcard “*”, then access is expanded to all instances in the cluster.
- [ENHANCEMENT] Added instrumentation to Redis client, with the following metrics:
cortex_rediscache_request_duration_seconds
- [ENHANCEMENT] Include additional limits in the per-tenant override exporter. The following limits have been added to the cortex_overrides metric:
max_fetched_series_per_query
max_fetched_chunk_bytes_per_query
ruler_max_rules_per_rule_group
ruler_max_rule_groups_per_tenant
- [ENHANCEMENT] License Manager: Added functionality to regularly check the local license file and sync it to the license storage backend.
- Added metrics grafana_labs_license_syncs_total and grafana_labs_license_sync_failures_total.
- [ENHANCEMENT] Ring: allow heartbeat timeouts to be disabled (experimental) by setting the relevant configuration value to zero. This applies to the following:
-distributor.ring.heartbeat-timeout
-ring.heartbeat-timeout
-ruler.ring.heartbeat-timeout
-alertmanager.sharding-ring.heartbeat-timeout
-compactor.ring.heartbeat-timeout
-store-gateway.sharding-ring.heartbeat-timeout
- [ENHANCEMENT] Ring: allow heartbeats to be explicitly disabled by setting the interval to zero. This is considered experimental. This applies to the following configuration options:
-distributor.ring.heartbeat-period
-ingester.heartbeat-period
-ruler.ring.heartbeat-period
-alertmanager.sharding-ring.heartbeat-period
-compactor.ring.heartbeat-period
-store-gateway.sharding-ring.heartbeat-period
- [ENHANCEMENT] Alertmanager: introduced new metrics to monitor operation when using -alertmanager.sharding-enabled:
cortex_alertmanager_state_fetch_replica_state_total
cortex_alertmanager_state_fetch_replica_state_failed_total
cortex_alertmanager_state_initial_sync_total
cortex_alertmanager_state_initial_sync_completed_total
cortex_alertmanager_state_initial_sync_duration_seconds
cortex_alertmanager_state_persist_total
cortex_alertmanager_state_persist_failed_total
- [ENHANCEMENT] Memberlist: introduced new metrics to aid troubleshooting tombstone convergence:
memberlist_client_kv_store_value_tombstones
memberlist_client_kv_store_value_tombstones_removed_total
memberlist_client_messages_to_broadcast_dropped_total
- [ENHANCEMENT] Ruler: added new metrics for tracking the total number of queries and push requests sent to the ingester, as well as failed queries and push requests. Failures are only counted for internal errors, not for user errors such as limits or an invalid query. This is in contrast to the existing cortex_prometheus_rule_evaluation_failures_total, which is also incremented when a query or sample append fails due to user errors.
cortex_ruler_write_requests_total
cortex_ruler_write_requests_failed_total
cortex_ruler_queries_total
cortex_ruler_queries_failed_total
- [BUGFIX] Alertmanager: fix Alertmanager status page if clustering via gossip is disabled or sharding is enabled.
- [BUGFIX] Authentication: fix handling of missing instances, or when instance has no matching access policy, by properly returning a 401 instead of crashing.
- [BUGFIX] Compactor: fixed panic while collecting Prometheus metrics.
- [BUGFIX] Graphite: Apply the max-points-per-req-hard limit correctly.
- [BUGFIX] Graphite: Fix race in index.json API endpoint which led to incomplete results.
- [BUGFIX] HA Tracker: when cleaning up obsolete elected replicas from the KV store, the tracker didn’t update the number of clusters per user correctly.
- [BUGFIX] Ingester: fix issue where runtime limits erroneously override default limits.
- [BUGFIX] Ingester: fixed infrequent panic caused by a race condition between TSDB mmap-ed head chunks truncation and queries.
- [BUGFIX] Ingester: fixed ingester stuck on startup (LEAVING ring state) when -ingester.heartbeat-period=0 and -ingester.unregister-on-shutdown=false.
- [BUGFIX] Invalidate cached authentication tokens when they are deleted from object storage.
- [BUGFIX] Make multiple Get requests instead of MGet on Redis Cluster.
- [BUGFIX] Memberlist: fix setting the default configuration value for -memberlist.retransmit-factor when not provided. This should improve the propagation delay of the ring state (including, but not limited to, tombstones). Note that if the configuration is already explicitly given, this fix has no effect.
- [BUGFIX] Purger: fix the “Invalid null value in condition for column range” error caused by a nil value in range for the WriteBatch query.
- [BUGFIX] Querier: Fix issue where samples in a chunk might get skipped by batch iterator.
- [BUGFIX] Querier: fix queries failing with “at least 1 healthy replica required, could only find 0” error right after scaling up store-gateways until they’re ACTIVE in the ring.
- [BUGFIX] Query-frontend: Fix 401s during query_range requests when enterprise authentication is used. The workaround involving disabling enterprise authentication on the querier can now be removed.
- [BUGFIX] Ruler: Fix bug in rule forwarding with remote write which could fill up the disk because the WAL was not truncated.
- New flags called -ruler.remote-write.wal-truncate-frequency, -ruler.remote-write.min-wal-time and -ruler.remote-write.max-wal-time have been added.
- [BUGFIX] Ruler: Honor the evaluation delay for the ALERTS and ALERTS_FOR_STATE series.
- [BUGFIX] Ruler: fix the /ruler/rule_groups endpoint not working when used with object storage.
- [BUGFIX] Ruler: fix startup in single-binary mode when the new ruler_storage is used.
- [BUGFIX] Ruler: fixed counting of PromQL evaluation errors as user-errors when updating cortex_ruler_queries_failed_total.
- [BUGFIX] Store-gateway: when blocks sharding is enabled, do not load all blocks in each store-gateway in case of a cold startup, but load only blocks owned by the store-gateway replica.
- [BUGFIX] Upgrade Prometheus. TSDB now waits for pending readers before truncating the Head block, fixing the “chunk not found” error and preventing wrong query results.
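The v1.5.0 entries on multi-role JWTs and wildcard “*” datasources describe how several access policies combine into one set of readable instances. The Python sketch below models that resolution under stated assumptions: the helper resolve_instances and the plain mapping from policy name to instance names are hypothetical and are not GEM's internal API. It also applies the lowercase matching described in the Authentication [CHANGE] entry.

```python
# Hypothetical sketch: resolving readable instances from several access policies.
# resolve_instances and the dict-based policy model are illustrative assumptions,
# not GEM's implementation.
from __future__ import annotations

WILDCARD = "*"


def resolve_instances(
    policies: dict[str, set[str]],
    roles: list[str],
    cluster_instances: set[str],
) -> set[str]:
    """Return the instances readable through the given JWT roles.

    `policies` maps a lowercase access policy name to the instances it covers.
    """
    allowed: set[str] = set()
    for role in roles:
        # Role names are lowercased before matching; unknown roles add nothing.
        allowed |= policies.get(role.lower(), set())
    # The "virtual" policy is the union; a wildcard in it grants every instance.
    if WILDCARD in allowed:
        return set(cluster_instances)
    return allowed


if __name__ == "__main__":
    policies = {"team-a": {"dev"}, "team-b": {"prod"}, "admins": {WILDCARD}}
    cluster = {"dev", "prod", "staging"}
    print(sorted(resolve_instances(policies, ["Team-A", "team-b"], cluster)))
    # ['dev', 'prod']
    print(sorted(resolve_instances(policies, ["admins"], cluster)))
    # ['dev', 'prod', 'staging']
```

Returning a copy of the cluster set for the wildcard case keeps callers from mutating shared state by accident.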
v1.4.2 – Jul 21st 2021
Links
Binary (Linux AMD64)
Deb (Linux AMD64)
RPM (Linux AMD64)
Docker image: run
docker pull grafana/metrics-enterprise:v1.4.2
(digest: sha256:385b563669a5ba4a459f833a2c356884b757de719e43369ead0c5dc59cb11d94)
License: Grafana Labs license
Changelog
- [SECURITY] Prevent a path traversal attack from users able to control the HTTP header X-Scope-OrgID. (CVE-2021-36157)
- Users only have control of the HTTP header when GEM is configured with the flags -auth.type=default and -tenant-federation.enabled=false.
- [SECURITY] Update build image to use Go 1.16.6. (CVE-2021-34558) #1874
- [BUGFIX] Ruler: Register remote write metrics correctly. #1814
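The CVE-2021-36157 fix above concerns tenant IDs from the X-Scope-OrgID header ending up in filesystem or object-storage paths. To illustrate the general technique (allowlist validation of a value before it becomes a path component), here is a minimal Python sketch. The allowed character set and the 150-character limit are assumptions loosely modelled on upstream Cortex; they are not a statement of GEM's actual rules.

```python
# Hypothetical sketch of tenant ID validation against path traversal.
# MAX_TENANT_ID_LENGTH and ALLOWED_CHARS are assumptions, not GEM's actual rules.
import string

MAX_TENANT_ID_LENGTH = 150  # assumed limit
ALLOWED_CHARS = set(string.ascii_letters + string.digits + "!-_.*'()")


def validate_tenant_id(tenant_id: str) -> None:
    """Raise ValueError if tenant_id is unsafe to use as a path component."""
    if not tenant_id:
        raise ValueError("tenant ID is empty")
    if len(tenant_id) > MAX_TENANT_ID_LENGTH:
        raise ValueError("tenant ID is too long")
    # "." and ".." would resolve to the current or parent directory.
    if tenant_id in (".", ".."):
        raise ValueError("tenant ID is a relative path element")
    unsupported = set(tenant_id) - ALLOWED_CHARS
    if unsupported:
        # Path separators such as "/" and "\" are not in the allowlist.
        raise ValueError(f"unsupported characters in tenant ID: {sorted(unsupported)}")
```

A gateway could call validate_tenant_id on the header value before building any storage path from it; an allowlist is preferred here over a denylist because it also rejects separators and control characters that nobody thought to list.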
Upstream Cortex details
- Cortex Hash:
2210ebb7052a9efb99d0e4dc53043a3f5d806d00
v1.4.1 – June 29th 2021
Links
Binary (Linux AMD64)
Deb (Linux AMD64)
RPM (Linux AMD64)
Docker image: run
docker pull grafana/metrics-enterprise:v1.4.1
(digest: sha256:d1d17bfe2ec984b093b9da1ab8cdea1f764f24f16b38557d719254c4e64c9f9a)
License: Grafana Labs license
Changelog
- [BUGFIX] Update the GEM build image to use Alpine 3.14, python 3.9 and gsutil 4.52.
Upstream Cortex details
- Cortex Hash:
98dd0c4d69576fdfaf2b9bfd7aa475e835e11429
v1.4.0 – June 28th 2021
Links
Binary (Linux AMD64)
Deb (Linux AMD64)
RPM (Linux AMD64)
Docker image: run
docker pull grafana/metrics-enterprise:v1.4.0
(digest: sha256:ff38e0544d805bfd1450a1f033ed79585252a4444d247e0e4c649625619215ab)
License: Grafana Labs license
Changelog
- [CHANGE] Breaking: Verify the token issuer when using OIDC authentication. This is a breaking change for users of OIDC authentication. #1571
- Before this change, the configuration of OIDC authentication required the OIDC provider’s jwks_uri to be set in the configuration flag auth.admin.oidc.url. This flag has been deprecated.
- A new flag named auth.admin.oidc.issuer-url has been added, and it must be set to the URL of the OIDC provider. For example: -auth.admin.oidc.issuer-url=https://accounts.google.com
Note: This is not simply a rename of the old flag; you also need to update the value. The defined issuer is required to provide the OIDC discovery endpoint (/.well-known/openid-configuration).
- [CHANGE] Breaking: The GEM/GEL Ruler can now be accessed by access policies with rules read/write permissions, which are no longer specific to metrics or logs. #1366 & #1403
- Before this change, there were metric-rule-specific permissions metrics:rules:read and metrics:rules:write.
- The data representation for this change in object storage is backwards compatible, so existing access policies need no changes to use the new rules permissions.
- The JSON representation for these rules is not backwards compatible, so any JSON interactions with the API that specified the strings metrics:rules:read or metrics:rules:write must be updated to the strings rules:read and rules:write respectively.
- This breaking change applies to the GEM Plugin as well, so please update to version v3.0.X.
- [CHANGE] Remove the enterprise_features config block entirely. #1453
- [CHANGE] Alertmanager: deprecated -alertmanager.storage.* CLI flags (and their respective YAML config options) in favour of -alertmanager-storage.*. This change doesn’t apply to alertmanager.storage.path and alertmanager.storage.retention.
- [CHANGE] Blocks storage: removed the config option -blocks-storage.bucket-store.index-cache.postings-compression-enabled, which was deprecated. Postings compression is always enabled.
- [CHANGE] GEM now fails fast on startup if it is unable to connect to the ring backend.
- [CHANGE] Querier / ruler: deprecated -store.query-chunk-limit CLI flag (and its respective YAML config option max_chunks_per_query) in favor of -querier.max-fetched-chunks-per-query (and its respective YAML configuration option max_fetched_chunks_per_query). The new limit specifies the maximum number of chunks that can be fetched in a single query from ingesters and long-term storage: in the worst case, the total number of chunks actually fetched can be twice the limit, because the limit is applied to ingesters and to long-term storage separately.
- [CHANGE] Query frontend: removed the configuration option -querier.compress-http-responses, which was deprecated. Instead, use -api.response-compression-enabled.
- [CHANGE] Runtime-config / overrides: removed the config options -limits.per-user-override-config (use -runtime-config.file) and -limits.per-user-override-period (use -runtime-config.reload-period), both deprecated.
- [FEATURE] Add embedded recording rules to the Enterprise Ruler to support building dashboards and alerts from internal metrics written directly to GEM itself via a distributor. #1459
- To enable or disable the feature, use the -instrumentation.enabled flag or the associated enabled setting in the instrumentation configuration block. The feature is disabled by default.
- [FEATURE] Add the ability to write internal metrics directly to GEM itself via a distributor. #1281
- To configure, enable, or disable the feature, use the -instrumentation.enabled flag and its associated flags, or the instrumentation configuration block. The feature is disabled by default.
    instrumentation:
      enabled: false
      flush_period: 15s
      write_timeout: 10s
      distributor_client:
        address: dns:///:9095
        connect_timeout: 5s
        tls_enabled: false
        tls_cert_path:
        tls_key_path:
        tls_ca_path:
        tls_server_name:
        tls_insecure_skip_verify:
- [FEATURE] Self-monitoring: expose filesystem usage metrics to source the disk utilization panel in the self-monitoring resource dashboards #1618
- [FEATURE] Add an experimental GEM component, federation-frontend, which can be used to federate queries between multiple GEM clusters. #1274
- [FEATURE] Querier: Added new -querier.max-fetched-series-per-query flag. When GEM is running with blocks storage, the max series per query limit is enforced in the querier and applies to unique series received from ingesters and store-gateway (long-term storage).
- [FEATURE] Querier/Ruler: Added new -querier.max-fetched-chunk-bytes-per-query flag. When GEM is running with blocks storage, the max chunk bytes limit is enforced in the querier and ruler and limits the total size, in bytes, of all aggregated chunks returned from ingesters and storage for a query.
- [ENHANCEMENT] Introduce a configuration parameter to limit how many points are processed per query. #1292
- [ENHANCEMENT] Added API endpoints through which a user can post and get their storage schemas and aggregations. #1389
- [ENHANCEMENT] Admin-API: Listing mutable resources now includes a comma-separated list of versions for those resources in the ETag header. #1419
- [ENHANCEMENT] Admin-API: Updating a mutable resource now allows a wildcard value ("*") to be passed as the If-Match header, which allows updating whatever the current version is. #1449
- [ENHANCEMENT] The /config HTTP endpoint now also returns GEM-specific options alongside regular Cortex configuration. #1380
- [BUGFIX] Fix LBAC regular expression matchers. #1305
- [BUGFIX] Validate all fields of JWT tokens used for auth, except the issuer. #1500
- [BUGFIX] Ruler: ensure the S3 rule storage flags properly map to the upstream flags. #1460
- [BUGFIX] Admin-API: reject update requests when access policies have empty scopes or realms. #1447
- [BUGFIX] Updated licenses are now persisted to object storage, fixing the responses from the license API which would show old license information. #1568
- [BUGFIX] OAuth: Don’t use default access policy when an invalid JWT claim is provided. #1635
- [BUGFIX] Authentication: Invalidate cached authentication tokens when they are deleted from object storage. #1703
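The OIDC [CHANGE] in this release requires the configured issuer to serve the discovery endpoint. Under the OpenID Connect Discovery specification, that endpoint is the issuer URL with any trailing slash removed and /.well-known/openid-configuration appended, and a token's iss claim must equal the issuer exactly. The sketch below shows both checks; the helper names discovery_url and issuer_matches are hypothetical, and this is not GEM's code.

```python
# Hypothetical helpers illustrating OIDC issuer handling; not GEM's implementation.
from urllib.parse import urlparse

DISCOVERY_SUFFIX = "/.well-known/openid-configuration"


def discovery_url(issuer_url: str) -> str:
    """Build the OpenID Connect discovery URL for an issuer."""
    parsed = urlparse(issuer_url)
    # OpenID Connect requires issuers to be absolute https URLs.
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("issuer URL must be an absolute https URL")
    # Remove a terminating "/" before appending the well-known suffix.
    return issuer_url.rstrip("/") + DISCOVERY_SUFFIX


def issuer_matches(claims: dict, expected_issuer: str) -> bool:
    """Return True only if the token's iss claim equals the configured issuer."""
    return claims.get("iss") == expected_issuer


if __name__ == "__main__":
    print(discovery_url("https://accounts.google.com"))
    # https://accounts.google.com/.well-known/openid-configuration
```

This also shows why the old and new flags take different values: the old flag pointed at the jwks_uri, while the issuer URL is the base from which discovery (and, from there, the JWKS location) is derived.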
Upstream Cortex details
- Cortex Hash:
98dd0c4d69576fdfaf2b9bfd7aa475e835e11429
- Cortex Commits
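For the v1.4.0 rules permission change, JSON sent to the Admin API must use rules:read and rules:write instead of metrics:rules:read and metrics:rules:write. A minimal Python sketch of a client-side migration follows. It assumes that an access policy's permissions live in a top-level "scopes" list (the field name follows the "empty scopes or realms" bugfix entry); the rest of the payload shape and the helper name migrate_policy_json are assumptions.

```python
# Hypothetical migration helper for the v1.4.0 rules permission rename.
# The "scopes" key and overall payload shape are assumptions.
import json

RENAMED_SCOPES = {
    "metrics:rules:read": "rules:read",
    "metrics:rules:write": "rules:write",
}


def migrate_policy_json(raw: str) -> str:
    """Return the access policy JSON with the old rules scopes renamed."""
    policy = json.loads(raw)
    migrated = []
    for scope in policy.get("scopes", []):
        new_scope = RENAMED_SCOPES.get(scope, scope)
        # Skip duplicates in case both the old and new forms were present.
        if new_scope not in migrated:
            migrated.append(new_scope)
    policy["scopes"] = migrated
    return json.dumps(policy)


if __name__ == "__main__":
    old = '{"name": "ruler", "scopes": ["metrics:read", "metrics:rules:write"]}'
    print(migrate_policy_json(old))
```

Scopes that are not in the rename table, such as metrics:read, pass through unchanged.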
v1.3.1 – Jul 21st 2021
Links
Binary (Linux AMD64)
Deb (Linux AMD64)
RPM (Linux AMD64)
Docker image: run
docker pull grafana/metrics-enterprise:v1.3.1
(digest: sha256:e03a7ae061d5f617490812a6f45c6362fdc9ef79010555a207ebee2174ef9b23)
License: Grafana Labs license
Changelog
- [SECURITY] Prevent a path traversal attack from users able to control the HTTP header X-Scope-OrgID. (CVE-2021-36157)
- Users only have control of the HTTP header when GEM is configured with the flags -auth.type=default and -tenant-federation.enabled=false.
- [SECURITY] Update build image to use Go 1.16.6. (CVE-2021-34558) #1874
- [BUGFIX] Update the GEM build image to use Alpine 3.14, python 3.9 and gsutil 4.52. #1781
- [BUGFIX] Ruler: Register remote write metrics correctly. #1814
Upstream Cortex details
- Cortex Hash:
64592254fe91c86e903882947a58d572a316884d
v1.3.0 – April 26th 2021
Links
Binary (Linux AMD64)
Deb (Linux AMD64)
RPM (Linux AMD64)
Docker image: run
docker pull grafana/metrics-enterprise:v1.3.0
License: Grafana Labs license
Changelog
- [SECURITY] Alertmanager: Fix a local file disclosure vulnerability when -experimental.alertmanager.enable-api is used (CVE-2021-31231):
- The HTTP Basic auth password_file can be used as an attack vector to send any file content via a webhook.
- The Alertmanager templates can be used as an attack vector to send any file content because the Alertmanager can load any text file specified in the templates list.
- [CHANGE] Admin API: Concurrent requests to the same resource are no longer allowed. If two requests are issued to create, update, or delete the same resource, then the first one to acquire a lock executes and the second one returns a conflict error. This is handled per process; to enforce this behavior across multiple processes, use leader election. #1186
- [CHANGE] Admin API: all errors encountered during the processing of HTTP requests are converted to GRPC errors in order to determine the correct HTTP status to return. This enforces consistency for leader election, because some requests are handled internally, and others are forwarded to other instances. #1217
- [CHANGE] Admin API: All mutation operations (PUT/DELETE) now require an If-Match header to be set (an integer between double quotes, such as "27") to verify that the correct version of the resource is being modified and to prevent race conditions. You can find the current version of a resource in the ETag header that is returned when that resource is read (via GET) or updated (via PUT).
- [FEATURE] Admin API: You can set per-instance resource limits via the Admin API. This is enabled by default. #1173
  - You can enable or disable this feature with the -admin-api.limits.enabled flag and set its refresh period with the -admin-api.limits.refresh-period flag. You can also configure this feature by using the admin_api configuration block:
    admin_api:
      limits:
        enabled: true
        refresh_period: 1m
- [ENHANCEMENT] Upgrade build image to use Go 1.16.3. #1294
- [ENHANCEMENT] Admin client: Add the cortex_admin_client_is_leader gauge metric to determine when the client considers itself the leader. #1175
- [ENHANCEMENT] Admin API: Update an access policy via the Admin API using a PUT request. #1139
- [ENHANCEMENT] Admin API: Update an instance via the Admin API using a PUT request. #1180
- [ENHANCEMENT] Gateway: Forward the /multitenant_alertmanager/ring and /ruler/ring routes to the alertmanager and ruler proxy backends. #1144
- [BUGFIX] Graphite: Fix aggregation cache to generate cache keys using correct input data. #963
- [BUGFIX] Authentication: Fix issue where all requests would trigger a panic if authentication is enabled but no admin client is configured. An error is now printed instead. #1106
Upstream Cortex details
- Cortex Hash: 2d8477c4a325ce5071676e906efcee4adb687513
- Cortex Commits
v1.2.1 – April 27 2021
Links
Binary (Linux AMD64)
Deb (Linux AMD64)
RPM (Linux AMD64)
Docker image: run
docker pull grafana/metrics-enterprise:v1.2.1
License: Grafana Labs license
Changelog
- [SECURITY] Alertmanager: Fix a local file disclosure vulnerability when -experimental.alertmanager.enable-api is used (CVE-2021-31231):
  - The HTTP Basic auth password_file can be used as an attack vector to send any file content via a webhook.
  - The Alertmanager templates can be used as an attack vector to send any file content because the Alertmanager can load any text file specified in the templates list.
v1.2.0 – March 10 2021
Links
Binary (Linux AMD64)
Deb (Linux AMD64)
RPM (Linux AMD64)
Docker image: run
docker pull grafana/metrics-enterprise:v1.2.0
License: Grafana Labs license
Changelog
- [CHANGE] Gateway: Remove purger proxy configuration, which is not a supported target for blocks clusters.
- [CHANGE] Auth: Override authentication flags have been renamed:
  - The auth.override-admin-token flag has been changed to auth.override.token.
  - The auth.override-admin-token-file flag has been changed to auth.override.token-file.
- [FEATURE] Gateway: Improve the gateway target to support unique TLS configurations and write timeouts for each backend.
  - New fields have been added to allow for configuration:
    gateway:
      proxy:
        default:
          tls:
            tls_cert_path: <string>
            tls_key_path: <string>
            tls_ca_path: <string>
            tls_insecure_skip_verify: <bool>
        distributor:
          read_timeout: <duration>
          write_timeout: <duration>
          tls: ...
- [FEATURE] Compactor: Introduced the time-sharding compaction strategy.
- [ENHANCEMENT] Distributor: Wrap remote writes in distributor to sample and log them as business intelligence events.
- [ENHANCEMENT] Metrics emitted for TLS certificate expiration now reflect certificates being reloaded.
- [ENHANCEMENT] Remove the Graphite Auto Complete Index and use Cortex index instead.
- [ENHANCEMENT] Add Graphite API endpoint /metrics/index.json.
- [ENHANCEMENT] Call Cortex Distributor over gRPC from Graphite Write Proxy (formerly Graphite Distributor)
- [ENHANCEMENT] Admin API: Add a feature to elect an admin-api leader instance to handle all mutation requests. Requests to non-leader instances are forwarded to the leader instance.
  - New fields have been added to allow for configuration:
    admin_api:
      leader_election:
        enabled: <bool>
        ring:
          kvstore: <kv.Config>
          heartbeat_period: <duration>
          heartbeat_timeout: <duration>
          tokens_observe_period: <duration>
          instance_interface_name: <[]string>
        client_config: <grpcclient.Config>
- [BUGFIX] LBAC: Fix issue where debug logs would not print the selector and instead print selector="unsupported value type".
- [BUGFIX] Admin-Client: Warning logs are no longer created on resource creation.
- [BUGFIX] Ruler: Fix issue where invalid remote-write URLs cause a panic.
- [BUGFIX] Querier: Apply label access filters on multi tenant access policies.
Upstream Cortex details
- Cortex Hash: 003eb33266ca464d7290a938a9d767c36b9a03a4
- Cortex CHANGELOG
v1.1.3 – April 27 2021
Links
Binary (Linux AMD64)
Deb (Linux AMD64)
RPM (Linux AMD64)
Docker image: run
docker pull grafana/metrics-enterprise:v1.1.3
License: Grafana Labs license
Changelog
- [SECURITY] Alertmanager: Fix a local file disclosure vulnerability when -experimental.alertmanager.enable-api is used (CVE-2021-31231):
  - The HTTP Basic auth password_file can be used as an attack vector to send any file content via a webhook.
  - The Alertmanager templates can be used as an attack vector to send any file content because the Alertmanager can load any text file specified in the templates list.
v1.1.2 – January 20 2021
Links
Binary (Linux AMD64)
Deb (Linux AMD64)
RPM (Linux AMD64)
License: Grafana Labs license
Changelog
- [BUGFIX] Querier: Fix default value incorrectly overriding -querier.frontend-address in single-binary mode.
v1.1.1 – January 14 2021
Links
Binary (Linux AMD64)
Deb (Linux AMD64)
RPM (Linux AMD64)
License: Grafana Labs license
Changelog
- [BUGFIX] Ruler: Minimize gaps on rule evaluations with stale input and enabled ruler evaluation delay.
v1.1.0 – January 12 2021
Links
Binary (Linux AMD64)
Deb (Linux AMD64)
RPM (Linux AMD64)
License: Grafana Labs License
Changelog
[CHANGE] Admin-API: Resources must not be both prefixed and suffixed with the __ characters. If any of your existing resources use this naming pattern, they must be deleted and recreated with a new name before upgrading.
[CHANGE] Graphite: Allow storage schema and storage aggregation configs to be defined per tenant.
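Before upgrading, you need to find resources that break the new naming rule for the __ characters described above. The following minimal sketch checks a list of names. The inventory is hypothetical, and fetching the real names from the Admin API is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// hasReservedShape reports whether name is both prefixed and suffixed with
// "__", the pattern that v1.1.0 no longer allows for Admin API resources.
func hasReservedShape(name string) bool {
	return strings.HasPrefix(name, "__") && strings.HasSuffix(name, "__")
}

func main() {
	// Hypothetical inventory of existing resource names.
	existing := []string{"team-a", "__legacy__", "__prefix-only", "suffix-only__"}
	for _, name := range existing {
		if hasReservedShape(name) {
			fmt.Printf("%s: delete and recreate under a new name before upgrading\n", name)
		}
	}
}
```

Only names that match on both ends are flagged. A name with a single leading or trailing __ is still allowed by the rule as stated.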
[CHANGE] Admin-Client: Instance management client calls no longer use object storage Iter calls when retrieving the latest version of a resource.
[CHANGE] Graphite: Add API endpoints to explore the available Graphite functions.
[CHANGE] Admin: The selectors for label policies are now provided as PromQL label strings instead of typed objects.
Deprecated:
"label_policies": [ { "selector": [ { "name": "env", "value": "dev", "type": "EQ" } ] } ]
New:
"label_policies": [ { "selector": "{env=\"dev\"}" } ]
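Migrating stored label policies from the deprecated typed form to the new string form above is a mechanical translation. EQ, NEQ, RE, and NRE map to the PromQL matcher operators =, !=, =~, and !~. The following minimal sketch performs that mapping. The type and function names are hypothetical, and using strconv.Quote for PromQL string quoting is an assumption (both escape " and \).

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// legacyMatcher mirrors one element of the deprecated typed "selector" array.
type legacyMatcher struct {
	Name  string `json:"name"`
	Value string `json:"value"`
	Type  string `json:"type"` // EQ, NEQ, RE or NRE
}

// promOps maps the legacy matcher types to PromQL matcher operators.
var promOps = map[string]string{"EQ": "=", "NEQ": "!=", "RE": "=~", "NRE": "!~"}

// toPromQLSelector renders legacy matchers as a PromQL label selector
// string such as {env="dev"}.
func toPromQLSelector(ms []legacyMatcher) (string, error) {
	parts := make([]string, 0, len(ms))
	for _, m := range ms {
		op, ok := promOps[m.Type]
		if !ok {
			return "", fmt.Errorf("unknown matcher type %q", m.Type)
		}
		parts = append(parts, m.Name+op+strconv.Quote(m.Value))
	}
	return "{" + strings.Join(parts, ",") + "}", nil
}

func main() {
	// The deprecated selector from the example above.
	legacy := `[ { "name": "env", "value": "dev", "type": "EQ" } ]`
	var ms []legacyMatcher
	if err := json.Unmarshal([]byte(legacy), &ms); err != nil {
		panic(err)
	}
	sel, err := toPromQLSelector(ms)
	if err != nil {
		panic(err)
	}
	out, err := json.Marshal(map[string]string{"selector": sel})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"selector":"{env=\"dev\"}"}
}
```

An unknown matcher type is reported as an error rather than silently dropped, because dropping a matcher would widen the policy.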
[CHANGE] Admin: Operations with an ADMIN scope are no longer restricted to operating on clusters they have as a configured realm.
[CHANGE] Deprecate the enterprise_features config section in favor of the Cortex config extension.
Deprecated:
    enterprise_features:
      ruler_s3_request_headers:
        file: <string>
        poll_interval: <duration>
      ruler_remote_write:
        enabled: <bool>
        wal_dir: <string>
New:
    ruler:
      storage:
        s3:
          header_map_file_path: <string>
          header_map_poll_interval: <duration>
      remote_write:
        enabled: <bool>
        wal_dir: <string>
[FEATURE] Ruler: Alerts can now be correctly forwarded to the Alertmanager with enterprise authentication enabled by setting the basic authentication username to __alertmanager__ and the password to an API token with access to every instance.
[FEATURE] Queries: LBAC enforcement has been added for queries and label value requests.
- When GEM is run using the default authentication mode, LBAC policies are specified using the X-Prom-Label-Policy HTTP header in the format X-Prom-Label-Policy: <tenant-id>:urlEscaped(<prometheus label selector>). For example, a policy that only allows metrics with the label env equal to dev for tenant test-instance could be specified with the following header: X-Prom-Label-Policy: test-instance:%7Benv=%22dev%22%7D. To specify multiple policies, either set the header multiple times or set it to a single string of multiple policies separated by an unescaped comma.
[FEATURE] Admin API: Add the label_policies field, which contains an array of label matchers, to the access policy realm JSON.
{ "realms": [ { "instance": "<string>", "cluster": "<string>", "label_policies": [ { "selector": [ { "type": "<enum: EQ | NEQ | RE | NRE>", "name": "<string>", "value": "<string>" } ] } ] } ] }
[FEATURE] Admin: Add the tokengen target to generate tokens for the default or a custom access policy.
[FEATURE] Admin: Add a default __admin__ access policy that has an ADMIN scope. This policy can be disabled by adding the following to the GEM configuration file:
    admin_client:
      disable_default_admin_policy: true
[FEATURE] Querier: Queries can be federated across multiple tenants. The tenant IDs involved need to be specified in the X-Scope-OrgID request header, separated by a | character.
[FEATURE] Add a gateway target that can be configured to proxy requests to microservices and can be used to load balance remote_write requests to the distributors.
[ENHANCEMENT] AdminAPI: Add a scope for read-only admin access, admin:read.
[ENHANCEMENT] AdminAPI: Add a separate set of scopes for alerts and rules:
alerts:read
alerts:write
logs:rules:read
logs:rules:write
metrics:rules:read
metrics:rules:write
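The tenant federation entry above expects several tenant IDs joined by | in X-Scope-OrgID. The following minimal sketch builds that header value. The federatedOrgID helper and the URL are hypothetical, and rejecting IDs that themselves contain | is an added precaution, not documented GEM behavior.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// federatedOrgID joins tenant IDs with "|" for the X-Scope-OrgID header.
// It rejects empty IDs and IDs that contain "|", since those could not be
// split back apart unambiguously.
func federatedOrgID(tenantIDs ...string) (string, error) {
	if len(tenantIDs) == 0 {
		return "", errors.New("no tenant IDs given")
	}
	for _, id := range tenantIDs {
		if id == "" || strings.Contains(id, "|") {
			return "", fmt.Errorf("invalid tenant ID %q", id)
		}
	}
	return strings.Join(tenantIDs, "|"), nil
}

func main() {
	orgID, err := federatedOrgID("team-a", "team-b")
	if err != nil {
		panic(err)
	}
	// Hypothetical query URL; only the header matters here.
	req, err := http.NewRequest(http.MethodGet, "http://gem.example.invalid/hypothetical-query", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("X-Scope-OrgID", orgID)
	fmt.Println(req.Header.Get("X-Scope-OrgID"))
}
```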
[ENHANCEMENT] Reduce allocations in Graphite Ingester, when ingesting untagged Graphite metrics.
[ENHANCEMENT] Serve Graphite /metrics/find requests by keeping track of all recent metrics in an in-memory index on the Ingesters to reduce latency.
[ENHANCEMENT] Add auxiliary Graphite API endpoints to explore tags and obtain auto-complete suggestions for the Grafana query editor.
[ENHANCEMENT] Admin API: add ClusterKind support for Logs & Traces.
[ENHANCEMENT] Admin API: add scopes for Logs.
[ENHANCEMENT] Admin: The bootstrap target no longer needs to be run before being able to start GEM with enterprise features. Every target will now try to perform bootstrapping on startup if it has not already been done. Failure to bootstrap will not prevent GEM running, but enterprise features will not be available.
[ENHANCEMENT] Add the grafana_labs_license_expiry_timestamp metric to expose the GEM license expiration as a UNIX timestamp, in seconds.
[BUGFIX] Graphite: Fix a bug in the request parsing of GET requests on the auto-complete endpoints.
[BUGFIX] Graphite: When ingesting datapoints that result in out-of-order, out-of-bounds, or duplicate-sample errors, return status 200 to prevent an indefinite loop.
[BUGFIX] Ruler: Fix issue where remote-write rule groups are created and then immediately deleted when a rule group name contains the / delimiter character.
Upstream Cortex changes
- Upstream Cortex hash: c3b8c46fd8fc9a2aa85accbe54cb00be2552dcd9
- Changes since last GEM release
v1.0.2 – October 16 2020
Links
Changelog
- [CHANGE] Update vendored Cortex from v1.4.0 to v1.4.0-21bad5.
- [BUGFIX] Fix a potential panic caused by writing to a closed channel in the Graphite query executor.
- [ENHANCEMENT] Admin: Access policy create operations now enforce valid instance/cluster names for the realms configured on the access policy.
- [ENHANCEMENT] Add the -version flag to GEM.
- [FEATURE] Add config options to rate limit the LIST methods of buckets.
- [FEATURE] Adds the Graphite /render API endpoint, which can be used to query metrics with the Graphite query language.
- [FEATURE] Add config options to specify and poll files to inject arbitrary HTTP headers into requests to S3 for the admin and blocks clients.
    blocks_storage:
      s3:
        header_map_file_path: <path to header file>
        header_map_poll_interval: <duration string>
    admin_client:
      storage:
        s3:
          header_map_file_path: <path to header file>
          header_map_poll_interval: <duration string>
- [FEATURE] Adds the Graphite /metrics/find API endpoint, which can be used to obtain lists of metrics matching a given pattern (Grafana query editor auto-complete, dashboard variable population, etc).
- [FEATURE] Add a default access policy option for OpenID Connect tokens.
Upstream Cortex details
- Cortex Hash: 21bad57b346c730d684d6d0205efef133422ab28
- Cortex CHANGELOG
v1.0.1 – October 06 2020
Links
Upstream Cortex details
- Cortex Hash: 23554ce028c090a4a3413ac0e35e5e1dc9fa929f
- Cortex Version: 1.4.0
Changelog
- [CHANGE] Update vendored Cortex to v1.4.0.
v1.0.0 – September 17 2020
Links
Upstream Cortex details
- Cortex Hash: bb5fcc929832f7bd2a6c2df348b387abcb8b961e
- Cortex Version: 1.4.0-rc.0
Changelog
- [BUGFIX] Make config field names consistent.
- [CHANGE] Use Go 1.14.9 to build the project and cut build-image@v0.1.3.
v1.0.0-rc.2 – September 15 2020
Links
Upstream Cortex details
- Cortex Hash: c3a344784a0c8ce70ef2521f543033dee3dce6c6
- Cortex Version: 1.3.1
Changelog
- [BUGFIX] Admin API: Fix panic on start up for the admin-api target.
v1.0.0-rc.1 – September 04 2020
Links
Upstream Cortex details
- Cortex Hash: 4f6e1e5c48ccad2c1988cf1d36ca522ae0c805ed
- Cortex Version: 1.3.1
Changelog
- [CHANGE] Admin-Client: The storage backend for the admin client no longer defaults to s3. Instead, no default is set, and the admin client will not start up unless a storage backend is configured.
- [CHANGE] The following features will no longer be active unless GEM is started with access to a valid license:
  - Admin API
  - Ruler S3 auth headers
  - Ruler API to configure remote write rule groups
v0.6.3 – August 20 2020
Links
Upstream Cortex details
- Cortex Hash: 2bda7b94
- Cortex Version: 1.2.1
Changelog
- [CHANGE] Auth: Removed the auth.enable flag and added the auth.type flag with default and enterprise options.
- [FEATURE] Admin API: Add list endpoint for stored licenses.
v0.6.2 – August 04 2020
Links
Upstream Cortex details
- Cortex Hash: 6db67a4efbbf62b1133fa037a95382a21f752bbf
- Cortex Version: 1.2.1
Changelog
- [CHANGE] Ruler: S3 Headers are no longer protected by a license.