This is archived documentation for v1.5.0. Go to the latest version.

Downloads

Grafana Enterprise Metrics downloads

Releases

v1.5.0 – August 24th 2021

  • Binary (Linux AMD64)

    • Download
    • SHA256: e1dbb7640ad49509f22182c4a732b3b9b28c57f860ed2860718d33670fbd4fbe
  • Deb (Linux AMD64)

    • Download
    • SHA256: 549207728b7e023109a375f41b403cc73749344e08e285e238081d6ddafd3bc5
  • RPM (Linux AMD64)

    • Download
    • SHA256: f2116c99cf835f10562a67ae052411c82bd1b3f61470d9f4926f6e243fc35227
  • Docker image: run docker pull grafana/metrics-enterprise:v1.5.0 (digest: sha256:b0d98ffe49df461a524743a49dca26952a59c9c007231035e52f0a06e5003fff)

  • License: Grafana Labs license
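Each download above is published with a SHA256 checksum. As a minimal sketch of how a downloaded artifact could be checked against it, the Python below hashes a file in chunks and compares the result to the published value. The function names and the local file name in the usage note are assumptions for illustration; only the checksum value comes from this page.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, expected_hex):
    """Compare the file's digest to a published checksum, ignoring case and surrounding whitespace."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

For example, assuming the v1.5.0 Linux AMD64 binary was saved under the hypothetical name `metrics-enterprise-linux-amd64`, calling `matches_published_checksum("metrics-enterprise-linux-amd64", "e1dbb7640ad49509f22182c4a732b3b9b28c57f860ed2860718d33670fbd4fbe")` should return `True` only if the file is intact.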

Changelog

  • [CHANGE] Alertmanager: the experimental receivers firewall can now be configured on a per-tenant basis. The following CLI flags (and their respective YAML config options) have been changed and moved to the limits config section:
    • -alertmanager.receivers-firewall.block.cidr-networks renamed to -alertmanager.receivers-firewall-block-cidr-networks
    • -alertmanager.receivers-firewall.block.private-addresses renamed to -alertmanager.receivers-firewall-block-private-addresses
  • [CHANGE] Memberlist: Expose default configuration values to the command line options. Note that setting these explicitly to zero will no longer cause the default to be used. If the default is desired, then do not set the option. The following are affected:
    • -memberlist.stream-timeout
    • -memberlist.retransmit-factor
    • -memberlist.pull-push-interval
    • -memberlist.gossip-interval
    • -memberlist.gossip-nodes
    • -memberlist.gossip-to-dead-nodes-time
    • -memberlist.dead-node-reclaim-time
  • [CHANGE] Authentication: Access Policy names passed via a JWT token in the OIDC auth flow will be lowercased before being matched against Access Policies in GEM. This improves interoperability between GEM and other systems, since GEM only allows lowercase characters in Access Policy names.
  • [CHANGE] Change default value of -server.grpc.keepalive.min-time-between-pings from 5m to 10s and -server.grpc.keepalive.ping-without-stream-allowed to true.
  • [CHANGE] Changed -alertmanager.storage.type default value from configdb to local.
  • [CHANGE] Changed -ruler.storage.type default value from configdb to local.
  • [CHANGE] Cortex chunks storage has been deprecated and it’s now in maintenance mode: all Cortex users are encouraged to migrate to the blocks storage. No new features will be added to the chunks storage. The default Cortex configuration still runs the chunks engine; please check out the blocks storage doc on how to configure Cortex to run with the blocks storage.
  • [CHANGE] Dependency: update go-redis from v8.2.3 to v8.9.0.
  • [CHANGE] Deprecated the bootstrap target in favor of the tokengen target.
  • [CHANGE] Enable strict JSON unmarshal for pkg/util/validation.Limits struct. The custom UnmarshalJSON() will now fail if the input has unknown fields.
  • [CHANGE] Graphite: proxy no longer generates generic metrics metadata. This helps to reduce ingestion rate as counted by Cortex and used for limits.
  • [CHANGE] Ingester: Change default value of -ingester.active-series-metrics-enabled to true. This incurs a small increase in memory usage, between 1.2% and 1.6% as measured on ingesters with 1.3M active series.
  • [CHANGE] License: Flag -bootstrap.license.path has been deprecated in favor of -license.path.
  • [CHANGE] Memberlist: the memberlist_kv_store_value_bytes has been removed due to values no longer being stored in-memory as encoded bytes.
  • [CHANGE] Querier / ruler: Change -querier.max-fetched-chunks-per-query configuration to limit the maximum number of chunks that can be fetched in a single query. The number of chunks fetched by ingesters AND long-term storage combined should not exceed the value configured on -querier.max-fetched-chunks-per-query.
  • [CHANGE] Querier / ruler: deprecated -store.query-chunk-limit CLI flag (and its respective YAML config option max_chunks_per_query) in favour of -querier.max-fetched-chunks-per-query (and its respective YAML config option max_fetched_chunks_per_query). The new limit specifies the maximum number of chunks that can be fetched in a single query from ingesters and long-term storage: the total number of actual fetched chunks could be 2x the limit, being independently applied when querying ingesters and long-term storage.
  • [CHANGE] Query-frontend: Enable query stats by default. They can still be disabled with -frontend.query-stats-enabled=false.
  • [CHANGE] Removed configdb support from Ruler and Alertmanager backend storages.
  • [CHANGE] Removed log_messages_total metric.
  • [CHANGE] Removed query sharding for the chunks storage. Query sharding is now only supported for blocks storage.
  • [CHANGE] Renamed metric deprecated_flags_inuse_total as deprecated_flags_used_total.
  • [CHANGE] Renamed metric experimental_features_in_use_total as experimental_features_used_total.
  • [CHANGE] Some files and directories on local disk now have stricter permissions, and are only readable by owner, but not group or others.
  • [CHANGE] The example Kubernetes manifests (stored at k8s/) have been removed due to a lack of proper support and maintenance.
  • [CHANGE] Update Go version to 1.16.6.
  • [FEATURE] Added flag -debug.block-profile-rate to enable goroutine blocking events profiling.
  • [FEATURE] Alertmanager: Added -alertmanager.max-config-size-bytes limit to control size of configuration files that Cortex users can upload to Alertmanager via API. This limit is configurable per-tenant.
  • [FEATURE] Alertmanager: Added -alertmanager.max-templates-count and -alertmanager.max-template-size-bytes options to control number and size of templates uploaded to Alertmanager via API. These limits are configurable per-tenant.
  • [FEATURE] Alertmanager: Added rate limits to notifiers. Rate limits used by all integrations can be configured using -alertmanager.notification-rate-limit, while per-integration rate limits can be specified via the -alertmanager.notification-rate-limit-per-integration parameter. Both shared and per-integration limits can be overridden using the overrides mechanism. These limits are applied on individual (per-tenant) alertmanagers. Rate-limited notifications are counted as failed notifications. It is possible to monitor rate-limited notifications via the new cortex_alertmanager_notification_rate_limited_total metric.
  • [FEATURE] Alertmanager: support negative matchers, time-based muting - upstream release notes.
  • [FEATURE] Allow for reporting CPU time usage to AWS Marketplace metering service in case GEM is running as AWS Marketplace container product.
  • [FEATURE] Collect and store CPU time usage reports in the Admin store, which can later be used to submit to metering services, such as the AWS Marketplace API.
  • [FEATURE] Querier/Ruler: Added new -querier.max-fetched-chunk-bytes-per-query flag. When Cortex is running with blocks storage, the max chunk bytes limit is enforced in the querier and ruler and limits the size of all aggregated chunks returned from ingesters and storage as bytes for a query.
  • [FEATURE] Querier: Added new -querier.max-fetched-series-per-query flag. When Cortex is running with blocks storage, the max series per query limit is enforced in the querier and applies to unique series received from ingesters and store-gateway (long-term storage).
  • [FEATURE] Query Frontend: Add cortex_query_fetched_chunks_total per-user counter to expose the number of chunks fetched as part of queries. This metric can be enabled with the -frontend.query-stats-enabled flag (or its respective YAML config option query_stats_enabled).
  • [FEATURE] Query Frontend: Add cortex_query_fetched_series_total and cortex_query_fetched_chunks_bytes_total per-user counters to expose the number of series and bytes fetched as part of queries. These metrics can be enabled with the -frontend.query-stats-enabled flag (or its respective YAML config option query_stats_enabled).
  • [FEATURE] Query Frontend: Add experimental query sharding for the blocks storage. You can now enable query sharding for blocks storage (-store.engine) by setting -querier.parallelise-shardable-queries to true.
  • [FEATURE] Ruler Storage: S3 header extensions were added to the new ruler storage S3 config block.
  • [FEATURE] Ruler: Add new -ruler.query-stats-enabled which when enabled will report the cortex_ruler_query_seconds_total as a per-user metric that tracks the sum of the wall time of executing queries in the ruler in seconds.
  • [FEATURE] When GEM runs as an AWS Marketplace container product and Kubernetes CPU resource limits are set, the Go runtime variable GOMAXPROCS is automatically set to match the container CPU quota.
  • [FEATURE] Alertmanager: The experimental sharding feature is now considered complete. Detailed information about the configuration options can be found here for alertmanager and here for the alertmanager storage. To use the feature:
    • Ensure that a remote storage backend is configured for Alertmanager to store state using -alertmanager-storage.backend, and flags related to the backend. Note that the local and configdb storage backends are not supported.
    • Ensure that a ring store is configured using -alertmanager.sharding-ring.store, and set the flags relevant to the chosen store type.
    • Enable the feature using -alertmanager.sharding-enabled.
    • Note the prior addition of a new configuration option -alertmanager.persist-interval. This sets the interval between persisting the current alertmanager state (notification log and silences) to object storage. See the configuration file reference for more information.
  • [ENHANCEMENT] Add Cassandra support.
  • [ENHANCEMENT] Add timeout for waiting on compactor to become ACTIVE in the ring.
  • [ENHANCEMENT] Added tenant_ids tag to tracing spans.
  • [ENHANCEMENT] Added option -distributor.excluded-zones to exclude ingesters running in specific zones both on write and read path.
  • [ENHANCEMENT] Added zone-awareness support to alertmanager for use when sharding is enabled. When zone-awareness is enabled, alerts will be replicated across availability zones.
  • [ENHANCEMENT] Admin-API: Add a new endpoint for returning product and feature information at /admin/api/v1/features
  • [ENHANCEMENT] Admin-API: Allow admin-api to operate for read-only request when no license is present.
  • [ENHANCEMENT] Alertmanager: Added -alertmanager.max-alerts-count and -alertmanager.max-alerts-size-bytes to control the max number of alerts and total size of alerts that a single user can have in Alertmanager’s memory. Adding more alerts fails with a log message and increments the cortex_alertmanager_alerts_insert_limited_total metric (per-user). These limits can be overridden by using per-tenant overrides. Current values are tracked in the cortex_alertmanager_alerts_limiter_current_alerts and cortex_alertmanager_alerts_limiter_current_alerts_size_bytes metrics.
  • [ENHANCEMENT] Alertmanager: Added -alertmanager.max-dispatcher-aggregation-groups option to control max number of active dispatcher groups in Alertmanager (per tenant, also overrideable). When the limit is reached, Dispatcher produces log message and increases cortex_alertmanager_dispatcher_aggregation_group_limit_reached_total metric.
  • [ENHANCEMENT] Alertmanager: Cleanup persisted state objects from remote storage when a tenant configuration is deleted.
  • [ENHANCEMENT] Authentication: OIDC integration now supports a JWT with multiple roles. When present, these roles will be rolled up into a “virtual” access policy that provides metrics read access to the union of instances contained in those roles.
  • [ENHANCEMENT] Blocks storage: support ingesting exemplars and querying of exemplars. Enabled by setting new CLI flag -blocks-storage.tsdb.max-exemplars=<n> or config option blocks_storage.tsdb.max_exemplars to positive value.
  • [ENHANCEMENT] Distributor: Added distributors ring status section in the admin page.
  • [ENHANCEMENT] Etcd: Added username and password to etcd config.
  • [ENHANCEMENT] Expose CPU quota information (number of cores, cgroup quota) as Prometheus metrics.
  • [ENHANCEMENT] Expose error counters and timestamps of CPU usage reporting as Prometheus metrics when AWS Marketplace metering is enabled.
  • [ENHANCEMENT] Expose value of GOMAXPROCS as a Prometheus metric.
  • [ENHANCEMENT] Facilitate running GEM Docker image as a non-root user. Usage is documented in the Kubernetes deployment documentation.
  • [ENHANCEMENT] Ingester: Added option -ingester.ignore-series-limit-for-metric-names with comma-separated list of metric names that will be ignored in max series per metric limit.
  • [ENHANCEMENT] Ingester: added option -ingester.readiness-check-ring-health to disable the ring health check in the readiness endpoint.
  • [ENHANCEMENT] License: Added flag -license.type that is used to specify that the APP is running through AWS Marketplace.
  • [ENHANCEMENT] License: Implemented /licenses endpoint that responds with static list of licenses that replaces default implementation if the APP is running through AWS Marketplace.
  • [ENHANCEMENT] License: Implemented logic to check if AWS Marketplace subscription is active instead of checking license file if the APP is running through AWS Marketplace.
  • [ENHANCEMENT] Memberlist: expose configuration of memberlist packet compression via -memberlist.compression=enabled.
  • [ENHANCEMENT] Memberlist: optimized receive path for processing ring state updates, to help reduce CPU utilization in large clusters.
  • [ENHANCEMENT] Node-API: Added TSDB block metadata to the exportable debug archive.
  • [ENHANCEMENT] Node-API: Register a new endpoint for fetching a compressed debug file containing config and version information at /node/api/v1/debug-export.
  • [ENHANCEMENT] Node-API: Register a new endpoint for fetching version information about the nodes at /node/api/v1/version.
  • [ENHANCEMENT] Querier now can use the LabelNames call with matchers, if matchers are provided in the /labels API call, instead of using the more expensive MetricsForLabelMatchers call as before. This can be enabled by enabling the -querier.query-label-names-with-matchers-enabled flag once the ingesters are updated to this version. In the future this is expected to become the default behavior.
  • [ENHANCEMENT] Reduce memory used by streaming queries, particularly in ruler.
  • [ENHANCEMENT] Ring, query-frontend: Avoid using automatic private IPs (APIPA) when discovering the IP address from the interface during the registration of the instance in the ring, or by query-frontend when used with query-scheduler. APIPA addresses are still used as a last resort, with a log message indicating their usage.
  • [ENHANCEMENT] Ruler: added rule_group label to metrics cortex_prometheus_rule_group_iterations_total and cortex_prometheus_rule_group_iterations_missed_total.
  • [ENHANCEMENT] Scanner: add support for DynamoDB (v9 schema only).
  • [ENHANCEMENT] Scanner: retry failed uploads.
  • [ENHANCEMENT] Storage: Added the ability to disable Open Census within GCS client (e.g -gcs.enable-opencensus=false).
  • [ENHANCEMENT] Store-gateway: added -store-gateway.sharding-ring.wait-stability-min-duration and -store-gateway.sharding-ring.wait-stability-max-duration support to store-gateway, to wait for ring stability at startup.
  • [ENHANCEMENT] Wildcard Datasource: Wildcard “*” datasources are now supported in datasource urls for GEM. This allows an action to have access to all instances in all access policies associated with the provided token. If that set of instances includes a wildcard “*”, then access is expanded to all instances in the cluster.
  • [ENHANCEMENT] Added instrumentation to Redis client, with the following metrics:
    • cortex_rediscache_request_duration_seconds
  • [ENHANCEMENT] Include additional limits in the per-tenant override exporter. The following limits have been added to the cortex_overrides metric:
    • max_fetched_series_per_query
    • max_fetched_chunk_bytes_per_query
    • ruler_max_rules_per_rule_group
    • ruler_max_rule_groups_per_tenant
  • [ENHANCEMENT] License Manager: Added functionality to regularly check the local license file and sync it to the license storage backend.
    • Added metrics grafana_labs_license_syncs_total and grafana_labs_license_sync_failures_total.
  • [ENHANCEMENT] Ring: allow experimental configuration of disabling of heartbeat timeouts by setting the relevant configuration value to zero. Applies to the following:
    • -distributor.ring.heartbeat-timeout
    • -ring.heartbeat-timeout
    • -ruler.ring.heartbeat-timeout
    • -alertmanager.sharding-ring.heartbeat-timeout
    • -compactor.ring.heartbeat-timeout
    • -store-gateway.sharding-ring.heartbeat-timeout
  • [ENHANCEMENT] Ring: allow heartbeats to be explicitly disabled by setting the interval to zero. This is considered experimental. This applies to the following configuration options:
    • -distributor.ring.heartbeat-period
    • -ingester.heartbeat-period
    • -ruler.ring.heartbeat-period
    • -alertmanager.sharding-ring.heartbeat-period
    • -compactor.ring.heartbeat-period
    • -store-gateway.sharding-ring.heartbeat-period
  • [ENHANCEMENT] Alertmanager: introduced new metrics to monitor operation when using -alertmanager.sharding-enabled:
    • cortex_alertmanager_state_fetch_replica_state_total
    • cortex_alertmanager_state_fetch_replica_state_failed_total
    • cortex_alertmanager_state_initial_sync_total
    • cortex_alertmanager_state_initial_sync_completed_total
    • cortex_alertmanager_state_initial_sync_duration_seconds
    • cortex_alertmanager_state_persist_total
    • cortex_alertmanager_state_persist_failed_total
  • [ENHANCEMENT] Memberlist: introduced new metrics to aid troubleshooting tombstone convergence:
    • memberlist_client_kv_store_value_tombstones
    • memberlist_client_kv_store_value_tombstones_removed_total
    • memberlist_client_messages_to_broadcast_dropped_total
  • [ENHANCEMENT] Ruler: added new metrics for tracking total number of queries and push requests sent to ingester, as well as failed queries and push requests. Failures are only counted for internal errors, but not user-errors like limits or invalid query. This is in contrast to existing cortex_prometheus_rule_evaluation_failures_total, which is incremented also when query or samples appending fails due to user-errors.
    • cortex_ruler_write_requests_total
    • cortex_ruler_write_requests_failed_total
    • cortex_ruler_queries_total
    • cortex_ruler_queries_failed_total
  • [BUGFIX] Alertmanager: fix Alertmanager status page if clustering via gossip is disabled or sharding is enabled.
  • [BUGFIX] Authentication: fix handling of missing instances, or when instance has no matching access policy, by properly returning a 401 instead of crashing.
  • [BUGFIX] Compactor: fixed panic while collecting Prometheus metrics.
  • [BUGFIX] Graphite: Apply the max-points-per-req-hard limit correctly.
  • [BUGFIX] Graphite: Fix race in index.json API endpoint which led to incomplete results.
  • [BUGFIX] HA Tracker: when cleaning up obsolete elected replicas from KV store, tracker didn’t update the number of clusters per user correctly.
  • [BUGFIX] Ingester: fix issue where runtime limits erroneously override default limits.
  • [BUGFIX] Ingester: fixed infrequent panic caused by a race condition between TSDB mmap-ed head chunks truncation and queries.
  • [BUGFIX] Ingester: fixed ingester stuck on start up (LEAVING ring state) when -ingester.heartbeat-period=0 and -ingester.unregister-on-shutdown=false.
  • [BUGFIX] Invalidate cached authentication tokens when they are deleted from object storage.
  • [BUGFIX] Make multiple Get requests instead of MGet on Redis Cluster.
  • [BUGFIX] Memberlist: fix to setting the default configuration value for -memberlist.retransmit-factor when not provided. This should improve propagation delay of the ring state (including, but not limited to, tombstones). Note that if the configuration is already explicitly given, this fix has no effect.
  • [BUGFIX] Purger: fix Invalid null value in condition for column range caused by nil value in range for WriteBatch query.
  • [BUGFIX] Querier: Fix issue where samples in a chunk might get skipped by batch iterator.
  • [BUGFIX] Querier: fix queries failing with “at least 1 healthy replica required, could only find 0” error right after scaling up store-gateways until they’re ACTIVE in the ring.
  • [BUGFIX] Query-frontend: Fix 401s during query_range requests when enterprise authentication is used. The workaround involving disabling enterprise authentication on the querier can now be removed.
  • [BUGFIX] Ruler: Fix bug in rule forwarding with remote write which could cause filling up the disk because it was not truncated.
    • New flags called -ruler.remote-write.wal-truncate-frequency, -ruler.remote-write.min-wal-time and -ruler.remote-write.max-wal-time have been added.
  • [BUGFIX] Ruler: Honor the evaluation delay for the ALERTS and ALERTS_FOR_STATE series.
  • [BUGFIX] Ruler: fix /ruler/rule_groups endpoint not working when used with object store.
  • [BUGFIX] Ruler: fix startup in single-binary mode when the new ruler_storage is used.
  • [BUGFIX] Ruler: fixed counting of PromQL evaluation errors as user-errors when updating cortex_ruler_queries_failed_total.
  • [BUGFIX] Store-gateway: when blocks sharding is enabled, do not load all blocks in each store-gateway in case of a cold startup, but load only blocks owned by the store-gateway replica.
  • [BUGFIX] Upgrade Prometheus. TSDB now waits for pending readers before truncating Head block, fixing the chunk not found error and preventing wrong query results.
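The two -querier.max-fetched-chunks-per-query entries in this changelog describe two ways of counting chunks: one applies the limit independently to ingesters and long-term storage (so up to twice the limit can be fetched in total), the other applies it to the combined number. The Python sketch below only models that arithmetic as worded in the changelog; the function names are hypothetical and this is not GEM's implementation.

```python
def within_independent_limit(ingester_chunks, storage_chunks, limit):
    """Limit checked separately per source; the combined total may reach 2 * limit."""
    return ingester_chunks <= limit and storage_chunks <= limit


def within_combined_limit(ingester_chunks, storage_chunks, limit):
    """Limit checked against the sum of chunks fetched from both sources."""
    return ingester_chunks + storage_chunks <= limit


limit = 1000
# 800 chunks from ingesters plus 800 from long-term storage: 1600 in total.
print(within_independent_limit(800, 800, limit))  # True
print(within_combined_limit(800, 800, limit))     # False
```

Under the combined interpretation, a query that previously passed with 800 chunks from each source would now be rejected, which may matter when carrying over a limit value from an older configuration.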

v1.4.2 – Jul 21st 2021

  • Binary (Linux AMD64)

    • Download
    • SHA256: 69682495d5995e04616894294b5af8661a03155a01a99beff93e0ea9b36a5007
  • Deb (Linux AMD64)

    • Download
    • SHA256: f6f09f334d0b577245309af2ec3429febc11ac3d196a5f0b5f2cd391a4147cd6
  • RPM (Linux AMD64)

    • Download
    • SHA256: 04e9062bafd0298d3402d9051bafe54cb6871ab28de1df7101c505e7d631a4af
  • Docker image: run docker pull grafana/metrics-enterprise:v1.4.2 (digest: sha256:385b563669a5ba4a459f833a2c356884b757de719e43369ead0c5dc59cb11d94)

  • License: Grafana Labs license

Changelog

  • [SECURITY] Prevent path traversal attack from users able to control the HTTP header X-Scope-OrgID. (CVE-2021-36157)
    • Users only have control of the HTTP header when GEM is configured with flags -auth.type=default and -tenant-federation.enabled=false
  • [SECURITY] Update build image to use Go 1.16.6. (CVE-2021-34558) #1874
  • [BUGFIX] Ruler: Register remote write metrics correctly. #1814
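Path traversal through a tenant ID generally depends on path separators or `..` segments reaching a storage path. As a rough illustration only, and not a description of the actual GEM fix, a hypothetical validator could reject such values before the X-Scope-OrgID header value is used as a path segment:

```python
def is_safe_tenant_id(tenant_id):
    """Return True if `tenant_id` cannot change directory when used as a single path segment."""
    # Empty, current-directory and parent-directory segments are never acceptable.
    if tenant_id in ("", ".", ".."):
        return False
    # Reject POSIX and Windows path separators as well as NUL bytes.
    return not any(ch in tenant_id for ch in ("/", "\\", "\x00"))
```

With this sketch, `is_safe_tenant_id("team-a")` is accepted while `is_safe_tenant_id("../other-tenant")` is rejected. The exact set of characters a real deployment should allow is an assumption here.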

Upstream Cortex details

  • Cortex Hash: 2210ebb7052a9efb99d0e4dc53043a3f5d806d00

v1.4.1 – June 29th 2021

  • Binary (Linux AMD64)

    • Download
    • SHA256: e1dd56442d1d2fd8cdf224938207fda4845eeb3f610e5e11a920e23de43adb5a
  • Deb (Linux AMD64)

    • Download
    • SHA256: 664140413d7d47e4a37d9aa435b4ebe9607ed47cdfae4e0631d02cd209f63076
  • RPM (Linux AMD64)

    • Download
    • SHA256: 4bb9c8e17819a63a7e7bc38a4366e4047f7315cfd680524524529a91fb45d9c2
  • Docker image: run docker pull grafana/metrics-enterprise:v1.4.1 (digest: sha256:d1d17bfe2ec984b093b9da1ab8cdea1f764f24f16b38557d719254c4e64c9f9a)

  • License: Grafana Labs license

Changelog

  • [BUGFIX] Update the GEM build image to use Alpine 3.14, python 3.9 and gsutil 4.52.

Upstream Cortex details

  • Cortex Hash: 98dd0c4d69576fdfaf2b9bfd7aa475e835e11429

v1.4.0 – June 28th 2021

  • Binary (Linux AMD64)

    • Download
    • SHA256: 9237645e6e2c046d46035c64c74cd5b146312b19cfc30d684b058a67d89c9f13
  • Deb (Linux AMD64)

    • Download
    • SHA256: de8e197e0ca8420cfe296fee2ba37891e72e7396afcf54a26f91cccafc146b9b
  • RPM (Linux AMD64)

    • Download
    • SHA256: a11f2eb10d5ba375a2480e85f94d1c82d63f142858349562d11b99321a40a8c6
  • Docker image: run docker pull grafana/metrics-enterprise:v1.4.0 (digest: sha256:ff38e0544d805bfd1450a1f033ed79585252a4444d247e0e4c649625619215ab)

  • License: Grafana Labs license

Changelog

  • [CHANGE] Breaking: Verify token issuer when using OIDC authentication. Includes a breaking change for users of OIDC authentication. #1571
    • Before this change the configuration of OIDC authentication required the OIDC provider’s jwks_uri to be set in the configuration flag auth.admin.oidc.url. This flag has been deprecated.
    • A new flag named auth.admin.oidc.issuer-url has been added, and it must be set to the URL of the OIDC provider (for example, -auth.admin.oidc.issuer-url=https://accounts.google.com). Note: This is not simply a rename of the old flag; you also need to update the value. The defined issuer is required to provide the OIDC discovery endpoint (/.well-known/openid-configuration).
  • [CHANGE] Breaking: The GEM/GEL Ruler can now be accessed by access policies with rules read/write permissions, which are no longer metrics/logs specific #1366 & #1403
    • Before this change, there were metric rule specific permissions metrics:rules:read and metrics:rules:write.
    • The data representation for this change in object storage is backwards compatible, so no change is needed for existing access policies using the new rules.
    • The JSON representation for these rules is not backwards compatible, and so any JSON interactions with the API that specified the strings metrics:rules:read or metrics:rules:write must be updated to the strings rules:read and rules:write respectively.
    • This breaking change applies to the GEM Plugin as well, so please update to version v3.0.X.
  • [CHANGE] Remove enterprise_features config block entirely. #1453
  • [CHANGE] Alertmanager: deprecated -alertmanager.storage.* CLI flags (and their respective YAML config options) in favour of -alertmanager-storage.*. This change doesn’t apply to alertmanager.storage.path and alertmanager.storage.retention.
  • [CHANGE] Blocks storage: removed the config option -blocks-storage.bucket-store.index-cache.postings-compression-enabled, which was deprecated. Postings compression is always enabled.
  • [CHANGE] GEM now fails fast on startup if it is unable to connect to the ring backend.
  • [CHANGE] Querier / ruler: deprecated -store.query-chunk-limit CLI flag (and its respective YAML config option max_chunks_per_query) in favor of -querier.max-fetched-chunks-per-query (and its respective YAML configuration option max_fetched_chunks_per_query). The new limit specifies the maximum number of chunks that can be fetched in a single query from ingesters and long-term storage: the total number of chunks that are actually fetched, in the worst case, can be twice the limit because the limit is applied to ingesters as well as long-term storage.
  • [CHANGE] Query frontend: removed the configuration option -querier.compress-http-responses, which was deprecated. Instead, use -api.response-compression-enabled.
  • [CHANGE] Runtime-config / overrides: removed the config options -limits.per-user-override-config (use -runtime-config.file) and -limits.per-user-override-period (use -runtime-config.reload-period), both deprecated.
  • [FEATURE] Add embedded recording rules to the Enterprise Ruler to support building dashboards and alerts from internal metrics written directly to GEM itself via a distributor. #1459
    • To enable or disable the feature, use the -instrumentation.enabled flag or associated enabled setting on the instrumentation configuration block. The feature is disabled by default.
  • [FEATURE] Add the ability to write internal metrics directly to GEM itself via a distributor. #1281
    • To configure, enable, or disable the feature, use the -instrumentation.enabled flag and its associated flags, or the instrumentation configuration block:
      instrumentation:
        enabled: false
        flush_period: 15s
        write_timeout: 10s
        distributor_client:
          address: dns:///:9095
          connect_timeout: 5s
          tls_enabled: false
          tls_cert_path:
          tls_key_path:
          tls_ca_path:
          tls_server_name:
          tls_insecure_skip_verify:
      
      The feature is disabled by default.
  • [FEATURE] Self-monitoring: expose filesystem usage metrics to source the disk utilization panel in the self-monitoring resource dashboards #1618
  • [FEATURE] Add an experimental GEM component federation-frontend, which can be used to federate queries between multiple GEM clusters. #1274
  • [FEATURE] Querier: Added new -querier.max-fetched-series-per-query flag. When GEM is running with blocks storage, the max series per query limit is enforced in the querier and applies to unique series received from ingesters and store-gateway (long-term storage).
  • [FEATURE] Querier/Ruler: Added new -querier.max-fetched-chunk-bytes-per-query flag. When GEM is running with blocks storage, the max chunk bytes limit is enforced in the querier and ruler and limits the size of all aggregated chunks returned from ingesters and storage as bytes for a query.
  • [ENHANCEMENT] Introduce configuration parameter to limit how many points we process per query. #1292
  • [ENHANCEMENT] Adding API endpoints via which a user can post / get their storage schemas / aggregations. #1389
  • [ENHANCEMENT] Admin-API: Listing mutable resources now includes a comma separated list of versions for those resources in the ETag header #1419
  • [ENHANCEMENT] Admin-API: Updating a mutable resource now allows a wildcard value ("*") to be passed as the If-Match header, which allows the updating of any current version #1449
  • [ENHANCEMENT] The /config HTTP endpoint now also returns GEM specific options alongside regular Cortex configuration. #1380
  • [BUGFIX] Fix LBAC regular expression matchers #1305
  • [BUGFIX] Validate all fields of JWT tokens used for auth, except the issuer. #1500
  • [BUGFIX] Ruler: ensure the S3 rule storage flags properly maps to the upstream flags. #1460
  • [BUGFIX] Admin-API: rejecting update requests when access policies have empty scopes or realms. #1447
  • [BUGFIX] Updated licenses are now persisted to object storage, fixing the responses from the license API which would show old license information. #1568
  • [BUGFIX] OAuth: Don’t use default access policy when an invalid JWT claim is provided. #1635
  • [BUGFIX] Authentication: Invalidate cached authentication tokens when they are deleted from object storage. #1703
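The rules permission change above means that JSON payloads containing metrics:rules:read or metrics:rules:write need to be updated to rules:read and rules:write. A minimal Python sketch of such a rewrite follows; the two scope names come from the changelog, while the function name and the assumption that the policy JSON carries a top-level `scopes` list are illustrative only.

```python
import json

# Mapping taken from the changelog entry; everything else in this sketch is illustrative.
RENAMED_SCOPES = {
    "metrics:rules:read": "rules:read",
    "metrics:rules:write": "rules:write",
}


def migrate_scopes(policy_json):
    """Return the policy JSON with the old rule scopes replaced by the new names."""
    policy = json.loads(policy_json)
    # Assumes a top-level "scopes" list (hypothetical shape); other fields are left untouched.
    policy["scopes"] = [RENAMED_SCOPES.get(s, s) for s in policy.get("scopes", [])]
    return json.dumps(policy)
```

For example, a payload with scopes `["metrics:read", "metrics:rules:read"]` would come back with `["metrics:read", "rules:read"]`; scopes that are not part of the rename pass through unchanged.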

Upstream Cortex details

  • Cortex Hash: 98dd0c4d69576fdfaf2b9bfd7aa475e835e11429
  • Cortex Commits

v1.3.1 – Jul 21st 2021

  • Binary (Linux AMD64)

    • Download
    • SHA256: 6592ffe2258a44b008c03abe4e52645c7a612bfb7f3d1f5dead44dbc7929904a
  • Deb (Linux AMD64)

    • Download
    • SHA256: d292cb0de1a4ef05b7ffd5c7faa3d9647c91a189cab5daee6362e8f931338be7
  • RPM (Linux AMD64)

    • Download
    • SHA256: 074e9cda3c4c3f74ecf5b45ddcd82c3fc2adc83f93afaaa1f9735eba1854373a
  • Docker image: run docker pull grafana/metrics-enterprise:v1.3.1 (digest: sha256:e03a7ae061d5f617490812a6f45c6362fdc9ef79010555a207ebee2174ef9b23)

  • License: Grafana Labs license

Changelog

  • [SECURITY] Prevent path traversal attack from users able to control the HTTP header X-Scope-OrgID. (CVE-2021-36157)
    • Users only have control of the HTTP header when GEM is configured with flags -auth.type=default and -tenant-federation.enabled=false
  • [SECURITY] Update build image to use Go 1.16.6. (CVE-2021-34558) #1874
  • [BUGFIX] Update the GEM build image to use Alpine 3.14, python 3.9 and gsutil 4.52. #1781
  • [BUGFIX] Ruler: Register remote write metrics correctly. #1814

Upstream Cortex details

  • Cortex Hash: 64592254fe91c86e903882947a58d572a316884d

v1.3.0 – April 26th 2021

  • Binary (Linux AMD64)

    • Download
    • SHA256: 478528db0a22918eeafb1b6f93387d28d0ae6163dd592771a9e9d9f302c3a40d
  • Deb (Linux AMD64)

    • Download
    • SHA256: cfe0ebe4928a4cff007c1f8c86eebc8c73484bfa763c9679c2d3dad7a4c51388
  • RPM (Linux AMD64)

    • Download
    • SHA256: 7278036942905d3341c3e3a9aaadad30dda9fc18e5e3ad86ee092d4e03e77d72
  • Docker image: run docker pull grafana/metrics-enterprise:v1.3.0

  • License: Grafana Labs license

Changelog

  • [SECURITY] Alertmanager: Fix a local file disclosure vulnerability when -experimental.alertmanager.enable-api is used (CVE-2021-31231):
    • The HTTP Basic auth password_file can be used as an attack vector to send any file content via a webhook.
    • The Alertmanager templates can be used as an attack vector to send any file content because the Alertmanager can load any text file specified in the templates list.
  • [CHANGE] Admin API: Concurrent requests to the same resource are no longer allowed. If two requests are issued to create, update, or delete the same resource, then the first one to acquire a lock executes and the second one returns a conflict error. This is handled per process. To enforce this behavior on multiple processes, use leader election. #1186
  • [CHANGE] Admin API: all errors encountered during the processing of HTTP requests are converted to gRPC errors in order to determine the correct HTTP status to return. This enforces consistency for leader election, because some requests are handled internally, and others are forwarded to other instances. #1217
  • [CHANGE] Admin API: all mutation operations (PUT/DELETE) now require an If-Match header containing the resource version, an integer enclosed in double quotes such as "27". The header verifies that the correct version of the resource is being modified and prevents race conditions. You can find the current version of a resource in the ETag header that is returned when that resource is read (via GET) or updated (via PUT).
  • [FEATURE] Admin API: you can set per-instance resource limits via the Admin API. This is enabled by default. #1173
    • You can enable or disable this feature with the -admin-api.limits.enabled flag and set the refresh period with the -admin-api.limits.refresh-period flag. You can also configure this feature by using the admin_api configuration block:
      admin_api:
        limits:
          enabled: true
          refresh_period: 1m
      
  • [ENHANCEMENT] Upgrade build image to use Go 1.16.3. #1294
  • [ENHANCEMENT] Admin client: Add cortex_admin_client_is_leader gauge metric to determine when the client considers itself the leader. #1175
  • [ENHANCEMENT] Admin API: update an access policy via the Admin API using a PUT request. #1139
  • [ENHANCEMENT] Admin API: Update an instance via the Admin API using a PUT request. #1180
  • [ENHANCEMENT] Gateway: Forward /multitenant_alertmanager/ring and /ruler/ring routes to the alertmanager and ruler proxy backends. #1144
  • [BUGFIX] Graphite: Fix aggregation cache to generate cache keys using correct input data. #963
  • [BUGFIX] Authentication: Fix issue where all requests would trigger a panic if authentication is enabled but no admin client is configured. An error is now printed instead. #1106
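
The If-Match requirement for Admin API mutations amounts to echoing back the version last seen in the ETag header. The following Python sketch shows only the header handling, assuming the ETag value has already been read from a GET or PUT response. The function name is hypothetical, and no Admin API URL or client library is implied.

```python
import re

# The changelog describes the version as an integer in double quotes, e.g. "27".
_QUOTED_INTEGER = re.compile(r'^"[0-9]+"$')


def if_match_headers(etag: str) -> dict:
    """Build the headers for a PUT/DELETE from a previously returned ETag value."""
    if not _QUOTED_INTEGER.match(etag):
        raise ValueError(f"expected a quoted integer version, got {etag!r}")
    # Send the version back unchanged so the server can detect stale writes.
    return {"If-Match": etag}
```

If another writer has updated the resource in the meantime, the version sent no longer matches, and the caller would need to re-read the resource to obtain the new ETag before retrying.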

Upstream Cortex details

  • Cortex Hash: 2d8477c4a325ce5071676e906efcee4adb687513
  • Cortex Commits

v1.2.1 – April 27 2021

  • Binary (Linux AMD64)

    • Download
    • SHA256: c00f80ceb5994542ec0527e9d1a6a481dbb472c8fdb0318b12142a59b6b32ec4
  • Deb (Linux AMD64)

    • Download
    • SHA256: 741477bbf0d1d4191e413b4f0db96098920df37e27f9a5598b994a6791b0aef3
  • RPM (Linux AMD64)

    • Download
    • SHA256: 28ce6fe43f93bd158d415e03b2ce8bbdf01e0fde1e699f3486b359167d8efb5f
  • Docker image: run docker pull grafana/metrics-enterprise:v1.2.1

  • License: Grafana Labs license

Changelog

  • [SECURITY] Alertmanager: Fix a local file disclosure vulnerability when -experimental.alertmanager.enable-api is used (CVE-2021-31231):
    • The HTTP Basic auth password_file can be used as an attack vector to send any file content via a webhook.
    • The Alertmanager templates can be used as an attack vector to send any file content because the Alertmanager can load any text file specified in the templates list.

v1.2.0 – March 10 2021

  • Binary (Linux AMD64)

    • Download
    • SHA256: 702208cb7b440b44a30a7ba9bbe34e7a1bbd19a632435a92cdd608cb232593c8
  • Deb (Linux AMD64)

    • Download
    • SHA256: a3e5140bf38f6479693608bbaf9bdcb7824795c1a662a70007905310cf35f862
  • RPM (Linux AMD64)

    • Download
    • SHA256: 2de19fd38ed129bc66ee4f76804bd2633c996afaa43900a306cb536d672e8909
  • Docker image: run docker pull grafana/metrics-enterprise:v1.2.0

  • License: Grafana Labs license

Changelog

  • [CHANGE] Gateway: Remove purger proxy configuration, which is not a supported target for blocks clusters.
  • [CHANGE] Auth: Override authentication flags have been renamed:
    • The auth.override-admin-token flag has been changed to auth.override.token.
    • The auth.override-admin-token-file flag has been changed to auth.override.token-file.
  • [FEATURE] Gateway: Improve the gateway target to support unique TLS configurations and write timeouts for each backend.
    • New fields have been added to allow for configuration:
      gateway:
        proxy:
          default:
            tls:
              tls_cert_path: <string>
              tls_key_path: <string>
              tls_ca_path: <string>
              tls_insecure_skip_verify: <bool>
          distributor:
            read_timeout: <duration>
            write_timeout: <duration>
            tls:
            ...
      
  • [FEATURE] Compactor: Introduced time-sharding compaction strategy.
  • [ENHANCEMENT] Distributor: Wrap remote writes in distributor to sample and log them as business intelligence events.
  • [ENHANCEMENT] Metrics emitted for TLS certificate expiration now reflect certificates being reloaded.
  • [ENHANCEMENT] Remove the Graphite Auto Complete Index and use Cortex index instead.
  • [ENHANCEMENT] Add Graphite API endpoint /metrics/index.json.
  • [ENHANCEMENT] Call Cortex Distributor over gRPC from Graphite Write Proxy (formerly Graphite Distributor)
  • [ENHANCEMENT] Admin API: Add feature to elect an admin-api leader instance to handle all mutation requests. Requests to non-leader instances are forwarded to the leader instance.
    • New fields have been added to allow for configuration:
      admin_api:
        leader_election:
          enabled: <bool>
          ring:
            kvstore: <kv.Config>
            heartbeat_period: <duration>
            heartbeat_timeout: <duration>
            tokens_observe_period: <duration>
            instance_interface_name: <[]string>
          client_config: <grpcclient.Config>
    
  • [BUGFIX] LBAC: Fix issue where debug logs would not print the selector and instead print selector="unsupported value type".
  • [BUGFIX] Admin-Client: Warning logs are no longer created on resource creation.
  • [BUGFIX] Ruler: Fix issue where invalid remote-write URLs cause a panic.
  • [BUGFIX] Querier: Apply label access filters on multi tenant access policies.

Upstream Cortex details

v1.1.3 – April 27 2021

  • Binary (Linux AMD64)

    • Download
    • SHA256: 0c2a549552ac2cf406837df4d6823a88bb5089f84d175a5b16d2710dd0ce7f3a
  • Deb (Linux AMD64)

    • Download
    • SHA256: 1814df03f6573deaefbc87de75777873d9d6f724efce74a59e0cae06734c69fb
  • RPM (Linux AMD64)

    • Download
    • SHA256: b2a0fb67aed10a46d1e4e5f3e5db6e77b4e0cb9b167bdb15a6997ae2878d085c
  • Docker image: run docker pull grafana/metrics-enterprise:v1.1.3

  • License: Grafana Labs license

Changelog

  • [SECURITY] Alertmanager: Fix a local file disclosure vulnerability when -experimental.alertmanager.enable-api is used (CVE-2021-31231):
    • The HTTP Basic auth password_file can be used as an attack vector to send any file content via a webhook.
    • The Alertmanager templates can be used as an attack vector to send any file content because the Alertmanager can load any text file specified in the templates list.

v1.1.2 – January 20 2021

  • Binary (Linux AMD64)

    • Download
    • SHA256: 9f302326ddbd9d8f12c78c3d7dab87ab7df086c44d09f94d30491cde3197562c
  • Deb (Linux AMD64)

    • Download
    • SHA256: 8888ad1f009c330820b2bb0e4d1095ba00952f17a97db1ac4bbf837ec76b0d7d
  • RPM (Linux AMD64)

    • Download
    • SHA256: 89f343ce021cbaef0a60e630e7d1b946ef4ea6289a288a67e8c6499bdd660c77
  • License: Grafana Labs license

Changelog

  • [BUGFIX] Querier: fix default value incorrectly overriding -querier.frontend-address in single-binary mode.

v1.1.1 – January 14 2021

  • Binary (Linux AMD64)

    • Download
    • SHA256: 966db853452590a6119826093c77f0ece77f0580e08a21dc392c4c293f27633e
  • Deb (Linux AMD64)

    • Download
    • SHA256: 7f8acb6dcd1de588ccc40235f134f6136f1e810458c40010a9df85a710abc299
  • RPM (Linux AMD64)

    • Download
    • SHA256: a1ba89cca4bbcba887f1bb219eef94b40a75883e8e66616734ab5de5991ad28f
  • License: Grafana Labs license

Changelog

  • [BUGFIX] Ruler: Minimize gaps on rule evaluations with stale input and enabled ruler evaluation delay.

v1.1.0 – January 12 2021

  • Binary (Linux AMD64)

    • Download
    • SHA256: 38207110d6d7db54ac34cc3172b83786dcd053214d49bb0522009991ae21e11d
  • Deb (Linux AMD64)

    • Download
    • SHA256: 25c235bf328125ce3b0a7c4cd65999c6031de915d072ccf81d6aa00db477689e
  • RPM (Linux AMD64)

    • Download
    • SHA256: 73e6662a936ba4ad9126e63896c0f11fba003c0af7dfadc1c8964d027de1bc42
  • License: Grafana Labs license

Changelog

  • [CHANGE] Admin-API: Resources must not be both prefixed and suffixed with the __ characters. If any of your existing resources exist with this naming pattern, they must be deleted and recreated with a new name before upgrading.

  • [CHANGE] Graphite: Allow storage schema and storage aggregation configs to be defined per tenant.

  • [CHANGE] Admin-Client: Instance management client calls no longer use object storage Iter calls when retrieving the latest version of a resource.

  • [CHANGE] Graphite: Add API endpoints to explore the available Graphite functions.

  • [CHANGE] Admin: The selectors for label policies are now provided as PromQL label strings instead of typed objects.

    • Deprecated:

      "label_policies": [
        {
          "selector": [
            {
              "name": "env",
              "value": "dev",
              "type": "EQ"
            }
          ]
        }
      ]
      
    • New:

      "label_policies": [
        {
          "selector": "{env=\"dev\"}"
        }
      ]
      
  • [CHANGE] Admin: Operations with an ADMIN scope are no longer restricted to operating on clusters they have as a configured realm.

  • [CHANGE] Deprecate enterprise_features config section in favor of the Cortex config extension.

    • Deprecated:

      enterprise_features:
        ruler_s3_request_headers:
          file: <string>
          poll_interval: <duration>
        ruler_remote_write:
          enabled: <bool>
          wal_dir: <string>
      
    • New:

      ruler:
        storage:
          s3:
            header_map_file_path: <string>
            header_map_poll_interval: <duration>
        remote_write:
          enabled: <bool>
          wal_dir: <string>
      
  • [FEATURE] Ruler: Alerts can now be correctly forwarded to the Alertmanager with enterprise authentication enabled by setting the basic authentication username to __alertmanager__ and the password to an API token with access to every instance.

  • [FEATURE] Queries: LBAC enforcement has been added for queries and label value requests.

    • When GEM is run using the default authentication mode, LBAC policies are specified using the X-Prom-Label-Policy HTTP header in the format: X-Prom-Label-Policy: <tenant-id>:urlEscaped(<prometheus label selector>). For example, a policy that only allows metrics with the label env equal to dev for tenant test-instance could be specified with the following header: X-Prom-Label-Policy: test-instance:%7Benv=%22dev%22%7D. To specify multiple policies, either set the header multiple times or set the header with a single string of multiple policies separated by an unescaped comma.
  • [FEATURE] Admin API: add label_policies field, which contains an array of label matchers to the access policy realm JSON.

    {
      "realms": [
        {
          "instance": "<string>",
          "cluster": "<string>",
          "label_policies": [
            {
              "selector": [
                {
                  "type": "<enum: EQ | NEQ | RE | NRE>",
                  "name": "<string>",
                  "value": "<string>"
                }
              ]
            }
          ]
        }
      ]
    }
    
  • [FEATURE] Admin: Add target tokengen to generate tokens for the default or a custom access policy.

  • [FEATURE] Admin: Added a default __admin__ access policy that has an ADMIN scope. This policy can be disabled by adding the following to the GEM configuration file:

    admin_client: 
      disable_default_admin_policy: true
    
  • [FEATURE] Querier: Queries can be federated across multiple tenants. The tenant IDs involved must be specified in the X-Scope-OrgID request header, separated by a | character.

  • [FEATURE] Add gateway target that can be configured to proxy requests to microservices and can be used to load balance remote_write requests to the distributors.

  • [ENHANCEMENT] AdminAPI: Add scope for read only admin access, admin:read.

  • [ENHANCEMENT] AdminAPI: Add separate set of scopes for alerts and rules.

    • alerts:read
    • alerts:write
    • logs:rules:read
    • logs:rules:write
    • metrics:rules:read
    • metrics:rules:write
  • [ENHANCEMENT] Reduce allocations in Graphite Ingester, when ingesting untagged Graphite metrics.

  • [ENHANCEMENT] Serve Graphite /metrics/find requests by keeping track of all recent metrics in an in-memory index on the Ingesters to reduce latency.

  • [ENHANCEMENT] Add auxiliary Graphite API endpoints to explore tags and obtain auto-complete suggestions for the Grafana query editor.

  • [ENHANCEMENT] Admin API: add ClusterKind support for Logs & Traces.

  • [ENHANCEMENT] Admin API: add scopes for Logs.

  • [ENHANCEMENT] Admin: The bootstrap target no longer needs to be run before being able to start GEM with enterprise features. Every target will now try to perform bootstrapping on startup if it has not already been done. Failure to bootstrap will not prevent GEM running, but enterprise features will not be available.

  • [ENHANCEMENT] Add grafana_labs_license_expiry_timestamp metric to expose GEM license expiration as a UNIX timestamp, in seconds.

  • [BUGFIX] Graphite: Fixing a bug in the request parsing of GET requests on the auto-complete endpoints.

  • [BUGFIX] Graphite: Return status 200 when ingesting datapoints results in out-of-order, out-of-bounds, or duplicate-sample errors, to prevent an indefinite loop.

  • [BUGFIX] Ruler: Fix issue where remote-write rule groups are created then immediately deleted when a rule group name contains the / delimiter character.
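
The X-Prom-Label-Policy format and the move from typed selectors to PromQL label strings, both described in the changelog above, can be sketched together. This is a minimal Python illustration, not GEM code. The function names are hypothetical, and the mapping from EQ, NEQ, RE, and NRE to PromQL operators is an assumption based on the usual Prometheus matcher types.

```python
from urllib.parse import quote

# Assumed mapping from the deprecated typed matchers to PromQL operators.
_OPERATORS = {"EQ": "=", "NEQ": "!=", "RE": "=~", "NRE": "!~"}


def selector_from_matchers(matchers):
    """Convert deprecated typed matchers into a PromQL label selector string."""
    parts = []
    for m in matchers:
        # Escape backslashes and double quotes inside the label value.
        value = m["value"].replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{m["name"]}{_OPERATORS[m["type"]]}"{value}"')
    return "{" + ",".join(parts) + "}"


def label_policy_header_value(tenant_id, selector):
    """Build one '<tenant-id>:<url-escaped selector>' policy value."""
    # Keep "=" readable, as in the changelog example; other reserved
    # characters, including ",", are percent-encoded.
    return f"{tenant_id}:{quote(selector, safe='=')}"


selector = selector_from_matchers([{"name": "env", "value": "dev", "type": "EQ"}])
header_value = label_policy_header_value("test-instance", selector)
# header_value == "test-instance:%7Benv=%22dev%22%7D"
```

Because commas inside a selector are percent-encoded, several policy values can then be joined with an unescaped comma into a single header, as the changelog entry describes.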

Upstream Cortex changes

v1.0.2 – October 16 2020

Changelog

  • [CHANGE] Update vendored Cortex from v1.4.0 to v1.4.0-21bad5.
  • [BUGFIX] Fix potential panic due to writing into a closed chan in the graphite query executor.
  • [ENHANCEMENT] Admin: Access policy create operations now enforce valid instance/cluster names for the realms configured on the access policy.
  • [ENHANCEMENT] Add -version flag to GEM.
  • [FEATURE] Add config options to rate limit the LIST methods of buckets.
  • [FEATURE] Adds the Graphite /render API endpoint, which can be used to query metrics with the Graphite query language.
  • [FEATURE] Add config options to specify and poll files to inject arbitrary HTTP headers in requests to S3 for the admin and blocks client.
      blocks_storage:
        s3:
          header_map_file_path: <path to header file>
          header_map_poll_interval: <duration string>
      admin_client:
        storage:
          s3:
            header_map_file_path: <path to header file>
            header_map_poll_interval: <duration string>
    
  • [FEATURE] Adds the Graphite /metrics/find API endpoint, which can be used to obtain lists of metrics matching a given pattern (Grafana query editor auto-complete, dashboard variable population, etc).
  • [FEATURE] Add a default access policy option for OpenID Connect tokens.

Upstream Cortex details

  • Cortex Hash: 21bad57b346c730d684d6d0205efef133422ab28
  • Cortex CHANGELOG

v1.0.1 – October 06 2020

Upstream Cortex details

  • Cortex Hash: 23554ce028c090a4a3413ac0e35e5e1dc9fa929f
  • Cortex Version: 1.4.0

Changelog

  • [CHANGE] Update vendored Cortex to v1.4.0.

v1.0.0 – September 17 2020

Upstream Cortex details

  • Cortex Hash: bb5fcc929832f7bd2a6c2df348b387abcb8b961e
  • Cortex Version: 1.4.0-rc.0

Changelog

  • [BUGFIX] Make config field names consistent.
  • [CHANGE] Use Go 1.14.9 to build the project and cut build-image@v0.1.3.

v1.0.0-rc.2 – September 15 2020

Upstream Cortex details

  • Cortex Hash: c3a344784a0c8ce70ef2521f543033dee3dce6c6
  • Cortex Version: 1.3.1

Changelog

  • [BUGFIX] Admin API: Fix panic on start up for admin-api target.

v1.0.0-rc.1 – September 04 2020

Upstream Cortex details

  • Cortex Hash: 4f6e1e5c48ccad2c1988cf1d36ca522ae0c805ed
  • Cortex Version: 1.3.1

Changelog

  • [CHANGE] Admin-Client: The storage backend for the admin client no longer defaults to s3. Instead, no default is set, and the admin client will not start up unless a storage backend is configured.
  • [CHANGE] The following features will no longer be active unless GEM is started with access to a valid license.
    • Admin API
    • Ruler S3 auth headers
    • Ruler API to configure remote write rule groups

v0.6.3 – August 20 2020

Upstream Cortex details

  • Cortex Hash: 2bda7b94
  • Cortex Version: 1.2.1

Changelog

  • [CHANGE] Auth: Removed the auth.enable flag and added the auth.type flag with default and enterprise options.
  • [FEATURE] Admin API: Add list endpoint for stored licenses.

v0.6.2 – August 04 2020

Upstream Cortex details

  • Cortex Hash: 6db67a4efbbf62b1133fa037a95382a21f752bbf
  • Cortex Version: 1.2.1

Changelog

  • [CHANGE] Ruler: S3 Headers are no longer protected by a license.