Grafana Cloud security update: Grafana Cloud Metrics memory corruption issue resolved

2025-08-03

On July 31, 2025, we were made aware of an issue within the Grafana Cloud environment that affected users of Grafana Cloud Metrics. We declared an incident and released a patch that addressed the issue within hours. We immediately began an investigation to determine the scope of the incident, which concluded on August 1, 2025.

Our investigation revealed that the memory corruption issue was caused by a bug introduced in Grafana Cloud through an update to a recent release candidate version of the Mimir distributor, which is part of the open source Mimir project that powers Grafana Cloud Metrics. (The bug is not included in any official OSS Mimir release.) As a result, fragments of metrics label and series data from one Grafana Cloud Metrics customer could be inadvertently included in error messages displayed to other Grafana Cloud Metrics customers within the Insights UI, or potentially written to their sender logs.

We implemented a fix to the Mimir distributor on July 31, 2025, and removed the ability to view this data in Grafana Cloud on the same day. All malformed error messages were scrubbed from Grafana Cloud as of August 3.

We have notified all impacted Grafana Cloud customers by email. If you are a Grafana Cloud customer and have not heard from our teams, you were not impacted.

Our partners and on-prem customers are not affected by this issue. 

If you are an early-adopter open source user who has installed a release candidate build of the Mimir distributor (version r350, r351, or r352), you need to update to the r353 release candidate of the Mimir distributor immediately.

Summary

A memory reuse issue, introduced on July 14, 2025, during an update to the error log management logic of the metrics data receiver process, caused a data leak within the Insights UI for a small subset of Grafana Cloud Metrics customers. This issue stemmed from a change to the Mimir distributor that was intended to improve error messages, but it led to memory corruption.

Specifically, when a particular race condition occurred, parts of metrics data from one user account could be written to the error log of another account in a multi-tenant deployment. This data could then be displayed in the Insights UI, showing the account owner information that belonged to a different account.

Note: Customers using OpenTelemetry Protocol (OTLP) to send metrics data were not impacted by this bug. 

Impact

In a distributed system, it is often difficult to determine where an error is occurring. To help with this, we introduced an update to provide more information when an error occurred while sending metrics from the client to Mimir. Specifically, we changed error handling and logging in the software that receives metrics (the Mimir distributor) and queues them to be written to disk. This change to error handling introduced a memory reuse error when certain conditions were met.

To understand the issue, it helps to describe how the system operates. Metrics data consists of labels, which are name-value pairs. Data is sent from the client to Grafana Cloud, where it is processed by a Mimir distributor before being written to a Mimir instance. When an error occurs in Cloud Metrics, it is stored for use by the Insights Dashboard. Error messages are also returned as part of the API payload to the client sending the data. Note that if you are using Grafana Alloy as your telemetry collector, it defaults to the OpenTelemetry Protocol (OTLP), which was not affected by this issue.

When we changed the code that handled error messages, the change introduced a race condition in our gRPC buffer pool implementation. Several things had to happen for this issue to occur, but in short: if your client legitimately triggered an error because a label value was too long, fragments of memory from concurrent metrics ingestion requests could be written into your error message if they were in the buffer at the same time the message was being written. Under some circumstances, this could include a small portion of another customer’s metrics data.
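To make this failure mode more concrete, here is a minimal, single-threaded Python sketch of a pooled buffer being handed to a new request while an error message still points into it. `BufferPool`, `LabelValueTooLongError`, and `handle_request` are hypothetical names; this is not Mimir's or grpc-go's code. The real bug involved concurrent requests in Go, whereas this sketch forces the reuse order deterministically and uses a `memoryview` to stand in for a value that aliases pooled memory.

```python
class BufferPool:
    """Hypothetical free-list pool standing in for a gRPC receive-buffer pool."""

    def __init__(self):
        self._free = []

    def get(self, size):
        # Hand out a previously returned buffer if one is big enough.
        for i, buf in enumerate(self._free):
            if len(buf) >= size:
                return self._free.pop(i)
        return bytearray(size)

    def put(self, buf):
        # Once returned, the buffer may be handed to any other request.
        self._free.append(buf)


class LabelValueTooLongError(Exception):
    def __init__(self, value):
        super().__init__()
        self.value = value  # a memoryview (aliases pooled memory) or a bytes copy

    def __str__(self):
        # The message is rendered lazily, e.g. when it is logged or returned.
        return f"label value too long: {bytes(self.value).decode()!r}"


def handle_request(pool, payload, max_len, copy_value):
    buf = pool.get(len(payload))
    view = memoryview(buf)[: len(payload)]
    view[:] = payload
    try:
        if len(view) > max_len:
            # copy_value=True detaches the value from the pooled buffer.
            raise LabelValueTooLongError(bytes(view) if copy_value else view)
    finally:
        pool.put(buf)  # the buffer is reusable as soon as the request ends


pool = BufferPool()
try:
    handle_request(pool, b"tenant-a-long-value", max_len=8, copy_value=False)
except LabelValueTooLongError as err:
    err_a = err
# Another tenant's request reuses the same buffer before err_a is rendered.
handle_request(pool, b"tenant-b-secret-xyz", max_len=64, copy_value=False)
print(err_a)  # label value too long: 'tenant-b-secret-xyz'
```

The `copy_value=True` branch shows one general way to avoid this class of bug: copy any data an error message needs out of the pooled buffer before the buffer is returned to the pool.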

These corrupt messages were visible only to a subset of users, under one of two conditions:

  1. You were using the Insights Dashboard
  2. Your metrics sender was configured to log error messages locally, in which case the messages could be stored in your client error logs

While metrics data typically pertains to performance metrics (e.g., CPU utilization, memory usage), in certain instances, the error logs may have contained information about the process used to generate metrics. Depending on configuration, this may have included, for example, a URL with a query parameter containing a password, or in some cases, an email address for an alert recipient. Although this is not typically how metrics are gathered and we would not generally expect to see these types of data, some users have used our metrics tool in this manner.

Solutions and mitigations

A fix to the Mimir distributor was implemented on July 31, 2025, and all affected data has been scrubbed from Grafana Cloud.

Recommended action: If you are storing passwords in your metrics gathering scripts, such as embedded in a URL or as part of a query string, we strongly advise you to rotate those passwords.
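If you are unsure whether any of your label values carry credentials, a quick audit can help you decide what to rotate. This is a minimal Python sketch, assuming you can export your label sets as dictionaries of label name to value. `find_credential_labels` and the `SENSITIVE_PARAMS` list are hypothetical and not part of any Grafana tool, and a match only means a value deserves a closer look.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical set of query-parameter names that often carry secrets;
# adjust it to match the exporters and scripts you actually run.
SENSITIVE_PARAMS = {"password", "passwd", "pwd", "token", "api_key", "apikey", "secret"}


def find_credential_labels(labels):
    """Return (label name, finding) pairs for label values that look like URLs with secrets."""
    findings = []
    for name, value in labels.items():
        try:
            parts = urlsplit(value)
        except ValueError:
            continue  # malformed URL, e.g. an unbalanced IPv6 bracket
        if not (parts.scheme and parts.netloc):
            continue  # not an absolute URL
        if parts.password:
            findings.append((name, "userinfo password"))
        for key, _ in parse_qsl(parts.query, keep_blank_values=True):
            if key.lower() in SENSITIVE_PARAMS:
                findings.append((name, key))
    return findings


labels = {
    "__name__": "probe_success",
    "instance": "https://example.com/health?user=alice&password=hunter2",
    "job": "blackbox",
}
print(find_credential_labels(labels))  # [('instance', 'password')]
```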

Going forward, Grafana Labs will review our gRPC buffer pooling and remove some of it. We will also improve our SAST tooling to better test for these types of race conditions and focus on alternative designs for our queuing process to improve security. While there is no simple toggle we can turn on to fix these problems, we are an open source company, so you can follow the changes to our code and practices that will improve our platform.

As part of our continuing forensic analysis, our security team is examining error messages for any compromised personally identifiable information (PII). At this time, only email addresses and service passwords have been found.

We take a multi-layered approach to security at Grafana Labs. We will be reinforcing all of the existing layers in our processes and adding new ones with our continued commitment to security and transparency.

Timeline

All times in UTC 

  • 2025-07-14 12:24 - Grafana Labs introduced a change to error handling in the Mimir distributor 
  • 2025-07-31 17:04 - Grafana Labs was made aware of the memory corruption issue 
  • 2025-07-31 21:02 - Grafana Labs released a patch to revert the change and prevent viewing of erroneous log messages. Log analysis began at this time. Code development to scrub logs was initiated 
  • 2025-08-01 14:27 - Data export for Security team forensic analysis kicked off
  • 2025-08-01 21:46 - Possibility of malformed error messages in client logs was confirmed
  • 2025-08-02 20:14 - Scrubbing of malformed error messages from logs commenced
  • 2025-08-03 00:44 - Grafana Labs completed the removal of erroneous messages from all Grafana Cloud data stores 
  • 2025-08-03 23:30 - Affected Grafana Cloud customers notified
  • 2025-08-03 23:45 - Blog published

Reporting security issues

If you think you have found a security vulnerability, please go to our Report a security issue page to learn how to send a security report.

Grafana Labs will send you a response indicating the next steps in handling your report. After the initial reply to your report, the security team will keep you informed of the progress towards a fix and full announcement, and may ask for additional information or guidance.

Important: We ask that you not disclose the vulnerability before it has been fixed and announced, unless you have received a response from the Grafana Labs security team saying that you can do so.

Security announcements

We maintain a security category on our blog, where we will always post a summary, remediation, and mitigation details for any patch containing security fixes. You can also subscribe to our RSS feed.
