Note: We released fixes for CVE-2021-41090 and CVE-2021-43798 within 24 hours and mixed them up in one of the three blog posts. To make it clear: CVE-2021-41090 is for the Grafana Agent and CVE-2021-43798 is for Grafana the software. Only CVE-2021-43798 was a 0day exploit.
Note: A previous version of this blog post included the wrong CVE. It has been corrected. The timeline has also been updated.
We wanted to blog about it again today for two reasons: First, we wanted to give more insight into our timeline, as external events forced us to deviate from our original plan. On the “positive” side, we had our first 0day!
Second, given the AWS outage yesterday, we wanted to re-amplify the message that all users should upgrade their Grafana 8.x instances as soon as possible.
We are releasing this update in coordination with Jordy Versmissen, the researcher who notified us of the vulnerability. You can find the story of how he found the vulnerability in his blog post.
Please upgrade your Grafana 8.x instances
Again, we are re-releasing this information largely to raise awareness and get all users to upgrade their Grafana 8.x instances. If you have any of those, please UPGRADE RIGHT NOW. This blog post will wait for you until you’re done — promise!
For a description of the vulnerability and download links, please see our advisory.
Our internal vulnerability handling procedure follows these steps:
- Discovery / reporting
- Confirmation & assessment
- Incident declared
- Mitigate vulnerability in Grafana Cloud
- Creation of release plan
- Deployment of fix on Cloud (if applicable)
- Limited release for customers
- Public release
- Postmortem finalized
And we happily followed these steps until yesterday, when we realized that a 0day was out in the wild.
More detailed timeline
It’s important to note that we’re listing this timeline in the spirit of full transparency. We also strongly believe in blameless postmortems, internally and externally. Mistakes happen, in particular when you’re happy and excited. The good thing about mistakes that are treated respectfully and transparently is that the largest possible number of people can learn from them.
As such, onwards! All times in UTC:
- 2021-12-03 02:51: We received and escalated a ticket at email@example.com
- 2021-12-03 07:57: Vulnerability confirmed
- 2021-12-03 08:41: Incident declared
- 2021-12-03 08:42: Grafana Cloud confirmed not to be vulnerable
- 2021-12-03 08:42: Jordy posts, then deletes, a tweet about “read arbitrary files on the host, no authentication needed” (Editor’s note: We were not aware of this until 2021-12-07.)
- 2021-12-03 09:37: v8.0.0-beta1 to v8.3.0 are affected
- 2021-12-03 09:57: Identified PR introducing vulnerability
- 2021-12-03 09:58: CVE Request sent; Provisional assessment 7.5 High CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N
- 2021-12-03 10:57: Root cause found
- 2021-12-03 02:49: Release plan set: 2021-12-07 for private customer release, 2021-12-14 for public release (Editor’s note: We never release on Monday or Friday to spare the weekends of customers and users. Customers get a week to upgrade under strict embargo.)
- 2021-12-03 18:13: Patches ready
- 2021-12-06 11:26: Second report about the vulnerability received. Follow-up determines that it’s coming through a bug bounty program. Jordy confirms that he submitted to a bug bounty program in parallel to us
- 2021-12-07 09:04: We become aware of public discussion around the vulnerability. This is now a 0day
- 2021-12-07 09:07: Emergency release plan started
- 2021-12-07 09:07: Decision to maintain complete radio silence to avoid increasing visibility
- 2021-12-07 14:18: Private release with reduced 2-hour grace period, not the usual 1-week timeframe
- 2021-12-07 16:14: Full public release
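As an aside, the provisional 7.5 High rating in the timeline above follows directly from the vector string in the CVE request. The sketch below recomputes it using the base-score formula and metric weights from the public CVSS v3.1 specification. The function names `roundup` and `base_score` are our own illustrative choices, and this is not the tooling we used during the incident.

```python
import math

# Metric weights as published in the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place, following the spec's Roundup function."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a CVSS v3.1 base vector string."""
    # Skip the leading "CVSS:3.1" element, then split "AV:N" into ("AV", "N").
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]

    # Impact sub-score.
    iss = 1 - (
        (1 - CIA[metrics["C"]])
        * (1 - CIA[metrics["I"]])
        * (1 - CIA[metrics["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    # Exploitability sub-score.
    exploitability = (
        8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]
    )

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N"))  # 7.5
```

The score is driven by the combination of network reachability, no required privileges or user interaction, and high confidentiality impact. Integrity and availability are rated as unaffected, which keeps the score in the High band rather than the Critical band.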
The vulnerability Jordy found is one of the problem types I personally like the most (at least when not directly affected myself): The obvious-in-hindsight ones. It took someone to do the work, and once they’ve done the work, others will claim how trivial it all is. But it did take someone to do the actual work first.
Unfortunately, in this case, this meant that once people knew what to look for, find it they did, and quickly. We will add boilerplate language asking for radio silence to both our guidelines on how to report vulnerabilities and our firstname.lastname@example.org response, and we will incentivize this with a formal bug bounty (see below).
We had to build an impressive number of release artifacts in a rather short time: eight releases, four private and four public, multiplied by all the platforms and deployment models we support. In total, we ended up releasing dozens and dozens of full artifacts within mere hours. Plus, we had some build failures during release. We will hold a release engineering sprint within the next few weeks so that we can build private releases seamlessly and substantially shorten overall release build times.
We are already working on establishing a bug bounty program for Grafana Labs. While it will take us some more time to launch it, if we had had it in place last Friday, we would not have inadvertently created an incentive for Jordy to submit to other third parties. We strongly believe that researchers should be able to make a living from their work and will introduce a bug bounty program soon. If you have opinions on bug bounty programs, please see my Twitter poll.
We would like to thank Jordy Versmissen for finding the vulnerability and alerting us to it.
We are currently working on a full postmortem and will publish it on our blog as soon as it’s ready.