The values behind scaling cloud native security at Grafana Labs

Thomas Owen

2021-12-21

On Nov. 8, I started as the new Chief Information and Security Officer at Grafana Labs. In my first five weeks, I’ve met about 100 really amazing people; learned and absorbed early lessons about our workplace culture; kicked off working groups for our 2022 initiatives (bug bounty FTW); and contributed to tackling our first-ever 0day.

Amid all of that, I’ve also been doing a lot of thinking. I’m a big believer in understanding the “why” behind a situation before the what or the how. So one of the first things I’ve focused on is the set of values we’ll seek to create within the security team and support outside of it.

Culture is key. Here at Grafana Labs, we’ve got some pretty unique cultural values. I don’t mean “unique” in the sense of “here are some value statements that we, more or less, aspire to.” I mean unique in terms of reality and outcomes. So far, the culture appears to be about impact, including in adverse situations such as the team coming together to manage our recent 0day.*

With that as our foundation, I think we’ve got an opportunity to do something amazing and new with security at Grafana Labs.

Let’s focus on doing things via genuine innovation while also thieving best practices relentlessly from everyone else. Let’s do away with the mistakes and cruft of the last 10 years of cybersecurity.** Let’s structure this team and the way we operate in a fundamentally different way.***

With regard to culture, can we, as it were, out-GitLab GitLab?****

To that end, since we’re building a new security function at Grafana Labs, we should have a manifesto to go along with it.

TL;DR our security manifesto

Bias to action - Do both “something now” and “refine later,” experiment fearlessly, test repeatedly, iterate and fail quickly.
Enable ownership, create accountability - Key DevOps values of ownership and accountability cannot operate at scale without enablement, which is our job.
Serve the user where they live - Don’t expect an engineer to log into a security tool; bring the insights to them to drive outcomes.
Share openly and default to transparency - Grafana has an incredible culture of autonomy, transparency, and collaboration. Always use the hivemind.
Accurate data, actionable insights - If you have 50,000 findings, you have none. Help teams understand what to focus on, when, and why.
Beautiful experiences - Our tooling and process definition of “done” includes “does this spark joy?”
Dogfood and open source - Solving security at scale is an observability problem. Build our solutions out of and into the Grafana Stack, supporting the open source community. (And don’t consume open source without trying to give back!)
Operational benefit drives compliance - Security first, then compliance.
Security is a product function - If we solve a problem for ourselves, why wouldn’t we also offer the solution to our users and customers?
Minimum viable security controls - Always implement the necessary security controls; only ever implement the necessary security controls.

Bias to action

Even small or low-maturity changes now can fundamentally improve our security posture and the effectiveness of our processes and controls. Whilst our definition of done and criteria for MVP will both be ambitious (see below!) we will not let this get in the way of delivering value incrementally, iteratively, and rapidly.

We will experiment fearlessly without worry over failure. We will test and implement selected tooling and processes in a manner that allows us to grow and mature rapidly, but also fail quickly and at low cost. We want both “something now” and “more later,” and we accept that velocity sometimes means rework or lost effort. Go, go, go.

Enable ownership, create accountability

We believe strongly that risks can only be meaningfully owned when they are owned at the edge, but we also recognize that people need specialist help to feel safe and empowered to make impactful security-related decisions and trade-offs. We will transfer the responsibility for the execution of any control to the edge or end user wherever appropriate, with enablement to make it easy to meet any commitments made. We will remain continually open to feedback and input from end users as to what an “effective control” actually is.

We will actively support and develop autonomy and engineer-led design through a focus on decentralization and creating self-service tools, processes, and experiences. By surfacing risks, recommendations, and suggested priorities based on reliable, largely automated testing and with the creation of recommended security paved paths for development and CI/CD, we will assist engineers and owners with securing their services.

We will curate crowdsourced design guides and definitions of done through open, internal discussion and gladly accept critiques, input, and assistance from any Grafanista who has a passion or the skills to support us. We will develop, support, and evangelize a useful, usable, and efficient approach to design and threat modeling that will help the teams that use it design secure-by-default features and services.

We will agree on SLOs and KPIs for key security controls with the relevant owners to help them manage up to the agreed definition of “good enough.” We will create systems to monitor conformance against these SLOs and serve the outputs back to the owners where they live, supporting distributed accountability. We will help people feel safe to make important decisions without us.
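As a minimal sketch of what monitoring conformance against an agreed security SLO could look like, the Python below assumes a hypothetical SLO of the form “findings of a given severity are remediated within N days.” The `Finding` record, the severity labels, and the remediation window are illustrative assumptions, not Grafana Labs’ actual SLOs or tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass
class Finding:
    # Hypothetical finding record; field names are illustrative only.
    severity: str
    opened: datetime
    resolved: Optional[datetime] = None


def slo_conformance(findings: Iterable[Finding], severity: str,
                    window: timedelta, now: datetime) -> float:
    """Fraction of `severity` findings that met the remediation window.

    Resolved findings conform if they were fixed by opened + window.
    Open findings count as breaches only once their deadline has passed;
    open findings still inside the window are not counted yet.
    """
    met = 0
    total = 0
    for f in findings:
        if f.severity != severity:
            continue
        deadline = f.opened + window
        if f.resolved is not None:
            total += 1
            if f.resolved <= deadline:
                met += 1
        elif now > deadline:
            total += 1  # still open past its deadline: a breach
    # With nothing measurable yet, treat the SLO as met.
    return met / total if total else 1.0
```

An owner could then compare the result against the agreed target (for example, `slo_conformance(...) >= 0.95`) and have that served to them in a dashboard they already use. Leaving not-yet-due findings out of the denominator is one design choice among several; the owners and the security team would agree on the exact definition of “good enough.”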

Serve the user where they live

Security is only one of the non-functional requirements that engineers and R&D teams own. We do not expect engineers and service/risk owners to log into security tools in order to achieve our shared aims. We will abstract the underlying tools away from our users and will serve actionable data and insights to engineers and owners in their daily systems. We will dogfood the Grafana Stack wherever possible to achieve this. Our primary intent is for engineers to never have to log into security-specific tooling, unless they want to, in which case we want to ensure they always have access when needed.

Share openly and default to transparency

We acknowledge that we are not the experts in everything and that input from all Grafanistas is the best way to achieve the right balance of security, velocity, and innovation.

We will commit to honest, open, and pragmatic discussions where any Grafanista can comment on, critique, or suggest solutions to problems we’re facing or actions we’re taking. Whenever we start a new project or plan to make a significant change, we will provide an opportunity for, and encourage, open discussion of our plans.

We will publish and regularly update a list of projects we are working on, what their objectives and intentions are, and what our progress is. We invite continual and open feedback based on this list.

Where we are creating a new security control or definition of “good enough,” we will attempt to crowdsource this definition within the boundaries set by immovable forces (i.e., regulation, compliance, customer expectations, and contractual commitments) and our own understanding of the threat landscape (where evidence-based). Following a discussion, we commit to disagreeing and committing, as long as you do as well.

Wherever possible, we will make our configuration files, metadata, and generated data available read-only to engineers and other relevant roles. We believe that security through obscurity is not security, and that engineers can solve problems faster if they can see the underlying structures. We believe that the security data we generate will be useful to other teams and will work to make this available and composable wherever possible and appropriate.

Accurate data, actionable insights

We will commit to providing, within the limits of the current state of the art, the highest-quality and most relevant data possible to users whenever we take a data-driven approach to a security control. We will deliver this data to engineers in a clear and composable format. Where we create insights, risks, or recommendations from this data, we will make them actionable and meaningful. We will own and maintain a stateful view of security issues and vulnerabilities at Grafana such that, where an engineer provides feedback, that feedback is retained and acted upon. (For example, we should only have to mark a vulnerability as a “false positive” once for a defined period of time, regardless of how many different tools identify it.)
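One way to picture the “mark it once” idea is a small triage store. The minimal Python sketch below assumes that findings from different scanners can be normalized to a shared key, here a hypothetical `(service, vulnerability_id)` pair, and that a false-positive verdict expires after an agreed period. The `TriageStore` class, the key format, and the placeholder IDs are all illustrative assumptions, not an existing Grafana tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical normalized key: the same issue reported by two scanners
# maps to the same (service, vulnerability_id) pair.
Key = Tuple[str, str]


@dataclass
class TriageStore:
    # key -> time at which the false-positive suppression expires
    suppressions: Dict[Key, datetime] = field(default_factory=dict)

    def mark_false_positive(self, service: str, vuln_id: str,
                            now: datetime, period: timedelta) -> None:
        """Record an engineer's verdict once, valid for `period`."""
        self.suppressions[(service, vuln_id.upper())] = now + period

    def actionable(self, raw_findings: List[dict], now: datetime) -> List[Key]:
        """Deduplicate findings across tools and drop suppressed ones."""
        keys = {(f["service"], f["vuln_id"].upper()) for f in raw_findings}
        # Keep a key if it was never suppressed or its suppression expired.
        return sorted(
            k for k in keys
            if self.suppressions.get(k, datetime.min) <= now
        )


store = TriageStore()
now = datetime(2021, 12, 21)
store.mark_false_positive("grafana", "EXAMPLE-0001", now, timedelta(days=90))
findings = [
    {"tool": "scanner-a", "service": "grafana", "vuln_id": "EXAMPLE-0001"},
    {"tool": "scanner-b", "service": "grafana", "vuln_id": "example-0001"},
    {"tool": "scanner-a", "service": "loki", "vuln_id": "EXAMPLE-0001"},
]
print(store.actionable(findings, now))  # [('loki', 'EXAMPLE-0001')]
```

Both scanners’ reports for the `grafana` service collapse into one key and stay suppressed until the verdict expires, while the same ID on a different service remains actionable. In practice, normalizing identities across tools is the hard part, and this sketch simply assumes it has already been done.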

Beautiful experiences

Grafana’s success is in part due to the fact that our products are just so easy and lovely to use. They spark joy. We ultimately want the processes, relationships, and tooling that the security team enables and creates to do the same. We will keep our processes, guardrails, and tooling lightweight, performant, and meaningful. Wherever possible we will invest and prioritize appropriately and work with our internal customers and the product and UX teams to ensure that the changes we facilitate at Grafana are at least invisible, if not actively enjoyable for end users.

Dogfood and open source

We believe that security is ultimately another observability problem. We will attempt to solve every security problem through an observability lens, taking often complex, heterogeneous datasets and representing them in elegant, composable ways to help risk owners agree to and manage against security SLOs. We will dogfood relentlessly to achieve this.

Open source software is at the core of modern security at scale. We will both consume and contribute back to open source projects, and ideally create our own new projects under the Grafana umbrella. Where effective, we will make use of either open source tooling or enterprise versions of it. We will publish our glue code, integrations, alerts, and mixins to allow users and customers to leverage our successes if they are interested.

Operational benefit drives compliance

Unless a given compliance-driven control brings unhinged amounts of revenue (which in itself will help deliver on our value statements), we will focus on security measures and approaches that materially improve the operational efficiency and real security of our products and organization. This in turn will do most of the work to bring us to a compliant posture. Where a control exists for compliance reasons only, we will work to reduce the operational cost of that control outside of Security to near-zero. Compliance is a vital mechanism for validating our controls implementation and for articulating it accurately to customers, but we will treat it as one user need amongst many.

Security is a product function

Security shouldn’t be an enterprise-only solution; ours is a big tent philosophy. Whilst we want to cherry-pick from the best-of-breed vendors, who seem to be making siloed and market-grabbing platform plays, we will build on these towards something that serves ourselves and users of all sizes. As we mature, we will take a product-led approach to security improvement and feature development, treating all security requirements (customer, internal user, state of the art, risk tolerance, compliance, and regulation) as user requirements.

Minimum viable security controls (MVSC)

We will discuss and agree on a set of risk appetites and related objectives. After synthesizing the needs of Grafanistas, customers, our threat landscape, commercial ambitions, and compliance and regulations, we commit to implementing the minimum viable security controls required to achieve our aims. As we mature, we will seek to make use of a Zero Trust model to only apply the strictest (and at times unavoidably limiting) restrictions where they are specifically required.

Within the context of “disagree and commit,” we will remain open to collaborative challenge and discussion on what the “minimum viable” is for any given situation, environment, action, or context.

We do not want to maintain or support a security control where it does not provide effective security outcomes or otherwise enable revenue. Where a security control is theater for no reason or delivers only noise, we will either improve this control or remove it.

Conclusion

This is ambitious, but it’s a distilled summary of my experience and personal biases to date. I think it’s exciting and something that we can make a reality here at Grafana Labs.

And have I mentioned, the security team is hiring? ;-)

* Caveat - I’ve only been here six weeks but I don’t think this is a honeymoon effect. I’m pretty sure this is something incredible.

** The main thing I ask of any interview candidate is whether they’ve made lots of relevant mistakes previously and whether they will continue to do so while trying to consciously avoid past ones.

*** Privilege-aware statement - We’re a cash-rich and rapidly growing scale-up without legacy baggage. We have a high proportion of technologically skilled and engineering-savvy people who are both security-literate and security-engaged. We have well-operationalized values of self-organization, autonomy, and distributed ownership in place. I know not everyone has this luck.

**** Meant with love. We are superfans of GitLab’s methodologies. I crib/steal relentlessly from their handbook.
