How we're using 'dogfooding' to serve up better alerting for Grafana Cloud

Published: 28 Jul 2020

At Grafana Labs, we’re big fans of putting ourselves in the shoes of our customers. So when it comes to building a product, dogfooding is a term we throw around constantly. In short, it means we use the products we create throughout their entire life cycle.

And I really mean the whole life cycle. Early in the development process, we’ll start using a product internally before it’s “production-ready.” Then we’ll spend time modifying existing processes in order to include the new element, and finally, we’ll make sure to proactively give feedback to the team in charge.

We like to think dogfooding is our superpower here at Grafana Labs.

So where am I going with all of this? Well, I want to tell you about how we’re transforming the Grafana Cloud alerting experience into something awesome. Keep in mind that we can’t share all the details about it just yet (when we can, we’ll make sure to tell the world about it – and I’ll be back here with a tutorial). So for now, I want to focus on the process instead of the product.

Talking abstractly isn’t going to help anyone, so let’s dive into the details.

Running it like our customers

When we say we run a product the same way as our customers, we literally mean a 1:1 copy; no shortcuts, no hiding it as a separate part of our infrastructure. We go out of our way to make sure the new product is part of our workflow. If necessary, we’ll replace all or part of a process to include the new product – and we’re happy to do it.

In the process of revamping Grafana Cloud alerting, we encountered a set of use cases we never would have discovered had we not used the tools ourselves. As a result, we created a GitHub Action and a CLI tool to help our customers mimic the setup we have.

Of course, this comes with the downside of a higher cognitive load due to the rate of change. But on the bright side, it makes us more adaptable as engineers and as a company. In fact, at Grafana Labs, we like to embrace the idea that change is the only constant.

Today, our internal alerting and rule evaluation workflow is the same as the one our Grafana Cloud customers use, and we’re constantly making improvements based on what we’re observing from its regular usage. (Find out more about our 30-day Grafana Cloud trial here.)

Controlled rollout

Early in the process of developing the shiny new Prometheus-based alerting for Grafana, we figured out the different ways we wanted to give people access to it. We started with different internal teams, then moved on to selected customers, and finally, we’ll roll it out to all customers. With this approach, we ensure quick, iterative feedback as everyone gets familiar with the new product.

It’s important to note that we’re not talking about single-version access. Instead, our short-term plan is to roll out updates directly from the repository as quickly as possible while the plugin remains under constant development. In the long term, there will be a pinned stable release once the product has a regular release cadence.

What’s next?

Building products and using them is at the heart of what we do here at Grafana Labs. It keeps everyone engaged in the process and makes the early phases of product development as exciting as the releases. Not only does the feedback we get along the way push us towards a better product, it also keeps the development life cycle interesting and fun.

As much as we love integrating new products into our internal work, introducing them to the public is what’s important. I’ll be back post-launch to talk about our improved Grafana Cloud alerting experience and share setups and examples that we know you’ll find useful . . . because we’ve tried them ourselves.
