It all started with a Twitter conversation. As the site reliability engineering community came together on social media to discuss how engineers and product managers can raise the level of customer experience and satisfaction, they also realized it was time for the first-ever virtual conference dedicated to service level objectives (SLOs).
The Service Level Objective Conference, a.k.a. SLOConf, brings together site reliability engineers to discuss topics ranging from the basics of SLOs to the complex job of converting non-engineers into SLO believers. Grafana Labs team members will be among the conference speakers, covering a wide array of topics in on-demand video sessions that will be available to registered participants on the SLOConf site from May 17 to 20.
Here’s an overview of what Grafanistas will be sharing:
Infrastructure comes out the wall; no one cares how
Grafana Labs' Director of Community Richard “RichiH” Hartmann introduces a simple way to think about your services: Almost no one cares about how you run your services and how they work internally — until they do not work. Your water, electricity, and Internet come out the wall, and if they stop doing that, that’s when you call someone to complain. RichiH will do a deep dive into this idea and how it can help organizations shape their SLOs.
How to make non-engineers care about SLO
Richard “RichiH” Hartmann brings non-engineers into the spotlight by focusing on why SLOs really matter to everyone, not just a select few. SLOs internalize an aspect normally externalized in contracts: quantitative, objective measurements agreed upfront and in writing between all stakeholders. This takes away several problem domains and frees up everyone to concentrate on where they provide value instead of on infighting when something goes wrong. Aligning intrinsic motivations across the org while following a common reporting scheme is the dream of all VPs and Cs. We will be looking at how to frame SLOs in a way that should make everyone care, both in the abstract and through specific examples of success.
Production readiness review
Site Reliability Engineer Milan Plzik takes a closer look at how the Production Readiness Review (PRR) process can strengthen confidence in the defined SLO by identifying and removing common pitfalls and previously encountered mistakes. In addition, Grafana Labs will soon be publishing its own Production Readiness Review checklist to help those interested in kickstarting the PRR process at their organization.
Should SLOs be request-based or time-based? (And why neither really works…)
Principal Software Engineer Björn “Beorn” Rabenstein explores the pros and cons of both request-based and time-based SLOs. Those familiar with SLOs probably realize that time-based SLOs aren’t really fair for most users. It doesn’t help if your ISP gives you perfect connectivity while you are asleep but always goes down during the important weekly video conference. In other words: A time-based SLO means free uptime whenever your service isn’t used. Clearly a request-based SLO is much better: It measures what matters, and now an outage during peak time will consume your error budget much more quickly. Sadly, that’s not the end of the story. A request-based SLO can be misleading, too. Beorn highlights a few common scenarios to show how a request-based SLO sometimes exaggerates and sometimes masks problems with a service, and what you can do about it.
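To make the difference concrete, here is a minimal Python sketch of the two ways of counting. Everything in it is an assumption made up for illustration and is not taken from Beorn's talk: the hourly `Window` shape, the traffic numbers (100 requests in a quiet hour, 10,000 in the peak hour), and the rule that an hour counts as "down" when more than half of its requests fail.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """One hour of traffic (hypothetical shape, for illustration only)."""
    requests: int  # requests received in this hour
    failed: int    # requests that failed in this hour


def time_based_availability(windows, max_error_ratio=0.5):
    """Share of hours counted as 'up'.

    Assumption: an hour is 'down' when more than max_error_ratio of its
    requests fail. Hours without traffic count as 'up' (the free uptime).
    """
    up = sum(
        1 for w in windows
        if w.requests == 0 or w.failed / w.requests <= max_error_ratio
    )
    return up / len(windows)


def request_based_availability(windows):
    """Share of all requests that succeeded, regardless of when they arrived."""
    total = sum(w.requests for w in windows)
    failed = sum(w.failed for w in windows)
    return 1.0 if total == 0 else (total - failed) / total


def simulate_day(outage_hour, peak_hour=12):
    """Build 24 hourly windows with a total outage in exactly one hour."""
    windows = []
    for hour in range(24):
        requests = 10_000 if hour == peak_hour else 100  # made-up traffic
        failed = requests if hour == outage_hour else 0
        windows.append(Window(requests, failed))
    return windows


for label, hour in (("peak-hour outage", 12), ("quiet-hour outage", 3)):
    day = simulate_day(outage_hour=hour)
    print(
        f"{label}: time-based {time_based_availability(day):.1%}, "
        f"request-based {request_based_availability(day):.1%}"
    )
```

With these made-up numbers, the time-based figure is 23/24 (about 95.8%) in both cases, because one bad hour is one bad hour no matter when it happens. The request-based figure is about 99.2% for the quiet-hour outage but drops to about 18.7% for the peak-hour outage, which shows both why it tracks user impact more closely and how a single hour can dominate the result.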
With three Grafanistas and four topics selected for SLOConf, we’re proud and somewhat humbled to say that Grafana Labs has the largest speaker presence at the event. We see this as a testament to our passion for this space. If you’re similarly passionate about making services and infrastructure measurable, we’re always hiring.