ML-enhanced guidance on SLO target selection
Many teams struggle to pick SLO targets, particularly for new SLOs. The target percentage drives the sensitivity of the burn rate calculations and the remaining error budget, and it can be used to tune alert volume. Suppose you want to create an SLO to ensure that “99.5% of HTTP requests return successfully in under 500 ms”. How do you know that 99.5% is a realistic target for your service? People often guess or take a number from management.
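To make the link between target and alert sensitivity concrete, here is a minimal sketch using the standard definitions: the error budget is 1 minus the target, and the burn rate is the observed error ratio divided by that budget. The function names and the 1% error ratio are illustrative assumptions, not part of Grafana SLO.

```python
def error_budget(target: float) -> float:
    """Fraction of requests allowed to fail, e.g. 0.005 for a 99.5% target."""
    return 1.0 - target


def burn_rate(observed_error_ratio: float, target: float) -> float:
    """How fast the budget is being spent; 1.0 means exactly on budget."""
    return observed_error_ratio / error_budget(target)


# Hypothetical situation: 1% of requests are currently failing.
observed = 0.01
for target in (0.99, 0.995, 0.999):
    print(f"target={target:.3f} burn_rate={burn_rate(observed, target):.1f}")
```

The same 1% error ratio is on budget at a 99% target, burns twice as fast at 99.5%, and ten times as fast at 99.9%, which is why a tighter target makes burn rate alerts fire sooner.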
Grafana SLO, in collaboration with the Machine Learning team, is proud to announce a major enhancement to our SLO creation wizard. After you define an SLO, the “step 2” target selection page now shows ML-enhanced guidance to help you assess the risk of breaching a given target. We query 90 days of history from the metrics used in the SLO definition and run simulations to predict the likelihood of meeting the target based on that history. You can slide the target percentage and see an updated prediction of the likelihood of meeting that target.
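The general idea of simulating from history can be sketched as follows. This is not the model Grafana SLO uses, which is not described here. It is a simple bootstrap that assumes the history is available as daily (good, total) request counts, resamples days with replacement to build simulated 28-day windows, and reports the fraction of windows that meet the target. The function name, the 28-day window, and the synthetic history are all assumptions for illustration.

```python
import random


def probability_of_meeting(daily_counts, target, window_days=28,
                           n_sims=2000, seed=0):
    """Estimate the chance that a window of `window_days` meets `target`.

    daily_counts: list of (good_requests, total_requests) per historical day.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    met = 0
    for _ in range(n_sims):
        # Draw a simulated window by resampling historical days.
        sample = rng.choices(daily_counts, k=window_days)
        good = sum(g for g, _ in sample)
        total = sum(t for _, t in sample)
        if total > 0 and good / total >= target:
            met += 1
    return met / n_sims


# Synthetic 90-day history: mostly 99.8% success, with five bad days at 95%.
history = [(9980, 10000)] * 85 + [(9500, 10000)] * 5

# Sliding the target changes the estimated likelihood of meeting it.
for target in (0.99, 0.995, 0.999):
    p = probability_of_meeting(history, target)
    print(f"target={target:.3f} estimated probability={p:.2f}")
```

Resampling whole days ignores trends and correlation between consecutive days, so treat the output as a rough indication only. A production model would likely account for seasonality and for how incidents cluster.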