Why companies choose Adaptive Metrics and how they save time and (a lot of) money
Let’s cut to the chase: Managing metric volumes at scale is hard. In fact, when we asked the open source observability community about their biggest concerns in this year’s Grafana Labs Observability Survey, the top four responses — cost, complexity, cardinality, and signal-to-noise ratio — can all be tied back to exponential growth in telemetry data.
To help ease those problem areas, we built Adaptive Metrics, a feature in Grafana Cloud that helps cut costs by aggregating unused and partially used metrics into lower cardinality versions.
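To make "lower cardinality versions" concrete, here is a minimal sketch of what aggregating away a label looks like. It is an illustration only, not how Adaptive Metrics is implemented: the `aggregate_series` helper, the `service` and `pod` label names, and the sample values are all hypothetical, and summing is assumed to be the right aggregation for the metric in question.

```python
from collections import defaultdict


def aggregate_series(series, drop_labels):
    """Sum sample values across series after removing the given labels.

    `series` is a list of (labels_dict, value) pairs taken at one instant.
    Hypothetical helper for illustration; not part of any Grafana API.
    """
    totals = defaultdict(float)
    for labels, value in series:
        # Keep only the labels we still care about, in a stable order,
        # so series that differ only by a dropped label share one key.
        kept = tuple(sorted((k, v) for k, v in labels.items() if k not in drop_labels))
        totals[kept] += value
    return dict(totals)


# Three series, distinguished by a per-pod label nobody queries (assumed).
series = [
    ({"service": "checkout", "pod": "checkout-1"}, 12.0),
    ({"service": "checkout", "pod": "checkout-2"}, 8.0),
    ({"service": "search", "pod": "search-1"}, 5.0),
]

aggregated = aggregate_series(series, drop_labels={"pod"})
for labels, value in aggregated.items():
    print(dict(labels), value)
```

Dropping the `pod` label collapses three series into two, one per service. The per-service totals are preserved; what is lost is the ability to break the metric down by pod, which is why this only makes sense for labels that dashboards, alerts, and queries do not use.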
Adaptive Metrics has delivered a 35% reduction in metrics costs on average for more than 1,200 organizations. With 1,400 stacks performing aggregations and 350 million total aggregated series, the results are substantial and can directly benefit your bottom line.
But for the most successful observability teams, these savings aren’t the endgame; they’re just the beginning. In this blog post, we’ll show you how four Grafana Cloud customers are taking advantage of Adaptive Metrics and the savings it affords them to:
- Improve and further invest in their observability stack
- Focus more on areas that help drive business value
- Help developers act faster and more effectively
- Take a more proactive approach to incident management
Invest savings into improving observability insights
With an astounding 75 million active series, SailPoint needed to find a way to reduce their metrics volume without disrupting their developers’ experience. Thankfully, the identity security company was able to do just that with Adaptive Metrics, reducing their metrics volume by 33%.
“The great thing with Adaptive Metrics is … it just tells me exactly what I’m looking for. If I want the metrics that we’re not seeing usage on, the tool gives me that information a lot quicker than having to go dig for it,” says Lydia Clarke, a DevOps engineer at SailPoint.
But SailPoint didn’t just see the savings as a way to cut costs. Instead, they reinvested that money in observability, adopting more Grafana Cloud offerings and enhancing their overall monitoring capabilities.
In addition to using Grafana for visualizations and Grafana Cloud Metrics for storage and management, SailPoint is looking to extend their stack to include Grafana Cloud Frontend Observability for real user monitoring.
“We’re in the middle of our observability journey, but we’re just at the beginning of the relationship with Grafana Labs,” says Omar Lopez, head of the observability team. “We’re exploring more and more of the features and services in Grafana Cloud. It’s like being a kid in a candy store — we’re looking at everything.”
Read more about how SailPoint’s prioritization of high-value metrics has translated to investments in a broader and more comprehensive observability stack.
Spend time and effort on areas that drive value
When you understand which metrics are essential to your operations, you’re better equipped to allocate your resources. Grafana Cloud provides clear insights into metric usage, helping you distinguish between high-value metrics that drive business decisions and low-value metrics that inflate costs without adding significant value.
That makes a lot of sense in theory, but it can lead to plenty of trepidation in reality. For example, engineers at TeleTracking, an integrated healthcare operations platform provider, were skeptical when the observability team broached the idea of eliminating unused metrics. The fear was that removing or aggregating metrics might disrupt critical services or lead to blind spots in their observability. Those concerns were quickly dispelled when they saw the results of the Adaptive Metrics recommendations.
“Some engineers balked at the idea at first, but once they started using the key labels and standing up dashboards and easily writing alerts, they were quick to see the value,” says Oren Lion, Director of Software Engineering, Productivity Engineering at TeleTracking. The team used Adaptive Metrics to revert to less verbose metrics by using aggregations, reducing their spend on Grafana Cloud Metrics by 50%.
These savings have been instrumental in helping TeleTracking grow efficiently without blowing up their metrics and costs.
“We’re no longer just keeping the lights on and putting out fires,” Lion says. “Today, we’re being proactive, finding new ways to support the business, and helping our developers do their jobs easier, faster, and better.”
Read more about how TeleTracking built a better observability platform with Grafana Cloud.
Empower engineers to make smarter, quicker decisions
Dell Technologies faced significant challenges with its legacy observability tool, which led to alert fatigue and placed a burden on its infrastructure. After migrating to Grafana Cloud, the team used Adaptive Metrics to tackle its SNMP data.
“Adaptive Metrics takes those metrics that you never look at, that you have no alerts on, and no dashboards related to, and says, ‘Why are you sending these to us? You don’t use them. Save yourself some money.’ So that’s what we did,” says Brian Murphy, Technical Staff SRE at Dell Technologies.
Dell’s SREs wrote rules to stop shipping unused metrics, which resulted in substantial savings and also made engineers more efficient. Instead of wading through metrics that weren’t working for them, engineers could focus on the ones that were, which improved their decision-making and overall productivity.
“For the engineering side, it’s great, too, because you don’t have to look at junk that’s not useful. If we can get rid of it, hide it away, and not even have to look at it, then it’s so much better and easier to work through,” Murphy says.
Watch Murphy’s presentation at ObservabilityCon on the Road to learn more about how Dell focused on higher-priority areas with Grafana Cloud.
Cut out noise automatically to focus on incident response
For Mux, which operates an API-first video platform, the burden of managing their own observability stack meant that engineers were mostly spending their time reacting to alerts and keeping everything up and running.
“Grafana Cloud probably saves us hundreds of engineering hours a year. Our platform engineers don’t have to manage the stack any more, and our product engineers don’t have to work through multiple observability tools, which used to really slow down our response times,” says Ryan Grothouse, VP, Engineering at Mux.
Since moving to Grafana Cloud, they’ve reduced their metrics volume by 60% and they have been able to expand their metrics retention time from 14 days to 13 months. This has helped Mux reduce noise, improve long-term analysis, and take a more proactive approach to incident management.
Moreover, they’re relying on the automated functionality of Adaptive Metrics to help identify new areas for improvement over time.
“Adaptive Metrics is an amazing feature. It not only saves us hundreds of thousands of dollars a year, but it’s also a forcing function for us to look closely at our metrics to find additional opportunities for time series reduction and cardinality improvements.”
— Kyle Weaver, Mux Staff Software Engineer
Learn more about how Mux is using Grafana Cloud and Adaptive Metrics to improve incident response and productivity.
Grafana Cloud is the easiest way to get started with metrics, logs, traces, dashboards, and more. We have a generous forever-free tier and plans for every use case. Sign up for free now!