KeyCDN: Grafana Is 'Making Monitoring Amazing Again'
With Growth, A Need for Visualization
"There are many challenges when you are a CDN, as you can imagine," says Baumgartner. "The more you grow your network, the more you need to monitor and observe. You need a tool that allows you to dive into certain things, like forecasting how much capacity you need in specific regions. The visualization is very important."
From the beginning, KeyCDN used an SNMP-based monitoring system, Cacti. But it wasn't ideal. "It always involved a lot of effort and even some development to make customizations in the visualization and also in the collection of the data," says Baumgartner. "Then we found that the project wasn't really updated anymore, and it became harder and harder to maintain things."
In search of a new monitoring solution, the team explored several offerings before deciding to go with Prometheus. "We found that it's much better to have the monitoring done through HTTP," says Baumgartner. "It's fantastic to collect data, and it makes our lives much, much easier."
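To illustrate what "monitoring done through HTTP" means in practice: Prometheus periodically scrapes a plain-text metrics page that each monitored target serves over HTTP. The following is only a minimal sketch using Python's standard library, not KeyCDN's actual setup; the metric names (edge_requests_total, edge_up), the region label, the sample values, and the port are all assumptions made up for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sample values; a real exporter would read live counters.
SAMPLES = {
    ("edge_requests_total", (("region", "eu"),)): 1200,
    ("edge_requests_total", (("region", "us"),)): 3400,
    ("edge_up", ()): 1,
}


def render_metrics(samples):
    """Render {(name, labels): value} as Prometheus text exposition lines."""
    lines = []
    for (name, labels), value in sorted(samples.items()):
        if labels:
            label_str = ",".join(f'{key}="{val}"' for key, val in labels)
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the rendered samples at /metrics for a scraper to collect."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(SAMPLES).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8000):
    """Start the endpoint; the host and port are arbitrary example values."""
    HTTPServer((host, port), MetricsHandler).serve_forever()
```

Calling serve() and pointing a Prometheus scrape job at that address would collect these values. In a real deployment, an established exporter or client library would usually replace this hand-rolled endpoint.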
"It was really amazing to have more flexibility and to have all the same customization options that we had before, but not in the really static and cumbersome way that was hard to maintain."
The Grafana-Prometheus Solution
The visualization provided by Prometheus wasn't robust enough for KeyCDN, so the team adopted Prometheus's recommended solution, Grafana. "Of course the connection between Grafana and Prometheus is excellent," says Baumgartner. "And it's fantastic that Grafana is device agnostic; it's not a client-based solution. You can use it on your mobile devices. You don't need an app. It's responsive and very easy for us to use."
The impact was immediate: Suddenly, "we had a much better tool than those old-school MRTG graphs that we had before," says Baumgartner. "It was really amazing to have more flexibility and to have all the same customization options that we had before, but not in the really static and cumbersome way that was hard to maintain."
Adoption wasn't a hard sell within KeyCDN. "It looks so beautiful, of course they love it," says Baumgartner with a laugh. "Everyone also understood the need for a change because we required something better. SNMP monitoring was the traditional choice for network devices, so this was quite a change. People were very used to MRTG graphs, but that change went very smoothly. They loved how they can interact with Grafana, how they can zoom in and out, having different things displayed and customized for themselves."
Multiple Data Sources, One Visualization
Since the migration, which took a couple of months in 2016, KeyCDN has only seen the scope of its Grafana usage grow. "We now have the ability to organize the dashboards for specific teams, for example, so they have specific metrics which are really relevant for them," says Baumgartner. "What's also really nice is you have the possibility of having different sources of the data. Bringing them together in one visualization had big benefits. We can link things together, which makes troubleshooting and analyzing things so much easier."
KeyCDN now has monitoring built into the network layer, down to the ASICs (application-specific integrated circuits), to observe their thresholds. There is also monitoring of the server health, from the hardware up to the application levels. All told, it's a massive amount of data. Every day, KeyCDN generates about 2 terabytes of log data that's processed, aggregated, visualized, and made available to teams through Grafana. "This makes us really comfortable," Baumgartner says, "because before, we had some custom-written things, service visualizations here and there. Now we have something very nice and consolidated. Grafana allows us to have a full picture in one solution."
Baumgartner is particularly gratified that KeyCDN has been able to grow with the solution, which isn't always the case with technology. "For example, support for SQL series came later on, and as soon as we saw it, we started to play with it and integrate it into our solution," says Baumgartner. "There's a lively and active community helping to make the product better. We have seen other solutions that really lost traction in the market, and nobody wanted to maintain them. This is clearly not what we see here."
How Monitoring Is Enabling Expansion
The trust that KeyCDN has in its Prometheus-Grafana monitoring system is enabling the company to keep growing, both in its number of customers and the geographic reach of its network. One top priority this year is boosting performance in Australia and New Zealand. "We improve our latency every day, and we have monitoring for this," says Baumgartner. "Because we are so distributed around the globe, it's essential for performance that we properly keep track of metrics from the network, different ISPs, and our service providers. We need monitoring to make sure we realize when we have a connectivity issue in a specific area in a specific ISP, for example."
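As a rough illustration of spotting "a connectivity issue in a specific area in a specific ISP," the sketch below groups latency samples by region and ISP and flags the pairs whose median latency exceeds a threshold. It is not KeyCDN's method: the function name, the sample tuples, the ISP names, and the 100 ms threshold are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def flag_slow_paths(samples, threshold_ms):
    """Return sorted (region, isp) pairs whose median latency exceeds threshold_ms.

    samples is an iterable of (region, isp, latency_ms) tuples.
    """
    grouped = defaultdict(list)
    for region, isp, latency_ms in samples:
        grouped[(region, isp)].append(latency_ms)
    return sorted(
        key for key, values in grouped.items() if median(values) > threshold_ms
    )


# Hypothetical measurements: (region, ISP, round-trip latency in milliseconds).
example_samples = [
    ("au", "isp-a", 40),
    ("au", "isp-a", 45),
    ("au", "isp-b", 180),
    ("au", "isp-b", 220),
    ("nz", "isp-c", 60),
]

slow_paths = flag_slow_paths(example_samples, threshold_ms=100)
```

Using the median rather than the mean keeps a single outlier sample from flagging an otherwise healthy path. In a Prometheus and Grafana setup, this kind of grouping would more typically be expressed as a query or alert rule over labeled metrics.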
Another big project is a buildout in the U.S., which involves replacing the hardware in every location with next-generation switches and servers. "With critical aspects we're trying to improve or change, we can zoom in and put full focus on it," says Baumgartner. "Grafana is so fantastic because we can fully customize it to specific scenarios, which allows us to make sure that even if we are changing one thing, we are not breaking two other things. Availability is one of the most important things for us and for our customers. A CDN has to be up all the time, so we cannot allow there to be a hiccup somewhere."
For the team at KeyCDN, Grafana has been a true game changer. "There was a time when people didn't like to monitor things or take care of it," says Baumgartner. "I think Grafana almost makes monitoring amazing again, because of the beauty of the solution. It looks fresh, it looks colorful, and it's so customizable. Everyone loves it. I would say if we tell the team that we're going to change this, that would be almost impossible. You can see we're very happy!"