After Trying to DIY, Wix Embraces Grafana Cloud
Seeking a Fast, Stable Monitoring Solution
Metrics have always been an important part of the culture at Wix. The company requires a fast, dependable solution to generate good alerts. As Wix's infrastructure has grown, so has the scope of their monitoring and alerting needs. False alerts are painful for any team, and if the underlying database is not stable and available at all times, the alerting data is unreliable. At Wix, “We don't have the bandwidth to maintain it 24/7,” says Alex Ulstein, Head of Monitoring, “so availability was going to be a big factor in choosing our new monitoring stack.”
After years of investigating commercial and open-source solutions, including two different stints of running and trying to scale Graphite on their own, Wix discovered Grafana Cloud, a high-performance, fully managed, production-ready Grafana stack that is 100% Graphite compatible. Alex and his team were immediately intrigued. The product checked a lot of boxes for them: It offered the speed, stability and scale they needed to monitor their critical infrastructure. They were also already using Grafana, Alex says, because of the “open source, pluggability, very rich functionality and of course awesome UI.”
Grafana Cloud would allow them to offload the hard parts—managing and scaling the stack. “When we started the search in 2016, there wasn't a TSDB that worked for us,” says Alex. “There was a lot of work required, and it wasn't possible for us to do what the Grafana team has done in terms of scaling, performance and reliability. It doesn't make sense, price-to-performance, to do it ourselves, so we were looking for a fully managed solution from a team that had experience running monitoring at scale.”
People Talking to People
Throughout the implementation of Grafana Cloud, Wix worked closely with many members of the core Grafana Labs team, primarily through its chat-based support channel. Because key members of both teams could talk to each other directly, transparently and asynchronously, the deployment went smoothly.
For Alex, this interaction revealed a lot of structural similarities between the two companies.
“At Wix, it's totally the same; we have a flat structure. People talking to people; there are no open tickets,” says Alex. “I appreciate that level of deep knowledge and communication.”
Understanding what Wix valued most helped the Grafana Labs team tailor their support in order to speed up implementation.
A Tale of Two Users
Wix has two main types of users on their teams. The first group consists of developers who view data from proprietary monitoring systems such as New Relic; the second group is made up of power users with varying requirements.
These 250+ power users have complex needs. They are spread across multiple teams, and each team has its own specific subset of metrics that it finds valuable. The ability to create custom queries and dashboards is critical for these users. Grafana provides the power and flexibility to create their metrics the way they want to see them, no matter where the data lives. Much of the value Wix gets from Grafana comes from the ability to show different data types together in one dashboard. For example, Wix combines Grafana Cloud, Elasticsearch, CloudWatch, and MySQL data on the same dashboard, seamlessly blending business intelligence client data with server metrics.
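As a rough illustration of what a single panel drawing on several back ends can look like, the Python sketch below builds a panel definition that uses Grafana's built-in "-- Mixed --" data source, where each query names its own data source. This is not Wix's actual configuration: the data source names, metric paths, and SQL are hypothetical, and the field names follow an older dashboard JSON layout that varies between Grafana versions.

```python
import json


def mixed_panel(title, targets):
    """Build a panel dict whose queries each name their own data source.

    Assumption: the "-- Mixed --" panel-level data source with a
    per-target "datasource" field, as in older Grafana dashboard JSON.
    """
    return {
        "title": title,
        "type": "graph",
        "datasource": "-- Mixed --",
        # Assign refIds A, B, C, ... in order so queries can be told apart.
        "targets": [
            {"refId": chr(ord("A") + i), **target}
            for i, target in enumerate(targets)
        ],
    }


# All names below are hypothetical placeholders, not real Wix metrics.
panel = mixed_panel(
    "Business activity vs. server load",
    [
        {"datasource": "grafanacloud-graphite",
         "target": "servers.web.*.cpu.user"},
        {"datasource": "elasticsearch-logs",
         "query": "level:error"},
        {"datasource": "cloudwatch",
         "namespace": "AWS/ELB", "metricName": "RequestCount",
         "region": "us-east-1"},
        {"datasource": "mysql-bi",
         "rawSql": "SELECT created_at AS time, COUNT(*) AS signups "
                   "FROM signups GROUP BY 1",
         "format": "time_series"},
    ],
)

print(json.dumps(panel, indent=2))
```

In practice, power users would typically build such a panel in the Grafana UI; generating the JSON in code is mainly useful for templated dashboards that many teams share.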
Even the less frequent users find Grafana extremely valuable because so much attention has been focused on the user experience. For these users, Wix has created a set of templates as a starting point, where queries can be adjusted and display settings and alerts can be tailored to visualize the data they find important to complete their tasks. Alex describes this shortcut as “an extremely fast way to give less-experienced users enough speed to do their jobs effectively.”
An Eye Toward the Future
Now that Grafana Cloud is up and running at Wix, “stability of the platform is our primary focus,” says Alex. “But I am always thinking about how we can improve our monitoring. Prometheus and Kubernetes are interesting now and will continue to be in the future. Another key consideration is finding a way to increase the frequency of my collection so I can get alerts and address any performance issues faster.”
With Grafana as the front end, the Wix team knows it has the ability to experiment with new tools without sacrificing the stability and speed of its primary TSDB.
During implementation, active conversations with multiple members of the Grafana Labs team helped Wix tune their relays and other infrastructure to get the most out of Grafana Cloud. And as Wix continues to evolve their monitoring stack, Grafana Labs will keep those conversations going. It's the Grafana Labs way: people working together to build the best data back end while leaving customers free to experiment with new open-source tools and technologies.