Grafana and NGINX are partnering to give the open source community a turnkey experience for visibility

Published: 8 Jul 2020

Over the past few years, NGINX users have naturally gravitated toward Grafana, and vice versa. These days, it’s not uncommon to see these two open source tools used together in the wild.

And for good reason. F5, which acquired NGINX last year, is prioritizing building visibility across the entire product set, to make it easy for customers to quickly gain the insights that they need. Meanwhile, Grafana has evolved into the primary visualization and analysis tool in the open source market.

“People in the community are oftentimes piecing together their observability platform, recreating dashboards, and setting up Prometheus exporters from scratch,” says Grafana Labs CEO Raj Dutt.

“We’re recognizing that the community has done this in an organic way,” adds Michael Wiley, VP and CTO, Applications at F5. “What we’re looking to do is enable the community to do even more.”

To that end, the two companies have launched a partnership to integrate the tools more tightly. “We’re going to give the community a very good starting point based upon the things that have already been done, and what we can do as two companies together,” says Wiley.

With Grafana Labs investing heavily in Prometheus and Loki, and NGINX’s development of a Prometheus exporter that integrates easily into Grafana dashboards, “Now is the time where we can create a really powerful kind of turnkey experience for our mutual users and customers,” says Dutt. The goal? “I think we can reduce times to value, reduce friction, and provide building blocks that make it much easier for people to more quickly get a complete picture from NGINX using Grafana, using Prometheus, using Loki,” he says.

The NGINX-Grafana integration

Here’s what the NGINX-Grafana partnership enables, according to Wiley:

Since NGINX is usually front-ending applications and infrastructure, it is key to have metrics from this part of your application stack. Being able to combine the information from multiple NGINX nodes, as well as from the rest of your application stack, in a single dashboard with Grafana allows you to easily monitor the health of your overall environment.

NGINX OSS exposes a few metrics via the stub status module, but with the flexibility of the NGINX logging ability combined with Grafana’s Loki project, you can get much more insightful information about what is happening in the environment. NGINX can simply send its logs via syslog to the Loki endpoint, saving the overhead of writing the logs to disk.

By adding the extended metrics provided by NGINX Plus, you can see even more granular information and identify possible issues before they become outages. On top of the 6 metrics exposed in OSS, Plus includes an additional 90+ metrics that can be associated with different parts of the NGINX configuration.

NGINX Plus also has official support for OpenTracing for even more in-depth metrics. While the OpenTracing module is a public OSS project, NGINX provides the module pre-built by the NGINX experts in their package repositories to make the process as smooth as possible.
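To make the stub status metrics mentioned above a little more concrete, here is a minimal Python sketch that turns the module's plain-text output into named values. It is only an illustration, not how the NGINX Prometheus exporter is implemented: the sample text is hard-coded rather than fetched from a server, and the function name `parse_stub_status` and the dictionary keys are assumptions chosen for this example.

```python
import re

# Sample text in the layout served by NGINX's stub_status module.
# In practice this would be fetched over HTTP from whatever location
# the module is enabled on; here it is hard-coded so the sketch is
# self-contained.
SAMPLE_STATUS = (
    "Active connections: 291 \n"
    "server accepts handled requests\n"
    " 16630948 16630948 31070465 \n"
    "Reading: 6 Writing: 179 Waiting: 106 \n"
)


def parse_stub_status(text: str) -> dict:
    """Parse stub_status text into a dict of integer values.

    The function name and key names are hypothetical, chosen for
    this illustration only.
    """
    active = re.search(r"Active connections:[ \t]+(\d+)", text)
    # The three cumulative counters sit alone on their own line.
    counters = re.search(
        r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE
    )
    states = re.search(
        r"Reading:[ \t]+(\d+)[ \t]+Writing:[ \t]+(\d+)[ \t]+Waiting:[ \t]+(\d+)",
        text,
    )
    if not (active and counters and states):
        raise ValueError("text does not look like stub_status output")

    accepts, handled, requests = (int(v) for v in counters.groups())
    reading, writing, waiting = (int(v) for v in states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


if __name__ == "__main__":
    for name, value in parse_stub_status(SAMPLE_STATUS).items():
        print(f"{name}: {value}")
```

Values parsed this way from several NGINX nodes could then be pushed to, or scraped by, whatever metrics store feeds the Grafana dashboard.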

One exciting development to look out for: Loki will soon give users the ability to generate ad hoc metrics directly from NGINX logs in real time and visualize them in Grafana. Here’s a sneak preview created by Grafana Labs Solutions Engineer Ward Bekker:

A conversation about observability

Michael and Raj recently got on Zoom to discuss the future of the NGINX-Grafana partnership. Here’s an excerpt of their conversation.

What’s behind the affinity between Grafana and NGINX?

Michael: One of the challenges that NGINX has had in the open source community is providing robust and verbose instrumentation, and an environment for measuring, monitoring, and paying attention to all of the activity going into the web server and into the application itself. We’ve left it open to the open source community to decide how they’re going to go achieve that. With the Prometheus exporter, we think Grafana is well-positioned to make that whole bundle look very complete, through either the open source Grafana service or the managed services stack.

Raj: There are two angles that I see. One is around the Prometheus story with some of the work that NGINX has done, exposing Prometheus metrics directly from their software. And obviously, Grafana is really involved in the Prometheus project, and we’re the preferred visualization layer on top of Prometheus. The other story is that NGINX, since day one, has been all about web server logs, and analyzing those logs is really important as part of your observability story. There’s still a really deep level of insight that you can get from the logs that you’ll never be able to get from metrics. And with what Grafana is doing with Loki, I think there’s a really good fit there.
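As a rough illustration of the kind of signal that web server logs carry, the Python sketch below derives a simple ad hoc metric, requests per status class, from a handful of access log lines. It is not Loki or LogQL. It assumes the logs use NGINX's default "combined" format, and the sample lines and the names `count_by_status_class` and `server_error_ratio` are hypothetical.

```python
import re
from collections import Counter

# Assumed layout: NGINX's default "combined" access log format.
LOG_PATTERN = re.compile(
    r'^(?P<remote_addr>\S+) \S+ \S+ \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+)'
)

# Made-up sample lines; the last one is deliberately malformed.
SAMPLE_LINES = [
    '203.0.113.7 - - [08/Jul/2020:10:00:01 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.68.0"',
    '203.0.113.7 - - [08/Jul/2020:10:00:02 +0000] "GET /api/items HTTP/1.1" 200 1532 "-" "curl/7.68.0"',
    '203.0.113.9 - - [08/Jul/2020:10:00:03 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/7.68.0"',
    '203.0.113.9 - - [08/Jul/2020:10:00:04 +0000] "POST /api/items HTTP/1.1" 502 157 "-" "curl/7.68.0"',
    "this line is not an access log entry",
]


def count_by_status_class(lines):
    """Count requests per status class ("2xx", "4xx", ...).

    Lines that do not match the assumed format are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue
        counts[match.group("status")[0] + "xx"] += 1
    return counts


def server_error_ratio(counts):
    """Return the share of counted requests that ended in a 5xx status."""
    total = sum(counts.values())
    return counts["5xx"] / total if total else 0.0


if __name__ == "__main__":
    counts = count_by_status_class(SAMPLE_LINES)
    print(dict(counts))
    print(f"5xx ratio: {server_error_ratio(counts):.2f}")
```

The same parsed fields (client address, request line, bytes sent) could be grouped in other ways, which is the sort of detail a fixed set of counters does not retain.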

So why are you partnering now?

Michael: We believe that the evolution of Grafana over the past 6-12 months – not only on the core Grafana stack, but also the addition of Loki – ties all of the observability together quite well. It’s the deep integrations with Prometheus on the metric side, the capability to instantaneously visualize or templatize visualizations, and the ability to integrate logging signals out of the NGINX platform into Loki. Having all of that as a single pane of glass, to visualize not only the performance metrics from your application server but also those from the database, compute platforms, and system services, creates a powerful solution for operators.

Raj: All the building blocks have been there, but now, they’re really aligned. And I think we can reduce times to value and reduce friction and provide building blocks that make it much easier for people to more quickly get a complete picture from NGINX using Grafana, using Prometheus, using Loki. I think a lot of this has already been proved out by how people have been using it organically, so it’s both timely and relevant now to package it all up and take a lot of the work that the community’s done and just make it more consumable.

What are the main benefits of observability that you feel are important to your users?

Michael: I believe observability is critical to everything that you do within a platform, and providing an extensible or exhaustive level of metrics, events, and logging is paramount. From an F5/NGINX perspective, we’re continuing to identify areas of opportunity to address richer signals that you would want to see out of the performance of applications as they reside on the platform. We recognize that for the 450+ million deployments of NGINX, we want to be able to continue to provide a significant amount of visibility and observability to the open source community. As we continue to make NGINX robust from a functional perspective, we will press more on the observability focus. Through the Grafana partnership, a lot of capabilities in the open source stack, re: visibility, are realized in an almost “out-of-the-box” native way.

Raj: Every user of NGINX – and Grafana for that matter – really cares about three or four main things. They care about performance, they care about availability, they care about security, and everything that they observe or want to monitor, at some level, boils down to that. There are so many different aspects of what NGINX does, so there’s this incredible wealth of data and insight that’s available at the NGINX layer. The more that we can expose all of that, the better. Without exposing that, it’s very hard to deploy any of this stuff and maintain it in production – it’s impossible. So we really just want to lower the bar. And so, whether it’s 99P or distribution of load across load balancers, or security incidents, all this functionality is being provided, and without observing it, it’s unmanageable.

Where do you see the industry going?

Michael: If you look at the APM industry, there’s a lot of consolidation, a lot of acquisitions, a lot of movement around what generally is termed “AI ops,” which really means automating operational tasks through intelligent signaling. I think, in general, the industry is always trying to drive more efficiencies through better signal. I do believe Grafana, along with Prometheus and Loki (the best-in-class stack), will provide extensive visualization and actual insights to the end user, the operator, and other teams to give that kind of tightly integrated, cohesive visibility needed to take either automated or manual actions.

Raj: I think open source is winning and is going to continue to win. That’s why I think tools like NGINX and Grafana are continuing to see adoption accelerate, particularly within enterprise. Open source tooling such as Grafana, Prometheus, Loki, and NGINX has evolved over the last 10 years from being the cheap and cheerful alternative to now being where all the cutting edge action is happening. And that’s also a function of developer mindshare being so important and developers being empowered to make decisions about how they run their stacks.

I think that as people move to things like Kubernetes and build increasingly distributed systems, those signals are coming from all over the place, and from so many disparate sources. And all this data lives in different places, and it comes from different places. So in order to make sense of it and in order to actually create an observability strategy that works, you have to acknowledge that. And it’s not just about a few checks here and there. The volume of signals is overwhelming. The number of places it’s coming from is overwhelming. So you really need to normalize everything onto a metrics, logs, tracing kind of platform. And you need to correlate what’s going on between all your different systems that are very distributed, oftentimes ephemeral, and that’s really challenging. That’s why it makes sense to think about these things as a metrics backbone or logging backbone that you can pipe everything into.

Michael: I think the enterprise feels challenged with the massive amounts of data, signaling, and observability efforts. The data that they have to consume is large and complicated. Effectively they have 13 screens that they have to pay attention to today from a vast number of vendors. Why build a 14th? You solve this by taking accurate, actionable signals and bringing them closer together, with a common consumption model that makes it easy for all teams (AppDev, SRE, Network, Database, etc.) to build the proper visualizations and advisories. We recognize that NGINX is just one part of the overall ecosystem of application delivery. Grafana brings together all of the vendor/data signals into one consumption visibility model. We’re in a good spot to bring this key, best-in-class tech stack together and make it useful for our customers.

Michael, can you share your perspective on observability as an end user prior to coming to F5?

Michael: Prior to F5, at JPMorgan, we were building the fabric to ingest all of that data that exists in order to make critical business decisions and application decisions over time, as well as computing decisions, capacity decisions, security decisions, etc. The view of a customer is: “How do I bring all of this data together in a more aggregated view, and build capability within my infrastructure to give the right level of information to the right teams in a timely manner, that’s of high quality and accuracy?”

So everybody is trying to solve these problems. It feels as though the market continues to bring point products to deal with each particular vendor. The real question is: who’s bringing all those things together? There needs to be what I call a fabric, which you can kind of think of as a platform or a way to ingest all of that data, consume it at scale, derive the right insights out of it, build the right visualizations, and notify the right people to take the right actions. NGINX and Grafana can bring a lot of that together. NGINX is in the data plane. Grafana sits as an ingestion and consumption fabric and tool and service.

All of this together, it looks really, really good not only as an open source play, but also as a commercial play. You know, collectively, we’re in the middle of a lot of the Kubernetes stacks; we’re in the middle of the application delivery platform. We’re a transparent proxy service. We’re an API gateway service. We’re a security service. So, as an end user, I know where to put NGINX. The challenge I have is I have to go get stuff from Cloud Foundry, Cassandra, Kubernetes, Hadoop, AWS, Google, Azure, and I have to somehow merge them and bring them together for a single view of how my application is working today. This is what we attempt to accomplish within NGINX and Grafana: to actually help the end user and the application teams in the visualizations that they need. Otherwise, they’re going to try to build their own. They’re going to try to stitch it together, and it’s not going to be part of a cohesive ecosystem.

Each organization’s needs are going to be different. What’s your philosophy about who owns the observability strategy?

Raj: It kind of ties in with what Michael just said about these individual point solutions and the elusive single pane of glass, which is kind of the biggest, longest-running, ongoing scam in IT. You can’t really achieve that by introducing another point solution; it’s never going to happen. So you have to create this observability fabric, as Michael put it, or platform. And if you do that in a way that is very interoperable, very composable, you break down silos within an organization, you bring together all the other tools, and you connect to the data wherever it lives, even if it doesn’t live on this platform.

And that’s what Grafana is really good at, because even if the data doesn’t live in Prometheus – let’s say it lives in CloudWatch or Azure Monitor or Google Stackdriver – we can just go get it. And whatever choices that customer has made, we respect those choices. Every company’s observability strategy is different, and they need to own it.

Any last words about the partnership?

Michael: If people are not running Grafana in their business, they should consider it. Combining NGINX, BIG-IP, and Grafana gives you a level of visibility and insights into your application that might otherwise take many hours of effort to build on your own.

Raj: I just want to reiterate the point that we started with: The community is really excited about the combination of these things and has already laid a lot of the foundation that’s necessary. So for us, it is just about amplifying what they’ve done, packaging it up, and delivering an experience that allows them to continue to innovate on top of it.
