Ask Us Anything: Should I Run Prometheus in a Container?

Published: 7 May 2019

At Grafana Labs, we field questions about best practices from customers all the time. One company recently asked whether it should run a containerized Prometheus environment rather than a VM-based one. We thought we’d share our answer here too.

So: Should you run Prometheus in a container?

If you’re monitoring services in Kubernetes, you probably want to run Prometheus in Kubernetes, and therefore as a container. This is because Prometheus needs to be able to directly connect to every target to scrape metrics. Kubernetes gives every Pod a unique IP address, but typically these are only accessible within the Kubernetes cluster.

In general, you want your Prometheus servers to run as close to your services as possible. If you have multiple private networks that can’t talk to each other, then you will most likely need a Prometheus server in each private network, or two in each if you want high availability (HA).

Beyond that, Prometheus is a single binary with no dependencies. It runs equally well inside or outside a container. Prometheus scales “up” very well, and for large Kubernetes clusters, it’s common to dedicate an entire node to Prometheus – even if it’s still a container.

Some of the other components in the system, such as node_exporter (for node-level metrics), are a little trickier to run in containers. They need direct access to many kernel interfaces, so running them directly on the host may be preferable. We run them as a DaemonSet on our Kubernetes cluster, with some special config to expose the right host interfaces to the container.

Now comes the question: Who monitors the monitor? What if the Kubernetes cluster itself is having problems, and Prometheus can’t notify you about them? In that case, you should make Prometheus emit an alert that always fires (a.k.a. a Dead Man’s Switch), and have an external service notify you if that alert stops arriving for longer than expected. This verifies that both Prometheus and the entire alerting pipeline are working.
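To make the external half of a Dead Man’s Switch concrete, here is a minimal Python sketch of the check such a service might perform. It is an illustration under assumptions, not how any particular service implements it: the `DeadMansSwitch` class, its method names, and the 300-second threshold are all hypothetical, and a real service would also need to receive the alert over the network and send the notification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeadMansSwitch:
    """Hypothetical external watchdog for an always-firing alert.

    It remembers when the alert was last received and reports the
    pipeline as overdue once the silence exceeds a threshold.
    """

    max_silence_seconds: float  # assumed threshold; tune to your alert interval
    last_heartbeat: Optional[float] = None

    def record_heartbeat(self, now: float) -> None:
        # Call this each time the always-firing alert arrives.
        self.last_heartbeat = now

    def is_overdue(self, now: float) -> bool:
        # Never having heard from Prometheus is treated as a failure too.
        if self.last_heartbeat is None:
            return True
        return now - self.last_heartbeat > self.max_silence_seconds


switch = DeadMansSwitch(max_silence_seconds=300)
switch.record_heartbeat(now=1000.0)
print(switch.is_overdue(now=1200.0))  # 200 seconds of silence, within the threshold
print(switch.is_overdue(now=1400.0))  # 400 seconds of silence, past the threshold
```

The logic is inverted compared with a normal alert: the external service raises the alarm when it stops hearing from Prometheus, so a failure anywhere between Prometheus and the notification channel shows up as silence.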

Got a question for us about monitoring best practices? Email us at help@grafana.com.

