A few weeks ago, I teamed up with Bartek Plotka, a principal software engineer at Red Hat, for a deep-dive session on Prometheus at KubeCon + CloudNativeCon EU. We covered a lot of topics, with highlights that included scaling Prometheus, remote-write and metadata. We ended the talk with a quick demo on how to import data from CSV files into Prometheus.
I want to use this blog post to provide more insight into the state of backfill in Prometheus. But first, I want to clarify what backfill is: It’s not a way to push real-time data into Prometheus, but rather a way to bulkload a lot of older data.
There are two kinds of work-in-progress backfill implementations, and I hope both of them will be merged quite soon!
Retroactive rule persistence
Currently, the oldest open issue in Prometheus is titled “Persist Retroactive Rule Reevaluations,” and I want to explore that a bit here.
In Prometheus, recording rules are just normal queries that are run periodically (default 1m) and whose result is stored in another metric. This allows you to load dashboards quickly and put a lower load on Prometheus.
For example, if you have a dashboard that covers seven days and runs queries such as sum(high_cardinality_metric), and high_cardinality_metric matches more than 100K series, every load pulls in a lot of samples for the past seven days just to compute the sum. It gets even worse if the dashboard has a refresh period, because every refresh repeats that work and puts continuous load on Prometheus.
To offset this, we can add a recording rule in the form of job:high_cardinality_metric:sum = sum(high_cardinality_metric), which would run every minute. Each evaluation only considers the most recent points, not all of the points from the last seven days.
In the dashboard, rather than using the query above, you’d use job:high_cardinality_metric:sum, which will return just a single series over seven days. It’s much quicker and puts a far lower load on Prometheus.
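To put rough numbers on the difference, here is a back-of-the-envelope sketch in Python. The 100K series and the seven-day range come from the example above; the one-minute scrape interval for the raw metric is an assumption, and the count ignores how a dashboard's query step actually samples the range, so treat the result as an order-of-magnitude estimate rather than a measurement.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_in_range(series_count, range_seconds, interval_seconds):
    """Count stored samples in the range: one per series per interval."""
    return series_count * (range_seconds // interval_seconds)


range_seconds = 7 * SECONDS_PER_DAY

# Raw query: 100K series, assumed to be scraped once a minute.
raw = samples_in_range(100_000, range_seconds, 60)

# Recorded metric: a single series written by a rule that runs every minute.
recorded = samples_in_range(1, range_seconds, 60)

print(f"raw samples:      {raw:,}")
print(f"recorded samples: {recorded:,}")
print(f"reduction factor: {raw // recorded:,}x")
```

Under these assumptions, the work saved at query time scales directly with the number of series the rule collapses.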
Recording rules are used heavily in Prometheus and are one of the most popular features. But they are not without drawbacks.
In most cases, when you create dashboards or instrument your services, you don’t know the cardinality or query load beforehand, so you don’t start out by writing recording rules. Only after the dashboard exists and is in use do you notice the need for them. However, when you do add a recording rule, the new recorded metric only exists from the moment the rule was added. This means you can’t switch the dashboards over right away, because what you’d see is a mostly empty graph. This is particularly painful on dashboards where a 30-day view is normal, because you’d have to wait a month before switching to recording rules.
To fix this issue, our community bridge mentee Jessica Greben has written up an excellent design doc and even has an open WIP PR. The idea is to add a command to promtool in the form of promtool backfill rules, where you specify the recording rules and the time range you want to backfill them for. Please comment on it if you have thoughts!
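Conceptually, backfilling a rule means evaluating its expression at every step of a past time range against samples that are already stored. The Python sketch below illustrates that idea for a sum() rule only. It is not the code in the PR: the in-memory store, the function names and the sample data are all hypothetical, and the five-minute lookback simply mirrors Prometheus's default lookback window.

```python
from bisect import bisect_right

LOOKBACK_SECONDS = 300  # assumption: mirrors Prometheus's default 5m lookback


def latest_value(samples, ts, lookback=LOOKBACK_SECONDS):
    """Return the newest value at or before ts within the lookback, else None."""
    timestamps = [t for t, _ in samples]
    i = bisect_right(timestamps, ts)
    if i == 0:
        return None
    sample_ts, value = samples[i - 1]
    return value if ts - sample_ts < lookback else None


def backfill_sum_rule(store, start, end, step):
    """Evaluate a sum() rule at every step in [start, end] over stored samples."""
    recorded = []
    for ts in range(start, end + 1, step):
        values = [latest_value(samples, ts) for samples in store.values()]
        values = [v for v in values if v is not None]
        if values:  # emit nothing when no series has a fresh sample
            recorded.append((ts, sum(values)))
    return recorded


# Hypothetical store: series name -> time-ordered (timestamp_seconds, value).
store = {
    'high_cardinality_metric{pod="a"}': [(0, 1.0), (60, 2.0), (120, 3.0)],
    'high_cardinality_metric{pod="b"}': [(0, 10.0), (60, 20.0), (120, 30.0)],
}

backfilled = backfill_sum_rule(store, start=0, end=120, step=60)
print(backfilled)
```

The output of such a pass is a series that looks as if the rule had been running all along, which is what lets a dashboard switch to the recorded metric right away.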
Bulk import data from CSV
The other kind of backfill is a simple bulk import. It’s a frequent request from our users, and there are many use cases for it. For instance, maybe you want to load some data into Prometheus to analyze it using PromQL. Or maybe you’re migrating from a different monitoring system and want to move the older data (sometimes three years of it) into Prometheus.
We discussed simple bulk imports at a recent dev summit and decided to support CSV and OpenMetrics as the import formats. Based on that, Bartek wrote a quick POC for CSV-based import and demoed it in the talk. It builds on a previous PR by Dipack. And it basically works!
Check out the demo here:
There are still a few things to figure out before we can make it an official tool, namely, getting consensus around the CSV format. The current format is:
metric_name,label_name,label_value,timestamp_ms,label_name,label_value,help,value,type,exemplar_value,unit,exemplar_timestamp_ms
metric1,pod,abc-1,1594885435,instance,1,some help,1245214.23423,counter,-0.12,bytes because why not,1
metric1,pod,abc-1,1594885436,instance,1,some help,1.23423,counter,-0.12,bytes because why not,1
metric1,pod,abc-2,1594885432,,,some help2,1245214.23421,gauge,,bytes,
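To make the column layout concrete, here is a minimal Python sketch that reads the format above with the standard csv module. It is only an illustration, not the importer from the PR. It assumes that every label_value column directly follows its label_name column, as in the header shown, and the parse_rows name and the returned dictionary shape are made up for this example.

```python
import csv
import io

CSV_TEXT = """\
metric_name,label_name,label_value,timestamp_ms,label_name,label_value,help,value,type,exemplar_value,unit,exemplar_timestamp_ms
metric1,pod,abc-1,1594885435,instance,1,some help,1245214.23423,counter,-0.12,bytes because why not,1
metric1,pod,abc-1,1594885436,instance,1,some help,1.23423,counter,-0.12,bytes because why not,1
metric1,pod,abc-2,1594885432,,,some help2,1245214.23421,gauge,,bytes,
"""


def parse_rows(text):
    """Parse the CSV into a list of dicts with metric, labels, timestamp, value, type."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # The header repeats label_name/label_value, so locate pairs by position.
    name_cols = [i for i, col in enumerate(header) if col == "label_name"]
    metric_col = header.index("metric_name")
    ts_col = header.index("timestamp_ms")
    value_col = header.index("value")
    type_col = header.index("type")

    samples = []
    for row in reader:
        # Empty label_name cells are padding for metrics with fewer labels.
        labels = {row[i]: row[i + 1] for i in name_cols if row[i]}
        samples.append({
            "metric": row[metric_col],
            "labels": labels,
            "timestamp": int(row[ts_col]),
            "value": float(row[value_col]),
            "type": row[type_col],
        })
    return samples


samples = parse_rows(CSV_TEXT)
for sample in samples:
    print(sample)
```

Note how the third row needs two empty cells to stay aligned with the header, because its metric has only one label.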
We plan to simplify the format a little bit. For example, the header currently needs as many label_name and label_value columns as the metric with the most labels requires, which means a lot of empty columns for the smaller metrics. It might be simpler to just have the entire metric as a single string in there and parse it at load time. And maybe we drop support for exemplars for now, until they’re a first-class citizen in Prometheus. (Have some feedback? Give it here.)
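As a sketch of what the single-string idea could look like, the Python snippet below parses a metric written the way it appears in PromQL, for example metric1{pod="abc-1",instance="1"}. The exact format has not been decided, so the three-column layout, the regular expressions and the parse_series name are assumptions made for illustration. Escapes inside label values are matched but not unescaped.

```python
import csv
import io
import re

# Metric name, optionally followed by {label="value",...}.
SERIES_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?$')
# One label="value" pair; the value may contain backslash escapes.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_series(text):
    """Split a series string into (metric_name, labels_dict)."""
    match = SERIES_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid series string: {text!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels


# Hypothetical row: series,timestamp,value. The series contains commas and
# quotes, so it is quoted in the CSV with its inner quotes doubled.
line = '"metric1{pod=""abc-1"",instance=""1""}",1594885435,1245214.23423\n'
series, timestamp, value = next(csv.reader(io.StringIO(line)))

name, labels = parse_series(series)
print(name, labels, int(timestamp), float(value))
```

With a layout like this, every row has the same number of columns no matter how many labels a metric carries, at the cost of a second parsing step at load time.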
The second thing missing is batching. Today, we open multiple blocks and write everything to memory, then flush it all at once at the end. Instead, we should write two hours of data at a time and flush that to disk periodically. This is hard to do if the data in the CSV is not ordered, so maybe we should only support ordered data?
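Here is a minimal Python sketch of that batching idea, assuming the input is already ordered by timestamp. It groups samples into two-hour windows and hands each window over as soon as the next one starts, so only one window is held in memory at a time. The function name is hypothetical, aligning windows to multiples of two hours is an assumption, and the real tool would write a block to disk where this sketch yields a batch.

```python
TWO_HOURS_MS = 2 * 60 * 60 * 1000


def batches_by_block(samples, block_ms=TWO_HOURS_MS):
    """Yield (block_start_ms, batch) for time-ordered (timestamp_ms, value) pairs."""
    current_start = None
    batch = []
    last_ts = None
    for ts, value in samples:
        if last_ts is not None and ts < last_ts:
            # Unordered input would force us to keep every window open.
            raise ValueError("samples must be ordered by timestamp")
        last_ts = ts
        start = ts - ts % block_ms  # align to the start of the window
        if current_start is not None and start != current_start:
            yield current_start, batch  # the "flush to disk" point
            batch = []
        current_start = start
        batch.append((ts, value))
    if batch:
        yield current_start, batch  # flush the final, partial window


ordered = [
    (0, 1.0),
    (3_600_000, 2.0),
    (7_200_000, 3.0),
    (7_200_001, 4.0),
    (20_000_000, 5.0),
]
flushed = list(batches_by_block(ordered))
for block_start, batch in flushed:
    print(block_start, batch)
```

The ordering check is what keeps memory bounded: once a later window has started, an earlier one can never receive more samples.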
These are very minor improvements, and I think the PR is mostly there. We (Grafana Labs along with Bartek) will be dedicating some time this month to cleaning it up.
Even more backfill
We’re not done yet! Both of the methods described above are offline, tool-based methods where you generate the blocks by running a command and then move the blocks to the Prometheus data directory. We also want to support an online backfill method where you push streams or blocks over the network to Prometheus. This was discussed at the same dev summit, and we have some rough consensus on the idea, but not on the specifics of how it would work.
Having said that, once we launch the offline backfill tools, we’ll be continuously monitoring for feedback. Then, we’ll use the experience and feedback to design an efficient and useful online backfill solution for users.
Finally, I want to conclude by pointing out that backfill is a small subset of what we spoke about, and you should definitely go watch the video!