PromQL vector matching: what it is and how it affects your Prometheus queries
Dawid Dębowski is a software engineer at G2A.COM and a Grafana Champion. Holding an MS in Computer Science, Dawid’s main fields of interest related to observability are PromQL and data visualizations using Grafana.
Have you ever created an awesome query in PromQL, expecting it to return the exact results you’re looking for, only to receive the “No data” response when you run it? If so, you might have fallen into the trap of PromQL vector matching.
In this post, we will provide an overview of what vectors are in PromQL, including the different types and why they’re important. We will also explain the concept of vector matching and walk through an example of it, using Prometheus and Grafana.
Note: If you want to learn more about the overall mechanics of PromQL queries, you can also check out this blog post.
Prometheus vectors: an overview
In Prometheus, almost every query returns a vector, which is a collection of time series data points. A vector can be either an instant vector or a range vector.
You can think of an instant vector as a pair (v1, t1) with a value and a time, while a range vector is an array [(v1, t1), (v2, t2), (v3, t3), …]. Each vector can also have labels, and different labels mean different vectors.
There’s also a scalar type, which is a single floating point value without any time connotations, for example v1. These definitions of vectors and scalars are important because of binary operations. As you can imagine, dividing a scalar by a scalar is easy, and so is dividing a vector by a scalar (where the value of every element of the vector is divided by the scalar). But what about dividing a vector by a vector?
In PromQL, arithmetic binary operations between vectors are defined only for instant vectors (trying to divide a range vector by another range vector will yield an error). According to the Prometheus docs, a binary operation between two instant vectors applies the operator to each entry in the left-hand vector and its matching element in the right-hand vector.
Seems complicated, right? Let’s clear things up with an example.
Setting up the environment
Let’s say we deployed a simple application for collecting Pokémons. The application exposes the pokemon_caught_total metric (a simple counter) with a type label, representing the Pokémon type. For scraping metrics, we’re using Prometheus and, for visualization, we’re using Grafana.
Let’s start with a simple query: pokemon_caught_total{type="water"}, which displays the number of water type Pokémons the application has gathered since the start of its lifecycle.
As you can see, it’s an instant vector: it is placed in time and has a value. Another quick query shows that, so far, we’ve caught 285 Pokémons in total. The water types we’ve caught should be roughly 5.26% of those 285 Pokémons.
Let’s see what Prometheus will say:
Well, that’s… unexpected. But wait, we forgot about the sum around the divisor. That should do it!
Still no results. So, what’s going on exactly?
Vector matching
Prometheus documentation includes information about matching elements. When there’s an arithmetic operation between two instant vectors, Prometheus compares all the labels in the vectors and tries to find matches between them. Unmatched vectors are dropped from the result. That’s why the first example showed 100%: Prometheus found one match between the vectors on both sides, the vector with the type=water label, so it divided that value by itself. The remaining vectors on the right side did not match anything, so they were dropped from the final result. This is the same reason why the second example returned No data, even though both sides had some data: sum results in a vector with no labels, so there was no way Prometheus could match something to nothing.
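Both outcomes can be reproduced with a small Python model of the default matching rule. This is only a sketch, not how Prometheus is implemented: an instant vector is modeled as a dictionary from label set to value, the per-type counts are hypothetical (picked so that water is 15 and the total is 285), and the queries in the comments are assumed forms of the two attempts described above.

```python
# Toy model of default PromQL vector matching (not Prometheus internals).
# An instant vector is a dict {label set: value}; the counts are hypothetical.

def labels(**kv):
    """Build a hashable label set, e.g. labels(type="water")."""
    return frozenset(kv.items())

def divide(lhs, rhs):
    """Divide elements whose label sets are identical; drop everything else."""
    return {ls: value / rhs[ls] for ls, value in lhs.items() if ls in rhs}

caught = {
    labels(type="water"): 15,
    labels(type="fire"): 120,
    labels(type="grass"): 150,
}
water = {labels(type="water"): caught[labels(type="water")]}
total = {labels(): sum(caught.values())}  # like sum(...): one element, no labels

# Assumed first attempt: pokemon_caught_total{type="water"} / pokemon_caught_total
first = divide(water, caught)   # only type="water" matches, so 15 / 15
# Assumed second attempt: pokemon_caught_total{type="water"} / sum(pokemon_caught_total)
second = divide(water, total)   # {type="water"} never equals {}, so nothing is left

print(first)
print(second)
```

In this model the first result holds a single element with value 1.0 (the 100%), and the second result is empty (the No data case).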
This is why it’s important to pay attention when using binary operations between two vectors — sometimes, your result might look right, but it’s not, because Prometheus didn’t match all the vectors it should have.
The solution
The easiest solution here is to have the same labels on the left-hand and right-hand vectors. In the example with the water type Pokémon percentage calculation, all it takes is to strip the labels from the left side of the operation:
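A minimal Python sketch of that idea, under stated assumptions: the vector is modeled as a dictionary from label set to value, the per-type counts are hypothetical (water is 15, the total is 285), and the query in the comment is an assumed form of the fix rather than a quote from the original screenshots.

```python
# Toy model: aggregating with sum() leaves a single element with no labels.
# The per-type counts are hypothetical (water = 15, total = 285).

NO_LABELS = frozenset()

def sum_all(vector):
    """Mimic sum(...) without grouping: one element, empty label set."""
    return {NO_LABELS: sum(vector.values())}

def divide(lhs, rhs):
    """Divide elements whose label sets are identical; drop everything else."""
    return {ls: value / rhs[ls] for ls, value in lhs.items() if ls in rhs}

caught = {
    frozenset({("type", "water")}): 15,
    frozenset({("type", "fire")}): 120,
    frozenset({("type", "grass")}): 150,
}
water = {ls: v for ls, v in caught.items() if ("type", "water") in ls}

# Assumed query: sum(pokemon_caught_total{type="water"}) / sum(pokemon_caught_total)
ratio = divide(sum_all(water), sum_all(caught))
water_percent = round(100 * ratio[NO_LABELS], 2)
print(water_percent)  # 5.26
```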
Since we have already selected all the Pokémons we are interested in on the left side, we can also wrap that side in sum to get the sum across the series. Now both sides have no labels, so Prometheus can match them. However, what if we wanted to create a chart of the percentages of Pokémon types we caught?
Using grouping on the aggregation makes us go back to square one:
Creating multiple charts or using Grafana variables is also not an option if we want to, for example, compare that data on a single plot. Fortunately, PromQL provides a couple of keywords that allow us to match vectors with different label sets: on and ignoring. With those keywords, we show Prometheus the labels on which the vector matching should be performed, or the labels Prometheus should be ignoring while performing the match. If you’re using one-to-one matching (where each vector on one side has to match exactly one vector on the other side), all you need to do is use one of the keywords after the binary operator sign.
Let’s go back to calculating the percentage of water type Pokémons:
As you can see, we told Prometheus to match on an empty set of labels, since the common label set is empty. Also, as exactly one vector on the left side matches exactly one vector on the right side, we have one-to-one matching and the on keyword is enough.
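One way to picture what on does is the following Python sketch. It is a simplified model under stated assumptions, not Prometheus code: vectors are dictionaries from label set to value, the counts are hypothetical, the helper names project and divide_on are made up for illustration, and the query in the comment is an assumed form of the one discussed here.

```python
# Toy model of one-to-one matching with on(<labels>); counts are hypothetical.

def project(label_set, on):
    """Keep only the labels listed in `on`; on=() keeps nothing."""
    return frozenset((k, v) for k, v in label_set if k in on)

def divide_on(lhs, rhs, on=()):
    """One-to-one division where elements match on the `on` labels only."""
    rhs_by_key = {project(ls, on): value for ls, value in rhs.items()}
    result = {}
    for ls, value in lhs.items():
        match_key = project(ls, on)
        if match_key not in rhs_by_key:
            continue  # unmatched elements are dropped
        if match_key in result:
            raise ValueError("several left-hand elements match one right-hand element")
        result[match_key] = value / rhs_by_key[match_key]
    return result

caught = {
    frozenset({("type", "water")}): 15,
    frozenset({("type", "fire")}): 120,
    frozenset({("type", "grass")}): 150,
}
water = {frozenset({("type", "water")}): 15}
total = {frozenset(): sum(caught.values())}

# Assumed query: pokemon_caught_total{type="water"} / on() sum(pokemon_caught_total)
ratio = divide_on(water, total, on=())
print(round(100 * ratio[frozenset()], 2))  # 5.26
```

Passing every series on the left, as in divide_on(caught, total), raises the ValueError in this model, because several left-hand elements then share the same empty match key and the match is no longer one-to-one.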
But what about when we want to match multiple vectors on the left side to one vector on the right? As you can see, we have an error:
We have many vectors on the left side that match one vector on the right side; that’s many-to-one matching. In this case, we have to show Prometheus from which side we want our labels (which side represents the “many”), using the group_left or group_right keywords. When we want to preserve labels from the left side, we use group_left, and when we want to preserve labels from the right side, we use group_right:
Now, we have all our Pokémon type percentages. To check, the water type percentage is 5.26%, just as we calculated before.
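The many-to-one case can be sketched in Python as well. Again, this is a toy model under stated assumptions: the counts for fire and grass are hypothetical (only water at 15 and the total of 285 follow from the numbers in this post), the helper names are invented for illustration, and the query in the comment is an assumed form of the group_left query.

```python
# Toy model of many-to-one matching with on() group_left; counts are hypothetical.

def project(label_set, on):
    """Keep only the labels listed in `on`; on=() keeps nothing."""
    return frozenset((k, v) for k, v in label_set if k in on)

def divide_group_left(lhs, rhs, on=()):
    """Many-to-one division: each left element keeps its own labels."""
    rhs_by_key = {project(ls, on): value for ls, value in rhs.items()}
    return {
        ls: value / rhs_by_key[project(ls, on)]
        for ls, value in lhs.items()
        if project(ls, on) in rhs_by_key
    }

caught = {
    frozenset({("type", "water")}): 15,
    frozenset({("type", "fire")}): 120,
    frozenset({("type", "grass")}): 150,
}
total = {frozenset(): sum(caught.values())}

# Assumed query: pokemon_caught_total / on() group_left sum(pokemon_caught_total)
ratios = divide_group_left(caught, total, on=())
percentages = {dict(ls)["type"]: round(100 * r, 2) for ls, r in ratios.items()}
print(percentages["water"])  # 5.26
```

Because every left-hand element keeps its own type label, the result has one entry per Pokémon type, and the ratios add up to 1.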
Note: Prometheus documentation mentions that the group_left and group_right modifiers are advanced use cases that should be carefully considered.
Wrapping up
Now you know that binary operations between vectors might not be as straightforward as they seem! You can visit this GitHub repo if you’d like to play around with the metrics and sample Pokémon application more.
You can also find more about vectors in these PromQL docs, and the logic of arithmetic operations is described in detail here. Examples we’ve discussed in this post can be easily carried over to real-life scenarios, like HTTP response times or HTTP status codes — for example, to find the percentage of error status codes within all traffic.
That’s it for now, and thank you for following along!