PromQL is the querying language that is part of Prometheus. In addition to PromQL, Prometheus provides a scraper that fetches metrics from instances (any application providing metrics) and a time series database (TSDB), which stores these metrics over time.

This introduction to PromQL will be largely decoupled from specific tools and the non-PromQL parts of Prometheus, in order to focus on the features of the language itself.

I recommend reading Ivana Huckova’s blog post, How to Explore Prometheus with Easy ‘Hello World’ Projects, which has useful tips and links on getting a series and a database set up together with Grafana.

As supplements to this post, check out this excellent recorded talk by Ian Billett from PromCon EU 2019: PromQL for Mere Mortals, and Timber’s handy cheat sheet, PromQL for Humans.

## Data Types

Prometheus uses three data types for metrics: the scalar, the instant vector, and the range vector. The most fundamental data type of Prometheus is the scalar – which represents a floating point value. Examples of scalars include 0, 18.12, and 1000000. All calculations in Prometheus are floating point operations.

When you group together scalars as a set of metrics in a single point in time, you get the instant vector data type. When you run a query asking for only the name of a metric, such as `bicycle_distance_meters_total`, the response is an instant vector. Since metrics have both names and labels (which I’ll cover a little later), a single name may contain many values, and that is why it’s a vector rather than just a scalar.

An array of vectors over time gives you the range vector. Neither Grafana nor the built-in Prometheus expression browser makes graphs out of range vectors directly, but rather uses instant vectors or scalars independently calculated for different points in time.

Because of this, range vectors are typically wrapped with a function that transforms them into an instant vector (such as `rate`, `delta`, or `increase`) to be plotted. Syntactically, you get a range vector when you query an instant vector and append a time selector such as `[5m]`. The simplest range vector query is an instant vector with a time selector, such as `bicycle_distance_meters_total[1h]`.
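To make the three data types a bit more concrete, here is a minimal Python sketch of how I picture them. It is a mental model only, not how Prometheus stores data; the type aliases, the `labels` helper, the `last_minus_first` function, and all sample values are made up for illustration.

```python
from typing import Dict, List, Tuple

# A series is identified by its sorted (label name, label value) pairs.
Labels = Tuple[Tuple[str, str], ...]
# Instant vector: one value per series at a single point in time.
InstantVector = Dict[Labels, float]
# Range vector: a list of (timestamp, value) samples per series.
RangeVector = Dict[Labels, List[Tuple[float, float]]]


def labels(**kv: str) -> Labels:
    """Build a hashable, order-independent label set (hypothetical helper)."""
    return tuple(sorted(kv.items()))


# A scalar is just a floating point number.
scalar = 18.12

# Roughly what bicycle_distance_meters_total might return.
instant: InstantVector = {
    labels(brand="monark"): 1200.0,
    labels(brand="brompton"): 800.0,
}

# Roughly what bicycle_distance_meters_total[1h] might return.
range_vec: RangeVector = {
    labels(brand="monark"): [(0.0, 1000.0), (1800.0, 1100.0), (3600.0, 1200.0)],
    labels(brand="brompton"): [(0.0, 500.0), (1800.0, 650.0), (3600.0, 800.0)],
}


def last_minus_first(rv: RangeVector) -> InstantVector:
    """Collapse each series' samples into one value, yielding an instant vector."""
    return {key: samples[-1][1] - samples[0][1] for key, samples in rv.items()}


collapsed = last_minus_first(range_vec)
```

The last function shows the general shape of what wrapping a range vector in a function does: many samples per series go in, one value per series comes out.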

## Labels

Some earlier metric systems had only the metric name for distinguishing different metrics from each other. This means that you might end up with metric names such as `bicycle.distance.meters.monark.7` to distinguish a 7-geared Monark bicycle from a 2-geared Brompton bicycle (`bicycle.distance.meters.brompton.2`). In Prometheus, we use labels for that. A label is written after the metric name in the format `{label="value"}`.

This means that our previous two bikes are rewritten as `bicycle_distance_meters_total{brand="monark",gears="7"}` and `bicycle_distance_meters_total{brand="brompton",gears="2"}`.

When querying based on metrics, the same format is used. As such, `bicycle_distance_meters_total` would give us the mileage of all bikes in this example; `bicycle_distance_meters_total{gears="7"}` would limit the resulting set to all 7-geared bicycles. This allows us much more flexibility without having to resort to weird regex magic as with the older format.

Negation and regular expressions (in Google’s RE2 format) are supported by replacing `=` with either `!=`, `!~`, or `=~` for not equal, not matching, and matching respectively. (When selecting multiple values for variables in Grafana, they are represented in a format compatible with `=~`.)
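The four matchers can be imitated in a few lines of Python. This is a sketch of the semantics as I understand them, not Prometheus code: the `matches` and `select` helpers and the `crescent` brand are hypothetical, and Python's `re` module stands in for RE2, which differs in some details. One thing worth seeing in code is that regex matchers apply to the whole label value, hence `re.fullmatch`.

```python
import re


def matches(series_labels: dict, name: str, op: str, pattern: str) -> bool:
    """Evaluate one label matcher against one series' labels."""
    # A label that is not set behaves like a label with an empty value.
    value = series_labels.get(name, "")
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    if op == "=~":
        # Regex matchers are anchored: the pattern must match the whole value.
        return re.fullmatch(pattern, value) is not None
    if op == "!~":
        return re.fullmatch(pattern, value) is None
    raise ValueError(f"unknown matcher operator: {op}")


def select(series: list, name: str, op: str, pattern: str) -> list:
    """Keep the series whose labels satisfy the matcher."""
    return [s for s in series if matches(s, name, op, pattern)]


series = [
    {"brand": "monark", "gears": "7"},
    {"brand": "brompton", "gears": "2"},
    {"brand": "crescent", "gears": "7"},  # hypothetical third bike
]

seven_geared = select(series, "gears", "=", "7")
not_monark = select(series, "brand", "!=", "monark")
either = select(series, "brand", "=~", "monark|brompton")
neither = select(series, "brand", "!~", "mon.*|bro.*")
partial = select(series, "brand", "=~", "mon")  # empty: "mon" is not the whole value
```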

The downside with labels is that the seemingly innocent query `bicycle_distance_meters_total` can actually return thousands of values, and it’s sometimes not intuitive which queries will end up heavy on the Prometheus server or the client from which you’re querying Prometheus.

## Metric Types

Prometheus conceptually has four different metric types. The metric types are all represented by one or more scalar values with some different conventions dictating how to use them and for what purpose they’re useful.

**Counter** and **Gauge** are basic metric types, both of which store a scalar. A counter always counts up (a reset to zero can happen on restart) compared to a gauge, which can go both up and down. `bicycle_distance_meters_total` is a counter since the number of meters a bike has traveled cannot decrease, whereas `bicycle_speed_meters_per_second` would have to be a gauge to allow for decreased speeds. By convention, counters end with `_total` to help the user distinguish between counters and gauges at a glance.

The third metric type is **Histogram**, which offers a single interface for measuring three different things:

- `<metric>_count` is a counter that stores the total number of data points.
- `<metric>_sum` is a gauge that stores the value of all data points added together. The sum can be used as a counter for all histograms where negative values are impossible.
- `<metric>_bucket` is a collection of counters in which a label (`le`, the upper bound of the bucket) is used to support calculating the distribution of the values. The buckets are cumulative, so all buckets that are applicable for a value are increased by one on insertion of a sample. There is a `+Inf` bucket, which should hold the same value as `_count`.

For a bicycle race, the number of cyclists finishing by the number of hours it takes them to finish could be stored in a histogram with the buckets 21600, 25200, 28800, 32400, 36000, 39600, +Inf. (Time is by convention stored in seconds; this is one bucket per hour for the range [6, 11] hours.)

If there are 2 cyclists finishing in slightly less than 7 hours, 5 in less than 8 hours, 3 in less than 10 hours, and a sole cyclist finishing two days later, the histogram would be represented something like this. (For the purpose of this example, I’ve made up the value for the sum, but of course it can be anything, depending on the values in each bucket.)

```
race_duration_seconds_bucket{le="21600"} 0
race_duration_seconds_bucket{le="25200"} 2
race_duration_seconds_bucket{le="28800"} 7
race_duration_seconds_bucket{le="32400"} 7
race_duration_seconds_bucket{le="36000"} 10
race_duration_seconds_bucket{le="39600"} 10
race_duration_seconds_bucket{le="+Inf"} 11
race_duration_seconds_count 11
race_duration_seconds_sum 511200
```
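To see where these cumulative counts come from, here is a small Python sketch that inserts one finish time at a time. The `observe` helper and the individual finish times are my own inventions, picked only to fit the description above, so the resulting sum will not match the made-up sum in the listing.

```python
import math

# Upper bounds (the "le" label) from the race example, in seconds.
BOUNDS = [21600, 25200, 28800, 32400, 36000, 39600, math.inf]


def observe(buckets: dict, value: float) -> None:
    """Record one sample: every bucket whose bound is >= value goes up by one."""
    for le in buckets:
        if value <= le:
            buckets[le] += 1


buckets = {le: 0 for le in BOUNDS}

# Hypothetical finish times in seconds.
finish_times = (
    [25000, 25100]                          # 2 cyclists just under 7 hours
    + [28000, 28100, 28200, 28300, 28400]   # 5 cyclists under 8 hours
    + [35000, 35500, 35900]                 # 3 cyclists under 10 hours
    + [172800]                              # 1 cyclist two days later
)

for t in finish_times:
    observe(buckets, t)

count = len(finish_times)       # what race_duration_seconds_count would hold
total = sum(finish_times)       # what race_duration_seconds_sum would hold
cumulative = [buckets[le] for le in BOUNDS]
```

The late cyclist only ever shows up in the `+Inf` bucket, which is why that bucket always ends up equal to the count.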

By having a common convention for how to store histograms, Prometheus can provide functions such as `histogram_quantile` (which calculates quantiles for a histogram – I’ll go into the details of that further down), and external tools such as Grafana can recognize the format and provide histogram features. Since histograms are “just” a collection of counters, the histograms don’t increase the complexity of Prometheus at large.

When using histograms, knowing how the buckets work and that everything above the largest bucket is simply stored as “more than the largest bucket” can help you understand what kind of accuracy you can get from histograms, and by extension, what accuracy you can expect from a calculation.

For instance, having a `+Inf` bucket with a significantly higher value than the largest bucket might be an indicator that your buckets are misconfigured (and that the values you’re getting from Prometheus are unreliable).

The final type of metrics is **Summary**. It is similar to a histogram but is defined as a quantile gauge that is calculated by the client to get a higher accuracy of a quantile. The precalculated quantiles for summaries cannot be aggregated in a meaningful way. You can study the quantiles from individual instances of your service, but you cannot aggregate them to a fleet-wide quantile.

One common use case for quantiles is as service level indicators (i.e. SLI/SLO/SLA) to know how large a portion of the incoming requests to a server is slower than say 50ms. With a histogram in which one of the buckets is <0.05 seconds, it’s possible to say with high accuracy how many of the requests were not handled within that time. Adding more buckets will make it possible to calculate quantiles, which gives you an idea of the performance. With summaries, this aggregation is not at all possible.
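As a rough illustration of why histograms aggregate well, the following Python sketch adds up the buckets of two instances and then computes the share of requests slower than 50ms. The bucket layout, the counts, and the `fraction_slower_than` helper are all hypothetical.

```python
import math


def fraction_slower_than(buckets: dict, threshold: float) -> float:
    """Share of samples above `threshold`, which must be an existing bucket bound.

    `buckets` maps each upper bound to its cumulative count.
    """
    total = buckets[math.inf]
    if total == 0:
        return 0.0
    return 1.0 - buckets[threshold] / total


# Made-up request duration buckets (in seconds) from two instances.
instance_a = {0.01: 50, 0.05: 90, 0.1: 98, math.inf: 100}
instance_b = {0.01: 10, 0.05: 240, 0.1: 290, math.inf: 300}

# Buckets are counters, so a fleet-wide histogram is just a per-bound sum.
fleet = {le: instance_a[le] + instance_b[le] for le in instance_a}

slow_a = fraction_slower_than(instance_a, 0.05)
slow_b = fraction_slower_than(instance_b, 0.05)
slow_fleet = fraction_slower_than(fleet, 0.05)
```

Note that the fleet-wide share is not the plain average of the two per-instance shares, because the instances handled different numbers of requests. That is the same reason precalculated quantiles from summaries cannot simply be averaged.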

To summarize, histograms require you to have some level of insight into the distribution of your values to begin with to set up appropriate buckets whereas summaries lack reliable aggregation operations.

## Functions & Operators

Metrics can be useful by themselves, but in order to maximize their utility, some kind of manipulation is necessary, which is why Prometheus provides a number of operators and functions for manipulation.

### Aggregation Operators

Aggregation operators reduce one instant vector to another instant vector representing the same or fewer label sets, either by aggregating the values of different label sets or by keeping one or more distinct sets, depending on their values. The simplest form of aggregation operators would appear as `avg(bicycle_speed_meters_per_second)`, which gives you the overall average speed of the bicycles in the set.

If you want to be able to differentiate bicycles by the labels for brand and number of gears, you can instead use `avg(bicycle_speed_meters_per_second) by (brand, gears)`. The `by` keyword can be replaced with `without` if you want to discard a label for the new vector instead of selecting which you’d like to keep.

There are a number of aggregations available, the most prominent being the hopefully self-explanatory `sum`, `min`, `max`, and `avg`. Some of the more complex aggregators take additional parameters, such as `topk(3, bicycle_speed_meters_per_second)`, which gives you the overall three highest speeds.
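The grouping behind `by` can be sketched in Python as follows. This only mimics the idea; `aggregate_by`, `avg`, and `topk` are my own helper functions, and the `owner` label and speed values are invented so that there is something to aggregate away.

```python
from collections import defaultdict


def aggregate_by(vector: list, by: list, fn) -> dict:
    """Group series on the labels in `by` and reduce each group with `fn`."""
    groups = defaultdict(list)
    for series_labels, value in vector:
        # Only the listed labels survive; everything else is aggregated away.
        key = tuple((name, series_labels.get(name, "")) for name in by)
        groups[key].append(value)
    return {key: fn(values) for key, values in groups.items()}


def avg(values: list) -> float:
    return sum(values) / len(values)


def topk(k: int, vector: list) -> list:
    """Keep the k series with the highest values, labels intact."""
    return sorted(vector, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical current values of bicycle_speed_meters_per_second.
speeds = [
    ({"brand": "monark", "gears": "7", "owner": "a"}, 6.0),
    ({"brand": "monark", "gears": "7", "owner": "b"}, 8.0),
    ({"brand": "brompton", "gears": "2", "owner": "c"}, 5.0),
    ({"brand": "brompton", "gears": "2", "owner": "d"}, 4.0),
]

overall = aggregate_by(speeds, [], avg)                    # like avg(...)
per_model = aggregate_by(speeds, ["brand", "gears"], avg)  # like avg(...) by (brand, gears)
fastest = topk(3, speeds)                                  # like topk(3, ...)
```

Unlike `avg`, `topk` does not merge series: it returns a subset of the input with all labels preserved.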

### Binary Operators

The arithmetic binary operators (+, -, *, /, % [modulo/clock counting], ^ [power]) can operate on a combination of instant vectors and scalars, which can lead to a bit of a world of pain if you’re trying to be mathematically sound. So I’ll summarize the different cases and how to handle the weird cases that come with doing vector arithmetics.

Scalar-to-scalar arithmetic is at its core the arithmetic from primary school. Scalar-to-vector arithmetic is almost as simple: For every value in the vector, apply the calculation with the scalar. (If you have the `bicycle_speed_meters_per_second` and want to express it in the more familiar kilometers per hour (km/h), that is done with `bicycle_speed_meters_per_second * 3.6`.)

Vector-to-vector arithmetic is where it becomes really interesting. There is label matching going on, and the vectors that have labels that exactly match up to each other are calculated toward each other. All other values are discarded. Example: `bicycle_speed_meters_per_second / bicycle_cadence_revolutions_per_minute`.

Since that’s often not what you want, you can add an `on` (or `ignoring`) operator to the right of the binary operator, and you’ll end up limiting the set of labels being used for the comparison before running it. However, all labels that are not used for the comparison are thrown away. This would be `bicycle_speed_meters_per_second / on (gears) bicycle_cadence_revolutions_per_minute`.

What if you want to keep those labels? Well, you can keep all labels from the left-hand side by adding `group_left` after the `on` or `ignoring` keyword. When you do that, the value of the right-hand side will be applied on each of the left-hand side labels that match the labels from `on`. In practice, this looks like `bicycle_speed_meters_per_second / on (gears) group_left bicycle_cadence_revolutions_per_minute`. There is also a `group_right`, which instead groups by the right-hand side.
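The three matching variants can be compared side by side in a Python sketch. It is a simplified model of the behavior described above, not the real implementation: `match_key` and `divide` are hypothetical helpers, metric names are left out, and the cadence values (including a cadence series that only carries a `gears` label) are invented.

```python
def match_key(series_labels: dict, on):
    """Labels used for matching: all of them, or only those listed in `on`."""
    if on is None:
        return tuple(sorted(series_labels.items()))
    return tuple((name, series_labels.get(name, "")) for name in on)


def divide(lhs: list, rhs: list, on=None, group_left: bool = False) -> list:
    """Divide matching series of two instant vectors given as (labels, value) lists."""
    right = {}
    for series_labels, value in rhs:
        key = match_key(series_labels, on)
        if key in right:
            raise ValueError("several right-hand series share one match key")
        right[key] = value

    result, seen = [], set()
    for series_labels, value in lhs:
        key = match_key(series_labels, on)
        if key not in right:
            continue  # no partner on the other side: the value is discarded
        if group_left:
            out = dict(series_labels)  # keep every left-hand label
        else:
            if key in seen:
                raise ValueError("many-to-one match requires group_left")
            seen.add(key)
            out = dict(key)  # only the labels used for matching survive
        result.append((out, value / right[key]))
    return result


speed = [
    ({"brand": "monark", "gears": "7"}, 6.0),
    ({"brand": "brompton", "gears": "2"}, 4.0),
]
cadence_same_labels = [({"brand": "monark", "gears": "7"}, 80.0)]
cadence_per_gears = [({"gears": "7"}, 80.0), ({"gears": "2"}, 50.0)]

exact = divide(speed, cadence_same_labels)                           # a / b
on_gears = divide(speed, cadence_per_gears, on=["gears"])            # a / on (gears) b
kept = divide(speed, cadence_per_gears, on=["gears"], group_left=True)
```

In the first call the Brompton disappears because it has no exact partner; in the second both bikes match but lose their `brand` label; in the third `brand` is kept.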

In addition to the arithmetic operators, there are also the comparison (==, !=, >, <, >=, <=) and set (and, or, unless) operations.

Comparison operations are defined for instant vectors and scalars. A comparison between two scalars returns either 0 for false or 1 for true and requires the `bool` keyword after the comparator. For instant vectors, when compared to a scalar, every data point for which the comparison is true is kept, and the others are thrown away.

When comparing two instant vectors, the logic is similar, but per set of labels, both the labels and the values are compared. When the operation returns false, or there is no metric with a matching set of labels on the opposite side, the value is thrown away; otherwise it’s kept. If you want 0 or 1 instead of keeping or tossing the value, you can add the keyword `bool` after the comparator.
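The difference between filtering and `bool` is easy to show in Python. This sketch only covers a vector compared with a scalar using `>`; the `greater_than` helper and the speed values are hypothetical.

```python
def greater_than(vector: dict, threshold: float, as_bool: bool = False) -> dict:
    """Compare every series' value with a scalar threshold."""
    result = {}
    for series_labels, value in vector.items():
        if as_bool:
            # With bool: every series is kept, the value becomes 1 or 0.
            result[series_labels] = 1.0 if value > threshold else 0.0
        elif value > threshold:
            # Without bool: matching series keep their original value,
            # the others are dropped.
            result[series_labels] = value
    return result


# Series keyed by (brand, gears); the values are made-up speeds in m/s.
speeds = {("monark", "7"): 6.0, ("brompton", "2"): 4.0}

filtered = greater_than(speeds, 5)               # like speed > 5
flags = greater_than(speeds, 5, as_bool=True)    # like speed > bool 5
```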

The set operators operate on instant vectors and work by inspecting the label sets on the metrics. For `and`, if a label set exists on both the left-hand side and right-hand side, the value of the left-hand side is returned; otherwise, nothing. For `or`, all label sets for the left-hand side are returned, as are the sets for the right-hand side that don’t exist on the left-hand side. And finally, `unless` returns the values on the left-hand side for which the label set does not also exist on the right-hand side.
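Since the set operators only look at label sets, they map almost directly onto dictionary operations. A minimal Python sketch, where the function names, the `crescent` brand, and the values are my own:

```python
def vec_and(lhs: dict, rhs: dict) -> dict:
    """Left-hand values whose label set also exists on the right."""
    return {key: value for key, value in lhs.items() if key in rhs}


def vec_or(lhs: dict, rhs: dict) -> dict:
    """Everything from the left, plus right-hand series missing on the left."""
    result = dict(lhs)
    for key, value in rhs.items():
        result.setdefault(key, value)
    return result


def vec_unless(lhs: dict, rhs: dict) -> dict:
    """Left-hand values whose label set does not exist on the right."""
    return {key: value for key, value in lhs.items() if key not in rhs}


# Instant vectors keyed by (brand, gears); values are arbitrary.
left = {("monark", "7"): 1.0, ("brompton", "2"): 2.0}
right = {("brompton", "2"): 20.0, ("crescent", "7"): 30.0}

both = vec_and(left, right)
union = vec_or(left, right)
only_left = vec_unless(left, right)
```

In all three cases the right-hand values never replace left-hand values; the right-hand side only decides which series are included.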

### Functions

Functions in Prometheus work much like functions in programming in general, but are limited to a pre-defined set. It’s important to know that most of Prometheus’s functions are approximate and extrapolate the result – which occasionally turns what should be integer calculations into floating point values, and also means that Prometheus is a poor fit when exactness is required (for example, for billing purposes).

Some particularly useful functions are `delta`, `increase`, and `rate`. Each takes a range vector as input and returns an instant vector. `delta` operates on gauges and returns the difference between the start and end of the range. `increase` and `rate` operate on counters and return the amount the counter has increased over the specified time. `increase` gives the total increase over the time, and `rate` gives the per-second increase. `rate(bicycle_distance_meters_total[1h])` should be the same as `increase(bicycle_distance_meters_total[1h]) / 3600`. Because `increase` and `rate` have logic to handle restarts when the value is reset to zero, it’s important to avoid using them with gauges that go up and down. That could end up looking like a restart to the functions, resulting in a nonsensical value.
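Here is a deliberately simplified Python sketch of the reset handling. It leaves out the extrapolation to the window boundaries that Prometheus performs, and it divides by the time between the first and last sample rather than by the window length, so the numbers will not match real query results exactly. The function bodies and all samples are my own.

```python
def delta(samples: list) -> float:
    """Difference between the last and first value; meant for gauges."""
    return samples[-1][1] - samples[0][1]


def increase(samples: list) -> float:
    """Total counter increase over (timestamp, value) samples, oldest first."""
    total = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current < previous:
            # A drop is read as a restart: the counter began again from zero.
            total += current
        else:
            total += current - previous
    return total


def rate(samples: list) -> float:
    """Per-second increase between the first and last sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    return increase(samples) / elapsed


# A counter over one hour, with a restart between the second and third sample.
distance = [(0, 1000.0), (1200, 1600.0), (2400, 300.0), (3600, 900.0)]
# A gauge over the same hour, going up and down.
speed = [(0, 5.0), (1200, 7.0), (2400, 4.0), (3600, 6.0)]

distance_increase = increase(distance)  # 600 + 300 + 600
distance_rate = rate(distance)
speed_delta = delta(speed)              # the honest answer for a gauge
speed_increase = increase(speed)        # the dip is mistaken for a restart
```

For the gauge, `delta` reports a change of 1 m/s while `increase` reports 8, because the drop from 7 to 4 is treated as a reset followed by a climb from zero.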

To make sense out of the histogram buckets, the `histogram_quantile` function takes two arguments: first, the quantile that should be calculated, and second, the bucket’s instant vector. For our earlier race example, with a few more labels added, this could be `histogram_quantile(0.95, sum(race_duration_seconds_bucket) by (le))`, which returns the time in which the 95th-percentile racer would finish.

The reason we sum the buckets with `by (le)` before performing the quantile calculation is that the quantile is calculated per unique combination of labels. This allows us to graph things such as the median time per number of gears on the bike with `histogram_quantile(0.5, sum(race_duration_seconds_bucket) by (le, gears))`.
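To get a feeling for the accuracy of the result, it helps to work through the calculation by hand. The Python sketch below follows the approach as I understand it for this kind of bucketed histogram: find the bucket that contains the requested rank and interpolate linearly inside it. It assumes positive bucket bounds and a quantile between 0 and 1, and it skips the edge cases the real function handles.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate a quantile from (upper_bound, cumulative_count) pairs.

    `buckets` must be sorted by bound and end with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total

    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break

    if math.isinf(upper):
        # Nothing is known above the largest finite bound, so that bound
        # is the best available answer.
        return buckets[-2][0]

    lower, below = (0.0, 0) if i == 0 else buckets[i - 1]
    in_bucket = count - below
    # Assume samples are spread evenly within the bucket.
    return lower + (upper - lower) * (rank - below) / in_bucket


# The race example from the Metric Types section.
race = [
    (21600, 0), (25200, 2), (28800, 7), (32400, 7),
    (36000, 10), (39600, 10), (math.inf, 11),
]

median = histogram_quantile(0.5, race)
p95 = histogram_quantile(0.95, race)
```

With these buckets the median lands at 27720 seconds, inside the 7 to 8 hour bucket. The 95th percentile falls into the `+Inf` bucket because of the cyclist who arrived two days later, so the sketch returns 39600, the largest finite bound, which says nothing about how late that cyclist really was.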

## Read More

PromQL is a domain-specific language with a syntax that hides a few surprises and isn’t always intuitively understandable. I originally wrote an earlier version of this post before I joined Grafana Labs because I wanted to understand the syntax used by Prometheus better, and because I noticed that a lot of PromQL queries out in the wild do not match up with the intentions of their authors. This post is a summary of my experimentation and reading Prometheus’s documentation and source code. I’d recommend reading these pages in the documentation, which cover pretty much the content I’ve written here and a lot more: