PromQL is a functional query language that lets users select and aggregate time series data in real time. A query language is a computer language used to make queries in databases and information systems. PromQL is used for making ad-hoc queries, creating dashboards, alerting, etc. For an introduction to PromQL, I recommend this blog. Life of a PromQL query is a good talk for understanding PromQL. Anatomy of a PromQL query is also a great blog post that talks about the structure of PromQL queries and how they are evaluated over time.
In this training, we started with the basics of the time series data model, metric types, and querying, and then moved toward advanced querying. It is essential to understand the time series data model before going any further. Prometheus fundamentally stores all data as time series: streams of timestamped values that belong to the same metric and the same set of labeled dimensions. Read the documentation on Prometheus’ Data Model to learn more about it.
Next, we walked through some basic PromQL queries.
For example, the figure below illustrates a PromQL query that selects only those time series with the http_requests_total metric name that have the job label set to prometheus and the group label set to canary. Label values are case-sensitive, so the value is written in lowercase.
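As a rough illustration of what such a selector does, the Python sketch below models a handful of series as label dictionaries and applies equality matchers to them. The sample series, the instance values, and the `select` helper are made up for illustration; a real Prometheus server evaluates the selector against its own storage and also supports regex and negative matchers, which are not modelled here.

```python
# PromQL selector being modelled (equality matchers only):
SELECTOR = 'http_requests_total{job="prometheus",group="canary"}'

# Hypothetical series. In Prometheus the metric name is stored as the
# reserved label __name__, so it can be matched like any other label.
SERIES = [
    {"__name__": "http_requests_total", "job": "prometheus", "group": "canary", "instance": "a:9090"},
    {"__name__": "http_requests_total", "job": "prometheus", "group": "production", "instance": "b:9090"},
    {"__name__": "http_requests_total", "job": "node", "group": "canary", "instance": "c:9100"},
    {"__name__": "up", "job": "prometheus", "group": "canary", "instance": "a:9090"},
]


def select(series, name, **matchers):
    """Return the series whose metric name and labels all match exactly."""
    wanted = {"__name__": name, **matchers}
    return [s for s in series if all(s.get(k) == v for k, v in wanted.items())]


matched = select(SERIES, "http_requests_total", job="prometheus", group="canary")
for s in matched:
    print(s["instance"])
```

Only the first series satisfies all three conditions. Matching is exact, so asking for job="Prometheus" with a capital P would select nothing.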
I would suggest going through the examples in the Prometheus documentation and trying them out yourself on a live Prometheus server or on the PromLens demo server to get a better understanding.
After this, we looked at the anatomy of a PromQL query. Unlike SQL, PromQL is a nested functional language: we describe the data we are looking for as a nested set of expressions, each of which evaluates to an intermediate value. A PromQL expression is not just the entire query, but any nested part of a query.
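To make the nesting concrete, here is a small Python sketch that lists the sub-expressions of one query from the inside out, together with the PromQL type each one evaluates to. The query itself (and the http_requests_total metric and job label in it) is just an illustrative choice, and the `is_nested` helper is a hypothetical check, not part of any Prometheus tooling.

```python
# Each entry: (expression, PromQL type it evaluates to), innermost first.
# The metric name and the job label are illustrative.
STEPS = [
    ("http_requests_total[5m]", "range vector"),
    ("rate(http_requests_total[5m])", "instant vector"),
    ("sum by (job) (rate(http_requests_total[5m]))", "instant vector"),
]


def is_nested(steps):
    """Check that every expression contains the previous one as a sub-expression."""
    return all(inner in outer for (inner, _), (outer, _) in zip(steps, steps[1:]))


# Print the expressions with increasing indentation to show the nesting.
for depth, (expr, kind) in enumerate(STEPS):
    print(f"{'  ' * depth}{expr}  ->  {kind}")
```

Every line of the output is a valid PromQL expression on its own, which is what makes it possible to inspect a query piece by piece.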
Prometheus uses the word “type” in more than one sense. The first is the type of a metric (counter, gauge, histogram, or summary) and the second is the type of a PromQL expression (instant vector, range vector, scalar, or string). Prometheus also has two types of PromQL queries: instant queries and range queries. An instant query evaluates an expression at a single point in time, whereas a range query evaluates an expression at regular steps over a range of time. You can find out more about them here.
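The difference between the two query types shows up directly in the Prometheus HTTP API, where instant queries go to /api/v1/query and range queries go to /api/v1/query_range. The sketch below only builds the request URLs; the server address, the timestamps, and the helper names are assumptions made for the example.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:9090"  # assumption: a local Prometheus server


def instant_query_url(query, time=None):
    """URL for evaluating `query` at one timestamp (server time if omitted)."""
    params = {"query": query}
    if time is not None:
        params["time"] = time
    return f"{BASE_URL}/api/v1/query?{urlencode(params)}"


def range_query_url(query, start, end, step):
    """URL for evaluating `query` at every `step` between `start` and `end`."""
    params = {"query": query, "start": start, "end": end, "step": step}
    return f"{BASE_URL}/api/v1/query_range?{urlencode(params)}"


# One evaluation at a single Unix timestamp.
print(instant_query_url("up", time=1700000000))
# One evaluation per minute across an hour.
print(range_query_url("rate(http_requests_total[5m])", 1700000000, 1700003600, "60s"))
```

An instant query needs at most a single timestamp, while a range query always needs a start, an end, and a step, which is why graphs are backed by range queries.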
Then we moved on to advanced querying, where we learned about working with histograms, filtering a set of series by their sample value, and how set operators are used to correlate sets of time series with each other. You can find out more about these topics in the Prometheus documentation, and these slides are also a good resource.
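Working with histograms usually means estimating quantiles from cumulative buckets, for example with a query such as histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))), where the metric name is only an example. The Python sketch below is a simplified model of that estimation for classic bucket histograms: it finds the bucket containing the requested rank and interpolates linearly inside it. It is not the Prometheus implementation, it assumes the lowest bucket starts at 0, and the bucket values are invented.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound and ending with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls into the +Inf bucket, so the best
                # available answer is the highest finite upper bound.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Invented request durations in seconds: 100 observations in total.
BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.95, BUCKETS))
```

The result is an estimate whose accuracy depends on the bucket layout, because the real distribution inside a bucket is unknown.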
The training also had two bonus sections on instrumentation best practices and pitfalls and PromQL-based alerting.
There are two well-known guidelines for what metrics to add to a given system.
The USE Method for resources (queues, CPUs, memory, disks, etc.):
- Utilization: The average time that the resource was busy (e.g., disk at 90% I/O utilization)
- Saturation: The degree to which extra work is queued (or denied) that can’t be serviced (e.g., scheduler run queue length)
- Errors: How many (and what) errors occurred
The RED Method for request-handling services:
- Request Rate
- Error Rate
- Durations (distribution)
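The RED signals map fairly directly onto PromQL. The sketch below builds one expression per signal for a single service; the `red_queries` helper, the metric names, the status label, and the 5xx matcher are all assumptions for the sake of the example and will differ depending on how a service is instrumented.

```python
def red_queries(requests_total, duration_bucket, window="5m",
                error_matcher='status=~"5.."'):
    """Build one PromQL expression per RED signal for a single service."""
    return {
        # Requests per second over the window.
        "rate": f"sum(rate({requests_total}[{window}]))",
        # Fraction of requests that failed.
        "errors": (
            f"sum(rate({requests_total}{{{error_matcher}}}[{window}]))"
            f" / sum(rate({requests_total}[{window}]))"
        ),
        # 90th percentile latency estimated from a histogram.
        "duration_p90": (
            f"histogram_quantile(0.9, sum by (le) (rate({duration_bucket}[{window}])))"
        ),
    }


# Hypothetical metric names for an HTTP service.
queries = red_queries("http_requests_total", "http_request_duration_seconds_bucket")
for signal, expr in queries.items():
    print(f"{signal}: {expr}")
```

Expressing errors as a ratio rather than a raw count keeps the signal comparable across services with very different traffic levels.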
There are also a number of Prometheus-specific instrumentation best practices. These are documented here.
Another important use case for PromQL is writing alerting rules. The Prometheus documentation offers alerting best practices that are worth a look. To understand more about Alertmanager and alerting, Life of an Alert is a great talk to watch.
This training was really thorough and helped me get more familiar with the Prometheus and PromQL ecosystem. I highly recommend that you attend it if you want a deep dive into PromQL and Prometheus.
Helpful links:
- Official PromQL documentation
- PromQL Cheat Sheet
- Introduction to Prometheus, PromQL, & PromLens