Grafana Alerting enables users to create and customize alert rules as separate entities and link them to Grafana panels. It also supports various data sources with built-in alerting engines, such as Prometheus, Grafana Mimir, and Grafana Loki, allowing users to manage their alert rules directly from Grafana’s UI. With Grafana Alerting, users can design and manage their alerting workflows with ease, ensuring that potential issues are addressed promptly and improving system reliability and uptime.
But the wide array of features provided in Grafana, coupled with our big tent philosophy, presented a new set of challenges for the Grafana Alerting team to tackle. We quickly discovered that managing hundreds, if not thousands, of alert rules was a common occurrence — and was becoming increasingly cumbersome for Grafana users.
To address this issue, we implemented a new search engine designed to help users quickly find specific information in their alerts and simplify managing a large number of alert rules and complex alerting workflows.
The challenges of searching across Grafana alert rules
1. The hierarchical structure
Alert rules in Grafana Alerting and Prometheus are organized in a hierarchical structure.
At the top level, we have folders (namespaces for Prometheus), which contain evaluation groups; inside them, we have alert rules.
Evaluation Group 1
- Alert rule 1
- Alert rule 2
Evaluation Group 2
- Alert rule 3
- Alert rule 4
This approach gives users a lot of flexibility in organizing their rules, but it creates some challenges when you want to find something specific in this structure.
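To make the search problem concrete, here is a minimal sketch of the hierarchy as plain data classes, with a helper that walks every level to find rules by name. The class names, the `iter_rules` and `find_rules` helpers, and the "Folder 1" sample are assumptions made for illustration, not Grafana's internal model.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class AlertRule:
    name: str


@dataclass
class EvaluationGroup:
    name: str
    rules: List[AlertRule] = field(default_factory=list)


@dataclass
class Folder:  # plays the role of a "namespace" for Prometheus rules
    name: str
    groups: List[EvaluationGroup] = field(default_factory=list)


def iter_rules(folders: List[Folder]) -> Iterator[Tuple[str, str, AlertRule]]:
    """Yield (folder name, group name, rule) for every rule in the tree."""
    for folder in folders:
        for group in folder.groups:
            for rule in group.rules:
                yield folder.name, group.name, rule


def find_rules(folders: List[Folder], text: str) -> List[Tuple[str, str, str]]:
    """Case-insensitive substring search on rule names across all levels."""
    needle = text.lower()
    return [
        (folder, group, rule.name)
        for folder, group, rule in iter_rules(folders)
        if needle in rule.name.lower()
    ]


# Hypothetical sample data mirroring the structure shown above.
folders = [
    Folder("Folder 1", [
        EvaluationGroup("Evaluation Group 1",
                        [AlertRule("Alert rule 1"), AlertRule("Alert rule 2")]),
        EvaluationGroup("Evaluation Group 2",
                        [AlertRule("Alert rule 3"), AlertRule("Alert rule 4")]),
    ])
]
```

Even in this small form, every lookup has to descend through two container levels before it reaches a rule, which is why a flat search over the whole tree is useful.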
2. So many properties per alert rule
Filtering by folder, evaluation group, and alert rule name is just the tip of the iceberg.
Each alert rule also has multiple important properties like its state, health, and type. Additionally, labels can be automatically generated by data sources or applied to alert rules by users so they can categorize and organize them more effectively. These labels can also then be used as filters to find specific alert rules.
Ultimately, that’s a lot of different properties to sift through for each Grafana alert rule.
3. Label syntax and special characters
Labels in Grafana Alerting play a key role in delivering information about the issue, from alert rules to contact points (email, Slack, etc.). They are key-value pairs describing your time series data.
While keys can contain only certain characters, values are much more permissive and even allow emojis.
Additionally, the old search syntax supports so-called label matcher filtering. A few examples:
foo=~ba.* // Regex label matcher. Matches e.g., foo=bar, foo=baz, foo=baa
foo!=baz // Negative label matcher. Matches all foo labels not equal to baz
foo!~ba.* // Negative regex matcher. Matches foo labels not matching the ba.* regex
In the end, we wanted to preserve the flexibility of this label filtering functionality, which added more requirements to our new Grafana Alerting search solution.
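As a rough illustration of what label matcher filtering involves, the sketch below parses a matcher string and evaluates it against a set of labels. It is not Grafana's implementation: the `LabelMatcher`, `parse_matcher`, and `matches` names are hypothetical, and two behaviors are assumptions borrowed from Prometheus, namely that regex matchers are fully anchored and that a missing label is treated as an empty string.

```python
import re
from typing import Dict, NamedTuple


class LabelMatcher(NamedTuple):
    key: str
    op: str  # one of "=", "!=", "=~", "!~"
    value: str


# Keys are restricted to a small character set; values are permissive.
_MATCHER_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*(=~|!~|!=|=)\s*(.*?)\s*$")


def parse_matcher(text: str) -> LabelMatcher:
    """Split a matcher such as 'foo=~ba.*' into key, operator, and value."""
    m = _MATCHER_RE.match(text)
    if m is None:
        raise ValueError(f"not a label matcher: {text!r}")
    return LabelMatcher(*m.groups())


def matches(matcher: LabelMatcher, labels: Dict[str, str]) -> bool:
    """Evaluate one matcher against a rule's labels."""
    # Assumption: a missing label behaves like an empty value.
    actual = labels.get(matcher.key, "")
    if matcher.op == "=":
        return actual == matcher.value
    if matcher.op == "!=":
        return actual != matcher.value
    # Assumption: regex matchers must match the whole value (anchored).
    is_match = re.fullmatch(matcher.value, actual) is not None
    return is_match if matcher.op == "=~" else not is_match
```

For example, `matches(parse_matcher("foo=~ba.*"), {"foo": "bar"})` evaluates to `True`, while the same matcher rejects `{"foo": "xbar"}` because the pattern has to cover the whole value.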
Inside Grafana Alerting’s new flexible search engine
Search with multiple filter options
When working on the new search engine, we had two main factors driving our decisions:
- Flexibility
- Ease of use
As described above, alert rules are complex creatures, and users want to filter them by various properties. On the other hand, a vast list of inputs, dropdowns, and switches can become overwhelming very quickly.
How could we get the best of both worlds into Grafana Alerting without bringing their biggest pitfalls along? The way GitHub’s search field works got our attention, and we decided to give this approach a try in Grafana Alerting. At first glance, it is straightforward: a simple text field. Yet it becomes a powerful tool thanks to a simple query language that lets you search by different data properties.
Having decided how our new search should work, we could get our hands dirty with coding it.
Custom grammar parser for search syntax
The most significant step in our search journey was to create a query syntax and a parser for it.
We followed a simple key:value pattern for our filter expressions. The left side is for the filter type, the colon is a separator, and the right side is for the value you seek.
To support filter values consisting of multiple words, we had to expand the syntax with quotes. If the user wants to use more than one word in the filter value, the value must be wrapped in quotation marks.
For example:
rule:"High CPU usage"
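The key:value pattern with optional quotes can be sketched in a few lines. This is only an illustration of the syntax described above, written with a regular expression for brevity rather than with the Lezer grammar the team built; `parse_filter` is a hypothetical helper, and `state:firing` is an assumed example of a one-word filter.

```python
import re
from typing import Optional, Tuple

# <filter type> ":" then either a quoted value or a single bare word.
_FILTER_RE = re.compile(r'^(\w+):(?:"([^"]*)"|([^\s"]+))$')


def parse_filter(expression: str) -> Optional[Tuple[str, str]]:
    """Return (filter type, value), or None if the expression is malformed."""
    m = _FILTER_RE.match(expression.strip())
    if m is None:
        return None
    key, quoted, bare = m.groups()
    return key, quoted if quoted is not None else bare


print(parse_filter('rule:"High CPU usage"'))  # quoted, multi-word value
print(parse_filter("state:firing"))           # bare, single-word value
print(parse_filter("rule:High CPU usage"))    # unquoted multi-word value is rejected
```

Requiring quotes keeps the rule unambiguous: without them, there would be no way to tell where a multi-word value ends and the next term begins.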
Lezer for parsing
Developing a custom syntax parser can be a complex task. Thankfully, there are existing tools to simplify the process of crafting a grammar and constructing the accompanying parser.
One such tool is Lezer, a parser generator whose proven success with PromQL and LogQL reinforced its suitability for our needs. With this in mind, we used Lezer to construct a grammar for our alert rules search query syntax.
Writing the grammar
When processing text input, a grammar parser constructs a syntax tree based on predefined grammar rules. The syntax tree is composed of syntax nodes, each of which may be classified as either terminal or non-terminal. Terminals represent the smallest, individual component of the input expression, while non-terminals consist of a combination of terminals that follow a particular grammar rule. Simply put, grammar parsers analyze text input and break it down into its smallest distinguishable parts before assembling those parts into larger rules based on their relationships.
Lezer’s documentation nicely describes the grammar vocabulary: “A grammar is a collection of rules, which define terms. Terms can be either tokens, in which case they directly match a piece of input text, or nonterminals, which match expressions made up of other terms.”
The grammar for this search defines a filter type token for each available filter.
These tokens are combined with the colon and the filter value token to produce the filter term. The filter term defined in the Lezer grammar is as follows:
<Filter Type>:<Filter Value>
Additionally, the alert rules search grammar supports free-form search. The user can type anything, and the grammar will interpret that as words that will be used as a query to filter alert rules by name. This behavior gives users more flexibility when defining their search queries.
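The combination of filter terms and free-form words can be illustrated with a small hand-written tokenizer. This is a sketch of the behavior described above, not the Lezer grammar itself; the `FILTER_TYPES` set, `parse_query`, and `name_matches` are hypothetical, and treating unknown `key:value` terms as free-form text is an assumption.

```python
import re
from typing import List, NamedTuple, Tuple


class ParsedQuery(NamedTuple):
    filters: List[Tuple[str, str]]  # (filter type, value) pairs in input order
    free_form: List[str]            # words used to match rule names


# Hypothetical subset of filter types; the real grammar defines its own tokens.
FILTER_TYPES = {"rule", "state", "label"}

# Alternatives, tried in order: filter term, quoted phrase, bare word.
_TOKEN_RE = re.compile(r'(\w+):(?:"([^"]*)"|([^\s"]+))|"([^"]*)"|(\S+)')


def parse_query(query: str) -> ParsedQuery:
    filters: List[Tuple[str, str]] = []
    words: List[str] = []
    for m in _TOKEN_RE.finditer(query):
        key, quoted, bare, phrase, _word = m.groups()
        if key in FILTER_TYPES:
            filters.append((key, quoted if quoted is not None else bare))
        elif phrase is not None:
            words.append(phrase)
        else:
            # Bare words and unknown "key:value" terms fall back to free-form text.
            words.append(m.group(0))
    return ParsedQuery(filters, words)


def name_matches(rule_name: str, free_form: List[str]) -> bool:
    """A rule matches when every free-form word occurs in its name."""
    lowered = rule_name.lower()
    return all(word.lower() in lowered for word in free_form)


parsed = parse_query('rule:"High CPU usage" state:firing cpu load')
```

Here `parsed.filters` holds the two filter terms and `parsed.free_form` holds `cpu` and `load`, which `name_matches` then checks against each rule name.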
Why not regex?
Regular expressions (regex) can be powerful, but they have their limitations and may not always be the best solution for every scenario. For instance, complex regex expressions can be challenging to comprehend and maintain, and tracking the context of the interpreted expression can be complicated.
That’s where grammar parsers come into play. They provide a syntax tree built from terminals and non-terminals, and more flexible ways to express your intentions, reducing the burden of regex. Lezer, the parser system we use in Grafana, generates the parser from the grammar by compiling it into parse tables, so none of the matching logic has to be maintained by hand.
Reusable grammar for other filters
Building query syntax and a parser takes effort. At some point, we started wondering whether we could reuse the parser for other filters.
Digging into the Lezer docs, the dialects feature appeared promising to us. With a few changes in the grammar, we built a parser in which we can conditionally turn certain expressions on and off. This improvement will allow us to effortlessly provide a better search experience in other parts of Grafana Alerting in the future.
UI connection to search
Sync UI toggles and search expressions
Having rich filtering capabilities is one thing, but making them easily discoverable and not overwhelming is a different story.
To keep the UI clean and simple, we provided UI components only for a subset of all available filters. This means we needed to connect the state between the UI filter toggles and the search query expression.
To solve this problem, we implemented two-way filter state binding. The filter state can be created from the search query, and the search query can be built up from the filter state.
When the user switches a filter toggle in the UI, the filter state changes, and so does the search query. Correspondingly, when the query input changes in the search field, the change is reflected in the filter state as well as in the UI elements.
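The two-way binding can be sketched as a pair of functions that convert between a query string and a filter state object. The `FilterState` shape and the `state_from_query` and `query_from_state` names are assumptions for illustration, and the sketch uses a small regex tokenizer in place of the real parser. For simplicity, it treats any `key:value` term as a filter.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

_TERM_RE = re.compile(r'(\w+):(?:"([^"]*)"|([^\s"]+))|(\S+)')


@dataclass
class FilterState:
    filters: Dict[str, str] = field(default_factory=dict)
    free_form: List[str] = field(default_factory=list)


def state_from_query(query: str) -> FilterState:
    """Query -> state: what the UI toggles read from."""
    state = FilterState()
    for m in _TERM_RE.finditer(query):
        key, quoted, bare, word = m.groups()
        if key is not None:
            state.filters[key] = quoted if quoted is not None else bare
        else:
            state.free_form.append(word)
    return state


def _quote(value: str) -> str:
    # Empty or multi-word values need quotation marks to survive a round trip.
    return f'"{value}"' if not value or re.search(r"\s", value) else value


def query_from_state(state: FilterState) -> str:
    """State -> query: what the search input displays."""
    terms = [f"{key}:{_quote(value)}" for key, value in state.filters.items()]
    return " ".join(terms + state.free_form)


# Simulate a UI toggle: read the state, change one filter, write the query back.
state = state_from_query('rule:"High CPU usage" state:firing disk')
state.filters["state"] = "pending"
updated_query = query_from_state(state)
```

The property worth preserving is the round trip: converting a state to a query and back should yield the same state, otherwise the toggles and the text field drift apart.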
Stable order of filters
One more thing we wanted to implement was keeping the order of filters stable. Imagine the situation when you have already typed a few expressions into the search input, something like this:
Here, we see three filters followed by a free-form text. Now, suppose you changed the state filter from the UI toggle. The state filter preserves its position in the search query.
To do this, we parse the existing query before building a new search query from the UI state. We then extract the order of filters and apply the updated filters in the same order as in the original query. This makes the UI more predictable and the new search experience easier to use.
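A minimal sketch of this ordering step follows, under a few stated assumptions: the `filter_order` and `rebuild_query` helpers are hypothetical, filters that are new to the query are appended after the existing ones, free-form words are kept at the end, and the filter names and values in the example are made up.

```python
import re
from typing import Dict, List

_TERM_RE = re.compile(r'(\w+):(?:"([^"]*)"|([^\s"]+))|(\S+)')


def filter_order(query: str) -> List[str]:
    """Filter types in the order they first appear in the query."""
    order: List[str] = []
    for m in _TERM_RE.finditer(query):
        key = m.group(1)
        if key is not None and key not in order:
            order.append(key)
    return order


def free_form_words(query: str) -> List[str]:
    return [m.group(4) for m in _TERM_RE.finditer(query) if m.group(4) is not None]


def _quote(value: str) -> str:
    return f'"{value}"' if not value or re.search(r"\s", value) else value


def rebuild_query(previous_query: str, filters: Dict[str, str]) -> str:
    """Build a query from UI state while keeping the previous filter order."""
    order = filter_order(previous_query)
    # Known filters keep their position; filters new to the query go last.
    keys = [k for k in order if k in filters] + [k for k in filters if k not in order]
    terms = [f"{k}:{_quote(filters[k])}" for k in keys]
    return " ".join(terms + free_form_words(previous_query))


previous = "rule:cpu state:firing type:alerting usage"
# The UI state may list filters in any order; only the "state" value changed.
ui_filters = {"type": "alerting", "state": "pending", "rule": "cpu"}
rebuilt = rebuild_query(previous, ui_filters)
```

With these inputs, `rebuilt` keeps `rule`, `state`, and `type` in their original positions and only the value of the `state` filter changes.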
Learn more about Grafana Alerting
In conclusion, managing multiple alert rules in Grafana Alerting can be challenging, but with the new filtering capabilities, our users can efficiently manage complex alerting workflows. The new Grafana Alerting search engine’s flexibility ensures that you can find the information you need quickly and accurately. Ultimately, with Grafana Alerting, you can improve system reliability and uptime by addressing potential issues promptly.
To learn more, go to our Grafana Alerting documentation or visit our Grafana Incident & Response Management page. You can also watch the introductory “Alerting in Grafana 9” session from GrafanaCON 2022 on demand now.