Identify operational issues quickly by using Grafana and Amazon CloudWatch Metrics Insights

15 Dec 2021

This blog post was co-written by Grafana Labs Software Engineer Erik Sundell, who mainly focuses on integrations between Grafana and cloud services, and Omur Kirikci, a Senior Product Manager for Amazon CloudWatch responsible for the Metrics domain. Kirikci is passionate about creating new products and strives for success by ensuring customer satisfaction. Outside of work, he enjoys being outdoors and hiking, spending time with his family, tasting different cuisines, and watching soccer with friends.

Amazon CloudWatch has recently launched Metrics Insights (Preview), a fast, flexible, SQL-based query engine that enables you to identify trends and patterns across millions of operational metrics in real time. With Metrics Insights, you can easily query and analyze your metrics to gain better visibility into the health and performance of your infrastructure and large-scale applications. This capability is now also available in Open Source Grafana, and it will soon be available in Amazon Managed Grafana.

For the last two years, Grafana users have been analyzing CloudWatch metrics using SEARCH expressions. Grafana 8.3 now features the Metrics Insights integration, which was completed in a partnership between AWS and Grafana Labs. It extends Grafana's metric analytics capabilities with flexible querying and on-the-fly metric aggregations so that you can respond promptly to operational problems. Metrics Insights also works with Grafana Alerting (introduced in Grafana 8.0), allowing users to create alerts powered by CloudWatch Metrics Insights that quickly target problems in their infrastructure as they happen.

With Metrics Insights, you can slice and dice your operational metrics by their dimensions to dive deep and identify issues down to the finest detail. For example, you can analyze thousands of Amazon EC2 instances by CPU utilization to troubleshoot an underperforming application. You can group your metrics by InstanceId to narrow down your analysis and identify failing instances rapidly. Once the failing instances are isolated, you can recover the application by rebooting them. Moreover, you can use your queries to create powerful visualizations that stay up to date as resources are deployed or shut down, helping you proactively monitor and identify issues quickly.

Metrics Insights comes with a standard SQL query language, giving you the flexibility to group your metrics in real time based on your use cases. You can use a WHERE clause to narrow down your analysis and an ORDER BY clause to sort your results. For example, ORDER BY with DESC sorts your time series in descending order, so that the series with the highest data points are returned first. Likewise, the LIMIT clause caps the number of results returned, which lets you run “TOP N” type queries.

Using the Amazon CloudWatch plugin to query CloudWatch Metrics Insights

CloudWatch Metrics Insights is available through the GetMetricData API, which means that if you already have a CloudWatch data source plugin set up in Grafana, there’s no need for additional configuration.
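To make the GetMetricData connection concrete, here is a minimal Python sketch of the kind of request that carries a Metrics Insights query. It only builds the request arguments and does not call AWS. The helper name, the query id "q1", the 300-second period, and the one-hour time window are illustrative assumptions, not values taken from the Grafana plugin.

```python
from datetime import datetime, timedelta, timezone


def build_get_metric_data_request(query, period_seconds=300, minutes_back=60, now=None):
    """Build keyword arguments for a GetMetricData call with one Metrics Insights query.

    Hypothetical helper for illustration; nothing is sent to AWS here.
    """
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes_back)
    return {
        "MetricDataQueries": [
            {
                "Id": "q1",  # assumed id; must start with a lowercase letter
                "Expression": query,  # the Metrics Insights SQL goes in Expression
                "Period": period_seconds,  # granularity of returned data points
            }
        ],
        "StartTime": start,
        "EndTime": end,
    }


request = build_get_metric_data_request(
    'SELECT AVG(CPUUtilization) FROM SCHEMA("AWS/EC2", InstanceId)',
    now=datetime(2021, 12, 15, 12, 0, tzinfo=timezone.utc),
)

# With boto3 installed and AWS credentials configured, the request could be sent as:
#   boto3.client("cloudwatch").get_metric_data(**request)
```

Grafana issues requests like this for you, so the sketch is only meant to show why an existing CloudWatch data source needs no extra setup.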

To use Metrics Insights in the Amazon CloudWatch plugin in Grafana, you must select the Metric Query type. The Metric Query type has two different modes — a Builder mode and a Code editor mode.

CloudWatch Metrics Insights uses a dialect of SQL. The syntax is as follows: 

SELECT FUNCTION(MetricName)
    FROM Namespace | SCHEMA(...)
    [ WHERE labelKey OPERATOR labelValue [AND|...]]
    [ GROUP BY labelKey [, ...]]
    [ ORDER BY FUNCTION() [DESC | ASC] ]
    [ LIMIT number]
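
As a rough illustration of how the clauses in this syntax fit together, the following Python sketch assembles a query string clause by clause. The build_query helper and its parameter names are hypothetical, and the load balancer metric and dimension names are assumed for the example; the helper only concatenates text and does not validate the query against CloudWatch.

```python
def build_query(function, metric, namespace, dimensions=(), where=None,
                group_by=(), order_by=None, descending=True, limit=None):
    """Assemble a Metrics Insights query string from its optional clauses."""
    if dimensions:
        # SCHEMA(...) narrows the namespace to metrics with exactly these dimensions.
        source = 'SCHEMA("{}", {})'.format(namespace, ", ".join(dimensions))
    else:
        source = '"{}"'.format(namespace)
    parts = ["SELECT {}({})".format(function, metric), "FROM " + source]
    if where:
        parts.append("WHERE " + where)
    if group_by:
        parts.append("GROUP BY " + ", ".join(group_by))
    if order_by:
        direction = "DESC" if descending else "ASC"
        parts.append("ORDER BY {}() {}".format(order_by, direction))
    if limit is not None:
        parts.append("LIMIT {}".format(limit))
    return " ".join(parts)


# Example: the ten busiest load balancers in one Availability Zone.
query = build_query(
    "SUM", "RequestCount", "AWS/ApplicationELB",
    dimensions=["LoadBalancer", "AvailabilityZone"],
    where="AvailabilityZone = 'us-east-1a'",
    group_by=["LoadBalancer"],
    order_by="SUM",
    limit=10,
)
```

Every clause after FROM is optional, which is why the helper appends each one only when a value is supplied.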

Metrics Insights makes it easy to get started. Even if you don’t know SQL, you can use the visual query builder to browse through your existing metrics and dimensions by selecting the metrics of interest, resource names, and filters to narrow down your analysis. The builder then automatically constructs your SQL queries for you based on your selections.

You can use the code editor in the Code mode of the CloudWatch plugin to type in your raw SQL queries at any time when you need to dig into an issue in more detail. The code editor comes with built-in autocompletion support that provides suggestions for keywords, functions, namespaces, metrics, dimension keys, and dimension values.

The following example demonstrates how to view the five instances with the highest CPU utilization: it selects the average CPU utilization of each instance and sorts the instances by their maximum value in descending order. The autocompleter provides smart suggestions throughout the composition of the query.
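
A query matching that description could look like the string in the sketch below. The small Python function next to it approximates the ordering locally on made-up data, to show what selecting the average while ordering by the maximum means. The instance ids and CPU values are invented for illustration, and the function is a simplification of the ordering behavior, not CloudWatch's implementation.

```python
# The query in the Metrics Insights SQL dialect, as it could be typed in Code mode.
TOP5_CPU_QUERY = (
    'SELECT AVG(CPUUtilization) '
    'FROM SCHEMA("AWS/EC2", InstanceId) '
    'GROUP BY InstanceId '
    'ORDER BY MAX() DESC '
    'LIMIT 5'
)


def top_n_by_max(series_by_instance, n):
    """Approximate GROUP BY InstanceId ORDER BY MAX() DESC LIMIT n on local data.

    Each value is one instance's time series of per-period average CPU values.
    """
    ranked = sorted(
        series_by_instance.items(),
        key=lambda item: max(item[1]),  # MAX() reduces each series to one number
        reverse=True,                   # DESC: highest value first
    )
    return [instance_id for instance_id, _ in ranked[:n]]


# Made-up per-period averages for six hypothetical instances.
cpu_by_instance = {
    "i-aaa": [10.0, 95.0, 12.0],
    "i-bbb": [60.0, 62.0, 61.0],
    "i-ccc": [5.0, 7.0, 6.0],
    "i-ddd": [80.0, 81.0, 79.0],
    "i-eee": [30.0, 35.0, 90.0],
    "i-fff": [20.0, 22.0, 21.0],
}

top5 = top_n_by_max(cpu_by_instance, 5)
```

In this sample, i-aaa ranks first because of a single spike to 95, even though i-ddd is busier on average. Ordering by MAX() surfaces instances with spikes, whereas ordering by AVG() would favor sustained load.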

Grafana template variables allow you to create more interactive and dynamic dashboards. Instead of hardcoding values like namespace, metric, and aggregation functions in your queries, you can use variables in their place. Variables are displayed as dropdown lists at the top of the dashboard. These dropdowns make it easy to change the data displayed in your dashboard. When using the Metric Query type in the CloudWatch plugin, you can use template variables in both the Builder and Code modes.
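
To give a feel for what variable substitution does to a query, the sketch below mimics it with Python's string.Template, which uses a similar $name placeholder style. Grafana performs its own interpolation inside the dashboard, so this is only an analogy. The variable names stat, metric, and namespace are hypothetical dashboard variables.

```python
from string import Template

# A Code mode query with the aggregation function, metric, and namespace
# replaced by (hypothetical) dashboard variables.
query_template = Template(
    'SELECT $stat($metric) FROM SCHEMA("$namespace", InstanceId) '
    "GROUP BY InstanceId ORDER BY MAX() DESC LIMIT 5"
)

# Values a user might pick from the dropdowns at the top of the dashboard.
selected = {"stat": "AVG", "metric": "CPUUtilization", "namespace": "AWS/EC2"}

rendered = query_template.substitute(selected)
```

Choosing a different value in a dropdown corresponds to changing an entry in selected: the query text stays the same and only the substituted values differ.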

To learn more about CloudWatch Metrics Insights or to get started, check out the CloudWatch Metrics Insights documentation and the Grafana documentation.