CloudWatch integration for Grafana Cloud

The CloudWatch integration enables you to quickly pull CloudWatch metrics into Grafana Cloud. The integration also provides a number of prebuilt dashboards to help you monitor your Amazon Web Services (AWS) resources. No agent is required, and you can create multiple configurations, called scrape jobs, to organize your data.

CloudWatch integration vs CloudWatch data source

Grafana Cloud offers two solutions for visualizing your CloudWatch metrics: the integration and the data source. The data source allows you to keep your data in CloudWatch and build dashboards, rules, and alerts without pulling the data into Grafana Cloud. The integration continuously pulls data from CloudWatch and pushes it to your Grafana Cloud Hosted Metrics instance. The integration might be a better fit for you if you want to use PromQL to query your metric data instead of learning the CloudWatch query language.

This page only covers the integration, which is managed via Integrations and Connections (lightning bolt icon) in your Grafana Cloud instance. If you are not using the integration and are looking for documentation on the data source, see https://grafana.com/docs/grafana/latest/datasources/aws-cloudwatch/.

Install CloudWatch integration for Grafana Cloud

  1. In your Grafana instance, click Integrations and Connections (lightning bolt icon).

  2. Click the CloudWatch Metrics tile and follow the installation instructions.

Configure scrape jobs

You can create scrape job configurations automatically, using either CloudFormation or Terraform as described below, or configure them manually.

Automatically configure scrape jobs using CloudFormation

Scrape jobs can be named and connected to a specific AWS CloudWatch account. Each scrape job contains a number of services available to scrape. For example, you can create a job that scrapes metrics from your EC2 instances from a specific AWS account.

  1. In the CloudWatch Metrics tile, click Add scrape job.

  2. Select Create Automatically in the first step of creating a new AWS role.

  3. Follow the steps to create an IAM role using CloudFormation.

  4. In the scrape job configuration UI, enter the ARN from your AWS IAM role in the scrape job field.

  5. Select relevant regions.

  6. Test the connection.

  7. Name the scrape job and select the services to import data from.

  8. Click Configure integration to create the scrape job.

    You’ll see a success page and can navigate to the dashboards that have been installed.

Automatically configure scrape jobs using Terraform

You’ll find a Terraform snippet in this section that can be used to provision the IAM role needed to create the scrape jobs.

The input variables are:

  • external_id: your Grafana Cloud identifier used for security purposes.

  • iam_role_name: customizable name of the IAM role used by Grafana for the CloudWatch integration. The default value in the snippet below is GrafanaLabsCloudWatchIntegration.

The output value is:

  • role_arn: the IAM role ARN you need to use when creating the scrape job.

To run the Terraform file:

  1. Configure the AWS CLI.

  2. Copy this snippet into your Terraform file:

    terraform {
      required_providers {
        aws = {
          source  = "hashicorp/aws"
          version = "~> 3.0"
        }
      }
    }
    
    locals {
      grafana_account_id = "008923505280"
    }
    
    variable "external_id" {
      type        = string
      description = "This is your Grafana Cloud identifier and is used for security purposes."
    
      validation {
        condition     = length(var.external_id) > 0
        error_message = "ExternalID is required."
      }
    }
    
    variable "iam_role_name" {
      type        = string
      default     = "GrafanaLabsCloudWatchIntegration"
      description = "Customize the name of the IAM role used by Grafana for the CloudWatch integration."
    }
    
    data "aws_iam_policy_document" "trust_grafana" {
      statement {
        effect = "Allow"
    
        principals {
          type        = "AWS"
          identifiers = ["arn:aws:iam::${local.grafana_account_id}:root"]
        }
    
        actions = ["sts:AssumeRole"]
        condition {
          test     = "StringEquals"
          variable = "sts:ExternalId"
          values   = [var.external_id]
        }
      }
    }
    
    resource "aws_iam_role" "grafana_labs_cloudwatch_integration" {
      name        = var.iam_role_name
      description = "Role used by Grafana CloudWatch integration."
    
      # Allow Grafana Labs' AWS account to assume this role.
      assume_role_policy = data.aws_iam_policy_document.trust_grafana.json
    
      # This policy allows the role to discover metrics via tags and export them.
      inline_policy {
        name = var.iam_role_name
        policy = jsonencode({
          Version = "2012-10-17"
          Statement = [
            {
              Effect = "Allow"
              Action = [
                "tag:GetResources",
                "cloudwatch:GetMetricData",
                "cloudwatch:GetMetricStatistics",
                "cloudwatch:ListMetrics"
              ]
              Resource = "*"
            }
          ]
        })
      }
    }
    
    output "role_arn" {
      value       = aws_iam_role.grafana_labs_cloudwatch_integration.arn
      description = "The ARN for the role created, copy this into Grafana Cloud installation."
    }
    
  3. Run the terraform apply command in one of the following ways:

  • By setting variables directly on the CLI:

    terraform apply \
       -var="external_id=<your external ID>" \
       -var="iam_role_name=GrafanaCloudWatchIntegration"
    
  • By creating a tfvars file:

    <your-tfvars-file>.tfvars

    Add the following text:

    external_id="<your external ID>"
    iam_role_name="GrafanaCloudWatchIntegration"
    

    Run the following command:

    terraform apply -var-file="<your-tfvars-file>.tfvars"
    

Once the terraform apply command has finished creating the IAM role, it outputs your role_arn. For example:

role_arn = "arn:aws:iam::<yourAWSAccountID>:role/<iam_role_name>"

Use the role_arn in the next step of the scrape job creation.
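
Before pasting the value into the scrape job form, it can help to sanity-check its shape. The following Python sketch checks that a string looks like the role_arn output shown above. The helper name looks_like_role_arn is hypothetical, and the pattern assumes the standard aws partition, a 12-digit account ID, and a role created without a path, so treat it as a quick check rather than a full ARN validator.

```python
import re

# Assumed shape: arn:aws:iam::<12-digit account ID>:role/<role name>.
# Roles created with a path, or in other partitions such as aws-cn, will not match.
ROLE_ARN_PATTERN = re.compile(r"arn:aws:iam::\d{12}:role/[\w+=,.@-]{1,64}")


def looks_like_role_arn(value: str) -> bool:
    """Return True if value has the shape of an IAM role ARN (hypothetical helper)."""
    return ROLE_ARN_PATTERN.fullmatch(value.strip()) is not None


# The account ID below is a placeholder, not a real account.
print(looks_like_role_arn("arn:aws:iam::123456789012:role/GrafanaCloudWatchIntegration"))
```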

Manually configure scrape jobs

Please note that we recommend using automation as a best practice. Creating the role in the AWS IAM console requires many more steps.

  1. Open the CloudWatch integration configuration and click Add scrape job.

  2. Select Manual and create a new role in your AWS IAM console.

Configure your AWS settings

  1. Click the link to open the AWS IAM console and do the following:

  2. In Roles, click Create role.

  3. Choose Another AWS account.

  4. In Account ID, enter the Grafana AWS account ID shown in the scrape job configuration.

  5. Select Require external ID and enter the Grafana external ID shown in the scrape job configuration.

  6. Click Next: Permissions.

  7. Click Create policy.

  8. Go to the JSON section. Overwrite the existing code with the code provided in the Grafana Cloud instructions.

  9. At the bottom of each screen, click Next: Tags > Next: Review > Create policy.

  10. Return to the scrape job configuration UI and do the following:

    • Paste the ARN from your AWS IAM role in the scrape job field.
    • Select relevant regions.
    • Test the connection.
    • Name the scrape job and select the services to import data from.
    • Click Configure integration to create the scrape job.

    You’ll see a success page and can navigate to the dashboards that have been installed.
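
For orientation, the permissions the role needs can be sketched from the inline policy in the Terraform snippet on this page. The Python example below builds that same policy document and prints it as JSON. It is an illustration only: it assumes the code shown in the Grafana Cloud instructions is equivalent, and if the two differ, use the code from the instructions.

```python
import json

# Actions copied from the inline policy in the Terraform snippet on this page.
# Assumption: the policy shown in the Grafana Cloud instructions is equivalent.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "tag:GetResources",
                "cloudwatch:GetMetricData",
                "cloudwatch:GetMetricStatistics",
                "cloudwatch:ListMetrics",
            ],
            "Resource": "*",
        }
    ],
}

# Print the document in the form expected by the JSON tab of the policy editor.
print(json.dumps(permissions_policy, indent=2))
```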

Dashboards

After you have successfully configured the CloudWatch integration, prebuilt dashboards will be installed in your Grafana instance to help you monitor your AWS services.

Managing your integration

After you’ve successfully configured a scrape job, no other management is needed. Grafana Cloud will manage the scraping of metrics from CloudWatch into Grafana Cloud.

You can view, edit, or delete your existing scrape jobs at any time by navigating to the integrations management page via the Integrations and Connections button (lightning bolt icon) on the left-hand side and selecting the CloudWatch Metrics tile.

Services and metrics captured by Grafana CloudWatch integration

Services

The CloudWatch integration allows you to pull in metrics from the following AWS services:

  • Amazon Elastic Block Store (Amazon EBS)
  • EC2
  • Lambda
  • RDS
  • S3

Note: You must add tags to AWS resources so Grafana Cloud can discover their metrics. For more information, see the AWS tagging documentation.

Metrics

Below is a list of the metrics per service that are automatically written to your Grafana Cloud instance when you select a service to connect to. The metrics are named using the following convention: aws_servicename_metricname_statistic. For example, the time series that measures the average VolumeTotalReadTime for Amazon Elastic Block Store (EBS) is named aws_ebs_volume_total_read_time_average.
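
The naming convention can be sketched as a small Python function. This is an illustration, not the integration's actual code: the function name prometheus_name and the exact word-splitting rule (an underscore wherever a lowercase letter or digit is followed by an uppercase letter) are assumptions. Only the EBS example above is confirmed by this page, so names for metrics with acronyms, such as CPUUtilization, may differ from what this sketch produces.

```python
import re


def prometheus_name(service: str, metric: str, statistic: str) -> str:
    """Build a series name following aws_servicename_metricname_statistic.

    Hypothetical helper: the word-splitting rule below is an assumption.
    """
    # Insert an underscore wherever a lowercase letter or digit is followed
    # by an uppercase letter, then lowercase the whole metric name.
    snake_metric = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", metric).lower()
    return f"aws_{service.lower()}_{snake_metric}_{statistic.lower()}"


print(prometheus_name("ebs", "VolumeTotalReadTime", "Average"))
# aws_ebs_volume_total_read_time_average
```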

Amazon Elastic Block Store (scraped every 5 minutes)

  • VolumeReadBytes (Sum)
  • VolumeWriteBytes (Sum)
  • VolumeReadOps (Average)
  • VolumeWriteOps (Average)
  • VolumeTotalReadTime (Average)
  • VolumeTotalWriteTime (Average)
  • VolumeIdleTime (Average)
  • VolumeQueueLength (Average)
  • VolumeThroughputPercentage (Average)
  • VolumeConsumedReadWriteOps (Average)
  • BurstBalance (Average)

EC2 (scraped every 5 minutes)

  • CPUUtilization (Maximum)
  • NetworkIn (Average, Sum)
  • NetworkOut (Average, Sum)
  • NetworkPacketsIn (Sum)
  • NetworkPacketsOut (Sum)
  • DiskReadBytes (Sum)
  • DiskWriteBytes (Sum)
  • DiskReadOps (Sum)
  • DiskWriteOps (Sum)
  • StatusCheckFailed (Sum)
  • StatusCheckFailed_Instance (Sum)
  • StatusCheckFailed_System (Sum)

Lambda (scraped every 5 minutes)

  • Invocations (Sum)
  • Errors (Sum)
  • Throttles (Sum)
  • Duration (Maximum, Minimum, p90)

RDS (scraped every 5 minutes)

  • CPUUtilization (Maximum)
  • DatabaseConnections (Sum)
  • FreeableMemory (Average)
  • FreeStorageSpace (Average)
  • ReadThroughput (Average)
  • WriteThroughput (Average)
  • ReadLatency (Maximum)
  • WriteLatency (Maximum)
  • ReadIOPS (Average)
  • WriteIOPS (Average)

S3 Storage (scraped every 6 hours)

  • NumberOfObjects (Average)
  • BucketSizeBytes (Average)

S3 Request (scraped every 10 minutes)

  • AllRequests (Sum)
  • 4xxErrors (Sum)
  • TotalRequestLatency (p95)

CloudWatch Metrics vs Prometheus Metrics

This integration treats each CloudWatch metric as a Prometheus gauge, a value which can increase or decrease. Prometheus views a gauge sample as a point-in-time representation of the value, which is not altered afterwards. If the value changes, the new value is recorded as a new sample at a new sample time.

CloudWatch, in contrast, aggregates samples over a period of time. The latest period of data keeps being updated as new samples arrive, but its timestamp does not change. As an example, using a 5-minute period, data aggregated from 12:00 to 12:05 is exposed with a timestamp of 12:00 and keeps changing until 12:05, at which point the new metric period of 12:05 to 12:10 starts.

How do we handle this? We use a period which matches the scrape interval and round metric requests to ensure we only gather metrics which are not actively being updated. The data is exposed at the timestamp it was scraped instead of the original CloudWatch timestamp. This results in Grafana dashboards being shifted into the future when compared to CloudWatch. If we look at the 5-minute period 12:00-12:05, represented as 12:00 in CloudWatch, that value could appear at 12:07 in Grafana. Since we do not guarantee the exact time a scrape occurs, just the interval, the 12:00 CloudWatch value can end up with a timestamp anywhere between 12:05 and 12:09.
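
The rounding described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the integration's implementation: the function name last_complete_period is hypothetical, and periods are assumed to be aligned to the Unix epoch, which makes 5-minute periods start at :00, :05, :10, and so on.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def last_complete_period(scrape_time: datetime, period: timedelta):
    """Return (start, end) of the most recent period that is no longer changing.

    Hypothetical helper: scrape_time must be timezone-aware, and periods are
    assumed to be aligned to the Unix epoch.
    """
    # Round the scrape time down to the nearest period boundary.
    whole_periods = (scrape_time - EPOCH) // period
    end = EPOCH + whole_periods * period
    return end - period, end


# A scrape at 12:07 with a 5-minute period requests the 12:00-12:05 window.
# CloudWatch labels that window 12:00; the sample is stored at 12:07.
scrape = datetime(2024, 1, 1, 12, 7, tzinfo=timezone.utc)
start, end = last_complete_period(scrape, timedelta(minutes=5))
print(start.time(), end.time(), scrape.time())
```

In this sketch, any scrape between 12:05 and 12:09 maps to the same 12:00-12:05 window, which matches the shift described above.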

Cost

By connecting your AWS CloudWatch Metrics to Grafana Cloud you might incur charges. For more information, use the following links:

  • Pulling in CloudWatch metrics increases the number of active series that your Grafana Cloud account uses. For the usage included in each Cloud tier, see Active series and dpm usage and Cloud tier pricing.
  • The CloudWatch integration uses the ListMetrics and GetMetricData CloudWatch API calls to list and retrieve metrics, and the GetResources call of the Resource Groups Tagging API to discover resources.
  • Each service configured in a job is scraped independently. The call counts below assume a single scrape of Lambda:
    • GetResources is called 1 time.
    • ListMetrics is called 4 times, once per metric.
    • Assuming ListMetrics returns 5 values for each metric:
      • GetMetricData is called a single time requesting 30 metrics, 1 per metric + statistic combination.
  • See CloudWatch Pricing for cost information associated with CloudWatch APIs.
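
The Lambda call counts above can be reproduced with a short Python calculation. This is a back-of-the-envelope sketch: the function name estimate_calls is hypothetical, the metric list is taken from the Lambda section of this page, and the single GetMetricData call per scrape is the assumption stated in the list above rather than a guarantee.

```python
# Lambda metrics and statistics as listed in the Metrics section of this page.
LAMBDA_METRICS = {
    "Invocations": ["Sum"],
    "Errors": ["Sum"],
    "Throttles": ["Sum"],
    "Duration": ["Maximum", "Minimum", "p90"],
}


def estimate_calls(metrics: dict, values_per_metric: int) -> dict:
    """Estimate API calls for one scrape of one service (hypothetical helper)."""
    # One metric + statistic combination per listed statistic.
    combinations = sum(len(statistics) for statistics in metrics.values())
    return {
        "GetResources": 1,             # one discovery call per service
        "ListMetrics": len(metrics),   # once per metric
        "GetMetricData": 1,            # assumed: a single batched call
        "metrics_requested": combinations * values_per_metric,
    }


print(estimate_calls(LAMBDA_METRICS, values_per_metric=5))
# {'GetResources': 1, 'ListMetrics': 4, 'GetMetricData': 1, 'metrics_requested': 30}
```

The 30 requested metrics come from 6 metric + statistic combinations (3 for Duration, 1 each for the other three metrics) multiplied by the 5 values returned for each metric.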