CloudWatch metrics integration for Grafana Cloud

The CloudWatch integration allows you to scrape AWS CloudWatch metrics without installing the Grafana Agent. You can create multiple configurations, called “scrape jobs”, to separate concerns. Note that the integration can only discover metrics for AWS resources that have tags applied to them. For more information, see the AWS docs.
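Because discovery depends on tags, it can help to check which resources currently carry tags before creating a scrape job. The following is a minimal sketch, not the integration's own discovery logic: it assumes a boto3-style client for the Resource Groups Tagging API (the `GetResources` call this page mentions under Cost), and the helper name `tagged_resource_arns` is hypothetical.

```python
from typing import Any, List


def tagged_resource_arns(client: Any, resource_type: str) -> List[str]:
    """Return ARNs of resources of `resource_type` that carry at least one tag.

    `client` is assumed to behave like a boto3 "resourcegroupstaggingapi"
    client; the helper itself is illustrative and not part of the integration.
    """
    paginator = client.get_paginator("get_resources")
    arns: List[str] = []
    for page in paginator.paginate(ResourceTypeFilters=[resource_type]):
        for mapping in page.get("ResourceTagMappingList", []):
            # GetResources can also return previously tagged resources whose
            # tag list is now empty, so keep only entries that still have tags.
            if mapping.get("Tags"):
                arns.append(mapping["ResourceARN"])
    return arns
```

With boto3 installed and AWS credentials configured, a call might look like `tagged_resource_arns(boto3.client("resourcegroupstaggingapi", region_name="us-east-1"), "ec2:instance")`. Resources missing from the result are candidates for tagging.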

CloudWatch integration vs CloudWatch data source

Grafana Cloud offers two solutions for visualizing your CloudWatch metrics: the integration and the data source. The data source allows you to keep your data in CloudWatch and build dashboards, rules, and alerts without pulling the data into Grafana Cloud. The integration continuously pulls data from CloudWatch and pushes it to your Grafana Cloud Hosted Metrics instance. The integration might be a better fit if you want to query your metric data with PromQL rather than learn the CloudWatch query language.

This page covers only the integration, which is managed via Integrations and Connections (lightning bolt icon) in your Grafana Cloud instance. If you are looking for documentation on the data source instead, see https://grafana.com/docs/grafana/latest/datasources/aws-cloudwatch/.

Install CloudWatch metrics integration for Grafana Cloud

  1. In your Grafana instance, click Integrations and Connections (lightning bolt icon), then click Install integration on the CloudWatch metrics tile.
  2. Click the CloudWatch metrics tile and follow the installation instructions.

Configure scrape jobs

You can create scrape job configurations automatically using two possible alternatives described below, or configure them manually.

Automatically configure scrape jobs using CloudFormation

Scrape jobs can be named and connected to a specific AWS account. Each scrape job covers a set of services to scrape. For example, you can create a job that scrapes metrics from your EC2 instances in a specific AWS account.

  1. In the CloudWatch Metrics tile, click Add scrape job.

  2. Select Create Automatically in the first step of creating a new AWS role.

  3. Follow the steps to create an IAM role for CloudFormation.

  4. In the scrape job configuration UI, enter the ARN from your AWS IAM role in the scrape job field.

  5. Select relevant regions.

  6. Test the connection.

  7. Name the scrape job and select the services to import data from.

  8. Click Configure integration to create the scrape job.

    You’ll see a success page and can navigate to the dashboards that have been installed.

Automatically configure scrape jobs using Terraform

Before you begin, make sure you have the Username / Instance ID for your Grafana Cloud Prometheus. This can be found by clicking on Details in the Prometheus card of the Cloud Portal.

You’ll find a Terraform snippet in this section that can be used to provision the IAM role needed to create the scrape jobs.

The input variables are:

  • external_id: the Username / Instance ID for your Grafana Cloud Prometheus (see above for how to find it). AWS uses an External ID to provide an extra layer of security when giving Grafana access to pull your CloudWatch metrics into Grafana Cloud. To learn more about External IDs, see Why use an external ID? (https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html#external-id-purpose).

  • iam_role_name: customizable name of the IAM role used by Grafana for the CloudWatch integration. The default value in the snippet below is GrafanaLabsCloudWatchIntegration.

The output value is:

  • role_arn: the IAM role ARN you need to use when creating the scrape job.

To run the Terraform file:

  1. Configure the AWS CLI.

  2. Copy this snippet into your Terraform file:

    terraform {
      required_providers {
        aws = {
          source  = "hashicorp/aws"
          version = "~> 3.0"
        }
      }
    }
    locals {
      grafana_account_id = "008923505280"
    }
    variable "external_id" {
      type        = string
      description = "This is your Grafana Cloud identifier and is used for security purposes."
      validation {
        condition     = length(var.external_id) > 0
        error_message = "ExternalID is required."
      }
    }
    variable "iam_role_name" {
      type        = string
      default     = "GrafanaLabsCloudWatchIntegration"
      description = "Customize the name of the IAM role used by Grafana for the CloudWatch integration."
    }
    data "aws_iam_policy_document" "trust_grafana" {
      statement {
        effect = "Allow"
        principals {
          type        = "AWS"
          identifiers = ["arn:aws:iam::${local.grafana_account_id}:root"]
        }
        actions = ["sts:AssumeRole"]
        condition {
          test     = "StringEquals"
          variable = "sts:ExternalId"
          values   = [var.external_id]
        }
      }
    }
    resource "aws_iam_role" "grafana_labs_cloudwatch_integration" {
      name        = var.iam_role_name
      description = "Role used by Grafana CloudWatch integration."
      # Allow Grafana Labs' AWS account to assume this role.
      assume_role_policy = data.aws_iam_policy_document.trust_grafana.json
    
      # This policy allows the role to discover metrics via tags and export them.
      inline_policy {
        name = var.iam_role_name
        policy = jsonencode({
          Version = "2012-10-17"
          Statement = [
            {
              Effect = "Allow"
              Action = [
                "tag:GetResources",
                "cloudwatch:GetMetricData",
                "cloudwatch:GetMetricStatistics",
                "cloudwatch:ListMetrics"
              ]
              Resource = "*"
            }
          ]
        })
      }
    }
    output "role_arn" {
      value       = aws_iam_role.grafana_labs_cloudwatch_integration.arn
      description = "The ARN for the role created, copy this into Grafana Cloud installation."
    }
    
  3. Run the terraform apply command in one of the following ways:

  • Set the variables directly on the command line:

    terraform apply \
       -var="external_id=<your external ID>" \
       -var="iam_role_name=GrafanaCloudWatchIntegration"
    
  • Create a tfvars file:

    <your-tfvars-file>.tfvars

    Add the following text:

    external_id="<your external ID>"
    iam_role_name="GrafanaCloudWatchIntegration"
    

    Run the following command:

    terraform apply -var-file="<your-tfvars-file>.tfvars"
    

Once the terraform apply command has finished creating the IAM Role, it will output your role_arn. For example:

role_arn = "arn:aws:iam::<yourAWSAccountID>:role/<iam_role_name>"

Use the role_arn in the next step of the scrape job creation.
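Before pasting the value into the scrape job form, a quick shape check can catch a truncated copy or a placeholder that was never filled in. This is only a sketch: the helper name `parse_role_arn` is hypothetical, and the pattern assumes a standard IAM role ARN with a 12-digit account ID.

```python
import re
from typing import Tuple

# Assumed shape of an IAM role ARN: arn:<partition>:iam::<12-digit account>:role/<name or path/name>
_ROLE_ARN = re.compile(r"arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):role/([\w+=,.@/-]+)")


def parse_role_arn(role_arn: str) -> Tuple[str, str]:
    """Return (account_id, role_name_with_path) or raise ValueError if the shape is wrong."""
    match = _ROLE_ARN.fullmatch(role_arn.strip())
    if match is None:
        raise ValueError(f"not a well-formed IAM role ARN: {role_arn!r}")
    return match.group(2), match.group(3)
```

For example, `parse_role_arn("arn:aws:iam::123456789012:role/GrafanaCloudWatchIntegration")` returns the account ID and role name, while the literal placeholder string from the example output above raises `ValueError`.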

Manually configure scrape jobs

Please note that we recommend using automation as a best practice. Creating the role in the AWS IAM console requires many more steps.

Before you begin, make sure you have the Username / Instance ID for your Grafana Cloud Prometheus. This can be found by clicking on Details in the Prometheus card of the Cloud Portal.

  1. Open the CloudWatch integration configuration and click Add scrape job.

  2. Select Manual and create a new role in your AWS IAM console.

  3. Click the link to open the AWS IAM console and do the following:

  4. In Roles, click Create role.

  5. Choose AWS Account for Trusted entity type.

  6. Choose Another AWS account.

  7. In Account ID, enter the Grafana AWS account ID shown in the scrape job configuration.

  8. Select Require external ID and enter the Username / Instance ID for your Grafana Cloud Prometheus (see above for how to find it). AWS uses an External ID to provide an extra layer of security when giving Grafana access to pull your CloudWatch metrics into Grafana Cloud. To learn more about External IDs, see Why use an external ID? (https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html#external-id-purpose).

  9. Click Next: Permissions.

  10. Click Create policy.

  11. Go to the JSON section. Overwrite existing code with the code provided in the Grafana Cloud instructions.

  12. At the bottom of each screen, click Next: Tags > Next: Review > Create policy.

  13. Return to the scrape job configuration UI and do the following:

    • Paste the ARN from your AWS IAM role in the scrape job field.
    • Select relevant regions.
    • Test the connection.
    • Name the scrape job and select the services to import data from.
    • Click Configure integration to create the scrape job.

    You’ll see a success page and can navigate to the dashboards that have been installed.
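For orientation, the role created in the manual steps should end up with the same two policy documents that the Terraform snippet earlier on this page defines. The sketch below renders them as JSON. It assumes the manual path mirrors that snippet; the JSON and the account ID shown in the Grafana Cloud instructions remain authoritative, and the function names are hypothetical.

```python
import json

# Account ID taken from the Terraform snippet on this page; if the scrape job
# configuration shows a different value, use that one instead.
GRAFANA_ACCOUNT_ID = "008923505280"


def permissions_policy() -> dict:
    """Permissions that let the role discover resources via tags and read metrics."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "tag:GetResources",
                "cloudwatch:GetMetricData",
                "cloudwatch:GetMetricStatistics",
                "cloudwatch:ListMetrics",
            ],
            "Resource": "*",
        }],
    }


def trust_policy(external_id: str, account_id: str = GRAFANA_ACCOUNT_ID) -> dict:
    """Trust policy that lets the given account assume the role only with the External ID."""
    if not external_id:
        raise ValueError("external_id is required")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


if __name__ == "__main__":
    print(json.dumps(permissions_policy(), indent=2))
    print(json.dumps(trust_policy("<your external ID>"), indent=2))
```

Comparing this output with the role's Permissions and Trust relationships tabs in the IAM console is one way to spot a missing action or a mistyped External ID.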

Scrape Job Resource Limit

When creating a job, you might see the error “You reached the resource limit for a single scrape job. To continue, create several jobs with fewer services or regions.” There is a limit of 1024 resources per scrape job. This limit is intended to ensure metrics can be delivered to your Grafana Cloud instance in a timely manner.

The number of resources is determined by counting the resources for each service in each region. Suppose you hit the limit while setting up a job named cloudwatch that covers EBS, EC2, and Lambda in the us-east-1, us-east-2, eu-central-1, and eu-north-1 regions. You could instead create multiple jobs, such as:

By Service

  • cloudwatch-ebs: EBS in us-east-1, us-east-2, eu-central-1, and eu-north-1
  • cloudwatch-ec2: EC2 in us-east-1, us-east-2, eu-central-1, and eu-north-1
  • cloudwatch-lambda: Lambda in us-east-1, us-east-2, eu-central-1, and eu-north-1

By Region

  • cloudwatch-us-east: EBS, EC2, and Lambda in us-east-1 and us-east-2
  • cloudwatch-eu: EBS, EC2, and Lambda in eu-central-1 and eu-north-1

By Service + Region

  • cloudwatch-us-east-ebs: EBS in us-east-1 and us-east-2
  • cloudwatch-eu-ebs: EBS in eu-central-1 and eu-north-1
  • cloudwatch-us-east-ec2: EC2 in us-east-1 and us-east-2
  • cloudwatch-eu-ec2: EC2 in eu-central-1 and eu-north-1
  • cloudwatch-us-east-lambda: Lambda in us-east-1 and us-east-2
  • cloudwatch-eu-lambda: Lambda in eu-central-1 and eu-north-1
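To decide which of these splits is enough, you can total the resources each job would cover and compare against the 1024 limit. The sketch below does that for the three strategies above. The resource counts are made up for illustration, and it assumes a job at exactly 1024 resources is still accepted.

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple

RESOURCE_LIMIT = 1024  # per-job limit stated above

# Hypothetical resource counts per (service, region); replace with your own numbers.
COUNTS: Dict[Tuple[str, str], int] = {
    ("ebs", "us-east-1"): 400, ("ebs", "us-east-2"): 300,
    ("ebs", "eu-central-1"): 200, ("ebs", "eu-north-1"): 100,
    ("ec2", "us-east-1"): 300, ("ec2", "us-east-2"): 200,
    ("ec2", "eu-central-1"): 150, ("ec2", "eu-north-1"): 50,
    ("lambda", "us-east-1"): 100, ("lambda", "us-east-2"): 80,
    ("lambda", "eu-central-1"): 60, ("lambda", "eu-north-1"): 40,
}

# Region grouping used by the job names in the examples above.
REGION_GROUP = {"us-east-1": "us-east", "us-east-2": "us-east",
                "eu-central-1": "eu", "eu-north-1": "eu"}

# Each strategy maps a (service, region) pair to the job that would scrape it.
STRATEGIES: Dict[str, Callable[[str, str], str]] = {
    "single job": lambda service, region: "cloudwatch",
    "by service": lambda service, region: f"cloudwatch-{service}",
    "by region": lambda service, region: f"cloudwatch-{REGION_GROUP[region]}",
    "by service + region":
        lambda service, region: f"cloudwatch-{REGION_GROUP[region]}-{service}",
}


def job_totals(counts: Dict[Tuple[str, str], int],
               job_name: Callable[[str, str], str]) -> Dict[str, int]:
    """Sum the resource counts that each job would cover."""
    totals: Dict[str, int] = defaultdict(int)
    for (service, region), n in counts.items():
        totals[job_name(service, region)] += n
    return dict(totals)


def fits(totals: Dict[str, int], limit: int = RESOURCE_LIMIT) -> bool:
    """True when every job stays at or under the limit."""
    return all(n <= limit for n in totals.values())


if __name__ == "__main__":
    for label, job_name in STRATEGIES.items():
        totals = job_totals(COUNTS, job_name)
        print(label, "fits" if fits(totals) else "exceeds limit", totals)
```

With these sample numbers, splitting by region alone still leaves the us-east job over the limit, while splitting by service or by service + region fits.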

Services

The CloudWatch integration supports the following services and metrics:

  • AWS/AmazonMQ
    • AckRate: Average
    • BurstBalance: Average
    • ChannelCount: Average
    • ConfirmRate: Average
    • ConnectionCount: Average
    • ConsumerCount: Average
    • CpuCreditBalance: Average
    • CpuUtilization: Average
    • CurrentConnectionsCount: Average
    • DequeueCount: Average
    • DispatchCount: Average
    • EnqueueCount: Average
    • EnqueueTime: Average
    • EstablishedConnectionsCount: Average
    • ExchangeCount: Average
    • ExpiredCount: Average
    • HeapUsage: Average
    • InactiveDurableTopicSubscribersCount: Average
    • InFlightCount: Average
    • JobSchedulerStorePercentUsage: Average
    • JournalFilesForFastRecovery: Average
    • JournalFilesForFullRecovery: Average
    • MemoryUsage: Average
    • MessageCount: Average
    • MessageReadyCount: Average
    • MessageUnacknowledgedCount: Average
    • NetworkIn: Average
    • NetworkOut: Average
    • OpenTransactionCount: Average
    • ProducerCount: Average
    • PublishRate: Average
    • QueueCount: Average
    • QueueSize: Average
    • RabbitMQDiskFree: Average
    • RabbitMQDiskFreeLimit: Average
    • RabbitMQFdUsed: Average
    • RabbitMQMemLimit: Average
    • RabbitMQMemUsed: Average
    • ReceiveCount: Average
    • StorePercentUsage: Average
    • SystemCpuUtilization: Average
    • TempPercentUsage: Average
    • TotalConsumerCount: Average
    • TotalDequeueCount: Average
    • TotalEnqueueCount: Average
    • TotalMessageCount: Average
    • TotalProducerCount: Average
    • VolumeReadOps: Average
    • VolumeWriteOps: Average
  • AWS/DynamoDB
    • AccountMaxReads: Average
    • AccountMaxTableLevelReads: Average
    • AccountMaxTableLevelWrites: Average
    • AccountMaxWrites: Average
    • AccountProvisionedReadCapacityUtilization: Average
    • AccountProvisionedWriteCapacityUtilization: Average
    • AgeOfOldestUnreplicatedRecord: Average
    • ConditionalCheckFailedRequests: Average
    • ConsumedChangeDataCaptureUnits: Average
    • ConsumedReadCapacityUnits: Average
    • ConsumedWriteCapacityUnits: Average
    • FailedToReplicateRecordCount: Average
    • MaxProvisionedTableWriteCapacityUtilization: Average
    • OnlineIndexConsumedWriteCapacity: Average
    • OnlineIndexPercentageProgress: Average
    • OnlineIndexThrottleEvents: Average
    • PendingReplicationCount: Average
    • ProvisionedReadCapacityUnits: Average
    • ProvisionedWriteCapacityUnits: Average
    • ReadThrottleEvents: Average
    • ReplicationLatency: Average
    • ReturnedBytes: Average
    • ReturnedItemCount: Average
    • ReturnedRecordsCount: Average
    • SuccessfulRequestLatency: Average
    • SystemErrors: Average
    • TimeToLiveDeletedItemCount: Average
    • ThrottledPutRecordCount: Average
    • ThrottledRequests: Average
    • TransactionConflict: Average
    • UserErrors: Average
    • WriteThrottleEvents: Average
  • AWS/ECS
    • CPUReservation: Average
    • CPUUtilization: Average
    • GPUReservation: Average
    • MemoryReservation: Average
    • MemoryUtilization: Average
  • AWS/ES
    • 2xx, 3xx, 4xx, 5xx: Average
    • ADAnomalyDetectorsIndexStatus.red: Average
    • ADAnomalyDetectorsIndexStatusIndexExists: Average
    • ADAnomalyResultsIndexStatus.red: Average
    • ADAnomalyResultsIndexStatusIndexExists: Average
    • ADExecuteFailureCount: Average
    • ADExecuteRequestCount: Average
    • ADHCExecuteFailureCount: Average
    • ADHCExecuteRequestCount: Average
    • ADModelsCheckpointIndexStatus.red: Average
    • ADModelsCheckpointIndexStatusIndexExists: Average
    • ADPluginUnhealthy: Average
    • AlertingDegraded: Average
    • AlertingIndexExists: Average
    • AlertingIndexStatus.green: Average
    • AlertingIndexStatus.red: Average
    • AlertingIndexStatus.yellow: Average
    • AlertingNodesNotOnSchedule: Average
    • AlertingNodesOnSchedule: Average
    • AlertingScheduledJobEnabled: Average
    • AsynchronousSearchCancelled: Average
    • AsynchronousSearchCompletionRate: Average
    • AsynchronousSearchFailureRate: Average
    • AsynchronousSearchInitializedRate: Average
    • AsynchronousSearchMaxRunningTime: Average
    • AsynchronousSearchPersistFailedRate: Average
    • AsynchronousSearchPersistRate: Average
    • AsynchronousSearchRejected: Average
    • AsynchronousSearchRunningCurrent: Average
    • AsynchronousSearchStoreHealth: Average
    • AsynchronousSearchStoreSize: Average
    • AsynchronousSearchStoredResponseCount: Average
    • AsynchronousSearchSubmissionRate: Average
    • AutomatedSnapshotFailure: Average
    • CPUCreditBalance: Average
    • CPUUtilization: Average
    • ClusterIndexWritesBlocked: Average
    • ClusterStatus.green: Average
    • ClusterStatus.red: Average
    • ClusterStatus.yellow: Average
    • ClusterUsedSpace: Average
    • ColdStorageSpaceUtilization: Average
    • ColdToWarmMigrationFailureCount: Average
    • ColdToWarmMigrationLatency: Average
    • ColdToWarmMigrationQueueSize: Average
    • ColdToWarmMigrationSuccessCount: Average
    • CoordinatingWriteRejected: Average
    • CrossClusterInboundRequests: Average
    • CrossClusterOutboundConnections: Average
    • CrossClusterOutboundRequests: Average
    • DeletedDocuments: Average
    • DiskQueueDepth: Average
    • FollowerCheckPoint: Average
    • FreeStorageSpace: Average
    • HotStorageSpaceUtilization: Average
    • HotToWarmMigrationFailureCount: Average
    • HotToWarmMigrationForceMergeLatency: Average
    • HotToWarmMigrationProcessingLatency: Average
    • HotToWarmMigrationQueueSize: Average
    • HotToWarmMigrationSnapshotLatency: Average
    • HotToWarmMigrationSuccessCount: Average
    • HotToWarmMigrationSuccessLatency: Average
    • IndexingLatency: Average
    • IndexingRate: Average
    • InvalidHostHeaderRequests: Average
    • JVMGCOldCollectionCount: Average
    • JVMGCOldCollectionTime: Average
    • JVMGCYoungCollectionCount: Average
    • JVMGCYoungCollectionTime: Average
    • JVMMemoryPressure: Average
    • KMSKeyError: Average
    • KMSKeyInaccessible: Average
    • KNNCacheCapacityReached: Average
    • KNNCircuitBreakerTriggered: Average
    • KNNEvictionCount: Average
    • KNNGraphIndexErrors: Average
    • KNNGraphIndexRequests: Average
    • KNNGraphMemoryUsage: Average
    • KNNGraphQueryErrors: Average
    • KNNGraphQueryRequests: Average
    • KNNHitCount: Average
    • KNNLoadExceptionCount: Average
    • KNNLoadSuccessCount: Average
    • KNNMissCount: Average
    • KNNQueryRequests: Average
    • KNNScriptCompilationErrors: Average
    • KNNScriptCompilations: Average
    • KNNScriptQueryErrors: Average
    • KNNScriptQueryRequests: Average
    • KNNTotalLoadTime: Average
    • KibanaReportingFailedRequestSysErrCount: Average
    • KibanaReportingFailedRequestUserErrCount: Average
    • KibanaReportingRequestCount: Average
    • KibanaReportingSuccessCount: Average
    • LTRFeatureMemoryUsageInBytes: Average
    • LTRFeaturesetMemoryUsageInBytes: Average
    • LTRMemoryUsage: Average
    • LTRModelMemoryUsageInBytes: Average
    • LTRRequestErrorCount: Average
    • LTRRequestTotalCount: Average
    • LTRStatus.red: Average
    • LeaderCheckPoint: Average
    • MasterCPUCreditBalance: Average
    • MasterCPUUtilization: Average
    • MasterFreeStorageSpace: Average
    • MasterJVMMemoryPressure: Average
    • MasterReachableFromNode: Average
    • MasterSysMemoryUtilization: Average
    • Nodes: Average
    • OpenSearchDashboardsConcurrentConnections: Average
    • OpenSearchDashboardsHealthyNode: Average
    • OpenSearchDashboardsHealthyNodes: Average
    • OpenSearchDashboardsHeapTotal: Average
    • OpenSearchDashboardsHeapUsed: Average
    • OpenSearchDashboardsHeapUtilization: Average
    • OpenSearchDashboardsOS1MinuteLoad: Average
    • OpenSearchDashboardsRequestTotal: Average
    • OpenSearchDashboardsResponseTimesMaxInMillis: Average
    • OpenSearchRequests: Average
    • PPLFailedRequestCountByCusErr: Average
    • PPLFailedRequestCountBySysErr: Average
    • PPLRequestCount: Average
    • PrimaryWriteRejected: Average
    • ReadIOPS: Average
    • ReadLatency: Average
    • ReadThroughput: Average
    • ReplicaWriteRejected: Average
    • ReplicationRate: Average
    • SQLDefaultCursorRequestCount: Average
    • SQLFailedRequestCountByCusErr: Average
    • SQLFailedRequestCountBySysErr: Average
    • SQLRequestCount: Average
    • SQLUnhealthy: Average
    • SearchLatency: Average
    • SearchRate: Average
    • SearchableDocuments: Average
    • SegmentCount: Average
    • Shards.active: Average
    • Shards.activePrimary: Average
    • Shards.delayedUnassigned: Average
    • Shards.initializing: Average
    • Shards.relocating: Average
    • Shards.unassigned: Average
    • SysMemoryUtilization: Average
    • ThreadpoolBulkQueue: Average
    • ThreadpoolBulkRejected: Average
    • ThreadpoolBulkThreads: Average
    • ThreadpoolForce_mergeQueue: Average
    • ThreadpoolForce_mergeRejected: Average
    • ThreadpoolForce_mergeThreads: Average
    • ThreadpoolIndexQueue: Average
    • ThreadpoolIndexRejected: Average
    • ThreadpoolIndexThreads: Average
    • ThreadpoolSearchQueue: Average
    • ThreadpoolSearchRejected: Average
    • ThreadpoolSearchThreads: Average
    • ThreadpoolWriteQueue: Average
    • ThreadpoolWriteRejected: Average
    • ThreadpoolWriteThreads: Average
    • Threadpoolsql-workerQueue: Average
    • Threadpoolsql-workerRejected: Average
    • Threadpoolsql-workerThreads: Average
    • WarmCPUUtilization: Average
    • WarmFreeStorageSpace: Average
    • WarmJVMGCOldCollectionCount: Average
    • WarmJVMGCYoungCollectionCount: Average
    • WarmJVMGCYoungCollectionTime: Average
    • WarmJVMMemoryPressure: Average
    • WarmSearchLatency: Average
    • WarmSearchRate: Average
    • WarmSearchableDocuments: Average
    • WarmStorageSpaceUtilization: Average
    • WarmSysMemoryUtilization: Average
    • WarmThreadpoolSearchQueue: Average
    • WarmThreadpoolSearchRejected: Average
    • WarmThreadpoolSearchThreads: Average
    • WarmToColdMigrationFailureCount: Average
    • WarmToColdMigrationLatency: Average
    • WarmToColdMigrationQueueSize: Average
    • WarmToColdMigrationSuccessCount: Average
    • WarmToHotMigrationQueueSize: Average
    • WriteIOPS: Average
    • WriteLatency: Average
    • WriteThroughput: Average
  • AWS/ElastiCache
    • ActiveDefragHits: Average
    • AuthenticationFailures: Average
    • BytesReadIntoMemcached: Average
    • BytesUsedForCache: Average
    • BytesUsedForCacheItems: Average
    • BytesUsedForHash: Average
    • BytesReadFromDisk: Average
    • BytesWrittenToDisk: Average
    • BytesWrittenOutFromMemcached: Average
    • CPUUtilization: Average
    • CPUCreditBalance: Average
    • CPUCreditUsage: Average
    • CacheHitRate: Average
    • CacheHits: Average
    • CacheMisses: Average
    • CasBadval: Average
    • CasHits: Average
    • CasMisses: Average
    • CmdConfigGet: Average
    • CmdConfigSet: Average
    • CmdFlush: Average
    • CmdGet: Average
    • CmdSet: Average
    • CmdTouch: Average
    • CommandAuthorizationFailures: Average
    • CurrConfig: Average
    • CurrConnections: Average
    • CurrItems: Average
    • CurrVolatileItems: Average
    • DatabaseMemoryUsagePercentage: Average
    • DatabaseMemoryUsageCountedForEvictPercentage: Average
    • DB0AverageTTL: Average
    • DecrHits: Average
    • DecrMisses: Average
    • DeleteHits: Average
    • DeleteMisses: Average
    • EngineCPUUtilization: Average
    • EvalBasedCmds: Average
    • EvalBasedCmdsLatency: Average
    • EvictedUnfetched: Average
    • Evictions: Average
    • ExpiredUnfetched: Average
    • FreeableMemory: Average
    • GeoSpatialBasedCmds: Average
    • GeoSpatialBasedCmdsLatency: Average
    • GetHits: Average
    • GetMisses: Average
    • GetTypeCmds: Average
    • GetTypeCmdsLatency: Average
    • GlobalDatastoreReplicationLag: Average
    • IsMaster: Average
    • HashBasedCmds: Average
    • HashBasedCmdsLatency: Average
    • HyperLogLogBasedCmds: Average
    • HyperLogLogBasedCmdsLatency: Average
    • IsPrimary: Average
    • IncrHits: Average
    • IncrMisses: Average
    • KeyAuthorizationFailures: Average
    • KeyBasedCmds: Average
    • KeyBasedCmdsLatency: Average
    • KeysTracked: Average
    • ListBasedCmds: Average
    • ListBasedCmdsLatency: Average
    • MasterLinkHealthStatus: Average
    • MemoryFragmentationRatio: Average
    • NetworkBytesIn: Average
    • NetworkBytesOut: Average
    • NetworkPacketsIn: Average
    • NetworkPacketsOut: Average
    • NetworkBandwidthInAllowanceExceeded: Average
    • NetworkBandwidthOutAllowanceExceeded: Average
    • NetworkConntrackAllowanceExceeded: Average
    • NetworkLinkLocalAllowanceExceeded: Average
    • NetworkPacketsPerSecondAllowanceExceeded: Average
    • NewConnections: Average
    • NewItems: Average
    • NumItemsReadFromDisk: Average
    • NumItemsWrittenToDisk: Average
    • PrimaryLinkHealthStatus: Average
    • PubSubBasedCmds: Average
    • PubSubBasedCmdsLatency: Average
    • Reclaimed: Average
    • ReplicationBytes: Average
    • ReplicationLag: Average
    • SaveInProgress: Average
    • SetBasedCmds: Average
    • SetBasedCmdsLatency: Average
    • SetTypeCmds: Average
    • SetTypeCmdsLatency: Average
    • SlabsMoved: Average
    • SortedSetBasedCmds: Average
    • SortedSetBasedCmdsLatency: Average
    • StreamBasedCmds: Average
    • StreamBasedCmdsLatency: Average
    • StringBasedCmds: Average
    • StringBasedCmdsLatency: Average
    • SwapUsage: Average
    • TouchHits: Average
    • TouchMisses: Average
    • UnusedMemory: Average
  • AWS/Kafka
    • ActiveControllerCount: Average
    • BytesInPerSec: Average
    • BytesOutPerSec: Average
    • CpuIdle: Average
    • CpuSystem: Average
    • CpuUser: Average
    • EstimatedMaxTimeLag: Average
    • EstimatedTimeLag: Average
    • FetchConsumerLocalTimeMsMean: Average
    • FetchConsumerRequestQueueTimeMsMean: Average
    • FetchConsumerResponseQueueTimeMsMean: Average
    • FetchConsumerResponseSendTimeMsMean: Average
    • FetchConsumerTotalTimeMsMean: Average
    • FetchFollowerLocalTimeMsMean: Average
    • FetchFollowerRequestQueueTimeMsMean: Average
    • FetchFollowerResponseQueueTimeMsMean: Average
    • FetchFollowerResponseSendTimeMsMean: Average
    • FetchFollowerTotalTimeMsMean: Average
    • FetchMessageConversionsPerSec: Average
    • FetchThrottleByteRate: Average
    • FetchThrottleQueueSize: Average
    • FetchThrottleTime: Average
    • GlobalPartitionCount: Average
    • GlobalTopicCount: Average
    • KafkaAppLogsDiskUsed: Average
    • KafkaDataLogsDiskUsed: Average
    • LeaderCount: Average
    • MaxOffsetLag: Average
    • MemoryBuffered: Average
    • MemoryCached: Average
    • MemoryFree: Average
    • MemoryUsed: Average
    • MessagesInPerSec: Average
    • NetworkProcessorAvgIdlePercent: Average
    • NetworkRxDropped: Average
    • NetworkRxErrors: Average
    • NetworkRxPackets: Average
    • NetworkTxDropped: Average
    • NetworkTxErrors: Average
    • NetworkTxPackets: Average
    • OfflinePartitionsCount: Average
    • PartitionCount: Average
    • ProduceLocalTimeMsMean: Average
    • ProduceMessageConversionsPerSec: Average
    • ProduceMessageConversionsTimeMsMean: Average
    • ProduceRequestQueueTimeMsMean: Average
    • ProduceResponseQueueTimeMsMean: Average
    • ProduceResponseSendTimeMsMean: Average
    • ProduceThrottleByteRate: Average
    • ProduceThrottleQueueSize: Average
    • ProduceThrottleTime: Average
    • ProduceTotalTimeMsMean: Average
    • ReplicationBytesInPerSec: Average
    • ReplicationBytesOutPerSec: Average
    • RequestBytesMean: Average
    • RequestExemptFromThrottleTime: Average
    • RequestHandlerAvgIdlePercent: Average
    • RequestThrottleQueueSize: Average
    • RequestThrottleTime: Average
    • RequestTime: Average
    • RootDiskUsed: Average
    • SumOffsetLag: Average
    • SwapFree: Average
    • SwapUsed: Average
    • OffsetLag: Average
    • UnderMinIsrPartitionCount: Average
    • UnderReplicatedPartitions: Average
    • ZooKeeperRequestLatencyMsMean: Average
    • ZooKeeperSessionState: Average
  • AWS/Kinesis
    • GetRecords.Bytes: Average
    • GetRecords.IteratorAge: Average
    • GetRecords.IteratorAgeMilliseconds: Average
    • GetRecords.Latency: Average
    • GetRecords.Records: Average
    • GetRecords.Success: Average
    • IncomingBytes: Average
    • IncomingRecords: Average
    • IteratorAgeMilliseconds: Average
    • OutgoingBytes: Average
    • OutgoingRecords: Average
    • PutRecord.Bytes: Average
    • PutRecord.Latency: Average
    • PutRecord.Success: Average
    • PutRecords.Bytes: Average
    • PutRecords.Latency: Average
    • PutRecords.Records: Average
    • PutRecords.Success: Average
    • ReadProvisionedThroughputExceeded: Average
    • SubscribeToShard.RateExceeded: Average
    • SubscribeToShard.Success: Average
    • SubscribeToShardEvent.Bytes: Average
    • SubscribeToShardEvent.MillisBehindLatest: Average
    • SubscribeToShardEvent.Records: Average
    • SubscribeToShardEvent.Success: Average
    • WriteProvisionedThroughputExceeded: Average
  • AWS/Route53
    • ChildHealthCheckHealthyCount: Average
    • ConnectionTime: Average
    • DNSQueries: Average
    • HealthCheckPercentageHealthy: Average
    • HealthCheckStatus: Average
    • SSLHandshakeTime: Average
    • TimeToFirstByte: Average
  • AWS/SQS
    • ApproximateAgeOfOldestMessage: Average
    • ApproximateNumberOfMessagesDelayed: Average
    • ApproximateNumberOfMessagesNotVisible: Average
    • ApproximateNumberOfMessagesVisible: Average
    • NumberOfEmptyReceives: Average
    • NumberOfMessagesDeleted: Average
    • NumberOfMessagesReceived: Average
    • NumberOfMessagesSent: Average
    • SentMessageSize: Average
  • ECS/ContainerInsights
    • ContainerInstanceCount: Average
    • CpuUtilized: Average
    • CpuReserved: Average
    • DeploymentCount: Average
    • DesiredTaskCount: Average
    • MemoryUtilized: Average
    • MemoryReserved: Average
    • NetworkRxBytes: Average
    • NetworkTxBytes: Average
    • PendingTaskCount: Average
    • RunningTaskCount: Average
    • ServiceCount: Average
    • StorageReadBytes: Average
    • StorageWriteBytes: Average
    • TaskCount: Average
    • TaskSetCount: Average
    • instance_cpu_limit: Average
    • instance_cpu_reserved_capacity: Average
    • instance_cpu_usage_total: Average
    • instance_cpu_utilization: Average
    • instance_filesystem_utilization: Average
    • instance_memory_limit: Average
    • instance_memory_reserved_capacity: Average
    • instance_memory_utilization: Average
    • instance_memory_working_set: Average
    • instance_network_total_bytes: Average
    • instance_number_of_running_tasks: Average
  • ebs
    • VolumeReadBytes: Sum
    • VolumeWriteBytes: Sum
    • VolumeReadOps: Average
    • VolumeWriteOps: Average
    • VolumeTotalReadTime: Average
    • VolumeTotalWriteTime: Average
    • VolumeIdleTime: Average
    • VolumeQueueLength: Average
    • VolumeThroughputPercentage: Average
    • VolumeConsumedReadWriteOps: Average
    • BurstBalance: Average
  • ec2
    • CPUUtilization: Maximum
    • NetworkIn: Average,Sum
    • NetworkOut: Average,Sum
    • NetworkPacketsIn: Sum
    • NetworkPacketsOut: Sum
    • DiskReadBytes: Sum
    • DiskWriteBytes: Sum
    • DiskReadOps: Sum
    • DiskWriteOps: Sum
    • StatusCheckFailed: Sum
    • StatusCheckFailed_Instance: Sum
    • StatusCheckFailed_System: Sum
  • lambda
    • Invocations: Sum
    • Errors: Sum
    • Throttles: Sum
    • Duration: Maximum,Minimum,p90
  • rds
    • CPUUtilization: Maximum
    • DatabaseConnections: Sum
    • FreeableMemory: Average
    • FreeStorageSpace: Average
    • ReadThroughput: Average
    • WriteThroughput: Average
    • ReadLatency: Maximum
    • WriteLatency: Maximum
    • ReadIOPS: Average
    • WriteIOPS: Average
  • s3
    • NumberOfObjects: Average
    • BucketSizeBytes: Average
    • AllRequests: Sum
    • 4xxErrors: Sum
    • TotalRequestLatency: p95

Dashboards

This integration includes the following dashboards.

  • AWS EBS
  • AWS EC2
  • AWS Lambda
  • AWS RDS
  • AWS S3

Cost

By connecting your AWS CloudWatch Metrics to Grafana Cloud you might incur charges. For more information, use the following links:

  • To view information on the number of active series that your Grafana Cloud account uses for metrics included in each Cloud tier, see Active series and dpm usage and Cloud tier pricing.
  • The CloudWatch integration uses the ListMetrics and GetMetricData CloudWatch API calls to list and retrieve metrics, and the GetResources call of the Resource Groups Tagging API to discover resources.
  • Each service configured in a job is scraped independently. The call counts below assume a job that scrapes only lambda:
    • GetResources is called 1 time
    • ListMetrics is called 4 times, once per metric
    • Assuming ListMetrics returned 5 values for each metric:
      • GetMetricData would be called a single time requesting 30 metrics: 6 metric + statistic combinations (Sum for Invocations, Errors, and Throttles; Maximum, Minimum, and p90 for Duration) × 5 values each
  • See CloudWatch Pricing for cost information associated with CloudWatch APIs
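
The lambda call counts above can be reproduced with a few lines of arithmetic, which also makes it easy to plug in other services from the Services list. This is a rough sketch, not the integration's scheduler. The function name is hypothetical, and the batch size of 500 is an assumption based on the AWS limit of 500 metric queries per GetMetricData request.

```python
import math
from typing import Dict, List

# Metrics and statistics for lambda, copied from the Services list on this page.
LAMBDA_METRICS: Dict[str, List[str]] = {
    "Invocations": ["Sum"],
    "Errors": ["Sum"],
    "Throttles": ["Sum"],
    "Duration": ["Maximum", "Minimum", "p90"],
}


def estimate_calls(metrics: Dict[str, List[str]],
                   values_per_metric: int,
                   batch_size: int = 500) -> Dict[str, int]:
    """Estimate API calls for one scrape of one service in one region.

    `batch_size` is an assumed number of metrics requested per GetMetricData call.
    """
    # One requested metric per (metric, statistic, returned value) combination.
    requested = sum(len(stats) * values_per_metric for stats in metrics.values())
    return {
        "GetResources": 1,
        "ListMetrics": len(metrics),  # once per metric
        "requested_metrics": requested,
        "GetMetricData": math.ceil(requested / batch_size) if requested else 0,
    }


if __name__ == "__main__":
    print(estimate_calls(LAMBDA_METRICS, values_per_metric=5))
```

Multiplying these per-scrape counts by the number of services, regions, and scrapes per day gives a rough input for the CloudWatch pricing calculation.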

Changelog

# 0.0.8 - June 2022

* Change 'job' dashboard template variable label to lowercase, and add  for allValues on all dashboards
* Change 'instance' dashboard template variable label to lowercase
* Re-add dashboard UIDs to satisfy mixtool linting

# 0.0.7 - December 2021

* Move datasource specification to each target to resolve failed queries in 8.3.x

# 0.0.6 - November 2021

* Set static x/y panel coords in an attempt to work around render issues with collapsed rows

# 0.0.5 - October 2021

* Remove dashboard UIDs which conflict with static UIDs set in the CloudWatch datasource codebase.

# 0.0.4 - October 2021

* Improve copy at the top of each dashboard.

# 0.0.3 - September 2021

* Several fixes for EBS dashboard having empty panels
* Force all dashboards to have a refresh of 5m
* Set `job` and `region` in all CW dashboards
* Add an info text at the top of every dashboard

# 0.0.2 - July 2021

* Re-implement Lambda Dashboard with the following changes:
  - Allow all and multi select on function name variable
  - Add region variable
  - Fix queries to use regex match and show only one value per function

# 0.0.1 - April 2021

* Initial release