Services

CloudWatch metrics supports the following services, and allows you to pick from a wide array of available metrics and statistics. Metrics in bold text are included in the default configuration. The statistics for all metrics are Average, Maximum, Minimum, Sum, SampleCount, p50, p75, p90, p95, p99.

AWS/ACMPrivateCA

Function: Provides a private certificate authority for managing SSL/TLS certificates

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_acmprivateca_info | | |
| aws_acmprivateca_crlgenerated | CRLGenerated | Monitors the number of Certificate Revocation Lists (CRLs) generated. Used to ensure the regular creation of revocation lists for certificate management. |
| aws_acmprivateca_failure | Failure | Tracks the number of failures in Private CA operations. Useful for identifying issues in certificate issuance or other operations. |
| aws_acmprivateca_misconfigured_crlbucket | MisconfiguredCRLBucket | Monitors the number of instances where the CRL bucket is misconfigured. Useful for ensuring proper configuration and access to the CRL storage bucket. |
| aws_acmprivateca_success | Success | Tracks the number of successful operations within the ACM Private CA. Useful for monitoring operational efficiency and successful certificate issuances. |
| aws_acmprivateca_time | Time | Measures the time taken for various operations in ACM Private CA, helping to monitor performance and identify any slowdowns in certificate processing. |
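
The Metric column appears to follow a regular pattern relative to the CloudWatch namespace and metric name. The sketch below reproduces that pattern as inferred from the rows on this page; it is not a documented rule, and the function name `to_series_name` is hypothetical. It assumes the namespace is lowercased with `/` replaced by `_`, and that an underscore is inserted wherever a lowercase letter or digit is followed by an uppercase letter before the metric name is lowercased.

```python
import re

# Boundary between a lowercase letter or digit and an uppercase letter.
# Assumption: this boundary rule is inferred from the tables on this page.
_BOUNDARY = re.compile(r"([a-z0-9])([A-Z])")


def to_series_name(namespace: str, cloudwatch_metric: str) -> str:
    """Guess the Metric column value for a CloudWatch namespace and metric."""
    # "AWS/ApiGateway" -> "aws_apigateway": lowercase, "/" becomes "_".
    prefix = namespace.lower().replace("/", "_")
    # "MisconfiguredCRLBucket" -> "Misconfigured_CRLBucket" -> lowercase.
    metric = _BOUNDARY.sub(r"\1_\2", cloudwatch_metric).lower()
    return f"{prefix}_{metric}"


print(to_series_name("AWS/ACMPrivateCA", "MisconfiguredCRLBucket"))
# aws_acmprivateca_misconfigured_crlbucket
print(to_series_name("AWS/AmazonMQ", "RabbitMQDiskFree"))
# aws_amazonmq_rabbit_mqdisk_free
print(to_series_name("AWS/ApiGateway", "4XXError"))
# aws_apigateway_4_xxerror
```

This explains why runs of capitals such as `CRL` or `MQ` are not split from the word that follows them. If a name produced this way does not match a row in the tables, the table is the reference.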

AWS/AmazonMQ

Function: Managed message broker service for Apache ActiveMQ and RabbitMQ

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_amazonmq_info | | |
| aws_amazonmq_ack_rate | AckRate | Monitors the acknowledgment rate of messages, ensuring efficient message processing and acknowledgment. |
| aws_amazonmq_burst_balance | BurstBalance | Tracks the balance of burst credits, monitoring if the broker can handle sudden spikes in traffic. |
| aws_amazonmq_channel_count | ChannelCount | Monitors the number of active channels, indicating resource usage and load on the broker. |
| aws_amazonmq_confirm_rate | ConfirmRate | Measures the rate at which messages are confirmed, ensuring message delivery guarantees. |
| aws_amazonmq_connection_count | ConnectionCount | Tracks the number of active connections, helping monitor broker usage and possible overloading. |
| aws_amazonmq_consumer_count | ConsumerCount | Monitors the number of consumers connected, useful for understanding broker demand and throughput. |
| aws_amazonmq_cpu_credit_balance | CpuCreditBalance | Tracks the remaining CPU credits, important for ensuring the broker has enough processing power to handle workload. |
| aws_amazonmq_cpu_utilization | CpuUtilization | Measures the percentage of CPU usage, helping identify potential performance bottlenecks. |
| aws_amazonmq_current_connections_count | CurrentConnectionsCount | Shows the number of currently connected clients, useful for tracking session loads. |
| aws_amazonmq_dequeue_count | DequeueCount | Monitors the number of messages dequeued, which helps gauge message consumption activity. |
| aws_amazonmq_dispatch_count | DispatchCount | Measures the number of messages dispatched to consumers, helping monitor message flow. |
| aws_amazonmq_enqueue_count | EnqueueCount | Tracks the number of messages enqueued, giving insights into the volume of messages entering the system. |
| aws_amazonmq_enqueue_time | EnqueueTime | Measures the time taken to enqueue messages, used to monitor latency and performance. |
| aws_amazonmq_established_connections_count | EstablishedConnectionsCount | Tracks the number of successfully established connections, used to monitor system stability. |
| aws_amazonmq_exchange_count | ExchangeCount | Monitors the number of exchanges, useful for analyzing message routing activity. |
| aws_amazonmq_expired_count | ExpiredCount | Tracks the number of messages that have expired without being consumed, useful for monitoring failed message deliveries. |
| aws_amazonmq_heap_usage | HeapUsage | Measures the heap memory usage of the broker, useful for detecting memory-related performance issues. |
| aws_amazonmq_in_flight_count | InFlightCount | Monitors the number of messages currently in transit, helping to ensure the broker isn’t overwhelmed by unacknowledged messages. |
| aws_amazonmq_inactive_durable_topic_subscribers_count | InactiveDurableTopicSubscribersCount | Monitors inactive durable subscribers, useful for tracking unused resources or inefficient topic subscriptions. |
| aws_amazonmq_job_scheduler_store_percent_usage | JobSchedulerStorePercentUsage | Measures the percentage of the job scheduler store usage, important for capacity planning and performance. |
| aws_amazonmq_journal_files_for_fast_recovery | JournalFilesForFastRecovery | Monitors the number of journal files available for fast recovery, ensuring quick system recovery. |
| aws_amazonmq_journal_files_for_full_recovery | JournalFilesForFullRecovery | Tracks journal files required for full recovery, ensuring data durability and integrity during failures. |
| aws_amazonmq_memory_usage | MemoryUsage | Measures the memory usage of the broker, ensuring the broker has adequate memory for message processing. |
| aws_amazonmq_message_count | MessageCount | Tracks the total number of messages in the broker, providing insights into message load and storage. |
| aws_amazonmq_message_ready_count | MessageReadyCount | Monitors the number of messages ready for delivery, helping gauge the efficiency of message consumption. |
| aws_amazonmq_message_unacknowledged_count | MessageUnacknowledgedCount | Tracks unacknowledged messages, useful for detecting potential message delivery problems. |
| aws_amazonmq_network_in | NetworkIn | Measures the incoming network traffic, useful for tracking data ingestion and throughput. |
| aws_amazonmq_network_out | NetworkOut | Measures the outgoing network traffic, helping monitor data egress and bandwidth usage. |
| aws_amazonmq_open_transaction_count | OpenTransactionCount | Tracks the number of open transactions, useful for identifying resource contention or potential system stalls. |
| aws_amazonmq_producer_count | ProducerCount | Monitors the number of producers, useful for understanding message production activity in the system. |
| aws_amazonmq_publish_rate | PublishRate | Measures the rate at which messages are being published, providing insights into message inflow. |
| aws_amazonmq_queue_count | QueueCount | Tracks the number of active queues, useful for analyzing message distribution across queues. |
| aws_amazonmq_queue_size | QueueSize | Monitors the size of the message queues, helping gauge message backlog and system load. |
| aws_amazonmq_rabbit_mqdisk_free | RabbitMQDiskFree | Tracks the available disk space for RabbitMQ, ensuring that there’s enough storage for message persistence. |
| aws_amazonmq_rabbit_mqdisk_free_limit | RabbitMQDiskFreeLimit | Monitors the disk free space threshold, alerting when approaching critical limits to avoid disruptions. |
| aws_amazonmq_rabbit_mqfd_used | RabbitMQFdUsed | Tracks the number of file descriptors used by RabbitMQ, ensuring system resources are not exhausted. |
| aws_amazonmq_rabbit_mqmem_limit | RabbitMQMemLimit | Monitors the memory usage limit for RabbitMQ, ensuring the broker doesn’t run out of memory. |
| aws_amazonmq_rabbit_mqmem_used | RabbitMQMemUsed | Measures the memory currently in use by RabbitMQ, useful for monitoring resource efficiency. |
| aws_amazonmq_receive_count | ReceiveCount | Tracks the number of received messages, helping monitor message inflow and processing rates. |
| aws_amazonmq_store_percent_usage | StorePercentUsage | Monitors the percentage of the store usage, ensuring sufficient capacity for message persistence. |
| aws_amazonmq_system_cpu_utilization | SystemCpuUtilization | Measures the CPU usage of the underlying system, helping to detect potential CPU bottlenecks. |
| aws_amazonmq_temp_percent_usage | TempPercentUsage | Monitors the percentage usage of temporary storage, useful for avoiding storage exhaustion during peak loads. |
| aws_amazonmq_total_consumer_count | TotalConsumerCount | Tracks the total number of consumers, helping assess the overall load and activity on the broker. |
| aws_amazonmq_total_dequeue_count | TotalDequeueCount | Monitors the total number of dequeued messages, useful for analyzing message consumption rates. |
| aws_amazonmq_total_enqueue_count | TotalEnqueueCount | Tracks the total number of enqueued messages, providing insights into message production volumes. |
| aws_amazonmq_total_message_count | TotalMessageCount | Monitors the total count of messages in the system, giving an overview of the message load. |
| aws_amazonmq_total_producer_count | TotalProducerCount | Tracks the total number of producers, useful for understanding message inflow activity. |
| aws_amazonmq_volume_read_ops | VolumeReadOps | Measures the number of read operations on the broker’s volume, helping monitor disk I/O performance. |
| aws_amazonmq_volume_write_ops | VolumeWriteOps | Measures the number of write operations on the broker’s volume, useful for detecting disk I/O bottlenecks. |

AWS/ApiGateway

Function: Enables developers to create and manage APIs for accessing data and services

Scrape interval: 5 minutes

Includes: Out-of-the-box dashboard

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_apigateway_info | | |
| aws_apigateway_4xx | 4xx | Monitors the number of 4xx client errors, used to track issues related to invalid requests from clients. |
| aws_apigateway_5xx | 5xx | Tracks the number of 5xx server errors, used to monitor API Gateway or backend server issues. |
| aws_apigateway_count | Count | Measures the total number of API requests, providing insights into traffic volume. |
| aws_apigateway_integration_latency | IntegrationLatency | Monitors the latency between API Gateway and the backend integration, useful for diagnosing performance issues in backend services. |
| aws_apigateway_latency | Latency | Tracks overall API latency, including both API Gateway processing and backend integration latency, helping to monitor user experience. |
| aws_apigateway_4_xxerror | 4XXError | Measures the occurrence of 4xx errors (client errors), useful for understanding the rate of client-related issues. |
| aws_apigateway_5_xxerror | 5XXError | Monitors 5xx errors (server errors), used to detect server-side failures in the API Gateway or its backend. |
| aws_apigateway_cache_hit_count | CacheHitCount | Tracks the number of times API requests were served from the cache, helping to monitor the efficiency of cache usage. |
| aws_apigateway_cache_miss_count | CacheMissCount | Monitors the number of cache misses, useful for optimizing cache configuration and reducing backend load. |
| aws_apigateway_client_error | ClientError | Measures errors originating from the client (4xx), used to monitor the rate of invalid requests sent by clients. |
| aws_apigateway_connect_count | ConnectCount | Tracks the number of successful WebSocket connection requests, providing insights into the usage of WebSocket APIs. |
| aws_apigateway_data_processed | DataProcessed | Monitors the amount of data processed by the API Gateway, useful for analyzing API data transfer and throughput. |
| aws_apigateway_execution_error | ExecutionError | Tracks execution errors during the API request process, useful for identifying failures in API execution logic. |
| aws_apigateway_integration_error | IntegrationError | Monitors errors that occur during integration with backend services, useful for detecting issues in backend communication. |
| aws_apigateway_message_count | MessageCount | Tracks the number of messages sent and received in WebSocket APIs, useful for monitoring message flow in real-time communication APIs. |
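
As an illustration of how CacheHitCount and CacheMissCount can be combined to gauge cache efficiency, the sketch below computes a hit ratio from the two counts. The function name `cache_hit_ratio` is hypothetical, and it assumes both inputs are Sum values of the two metrics over the same time window.

```python
from typing import Optional


def cache_hit_ratio(hits: float, misses: float) -> Optional[float]:
    """Return the fraction of cacheable requests served from the cache.

    Assumption: `hits` and `misses` are the Sum statistic of CacheHitCount
    and CacheMissCount over the same window.
    """
    total = hits + misses
    if total == 0:
        # No cacheable requests in the window, so the ratio is undefined.
        return None
    return hits / total


print(cache_hit_ratio(75, 25))  # 0.75
print(cache_hit_ratio(0, 0))    # None
```

A ratio that trends downward while request volume stays flat may point at cache configuration (for example TTL or capacity) rather than at traffic changes.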

AWS/AppStream

Function: Delivers cloud-based desktops and applications to end-users on any device

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_appstream_info | | |
| aws_appstream_actual_capacity | ActualCapacity | Monitors the actual number of available instances for streaming, used to ensure enough resources are deployed. |
| aws_appstream_available_capacity | AvailableCapacity | Tracks the number of instances available for use but not currently in use, helping to gauge spare capacity for handling future demand. |
| aws_appstream_capacity_utilization | CapacityUtilization | Measures the percentage of capacity utilization, useful for optimizing resource allocation and ensuring cost-effective usage. |
| aws_appstream_desired_capacity | DesiredCapacity | Represents the desired number of instances based on scaling policies, helping to monitor scaling efficiency and capacity planning. |
| aws_appstream_in_use_capacity | InUseCapacity | Tracks the number of instances currently in use, helping to monitor active workload and resource consumption. |
| aws_appstream_insufficient_capacity_error | InsufficientCapacityError | Measures the number of times a capacity request failed due to insufficient resources, indicating capacity shortages or bottlenecks. |
| aws_appstream_pending_capacity | PendingCapacity | Monitors instances that are in the process of being provisioned, helping to track the status of scaling events. |
| aws_appstream_running_capacity | RunningCapacity | Tracks the total number of running instances, providing insights into the active resources currently being used to support users. |

AWS/AppSync

Function: Managed service for building GraphQL APIs that connects to data sources like DynamoDB

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_appsync_info | | |
| aws_appsync_4_xxerror | 4XXError | Monitors client-side (4xx) errors in requests, useful for tracking invalid requests made by clients. |
| aws_appsync_5_xxerror | 5XXError | Tracks server-side (5xx) errors, helping to detect issues in the API or the server infrastructure. |
| aws_appsync_active_connections | ActiveConnections | Measures the number of active WebSocket connections, useful for understanding the real-time activity on the AppSync API. |
| aws_appsync_active_subscriptions | ActiveSubscriptions | Tracks the number of active subscriptions, helping to monitor usage and engagement with subscription-based real-time data services. |
| aws_appsync_connect_client_error | ConnectClientError | Monitors errors encountered by clients while trying to establish connections, indicating issues in the client-side configuration or request. |
| aws_appsync_connect_server_error | ConnectServerError | Tracks server-side errors during the connection process, helping to identify server-side failures or misconfigurations during connection attempts. |
| aws_appsync_connect_success | ConnectSuccess | Measures the successful WebSocket connection attempts, useful for monitoring overall connection success rates. |
| aws_appsync_connection_duration | ConnectionDuration | Monitors the duration of WebSocket connections, helping to gauge session longevity and user engagement. |
| aws_appsync_disconnect_client_error | DisconnectClientError | Tracks errors that occur when clients try to disconnect, useful for monitoring client-side disconnection issues. |
| aws_appsync_disconnect_server_error | DisconnectServerError | Monitors server-side errors during disconnection, helping to detect issues in properly closing WebSocket connections. |
| aws_appsync_disconnect_success | DisconnectSuccess | Measures successful disconnections from WebSocket connections, useful for ensuring smooth session terminations. |
| aws_appsync_latency | Latency | Tracks the time taken to process requests, useful for monitoring API performance and identifying latency issues. |
| aws_appsync_publish_data_message_client_error | PublishDataMessageClientError | Monitors client-side errors during data message publishing, used to detect issues with client-side data transmission. |
| aws_appsync_publish_data_message_server_error | PublishDataMessageServerError | Tracks server-side errors during data message publishing, helping to identify issues in server-side message handling or transmission. |
| aws_appsync_publish_data_message_size | PublishDataMessageSize | Measures the size of data messages being published, useful for tracking payload sizes and ensuring efficient message transmission. |
| aws_appsync_publish_data_message_success | PublishDataMessageSuccess | Tracks successful data message publications, helping to monitor overall message delivery success. |
| aws_appsync_requests | Requests | Measures the total number of requests processed by AppSync, providing insights into traffic and API usage. |
| aws_appsync_subscribe_client_error | SubscribeClientError | Monitors client-side errors during subscription attempts, useful for tracking issues in subscribing to real-time data feeds. |
| aws_appsync_subscribe_server_error | SubscribeServerError | Tracks server-side errors during subscription attempts, helping to identify server failures when clients try to subscribe. |
| aws_appsync_subscribe_success | SubscribeSuccess | Measures successful subscription attempts, useful for monitoring subscription adoption and engagement rates. |
| aws_appsync_tokens_consumed | TokensConsumed | Tracks the number of tokens consumed by requests, useful for managing API rate limits and monitoring user activity. |
| aws_appsync_unsubscribe_client_error | UnsubscribeClientError | Monitors client-side errors during unsubscription attempts, used to detect issues when clients try to unsubscribe from data feeds. |
| aws_appsync_unsubscribe_server_error | UnsubscribeServerError | Tracks server-side errors during unsubscription attempts, useful for identifying server-side issues when clients try to unsubscribe. |
| aws_appsync_unsubscribe_success | UnsubscribeSuccess | Measures successful unsubscription attempts, ensuring smooth termination of real-time data subscriptions. |

AWS/ApplicationELB

Function: Distributes incoming traffic to targets like EC2 instances, containers, and IP addresses

Scrape interval: 5 minutes

Includes: Out-of-the-box dashboard

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_applicationelb_info | | |
| aws_applicationelb_active_connection_count | ActiveConnectionCount | Monitors the number of active connections, useful for understanding current load on the load balancer. |
| aws_applicationelb_client_tlsnegotiation_error_count | ClientTLSNegotiationErrorCount | Tracks the number of failed TLS negotiations between clients and the load balancer, used to detect TLS handshake issues. |
| aws_applicationelb_consumed_lcus | ConsumedLCUs | Measures the number of Load Balancer Capacity Units (LCUs) used, helping to track resource consumption and cost. |
| aws_applicationelb_elbauth_error | ELBAuthError | Tracks errors during authentication processes, useful for monitoring failures in authentication workflows. |
| aws_applicationelb_elbauth_failure | ELBAuthFailure | Monitors failed authentication attempts, helping detect potential security issues or configuration problems. |
| aws_applicationelb_elbauth_latency | ELBAuthLatency | Measures the latency of authentication requests, useful for identifying delays in authentication workflows. |
| aws_applicationelb_elbauth_refresh_token_success | ELBAuthRefreshTokenSuccess | Tracks successful refresh token requests, useful for monitoring token refresh operations. |
| aws_applicationelb_elbauth_success | ELBAuthSuccess | Measures successful authentication requests, useful for monitoring authentication performance. |
| aws_applicationelb_elbauth_user_claims_size_exceeded | ELBAuthUserClaimsSizeExceeded | Monitors instances where user claims exceed the allowed size, which can help in tuning authentication configurations. |
| aws_applicationelb_httpcode_elb_3_xx_count | HTTPCode_ELB_3XX_Count | Tracks the number of 3xx HTTP responses, which indicate redirection, useful for monitoring redirects on the load balancer. |
| aws_applicationelb_httpcode_elb_4_xx_count | HTTPCode_ELB_4XX_Count | Monitors the number of 4xx client error responses, useful for detecting invalid client requests. |
| aws_applicationelb_httpcode_elb_5_xx_count | HTTPCode_ELB_5XX_Count | Tracks the number of 5xx server error responses, helping identify backend issues. |
| aws_applicationelb_httpcode_target_2_xx_count | HTTPCode_Target_2XX_Count | Measures the number of successful 2xx responses from targets, useful for tracking successful request handling. |
| aws_applicationelb_httpcode_target_3_xx_count | HTTPCode_Target_3XX_Count | Monitors the number of 3xx redirects from target servers, useful for understanding traffic redirection by targets. |
| aws_applicationelb_httpcode_target_4_xx_count | HTTPCode_Target_4XX_Count | Tracks 4xx client errors returned by target servers, helping identify configuration or client-side issues. |
| aws_applicationelb_httpcode_target_5_xx_count | HTTPCode_Target_5XX_Count | Monitors the number of 5xx errors returned by target servers, useful for identifying server-side issues. |
| aws_applicationelb_ipv6_processed_bytes | IPv6ProcessedBytes | Measures the number of bytes processed over IPv6, useful for tracking IPv6 traffic volume. |
| aws_applicationelb_ipv6_request_count | IPv6RequestCount | Tracks the number of IPv6 requests, providing insights into IPv6 usage and adoption. |
| aws_applicationelb_new_connection_count | NewConnectionCount | Monitors the number of new connections established, helping understand connection initiation patterns. |
| aws_applicationelb_processed_bytes | ProcessedBytes | Measures the total amount of data processed by the load balancer, useful for tracking overall throughput. |
| aws_applicationelb_rejected_connection_count | RejectedConnectionCount | Tracks the number of connections rejected by the load balancer, useful for identifying capacity or configuration issues. |
| aws_applicationelb_request_count | RequestCount | Measures the total number of requests handled by the load balancer, useful for monitoring traffic volume. |
| aws_applicationelb_rule_evaluations | RuleEvaluations | Tracks the number of rule evaluations on the load balancer, helping to monitor rule complexity and processing time. |
| aws_applicationelb_target_connection_error_count | TargetConnectionErrorCount | Monitors the number of connection errors to target servers, useful for identifying connectivity issues between the load balancer and targets. |
| aws_applicationelb_target_response_time | TargetResponseTime | Measures the response time of target servers, helping to track backend performance and latency. |
| aws_applicationelb_target_tlsnegotiation_error_count | TargetTLSNegotiationErrorCount | Tracks failed TLS negotiations between the load balancer and target servers, useful for detecting SSL/TLS issues with backend services. |
| aws_applicationelb_anomalous_host_count | AnomalousHostCount | Monitors the number of hosts showing anomalous behavior, helping detect potential security issues or performance outliers. |
| aws_applicationelb_desync_mitigation_mode_non_compliant_request_count | DesyncMitigationMode_NonCompliant_Request_Count | Tracks non-compliant requests under desync mitigation mode, useful for monitoring and securing application traffic. |
| aws_applicationelb_dropped_invalid_header_request_count | DroppedInvalidHeaderRequestCount | Monitors requests dropped due to invalid headers, helping identify and fix misconfigurations or potential security risks. |
| aws_applicationelb_forwarded_invalid_header_request_count | ForwardedInvalidHeaderRequestCount | Tracks invalid header requests that were forwarded, helping detect improper traffic that bypassed filtering. |
| aws_applicationelb_grpc_request_count | GrpcRequestCount | Measures the number of gRPC requests handled, useful for tracking gRPC-based API traffic. |
| aws_applicationelb_httpcode_elb_500_count | HTTPCode_ELB_500_Count | Tracks the number of 500 Internal Server Errors from the load balancer, useful for detecting backend or load balancer failures. |
| aws_applicationelb_httpcode_elb_502_count | HTTPCode_ELB_502_Count | Monitors the number of 502 Bad Gateway errors, indicating backend communication failures. |
| aws_applicationelb_httpcode_elb_503_count | HTTPCode_ELB_503_Count | Tracks the number of 503 Service Unavailable errors, helping detect capacity or service availability issues. |
| aws_applicationelb_httpcode_elb_504_count | HTTPCode_ELB_504_Count | Measures the number of 504 Gateway Timeout errors, indicating backend timeouts. |
| aws_applicationelb_http_fixed_response_count | HTTP_Fixed_Response_Count | Tracks the number of fixed responses sent by the load balancer, useful for monitoring traffic directed to predefined responses. |
| aws_applicationelb_http_redirect_count | HTTP_Redirect_Count | Monitors the number of HTTP redirects sent by the load balancer, useful for tracking traffic redirection. |
| aws_applicationelb_http_redirect_url_limit_exceeded_count | HTTP_Redirect_Url_Limit_Exceeded_Count | Tracks instances where the redirect URL limit was exceeded, indicating potential configuration issues. |
| aws_applicationelb_healthy_host_count | HealthyHostCount | Measures the number of healthy hosts behind the load balancer, helping monitor service availability. |
| aws_applicationelb_healthy_state_dns | HealthyStateDNS | Monitors DNS health state, useful for ensuring DNS routing functionality. |
| aws_applicationelb_healthy_state_routing | HealthyStateRouting | Tracks the health of routing decisions by the load balancer, ensuring smooth traffic distribution. |
| aws_applicationelb_lambda_internal_error | LambdaInternalError | Monitors internal errors in AWS Lambda functions invoked by the load balancer, useful for debugging serverless application issues. |
| aws_applicationelb_lambda_target_processed_bytes | LambdaTargetProcessedBytes | Measures the bytes processed by Lambda targets, providing insights into data throughput for serverless applications. |
| aws_applicationelb_lambda_user_error | LambdaUserError | Tracks user-triggered errors in Lambda functions, helping to identify issues in function logic or inputs. |
| aws_applicationelb_mitigated_host_count | MitigatedHostCount | Monitors the number of hosts mitigated due to anomalies, useful for tracking security incidents. |
| aws_applicationelb_non_sticky_request_count | NonStickyRequestCount | Measures the number of non-sticky requests handled, helping to monitor session persistence performance. |
| aws_applicationelb_request_count_per_target | RequestCountPerTarget | Tracks the number of requests processed per target, useful for understanding traffic distribution and load balancing efficiency. |
| aws_applicationelb_standard_processed_bytes | StandardProcessedBytes | Measures the total amount of bytes processed, useful for tracking data throughput on standard targets. |
| aws_applicationelb_un_healthy_host_count | UnHealthyHostCount | Monitors the number of unhealthy hosts behind the load balancer, helping to identify availability issues. |
| aws_applicationelb_unhealthy_routing_request_count | UnhealthyRoutingRequestCount | |
| aws_applicationelb_unhealthy_state_dns | UnhealthyStateDNS | |
| aws_applicationelb_unhealthy_state_routing | UnhealthyStateRouting | |

AWS/Athena

Function: Interactive query service to analyze data in S3 using SQL

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_athena_info | | |
| aws_athena_engine_execution_time | EngineExecutionTime | Measures the time taken by the query engine to execute a query, helping to monitor query performance and identify execution bottlenecks. |
| aws_athena_processed_bytes | ProcessedBytes | Tracks the amount of data processed by the query engine, useful for understanding query cost and efficiency. |
| aws_athena_query_planning_time | QueryPlanningTime | Monitors the time taken to plan and prepare the query for execution, helping identify delays during the query planning phase. |
| aws_athena_query_queue_time | QueryQueueTime | Measures the time a query spends in the queue before execution, useful for monitoring system load and query prioritization issues. |
| aws_athena_service_processing_time | ServiceProcessingTime | Tracks the time taken by Athena’s internal services to process a query, helping to identify processing delays within the service. |
| aws_athena_total_execution_time | TotalExecutionTime | Measures the total time from query submission to completion, providing a comprehensive view of query performance and potential bottlenecks. |

AWS/AutoScaling

Function: Automatically adjusts capacity to maintain performance and cost efficiency

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_autoscaling_info | | |
| aws_autoscaling_group_and_warm_pool_desired_capacity | GroupAndWarmPoolDesiredCapacity | Monitors the desired capacity of both the Auto Scaling group and the warm pool, used to ensure adequate resources are provisioned. |
| aws_autoscaling_group_and_warm_pool_total_capacity | GroupAndWarmPoolTotalCapacity | Tracks the total capacity of the Auto Scaling group and warm pool, providing an overview of the available resources. |
| aws_autoscaling_group_desired_capacity | GroupDesiredCapacity | Measures the desired number of instances in the Auto Scaling group, useful for capacity planning and scaling decisions. |
| aws_autoscaling_group_in_service_capacity | GroupInServiceCapacity | Tracks the number of instances currently in service, helping to monitor the active workload. |
| aws_autoscaling_group_in_service_instances | GroupInServiceInstances | Monitors the actual number of instances currently running in the group, useful for managing resource availability. |
| aws_autoscaling_group_max_size | GroupMaxSize | Measures the maximum size of the Auto Scaling group, helping ensure the group does not exceed the defined limit. |
| aws_autoscaling_group_min_size | GroupMinSize | Tracks the minimum size of the Auto Scaling group, ensuring a baseline level of capacity is maintained. |
| aws_autoscaling_group_pending_capacity | GroupPendingCapacity | Monitors the capacity of instances that are pending launch, useful for understanding the state of scaling events. |
| aws_autoscaling_group_pending_instances | GroupPendingInstances | Tracks the number of instances that are pending launch, helping monitor scaling processes in progress. |
| aws_autoscaling_group_standby_capacity | GroupStandbyCapacity | Measures the capacity of instances in standby mode, useful for tracking inactive but available resources. |
| aws_autoscaling_group_standby_instances | GroupStandbyInstances | Monitors the number of instances in standby mode, helping assess resource availability for scaling. |
| aws_autoscaling_group_terminating_capacity | GroupTerminatingCapacity | Tracks the capacity of instances being terminated, helping to monitor scaling down activities. |
| aws_autoscaling_group_terminating_instances | GroupTerminatingInstances | Monitors the number of instances being terminated, useful for understanding scaling down operations. |
| aws_autoscaling_group_total_capacity | GroupTotalCapacity | Measures the total capacity of the Auto Scaling group, providing a complete view of resources available for scaling. |
| aws_autoscaling_group_total_instances | GroupTotalInstances | Tracks the total number of instances in the Auto Scaling group, helping to monitor overall resource allocation. |
| aws_autoscaling_predictive_scaling_capacity_forecast | PredictiveScalingCapacityForecast | Provides forecasted capacity based on predictive scaling, helping to plan for future resource needs. |
| aws_autoscaling_predictive_scaling_load_forecast | PredictiveScalingLoadForecast | Tracks forecasted load on the Auto Scaling group, helping to ensure capacity meets future demand. |
| aws_autoscaling_predictive_scaling_metric_pair_correlation | PredictiveScalingMetricPairCorrelation | Measures the correlation between metric pairs for predictive scaling, useful for improving prediction accuracy. |
| aws_autoscaling_warm_pool_desired_capacity | WarmPoolDesiredCapacity | Monitors the desired capacity of the warm pool, helping to ensure the pool has sufficient resources for quick scaling. |
| aws_autoscaling_warm_pool_min_size | WarmPoolMinSize | Tracks the minimum size of the warm pool, ensuring a baseline level of resources for rapid scaling. |
| aws_autoscaling_warm_pool_pending_capacity | WarmPoolPendingCapacity | Measures the capacity of instances pending in the warm pool, useful for understanding warm pool availability. |
| aws_autoscaling_warm_pool_terminating_capacity | WarmPoolTerminatingCapacity | Monitors the capacity of instances being terminated in the warm pool, helping to track scaling down activities. |
| aws_autoscaling_warm_pool_total_capacity | WarmPoolTotalCapacity | Tracks the total capacity of the warm pool, providing a complete view of available resources for quick scaling. |
| aws_autoscaling_warm_pool_warmed_capacity | WarmPoolWarmedCapacity | Measures the capacity of warmed instances in the warm pool, useful for tracking resources that are ready for immediate use. |

AWS/Backup

Function: Centralized backup service to automate and manage backups across AWS services

Scrape interval: 5 minutes

Metric | CloudWatch metric | Purpose
aws_backup_info | |
aws_backup_number_of_backup_jobs_aborted | NumberOfBackupJobsAborted | Tracks the number of backup jobs that were aborted, useful for monitoring failed or incomplete backup operations.
aws_backup_number_of_backup_jobs_completed | NumberOfBackupJobsCompleted | Measures the number of backup jobs successfully completed, useful for tracking the effectiveness of backup operations.
aws_backup_number_of_backup_jobs_created | NumberOfBackupJobsCreated | Tracks the total number of backup jobs initiated, helping to monitor backup frequency and schedule adherence.
aws_backup_number_of_backup_jobs_expired | NumberOfBackupJobsExpired | Monitors the number of backup jobs that have expired, useful for ensuring data retention policies are followed.
aws_backup_number_of_backup_jobs_failed | NumberOfBackupJobsFailed | Measures the number of backup jobs that have failed, useful for identifying errors in the backup process.
aws_backup_number_of_backup_jobs_pending | NumberOfBackupJobsPending | Tracks the number of backup jobs currently in a pending state, helping monitor delays or scheduling issues.
aws_backup_number_of_backup_jobs_running | NumberOfBackupJobsRunning | Monitors the number of backup jobs that are currently running, useful for tracking ongoing backup processes.
aws_backup_number_of_copy_jobs_completed | NumberOfCopyJobsCompleted | Measures the number of copy jobs successfully completed, helping track backup data replication across regions or storage tiers.
aws_backup_number_of_copy_jobs_created | NumberOfCopyJobsCreated | Tracks the number of initiated copy jobs, useful for monitoring data replication schedules.
aws_backup_number_of_copy_jobs_failed | NumberOfCopyJobsFailed | Monitors the number of failed copy jobs, helping to detect issues with backup replication processes.
aws_backup_number_of_copy_jobs_running | NumberOfCopyJobsRunning | Tracks the number of copy jobs currently in progress, useful for monitoring ongoing replication activities.
aws_backup_number_of_recovery_points_cold | NumberOfRecoveryPointsCold | Measures the number of cold (archived) recovery points, useful for tracking long-term storage of backup data.
aws_backup_number_of_recovery_points_completed | NumberOfRecoveryPointsCompleted | Tracks the total number of recovery points successfully created, helping to ensure that data can be restored when needed.
aws_backup_number_of_recovery_points_deleting | NumberOfRecoveryPointsDeleting | Monitors the number of recovery points being deleted, useful for tracking clean-up or retention policy actions.
aws_backup_number_of_recovery_points_expired | NumberOfRecoveryPointsExpired | Measures the number of expired recovery points, useful for ensuring compliance with retention policies.
aws_backup_number_of_recovery_points_partial | NumberOfRecoveryPointsPartial | Tracks the number of incomplete (partial) recovery points, helping to identify issues with backup integrity or storage capacity.
aws_backup_number_of_restore_jobs_completed | NumberOfRestoreJobsCompleted | Measures the number of successful restore jobs, useful for tracking data recovery operations.
aws_backup_number_of_restore_jobs_failed | NumberOfRestoreJobsFailed | Monitors the number of restore jobs that have failed, useful for identifying problems in the recovery process.
aws_backup_number_of_restore_jobs_pending | NumberOfRestoreJobsPending | Tracks the number of restore jobs that are pending, useful for monitoring delays in data recovery.
aws_backup_number_of_restore_jobs_running | NumberOfRestoreJobsRunning | Monitors the number of restore jobs currently in progress, helping to track ongoing recovery processes.

AWS/Billing

Function: Provides detailed usage and cost data for AWS services. AWS publishes metrics for this service only in specific regions, so any job configured with this service gathers data only from the us-east-1 region.

Scrape interval: 5 minutes

Includes: Out-of-the-box dashboard

Metric | CloudWatch metric | Purpose
aws_billing_estimated_charges | EstimatedCharges | Tracks the estimated charges for your AWS account, providing insights into overall AWS cost and usage. This is useful for budget monitoring and cost management over time, helping to identify cost spikes or unusual charges.

AWS/Cassandra

Function: Managed Apache Cassandra-compatible database service

Scrape interval: 5 minutes

Metric | CloudWatch metric | Purpose
aws_cassandra_info | |
aws_cassandra_account_max_reads | AccountMaxReads | Reports the maximum number of read capacity units that the account can use, helping compare read activity against the account-level limit.
aws_cassandra_account_max_table_level_reads | AccountMaxTableLevelReads | Reports the maximum number of read capacity units that a single table in the account can use, useful for checking table-level read limits.
aws_cassandra_account_max_table_level_writes | AccountMaxTableLevelWrites | Reports the maximum number of write capacity units that a single table in the account can use, useful for checking table-level write limits.
aws_cassandra_account_max_writes | AccountMaxWrites | Reports the maximum number of write capacity units that the account can use, useful for managing overall write throughput against the account-level limit.
aws_cassandra_account_provisioned_read_capacity_utilization | AccountProvisionedReadCapacityUtilization | Monitors the utilization of provisioned read capacity, helping ensure optimal read capacity allocation.
aws_cassandra_account_provisioned_write_capacity_utilization | AccountProvisionedWriteCapacityUtilization | Tracks the utilization of provisioned write capacity, ensuring efficient use of write resources.
aws_cassandra_conditional_check_failed_requests | ConditionalCheckFailedRequests | Measures the number of failed conditional checks, useful for monitoring logical errors during write operations.
aws_cassandra_consumed_read_capacity_units | ConsumedReadCapacityUnits | Tracks the number of read capacity units consumed, helping monitor read activity and optimize capacity.
aws_cassandra_consumed_write_capacity_units | ConsumedWriteCapacityUnits | Monitors the number of write capacity units consumed, providing insights into write operations and capacity optimization.
aws_cassandra_max_provisioned_table_read_capacity_utilization | MaxProvisionedTableReadCapacityUtilization | Tracks the maximum utilization of provisioned read capacity at the table level, helping manage read resources per table.
aws_cassandra_max_provisioned_table_write_capacity_utilization | MaxProvisionedTableWriteCapacityUtilization | Monitors the maximum utilization of provisioned write capacity at the table level, ensuring efficient use of write resources per table.
aws_cassandra_returned_item_count | ReturnedItemCount | Measures the total number of items returned by read operations, useful for understanding query efficiency.
aws_cassandra_returned_item_count_by_select | ReturnedItemCountBySelect | Tracks the number of items returned by select queries, helping optimize query results and performance.
aws_cassandra_successful_request_count | SuccessfulRequestCount | Monitors the number of successful requests, providing insights into the operational success rate of read and write operations.
aws_cassandra_successful_request_latency | SuccessfulRequestLatency | Measures the latency of successful requests, helping to optimize performance and identify bottlenecks.
aws_cassandra_system_errors | SystemErrors | Tracks the number of system-related errors, useful for identifying and addressing infrastructure or service issues.
aws_cassandra_user_errors | UserErrors | Monitors the number of user-related errors, helping identify application-level issues or misconfigurations.

AWS/CertificateManager

Function: Manages the provisioning, renewal, and deployment of SSL/TLS certificates

Scrape interval: 5 minutes

Metric | CloudWatch metric | Purpose
aws_certificatemanager_info | |
aws_certificatemanager_days_to_expiry | DaysToExpiry | Tracks the number of days remaining until an SSL/TLS certificate expires. This metric is useful for monitoring certificate lifecycles and ensuring that certificates are renewed before expiration to avoid service disruptions.

AWS/CloudFront

Function: Content delivery network that delivers data, videos, and applications globally

Scrape interval: 5 minutes

Includes: Out-of-the-box dashboard

Metric | CloudWatch metric | Purpose
aws_cloudfront_info | |
aws_cloudfront_4xx_error_rate | 4xxErrorRate | Tracks the rate of 4xx client-side errors, helping to monitor user request issues.
aws_cloudfront_5xx_error_rate | 5xxErrorRate | Tracks the rate of 5xx server-side errors, useful for detecting backend or CloudFront issues.
aws_cloudfront_bytes_downloaded | BytesDownloaded | Measures the total bytes downloaded via CloudFront, useful for monitoring bandwidth usage.
aws_cloudfront_bytes_uploaded | BytesUploaded | Monitors the amount of data uploaded to CloudFront, helping track upload activity.
aws_cloudfront_requests | Requests | Tracks the total number of requests processed by CloudFront, providing insight into traffic volume.
aws_cloudfront_total_error_rate | TotalErrorRate | Measures the combined rate of all error responses (both 4xx and 5xx), helping monitor service reliability.
aws_cloudfront_401_error_rate | 401ErrorRate | Tracks the rate of 401 Unauthorized errors, useful for monitoring authentication issues.
aws_cloudfront_403_error_rate | 403ErrorRate | Monitors the rate of 403 Forbidden errors, helping to detect access control issues.
aws_cloudfront_404_error_rate | 404ErrorRate | Measures the rate of 404 Not Found errors, useful for tracking invalid requests or missing resources.
aws_cloudfront_502_error_rate | 502ErrorRate | Tracks the rate of 502 Bad Gateway errors, indicating backend server or network issues.
aws_cloudfront_503_error_rate | 503ErrorRate | Monitors the rate of 503 Service Unavailable errors, helping to detect capacity or availability issues.
aws_cloudfront_504_error_rate | 504ErrorRate | Tracks the rate of 504 Gateway Timeout errors, indicating backend server delays.
aws_cloudfront_cache_hit_rate | CacheHitRate | Measures the percentage of requests served from CloudFront’s cache, useful for optimizing content delivery efficiency.
aws_cloudfront_function_compute_utilization | FunctionComputeUtilization | Tracks the compute utilization of CloudFront Functions, helping to monitor resource usage for custom code execution.
aws_cloudfront_function_execution_errors | FunctionExecutionErrors | Monitors the number of execution errors in CloudFront Functions, helping to identify failures in custom logic.
aws_cloudfront_function_invocations | FunctionInvocations | Tracks the total number of CloudFront Function invocations, useful for monitoring function usage.
aws_cloudfront_function_throttles | FunctionThrottles | Measures throttled CloudFront Function invocations, indicating capacity or rate-limiting issues.
aws_cloudfront_function_validation_errors | FunctionValidationErrors | Tracks validation errors for CloudFront Functions, useful for debugging incorrect function configurations.
aws_cloudfront_lambda_execution_error | LambdaExecutionError | Monitors errors during Lambda@Edge function execution, useful for identifying issues with serverless logic.
aws_cloudfront_lambda_limit_exceeded_errors | LambdaLimitExceededErrors | Tracks instances where Lambda@Edge functions exceed their resource limits, helping detect performance bottlenecks.
aws_cloudfront_lambda_validation_error | LambdaValidationError | Measures Lambda@Edge validation errors, useful for ensuring proper configuration.
aws_cloudfront_origin_latency | OriginLatency | Tracks the latency from CloudFront to the origin server, helping to identify performance bottlenecks in origin server communication.

AWS/Cognito

Function: Provides authentication, authorization, and user management for web and mobile apps

Scrape interval: 5 minutes

Metric | CloudWatch metric | Purpose
aws_cognito_info | |
aws_cognito_account_take_over_risk | AccountTakeOverRisk | Tracks requests where Cognito detected account takeover risk, useful for detecting malicious login attempts.
aws_cognito_compromised_credentials_risk | CompromisedCredentialsRisk | Monitors requests where Cognito detected compromised credentials, helping to detect and mitigate security threats.
aws_cognito_federation_successes | FederationSuccesses | Tracks the number of successful federated sign-ins, useful for monitoring third-party identity provider usage.
aws_cognito_federation_throttles | FederationThrottles | Measures the number of throttled federation sign-in attempts, useful for identifying rate-limiting issues.
aws_cognito_no_risk | NoRisk | Tracks requests where Cognito did not identify any risk, indicating sign-in activity that raised no security flags.
aws_cognito_override_block | OverrideBlock | Tracks requests that Cognito blocked because of the risk configuration provided by the developer, useful for auditing how blocking rules affect users.
aws_cognito_risk | Risk | Tracks requests that Cognito marked as risky, helping to monitor suspicious activity.
aws_cognito_sign_in_successes | SignInSuccesses | Tracks the number of successful sign-ins, helping to monitor user authentication success.
aws_cognito_sign_in_throttles | SignInThrottles | Measures the number of throttled sign-in attempts, useful for detecting excessive login activity or rate-limiting.
aws_cognito_sign_up_successes | SignUpSuccesses | Tracks successful user sign-ups, providing insight into account creation trends.
aws_cognito_sign_up_throttles | SignUpThrottles | Measures throttled sign-up attempts, useful for identifying potential rate-limiting or abuse during account creation.
aws_cognito_token_refresh_successes | TokenRefreshSuccesses | Tracks the number of successful token refreshes, useful for monitoring user session continuity.
aws_cognito_token_refresh_throttles | TokenRefreshThrottles | Monitors the number of throttled token refresh requests, helping identify rate-limiting or session issues.

AWS/DDoSProtection

Function: Protects against distributed denial of service attacks with AWS Shield

Scrape interval: 5 minutes

Metric | CloudWatch metric | Purpose
aws_ddosprotection_info | |
aws_ddosprotection_ddo_sattack_bits_per_second | DDoSAttackBitsPerSecond | Monitors the volume of a DDoS attack in terms of data transfer per second, useful for detecting bandwidth-based attacks.
aws_ddosprotection_ddo_sattack_packets_per_second | DDoSAttackPacketsPerSecond | Tracks the number of packets involved in a DDoS attack per second, helping to identify packet flood attacks.
aws_ddosprotection_ddo_sattack_requests_per_second | DDoSAttackRequestsPerSecond | Monitors the number of requests in a DDoS attack per second, useful for identifying application-layer DDoS attacks.
aws_ddosprotection_ddo_sdetected | DDoSDetected | Tracks the detection of DDoS attacks, providing alerts when a potential attack is detected.
aws_ddosprotection_volume_bits_per_second | VolumeBitsPerSecond | Monitors the data transfer volume per second during a DDoS attack, helping to understand the scale of the attack.
aws_ddosprotection_volume_packets_per_second | VolumePacketsPerSecond | Measures the volume of packets per second, useful for tracking the size of DDoS attacks in terms of packet rate.
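
Names such as aws_ddosprotection_ddo_sattack_bits_per_second can look like typos, but they are consistent with a simple conversion from the CloudWatch namespace and metric name. The following is a minimal sketch of a rule that reproduces the names listed on this page. The function name `prometheus_name` and the rule itself are assumptions inferred from the tables, not the exporter's actual implementation: the namespace is lowercased with `/` replaced by `_`, and an underscore is inserted wherever a lowercase letter or digit is directly followed by an uppercase letter.

```python
import re

# A lowercase letter or digit directly followed by an uppercase letter
# marks a word boundary, e.g. the "oS" in "DDoSAttack".
_BOUNDARY = re.compile(r"([a-z0-9])([A-Z])")


def prometheus_name(namespace: str, cloudwatch_metric: str) -> str:
    """Guess the exported metric name for a CloudWatch metric.

    Hypothetical helper: the rule is inferred from the names in this
    document and may not cover every service.
    """
    prefix = namespace.lower().replace("/", "_")
    snake = _BOUNDARY.sub(r"\1_\2", cloudwatch_metric).lower()
    return f"{prefix}_{snake}"


if __name__ == "__main__":
    examples = [
        ("AWS/DDoSProtection", "DDoSAttackBitsPerSecond"),
        ("AWS/DMS", "CDCChangesDiskSource"),
        ("AWS/DX", "ConnectionCRCErrorCount"),
        ("AWS/CloudFront", "4xxErrorRate"),
    ]
    for namespace, metric in examples:
        print(prometheus_name(namespace, metric))
```

Because runs of capitals are never split, acronyms stay fused to the word that follows them, which is why DDoSDetected becomes ddo_sdetected rather than ddos_detected.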

AWS/DMS

Function: Migrates databases to AWS with minimal downtime

Scrape interval: 5 minutes

Metric | CloudWatch metric | Purpose
aws_dms_info | |
aws_dms_cdcchanges_disk_source | CDCChangesDiskSource | Tracks the number of rows accumulating on disk and waiting to be committed from the source during Change Data Capture (CDC), useful for spotting a source-side backlog that has spilled to disk.
aws_dms_cdcchanges_disk_target | CDCChangesDiskTarget | Tracks the number of rows accumulating on disk and waiting to be committed to the target during CDC, useful for spotting a target-side backlog that has spilled to disk.
aws_dms_cdcchanges_memory_source | CDCChangesMemorySource | Tracks the number of rows accumulating in memory and waiting to be committed from the source during CDC, helping monitor the in-memory source backlog.
aws_dms_cdcchanges_memory_target | CDCChangesMemoryTarget | Tracks the number of rows accumulating in memory and waiting to be committed to the target during CDC, helping monitor the in-memory target backlog.
aws_dms_cdcincoming_changes | CDCIncomingChanges | Measures the number of change events waiting to be applied to the target during CDC operations, helping to monitor how far the target is behind.
aws_dms_cdclatency_source | CDCLatencySource | Tracks latency on the source side during CDC operations, helping to identify performance issues with data changes.
aws_dms_cdclatency_target | CDCLatencyTarget | Monitors the latency on the target side during CDC operations, useful for tracking potential bottlenecks.
aws_dms_cdcthroughput_bandwidth_source | CDCThroughputBandwidthSource | Measures the source bandwidth usage during CDC operations, helping to monitor network usage.
aws_dms_cdcthroughput_bandwidth_target | CDCThroughputBandwidthTarget | Monitors the target bandwidth usage during CDC, useful for tracking data transfer rates.
aws_dms_cdcthroughput_rows_source | CDCThroughputRowsSource | Tracks the number of rows processed from the source during CDC operations, useful for monitoring data throughput.
aws_dms_cdcthroughput_rows_target | CDCThroughputRowsTarget | Monitors the number of rows written to the target during CDC, helping to ensure data is migrated efficiently.
aws_dms_cpuutilization | CPUUtilization | Measures the CPU usage of DMS instances, helping to ensure that the system has enough resources to perform migrations.
aws_dms_free_storage_space | FreeStorageSpace | Tracks the amount of free storage available on the DMS instance, useful for preventing storage exhaustion during migrations.
aws_dms_freeable_memory | FreeableMemory | Monitors the available memory on the DMS instance, useful for ensuring that enough memory is available for operations.
aws_dms_full_load_throughput_bandwidth_source | FullLoadThroughputBandwidthSource | Tracks bandwidth usage during full load operations on the source, useful for monitoring network utilization.
aws_dms_full_load_throughput_bandwidth_target | FullLoadThroughputBandwidthTarget | Monitors bandwidth usage during full load operations on the target, helping track data transfer efficiency.
aws_dms_full_load_throughput_rows_source | FullLoadThroughputRowsSource | Tracks the number of rows processed from the source during full load migrations, helping to monitor data throughput.
aws_dms_full_load_throughput_rows_target | FullLoadThroughputRowsTarget | Monitors the number of rows loaded to the target during full load operations, helping to ensure migration progress.
aws_dms_network_receive_throughput | NetworkReceiveThroughput | Tracks the network receive rate, helping to monitor inbound network performance during migrations.
aws_dms_network_transmit_throughput | NetworkTransmitThroughput | Measures the network transmit rate, useful for monitoring outbound network performance.
aws_dms_read_iops | ReadIOPS | Tracks the number of read operations per second, helping to monitor disk read performance.
aws_dms_read_latency | ReadLatency | Measures the latency of read operations, helping to identify performance issues in disk reads.
aws_dms_read_throughput | ReadThroughput | Monitors the throughput of read operations, useful for tracking how much data is being read during migrations.
aws_dms_swap_usage | SwapUsage | Tracks the amount of swap space used, helping monitor memory performance.
aws_dms_write_iops | WriteIOPS | Measures the number of write operations per second, useful for monitoring disk write performance.
aws_dms_write_latency | WriteLatency | Tracks the latency of write operations, helping identify performance issues during data writes.
aws_dms_write_throughput | WriteThroughput | Monitors the throughput of write operations, helping to understand the speed of data writes during migration operations.

AWS/DX

Function: AWS Direct Connect provides a dedicated network connection to AWS.

Scrape interval: 5 minutes

Metric | CloudWatch metric | Purpose
aws_dx_info | |
aws_dx_connection_bps_egress | ConnectionBpsEgress | Measures the egress bandwidth (bits per second) for Direct Connect connections, helping monitor outbound data transfer.
aws_dx_connection_bps_ingress | ConnectionBpsIngress | Monitors the ingress bandwidth (bits per second), providing insights into inbound data transfer rates.
aws_dx_connection_crcerror_count | ConnectionCRCErrorCount | Tracks CRC errors on the connection, useful for identifying data integrity issues or hardware problems.
aws_dx_connection_encryption_state | ConnectionEncryptionState | Monitors the encryption state of Direct Connect connections, helping ensure secure data transfer.
aws_dx_connection_error_count | ConnectionErrorCount | Tracks the number of errors on the Direct Connect connection, useful for diagnosing connectivity issues.
aws_dx_connection_light_level_rx | ConnectionLightLevelRx | Measures the received light level, helping monitor the health of fiber optic connections.
aws_dx_connection_light_level_tx | ConnectionLightLevelTx | Tracks the transmitted light level, helping ensure proper signal strength in fiber optic connections.
aws_dx_connection_pps_egress | ConnectionPpsEgress | Monitors the number of packets per second being transmitted (egress), useful for tracking network traffic patterns.
aws_dx_connection_pps_ingress | ConnectionPpsIngress | Tracks the number of packets per second being received (ingress), useful for understanding inbound traffic load.
aws_dx_connection_state | ConnectionState | Monitors the operational state of Direct Connect connections, helping to detect connection status changes.
aws_dx_virtual_interface_bps_egress | VirtualInterfaceBpsEgress | Measures the outbound bandwidth usage for virtual interfaces, helping track the data flow from virtual interfaces.
aws_dx_virtual_interface_bps_ingress | VirtualInterfaceBpsIngress | Monitors inbound bandwidth usage for virtual interfaces, providing insight into data ingress through virtual interfaces.
aws_dx_virtual_interface_pps_egress | VirtualInterfacePpsEgress | Tracks the number of outbound packets per second for virtual interfaces, helping monitor packet-based traffic.
aws_dx_virtual_interface_pps_ingress | VirtualInterfacePpsIngress | Measures the number of inbound packets per second for virtual interfaces, useful for monitoring packet-level ingress.

AWS/DocDB

Function: Managed document database service that supports MongoDB workloads

Scrape interval: 5 minutes

Metric | CloudWatch metric | Purpose
aws_docdb_info | |
aws_docdb_backup_retention_period_storage_used | BackupRetentionPeriodStorageUsed | Tracks the amount of storage used for backup retention, helping manage backup costs and storage.
aws_docdb_buffer_cache_hit_ratio | BufferCacheHitRatio | Monitors the cache hit ratio, helping to ensure data is being effectively cached.
aws_docdb_cpuutilization | CPUUtilization | Measures the CPU usage of the database, useful for monitoring resource consumption.
aws_docdb_change_stream_log_size | ChangeStreamLogSize | Tracks the size of the change stream log, helping monitor the volume of changes being processed.
aws_docdb_dbcluster_replica_lag_maximum | DBClusterReplicaLagMaximum | Monitors the maximum replication lag between the primary and replica nodes in the cluster.
aws_docdb_dbcluster_replica_lag_minimum | DBClusterReplicaLagMinimum | Tracks the minimum replication lag, helping ensure data replication is kept in sync.
aws_docdb_dbinstance_replica_lag | DBInstanceReplicaLag | Monitors replication lag at the instance level, useful for tracking data consistency across instances.
aws_docdb_database_connections | DatabaseConnections | Tracks the number of active connections to the database, helping monitor connection load.
aws_docdb_database_connections_max | DatabaseConnectionsMax | Monitors the maximum number of open database connections on an instance within a one-minute period, helping spot connection peaks before they cause connection exhaustion.
aws_docdb_database_cursors | DatabaseCursors | Tracks the number of database cursors in use, helping monitor query processing.
aws_docdb_database_cursors_max | DatabaseCursorsMax | Monitors the maximum number of open cursors on an instance within a one-minute period, useful for spotting cursor peaks against resource limits.
aws_docdb_database_cursors_timed_out | DatabaseCursorsTimedOut | Tracks cursors that have timed out, helping identify performance issues.
aws_docdb_disk_queue_depth | DiskQueueDepth | Measures the depth of the disk I/O queue, useful for monitoring disk performance.
aws_docdb_documents_deleted | DocumentsDeleted | Tracks the number of documents deleted, helping to monitor data deletion operations.
aws_docdb_documents_inserted | DocumentsInserted | Measures the number of documents inserted, helping to track data growth in the database.
aws_docdb_documents_returned | DocumentsReturned | Tracks the number of documents returned by queries, useful for monitoring query performance.
aws_docdb_documents_updated | DocumentsUpdated | Measures the number of documents updated, helping track changes in the database.
aws_docdb_engine_uptime | EngineUptime | Monitors the total uptime of the database engine, useful for tracking availability.
aws_docdb_free_local_storage | FreeLocalStorage | Tracks the amount of free storage on the database node, helping to prevent storage exhaustion.
aws_docdb_freeable_memory | FreeableMemory | Monitors the amount of free memory, useful for ensuring sufficient memory availability.
aws_docdb_network_receive_throughput | NetworkReceiveThroughput | Measures the amount of data being received by the database, useful for tracking inbound network usage.
aws_docdb_network_throughput | NetworkThroughput | Monitors overall network throughput, helping track both inbound and outbound traffic.
aws_docdb_network_transmit_throughput | NetworkTransmitThroughput | Measures the amount of data being transmitted from the database, helping track outbound traffic.
aws_docdb_opcounters_command | OpcountersCommand | Tracks the number of database commands executed, useful for monitoring operational throughput.
aws_docdb_opcounters_delete | OpcountersDelete | Monitors the number of delete operations, useful for tracking data modifications.
aws_docdb_opcounters_getmore | OpcountersGetmore | Measures the number of getMore operations, useful for monitoring pagination in queries.
aws_docdb_opcounters_insert | OpcountersInsert | Tracks the number of insert operations, helping monitor data insert performance.
aws_docdb_opcounters_query | OpcountersQuery | Monitors the number of queries executed, useful for tracking query load.
aws_docdb_opcounters_update | OpcountersUpdate | Measures the number of update operations, helping monitor data modifications in the database.
aws_docdb_read_iops | ReadIOPS | Tracks the number of input/output operations per second for reads, helping to monitor read performance.
aws_docdb_read_latency | ReadLatency | Measures the latency of read operations, helping to identify performance issues with data retrieval.
aws_docdb_read_throughput | ReadThroughput | Monitors the rate of data being read from the database, useful for tracking read performance.
aws_docdb_snapshot_storage_used | SnapshotStorageUsed | Tracks the amount of storage used for database snapshots, helping manage backup storage costs.
aws_docdb_swap_usage | SwapUsage | Monitors the amount of swap space used, helping track memory efficiency.
aws_docdb_total_backup_storage_billed | TotalBackupStorageBilled | Tracks the amount of backup storage billed, useful for understanding backup costs.
aws_docdb_volume_bytes_used | VolumeBytesUsed | Measures the amount of storage volume in use, helping track database storage usage.
aws_docdb_volume_read_iops | VolumeReadIOPs | Tracks the number of read input/output operations per second on the storage volume, useful for monitoring storage performance.
aws_docdb_volume_write_iops | VolumeWriteIOPs | Measures the number of write I/O operations per second, helping monitor write performance on the storage volume.
aws_docdb_write_iops | WriteIOPS | Tracks the number of write operations per second, useful for tracking write throughput.
aws_docdb_write_latency | WriteLatency | Measures the latency of write operations, helping to identify performance bottlenecks during data insertion or updates.
aws_docdb_write_throughput | WriteThroughput | Monitors the rate at which data is written to the database, useful for understanding write performance.

AWS/DynamoDB

Function: Fully managed NoSQL database service for low-latency applications at scale

Scrape interval: 5 minutes

Metric | CloudWatch metric | Purpose
aws_dynamodb_info | |
aws_dynamodb_account_max_reads | AccountMaxReads | Reports the maximum number of read capacity units that the account can use, helping compare overall read activity against the account-level limit.
aws_dynamodb_account_max_table_level_reads | AccountMaxTableLevelReads | Reports the maximum number of read capacity units that a single table or global secondary index in the account can use, helping check table-level read limits.
aws_dynamodb_account_max_table_level_writes | AccountMaxTableLevelWrites | Reports the maximum number of write capacity units that a single table or global secondary index in the account can use, useful for checking table-level write limits.
aws_dynamodb_account_max_writes | AccountMaxWrites | Reports the maximum number of write capacity units that the account can use, helping compare write throughput against the account-level limit.
aws_dynamodb_account_provisioned_read_capacity_utilization | AccountProvisionedReadCapacityUtilization | Monitors the utilization of the provisioned read capacity, helping ensure sufficient read capacity allocation.
aws_dynamodb_account_provisioned_write_capacity_utilization | AccountProvisionedWriteCapacityUtilization | Tracks the utilization of the provisioned write capacity, useful for efficient capacity management.
aws_dynamodb_age_of_oldest_unreplicated_record | AgeOfOldestUnreplicatedRecord | Measures the age of the oldest record that has not yet been replicated to the Kinesis data stream, helping track replication lag.
aws_dynamodb_conditional_check_failed_requests | ConditionalCheckFailedRequests | Tracks the number of failed conditional checks, useful for identifying logical issues during write operations.
aws_dynamodb_consumed_change_data_capture_units | ConsumedChangeDataCaptureUnits | Measures the number of consumed Change Data Capture units, helping monitor CDC-based operations.
aws_dynamodb_consumed_read_capacity_units | ConsumedReadCapacityUnits | Monitors the total read capacity units consumed, helping track and optimize read operations.
aws_dynamodb_consumed_write_capacity_units | ConsumedWriteCapacityUnits | Measures the total write capacity units consumed, useful for monitoring and optimizing write operations.
aws_dynamodb_failed_to_replicate_record_count | FailedToReplicateRecordCount | Tracks the number of records that failed to replicate to the Kinesis data stream, useful for identifying replication issues.
aws_dynamodb_max_provisioned_table_read_capacity_utilization | MaxProvisionedTableReadCapacityUtilization | Measures the maximum utilization of the provisioned read capacity at the table level, useful for understanding table-specific read activity.
aws_dynamodb_max_provisioned_table_write_capacity_utilization | MaxProvisionedTableWriteCapacityUtilization | Tracks the maximum utilization of provisioned write capacity at the table level, helping optimize write capacity.
aws_dynamodb_on_demand_max_read_request_units | OnDemandMaxReadRequestUnits | Monitors the maximum number of read request units in on-demand mode, useful for managing scaling costs.
aws_dynamodb_on_demand_max_write_request_units | OnDemandMaxWriteRequestUnits | Tracks the maximum number of write request units in on-demand mode, helping optimize scaling and cost management.
aws_dynamodb_online_index_consumed_write_capacity | OnlineIndexConsumedWriteCapacity | Measures the write capacity consumed by online index builds, useful for tracking index creation overhead.
aws_dynamodb_online_index_percentage_progress | OnlineIndexPercentageProgress | Monitors the progress of online index creation, useful for understanding index build status.
aws_dynamodb_online_index_throttle_events | OnlineIndexThrottleEvents | Tracks throttle events during online index creation, useful for detecting capacity constraints.
aws_dynamodb_pending_replication_count | PendingReplicationCount | Monitors the number of records pending replication, useful for tracking replication progress.
aws_dynamodb_provisioned_read_capacity_units | ProvisionedReadCapacityUnits | Tracks the total provisioned read capacity units, useful for managing resource allocation.
aws_dynamodb_provisioned_write_capacity_units | ProvisionedWriteCapacityUnits | Monitors the total provisioned write capacity units, helping ensure proper capacity allocation.
aws_dynamodb_read_throttle_events | ReadThrottleEvents | Measures the number of throttled read requests, useful for identifying capacity limitations.
aws_dynamodb_replication_latency | ReplicationLatency | Tracks the replication latency, helping ensure timely data consistency across replicas.
aws_dynamodb_returned_bytes | ReturnedBytes | Monitors the number of bytes returned by DynamoDB Streams GetRecords operations, useful for tracking stream read volume.
aws_dynamodb_returned_item_count | ReturnedItemCount | Measures the total number of items returned by read operations, useful for monitoring query performance.
aws_dynamodb_returned_records_count | ReturnedRecordsCount | Tracks the number of stream records returned by DynamoDB Streams GetRecords operations, useful for understanding stream consumer load.
aws_dynamodb_successful_request_latency | SuccessfulRequestLatency | Monitors the latency of successful requests, useful for optimizing request performance.
aws_dynamodb_system_errors | SystemErrors | Tracks system-level errors, helping identify infrastructure or platform issues.
aws_dynamodb_throttled_put_record_count | ThrottledPutRecordCount | Monitors the number of records throttled by the Kinesis data stream because of insufficient stream capacity, useful for sizing the stream that receives table changes.
aws_dynamodb_throttled_requests | ThrottledRequests | Tracks the total number of throttled requests, helping to identify capacity limitations or traffic spikes.
aws_dynamodb_time_to_live_deleted_item_count | TimeToLiveDeletedItemCount | Measures the number of items deleted due to Time to Live (TTL) expiration, useful for managing automatic data deletion.
aws_dynamodb_transaction_conflict | TransactionConflict | Monitors the number of transaction conflicts, helping to optimize transaction performance.
aws_dynamodb_user_errors | UserErrors | Tracks user-level errors, helping identify application issues.
aws_dynamodb_write_throttle_events | WriteThrottleEvents | Monitors the number of throttled write requests, useful for identifying capacity constraints during write operations.

AWS/EBS

Function: Block storage for use with EC2 instances

Scrape interval: 5 minutes

Includes: Out-of-the-box dashboard

MetricCloudwatch metricPurpose
aws_ebs_info
aws_ebs_volume_read_bytesVolumeReadBytesMeasures the total bytes read from the EBS volume, useful for monitoring data retrieval activity.
aws_ebs_volume_write_bytesVolumeWriteBytesTracks the total bytes written to the EBS volume, helping monitor data write operations.
aws_ebs_volume_read_opsVolumeReadOpsMonitors the number of read operations on the EBS volume, useful for tracking read performance.
aws_ebs_volume_write_opsVolumeWriteOpsMeasures the number of write operations on the EBS volume, helping to monitor write throughput.
aws_ebs_volume_total_read_timeVolumeTotalReadTimeTracks the total time spent on read operations, useful for understanding read latency.
aws_ebs_volume_total_write_timeVolumeTotalWriteTimeMonitors the total time spent on write operations, helping to understand write latency.
aws_ebs_volume_idle_timeVolumeIdleTimeMeasures the amount of idle time for the EBS volume, useful for understanding periods of inactivity.
aws_ebs_volume_queue_lengthVolumeQueueLengthTracks the length of the queue for I/O requests on the EBS volume, helping to identify potential performance bottlenecks.
aws_ebs_volume_throughput_percentageVolumeThroughputPercentageMonitors the throughput percentage of the EBS volume, useful for ensuring optimal performance.
aws_ebs_volume_consumed_read_write_opsVolumeConsumedReadWriteOpsMeasures the number of read and write operations consumed, helping track IOPS utilization.
aws_ebs_burst_balanceBurstBalanceTracks the balance of burst credits available for burstable performance EBS volumes, helping manage performance spikes.
aws_ebs_enable_copied_image_deprecation_completedEnableCopiedImageDeprecationCompletedMeasures the completion of copied image deprecation operations, useful for lifecycle management.
aws_ebs_enable_copied_image_deprecation_failedEnableCopiedImageDeprecationFailedTracks the failure of copied image deprecation operations, helping identify issues with deprecation.
aws_ebs_enable_image_deprecation_completedEnableImageDeprecationCompletedMeasures the completion of image deprecation operations, helping monitor deprecation success.
aws_ebs_enable_image_deprecation_failedEnableImageDeprecationFailedTracks the failure of image deprecation operations, useful for identifying deprecation issues.
aws_ebs_images_copied_region_completedImagesCopiedRegionCompletedMonitors the completion of image copy operations across regions, helping manage multi-region image availability.
aws_ebs_images_copied_region_deregister_completedImagesCopiedRegionDeregisterCompletedTracks the completion of deregistration of copied images across regions, useful for lifecycle management.
aws_ebs_images_copied_region_deregistered_failedImagesCopiedRegionDeregisteredFailedMeasures failures during the deregistration of copied images, helping identify operational issues.
aws_ebs_images_copied_region_failedImagesCopiedRegionFailedTracks failures in region-to-region image copy operations, useful for identifying cross-region availability issues.
aws_ebs_images_copied_region_startedImagesCopiedRegionStarted
aws_ebs_images_create_completedImagesCreateCompleted
aws_ebs_images_create_failedImagesCreateFailed
aws_ebs_images_create_startedImagesCreateStarted
aws_ebs_images_deregister_completedImagesDeregisterCompleted
aws_ebs_images_deregister_failedImagesDeregisterFailed
aws_ebs_resources_targetedResourcesTargeted
aws_ebs_snapshots_copied_account_completedSnapshotsCopiedAccountCompleted
aws_ebs_snapshots_copied_account_delete_completedSnapshotsCopiedAccountDeleteCompleted
aws_ebs_snapshots_copied_account_delete_failedSnapshotsCopiedAccountDeleteFailed
aws_ebs_snapshots_copied_account_failedSnapshotsCopiedAccountFailed
aws_ebs_snapshots_copied_account_startedSnapshotsCopiedAccountStarted
aws_ebs_snapshots_copied_region_completedSnapshotsCopiedRegionCompleted
aws_ebs_snapshots_copied_region_delete_completedSnapshotsCopiedRegionDeleteCompleted
aws_ebs_snapshots_copied_region_delete_failedSnapshotsCopiedRegionDeleteFailed
aws_ebs_snapshots_copied_region_failedSnapshotsCopiedRegionFailed
aws_ebs_snapshots_copied_region_startedSnapshotsCopiedRegionStarted
aws_ebs_snapshots_create_completedSnapshotsCreateCompletedTracks the successful completion of snapshot creation, helping monitor backup operations.
aws_ebs_snapshots_create_failedSnapshotsCreateFailedMeasures the number of failed snapshot creation attempts, useful for detecting backup failures.
aws_ebs_snapshots_create_startedSnapshotsCreateStarted
aws_ebs_snapshots_delete_completedSnapshotsDeleteCompletedTracks the completion of snapshot deletion, useful for storage management.
aws_ebs_snapshots_delete_failedSnapshotsDeleteFailedMeasures the number of failed snapshot deletion attempts, helping track operational issues with snapshot management.
aws_ebs_snapshots_shared_completedSnapshotsSharedCompleted

AWS/EC2

Function: Virtual servers in the cloud for running applications

Scrape interval: 5 minutes

Includes: Out-of-the-box dashboard

MetricCloudwatch metricPurpose
aws_ec2_info
aws_ec2_cpuutilizationCPUUtilizationMeasures the percentage of allocated CPU capacity in use on the EC2 instance, useful for monitoring processing load.
aws_ec2_network_inNetworkInMeasures the amount of data received by the EC2 instance, useful for monitoring inbound traffic.
aws_ec2_network_outNetworkOutMonitors the amount of data sent from the EC2 instance, helping track outbound traffic.
aws_ec2_network_packets_inNetworkPacketsInTracks the number of network packets received, useful for understanding inbound network traffic patterns.
aws_ec2_network_packets_outNetworkPacketsOutMeasures the number of network packets sent, helping monitor outbound network activity.
aws_ec2_disk_read_bytesDiskReadBytesMonitors the number of bytes read from the instance’s storage, useful for tracking data retrieval performance.
aws_ec2_disk_write_bytesDiskWriteBytesMeasures the number of bytes written to the instance’s storage, helping to track storage write operations.
aws_ec2_disk_read_opsDiskReadOpsTracks the number of read operations on the instance’s storage, useful for monitoring storage performance.
aws_ec2_disk_write_opsDiskWriteOpsMeasures the number of write operations on the instance’s storage, helping track write activity.
aws_ec2_status_check_failedStatusCheckFailedTracks whether the EC2 instance has failed the instance or system status checks, useful for identifying potential issues.
aws_ec2_status_check_failed_instanceStatusCheckFailed_InstanceMonitors whether the instance has failed the instance-level status checks, helping to detect internal instance issues.
aws_ec2_status_check_failed_systemStatusCheckFailed_SystemTracks failures in the system-level status checks, useful for identifying infrastructure issues impacting the instance.
aws_ec2_ebsiobalance_percentEBSIOBalance%Measures the I/O balance of attached EBS volumes, helping to ensure that the instance has adequate I/O capacity.
aws_ec2_ebsbyte_balance_percentEBSByteBalance%Tracks the byte balance of attached EBS volumes, useful for managing storage throughput.
aws_ec2_ebsread_opsEBSReadOpsMonitors the number of read operations on attached EBS volumes, useful for tracking storage read performance.
aws_ec2_ebswrite_opsEBSWriteOpsTracks the number of write operations on attached EBS volumes, helping to monitor storage write activity.
aws_ec2_ebsread_bytesEBSReadBytesMeasures the number of bytes read from attached EBS volumes, useful for monitoring data retrieval performance.
aws_ec2_ebswrite_bytesEBSWriteBytesTracks the number of bytes written to attached EBS volumes, helping to monitor data write performance.
aws_ec2_cpucredit_balanceCPUCreditBalanceMonitors the remaining CPU credits for burstable instances, helping ensure that sufficient CPU credits are available for performance.
aws_ec2_cpucredit_usageCPUCreditUsageTracks the number of CPU credits used, useful for monitoring the consumption of burstable instances.
aws_ec2_cpusurplus_credit_balanceCPUSurplusCreditBalanceMeasures the surplus CPU credits available for burstable instances, useful for tracking instance performance capacity.
aws_ec2_cpusurplus_credits_chargedCPUSurplusCreditsChargedTracks the number of surplus CPU credits charged, helping manage costs associated with overutilization.
aws_ec2_dedicated_host_cpuutilizationDedicatedHostCPUUtilizationMeasures the CPU usage of dedicated EC2 hosts, helping to optimize host-level resource allocation.
aws_ec2_metadata_no_tokenMetadataNoTokenTracks the number of times the instance metadata service was successfully accessed without a token (IMDSv1), useful for finding workloads that have not yet moved to token-based access (IMDSv2).
aws_ec2_status_check_failed_attached_ebsStatusCheckFailed_AttachedEBSTracks status check failures related to attached EBS volumes, helping monitor storage health and performance.
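
The rows above pair each CloudWatch metric name with a Prometheus-style metric name, and the pairs follow a visible pattern: the namespace becomes a lowercase prefix, an underscore is inserted wherever a lowercase letter or digit is followed by an uppercase letter, `%` becomes `_percent`, and `.` becomes `_`. The following is a minimal Python sketch of that pattern, inferred only from the rows in these tables. It is not the exporter's actual implementation, it does not cover every case (for example, it does not append a statistic suffix or handle metric names that repeat the namespace), and the function name `to_prom_name` is hypothetical.

```python
import re

# A lowercase letter or digit followed by an uppercase letter marks a word boundary.
_BOUNDARY = re.compile(r"([a-z0-9])([A-Z])")


def to_prom_name(namespace: str, cloudwatch_metric: str) -> str:
    """Approximate the metric-name pattern seen in the tables (assumption, not the real exporter code)."""
    # "AWS/EC2" -> "aws_ec2"
    prefix = namespace.lower().replace("/", "_")
    # "NetworkIn" -> "Network_In"; runs of capitals such as "CPUU" are left unsplit
    name = _BOUNDARY.sub(r"\1_\2", cloudwatch_metric)
    # "EBSIOBalance%" -> "EBSIOBalance_percent"; "ClusterStatus.red" -> "Cluster_Status_red"
    name = name.replace("%", "_percent").replace(".", "_")
    return f"{prefix}_{name.lower()}"


if __name__ == "__main__":
    for metric in ("CPUUtilization", "NetworkIn", "EBSIOBalance%", "StatusCheckFailed_AttachedEBS"):
        print(metric, "->", to_prom_name("AWS/EC2", metric))
```

Under these assumptions, `CPUUtilization` maps to `aws_ec2_cpuutilization` and `EBSIOBalance%` maps to `aws_ec2_ebsiobalance_percent`, matching the table. Treat the tables themselves as the source of truth when a derived name and a listed name disagree.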

AWS/EC2Spot

Function: Uses spare EC2 capacity at reduced prices for workloads with flexible start times

Scrape interval: 5 minutes

MetricCloudwatch metricPurpose
aws_ec2spot_info
aws_ec2spot_available_instance_pools_countAvailableInstancePoolsCountMonitors the number of instance pools available for Spot requests, useful for tracking availability.
aws_ec2spot_bids_submitted_for_capacityBidsSubmittedForCapacityTracks the number of bids submitted for capacity in Spot instances, helping monitor the Spot instance bidding process.
aws_ec2spot_eligible_instance_pool_countEligibleInstancePoolCountMeasures the number of eligible instance pools for Spot requests, useful for understanding Spot market options.
aws_ec2spot_fulfilled_capacityFulfilledCapacityTracks the capacity fulfilled by Spot instances, helping monitor the success rate of Spot requests.
aws_ec2spot_max_percent_capacity_allocationMaxPercentCapacityAllocationMeasures the maximum percent of capacity allocated, useful for understanding the allocation of Spot instances.
aws_ec2spot_pending_capacityPendingCapacityTracks the pending Spot instance capacity, helping monitor Spot instance provisioning.
aws_ec2spot_percent_capacity_allocationPercentCapacityAllocationMonitors the percentage of capacity allocated to Spot instances, useful for managing resource allocation.
aws_ec2spot_target_capacityTargetCapacityTracks the target capacity for Spot instances, useful for monitoring Spot instance request goals.
aws_ec2spot_terminating_capacityTerminatingCapacityMeasures the capacity being terminated in Spot instances, helping track Spot instance lifecycle management.

AWS/ECR

Function: Managed container image registry for storing Docker images

Scrape interval: 5 minutes

MetricCloudwatch metricPurpose
aws_ecr_repository_pull_countRepositoryPullCountMonitors the number of pulls from an ECR repository, useful for tracking container image usage.

AWS/ECS

Function: Fully managed container orchestration service for running Docker containers

Scrape interval: 5 minutes

Includes: Out-of-the-box dashboard

MetricCloudwatch metricPurpose
aws_ecs_info
aws_ecs_cpureservationCPUReservationTracks the CPU reserved for ECS tasks, helping monitor resource reservation.
aws_ecs_cpuutilizationCPUUtilizationMonitors the CPU utilization of ECS tasks, useful for tracking resource usage.
aws_ecs_gpureservationGPUReservationTracks GPU reservation for ECS tasks, helping manage GPU resources.
aws_ecs_memory_reservationMemoryReservationMonitors the memory reserved for ECS tasks, helping track memory resource allocation.
aws_ecs_memory_utilizationMemoryUtilizationTracks the memory utilization of ECS tasks, useful for monitoring memory resource consumption.

AWS/EFS

Function: Scalable and fully managed file storage for use with EC2 instances

Scrape interval: 5 minutes

MetricCloudwatch metricPurpose
aws_efs_info
aws_efs_burst_credit_balanceBurstCreditBalanceMonitors the balance of burst credits for EFS, useful for managing performance bursts.
aws_efs_client_connectionsClientConnectionsTracks the number of client connections to EFS, useful for understanding file system usage.
aws_efs_data_read_iobytesDataReadIOBytesMeasures the amount of data read from EFS, helping track read performance.
aws_efs_data_write_iobytesDataWriteIOBytesTracks the amount of data written to EFS, helping monitor write performance.
aws_efs_metadata_iobytesMetadataIOBytesMonitors the metadata operations on EFS, useful for tracking metadata-related I/O.
aws_efs_metered_iobytesMeteredIOBytesTracks the amount of metered I/O operations, helping manage performance limits.
aws_efs_percent_iolimitPercentIOLimitMonitors the percentage of the I/O limit reached, useful for performance management.
aws_efs_permitted_throughputPermittedThroughputMeasures the allowed throughput for EFS, helping monitor throughput limits.
aws_efs_storage_bytesStorageBytesTracks the total storage used by EFS, useful for managing storage capacity.
aws_efs_total_iobytesTotalIOBytesMeasures the total I/O operations, helping monitor overall file system performance.

AWS/ELB

Function: Distributes traffic across multiple targets like EC2 instances and containers

Scrape interval: 5 minutes

MetricCloudwatch metricPurpose
aws_elb_info
aws_elb_backend_connection_errorsBackendConnectionErrorsTracks the number of connection errors between ELB and the backend instances, useful for identifying connection issues.
aws_elb_healthy_host_countHealthyHostCountMonitors the number of healthy backend instances, helping track instance health.
aws_elb_httpcode_backend_2_xxHTTPCode_Backend_2XXTracks successful responses (2XX) from the backend, useful for monitoring backend application performance.
aws_elb_httpcode_backend_3_xxHTTPCode_Backend_3XXMeasures redirection responses (3XX) from the backend, helping monitor routing performance.
aws_elb_httpcode_backend_4_xxHTTPCode_Backend_4XXTracks client errors (4XX) from the backend, useful for identifying issues with client requests.
aws_elb_httpcode_backend_5_xxHTTPCode_Backend_5XXMonitors server errors (5XX) from the backend, helping track server-side issues.
aws_elb_httpcode_elb_4_xxHTTPCode_ELB_4XXMeasures client errors (4XX) at the ELB level, useful for tracking errors handled by the ELB.
aws_elb_httpcode_elb_5_xxHTTPCode_ELB_5XXTracks server errors (5XX) at the ELB level, helping monitor ELB server-side performance.
aws_elb_latencyLatencyMonitors the latency of requests through the ELB, useful for tracking response times.
aws_elb_request_countRequestCountTracks the number of requests handled by the ELB, useful for monitoring traffic levels.
aws_elb_spillover_countSpilloverCountMeasures the number of requests that were rejected because the surge queue was full, helping track capacity limitations.
aws_elb_surge_queue_lengthSurgeQueueLengthTracks the length of the request queue, useful for monitoring traffic surges.
aws_elb_un_healthy_host_countUnHealthyHostCountMonitors the number of unhealthy backend instances, helping identify infrastructure issues.
aws_elb_estimated_albactive_connection_countEstimatedALBActiveConnectionCountTracks the number of active connections to the ALB, useful for monitoring load balancer usage.
aws_elb_estimated_albconsumed_lcusEstimatedALBConsumedLCUsMeasures the load balancer capacity units (LCUs) consumed by the ALB, helping monitor resource usage.
aws_elb_estimated_albnew_connection_countEstimatedALBNewConnectionCountTracks the number of new connections established with the ALB, useful for monitoring connection traffic.
aws_elb_estimated_processed_bytesEstimatedProcessedBytesMonitors the total bytes processed by the ALB, helping to track data flow through the load balancer.

AWS/ES

Function: Managed Elasticsearch service for real-time search and analytics

Scrape interval: 5 minutes

MetricCloudwatch metricPurpose
aws_es_info
aws_es_infoaws_es_infoProvides general information about the Elasticsearch service
aws_es_2xx2xxTracks successful requests to the Elasticsearch service
aws_es_3xx3xxTracks redirection requests to the Elasticsearch service
aws_es_4xx4xxTracks client error responses from the Elasticsearch service
aws_es_5xx5xxTracks server error responses from the Elasticsearch service
aws_es_adanomaly_detectors_index_status_redADAnomalyDetectorsIndexStatus.redIndicates if the anomaly detection index is in a red (critical) state
aws_es_adanomaly_detectors_index_status_index_existsADAnomalyDetectorsIndexStatusIndexExistsTracks whether the anomaly detection index exists or not
aws_es_adanomaly_results_index_status_redADAnomalyResultsIndexStatus.redIndicates if the anomaly results index is in a red (critical) state
aws_es_adanomaly_results_index_status_index_existsADAnomalyResultsIndexStatusIndexExistsTracks whether the anomaly results index exists or not
aws_es_adexecute_failure_countADExecuteFailureCountTracks the number of times anomaly detection execution has failed
aws_es_adexecute_request_countADExecuteRequestCountTracks the number of anomaly detection execution requests
aws_es_adhcexecute_failure_countADHCExecuteFailureCountTracks the number of high cardinality anomaly detection execution failures
aws_es_adhcexecute_request_countADHCExecuteRequestCountTracks the number of high cardinality anomaly detection execution requests
aws_es_admodels_checkpoint_index_status_redADModelsCheckpointIndexStatus.redIndicates if the model checkpoint index is in a red (critical) state
aws_es_admodels_checkpoint_index_status_index_existsADModelsCheckpointIndexStatusIndexExistsTracks whether the model checkpoint index exists
aws_es_adplugin_unhealthyADPluginUnhealthyIndicates if the anomaly detection plugin is in an unhealthy state
aws_es_alerting_degradedAlertingDegradedIndicates if the alerting feature is in a degraded state
aws_es_alerting_index_existsAlertingIndexExistsTracks whether the alerting index exists
aws_es_alerting_index_status_greenAlertingIndexStatus.greenIndicates if the alerting index is in a green (healthy) state
aws_es_alerting_index_status_redAlertingIndexStatus.redIndicates if the alerting index is in a red (critical) state
aws_es_alerting_index_status_yellowAlertingIndexStatus.yellowIndicates if the alerting index is in a yellow (warning) state
aws_es_alerting_nodes_not_on_scheduleAlertingNodesNotOnScheduleTracks the number of nodes not on schedule for alerting
aws_es_alerting_nodes_on_scheduleAlertingNodesOnScheduleTracks the number of nodes on schedule for alerting
aws_es_alerting_scheduled_job_enabledAlertingScheduledJobEnabledIndicates if alerting scheduled jobs are enabled
aws_es_asynchronous_search_cancelledAsynchronousSearchCancelledTracks the number of asynchronous search requests that were canceled
aws_es_asynchronous_search_completion_rateAsynchronousSearchCompletionRateTracks the rate of successful asynchronous search completions
aws_es_asynchronous_search_failure_rateAsynchronousSearchFailureRateTracks the rate of failed asynchronous search requests
aws_es_asynchronous_search_initialized_rateAsynchronousSearchInitializedRateTracks the rate of initialized asynchronous search requests
aws_es_asynchronous_search_max_running_timeAsynchronousSearchMaxRunningTimeTracks the maximum time taken by asynchronous search requests
aws_es_asynchronous_search_persist_failed_rateAsynchronousSearchPersistFailedRateTracks the rate of failed attempts to persist asynchronous search results
aws_es_asynchronous_search_persist_rateAsynchronousSearchPersistRateTracks the rate of successful attempts to persist asynchronous search results
aws_es_asynchronous_search_rejectedAsynchronousSearchRejectedTracks the number of asynchronous search requests that were rejected
aws_es_asynchronous_search_running_currentAsynchronousSearchRunningCurrentTracks the number of currently running asynchronous search requests
aws_es_asynchronous_search_store_healthAsynchronousSearchStoreHealthTracks the health of the store for asynchronous search
aws_es_asynchronous_search_store_sizeAsynchronousSearchStoreSizeTracks the size of the asynchronous search store
aws_es_asynchronous_search_stored_response_countAsynchronousSearchStoredResponseCountTracks the number of responses stored for asynchronous search
aws_es_asynchronous_search_submission_rateAsynchronousSearchSubmissionRateTracks the rate of submitted asynchronous search requests
aws_es_auto_follow_leader_call_failureAutoFollowLeaderCallFailureTracks the number of failures when trying to call the leader for cross-cluster replication
aws_es_auto_follow_num_failed_start_replicationAutoFollowNumFailedStartReplicationTracks the number of failed attempts to start cross-cluster replication
aws_es_auto_follow_num_success_start_replicationAutoFollowNumSuccessStartReplicationTracks the number of successful attempts to start cross-cluster replication
aws_es_auto_tune_changes_history_heap_sizeAutoTuneChangesHistoryHeapSizeTracks the heap size usage history for auto-tune changes
aws_es_auto_tune_changes_history_jvmyoung_gen_argsAutoTuneChangesHistoryJVMYoungGenArgsTracks JVM young generation arguments for auto-tune changes
aws_es_auto_tune_failedAutoTuneFailedTracks the number of failed auto-tune attempts
aws_es_auto_tune_succeededAutoTuneSucceededTracks the number of successful auto-tune attempts
aws_es_auto_tune_valueAutoTuneValueTracks the value of auto-tune changes
aws_es_automated_snapshot_failureAutomatedSnapshotFailureTracks the number of failures in automated snapshots
aws_es_avg_point_in_time_alive_timeAvgPointInTimeAliveTimeTracks the average lifetime of point-in-time snapshots
aws_es_burst_balanceBurstBalanceTracks the burst balance for the service
aws_es_cpucredit_balanceCPUCreditBalanceTracks the balance of CPU credits for the nodes
aws_es_cpuutilizationCPUUtilizationTracks the CPU utilization of the nodes
aws_es_cluster_index_writes_blockedClusterIndexWritesBlockedTracks whether index writes are blocked at the cluster level
aws_es_cluster_status_greenClusterStatus.greenIndicates if the cluster is in a green (healthy) state
aws_es_cluster_status_redClusterStatus.redIndicates if the cluster is in a red (critical) state
aws_es_cluster_status_yellowClusterStatus.yellowIndicates if the cluster is in a yellow (warning) state
aws_es_cluster_used_spaceClusterUsedSpaceTracks the amount of used storage space in the cluster
aws_es_cold_storage_space_utilizationColdStorageSpaceUtilizationTracks the storage utilization of cold data
aws_es_cold_to_warm_migration_failure_countColdToWarmMigrationFailureCountTracks the number of failures during migration from cold to warm storage
aws_es_cold_to_warm_migration_latencyColdToWarmMigrationLatencyTracks the latency of migration from cold to warm storage
aws_es_cold_to_warm_migration_queue_sizeColdToWarmMigrationQueueSizeTracks the queue size for migration from cold to warm storage
aws_es_cold_to_warm_migration_success_countColdToWarmMigrationSuccessCountTracks the number of successful migrations from cold to warm storage
aws_es_coordinating_write_rejectedCoordinatingWriteRejectedTracks the number of rejected coordinating node write requests
aws_es_cross_cluster_inbound_replication_requestsCrossClusterInboundReplicationRequestsTracks the number of inbound replication requests for cross-cluster replication
aws_es_cross_cluster_inbound_requestsCrossClusterInboundRequestsTracks the number of inbound requests for cross-cluster replication
aws_es_cross_cluster_outbound_connectionsCrossClusterOutboundConnectionsTracks the number of outbound connections for cross-cluster replication
aws_es_cross_cluster_outbound_replication_requestsCrossClusterOutboundReplicationRequestsTracks the number of outbound replication requests for cross-cluster replication
aws_es_cross_cluster_outbound_requestsCrossClusterOutboundRequestsTracks the number of outbound requests for cross-cluster replication
aws_es_current_point_in_timeCurrentPointInTimeTracks the current point in time (snapshot) available in Elasticsearch
aws_es_data_nodesDataNodesTracks the number of data nodes in the Elasticsearch cluster
aws_es_data_nodes_shards_activeDataNodesShards.activeTracks the number of active shards on data nodes
aws_es_data_nodes_shards_initializingDataNodesShards.initializingTracks the number of shards that are initializing on data nodes
aws_es_data_nodes_shards_relocatingDataNodesShards.relocatingTracks the number of shards that are relocating on data nodes
aws_es_data_nodes_shards_unassignedDataNodesShards.unassignedTracks the number of unassigned shards on data nodes
aws_es_deleted_documentsDeletedDocumentsTracks the number of deleted documents from the Elasticsearch cluster
aws_es_disk_queue_depthDiskQueueDepthTracks the depth of the disk queue
aws_es_reporting_failed_request_sys_err_countESReportingFailedRequestSysErrCountTracks the number of failed reporting requests due to system errors
aws_es_reporting_failed_request_user_err_countESReportingFailedRequestUserErrCountTracks the number of failed reporting requests due to user errors
aws_es_reporting_request_countESReportingRequestCountTracks the number of reporting requests submitted to Elasticsearch
aws_es_reporting_success_countESReportingSuccessCountTracks the number of successful reporting requests
aws_es_elasticsearch_requestsElasticsearchRequestsTracks the number of requests to Elasticsearch
aws_es_follower_check_pointFollowerCheckPointTracks the checkpoint of a follower node in cross-cluster replication
aws_es_free_storage_spaceFreeStorageSpaceTracks the available storage space in the Elasticsearch cluster
aws_es_has_active_point_in_timeHasActivePointInTimeIndicates whether there is an active point-in-time snapshot
aws_es_has_used_point_in_timeHasUsedPointInTimeIndicates whether the point-in-time snapshot has been used
aws_es_hot_storage_space_utilizationHotStorageSpaceUtilizationTracks the storage utilization of hot data
aws_es_hot_to_warm_migration_failure_countHotToWarmMigrationFailureCountTracks the number of failures during migration from hot to warm storage
aws_es_hot_to_warm_migration_force_merge_latencyHotToWarmMigrationForceMergeLatencyTracks the latency of force merging during migration from hot to warm storage
aws_es_hot_to_warm_migration_processing_latencyHotToWarmMigrationProcessingLatencyTracks the latency of processing migration from hot to warm storage
aws_es_hot_to_warm_migration_queue_sizeHotToWarmMigrationQueueSizeTracks the queue size for migration from hot to warm storage
aws_es_hot_to_warm_migration_snapshot_latencyHotToWarmMigrationSnapshotLatencyTracks the latency of snapshotting during migration from hot to warm storage
aws_es_hot_to_warm_migration_success_countHotToWarmMigrationSuccessCountTracks the number of successful migrations from hot to warm storage
aws_es_hot_to_warm_migration_success_latencyHotToWarmMigrationSuccessLatencyTracks the latency of successful migrations from hot to warm storage
aws_es_indexing_latencyIndexingLatencyTracks the latency of indexing documents in the Elasticsearch cluster
aws_es_indexing_rateIndexingRateTracks the rate of indexing documents in the Elasticsearch cluster
aws_es_invalid_host_header_requestsInvalidHostHeaderRequestsTracks the number of requests with invalid host headers
aws_es_iops_throttleIopsThrottleTracks throttling of input/output operations
aws_es_jvmgcold_collection_countJVMGCOldCollectionCountTracks the number of garbage collection events in the old generation of JVM
aws_es_jvmgcold_collection_timeJVMGCOldCollectionTimeTracks the time spent in garbage collection in the old generation of JVM
aws_es_jvmgcyoung_collection_countJVMGCYoungCollectionCountTracks the number of garbage collection events in the young generation of JVM
aws_es_jvmgcyoung_collection_timeJVMGCYoungCollectionTimeTracks the time spent in garbage collection in the young generation of JVM
aws_es_jvmmemory_pressureJVMMemoryPressureTracks memory pressure on the JVM used by Elasticsearch
aws_es_kmskey_errorKMSKeyErrorTracks the number of errors related to KMS keys used by the Elasticsearch cluster
aws_es_kmskey_inaccessibleKMSKeyInaccessibleTracks the number of times a KMS key is inaccessible for the Elasticsearch cluster
aws_es_knncache_capacity_reachedKNNCacheCapacityReachedTracks when the KNN cache capacity is reached
aws_es_knncircuit_breaker_triggeredKNNCircuitBreakerTriggeredTracks when the KNN circuit breaker is triggered
aws_es_knneviction_countKNNEvictionCountTracks the number of evictions from the KNN cache
aws_es_knngraph_index_errorsKNNGraphIndexErrorsTracks errors during KNN graph indexing
aws_es_knngraph_index_requestsKNNGraphIndexRequestsTracks the number of KNN graph index requests
aws_es_knngraph_memory_usageKNNGraphMemoryUsageTracks memory usage by the KNN graph
aws_es_knngraph_query_errorsKNNGraphQueryErrorsTracks errors during KNN graph queries
aws_es_knngraph_query_requestsKNNGraphQueryRequestsTracks the number of KNN graph query requests
aws_es_knnhit_countKNNHitCountTracks the number of hits returned by KNN queries
aws_es_knnload_exception_countKNNLoadExceptionCountTracks the number of exceptions during KNN data loading
aws_es_knnload_success_countKNNLoadSuccessCountTracks the number of successful KNN data load operations
aws_es_knnmiss_countKNNMissCountTracks the number of KNN cache misses
aws_es_knnquery_requestsKNNQueryRequestsTracks the number of KNN queries
aws_es_knnscript_compilation_errorsKNNScriptCompilationErrorsTracks the number of errors during KNN script compilation
aws_es_knnscript_compilationsKNNScriptCompilationsTracks the number of KNN script compilations
aws_es_knnscript_query_errorsKNNScriptQueryErrorsTracks errors during KNN script queries
aws_es_knnscript_query_requestsKNNScriptQueryRequestsTracks the number of KNN script queries
aws_es_knntotal_load_timeKNNTotalLoadTimeTracks the total load time for KNN operations
aws_es_kibana_concurrent_connectionsKibanaConcurrentConnectionsTracks the number of concurrent Kibana connections
aws_es_kibana_healthy_nodesKibanaHealthyNodesTracks the number of healthy Kibana nodes
aws_es_kibana_heap_totalKibanaHeapTotalTracks the total heap size of Kibana
aws_es_kibana_heap_usedKibanaHeapUsedTracks the heap size used by Kibana
aws_es_kibana_heap_utilizationKibanaHeapUtilizationTracks the heap utilization of Kibana
aws_es_kibana_os1_minute_loadKibanaOS1MinuteLoadTracks the 1-minute load average of the Kibana node’s operating system
aws_es_kibana_reporting_failed_request_sys_err_countKibanaReportingFailedRequestSysErrCountTracks the number of failed Kibana reporting requests due to system errors
aws_es_kibana_reporting_failed_request_user_err_countKibanaReportingFailedRequestUserErrCountTracks the number of failed Kibana reporting requests due to user errors
aws_es_kibana_reporting_request_countKibanaReportingRequestCountTracks the number of Kibana reporting requests
aws_es_kibana_reporting_success_countKibanaReportingSuccessCountTracks the number of successful Kibana reporting requests
aws_es_kibana_request_totalKibanaRequestTotalTracks the total number of requests sent to Kibana
aws_es_kibana_response_times_max_in_millisKibanaResponseTimesMaxInMillisTracks the maximum response time of Kibana requests in milliseconds
aws_es_ltrfeature_memory_usage_in_bytesLTRFeatureMemoryUsageInBytesTracks memory usage by LTR features in bytes
aws_es_ltrfeatureset_memory_usage_in_bytesLTRFeaturesetMemoryUsageInBytesTracks memory usage by LTR feature sets in bytes
aws_es_ltrmemory_usageLTRMemoryUsageTracks overall memory usage by LTR features
aws_es_ltrmodel_memory_usage_in_bytesLTRModelMemoryUsageInBytesTracks memory usage by LTR models in bytes
aws_es_ltrrequest_error_countLTRRequestErrorCountTracks the number of errors in LTR requests
aws_es_ltrrequest_total_countLTRRequestTotalCountTracks the total number of LTR requests
aws_es_ltrstatus_redLTRStatus.redIndicates if the LTR status is in a red (critical) state
aws_es_leader_check_pointLeaderCheckPointTracks the checkpoint of the leader node in cross-cluster replication
aws_es_master_cpucredit_balanceMasterCPUCreditBalanceTracks the balance of CPU credits for the master node
aws_es_master_cpuutilizationMasterCPUUtilizationTracks the CPU utilization of the master node
aws_es_master_free_storage_spaceMasterFreeStorageSpaceTracks the free storage space available on the master node
aws_es_master_jvmmemory_pressureMasterJVMMemoryPressureTracks JVM memory pressure on the master node
aws_es_master_old_gen_jvmmemory_pressureMasterOldGenJVMMemoryPressureTracks old generation JVM memory pressure on the master node
aws_es_master_reachable_from_nodeMasterReachableFromNodeTracks whether the master node is reachable from the data nodes
aws_es_master_sys_memory_utilizationMasterSysMemoryUtilizationTracks system memory utilization of the master node
aws_es_max_provisioned_throughputMaxProvisionedThroughputTracks the maximum provisioned throughput for Elasticsearch
aws_es_nodesNodesTracks the number of nodes in the Elasticsearch cluster
aws_es_old_gen_jvmmemory_pressureOldGenJVMMemoryPressureTracks old generation JVM memory pressure on the nodes
aws_es_open_search_dashboards_concurrent_connectionsOpenSearchDashboardsConcurrentConnectionsTracks the number of concurrent connections to OpenSearch Dashboards
aws_es_open_search_dashboards_healthy_nodeOpenSearchDashboardsHealthyNodeIndicates whether an individual OpenSearch Dashboards node is healthy
aws_es_open_search_dashboards_healthy_nodesOpenSearchDashboardsHealthyNodesTracks the number of healthy OpenSearch Dashboards nodes
aws_es_open_search_dashboards_heap_totalOpenSearchDashboardsHeapTotalTracks the total heap size of OpenSearch Dashboards
aws_es_open_search_dashboards_heap_usedOpenSearchDashboardsHeapUsedTracks the heap size used by OpenSearch Dashboards
aws_es_open_search_dashboards_heap_utilizationOpenSearchDashboardsHeapUtilizationTracks the heap utilization of OpenSearch Dashboards
aws_es_open_search_dashboards_os1_minute_loadOpenSearchDashboardsOS1MinuteLoadTracks the 1-minute load average of the OpenSearch Dashboards node’s operating system
aws_es_open_search_dashboards_request_totalOpenSearchDashboardsRequestTotalTracks the total number of requests sent to OpenSearch Dashboards
aws_es_open_search_dashboards_response_times_max_in_millisOpenSearchDashboardsResponseTimesMaxInMillisTracks the maximum response time of OpenSearch Dashboards requests in milliseconds
aws_es_open_search_requestsOpenSearchRequestsTracks the number of requests to OpenSearch
aws_es_opensearch_dashboards_reporting_failed_request_sys_err_countOpensearchDashboardsReportingFailedRequestSysErrCountTracks the number of failed OpenSearch Dashboards reporting requests due to system errors
aws_es_opensearch_dashboards_reporting_failed_request_user_err_countOpensearchDashboardsReportingFailedRequestUserErrCountTracks the number of failed OpenSearch Dashboards reporting requests due to user errors
aws_es_opensearch_dashboards_reporting_request_countOpensearchDashboardsReportingRequestCountTracks the number of OpenSearch Dashboards reporting requests
aws_es_opensearch_dashboards_reporting_success_countOpensearchDashboardsReportingSuccessCountTracks the number of successful OpenSearch Dashboards reporting requests
aws_es_pplfailed_request_count_by_cus_errPPLFailedRequestCountByCusErrTracks the number of PPL failed requests due to customer errors
aws_es_pplfailed_request_count_by_sys_errPPLFailedRequestCountBySysErrTracks the number of PPL failed requests due to system errors
aws_es_pplrequest_countPPLRequestCountTracks the total number of PPL requests
aws_es_primary_write_rejectedPrimaryWriteRejectedTracks the number of rejected primary write requests
aws_es_read_iopsReadIOPSTracks input/output operations per second for reads
aws_es_read_iopsmicro_burstingReadIOPSMicroBurstingTracks micro-bursting of input/output operations for reads
aws_es_read_latencyReadLatencyTracks the latency of read operations in the Elasticsearch cluster
aws_es_read_throughputReadThroughputTracks the throughput of read operations
aws_es_read_throughput_micro_burstingReadThroughputMicroBurstingTracks micro-bursting of read throughput
aws_es_remote_storage_used_spaceRemoteStorageUsedSpaceTracks the amount of used space in remote storage
aws_es_remote_storage_write_rejectedRemoteStorageWriteRejectedTracks the number of rejected write operations in remote storage
aws_es_replica_write_rejectedReplicaWriteRejectedTracks the number of rejected replica write requests
aws_es_replication_num_bootstrapping_indicesReplicationNumBootstrappingIndicesTracks the number of indices in the bootstrapping state for replication
aws_es_replication_num_failed_indicesReplicationNumFailedIndicesTracks the number of failed replication indices
aws_es_replication_num_paused_indicesReplicationNumPausedIndicesTracks the number of paused replication indices
aws_es_replication_num_syncing_indicesReplicationNumSyncingIndicesTracks the number of replication indices currently syncing
aws_es_replication_rateReplicationRateTracks the rate of replication in Elasticsearch
aws_es_sqldefault_cursor_request_countSQLDefaultCursorRequestCountTracks the number of default SQL cursor requests
aws_es_sqlfailed_request_count_by_cus_errSQLFailedRequestCountByCusErrTracks the number of SQL failed requests due to customer errors
aws_es_sqlfailed_request_count_by_sys_errSQLFailedRequestCountBySysErrTracks the number of SQL failed requests due to system errors
aws_es_sqlrequest_countSQLRequestCountTracks the total number of SQL requests
aws_es_sqlunhealthySQLUnhealthyTracks whether the SQL plugin is in an unhealthy state
aws_es_search_latencySearchLatencyTracks the latency of search operations in the Elasticsearch cluster
aws_es_search_rateSearchRateTracks the rate of search operations
aws_es_search_shard_task_cancelledSearchShardTaskCancelledTracks the number of search shard tasks that were canceled
aws_es_search_task_cancelledSearchTaskCancelledTracks the number of canceled search tasks
aws_es_searchable_documentsSearchableDocumentsTracks the number of searchable documents
aws_es_segment_countSegmentCountTracks the number of segments in the Elasticsearch cluster
aws_es_shards_activeShards.activeTracks the number of active shards
aws_es_shards_active_primaryShards.activePrimaryTracks the number of active primary shards
aws_es_shards_delayed_unassignedShards.delayedUnassignedTracks the number of delayed unassigned shards
aws_es_shards_initializingShards.initializingTracks the number of initializing shards
aws_es_shards_relocatingShards.relocatingTracks the number of relocating shards
aws_es_shards_unassignedShards.unassignedTracks the number of unassigned shards
aws_es_sys_memory_utilizationSysMemoryUtilizationTracks system memory utilization
aws_es_threadpool_bulk_queueThreadpoolBulkQueueTracks the size of the bulk thread pool queue
aws_es_threadpool_bulk_rejectedThreadpoolBulkRejectedTracks the number of bulk thread pool tasks that were rejected
aws_es_threadpool_bulk_threadsThreadpoolBulkThreadsTracks the number of active threads in the bulk thread pool
aws_es_threadpool_force_merge_queueThreadpoolForce_mergeQueueTracks the size of the force merge thread pool queue
aws_es_threadpool_force_merge_rejectedThreadpoolForce_mergeRejectedTracks the number of force merge thread pool tasks that were rejected
aws_es_threadpool_force_merge_threadsThreadpoolForce_mergeThreadsTracks the number of active threads in the force merge thread pool
aws_es_threadpool_index_queueThreadpoolIndexQueueTracks the size of the index thread pool queue
aws_es_threadpool_index_rejectedThreadpoolIndexRejectedTracks the number of index thread pool tasks that were rejected
aws_es_threadpool_index_threadsThreadpoolIndexThreadsTracks the number of active threads in the index thread pool
aws_es_threadpool_search_queueThreadpoolSearchQueueTracks the size of the search thread pool queue
aws_es_threadpool_search_rejectedThreadpoolSearchRejectedTracks the number of search thread pool tasks that were rejected
aws_es_threadpool_search_threadsThreadpoolSearchThreadsTracks the number of active threads in the search thread pool
aws_es_threadpool_write_queueThreadpoolWriteQueueTracks the size of the write thread pool queue
aws_es_threadpool_write_rejectedThreadpoolWriteRejectedTracks the number of write thread pool tasks that were rejected
aws_es_threadpool_write_threadsThreadpoolWriteThreadsTracks the number of active threads in the write thread pool
aws_es_threadpoolsql_worker_queueThreadpoolsql-workerQueueTracks the size of the SQL worker thread pool queue
aws_es_threadpoolsql_worker_rejectedThreadpoolsql-workerRejectedTracks the number of SQL worker thread pool tasks that were rejected
aws_es_threadpoolsql_worker_threadsThreadpoolsql-workerThreadsTracks the number of active threads in the SQL worker thread pool
aws_es_throughput_throttleThroughputThrottleTracks throttling of throughput in the Elasticsearch cluster
aws_es_total_point_in_timeTotalPointInTimeTracks the total number of point-in-time snapshots
aws_es_warm_cpuutilizationWarmCPUUtilizationTracks the CPU utilization of warm data nodes
aws_es_warm_free_storage_spaceWarmFreeStorageSpaceTracks the available storage space in warm data nodes
aws_es_warm_jvmgcold_collection_countWarmJVMGCOldCollectionCountTracks the number of garbage collection events in the old generation of JVM on warm data nodes
aws_es_warm_jvmgcyoung_collection_countWarmJVMGCYoungCollectionCountTracks the number of garbage collection events in the young generation of JVM on warm data nodes
aws_es_warm_jvmgcyoung_collection_timeWarmJVMGCYoungCollectionTimeTracks the time spent in garbage collection in the young generation of JVM on warm data nodes
aws_es_warm_jvmmemory_pressureWarmJVMMemoryPressureTracks memory pressure on warm data nodes
aws_es_warm_old_gen_jvmmemory_pressureWarmOldGenJVMMemoryPressureTracks old generation JVM memory pressure on warm data nodes
aws_es_warm_search_latencyWarmSearchLatencyTracks the latency of search operations on warm data nodes
aws_es_warm_search_rateWarmSearchRateTracks the rate of search operations on warm data nodes
aws_es_warm_searchable_documentsWarmSearchableDocumentsTracks the number of searchable documents on warm data nodes
aws_es_warm_storage_space_utilizationWarmStorageSpaceUtilizationTracks storage space utilization on warm data nodes
aws_es_warm_sys_memory_utilizationWarmSysMemoryUtilizationTracks system memory utilization on warm data nodes
aws_es_warm_threadpool_search_queueWarmThreadpoolSearchQueueTracks the size of the search thread pool queue on warm data nodes
aws_es_warm_threadpool_search_rejectedWarmThreadpoolSearchRejectedTracks the number of search thread pool tasks that were rejected on warm data nodes
aws_es_warm_threadpool_search_threadsWarmThreadpoolSearchThreadsTracks the number of active threads in the search thread pool on warm data nodes
aws_es_warm_to_cold_migration_failure_countWarmToColdMigrationFailureCountTracks the number of failures during migration from warm to cold storage
aws_es_warm_to_cold_migration_latencyWarmToColdMigrationLatencyTracks the latency of migration from warm to cold storage
aws_es_warm_to_cold_migration_queue_sizeWarmToColdMigrationQueueSizeTracks the queue size for migration from warm to cold storage
aws_es_warm_to_cold_migration_success_countWarmToColdMigrationSuccessCountTracks the number of successful migrations from warm to cold storage
aws_es_warm_to_hot_migration_queue_sizeWarmToHotMigrationQueueSizeTracks the queue size for migration from warm to hot storage
aws_es_write_iops WriteIOPSTracks input/output operations per second for writes
aws_es_write_iopsmicro_burstingWriteIOPSMicroBurstingTracks micro-bursting of input/output operations for writes
aws_es_write_latencyWriteLatencyTracks the latency of write operations in the Elasticsearch cluster
aws_es_write_throughputWriteThroughputTracks the throughput of write operations
aws_es_write_throughput_micro_burstingWriteThroughputMicroBurstingTracks micro-bursting of write throughput

AWS/ElastiCache

Function: Managed Redis and Memcached for real-time caching

Scrape interval: 5 minutes

MetricCloudwatch metricPurpose
aws_elasticache_info
aws_elasticache_active_defrag_hitsActiveDefragHitsTracks the number of active defragmentation hits in ElastiCache
aws_elasticache_authentication_failuresAuthenticationFailuresMonitors the number of failed authentication attempts
aws_elasticache_bytes_read_from_diskBytesReadFromDiskMeasures the number of bytes read from disk in the ElastiCache cluster
aws_elasticache_bytes_read_into_memcachedBytesReadIntoMemcachedTracks the number of bytes read into Memcached
aws_elasticache_bytes_used_for_cacheBytesUsedForCacheMonitors the total amount of memory used for cache
aws_elasticache_bytes_used_for_cache_itemsBytesUsedForCacheItemsMeasures the memory used by items in cache
aws_elasticache_bytes_used_for_hashBytesUsedForHashTracks memory used for hash tables in the cache
aws_elasticache_bytes_used_for_memory_dbBytesUsedForMemoryDBMonitors memory usage for MemoryDB in ElastiCache
aws_elasticache_bytes_written_out_from_memcachedBytesWrittenOutFromMemcachedTracks the number of bytes written out from Memcached
aws_elasticache_bytes_written_to_diskBytesWrittenToDiskMeasures the number of bytes written to disk in the ElastiCache cluster
aws_elasticache_cpucredit_balanceCPUCreditBalanceTracks the balance of CPU credits for burstable instance types
aws_elasticache_cpucredit_usageCPUCreditUsageMonitors CPU credit usage for burstable instance types
aws_elasticache_cpuutilizationCPUUtilizationMeasures the CPU utilization of the ElastiCache instance
aws_elasticache_cache_hit_rateCacheHitRateTracks the cache hit rate, indicating how often requested data is found in cache
aws_elasticache_cache_hitsCacheHitsMeasures the total number of cache hits
aws_elasticache_cache_missesCacheMissesTracks the number of cache misses, when requested data is not found in cache
aws_elasticache_cas_badvalCasBadvalMonitors the number of CAS operations that failed due to bad values
aws_elasticache_cas_hitsCasHitsTracks the number of successful CAS operations
aws_elasticache_cas_missesCasMissesMeasures the number of CAS operations that failed due to missing data
aws_elasticache_channel_authorization_failuresChannelAuthorizationFailuresTracks the number of channel authorization failures
aws_elasticache_cluster_based_cmdsClusterBasedCmdsMonitors the number of cluster-based commands executed
aws_elasticache_cluster_based_cmds_latencyClusterBasedCmdsLatencyTracks the latency of cluster-based commands
aws_elasticache_cmd_config_getCmdConfigGetMeasures the number of configuration GET commands executed
aws_elasticache_cmd_config_setCmdConfigSetTracks the number of configuration SET commands executed
aws_elasticache_cmd_flushCmdFlushMonitors the number of flush commands executed in the ElastiCache cluster
aws_elasticache_cmd_getCmdGetTracks the number of GET commands executed in the cache
aws_elasticache_cmd_setCmdSetMeasures the number of SET commands executed in the cache
aws_elasticache_cmd_touchCmdTouchTracks the number of touch commands executed in the cache
aws_elasticache_command_authorization_failuresCommandAuthorizationFailuresMonitors the number of command authorization failures in the ElastiCache cluster
aws_elasticache_curr_configCurrConfigTracks the current configuration state of the ElastiCache instance
aws_elasticache_curr_connectionsCurrConnectionsMeasures the current number of open connections to the ElastiCache instance
aws_elasticache_curr_itemsCurrItemsTracks the current number of items in the cache
aws_elasticache_curr_volatile_itemsCurrVolatileItemsMonitors the number of volatile items in the cache
aws_elasticache_db0_average_ttlDB0AverageTTLMeasures the average time-to-live (TTL) of items in the cache
**aws_elasticache_database_capacity_usage_counted_for_evict_percentageDatabaseCapacityUsageCountedForEvictPercentage**Tracks the percentage of database capacity usage considered for eviction
aws_elasticache_database_capacity_usage_percentageDatabaseCapacityUsagePercentageMonitors the overall percentage of database capacity usage
**aws_elasticache_database_memory_usage_counted_for_evict_percentageDatabaseMemoryUsageCountedForEvictPercentage**Tracks the percentage of database memory usage considered for eviction
aws_elasticache_database_memory_usage_percentageDatabaseMemoryUsagePercentageMeasures the overall memory usage percentage in the ElastiCache cluster
aws_elasticache_decr_hitsDecrHitsMonitors the number of successful DECR (decrement) operations
aws_elasticache_decr_missesDecrMissesTracks the number of DECR operations where the requested key was not found
aws_elasticache_delete_hitsDeleteHitsMeasures the number of successful DELETE operations
aws_elasticache_delete_missesDeleteMissesTracks the number of DELETE operations where the requested key was not found
aws_elasticache_engine_cpuutilizationEngineCPUUtilizationMonitors the CPU utilization of the ElastiCache engine
aws_elasticache_eval_based_cmdsEvalBasedCmdsTracks the number of EVAL-based commands executed in the cache
aws_elasticache_eval_based_cmds_latencyEvalBasedCmdsLatencyMeasures the latency of EVAL-based commands in the cache
aws_elasticache_evicted_unfetchedEvictedUnfetchedMonitors the number of items evicted before being fetched
aws_elasticache_evictionsEvictionsTracks the total number of evictions in the cache
aws_elasticache_expired_unfetchedExpiredUnfetchedMeasures the number of items that expired before being fetched
aws_elasticache_freeable_memoryFreeableMemoryTracks the amount of free memory available in the ElastiCache cluster
aws_elasticache_geo_spatial_based_cmdsGeoSpatialBasedCmdsMonitors the number of geospatial commands executed
aws_elasticache_geo_spatial_based_cmds_latencyGeoSpatialBasedCmdsLatencyMeasures the latency of geospatial commands
aws_elasticache_get_hitsGetHitsTracks the number of successful GET operations in the cache
aws_elasticache_get_missesGetMissesMeasures the number of GET operations where the requested key was not found
aws_elasticache_get_type_cmdsGetTypeCmdsMonitors the number of GET-type commands executed
aws_elasticache_get_type_cmds_latencyGetTypeCmdsLatencyMeasures the latency of GET-type commands executed
aws_elasticache_global_datastore_replication_lagGlobalDatastoreReplicationLag
aws_elasticache_hash_based_cmdsHashBasedCmds
aws_elasticache_hash_based_cmds_latencyHashBasedCmdsLatency
aws_elasticache_hyper_log_log_based_cmdsHyperLogLogBasedCmds
aws_elasticache_hyper_log_log_based_cmds_latencyHyperLogLogBasedCmdsLatency
aws_elasticache_iam_authentication_expirationsIamAuthenticationExpirations
aws_elasticache_iam_authentication_throttlingIamAuthenticationThrottling
aws_elasticache_incr_hitsIncrHits
aws_elasticache_incr_missesIncrMisses
aws_elasticache_is_masterIsMaster
aws_elasticache_is_primaryIsPrimary
aws_elasticache_json_based_cmdsJsonBasedCmds
aws_elasticache_json_based_cmds_latencyJsonBasedCmdsLatency
aws_elasticache_json_based_get_cmdsJsonBasedGetCmds
aws_elasticache_key_authorization_failuresKeyAuthorizationFailures
aws_elasticache_key_based_cmdsKeyBasedCmds
aws_elasticache_key_based_cmds_latencyKeyBasedCmdsLatency
aws_elasticache_keys_trackedKeysTracked
aws_elasticache_keyspace_hitsKeyspaceHits
aws_elasticache_keyspace_missesKeyspaceMisses
aws_elasticache_list_based_cmdsListBasedCmds
aws_elasticache_list_based_cmds_latencyListBasedCmdsLatency
aws_elasticache_master_link_health_statusMasterLinkHealthStatus
aws_elasticache_max_replication_throughputMaxReplicationThroughput
aws_elasticache_memory_fragmentation_ratioMemoryFragmentationRatio
aws_elasticache_network_bandwidth_in_allowance_exceededNetworkBandwidthInAllowanceExceeded
aws_elasticache_network_bandwidth_out_allowance_exceededNetworkBandwidthOutAllowanceExceeded
aws_elasticache_network_bytes_inNetworkBytesIn
aws_elasticache_network_bytes_outNetworkBytesOut
aws_elasticache_network_conntrack_allowance_exceededNetworkConntrackAllowanceExceeded
aws_elasticache_network_link_local_allowance_exceededNetworkLinkLocalAllowanceExceeded
aws_elasticache_network_max_bytes_inNetworkMaxBytesIn
aws_elasticache_network_max_bytes_outNetworkMaxBytesOut
aws_elasticache_network_max_packets_inNetworkMaxPacketsIn
aws_elasticache_network_max_packets_outNetworkMaxPacketsOut
aws_elasticache_network_packets_inNetworkPacketsIn
aws_elasticache_network_packets_outNetworkPacketsOut
aws_elasticache_network_packets_per_second_allowance_exceededNetworkPacketsPerSecondAllowanceExceeded
aws_elasticache_new_connectionsNewConnections
aws_elasticache_new_itemsNewItems
aws_elasticache_num_items_read_from_diskNumItemsReadFromDisk
aws_elasticache_num_items_written_to_diskNumItemsWrittenToDisk
aws_elasticache_primary_link_health_statusPrimaryLinkHealthStatus
aws_elasticache_pub_sub_based_cmdsPubSubBasedCmds
aws_elasticache_pub_sub_based_cmds_latencyPubSubBasedCmdsLatency
aws_elasticache_reclaimedReclaimed
aws_elasticache_replication_bytesReplicationBytes
aws_elasticache_replication_delayed_write_commandsReplicationDelayedWriteCommands
aws_elasticache_replication_lagReplicationLag
aws_elasticache_save_in_progressSaveInProgress
aws_elasticache_search_based_cmdsSearchBasedCmds
aws_elasticache_search_based_get_cmdsSearchBasedGetCmds
aws_elasticache_search_based_set_cmdsSearchBasedSetCmds
aws_elasticache_search_number_of_indexed_keysSearchNumberOfIndexedKeys
aws_elasticache_search_number_of_indexesSearchNumberOfIndexes
aws_elasticache_search_total_index_sizeSearchTotalIndexSize
aws_elasticache_set_based_cmdsSetBasedCmds
aws_elasticache_set_based_cmds_latencySetBasedCmdsLatency
aws_elasticache_set_type_cmdsSetTypeCmds
aws_elasticache_set_type_cmds_latencySetTypeCmdsLatency
aws_elasticache_slabs_movedSlabsMoved
aws_elasticache_sorted_set_based_cmdsSortedSetBasedCmds
aws_elasticache_sorted_set_based_cmds_latencySortedSetBasedCmdsLatency
aws_elasticache_stream_based_cmdsStreamBasedCmds
aws_elasticache_stream_based_cmds_latencyStreamBasedCmdsLatency
aws_elasticache_string_based_cmdsStringBasedCmds
aws_elasticache_string_based_cmds_latencyStringBasedCmdsLatency
aws_elasticache_swap_usageSwapUsage
aws_elasticache_touch_hitsTouchHits
aws_elasticache_touch_missesTouchMisses
aws_elasticache_traffic_management_activeTrafficManagementActive
aws_elasticache_unused_memoryUnusedMemory
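
The hit and miss counters in the table above can be combined into a hit rate. The following is a minimal sketch, assuming the rate is defined as hits divided by the sum of hits and misses. The function name `cache_hit_rate` is hypothetical, and the published `CacheHitRate` metric may use a different scale (for example, a percentage), so treat this as an illustration rather than the service's exact calculation.

```python
from typing import Optional


def cache_hit_rate(hits: float, misses: float) -> Optional[float]:
    """Return hits / (hits + misses) as a fraction between 0 and 1.

    `hits` and `misses` are assumed to be sums of
    aws_elasticache_cache_hits and aws_elasticache_cache_misses
    over the same time window. Returns None when there were no
    lookups, because the rate is undefined in that case.
    """
    total = hits + misses
    if total == 0:
        return None
    return hits / total


if __name__ == "__main__":
    # Hypothetical sample values for one scrape window.
    print(cache_hit_rate(hits=90, misses=10))
```

Returning `None` for an empty window keeps an idle cache from being reported as a 0% hit rate.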

AWS/ElasticBeanstalk

Function: Service to quickly deploy and manage applications in the cloud without provisioning resources

Scrape interval: 5 minutes

MetricCloudwatch metricPurpose
aws_elasticbeanstalk_infoElasticBeanstalk InfoGeneral information about the AWS Elastic Beanstalk environment
aws_elasticbeanstalk_application_latency_p10ApplicationLatencyP10Tracks the 10th percentile application latency for Elastic Beanstalk
aws_elasticbeanstalk_application_latency_p50ApplicationLatencyP50Measures the median (50th percentile) application latency
aws_elasticbeanstalk_application_latency_p75ApplicationLatencyP75Tracks the 75th percentile latency of requests in Elastic Beanstalk
aws_elasticbeanstalk_application_latency_p85ApplicationLatencyP85Measures the 85th percentile latency for Elastic Beanstalk applications
aws_elasticbeanstalk_application_latency_p90ApplicationLatencyP90Tracks the 90th percentile application latency
aws_elasticbeanstalk_application_latency_p95ApplicationLatencyP95Measures the 95th percentile latency for Elastic Beanstalk applications
aws_elasticbeanstalk_application_latency_p99ApplicationLatencyP99Tracks the 99th percentile application latency
aws_elasticbeanstalk_application_latency_p99_9ApplicationLatencyP99.9Measures the 99.9th percentile application latency in Elastic Beanstalk
aws_elasticbeanstalk_application_requests2xxApplicationRequests2xxTracks the number of successful application requests with 2xx status codes
aws_elasticbeanstalk_application_requests3xxApplicationRequests3xxMeasures the number of application requests with 3xx (redirection) status codes
aws_elasticbeanstalk_application_requests4xxApplicationRequests4xxTracks the number of client error requests with 4xx status codes
aws_elasticbeanstalk_application_requests5xxApplicationRequests5xxMeasures the number of server error requests with 5xx status codes
aws_elasticbeanstalk_application_requests_totalApplicationRequestsTotalTracks the total number of application requests received
aws_elasticbeanstalk_cpuidleCPUIdleMeasures the idle CPU time of instances within Elastic Beanstalk
aws_elasticbeanstalk_cpuiowaitCPUIowaitTracks the CPU time spent waiting for I/O operations to complete
aws_elasticbeanstalk_cpuirqCPUIrqMeasures the time spent on interrupt requests (IRQ) on the CPU
aws_elasticbeanstalk_cpuniceCPUNiceTracks the CPU time spent on user processes that have been “niced”
aws_elasticbeanstalk_cpusoftirqCPUSoftirqMonitors CPU time used for soft interrupt requests
aws_elasticbeanstalk_cpusystemCPUSystemTracks the amount of CPU time spent executing system-level tasks
aws_elasticbeanstalk_cpuuserCPUUserMeasures the amount of CPU time spent executing user processes
aws_elasticbeanstalk_environment_healthEnvironmentHealthMonitors the overall health status of the Elastic Beanstalk environment
aws_elasticbeanstalk_instance_healthInstanceHealthTracks the health status of individual instances in Elastic Beanstalk
aws_elasticbeanstalk_instances_degradedInstancesDegradedMonitors the number of instances with degraded health
aws_elasticbeanstalk_instances_infoInstancesInfoProvides general information about the state of instances in Elastic Beanstalk
aws_elasticbeanstalk_instances_no_dataInstancesNoDataTracks the number of instances reporting no data
aws_elasticbeanstalk_instances_okInstancesOkMonitors the number of healthy instances in the environment
aws_elasticbeanstalk_instances_pendingInstancesPendingMeasures the number of instances in a pending state
aws_elasticbeanstalk_instances_severeInstancesSevereTracks the number of instances with severe health problems
aws_elasticbeanstalk_instances_unknownInstancesUnknownMonitors the number of instances with unknown health status
aws_elasticbeanstalk_instances_warningInstancesWarningTracks the number of instances in warning status
aws_elasticbeanstalk_load_average1minLoadAverage1minMeasures the system load average over the last 1 minute
aws_elasticbeanstalk_load_average5minLoadAverage5minTracks the system load average over the last 5 minutes
aws_elasticbeanstalk_root_filesystem_utilRootFilesystemUtilMonitors the usage of the root file system
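
The metric names in these tables appear to follow a consistent pattern: the namespace is lowercased, an underscore is inserted wherever a lowercase letter or digit is followed by an uppercase letter, other punctuation becomes an underscore, and the result is lowercased. The sketch below reproduces that pattern. It is inferred from the entries listed here, not taken from an official specification, and the function names `to_snake` and `prometheus_name` are hypothetical.

```python
import re

# Boundary between a lowercase letter or digit and the uppercase letter after it.
_CAMEL_BOUNDARY = re.compile(r"([a-z0-9])([A-Z])")
# Any character that is not a letter, digit, or underscore.
_NON_WORD = re.compile(r"[^A-Za-z0-9_]")


def to_snake(name: str) -> str:
    """Convert a CloudWatch metric name to the snake_case form seen in the tables."""
    with_boundaries = _CAMEL_BOUNDARY.sub(r"\1_\2", name)
    return _NON_WORD.sub("_", with_boundaries).lower()


def prometheus_name(namespace: str, cloudwatch_metric: str) -> str:
    """Join the lowercased namespace and the converted metric name."""
    prefix = _NON_WORD.sub("_", namespace.lower())
    return f"{prefix}_{to_snake(cloudwatch_metric)}"


if __name__ == "__main__":
    for namespace, metric in [
        ("AWS/ElastiCache", "DB0AverageTTL"),
        ("AWS/ElastiCache", "CPUUtilization"),
        ("AWS/ElasticBeanstalk", "ApplicationLatencyP99.9"),
    ]:
        print(prometheus_name(namespace, metric))
```

Runs of consecutive capitals such as `CPU` contain no lowercase-to-uppercase boundary, which is why `CPUUtilization` maps to `cpuutilization` rather than `cpu_utilization`.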

AWS/ElasticMapReduce

Function: Managed big data platform for processing large amounts of data using Hadoop

Scrape interval: 5 minutes

MetricCloudwatch metricPurpose
aws_elasticmapreduce_infoElasticMapReduce InfoGeneral information about the state of the AWS Elastic MapReduce cluster
aws_elasticmapreduce_apps_completedAppsCompletedTracks the number of applications that have successfully completed
aws_elasticmapreduce_apps_failedAppsFailedMonitors the number of applications that have failed
aws_elasticmapreduce_apps_killedAppsKilledTracks the number of applications that were terminated or killed
aws_elasticmapreduce_apps_pendingAppsPendingMeasures the number of applications that are in the pending state
aws_elasticmapreduce_apps_runningAppsRunningTracks the number of applications currently running
aws_elasticmapreduce_apps_submittedAppsSubmittedMeasures the total number of applications that have been submitted
aws_elasticmapreduce_backup_failedBackupFailedTracks the number of backup attempts that failed
aws_elasticmapreduce_capacity_remaining_gbCapacityRemainingGBMeasures the remaining storage capacity in gigabytes within the cluster
aws_elasticmapreduce_cluster_statusClusterStatusMonitors the overall status of the Elastic MapReduce cluster
aws_elasticmapreduce_container_allocatedContainerAllocatedTracks the number of containers allocated for running tasks
aws_elasticmapreduce_container_pendingContainerPendingMeasures the number of containers pending allocation
aws_elasticmapreduce_container_pending_ratioContainerPendingRatioTracks the ratio of pending containers to allocated containers
aws_elasticmapreduce_container_reservedContainerReservedMonitors the number of containers reserved for future tasks
aws_elasticmapreduce_core_nodes_pendingCoreNodesPendingTracks the number of core nodes that are pending
aws_elasticmapreduce_core_nodes_runningCoreNodesRunningMeasures the number of core nodes that are currently running
aws_elasticmapreduce_corrupt_blocksCorruptBlocksMonitors the number of blocks that are identified as corrupt
aws_elasticmapreduce_dfs_pending_replication_blocksDfsPendingReplicationBlocksTracks the number of HDFS blocks that are pending replication
aws_elasticmapreduce_hbaseHBaseMonitors the health and activity of the HBase database in the cluster
aws_elasticmapreduce_hdfsbytes_readHDFSBytesReadMeasures the number of bytes read from HDFS in the cluster
aws_elasticmapreduce_hdfsbytes_writtenHDFSBytesWrittenTracks the number of bytes written to HDFS
aws_elasticmapreduce_hdfsutilizationHDFSUtilizationMonitors the utilization of HDFS in the cluster
aws_elasticmapreduce_hbase_backup_failedHbaseBackupFailedTracks the number of failed backups for HBase in the cluster
aws_elasticmapreduce_ioIOMonitors input/output (I/O) operations in the cluster
aws_elasticmapreduce_is_idleIsIdleTracks if the cluster or a node is currently idle
aws_elasticmapreduce_jobs_failedJobsFailedMeasures the number of failed jobs in the cluster
aws_elasticmapreduce_jobs_runningJobsRunningTracks the number of currently running jobs
aws_elasticmapreduce_live_data_nodesLiveDataNodesMonitors the number of live data nodes in the cluster
aws_elasticmapreduce_live_task_trackersLiveTaskTrackersTracks the number of live task trackers
aws_elasticmapreduce_mractive_nodesMRActiveNodesMeasures the number of active MapReduce nodes in the cluster
aws_elasticmapreduce_mrdecommissioned_nodesMRDecommissionedNodesTracks the number of decommissioned MapReduce nodes
aws_elasticmapreduce_mrlost_nodesMRLostNodesMonitors the number of lost MapReduce nodes in the cluster
aws_elasticmapreduce_mrrebooted_nodesMRRebootedNodesMeasures the number of rebooted MapReduce nodes
aws_elasticmapreduce_mrtotal_nodesMRTotalNodesTracks the total number of MapReduce nodes
aws_elasticmapreduce_mrunhealthy_nodesMRUnhealthyNodesMonitors the number of unhealthy MapReduce nodes
aws_elasticmapreduce_map_reduceMap/ReduceGeneral metric for MapReduce activity in the cluster
aws_elasticmapreduce_map_slots_openMapSlotsOpenTracks the number of open Map slots in the cluster
aws_elasticmapreduce_map_tasks_remainingMapTasksRemainingMonitors the number of remaining Map tasks
aws_elasticmapreduce_map_tasks_runningMapTasksRunningTracks the number of Map tasks currently running
aws_elasticmapreduce_memory_allocated_mbMemoryAllocatedMBMeasures the memory allocated in MB in the cluster
aws_elasticmapreduce_memory_available_mbMemoryAvailableMBTracks the available memory in MB in the cluster
aws_elasticmapreduce_memory_reserved_mbMemoryReservedMBMonitors the memory reserved for future tasks in MB
aws_elasticmapreduce_memory_total_mbMemoryTotalMBTracks the total memory available in MB in the cluster
aws_elasticmapreduce_missing_blocksMissingBlocksMeasures the number of missing HDFS blocks in the cluster
aws_elasticmapreduce_most_recent_backup_durationMostRecentBackupDurationTracks the duration of the most recent backup
aws_elasticmapreduce_node_statusNodeStatusMonitors the overall status of the nodes in the cluster
aws_elasticmapreduce_pending_deletion_blocksPendingDeletionBlocksTracks the number of HDFS blocks pending deletion
aws_elasticmapreduce_reduce_slots_openReduceSlotsOpenMeasures the number of open Reduce slots in the cluster
aws_elasticmapreduce_reduce_tasks_remainingReduceTasksRemainingMonitors the number of remaining Reduce tasks
aws_elasticmapreduce_reduce_tasks_runningReduceTasksRunningTracks the number of currently running Reduce tasks
aws_elasticmapreduce_remaining_map_tasks_per_slotRemainingMapTasksPerSlotMeasures the remaining Map tasks per slot
aws_elasticmapreduce_s3_bytes_readS3BytesReadTracks the number of bytes read from S3 during the cluster operation
aws_elasticmapreduce_s3_bytes_writtenS3BytesWrittenMeasures the number of bytes written to S3 during the cluster operation
aws_elasticmapreduce_task_nodes_pendingTaskNodesPendingTracks the number of task nodes that are pending allocation
aws_elasticmapreduce_task_nodes_runningTaskNodesRunningMonitors the number of running task nodes in the cluster
aws_elasticmapreduce_time_since_last_successful_backupTimeSinceLastSuccessfulBackupMeasures the time elapsed since the last successful backup
aws_elasticmapreduce_total_loadTotalLoadTracks the total computational load on the cluster
aws_elasticmapreduce_under_replicated_blocksUnderReplicatedBlocksMonitors the number of under-replicated HDFS blocks in the cluster
aws_elasticmapreduce_yarnmemory_available_percentageYARNMemoryAvailablePercentageTracks the percentage of available YARN memory in the cluster

AWS/Events

Function: Delivers a near real-time stream of system events for building reactive applications

Scrape interval: 5 minutes

MetricCloudWatch metricPurpose
aws_events_infoGeneral information about AWS Events
aws_events_dead_letter_invocationsDeadLetterInvocationsTracks the number of times a message is sent to the dead letter queue
aws_events_eventsEventsMonitors the total number of events received by AWS Events
aws_events_failed_invocationsFailedInvocationsTracks the number of invocation failures
aws_events_ingestionto_invocation_complete_latencyIngestiontoInvocationCompleteLatencyMeasures the latency from event ingestion to invocation completion
aws_events_ingestionto_invocation_start_latencyIngestiontoInvocationStartLatencyMeasures the latency from event ingestion to invocation start
aws_events_invocation_attemptsInvocationAttemptsTracks the total number of invocation attempts
aws_events_invocationsInvocationsTracks the total number of invocations
aws_events_invocations_createdInvocationsCreatedMonitors the number of invocations created
aws_events_invocations_failed_to_be_sent_to_dlqInvocationsFailedToBeSentToDlqTracks the number of invocations that failed to be sent to the dead letter queue
aws_events_invocations_sent_to_dlqInvocationsSentToDlqTracks the number of invocations successfully sent to the dead letter queue
aws_events_matched_eventsMatchedEventsMonitors the number of events that matched event rules
aws_events_put_events_approximate_call_countPutEventsApproximateCallCountMeasures the approximate number of PutEvents API call requests
aws_events_put_events_approximate_failed_countPutEventsApproximateFailedCountTracks the approximate number of PutEvents API call failures
aws_events_put_events_approximate_success_countPutEventsApproximateSuccessCountMonitors the approximate number of successful PutEvents API call requests
aws_events_put_events_approximate_throttled_countPutEventsApproximateThrottledCountTracks the approximate number of throttled PutEvents API call requests
aws_events_put_events_entries_countPutEventsEntriesCountMeasures the number of event entries in PutEvents requests
aws_events_put_events_failed_entries_countPutEventsFailedEntriesCountTracks the number of failed event entries in PutEvents requests
aws_events_put_events_latencyPutEventsLatencyMonitors the latency of PutEvents API requests
aws_events_put_events_request_sizePutEventsRequestSizeMeasures the size of PutEvents API requests
aws_events_put_partner_events_approximate_call_countPutPartnerEventsApproximateCallCountMonitors the approximate number of PutPartnerEvents API call requests
aws_events_put_partner_events_approximate_failed_countPutPartnerEventsApproximateFailedCountTracks the approximate number of failed PutPartnerEvents API call requests
aws_events_put_partner_events_approximate_success_countPutPartnerEventsApproximateSuccessCountMeasures the approximate number of successful PutPartnerEvents API call requests
aws_events_put_partner_events_approximate_throttled_countPutPartnerEventsApproximateThrottledCountTracks the approximate number of throttled PutPartnerEvents API call requests
aws_events_put_partner_events_entries_countPutPartnerEventsEntriesCountMeasures the number of event entries in PutPartnerEvents requests
aws_events_put_partner_events_failed_entries_countPutPartnerEventsFailedEntriesCountMonitors the number of failed event entries in PutPartnerEvents requests
aws_events_put_partner_events_latencyPutPartnerEventsLatencyTracks the latency of PutPartnerEvents API requests
aws_events_retry_invocation_attemptsRetryInvocationAttemptsMeasures the number of retry invocation attempts
aws_events_successful_invocation_attemptsSuccessfulInvocationAttemptsTracks the number of successful invocation attempts
aws_events_throttled_rulesThrottledRulesMonitors the number of rules that were throttled
aws_events_triggered_rulesTriggeredRulesTracks the number of event rules that were triggered

AWS/FSx

Function: Managed file systems optimized for specific workloads like Windows and Lustre

Scrape interval: 5 minutes

MetricCloudWatch metricPurpose
aws_fsx_infoGeneral information about FSx
aws_fsx_cpuutilizationCPUUtilizationMeasures the percentage of CPU utilization on the FSx file system
aws_fsx_client_connectionsClientConnectionsTracks the number of active client connections to the FSx file system
aws_fsx_data_read_bytesDataReadBytesMonitors the total bytes read from the file system
aws_fsx_data_read_operationsDataReadOperationsMeasures the number of data read operations
aws_fsx_data_write_bytesDataWriteBytesTracks the total bytes written to the file system
aws_fsx_data_write_operationsDataWriteOperationsMonitors the number of data write operations
aws_fsx_deduplication_saved_storageDeduplicationSavedStorageMeasures the amount of storage saved through data deduplication
aws_fsx_disk_iops_utilizationDiskIopsUtilizationTracks the percentage of disk IOPS (Input/Output Operations Per Second) utilization
aws_fsx_disk_read_bytesDiskReadBytesMonitors the total bytes read from the disk
aws_fsx_disk_read_operationsDiskReadOperationsMeasures the number of disk read operations
aws_fsx_disk_throughput_balanceDiskThroughputBalanceTracks the balance of disk throughput usage
aws_fsx_disk_throughput_utilizationDiskThroughputUtilizationMeasures the percentage of disk throughput utilization
aws_fsx_disk_write_bytesDiskWriteBytesTracks the total bytes written to the disk
aws_fsx_disk_write_operationsDiskWriteOperationsMonitors the number of disk write operations
aws_fsx_file_server_disk_iops_balanceFileServerDiskIopsBalanceMeasures the balance of IOPS utilization on the file server
aws_fsx_file_server_disk_iops_utilizationFileServerDiskIopsUtilizationTracks the percentage of IOPS utilization on the file server
aws_fsx_file_server_disk_throughput_balanceFileServerDiskThroughputBalanceMeasures the balance of disk throughput on the file server
aws_fsx_file_server_disk_throughput_utilizationFileServerDiskThroughputUtilizationMonitors the percentage of disk throughput utilization on the file server
aws_fsx_free_data_storage_capacityFreeDataStorageCapacityTracks the amount of free data storage capacity available
aws_fsx_free_storage_capacityFreeStorageCapacityMeasures the total amount of free storage capacity available
aws_fsx_memory_utilizationMemoryUtilizationMonitors the percentage of memory utilization on the file system
aws_fsx_metadata_operationsMetadataOperationsTracks the number of metadata operations (like file system metadata lookups)
aws_fsx_network_throughput_utilizationNetworkThroughputUtilizationMeasures the percentage of network throughput utilization
aws_fsx_storage_capacity_utilizationStorageCapacityUtilizationTracks the percentage of storage capacity utilization

AWS/Firehose

Function: Service to reliably load streaming data into AWS data stores like S3 and Redshift

Scrape interval: 5 minutes

MetricCloudWatch metricPurpose
aws_firehose_infoGeneral information about Firehose
aws_firehose_active_partitions_limitActivePartitionsLimitTracks the limit of active partitions
aws_firehose_backup_to_s3_bytesBackupToS3.BytesMeasures the amount of data backed up to S3 in bytes
aws_firehose_backup_to_s3_data_freshnessBackupToS3.DataFreshnessMonitors the data freshness of backups to S3
aws_firehose_backup_to_s3_recordsBackupToS3.RecordsTracks the number of records backed up to S3
aws_firehose_backup_to_s3_successBackupToS3.SuccessMeasures the success rate of data backup to S3
aws_firehose_bytes_per_second_limitBytesPerSecondLimitMonitors the bytes per second limit for data delivery
aws_firehose_data_read_from_kinesis_stream_bytesDataReadFromKinesisStream.BytesTracks the amount of data read from a Kinesis stream in bytes
aws_firehose_data_read_from_kinesis_stream_recordsDataReadFromKinesisStream.RecordsTracks the number of records read from a Kinesis stream
aws_firehose_data_read_from_source_backpressuredDataReadFromSource.BackpressuredMeasures if the data source is backpressured
aws_firehose_data_read_from_source_bytesDataReadFromSource.BytesMonitors the amount of data read from the source in bytes
aws_firehose_data_read_from_source_recordsDataReadFromSource.RecordsTracks the number of records read from the source
aws_firehose_delivery_to_amazon_open_search_serverless_auth_failureDeliveryToAmazonOpenSearchServerless.AuthFailureTracks authorization failures during delivery to Amazon OpenSearch Serverless
aws_firehose_delivery_to_amazon_open_search_serverless_bytesDeliveryToAmazonOpenSearchServerless.BytesMeasures the amount of data delivered to Amazon OpenSearch Serverless in bytes
aws_firehose_delivery_to_amazon_open_search_serverless_data_freshnessDeliveryToAmazonOpenSearchServerless.DataFreshnessMonitors the data freshness during delivery to Amazon OpenSearch Serverless
aws_firehose_delivery_to_amazon_open_search_serverless_delivery_rejectedDeliveryToAmazonOpenSearchServerless.DeliveryRejectedTracks the number of rejected deliveries to Amazon OpenSearch Serverless
aws_firehose_delivery_to_amazon_open_search_serverless_recordsDeliveryToAmazonOpenSearchServerless.RecordsMeasures the number of records delivered to Amazon OpenSearch Serverless
aws_firehose_delivery_to_amazon_open_search_serverless_successDeliveryToAmazonOpenSearchServerless.SuccessTracks the success rate of delivery to Amazon OpenSearch Serverless
aws_firehose_delivery_to_amazon_open_search_service_auth_failureDeliveryToAmazonOpenSearchService.AuthFailureMonitors authorization failures during delivery to Amazon OpenSearch Service
aws_firehose_delivery_to_amazon_open_search_service_bytesDeliveryToAmazonOpenSearchService.BytesTracks the amount of data delivered to Amazon OpenSearch Service in bytes
aws_firehose_delivery_to_amazon_open_search_service_data_freshnessDeliveryToAmazonOpenSearchService.DataFreshnessMonitors the data freshness during delivery to Amazon OpenSearch Service
aws_firehose_delivery_to_amazon_open_search_service_delivery_rejectedDeliveryToAmazonOpenSearchService.DeliveryRejectedTracks the number of rejected deliveries to Amazon OpenSearch Service
aws_firehose_delivery_to_amazon_open_search_service_recordsDeliveryToAmazonOpenSearchService.RecordsMeasures the number of records delivered to Amazon OpenSearch Service
aws_firehose_delivery_to_amazon_open_search_service_successDeliveryToAmazonOpenSearchService.SuccessTracks the success rate of delivery to Amazon OpenSearch Service
aws_firehose_delivery_to_elasticsearch_bytesDeliveryToElasticsearch.BytesMeasures the amount of data delivered to Elasticsearch in bytes
aws_firehose_delivery_to_elasticsearch_recordsDeliveryToElasticsearch.RecordsTracks the number of records delivered to Elasticsearch
aws_firehose_delivery_to_elasticsearch_successDeliveryToElasticsearch.SuccessMonitors the success rate of delivery to Elasticsearch
aws_firehose_delivery_to_http_endpoint_bytesDeliveryToHttpEndpoint.BytesMeasures the amount of data delivered to an HTTP endpoint in bytes
aws_firehose_delivery_to_http_endpoint_data_freshnessDeliveryToHttpEndpoint.DataFreshnessMonitors the data freshness during delivery to an HTTP endpoint
aws_firehose_delivery_to_http_endpoint_processed_bytesDeliveryToHttpEndpoint.ProcessedBytesTracks the amount of data processed at an HTTP endpoint
aws_firehose_delivery_to_http_endpoint_processed_recordsDeliveryToHttpEndpoint.ProcessedRecordsMonitors the number of records processed at an HTTP endpoint
aws_firehose_delivery_to_http_endpoint_recordsDeliveryToHttpEndpoint.RecordsTracks the number of records delivered to an HTTP endpoint
aws_firehose_delivery_to_http_endpoint_successDeliveryToHttpEndpoint.SuccessMeasures the success rate of delivery to an HTTP endpoint
aws_firehose_delivery_to_redshift_bytesDeliveryToRedshift.BytesTracks the amount of data delivered to Redshift in bytes
aws_firehose_delivery_to_redshift_recordsDeliveryToRedshift.RecordsMonitors the number of records delivered to Redshift
aws_firehose_delivery_to_redshift_successDeliveryToRedshift.SuccessMeasures the success rate of delivery to Redshift
aws_firehose_delivery_to_s3_bytesDeliveryToS3.BytesTracks the amount of data delivered to S3 in bytes
aws_firehose_delivery_to_s3_data_freshnessDeliveryToS3.DataFreshnessMonitors the data freshness during delivery to S3
aws_firehose_delivery_to_s3_object_countDeliveryToS3.ObjectCountTracks the number of objects delivered to S3
aws_firehose_delivery_to_s3_recordsDeliveryToS3.RecordsMonitors the number of records delivered to S3
aws_firehose_delivery_to_s3_successDeliveryToS3.SuccessMeasures the success rate of delivery to S3
aws_firehose_delivery_to_snowflake_bytesDeliveryToSnowflake.BytesTracks the amount of data delivered to Snowflake in bytes
aws_firehose_delivery_to_snowflake_data_commit_latencyDeliveryToSnowflake.DataCommitLatencyMeasures the latency for data commit during delivery to Snowflake
aws_firehose_delivery_to_snowflake_data_freshnessDeliveryToSnowflake.DataFreshnessMonitors the data freshness during delivery to Snowflake
aws_firehose_delivery_to_snowflake_recordsDeliveryToSnowflake.RecordsTracks the number of records delivered to Snowflake
aws_firehose_delivery_to_snowflake_successDeliveryToSnowflake.SuccessMeasures the success rate of delivery to Snowflake
aws_firehose_delivery_to_splunk_bytesDeliveryToSplunk.BytesTracks the amount of data delivered to Splunk in bytes
aws_firehose_delivery_to_splunk_data_ack_latencyDeliveryToSplunk.DataAckLatencyMeasures the acknowledgment latency during delivery to Splunk
aws_firehose_delivery_to_splunk_data_freshnessDeliveryToSplunk.DataFreshnessMonitors the data freshness during delivery to Splunk
aws_firehose_delivery_to_splunk_recordsDeliveryToSplunk.RecordsTracks the number of records delivered to Splunk
aws_firehose_delivery_to_splunk_successDeliveryToSplunk.SuccessMeasures the success rate of delivery to Splunk
aws_firehose_describe_delivery_stream_latencyDescribeDeliveryStream.LatencyTracks the latency for describing a delivery stream
aws_firehose_describe_delivery_stream_requestsDescribeDeliveryStream.RequestsMeasures the number of requests to describe a delivery stream
aws_firehose_execute_processing_durationExecuteProcessing.DurationTracks the duration of data processing during delivery
aws_firehose_execute_processing_successExecuteProcessing.SuccessMeasures the success rate of data processing during delivery
aws_firehose_failed_conversion_bytesFailedConversion.BytesTracks the number of bytes that failed during conversion
aws_firehose_failed_conversion_recordsFailedConversion.RecordsMonitors the number of records that failed during conversion
aws_firehose_failed_validation_bytesFailedValidation.BytesTracks the number of bytes that failed during validation
aws_firehose_failed_validation_recordsFailedValidation.RecordsMonitors the number of records that failed during validation
aws_firehose_incoming_bytesIncomingBytesTracks the amount of incoming data in bytes
aws_firehose_incoming_put_requestsIncomingPutRequestsMeasures the number of incoming put requests
aws_firehose_incoming_recordsIncomingRecordsMonitors the number of incoming records
aws_firehose_jqprocessing_durationJQProcessing.DurationTracks the duration of JQ processing
aws_firehose_kmskey_access_deniedKMSKeyAccessDeniedMonitors instances where access to the KMS (Key Management Service) key is denied
aws_firehose_kmskey_disabledKMSKeyDisabledTracks the instances where the KMS key is disabled
aws_firehose_kmskey_invalid_stateKMSKeyInvalidStateMonitors the instances where the KMS key is in an invalid state
aws_firehose_kmskey_not_foundKMSKeyNotFoundTracks the instances where the KMS key is not found
aws_firehose_kafka_offset_lagKafkaOffsetLagMonitors the lag in Kafka offset
aws_firehose_kinesis_millis_behind_latestKinesisMillisBehindLatestTracks the time lag (in milliseconds) behind the latest record in Kinesis
aws_firehose_list_delivery_streams_latencyListDeliveryStreams.LatencyMeasures the latency in listing delivery streams
aws_firehose_list_delivery_streams_requestsListDeliveryStreams.RequestsTracks the number of requests for listing delivery streams
aws_firehose_output_decompressed_bytes_failedOutputDecompressedBytes.FailedMeasures the number of decompressed bytes that failed
aws_firehose_output_decompressed_bytes_successOutputDecompressedBytes.SuccessTracks the number of decompressed bytes that succeeded
aws_firehose_output_decompressed_records_failedOutputDecompressedRecords.FailedMonitors the number of decompressed records that failed
aws_firehose_output_decompressed_records_successOutputDecompressedRecords.SuccessTracks the number of decompressed records that succeeded
aws_firehose_partition_countPartitionCountMeasures the count of partitions during data delivery
aws_firehose_partition_count_exceededPartitionCountExceededMonitors instances where partition count exceeds limits
aws_firehose_per_partition_throughputPerPartitionThroughputMeasures the throughput per partition during data delivery
aws_firehose_put_record_bytesPutRecord.BytesTracks the number of bytes delivered via PutRecord API
aws_firehose_put_record_latencyPutRecord.LatencyMeasures the latency in PutRecord API calls
aws_firehose_put_record_requestsPutRecord.RequestsMonitors the number of requests via PutRecord API
aws_firehose_put_record_batch_bytesPutRecordBatch.BytesTracks the number of bytes delivered via PutRecordBatch API
aws_firehose_put_record_batch_latencyPutRecordBatch.LatencyMeasures the latency in PutRecordBatch API calls
aws_firehose_put_record_batch_recordsPutRecordBatch.RecordsMonitors the number of records delivered via PutRecordBatch API
aws_firehose_put_record_batch_requestsPutRecordBatch.RequestsMeasures the number of requests via PutRecordBatch API
aws_firehose_put_requests_per_second_limitPutRequestsPerSecondLimitMonitors the limit on put requests (PutRecord and PutRecordBatch) per second
aws_firehose_records_per_second_limitRecordsPerSecondLimitTracks the limit on records processed per second
aws_firehose_resource_countResourceCountMonitors the count of resources in the data delivery stream
aws_firehose_source_throttled_delaySourceThrottled.DelayMeasures the delay caused by throttling on the data source
aws_firehose_succeed_conversion_bytesSucceedConversion.BytesTracks the number of bytes successfully converted
aws_firehose_succeed_conversion_recordsSucceedConversion.RecordsMonitors the number of records successfully converted
aws_firehose_succeed_processing_bytesSucceedProcessing.BytesMeasures the number of bytes successfully processed
aws_firehose_succeed_processing_recordsSucceedProcessing.RecordsTracks the number of records successfully processed
aws_firehose_throttled_describe_streamThrottledDescribeStreamMonitors instances of throttled DescribeStream API calls
aws_firehose_throttled_get_recordsThrottledGetRecordsMeasures instances of throttled GetRecords API calls
aws_firehose_throttled_get_shard_iteratorThrottledGetShardIteratorTracks instances of throttled GetShardIterator API calls
aws_firehose_throttled_recordsThrottledRecordsMeasures instances where records are throttled
aws_firehose_update_delivery_stream_latencyUpdateDeliveryStream.LatencyMeasures the latency in updating delivery streams
aws_firehose_update_delivery_stream_requestsUpdateDeliveryStream.RequestsTracks the number of requests for updating delivery streams
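
The Metric column in these tables appears to follow a regular pattern derived from the service namespace and the CloudWatch metric name. The Python sketch below reproduces that pattern for the rows listed above. The rule is inferred from the table entries only, and the function name to_series_name is hypothetical; it is not a documented API. It does not cover every service: for example, the AWS/Glue rows drop the leading glue. segment, which this sketch does not handle.

```python
import re


def to_series_name(namespace: str, cloudwatch_metric: str) -> str:
    """Approximate the name shown in the Metric column (inferred rule, not official)."""
    # "AWS/Firehose" -> "aws_firehose"
    prefix = namespace.lower().replace("/", "_")
    # Separators such as "." and "/" become underscores.
    name = re.sub(r"[^A-Za-z0-9]+", "_", cloudwatch_metric)
    # Add an underscore where a lowercase letter or digit is followed by an uppercase letter.
    # Runs of capitals (for example "KMS" or "MR") stay joined to the word that follows.
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    return f"{prefix}_{name.lower()}"


if __name__ == "__main__":
    print(to_series_name("AWS/Firehose", "DeliveryToS3.Bytes"))
    print(to_series_name("AWS/Firehose", "KMSKeyAccessDenied"))
    print(to_series_name("AWS/ElasticMapReduce", "MRActiveNodes"))
```

Because runs of capital letters are not split, a name such as KMSKeyAccessDenied maps to kmskey_access_denied rather than kms_key_access_denied, which matches the rows in the table above.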

AWS/GameLift

Function: Managed service for deploying, operating, and scaling dedicated game servers

Scrape interval: 5 minutes

MetricCloudWatch metricPurpose
aws_gamelift_infoGeneral information about GameLift
aws_gamelift_activating_game_sessionsActivatingGameSessionsTracks the number of game sessions currently being activated
aws_gamelift_active_game_sessionsActiveGameSessionsMonitors the number of active game sessions
aws_gamelift_active_instancesActiveInstancesTracks the number of active GameLift instances
aws_gamelift_active_server_processesActiveServerProcessesMonitors the number of active server processes
aws_gamelift_available_game_serversAvailableGameServersTracks the number of available game servers
aws_gamelift_available_game_sessionsAvailableGameSessionsMonitors the number of available game sessions
aws_gamelift_average_wait_timeAverageWaitTimeTracks the average wait time for players
aws_gamelift_current_player_sessionsCurrentPlayerSessionsMonitors the number of current active player sessions
aws_gamelift_current_ticketsCurrentTicketsTracks the number of current active matchmaking tickets
aws_gamelift_desired_instancesDesiredInstancesTracks the number of desired instances for the fleet
aws_gamelift_draining_available_game_serversDrainingAvailableGameServersMonitors the number of available game servers that are draining
aws_gamelift_draining_utilized_game_serversDrainingUtilizedGameServersTracks the number of utilized game servers that are draining
aws_gamelift_first_choice_not_viableFirstChoiceNotViableMonitors the number of times the first placement choice was not viable
aws_gamelift_first_choice_out_of_capacityFirstChoiceOutOfCapacityTracks the number of times the first placement choice ran out of capacity
aws_gamelift_game_session_interruptionsGameSessionInterruptionsMonitors the number of game session interruptions
aws_gamelift_healthy_server_processesHealthyServerProcessesTracks the number of healthy server processes
aws_gamelift_idle_instancesIdleInstancesMonitors the number of idle instances in the fleet
aws_gamelift_instance_interruptionsInstanceInterruptionsTracks the number of GameLift instance interruptions
aws_gamelift_lowest_latency_placementLowestLatencyPlacementMonitors placements based on the lowest latency
aws_gamelift_lowest_price_placementLowestPricePlacementTracks placements based on the lowest price
aws_gamelift_match_acceptances_timed_outMatchAcceptancesTimedOutMonitors the number of match acceptance timeouts
aws_gamelift_matches_acceptedMatchesAcceptedTracks the number of matches that have been accepted
aws_gamelift_matches_createdMatchesCreatedMonitors the number of matches that have been created
aws_gamelift_matches_placedMatchesPlacedTracks the number of matches successfully placed
aws_gamelift_matches_rejectedMatchesRejectedMonitors the number of rejected matches
aws_gamelift_max_instancesMaxInstancesTracks the maximum number of instances
aws_gamelift_min_instancesMinInstancesMonitors the minimum number of instances
aws_gamelift_percent_available_game_sessionsPercentAvailableGameSessionsTracks the percentage of available game sessions
aws_gamelift_percent_healthy_server_processesPercentHealthyServerProcessesMonitors the percentage of healthy server processes
aws_gamelift_percent_idle_instancesPercentIdleInstancesTracks the percentage of idle instances
aws_gamelift_placementPlacementMonitors the match placement process
aws_gamelift_placements_canceledPlacementsCanceledTracks the number of canceled placements
aws_gamelift_placements_failedPlacementsFailedMonitors the number of failed placements
aws_gamelift_placements_startedPlacementsStartedTracks the number of placement processes started
aws_gamelift_placements_succeededPlacementsSucceededMonitors the number of successful placements
aws_gamelift_placements_timed_outPlacementsTimedOutTracks the number of timed-out placements
aws_gamelift_player_session_activationsPlayerSessionActivationsMonitors the number of activated player sessions
aws_gamelift_players_startedPlayersStartedTracks the number of players who have started their sessions
aws_gamelift_queue_depthQueueDepthMonitors the depth of the matchmaking queue
aws_gamelift_rule_evaluations_failedRuleEvaluationsFailedTracks the number of failed rule evaluations during matchmaking
aws_gamelift_rule_evaluations_passedRuleEvaluationsPassedMonitors the number of passed rule evaluations during matchmaking
aws_gamelift_server_process_abnormal_terminationsServerProcessAbnormalTerminationsTracks the number of abnormal terminations of server processes
aws_gamelift_server_process_activationsServerProcessActivationsMonitors the number of server process activations
aws_gamelift_server_process_terminationsServerProcessTerminationsTracks the number of server process terminations
aws_gamelift_tickets_failedTicketsFailedMonitors the number of failed matchmaking tickets
aws_gamelift_tickets_startedTicketsStartedTracks the number of matchmaking tickets that have started
aws_gamelift_tickets_timed_outTicketsTimedOutMonitors the number of matchmaking tickets that have timed out
aws_gamelift_time_to_matchTimeToMatchTracks the average time taken to find a match
aws_gamelift_time_to_ticket_successTimeToTicketSuccessMonitors the time taken to successfully complete a matchmaking ticket
aws_gamelift_utilized_game_serversUtilizedGameServersTracks the number of utilized game servers

AWS/GlobalAccelerator

Function: Provides static IP addresses to improve availability and performance for global applications

Scrape interval: 5 minutes

MetricCloudWatch metricPurpose
aws_globalaccelerator_infoGeneral information about Global Accelerator
aws_globalaccelerator_healthy_endpoint_countHealthyEndpointCountMonitors the number of healthy endpoints in the accelerator
aws_globalaccelerator_new_flow_countNewFlowCountTracks the number of new network flows being processed
aws_globalaccelerator_processed_bytes_inProcessedBytesInMonitors the volume of incoming traffic processed by the accelerator
aws_globalaccelerator_processed_bytes_outProcessedBytesOutTracks the volume of outgoing traffic processed by the accelerator
aws_globalaccelerator_unhealthy_endpoint_countUnhealthyEndpointCountMonitors the number of unhealthy endpoints in the accelerator

AWS/Glue

Function: Managed ETL service that prepares and loads data for analytics

Scrape interval: 5 minutes

MetricCloudWatch metricPurpose
aws_glue_infoGeneral information about AWS Glue
aws_glue_all_disk_available_gbglue.ALL.disk.available_GBTracks the available disk space in gigabytes for all Glue resources
aws_glue_all_disk_used_percentageglue.ALL.disk.used.percentageMeasures the percentage of disk space used across all Glue resources
aws_glue_all_disk_used_gbglue.ALL.disk.used_GBTracks the used disk space in gigabytes for all Glue resources
aws_glue_all_jvm_heap_usageglue.ALL.jvm.heap.usageMonitors the JVM heap usage for all Glue resources
aws_glue_all_jvm_heap_usedglue.ALL.jvm.heap.usedMeasures the amount of JVM heap used across all Glue resources
aws_glue_all_memory_heap_availableglue.ALL.memory.heap.availableTracks the available memory heap for all Glue resources
aws_glue_all_memory_heap_usedglue.ALL.memory.heap.usedMeasures the used memory heap for all Glue resources
aws_glue_all_memory_heap_used_percentageglue.ALL.memory.heap.used.percentageMeasures the percentage of memory heap used across all Glue resources
aws_glue_all_memory_non_heap_availableglue.ALL.memory.non-heap.availableMonitors the available non-heap memory for all Glue resources
aws_glue_all_memory_non_heap_percentageglue.ALL.memory.non-heap.percentageTracks the percentage of non-heap memory used
aws_glue_all_memory_non_heap_usedglue.ALL.memory.non-heap.usedMeasures the used non-heap memory across all Glue resources
aws_glue_all_memory_total_availableglue.ALL.memory.total.availableTracks the total available memory for all Glue resources
aws_glue_all_memory_total_usedglue.ALL.memory.total.usedMeasures the total used memory for all Glue resources
aws_glue_all_memory_total_used_percentageglue.ALL.memory.total.used.percentageMeasures the total percentage of memory used
aws_glue_all_s3_filesystem_read_bytesglue.ALL.s3.filesystem.read_bytesTracks the total number of bytes read from S3 filesystems
aws_glue_all_s3_filesystem_write_bytesglue.ALL.s3.filesystem.write_bytesTracks the total number of bytes written to S3 filesystems
aws_glue_all_system_cpu_system_loadglue.ALL.system.cpuSystemLoadMonitors the system CPU load across all Glue resources
aws_glue_driver_block_manager_disk_disk_space_used_mbglue.driver.BlockManager.disk.diskSpaceUsed_MBMeasures the disk space used by the block manager in megabytes
aws_glue_driver_executor_allocation_manager_executors_number_all_executorsglue.driver.ExecutorAllocationManager.executors.numberAllExecutorsTracks the number of executors across all Glue drivers
aws_glue_driver_executor_allocation_manager_executors_number_max_needed_executorsglue.driver.ExecutorAllocationManager.executors.numberMaxNeededExecutorsTracks the maximum number of executors needed
aws_glue_driver_aggregate_bytes_readglue.driver.aggregate.bytesReadTracks the total bytes read across all Glue driver instances
aws_glue_driver_aggregate_elapsed_timeglue.driver.aggregate.elapsedTimeMeasures the total elapsed time for tasks
aws_glue_driver_aggregate_num_completed_stagesglue.driver.aggregate.numCompletedStagesTracks the total number of completed stages
aws_glue_driver_aggregate_num_completed_tasksglue.driver.aggregate.numCompletedTasksTracks the total number of completed tasks
aws_glue_driver_aggregate_num_failed_tasksglue.driver.aggregate.numFailedTasksMeasures the number of failed tasks
aws_glue_driver_aggregate_num_killed_tasksglue.driver.aggregate.numKilledTasksTracks the number of killed tasks
aws_glue_driver_aggregate_records_readglue.driver.aggregate.recordsReadTracks the total number of records read by drivers
aws_glue_driver_aggregate_shuffle_bytes_writtenglue.driver.aggregate.shuffleBytesWrittenMeasures the number of shuffle bytes written
aws_glue_driver_aggregate_shuffle_local_bytes_readglue.driver.aggregate.shuffleLocalBytesReadTracks the number of shuffle bytes read locally
aws_glue_driver_bytes_readglue.driver.bytesReadMeasures the total bytes read by drivers
aws_glue_driver_bytes_writtenglue.driver.bytesWrittenMeasures the total bytes written by drivers
aws_glue_driver_disk_available_gbglue.driver.disk.available_GBTracks the available disk space for Glue drivers
aws_glue_driver_disk_used_percentageglue.driver.disk.used.percentageMeasures the percentage of disk space used by Glue drivers
aws_glue_driver_disk_used_gbglue.driver.disk.used_GBMeasures the used disk space in gigabytes for Glue drivers
aws_glue_driver_files_readglue.driver.filesReadTracks the total number of files read
aws_glue_driver_files_writtenglue.driver.filesWrittenMeasures the total number of files written
aws_glue_driver_jvm_heap_usageglue.driver.jvm.heap.usageMonitors the JVM heap usage of Glue drivers
aws_glue_driver_jvm_heap_usedglue.driver.jvm.heap.usedMeasures the used JVM heap for Glue drivers
aws_glue_driver_memory_heap_availableglue.driver.memory.heap.availableTracks the available heap memory for Glue drivers
aws_glue_driver_memory_heap_usedglue.driver.memory.heap.usedMeasures the used heap memory for Glue drivers
aws_glue_driver_memory_heap_used_percentageglue.driver.memory.heap.used.percentageMeasures the percentage of heap memory used
aws_glue_driver_memory_non_heap_availableglue.driver.memory.non-heap.availableTracks the available non-heap memory for Glue drivers
aws_glue_driver_memory_non_heap_percentageglue.driver.memory.non-heap.percentageMeasures the percentage of non-heap memory used
aws_glue_driver_memory_non_heap_usedglue.driver.memory.non-heap.usedTracks the non-heap memory used by Glue drivers
aws_glue_driver_memory_total_availableglue.driver.memory.total.availableTracks the total available memory for Glue drivers
aws_glue_driver_memory_total_usedglue.driver.memory.total.usedMeasures the total memory used by Glue drivers
aws_glue_driver_memory_total_used_percentageglue.driver.memory.total.used.percentageTracks the percentage of total memory used
aws_glue_driver_partitions_readglue.driver.partitionsReadTracks the number of partitions read by drivers
aws_glue_driver_records_readglue.driver.recordsReadTracks the number of records read by Glue drivers
aws_glue_driver_records_writtenglue.driver.recordsWrittenMeasures the number of records written by Glue drivers
aws_glue_driver_s3_filesystem_read_bytesglue.driver.s3.filesystem.read_bytesMeasures the bytes read from S3 filesystem by drivers
aws_glue_driver_s3_filesystem_write_bytesglue.driver.s3.filesystem.write_bytesTracks the bytes written to S3 filesystem by drivers
aws_glue_driver_skewness_jobglue.driver.skewness.jobTracks skewness in job execution
aws_glue_driver_skewness_stageglue.driver.skewness.stageTracks skewness in stages of execution
aws_glue_driver_streaming_batch_processing_time_in_msglue.driver.streaming.batchProcessingTimeInMsMeasures the batch processing time in milliseconds for streaming jobs
aws_glue_driver_streaming_num_recordsglue.driver.streaming.numRecordsTracks the number of records processed in streaming jobs
aws_glue_driver_system_cpu_system_loadglue.driver.system.cpuSystemLoadMonitors the CPU system load on Glue drivers
aws_glue_driver_worker_utilizationglue.driver.workerUtilizationTracks the worker utilization rate
aws_glue_error_allglue.error.ALLTracks all errors occurring in Glue
aws_glue_succeed_allglue.succeed.ALLMeasures the success rate of all Glue jobs

AWS/IoT

Function: Provides cloud services to connect IoT devices to the cloud and manage IoT workloads

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_iot_info | | General information about AWS IoT |
| aws_iot_canceled_job_execution_count | CanceledJobExecutionCount | Tracks the count of canceled job executions |
| aws_iot_canceled_job_execution_total_count | CanceledJobExecutionTotalCount | Tracks the total count of canceled job executions |
| aws_iot_client_error | ClientError | Monitors the client error count |
| aws_iot_connect_auth_error | Connect.AuthError | Tracks authentication errors during connection attempts |
| aws_iot_connect_client_error | Connect.ClientError | Measures client-side errors during connection attempts |
| aws_iot_connect_server_error | Connect.ServerError | Tracks server-side errors during connection attempts |
| aws_iot_connect_success | Connect.Success | Measures successful connection attempts |
| aws_iot_connect_throttle | Connect.Throttle | Monitors throttled connection attempts |
| aws_iot_delete_thing_shadow_accepted | DeleteThingShadow.Accepted | Tracks successful shadow deletions |
| aws_iot_failed_job_execution_count | FailedJobExecutionCount | Tracks the count of failed job executions |
| aws_iot_failed_job_execution_total_count | FailedJobExecutionTotalCount | Measures the total count of failed job executions |
| aws_iot_failure | Failure | Tracks overall failure events |
| aws_iot_get_thing_shadow_accepted | GetThingShadow.Accepted | Measures the number of successful shadow retrievals |
| aws_iot_in_progress_job_execution_count | InProgressJobExecutionCount | Tracks the count of in-progress job executions |
| aws_iot_in_progress_job_execution_total_count | InProgressJobExecutionTotalCount | Measures the total count of in-progress job executions |
| aws_iot_non_compliant_resources | NonCompliantResources | Tracks the count of non-compliant resources |
| aws_iot_num_log_batches_failed_to_publish_throttled | NumLogBatchesFailedToPublishThrottled | Monitors log batches that failed to publish due to throttling |
| aws_iot_num_log_events_failed_to_publish_throttled | NumLogEventsFailedToPublishThrottled | Measures log events that failed to publish due to throttling |
| aws_iot_parse_error | ParseError | Tracks the number of message parse errors |
| aws_iot_ping_success | Ping.Success | Measures successful ping operations |
| aws_iot_publish_in_auth_error | PublishIn.AuthError | Tracks authentication errors during inbound publish operations |
| aws_iot_publish_in_client_error | PublishIn.ClientError | Monitors client-side errors during inbound publish operations |
| aws_iot_publish_in_server_error | PublishIn.ServerError | Tracks server-side errors during inbound publish operations |
| aws_iot_publish_in_success | PublishIn.Success | Measures successful inbound publish operations |
| aws_iot_publish_in_throttle | PublishIn.Throttle | Tracks throttled inbound publish operations |
| aws_iot_publish_out_auth_error | PublishOut.AuthError | Tracks authentication errors during outbound publish operations |
| aws_iot_publish_out_client_error | PublishOut.ClientError | Monitors client-side errors during outbound publish operations |
| aws_iot_publish_out_success | PublishOut.Success | Measures successful outbound publish operations |
| aws_iot_queued_job_execution_count | QueuedJobExecutionCount | Tracks the count of job executions in the queue |
| aws_iot_queued_job_execution_total_count | QueuedJobExecutionTotalCount | Measures the total count of queued job executions |
| aws_iot_rejected_job_execution_count | RejectedJobExecutionCount | Tracks the count of rejected job executions |
| aws_iot_rejected_job_execution_total_count | RejectedJobExecutionTotalCount | Measures the total count of rejected job executions |
| aws_iot_removed_job_execution_count | RemovedJobExecutionCount | Tracks the count of removed job executions |
| aws_iot_removed_job_execution_total_count | RemovedJobExecutionTotalCount | Measures the total count of removed job executions |
| aws_iot_resources_evaluated | ResourcesEvaluated | Measures the number of resources evaluated |
| aws_iot_rule_message_throttled | RuleMessageThrottled | Tracks the number of rule messages throttled |
| aws_iot_rule_not_found | RuleNotFound | Measures instances where rules were not found |
| aws_iot_rules_executed | RulesExecuted | Tracks the number of executed rules |
| aws_iot_server_error | ServerError | Monitors server-side errors |
| aws_iot_subscribe_auth_error | Subscribe.AuthError | Tracks authentication errors during subscription attempts |
| aws_iot_subscribe_client_error | Subscribe.ClientError | Measures client-side errors during subscription attempts |
| aws_iot_subscribe_server_error | Subscribe.ServerError | Tracks server-side errors during subscription attempts |
| aws_iot_subscribe_success | Subscribe.Success | Measures successful subscription attempts |
| aws_iot_subscribe_throttle | Subscribe.Throttle | Monitors throttled subscription attempts |
| aws_iot_succeeded_job_execution_count | SucceededJobExecutionCount | Tracks the count of successful job executions |
| aws_iot_succeeded_job_execution_total_count | SucceededJobExecutionTotalCount | Measures the total count of successful job executions |
| aws_iot_success | Success | Tracks overall successful operations |
| aws_iot_topic_match | TopicMatch | Measures the number of successful topic matches |
| aws_iot_unsubscribe_client_error | Unsubscribe.ClientError | Monitors client-side errors during unsubscribe operations |
| aws_iot_unsubscribe_server_error | Unsubscribe.ServerError | Tracks server-side errors during unsubscribe operations |
| aws_iot_unsubscribe_success | Unsubscribe.Success | Measures successful unsubscribe operations |
| aws_iot_unsubscribe_throttle | Unsubscribe.Throttle | Monitors throttled unsubscribe operations |
| aws_iot_update_thing_shadow_accepted | UpdateThingShadow.Accepted | Measures successful shadow update operations |
| aws_iot_violations | Violations | Tracks policy violations |
| aws_iot_violations_cleared | ViolationsCleared | Measures cleared violations |
| aws_iot_violations_invalidated | ViolationsInvalidated | Tracks invalidated violations |
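
The exported names in the Metric column appear to follow a regular pattern: the namespace is lowercased, and the CloudWatch metric name is converted to snake case. The sketch below reproduces that pattern for most rows. The function name `to_exported_name` is hypothetical and the rule is inferred from this table, not taken from the exporter's source. Some rows on this page do not follow it (for example, `CPUCreditUsage` and `KafkaAppLogsDiskUsed` under AWS/Kafka), so treat the table as the authority.

```python
import re


def to_exported_name(namespace: str, cloudwatch_metric: str) -> str:
    """Approximate the exported name for a CloudWatch metric (inferred rule)."""
    # "AWS/IoT" -> "aws_iot"
    prefix = namespace.lower().replace("/", "_")
    # Dots and hyphens in the CloudWatch name become underscores.
    name = cloudwatch_metric.replace(".", "_").replace("-", "_")
    # Add an underscore wherever a lowercase letter or digit is followed by
    # an uppercase letter. Runs of capitals such as "CPU" stay together.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return f"{prefix}_{name.lower()}"


print(to_exported_name("AWS/IoT", "Connect.AuthError"))
print(to_exported_name("AWS/IoT", "GetThingShadow.Accepted"))
```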

AWS/Kafka

Function: Managed Apache Kafka service for building real-time streaming applications

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_kafka_info | | General information about AWS Kafka cluster |
| aws_kafka_active_controller_count | ActiveControllerCount | Indicates how many active controllers are in the Kafka cluster |
| aws_kafka_burst_balance | BurstBalance | Measures the burst balance remaining for the Kafka broker instances |
| aws_kafka_bw_in_allowance_exceeded | BwInAllowanceExceeded | Tracks the instances where incoming bandwidth allowance has been exceeded |
| aws_kafka_bw_out_allowance_exceeded | BwOutAllowanceExceeded | Tracks the instances where outgoing bandwidth allowance has been exceeded |
| aws_kafka_bytes_in_per_sec | BytesInPerSec | Measures the rate of incoming bytes per second into the Kafka cluster |
| aws_kafka_bytes_out_per_sec | BytesOutPerSec | Measures the rate of outgoing bytes per second from the Kafka cluster |
| aws_kafka_cpucredit_balance | CPUCreditBalance | Shows the remaining CPU credits for instances running in burstable performance mode |
| aws_kafka_client_connection_count | ClientConnectionCount | Indicates the total number of client connections to the Kafka brokers |
| aws_kafka_conn_track_allowance_exceeded | ConnTrackAllowanceExceeded | Tracks instances where the connection tracking allowance is exceeded |
| aws_kafka_connection_close_rate | ConnectionCloseRate | Monitors the rate at which connections are being closed |
| aws_kafka_connection_count | ConnectionCount | Displays the number of open connections to the Kafka brokers |
| aws_kafka_connection_creation_rate | ConnectionCreationRate | Tracks the rate of new connections being created to the Kafka brokers |
| aws_kafka_cpu_credit_usage | CPUCreditUsage | Shows the CPU credits consumed by the Kafka instances running in burstable mode |
| aws_kafka_cpu_idle | CPUIdle | Indicates the percentage of idle CPU resources on Kafka instances |
| aws_kafka_cpu_io_wait | CpuIoWait | Measures the time instances spend waiting for I/O operations to complete |
| aws_kafka_cpu_system | CpuSystem | Tracks CPU usage by the system processes on Kafka instances |
| aws_kafka_cpu_user | CpuUser | Shows CPU usage by user processes on Kafka instances |
| aws_kafka_estimated_max_time_lag | EstimatedMaxTimeLag | Estimates the time, in seconds, a consumer group needs to drain the maximum offset lag (MaxOffsetLag) |
| aws_kafka_estimated_time_lag | EstimatedTimeLag | Estimates the time, in seconds, a consumer group needs to drain the offset lag of a partition |
| aws_kafka_fetch_consumer_local_time_ms_mean | FetchConsumerLocalTimeMsMean | Measures the average time it takes to fetch messages locally by the consumer |
| aws_kafka_fetch_consumer_request_queue_time_ms_mean | FetchConsumerRequestQueueTimeMsMean | Indicates the average time messages spend in the consumer request queue |
| aws_kafka_fetch_consumer_response_queue_time_ms_mean | FetchConsumerResponseQueueTimeMsMean | Tracks the average time it takes for a consumer to queue a response |
| aws_kafka_fetch_consumer_response_send_time_ms_mean | FetchConsumerResponseSendTimeMsMean | Measures the average time taken to send a consumer response |
| aws_kafka_fetch_consumer_total_time_ms_mean | FetchConsumerTotalTimeMsMean | Tracks the total time spent processing a consumer fetch request |
| aws_kafka_fetch_follower_local_time_ms_mean | FetchFollowerLocalTimeMsMean | Measures the average time it takes for a Kafka broker follower to fetch messages locally |
| aws_kafka_fetch_follower_request_queue_time_ms_mean | FetchFollowerRequestQueueTimeMsMean | Measures the time follower fetch requests spend in the queue |
| aws_kafka_fetch_follower_response_queue_time_ms_mean | FetchFollowerResponseQueueTimeMsMean | Tracks the time follower fetch responses spend in the response queue |
| aws_kafka_fetch_follower_response_send_time_ms_mean | FetchFollowerResponseSendTimeMsMean | Measures the time it takes for a Kafka broker follower to send a fetch response |
| aws_kafka_fetch_follower_total_time_ms_mean | FetchFollowerTotalTimeMsMean | Tracks the total time for a Kafka broker follower to fetch messages |
| aws_kafka_fetch_message_conversions_per_sec | FetchMessageConversionsPerSec | Monitors the rate of message format conversions during fetching |
| aws_kafka_fetch_throttle_byte_rate | FetchThrottleByteRate | Measures the rate at which fetching is throttled due to byte rate limits |
| aws_kafka_fetch_throttle_queue_size | FetchThrottleQueueSize | Indicates the number of messages in the fetch throttle queue |
| aws_kafka_fetch_throttle_time | FetchThrottleTime | Tracks the total time Kafka throttles fetch requests |
| aws_kafka_global_partition_count | GlobalPartitionCount | Displays the total number of partitions in the Kafka cluster |
| aws_kafka_global_topic_count | GlobalTopicCount | Shows the total number of topics in the Kafka cluster |
| aws_kafka_heap_memory_after_gc | HeapMemoryAfterGC | Tracks the percentage of total heap memory in use after garbage collection |
| aws_kafka_app_logs_disk_used | KafkaAppLogsDiskUsed | Measures the amount of disk space used by Kafka application logs |
| aws_kafka_data_logs_disk_used | KafkaDataLogsDiskUsed | Measures the disk space used by Kafka data logs |
| aws_kafka_leader_count | LeaderCount | Shows the number of partition leaders on each broker, not including replicas |
| aws_kafka_max_offset_lag | MaxOffsetLag | Measures the maximum consumer offset lag across all partitions in a topic |
| aws_kafka_memory_buffered | MemoryBuffered | Indicates the amount of memory currently buffered by Kafka |
| aws_kafka_memory_cached | MemoryCached | Shows the amount of memory cached by Kafka |
| aws_kafka_memory_free | MemoryFree | Displays the amount of free memory on Kafka brokers |
| aws_kafka_memory_used | MemoryUsed | Measures the total amount of memory being used by Kafka brokers |
| aws_kafka_messages_in_per_sec | MessagesInPerSec | Tracks the number of messages produced per second in the Kafka cluster |
| aws_kafka_network_processor_avg_idle_percent | NetworkProcessorAvgIdlePercent | Measures the idle percentage of the network processors |
| aws_kafka_network_rx_dropped | NetworkRxDropped | Shows the number of dropped incoming network packets |
| aws_kafka_network_rx_errors | NetworkRxErrors | Tracks the number of errors on received network packets |
| aws_kafka_network_rx_packets | NetworkRxPackets | Measures the number of network packets received |
| aws_kafka_network_tx_dropped | NetworkTxDropped | Tracks the number of dropped outgoing network packets |
| aws_kafka_network_tx_errors | NetworkTxErrors | Shows the number of errors on transmitted network packets |
| aws_kafka_network_tx_packets | NetworkTxPackets | Tracks the number of network packets transmitted |
| aws_kafka_offline_partitions_count | OfflinePartitionsCount | Monitors the number of Kafka partitions that are offline |
| aws_kafka_offset_lag | OffsetLag | Measures the partition-level consumer lag in number of offsets |
| aws_kafka_partition_count | PartitionCount | Displays the number of topic partitions on each broker, including replicas |
| aws_kafka_pps_allowance_exceeded | PpsAllowanceExceeded | Tracks instances where the packets-per-second allowance has been exceeded |
| aws_kafka_produce_local_time_ms_mean | ProduceLocalTimeMsMean | Measures the average time taken to produce messages locally |
| aws_kafka_produce_message_conversions_per_sec | ProduceMessageConversionsPerSec | Monitors the rate of message conversions during production |
| aws_kafka_produce_message_conversions_time_ms_mean | ProduceMessageConversionsTimeMsMean | Tracks the time taken to convert messages during production |
| aws_kafka_produce_request_queue_time_ms_mean | ProduceRequestQueueTimeMsMean | Measures the time produce requests spend in the queue |
| aws_kafka_produce_response_queue_time_ms_mean | ProduceResponseQueueTimeMsMean | Monitors the time produce responses spend in the queue |
| aws_kafka_produce_response_send_time_ms_mean | ProduceResponseSendTimeMsMean | Tracks the time it takes to send produce responses |
| aws_kafka_produce_throttle_byte_rate | ProduceThrottleByteRate | Measures the rate at which production is throttled due to byte rate limits |
| aws_kafka_produce_throttle_queue_size | ProduceThrottleQueueSize | Tracks the size of the production throttle queue |
| aws_kafka_produce_throttle_time | ProduceThrottleTime | Measures the total time Kafka throttles produce requests |
| aws_kafka_produce_total_time_ms_mean | ProduceTotalTimeMsMean | Tracks the total time spent on producing messages |
| aws_kafka_remote_copy_bytes_per_sec | RemoteCopyBytesPerSec | Measures the rate of bytes copied remotely |
| aws_kafka_remote_copy_errors_per_sec | RemoteCopyErrorsPerSec | Tracks the rate of errors during remote copying |
| aws_kafka_remote_copy_lag_bytes | RemoteCopyLagBytes | Monitors the lag in bytes during remote copying |
| aws_kafka_remote_fetch_bytes_per_sec | RemoteFetchBytesPerSec | Tracks the rate of bytes fetched remotely |
| aws_kafka_remote_fetch_errors_per_sec | RemoteFetchErrorsPerSec | Measures the rate of errors during remote fetching |
| aws_kafka_remote_fetch_requests_per_sec | RemoteFetchRequestsPerSec | Tracks the number of remote fetch requests per second |
| aws_kafka_remote_log_manager_tasks_avg_idle_percent | RemoteLogManagerTasksAvgIdlePercent | Monitors the idle percentage of remote log manager tasks |
| aws_kafka_remote_log_reader_avg_idle_percent | RemoteLogReaderAvgIdlePercent | Tracks the idle percentage of remote log reader tasks |
| aws_kafka_remote_log_reader_task_queue_size | RemoteLogReaderTaskQueueSize | Measures the size of the remote log reader task queue |
| aws_kafka_replication_bytes_in_per_sec | ReplicationBytesInPerSec | Tracks the rate of incoming replication bytes |
| aws_kafka_replication_bytes_out_per_sec | ReplicationBytesOutPerSec | Measures the rate of outgoing replication bytes |
| aws_kafka_request_bytes_mean | RequestBytesMean | Tracks the average size of Kafka requests |
| aws_kafka_request_exempt_from_throttle_time | RequestExemptFromThrottleTime | Tracks the time requests are exempt from throttling |
| aws_kafka_request_handler_avg_idle_percent | RequestHandlerAvgIdlePercent | Measures the idle percentage of request handlers |
| aws_kafka_request_throttle_queue_size | RequestThrottleQueueSize | Tracks the size of the request throttle queue |
| aws_kafka_request_throttle_time | RequestThrottleTime | Measures the time requests are throttled in Kafka |
| aws_kafka_request_time | RequestTime | Monitors the overall time spent handling requests in Kafka |
| aws_kafka_root_disk_used | RootDiskUsed | Tracks the amount of disk space used by the root partition |
| aws_kafka_sum_offset_lag | SumOffsetLag | Measures the total offset lag across all partitions |
| aws_kafka_swap_free | SwapFree | Tracks the amount of free swap memory available on Kafka brokers |
| aws_kafka_swap_used | SwapUsed | Measures the amount of swap memory used by Kafka brokers |
| aws_kafka_tcpconnections | TCPConnections | Tracks the total number of TCP connections on the Kafka cluster |
| aws_kafka_tcp_connections | TcpConnections | Monitors the active TCP connections in the Kafka cluster |
| aws_kafka_traffic_bytes | TrafficBytes | Measures the total traffic in bytes on Kafka brokers |
| aws_kafka_traffic_shaping | TrafficShaping | Tracks instances where traffic shaping is applied to Kafka brokers |
| aws_kafka_under_min_isr_partition_count | UnderMinIsrPartitionCount | Tracks the number of partitions below the minimum in-sync replicas |
| aws_kafka_under_replicated_partitions | UnderReplicatedPartitions | Measures the number of under-replicated partitions in the Kafka cluster |
| aws_kafka_volume_queue_length | VolumeQueueLength | Tracks the queue length for disk I/O operations |
| aws_kafka_volume_read_bytes | VolumeReadBytes | Measures the number of bytes read from disk |
| aws_kafka_volume_read_ops | VolumeReadOps | Tracks the number of read operations on the disk |
| aws_kafka_volume_total_read_time | VolumeTotalReadTime | Measures the total time spent on disk read operations |
| aws_kafka_volume_total_write_time | VolumeTotalWriteTime | Measures the total time spent on disk write operations |
| aws_kafka_volume_write_bytes | VolumeWriteBytes | Tracks the number of bytes written to disk |
| aws_kafka_volume_write_ops | VolumeWriteOps | Measures the number of write operations on the disk |
| aws_kafka_zoo_keeper_request_latency_ms_mean | ZooKeeperRequestLatencyMsMean | Measures the average latency of requests to ZooKeeper |
| aws_kafka_zoo_keeper_session_state | ZooKeeperSessionState | Tracks the current session state of ZooKeeper |
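
A few of these metrics have a well-known healthy value in Kafka operations: one active controller per cluster, and no offline, under-replicated, or under-min-ISR partitions. The following is a minimal sketch of such a check. It assumes the values were already queried and aggregated per cluster by some other means. The `KafkaSample` type and `unhealthy_metrics` function are hypothetical names introduced for illustration, and the thresholds are common practice, not something this page prescribes.

```python
from dataclasses import dataclass


@dataclass
class KafkaSample:
    """Cluster-level values for one scrape (hypothetical container)."""
    active_controller_count: float
    offline_partitions_count: float
    under_replicated_partitions: float
    under_min_isr_partition_count: float


def unhealthy_metrics(sample: KafkaSample) -> list[str]:
    """Return the exported metric names whose values look unhealthy."""
    flagged = []
    # Assumption: the controller count is summed across the whole cluster.
    if sample.active_controller_count != 1:
        flagged.append("aws_kafka_active_controller_count")
    if sample.offline_partitions_count > 0:
        flagged.append("aws_kafka_offline_partitions_count")
    if sample.under_replicated_partitions > 0:
        flagged.append("aws_kafka_under_replicated_partitions")
    if sample.under_min_isr_partition_count > 0:
        flagged.append("aws_kafka_under_min_isr_partition_count")
    return flagged


print(unhealthy_metrics(KafkaSample(1, 0, 3, 0)))
```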

AWS/Kinesis

Function: Managed service for real-time data processing and analytics

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_kinesis_info | | |
| aws_kinesis_get_records_bytes | GetRecords.Bytes | Measures the total number of bytes retrieved by the GetRecords call. |
| aws_kinesis_get_records_iterator_age | GetRecords.IteratorAge | Measures the age of the last record retrieved using the iterator. |
| aws_kinesis_get_records_iterator_age_milliseconds | GetRecords.IteratorAgeMilliseconds | Measures the age of the iterator in milliseconds for the GetRecords call. |
| aws_kinesis_get_records_latency | GetRecords.Latency | Tracks the latency of the GetRecords call to retrieve data from a stream. |
| aws_kinesis_get_records_records | GetRecords.Records | Tracks the total number of records retrieved by the GetRecords call. |
| aws_kinesis_get_records_success | GetRecords.Success | Measures the success rate of the GetRecords call. |
| aws_kinesis_incoming_bytes | IncomingBytes | Tracks the number of incoming bytes written to the stream. |
| aws_kinesis_incoming_records | IncomingRecords | Measures the total number of records being written to the stream. |
| aws_kinesis_iterator_age_milliseconds | IteratorAgeMilliseconds | Tracks the age of the iterator used in GetRecords, measured in milliseconds. |
| aws_kinesis_outgoing_bytes | OutgoingBytes | Tracks the total number of outgoing bytes from the stream. |
| aws_kinesis_outgoing_records | OutgoingRecords | Measures the total number of outgoing records from the stream. |
| aws_kinesis_put_record_bytes | PutRecord.Bytes | Measures the total number of bytes in the PutRecord call. |
| aws_kinesis_put_record_latency | PutRecord.Latency | Tracks the latency of PutRecord requests to write data to the stream. |
| aws_kinesis_put_record_success | PutRecord.Success | Measures the success rate of the PutRecord call. |
| aws_kinesis_put_records_bytes | PutRecords.Bytes | Measures the total number of bytes written using the PutRecords call. |
| aws_kinesis_put_records_failed_records | PutRecords.FailedRecords | Tracks the number of failed records in the PutRecords call. |
| aws_kinesis_put_records_latency | PutRecords.Latency | Measures the latency of PutRecords requests to the stream. |
| aws_kinesis_put_records_records | PutRecords.Records | Tracks the total number of records written using the PutRecords call. |
| aws_kinesis_put_records_success | PutRecords.Success | Measures the success rate of the PutRecords call. |
| aws_kinesis_put_records_successful_records | PutRecords.SuccessfulRecords | Measures the total number of successful records in the PutRecords call. |
| aws_kinesis_put_records_throttled_records | PutRecords.ThrottledRecords | Tracks the number of throttled records in the PutRecords call due to exceeding throughput limits. |
| aws_kinesis_put_records_total_records | PutRecords.TotalRecords | Measures the total number of records submitted via PutRecords. |
| aws_kinesis_read_provisioned_throughput_exceeded | ReadProvisionedThroughputExceeded | Tracks the number of times read requests exceeded the provisioned throughput. |
| aws_kinesis_subscribe_to_shard_rate_exceeded | SubscribeToShard.RateExceeded | Tracks the number of times the rate for SubscribeToShard exceeded limits. |
| aws_kinesis_subscribe_to_shard_success | SubscribeToShard.Success | Measures the success rate of SubscribeToShard operations. |
| aws_kinesis_subscribe_to_shard_event_bytes | SubscribeToShardEvent.Bytes | Tracks the number of bytes received in shard events during SubscribeToShard operations. |
| aws_kinesis_subscribe_to_shard_event_millis_behind_latest | SubscribeToShardEvent.MillisBehindLatest | Tracks how far behind the latest event the shard event is during SubscribeToShard operations. |
| aws_kinesis_subscribe_to_shard_event_records | SubscribeToShardEvent.Records | Measures the number of records received in shard events during SubscribeToShard operations. |
| aws_kinesis_subscribe_to_shard_event_success | SubscribeToShardEvent.Success | Tracks the success rate of SubscribeToShard events. |
| aws_kinesis_write_provisioned_throughput_exceeded | WriteProvisionedThroughputExceeded | Measures the number of times write operations exceeded the provisioned throughput limits. |
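
Iterator age is most useful when compared with the stream's retention period, because records older than the retention period are no longer readable. The sketch below expresses an iterator age in milliseconds as a fraction of retention. It assumes the Kinesis Data Streams default retention of 24 hours unless a different value is passed in. The function name `retention_used` is hypothetical.

```python
DEFAULT_RETENTION_HOURS = 24  # default retention period for a Kinesis data stream


def retention_used(iterator_age_ms: float,
                   retention_hours: float = DEFAULT_RETENTION_HOURS) -> float:
    """Return the iterator age as a fraction of the retention period.

    Values close to 1.0 mean the consumer is about to fall behind the
    oldest record the stream still holds.
    """
    retention_ms = retention_hours * 60 * 60 * 1000
    return iterator_age_ms / retention_ms


# A consumer that is 12 hours behind on a stream with default retention.
print(retention_used(12 * 60 * 60 * 1000))
```

The input would typically be the Maximum statistic of `aws_kinesis_get_records_iterator_age_milliseconds`, since the maximum reflects the slowest consumer.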

AWS/KinesisAnalytics

Function: Processes streaming data in real time using SQL

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_kinesisanalytics_bytes | Bytes | Tracks the total amount of data processed by Kinesis Analytics. |
| aws_kinesisanalytics_input_processing_dropped_records | InputProcessing.DroppedRecords | Measures the number of dropped records during input processing. |
| aws_kinesisanalytics_input_processing_duration | InputProcessing.Duration | Tracks the duration of input processing. |
| aws_kinesisanalytics_input_processing_ok_bytes | InputProcessing.OkBytes | Measures the number of bytes successfully processed during input. |
| aws_kinesisanalytics_input_processing_ok_records | InputProcessing.OkRecords | Tracks the number of records successfully processed during input. |
| aws_kinesisanalytics_input_processing_processing_failed_records | InputProcessing.ProcessingFailedRecords | Measures the number of records that failed during input processing. |
| aws_kinesisanalytics_input_processing_success | InputProcessing.Success | Tracks the success rate of input processing operations. |
| aws_kinesisanalytics_kpus | KPUs | Monitors the number of Kinesis Processing Units (KPUs) used. |
| aws_kinesisanalytics_lambda_delivery_delivery_failed_records | LambdaDelivery.DeliveryFailedRecords | Measures the number of failed records delivered to AWS Lambda by Kinesis Analytics. |
| aws_kinesisanalytics_lambda_delivery_duration | LambdaDelivery.Duration | Tracks the duration of record delivery to AWS Lambda. |
| aws_kinesisanalytics_lambda_delivery_ok_records | LambdaDelivery.OkRecords | Measures the number of records successfully delivered to AWS Lambda. |
| aws_kinesisanalytics_millis_behind_latest | MillisBehindLatest | Tracks the time Kinesis Analytics is behind the latest record in milliseconds. |
| aws_kinesisanalytics_records | Records | Measures the total number of records processed by Kinesis Analytics. |
| aws_kinesisanalytics_success | Success | Tracks the success rate of all Kinesis Analytics operations. |
| aws_kinesisanalytics_back_pressured_time_ms_per_second | backPressuredTimeMsPerSecond | Measures the amount of time in milliseconds Kinesis Analytics was back-pressured. |
| aws_kinesisanalytics_busy_time_ms_per_second | busyTimeMsPerSecond | Tracks the time Kinesis Analytics spent in a busy state, processing data. |
| aws_kinesisanalytics_bytes_requested_per_fetch | bytesRequestedPerFetch | Measures the number of bytes requested in each fetch operation. |
| aws_kinesisanalytics_bytes_consumed_rate | bytes_consumed_rate | Tracks the rate at which bytes are consumed from the stream. |
| aws_kinesisanalytics_commits_failed | commitsFailed | Measures the number of failed commit operations. |
| aws_kinesisanalytics_commits_succeeded | commitsSucceeded | Tracks the number of successful commit operations. |
| aws_kinesisanalytics_committedoffsets | committedoffsets | Monitors the committed offsets of records processed. |
| aws_kinesisanalytics_container_cpuutilization | containerCPUUtilization | Tracks the CPU utilization of the Kinesis Analytics container. |
| aws_kinesisanalytics_container_disk_utilization | containerDiskUtilization | Monitors the disk utilization of the Kinesis Analytics container. |
| aws_kinesisanalytics_container_memory_utilization | containerMemoryUtilization | Measures the memory utilization of the Kinesis Analytics container. |
| aws_kinesisanalytics_cpu_utilization | cpuUtilization | Tracks the overall CPU utilization of Kinesis Analytics. |
| aws_kinesisanalytics_current_input_watermark | currentInputWatermark | Monitors the current watermark for input data. |
| aws_kinesisanalytics_current_output_watermark | currentOutputWatermark | Tracks the current watermark for output data. |
| aws_kinesisanalytics_currentoffsets | currentoffsets | Measures the current offsets for processed records. |
| aws_kinesisanalytics_downtime | downtime | Tracks the total downtime of the Kinesis Analytics application. |
| aws_kinesisanalytics_full_restarts | fullRestarts | Measures the number of full restarts of the Kinesis Analytics application. |
| aws_kinesisanalytics_heap_memory_utilization | heapMemoryUtilization | Monitors the heap memory utilization. |
| aws_kinesisanalytics_idle_time_ms_per_second | idleTimeMsPerSecond | Tracks the idle time of Kinesis Analytics in milliseconds per second. |
| aws_kinesisanalytics_last_checkpoint_duration | lastCheckpointDuration | Measures the duration of the last checkpoint process. |
| aws_kinesisanalytics_last_checkpoint_size | lastCheckpointSize | Monitors the size of the last checkpoint. |
| aws_kinesisanalytics_managed_memory_total | managedMemoryTotal | Tracks the total managed memory available. |
| aws_kinesisanalytics_managed_memory_used | managedMemoryUsed | Measures the amount of managed memory currently in use. |
| aws_kinesisanalytics_managed_memory_utilization | managedMemoryUtilization | Tracks the utilization of managed memory. |
| aws_kinesisanalytics_num_late_records_dropped | numLateRecordsDropped | Measures the number of late records dropped by Kinesis Analytics. |
| aws_kinesisanalytics_num_records_in | numRecordsIn | Tracks the number of records ingested by Kinesis Analytics. |
| aws_kinesisanalytics_num_records_in_per_second | numRecordsInPerSecond | Monitors the rate of incoming records per second. |
| aws_kinesisanalytics_num_records_out | numRecordsOut | Measures the number of records output by Kinesis Analytics. |
| aws_kinesisanalytics_num_records_out_per_second | numRecordsOutPerSecond | Tracks the rate of outgoing records per second. |
| aws_kinesisanalytics_number_of_failed_checkpoints | numberOfFailedCheckpoints | Measures the number of failed checkpoints in Kinesis Analytics. |
| aws_kinesisanalytics_old_generation_gccount | oldGenerationGCCount | Tracks the count of garbage collection events in the old generation heap space. |
| aws_kinesisanalytics_old_generation_gctime | oldGenerationGCTime | Measures the time spent in garbage collection for the old generation heap. |
| aws_kinesisanalytics_records_lag_max | records_lag_max | Tracks the maximum lag of records being processed by Kinesis Analytics. |
| aws_kinesisanalytics_thread_count | threadCount | Monitors the number of active threads in the Kinesis Analytics application. |
| aws_kinesisanalytics_uptime | uptime | Measures the uptime of the Kinesis Analytics application. |
| aws_kinesisanalytics_zeppelin_cpu_utilization | zeppelinCpuUtilization | Tracks the CPU utilization of the Zeppelin server used by Kinesis Analytics. |
| aws_kinesisanalytics_zeppelin_heap_memory_utilization | zeppelinHeapMemoryUtilization | Monitors the heap memory utilization of the Zeppelin server. |
| aws_kinesisanalytics_zeppelin_server_uptime | zeppelinServerUptime | Tracks the uptime of the Zeppelin server. |
| aws_kinesisanalytics_zeppelin_thread_count | zeppelinThreadCount | Monitors the number of active threads in the Zeppelin server. |
| aws_kinesisanalytics_zeppelin_waiting_jobs | zeppelinWaitingJobs | Measures the number of jobs waiting to be processed by the Zeppelin server. |

AWS/Lambda

Function: Serverless compute service that runs code in response to events

Scrape interval: 5 minutes

Includes: Out-of-the-box dashboard

Metric | CloudWatch metric | Purpose
aws_lambda_info
aws_lambda_invocations | Invocations | Tracks the number of times your AWS Lambda function is invoked.
aws_lambda_errors | Errors | Monitors the number of invocations that result in a function error.
aws_lambda_throttles | Throttles | Measures the number of invocation requests that are throttled because the concurrency limit is exceeded.
aws_lambda_duration | Duration | Tracks the amount of time a Lambda function takes to execute.
aws_lambda_async_event_age | AsyncEventAge | Measures the age of an asynchronous event when Lambda begins executing the associated function.
aws_lambda_async_events_dropped | AsyncEventsDropped | Monitors the number of asynchronous events dropped without successfully executing the function.
aws_lambda_async_events_received | AsyncEventsReceived | Tracks the number of asynchronous events received by the Lambda function.
aws_lambda_claimed_account_concurrency | ClaimedAccountConcurrency | Monitors the amount of account concurrency that is unavailable for on-demand invocations, including unreserved executions and allocated reserved and provisioned concurrency.
aws_lambda_concurrent_executions | ConcurrentExecutions | Tracks the number of concurrent executions across all Lambda functions in your account.
aws_lambda_dead_letter_errors | DeadLetterErrors | Measures the number of times Lambda failed to send an event to the dead-letter queue.
aws_lambda_destination_delivery_failures | DestinationDeliveryFailures | Tracks the number of failures when delivering function results to a destination service.
aws_lambda_iterator_age | IteratorAge | Measures the age of the last record in the event source before Lambda starts processing.
aws_lambda_offset_lag | OffsetLag | Tracks the offset lag for self-managed Apache Kafka and Amazon MSK event sources: the gap between the last record written to a topic and the last record the function's consumer group processed.
aws_lambda_oversized_record_count | OversizedRecordCount | Measures the number of records that exceeded the maximum size supported by Lambda.
aws_lambda_post_runtime_extensions_duration | PostRuntimeExtensionsDuration | Tracks the time taken by post-runtime extensions after Lambda function execution.
aws_lambda_provisioned_concurrency_invocations | ProvisionedConcurrencyInvocations | Measures the number of invocations served by functions with provisioned concurrency.
aws_lambda_provisioned_concurrency_spillover_invocations | ProvisionedConcurrencySpilloverInvocations | Tracks the number of invocations that were served by standard concurrency when provisioned concurrency was exhausted.
aws_lambda_provisioned_concurrency_utilization | ProvisionedConcurrencyUtilization | Measures the percentage of provisioned concurrency that is being used by your Lambda function.
aws_lambda_provisioned_concurrent_executions | ProvisionedConcurrentExecutions | Tracks the number of concurrent executions using provisioned concurrency.
aws_lambda_recursive_invocations_dropped | RecursiveInvocationsDropped | Measures the number of recursive invocations that were dropped.
aws_lambda_unreserved_concurrent_executions | UnreservedConcurrentExecutions | Tracks the number of concurrent executions by functions that have no reserved concurrency.
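
The counters in this table are often combined into rates. The following Python sketch shows one way to derive an error rate and a throttle rate from Invocations, Errors, and Throttles. The function name and the input values are hypothetical, and the inputs are assumed to be Sum statistics taken over the same time window. It relies on the CloudWatch behavior that throttled requests are not recorded as invocations.

```python
def lambda_rates(invocations: float, errors: float, throttles: float) -> dict:
    """Derive error and throttle rates from summed Lambda counters.

    Hypothetical helper: all inputs are assumed to be Sum statistics
    for the same function and the same time window.
    """
    # Throttled requests are not counted as invocations, so the total
    # number of requests is invocations plus throttles.
    requests = invocations + throttles
    return {
        # Error rate: Errors divided by Invocations.
        "error_rate": errors / invocations if invocations else 0.0,
        # Throttle rate: share of all requests that were throttled.
        "throttle_rate": throttles / requests if requests else 0.0,
    }


# Example values are made up for illustration.
rates = lambda_rates(invocations=950, errors=19, throttles=50)
print(rates)
```

Returning 0.0 for an empty window keeps the helper safe to call on idle functions; a caller that needs to distinguish "no traffic" from "no errors" might prefer to return None instead.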

AWS/Logs

Function: Centralized logging service for monitoring and troubleshooting applications

Scrape interval: 5 minutes

Metric | CloudWatch metric | Purpose
aws_logs_info
aws_logs_delivery_errors | DeliveryErrors | Tracks the number of log events for which CloudWatch Logs received an error when forwarding data to the subscription destination.
aws_logs_delivery_throttling | DeliveryThrottling | Measures the number of log events for which CloudWatch Logs was throttled when forwarding data to the subscription destination.
aws_logs_forwarded_bytes | ForwardedBytes | Monitors the total volume of log data in bytes that was forwarded to the subscription destination.
aws_logs_forwarded_log_events | ForwardedLogEvents | Tracks the number of log events forwarded to the subscription destination.
aws_logs_incoming_bytes | IncomingBytes | Measures the total volume of incoming log data in bytes received by CloudWatch Logs.
aws_logs_incoming_log_events | IncomingLogEvents | Tracks the number of log events received by CloudWatch Logs.

AWS/MWAA

Function: Managed service for Apache Airflow to manage workflows and orchestration

Scrape interval: 5 minutes

Metric | CloudWatch metric | Purpose
aws_mwaa_active_connection_count | ActiveConnectionCount | Tracks the number of active connections to the Managed Workflows for Apache Airflow (MWAA) environment.
aws_mwaa_approximate_age_of_oldest_task | ApproximateAgeOfOldestTask | Measures the age of the oldest running task in the MWAA environment.
aws_mwaa_cpuutilization | CPUUtilization | Monitors the percentage of CPU utilization in the MWAA environment.
aws_mwaa_database_connections | DatabaseConnections | Tracks the number of connections to the database used by MWAA.
aws_mwaa_disk_queue_depth | DiskQueueDepth | Measures the depth of the disk queue, indicating the number of IO operations waiting to be processed.
aws_mwaa_freeable_memory | FreeableMemory | Monitors the amount of free memory available in the MWAA environment.
aws_mwaa_memory_utilization | MemoryUtilization | Tracks the percentage of memory utilized in the MWAA environment.
aws_mwaa_queued_tasks | QueuedTasks | Measures the number of tasks waiting to be executed in the MWAA environment.
aws_mwaa_running_tasks | RunningTasks | Tracks the number of tasks currently running in the MWAA environment.
aws_mwaa_volume_write_iops | VolumeWriteIOPS | Monitors the input/output operations per second (IOPS) for write operations on the volume.
aws_mwaa_write_iops | WriteIOPS | Tracks the number of write operations per second in the MWAA environment.
aws_mwaa_write_latency | WriteLatency | Measures the latency of write operations in the MWAA environment.
aws_mwaa_write_throughput | WriteThroughput | Monitors the amount of data written per second in the MWAA environment.

AWS/MediaConnect

Function: Secure and reliable transport of live video streams

Scrape interval: 5 minutes

Metric | CloudWatch metric | Purpose
aws_mediaconnect_info
aws_mediaconnect_arqrecovered | ARQRecovered | Monitors the number of Automatic Repeat reQuest (ARQ) packets successfully recovered in the MediaConnect flow.
aws_mediaconnect_arqrequests | ARQRequests | Tracks the number of ARQ requests made by MediaConnect flows.
aws_mediaconnect_bit_rate | BitRate | Measures the bitrate of the MediaConnect stream.
aws_mediaconnect_caterror | CATError | Detects Conditional Access Table (CAT) errors in the MediaConnect stream.
aws_mediaconnect_crcerror | CRCError | Tracks the number of cyclic redundancy check (CRC) errors in the stream.
aws_mediaconnect_connected | Connected | Monitors the connection status of the MediaConnect flow.
aws_mediaconnect_connected_outputs | ConnectedOutputs | Tracks the number of outputs connected to the MediaConnect flow.
aws_mediaconnect_connection_attempts | ConnectionAttempts | Measures the number of attempts made to establish a connection for the flow.
aws_mediaconnect_consecutive_drops | ConsecutiveDrops | Monitors the number of consecutive dropped packets in the MediaConnect flow.
aws_mediaconnect_consecutive_not_recovered | ConsecutiveNotRecovered | Tracks the number of consecutive packets that were not successfully recovered.
aws_mediaconnect_continuity_counter | ContinuityCounter | Monitors the continuity counter of the stream to detect missing packets.
aws_mediaconnect_disconnections | Disconnections | Tracks the number of times the MediaConnect flow was disconnected.
aws_mediaconnect_dropped_packets | DroppedPackets | Monitors the number of packets dropped in the MediaConnect flow.
aws_mediaconnect_egress_bridge_bit_rate | EgressBridgeBitRate | Tracks the bitrate for egress bridge flows.
aws_mediaconnect_egress_bridge_caterror | EgressBridgeCATError | Detects CAT errors in egress bridge flows.
aws_mediaconnect_egress_bridge_crcerror | EgressBridgeCRCError | Monitors the CRC errors in egress bridge flows.
aws_mediaconnect_egress_bridge_continuity_counter | EgressBridgeContinuityCounter | Measures the continuity of the egress bridge stream to detect missing packets.
aws_mediaconnect_egress_bridge_dropped_packets | EgressBridgeDroppedPackets | Tracks the number of packets dropped in the egress bridge flows.
aws_mediaconnect_egress_bridge_failover_switches | EgressBridgeFailoverSwitches | Monitors failover switches in the egress bridge flows.
aws_mediaconnect_egress_bridge_merge_active | EgressBridgeMergeActive | Indicates if an egress bridge merge is active.
aws_mediaconnect_egress_bridge_not_recovered_packets | EgressBridgeNotRecoveredPackets | Tracks the number of packets that were not recovered in the egress bridge.
aws_mediaconnect_egress_bridge_paterror | EgressBridgePATError | Detects Program Association Table (PAT) errors in the egress bridge.
aws_mediaconnect_egress_bridge_pcraccuracy_error | EgressBridgePCRAccuracyError | Monitors errors related to the accuracy of Program Clock Reference (PCR) in the egress bridge.
aws_mediaconnect_egress_bridge_pcrerror | EgressBridgePCRError | Tracks PCR errors in the egress bridge.
aws_mediaconnect_egress_bridge_piderror | EgressBridgePIDError | Monitors Packet Identifier (PID) errors in the egress bridge stream.
aws_mediaconnect_egress_bridge_pmterror | EgressBridgePMTError | Detects errors in the Program Map Table (PMT) in the egress bridge.
aws_mediaconnect_egress_bridge_ptserror | EgressBridgePTSError | Tracks Presentation Time Stamp (PTS) errors in the egress bridge stream.
aws_mediaconnect_egress_bridge_packet_loss_percent | EgressBridgePacketLossPercent | Measures the percentage of packet loss in the egress bridge.
aws_mediaconnect_egress_bridge_recovered_packets | EgressBridgeRecoveredPackets | Tracks the number of recovered packets in the egress bridge stream.
aws_mediaconnect_egress_bridge_source_bit_rate | EgressBridgeSourceBitRate | Monitors the bitrate of the source in the egress bridge.
aws_mediaconnect_egress_bridge_source_caterror | EgressBridgeSourceCATError | Detects CAT errors in the source of the egress bridge.
aws_mediaconnect_egress_bridge_source_crcerror | EgressBridgeSourceCRCError | Tracks CRC errors in the source of the egress bridge.
aws_mediaconnect_egress_bridge_source_continuity_counter | EgressBridgeSourceContinuityCounter | Measures the continuity of the source stream in the egress bridge to detect missing packets.
aws_mediaconnect_egress_bridge_source_dropped_packets | EgressBridgeSourceDroppedPackets | Monitors the number of dropped packets in the source stream of the egress bridge.
aws_mediaconnect_egress_bridge_source_merge_active | EgressBridgeSourceMergeActive | Indicates if the source merge is active in the egress bridge.
aws_mediaconnect_egress_bridge_source_merge_latency | EgressBridgeSourceMergeLatency | Measures latency during source merge in the egress bridge.
aws_mediaconnect_egress_bridge_source_not_recovered_packets | EgressBridgeSourceNotRecoveredPackets | Tracks the number of packets not recovered in the source of the egress bridge.
aws_mediaconnect_egress_bridge_source_paterror | EgressBridgeSourcePATError | Detects PAT errors in the source of the egress bridge.
aws_mediaconnect_egress_bridge_source_pcraccuracy_error | EgressBridgeSourcePCRAccuracyError | Monitors errors in the accuracy of the PCR in the source of the egress bridge.
aws_mediaconnect_egress_bridge_source_pcrerror | EgressBridgeSourcePCRError | Tracks PCR errors in the source stream of the egress bridge.
aws_mediaconnect_egress_bridge_source_piderror | EgressBridgeSourcePIDError
aws_mediaconnect_egress_bridge_source_pmterror | EgressBridgeSourcePMTError
aws_mediaconnect_egress_bridge_source_ptserror | EgressBridgeSourcePTSError
aws_mediaconnect_egress_bridge_source_packet_loss_percent | EgressBridgeSourcePacketLossPercent
aws_mediaconnect_egress_bridge_source_recovered_packets | EgressBridgeSourceRecoveredPackets
aws_mediaconnect_egress_bridge_source_tsbyte_error | EgressBridgeSourceTSByteError
aws_mediaconnect_egress_bridge_source_tssync_loss | EgressBridgeSourceTSSyncLoss
aws_mediaconnect_egress_bridge_source_total_packets | EgressBridgeSourceTotalPackets
aws_mediaconnect_egress_bridge_source_transport_error | EgressBridgeSourceTransportError
aws_mediaconnect_egress_bridge_tsbyte_error | EgressBridgeTSByteError
aws_mediaconnect_egress_bridge_tssync_loss | EgressBridgeTSSyncLoss
aws_mediaconnect_egress_bridge_total_packets | EgressBridgeTotalPackets
aws_mediaconnect_egress_bridge_transport_error | EgressBridgeTransportError
aws_mediaconnect_failover_switches | FailoverSwitches
aws_mediaconnect_ingress_bridge_bit_rate | IngressBridgeBitRate
aws_mediaconnect_ingress_bridge_caterror | IngressBridgeCATError
aws_mediaconnect_ingress_bridge_crcerror | IngressBridgeCRCError
aws_mediaconnect_ingress_bridge_continuity_counter | IngressBridgeContinuityCounter
aws_mediaconnect_ingress_bridge_dropped_packets | IngressBridgeDroppedPackets
aws_mediaconnect_ingress_bridge_failover_switches | IngressBridgeFailoverSwitches
aws_mediaconnect_ingress_bridge_merge_active | IngressBridgeMergeActive
aws_mediaconnect_ingress_bridge_not_recovered_packets | IngressBridgeNotRecoveredPackets
aws_mediaconnect_ingress_bridge_paterror | IngressBridgePATError
aws_mediaconnect_ingress_bridge_pcraccuracy_error | IngressBridgePCRAccuracyError
aws_mediaconnect_ingress_bridge_pcrerror | IngressBridgePCRError
aws_mediaconnect_ingress_bridge_piderror | IngressBridgePIDError
aws_mediaconnect_ingress_bridge_pmterror | IngressBridgePMTError
aws_mediaconnect_ingress_bridge_ptserror | IngressBridgePTSError
aws_mediaconnect_ingress_bridge_packet_loss_percent | IngressBridgePacketLossPercent
aws_mediaconnect_ingress_bridge_recovered_packets | IngressBridgeRecoveredPackets
aws_mediaconnect_ingress_bridge_source_arqrecovered | IngressBridgeSourceARQRecovered
aws_mediaconnect_ingress_bridge_source_arqrequests | IngressBridgeSourceARQRequests
aws_mediaconnect_ingress_bridge_source_bit_rate | IngressBridgeSourceBitRate
aws_mediaconnect_ingress_bridge_source_caterror | IngressBridgeSourceCATError
aws_mediaconnect_ingress_bridge_source_crcerror | IngressBridgeSourceCRCError
aws_mediaconnect_ingress_bridge_source_continuity_counter | IngressBridgeSourceContinuityCounter
aws_mediaconnect_ingress_bridge_source_dropped_packets | IngressBridgeSourceDroppedPackets
aws_mediaconnect_ingress_bridge_source_fecpackets | IngressBridgeSourceFECPackets
aws_mediaconnect_ingress_bridge_source_fecrecovered | IngressBridgeSourceFECRecovered
aws_mediaconnect_ingress_bridge_source_merge_active | IngressBridgeSourceMergeActive
aws_mediaconnect_ingress_bridge_source_merge_latency | IngressBridgeSourceMergeLatency
aws_mediaconnect_ingress_bridge_source_not_recovered_packets | IngressBridgeSourceNotRecoveredPackets
aws_mediaconnect_ingress_bridge_source_overflow_packets | IngressBridgeSourceOverflowPackets
aws_mediaconnect_ingress_bridge_source_paterror | IngressBridgeSourcePATError
aws_mediaconnect_ingress_bridge_source_pcraccuracy_error | IngressBridgeSourcePCRAccuracyError
aws_mediaconnect_ingress_bridge_source_pcrerror | IngressBridgeSourcePCRError
aws_mediaconnect_ingress_bridge_source_piderror | IngressBridgeSourcePIDError
aws_mediaconnect_ingress_bridge_source_pmterror | IngressBridgeSourcePMTError
aws_mediaconnect_ingress_bridge_source_ptserror | IngressBridgeSourcePTSError
aws_mediaconnect_ingress_bridge_source_packet_loss_percent | IngressBridgeSourcePacketLossPercent
aws_mediaconnect_ingress_bridge_source_recovered_packets | IngressBridgeSourceRecoveredPackets
aws_mediaconnect_ingress_bridge_source_round_trip_time | IngressBridgeSourceRoundTripTime
aws_mediaconnect_ingress_bridge_source_tsbyte_error | IngressBridgeSourceTSByteError
aws_mediaconnect_ingress_bridge_source_tssync_loss | IngressBridgeSourceTSSyncLoss
aws_mediaconnect_ingress_bridge_source_total_packets | IngressBridgeSourceTotalPackets
aws_mediaconnect_ingress_bridge_source_transport_error | IngressBridgeSourceTransportError
aws_mediaconnect_ingress_bridge_tsbyte_error | IngressBridgeTSByteError
aws_mediaconnect_ingress_bridge_tssync_loss | IngressBridgeTSSyncLoss
aws_mediaconnect_ingress_bridge_total_packets | IngressBridgeTotalPackets
aws_mediaconnect_ingress_bridge_transport_error | IngressBridgeTransportError
aws_mediaconnect_jitter | Jitter
aws_mediaconnect_latency | Latency
aws_mediaconnect_maintenance_canceled | MaintenanceCanceled
aws_mediaconnect_maintenance_failed | MaintenanceFailed
aws_mediaconnect_maintenance_rescheduled | MaintenanceRescheduled
aws_mediaconnect_maintenance_scheduled | MaintenanceScheduled
aws_mediaconnect_maintenance_started | MaintenanceStarted
aws_mediaconnect_maintenance_succeeded | MaintenanceSucceeded
aws_mediaconnect_merge_active | MergeActive
aws_mediaconnect_merge_latency | MergeLatency
aws_mediaconnect_not_recovered_packets | NotRecoveredPackets
aws_mediaconnect_output_connected | OutputConnected
aws_mediaconnect_output_disconnections | OutputDisconnections
aws_mediaconnect_output_dropped_payloads | OutputDroppedPayloads
aws_mediaconnect_output_late_payloads | OutputLatePayloads
aws_mediaconnect_output_total_bytes | OutputTotalBytes
aws_mediaconnect_output_total_payloads | OutputTotalPayloads
aws_mediaconnect_overflow_packets | OverflowPackets
aws_mediaconnect_paterror | PATError
aws_mediaconnect_pcraccuracy_error | PCRAccuracyError
aws_mediaconnect_pcrerror | PCRError
aws_mediaconnect_piderror | PIDError
aws_mediaconnect_pmterror | PMTError
aws_mediaconnect_ptserror | PTSError
aws_mediaconnect_packet_loss_percent | PacketLossPercent
aws_mediaconnect_recovered_packets | RecoveredPackets
aws_mediaconnect_round_trip_time | RoundTripTime
aws_mediaconnect_source_arqrecovered | SourceARQRecovered
aws_mediaconnect_source_arqrequests | SourceARQRequests
aws_mediaconnect_source_bit_rate | SourceBitRate
aws_mediaconnect_source_caterror | SourceCATError
aws_mediaconnect_source_crcerror | SourceCRCError
aws_mediaconnect_source_connected | SourceConnected
aws_mediaconnect_source_continuity_counter | SourceContinuityCounter
aws_mediaconnect_source_disconnections | SourceDisconnections
aws_mediaconnect_source_dropped_packets | SourceDroppedPackets
aws_mediaconnect_source_dropped_payloads | SourceDroppedPayloads
aws_mediaconnect_source_fecpackets | SourceFECPackets
aws_mediaconnect_source_fecrecovered | SourceFECRecovered
aws_mediaconnect_source_late_payloads | SourceLatePayloads
aws_mediaconnect_source_merge_active | SourceMergeActive
aws_mediaconnect_source_merge_latency | SourceMergeLatency
aws_mediaconnect_source_merge_status_warn_mismatch | SourceMergeStatusWarnMismatch
aws_mediaconnect_source_merge_status_warn_solo | SourceMergeStatusWarnSolo
aws_mediaconnect_source_missing_packets | SourceMissingPackets
aws_mediaconnect_source_not_recovered_packets | SourceNotRecoveredPackets
aws_mediaconnect_source_overflow_packets | SourceOverflowPackets
aws_mediaconnect_source_paterror | SourcePATError
aws_mediaconnect_source_pcraccuracy_error | SourcePCRAccuracyError
aws_mediaconnect_source_pcrerror | SourcePCRError
aws_mediaconnect_source_piderror | SourcePIDError
aws_mediaconnect_source_pmterror | SourcePMTError
aws_mediaconnect_source_ptserror | SourcePTSError
aws_mediaconnect_source_packet_loss_percent | SourcePacketLossPercent
aws_mediaconnect_source_recovered_packets | SourceRecoveredPackets
aws_mediaconnect_source_round_trip_time | SourceRoundTripTime
aws_mediaconnect_source_selected | SourceSelected
aws_mediaconnect_source_tsbyte_error | SourceTSByteError
aws_mediaconnect_source_tssync_loss | SourceTSSyncLoss
aws_mediaconnect_source_total_bytes | SourceTotalBytes
aws_mediaconnect_source_total_packets | SourceTotalPackets
aws_mediaconnect_source_total_payloads | SourceTotalPayloads
aws_mediaconnect_source_transport_error | SourceTransportError
aws_mediaconnect_tsbyte_error | TSByteError
aws_mediaconnect_tssync_loss | TSSyncLoss
aws_mediaconnect_total_packets | TotalPackets
aws_mediaconnect_transport_error | TransportError
aws_mediaconnect_uptime | Uptime

AWS/MediaTailor

Function: Personalizes advertisement insertion in video streams for a seamless experience

Scrape interval: 5 minutes

Metric | CloudWatch metric | Purpose
aws_mediatailor_info
aws_mediatailor_ad_decision_server_ads | AdDecisionServer.Ads | Tracks the number of ads provided by the Ad Decision Server (ADS).
aws_mediatailor_ad_decision_server_duration | AdDecisionServer.Duration | Measures the duration of requests made to the Ad Decision Server.
aws_mediatailor_ad_decision_server_errors | AdDecisionServer.Errors | Monitors the number of errors returned by the Ad Decision Server.
aws_mediatailor_ad_decision_server_fill_rate | AdDecisionServer.FillRate | Tracks the rate at which ad slots are successfully filled by the Ad Decision Server.
aws_mediatailor_ad_decision_server_timeouts | AdDecisionServer.Timeouts | Tracks the number of timeouts during requests to the Ad Decision Server.
aws_mediatailor_ad_not_ready | AdNotReady | Indicates the number of instances where ads were not ready to be served.
aws_mediatailor_avails_duration | Avails.Duration | Measures the duration of available ad opportunities (avails).
aws_mediatailor_avails_fill_rate | Avails.FillRate | Tracks the rate at which avails are filled with ads.
aws_mediatailor_avails_filled_duration | Avails.FilledDuration | Measures the total filled duration of ad avails.
aws_mediatailor_get_manifest_errors | GetManifest.Errors | Monitors the number of errors encountered while retrieving the manifest.
aws_mediatailor_origin_errors | Origin.Errors | Tracks the number of errors originating from the content origin server.
aws_mediatailor_origin_timeouts | Origin.Timeouts | Monitors the number of timeouts from requests to the content origin server.
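
The metric names in these tables appear to follow a regular pattern: the namespace is lowercased with the slash turned into an underscore, and the CloudWatch metric name is converted to snake case, with dots (as in the MediaTailor names above) also becoming underscores. The Python sketch below reproduces that pattern as inferred from the listed names. The function name is hypothetical, and this is not the exporter's actual implementation; names not shown in the tables may not follow the same rule.

```python
import re


def exported_metric_name(namespace: str, cloudwatch_metric: str) -> str:
    """Guess the listed metric name for a CloudWatch namespace and metric.

    Hypothetical helper inferred from the tables on this page.
    """
    # Namespace: lowercase, "/" becomes "_" (AWS/MediaTailor -> aws_mediatailor).
    prefix = namespace.lower().replace("/", "_")
    # Insert "_" wherever a lowercase letter or digit is followed by an
    # uppercase letter; runs of capitals such as "CPU" or "TLS" stay joined.
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", cloudwatch_metric)
    # Any remaining non-alphanumeric run (for example ".") becomes one "_".
    name = re.sub(r"[^A-Za-z0-9]+", "_", name).lower()
    return f"{prefix}_{name}"


print(exported_metric_name("AWS/MediaTailor", "AdDecisionServer.FillRate"))
print(exported_metric_name("AWS/NetworkELB", "ClientTLSNegotiationErrorCount"))
```

Because consecutive capitals are never split, acronyms fuse with the word that follows them, which matches listed names such as aws_mwaa_cpuutilization.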

AWS/NATGateway

Function: Manages network address translation to securely connect instances to the internet

Scrape interval: 5 minutes

Includes: Out-of-the-box dashboard

Metric | CloudWatch metric | Purpose
aws_natgateway_info
aws_natgateway_active_connection_count | ActiveConnectionCount | Tracks the number of active connections to the NAT Gateway.
aws_natgateway_bytes_in_from_destination | BytesInFromDestination | Measures the amount of data received by the NAT Gateway from the destination (in bytes).
aws_natgateway_bytes_in_from_source | BytesInFromSource | Measures the amount of data received by the NAT Gateway from the source (in bytes).
aws_natgateway_bytes_out_to_destination | BytesOutToDestination | Tracks the data sent from the NAT Gateway to the destination (in bytes).
aws_natgateway_bytes_out_to_source | BytesOutToSource | Measures the data sent from the NAT Gateway to the source (in bytes).
aws_natgateway_connection_attempt_count | ConnectionAttemptCount | Counts the number of attempts to establish a connection via the NAT Gateway.
aws_natgateway_connection_established_count | ConnectionEstablishedCount | Measures the successful establishment of connections through the NAT Gateway.
aws_natgateway_error_port_allocation | ErrorPortAllocation | Tracks errors related to port allocation failures in the NAT Gateway.
aws_natgateway_idle_timeout_count | IdleTimeoutCount | Counts the number of times connections are closed due to idle timeouts on the NAT Gateway.
aws_natgateway_packets_drop_count | PacketsDropCount | Measures the number of packets dropped by the NAT Gateway.
aws_natgateway_packets_in_from_destination | PacketsInFromDestination | Tracks the number of packets received by the NAT Gateway from the destination.
aws_natgateway_packets_in_from_source | PacketsInFromSource | Measures the number of packets received by the NAT Gateway from the source.
aws_natgateway_packets_out_to_destination | PacketsOutToDestination | Tracks the number of packets sent from the NAT Gateway to the destination.
aws_natgateway_packets_out_to_source | PacketsOutToSource | Measures the number of packets sent from the NAT Gateway to the source.

AWS/Neptune

Function: Managed graph database service for building and running graph applications

Scrape interval: 5 minutes

Metric | CloudWatch metric | Purpose
aws_neptune_info
aws_neptune_cpuutilization | CPUUtilization | Monitors the percentage of CPU resources used by the Neptune database instance.
aws_neptune_cluster_replica_lag | ClusterReplicaLag | Measures the replication lag between the Neptune writer and reader nodes in milliseconds.
aws_neptune_cluster_replica_lag_maximum | ClusterReplicaLagMaximum | Tracks the maximum replica lag during the monitored period.
aws_neptune_cluster_replica_lag_minimum | ClusterReplicaLagMinimum | Tracks the minimum replica lag during the monitored period.
aws_neptune_engine_uptime | EngineUptime | Monitors the total uptime of the Neptune engine instance.
aws_neptune_free_local_storage | FreeLocalStorage | Monitors the amount of local storage available on the Neptune instance.
aws_neptune_freeable_memory | FreeableMemory | Tracks the amount of available memory on the Neptune instance.
aws_neptune_gremlin_errors | GremlinErrors | Counts the errors encountered in Gremlin queries.
aws_neptune_gremlin_http1xx | GremlinHttp1xx | Tracks HTTP 1xx responses for Gremlin queries.
aws_neptune_gremlin_http2xx | GremlinHttp2xx | Tracks HTTP 2xx (successful) responses for Gremlin queries.
aws_neptune_gremlin_http4xx | GremlinHttp4xx | Monitors HTTP 4xx (client error) responses for Gremlin queries.
aws_neptune_gremlin_http5xx | GremlinHttp5xx | Tracks HTTP 5xx (server error) responses for Gremlin queries.
aws_neptune_gremlin_requests | GremlinRequests | Monitors the total number of Gremlin requests made.
aws_neptune_gremlin_requests_per_sec | GremlinRequestsPerSec | Measures the rate of Gremlin requests per second.
aws_neptune_gremlin_web_socket_available_connections | GremlinWebSocketAvailableConnections | Tracks available WebSocket connections for Gremlin.
aws_neptune_gremlin_web_socket_client_errors | GremlinWebSocketClientErrors | Monitors WebSocket client errors for Gremlin.
aws_neptune_gremlin_web_socket_server_errors | GremlinWebSocketServerErrors | Monitors WebSocket server errors for Gremlin.
aws_neptune_gremlin_web_socket_success | GremlinWebSocketSuccess | Counts successful WebSocket connections for Gremlin.
aws_neptune_http100 | Http100 | Monitors HTTP 100 responses from the Neptune instance.
aws_neptune_http101 | Http101 | Tracks HTTP 101 responses (Switching Protocols).
aws_neptune_http1xx | Http1xx | Tracks all HTTP 1xx responses for requests made to the Neptune instance.
aws_neptune_http200 | Http200 | Tracks HTTP 200 (OK) responses.
aws_neptune_http2xx | Http2xx | Monitors all HTTP 2xx responses (successful requests).
aws_neptune_http400 | Http400 | Tracks HTTP 400 (bad request) responses.
aws_neptune_http403 | Http403 | Monitors HTTP 403 (forbidden) responses.
aws_neptune_http405 | Http405 | Tracks HTTP 405 (method not allowed) responses.
aws_neptune_http413 | Http413 | Tracks HTTP 413 (request entity too large) responses.
aws_neptune_http429 | Http429 | Monitors HTTP 429 (too many requests) responses.
aws_neptune_http4xx | Http4xx | Tracks all HTTP 4xx (client error) responses.
aws_neptune_http500 | Http500 | Monitors HTTP 500 (internal server error) responses.
aws_neptune_http501 | Http501 | Tracks HTTP 501 (not implemented) responses.
aws_neptune_http5xx | Http5xx | Monitors all HTTP 5xx (server error) responses.
aws_neptune_loader_errors | LoaderErrors | Counts errors encountered during bulk loader operations.
aws_neptune_loader_requests | LoaderRequests | Tracks requests made to the bulk loader.
aws_neptune_network_receive_throughput | NetworkReceiveThroughput | Monitors the network throughput for data received by the Neptune instance.
aws_neptune_network_throughput | NetworkThroughput | Measures the total network throughput (incoming and outgoing) of the Neptune instance.
aws_neptune_network_transmit_throughput | NetworkTransmitThroughput | Tracks the network throughput for data transmitted by the Neptune instance.
aws_neptune_sparql_errors | SparqlErrors | Monitors errors encountered in SPARQL queries.
aws_neptune_sparql_http1xx | SparqlHttp1xx | Tracks HTTP 1xx responses for SPARQL queries.
aws_neptune_sparql_http2xx | SparqlHttp2xx | Tracks HTTP 2xx responses for SPARQL queries.
aws_neptune_sparql_http4xx | SparqlHttp4xx | Monitors HTTP 4xx responses for SPARQL queries.
aws_neptune_sparql_http5xx | SparqlHttp5xx | Tracks HTTP 5xx responses for SPARQL queries.
aws_neptune_sparql_requests | SparqlRequests | Measures the number of SPARQL requests made to the Neptune instance.
aws_neptune_sparql_requests_per_sec | SparqlRequestsPerSec | Tracks the rate of SPARQL requests per second.
aws_neptune_status_errors | StatusErrors | Monitors the number of status errors reported by the Neptune instance.
aws_neptune_status_requests | StatusRequests | Tracks the number of status requests made to the Neptune instance.
aws_neptune_volume_bytes_used | VolumeBytesUsed | Measures the amount of storage used by the Neptune instance.
aws_neptune_volume_read_iops | VolumeReadIOPs | Monitors the read input/output operations per second on the Neptune instance’s volume.
aws_neptune_volume_write_iops | VolumeWriteIOPs | Tracks the write input/output operations per second on the Neptune instance’s volume.

AWS/NetworkELB

Function: Provides highly scalable and fault-tolerant network load balancing for traffic distribution

Scrape interval: 5 minutes

Includes: Out-of-the-box dashboard

Metric | CloudWatch metric | Purpose
aws_networkelb_info
aws_networkelb_active_flow_count | ActiveFlowCount | Monitors the total number of active flow connections through the Network Load Balancer.
aws_networkelb_active_flow_count_tls | ActiveFlowCount_TLS | Tracks the number of active flow connections through the Network Load Balancer that are using TLS.
aws_networkelb_client_tlsnegotiation_error_count | ClientTLSNegotiationErrorCount | Monitors the number of client TLS negotiation errors, indicating issues with SSL/TLS handshakes.
aws_networkelb_consumed_lcus | ConsumedLCUs | Measures Load Balancer Capacity Units (LCUs) consumed by the Network Load Balancer.
aws_networkelb_healthy_host_count | HealthyHostCount | Tracks the number of healthy targets available to receive traffic.
aws_networkelb_new_flow_count | NewFlowCount | Measures the number of new flow connections established with the Network Load Balancer.
aws_networkelb_new_flow_count_tls | NewFlowCount_TLS | Tracks the number of new flow connections using TLS.
aws_networkelb_processed_bytes | ProcessedBytes | Measures the total amount of data processed by the Network Load Balancer.
aws_networkelb_target_tlsnegotiation_error_count | TargetTLSNegotiationErrorCount | Monitors TLS negotiation errors on the target side, indicating failed handshakes.
aws_networkelb_tcp_client_reset_count | TCP_Client_Reset_Count | Tracks the number of TCP client resets, indicating client-initiated connection terminations.
aws_networkelb_tcp_target_reset_count | TCP_Target_Reset_Count | Monitors TCP resets initiated by the target, indicating failed connections.
aws_networkelb_un_healthy_host_count | UnHealthyHostCount | Measures the number of targets marked as unhealthy by the load balancer.
aws_networkelb_active_flow_count_tcp | ActiveFlowCount_TCP | Monitors the number of active TCP flows through the Network Load Balancer.
aws_networkelb_active_flow_count_udp | ActiveFlowCount_UDP | Tracks the number of active UDP flows through the Network Load Balancer.
aws_networkelb_consumed_lcus_tcp | ConsumedLCUs_TCP | Measures LCUs consumed by TCP traffic.
aws_networkelb_consumed_lcus_tls | ConsumedLCUs_TLS | Measures LCUs consumed by TLS traffic.
aws_networkelb_consumed_lcus_udp | ConsumedLCUs_UDP | Measures LCUs consumed by UDP traffic.
aws_networkelb_new_flow_count_tcp | NewFlowCount_TCP | Tracks the number of new TCP flow connections established.
aws_networkelb_new_flow_count_udp | NewFlowCount_UDP | Measures the number of new UDP flow connections established.
aws_networkelb_peak_packets_per_second | PeakPacketsPerSecond | Monitors the highest rate of packets processed by the Network Load Balancer per second.
aws_networkelb_port_allocation_error_count | PortAllocationErrorCount | Tracks the number of errors due to port allocation failures.
aws_networkelb_processed_bytes_tcp | ProcessedBytes_TCP | Measures the total data processed over TCP connections.
aws_networkelb_processed_bytes_tls | ProcessedBytes_TLS | Tracks the total data processed over TLS connections.
aws_networkelb_processed_bytes_udp | ProcessedBytes_UDP | Monitors the total data processed over UDP connections.
aws_networkelb_processed_packets | ProcessedPackets | Tracks the total number of packets processed by the Network Load Balancer.
aws_networkelb_security_group_blocked_flow_count_inbound_icmp | SecurityGroupBlockedFlowCount_Inbound_ICMP | Measures the number of inbound ICMP flows blocked by security groups.
aws_networkelb_security_group_blocked_flow_count_inbound_tcp | SecurityGroupBlockedFlowCount_Inbound_TCP | Tracks the number of inbound TCP flows blocked by security groups.
aws_networkelb_security_group_blocked_flow_count_inbound_udp | SecurityGroupBlockedFlowCount_Inbound_UDP | Monitors the number of inbound UDP flows blocked by security groups.
aws_networkelb_security_group_blocked_flow_count_outbound_icmp | SecurityGroupBlockedFlowCount_Outbound_ICMP | Measures the number of outbound ICMP flows blocked by security groups.
aws_networkelb_security_group_blocked_flow_count_outbound_tcp | SecurityGroupBlockedFlowCount_Outbound_TCP | Tracks the number of outbound TCP flows blocked by security groups.
aws_networkelb_security_group_blocked_flow_count_outbound_udp | SecurityGroupBlockedFlowCount_Outbound_UDP | Monitors the number of outbound UDP flows blocked by security groups.
aws_networkelb_tcp_elb_reset_count | TCP_ELB_Reset_Count | Tracks the number of TCP resets initiated by the Network Load Balancer itself.
aws_networkelb_unhealthy_routing_flow_count | UnhealthyRoutingFlowCount | Monitors the number of routing flows directed to unhealthy targets.
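
HealthyHostCount and UnHealthyHostCount are usually read together. As a minimal Python sketch, the fraction of healthy targets could be derived as shown below. The function name and the sample values are hypothetical, and both inputs are assumed to be taken for the same load balancer and target group at the same timestamp.

```python
from typing import Optional


def healthy_target_fraction(healthy_hosts: int, unhealthy_hosts: int) -> Optional[float]:
    """Return the share of registered targets that are healthy.

    Hypothetical helper: returns None when no targets are reported,
    because a fraction is undefined in that case.
    """
    total = healthy_hosts + unhealthy_hosts
    if total == 0:
        return None
    return healthy_hosts / total


# Example values are made up for illustration.
fraction = healthy_target_fraction(healthy_hosts=3, unhealthy_hosts=1)
print(fraction)
```

Returning None rather than 0.0 for an empty target group avoids reporting "all targets unhealthy" when there are simply no targets registered.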

AWS/NetworkFirewall

Function: Managed network firewall service to secure VPCs

Scrape interval: 5 minutes

Metric | CloudWatch metric | Purpose
aws_networkfirewall_info
aws_networkfirewall_dropped_packets | DroppedPackets | Tracks the number of packets dropped by the Network Firewall, indicating blocked or failed traffic.
aws_networkfirewall_packets | Packets | Monitors the total number of packets inspected by the Network Firewall.
aws_networkfirewall_passed_packets | PassedPackets | Measures the number of packets allowed through the Network Firewall, indicating successful traffic.
aws_networkfirewall_received_packet_count | ReceivedPacketCount | Tracks the total number of packets received by the Network Firewall for inspection.

AWS/PrivateLinkEndpoints

Function: Provides private connectivity between VPCs and AWS services or third-party services

Scrape interval: 5 minutes

Metric | CloudWatch metric | Purpose
aws_privatelinkendpoints_info
aws_privatelinkendpoints_active_connections | ActiveConnections | Tracks the number of active connections through the PrivateLink endpoints.
aws_privatelinkendpoints_bytes_processed | BytesProcessed | Measures the amount of data processed by the PrivateLink endpoints in bytes.
aws_privatelinkendpoints_new_connections | NewConnections | Monitors the number of new connections established through the PrivateLink endpoints.
aws_privatelinkendpoints_packets_dropped | PacketsDropped | Tracks the number of packets dropped by the PrivateLink endpoints, which could indicate errors or network issues.
aws_privatelinkendpoints_rst_packets_received | RstPacketsReceived | Measures the number of reset (RST) packets received, which can indicate connection terminations.

AWS/PrivateLinkServices

Function: Service for building services accessible over AWS PrivateLink

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_privatelinkservices_info | | |
| aws_privatelinkservices_active_connections | ActiveConnections | Monitors the number of active connections managed by the PrivateLink services. |
| aws_privatelinkservices_bytes_processed | BytesProcessed | Measures the total amount of data processed by the PrivateLink services in bytes. |
| aws_privatelinkservices_endpoints_count | EndpointsCount | Tracks the number of PrivateLink service endpoints currently connected. |
| aws_privatelinkservices_new_connections | NewConnections | Monitors the number of new connections established via the PrivateLink services. |
| aws_privatelinkservices_rst_packets_received | RstPacketsReceived | Measures the number of reset (RST) packets received, indicating terminated connections. |

AWS/Prometheus

Function: Managed Prometheus service for monitoring and alerting metrics

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_prometheus_info | | |
| aws_prometheus_alert_manager_alerts_received | AlertManagerAlertsReceived | Tracks the number of alerts received by the Prometheus Alert Manager. |
| aws_prometheus_alert_manager_notifications_failed | AlertManagerNotificationsFailed | Monitors the number of failed alert notifications sent by the Prometheus Alert Manager. |
| aws_prometheus_alert_manager_notifications_throttled | AlertManagerNotificationsThrottled | Measures the number of alert notifications throttled due to rate limits or other constraints. |
| aws_prometheus_discarded_samples | DiscardedSamples | Tracks the number of discarded samples due to errors or incorrect data. |
| aws_prometheus_rule_evaluation_failures | RuleEvaluationFailures | Monitors the number of failed rule evaluations in Prometheus. |
| aws_prometheus_rule_evaluations | RuleEvaluations | Measures the total number of rule evaluations performed by Prometheus. |
| aws_prometheus_rule_group_iterations_missed | RuleGroupIterationsMissed | Tracks the number of rule group evaluation iterations that were missed due to processing delays. |

AWS/RDS

Function: Managed relational database service for databases like MySQL, PostgreSQL, and Oracle

Scrape interval: 5 minutes

Includes: Out-of-the-box dashboard

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_rds_info | | |
| aws_rds_cpuutilization | CPUUtilization | Tracks the utilization of CPU resources by RDS instances. |
| aws_rds_database_connections | DatabaseConnections | Measures the number of active database connections to RDS instances. |
| aws_rds_replica_lag | ReplicaLag | Monitors the lag time between the master and replica databases. |
| aws_rds_freeable_memory | FreeableMemory | Indicates the available memory that can be used by the RDS instance. |
| aws_rds_free_storage_space | FreeStorageSpace | Shows the remaining storage space available on the RDS instance. |
| aws_rds_free_storage_space_log_volume | FreeStorageSpaceLogVolume | |
| aws_rds_swap_usage | SwapUsage | Monitors the amount of swap space used by the RDS instance. |
| aws_rds_read_throughput | ReadThroughput | Measures the throughput for read operations from the database. |
| aws_rds_read_latency | ReadLatency | Indicates the latency for read operations on the database. |
| aws_rds_read_iops | ReadIOPS | Tracks the input/output operations per second for reads on the RDS instance. |
| aws_rds_write_throughput | WriteThroughput | Measures the throughput for write operations to the database. |
| aws_rds_write_latency | WriteLatency | Indicates the latency for write operations on the database. |
| aws_rds_write_iops | WriteIOPS | Tracks the input/output operations per second for writes on the RDS instance. |
| aws_rds_burst_balance | BurstBalance | Monitors the burst balance percentage for instances with burstable performance. |
| aws_rds_ebsbyte_balance_percent | EBSByteBalance% | |
| aws_rds_ebsiobalance_percent | EBSIOBalance% | |
| aws_rds_dbload | DBLoad | Measures the database load on the instance. |
| aws_rds_dbload_cpu | DBLoadCPU | Tracks the portion of database load related to CPU usage. |
| aws_rds_dbload_non_cpu | DBLoadNonCPU | Measures the portion of database load unrelated to CPU usage. |
| aws_rds_cpucredit_usage | CPUCreditUsage | |
| aws_rds_cpucredit_balance | CPUCreditBalance | |
| aws_rds_acuutilization | ACUUtilization | Monitors the utilization of Aurora Capacity Units (ACUs). |
| aws_rds_aborted_clients | AbortedClients | Tracks the number of aborted client connections to the database. |
| aws_rds_active_transactions | ActiveTransactions | Shows the number of active transactions on the database. |
| aws_rds_aurora_binlog_replica_lag | AuroraBinlogReplicaLag | Monitors the replication lag between the Aurora master and replicas. |
| aws_rds_aurora_dmlrejected_master_full | AuroraDMLRejectedMasterFull | |
| aws_rds_aurora_dmlrejected_writer_full | AuroraDMLRejectedWriterFull | |
| aws_rds_aurora_estimated_shared_memory_bytes | AuroraEstimatedSharedMemoryBytes | |
| aws_rds_aurora_global_dbdata_transfer_bytes | AuroraGlobalDBDataTransferBytes | |
| aws_rds_aurora_global_dbprogress_lag | AuroraGlobalDBProgressLag | |
| aws_rds_aurora_global_dbrpolag | AuroraGlobalDBRPOLag | |
| aws_rds_aurora_global_dbreplicated_write_io | AuroraGlobalDBReplicatedWriteIO | |
| aws_rds_aurora_global_dbreplication_lag | AuroraGlobalDBReplicationLag | |
| aws_rds_aurora_memory_health_state | AuroraMemoryHealthState | Indicates the health state of memory in Aurora instances. |
| aws_rds_aurora_memory_num_declined_sql_total | AuroraMemoryNumDeclinedSqlTotal | |
| aws_rds_aurora_memory_num_kill_conn_total | AuroraMemoryNumKillConnTotal | |
| aws_rds_aurora_memory_num_kill_query_total | AuroraMemoryNumKillQueryTotal | |
| aws_rds_aurora_optimized_reads_cache_hit_ratio | AuroraOptimizedReadsCacheHitRatio | |
| aws_rds_aurora_replica_lag | AuroraReplicaLag | |
| aws_rds_aurora_replica_lag_maximum | AuroraReplicaLagMaximum | |
| aws_rds_aurora_replica_lag_minimum | AuroraReplicaLagMinimum | |
| aws_rds_aurora_slow_connection_handle_count | AuroraSlowConnectionHandleCount | |
| aws_rds_aurora_slow_handshake_count | AuroraSlowHandshakeCount | |
| aws_rds_aurora_volume_bytes_left_total | AuroraVolumeBytesLeftTotal | |
| aws_rds_availability_percentage | AvailabilityPercentage | Measures the availability of the RDS instance in terms of percentage uptime. |
| aws_rds_backtrack_change_records_creation_rate | BacktrackChangeRecordsCreationRate | |
| aws_rds_backtrack_change_records_stored | BacktrackChangeRecordsStored | |
| aws_rds_backtrack_window_actual | BacktrackWindowActual | |
| aws_rds_backtrack_window_alert | BacktrackWindowAlert | |
| aws_rds_backup_retention_period_storage_used | BackupRetentionPeriodStorageUsed | |
| aws_rds_bin_log_disk_usage | BinLogDiskUsage | |
| aws_rds_blocked_transactions | BlockedTransactions | |
| aws_rds_buffer_cache_hit_ratio | BufferCacheHitRatio | |
| aws_rds_cpusurplus_credit_balance | CPUSurplusCreditBalance | |
| aws_rds_cpusurplus_credits_charged | CPUSurplusCreditsCharged | |
| aws_rds_checkpoint_lag | CheckpointLag | |
| aws_rds_client_connections | ClientConnections | |
| aws_rds_client_connections_closed | ClientConnectionsClosed | |
| aws_rds_client_connections_no_tls | ClientConnectionsNoTLS | |
| aws_rds_client_connections_received | ClientConnectionsReceived | |
| aws_rds_client_connections_setup_failed_auth | ClientConnectionsSetupFailedAuth | |
| aws_rds_client_connections_setup_succeeded | ClientConnectionsSetupSucceeded | |
| aws_rds_client_connections_tls | ClientConnectionsTLS | |
| aws_rds_commit_latency | CommitLatency | |
| aws_rds_commit_throughput | CommitThroughput | |
| aws_rds_connection_attempts | ConnectionAttempts | |
| aws_rds_ddllatency | DDLLatency | |
| aws_rds_ddlthroughput | DDLThroughput | |
| aws_rds_dmllatency | DMLLatency | |
| aws_rds_dmlthroughput | DMLThroughput | |
| aws_rds_database_connection_requests | DatabaseConnectionRequests | |
| aws_rds_database_connection_requests_with_tls | DatabaseConnectionRequestsWithTLS | |
| aws_rds_database_connections_borrow_latency | DatabaseConnectionsBorrowLatency | |
| aws_rds_database_connections_currently_borrowed | DatabaseConnectionsCurrentlyBorrowed | |
| aws_rds_database_connections_currently_in_transaction | DatabaseConnectionsCurrentlyInTransaction | |
| aws_rds_database_connections_currently_session_pinned | DatabaseConnectionsCurrentlySessionPinned | |
| aws_rds_database_connections_setup_failed | DatabaseConnectionsSetupFailed | |
| aws_rds_database_connections_setup_succeeded | DatabaseConnectionsSetupSucceeded | |
| aws_rds_database_connections_with_tls | DatabaseConnectionsWithTLS | |
| aws_rds_deadlocks | Deadlocks | |
| aws_rds_delete_latency | DeleteLatency | |
| aws_rds_delete_throughput | DeleteThroughput | |
| aws_rds_disk_queue_depth | DiskQueueDepth | |
| aws_rds_disk_queue_depth_log_volume | DiskQueueDepthLogVolume | |
| aws_rds_engine_uptime | EngineUptime | |
| aws_rds_failed_sqlserver_agent_jobs_count | FailedSQLServerAgentJobsCount | |
| aws_rds_free_ephemeral_storage | FreeEphemeralStorage | |
| aws_rds_free_local_storage | FreeLocalStorage | |
| aws_rds_insert_latency | InsertLatency | |
| aws_rds_insert_throughput | InsertThroughput | |
| aws_rds_login_failures | LoginFailures | |
| aws_rds_max_database_connections_allowed | MaxDatabaseConnectionsAllowed | |
| aws_rds_maximum_used_transaction_ids | MaximumUsedTransactionIDs | |
| aws_rds_network_receive_throughput | NetworkReceiveThroughput | |
| aws_rds_network_throughput | NetworkThroughput | |
| aws_rds_network_transmit_throughput | NetworkTransmitThroughput | |
| aws_rds_num_binary_log_files | NumBinaryLogFiles | |
| aws_rds_oldest_replication_slot_lag | OldestReplicationSlotLag | |
| aws_rds_purge_boundary | PurgeBoundary | |
| aws_rds_purge_finished_point | PurgeFinishedPoint | |
| aws_rds_queries | Queries | Counts the number of queries executed on the RDS instance. |
| aws_rds_query_database_response_latency | QueryDatabaseResponseLatency | |
| aws_rds_query_requests | QueryRequests | |
| aws_rds_query_requests_no_tls | QueryRequestsNoTLS | |
| aws_rds_query_requests_tls | QueryRequestsTLS | |
| aws_rds_query_response_latency | QueryResponseLatency | |
| aws_rds_to_aurora_postgre_sqlreplica_lag | RDSToAuroraPostgreSQLReplicaLag | |
| aws_rds_read_iopsephemeral_storage | ReadIOPSEphemeralStorage | |
| aws_rds_read_iopslog_volume | ReadIOPSLogVolume | |
| aws_rds_read_latency_ephemeral_storage | ReadLatencyEphemeralStorage | |
| aws_rds_read_latency_log_volume | ReadLatencyLogVolume | |
| aws_rds_read_throughput_ephemeral_storage | ReadThroughputEphemeralStorage | |
| aws_rds_read_throughput_log_volume | ReadThroughputLogVolume | |
| aws_rds_replication_channel_lag | ReplicationChannelLag | |
| aws_rds_replication_slot_disk_usage | ReplicationSlotDiskUsage | |
| aws_rds_result_set_cache_hit_ratio | ResultSetCacheHitRatio | |
| aws_rds_rollback_segment_history_list_length | RollbackSegmentHistoryListLength | |
| aws_rds_row_lock_time | RowLockTime | |
| aws_rds_select_latency | SelectLatency | |
| aws_rds_select_throughput | SelectThroughput | |
| aws_rds_serverless_database_capacity | ServerlessDatabaseCapacity | |
| aws_rds_snapshot_storage_used | SnapshotStorageUsed | |
| aws_rds_storage_network_receive_throughput | StorageNetworkReceiveThroughput | |
| aws_rds_storage_network_throughput | StorageNetworkThroughput | Measures the network throughput for both transmitting and receiving data from the RDS instance. |
| aws_rds_storage_network_transmit_throughput | StorageNetworkTransmitThroughput | |
| aws_rds_sum_binary_log_size | SumBinaryLogSize | |
| aws_rds_temp_storage_iops | TempStorageIOPS | |
| aws_rds_temp_storage_throughput | TempStorageThroughput | |
| aws_rds_total_backup_storage_billed | TotalBackupStorageBilled | |
| aws_rds_transaction_logs_disk_usage | TransactionLogsDiskUsage | Tracks the amount of disk space used by transaction logs. |
| aws_rds_transaction_logs_generation | TransactionLogsGeneration | |
| aws_rds_truncate_finished_point | TruncateFinishedPoint | |
| aws_rds_update_latency | UpdateLatency | |
| aws_rds_update_throughput | UpdateThroughput | |
| aws_rds_volume_bytes_used | VolumeBytesUsed | Shows the total amount of disk space used by the RDS instance. |
| aws_rds_volume_read_iops | VolumeReadIOPs | |
| aws_rds_volume_write_iops | VolumeWriteIOPs | |
| aws_rds_write_iopsephemeral_storage | WriteIOPSEphemeralStorage | |
| aws_rds_write_iopslog_volume | WriteIOPSLogVolume | |
| aws_rds_write_latency_ephemeral_storage | WriteLatencyEphemeralStorage | |
| aws_rds_write_latency_log_volume | WriteLatencyLogVolume | Monitors the latency for write operations on the log volume. |
| aws_rds_write_throughput_ephemeral_storage | WriteThroughputEphemeralStorage | |
| aws_rds_write_throughput_log_volume | WriteThroughputLogVolume | |
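
The metric names in the first column appear to be derived mechanically from the namespace and the CloudWatch metric name. The following Python sketch approximates that pattern as inferred from the rows listed on this page. It is not the exporter's actual implementation, and `to_prometheus_name` is a hypothetical helper. It reproduces most rows but not all of them: for example, `RDSToAuroraPostgreSQLReplicaLag` is listed as `aws_rds_to_aurora_postgre_sqlreplica_lag`, which this sketch does not produce.

```python
import re


def to_prometheus_name(namespace: str, cloudwatch_metric: str) -> str:
    """Approximate the listed metric name for a CloudWatch metric.

    Hypothetical helper inferred from the tables; not an official API.
    """
    # "AWS/RDS" -> "aws_rds": lowercase, separators become underscores.
    prefix = re.sub(r"[^a-z0-9]+", "_", namespace.lower()).strip("_")
    # A literal "%" is spelled out, as in "EBSByteBalance%".
    name = cloudwatch_metric.replace("%", "_percent")
    # Insert "_" where a lowercase letter or digit is followed by an
    # uppercase letter. Runs of capitals such as "CPUU" stay together.
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Remaining separators (".", "-", spaces) collapse to one underscore.
    name = re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_").lower()
    return f"{prefix}_{name}"


print(to_prometheus_name("AWS/RDS", "CPUUtilization"))
print(to_prometheus_name("AWS/RDS", "EBSByteBalance%"))
print(to_prometheus_name("AWS/RDS", "ReadIOPSLogVolume"))
```

Because runs of capital letters are not split, acronyms fuse with the following word, which explains names such as `aws_rds_cpuutilization` and `aws_rds_read_iopslog_volume`.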

AWS/Redshift

Function: Fully managed data warehouse for large-scale data analytics

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_redshift_info | | |
| aws_redshift_cpuutilization | CPUUtilization | Tracks CPU utilization across Redshift clusters. |
| aws_redshift_commit_queue_length | CommitQueueLength | Measures the length of the commit queue for query execution. |
| aws_redshift_concurrency_scaling_active_clusters | ConcurrencyScalingActiveClusters | Monitors the number of active concurrency scaling clusters. |
| aws_redshift_concurrency_scaling_seconds | ConcurrencyScalingSeconds | Measures the time spent scaling for concurrency. |
| aws_redshift_database_connections | DatabaseConnections | Tracks the number of database connections to the Redshift cluster. |
| aws_redshift_health_status | HealthStatus | Provides health status of Redshift clusters. |
| aws_redshift_maintenance_mode | MaintenanceMode | Indicates if the cluster is in maintenance mode. |
| aws_redshift_max_configured_concurrency_scaling_clusters | MaxConfiguredConcurrencyScalingClusters | Tracks the maximum number of concurrency scaling clusters configured. |
| aws_redshift_network_receive_throughput | NetworkReceiveThroughput | Measures the network throughput for receiving data. |
| aws_redshift_network_transmit_throughput | NetworkTransmitThroughput | Measures the network throughput for transmitting data. |
| aws_redshift_num_exceeded_schema_quotas | NumExceededSchemaQuotas | Tracks how often schema quotas have been exceeded. |
| aws_redshift_percentage_disk_space_used | PercentageDiskSpaceUsed | Shows the percentage of disk space used by the cluster. |
| aws_redshift_percentage_quota_used | PercentageQuotaUsed | Monitors the percentage of quota used. |
| aws_redshift_queries_completed_per_second | QueriesCompletedPerSecond | Measures the number of queries completed per second. |
| aws_redshift_query_duration | QueryDuration | Tracks the duration of queries. |
| aws_redshift_query_runtime_breakdown | QueryRuntimeBreakdown | Provides a breakdown of the time spent on query execution. |
| aws_redshift_read_iops | ReadIOPS | Measures input/output operations per second for reads. |
| aws_redshift_read_latency | ReadLatency | Tracks latency for read operations. |
| aws_redshift_read_throughput | ReadThroughput | Measures throughput for read operations. |
| aws_redshift_schema_quota | SchemaQuota | Monitors schema quota usage. |
| aws_redshift_storage_used | StorageUsed | Shows the amount of storage used by the Redshift cluster. |
| aws_redshift_total_table_count | TotalTableCount | Measures the total number of tables in the cluster. |
| aws_redshift_wlmqueries_completed_per_second | WLMQueriesCompletedPerSecond | Tracks the number of queries completed per second in the Workload Management (WLM) queue. |
| aws_redshift_wlmquery_duration | WLMQueryDuration | Measures the duration of queries in the WLM queue. |
| aws_redshift_wlmqueue_length | WLMQueueLength | Tracks the length of the WLM queue. |
| aws_redshift_wlmqueue_wait_time | WLMQueueWaitTime | Measures the wait time for queries in the WLM queue. |
| aws_redshift_wlmrunning_queries | WLMRunningQueries | Shows the number of queries currently running in the WLM queue. |
| aws_redshift_write_iops | WriteIOPS | Measures input/output operations per second for writes. |
| aws_redshift_write_latency | WriteLatency | Tracks latency for write operations. |
| aws_redshift_write_throughput | WriteThroughput | Measures throughput for write operations. |

AWS/Route53

Function: Scalable DNS and domain registration service

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_route53_info | | |
| aws_route53_child_health_check_healthy_count | ChildHealthCheckHealthyCount | Tracks the count of healthy child health checks. |
| aws_route53_connection_time | ConnectionTime | Measures the time it takes to establish a connection. |
| aws_route53_dnsqueries | DNSQueries | Monitors the number of DNS queries handled by Route 53. |
| aws_route53_health_check_percentage_healthy | HealthCheckPercentageHealthy | Displays the percentage of healthy Route 53 health checks. |
| aws_route53_health_check_status | HealthCheckStatus | Indicates the status of health checks, showing whether they are passing or failing. |
| aws_route53_sslhandshake_time | SSLHandshakeTime | Measures the time it takes to complete the SSL handshake. |
| aws_route53_time_to_first_byte | TimeToFirstByte | Tracks the time taken to receive the first byte of the response after a request is sent. |

AWS/Route53Resolver

Function: DNS firewall to filter and monitor DNS queries

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_route53resolver_info | | |
| aws_route53resolver_inbound_query_volume | InboundQueryVolume | Measures the volume of DNS queries received by the Route 53 Resolver inbound endpoint. |
| aws_route53resolver_outbound_query_aggregated_volume | OutboundQueryAggregatedVolume | Tracks the total volume of outbound DNS queries across all outbound endpoints. |
| aws_route53resolver_outbound_query_volume | OutboundQueryVolume | Monitors the volume of DNS queries sent by the Route 53 Resolver outbound endpoint. |

AWS/S3

Function: Scalable object storage service for a wide range of data types

Scrape interval: 5 minutes

Includes: Out-of-the-box dashboard

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_s3_info | | |
| aws_s3_number_of_objects | NumberOfObjects | Tracks the total number of objects stored in an S3 bucket. |
| aws_s3_bucket_size_bytes | BucketSizeBytes | Measures the total size of an S3 bucket in bytes. |
| aws_s3_all_requests | AllRequests | Measures the total number of all requests made to an S3 bucket. |
| aws_s3_4xx_errors | 4xxErrors | Counts the number of 4xx HTTP status code errors encountered during S3 requests. |
| aws_s3_total_request_latency | TotalRequestLatency | Measures the total latency for S3 requests. |
| aws_s3_5xx_errors | 5xxErrors | Counts the number of 5xx HTTP status code errors encountered during S3 requests. |
| aws_s3_bytes_downloaded | BytesDownloaded | Tracks the total bytes downloaded from an S3 bucket. |
| aws_s3_bytes_pending_replication | BytesPendingReplication | Measures the bytes pending replication in S3 cross-region replication scenarios. |
| aws_s3_bytes_uploaded | BytesUploaded | Tracks the total bytes uploaded to an S3 bucket. |
| aws_s3_delete_requests | DeleteRequests | Measures the number of delete requests made to an S3 bucket. |
| aws_s3_first_byte_latency | FirstByteLatency | Tracks the latency until the first byte is sent in an S3 request. |
| aws_s3_get_requests | GetRequests | Measures the number of GET requests made to an S3 bucket. |
| aws_s3_head_requests | HeadRequests | Counts the number of HEAD requests made to an S3 bucket. |
| aws_s3_list_requests | ListRequests | Tracks the number of LIST requests made to an S3 bucket. |
| aws_s3_operations_failed_replication | OperationsFailedReplication | Counts the number of replication operations that have failed. |
| aws_s3_operations_pending_replication | OperationsPendingReplication | Tracks the number of pending replication operations in an S3 bucket. |
| aws_s3_post_requests | PostRequests | Counts the number of POST requests made to an S3 bucket. |
| aws_s3_put_requests | PutRequests | Tracks the number of PUT requests made to an S3 bucket. |
| aws_s3_replication_latency | ReplicationLatency | Measures the latency of replication operations. |
| aws_s3_select_requests | SelectRequests | Measures the number of select requests made to an S3 bucket. |
| aws_s3_select_returned_bytes | SelectReturnedBytes | Tracks the bytes returned by S3 Select queries. |
| aws_s3_select_scanned_bytes | SelectScannedBytes | Measures the bytes scanned by S3 Select queries. |
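
The request and error counters above are often combined into an error rate. The following Python sketch shows one way to do that, assuming you already have the Sum statistic of AllRequests, 4xxErrors, and 5xxErrors for the same bucket and the same time window. The function name `s3_error_rates` and the input values are hypothetical.

```python
def s3_error_rates(all_requests: float, errors_4xx: float, errors_5xx: float) -> dict:
    """Return 4xx, 5xx, and combined error ratios for one time window.

    Inputs are assumed to be Sum values taken over the same window.
    """
    if all_requests <= 0:
        # No traffic in the window: report zero rather than divide by zero.
        return {"4xx": 0.0, "5xx": 0.0, "total": 0.0}
    return {
        "4xx": errors_4xx / all_requests,
        "5xx": errors_5xx / all_requests,
        "total": (errors_4xx + errors_5xx) / all_requests,
    }


# Illustrative numbers only: 200 requests, 10 client errors, 2 server errors.
rates = s3_error_rates(all_requests=200, errors_4xx=10, errors_5xx=2)
print(f"total error rate: {rates['total']:.1%}")
```

Keeping the 4xx and 5xx ratios separate is usually worthwhile, because client errors and server errors tend to have different causes.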

AWS/SES

Function: Email service for sending marketing, notification, and transactional emails

Scrape interval: 5 minutes

Includes: Out-of-the-box dashboard

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_ses_bounce | Bounce | |
| aws_ses_complaint | Complaint | |
| aws_ses_delivery | Delivery | |
| aws_ses_reject | Reject | |
| aws_ses_send | Send | |
| aws_ses_clicks | Clicks | |
| aws_ses_opens | Opens | |
| aws_ses_rendering_failures | Rendering Failures | |
| aws_ses_reputation_bounce_rate | Reputation.BounceRate | |
| aws_ses_reputation_complaint_rate | Reputation.ComplaintRate | |

AWS/SNS

Function: Managed messaging service for sending notifications to mobile devices or other services

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_sns_info | | |
| aws_sns_number_of_messages_published | NumberOfMessagesPublished | Tracks the number of messages published to SNS topics. |
| aws_sns_number_of_notifications_delivered | NumberOfNotificationsDelivered | Measures the number of successfully delivered notifications. |
| aws_sns_number_of_notifications_failed | NumberOfNotificationsFailed | Tracks the number of failed notifications. |
| aws_sns_number_of_notifications_filtered_out | NumberOfNotificationsFilteredOut | Measures the notifications that were filtered out based on the subscription’s filter policies. |
| aws_sns_number_of_notifications_filtered_out_invalid_attributes | NumberOfNotificationsFilteredOut-InvalidAttributes | Tracks the notifications filtered out due to invalid message attributes. |
| aws_sns_number_of_notifications_filtered_out_message_body | NumberOfNotificationsFilteredOut-MessageBody | Measures notifications filtered out because of the message body content. |
| aws_sns_number_of_notifications_filtered_out_no_message_attributes | NumberOfNotificationsFilteredOut-NoMessageAttributes | Tracks notifications filtered out due to missing message attributes. |
| aws_sns_publish_size | PublishSize | Measures the size of messages published to SNS topics. |
| aws_sns_smsmonth_to_date_spent_usd | SMSMonthToDateSpentUSD | Tracks the month-to-date costs incurred for sending SMS messages. |
| aws_sns_smssuccess_rate | SMSSuccessRate | Measures the success rate of sending SMS messages via SNS. |

AWS/SQS

Function: Fully managed message queuing service for decoupling and scaling microservices

Scrape interval: 5 minutes

Includes: Out-of-the-box dashboard

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_sqs_info | | |
| aws_sqs_approximate_age_of_oldest_message | ApproximateAgeOfOldestMessage | Tracks the approximate age of the oldest message in the queue. |
| aws_sqs_approximate_number_of_messages_delayed | ApproximateNumberOfMessagesDelayed | Measures the approximate number of messages currently delayed. |
| aws_sqs_approximate_number_of_messages_not_visible | ApproximateNumberOfMessagesNotVisible | Tracks the approximate number of messages that are not visible to consumers due to being in flight. |
| aws_sqs_approximate_number_of_messages_visible | ApproximateNumberOfMessagesVisible | Measures the approximate number of messages currently visible to consumers. |
| aws_sqs_number_of_empty_receives | NumberOfEmptyReceives | Tracks the number of receive requests that did not return any messages. |
| aws_sqs_number_of_messages_deleted | NumberOfMessagesDeleted | Measures the number of messages successfully deleted from the queue. |
| aws_sqs_number_of_messages_received | NumberOfMessagesReceived | Tracks the number of messages received from the queue. |
| aws_sqs_number_of_messages_sent | NumberOfMessagesSent | Measures the number of messages successfully sent to the queue. |
| aws_sqs_sent_message_size | SentMessageSize | Tracks the size of messages sent to the queue. |
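
The three approximate message gauges above describe disjoint message states (visible, in flight, and delayed), so their sum gives an estimate of how many messages a queue currently holds. The following Python sketch illustrates the arithmetic. The `QueueSnapshot` class and the sample values are hypothetical, and because the source metrics are approximate, the result is an estimate as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueSnapshot:
    """One reading of the three approximate SQS message gauges."""

    visible: int      # ApproximateNumberOfMessagesVisible
    not_visible: int  # ApproximateNumberOfMessagesNotVisible (in flight)
    delayed: int      # ApproximateNumberOfMessagesDelayed

    @property
    def total(self) -> int:
        """Estimated number of messages held by the queue."""
        return self.visible + self.not_visible + self.delayed

    @property
    def in_flight_fraction(self) -> float:
        """Share of held messages currently being processed by consumers."""
        return self.not_visible / self.total if self.total else 0.0


# Illustrative values only.
snapshot = QueueSnapshot(visible=120, not_visible=30, delayed=50)
print(snapshot.total, snapshot.in_flight_fraction)
```

A total that keeps growing while the in-flight fraction stays low can suggest that consumers are not keeping up with producers.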

AWS/SageMaker

Function: Managed service for building, training, and deploying machine learning models

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_sagemaker_info | | |
| aws_sagemaker_invocation4_xxerrors | Invocation4XXErrors | Tracks the count of 4XX errors (client-side errors) during model invocations. |
| aws_sagemaker_invocation5_xxerrors | Invocation5XXErrors | Tracks the count of 5XX errors (server-side errors) during model invocations. |
| aws_sagemaker_invocation_model_errors | InvocationModelErrors | Measures the errors specific to model invocations. |
| aws_sagemaker_invocations | Invocations | Counts the number of successful model invocations. |
| aws_sagemaker_invocations_per_copy | InvocationsPerCopy | Tracks the number of invocations per copy of the model. |
| aws_sagemaker_invocations_per_instance | InvocationsPerInstance | Measures the number of invocations per instance. |
| aws_sagemaker_model_cache_hit | ModelCacheHit | Tracks the instances where model cache is hit, reducing load times. |
| aws_sagemaker_model_downloading_time | ModelDownloadingTime | Measures the time taken to download the model to the instance. |
| aws_sagemaker_model_latency | ModelLatency | Tracks the latency of model invocations. |
| aws_sagemaker_model_loading_time | ModelLoadingTime | Measures the time taken to load the model on the instance. |
| aws_sagemaker_model_loading_wait_time | ModelLoadingWaitTime | Measures the wait time during the model loading process. |
| aws_sagemaker_model_setup_time | ModelSetupTime | Tracks the time taken to set up the model environment. |
| aws_sagemaker_model_unloading_time | ModelUnloadingTime | Measures the time taken to unload the model from the instance. |
| aws_sagemaker_overhead_latency | OverheadLatency | Tracks additional latency incurred due to overheads during the invocation process. |
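
ModelLatency and OverheadLatency are commonly added together to estimate the end-to-end latency of an invocation. The following Python sketch shows that calculation. It assumes both values are reported in microseconds, which is how AWS documents these endpoint invocation metrics; verify the unit for your setup before relying on it. The function name `end_to_end_latency_ms` and the sample values are hypothetical.

```python
MICROSECONDS_PER_MILLISECOND = 1_000


def end_to_end_latency_ms(model_latency_us: float, overhead_latency_us: float) -> float:
    """Estimate total invocation latency in milliseconds.

    Assumes both inputs are in microseconds and describe the same
    endpoint, statistic, and time window.
    """
    return (model_latency_us + overhead_latency_us) / MICROSECONDS_PER_MILLISECOND


# Illustrative values only: 120 ms in the model plus 8 ms of overhead.
print(end_to_end_latency_ms(120_000, 8_000))
```

Comparing the two inputs separately can help show whether slow responses come from the model container or from the surrounding service overhead.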

AWS/SageMaker/Endpoints

Function: Provides real-time and batch inference capabilities for deployed machine learning models

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_sagemaker_endpoints_info | | |
| aws_sagemaker_endpoints_cpureservation | CPUReservation | Tracks the amount of reserved CPU resources for SageMaker endpoints. |
| aws_sagemaker_endpoints_cpuutilization | CPUUtilization | Monitors the actual CPU utilization by the SageMaker endpoint. |
| aws_sagemaker_endpoints_cpuutilization_normalized | CPUUtilizationNormalized | Measures normalized CPU utilization based on instance type and capacity. |
| aws_sagemaker_endpoints_disk_utilization | DiskUtilization | Tracks the disk space utilization for SageMaker endpoints. |
| aws_sagemaker_endpoints_gpumemory_utilization | GPUMemoryUtilization | Monitors the actual GPU memory utilization for endpoints using GPU instances. |
| aws_sagemaker_endpoints_gpumemory_utilization_normalized | GPUMemoryUtilizationNormalized | Measures normalized GPU memory utilization. |
| aws_sagemaker_endpoints_gpureservation | GPUReservation | Tracks the amount of reserved GPU resources for endpoints using GPU instances. |
| aws_sagemaker_endpoints_gpuutilization | GPUUtilization | Monitors the actual GPU utilization by the SageMaker endpoint. |
| aws_sagemaker_endpoints_gpuutilization_normalized | GPUUtilizationNormalized | Measures normalized GPU utilization. |
| aws_sagemaker_endpoints_loaded_model_count | LoadedModelCount | Tracks the number of models currently loaded on the SageMaker endpoint. |
| aws_sagemaker_endpoints_memory_reservation | MemoryReservation | Tracks the amount of reserved memory for the SageMaker endpoint. |
| aws_sagemaker_endpoints_memory_utilization | MemoryUtilization | Monitors the actual memory utilization by the SageMaker endpoint. |
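
The relationship between a raw utilization metric and its normalized counterpart can be sketched as follows. This assumes, as AWS describes for SageMaker instance metrics, that the raw value is the sum across cores or GPUs (so an instance with four CPUs can report up to 400%), while the normalized value is scaled back to a range of 0 to 100. The function name `normalize_utilization` and the sample values are hypothetical, and the sketch illustrates the arithmetic rather than how CloudWatch computes the metric.

```python
def normalize_utilization(raw_percent: float, unit_count: int) -> float:
    """Scale a summed utilization percentage to a 0-100 range.

    raw_percent: utilization summed over all cores or GPUs.
    unit_count:  number of cores or GPUs on the instance.
    """
    if unit_count <= 0:
        raise ValueError("unit_count must be a positive integer")
    return raw_percent / unit_count


# Illustrative values only: four CPUs reporting a combined 260%.
print(normalize_utilization(260.0, 4))
```

The normalized variants are generally easier to alert on, because one threshold applies regardless of instance size.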

AWS/SageMaker/InferenceRecommendationsJobs

Function: Offers guidance on optimizing inference workloads for ML models

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_sagemaker_inferencerecommendationsjobs_info | | |
| aws_sagemaker_inferencerecommendationsjobs_client_invocation_errors | ClientInvocationErrors | Tracks the number of errors encountered during client invocations for inference recommendations. |
| aws_sagemaker_inferencerecommendationsjobs_client_invocations | ClientInvocations | Monitors the number of client invocations of the inference recommendations job. |
| aws_sagemaker_inferencerecommendationsjobs_client_latency | ClientLatency | Measures the latency of client invocations during the inference recommendations job. |
| aws_sagemaker_inferencerecommendationsjobs_number_of_users | NumberOfUsers | Tracks the number of users interacting with the inference recommendations job. |

AWS/SageMaker/ModelBuildingPipeline

Function: Managed pipelines to automate model training and deployment processes

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_sagemaker_modelbuildingpipeline_info | | |
| aws_sagemaker_modelbuildingpipeline_execution_duration | ExecutionDuration | Tracks the duration of pipeline executions. |
| aws_sagemaker_modelbuildingpipeline_execution_failed | ExecutionFailed | Monitors the number of failed pipeline executions. |
| aws_sagemaker_modelbuildingpipeline_execution_started | ExecutionStarted | Counts the number of started pipeline executions. |
| aws_sagemaker_modelbuildingpipeline_execution_stopped | ExecutionStopped | Tracks pipeline executions that were stopped. |
| aws_sagemaker_modelbuildingpipeline_execution_succeeded | ExecutionSucceeded | Monitors the number of successfully completed pipeline executions. |
| aws_sagemaker_modelbuildingpipeline_step_duration | StepDuration | Tracks the duration of individual steps within the pipeline. |
| aws_sagemaker_modelbuildingpipeline_step_failed | StepFailed | Monitors the number of failed steps within the pipeline. |
| aws_sagemaker_modelbuildingpipeline_step_started | StepStarted | Counts the number of steps started in the pipeline. |
| aws_sagemaker_modelbuildingpipeline_step_stopped | StepStopped | Tracks the steps that were stopped within the pipeline. |
| aws_sagemaker_modelbuildingpipeline_step_succeeded | StepSucceeded | Monitors the number of successfully completed steps within the pipeline. |

AWS/SageMaker/ProcessingJobs

Function: Managed service for processing and transforming data at scale for machine learning

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_sagemaker_processingjobs_info | | |
| aws_sagemaker_processingjobs_cpureservation | CPUReservation | Monitors the amount of CPU resources reserved for processing jobs. |
| aws_sagemaker_processingjobs_cpuutilization | CPUUtilization | Tracks the utilization of CPU resources during processing jobs. |
| aws_sagemaker_processingjobs_cpuutilization_normalized | CPUUtilizationNormalized | Provides normalized CPU utilization for easier comparison across different instance types. |
| aws_sagemaker_processingjobs_disk_utilization | DiskUtilization | Monitors the disk utilization during the processing jobs. |
| aws_sagemaker_processingjobs_gpumemory_utilization | GPUMemoryUtilization | Tracks GPU memory usage during processing jobs. |
| aws_sagemaker_processingjobs_gpumemory_utilization_normalized | GPUMemoryUtilizationNormalized | Provides normalized GPU memory utilization for comparison across different instances. |
| aws_sagemaker_processingjobs_gpureservation | GPUReservation | Monitors the amount of GPU resources reserved for processing jobs. |
| aws_sagemaker_processingjobs_gpuutilization | GPUUtilization | Tracks the utilization of GPU resources during processing jobs. |
| aws_sagemaker_processingjobs_gpuutilization_normalized | GPUUtilizationNormalized | Provides normalized GPU utilization for easier cross-instance comparison. |
| aws_sagemaker_processingjobs_memory_reservation | MemoryReservation | Tracks memory resources reserved for processing jobs. |
| aws_sagemaker_processingjobs_memory_utilization | MemoryUtilization | Monitors the utilization of memory resources during processing jobs. |

AWS/SageMaker/TrainingJobs

Function: Managed service for training ML models on large datasets

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_sagemaker_trainingjobs_info | | |
| aws_sagemaker_trainingjobs_cpureservation | CPUReservation | Tracks the amount of CPU resources reserved for training jobs. |
| aws_sagemaker_trainingjobs_cpuutilization | CPUUtilization | Monitors the CPU utilization during training jobs. |
| aws_sagemaker_trainingjobs_cpuutilization_normalized | CPUUtilizationNormalized | Provides normalized CPU utilization across different instance types. |
| aws_sagemaker_trainingjobs_disk_utilization | DiskUtilization | Monitors the disk utilization during training jobs. |
| aws_sagemaker_trainingjobs_gpumemory_utilization | GPUMemoryUtilization | Tracks GPU memory utilization during training jobs. |
| aws_sagemaker_trainingjobs_gpumemory_utilization_normalized | GPUMemoryUtilizationNormalized | Provides normalized GPU memory utilization for comparison across different instances. |
| aws_sagemaker_trainingjobs_gpureservation | GPUReservation | Tracks the amount of GPU resources reserved for training jobs. |
| aws_sagemaker_trainingjobs_gpuutilization | GPUUtilization | Monitors GPU utilization during training jobs. |
| aws_sagemaker_trainingjobs_gpuutilization_normalized | GPUUtilizationNormalized | Provides normalized GPU utilization across different instances. |
| aws_sagemaker_trainingjobs_memory_reservation | MemoryReservation | Monitors the amount of memory reserved for training jobs. |
| aws_sagemaker_trainingjobs_memory_utilization | MemoryUtilization | Tracks the memory usage during training jobs. |

AWS/SageMaker/TransformJobs

Function: Enables large-scale, batch ML model inferences for data transformations

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_sagemaker_transformjobs_info | | |
| aws_sagemaker_transformjobs_cpureservation | CPUReservation | Tracks the CPU resources reserved for transform jobs. |
| aws_sagemaker_transformjobs_cpuutilization | CPUUtilization | Monitors the CPU utilization during transform jobs. |
| aws_sagemaker_transformjobs_cpuutilization_normalized | CPUUtilizationNormalized | Provides normalized CPU utilization across different instance types during transform jobs. |
| aws_sagemaker_transformjobs_disk_utilization | DiskUtilization | Monitors disk utilization during transform jobs. |
| aws_sagemaker_transformjobs_gpumemory_utilization | GPUMemoryUtilization | Tracks GPU memory utilization during transform jobs. |
| aws_sagemaker_transformjobs_gpumemory_utilization_normalized | GPUMemoryUtilizationNormalized | Provides normalized GPU memory utilization for comparison across different instances during transform jobs. |
| aws_sagemaker_transformjobs_gpureservation | GPUReservation | Tracks the GPU resources reserved for transform jobs. |
| aws_sagemaker_transformjobs_gpuutilization | GPUUtilization | Monitors GPU utilization during transform jobs. |
| aws_sagemaker_transformjobs_gpuutilization_normalized | GPUUtilizationNormalized | Provides normalized GPU utilization across different instances during transform jobs. |
| aws_sagemaker_transformjobs_memory_reservation | MemoryReservation | Monitors memory resources reserved for transform jobs. |
| aws_sagemaker_transformjobs_memory_utilization | MemoryUtilization | Tracks memory usage during transform jobs. |

AWS/Scheduler

Function: Managed service to trigger events or workflows at a scheduled time

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_scheduler_invocation_attempt_count | InvocationAttemptCount | Tracks the number of attempts made for invocations. |
| aws_scheduler_invocation_dropped_count | InvocationDroppedCount | Monitors the count of invocations that were dropped. |
| aws_scheduler_invocation_throttle_count | InvocationThrottleCount | Counts the number of invocations that were throttled due to exceeding limits. |
| aws_scheduler_invocations_failed_to_be_sent_to_dead_letter_count | InvocationsFailedToBeSentToDeadLetterCount | Tracks the number of invocations that failed to be sent to the dead letter queue. |
| aws_scheduler_invocations_sent_to_dead_letter_count | InvocationsSentToDeadLetterCount | Counts the number of invocations successfully sent to the dead letter queue. |
| aws_scheduler_invocations_sent_to_dead_letter_count_truncated_message_size_exceeded | InvocationsSentToDeadLetterCount_Truncated_MessageSizeExceeded | Monitors the number of invocations sent to the dead letter queue due to exceeding message size. |
| aws_scheduler_target_error_count | TargetErrorCount | Tracks the count of errors encountered by the target. |
| aws_scheduler_target_error_throttled_count | TargetErrorThrottledCount | Counts the number of target errors caused by throttling. |
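The metric names in these tables appear to follow a regular pattern relative to the CloudWatch namespace and metric name. The sketch below is inferred only from the names listed in this section, not from the exporter's source: `prom_metric_name` is a hypothetical helper, and the rule it encodes (lowercase the namespace, add an `aws_` prefix when missing, and insert an underscore where a lowercase letter or digit is followed by a capital) is an assumption that may not hold for names outside these tables.

```python
import re

# Assumption inferred from the tables: an underscore appears wherever a
# lowercase letter or digit is followed by an uppercase letter, so runs of
# capitals such as CPU, SLA, or DAG stay glued to the word that follows.
_BOUNDARY = re.compile(r"([a-z0-9])([A-Z])")


def prom_metric_name(namespace: str, cloudwatch_metric: str) -> str:
    """Hypothetical helper: guess the listed name for a CloudWatch metric."""
    # Namespace: lowercase, slashes become underscores, no word splitting.
    prefix = namespace.lower().replace("/", "_")
    if not prefix.startswith("aws_"):
        prefix = "aws_" + prefix
    # Metric: split on lower-to-upper boundaries, then lowercase.
    metric = _BOUNDARY.sub(r"\1_\2", cloudwatch_metric).lower()
    return f"{prefix}_{metric}"


if __name__ == "__main__":
    print(prom_metric_name("AWS/Scheduler", "InvocationAttemptCount"))
    print(prom_metric_name("AmazonMWAA", "DAGFileProcessingLastRunSecondsAgo"))
```

This can be handy for locating the row for a CloudWatch metric you already know by name. Treat the table itself as authoritative when the two disagree.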

AWS/States

Function: AWS Step Functions for orchestrating workflows and coordinating services

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_states_info | | |
| aws_states_activities_failed | ActivitiesFailed | Tracks the number of failed activities. |
| aws_states_activities_heartbeat_timed_out | ActivitiesHeartbeatTimedOut | Monitors activities whose heartbeat timed out. |
| aws_states_activities_scheduled | ActivitiesScheduled | Tracks the number of activities that have been scheduled. |
| aws_states_activities_started | ActivitiesStarted | Measures the number of activities that have started. |
| aws_states_activities_succeeded | ActivitiesSucceeded | Tracks successful activities. |
| aws_states_activities_timed_out | ActivitiesTimedOut | Tracks the number of activities that timed out. |
| aws_states_activity_run_time | ActivityRunTime | Monitors the runtime of activities. |
| aws_states_activity_schedule_time | ActivityScheduleTime | Tracks the schedule time for activities. |
| aws_states_activity_time | ActivityTime | Tracks the total time taken by an activity. |
| aws_states_consumed_capacity | ConsumedCapacity | Measures the consumed capacity for Step Functions. |
| aws_states_execution_throttled | ExecutionThrottled | Monitors throttled execution attempts. |
| aws_states_execution_time | ExecutionTime | Tracks the total time taken by an execution. |
| aws_states_executions_aborted | ExecutionsAborted | Tracks the number of executions that were aborted. |
| aws_states_executions_failed | ExecutionsFailed | Measures the number of failed executions. |
| aws_states_executions_started | ExecutionsStarted | Tracks the number of executions that started. |
| aws_states_executions_succeeded | ExecutionsSucceeded | Tracks successful executions. |
| aws_states_executions_timed_out | ExecutionsTimedOut | Monitors executions that timed out. |
| aws_states_express_execution_billed_duration | ExpressExecutionBilledDuration | Measures the billed duration for Express Workflows. |
| aws_states_express_execution_billed_memory | ExpressExecutionBilledMemory | Measures the billed memory for Express Workflows. |
| aws_states_express_execution_memory | ExpressExecutionMemory | Monitors the memory consumed by Express Workflows. |
| aws_states_lambda_function_run_time | LambdaFunctionRunTime | Measures the runtime of Lambda functions. |
| aws_states_lambda_function_schedule_time | LambdaFunctionScheduleTime | Tracks the schedule time for Lambda functions. |
| aws_states_lambda_function_time | LambdaFunctionTime | Tracks the total time taken by Lambda functions. |
| aws_states_lambda_functions_failed | LambdaFunctionsFailed | Monitors Lambda functions that failed. |
| aws_states_lambda_functions_scheduled | LambdaFunctionsScheduled | Tracks the number of Lambda functions that were scheduled. |
| aws_states_lambda_functions_started | LambdaFunctionsStarted | Tracks Lambda functions that have started. |
| aws_states_lambda_functions_succeeded | LambdaFunctionsSucceeded | Measures successful Lambda function executions. |
| aws_states_lambda_functions_timed_out | LambdaFunctionsTimedOut | Monitors Lambda functions that timed out. |
| aws_states_provisioned_bucket_size | ProvisionedBucketSize | Tracks the provisioned bucket size for Step Functions. |
| aws_states_provisioned_refill_rate | ProvisionedRefillRate | Measures the rate at which provisioned capacity is refilled. |
| aws_states_service_integration_run_time | ServiceIntegrationRunTime | Measures the runtime of service integrations. |
| aws_states_service_integration_schedule_time | ServiceIntegrationScheduleTime | Tracks the schedule time for service integrations. |
| aws_states_service_integration_time | ServiceIntegrationTime | Monitors the total time taken by service integrations. |
| aws_states_service_integrations_failed | ServiceIntegrationsFailed | Tracks failed service integrations. |
| aws_states_service_integrations_scheduled | ServiceIntegrationsScheduled | Measures the number of service integrations that were scheduled. |
| aws_states_service_integrations_started | ServiceIntegrationsStarted | Tracks service integrations that have started. |
| aws_states_service_integrations_succeeded | ServiceIntegrationsSucceeded | Monitors successful service integrations. |
| aws_states_service_integrations_timed_out | ServiceIntegrationsTimedOut | Measures service integrations that timed out. |
| aws_states_throttled_events | ThrottledEvents | Tracks the number of events that were throttled. |

AWS/StorageGateway

Function: Hybrid cloud storage service connecting on-premises software appliances to AWS

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_storagegateway_info | | |
| aws_storagegateway_cache_free | CacheFree | Tracks the amount of free cache space in the gateway. |
| aws_storagegateway_cache_hit_percent | CacheHitPercent | Monitors the percentage of read operations served by the cache. |
| aws_storagegateway_cache_percent_dirty | CachePercentDirty | Measures the percentage of cache space that contains data that hasn’t been uploaded yet. |
| aws_storagegateway_cache_percent_used | CachePercentUsed | Tracks the percentage of used cache space. |
| aws_storagegateway_cache_used | CacheUsed | Measures the amount of cache space used. |
| aws_storagegateway_cloud_bytes_downloaded | CloudBytesDownloaded | Tracks the amount of data downloaded from AWS to the gateway. |
| aws_storagegateway_cloud_bytes_uploaded | CloudBytesUploaded | Measures the amount of data uploaded from the gateway to AWS. |
| aws_storagegateway_cloud_download_latency | CloudDownloadLatency | Tracks the latency experienced during downloads from AWS. |
| aws_storagegateway_queued_writes | QueuedWrites | Monitors the number of write operations queued in the gateway. |
| aws_storagegateway_read_bytes | ReadBytes | Tracks the amount of data read by the gateway. |
| aws_storagegateway_read_time | ReadTime | Measures the time spent on read operations. |
| aws_storagegateway_time_since_last_recovery_point | TimeSinceLastRecoveryPoint | Tracks the time since the last recovery point was created. |
| aws_storagegateway_total_cache_size | TotalCacheSize | Measures the total size of the cache. |
| aws_storagegateway_upload_buffer_free | UploadBufferFree | Tracks the amount of free space in the upload buffer. |
| aws_storagegateway_upload_buffer_percent_used | UploadBufferPercentUsed | Measures the percentage of the upload buffer that is used. |
| aws_storagegateway_upload_buffer_used | UploadBufferUsed | Monitors the amount of upload buffer space used. |
| aws_storagegateway_working_storage_free | WorkingStorageFree | Measures the amount of free working storage in the gateway. |
| aws_storagegateway_working_storage_percent_used | WorkingStoragePercentUsed | Tracks the percentage of working storage used. |
| aws_storagegateway_working_storage_used | WorkingStorageUsed | Monitors the amount of working storage used. |
| aws_storagegateway_write_bytes | WriteBytes | Monitors the amount of data written by the gateway. |
| aws_storagegateway_write_time | WriteTime | Tracks the time spent on write operations. |

AWS/Timestream

Function: Managed time series database for IoT and operational applications

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_timestream_data_scanned_bytes | DataScannedBytes | Tracks the total amount of data scanned by AWS Timestream during queries. |
| aws_timestream_successful_request_latency | SuccessfulRequestLatency | Measures the latency of successful requests sent to AWS Timestream. |
| aws_timestream_system_errors | SystemErrors | Monitors the number of system errors occurring in AWS Timestream. |
| aws_timestream_user_errors | UserErrors | Tracks the number of user-generated errors in AWS Timestream, such as invalid queries. |

AWS/TransitGateway

Function: Service for connecting VPCs and on-premises networks through a central hub

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_transitgateway_info | | |
| aws_transitgateway_bytes_in | BytesIn | Tracks the total number of bytes received by the Transit Gateway. |
| aws_transitgateway_bytes_out | BytesOut | Measures the total number of bytes sent from the Transit Gateway. |
| aws_transitgateway_packet_drop_count_blackhole | PacketDropCountBlackhole | Monitors the number of packets dropped due to blackholing (unreachable routes). |
| aws_transitgateway_packet_drop_count_no_route | PacketDropCountNoRoute | Tracks the number of packets dropped due to no matching route found. |
| aws_transitgateway_packets_in | PacketsIn | Measures the total number of packets received by the Transit Gateway. |
| aws_transitgateway_packets_out | PacketsOut | Tracks the total number of packets sent from the Transit Gateway. |

AWS/TrustedAdvisor

Function: Provides real-time recommendations to improve AWS resource optimization and security. This service publishes metrics only in specific AWS regions. Any job configured with this service gathers data only from the us-east-1 region.

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_trustedadvisor_green_checks | GreenChecks | Tracks the number of Trusted Advisor checks in the green (optimal) status. |
| aws_trustedadvisor_red_checks | RedChecks | Measures the number of Trusted Advisor checks that indicate critical issues (red status). |
| aws_trustedadvisor_red_resources | RedResources | Tracks the number of resources flagged as critical or failing (red status). |
| aws_trustedadvisor_service_limit_usage | ServiceLimitUsage | Monitors the usage of service limits based on Trusted Advisor service limit checks. |
| aws_trustedadvisor_yellow_checks | YellowChecks | Measures the number of checks that show warnings (yellow status). |
| aws_trustedadvisor_yellow_resources | YellowResources | Tracks the number of resources flagged as warnings or requiring attention (yellow status). |
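The region restriction noted for AWS/TrustedAdvisor can be pictured as a small lookup. This is only an illustrative model of the behavior described above, not how the scrape job is implemented: `SINGLE_REGION_SERVICES` and `effective_regions` are hypothetical names, and the assumption is that regions configured on the job are ignored for this service.

```python
from typing import Dict, Iterable, List

# Hypothetical table of services that are only scraped from one region.
# The single entry reflects the note in the AWS/TrustedAdvisor section.
SINGLE_REGION_SERVICES: Dict[str, str] = {
    "AWS/TrustedAdvisor": "us-east-1",
}


def effective_regions(namespace: str, configured: Iterable[str]) -> List[str]:
    """Return the regions a job would gather data from for one service."""
    pinned = SINGLE_REGION_SERVICES.get(namespace)
    if pinned is not None:
        # Assumption: the configured regions are ignored for pinned services.
        return [pinned]
    return list(configured)


if __name__ == "__main__":
    job_regions = ["eu-west-1", "us-west-2"]  # made-up job configuration
    print(effective_regions("AWS/TrustedAdvisor", job_regions))
    print(effective_regions("AWS/VPN", job_regions))
```

In practice this means a job scoped to other regions still reports Trusted Advisor data from us-east-1 only.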

AWS/Usage

Function: Tracks AWS service usage for cost monitoring and optimization

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_usage_call_count | CallCount | Tracks the number of API or service calls made. |
| aws_usage_resource_count | ResourceCount | Measures the number of resources in use or allocated in the AWS environment. |

AWS/VPN

Function: Managed VPN service to securely connect on-premises networks to AWS

Scrape interval: 5 minutes

Includes: Out-of-the-box dashboard

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_vpn_info | | |
| aws_vpn_tunnel_data_in | TunnelDataIn | Monitors the amount of inbound data being transferred through the VPN tunnel. Helps track network traffic. |
| aws_vpn_tunnel_data_out | TunnelDataOut | Tracks the amount of outbound data being transferred through the VPN tunnel. Useful for bandwidth monitoring. |
| aws_vpn_tunnel_state | TunnelState | Monitors the current status of the VPN tunnel (e.g., up or down). Helps in identifying tunnel connectivity issues. |

AWS/WAFV2

Function: Web application firewall to protect applications from common web exploits

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_wafv2_info | | |
| aws_wafv2_allowed_requests | AllowedRequests | Tracks the number of requests that are allowed by the WAF rules. Useful for monitoring legitimate traffic. |
| aws_wafv2_blocked_requests | BlockedRequests | Monitors the number of requests that are blocked by the WAF rules. Helps detect and prevent malicious traffic. |
| aws_wafv2_captcha_requests | CaptchaRequests | Tracks the number of requests that triggered a CAPTCHA challenge. Useful for tracking potential bot traffic. |
| aws_wafv2_captchas_attempted | CaptchasAttempted | Monitors the number of CAPTCHA challenges that were attempted by users. Indicates user engagement with challenges. |
| aws_wafv2_captchas_solved | CaptchasSolved | Tracks the number of CAPTCHA challenges successfully solved. Helps assess CAPTCHA effectiveness. |
| aws_wafv2_challenge_requests | ChallengeRequests | Monitors the number of requests that triggered additional security challenges. Useful for advanced threat detection. |
| aws_wafv2_counted_requests | CountedRequests | Tracks the number of requests counted for rule evaluation but not necessarily blocked or allowed. |
| aws_wafv2_passed_requests | PassedRequests | Monitors requests that passed through the challenge phase and were allowed access. |
| aws_wafv2_requests_with_valid_captcha_token | RequestsWithValidCaptchaToken | Tracks the number of requests with a valid CAPTCHA token. Useful for validating CAPTCHA implementation. |
| aws_wafv2_requests_with_valid_challenge_token | RequestsWithValidChallengeToken | Monitors the number of requests with valid security challenge tokens. Helps track successful security checks. |

AWS/WorkSpaces

Function: Managed desktop virtualization service for delivering cloud-based desktops

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_workspaces_info | | |
| aws_workspaces_available | Available | Monitors the number of available WorkSpaces. Useful for tracking the availability of WorkSpaces for users. |
| aws_workspaces_connection_attempt | ConnectionAttempt | Tracks the number of connection attempts to WorkSpaces. Helps monitor user access and demand. |
| aws_workspaces_connection_failure | ConnectionFailure | Monitors the number of failed connection attempts. Useful for identifying connectivity issues or failures. |
| aws_workspaces_connection_success | ConnectionSuccess | Tracks the number of successful connections to WorkSpaces. Indicates the success rate of user connections. |
| aws_workspaces_in_session_latency | InSessionLatency | Monitors the latency experienced by users during WorkSpaces sessions. Helps assess user experience quality. |
| aws_workspaces_maintenance | Maintenance | Tracks the number of WorkSpaces under maintenance. Useful for understanding maintenance impact on availability. |
| aws_workspaces_session_disconnect | SessionDisconnect | Monitors the number of session disconnections. Helps detect connectivity issues or user-initiated disconnects. |
| aws_workspaces_session_launch_time | SessionLaunchTime | Tracks the time taken to launch a WorkSpaces session. Useful for assessing the performance of WorkSpaces launches. |
| aws_workspaces_stopped | Stopped | Monitors the number of WorkSpaces that are in the stopped state. Helps track WorkSpaces that are not running. |
| aws_workspaces_unhealthy | Unhealthy | Tracks the number of unhealthy WorkSpaces. Useful for identifying potential issues with WorkSpaces health. |
| aws_workspaces_user_connected | UserConnected | Monitors the number of users currently connected to WorkSpaces. Helps measure active user engagement. |
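The connection metrics above are counts, so a success rate has to be derived from them. A minimal sketch of that arithmetic, assuming you already have the summed ConnectionSuccess and ConnectionAttempt values for the same time window (`connection_success_rate` is a hypothetical helper and the sample numbers are made up):

```python
from typing import Optional


def connection_success_rate(successes: float, attempts: float) -> Optional[float]:
    """Fraction of connection attempts that succeeded, or None without attempts."""
    if attempts <= 0:
        # No attempts in the window: the rate is undefined rather than zero.
        return None
    return successes / attempts


if __name__ == "__main__":
    # Made-up sums of ConnectionSuccess and ConnectionAttempt over one window.
    rate = connection_success_rate(successes=45, attempts=50)
    print(f"success rate: {rate:.0%}")
```

Returning `None` for an empty window keeps quiet periods from showing up as a 0% success rate.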

AmazonMWAA

Function: Managed service for Apache Airflow workflows in the cloud

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_amazonmwaa_info | | |
| aws_amazonmwaa_collect_dbdags | CollectDBDags | Monitors how often database DAGs are collected. |
| aws_amazonmwaa_critical_section_busy | CriticalSectionBusy | Tracks the time spent when critical sections of code are busy. |
| aws_amazonmwaa_critical_section_duration | CriticalSectionDuration | Measures the duration for which critical sections remain busy. |
| aws_amazonmwaa_critical_section_query_duration | CriticalSectionQueryDuration | Monitors the time spent querying within critical sections. |
| aws_amazonmwaa_dagdependency_check | DAGDependencyCheck | Monitors dependency checks between DAGs. |
| aws_amazonmwaa_dagduration_failed | DAGDurationFailed | Tracks the duration of failed DAG runs. |
| aws_amazonmwaa_dagduration_success | DAGDurationSuccess | Tracks the duration of successful DAG runs. |
| aws_amazonmwaa_dagfile_processing_last_duration | DAGFileProcessingLastDuration | Measures the last processing time for DAG files. |
| aws_amazonmwaa_dagfile_processing_last_run_seconds_ago | DAGFileProcessingLastRunSecondsAgo | Tracks the time since the last DAG file processing run. |
| aws_amazonmwaa_dagfile_refresh_error | DAGFileRefreshError | Monitors errors in refreshing DAG files. |
| aws_amazonmwaa_dagschedule_delay | DAGScheduleDelay | Monitors delays in DAG scheduling. |
| aws_amazonmwaa_dag_bag_size | DagBagSize | Tracks the size of the DAG bag. |
| aws_amazonmwaa_dag_callback_exceptions | DagCallbackExceptions | Monitors exceptions occurring in DAG callbacks. |
| aws_amazonmwaa_exception_failures | ExceptionFailures | Tracks the number of exception failures. |
| aws_amazonmwaa_executed_tasks | ExecutedTasks | Tracks the total number of executed tasks. |
| aws_amazonmwaa_failed_celery_task_execution | FailedCeleryTaskExecution | Monitors failed task executions in Celery. |
| aws_amazonmwaa_failed_slacallback | FailedSLACallback | Tracks failures in SLA callbacks. |
| aws_amazonmwaa_failed_slaemail_attempts | FailedSLAEmailAttempts | Monitors failed attempts to send SLA emails. |
| aws_amazonmwaa_file_path_queue_update_count | FilePathQueueUpdateCount | Tracks the number of file path queue updates. |
| aws_amazonmwaa_first_task_scheduling_delay | FirstTaskSchedulingDelay | Measures the delay in scheduling the first task. |
| aws_amazonmwaa_import_errors | ImportErrors | Monitors errors encountered during imports. |
| aws_amazonmwaa_infra_failures | InfraFailures | Tracks infrastructure failures in the environment. |
| aws_amazonmwaa_job_end | JobEnd | Monitors the number of jobs completed. |
| aws_amazonmwaa_job_heartbeat_failure | JobHeartbeatFailure | Tracks heartbeat failures for jobs. |
| aws_amazonmwaa_job_start | JobStart | Monitors the number of jobs started. |
| aws_amazonmwaa_loaded_tasks | LoadedTasks | Tracks the number of tasks loaded in the environment. |
| aws_amazonmwaa_manager_stalls | ManagerStalls | Monitors the number of times the manager process stalls. |
| aws_amazonmwaa_open_slots | OpenSlots | Tracks the number of open task slots. |
| aws_amazonmwaa_operator_failures | OperatorFailures | Tracks the number of operator task failures. |
| aws_amazonmwaa_operator_successes | OperatorSuccesses | Tracks the number of operator task successes. |
| aws_amazonmwaa_orphaned | Orphaned | Monitors orphaned task instances. |
| aws_amazonmwaa_orphaned_tasks_adopted | OrphanedTasksAdopted | Tracks the number of orphaned tasks adopted. |
| aws_amazonmwaa_orphaned_tasks_cleared | OrphanedTasksCleared | Tracks the number of orphaned tasks cleared. |
| aws_amazonmwaa_other_callback_count | OtherCallbackCount | Tracks the number of other callbacks occurring in the environment. |
| aws_amazonmwaa_poked_exceptions | PokedExceptions | Monitors the number of exceptions in poked tasks. |
| aws_amazonmwaa_poked_success | PokedSuccess | Tracks successful pokes in tasks. |
| aws_amazonmwaa_poked_tasks | PokedTasks | Tracks the number of poked tasks. |
| aws_amazonmwaa_pool_deferred_slots | PoolDeferredSlots | Tracks deferred slots in task pools. |
| aws_amazonmwaa_pool_failures | PoolFailures | Monitors the number of task pool failures. |
| aws_amazonmwaa_pool_open_slots | PoolOpenSlots | Tracks the number of open slots in the task pool. |
| aws_amazonmwaa_pool_queued_slots | PoolQueuedSlots | Tracks the number of queued slots in the task pool. |
| aws_amazonmwaa_pool_running_slots | PoolRunningSlots | Tracks the number of running slots in the task pool. |
| aws_amazonmwaa_pool_starving_tasks | PoolStarvingTasks | Tracks tasks that are starving for resources in the task pool. |
| aws_amazonmwaa_processes | Processes | Tracks the number of processes running in the environment. |
| aws_amazonmwaa_processor_timeouts | ProcessorTimeouts | Monitors timeouts in processors. |
| aws_amazonmwaa_queued_tasks | QueuedTasks | Tracks the number of tasks in the queue. |
| aws_amazonmwaa_running_tasks | RunningTasks | Tracks the number of running tasks in the environment. |
| aws_amazonmwaa_slamissed | SLAMissed | Tracks the number of SLA misses in tasks. |
| aws_amazonmwaa_scheduler_heartbeat | SchedulerHeartbeat | Monitors the health of the scheduler through its heartbeat. |
| aws_amazonmwaa_scheduler_loop_duration | SchedulerLoopDuration | Measures the duration of scheduler loops. |
| aws_amazonmwaa_sla_callback_count | SlaCallbackCount | Tracks the number of SLA callbacks made. |
| aws_amazonmwaa_started_task_instances | StartedTaskInstances | Monitors the number of started task instances. |
| aws_amazonmwaa_task_instance_created_using_operator | TaskInstanceCreatedUsingOperator | Tracks the number of task instances created using an operator. |
| aws_amazonmwaa_task_instance_duration | TaskInstanceDuration | Monitors the duration of task instances. |
| aws_amazonmwaa_task_instance_failures | TaskInstanceFailures | Tracks the number of task instance failures. |
| aws_amazonmwaa_task_instance_finished | TaskInstanceFinished | Monitors the number of task instances that have finished. |
| aws_amazonmwaa_task_instance_previously_succeeded | TaskInstancePreviouslySucceeded | Tracks the number of task instances that have previously succeeded. |
| aws_amazonmwaa_task_instance_queued_duration | TaskInstanceQueuedDuration | Measures the time task instances spend in the queue before execution. |
| aws_amazonmwaa_task_instance_scheduled_duration | TaskInstanceScheduledDuration | Tracks the duration of time task instances were scheduled. |
| aws_amazonmwaa_task_instance_successes | TaskInstanceSuccesses | Tracks the number of successful task instances. |
| aws_amazonmwaa_task_removed_from_dag | TaskRemovedFromDAG | Monitors tasks that were removed from the DAG. |
| aws_amazonmwaa_task_restored_to_dag | TaskRestoredToDAG | Tracks tasks that were restored to the DAG. |
| aws_amazonmwaa_task_timeout_error | TaskTimeoutError | Monitors timeout errors in tasks. |
| aws_amazonmwaa_tasks_executable | TasksExecutable | Tracks the number of executable tasks. |
| aws_amazonmwaa_tasks_killed_externally | TasksKilledExternally | Tracks tasks that were killed externally. |
| aws_amazonmwaa_tasks_pending | TasksPending | Monitors pending tasks. |
| aws_amazonmwaa_tasks_running | TasksRunning | Tracks the number of tasks currently running. |
| aws_amazonmwaa_tasks_starving | TasksStarving | Tracks the number of tasks starving for resources. |
| aws_amazonmwaa_tasks_without_dag_run | TasksWithoutDagRun | Tracks tasks that are not associated with any DAG run. |
| aws_amazonmwaa_total_parse_time | TotalParseTime | Measures the total time spent parsing DAG files. |
| aws_amazonmwaa_trigger_heartbeat | TriggerHeartbeat | Tracks the heartbeat of task triggers. |
| aws_amazonmwaa_triggered_dag_runs | TriggeredDagRuns | Monitors the number of DAG runs triggered. |
| aws_amazonmwaa_triggers_blocked_main_thread | TriggersBlockedMainThread | Tracks the number of triggers that block the main thread. |
| aws_amazonmwaa_triggers_failed | TriggersFailed | Monitors failed task triggers. |
| aws_amazonmwaa_triggers_running | TriggersRunning | Tracks the number of running task triggers. |
| aws_amazonmwaa_triggers_succeeded | TriggersSucceeded | Monitors successful task triggers. |
| aws_amazonmwaa_updates | Updates | Tracks the number of updates made to DAGs and other configurations. |
| aws_amazonmwaa_zombies_killed | ZombiesKilled | Monitors the number of zombie tasks killed in the environment. |

ECS/ContainerInsights

Function: Provides monitoring and insights for ECS clusters, tasks, and containers

Scrape interval: 5 minutes

| Metric | CloudWatch metric | Purpose |
| --- | --- | --- |
| aws_ecs_containerinsights_info | | |
| aws_ecs_containerinsights_container_instance_count | ContainerInstanceCount | Tracks the number of container instances in a cluster. |
| aws_ecs_containerinsights_cpu_reserved | CpuReserved | Monitors the amount of CPU reserved for tasks. |
| aws_ecs_containerinsights_cpu_utilized | CpuUtilized | Tracks the CPU utilization of running tasks. |
| aws_ecs_containerinsights_deployment_count | DeploymentCount | Measures the number of service deployments. |
| aws_ecs_containerinsights_desired_task_count | DesiredTaskCount | Monitors the desired number of running tasks in a service. |
| aws_ecs_containerinsights_ebsfilesystem_size | EBSFilesystemSize | Tracks the size of the EBS filesystem attached to the ECS instance. |
| aws_ecs_containerinsights_ebsfilesystem_utilized | EBSFilesystemUtilized | Monitors the utilized space in the EBS filesystem. |
| aws_ecs_containerinsights_ephemeral_storage_reserved | EphemeralStorageReserved | Measures the amount of reserved ephemeral storage for tasks. |
| aws_ecs_containerinsights_ephemeral_storage_utilized | EphemeralStorageUtilized | Tracks the ephemeral storage utilized by tasks. |
| aws_ecs_containerinsights_memory_reserved | MemoryReserved | Monitors the amount of memory reserved for tasks in ECS. |
| aws_ecs_containerinsights_memory_utilized | MemoryUtilized | Measures the memory utilized by tasks. |
| aws_ecs_containerinsights_network_rx_bytes | NetworkRxBytes | Tracks the number of bytes received by the network interfaces on the instance. |
| aws_ecs_containerinsights_network_tx_bytes | NetworkTxBytes | Monitors the number of bytes transmitted from the network interfaces on the instance. |
| aws_ecs_containerinsights_pending_task_count | PendingTaskCount | Monitors the number of tasks that are in the pending state in the service. |
| aws_ecs_containerinsights_running_task_count | RunningTaskCount | Tracks the number of running tasks in the service. |
| aws_ecs_containerinsights_service_count | ServiceCount | Monitors the number of services running in the cluster. |
| aws_ecs_containerinsights_storage_read_bytes | StorageReadBytes | Tracks the number of bytes read from the storage attached to the ECS instance. |
| aws_ecs_containerinsights_storage_write_bytes | StorageWriteBytes | Measures the number of bytes written to storage. |
| aws_ecs_containerinsights_task_count | TaskCount | Monitors the total number of tasks running in the ECS cluster. |
| aws_ecs_containerinsights_task_set_count | TaskSetCount | Measures the number of task sets in a service. |
| aws_ecs_containerinsights_instance_cpu_limit | instance_cpu_limit | Tracks the total CPU limit configured for the instance. |
| aws_ecs_containerinsights_instance_cpu_reserved_capacity | instance_cpu_reserved_capacity | Measures the reserved CPU capacity on the instance. |
| aws_ecs_containerinsights_instance_cpu_usage_total | instance_cpu_usage_total | Tracks the total CPU usage across all tasks on the instance. |
| aws_ecs_containerinsights_instance_cpu_utilization | instance_cpu_utilization | Monitors the percentage of CPU utilization on the ECS instance. |
| aws_ecs_containerinsights_instance_filesystem_utilization | instance_filesystem_utilization | Tracks the utilization of the filesystem attached to the ECS instance. |
| aws_ecs_containerinsights_instance_memory_limit | instance_memory_limit | Measures the total memory limit configured for the instance. |
| aws_ecs_containerinsights_instance_memory_reserved_capacity | instance_memory_reserved_capacity | Tracks the reserved memory capacity on the instance. |
| aws_ecs_containerinsights_instance_memory_utilization | instance_memory_utilization | Monitors the percentage of memory utilization on the ECS instance. |
| aws_ecs_containerinsights_instance_memory_working_set | instance_memory_working_set | Measures the working set memory on the instance, which is the amount of memory actively used. |
| aws_ecs_containerinsights_instance_network_total_bytes | instance_network_total_bytes | Tracks the total number of bytes transferred (both received and transmitted) by the network interfaces. |
| aws_ecs_containerinsights_instance_number_of_running_tasks | instance_number_of_running_tasks | Monitors the total number of running tasks on the instance. |
| aws_ecs_containerinsights_instance_memory_utliization | instance_memory_utliization | Measures the memory utilization of the instance. |
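The task-level metrics above come in utilized/reserved pairs, and a utilization percentage is commonly derived by dividing one by the other. The sketch below shows that arithmetic under stated assumptions: `utilization_percent` and `PAIRS` are hypothetical names, the metric names are the ones listed in the table, and both values are assumed to share the same statistic, dimensions, and time window.

```python
from typing import Dict, Mapping, Optional, Tuple

# (utilized, reserved) metric names, copied from the table above.
PAIRS: Dict[str, Tuple[str, str]] = {
    "cpu": (
        "aws_ecs_containerinsights_cpu_utilized",
        "aws_ecs_containerinsights_cpu_reserved",
    ),
    "memory": (
        "aws_ecs_containerinsights_memory_utilized",
        "aws_ecs_containerinsights_memory_reserved",
    ),
}


def utilization_percent(samples: Mapping[str, float], resource: str) -> Optional[float]:
    """Utilized as a percentage of reserved, or None if it cannot be computed."""
    utilized_name, reserved_name = PAIRS[resource]
    utilized = samples.get(utilized_name)
    reserved = samples.get(reserved_name)
    if utilized is None or reserved is None or reserved <= 0:
        return None
    return 100.0 * utilized / reserved


if __name__ == "__main__":
    # Made-up sample values for a single service and time window.
    samples = {
        "aws_ecs_containerinsights_cpu_utilized": 256.0,
        "aws_ecs_containerinsights_cpu_reserved": 1024.0,
    }
    print(utilization_percent(samples, "cpu"))
    print(utilization_percent(samples, "memory"))
```

Missing or zero reservations yield `None` so that a service without a reservation does not raise a division error or report a misleading percentage.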