Alloy Health integration for Grafana Cloud
The Alloy health integration lets you monitor metrics and logs of Alloy instances.
It can be seen as an extended version of the Alloy Collector monitoring app available under the Connections menu, providing deeper insight into your Alloy deployment environment: dashboards for Alloy clusters, OTel-specific components, resource utilization, logs, and more, together with bundled alerts.
This integration includes 10 useful alerts and 7 pre-built dashboards to help monitor and visualize Alloy Health metrics and logs.
Before you begin
This integration relies on metrics emitted by Grafana Alloy. See the following sections for details.
Install Alloy Health integration for Grafana Cloud
- In your Grafana Cloud stack, click Connections in the left-hand menu.
- Find Alloy Health and click its tile to open the integration.
- Review the prerequisites in the Configuration Details tab and set up Grafana Alloy to send Alloy Health metrics and logs to your Grafana Cloud instance.
- Click Install to add this integration’s pre-built dashboards and alerts to your Grafana Cloud instance, and start monitoring your Alloy Health setup.
Configuration snippets for Grafana Alloy
Simple mode
These snippets are configured to scrape a single Grafana Alloy instance running locally with default ports and log paths.
Manually copy and append the following snippets into your Grafana Alloy configuration file.
You can remove any component named alloy_check from your Grafana Alloy configuration file to avoid duplicating metrics.
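The snippets below forward telemetry to components named prometheus.remote_write.metrics_service and loki.write.grafana_cloud_loki, which are typically created when you configure Grafana Alloy to send data to Grafana Cloud. If your configuration does not define them yet, the following is a minimal sketch; the endpoint URLs and credentials are placeholders you must replace with the values for your stack.

prometheus.remote_write "metrics_service" {
  endpoint {
    url = "https://<your-prometheus-push-endpoint>/api/prom/push"

    basic_auth {
      // Placeholders: use your metrics instance ID and an access policy token.
      username = "<your-metrics-instance-id>"
      password = "<your-cloud-access-policy-token>"
    }
  }
}

loki.write "grafana_cloud_loki" {
  endpoint {
    url = "https://<your-loki-push-endpoint>/loki/api/v1/push"

    basic_auth {
      // Placeholders: use your logs instance ID and an access policy token.
      username = "<your-logs-instance-id>"
      password = "<your-cloud-access-policy-token>"
    }
  }
}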
Integrations snippets
prometheus.exporter.self "integrations_alloy_health" { }

discovery.relabel "integrations_alloy_health" {
  targets = prometheus.exporter.self.integrations_alloy_health.targets

  rule {
    replacement  = constants.hostname
    target_label = "instance"
  }

  rule {
    target_label = "job"
    replacement  = "integrations/alloy"
  }
}

prometheus.scrape "integrations_alloy_health" {
  targets    = discovery.relabel.integrations_alloy_health.output
  forward_to = [prometheus.remote_write.metrics_service.receiver]
  job_name   = "integrations/alloy"
}
Logs snippets
darwin
logging {
  write_to = [loki.process.logs_integrations_integrations_alloy_health.receiver]
}

loki.process "logs_integrations_integrations_alloy_health" {
  forward_to = [loki.relabel.logs_integrations_integrations_alloy_health.receiver]

  stage.regex {
    expression = "(level=(?P<log_level>[\\s]*debug|warn|info|error))"
  }

  stage.labels {
    values = {
      level = "log_level",
    }
  }
}

loki.relabel "logs_integrations_integrations_alloy_health" {
  forward_to = [loki.write.grafana_cloud_loki.receiver]

  rule {
    replacement  = constants.hostname
    target_label = "instance"
  }

  rule {
    target_label = "job"
    replacement  = "integrations/alloy"
  }
}
linux
logging {
  write_to = [loki.process.logs_integrations_integrations_alloy_health.receiver]
}

loki.process "logs_integrations_integrations_alloy_health" {
  forward_to = [loki.relabel.logs_integrations_integrations_alloy_health.receiver]

  stage.regex {
    expression = "(level=(?P<log_level>[\\s]*debug|warn|info|error))"
  }

  stage.labels {
    values = {
      level = "log_level",
    }
  }
}

loki.relabel "logs_integrations_integrations_alloy_health" {
  forward_to = [loki.write.grafana_cloud_loki.receiver]

  rule {
    replacement  = constants.hostname
    target_label = "instance"
  }

  rule {
    target_label = "job"
    replacement  = "integrations/alloy"
  }
}
windows
logging {
  write_to = [loki.process.logs_integrations_integrations_alloy_health.receiver]
}

loki.process "logs_integrations_integrations_alloy_health" {
  forward_to = [loki.relabel.logs_integrations_integrations_alloy_health.receiver]

  stage.regex {
    expression = "(level=(?P<log_level>[\\s]*debug|warn|info|error))"
  }

  stage.labels {
    values = {
      level = "log_level",
    }
  }
}

loki.relabel "logs_integrations_integrations_alloy_health" {
  forward_to = [loki.write.grafana_cloud_loki.receiver]

  rule {
    replacement  = constants.hostname
    target_label = "instance"
  }

  rule {
    target_label = "job"
    replacement  = "integrations/alloy"
  }
}
Advanced mode
The following snippets provide examples to guide you through the configuration process.
Manually copy and append the snippets to your Grafana Alloy configuration file, then follow the subsequent instructions.
You can remove any component named alloy_check from your Grafana Alloy configuration file to avoid duplicating metrics.
Advanced integrations snippets
prometheus.exporter.self "integrations_alloy_health" { }

discovery.relabel "integrations_alloy_health" {
  targets = prometheus.exporter.self.integrations_alloy_health.targets

  rule {
    replacement  = constants.hostname
    target_label = "instance"
  }

  rule {
    target_label = "job"
    replacement  = "integrations/alloy"
  }
}

prometheus.scrape "integrations_alloy_health" {
  targets    = discovery.relabel.integrations_alloy_health.output
  forward_to = [prometheus.remote_write.metrics_service.receiver]
  job_name   = "integrations/alloy"
}
This integration uses the prometheus.exporter.self component to generate metrics from the Alloy instance itself.
For the full array of configuration options, refer to the prometheus.exporter.self component reference documentation.
This exporter must be linked with a discovery.relabel component to apply the necessary relabeling rules.
Configure the following property within the discovery.relabel component:
- instance label: constants.hostname sets the instance label to your Grafana Alloy server hostname. If that is not suitable, change it to a value that uniquely identifies this Alloy instance. If you are running Grafana Alloy in cluster mode, you may need to change the instance label, because multiple instances might be running on the same server.
You can then scrape the relabeled targets by including each discovery.relabel output under targets within the prometheus.scrape component.
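For example, if several Alloy instances run on the same host (such as in cluster mode), one possible approach is to read a unique identifier from an environment variable instead of using constants.hostname. The following sketch assumes a hypothetical ALLOY_INSTANCE_NAME variable set for each Alloy process and uses the sys.env function from the Alloy standard library.

discovery.relabel "integrations_alloy_health" {
  targets = prometheus.exporter.self.integrations_alloy_health.targets

  rule {
    // Hypothetical: each Alloy process is started with a unique
    // ALLOY_INSTANCE_NAME value, for example "alloy-01" or a pod name.
    replacement  = sys.env("ALLOY_INSTANCE_NAME")
    target_label = "instance"
  }

  rule {
    target_label = "job"
    replacement  = "integrations/alloy"
  }
}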
Advanced logs snippets
darwin
logging {
  write_to = [loki.process.logs_integrations_integrations_alloy_health.receiver]
}

loki.process "logs_integrations_integrations_alloy_health" {
  forward_to = [loki.relabel.logs_integrations_integrations_alloy_health.receiver]

  stage.regex {
    expression = "(level=(?P<log_level>[\\s]*debug|warn|info|error))"
  }

  stage.labels {
    values = {
      level = "log_level",
    }
  }
}

loki.relabel "logs_integrations_integrations_alloy_health" {
  forward_to = [loki.write.grafana_cloud_loki.receiver]

  rule {
    replacement  = constants.hostname
    target_label = "instance"
  }

  rule {
    target_label = "job"
    replacement  = "integrations/alloy"
  }
}
This integration uses the logging configuration block to collect Alloy's own logs internally and direct them to a loki.process and a loki.relabel component, which set the necessary labels.
Note that the logging configuration block is also used to set the logging level and format you want Alloy to use. If you already have such a block in your configuration file, add the write_to property to it, because only one logging block can be present.
If you have installed a single Grafana Alloy instance on your server, the snippet is ready to use. Otherwise, configure the following property within the loki.relabel component:
- instance label: constants.hostname sets the instance label to your Grafana Alloy server hostname. If that is not suitable, change it to a value that uniquely identifies this Alloy instance. If you are running Grafana Alloy in cluster mode, you may need to change the instance label, because multiple instances might be running on the same server.
You can check the documentation linked above for the full array of options for each component.
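For example, if your configuration already sets the logging level and format, extend the existing logging block with the write_to entry rather than adding a second block; the level and format values below are illustrative.

logging {
  // Keep whatever level and format you already use; only write_to is new.
  level    = "info"
  format   = "logfmt"
  write_to = [loki.process.logs_integrations_integrations_alloy_health.receiver]
}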
linux
logging {
  write_to = [loki.process.logs_integrations_integrations_alloy_health.receiver]
}

loki.process "logs_integrations_integrations_alloy_health" {
  forward_to = [loki.relabel.logs_integrations_integrations_alloy_health.receiver]

  stage.regex {
    expression = "(level=(?P<log_level>[\\s]*debug|warn|info|error))"
  }

  stage.labels {
    values = {
      level = "log_level",
    }
  }
}

loki.relabel "logs_integrations_integrations_alloy_health" {
  forward_to = [loki.write.grafana_cloud_loki.receiver]

  rule {
    replacement  = constants.hostname
    target_label = "instance"
  }

  rule {
    target_label = "job"
    replacement  = "integrations/alloy"
  }
}
This integration uses the logging configuration block to collect Alloy's own logs internally and direct them to a loki.process and a loki.relabel component, which set the necessary labels.
Note that the logging configuration block is also used to set the logging level and format you want Alloy to use. If you already have such a block in your configuration file, add the write_to property to it, because only one logging block can be present.
If you have installed a single Grafana Alloy instance on your server, the snippet is ready to use. Otherwise, configure the following property within the loki.relabel component:
- instance label: constants.hostname sets the instance label to your Grafana Alloy server hostname. If that is not suitable, change it to a value that uniquely identifies this Alloy instance. If you are running Grafana Alloy in cluster mode, you may need to change the instance label, because multiple instances might be running on the same server.
You can check the documentation linked above for the full array of options for each component.
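If more than one Alloy instance runs on the same host, one option is to set a unique identifier per instance in place of constants.hostname in the loki.relabel component. The "alloy-01" value below is a hypothetical example.

loki.relabel "logs_integrations_integrations_alloy_health" {
  forward_to = [loki.write.grafana_cloud_loki.receiver]

  rule {
    // Hypothetical fixed identifier for this particular Alloy instance.
    replacement  = "alloy-01"
    target_label = "instance"
  }

  rule {
    target_label = "job"
    replacement  = "integrations/alloy"
  }
}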
windows
logging {
  write_to = [loki.process.logs_integrations_integrations_alloy_health.receiver]
}

loki.process "logs_integrations_integrations_alloy_health" {
  forward_to = [loki.relabel.logs_integrations_integrations_alloy_health.receiver]

  stage.regex {
    expression = "(level=(?P<log_level>[\\s]*debug|warn|info|error))"
  }

  stage.labels {
    values = {
      level = "log_level",
    }
  }
}

loki.relabel "logs_integrations_integrations_alloy_health" {
  forward_to = [loki.write.grafana_cloud_loki.receiver]

  rule {
    replacement  = constants.hostname
    target_label = "instance"
  }

  rule {
    target_label = "job"
    replacement  = "integrations/alloy"
  }
}
This integration uses the logging configuration block to collect Alloy's own logs internally and direct them to a loki.process and a loki.relabel component, which set the necessary labels.
Note that the logging configuration block is also used to set the logging level and format you want Alloy to use. If you already have such a block in your configuration file, add the write_to property to it, because only one logging block can be present.
If you have installed a single Grafana Alloy instance on your server, the snippet is ready to use. Otherwise, configure the following property within the loki.relabel component:
- instance label: constants.hostname sets the instance label to your Grafana Alloy server hostname. If that is not suitable, change it to a value that uniquely identifies this Alloy instance. If you are running Grafana Alloy in cluster mode, you may need to change the instance label, because multiple instances might be running on the same server.
You can check the documentation linked above for the full array of options for each component.
Dashboards
The Alloy Health integration installs the following dashboards in your Grafana Cloud instance to help monitor your system.
- Alloy / Cluster Node
- Alloy / Cluster Overview
- Alloy / Controller
- Alloy / Logs Overview
- Alloy / OpenTelemetry
- Alloy / Prometheus Components
- Alloy / Resources
Alloy resources usage overview
Prometheus-related components overview
Overall components overview
Alerts
The Alloy Health integration includes the following useful alerts:
alloy_clustering
| Alert | Description |
|---|---|
| ClusterNotConverging | Warning: Cluster is not converging. |
| ClusterNodeCountMismatch | Warning: Nodes report a different number of peers than the count of observed Alloy metrics. |
| ClusterNodeUnhealthy | Warning: Cluster unhealthy. |
| ClusterNodeNameConflict | Warning: Cluster node name conflict. |
| ClusterNodeStuckTerminating | Warning: Cluster node stuck in Terminating state. |
| ClusterConfigurationDrift | Warning: Cluster configuration drifting. |
alloy_controller
| Alert | Description |
|---|---|
| SlowComponentEvaluations | Warning: Component evaluations are taking too long. |
| UnhealthyComponents | Warning: Unhealthy components detected. |
alloy_otelcol
| Alert | Description |
|---|---|
| OtelcolReceiverRefusedSpans | Warning: The success rate of the receiver pushing spans to the pipeline is below 95%. |
| OtelcolExporterFailedSpans | Warning: The success rate of the exporter sending spans is below 95%. |
Metrics
The most important metrics provided by the Alloy Health integration, which are used on the pre-built dashboards and Prometheus alerts, are as follows:
- alloy_build_info
- alloy_component_controller_running_components
- alloy_component_dependencies_wait_seconds
- alloy_component_dependencies_wait_seconds_bucket
- alloy_component_evaluation_seconds
- alloy_component_evaluation_seconds_bucket
- alloy_component_evaluation_seconds_count
- alloy_component_evaluation_seconds_sum
- alloy_component_evaluation_slow_seconds
- alloy_config_hash
- alloy_resources_machine_rx_bytes_total
- alloy_resources_machine_tx_bytes_total
- alloy_resources_process_cpu_seconds_total
- alloy_resources_process_resident_memory_bytes
- cluster_node_gossip_health_score
- cluster_node_gossip_proto_version
- cluster_node_gossip_received_events_total
- cluster_node_info
- cluster_node_lamport_time
- cluster_node_peers
- cluster_node_update_observers
- cluster_transport_rx_bytes_total
- cluster_transport_rx_packet_queue_length
- cluster_transport_rx_packets_failed_total
- cluster_transport_rx_packets_total
- cluster_transport_stream_rx_bytes_total
- cluster_transport_stream_rx_packets_failed_total
- cluster_transport_stream_rx_packets_total
- cluster_transport_stream_tx_bytes_total
- cluster_transport_stream_tx_packets_failed_total
- cluster_transport_stream_tx_packets_total
- cluster_transport_streams
- cluster_transport_tx_bytes_total
- cluster_transport_tx_packet_queue_length
- cluster_transport_tx_packets_failed_total
- cluster_transport_tx_packets_total
- go_gc_duration_seconds_count
- go_goroutines
- go_memstats_heap_inuse_bytes
- otelcol_exporter_send_failed_spans_total
- otelcol_exporter_sent_spans_total
- otelcol_processor_batch_batch_send_size_bucket
- otelcol_processor_batch_metadata_cardinality
- otelcol_processor_batch_timeout_trigger_send_total
- otelcol_receiver_accepted_spans_total
- otelcol_receiver_refused_spans_total
- prometheus_remote_storage_bytes_total
- prometheus_remote_storage_highest_timestamp_in_seconds
- prometheus_remote_storage_metadata_bytes_total
- prometheus_remote_storage_queue_highest_sent_timestamp_seconds
- prometheus_remote_storage_samples_failed_total
- prometheus_remote_storage_samples_retried_total
- prometheus_remote_storage_samples_total
- prometheus_remote_storage_sent_batch_duration_seconds_bucket
- prometheus_remote_storage_sent_batch_duration_seconds_count
- prometheus_remote_storage_sent_batch_duration_seconds_sum
- prometheus_remote_storage_shards
- prometheus_remote_storage_shards_max
- prometheus_remote_storage_shards_min
- prometheus_remote_write_wal_samples_appended_total
- prometheus_remote_write_wal_storage_active_series
- rpc_server_duration_milliseconds_bucket
- scrape_duration_seconds
- up
Changelog
# 1.0.2 - November 2024
* Update mixin to the latest version
* Include `cluster_name` label into cluster related alerts to avoid false positives.
* OTel metric renames
# 1.0.1 - November 2024
* Update status panel check queries
# 1.0.0 - July 2024
* Initial release
Cost
By connecting your Alloy Health instance to Grafana Cloud, you might incur charges. To view information on the number of active series that your Grafana Cloud account uses for metrics included in each Cloud tier, see Active series and DPM usage and Cloud tier pricing.