Cilium Enterprise integration for Grafana Cloud
The Cilium Enterprise integration uses Grafana Alloy to collect metrics exposed by the Cilium Operator, the Cilium Agent and its components, and Hubble. Dashboards are provided both for high-level overviews and for individual components. This integration includes 18 useful alerts and 20 pre-built dashboards to help monitor and visualize Cilium Enterprise metrics.
Kubernetes instructions
Before you begin with Kubernetes
Note: These instructions assume that you use the Kubernetes Monitoring Helm chart.
This integration monitors a Cilium Enterprise and Hubble Enterprise deployment that has metrics exporters enabled. Ensure you have completed the following setup steps:
- Enabled the embedded Prometheus exporter in your Cilium deployment to collect and expose metrics.
- Enabled the embedded Prometheus exporter in Hubble if you want Hubble metrics to be included.
Once the exporters have been enabled, the metrics will be automatically exposed and available for collection by either Prometheus or Grafana Alloy deployed to your cluster.
This integration assumes Hubble metrics have been enabled for:
- dns
- drop
- tcp
- flow
- icmp
- http
For example, you can enable them with a Helm command similar to the following, adjusted for Cilium Enterprise:
helm install <cilium-enterprise-repository> --version 1.12.2 \
  --namespace kube-system \
  --set hubble.metrics.enabled="{dns,drop,tcp,flow,icmp,http}"
Cilium version 1.12.2 and greater is supported.
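The same settings can also be kept in a Helm values file instead of `--set` flags. The following is a minimal sketch, not a definitive configuration: the `prometheus.enabled` and `operator.prometheus.enabled` keys are taken from the open source Cilium Helm chart and are an assumption for the Cilium Enterprise chart, so check them against the chart version you deploy.

```yaml
# values.yaml sketch. Key names follow the open source Cilium Helm chart;
# treat them as an assumption for the Cilium Enterprise chart.
prometheus:
  enabled: true        # expose Cilium Agent metrics
operator:
  prometheus:
    enabled: true      # expose Cilium Operator metrics
hubble:
  metrics:
    enabled:           # the Hubble metrics this integration assumes
      - dns
      - drop
      - tcp
      - flow
      - icmp
      - http
```

A values file like this is typically passed to Helm with the `-f` flag, which keeps the enabled metrics under version control.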
Configuration snippets for Kubernetes Helm chart
The following snippets provide examples to guide you through the configuration process.
To scrape your Cilium Enterprise instances, manually modify your Kubernetes Monitoring Helm chart with these configuration snippets.
Replace any values between the angle brackets <> in the provided snippets with your desired configuration values.
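The scrape components in the metrics snippet forward to `prometheus.remote_write.grafana_cloud_metrics.receiver`, so the surrounding values file needs a metrics destination that produces that component. The sketch below shows roughly where the snippet sits in a Kubernetes Monitoring Helm chart values file. It assumes the version 2 layout of the chart and that a destination named `grafana-cloud-metrics` maps to that component name; both are assumptions to verify against your chart version, and all values in angle brackets are placeholders.

```yaml
# Sketch of a Kubernetes Monitoring Helm chart values file (assumed v2 layout).
cluster:
  name: <your-cluster-name>

destinations:
  # Assumption: this destination name is what backs the
  # prometheus.remote_write.grafana_cloud_metrics component.
  - name: grafana-cloud-metrics
    type: prometheus
    url: <your-prometheus-remote-write-url>
    auth:
      type: basic
      username: <your-metrics-username>
      password: <your-access-token>

alloy-metrics:
  enabled: true
  extraConfig: |-
    // Paste the Cilium Enterprise scrape configuration here.
```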
Metrics snippets
# Replace any values between the angle brackets '<>' with your desired configuration
alloy-metrics:
  extraConfig: |-
    // Cilium Agent
    discovery.kubernetes "cilium_agent" {
      role = "service"
      selectors {
        role = "service"
        label = "k8s-app=cilium"
      }
    }
    discovery.relabel "cilium_agent" {
      targets = discovery.kubernetes.cilium_agent.targets
      rule {
        source_labels = ["__meta_kubernetes_endpoint_port_name"]
        regex = "metrics"
        action = "keep"
      }
      rule {
        source_labels = ["__meta_kubernetes_service_label_k8s_app"]
        target_label = "k8s_app"
      }
    }
    prometheus.scrape "cilium_agent" {
      targets = discovery.relabel.cilium_agent.output
      job_name = "integrations/cilium-enterprise/cilium-agent"
      honor_labels = true
      forward_to = [prometheus.remote_write.grafana_cloud_metrics.receiver]
    }

    // Cilium Operator
    discovery.kubernetes "cilium_operator" {
      role = "service"
      selectors {
        role = "service"
        label = "name=cilium-operator,io.cilium/app=operator"
      }
    }
    discovery.relabel "cilium_operator" {
      targets = discovery.kubernetes.cilium_operator.targets
      rule {
        source_labels = ["__meta_kubernetes_endpoint_port_name"]
        regex = "metrics"
        action = "keep"
      }
      rule {
        source_labels = ["__meta_kubernetes_service_label_io_cilium_app_app"]
        target_label = "io_cilium_app"
      }
    }
    prometheus.scrape "cilium_operator" {
      targets = discovery.relabel.cilium_operator.output
      job_name = "integrations/cilium-enterprise/cilium-operator"
      honor_labels = true
      forward_to = [prometheus.remote_write.grafana_cloud_metrics.receiver]
    }

    // Hubble Relay
    discovery.kubernetes "hubble_relay" {
      role = "service"
      selectors {
        role = "service"
        label = "k8s-app=hubble-relay"
      }
    }
    discovery.relabel "hubble_relay" {
      targets = discovery.kubernetes.hubble_relay.targets
      rule {
        source_labels = ["__meta_kubernetes_endpoint_port_name"]
        regex = "metrics"
        action = "keep"
      }
    }
    prometheus.scrape "hubble_relay" {
      targets = discovery.relabel.hubble_relay.output
      job_name = "integrations/cilium-enterprise/hubble-relay"
      forward_to = [prometheus.remote_write.grafana_cloud_metrics.receiver]
    }

    // Hubble
    discovery.kubernetes "hubble" {
      role = "service"
      selectors {
        role = "service"
        label = "k8s-app=hubble"
      }
    }
    discovery.relabel "hubble" {
      targets = discovery.kubernetes.hubble.targets
      rule {
        source_labels = ["__meta_kubernetes_endpoint_port_name"]
        regex = "hubble-metrics"
        action = "keep"
      }
    }
    prometheus.scrape "hubble" {
      targets = discovery.relabel.hubble.output
      job_name = "integrations/cilium-enterprise/hubble"
      honor_labels = true
      forward_to = [prometheus.remote_write.grafana_cloud_metrics.receiver]
    }

    // Hubble Enterprise
    discovery.kubernetes "hubble_enterprise" {
      role = "service"
      selectors {
        role = "service"
        label = "app.kubernetes.io/name=hubble-enterprise"
      }
    }
    discovery.relabel "hubble_enterprise" {
      targets = discovery.kubernetes.hubble_enterprise.targets
      rule {
        source_labels = ["__meta_kubernetes_endpoint_port_name"]
        regex = "metrics"
        action = "keep"
      }
    }
    prometheus.scrape "hubble_enterprise" {
      targets = discovery.relabel.hubble_enterprise.output
      job_name = "integrations/cilium-enterprise/hubble-enterprise"
      honor_labels = true
      forward_to = [prometheus.remote_write.grafana_cloud_metrics.receiver]
    }

    // Hubble Timescape Ingester
    discovery.kubernetes "hubble_timescape_ingester" {
      role = "service"
      selectors {
        role = "service"
        label = "app.kubernetes.io/name=hubble-timescape-ingester,app.kubernetes.io/component=ingester"
      }
    }
    discovery.relabel "hubble_timescape_ingester" {
      targets = discovery.kubernetes.hubble_timescape_ingester.targets
      rule {
        source_labels = ["__meta_kubernetes_endpoint_port_name"]
        regex = "metrics"
        action = "keep"
      }
    }
    prometheus.scrape "hubble_timescape_ingester" {
      targets = discovery.relabel.hubble_timescape_ingester.output
      job_name = "integrations/cilium-enterprise/hubble-timescape-ingester"
      honor_labels = true
      forward_to = [prometheus.remote_write.grafana_cloud_metrics.receiver]
    }

    // Hubble Timescape Server
    discovery.kubernetes "hubble_timescape_server" {
      role = "service"
      selectors {
        role = "service"
        label = "app.kubernetes.io/name=hubble-timescape-server,app.kubernetes.io/component=server"
      }
    }
    discovery.relabel "hubble_timescape_server" {
      targets = discovery.kubernetes.hubble_timescape_server.targets
      rule {
        source_labels = ["__meta_kubernetes_endpoint_port_name"]
        regex = "metrics"
        action = "keep"
      }
    }
    prometheus.scrape "hubble_timescape_server" {
      targets = discovery.relabel.hubble_timescape_server.output
      job_name = "integrations/cilium-enterprise/hubble-timescape-server"
      honor_labels = true
      forward_to = [prometheus.remote_write.grafana_cloud_metrics.receiver]
    }
Dashboards
The Cilium Enterprise integration installs the following dashboards in your Grafana Cloud instance to help monitor your system.
- Cilium / Agent Overview
- Cilium / Components / API
- Cilium / Components / Agent
- Cilium / Components / BPF
- Cilium / Components / Conntrack
- Cilium / Components / Datapath
- Cilium / Components / External HA FQDN Proxy
- Cilium / Components / FQDN Proxy
- Cilium / Components / Identities
- Cilium / Components / Kubernetes
- Cilium / Components / L3 Policy
- Cilium / Components / L7 Proxy
- Cilium / Components / Network
- Cilium / Components / Nodes
- Cilium / Components / Policy
- Cilium / Components / Resource Utilization
- Cilium / Operator
- Cilium / Overview
- Hubble / Overview
- Hubble / Timescape
Screenshots: Cilium Overview, Cilium Overview (2), Cilium Agent Overview
Alerts
The Cilium Enterprise integration includes useful alerts in the following groups:
- Cilium Endpoints
- Cilium IPAM
- Cilium Maps
- Cilium NAT
- Cilium API
- Cilium Conntrack
- Cilium Drops
- Cilium Policy
- Cilium Identity
- Cilium Nodes
Metrics
The most important metrics provided by the Cilium Enterprise integration, which are used on the pre-built dashboards and Prometheus alerts, are as follows:
- cilium_agent_api_process_time_seconds_count
- cilium_agent_api_process_time_seconds_sum
- cilium_api_limiter_processed_requests_total
- cilium_bpf_map_ops_total
- cilium_bpf_map_pressure
- cilium_controllers_runs_duration_seconds_count
- cilium_controllers_runs_duration_seconds_sum
- cilium_controllers_runs_total
- cilium_datapath_conntrack_gc_duration_seconds_count
- cilium_datapath_conntrack_gc_duration_seconds_sum
- cilium_datapath_conntrack_gc_entries
- cilium_datapath_conntrack_gc_key_fallbacks_total
- cilium_datapath_conntrack_gc_runs_total
- cilium_drop_bytes_total
- cilium_drop_count_total
- cilium_endpoint_regeneration_time_stats_seconds_count
- cilium_endpoint_regeneration_time_stats_seconds_sum
- cilium_endpoint_regenerations_total
- cilium_endpoint_state
- cilium_errors_warnings_total
- cilium_forward_bytes_total
- cilium_forward_count_total
- cilium_identity
- cilium_ip_addresses
- cilium_k8s_client_api_calls_total
- cilium_k8s_client_api_latency_time_seconds_count
- cilium_k8s_client_api_latency_time_seconds_sum
- cilium_kubernetes_events_received_total
- cilium_kubernetes_events_total
- cilium_nodes_all_events_received_total
- cilium_nodes_all_num
- cilium_operator_ces_queueing_delay_seconds_bucket
- cilium_operator_ces_sync_errors_total
- cilium_operator_ec2_api_duration_seconds_bucket
- cilium_operator_identity_gc_entries
- cilium_operator_identity_gc_runs
- cilium_operator_ipam_allocation_ops
- cilium_operator_ipam_deficit_resolver_duration_seconds_bucket
- cilium_operator_ipam_interface_creation_ops
- cilium_operator_ipam_ips
- cilium_operator_ipam_k8s_sync_queued_total
- cilium_operator_ipam_nodes
- cilium_operator_ipam_resync_queued_total
- cilium_operator_ipam_resync_total
- cilium_operator_number_of_ceps_per_ces_sum
- cilium_operator_process_cpu_seconds_total
- cilium_operator_process_open_fds
- cilium_operator_process_resident_memory_bytes
- cilium_operator_process_virtual_memory_bytes
- cilium_policy
- cilium_policy_endpoint_enforcement_status
- cilium_policy_l7_denied_total
- cilium_policy_l7_forwarded_total
- cilium_policy_l7_received_total
- cilium_proxy_redirects
- cilium_proxy_upstream_reply_seconds_count
- cilium_proxy_upstream_reply_seconds_sum
- cilium_services_events_total
- cilium_triggers_policy_update_call_duration_seconds_count
- cilium_triggers_policy_update_call_duration_seconds_sum
- cilium_unreachable_nodes
- cilium_version
- hubble_dns_queries_total
- hubble_dns_response_types_total
- hubble_dns_responses_total
- hubble_drop_total
- hubble_flows_processed_total
- hubble_http_request_duration_seconds_bucket
- hubble_http_requests_total
- hubble_http_responses_total
- hubble_icmp_total
- hubble_port_distribution_total
- hubble_tcp_flags_total
- isovalent_external_dns_proxy_policy_l7_total
- isovalent_external_dns_proxy_processing_duration_seconds
- isovalent_external_dns_proxy_update_errors_total
- isovalent_external_dns_proxy_update_queue_size
- timescape_clickhouse_queries_duration_seconds_bucket
- timescape_clickhouse_queries_results_count
- timescape_clickhouse_queries_results_sum
- timescape_ingestor_flows_ingested_total
- timescape_ingestor_ingest_duration_seconds_bucket
- timescape_ingestor_ingest_running
- timescape_ingestor_ingestfilter_batch_duration_seconds_bucket
- timescape_ingestor_ingestfilter_filtered_errors_total
- timescape_ingestor_ingestfilter_filtered_skipped_total
- timescape_ingestor_ingestfilter_filtered_total
- timescape_ingestor_ingestlog_getinfo_queries
- up
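As an illustration of how these metrics can drive alerting, the following Prometheus rule sketch fires when the Cilium Agent scrape target stops reporting. It uses the `up` metric and the `integrations/cilium-enterprise/cilium-agent` job name from the scrape configuration. The group name, alert name, duration, and severity are hypothetical, and this is not one of the alerts shipped with the integration.

```yaml
groups:
  - name: cilium-enterprise-example          # hypothetical group name
    rules:
      - alert: CiliumAgentScrapeTargetDown   # hypothetical alert name
        # `up` is 0 when the most recent scrape of a target failed.
        expr: up{job="integrations/cilium-enterprise/cilium-agent"} == 0
        for: 5m                              # illustrative duration
        labels:
          severity: warning                  # illustrative severity
        annotations:
          summary: "Cilium Agent target {{ $labels.instance }} is not being scraped."
```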
Changelog
# 1.0.0 - June 2024
* Update Mixin to latest version
  - Removed pod filter from alert rules
  - Added thresholds for alerts using rate()
  - Added aggregation label support
# 0.0.4 - November 2023
* Replaced Angular dashboard panels with React panels
# 0.0.3 - July 2023
* Added support for using the integration in the Grafana Cloud Kubernetes App
* Update all scrape intervals to be 60s
* Fix job name to correct value in static agent config
# 0.0.2 - January 2023
* Update mixin to latest version:
  - Add new alert `CiliumOperatorEniIpamErrors` to alert on errors related to allocating new IPAM addresses and situations where nodes are experiencing IPAM exhaustion
  - Fix alert conditions to trigger correctly
# 0.0.1 - October 2022
* Initial release
Cost
By connecting your Cilium Enterprise instance to Grafana Cloud, you might incur charges. To view information on the number of active series that your Grafana Cloud account uses for metrics included in each Cloud tier, see Active series and dpm usage and Cloud tier pricing.



