Cilium Enterprise integration for Grafana Cloud
Cilium is open source software that provides, secures, and observes network connectivity between container workloads. It is cloud native and built on eBPF, a Linux kernel technology. The Cilium Enterprise integration uses the Grafana Agent to collect metrics exposed by the Cilium Operator, the Cilium Agent and its components, and Hubble.
This integration includes:
- Dashboards to visualize high-level cluster status, resource usage for nodes, network status, policies, and more.
- Dashboards to visualize aggregate Hubble metrics, such as flow distribution, port usage, more detailed network information, and DNS status within the cluster.
- A set of alerting rules to monitor core Cilium components.
- A pre-configured Grafana Agent manifest to scrape Cilium Enterprise, Hubble, and Hubble Timescape metrics.
Before you begin
Before you begin, you should have the following available:
- A Kubernetes cluster with role-based access control (RBAC) enabled.
- The kubectl command-line tool installed on your local machine and configured to connect to your cluster. To learn more about kubectl, see the Kubernetes documentation.
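Both prerequisites can be checked from a shell before going further. The following is a minimal sketch rather than an official procedure: `has_rbac` is a hypothetical helper name, and it assumes that finding the `rbac.authorization.k8s.io/v1` API group in the output of `kubectl api-versions` is a sufficient signal that RBAC is available on the cluster.

```shell
# has_rbac: hypothetical helper. Reads a list of API versions on stdin
# (one per line, as printed by `kubectl api-versions`) and succeeds only
# if the RBAC v1 API group is among them.
has_rbac() {
  grep -qxF 'rbac.authorization.k8s.io/v1'
}

# Example with canned input; against a real cluster, pipe in the output
# of `kubectl api-versions` instead.
if printf 'apps/v1\nrbac.authorization.k8s.io/v1\nv1\n' | has_rbac; then
  echo "RBAC API available"
fi
```

Against a live cluster, `kubectl cluster-info` confirms that kubectl can reach the API server, and `kubectl api-versions | has_rbac` performs the RBAC check.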
Note: This integration is intended for use with Cilium Enterprise, and does not support the Cilium OSS product.
Pre-install configuration for the Cilium Enterprise integration
This integration monitors a Cilium Enterprise and Hubble Enterprise deployment that has metrics exporters enabled. Make sure you have completed the following setup steps:
- Enabled the embedded Prometheus exporter in your Cilium deployment to collect and expose metrics.
- Enabled the embedded Prometheus exporter in Hubble, if you want Hubble metrics to be included.
Once the exporters are enabled, the metrics are exposed automatically and can be scraped by either Prometheus or a Grafana Agent deployed to the cluster.
The following sample uses Helm to enable the Prometheus metrics endpoints for the Cilium Agent and the Cilium Operator:
helm install cilium cilium/cilium --version 1.12.2 \
  --namespace kube-system \
  --set prometheus.enabled=true \
  --set operator.prometheus.enabled=true
The following sample uses Helm to enable Hubble metrics and specifies which metrics to capture and export:
helm install cilium cilium/cilium --version 1.12.2 \
  --namespace kube-system \
  --set hubble.metrics.enabled="{dns,drop,tcp,flow,icmp,http}"
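After the exporters are enabled, it can help to confirm that the expected metric families actually appear before setting up the Agent. The sketch below is illustrative only: `count_metrics` is a hypothetical helper that counts the distinct metric names with a given prefix in Prometheus text-format output read from standard input. The sample input is invented for the demonstration; how you fetch real output (for example, with `curl` against a port-forwarded metrics port) depends on your deployment and is not shown.

```shell
# count_metrics: hypothetical helper. Reads Prometheus text-format metrics
# on stdin and prints how many distinct metric names start with PREFIX.
count_metrics() {
  prefix="$1"
  grep -v '^#' \
    | grep -o "^${prefix}[A-Za-z0-9_:]*" \
    | sort -u \
    | wc -l \
    | tr -d ' '
}

# Invented sample input, for illustration only.
sample='# HELP cilium_version Cilium version
cilium_version{version="1.12.2"} 1
cilium_drop_count_total{reason="Policy denied"} 3
cilium_drop_count_total{reason="Invalid packet"} 1
hubble_flows_processed_total{type="L7"} 10'

echo "cilium metric names: $(printf '%s\n' "$sample" | count_metrics cilium_)"
echo "hubble metric names: $(printf '%s\n' "$sample" | count_metrics hubble_)"
```

A count of zero for `hubble_` on a real endpoint would suggest that the Hubble exporter is not enabled or that no Hubble metrics were selected.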
Install Cilium Enterprise integration for Grafana Cloud
- From your Grafana Cloud instance, click Integrations and Connections (lightning bolt icon).
- Navigate to the Cilium Enterprise tile and click Install Integration.
- Wait for the installation to finish, then follow the steps shown to set up Grafana Agent to scrape Cilium Enterprise metrics and send them to your Grafana Cloud instance.
Configure Grafana Agent
The Cilium Enterprise integration uses the Grafana Agent, deployed into your cluster, to scrape metrics from your Cilium and Hubble deployment.
For the Agent to scrape your Cilium and Hubble deployment correctly, replace the capitalized placeholders beginning with YOUR_* with the appropriate cluster name and Grafana Cloud credentials.
Note: When configuring multiple clusters to report metrics, ensure unique cluster names are used.
To configure the Grafana Agent, copy and paste the following script into your shell:
cat <<'EOF' |
kind: ConfigMap
metadata:
  name: grafana-agent
apiVersion: v1
data:
  agent.yaml: |
    metrics:
      wal_directory: /var/lib/agent/wal
      global:
        scrape_interval: 60s
        external_labels:
          cluster: YOUR_CLUSTER_NAME
      configs:
      - name: integrations
        remote_write:
        - url: YOUR_PROMETHEUS_REMOTE_WRITE_URL
          basic_auth:
            username: YOUR_PROMETHEUS_REMOTE_WRITE_USERNAME
            password: YOUR_PROMETHEUS_REMOTE_WRITE_PASSWORD
        scrape_configs:
        - job_name: integrations/cilium-enterprise/hubble-timescape-ingestor
          honor_labels: true
          honor_timestamps: true
          scrape_interval: 15s
          scrape_timeout: 10s
          metrics_path: /metrics
          scheme: http
          follow_redirects: true
          metric_relabel_configs:
          - source_labels: [__name__]
            regex: cilium_controllers_runs_duration_seconds_count|hubble_http_requests_total|cilium_operator_process_resident_memory_bytes|cilium_k8s_client_api_calls_total|cilium_controllers_runs_duration_seconds_sum|cilium_operator_ipam_nodes|cilium_datapath_conntrack_gc_entries|hubble_flows_processed_total|cilium_proxy_upstream_reply_seconds_count|timescape_ingestor_ingest_running|cilium_policy_l7_denied_total|timescape_ingestor_ingestfilter_filtered_skipped_total|cilium_operator_process_cpu_seconds_total|cilium_operator_ec2_api_duration_seconds_sum|cilium_controllers_runs_total|timescape_ingestor_ingestfilter_filtered_total|cilium_policy_l7_received_total|hubble_http_responses_total|cilium_version|timescape_clickhouse_queries_results_count|cilium_drop_bytes_total|cilium_ip_addresses|cilium_operator_ipam_interface_creation_ops|cilium_operator_ec2_api_rate_limit_duration_seconds_sum|cilium_k8s_client_api_latency_time_seconds_count|cilium_forward_bytes_total|isovalent_external_dns_proxy_processing_duration_seconds|cilium_services_events_total|cilium_endpoint_regeneration_time_stats_seconds_count|cilium_kubernetes_events_total|cilium_triggers_policy_update_call_duration_seconds_sum|cilium_triggers_policy_update_call_duration_seconds_count|timescape_ingestor_ingestfilter_filtered_errors_total|timescape_clickhouse_queries_results_sum|isovalent_external_dns_proxy_update_queue_size|cilium_operator_ec2_api_duration_seconds_count|cilium_operator_ec2_api_rate_limit_duration_seconds_count|timescape_ingestor_ingestlog_getinfo_queries|hubble_tcp_flags_total|hubble_icmp_total|cilium_policy_endpoint_enforcement_status|cilium_forward_count_total|cilium_datapath_conntrack_gc_duration_seconds_sum|hubble_dns_response_types_total|isovalent_external_dns_proxy_update_errors_total|cilium_operator_ipam_resync_total|cilium_drop_count_total|hubble_dns_queries_total|cilium_policy_l7_forwarded_total|cilium_agent_api_process_time_seconds_count|timescape_clickhouse_queries_duration_seconds_bucket|timescape_ingestor_ingestfilter_batch_duration_seconds_bucket|cilium_endpoint_regenerations_total|cilium_proxy_upstream_reply_seconds_sum|cilium_identity|cilium_k8s_client_api_latency_time_seconds_sum|cilium_bpf_map_ops_total|cilium_errors_warnings_total|cilium_datapath_conntrack_gc_duration_seconds_count|hubble_dns_responses_total|cilium_operator_ipam_ips|cilium_endpoint_state|cilium_endpoint_regeneration_time_stats_seconds_sum|hubble_port_distribution_total|cilium_nodes_all_num|isovalent_external_dns_proxy_policy_l7_total|cilium_datapath_conntrack_gc_runs_total|timescape_ingestor_ingest_duration_seconds_bucket|hubble_drop_total|cilium_agent_api_process_time_seconds_sum|timescape_ingestor_flows_ingested_total|cilium_proxy_redirects|cilium_datapath_conntrack_gc_key_fallbacks_total|hubble_http_request_duration_seconds_bucket|cilium_nodes_all_events_received_total|cilium_kubernetes_events_received_total|cilium_policy|cilium_operator_ipam_available
            action: keep
          relabel_configs:
          - source_labels: [job]
            separator: ;
            regex: (.*)
            target_label: __tmp_prometheus_job_name
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_instance, __meta_kubernetes_service_labelpresent_app_kubernetes_io_instance]
            separator: ;
            regex: (hubble-timescape);true
            replacement: $1
            action: keep
          - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name, __meta_kubernetes_service_labelpresent_app_kubernetes_io_name]
            separator: ;
            regex: (hubble-timescape-ingestor);true
            replacement: $1
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            separator: ;
            regex: metrics
            replacement: $1
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
            separator: ;
            regex: Node;(.*)
            target_label: node
            replacement: ${1}
            action: replace
          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
            separator: ;
            regex: Pod;(.*)
            target_label: pod
            replacement: ${1}
            action: replace
          - source_labels: [__meta_kubernetes_namespace]
            separator: ;
            regex: (.*)
            target_label: namespace
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_service_name]
            separator: ;
            regex: (.*)
            target_label: service
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_pod_name]
            separator: ;
            regex: (.*)
            target_label: pod
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_pod_container_name]
            separator: ;
            regex: (.*)
            target_label: container
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_service_name]
            separator: ;
            regex: (.*)
            target_label: job
            replacement: ${1}
            action: replace
          - separator: ;
            regex: (.*)
            target_label: endpoint
            replacement: metrics
            action: replace
          - source_labels: [__address__]
            separator: ;
            regex: (.*)
            modulus: 1
            target_label: __tmp_hash
            replacement: $1
            action: hashmod
          - source_labels: [__tmp_hash]
            separator: ;
            regex: "0"
            replacement: $1
            action: keep
          kubernetes_sd_configs:
          - role: endpoints
            kubeconfig_file: ""
            follow_redirects: true
        - job_name: integrations/cilium-enterprise/hubble-timescape-server
          honor_labels: true
          honor_timestamps: true
          scrape_interval: 15s
          scrape_timeout: 10s
          metrics_path: /metrics
          scheme: http
          follow_redirects: true
          metric_relabel_configs:
          - source_labels: [__name__]
            regex: cilium_controllers_runs_duration_seconds_count|hubble_http_requests_total|cilium_operator_process_resident_memory_bytes|cilium_k8s_client_api_calls_total|cilium_controllers_runs_duration_seconds_sum|cilium_operator_ipam_nodes|cilium_datapath_conntrack_gc_entries|hubble_flows_processed_total|cilium_proxy_upstream_reply_seconds_count|timescape_ingestor_ingest_running|cilium_policy_l7_denied_total|timescape_ingestor_ingestfilter_filtered_skipped_total|cilium_operator_process_cpu_seconds_total|cilium_operator_ec2_api_duration_seconds_sum|cilium_controllers_runs_total|timescape_ingestor_ingestfilter_filtered_total|cilium_policy_l7_received_total|hubble_http_responses_total|cilium_version|timescape_clickhouse_queries_results_count|cilium_drop_bytes_total|cilium_ip_addresses|cilium_operator_ipam_interface_creation_ops|cilium_operator_ec2_api_rate_limit_duration_seconds_sum|cilium_k8s_client_api_latency_time_seconds_count|cilium_forward_bytes_total|isovalent_external_dns_proxy_processing_duration_seconds|cilium_services_events_total|cilium_endpoint_regeneration_time_stats_seconds_count|cilium_kubernetes_events_total|cilium_triggers_policy_update_call_duration_seconds_sum|cilium_triggers_policy_update_call_duration_seconds_count|timescape_ingestor_ingestfilter_filtered_errors_total|timescape_clickhouse_queries_results_sum|isovalent_external_dns_proxy_update_queue_size|cilium_operator_ec2_api_duration_seconds_count|cilium_operator_ec2_api_rate_limit_duration_seconds_count|timescape_ingestor_ingestlog_getinfo_queries|hubble_tcp_flags_total|hubble_icmp_total|cilium_policy_endpoint_enforcement_status|cilium_forward_count_total|cilium_datapath_conntrack_gc_duration_seconds_sum|hubble_dns_response_types_total|isovalent_external_dns_proxy_update_errors_total|cilium_operator_ipam_resync_total|cilium_drop_count_total|hubble_dns_queries_total|cilium_policy_l7_forwarded_total|cilium_agent_api_process_time_seconds_count|timescape_clickhouse_queries_duration_seconds_bucket|timescape_ingestor_ingestfilter_batch_duration_seconds_bucket|cilium_endpoint_regenerations_total|cilium_proxy_upstream_reply_seconds_sum|cilium_identity|cilium_k8s_client_api_latency_time_seconds_sum|cilium_bpf_map_ops_total|cilium_errors_warnings_total|cilium_datapath_conntrack_gc_duration_seconds_count|hubble_dns_responses_total|cilium_operator_ipam_ips|cilium_endpoint_state|cilium_endpoint_regeneration_time_stats_seconds_sum|hubble_port_distribution_total|cilium_nodes_all_num|isovalent_external_dns_proxy_policy_l7_total|cilium_datapath_conntrack_gc_runs_total|timescape_ingestor_ingest_duration_seconds_bucket|hubble_drop_total|cilium_agent_api_process_time_seconds_sum|timescape_ingestor_flows_ingested_total|cilium_proxy_redirects|cilium_datapath_conntrack_gc_key_fallbacks_total|hubble_http_request_duration_seconds_bucket|cilium_nodes_all_events_received_total|cilium_kubernetes_events_received_total|cilium_policy|cilium_operator_ipam_available
            action: keep
          relabel_configs:
          - source_labels: [job]
            separator: ;
            regex: (.*)
            target_label: __tmp_prometheus_job_name
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_instance, __meta_kubernetes_service_labelpresent_app_kubernetes_io_instance]
            separator: ;
            regex: (hubble-timescape);true
            replacement: $1
            action: keep
          - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name, __meta_kubernetes_service_labelpresent_app_kubernetes_io_name]
            separator: ;
            regex: (hubble-timescape-server);true
            replacement: $1
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            separator: ;
            regex: metrics
            replacement: $1
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
            separator: ;
            regex: Node;(.*)
            target_label: node
            replacement: ${1}
            action: replace
          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
            separator: ;
            regex: Pod;(.*)
            target_label: pod
            replacement: ${1}
            action: replace
          - source_labels: [__meta_kubernetes_namespace]
            separator: ;
            regex: (.*)
            target_label: namespace
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_service_name]
            separator: ;
            regex: (.*)
            target_label: service
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_pod_name]
            separator: ;
            regex: (.*)
            target_label: pod
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_pod_container_name]
            separator: ;
            regex: (.*)
            target_label: container
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_service_name]
            separator: ;
            regex: (.*)
            target_label: job
            replacement: ${1}
            action: replace
          - separator: ;
            regex: (.*)
            target_label: endpoint
            replacement: metrics
            action: replace
          - source_labels: [__address__]
            separator: ;
            regex: (.*)
            modulus: 1
            target_label: __tmp_hash
            replacement: $1
            action: hashmod
          - source_labels: [__tmp_hash]
            separator: ;
            regex: "0"
            replacement: $1
            action: keep
          kubernetes_sd_configs:
          - role: endpoints
            kubeconfig_file: ""
            follow_redirects: true
        - job_name: integrations/cilium-enterprise/cilium-agent
          honor_labels: true
          honor_timestamps: true
          scrape_interval: 10s
          scrape_timeout: 10s
          metrics_path: /metrics
          scheme: http
          follow_redirects: true
          metric_relabel_configs:
          - source_labels: [__name__]
            regex: cilium_controllers_runs_duration_seconds_count|hubble_http_requests_total|cilium_operator_process_resident_memory_bytes|cilium_k8s_client_api_calls_total|cilium_controllers_runs_duration_seconds_sum|cilium_operator_ipam_nodes|cilium_datapath_conntrack_gc_entries|hubble_flows_processed_total|cilium_proxy_upstream_reply_seconds_count|timescape_ingestor_ingest_running|cilium_policy_l7_denied_total|timescape_ingestor_ingestfilter_filtered_skipped_total|cilium_operator_process_cpu_seconds_total|cilium_operator_ec2_api_duration_seconds_sum|cilium_controllers_runs_total|timescape_ingestor_ingestfilter_filtered_total|cilium_policy_l7_received_total|hubble_http_responses_total|cilium_version|timescape_clickhouse_queries_results_count|cilium_drop_bytes_total|cilium_ip_addresses|cilium_operator_ipam_interface_creation_ops|cilium_operator_ec2_api_rate_limit_duration_seconds_sum|cilium_k8s_client_api_latency_time_seconds_count|cilium_forward_bytes_total|isovalent_external_dns_proxy_processing_duration_seconds|cilium_services_events_total|cilium_endpoint_regeneration_time_stats_seconds_count|cilium_kubernetes_events_total|cilium_triggers_policy_update_call_duration_seconds_sum|cilium_triggers_policy_update_call_duration_seconds_count|timescape_ingestor_ingestfilter_filtered_errors_total|timescape_clickhouse_queries_results_sum|isovalent_external_dns_proxy_update_queue_size|cilium_operator_ec2_api_duration_seconds_count|cilium_operator_ec2_api_rate_limit_duration_seconds_count|timescape_ingestor_ingestlog_getinfo_queries|hubble_tcp_flags_total|hubble_icmp_total|cilium_policy_endpoint_enforcement_status|cilium_forward_count_total|cilium_datapath_conntrack_gc_duration_seconds_sum|hubble_dns_response_types_total|isovalent_external_dns_proxy_update_errors_total|cilium_operator_ipam_resync_total|cilium_drop_count_total|hubble_dns_queries_total|cilium_policy_l7_forwarded_total|cilium_agent_api_process_time_seconds_count|timescape_clickhouse_queries_duration_seconds_bucket|timescape_ingestor_ingestfilter_batch_duration_seconds_bucket|cilium_endpoint_regenerations_total|cilium_proxy_upstream_reply_seconds_sum|cilium_identity|cilium_k8s_client_api_latency_time_seconds_sum|cilium_bpf_map_ops_total|cilium_errors_warnings_total|cilium_datapath_conntrack_gc_duration_seconds_count|hubble_dns_responses_total|cilium_operator_ipam_ips|cilium_endpoint_state|cilium_endpoint_regeneration_time_stats_seconds_sum|hubble_port_distribution_total|cilium_nodes_all_num|isovalent_external_dns_proxy_policy_l7_total|cilium_datapath_conntrack_gc_runs_total|timescape_ingestor_ingest_duration_seconds_bucket|hubble_drop_total|cilium_agent_api_process_time_seconds_sum|timescape_ingestor_flows_ingested_total|cilium_proxy_redirects|cilium_datapath_conntrack_gc_key_fallbacks_total|hubble_http_request_duration_seconds_bucket|cilium_nodes_all_events_received_total|cilium_kubernetes_events_received_total|cilium_policy|cilium_operator_ipam_available
            action: keep
          relabel_configs:
          - source_labels: [job]
            separator: ;
            regex: (.*)
            target_label: __tmp_prometheus_job_name
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_service_label_k8s_app, __meta_kubernetes_service_labelpresent_k8s_app]
            separator: ;
            regex: (cilium);true
            replacement: $1
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            separator: ;
            regex: metrics
            replacement: $1
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
            separator: ;
            regex: Node;(.*)
            target_label: node
            replacement: ${1}
            action: replace
          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
            separator: ;
            regex: Pod;(.*)
            target_label: pod
            replacement: ${1}
            action: replace
          - source_labels: [__meta_kubernetes_namespace]
            separator: ;
            regex: (.*)
            target_label: namespace
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_service_name]
            separator: ;
            regex: (.*)
            target_label: service
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_pod_name]
            separator: ;
            regex: (.*)
            target_label: pod
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_pod_container_name]
            separator: ;
            regex: (.*)
            target_label: container
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_service_label_k8s_app]
            separator: ;
            regex: (.+)
            target_label: k8s_app
            replacement: ${1}
            action: replace
          - source_labels: [__meta_kubernetes_service_name]
            separator: ;
            regex: (.*)
            target_label: job
            replacement: ${1}
            action: replace
          - separator: ;
            regex: (.*)
            target_label: endpoint
            replacement: metrics
            action: replace
          - source_labels: [__address__]
            separator: ;
            regex: (.*)
            modulus: 1
            target_label: __tmp_hash
            replacement: $1
            action: hashmod
          - source_labels: [__tmp_hash]
            separator: ;
            regex: "0"
            replacement: $1
            action: keep
          kubernetes_sd_configs:
          - role: endpoints
            kubeconfig_file: ""
            follow_redirects: true
        - job_name: integrations/cilium-enterprise/cilium-operator
          honor_labels: true
          honor_timestamps: true
          scrape_interval: 10s
          scrape_timeout: 10s
          metrics_path: /metrics
          scheme: http
          follow_redirects: true
          metric_relabel_configs:
          - source_labels: [__name__]
            regex: cilium_controllers_runs_duration_seconds_count|hubble_http_requests_total|cilium_operator_process_resident_memory_bytes|cilium_k8s_client_api_calls_total|cilium_controllers_runs_duration_seconds_sum|cilium_operator_ipam_nodes|cilium_datapath_conntrack_gc_entries|hubble_flows_processed_total|cilium_proxy_upstream_reply_seconds_count|timescape_ingestor_ingest_running|cilium_policy_l7_denied_total|timescape_ingestor_ingestfilter_filtered_skipped_total|cilium_operator_process_cpu_seconds_total|cilium_operator_ec2_api_duration_seconds_sum|cilium_controllers_runs_total|timescape_ingestor_ingestfilter_filtered_total|cilium_policy_l7_received_total|hubble_http_responses_total|cilium_version|timescape_clickhouse_queries_results_count|cilium_drop_bytes_total|cilium_ip_addresses|cilium_operator_ipam_interface_creation_ops|cilium_operator_ec2_api_rate_limit_duration_seconds_sum|cilium_k8s_client_api_latency_time_seconds_count|cilium_forward_bytes_total|isovalent_external_dns_proxy_processing_duration_seconds|cilium_services_events_total|cilium_endpoint_regeneration_time_stats_seconds_count|cilium_kubernetes_events_total|cilium_triggers_policy_update_call_duration_seconds_sum|cilium_triggers_policy_update_call_duration_seconds_count|timescape_ingestor_ingestfilter_filtered_errors_total|timescape_clickhouse_queries_results_sum|isovalent_external_dns_proxy_update_queue_size|cilium_operator_ec2_api_duration_seconds_count|cilium_operator_ec2_api_rate_limit_duration_seconds_count|timescape_ingestor_ingestlog_getinfo_queries|hubble_tcp_flags_total|hubble_icmp_total|cilium_policy_endpoint_enforcement_status|cilium_forward_count_total|cilium_datapath_conntrack_gc_duration_seconds_sum|hubble_dns_response_types_total|isovalent_external_dns_proxy_update_errors_total|cilium_operator_ipam_resync_total|cilium_drop_count_total|hubble_dns_queries_total|cilium_policy_l7_forwarded_total|cilium_agent_api_process_time_seconds_count|timescape_clickhouse_queries_duration_seconds_bucket|timescape_ingestor_ingestfilter_batch_duration_seconds_bucket|cilium_endpoint_regenerations_total|cilium_proxy_upstream_reply_seconds_sum|cilium_identity|cilium_k8s_client_api_latency_time_seconds_sum|cilium_bpf_map_ops_total|cilium_errors_warnings_total|cilium_datapath_conntrack_gc_duration_seconds_count|hubble_dns_responses_total|cilium_operator_ipam_ips|cilium_endpoint_state|cilium_endpoint_regeneration_time_stats_seconds_sum|hubble_port_distribution_total|cilium_nodes_all_num|isovalent_external_dns_proxy_policy_l7_total|cilium_datapath_conntrack_gc_runs_total|timescape_ingestor_ingest_duration_seconds_bucket|hubble_drop_total|cilium_agent_api_process_time_seconds_sum|timescape_ingestor_flows_ingested_total|cilium_proxy_redirects|cilium_datapath_conntrack_gc_key_fallbacks_total|hubble_http_request_duration_seconds_bucket|cilium_nodes_all_events_received_total|cilium_kubernetes_events_received_total|cilium_policy|cilium_operator_ipam_available
            action: keep
          relabel_configs:
          - source_labels: [job]
            separator: ;
            regex: (.*)
            target_label: __tmp_prometheus_job_name
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_service_label_io_cilium_app, __meta_kubernetes_service_labelpresent_io_cilium_app]
            separator: ;
            regex: (operator);true
            replacement: $1
            action: keep
          - source_labels: [__meta_kubernetes_service_label_name, __meta_kubernetes_service_labelpresent_name]
            separator: ;
            regex: (cilium-operator);true
            replacement: $1
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            separator: ;
            regex: metrics
            replacement: $1
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
            separator: ;
            regex: Node;(.*)
            target_label: node
            replacement: ${1}
            action: replace
          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
            separator: ;
            regex: Pod;(.*)
            target_label: pod
            replacement: ${1}
            action: replace
          - source_labels: [__meta_kubernetes_namespace]
            separator: ;
            regex: (.*)
            target_label: namespace
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_service_name]
            separator: ;
            regex: (.*)
            target_label: service
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_pod_name]
            separator: ;
            regex: (.*)
            target_label: pod
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_pod_container_name]
            separator: ;
            regex: (.*)
            target_label: container
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_service_label_io_cilium_app]
            separator: ;
            regex: (.+)
            target_label: io_cilium_app
            replacement: ${1}
            action: replace
          - source_labels: [__meta_kubernetes_service_name]
            separator: ;
            regex: (.*)
            target_label: job
            replacement: ${1}
            action: replace
          - separator: ;
            regex: (.*)
            target_label: endpoint
            replacement: metrics
            action: replace
          - source_labels: [__address__]
            separator: ;
            regex: (.*)
            modulus: 1
            target_label: __tmp_hash
            replacement: $1
            action: hashmod
          - source_labels: [__tmp_hash]
            separator: ;
            regex: "0"
            replacement: $1
            action: keep
          kubernetes_sd_configs:
          - role: endpoints
            kubeconfig_file: ""
            follow_redirects: true
        - job_name: integrations/cilium-enterprise/hubble
          honor_labels: true
          honor_timestamps: true
          scrape_interval: 10s
          scrape_timeout: 10s
          metrics_path: /metrics
          scheme: http
          follow_redirects: true
          metric_relabel_configs:
          - source_labels: [__name__]
            regex: cilium_controllers_runs_duration_seconds_count|hubble_http_requests_total|cilium_operator_process_resident_memory_bytes|cilium_k8s_client_api_calls_total|cilium_controllers_runs_duration_seconds_sum|cilium_operator_ipam_nodes|cilium_datapath_conntrack_gc_entries|hubble_flows_processed_total|cilium_proxy_upstream_reply_seconds_count|timescape_ingestor_ingest_running|cilium_policy_l7_denied_total|timescape_ingestor_ingestfilter_filtered_skipped_total|cilium_operator_process_cpu_seconds_total|cilium_operator_ec2_api_duration_seconds_sum|cilium_controllers_runs_total|timescape_ingestor_ingestfilter_filtered_total|cilium_policy_l7_received_total|hubble_http_responses_total|cilium_version|timescape_clickhouse_queries_results_count|cilium_drop_bytes_total|cilium_ip_addresses|cilium_operator_ipam_interface_creation_ops|cilium_operator_ec2_api_rate_limit_duration_seconds_sum|cilium_k8s_client_api_latency_time_seconds_count|cilium_forward_bytes_total|isovalent_external_dns_proxy_processing_duration_seconds|cilium_services_events_total|cilium_endpoint_regeneration_time_stats_seconds_count|cilium_kubernetes_events_total|cilium_triggers_policy_update_call_duration_seconds_sum|cilium_triggers_policy_update_call_duration_seconds_count|timescape_ingestor_ingestfilter_filtered_errors_total|timescape_clickhouse_queries_results_sum|isovalent_external_dns_proxy_update_queue_size|cilium_operator_ec2_api_duration_seconds_count|cilium_operator_ec2_api_rate_limit_duration_seconds_count|timescape_ingestor_ingestlog_getinfo_queries|hubble_tcp_flags_total|hubble_icmp_total|cilium_policy_endpoint_enforcement_status|cilium_forward_count_total|cilium_datapath_conntrack_gc_duration_seconds_sum|hubble_dns_response_types_total|isovalent_external_dns_proxy_update_errors_total|cilium_operator_ipam_resync_total|cilium_drop_count_total|hubble_dns_queries_total|cilium_policy_l7_forwarded_total|cilium_agent_api_process_time_seconds_count|timescape_clickhouse_queries_duration_seconds_bucket|timescape_ingestor_ingestfilter_batch_duration_seconds_bucket|cilium_endpoint_regenerations_total|cilium_proxy_upstream_reply_seconds_sum|cilium_identity|cilium_k8s_client_api_latency_time_seconds_sum|cilium_bpf_map_ops_total|cilium_errors_warnings_total|cilium_datapath_conntrack_gc_duration_seconds_count|hubble_dns_responses_total|cilium_operator_ipam_ips|cilium_endpoint_state|cilium_endpoint_regeneration_time_stats_seconds_sum|hubble_port_distribution_total|cilium_nodes_all_num|isovalent_external_dns_proxy_policy_l7_total|cilium_datapath_conntrack_gc_runs_total|timescape_ingestor_ingest_duration_seconds_bucket|hubble_drop_total|cilium_agent_api_process_time_seconds_sum|timescape_ingestor_flows_ingested_total|cilium_proxy_redirects|cilium_datapath_conntrack_gc_key_fallbacks_total|hubble_http_request_duration_seconds_bucket|cilium_nodes_all_events_received_total|cilium_kubernetes_events_received_total|cilium_policy|cilium_operator_ipam_available
            action: keep
          relabel_configs:
          - source_labels: [job]
            separator: ;
            regex: (.*)
            target_label: __tmp_prometheus_job_name
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_service_label_k8s_app, __meta_kubernetes_service_labelpresent_k8s_app]
            separator: ;
            regex: (hubble);true
            replacement: $1
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            separator: ;
            regex: hubble-metrics
            replacement: $1
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
            separator: ;
            regex: Node;(.*)
            target_label: node
            replacement: ${1}
            action: replace
          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
            separator: ;
            regex: Pod;(.*)
            target_label: pod
            replacement: ${1}
            action: replace
          - source_labels: [__meta_kubernetes_namespace]
            separator: ;
            regex: (.*)
            target_label: namespace
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_service_name]
            separator: ;
            regex: (.*)
            target_label: service
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_pod_name]
            separator: ;
            regex: (.*)
            target_label: pod
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_pod_container_name]
            separator: ;
            regex: (.*)
            target_label: container
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_service_name]
            separator: ;
            regex: (.*)
            target_label: job
            replacement: ${1}
            action: replace
          - separator: ;
            regex: (.*)
            target_label: endpoint
            replacement: hubble-metrics
            action: replace
          - source_labels: [__address__]
            separator: ;
            regex: (.*)
            modulus: 1
            target_label: __tmp_hash
            replacement: $1
            action: hashmod
          - source_labels: [__tmp_hash]
            separator: ;
            regex: "0"
            replacement: $1
            action: keep
          kubernetes_sd_configs:
          - role: endpoints
            kubeconfig_file: ""
            follow_redirects: true
        - job_name: integrations/cilium-enterprise/hubble-enterprise
          honor_labels: true
          honor_timestamps: true
          scrape_interval: 10s
          scrape_timeout: 10s
          metrics_path: /metrics
          scheme: http
          follow_redirects: true
          metric_relabel_configs:
          - source_labels: [__name__]
            regex: cilium_controllers_runs_duration_seconds_count|hubble_http_requests_total|cilium_operator_process_resident_memory_bytes|cilium_k8s_client_api_calls_total|cilium_controllers_runs_duration_seconds_sum|cilium_operator_ipam_nodes|cilium_datapath_conntrack_gc_entries|hubble_flows_processed_total|cilium_proxy_upstream_reply_seconds_count|timescape_ingestor_ingest_running|cilium_policy_l7_denied_total|timescape_ingestor_ingestfilter_filtered_skipped_total|cilium_operator_process_cpu_seconds_total|cilium_operator_ec2_api_duration_seconds_sum|cilium_controllers_runs_total|timescape_ingestor_ingestfilter_filtered_total|cilium_policy_l7_received_total|hubble_http_responses_total|cilium_version|timescape_clickhouse_queries_results_count|cilium_drop_bytes_total|cilium_ip_addresses|cilium_operator_ipam_interface_creation_ops|cilium_operator_ec2_api_rate_limit_duration_seconds_sum|cilium_k8s_client_api_latency_time_seconds_count|cilium_forward_bytes_total|isovalent_external_dns_proxy_processing_duration_seconds|cilium_services_events_total|cilium_endpoint_regeneration_time_stats_seconds_count|cilium_kubernetes_events_total|cilium_triggers_policy_update_call_duration_seconds_sum|cilium_triggers_policy_update_call_duration_seconds_count|timescape_ingestor_ingestfilter_filtered_errors_total|timescape_clickhouse_queries_results_sum|isovalent_external_dns_proxy_update_queue_size|cilium_operator_ec2_api_duration_seconds_count|cilium_operator_ec2_api_rate_limit_duration_seconds_count|timescape_ingestor_ingestlog_getinfo_queries|hubble_tcp_flags_total|hubble_icmp_total|cilium_policy_endpoint_enforcement_status|cilium_forward_count_total|cilium_datapath_conntrack_gc_duration_seconds_sum|hubble_dns_response_types_total|isovalent_external_dns_proxy_update_errors_total|cilium_operator_ipam_resync_total|cilium_drop_count_total|hubble_dns_queries_total|cilium_policy_l7_forwarded_total|cilium_agent_api_process_time_seconds_count|timescape_clickhouse_queries_duration_seconds_bucket|timescape_ingestor_ingestfilter_batch_duration_seconds_bucket|cilium_endpoint_regenerations_total|cilium_proxy_upstream_reply_seconds_sum|cilium_identity|cilium_k8s_client_api_latency_time_seconds_sum|cilium_bpf_map_ops_total|cilium_errors_warnings_total|cilium_datapath_conntrack_gc_duration_seconds_count|hubble_dns_responses_total|cilium_operator_ipam_ips|cilium_endpoint_state|cilium_endpoint_regeneration_time_stats_seconds_sum|hubble_port_distribution_total|cilium_nodes_all_num|isovalent_external_dns_proxy_policy_l7_total|cilium_datapath_conntrack_gc_runs_total|timescape_ingestor_ingest_duration_seconds_bucket|hubble_drop_total|cilium_agent_api_process_time_seconds_sum|timescape_ingestor_flows_ingested_total|cilium_proxy_redirects|cilium_datapath_conntrack_gc_key_fallbacks_total|hubble_http_request_duration_seconds_bucket|cilium_nodes_all_events_received_total|cilium_kubernetes_events_received_total|cilium_policy|cilium_operator_ipam_available
            action: keep
          relabel_configs:
          - source_labels: [job]
            separator: ;
            regex: (.*)
            target_label: __tmp_prometheus_job_name
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_instance, __meta_kubernetes_service_labelpresent_app_kubernetes_io_instance]
            separator: ;
            regex: (hubble-enterprise);true
            replacement: $1
            action: keep
          - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_managed_by, __meta_kubernetes_service_labelpresent_app_kubernetes_io_managed_by]
            separator: ;
            regex: (Helm);true
            replacement: $1
            action: keep
          - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name, __meta_kubernetes_service_labelpresent_app_kubernetes_io_name]
            separator: ;
            regex: (hubble-enterprise);true
            replacement: $1
            action: keep
          - source_labels: [__meta_kubernetes_service_label_helm_sh_chart, __meta_kubernetes_service_labelpresent_helm_sh_chart]
            separator: ;
            regex: (hubble-enterprise-9999.9999.9999-dev);true
            replacement: $1
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: metrics
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_container_name]
separator: ;
regex: (.*)
target_label: container
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: metrics
action: replace
- source_labels: [__meta_kubernetes_pod_node_name]
separator: ;
regex: (.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__address__]
separator: ;
regex: (.*)
modulus: 1
target_label: __tmp_hash
replacement: $1
action: hashmod
- source_labels: [__tmp_hash]
separator: ;
regex: "0"
replacement: $1
action: keep
kubernetes_sd_configs:
- role: endpoints
kubeconfig_file: ""
follow_redirects: true
EOF
(export NAMESPACE=default && kubectl apply -n $NAMESPACE -f -)
If you deployed the Agent into a non-default Namespace in the previous step, replace NAMESPACE=default in this command with the new Namespace.
This ConfigMap configures the Agent to scrape the Cilium Agent, Cilium Operator, Hubble, and Hubble Timescape resources in your cluster and ship these scraped metrics to Grafana Cloud.
This ConfigMap configures the Agent to scrape the cadvisor and kubelet endpoints in your cluster and ship these scraped metrics to Grafana Cloud.
To learn more about configuring the Agent, see Configure Grafana Agent in the Agent docs.
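The metric-name regex paired with action: keep in the ConfigMap acts as an allowlist: a scraped series is shipped only if its name fully matches one of the alternatives. The following is a minimal shell sketch of that behavior, not the Agent's implementation. It assumes a shortened three-name subset of the regex, and filter_kept is a hypothetical helper. The explicit ^(...)$ wrapper mimics the way Prometheus anchors relabel regexes at both ends.

```shell
# Hypothetical, shortened subset of the keep regex (the real one lists every metric).
KEEP_REGEX='cilium_version|cilium_drop_count_total|hubble_flows_processed_total'

# filter_kept is an illustrative helper, not part of the Agent.
# It reads metric names on stdin and prints only those the regex would keep.
filter_kept() {
  grep -E "^(${KEEP_REGEX})\$"
}

# cilium_version_extra and go_goroutines are dropped: neither is a full match.
printf '%s\n' cilium_version cilium_version_extra go_goroutines hubble_flows_processed_total | filter_kept
```

If a metric you expect is missing from Grafana Cloud, checking its name against the keep regex this way is a quick first step.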
Deploy Grafana Agent resources
In this step you’ll install the Grafana Agent and its required resources into your cluster.
Run the following command from your shell to install the Grafana Agent into the default Namespace of your Kubernetes cluster:
MANIFEST_URL=https://raw.githubusercontent.com/grafana/agent/v0.24.2/production/kubernetes/agent-bare.yaml NAMESPACE=default /bin/sh -c "$(curl -fsSL https://raw.githubusercontent.com/grafana/agent/v0.24.2/production/kubernetes/install-bare.sh)" | kubectl apply -f -
This installs a single-replica Grafana Agent StatefulSet into your cluster and configures RBAC permissions for the Agent. To deploy the Agent into a different Namespace, change the NAMESPACE=default variable, and make sure that Namespace already exists.
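After the install command finishes, it can help to confirm that the StatefulSet rolled out. Restarting it is one common way to make the Agent pick up a changed ConfigMap. The following is a minimal sketch, assuming the manifest names the StatefulSet grafana-agent (verify with kubectl get statefulsets if unsure). The helper names and the KUBECTL override are illustrative only.

```shell
NAMESPACE="${NAMESPACE:-default}"
KUBECTL="${KUBECTL:-kubectl}"   # overridable so the commands can be dry-run with echo

# Assumption: the StatefulSet created by the manifest is named grafana-agent.
# Waits up to two minutes for the rollout to complete.
agent_status() {
  "$KUBECTL" rollout status statefulset/grafana-agent -n "$NAMESPACE" --timeout=120s
}

# Restarts the Agent Pods, for example after re-applying the ConfigMap.
agent_restart() {
  "$KUBECTL" rollout restart statefulset/grafana-agent -n "$NAMESPACE"
}
```

Call agent_status after installing, and agent_restart followed by agent_status after changing the ConfigMap. Setting KUBECTL=echo prints the commands instead of running them.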
Dashboards
The Cilium Enterprise integration installs the following dashboards in your Grafana Cloud instance to help monitor your metrics.
- Cilium / Agent Overview
- Cilium / Components / API
- Cilium / Components / Agent
- Cilium / Components / BPF
- Cilium / Components / Conntrack
- Cilium / Components / Datapath
- Cilium / Components / External HA FQDN Proxy
- Cilium / Components / FQDN Proxy
- Cilium / Components / Identities
- Cilium / Components / Kubernetes
- Cilium / Components / L3 Policy
- Cilium / Components / L7 Proxy
- Cilium / Components / Network
- Cilium / Components / Nodes
- Cilium / Components / Policy
- Cilium / Components / Resource Utilization
- Cilium / Operator
- Cilium / Overview
- Hubble / Overview
- Hubble / Timescape
Screenshots: Cilium Overview, Cilium Agent Overview, Cilium Operator Overview, Hubble Overview, Hubble Timescape, Cilium Agent - Meta monitoring.
Alerts
The Cilium Enterprise integration includes the following useful alerts:
Cilium Endpoints
Alert | Description |
---|---|
CiliumAgentEndpointFailures | Warning: Cilium Agent endpoints are in the invalid state. |
CiliumAgentEndpointUpdateFailure | Warning: API calls to Cilium Agent API to create or update Endpoints are failing. |
CiliumAgentContainerNetworkInterfaceApiErrorEndpointCreate | Info: Cilium Endpoint API rate limiter is reporting errors while creating endpoints. |
CiliumAgentApiEndpointErrors | Warning: API calls to Cilium Endpoints API are failing due to server errors. |
Cilium IPAM
Alert | Description |
---|---|
CiliumOperatorExhaustedIpamIps | Critical: Cilium Operator has exhausted its IPAM IPs. |
CiliumOperatorLowAvailableIpamIps | Warning: Cilium Operator has used up over 90% of its available IPs. |
Cilium Maps
Alert | Description |
---|---|
CiliumAgentMapOperationFailures | Warning: Cilium Agent is experiencing errors updating BPF maps on Agent Pod. |
CiliumAgentBpfMapPressure | Warning: Map on Cilium Agent Pod is currently experiencing high map pressure. |
Cilium NAT
Alert | Description |
---|---|
CiliumAgentNatTableFull | Critical: Cilium Agent Pod is dropping packets due to “No mapping for NAT masquerade” errors. |
Cilium API
Alert | Description |
---|---|
CiliumAgentApiHighErrorRate | Info: Cilium Agent API on Pod is experiencing a high error rate. |
Cilium Conntrack
Alert | Description |
---|---|
CiliumAgentConntrackTableFull | Critical: Cilium's conntrack map is failing on new insertions on Agent Pod. |
CiliumAgentConnTrackFailedGarbageCollectorRuns | Warning: Cilium Agent Conntrack GC runs are failing on Agent Pod. |
Cilium Drops
Alert | Description |
---|---|
CiliumAgentHighDeniedRate | Info: Cilium Agent is experiencing a high drop rate due to policy rule denies. |
Cilium Policy
Alert | Description |
---|---|
CiliumAgentPolicyMapPressure | Warning: Cilium Agent is experiencing high BPF map pressure. |
Cilium Identity
Alert | Description |
---|---|
CiliumNodeLocalHighIdentityAllocation | Warning: Cilium is using a very high percentage (over 80%) of its maximum per-node identity limit (65535). |
RunningOutOfCiliumClusterIdentities | Warning: Cilium is using a very high percentage of its maximum cluster identity limit (65280). |
Cilium Nodes
Alert | Description |
---|---|
CiliumUnreachableNodes | Info: Cilium Agent is reporting unreachable Nodes in the cluster. |
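To make a percentage threshold concrete, take the CiliumNodeLocalHighIdentityAllocation alert: with the per-node limit of 65535 and the 80% threshold from its description, the alert corresponds to roughly 52428 allocated identities. The same arithmetic is shown below as a small shell sketch. threshold_count is a hypothetical helper, and the alert's actual rule expression is not reproduced here.

```shell
# threshold_count LIMIT PERCENT
# Prints the count at which PERCENT of LIMIT is reached (integer math, rounded down).
# Illustrative helper only.
threshold_count() {
  echo $(( $1 * $2 / 100 ))
}

threshold_count 65535 80   # per-node identity limit at the 80% threshold
```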
Metrics
The following metrics are automatically written to your Grafana Cloud instance by connecting your Cilium Enterprise instance through this integration:
- cilium_agent_api_process_time_seconds_count
- cilium_agent_api_process_time_seconds_sum
- cilium_api_limiter_processed_requests_total
- cilium_bpf_map_ops_total
- cilium_bpf_map_pressure
- cilium_controllers_runs_duration_seconds_count
- cilium_controllers_runs_duration_seconds_sum
- cilium_controllers_runs_total
- cilium_datapath_conntrack_gc_duration_seconds_count
- cilium_datapath_conntrack_gc_duration_seconds_sum
- cilium_datapath_conntrack_gc_entries
- cilium_datapath_conntrack_gc_key_fallbacks_total
- cilium_datapath_conntrack_gc_runs_total
- cilium_drop_bytes_total
- cilium_drop_count_total
- cilium_endpoint_regeneration_time_stats_seconds_count
- cilium_endpoint_regeneration_time_stats_seconds_sum
- cilium_endpoint_regenerations_total
- cilium_endpoint_state
- cilium_errors_warnings_total
- cilium_forward_bytes_total
- cilium_forward_count_total
- cilium_identity
- cilium_ip_addresses
- cilium_k8s_client_api_calls_total
- cilium_k8s_client_api_latency_time_seconds_count
- cilium_k8s_client_api_latency_time_seconds_sum
- cilium_kubernetes_events_received_total
- cilium_kubernetes_events_total
- cilium_nodes_all_events_received_total
- cilium_nodes_all_num
- cilium_operator_ces_queueing_delay_seconds_bucket
- cilium_operator_ces_sync_errors_total
- cilium_operator_ec2_api_duration_seconds_bucket
- cilium_operator_identity_gc_entries
- cilium_operator_identity_gc_runs
- cilium_operator_ipam_allocation_ops
- cilium_operator_ipam_deficit_resolver_duration_seconds_bucket
- cilium_operator_ipam_interface_creation_ops
- cilium_operator_ipam_ips
- cilium_operator_ipam_k8s_sync_queued_total
- cilium_operator_ipam_nodes
- cilium_operator_ipam_resync_queued_total
- cilium_operator_ipam_resync_total
- cilium_operator_number_of_ceps_per_ces_sum
- cilium_operator_process_cpu_seconds_total
- cilium_operator_process_open_fds
- cilium_operator_process_resident_memory_bytes
- cilium_operator_process_virtual_memory_bytes
- cilium_policy
- cilium_policy_endpoint_enforcement_status
- cilium_policy_l7_denied_total
- cilium_policy_l7_forwarded_total
- cilium_policy_l7_received_total
- cilium_proxy_redirects
- cilium_proxy_upstream_reply_seconds_count
- cilium_proxy_upstream_reply_seconds_sum
- cilium_services_events_total
- cilium_triggers_policy_update_call_duration_seconds_count
- cilium_triggers_policy_update_call_duration_seconds_sum
- cilium_unreachable_nodes
- cilium_version
- hubble_dns_queries_total
- hubble_dns_response_types_total
- hubble_dns_responses_total
- hubble_drop_total
- hubble_flows_processed_total
- hubble_http_request_duration_seconds_bucket
- hubble_http_requests_total
- hubble_http_responses_total
- hubble_icmp_total
- hubble_port_distribution_total
- hubble_tcp_flags_total
- isovalent_external_dns_proxy_policy_l7_total
- isovalent_external_dns_proxy_processing_duration_seconds
- isovalent_external_dns_proxy_update_errors_total
- isovalent_external_dns_proxy_update_queue_size
- timescape_clickhouse_queries_duration_seconds_bucket
- timescape_clickhouse_queries_results_count
- timescape_clickhouse_queries_results_sum
- timescape_ingestor_flows_ingested_total
- timescape_ingestor_ingest_duration_seconds_bucket
- timescape_ingestor_ingest_running
- timescape_ingestor_ingestfilter_batch_duration_seconds_bucket
- timescape_ingestor_ingestfilter_filtered_errors_total
- timescape_ingestor_ingestfilter_filtered_skipped_total
- timescape_ingestor_ingestfilter_filtered_total
- timescape_ingestor_ingestlog_getinfo_queries
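To check that a component actually exposes the metrics listed above, one option is to save the text from its metrics endpoint to a file and compare it against the expected names. The following is a minimal sketch, assuming the dump uses the Prometheus text exposition format. missing_metrics, the file names, and the sample values are hypothetical.

```shell
# missing_metrics EXPECTED_FILE DUMP_FILE
# Prints each expected metric name that has no sample line in the dump.
missing_metrics() {
  while IFS= read -r name; do
    [ -n "$name" ] || continue
    # A sample line starts with the metric name followed by "{" (labels) or a space (value).
    grep -q -E "^${name}(\{| )" "$2" || printf '%s\n' "$name"
  done < "$1"
}

# Example with a tiny hand-written dump (illustrative values only).
tmpdir="$(mktemp -d)"
printf '%s\n' cilium_version cilium_endpoint_state hubble_drop_total > "$tmpdir/expected.txt"
cat > "$tmpdir/dump.txt" <<'DUMP'
# HELP cilium_version Cilium version
cilium_version{version="x"} 1
cilium_endpoint_state{endpoint_state="ready"} 4
DUMP

# Only hubble_drop_total is reported, since the dump has no sample for it.
missing_metrics "$tmpdir/expected.txt" "$tmpdir/dump.txt"
```

A name reported as missing may simply mean that the corresponding exporter, such as the Hubble one, has not been enabled.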
Changelog
# 0.0.1 - October 2022
- Initial release
Cost
By connecting your Cilium Enterprise instance to Grafana Cloud you might incur charges. To view information on the number of active series that your Grafana Cloud account uses for metrics included in each Cloud tier, see Active series and dpm usage and Cloud tier pricing.