
Installing Grafana Cloud Agent on Kubernetes

In this guide, you’ll learn how to deploy the Grafana Cloud Agent to a Kubernetes (K8s) cluster and configure it to scrape the K8s API server, node_exporter Pods, and itself.

Grafana Cloud Agent is an observability data collector optimized for sending metric, log, and trace data to Grafana Cloud. The Agent uses the same code as Prometheus, but reduces resource overhead by including only the parts of Prometheus needed for interacting with hosted metrics:

  • Service Discovery
  • Scraping
  • Write Ahead Log (WAL)
  • Remote Write

A typical deployment of the Grafana Cloud Agent for Prometheus metrics can see up to a 40% reduction in memory usage with equal scrape loads.

In a K8s cluster, you can monitor multiple components and systems:

  • The K8s control plane (API server, Scheduler, Controller Manager, etcd, kubelet, etc.)
  • The cluster Nodes
  • Containers and apps within the cluster

This guide demonstrates how to:

  • Roll out a single Agent Deployment to scrape the K8s control plane (to reduce metric usage, in this guide the Agent only scrapes a limited set of API server metrics)
  • Roll out node-exporter and the Agent as DaemonSets on all the cluster Nodes to provide Linux OS and system metrics (again, these Agents keep only a limited set of node-exporter metrics)
  • Configure the Agent Deployment and DaemonSet to also scrape themselves, and ship these Agent metrics to Grafana Cloud

Note: Grafana Cloud Agent does not currently support using Integrations in Kubernetes environments. This feature will be available soon.

Prerequisites

Before you begin this quickstart, you should have the following available to you:

  • A Kubernetes cluster, with the kubectl command-line tool configured to communicate with it
  • A Grafana Cloud account and an API key with permission to publish metrics (you’ll use this key in Step 2)

Step 1. Setting up Grafana Cloud Agent RBAC permissions

In this step you’ll set up the appropriate RBAC permissions for the Agents running in your Kubernetes cluster. To do this, you’ll create ServiceAccount, ClusterRole, and ClusterRoleBinding Kubernetes objects. To learn more about these objects, please see Using RBAC Authorization from the Kubernetes docs.

Begin by opening a file called agent_rbac.yaml in your favorite editor. Paste in the following manifest YAML:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana-agent
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana-agent
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana-agent
subjects:
- kind: ServiceAccount
  name: grafana-agent
  namespace: default

This manifest creates a ServiceAccount and ClusterRole called grafana-agent, and binds the ClusterRole to the ServiceAccount using a ClusterRoleBinding. The ServiceAccount is created in the default Namespace.

The ClusterRole grants the Agents the permissions necessary to query the API server and discover Kubernetes objects such as Nodes, Endpoints, Pods, and Services. Many of these are used by the Agent’s Kubernetes service discovery. To learn more about Kubernetes service discovery with Grafana Cloud Agent, please see <kubernetes_sd_config> from the Agent GitHub docs. These configuration parameters are based on Prometheus’s service discovery mechanism. To learn more, please see <kubernetes_sd_config> from the Prometheus configuration docs.

When you’re done editing the file, save and close it.

Roll out the objects in your cluster using kubectl apply -f:

kubectl apply -f agent_rbac.yaml
serviceaccount/grafana-agent created
clusterrole.rbac.authorization.k8s.io/grafana-agent created
clusterrolebinding.rbac.authorization.k8s.io/grafana-agent created

With the appropriate RBAC permissions set up, you’re ready to move on to rolling out the Agent ConfigMaps.

Step 2. Configuring and deploying the Agent ConfigMaps

In this step, you’ll create the ConfigMaps for the Agent DaemonSet and Deployment. Kubernetes ConfigMaps allow you to store a workload’s configuration in the cluster, and reference it in the Deployment and DaemonSet manifests. To learn more about ConfigMap objects, please see ConfigMap from the K8s docs.

In this quickstart you’ll roll out an Agent DaemonSet as well as an Agent Deployment. The DaemonSet will run an Agent on each cluster Node (machine) and scrape only Pods and workloads running on that machine. This local Node filtering is enabled using the host_filter: true configuration parameter. To learn more, please see Host Filtering from the Agent GitHub repo.

The DaemonSet will run an Agent on each Node that scrapes:

  • Itself (the Agent exposes Prometheus-style metrics using its own /metrics endpoint)
  • A local node_exporter /metrics endpoint for hardware and OS metrics. To learn more about node_exporter, please see the node_exporter GitHub repo.

The Deployment will run an Agent that scrapes:

  • The Kubernetes API /metrics endpoint.
  • Itself

This dual architecture is currently recommended to reduce query load on the Kubernetes API. If you’re scraping additional control plane components, you should do this using the Deployment Agent, and not the per-Node Agent daemons. You may also consider using the Deployment Agent to scrape the Agent daemons running on the cluster Nodes. You can learn more about monitoring the Agent itself in Monitoring the Grafana Cloud Agent.
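The host filtering behavior described above can be sketched in a few lines of Python. This is an illustration of the idea only (the target dictionaries and label name mimic the `__meta_kubernetes_pod_node_name` metadata that Prometheus-style service discovery attaches to each target), not the Agent’s actual implementation:

```python
# Simulated service-discovery output: each target carries Kubernetes metadata.
targets = [
    {"__address__": "10.244.0.6:9100", "__meta_kubernetes_pod_node_name": "node-a"},
    {"__address__": "10.244.1.14:9100", "__meta_kubernetes_pod_node_name": "node-b"},
]

def host_filter(targets, hostname):
    """Keep only targets scheduled on this machine (host_filter: true)."""
    return [t for t in targets
            if t["__meta_kubernetes_pod_node_name"] == hostname]

# Each DaemonSet Agent filters discovery results down to its own Node.
local = host_filter(targets, "node-a")
```

With host_filter: true, each DaemonSet Agent applies this kind of filter using the HOSTNAME environment variable, which the DaemonSet manifest in Step 3 populates from spec.nodeName.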

Begin by defining the DaemonSet Agent ConfigMap. Open a file called agent_ds_configmap.yaml in your editor of choice. Paste in the following K8s manifest:

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-agent
data:
  agent.yml: |
    server:
        log_level: info
    prometheus:
        global:
            scrape_interval: 15s
        wal_directory: /var/lib/agent/data
        configs:
          - host_filter: true
            name: agent
            remote_write:
              - basic_auth:
                    password: <your_grafana_cloud_metrics_api_key_here>
                    username: <your_grafana_cloud_metrics_username_here>
                url: https://prometheus-us-central1.grafana.net/api/prom/push
            scrape_configs:
              - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                job_name: default/node-exporter
                kubernetes_sd_configs:
                  - namespaces:
                        names:
                          - default
                    role: pod
                relabel_configs:
                  - action: keep
                    regex: node-exporter
                    source_labels:
                      - __meta_kubernetes_pod_label_name
                  - action: replace
                    source_labels:
                      - __meta_kubernetes_pod_node_name
                    target_label: instance
                  - action: replace
                    source_labels:
                      - __meta_kubernetes_namespace
                    target_label: namespace
                metric_relabel_configs:
                  - action: keep
                    regex: node_time_seconds|node_boot_time_seconds|node_cpu_seconds_total|node_memory_MemTotal_bytes|node_memory_MemFree_bytes|node_memory_MemAvailable_bytes|node_filesystem_free_bytes|node_filesystem_size_bytes|node_network_receive_bytes_total|node_network_transmit_bytes_total|node_network_receive_errs_total|node_network_receive_drop_total|node_network_transmit_errs_total|node_network_transmit_drop_total|node_disk_read_bytes_total|node_disk_written_bytes_total|node_disk_read_time_seconds_total|node_disk_write_time_seconds_total
                    source_labels:
                      - __name__               
                tls_config:
                    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                    insecure_skip_verify: false
              - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                job_name: default/agent
                kubernetes_sd_configs:
                  - namespaces:
                        names:
                          - default
                    role: pod
                relabel_configs:
                  - action: keep
                    regex: grafana-agent
                    source_labels:
                      - __meta_kubernetes_pod_label_name
                  - action: replace
                    source_labels:
                      - __meta_kubernetes_pod_node_name
                    target_label: instance
                  - action: replace
                    source_labels:
                      - __meta_kubernetes_namespace
                    target_label: namespace
                tls_config:
                    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                    insecure_skip_verify: false

This manifest:

  • Instructs the Agent to use host filtering to only scrape targets running on the Node that it’s deployed on
  • Sets credentials and the URL used to push scraped metrics to Grafana Cloud’s Prometheus metrics endpoint
  • Defines a scrape configuration for the node_exporter Pod, keeping only a curated set of metrics to reduce Grafana Cloud metric usage
  • Defines a scrape configuration for the agent to scrape itself

Be sure to insert your Grafana Cloud API key and username where indicated above. You can find your username by navigating to your stack in the Cloud Portal and clicking Details next to the Prometheus panel. Your password corresponds to the API key that you generated in the prerequisites section. You can also generate one in this same panel by clicking on Generate now.
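Under the hood, the basic_auth block produces a standard HTTP Basic Authorization header on each remote_write push: the username and password are joined with a colon and base64-encoded. A minimal Python sketch of that header (the credentials below are placeholders for illustration):

```python
import base64

def basic_auth_header(username, password):
    """Build the Authorization header that HTTP Basic auth sends."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

# Placeholder credentials; substitute your Cloud username and API key.
headers = basic_auth_header("user", "pass")
```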

Prometheus-style scrape configurations can be quite involved. To learn more about the relabel_configs and metric_relabel_configs in the above manifest, please see Agent’s Configuration Reference, which is based on Prometheus’s Configuration Reference. You may also wish to consult Reducing Prometheus metrics usage with relabeling.

At a high level, the above configures two scrape jobs that discover the relevant local Pods using a regex match, then set labels like instance and namespace that can be used to query and filter metrics in Grafana.
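To make the keep semantics concrete, here is a rough Python sketch of how a keep rule filters discovered Pods. It mirrors Prometheus’s documented behavior (concatenate the source label values, match them against a fully anchored regex, and drop non-matches), but it is an illustration rather than Prometheus’s actual code:

```python
import re

def keep(items, source_labels, regex, sep=";"):
    """Prometheus-style 'keep' relabel action: the regex is fully anchored."""
    pattern = re.compile(regex)
    return [item for item in items
            if pattern.fullmatch(sep.join(item.get(l, "") for l in source_labels))]

pods = [
    {"__meta_kubernetes_pod_label_name": "node-exporter"},
    {"__meta_kubernetes_pod_label_name": "grafana-agent"},
    {"__meta_kubernetes_pod_label_name": "some-other-app"},
]

# The default/node-exporter job keeps only Pods labeled name=node-exporter.
matches = keep(pods, ["__meta_kubernetes_pod_label_name"], "node-exporter")
```

The same mechanism applies to metric_relabel_configs, except that the items being filtered are individual series (matched on __name__) rather than targets.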

When you’re done editing, save and close the file.

Roll out the objects in your cluster using kubectl apply -f:

kubectl apply -f agent_ds_configmap.yaml
configmap/grafana-agent created

With the Agent DaemonSet ConfigMap created, repeat this process for the Agent Deployment ConfigMap.

Open a file called agent_deploy_configmap.yaml. Paste in the following:

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-agent-deployment
data:
  agent.yml: |
    server:
        log_level: info
    prometheus:
        global:
            scrape_interval: 15s
        wal_directory: /var/lib/agent/data
        configs:
          - host_filter: false
            name: agent
            remote_write:
              - basic_auth:
                    password: <your_grafana_cloud_metrics_api_key_here>
                    username: <your_grafana_cloud_metrics_username_here>
                url: https://prometheus-us-central1.grafana.net/api/prom/push
            scrape_configs:
              - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                job_name: default/kubernetes
                kubernetes_sd_configs:
                  - role: endpoints
                relabel_configs:
                  - action: keep
                    regex: apiserver
                    source_labels:
                      - __meta_kubernetes_service_label_component
                metric_relabel_configs:
                  - action: keep
                    regex: workqueue_queue_duration_seconds_bucket|process_cpu_seconds_total|process_resident_memory_bytes|workqueue_depth|rest_client_request_duration_seconds_bucket|workqueue_adds_total|up|rest_client_requests_total|apiserver_request_total|go_goroutines
                    source_labels:
                      - __name__        
                scheme: https
                tls_config:
                    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                    insecure_skip_verify: false
                    server_name: kubernetes
              - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                job_name: default/agent-deployment
                kubernetes_sd_configs:
                  - namespaces:
                        names:
                          - default
                    role: pod
                relabel_configs:
                  - action: keep
                    regex: grafana-agent-deployment
                    source_labels:
                      - __meta_kubernetes_pod_label_name
                  - action: replace
                    source_labels:
                      - __meta_kubernetes_pod_node_name
                    target_label: instance
                  - action: replace
                    source_labels:
                      - __meta_kubernetes_namespace
                    target_label: namespace
                tls_config:
                    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                    insecure_skip_verify: false

Note the different workload name, grafana-agent-deployment. Also note that host_filter is disabled, so the Agent will not limit targets to those running on the same Node. This allows the Agent to scrape the K8s control plane. Finally, note the two scrape jobs:

  • One to scrape the K8s API server, keeping only a curated set of metrics to reduce Grafana Cloud metric usage
  • One to scrape itself (using the regex grafana-agent-deployment to filter targets)

Be sure to set the appropriate username and password parameters in the basic_auth section. These are the same as used to configure the Agent DaemonSet.

When you’re done, save and close the file.

Roll out the objects in your cluster using kubectl apply -f:

kubectl apply -f agent_deploy_configmap.yaml
configmap/grafana-agent-deployment created

Now that you’ve set up the requisite configuration for the Agent DaemonSet and Deployment, you can roll these workloads out into your Kubernetes cluster.

Step 3. Configuring and deploying the Agent DaemonSet

In this step, you’ll configure and deploy a DaemonSet to manage running an Agent on all of your cluster Nodes. Kubernetes DaemonSets ensure that every Node in your cluster runs a copy of the configured Pod. To learn more about this controller type, please see DaemonSet from the K8s docs.

Open a file called agent_ds.yaml and paste in the following manifest:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: grafana-agent
spec:
  minReadySeconds: 10
  selector:
    matchLabels:
      name: grafana-agent
  template:
    metadata:
      labels:
        name: grafana-agent
    spec:
      containers:
      - args:
        - -config.file=/etc/agent/agent.yml
        - -prometheus.wal-directory=/tmp/agent/data
        command:
        - /bin/agent
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        image: grafana/agent:v0.9.1
        imagePullPolicy: IfNotPresent
        name: agent
        ports:
        - containerPort: 80
          name: http-metrics
        securityContext:
          privileged: true
          runAsUser: 0
        volumeMounts:
        - mountPath: /etc/agent
          name: grafana-agent
      serviceAccount: grafana-agent
      tolerations:
      - effect: NoSchedule
        operator: Exists
      volumes:
      - configMap:
          name: grafana-agent
        name: grafana-agent
  updateStrategy:
    type: RollingUpdate

This DaemonSet deploys on every Node a Pod carrying the name: grafana-agent label. Our Agent configuration uses this label to identify Agent Pods as targets to scrape.

Next, it defines a container that runs the /bin/agent command with some configuration flags, and sets the HOSTNAME environment variable from the K8s Node name. It defines the container image and ports, as well as the security parameters necessary to run the Agent. Finally, it mounts the ConfigMap created in the previous step at /etc/agent, making the configuration available to the Agent at /etc/agent/agent.yml.

To learn more about available configuration flags, please see Configuration Reference from the Agent repo docs.

When you’ve finished editing this file, save and close it.

Roll out the DaemonSet in your cluster using kubectl apply -f:

kubectl apply -f agent_ds.yaml
daemonset.apps/grafana-agent created

Verify that the Agent Pods started up correctly using kubectl get:

kubectl get pod -o wide
NAME                  READY   STATUS    RESTARTS   AGE   IP             NODE                   NOMINATED NODE   READINESS GATES
grafana-agent-2vl74   1/1     Running   0          48s   10.244.0.6     pool-rq2ddri5k-3zptp   <none>           <none>
grafana-agent-ktvwv   1/1     Running   0          48s   10.244.1.14    pool-rq2ddri5k-3zptl   <none>           <none>
grafana-agent-pssx8   1/1     Running   0          48s   10.244.0.176   pool-rq2ddri5k-3zpt2   <none>           <none>

This confirms that we’ve deployed the Agent to all three Nodes in our K8s cluster. The Agents should have begun scraping their own /metrics endpoints. To confirm this, navigate to Grafana Cloud and use the Explore view in the Grafana interface to begin querying your data. For example, you can query the go_memstats_heap_inuse_bytes metric to see the Agent’s active heap memory usage. Note that we haven’t yet deployed node-exporter to the cluster, so those metrics won’t appear in Grafana Cloud until we deploy the node-exporter workload.

Now that the Agent Pods are up and running on our cluster Nodes, we can roll out the Agent Deployment to scrape the K8s API server.

Step 4. Configuring and rolling out the Agent Deployment

In this step, you’ll roll out another Agent using a K8s Deployment. This agent will scrape the K8s API server and any other control plane components you’d like to monitor.

Note: Scraping the K8s API server can result in significant metric usage. You can allowlist needed metrics or drop high-cardinality metrics to control your active series usage, as is demonstrated in Step 2. To learn more, please see Controlling Prometheus metrics usage.
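One way to keep such allowlists maintainable is to generate the keep regex from a plain list of metric names instead of hand-editing a long alternation. A small Python sketch (the allowlist_regex helper is hypothetical, not part of the Agent):

```python
import re

def allowlist_regex(metric_names):
    """Join metric names into the alternation used by a 'keep' metric_relabel rule."""
    # Deduplicate while preserving order, then escape and join with '|'.
    seen = dict.fromkeys(metric_names)
    return "|".join(re.escape(name) for name in seen)

names = ["up", "apiserver_request_total", "go_goroutines", "up"]
regex = allowlist_regex(names)  # "up|apiserver_request_total|go_goroutines"
```

Because Prometheus anchors relabel regexes, the resulting pattern matches only whole metric names, never substrings.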

Open a file called agent_deploy.yaml in your editor:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-agent-deployment
spec:
  minReadySeconds: 10
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      name: grafana-agent-deployment
  template:
    metadata:
      labels:
        name: grafana-agent-deployment
    spec:
      containers:
      - args:
        - -config.file=/etc/agent/agent.yml
        - -prometheus.wal-directory=/tmp/agent/data
        command:
        - /bin/agent
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        image: grafana/agent:v0.9.1
        imagePullPolicy: IfNotPresent
        name: agent
        ports:
        - containerPort: 80
          name: http-metrics
        securityContext:
          privileged: true
          runAsUser: 0
        volumeMounts:
        - mountPath: /etc/agent
          name: grafana-agent-deployment
      serviceAccount: grafana-agent
      volumes:
      - configMap:
          name: grafana-agent-deployment
        name: grafana-agent-deployment

This manifest closely resembles the manifest used to roll out the Agent DaemonSet. We set the number of replicas to 1 and define a name: grafana-agent-deployment label. We also use the grafana-agent-deployment ConfigMap instead of grafana-agent.

When you’re done editing, save and close the file.

Roll out the objects in your cluster using kubectl apply -f:

kubectl apply -f agent_deploy.yaml
deployment.apps/grafana-agent-deployment created

You can confirm that the Agent Deployment is up and running using kubectl get:

kubectl get deploy -o wide
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                 SELECTOR
grafana-agent-deployment   1/1     1            1           84s   agent        grafana/agent:v0.9.1   name=grafana-agent-deployment

You’re now ready to roll out node_exporter to your cluster nodes.

Step 5. Configuring and rolling out the Node-exporter DaemonSet

In this step you’ll roll out node_exporter as a DaemonSet so that a node_exporter Pod runs on each of your cluster Nodes. node_exporter exposes hardware and OS metrics for your system and provides a variety of configurable collectors that allow you to tweak which metrics to expose and collect. To learn more, please consult the node_exporter GitHub repository.

Note: Scraping node_exporter can result in significant metric usage. You can allowlist needed metrics or drop high-cardinality metrics to control your active series usage, as is demonstrated in Step 2. To learn more, please see Controlling Prometheus metrics usage.

Begin by creating a file called node_exporter_ds.yaml in your editor. Paste in the following:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    name: node-exporter
  name: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
    spec:
      containers:
      - args:
        - --web.listen-address=0.0.0.0:9100
        - --path.sysfs=/host/sys
        - --path.rootfs=/host/root
        - --no-collector.wifi
        - --no-collector.hwmon
        - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)
        image: quay.io/prometheus/node-exporter:v1.0.1
        name: node-exporter
        ports:
        - containerPort: 9100
        resources:
          limits:
            cpu: 250m
            memory: 180Mi
          requests:
            cpu: 102m
            memory: 180Mi
        volumeMounts:
        - mountPath: /host/sys
          mountPropagation: HostToContainer
          name: sys
          readOnly: true
        - mountPath: /host/root
          mountPropagation: HostToContainer
          name: root
          readOnly: true
      hostNetwork: true
      hostPID: true
      nodeSelector:
        kubernetes.io/os: linux
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
      tolerations:
      - operator: Exists
      volumes:
      - hostPath:
          path: /sys
        name: sys
      - hostPath:
          path: /
        name: root
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 10%
    type: RollingUpdate

This is a lightly modified version of the node_exporter DaemonSet found in the kube-prometheus GitHub repository. The kube-prometheus project allows you to quickly set up a Prometheus-Alertmanager-Grafana cluster monitoring stack with preconfigured Grafana dashboards and alerts. These testing defaults should be modified depending on your production use case.

The manifest defines a node_exporter DaemonSet that runs on only Linux nodes. Using command-line flags, it configures monitoring paths and collectors. It sets sane defaults for resource requests and limits, defines the containerPort, and mounts needed Node paths to collect data from. It also sets necessary permissions to access and collect this data.

You can also reduce metrics usage by tweaking which collectors to enable. To learn more, please see Collectors from the node-exporter GitHub repository.

When you’re done modifying this file, save and close it.

Roll out the objects in your cluster using kubectl apply -f:

kubectl apply -f node_exporter_ds.yaml
daemonset.apps/node-exporter created

You can confirm that the node-exporter DaemonSet is up and running using kubectl get:

kubectl get ds -o wide
NAME            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE   CONTAINERS      IMAGES                                    SELECTOR
grafana-agent   3         3         3       3            3           <none>                   20h   agent           grafana/agent:v0.9.1                      name=grafana-agent
node-exporter   3         3         3       3            3           kubernetes.io/os=linux   34s   node-exporter   quay.io/prometheus/node-exporter:v1.0.1   name=node-exporter

This confirms that we’ve deployed node-exporter to all three Nodes in our K8s cluster. The local Node Agent should have begun scraping its local node-exporter /metrics endpoint. To confirm this, navigate to Grafana Cloud and use the Explore view in the Grafana interface to begin querying your data. For example, you can query the node_memory_MemTotal_bytes metric to see the total amount of memory on the system. You should see n series, where n is the number of Nodes in your Kubernetes cluster.

At this point you’ve successfully rolled out the Grafana Cloud Agent into your Kubernetes cluster and are collecting and storing Agent, K8s API server, and node-exporter metrics on Grafana Cloud.

In the next step, we’ll import a node-exporter dashboard from the Grafana community dashboards site.

Step 6. Importing a Node-Exporter Dashboard

In this step you’ll import a popular node-exporter dashboard to visualize the metrics being scraped from your cluster Nodes.

Begin by navigating to the following URL: https://grafana.com/grafana/dashboards/10180

Copy the Dashboard ID found on the right-hand-side of the screen. The ID for this dashboard is 10180.

Navigate to Grafana and click on the Dashboards UI icon, then Manage:

Dashboards

From here, click Import, and in the Import via grafana.com field, enter 10180.

On the following screen, you can optionally name your dashboard and select a folder for it. Under the Prometheus dropdown, select the datasource corresponding to your Grafana Cloud Prometheus metrics instance. This instance should end with a -prom suffix. Hit Import when you’re ready to import the dashboard.

You’ll be taken to the dashboard where you can begin digging in to your Node’s system metrics. Using the Host dropdown, you can select different Nodes in your cluster:

Node Exporter Dashboard

Conclusion

In this guide you rolled out the Grafana Cloud Agent using two Kubernetes workloads:

  • A DaemonSet to deploy an Agent Pod on each Node to scrape itself, and the node-exporter system metrics collector
  • A Deployment to deploy an Agent Pod to scrape itself, and the K8s API server

You additionally rolled out a node-exporter DaemonSet to deploy a Pod on each Node that collects system metrics. Finally, you imported a dashboard from the Community dashboards site to visualize your metrics data.

From here, you can build dashboards, explore and query your metrics, and set up alerts. Depending on your plan, you may also wish to fine-tune your metrics usage to stay within Free or Pro usage limits. To learn more about reducing your metrics usage, please see Controlling Prometheus metrics usage from the Cloud docs.