
Kubernetes Monitoring app walkthrough

This walkthrough shows you how to deploy an instrumented three-tier web application (data layer, app logic layer, and load-balancing layer) into a Kubernetes cluster, and how to use Grafana Cloud’s built-in Kubernetes Monitoring features to monitor that application.

By the end of the walkthrough, you’ll have:

  • Deployed the TNS sample app into your Kubernetes cluster
  • Deployed a prebuilt dashboard to your Grafana Cloud instance to visualize the app’s performance metrics
  • Rolled out Grafana Agents to collect metrics, logs, and events from the Kubernetes cluster
  • Configured these agents to collect metrics, logs, and traces (including exemplars) from the TNS app
  • Learned how to navigate from metrics, to logs, to traces and back using Grafana’s powerful correlation features
  • Learned how to use the Kubernetes Cluster Navigator to explore your cluster’s running workloads, jumping from Pods to dashboards and logs

Prerequisites

Before you begin the walkthrough, you should have the following available to you:

  • A Kubernetes, K3s, or OpenShift cluster
  • A Grafana Cloud stack, optionally with exemplar support enabled. To enable exemplars for your Grafana Cloud stack, please contact support.

Deploy and configure Grafana Agent

To begin, we’ll roll out Grafana Agent into the Kubernetes cluster. To make deploying the Agent easier, Grafana Cloud provides preconfigured manifests that you can download and modify as needed.

  1. Download pre-configured manifests.

    In your Grafana instance, click the Kubernetes icon in the navigation bar.

  2. Install pre-configured dashboards.

    Click Install dashboards and rules to install the prebuilt set of Kubernetes dashboards and alerts. After these are installed, you’ll be presented with a set of instructions for deploying Grafana Agent.

  3. Modify the Metrics & Events Agent ConfigMap so that the Agent also scrapes the /metrics endpoints of Pods deployed in your cluster. By default, it won’t scrape these endpoints and only scrapes cluster metrics endpoints such as /cadvisor and /kubelet.

    In the generated ConfigMap, add the following scrape job stanza:

    . . .
              relabel_configs:
                  - action: keep
                    regex: kube-state-metrics
                    source_labels:
                      - __meta_kubernetes_pod_label_app_kubernetes_io_name
            # New scrape job below                  
            - job_name: integrations/kubernetes/pod-metrics
              kubernetes_sd_configs:
                - role: pod
              relabel_configs:
                - action: drop
                  regex: kube-state-metrics
                  source_labels:
                    - __meta_kubernetes_pod_label_app_kubernetes_io_name
                - action: labelmap
                  regex: __meta_kubernetes_pod_label_(.+)
                - source_labels: [__meta_kubernetes_namespace]
                  action: replace
                  target_label: namespace
                - source_labels: [__meta_kubernetes_pod_name]
                  action: replace
                  target_label: pod
                - source_labels: ['__meta_kubernetes_namespace', '__meta_kubernetes_pod_label_name']
                  action: 'replace'
                  separator: '/'
                  target_label: 'job'
                  replacement: '$1'
                - source_labels: ['__meta_kubernetes_pod_container_name']
                  action: 'replace'
                  target_label: 'container' 
    . . .
    

    This scrape job attempts to scrape all containers (and ports) running in your cluster at /metrics, drops any kube-state-metrics metrics (since we’re already picking these up in another scrape job), and performs some relabeling (setting job, pod, namespace, etc. labels).

    To learn more about configuring scrape jobs, please see the Prometheus scrape config documentation. Using different config directives, you can adjust this generic catch-all stanza to restrict scraping to a given namespace or workload label, drop additional metrics, and more, as in the sketch below.
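
    For example, here is a minimal sketch of a more restrictive variant. The tns-cloud namespace and the prometheus.io/scrape annotation used here are assumptions; substitute whatever convention your own workloads follow:

    . . .
            - job_name: integrations/kubernetes/pod-metrics
              kubernetes_sd_configs:
                - role: pod
              relabel_configs:
                # Keep only Pods in the tns-cloud namespace (assumed example)
                - action: keep
                  regex: tns-cloud
                  source_labels:
                    - __meta_kubernetes_namespace
                # Keep only Pods that opt in with the prometheus.io/scrape: "true" annotation (assumed convention)
                - action: keep
                  regex: "true"
                  source_labels:
                    - __meta_kubernetes_pod_annotation_prometheus_io_scrape
    . . .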

  4. Configure Grafana Agent to send exemplars to Grafana Cloud. To do this, add the following to the ConfigMap:

    . . .
          configs:
          - name: integrations
            remote_write:
            - url: <your_prometheus_metrics_endpoint>
              basic_auth:
                username: <your_prometheus_metrics_user>
                password: <your_prometheus_metrics_api_key>
              # Add the following line
              send_exemplars: true
    . . . 
    

    Note: You must contact Support to enable exemplars in your Grafana Cloud instance.

  5. Deploy the ConfigMap into your cluster.
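
    The quickstart instructions include the exact command to run. As a rough sketch (the file name, ConfigMap name, and namespace below are assumptions, so use the values from your generated instructions):

    # Apply the modified Agent ConfigMap (file name and namespace are assumptions)
    kubectl apply -n default -f grafana-agent-configmap.yaml

    # Verify it was created (ConfigMap name is an assumption)
    kubectl get configmap grafana-agent -n default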

  6. Follow the remaining steps in the quickstart instructions to deploy:

    • An Agent StatefulSet
    • kube-state-metrics
    • An Agent ConfigMap & DaemonSet to tail container logs
  7. Deploy the Agent to collect traces.

    Follow the Grafana Agent Traces quickstart. Be sure to fill in the required remote_write credentials. Your Tempo endpoint URL should look something like tempo-us-central1.grafana.net:443.

    Important: Without an Agent to collect traces, the demo app will not start, so be sure to complete this step.
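
    For orientation, the traces Agent’s ConfigMap includes a section along these lines. This is only a sketch: the receiver shown and the placeholder credentials are assumptions, and the quickstart’s generated configuration is the source of truth.

    . . .
        traces:
          configs:
            - name: default
              receivers:
                jaeger:
                  protocols:
                    thrift_compact:
              remote_write:
                - endpoint: tempo-us-central1.grafana.net:443
                  basic_auth:
                    username: <your_tempo_user>
                    password: <your_tempo_api_key>
    . . .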

    When you’re finished deploying the telemetry collectors, your running K8s Pods should look something like this:

    NAME                                      READY   STATUS    RESTARTS   AGE
    grafana-agent-0                           1/1     Running   0          3m
    grafana-agent-logs-lcpjd                  1/1     Running   0          2m44s
    grafana-agent-logs-pc9sp                  1/1     Running   0          2m44s
    grafana-agent-logs-qtjzq                  1/1     Running   0          2m44s
    grafana-agent-traces-7775575d6d-qcrmq     1/1     Running   0          21s
    ksm-kube-state-metrics-58ccd7456c-487c9   1/1     Running   0          2m50s
    

Deploy the TNS app

With the telemetry collectors up and running, we’ll now deploy the TNS demo app into our Kubernetes cluster. The TNS GitHub repository contains Kubernetes manifests (written in Jsonnet) that deploy all of the app’s required components. You don’t need to know Jsonnet to follow this guide, and you can inspect the manifests and code before deploying the components. The repo also contains the app’s source code and more information about how the app has been instrumented to work with Grafana Cloud.

  1. To deploy the app, run the following command:

    kubectl apply -f https://raw.githubusercontent.com/grafana/tns/main/production/k8s-yamls-cloud/app-full.yaml
    

    You can inspect the YAML manifest before deploying the app into your cluster, as shown below. The command deploys the app Pods and Services into the tns-cloud namespace, creating the namespace if it doesn’t exist.
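
    If you’d like to review the manifest first, here are a couple of optional commands (a sketch; both point at the same URL used above):

    # Page through the raw manifest
    curl -sL https://raw.githubusercontent.com/grafana/tns/main/production/k8s-yamls-cloud/app-full.yaml | less

    # Show what would be created without applying anything
    kubectl apply --dry-run=client -f https://raw.githubusercontent.com/grafana/tns/main/production/k8s-yamls-cloud/app-full.yaml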

  2. Inspect deployment status using kubectl:

    kubectl get all -n tns-cloud
    
  3. Forward a port to a local web browser:

    kubectl port-forward -n tns-cloud service/app 8080:80
    
  4. Navigate to http://localhost:8080 in your web browser to see the demo app in action.
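
    Optionally, you can also confirm the app is responding from the command line (a sketch; it assumes the port-forward from the previous step is still running):

    # Should return the demo app's HTML index page
    curl -s http://localhost:8080 | head -n 5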

With the instrumented demo app and load generator up and running, we can now navigate to Grafana Cloud to query our app logs, visualize its metrics, and inspect its trace data.

Correlate metrics, logs, and traces

At this point, the demo app and load generator are instrumented and running, and our telemetry collectors are forwarding metric, log, trace, and event data to Grafana Cloud. Before exploring some of Grafana Cloud’s built-in features, we’ll import a prebuilt dashboard for the app that demonstrates some of Grafana’s core correlation features.

  1. In your Grafana instance, click Dashboards and then Import.
  2. Enter 16491 in the ID field, and click Load.
  3. Click Import to import the dashboard.
  4. Navigate to the dashboard.

You should see something similar to the following:

TNS Dashboard

Note: If you do not see the yellow dots (exemplars), you must contact Support to enable exemplars in your Grafana instance.

You can click on an exemplar to jump to a trace for a particularly slow request:

Exemplar

Trace

From here, you can then jump to logs to view the problematic span:

Trace to logs

Logs

You can click Show context on a log line to see the surrounding log context.

The ability to jump from metrics, to traces, to logs, and back is an extremely powerful feature that helps you quickly resolve production issues and reduce MTTR.

You can also jump to logs from metrics using the Explore view.

  1. In the Explore view, enter the following PromQL query, ensuring that you have selected the correct Prometheus data source:

    sum by (status_code) (rate(tns_request_duration_seconds_count{job=~"tns-cloud/app.*"}[$__rate_interval]))
    

    You should see the following graph:

    PromQL Query

  2. Click Split at the top of the window, and select the correct Loki data source in the dropdown:

    Loki Dropdown

    Grafana will carry over the labels selected in the PromQL query and pre-populate a LogQL query with the same labels:

    Logs Result

This allows you to quickly jump from Prometheus metrics queries to the corresponding Loki logs data relevant to the metrics graph you’re analyzing.
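
If you prefer to write the log query yourself, a LogQL selector equivalent to the pre-populated one might look like the following. This is a sketch that assumes your logs Agent applies the same job label convention; the |= "error" line filter is just an illustrative addition:

    {job=~"tns-cloud/app.*"} |= "error"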

Explore your cluster with the Kubernetes Cluster Navigator

Another way to explore your Kubernetes workloads is to use the Kubernetes Cluster Navigator. Click the Kubernetes icon in the left-hand nav bar of your Grafana instance.

You should see something similar to the following:

Cluster nav

Click into a particular namespace to begin exploring workloads:

Workload nav

From here, you can see any firing alerts and the health of running Pods, and you can drill down further into a given ReplicaSet:

ReplicaSet nav

Clicking into a Pod allows you to quickly see additional Pod info, its logs, and latest Kubernetes cluster events:

Pod view

You can also quickly jump to a Pod’s resource usage dashboard:

Pod dashboard

Instead of copying and pasting Pod names and labels from a terminal and running kubectl get, kubectl describe, and kubectl logs, you can now navigate and jump straight to the relevant observability data, all in Grafana Cloud.
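
For comparison, the manual workflow those views replace looks roughly like the commands below (the Pod name is a placeholder):

    # List workloads in the demo namespace
    kubectl get pods -n tns-cloud

    # Describe a Pod to see its status and recent events (Pod name is a placeholder)
    kubectl describe pod app-5c7f8d9b6d-xxxxx -n tns-cloud

    # Tail its logs
    kubectl logs app-5c7f8d9b6d-xxxxx -n tns-cloud --tail=50 -f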

To learn more about Grafana Cloud Kubernetes monitoring, please see the Kubernetes Monitoring documentation.