Troubleshoot with built-in tools

Kubernetes Monitoring includes built-in tools to help you investigate issues without leaving the app. Use Grafana Assistant for in-context help, view logs and events from any detail page, profile workloads to find expensive code paths, and jump into root cause analysis, Application Observability, Explore, traces, or your Cloud provider when you need to go deeper.

Use Grafana Assistant

Click the Grafana Assistant icon in any panel that displays it to learn more about that panel. The Assistant window opens, where you can ask further questions about the panel or the data it displays.

Grafana Assistant icon being hovered over for more information

The Assistant also runs an automatic health check at the top of every Kubernetes detail page. Refer to Use Assistant health checks.

View logs and events

Logs Drilldown is embedded within Kubernetes Monitoring. Instead of going through logs for every Pod, you start with a service (the logical unit your team owns). Logs Drilldown aggregates logs for that service so you can immediately spot whether the problem is one Pod or the whole service. That tells you whether to fix a Pod or the infrastructure.

From Cluster to container, click the Logs or Events tab of any detail page to analyze logs without writing complex queries. For more information, refer to Labels and Fields and Log Patterns.

Give it a try using Grafana Play

With Grafana Play, you can explore and see how it works, learning from practical examples to accelerate your development. This feature can be seen on the Logs tab of this namespace details page.

Profile applications with continuous profiling

When a workload or Pod shows high CPU or memory usage, it’s common to ask which part of the code is responsible. Profiling captures which functions in your code account for that resource usage. Kubernetes Monitoring integrates continuous profiling directly into the Workload and Pod detail pages, so you can investigate application performance issues and determine their cause.

Profiling section

The Profiling section of Workload and Pod detail pages displays a flame graph that shows which functions in your code consume the most CPU or memory. A flame graph visualizes this data as stacked bars. Each bar represents a function.

The width of each bar shows how much time or resources that function consumed. Wider bars point to the code paths that are worth your time to optimize.
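
To make the idea of bar width concrete, the following minimal Go sketch computes the share of samples attributed to each function. It is an illustration only, not how Kubernetes Monitoring or Pyroscope renders flame graphs. The `sample` type, the `frameWidths` function, and the semicolon-separated stack format are assumptions made for this example. As a simplification, it aggregates by function name, whereas a real flame graph draws a separate bar for each call path.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is a hypothetical collapsed stack: frames from root to leaf,
// separated by semicolons, plus the number of samples that hit that stack.
type sample struct {
	stack string
	count int
}

// frameWidths returns, for each function name, the fraction of all samples
// in which that function appears anywhere on the stack. This fraction
// corresponds to the relative bar width described above.
func frameWidths(samples []sample) map[string]float64 {
	total := 0
	inclusive := map[string]int{}
	for _, s := range samples {
		total += s.count
		seen := map[string]bool{}
		for _, fn := range strings.Split(s.stack, ";") {
			if !seen[fn] { // count a recursive function once per stack
				seen[fn] = true
				inclusive[fn] += s.count
			}
		}
	}
	widths := make(map[string]float64, len(inclusive))
	if total == 0 {
		return widths
	}
	for fn, n := range inclusive {
		widths[fn] = float64(n) / float64(total)
	}
	return widths
}

func main() {
	// Made-up sample data for illustration.
	samples := []sample{
		{"main;handleRequest;parseJSON", 50},
		{"main;handleRequest;queryDB", 25},
		{"main;collectGarbage", 25},
	}
	widths := frameWidths(samples)

	// Sort widest first, then by name, so the output is deterministic.
	names := make([]string, 0, len(widths))
	for fn := range widths {
		names = append(names, fn)
	}
	sort.Slice(names, func(i, j int) bool {
		if widths[names[i]] != widths[names[j]] {
			return widths[names[i]] > widths[names[j]]
		}
		return names[i] < names[j]
	})
	for _, fn := range names {
		fmt.Printf("%-16s %5.1f%%\n", fn, widths[fn]*100)
	}
}
```

In this made-up data, `parseJSON` accounts for half of all samples, so it would be the widest leaf bar and the first candidate for optimization.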

Container detail page with gauge indicating CPU request is undersized

Data used by flame graphs

Flame graphs read profiling data from the Pyroscope instance in Grafana Cloud, so profiles from your workloads must be sent there. For profiling data to appear, use one of the following ingestion methods:

  • Configure using the Kubernetes Monitoring Helm chart. This chart directs Alloy to collect profiles in the Cluster using eBPF. No application code changes are required. You can also add annotations to your Pods so that Alloy scrapes their pprof endpoints automatically. These two options differ in the collection layer they use: pprof is a pull-over-HTTP model, and eBPF is a kernel hook model.

    Collection type    Value/when to use
    pprof scraping     - Provides fine-grained, application-aware profiles such as goroutines, heap, and CPU at function level
                       - Use pprof when you need detailed app-level profiling and can instrument your code
                       - Profiles are collected only from Pods that include the required Kubernetes annotations
    eBPF profiling     - Can provide automatic, Cluster-wide coverage
                       - Use eBPF for broad coverage with no code changes, especially for third-party or multi-language workloads

    Refer to Customize the Helm chart configuration for setup steps.

  • Pyroscope SDKs. Applications push profiles directly to Grafana Cloud Pyroscope. SDKs are available for Go, Java, Python, .NET, Ruby, Node.js, and Rust.

    Refer to Send profile data for setup steps.

Troubleshoot missing profiles

The Profiling section appears on all Workload and Pod detail pages, but it displays “No profiling data available” if there is no profiling data in Pyroscope. If you see missing or unexpected profile data, refer to Troubleshooting.

Resolve issues with built-in tools

Navigate from Kubernetes Monitoring to other capabilities in Grafana Cloud to analyze, troubleshoot, and solve issues.

Access root cause analysis tool

Note

To access root cause analysis tools, enable Knowledge Graph on your stack. If Knowledge Graph is not enabled, Kubernetes Monitoring displays a banner with a link to enable it.

You can take troubleshooting deeper by understanding relationships between components and what is occurring between them. Within Kubernetes Monitoring, access RCA workbench to perform root cause analysis.

Access RCA workbench by any of these methods:

Jump to the application layer

On the detail page for a Pod or workload, hover over View application layer, then click Go to Application Observability to navigate directly to more data, such as the service health.

Navigating directly to the Application Observability app

To return to Kubernetes Monitoring, click the browser back button.

View queries to troubleshoot with Explore

To query data further, use any of the Explore buttons available throughout the interface (such as Explore namespaces or Explore alerts). Explore opens with additional query tools for troubleshooting.

Raw query with options to add, view query history, and inspect query
Raw metrics

Use debug metrics

For any panel, you can open Debug Metrics to see the metrics used for the panel.

If you enabled traces when you configured Kubernetes Monitoring, you can view them in Explore:

  1. Click the main menu icon.

  2. Click Explore.

  3. Choose the Tempo data source.

  4. With the TraceQL tab selected, enter your search query.

  5. Click Run query.

    A table of traces appears.

  6. Click a trace to see the detail.

Explore detail page showing table of traces, TraceQL query, and trace graph
View traces

Note

If you use Istio service mesh and traces don’t appear, verify that the Kubernetes Service port name for the OpenTelemetry Collector gRPC endpoint (the value of spec.ports[].name) is either grpc or starts with grpc-. For more information, refer to Traces missing with Istio service mesh.
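
The port naming rule in the note above can be expressed as a small check. This is a minimal sketch for validating port names in your own tooling. The function name `isGRPCPortName` is hypothetical, and the sketch covers only the rule as stated in the note.

```go
package main

import (
	"fmt"
	"strings"
)

// isGRPCPortName reports whether a Kubernetes Service port name
// (the value of spec.ports[].name) is either "grpc" or starts with "grpc-".
func isGRPCPortName(name string) bool {
	return name == "grpc" || strings.HasPrefix(name, "grpc-")
}

func main() {
	// Example port names; only the first two satisfy the rule.
	for _, name := range []string{"grpc", "grpc-otlp", "otlp-grpc", "grpcotlp"} {
		fmt.Printf("%-10s %v\n", name, isGRPCPortName(name))
	}
}
```

Note that a name such as `otlp-grpc` does not satisfy the rule, because the `grpc` part must come first.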

Access Nodes in Cloud provider accounts

You can navigate to the EC2 dashboard for Nodes managed by AWS from Kubernetes Monitoring. For example:

  1. Find the EC2 Node by going to Search and searching for the Node name.
  2. In the search results, click the Node name to open the Node detail page.
  3. On the far right side of the screen, open the AWS drop-down to see the link to the EC2 instance.
    AWS drop-down menu
  4. Click the instance link to open the EC2 overview. Here you can find AWS-specific metadata and other data provided by CloudWatch metrics.
    Dashboard for AWS EC2 instance
  5. To return to the Kubernetes Monitoring view of the Node, click Back to Kubernetes Node.