---
title: "Troubleshoot Kubernetes Monitoring | Grafana Cloud documentation"
description: "How to troubleshoot issues regarding Grafana Kubernetes Monitoring"
---

> For a curated documentation index, see [llms.txt](/llms.txt). For the complete documentation index, see [llms-full.txt](/llms-full.txt).

# Troubleshoot Kubernetes Monitoring

This section includes common errors encountered while installing and configuring Kubernetes Monitoring components, and tools you can use to troubleshoot.

## User loses access

If you have granted a user the None basic role plus `plugins.app:access`, that user has no access to Kubernetes Monitoring. Kubernetes Monitoring has [two user roles](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/configuration/control-access/#precision-access-with-rbac-custom-plugin-roles) to manage access:

- `plugins:grafana-k8s-app:admin`
- `plugins:grafana-k8s-app:reader`

If a user is having trouble with access, make sure you have granted that user one of these roles. To assign these roles, refer to [Assign RBAC roles](/docs/grafana/latest/administration/roles-and-permissions/access-control/assign-rbac-roles/).

## Troubleshooting tools

You can use the following tools to help you troubleshoot issues with installation and configuration.

### Alloy tool

Grafana Alloy has a [web user interface](/docs/alloy/latest/tasks/debug/#alloy-ui) that shows every configuration component the Alloy instance is using, along with each component's status. By default, the web UI runs on each Alloy Pod on port `12345`. Because that UI is typically not exposed outside the Cluster, you can access it with port forwarding:

`kubectl port-forward svc/grafana-k8s-monitoring-alloy 12345:12345`

Then open a browser to `http://localhost:12345` to view the GUI.

Access the Alloy web tool when:

- Grafana Alloy isn’t collecting or exporting metrics/logs/traces properly. For example, you’re missing metrics in Grafana Cloud or Prometheus and need to confirm if Alloy is scraping the right targets.
- A component is failing or in an error state. The UI shows each configuration component and its status (running, failed, initializing, and so on).
- You’re validating configuration changes. After updating your alloy.yaml, you can confirm that the configuration loaded correctly and that all pipelines and receivers are active.
- You suspect a dependency or connectivity issue. For example, when Alloy can’t reach Grafana Cloud endpoints, a local data source, or another collector, you can inspect component logs or connection statuses.
- You’re debugging startup or runtime issues. Useful if Alloy Pods are up but not behaving as expected (for example, a broken metrics pipeline or missing exporters). Refer to the sketch after this list for checking Pod status and logs directly.
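
If Alloy Pods are running but you can’t reach the web UI, or you want to confirm what a Pod is reporting, you can also check the Pods directly with `kubectl`. The following is a minimal sketch; the namespace and label selector are examples and may differ in your release:

```shell
# List the Alloy Pods and confirm they are Running and Ready.
# The namespace and label selector are examples; adjust them to match your release.
kubectl get pods --namespace <release-namespace> -l app.kubernetes.io/name=alloy

# Stream the logs of a single Alloy Pod to look for component or delivery errors
kubectl logs --namespace <release-namespace> <alloy-pod-name> --follow
```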

### Debug Metrics tool

For any panel, click the menu icon and select **Debug metrics for this panel**.

![Accessing the menu for the panel to show the menu options](/media/docs/grafana-cloud/k8s/panel-menu-access.png)

**Debug Metrics** lists all metrics used for the panel along with any errors found.

![Debug Metrics for the panel](/media/docs/grafana-cloud/k8s/debug-metrics-window.png)

### Metrics status tool

To view the status of metrics being collected, in Kubernetes Monitoring:

1. Click **Configuration** on the menu.
2. Click the **Metrics status** tab.
3. Filter for the Cluster or Clusters you want to see the status of.

![**Metrics status** tab with status indicators for one Cluster](/media/docs/grafana-cloud/k8s/metrics-status-9-18.png)

#### Status icons

Each panel of the **Metrics status** tab shows an icon that indicates the status of the incoming data, based on the selected data source, Cluster, and time range:

- Check mark in a circle (green): Data for this source is being collected. The version of the source or online status also displays (if available).
- Caution icon with exclamation mark (yellow): Duplicate data is being collected for the metric source.
- X in a circle (red): There is no data available for this item within the time range specified, and it appears to be offline.

![**Metrics status** panel with icon warning of multiple metrics](/media/docs/grafana-cloud/k8s/multiplemetrics.png)

#### Check initial configuration

When you initially configure Kubernetes Monitoring, a box showing a red X in a circle can indicate any of the following:

- The feature was not selected during Cluster configuration.
- The system is not running correctly.
- Alloy was not able to gather data correctly.
- No data was gathered during the time range specified.

#### View the query with Explore

If something in the metrics status looks incorrect, click the icon next to the panel title. This opens the query in [Explore](/docs/grafana/latest/explore/query-management/) where you can examine the query for any issues, such as an incorrect label.

#### Look at a historical time range

Use the time range selector to understand what was occurring in the past. In the following example, Cluster events were collected in the past but are not currently being collected.

![Time range of last two days for **Metrics status**](/media/docs/grafana-cloud/k8s/2024-nov-metrics-status.png)

#### View documentation for each status

For more information about each status, click the **Docs** link in each panel.

## Troubleshooting deployment with Helm chart

Common issues that can occur when a Helm chart is not configured correctly:

- [Duplicate metrics](#duplicate-metrics)
- [Duplicate alerts](#duplicate-alerts)
- [Helm upgrade doesn’t apply changes to an existing collector](#helm-upgrade-doesnt-apply-changes-to-an-existing-collector)
- [Missing metrics](#metrics-missing)

If you have configured Kubernetes Monitoring with the [Grafana Kubernetes Monitoring Helm chart](https://github.com/grafana/k8s-monitoring-helm/tree/main/charts/k8s-monitoring), here are some general troubleshooting techniques:

- Within Kubernetes Monitoring, view the [metrics status](#metrics-status-tool).
- Check for any changes with the command `helm template ...`, redirecting the output to a file such as `output.yaml` so you can review the result.
- Check the configuration with the command `helm test --logs`. This validates the configuration, including all phases of metrics gathering through display. Refer to the sketch after this list for example usage of both commands.
- Check the [`extraConfig` section of the Helm chart](https://github.com/grafana/k8s-monitoring-helm/blob/main/charts/k8s-monitoring/docs/UsingExtraConfig.md) to ensure this section is not used for modifications. This section is only for additional configuration not already in the chart, and *not* for modifications to the chart.
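
The following is a minimal sketch of the `helm template` and `helm test` commands described above. The release name, repository alias, namespace, and values file are examples and may differ in your installation:

```shell
# Render the chart locally and save the generated manifests for review.
# "grafana-k8s-monitoring" and "grafana/k8s-monitoring" are example names.
helm template grafana-k8s-monitoring grafana/k8s-monitoring \
  --namespace default --values values.yaml > output.yaml

# Run the chart's built-in tests and print their logs to validate the configuration
helm test grafana-k8s-monitoring --namespace default --logs
```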

### Duplicate metrics

Certain metric data sources (such as Node Exporter or kube-state-metrics) may already exist on the Cluster. When you deploy with the Kubernetes Monitoring Helm chart, these data sources are installed even if they are already present on your Cluster.

1. Visit the [**Metrics status** tab](#metrics-status-tool) to view any duplicates.
2. Remove the duplicates, or adjust the Helm chart values to use the existing instances and skip deploying another one (a sketch follows this list).
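
For example, if kube-state-metrics already runs on your Cluster, you can point the chart at the existing instance instead of deploying another one. This is a hedged sketch that mirrors the Node Exporter pattern shown later in this guide; the exact keys depend on your chart version:

```yaml
clusterMetrics:
  kube-state-metrics:
    enabled: true
    deploy: false # Assumed key: skip deploying kube-state-metrics and use the existing instance
    namespace: '<namespace of the existing kube-state-metrics>'
    labelSelectors: # Customize to match the existing kube-state-metrics Pod labels
      app.kubernetes.io/name: kube-state-metrics
```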

### Duplicate alerts and alert errors

You may temporarily see duplicate alerts for the same condition in the **Alerts** page or in alert counts throughout Kubernetes Monitoring. If you are missing alert notifications, refer to [Update error](#update-error).

**Cause**

During the migration from Prometheus Alertmanager to Grafana-managed alerts, both alerting systems may fire alerts simultaneously for the same conditions, resulting in duplicates. Kubernetes Monitoring queries both alert sources (`ALERTS` and `GRAFANA_ALERTS` metrics) to ensure all alerts are detected during this transition period.

**Solution**

This is expected behavior during the migration period and requires no action. Duplicate alerts will automatically resolve after your Grafana Cloud stack completes the migration to Grafana-managed alerts.

If duplicate alerts persist after the migration is complete, contact [Grafana Support](/docs/grafana-cloud/security-and-account-management/support/).

### Helm upgrade doesn’t apply changes to an existing collector

A `helm upgrade` of the Kubernetes Monitoring chart can complete successfully while leaving an existing collector unchanged. This happens when the upgrade tries to modify a field that Kubernetes treats as immutable. Because the Alloy Operator manages each collector through an internal Helm release, the operator’s reconciliation fails and rolls back silently.

**Symptoms**

- `helm upgrade` of the `k8s-monitoring` chart completes successfully.
- The Alloy custom resource shows the new configuration.
- The underlying Alloy workload object is unchanged.
- The Alloy Operator logs contain repeating errors:
  
  ```text
  "Release failed" ... "error":"upgrade failed; rollback required"
  ```
- Listing the operator’s internal Helm release history shows a `failed` revision followed by a `deployed` rollback:
  
  ```shell
  kubectl get secrets -l owner=helm,name=<collector-release-name> -o custom-columns=VERSION:.metadata.labels.version,STATUS:.metadata.labels.status
  ```

**Cause**

Several fields deployed by Alloy cannot be changed after creation. When the Alloy Operator reconciles an updated Alloy custom resource that changes one of these fields, it runs an internal `helm upgrade` that tries to patch the existing workload. Kubernetes rejects the patch, the internal release fails, and the operator rolls back.

An example is adding `volumeClaimTemplates` to a StatefulSet collector that was originally deployed without persistent storage. The upgrade appears to succeed, but no PersistentVolumeClaim is ever created.

**Workaround**

Delete the affected Alloy custom resource before upgrading the chart. The operator uninstalls the managed workload cleanly as part of deleting the custom resource, and the subsequent `helm upgrade` recreates both the custom resource and the workload with the new configuration.

1. Identify the Alloy custom resource for the collector you are updating:
   
   ```shell
   kubectl get alloy --all-namespaces
   ```
2. Delete the custom resource and wait for it to be fully removed. This also deletes the underlying StatefulSet and Pods:
   
   ```shell
   kubectl delete alloy <collector-name> --namespace <release-namespace> --wait=true
   ```
3. Run `helm upgrade` with the new values.

The collector is recreated with the new configuration.

### Specific Cluster platform providers

Certain Kubernetes Cluster platforms require some specific configurations for the Kubernetes Monitoring Helm chart. If your Cluster is running on one of these platforms, refer to the example for the changes required to run the Helm chart:

- [Azure AKS](https://github.com/grafana/k8s-monitoring-helm/tree/main/charts/k8s-monitoring/docs/examples/platforms/azure-aks)
- [AWS EKS on Fargate](https://github.com/grafana/k8s-monitoring-helm/tree/main/charts/k8s-monitoring/docs/examples/platforms/eks-fargate)
- [Google GKE Autopilot](https://github.com/grafana/k8s-monitoring-helm/tree/main/charts/k8s-monitoring/docs/examples/platforms/gke-autopilot)
- [OpenShift](https://github.com/grafana/k8s-monitoring-helm/tree/main/charts/k8s-monitoring/docs/examples/platforms/openshift)

## Missing data

Here are some tips for missing data.

### CPU usage negative and missing data

If you did not install Kubernetes Monitoring with the Helm chart and instead deployed the OTel collector as a DaemonSet, you could have issues with CPU usage data. The OTel collector should be deployed as a Deployment. With a DaemonSet, multiple samples may be written out of order to the same time series. This can cause Kubernetes Monitoring to show:

- Negative rates for CPU usage
- Gaps in usage showing on Optimization panels
- Unevenly spaced data points indicative of multiple sample ingestion, which may also be interpreted as [counter resets](https://prometheus.io/docs/prometheus/latest/querying/functions/#rate)
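
To confirm how the collector is currently deployed, list the DaemonSets and Deployments in your Cluster. This is a minimal sketch; the `otel` name filter is an example and may not match your collector’s name:

```shell
# Check whether the OTel collector runs as a DaemonSet or a Deployment.
# The "otel" filter is an example; adjust it to match your collector's name.
kubectl get daemonsets,deployments --all-namespaces | grep -i otel
```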

### CPU usage panels missing data

If there is no CPU usage data, the data scraping intervals of the collector and the data source may not match. The default scraping interval for Grafana Alloy is 60 seconds. If the scraping interval for your data source is not 60 seconds, this mismatch may interfere with the calculation of the CPU usage rate.

To resolve this, synchronize the scraping interval for the collector and the data source.

- If you configured the data source (meaning it wasn’t automatically provisioned by Grafana Cloud), change the scrape interval for the data source to match the collector.
- If the data source was provisioned for you by Grafana Cloud, contact support to request the scrape interval for the data source be changed to match the collector.
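
If the collector is managed by the Helm chart, its scrape interval is typically set through the chart values. The following is a minimal sketch, assuming your chart version exposes a `global.scrapeInterval` setting; check your chart’s values reference before relying on this key:

```yaml
global:
  scrapeInterval: 60s # Assumed key; keep this in sync with the data source scrape interval
```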

### Data missing in a panel

If a panel in Kubernetes Monitoring seems to be missing data or shows a “No data” message, you can use either the [Debug Metrics](#debug-metrics-tool) feature or open the query for the panel in Explore to determine which query is failing.

This can occur when new features are released. For example, if you see no data in the [network bandwidth and saturation panels](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/navigate-k8s-monitoring/#view-network-bandwidth-and-saturation), it is likely you need to upgrade to the newest version of the Helm chart.

### Data missing for a provider

If your cloud service provider name is not showing up in the Cluster list page, it’s likely due to a `provider_id` missing from some types of Clusters. This occurs in the case of an internal provider or bare metal Clusters. To ensure your provider shows up, create a relabeling rule for the provider metrics:

```none
kube-state-metrics:
  extraMetricRelabelingRules: |-
    rule {
      source_labels = ["__name__", "provider_id", "node"]
      separator = "@"
      regex = "kube_node_info@@(.*)"
      replacement = "<cluster provider id>://${1}"
      action = "replace"
      target_label = "provider_id"
    }
```

Replace `<cluster provider id>` with the provider ID you would like to appear in the Kubernetes Monitoring Cluster list page.

### Efficiency usage data missing

If CPU and memory usage within any table shows no data, it could be due to missing Node Exporter metrics. Navigate to the [**Metrics status** tab](#metrics-status-tool) to determine what is not being reported.

### Job data missing

If you are missing Job data, make sure you are collecting the following metrics (a sketch for including them follows this list):

- `kube_cronjob_info`
- `kube_cronjob_next_schedule_time`
- `kube_cronjob_spec_suspend`
- `kube_cronjob_status_last_schedule_time`
- `kube_cronjob_status_last_successful_time`
- `kube_job_info`
- `kube_job_owner`
- `kube_job_spec_completions`
- `kube_job_status_completion_time`
- `kube_job_status_failed`
- `kube_job_status_start_time`
- `kube_job_status_succeeded`
- `kube_namespace_status_phase`
- `kube_node_info`
- `kube_pod_completion_time`
- `kube_pod_container_status_last_terminated_timestamp`
- `kube_pod_owner`
- `kube_pod_restart_policy`
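
If some of these metrics are filtered out by the default allow list, one hedged way to include them is through the kube-state-metrics metrics tuning in the Helm chart values. The `metricsTuning.includeMetrics` key and its regex support are assumptions based on recent chart versions; check your chart’s documentation for the exact setting:

```yaml
clusterMetrics:
  kube-state-metrics:
    metricsTuning:
      includeMetrics: # Assumed key: metrics kept in addition to the default allow list
        - kube_cronjob_.*
        - kube_job_.*
```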

### Metrics missing

If metrics are missing even though the [**Metrics status** tab](#metrics-status-tool) is showing that the configuration is set up as you intended, check for an incorrectly configured label for the Node Exporter instance.

Make sure the Node Exporter `instance` label is set to the Node name. The kube-state-metrics `node` label and the Node Exporter `instance` label must contain the same values.

#### Methodology for missing metrics

It’s helpful to keep in mind the different phases of metrics gathering when debugging.

##### Discovery

Find the metric source. In this phase, find out whether the tool that gathers metrics is working. For example, is Node Exporter running? Can Alloy find Node Exporter? Perhaps the configuration is incorrect because Alloy is looking in the wrong namespace or for the wrong label.

##### Scraping

Ask whether the metrics were gathered correctly. For example, most metric sources use HTTP, but the metric source you are trying to scrape might use HTTPS. Identify whether the configuration is set to scrape HTTPS.

##### Processing

Ask whether metrics were correctly processed. With Kubernetes Monitoring, metrics are filtered down to a small subset of useful metrics.

##### Delivery

In this phase, metrics are sent to Grafana Cloud. If there is an issue, there are likely no metrics being delivered. This can occur if your account limit for metrics has been reached. Check the **Usage Insights - 5 - Metrics Ingestion** dashboard.

![List of Grafana Cloud dashboards with Metrics Ingestion dashboard highlighted](/media/docs/grafana-cloud/k8s/usage-insights-dashboard.png)

##### Displaying

In this phase, a metric is not showing up in the Kubernetes Monitoring GUI. If you’ve determined the metrics are being delivered but some are not displaying, there may be a missing or incorrect label for the metric. Check the [**Metrics status** tab](#metrics-status-tool).

### Pod logs missing

If you are not seeing Pod logs and your platform is AWS EKS Fargate, these logs cannot be gathered using a `hostPath` volume mount. Instead, you can use API-based log gathering. For greater detail, refer to [EKS Fargate](https://github.com/grafana/k8s-monitoring-helm/blob/main/charts/k8s-monitoring/docs/examples/platforms/eks-fargate/README.md).
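
The linked EKS Fargate example is the authoritative reference. As a minimal sketch, API-based log gathering is typically switched on through the chart’s Pod logs settings; the `gatherMethod` value below is an assumption based on recent chart versions:

```yaml
podLogs:
  enabled: true
  gatherMethod: kubernetesApi # Assumed value: gather logs through the Kubernetes API instead of hostPath mounts
```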

### Network metrics missing

If you have deployed on the AWS EKS Fargate platform, AWS prevents the level of access that Node Exporter requires to gather metrics for the network panels. EKS Fargate provides on-demand compute for Kubernetes objects instead of running them on traditional Nodes.

### Port conflicts and Node Exporter

Node Exporter opens host port 9100 on the Kubernetes Node. If a Node Exporter is already running on the Cluster, the two exporters conflict over their default port. To avoid this conflict, you have two options.

You can change the Node Exporter port number so the Node Exporter deployed by the Kubernetes Monitoring Helm chart does not conflict with the existing one. To do this, customize the Helm chart by adding the following to your `values.yaml` file:

```yaml
clusterMetrics:
  node-exporter:
    enabled: true
    service:
      port: 9101 # Choose an unused port
```

Alternatively, you can disable the Node Exporter deployed by the Helm chart and target the existing Node Exporter. To do this, customize the Helm chart by adding the following to your `values.yaml` file:

```yaml
clusterMetrics:
  node-exporter:
    enabled: true
    deploy: false
    namespace: '<namespace of the existing Node Exporter>'
    labelSelectors: # Customize to match the existing Node Exporter Pod labels
      app.kubernetes.io/name: node-exporter
```

### Profiling data missing or showing unexpected results

If Pyroscope is configured and collecting profiles but you see either of the following, the cause is likely a label name mismatch:

- The **Profiles** section on a workload or Pod detail page shows “No profiling data available.”
- The **Profiles** column in workload tables is disabled with “No profiling data found.”

A label mismatch can also cause a flame graph to display data from the **wrong service** without any error. This happens because Kubernetes Monitoring falls back to searching all services in Pyroscope when the scoped query fails, and it may incorrectly match a different service that shares part of the workload name. If a flame graph looks unexpected for a workload (for example, it shows function names you don’t recognize or resource usage that doesn’t match what your metrics report), a label mismatch is a likely cause.

**Why this happens**

Kubernetes Monitoring queries Pyroscope using the labels `service_name`, `pod`, `namespace`, and `cluster` by default. If your Pyroscope setup uses different label names (for example, `k8s.namespace.name` instead of `namespace`, or `k8s.pod.name` instead of `pod`), the scoped queries return no results. The integration then falls back to broader, unscoped queries, which can either find no match or match the wrong service.

**How to fix it**

Configure the label mapping so Kubernetes Monitoring uses the correct Pyroscope label names:

1. Open the **Profiles Drilldown** app.
2. Navigate to the settings page.
3. In the **Kubernetes Label Mapping** section, select your Pyroscope data source.
4. Update the label names to match the labels used in your Pyroscope data. For example, change `namespace` to `k8s.namespace.name`.
5. Click **Save**.

After saving, return to Kubernetes Monitoring and refresh the workload or Pod detail page. The flame graph and Profiles links should now display the correct data.

### Workload data missing

If you are seeing Pod resource usage but not workload usage data, the recording rules and alert rules are likely not installed.

When these rules aren’t installed, Kubernetes Monitoring displays a banner across all pages. Click **Install Now** in the banner to go directly to the configuration step that installs them.

You can also install the rules following these steps:

1. Navigate to the **Configuration** page.
2. Click the **Metrics status** tab.
3. In the **Workload Recording Rule** panel, click **Install** to install alert rules and recording rules.

## Error messages

Here are tips for errors you may receive related to configuration.

### Authentication error: invalid scope requested

To deliver telemetry data to Grafana Cloud, you use an [Access Policy Token](/docs/grafana-cloud/security-and-account-management/authentication-and-permissions/access-policies/) with the appropriate scopes. Scopes define an action that can be done to a specific data type. For example, `metrics:write` permits writing metrics.

When sending data to Grafana Cloud, the Helm chart uses the `<data>:write` scopes for delivering data.

If your token does not have the correct scope, you see errors in the Grafana Alloy logs. For example, when trying to deliver profiles to Pyroscope without the `profiles:write` scope:

```text
msg="final error sending to profiles to endpoint" component=pyroscope.write.profiles_service endpoint=https://tempo-prod-1-prod-eu-west-2.grafana.net:443 err="unauthenticated: authentication error: invalid scope requested"
```

The following table shows the scopes required for various actions done by this chart:

| Data type               | Server                                      | Scope for writing | Scope for reading |
|-------------------------|---------------------------------------------|-------------------|-------------------|
| Metrics                 | Grafana Cloud Metrics (Prometheus or Mimir) | `metrics:write`   | `metrics:read`    |
| Logs and Cluster Events | Grafana Cloud Logs (Loki)                   | `logs:write`      | `logs:read`       |
| Traces                  | Grafana Cloud Traces (Tempo)                | `traces:write`    | `traces:read`     |
| Profiles                | Grafana Cloud Profiles (Pyroscope)          | `profiles:write`  | `profiles:read`   |

### Couldn’t load repositories file

If you receive the following message when running the Helm chart installation generated by Grafana Cloud, `Error: Couldn't load repositories file (/root/.helm/repository/repositories.yaml)`, run `helm init`. This is a common error for new installations of Kubernetes and K3s.

### Invalid argument 300s

If you receive the following message when running the chart installation generated by Grafana Cloud, `Error: invalid argument 300s for --timeout flag: strconv.ParseInt: parsing 300s: invalid syntax`, you’re using an older version of Helm. Update to the latest version.

### Kepler Pods crashing on AWS Graviton Nodes

Kepler [cannot run](https://github.com/sustainable-computing-io/kepler/issues/1556) on AWS Graviton Nodes; Kepler Pods scheduled on these Nodes enter CrashLoopBackOff. To prevent this, add a Node selector to the Kepler deployment:

```yaml
kepler:
  nodeSelector:
    kubernetes.io/arch: amd64
```

### Kubernetes Cluster unreachable

For K3s deployments, if you receive the following message when running the Helm chart installation generated by Grafana Cloud, `Error: Kubernetes cluster unreachable: Get http://localhost:8080/version: dial tcp 127.0.0.1:8080: connect: connection refused`, execute the following command before you run Helm: `export KUBECONFIG=/etc/rancher/k3s/k3s.yaml`.

### OpenShift error

With the OpenShift default `SecurityContextConstraints` (`scc`) of `restricted` (refer to the `scc` [documentation](https://docs.redhat.com/en/documentation/openshift_container_platform/latest/html/authentication_and_authorization/managing-pod-security-policies) for more info), you may run into the following errors while deploying Grafana Alloy using the default generated manifests:

```none
msg="error creating the agent server entrypoint" err="creating HTTP listener: listen tcp 0.0.0.0:80: bind: permission denied"
```

By default, the Alloy StatefulSet container attempts to bind to port `80`, which is only allowed for the root user (`0`) and other privileged users. With the default `restricted` SCC on OpenShift, this results in the preceding error.

```none
Events:
  Type     Reason        Age                   From                  Message
  ----     ------        ----                  ----                  -------
  Warning  FailedCreate  3m55s (x19 over 15m)  daemonset-controller  Error creating: pods "grafana-agent-logs-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.containers[0].securityContext.runAsUser: Invalid value: 0: must be in the ranges: [1000650000, 1000659999], spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]
```

By default, the Alloy DaemonSet attempts to run as root user, and also attempts to access directories on the host (to tail logs). With the default `restricted` SCC on OpenShift, this results in the preceding error.

To solve these errors, use the [`hostmount-anyuid`](https://docs.redhat.com/en/documentation/openshift_container_platform/latest/html/authentication_and_authorization/managing-pod-security-policies) SCC provided by OpenShift, which allows containers to run as root and mount directories on the host.

If this does not meet your security needs, create a new SCC with the required tailored permissions, or investigate running Alloy as a non-root container, which goes beyond the scope of this troubleshooting guide.

To use the `hostmount-anyuid` SCC, add the following stanza to the `alloy` and `alloy-logs` ClusterRoles:

```yaml
---
- apiGroups:
    - security.openshift.io
  resources:
    - securitycontextconstraints
  verbs:
    - use
  resourceNames:
    - hostmount-anyuid
```

### ResourceExhausted error when sending traces

You might encounter the following error if you have traces enabled and see log entries in your `alloy` instance that look like this:

```text
Permanent error: rpc error: code = ResourceExhausted desc = grpc: received message after decompression larger than max (5268750 vs. 4194304)" dropped_items=11226
ts=2024-09-19T19:52:35.16668052Z level=info msg="rejoining peers" service=cluster peers_count=1 peers=6436336134343433.grafana-k8s-monitoring-alloy-cluster.default.svc.cluster.local.:12345
```

This error is likely due to the span size being too large. To fix this, adjust the batch size:

```yaml
receivers:
  processors:
    batch:
      maxSize: 2000
```

Start with 2000 and adjust as needed.

### Traces missing with Istio service mesh

If traces are not appearing in Grafana Cloud Traces when Istio service mesh is deployed in your Cluster, this is likely due to the protocol detection requirements of Istio. Istio requires Kubernetes Service port names to be `grpc` or start with `grpc-` (for example, `grpc-otlp`) for proper gRPC protocol detection. Without following this naming convention, Istio cannot identify the port as using the gRPC protocol, and cannot properly route trace data from your applications to the OpenTelemetry Collector.

To resolve this issue, ensure your Kubernetes Service port names follow Istio’s protocol naming convention when configuring OpenTelemetry trace collection with Istio. For example, when configuring OTLP gRPC receivers:

```yaml
alloy-receiver:
  enabled: true
  alloy:
    extraPorts:
      - name: grpc-otlp # Must be "grpc" or "grpc-<suffix>" for Istio
        port: 4317
        targetPort: 4317
        protocol: TCP
      - name: otlp-http # HTTP ports don't require special naming
        port: 4318
        targetPort: 4318
        protocol: TCP
```

> Warning
> 
> If your port name does not follow Istio’s naming convention (for example, `otlp-grpc` or `otlp`), you must rename it to either `grpc` or a name starting with `grpc-` (for example, `grpc-otlp`) when Istio is deployed in your Cluster.

This Istio protocol detection requirement applies to the Kubernetes Service that exposes the OTLP gRPC port for Grafana Alloy or any other OpenTelemetry Collector running under Istio service mesh. For more information about Istio protocol selection, refer to the [Istio protocol selection documentation](https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/).

### Update error

If you attempted to upgrade Kubernetes Monitoring with the **Update** button on the **Cluster configuration** tab under **Configuration** and received an error message, delete and recreate only the collector that failed to update. This preserves your alerting and recording rules.

1. List the collectors in your Cluster:
   
   ```shell
   kubectl get alloy --all-namespaces
   ```
2. Delete the collector that failed to update and wait for it to be fully removed:
   
   ```shell
   kubectl delete alloy <COLLECTOR_NAME> --namespace <RELEASE_NAMESPACE> --wait=true
   ```
3. Run `helm upgrade` with your values. The collector is recreated with the new configuration.

For more details, refer to [Helm upgrade doesn’t apply changes to an existing collector](#helm-upgrade-doesnt-apply-changes-to-an-existing-collector).

If you are unable to resolve the error, follow these steps to uninstall and reinstall the alert and recording rules.

> Warning
> 
> When you perform an uninstall, this deletes the `integrations-kubernetes` alert and recording rule namespace, including all alert and recording rules in that namespace. Re-installation creates new rules that are Grafana-managed alerts instead of data source-managed alerts. If you had data source-managed alerting, your notification routing for those alerts changes. For more information, refer to [Alerting rule types](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/configuration/upgrade-k8s/#alerting-rule-types).

Before you uninstall:

1. Go to **Alerts & IRM** > **Alerting** > **Alert rules** and search for `namespace:integrations-kubernetes`.
2. Export any custom alert rules you added to the `integrations-kubernetes` namespace. Use **More** > **Duplicate** to copy them to a different namespace, or export them using the [export instructions](/docs/grafana/latest/alerting/set-up/provision-alerting-resources/export-alerting-resources/#export-alert-rules).
3. Note any modified recording rules so you can recreate them after reinstalling.

To uninstall and reinstall:

1. Click **Uninstall**.
2. Click **Install** to reinstall.
3. Complete the instructions in [Configure with Grafana Kubernetes Monitoring Helm chart](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/configuration/helm-chart-config/).

After reinstalling:

1. Verify that alerting and recording rules are installed on the [**Metrics status** tab](#metrics-status-tool).
2. If your rules changed from data source-managed to Grafana-managed, reconfigure your notification routing in the built-in Alertmanager of Grafana. Refer to [Configure contact points](/docs/grafana/latest/alerting/configure-notifications/manage-contact-points/) and [Configure notification policies](/docs/grafana/latest/alerting/configure-notifications/create-notification-policy/).
3. Move any exported custom alert rules back into a namespace other than `integrations-kubernetes` to prevent them from being deleted by future upgrades.
