---
title: "Resource utilization and saturation | Grafana Cloud documentation"
description: "Learn more about resource utilization and saturation in the knowledge graph"
---

# Resource utilization and saturation

Learn how the knowledge graph maps resource utilization and saturation metrics.

## Resource utilization

Some resources have a finite limit. For these resources, you can:

- Express their current level of utilization as a ratio against the limit.
- Observe their utilization, and get early warning of their saturation.

For these cases, record the `asserts:resource` metric.

For example, consider the number of clients for a Redis server. Redis has a configuration setting that limits the maximum number of clients. Both this limit and the current number of clients are available through metrics exposed by the Redis exporter.

| **Metric**                | **Details**                                  |
|---------------------------|----------------------------------------------|
| `redis_connected_clients` | Number of active client connections          |
| `redis_config_maxclients` | Maximum number of client connections allowed |

Because the number of client connections is finite, it’s useful to track its utilization and receive an early warning before the resource saturates. To achieve this, record the `asserts:resource` metric:

```none
- record: asserts:resource
  expr: >
    max by (asserts_env, asserts_site, namespace, service, job) (
      redis_connected_clients / redis_config_maxclients
    )
  labels:
    asserts_entity_type: Service
    asserts_resource_type: client_connections
    asserts_source: redis_exporter
```
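The snippet above is a single rule entry. In a standard Prometheus rule file, such entries sit inside a rule group. The following is a minimal sketch of that wrapper, assuming the rule is loaded as a regular Prometheus-style rule file; the group name `redis-resource-rules` and the `1m` interval are hypothetical values chosen for illustration.

```none
groups:
  # Hypothetical group name; any unique name works in a Prometheus rule file
  - name: redis-resource-rules
    # Optional evaluation interval (assumed value for this sketch)
    interval: 1m
    rules:
      - record: asserts:resource
        expr: >
          max by (asserts_env, asserts_site, namespace, service, job) (
            redis_connected_clients / redis_config_maxclients
          )
        labels:
          asserts_entity_type: Service
          asserts_resource_type: client_connections
          asserts_source: redis_exporter
```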

With this rule, client connection utilization is normalized to a scale of `0-1`. Next, define the warning and critical thresholds for saturation:

```none
- record: asserts:resource:threshold
  expr: 0.8
  labels:
    asserts_resource_type: client_connections
    asserts_severity: warning

- record: asserts:resource:threshold
  expr: 0.9
  labels:
    asserts_resource_type: client_connections
    asserts_severity: critical
```

The thresholds can also be managed in the UI at `Home -> Observability -> Rules -> Thresholds` under the `Resource` tab.
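To preview which services currently exceed the warning threshold, an ad hoc PromQL query along the following lines may help. This is an illustrative sketch only: it assumes exactly one threshold series exists per resource type and severity, and it is not necessarily how the knowledge graph evaluates thresholds internally.

```none
# Return utilization series whose value is above the warning threshold.
# on (...) matches the two sides by resource type only; group_left allows
# many utilization series to match the single threshold series.
asserts:resource{asserts_resource_type="client_connections"}
  > on (asserts_resource_type) group_left
asserts:resource:threshold{asserts_resource_type="client_connections", asserts_severity="warning"}
```

Changing `asserts_severity` to `critical` in the query shows the series that exceed the critical threshold instead.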

### Understand the knowledge graph meta labels and aggregation

**asserts\_env** and **asserts\_site**

The same Redis service may be deployed in multiple environments. In the knowledge graph, all services and other infrastructure components are grouped by environment and site. For example, `asserts_env = prod` and `asserts_site = us-west-2`. The metrics, alerts and the services discovered from these metrics are scoped by these labels.

**namespace and service**

There may be multiple deployments of Redis for different functions, for example `redis-payments`, `redis-orders`, and so on. In Kubernetes, these would be deployed as different stateful services. The knowledge graph uses the `namespace` and `service` labels in the metric to uniquely identify each service in a given environment.

**job**

In non-Kubernetes environments, the Prometheus scrape configuration has a different value for the `job` label for each deployment of Redis. The knowledge graph uses the `job` label in the metric to uniquely identify each service in a given environment.

**Aggregation**

So far, we have identified the labels that uniquely identify the service in a multi-environment, multi-service setup. Redis may also be set up as a cluster with multiple instances. Because the connection count metric is available at the instance level, the utilization is computed at the instance level. However, the alert should fire for the Redis service and not for a specific instance. To achieve this, the rule tracks the highest utilization among all instances by using the `max by(...)` aggregation. The `asserts_env, asserts_site, namespace, service, job` labels are retained in the aggregation clause to help the knowledge graph associate the alert with the correct service entity.
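The effect of the aggregation can be illustrated with a small worked example. The instance names and values in the comments are invented for illustration; only the expression itself comes from the rule shown earlier.

```none
# Hypothetical per-instance values of
# redis_connected_clients / redis_config_maxclients:
#   {service="redis-payments", instance="redis-payments-0"}  0.42
#   {service="redis-payments", instance="redis-payments-1"}  0.85
#   {service="redis-payments", instance="redis-payments-2"}  0.31
#
# max by (...) drops the instance label and keeps the highest value,
# leaving a single series for the service:
#   {service="redis-payments"}  0.85
max by (asserts_env, asserts_site, namespace, service, job) (
  redis_connected_clients / redis_config_maxclients
)
```

With the thresholds defined earlier, a value of 0.85 is above the warning threshold of 0.8 but below the critical threshold of 0.9.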

**asserts\_entity\_type**

In the knowledge graph entity model, all infrastructure components are modeled as different types of entities, for example **Service, ServiceInstance, Pod, Node**, and so on. Redis is a **Service**, and the individual instances of a clustered Redis are **ServiceInstance** entities. As explained in the `Aggregation` section earlier, the knowledge graph associates the alert at the service level. This is achieved by the meta label specification `asserts_entity_type: Service`.

If you want to track the connection saturation at an individual instance, you can tweak the rule as follows:

```none
- record: asserts:resource
  expr: >
    max by (asserts_env, asserts_site, namespace, service, job, instance) (
      redis_connected_clients / redis_config_maxclients
    )
  labels:
    asserts_entity_type: ServiceInstance
    asserts_resource_type: client_connections
    asserts_source: redis_exporter
```

The expression now includes the `instance` label and the entity type is set to `ServiceInstance`.

**asserts\_resource\_type**

Similarly, the knowledge graph models the various resources that it observes as different types. For example, `cpu:usage`, `memory:usage`, and `disk:usage` are different types of resources. In this example, the resource being observed is `client_connections`. You can set the label to any value that best describes the resource and signal being observed.
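As a further illustration of choosing a resource type, the same pattern could be applied to Redis memory. This is a sketch under assumptions: it assumes the exporter exposes `redis_memory_used_bytes` and `redis_memory_max_bytes`, and the resource type name `redis_memory` is a hypothetical value chosen for this example.

```none
# Memory utilization against the configured memory limit.
# The "> 0" filter skips instances that report a limit of 0
# (no limit configured), which would otherwise divide by zero.
- record: asserts:resource
  expr: >
    max by (asserts_env, asserts_site, namespace, service, job) (
      redis_memory_used_bytes / (redis_memory_max_bytes > 0)
    )
  labels:
    asserts_entity_type: Service
    asserts_resource_type: redis_memory
    asserts_source: redis_exporter
```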

**asserts\_source**

Sometimes, the same or similar metrics may be available from multiple instrumentations. The `asserts_source` meta label indicates which exporter the metric comes from. This is helpful when investigating an alert.

We now have a recording rule to track utilization, and we understand its different parts. The knowledge graph starts observing the utilization of client connections and raises alerts when the warning or critical threshold is exceeded. The knowledge graph has a default threshold for all resource utilization. To configure saturation thresholds for different resources, refer to [Manage thresholds](/docs/grafana-cloud/knowledge-graph/configure/manage-thresholds/).

## Automatic anomaly detection on resource usage

Learn about counter and gauge resource metrics in the following sections.

### Counter resource metric

In some cases, there may be value in observing anomalous changes in a resource metric. These resource metrics are available as [gauges](https://prometheus.io/docs/concepts/metric_types/#gauge) or [counters](https://prometheus.io/docs/concepts/metric_types/#counter). For example, the total number of network bytes received or transmitted increases monotonically with time and is available as a counter. An unusual spike in the bytes received or transmitted is an anomaly. To observe such anomalies, record the `asserts:resource:total` metric. The following rules do this for the network bytes received and transmitted by Redis:

```none
# For anomaly detection on in-bytes
- record: asserts:resource:total
  expr: redis_net_input_bytes_total
  labels:
    asserts_entity_type: ServiceInstance
    asserts_resource_type: network:rx_bytes
    asserts_source: redis_exporter

# For anomaly detection on out-bytes
- record: asserts:resource:total
  expr: redis_net_output_bytes_total
  labels:
    asserts_entity_type: ServiceInstance
    asserts_resource_type: network:tx_bytes
    asserts_source: redis_exporter
```

Note that in the above rules, the `asserts_entity_type` is set to `ServiceInstance` to observe network transfer anomalies at the level of a specific Redis instance. This also ensures that the network byte transfer rate is shown in the Service KPI Dashboard.
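Because the recorded series is a counter, its raw value only ever grows, so it is usually inspected as a rate. The following ad hoc query is a minimal sketch for viewing the receive rate; the `5m` window is an arbitrary choice for illustration, and the query is not a description of how the knowledge graph detects anomalies internally.

```none
# Per-second receive rate over the last 5 minutes, per Redis instance
rate(asserts:resource:total{asserts_resource_type="network:rx_bytes"}[5m])
```

Replacing `network:rx_bytes` with `network:tx_bytes` shows the transmit rate instead.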

### Gauge resource metric

When resource metrics are available as [gauges](https://prometheus.io/docs/concepts/metric_types/#gauge), map them to `asserts:resource:gauge` so that the knowledge graph automatically detects anomalies. For example, while you are observing the utilization and saturation of client connections, it might also be useful to watch for anomalies in the number of connections, such as a sudden drop or spike. To observe such anomalies, record the number of client connections as a resource gauge as follows:

```none
- record: asserts:resource:gauge
  expr: redis_connected_clients
  labels:
    asserts_resource_type: client_connections
    asserts_entity_type: Service
    asserts_source: redis_exporter
```
