
Ceph Integration for Grafana Cloud

Ceph delivers object, block, and file storage in one unified system.

Use the instructions in Grafana Cloud to install the Ceph Integration.

This integration monitors a Ceph cluster using the built-in Prometheus plugin, a Ceph Manager module that you enable by running the following command in your cluster:

ceph mgr module enable prometheus

After enabling the Prometheus plugin, configure the Grafana Agent to scrape your Ceph nodes. Add a ceph_cluster label to every scrape target so that the integration can identify all the components of your cluster.

prometheus:
  wal_directory: /tmp/wal
  configs:
    - name: integrations
      scrape_configs:
        - job_name: integrations/ceph
          static_configs:
            - targets: ['cephnode1:9283', 'cephnode2:9283', 'cephnode3:9283']
              labels:
                ceph_cluster: 'my-cluster'
      remote_write:
        - url: http://cortex:9009/api/prom/push
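
If the same agent scrapes more than one Ceph cluster, each group of targets needs its own ceph_cluster value so that the components of each cluster stay distinguishable. The fragment below is a minimal sketch of the scrape_configs section only; the hostnames and the cluster names cluster-a and cluster-b are hypothetical placeholders, not values required by the integration.

```yaml
scrape_configs:
  - job_name: integrations/ceph
    static_configs:
      # First cluster (hypothetical hostnames)
      - targets: ['ceph-a-node1:9283', 'ceph-a-node2:9283']
        labels:
          ceph_cluster: 'cluster-a'
      # Second cluster (hypothetical hostnames)
      - targets: ['ceph-b-node1:9283', 'ceph-b-node2:9283']
        labels:
          ceph_cluster: 'cluster-b'
```

Labels listed under a static_configs entry are attached to every series scraped from the targets in that entry, so nothing needs to change on the Ceph side.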

The integration includes a single, comprehensive dashboard that summarizes the state of your Ceph cluster: overall cluster information, the number of OSD and monitor nodes that are up or down, bytes written and read, write and read throughput rates, input/output operations per second (IOPS), cluster availability, total and used capacity, and latency rate and distribution.

The Ceph Integration for Grafana Cloud ships the following alerts to make sure that you get notified when something is wrong with your cluster:

  • CephUnhealthy: Based on the overall health metric ceph_health_status. If this metric doesn’t exist or returns a value other than 1, the cluster is having critical issues.
  • CephDiskLessThan15Left: Creates a warning alert if there is less than 15% of capacity left in the cluster.
  • CephDiskLessThan5Left: Creates a critical alert if there is less than 5% of capacity left in the cluster.
  • OSDNodeDown: Creates a warning alert if any OSD node is down.
  • MDSDown: Creates a critical alert if there is no metadata service (MDS) available in the cluster.
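
To make the two capacity thresholds concrete, the following Prometheus rule fragment sketches how alerts of this kind could be expressed. It is an illustration only and not the rule set shipped with the integration: it assumes the metrics ceph_cluster_total_bytes and ceph_cluster_total_used_bytes exposed by the Ceph Prometheus plugin, and the group name, the 5m hold time, and the severity labels are assumptions chosen for the example.

```yaml
groups:
  - name: ceph-capacity-sketch   # hypothetical group name
    rules:
      - alert: CephDiskLessThan15Left
        # Fraction of raw capacity still free, evaluated per cluster.
        expr: |
          (ceph_cluster_total_bytes - ceph_cluster_total_used_bytes)
            / ceph_cluster_total_bytes < 0.15
        for: 5m                  # assumed hold time to avoid flapping
        labels:
          severity: warning
      - alert: CephDiskLessThan5Left
        expr: |
          (ceph_cluster_total_bytes - ceph_cluster_total_used_bytes)
            / ceph_cluster_total_bytes < 0.05
        for: 5m                  # assumed hold time
        labels:
          severity: critical
```

Both expressions compute the same free-capacity ratio and differ only in the threshold, so a cluster with less than 5% left fires the warning and the critical alert together.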