Ceph integration for Grafana Cloud
Ceph delivers object, block, and file storage in one unified system.
Use the instructions in Grafana Cloud to install the Ceph integration.
This integration monitors a Ceph cluster through Ceph's built-in Prometheus plugin, which you enable by running the following command in your cluster:
ceph mgr module enable prometheus
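To confirm that the plugin is serving metrics before you configure the agent, a quick check along the following lines may help. This is a minimal sketch, not part of the integration: the helper names fetch_metrics and metric_values are hypothetical, and it assumes the plugin exposes the standard Prometheus text format at /metrics on port 9283 (the port used in the scrape targets on this page).

```python
import urllib.request


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Download the raw Prometheus text exposition from an exporter URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def metric_values(exposition: str, name: str) -> list:
    """Return every sample value for the metric `name` as floats."""
    values = []
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # Labelled sample: metric_name{labels} value [timestamp]
            close = line.rfind("}")
            if close == -1:
                continue  # malformed line, ignore it
            metric = line[: line.index("{")]
            rest = line[close + 1:]
        else:
            # Unlabelled sample: metric_name value [timestamp]
            metric, _, rest = line.partition(" ")
        if metric != name:
            continue
        fields = rest.split()
        if fields:
            values.append(float(fields[0]))
    return values
```

For example, metric_values(fetch_metrics('http://cephnode1:9283/metrics'), 'ceph_health_status') would return the health samples reported by a node named cephnode1, assuming that host name resolves in your environment. An empty list suggests the plugin is not enabled or not reachable on that node.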
After enabling the Prometheus plugin, configure the Grafana Agent to scrape your Ceph nodes. A ceph_cluster label must be added to each scrape, so that the integration can identify all the components of your cluster.
metrics:
  wal_directory: /tmp/wal
  configs:
    - name: integrations
      scrape_configs:
        - job_name: integrations/ceph
          static_configs:
            - targets: ['cephnode1:9283', 'cephnode2:9283', 'cephnode3:9283']
              labels:
                ceph_cluster: 'my-cluster'
      remote_write:
        - url: http://cortex:9009/api/prom/push
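Because the integration relies on the ceph_cluster label being present on every scrape, it can be useful to check a scrape job for targets that lack it. The following is a minimal sketch under the assumption that the scrape job has already been loaded into a Python dict with the same shape as the YAML above; the function name targets_missing_label is hypothetical and not part of the Grafana Agent.

```python
def targets_missing_label(scrape_config: dict, label: str = "ceph_cluster") -> list:
    """Return the targets of a scrape job whose static config lacks `label`."""
    missing = []
    for static in scrape_config.get("static_configs", []):
        labels = static.get("labels") or {}
        if not labels.get(label):
            # Every target in this static config is scraped without the label.
            missing.extend(static.get("targets", []))
    return missing


# Example scrape job mirroring the configuration shown on this page.
example_job = {
    "job_name": "integrations/ceph",
    "static_configs": [
        {
            "targets": ["cephnode1:9283", "cephnode2:9283", "cephnode3:9283"],
            "labels": {"ceph_cluster": "my-cluster"},
        }
    ],
}
```

Calling targets_missing_label(example_job) returns an empty list when every static config carries the label; any host it returns would be scraped without a cluster identity.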
The integration consists of a single, complete dashboard that summarizes Ceph cluster information, for example: overall cluster information, the number of OSD and monitor nodes that are up and those that are down, bytes written and read, write and read throughput rates, input/output operations per second (IOPS), cluster availability, total and used capacity, and the current latency rate and distribution.
The Ceph integration for Grafana Cloud ships the following alerts to make sure that you get notified when something is wrong with your cluster:
CephUnhealthy: Creates an alert based on the overall health metric ceph_health_status. If this metric doesn’t exist or it returns a value other than 1, the cluster is having critical issues.
CephDiskLessThan15Left: Creates a warning alert if there is less than 15% of capacity left in the cluster.
CephDiskLessThan5Left: Creates a critical alert if there is less than 5% of capacity left in the cluster.
OSDNodeDown: Creates a warning alert if any OSD node is down.
MDSDown: Creates a critical alert if there is no metadata service (MDS) available in the cluster.
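The two capacity alerts above amount to a threshold check on the fraction of free space. The sketch below illustrates that logic only; it is not the integration's actual alert expression, which is not shown on this page. The function names are hypothetical, and the inputs are assumed to be the cluster's total and used capacity in bytes (for instance, values read from the Ceph exporter).

```python
from typing import Optional

WARNING_FREE_PCT = 15.0   # threshold named in CephDiskLessThan15Left
CRITICAL_FREE_PCT = 5.0   # threshold named in CephDiskLessThan5Left


def free_percent(total_bytes: float, used_bytes: float) -> float:
    """Percentage of cluster capacity that is still free."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * (total_bytes - used_bytes) / total_bytes


def capacity_alert(total_bytes: float, used_bytes: float) -> Optional[str]:
    """Return the most severe capacity alert that would apply, or None."""
    free = free_percent(total_bytes, used_bytes)
    if free < CRITICAL_FREE_PCT:
        return "CephDiskLessThan5Left"
    if free < WARNING_FREE_PCT:
        return "CephDiskLessThan15Left"
    return None
```

For a cluster with 100 TiB of capacity and 90 TiB used, 10% is free, so capacity_alert returns CephDiskLessThan15Left. The sketch reports only the most severe condition; in a real alerting setup both rules may fire at the same time once free space drops below 5%.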