Linux Server integration for Grafana Cloud
The Linux integration for Grafana Cloud enables you to collect metrics about the operating system running on a node, including CPU usage, load average, memory usage, and disk and network I/O, using the well-known node_exporter. It also allows you to use Grafana Agent to scrape logs with Promtail. Supported log sources are syslog, auth.log, kern.log, and journal logs.
This integration includes sixteen useful alerts and two pre-built dashboards to help monitor and visualize Linux metrics and logs.
Install Linux Server integration for Grafana Cloud
- In your Grafana instance, click Integrations and Connections (lightning bolt icon).
- Navigate to the Linux Server tile and review the prerequisites. Then click Install integration.
- Once the integration is installed, follow the steps on the Configuration Details page to set up Grafana Agent and start sending Linux Server metrics to your Grafana Cloud instance.
Post-install configuration for the Linux Server integration
If you want to show logs and metrics signals correlated in your dashboards, ensure the following:
- The `job` and `instance` label values must match for the `node_exporter` integration and the `logs` scrape config in your agent configuration file.
- The `job` label must be set to `integrations/node_exporter` (already configured in the snippets).
- The `instance` label must be set to a value that uniquely identifies your Linux node. Replace the default `hostname` value manually to suit your environment. Note that if you use `localhost` for multiple nodes, the dashboards will not be able to filter correctly by instance.
Note: Ensure each deployed Grafana Agent has a configuration that matches the node it is deployed to.
Refer to the following preferred agent configuration example, with logs collected from systemd:
integrations:
  node_exporter:
    enabled: true
    relabel_configs:
      - replacement: hostname
        target_label: instance
logs:
  configs:
  - name: integrations
    scrape_configs:
    - job_name: integrations/node_exporter_journal_scrape
      journal:
        max_age: 24h
        labels:
          instance: hostname
          job: integrations/node_exporter
      relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'
      - source_labels: ['__journal__boot_id']
        target_label: 'boot_id'
      - source_labels: ['__journal__transport']
        target_label: 'transport'
      - source_labels: ['__journal_priority_keyword']
        target_label: 'level'
If systemd is not an option, you can scrape log files instead:
integrations:
  node_exporter:
    enabled: true
    relabel_configs:
      - replacement: hostname
        target_label: instance
logs:
  configs:
  - name: integrations
    scrape_configs:
    - job_name: integrations/node_exporter_direct_scrape
      static_configs:
      - targets:
        - localhost
        labels:
          instance: hostname
          __path__: /var/log/{syslog,messages,*.log}
          job: integrations/node_exporter
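To illustrate replacing the `hostname` placeholder, the following sketch assumes a node named `web-01` (a hypothetical value, not part of the integration). The same value is set in both the integration's relabel rule and the log scrape labels, so that metrics and logs carry an identical `instance` label:

```yaml
# Sketch only: web-01 is a hypothetical node name; use your own unique value.
integrations:
  node_exporter:
    enabled: true
    relabel_configs:
      - replacement: web-01          # instance label applied to metrics
        target_label: instance
logs:
  configs:
  - name: integrations
    scrape_configs:
    - job_name: integrations/node_exporter_journal_scrape
      journal:
        max_age: 24h
        labels:
          instance: web-01           # must equal the metrics instance label
          job: integrations/node_exporter
```

A second node would use its own value (for example `web-02`) in both places, which keeps the dashboards' instance filter working across nodes.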
Dashboards
The Linux Server integration installs the following dashboards in your Grafana Cloud instance to help monitor your Linux metrics.
- Node Exporter / Nodes
- Node Exporter / USE Method / Node
Linux Node dashboard
Linux USE dashboard
Alerts
This integration includes the following useful alerts:
Group: node-exporter
Alert | Description |
---|---|
NodeFilesystemAlmostOutOfSpace | Warning: Filesystem has less than 5% space left. |
NodeFilesystemAlmostOutOfSpace | Critical: Filesystem has less than 3% space left. |
NodeFilesystemFilesFillingUp | Warning: Filesystem is predicted to run out of inodes within the next 24 hours. |
NodeFilesystemFilesFillingUp | Critical: Filesystem is predicted to run out of inodes within the next 4 hours. |
NodeFilesystemAlmostOutOfFiles | Warning: Filesystem has less than 5% inodes left. |
NodeFilesystemAlmostOutOfFiles | Critical: Filesystem has less than 3% inodes left. |
NodeNetworkReceiveErrs | Warning: Network interface is reporting many receive errors. |
NodeNetworkTransmitErrs | Warning: Network interface is reporting many transmit errors. |
NodeHighNumberConntrackEntriesUsed | Warning: Number of conntrack entries is getting close to the limit. |
NodeTextFileCollectorScrapeError | Warning: Node Exporter text file collector failed to scrape. |
NodeClockSkewDetected | Warning: Clock skew detected. |
NodeClockNotSynchronising | Warning: Clock not synchronising. |
NodeRAIDDegraded | Critical: RAID array is degraded. |
NodeRAIDDiskFailure | Warning: Failed device in RAID array. |
NodeFileDescriptorLimit | Warning: Kernel is predicted to exhaust file descriptors limit soon. |
NodeFileDescriptorLimit | Critical: Kernel is predicted to exhaust file descriptors limit soon. |
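For orientation, an alert such as the warning-level NodeFilesystemAlmostOutOfSpace can be expressed as a Prometheus-style rule roughly like the sketch below. This is an approximation built from the metrics listed on this page, not the integration's exact rule: the group name, the `for: 30m` duration, and the `fstype!=""` filter are assumptions.

```yaml
# Sketch only: approximates the warning-level alert; not the shipped rule.
groups:
  - name: node-exporter-sketch        # hypothetical group name
    rules:
      - alert: NodeFilesystemAlmostOutOfSpace
        expr: |
          (
            node_filesystem_avail_bytes{job="integrations/node_exporter", fstype!=""}
            /
            node_filesystem_size_bytes{job="integrations/node_exporter", fstype!=""}
            * 100 < 5
          and
            node_filesystem_readonly{job="integrations/node_exporter", fstype!=""} == 0
          )
        for: 30m                       # assumed hold duration
        labels:
          severity: warning
        annotations:
          summary: Filesystem has less than 5% space left.
```

The `and ... == 0` clause excludes read-only filesystems, which cannot fill up further. The critical variant of the same alert would use the 3% threshold from the table.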
Metrics
The following metrics are automatically written to your Grafana Cloud instance by connecting your Linux Server instance through this integration:
- instance:node_cpu_utilisation:rate5m
- instance:node_load1_per_cpu:ratio
- instance:node_memory_utilisation:ratio
- instance:node_network_receive_bytes_excluding_lo:rate5m
- instance:node_network_receive_drop_excluding_lo:rate5m
- instance:node_network_transmit_bytes_excluding_lo:rate5m
- instance:node_network_transmit_drop_excluding_lo:rate5m
- instance:node_num_cpu:sum
- instance:node_vmstat_pgmajfault:rate5m
- instance_device:node_disk_io_time_seconds:rate5m
- instance_device:node_disk_io_time_weighted_seconds:rate5m
- node_cpu_seconds_total
- node_disk_io_time_seconds_total
- node_disk_io_time_weighted_seconds_total
- node_disk_read_bytes_total
- node_disk_written_bytes_total
- node_exporter_build_info
- node_filefd_allocated
- node_filefd_maximum
- node_filesystem_avail_bytes
- node_filesystem_files
- node_filesystem_files_free
- node_filesystem_readonly
- node_filesystem_size_bytes
- node_load1
- node_load15
- node_load5
- node_md_disks
- node_md_disks_required
- node_memory_Buffers_bytes
- node_memory_Cached_bytes
- node_memory_MemAvailable_bytes
- node_memory_MemFree_bytes
- node_memory_MemTotal_bytes
- node_memory_Slab_bytes
- node_network_receive_bytes_total
- node_network_receive_drop_total
- node_network_receive_errs_total
- node_network_receive_packets_total
- node_network_transmit_bytes_total
- node_network_transmit_drop_total
- node_network_transmit_errs_total
- node_network_transmit_packets_total
- node_nf_conntrack_entries
- node_nf_conntrack_entries_limit
- node_textfile_scrape_error
- node_time_seconds
- node_timex_maxerror_seconds
- node_timex_offset_seconds
- node_timex_sync_status
- node_uname_info
- node_vmstat_pgmajfault
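The names in the list above that start with `instance:` follow the Prometheus recording-rule naming convention (`level:metric:operations`) and are derived from the raw `node_*` series. As a rough sketch, and assuming simplified expressions rather than the integration's exact rules, a few of them could be computed like this:

```yaml
# Sketch only: simplified expressions; the group name is hypothetical.
groups:
  - name: node-exporter-sketch.rules
    rules:
      # Number of CPUs per instance, counted from the idle-mode series.
      - record: instance:node_num_cpu:sum
        expr: |
          count without (cpu, mode) (
            node_cpu_seconds_total{job="integrations/node_exporter", mode="idle"}
          )
      # 1-minute load average normalised by CPU count.
      - record: instance:node_load1_per_cpu:ratio
        expr: |
          node_load1{job="integrations/node_exporter"}
          /
          instance:node_num_cpu:sum{job="integrations/node_exporter"}
      # Fraction of memory in use, based on MemAvailable.
      - record: instance:node_memory_utilisation:ratio
        expr: |
          1 - (
            node_memory_MemAvailable_bytes{job="integrations/node_exporter"}
            /
            node_memory_MemTotal_bytes{job="integrations/node_exporter"}
          )
```

Precomputing these ratios keeps dashboard queries short and inexpensive, at the cost of a few additional active series per node.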
Changelog
# 0.0.8 - October 2022
- Update upstream node_exporter mixin: [ba8c043079b38748e57adf1f80e3d86a4060efc5](https://github.com/prometheus/node_exporter/commit/ba8c043079b38748e57adf1f80e3d86a4060efc5)
- Enable multicluster dashboards for use in Kubernetes
- Add direct log file scrape to the agent snippets
# 0.0.7 - September 2022
- Remove source_address from relabel_configs
# 0.0.6 - May 2022
- Reverse fsSpaceAvailableCriticalThreshold and fsSpaceAvailableWarningThreshold
- Update units for disk and networking panels
# 0.0.5 - May 2022
- Update 'Disk Space Usage' panel to table format
# 0.0.4 - April 2022
- Fixed alerts and recording rules by providing proper nodeSelector
# 0.0.3 - February 2022
- Added logs support from Loki datasource
# 0.0.2 - October 2021
- Update all rate queries to use `$__rate_interval`
# 0.0.1 - June 2020
- Initial release
Cost
By connecting your Linux Server instance to Grafana Cloud, you might incur charges. To view information on the number of active series that your Grafana Cloud account uses for metrics included in each Cloud tier, see "Active series and DPM usage" and "Cloud tier pricing".