Linux Server integration for Grafana Cloud
Linux is a family of open-source Unix-like operating systems based on the Linux kernel. Linux is the leading operating system on servers, and is one of the most prominent examples of free and open-source software collaboration.
The Linux Server integration for Grafana Cloud enables you to collect metrics related to the operating system running on a node, including CPU usage, load average, memory usage, and disk and networking I/O, using the embedded node_exporter. It also lets you use Grafana Alloy to collect logs.
This integration includes 24 useful alerts and 7 pre-built dashboards to help monitor and visualize Linux Server metrics and logs.
Before you begin
Each Linux node being observed must have its dedicated Grafana Alloy running.
If you want to monitor more than one Linux node with this integration, we recommend using the Ansible collection for Grafana Cloud to deploy Grafana Alloy to multiple machines, as described in this documentation.
Install Linux Server integration for Grafana Cloud
- In your Grafana Cloud stack, click Connections in the left-hand menu.
- Find Linux Server and click its tile to open the integration.
- Review the prerequisites in the Configuration Details tab and set up Grafana Alloy to send Linux Server metrics and logs to your Grafana Cloud instance.
- Click Install to add this integration’s pre-built dashboards and alerts to your Grafana Cloud instance, and start monitoring your Linux Server setup.
Configuration snippets for Grafana Alloy
Simple mode
These snippets are configured to scrape a single Linux Server instance running locally with default ports.
Manually copy and append the following snippets into your Grafana Alloy configuration file.
Integrations snippets
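The published snippet pairs the prometheus.exporter.unix component with a scrape pipeline. The following is a minimal sketch; the component labels and the metrics_service remote-write endpoint are assumptions that must match the rest of your configuration:

```alloy
// Expose system metrics via the embedded node_exporter.
prometheus.exporter.unix "integrations_node_exporter" { }

// Set the instance and job labels expected by the dashboards and alerts.
discovery.relabel "integrations_node_exporter" {
  targets = prometheus.exporter.unix.integrations_node_exporter.targets

  rule {
    target_label = "instance"
    replacement  = constants.hostname
  }

  rule {
    target_label = "job"
    replacement  = "integrations/node_exporter"
  }
}

// Scrape the exporter and forward samples to Grafana Cloud.
prometheus.scrape "integrations_node_exporter" {
  targets    = discovery.relabel.integrations_node_exporter.output
  forward_to = [prometheus.remote_write.metrics_service.receiver]
}
```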
Logs snippets
linux
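For logs, a sketch combining journal and file scraping might look like the following; the component labels and the grafana_cloud_loki write endpoint are assumptions:

```alloy
// Map journal fields to log labels (unit enables filtering on the dashboards).
loki.relabel "journal" {
  forward_to = []

  rule {
    source_labels = ["__journal__systemd_unit"]
    target_label  = "unit"
  }
}

// Read the systemd journal with a 24h lookback.
loki.source.journal "logs_node_exporter" {
  max_age       = "24h0m0s"
  relabel_rules = loki.relabel.journal.rules
  labels        = {
    instance = constants.hostname,
    job      = "integrations/node_exporter",
  }
  forward_to = [loki.write.grafana_cloud_loki.receiver]
}

// Tail common OS log files with the same labels.
local.file_match "logs_node_exporter" {
  path_targets = [{
    __address__ = "localhost",
    __path__    = "/var/log/{syslog,messages,*.log}",
    instance    = constants.hostname,
    job         = "integrations/node_exporter",
  }]
}

loki.source.file "logs_node_exporter" {
  targets    = local.file_match.logs_node_exporter.targets
  forward_to = [loki.write.grafana_cloud_loki.receiver]
}
```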
Advanced mode
To instruct Grafana Alloy to scrape your Linux Server instance, go through the following instructions.
The snippets provide examples to guide you through the configuration process.
First, manually copy and append the following snippets into your Grafana Alloy configuration file.
Then follow the instructions below to modify the necessary variables.
Advanced integrations snippets
This integration uses the prometheus.exporter.unix component to collect system metrics.
The supplied configuration is tuned to exclude any metrics from the exporter that are not used by the integration’s dashboards, alerts, or recording rules. If you want a broader configuration that includes additional metrics, adjust the prometheus.exporter.unix component accordingly.
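As a sketch, such tuning can disable unused collectors and exclude ephemeral interfaces and pseudo-filesystems, mirroring the filters used in the Grafana Agent static configuration later on this page; the component label and exact arguments shown are assumptions:

```alloy
prometheus.exporter.unix "integrations_node_exporter" {
  // Drop collectors whose metrics the dashboards and alerts do not use.
  disable_collectors = ["ipvs", "btrfs", "infiniband", "xfs", "zfs"]

  netclass {
    // Exclude dynamic interfaces such as veth pairs and Calico devices.
    ignored_devices = "^(veth.*|cali.*|[a-f0-9]{15})$"
  }

  netdev {
    device_exclude = "^(veth.*|cali.*|[a-f0-9]{15})$"
  }

  filesystem {
    // Ignore pseudo and temporary filesystems.
    fs_types_exclude = "^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|tmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$"
  }
}
```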
Advanced logs snippets
linux
This integration uses the loki.source.journal and local.file_match components to collect system logs. This covers the systemd journal and the files matching /var/log/{syslog,messages,*.log}.
If you wish to capture other log files, add new maps to the path_targets list parameter of the local.file_match component. If you want these additionally captured logs to be labeled so that they appear on the Linux Node integration logs dashboard, each entry must include the same instance and job labels.
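For instance, an extra map can be appended to path_targets with the same labels; the component label and the /var/log/myapp path below are hypothetical:

```alloy
local.file_match "logs_node_exporter" {
  path_targets = [
    // Default OS log files collected by the integration.
    {
      __address__ = "localhost",
      __path__    = "/var/log/{syslog,messages,*.log}",
      instance    = constants.hostname,
      job         = "integrations/node_exporter",
    },
    // Additional application logs (hypothetical path), with matching
    // instance and job labels so they appear on the logs dashboard.
    {
      __address__ = "localhost",
      __path__    = "/var/log/myapp/*.log",
      instance    = constants.hostname,
      job         = "integrations/node_exporter",
    },
  ]
}
```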
Grafana Agent static configuration (deprecated)
The following section shows configuration for running Grafana Agent in static mode, which is deprecated. You should use Grafana Alloy for all new deployments.
Before you begin
Each Linux node being observed must have its dedicated Grafana Agent running.
If you want to monitor more than one Linux node with this integration, we recommend using the Ansible collection for Grafana Cloud to deploy Grafana Agent to multiple machines, as described in this documentation.
Install Linux Server integration for Grafana Cloud
- In your Grafana Cloud stack, click Connections in the left-hand menu.
- Find Linux Server and click its tile to open the integration.
- Review the prerequisites in the Configuration Details tab and set up Grafana Agent to send Linux Server metrics and logs to your Grafana Cloud instance.
- Click Install to add this integration’s pre-built dashboards and alerts to your Grafana Cloud instance, and start monitoring your Linux Server setup.
Post-install configuration for the Linux Server integration
This integration is configured to work with node_exporter, which is embedded in Grafana Agent.
Enable the integration by manually adding the provided snippets to your agent configuration file.
Note: The instance label must uniquely identify the node being scraped. Also, ensure each deployed Grafana Agent has a configuration that matches the node it is deployed to.
This integration supports metrics and logs from Linux. If you want to monitor your Linux node logs, there are three options. You can:
- scrape the journal
- scrape your OS log files directly
- scrape both your journal and OS log files
We recommend that you enable journal scraping because it comes with a unit label that can be used to filter logs on the dashboards. Config snippets for both approaches are provided.
If you want to show logs and metrics signals correlated in your dashboards, as a single pane of glass, ensure the following:
- The job and instance label values must match for the node_exporter integration and the logs scrape config in your agent configuration file.
- The job label must be set to integrations/node_exporter (already configured in the snippets).
- The instance label must be set to a value that uniquely identifies your Linux node. Replace the default <your-instance-name> value according to your environment; it should be set manually. Note that if you use localhost for multiple nodes, the dashboards will not be able to filter correctly by instance.
For a full description of configuration options, see how to configure the node_exporter_config block in the agent documentation.
Configuration snippets for Grafana Agent
Below `integrations`, insert the following lines and change the URLs according to your environment:
```yaml
  node_exporter:
    enabled: true
    # disable unused collectors
    disable_collectors:
    - ipvs #high cardinality on kubelet
    - btrfs
    - infiniband
    - xfs
    - zfs
    # exclude dynamic interfaces
    netclass_ignored_devices: "^(veth.*|cali.*|[a-f0-9]{15})$"
    netdev_device_exclude: "^(veth.*|cali.*|[a-f0-9]{15})$"
    # disable tmpfs
    filesystem_fs_types_exclude: "^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|tmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$"
    # drop extensive scrape statistics
    metric_relabel_configs:
    - action: drop
      regex: node_scrape_collector_.+
      source_labels: [__name__]
    relabel_configs:
    - replacement: '<your-instance-name>'
      target_label: instance
```
Below `logs.configs.scrape_configs`, insert the following lines according to your environment.
```yaml
    - job_name: integrations/node_exporter_journal_scrape
      journal:
        max_age: 24h
        labels:
          instance: '<your-instance-name>'
          job: integrations/node_exporter
      relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'
      - source_labels: ['__journal__boot_id']
        target_label: 'boot_id'
      - source_labels: ['__journal__transport']
        target_label: 'transport'
      - source_labels: ['__journal_priority_keyword']
        target_label: 'level'
    - job_name: integrations/node_exporter_direct_scrape
      static_configs:
      - targets:
        - localhost
        labels:
          instance: '<your-instance-name>'
          __path__: /var/log/{syslog,messages,*.log}
          job: integrations/node_exporter
```
Full example configuration for Grafana Agent
Refer to the following Grafana Agent configuration for a complete example that contains all the snippets used for the Linux Server integration. This example also includes metrics that are sent to monitor your Grafana Agent instance.
```yaml
integrations:
  prometheus_remote_write:
  - basic_auth:
      password: <your_prom_pass>
      username: <your_prom_user>
    url: <your_prom_url>
  agent:
    enabled: true
    relabel_configs:
    - action: replace
      source_labels:
      - agent_hostname
      target_label: instance
    - action: replace
      target_label: job
      replacement: "integrations/agent-check"
    metric_relabel_configs:
    - action: keep
      regex: (prometheus_target_sync_length_seconds_sum|prometheus_target_scrapes_.*|prometheus_target_interval.*|prometheus_sd_discovered_targets|agent_build.*|agent_wal_samples_appended_total|process_start_time_seconds)
      source_labels:
      - __name__
  # Add here any snippet that belongs to the `integrations` section.
  # For a correct indentation, paste snippets copied from Grafana Cloud at the beginning of the line.
  node_exporter:
    enabled: true
    # disable unused collectors
    disable_collectors:
    - ipvs #high cardinality on kubelet
    - btrfs
    - infiniband
    - xfs
    - zfs
    # exclude dynamic interfaces
    netclass_ignored_devices: "^(veth.*|cali.*|[a-f0-9]{15})$"
    netdev_device_exclude: "^(veth.*|cali.*|[a-f0-9]{15})$"
    # disable tmpfs
    filesystem_fs_types_exclude: "^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|tmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$"
    # drop extensive scrape statistics
    metric_relabel_configs:
    - action: drop
      regex: node_scrape_collector_.+
      source_labels: [__name__]
    relabel_configs:
    - replacement: '<your-instance-name>'
      target_label: instance
logs:
  configs:
  - clients:
    - basic_auth:
        password: <your_loki_pass>
        username: <your_loki_user>
      url: <your_loki_url>
    name: integrations
    positions:
      filename: /tmp/positions.yaml
    scrape_configs:
    # Add here any snippet that belongs to the `logs.configs.scrape_configs` section.
    # For a correct indentation, paste snippets copied from Grafana Cloud at the beginning of the line.
    - job_name: integrations/node_exporter_journal_scrape
      journal:
        max_age: 24h
        labels:
          instance: '<your-instance-name>'
          job: integrations/node_exporter
      relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'
      - source_labels: ['__journal__boot_id']
        target_label: 'boot_id'
      - source_labels: ['__journal__transport']
        target_label: 'transport'
      - source_labels: ['__journal_priority_keyword']
        target_label: 'level'
    - job_name: integrations/node_exporter_direct_scrape
      static_configs:
      - targets:
        - localhost
        labels:
          instance: '<your-instance-name>'
          __path__: /var/log/{syslog,messages,*.log}
          job: integrations/node_exporter
metrics:
  configs:
  - name: integrations
    remote_write:
    - basic_auth:
        password: <your_prom_pass>
        username: <your_prom_user>
      url: <your_prom_url>
    scrape_configs:
    # Add here any snippet that belongs to the `metrics.configs.scrape_configs` section.
    # For a correct indentation, paste snippets copied from Grafana Cloud at the beginning of the line.
  global:
    scrape_interval: 60s
  wal_directory: /tmp/grafana-agent-wal
```
Dashboards
The Linux Server integration installs the following dashboards in your Grafana Cloud instance to help monitor your system.
- Linux node / CPU and system
- Linux node / filesystem and disks
- Linux node / fleet overview
- Linux node / logs
- Linux node / memory
- Linux node / network
- Linux node / overview
Node overview dashboard
Fleet overview dashboard
Drill down dashboards: Network interfaces
Alerts
The Linux Server integration includes the following useful alerts:
- node-exporter-filesystem
- node-exporter
Metrics
The most important metrics provided by the Linux Server integration, which are used on the pre-built dashboards and Prometheus alerts, are as follows:
- node_arp_entries
- node_boot_time_seconds
- node_context_switches_total
- node_cpu_seconds_total
- node_disk_io_time_seconds_total
- node_disk_io_time_weighted_seconds_total
- node_disk_read_bytes_total
- node_disk_read_time_seconds_total
- node_disk_reads_completed_total
- node_disk_write_time_seconds_total
- node_disk_writes_completed_total
- node_disk_written_bytes_total
- node_filefd_allocated
- node_filefd_maximum
- node_filesystem_avail_bytes
- node_filesystem_device_error
- node_filesystem_files
- node_filesystem_files_free
- node_filesystem_readonly
- node_filesystem_size_bytes
- node_intr_total
- node_load1
- node_load15
- node_load5
- node_md_disks
- node_md_disks_required
- node_memory_Active_anon_bytes
- node_memory_Active_bytes
- node_memory_Active_file_bytes
- node_memory_AnonHugePages_bytes
- node_memory_AnonPages_bytes
- node_memory_Bounce_bytes
- node_memory_Buffers_bytes
- node_memory_Cached_bytes
- node_memory_CommitLimit_bytes
- node_memory_Committed_AS_bytes
- node_memory_DirectMap1G_bytes
- node_memory_DirectMap2M_bytes
- node_memory_DirectMap4k_bytes
- node_memory_Dirty_bytes
- node_memory_HugePages_Free
- node_memory_HugePages_Rsvd
- node_memory_HugePages_Surp
- node_memory_HugePages_Total
- node_memory_Hugepagesize_bytes
- node_memory_Inactive_anon_bytes
- node_memory_Inactive_bytes
- node_memory_Inactive_file_bytes
- node_memory_Mapped_bytes
- node_memory_MemAvailable_bytes
- node_memory_MemFree_bytes
- node_memory_MemTotal_bytes
- node_memory_SReclaimable_bytes
- node_memory_SUnreclaim_bytes
- node_memory_ShmemHugePages_bytes
- node_memory_ShmemPmdMapped_bytes
- node_memory_Shmem_bytes
- node_memory_Slab_bytes
- node_memory_SwapTotal_bytes
- node_memory_VmallocChunk_bytes
- node_memory_VmallocTotal_bytes
- node_memory_VmallocUsed_bytes
- node_memory_WritebackTmp_bytes
- node_memory_Writeback_bytes
- node_netstat_Icmp6_InErrors
- node_netstat_Icmp6_InMsgs
- node_netstat_Icmp6_OutMsgs
- node_netstat_Icmp_InErrors
- node_netstat_Icmp_InMsgs
- node_netstat_Icmp_OutMsgs
- node_netstat_IpExt_InOctets
- node_netstat_IpExt_OutOctets
- node_netstat_TcpExt_ListenDrops
- node_netstat_TcpExt_ListenOverflows
- node_netstat_TcpExt_TCPSynRetrans
- node_netstat_Tcp_InErrs
- node_netstat_Tcp_InSegs
- node_netstat_Tcp_OutRsts
- node_netstat_Tcp_OutSegs
- node_netstat_Tcp_RetransSegs
- node_netstat_Udp6_InDatagrams
- node_netstat_Udp6_InErrors
- node_netstat_Udp6_NoPorts
- node_netstat_Udp6_OutDatagrams
- node_netstat_Udp6_RcvbufErrors
- node_netstat_Udp6_SndbufErrors
- node_netstat_UdpLite_InErrors
- node_netstat_Udp_InDatagrams
- node_netstat_Udp_InErrors
- node_netstat_Udp_NoPorts
- node_netstat_Udp_OutDatagrams
- node_netstat_Udp_RcvbufErrors
- node_netstat_Udp_SndbufErrors
- node_network_carrier
- node_network_info
- node_network_mtu_bytes
- node_network_receive_bytes_total
- node_network_receive_compressed_total
- node_network_receive_drop_total
- node_network_receive_errs_total
- node_network_receive_fifo_total
- node_network_receive_multicast_total
- node_network_receive_packets_total
- node_network_speed_bytes
- node_network_transmit_bytes_total
- node_network_transmit_compressed_total
- node_network_transmit_drop_total
- node_network_transmit_errs_total
- node_network_transmit_fifo_total
- node_network_transmit_multicast_total
- node_network_transmit_packets_total
- node_network_transmit_queue_length
- node_network_up
- node_nf_conntrack_entries
- node_nf_conntrack_entries_limit
- node_os_info
- node_procs_running
- node_sockstat_FRAG6_inuse
- node_sockstat_FRAG_inuse
- node_sockstat_RAW6_inuse
- node_sockstat_RAW_inuse
- node_sockstat_TCP6_inuse
- node_sockstat_TCP_alloc
- node_sockstat_TCP_inuse
- node_sockstat_TCP_mem
- node_sockstat_TCP_mem_bytes
- node_sockstat_TCP_orphan
- node_sockstat_TCP_tw
- node_sockstat_UDP6_inuse
- node_sockstat_UDPLITE6_inuse
- node_sockstat_UDPLITE_inuse
- node_sockstat_UDP_inuse
- node_sockstat_UDP_mem
- node_sockstat_UDP_mem_bytes
- node_sockstat_sockets_used
- node_softnet_dropped_total
- node_softnet_processed_total
- node_softnet_times_squeezed_total
- node_systemd_service_restart_total
- node_systemd_unit_state
- node_textfile_scrape_error
- node_time_zone_offset_seconds
- node_timex_estimated_error_seconds
- node_timex_maxerror_seconds
- node_timex_offset_seconds
- node_timex_sync_status
- node_uname_info
- node_vmstat_oom_kill
- node_vmstat_pgfault
- node_vmstat_pgmajfault
- node_vmstat_pgpgin
- node_vmstat_pgpgout
- node_vmstat_pswpin
- node_vmstat_pswpout
- process_max_fds
- process_open_fds
- up
Changelog
Cost
By connecting your Linux Server instance to Grafana Cloud, you might incur charges. To view information on the number of active series that your Grafana Cloud account uses for metrics included in each Cloud tier, see Active series and dpm usage and Cloud tier pricing.