Postgres Exporter

This quickstart includes the following alerting rules:

  • PostgreSQLMaxConnectionsReached

Postgres ran out of available connections

  • PostgreSQLHighConnections

Postgres is exceeding 80% of the currently configured maximum Postgres connection limit

  • PostgreSQLDown

Postgres is not processing queries

  • PostgreSQLSlowQueries

Postgres has a high number of slow queries

  • PostgreSQLQPS

Postgres has a high number of queries per second

  • PostgreSQLCacheHitRatio

Postgres has a low cache hit rate

Download the following alerting rules YAML file:
groups:
  - name: PostgreSQL
    rules:
    - alert: PostgreSQLMaxConnectionsReached
      expr: sum(pg_stat_activity_count) by (instance) >= sum(pg_settings_max_connections) by (instance) - sum(pg_settings_superuser_reserved_connections) by (instance)
      for: 1m
      labels:
        severity: email
      annotations:
        summary: "{{ $labels.instance }} has maxed out Postgres connections."
        description: "{{ $labels.instance }} is exceeding the currently configured maximum Postgres connection limit (current value: {{ $value }}). Services may be degraded - please take immediate action (you probably need to increase max_connections in the Docker image and re-deploy)."

    - alert: PostgreSQLHighConnections
      expr: sum(pg_stat_activity_count) by (instance) > (sum(pg_settings_max_connections) by (instance) - sum(pg_settings_superuser_reserved_connections) by (instance)) * 0.8
      for: 10m
      labels:
        severity: email
      annotations:
        summary: "{{ $labels.instance }} is over 80% of max Postgres connections."
        description: "{{ $labels.instance }} is exceeding 80% of the currently configured maximum Postgres connection limit (current value: {{ $value }}). Please check utilization graphs and confirm if this is normal service growth, abuse or an otherwise temporary condition or if new resources need to be provisioned (or the limits increased, which is most likely)."

    - alert: PostgreSQLDown
      expr: pg_up != 1
      for: 1m
      labels:
        severity: email
      annotations:
        summary: "PostgreSQL is not processing queries: {{ $labels.instance }}"
        description: "{{ $labels.instance }} is rejecting query requests from the exporter, and thus probably not allowing DNS requests to work either. User services should not be affected provided at least 1 node is still alive."

    - alert: PostgreSQLSlowQueries
      expr: avg(rate(pg_stat_activity_max_tx_duration{datname!~"template.*"}[2m])) by (datname) > 2 * 60
      for: 2m
      labels:
        severity: email
      annotations:
        summary: "PostgreSQL high number of slow queries on {{ $labels.cluster }} for database {{ $labels.datname }}"
        description: "PostgreSQL high number of slow queries on {{ $labels.cluster }} for database {{ $labels.datname }} with a value of {{ $value }}"

    - alert: PostgreSQLQPS
      expr: avg(irate(pg_stat_database_xact_commit{datname!~"template.*"}[5m]) + irate(pg_stat_database_xact_rollback{datname!~"template.*"}[5m])) by (datname) > 10000
      for: 5m
      labels:
        severity: email
      annotations:
        summary: "PostgreSQL high number of queries per second on {{ $labels.cluster }} for database {{ $labels.datname }}"
        description: "PostgreSQL high number of queries per second on {{ $labels.cluster }} for database {{ $labels.datname }} with a value of {{ $value }}"

    - alert: PostgreSQLCacheHitRatio
      expr: avg(rate(pg_stat_database_blks_hit{datname!~"template.*"}[5m]) / (rate(pg_stat_database_blks_hit{datname!~"template.*"}[5m]) + rate(pg_stat_database_blks_read{datname!~"template.*"}[5m]))) by (datname) < 0.98
      for: 5m
      labels:
        severity: email
      annotations:
        summary: "PostgreSQL low cache hit rate on {{ $labels.cluster }} for database {{ $labels.datname }}"
        description: "PostgreSQL low cache hit rate on {{ $labels.cluster }} for database {{ $labels.datname }} with a value of {{ $value }}"
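
Prometheus only evaluates these rules once the file is listed under rule_files in its main configuration. A minimal sketch, assuming the rules were saved as postgres_alerts.yml (a hypothetical filename, not one this quickstart prescribes):

```yaml
# prometheus.yml (fragment)
rule_files:
  # Hypothetical filename: use whatever path the rules file above was saved under.
  - postgres_alerts.yml
```

Running `promtool check rules postgres_alerts.yml` before reloading Prometheus is a quick way to catch indentation or syntax mistakes in the file.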

This alerting rule YAML file was generated using the Postgres Exporter mixin.
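
The connection thresholds can be sanity-checked offline with a promtool unit test. The sketch below is illustrative only: the filename, the instance label value, and the datname and state label values are assumptions, and the settings use the stock PostgreSQL defaults of 100 for max_connections and 3 for superuser_reserved_connections. With those numbers the 80% threshold is (100 - 3) * 0.8 = 77.6, so 78 open connections should satisfy the PostgreSQLHighConnections expression.

```yaml
# postgres_alerts_test.yml (hypothetical filename)
# Run with: promtool test rules postgres_alerts_test.yml
evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Synthetic samples; label values are made up for the test.
      - series: 'pg_stat_activity_count{instance="db-1:9187", datname="app", state="active"}'
        values: '78 78 78'
      - series: 'pg_settings_max_connections{instance="db-1:9187"}'
        values: '100 100 100'
      - series: 'pg_settings_superuser_reserved_connections{instance="db-1:9187"}'
        values: '3 3 3'
    promql_expr_test:
      # Same expression as the PostgreSQLHighConnections rule.
      # Threshold: (100 - 3) * 0.8 = 77.6, and 78 > 77.6.
      - expr: sum(pg_stat_activity_count) by (instance) > (sum(pg_settings_max_connections) by (instance) - sum(pg_settings_superuser_reserved_connections) by (instance)) * 0.8
        eval_time: 2m
        exp_samples:
          - labels: '{instance="db-1:9187"}'
            value: 78
```

The test evaluates the expression directly rather than the alert itself, so it does not depend on the rule's for duration or on the exact annotation text.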