
Troubleshoot log ingestion (WRITE)

This guide helps you troubleshoot errors that occur when ingesting logs into Loki and writing logs to storage. When Loki rejects log ingestion requests, it’s typically due to exceeding rate limits, violating validation rules, or encountering storage issues.

Before you begin, ensure you have the following:

  • Access to Grafana Loki logs and metrics
  • Understanding of your log ingestion pipeline (Alloy or custom clients)
  • Permissions to configure limits and settings if needed

Monitoring ingestion errors

All ingestion errors are tracked using Prometheus metrics:

  • loki_discarded_samples_total - Count of discarded log samples by reason
  • loki_discarded_bytes_total - Volume of discarded log data by reason

The reason label on these metrics indicates the specific error type. Set up alerts on these metrics to detect ingestion problems.
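
A minimal Prometheus alerting rule sketch for this follows; the group name, alert name, threshold, and duration are illustrative and should be adapted to your alerting setup:

YAML
groups:
  - name: loki-ingestion
    rules:
      # Fires when any tenant has log samples discarded for any reason.
      - alert: LokiDiscardedSamples
        expr: sum by (tenant, reason) (rate(loki_discarded_samples_total[5m])) > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Loki is discarding samples for tenant {{ $labels.tenant }} (reason: {{ $labels.reason }})"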

Rate limit errors

Rate limits protect Loki from being overwhelmed by excessive log volume. When rate limits are exceeded, Loki returns HTTP status code 429 Too Many Requests.

Error: rate_limited

Error message:

ingestion rate limit exceeded for user <tenant> (limit: <limit> bytes/sec) while attempting to ingest <lines> lines totaling <bytes> bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased

Cause:

The tenant has exceeded their configured ingestion rate limit. This is a global per-tenant limit enforced by the distributor.

Default configuration:

  • ingestion_rate_mb: 4 MB/sec
  • ingestion_burst_size_mb: 6 MB
  • ingestion_rate_strategy: “global” (shared across all distributors)

Resolution:

  • Increase rate limits (if you have sufficient cluster resources):

    YAML
    limits_config:
      ingestion_rate_mb: 8
      ingestion_burst_size_mb: 12
  • Reduce log volume by:

    • Filtering unnecessary log lines using Alloy processing stages
    • Collecting logs from fewer targets
    • Sampling high-volume streams
    • Using Alloy’s rate limiting and filtering capabilities
  • Switch to local strategy (if using single-region deployment):

    YAML
    limits_config:
      ingestion_rate_strategy: local

    Note

    This multiplies the effective limit by the number of distributor replicas. For example, with ingestion_rate_mb: 4 and three distributor replicas, a tenant can ingest up to 12 MB/sec across the cluster.

Properties:

  • Enforced by: Distributor
  • Retryable: Yes
  • HTTP status: 429 Too Many Requests
  • Configurable per tenant: Yes
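
Because this limit is configurable per tenant, you can also raise it for a single tenant through runtime overrides instead of changing the global default. A sketch, assuming a runtime configuration file is already loaded with -runtime-config.file; the tenant ID is hypothetical:

YAML
# Runtime overrides file
overrides:
  "tenant-a":
    ingestion_rate_mb: 8
    ingestion_burst_size_mb: 12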

Error: stream_limit

Error message:

maximum active stream limit exceeded when trying to create stream <stream_labels>, reduce the number of active streams (reduce labels or reduce label values), or contact your Loki administrator to see if the limit can be increased, user: <tenant>

Cause:

The tenant has reached the maximum number of active streams. Active streams are held in memory on ingesters, and excessive streams can cause out-of-memory errors.

Default configuration:

  • max_global_streams_per_user: 5000 (globally)
  • max_streams_per_user: 0 (no local limit per ingester)
  • Active stream window: chunk_idle_period (default: 30 minutes)

Terminology:

  • Active stream: A stream that has received logs within the chunk_idle_period (default: 30 minutes)

Resolution:

  • Reduce stream cardinality by:

    • Using fewer unique label combinations by removing unnecessary labels
    • Avoiding high-cardinality labels (IDs, timestamps, UUIDs)
    • Using structured metadata instead of labels for high-cardinality data
  • Increase stream limits (if you have sufficient memory):

    YAML
    limits_config:
      max_global_streams_per_user: 10000

Note

Do not increase stream limits to accommodate high-cardinality labels; doing so can cause Loki to flush extremely large numbers of small files, which leads to extremely poor query performance. As log volume and the size of the monitored infrastructure grow, it is expected that you will raise the stream limit. However, even at hundreds of TB of logs per day, you should avoid exceeding 300,000 max global streams per user.

Properties:

  • Enforced by: Ingester
  • Retryable: Yes
  • HTTP status: 429 Too Many Requests
  • Configurable per tenant: Yes
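
To catch tenants approaching the stream limit before writes are rejected, you can alert on the per-tenant active stream count. A sketch, assuming the loki_ingester_memory_streams metric, a replication factor of 3, and the default limit of 5000; adapt the expression and threshold to your topology:

YAML
groups:
  - name: loki-streams
    rules:
      # Sums streams across ingesters and divides by the replication factor (3 assumed)
      # to approximate the global active stream count per tenant.
      - alert: LokiTenantNearStreamLimit
        expr: (sum by (tenant) (loki_ingester_memory_streams) / 3) > 4000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Tenant {{ $labels.tenant }} is approaching the active stream limit"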

Validation errors

Validation errors occur when log data doesn’t meet Loki’s requirements. These return HTTP status code 400 Bad Request and are not retryable.

Error: line_too_long

Error message:

max entry size <max_size> bytes exceeded for stream <stream_labels> while adding an entry with length <entry_size> bytes

Cause:

A log line exceeds the maximum allowed size.

Default configuration:

  • max_line_size: 256 KB
  • max_line_size_truncate: false

Resolution:

  • Truncate long lines instead of discarding them:

    YAML
    limits_config:
      max_line_size: 256KB
      max_line_size_truncate: true
  • Increase the line size limit (not recommended above 256KB):

    YAML
    limits_config:
      max_line_size: 256KB

    Warning

    Loki was built as a large multi-user, multi-tenant database, and this limit is important for maintaining stability and performance when many users query the database simultaneously. We strongly recommend against increasing max_line_size: doing so makes it very difficult to provide consistent query performance and system stability without allocating extremely large amounts of memory or raising the gRPC message size limits to very high levels, both of which are likely to lead to poorer performance and a worse experience.

  • Filter or truncate logs before sending using Alloy processing stages:

    Alloy
    stage.replace {
      expression = "^(.{10000}).*$"
      replace    = "$1"
    }

Properties:

  • Enforced by: Distributor
  • Retryable: No
  • HTTP status: 400 Bad Request
  • Configurable per tenant: Yes

Error: invalid_labels

Error message:

error parsing labels <labels> with error: <parse_error>

Cause:

Label names or values contain invalid characters or don’t follow Prometheus naming conventions:

  • Label names must match the regex: [a-zA-Z_][a-zA-Z0-9_]*
  • Label names cannot start with __ (reserved for internal use)
  • Label values must be valid UTF-8, with special characters properly escaped
  • Label syntax must follow the Prometheus format: {key="value"}

Default configuration:

This validation is always enabled and cannot be disabled.

Resolution:

  • Fix label names in your log shipping configuration:

    YAML
    # Incorrect
    labels:
      "123-app": "value"      # Starts with number
      "app-name": "value"     # Contains hyphen
      "__internal": "value"   # Reserved prefix
    
    # Correct
    labels:
      app_123: "value"
      app_name: "value"
      internal_label: "value"
  • Use Alloy’s relabeling stages to fix labels:

    Alloy
    stage.label_drop {
      values = ["invalid_label"]
    }
    
    stage.labels {
      values = {
        app_name = "",
      }
    }

Properties:

  • Enforced by: Distributor
  • Retryable: No
  • HTTP status: 400 Bad Request
  • Configurable per tenant: No

Error: missing_labels

Error message:

error at least one label pair is required per stream

Cause:

A log stream was submitted without any labels. Loki requires at least one label pair for each log stream.

Default configuration:

This validation is always enabled and cannot be disabled. Every stream must have at least one label.

Resolution:

Ensure all log streams have at least one label. Update your log shipping configuration to include labels:

Alloy
loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}

loki.source.file "logs" {
  targets = [
    {
      __path__ = "/var/log/*.log",
      job      = "myapp",
      environment = "production",
    },
  ]
  forward_to = [loki.write.default.receiver]
}

Properties:

  • Enforced by: Distributor
  • Retryable: No
  • HTTP status: 400 Bad Request
  • Configurable per tenant: No

Error: out_of_order / too_far_behind

These errors occur when log entries arrive in the wrong chronological order.

Error messages:

  • When unordered_writes: false: entry out of order
  • When unordered_writes: true: entry too far behind, entry timestamp is: <timestamp>, oldest acceptable timestamp is: <cutoff>

Cause:

Logs are being ingested with timestamps that violate Loki’s ordering constraints:

  • With unordered_writes: false: Any log older than the most recent log in the stream is rejected
  • With unordered_writes: true (default): Logs older than half of max_chunk_age from the newest entry are rejected

Default configuration:

  • unordered_writes: true (since Loki 2.4)
  • max_chunk_age: 2 hours
  • Acceptable timestamp window: 1 hour behind the newest entry (half of max_chunk_age)

Resolution:

  • Ensure timestamps are assigned correctly:

    • Use a timestamp parsing stage in Alloy if logs have embedded timestamps
    • Avoid letting Alloy assign timestamps at read time if log delivery can be delayed
    Alloy
    stage.timestamp {
      source = "timestamp_field"
      format = "RFC3339"
    }
  • Check for clock skew between log sources and Loki. Ensure clocks are synchronized across your infrastructure.
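
If late delivery is unavoidable, the acceptance window can also be widened by raising max_chunk_age on the ingesters, since the window is half that value. A sketch; note that this setting applies to the whole cluster and larger chunk ages increase ingester memory usage:

YAML
ingester:
  max_chunk_age: 4h  # acceptance window becomes 2 hours behind the newest entry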

Properties:

  • Enforced by: Ingester
  • Retryable: No
  • HTTP status: 400 Bad Request
  • Configurable per tenant: No (for max_chunk_age)

Error: greater_than_max_sample_age

Error message:

entry for stream <stream_labels> has timestamp too old: <timestamp>, oldest acceptable timestamp is: <cutoff>

Cause:

The log entry’s timestamp is older than the configured maximum sample age. This prevents ingestion of very old logs.

Default configuration:

  • reject_old_samples: true
  • reject_old_samples_max_age: 168 hours (7 days)

Resolution:

  • Increase the maximum sample age:

    YAML
    limits_config:
      reject_old_samples_max_age: 336h  # 14 days
  • Fix log delivery delays causing old timestamps.

  • For historical log imports, temporarily disable the check per tenant (see the sketch after this list).

  • Check for clock skew between log sources and Loki. Ensure clocks are synchronized across your infrastructure.

  • Disable old sample rejection (not recommended):

    YAML
    limits_config:
      reject_old_samples: false
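
For the per-tenant approach mentioned above, a runtime overrides sketch (the tenant ID is hypothetical); remove the override once the import is complete:

YAML
overrides:
  "historical-import":
    reject_old_samples: false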

Properties:

  • Enforced by: Distributor
  • Retryable: No
  • HTTP status: 400 Bad Request
  • Configurable per tenant: Yes

Error: too_far_in_future

Error message:

entry for stream <stream_labels> has timestamp too new: <timestamp>

Cause:

The log entry’s timestamp is further in the future than the configured grace period allows.

Default configuration:

  • creation_grace_period: 10 minutes

Resolution:

  • Check for clock skew between log sources and Loki. Ensure clocks are synchronized across your infrastructure.

  • Verify application timestamps: validate timestamp generation in your applications.

  • Verify timestamp parsing in your log shipping configuration.
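
If a small amount of future clock skew is unavoidable (for example, from devices you don't control), the grace period can be widened. A sketch:

YAML
limits_config:
  creation_grace_period: 30m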

Properties:

  • Enforced by: Distributor
  • Retryable: No
  • HTTP status: 400 Bad Request
  • Configurable per tenant: Yes

Error: max_label_names_per_series

Error message:

entry for stream <stream_labels> has <count> label names; limit <limit>

Cause:

The stream has more labels than allowed.

Default configuration:

  • max_label_names_per_series: 15

Resolution:

  • Reduce the number of labels by:
    • Using structured metadata for high-cardinality data
    • Removing unnecessary labels
    • Combining related labels

Warning

We strongly recommend against increasing max_label_names_per_series; doing so creates a larger index, which hurts query performance and opens the door to cardinality explosions. You should be able to categorize your logs with 15 labels, and typically far fewer. In all our years of running Loki, out of thousands of requests to increase this value, the number of valid exceptions we have seen can be counted on one hand.

Properties:

  • Enforced by: Distributor
  • Retryable: No
  • HTTP status: 400 Bad Request
  • Configurable per tenant: Yes

Error: label_name_too_long

Error message:

stream <stream_labels> has label name too long: <label_name>

Cause:

A label name exceeds the maximum allowed length.

Default configuration:

  • max_label_name_length: 1024 bytes

Resolution:

  • Shorten label names in your log shipping configuration to under the configured limit. You can use abbreviations or shorter descriptive names.

Properties:

  • Enforced by: Distributor
  • Retryable: No
  • HTTP status: 400 Bad Request
  • Configurable per tenant: Yes

Error: label_value_too_long

Error message:

stream <stream_labels> has label value too long: <label_value>

Cause:

A label value exceeds the maximum allowed length.

Default configuration:

  • max_label_value_length: 2048 bytes

Resolution:

  • Shorten label values in your log shipping configuration to under the configured limit. You can use hash values for very long identifiers.

  • Use structured metadata for long values instead of labels.

Properties:

  • Enforced by: Distributor
  • Retryable: No
  • HTTP status: 400 Bad Request
  • Configurable per tenant: Yes

Error: duplicate_label_names

Error message:

stream <stream_labels> has duplicate label name: <label_name>

Cause:

The stream has two or more labels with identical names.

Default configuration:

This validation is always enabled and cannot be disabled. Duplicate label names are never allowed.

Resolution:

  • Remove duplicate label definitions in your log shipping configuration to ensure unique label names per stream.

  • Check your ingestion pipeline for label conflicts.

  • Verify your label processing and transformation rules.

Properties:

  • Enforced by: Distributor
  • Retryable: No
  • HTTP status: 400 Bad Request
  • Configurable per tenant: No

Error: request_body_too_large

Error message:

Request body too large: <size> bytes, limit: <limit> bytes

Cause:

The push request body exceeds a configured size limit, either on the gateway or reverse proxy that fronts Loki or on Loki itself after the body is decompressed.

Default configuration:

  • Loki server max request body size: Configured at the HTTP server level
  • Nginx gateway (Helm): gateway.nginxConfig.clientMaxBodySize (default: 4 MB)
  • Alloy default batch size: 1 MB

Resolution:

  • Split large batches into smaller, more frequent requests.

  • Increase the allowed body size on your gateway or reverse proxy. For example, in the Helm chart, set gateway.nginxConfig.clientMaxBodySize (default: 4M), as shown in the sketch below.
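
A Helm values sketch for the gateway limit mentioned above; the 10M value is illustrative:

YAML
gateway:
  nginxConfig:
    clientMaxBodySize: 10M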

Properties:

  • Enforced by: Distributor
  • Retryable: No
  • HTTP status: 413 Request Entity Too Large
  • Configurable per tenant: No

Storage errors

Storage errors occur when Loki cannot write data to its storage backend or the Write-Ahead Log (WAL).

Error: Disk space full

Error message:

no space left on device

Cause:

Local storage has run out of space. This can affect ingester WAL, chunk cache, or temporary files.

Default configuration:

Disk space is not limited by Loki configuration; it depends on your infrastructure provisioning.

Resolution:

  • Free up disk space on Loki instances.
  • Configure retention policies to remove old data (see the sketch after this list).
  • Scale storage capacity by adding more disk space.
  • Monitor disk usage and set up alerts before reaching critical levels.
  • Review compactor settings to ensure old data is being cleaned up.
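
A minimal retention sketch for the point above, assuming the compactor is deployed and uses your existing object store; the retention period and delete request store are illustrative:

YAML
compactor:
  retention_enabled: true
  delete_request_store: s3  # must match your configured object store type
limits_config:
  retention_period: 744h  # 31 days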

Properties:

  • Enforced by: Operating system
  • Retryable: Yes (after freeing space)
  • HTTP status: 500 Internal Server Error
  • Configurable per tenant: No

Error: Chunk write failure

Error message:

failed to store chunks: storage unavailable

Common causes and errors:

  • S3/Object Storage errors:
    • NoSuchBucket: Storage bucket doesn’t exist
    • AccessDenied: Invalid credentials or permissions
    • RequestTimeout: Network or storage latency issues

Resolution:

  • Verify storage configuration:

    YAML
    storage_config:
      aws:
        s3: s3://region/bucket-name
        s3forcepathstyle: true
  • Check credentials and permissions:

    • Ensure service account has write permissions
    • Verify IAM roles for S3
    • Check API keys for cloud storage
  • Monitor storage health:

    • Check object storage availability
  • Review storage metrics:

    • loki_ingester_chunks_flushed_total
    • loki_ingester_chunks_flush_errors_total
  • Increase retries and timeouts:

    YAML
    storage_config:
      aws:
        http_config:
          idle_conn_timeout: 90s
          response_header_timeout: 0s

Properties:

  • Enforced by: Ingester (during chunk flush)
  • Retryable: Yes (with backoff)
  • HTTP status: 500 Internal Server Error
  • Configurable per tenant: No

Error: WAL disk full

Error logged:

Error writing to WAL, disk full, no further messages will be logged for this error

Metric: loki_ingester_wal_disk_full_failures_total

Cause:

The disk where the WAL is stored has run out of space. When this occurs, Loki continues accepting writes but doesn’t log them to the WAL, losing durability guarantees.

Default configuration:

  • WAL enabled by default
  • WAL location: configured by -ingester.wal-dir
  • Checkpoint interval: configured by -ingester.checkpoint-duration (default: 5 minutes)

Resolution:

  • Increase disk space for the WAL directory.

  • Monitor disk usage and set up alerts:

    promql
    loki_ingester_wal_disk_full_failures_total > 0
  • Reduce log volume to decrease WAL growth.

  • Check WAL checkpoint frequency:

    YAML
    ingester:
      wal:
        checkpoint_duration: 5m
  • Verify WAL cleanup is working - old segments should be deleted after checkpointing.

Properties:

  • Enforced by: Ingester
  • Retryable: N/A (writes are accepted but not persisted to WAL)
  • HTTP status: N/A (writes succeed, but durability is compromised)
  • Configurable per tenant: No

Note

The WAL sacrifices durability for availability - it won’t reject writes when the disk is full. After disk space is restored, durability guarantees resume. Use metric loki_ingester_wal_disk_usage_percent to monitor disk usage.

Error: WAL corruption

Error message:

encountered WAL read error, attempting repair

Metric: loki_ingester_wal_corruptions_total

Cause:

The WAL has become corrupted, possibly due to:

  • Disk errors
  • Process crashes during write
  • Filesystem issues
  • Partial deletion of WAL files

Default configuration:

  • WAL enabled by default
  • Automatic repair attempted on startup
  • WAL location: configured by -ingester.wal-dir

Resolution:

  • Monitor for corruption:

    promql
    increase(loki_ingester_wal_corruptions_total[5m]) > 0
  • Automatic recovery: Loki attempts to recover readable data and continues starting.

  • Investigate root cause:

    • Check disk health
    • Review system logs for I/O errors
    • Verify filesystem integrity

Properties:

  • Enforced by: Ingester (on startup)
  • Retryable: N/A (Loki attempts automatic recovery)
  • HTTP status: N/A (occurs during startup, not request handling)
  • Configurable per tenant: No

Note

Loki prioritizes availability over complete data recovery. The replication factor provides redundancy if one ingester loses WAL data. Recovered data may be incomplete if corruption is severe.

Network and connectivity errors

These errors occur due to network issues or service unavailability.

Error: Connection refused

Error message:

connection refused when connecting to loki:3100

Cause:

The Loki service is unavailable or not listening on the expected port.

Default configuration:

  • Loki HTTP listen port: 3100 (-server.http-listen-port)
  • Loki gRPC listen port: 9095 (-server.grpc-listen-port)

Resolution:

  • Verify Loki is running and healthy.
  • Check network connectivity between client and Loki.
  • Confirm the correct hostname and port configuration (see the server block sketch after this list).
  • Review firewall and security group settings.
  • Check for CPU starvation; an overwhelmed Loki pod can fail to respond to requests.
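
To confirm which ports Loki listens on, check the server block in your Loki configuration. A sketch showing the defaults:

YAML
server:
  http_listen_port: 3100
  grpc_listen_port: 9095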

Properties:

  • Enforced by: Network/OS
  • Retryable: Yes (after service recovery)
  • HTTP status: N/A (connection-level error)
  • Configurable per tenant: No

Error: Context deadline exceeded

Error message:

context deadline exceeded

Cause:

Requests are timing out due to slow response times or network issues.

Default configuration:

  • Alloy default timeout: 10 seconds
  • Loki server timeout: -server.http-server-write-timeout (default: 30s)

Resolution:

  • Increase timeout values in your ingestion client.
  • Check Loki performance and resource utilization.
  • Review network latency between client and server.
  • Scale Loki resources if needed.
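
On the Loki side, the HTTP server write timeout can be raised if pushes legitimately take longer than the default. A sketch:

YAML
server:
  http_server_write_timeout: 1m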

Properties:

  • Enforced by: Client/Server
  • Retryable: Yes
  • HTTP status: 504 Gateway Timeout (or client-side timeout)
  • Configurable per tenant: No

Error: Service unavailable

Error message:

service unavailable (503)

Cause:

Loki is temporarily unable to handle requests due to high load or maintenance. This can occur when ingesters are unhealthy or the ring is not ready.

Default configuration:

  • Ingester ring readiness: Requires minimum healthy ingesters based on replication factor
  • Distributor: Returns 503 when no healthy ingesters available

Resolution:

  • Implement retry logic with exponential backoff in your client.
  • Scale Loki ingester instances to handle load.
  • Check for resource constraints (CPU, memory, storage).
  • Review ingestion patterns for sudden spikes.

Properties:

  • Enforced by: Distributor/Gateway
  • Retryable: Yes
  • HTTP status: 503 Service Unavailable
  • Configurable per tenant: No

Structured metadata errors

These errors occur when using structured metadata incorrectly.

Error: Structured metadata disabled

Error message:

stream <stream_labels> includes structured metadata, but this feature is disallowed. Please see limits_config.allow_structured_metadata or contact your Loki administrator to enable it

Cause:

Structured metadata is disabled in the Loki configuration, either explicitly or because the cluster is running an older version where it is not enabled by default.

Default configuration:

  • allow_structured_metadata: true (enabled by default in recent versions)

Resolution:

  • Enable structured metadata:

    YAML
    limits_config:
      allow_structured_metadata: true
  • Move metadata to regular labels if structured metadata isn’t needed.

  • Contact your Loki administrator to enable the feature.

Properties:

  • Enforced by: Distributor
  • Retryable: No
  • HTTP status: 400 Bad Request
  • Configurable per tenant: Yes

Error: Structured metadata too large

Error message:

stream '{job="app"}' has structured metadata too large: '70000' bytes, limit: '65536' bytes. Please see limits_config.max_structured_metadata_size or contact your Loki administrator to increase it

Cause:

The structured metadata size exceeds the configured limit.

Default configuration:

  • max_structured_metadata_size: 64 KB

Resolution:

  • Reduce the size of structured metadata by removing unnecessary fields.

  • Increase the limit:

    YAML
    limits_config:
      max_structured_metadata_size: 128KB
  • Use compression or abbreviations for metadata values.

Properties:

  • Enforced by: Distributor
  • Retryable: No
  • HTTP status: 400 Bad Request
  • Configurable per tenant: Yes

Error: Too many structured metadata entries

Error message:

stream '{job="app"}' has too many structured metadata labels: '150', limit: '128'. Please see limits_config.max_structured_metadata_entries_count or contact your Loki administrator to increase it

Cause:

The number of structured metadata entries exceeds the limit.

Default configuration:

  • max_structured_metadata_entries_count: 128

Resolution:

  • Reduce the number of structured metadata entries by consolidating fields.

  • Increase the limit:

    YAML
    limits_config:
      max_structured_metadata_entries_count: 256
  • Combine related metadata into single entries using JSON or other formats.

Properties:

  • Enforced by: Distributor
  • Retryable: No
  • HTTP status: 400 Bad Request
  • Configurable per tenant: Yes

Blocked ingestion errors

These errors occur when ingestion is administratively blocked.

Ingestion blocked

Error message:

ingestion blocked for user <tenant> until <time> with status code 260

Or for policy-specific blocks:

ingestion blocked for user <tenant> (policy: <policy>) until <time> with status code 260

Cause:

Ingestion has been administratively blocked for the tenant, typically due to policy violations, billing issues, or maintenance. Status code 260 is a custom Loki status code for blocked ingestion.

Default configuration:

  • blocked_ingestion_status_code: 260 (custom status code)
  • Blocks are configured per tenant or per policy in runtime overrides

Resolution:

  • Wait until the block expires if a time limit is set.
  • Contact your Loki administrator to understand the block reason.
  • Review tenant usage patterns and policies.
  • Check for policy-specific blocks if using ingestion policies.

Properties:

  • Enforced by: Distributor
  • Retryable: No (until block is lifted)
  • HTTP status: 260 (custom) or configured status code
  • Configurable per tenant: Yes

Troubleshooting workflow

Follow this workflow when investigating ingestion issues:

  • Check metrics:

    promql
    # See which errors are occurring
    sum by (reason) (rate(loki_discarded_samples_total[5m]))
    
    # Identify affected tenants
    sum by (tenant, reason) (rate(loki_discarded_bytes_total[5m]))
  • Review logs from distributors and ingesters:

    YAML
    # Enable write failure logging per tenant
    limits_config:
      limited_log_push_errors: true
  • Test push endpoint:

    Bash
    curl -H "Content-Type: application/json" \
         -XPOST http://loki:3100/loki/api/v1/push \
         --data-raw '{"streams":[{"stream":{"job":"test"},"values":[["'"$(date +%s)000000000"'","test log line"]]}]}'
  • Check tenant limits:

    Bash
    curl http://loki:3100/loki/api/v1/user_stats
  • Review configuration for affected tenant or global limits.

  • Monitor system resources:

    • Ingester memory usage
    • Disk space for WAL
    • Storage backend health
    • Network connectivity

Additional resources