Troubleshoot Loki operations
This guide helps you troubleshoot errors that occur during Loki operations, including configuration issues, storage backend problems, cluster communication failures, and service component errors. These errors are distinct from ingestion (write path) and query (read path) errors covered in separate troubleshooting topics.
Before you begin, ensure you have the following:
- Access to Loki logs and metrics
- Permissions to view and modify Loki configuration
- Understanding of your deployment topology (single binary/monolithic, simple scalable, microservices/distributed)
Configuration errors
Configuration errors occur during Loki startup or when loading runtime configuration. These errors prevent Loki from starting or operating correctly.
Error: Multiple config errors found
Error message:
MULTIPLE CONFIG ERRORS FOUND, PLEASE READ CAREFULLY
<list of configuration errors>
Cause:
Multiple configuration validation errors were detected during startup. Loki aggregates all configuration errors rather than failing on the first one.
Resolution:
Review all listed errors carefully - each error message describes a specific configuration problem.
Check your configuration file for syntax errors and invalid values.
Validate your configuration before applying:
loki -config.file=/path/to/config.yaml -verify-config
Properties:
- Enforced by: Loki startup
- Retryable: No (configuration must be fixed)
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Too many storage configs
Error message:
too many storage configs provided in the common config, please only define one storage backend
Cause:
Multiple storage backends are configured in the common configuration section. Loki requires a single storage backend for the common config.
Resolution:
Use only one storage backend in your common config:
common:
  storage:
    # Choose only ONE of the following:
    s3:
      endpoint: s3.amazonaws.com
      bucketnames: loki-data
    # OR
    gcs:
      bucket_name: loki-data
    # OR
    azure:
      container_name: loki-data
For multiple storage backends, configure them explicitly in specific sections rather than common config.
Properties:
- Enforced by: Configuration validation
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Persist tokens path prefix required
Error message:
if persist_tokens is true, path_prefix MUST be defined
Cause:
The persist_tokens option is enabled for a ring but no path_prefix is specified. Loki needs a path to store the token file.
Resolution:
Set the path prefix:
common:
  path_prefix: /var/loki
  persist_tokens: true
ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist
    tokens_file_path: /var/loki/tokens
Or disable persist_tokens if you don’t need token persistence:
common:
  persist_tokens: false
Properties:
- Enforced by: Configuration validation
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Conflicting gRPC client configs
Error message:
both `grpc_client_config` and (`query_frontend_grpc_client` or `query_scheduler_grpc_client`) are set at the same time. Please use only `query_frontend_grpc_client` and `query_scheduler_grpc_client`
Cause:
Both the deprecated grpc_client_config and the newer specific gRPC client configs are set. These are mutually exclusive.
Resolution:
Remove the deprecated config and use specific gRPC client configs:
# Remove this:
# grpc_client_config: ...
# Use these instead:
query_frontend_grpc_client:
  max_recv_msg_size: 104857600
query_scheduler_grpc_client:
  max_recv_msg_size: 104857600
Properties:
- Enforced by: Configuration validation
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Schema v13 required for structured metadata
Error message:
CONFIG ERROR: schema v13 is required to store Structured Metadata and use native OTLP ingestion, your schema version is <version>. Set `allow_structured_metadata: false` in the `limits_config` section or set the command line argument `-validation.allow-structured-metadata=false` and restart Loki. Then proceed to update to schema v13 or newer before re-enabling this config, search for 'Storage Schema' in the docs for the schema update procedure
Cause:
Structured metadata is enabled but the active schema version is older than v13. Structured metadata requires schema v13 or newer.
Resolution:
Disable structured metadata temporarily:
limits_config:
  allow_structured_metadata: false
Update your schema config to v13 or newer:
schema_config:
  configs:
    - from: "2024-04-01"
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: index_
        period: 24h
Re-enable structured metadata after the schema migration is complete.
Properties:
- Enforced by: Configuration validation
- Retryable: No (configuration must be fixed)
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: TSDB index type required for structured metadata
Error message:
CONFIG ERROR: `tsdb` index type is required to store Structured Metadata and use native OTLP ingestion, your index type is `<type>` (defined in the `store` parameter of the schema_config). Set `allow_structured_metadata: false` in the `limits_config` section or set the command line argument `-validation.allow-structured-metadata=false` and restart Loki. Then proceed to update the schema to use index type `tsdb` before re-enabling this config, search for 'Storage Schema' in the docs for the schema update procedure
Cause:
Structured metadata is enabled but the active index type is not TSDB. Structured metadata requires the TSDB index type.
Resolution:
Disable structured metadata temporarily and migrate to the TSDB index type:
limits_config:
  allow_structured_metadata: false
schema_config:
  configs:
    - from: "2024-01-01"
      store: tsdb
      object_store: s3
      schema: v13
Re-enable structured metadata after migrating to TSDB.
Properties:
- Enforced by: Configuration validation
- Retryable: No (configuration must be fixed)
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
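The two structured metadata requirements (schema v13 or newer, and the tsdb index type) can be checked before a restart. The following is a minimal Python sketch, not Loki's own validation code. It assumes the `schema_config.configs` list has already been loaded into plain dictionaries, and `check_structured_metadata` is a hypothetical helper name. As a simplification, it treats the period with the latest `from` date as the active one.

```python
from datetime import date


def check_structured_metadata(schema_configs, allow_structured_metadata=True):
    """Return a list of problems for the newest schema period (hypothetical helper)."""
    if not allow_structured_metadata or not schema_configs:
        return []
    # Simplification: treat the period with the latest `from` date as the active one.
    newest = max(schema_configs, key=lambda p: date.fromisoformat(str(p["from"])))
    problems = []
    version = int(str(newest["schema"]).lstrip("v"))
    if version < 13:
        problems.append(f"schema v13 is required, found {newest['schema']}")
    if newest["store"] != "tsdb":
        problems.append(f"tsdb index type is required, found {newest['store']}")
    return problems


periods = [
    {"from": "2023-01-01", "store": "boltdb-shipper", "schema": "v12"},
    {"from": "2024-04-01", "store": "tsdb", "schema": "v13"},
]
print(check_structured_metadata(periods))      # newest period is tsdb + v13
print(check_structured_metadata(periods[:1]))  # only the old period: two problems
```

An empty list means the newest period satisfies both requirements; otherwise each entry names one requirement that is not met.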
Error: TSDB directories not configured
Error message:
CONFIG ERROR: `tsdb` index type is configured in at least one schema period, however, `storage_config`, `tsdb_shipper`, `active_index_directory` is not set, please set this directly or set `path_prefix:` in the `common:` section
Or:
CONFIG ERROR: `tsdb` index type is configured in at least one schema period, however, `storage_config`, `tsdb_shipper`, `cache_location` is not set, please set this directly or set `path_prefix:` in the `common:` section
Cause:
The TSDB index type is configured in the schema but required local directories for index files are not set.
Resolution:
Set the common path prefix (simplest approach):
common:
  path_prefix: /var/loki
Or configure directories explicitly:
storage_config:
  tsdb_shipper:
    active_index_directory: /var/loki/tsdb-index
    cache_location: /var/loki/tsdb-cache
Properties:
- Enforced by: Configuration validation
- Retryable: No (configuration must be fixed)
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Compactor working directory empty
Error message:
CONFIG ERROR: `compactor:` `working_directory:` is empty, please set a valid directory or set `path_prefix:` in the `common:` section
Cause:
The compactor requires a working directory for index compaction, but none is configured.
Resolution:
Set the common path prefix:
common:
  path_prefix: /var/loki
Or set the compactor working directory explicitly:
compactor:
  working_directory: /var/loki/compactor
Properties:
- Enforced by: Configuration validation
- Retryable: No (configuration must be fixed)
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Index cache validity conflict
Error message:
CONFIG ERROR: the active index is <type> which is configured to use an `index_cache_validity` (TTL) of <duration>, however the chunk_retain_period is <duration> which is LESS than the `index_cache_validity`. This can lead to query gaps, please configure the `chunk_retain_period` to be greater than the `index_cache_validity`
Cause:
The chunk retain period is shorter than the index cache validity (TTL), which can cause query gaps where data exists in the index cache but the chunks have already been flushed and removed from ingesters.
Resolution:
Increase the chunk retain period to be greater than the index cache validity:
ingester:
  chunk_retain_period: 15m # Must be > index_cache_validity
storage_config:
  index_cache_validity: 5m
Properties:
- Enforced by: Configuration validation
- Retryable: No (configuration must be fixed)
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Invalid target with legacy read mode
Error message:
CONFIG ERROR: invalid target, cannot run backend target with legacy read mode
Cause:
The backend target is configured while legacy read mode is enabled. These are incompatible deployment configurations.
Resolution:
Disable legacy read mode if using the backend target:
# Remove or set to false:
legacy_read_mode: false
Or use a different target compatible with legacy read mode.
Properties:
- Enforced by: Configuration validation
- Retryable: No (configuration must be fixed)
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Unrecognized index or store type
Error message:
unrecognized `store` (index) type `<type>`, choose one of: <supported_types>
Or:
unrecognized `object_store` type `<type>`, which also does not match any named_stores. Choose one of: <supported_types>. Or choose a named_store
Cause:
The schema configuration references an index type or object store type that Loki does not recognize.
Resolution:
Use a supported index type: tsdb (recommended) or boltdb-shipper.
Use a supported object store type: s3, gcs, azure, swift, filesystem, bos.
Or reference a valid named store defined in your configuration:
storage_config:
  named_stores:
    aws:
      my-store:
        endpoint: s3.amazonaws.com
        bucketnames: my-bucket
schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: my-store # References the named store
Properties:
- Enforced by: Configuration validation
- Retryable: No (configuration must be fixed)
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
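A misspelled named store is easy to miss by eye. The sketch below is one way to cross-check `object_store` references against the built-in types listed above and the names under `named_stores`. It is an illustration rather than Loki's implementation: it assumes the configuration is already loaded into nested dictionaries, and `unknown_object_stores` and `BUILT_IN_STORES` are hypothetical names.

```python
# Built-in object store types as listed in this guide (assumed complete for the sketch).
BUILT_IN_STORES = {"s3", "gcs", "azure", "swift", "filesystem", "bos"}


def unknown_object_stores(config):
    """Return object_store values that match neither a built-in type nor a named store."""
    named = set()
    for stores in config.get("storage_config", {}).get("named_stores", {}).values():
        named.update(stores)  # adds the store names (the dictionary keys)
    periods = config.get("schema_config", {}).get("configs", [])
    return [p["object_store"] for p in periods
            if p["object_store"] not in BUILT_IN_STORES | named]


config = {
    "storage_config": {"named_stores": {"aws": {"my-store": {"bucketnames": "my-bucket"}}}},
    "schema_config": {"configs": [
        {"from": "2024-01-01", "store": "tsdb", "object_store": "my-store"},
        {"from": "2024-06-01", "store": "tsdb", "object_store": "my-stroe"},  # typo
    ]},
}
print(unknown_object_stores(config))  # ['my-stroe']
```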
Error: Overrides exporter requires runtime configuration
Error message:
overrides-exporter has been enabled, but no runtime configuration file was configured
Cause:
The overrides-exporter target is enabled but no runtime configuration file is provided. The overrides-exporter needs a runtime config to expose tenant-specific limit overrides as metrics.
Resolution:
Configure a runtime configuration file:
runtime_config:
  file: /etc/loki/runtime-config.yaml
Or disable the overrides-exporter if not needed by removing it from your target list.
Properties:
- Enforced by: Module initialization
- Retryable: No (configuration must be fixed)
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Invalid override for tenant
Error message:
invalid override for tenant <tenant>: <details>
Cause:
The runtime configuration file contains an invalid override for a specific tenant. The override failed validation.
Resolution:
- Review the runtime config file for the specified tenant.
- Validate the override values against the limits configuration schema.
- Fix invalid values such as negative durations, invalid label matchers, or out-of-range settings.
Properties:
- Enforced by: Runtime configuration loader
- Retryable: No (runtime config must be fixed)
- HTTP status: N/A (runtime config reload failure)
- Configurable per tenant: Yes
Error: Retention period too short
Error message:
retention period must be >= 24h was <duration>
Cause:
A stream-level retention rule specifies a retention period shorter than 24 hours, which is the minimum allowed.
Resolution:
Set retention periods to at least 24 hours:
limits_config:
  retention_stream:
    - selector: '{namespace="dev"}'
      priority: 1
      period: 24h # Must be >= 24h
Properties:
- Enforced by: Configuration validation
- Retryable: No (configuration must be fixed)
- HTTP status: N/A (startup failure)
- Configurable per tenant: Yes
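To find offending rules before Loki rejects them, the 24-hour minimum can be checked with a short script. This is a rough Python sketch under stated assumptions: `parse_duration` is a hypothetical helper that only understands a single number followed by one unit (s, m, h, d, or w), which is narrower than the duration syntax Loki accepts, and the rules are assumed to be loaded as dictionaries.

```python
import re
from datetime import timedelta

UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_duration(text):
    """Parse a simple duration such as '24h' or '90m' (single unit only)."""
    match = re.fullmatch(r"(\d+)([smhdw])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**{UNITS[match.group(2)]: int(match.group(1))})


def short_retention_rules(retention_stream):
    """Return the rules whose period is below the 24h minimum."""
    minimum = timedelta(hours=24)
    return [r for r in retention_stream if parse_duration(r["period"]) < minimum]


rules = [
    {"selector": '{namespace="dev"}', "priority": 1, "period": "24h"},
    {"selector": '{namespace="tmp"}', "priority": 2, "period": "12h"},
]
for rule in short_retention_rules(rules):
    print(f"{rule['selector']}: period {rule['period']} is below 24h")
```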
Error: Invalid query store max look back period
Error message:
it is an error to specify a non zero `query_store_max_look_back_period` value when using any object store other than `filesystem`
Cause:
The query_store_max_look_back_period is set to a non-zero value with a storage backend other than filesystem. This setting only applies to local filesystem storage.
Resolution:
Remove the setting if using object storage:
# Remove or set to 0:
query_store_max_look_back_period: 0
Or use filesystem storage if this setting is needed for local development.
Properties:
- Enforced by: Configuration validation
- Retryable: No (configuration must be fixed)
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Authentication and tenant errors
Authentication and tenant errors occur when requests are missing required tenant identification or when tenant IDs are invalid. In multi-tenant mode, every request must include a valid tenant ID.
Error: No org ID
Error message:
no org id
Cause:
A request was made to Loki without the required X-Scope-OrgID header. In multi-tenant mode, every request must identify the tenant.
Resolution:
Add the X-Scope-OrgID header to your requests:
curl -H "X-Scope-OrgID: my-tenant" http://loki:3100/loki/api/v1/push ...
For Grafana, configure the tenant ID in the Loki data source settings under “HTTP Headers”.
For Alloy, set the tenant ID in the loki.write component:
loki.write "default" {
  endpoint {
    url       = "http://loki:3100/loki/api/v1/push"
    tenant_id = "my-tenant"
  }
}
Disable multi-tenancy for single-tenant deployments:
auth_enabled: false
Properties:
- Enforced by: Authentication middleware
- Retryable: Yes (with tenant ID)
- HTTP status: 401 Unauthorized
- Configurable per tenant: No
Error: Multiple org IDs present
Error message:
multiple org IDs present
Cause:
The request contains multiple different tenant IDs, but the operation requires a single tenant. This can happen when a request is forwarded through multiple proxies that each inject a tenant ID.
Resolution:
Ensure only one tenant ID is set in the X-Scope-OrgID header.
Check proxy configurations for conflicting tenant ID injection.
For cross-tenant queries, use pipe-separated tenant IDs only where supported:
curl -H "X-Scope-OrgID: tenant1|tenant2" http://loki:3100/loki/api/v1/query ...
Properties:
- Enforced by: Tenant resolver
- Retryable: Yes (with correct tenant ID)
- HTTP status: 400 Bad Request
- Configurable per tenant: No
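When debugging a proxy chain, it can help to check the header value the way a single-tenant operation would see it. The following Python sketch illustrates that idea. It assumes tenant IDs are separated by `|` as in the cross-tenant example above, and `resolve_single_tenant` is a hypothetical helper, not a Loki API.

```python
def resolve_single_tenant(header_value):
    """Return the one tenant ID in an X-Scope-OrgID value, or raise ValueError."""
    # Duplicates of the same ID collapse into one entry in the set.
    tenant_ids = {part for part in header_value.split("|") if part}
    if not tenant_ids:
        raise ValueError("no org id")
    if len(tenant_ids) > 1:
        raise ValueError("multiple org IDs present")
    return tenant_ids.pop()


for value in ["my-tenant", "tenant1|tenant2", ""]:
    try:
        print(repr(value), "->", resolve_single_tenant(value))
    except ValueError as err:
        print(repr(value), "-> error:", err)
```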
Error: Tenant ID too long
Error message:
tenant ID is too long: max 150 characters
Cause:
The tenant ID exceeds the maximum allowed length of 150 characters.
Resolution:
- Use a shorter tenant ID (maximum 150 characters).
Properties:
- Enforced by: Tenant validation
- Retryable: Yes (with valid tenant ID)
- HTTP status: 400 Bad Request
- Configurable per tenant: No
Error: Unsafe tenant ID
Error message:
tenant ID is '.' or '..'
Cause:
The tenant ID is set to "." or "..", which are reserved filesystem path components and could cause path traversal issues.
Resolution:
- Choose a tenant ID other than "." or "..".
Properties:
- Enforced by: Tenant validation
- Retryable: Yes (with valid tenant ID)
- HTTP status: 400 Bad Request
- Configurable per tenant: No
Error: Tenant ID contains unsupported character
Error message:
tenant ID '<id>' contains unsupported character '<char>'
Cause:
The tenant ID contains characters that are not allowed. Tenant IDs must consist of alphanumeric characters, hyphens, underscores, and periods.
Resolution:
- Use only supported characters in your tenant ID: letters, numbers, hyphens (-), underscores (_), and periods (.).
Properties:
- Enforced by: Tenant validation
- Retryable: Yes (with valid tenant ID)
- HTTP status: 400 Bad Request
- Configurable per tenant: No
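The three tenant ID rules in this section (length, reserved names, allowed characters) can be combined into one client-side check. This is a minimal Python sketch based only on the rules as stated in this guide. The function name `validate_tenant_id`, the order of the checks, and the exact allowed character set are assumptions; Loki's own validation may differ in detail.

```python
import string

MAX_TENANT_ID_LENGTH = 150
# Allowed characters as described in this guide: letters, digits, '-', '_', '.'.
ALLOWED = set(string.ascii_letters + string.digits + "-_.")


def validate_tenant_id(tenant_id):
    """Return an error string for an invalid tenant ID, or None if it passes."""
    if len(tenant_id) > MAX_TENANT_ID_LENGTH:
        return "tenant ID is too long: max 150 characters"
    if tenant_id in (".", ".."):
        return "tenant ID is '.' or '..'"
    for char in tenant_id:
        if char not in ALLOWED:
            return f"tenant ID '{tenant_id}' contains unsupported character '{char}'"
    return None


for candidate in ["team-a_prod.eu", "..", "team/a", "x" * 151]:
    print(repr(candidate[:20]), "->", validate_tenant_id(candidate))
```

Running a check like this in the client or proxy that sets X-Scope-OrgID surfaces a bad tenant ID before the request reaches Loki.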
Error: Deletion not available for tenant
Error message:
deletion is not available for this tenant
Cause:
A delete request was submitted for a tenant that does not have deletion enabled. Log deletion must be explicitly enabled per tenant.
Resolution:
Enable deletion for the tenant in the runtime configuration:
overrides:
  my-tenant:
    deletion_mode: filter-and-delete # Or "filter-only"
Valid deletion modes:
- disabled - Deletion is not allowed (default)
- filter-only - Lines matching delete requests are filtered at query time but not physically deleted
- filter-and-delete - Lines are filtered at query time and physically deleted during compaction
Ensure the compactor is configured for retention:
compactor:
  retention_enabled: true
  delete_request_store: s3
Properties:
- Enforced by: Compactor deletion handler
- Retryable: No (configuration must change)
- HTTP status: 403 Forbidden
- Configurable per tenant: Yes
Storage backend errors
Storage backend errors occur when Loki cannot communicate with or properly configure object storage (Amazon S3, Google Cloud Storage, Microsoft Azure, Swift, or filesystem).
Error: Unsupported storage backend
Error message:
unsupported storage backend
Cause:
The specified storage backend type is not recognized. This typically occurs when a typo exists in the storage type configuration.
Resolution:
Use a valid storage backend type:
- s3 - Amazon S3 or S3-compatible storage
- gcs - Google Cloud Storage
- azure - Azure Blob Storage
- swift - OpenStack Swift
- filesystem - Local filesystem
- bos - Baidu Object Storage
storage_config:
  boltdb_shipper:
    shared_store: s3 # Must be one of the valid types
Properties:
- Enforced by: Configuration validation
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Invalid characters in storage prefix
Error message:
storage prefix contains invalid characters, it may only contain digits, English alphabet letters and dashes
Cause:
The storage path prefix contains invalid characters. Only alphanumeric characters and dashes are allowed.
Resolution:
Use valid characters in your storage prefix:
storage_config:
  # Invalid: prefix_with_underscore_or/special chars
  # Valid: my-loki-data or lokilogs123
  aws:
    s3: s3://my-bucket/my-loki-data
Properties:
- Enforced by: Configuration validation
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
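The character rule from the error message (digits, English letters, and dashes) translates directly into a regular expression. The Python sketch below is an illustration of that rule as worded in the message, not a copy of Loki's check; `is_valid_storage_prefix` is a hypothetical name, and it only considers non-empty prefixes.

```python
import re

# Digits, English alphabet letters, and dashes, per the error message text.
VALID_PREFIX = re.compile(r"[0-9A-Za-z-]+")


def is_valid_storage_prefix(prefix):
    """True if a non-empty prefix contains only digits, letters, and dashes."""
    return VALID_PREFIX.fullmatch(prefix) is not None


for prefix in ["my-loki-data", "lokilogs123", "prefix_with_underscore", "a/b"]:
    print(prefix, "->", is_valid_storage_prefix(prefix))
```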
Error: Unsupported S3 SSE type
Error message:
unsupported S3 SSE type
Cause:
The S3 server-side encryption (SSE) type is not supported. Loki supports specific SSE types.
Resolution:
Use a supported SSE type:
storage_config:
  aws:
    sse:
      type: SSE-S3 # Or SSE-KMS
Supported types:
- SSE-S3 - Server-side encryption with Amazon S3-managed keys
- SSE-KMS - Server-side encryption with AWS KMS-managed keys
Properties:
- Enforced by: Configuration validation
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Invalid S3 SSE encryption context
Error message:
invalid S3 SSE encryption context
Cause:
The SSE-KMS encryption context is malformed and cannot be parsed as valid JSON.
Resolution:
Provide valid JSON for the encryption context:
storage_config:
  aws:
    sse:
      type: SSE-KMS
      kms_key_id: alias/my-key
      kms_encryption_context: '{"key": "value"}' # Valid JSON
Properties:
- Enforced by: Configuration validation
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
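A quick way to confirm that an encryption context string is well-formed is to parse it with a JSON library before putting it in the configuration. The following Python sketch does this with the standard `json` module. `check_encryption_context` is a hypothetical helper, and the requirement that the value be a JSON object (rather than a list or scalar) is an assumption made for this example.

```python
import json


def check_encryption_context(raw):
    """Return None if raw parses as a JSON object, else a short reason string."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as err:
        return f"not valid JSON: {err.msg}"
    if not isinstance(parsed, dict):
        return "expected a JSON object"
    return None


print(check_encryption_context('{"key": "value"}'))  # None
print(check_encryption_context("{key: value}"))      # unquoted keys are not JSON
```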
Error: Invalid S3 endpoint prefix
Error message:
the endpoint must not prefixed with the bucket name
Cause:
The S3 endpoint incorrectly includes the bucket name as a prefix. This can cause path-style vs virtual-hosted-style URL issues.
Resolution:
Remove the bucket name from the endpoint and configure it separately:
storage_config:
  aws:
    # Incorrect:
    # endpoint: my-bucket.s3.amazonaws.com
    # Correct:
    endpoint: s3.amazonaws.com
    bucketnames: my-bucket
Properties:
- Enforced by: Configuration validation
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
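The condition behind this error can be approximated by checking whether the endpoint host begins with a bucket name followed by a dot. The Python sketch below shows one such check. It is an approximation rather than Loki's exact logic: `endpoint_has_bucket_prefix` is a hypothetical helper, and it assumes `bucketnames` may hold a comma-separated list.

```python
def endpoint_has_bucket_prefix(endpoint, bucketnames):
    """True if the endpoint host starts with any configured bucket name plus a dot."""
    host = endpoint.split("://", 1)[-1]  # drop an optional scheme such as https://
    buckets = [b.strip() for b in bucketnames.split(",") if b.strip()]
    return any(host.startswith(bucket + ".") for bucket in buckets)


print(endpoint_has_bucket_prefix("my-bucket.s3.amazonaws.com", "my-bucket"))  # True
print(endpoint_has_bucket_prefix("s3.amazonaws.com", "my-bucket"))            # False
```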
Error: Invalid STS endpoint
Error message:
sts-endpoint must be a valid url
Cause:
The AWS STS (Security Token Service) endpoint URL is malformed or invalid.
Resolution:
Provide a valid URL for the STS endpoint:
storage_config:
  aws:
    sts_endpoint: https://sts.us-east-1.amazonaws.com
Properties:
- Enforced by: Configuration validation
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Azure connection string malformed
Error message:
connection string is either blank or malformed. The expected connection string should contain key value pairs separated by semicolons. For example 'DefaultEndpointsProtocol=https;AccountName=<accountName>;AccountKey=<accountKey>;EndpointSuffix=core.windows.net'
Cause:
The Azure storage connection string is missing or doesn’t follow the expected format.
Resolution:
Use a valid connection string format:
storage_config:
  azure:
    # Use account credentials:
    account_name: myaccount
    account_key: mykey
    # Or connection string:
    connection_string: "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey;EndpointSuffix=core.windows.net"
Verify the connection string in Azure Portal under Storage Account > Access Keys.
Properties:
- Enforced by: Configuration validation
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
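The expected format (key-value pairs separated by semicolons) is simple enough to check locally before restarting Loki. The following Python sketch splits a connection string into its pairs and rejects blank or malformed input. It is an illustration only: `parse_connection_string` is a hypothetical helper, and it does not verify that any particular key is present.

```python
def parse_connection_string(conn):
    """Split 'Key=Value;Key=Value' into a dict; raise ValueError if blank or malformed."""
    pairs = {}
    for segment in conn.strip().strip(";").split(";"):
        # Split on the first '=' only, so base64 padding in values is preserved.
        key, sep, value = segment.partition("=")
        if not sep or not key.strip():
            raise ValueError("connection string is either blank or malformed")
        pairs[key.strip()] = value
    return pairs


conn = ("DefaultEndpointsProtocol=https;AccountName=myaccount;"
        "AccountKey=abc==;EndpointSuffix=core.windows.net")
print(sorted(parse_connection_string(conn)))
```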
Error: Unrecognized named storage config
Error message:
unrecognized named storage config <name>
Or for specific backends:
unrecognized named s3 storage config <name>
unrecognized named gcs storage config <name>
unrecognized named azure storage config <name>
unrecognized named filesystem storage config <name>
unrecognized named swift storage config <name>
Or for an unrecognized store type:
unrecognized named storage type: <storeType>
Cause:
A named storage configuration referenced in the schema config doesn’t exist in the named stores configuration.
Resolution:
Define the named store in your configuration:
storage_config:
  named_stores:
    aws:
      my-s3-store: # This name must match the reference
        endpoint: s3.amazonaws.com
        bucketnames: my-bucket
schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: my-s3-store # References the named store above
Check spelling of the store name in both the definition and reference.
Properties:
- Enforced by: Configuration validation
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Cache errors
Cache errors occur when Loki cannot connect to or communicate with caching backends (Memcached, Redis).
Error: Redis client setup failed
Error message:
redis client setup failed: <details>
Cause:
Loki cannot establish a connection to the Redis server. Common causes include:
- Incorrect Redis endpoint
- Network connectivity issues
- Authentication failures
- TLS configuration problems
Resolution:
Verify Redis connectivity from the Loki host:
redis-cli -h <REDIS-HOST> -p <REDIS-PORT> ping
Check the Redis endpoint configuration:
chunk_store_config:
  chunk_cache_config:
    redis:
      endpoint: redis:6379
      timeout: 500ms
Configure authentication if required:
chunk_store_config:
  chunk_cache_config:
    redis:
      endpoint: redis:6379
      password: ${REDIS_PASSWORD}
Properties:
- Enforced by: Cache client initialization
- Retryable: Yes (with correct configuration)
- HTTP status: N/A (startup failure or degraded operation)
- Configurable per tenant: No
Error: Could not lookup Redis host
Error message:
could not lookup host: <hostname>
Cause:
DNS resolution failed for the Redis hostname.
Resolution:
Verify DNS resolution:
nslookup redis-host
Use an IP address if DNS is not available:
chunk_store_config:
  chunk_cache_config:
    redis:
      endpoint: 10.0.0.100:6379
Check your DNS configuration and network settings.
Properties:
- Enforced by: DNS resolution
- Retryable: Yes
- HTTP status: N/A
- Configurable per tenant: No
Error: Unexpected Redis PING response
Error message:
redis: Unexpected PING response "<response>"
Cause:
The Redis server returned an unexpected response to a PING command. This could indicate:
- The endpoint is not a Redis server
- A proxy or load balancer is interfering
- Redis is in an error state
Resolution:
Verify the endpoint is actually a Redis server.
Check Redis health:
redis-cli -h <HOST> -p <PORT> INFO
Review proxy configurations if using a load balancer in front of Redis.
Properties:
- Enforced by: Redis health check
- Retryable: Yes
- HTTP status: N/A
- Configurable per tenant: No
Error: Multiple cache systems not supported
Error message:
use of multiple cache storage systems is not supported
Cause:
Both Memcached and Redis cache backends are configured for the same cache type. Only one caching backend is supported per cache type.
Resolution:
Choose one cache backend per cache type:
chunk_store_config:
  chunk_cache_config:
    # Use either memcached OR redis, not both
    redis:
      endpoint: redis:6379
    memcached: {} # Remove this
Properties:
- Enforced by: Configuration validation
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: No cache configured
Error message:
no cache configured
Cause:
A results cache is required for the query frontend but no cache configuration was provided.
Resolution:
Configure a cache backend:
query_range:
  results_cache:
    cache:
      memcached:
        expiration: 1h
      memcached_client:
        addresses: memcached:11211
Or disable results caching if not needed:
query_range:
  cache_results: false
Properties:
- Enforced by: Query frontend initialization
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Ring and cluster communication errors
Ring errors occur when Loki components cannot properly communicate through the hash ring, which is used to distribute work across instances. The ring is fundamental to Loki’s distributed operation.
Error: Too many unhealthy instances in the ring
Error message:
too many unhealthy instances in the ring
Cause:
The ring contains too many unhealthy instances to satisfy the replication factor. For example, with a replication factor of 3, at least 3 healthy instances must be available.
Resolution:
Check the health of ring members:
curl -s http://loki:3100/ring | jq '.shards[] | select(.state != "ACTIVE")'
Restart unhealthy instances that are stuck in a bad state.
Scale up instances if there aren’t enough healthy members.
Check resource constraints (CPU, memory, disk) on unhealthy instances.
Properties:
- Enforced by: Ring replication
- Retryable: Yes (after instances recover)
- HTTP status: 503 Service Unavailable
- Configurable per tenant: No
Error: Empty ring
Error message:
empty ring
Cause:
No instances are registered in the ring. This typically occurs during initial cluster startup, when instances are crashing (for example, ingesters running out of memory), or due to misconfiguration.
Resolution:
Wait for instances to register during initial startup.
Check ingesters to make sure they are running.
Check that all instances can communicate over the configured ports.
Verify ring configuration across all components, especially memberlist configuration:
ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 3
Check KV store health (Consul, etcd, or memberlist):
# For memberlist
curl -s http://loki:3100/memberlist
Properties:
- Enforced by: Ring operations
- Retryable: Yes (after instances register)
- HTTP status: 503 Service Unavailable
- Configurable per tenant: No
Error: Instance not found in the ring
Error message:
instance <id> not found in the ring
Cause:
A specific instance is expected to be in the ring but isn’t registered. This can happen after a restart if the instance hasn’t re-joined the ring yet.
Resolution:
- Wait for the instance to re-register in the ring.
- Check the instance’s logs for ring join failures.
- Verify KV store connectivity from the instance.
Properties:
- Enforced by: Ring operations
- Retryable: Yes (after instance registers)
- HTTP status: 503 Service Unavailable
- Configurable per tenant: No
Error: Instance owns no tokens
Error message:
this instance owns no tokens
Cause:
The instance has joined the ring but hasn’t claimed any tokens. Without tokens, the instance cannot receive any work. This can happen if:
- The instance is still starting up
- Token claim failed
- The KV store update didn’t propagate
Resolution:
Wait for token assignment during startup.
Check the ring status for the instance: Open a browser and navigate to http://localhost:3100/ring. You should see the Loki Ring Status page.
OR
curl -s http://loki:3100/ring
Restart the instance if tokens are not assigned after startup completes.
Check KV store connectivity and health.
Properties:
- Enforced by: Lifecycler readiness check
- Retryable: Yes (after tokens are assigned)
- HTTP status: 503 Service Unavailable
- Configurable per tenant: No
Error: Error talking to the KV store
Error message:
error talking to the KV store
Cause:
The instance cannot communicate with the key-value store used for ring state. The KV store (Consul, etcd, or memberlist) is required for ring coordination.
Resolution:
Check KV store health and connectivity:
# For Consul
curl http://consul:8500/v1/status/leader
# For etcd
etcdctl endpoint health
Verify network connectivity between Loki instances and the KV store.
Check firewall rules allow traffic on KV store ports.
For memberlist, verify that gossip ports are accessible between all instances:
memberlist:
  bind_port: 7946
  join_members:
    - loki-memberlist:7946
Properties:
- Enforced by: Ring lifecycler
- Retryable: Yes (after KV store recovery)
- HTTP status: 503 Service Unavailable
- Configurable per tenant: No
Error: No ring returned from the KV store
Error message:
no ring returned from the KV store
Cause:
The KV store responded but returned an empty or invalid ring descriptor. This can happen if the KV store was recently initialized or its data was cleared.
Resolution:
- Wait for ring initialization during first startup.
- Check if the KV store data was accidentally cleared.
- Restart all ring members to re-register if the KV store was reset.
Properties:
- Enforced by: Ring lifecycler
- Retryable: Yes (after ring initialization)
- HTTP status: 503 Service Unavailable
- Configurable per tenant: No
Error: Failed to join memberlist cluster
Error message:
failed to join memberlist cluster on startup
Or:
joining memberlist cluster failed
Cause:
The instance could not join the memberlist gossip cluster. Common causes:
- Seed nodes are unreachable
- DNS resolution failure for join addresses
- Firewall blocking gossip ports
- All existing members are down
Resolution:
Check that join members are reachable:
# Test connectivity to seed nodes
nc -zv loki-memberlist 7946
Verify DNS resolution for join addresses:
nslookup loki-memberlist
Check memberlist configuration:
memberlist:
  bind_port: 7946
  join_members:
    - loki-gossip-ring.loki.svc.cluster.local:7946
Ensure firewall rules allow UDP and TCP traffic on the gossip port (default 7946).
For Kubernetes, verify that the headless service for memberlist is configured correctly.
Properties:
- Enforced by: Memberlist KV client
- Retryable: Yes (automatic retries with backoff)
- HTTP status: N/A (startup failure or degraded operation)
- Configurable per tenant: No
Error: Re-joining memberlist cluster failed
Error message:
re-joining memberlist cluster failed
Cause:
After being disconnected from the memberlist cluster, the instance failed to rejoin. This can happen during network partitions or after prolonged network issues.
Resolution:
- Check network connectivity between cluster members.
- Verify other cluster members are healthy.
- Restart the affected instance if automatic rejoin continues to fail.
- Review network stability; frequent re-joins indicate underlying network issues.
Properties:
- Enforced by: Memberlist KV client
- Retryable: Yes (automatic retries)
- HTTP status: N/A (degraded operation)
- Configurable per tenant: No
Component readiness errors
Readiness errors occur when Loki components are not ready to serve requests. These errors are returned by the /ready health check endpoint and prevent load balancers from routing traffic to unready instances.
Error: Application is stopping
Error message:
Application is stopping
Cause:
Loki is shutting down and no longer accepting new requests. This is normal during graceful shutdown.
Resolution:
- Wait for the instance to restart if this is a rolling update.
- Check if the shutdown is expected (maintenance, scaling down).
- Review orchestrator logs (Kubernetes, systemd) if the shutdown is unexpected.
Properties:
- Enforced by: Loki readiness handler
- Retryable: Yes (after restart)
- HTTP status: 503 Service Unavailable
- Configurable per tenant: No
Error: Some services are not running
Error message:
Some services are not Running:
<state>: <count>
<state>: <count>
For example:
Some services are not Running:
Starting: 1
Failed: 2
Cause:
One or more internal Loki services have failed to start or have stopped unexpectedly. The error message lists each service state with a count of services in that state.
Resolution:
- Check Loki logs for errors from the listed services.
- Verify configuration for the affected services.
- Check resource availability (memory, disk, CPU).
- Restart the instance if services are stuck.
Properties:
- Enforced by: Loki service manager
- Retryable: Yes (after services recover)
- HTTP status: 503 Service Unavailable
- Configurable per tenant: No
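When triaging many instances, it can help to reduce the response body to one line per state. The following is a minimal sketch that assumes the body has exactly the shape shown in the example above (a header line followed by "<state>: <count>" lines). The function name not_running_summary is hypothetical.

```shell
# not_running_summary reads a /ready response body on stdin and prints
# one "state=count" pair per line, skipping the header line.
not_running_summary() {
    awk -F': *' '
        /^Some services are not Running:/ { next }   # header line
        NF == 2 && $2 ~ /^[0-9]+$/ { printf "%s=%s\n", $1, $2 }
    '
}

# Usage (hypothetical host name):
# curl -s http://loki:3100/ready | not_running_summary
```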
Error: Ingester not ready
Error message:
Ingester not ready: <details>
When the ingester’s own state check fails, <details> contains the ingester state, giving the full message:
Ingester not ready: ingester not ready: <state>
Where <state> is the service state, for example Starting, Stopping, or Failed.
Cause:
The ingester is not in a ready state to accept writes or serve reads. The detail message indicates the specific reason, such as:
- The ingester is still starting up and joining the ring (Starting)
- The lifecycler is not ready (lifecycler error text)
- The ingester is waiting for minimum ready duration after ring join
Resolution:
Wait for startup to complete - ingesters take time to join the ring and become ready.
Check ring membership: Open a browser and navigate to http://localhost:3100/ring. You should see the Loki Ring Status page.
OR
curl -s http://ingester:3100/ring
Review logs for startup errors.
Adjust the minimum ready duration if startup is too slow:
ingester:
  lifecycler:
    min_ready_duration: 15s
Properties:
- Enforced by: Ingester readiness check
- Retryable: Yes (after ingester becomes ready)
- HTTP status: 503 Service Unavailable
- Configurable per tenant: No
Error: No queriers connected to query frontend
Error message:
Query Frontend not ready: not ready: number of queriers connected to query-frontend is 0
Cause:
The query frontend has no querier workers connected. Without queriers, the frontend cannot process any queries. This typically occurs when:
- Queriers are not yet started
- Queriers cannot reach the frontend
- gRPC connectivity issues between queriers and frontend
Resolution:
Check that queriers are running and healthy.
Verify querier configuration points to the correct frontend address:
frontend_worker:
  frontend_address: query-frontend:9095
Check gRPC connectivity between queriers and the frontend:
# Test gRPC port connectivity
nc -zv query-frontend 9095
Review querier logs for connection errors.
Properties:
- Enforced by: Query frontend (v1) readiness check
- Retryable: Yes (after queriers connect)
- HTTP status: 503 Service Unavailable
- Configurable per tenant: No
Error: No schedulers connected to frontend worker
Error message:
Query Frontend not ready: not ready: number of schedulers this worker is connected to is 0
Cause:
The query frontend worker has no active connections to any query scheduler. This prevents the frontend from dispatching queries.
Resolution:
Check that query schedulers are running and healthy.
Verify scheduler address configuration:
frontend_worker:
  scheduler_address: query-scheduler:9095
Check gRPC connectivity between the frontend and schedulers.
Review query scheduler logs for errors.
Properties:
- Enforced by: Query frontend (v2) readiness check
- Retryable: Yes (after schedulers connect)
- HTTP status: 503 Service Unavailable
- Configurable per tenant: No
gRPC and message size errors
gRPC errors occur during inter-component communication. Loki components communicate using gRPC for ring coordination, query execution, and data transfer.
Error: Message size too large
Error message:
message size too large than max (<size> vs <max>)
Or for the decompressed body:
decompressed message size too large than max (<size> vs <max>)
Cause:
The compressed or decompressed body of an HTTP push request to the distributor exceeds the configured limit.
Default configuration:
- distributor.max_recv_msg_size: 100MB (compressed request body limit)
- distributor.max_decompressed_size: 5000MB (decompressed body limit, defaults to 50×max_recv_msg_size)
Resolution:
Increase the distributor receive message size limit:
distributor:
  max_recv_msg_size: 209715200 # 200MB compressed
  max_decompressed_size: 10737418240 # 10GB decompressed
Reduce the amount of data per request by lowering the batch size or flush interval in your log shipping client (Alloy, Promtail, etc.), so that each individual request is smaller.
Properties:
- Enforced by: Distributor push handler
- Retryable: No (request must be smaller or limits increased)
- HTTP status: 413 Request Entity Too Large (compressed), 400 Bad Request (decompressed)
- Configurable per tenant: No
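The byte values in the configuration above follow from treating "MB" as 1024 × 1024 bytes, which is what the 209715200 figure for 200MB implies. The sketch below reproduces that arithmetic and the 50× default relationship; the function names are hypothetical helpers, not Loki tooling.

```shell
# mib_to_bytes N: convert N units of 1024*1024 bytes to bytes.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# default_max_decompressed BYTES: 50 x max_recv_msg_size, per the default above.
default_max_decompressed() {
    echo $(( $1 * 50 ))
}

mib_to_bytes 200                                  # 209715200
mib_to_bytes 10240                                # 10737418240 (10GB)
default_max_decompressed "$(mib_to_bytes 100)"    # 5242880000 (5000MB)
```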
Error: Response larger than max message size
Error message:
response larger than the max message size (<size> vs <max>)
Cause:
A query result from the querier to the frontend exceeds the maximum allowed gRPC response size. This typically happens with queries that return very large result sets.
Default configuration:
- server.grpc_server_max_send_msg_size: 4MB (gRPC server send limit on the querier)
- querier.query_frontend_grpc_client.max_recv_msg_size: 100MB (gRPC client receive limit on the querier worker)
Resolution:
Reduce query scope to return fewer results:
- Add more specific label matchers
- Reduce the time range
- Lower the entries limit
Increase gRPC message size limits if needed. Apply these settings to querier nodes:
server:
  grpc_server_max_send_msg_size: 209715200 # 200MB
querier:
  query_frontend_grpc_client:
    max_recv_msg_size: 209715200 # 200MB
Properties:
- Enforced by: Querier worker
- Retryable: No (query scope or limits must change)
- HTTP status: 413 Request Entity Too Large
- Configurable per tenant: No
Error: Compressed message size exceeds limit
Error message:
compressed message size <size> exceeds limit <limit>
Cause:
The compressed body of an HTTP push request exceeds the distributor’s configured limit. This check runs after the request body has been fully read and validates the total compressed size against the configured maximum.
Default configuration:
distributor.max_recv_msg_size: 100MB
Resolution:
Reduce batch sizes in your log shipping client.
Split large batches into smaller, more frequent requests.
Increase the limit if needed:
distributor:
  max_recv_msg_size: 209715200 # 200MB
Properties:
- Enforced by: Distributor push handler
- Retryable: No (request must be smaller)
- HTTP status: 400 Bad Request
- Configurable per tenant: No
TLS and certificate errors
TLS errors occur when Loki or its clients cannot establish secure connections due to certificate issues.
Error: TLS certificate loading failed
Error message:
error loading ca cert: <path>
Or:
error loading client cert: <path>
Or:
error loading client key: <path>
Or:
failed to load TLS certificate <cert_path>,<key_path>
Cause:
Loki cannot load TLS certificates from the specified paths. Common causes:
- Certificate files don’t exist at the configured paths
- Permission issues prevent reading the files
- Certificate or key format is invalid
- Certificate and key don’t match
Resolution:
Verify certificate files exist and are readable:
ls -la /path/to/cert.pem /path/to/key.pem /path/to/ca.pem
Check file permissions (the Loki process must be able to read them).
Validate the certificate format:
openssl x509 -in /path/to/cert.pem -noout -text
openssl rsa -in /path/to/key.pem -check
Verify cert and key match:
openssl x509 -noout -modulus -in cert.pem | md5sum
openssl rsa -noout -modulus -in key.pem | md5sum
# Both should produce the same hash
Check your TLS configuration:
server:
  http_tls_config:
    cert_file: /path/to/cert.pem
    key_file: /path/to/key.pem
    client_ca_file: /path/to/ca.pem
  grpc_tls_config:
    cert_file: /path/to/cert.pem
    key_file: /path/to/key.pem
    client_ca_file: /path/to/ca.pem
Properties:
- Enforced by: TLS configuration
- Retryable: No (certificates must be fixed)
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: TLS configuration error
Error message:
error generating http tls config: <details>
Or:
error generating grpc tls config: <details>
Where <details> may include messages such as TLS version %q not recognized, cipher suite %q not recognized, or unknown TLS version: <version>.
Cause:
The TLS configuration is invalid. This can happen when:
- An unsupported TLS version string is supplied
- Cipher suite configuration is invalid
- Client auth type is unrecognized
Resolution:
Review TLS settings for compatibility issues.
Use supported TLS versions by setting tls_min_version at the top level of the server block:
server:
  tls_min_version: VersionTLS12
Valid values are VersionTLS10, VersionTLS11, VersionTLS12, and VersionTLS13. There is no max_version setting; tls_min_version is the only version constraint.
Check cipher suite configuration if customized.
Properties:
- Enforced by: TLS initialization
- Retryable: No (configuration must be fixed)
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
DNS resolution errors
DNS errors occur when Loki cannot resolve hostnames for service discovery or backend connections.
Error: DNS lookup timeout
Error message:
msg="failed to resolve server addresses" err="... DNS lookup timeout: [<address>] ..."
Cause:
DNS resolution exceeded the 5-second timeout when trying to resolve addresses for Loki service discovery or backend connections.
This error is emitted by the index gateway and bloom gateway DNS discovery loops.
The DNS lookup timeout: [<address>] string is the context cause embedded within the err field; the full address list is formatted as a Go slice (for example, [dns+loki-index-gateway.loki.svc.cluster.local:9095]).
Resolution:
Check DNS server availability and configuration.
Verify hostname resolution:
nslookup <hostname>
dig <hostname>
Use IP addresses as a workaround if DNS is unreliable:
# Instead of dns+hostname:port
memberlist:
  join_members:
    - 10.0.0.1:7946
    - 10.0.0.2:7946
For Kubernetes, ensure CoreDNS is healthy and headless services are configured correctly.
Properties:
- Enforced by: Index gateway client, bloom gateway client DNS discovery loop
- Retryable: Yes (DNS may recover)
- HTTP status: N/A (connectivity failure)
- Configurable per tenant: No
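The address in the log line includes a dns+ prefix and a port, neither of which nslookup or dig accepts as a query name. A small sketch for extracting the bare hostname follows; discovery_host is a hypothetical helper name, and IPv6 literals are not handled.

```shell
# discovery_host ADDRESS: strip an optional "dns+" style prefix and a
# trailing ":port" so the bare hostname can be passed to nslookup or dig.
discovery_host() {
    addr=$1
    case $addr in
        *+*) addr=${addr#*+} ;;   # drop everything up to the first "+"
    esac
    echo "${addr%:*}"             # drop the trailing ":port", if any
}

# Usage:
# nslookup "$(discovery_host dns+loki-index-gateway.loki.svc.cluster.local:9095)"
```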
Scheduler and frontend errors
These errors relate to query scheduling, frontend workers, and queue management.
Error: Scheduler is not running
Error message:
scheduler is not running
Cause:
The query scheduler service is not in a running state. This can occur when:
- The scheduler is starting up
- The scheduler encountered a fatal error
- The scheduler is shutting down
Resolution:
Check scheduler logs for startup errors or crashes.
Verify scheduler health:
curl -s http://scheduler:3100/ready
Check scheduler ring membership if using ring-based scheduling:
curl -s http://scheduler:3100/ring | jq
Properties:
- Enforced by: Scheduler service
- Retryable: Yes (wait for scheduler to become ready)
- HTTP status: 503 Service Unavailable
- Configurable per tenant: No
Error: Too many outstanding requests
Error message:
too many outstanding requests
Cause:
The query queue has reached its maximum capacity. This indicates the system is overloaded with queries.
Resolution:
Scale out queriers to process queries faster:
querier:
  max_concurrent: 10
Increase queue capacity (with caution). The default is 32000; increase beyond that only if you have confirmed the system can handle the additional load. Note that increasing the queue is often necessary because of how many subqueries can be generated by large values for tsdb_max_query_parallelism. Generally it’s preferable to add more queriers and leave this setting unchanged.
query_scheduler:
  max_outstanding_requests_per_tenant: 64000
Rate limit queries at the client or load balancer level.
Optimize slow queries to reduce queue time.
Properties:
- Enforced by: Query scheduler/frontend
- Retryable: Yes
- HTTP status: 429 Too Many Requests
- Configurable per tenant: No
Error: Querying is disabled
Error message:
querying is disabled, please contact your Loki operator
Cause:
Query parallelism has been set to zero, effectively disabling all queries. This is typically done intentionally during maintenance.
Resolution:
Check the relevant parallelism setting for your index type. For TSDB indexes (the current default), tsdb_max_query_parallelism supersedes max_query_parallelism. Either value being set to zero triggers this error. Verify that both are greater than zero:
limits_config:
  max_query_parallelism: 32 # default; applies to non-TSDB schemas
  tsdb_max_query_parallelism: 128 # default; applies to TSDB schemas
Size tsdb_max_query_parallelism to your ingest volume. Typical values in production are in the range of 128–2048, proportional to the volume of logs ingested per day.
Account for the querier capacity this requires. Each unit of parallelism consumes one querier worker slot. With the default querier.max_concurrent of 4, the number of queriers needed to fully parallelize a single query is:
queriers needed = tsdb_max_query_parallelism / max_concurrent
For example, tsdb_max_query_parallelism: 2048 with max_concurrent: 4 requires 512 queriers to run one query fully in parallel. Production deployments supporting many tenants running large queries simultaneously commonly run thousands of queriers.
Contact your administrator if you don’t have access to change these settings.
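The capacity formula above can be checked quickly from a shell. This sketch rounds up, on the assumption that a partially used querier still has to exist as a whole instance; queriers_needed is a hypothetical helper name.

```shell
# queriers_needed PARALLELISM MAX_CONCURRENT
# Ceiling division: a partially used querier still counts as one querier.
queriers_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

queriers_needed 2048 4   # 512
queriers_needed 128 4    # 32
```

These figures cover a single query running fully in parallel; concurrent queries from multiple tenants need additional capacity.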
Properties:
- Enforced by: Query frontend
- Retryable: No (configuration must change)
- HTTP status: 429 Too Many Requests
- Configurable per tenant: Yes
Error: No frontend address
Error message:
no frontend address
Cause:
The scheduler received a request from a frontend but no frontend address was provided for sending responses back.
Resolution:
Check frontend configuration to ensure the address is set:
frontend:
  address: query-frontend:9095
Verify gRPC connectivity between frontend and scheduler.
Properties:
- Enforced by: Scheduler
- Retryable: No (configuration issue)
- HTTP status: 400 Bad Request
- Configurable per tenant: No
Error: Scheduler shutting down
Error message:
scheduler is shutting down
Cause:
The frontend scheduler worker detected that the scheduler is in shutdown mode and cannot accept new requests.
Resolution:
- Wait for shutdown to complete and the scheduler to restart.
- Check if this is expected (rolling update, maintenance).
- Retry the request after the scheduler is healthy.
Properties:
- Enforced by: Scheduler
- Retryable: Yes (after scheduler restart)
- HTTP status: 503 Service Unavailable
- Configurable per tenant: No
Index gateway errors
Index gateway errors occur when queriers cannot communicate with index gateways for index lookups.
Error: Index gateway unhealthy in ring
Error message:
index-gateway is unhealthy in the ring
Cause:
The index gateway instance detects itself as unhealthy in the ring and refuses to process queries. This is a self-check: before handling tenant requests, the gateway verifies it appears in the set of healthy ring members.
Resolution:
Check index gateway health:
curl -s http://index-gateway:3100/ready
View the ring status: Open a browser and navigate to http://localhost:3100/ring. You should see the Loki Ring Status page.
OR
curl -s http://index-gateway:3100/ring
Check logs for errors preventing the gateway from becoming healthy.
Restart the index gateway if it’s stuck in an unhealthy state.
Properties:
- Enforced by: Index gateway ring
- Retryable: Yes (wait for gateway to become healthy)
- HTTP status: 500 Internal Server Error
- Configurable per tenant: No
Error: No index gateway instances found
Error message:
no index gateway instances found for tenant <tenant>
Cause:
No index gateway instances are available in the ring to serve the tenant’s request. This could be due to:
- All index gateways are unhealthy
- Shuffle sharding excludes this tenant
- Ring is empty
Resolution:
Check if any index gateways are running:
curl -s http://index-gateway:3100/ring | jq '.shards | length'
Verify ring mode is configured if using shuffle sharding. The index gateway must run in ring mode and the per-tenant shard size must be set:
index_gateway:
  mode: ring
limits_config:
  index_gateway_shard_size: 3 # default = 0 (use all instances)
Scale up index gateways if needed.
Properties:
- Enforced by: Index gateway client
- Retryable: Yes
- HTTP status: 500 Internal Server Error
- Configurable per tenant: Yes (via index_gateway_shard_size in limits_config)
Error: Index client not initialized
Error message:
index client is not initialized likely due to boltdb-shipper not being used
Cause:
The index gateway was queried for operations that require the index client, but the client wasn’t initialized because the boltdb-shipper store isn’t configured.
Resolution:
Verify your schema config uses the correct index store:
schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: index_
        period: 24h
Check if the operation requires boltdb-shipper - some legacy operations may not be supported with TSDB.
Properties:
- Enforced by: Index gateway
- Retryable: No (configuration/schema issue)
- HTTP status: 500 Internal Server Error
- Configurable per tenant: No
Compactor and retention errors
Compactor errors occur during index compaction or retention enforcement.
Error: No chunks found in table
Error message:
no chunks found in table, please check if there are really no chunks and manually drop the table or see if there is a bug causing us to drop whole index table
Cause:
The compactor found an empty index table during retention processing. This could indicate:
- All chunks in the table have expired
- The table was never populated
- Data corruption
Resolution:
Verify the table should be empty:
# Check if data exists for the time period
logcli query '{job=~".+"}' --from="<table-start-time>" --to="<table-end-time>" --limit=1
If the table is legitimately empty, manually delete it from object storage.
If data should exist, investigate potential data loss.
Properties:
- Enforced by: Compactor retention
- Retryable: No (requires manual intervention)
- HTTP status: N/A (background process)
- Configurable per tenant: No
Error: Delete request store not configured
Error message:
compactor.delete-request-store should be configured when retention is enabled
Cause:
Retention is enabled but no store is configured for tracking delete requests.
Resolution:
Configure the delete request store:
compactor:
  retention_enabled: true
  delete_request_store: s3
Or disable retention if not needed:
compactor:
  retention_enabled: false
Properties:
- Enforced by: Configuration validation
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Max compaction parallelism invalid
Error message:
max compaction parallelism must be >= 1
Cause:
The compactor’s parallelism setting is configured to zero or a negative number.
Resolution:
Set a valid parallelism value:
compactor:
  max_compaction_parallelism: 1 # Must be >= 1
Properties:
- Enforced by: Configuration validation
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Delete request not found
Error message:
could not find delete request with given id
Cause:
An attempt to cancel a delete request failed because no matching request exists.
Resolution:
List existing delete requests:
curl -s http://compactor:3100/loki/api/v1/delete | jq
Verify the delete request ID is correct.
Check if the request has already been processed and removed.
Properties:
- Enforced by: Compactor API
- Retryable: No
- HTTP status: 404 Not Found
- Configurable per tenant: No
Error: Retention is not enabled
Error message:
Retention is not enabled
Cause:
A delete request was submitted but retention is not enabled in the compactor configuration. Delete requests require retention to be enabled.
Resolution:
Enable retention in the compactor:
compactor:
  retention_enabled: true
  delete_request_store: s3
Restart the compactor after changing the configuration.
Properties:
- Enforced by: Compactor delete request handler
- Retryable: No (configuration must change)
- HTTP status: 400 Bad Request
- Configurable per tenant: No
Error: Invalid delete request time format
Error message:
invalid start time: require unix seconds or RFC3339 format
Or:
invalid end time: require unix seconds or RFC3339 format
Cause:
The start or end time in a delete request is not in a valid format.
Resolution:
Use Unix seconds or RFC3339 format:
# Unix seconds
curl -X POST http://compactor:3100/loki/api/v1/delete \
  -H "X-Scope-OrgID: my-tenant" \
  -d "query={app=\"foo\"}" \
  -d "start=1704067200" \
  -d "end=1704153600"
# RFC3339
curl -X POST http://compactor:3100/loki/api/v1/delete \
  -H "X-Scope-OrgID: my-tenant" \
  -d "query={app=\"foo\"}" \
  -d "start=2024-01-01T00:00:00Z" \
  -d "end=2024-01-02T00:00:00Z"
Properties:
- Enforced by: Compactor delete request handler
- Retryable: No (request must be fixed)
- HTTP status: 400 Bad Request
- Configurable per tenant: No
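A script that builds delete requests could reject obviously malformed timestamps before calling the API. The sketch below is a shape check only, under the assumption that "Unix seconds" means a digits-only value and that RFC3339 values carry either Z or a numeric offset. It does not validate calendar ranges or fractional seconds, and Loki's own parser remains the authority. The function name is hypothetical.

```shell
# is_valid_delete_time VALUE
# Returns 0 if VALUE looks like Unix seconds (digits only) or like an
# RFC3339 timestamp (YYYY-MM-DDTHH:MM:SS followed by Z or +hh:mm / -hh:mm).
is_valid_delete_time() {
    case $1 in
        '') return 1 ;;
        *[!0-9]*) ;;        # contains non-digits: try the RFC3339 shapes below
        *) return 0 ;;      # digits only: treat as Unix seconds
    esac
    case $1 in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]Z) return 0 ;;
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9][+-][0-9][0-9]:[0-9][0-9]) return 0 ;;
    esac
    return 1
}

# Usage:
# is_valid_delete_time "$start" || echo "start must be Unix seconds or RFC3339" >&2
```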
Error: Delete request already processed
Error message:
deletion of request which is in process or already processed is not allowed
Cause:
An attempt was made to cancel a delete request that is already being processed or has completed processing.
Resolution:
Check the status of the delete request:
curl -s http://compactor:3100/loki/api/v1/delete \
  -H "X-Scope-OrgID: my-tenant" | jq
Submit a new delete request if you need to delete additional data.
Properties:
- Enforced by: Compactor delete request handler
- Retryable: No
- HTTP status: 400 Bad Request
- Configurable per tenant: No
Error: Invalid max_interval for delete request
Error message:
invalid max_interval: valid time units are 's', 'm', 'h'
Or:
max_interval can't be greater than <configured-limit>
Or:
max_interval can't be greater than the interval to be deleted (<duration>)
Cause:
The max_interval parameter on a delete request has an invalid value, exceeds the configured delete_max_interval limit, or exceeds the time range of the delete request itself.
Resolution:
Use a valid time format with supported units (s, m, h):
curl -X POST http://compactor:3100/loki/api/v1/delete \
  -H "X-Scope-OrgID: my-tenant" \
  -d "query={app=\"foo\"}" \
  -d "start=1704067200" \
  -d "end=1704153600" \
  -d "max_interval=1h"
Properties:
- Enforced by: Compactor delete request handler
- Retryable: No (request must be fixed)
- HTTP status: 400 Bad Request
- Configurable per tenant: No
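Two of the three failure modes above (an unsupported unit, and a max_interval longer than the deleted range) can be checked client-side before submitting. The sketch below assumes a single-unit duration such as 30s, 15m, or 1h; compound values like 1h30m are rejected by this simplified parser. It does not know the configured delete_max_interval limit. Both function names are hypothetical.

```shell
# duration_to_seconds DURATION
# Converts a single-unit duration (s, m, or h) to seconds.
# Prints nothing and returns 1 for any other unit or a non-numeric value.
duration_to_seconds() {
    value=${1%?}          # everything except the last character
    unit=${1#"$value"}    # the last character
    case $value in
        ''|*[!0-9]*) return 1 ;;
    esac
    case $unit in
        s) echo "$value" ;;
        m) echo $((value * 60)) ;;
        h) echo $((value * 3600)) ;;
        *) return 1 ;;
    esac
}

# max_interval_ok MAX_INTERVAL START END
# START and END are Unix seconds; MAX_INTERVAL must not exceed END - START.
max_interval_ok() {
    secs=$(duration_to_seconds "$1") || return 1
    [ "$secs" -le $(( $3 - $2 )) ]
}

# Usage:
# max_interval_ok 1h 1704067200 1704153600 && echo "max_interval fits the range"
```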
Ruler errors
Ruler errors occur when evaluating alerting rules or recording rules.
Error: Invalid ruler evaluation config
Error message:
invalid ruler evaluation config: <details>
Cause:
The ruler evaluation mode configuration is invalid.
Resolution:
Use a valid evaluation mode:
ruler:
  evaluation:
    mode: local # Or "remote"
Properties:
- Enforced by: Ruler module initialization (initRuleEvaluator)
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Ruler remote write config conflict
Error message:
ruler remote write config: both 'client' and 'clients' options are defined; 'client' is deprecated, please only use 'clients'
Cause:
Both the deprecated client and the new clients configuration options are set for ruler remote write.
Resolution:
Remove the deprecated config and use clients:
ruler:
  remote_write:
    # Remove this:
    # client: {}
    # Use this instead:
    clients:
      primary:
        url: http://prometheus:9090/api/v1/write
Properties:
- Enforced by: Ruler initialization (NewRuler)
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Remote write enabled but no URL configured
Error message:
remote-write enabled but no clients URL are configured
Or when multiple clients are configured in the clients map and one entry is missing a URL:
remote-write enabled but client '<name>' URL for tenant <client-id> is not configured
Cause:
Remote write is enabled for the ruler but no destination URL is configured. The first variant occurs when the clients map is empty. The second occurs when a named entry in the clients map has no url set; <client-id> is the map key for that entry, not a tenant ID.
Resolution:
Configure the remote write URL:
ruler:
  remote_write:
    enabled: true
    clients:
      primary:
        url: http://prometheus:9090/api/v1/write
Or disable remote write:
ruler:
  remote_write:
    enabled: false
Properties:
- Enforced by: Configuration validation
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Rule result is not a vector or scalar
Error message:
rule result is not a vector or scalar
Cause:
A rule evaluation returned an unexpected result type. Both recording rules and alerting rules must produce vector or scalar results. A plain log-stream expression (one that returns log lines rather than a numeric metric) triggers this error in either rule type.
Resolution:
Check the rule expression returns a vector or scalar:
# Valid - returns vector:
record: my_metric
expr: sum(rate({job="app"}[5m] | json | level="error"))
# Invalid - returns logs (triggers error for both recording and alerting rules):
# record: my_metric
# expr: '{job="app"}'
Use aggregation functions to produce numeric results from log queries.
Properties:
- Enforced by: Ruler evaluation
- Retryable: No (rule must be fixed)
- HTTP status: N/A (background process)
- Configurable per tenant: No
Error: Ruler WAL closed
Error message:
WAL storage closed
Cause:
An operation was attempted on the ruler’s write-ahead log (WAL) after it was closed. This typically occurs during shutdown.
Resolution:
- Wait for the ruler to restart if it’s restarting.
- Check ruler logs for errors that caused unexpected WAL closure.
- Verify disk space is available for WAL operations.
Properties:
- Enforced by: Ruler WAL
- Retryable: Yes (after ruler restart)
- HTTP status: N/A
- Configurable per tenant: No
Kafka integration errors
These errors occur when Loki is configured to use Kafka for ingestion.
Error: Missing Kafka address
Error message:
the Kafka address has not been configured
Cause:
Kafka ingestion is enabled but no Kafka broker address is configured.
Resolution:
Configure the Kafka address:
kafka_config:
  topic: loki-logs
  reader_config:
    address: kafka:9092
  writer_config:
    address: kafka:9092
Properties:
- Enforced by: Configuration validation
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Missing Kafka topic
Error message:
the Kafka topic has not been configured
Cause:
Kafka ingestion is enabled but no topic name is configured.
Resolution:
Configure the Kafka topic:
kafka_config:
  topic: loki-logs
  reader_config:
    address: kafka:9092
  writer_config:
    address: kafka:9092
Properties:
- Enforced by: Configuration validation
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Inconsistent SASL username and password
Error message:
both sasl username and password must be set
Cause:
Only one of the Simple Authentication and Security Layer (SASL) username or password is configured. Both must be set together.
Resolution:
Configure both username and password:
kafka_config:
  sasl_username: my-user
  sasl_password: ${KAFKA_PASSWORD}
Or remove both if SASL authentication is not required.
Properties:
- Enforced by: Configuration validation
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Kafka enabled in distributor but not in ingester
Error message:
kafka is enabled in distributor but not in ingester
Cause:
Kafka is configured for the distributor but the ingester isn’t configured to read from Kafka. Both must be configured together.
Resolution:
Enable Kafka in both distributor and ingester:
distributor:
  kafka_writes_enabled: true
ingester:
  kafka_ingestion:
    enabled: true
Properties:
- Enforced by: Configuration validation
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Bloom gateway errors
Bloom gateway errors occur when using bloom filters for query acceleration.
Error: Invalid bloom gateway addresses
Error message:
addresses requires a list of comma separated strings in DNS service discovery format with at least one item
Cause:
The bloom_gateway.client.addresses configuration field is empty or unset.
Resolution:
Configure valid addresses:
bloom_gateway:
  client:
    addresses: dns+bloom-gateway:9095
Valid formats:
- dns+hostname:port - DNS-based discovery
- host1:port,host2:port - Static list
Properties:
- Enforced by: Configuration validation
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Request time range must span exactly one day
Error message:
request time range must span exactly one day
Cause:
Bloom gateway requests must be for exactly one day of data due to how bloom blocks are organized.
Resolution:
- This is typically handled automatically by the bloom querier, which splits multi-day queries into per-day requests before sending them to the gateway. If you see this error:
- Check that the querier is properly configured
- Ensure queries are routed through the querier
Properties:
- Enforced by: Bloom gateway
- Retryable: No
- HTTP status: 500 Internal Server Error
- Configurable per tenant: No
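To make the per-day splitting concrete, the sketch below cuts a time range at UTC day boundaries. It only illustrates the idea and is not the bloom querier's implementation; the use of Unix seconds, inclusive bounds, and the name split_by_day are all assumptions for this example.

```shell
# split_by_day FROM THROUGH  (Unix seconds, FROM <= THROUGH, both inclusive)
# Prints one "from through" pair per UTC day touched by the range.
split_by_day() {
    from=$1
    through=$2
    while [ "$from" -le "$through" ]; do
        # Last second of the UTC day that contains $from.
        day_end=$(( from - from % 86400 + 86399 ))
        if [ "$day_end" -gt "$through" ]; then
            day_end=$through
        fi
        echo "$from $day_end"
        from=$(( day_end + 1 ))
    done
}

# Noon on 2024-01-01 to noon on 2024-01-02 yields two per-day ranges:
split_by_day 1704110400 1704196800
# 1704110400 1704153599
# 1704153600 1704196800
```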
Error: From time must not be after through time
Error message:
from time must not be after through time
Cause:
The bloom gateway received a request where the start time (from) is later than the end time (through).
Resolution:
- This indicates a malformed request reaching the bloom gateway. Verify that the client sending the request constructs time ranges correctly with from ≤ through.
Properties:
- Enforced by: Bloom gateway
- Retryable: No
- HTTP status: 500 Internal Server Error
- Configurable per tenant: No
Write-ahead log (WAL) errors
WAL errors occur when the ingester cannot properly manage its write-ahead log.
Error: WAL is stopped
Error message:
wal is stopped
Cause:
An operation was attempted on the WAL after it was stopped. This typically occurs during shutdown or after a fatal error.
Resolution:
- Check ingester health and logs for errors.
- Verify disk space is available.
- Restart the ingester if it’s in a bad state.
Properties:
- Enforced by: Ingester WAL
- Retryable: Yes (after ingester restart)
- HTTP status: 500 Internal Server Error
- Configurable per tenant: No
Error: Invalid checkpoint duration
Error message:
invalid checkpoint duration: <duration>
Cause:
The WAL checkpoint duration is set to an invalid value (likely zero or negative).
Resolution:
Set a valid checkpoint duration:
ingester:
  wal:
    checkpoint_duration: 5m
Properties:
- Enforced by: Configuration validation
- Retryable: No
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Ingester lifecycle errors
Ingester lifecycle errors occur during ingester startup, shutdown, or state transitions.
Error: Ingester is shutting down
Error message:
Ingester is shutting down
Cause:
The ingester is in the process of shutting down and is no longer accepting writes. This error (also known as ErrReadOnly) is returned when a push request arrives during graceful shutdown. During this period the ingester may still serve reads for data it holds in memory.
Resolution:
- Configure clients to retry with backoff. The distributor will route to other healthy ingesters.
- Wait for shutdown to complete and the new instance to start.
- Check if shutdown is expected (rolling update, scaling event).
- If unexpected, check orchestrator logs for OOM kills or health check failures.
Properties:
- Enforced by: Ingester
- Retryable: Partial. The distributor sends writes to all ingesters in the replication set in parallel and uses a quorum model. If the remaining ingesters meet the minimum success threshold, the overall write succeeds despite this error from a shutting-down ingester.
- HTTP status: 500 Internal Server Error
- Configurable per tenant: No
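A client-side retry with backoff, as recommended in the resolution, might look like the following sketch. It is not taken from any Loki client: `retry_with_backoff`, its parameters, and the jittered exponential schedule are illustrative assumptions, and `send` stands in for whatever function pushes a batch of logs and raises on failure.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    send: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `send` until it succeeds or `attempts` calls have failed.

    The wait before retry N is a random value between 0 and
    min(max_delay, base_delay * 2**N) seconds (exponential backoff with jitter).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return send()
        except Exception as err:  # a real client would retry only 5xx responses
            last_error = err
            if attempt == attempts - 1:
                break
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    assert last_error is not None
    raise last_error
```

The jitter spreads retries from many clients over time, which avoids a synchronized burst of writes when the replacement ingester becomes ready.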
Error: Ingester is stopping or already stopped
Error message:
Ingester is stopping or already stopped.
Cause:
The ingester’s shutdown management endpoint (POST /loki/api/v1/ingester/shutdown) was called when the ingester was not in a Running state. This happens when the endpoint is called a second time during an in-progress shutdown or after the ingester has already stopped. This error is returned by the shutdown endpoint, not by the log-write or query paths.
Resolution:
- Do not call the shutdown endpoint again while a shutdown is already in progress.
- Check orchestrator for duplicate shutdown signals or restart policies.
- Investigate if the stop was unexpected (pod eviction, OOM, crash).
Properties:
- Enforced by: Ingester shutdown endpoint
- Retryable: No (the shutdown endpoint call itself is not retryable; wait for the ingester to restart before sending new writes)
- HTTP status: 503 Service Unavailable (response from the shutdown endpoint)
- Configurable per tenant: No
Error: Failed to start partition reader
Error message:
failed to start partition reader: <details>
Cause:
The ingester could not start its Kafka partition reader. This occurs when Kafka ingestion is enabled but the partition reader fails to initialize.
Resolution:
Check Kafka connectivity from the ingester.
Verify Kafka topic exists and the ingester has appropriate permissions.
Review Kafka configuration:
kafka:
  address: kafka:9092
  topic: loki-logs
Check Kafka broker health.
Properties:
- Enforced by: Ingester startup
- Retryable: No (configuration or infrastructure must be fixed)
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
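A basic reachability test for the Kafka address can be run from the ingester host or pod. The sketch below only verifies that a TCP connection can be opened; it does not speak the Kafka protocol, so a passing result does not prove that the broker is healthy or that the topic exists. The function name `can_connect` is hypothetical, and the host and port in the usage comment come from the example configuration above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False


# Example call, using the address from the configuration above:
#   can_connect("kafka", 9092)
```

If this returns False, look at DNS resolution, network policies, and the broker listener configuration before changing any Loki settings.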
Error: Failed to start partition ring lifecycler
Error message:
failed to start partition ring lifecycler: <details>
Cause:
The ingester could not start its Kafka partition ring lifecycler during startup. This is a separate component from the partition reader; it manages the ingester’s membership in the partition ring. This only occurs when Kafka ingestion is enabled.
Resolution:
- Check Kafka connectivity from the ingester.
- Verify the partition ring KV store (the store used for the partition ring) is reachable.
- Review ingester logs for the wrapped error in <details>.
- Check Kafka broker health and partition availability.
Properties:
- Enforced by: Ingester startup
- Retryable: No (configuration or infrastructure must be fixed)
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Lifecycler failed
Error message:
lifecycler failed: <details>
Cause:
The ingester’s lifecycler (which manages ring membership) encountered a fatal error. This prevents the ingester from participating in the ring.
Resolution:
- Check KV store connectivity (Consul, etcd, or memberlist).
- Review ingester logs for the specific lifecycler error.
- Verify ring configuration is consistent across all ingesters.
- Restart the ingester after fixing the underlying issue.
Properties:
- Enforced by: Ingester lifecycler
- Retryable: Yes (after fix and restart)
- HTTP status: N/A (internal failure)
- Configurable per tenant: No
Pattern ingester errors
Pattern ingester errors occur when using the pattern ingester for automatic log pattern detection.
Error: Pattern ingester replication factor must be 1
Error message:
pattern ingester replication factor must be 1
Cause:
The pattern ingester is configured with a replication factor other than 1. Currently, the pattern ingester only supports a replication factor of 1.
Resolution:
Set the replication factor to 1:
pattern_ingester:
  lifecycler:
    ring:
      replication_factor: 1
Properties:
- Enforced by: Configuration validation
- Retryable: No (configuration must be fixed)
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Pattern ingester retain-for too short
Error message:
retain-for (<duration>) must be greater than or equal to chunk-duration (<duration>)
Cause:
The pattern ingester’s retain_for duration is shorter than max_chunk_age, which would cause data loss.
Resolution:
Increase the retain-for duration to be at least as long as max_chunk_age:
pattern_ingester:
  retain_for: 15m      # Must be >= max_chunk_age
  max_chunk_age: 5m
Properties:
- Enforced by: Configuration validation
- Retryable: No (configuration must be fixed)
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Pattern ingester chunk-duration too short
Error message:
chunk-duration (<duration>) must be greater than or equal to sample-interval (<duration>)
Cause:
The pattern ingester’s max_chunk_age is shorter than pattern_sample_interval. Chunks must span at least one sample interval to hold any data.
Resolution:
Increase max_chunk_age to be at least as long as pattern_sample_interval:
pattern_ingester:
  max_chunk_age: 1h               # Must be >= pattern_sample_interval (default: 1h)
  pattern_sample_interval: 10s    # default: 10s
Properties:
- Enforced by: Configuration validation
- Retryable: No (configuration must be fixed)
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
Error: Pattern ingester volume threshold out of range
Error message:
volume_threshold (<value>) must be between 0 and 1
Cause:
The volume_threshold value is outside the valid range of 0 to 1. This setting controls what fraction of log volume the pattern ingester tracks — only patterns representing the top X% of log volume are persisted.
Resolution:
Set volume_threshold to a value between 0 and 1 (default is 0.99):
pattern_ingester:
  volume_threshold: 0.99
Properties:
- Enforced by: Configuration validation
- Retryable: No (configuration must be fixed)
- HTTP status: N/A (startup failure)
- Configurable per tenant: No
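The three pattern ingester constraints described above (retain-for, chunk-duration, and volume threshold) can be checked together before deploying a configuration change. The following sketch restates those rules only; `validate_pattern_ingester` is a hypothetical helper, not Loki's own validation code, and it collects every violation, whereas Loki may stop at the first one.

```python
from datetime import timedelta
from typing import List


def validate_pattern_ingester(
    retain_for: timedelta,
    max_chunk_age: timedelta,
    sample_interval: timedelta,
    volume_threshold: float,
) -> List[str]:
    """Return a list of violated constraints; an empty list means all rules hold."""
    errors = []
    if retain_for < max_chunk_age:
        errors.append(
            f"retain-for ({retain_for}) must be greater than or equal to "
            f"chunk-duration ({max_chunk_age})"
        )
    if max_chunk_age < sample_interval:
        errors.append(
            f"chunk-duration ({max_chunk_age}) must be greater than or equal to "
            f"sample-interval ({sample_interval})"
        )
    if not 0 <= volume_threshold <= 1:
        errors.append(f"volume_threshold ({volume_threshold}) must be between 0 and 1")
    return errors
```

For example, passing `retain_for=timedelta(minutes=15)`, `max_chunk_age=timedelta(minutes=5)`, `sample_interval=timedelta(seconds=10)`, and `volume_threshold=0.99` returns an empty list.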
API parameter errors
These errors occur when API requests contain invalid parameters.
Error: Invalid direction
Error message:
invalid direction '<value>'
Cause:
The direction query parameter contains an invalid value.
Resolution:
Use a valid direction value:
- forward - Oldest to newest
- backward - Newest to oldest (default)
curl -G "http://loki:3100/loki/api/v1/query_range" --data-urlencode 'query={job="app"}' --data-urlencode 'direction=forward'
Properties:
- Enforced by: API handler
- Retryable: No (request must be fixed)
- HTTP status: 400 Bad Request
- Configurable per tenant: No
Error: Limit must be a positive value
Error message:
limit must be a positive value
Cause:
The limit parameter is zero or negative.
Resolution:
Provide a positive limit:
curl -G "http://loki:3100/loki/api/v1/query_range" --data-urlencode 'query={job="app"}' --data-urlencode 'limit=100'
Properties:
- Enforced by: API handler
- Retryable: No (request must be fixed)
- HTTP status: 400 Bad Request
- Configurable per tenant: No
Error: End timestamp must not be before start time
Error message:
end timestamp must not be before or equal to start time
Cause:
The query’s end time is before or equal to its start time.
Resolution:
Ensure end time is after start time:
curl -G "http://loki:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={job="app"}' \
  --data-urlencode 'start=2024-01-01T00:00:00Z' \
  --data-urlencode 'end=2024-01-02T00:00:00Z'
Properties:
- Enforced by: API handler
- Retryable: No (request must be fixed)
- HTTP status: 400 Bad Request
- Configurable per tenant: No
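When requests are built programmatically, the direction, limit, and time range checks described above can be applied on the client before the request is sent. The sketch below is an illustration, not an official client: `build_query_range_url` is a hypothetical helper, the base URL is taken from the examples above, and the error strings are copied from this guide.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def build_query_range_url(
    base_url: str,
    query: str,
    start: datetime,
    end: datetime,
    limit: int = 100,
    direction: str = "backward",
) -> str:
    """Validate parameters client-side and return a URL-encoded query_range URL."""
    if direction not in ("forward", "backward"):
        raise ValueError(f"invalid direction '{direction}'")
    if limit <= 0:
        raise ValueError("limit must be a positive value")
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware datetimes")
    if end <= start:
        raise ValueError("end timestamp must not be before or equal to start time")

    def rfc3339(ts: datetime) -> str:
        # Render in UTC with second precision, for example 2024-01-01T00:00:00Z.
        return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    params = {
        "query": query,
        "start": rfc3339(start),
        "end": rfc3339(end),
        "limit": str(limit),
        "direction": direction,
    }
    # urlencode percent-encodes braces and quotes in the LogQL selector.
    return f"{base_url}/loki/api/v1/query_range?{urlencode(params)}"
```

Because `urlencode` escapes the braces and quotes in the selector, the resulting URL can be passed to any HTTP client without additional quoting.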
Error: Delay for tailing too large
Error message:
delay_for can't be greater than <max>
Cause:
The delay_for parameter for tailing queries exceeds the maximum allowed value.
Resolution:
Reduce the delay_for value:
curl -G "http://loki:3100/loki/api/v1/tail" --data-urlencode 'query={job="app"}' --data-urlencode 'delay_for=5'
The maximum value is typically 5 seconds.
Properties:
- Enforced by: API handler
- Retryable: No (request must be fixed)
- HTTP status: 400 Bad Request
- Configurable per tenant: No
Error: Query filtering requires compactor address
Error message:
query filtering for deletes requires 'compactor_grpc_address' or 'compactor_address' to be configured
Cause:
Query-time filtering for delete requests is enabled but Loki doesn’t know how to reach the compactor to retrieve active delete requests.
Resolution:
Configure the compactor address:
compactor:
  compactor_grpc_address: compactor:9095
Or use the HTTP address:
compactor:
  compactor_address: http://compactor:3100
Properties:
- Enforced by: Module initialization
- Retryable: No (configuration must be fixed)
- HTTP status: N/A (startup failure)
- Configurable per tenant: No



