Configure thresholds using Terraform
Threshold configurations in Knowledge Graph allow you to define custom thresholds for request, resource, and health assertions. These configurations help you set specific limits and conditions for monitoring your services and infrastructure.
For information about managing thresholds in the Knowledge Graph UI, refer to Manage thresholds.
Basic threshold configuration
Create a file named thresholds.tf
and add the following:
# Basic threshold configuration with all three types
resource "grafana_asserts_thresholds" "basic" {
provider = grafana.asserts
request_thresholds = [{
entity_name = "payment-service"
assertion_name = "ErrorRatioBreach"
request_type = "inbound"
request_context = "/charge"
value = 0.01
}]
resource_thresholds = [{
assertion_name = "Saturation"
resource_type = "container"
container_name = "worker"
source = "metrics"
severity = "warning"
value = 75
}]
health_thresholds = [{
assertion_name = "ServiceDown"
expression = "up < 1"
entity_type = "Service"
}]
}
Request threshold configurations
Configure thresholds for different service request types and contexts:
# Multiple request thresholds for different services
resource "grafana_asserts_thresholds" "request_thresholds" {
provider = grafana.asserts
request_thresholds = [
{
entity_name = "api-service"
assertion_name = "ErrorRatioBreach"
request_type = "inbound"
request_context = "/api/v1/users"
value = 0.02
},
{
entity_name = "api-service"
assertion_name = "LatencyP99ErrorBuildup"
request_type = "inbound"
request_context = "/api/v1/orders"
value = 500
},
{
entity_name = "payment-gateway"
assertion_name = "RequestRateAnomaly"
request_type = "outbound"
request_context = "/payment/process"
value = 1000
}
]
}
Resource threshold configurations
Define resource thresholds for different severity levels:
# Resource thresholds for different severity levels
resource "grafana_asserts_thresholds" "resource_thresholds" {
provider = grafana.asserts
resource_thresholds = [
{
assertion_name = "Saturation"
resource_type = "container"
container_name = "web-server"
source = "metrics"
severity = "warning"
value = 75
},
{
assertion_name = "Saturation"
resource_type = "container"
container_name = "web-server"
source = "metrics"
severity = "critical"
value = 90
},
{
assertion_name = "ResourceRateBreach"
resource_type = "Pod"
container_name = "database"
source = "logs"
severity = "warning"
value = 80
}
]
}
Health threshold configurations
Configure health checks with Prometheus expressions:
# Health thresholds with Prometheus expressions
resource "grafana_asserts_thresholds" "health_thresholds" {
provider = grafana.asserts
health_thresholds = [
{
assertion_name = "ServiceDown"
expression = "up{job=\"api-service\"} < 1"
entity_type = "Service"
},
{
assertion_name = "HighMemoryUsage"
expression = "memory_usage_percent > 85"
entity_type = "Service"
},
{
assertion_name = "DatabaseConnectivity"
expression = "db_connection_pool_active / db_connection_pool_max > 0.9"
entity_type = "Service"
}
]
}
Comprehensive threshold configuration
Define comprehensive thresholds for production environments:
# Production environment with comprehensive thresholds
resource "grafana_asserts_thresholds" "production" {
provider = grafana.asserts
request_thresholds = [
{
entity_name = "frontend"
assertion_name = "ErrorRatioBreach"
request_type = "inbound"
request_context = "/"
value = 0.005
},
{
entity_name = "backend-api"
assertion_name = "LatencyP99ErrorBuildup"
request_type = "inbound"
request_context = "/api"
value = 200
}
]
resource_thresholds = [
{
assertion_name = "Saturation"
resource_type = "container"
container_name = "frontend"
source = "metrics"
severity = "warning"
value = 70
},
{
assertion_name = "Saturation"
resource_type = "container"
container_name = "backend-api"
source = "metrics"
severity = "critical"
value = 85
}
]
health_thresholds = [
{
assertion_name = "ServiceDown"
expression = "up < 1"
entity_type = "Service"
},
{
assertion_name = "NodeDown"
expression = "up{job=\"node-exporter\"} < 1"
entity_type = "Service"
}
]
}
Resource reference
grafana_asserts_thresholds
Manage Knowledge Graph threshold configurations through the Grafana API. This resource allows you to define custom thresholds for request, resource, and health assertions.
Arguments
Name | Type | Required | Description |
---|---|---|---|
request_thresholds | list(object) | No | List of request threshold configurations. Refer to request thresholds block for details. |
resource_thresholds | list(object) | No | List of resource threshold configurations. Refer to resource thresholds block for details. |
health_thresholds | list(object) | No | List of health threshold configurations. Refer to health thresholds block for details. |
Request thresholds block
Each request_thresholds
block supports the following:
Name | Type | Required | Description |
---|---|---|---|
entity_name | string | Yes | The name of the entity to apply the threshold to. |
assertion_name | string | Yes | The name of the assertion to configure. |
request_type | string | Yes | The type of request (inbound, outbound). |
request_context | string | Yes | The request context or path to apply the threshold to. |
value | number | Yes | The threshold value. |
Resource thresholds block
Each resource_thresholds
block supports the following:
Name | Type | Required | Description |
---|---|---|---|
assertion_name | string | Yes | The name of the assertion to configure. |
resource_type | string | Yes | The type of resource (container, Pod, node). |
container_name | string | Yes | The name of the container to apply the threshold to. |
source | string | Yes | The source of the metrics (metrics, logs). |
severity | string | Yes | The severity level (warning, critical). |
value | number | Yes | The threshold value. |
Health thresholds block
Each health_thresholds
block supports the following:
Name | Type | Required | Description |
---|---|---|---|
assertion_name | string | Yes | The name of the assertion to configure. |
expression | string | Yes | The Prometheus expression for the health check. |
entity_type | string | Yes | Entity type for the health threshold (for example, Service, Pod, Namespace, Volume). |
alert_category | string | No | Optional alert category label for the health threshold. |
Example
resource "grafana_asserts_thresholds" "example" {
provider = grafana.asserts
request_thresholds = [{
entity_name = "api-service"
assertion_name = "ErrorRatioBreach"
request_type = "inbound"
request_context = "/api/v1/users"
value = 0.02
}]
resource_thresholds = [{
assertion_name = "Saturation"
resource_type = "container"
container_name = "web-server"
source = "metrics"
severity = "warning"
value = 75
}]
health_thresholds = [{
assertion_name = "ServiceDown"
expression = "up{job=\"api-service\"} < 1"
entity_type = "Service"
}]
}
Best practices
Threshold configuration management
- Set appropriate threshold values based on your service level objectives (SLOs)
- Use different severity levels (warning, critical) to create escalation paths
- Test threshold configurations in non-production environments first
- Monitor threshold effectiveness and adjust values based on actual performance data
Request threshold best practices
- Configure request thresholds for critical user-facing endpoints
- Set different thresholds for different request types (inbound vs outbound)
- Consider request context when setting thresholds for specific API paths
- Use error ratio thresholds to catch service degradation early
- Review historical performance data to set realistic threshold values
Resource threshold best practices
- Set resource thresholds based on your infrastructure capacity
- Use container-specific thresholds for microservices architectures
- Configure both warning and critical thresholds for gradual escalation
- Monitor resource utilization patterns to set realistic threshold values
- Consider seasonal or periodic patterns in resource usage
Health threshold best practices
- Use Prometheus expressions that accurately reflect service health
- Test health check expressions independently before applying them
- Set up health thresholds for critical dependencies and external services
- Use composite expressions for complex health checks
- Ensure expressions perform efficiently without causing excessive load
Value selection guidelines
- Start conservative and adjust based on real-world performance
- Use percentages (0-1 range) for ratio-based metrics
- Use milliseconds for latency thresholds
- Document the reasoning behind specific threshold values
- Review and update thresholds regularly based on system evolution
Validation
After applying the Terraform configuration, verify that:
- Threshold configurations are applied in your Knowledge Graph instance
- Configurations appear in the Knowledge Graph UI under Observability > Rules > Threshold
- Request thresholds correctly identify breaches for specified services
- Resource thresholds trigger at appropriate severity levels
- Health thresholds accurately reflect service status
- Threshold values align with your SLO commitments