Onboard a group of services

This scenario walks through how to onboard a new suite of internal services to Adaptive Traces.

Goal

Apply a specific, cost-effective sampling strategy to a new group of services (e.g., app-billing, app-provisioning, app-alerts) that all match the regular expression app-.*. You want Adaptive Traces to sample 5% of traces from these services, plus all of their error traces.

Challenge

Manage sampling policies for this specific group of services without affecting the telemetry data from any other existing services in your environment.

You need a way to:

  • Isolate the new app-.* services for a unique sampling rule.
  • Apply a baseline 5% sample to them to manage observability costs.
  • Ensure all critical error traces from these new services are captured.
  • Guarantee that all your other existing services continue to have 100% of their traces ingested without any changes.

Solution

The solution is a three-layer policy strategy: you combine three policies to gain precise control over your data ingestion.

Policy 1

Ingest everything from all other services.

First, you create a policy to explicitly keep all traces from services that do not match the app-.* pattern. The invert_match option is crucial here: setting it to true flips the match, so the policy selects every service whose name does not match app-.*. This policy acts as a “pass-through” rule for all your other services, ensuring they are not affected by the new sampling policies you’re about to add.

Here is an example using JSON. This policy ensures that traces from all services that do not match app-.* are ingested without being downsampled.

{
  "and_sub_policy": [
    {
      "name": "non-app-services",
      "type": "string_attribute",
      "string_attribute": {
        "key": "service.name",
        "values": [
          "app-.*"
        ],
        "invert_match": true,
        "enabled_regex_matching": true
      }
    }
  ]
}
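
To see which service names this pass-through rule selects, the following Python sketch mimics the inverted regex match. It is only an illustration: the helper `matches_policy_1` is hypothetical, and the sketch assumes the pattern must match the whole service name, which may differ from how Adaptive Traces anchors regular expressions.

```python
import re

# Same pattern as the "values" entry in the policy above.
APP_PATTERN = re.compile(r"app-.*")


def matches_policy_1(service_name: str) -> bool:
    """Return True when the service is NOT an app-.* service (invert_match)."""
    is_app_service = APP_PATTERN.fullmatch(service_name) is not None
    # invert_match flips the result of the regex match.
    return not is_app_service


for name in ["payment-api", "user-dashboard", "app-billing", "app-alerts"]:
    print(name, matches_policy_1(name))
```

Services for which the helper returns True fall under the pass-through rule. The app-.* services return False and are left to the policies that follow.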

Policy 2

Keep 5% of traces from your target services.

Next, you create a policy that applies only to your app-.* services. This rule uses an AND condition to combine two criteria: the trace must originate from an app-.* service, AND it must fall within a 5% probabilistic sample.

Here is an example using JSON.

{
  "and_sub_policy": [
    {
      "name": "app-services",
      "type": "string_attribute",
      "string_attribute": {
        "key": "service.name",
        "values": [
          "app-.*"
        ],
        "enabled_regex_matching": true
      }
    },
    {
      "name": "5-percent-sample",
      "type": "probabilistic",
      "probabilistic": {
        "sampling_percentage": 5
      }
    }
  ]
}
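
The AND condition can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Adaptive Traces implementation: the helpers `in_probabilistic_sample` and `matches_policy_2` are hypothetical, the trace IDs are synthetic, and hashing the trace ID into buckets is just one common way to make a percentage-based decision deterministic per trace.

```python
import hashlib
import re

APP_PATTERN = re.compile(r"app-.*")


def in_probabilistic_sample(trace_id: str, percentage: float) -> bool:
    """Map a trace ID to a bucket in [0, 10000) and keep the lowest buckets."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percentage * 100


def matches_policy_2(service_name: str, trace_id: str) -> bool:
    """Both sub-policies must match: app-.* service AND inside the 5% sample."""
    is_app_service = APP_PATTERN.fullmatch(service_name) is not None
    return is_app_service and in_probabilistic_sample(trace_id, 5)


# Synthetic trace IDs, used only to show that roughly 5% are selected.
total = 20_000
kept = sum(matches_policy_2("app-billing", f"trace-{i}") for i in range(total))
print(f"kept {kept} of {total} app-billing traces")
```

Because both conditions must hold, a trace from a service outside the app-.* group never matches this policy, regardless of where its ID falls in the sample.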

Policy 3

Keep all error traces from your target services.

Finally, you add a critical safety net. This policy also targets only your app-.* services, but it ensures that any trace with an ERROR status is always kept, overriding the 5% sampling rule for those specific traces.

Here is an example using JSON.

{
  "and_sub_policy": [
    {
      "name": "app-services",
      "type": "string_attribute",
      "string_attribute": {
        "key": "service.name",
        "values": [
          "app-.*"
        ],
        "enabled_regex_matching": true
      }
    },
    {
      "name": "error-traces",
      "type": "status_code",
      "status_code": {
        "status_codes": [
          "ERROR"
        ]
      }
    }
  ]
}
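
As a rough illustration of this rule, the Python sketch below checks both sub-policies for a single trace. The helper `matches_policy_3` is hypothetical, and the sketch assumes the trace status is available as a plain string and that the pattern must match the whole service name.

```python
import re

APP_PATTERN = re.compile(r"app-.*")


def matches_policy_3(service_name: str, status_code: str) -> bool:
    """Both sub-policies must match: app-.* service AND an ERROR status."""
    is_app_service = APP_PATTERN.fullmatch(service_name) is not None
    return is_app_service and status_code == "ERROR"


print(matches_policy_3("app-provisioning", "ERROR"))
print(matches_policy_3("app-provisioning", "OK"))
print(matches_policy_3("payment-api", "ERROR"))
```

Only the first call satisfies both conditions. An error trace from payment-api does not match this policy, but it is still kept by the pass-through rule in Policy 1.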

Outcome

With these three policies active, you have successfully achieved your goal:

  • Traces from your other services like payment-api or user-dashboard continue to be ingested at 100% because they match the pass-through rule in Policy 1.
  • Traces from your new app-billing service are now sampled as intended: a 5% probabilistic sample of its traces is kept, and every trace with an ERROR status is kept.
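
The combined effect of the three policies can be approximated with a short Python simulation. It is a sketch under stated assumptions, not a description of how Adaptive Traces evaluates policies internally: it assumes a trace is kept when at least one policy matches, uses a hash of a synthetic trace ID for the 5% decision, and invents a 2% error rate purely for illustration. The `Trace` class and the `keep` helper are hypothetical.

```python
import hashlib
import re
from dataclasses import dataclass

APP_PATTERN = re.compile(r"app-.*")


@dataclass(frozen=True)
class Trace:
    trace_id: str
    service_name: str
    status: str  # "OK" or "ERROR"


def is_app_service(trace: Trace) -> bool:
    return APP_PATTERN.fullmatch(trace.service_name) is not None


def in_five_percent_sample(trace: Trace) -> bool:
    """Deterministic 5% decision based on a hash of the trace ID."""
    digest = hashlib.sha256(trace.trace_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100 < 5


def keep(trace: Trace) -> bool:
    """Assumption: a trace is kept if any one of the three policies matches."""
    policy_1 = not is_app_service(trace)
    policy_2 = is_app_service(trace) and in_five_percent_sample(trace)
    policy_3 = is_app_service(trace) and trace.status == "ERROR"
    return policy_1 or policy_2 or policy_3


# Synthetic workload: every 50th app-billing trace is an error (2%).
traces = [
    Trace(f"trace-{i}", "app-billing", "ERROR" if i % 50 == 0 else "OK")
    for i in range(20_000)
]
kept_fraction = sum(keep(t) for t in traces) / len(traces)
print(f"app-billing kept fraction: {kept_fraction:.3f}")
print("payment-api kept:", keep(Trace("trace-x", "payment-api", "OK")))
```

With an error rate e for the app-.* services, the expected kept fraction under these assumptions is about e + 0.05 × (1 − e). For the invented 2% error rate, that is roughly 0.02 + 0.05 × 0.98 ≈ 0.069, so the retained volume depends on how often the new services fail as well as on the 5% baseline.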

You can now analyze the performance of your new services using a representative data sample while keeping observability costs under control. This approach allows you to onboard new services to the tracing platform with confidence and without impacting the rest of your organization’s monitoring strategy.