# Configure the AI Observability SDK
All AI Observability SDKs share the same configuration model. This article covers the available options for generation export, authentication, batching, and telemetry.
## Generation export
| Parameter | Default | Description |
|---|---|---|
| `protocol` | `grpc` | Transport protocol. Options: `http`, `grpc`, `none` (instrumentation-only). |
| `endpoint` | varies by protocol | AI Observability API address. HTTP default: `http://localhost:8080/api/v1/generations:export`. gRPC default: `localhost:4317`. |
## Authentication
| Mode | Required fields | Description |
|---|---|---|
| `none` | — | No authentication. Suitable for local development. |
| `tenant` | `tenantId` | Injects the `X-Scope-OrgID` header. Use for self-hosted multi-tenant deployments. |
| `bearer` | `bearerToken` | Injects an `Authorization: Bearer <token>` header. Use with proxy patterns. |
| `basic` | `tenantId`, `basicPassword` | Injects an `Authorization: Basic` header. Recommended for Grafana Cloud. |
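In `basic` mode the header value follows standard HTTP Basic authentication: the Base64 encoding of `tenantId:basicPassword`. A minimal sketch of what the SDK injects:

```python
import base64

def basic_auth_header(tenant_id: str, basic_password: str) -> dict[str, str]:
    """Build the Authorization header used by the `basic` auth mode:
    Base64 of "tenantId:basicPassword" per RFC 7617."""
    token = base64.b64encode(f"{tenant_id}:{basic_password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}
```

For Grafana Cloud, `tenantId` is your stack's numeric instance ID and `basicPassword` is an access token with the appropriate write scope.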
## Batching and retry
| Parameter | Default | Description |
|---|---|---|
| `batchSize` | 100 | Maximum generations per export batch. |
| `flushInterval` | 1s | How often the SDK flushes queued generations. |
| `queueSize` | 2000 | Maximum number of queued generations before the SDK drops new ones. |
| `maxRetries` | 5 | Number of retry attempts for transient failures. |
| `initialBackoff` | 100ms | Initial retry delay. |
| `maxBackoff` | 5s | Maximum retry delay. |
| `payloadMaxBytes` | 16 MB | Maximum payload size per export request. |
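Assuming a conventional exponential backoff (an assumption; the exact multiplier and any jitter are not specified here), the retry schedule implied by the defaults above looks like this:

```python
def backoff_delays(max_retries: int = 5,
                   initial_backoff: float = 0.1,
                   max_backoff: float = 5.0) -> list[float]:
    """Delay in seconds before each retry attempt: doubling from
    initial_backoff, capped at max_backoff. Illustrative only; the SDK's
    actual policy may add jitter or use a different multiplier."""
    return [min(initial_backoff * 2 ** i, max_backoff)
            for i in range(max_retries)]
```

With the defaults (`maxRetries=5`, `initialBackoff=100ms`, `maxBackoff=5s`), the delays are 0.1 s, 0.2 s, 0.4 s, 0.8 s, and 1.6 s; longer schedules are capped at 5 s.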
## OpenTelemetry metrics
The SDK emits these OpenTelemetry metrics:
| Metric | Type | Description |
|---|---|---|
| `gen_ai.client.operation.duration` | Histogram | LLM call duration. |
| `gen_ai.client.token.usage` | Histogram | Token consumption per call. |
| `gen_ai.client.time_to_first_token` | Histogram | Streaming time to first token. |
| `gen_ai.client.tool_calls_per_operation` | Histogram | Tool calls per generation. |
## Embedding capture
Embedding capture is disabled by default. Enable it only for debugging, because captured input content may include sensitive data.
| Parameter | Default | Description |
|---|---|---|
| `captureInput` | false | Capture embedding input content. |
| `maxInputItems` | 20 | Maximum embedding inputs to capture. |
| `maxTextLength` | 1024 | Maximum text length per input. |
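The two limits interact as a simple truncation: keep at most `maxInputItems` inputs, and clip each to `maxTextLength` characters. A hypothetical sketch, not SDK code:

```python
def capture_embedding_inputs(inputs: list[str],
                             max_input_items: int = 20,
                             max_text_length: int = 1024) -> list[str]:
    """Mirror the capture limits above: at most max_input_items inputs,
    each truncated to max_text_length characters."""
    return [text[:max_text_length] for text in inputs[:max_input_items]]
```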
## Raw artifacts
Raw artifacts capture the unprocessed provider request and response. Off by default.
Enable it per language:

- Go: the `WithRawArtifacts()` option
- Python: `raw_artifacts=True`
- TypeScript: `rawArtifacts: true`
- Java: `.setRawArtifacts(true)`
- .NET: `.WithRawArtifacts()`