Add logs, metrics, and traces to plugin backends

Adding logs, metrics and traces to plugin backend components makes it easier to diagnose and resolve issues for both plugin developers and Grafana operators. This document provides guidance, conventions and best practices to help you effectively instrument your plugins, as well as how to access this data when the plugin is installed.

note

This document expects you to use at least grafana-plugin-sdk-go v0.246.0. However, the recommendation is to keep Grafana plugin SDK for Go up to date to get the latest improvements, security and bug fixes. Refer to Update the Go SDK for update instructions.

Logs

Logs are files that record events, warnings and errors as they occur within a software environment. Most logs include contextual information, such as the time an event occurred and which user or endpoint was associated with it.

Automatic instrumentation by the SDK

The SDK automates some instrumentation to ease developer and operator experience. A message Plugin Request Completed is logged after each method call (QueryData, CallResource, CheckHealth, and so on) is completed. Further, if QueryData response includes any error a message Partial data response error is logged for each data response error. Below, some examples of messages logged:

DEBUG[09-05|17:24:16] Plugin Request Completed  logger=plugin.grafana-test-datasource dsUID=edeuvt04gim0we endpoint=queryData pluginID=grafana-test-datasource statusSource=plugin uname=admin dsName=grafana-test-datasource traceID=604e15b6345c2c0896e6902fa86b82f5 duration=1.482975875s status=ok
DEBUG[09-05|18:24:16] Plugin Request Completed  logger=plugin.grafana-test-datasource dsUID=edeuvt04gim0we endpoint=queryData pluginID=grafana-test-datasource statusSource=plugin uname=admin dsName=grafana-test-datasource traceID=604e15b6345c2c0896e6902fa86b82f5 duration=1.482975875s status=cancelled error=context.Canceled error
ERROR[09-05|19:24:16] Plugin Request Completed  logger=plugin.grafana-test-datasource dsUID=edeuvt04gim0we endpoint=queryData pluginID=grafana-test-datasource statusSource=plugin uname=admin dsName=grafana-test-datasource traceID=604e15b6345c2c0896e6902fa86b82f5 duration=1.482975875s status=error error=something is not working as expected
ERROR[09-06|15:29:47] Partial data response error   logger=plugin.grafana-test-datasource status=500.000 statusSource=plugin dsName=grafana-test-datasource dsUID=edeuvt04gim0we endpoint=queryData refID=A error="no handler found for query type 'noise'" pluginID=grafana-test-datasource traceID=981b7761aa295e371757582c7a4043d1 uname=admin

Implement logging in your plugin

Using the global logger, backend.Logger, from the backend package works everywhere and for most use cases.

Example:

The following example shows basic use of the global logger with different severity levels and key-value pairs.

package plugin

import (
    "errors"

    "github.com/grafana/grafana-plugin-sdk-go/backend"
)

func main() {
    backend.Logger.Debug("Debug msg", "someID", 1)
    backend.Logger.Info("Info msg", "queryType", "default")
    backend.Logger.Warning("Warning msg", "someKey", "someValue")
    backend.Logger.Error("Error msg", "error", errors.New("An error occurred"))
}

The above example would output something like the following.

DEBUG[11-14|15:26:26] Debug msg     logger=plugin.grafana-basic-datasource someID=1
INFO [11-14|15:26:26] Info msg      logger=plugin.grafana-basic-datasource queryType=default
WARN [11-14|15:26:26] Warning msg   logger=plugin.grafana-basic-datasource someKey=someValue
ERROR[11-14|15:26:26] Error msg     logger=plugin.grafana-basic-datasource error=An error occurred

note

The backend.Logger is a convenient wrapper over log.DefaultLogger from the log package which you also can use to access the global logger.

Reuse logger with certain key/value pairs

You can log multiple messages and include certain key-value pairs without repeating your code everywhere, for example when you want to include some specific key-value pairs based on how a datasource has been configured in each log message. To do so, create a new logger with arguments using the With method on your instantiated logger.

Example:

The following example illustrates how you can instantiate a logger per datasource instance, and use the With method to include certain key-value pairs over the life-time of this datasource instance.

package plugin

import (
    "context"
    "errors"

    "github.com/grafana/grafana-plugin-sdk-go/backend"
    "github.com/grafana/grafana-plugin-sdk-go/backend/instancemgmt"
)

func NewDatasource(ctx context.Context, settings backend.DataSourceInstanceSettings) (instancemgmt.Instance, error) {
    logger := backend.Logger.With("key", "value")

    return &Datasource{
        logger: logger,
    }, nil
}

func (ds *Datasource) QueryData(ctx context.Context, req *backend.QueryDataRequest) (*backend.QueryDataResponse, error) {
    ds.logger.Debug("QueryData", "queries", len(req.Queries))
}

The above example would output something like the following each time QueryData is called.

DEBUG[11-14|15:26:26] QueryData     logger=plugin.grafana-basic-datasource key=value queries=2

note

You can also use backend.NewLoggerWith from the backend package which is a helper method that calls log.New().With(args...) from the log package.

Use a contextual logger

Use a contextual logger to automatically include additional key-value pairs attached to context.Context. For example, you can use traceID to allow correlating logs with traces and correlate logs with a common identifier. You can create a new contextual logger by using the FromContext method on your instantiated logger; you can also combine this method when reusing logger with certain key-value pairs. We recommend using a contextual logger whenever you have access to a context.Context.

By default, the following key-value pairs are included in logs when using a contextual logger:

pluginID: The plugin identifier. For example, grafana-github-datasource.
endpoint: The request being handled; that is, callResource, checkHealth, collectMetrics, queryData, runStream, subscribeStream, or publishStream.
traceID: If available, includes the distributed trace identifier.
dsName: If available, the name of the configured datasource instance.
dsUID: If available, the unique identifier (UID) of the configured datasource instance.
uname: If available, the username of the user who made the request.

Example:

The following example extends the Reuse logger with certain key/value pairs example to include usage of a contextual logger.

package plugin

import (
    "context"
    "errors"

    "github.com/grafana/grafana-plugin-sdk-go/backend"
    "github.com/grafana/grafana-plugin-sdk-go/backend/instancemgmt"
)

func NewDatasource(ctx context.Context, settings backend.DataSourceInstanceSettings) (instancemgmt.Instance, error) {
    logger := backend.Logger.With("key", "value")

    return &Datasource{
        logger:   logger,
    }, nil
}

func (ds *Datasource) QueryData(ctx context.Context, req *backend.QueryDataRequest) (*backend.QueryDataResponse, error) {
    ctxLogger := ds.logger.FromContext(ctx)
    ctxLogger.Debug("QueryData", "queries", len(req.Queries))
}

The above example would output something like this each time QueryData is called with 2 queries.

DEBUG[11-14|15:26:26] QueryData     logger=plugin.grafana-basic-datasource pluginID=grafana-basic-datasource endpoint=queryData traceID=399c275ebb516a53ec158b4d0ddaf914 dsName=Basic datasource dsUID=kXhzRl7Mk uname=admin key=value queries=2

Include additional contextual information in logs

If you want to propagate additional contextual key-value pairs to subsequent code/logic you can use the log.WithContextualAttributes function.

Example:

The following example extends the Use a contextual logger example with usage of the log.WithContextualAttributes function by adding additional contextual key-value pairs and allow propagation of these to other methods (handleQuery).

package plugin

import (
    "context"
    "errors"

    "github.com/grafana/grafana-plugin-sdk-go/backend"
    "github.com/grafana/grafana-plugin-sdk-go/backend/instancemgmt"
    "github.com/grafana/grafana-plugin-sdk-go/backend/log"
)

func NewDatasource(ctx context.Context, settings backend.DataSourceInstanceSettings) (instancemgmt.Instance, error) {
    logger := backend.Logger.With("key", "value")

    return &Datasource{
        logger: logger,
    }, nil
}

func (ds *Datasource) QueryData(ctx context.Context, req *backend.QueryDataRequest) (*backend.QueryDataResponse, error) {
    ctxLogger := ds.logger.FromContext(ctx)
    ctxLogger.Debug("QueryData", "queries", len(req.Queries))

    for _, q := range req.Queries {
        childCtx = log.WithContextualAttributes(ctx, []any{"refID", q.RefID, "queryType", q.QueryType})
        ds.handleQuery(childCtx, q)
    }
}

func (ds *Datasource) handleQuery(ctx context.Context, q backend.DataQuery) {
    ctxLogger := ds.logger.FromContext(ctx)
    ctxLogger.Debug("handleQuery")
}

The above example would output something like this each time QueryData is called with 2 queries.

DEBUG[11-14|15:26:26] QueryData     logger=plugin.grafana-basic-datasource pluginID=grafana-basic-datasource endpoint=queryData traceID=399c275ebb516a53ec158b4d0ddaf914 dsName=Basic datasource dsUID=kXhzRl7Mk uname=admin queries=2
DEBUG[11-14|15:26:26] handleQuery   logger=plugin.grafana-basic-datasource pluginID=grafana-basic-datasource endpoint=queryData traceID=399c275ebb516a53ec158b4d0ddaf914 dsName=Basic datasource dsUID=kXhzRl7Mk uname=admin refID=A queryType=simpleQuery
DEBUG[11-14|15:26:26] handleQuery   logger=plugin.grafana-basic-datasource pluginID=grafana-basic-datasource endpoint=queryData traceID=399c275ebb516a53ec158b4d0ddaf914 dsName=Basic datasource dsUID=kXhzRl7Mk uname=admin refID=B queryType=advancedQuery

Best practices

Start the log message with a capital letter; for example, logger.Info("Hello world") instead of logger.Info("hello world").
The log message should be an identifier for the log entry, try to avoid parameterization; for example, logger.Debug(fmt.Sprintf(“Something happened, got argument %d”, “arg”)), in favor of key-value pairs for additional data; for example, logger.Info(“Something happened”, “argument”, “arg”).
Prefer using camelCase style when naming log keys; for example, remoteAddr or userID, to be consistent with Go identifiers.
Use the key error when logging Go errors; for example, logger.Error("Something failed", "error", errors.New("An error occurred").
Use a contextual logger whenever you have access to a context.Context.
Do not log sensitive information, such as data source credentials or IP addresses, or other personally identifiable information.

Validate and sanitize input coming from user input

If log messages or key-value pairs originate from user input they should be validated and sanitized. Be careful to not expose any sensitive information in log messages (secrets, credentials, and so on). It's especially easy to do by mistake when including a Go struct as a value.

If values originating from user input are bounded, that is when there are a fixed set of expected values, it's recommended to validate it's one of these values or else return an error.

If values originating from user input are unbounded, that is when the value could be anything, it's recommended to validate the max length/size of value and return an error or sanitize by just allowing a certain amount/fixed set of characters.

When to use which log level?

Debug: Informational messages of high frequency and less-important messages during normal operations.
Info: Informational messages of low frequency and important messages.
Warning: An error/state that can be be recovered from without interrupting the operation. If used, it should be actionable so that the operator can do something to resolve it.
Error: Error messages indicating some operation failed (with an error) and the program didn't have a way to handle the error.

note

Incoming requests of high frequency are normally more common for the QueryData endpoint, since - for example - the nature of a dashboard generates a request per panel or query.

Inspect logs locally

Logs from a plugin's backend are consumed by the connected Grafana instance and included in the Grafana server log.

Each log message from a plugin backend component includes a logger name logger=plugin.<plugin id>. For example:

DEBUG[11-14|15:26:26] Debug msg     logger=plugin.grafana-basic-datasource someID=1
INFO [11-14|15:26:26] Info msg      logger=plugin.grafana-basic-datasource queryType=default
WARN [11-14|15:26:26] Warning msg   logger=plugin.grafana-basic-datasource someKey=someValue
ERROR[11-14|15:26:26] Error msg     logger=plugin.grafana-basic-datasource error=An error occurred

You can enable debug logging in your Grafana instance and that will normally output a huge amount of information and make it hard to find the logs related to a certain plugin. However, using a named logger makes it convenient to enable debug logging only for a certain named logger and plugin:

[log]
filters = plugin.<plugin id>:debug

Please refer to Configure Grafana for more details about setting up logging.

Further, see How to collect and visualize logs, metrics and traces.

Metrics

Metrics are quantifiable measurements that reflect the health and performance of applications or infrastructure.

Consider using metrics to provide real-time insight into the state of resources. If you want to know how responsive your plugin is or identify anomalies that could be early signs of a performance issue, metrics are a key source of visibility.

Metric types

There are four different metric types supported in Prometheus and that you can use:

Counter: Can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors.
Gauge: Numerical value that can arbitrarily go up and down. For example, you can use a gauge to represent the temperatures or current memory usage.
Histogram: Samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values.
Summary: Similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window.

See Prometheus metric types for a list and detailed description of the different metric types you can use and when to use them.

Automatic instrumentation by the SDK

The SDK automates some instrumentation to ease developer and operator experience. This section explores the default metrics collected and exposed.

Go runtime metrics

The SDK provides automatic collection and exposure of Go runtime, CPU, memory and process metrics to ease developer and operator experience. These metrics are exposed under the go_ and process_ namespaces and includes to name a few:

go_info: Information about the Go environment.
go_memstats_alloc_bytes: Number of bytes allocated and still in use.
go_goroutines: Number of goroutines that currently exist.
process_cpu_seconds_total: Total user and system CPU time spent in seconds.

For further details and an up-to-date list of what metrics are automatically gathered and exposed for your plugin it's suggested to call Grafana's HTTP API, /api/plugins/:pluginID/metrics. See also Collect and visualize metrics locally for further instructions how to pull metrics into Promethus.

Request metrics

The SDK provides automatic collection and exposure of a new counter metric named grafana_plugin_request_total allowing to track the success rate of plugin requests per endpoint (QueryData, CallResource, CheckHealth, and so on), status (ok, cancelled, error), status_source (plugin, downstream). Example output of metric by calling the Grafana HTTP API, /api/plugins/:pluginID/metrics:

# HELP grafana_plugin_request_total The total amount of plugin requests
# TYPE grafana_plugin_request_total counter
grafana_plugin_request_total{endpoint="queryData",status="error",status_source="plugin"} 1
grafana_plugin_request_total{endpoint="queryData",status="ok",status_source="plugin"} 4

Implement metrics in your plugin

The Grafana plugin SDK for Go uses the Prometheus instrumentation library for Go applications. Any custom metric registered with the default registry will be picked up by the SDK and exposed through the Collect metrics capability.

For convenience, it's recommended to use the promauto package when creating custom metrics since it automatically registers the metric in the default registry and exposes them to Grafana.

Example:

The following example shows how to define and use a custom counter metric named grafana_plugin_queries_total that tracks the total number of queries per query type.

package plugin

import (
    "context"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"

    "github.com/grafana/grafana-plugin-sdk-go/backend"
)

var queriesTotal = promauto.NewCounterVec(
    prometheus.CounterOpts{
        Namespace: "grafana_plugin",
        Name:      "queries_total",
        Help:      "Total number of queries.",
    },
    []string{"query_type"},
)

func (ds *Datasource) QueryData(ctx context.Context, req *backend.QueryDataRequest) (*backend.QueryDataResponse, error) {
    for _, q := range req.Queries {
        queriesTotal.WithLabelValues(q.QueryType).Inc()
    }
}

Best practices

Consider using the namespace grafana_plugin as that would prefix any defined metric names with plugin. This will make it clear for operators that any metric named grafana_plugin originates from a Grafana plugin.
Use snake case style when naming metrics, e.g. http_request_duration_seconds instead of httpRequestDurationSeconds.
Use snake case style when naming metric labels, e.g. status_code instead of statusCode.
If the metric type is a counter, name it with a _total suffix, e.g. http_requests_total.
If the metric type is a histogram and you're measuring duration, name it with a _<unit> suffix, e.g. http_request_duration_seconds.
If the metric type is a gauge, name it to denote it's a value that can increase and decrease , e.g. http_request_in_flight.

Validate and sanitize input coming from user input

If label values originate from user input they should be validated and cleaned. It is very important to only allow a predefined set of labels to minimize the risk of high cardinality problems. Using or allowing too many label values could result in high cardinality problems. For example, using user IDs, email addresses, or other unbounded sets of values as a label could pretty easily create high cardinality problems and leading to a huge amount of time series in Prometheus. For more information about labels and high cardinality, see Prometheus label naming.

Be careful to not expose any sensitive information in label values (secrets, credentials, and so on).

If a value originating from user input are bounded, that is when there are a fixed set of expected values, it's recommended to validate it's one of these values or else return an error.

If a value originating from user input are unbounded, that is when the value could be anything, it's in general not recommended to use as a label because of high cardinality problems mentioned earlier. If still needed, the recommendation is to validate the max length/size of value and return an error or sanitize by just allowing a certain amount/fixed set of characters.

Collect and visualize metrics locally

Please refer to Pull metrics from a Grafana plugin backend into Prometheus.

Further, see How to collect and visualize logs, metrics and traces.

Traces

Distributed tracing allows plugin backend developers to create custom spans in their plugins, and then send them to the same endpoint and with the same propagation format as the main Grafana instance. The tracing context is also propagated from the Grafana instance to the plugin, so the plugin's spans will be correlated to the correct trace.

OpenTelemetry configuration in Grafana

Grafana supports OpenTelemetry for distributed tracing. If Grafana is configured to use a deprecated tracing system (Jaeger or OpenTracing), then tracing is disabled in the plugin provided by the SDK and configured when calling datasource.Manage | app.Manage.

OpenTelemetry must be enabled and configured for the Grafana instance. Refer to Configure Grafana for more information.

Refer to the OpenTelemetry Go SDK for in-depth documentation about all the features provided by OpenTelemetry.

note

If tracing is disabled in Grafana, backend.DefaultTracer() returns a no-op tracer.

Implement tracing in your plugin

When OpenTelemetry tracing is enabled on the main Grafana instance and tracing is enabled for a plugin, the OpenTelemetry endpoint address and propagation format is passed to the plugin during startup. These parameters are used to configure a global tracer.

Use datasource.Manage or app.Manage to run your plugin to automatically configure the global tracer. Specify any custom attributes for the default tracer using CustomAttributes:

func main() {
    if err := datasource.Manage("MY_PLUGIN_ID", plugin.NewDatasource, datasource.ManageOpts{
        TracingOpts: tracing.Opts{
            // Optional custom attributes attached to the tracer's resource.
            // The tracer will already have some SDK and runtime ones pre-populated.
            CustomAttributes: []attribute.KeyValue{
                attribute.String("my_plugin.my_attribute", "custom value"),
            },
        },
    }); err != nil {
        log.DefaultLogger.Error(err.Error())
        os.Exit(1)
    }
}

Once you have configured tracing, use the global tracer like this:

tracing.DefaultTracer()

This returns an OpenTelemetry trace.Tracer for creating spans.

Example:

func (d *Datasource) query(ctx context.Context, pCtx backend.PluginContext, query backend.DataQuery) (backend.DataResponse, error) {
    ctx, span := tracing.DefaultTracer().Start(
        ctx,
        "query processing",
        trace.WithAttributes(
            attribute.String("query.ref_id", query.RefID),
            attribute.String("query.type", query.QueryType),
            attribute.Int64("query.max_data_points", query.MaxDataPoints),
            attribute.Int64("query.interval_ms", query.Interval.Milliseconds()),
            attribute.Int64("query.time_range.from", query.TimeRange.From.Unix()),
            attribute.Int64("query.time_range.to", query.TimeRange.To.Unix()),
        ),
    )
    defer span.End()

    // ...
}

Automatic instrumentation by the SDK

The SDK automates some instrumentation to ease developer experience. This section explores the default tracing added to gRPC calls and outgoing HTTP requests.

Tracing gRPC calls

When tracing is enabled, a new span is created automatically for each gRPC call (QueryData, CallResource, CheckHealth, and so on), both on Grafana's side and on the plugin's side. The plugin SDK also injects the trace context into the context.Context that is passed to those methods.

You can retrieve the trace.SpanContext with tracing.SpanContextFromContext by passing the original context.Context to it:

func (d *Datasource) query(ctx context.Context, pCtx backend.PluginContext, query backend.DataQuery) (backend.DataResponse, error) {
    spanCtx := trace.SpanContextFromContext(ctx)
    traceID := spanCtx.TraceID()

    // ...
}

Tracing method calls

When tracing is enabled, a new span is created automatically for each method call named sdk.<endpoint> where endpoint is QueryData, CallResource, CheckHealth, and so on. Span attributes may include plugin_id, org_id, datasource_name, datasource_uid, user, request_status (ok, cancelled, error), status_source (plugin, downstream).

Tracing outgoing HTTP requests

When tracing is enabled, a TracingMiddleware is also added to the default middleware stack to all HTTP clients created using the httpclient.New or httpclient.NewProvider, unless you specify custom middleware. This middleware creates spans for each outgoing HTTP request and provides some useful attributes and events related to the request's lifecycle.

Collect and visualize traces locally

Refer to How to collect and visualize logs, metrics and traces.

Collect and visualize logs, metrics and traces

If you want to collect and visualize logs, metrics and traces using Loki, Prometheus, and Tempo when developing your plugin, refer to https://github.com/grafana/grafana/tree/main/devenv/docker/blocks/self-instrumentation which are being used by the Grafana maintainers.

Logs​

Automatic instrumentation by the SDK​

Implement logging in your plugin​

Reuse logger with certain key/value pairs​

Use a contextual logger​

Include additional contextual information in logs​

Best practices​

Validate and sanitize input coming from user input​

When to use which log level?​

Inspect logs locally​

Metrics​

Metric types​

Automatic instrumentation by the SDK​

Go runtime metrics​

Request metrics​

Implement metrics in your plugin​

Best practices​

Validate and sanitize input coming from user input​

Collect and visualize metrics locally​

Traces​

OpenTelemetry configuration in Grafana​

Implement tracing in your plugin​

Automatic instrumentation by the SDK​

Tracing gRPC calls​

Tracing method calls​

Tracing outgoing HTTP requests​

Collect and visualize traces locally​

Collect and visualize logs, metrics and traces​

Logs

Automatic instrumentation by the SDK

Implement logging in your plugin

Reuse logger with certain key/value pairs

Use a contextual logger

Include additional contextual information in logs

Best practices

Validate and sanitize input coming from user input

When to use which log level?

Inspect logs locally

Metrics

Metric types

Automatic instrumentation by the SDK

Go runtime metrics

Request metrics

Implement metrics in your plugin

Best practices

Validate and sanitize input coming from user input

Collect and visualize metrics locally

Traces

OpenTelemetry configuration in Grafana

Implement tracing in your plugin

Automatic instrumentation by the SDK

Tracing gRPC calls

Tracing method calls

Tracing outgoing HTTP requests

Collect and visualize traces locally

Collect and visualize logs, metrics and traces