Monitoring Tempo
Tempo is instrumented to expose metrics, logs and traces. Additionally, the Tempo repository has a mixin that includes a set of dashboards, rules and alerts. Together, these can be used to monitor Tempo in production.
Instrumentation
Tempo is already instrumented with metrics, logs and traces. These can be collected to observe Tempo.
Metrics
Tempo is instrumented with Prometheus metrics. It emits RED metrics for most services and backends. The Tempo mixin provides several dashboards using these metrics.
Logs
Tempo emits logs in the key=value
(logfmt) format.
Traces
Tempo uses the Jaeger Golang SDK for tracing instrumentation. As of this writing, the complete read path and some parts of the write of Tempo are instrumented for tracing.
The tracer can be configured using environment variables.
To enable tracing, set one of the following: JAEGER_AGENT_HOST
and JAEGER_AGENT_PORT
, or JAEGER_ENDPOINT
.
The Jaeger client uses remote sampling by default, if the management server is not available no traces will be sent. To always send traces (no sampling), set the following environment variables:
JAEGER_SAMPLER_TYPE=const
JAEGER_SAMPLER_PARAM=1
Dashboards
The Tempo mixin has four Grafana dashboards in the yamls
folder that you can download and import into your Grafana UI.
Note that at the moment, these work well when Tempo is run in a k8s environment and metrics scraped have the
cluster
and namespace
labels!
Tempo / Reads
This is available as
tempo-reads.json
.
The Reads dashboard gives information information on Requests, Errors and Duration (R.E.D) on the Query Path of Tempo. Each query touches the Gateway, Tempo-Query, Query-Frontend, Queriers, Ingesters, Cache (if present) and the backend.
Use this dashboard to monitor the performance of each of the above mentioned components and to decide the number of replicas in each deployment.
Tempo / Writes
This is available as
tempo-writes.json
.
The Reads dashboard gives information information on Requests, Errors and Duration (R.E.D) on the write/ingest Path of Tempo. A write query touches the Gateway, Distributors, Ingesters and eventually the backend. This dashboard also gives information on the number of operations performed by the Compactor to the backend.
Use this dashboard to monitor the performance of each of the above mentioned components and to decide the number of replicas in each deployment.
Tempo / Resources
This is available as
tempo-resources.json
.
The Resources dashboard provides information on CPU
, Container Memory
and Go Heap Inuse
, and is useful for resource
provisioning for the different Tempo components.
Use this dashboard to see if any components are running close to their assigned limits!
Tempo / Operational
This is available as
tempo-operational.json
.
The Tempo Operational dashboard deserves special mention b/c it probably a stack of dashboard anti-patterns. It’s big and complex, doesn’t use jsonnet and displays far too many metrics in one place. And I love it. For just getting started the Reads, Write and Resources dashboards are great places to learn how to monitor Tempo in an opaque way.
This dashboard is included in this repo for two reasons:
- It provides a stack of metrics for other operators to consider monitoring while running Tempo.
- We want it in our internal infrastructure and we vendor the tempo-mixin to do this.
Rules and Alerts
The Rules and Alerts are available as yaml files in the mixin on the repository.
To set up alerting, download the provided json files and configure them for use on your Prometheus monitoring server.
Check the runbook to understand the various steps that can be taken to fix firing alerts!
Related Tempo resources
GrafanaCONline 2021
Be the first to learn about exciting next-generation features in Grafana 8.0, be inspired by what community members are building, and attend expert-led sessions and workshops on Grafana, Prometheus, Loki logs, and more.
Getting started with tracing and Grafana Tempo
In this session, we will use an instrumented application to demonstrate how to use logs and Prometheus exemplars to find traces effectively in Tempo. The demo will hit on the basics of operating Tempo and reveal how Tempo allows us to scale tracing as far as possible with less operational cost and complexity than ever before.
Tracing made simple with Grafana
Dive into the new options for viewing tracing data inside Grafana and learn how to make tracing an integral part of your observability strategy.