Andreas Gerstmayr is a Software Engineer at Red Hat. He’s working on simplifying the deployment and operations of a modern distributed tracing stack using Tempo and OpenTelemetry on OpenShift.
I’ve been working with Grafana Tempo for about half a year now, and one thing I like about it is that Tempo requires only object storage for storing traces, which is easy to set up in both cloud environments and on-premises. Another outstanding feature is TraceQL, which allows searching for relevant traces with a powerful query language.
Now, let’s imagine you’re a busy system administrator who wants to set up Tempo in your Kubernetes cluster. Even though you’ve read through the Tempo documentation and know Tempo is extremely flexible, you’re overwhelmed by the number of configuration settings and deployment options.
I’m here with good news: There’s a solution for that!
One thing my team has been working on lately is the new Tempo operator, which simplifies deploying a Tempo stack on Kubernetes. It creates and manages all required objects, exposes metrics, and supports upgrading the Tempo instance in the cluster. In this post, I’ll walk you through how to install it.
What is the Tempo operator?
If you’re familiar with Kubernetes, you probably know that a Kubernetes operator extends the Kubernetes API by creating and managing a new Custom Resource. Similarly, the Tempo operator creates a new TempoStack custom resource. In the same way a Kubernetes Deployment creates one or more Pods, a TempoStack instance creates all objects (Deployments, StatefulSets, Services, ConfigMaps, etc.) required to manage a Tempo cluster in microservices mode.
The operator continuously watches the cluster and converges the current state to match the expected state as defined in the TempoStack object. What makes an operator stand out from other deployment methods (manifest files, Helm charts) is that it can dynamically react to changes, such as high load, and perform actions (like increasing the number of replicas of a component).
Installing the operator
Note: The following instructions were tested on Kubernetes v1.26.3 and OpenShift v4.12.
An alternative is to install it by applying Kubernetes manifests directly to the cluster. This requires having cert-manager installed in the cluster. If it’s not there already, please follow the cert-manager installation instructions. Once this step is completed, run the following to install the Tempo operator:
kubectl apply -f https://github.com/grafana/tempo-operator/releases/latest/download/tempo-operator.yaml
You can verify the installation by listing the pods in the operator namespace (tempo-operator-system when installed via manifests):
$ kubectl -n tempo-operator-system get pod
NAME                                        READY   STATUS    RESTARTS   AGE
tempo-operator-controller-7cd46dcd4-gs47m   2/2     Running   0          14s
Setting up object storage
In this example, we’ll use MinIO for storage. Run the following to set up a basic MinIO instance in the minio namespace. (It is intended for testing purposes only.)
kubectl apply -f https://raw.githubusercontent.com/grafana/tempo-operator/41d57e9ec1f78bc9789d3cf55241b2fed2faa269/minio.yaml
In the next step, we configure access to the object storage:
apiVersion: v1
kind: Secret
metadata:
  name: tempo-storage
type: Opaque
stringData:
  endpoint: http://minio.minio:9000
  bucket: tempo
  access_key_id: tempo
  access_key_secret: supersecret
Deploying a Tempo cluster
The final step is to configure a Tempo cluster. This manifest creates a basic, ready-to-use one:
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: tempostack1
spec:
  storage:
    secret:
      name: tempo-storage
      type: s3
  storageSize: 2Gi
You can watch your brand new Tempo cluster being created:
kubectl get pod -l app.kubernetes.io/instance=tempostack1 --watch
Run the following command to confirm that all pods and services are created and ready:
$ kubectl get pod,svc -l app.kubernetes.io/instance=tempostack1
NAME                                                    READY   STATUS    RESTARTS   AGE
pod/tempo-tempostack1-compactor-75dc75d565-jxzrh        1/1     Running   0          84s
pod/tempo-tempostack1-distributor-64d486d5b6-smwhb      1/1     Running   0          84s
pod/tempo-tempostack1-ingester-0                        1/1     Running   0          84s
pod/tempo-tempostack1-querier-7f95f8dbf5-hhvmh          1/1     Running   0          84s
pod/tempo-tempostack1-query-frontend-5c49496898-fbldg   1/1     Running   0          84s

NAME                                                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/tempo-tempostack1-compactor                  ClusterIP   10.109.222.156   <none>        7946/TCP,3200/TCP            84s
service/tempo-tempostack1-distributor                ClusterIP   10.109.177.3     <none>        4317/TCP,3200/TCP            84s
service/tempo-tempostack1-gossip-ring                ClusterIP   None             <none>        7946/TCP                     84s
service/tempo-tempostack1-ingester                   ClusterIP   10.98.10.112     <none>        3200/TCP,9095/TCP            84s
service/tempo-tempostack1-querier                    ClusterIP   10.102.129.172   <none>        7946/TCP,3200/TCP,9095/TCP   84s
service/tempo-tempostack1-query-frontend             ClusterIP   10.110.69.170    <none>        3200/TCP,9095/TCP            84s
service/tempo-tempostack1-query-frontend-discovery   ClusterIP   None             <none>        3200/TCP,9095/TCP,9096/TCP   84s
Sending traces and configuring Grafana
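Now it’s time to get traces into Tempo. All you have to do is point your application, Grafana Agent, or OpenTelemetry Collector to send OTLP traces via gRPC to the distributor service on port 4317. With the TempoStack deployed in the default namespace, that is:
tempo-tempostack1-distributor.default.svc.cluster.local:4317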
To generate example traces, you can create the following job:
apiVersion: batch/v1
kind: Job
metadata:
  name: generate-traces
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: tracegen
          image: ghcr.io/grafana/xk6-client-tracing:v0.0.2
          env:
            - name: ENDPOINT
              value: tempo-tempostack1-distributor.default.svc.cluster.local:4317
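If you already run an OpenTelemetry Collector, a configuration along the following lines should forward its traces to the distributor. This is only a minimal sketch under a few assumptions: the TempoStack runs in the default namespace, the Collector runs inside the same cluster, and TLS is switched off on the exporter because the basic setup in this post does not enable it.

```yaml
# Sketch of an OpenTelemetry Collector configuration (not operator-specific).
receivers:
  otlp:
    protocols:
      grpc:
        # Applications send OTLP/gRPC to the Collector on this address.
        endpoint: 0.0.0.0:4317

exporters:
  otlp:
    # Distributor service of the TempoStack; assumes the "default" namespace.
    endpoint: tempo-tempostack1-distributor.default.svc.cluster.local:4317
    tls:
      # Assumption: plain-text gRPC, as in the basic setup of this post.
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
```

If your TempoStack lives in another namespace, replace default in the exporter endpoint accordingly.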
Go to your Data source settings page in Grafana, click Add new data source, select Tempo, and enter http://tempo-tempostack1-query-frontend.default.svc.cluster.local:3200 in the URL field. Once you click the Save & test button, you should see a “Data source is working” info box. Now head over to the Explore page, select your newly created Tempo data source and start querying Tempo with TraceQL!
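If you prefer to manage Grafana as code, the same data source can also be declared in a provisioning file instead of through the UI. The following is a minimal sketch, assuming Grafana runs in the same cluster and loads files from its provisioning/datasources directory; the data source name is an arbitrary choice.

```yaml
# Sketch of a Grafana data source provisioning file.
apiVersion: 1

datasources:
  - name: Tempo            # display name in Grafana (arbitrary)
    type: tempo
    access: proxy          # Grafana's backend proxies the requests
    # Query frontend service of the TempoStack; assumes the "default" namespace.
    url: http://tempo-tempostack1-query-frontend.default.svc.cluster.local:3200
```

Grafana reads provisioning files at startup, so a restart is typically needed before the data source shows up.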
But wait, there’s more!
The Tempo operator also supports the following features:
- Overall resource requests and limits can be specified in the TempoStack CR, and the operator will assign a fraction of them to each component (for example, the ingester typically requires more CPU than the query-frontend component).
- Traces of multiple tenants can be stored in the same Tempo cluster.
- The operator can deploy a Jaeger UI container and expose it via Ingress.
- The operator exposes metrics about itself and can create ServiceMonitors for the Prometheus operator, which will scrape metrics of each Tempo component.
- Communication between the Tempo components can be secured via mTLS.
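To give an idea of how some of these features are switched on, here is a sketch of an extended TempoStack that sets overall resource limits, enables the Jaeger UI, and asks for ServiceMonitors. The field names are assumed from the v1alpha1 API and may differ between operator versions, and the values are purely illustrative, so please check them against the operator documentation before applying anything like this.

```yaml
# Sketch only: field names assumed from the v1alpha1 TempoStack API.
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: tempostack1
spec:
  storage:
    secret:
      name: tempo-storage
      type: s3
  storageSize: 2Gi
  resources:
    total:
      # Overall budget; the operator splits it across the components.
      limits:
        cpu: 2000m      # illustrative value
        memory: 2Gi     # illustrative value
  template:
    queryFrontend:
      jaegerQuery:
        enabled: true   # deploy the Jaeger UI next to the query frontend
        ingress:
          type: ingress # expose the UI via a Kubernetes Ingress
  observability:
    metrics:
      # Requires the Prometheus operator CRDs to be present in the cluster.
      createServiceMonitors: true
```

Multi-tenancy and mTLS are left out here on purpose, since they need additional settings that depend on your environment.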
I hope the Tempo operator makes it easier for system administrators and SREs to run Tempo clusters in production by delegating operational tasks such as upgrades, setting resource limits, configuring metrics and alerting, and configuring mTLS to the operator. We’re continuously working on improving the operator, and plan to add additional self-healing functionality in the future.
Want more information on the Tempo operator? Check out these resources:
- Source: https://github.com/grafana/tempo-operator
- Documentation: https://grafana.com/docs/tempo/latest/setup/operator/
- OperatorHub: https://operatorhub.io/operator/tempo-operator
- Grafana Community Slack: #tempo-operator