Grafana Cloud

What are traces?

A user on your website enters their email address into a form to sign up for your mailing list. They click Enter.

The user’s email address is data that flows through your system.

In a cloud environment, clicking that one button can send data through multiple nodes across your cluster of microservices.

The email address may be sent to a verification algorithm sitting in a microservice that exists solely for that purpose. If it passes the check, the information is stored in a database.

Along the way, a node strips personally identifying data from the address and sends the collected metadata to a marketing qualification algorithm, which determines whether the request came from a targeted part of the internet.

Services respond and data flows back from each, sometimes triggering new events across the system. Meanwhile, various nodes write logs with a timestamp showing when the data passed through.

Finally, the request and response activity ends and a record of that request is sent to Grafana Cloud. That record, which captures the request’s path through every service it touched, is a trace.
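To make the idea concrete, the journey above can be modeled as a set of timed spans, one per unit of work, linked by parent IDs. The following is a minimal sketch in plain Python. It is not the data model of Grafana Cloud or of any tracing library, and the `Span` class, service names, and timings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One timed unit of work inside a trace (hypothetical model)."""
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    service: str
    operation: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


# Hypothetical spans for the mailing-list sign-up request.
trace = [
    Span("a", None, "web-frontend", "POST /signup", 0, 120),
    Span("b", "a", "email-verifier", "verify address", 5, 40),
    Span("c", "a", "anonymizer", "strip personal data", 42, 82),
    Span("d", "c", "marketing-qualifier", "qualify request", 45, 80),
    Span("e", "a", "database", "store subscriber", 85, 115),
]


def depth(span: Span, by_id: dict) -> int:
    """Count how many ancestors a span has."""
    levels = 0
    while span.parent_id is not None:
        span = by_id[span.parent_id]
        levels += 1
    return levels


def render(spans: list) -> list:
    """Return one indented line per span, ordered by start time."""
    by_id = {s.span_id: s for s in spans}
    return [
        f"{'  ' * depth(s, by_id)}{s.service}: {s.operation} ({s.duration_ms} ms)"
        for s in sorted(spans, key=lambda s: s.start_ms)
    ]


if __name__ == "__main__":
    print("\n".join(render(trace)))
```

Each span records which service did the work and when, and the parent links preserve the order in which services called one another. That combination is what lets a trace show the whole path of a single request.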

Traces versus metrics and logs

Each observability signal plays a unique role in providing insights into your systems. Metrics act as the high-level indicators of system health. They alert you that something is wrong or deviating from the norm. Logs then help you understand what exactly is going wrong, such as the nature or cause of the elevated error rates you’re seeing in your metrics. Traces illustrate where in the sequence of events something is going wrong. They let you pinpoint which of the many services a request traverses is the source of the delay or the error.

If a server takes too long to send data, the metrics that track your system’s latency also increase. They may then trigger an alert once that latency rises above an acceptable threshold.
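The threshold check described here can be sketched in a few lines. This is only an illustration of the idea, not how Grafana alert rules are defined or evaluated. The function names, the 95th-percentile choice, the 500 ms threshold, and the sample latencies are all assumptions.

```python
import math


def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def should_alert(latencies_ms: list, threshold_ms: float, pct: float = 95) -> bool:
    """Return True when the chosen latency percentile exceeds the threshold."""
    return percentile(latencies_ms, pct) > threshold_ms


# Hypothetical request latencies, in milliseconds, for two time windows.
healthy_window = [110, 95, 120, 130, 105, 98, 115, 125, 102, 118]
slow_window = [110, 95, 900, 1300, 105, 980, 115, 1250, 102, 1180]

if __name__ == "__main__":
    print(should_alert(healthy_window, threshold_ms=500))
    print(should_alert(slow_window, threshold_ms=500))
```

A check like this tells you that latency is too high, but not which service is responsible.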

Sending that data likely requires a request to interact with many different services in your system. Traces help you pinpoint the specific service that’s introducing the added latency you’re seeing in your metrics. Alternatively, if you’re seeing an elevated rate of errors when sending data, traces help you figure out which service the errors originate from.
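One way to picture this pinpointing step is to compute each service's self time, meaning the time a span spent on its own work rather than waiting on child spans. The sketch below is plain Python under stated assumptions: the tuple layout, service names, and timings are hypothetical, and it assumes the children of a span do not overlap one another.

```python
from collections import defaultdict

# Hypothetical spans: (span_id, parent_id, service, start_ms, end_ms).
spans = [
    ("a", None, "web-frontend", 0, 1300),
    ("b", "a", "email-verifier", 10, 60),
    ("c", "a", "mail-sender", 70, 1250),
    ("d", "c", "database", 80, 130),
]


def self_time_by_service(spans: list) -> dict:
    """Sum, per service, the time each span spent outside its child spans.

    Assumes the children of a span do not overlap one another.
    """
    # Total duration of the direct children of each span.
    child_time = defaultdict(int)
    for _span_id, parent_id, _service, start, end in spans:
        if parent_id is not None:
            child_time[parent_id] += end - start

    # Self time is a span's own duration minus its children's durations.
    totals = defaultdict(int)
    for span_id, _parent_id, service, start, end in spans:
        totals[service] += (end - start) - child_time[span_id]
    return dict(totals)


def slowest_service(spans: list) -> str:
    """Return the service with the largest total self time."""
    totals = self_time_by_service(spans)
    return max(totals, key=totals.get)


if __name__ == "__main__":
    print(self_time_by_service(spans))
    print(slowest_service(spans))
```

In this made-up trace the root span is the longest overall, but almost all of its time is spent waiting on `mail-sender`, which is why self time rather than total duration is used to rank the services.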

Logs provide a granular view of what exactly is going wrong. For example, your log lines might contain multiple connection refused errors. That would explain why the server took too long to send data.
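As a small illustration, the following sketch counts connection refused errors per service in a handful of log lines. The log lines, their `<timestamp> <service> <level> <message>` layout, and the service names are invented for this example and do not reflect any particular logging format.

```python
from collections import Counter

# Hypothetical log lines; the layout and service names are made up.
log_lines = [
    "2024-05-01T10:00:01Z mail-sender ERROR dial failed: connection refused",
    "2024-05-01T10:00:02Z email-verifier INFO address accepted",
    "2024-05-01T10:00:03Z mail-sender ERROR dial failed: connection refused",
    "2024-05-01T10:00:04Z database INFO subscriber stored",
    "2024-05-01T10:00:05Z mail-sender ERROR dial failed: Connection refused",
]


def count_errors_by_service(lines: list, needle: str = "connection refused") -> dict:
    """Count, per service, the lines whose text contains `needle`."""
    counts = Counter()
    for line in lines:
        if needle in line.lower():
            # Assumed layout: "<timestamp> <service> <level> <message>".
            parts = line.split(maxsplit=3)
            if len(parts) >= 2:
                counts[parts[1]] += 1
    return dict(counts)


if __name__ == "__main__":
    print(count_errors_by_service(log_lines))
```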