
Metrics, logs, traces, and mayhem: introducing an observability adventure game powered by Grafana Alloy and OTel

2024-11-20 9 min

Ah, adventurer! Are you ready to embark on a perilous quest through the treacherous lands of observability?

As developer advocates here at Grafana Labs, our day job is to learn and teach the pillars of observability to our end users. We spend a lot of time thinking about how software engineers can best learn the basics. Sure, there is a huge amount of great content out there — blogs, videos, and documentation — but we wondered if there’s a way to make the learning experience a bit more interactive and fun.

As it happens, we’re also both avid gamers. We tend to view work-related challenges as quests to embark on and monsters to slay. If observability were a monster in a game, we figured it would be a hydra with four heads: metrics, logs, traces, and profiles.

An image of a knight fighting a hydra.

That’s when the lightbulb went off.

Today, we’re excited to introduce Quest World, an interactive game that can help you and your team learn the basics of observability. We built it with OpenTelemetry, Grafana Alloy, and the Grafana LGTM Stack (Loki for logs, Grafana for visualization, Tempo for traces, and Mimir for metrics).

Read on to learn how you can set up Quest World today and to explore some of the big architectural decisions we made around game structure, player actions, and incorporating the different observability signals.

We’ll also be demoing the game at AWS re:Invent 2024! Come find us in the AWS Modern Applications & Open Source Zone during the following timeslots to try it out and score some extra swag:

  • Tues., Dec. 3 from 1 - 2pm
  • Thurs., Dec. 5 from 11am - 12pm

Quest World: an interactive observability adventure

Quest World is a text-based adventure game with a twist: you have to use the full Grafana LGTM Stack to complete the quest and defeat the evil wizard.

A screenshot from Quest World.

Without giving too much away, your adventure looks a little like this:

  1. Visit a town
  2. Forge a sword: a metrics challenge
  3. Empower that sword: a logs challenge
  4. Defeat the wizard

A screenshot of the Quest World text.

It may look simple at first glance, but take heed of your observability dashboard, adventurer! It is the secret to your success.

A screenshot of a dashboard from Quest World.

Goal

Our goal when creating the game was simple: provide at least two challenges, each of which teaches the basics of one observability data type as you solve it. Metrics, for example, help you understand trends and react faster, while logs let you reflect on the decisions you made and perhaps find secret clues about mechanics that aren’t revealed in the terminal.

The adventure is built on open standards and is highly extensible, so you can modify it to meet your learning (and teaching) needs. For instance, we recently added traces to the adventure to generate a post-game leaderboard (more on this later in the post).

Setup

If you would like to try Quest World yourself, you can choose from two paths:

Path 1: Online sandbox

We have created an online sandbox using Killercoda. This will spin up a self-contained environment online so you can play the game without worrying about dependencies:

https://killercoda.com/grafana-labs/course/workshops/adventure

Simply:

  1. Follow the link
  2. Create a free Killercoda account
  3. Let the setup script run
  4. Follow the instructions in the left panel
  5. Click next to see more instructions

Path 2: Run the adventure locally

The code is fully open source and available in the grafana/adventure GitHub repo (https://github.com/grafana/adventure), so you can try it yourself and even expand the adventure.

There are a couple of prerequisites to keep in mind:

  1. Docker and Docker Compose (installed and up-to-date)
  2. Python 3.12 or later

Follow the steps outlined in the README for setup.

Creating the adventure

So, how did all of this come together? Actually, rather quickly once we planned out the architecture! Let’s take a look:

A diagram of the Quest World architecture.

Adventure.py

This is where most of the mayhem happens. Note that adventure.py in the diagram actually stands for two files in the repo: main.py and otel.py. The code builds the text-based adventure by defining an AdventureGame class that encapsulates all the game logic and state management. Here’s a breakdown of how the game is made:

Game structure

  • Locations and actions are defined in a dictionary, self.locations. Each location has a description, available actions, and optional prerequisites or effects that trigger specific game logic.
  • The player navigates between locations, interacting with characters and items like swords and the blacksmith’s forge.
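
To make the location table more concrete, here is a minimal sketch of how a dictionary like self.locations could drive navigation, including an optional prerequisite. The location names, the `actions` and `prerequisite` keys, and the `move` method are hypothetical illustrations, not the actual structure in main.py.

```python
class AdventureGame:
    """Hypothetical sketch of a location table; keys and names are illustrative."""

    def __init__(self):
        self.current_location = "town"
        self.has_sword = False
        self.locations = {
            "town": {
                "description": "A quiet town square. Smoke rises from the blacksmith.",
                "actions": {"go to blacksmith": "blacksmith", "go to wizard tower": "tower"},
            },
            "blacksmith": {
                "description": "The forge crackles. The blacksmith eyes you warily.",
                "actions": {"go to town": "town"},
            },
            "tower": {
                "description": "The evil wizard awaits.",
                "actions": {"go to town": "town"},
                # Optional prerequisite: entry is refused unless this returns True.
                "prerequisite": lambda game: game.has_sword,
            },
        }

    def available_actions(self):
        """List the actions offered at the current location."""
        return list(self.locations[self.current_location]["actions"])

    def move(self, action):
        """Follow an action to its destination if the prerequisite (if any) is met."""
        destination = self.locations[self.current_location]["actions"].get(action)
        if destination is None:
            return "You can't do that here."
        check = self.locations[destination].get("prerequisite")
        if check is not None and not check(self):
            return "Something stops you. You are not ready."
        self.current_location = destination
        return self.locations[destination]["description"]
```

Keeping locations as data rather than code means adding a new area is mostly a matter of adding a dictionary entry.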

Player actions

  • Actions like “request sword” or “heat forge” are tied to specific effects implemented as methods (e.g., request_sword, heat_forge).
  • Each action checks conditions (pre-requisites) and modifies game state variables like self.heat, self.has_sword, or self.blacksmith_burned_down.

Branching gameplay

  • The storyline evolves based on player choices, tracked by state variables (e.g., self.quest_accepted, self.has_holy_sword).
  • Decisions have consequences, like forging an evil or holy sword or influencing the outcome of interactions with the priest, wizard, or quest giver.
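
The pattern of commands mapped to effect methods, each checking prerequisites and updating state, might look roughly like the sketch below. The `ForgeSketch` class, its `effects` dispatch table, the heat increment, and the `MAX_HEAT` limit are all hypothetical; only the idea of methods such as `heat_forge` and `request_sword` and state like `self.heat` and `self.blacksmith_burned_down` comes from the description above.

```python
class ForgeSketch:
    """Hypothetical sketch of action effects and state-driven branching."""

    MAX_HEAT = 10  # hypothetical safe limit for the forge

    def __init__(self):
        self.heat = 0
        self.quest_accepted = False
        self.has_sword = False
        self.blacksmith_burned_down = False
        # Map each command to the method that implements its effect.
        self.effects = {
            "accept quest": self.accept_quest,
            "heat forge": self.heat_forge,
            "request sword": self.request_sword,
        }

    def accept_quest(self):
        self.quest_accepted = True
        return "You accept the quest."

    def heat_forge(self):
        if self.blacksmith_burned_down:
            return "Only ashes remain."
        self.heat += 3
        if self.heat > self.MAX_HEAT:
            # A poor decision permanently changes what the player can do next.
            self.blacksmith_burned_down = True
            return "The forge overheats and the blacksmith burns down!"
        return "The forge heats up."

    def request_sword(self):
        # Prerequisites: the quest must be accepted and the shop must still stand.
        if not self.quest_accepted:
            return "The blacksmith ignores you."
        if self.blacksmith_burned_down:
            return "There is no blacksmith left to ask."
        self.has_sword = True
        return "The blacksmith hands you a plain sword."

    def perform(self, command):
        """Dispatch a command to its effect, or do nothing if it is unknown."""
        effect = self.effects.get(command)
        return effect() if effect else "Nothing happens."
```

Because every effect reads and writes shared state, an earlier choice (overheating the forge) closes off a later one (requesting a sword), which is the essence of branching gameplay.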

This is all tied together in a basic loop that displays the current location and available actions, and processes player commands.
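
A loop of that kind can be sketched in a few lines. The two-location `LOCATIONS` map and the `game_loop` signature are hypothetical; the input and output functions are parameters so the loop can be driven by a script as well as by a player at a terminal.

```python
from typing import Callable

# Hypothetical two-location map; each action maps to a destination.
LOCATIONS = {
    "town": {
        "description": "You stand in the town square.",
        "actions": {"go to forge": "forge"},
    },
    "forge": {
        "description": "The blacksmith's forge glows red.",
        "actions": {"go to town": "town"},
    },
}


def game_loop(read_command: Callable[[str], str] = input,
              write: Callable[[str], None] = print,
              start: str = "town") -> str:
    """Show the location and its actions, then process one command per turn."""
    location = start
    while True:
        place = LOCATIONS[location]
        write(place["description"])
        write("Actions: " + ", ".join(list(place["actions"]) + ["quit"]))
        command = read_command("> ").strip().lower()
        if command == "quit":
            write("Farewell, adventurer.")
            return location  # report where the player ended up
        if command in place["actions"]:
            location = place["actions"][command]
        else:
            write("You can't do that here.")
```

Calling `game_loop()` with no arguments plays interactively; passing a scripted `read_command` makes the loop easy to test.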

Okay, so, where does OpenTelemetry come into all this?

We decided to implement each telemetry type with the OTel SDK because it let us unify how all of our telemetry signals are produced (essentially, each hydra head grows from the same body). It also let us customize these signals, for example to produce our fun adventure-themed game log.

Here’s how it all works:

Metrics

Definition: Metrics track numerical data like the forge's heat level and the number of swords forged.
Implementation:
  • CustomMetrics is used to define a meter that creates observable gauges.
  • Metrics include:
    • forge_heat: Tracks the heat level of the blacksmith's forge.
    • swords: Tracks the total number of swords forged.
    • holy_sword and evil_sword: Monitor the status of enchanted swords.
  • Gauges use callback functions like observe_forge_heat to return dynamic values based on the game state.

Logs

Definition: Logs capture textual information about events and player actions, categorized by severity (INFO, WARNING, ERROR, CRITICAL).
Implementation:
  • CustomLogFW sets up logging for the game.
  • Logging messages are distributed throughout the code to capture:
    • Player actions (e.g., heating the forge or requesting a sword).
    • Significant events like burning down the blacksmith or crafting an enchanted sword.
    • Warnings or errors when the player makes poor decisions, such as overheating the forge or obtaining a cursed sword.
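
The severity scheme can be sketched with Python's standard `logging` module. As an assumption, a small in-memory handler replaces the OpenTelemetry log handler that the game's CustomLogFW would attach; the `heat_forge` and `take_sword` functions, the `MAX_SAFE_HEAT` threshold, and the `location` field are hypothetical.

```python
import logging


class ListHandler(logging.Handler):
    """Collects records in memory; stands in for an OTel log handler here."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("quest_world")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

MAX_SAFE_HEAT = 10  # hypothetical threshold


def heat_forge(heat, location):
    """Raise the heat and log at a severity that matches the risk."""
    heat += 5
    context = {"location": location}  # becomes an attribute on the log record
    if heat > 2 * MAX_SAFE_HEAT:
        logger.critical("The forge explodes and the blacksmith burns down!", extra=context)
    elif heat > MAX_SAFE_HEAT:
        logger.warning("The forge is dangerously hot (heat=%d).", heat, extra=context)
    else:
        logger.info("You heat the forge (heat=%d).", heat, extra=context)
    return heat


def take_sword(cursed, location):
    """Log a poor decision as an error and a normal one as info."""
    if cursed:
        logger.error("You picked up a cursed sword.", extra={"location": location})
    else:
        logger.info("You picked up a sword.", extra={"location": location})
```

Mapping game outcomes onto standard severities means a player can filter the quest log by level in Grafana, exactly as they would filter application logs.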

Traces

Definition: Traces track the flow of actions taken by the player, capturing contextual metadata for each interaction.
Implementation:
  • CustomTracer sets up tracing using OpenTelemetry.
  • A root span tracks the player's entire journey, while child spans are created for individual actions (e.g., "action: request sword").
  • Spans include attributes like the player's name and current location for additional context.
  • Events are added to spans for significant milestones (e.g., forging a sword or killing the wizard).

As you can see, three of our four observability hydra heads play a pivotal role in the game’s fiction while staying true to their real-world counterparts. The next question is: where do all of these telemetry signals go?

Grafana Alloy

The answer: anywhere you like, as long as it’s OpenTelemetry-compatible. In practice, you will usually send them to some form of OTel Collector for buffering and, potentially, further processing. We opted for Grafana Alloy, a vendor-neutral distribution of the OTel Collector, because it has a couple of features that make it a great teaching tool.

A screenshot of a character in Quest World.

Integrated UI

Like Prometheus, Grafana Alloy comes with an integrated UI, so you can visually understand the configuration you are deploying and the flow of your data. Check out this relationship graph:

A screenshot of the Grafana Alloy integrated UI.

You can also understand component configuration settings and whether each stage of your pipeline is healthy.

Live debugging

Live debugging is another, slightly more experimental feature that is great for teaching. This lets you see a live feed of your telemetry data as it is being processed through Alloy.

A screenshot of the live debugging feature.

Telemetry storage

Quest World uses three storage backends: Prometheus (metrics), Grafana Loki (logs), and Grafana Tempo (traces). All three databases share one key feature: they have a fully compliant native OTel endpoint. This allows us to write OTel format data (OTLP) directly to our storage backends of choice and let them handle the rest.

The best part is that this lets us do some pretty powerful analysis once the data reaches our observability frontend, Grafana, since all signals share the same attributes.

All three databases run in monolithic mode, and you can find the Docker Compose deployment file in the repo. Here is a reference list of the config files:

  • Prometheus: https://github.com/grafana/adventure/blob/main/prometheus.yml
  • Loki: https://github.com/grafana/adventure/blob/main/loki-config.yaml
  • Tempo: https://github.com/grafana/adventure/blob/main/tempo.yaml

Grafana

Of course, our adventure would not be complete without Grafana, which acts as our primary game interface (other than the text adventure itself).

An altered Grafana logo.

Grafana acts as our:

  • Quest Log: Notifies the user of events triggered throughout their adventure.
  • Forge Monitor: Extremely useful for forging your own sword (wink, wink, nudge, nudge).

You can either use our Explore Apps for metrics, logs, and traces, or use the included dashboard.

Grafana also serves another role after a user has completed a play-through: it acts as a scoreboard using our traces visualization panel.

A screenshot of the leader board.

If I have lost you, let me explain. The scoreboard is based on how quickly you complete the game. Each stage takes a certain amount of time to complete. We treat each play-through as a single trace, and each step the user takes as a span.

A screenshot of traces.

This approach also lets us leave some neat metadata, such as span events, throughout the trace, so you can look back on your adventure and understand the implications of the choices you made.
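
The real scoreboard is a Grafana traces panel, but the ranking idea behind it fits in a few lines of plain Python. In this sketch the `SpanRecord` fields are a hypothetical, simplified stand-in for spans fetched from Tempo: the root span of each trace covers a whole play-through, so its duration is the score, and each player's fastest run wins.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SpanRecord:
    """Simplified, hypothetical stand-in for a span fetched from Tempo."""
    trace_id: str  # one trace per play-through
    name: str
    player: str
    start_ns: int
    end_ns: int
    parent_span: Optional[str] = None  # None marks the root span


def leaderboard(spans: List[SpanRecord]) -> List[Tuple[str, float]]:
    """Rank players by their fastest play-through (root-span duration in seconds)."""
    best: Dict[str, float] = {}
    for span in spans:
        if span.parent_span is not None:
            continue  # child spans are single steps; only the root covers the whole run
        seconds = (span.end_ns - span.start_ns) / 1e9
        best[span.player] = min(seconds, best.get(span.player, seconds))
    return sorted(best.items(), key=lambda item: item[1])
```

Child spans are skipped for scoring but remain useful for the post-game review, since their durations show which stage slowed a player down.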

One last hydra head to slay!

You may have noticed that we haven’t covered continuous profiling yet as part of this observability adventure. No magic or trickery here: we are still figuring out the best way to represent profiles in the game.

Our current thought is to maybe make profiles a mechanic for the “final boss” fight against the wizard. During this stage of the game, users have three lives and a couple of spell options. Certain spells could require more system resources, especially if they are used one after the other, which would leave you vulnerable to the wizard. This is where profiles could come in.

If you have any ideas, we would love to hear from you! Please feel free to open issues or PRs to explain how you would incorporate continuous profiles.

Did someone say ‘swag Easter egg’!?

Would any retro adventure game be complete without an Easter egg hidden somewhere in the game?

A screenshot of an egg.

We are giving away swag to the first 10 players who find the secret Easter egg (you’ll know it when you see it). Just send the secret code to either of us via our community Slack.

The end of our story… for now

So, brave adventurer, we come to the end of our story for now. We hope you have as much fun playing Quest World as we did building it. All great stories start from somewhere, and we hope to see the adventure game expand with new ideas and PRs from our awesome community. We want to treat the project like the OTel Demo, so feel free to fork it, change it, and add your spin.

Lastly, just a reminder that we will be demoing the game at AWS re:Invent. Come visit us at the AWS Modern Applications and Open Source Zone during the following timeslots to try your hand at this observability adventure:

  • Tues., Dec. 3 from 1 - 2pm
  • Thurs., Dec. 5 from 11am - 12pm