Grafana Labs at KubeCon: Prometheus, OpenTelemetry, AI, and more

• 2025-03-21 • 10 min

It’s that time of year again: we’re getting ready for KubeCon, and we can’t wait to see you there!

KubeCon + CloudNativeCon Europe 2025 will run from April 1-4 in London, bringing together open source enthusiasts for the Cloud Native Computing Foundation’s flagship conference. Grafana Labs is a proud silver sponsor of this year’s KubeCon Europe event, including Observability Day 2025.

If you plan to attend, drop by booth S462 at KubeCon’s sponsor solutions showcase to meet the team and grab some swag (because who doesn’t need a new Grot sticker?!).

You can also contact us to set up a meeting with one of our experts, or attend one of the sessions we’re participating in (more on that below).

KubeCon sessions featuring Grafana Labs

The following talks (all times local) feature, or are led by, members of the Grafana Labs team:

Contribfest: Contribution Guide and Workshop: Help us improve the Prometheus ecosystem!

Wednesday, April 2 • 11:15 - 12:30 BST

Have you ever wondered how to contribute that quick fix you always wanted to make to the Prometheus or Alertmanager server? What about proposing bigger changes to Prometheus, like improving TSDB storage or the API, or to Prometheus standards like PromQL, OpenMetrics, or Remote Write?

It might be easier than you thought! In this workshop, we will propose an example code change to Prometheus. The participants will learn and exercise:

How to find various code components in the Go codebases for Prometheus and auxiliary projects like Alertmanager, avalanche, etc.
How to propose bigger changes through the Prometheus proposal process
Testing and benchmarking Prometheus
Documenting changes
What it takes to become a maintainer one day!

This session is led by:

Björn Rabenstein, Principal Software Engineer at Grafana Labs
Arthur Silva Sens, Senior Software Engineer at Grafana Labs
Bartłomiej Płotka, Senior Software Engineer at Google
Arianna Vespri, Software Engineer

Prerequisites for active participation: a Linux or Mac dev machine with git and Go 1.23 installed.

Prometheus deep dive: What’s new in v3.0 and beyond

Wednesday, April 2 • 14:30 - 15:00 BST

Born at SoundCloud and now a graduated CNCF project, Prometheus has become the de facto standard for monitoring and alerting in Kubernetes and beyond. It benefits from a rich ecosystem, including Alertmanager; efficient client libraries for many languages; the Prometheus Operator, which can be installed on Kubernetes; numerous Exporters to provide the raw data; and even projects to run it at a massive scale.

Last year Prometheus released version 3.0, which includes several new features and enhancements, and a refreshed UI/UX. Join Fiona Liao, Staff Software Engineer at Grafana Labs, and Saswata Mukherjee, Senior Software Engineer at Red Hat — both Prometheus team members — to learn what it enables for the community, the ongoing upstream progress on v3 features, and how to get the most out of them. Learn how to get involved with new v3 initiatives, share feedback, and get an opportunity to have your questions answered!

Enhancing database observability with OpenTelemetry

Wednesday, April 2 • 14:30 - 15:00 BST

With the recent stabilization of the OpenTelemetry semantic conventions for databases, it’s an excellent time for OSS libraries to provide users with the observability they’ve been seeking. This talk, led by Marylia Gutierrez, Staff Software Engineer at Grafana Labs, dives into how you can instrument your application with OpenTelemetry SDKs to improve observability and collect actionable telemetry data from your databases. Learn about the SDK implementations that are currently available by language and database, their current gaps, and how you can contribute and develop missing instrumentation.

Whether you’re an SRE, developer, or database administrator, this talk will equip you with the tools and knowledge to bring clarity and efficiency to your database systems.

Asimov’s Zeroth Law of Robotics: Observability for AI

Wednesday, April 2 • 16:15 - 16:45 BST

A robot may not harm humans. A robot must obey humans. A robot must protect its own existence.

These are Isaac Asimov’s three Laws of Robotics, created to govern the ethical programming of artificial intelligences. From the Butlerian Jihad to Skynet to cylons, we’ve been immortalizing our collective nightmares about artificial intelligence for years. But there’s an unmentioned law that comes as a prerequisite to all of that: a robot must be observable.

In this talk, presented by Nicole van der Hoeven, Senior Developer Advocate at Grafana Labs, you’ll explore the different types of AI, the factors that make observing AI different from observing applications, and the telemetry signals specific to AI that we might want to listen to. How do we deal with large data sets? How do we observe for model drift? How do we take into account the costs of LLMs? How can we use distributed tracing to follow event sequences? Part cautionary tale and part technical demo, this talk shows how to instrument and monitor AI apps using OpenTelemetry, Prometheus, OpenLit, and more.

The next generation of DaemonSet autoscaling

Wednesday, April 2 • 17:00 - 17:30 BST

Imagine you have small 4-core nodes and larger 64-core nodes in the same cluster, and a DaemonSet that does much more work on the larger nodes. How do you set resource requests and limits appropriately?

Managing resources for workloads deployed as a DaemonSet in Kubernetes can be challenging when load is not evenly distributed across nodes. Static allocation can cause over/under-utilization and scheduling issues. VPA helps, but currently assumes uniform load across all pods, which is a bad assumption for certain types of workloads.

In this talk, Bryan Boreham, Distinguished Engineer at Grafana Labs, and Adam Bernot, Software Engineer at Google Cloud, discuss their case studies, why this feature will be useful, and how their prototype implements per-pod VPA for DaemonSets to improve resource efficiency and stability and to eliminate the need for manual tuning. This is your chance to learn about this upcoming feature and connect with the people who are implementing it!
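
To make the sizing problem concrete, here is a minimal Python sketch with made-up numbers. It is not how VPA computes recommendations and it is not the speakers’ prototype; the node names, usage figures, and the 20% headroom are all assumptions chosen for illustration. It simply compares one shared CPU request, sized for the busiest pod, with a request sized for each pod individually.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodUsage:
    node: str             # hypothetical node name
    node_millicores: int  # node CPU capacity (1 core = 1000 millicores)
    used_millicores: int  # made-up observed CPU usage of the DaemonSet pod


def with_headroom(millicores: int, headroom_percent: int = 20) -> int:
    """Add a safety margin, using integer math to avoid float rounding."""
    return millicores * (100 + headroom_percent) // 100


def uniform_request(pods: list[PodUsage]) -> int:
    """One request shared by every pod, sized for the busiest pod."""
    return with_headroom(max(p.used_millicores for p in pods))


def per_pod_requests(pods: list[PodUsage]) -> dict[str, int]:
    """A separate request per pod, sized for that pod's own usage."""
    return {p.node: with_headroom(p.used_millicores) for p in pods}


# Two small 4-core nodes and one large 64-core node (illustrative values).
pods = [
    PodUsage("small-1", 4_000, 100),
    PodUsage("small-2", 4_000, 120),
    PodUsage("large-1", 64_000, 1_500),
]

uniform = uniform_request(pods)
per_pod = per_pod_requests(pods)

print("uniform request per pod:", uniform)
print("total reserved (uniform):", uniform * len(pods))
print("total reserved (per pod):", sum(per_pod.values()))
for p in pods:
    share = 100 * uniform // p.node_millicores
    print(f"{p.node}: uniform request reserves {share}% of the node")
```

With these numbers, the shared request comes to 1800 millicores per pod, which reserves 45% of each 4-core node for a pod that uses about a tenth of a core. Sizing each pod separately reserves 2064 millicores in total instead of 5400.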

Logs, metrics, traces and mayhem: An interactive observability adventure game

Wednesday, April 2 • 17:45 - 18:15 BST

Have you ever wanted to play an actual game on your observability stack? Well, you can. Not only does Doom run on Grafana, we also built an actual text-based adventure game.

Join Jay Clifford and Tom Glenn, both Senior Developer Advocates at Grafana Labs, to play a real, text-based observability adventure game! Armed with the tools of the trade — metrics, logs, and traces — you’ll learn to navigate the labyrinth of debugging and optimization, rescuing your application from the clutches of the dark wizard!

In this interactive session, you’ll dive into a game played live to showcase how each telemetry type is used to solve real-world observability challenges. As players encounter obstacles, they’ll wield the power of OpenTelemetry to gather critical data and use OSS tools like Grafana, Loki, Tempo, and Prometheus to make informed decisions.

Whether you’re an observability novice or a seasoned engineer, this talk will level up your debugging skills and showcase how to gamify observability training for your team. So, gear up, adventurer — your quest awaits!

Pushing the limits of Prometheus at Etsy

Thursday, April 3 • 11:45 - 12:15 BST

Take a deep dive into the journey of pushing Prometheus beyond its performance limits. This talk, presented by Chris Leavoy, Staff Observability Engineer at Etsy, and Bryan Boreham, Distinguished Engineer at Grafana Labs, offers an insider’s perspective on scaling a single Prometheus instance using a powerhouse 128-core machine with 4TB of RAM, and processing a staggering 500 million metrics at its peak. It’s a story packed with lessons, insights, and actionable takeaways from operating one of the industry’s largest Prometheus servers.

The talk will go through:

Breaking boundaries: Explore the challenges encountered in Prometheus’ design and how they navigated them.
Diagnosing bottlenecks: Discover how to combine observability signals — metrics, profiles, and traces — to pinpoint and overcome performance roadblocks.
Building resilience: Uncover strategies to optimize metrics volume and enhance Prometheus’ reliability under load.

This session isn’t just about pushing technology to the edge — it’s about learning to work smarter, build better systems, and create a more resilient observability stack.

The State of Prometheus and OpenTelemetry interoperability

Friday, April 4 • 11:45 - 12:15 BST

Prometheus and OpenTelemetry are two CNCF projects focusing on observability and truly excelling at their main purposes. However, they take slightly different approaches, and making both projects work well together has been challenging.

In this talk, Arthur Silva Sens, Senior Software Engineer at Grafana Labs, and Juraj Michálek, Senior Logging & Monitoring Engineer at Swiss Re (both active contributors to the Prometheus and OpenTelemetry communities), will present the usual frustrations that a user faces when integrating Prometheus and OTel, and all the work done by the OpenTelemetry-Prometheus SIG (Special Interest Group) in the past year to transform Prometheus+OTel into a love story.

You’ll leave this session understanding the core philosophical differences between the two projects that make interoperability so difficult, the progress made to improve the situation, and what to expect in the near future.

C.A.L.L.I.N.G. now I’m calling you, calling you now

Friday, April 4 • 14:30 - 15:00 BST

The Kubernetes API is awesome and so tempting to use, especially when building observability solutions. Nobody wants to just get raw IP addresses and ports in their network or request telemetry; it’s much better to see your pod and service metadata. But what’s even better is that getting information about all the nodes in your cluster can help you produce amazing service graphs.

This talk from Mario Macías and Terra Tauri, both Staff Software Engineers at Grafana Labs, is a story of how they took down the Kubernetes API in their biggest production cluster by deploying observability tools that make heavy use of the Kubernetes API. They’ll show you the techniques they used to avoid repeating mistakes, by applying configuration changes and building services that helped them shield the Kubernetes API from the information-thirsty observability tools, all while keeping the functionality intact.

Using eBPF for non-invasive, performant, instant network monitoring

Friday, April 4 • 15:15 - 15:45 BST

Traditionally, monitoring your network connections required devices capable of exporting flow data. With the rise of software-defined networks (SDNs), observability either became the responsibility of the SDN providers or relied on software-based packet analyzers that often have a noticeable impact on the cluster’s performance.

eBPF is presented as an efficient, non-invasive mechanism to observe a cluster’s network at different layers, from L3 to L7, and automatically extract relevant information without having to redeploy the network infrastructure or applications.

This talk, presented by Mario Macías, Staff Software Engineer at Grafana Labs, and Marc Tudurí, Senior Software Engineer at Grafana Labs, explains the Grafana journey to provide plug and play network and services observability: how they connect to different layers of your services infrastructure, how network packets flow through your system, how the low-level network information is matched with Kubernetes metadata for improved user data navigation, and more.

From chaos to control: Migrating access control to OpenFGA in a multi-tenant world

Friday, April 4 • 15:15 - 15:45 BST

Designing access control that works seamlessly for individuals and scales to millions of resources is a complex challenge.

From lackluster search performance to feature inconsistency and multi-tenant schema discrepancies, there’s no shortage of issues to face. In this talk, Jo Guerreiro, Engineering Manager at Grafana Labs, and Poovamraj Thanganadar Thiagarajan, Senior Software Engineer at Okta, walk through how the Grafana Access squad is tackling these issues using OpenFGA, a CNCF sandbox project, by porting existing access control schema and rethinking their resource search strategy.

If you’ve ever wondered what it takes as a platform engineer to support access control on a multi-tenant system with millions of resources, this is your opportunity to learn how to orchestrate a migration from your current access control system and hear about the peculiar challenges of developing security-critical systems.
