How MIC is evolving from reactive incident response to proactive observability across a complex global platform

MIC, a global provider of customs and trade compliance software, operates in one of the most complex and highly regulated IT environments imaginable. Supporting enterprises like Volkswagen, DHL, and Zalando, its systems power critical processes such as customs declarations, tariff classifications, and regulatory reporting across more than 55 countries.

For MIC, system failures don’t just mean downtime; they can delay shipments at borders, disrupt global supply chains, and trigger compliance risks. As its platform evolved toward a distributed, Kubernetes-based architecture, the company needed a more advanced approach to observability to match that complexity.

“It’s not just a website being down; it can mean trucks or shipments getting stuck at borders,” said Jürgen Peirlberger, Leading Observability Engineer at MIC. “That’s why it’s so important that everything is working in real time.”

After starting with fragmented monitoring tools and reactive workflows, MIC is now on a journey toward full end-to-end observability using Grafana Cloud, empowering teams across the organization to understand, detect, and prevent issues before they impact customers.

Jürgen recently spoke with Grafana Labs about MIC’s observability journey.

Can you start by introducing yourself, your role at MIC, what your team is responsible for, and a bit about what MIC does as a company?

My name is Jürgen Peirlberger. I work in the IT and operations department at MIC, specifically in the SaaS-based infrastructure team.

Our team is responsible for creating, innovating, and maintaining the platform that powers MIC products. That includes everything from our Kubernetes-based cloud platform to legacy SaaS environments and internal tooling. We provide the base infrastructure and framework that all product teams build on.

MIC is a global customs and trade compliance software company. Our products help large enterprises manage complex processes like customs declarations, tariff classifications, export controls, and regulatory compliance. We serve industries like automotive, logistics, and retail.

We believe our customers need full trust in our services, which is why we’re transparent about the technology behind our systems and how it delivers value. Monitoring and observability are ongoing efforts to ensure availability and performance in an uncertain environment. At MIC, we continuously improve our approach to enhance our technology and deliver the best possible customer experience.

MIC operates in a highly complex and regulated space. Can you describe the kind of systems you manage and why reliability and visibility are so critical to your business?

We operate a multi-cluster, multi-region Kubernetes platform that we call our Next Generation Infrastructure Platform. At the same time, we still run legacy SaaS environments and some on-premise systems as we migrate customers over.

In addition, we have dedicated AI and data analytics clusters, CI/CD pipelines, and a wide range of internal tools. All of these systems need to work together across a highly distributed environment.

What makes this especially critical is the nature of the data and integrations. We process highly sensitive customs and compliance data across industries, including logistics, manufacturing, and even defense.

A failure in our system is not just a small outage. It can mean shipments getting stuck at borders or compliance violations for our customers. We also integrate directly with government authorities, and each country behaves differently: some responses are real time, while others can take days or weeks. That makes it very challenging from an observability perspective.

With this level of complexity, reliability and visibility are absolutely essential.

Before adopting Grafana more broadly, how were you monitoring your systems, and what were the biggest challenges with that approach?

Our monitoring evolved in phases. In the early days, everything was very fragmented: small tools, manual checks, and a focus mostly on operations. There was no centralized approach and very limited visibility for application teams.

Later, we moved to tools like New Relic and Kibana, which was a big step forward. But as our organization grew, so did our requirements. We had more services, more teams, and more complexity.

The biggest challenge was the lack of context. We had metrics and logs, but they were spread across different tools. There was no unified view, no shared access, and no common understanding across teams.

For example, we could see that a pod restarted, but we couldn’t easily understand why. Everything required switching between tools and piecing information together manually.

With Grafana, we saw the opportunity to bring metrics, logs, and traces into one place and create a shared observability platform for the entire company.

You’ve talked about moving from reactive incident handling to a deeper understanding of your systems. What did “reactive” look like before, and what started to push you toward a more proactive approach?

Reactive meant that customers would open support tickets, and only then would we start investigating. At that point, the issue had already impacted them.

We often spent hours trying to reconstruct what happened by looking at logs and different systems. In many cases, we later realized that the signals were already there; we just didn’t see them in time.

That was the turning point. We recognized that we already had a lot of data, but we weren’t using it effectively.

We started to see recurring patterns, such as memory pressure, and realized that observability is not just about reacting faster, but about understanding the system well enough to prevent incidents.

That shift in thinking pushed us toward a more proactive approach.

You’ve personally driven a lot of this transformation. What motivated you to champion this shift internally?

I joined MIC as a student and worked as a developer for several years. I was building features every day, but I had very little visibility into what happened after deployment.

I kept asking myself: Are customers actually using what I build? Does it work well for them? Are there issues?

Later, during my master’s thesis, I focused on observability and conducted an internal survey. The results were clear: many people couldn’t even assess how difficult it was to debug issues because they didn’t have access to the data. Others said it was very time-consuming.

That really reinforced my view that we needed not just better tools, but a different approach.

The idea became observability for everyone: not just operations, but also developers, product teams, and support. That was the foundation for rolling out Grafana more broadly across the company.

You started primarily with metrics and some logs, and are now expanding into traces and full end-to-end observability. Can you walk us through that journey?

We started with metrics using Prometheus, which gave us a solid foundation to understand system health.

The next step was centralizing logs with Grafana Loki. That was a major improvement because it made logs accessible to a much broader audience without requiring direct server access, which also improved our security model.

Now we are expanding into distributed tracing with Grafana Tempo. This is still an evolving area for us, but the infrastructure is in place and we are actively rolling it out.

What makes the Grafana stack powerful is the unified experience. Being able to move from metrics to logs to traces within the same interface, with shared context and time ranges, has fundamentally changed how we debug and understand our systems.
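As a rough illustration of that pivot (these are generic examples, not MIC’s actual queries, and the namespace and pod names are hypothetical), an investigation might start from a restart metric in PromQL and jump to the same pod’s logs in LogQL, reusing the shared Kubernetes labels and time range:

```
# PromQL: containers that restarted in the last hour.
# kube_pod_container_status_restarts_total is a standard kube-state-metrics metric.
increase(kube_pod_container_status_restarts_total{namespace="prod"}[1h]) > 0

# LogQL: error logs from the implicated pod over the same time window.
# Label names assume the usual Kubernetes service discovery labels in Loki.
{namespace="prod", pod="checkout-7d9f"} |= "error"
```

Because both queries share the `namespace` and `pod` labels, Grafana can carry that context (and the selected time range) from the metrics panel into Loki without the manual cross-referencing described earlier.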

Even at this stage, we already see the value of tracing and the potential it brings.

Now that you have broader visibility across your systems, how has that changed the way your team works day to day?

Grafana has become a natural part of how teams work. Most people use it regularly, and many say they couldn’t work without Grafana anymore. That’s a big change compared to a few years ago when visibility was very limited.

One major improvement is shared situational awareness during incidents. Teams now gather around the same dashboards and alerts, looking at the same data.

We no longer rely on assumptions; we rely on facts. This has improved how we resolve issues and helped us distinguish more clearly between infrastructure and application problems. It’s especially important in a Kubernetes environment where everything is dynamic.

AI is also starting to play a role by helping analyze telemetry data and highlight relevant signals, which supports faster root cause analysis.

Cost efficiency is an evergreen topic in the IT industry. How have you approached balancing cost control with gaining deeper observability?

Cost is a very real consideration for us. Our approach is to focus on collecting the right data, not all the data, though in a large environment that’s always a challenge.

Grafana Cloud features like Adaptive Telemetry help us control data volume and manage costs centrally without needing to change our entire infrastructure.

At the same time, I think it’s important to look at observability as a reinvestment.

In the past, we had situations where the system already contained signals of a failure, but we didn’t detect them early. We ended up spending hours fixing issues that could have been resolved in minutes. That’s the real cost: spending time repairing the past instead of building the future.

With better observability, we can reduce support effort, improve quality, and focus more on innovation.

As you continue this journey, what are you most excited to unlock next?

Application observability is our main focus right now. We already have strong infrastructure visibility, but connecting that to business impact still requires manual effort. Application observability will help us understand what our code is actually doing in production.

Grafana Assistant is also exciting because it lowers the barrier to entry. Not everyone is an expert in query languages, so being able to ask questions and get meaningful answers makes observability more accessible.

We are also exploring Knowledge Graph. In a complex microservices architecture, it’s very difficult to understand all dependencies. A graph-based view can significantly improve how we analyze and troubleshoot issues.

From your perspective, what do you see as the next big trend or shift in observability?

One major trend is the convergence of observability and security. The same telemetry we use for system health can also help detect anomalies and unusual behavior. In an industry that handles sensitive data, this is essential.

Another trend is, of course, AI. We are moving from static alerting to smarter systems that understand context, correlate signals, and surface root causes automatically.

Finally, there is a cultural shift. Observability is moving earlier in the development lifecycle. It’s becoming a first-class concern, not something added after deployment.

That shift in mindset is just as important as the technology itself.

Industry
Shipping & Third-Party Logistics
Company Size
500+
Headquarters
Linz, Austria