How DataSnipper scaled observability for its SaaS transition with Grafana Cloud
DataSnipper, a fast-growing audit automation platform, is evolving from a desktop-first product into a full SaaS platform powered by backend services, AI capabilities, and global infrastructure. As part of that shift, its small SRE team needed to support multiple regions, improve reliability, and establish a more mature observability practice, all without adding unnecessary operational complexity.
After evaluating its existing Azure-based setup and other tools, DataSnipper adopted Grafana Cloud to unify observability, streamline incident management, and support its next phase of growth. Within three months, the team migrated its core telemetry, dashboards, and alerting workflows into Grafana Cloud, creating a centralized view across metrics, logs, and traces. Today, roughly 40 services send telemetry into Grafana Cloud, helping the team standardize observability and improve incident response. Incident resolution times have dropped to under 15 minutes, giving engineers clearer ownership over mean time to resolution (MTTR) and mean time to detect (MTTD) and reducing potential customer impact.
“We wanted one place where we could focus on observability without being distracted by everything else,” said Aleksandar Ivanov, SRE at DataSnipper. “Grafana Cloud gave us that, along with features we didn’t even realize we were missing.”
Aleksandar recently spoke with Grafana Labs about DataSnipper’s journey to Grafana Cloud.
Can you tell us your name, your role at DataSnipper, and a bit about the team you work on?
Yeah, I’m Aleksandar Ivanov, part of the SRE team at DataSnipper. It’s a four-person team, and we do everything you’d expect from an SRE team in a smaller company.
We handle infrastructure, observability, Azure, and generally connecting all the dots needed to serve the applications. There’s also some communication with customers and supporting developers as best as we can.
For people who may not be familiar with DataSnipper, can you briefly explain what the company does and how your platform supports auditors and finance teams?
DataSnipper is all about helping auditors and financial professionals reduce pressure during audit periods, which can be a really intense few months.
We automate a lot of their processes and significantly improve their speed. The platform itself is a suite of products that integrates closely with Microsoft Excel. It works as a plugin where users can extract data from documents, including financial statements or unstructured data.
One of our most recent products, which just became generally available, is Excel agents, which will further improve automation. You can feed documents into the agent directly or through various integrations; it then extracts the data, puts it into Excel, and creates references back to the original documents. Those documents can be large or numerous, so that’s really where we provide value.
DataSnipper has been evolving from a desktop-first product to a more backend-driven SaaS platform. What has that transition looked like from an engineering and platform perspective?
That transition has been quite significant.
As a plugin, things are simpler because everything runs on the client side. Moving to a platform means much more responsibility on our side. We now need to support multiple regions to meet data residency requirements and improve the reliability of all the systems behind the platform.
It’s been a heavy shift. The company has grown quite a bit to support it, not just in engineering but also in areas like marketing and sales as we expand into new markets and offer new types of products.
Before adopting Grafana Cloud, how were you monitoring your systems, and what challenges were you running into with your existing setup?
All of our services are deployed on Azure, so we were using Azure for monitoring as well.
It has its benefits, since everything is already there, but we felt it wasn’t as focused on observability as tools like Grafana. The data could be scattered, and the query language is different, which makes things harder.
Another key gap was the lack of incident response management capabilities, which made coordinating during incidents slower and more difficult for the team.
What we really wanted was one place where we could focus on observability. Having a single pane of glass makes life easier.
What ultimately led your team to evaluate Grafana Cloud, and what were the most important capabilities you were looking for in a solution?
A big factor was that both the SRE team and developers already had experience with Grafana OSS, and we were happy with it. It’s a well-maintained and respected tool.
Open standards were also important for us. Grafana supports and contributes to open standards, which means if we ever need to move to something else, it would be easier.
Another key requirement was incident management. As we transitioned to a platform, we needed on-call schedules across development teams. We didn’t initially realize Grafana Cloud included incident management, so that made the offering even more compelling.
As you started bringing your telemetry into Grafana Cloud, what stood out to you about the experience?
One thing that stood out was the visibility into spending.
Grafana Cloud doesn’t just show which products you’re using, but also how metrics themselves impact cost, like cardinality. That helps us understand what we should optimize or remove.
We haven’t yet used Adaptive Metrics, but we expect it will be helpful to better understand what telemetry we actually need.
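To make the link between cardinality and cost concrete, here is a minimal Python sketch of how the number of active series per metric grows with distinct label combinations. It is illustrative only: the metric names, labels, and the `series_per_metric` helper are hypothetical, and it does not reflect how Grafana Cloud computes usage or billing.

```python
from collections import defaultdict


def series_per_metric(samples):
    """Count distinct label sets (one per time series) for each metric name."""
    seen = defaultdict(set)
    for name, labels in samples:
        # A series is identified by its metric name plus its full label set.
        seen[name].add(frozenset(labels.items()))
    return {name: len(label_sets) for name, label_sets in seen.items()}


# Hypothetical samples; the names and labels are assumptions for illustration.
samples = [
    ("http_requests_total", {"service": "api", "region": "eu", "status": "200"}),
    ("http_requests_total", {"service": "api", "region": "eu", "status": "500"}),
    ("http_requests_total", {"service": "api", "region": "us", "status": "200"}),
    # Same label set as the first sample, so it does not add a new series.
    ("http_requests_total", {"service": "api", "region": "eu", "status": "200"}),
    ("queue_depth", {"service": "worker", "region": "eu"}),
]

counts = series_per_metric(samples)
for name, count in sorted(counts.items(), key=lambda item: -item[1]):
    print(f"{name}: {count} series")
```

Sorting by series count puts the most expensive candidates first, which is the kind of view that helps decide which labels to drop or aggregate.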
How has having metrics, logs, and traces together in one place changed the way your team approaches troubleshooting and incident response?
The entity graph has been a significant improvement in how we work.
You can see all services, now around 40, and their related observability data in one place. When you look into a service, you immediately have access to key metrics, logs, and traces. It only takes a few clicks to move through everything.
That makes troubleshooting much easier because you can quickly navigate between different types of data without switching tools.
As DataSnipper continues expanding its SaaS platform and AI-driven capabilities, how do you see observability supporting that next stage of growth?
This is exactly why we chose Grafana Cloud. It helps us expand more easily because we already have templates and configurations that can be applied to new regions or services. The setup is relatively straightforward, so new teams and services can get observability quickly.
We’re also planning to explore more features, such as database observability, and to improve our SLOs through incident management.
It also helps us improve how quickly we detect and resolve issues, which directly impacts the experience we provide to customers.
Overall, it makes it easier to scale both the platform and the number of services we support while maintaining good observability.
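Detection and resolution speed are usually tracked as MTTD and MTTR. The sketch below shows one common way to compute them from incident timestamps. The definitions here are an assumption (both are measured from the moment the problem started, and teams define these differently), and the `Incident` type and sample data are hypothetical, not DataSnipper's.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime   # when the problem began
    detected: datetime  # when an alert fired or someone noticed
    resolved: datetime  # when service was restored


def mean_delta(deltas):
    """Average a collection of timedeltas."""
    deltas = list(deltas)
    if not deltas:
        raise ValueError("at least one incident is required")
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents):
    """Mean time to detect: start of the problem until detection."""
    return mean_delta(i.detected - i.started for i in incidents)


def mttr(incidents):
    """Mean time to resolution: start of the problem until resolution."""
    return mean_delta(i.resolved - i.started for i in incidents)


# Hypothetical incidents for illustration only.
incidents = [
    Incident(datetime(2025, 1, 6, 10, 0), datetime(2025, 1, 6, 10, 2), datetime(2025, 1, 6, 10, 12)),
    Incident(datetime(2025, 1, 9, 14, 0), datetime(2025, 1, 9, 14, 4), datetime(2025, 1, 9, 14, 16)),
]

print("MTTD:", mttd(incidents))
print("MTTR:", mttr(incidents))
```

With these sample values, the averages come out to 3 minutes to detect and 14 minutes to resolve.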
Anything else you’d like to add about your experience with Grafana Cloud?
One thing I really appreciate is how many features Grafana Cloud includes. At times, it can feel like a lot to take in compared to Grafana OSS, but it’s a good problem to have. We’re still exploring everything that’s available.
The migration itself took about three months, which was exactly what we planned. In that time, we were able to move our core telemetry, incident management, dashboards, and alerts, and get the team trained on the platform.
Overall, it’s been a positive experience, and there’s still more for us to explore.


