Webinar

Tally Group unifies observability with Grafana Cloud

From Fragmented Monitoring to Unified Observability: Tally Group’s Journey to Proactive Operations with Grafana Cloud

Company: Tally Group

Industry: Software & Technology (Energy & Utilities)

Tally Group is a global SaaS company providing energy solutions to commercial, industrial, small-to-medium, and residential markets. As the platform scaled rapidly, fragmented observability tooling — with monitoring siloed across four or more separate systems — left engineering teams without application-level visibility, struggling through slow, manual incident response. Tally migrated to Grafana Cloud in six weeks and achieved a unified observability platform across metrics, logs, traces, alerting, synthetics, and IRM.

Challenge
Tally’s monitoring stack had evolved piecemeal. Metrics and alerting ran in one tool. Logging in another. IRM and synthetic checks in separate systems. The result was operational fragmentation: no single view of the platform, no application-level observability, and difficult, time-consuming incident triage.

“Our observability was fragmented,” said John Bulauan, Senior DevOps Engineer at Tally Group. “It was very siloed.”

When incidents occurred, engineers manually pieced together information across tools — adding time to every outage. Alert storms and false positives added noise. Costs accumulated across multiple vendor contracts. And without application tracing, performance degradation from new releases was effectively invisible.

Solution
Tally partnered with DNX to plan and execute a structured migration to Grafana Cloud. The project followed three phases: a Pilot to validate platform fit, a six-week Implementation to onboard core systems, and a Decommission phase to retire legacy tooling after parallel validation runs.

Implementation covered all observability signals: telemetry ingestion via Grafana Alloy, dashboards for API Gateway and database monitoring, fine-grained alerting and synthetic checks, and IRM integration routing alerts directly to on-call schedules across platform and production support teams.

Application Observability, using Alloy and RCA Workbench, was introduced for the first time, giving Tally end-to-end tracing and service dependency visibility.

Impact
Tally shifted from reactive to proactive operations and now notifies customers of issues before they report them.

  • Unified single pane of glass across metrics, logs, traces, alerting, synthetics, and IRM
  • Application observability deployed for the first time — bottlenecks and root causes now identifiable
  • Automated daily production health checks, replacing manual early-morning SQL scripts
  • Reduced alert storms and false positives through fine-grained alerting
  • Cost optimization achieved via adaptive telemetry
  • Full migration completed in six weeks with partner DNX

“Another big win for us is using the synthetic checks. So prior to this, we did have the production support team. They would log in very early in the morning and run a manual SQL script, which basically is part of their validation check to ensure the platform is healthy. But with synthetic checks, we’re able to automate this and use a SQL plugin from Grafana back to our databases, and that’s now fully automated. So it removes the need and the operational overhead for someone to come in early in the morning.”

– John Bulauan, Sr DevOps Engineer, Tally Group
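
To make the idea of an automated SQL health check concrete, here is a minimal Python sketch. It is not Tally's implementation and does not use the Grafana SQL plugin or synthetic checks; the table name `meter_reads`, the check names, and the thresholds are hypothetical, and an in-memory SQLite database stands in for a production database. It only illustrates the general pattern of replacing a manually run validation script with named queries and pass/fail rules that a scheduler could run unattended.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def run_health_checks(conn, checks):
    """Run each (name, sql, predicate) check and collect pass/fail results."""
    results = []
    for name, sql, predicate in checks:
        try:
            # Each check query is expected to return a single scalar value.
            value = conn.execute(sql).fetchone()[0]
            results.append(CheckResult(name, predicate(value), f"value={value}"))
        except sqlite3.Error as exc:
            # A query that cannot run is treated as a failed check.
            results.append(CheckResult(name, False, f"query failed: {exc}"))
    return results


def demo_connection():
    """Build a hypothetical stand-in database (assumption, not a real schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE meter_reads (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO meter_reads (status) VALUES (?)",
        [("ok",), ("ok",), ("failed",)],
    )
    conn.commit()
    return conn


# Hypothetical checks: a name, a scalar SQL query, and a rule for "healthy".
CHECKS = [
    ("rows_present", "SELECT COUNT(*) FROM meter_reads", lambda n: n > 0),
    (
        "no_failed_rows",
        "SELECT COUNT(*) FROM meter_reads WHERE status = 'failed'",
        lambda n: n == 0,
    ),
]

if __name__ == "__main__":
    for result in run_health_checks(demo_connection(), CHECKS):
        print(result)
```

In a real setup, the failed results would feed an alerting or on-call workflow rather than being printed.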

Looking ahead
Tally is expanding its Grafana Cloud usage across several initiatives: building an automated k6 performance testing framework integrated into its SDLC, exploring the Grafana AI Assistant to enable business stakeholders to build dashboards without PromQL knowledge, and completing a multi-hyperscaler rollout across AWS, Azure, and Google Cloud. The goal is consistent, provider-agnostic observability regardless of which cloud its workloads run on.
