
Making logs work smarter: Evolving your observability strategy
When you start building an observability stack, it’s natural to reach for logs first. They’re familiar, easy to generate, and often already part of a developer’s workflow. And sending logs to a centralized system feels like a quick win, too. Simply add a log shipper, and voila, your application is observable.
Logs might even be available with no extra effort at all, collected automatically from your workload’s stdout stream (e.g., when you use the Grafana Kubernetes Monitoring Helm chart). Metrics, by contrast, may require an extra step, such as annotating your workload.
But what starts as an easy win can quickly turn into a scaling problem, with rising costs and slower queries. In this blog, we’ll get into some of the challenges that come with taking a logs-first approach in Grafana Cloud and how you can optimize your stack to avoid the typical gotchas.
How Loki uses indexes
Let’s get something out of the way before we go any further: We love Grafana Loki, the open source log aggregation system that powers Grafana Logs. Because it indexes only a small set of labels, rather than the full contents of every log line, Loki can ingest logs cheaply and keep queries fast.
The challenges we discuss here apply to many different logging tools, and you can connect scores of them to Grafana as part of our “big tent” philosophy. But we’re going to focus on Loki, because we’re looking specifically at how to get the most from Grafana Cloud.
Loki’s targeted indexing is what keeps storage costs low and queries quick, but that efficiency depends on using labels correctly. Over-labeling or mislabeling has the opposite effect: it increases the amount of data each query has to touch, which undermines Loki’s strength.
When queries can’t narrow down results by meaningful labels, Loki is forced to scan more data than necessary. The more unnecessary data a query processes, the slower it runs, and the faster you consume your fair use allowance.
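To make that concrete, here’s a rough sketch of the difference in LogQL. The label names and values (cluster, namespace, app) are hypothetical; substitute whatever indexed labels your streams actually carry.

```logql
# Narrow: the stream selector matches a handful of streams via indexed
# labels, so Loki only scans the chunks belonging to those streams.
{cluster="prod", namespace="checkout", app="payments"} |= "timeout"

# Broad: a single coarse label matches every stream in the cluster, so
# Loki has to scan far more data and filter it line by line.
{cluster="prod"} |= "timeout"
```

Both queries can return the same log lines, but the first one does far less work to get there.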
What are some common challenges?
Over-relying on logs for monitoring in Grafana Cloud can lead to operational and cost challenges.
Cost: Grafana Cloud enforces a fair use policy that lets you query up to 100 times your ingested log volume each month at no additional charge. Exceeding this threshold can result in unexpected billing increases and operational overhead. The limit typically becomes an issue when logs are queried heavily by alert rules that run on a regular interval, by dashboards, or by Explore queries that cover large time ranges.
Performance: Loki also imposes query limits, such as a maximum number of lines per query, which can cause queries to fail or dashboards to show no data, especially when you try to retrieve large volumes of logs or very long time ranges. The result is less responsive dashboards and slower troubleshooting, undermining the reliability and scalability of your observability workflows.
Mitigation steps
Thankfully, there are proven ways to optimize your observability stack.
- Scrape metrics directly: Instrument your applications to emit metrics directly to your metrics backend. This avoids the overhead of parsing logs and gives you faster, more reliable insights.
- Use recording rules: Precompute frequently used queries with recording rules and store the results as new metrics. This reduces query load and speeds up dashboards; there’s a sketch of what this can look like right after this list. (Check out our docs or this blog to learn more.)
- Optimize log queries: Filter on indexed labels and narrow time ranges to keep queries efficient and within fair use limits.
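As an illustration of that second point, here’s a minimal sketch of a Loki recording rule in the ruler’s Prometheus-style rule file format. The group name, the namespace="checkout" selector, and the recorded metric name are all made up for this example; the expression turns the per-namespace rate of error lines into a metric that dashboards and alerts can query cheaply instead of re-scanning logs.

```yaml
groups:
  - name: checkout-log-rules   # hypothetical rule group
    interval: 1m
    rules:
      # Record the per-namespace rate of error log lines as a metric.
      - record: namespace:checkout_error_lines:rate5m
        expr: |
          sum by (namespace) (
            rate({namespace="checkout"} |= "error" [5m])
          )
```

Once a rule like this is in place, dashboards and alert rules can query the recorded metric from your metrics data source instead of hitting Loki on every refresh.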
In Grafana Cloud, there’s also a Loki query fair usage dashboard you can use to dig deeper into the root causes of fair use violations, and the billing dashboard can help here as well. The beauty of these dashboards is that they’re built on existing metrics in the grafanacloud-usage data source that ships with your stack, so you can use the same metrics to create your own custom queries, dashboards, or alerts. Usage alerts are also now built into the Cost Management and Billing app.
When are logs the right choice?
Logs are still essential, just not for everything.
When you need to trace the root cause of a problem or investigate unexpected behavior, logs provide the detail you need. In addition, for regulatory requirements or forensic investigations, logs are invaluable for maintaining a detailed record of system activity.
Perhaps you’re simply stuck with a legacy monolith your team has inherited that offers nothing but logs. If there’s truly no way to emit metrics directly, LogQL recording rules are the next best thing.
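As a rough sketch of what that can look like: if the monolith logs request durations in logfmt, a LogQL metric query can turn those lines into a latency series. The app="legacy-monolith" label and the duration_ms and path fields are assumptions about your log format; adjust them to match what the application actually emits.

```logql
# Average request duration per path, computed from logfmt log lines
# that carry a numeric duration_ms field. Lines that fail to parse
# are dropped via the __error__ filter.
avg_over_time(
  {app="legacy-monolith"}
    | logfmt
    | __error__=""
    | unwrap duration_ms [5m]
) by (path)
```

Put an expression like this into a recording rule and you effectively get request latency metrics out of a system that only ever gave you logs.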
Logs are a critical part of observability, but they’re not the whole story. For scalable, performant monitoring, lead with metrics and let logs do what they do best: provide rich context when you need to dig deeper.
And if you want to learn more about getting the most from Loki, check out our blogs on query performance and labeling best practices.
Grafana Cloud is the easiest way to get started with metrics, logs, traces, dashboards, and more. We have a generous forever-free tier and plans for every use case. Sign up for free now!

