Grafana Cloud

When to use continuous profiling

Continuous profiling is a systematic method of collecting and analyzing performance data from production systems.

Traditionally, profiling has been used as an ad hoc debugging tool in languages like Go and Java. You are probably used to running a benchmark locally and getting a pprof file in Go, or connecting to a misbehaving production instance and pulling a flame graph from a JFR file in Java. This approach works well for one-off debugging, but it is not well suited to production.
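As a rough illustration of this ad hoc workflow in Go, the following sketch captures a single CPU profile around a workload using the standard `runtime/pprof` package. The `busyWork` and `captureCPUProfile` names are hypothetical helpers invented for this example; only `pprof.StartCPUProfile` and `pprof.StopCPUProfile` come from the Go standard library.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// busyWork is a hypothetical CPU-bound workload used only for illustration.
func busyWork(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// captureCPUProfile runs work while the Go CPU profiler is active and
// returns the pprof-encoded profile bytes.
func captureCPUProfile(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	// StopCPUProfile flushes the buffered profile to buf before returning.
	pprof.StopCPUProfile()
	return buf.Bytes(), nil
}

func main() {
	profile, err := captureCPUProfile(func() { busyWork(50_000_000) })
	if err != nil {
		fmt.Println("profiling failed:", err)
		return
	}
	fmt.Printf("captured %d bytes of pprof data\n", len(profile))
}
```

A profile captured this way covers one run on one machine. You would typically write the bytes to a file and inspect them with `go tool pprof`.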

Example flame graph

Refer to Flame graphs to learn more.

Continuous profiling is a modern approach that is safer and more scalable for production environments. It uses low-overhead sampling to collect profiles from production systems and stores them in a database for later analysis. Continuous profiling gives you a more holistic view of your application and how it behaves in production.


Diagram showing 3 benefits of continuous profiling

Why prioritize continuous profiling?

  1. In-depth code insights: It provides granular, line-level insights into how application code utilizes resources, offering the most detailed view of application performance.
  2. Complements other observability tools: Continuous profiling fills critical gaps left by metrics, logs, and tracing, creating a more comprehensive observability strategy.
  3. Proactive performance optimization: Regular profiling enables teams to proactively identify and resolve performance bottlenecks, leading to more efficient and reliable applications.

Because of the natural link between historical profiles of your application and the amount of resources used over time, profiling provides clear business and technical value to organizations.

From a business standpoint, continuous profiling delivers:

  • Cost cutting: Profiling identifies where your code consumes the most resources, such as CPU and memory, and helps you optimize those areas to reduce costs.
  • Revenue generation: In highly latency-sensitive industries like ride-sharing, e-commerce, and financial technology, minimizing end-user latency is critical for preventing customer churn and revenue loss. Profiling helps identify and remove code bottlenecks, ensuring a seamless user experience.
  • Incident resolution: Continuous profiling provides valuable insights into code performance and system behavior, enabling faster incident resolution by pinpointing the root cause with accuracy. Profiling can be the difference between a minor service disruption and a critical system-wide outage.

At the same time, from a technical perspective, continuous profiling provides:

  • Comprehensive code analysis: Continuous profiling offers a detailed understanding of code performance, resource utilization, and system interactions, allowing for thorough analysis and optimization.
  • Efficient optimization: By identifying bottlenecks through continuous profiling, critical functions can be optimized to improve overall system efficiency, resulting in better performance and resource utilization.
  • Effective capacity planning: Leveraging profiling data facilitates effective capacity planning and scaling efforts, ensuring that your application can handle increasing demands while maintaining optimal performance.

Use cases

Informational graphic illustrating key business benefits

Adopting continuous profiling with tools like Pyroscope and Cloud Profiles can lead to significant business advantages:

  1. Reduced operational costs: Optimizing resource usage can significantly cut cloud and infrastructure expenses.
  2. Reduced latency: Identifying and addressing performance bottlenecks leads to faster and more efficient applications.
  3. Enhanced incident management: Faster problem identification and resolution reduce Mean Time to Resolution (MTTR) and improve the end-user experience.

Reduce operational costs

By providing in-depth insights into application performance, Cloud Profiles lets teams identify and eliminate inefficiencies, leading to significant savings in areas like observability, incident management, messaging/queuing, deployment tools, and infrastructure.

By using sampling profilers, Cloud Profiles collects data with minimal overhead (~2-5% depending on a few factors).

The custom storage engine compresses and stores the data efficiently. Together, sampling and efficient storage provide these advantages:

  • Low CPU overhead thanks to sampling profiler technology
  • Control over profiling data granularity (from 10 seconds to multiple years)
  • Efficient compression, low disk space requirements and cost
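To make the idea of adjustable granularity more concrete, the sketch below rolls several fine-grained windows up into one coarser window by summing sample counts per function. It assumes, purely for illustration, that each window can be reduced to a map of function name to sample count. The `sampleCounts`, `merge`, and `top` names and all numbers are made up, and this is not a description of how the Cloud Profiles storage engine works.

```go
package main

import (
	"fmt"
	"sort"
)

// sampleCounts maps a function name to the number of CPU samples seen in it.
type sampleCounts map[string]int

// merge rolls fine-grained windows up into one coarser window by summing
// the samples recorded for each function.
func merge(windows ...sampleCounts) sampleCounts {
	out := sampleCounts{}
	for _, w := range windows {
		for fn, n := range w {
			out[fn] += n
		}
	}
	return out
}

// top returns function names ordered by descending sample count,
// with ties broken alphabetically so the order is deterministic.
func top(c sampleCounts) []string {
	names := make([]string, 0, len(c))
	for fn := range c {
		names = append(names, fn)
	}
	sort.Slice(names, func(i, j int) bool {
		if c[names[i]] != c[names[j]] {
			return c[names[i]] > c[names[j]]
		}
		return names[i] < names[j]
	})
	return names
}

func main() {
	// Made-up sample counts for three consecutive 10-second windows.
	w1 := sampleCounts{"parseRequest": 120, "encodeJSON": 300}
	w2 := sampleCounts{"parseRequest": 110, "encodeJSON": 340, "gcAssist": 40}
	w3 := sampleCounts{"parseRequest": 130, "encodeJSON": 310}

	merged := merge(w1, w2, w3)
	for _, fn := range top(merged) {
		fmt.Printf("%-14s %d samples\n", fn, merged[fn])
	}
}
```

The same summation applies whether you combine three 10-second windows or many days of data, which is why sampled profiles lend themselves to both short and long time ranges.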

Reduce latency

Cloud Profiles plays a pivotal role in reducing application latency by identifying performance bottlenecks at the code level. This granular insight allows for targeted optimization, leading to faster application response times, improved user experience, and, consequently, better business outcomes like increased customer satisfaction and revenue.

Enhanced incident management

Cloud Profiles streamlines incident management by offering immediate, actionable insights into application performance issues. With continuous profiling, teams can quickly pinpoint the root cause of an incident, reducing the mean time to resolution (MTTR) and enhancing overall system reliability and user satisfaction.
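Because profiles are stored over time, a common way to narrow down a root cause is to compare a profile from the incident window against a baseline from before it. The following sketch shows the idea on simplified data. It assumes each profile has been reduced to sample counts per function; the `sampleCounts`, `shares`, and `compare` names and all numbers are invented for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// sampleCounts maps a function name to the number of CPU samples seen in it.
type sampleCounts map[string]int

// delta records how much a function's share of total samples changed.
type delta struct {
	Func   string
	Change float64
}

// shares converts raw sample counts into each function's fraction of the total.
func shares(c sampleCounts) map[string]float64 {
	total := 0
	for _, n := range c {
		total += n
	}
	out := make(map[string]float64, len(c))
	if total == 0 {
		return out
	}
	for fn, n := range c {
		out[fn] = float64(n) / float64(total)
	}
	return out
}

// compare returns per-function changes in CPU share between baseline and
// incident, largest increase first (ties broken alphabetically).
func compare(baseline, incident sampleCounts) []delta {
	b, i := shares(baseline), shares(incident)
	seen := map[string]bool{}
	for fn := range b {
		seen[fn] = true
	}
	for fn := range i {
		seen[fn] = true
	}
	out := make([]delta, 0, len(seen))
	for fn := range seen {
		out = append(out, delta{Func: fn, Change: i[fn] - b[fn]})
	}
	sort.Slice(out, func(x, y int) bool {
		if out[x].Change != out[y].Change {
			return out[x].Change > out[y].Change
		}
		return out[x].Func < out[y].Func
	})
	return out
}

func main() {
	// Made-up sample counts before and during a hypothetical incident.
	baseline := sampleCounts{"parseRequest": 200, "encodeJSON": 600, "queryDB": 200}
	incident := sampleCounts{"parseRequest": 200, "encodeJSON": 500, "queryDB": 1300}

	for _, d := range compare(baseline, incident) {
		fmt.Printf("%-14s %+.1f%%\n", d.Func, d.Change*100)
	}
}
```

The function at the top of the list is the one whose share of CPU time grew the most during the incident, which is usually a good place to start investigating.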