Profile a backend plugin
This guide explains how to configure a backend plugin to enable diagnostics at startup that generate profiling data. Profiling data helps you investigate performance problems, such as high CPU or memory usage, and is also the basis for continuous profiling.
Configure profiling data
The Grafana configuration file allows you to configure profiling under the [plugin.<plugin ID>] section.
In this section of the file, specify the <plugin ID>, a unique identifier, for the backend plugin you want to profile, for example, grafana-github-datasource, together with the profiling configuration options (detailed in the subsections below).
Example configuration:
[plugin.<plugin ID>]
profiling_enabled = true
profiling_port = 6060
profiling_block_rate = 5
profiling_mutex_rate = 5
Restart Grafana after applying the configuration changes. You should see a log message that indicates whether profiling was enabled. For example:
INFO [07-09|19:15:00] Profiling enabled logger=plugin.<plugin ID> blockProfileRate=5 mutexProfileRate=5
To use profiling_block_rate and profiling_mutex_rate, your plugin needs to use at least grafana-plugin-sdk-go v0.238.0. Refer to Update the Go SDK for instructions on how to update the SDK.
The profiling_enabled option
Use this option to enable or disable profiling. The default is false.
The profiling_port option
Optionally, customize the HTTP port where profile data is exposed. For example, use this option if you want to profile multiple plugins or if the default port is taken. The default is 6060.
The profiling_block_rate option
Use this to control the fraction of goroutine blocking events that are reported in the blocking profile. The default is 0 (that is, track no events). The Go runtime aims to sample an average of one blocking event per this many nanoseconds spent blocked, so 1 reports every event and larger values report fewer. Refer to https://pkg.go.dev/runtime#SetBlockProfileRate for more detailed information.
The higher the fraction (that is, the smaller this value), the more overhead it adds to normal operations.
The profiling_mutex_rate option
Use this to control the fraction of mutex contention events that are reported in the mutex profile. The default is 0 (that is, track no events). For example, use 5 to report 20 percent of all events. Refer to https://pkg.go.dev/runtime#SetMutexProfileFraction for more detailed information.
The higher the fraction (that is, the smaller this value), the more overhead it adds to normal operations.
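To make the mutex rate concrete, the following minimal Go sketch computes the share of contention events reported for a given rate and calls the two runtime functions linked above. It assumes that the SDK passes the configured values to these functions unchanged. The helper name mutexSampledFraction is hypothetical and is not part of the SDK.

```go
package main

import (
	"fmt"
	"runtime"
)

// mutexSampledFraction returns the approximate share of mutex contention
// events that the Go runtime reports for a given rate: on average 1/rate.
// Rates of 0 or less are treated here as "no events reported".
// Hypothetical helper for illustration only.
func mutexSampledFraction(rate int) float64 {
	if rate <= 0 {
		return 0
	}
	return 1 / float64(rate)
}

func main() {
	// Assumption: these are the runtime calls that receive the values of
	// profiling_block_rate and profiling_mutex_rate.
	runtime.SetBlockProfileRate(5)
	previous := runtime.SetMutexProfileFraction(5)
	fmt.Println("previous mutex profile fraction:", previous)

	for _, rate := range []int{0, 1, 5, 10} {
		fmt.Printf("mutex rate %d reports about %.0f%% of contention events\n",
			rate, mutexSampledFraction(rate)*100)
	}

	// Switch both profiles off again, matching the default of 0.
	runtime.SetBlockProfileRate(0)
	runtime.SetMutexProfileFraction(0)
}
```

Refer to the linked runtime documentation for how each function interprets its argument, because the two rates are not defined in the same way.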
A note about overhead
Running a backend plugin with profiling enabled but without block and mutex profiles should add only a small amount of overhead. This setup is therefore suitable for production or continuous profiling scenarios.
Sampling only a fraction of block and mutex events, for example with rates of 5 or 10 (for the mutex profile, 20 or 10 percent of events), should in general be fine, but your experience might vary depending on the plugin.
If you're investigating a problem that sampled profiles don't explain, for example slow or queued requests, you can temporarily set both rates to 1 to record every block and mutex event and get the full picture. Turn this off again after you've collected the profiles.
Check for debugging endpoints
Check which debugging endpoints are available by browsing http://localhost:<profiling_port>/debug/pprof.
In this guide, localhost is used, implying that you're connected to the host where Grafana and the plugin are running. If you're connecting from another host, adjust the address as needed.
Additional endpoints
There are some additional godeltaprof endpoints available for profiling. These endpoints are more suitable in a continuous profiling scenario.
These endpoints are:
/debug/pprof/delta_heap
/debug/pprof/delta_block
/debug/pprof/delta_mutex
Collect and analyze profiles
In general, you use the Go command pprof to both collect and analyze profiling data. You can also use curl or similar tools to collect profiles, which can be convenient in environments where you don't have the Go pprof command available.
Next, let's look at some examples of using curl and pprof to collect and analyze memory and CPU profiles.
Analyze high memory usage and memory leaks
When you experience high memory usage or suspect a memory leak, it's useful to collect several heap profiles so that you can analyze and compare them later.
It's a good idea to wait some time, for example, 30 seconds, between collecting each profile to allow memory consumption to increase.
In the following example, localhost is used to imply that you're connected to the host where Grafana and the plugin are running. If you're connecting from another host, then adjust the command as needed.
curl http://localhost:<profiling_port>/debug/pprof/heap > heap1.pprof
sleep 30
curl http://localhost:<profiling_port>/debug/pprof/heap > heap2.pprof
You can then use the pprof tool to compare two heap profiles. For example:
go tool pprof -http=localhost:8081 --base heap1.pprof heap2.pprof
Analyze high CPU usage
When you experience high CPU usage, it's a good idea to collect CPU profiles over a period of time, for example, 30 seconds.
In the following example, localhost is used to imply that you're connected to the host where Grafana and the plugin are running. If you're connecting from another host, then adjust the command as needed.
curl 'http://localhost:<profiling_port>/debug/pprof/profile?seconds=30' > profile.pprof
You can then use the pprof tool to analyze the CPU profile. For example:
go tool pprof -http=localhost:8081 profile.pprof
More information
Refer to the Grafana profiling documentation for further information and instructions on how to profile Grafana.