Metric usage overview
In Grafana Cloud, metrics usage is calculated by looking at two components: active series and data points per minute (DPM).
- An active series is a time series where new data points, or samples, are being generated. When you stop updating a time series, it is no longer considered “active” for billing purposes.
- A data point is a single measured occurrence of a metric within a time series, consisting of a unique value and timestamp.
What are Prometheus time series?
For Prometheus, a time series is a list of timestamp and value pairs identified by a metric name and zero or more pairs of label names and label values.
As an example, consider the following output from Prometheus:
In this example, we can determine:
- The metric name is node_cpu_seconds_total.
- There are three labels, where:
- The label names are
- The label values are
- There are two total time series, since the label value of cpu is different for each time series even though the values of mode are the same.
The example above represents two time series. If one time series were to generate a sample, there would be a specific timestamp and measured value associated with that time series. If the exact same time series generates another value, say 10 minutes later, it would represent a second data point in the same time series, with a new timestamp and value.
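The distinction between a new series and a new data point can be sketched in Python. This is only an illustration of the data model described above, not how Prometheus stores data; the label values, timestamps, and measured values are hypothetical, and the helper name `series_key` is made up for this sketch.

```python
from collections import defaultdict

def series_key(metric_name, labels):
    """Identity of a series: the metric name plus its full set of label pairs."""
    return (metric_name, frozenset(labels.items()))

# Hypothetical samples: (metric name, labels, timestamp in seconds, value).
samples = [
    ("node_cpu_seconds_total", {"cpu": "0", "mode": "idle"}, 0, 100.0),
    ("node_cpu_seconds_total", {"cpu": "1", "mode": "idle"}, 0, 90.0),
    # Same name and labels as the first sample, 10 minutes later:
    # a second data point in an existing series, not a new series.
    ("node_cpu_seconds_total", {"cpu": "0", "mode": "idle"}, 600, 650.0),
]

# Group data points by series identity.
series = defaultdict(list)
for name, labels, ts, value in samples:
    series[series_key(name, labels)].append((ts, value))

print(len(series))  # 2
```

Three samples produce only two series, because the third sample shares its metric name and label pairs with the first.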
Metrics usage can ramp up quickly when a given metric has many different combinations of labels, or high cardinality. For example, with 6 different modes, 10 hosts, and say 4 cpus, node_cpu_seconds_total would count towards 6*10*4 or 240 active series of your usage.
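As a rough sketch of the arithmetic above, the upper bound on series for one metric is the product of the number of distinct values per label. The label names used here (`mode`, `host`, `cpu`) and the function name `max_series` are assumptions for illustration.

```python
from math import prod

def max_series(label_value_counts):
    """Upper bound on series for one metric: every label combination occurs."""
    return prod(label_value_counts.values())

# 6 modes, 10 hosts, 4 cpus, as in the example above.
counts = {"mode": 6, "host": 10, "cpu": 4}
print(max_series(counts))  # 240
```

Adding one more label with, say, 5 distinct values would multiply the bound by 5, which is why high-cardinality labels drive usage up so quickly.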
You can read a detailed explanation of the Prometheus data model in the Prometheus documentation.
What are Graphite time series?
For Graphite, unique time series are equivalent to metric paths.
For example, below is an output from Graphite with eight unique time series:
collect.host1.cpu-0.cpu-idle
collect.host1.cpu-0.cpu-user
collect.host1.cpu-0.cpu-wait
collect.host1.cpu-0.cpu-system
collect.host2.cpu-3.cpu-idle
collect.host2.cpu-3.cpu-user
collect.host2.cpu-3.cpu-wait
collect.host2.cpu-3.cpu-system
If Graphite Tags are used, then below is an output with eight unique time series:
collect.cpu;host=host1;cpu=0;mode=idle
collect.cpu;host=host1;cpu=0;mode=user
collect.cpu;host=host1;cpu=0;mode=wait
collect.cpu;host=host1;cpu=0;mode=system
collect.cpu;host=host2;cpu=3;mode=idle
collect.cpu;host=host2;cpu=3;mode=user
collect.cpu;host=host2;cpu=3;mode=wait
collect.cpu;host=host2;cpu=3;mode=system
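To make the tagged form concrete, here is a minimal Python sketch that splits each tagged path into a name and its tags, then counts unique series. The parser `parse_tagged_path` is a hypothetical helper written for this example; it is not part of Graphite and ignores edge cases such as malformed tags.

```python
def parse_tagged_path(path):
    """Split 'name;tag1=value1;tag2=value2' into (name, {tag: value})."""
    name, *tag_parts = path.split(";")
    tags = dict(part.split("=", 1) for part in tag_parts)
    return name, tags

paths = [
    "collect.cpu;host=host1;cpu=0;mode=idle",
    "collect.cpu;host=host1;cpu=0;mode=user",
    "collect.cpu;host=host1;cpu=0;mode=wait",
    "collect.cpu;host=host1;cpu=0;mode=system",
    "collect.cpu;host=host2;cpu=3;mode=idle",
    "collect.cpu;host=host2;cpu=3;mode=user",
    "collect.cpu;host=host2;cpu=3;mode=wait",
    "collect.cpu;host=host2;cpu=3;mode=system",
]

# A series is identified by its name together with its full set of tags.
unique = set()
for path in paths:
    name, tags = parse_tagged_path(path)
    unique.add((name, frozenset(tags.items())))

print(len(unique))  # 8
```

All eight paths share the name collect.cpu, so the tags alone distinguish the series here.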
Billing is based on usage, and usage is determined by two primary factors: the number of active series and the number of data points per minute (DPM). Each series supports up to six DPM. If your average DPM per active series does not exceed six, then your usage is equal to your total active series. If your average DPM per active series is greater than six, then your usage is based on a combination of your active series and DPM rate. Your bill will be calculated to reflect this.
An active series is a time series where new data points are being generated. How long a series stays active after receiving its last data point differs between Prometheus and Graphite:
- For Prometheus, a time series is considered active if new data points have been generated within the last 15 to 30 minutes.
- For Graphite, a time series is considered active if new data points have been generated within the last eight hours.
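The two windows can be expressed as a simple check. This is a sketch only: the constant and function names are invented for illustration, and because the Prometheus window is given as a range of 15 to 30 minutes, the code assumes the upper bound of 30 minutes.

```python
from datetime import datetime, timedelta

# Assumption: use the upper end of the stated 15 to 30 minute range.
PROMETHEUS_WINDOW = timedelta(minutes=30)
GRAPHITE_WINDOW = timedelta(hours=8)

def is_active(last_sample_at, now, window):
    """A series counts as active if its last data point falls inside the window."""
    return now - last_sample_at <= window

now = datetime(2024, 1, 1, 12, 0)
one_hour_ago = now - timedelta(hours=1)

print(is_active(one_hour_ago, now, PROMETHEUS_WINDOW))  # False
print(is_active(one_hour_ago, now, GRAPHITE_WINDOW))    # True
```

The same series, last updated an hour ago, would no longer count as active under the Prometheus window but would still count under the Graphite window.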
Data points per minute
The second component for calculating total usage in Grafana Cloud is data points per minute (DPM). DPM only affects usage if it is above six, meaning that your metrics are being scraped more often than every 10 seconds. Although we support up to an average rate of 6 DPM per active series, we don’t prevent you from sending data points more frequently by configuring a shorter sampling interval. In this case, your usage billing would be calculated accordingly:
percentile_over_time(.95, active_series[30d]) * (max(6, percentile_over_time(.95, total_dpm[30d])) / 6)
For example, suppose you have 1,000 time series and are sampling and sending data at a rate of 4 DPM per series. Your usage would be equal to 1,000 active series.
Now suppose you still have 1,000 time series that are being sampled but have now increased your polling rate to an average of 12 DPM per time series. This is now equivalent to 12,000 total DPM across all series, and you would be billed as if you were sending 2,000 time series (12,000 ÷ 6 = 2,000).
For comparison, the billing calculation for 1,000 time series at 12 DPM is equivalent to 2,000 time series at the base rate of 6 DPM.
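The two worked examples can be reproduced with a small Python function. This sketch assumes the DPM term is the average DPM per active series, which is the reading consistent with the examples above, and it leaves out the 95th percentile step. The function name `billable_series` and its parameters are hypothetical.

```python
def billable_series(active_series, avg_dpm_per_series, included_dpm=6):
    """Billable usage: active series scaled up only when average DPM exceeds 6."""
    return active_series * max(included_dpm, avg_dpm_per_series) / included_dpm

print(billable_series(1000, 4))   # 1000.0
print(billable_series(1000, 12))  # 2000.0
```

At 4 DPM the `max` clamps the rate to the included 6 DPM, so usage equals the active series count. At 12 DPM the usage doubles.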
95th percentile billing
In Grafana Cloud, we regularly check both the number of active series you send us and your DPM rates. For each new billing period, we bill you based on the 95th percentile of:
- The total number of active series sent.
- The total DPM across all active series.
This reduces the chances of billing you for unexpected spikes: we forgive the highest five percent of usage for each month, which equates to about 36 hours (5% of 720 hours).
For example, if you normally send around 6,000 active series but spike up to 30,000 active series for a total of 24 hours in a month, you would still only be billed at the rate of 6,000 active series.
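The spike example can be checked with a short Python sketch. It assumes one usage measurement per hour over a 720-hour month and uses the nearest-rank percentile method; the actual sampling frequency and percentile method used for billing are not stated here, so treat both as assumptions. The function name is hypothetical.

```python
def percentile_nearest_rank(values, percent):
    """Nearest-rank percentile: smallest value with at least percent% of samples at or below it."""
    ordered = sorted(values)
    # Integer ceiling of percent * n / 100, avoiding floating-point rounding.
    rank = (percent * len(ordered) + 99) // 100
    return ordered[rank - 1]

# Assumed: one measurement per hour for 720 hours, with a 24-hour spike.
hourly = [6000] * (720 - 24) + [30000] * 24
print(percentile_nearest_rank(hourly, 95))  # 6000

# A spike lasting longer than 36 hours is no longer forgiven.
long_spike = [6000] * (720 - 37) + [30000] * 37
print(percentile_nearest_rank(long_spike, 95))  # 30000
```

The 24-hour spike falls entirely within the top five percent of measurements, so the 95th percentile stays at 6,000. Once the spike exceeds 36 hours, it reaches the 95th percentile and affects the bill.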