Schemas

The schemas define how data is aggregated before the Graphite query functions process the aggregated data. There are two types of schemas:

  • Storage-schemas define the interval to which data should be aggregated and how far into the past data should be available for querying.
  • Aggregation-schemas define which aggregation function should be used to aggregate the data.

Each of the two schemas has a global default which can be configured via the following flags:

  • -graphite.querier.schemas.default-storage-schemas-file
  • -graphite.querier.schemas.default-storage-aggregations-file

If the config flag -graphite.querier.schemas.enable-user-overrides is enabled, the global defaults can be overridden on a per-tenant basis.

Querying and updating a tenant’s schemas

Each of the two schema types has an API endpoint that can be used to query and update it. The endpoints are:

  • /graphite/config/storageSchemas
  • /graphite/config/storageAggregations

This is an example request to the Graphite querier which gets the current storage-schemas of the tenant 12345:

~$ curl -H 'X-Scope-OrgId: 12345' http://<graphite-querier>/graphite/config/storageSchemas
[default]
pattern    = .*
intervals  = 0:1s
retentions = 1m:5w

~$

This is an example request to the Graphite querier which updates the storage-schemas of the tenant 12345:

~$ cat storage-schemas.conf
[default]
pattern    = .*
intervals  = 0:1s
retentions = 1m:5w,2h:1y

~$ curl -H 'X-Scope-OrgId: 12345' --data-binary '@storage-schemas.conf' http://<graphite-querier>/graphite/config/storageSchemas
~$

The tenant schemas get stored in the object store bucket configured with the configuration flags prefixed by -graphite.querier.schemas.

Schema caching

When a query is received, the Graphite querier needs to obtain the tenant’s schemas. To avoid fetching them from the object store for every single query, the schemas are cached in process memory, by default for 1h. The cache lifetime is configurable via -graphite.querier.schemas.schema-ttl.

Proactive cache refreshing

When the Graphite querier gets a schema from its local cache to handle a query, it checks whether the cache entry has expired and whether it is in the first or the second half of its cache lifetime:

  • If the entry has expired, it is not used. The latest schema is fetched from the object store, and the fetched schema is used both to handle the query and to populate the cache.
  • If the entry has not expired but is already in the second half of its cache lifetime, the cached value is used. At the same time, an asynchronous job is started which fetches the latest schema from the object store and populates the cache with it.
  • If the entry has not expired and is still in the first half of its cache lifetime, the cached value is used without any further action.

This proactive cache refreshing mechanism prevents queries from being blocked on fetching the latest schema from the object store, provided that the tenant submits queries relatively frequently. If a tenant submits queries only very infrequently, their queries can still be blocked on fetching the latest schemas.
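The refresh logic described above can be sketched as follows. This is a minimal illustration and not the Graphite querier's actual implementation: the class name SchemaCache, the fetch callback standing in for the object store, and the injectable clock are all assumptions made for the example.

```python
import threading
import time
from typing import Callable, Dict, Set, Tuple


class SchemaCache:
    """Illustrative per-tenant schema cache with proactive refreshing."""

    def __init__(self, fetch: Callable[[str], str], ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # hypothetical loader that reads a schema from the object store
        self._ttl = ttl      # seconds; 3600 mirrors the 1h default
        self._clock = clock  # injectable so that the behaviour can be tested
        self._lock = threading.Lock()
        self._entries: Dict[str, Tuple[str, float]] = {}  # tenant -> (schema, stored_at)
        self._refreshing: Set[str] = set()                # tenants with a refresh in flight

    def _refresh(self, tenant: str) -> str:
        """Fetch the latest schema and populate the cache with it."""
        try:
            schema = self._fetch(tenant)
            with self._lock:
                self._entries[tenant] = (schema, self._clock())
            return schema
        finally:
            with self._lock:
                self._refreshing.discard(tenant)

    def get(self, tenant: str) -> str:
        with self._lock:
            entry = self._entries.get(tenant)
            if entry is not None:
                schema, stored_at = entry
                age = self._clock() - stored_at
                if age < self._ttl:
                    # Second half of the lifetime: serve the cached value and
                    # refresh in the background (at most one job per tenant).
                    if age >= self._ttl / 2 and tenant not in self._refreshing:
                        self._refreshing.add(tenant)
                        threading.Thread(target=self._refresh, args=(tenant,),
                                         daemon=True).start()
                    return schema
        # Missing or expired entry: the query blocks on the object store.
        return self._refresh(tenant)
```

With a TTL of 1h, a tenant that queries at least once every 30 minutes always hits a valid entry, so only the background job ever waits on the object store.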

Schema structure and syntax

Each of the two schema types (storage-schemas/storage-aggregations) consists of a list of schemas, where each schema has a pattern property containing a regular expression. When the Graphite querier looks up the schema for a given metric, it loops over the list of schemas and matches their patterns against the metric name. The first schema whose pattern matches is used, which means that the order of the schemas within the configuration files matters.

It is common practice for the last schema in each file to be named default and to have the pattern .*. This is a catch-all which matches all metrics that haven’t matched any of the previous schemas. If there is no default schema and none of the defined schemas matches, a hard-coded default is used.
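The first-match lookup can be illustrated with a short sketch. The Schema type, the match_schema function, and the content of the fallback are hypothetical; the sketch also assumes unanchored regular expression matching, and the values of the real hard-coded default are not specified here.

```python
import re
from typing import List, NamedTuple


class Schema(NamedTuple):
    name: str
    pattern: str
    retentions: str


def match_schema(schemas: List[Schema], metric: str, fallback: Schema) -> Schema:
    """Return the first schema whose pattern matches the metric name."""
    for schema in schemas:
        # Assumption: patterns are unanchored unless they contain ^ or $.
        if re.search(schema.pattern, metric):
            return schema
    # No schema matched: fall back to a default (placeholder for the hard-coded one).
    return fallback


schemas = [
    Schema("apache_busyWorkers", r"^servers\.www.*\.workers\.busyWorkers$", "15s:7d,1m:21d,15m:5y"),
    Schema("default", r".*", "1m:5w"),
]
fallback = Schema("hard-coded", "", "placeholder")
```

If the two entries were listed in the opposite order, the catch-all would match first and the more specific schema would never be used.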

Storage schemas

The format of the storage schemas is very similar to standard Graphite, with two additional parameters. For the documentation of the format refer to storage-schemas.conf.

Example:

[apache_busyWorkers]
pattern = ^servers\.www.*\.workers\.busyWorkers$
retentions = 15s:7d,1m:21d,15m:5y

This example defines that:

  • All metrics matching this regular expression will be available for querying for up to 5y
  • If a query only queries the most recent 7d of data then the data is available at an interval of 15s
  • If a query queries the most recent 21d of data then the data is available at an interval of 1m
  • If a query queries the most recent 5y of data then the data is available at an interval of 15m
  • If a query queries for more than 5y, it will only get data returned for the most recent 5y
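The rules listed above can be expressed as a small sketch. The function names are hypothetical, and the sketch assumes single-letter units and a year of 365 days, as in standard Graphite; it is not the Graphite querier's own code.

```python
from typing import List, Tuple

# Seconds per unit; only single-letter units are handled in this sketch.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def parse_duration(text: str) -> int:
    text = text.strip()
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


def parse_retentions(text: str) -> List[Tuple[int, int]]:
    """Parse '15s:7d,1m:21d' into a list of (interval, duration) in seconds."""
    retentions = []
    for part in text.split(","):
        interval, duration = part.split(":")
        retentions.append((parse_duration(interval), parse_duration(duration)))
    return retentions


def pick_retention(retentions: List[Tuple[int, int]], now: int,
                   query_from: int) -> Tuple[int, int]:
    """Return (interval, effective_from) for a query that starts at query_from."""
    age = now - query_from
    for interval, duration in retentions:
        if age <= duration:
            return interval, query_from
    # Older than the longest retention: only the most recent data is returned.
    interval, duration = retentions[-1]
    return interval, now - duration


retentions = parse_retentions("15s:7d,1m:21d,15m:5y")
```

For instance, a query that starts 10 days ago falls outside the 7d retention but inside the 21d retention, so it is served at an interval of 1m.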

Additions to the standard storage schemas

The Graphite querier supports the standard format of Graphite’s storage-schemas and storage-aggregation config files. In addition to the standard properties, it supports two more parameters in the storage-schemas config file which are specific to the Graphite querier.

Intervals

The intervals parameter can be used to define a minimum interval at which the data of a certain static time range must be returned. This is useful in cases where the retentions would otherwise assign an interval to the data that is smaller than the real interval of the stored data, for example because the data in the store was generated according to an older version of the storage-schema which has since been updated. In such a scenario, the gaps between the stored points would get filled with math.NaN values in order to enforce the interval defined by the retentions, which can lead to unexpected query results.

The intervals parameter is a list of absolute time ranges. Each range has an interval associated with it, which is the minimum interval at which the data of that range shall be returned. Each time range is defined by its beginning and ends at the beginning of the following time range. The last time range is open-ended into the future.

Example:

100:15s,200:30s,300:15s

This string defines that:

  • The data in the time range 100-200 must be returned with an interval of at least 15s
  • The data in the time range 200-300 must be returned with an interval of at least 30s
  • The data in the time range 300-<unlimited> must be returned with an interval of at least 15s

Keep in mind that Graphite requires each metric which gets passed into the Graphite query engine to have a constant interval. This means that if a user queries the time range 100-500, the highest minimum interval within that time range (30s) effectively becomes the minimum interval for the entire queried range.

The minimum interval that has been determined according to the intervals parameter is then used in the selection of the retention interval at which the data shall be returned.

For example, assume that the intervals parameter defines a minimum interval of 30s, as in the above example, and that the retentions which could be used for a query based on its time range are 15s:7d,15m:5y. The Graphite querier will then pick the second retention and return the data at an interval of 15m. Even though the first retention would be valid according to the queried time range, its interval is lower than the minimum interval of 30s defined by intervals.

An example schema with the intervals parameter might look like this:

[default]
pattern = .*
intervals = 1594166400:30min,1625702400:15s
retentions = 15s:7d,15m:5y
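The way the intervals parameter feeds into the retention choice can be sketched as follows, using the 100:15s,200:30s,300:15s string from earlier in this section. The function names are hypothetical. The sketch assumes that the end of the queried range is exclusive, that time before the first range imposes no minimum, and that the coarsest retention is used when no retention satisfies the minimum; none of these details are confirmed by this page.

```python
from typing import List, Tuple

# Seconds per unit; only single-letter units are handled in this sketch.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def parse_duration(text: str) -> int:
    text = text.strip()
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


def parse_intervals(text: str) -> List[Tuple[int, int]]:
    """Parse '100:15s,200:30s' into a list of (range_start, min_interval_seconds)."""
    ranges = []
    for part in text.split(","):
        start, interval = part.split(":")
        ranges.append((int(start), parse_duration(interval)))
    return ranges


def min_interval(ranges: List[Tuple[int, int]], query_from: int, query_until: int) -> int:
    """Highest minimum interval of all ranges that overlap [query_from, query_until)."""
    result = 0
    for i, (start, interval) in enumerate(ranges):
        # A range ends where the next one begins; the last one is open-ended.
        end = ranges[i + 1][0] if i + 1 < len(ranges) else float("inf")
        if start < query_until and query_from < end:
            result = max(result, interval)
    return result


def pick_interval(candidate_intervals: List[int], minimum: int) -> int:
    """First candidate retention interval that is not below the minimum."""
    for interval in candidate_intervals:
        if interval >= minimum:
            return interval
    return candidate_intervals[-1]  # assumption: fall back to the coarsest retention


ranges = parse_intervals("100:15s,200:30s,300:15s")
```

Querying the range 100-500 touches all three ranges, so the minimum becomes 30s, and with candidate retention intervals of 15s and 15m the 15m retention is selected.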

Applying retentions relative to end of queried time range

The parameter relativeToQuery is an optional flag which can be added to a storage schema. If it is not defined, it defaults to false.

If set to true, this flag causes the defined retentions to be applied relative to the end of the queried time range instead of the current wall clock time. For example, with a retentions setting of 15s:7d, a time range of 7d can be queried and is returned at an interval of 15s even if the query requests the time range from now()-1y-7d to now()-1y, because in that situation the 7d retention is applied relative to now()-1y.

An example schema with this parameter might look like this:

[default]
pattern = .*
relativeToQuery = true
retentions = 15s:7d,15m:5y
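The effect of relativeToQuery on the retention choice can be illustrated with a brief sketch. The function interval_for_query is hypothetical and only models the change of reference point described above; retentions are given as (interval, duration) pairs in seconds.

```python
from typing import List, Tuple

DAY = 86400
YEAR = 365 * DAY

# 15s:7d,15m:5y expressed as (interval, duration) in seconds.
RETENTIONS: List[Tuple[int, int]] = [(15, 7 * DAY), (900, 5 * YEAR)]


def interval_for_query(retentions: List[Tuple[int, int]], now: int, query_from: int,
                       query_until: int, relative_to_query: bool = False) -> int:
    """Pick the interval of the first retention that covers the start of the query."""
    # The reference point is the end of the query when relativeToQuery is true,
    # otherwise the current wall clock time.
    reference = query_until if relative_to_query else now
    age = reference - query_from
    for interval, duration in retentions:
        if age <= duration:
            return interval
    return retentions[-1][0]
```

For a query from now()-1y-7d to now()-1y, the default behaviour measures the start of the query against the wall clock and selects 15m, while relativeToQuery = true measures it against the end of the query and selects 15s.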

Storage aggregations

The format of the storage aggregations is exactly the same as in standard Graphite.

For the documentation of the format refer to storage-aggregation.conf.

A valid storage aggregations entry might look like this example:

[all_min]
pattern = \.min$
xFilesFactor = 0.1
aggregationMethod = min
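As a rough illustration of what the two properties in such an entry mean in standard Graphite, the sketch below aggregates the points of one bucket. The aggregate function is hypothetical and covers only a subset of Graphite's aggregation methods. It assumes that xFilesFactor is the fraction of points in a bucket that must be non-null for the result to be non-null, as documented for standard Graphite; whether the Graphite querier applies it in exactly this way is not stated on this page.

```python
from typing import List, Optional


def aggregate(points: List[Optional[float]], method: str,
              x_files_factor: float) -> Optional[float]:
    """Aggregate one bucket of points; None represents a missing point."""
    known = [p for p in points if p is not None]
    # Too few known points: the aggregated value is reported as missing.
    if not known or len(known) / len(points) < x_files_factor:
        return None
    if method == "average":
        return sum(known) / len(known)
    if method == "sum":
        return sum(known)
    if method == "min":
        return min(known)
    if method == "max":
        return max(known)
    if method == "last":
        return known[-1]
    raise ValueError(f"unsupported aggregation method: {method}")
```

With xFilesFactor = 0.1 and aggregationMethod = min, a bucket of four points with two known values yields the smaller of the two, whereas a bucket of twenty points with a single known value yields a missing point.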