Configurations

HM-API and its instances are configured in three main places: the HM-API CLI arguments, the HM-API ConfigMap, and the per-instance ConfigMap.

HM-API CLI arguments

The following table lists the CLI arguments that HM-API accepts, with the default value and an explanation of each.

| Argument | Default | Explanation |
| --- | --- | --- |
| -alsologtostderr | false | Log to standard error as well as files |
| -api-user | test | Basic auth user for the HTTP API |
| -api-pass | test | Basic auth password for the HTTP API |
| -cassandra-service | cassandra | Name of the Cassandra Kubernetes service to access |
| -cluster | cluster | Name of the cluster to be used in reported metrics |
| -config-map | hm-api-config | Name of the ConfigMap used for default settings |
| -default-concurrency | 2 | Default concurrency to be used when rolling out MT deployments |
| -default-domain | hosted-metrics.grafana.net | Domain to append to instance names |
| -dry-run | false | Starts HM-API as a mock server, mostly used for development |
| -kubeconfig | ./config | Path to the kubeconfig file |
| -lets-encrypt | false | Enable Let's Encrypt for automatic creation of SSL keys |
| -listen-host | hm-api | Hostname this HM-API can be reached at |
| -listen_port | 80 | TCP port to listen on |
| -log_backtrace_at | "" | When logging hits line file:N, emit a stack trace |
| -log_dir | "" | If non-empty, write log files in this directory |
| -logtostderr | false | Log to standard error instead of files |
| -namespace | metrictank | Namespace to provision Metrictank clusters in |
| -ns1-enabled | false | Enable NS1 DNS integration |
| -ns1-key | "" | NS1 API key |
| -ns1-zone | grafana.net | DNS zone in which records should be created |
| -service-account | default | Service account used by the HM-API pods |
| -ssl | false | Use HTTPS |
| -ssl_certificate | "" | Path to SSL certificate file |
| -ssl_key | "" | Path to SSL certificate key |
| -stderrthreshold | ERROR | Logs at or above this threshold go to stderr |
| -v | 0 | Enable V-leveled logging at the specified level |
| -vmodule | "" | Comma-separated list of pattern=N settings for file-filtered logging |
| -zk-service | zookeeper | Zookeeper service name |

HM-API ConfigMap

This ConfigMap is named hm-api-config unless another name is given with the -config-map argument. It contains various settings which HM-API reads regularly. If an HM-API default setting is missing, or the whole ConfigMap does not exist, HM-API adds it on startup. As a result, deploying an HM-API version that introduces new defaults can modify this ConfigMap.
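The startup behavior can be pictured as a merge that only adds what is missing. The following is a minimal Python sketch, not HM-API's actual code: it assumes the settings are a flat map of strings and that values already present are left untouched. The function name fill_missing_defaults and the sample values are hypothetical.

```python
def fill_missing_defaults(existing, defaults):
    """Return (merged settings, sorted list of the keys that were added).

    Assumption: keys that already exist keep their current value, and only
    missing keys are filled in from `defaults`.
    """
    merged = dict(existing or {})  # a missing ConfigMap is treated as empty
    added = []
    for key, value in defaults.items():
        if key not in merged:
            merged[key] = value
            added.append(key)
    return merged, sorted(added)


# Hypothetical data: DEPLOYMENT_TIMEOUT was customised by an operator, and a
# newer HM-API version ships CLUSTER as an additional default.
existing = {"DEPLOYMENT_TIMEOUT": "6h"}
defaults = {"DEPLOYMENT_TIMEOUT": "12h", "CLUSTER": ""}
merged, added = fill_missing_defaults(existing, defaults)
```

In this sketch the customised timeout survives and only CLUSTER is added, which matches the idea that a version update can modify the ConfigMap by introducing new keys.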

Deployment Plans

A plan defines an instance size that is known to HM-API. After a plan has been selected for an instance, individual limits can still be changed with specific overrides, but the plan provides the basic template of sizes to deploy.

The plans exist under the key plans.json and the default plans look like this:

    {
      "large": {
        "Name": "Large",
        "Partitions": 128,
        "CPU": 100,
        "Mem": 1000
      },
      "medium": {
        "Name": "Medium",
        "Partitions": 32,
        "CPU": 100,
        "Mem": 1000
      },
      "small": {
        "Name": "Small",
        "Partitions": 8,
        "CPU": 50,
        "Mem": 500
      }
    }

When a new instance gets deployed, one of the configured plans must be selected. It will determine the resource requests & limits, the number of partitions in Kafka, and, based on the partitioning policy, the number of Metrictanks that will get deployed.

With the default partitioning policy (ConstantWithNoOverlap) and its default values (partitionsPerReader and partitionsPerWriter both set to 8), the number of Metrictank read deployments (and of write deployments) is <number of partitions> / 8, so a medium instance with 32 partitions results in 4 shards. Each read deployment gets 2 Metrictank pods, and each write deployment gets 1 Metrictank pod.
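The arithmetic above can be written down as a small Python sketch. It only reproduces the numbers stated in this section for the default policy and default values. The function name metrictank_layout is hypothetical, and rejecting partition counts that are not a multiple of 8 is an assumption made here for simplicity.

```python
def metrictank_layout(partitions, partitions_per_shard=8,
                      pods_per_read_deployment=2, pods_per_write_deployment=1):
    """Deployment and pod counts for the default ConstantWithNoOverlap setup."""
    # Assumption: the partition count divides evenly, as in the default plans.
    if partitions % partitions_per_shard != 0:
        raise ValueError("partitions must be a multiple of partitions_per_shard")
    shards = partitions // partitions_per_shard
    return {
        "read_deployments": shards,
        "write_deployments": shards,
        "read_pods": shards * pods_per_read_deployment,
        "write_pods": shards * pods_per_write_deployment,
    }


# A medium instance has 32 partitions.
layout = metrictank_layout(32)
```

For the medium plan this gives 4 read deployments with 8 pods in total and 4 write deployments with 4 pods in total.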

The plan definitions may be modified by editing the config map with kubectl -n metrictank edit configmap hm-api-config.

Config Defaults

The config defaults are a long list of parameters. Some of them affect the behavior of HM-API itself, but most configure the instances managed by HM-API. These parameters are stored in a map under the key config-defaults.json. All of them can also be overridden in the instance-specific configuration, which is described in the Instance ConfigMap and Overrides sections.

Parameters that configure one of the following services carry a service-specific prefix:

| Prefix | Service |
| --- | --- |
| MT | Metrictank |
| MTIMPORTER | Data Importer Tool |
| GRAPHITE | Graphite |
| GW | Tsdb-Gw |

For example, to configure the Metrictank setting -warm-up-period, the corresponding entry in config-defaults.json is MT_WARM_UP_PERIOD. A few service-specific settings are added to config-defaults.json by default; others can be added freely using the same naming format.
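The naming format can be illustrated with a short Python sketch. It generalizes from the single documented example (-warm-up-period becomes MT_WARM_UP_PERIOD), so the rule of "strip leading dashes, replace dashes with underscores, upper-case" is an assumption for other settings. The function name setting_key is hypothetical.

```python
def setting_key(prefix, flag):
    """Build a config-defaults.json key from a service prefix and a flag name."""
    # Assumed rule, inferred from the documented MT_WARM_UP_PERIOD example.
    name = flag.lstrip("-").replace("-", "_").upper()
    return f"{prefix}_{name}"


key = setting_key("MT", "-warm-up-period")
```

Here key is MT_WARM_UP_PERIOD, the entry named in the text above.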

For a detailed explanation of the settings for Tsdb-Gw, Metrictank, the importer tool and Graphite, please refer to their respective documentation. The following is an overview, with explanations, of all settings which are not specific to one of those services:

| Configuration Parameter | Default | Explanation |
| --- | --- | --- |
| CARBON_TSDB_ADDR | "" | The address to which each instance's carbon-relay-ng should send its metrics |
| CASS_KS_REPLICATION_FACTOR | "3" | The Cassandra keyspace replication factor which gets defined when a new keyspace is created |
| KAFKA_REPLICATION_FACTOR | "3" | The Kafka replication factor which gets defined when a new topic is created |
| DOMAIN | hosted-metrics.grafana.net | The domain to append to instance names. This can also be configured via the CLI argument -default-domain |
| ALIAS | "" | Creates an additional ingress with the specified name. Usually only used in instance configs |
| SSL_SECRET_NAME | wildcard-grafana-net | Name of the secret where the wildcard SSL cert is stored. Will be used to terminate SSL in instance ingresses |
| CLUSTER | "" | Name of the cluster to be used in metrics reported by the created instances |
| POD_SERVICE_ACCOUNT | default | The service account under which service pods run |
| DEPLOYMENT_TIMEOUT | 12h | How long deployment jobs wait for each deployment to become fully ready |
| METRICTANK_VERSION | latest | Metrictank version |
| GRAPHITE_VERSION | latest | Graphite version |
| TSDB_GW_VERSION | latest | Tsdb-Gw version |
| CARBON_RELAY_VERSION | latest | Carbon-Relay-Ng version |
| STATSDAEMON_VERSION | latest | Statsdaemon version |
| MTIMPORTER_VERSION | $METRICTANK_VERSION | Mt Importer utility version |
| METRICTANK_IMAGE | us.gcr.io/metrictank-gcr/metrictank | Metrictank docker image |
| MTIMPORTER_IMAGE | $METRICTANK_IMAGE | Mt Importer utility docker image |
| TSDB_GW_IMAGE | docker.io/raintank/tsdb-gw | Tsdb-Gw docker image |
| GRAPHITE_IMAGE | docker.io/raintank/graphite-mt | Graphite docker image |
| CARBON_RELAY_IMAGE | us.gcr.io/metrictank-gcr/carbon-relay-ng | Carbon-Relay-Ng docker image |
| STATSDAEMON_IMAGE | us.gcr.io/metrictank-gcr/statsdaemon | Statsdaemon docker image |
| HOSTED_METRICS_API_IMAGE | us.gcr.io/metrictank-gcr/hosted-metrics-api | Hosted Metrics API docker image |
| MT_WRITE_NODESELECTORS | "" | Optional setting to define node selectors which will get added to the write pods. For details please read the concepts documentation |
| MT_READ_NODESELECTORS | "" | Same as MT_WRITE_NODESELECTORS, but for read deployments |
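Two of the defaults above, MTIMPORTER_VERSION and MTIMPORTER_IMAGE, are written as $METRICTANK_VERSION and $METRICTANK_IMAGE, which reads as a reference to another setting. The Python sketch below shows one way such references could be resolved. It is an assumption about the semantics, not a description of how HM-API implements them, and the names resolve and _REFERENCE are hypothetical.

```python
import re

# A value that consists only of "$NAME" is treated as a reference to NAME.
_REFERENCE = re.compile(r"^\$([A-Z0-9_]+)$")


def resolve(settings, key, _seen=frozenset()):
    """Follow $NAME references until a literal value is reached."""
    if key in _seen:
        raise ValueError(f"circular reference involving {key}")
    value = settings[key]
    match = _REFERENCE.match(value)
    if match is None:
        return value  # literal value, nothing to resolve
    return resolve(settings, match.group(1), _seen | {key})


# Values copied from the defaults table above.
settings = {
    "METRICTANK_VERSION": "latest",
    "MTIMPORTER_VERSION": "$METRICTANK_VERSION",
    "METRICTANK_IMAGE": "us.gcr.io/metrictank-gcr/metrictank",
    "MTIMPORTER_IMAGE": "$METRICTANK_IMAGE",
}
importer_version = resolve(settings, "MTIMPORTER_VERSION")
importer_image = resolve(settings, "MTIMPORTER_IMAGE")
```

Under this reading, the importer follows the Metrictank image and version unless its own settings are given a literal value.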

Instance ConfigMap

Each instance has a configmap which stores the settings from which the instance resources are generated. The name format of that configmap looks like instance-config-<org>-<name>, so for example the name of the configmap for an instance of the org 22 with the name myinstance would look like instance-config-22-myinstance.
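As an illustration of the naming format, the Python sketch below builds the configmap name and an argument list for inspecting it with kubectl. The helper names are hypothetical, and the namespace metrictank is only the default of the -namespace argument, so it may differ in a given setup.

```python
def instance_configmap_name(org, name):
    """Name of the configmap that holds an instance's settings."""
    return f"instance-config-{org}-{name}"


def kubectl_get_args(org, name, namespace="metrictank"):
    """Argument list for printing the instance configmap as YAML."""
    # Assumption: instances live in the default namespace "metrictank".
    return ["kubectl", "-n", namespace, "get", "configmap",
            instance_configmap_name(org, name), "-o", "yaml"]


cm_name = instance_configmap_name(22, "myinstance")
args = kubectl_get_args(22, "myinstance")
```

The list in args can be passed to subprocess.run on a machine that has kubectl configured for the cluster.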

This configmap contains 5 keys, but only the value of deployment-config.json is relevant for configuring the instance. The values of the other 4 keys are generated from the value of deployment-config.json, so editing them is not recommended: any changes will be overwritten.

| Key | Explanation |
| --- | --- |
| deployment-config.json | JSON string with instance configurations, such as name, orgId, plan etc. Also includes the values from which the following 4 keys get generated |
| index-rules.conf | Mounted as config file in Metrictank. Generated from the value at the key indexRules in deployment-config.json |
| storage-aggregations.conf | Mounted as config file in Metrictank. Generated from the value at the key storage.aggregations in deployment-config.json |
| storage-schemas.conf | Mounted as config file in Metrictank. Generated from the value at the key storage.schemas in deployment-config.json |
| tsdb-auth.ini | Mounted as config file in Tsdb-Gw. Generated from the value at the key tsdbAuth in deployment-config.json |

For an explanation of the contents of the Metrictank and Tsdb-Gw configuration files, please refer to the documentation of the respective service.

This is a list of the other configurations in deployment-config.json, together with an explanation for each of them:

| Key | Explanation |
| --- | --- |
| org | An integer which is the org id of the instance |
| name | A string which is the name of the instance |
| plan | A structure that describes the chosen plan for this instance. It includes a full copy of the plan |
| plan.Name | The name of the plan of the instance |
| plan.ScaleFactor | A factor by which the plan CPU & Memory get multiplied to scale the instance up |
| plan.Partitions | The number of partitions this instance has in Kafka |
| plan.CPU | The amount of CPU the Metrictank pods request, the limit is always set to 8 (1 per partition) |
| plan.Mem | The amount of Memory the Metrictank pods request, the limit is always set to 10x of that value |
| overrides | A map of keys and values, each of which overrides or extends the settings from HM-API's default values. See Overrides |
| overridesA | A map of keys and values, similar to overrides, but these overrides are only applied in Metrictank pods with the color a. For an explanation of the concept of colors, please read the concepts documentation |
| overridesB | Same as overridesA, but for the color b |
| adminKey | An admin key that usually gets auto-generated at the time the instance is created. It can be used to authenticate against the Tsdb-Gw |
| color | The color which is currently active. For an explanation of what "color" means in this context, please read the concepts documentation |
| revision | A counter which starts at 0 and is increased at every change or update to the instance |
| state | A string which shows the current state of the instance. Can be one of active, deploying, deleting, failed |
| importer | Used by the Importer utility to store its state. Refer to the concepts documentation for more details about how the importer works |
| jobData | Used to pass data to asynchronous jobs that get created by HM-API. These jobs execute long-running tasks such as deploying, deleting or modifying instances |
| rolloutConcurrency | Limits how many Metrictank deployments get updated at once. For details please read the concepts documentation |
| partitioningPolicy | A structure that defines which partitioning policy to use and its options. For details please read the concepts documentation |
| partitioningPolicy.name | The name of the partitioning policy to use for the instance |
| partitioningPolicy.options | A map of keys/values passed to the partitioning policy when it is instantiated |
| enableIndexPruningCronjob | A bool to enable/disable the optional index pruning cronjob |
| indexPruningCronjobSchedule | The schedule of the index pruning cronjob, if enabled. A random one will be generated if none is specified |
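To make the plan fields more concrete, the Python sketch below parses a small, made-up deployment-config.json and derives the Metrictank requests from it. The sample document is illustrative only and omits most keys. The function name metrictank_requests is hypothetical, and treating a missing ScaleFactor as 1 is an assumption. The formulas follow the descriptions in this document: the requests are the plan values multiplied by the scale factor, and the memory limit is 10x the memory request.

```python
import json

# Illustrative sample, not a real instance config.
deployment_config = json.loads("""
{
  "org": 22,
  "name": "myinstance",
  "plan": {"Name": "Medium", "ScaleFactor": 2, "Partitions": 32,
           "CPU": 100, "Mem": 1000},
  "overrides": {},
  "color": "a",
  "revision": 0,
  "state": "active"
}
""")


def metrictank_requests(plan):
    """Derive Metrictank resource requests from a plan structure."""
    scale = plan.get("ScaleFactor", 1)  # assumption: a missing factor means 1
    mem_request = plan["Mem"] * scale
    return {
        "cpu_request_millis": plan["CPU"] * scale,
        "mem_request_mib": mem_request,
        "mem_limit_mib": mem_request * 10,
    }


requests = metrictank_requests(deployment_config["plan"])
```

For this sample the CPU request is 200 millis, the memory request is 2000 MiB and the memory limit is 20000 MiB.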

Overrides

The overrides, overridesA and overridesB can contain parameters to override default values.
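How the three maps could combine is sketched below in Python. The order of precedence (HM-API defaults first, then overrides, then the color-specific map) is an assumption that seems natural but is not confirmed here, and the function name effective_settings and the sample values are hypothetical.

```python
def effective_settings(defaults, deployment_config, pod_color):
    """Merge defaults with the instance overrides for a pod of the given color."""
    color_key = {"a": "overridesA", "b": "overridesB"}[pod_color]
    merged = dict(defaults)
    # Assumed precedence: general overrides first, then color-specific ones.
    merged.update(deployment_config.get("overrides") or {})
    merged.update(deployment_config.get(color_key) or {})
    return merged


# Hypothetical values for illustration.
defaults = {"METRICTANK_VERSION": "latest", "KAFKA_REPLICATION_FACTOR": "3"}
config = {
    "overrides": {"MEM_METRICTANK": "2000"},
    "overridesA": {"METRICTANK_VERSION": "1.0"},
}
settings_a = effective_settings(defaults, config, "a")
settings_b = effective_settings(defaults, config, "b")
```

In this sketch only pods of color a see the pinned Metrictank version, while both colors share the memory override.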

| Key | Explanation |
| --- | --- |
| MT_* | override for any metrictank environment variable (for all pods, read/write/query as appropriate) |
| MTIMPORTER_* | override for any metrictank environment variable which should only be applied in the importer tool's pod |
| GW_* | override for any tsdb-gw environment variable |
| GRAPHITE_* | override for any graphite environment variable |
| MTIMPORTER_NUM_PARTITIONS | number of partitions the importer writes to, by default this is generated based on the instance plan |
| MTIMPORTER_TTLS | TTLs as a comma-separated list (default: generated based on storage-schemas.conf) |
| MTIMPORTER_HTTP_ENDPOINT | http endpoint the importer pod will listen on, in the format <host>:<port> |
| MTIMPORTER_PARTITION_SCHEME | partition scheme, such as bySeries |
| MTIMPORTER_URI_PATH | the URI at which the importer should expect posts, such as /chunks |
| MEM_METRICTANK | for read/write: mem request in MiB (limit is 10x this), for query nodes: mem limit in MiB (requested is always 25% of this) (default: plan.mem * scalefactor) |
| MEM_METRICTANK_READ | mem request in MiB (limit is 10x this) for read nodes; overrides MEM_METRICTANK |
| MEM_METRICTANK_WRITE | mem request in MiB (limit is 10x this) for write nodes; overrides MEM_METRICTANK |
| MEM_METRICTANK_QUERY | mem request in MiB (limit is 10x this) for query nodes; overrides MEM_METRICTANK |
| CPU_METRICTANK | cpu request in millis (1/1000 of a core) (for query deployments it is 25% of this) (default: plan.cpu * scalefactor) |
| CPU_METRICTANK_READ | cpu request in millis (1/1000 of a core) for read nodes; overrides CPU_METRICTANK |
| CPU_METRICTANK_WRITE | cpu request in millis (1/1000 of a core) for write nodes; overrides CPU_METRICTANK |
| CPU_METRICTANK_QUERY | cpu request in millis (1/1000 of a core) for query nodes; overrides CPU_METRICTANK |
| NODES_METRICTANK_QUERY | number of query deployments (default: partitions/8) |
| NODES_METRICTANK_READ | number of read deployments (default: partitions/8), ignored by partitioning policy ConstantWithNoOverlap |
| NODES_METRICTANK_WRITE | number of write deployments (default: partitions/8), ignored by partitioning policy ConstantWithNoOverlap |
| MEM_TSDB | mem request for tsdb-gw in MiB (limit is 5x this) (default: 250 + 100 * (plan.scalefactor - 1)) |
| CPU_TSDB | cpu request in millis (1/1000 of a core) (default: plan.cpu * scalefactor) |
| MEM_GRAPHITE | mem request for graphite in MiB (default: 250 + 100 * (plan.scalefactor - 1)) |
| CPU_GRAPHITE | cpu request in millis (1/1000 of a core) for graphite (default: plan.cpu * scalefactor/2) |
| NODES_TSDB | number of tsdb-gw pods (default: max(2, partitions/8)) |
| NODES_GRAPHITE | number of graphite pods (default: max(2, partitions/8)) |
| CARBON_ORG_ID | which org id to set for the carbon-relay-ng container (it uses this for auth when submitting data to tsdb-gw) (default: same as instance org id) |
| CARBON_RELAY_IMAGE | docker image for carbon-relay-ng (default: see hm-api config map) |
| CARBON_RELAY_VERSION | docker image version for carbon-relay-ng (default: see hm-api config map) |
| CARBON_RELAY_MEM_BASE | carbon-relay-ng request memory "base" in Mi (memory request will be base * max(2, partitions/8)) (default: 100) |
| CARBON_RELAY_CPU_BASE | carbon-relay-ng request CPU "base" in millis (cpu request will be base * max(2, partitions/8)) (default: 8) |
| MEM_CARBON_RELAY | carbon-relay-ng request memory override in Mi (limit will be 5x this) |
| CPU_CARBON_RELAY | carbon-relay-ng request CPU override in millis |
| GRAPHITE_IMAGE | docker image for graphite (default: see hm-api config map) |
| GRAPHITE_VERSION | docker image version for graphite (default: see hm-api config map) |
| HOSTED_METRICS_API_IMAGE | docker image for hosted-metrics-api (default: see hm-api config map) |
| METRICTANK_IMAGE | docker image for metrictank (default: see hm-api config map) |
| METRICTANK_VERSION | docker image version for metrictank (default: see hm-api config map) |
| MTIMPORTER_IMAGE | docker image for metrictank-importer (default: see hm-api config map) |
| MTIMPORTER_VERSION | docker image version for metrictank-importer (default: see hm-api config map) |
| STATSD_IMAGE | docker image for statsd (default: see hm-api config map) |
| STATSD_VERSION | docker image version for statsd (default: see hm-api config map) |
| TSDB_GW_IMAGE | docker image for tsdb-gw (default: see hm-api config map) |
| TSDB_GW_VERSION | docker image version for tsdb-gw (default: see hm-api config map) |
| ALIAS | alias hostname for ingress (default: see hm-api config map) |
| CLUSTER | used in metrics and jaeger traces |
| POD_SERVICE_ACCOUNT | name of the ServiceAccount to use to run pods |
| DOMAIN | domain to append to instance names (default: see hm-api config map) |
| SSL_SECRET_NAME | name of k8s secret to load for TLS |
| CARBON_TSDB_ADDR | value to set carbon-relay-ng TSDB_ADDR value to |
| CASS_KS_REPLICATION_FACTOR | replication factor for Cassandra (default 3) |
| KAFKA_REPLICATION_FACTOR | replication factor for Kafka (default 3) |
| MEM_REQ_IDX_PRUNING | mem request in MiB for the optional index pruning cron job |
| MEM_LIM_IDX_PRUNING | mem limit in MiB for the optional index pruning cron job |
| CPU_REQ_IDX_PRUNING | cpu request in millis (1/1000 of a core) for the optional index pruning cron job |
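Several of the defaults listed above are formulas over the plan. The Python sketch below evaluates a few of them as written in the table. The function names are hypothetical, and using integer division for partitions/8 is an assumption, since the table does not say how fractional results are handled.

```python
def tsdb_gw_defaults(partitions, scale_factor):
    """Default pod count and memory for tsdb-gw, per the formulas above."""
    mem_request = 250 + 100 * (scale_factor - 1)
    return {
        "NODES_TSDB": max(2, partitions // 8),  # assumption: integer division
        "MEM_TSDB": mem_request,                # MiB
        "mem_limit_mib": mem_request * 5,       # limit is 5x the request
    }


def carbon_relay_defaults(partitions, mem_base=100, cpu_base=8):
    """Default carbon-relay-ng requests: base * max(2, partitions/8)."""
    factor = max(2, partitions // 8)
    return {
        "mem_request_mi": mem_base * factor,
        "cpu_request_millis": cpu_base * factor,
    }


# Small plan (8 partitions) and medium plan (32 partitions), scale factor 1.
small_gw = tsdb_gw_defaults(8, 1)
medium_gw = tsdb_gw_defaults(32, 1)
medium_relay = carbon_relay_defaults(32)
```

With these inputs the small plan still gets 2 tsdb-gw pods because of the max(2, ...) floor, while the medium plan gets 4 pods and a carbon-relay-ng request of 400 Mi and 32 millis.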