Configurations
There are mainly three relevant places where HM-API and its instances get configured:
HM-API CLI arguments
This section lists the CLI parameters that HM-API accepts, together with an explanation of what each of them does.
Argument | Default | Explanation |
---|---|---|
-alsologtostderr | false | Log to standard error as well as files |
-api-user | test | Basic auth user for the HTTP API |
-api-pass | test | Basic auth password for the HTTP API |
-cassandra-service | cassandra | Name of the Cassandra Kubernetes services to access |
-cluster | cluster | Name of cluster to be used in reported metrics |
-config-map | hm-api-config | Name of configMap used for default settings |
-default-concurrency | 2 | Default concurrency to be used when rolling out MT deployments |
-default-domain | hosted-metrics.grafana.net | Domain to append to instance names |
-dry-run | false | Starts HM-API as mock server, mostly used for development |
-kubeconfig | ./config | Path to the kubeconfig file |
-lets-encrypt | false | Enable lets-encrypt for automatic creation of ssl keys |
-listen-host | hm-api | Hostname this HM-API can be reached at |
-listen_port | 80 | TCP port to listen on |
-log_backtrace_at | "" | When logging hits line file:N, emit a stack trace |
-log_dir | "" | If non-empty, write log files in this directory |
-logtostderr | false | Log to standard error instead of files |
-namespace | metrictank | Namespace to provision metrictank clusters in |
-ns1-enabled | false | Enable NS1 DNS integration |
-ns1-key | "" | NS1 API key |
-ns1-zone | grafana.net | DNS zone in which records should be created |
-service-account | default | Service account used by the HM-API pods |
-ssl | false | Use HTTPS |
-ssl_certificate | "" | Path to SSL certificate file |
-ssl_key | "" | Path to SSL certificate key |
-stderrthreshold | ERROR | Logs at or above this threshold go to stderr |
-v | 0 | Enable V-leveled logging at the specified level |
-vmodule | "" | Comma separated list of pattern=N settings for file-filtered logging |
-zk-service | zookeeper | Zookeeper service name |
HM-API ConfigMap
By default the name of this ConfigMap is hm-api-config, unless specified otherwise with the -config-map argument. It contains various settings which are regularly read and used by HM-API. If an HM-API default setting is not present, or the whole ConfigMap doesn’t exist, then HM-API will add it on startup. This means it is possible that this ConfigMap gets modified when an HM-API version update is deployed which introduces new defaults.
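The startup behavior described above can be pictured as a merge that only fills in missing keys. The following is a minimal sketch, not HM-API's actual code: the function name `add_missing_defaults` is hypothetical, and the assumption that values already present in the ConfigMap are left untouched is inferred from the wording "not present".

```python
def add_missing_defaults(current: dict, defaults: dict) -> dict:
    """Return a copy of `current` with any missing default keys filled in.

    Assumption: keys that already exist keep their current value.
    """
    merged = dict(current)
    for key, value in defaults.items():
        # Only absent keys are added; existing settings are not overwritten.
        merged.setdefault(key, value)
    return merged


# Hypothetical ConfigMap content before a version update.
existing = {"DOMAIN": "example.org"}
# Defaults shipped with the new version (values taken from the tables in this document).
defaults = {"DOMAIN": "hosted-metrics.grafana.net", "DEPLOYMENT_TIMEOUT": "12h"}

merged = add_missing_defaults(existing, defaults)
```

Under these assumptions, a version update that introduces a new default adds that key to the ConfigMap while leaving customized settings alone.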
Deployment Plans
A plan defines what sizes of instances are known to HM-API. When a plan has been selected for an instance, it is still possible to override certain limits with specific overrides, but the plan gives a basic template of sizes which should be deployed.
The plans exist under the key plans.json and the default plans look like this:
{
"large": {
"Name": "Large",
"Partitions": 128,
"CPU": 100,
"Mem": 1000
},
"medium": {
"Name": "Medium",
"Partitions": 32,
"CPU": 100,
"Mem": 1000
},
"small": {
"Name": "Small",
"Partitions": 8,
"CPU": 50,
"Mem": 500
}
}
When a new instance gets deployed, one of the configured plans must be selected. It will determine the resource requests & limits, the number of partitions in Kafka, and, based on the partitioning policy, the number of Metrictanks that will get deployed.
With the default partitioning policy (ConstantWithNoOverlap) and its default values (partitionsPerReader/partitionsPerWriter set to 8), the number of Metrictank read deployments (and write deployments) is <number of partitions> / 8, so a medium instance with 32 partitions results in 4 shards. Each read deployment gets 2 Metrictank pods, and each write deployment 1 Metrictank pod.
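The arithmetic above can be sketched in a few lines of Python. This is an illustration only: the function name and the returned keys are hypothetical, and the use of integer division is an assumption that holds for the default plans, whose partition counts are all multiples of 8.

```python
def metrictank_layout(partitions: int,
                      partitions_per_reader: int = 8,
                      partitions_per_writer: int = 8) -> dict:
    """Deployment and pod counts for the ConstantWithNoOverlap policy defaults."""
    read_deployments = partitions // partitions_per_reader
    write_deployments = partitions // partitions_per_writer
    return {
        "read_deployments": read_deployments,
        "write_deployments": write_deployments,
        # Each read deployment gets 2 pods, each write deployment 1 pod.
        "read_pods": read_deployments * 2,
        "write_pods": write_deployments * 1,
    }


medium = metrictank_layout(32)   # the "medium" plan has 32 partitions
large = metrictank_layout(128)   # the "large" plan has 128 partitions
```

For the medium plan this gives 4 read and 4 write deployments, that is 8 read pods and 4 write pods.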
The plan definitions may be modified by editing the config map with kubectl -n metrictank edit configmap hm-api-config.
Config Defaults
The config defaults are a long list of parameters. Some of them affect the behavior of HM-API itself, but most configure the instances managed by HM-API. These parameters exist in a map under the key config-defaults.json. All of them can also be overridden in the instance-specific configuration, which is described in the Instance ConfigMap section.
Parameters that configure one of the following services carry a service-specific prefix:
Prefix | Service |
---|---|
MT | Metrictank |
MTIMPORTER | Data Importer Tool |
GRAPHITE | Graphite |
GW | Tsdb-Gw |
For example, to configure the Metrictank setting -warm-up-period, the corresponding entry in config-defaults.json would be MT_WARM_UP_PERIOD. A few service-specific settings get added to config-defaults.json by default; others can freely be added by using the same string format.
For a detailed explanation of the settings for Tsdb-Gw, Metrictank, the importer tool and Graphite, please refer to their respective documentation. The following is an overview of all settings that are not specific to one of those four services, with an explanation of each:
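The naming convention from the example can be expressed as a small helper. This is a sketch: `config_key` is a hypothetical name, and applying the same transformation (strip the leading dash, replace dashes with underscores, uppercase, prepend the prefix) to other flags generalizes from the single `-warm-up-period` example given above.

```python
def config_key(prefix: str, flag: str) -> str:
    """Turn a service CLI flag into its config-defaults.json key.

    Assumption: every flag follows the same pattern as -warm-up-period.
    """
    name = flag.lstrip("-").replace("-", "_").upper()
    return f"{prefix}_{name}"


key = config_key("MT", "-warm-up-period")
```

With the prefixes from the table, `config_key("MT", "-warm-up-period")` yields `MT_WARM_UP_PERIOD`.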
Configuration Parameter | Default | Explanation |
---|---|---|
CARBON_TSDB_ADDR | "" | The address where each instance’s carbon-relay-ng should send its metrics to |
CASS_KS_REPLICATION_FACTOR | "3" | The Cassandra keyspace replication factor which gets defined when a new keyspace is created |
KAFKA_REPLICATION_FACTOR | "3" | The Kafka replication factor which gets defined when a new topic is created |
DOMAIN | hosted-metrics.grafana.net | The domain to append to instance names. This can also be configured via the CLI parameter -default-domain |
ALIAS | "" | Creates an additional ingress with the specified name. Usually only used in instance configs |
SSL_SECRET_NAME | wildcard-grafana-net | Name of the secret where the wildcard SSL cert is stored. Will be used to terminate SSL in instance ingresses |
CLUSTER | "" | Name of cluster to be used in metrics reported by the created instances |
POD_SERVICE_ACCOUNT | default | The service account as which service pods run |
DEPLOYMENT_TIMEOUT | 12h | How long deployment jobs wait for each deployment to become fully ready |
METRICTANK_VERSION | latest | Metrictank version |
GRAPHITE_VERSION | latest | Graphite version |
TSDB_GW_VERSION | latest | Tsdb-Gw version |
CARBON_RELAY_VERSION | latest | Carbon-Relay-Ng version |
STATSDAEMON_VERSION | latest | Statsdaemon version |
MTIMPORTER_VERSION | $METRICTANK_VERSION | Mt Importer utility version |
METRICTANK_IMAGE | us.gcr.io/metrictank-gcr/metrictank | Metrictank docker image |
MTIMPORTER_IMAGE | $METRICTANK_IMAGE | Mt Importer utility docker image |
TSDB_GW_IMAGE | docker.io/raintank/tsdb-gw | Tsdb-Gw docker image |
GRAPHITE_IMAGE | docker.io/raintank/graphite-mt | Graphite docker image |
CARBON_RELAY_IMAGE | us.gcr.io/metrictank-gcr/carbon-relay-ng | Carbon-Relay-Ng docker image |
STATSDAEMON_IMAGE | us.gcr.io/metrictank-gcr/statsdaemon | Statsdaemon docker image |
HOSTED_METRICS_API_IMAGE | us.gcr.io/metrictank-gcr/hosted-metrics-api | Hosted Metrics API docker image |
MT_WRITE_NODESELECTORS | "" | Optional setting to define node selectors which will get added to the write pods. For details please read the concepts documentation. |
MT_READ_NODESELECTORS | "" | Same as the MT_WRITE_NODESELECTORS but for read deployments |
Instance ConfigMap
Each instance has a configmap which stores the settings from which the instance resources are generated. The name of that configmap has the format instance-config-<org>-<name>, so for example the configmap for an instance of the org 22 with the name myinstance would be named instance-config-22-myinstance.
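As a small illustration of the naming scheme, the helpers below build and parse such a name. Both function names are hypothetical, and the parser assumes that the org id is an integer (as stated for the org key of deployment-config.json), so everything after the first dash following the org id is treated as the instance name.

```python
PREFIX = "instance-config-"


def instance_configmap_name(org: int, name: str) -> str:
    """Build the configmap name for an instance."""
    return f"{PREFIX}{org}-{name}"


def parse_instance_configmap_name(cm_name: str) -> tuple:
    """Split a configmap name back into (org, name)."""
    if not cm_name.startswith(PREFIX):
        raise ValueError(f"not an instance configmap: {cm_name!r}")
    # The org id contains no dashes, so split at the first dash only.
    org, _, name = cm_name[len(PREFIX):].partition("-")
    return int(org), name


cm_name = instance_configmap_name(22, "myinstance")
```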
This configmap contains 5 keys, but only the value of deployment-config.json is relevant for configuring the instance. The values of the other 4 keys are generated from the value of deployment-config.json, so editing them is not recommended: any changes will be overwritten.
Key | Explanation |
---|---|
deployment-config.json | JSON string with instance configurations, such as name, orgId, plan etc. Also includes the values from which the following 4 keys get generated. |
index-rules.conf | Mounted as config file in Metrictank. Generated from the value at the key indexRules in deployment-config.json. |
storage-aggregations.conf | Mounted as config file in Metrictank. Generated from the value at the key storage.aggregations in deployment-config.json. |
storage-schemas.conf | Mounted as config file in Metrictank. Generated from the value at the key storage.schemas in deployment-config.json. |
tsdb-auth.ini | Mounted as config file in Tsdb-Gw. Generated from the value at the key tsdbAuth in deployment-config.json. |
For an explanation of the contents of the Metrictank and Tsdb-Gw configuration files, please refer to the documentation of the respective service.
This is a list of the other configurations in deployment-config.json, together with an explanation for each of them:
Key | Explanation |
---|---|
org | An integer which is the org id of the instance |
name | A string which is the name of the instance |
plan | A structure that describes the chosen plan for this instance, it includes a full copy of the plan |
plan.Name | The name of the plan of the instance |
plan.ScaleFactor | A factor by which the plan CPU & Memory get multiplied to scale the instance up |
plan.Partitions | The number of partitions this instance has in Kafka |
plan.CPU | The amount of CPU the Metrictank pods request, the limit is always set to 8 (1 per partition) |
plan.Mem | The amount of Memory the Metrictank pods request, the limit is always set to 10x of that value |
overrides | A map of keys and values, each of which override / extend the settings from HM-API’s default values. See overrides |
overridesA | A map of keys and values, similar to overrides, but these overrides are only applied in Metrictank pods with the color a. For an explanation of the concept of colors, please read the concepts documentation |
overridesB | Same as overridesA, but for the color b |
adminKey | An admin key that usually gets auto-generated at the time the instance is created. It can be used to authenticate against the Tsdb-Gw |
color | The color which is currently active. For an explanation of what “color” means in this context, please read the concepts documentation |
revision | The revision is a counter which starts at 0 and gets increased at every change / update to the instance |
state | A string which shows the current state of the instance. Can be one of active, deploying, deleting, failed |
importer | Used by the Importer utility to store its state, refer to the concepts documentation for more details about how the importer works |
jobData | Used to pass data to asynchronous jobs that get created by HM-API, these jobs execute long running tasks such as deploying/deleting/modifying instances |
rolloutConcurrency | Limits how many Metrictank deployments ever get updated at once. For details please read the concepts documentation |
partitioningPolicy | A structure that defines which partitioning policy to use and its options. For details please read the concepts documentation |
partitioningPolicy.name | The name of the partitioning policy to use for the instance |
partitioningPolicy.options | A map of keys/values passed to the partitioning policy when it is instantiated |
enableIndexPruningCronjob | A bool to enable/disable the optional index pruning cronjob |
indexPruningCronjobSchedule | The schedule of the index pruning cronjob, if enabled. A random one will be generated if none is specified |
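One way to picture how overrides, overridesA and overridesB relate to the HM-API defaults is as layers applied in order. The sketch below is an assumption, not HM-API's implementation: the function name is hypothetical, and the ordering (defaults first, then overrides, then the color-specific map) is a guess that the text above does not confirm. The example values are made up.

```python
def effective_settings(defaults: dict, deployment_config: dict, pod_color: str) -> dict:
    """Settings for a Metrictank pod of the given color ("a" or "b").

    Assumed precedence: defaults < overrides < overridesA / overridesB.
    """
    color_key = {"a": "overridesA", "b": "overridesB"}[pod_color]
    settings = dict(defaults)
    settings.update(deployment_config.get("overrides", {}))
    settings.update(deployment_config.get(color_key, {}))
    return settings


defaults = {"METRICTANK_VERSION": "latest", "DEPLOYMENT_TIMEOUT": "12h"}
# Hypothetical excerpt of a deployment-config.json value.
deployment_config = {
    "overrides": {"MT_WARM_UP_PERIOD": "1h"},
    "overridesA": {"METRICTANK_VERSION": "example-tag"},
}

settings_a = effective_settings(defaults, deployment_config, "a")
settings_b = effective_settings(defaults, deployment_config, "b")
```

In this sketch only pods with the color a pick up the hypothetical image tag, while both colors share the setting from overrides.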
Overrides
The overrides, overridesA and overridesB can contain parameters to override default values.
Key | Explanation |
---|---|
MT_* | override for any metrictank environment variable (for all pods, read/write/query as appropriate) |
MTIMPORTER_* | override for any metrictank environment variable which should only be applied in the importer tool’s pod |
GW_* | override for any tsdb-gw environment variable |
GRAPHITE_* | override for any graphite environment variable |
MTIMPORTER_NUM_PARTITIONS | number of partitions the importer writes to, by default this is generated based on the instance plan |
MTIMPORTER_TTLS | TTLs as a comma-separated list. default value is generated based on storage-schemas.conf |
MTIMPORTER_HTTP_ENDPOINT | http endpoint the importer pod will listen on, in the format <host>:<port> |
MTIMPORTER_PARTITION_SCHEME | partition schema, such as bySeries |
MTIMPORTER_URI_PATH | the URI at which the importer should expect posts, such as /chunks |
MEM_METRICTANK | for read/write: mem request in MiB (limit is 10x this), for query nodes: mem limit in MiB (requested is always 25% of this) (default: plan.mem * scalefactor) |
MEM_METRICTANK_READ | mem request in MiB (limit is 10x this) for read nodes; overrides MEM_METRICTANK |
MEM_METRICTANK_WRITE | mem request in MiB (limit is 10x this) for write nodes; overrides MEM_METRICTANK |
MEM_METRICTANK_QUERY | mem request in MiB (limit is 10x this) for query nodes; overrides MEM_METRICTANK |
CPU_METRICTANK | cpu request in millis (1/1000 of a core) (for query deployments it is 25% of this) (default: plan.cpu * scalefactor) |
CPU_METRICTANK_READ | cpu request in millis (1/1000 of a core) for read nodes; overrides CPU_METRICTANK |
CPU_METRICTANK_WRITE | cpu request in millis (1/1000 of a core) for write nodes; overrides CPU_METRICTANK |
CPU_METRICTANK_QUERY | cpu request in millis (1/1000 of a core) for query nodes; overrides CPU_METRICTANK |
NODES_METRICTANK_QUERY | number of query deployments (default: partitions/8) |
NODES_METRICTANK_READ | number of read deployments (default: partitions/8), ignored by partitioning policy ConstantWithNoOverlap |
NODES_METRICTANK_WRITE | number of write deployments (default: partitions/8), ignored by partitioning policy ConstantWithNoOverlap |
MEM_TSDB | mem request for tsdb-gw in MiB (limit is 5x this) (default: 250 + 100 * (plan.scalefactor - 1)) |
CPU_TSDB | cpu request in millis (1/1000 of a core) (default: plan.cpu * scalefactor) |
MEM_GRAPHITE | mem request for graphite in MiB (default: 250 + 100 * (plan.scalefactor - 1)) |
CPU_GRAPHITE | cpu request in millis (1/1000 of a core) for graphite (default: plan.cpu * scalefactor/2) |
NODES_TSDB | number of tsdb-gw pods (default: max(2, partitions/8)) |
NODES_GRAPHITE | number of graphite pods (default: max(2, partitions/8)) |
CARBON_ORG_ID | which org-Id to set for the carbon-relay-ng container (it uses this for auth when submitting data to tsdb-gw) (default: same as instance org id) |
CARBON_RELAY_IMAGE | docker image for carbon-relay-ng (default: see hm-api config map) |
CARBON_RELAY_VERSION | docker image version for carbon-relay-ng (default: see hm-api config map) |
CARBON_RELAY_MEM_BASE | carbon-relay-ng request memory “base” in Mi (memory request will be base * max(2, partitions/8)) (default: 100) |
CARBON_RELAY_CPU_BASE | carbon-relay-ng request CPU “base” in millis (cpu request will be base * max(2, partitions/8)) (default: 8) |
MEM_CARBON_RELAY | carbon-relay-ng request memory override in Mi (limit will be 5x this) |
CPU_CARBON_RELAY | carbon-relay-ng request CPU override in millis |
GRAPHITE_IMAGE | docker image for graphite (default: see hm-api config map) |
GRAPHITE_VERSION | docker image version for graphite (default: see hm-api config map) |
HOSTED_METRICS_API_IMAGE | docker image for hosted-metrics-api (default: see hm-api config map) |
METRICTANK_IMAGE | docker image for metrictank (default: see hm-api config map) |
METRICTANK_VERSION | docker image version for metrictank (default: see hm-api config map) |
MTIMPORTER_IMAGE | docker image for metrictank-importer (default: see hm-api config map) |
MTIMPORTER_VERSION | docker image version for metrictank-importer (default: see hm-api config map) |
STATSD_IMAGE | docker image for statsd (default: see hm-api config map) |
STATSD_VERSION | docker image version for statsd (default: see hm-api config map) |
TSDB_GW_IMAGE | docker image for tsdb-gw (default: see hm-api config map) |
TSDB_GW_VERSION | docker image version for tsdb-gw (default: see hm-api config map) |
ALIAS | alias hostname for ingress (default: see hm-api config map) |
CLUSTER | used in metrics and jaeger traces |
POD_SERVICE_ACCOUNT | name of the ServiceAccount to use to run pods |
DOMAIN | TODO |
SSL_SECRET_NAME | name of k8s secret to load for TLS |
CARBON_TSDB_ADDR | value to set carbon-relay-ng TSDB_ADDR value to |
CASS_KS_REPLICATION_FACTOR | replication factor for Cassandra (default 3) |
KAFKA_REPLICATION_FACTOR | replication factor for Kafka (default 3) |
MEM_REQ_IDX_PRUNING | mem request in MiB for the optional index pruning cron job |
MEM_LIM_IDX_PRUNING | mem limit in MiB for the optional index pruning cron job |
CPU_REQ_IDX_PRUNING | cpu request in millis (1/1000 of a core) for the optional index pruning cron job |
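Several of the defaults in the table are formulas over the plan. The following sketch works through two of them: the memory of Metrictank read pods, including the precedence of MEM_METRICTANK_READ over MEM_METRICTANK, and the tsdb-gw defaults. The function names are hypothetical, and three points are assumptions: override values arrive as strings, partitions/8 is an integer division, and a missing ScaleFactor counts as 1.

```python
def read_memory(plan: dict, overrides: dict) -> tuple:
    """(request, limit) in MiB for Metrictank read pods."""
    default = plan["Mem"] * plan.get("ScaleFactor", 1)
    # MEM_METRICTANK_READ overrides MEM_METRICTANK, which overrides the plan default.
    request = int(overrides.get("MEM_METRICTANK_READ",
                                overrides.get("MEM_METRICTANK", default)))
    return request, request * 10  # limit is 10x the request


def tsdb_gw_defaults(plan: dict) -> dict:
    """Default pod count and memory (MiB) for tsdb-gw when nothing is overridden."""
    scale = plan.get("ScaleFactor", 1)
    mem_request = 250 + 100 * (scale - 1)
    return {
        "pods": max(2, plan["Partitions"] // 8),
        "mem_request": mem_request,
        "mem_limit": mem_request * 5,  # limit is 5x the request
    }


# The default "medium" plan with a hypothetical ScaleFactor of 2.
plan = {"Name": "Medium", "Partitions": 32, "CPU": 100, "Mem": 1000, "ScaleFactor": 2}

no_override = read_memory(plan, {})
general = read_memory(plan, {"MEM_METRICTANK": "1500"})
specific = read_memory(plan, {"MEM_METRICTANK": "1500", "MEM_METRICTANK_READ": "3000"})
gateway = tsdb_gw_defaults(plan)
```

Without overrides the read pods request 2000 MiB with a 20000 MiB limit, and tsdb-gw defaults to 4 pods requesting 350 MiB each.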