Grafana Agent Operator Custom Resource Quickstart
In this guide you’ll learn how to deploy Agent Operator’s custom resources into your Kubernetes cluster.
You’ll roll out the following custom resources (CRs):
- A `GrafanaAgent` resource, which discovers one or more `MetricsInstance` and `LogsInstance` resources.
- A `MetricsInstance` resource that defines where to ship collected metrics. Under the hood, this rolls out a Grafana Agent StatefulSet that will scrape and ship metrics to a `remote_write` endpoint.
- A `ServiceMonitor` resource to collect cAdvisor and kubelet metrics. Under the hood, this configures the `MetricsInstance` / Agent StatefulSet.
- A `LogsInstance` resource that defines where to ship collected logs. Under the hood, this rolls out a Grafana Agent DaemonSet that will tail log files on your cluster nodes.
- A `PodLogs` resource to collect container logs from Kubernetes Pods. Under the hood, this configures the `LogsInstance` / Agent DaemonSet.
To learn more about the custom resources Operator provides and their hierarchy, please consult Operator architecture.
Note: Agent Operator is currently in beta and its custom resources are subject to change as the project evolves. It currently supports the metrics and logs subsystems of Grafana Agent. Integrations and traces support is coming soon.
By the end of this guide, you will be scraping and shipping cAdvisor and Kubelet metrics to a Prometheus-compatible metrics endpoint. You’ll also be collecting and shipping your Pods’ container logs to a Loki-compatible logs endpoint.
Prerequisites
Before you begin, make sure that you have installed Agent Operator into your cluster. You can learn how to do this in the Agent Operator installation guide.
Step 1: Deploy GrafanaAgent resource
In this step you'll roll out a `GrafanaAgent` resource. A `GrafanaAgent` resource discovers `MetricsInstance` and `LogsInstance` resources and defines the Grafana Agent image, Pod requests, limits, affinities, and tolerations. Pod attributes can only be defined at the `GrafanaAgent` level and are propagated to `MetricsInstance` and `LogsInstance` Pods. To learn more, please see the GrafanaAgent Custom Resource Definition.
Note: Due to the variety of possible deployment architectures, the official Agent Operator Helm chart does not provide built-in templates for the custom resources described in this quickstart. These must be configured and deployed manually. However, you are encouraged to template and add the following manifests to your own in-house Helm charts and GitOps flows.
Roll out the following manifests in your cluster:
```yaml
apiVersion: monitoring.grafana.com/v1alpha1
kind: GrafanaAgent
metadata:
  name: grafana-agent
  namespace: default
  labels:
    app: grafana-agent
spec:
  image: grafana/agent:v0.25.1
  logLevel: info
  serviceAccountName: grafana-agent
  metrics:
    instanceSelector:
      matchLabels:
        agent: grafana-agent-metrics
    externalLabels:
      cluster: cloud
  logs:
    instanceSelector:
      matchLabels:
        agent: grafana-agent-logs

---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana-agent
  namespace: default

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana-agent
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/proxy
  - nodes/metrics
  - services
  - endpoints
  - pods
  - events
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  - /metrics/cadvisor
  verbs:
  - get

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana-agent
subjects:
- kind: ServiceAccount
  name: grafana-agent
  namespace: default
```
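If you've saved these manifests to a file, you can roll them out with `kubectl`. A minimal sketch, assuming a file named `grafana-agent.yaml` and that the Agent Operator CRDs are already installed (a prerequisite of this guide):

```bash
# Roll out the GrafanaAgent, ServiceAccount, ClusterRole, and ClusterRoleBinding.
kubectl apply -f grafana-agent.yaml

# Confirm that the GrafanaAgent custom resource exists.
kubectl get grafanaagents -n default
```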
This creates a ServiceAccount, ClusterRole, and ClusterRoleBinding for the GrafanaAgent resource. It also creates a GrafanaAgent resource and specifies an Agent image version. Finally, the GrafanaAgent resource specifies `MetricsInstance` and `LogsInstance` selectors. These search for MetricsInstances and LogsInstances in the same namespace with labels matching `agent: grafana-agent-metrics` and `agent: grafana-agent-logs`, respectively. It also sets a `cluster: cloud` label for all metrics shipped to your Prometheus-compatible endpoint; change this label to your desired cluster name. To search for MetricsInstances or LogsInstances in a different namespace, use the `instanceNamespaceSelector` field. To learn more about this field, please consult the `GrafanaAgent` CRD specification.
The full hierarchy of custom resources is as follows:
- `GrafanaAgent`
  - `MetricsInstance`
    - `PodMonitor`
    - `Probe`
    - `ServiceMonitor`
  - `LogsInstance`
    - `PodLogs`
Deploying a GrafanaAgent resource on its own will not spin up any Agent Pods. Agent Operator will create Agent Pods once MetricsInstance and LogsInstance resources have been created. In the next step, you'll roll out a `MetricsInstance` resource to scrape cAdvisor and Kubelet metrics and ship these to your Prometheus-compatible metrics endpoint.
Step 2: Deploy a MetricsInstance resource
In this step you'll roll out a MetricsInstance resource. MetricsInstance resources define a `remote_write` sink for metrics and configure one or more selectors to watch for creation and updates to `*Monitor` objects (PodMonitor, Probe, and ServiceMonitor). These objects let you define Agent scrape targets using Kubernetes manifests.
Roll out the following manifest into your cluster:
```yaml
apiVersion: monitoring.grafana.com/v1alpha1
kind: MetricsInstance
metadata:
  name: primary
  namespace: default
  labels:
    agent: grafana-agent-metrics
spec:
  remoteWrite:
  - url: your_remote_write_URL
    basicAuth:
      username:
        name: primary-credentials-metrics
        key: username
      password:
        name: primary-credentials-metrics
        key: password

  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the MetricsInstance CR
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      instance: primary

  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the MetricsInstance CR.
  podMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      instance: primary

  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the MetricsInstance CR.
  probeNamespaceSelector: {}
  probeSelector:
    matchLabels:
      instance: primary
```
Be sure to replace the `remote_write` URL and customize the namespace and label configuration as necessary. The `agent: grafana-agent-metrics` label associates this MetricsInstance with the GrafanaAgent resource deployed in the previous step, and the MetricsInstance watches for creation and updates to `*Monitor` objects with the `instance: primary` label.
Once you've rolled out this manifest, create the `basicAuth` credentials using a Kubernetes Secret:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: primary-credentials-metrics
  namespace: default
stringData:
  username: 'your_cloud_prometheus_username'
  password: 'your_cloud_prometheus_API_key'
```
If you're using Grafana Cloud, you can find your hosted Prometheus endpoint username and password in the Grafana Cloud Portal. You may wish to base64-encode these values yourself; in this case, use `data` instead of `stringData`.
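For example, a minimal sketch of encoding the placeholder values on the command line before placing them under `data`:

```bash
# Base64-encode each credential; paste the output into the Secret's data field.
echo -n 'your_cloud_prometheus_username' | base64
echo -n 'your_cloud_prometheus_API_key' | base64
```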
Once you've rolled out the `MetricsInstance` and its Secret, you can confirm that the MetricsInstance Agent is up and running with `kubectl get pod`. Since we haven't defined any monitors yet, this Agent will not have any scrape targets defined. In the next step, we'll create scrape targets for the cAdvisor and kubelet endpoints exposed by the `kubelet` service in the cluster.
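A quick check, assuming you deployed everything into the `default` namespace as in the manifests above (exact resource names may vary):

```bash
# The MetricsInstance Agent runs as a StatefulSet managed by Agent Operator.
kubectl get statefulsets -n default
kubectl get pods -n default
```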
Step 3: Create ServiceMonitors for kubelet and cAdvisor endpoints
In this step, you'll create ServiceMonitors for kubelet and cAdvisor metrics exposed by the `kubelet` Service. Every node in your cluster exposes kubelet and cAdvisor metrics at `/metrics` and `/metrics/cadvisor`, respectively. Agent Operator creates a `kubelet` Service that exposes these Node endpoints so that they can be scraped using ServiceMonitors.
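You can inspect this generated Service and its Endpoints before creating the ServiceMonitors. This sketch assumes Agent Operator created the `kubelet` Service in the `default` namespace, which is its default behavior and what the ServiceMonitors below select:

```bash
# The kubelet Service and its Endpoints (one address per Node).
kubectl get service kubelet -n default
kubectl get endpoints kubelet -n default
```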
To scrape these two endpoints, roll out the following two ServiceMonitors in your cluster:
- Kubelet ServiceMonitor
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    instance: primary
  name: kubelet-monitor
  namespace: default
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    interval: 60s
    metricRelabelings:
    - action: keep
      regex: kubelet_cgroup_manager_duration_seconds_count|go_goroutines|kubelet_pod_start_duration_seconds_count|kubelet_runtime_operations_total|kubelet_pleg_relist_duration_seconds_bucket|volume_manager_total_volumes|kubelet_volume_stats_capacity_bytes|container_cpu_usage_seconds_total|container_network_transmit_bytes_total|kubelet_runtime_operations_errors_total|container_network_receive_bytes_total|container_memory_swap|container_network_receive_packets_total|container_cpu_cfs_periods_total|container_cpu_cfs_throttled_periods_total|kubelet_running_pod_count|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate|container_memory_working_set_bytes|storage_operation_errors_total|kubelet_pleg_relist_duration_seconds_count|kubelet_running_pods|rest_client_request_duration_seconds_bucket|process_resident_memory_bytes|storage_operation_duration_seconds_count|kubelet_running_containers|kubelet_runtime_operations_duration_seconds_bucket|kubelet_node_config_error|kubelet_cgroup_manager_duration_seconds_bucket|kubelet_running_container_count|kubelet_volume_stats_available_bytes|kubelet_volume_stats_inodes|container_memory_rss|kubelet_pod_worker_duration_seconds_count|kubelet_node_name|kubelet_pleg_relist_interval_seconds_bucket|container_network_receive_packets_dropped_total|kubelet_pod_worker_duration_seconds_bucket|container_start_time_seconds|container_network_transmit_packets_dropped_total|process_cpu_seconds_total|storage_operation_duration_seconds_bucket|container_memory_cache|container_network_transmit_packets_total|kubelet_volume_stats_inodes_used|up|rest_client_requests_total
      sourceLabels:
      - __name__
    - action: replace
      targetLabel: job
      replacement: integrations/kubernetes/kubelet
    port: https-metrics
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
```
- cAdvisor ServiceMonitor
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    instance: primary
  name: cadvisor-monitor
  namespace: default
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    honorTimestamps: false
    interval: 60s
    metricRelabelings:
    - action: keep
      regex: kubelet_cgroup_manager_duration_seconds_count|go_goroutines|kubelet_pod_start_duration_seconds_count|kubelet_runtime_operations_total|kubelet_pleg_relist_duration_seconds_bucket|volume_manager_total_volumes|kubelet_volume_stats_capacity_bytes|container_cpu_usage_seconds_total|container_network_transmit_bytes_total|kubelet_runtime_operations_errors_total|container_network_receive_bytes_total|container_memory_swap|container_network_receive_packets_total|container_cpu_cfs_periods_total|container_cpu_cfs_throttled_periods_total|kubelet_running_pod_count|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate|container_memory_working_set_bytes|storage_operation_errors_total|kubelet_pleg_relist_duration_seconds_count|kubelet_running_pods|rest_client_request_duration_seconds_bucket|process_resident_memory_bytes|storage_operation_duration_seconds_count|kubelet_running_containers|kubelet_runtime_operations_duration_seconds_bucket|kubelet_node_config_error|kubelet_cgroup_manager_duration_seconds_bucket|kubelet_running_container_count|kubelet_volume_stats_available_bytes|kubelet_volume_stats_inodes|container_memory_rss|kubelet_pod_worker_duration_seconds_count|kubelet_node_name|kubelet_pleg_relist_interval_seconds_bucket|container_network_receive_packets_dropped_total|kubelet_pod_worker_duration_seconds_bucket|container_start_time_seconds|container_network_transmit_packets_dropped_total|process_cpu_seconds_total|storage_operation_duration_seconds_bucket|container_memory_cache|container_network_transmit_packets_total|kubelet_volume_stats_inodes_used|up|rest_client_requests_total
      sourceLabels:
      - __name__
    - action: replace
      targetLabel: job
      replacement: integrations/kubernetes/cadvisor
    path: /metrics/cadvisor
    port: https-metrics
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
```
These two ServiceMonitors configure Agent to scrape all the Kubelet and cAdvisor endpoints in your Kubernetes cluster (one of each per Node). In addition, they define a `job` label which you may change (it is preset here for compatibility with Grafana Cloud's Kubernetes integration), and they allowlist a core set of Kubernetes metrics to reduce remote metrics usage. If you don't need this allowlist, you can omit it; however, note that your metrics usage will increase significantly.
When you’re done, Agent should now be shipping Kubelet and cAdvisor metrics to your remote Prometheus endpoint.
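If metrics don't show up, a first check is that the ServiceMonitors exist and carry the `instance: primary` label that the MetricsInstance selects on (a sketch, assuming the `default` namespace):

```bash
# List the ServiceMonitors along with their labels.
kubectl get servicemonitors -n default --show-labels
```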
Step 4: Deploy LogsInstance and PodLogs resources
In this step, you’ll deploy a LogsInstance resource to collect logs from your cluster nodes and ship these to your remote Loki endpoint. Under the hood, Agent Operator will deploy a DaemonSet of Agents in your cluster that will tail log files defined in PodLogs resources.
Deploy the LogsInstance into your cluster:
```yaml
apiVersion: monitoring.grafana.com/v1alpha1
kind: LogsInstance
metadata:
  name: primary
  namespace: default
  labels:
    agent: grafana-agent-logs
spec:
  clients:
  - url: your_remote_logs_URL
    basicAuth:
      username:
        name: primary-credentials-logs
        key: username
      password:
        name: primary-credentials-logs
        key: password

  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the LogsInstance CR
  podLogsNamespaceSelector: {}
  podLogsSelector:
    matchLabels:
      instance: primary
```
This LogsInstance will pick up PodLogs resources with the `instance: primary` label. Be sure to set the Loki URL to the correct push endpoint. For Grafana Cloud, this will be something like `logs-prod-us-central1.grafana.net/loki/api/v1/push`; check the Cloud Portal to confirm.
Also note that we are using the `agent: grafana-agent-logs` label here, which associates this LogsInstance with the GrafanaAgent resource defined in Step 1. This means it will inherit requests, limits, affinities, and other properties defined in the GrafanaAgent custom resource.
Create the Secret for the LogsInstance resource:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: primary-credentials-logs
  namespace: default
stringData:
  username: 'your_username_here'
  password: 'your_password_here'
```
If you're using Grafana Cloud, you can find your hosted Loki endpoint username and password in the Grafana Cloud Portal. You may wish to base64-encode these values yourself; in this case, use `data` instead of `stringData`.
Finally, we’ll roll out a PodLogs resource to define our logging targets. Under the hood, Agent Operator will turn this into Agent config for the logs subsystem, and roll it out to the DaemonSet of logging agents.
The following is a minimal working example which you should adapt to your production needs:
```yaml
apiVersion: monitoring.grafana.com/v1alpha1
kind: PodLogs
metadata:
  labels:
    instance: primary
  name: kubernetes-pods
  namespace: default
spec:
  pipelineStages:
  - docker: {}
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels: {}
```
This tails container logs for all Pods in the `default` Namespace. You can restrict the set of matched Pods by using the `matchLabels` selector. You can also set additional `pipelineStages` and create `relabelings` to add or modify log line labels. To learn more about the PodLogs spec and available resource fields, please see the PodLogs CRD.
Under the hood, the above PodLogs resource will add the following labels to log lines:
- `namespace`
- `service`
- `pod`
- `container`
- `job`, set to `PodLogs_namespace/PodLogs_name`
- `__path__` (the path to log files), set to `/var/log/pods/*$1/*.log`, where `$1` is `__meta_kubernetes_pod_uid/__meta_kubernetes_pod_container_name`
To learn more about this config format and other available labels, please see the Promtail Scraping reference documentation. Agent Operator will load this config into the LogsInstance agents automatically.
At this point the DaemonSet of logging agents should be tailing your container logs, applying some default labels to the log lines, and shipping them to your remote Loki endpoint.
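To sanity-check the logs pipeline, you can confirm that the PodLogs resource exists and that the logging DaemonSet has a Pod running on each Node. A sketch, assuming the `default` namespace (resource names may differ in your cluster):

```bash
# Confirm the PodLogs resource was created.
kubectl get podlogs -n default

# The logging Agents run as a DaemonSet, one Pod per Node.
kubectl get daemonsets -n default
kubectl get pods -n default -o wide
```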
Conclusion
At this point you’ve rolled out the following into your cluster:
- A `GrafanaAgent` resource, which discovers one or more `MetricsInstance` and `LogsInstance` resources.
- A `MetricsInstance` resource that defines where to ship collected metrics.
- A `ServiceMonitor` resource to collect cAdvisor and kubelet metrics.
- A `LogsInstance` resource that defines where to ship collected logs.
- A `PodLogs` resource to collect container logs from Kubernetes Pods.
You can verify that everything is working correctly by navigating to your Grafana instance and querying your Loki and Prometheus datasources. Operator support for Tempo and traces is coming soon.