Caution
Grafana Alloy is the new name for our distribution of the OTel collector. Grafana Agent has been deprecated and is in Long-Term Support (LTS) through October 31, 2025. Grafana Agent will reach an End-of-Life (EOL) on November 1, 2025. Read more about why we recommend migrating to Grafana Alloy.
Grafana Agent Operator Custom Resource Quickstart
In this guide you’ll learn how to deploy Agent Operator’s custom resources into your Kubernetes cluster.
You’ll roll out the following custom resources (CRs):
- A GrafanaAgent resource, which discovers one or more MetricsInstance and LogsInstance resources.
- A MetricsInstance resource that defines where to ship collected metrics. Under the hood, this rolls out a Grafana Agent StatefulSet that will scrape and ship metrics to a remote_write endpoint.
- A ServiceMonitor resource to collect cAdvisor and kubelet metrics. Under the hood, this configures the MetricsInstance / Agent StatefulSet.
- A LogsInstance resource that defines where to ship collected logs. Under the hood, this rolls out a Grafana Agent DaemonSet that will tail log files on your cluster nodes.
- A PodLogs resource to collect container logs from Kubernetes Pods. Under the hood, this configures the LogsInstance / Agent DaemonSet.
To learn more about the custom resources Operator provides and their hierarchy, please consult Operator architecture.
Note: Agent Operator is currently in beta and its custom resources are subject to change as the project evolves. It currently supports the metrics and logs subsystems of Grafana Agent. Integrations and traces support is coming soon.
By the end of this guide, you will be scraping and shipping cAdvisor and Kubelet metrics to a Prometheus-compatible metrics endpoint. You’ll also be collecting and shipping your Pods’ container logs to a Loki-compatible logs endpoint.
Prerequisites
Before you begin, make sure that you have installed Agent Operator into your cluster. You can learn how to do this in the Agent Operator installation guides.
Step 1: Deploy GrafanaAgent resource
In this step you’ll roll out a GrafanaAgent resource. A GrafanaAgent resource discovers MetricsInstance and LogsInstance resources and defines the Grafana Agent image, Pod requests, limits, affinities, and tolerations. Pod attributes can only be defined at the GrafanaAgent level and are propagated to MetricsInstance and LogsInstance Pods. To learn more, please see the GrafanaAgent Custom Resource Definition.
Note: Due to the variety of possible deployment architectures, the official Agent Operator Helm chart does not provide built-in templates for the custom resources described in this quickstart. These must be configured and deployed manually. However, you are encouraged to template and add the following manifests to your own in-house Helm charts and GitOps flows.
Roll out the following manifests in your cluster:
apiVersion: monitoring.grafana.com/v1alpha1
kind: GrafanaAgent
metadata:
name: grafana-agent
namespace: default
labels:
app: grafana-agent
spec:
image: grafana/agent:v0.25.1
logLevel: info
serviceAccountName: grafana-agent
metrics:
instanceSelector:
matchLabels:
agent: grafana-agent-metrics
externalLabels:
cluster: cloud
logs:
instanceSelector:
matchLabels:
agent: grafana-agent-logs
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: grafana-agent
namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: grafana-agent
rules:
- apiGroups:
- ""
resources:
- nodes
- nodes/proxy
- nodes/metrics
- services
- endpoints
- pods
- events
verbs:
- get
- list
- watch
- apiGroups:
- networking.k8s.io
resources:
- ingresses
verbs:
- get
- list
- watch
- nonResourceURLs:
- /metrics
- /metrics/cadvisor
verbs:
- get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: grafana-agent
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: grafana-agent
subjects:
- kind: ServiceAccount
name: grafana-agent
namespace: default

This creates a ServiceAccount, ClusterRole, and ClusterRoleBinding for the GrafanaAgent resource. It also creates a GrafanaAgent resource and specifies an Agent image version. Finally, the GrafanaAgent resource specifies MetricsInstance and LogsInstance selectors. These search for MetricsInstances and LogsInstances in the same namespace with labels matching agent: grafana-agent-metrics and agent: grafana-agent-logs, respectively. It also sets a cluster: cloud label for all metrics shipped to your Prometheus-compatible endpoint. You should change this label to your desired cluster name. To search for MetricsInstances or LogsInstances in a different namespace, use the instanceNamespaceSelector field. To learn more about this field, please consult the GrafanaAgent CRD specification.
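If you haven’t already applied these manifests, you can save them to a file and roll them out with kubectl. This is a minimal sketch, assuming the manifests were saved as grafana-agent.yaml (a placeholder filename):

# Apply the GrafanaAgent, ServiceAccount, ClusterRole, and ClusterRoleBinding manifests.
kubectl apply -f grafana-agent.yaml

# Verify that the GrafanaAgent custom resource was created.
kubectl get grafanaagents -n default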
The full hierarchy of custom resources is as follows:
- GrafanaAgent
  - MetricsInstance
    - PodMonitor
    - Probe
    - ServiceMonitor
  - LogsInstance
    - PodLogs
Deploying a GrafanaAgent resource on its own will not spin up any Agent Pods. Agent Operator will create Agent Pods once MetricsInstance and LogsInstance resources have been created. In the next step, you’ll roll out a MetricsInstance resource to scrape cAdvisor and Kubelet metrics and ship these to your Prometheus-compatible metrics endpoint.
Step 2: Deploy a MetricsInstance resource
In this step you’ll roll out a MetricsInstance resource. MetricsInstance resources define a remote_write sink for metrics and configure one or more selectors to watch for creation and updates to *Monitor objects. These objects allow you to define Agent scrape targets via K8s manifests: ServiceMonitors, PodMonitors, and Probes.
Roll out the following manifest into your cluster:
apiVersion: monitoring.grafana.com/v1alpha1
kind: MetricsInstance
metadata:
name: primary
namespace: default
labels:
agent: grafana-agent-metrics
spec:
remoteWrite:
- url: your_remote_write_URL
basicAuth:
username:
name: primary-credentials-metrics
key: username
password:
name: primary-credentials-metrics
key: password
# Supply an empty namespace selector to look in all namespaces. Remove
# this to only look in the same namespace as the MetricsInstance CR
serviceMonitorNamespaceSelector: {}
serviceMonitorSelector:
matchLabels:
instance: primary
# Supply an empty namespace selector to look in all namespaces. Remove
# this to only look in the same namespace as the MetricsInstance CR.
podMonitorNamespaceSelector: {}
podMonitorSelector:
matchLabels:
instance: primary
# Supply an empty namespace selector to look in all namespaces. Remove
# this to only look in the same namespace as the MetricsInstance CR.
probeNamespaceSelector: {}
probeSelector:
matchLabels:
instance: primary

Be sure to replace the remote_write URL and customize the namespace and label configuration as necessary. This MetricsInstance associates itself with the grafana-agent GrafanaAgent resource deployed in the previous step (via the agent: grafana-agent-metrics label) and watches for creation and updates to *Monitor resources carrying the instance: primary label.
Once you’ve rolled out this manifest, create the basicAuth credentials using a Kubernetes Secret:
apiVersion: v1
kind: Secret
metadata:
name: primary-credentials-metrics
namespace: default
stringData:
username: 'your_cloud_prometheus_username'
password: 'your_cloud_prometheus_API_key'

If you’re using Grafana Cloud, you can find your hosted Prometheus endpoint username and password in the Grafana Cloud Portal. You may wish to base64-encode these values yourself. In this case, please use data instead of stringData.
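If you prefer not to write a Secret manifest, an equivalent sketch using kubectl directly (same Secret name and namespace as above; replace the placeholder values with your credentials):

# Create the primary-credentials-metrics Secret referenced by the MetricsInstance.
kubectl create secret generic primary-credentials-metrics \
  --namespace=default \
  --from-literal=username='your_cloud_prometheus_username' \
  --from-literal=password='your_cloud_prometheus_API_key'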
Once you’ve rolled out the MetricsInstance and its Secret, you can confirm that the MetricsInstance Agent is up and running with kubectl get pod. Since we haven’t defined any monitors yet, this Agent will not have any scrape targets defined. In the next step, we’ll create scrape targets for the cAdvisor and kubelet endpoints exposed by the kubelet service in the cluster.
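Before moving on, you can run that check; a sketch (exact resource names are generated by Agent Operator and may differ in your cluster):

# List the Agent StatefulSet and Pods created for the MetricsInstance.
kubectl get statefulsets,pods -n default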
Step 3: Create ServiceMonitors for kubelet and cAdvisor endpoints
In this step, you’ll create ServiceMonitors for kubelet and cAdvisor metrics exposed by the kubelet Service. Every node in your cluster exposes kubelet and cAdvisor metrics at /metrics and /metrics/cadvisor, respectively. Agent Operator creates a kubelet Service that exposes these Node endpoints so that they can be scraped using ServiceMonitors.
To scrape these two endpoints, roll out the following two ServiceMonitors in your cluster:
- Kubelet ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
instance: primary
name: kubelet-monitor
namespace: default
spec:
endpoints:
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
honorLabels: true
interval: 60s
metricRelabelings:
- action: keep
regex: kubelet_cgroup_manager_duration_seconds_count|go_goroutines|kubelet_pod_start_duration_seconds_count|kubelet_runtime_operations_total|kubelet_pleg_relist_duration_seconds_bucket|volume_manager_total_volumes|kubelet_volume_stats_capacity_bytes|container_cpu_usage_seconds_total|container_network_transmit_bytes_total|kubelet_runtime_operations_errors_total|container_network_receive_bytes_total|container_memory_swap|container_network_receive_packets_total|container_cpu_cfs_periods_total|container_cpu_cfs_throttled_periods_total|kubelet_running_pod_count|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate|container_memory_working_set_bytes|storage_operation_errors_total|kubelet_pleg_relist_duration_seconds_count|kubelet_running_pods|rest_client_request_duration_seconds_bucket|process_resident_memory_bytes|storage_operation_duration_seconds_count|kubelet_running_containers|kubelet_runtime_operations_duration_seconds_bucket|kubelet_node_config_error|kubelet_cgroup_manager_duration_seconds_bucket|kubelet_running_container_count|kubelet_volume_stats_available_bytes|kubelet_volume_stats_inodes|container_memory_rss|kubelet_pod_worker_duration_seconds_count|kubelet_node_name|kubelet_pleg_relist_interval_seconds_bucket|container_network_receive_packets_dropped_total|kubelet_pod_worker_duration_seconds_bucket|container_start_time_seconds|container_network_transmit_packets_dropped_total|process_cpu_seconds_total|storage_operation_duration_seconds_bucket|container_memory_cache|container_network_transmit_packets_total|kubelet_volume_stats_inodes_used|up|rest_client_requests_total
sourceLabels:
- __name__
- action: replace
targetLabel: job
replacement: integrations/kubernetes/kubelet
port: https-metrics
relabelings:
- sourceLabels:
- __metrics_path__
targetLabel: metrics_path
scheme: https
tlsConfig:
insecureSkipVerify: true
namespaceSelector:
matchNames:
- default
selector:
matchLabels:
app.kubernetes.io/name: kubelet

- cAdvisor ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
instance: primary
name: cadvisor-monitor
namespace: default
spec:
endpoints:
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
honorLabels: true
honorTimestamps: false
interval: 60s
metricRelabelings:
- action: keep
regex: kubelet_cgroup_manager_duration_seconds_count|go_goroutines|kubelet_pod_start_duration_seconds_count|kubelet_runtime_operations_total|kubelet_pleg_relist_duration_seconds_bucket|volume_manager_total_volumes|kubelet_volume_stats_capacity_bytes|container_cpu_usage_seconds_total|container_network_transmit_bytes_total|kubelet_runtime_operations_errors_total|container_network_receive_bytes_total|container_memory_swap|container_network_receive_packets_total|container_cpu_cfs_periods_total|container_cpu_cfs_throttled_periods_total|kubelet_running_pod_count|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate|container_memory_working_set_bytes|storage_operation_errors_total|kubelet_pleg_relist_duration_seconds_count|kubelet_running_pods|rest_client_request_duration_seconds_bucket|process_resident_memory_bytes|storage_operation_duration_seconds_count|kubelet_running_containers|kubelet_runtime_operations_duration_seconds_bucket|kubelet_node_config_error|kubelet_cgroup_manager_duration_seconds_bucket|kubelet_running_container_count|kubelet_volume_stats_available_bytes|kubelet_volume_stats_inodes|container_memory_rss|kubelet_pod_worker_duration_seconds_count|kubelet_node_name|kubelet_pleg_relist_interval_seconds_bucket|container_network_receive_packets_dropped_total|kubelet_pod_worker_duration_seconds_bucket|container_start_time_seconds|container_network_transmit_packets_dropped_total|process_cpu_seconds_total|storage_operation_duration_seconds_bucket|container_memory_cache|container_network_transmit_packets_total|kubelet_volume_stats_inodes_used|up|rest_client_requests_total
sourceLabels:
- __name__
- action: replace
targetLabel: job
replacement: integrations/kubernetes/cadvisor
path: /metrics/cadvisor
port: https-metrics
relabelings:
- sourceLabels:
- __metrics_path__
targetLabel: metrics_path
scheme: https
tlsConfig:
insecureSkipVerify: true
namespaceSelector:
matchNames:
- default
selector:
matchLabels:
app.kubernetes.io/name: kubelet

These two ServiceMonitors configure Agent to scrape all kubelet and cAdvisor endpoints in your Kubernetes cluster (one of each per Node). They also define a job label, which you may change (it is preset here for compatibility with Grafana Cloud’s Kubernetes integration), and allowlist a core set of Kubernetes metrics to reduce remote metrics usage. If you don’t need this allowlist, you may omit it; note, however, that your metrics usage will increase significantly.
When you’re done, Agent should now be shipping Kubelet and cAdvisor metrics to your remote Prometheus endpoint.
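To spot-check the rollout, you can confirm that both ServiceMonitors exist with the instance: primary label and that the kubelet Service created by Agent Operator is present, then query an allowlisted metric (for example kubelet_node_name, with the job label set to integrations/kubernetes/kubelet) in your remote datasource. A sketch:

# Confirm both ServiceMonitors exist and carry the instance: primary label.
kubectl get servicemonitors -n default --show-labels

# Confirm the kubelet Service is present; the ServiceMonitors above select it
# by this label in the default namespace.
kubectl get services -n default -l app.kubernetes.io/name=kubelet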
Step 4: Deploy LogsInstance and PodLogs resources
In this step, you’ll deploy a LogsInstance resource to collect logs from your cluster nodes and ship these to your remote Loki endpoint. Under the hood, Agent Operator will deploy a DaemonSet of Agents in your cluster that will tail log files defined in PodLogs resources.
Deploy the LogsInstance into your cluster:
apiVersion: monitoring.grafana.com/v1alpha1
kind: LogsInstance
metadata:
name: primary
namespace: default
labels:
agent: grafana-agent-logs
spec:
clients:
- url: your_remote_logs_URL
basicAuth:
username:
name: primary-credentials-logs
key: username
password:
name: primary-credentials-logs
key: password
# Supply an empty namespace selector to look in all namespaces. Remove
# this to only look in the same namespace as the LogsInstance CR
podLogsNamespaceSelector: {}
podLogsSelector:
matchLabels:
instance: primary

This LogsInstance will pick up PodLogs resources with the instance: primary label. Be sure to set the Loki URL to the correct push endpoint (for Grafana Cloud, this will be something like logs-prod-us-central1.grafana.net/loki/api/v1/push, but you should check the Cloud Portal to confirm).
Also note that we are using the agent: grafana-agent-logs label here, which will associate this LogsInstance with the GrafanaAgent resource defined in Step 1. This means that it will inherit requests, limits, affinities and other properties defined in the GrafanaAgent custom resource.
Create the Secret for the LogsInstance resource:
apiVersion: v1
kind: Secret
metadata:
name: primary-credentials-logs
namespace: default
stringData:
username: 'your_username_here'
password: 'your_password_here'

If you’re using Grafana Cloud, you can find your hosted Loki endpoint username and password in the Grafana Cloud Portal. You may wish to base64-encode these values yourself. In this case, please use data instead of stringData.
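If you’d rather supply base64-encoded values under data, a minimal sketch for producing them (replace the placeholders with your real credentials):

# Generate base64-encoded values to place under the Secret's data field
# instead of stringData.
echo -n 'your_username_here' | base64
echo -n 'your_password_here' | base64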
Finally, we’ll roll out a PodLogs resource to define our logging targets. Under the hood, Agent Operator will turn this into Agent config for the logs subsystem, and roll it out to the DaemonSet of logging agents.
The following is a minimal working example which you should adapt to your production needs:
apiVersion: monitoring.grafana.com/v1alpha1
kind: PodLogs
metadata:
labels:
instance: primary
name: kubernetes-pods
namespace: default
spec:
pipelineStages:
- docker: {}
namespaceSelector:
matchNames:
- default
selector:
matchLabels: {}

This tails container logs for all Pods in the default Namespace. You can restrict the set of Pods matched by using the matchLabels selector. You can also set additional pipelineStages and create relabelings to add or modify log line labels. To learn more about the PodLogs spec and available resource fields, please see the PodLogs CRD.
Under the hood, the above PodLogs resource will add the following labels to log lines:
- namespace
- service
- pod
- container
- job, set to PodLogs_namespace/PodLogs_name
- __path__ (the path to log files), set to /var/log/pods/*$1/*.log where $1 is __meta_kubernetes_pod_uid/__meta_kubernetes_pod_container_name
To learn more about this config format and other available labels, please see the Promtail Scraping reference documentation. Agent Operator will load this config into the LogsInstance agents automatically.
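For example, here is a hedged sketch of a more restrictive PodLogs resource that uses the matchLabels selector and relabelings mentioned above: it matches only Pods carrying a hypothetical app: my-app label and attaches the node name to each log stream. Adapt the names, selectors, and stages to your environment:

apiVersion: monitoring.grafana.com/v1alpha1
kind: PodLogs
metadata:
  labels:
    instance: primary
  name: my-app-logs # hypothetical name for this example
  namespace: default
spec:
  pipelineStages:
    - docker: {}
  relabelings:
    # Copy the Kubernetes node name onto each log stream as a "node" label.
    - sourceLabels:
        - __meta_kubernetes_pod_node_name
      targetLabel: node
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      app: my-app # hypothetical Pod label to match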
At this point the DaemonSet of logging agents should be tailing your container logs, applying some default labels to the log lines, and shipping them to your remote Loki endpoint.
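As a final hedged check, you can list the DaemonSet and its Pods, and then query your Loki datasource for one of the labels listed above, for example {namespace="default"}:

# Confirm the Agent logging DaemonSet rolled out by Agent Operator is running.
# Resource names are generated by the Operator and may differ in your cluster.
kubectl get daemonsets,pods -n default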
Conclusion
At this point you’ve rolled out the following into your cluster:
- A GrafanaAgent resource, which discovers one or more MetricsInstance and LogsInstance resources.
- A MetricsInstance resource that defines where to ship collected metrics.
- A ServiceMonitor resource to collect cAdvisor and kubelet metrics.
- A LogsInstance resource that defines where to ship collected logs.
- A PodLogs resource to collect container logs from Kubernetes Pods.
You can verify that everything is working correctly by navigating to your Grafana instance and querying your Loki and Prometheus datasources. Operator support for Tempo and traces is coming soon.



