Grafana Cloud

Monitor non-standard workloads

Kubernetes ships with a handful of built-in workload controllers: Deployment, StatefulSet, DaemonSet, Job, and CronJob. Anything that schedules Pods through a different mechanism is “non-standard”: either a custom resource definition (CRD) managed by an operator, or Pods created with no owner reference at all.

Non-standard workloads matter for fleet visibility for two reasons:

  • Blind spots in rollup views. Most dashboards and tools aggregate by well-known owner kinds. A Pod whose top-level owner is an argoproj.io/Rollout or a kafka.strimzi.io/KafkaNodePool doesn’t appear in the Deployment or StatefulSet summary, so its capacity, restart counts, and failure signals are invisible at the fleet level.
  • Different failure modes. Each controller has its own readiness model and update strategy. A canary Rollout failing its analysis looks nothing like a crashing Deployment replica. The signals and remediation paths differ.
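
The distinction between standard and non-standard owners can be sketched in a few lines of Python. This is a minimal illustration, not how any particular dashboard implements it: the function names classify_owner and rollup are hypothetical, and each owner is assumed to be the workload's top-level owner in the shape of a Kubernetes ownerReference.

```python
# Built-in workload kinds, keyed by (apiVersion, kind).
STANDARD_KINDS = {
    ("apps/v1", "Deployment"),
    ("apps/v1", "StatefulSet"),
    ("apps/v1", "DaemonSet"),
    ("batch/v1", "Job"),
    ("batch/v1", "CronJob"),
}


def classify_owner(owner):
    """Classify a top-level owner as 'standard', 'custom', or 'unowned'.

    `owner` is None for a Pod with no owner reference, otherwise a dict
    with 'apiVersion' and 'kind' keys.
    """
    if owner is None:
        return "unowned"
    key = (owner["apiVersion"], owner["kind"])
    return "standard" if key in STANDARD_KINDS else "custom"


def rollup(owners):
    """Count workloads per class so non-standard ones stay visible."""
    counts = {"standard": 0, "custom": 0, "unowned": 0}
    for owner in owners:
        counts[classify_owner(owner)] += 1
    return counts


# Example input: one Deployment, one Argo Rollout, one ownerless Pod.
example_owners = [
    {"apiVersion": "apps/v1", "kind": "Deployment"},
    {"apiVersion": "argoproj.io/v1alpha1", "kind": "Rollout"},
    None,
]
example_counts = rollup(example_owners)
```

A rollup that only iterates over the standard kinds would drop the last two entries, which is the blind spot described above.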

Find non-standard workloads in your fleet

You can find non-standard workloads, including:

  • Argo Rollouts
  • Strimzi PodSets
  • Static and unmanaged Pods
  • Bare Pods

Navigate to the Workloads main page and filter the Type column.

Filtering for workload type

Jobs and CronJobs have their own page with a separate Type filter. Refer to Monitor jobs.

Argo Rollouts

Argo Rollouts replaces the standard Deployment controller with a Rollout custom resource that adds canary and blue/green update strategies. The owner chain is Rollout → ReplicaSet → Pod, the same shape as a Deployment, so Pods are owned by ReplicaSet objects, not directly by the Rollout.
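
Because a Pod points at its ReplicaSet rather than at the Rollout, attributing a Pod to its workload means walking the owner references upward. The following sketch shows one way to do that over plain dictionaries. The helper name top_level_owner and the in-memory index (a stand-in for API lookups) are assumptions made for illustration.

```python
def top_level_owner(obj, index):
    """Follow controller ownerReferences upward and return the root object.

    `index` maps (kind, namespace, name) to object dicts and stands in
    for API lookups. Returns the last object that could be resolved.
    """
    seen = set()
    current = obj
    while True:
        meta = current["metadata"]
        namespace = meta.get("namespace")
        key = (current["kind"], namespace, meta["name"])
        if key in seen:  # defensive: stop on a reference cycle
            return current
        seen.add(key)
        # Only the reference marked controller=True manages the object.
        refs = [r for r in meta.get("ownerReferences", []) if r.get("controller")]
        if not refs:
            return current
        parent = index.get((refs[0]["kind"], namespace, refs[0]["name"]))
        if parent is None:  # owner not in the index; stop here
            return current
        current = parent


# Hypothetical objects forming a Rollout -> ReplicaSet -> Pod chain.
rollout = {"kind": "Rollout", "metadata": {"name": "web", "namespace": "shop"}}
replica_set = {
    "kind": "ReplicaSet",
    "metadata": {
        "name": "web-6d4f",
        "namespace": "shop",
        "ownerReferences": [{"kind": "Rollout", "name": "web", "controller": True}],
    },
}
pod = {
    "kind": "Pod",
    "metadata": {
        "name": "web-6d4f-abcde",
        "namespace": "shop",
        "ownerReferences": [{"kind": "ReplicaSet", "name": "web-6d4f", "controller": True}],
    },
}
index = {
    ("Rollout", "shop", "web"): rollout,
    ("ReplicaSet", "shop", "web-6d4f"): replica_set,
}
root = top_level_owner(pod, index)
```

In this example root is the Rollout object, even though the Pod itself only references the ReplicaSet.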

Why it matters for monitoring:

  • A Rollout can hold traffic on the stable revision while the canary revision is still running. Pod counts alone don’t tell you which revision is active.
  • AnalysisRun objects emit pass or fail signals tied to metric queries. A failed analysis pauses the rollout but leaves two Pod sets running simultaneously, doubling resource consumption with no Deployment-level alert firing.

What to watch:

  • The argo_rollout_phase gauge: values are Healthy, Progressing, Paused, Degraded, and Unknown. A Rollout stuck in Progressing for an extended period often signals a failed analysis or a paused canary.
  • Restart counts and OOMKill events scoped to the canary Rollout name label.
  • The Pod-to-revision label rollouts-pod-template-hash to distinguish stable Pods from canary Pods.
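
As a rough sketch of how the rollouts-pod-template-hash label can separate revisions, the function below counts Pods per role. It assumes you already know the hash of the stable revision (for example, from the Rollout's status) and passes it in as stable_hash. The function name and the "canary" bucket for every non-stable hash are simplifications for illustration.

```python
from collections import Counter

HASH_LABEL = "rollouts-pod-template-hash"


def pods_by_revision(pods, stable_hash):
    """Count Pods per revision role using the pod-template-hash label.

    Pods whose hash equals `stable_hash` count as 'stable', any other
    hash counts as 'canary', and Pods without the label as 'unlabeled'.
    """
    counts = Counter()
    for pod in pods:
        labels = pod.get("metadata", {}).get("labels") or {}
        pod_hash = labels.get(HASH_LABEL)
        if pod_hash is None:
            counts["unlabeled"] += 1
        elif pod_hash == stable_hash:
            counts["stable"] += 1
        else:
            counts["canary"] += 1
    return dict(counts)


# Hypothetical Pods: two on the stable revision, one on the canary.
example_pods = [
    {"metadata": {"labels": {HASH_LABEL: "6d4f"}}},
    {"metadata": {"labels": {HASH_LABEL: "6d4f"}}},
    {"metadata": {"labels": {HASH_LABEL: "9c1b"}}},
]
revision_counts = pods_by_revision(example_pods, stable_hash="6d4f")
```

A paused or failed analysis would show up here as a canary count that stays above zero while the stable count does not shrink.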

Strimzi PodSets

Strimzi, the Kafka operator for Kubernetes, replaced StatefulSet with its own StrimziPodSet custom resource definition (CRD) as the default in Strimzi 0.35. KafkaNodePool, a separate CRD for managing node pools, became generally available in Strimzi 0.41. Pods are still first-class Kubernetes objects, but their owner chain runs through strimzi.io resources, not apps/v1.

Why it matters for monitoring:

  • Kafka brokers and controllers require strict quorum. Losing one broker in a three-broker Kafka cluster is a partial outage even if Kubernetes reports the Pod as Running. The JVM may be live but the broker may not have rejoined the in-sync replicas (ISR).
  • Standard workload health checks (available replicas ≥ desired) don’t apply. Kafka health is expressed through Kafka-level metrics: under-replicated partitions, ISR shrink rate, and leader elections.

What to watch:

  • kafka_server_replicamanager_underreplicatedpartitions: a non-zero value means at least one partition has fewer in-sync replicas than configured, which puts data durability at risk.
  • The Strimzi operator condition Ready=False on the Kafka or KafkaNodePool custom resource.
  • Pod owner labels strimzi.io/cluster and strimzi.io/name for grouping in queries.
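
The two signals above can be combined into a single health verdict. This is a minimal sketch under stated assumptions: conditions is the status.conditions list read from a Kafka or KafkaNodePool resource, under_replicated_partitions is the latest sample of the metric named above, and the function name kafka_health is hypothetical.

```python
def kafka_health(conditions, under_replicated_partitions):
    """Return (healthy, reasons) from resource conditions and one metric sample.

    A cluster is treated as healthy only when the resource reports
    Ready=True and no partitions are under-replicated.
    """
    reasons = []
    ready = next((c for c in conditions if c.get("type") == "Ready"), None)
    if ready is None:
        reasons.append("no Ready condition reported")
    elif ready.get("status") != "True":
        reasons.append("Ready=" + str(ready.get("status")))
    if under_replicated_partitions > 0:
        reasons.append(f"{under_replicated_partitions} under-replicated partitions")
    return (not reasons, reasons)


# Every Pod can be Running while this still reports an unhealthy cluster.
healthy, reasons = kafka_health([{"type": "Ready", "status": "True"}], 3)
```

The point of the sketch is that Pod phase is not an input at all: health is derived from the operator's condition and from Kafka's own replication metric.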

Static and unmanaged Pods

Static Pods are defined as manifest files on a Node’s filesystem (default: /etc/kubernetes/manifests/) and managed directly by the kubelet rather than by a controller in the control plane. Control-plane components (kube-apiserver, kube-scheduler, kube-controller-manager, and etcd) are typically static Pods on self-managed Clusters.

Unmanaged Pods are API-created Pods with no owner reference, usually the result of a kubectl run invocation, a misconfigured operator, or a debugging session that was never cleaned up. Modern versions of kubectl run create a bare Pod by default, so it’s easy to leave one behind without realizing it.

Why it matters for monitoring:

  • Static Pods aren’t rescheduled if the Node fails; they’re tightly coupled to a single Node. A NotReady Node means the static Pod is gone until the Node recovers.
  • Unmanaged Pods are rescheduling orphans. If they’re evicted or deleted, or if their Node fails, nothing re-creates them and they disappear permanently. They also frequently represent forgotten resource consumers that escape capacity planning.

What to watch:

  • Pods without owners. In Prometheus, kube_pod_owner{owner_kind=""} surfaces these Pods.
  • The static Pod annotation kubernetes.io/config.source=file distinguishes them from unmanaged API Pods.
  • Node-scoped restarts and eviction events for static Pods tied to control-plane health.
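
The checks above can be sketched as a small classifier over a Pod manifest. The function name pod_origin is hypothetical. The sketch looks for the kubernetes.io/config.source key in both annotations and labels so that it does not depend on where a given pipeline surfaces it, and it tests for a static Pod first because a static Pod's mirror object may carry an owner reference to its Node.

```python
CONFIG_SOURCE_KEY = "kubernetes.io/config.source"


def pod_origin(pod):
    """Return 'static', 'unmanaged', or 'managed' for a Pod manifest dict."""
    meta = pod.get("metadata", {})
    annotations = meta.get("annotations") or {}
    labels = meta.get("labels") or {}
    source = annotations.get(CONFIG_SOURCE_KEY) or labels.get(CONFIG_SOURCE_KEY)
    if source == "file":
        return "static"
    if not meta.get("ownerReferences"):
        return "unmanaged"
    return "managed"


# Hypothetical manifests, reduced to the fields the classifier reads.
static_pod = {
    "metadata": {
        "name": "etcd-node-1",
        "annotations": {CONFIG_SOURCE_KEY: "file"},
        "ownerReferences": [{"kind": "Node", "name": "node-1"}],
    }
}
debug_pod = {"metadata": {"name": "debug-shell"}}
origins = [pod_origin(static_pod), pod_origin(debug_pod)]
```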

Bare Pods

Bare Pods are a subset of unmanaged Pods created intentionally without a controller. They’re common in batch workloads, one-off migrations, and operator-injected sidecar bootstrapping. Unlike accidental unmanaged Pods, bare Pods are a deliberate pattern, but they carry the same observability gap.

Why it matters for monitoring:

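  • No controller means nothing re-creates the Pod after it fails or is deleted, and there is no replica health signal. At the workload level, a bare Pod that exited with code 0 looks identical to one that crashed: both are simply no longer running. To tell them apart, track the exit code and termination reason explicitly.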
  • Bare Pods often run privileged or with elevated permissions for maintenance tasks. Tracking their lifecycle (start time, runtime, termination reason) matters for both capacity and security posture.

What to watch:

  • kube_pod_container_status_last_terminated_reason distinguishes OOMKilled, Error, and Completed.
  • kube_pod_start_time combined with the absence of a matching owner reference detects long-lived bare Pods.
  • Namespace and label conventions, for example app.kubernetes.io/managed-by=manual, to separate intentional from accidental bare Pods.
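
To make the exit-code tracking concrete, the sketch below reads a Pod's status.containerStatuses and labels how each container stopped. The function name termination_summary and the outcome strings are illustrative assumptions; only the state.terminated fields exitCode and reason come from the Pod status itself.

```python
def termination_summary(pod):
    """Summarize how each container in a Pod stopped.

    Returns a list of (container_name, outcome) pairs, where outcome is
    'completed', 'oom_killed', 'failed', or 'running_or_waiting'.
    """
    summary = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        terminated = cs.get("state", {}).get("terminated")
        if terminated is None:
            outcome = "running_or_waiting"
        elif terminated.get("reason") == "OOMKilled":
            outcome = "oom_killed"
        elif terminated.get("exitCode") == 0:
            outcome = "completed"
        else:
            outcome = "failed"
        summary.append((cs["name"], outcome))
    return summary


# Hypothetical bare Pod with one clean exit and one OOM kill.
migration_pod = {
    "status": {
        "containerStatuses": [
            {"name": "migrate", "state": {"terminated": {"exitCode": 0, "reason": "Completed"}}},
            {"name": "exporter", "state": {"terminated": {"exitCode": 137, "reason": "OOMKilled"}}},
        ]
    }
}
outcomes = termination_summary(migration_pod)
```

Both containers here have stopped, but only one of them finished its work, which is the distinction a Pod count alone cannot show.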