Sift investigations

Sift is a diagnostic assistant in Grafana Cloud that investigates your infrastructure telemetry to help you identify critical details during incidents. Each investigation runs a series of individual checks, each examining a specific aspect of your infrastructure, and the results help guide your incident response.

Before you begin

If Grafana Machine Learning has not yet been initialized, ask an administrator to initialize it.

Sift checks

Sift offers a range of checks to analyze your system’s telemetry during investigations. These checks include:

Error Pattern Logs: Analyzes error logs, groups similar log lines by shared pattern, and highlights groups whose log rate has increased significantly.
Kube Crashes: Detects recent container crashes by analyzing Kubernetes metrics and reports the cause of each crash (Error or OOMKill).
Noisy Neighbors: Identifies oversaturated hosts, where the load exceeds the CPU core count and causes high latency, and examines the pods on those hosts for insight into the underlying issue.
Recent Deployments: Identifies resources that recently underwent changes in Kubernetes, such as service updates or configuration modifications.
Resource Contention: Identifies containers with significant CPU throttling caused by reaching their CPU limits, or significant packet loss caused by networking issues. Unlike the noisy-neighbor case, CPU throttling is caused by the container itself rather than by other processes on the underlying infrastructure.
Slow Requests: Analyzes traces in Grafana Tempo, a distributed tracing system, to identify requests taking longer than a specified threshold (default: 3 seconds).
HTTP Error Series: Checks for series exhibiting elevated HTTP errors within a specified cluster and namespace.
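
To make the conditions behind two of these checks concrete, the following is a minimal Python sketch of the comparisons described for Noisy Neighbors (load exceeds CPU core count) and Slow Requests (duration exceeds a threshold, 3 seconds by default). It is an illustration only and not Sift's implementation: the Host class, its fields, and both function names are hypothetical, and the input values would come from your own telemetry.

```python
from dataclasses import dataclass

# Default threshold stated for the Slow Requests check.
SLOW_REQUEST_THRESHOLD_SECONDS = 3.0


@dataclass
class Host:
    """Hypothetical summary of one host's telemetry (not a Sift type)."""
    name: str
    load_average: float  # system load reported for the host
    cpu_cores: int       # number of CPU cores on the host


def saturated_hosts(hosts):
    """Return the names of hosts whose load exceeds their CPU core count."""
    return [h.name for h in hosts if h.load_average > h.cpu_cores]


def slow_requests(durations_seconds, threshold=SLOW_REQUEST_THRESHOLD_SECONDS):
    """Return the request durations that are longer than the threshold."""
    return [d for d in durations_seconds if d > threshold]
```

For example, a host with a load of 9.5 on 8 cores would be reported as saturated, while a request that takes exactly 3.0 seconds would not count as slow under this strict comparison.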