Grafana Cloud

Sift investigations

Note: Sift is currently in public preview. Grafana Labs offers support on a best-effort basis, and there might be breaking changes before the feature is generally available.

Sift is a diagnostic assistant in Grafana Cloud that investigates your infrastructure telemetry to help you identify critical details during incidents. Each investigation runs a series of individual checks, and each check examines a specific aspect of your infrastructure to guide your incident response.

Sift checks

Sift offers a range of checks to analyze your system’s telemetry during investigations. These checks currently include:

  • Error Pattern Logs: Analyzes error logs, groups similar log lines by shared pattern, and highlights the groups whose log rate has increased significantly.

  • Kube Crashes: Detects recent container crashes by analyzing Kubernetes metrics and provides information on the cause of the crash (e.g., Error, OOMKill, etc.).

  • Noisy Neighbors: Identifies over-saturated hosts where load exceeds CPU core count, leading to high latency, and examines pods on those hosts for deeper insights into the underlying issues.

  • Recent Deployments: Identifies resources that recently underwent changes in Kubernetes, such as service updates or configuration modifications.

  • Resource Contention: Focuses on containers with significant CPU throttling due to reaching CPU limits, or significant packet loss due to networking issues. Unlike noisy neighbors, CPU throttling is caused by the container itself and not by other processes on the underlying infrastructure.

  • Slow Requests: Analyzes traces in Tempo (Grafana’s distributed tracing system) to identify requests taking longer than a specified threshold (default: 3 seconds).
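
The rules behind some of these checks can be illustrated with a short sketch. The following Python is not Sift's implementation; it only mirrors the conditions described above (load above CPU core count, request duration above a 3 second default, and grouping log lines by shared pattern). All function names, the input shapes, and the pattern-masking rules are assumptions made for illustration.

```python
from __future__ import annotations

import re
from collections import Counter

# Default taken from the Slow Requests description above (3 seconds).
SLOW_REQUEST_THRESHOLD_SECONDS = 3.0


def is_saturated(load_average: float, cpu_cores: int) -> bool:
    """Noisy Neighbors condition: host load exceeds the CPU core count."""
    return load_average > cpu_cores


def slow_requests(
    durations_by_trace: dict[str, float],
    threshold: float = SLOW_REQUEST_THRESHOLD_SECONDS,
) -> list[str]:
    """Return the IDs of traces whose duration (seconds) exceeds the threshold."""
    return sorted(
        trace_id
        for trace_id, duration in durations_by_trace.items()
        if duration > threshold
    )


def log_pattern(line: str) -> str:
    """Mask variable parts of a log line so similar lines share one pattern.

    Hypothetical masking rules: hex literals first, then runs of digits.
    """
    line = re.sub(r"0x[0-9a-fA-F]+", "<hex>", line)
    return re.sub(r"\d+", "<num>", line)


def group_error_lines(lines: list[str]) -> Counter:
    """Count error log lines per masked pattern."""
    return Counter(log_pattern(line) for line in lines)
```

For example, `is_saturated(9.5, 8)` flags a host with a load of 9.5 on 8 cores, and `group_error_lines` places "timeout after 30 ms on conn 12" and "timeout after 45 ms on conn 7" in the same group. A real check would also compare each group's rate against a baseline to find significant increases, which this sketch leaves out.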

Sift in Grafana Incident

Note: A cluster and namespace are required to initiate a Sift investigation. There are plans to expand Sift to accept other inputs.

You can use Sift investigations in Grafana Incident to get valuable suggestions while working to resolve an active incident. Currently, there are two ways you can leverage Sift within Grafana Incident:

  • Run a Sift investigation within an incident: From the Suggestions section in the right sidebar of the incident timeline, click Start Sift investigation. Manually enter the cluster and namespace to start a Sift investigation specifically tailored to the incident.

  • Add a dashboard to the Incident timeline: When you link to a dashboard from the Incident timeline, ensure the link includes cluster and namespace references. Sift automatically extracts these references and uses them to run an investigation relevant to the incident.
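
To make the idea of cluster and namespace references in a dashboard link concrete, here is a minimal sketch. It assumes the references appear as Grafana dashboard variable query parameters named `var-cluster` and `var-namespace`; the variable names, the example URL, and the `extract_scope` helper are hypothetical, and the way Sift actually extracts these values is not described here.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlparse


def extract_scope(dashboard_url: str) -> Optional[Dict[str, str]]:
    """Return the cluster and namespace found in a dashboard URL, if both exist.

    Assumes (hypothetically) that they are passed as the dashboard
    variables `var-cluster` and `var-namespace`.
    """
    query = parse_qs(urlparse(dashboard_url).query)
    cluster = query.get("var-cluster", [None])[0]
    namespace = query.get("var-namespace", [None])[0]
    if not cluster or not namespace:
        # Both values are required to start an investigation.
        return None
    return {"cluster": cluster, "namespace": namespace}


# Hypothetical dashboard link pasted into the incident timeline.
url = (
    "https://example.grafana.net/d/abc123/checkout"
    "?var-cluster=prod-eu&var-namespace=checkout&from=now-1h"
)
scope = extract_scope(url)
```

A link without both values yields `None` in this sketch, which corresponds to the case where you enter the cluster and namespace manually instead.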

Note: When a Sift investigation is triggered from within an incident, the time range is automatically set to span from the incident start time to the time the investigation is triggered.
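
The time range rule in the note above can be expressed as a small sketch. The `TimeRange` type and the `investigation_time_range` function are hypothetical names used only for illustration, not part of any Grafana API.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class TimeRange(NamedTuple):
    start: datetime
    end: datetime


def investigation_time_range(
    incident_start: datetime, triggered_at: datetime
) -> TimeRange:
    """Span from the incident start to the moment the investigation is triggered."""
    if triggered_at < incident_start:
        raise ValueError("investigation cannot be triggered before the incident starts")
    return TimeRange(start=incident_start, end=triggered_at)


# Example: incident declared at 09:00 UTC, investigation triggered at 09:25 UTC.
incident_start = datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc)
triggered_at = datetime(2024, 1, 15, 9, 25, tzinfo=timezone.utc)
time_range = investigation_time_range(incident_start, triggered_at)
```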

View and manage Sift suggestions

When a Sift check identifies interesting results, clickable links appear in the right sidebar under Suggestions. Click these links to review detailed information about the specific Sift check.

You can add important Sift suggestions directly to the main Incident timeline. Alternatively, if a Sift check result is deemed irrelevant, you can dismiss it from the suggestions.