Monitor selected services

Use the entity catalog to create focused monitoring views for the services your team owns. Filter to specific services, bookmark the view, and check it daily to stay aware of service health without writing queries.

When to use this workflow

Use this workflow when you want to:

Monitor a specific set of services your team is responsible for
Create a daily “check-in” view for service health
Track RED metrics (request rate, error ratio, latency) across your services
Quickly identify which services have active insights

This is your primary workflow for proactive monitoring and early detection of issues.

Before you begin

Ensure your services are:

Instrumented with OpenTelemetry, Istio, or another supported APM source
Sending RED metrics to Grafana Cloud
Visible in the entity catalog

Open the entity catalog

From Grafana Cloud, navigate to Observability > Entity catalog.

Filter to your services

Create a focused view by applying filters:

Filter by entity type

Under Type, select Service.

This shows only services, removing infrastructure noise.

Filter by properties

Narrow to your team’s services using the property filter dropdowns located above the entity type filter.

Use any combination of these filters:

Namespace - Filter to your team’s namespace (for example, checkout, payments)
Region - Show only services in specific regions (for example, us-east-1, eu-west-1)
Env - Filter to specific environments (for example, production, staging)

You can select values from multiple dropdowns to narrow your view. For example, select production from the Env dropdown and us-east-1 from the Region dropdown to see only your production services in that region.
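The way combined dropdown selections narrow the view can be sketched as a logical AND across properties. This is only an illustration: the `SERVICES` records and the `filter_services` helper are hypothetical and do not reflect how the entity catalog stores or queries entities.

```python
# Hypothetical stand-in for catalog rows; not the catalog's real data model.
SERVICES = [
    {"name": "checkout-api", "namespace": "checkout", "region": "us-east-1", "env": "production"},
    {"name": "checkout-worker", "namespace": "checkout", "region": "eu-west-1", "env": "production"},
    {"name": "payment-api", "namespace": "payments", "region": "us-east-1", "env": "staging"},
]


def filter_services(services, **selected):
    """Keep services that match every selected property (AND across dropdowns)."""
    return [
        s for s in services
        if all(s.get(prop) == value for prop, value in selected.items())
    ]


# Selecting production in Env and us-east-1 in Region keeps only one service.
matches = filter_services(SERVICES, env="production", region="us-east-1")
print([s["name"] for s in matches])  # ['checkout-api']
```

Each added property can only shrink the result, which is why stacking filters is a reliable way to reach a small, team-specific view.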

Search for specific services

Use the search bar to find services by name:

Type partial names (for example, api finds api-server, payment-api)
Search is case-insensitive
Results update as you type
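The matching behavior described above (partial names, case-insensitive) is equivalent to a case-folded substring test. The sketch below assumes plain substring matching; the `search_services` function is hypothetical and is not the catalog's actual search implementation.

```python
def search_services(names, query):
    """Return the names that contain the query, ignoring case."""
    needle = query.casefold()
    return [name for name in names if needle in name.casefold()]


names = ["api-server", "payment-api", "checkout-worker"]
print(search_services(names, "API"))  # ['api-server', 'payment-api']
```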

Review service health

The entity catalog shows key information for each service:

Insight rings

Check the colored rings around each service:

Red outer ring - Critical issues on the service itself
Yellow outer ring - Warning-level issues
Blue outer ring - Informational insights (deployments, configuration changes)
Red/yellow inner ring - Issues propagated from controlled entities (for example, pods running the service)

Focus on services with red rings for immediate investigation.
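One way to think about the ring colors is as a triage order. The sketch below assumes a hypothetical record per service with `outer_ring` and `inner_ring` fields and ranks any red ring first, then yellow, then blue. The ranking is an illustrative assumption based on the guidance above, not a rule the catalog applies.

```python
# Hypothetical priority: lower number means look at it sooner.
RING_PRIORITY = {"red": 0, "yellow": 1, "blue": 2, None: 3}

services = [
    {"name": "cart", "outer_ring": "blue", "inner_ring": None},
    {"name": "checkout", "outer_ring": "red", "inner_ring": "yellow"},
    {"name": "search", "outer_ring": None, "inner_ring": "red"},
    {"name": "payments", "outer_ring": "yellow", "inner_ring": None},
]


def triage_order(services):
    """Sort by the most severe ring on the service, then by the outer ring."""
    def key(s):
        outer = RING_PRIORITY[s["outer_ring"]]
        inner = RING_PRIORITY[s["inner_ring"]]
        return (min(outer, inner), outer)
    return sorted(services, key=key)


print([s["name"] for s in triage_order(services)])
# ['checkout', 'search', 'payments', 'cart']
```

Sorting on the outer ring second places a service with its own critical issue ahead of one that only inherits a red ring from the entities it controls.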

RED metrics

Each service displays three core metrics:

Request rate - Requests per second with trend sparkline
Error ratio - Percentage of failed requests
Latency (P95) - 95th percentile response time

Look for:

Sudden drops in request rate - May indicate traffic routing issues
Error ratio spikes - Often correlate with deployments or dependency failures
Latency increases - Can signal resource saturation or slow dependencies
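To make the three metrics concrete, the following sketch derives them from a list of raw request records over a fixed window. It is a simplified illustration: the `Request` type and the `red_metrics` function are hypothetical, and the nearest-rank percentile used here is one common definition, not necessarily the method Grafana Cloud uses.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    failed: bool


def red_metrics(requests, window_seconds):
    """Compute request rate, error ratio (%), and nearest-rank P95 latency."""
    n = len(requests)
    if n == 0:
        return {"rate_per_s": 0.0, "error_ratio_pct": 0.0, "p95_ms": None}
    errors = sum(1 for r in requests if r.failed)
    durations = sorted(r.duration_ms for r in requests)
    rank = (95 * n + 99) // 100  # ceil(0.95 * n) using integer arithmetic
    return {
        "rate_per_s": n / window_seconds,
        "error_ratio_pct": 100.0 * errors / n,
        "p95_ms": durations[rank - 1],
    }


# 20 requests in a 10-second window, durations 10..200 ms, one failure.
sample = [Request(duration_ms=10.0 * i, failed=(i == 20)) for i in range(1, 21)]
metrics = red_metrics(sample, window_seconds=10)
print(metrics)  # {'rate_per_s': 2.0, 'error_ratio_pct': 5.0, 'p95_ms': 190.0}
```

Note that the error ratio is relative to traffic, so a service with very few requests can show a large ratio from a single failure. Read it together with the request rate.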

Sparklines

View sparklines to see trends over the selected time range. Patterns to watch for:

Spikes or drops during deployment windows
Regular daily patterns (normal) vs irregular patterns (investigate)
Gradual degradation over time

Investigate a service

When you spot an issue, click the service name to open its details:

Service overview tab shows:
- RED metrics with threshold visualization
- Active insights with severity and timing
- Related upstream and downstream services

Check active insights:
- Read insight descriptions to understand what was detected
- Note the insight category (Error, Anomaly, Saturation, etc.)
- Check timing to correlate with deployments or other changes

Review connected entities:
- See which services call this one (upstream)
- See which dependencies this service calls (downstream)
- Click connected entities to check if they share issues

Bookmark your view

After setting up filters, bookmark the entity catalog URL:

Apply all desired filters (entity type, properties, search).
Bookmark the page in your browser.
Name it clearly (for example, “Production Checkout Services”).

Return to this bookmark daily for consistent monitoring.

Set up multiple views

Create different bookmarked views for different contexts:

Production services - Filter to production environment
Team services - Filter by namespace or team label
Critical services - Filter to high-priority services only
Recently deployed - Services with recent Amend insights

What to look for

During daily monitoring, focus on:

Critical insights (red rings)

Error ratio breaches on customer-facing services
Saturation warnings approaching resource limits
Failure insights like CrashLoopBackOff or service unavailability

Metric anomalies

Request rate dropping to zero (service down)
Error ratio jumping above 1% (investigate immediately)
P95 latency exceeding SLO thresholds
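The three checks above can be expressed as simple threshold tests. This is a minimal sketch: the `metric_flags` function and its parameters are hypothetical, the 1% error ratio limit comes from the list above, and the SLO value is something you would supply for each service.

```python
def metric_flags(rate_per_s, error_ratio_pct, p95_ms, slo_p95_ms):
    """Return a list of human-readable flags for one service's RED metrics."""
    flags = []
    if rate_per_s == 0:
        flags.append("request rate is zero: service may be down")
    if error_ratio_pct > 1.0:
        flags.append("error ratio above 1%: investigate immediately")
    # p95_ms may be None when there was no traffic to measure.
    if p95_ms is not None and p95_ms > slo_p95_ms:
        flags.append("P95 latency exceeds SLO threshold")
    return flags


# Example values are made up for illustration.
print(metric_flags(rate_per_s=12.5, error_ratio_pct=2.4, p95_ms=640.0, slo_p95_ms=500.0))
```

A service with no flags is not necessarily healthy, so treat these checks as a first pass alongside the insight rings rather than a replacement for them.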

Patterns across services

Multiple services with errors at the same time (shared dependency issue)
Cascading latency increases (trace to slowest service)
All services in a namespace affected (infrastructure or network issue)
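The namespace-wide pattern can be checked by grouping services and testing whether every member of a group is affected. The sketch below uses hypothetical records with a `has_errors` field. It illustrates the reasoning only and is not a feature of the entity catalog.

```python
from collections import defaultdict


def namespaces_fully_affected(services):
    """Return the namespaces in which every listed service has errors."""
    by_namespace = defaultdict(list)
    for s in services:
        by_namespace[s["namespace"]].append(s["has_errors"])
    return sorted(ns for ns, flags in by_namespace.items() if all(flags))


# Made-up example data.
services = [
    {"name": "checkout-api", "namespace": "checkout", "has_errors": True},
    {"name": "checkout-worker", "namespace": "checkout", "has_errors": True},
    {"name": "payment-api", "namespace": "payments", "has_errors": True},
    {"name": "payment-ledger", "namespace": "payments", "has_errors": False},
]

print(namespaces_fully_affected(services))  # ['checkout']
```

Here every service in `checkout` is affected, which points toward infrastructure or network causes, while the partial impact in `payments` is more consistent with a service-level or dependency problem.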

Next steps

When you identify an issue during monitoring:

Single service problem: Drill into the service overview and telemetry data
Multiple services affected: Use RCA workbench to correlate insights
Unclear root cause: Explore the entity graph to visualize dependencies
Infrastructure suspected: Switch to the Identify unhealthy infrastructure workflow

Related workflows:

Identify unhealthy infrastructure - Find infrastructure issues affecting services
Investigate incidents - Correlate issues across multiple entities
Track changes - Monitor deployments and configuration changes
