Grafana Cloud

Track recent changes and their effects

Use the entity catalog and RCA workbench to track deployments, configuration changes, and scale events. Correlate these changes with performance issues to quickly identify whether a recent change triggered a problem.

When to use this workflow

Use this workflow when:

  • A service starts showing errors or latency issues
  • You suspect a recent deployment caused problems
  • Performance degraded but you’re not sure why
  • You need to audit recent changes across services
  • You want to correlate configuration changes with incidents

This workflow helps answer “what changed?” during troubleshooting.

Before you begin

Ensure the knowledge graph is capturing change events:

  • Kubernetes deployments and rollouts
  • ConfigMap and Secret updates
  • HPA (Horizontal Pod Autoscaler) scale events
  • Service version changes
  • Infrastructure configuration changes

These appear as Amend insights.

Find recent changes in the entity catalog

Use Amend insights to identify all recent deployments and configuration changes.

Filter to Amend insights

  1. Navigate to Observability > Entity catalog.
  2. Under Insight Rings, select Amend.
  3. Deselect other insight categories.

This shows only entities with recent changes.

Review change types

Amend insights include:

  • Deployment - New service version deployed
  • ConfigMap update - Configuration changed
  • Secret update - Secrets rotated or modified
  • HPA scale - Pods scaled up or down automatically
  • Manual scale - Replica count changed manually
  • Version change - Service or infrastructure version updated

Check timing

For each entity with an Amend insight:

  1. Click the entity to open details.
  2. Note the time when the change occurred.
  3. Compare with when performance issues started.

If the change happened shortly before the issues started, it is a likely cause and worth investigating first.
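The timing comparison can be sketched in a few lines of Python. This is only an illustration of the reasoning, not a Grafana API: the function name `change_precedes_issue` and the 15-minute window are assumptions chosen for the example, and the timestamps are ones you would read from the entity details yourself.

```python
from datetime import datetime, timedelta


def change_precedes_issue(change_time: datetime,
                          issue_start: datetime,
                          window: timedelta = timedelta(minutes=15)) -> bool:
    """Return True if the change happened at or before the issue start,
    and no more than `window` earlier (hypothetical helper)."""
    gap = issue_start - change_time
    return timedelta(0) <= gap <= window


# Example values: a deployment at 10:15 and errors starting at 10:16.
deploy_time = datetime(2024, 1, 1, 10, 15)
errors_start = datetime(2024, 1, 1, 10, 16)
print(change_precedes_issue(deploy_time, errors_start))  # True
```

A change that falls outside the window, or that happened after the issue began, returns False and is a weaker candidate.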

Correlate changes with errors

Identify when recent changes trigger performance issues or failures.

Timeline correlation in the entity catalog

  1. Filter to services with both Amend and Error insights.
  2. Click a service to view details.
  3. In the service overview, check whether:
    • An Amend insight (blue) appears first
    • Error insights (red/yellow) appear shortly after

This pattern suggests the change triggered the errors.

Use RCA workbench for multi-service analysis

When changes might affect multiple services:

  1. Navigate to Observability > RCA workbench.
  2. Add services that have both changes and errors.
  3. View the Timeline.
  4. Look for Amend insights (blue) followed by Error insights (red).

Example pattern:

  • 10:15 AM - Amend: Deployment on payment-service
  • 10:16 AM - Error: Request error rate breach on payment-service
  • 10:17 AM - Error: Timeout errors on checkout-service (calls payment-service)

This pattern shows a deployment on payment-service causing errors that then propagated upstream to its caller, checkout-service.
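If you export or copy timeline events into a script, the same "Amend followed by Error" pattern can be checked programmatically. The sketch below is a minimal example under stated assumptions: the `Event` tuple, the function `errors_after_amends`, and the 10-minute window are all hypothetical and do not reflect any RCA workbench data format.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Event(NamedTuple):
    """Hypothetical record for one timeline entry."""
    time: datetime
    kind: str      # "Amend" or "Error"
    service: str
    detail: str


def errors_after_amends(events, window=timedelta(minutes=10)):
    """Pair each Amend with every Error that occurs within `window` after it."""
    ordered = sorted(events, key=lambda e: e.time)
    pairs = []
    for i, amend in enumerate(ordered):
        if amend.kind != "Amend":
            continue
        for later in ordered[i + 1:]:
            if later.time - amend.time > window:
                break  # events are sorted, so nothing later can match
            if later.kind == "Error":
                pairs.append((amend, later))
    return pairs


# The example pattern from above, deliberately listed out of order.
timeline = [
    Event(datetime(2024, 1, 1, 10, 17), "Error", "checkout-service", "Timeout errors"),
    Event(datetime(2024, 1, 1, 10, 15), "Amend", "payment-service", "Deployment"),
    Event(datetime(2024, 1, 1, 10, 16), "Error", "payment-service", "Request error rate breach"),
]

for amend, error in errors_after_amends(timeline):
    print(f"{amend.service} ({amend.detail}) -> {error.service} ({error.detail})")
```

Time proximity alone does not prove causation, so treat each pair as a lead to verify in logs and traces.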

Investigate specific change types

Drill into different kinds of changes to understand their specific impact.

Deployment changes

When a deployment Amend appears:

  1. Click the service to view details.
  2. Check the Properties tab for:
    • New version number
    • Deployment time
    • Image tag or commit hash
  3. Switch to the Logs tab:
    • Filter to the deployment time
    • Look for startup errors or warnings
    • Check for configuration issues

Common deployment issues:

  • Missing environment variables
  • Database migration failures
  • Dependency version incompatibilities
  • Incorrect configuration values

Configuration changes

When ConfigMap or Secret updates appear:

  1. View the Amend insight details.
  2. Check which configuration changed.
  3. Review the logs from after the change for:
    • Application reload or restart messages
    • Configuration parsing errors
    • Connection failures (if database or API credentials changed)

Scale events

When HPA or manual scale events appear:

  1. Check whether the event was a scale-up or a scale-down.
  2. Correlate with load patterns:
    • Scale up during traffic spikes (expected)
    • Scale down during low traffic (expected)
    • Rapid scale up/down cycles (potential thrashing)
  3. Look for issues after scale events:
    • New Pods in CrashLoopBackOff
    • Pods not becoming ready
    • Load balancer not routing to new Pods
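One way to make "rapid scale up/down cycles" concrete is to count how often the scaling direction reverses in a recent window. The following is a rough sketch, assuming you have a time-ordered list of `(timestamp, replica_count)` samples. The function names, the 30-minute window, and the threshold of three reversals are illustrative assumptions, not values defined by the HPA or by Grafana.

```python
from datetime import datetime, timedelta


def count_direction_reversals(samples):
    """Count how often scaling flips between up and down.

    `samples` is a time-ordered list of (timestamp, replica_count) pairs.
    """
    reversals = 0
    last_direction = 0  # +1 = last change scaled up, -1 = scaled down
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr == prev:
            continue  # no scale event between these samples
        direction = 1 if curr > prev else -1
        if last_direction and direction != last_direction:
            reversals += 1
        last_direction = direction
    return reversals


def is_thrashing(samples, window=timedelta(minutes=30), max_reversals=3):
    """Flag thrashing when the latest `window` has more than `max_reversals` flips."""
    if not samples:
        return False
    cutoff = samples[-1][0] - window
    recent = [s for s in samples if s[0] >= cutoff]
    return count_direction_reversals(recent) > max_reversals


start = datetime(2024, 1, 1, 10, 0)
# Replica count bouncing between 3 and 6 every 5 minutes.
bouncing = [(start + timedelta(minutes=5 * i), n)
            for i, n in enumerate([3, 6, 3, 6, 3, 6])]
# Replica count growing steadily with load.
steady = [(start + timedelta(minutes=5 * i), n)
          for i, n in enumerate([3, 4, 5, 6])]

print(is_thrashing(bouncing))  # True
print(is_thrashing(steady))    # False
```

Tune the window and threshold to your workload; a service with naturally bursty traffic may reverse direction more often without being unhealthy.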

Track changes across the environment

Monitor changes across time ranges and specific areas of your infrastructure.

Filter by time range

To see all changes in a specific time window:

  1. In the entity catalog or RCA workbench, set the time range.
  2. Filter to Amend insights only.
  3. Review all entities that changed in that window.

This is useful for:

  • Post-incident review (what changed during the incident?)
  • Deployment auditing (what was deployed today?)
  • Change freeze verification (were changes made during freeze?)
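For change freeze verification in particular, the check reduces to listing every change whose timestamp falls inside the freeze window. The sketch below assumes change records held as plain dictionaries with `time`, `service`, and `type` keys. That shape and the `changes_in_window` helper are hypothetical, chosen only to illustrate the filter.

```python
from datetime import datetime


def changes_in_window(changes, start, end):
    """Return changes with start <= time < end, oldest first."""
    hits = [c for c in changes if start <= c["time"] < end]
    return sorted(hits, key=lambda c: c["time"])


# Example records (made-up values for illustration).
changes = [
    {"time": datetime(2024, 1, 2, 9, 30), "service": "payment-service", "type": "Deployment"},
    {"time": datetime(2024, 1, 1, 23, 50), "service": "checkout-service", "type": "ConfigMap update"},
    {"time": datetime(2024, 1, 2, 14, 5), "service": "checkout-service", "type": "HPA scale"},
]

freeze_start = datetime(2024, 1, 2, 0, 0)
freeze_end = datetime(2024, 1, 3, 0, 0)

during_freeze = changes_in_window(changes, freeze_start, freeze_end)
for change in during_freeze:
    print(change["time"].isoformat(), change["service"], change["type"])
```

Automatic events such as HPA scaling are expected during a freeze, so you may want to review only the change types that require a human decision.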

Filter by namespace or environment

To track changes in specific areas:

  1. Use the Namespace dropdown to select your namespace.
  2. Also filter to Amend insights using the insight ring filter.
  3. See all changes affecting your services.

Useful for team-specific change tracking.

Common change patterns

Recognize these typical scenarios where changes trigger problems.

Bad deployment rollout

Pattern: Deployment Amend followed immediately by errors on the same service

Symptoms:

  • Error rate spikes within minutes of deployment
  • Latency increases on new version
  • Pods restarting or crashing

Action:

  1. Roll back the deployment.
  2. Check logs from the new version for errors.
  3. Test the change in a lower environment.
  4. Fix the issue and redeploy.

Configuration mismatch

Pattern: ConfigMap/Secret update followed by service errors or failures

Symptoms:

  • Service can’t connect to database (credentials changed)
  • Feature flags cause unexpected behavior
  • Pods restarting due to invalid configuration

Action:

  1. Revert the configuration change.
  2. Verify configuration values.
  3. Test configuration in staging before applying to production.

Scale-induced issues

Pattern: HPA scale event followed by Pod failures or error rate increases

Symptoms:

  • New Pods fail to start (resource constraints)
  • Connection pool exhaustion as Pods scale up
  • Race conditions exposed by rapid scaling

Action:

  1. Check Pod logs for startup failures.
  2. Review resource requests and limits.
  3. Adjust HPA thresholds or resource allocations.
  4. Investigate application concurrency issues.

Cascading deployment issues

Pattern: Deployment on one service causes errors on multiple upstream services

Symptoms:

  • Service A deployed (Amend)
  • Service A shows errors (breaking change)
  • Services B, C, D that call Service A show timeout errors

Action:

  1. Roll back the Service A deployment.
  2. Review API compatibility and breaking changes.
  3. Coordinate deployments with dependent services.
  4. Use feature flags or API versioning for safer rollouts.
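To estimate which services a change to Service A could affect, you can walk the call graph from the changed service to everything that calls it, directly or indirectly. The sketch below assumes a simple `caller -> callees` mapping that you maintain or export yourself. The `transitive_callers` function and the service names are hypothetical and are not part of the knowledge graph API.

```python
from collections import deque


def transitive_callers(calls, changed):
    """Return every service that directly or indirectly calls `changed`.

    `calls` maps each caller to the list of services it calls.
    """
    # Invert caller -> callees into callee -> callers.
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    affected = set()
    queue = deque([changed])
    while queue:
        service = queue.popleft()
        for caller in callers.get(service, ()):
            # Skip services already visited so cycles terminate.
            if caller not in affected and caller != changed:
                affected.add(caller)
                queue.append(caller)
    return affected


# Example graph: B, C, and D call A, and a frontend calls B.
calls = {
    "service-b": ["service-a"],
    "service-c": ["service-a"],
    "service-d": ["service-a"],
    "frontend": ["service-b"],
}

print(sorted(transitive_callers(calls, "service-a")))
# ['frontend', 'service-b', 'service-c', 'service-d']
```

The result is the set of services to watch for timeout errors after the deployment, and to coordinate with before the next one.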

Use Amend insights for proactive monitoring

Establish regular practices to catch change-related issues early.

Regular change review

Establish a practice of reviewing changes:

Daily:

  1. Filter the entity catalog to Amend insights from the last 24 hours.
  2. Review what changed in production.
  3. Cross-reference with error or latency increases.
  4. Flag suspicious correlations for investigation.

After incidents:

  1. Check the RCA workbench timeline for Amend insights.
  2. Identify what changed before or during the incident.
  3. Document the changes in the post-mortem.
  4. Add safeguards to prevent similar issues.

Deployment validation

After deploying a service:

  1. Filter the entity catalog to the deployed service.
  2. Check for an Amend insight confirming the deployment.
  3. Monitor for Error, Anomaly, or Saturation insights that appear afterward.
  4. If errors appear, investigate immediately, before the impact spreads to dependent services.

Change correlation dashboard

Create a bookmarked view:

  1. Filter to Amend and Error insights.
  2. Set the time range to the last 4 hours to capture recent changes.
  3. Bookmark as “Recent Changes and Errors”.
  4. Check regularly to spot change-related issues early.

Combine with other workflows

Integrate change tracking with other knowledge graph features for comprehensive analysis.

For incidents

  1. Use the investigate incidents workflow in RCA workbench.
  2. Look for Amend insights on the timeline.
  3. Determine whether a change triggered the incident.
  4. Use the explore dependencies workflow to see the impact.

For proactive monitoring

  1. Use the monitor services workflow with the Amend filter applied.
  2. Watch for deployments on critical services.
  3. Validate health after each deployment.
  4. Catch issues before they escalate.

Best practices

Follow these practices to minimize change-related incidents and improve monitoring.

Change management

  • Deploy during low-traffic periods for critical services
  • Monitor for 30 minutes after deploying to catch early issues
  • Use canary or blue-green deployments to limit blast radius
  • Coordinate multi-service changes to avoid breaking dependencies

Monitor changes

  • Set up alerts for Error insights appearing shortly after Amend insights
  • Review change history during incident post-mortems
  • Track change frequency to identify teams or services with frequent deployments
  • Document problematic changes in runbooks for faster future diagnosis
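Tracking change frequency can be as simple as counting change events per service over a review period. The sketch below assumes you have collected `(service, change_type)` pairs from your own review. The `change_frequency` helper and the sample data are illustrative assumptions.

```python
from collections import Counter


def change_frequency(changes):
    """Return (service, count) pairs, most frequently changed first."""
    return Counter(service for service, _ in changes).most_common()


# Example week of change events (made-up values).
week = [
    ("payment-service", "Deployment"),
    ("checkout-service", "ConfigMap update"),
    ("payment-service", "Deployment"),
    ("payment-service", "HPA scale"),
]

for service, count in change_frequency(week):
    print(f"{service}: {count} changes")
```

A high count is not a problem by itself; it indicates where a failed change is statistically more likely and where extra post-deployment monitoring pays off.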

Rollback readiness

  • Have rollback procedures documented for each service
  • Test rollback in staging environments
  • Automate rollback where possible (feature flags, deployment automation)
  • Monitor after rollback to ensure it resolved the issue

Next steps

When you identify a problematic change:

  • Immediate: Roll back the change or apply a hotfix
  • Short-term: Investigate logs and traces to understand what went wrong
  • Long-term: Improve testing, add monitoring, or adjust deployment practices

Additional resources