
How to monitor AI agent applications on Amazon Bedrock AgentCore with Grafana Cloud
Today’s AI agents have grown increasingly sophisticated, moving into production environments and becoming integral parts of engineering workflows. But these agents can also be black boxes for engineers, which makes observability more critical than ever.
Without proper monitoring, you’re often left feeling like you’re flying blind as you try to debug agent failures, understand performance bottlenecks, and track costs. We want to put our users back in control, so in this tutorial you’ll learn how to deploy an AI agent on Amazon Bedrock AgentCore with full observability powered by OpenTelemetry and Grafana Cloud.
More specifically, you’ll learn how to:
1. Deploy AI agents on Amazon Bedrock AgentCore for a managed, scalable production runtime
2. Instrument agents with OpenTelemetry using OpenLit for automatic, zero-code observability
3. Monitor agent performance in Grafana Cloud with AI Observability dashboards
4. Debug production issues using distributed tracing
5. Optimize costs by tracking token usage and model performance
Note: This post focuses on AI application observability. Stay tuned for the second part of this guide, which will focus on AI observability at the infrastructure layer.
What is Amazon Bedrock AgentCore?
Amazon Bedrock AgentCore is a managed service that simplifies deploying and running AI agents in production. Think of it as a serverless runtime for your AI agents. You provide the agent code, and AWS handles the infrastructure, scaling, and execution environment.
Key benefits include:
- Managed infrastructure: No need to provision servers or manage Kubernetes clusters
- Amazon Bedrock integration: Native access to foundation models like Llama 3, Claude, and others
- Container-based deployment: Package your agent with all dependencies using Docker
- Enterprise-ready: Built-in security, IAM integration, and compliance features
AgentCore is particularly powerful for orchestration frameworks like CrewAI, LangGraph, or Strands, where coordinating multiple agents or complex workflows is necessary.
Why use OpenTelemetry for AI agents?
AI agents can be notoriously difficult to debug. A single user query might trigger:
- Multiple LLM API calls
- Tool invocations and external API requests
- Multi-step reasoning chains
- Retry logic and error handling
When something goes wrong (or, worse, when performance silently degrades), you need visibility into every step. To address this, we recommend using OpenTelemetry (OTel), the industry-standard observability framework, which provides unified instrumentation for distributed applications and infrastructure.
For AI agents specifically, OpenTelemetry helps you answer critical questions:
- Which LLM calls are slowest?
- How many tokens am I consuming per request?
- Where are errors occurring in my agent workflow?
- What’s the end-to-end latency for user requests?
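Once traces are flowing, the first of those questions becomes a query. Here's a hedged sketch in TraceQL (Tempo's query language), assuming OpenLit populates the OpenTelemetry GenAI span attribute gen_ai.request.model:

{ span.gen_ai.request.model != "" && duration > 2s }

This surfaces every span that carries a model attribute and took longer than two seconds to complete.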
And while OpenTelemetry is powerful, manually instrumenting every LLM call and agent step is tedious and error-prone. This is where OpenLit shines.
OpenLit provides automatic instrumentation for AI frameworks:
- Zero code changes required; wrap your Python command with openlit-instrument
- Automatic capture of LLM calls (OpenAI, Anthropic, Bedrock, etc.)
- Support for agent frameworks (CrewAI, LangChain, LlamaIndex)
- Export OpenTelemetry-compatible data to any OTLP backend
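Before we wire this into AgentCore, here's what the zero-code approach looks like locally. This is a sketch only; the script name, endpoint, and token are placeholders:

# Point OpenTelemetry at your Grafana Cloud OTLP gateway (placeholder values)
export OTEL_EXPORTER_OTLP_ENDPOINT="https://otlp-gateway-<region>.grafana.net/otlp"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic%20<base64-token>"

# Same script, now automatically instrumented
openlit-instrument python my_agent.py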
Tutorial: deploy and monitor a CrewAI agent
To illustrate how this works, let’s build a complete example: a research assistant agent powered by CrewAI and Meta’s Llama 3, deployed on Amazon Bedrock AgentCore, with full observability in Grafana Cloud.
Prerequisites
Before starting, ensure you have:
1. Python 3.12+ installed
2. AWS CLI configured with credentials:

aws configure

3. AWS permissions for:
- Bedrock AgentCore
- Amazon ECR (Elastic Container Registry)
- Bedrock model access (specifically meta.llama3-8b-instruct-v1:0)
4. Grafana Cloud account (If you don’t have one, you can sign up for our forever-free tier now.)
5. AgentCore CLI installed:

python -m venv .venv && source .venv/bin/activate
pip install bedrock-agentcore-starter-toolkit

Step 1: Create a CrewAI Agent
Let’s create an example AI agent using CrewAI:
from bedrock_agentcore import BedrockAgentCoreApp
from crewai import Agent, Task, Crew, Process

# Initialize AgentCore runtime
app = BedrockAgentCoreApp()

# Define a simple research assistant agent
researcher = Agent(
    role="Research Assistant",
    goal="Provide helpful, accurate answers, with concise summaries.",
    backstory=(
        "You are a knowledgeable research assistant who answers clearly "
        "and cites facts when relevant."
    ),
    # Use Llama 3 8B via AWS Bedrock
    llm="bedrock/meta.llama3-8b-instruct-v1:0",
    verbose=False,
    max_iter=2,
)

@app.entrypoint
def invoke(payload: dict):
    """AgentCore entrypoint. Expects {'prompt': '...'}"""
    user_message = payload.get("prompt", "Hello!")

    task = Task(
        description=user_message,
        agent=researcher,
        expected_output="A helpful, well-structured response.",
    )

    crew = Crew(
        agents=[researcher],
        tasks=[task],
        process=Process.sequential,
        verbose=False,
    )

    result = crew.kickoff()
    return {"result": result.raw}

if __name__ == "__main__":
    app.run()

Key components:
- BedrockAgentCoreApp: Integrates CrewAI with the Amazon Bedrock AgentCore runtime
- Agent definition: Single agent with a research assistant role using Llama 3
- @app.entrypoint: Decorator that marks the function as the agent’s entry point
- Crew orchestration: CrewAI manages task execution and agent coordination
The agent accepts JSON input like {"prompt": "your question"} and returns a JSON response.
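You can sanity-check this locally before deploying. Running the file starts the AgentCore runtime server, which, per the AgentCore runtime contract, listens on port 8080 and accepts POST requests at /invocations (assuming default settings):

# Terminal 1: start the agent locally
python crewai_agent.py

# Terminal 2: send a test request
curl -X POST http://localhost:8080/invocations \
  -H "Content-Type: application/json" \
  -d '{"prompt": "hi"}'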
Step 2: Configure dependencies
Create a requirements.txt that includes:
crewai>=1.0.0
openlit>=1.35
litellm
bedrock-agentcore

Step 3: Configure AgentCore deployment
Run the AgentCore configuration command:
agentcore configure \
--deployment-type container \
--entrypoint crewai_agent.py \
--name crewai_agent \
--non-interactive

This generates a .bedrock_agentcore/crewai_agent/ directory with:
- Dockerfile: Container build configuration
- agent_config.json: Metadata for AgentCore
Step 4: Add OpenTelemetry configuration
Now comes the observability magic. Edit the generated Dockerfile at .bedrock_agentcore/crewai_agent/Dockerfile and add these environment variables:
# Disable AWS ADOT observability to use OpenLIT exclusively
ENV DISABLE_ADOT_OBSERVABILITY="true"
# OpenTelemetry configuration for Grafana Cloud
ENV OTEL_SERVICE_NAME="my_service"
ENV OTEL_DEPLOYMENT_ENVIRONMENT="my_environment"
ENV OTEL_EXPORTER_OTLP_ENDPOINT="your_grafana_cloud_otlp_endpoint"
ENV OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic%20<your-base64-encoded-token>"

Important: Replace the OTLP endpoint and headers with your Grafana Cloud credentials:
- Sign in to the Grafana Cloud portal and select your Grafana Cloud stack.
- Click Configure in the OpenTelemetry section.
- In the Password / API Token section, click Generate to create a new API token.
- Give the API token a name.
- Click Create token.
- Click Close; you don’t need to copy the raw token, because the portal now displays the fully formed values.
- Copy the values for OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS and use them to replace the placeholders in the Dockerfile ENVs.
For more information, refer to our guide on manually setting up OpenTelemetry for Grafana Cloud.
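If you prefer to assemble the header by hand: the value is basic auth over your stack’s instance ID and API token, with the space URL-encoded as %20. A quick sketch with placeholder values:

# Base64-encode "<instance-id>:<api-token>" (placeholders shown)
echo -n "123456:glc_example_token" | base64

# Use the output in the Dockerfile:
# ENV OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic%20<base64 output>"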
Next, ensure the CMD line in the Dockerfile uses OpenLit’s instrumentation wrapper:
# Use OpenLit to automatically instrument the agent
CMD ["openlit-instrument", "python", "-m", "crewai_agent"]What’s happening here?
- openlit-instrument wraps your Python command
- At runtime, OpenLit automatically monitors the CrewAI agent operations
- Every LLM request and agent task is traced and exported to Grafana via OTLP
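As an alternative to wrapping the command, OpenLit can also be initialized programmatically. A minimal sketch, assuming OpenLit's init() entry point with its application_name and environment parameters; when called without an explicit endpoint it reads the standard OTEL_EXPORTER_OTLP_* environment variables:

import openlit

# Picks up OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS
# from the environment set in the Dockerfile
openlit.init(
    application_name="my_service",  # mirrors OTEL_SERVICE_NAME
    environment="my_environment",
)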
Step 5: Build and deploy
Build the Docker image and deploy to AgentCore:
agentcore launch --local-build

This command will:
1. Build the Docker image locally with all dependencies
2. Push the image to Amazon ECR
3. Deploy the agent to Bedrock AgentCore
4. Set up IAM execution roles
5. Configure the runtime environment
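While that runs, you can check on the deployment from another terminal; the starter toolkit includes a status command (output format may vary by toolkit version):

agentcore status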
The deployment process takes two to five minutes. You’ll see output like:
✓ Pushing to ECR...
✓ Deploying to AgentCore...
✓ Agent deployed successfully!
Agent ID: agt_abc123xyz

Step 6: Invoke the agent
Test your deployed agent:
# Simple test
agentcore invoke '{"prompt": "hi"}'
# Research query
agentcore invoke '{"prompt": "Explain AI Observability"}'
# Complex request
agentcore invoke '{"prompt": "Compare supervised and unsupervised learning with examples"}'Response:
{
"result": "Quantum computing is a revolutionary approach to computation that..."
}
Step 7: Explore Grafana Cloud AI Observability
Once you have telemetry flowing from CrewAI Agent on AgentCore to Grafana Cloud, you can use the pre-built dashboards from Grafana Cloud AI Observability.
Navigate to Connections, search for AI Observability and select it, then go to GenAI Observability, scroll down, and install the dashboards.
Here’s a breakdown of what you can see in the dashboards:
- End-to-end latency: Total time from request to response
- LLM call details: Which model, how many tokens, latency, cost
- Agent workflow: Task creation, execution, and response formatting
- Error traces: If something fails, you’ll see the exact step and error message
Next steps
The combination of Amazon Bedrock AgentCore, OpenTelemetry, and Grafana Cloud provides a production-ready stack for AI agents with enterprise-grade observability. Explore our Grafana Cloud AI Observability documentation to learn more.
Grafana Cloud is the easiest way to get started with metrics, logs, traces, dashboards, and more. We have a generous forever-free tier and plans for every use case. Sign up for free now!