AI Observability Demo
A multi-cloud, AI-native, full-stack observability platform for monitoring LLM workloads.
Quick Start
Prerequisites
- AWS Account with Bedrock access. This demo uses Claude 3 Haiku/Sonnet in us-east-1, but you can substitute any Bedrock-supported model by updating the model ID in gateway/litellm-config.yaml. The observability pipeline is model-agnostic and works with any LLM provider supported by LiteLLM.
- AWS CLI configured with AdministratorAccess
- Docker Desktop running
- Docker Compose v2
- Python 3.11+
- Terraform 1.5.0+
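Before provisioning, it can help to confirm the tools above meet the version minimums. A small sketch; the `version_ge` helper is ours, not part of the demo, and relies on GNU `sort -V`:

```shell
# version_ge A B — true if version A >= version B (lexicographic version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# e.g. check your installed Terraform against the 1.5.0 minimum:
# version_ge "$(terraform version -json | python3 -c 'import json,sys; print(json.load(sys.stdin)["terraform_version"])')" 1.5.0
version_ge 1.6.2 1.5.0 && echo "terraform OK"
```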
Phase 1: Infrastructure Provisioning
cd AI-OBS_DEMO/terraform
terraform init
terraform plan -out=tfplan
terraform apply tfplan
# Capture outputs
export AMP_WORKSPACE_ID=$(terraform output -raw amp_workspace_id)
export AMP_REMOTE_WRITE_URL=$(terraform output -raw amp_remote_write_url)
export AMP_ENDPOINT=$(terraform output -raw amp_endpoint)
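Later phases silently misbehave if any of these exports came back empty. A quick guard (a hypothetical helper, not part of the repo) can fail fast instead:

```shell
# require_env VAR... — fail on the first variable that is unset or empty.
require_env() {
  for var in "$@"; do
    eval "val=\${$var}"
    [ -n "$val" ] || { echo "missing: $var" >&2; return 1; }
  done
}

# e.g. require_env AMP_WORKSPACE_ID AMP_REMOTE_WRITE_URL AMP_ENDPOINT
```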
Phase 2: Environment Configuration
cd ..
cp .env.example .env
# Edit .env with your AWS credentials and Terraform outputs
# Add:
# - AWS_ACCESS_KEY_ID
# - AWS_SECRET_ACCESS_KEY
# - AWS_SESSION_TOKEN (if using temporary credentials)
# - AMP_REMOTE_WRITE_URL (from Terraform)
# - AMP_ENDPOINT (from Terraform)
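For orientation, a resulting .env might look roughly like the sketch below; every value is a placeholder, so substitute your own credentials and Terraform outputs:

```shell
# Illustrative .env sketch — placeholder values only.
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
AWS_SESSION_TOKEN=...   # only when using temporary credentials
AMP_REMOTE_WRITE_URL=https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-xxxx/api/v1/remote_write
AMP_ENDPOINT=https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-xxxx/
```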
Phase 3: Build and Launch
# Build all services
docker compose build
# Start the stack
docker compose up -d
# Verify services
docker compose ps
# Check OTEL Collector health
curl http://localhost:13133
# Run demo and watch logs
docker compose logs -f ai-app
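The collector can take a few seconds to become healthy; rather than a fixed sleep, a small polling helper (hypothetical, not part of the repo) can gate the next phase:

```shell
# wait_for TRIES CMD... — retry CMD once per second, up to TRIES attempts.
wait_for() {
  tries=$1; shift
  while [ "$tries" -gt 0 ]; do
    "$@" >/dev/null 2>&1 && return 0
    tries=$((tries - 1)); sleep 1
  done
  return 1
}

# e.g. wait_for 30 curl -sf http://localhost:13133
wait_for 3 true && echo "collector check passed"
```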
Phase 4: Validate Telemetry
# Install awscurl
pip3 install awscurl
# Query AMP for token usage
awscurl --service aps --region us-east-1 \
"${AMP_ENDPOINT}api/v1/query?query=gen_ai_usage_input_tokens_total"
# Check X-Ray traces in AWS Console
# Navigate to: X-Ray > Traces > Filter by service: ai-observability-demo
# Check CloudWatch logs
# Navigate to: CloudWatch > Log Groups > /ai-observability-demo
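The bare query above survives unencoded, but richer PromQL (brackets, parentheses) must be URL-encoded before awscurl sees it. A sketch using python3, which is already a prerequisite; the rate() window is our choice, not the demo's:

```shell
# URL-encode a PromQL expression for the AMP query API.
query=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1]))' \
  'rate(gen_ai_usage_input_tokens_total[5m])')
echo "$query"
# awscurl --service aps --region us-east-1 "${AMP_ENDPOINT}api/v1/query?query=${query}"
```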
Phase 5: Grafana Dashboard
- Open AWS Console → Amazon Managed Grafana → ai-observability-demo
- Click "Open Grafana"
- Add data sources:
- Prometheus: Use AMP endpoint with SigV4 auth
- CloudWatch: Set to the region where your resources are deployed
- X-Ray: Set to the same region as CloudWatch
- Import dashboard: grafana/dashboards/ai-observability.json
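Amazon Managed Grafana configures data sources through the UI, but if you also run self-managed Grafana, the same Prometheus source can be provisioned as code. A sketch assuming Grafana's provisioning format; the workspace URL is a placeholder:

```yaml
apiVersion: 1
datasources:
  - name: AMP
    type: prometheus
    url: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-xxxx/
    jsonData:
      sigV4Auth: true
      sigV4AuthType: default
      sigV4Region: us-east-1
```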
Phase 6: MCP Server Integration
# Install MCP server dependencies
cd mcp-server
pip3 install -r requirements.txt
# Configure Kiro MCP
# Add to .kiro/mcp.json:
{
  "mcpServers": {
    "ai-observability": {
      "command": "python3",
      "args": ["/path/to/AI-OBS_DEMO/mcp-server/prometheus_mcp_server.py"],
      "env": {
        "AMP_ENDPOINT": "your-amp-endpoint",
        "AWS_REGION": "your-aws-region",
        "AWS_ACCESS_KEY_ID": "your-key",
        "AWS_SECRET_ACCESS_KEY": "your-secret",
        "AWS_SESSION_TOKEN": "your-token"
      }
    }
  }
}
Region Configuration
Set AWS_REGION to the region where your observability resources (AMP, CloudWatch, X-Ray) are deployed. Any AWS region that supports these services will work.
Natural Language Queries in Kiro
Once configured, ask Kiro questions like:
- "Which model is consuming the most tokens right now?"
- "What is the P95 latency for Claude Haiku over the last hour?"
- "Have there been any throttle events in the last 30 minutes?"
- "Estimate the cost of LLM usage today"
- "Compare latency across all active models"
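Under the hood, the MCP server translates questions like these into PromQL. A hedged sketch of what the P95-latency question might map to; the histogram metric name is an assumption based on the OpenTelemetry GenAI semantic conventions, so check the actual series names in your AMP workspace:

```promql
histogram_quantile(
  0.95,
  sum by (le, gen_ai_request_model) (
    rate(gen_ai_client_operation_duration_seconds_bucket[1h])
  )
)
```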
Teardown
# Stop and remove containers
docker compose down
# Destroy AWS resources
cd terraform
terraform destroy -auto-approve
# Remove project (optional)
cd ../..
rm -rf AI-OBS_DEMO
Architecture
- OpenTelemetry: Vendor-neutral instrumentation with GenAI semantic conventions
- LiteLLM: Multi-provider AI gateway (AWS Bedrock, Azure OpenAI, etc.)
- Amazon Managed Prometheus: Scalable metrics storage with SigV4 auth
- AWS X-Ray: Distributed tracing for LLM invocations
- Amazon Managed Grafana: Unified visualization across AMP, CloudWatch, X-Ray
- Custom MCP Server: Natural language observability queries via Kiro
Extending to Multi-Cloud
- Azure OpenAI: Add real credentials to .env; LiteLLM handles routing automatically
- GCP Vertex AI: Add a Vertex AI model to gateway/litellm-config.yaml
- On-premises models: Deploy vLLM/Ollama locally and add it as a LiteLLM provider
- Federated metrics: Deploy OTEL Collector in each cloud, remote write to same AMP workspace
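For the federated-metrics option, each cloud's Collector remote writes to the shared AMP workspace. A minimal sketch of the relevant Collector config, assuming the contrib distribution's sigv4auth extension and prometheusremotewrite exporter; the endpoint is a placeholder:

```yaml
extensions:
  sigv4auth:
    region: us-east-1
    service: aps

receivers:
  otlp:
    protocols:
      grpc:

exporters:
  prometheusremotewrite:
    endpoint: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-xxxx/api/v1/remote_write
    auth:
      authenticator: sigv4auth

service:
  extensions: [sigv4auth]
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]
```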