Cloud-Agnostic AI Observability Platform - Architecture
Overview
This document describes the architecture of a cloud-agnostic AI observability platform whose observability backend is built on AWS managed services. The platform provides unified monitoring, cost optimization, and operational insights for Large Language Model (LLM) workloads running across multiple cloud providers.
Architecture Diagram

Architecture Components
1. LLM Providers Layer (Multi-Cloud)
The platform supports monitoring LLM invocations across multiple providers:
Model Flexibility
The models listed below are the ones used in this demo. Because the platform uses LiteLLM as its AI gateway, you can substitute any LLM that LiteLLM supports by updating gateway/litellm-config.yaml with your preferred models. The observability pipeline works the same regardless of which models you choose.
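Swapping models only changes the gateway configuration, but each call still needs a CloudProvider dimension value. The following is a minimal sketch. It assumes the configured model strings carry LiteLLM's provider prefixes (`bedrock/`, `vertex_ai/`, `azure/`, `ollama/`) and derives the dimension value from that prefix. `PROVIDER_BY_PREFIX` and `cloud_provider_for` are hypothetical names, not part of LiteLLM.

```python
# Hypothetical mapping from LiteLLM provider prefixes to this platform's
# CloudProvider dimension values; extend it when you add a provider.
PROVIDER_BY_PREFIX = {
    "bedrock": "aws",
    "vertex_ai": "gcp",
    "azure": "azure",
    "ollama": "on-prem",
}


def cloud_provider_for(litellm_model: str) -> str:
    """Return the CloudProvider dimension value for a '<prefix>/<model>' string."""
    prefix, sep, _ = litellm_model.partition("/")
    if not sep:
        raise ValueError(f"expected '<provider>/<model>', got {litellm_model!r}")
    if prefix not in PROVIDER_BY_PREFIX:
        raise ValueError(f"no CloudProvider mapping for prefix {prefix!r}")
    return PROVIDER_BY_PREFIX[prefix]


print(cloud_provider_for("bedrock/anthropic.claude-3-haiku-20240307-v1:0"))  # aws
print(cloud_provider_for("ollama/mistral"))  # on-prem
```

Raising an error on an unmapped prefix prevents a newly added provider from silently reporting metrics under the wrong dimension.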
AWS Bedrock
- Models: Claude 3 Haiku, Claude 3 Sonnet
- Integration: AWS SDK (boto3)
- Metrics: Token usage, latency, request counts
- Dimension: CloudProvider=aws
Google Vertex AI
- Models: Gemini 1.5 Pro, Gemini 1.5 Flash
- Integration: Simulated in this demo (production would use the Google Cloud SDK)
- Metrics: Token usage, latency, request counts
- Dimension: CloudProvider=gcp
Azure OpenAI
- Models: GPT-4o, GPT-4o Mini
- Integration: Simulated in this demo (production would use the Azure SDK)
- Metrics: Token usage, latency, request counts
- Dimension: CloudProvider=azure
On-Premises (Ollama)
- Models: Llama 3.1 70B, Mistral 7B
- Integration: Simulated in this demo (production would use the Ollama API)
- Metrics: Token usage, latency, request counts
- Dimension: CloudProvider=on-prem
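Three of the four providers are simulated. The following is a minimal sketch of what such a stand-in could return, so that the rest of the pipeline sees the same fields as a real call: token counts and latency. `simulate_invocation` and `SimulatedResponse` are hypothetical names. The 4-characters-per-token estimate and the latency ranges are illustrative assumptions, not measured values.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulatedResponse:
    text: str
    input_tokens: int
    output_tokens: int
    latency_ms: float


def simulate_invocation(model: str, prompt: str, rng: random.Random) -> SimulatedResponse:
    """Fake an LLM call for a provider that is simulated in this demo (hypothetical helper)."""
    # Rough input estimate of ~4 characters per token; a real integration
    # would read the usage figures reported by the provider instead.
    input_tokens = max(1, len(prompt) // 4)
    output_tokens = rng.randint(20, 200)
    # Latency grows with output length: a fixed overhead plus a per-token cost.
    latency_ms = rng.uniform(100.0, 400.0) + output_tokens * rng.uniform(5.0, 20.0)
    return SimulatedResponse(f"[simulated reply from {model}]", input_tokens, output_tokens, latency_ms)


rng = random.Random(42)  # seeded so repeated runs produce the same numbers
resp = simulate_invocation("gemini-1.5-flash", "Summarize the quarterly report.", rng)
print(resp.input_tokens, resp.output_tokens, round(resp.latency_ms, 1))
```

Passing in a seeded `random.Random` rather than using the module-level generator keeps demo runs reproducible and lets each simulated provider use its own independent stream.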
2. Application Layer
Python Application
- Framework: OpenTelemetry SDK for instrumentation
- Language: Python 3.8+
- Responsibilities:
  - Invoke LLM APIs across providers
  - Collect telemetry (metrics, traces, logs)
  - Send data to the OpenTelemetry Collector
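The responsibilities above come down to wrapping each LLM call and recording what it cost. The application itself uses the OpenTelemetry SDK. This standard-library-only sketch substitutes a plain list for the OTel tracer and meter to show which fields each call captures: model, provider, tokens, latency, and error. `track_llm_call` and `LLMCallRecord` are hypothetical names.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class LLMCallRecord:
    """One telemetry record per LLM call (hypothetical schema)."""
    model: str
    cloud_provider: str
    input_tokens: int = 0
    output_tokens: int = 0
    latency_ms: float = 0.0
    error: Optional[str] = None


@contextmanager
def track_llm_call(model: str, cloud_provider: str, sink: list) -> Iterator[LLMCallRecord]:
    record = LLMCallRecord(model=model, cloud_provider=cloud_provider)
    start = time.perf_counter()
    try:
        yield record  # the caller copies token counts from the provider response
    except Exception as exc:
        record.error = type(exc).__name__  # tag the failure, then re-raise it
        raise
    finally:
        record.latency_ms = (time.perf_counter() - start) * 1000.0
        sink.append(record)  # stand-in for emitting a span and metrics via the OTel SDK


records = []
with track_llm_call("gpt-4o-mini", "azure", records) as rec:
    rec.input_tokens, rec.output_tokens = 12, 48  # values from the (simulated) response
print(records[0])
```

Recording the result in `finally` means that failed calls still contribute a latency sample, tagged with the exception type.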
OpenTelemetry Collector
- Protocol: OTLP (OpenTelemetry Protocol)
- Format: Cloud-agnostic, vendor-neutral
- Responsibilities:
  - Receive telemetry from the application
  - Transform and enrich data
  - Export to AWS services (CloudWatch and X-Ray)
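The document does not say which exporter carries metrics to CloudWatch; the collector-contrib `awsemf` exporter is a common choice. Whatever the route, each LLM call ends up as data points in the `AIObservability` namespace with the `Model` and `CloudProvider` dimensions described in the CloudWatch section. This sketch shapes one call as entries for CloudWatch's PutMetricData API. `build_metric_data` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional

NAMESPACE = "AIObservability"


def build_metric_data(model: str, cloud_provider: str, input_tokens: int,
                      output_tokens: int, latency_ms: float,
                      timestamp: Optional[datetime] = None) -> list:
    """Shape one LLM call as CloudWatch PutMetricData entries (hypothetical helper)."""
    ts = timestamp or datetime.now(timezone.utc)
    dimensions = [
        {"Name": "Model", "Value": model},
        {"Name": "CloudProvider", "Value": cloud_provider},
    ]

    def datum(name: str, value: float, unit: str) -> dict:
        # Every data point carries both dimensions so it can be sliced by model or provider.
        return {"MetricName": name, "Dimensions": dimensions,
                "Timestamp": ts, "Value": float(value), "Unit": unit}

    return [
        datum("InputTokens", input_tokens, "Count"),
        datum("OutputTokens", output_tokens, "Count"),
        datum("Latency", latency_ms, "Milliseconds"),
    ]


data = build_metric_data("claude-3-haiku", "aws", 120, 350, 842.5)
print([(d["MetricName"], d["Value"], d["Unit"]) for d in data])
```

Sending the entries would then be a single boto3 call, for example `boto3.client("cloudwatch", region_name="us-east-1").put_metric_data(Namespace=NAMESPACE, MetricData=data)`.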
3. AWS Observability Stack
Amazon CloudWatch
- Service Type: Managed metrics and monitoring
- Region: us-east-1
- Namespace: AIObservability
- Metrics:
  - InputTokens - Token count for prompts
  - OutputTokens - Token count for completions
  - Latency - Response time in milliseconds
- Dimensions:
  - Model - LLM model identifier
  - CloudProvider - Provider (aws, gcp, azure, on-prem)
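- Retention: 15 months, with older data points rolled up to coarser resolution (1-minute data for 15 days, 5-minute data for 63 days, 1-hour data for 15 months)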
- Cost: $0.30 per custom metric per month for the first 10,000 metrics, with lower rates at higher volumes (the free tier includes 10 custom metrics)
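CloudWatch bills each unique combination of metric name and dimension values as a separate custom metric. As a result, cost scales with dimension cardinality rather than with request volume. The following is a rough estimate under several assumptions:

- us-east-1 list prices for custom metrics, tiered after the first 10,000
- the 10-metric free tier deducted first
- every metric reporting for the full month

The model identifiers are hypothetical.

```python
from decimal import Decimal

# Assumed us-east-1 list prices for custom metrics (USD per metric-month),
# as (tier size, price); check current CloudWatch pricing before relying on them.
TIERS = [
    (10_000, Decimal("0.30")),
    (240_000, Decimal("0.10")),
    (750_000, Decimal("0.05")),
    (None, Decimal("0.02")),  # everything above 1,000,000 metrics
]
FREE_TIER_METRICS = 10


def count_custom_metrics(metric_names, dimension_combos) -> int:
    """Each metric name times each unique dimension-value combination is one billed metric."""
    return len(set(metric_names)) * len(set(dimension_combos))


def monthly_metric_cost(num_metrics: int) -> Decimal:
    """Tiered monthly cost, assuming every metric reports for the whole month."""
    remaining = max(0, num_metrics - FREE_TIER_METRICS)
    total = Decimal("0")
    for size, price in TIERS:
        billed = remaining if size is None else min(remaining, size)
        total += billed * price
        remaining -= billed
        if remaining == 0:
            break
    return total


# Hypothetical demo: 3 metric names x 8 (Model, CloudProvider) pairs.
combos = [("claude-3-haiku", "aws"), ("claude-3-sonnet", "aws"),
          ("gemini-1.5-pro", "gcp"), ("gemini-1.5-flash", "gcp"),
          ("gpt-4o", "azure"), ("gpt-4o-mini", "azure"),
          ("llama3.1-70b", "on-prem"), ("mistral-7b", "on-prem")]
n = count_custom_metrics(["InputTokens", "OutputTokens", "Latency"], combos)
print(n, monthly_metric_cost(n))  # 24 4.20
```

Adding a high-cardinality dimension such as a user or request ID would multiply the metric count accordingly. For this reason, per-request detail is usually better kept in traces than in metric dimensions.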
AWS X-Ray
- Service Type: Distributed tracing
- Region: us-east-1
- Responsibilities:
- Track request flow across services
- Identify performance bottlenecks
- Visualize service dependencies
- Trace Format: X-Ray segment documents
- Retention: 30 days
- Cost: $5.00 per 1 million traces recorded
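X-Ray segment documents identify a trace with an ID of the form `1-<8 hex digits>-<24 hex digits>`, where the 8-digit field is the request's start time in epoch seconds. OpenTelemetry trace IDs, by contrast, are a single 32-digit hex string. Suppose the application generates OTel trace IDs whose first 8 hex digits already encode the start time, as AWS's X-Ray ID generator for OpenTelemetry does. Converting one form to the other is then a simple split, as in this sketch. The helper names are hypothetical.

```python
import re
import secrets
import time
from typing import Optional

# X-Ray trace ID: version "1", 8 hex digits of epoch seconds, 24 random hex digits.
XRAY_TRACE_ID = re.compile(r"1-[0-9a-f]{8}-[0-9a-f]{24}")
OTEL_TRACE_ID = re.compile(r"[0-9a-f]{32}")


def new_xray_trace_id(now: Optional[float] = None) -> str:
    """Generate an X-Ray-style trace ID whose middle field is the given (or current) epoch time."""
    epoch = int(time.time() if now is None else now)
    return f"1-{epoch:08x}-{secrets.token_hex(12)}"  # token_hex(12) -> 24 hex digits


def otel_to_xray_trace_id(otel_trace_id: str) -> str:
    """Split a 32-hex-digit OpenTelemetry trace ID into X-Ray's '1-<8>-<24>' form."""
    if not OTEL_TRACE_ID.fullmatch(otel_trace_id):
        raise ValueError(f"not a 32-digit lowercase hex trace ID: {otel_trace_id!r}")
    return f"1-{otel_trace_id[:8]}-{otel_trace_id[8:]}"


print(otel_to_xray_trace_id("5759e988bd862e3fe1be46a994272793"))
# 1-5759e988-bd862e3fe1be46a994272793
```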