Creating Custom Dashboards for GenAI Telemetry
Why Custom Dashboards?
When you enable Bedrock Model Invocation Logging and deploy the ADOT auto-instrumentation agent, AWS gives you a head start with out-of-the-box dashboards. Bedrock automatically provides invocation count, latency, token counts, and throttle metrics. Application Signals auto-generates service maps and SLO views. That's a solid foundation — but it's not the whole picture.
The out-of-the-box dashboards answer "is my AI healthy right now?" They don't answer the questions your DevOps, FinOps, and security teams actually ask:
- Which caller is burning through 80% of our Bedrock budget?
- Why did the completion rate drop after the 3 PM deployment?
- Is cross-region inference actually helping, or adding latency?
- Which prompts would benefit most from caching?
- Who made that model call that returned PII, and what did they ask?
- Is my agent failing at the tool layer or the model layer?
Answering these requires custom queries that join log groups, compute cost from tokens, segment by IAM role, and drill into span trees. The raw telemetry is already flowing — the value comes from how you slice it.
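As a rough illustration of "compute cost from tokens" and "segment by IAM role", the sketch below prices each invocation and totals spend per caller. The price table holds placeholder values, not real Bedrock pricing, and the record keys (caller, modelId, input_tokens, output_tokens) are hypothetical names chosen for this example; map them to whatever fields your log query returns.

```python
from collections import defaultdict

# Placeholder prices in USD per 1,000 tokens. These are NOT real Bedrock
# prices; replace them with the current rates for the models you use.
PRICE_PER_1K = {
    "example-model": {"input": 0.003, "output": 0.015},
}


def invocation_cost(model_id, input_tokens, output_tokens, prices=PRICE_PER_1K):
    """Return the estimated cost in USD of a single invocation."""
    price = prices[model_id]
    return (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]


def cost_by_caller(records, prices=PRICE_PER_1K):
    """Sum estimated cost per caller.

    Each record is a dict with the hypothetical keys
    caller, modelId, input_tokens, and output_tokens.
    """
    totals = defaultdict(float)
    for rec in records:
        totals[rec["caller"]] += invocation_cost(
            rec["modelId"], rec["input_tokens"], rec["output_tokens"], prices
        )
    return dict(totals)
```

Keeping the price table as a parameter makes it easy to swap in real rates or to compare scenarios without touching the aggregation logic.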
One Pipeline, Different Audiences
Your GenAI telemetry lands in three log groups: bedrock-model-invocation-logging, aws/spans, and /aws/bedrock-agentcore/runtimes/<agent>. The data doesn't change, but how you present it does. The same invocation data becomes:
- A DevOps dashboard showing completion rate, component latency, and agent error drill-down — focused on "is the system working?"
- A FinOps dashboard showing cost per model, top spenders, and caching opportunities — focused on "are we spending efficiently?"
This guide gives you the queries to build both. Pick the sections relevant to your audience. Each query notes its source log group, view type, query language, and what question it answers.
For an overview of the underlying data pipelines and when to enable each, see GenAI Observability on AWS.
DevOps Persona Dashboard
DevOps teams need to answer: is my GenAI workload healthy, and where are the bottlenecks? These queries focus on invocation health, agent workflow reliability, and performance bottlenecks.

Model Invocation Health
1. Stop Reason Breakdown by Model
- Purpose: Shows the distribution of all stop reasons across models. Every Bedrock invocation ends with a stop reason: end_turn (natural completion), tool_use (calling a tool), max_tokens (truncated), stop_sequence (hit a boundary), or an error. Example: you might discover that 15% of your summarization model's calls end with max_tokens, meaning users are getting cut-off responses, while your classification model is 100% end_turn.
- Source: bedrock-model-invocation-logging
- View: Bar chart
- Query Language: CloudWatch Logs Insights
- Query:
fields @timestamp, modelId, operation, requestId,
    output.outputBodyJson.stopReason as stop_reason
| filter schemaType = "ModelInvocationLog"
| filter ispresent(output.outputBodyJson.stopReason)
    or ispresent(output.outputBodyJson.error)
| stats count() as stop_reason_count by stop_reason, modelId
- Alarm: Any non-healthy stop reason (not end_turn, tool_use, or stop_sequence) exceeding 10% of a model's total invocations.
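The alarm condition can be checked offline against the query's result rows. The following is a minimal sketch, assuming the rows have already been fetched and flattened into dicts keyed by modelId, stop_reason, and stop_reason_count (the names used in the query). Labelling rows with no stop reason as "error" is an assumption of this example, not something the query does.

```python
from collections import defaultdict

# Stop reasons the alarm rule treats as healthy.
HEALTHY_STOP_REASONS = {"end_turn", "tool_use", "stop_sequence"}


def unhealthy_shares(rows):
    """Return {modelId: {stop_reason: share}} for non-healthy stop reasons.

    Each row is a dict with modelId, stop_reason, and stop_reason_count.
    Rows without a stop reason are labelled "error" (an assumption here).
    """
    totals = defaultdict(int)
    unhealthy = defaultdict(lambda: defaultdict(int))
    for row in rows:
        model = row["modelId"]
        reason = row.get("stop_reason") or "error"
        count = int(row["stop_reason_count"])
        totals[model] += count
        if reason not in HEALTHY_STOP_REASONS:
            unhealthy[model][reason] += count
    return {
        model: {reason: n / totals[model] for reason, n in reasons.items()}
        for model, reasons in unhealthy.items()
    }


def alarm_breaches(rows, threshold=0.10):
    """List (modelId, stop_reason, share) tuples whose share exceeds the threshold."""
    return sorted(
        (model, reason, share)
        for model, reasons in unhealthy_shares(rows).items()
        for reason, share in reasons.items()
        if share > threshold
    )
```

Counts are converted with int() because query results commonly arrive as strings; the share is computed against each model's total so that a low-volume model is not masked by a high-volume one.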
2. Completion Rate vs Truncation (hourly)
- Purpose: Tracks hourly counts of successful completions (end_turn + tool_use) against truncated responses (max_tokens), from which you can read the completion rate. This is your SLA metric: target a 95%+ completion rate. Example: if the completion rate drops from 97% to 88% between 3 PM and 4 PM, something changed. A new prompt template, a model update, or a configuration change is causing more truncation.
- Source: bedrock-model-invocation-logging
- View: Time series (stacked)
- Query Language: CloudWatch Logs Insights
- Query:
fields @timestamp, modelId,
    output.outputBodyJson.stopReason as stop_reason
| filter schemaType = "ModelInvocationLog"
| filter ispresent(output.outputBodyJson.stopReason)
| stats sum(stop_reason = "end_turn" or stop_reason = "tool_use") as ok,
    sum(stop_reason = "max_tokens") as truncated
    by bin(@timestamp, 1h) as hour
| sort hour desc
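The query returns raw ok and truncated counts per hour rather than a percentage. A minimal sketch of turning those counts into a completion rate and flagging hours below the 95% target follows, assuming the result rows are available as dicts keyed by hour, ok, and truncated. The rate here is computed over ok + truncated only, which is one reasonable reading of the metric; other stop reasons are not part of the denominator.

```python
def completion_rate(ok, truncated):
    """Return ok / (ok + truncated), or None when there were no invocations."""
    total = ok + truncated
    if total == 0:
        return None
    return ok / total


def hours_below_target(rows, target=0.95):
    """Return (hour, rate) pairs for hours whose completion rate is under target.

    Each row is a dict with hour, ok, and truncated, matching the query aliases.
    Hours with no invocations are skipped rather than flagged.
    """
    flagged = []
    for row in rows:
        rate = completion_rate(int(row["ok"]), int(row["truncated"]))
        if rate is not None and rate < target:
            flagged.append((row["hour"], round(rate, 4)))
    return flagged
```

Skipping empty hours avoids false alarms during quiet periods; if silence itself is a problem for your workload, treat a None rate as its own alert condition.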