
Creating Custom Dashboards for GenAI Telemetry

Why Custom Dashboards?

When you enable Bedrock Model Invocation Logging and deploy the ADOT auto-instrumentation agent, AWS gives you a head start with out-of-the-box dashboards. Bedrock automatically provides invocation count, latency, token counts, and throttle metrics. Application Signals auto-generates service maps and SLO views. That's a solid foundation — but it's not the whole picture.

The out-of-the-box dashboards answer "is my AI healthy right now?" They don't answer the questions your DevOps, FinOps, and security teams actually ask:

  • Which caller is burning through 80% of our Bedrock budget?
  • Why did the completion rate drop after the 3 PM deployment?
  • Is cross-region inference actually helping, or adding latency?
  • Which prompts would benefit most from caching?
  • Who made that model call that returned PII, and what did they ask?
  • Is my agent failing at the tool layer or the model layer?

Answering these requires custom queries that join log groups, compute cost from tokens, segment by IAM role, and drill into span trees. The raw telemetry is already flowing — the value comes from how you slice it.

One Pipeline, Different Audiences

Your GenAI telemetry lands in three log groups: bedrock-model-invocation-logging, aws/spans, and /aws/bedrock-agentcore/runtimes/<agent>. The data doesn't change, but how you present it does. The same invocation data becomes:

  • A DevOps dashboard showing completion rate, component latency, and agent error drill-down — focused on "is the system working?"
  • A FinOps dashboard showing cost per model, top spenders, and caching opportunities — focused on "are we spending efficiently?"

This guide gives you the queries to build both. Pick the sections relevant to your audience. Each query notes its source log group, view type, query language, and what question it answers.
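One way to turn a query from this guide into a dashboard widget is to wrap it in a CloudWatch dashboard body. The sketch below is a minimal illustration, not the guide's own tooling: the helper name log_widget, the region, and the widget size are assumptions, and the log group name is used as written above (yours may differ).

```python
import json

def log_widget(title, log_group, query, view, region="us-east-1", stacked=False):
    """Build one CloudWatch dashboard widget backed by a Logs Insights query."""
    return {
        "type": "log",
        "width": 12,   # assumption: half-width widget
        "height": 6,
        "properties": {
            "title": title,
            "region": region,    # assumption: change to the region you log in
            "view": view,        # "table", "timeSeries", "bar", or "pie"
            "stacked": stacked,
            # The SOURCE clause binds the query to its log group.
            "query": f"SOURCE '{log_group}' | {query}",
        },
    }

widget = log_widget(
    title="Invocation Count (15-min windows)",
    log_group="bedrock-model-invocation-logging",
    query="stats count(*) as invocations by bin(15m) as period | sort period asc",
    view="timeSeries",
)
dashboard_body = json.dumps({"widgets": [widget]}, indent=2)
print(dashboard_body)
```

The resulting dashboard_body string is the kind of value you would pass as the body of a PutDashboard call, for example through boto3's put_dashboard, under a dashboard name of your choice.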

For an overview of the underlying data pipelines and when to enable each, see GenAI Observability on AWS.


DevOps Persona Dashboard

DevOps teams need to answer: is my GenAI workload healthy, and where are the bottlenecks? These queries focus on invocation health, agent workflow reliability, and performance bottlenecks.

[Figure: GenAI DevOps Dashboard]

Model Invocation Health

1. Stop Reason Breakdown by Model

  • Purpose: Shows the distribution of ALL stop reasons across models. Every Bedrock invocation ends with a stop reason — end_turn (natural completion), tool_use (calling a tool), max_tokens (truncated), stop_sequence (hit a boundary), or an error. Example: you might discover that 15% of your summarization model's calls end with max_tokens — meaning users are getting cut-off responses — while your classification model is 100% end_turn.
  • Source: bedrock-model-invocation-logging
  • View: Bar chart
  • Query Language: CloudWatch Logs Insights
  • Query:
fields @timestamp, modelId, operation, requestId,
output.outputBodyJson.stopReason as stop_reason
| filter schemaType = "ModelInvocationLog"
| filter ispresent(output.outputBodyJson.stopReason)
or ispresent(output.outputBodyJson.error)
| stats count() as stop_reason_count by stop_reason, modelId
  • Alarm: Any non-healthy stop reason (not end_turn, tool_use, or stop_sequence) exceeding 10% of a model's total invocations.
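The alarm condition above can be checked against the query output with a few lines of Python. This is a sketch under assumptions: the rows are hypothetical and shaped as dictionaries keyed by the query's column names (stop_reason, modelId, stop_reason_count), and the model names are placeholders.

```python
from collections import defaultdict

# Stop reasons the alarm treats as healthy.
HEALTHY = {"end_turn", "tool_use", "stop_sequence"}

def unhealthy_share(rows):
    """Return {modelId: fraction of invocations with a non-healthy stop reason}."""
    total = defaultdict(int)
    unhealthy = defaultdict(int)
    for row in rows:
        count = int(row["stop_reason_count"])
        total[row["modelId"]] += count
        # Rows with a missing stop reason (errors) count as unhealthy.
        if row.get("stop_reason") not in HEALTHY:
            unhealthy[row["modelId"]] += count
    return {model: unhealthy[model] / total[model] for model in total}

# Hypothetical rows mirroring the example in the Purpose text.
rows = [
    {"modelId": "summarizer", "stop_reason": "end_turn", "stop_reason_count": 85},
    {"modelId": "summarizer", "stop_reason": "max_tokens", "stop_reason_count": 15},
    {"modelId": "classifier", "stop_reason": "end_turn", "stop_reason_count": 40},
]
shares = unhealthy_share(rows)
breaching = [model for model, share in shares.items() if share > 0.10]
print(breaching)
```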

2. Completion Rate vs Truncation (hourly)

  • Purpose: Tracks the hourly ratio of successful completions (end_turn + tool_use) vs truncated responses (max_tokens). This is your SLA metric — target 95%+ completion rate. Example: if the completion rate drops from 97% to 88% between 3 PM and 4 PM, something changed — a new prompt template, a model update, or a configuration change is causing more truncation.
  • Source: bedrock-model-invocation-logging
  • View: Time series (stacked)
  • Query Language: CloudWatch Logs Insights
  • Query:
fields @timestamp, modelId,
output.outputBodyJson.stopReason as stop_reason
| filter schemaType = "ModelInvocationLog"
| filter ispresent(output.outputBodyJson.stopReason)
| stats sum(stop_reason = "end_turn" or stop_reason = "tool_use") as ok,
sum(stop_reason = "max_tokens") as truncated
by bin(1h) as hour
| sort hour desc
  • Alarm: ok / (ok + truncated) below 95% for 2 consecutive hours.
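The "2 consecutive hours" condition is easy to get subtly wrong, so here is a minimal sketch of the evaluation logic. It assumes you have the hourly (ok, truncated) pairs in chronological order; the function names are illustrative, and treating an hour with no traffic as healthy is a design assumption rather than something this guide prescribes.

```python
def completion_rate(ok, truncated):
    """ok / (ok + truncated); an hour with no traffic is treated as healthy."""
    total = ok + truncated
    return ok / total if total else 1.0

def should_alarm(hourly, threshold=0.95, consecutive=2):
    """hourly: list of (ok, truncated) pairs, oldest first."""
    streak = 0
    for ok, truncated in hourly:
        if completion_rate(ok, truncated) < threshold:
            streak += 1
            if streak >= consecutive:
                return True
        else:
            streak = 0  # a healthy hour resets the run
    return False

# Hypothetical data: one healthy hour followed by two degraded hours.
print(should_alarm([(97, 3), (88, 12), (90, 10)]))
```

A single bad hour surrounded by healthy ones does not trigger the alarm, which keeps a one-off burst of long prompts from paging anyone.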

3. Token Efficiency — Find Wasted Tokens

  • Purpose: Finds callers sending high input tokens (more than 2000) but receiving low output (under 200) — a sign of token waste. Example: a classification pipeline sending entire product catalogs (8000 tokens) to get a one-word label (3 tokens). The caller_arn column tells you exactly which service or role is responsible, so you can have a targeted conversation about restructuring their prompts.
  • Source: bedrock-model-invocation-logging
  • View: Table
  • Query Language: CloudWatch Logs Insights
  • Query:
fields @timestamp, modelId, operation,
input.inputTokenCount as input_tokens,
output.outputTokenCount as output_tokens,
identity.arn as caller_arn
| filter schemaType = "ModelInvocationLog"
| filter input_tokens > 2000 and output_tokens < 200
| stats count() as inefficient_requests,
avg(input_tokens) as avg_input_tokens,
avg(output_tokens) as avg_output_tokens,
sum(input_tokens) as total_wasted_tokens
by modelId, operation, caller_arn
| sort total_wasted_tokens desc
  • Alarm: Any caller with total_wasted_tokens above 100K in 24h.

4. Cross-Region Inference Latency

  • Purpose: Compares latency percentiles across inference regions for each model. If you've enabled cross-region inference, some requests route to distant regions with higher latency. Example: your summarization model's P95 is 12s in us-west-2 but 4s in us-east-1 — configuring your inference profile to prefer us-east-1 can reduce P95 by 40%.
  • Source: bedrock-model-invocation-logging
  • View: Table
  • Query Language: CloudWatch Logs Insights
  • Query:
fields @timestamp, modelId, region, inferenceRegion,
output.outputBodyJson.metrics.latencyMs as latency
| filter schemaType = "ModelInvocationLog"
| filter ispresent(inferenceRegion)
| filter latency > 0
| stats count() as invocations,
avg(latency) as avg_latency,
pct(latency, 50) as p50_latency,
pct(latency, 95) as p95_latency,
pct(latency, 99) as p99_latency,
stddev(latency) as latency_stddev
by modelId, region, inferenceRegion
| sort modelId asc, avg_latency asc
  • Alarm: Any model P95 above 10 seconds in a specific region.

5. Prompt Caching Opportunities

  • Purpose: Finds prompts that are called repeatedly but have zero or low cache hits — the biggest caching ROI opportunities. Example: a system prompt used 500 times with zero cache reads means you're paying full price every time — enabling caching could save 90% on those input tokens.
  • Source: bedrock-model-invocation-logging
  • View: Table
  • Query Language: CloudWatch Logs Insights
  • Query:
fields @timestamp,
input.inputBodyJson.messages.0.content.0.text as promptText,
input.inputTokenCount as inputTokens,
input.cacheReadInputTokenCount as cacheReadTokens,
input.cacheWriteInputTokenCount as cacheWriteTokens,
modelId
| filter input.inputTokenCount > 0
| stats sum(input.inputTokenCount) as totalInputTokens,
count(*) as invocationCount,
avg(input.inputTokenCount) as avgInputTokens,
sum(input.cacheReadInputTokenCount) as totalCacheReadTokens,
sum(input.cacheWriteInputTokenCount) as totalCacheWriteTokens
by promptText, modelId
| filter invocationCount > 1
| sort totalInputTokens desc
  • Alarm: None (optimization review, run weekly).
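To prioritize the weekly review, you can rank the query's rows by a rough savings estimate. The sketch below is an assumption-heavy illustration: the rows are hypothetical, the 0.90 discount is the figure quoted in the Purpose text treated as an upper bound, and the per-token price is passed in by you. Real savings depend on how much of each prompt is a cacheable prefix and on cache write costs.

```python
def caching_candidates(rows, price_per_input_token, assumed_discount=0.90):
    """Rank repeated prompts with zero cache reads by estimated savings (USD)."""
    candidates = []
    for row in rows:
        repeated = row["invocationCount"] > 1
        never_cached = row.get("totalCacheReadTokens", 0) == 0
        if repeated and never_cached:
            full_price = row["totalInputTokens"] * price_per_input_token
            candidates.append((row["promptText"], full_price * assumed_discount))
    # Largest estimated saving first.
    return sorted(candidates, key=lambda item: item[1], reverse=True)

# Hypothetical rows keyed by the query's column names.
rows = [
    {"promptText": "You are a support assistant...", "invocationCount": 500,
     "totalInputTokens": 1_000_000, "totalCacheReadTokens": 0},
    {"promptText": "Classify this ticket...", "invocationCount": 40,
     "totalInputTokens": 200_000, "totalCacheReadTokens": 150_000},
]
ranked = caching_candidates(rows, price_per_input_token=0.000003)
for prompt, saving in ranked:
    print(f"{prompt[:30]}  estimated saving ${saving:.2f}")
```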

Agent Workflow Health

6. Agent Traces vs Errors (hourly)

  • Purpose: Hourly count of total agent traces alongside error spans — your agent-level reliability metric. Example: if total_traces is 500/hour but error_spans jumps from 5 to 80 at 3 PM, something broke in the agent workflow. This catches problems that model-level metrics miss — the model can succeed while the agent fails due to tool timeouts or guardrail rejections.
  • Source: aws/spans
  • View: Time series
  • Query Language: CloudWatch Logs Insights
  • Query:
fields attributes.session.id as sessionId, traceId,
status.code as statusCode, durationNano/1000000 as durationMs
| filter ispresent(sessionId)
| stats count_distinct(traceId) as total_traces,
sum(statusCode = "ERROR") as error_spans
by bin(1h) as hour
| sort hour desc
  • Alarm: error_spans / total_traces above 10% for 15 minutes.

7. Span Error Drill-Down

  • Purpose: When you know there are agent errors, this tells you exactly WHICH component is failing — knowledge base retrieval, guardrail check, tool execution, or model invocation. Example: 70% of errors are in the KB retrieval span with HTTP 503 — your OpenSearch cluster is throttling under load, not a model problem.
  • Source: aws/spans
  • View: Table
  • Query Language: CloudWatch Logs Insights
  • Query:
fields name as spanName,
resource.attributes.service.name as serviceName,
status.code as statusCode,
status.message as statusMessage,
attributes.http.response.status_code as httpStatus,
durationNano/1000000 as durationMs,
traceId, spanId, parentSpanId
| filter resource.attributes.aws.service.type = "gen_ai_agent"
| filter status.code = "ERROR"
or attributes.http.response.status_code >= 400
| stats count() as error_count,
count_distinct(traceId) as affected_traces,
avg(durationMs) as avg_error_duration_ms,
earliest(statusMessage) as error_message
by spanName, serviceName, httpStatus
| sort error_count desc
  • Alarm: Any component with more than 10 errors in 15 minutes.

8. Component Performance Breakdown (hourly)

  • Purpose: Hourly performance per agent component with full percentile distributions (P50, P95, P99). Shows where agent time is spent and which component is the bottleneck. Example: the guardrail check averages 2.8s (P95: 4.1s) while the model call averages 1.2s (P95: 2.0s) — optimize the guardrail first, it has more impact than any model optimization.
  • Source: aws/spans
  • View: Table
  • Query Language: CloudWatch Logs Insights
  • Query:
fields name as spanName,
resource.attributes.service.name as serviceName,
durationNano/1000000 as durationMs,
traceId
| filter resource.attributes.aws.service.type = "gen_ai_agent"
| filter ispresent(spanName)
| stats count() as invocations,
avg(durationMs) as avg_duration_ms,
pct(durationMs, 50) as p50_duration_ms,
pct(durationMs, 95) as p95_duration_ms,
pct(durationMs, 99) as p99_duration_ms,
sum(durationMs) as total_time_ms
by bin(1h), spanName, serviceName
| sort total_time_ms desc
  • Alarm: Any component P95 above 5000ms.

FinOps Persona Dashboard

FinOps teams need to answer: where is our GenAI spend going, and how do we optimize it? These queries compute cost from token usage, attribute spend to teams and roles, and surface optimization opportunities like prompt caching.

[Figure: GenAI FinOps Dashboard]

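The cost queries in this section share a calculation pattern based on per-token pricing. strcontains returns 1 when the model ID contains the given substring and 0 otherwise, so multiplying each result by a rate and summing the terms selects the per-token rate of the matching model. A model that matches none of the substrings contributes zero cost, so add a term for every model you use. Update the pricing values when Bedrock pricing changes.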
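To see what the strcontains pattern computes, here is the same arithmetic mirrored in Python. It is a sketch for reasoning about the queries, not a replacement for them: the function names are illustrative, only three of the rates from Query 9 are included, and the model ID in the example is an assumed inference profile ID.

```python
# Per-token rates copied from Query 9 (subset).
INPUT_RATES = {"nova-micro": 0.000000035, "nova-lite": 0.00000006, "claude-haiku": 0.000001}
OUTPUT_RATES = {"nova-micro": 0.00000014, "nova-lite": 0.00000024, "claude-haiku": 0.000005}

def strcontains(text, needle):
    """Mimic Logs Insights strcontains: 1 on a substring match, otherwise 0."""
    return 1 if needle in text else 0

def estimate_cost(model_id, input_tokens, output_tokens):
    """Sum of (match * rate) picks the rate of whichever substring matches."""
    input_rate = sum(strcontains(model_id, key) * rate for key, rate in INPUT_RATES.items())
    output_rate = sum(strcontains(model_id, key) * rate for key, rate in OUTPUT_RATES.items())
    return input_tokens * input_rate + output_tokens * output_rate

cost = estimate_cost("us.amazon.nova-lite-v1:0", input_tokens=10_000, output_tokens=2_000)
print(f"${cost:.5f}")

# A model with no matching substring silently costs nothing.
print(estimate_cost("some-unlisted-model", 10_000, 2_000))
```

Two properties follow from the arithmetic. Unlisted models are reported as zero spend rather than as an error, and substrings must not overlap: if a model ID matched two keys, both rates would be added together.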

Executive Summary

9. Total Estimated Spend

  • Purpose: Single-value widget showing total GenAI spend across all models for the selected time range. This is your headline KPI — the number the CFO cares about.
  • Source: bedrock-model-invocation-logging
  • View: Single value
  • Query Language: CloudWatch Logs Insights
  • Query:
fields coalesce(output.outputBodyJson.usage.inputTokens,
output.outputBodyJson.usage.prompt_tokens,
output.outputBodyJson.usage.input_tokens,
input.inputTokenCount) as inputTokens,
coalesce(output.outputBodyJson.usage.outputTokens,
output.outputBodyJson.usage.completion_tokens,
output.outputBodyJson.usage.output_tokens,
output.outputTokenCount) as outputTokens
| fields (inputTokens *
((strcontains(modelId, "nova-micro") * 0.000000035) +
(strcontains(modelId, "nova-lite") * 0.00000006) +
(strcontains(modelId, "nova-pro") * 0.0000008) +
(strcontains(modelId, "claude-sonnet-4-6") * 0.000003) +
(strcontains(modelId, "claude-sonnet-4-5") * 0.000003) +
(strcontains(modelId, "claude-haiku") * 0.000001) +
(strcontains(modelId, "llama4-maverick") * 0.0000002) +
(strcontains(modelId, "llama4-scout") * 0.00000015) +
(strcontains(modelId, "command-r-plus") * 0.0000025) +
(strcontains(modelId, "command-r-v") * 0.00000015) +
(strcontains(modelId, "gpt-oss-120b") * 0.00000009) +
(strcontains(modelId, "gpt-oss-20b") * 0.00000004))) +
(outputTokens *
((strcontains(modelId, "nova-micro") * 0.00000014) +
(strcontains(modelId, "nova-lite") * 0.00000024) +
(strcontains(modelId, "nova-pro") * 0.0000032) +
(strcontains(modelId, "claude-sonnet-4-6") * 0.000015) +
(strcontains(modelId, "claude-sonnet-4-5") * 0.000015) +
(strcontains(modelId, "claude-haiku") * 0.000005) +
(strcontains(modelId, "llama4-maverick") * 0.0000002) +
(strcontains(modelId, "llama4-scout") * 0.00000015) +
(strcontains(modelId, "command-r-plus") * 0.00001) +
(strcontains(modelId, "command-r-v") * 0.0000006) +
(strcontains(modelId, "gpt-oss-120b") * 0.00000045) +
(strcontains(modelId, "gpt-oss-20b") * 0.0000002))) as totalCostUSD
| stats sum(totalCostUSD) as TotalSpendUSD
  • Alarm: Daily spend exceeds 150% of 7-day average.
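The daily spike condition can be sketched as a comparison against a trailing baseline. This assumes you collect one TotalSpendUSD value per day by running the query over daily time ranges; the function name and the sample figures are hypothetical, and excluding today from the 7-day average is a design choice.

```python
from statistics import mean

def spend_spike(previous_days, today, factor=1.5):
    """True when today's spend exceeds factor x the average of previous days."""
    baseline = mean(previous_days)
    return today > factor * baseline

# Hypothetical daily spend in USD for the previous 7 days (average is 11).
last_week = [10, 12, 11, 9, 10, 13, 12]
print(spend_spike(last_week, today=17))  # above 16.5
print(spend_spike(last_week, today=16))  # below 16.5
```

Leaving today out of the baseline keeps a large spike from raising its own threshold.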

Cost Analysis

10. Cost Distribution by Model

  • Purpose: Pie chart showing which models account for your spend. Example: you discover Claude Sonnet 4.6 is 70% of your bill while Nova Lite is 5% — a prompt migration opportunity if some use cases could move to Nova.
  • Source: bedrock-model-invocation-logging
  • View: Pie
  • Query Language: CloudWatch Logs Insights
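  • Query (reuse the cost calculation from Query 9, replacing its final stats line with the lines below; modelName is the shortened model ID produced by the replace() chain in Query 16, or group by modelId instead):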
| stats sum(totalCostUSD) as costUSD by modelName
| sort costUSD desc
  • Alarm: None (informational).

11. Top 10 Spenders by Role/User

  • Purpose: Identifies which IAM roles or users are driving spend. Combined with invocation count and cost-per-call, you can see whether a team is spending more because of volume or because their calls are more expensive. Example: the data-science-exploration role has 100K invocations at $0.002 each while prod-chatbot has 10K at $0.05 each — very different optimization paths.
  • Source: bedrock-model-invocation-logging
  • View: Table
  • Query Language: CloudWatch Logs Insights
  • Query:
fields replace(`identity.arn`, "arn:aws:sts::ACCOUNT_ID:assumed-role/", "") as userRole
| fields coalesce(output.outputBodyJson.usage.inputTokens,
output.outputBodyJson.usage.prompt_tokens,
output.outputBodyJson.usage.input_tokens,
input.inputTokenCount) as inputTokens,
coalesce(output.outputBodyJson.usage.outputTokens,
output.outputBodyJson.usage.completion_tokens,
output.outputBodyJson.usage.output_tokens,
output.outputTokenCount) as outputTokens
| fields (inputTokens *
((strcontains(modelId, "nova-micro") * 0.000000035) +
(strcontains(modelId, "nova-lite") * 0.00000006) +
(strcontains(modelId, "nova-pro") * 0.0000008) +
(strcontains(modelId, "claude-sonnet-4-6") * 0.000003) +
(strcontains(modelId, "claude-sonnet-4-5") * 0.000003) +
(strcontains(modelId, "claude-haiku") * 0.000001))) +
(outputTokens *
((strcontains(modelId, "nova-micro") * 0.00000014) +
(strcontains(modelId, "nova-lite") * 0.00000024) +
(strcontains(modelId, "nova-pro") * 0.0000032) +
(strcontains(modelId, "claude-sonnet-4-6") * 0.000015) +
(strcontains(modelId, "claude-sonnet-4-5") * 0.000015) +
(strcontains(modelId, "claude-haiku") * 0.000005))) as totalCostUSD
| stats sum(totalCostUSD) as spend,
count(*) as invocations,
(sum(totalCostUSD) / count(*)) as costPerCall
by userRole
| sort spend desc
| limit 10
  • Alarm: Any role's daily spend exceeding 2x its 7-day average.
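Working through the example figures shows why cost per call matters as much as volume. The snippet below only redoes the arithmetic from the Purpose text; the role names and numbers come from that example, and the variable names are illustrative.

```python
# (invocations, average cost per call in USD) per role, from the example above.
roles = {
    "data-science-exploration": (100_000, 0.002),
    "prod-chatbot": (10_000, 0.05),
}

# Total spend is volume multiplied by cost per call.
spend = {role: calls * cost for role, (calls, cost) in roles.items()}
top_spender = max(spend, key=spend.get)

for role, total in spend.items():
    print(f"{role}: about ${total:,.0f}")
print("top spender:", top_spender)
```

The role with one tenth of the volume spends roughly 2.5 times as much, so the first role is a candidate for reducing call volume while the second is a candidate for cheaper calls (a smaller model, shorter prompts, or caching).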

12. Input vs Output Cost Split (hourly)

  • Purpose: Shows whether you're spending more on input tokens (prompts) or output tokens (completions). If input cost dominates, optimize prompts and enable caching. If output cost dominates, reduce max_tokens or switch to a cheaper model.
  • Source: bedrock-model-invocation-logging
  • View: Bar (stacked)
  • Query Language: CloudWatch Logs Insights
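  • Query (reuse the cost calculation from Query 9, but compute the input-token half and the output-token half of the expression as separate fields named inputCostUSD and outputCostUSD, then replace the final stats line with the lines below):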
| stats sum(inputCostUSD) as InputCost, sum(outputCostUSD) as OutputCost
by bin(1h) as hour
| sort hour asc
  • Alarm: None (analysis widget).

Token Consumption

13. Invocation Count (15-min windows)

  • Purpose: Traffic volume baseline in 15-minute windows. If invocations are normally 2-4 per window but suddenly spike to 10, something changed — a new feature launch, a load test, or a runaway retry loop. Compare with the hourly cost trend to see if cost spikes correlate with volume spikes or with model choice changes.
  • Source: bedrock-model-invocation-logging
  • View: Time series
  • Query Language: CloudWatch Logs Insights
  • Query:
stats count(*) as invocations by bin(15m) as period
| sort period asc
  • Alarm: Invocations exceeding 3x the normal 15-minute average for 2 consecutive periods.

14. Input vs Output Tokens

  • Purpose: Shows input vs output token consumption in 5-minute windows. The ratio reveals your workload profile. Example: if input tokens are consistently 10x output tokens, you're sending large context (RAG, system prompts) for short responses — a prime candidate for prompt caching. If output tokens suddenly spike, a model update or prompt change may be generating longer responses.
  • Source: bedrock-model-invocation-logging
  • View: Bar (stacked)
  • Query Language: CloudWatch Logs Insights
  • Query:
fields
coalesce(output.outputBodyJson.usage.inputTokens,
output.outputBodyJson.usage.prompt_tokens,
output.outputBodyJson.usage.input_tokens,
input.inputTokenCount) as inputTokens,
coalesce(output.outputBodyJson.usage.outputTokens,
output.outputBodyJson.usage.completion_tokens,
output.outputBodyJson.usage.output_tokens,
output.outputTokenCount) as outputTokens
| stats sum(inputTokens) as totalInputTokens,
sum(outputTokens) as totalOutputTokens
by bin(5m) as period
| sort period asc
  • Alarm: Input-to-output ratio exceeding 20:1 sustained for 1 hour — investigate prompt optimization.

15. Total Token Count

  • Purpose: Combined (input + output) token consumption in 5-minute windows. The simplest view of how much you're using. Example: if total tokens are climbing week over week without a corresponding increase in invocations, individual requests are getting larger (longer prompts or longer responses). Compare with invocation count to distinguish "more requests" from "bigger requests."
  • Source: bedrock-model-invocation-logging
  • View: Bar
  • Query Language: CloudWatch Logs Insights
  • Query:
fields
coalesce(output.outputBodyJson.usage.inputTokens,
output.outputBodyJson.usage.prompt_tokens,
output.outputBodyJson.usage.input_tokens,
input.inputTokenCount) as inputTokens,
coalesce(output.outputBodyJson.usage.outputTokens,
output.outputBodyJson.usage.completion_tokens,
output.outputBodyJson.usage.output_tokens,
output.outputTokenCount) as outputTokens
| stats sum(inputTokens) + sum(outputTokens) as totalTokens by bin(5m) as period
| sort period asc
  • Alarm: Total tokens exceeding 200% of the 7-day average in any 5-minute window.

Invocation Detail

16. Per-Invocation Detail Table

  • Purpose: Last 200 invocations with full detail: model name, temperature, maxTokens config, input/output/total tokens, and cache read/write tokens. The query as written does not compute cost; to show an estimated cost per call, add the totalCostUSD expression from Query 9 as an extra field. This is your drill-down table for investigating specific calls. Example: you spot an invocation with 12,000 input tokens, 50 output tokens, zero cache reads, and $0.04 cost, which points to a classification task sending an entire document.
  • Source: bedrock-model-invocation-logging
  • View: Table
  • Query Language: CloudWatch Logs Insights
  • Query:
fields @timestamp, modelId,
replace(replace(replace(modelId,
"arn:aws:bedrock:us-east-1:ACCOUNT_ID:inference-profile/us.", ""),
"arn:aws:bedrock:us-east-1:ACCOUNT_ID:inference-profile/", ""),
"arn:aws:bedrock:us-east-1:ACCOUNT_ID:", "") as modelName,
coalesce(input.inputBodyJson.inferenceConfig.temperature,
input.inputBodyJson.temperature) as temperature,
coalesce(input.inputBodyJson.inferenceConfig.maxTokens,
input.inputBodyJson.max_completion_tokens,
input.inputBodyJson.max_tokens) as maxTokens,
coalesce(output.outputBodyJson.usage.inputTokens,
output.outputBodyJson.usage.prompt_tokens,
output.outputBodyJson.usage.input_tokens,
input.inputTokenCount) as inputTokens,
coalesce(output.outputBodyJson.usage.outputTokens,
output.outputBodyJson.usage.completion_tokens,
output.outputBodyJson.usage.output_tokens,
output.outputTokenCount) as outputTokens,
coalesce(output.outputBodyJson.usage.totalTokens,
output.outputBodyJson.usage.total_tokens,
floor(inputTokens + outputTokens)) as totalTokens,
coalesce(output.outputBodyJson.usage.cache_read_input_tokens,
output.outputBodyJson.usage.cacheReadInputTokenCount) as cacheReadTokens,
coalesce(output.outputBodyJson.usage.cache_creation_input_tokens,
output.outputBodyJson.usage.cacheWriteInputTokenCount) as cacheWriteTokens
| display @timestamp, modelName, temperature, maxTokens,
inputTokens, outputTokens, totalTokens,
cacheReadTokens, cacheWriteTokens
| sort @timestamp desc
| limit 200
  • Alarm: None (drill-down table — use for investigation).

17. Top 10 Prompts with High Token Count

  • Purpose: The 10 most token-heavy invocations with full request/response bodies, model name, token counts, and latency. These are your most expensive individual calls. Example: the top prompt uses 15,000 tokens with 8 seconds latency — reading the actual prompt text reveals it's stuffing an entire knowledge base into context instead of using retrieval. Requires "Log request and response body" enabled in Bedrock model invocation logging settings.
  • Source: bedrock-model-invocation-logging
  • View: Table
  • Query Language: CloudWatch Logs Insights
  • Query:
filter not ispresent(errorCode)
| fields jsonParse(@message) as json_message,
replace(replace(replace(modelId,
"arn:aws:bedrock:us-east-1:ACCOUNT_ID:inference-profile/us.", ""),
"arn:aws:bedrock:us-east-1:ACCOUNT_ID:inference-profile/", ""),
"arn:aws:bedrock:us-east-1:ACCOUNT_ID:", "") as modelName,
coalesce(input.inputTokenCount, 0) as inputTokenCount,
coalesce(output.outputTokenCount, 0) as outputTokenCount,
coalesce(input.inputTokenCount, 0) + coalesce(output.outputTokenCount, 0) as totalTokenCount,
(output.outputBodyJson.metrics.latencyMs / 1000) as latency
| unnest json_message.input into inputMessage
| unnest json_message.output into outputMessage
| display requestId, timestamp, modelName, inputMessage, outputMessage,
inputTokenCount, outputTokenCount, totalTokenCount, latency
| sort totalTokenCount desc
| limit 10
  • Alarm: Any single invocation exceeding 20,000 total tokens — review prompt design.

Model Pricing Reference

Warning

These prices are a snapshot and may be out of date. AWS updates Bedrock pricing regularly and adds new models. Always verify current rates on the AWS Bedrock pricing page and update the strcontains multipliers in your queries accordingly.

Prices are per token (not per 1K or 1M tokens). To update: find your model in the Bedrock pricing page, convert the per-1M-token price to per-token (divide by 1,000,000), and replace the matching value in the strcontains block of each cost query.

Model | Input ($/token) | Output ($/token)
Nova Micro | 0.000000035 | 0.00000014
Nova Lite | 0.00000006 | 0.00000024
Nova Pro | 0.0000008 | 0.0000032
Claude Sonnet 4.6 | 0.000003 | 0.000015
Claude Sonnet 4.5 | 0.000003 | 0.000015
Claude Sonnet 4 | 0.000003 | 0.000015
Claude Haiku 4.5 | 0.000001 | 0.000005
Llama 4 Maverick | 0.0000002 | 0.0000002
Llama 4 Scout | 0.00000015 | 0.00000015
Cohere Command R+ | 0.0000025 | 0.00001
Cohere Command R | 0.00000015 | 0.0000006
GPT OSS 120B | 0.00000009 | 0.00000045
GPT OSS 20B | 0.00000004 | 0.0000002
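The per-1M to per-token conversion and the matching strcontains terms can be generated rather than typed by hand, which avoids miscounted zeros. The sketch below is an illustration under assumptions: the helper names are hypothetical, and the per-1M prices are back-derived from three rows of the table above, so verify them against the current pricing page before use.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), keyed by the substring used in the queries.
# Assumption: derived from the per-token table above, not from the live pricing page.
PRICES_PER_MILLION = {
    "nova-micro": ("0.035", "0.14"),
    "nova-lite": ("0.06", "0.24"),
    "claude-haiku": ("1", "5"),
}

def per_token(price_per_million):
    """Convert a per-1M-token price to a plain (non-scientific) per-token string."""
    value = Decimal(price_per_million) / Decimal(1_000_000)
    return format(value.normalize(), "f")

def rate_block(column):
    """Build the strcontains sum for column 0 (input) or column 1 (output)."""
    terms = [
        f'(strcontains(modelId, "{model}") * {per_token(prices[column])})'
        for model, prices in PRICES_PER_MILLION.items()
    ]
    return "(" + " +\n".join(terms) + ")"

print("inputTokens *\n" + rate_block(0))
print("outputTokens *\n" + rate_block(1))
```

Decimal is used instead of float so that the emitted rates are exact and never appear in scientific notation.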

Alarm Recommendations

DevOps Alarms

Alarm | Condition | Severity
Completion rate drop | ok / (ok + truncated) below 95% for 2 hours | Warning
Token waste | Caller above 100K wasted tokens in 24h | Warning
Cross-region latency | Model P95 above 10s in a region | Warning
Agent error rate | error_spans / total_traces above 10% for 15 min | Critical
Component errors | Component above 10 errors in 15 min | Critical
Component latency | Component P95 above 5000ms | Warning

FinOps Alarms

Alarm | Condition | Severity
Daily cost spike | Daily cost exceeds 150% of 7-day average | Warning
Hourly cost anomaly | Hourly cost exceeds 3x the hourly average | Warning
Cost concentration | Single model exceeds 60% of total spend | Warning
Token volume spike | Total tokens exceed 2x baseline in 1 hour | Warning
Error rate cost waste | Error rate above 5% (paying for failed calls) | Warning
Per-role budget | Any role's daily spend exceeding 2x its 7-day average | Warning
Token ratio imbalance | Input:output ratio exceeding 20:1 sustained for 1 hour | Warning
High-token invocation | Any single call exceeding 20,000 tokens | Warning

Additional Resources


Contributors: AWS Observability Team
Last Updated: 2026-04-21