Here you will find curated guidance, how-tos, and links to other resources that help you apply observability (o11y) to various use cases. This includes managed services such as Amazon Managed Service for Prometheus and Amazon Managed Grafana, as well as agents such as OpenTelemetry and Fluent Bit. The content is not restricted to AWS tools alone, though, and many open source projects are referenced here.
We want to address the needs of developers and infrastructure folks equally, so many of the recipes "cast a wide net". We encourage you to explore and find the solutions that work best for what you are trying to accomplish.
The content here is derived from actual customer engagements by our Solutions Architects and Professional Services teams, and from feedback from other customers. Everything you will find here has been implemented by customers in their own environments.
We think about the o11y space by decomposing it into six dimensions that you can combine to arrive at a specific solution:
| Dimension | Examples |
|---|---|
| Destinations | Prometheus · Grafana · OpenSearch · CloudWatch · Jaeger |
| Agents | ADOT · Fluent Bit · CW agent · X-Ray agent |
| Infra & databases | RDS · DynamoDB · MSK |
| Compute unit | Batch · ECS · EKS · AEB · Lambda · App Runner |
| Compute engine | Fargate · EC2 · Lightsail |
Example solution requirement
I need a logging solution for a Python app I'm running on EKS on Fargate, with the goal of storing the logs in an S3 bucket for further consumption.
One stack that would fit this need is the following:
- Destination: An S3 bucket for further consumption of data
- Agent: Fluent Bit to emit log data from EKS
- Language: Python
- Infra & DB: N/A
- Compute unit: Kubernetes (EKS)
- Compute engine: Fargate
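To make the decomposition concrete, the example stack can be written down as a record with one optional field per dimension. This is a minimal, hypothetical Python sketch: the `Stack` class and its field names are illustrative only and are not part of any recipe or AWS API. Dimensions that don't apply, such as Infra & DB here, simply stay `None`.

```python
from dataclasses import dataclass, fields
from typing import Optional


# Hypothetical model of the six o11y dimensions (illustrative names only).
# Every field is optional because not every solution pins down all six.
@dataclass(frozen=True)
class Stack:
    destination: Optional[str] = None
    agent: Optional[str] = None
    language: Optional[str] = None
    infra_db: Optional[str] = None
    compute_unit: Optional[str] = None
    compute_engine: Optional[str] = None

    def specified(self) -> dict:
        """Return only the dimensions that have a value."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }


# The example requirement: a Python app on EKS on Fargate shipping logs to S3.
# Infra & DB is N/A, so it is left unset.
python_logs_to_s3 = Stack(
    destination="S3",
    agent="Fluent Bit",
    language="Python",
    compute_unit="EKS",
    compute_engine="Fargate",
)

for name, value in python_logs_to_s3.specified().items():
    print(f"{name}: {value}")
```

Swapping a single field, for example `compute_engine="EC2"`, yields a neighboring stack, which is one way to compare alternative recipes dimension by dimension.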
Not every dimension needs to be specified, and sometimes it's hard to decide where to start. Try different paths and compare the pros and cons of different recipes.
To simplify navigation, we group the six dimensions into the following categories:
- By Compute: covering compute engines and units
- By Infra & Data: covering infrastructure and databases
- By Language: covering languages
- By Destination: covering telemetry and analytics
- Tasks: covering anomaly detection, alerting, troubleshooting, and more
How to use
You can use the top navigation menu to browse to a specific index page, starting with a rough selection. For example, By Compute ->
Alternatively, you can search the site by pressing / or the
All recipes published on this site are available via the MIT-0 license, a modification to the usual MIT license that removes the requirement for attribution.
How to contribute
Start a discussion about what you plan to do, and we'll take it from there.
The recipes on this site are a collection of good practices. In addition, there are a number of places where you can learn more about the status of the open source projects we use, as well as about the managed services used in the recipes, so check out: