Single Cluster Open Source Observability - OTEL Collector Monitoring

Objective

This pattern demonstrates how to use the New EKS Cluster Open Source Observability Accelerator with monitoring for ADOT collector health.

Prerequisites

Ensure that you have installed the following tools on your machine.

  1. aws cli
  2. kubectl
  3. cdk
  4. npm
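
A quick way to confirm the tools above are on your `PATH` is sketched below. The helper name `check_tools` is hypothetical, and the check only confirms that each command exists, not that its version is suitable.

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 if every command is found, 1 otherwise.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example call for this pattern's prerequisites:
#   check_tools aws kubectl cdk npm
```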

Deploying

Please follow the Deploying instructions of the New EKS Cluster Open Source Observability Accelerator pattern, except for step 7, where you need to replace "context" in ~/.cdk.json with the following:

  "context": {
    "fluxRepository": {
      "name": "grafana-dashboards",
      "namespace": "grafana-operator",
      "repository": {
        "repoUrl": "https://github.com/aws-observability/aws-observability-accelerator",
        "name": "grafana-dashboards",
        "targetRevision": "main",
        "path": "./artifacts/grafana-operator-manifests/eks/infrastructure"
      },
      "values": {
        "GRAFANA_CLUSTER_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/cluster.json",
        "GRAFANA_KUBELET_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/kubelet.json",
        "GRAFANA_NSWRKLDS_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/namespace-workloads.json",
        "GRAFANA_NODEEXP_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodeexporter-nodes.json",
        "GRAFANA_NODES_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodes.json",
        "GRAFANA_WORKLOADS_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/workloads.json",
        "GRAFANA_ADOTHEALTH_DASH_URL": "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/adot/adothealth.json",
        "GRAFANA_KSH_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/ksh.json",
        "GRAFANA_KCM_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/kcm.json"
      },
      "kustomizations": [
        {
          "kustomizationPath": "./artifacts/grafana-operator-manifests/eks/infrastructure"
        },
        {
          "kustomizationPath": "./artifacts/grafana-operator-manifests/eks/adot"
        }
      ]
    },
    "adotcollectormetrics.pattern.enabled": true
  }
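
Before deploying, it can help to sanity-check the edited file. The sketch below is a minimal, text-based check (it does not parse JSON), and the function names `adot_health_enabled` and `list_dashboard_urls` are hypothetical helpers, not part of the accelerator.

```shell
# Succeed if the given file sets "adotcollectormetrics.pattern.enabled" to true.
# Plain text match only; it does not validate the JSON structure.
adot_health_enabled() {
  grep -Eq '"adotcollectormetrics\.pattern\.enabled"[[:space:]]*:[[:space:]]*true' "$1"
}

# Print every raw.githubusercontent.com dashboard URL found in the given file,
# one per line, so each can be reviewed or fetched separately.
list_dashboard_urls() {
  grep -Eo 'https://raw\.githubusercontent\.com/[^"]*\.json' "$1"
}

# Example calls (pass the path of the cdk.json you edited):
#   adot_health_enabled path/to/cdk.json && echo "ADOT health monitoring enabled"
#   list_dashboard_urls path/to/cdk.json
```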

Visualization

The OpenTelemetry collector produces metrics to monitor the entire pipeline.

Log in to your Grafana workspace and navigate to the Dashboards panel. You should see a new dashboard named OpenTelemetry Health Collector under Observability Accelerator Dashboards.

This dashboard shows telemetry about the ADOT collector itself, which helps when you need to troubleshoot the collector or understand how many resources it consumes.

The diagram below shows an example data flow and the components in an ADOT collector:

ADOTCollectorComponents

The dashboard has five sections, each showing metrics for one part of the AWS Distro for OpenTelemetry (ADOT) collector:

Receivers

Shows the receiver’s accepted and refused rate/count of spans and metric points that are pushed into the telemetry pipeline.
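
As a rough illustration of the accepted/refused split, the sketch below sums one metric from the collector's Prometheus-format self-metrics read on stdin and derives a refused percentage. The helper names are hypothetical, and the metric names in the example call (`otelcol_receiver_accepted_spans`, `otelcol_receiver_refused_spans`) are assumptions: naming can differ between collector versions (some add a `_total` suffix), and how you obtain the metrics text is left to your setup.

```shell
# Sum every sample of one metric from Prometheus text exposition on stdin.
# Assumes each sample line ends with its value (no trailing timestamp).
sum_metric() {
  awk -v name="$1" '
    /^#/ { next }                      # skip HELP/TYPE comment lines
    {
      metric = $1
      sub(/[{].*/, "", metric)         # drop the label set, keep the bare name
      if (metric == name) total += $NF
    }
    END { print total + 0 }
  '
}

# Integer percentage of refused items out of accepted + refused.
refused_percent() {
  local accepted="$1" refused="$2"
  local sum=$((accepted + refused))
  if [ "$sum" -eq 0 ]; then
    echo 0
  else
    echo $((100 * refused / sum))
  fi
}

# Example, with the metrics text saved to a file named metrics.txt:
#   a=$(sum_metric otelcol_receiver_accepted_spans < metrics.txt)
#   r=$(sum_metric otelcol_receiver_refused_spans  < metrics.txt)
#   refused_percent "$a" "$r"
```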

Processors

Shows the accepted and refused rate/count of spans and metric points pushed into the next component in the pipeline. The batch metrics help you understand how often metrics are sent to the exporter and how large each batch is.

receivers_processors

Exporters

Shows the exporter’s accepted and refused rate/count of spans and metric points that are pushed to any of the destinations. It also shows the size and capacity of the retry queue. Use these metrics to determine whether the collector is having trouble sending trace or metric data to the configured destination.
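
To make the size-versus-capacity comparison concrete, here is a minimal sketch that computes retry-queue utilisation per exporter from Prometheus-format metrics text on stdin. The function name is hypothetical. It assumes the metrics are named `otelcol_exporter_queue_size` and `otelcol_exporter_queue_capacity`, that both carry the same label set, and that label values contain no spaces; verify these against your collector version.

```shell
# Print retry-queue utilisation per exporter as "<labels> <percent>%".
# Reads Prometheus text exposition on stdin; output order is not guaranteed.
queue_utilization() {
  awk '
    /^otelcol_exporter_queue_size[{ ]/ {
      key = $1; sub(/^[^{]*/, "", key)   # keep only the label set as the key
      size[key] = $NF
    }
    /^otelcol_exporter_queue_capacity[{ ]/ {
      key = $1; sub(/^[^{]*/, "", key)
      cap[key] = $NF
    }
    END {
      for (k in cap)
        if (cap[k] > 0) printf "%s %d%%\n", k, 100 * size[k] / cap[k]
    }
  '
}

# Example, with the metrics text saved to a file named metrics.txt:
#   queue_utilization < metrics.txt
```

A value that stays close to 100% suggests the exporter cannot drain its queue as fast as data arrives.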

exporters

Collectors

Shows the collector’s operational metrics (memory, CPU, uptime). Use these to understand how many resources the collector is consuming.

collectors

Data Flow

Shows how metrics and spans flow through the collector’s components.

dataflow

Note: To read more about the metrics and the dashboard used, visit the upstream documentation here.

Disable ADOT health monitoring

Update the context in the cdk.json file located in the cdk-eks-blueprints-patterns directory:

  "context": {
    "adotcollectormetrics.pattern.enabled": false
  }
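
If you prefer to script the change, the sketch below flips the flag with `sed` and prints the result to stdout, leaving the original file untouched. The function name `disable_adot_health` is a hypothetical helper, and the substitution is a plain text match rather than a JSON-aware edit.

```shell
# Print the given file with "adotcollectormetrics.pattern.enabled" set to false.
# Writes to stdout only; review the output before replacing your cdk.json.
disable_adot_health() {
  sed -E 's/("adotcollectormetrics\.pattern\.enabled"[[:space:]]*:[[:space:]]*)true/\1false/' "$1"
}

# Example:
#   disable_adot_health cdk.json > cdk.json.new
```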

Teardown

You can tear down the whole CDK stack with the following command:

make pattern single-new-eks-opensource-observability destroy