A guide to scaling OpenTelemetry Collectors across multiple hosts via Ansible

2024-04-18 5 min

OpenTelemetry has emerged as a key open source tool in the observability space. And as organizations use it to manage more of their telemetry data, they also need to understand how to make it work across their various environments.

This guide focuses on scaling the OpenTelemetry Collector deployment across various Linux hosts to function as both gateways and agents within your observability architecture. Using the OpenTelemetry Collector in this dual capacity enables robust collection and forwarding of metrics, traces, and logs to analysis and visualization platforms, such as Grafana Cloud.

To accelerate the process, we’ll also use Ansible, which is popular with developers for its ease of use and versatility in deploying and managing software. It’s an excellent choice for deploying OpenTelemetry Collector, as it simplifies the automation of installation, setup, and management tasks.

And once we’ve outlined a strategy for deploying and managing the OpenTelemetry Collector’s scalable instances throughout your infrastructure with Ansible, we’ll discuss how you can effectively visualize that data in Grafana. Let’s dive in!

How to scale OTel Collectors across multiple hosts

Before you begin

To get started, you should have the following:

  • Linux hosts with SSH access and sufficient permissions
  • Ansible installed on your base system

Install the Grafana Ansible collection

The Grafana Ansible collection provides modules and roles for managing various resources on Grafana Cloud, as well as roles to manage and deploy Grafana. The OpenTelemetry Collector role is available in the Grafana Ansible collection as of the 3.1.0 release.

To install the Grafana Ansible collection, run this command:

ansible-galaxy collection install grafana.grafana
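Because the OpenTelemetry Collector role needs release 3.1.0 or later, it can help to record that constraint in a requirements file. The following is a small sketch, assuming a file named requirements.yml in your project directory (the file name is a common convention, not something this guide requires):

```yaml
# requirements.yml (hypothetical file name)
collections:
  - name: grafana.grafana
    # The opentelemetry_collector role is available as of 3.1.0
    version: ">=3.1.0"
```

You could then install the collection with `ansible-galaxy collection install -r requirements.yml` instead of the command above.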

Create an Ansible inventory file

Next, you need to set up your hosts and create an inventory file.

  1. Create an Ansible inventory file.

An Ansible inventory, which resides in a file named inventory, lists each host IP on a separate line, like this (8 hosts shown):

146.190.208.216    # hostname = ubuntu-01
146.190.208.190    # hostname = ubuntu-02
137.184.155.128    # hostname = centos-01
146.190.216.129    # hostname = centos-02
198.199.82.174     # hostname = debian-01
198.199.77.93      # hostname = debian-02
143.198.182.156    # hostname = fedora-01
143.244.174.246    # hostname = fedora-02

Note: If you are copying the above file, remove the comments (#).

  2. Create an ansible.cfg file within the same directory as inventory, with the following values:

[defaults]
# Path to the inventory file
inventory = inventory
# Path to your private SSH key
private_key_file = ~/.ssh/id_rsa
remote_user = root
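If you later want to treat some hosts as gateways and others as agents, the same eight hosts could also be grouped. Below is a minimal sketch of an equivalent YAML inventory. The group names `gateways` and `agents`, and the choice of ubuntu-01 as the gateway, are assumptions for illustration only. Ansible's YAML inventory plugin expects a .yml or .yaml extension by default, so this would live in a file such as inventory.yml, with the inventory setting in ansible.cfg pointing to that file.

```yaml
# inventory.yml (hypothetical file name and grouping)
all:
  children:
    gateways:
      hosts:
        146.190.208.216:   # ubuntu-01
    agents:
      hosts:
        146.190.208.190:   # ubuntu-02
        137.184.155.128:   # centos-01
        146.190.216.129:   # centos-02
        198.199.82.174:    # debian-01
        198.199.77.93:     # debian-02
        143.198.182.156:   # fedora-01
        143.244.174.246:   # fedora-02
```

A playbook can then target `hosts: gateways` or `hosts: agents`, while `hosts: all` still covers every host.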

Use the OpenTelemetry Collector Ansible role

Next you will create an Ansible playbook that calls the opentelemetry_collector role from the grafana.grafana Ansible collection. Refer to this documentation to get the credentials needed for configuring the OpenTelemetry Collector to send telemetry to Grafana Cloud.
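Once you have those credentials, one option is to keep them in a separate variables file rather than writing them directly into the playbook. The sketch below assumes a group_vars/all.yml file next to your inventory, which Ansible loads automatically for every host. The variable names are hypothetical and are not defined by the role.

```yaml
# group_vars/all.yml (variable names are illustrative assumptions)
grafana_cloud_instance_id: "<instanceID>"
grafana_cloud_token: "<Cloud Access Policy token>"
grafana_cloud_otlp_endpoint: "<OTLP Endpoint URL>"
```

In the playbook, the corresponding placeholders can then be written as references such as `"{{ grafana_cloud_token }}"`. If you encrypt the file with `ansible-vault encrypt group_vars/all.yml`, run the playbook with `--ask-vault-pass` so Ansible can decrypt it.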

Create a file named deploy-otelcol.yml in the same directory as your ansible.cfg and inventory.

- name: Install OpenTelemetry Collector
  hosts: all
  become: true

  tasks:
    - name: Install OpenTelemetry Collector
      ansible.builtin.include_role:
        name: grafana.grafana.opentelemetry_collector
      vars:
        otel_collector_extensions:
          basicauth/otlp:
            client_auth:
              username: <instanceID>
              password: <Cloud Access Policy token>

        otel_collector_receivers:
          otlp:
            # https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver/otlpreceiver
            protocols:
              grpc:
              http:
          hostmetrics:
            # Optional. Host Metrics Receiver added as an example of Infra Monitoring capabilities of the OpenTelemetry Collector
            # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver
            scrapers:
              load:
              memory:

        otel_collector_processors:
          batch:
            # https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor
          resourcedetection:
            # Enriches telemetry data with resource information from the host
            # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor
            detectors: ["env", "system"]
            override: false
          transform/add_resource_attributes_as_metric_attributes:
            # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/transformprocessor
            error_mode: ignore
            metric_statements:
              - context: datapoint
                statements:
                  - set(attributes["deployment.environment"], resource.attributes["deployment.environment"])
                  - set(attributes["service.version"], resource.attributes["service.version"])

        otel_collector_exporters:
          otlphttp:
            auth:
              authenticator: basicauth/otlp
            endpoint: <OTLP Endpoint URL>

        otel_collector_service:
          extensions: [basicauth/otlp]
          pipelines:
            traces:
              receivers: [otlp]
              processors: [resourcedetection, batch]
              exporters: [otlphttp]
            metrics:
              receivers: [otlp, hostmetrics]
              processors: [resourcedetection, transform/add_resource_attributes_as_metric_attributes, batch]
              exporters: [otlphttp]
            logs:
              receivers: [otlp]
              processors: [resourcedetection, batch]
              exporters: [otlphttp]

Note: You’ll need to adjust the configuration to match the specific telemetry data you intend to collect and where you plan to forward it. The configuration snippet above is a basic example designed to collect host metrics (via the OpenTelemetry Collector) and forward them to Grafana Cloud.
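As one example of such an adjustment, hosts acting as agents could forward their telemetry to a gateway Collector instead of sending it straight to Grafana Cloud. The play below is a rough sketch under several assumptions: an inventory group named `agents` exists, `<gateway host>` is a placeholder for a host running the gateway configuration, traffic between agent and gateway stays on a trusted network, and the role accepts a configuration that omits the extensions variable. None of these are confirmed by this guide.

```yaml
- name: Install OpenTelemetry Collector as an agent
  hosts: agents   # hypothetical inventory group
  become: true

  tasks:
    - name: Install OpenTelemetry Collector
      ansible.builtin.include_role:
        name: grafana.grafana.opentelemetry_collector
      vars:
        otel_collector_receivers:
          hostmetrics:
            scrapers:
              load:
              memory:

        otel_collector_processors:
          batch:
          resourcedetection:
            detectors: ["env", "system"]
            override: false

        otel_collector_exporters:
          otlp:
            # Send to the gateway's OTLP gRPC receiver (4317 is the default OTLP gRPC port)
            endpoint: "<gateway host>:4317"
            tls:
              # Assumption: unencrypted traffic inside a trusted network
              insecure: true

        otel_collector_service:
          pipelines:
            metrics:
              receivers: [hostmetrics]
              processors: [resourcedetection, batch]
              exporters: [otlp]
```

The gateway hosts would keep a configuration like the one above that exports to Grafana Cloud. Depending on the Collector version, the gateway's OTLP receiver may listen only on localhost by default, so you may need to set its gRPC `endpoint` (for example, `0.0.0.0:4317`) so agents can reach it.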

Run the Ansible playbook

Deploy the OpenTelemetry Collector across your hosts by executing:

ansible-playbook deploy-otelcol.yml
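After the playbook finishes, a quick way to confirm the Collector is running on each host is to check that the OTLP receiver ports accept connections. The following is a minimal sketch, assuming the receivers use the default OTLP ports (4317 for gRPC and 4318 for HTTP) and a hypothetical file name of verify-otelcol.yml:

```yaml
- name: Verify the OpenTelemetry Collector is listening
  hosts: all
  gather_facts: false

  tasks:
    - name: Wait for the OTLP gRPC port to accept connections
      ansible.builtin.wait_for:
        port: 4317
        timeout: 30

    - name: Wait for the OTLP HTTP port to accept connections
      ansible.builtin.wait_for:
        port: 4318
        timeout: 30
```

Running `ansible-playbook verify-otelcol.yml` reports a failure for any host where a port does not open within 30 seconds. The check connects from the host itself, since `wait_for` targets 127.0.0.1 by default.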

Verify data ingestion into Grafana Cloud

Once you’ve deployed the OpenTelemetry Collector and configured it to forward data to Grafana Cloud, you can verify the ingestion:

  • Log into your Grafana Cloud instance.
  • Navigate to the Explore section.
  • Select your Grafana Cloud Prometheus data source from the dropdown menu.
  • Execute a query to confirm the reception of metrics, e.g., {instance="ubuntu-01"} for a specific host’s metrics.

Visualize metrics and logs in Grafana

With data successfully ingested into Grafana Cloud, you can create custom dashboards to visualize the metrics, logs, and traces received from your OTel Collector. You can then use Grafana’s powerful query builder and visualization tools to derive insights from your data effectively.

Here are a few tips to keep in mind before you get started:

  • Consider creating dashboards that offer a comprehensive overview of your infrastructure’s health and performance.
  • Utilize Grafana’s alerting features to proactively manage and respond to issues identified through the OpenTelemetry data.
  • Tailor the Ansible roles, OpenTelemetry Collector configurations, and Grafana dashboards to suit your specific monitoring and observability requirements.

What’s next?

The strategy we outlined here can enhance your monitoring and data visualization capabilities. If you found it useful, here are some additional resources that can help you as you work with OpenTelemetry and Grafana:

  1. Sending telemetry data to Prometheus and viewing it in Grafana OSS: To learn how to directly transmit your telemetry data to a Prometheus instance and analyze the metrics in Grafana OSS, there’s a helpful guide on the OpenTelemetry Blog.
  2. Scaling Grafana Alloy with Ansible: If you want to scale Alloy instead, there’s a detailed blog post that walks you through deploying and scaling it efficiently with Ansible.
  3. Scaling the OpenTelemetry Collector: For those looking to extend the capabilities of the OpenTelemetry Collector, the official OpenTelemetry documentation offers comprehensive insights on scaling the collector.
  4. Visualizing OpenTelemetry Data in Grafana Cloud: Finally, if your goal is to visualize the telemetry data from your applications in Grafana Cloud, check out Application Observability in Grafana Cloud.
