Loki tutorial: How to send logs from Amazon's ECS to Loki

Published: 6 Aug 2020

Elastic Container Service (ECS) is Amazon's fully managed container orchestration service. Combined with Fargate, Amazon's serverless compute engine for containers, it lets you run container workloads without provisioning your own compute resources.

But how can you consolidate and query the logs and metadata of these workloads? Enter Loki, the log aggregation system from Grafana Labs. Loki keeps costs down by indexing only a small set of labels for each log stream instead of the full text of every log line.

In this tutorial, you'll learn how to use Firelens, the AWS container log router, to forward your logs and workload metadata to a Loki instance.

As a result, you will be able to query all your logs in one place using Grafana.

For those using a Promtail agent, check out my previous tutorials on how to set up Promtail on an AWS EC2 instance or on AWS EKS.

This blog post will cover:

  • Requirements
  • Setting up the ECS cluster
  • Creating your task definition
  • Running your service

Requirements

Before we start, you'll need:

  • The AWS CLI configured (run aws configure).
  • A Grafana instance with a Loki data source already configured.
  • A subnet in a VPC that is routable from the internet. (Follow these instructions if you need to create one.)
  • A security group of your choice for your containers. (Follow these instructions if you need to create one.)

For the sake of simplicity, we'll use a Grafana Cloud Loki and Grafana instance (you can get a free 30-day trial of Grafana Cloud Loki), but all the steps are the same if you're running your own open source Loki and Grafana instances.

Setting up the ECS cluster

To run containers with ECS, you need an ECS cluster. We'll use a Fargate cluster. If you prefer an EC2 cluster, the steps still apply apart from Fargate-specific settings such as the FARGATE launch type.

Let's create the cluster with the AWS CLI:

aws ecs create-cluster --cluster-name ecs-firelens-cluster

ECS also needs an IAM role to run the task's containers, so let's create one and allow ECS tasks to assume it.

You might already have an ecsTaskExecutionRole role in your AWS account. If that's the case, you can skip this step.

curl https://raw.githubusercontent.com/grafana/loki/master/docs/aws/ecs/ecs-role.json > ecs-role.json
aws iam create-role --role-name ecsTaskExecutionRole  --assume-role-policy-document file://ecs-role.json

The command should print the new role, similar to this:

{
    "Role": {
        "Path": "/",
        "RoleName": "ecsTaskExecutionRole",
        "RoleId": "AROA5FW5RZWLXFPU656SQ",
        "Arn": "arn:aws:iam::0000000000:role/ecsTaskExecutionRole",
        "CreateDate": "2020-07-09T14:51:49+00:00",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": [
                            "ecs-tasks.amazonaws.com"
                        ]
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        }
    }
}
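If the download link no longer works, you can generate an equivalent trust policy yourself. The sketch below assumes that ecs-role.json contains the same AssumeRolePolicyDocument that create-role echoes back above. It rebuilds that document in Python so you can write it to ecs-role.json before running the command. The function names are illustrative.

```python
import json
from typing import Any, Dict


def ecs_task_trust_policy() -> Dict[str, Any]:
    """Trust policy that allows ECS tasks to assume the role.

    Assumption: mirrors the AssumeRolePolicyDocument returned by
    `aws iam create-role` above; the downloaded ecs-role.json may differ in layout.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": ["ecs-tasks.amazonaws.com"]},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def write_trust_policy(path: str = "ecs-role.json") -> None:
    """Write the policy where `--assume-role-policy-document file://ecs-role.json` expects it."""
    with open(path, "w") as f:
        json.dump(ecs_task_trust_policy(), f, indent=4)
```

Calling write_trust_policy() and then running the same create-role command should give you an equivalent role.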

Note the ARN of this new role; we’ll use it later to create an ECS task.

Finally, we'll attach the AWS managed ECS task execution policy (AmazonECSTaskExecutionRolePolicy) to the new role. This policy allows ECS to pull container images and to send container logs to CloudWatch Logs on your behalf:

aws iam attach-role-policy --role-name ecsTaskExecutionRole --policy-arn "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"

Creating your task definition

Amazon Firelens lets you run a log router (fluentd or fluentbit) in the same task definition as your application containers. The router runs as a sidecar and forwards their logs, in our case to Loki.

In this example, we will use fluentbit (with the Loki plugin installed), but if you prefer fluentd, make sure to check the documentation.

We recommend fluentbit, as it uses fewer resources than fluentd.

Our task definition is made of two containers: the Firelens log router that sends logs to Loki (log_router) and a sample application that generates logs (sample-app).

Let's download the task definition and go through its most important parts.

curl https://raw.githubusercontent.com/grafana/loki/master/docs/aws/ecs/ecs-task.json > ecs-task.json

The first container definition is the log router:

{
    "essential": true,
    "image": "grafana/fluent-bit-plugin-loki:1.5.0-amd64",
    "name": "log_router",
    "firelensConfiguration": {
        "type": "fluentbit",
        "options": {
            "enable-ecs-log-metadata": "true"
        }
    },
    "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": "firelens-container",
            "awslogs-region": "us-east-2",
            "awslogs-create-group": "true",
            "awslogs-stream-prefix": "firelens"
        }
    },
    "memoryReservation": 50
},

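The log_router container image is the fluentbit Loki Docker image, which comes with the Loki plugin pre-installed. As you can see, the firelensConfiguration type is set to fluentbit. The enable-ecs-log-metadata option adds ECS metadata, such as the cluster, task definition and container name, to each log record. This will be useful when querying your logs with LogQL label matchers.

The logConfiguration sends the log router's own output to CloudWatch Logs. It is mostly there for debugging the fluentbit container, so feel free to remove it when you're done testing and configuring. Note that "awslogs-create-group": "true" requires the logs:CreateLogGroup permission, which AmazonECSTaskExecutionRolePolicy does not include. Either create the firelens-container log group beforehand or grant that permission to the execution role.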

{
    "command": [
        "/bin/sh -c \"while true; do sleep 15 ;echo hello_world; done\""
    ],
    "entryPoint": ["sh","-c"],
    "essential": true,
    "image": "alpine:3.12",
    "logConfiguration": {
        "logDriver": "awsfirelens",
        "options": {
            "Name": "loki",
            "Url": "https://<userid>:<grafanacloud apikey>@logs-prod-us-central1.grafana.net/loki/api/v1/push",
            "Labels": "{job=\"firelens\"}",
            "RemoveKeys": "container_id,ecs_task_arn",
            "LabelKeys": "container_name,ecs_task_definition,source,ecs_cluster",
            "LineFormat": "key_value"
        }
    },
    "name": "sample-app"
}

The second container is our sample-app, a simple Alpine container that prints hello_world to stdout every 15 seconds. To send those logs to Loki, we configure this container to use the awsfirelens log driver.

Go ahead and replace the Url property with your Grafana Cloud credentials. You can find them on the Loki instance page of your Grafana Cloud account. If you're running your own Loki instance, replace the whole URL (e.g. http://my-loki.com:3100/loki/api/v1/push).

ECS automatically translates all options of the logConfiguration into a fluentbit output configuration. For example, the options above produce this fluentbit OUTPUT config section:
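Grafana Cloud API keys may contain characters such as /, + or = that have a special meaning in URLs. It can therefore be safer to percent-encode the credentials before putting them into Url. This is a minimal sketch. It assumes the Loki plugin decodes percent-encoded user info the way standard URL parsers do. The function name and the user ID and key in the test values are hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlsplit, urlunsplit

PUSH_PATH = "/loki/api/v1/push"  # Loki push endpoint used in the Url option


def loki_push_url(base_url: str, user: Optional[str] = None,
                  api_key: Optional[str] = None) -> str:
    """Build the Url option value, optionally embedding basic-auth credentials."""
    parts = urlsplit(base_url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    if user is not None and api_key is not None:
        # Percent-encode so '/', '+', '=', ':' or '@' in the key can't break the URL.
        host = f"{quote(user, safe='')}:{quote(api_key, safe='')}@{host}"
    return urlunsplit((parts.scheme, host, PUSH_PATH, "", ""))
```

For a self-hosted Loki without authentication, call it with only the base URL, e.g. loki_push_url("http://my-loki.com:3100").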

[OUTPUT]
    Name loki
    Match awsfirelens*
    Url https://<userid>:<grafanacloud apikey>@logs-prod-us-central1.grafana.net/loki/api/v1/push
    Labels {job="firelens"}
    RemoveKeys container_id,ecs_task_arn
    LabelKeys container_name,ecs_task_definition,source,ecs_cluster
    LineFormat key_value
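To make that mapping concrete, here is a simplified Python model of the translation. Each option becomes one line of the [OUTPUT] section, with Name first and a Match pattern added. It is an illustration, not ECS's actual config generator. The match pattern is passed in rather than derived, because ECS chooses it for you.

```python
from typing import Dict


def render_output_section(options: Dict[str, str], match: str) -> str:
    """Render awsfirelens logConfiguration options as a fluentbit [OUTPUT] section.

    Simplified model: Name first, then Match, then the remaining options in order.
    """
    lines = ["[OUTPUT]", f"    Name {options['Name']}", f"    Match {match}"]
    lines += [f"    {key} {value}" for key, value in options.items() if key != "Name"]
    return "\n".join(lines)


# The logConfiguration options of sample-app from the task definition.
SAMPLE_OPTIONS = {
    "Name": "loki",
    "Url": "https://<userid>:<grafanacloud apikey>@logs-prod-us-central1.grafana.net/loki/api/v1/push",
    "Labels": '{job="firelens"}',
    "RemoveKeys": "container_id,ecs_task_arn",
    "LabelKeys": "container_name,ecs_task_definition,source,ecs_cluster",
    "LineFormat": "key_value",
}

if __name__ == "__main__":
    print(render_output_section(SAMPLE_OPTIONS, "awsfirelens*"))
```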

This OUTPUT config will forward logs to Grafana Cloud Loki. To learn more about these options, make sure to read the documentation of the Loki output. LabelKeys turns the container_name, ecs_task_definition, source and ecs_cluster fields of each record into labels, and RemoveKeys drops container_id and ecs_task_arn. Any remaining fields, such as the log message itself, are written into the log line in key_value format. You can statically add more labels via the Labels option.

If you run multiple containers in your task, each container whose logs you want in Loki needs its own logConfiguration section with the awsfirelens log driver. This also gives you the opportunity to add different labels for each container.
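The effect of Labels, LabelKeys, RemoveKeys and LineFormat on a single record can be sketched as below. This is a simplified model of the Loki output plugin, not its source code. The real plugin's key_value formatting (for example, how it orders keys or quotes values containing spaces) may differ. The record uses the metadata field names from the task definition plus FireLens's log field, and its values are hypothetical.

```python
from typing import Dict, List, Tuple


def apply_loki_options(record: Dict[str, str], static_labels: Dict[str, str],
                       label_keys: List[str],
                       remove_keys: List[str]) -> Tuple[Dict[str, str], str]:
    """Split a FireLens record into Loki stream labels and a key_value log line."""
    # RemoveKeys: drop these fields entirely.
    fields = {k: v for k, v in record.items() if k not in remove_keys}
    # Labels + LabelKeys: static labels, plus selected fields promoted to labels.
    labels = dict(static_labels)
    for key in label_keys:
        if key in fields:
            labels[key] = fields.pop(key)
    # LineFormat key_value: remaining fields become key=value pairs (sorted in this model).
    line = " ".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return labels, line


# Hypothetical record for the sample-app container.
record = {
    "log": "hello_world",
    "source": "stdout",
    "container_id": "0123abcd",
    "container_name": "sample-app",
    "ecs_cluster": "ecs-firelens-cluster",
    "ecs_task_arn": "arn:aws:ecs:us-east-2:000000000000:task/ecs-firelens-cluster/0123abcd",
    "ecs_task_definition": "loki-fargate-task-definition:1",
}
labels, line = apply_loki_options(
    record,
    static_labels={"job": "firelens"},
    label_keys=["container_name", "ecs_task_definition", "source", "ecs_cluster"],
    remove_keys=["container_id", "ecs_task_arn"],
)
```

The labels identify the stream, while the line keeps only what is left after labels and removed keys are taken out.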
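With several containers in a task, a quick check can catch one that was left without FireLens logging. The helper below is hypothetical and not part of any AWS tool. It walks containerDefinitions and reports every container, other than the router (identified here by its firelensConfiguration key), whose logDriver isn't awsfirelens. Containers you deliberately keep out of Loki will be reported too.

```python
import json
from typing import Any, Dict, List


def containers_missing_firelens(task_def: Dict[str, Any]) -> List[str]:
    """Return names of non-router containers whose logs won't go through FireLens."""
    missing = []
    for container in task_def.get("containerDefinitions", []):
        if "firelensConfiguration" in container:
            continue  # the log router itself; it logs via awslogs in this example
        driver = container.get("logConfiguration", {}).get("logDriver")
        if driver != "awsfirelens":
            missing.append(container["name"])
    return missing


def check_file(path: str = "ecs-task.json") -> List[str]:
    """Load a task definition file and run the check on it."""
    with open(path) as f:
        return containers_missing_firelens(json.load(f))
```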

{
    "containerDefinitions": [
     ...
    ],
    "cpu": "256",
    "executionRoleArn": "arn:aws:iam::00000000:role/ecsTaskExecutionRole",
    "family": "loki-fargate-task-definition",
    "memory": "512",
    "networkMode": "awsvpc",
    "requiresCompatibilities": [
        "FARGATE"
    ]
}

Finally, you need to replace the executionRoleArn with the ARN of the role we created in the first section.
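If you prefer not to edit ecs-task.json by hand, a short script can fill in the two placeholders: executionRoleArn and the Url option of each awsfirelens container. This is a sketch. It assumes the downloaded file has the structure shown above, and the function names are illustrative.

```python
import copy
import json
from typing import Any, Dict


def fill_task_definition(task_def: Dict[str, Any], role_arn: str,
                         loki_url: str) -> Dict[str, Any]:
    """Return a copy of the task definition with the role ARN and Loki Url filled in."""
    patched = copy.deepcopy(task_def)
    patched["executionRoleArn"] = role_arn
    for container in patched.get("containerDefinitions", []):
        log_cfg = container.get("logConfiguration", {})
        # Only containers routed through FireLens carry the loki output options.
        if log_cfg.get("logDriver") == "awsfirelens":
            log_cfg.setdefault("options", {})["Url"] = loki_url
    return patched


def fill_file(path: str, role_arn: str, loki_url: str) -> None:
    """Rewrite the task definition file in place."""
    with open(path) as f:
        task_def = json.load(f)
    with open(path, "w") as f:
        json.dump(fill_task_definition(task_def, role_arn, loki_url), f, indent=4)
```

For example, fill_file("ecs-task.json", "<your role ARN>", "<your Loki push URL>") updates the file before you register it.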

Once you've finished editing the task definition, register it by running the command below:

aws ecs register-task-definition --region us-east-2 --cli-input-json  file://ecs-task.json

Now let’s create and start a service.

Running your service

To run the service, you need to provide the task definition name loki-fargate-task-definition:1, which combines the task family with the task revision (:1). You also need your own subnet and security group. Replace subnet-306ca97d and sg-02c489bbdeffdca1d respectively in the command below, then start your service:

aws ecs create-service --cluster ecs-firelens-cluster \
--service-name firelens-loki-fargate \
--task-definition loki-fargate-task-definition:1 \
--desired-count 1 --region us-east-2 --launch-type "FARGATE" \
--network-configuration "awsvpcConfiguration={subnets=[subnet-306ca97d],securityGroups=[sg-02c489bbdeffdca1d],assignPublicIp=ENABLED}"

Make sure assignPublicIp is ENABLED. Otherwise the task won't be able to reach the internet, so it can't pull public Docker images or push logs to Grafana Cloud.

You can now open the ECS console, where you should see your task running. Next, open Grafana and use Explore with the Loki data source to browse the task logs. Enter the query {job="firelens"} and you should see the sample-app logs, as shown below:
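If you script the deployment, you can assemble the same create-service call from its parts. Doing so makes the family:revision naming and the shorthand network configuration explicit. This is a sketch: the function name and parameters are illustrative, and it only builds the argument list for the AWS CLI rather than calling AWS itself.

```python
from typing import List


def create_service_args(cluster: str, service: str, family: str, revision: int,
                        subnets: List[str], security_groups: List[str],
                        region: str, desired_count: int = 1) -> List[str]:
    """Build the `aws ecs create-service` argument list used in this tutorial."""
    # Shorthand syntax for --network-configuration; assignPublicIp stays ENABLED.
    network = (
        "awsvpcConfiguration={"
        f"subnets=[{','.join(subnets)}],"
        f"securityGroups=[{','.join(security_groups)}],"
        "assignPublicIp=ENABLED}"
    )
    return [
        "aws", "ecs", "create-service",
        "--cluster", cluster,
        "--service-name", service,
        "--task-definition", f"{family}:{revision}",  # task family plus revision
        "--desired-count", str(desired_count),
        "--region", region,
        "--launch-type", "FARGATE",
        "--network-configuration", network,
    ]
```

Passing the result to subprocess.run(args, check=True) runs the command. Because the arguments are passed as a list, no shell quoting of the braces and brackets is needed.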

[Image: Loki tutorial: Send logs from ECS to Loki]

Using the Log Labels dropdown, you should be able to discover your workload via the ECS metadata, which is also visible when you expand a log line.

And that's it! Make sure to check out the LogQL documentation to learn more about Loki's query language.
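Beyond Explore, you can send the same LogQL selectors to Loki's HTTP API. The sketch below builds a label selector from ECS metadata labels and a /loki/api/v1/query_range URL for it. The function names are illustrative. An ecs_cluster value in such a selector assumes FireLens reports the plain cluster name. Authentication (for Grafana Cloud, your user ID and API key) is left out.

```python
import json
from typing import Dict
from urllib.parse import urlencode


def label_selector(labels: Dict[str, str]) -> str:
    """Build a LogQL stream selector such as {job="firelens"}."""
    # json.dumps adds the double quotes and escapes plain ASCII values compatibly with LogQL.
    matchers = ", ".join(f"{name}={json.dumps(value)}" for name, value in labels.items())
    return "{" + matchers + "}"


def query_range_url(loki_base: str, query: str, start_ns: int, end_ns: int,
                    limit: int = 100) -> str:
    """URL for Loki's query_range endpoint; start and end are Unix epoch nanoseconds."""
    params = {"query": query, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{loki_base.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"
```

For example, label_selector({"job": "firelens", "ecs_cluster": "ecs-firelens-cluster"}) narrows the Explore query above to our cluster.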
