Centralized Logging on Nomad

In this guide, I share how I implemented centralized logging on my Nomad cluster. One important thing to note about my configuration is that most of my workload runs in Docker containers managed by Nomad. I will share some of my configurations, but you may need to adapt them to your specific needs.

I still need to provide full working Nomad job specifications. I suppose I’ll put them on GitHub/GitLab once I’ve cleaned them up.

Loki is a logging solution produced by Grafana Labs. I have chosen to use Loki as the log interface for my home lab Nomad cluster due to its simplicity and my desire to build out a single pane of glass for all of my cluster’s metrics.

There are three main components in the Loki stack:

  1. A log forwarder (I use two, promtail and fluent-bit; we’ll discuss this in a bit.)
  2. Loki – the log aggregator
  3. Grafana – the graphical interface used to query logs.

There are a few things that I like about Loki. First, the architecture is fairly simple to reason about. In my implementation, I run fluent-bit as a log forwarder on each of the Nomad hosts, and Loki accepts the logs that fluent-bit forwards. A simple query in Grafana and I have logs from all of my containers available to me, with enough metadata to get an idea of what’s happening with the application.

One of my criticisms is that Grafana’s documentation is pretty scattered, and you’re left to figure out the missing pieces yourself.

Configuration

It took me a little bit of tweaking to figure out how to produce the output that works best for me. I will express most of my configuration through Nomad configurations. I like the idea of having all of the necessary bits to run an application in one file. I’ll walk through my Nomad Job Specification files. I’ll make them available in full. TBD.

Loki

My configuration isn’t necessarily what you’d want for a production environment, but it is fairly solid for a development environment.

task "loki" {
  driver = "docker"
  config {
    image = "grafana/loki:latest"
    port_map {
      http = 3100
    }
    args = ["-config.file=/local/loki-config.yaml"]
  }

Grafana offers a Docker container image with the necessary components called grafana/loki. The HTTP interface runs on port 3100. I then tell Loki to use a templated configuration file that we’ll review later on in the file.

resources {
  cpu    = 500 # 500 MHz
  memory = 256 # 256MB
  network {
    mbits = 10
    port "http" { static = 4444 }
  }
}

At this time, I have allocated meager resources for the single Loki service running on Nomad. I set a static port so I’m able to easily leverage internal networking. While this isn’t necessary, it makes integration into Grafana easier. I chose port 4444, but you are free to set any non-conflicting port that works for your setup. The next portion is the configuration.

template {
        data        = <<EOH
auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  max_transfer_retries: 0

schema_config:
  configs:
    - from: 2018-04-15
      store: boltdb
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 168h

storage_config:
  boltdb:
    directory: /tmp/loki/index

  filesystem:
    directory: /tmp/loki/chunks

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h

chunk_store_config:
  max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: false
  retention_period: 0s
EOH
        destination = "local/loki-config.yaml"
        env         = false
      }

The configuration is pretty much the example from the Docker Compose setup that Grafana provides for local development. It uses in-container storage mechanisms and has defaults set up for the ingester. Admittedly I haven’t played around too much with the configuration yet, but I’m excited to start digging into some of the tunables.

Future work

Some things I’d like to look into:

Now that we have a destination for logs, let’s connect it to Grafana so we can see logs come into the system.

Grafana

Grafana 7.1 comes with the Loki data source available by default, and Explore is enabled so you can start querying logs right away.

Setup is fairly straightforward. I use Consul and the static port we allocated in the Loki configuration section to connect over the private network. I really wanted to run this through my load balancer to have SSL termination, but the LB is public and I haven’t implemented authentication yet to protect Loki from the public Internet.

Now that we have a way to verify logs are flowing into Loki, we can start sending some logs into the system.
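Before wiring up a forwarder, a quick smoke test can confirm that Loki accepts writes. The sketch below is only an illustration, not part of my job files: it builds the JSON body that Loki’s /loki/api/v1/push endpoint expects and posts a single line. The URL assumes the Consul name and static port 4444 used in this guide, and the function names and the "smoke-test" label are made up for the example.

```python
import json
import time
import urllib.request

# Assumption: the Loki address used in this guide (Consul name, static port 4444).
LOKI_PUSH_URL = "http://loki.service.dc1.kwojo:4444/loki/api/v1/push"


def build_push_payload(line, labels, ts_ns=None):
    """Build the JSON body for Loki's push endpoint: one stream, one entry."""
    if ts_ns is None:
        ts_ns = time.time_ns()
    # Loki expects each timestamp as a string of nanoseconds since the epoch.
    return {"streams": [{"stream": dict(labels), "values": [[str(ts_ns), line]]}]}


def push_line(line, labels, url=LOKI_PUSH_URL):
    """POST a single log line to Loki and return the HTTP status code."""
    body = json.dumps(build_push_payload(line, labels)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Calling push_line("hello from a smoke test", {"job": "smoke-test"}) from a machine on the private network should return a 2xx status, and the line should then show up in Grafana under {job="smoke-test"}.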

Fluent-bit: Docker Logs

Admittedly this was a little tricky to figure out, and I am still tweaking my configuration, but I have something workable. I run Fluent Bit on each of my Nomad client nodes as a Nomad job.

Nomad Job Specification

This configuration requires knowledge of a couple of tidbits about how Fluent Bit works. Upstream Fluent Bit, which is part of the Fluentd project, has a plugin system that allows developers to write input or output plugins. Because Fluent Bit is a compiled executable written in C, loading an external plugin means pointing it at the plugin’s shared library object.

Grafana has written a Loki plugin for Fluent Bit and bundles it in their grafana/fluent-bit-plugin-loki Docker image.

In order to get Fluent Bit to read the templated configuration, you must pass the -c flag. To include the proper shared library, you must supply the -e flag.

You can see the “magic” in Grafana’s Dockerfile for Fluent Bit.

I elected to run this as a “system” job, meaning that Nomad will place it on each client in the cluster. That is what we’ll want later, when I tell each container to send its logs to <HOST IP>:24224.

job "fluent-bit" {
  type = "system"
  task "fluent-bit" {
    driver = "docker"
    config {
      image = "grafana/fluent-bit-plugin-loki:latest"
      port_map {
        fluentd = 24224
      }
      command = "/fluent-bit/bin/fluent-bit"
      args = ["-c", "/local/fluent-bit.conf", "-e", "/fluent-bit/bin/out_loki.so"]
    }
    ...

The actual configuration file is pretty terse. In this case, Fluent Bit is listening for logs on 0.0.0.0:24224 and will forward the logs onto Loki. I strip out “source” and “container_id” from the Docker log JSON payload. We create labels “job” and “hostname” and use them as the base Loki log structure. We then instruct Fluent Bit to yank the “container_name” field from the Docker log and inject it into the Loki log structure.

See the documentation for each Loki plugin option.

template {
        destination = "/local/fluent-bit.conf"
        data = <<EOH
[INPUT]
    Name        forward
    Listen      0.0.0.0
    Port        24224
[Output]
    Name loki
    Match *
    Url http://loki.service.dc1.kwojo:4444/loki/api/v1/push
    RemoveKeys source,container_id
    Labels {job="fluent-bit", hostname="{{env "attr.unique.hostname" }}"}
    LabelKeys container_name
    BatchWait 1
    BatchSize 1001024
    LineFormat json
    LogLevel info
EOH
      }
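
To make the effect of RemoveKeys, Labels, LabelKeys, and LineFormat more concrete, here is a rough Python imitation of what I understand the plugin to do with a single record. It is not the plugin’s code, just a sketch: the record values, the hostname, and the function name are hypothetical, though the field names match what Docker’s fluentd log driver sends.

```python
import json


def to_loki_entry(record, static_labels, label_keys, remove_keys):
    """Roughly mimic how the output options reshape one Docker log record."""
    # Labels: the static labels form the base label set.
    labels = dict(static_labels)
    # LabelKeys: lift these fields out of the record and use them as labels.
    for key in label_keys:
        if key in record:
            labels[key] = str(record[key])
    # RemoveKeys (plus the lifted label keys) are dropped from the log line.
    dropped = set(label_keys) | set(remove_keys)
    remaining = {k: v for k, v in record.items() if k not in dropped}
    # LineFormat json: whatever is left is serialized as the log line.
    return labels, json.dumps(remaining, sort_keys=True)


# Hypothetical record in the shape Docker's fluentd log driver emits.
record = {
    "container_id": "abc123",
    "container_name": "/wiki-js-example",
    "source": "stdout",
    "log": "server started",
}
labels, line = to_loki_entry(
    record,
    static_labels={"job": "fluent-bit", "hostname": "nomad-client-1"},
    label_keys=["container_name"],
    remove_keys=["source", "container_id"],
)
print(labels)
print(line)
```

In this sketch the labels end up as job, hostname, and container_name, and the log line keeps only the "log" field.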

For my current setup, I’ve provisioned pretty conservative resources while I gain more familiarity, test, and tweak. The most important part is to set a static port binding, in this case 24224.

resources {
  cpu    = 500 # 500 MHz
  memory = 256 # 256MB
  network {
    mode = "host"
    mbits = 10
    port  "fluentd"  { static = 24224 }
  }
}

Sending Docker Logs to Loki

Docker allows you to specify various log drivers. Nomad facilitates this with the logging stanza. For services that I want to log to Loki, I copy/paste the logging stanza below.

The magic is done through Nomad’s variable interpolation, namely ${attr.unique.network.ip-address}, which evaluates to the IP address of the Nomad client where the container is placed. Since we have Fluent Bit listening on all client hosts as a Nomad system job, we get logs from every container that carries this stanza.

task "wiki-js" {
  driver = "docker"
  config {
    image = "requarks/wiki:2"
    port_map {
      http = 3000
    }
    logging {
      type = "fluentd"
      config {
        fluentd-address = "${attr.unique.network.ip-address}:24224"
      }
    }
  }

Syslogs

TBD

Querying Logs

It takes a little while to get used to how Loki handles searching and filtering, but once I got the hang of it, it became really fast and powerful to gain insight into my workloads.

In the query bar, {job="fluent-bit"} will return all logs that Fluent Bit is sending into Loki. Due to our configuration, each log stream carries the job, hostname, and container_name labels.

Examples
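
As a small illustration, the sketch below holds two LogQL queries built on the labels from the Fluent Bit configuration and turns one of them into a URL for Loki’s /loki/api/v1/query_range HTTP endpoint, which is handy for checking results outside Grafana. The base URL assumes the Consul name and port used in this guide; the "wiki-js" name pattern and the helper function are assumptions for the example.

```python
import time
import urllib.parse

# Assumption: the Loki address used in this guide.
LOKI_BASE_URL = "http://loki.service.dc1.kwojo:4444"


def build_query_url(logql, minutes=15, limit=100, now_ns=None, base_url=LOKI_BASE_URL):
    """Build a query_range URL covering the last `minutes` minutes."""
    if now_ns is None:
        now_ns = time.time_ns()
    # Loki accepts start and end as nanosecond Unix timestamps.
    start_ns = now_ns - minutes * 60 * 1_000_000_000
    params = {
        "query": logql,
        "start": str(start_ns),
        "end": str(now_ns),
        "limit": str(limit),
    }
    return base_url + "/loki/api/v1/query_range?" + urllib.parse.urlencode(params)


# Every line forwarded by Fluent Bit.
all_logs = '{job="fluent-bit"}'
# Only lines containing "error" from containers whose name includes "wiki-js".
wiki_errors = '{job="fluent-bit", container_name=~".*wiki-js.*"} |= "error"'

print(build_query_url(wiki_errors))
```

The same two query strings can be pasted directly into the Grafana query bar.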

References
