checkmk

guide-by-example

Purpose

Monitoring of machines, containers, services, logs, ...

Monitoring in this case means gathering and showing information on how services, machines, or containers are running. This can be cpu, io, ram, disk use, network throughput, latency,... but also the number of http requests, errors, results of backups...

Overview

Good youtube overview.

checkmk started out as an extension of nagios and is mostly written in python.
An interesting fact is that there is no database where the data are stored, just RRD files for metrics and plaintext logs for everything else.

Editions

Docs

  • raw - 100% open source, unlimited use, but some features are missing or harder to set up. For example, no container monitoring and no push mode from agents.
  • cloud - full featured, with a better performing monitoring core (the micro core), but with a limit of 750 services

I am gonna go with cloud for now, as 750 sounds like enough for my use cases.

Files and directory structure

/home/
 └── ~/
     └── docker/
         └── checkmk/
             ├── 🗁 checkmk_data/
             ├── 🗋 docker-compose.yml
             └── 🗋 .env
  • checkmk_data/ - a directory where checkmk stores its persistent data
  • .env - a file containing environment variables for docker compose
  • docker-compose.yml - a docker compose file, telling docker how to run the containers

The two files must be provided.
The directory is created by docker compose on the first run.
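
A minimal sketch of preparing that layout, assuming the ~/docker/checkmk path from the tree above. The function name prepare_checkmk_dir is made up for illustration, and the two files are created empty, their content comes from the sections below.

```shell
# Hypothetical helper: create the checkmk directory with the two required files.
# checkmk_data/ is intentionally not created here,
# docker compose creates it on the first run.
prepare_checkmk_dir() {
  base="${1:-$HOME/docker/checkmk}"   # default path is an assumption taken from the tree
  mkdir -p "$base"
  touch "$base/docker-compose.yml" "$base/.env"
}
```

Calling prepare_checkmk_dir without an argument uses the default path, any other path can be passed as the first argument.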

docker-compose

A simple compose.
Of note is mounting ram as tmpfs into the container and setting a limit of 1024 max open files for a single process.

Note - the webgui port is only exposed, not published, since there's an expectation of using a reverse proxy and accessing the service by hostname, not by ip and port. Only port 8000, for agents that push, is published on the host.

Docs on ports used in cmk.

docker-compose.yml

services:
  checkmk:
    image: checkmk/check-mk-cloud
    container_name: checkmk
    hostname: checkmk
    restart: unless-stopped
    env_file: .env
    ulimits:
      nofile: 1024
    tmpfs:
      - /opt/omd/sites/dom/tmp:uid=1000,gid=1000   # path must contain the CMK_SITE_ID from .env
    volumes:
      - ./checkmk_data:/omd/sites
      - /etc/localtime:/etc/localtime:ro
    expose:
      - "5000"      # webgui
    ports:
      - 8000:8000   # agents who push

networks:
  default:
    name: $DOCKER_MY_NETWORK
    external: true

.env

# GENERAL
DOCKER_MY_NETWORK=caddy_net
TZ=Europe/Bratislava

# CMK
CMK_SITE_ID=dom
CMK_PASSWORD=WUx666yd0qCWh

All containers must be on the same network, which is named in the .env file.
If it does not exist yet: docker network create caddy_net
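
The network check can be sketched as a small shell function, so that it is safe to run repeatedly. The name ensure_network is hypothetical and not part of docker, only docker network inspect and docker network create are real commands.

```shell
# Hypothetical helper: create the external docker network only if it is missing.
ensure_network() {
  net="$1"
  if docker network inspect "$net" >/dev/null 2>&1; then
    echo "network $net already exists"
  else
    docker network create "$net"
  fi
}
```

With the .env above it would be called as ensure_network caddy_net.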

Reverse proxy

Caddy v2 is used, details here.

Caddyfile

cmk.{$MY_DOMAIN} {
  reverse_proxy checkmk:5000
}

First run

Agents

Push

Alerts

Logs

Update

Manual image update:

  • docker-compose pull
  • docker-compose up -d
  • docker image prune
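
The three steps can also be wrapped in one function, shown here only as a sketch. The name update_checkmk is made up, and the -f flag on prune is an addition that skips the confirmation prompt, drop it to keep the prompt.

```shell
# Hypothetical wrapper around the three manual update steps.
# Run it from the directory that holds docker-compose.yml.
# Each step runs only if the previous one succeeded.
update_checkmk() {
  docker-compose pull &&     # fetch the newer image
  docker-compose up -d &&    # recreate the container from it
  docker image prune -f      # remove dangling images without asking
}
```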

Backup and restore

Backup

Borg is used to make a daily snapshot of the entire directory.

Restore

  • stop the containers: docker-compose down
  • delete the entire checkmk directory
  • copy the checkmk directory back from the backup
  • start the containers: docker-compose up -d
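
The restore steps above can be sketched as a shell function. This is an assumption-heavy sketch, not a tested procedure: restore_checkmk is a made-up name, and it expects that a copy of the directory has already been extracted from the backup to some path, how that is done with borg depends on the repository setup.

```shell
# Hypothetical restore helper; both paths are assumptions, adjust them to your setup.
#   $1 - the checkmk directory, e.g. ~/docker/checkmk
#   $2 - a copy of that directory taken out of the backup
restore_checkmk() {
  target="$1"
  backup_copy="$2"
  # refuse to delete anything unless the backup copy looks usable
  if [ ! -f "$backup_copy/docker-compose.yml" ]; then
    echo "no usable backup at $backup_copy" >&2
    return 1
  fi
  (cd "$target" && docker-compose down) || return 1   # stop the containers
  rm -rf "$target"                                    # delete the current directory
  cp -a "$backup_copy" "$target"                      # copy the backup in its place
  (cd "$target" && docker-compose up -d)              # start the containers
}
```

Depending on who owns the files inside checkmk_data/, the delete and copy steps may need to run with elevated permissions.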