.. | ||
dashboards | ||
readme.md |
Prometheus+Grafana in docker
guide-by-example
Purpose
Monitoring of the host and the running cointaners.
Everything here is based on the magnificent
stefanprodan/dockprom.
So maybe just go get that.
Great youtube overview of Prometheus.
Here's my veeam-prometheus-grafana
how to setup pushgateway and send to it info on done backups
and visualize history of that in grafana.
Also soon to be added, Loki for logs,
to get that ntfy alarm when something happens in a log in a docker container.
Prometheus is an open source system for monitoring and alerting,
written in golang.
It periodicly collects metrics from configured targets,
exposes collected metrics for visualization, and can trigger alerts.
Prometheus is relatively young project, it is a pull type monitoring
and consists of several components.
- Prometheus Server is the core of the system, responsible for
- pulling new metrics
- storing the metrics in a database and evaluating them
- making metrics available through PromQL API
- Targets - machines, services, applications that are monitored.
These need to have an exporter.- exporter - a script or a service that gathers metrics on the target, converts them to prometheus server format, and exposes them at an endpoint so they can be pulled
- AlertManager - responsible for handling alerts from Prometheus Server,
and sending notifications through email, slack, pushover,..
In this setup ntfy webhook will be used.
Grafana comes with own alerts, but grafana kinda feels... b-tier - pushgateway - allows push type of monitoring.
Should not be overused as it goes against the pull philosophy of prometheus.
Most commonly it is used to collect data from batch jobs, or from services
that have short execution time. Like a backup script.
Here's my use of it to monitor veeam backup servers. - Grafana - for web UI visualization of the collected metrics
Files and directory structure
/home/
└── ~/
└── docker/
└── prometheus/
├─── alertmanager/
├─── grafana/
├─── grafana-data/
├─── prometheus-data/
├── docker-compose.yml
├── .env
└── prometheus.yml
alertmanager/
- ...grafana/
- a directory containing grafanas configs and dashboardsgrafana-data/
- a directory where grafana stores its dataprometheus-data/
- a directory where prometheus stores its database and data.env
- a file containing environment variables for docker composedocker-compose.yml
- a docker compose file, telling docker how to run the containersprometheus.yml
- a configuration file for prometheus
The three files must be provided.
The directories are created by docker compose on the first run.
docker-compose
- Prometheus - prometheus server, pulling, storing, evaluating metrics
- Grafana - web UI visualization of the collected metrics in nice dashboards
- NodeExporter - an exporter for linux machines, in this case gathering the metrics of the linux machine runnig docker, like uptime, cpu load, memory use, network bandwidth use, disk space,...
- cAdvisor - exporter for gathering docker containers metrics, showing cpu, memory, network use of each container
- alertmanager - guess what that one do
docker-compose.yml
services:
# MONITORING SYSTEM AND THE METRICS DATABASE
prometheus:
image: prom/prometheus:v2.42.0
container_name: prometheus
hostname: prometheus
restart: unless-stopped
user: root
depends_on:
- cadvisor
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.path=/prometheus'
- '--web.console.libraries=/etc/prometheus/console_libraries'
- '--web.console.templates=/etc/prometheus/consoles'
- '--storage.tsdb.retention.time=200h'
- '--web.enable-lifecycle'
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
- ./prometheus_data:/prometheus
ports:
- 9090:9090
labels:
org.label-schema.group: "monitoring"
# WEB BASED UI VISUALISATION OF THE METRICS
grafana:
image: grafana/grafana:9.3.6
container_name: grafana
hostname: grafana
restart: unless-stopped
env_file: .env
user: root
volumes:
- ./grafana_data:/var/lib/grafana
- ./grafana/provisioning/dashboards:/etc/grafana/provisioning/dashboards
- ./grafana/provisioning/datasources:/etc/grafana/provisioning/datasources
expose:
- 3000
labels:
org.label-schema.group: "monitoring"
# HOST MACHINE METRICS EXPORTER
nodeexporter:
image: prom/node-exporter:v1.5.0
container_name: nodeexporter
hostname: nodeexporter
restart: unless-stopped
command:
- '--path.procfs=/host/proc'
- '--path.rootfs=/rootfs'
- '--path.sysfs=/host/sys'
- '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /:/rootfs:ro
expose:
- 9100
labels:
org.label-schema.group: "monitoring"
# DOCKER CONTAINERS METRICS EXPORTER
cadvisor:
image: gcr.io/cadvisor/cadvisor:v0.47.1
container_name: cadvisor
hostname: cadvisor
restart: unless-stopped
privileged: true
devices:
- /dev/kmsg:/dev/kmsg
volumes:
- /:/rootfs:ro
- /var/run:/var/run:ro
- /sys:/sys:ro
- /var/lib/docker:/var/lib/docker:ro
- /cgroup:/cgroup:ro #doesn't work on MacOS only for Linux
expose:
- 3000
labels:
org.label-schema.group: "monitoring"
# NOTIFICATIONS MANAGMENT
alertmanager:
image: prom/alertmanager:v0.25.0
container_name: alertmanager
hostname: alertmanager
restart: unless-stopped
volumes:
- ./alertmanager:/etc/alertmanager
command:
- '--config.file=/etc/alertmanager/config.yml'
- '--storage.path=/alertmanager'
expose:
- 9093
labels:
org.label-schema.group: "monitoring"
networks:
default:
name: $DOCKER_MY_NETWORK
external: true
.env
# GENERAL
MY_DOMAIN=example.com
DOCKER_MY_NETWORK=caddy_net
TZ=Europe/Bratislava
# GRAFANA
GF_SECURITY_ADMIN_USER=admin
GF_SECURITY_ADMIN_PASSWORD=admin
GF_USERS_ALLOW_SIGN_UP=false
All containers must be on the same network.
Which is named in the .env
file.
If one does not exist yet: docker network create caddy_net
prometheus.yml
A config file for prometheus, bind mounted in to prometheus container.
Contains the bare minimum setup of targets from where metrics are to be pulled.
Stefanprodan gives a custom shorter scrape intervals, but I feel thats not really necessary.
prometheus.yml
global:
scrape_interval: 15s
evaluation_interval: 15s
# A scrape configuration containing exactly one endpoint to scrape.
scrape_configs:
- job_name: 'nodeexporter'
static_configs:
- targets: ['nodeexporter:9100']
- job_name: 'cadvisor'
static_configs:
- targets: ['cadvisor:8080']
- job_name: 'prometheus'
static_configs:
- targets: ['localhost:9090']
Reverse proxy
Caddy v2 is used, details
here.
Caddyfile
graf.{$MY_DOMAIN} {
reverse_proxy grafana:3000
}
prom.{$MY_DOMAIN} {
reverse_proxy prometheus:9090
}
push.{$MY_DOMAIN} {
reverse_proxy pushgateway:9091
}
First run and Grafana configuration
- login admin/admin to
graf.example.com
, change the password - add Prometheus as a Data source in configuration
set URL tohttp://prometheus:9090
- import dashboards from json files in this repo
These dashboards are the preconfigured ones from
stefanprodan/dockprom
with few changes.
docker_host.json
did not show free disk space, it needed fstype
changed from
aufs
to ext4
. Also a fix
for host network monitoring not showing traffick. And in all of them
the time interval is set to show last 1h instead of last 15m
- docker_host.json - dashboard showing linux host machine metrics
- docker_containers.json - dashboard showing docker containers metrics,
except the ones labeled as
monitoring
in the compose file - monitoring_services.json - dashboar showing docker containers metrics
of containers that are labeled
monitoring
Update
Manual image update:
docker-compose pull
docker-compose up -d
docker image prune
Backup and restore
Backup
Using borg that makes daily snapshot of the entire directory.
Restore
- down the prometheus containers
docker-compose down
- delete the entire prometheus directory
- from the backup copy back the prometheus directory
- start the containers
docker-compose up -d