# Prometheus+Grafana in docker
###### guide-by-example
![logo](https://i.imgur.com/e03aF8d.png)
# Purpose
Monitoring of the host and the running containers.
* [Official site](https://prometheus.io/)
* [Github](https://github.com/prometheus)
* [DockerHub](https://hub.docker.com/r/prom/prometheus/)
[Good overview](https://youtu.be/h4Sl21AKiDg) of Prometheus.</br>
Everything here is based on the magnificent
[stefanprodan/dockprom](https://github.com/stefanprodan/dockprom),</br>
so maybe just go get that.

---
Prometheus is an open source application used for monitoring and alerting.
It collects metrics from configured targets at given intervals,
exposes the collected metrics for visualization, evaluates rule expressions,
and can trigger alerts if some condition is observed to be true.
Prometheus is a relatively new project. It uses **pull type** monitoring
and consists of several components.
* **Prometheus Server** is the core of the system, responsible for
  * pulling new metrics
  * storing the metrics in a database and evaluating them
  * making metrics available through the PromQL API
* **Targets** - machines, services, applications that are monitored.</br>
  These need to have an **exporter**.
  * **exporter** - a script or a service that gathers metrics on the target,
    converts them to the format the prometheus server expects,
    and exposes them at an endpoint so they can be pulled
* **AlertManager** - responsible for handling alerts from Prometheus Server,
  and sending notifications through email, slack, pushover,..
* **pushgateway** - allows push type of monitoring.
  Should not be overused as it goes against the pull philosophy of prometheus.
  Most commonly it is used to collect data from batch jobs, or from services
  that have short execution time, like a backup script.
* **Grafana** - for web UI visualization of the collected metrics

[glossary](https://prometheus.io/docs/introduction/glossary/)
![prometheus components](https://i.imgur.com/AxJCg8C.png)
# Files and directory structure
```
/home/
└── ~/
    └── docker/
        └── prometheus/
            ├── grafana/
            │   └── provisioning/
            │       ├── dashboards/
            │       │   ├── dashboard.yml
            │       │   ├── docker_host.json
            │       │   ├── docker_containers.json
            │       │   └── monitor_services.json
            │       │
            │       └── datasources/
            │           └── datasource.yml
            ├── grafana-data/
            ├── prometheus-data/
            ├── .env
            ├── docker-compose.yml
            └── prometheus.yml
```

* `grafana/` - a directory containing grafana's configs and dashboards
* `grafana-data/` - a directory where grafana stores its data
* `prometheus-data/` - a directory where prometheus stores its database and data
* `.env` - a file containing environment variables for docker compose
* `docker-compose.yml` - a docker compose file, telling docker how to run the containers
* `prometheus.yml` - a configuration file for prometheus

All files must be provided,</br>
as well as the `grafana` directory with its subdirectories and files.

The directories `grafana-data` and `prometheus-data` are created
by docker compose on the first run.

# docker-compose

Four containers to spin up.</br>
While [stefanprodan/dockprom](https://github.com/stefanprodan/dockprom)
also has alertmanager and pushgateway, this is a simpler setup for now.</br>
Just want pretty graphs.

* **Prometheus** - prometheus server, pulling, storing, evaluating metrics
* **Grafana** - web UI visualization of the collected metrics
  in nice dashboards
* **NodeExporter** - an exporter for linux machines,
  in this case gathering the metrics of the linux machine running docker,
  like uptime, cpu load, memory use, network bandwidth use, disk space,...
* **cAdvisor** - an exporter gathering docker **containers** metrics,
  showing cpu, memory, network use of each container

`docker-compose.yml`
```yml
version: '3'

services:

  # MONITORING SYSTEM AND THE METRICS DATABASE
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: unless-stopped
    user: root
    depends_on:
      - cadvisor
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=200h'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./prometheus-data:/prometheus
    labels:
      org.label-schema.group: "monitoring"

  # WEB BASED UI VISUALISATION OF THE METRICS
  grafana:
    image: grafana/grafana
    container_name: grafana
    hostname: grafana
    restart: unless-stopped
    user: root
    environment:
      - GF_SECURITY_ADMIN_USER
      - GF_SECURITY_ADMIN_PASSWORD
      - GF_USERS_ALLOW_SIGN_UP
    volumes:
      - ./grafana-data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    labels:
      org.label-schema.group: "monitoring"

  # HOST METRICS COLLECTOR
  nodeexporter:
    image: prom/node-exporter
    container_name: nodeexporter
    hostname: nodeexporter
    restart: unless-stopped
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)'
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    labels:
      org.label-schema.group: "monitoring"

  # DOCKER CONTAINERS METRICS COLLECTOR
  cadvisor:
    image: google/cadvisor
    container_name: cadvisor
    hostname: cadvisor
    restart: unless-stopped
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker:/var/lib/docker:ro
      - /cgroup:/cgroup:ro
    labels:
      org.label-schema.group: "monitoring"

networks:
  default:
    external:
      name: $DOCKER_MY_NETWORK
```

`.env`
```bash
# GENERAL
MY_DOMAIN=example.com
DOCKER_MY_NETWORK=caddy_net
TZ=Europe/Bratislava

# GRAFANA
GF_SECURITY_ADMIN_USER=admin
GF_SECURITY_ADMIN_PASSWORD=admin
GF_USERS_ALLOW_SIGN_UP=false
```

**All containers must be on the same network**,</br>
which is named in the `.env` file.</br>
If it does not exist yet: `docker network create caddy_net`

# Prometheus configuration
#### prometheus.yml
* /prometheus/**prometheus.yml**
[Official documentation.](https://prometheus.io/docs/prometheus/latest/configuration/configuration/)

A config file for prometheus, bind mounted into the prometheus container.</br>
Contains the bare minimum setup of targets from which metrics are to be pulled.

`prometheus.yml`
```yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Scrape configurations, one job per endpoint to scrape.
scrape_configs:
  - job_name: 'nodeexporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['nodeexporter:9100']

  - job_name: 'cadvisor'
    scrape_interval: 5s
    static_configs:
      - targets: ['cadvisor:8080']

  - job_name: 'prometheus'
    scrape_interval: 10s
    static_configs:
      - targets: ['localhost:9090']
```
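
More targets are added the same way, as extra entries under `scrape_configs`.
A minimal sketch, assuming a second machine that already runs node exporter
on its default port; the job name `nodeexporter-remote` and the address
`192.168.1.20` are made up for illustration and are not part of this setup.

```yml
scrape_configs:
  # ...the three existing jobs stay as they are...

  # hypothetical extra job, the name and the address are placeholders
  - job_name: 'nodeexporter-remote'
    scrape_interval: 5s
    static_configs:
      - targets: ['192.168.1.20:9100']
```

After editing the file, restart the prometheus container so the change is picked up.
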
# Grafana configuration
Some of the grafana config files could be omitted
and the info passed on the first run, or through settings.
But setting it through the GUI won't generate these files, which hinders backup
and ease of migration.
#### datasource.yml
* /prometheus/grafana/provisioning/datasources/**datasource.yml**
[Official documentation.](https://grafana.com/docs/grafana/latest/administration/provisioning/#datasources)

Grafana's datasources config file, telling it where to get metrics from.</br>
In this case it points at the prometheus container.

`datasource.yml`
```yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    orgId: 1
    url: http://prometheus:9090
    basicAuth: false
    isDefault: true
    editable: false
```
#### dashboard.yml
* /prometheus/grafana/provisioning/dashboards/**dashboard.yml**
[Official documentation](https://grafana.com/docs/grafana/latest/administration/provisioning/#dashboards)

Config file telling grafana from where to load dashboards.

`dashboard.yml`
```yml
apiVersion: 1

providers:
  - name: 'Prometheus'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    editable: false
    allowUiUpdates: false
    options:
      path: /etc/grafana/provisioning/dashboards
```
#### \<dashboards>.json
* /prometheus/grafana/provisioning/dashboards/**\<dashboards>.json**
[Official documentation.](https://grafana.com/docs/grafana/latest/reference/dashboard/)

The dashboard files are in
[the dashboards](https://github.com/DoTheEvo/selfhosted-apps-docker/tree/master/prometheus_grafana/dashboards)
directory of this repository.

Preconfigured dashboards from
[stefanprodan/dockprom](https://github.com/stefanprodan/dockprom).</br>
Mostly unchanged, except for the default time range shown,
changed from 15min to 1hour,
and [a fix](https://github.com/stefanprodan/dockprom/issues/18#issuecomment-487023049)
for host network monitoring not showing traffic.

* **docker_host.json** - dashboard showing linux host metrics
* **docker_containers.json** - dashboard showing docker containers metrics,
  except the ones labeled as `monitoring` in the compose file
* **monitor_services.json** - dashboard showing docker containers metrics
  of containers that are labeled `monitoring`, which are this repo's containers.
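
Which of the two container dashboards a container lands in follows
the `org.label-schema.group` label set in the compose file.
A hedged sketch of what that would look like for one more container
that should be listed among the monitoring services; the service name
`myexporter` and its image are placeholders, not part of this setup.

```yml
services:
  # hypothetical extra service, the name and the image are placeholders
  myexporter:
    image: example/myexporter
    container_name: myexporter
    restart: unless-stopped
    labels:
      # the same label the four containers of this setup carry
      org.label-schema.group: "monitoring"
```

Containers without this label should show up in the docker containers dashboard instead.
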
# Reverse proxy
Caddy v2 is used, details
[here](https://github.com/DoTheEvo/selfhosted-apps-docker/tree/master/caddy_v2).</br>

The setup is accessed through grafana.
But occasionally there might be a need to check on prometheus directly,
which will be available on \<docker-host-ip>:9090.</br>
For that to work, Caddy will also need port 9090 published.

`Caddyfile`
```
grafana.{$MY_DOMAIN} {
    reverse_proxy grafana:3000
}

:9090 {
    reverse_proxy prometheus:9090
}
```

*Extra info:* `:9090` has no hostname in it, so Caddy answers on port 9090
for any hostname or IP address the request arrives on.
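
Publishing the port happens in Caddy's own compose file, not in this one.
A minimal sketch, assuming the Caddy service is named `caddy`
and already publishes ports 80 and 443; only the last entry is the addition.

```yml
services:
  caddy:
    # ...the rest of the caddy service definition is unchanged...
    ports:
      - "80:80"
      - "443:443"
      - "9090:9090"   # makes <docker-host-ip>:9090 reach the :9090 site block
```

Recreate the Caddy container afterwards so the new port mapping takes effect.
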

---
![interface-pic](https://i.imgur.com/wzwgBkp.png)
# Update
[Watchtower](https://github.com/DoTheEvo/selfhosted-apps-docker/tree/master/watchtower)
updates the image automatically.

Manual image update:

- `docker-compose pull`</br>
- `docker-compose up -d`</br>
- `docker image prune`
# Backup and restore
#### Backup
Using [borg](https://github.com/DoTheEvo/selfhosted-apps-docker/tree/master/borg_backup)
that makes a daily snapshot of the entire directory.
#### Restore
* stop the prometheus containers `docker-compose down`</br>
* delete the entire prometheus directory</br>
* copy the prometheus directory back from the backup</br>
* start the containers `docker-compose up -d`