# Prometheus+Grafana in docker

###### guide-by-example

![logo](https://i.imgur.com/q41QfyI.png)

WORK IN PROGRESS<br>
Loki and caddy monitoring parts are not finished yet

# Purpose

Monitoring of the host and the running containers.

* [Official Prometheus](https://prometheus.io/)
* [Official Grafana](https://grafana.com/)
* [Official Loki](https://grafana.com/oss/loki/)

Monitoring in this case means gathering and showing information on how services,
machines, or containers are running. This can be cpu, io, ram, or disk use,
the number of http requests or errors, or the results of backups.<br>
Prometheus deals with metrics. Loki deals with logs. Grafana is there to show
the data on a dashboard.

A lot of the prometheus stuff here is based on the magnificent
[stefanprodan/dockprom.](https://github.com/stefanprodan/dockprom)

# Chapters
* **[Core prometheus+grafana](#Overview)** - nice dashboards with metrics of docker host and containers
* **[Pushgateway](#Pushgateway)** - push data to prometheus from anywhere
* **[Alertmanager](#Alertmanager)** - setting alerts and getting notifications
* **[Loki](#Loki)** - all of the above but for log files
* **[Caddy monitoring](#Caddy_monitoring)** - monitoring a reverse proxy

![dashboards_pic](https://i.imgur.com/ac9Qj1F.png)

# Overview
[Good youtube overview](https://youtu.be/h4Sl21AKiDg) of Prometheus.<br>

Prometheus is an open source system for monitoring and alerting,
written in golang.<br>
It periodically collects metrics from configured targets,
makes these metrics available for visualization, and can trigger alerts.<br>
Prometheus is a relatively young project and uses **pull type** monitoring.
[Glossary.](https://prometheus.io/docs/introduction/glossary/)

* **Prometheus Server** is the core of the system, responsible for
  * pulling new metrics
  * storing the metrics in a database and evaluating them
  * making metrics available through PromQL API
* **Targets** - machines, services, applications that are monitored.<br>
  These need to have an **exporter**.
  * **exporter** - a script or a service that gathers metrics on the target,
    converts them to prometheus server format,
    and exposes them at an endpoint so they can be pulled
* **Alertmanager** - responsible for handling alerts from Prometheus Server,
  and sending notifications through email, slack, pushover,..
  **In this setup [ntfy](https://github.com/DoTheEvo/selfhosted-apps-docker/tree/master/gotify-ntfy-signal)
  webhook will be used.**
* **pushgateway** - allows push type of monitoring, meaning a machine anywhere
  in the world can push data in to your prometheus. Should not be overused
  as it goes against the pull philosophy of prometheus.
* **Grafana** - for web UI visualization of the collected metrics
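
To make the exporter idea more concrete, here is a minimal sketch of the kind of text an exporter exposes at its endpoint. The function name `render_metric` and the metric `demo_disk_used_percent` are made up for illustration. Real exporters such as node exporter serve many metrics in this same plain text format.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of any real exporter.
# Prints one gauge in the Prometheus text exposition format:
# a HELP line, a TYPE line, and the sample itself.
render_metric() {
    local name="$1" help="$2" value="$3"
    printf '# HELP %s %s\n' "$name" "$help"
    printf '# TYPE %s gauge\n' "$name"
    printf '%s %s\n' "$name" "$value"
}

# Example:
# render_metric demo_disk_used_percent "Used disk space in percent" 42
```

Prometheus pulls this text on every scrape and stores each sample with the time of the scrape.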

![prometheus components](https://i.imgur.com/AxJCg8C.png)

# Files and directory structure

```
/home/
└── ~/
    └── docker/
        └── prometheus/
            ├── 🗁 grafana_data/
            ├── 🗁 prometheus_data/
            ├── 🗋 docker-compose.yml
            ├── 🗋 .env
            └── 🗋 prometheus.yml
```

* `grafana_data/` - a directory where grafana stores its data
* `prometheus_data/` - a directory where prometheus stores its database and data
* `.env` - a file containing environment variables for docker compose
* `docker-compose.yml` - a docker compose file, telling docker how to run the containers
* `prometheus.yml` - a configuration file for prometheus

The three files must be provided.<br>
The directories are created by docker compose on the first run.
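
A small sketch of preparing that layout from a shell. The function name `scaffold_stack_dir` is an assumption for illustration, and the files it creates are empty and still need the content shown in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical helper: creates the stack directory and the three files
# that must exist before the first run. Existing files are left untouched,
# since touch only creates what is missing.
scaffold_stack_dir() {
    local dir="${1:-$HOME/docker/prometheus}"
    mkdir -p "$dir" || return 1
    touch "$dir/docker-compose.yml" "$dir/.env" "$dir/prometheus.yml"
}

# Example:
# scaffold_stack_dir "$HOME/docker/prometheus"
```

Having `prometheus.yml` in place as a file before the first start matters, because a missing bind mount source is typically created by docker as a directory.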
# docker-compose

* **Prometheus** - The official image is used, with a few extra command line
  arguments passing configuration.
  Of note is the 240 hours (10 days) retention policy.
* **Grafana** - The official image is used, with a bind mounted directory
  for persistent data storage. The user is set to root, as it solves issues
  I am too lazy to investigate.
* **NodeExporter** - An exporter for linux machines,
  in this case gathering the metrics of the linux machine running docker,
  like uptime, cpu load, memory use, network bandwidth use, disk space,...<br>
  It also bind mounts some system directories to have access to the required info.
* **cAdvisor** - An exporter for gathering docker **containers** metrics,
  showing cpu, memory, network use of each container.<br>
  Runs in `privileged` mode and has some bind mounts of system directories
  to have access to the required info.

*Note* - ports are only set with `expose`, since the expectation is that a reverse proxy
is used and the services are accessed by hostname, not by ip and port.

`docker-compose.yml`
```yml
services:

  # MONITORING SYSTEM AND THE METRICS DATABASE
  prometheus:
    image: prom/prometheus:v2.42.0
    container_name: prometheus
    hostname: prometheus
    user: root
    restart: unless-stopped
    depends_on:
      - cadvisor
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--storage.tsdb.retention.time=240h'
      - '--web.enable-lifecycle'
    volumes:
      - ./prometheus_data:/prometheus
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    expose:
      - "9090"
    labels:
      org.label-schema.group: "monitoring"

  # WEB BASED UI VISUALISATION OF METRICS
  grafana:
    image: grafana/grafana:9.4.3
    container_name: grafana
    hostname: grafana
    user: root
    restart: unless-stopped
    env_file: .env
    volumes:
      - ./grafana_data:/var/lib/grafana
    expose:
      - "3000"
    labels:
      org.label-schema.group: "monitoring"

  # HOST LINUX MACHINE METRICS EXPORTER
  nodeexporter:
    image: prom/node-exporter:v1.5.0
    container_name: nodeexporter
    hostname: nodeexporter
    restart: unless-stopped
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    expose:
      - "9100"
    labels:
      org.label-schema.group: "monitoring"

  # DOCKER CONTAINERS METRICS EXPORTER
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.47.1
    container_name: cadvisor
    hostname: cadvisor
    restart: unless-stopped
    privileged: true
    devices:
      - /dev/kmsg:/dev/kmsg
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker:/var/lib/docker:ro
      - /cgroup:/cgroup:ro #doesn't work on MacOS only for Linux
    expose:
      # cAdvisor listens on 8080, the port scraped in prometheus.yml
      - "8080"
    labels:
      org.label-schema.group: "monitoring"

networks:
  default:
    name: $DOCKER_MY_NETWORK
    external: true
```

`.env`
```bash
# GENERAL
DOCKER_MY_NETWORK=caddy_net
TZ=Europe/Bratislava

# GRAFANA
GF_SECURITY_ADMIN_USER=admin
GF_SECURITY_ADMIN_PASSWORD=admin
GF_USERS_ALLOW_SIGN_UP=false

# GRAFANA EMAIL
GF_SMTP_ENABLED=true
GF_SMTP_HOST=smtp-relay.sendinblue.com:587
GF_SMTP_USER=example@gmail.com
GF_SMTP_PASSWORD=xzu0dfFhn3eqa
```

**All containers must be on the same network**,
which is named in the `.env` file.<br>
If one does not exist yet: `docker network create caddy_net`

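
A minimal sketch of making the network step safe to repeat, so it can sit in a setup script. The function name `ensure_docker_network` is an assumption for illustration. It relies only on `docker network inspect` failing when the network does not exist.

```bash
#!/usr/bin/env bash
# Hypothetical helper: creates the docker network only if it is missing.
ensure_docker_network() {
    local name="${1:-caddy_net}"
    if docker network inspect "$name" >/dev/null 2>&1; then
        echo "network $name already exists"
    else
        docker network create "$name" >/dev/null && echo "network $name created"
    fi
}

# Example:
# ensure_docker_network caddy_net
```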
## prometheus.yml

[Official documentation.](https://prometheus.io/docs/prometheus/latest/configuration/configuration/)

Contains the bare minimum setup of targets from where metrics are to be pulled.

`prometheus.yml`
```yml
global:
  scrape_interval:     15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'nodeexporter'
    static_configs:
      - targets: ['nodeexporter:9100']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
```

## Reverse proxy

Caddy v2 is used, details
[here](https://github.com/DoTheEvo/selfhosted-apps-docker/tree/master/caddy_v2).<br>

`Caddyfile`
```php
graf.{$MY_DOMAIN} {
  reverse_proxy grafana:3000
}

prom.{$MY_DOMAIN} {
  reverse_proxy prometheus:9090
}
```

## First run and Grafana configuration

* login admin/admin to `graf.example.com`, change the password
* add Prometheus as a Data source in configuration<br>
  set URL to `http://prometheus:9090`<br>
* import dashboards from [json files in this repo](dashboards/)<br>

These dashboards are the preconfigured ones from
[stefanprodan/dockprom](https://github.com/stefanprodan/dockprom)
with a few changes.<br>
`docker_host.json` did not show free disk space for me, had to change `fstype`
from `aufs` to `ext4`.
Also included is [a fix](https://github.com/stefanprodan/dockprom/issues/18#issuecomment-487023049)
for host network monitoring not showing traffic. In all of them
the default time interval is set to 1h instead of 15m.

* **docker_host.json** - dashboard showing linux host machine metrics
* **docker_containers.json** - dashboard showing docker containers metrics,
  except the ones labeled as `monitoring` in the compose file
* **monitoring_services.json** - dashboard showing docker containers metrics
  of containers that are labeled `monitoring`

![interface-pic](https://i.imgur.com/wzwgBkp.png)

---
---

# Pushgateway

Gives freedom to push information in to prometheus from anywhere.

## The setup

To add pushgateway functionality to the current stack:

* New container `pushgateway` added to the compose file.

  <details>
  <summary>docker-compose.yml</summary>

  ```yml
  services:

    # PUSHGATEWAY FOR PROMETHEUS
    pushgateway:
      image: prom/pushgateway:v1.5.1
      container_name: pushgateway
      hostname: pushgateway
      restart: unless-stopped
      command:
        - '--web.enable-admin-api'
      expose:
        - "9091"

  networks:
    default:
      name: $DOCKER_MY_NETWORK
      external: true
  ```

  </details>

* Adding pushgateway to the Caddyfile of the reverse proxy so that it can be reached at `https://push.example.com`<br>

  <details>
  <summary>Caddyfile</summary>

  ```php
  push.{$MY_DOMAIN} {
      reverse_proxy pushgateway:9091
  }
  ```

  </details>

* Adding pushgateway's scrape point to `prometheus.yml`<br>

  <details>
  <summary>prometheus.yml</summary>

  ```yml
  global:
    scrape_interval:     15s
    evaluation_interval: 15s

  scrape_configs:
    - job_name: 'pushgateway-scrape'
      honor_labels: true
      static_configs:
        - targets: ['pushgateway:9091']
  ```

  </details>

## The basics

![veeam-dash](https://i.imgur.com/TOuv9bM.png)

To **test pushing** some metric, execute in linux:<br>
`echo "some_metric 3.14" | curl --data-binary @- https://push.example.com/metrics/job/blabla/instance/whatever`

The **labels** of the pushed metric are set in the path.<br>
Label `job` is required, but after that it's whatever you want,
though use of the `instance` label is customary.<br>
Now in grafana, in the **Explore** section, you should see some results
when querying for `some_metric`.

The metrics sit on the pushgateway **forever**, unless deleted or the container
shuts down. Prometheus does not remove the metrics from it after scraping.
It keeps scraping the pushgateway and stores the value found there with the time of
scraping.

To wipe the pushgateway clean:<br>
`curl -X PUT https://push.example.com/api/v1/admin/wipe`
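
The push and cleanup calls can be wrapped in a few shell functions. This is only a sketch: the function names and the `PUSHGATEWAY_URL` variable are assumptions for illustration, and label values are used as-is, so they must not contain slashes. `delete_group` uses the pushgateway's `DELETE` method on a job/instance path, which removes just that group instead of wiping everything.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around the curl calls shown above.
PUSHGATEWAY_URL="${PUSHGATEWAY_URL:-https://push.example.com}"

# Builds the URL that carries the labels: /metrics/job/<job>/instance/<instance>
pushgateway_path() {
    local job="$1" instance="$2"
    printf '%s/metrics/job/%s/instance/%s\n' "$PUSHGATEWAY_URL" "$job" "$instance"
}

# Pushes a single metric to the given job/instance group.
push_metric() {
    local job="$1" instance="$2" name="$3" value="$4"
    printf '%s %s\n' "$name" "$value" |
        curl --silent --fail --data-binary @- "$(pushgateway_path "$job" "$instance")"
}

# Deletes all metrics of one job/instance group.
delete_group() {
    curl --silent --fail -X DELETE "$(pushgateway_path "$1" "$2")"
}

# Example:
# push_metric blabla whatever some_metric 3.14
# delete_group blabla whatever
```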

More on pushgateway setup, with real world use to monitor backups,
along with pushing metrics from windows in powershell -
[**Veeam Prometheus Grafana**](https://github.com/DoTheEvo/veeam-prometheus-grafana)<br>

![veeam-dash](https://i.imgur.com/dUyzuyl.png)

---
---

# Alertmanager

To send a notification about some metric breaching some preset condition.<br>
Notification channels set here will be email and
[ntfy](https://github.com/DoTheEvo/selfhosted-apps-docker/tree/master/gotify-ntfy-signal)

![alert](https://i.imgur.com/b4hchSu.png)

## The setup

To add alertmanager to the current stack:

* New file - `alertmanager.yml` will be bind mounted in alertmanager container.<br>
  This file contains configuration on how and where to deliver alerts.<br>

  <details>
  <summary>alertmanager.yml</summary>

  ```yml
  route:
    receiver: 'email'

  receivers:
    - name: 'ntfy'
      webhook_configs:
        - url: 'https://ntfy.example.com/alertmanager'
          send_resolved: true

    - name: 'email'
      email_configs:
        - to: 'whoever@example.com'
          from: 'alertmanager@example.com'
          smarthost: smtp-relay.sendinblue.com:587
          auth_username: '<registration_email@gmail.com>'
          auth_identity: '<registration_email@gmail.com>'
          auth_password: '<long ass generated SMTP key>'
  ```

  </details>

* New file - `alert.rules` will be mounted in to prometheus container<br>
  This file defines which value of some metric becomes an alert event.

  <details>
  <summary>alert.rules</summary>

  ```yml
  groups:
    - name: host
      rules:
        - alert: DiskSpaceLow
          expr: sum(node_filesystem_free_bytes{fstype="ext4"}) > 19
          for: 10s
          labels:
            severity: critical
          annotations:
            description: "Diskspace is low!"
  ```

  </details>

* Changed `prometheus.yml`. Added an `alerting` section that points to the alertmanager
  container, and a `rule_files` section with the path to the rules file.

  <details>
  <summary>prometheus.yml</summary>

  ```yml
  global:
    scrape_interval:     15s
    evaluation_interval: 15s

  scrape_configs:
    - job_name: 'nodeexporter'
      static_configs:
        - targets: ['nodeexporter:9100']

    - job_name: 'cadvisor'
      static_configs:
        - targets: ['cadvisor:8080']

    - job_name: 'prometheus'
      static_configs:
        - targets: ['localhost:9090']

  alerting:
    alertmanagers:
      - scheme: http
        static_configs:
          - targets:
              - 'alertmanager:9093'

  rule_files:
    - '/etc/prometheus/rules/alert.rules'
  ```

  </details>

* New container - `alertmanager` added to the compose file, and the prometheus
  container has a bind mount of the rules file added.

  <details>
  <summary>docker-compose.yml</summary>

  ```yml
  services:

    # MONITORING SYSTEM AND THE METRICS DATABASE
    prometheus:
      image: prom/prometheus:v2.42.0
      container_name: prometheus
      hostname: prometheus
      restart: unless-stopped
      user: root
      depends_on:
        - cadvisor
      command:
        - '--config.file=/etc/prometheus/prometheus.yml'
        - '--storage.tsdb.path=/prometheus'
        - '--web.console.libraries=/etc/prometheus/console_libraries'
        - '--web.console.templates=/etc/prometheus/consoles'
        - '--storage.tsdb.retention.time=240h'
        - '--web.enable-lifecycle'
      volumes:
        - ./prometheus_data:/prometheus
        - ./prometheus.yml:/etc/prometheus/prometheus.yml
        - ./alert.rules:/etc/prometheus/rules/alert.rules
      expose:
        - "9090"
      labels:
        org.label-schema.group: "monitoring"

    # ALERT MANAGEMENT FOR PROMETHEUS
    alertmanager:
      image: prom/alertmanager:v0.25.0
      container_name: alertmanager
      hostname: alertmanager
      restart: unless-stopped
      volumes:
        - ./alertmanager.yml:/etc/alertmanager.yml
        - ./alertmanager_data:/alertmanager
      command:
        - '--config.file=/etc/alertmanager.yml'
        - '--storage.path=/alertmanager'
      expose:
        - "9093"
      labels:
        org.label-schema.group: "monitoring"

  networks:
    default:
      name: $DOCKER_MY_NETWORK
      external: true
  ```

  </details>

* Adding alertmanager to the Caddyfile of the reverse proxy so that it can be reached
  at `https://alert.example.com`. Not really necessary, but useful as it allows
  sending alerts from anywhere, not just from prometheus.

  <details>
  <summary>Caddyfile</summary>

  ```php
  alert.{$MY_DOMAIN} {
    reverse_proxy alertmanager:9093
  }
  ```

  </details>
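
Before relying on the new files, they can be checked with the tools that ship inside the official images, `promtool` in prometheus and `amtool` in alertmanager. The sketch below assumes both containers are already running with the container names and mount paths from the compose file above. The helper names `run` and `check_configs` and the `DRY_RUN` variable are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: runs a command, or only prints it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Validates prometheus.yml, alert.rules and alertmanager.yml inside
# the running containers. Stops at the first failing check.
check_configs() {
    run docker exec prometheus promtool check config /etc/prometheus/prometheus.yml &&
        run docker exec prometheus promtool check rules /etc/prometheus/rules/alert.rules &&
        run docker exec alertmanager amtool check-config /etc/alertmanager.yml
}

# Example:
# DRY_RUN=1 check_configs   # only print the commands
# check_configs             # actually run the checks
```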
## The basics
![alert](https://i.imgur.com/C7g0xJt.png)

Once the above setup is done, an alert about low disk space should fire and a notification
email should come.<br>
To switch from email to ntfy, change the `receiver` of the route in `alertmanager.yml`.

*Useful*

* alert from anywhere using curl:<br>
  `curl -H 'Content-Type: application/json' -d '[{"labels":{"alertname":"blabla"}}]' https://alert.example.com/api/v1/alerts`
* reload rules:<br>
  `curl -X POST https://prom.example.com/-/reload`
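
The curl alert above can be turned into a small helper that also sets a `severity` label. This is a sketch: `alert_payload`, `send_test_alert`, and the `ALERTMANAGER_URL` variable are assumed names, and the values are placed into the JSON without escaping, so they should be plain words.

```bash
#!/usr/bin/env bash
# Hypothetical helper: builds the JSON body for one test alert.
# Values are not JSON-escaped, keep them free of quotes and backslashes.
alert_payload() {
    local alertname="$1" severity="${2:-warning}"
    printf '[{"labels":{"alertname":"%s","severity":"%s"}}]\n' "$alertname" "$severity"
}

# Sends the alert to the same endpoint as the curl one-liner above.
send_test_alert() {
    curl --silent --fail -H 'Content-Type: application/json' \
        -d "$(alert_payload "$@")" \
        "${ALERTMANAGER_URL:-https://alert.example.com}/api/v1/alerts"
}

# Example:
# send_test_alert blabla critical
```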

[stefanprodan/dockprom](https://github.com/stefanprodan/dockprom#define-alerts)
has a more detailed section on alerting worth checking out.

# Loki

![loki_arch](https://i.imgur.com/aoMPrVV.png)

Loki is made by the grafana team. It's often referred to as a Prometheus for logs.<br>
It is a **push** type monitoring, where an agent - **promtail**
pushes logs on to a Loki instance.<br>
For docker containers there's also an option to install **loki-docker-driver**
on a docker host, with log pushing set either globally in `/etc/docker/daemon.json`
or per container in compose files.

There will be **two examples**.<br>
A **minecraft server** and a **caddy reverse proxy**, both docker containers.

## The setup

To add Loki to the current stack:

* New container - `loki` added to the compose file.<br>
  Note the port 3100 is actually mapped to the host,
  allowing `localhost:3100` from the driver to work.

  <details>
  <summary>docker-compose.yml</summary>

  ```yml
  services:

    # LOG MANAGEMENT WITH LOKI
    loki:
      image: grafana/loki:2.7.3
      container_name: loki
      hostname: loki
      user: root
      restart: unless-stopped
      volumes:
        - ./loki_data:/loki
        - ./loki-docker-config.yml:/etc/loki-docker-config.yml
      command:
        - '-config.file=/etc/loki-docker-config.yml'
      ports:
        - "3100:3100"
      labels:
        org.label-schema.group: "monitoring"

  networks:
    default:
      name: $DOCKER_MY_NETWORK
      external: true
  ```

  </details>

* New file - `loki-docker-config.yml` bind mounted in the loki container.<br>
  The file comes from
  [the official example](https://github.com/grafana/loki/tree/main/cmd/loki),
  but the url is changed, and a compactor section is added, to have control over
  [data retention.](https://grafana.com/docs/loki/latest/operations/storage/retention/)

  <details>
  <summary>loki-docker-config.yml</summary>

  ```yml
  auth_enabled: false

  server:
    http_listen_port: 3100

  common:
    path_prefix: /loki
    storage:
      filesystem:
        chunks_directory: /loki/chunks
        rules_directory: /loki/rules
    replication_factor: 1
    ring:
      kvstore:
        store: inmemory

  compactor:
    working_directory: /loki/compactor
    compaction_interval: 10m
    retention_enabled: true
    retention_delete_delay: 2h
    retention_delete_worker_count: 150

  limits_config:
    retention_period: 240h

  schema_config:
    configs:
      - from: 2020-10-24
        store: boltdb-shipper
        object_store: filesystem
        schema: v11
        index:
          prefix: index_
          period: 24h

  ruler:
    alertmanager_url: http://alertmanager:9093

  analytics:
    reporting_enabled: false
  ```

  </details>

* Install [loki-docker-driver](https://grafana.com/docs/loki/latest/clients/docker-driver/)<br>
  `docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions`<br>
  To check if it's installed and enabled: `docker plugin ls`
* Containers that should be monitored need a `logging` section in their compose.<br>

  <details>
  <summary>docker-compose.yml</summary>

  ```yml
  services:

    whoami:
      image: "containous/whoami"
      container_name: "whoami"
      hostname: "whoami"
      logging:
        driver: "loki"
        options:
          loki-url: "http://localhost:3100/loki/api/v1/push"
  ```

  </details>
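
To check that Loki accepts logs before any container is switched to the driver, a test line can be pushed by hand to the same push endpoint. This is a sketch under assumptions: the function names, the `LOKI_URL` variable, and the `job` label value are made up, and the log line is placed into the JSON without escaping.

```bash
#!/usr/bin/env bash
# Hypothetical helper: builds a Loki push API body with a single log line.
# $1 = value of the "job" label, $2 = log line, $3 = timestamp in nanoseconds
loki_payload() {
    printf '{"streams":[{"stream":{"job":"%s"},"values":[["%s","%s"]]}]}\n' "$1" "$3" "$2"
}

# Pushes one test line with the current time as its timestamp.
push_test_log() {
    local ts
    ts="$(date +%s)000000000"   # seconds since epoch padded to nanoseconds
    curl --silent --fail -H 'Content-Type: application/json' \
        -d "$(loki_payload "${1:-test}" "${2:-hello from curl}" "$ts")" \
        "${LOKI_URL:-http://localhost:3100}/loki/api/v1/push"
}

# Example:
# push_test_log test "hello from curl"
```

The line should then be visible in grafana's Explore section when filtering for the `job` label.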
## Minecraft example

Loki will be used to monitor logs of a [minecraft server.](https://github.com/DoTheEvo/selfhosted-apps-docker/tree/master/minecraft)<br>
A dashboard will be created, showing logs volume in time.<br>
An alert will be set to send a notification when a player joins.<br>

**Requirements** - grafana, loki, loki-docker-driver, minecraft with logging
set in compose

![logo](https://i.imgur.com/M1k0Dn4.png)

### First steps

* In grafana, loki needs to be added as a datasource, `http://loki:3100`
* In `Explore` section, filter, container_name = minecraft, query...
  this should result in seeing minecraft logs and their volume/time graph.

This Explore view will be recreated as a dashboard.

### Dashboard minecraft_logs

* New dashboard, new panel
  * Data source - Loki
  * Switch from `builder` to `code`<br>
  * query - `count_over_time({container_name="minecraft"} |= "" [1m])`<br>
  * Transform - Rename by regex - `(.*)` - `Logs`
  * Graph type - `Time series`
  * Title - Logs volume
  * Transparent background
  * Legend off
  * Graph styles - bar
  * Fill opacity - 50
  * Color scheme - single color
  * `Query options` - Min interval=1m
  * Save
* Add another panel to the dashboard
  * Graph type - `Logs`
  * Data source - Loki
  * Switch from `builder` to `code`<br>
    query - `{container_name="minecraft"} |= ""`<br>
  * Title - *empty*
  * Deduplication - Signature
  * Save

This should create a similar dashboard to the one in the picture above.<br>

[Performance tips](https://www.youtube.com/watch?v=YED8XIm0YPs)
for grafana loki queries

### Alerts in Grafana for Loki

When a player joins the minecraft server, a log line appears: *"Bastard joined the game"*<br>
An alert will be set to look for the string *"joined the game"* and send a notification
when it occurs.

Grafana rules are based around a `Query` and `Expressions`, and each
and every one has to result in a simple number or a true or false condition.

#### Create alert rule

- **1 Set an alert rule name**
  - Rule name = Minecraft-player-joined-alert
- **2 Set a query and alert condition**
  - **A** - Loki; Last 5 minutes
    - switch from builder to code
    - `count_over_time({compose_service="minecraft"} |= "joined the game" [5m])`
  - **B** - Reduce
    - Function = Last
    - Input = A
    - Mode = Strict
  - **C** - Threshold
    - Input = B
    - is above 0
    - Make this the alert condition
- **3 Alert evaluation behavior**
  - Folder = "Alerts"
  - Evaluation group (interval) = "five-min"<br>
  - Evaluation interval = 5m
  - For 0s
  - Configure no data and error handling
    - Alert state if no data or all values are null = OK
- **4 Add details for your alert rule**
  - Can pass values from logs to alerts, by targeting A/B/C/.. expressions
    from step 2.
  - Summary = `Number of players: {{ $values.B }}`<br>
- **5 Notifications**
  - nothing
- Save and exit

#### Contact points

- New contact point
  - Name = ntfy
  - Integration = Webhook
  - URL = https://ntfy.example.com/grafana
  - Disable resolved message = check
  - Test
  - Save

#### Notification policies

- Edit default
  - Default contact point = ntfy
  - Save

After all this, there should be a notification coming when a player joins.

`.*:\s(?P<player>.*)\sjoined the game$` - if ever I find out how to extract
a string from a log line and pass it on to an alert.
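
What that regex captures can at least be tried out in a shell. The sketch below only demonstrates the pattern with `sed`, it does not pass anything to a Grafana alert. The function name `extract_player` and the sample log line are made up, and `sed -E` has no named groups, so the `player` group becomes a plain numbered group.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads log lines on stdin and prints the player name
# from lines ending with "joined the game". Other lines print nothing.
extract_player() {
    sed -nE 's/.*:[[:space:]](.*)[[:space:]]joined the game$/\1/p'
}

# Example with a made-up log line:
# echo '[12:00:01] [Server thread/INFO]: Bastard joined the game' | extract_player
```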
# Caddy monitoring

Described in
[the caddy guide](https://github.com/DoTheEvo/selfhosted-apps-docker/tree/master/caddy_v2)

# Update

Manual image update:

- `docker-compose pull`<br>
- `docker-compose up -d`<br>
- `docker image prune`

# Backup and restore

#### Backup

Using [borg](https://github.com/DoTheEvo/selfhosted-apps-docker/tree/master/borg_backup)
that makes a daily snapshot of the entire directory.

#### Restore

* down the prometheus containers `docker-compose down`<br>
* delete the entire prometheus directory<br>
* from the backup copy back the prometheus directory<br>
* start the containers `docker-compose up -d`
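
The restore steps can be sketched as one shell function. The function name `restore_stack` and the `COMPOSE_CMD` variable are assumptions for illustration, and the backup is expected to be a plain directory already extracted from borg.

```bash
#!/usr/bin/env bash
# Hypothetical helper following the four restore steps above.
# $1 = extracted backup copy of the directory, $2 = the stack directory
# COMPOSE_CMD allows switching between `docker-compose` and `docker compose`.
restore_stack() {
    local backup_dir="$1" stack_dir="$2"
    local compose_cmd="${COMPOSE_CMD:-docker-compose}"
    [ -n "$stack_dir" ] || { echo "stack directory not given" >&2; return 1; }
    [ -d "$backup_dir" ] || { echo "backup not found: $backup_dir" >&2; return 1; }

    # 1. down the containers, if the stack directory still exists
    if [ -d "$stack_dir" ]; then
        (cd "$stack_dir" && $compose_cmd down) || return 1
    fi
    # 2. delete the entire directory
    rm -rf "$stack_dir" || return 1
    # 3. copy the directory back from the backup, keeping permissions
    cp -a "$backup_dir" "$stack_dir" || return 1
    # 4. start the containers
    (cd "$stack_dir" && $compose_cmd up -d)
}

# Example:
# restore_stack /mnt/restore/prometheus "$HOME/docker/prometheus"
```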