DoTheEvo 2 years ago
parent fe9e3dadd4
commit 47d4f85464

@ -746,7 +746,7 @@ This should create a similar dashboard to the one in the picture above.<br>
[Performance tips](https://www.youtube.com/watch?v=YED8XIm0YPs) [Performance tips](https://www.youtube.com/watch?v=YED8XIm0YPs)
for grafana loki queries for grafana loki queries
### Alerts in Grafana for Loki ## Alerts in Grafana for Loki
When a player joins minecraft server a log appears *"Bastard joined the game"*<br> When a player joins minecraft server a log appears *"Bastard joined the game"*<br>
Alert will be set to look for string *"joined the game"* and send notification Alert will be set to look for string *"joined the game"* and send notification
@ -755,14 +755,14 @@ when it occurs.
Grafana rules are based around a `Query` and `Expressions` and each Grafana rules are based around a `Query` and `Expressions` and each
and every one has to result in a simple number or a true or false condition. and every one has to result in a simple number or a true or false condition.
#### Create alert rule ### Create alert rule
- **1 Set an alert rule name** - **1 Set an alert rule name**
- Rule name = Minecraft-player-joined-alert - Rule name = Minecraft-player-joined-alert
- **2 Set a query and alert condition** - **2 Set a query and alert condition**
- **A** - Loki; Last 5 minutes - **A** - Switch to Loki; set Last 5 minutes
- switch from builder to code - switch from builder to code
- `count_over_time({compose_service="minecraft"} |= "joined the game" [5m])` - `count_over_time({container_name="minecraft"} |= "joined the game" [5m])`
- **B** - Reduce - **B** - Reduce
- Function = Last - Function = Last
- Input = A - Input = A
@ -781,12 +781,17 @@ and every one has to result in a a simple number or a true or false condition.
- **4 Add details for your alert rule** - **4 Add details for your alert rule**
- Can pass values from logs to alerts, by targeting A/B/C/.. expressions - Can pass values from logs to alerts, by targeting A/B/C/.. expressions
from step2. from step2.
- Summary = `Number of players: {{ $values.B }}`<br> - Summary = `Number of players joined: {{ $values.B }}`<br>
- Maybe one day I'll figure out how to pull the player's name from the log
and pass it to the alert, so far I got [this](https://regex101.com/r/pBAaEl/2)
`.*:\s(?P<player>.*)\sjoined the game$` and [a full query](https://pastebin.com/Ep6PUwV2)
but dunno how to reference the named regex group in the alert's 4th section.<br>
And the grafana forum is kinda a big black hole of unanswered questions.
- **5 Notifications** - **5 Notifications**
- nothing - nothing
- Save and exit - Save and exit
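The named-group regex from step 4 can at least be checked outside Grafana. A minimal Python sketch, assuming a log line shaped like the made-up `SAMPLE_LINE` below (the real Minecraft log format may differ) and a hypothetical helper `extract_player`. It only shows that the `player` group captures the name, not how to reference it in an alert.

```python
import re

# Regex from step 4; Python's `re` accepts the same (?P<name>...) group syntax.
JOIN_PATTERN = re.compile(r".*:\s(?P<player>.*)\sjoined the game$")

# Hypothetical log line, made up for illustration.
SAMPLE_LINE = "[12:34:56] [Server thread/INFO]: Bastard joined the game"


def extract_player(line):
    """Return the player's name from a join log line, or None if no match."""
    match = JOIN_PATTERN.match(line)
    return match.group("player") if match else None


if __name__ == "__main__":
    print(extract_player(SAMPLE_LINE))
```

Lines that do not end with *"joined the game"* return `None`, so the same helper can double as a filter.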
#### Contact points ### Contact points
- New contact point - New contact point
- Name = ntfy - Name = ntfy
@ -796,7 +801,7 @@ and every one has to result in a a simple number or a true or false condition.
- Test - Test
- Save - Save
#### Notification policies ### Notification policies
- Edit default - Edit default
- Default contact point = ntfy - Default contact point = ntfy
@ -804,13 +809,307 @@ and every one has to result in a a simple number or a true or false condition.
After all this, there should be notification coming when a player joins. After all this, there should be notification coming when a player joins.
`.*:\s(?P<player>.*)\sjoined the game$` - if ever I find out how to extract
string from a log like and pass it on to an alert.
# Caddy monitoring # Caddy monitoring
Described in [the caddy guide](https://github.com/DoTheEvo/selfhosted-apps-docker/tree/master/caddy_v2)
Reverse proxy is kinda the linchpin of a selfhosted setup, since it's in charge
of all the http/https traffic that goes in. So focusing on monitoring this
keystone makes sense.
Will be using Prometheus for monitoring metrics and Loki for log files monitoring.
**Requirements** - grafana, prometheus, loki, caddy container
## Metrics
Caddy has a built-in exporter of metrics for prometheus, so all that is needed
is enabling it, scraping it with prometheus, and importing a dashboard.
* Edit Caddyfile to [enable metrics.](https://caddyserver.com/docs/metrics)
    {
        servers {
            metrics
        }
    }
    a.{$MY_DOMAIN} {
        reverse_proxy whoami:80
    }
* Edit compose to publish port 2019.<br>
Likely not necessary if Caddy and Prometheus are on the same docker network,
but it's nice to check if the metrics export works at `<docker-host-ip>:2019/metrics`
    services:
      caddy:
        image: caddy
        container_name: caddy
        hostname: caddy
        restart: unless-stopped
        env_file: .env
        ports:
          - "80:80"
          - "443:443"
          - "443:443/udp"
          - "2019:2019"
        volumes:
          - ./Caddyfile:/etc/caddy/Caddyfile
          - ./caddy_data:/data
          - ./caddy_config:/config
    networks:
      default:
        external: true
* Edit prometheus.yml to add caddy scraping point
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: 'caddy'
        static_configs:
          - targets: ['caddy:2019']
* In grafana import [caddy dashboard](https://grafana.com/grafana/dashboards/14280-caddy-exporter/)<br>
or make your own, `caddy_reverse_proxy_upstreams_healthy` shows reverse proxy
upstreams, but that's all.
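When checking `<docker-host-ip>:2019/metrics` by hand, the response is plain text in the Prometheus exposition format. A rough Python sketch of pulling one metric out of such text follows. The `SAMPLE_METRICS` excerpt, its label and its values are made up for illustration, and `read_metric` is a hypothetical helper, not part of any library.

```python
import re

# Made-up excerpt in the Prometheus text format; real output will differ.
SAMPLE_METRICS = """\
# HELP caddy_reverse_proxy_upstreams_healthy Health status of reverse proxy upstreams.
# TYPE caddy_reverse_proxy_upstreams_healthy gauge
caddy_reverse_proxy_upstreams_healthy{upstream="whoami:80"} 1
caddy_reverse_proxy_upstreams_healthy{upstream="ntfy:80"} 0
"""

# Matches `name{labels} value`; the labels part is optional.
# Simplification: lines with an optional trailing timestamp are skipped.
SAMPLE_PATTERN = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$"
)


def read_metric(text, metric_name):
    """Return {labels_string: float_value} for every sample of metric_name."""
    samples = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = SAMPLE_PATTERN.match(line)
        if match and match.group("name") == metric_name:
            samples[match.group("labels") or ""] = float(match.group("value"))
    return samples


if __name__ == "__main__":
    print(read_metric(SAMPLE_METRICS, "caddy_reverse_proxy_upstreams_healthy"))
```

To run it against a live Caddy, the text could be fetched with `urllib.request.urlopen` from the metrics URL and passed in place of `SAMPLE_METRICS`.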
But these metrics are more about performance and load put on Caddy,
which in a selfhosted environment will likely be minimal and not interesting.<br>
To get more intriguing info on who connects to what service, when, and from where,
access logs monitoring is needed.
## Logs
Loki will be used for logs monitoring.<br>
Loki itself just stores the logs. To get them there, a promtail container will be used
that has access to caddy's logs; its job is to scrape them regularly
and push them to Loki. Once there, a basic grafana dashboard can be made.
* Have Grafana, Loki, Caddy working
* Edit Caddy compose, bind mount `/var/log/caddy`.<br>
Add a Promtail container that has the same bind mount, along with a bind mount
of its config file.<br>
Promtail will scrape the logs to which it now has access and push them to Loki.
    services:
      caddy:
        image: caddy
        container_name: caddy
        hostname: caddy
        restart: unless-stopped
        env_file: .env
        ports:
          - "80:80"
          - "443:443"
          - "443:443/udp"
          - "2019:2019"
        volumes:
          - ./Caddyfile:/etc/caddy/Caddyfile
          - ./caddy_data:/data
          - ./caddy_config:/config
          - /var/log/caddy:/var/log/caddy
      caddy-promtail:
        image: grafana/promtail
        container_name: caddy-promtail
        hostname: caddy-promtail
        restart: unless-stopped
        volumes:
          - ./promtail-config.yml:/etc/promtail-config.yml
          - /var/log/caddy:/var/log/caddy:ro
        command:
          - '-config.file=/etc/promtail-config.yml'
    networks:
      default:
        external: true
    clients:
      - url: http://loki:3100/loki/api/v1/push
    scrape_configs:
      - job_name: caddy
        static_configs:
          - targets:
              - localhost
            labels:
              job: caddy_access_log
              __path__: /var/log/caddy/*.log
<details>
<summary>promtail-config.yml customizing fields</summary>

    clients:
      - url: http://loki:3100/loki/api/v1/push
    scrape_configs:
      - job_name: caddy_access_log
        static_configs:
          - targets: # tells promtail to look for the logs on the current machine/host
              - localhost
            labels:
              job: caddy_access_log
              __path__: /var/log/caddy/*.log
        pipeline_stages:
          # Extract all the fields I care about from the
          # message:
          - json:
              expressions:
                "level": "level"
                "timestamp": "ts"
                "duration": "duration"
                "response_status": "status"
                "request_path": "request.uri"
                "request_method": "request.method"
                "request_host": "request.host"
                "request_useragent": "request.headers.\"User-Agent\""
                "request_remote_ip": "request.remote_ip"
          # Promote the level into an actual label:
          - labels:
              level:
          # Regenerate the message as all the fields listed
          # above:
          - template:
              # This is a field that doesn't exist yet, so it will be created
              source: "output"
              template: |
                {{toJson (unset (unset (unset . "Entry") "timestamp") "filename")}}
          - output:
              source: output
          # Set the timestamp of the log entry to what's in the
          # timestamp field.
          - timestamp:
              source: "timestamp"
              format: "Unix"

</details>
* Edit `Caddyfile` to enable [access logs](https://caddyserver.com/docs/caddyfile/directives/log).
Unfortunately this can't be enabled globally, so the easiest way seems to be
to create a logging [snippet](https://caddyserver.com/docs/caddyfile/concepts#snippets)
and copy-paste the import line into every site block.
    (log_common) {
        log {
            output file /var/log/caddy/caddy_access.log
        }
    }
    ntfy.example.com {
        import log_common
        reverse_proxy ntfy:80
    }
    mealie.{$MY_DOMAIN} {
        import log_common
        reverse_proxy mealie:80
    }
* At this point logs should be visible and explorable in grafana<br>
Explore > `{job="caddy_access_log"} |= "" | json`
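To get a feel for what the field mapping in the customized `promtail-config.yml` pulls out of one access log line, here is a small Python sketch. It is not how promtail works internally: promtail evaluates JMESPath expressions, while this sketch uses plain key lookups. The `SAMPLE_ENTRY` log line is made up and shortened, and `extract_fields` is a hypothetical helper.

```python
import json

# Same mapping as the json stage in promtail-config.yml, with each source
# written as a list of keys instead of a JMESPath expression.
FIELD_PATHS = {
    "level": ["level"],
    "timestamp": ["ts"],
    "duration": ["duration"],
    "response_status": ["status"],
    "request_path": ["request", "uri"],
    "request_method": ["request", "method"],
    "request_host": ["request", "host"],
    "request_useragent": ["request", "headers", "User-Agent"],
    "request_remote_ip": ["request", "remote_ip"],
}

# Hypothetical, shortened access log entry; real entries carry more fields.
SAMPLE_ENTRY = json.dumps({
    "level": "info",
    "ts": 1700000000.123,
    "request": {
        "remote_ip": "203.0.113.7",
        "method": "GET",
        "host": "ntfy.example.com",
        "uri": "/",
        "headers": {"User-Agent": ["curl/8.0.1"]},
    },
    "duration": 0.0021,
    "status": 200,
})


def extract_fields(raw_line):
    """Flatten one JSON log line into the fields named in FIELD_PATHS."""
    entry = json.loads(raw_line)
    flat = {}
    for name, path in FIELD_PATHS.items():
        value = entry
        for key in path:
            # A missing key anywhere along the path yields None.
            value = value.get(key) if isinstance(value, dict) else None
        flat[name] = value
    return flat


if __name__ == "__main__":
    print(json.dumps(extract_fields(SAMPLE_ENTRY), indent=2))
```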
## Dashboard
* New panel, a time series graph showing log volume over time
  * Data source = Loki
  * switch from builder to code<br>
    `sum(count_over_time({job="caddy_access_log"} |= "" | json [1m])) by (request_host)`
  * Transform > Rename by regex > Match = `\{request_host="(.*)"\}`; Replace = $1
  * Query options > Min interval = 1m
  * Graph type = Time series
  * Title = "Access timeline"
  * Transparent
  * Tooltip mode = All
  * Tooltip values sort order = Descending
  * Legend placement = Right
  * Value = Total
  * Graph style = Bars
  * Fill opacity = 50
* Add another panel, a pie chart showing how requests divide between subdomains
  * Data source = Loki
  * switch from builder to code<br>
    `sum(count_over_time({job="caddy_access_log"} |= "" | json [$__range])) by (request_host)`
  * Transform > Rename by regex > Match = `\{request_host="(.*)"\}`; Replace = $1
  * Graph type = Pie chart
  * Title = "Subdomains divide"
  * Transparent
  * Legend placement = Right
  * Value = Total
  * Graph style = Bars
* Add another panel, this will be the actual log view
  * Graph type - Logs
  * Data source - Loki
  * Switch from builder to code
  * query - `{job="caddy_access_log"} |= "" | json`
  * Title - empty
  * Deduplication - Signature
* Save
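The first panel's query counts log lines per `request_host` in one-minute windows. A Python sketch of the same idea on an in-memory list, using fixed one-minute buckets, which is a simplification of how Loki evaluates range queries. The `SAMPLE_REQUESTS` data and the `count_per_host` helper are made up for illustration.

```python
from collections import Counter

# Made-up (unix_timestamp, request_host) pairs standing in for parsed log lines.
SAMPLE_REQUESTS = [
    (1700000005, "ntfy.example.com"),
    (1700000030, "ntfy.example.com"),
    (1700000042, "mealie.example.com"),
    (1700000075, "mealie.example.com"),
]


def count_per_host(requests, window_seconds=60):
    """Count requests per (window start, host), using fixed windows."""
    counts = Counter()
    for timestamp, host in requests:
        # Round the timestamp down to the start of its window.
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, host)] += 1
    return counts


if __name__ == "__main__":
    for (window_start, host), count in sorted(count_per_host(SAMPLE_REQUESTS).items()):
        print(window_start, host, count)
```

The pie chart's `[$__range]` variant corresponds to dropping the windows and counting per host over the whole selected time range.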
## Geoip
# Update # Update
