new-site/monitoring
justin 7670608c1a fix(monitoring): render alertmanager.yml from template at deploy (fixes crash loop)
Alertmanager does not expand ${ENV} placeholders in its YAML, so the committed
config with literal ${TELEGRAM_BOT_TOKEN}/${TELEGRAM_CHAT_ID} sent it into a
crash loop (line 24: cannot unmarshal !!str `${TELEG...` into int64): 11k+
restarts on prod, alerting dead.

- rename alertmanager.yml -> alertmanager.yml.template (keeps ${} placeholders)
- deploy.sh: envsubst the template into the (gitignored) alertmanager.yml from
  .env, scoped to the two TELEGRAM vars so the {{ }} Go-template message survives
- gitignore the rendered file (contains the bot token)
- deploy.sh: warn if either var is unset
2026-06-07 04:49:53 -05:00
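
The render step described in the commit message above could look roughly like the sketch below. It is not the actual deploy.sh: the function name `render_alertmanager_config`, its argument order, and the default `./.env` path are assumptions made for illustration. The one piece taken directly from the commit is the use of `envsubst` limited to the two TELEGRAM variables.

```shell
#!/bin/sh
# Hypothetical sketch of the deploy-time render step; names and paths are
# assumptions, not the contents of the real deploy.sh.

# Usage: render_alertmanager_config TEMPLATE OUTPUT [ENV_FILE]
# The body runs in a subshell "( ... )" so values loaded from the env file
# do not leak into the calling shell.
render_alertmanager_config() (
  template="$1"
  output="$2"
  env_file="${3:-./.env}"

  # Load KEY=value pairs and export them so envsubst can see them.
  if [ -f "$env_file" ]; then
    set -a
    . "$env_file"
    set +a
  fi

  # Warn rather than abort when a variable is missing.
  [ -n "${TELEGRAM_BOT_TOKEN:-}" ] || echo "WARNING: TELEGRAM_BOT_TOKEN is unset" >&2
  [ -n "${TELEGRAM_CHAT_ID:-}" ] || echo "WARNING: TELEGRAM_CHAT_ID is unset" >&2

  # The single-quoted argument is envsubst's SHELL-FORMAT: only the variables
  # named in it are substituted. Any other "$name" in the file, such as Go
  # template variables inside {{ }}, is copied through unchanged.
  envsubst '${TELEGRAM_BOT_TOKEN} ${TELEGRAM_CHAT_ID}' < "$template" > "$output"
)
```

Called as `render_alertmanager_config alertmanager.yml.template alertmanager.yml`, this writes the rendered file next to the template. Without the SHELL-FORMAT argument, `envsubst` would also replace any `$var` used inside the Go-template message with an empty string, which is why the substitution is scoped.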
alert_rules.yml Fix ContainerHighMemory alert: skip containers with no memory limit 2026-05-01 03:54:16 -05:00
alertmanager.yml.template fix(monitoring): render alertmanager.yml from template at deploy (fixes crash loop) 2026-06-07 04:49:53 -05:00
blackbox.yml Fix ERPNext and Forgejo probes 2026-05-01 03:35:45 -05:00
grafana-datasources.yml Remove fixed uid from Grafana datasource provisioning — Grafana 13 rejects it on fresh boot 2026-05-01 03:09:10 -05:00
prometheus.yml Fix Forgejo probe: use HTTPS public URL (port 3000 conflicts with Grafana internally) 2026-05-01 03:38:36 -05:00
pw-services-dashboard.json Fix dashboard stale series + enable Prometheus admin API 2026-05-01 03:43:42 -05:00