new-site/infra/ansible
justin e318f12e36 infra: disk-space guardrail + Docker log rotation (prevent disk-full crash)
On 2026-06-27 the root filesystem (/) filled to 100% and crash-looped Postgres
('No space left on device'), taking down Listmonk mid-send. Cause: an orphaned
15GB /tmp/forgejo-dump.zip (left by an interrupted backup) plus uncapped Docker
json-file logs (the forgejo container log alone was ~1GB), with no disk
monitoring to warn first.

- pw-disk-space-alert.sh + cron (every 15m): Telegram warn at 80%, auto-reclaim
  build cache + orphaned forgejo dump at 88%. Silent when healthy.
- ansible docker role: write /etc/docker/daemon.json with 50m x 3 log cap
  (150MB/container max) + non-disruptive Reload docker handler.
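
The two guardrails above can be sketched roughly as follows. This is not the
contents of pw-disk-space-alert.sh or the ansible role. It is a minimal
illustration that assumes the 80%/88% thresholds and the 50m x 3 log cap from
the commit message. The function and variable names (disk_usage_pct,
disk_action, docker_log_cap_json, WARN_PCT, RECLAIM_PCT) are hypothetical, and
the Telegram call and the actual reclaim step are left out.

```shell
#!/bin/sh
# Hypothetical sketch: thresholds come from the commit message,
# every name below is illustrative and not taken from the real script.
WARN_PCT=80
RECLAIM_PCT=88

# Print the usage percentage of a mount point as a bare integer.
# df -P keeps each filesystem on one line; field 5 is the capacity column.
disk_usage_pct() {
    df -P "$1" | awk 'NR==2 { gsub(/%/, "", $5); print $5 }'
}

# Map a usage percentage to one of: ok, warn, reclaim.
disk_action() {
    if [ "$1" -ge "$RECLAIM_PCT" ]; then
        echo reclaim    # e.g. prune build cache, delete the orphaned dump
    elif [ "$1" -ge "$WARN_PCT" ]; then
        echo warn       # e.g. send a Telegram warning
    else
        echo ok         # stay silent when healthy
    fi
}

# Emit a daemon.json body that caps json-file logs at 50m x 3 files,
# i.e. at most about 150MB per container.
docker_log_cap_json() {
    cat <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": { "max-size": "50m", "max-file": "3" }
}
EOF
}
```

A cron-driven wrapper could call disk_action "$(disk_usage_pct /)" every 15
minutes and branch on the result. Docker applies the log-opts cap only to
containers created after the daemon picks up the new daemon.json, so existing
containers keep their old log settings until they are recreated.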
2026-06-27 09:47:29 -05:00
inventory docs+infra(deliverability): document bulk subdomain; ansible signs send.performancewest.net 2026-06-18 23:12:05 -05:00
playbooks feat(healthcare): OIG/SAM exclusion screening as $79/mo Stripe Subscription 2026-06-18 07:54:38 -05:00
roles infra: disk-space guardrail + Docker log rotation (prevent disk-full crash) 2026-06-27 09:47:29 -05:00
ansible.cfg Add Prometheus + Grafana + Alertmanager monitoring stack 2026-05-01 02:08:39 -05:00