new-site/infra
justin e318f12e36 infra: disk-space guardrail + Docker log rotation (prevent disk-full crash)
On 2026-06-27 the root filesystem (/) filled to 100% and crash-looped Postgres
('No space left on device'), taking down Listmonk mid-send. Cause: an orphaned
15GB /tmp/forgejo-dump.zip (interrupted backup) plus uncapped Docker json-file
logs (the forgejo container log alone was ~1GB), with no disk monitoring to
warn first.

- pw-disk-space-alert.sh + cron (every 15m): Telegram warn at 80%, auto-reclaim
  build cache + orphaned forgejo dump at 88%. Silent when healthy.
- ansible docker role: write /etc/docker/daemon.json with 50m x 3 log cap
  (150MB/container max) + non-disruptive Reload docker handler.
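
The two thresholds described above can be sketched as a small shell helper. This is a minimal illustration, not the contents of pw-disk-space-alert.sh: the function names `disk_used_pct` and `disk_action` and the variables `WARN_PCT` and `RECLAIM_PCT` are hypothetical, and the Telegram call and the reclaim commands are left out because the commit message does not show them.

```shell
#!/bin/sh
# Hypothetical sketch of the threshold logic; not the actual pw-disk-space-alert.sh.
WARN_PCT=80      # from the commit message: Telegram warning threshold
RECLAIM_PCT=88   # from the commit message: auto-reclaim threshold

# Print the used percentage of a mount point as a bare integer.
# `df -P` forces the single-line POSIX output format, so the
# capacity column is always field 5 of the second line.
disk_used_pct() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Map a used percentage to an action: ok, warn, or reclaim.
disk_action() {
    if [ "$1" -ge "$RECLAIM_PCT" ]; then
        echo reclaim
    elif [ "$1" -ge "$WARN_PCT" ]; then
        echo warn
    else
        echo ok
    fi
}
```

A cron-driven caller would presumably run something like `disk_action "$(disk_used_pct /)"` every 15 minutes and do nothing on `ok`, which matches the "silent when healthy" behaviour. Checking the higher threshold first means a disk at 88% or above gets the reclaim action rather than only a warning.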
2026-06-27 09:47:29 -05:00
ansible infra: disk-space guardrail + Docker log rotation (prevent disk-full crash) 2026-06-27 09:47:29 -05:00
cron infra: disk-space guardrail + Docker log rotation (prevent disk-full crash) 2026-06-27 09:47:29 -05:00
fail2ban Initial commit — Performance West telecom compliance platform 2026-04-27 06:54:22 -05:00
firewall firewall: allow ezstorehost (207.174.124.51) to reach Forgejo SSH 2026-06-10 22:45:43 -05:00
k8s infra/k8s: shkeeper liveness+readiness probes (fix recurring crypto.performancewest.net downtime) 2026-06-09 04:57:50 -05:00
mail infra(mail): fix warmed sending IPs dropping off ens18 on reboot (Jun 24 outage) 2026-06-25 17:28:33 -05:00
monitoring infra: disk-space guardrail + Docker log rotation (prevent disk-full crash) 2026-06-27 09:47:29 -05:00
mta-sts infra: MTA-STS HTTPS vhost (cert issued, policy live) 2026-06-06 21:03:30 -05:00
network infra(mail): remove 18 dormant snowshoe IPs from postfix + host 2026-06-23 23:45:41 -05:00
nginx fix(nginx): unblock public API routes powering lead tools/flows (HC sales killer) 2026-06-23 15:51:30 -05:00
postfix infra(mail): remove 18 dormant snowshoe IPs from postfix + host 2026-06-23 23:45:41 -05:00
systemd infra: codify the email-campaign pipeline in Ansible (new mail-pipeline role) 2026-06-17 20:26:01 -05:00