new-site/infra/cron/pw-warmup-tg-alert
justin ae68edbc58 fix(monitoring): repair both dead mail-alert crons + de-noise DMARC digest
Three bugs the owner hit:
1. Per-operator reputation alert (06:10 cron, mail_reputation_monitor --alert)
   silently never ran: it redirected to /var/log/pw-mail-reputation.log but
   /var/log is root-only and that file was never pre-created, so the deploy
   user's >> redirect failed and the shell never ran the command. Repointed
   both mail-alert crons to deploy-writable /opt/performancewest/logs/.
2. IP reputation alert (20:00 cron) still referenced the removed rehab pool
   (.91-.93) and used 8.8.8.8 for Spamhaus (which returns the open-resolver
   error, not a real answer). Dropped the rehab section, relabeled to the two
   live IPs (.94/.107), and switched the DNSBL check to Control D (76.76.2.0)
   which returns real Spamhaus ZEN data. (It was correctly SILENT lately
   because delivery is healthy -- silent-on-healthy is by design.)
3. DMARC daily digest was pure noise: it alerted on ANY external IP with >=20
   failing msgs, but those are legit recipient-side forwarders/security
   gateways (inkyphishfence, cloud-sec-av, Proofpoint, Mimecast, ...) that
   re-send our mail and naturally break SPF/DKIM alignment -- benign under
   p=reject. Added PTR-based forwarder detection (FORWARDER_PTR_HINTS) so the
   digest tags them [fwd] and only alerts on (a) OUR IP <95% pass or (b) an
   UNKNOWN non-forwarder external IP with >=100 failing msgs = real spoofing.

Verified: all 4 currently-flagged external IPs now classify as forwarder=True.
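
The alert rule from item 3 can be sketched roughly as below. This is an illustrative shell sketch, not the project's code. The contents of the hint list, the function names is_forwarder and should_alert, and the argument layout are assumptions. Only the FORWARDER_PTR_HINTS name, the vendor names, and the 95% / 100-message thresholds come from the commit message.

```shell
#!/bin/sh
# Hypothetical hint list: the real FORWARDER_PTR_HINTS contents are not shown
# in the commit, so these substrings are taken from the vendor names it mentions.
FORWARDER_PTR_HINTS="inkyphishfence cloud-sec-av proofpoint mimecast"

# is_forwarder PTR_NAME
# Succeeds if the PTR name contains any hint (case-insensitive).
# The PTR name could come from something like: dig +short -x "$ip"
is_forwarder() {
    ptr=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    for hint in $FORWARDER_PTR_HINTS; do
        case "$ptr" in
            *"$hint"*) return 0 ;;
        esac
    done
    return 1
}

# should_alert OURS PASS_PCT FAIL_COUNT PTR_NAME
# OURS is "yes" for our own sending IPs; PASS_PCT is an integer percentage.
should_alert() {
    ours="$1"; pass_pct="$2"; fails="$3"; ptr_name="$4"
    if [ "$ours" = "yes" ]; then
        # (a) our own IP: alert when the pass rate drops below 95%
        if [ "$pass_pct" -lt 95 ]; then return 0; else return 1; fi
    fi
    # Known forwarders / security gateways are tagged [fwd] and never alert
    if is_forwarder "$ptr_name"; then
        return 1
    fi
    # (b) unknown external IP: alert only at >=100 failing messages
    if [ "$fails" -ge 100 ]; then return 0; else return 1; fi
}
```

An IP with no PTR record passes an empty name, which matches no hint and is therefore treated as unknown; whether the real script handles that case the same way is not stated in the commit.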
2026-06-24 06:28:50 -05:00

# Daily warmup IP-reputation check + Telegram alert. Runs 20:00 Central (after
# the day's sends complete), alerts ONLY on a problem (delivery below 65% or
# >150 spam/policy 550-5.7.1 blocks); healthy days stay silent. Logs every run
# to /opt/performancewest/logs/pw-warmup-healthcheck.log. Script: infra/monitoring/pw-warmup-tg-alert.sh
# -> /usr/local/bin/pw-warmup-tg-alert. Reads TELEGRAM_BOT_TOKEN/CHAT_ID from
# /opt/performancewest/.env.
0 20 * * * deploy /usr/local/bin/pw-warmup-tg-alert >> /opt/performancewest/logs/pw-warmup-healthcheck.log 2>&1
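
The redirect target on the line above matters because the shell opens it before it runs the command: if the open fails, the command is skipped entirely. The sketch below reproduces that behavior in isolation. The path /nonexistent-dir and the helper name log_dir_writable are made up for illustration, and a missing directory stands in for the permission failure described in the commit message.

```shell
#!/bin/sh
# Sketch: a failed ">>" redirect stops the command from running at all.
# /nonexistent-dir stands in for any log path the cron user cannot write to.

marker=$(mktemp)   # scratch file used only to detect whether the command ran
rm -f "$marker"

# Same shape as a cron command line: COMMAND >> LOG 2>&1
# The outer 2>/dev/null only hides the inner shell's "cannot create" message.
sh -c "touch '$marker' >> /nonexistent-dir/job.log 2>&1" 2>/dev/null
status=$?

if [ -e "$marker" ]; then ran=yes; else ran=no; fi
echo "exit status: $status, command ran: $ran"
rm -f "$marker"

# log_dir_writable DIR (hypothetical preflight helper)
# Succeeds only if DIR exists and the current user can create files in it.
log_dir_writable() {
    [ -d "$1" ] && [ -w "$1" ] && [ -x "$1" ]
}
```

The marker file is never created, so the command did not run, and the exit status is non-zero. A preflight check along the lines of log_dir_writable /opt/performancewest/logs, run as the deploy user, would be one way to catch this class of failure before a cron entry is installed.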