Commit graph

3 commits

Author SHA1 Message Date
justin
ae68edbc58 fix(monitoring): repair both dead mail-alert crons + de-noise DMARC digest
Three bugs the owner hit:
1. Per-operator reputation alert (06:10 cron, mail_reputation_monitor --alert)
   silently never ran: it redirected to /var/log/pw-mail-reputation.log, but
   /var/log is writable only by root and that file was never pre-created, so
   the deploy user's >> redirect failed and the shell aborted the cron job
   before running the command. Repointed both mail-alert crons to the
   deploy-writable /opt/performancewest/logs/.
2. IP reputation alert (20:00 cron) still referenced the removed rehab pool
   (.91-.93) and used 8.8.8.8 for Spamhaus (which returns the open-resolver
   error, not a real answer). Dropped the rehab section, relabeled to the two
   live IPs (.94/.107), and switched the DNSBL check to Control D (76.76.2.0)
   which returns real Spamhaus ZEN data. (It was correctly SILENT lately
   because delivery is healthy -- silent-on-healthy is by design.)
3. DMARC daily digest was pure noise: it alerted on ANY external IP with >=20
   failing msgs, but those are legit recipient-side forwarders/security
   gateways (inkyphishfence, cloud-sec-av, Proofpoint, Mimecast, ...) that
   re-send our mail and naturally break SPF/DKIM alignment -- benign under
   p=reject. Added PTR-based forwarder detection (FORWARDER_PTR_HINTS) so the
   digest tags them [fwd] and only alerts on (a) OUR IP <95% pass or (b) an
   UNKNOWN non-forwarder external IP with >=100 failing msgs = real spoofing.
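
A minimal sketch of the DNSBL handling described in item 2, assuming the check builds the usual reversed-octet query name and inspects the returned A records. The function names, and the full address 207.174.124.94 used in the usage note, are illustrative assumptions, not taken from the script. The DNS lookup itself (against the resolver at 76.76.2.0) is left out. The 127.255.255.x values are the error codes Spamhaus documents, which is why an answer obtained through an open resolver is not a real listing result.

```python
import ipaddress

# Spamhaus reserves 127.255.255.0/24 for error codes, not listings.
SPAMHAUS_ERRORS = {
    "127.255.255.252": "typing error in DNSBL name",
    "127.255.255.254": "query via public/open resolver",
    "127.255.255.255": "excessive number of queries",
}


def zen_query_name(ip: str) -> str:
    """Build the reversed-octet name to look up in Spamhaus ZEN (IPv4 only)."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + ".zen.spamhaus.org"


def classify_zen_answer(answers):
    """Interpret the A records returned for a ZEN query.

    `answers` is a list of dotted-quad strings; an empty list stands for
    NXDOMAIN, which means the IP is not listed.
    """
    if not answers:
        return ("clean", [])
    errors = [
        SPAMHAUS_ERRORS.get(a, "unknown error code")
        for a in answers
        if a.startswith("127.255.255.")
    ]
    if errors:
        # The resolver was refused; this says nothing about the IP itself.
        return ("error", errors)
    return ("listed", sorted(answers))
```

With these helpers, `zen_query_name("207.174.124.94")` yields `94.124.174.207.zen.spamhaus.org`, and an answer of `127.255.255.254` is reported as an error rather than as a listing.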

Verified: all 4 currently-flagged external IPs now classify as forwarder=True.
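
The forwarder tagging and alert rules from item 3 can be sketched roughly as follows. Only the name FORWARDER_PTR_HINTS and the thresholds (95% pass, 100 failing msgs) come from this commit message. The hint substrings, the helper names, and the substring-matching approach are assumptions made for illustration.

```python
import socket

# Hypothetical hint list; the real FORWARDER_PTR_HINTS contents are not shown
# in the commit message.
FORWARDER_PTR_HINTS = ("inkyphishfence", "cloud-sec-av", "proofpoint", "mimecast")


def lookup_ptr(ip):
    """Return the PTR hostname for an IP, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def is_forwarder(ptr):
    """True if the PTR hostname contains one of the forwarder hints."""
    if not ptr:
        return False
    host = ptr.lower().rstrip(".")
    return any(hint in host for hint in FORWARDER_PTR_HINTS)


def should_alert(is_ours, forwarder, passed, failed):
    """Apply the two alert rules: (a) our IP under 95% pass, or
    (b) an unknown non-forwarder external IP with >=100 failing msgs."""
    total = passed + failed
    if is_ours:
        return total > 0 and passed / total < 0.95
    if forwarder:
        return False  # tagged [fwd]; alignment breakage here is expected
    return failed >= 100
```

Under these assumptions, an external IP whose PTR matches a hint never alerts regardless of volume, while an unmatched external IP alerts once it reaches 100 failing msgs.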
2026-06-24 06:28:50 -05:00
justin
707d538847 mail: DMARC parser — classify whole 207.174.124.0/24 as ours (warmup pool)
First live ingest (28 reports) showed our warmup rotation pool (.91-.109, out0x)
mislabeled EXTERNAL because OUR_IPS listed only 4 specific IPs. Every pool IP was
100% DMARC-passing and clearly ours, and the mislabel would have generated false
spoofing alerts.
Replace the literal-IP set with an ipaddress subnet check on 207.174.124.0/24
(our whole block). The only genuinely-external failing sender is 35.174.145.124
(AWS, 32 msgs spoofing us, SPF-fail/no-DKIM, all correctly rejected by p=reject) --
exactly the signal the --alert path is meant to surface.
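
A minimal sketch of the subnet check, assuming it uses the standard-library ipaddress module as described. The names OUR_NET and is_ours are illustrative and may not match the parser.

```python
import ipaddress

# The whole block is treated as ours, replacing the literal-IP set.
OUR_NET = ipaddress.ip_network("207.174.124.0/24")


def is_ours(ip: str) -> bool:
    """True if the source IP falls inside our /24; malformed input is not ours."""
    try:
        return ipaddress.ip_address(ip) in OUR_NET
    except ValueError:
        return False
```

Any address in the warmup pool (.91-.109) now classifies as ours without being listed individually, while 35.174.145.124 stays external.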
2026-06-19 08:54:41 -05:00
justin
8e5590b492 mail: DMARC aggregate-report parser + dedicated dmarc@ mailbox ingestion
Tool 2 of the deliverability monitoring pair (Tool 1 = mail_reputation_monitor).
DMARC rua reports from dozens of operators (Google, Yahoo, Comcast, Cox, Bell,
Mimecast, Cisco ESA, GMX, mail.com, ...) were landing in ops@ (dmarc@ was a DL),
burying real mail and never parsed. Now ingested + queryable:

- dmarc@performancewest.net converted DL -> dedicated Carbonio mailbox; isolated
  IMAP creds in server .env, surfaced to workers in docker-compose.yml (mirrors
  OPS_IMAP_*). 29 historical reports moved ops@ -> dmarc@ via IMAP.
- scripts/dmarc_report_parser.py: IMAP fetch unseen -> decompress .gz/.zip/.xml
  (namespace-agnostic: classic + urn:ietf:params:xml:ns:dmarc-2.0 GMX/mail.com) ->
  parse aggregate XML -> upsert dmarc_report (keyed (org_name,report_id), no-op on
  re-parse) + dmarc_record per source IP. dmarc_pass = dkim_aligned OR spf_aligned.
  Marks \Seen. --dry-run/--all/--alert (7d per-IP summary + Telegram if one of OUR
  IPs <95% pass, or EXTERNAL IP sends >=20 failing msgs as us = spoofing under
  p=reject). psycopg2 imported lazily so --dry-run runs without the driver.
- api/migrations/102_dmarc_aggregate.sql: dmarc_report + dmarc_record tables.
- infra/cron/pw-dmarc-parser: 06:20 UTC daily --alert (after reputation, before scrub).
- docs/deliverability.md: DMARC section DONE; query examples.
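
The decompress and parse steps of the parser could look roughly like the sketch below. It is not the code in scripts/dmarc_report_parser.py: the function names are hypothetical, compression is detected from magic bytes rather than from the attachment extension, and namespaces are handled by stripping them from every tag so that classic and urn:ietf:params:xml:ns:dmarc-2.0 reports parse the same way. IMAP fetching and the database upsert are omitted.

```python
import gzip
import io
import xml.etree.ElementTree as ET
import zipfile


def decompress(payload: bytes) -> bytes:
    """Return raw XML bytes from a .gz, .zip, or plain .xml attachment."""
    if payload[:2] == b"\x1f\x8b":  # gzip magic
        return gzip.decompress(payload)
    if payload[:4] == b"PK\x03\x04":  # zip local-file header
        with zipfile.ZipFile(io.BytesIO(payload)) as zf:
            return zf.read(zf.namelist()[0])  # assumes one XML per archive
    return payload


def strip_namespaces(root):
    """Drop any '{namespace}' prefix so lookups work for both report flavours."""
    for el in root.iter():
        el.tag = el.tag.rsplit("}", 1)[-1]
    return root


def parse_records(xml_bytes: bytes):
    """Yield one dict per <record>, i.e. per source IP row in the report."""
    root = strip_namespaces(ET.fromstring(xml_bytes))
    records = []
    for rec in root.iter("record"):
        row = rec.find("row")
        evaluated = row.find("policy_evaluated")
        dkim = (evaluated.findtext("dkim") or "").strip().lower() == "pass"
        spf = (evaluated.findtext("spf") or "").strip().lower() == "pass"
        records.append({
            "source_ip": row.findtext("source_ip"),
            "count": int(row.findtext("count")),
            "dkim_aligned": dkim,
            "spf_aligned": spf,
            "dmarc_pass": dkim or spf,  # either aligned result is enough
        })
    return records
```

Stripping namespaces up front keeps the lookups as plain tag names, at the cost of discarding which schema version a report used.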

Verified: dry-run --all parses all 28 reports (1 non-report test probe), 0 unknown
after the namespace fix.
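
The "no-op on re-parse" behaviour of the dmarc_report upsert can be illustrated with a conflict-ignoring insert keyed on (org_name, report_id). This sketch uses an in-memory SQLite table as a stand-in so it runs without Postgres or psycopg2. The two-column schema and the helper names are assumptions, and the real table in 102_dmarc_aggregate.sql certainly has more columns.

```python
import sqlite3

# Hypothetical, reduced schema: only the key columns named in the commit.
SCHEMA = """
CREATE TABLE dmarc_report (
    org_name  TEXT NOT NULL,
    report_id TEXT NOT NULL,
    PRIMARY KEY (org_name, report_id)
)
"""


def store_report(conn, org_name, report_id):
    """Insert a report once; parsing the same report again changes nothing."""
    conn.execute(
        "INSERT INTO dmarc_report (org_name, report_id) VALUES (?, ?) "
        "ON CONFLICT (org_name, report_id) DO NOTHING",
        (org_name, report_id),
    )
    conn.commit()


def report_count(conn):
    """Number of distinct reports stored so far."""
    return conn.execute("SELECT COUNT(*) FROM dmarc_report").fetchone()[0]
```

Keying on the pair rather than on report_id alone allows two operators to reuse the same report ID without colliding.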
2026-06-19 08:50:20 -05:00