fix(monitoring): repair both dead mail-alert crons + de-noise DMARC digest

Three bugs the owner hit:
1. Per-operator reputation alert (06:10 cron, mail_reputation_monitor --alert)
   silently never ran: it redirected to /var/log/pw-mail-reputation.log but
   /var/log is root-only and that file was never pre-created, so the deploy
   user's >> redirect failed and the shell never ran the command. Repointed
   both mail-alert crons to deploy-writable /opt/performancewest/logs/.
2. IP reputation alert (20:00 cron) still referenced the removed rehab pool
   (.91-.93) and used 8.8.8.8 for Spamhaus (which returns the open-resolver
   error, not a real answer). Dropped the rehab section, relabeled to the two
   live IPs (.94/.107), and switched the DNSBL check to Control D (76.76.2.0)
   which returns real Spamhaus ZEN data. (It was correctly SILENT lately
   because delivery is healthy -- silent-on-healthy is by design.)
3. DMARC daily digest was pure noise: it alerted on ANY external IP with >=20
   failing msgs, but those are legit recipient-side forwarders/security
   gateways (inkyphishfence, cloud-sec-av, Proofpoint, Mimecast, ...) that
   re-send our mail and naturally break SPF/DKIM alignment -- benign under
   p=reject. Added PTR-based forwarder detection (FORWARDER_PTR_HINTS) so the
   digest tags them [fwd] and only alerts on (a) OUR IP <95% pass over >=20
   msgs or (b) an UNKNOWN non-forwarder external IP with >=100 failing msgs
   (= possible spoofing).
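
Illustration for bug 2 (a sketch, not the monitor's actual code, which is not
part of this diff): Spamhaus answers queries that arrive via a public/open
resolver with a code in 127.255.255.0/24 instead of a listing, so a checker
should treat those as "no usable answer" rather than as clean or listed. The
helper names below are hypothetical; the return codes are the ones Spamhaus
documents, to the best of my knowledge.

```python
import ipaddress

ZEN_ZONE = "zen.spamhaus.org"

# Codes in 127.255.255.0/24 report a problem with the query, not a listing.
ZEN_ERROR_CODES = {
    "127.255.255.252": "typo in DNSBL name",
    "127.255.255.254": "query via public/open resolver",
    "127.255.255.255": "excessive number of queries",
}


def zen_query_name(ip: str) -> str:
    """Build the DNSBL query name: reversed IPv4 octets + the ZEN zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + ZEN_ZONE


def classify_zen_answers(answers: list[str]) -> str:
    """Interpret the A records returned for a ZEN query (hypothetical helper).

    An empty list stands for NXDOMAIN, i.e. the IP is not listed.
    """
    if not answers:
        return "clean"
    for answer in answers:
        if answer in ZEN_ERROR_CODES:
            return "error: " + ZEN_ERROR_CODES[answer]
    return "listed"
```

With this split, the resolver choice (76.76.2.0 in this commit) only affects
how the answers are fetched; an "error: ..." result means the resolver needs
changing, not that the IP is healthy.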

Verified: all 4 currently-flagged external IPs now classify as forwarder=True.
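
A check like the one above could be reproduced offline along these lines. This
is a minimal sketch under stated assumptions: the PTR lookup is injected so no
DNS is needed, the IPs are documentation addresses, and the hostnames are
made up for illustration (they are NOT the 4 IPs from the real digest). Only a
subset of FORWARDER_PTR_HINTS is repeated here.

```python
from typing import Callable

# Subset of the commit's FORWARDER_PTR_HINTS, repeated for a self-contained demo.
HINTS = ("inkyphishfence", "cloud-sec-av", "proofpoint", "pphosted", "mimecast")


def looks_like_forwarder(ip: str, lookup: Callable[[str], str]) -> bool:
    """Same substring test as is_known_forwarder(), with the lookup injected."""
    ptr = lookup(ip).lower()
    return bool(ptr) and any(hint in ptr for hint in HINTS)


# Hypothetical PTR records keyed by documentation-range IPs.
FAKE_PTRS = {
    "192.0.2.10": "mx0a-00000000.PPHOSTED.example",
    "198.51.100.7": "host-7.isp.example",
}


def fake_lookup(ip: str) -> str:
    """Stand-in for reverse_dns(): empty string when there is no PTR."""
    return FAKE_PTRS.get(ip, "")


for ip in ("192.0.2.10", "198.51.100.7", "203.0.113.1"):
    print(ip, "forwarder =", looks_like_forwarder(ip, fake_lookup))
```

An IP with no PTR at all is never treated as a forwarder, so it stays eligible
for the spoofing alert.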
justin 2026-06-24 06:28:50 -05:00
parent c20edb28cd
commit ae68edbc58
4 changed files with 74 additions and 31 deletions

@@ -86,6 +86,47 @@ def is_ours(ip: str) -> bool:
     return any(addr in net for net in OUR_NETS)
+# Reverse-DNS substrings that identify a LEGIT forwarder / recipient-side mail
+# security gateway. These re-send our mail from their own IP, which naturally
+# breaks SPF/DKIM alignment -> the forwarded copy "fails" DMARC. That is benign
+# (the ORIGINAL was already delivered+aligned; our p=reject only drops the
+# forwarded duplicate). We must NOT alert on these or the digest is pure noise.
+# Matched case-insensitively against the source IP's PTR record.
+FORWARDER_PTR_HINTS = (
+    "inkyphishfence", "cloud-sec-av", "proofpoint", "pphosted", "ppe-hosted",
+    "mimecast", "barracuda", "messagelabs", "symanteccloud", "fireeyecloud",
+    "trendmicro", "mailcontrol", "forcepoint", "cisco", "iphmx",  # Cisco ESA
+    "mxlogic", "mailprotect", "emailsrvr", "godaddy", "secureserver",
+    "outlook.com", "protection.outlook", "google.com", "googlemail",
+    "amazonses", "sendgrid", "mailgun", "mcsv.net", "mailchimp",
+    "fastmail", "messagingengine", "zoho", "mailroute", "spamh",
+    "antispamcloud", "mailspamprotection", "fortimail", "sophos",
+)
+_ptr_cache: dict[str, str] = {}
+def reverse_dns(ip: str) -> str:
+    """Best-effort PTR lookup (cached). Empty string on failure."""
+    if ip in _ptr_cache:
+        return _ptr_cache[ip]
+    ptr = ""
+    try:
+        import socket
+        ptr = socket.gethostbyaddr(ip)[0].lower()
+    except Exception:
+        ptr = ""
+    _ptr_cache[ip] = ptr
+    return ptr
+def is_known_forwarder(ip: str) -> bool:
+    """True if the IP's PTR looks like a legit forwarder / security gateway, so
+    DMARC failures from it are benign (forwarded mail, not spoofing)."""
+    ptr = reverse_dns(ip)
+    return any(h in ptr for h in FORWARDER_PTR_HINTS) if ptr else False
 # ── attachment extraction ─────────────────────────────────────────────────────
 def extract_xml(payload: bytes, filename: str) -> bytes | None:
     """Decompress a DMARC report attachment to raw XML bytes."""
@@ -280,13 +321,23 @@ def summarize(conn, days: int = 7) -> tuple[str, list[str]]:
             continue
         pass_pct = round(100 * passed / total)
         ours = is_ours(ip)
-        tag = "ours" if ours else "EXTERNAL"
+        if ours:
+            tag = "ours"
+        elif failed > 0 and is_known_forwarder(ip):
+            tag = "fwd" # legit forwarder / security gateway -- failures benign
+        else:
+            tag = "EXTERNAL"
         lines.append(f" {ip:<16} [{tag:<8}] total={total:<6} pass={pass_pct}% fail={failed}")
-        # Alerts: our IP failing alignment, OR an external IP sending as us at volume.
+        # Alert ONLY on genuinely actionable cases:
+        # 1. OUR OWN IP failing alignment = a real auth/config break we must fix.
+        # 2. An UNKNOWN external IP (not ours, not a recognized forwarder) sending
+        #    as us at high volume = possible spoofing. Recognized forwarders
+        #    (Proofpoint/Mimecast/Inky/etc. re-sending our mail) naturally fail
+        #    SPF/DKIM alignment and are filtered out -- they were the digest noise.
         if ours and pass_pct < 95 and total >= 20:
             problems.append(f"{ip} (ours): only {pass_pct}% DMARC pass ({failed}/{total} fail) -- alignment broken")
-        if not ours and failed >= 20:
-            problems.append(f"{ip} (EXTERNAL): {failed} failing msgs sending as us -- possible spoofing")
+        elif tag == "EXTERNAL" and failed >= 100:
+            problems.append(f"{ip} (EXTERNAL, PTR={reverse_dns(ip) or 'none'}): {failed} failing msgs sending as us -- possible spoofing")
     return "\n".join(lines), problems
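
The new alert rule in summarize() can be restated as a small pure function,
which makes the thresholds easy to check in isolation. This is only a sketch:
alert_for() is a hypothetical name that does not exist in the commit, and it
assumes passed = total - failed, which the visible hunk does not confirm.

```python
from typing import Optional


def alert_for(tag: str, total: int, failed: int) -> Optional[str]:
    """Return an alert reason for one source IP, or None when nothing is actionable.

    tag is "ours", "fwd", or "EXTERNAL", as assigned in summarize().
    Assumption: every message either passes or fails, so passed = total - failed.
    """
    if total <= 0:
        return None
    pass_pct = round(100 * (total - failed) / total)
    # Case 1: our own IP failing alignment on a meaningful sample.
    if tag == "ours" and pass_pct < 95 and total >= 20:
        return "alignment broken"
    # Case 2: unknown external sender failing at volume.
    if tag == "EXTERNAL" and failed >= 100:
        return "possible spoofing"
    # Recognized forwarders ("fwd") and low-volume sources stay silent.
    return None


print(alert_for("ours", 100, 10))
print(alert_for("fwd", 500, 500))
print(alert_for("EXTERNAL", 150, 120))
```

Under these assumptions a forwarder never alerts regardless of volume, and an
external IP with fewer than 100 failing messages is listed in the digest but
does not raise a problem.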