diff --git a/docs/campaign-deliverability-plan.md b/docs/campaign-deliverability-plan.md
new file mode 100644
index 0000000..8fa0159
--- /dev/null
+++ b/docs/campaign-deliverability-plan.md
@@ -0,0 +1,94 @@
+# Campaign Deliverability — Diagnosis & List-Verification Plan
+
+_Created 2026-06-17 after trucking conversions went to zero._
+
+## TL;DR
+
+Trucking conversions stopped on **June 9** not because campaigns stopped sending
+(they send ~2,400/day with ~1,800 opens/3 days) but because a **filter bug was
+blasting ~438k dead `mx_unreachable` domains**, producing a **~47% hard-bounce
+rate (~1,100/day)** that blocklisted **half the 120k subscriber base** and
+torched sender reputation, so real prospects never saw the offer.
+
+- **Fixed** (`build_trucking_campaigns.py`): send filter now keys only off
+  `email_verify_result` (never the broken `email_verified` boolean), and defaults
+  to **recovery mode = `smtp_valid` only** until reputation recovers. Set
+  `CAMPAIGN_INCLUDE_CATCH_ALL=1` to re-add catch-all domains afterward.
+- **Healthcare is fine** — separate instance (`listmonk-hc` / DB `listmonk_hc`),
+  cleaned list (`clean_hc_warmup_list.py` already drops `mx_unreachable`), bounce
+  rate ~2-3%. No change needed; it proves the fix is correct.
+
+## Why the SMTP-probe verification under-counts deliverable addresses
+
+`email_verifier.py` does syntax → MX → SMTP `RCPT TO`. Results:
+
+| result | count | sendable? | why |
+|---|---|---|---|
+| `catch_all_domain` | 1,082,817 | risky | domain accepts ALL rcpts at SMTP time, then may bounce later |
+| `mx_unreachable` | 438,163 | **NO** | MX exists but never answered the probe — **hard-bounces on real send** |
+| `smtp_valid` | 11,774 | **YES** | an MX explicitly accepted this exact mailbox |
+| `no_mx_records` / `invalid_syntax` / `smtp_rejected_550` | ~46k | no | dead |
+
+The probe can only *confirm* a mailbox on non-catch-all domains that answer the
+RCPT handshake — which is a small slice. Only **~3,042 `smtp_valid` are still
+unsent**, so recovery mode will exhaust the clean pool in ~1 day. **We need a way
+to grow the verified-deliverable list without burning PW's reputation.**
+
+## The real fix: burner-domain bounce verification
+
+SMTP-probe verification is unreliable (catch-alls mask validity; many MTAs refuse
+probes but accept real mail). The only ground truth is **actually send a message
+and see if it bounces.** But doing that from PW's domain is what got us here. So:
+
+### Design
+
+1. **Dedicated throwaway verification domain** (NOT performancewest.net and NOT
+   carrierone.com — both are reputation assets we must protect). Register a cheap
+   neutral `.com` via Porkbun (we already have the Porkbun integration). Give it
+   its own SPF/DKIM/DMARC and a dedicated sending IP/identity (separate postfix
+   instance or a transactional provider sub-account that isolates reputation).
+
+2. **Send a low-key, CAN-SPAM-compliant, non-commercial verification email** to
+   the unverified pool (e.g. a plain "is this the right contact for ?" or a
+   bland newsletter-style note with a working unsubscribe). It must be a real,
+   legitimate message — never deceptive — but its ONLY purpose is to elicit a
+   delivered-vs-bounced signal. Throttled and warmed like any send.
+
+3. **Catch bounces from that domain's own MTA log** (reuse `bounce-watcher.sh`'s
+   `status=bounced` tail pattern) and **write the result back to
+   `fmcsa_carriers.email_verify_result`**:
+   - delivered (no bounce within N hours) → upgrade to a new `send_confirmed`
+     result that the PW campaign filter treats as sendable.
+   - hard-bounced → mark `hard_bounced`, permanently excluded from PW sends.
+
+4. **PW campaigns then send only to `smtp_valid` + `send_confirmed`** — addresses
+   proven deliverable by a real send — keeping PW's bounce rate near zero.
+
+### Why a separate domain/IP
+
+Reputation is per sending-domain + per-IP. If the burner domain gets blocklisted
+from the inevitable bounces during scrubbing, **PW and carrierone are untouched.**
+The burner is disposable: if it burns, rotate to a new one. PW only ever sends to
+the cleaned output.
+
+### Compliance guardrails (must-haves)
+
+- Real **CAN-SPAM** compliance: truthful from/subject, physical address, working
+  one-click unsubscribe, honor opt-outs immediately (sync opt-outs back to PW's
+  suppression list too).
+- **Not deceptive**: the email is a genuine message (these are public FMCSA
+  business contacts for B2B outreach), not a fake/pretext. The bounce signal is a
+  byproduct, not a trick.
+- Suppress anyone who ever bounced or opted out from ALL future sends (burner and
+  PW).
+
+## Status / next steps
+
+- [x] Fix the PW trucking send filter (drop `mx_unreachable`; recovery mode).
+- [x] Confirm healthcare unaffected.
+- [ ] Add `send_confirmed` / `hard_bounced` result handling to the campaign
+      filter + a writeback path from bounce processing.
+- [ ] Stand up the burner verification domain + isolated MTA identity.
+- [ ] Build the verification-send + bounce-writeback worker.
+- [ ] Re-verify the `catch_all_domain` + `mx_unreachable` pools through the burner
+      to grow the PW-sendable list.
diff --git a/scripts/build_trucking_campaigns.py b/scripts/build_trucking_campaigns.py
index b255c1d..ab36fb9 100644
--- a/scripts/build_trucking_campaigns.py
+++ b/scripts/build_trucking_campaigns.py
@@ -339,11 +339,13 @@ REPLY_TO_HEADERS = [{"name": "Reply-To", "value": REPLY_TO_EMAIL}]
 # blocklisted). So 'mx_unreachable' and all error/reject results are excluded.
 #
 # Recovery mode (default ON while reputation is damaged): send ONLY 'smtp_valid'
-# — addresses an MX explicitly accepted at RCPT time — to drive the bounce rate
-# to near-zero and rebuild sender reputation. Once recovered, set
-# CAMPAIGN_INCLUDE_CATCH_ALL=1 to re-add catch-all domains (which accept at SMTP
-# time but can still bounce later, so they stay out during recovery).
-_SENDABLE_RESULTS = ["smtp_valid"]
+# — addresses an MX explicitly accepted at RCPT time — plus 'send_confirmed'
+# (addresses proven deliverable by a real burner-domain verification send; see
+# docs/campaign-deliverability-plan.md). This drives the bounce rate to near-zero
+# and rebuilds sender reputation. Once recovered, set CAMPAIGN_INCLUDE_CATCH_ALL=1
+# to re-add catch-all domains (which accept at SMTP time but can still bounce
+# later, so they stay out during recovery). 'hard_bounced' is NEVER sendable.
+_SENDABLE_RESULTS = ["smtp_valid", "send_confirmed"]
 if os.getenv("CAMPAIGN_INCLUDE_CATCH_ALL", "0") not in ("0", "false", ""):
     _SENDABLE_RESULTS += ["catch_all_domain", "catch_all_detected"]
 USABLE_FILTER = (
diff --git a/scripts/burner_list_verify.py b/scripts/burner_list_verify.py
new file mode 100644
index 0000000..dc5714b
--- /dev/null
+++ b/scripts/burner_list_verify.py
@@ -0,0 +1,157 @@
+#!/usr/bin/env python3
+"""Burner-domain list verification: write deliverability back to fmcsa_carriers.
+
+The SMTP-probe verifier (email_verifier.py) can't tell which catch-all /
+mx_unreachable addresses actually deliver. The only ground truth is a REAL send.
+We do that from a disposable burner sending domain (NOT performancewest.net /
+carrierone.com — see docs/campaign-deliverability-plan.md) so the inevitable
+bounces never touch PW's reputation. This script reconciles that send:
+
+  1. Scan the burner MTA's mail.log for messages FROM the burner sender.
+  2. Any recipient that hard-bounced -> fmcsa_carriers.email_verify_result =
+     'hard_bounced' (permanently excluded from PW campaigns).
+  3. Any recipient that was DELIVERED (status=sent, no later bounce) and is not
+     already smtp_valid -> 'send_confirmed' (proven deliverable; the PW
+     campaign filter treats smtp_valid + send_confirmed as sendable).
+
+Idempotent: only upgrades 'catch_all_*' / 'mx_unreachable' / NULL rows to
+'send_confirmed', and only sets 'hard_bounced' on a real bounce. Never downgrades
+an already-confirmed address except to mark a genuine bounce.
+
+Usage:
+    python3 -m scripts.burner_list_verify --log /var/log/burner-mail.log
+    python3 -m scripts.burner_list_verify --log mail.log --dry-run
+"""
+from __future__ import annotations
+
+import argparse
+import os
+import re
+import sys
+
+import psycopg2
+
+DATABASE_URL = os.getenv("DATABASE_URL", "")
+
+# Sender(s) used by the burner verification campaign. Override via env when the
+# burner domain is provisioned (e.g. BURNER_SENDERS="verify@listcheck-xyz.com").
+BURNER_SENDERS = {
+    s.strip().lower()
+    for s in os.getenv("BURNER_SENDERS", "").split(",")
+    if s.strip()
+}
+
+QID_RE = re.compile(r"postfix/\w+\[\d+\]: ([A-Z0-9]+):")
+FROM_RE = re.compile(r"from=<([^>]*)>")
+TO_RE = re.compile(r"to=<([^>]*)>")
+STATUS_RE = re.compile(r"status=(\w+)")
+
+# Results we are allowed to UPGRADE to 'send_confirmed'. We never overwrite an
+# explicit smtp_valid (already best) or a hard_bounced (worse signal wins).
+UPGRADABLE = ("catch_all_domain", "catch_all_detected", "mx_unreachable",
+              "smtp_temp_error", "smtp_unknown_451", "smtp_unknown_450")
+
+
+def scan_log(log_path: str) -> tuple[set[str], set[str]]:
+    """Return (delivered_emails, bounced_emails) for burner-sender messages."""
+    if not BURNER_SENDERS:
+        print("ERROR: set BURNER_SENDERS (e.g. verify@your-burner-domain.com)",
+              file=sys.stderr)
+        return set(), set()
+
+    burner_qids: set[str] = set()
+    qid_rcpt: dict[str, str] = {}
+    delivered: set[str] = set()
+    bounced: set[str] = set()
+
+    with open(log_path, errors="ignore") as f:
+        for line in f:
+            qm = QID_RE.search(line)
+            if not qm:
+                continue
+            qid = qm.group(1)
+
+            fm = FROM_RE.search(line)
+            if fm and fm.group(1).lower() in BURNER_SENDERS:
+                burner_qids.add(qid)
+
+            tm = TO_RE.search(line)
+            sm = STATUS_RE.search(line)
+            if tm and sm and qid in burner_qids:
+                rcpt = tm.group(1).lower()
+                qid_rcpt[qid] = rcpt
+                status = sm.group(1).lower()
+                if status == "bounced":
+                    bounced.add(rcpt)
+                elif status == "sent":
+                    delivered.add(rcpt)
+
+    # A bounce anywhere wins over a "sent" (deferred-then-bounced).
+    delivered -= bounced
+    return delivered, bounced
+
+
+def writeback(delivered: set[str], bounced: set[str], dry_run: bool = False) -> dict:
+    """Apply send_confirmed / hard_bounced to fmcsa_carriers."""
+    stats = {"confirmed": 0, "bounced": 0}
+    if not (delivered or bounced):
+        return stats
+    conn = psycopg2.connect(DATABASE_URL)
+    try:
+        with conn.cursor() as cur:
+            # Hard bounces: always mark (worst signal wins), excludes from PW sends.
+            for email in bounced:
+                if dry_run:
+                    stats["bounced"] += 1
+                    continue
+                cur.execute(
+                    """UPDATE fmcsa_carriers
+                       SET email_verify_result = 'hard_bounced',
+                           email_verified = FALSE
+                       WHERE lower(email_address) = %s
+                         AND email_verify_result IS DISTINCT FROM 'hard_bounced'""",
+                    (email,),
+                )
+                stats["bounced"] += cur.rowcount
+            # Delivered: upgrade soft/unknown results to send_confirmed.
+            for email in delivered:
+                if dry_run:
+                    stats["confirmed"] += 1
+                    continue
+                cur.execute(
+                    """UPDATE fmcsa_carriers
+                       SET email_verify_result = 'send_confirmed',
+                           email_verified = TRUE
+                       WHERE lower(email_address) = %s
+                         AND (email_verify_result IN %s OR email_verify_result IS NULL)""",
+                    (email, UPGRADABLE),
+                )
+                stats["confirmed"] += cur.rowcount
+        if not dry_run:
+            conn.commit()
+    finally:
+        conn.close()
+    return stats
+
+
+def main() -> int:
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--log", default="/var/log/burner-mail.log",
+                    help="burner MTA mail.log to scan")
+    ap.add_argument("--dry-run", action="store_true")
+    args = ap.parse_args()
+
+    if not os.path.exists(args.log):
+        print(f"log not found: {args.log}", file=sys.stderr)
+        return 1
+    delivered, bounced = scan_log(args.log)
+    print(f"burner scan: {len(delivered)} delivered, {len(bounced)} bounced")
+    stats = writeback(delivered, bounced, dry_run=args.dry_run)
+    tag = "[dry-run] " if args.dry_run else ""
+    print(f"{tag}writeback: send_confirmed +{stats['confirmed']}, "
+          f"hard_bounced +{stats['bounced']}")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
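
Review note: the classification rules in `scan_log` can be exercised without a live MTA or database. The sketch below is a minimal, self-contained approximation that reuses the four regexes from `scripts/burner_list_verify.py` and runs them over an in-memory log excerpt. The function name `classify`, the sample log lines, and every address in them are invented for illustration; real Postfix output may differ in field order and detail.

```python
import re

# Same patterns as scripts/burner_list_verify.py.
QID_RE = re.compile(r"postfix/\w+\[\d+\]: ([A-Z0-9]+):")
FROM_RE = re.compile(r"from=<([^>]*)>")
TO_RE = re.compile(r"to=<([^>]*)>")
STATUS_RE = re.compile(r"status=(\w+)")


def classify(lines, burner_senders):
    """Return (delivered, bounced) recipients for burner-sender queue IDs.

    Hypothetical helper mirroring scan_log, but reading from a list of
    strings instead of a file so it can be unit-tested.
    """
    burner_qids, delivered, bounced = set(), set(), set()
    for line in lines:
        qm = QID_RE.search(line)
        if not qm:
            continue
        qid = qm.group(1)
        # The qmgr "from=" line tags the queue ID as a burner message.
        fm = FROM_RE.search(line)
        if fm and fm.group(1).lower() in burner_senders:
            burner_qids.add(qid)
        # Delivery lines carry "to=" and "status=" for the same queue ID.
        tm, sm = TO_RE.search(line), STATUS_RE.search(line)
        if tm and sm and qid in burner_qids:
            rcpt, status = tm.group(1).lower(), sm.group(1).lower()
            if status == "bounced":
                bounced.add(rcpt)
            elif status == "sent":
                delivered.add(rcpt)
    # A bounce anywhere wins over a "sent".
    return delivered - bounced, bounced


# Invented log excerpt (assumption: default short Postfix queue IDs).
SAMPLE_LOG = [
    "Jun 17 10:00:01 mx postfix/qmgr[1234]: 4A1B2C3D4E: from=<verify@burner.example>, size=1200, nrcpt=1 (queue active)",
    "Jun 17 10:00:02 mx postfix/smtp[1240]: 4A1B2C3D4E: to=<ops@carrier-a.example>, delay=1.1, dsn=2.0.0, status=sent (250 2.0.0 OK)",
    "Jun 17 10:00:03 mx postfix/qmgr[1234]: 5F6A7B8C9D: from=<verify@burner.example>, size=1200, nrcpt=1 (queue active)",
    "Jun 17 10:00:04 mx postfix/smtp[1241]: 5F6A7B8C9D: to=<dead@carrier-b.example>, delay=0.9, dsn=5.1.1, status=bounced (550 5.1.1 user unknown)",
    "Jun 17 10:00:05 mx postfix/qmgr[1234]: 7A8B9C0D1E: from=<verify@burner.example>, size=1200, nrcpt=1 (queue active)",
    "Jun 17 10:00:06 mx postfix/smtp[1242]: 7A8B9C0D1E: to=<flaky@carrier-c.example>, delay=30, dsn=4.4.1, status=deferred (connection timed out)",
    "Jun 17 14:00:06 mx postfix/smtp[1250]: 7A8B9C0D1E: to=<flaky@carrier-c.example>, delay=14430, dsn=5.4.1, status=bounced (no answer from host)",
    # Not from the burner sender, so it must be ignored entirely.
    "Jun 17 10:00:07 mx postfix/qmgr[1234]: 6E7F8A9B0C: from=<alerts@other.example>, size=900, nrcpt=1 (queue active)",
    "Jun 17 10:00:08 mx postfix/smtp[1243]: 6E7F8A9B0C: to=<someone@elsewhere.example>, delay=0.5, dsn=2.0.0, status=sent (250 OK)",
]

delivered, bounced = classify(SAMPLE_LOG, {"verify@burner.example"})
print(sorted(delivered))
print(sorted(bounced))
```

Two things worth checking against the real burner MTA before relying on this: the queue-ID pattern `[A-Z0-9]+` assumes short queue IDs (with Postfix's `enable_long_queue_ids` turned on, IDs also contain lowercase letters and the pattern would need widening), and `postfix/\w+` assumes the default syslog name, so a separate instance logging as something like `postfix-burner/smtp` would not match.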