justin 40da017b79 campaigns: auto-rollout catch-all pool gated by warmup day + live bounce rate

Replaces the panic-era burner-domain verification plan with an in-house
automatic catch-all rollout in the trucking/IFTA/UCR builders. Root-cause
classification of the 75k pre-DKIM-fix bounces showed ~55% were reputation/
auth (now fixed by DKIM signing) and only ~29% genuinely-dead mailboxes;
catch-all domains accept at RCPT time so they do not user-unknown bounce at
send, making a controlled in-house bleed safer than warming a separate burner.

catch_all_enabled() adds catch-all results only when warmup_day >=
CAMPAIGN_CATCH_ALL_MIN_DAY (21) AND the recent 2-day live bounce rate is below
CAMPAIGN_CATCH_ALL_MAX_BOUNCE_PCT (8%) on a >=300-sent sample; auto-reverts to
the clean smtp_valid/send_confirmed pool on the next run if bounces spike.
Short window so a past disaster cannot block the rollout forever and a fresh
spike trips fast. CAMPAIGN_INCLUDE_CATCH_ALL=1/0 still hard-overrides.

USABLE_FILTER (static) -> usable_filter() (per-run, memoized, one DB probe).
IFTA/UCR SELECT_SQL -> _select_sql() so tc.usable_filter() resolves at call
time, not import. 13 logic unit tests pass; live dry-run decision = OFF
(day 15 < 21 and recent 2d bounce 42% from the aging-out Jun-16 disaster).

2026-06-18 01:39:09 -05:00


Campaign Deliverability — Diagnosis & List-Verification Plan

Created 2026-06-17 after trucking conversions went to zero.

TL;DR

Trucking conversions stopped on June 9. Campaigns never stopped sending (they still send ~2,400/day, with ~1,800 opens per 3 days); the cause was a filter bug that was blasting ~438k dead mx_unreachable domains. That produced a ~47% hard-bounce rate (~1,100/day), which blocklisted half the 120k subscriber base and torched sender reputation, so real prospects never saw the offer.

Fixed (build_trucking_campaigns.py): send filter now keys only off email_verify_result (never the broken email_verified boolean), and defaults to recovery mode = smtp_valid only until reputation recovers. Set CAMPAIGN_INCLUDE_CATCH_ALL=1 to re-add catch-all domains afterward.
Healthcare is fine — separate instance (listmonk-hc / DB listmonk_hc), cleaned list (clean_hc_warmup_list.py already drops mx_unreachable), bounce rate ~2-3%. No change needed; it proves the fix is correct.

Why the SMTP-probe verification under-counts deliverable addresses

email_verifier.py does syntax → MX → SMTP RCPT TO. Results:

| result | count | sendable? | why |
| --- | --- | --- | --- |
| `catch_all_domain` | 1,082,817 | risky | domain accepts ALL rcpts at SMTP time, then may bounce later |
| `mx_unreachable` | 438,163 | NO | MX exists but never answered the probe; hard-bounces on real send |
| `smtp_valid` | 11,774 | YES | an MX explicitly accepted this exact mailbox |
| `no_mx_records` / `invalid_syntax` / `smtp_rejected_550` | ~46k | no | dead |

The probe can only confirm a mailbox on non-catch-all domains that answer the RCPT handshake — which is a small slice. Only ~3,042 smtp_valid are still unsent, so recovery mode will exhaust the clean pool in ~1 day. We need a way to grow the verified-deliverable list without burning PW's reputation.

The real fix: burner-domain bounce verification

SMTP-probe verification is unreliable (catch-alls mask validity; many MTAs refuse probes but accept real mail). The only ground truth is to actually send a message and see whether it bounces. But sending unverified mail from PW's domain is what got us here. So:

Design

Dedicated throwaway verification domain (NOT performancewest.net and NOT carrierone.com — both are reputation assets we must protect). Register a cheap neutral .com via Porkbun (we already have the Porkbun integration). Give it its own SPF/DKIM/DMARC and a dedicated sending IP/identity (separate postfix instance or a transactional provider sub-account that isolates reputation).
Send a low-key, CAN-SPAM-compliant, non-commercial verification email to the unverified pool (e.g. a plain "is this the right contact for <DOT#>?" or a bland newsletter-style note with a working unsubscribe). It must be a real, legitimate message — never deceptive — but its ONLY purpose is to elicit a delivered-vs-bounced signal. Throttled and warmed like any send.
Catch bounces from that domain's own MTA log (reuse bounce-watcher.sh's status=bounced tail pattern) and write the result back to fmcsa_carriers.email_verify_result:
- delivered (no bounce within N hours) → upgrade to a new send_confirmed result that the PW campaign filter treats as sendable.
- hard-bounced → mark hard_bounced, permanently excluded from PW sends.
PW campaigns then send only to smtp_valid + send_confirmed — addresses proven deliverable by a real send — keeping PW's bounce rate near zero.
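
The writeback rule in the design above can be sketched as a small pure function. This is a minimal illustration, not the code in burner_list_verify.py: the function name `classify_verification_send`, its parameters, and the 72-hour grace period are assumptions (the plan only says "N hours"). Only the result strings `send_confirmed` and `hard_bounced` come from the plan.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical grace period; the plan leaves this as "N hours".
BOUNCE_GRACE = timedelta(hours=72)


def classify_verification_send(
    sent_at: datetime,
    hard_bounced: bool,
    now: datetime,
    grace: timedelta = BOUNCE_GRACE,
) -> Optional[str]:
    """Return the new email_verify_result, or None if it is too early to decide."""
    if hard_bounced:
        # A hard bounce is final: permanently excluded from PW sends.
        return "hard_bounced"
    if now - sent_at >= grace:
        # No bounce seen within the grace period: treat as delivered.
        return "send_confirmed"
    # Still inside the grace period; leave the existing result untouched.
    return None


sent = datetime(2026, 6, 17, 12, 0, tzinfo=timezone.utc)
early = classify_verification_send(sent, False, sent + timedelta(hours=1))
late = classify_verification_send(sent, False, sent + timedelta(hours=80))
bounced = classify_verification_send(sent, True, sent + timedelta(hours=1))
```

Returning None while the grace period is still open keeps a slow bounce from being mistaken for a delivery; a caller would skip the database update in that case and re-check on a later run.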

Why a separate domain/IP

Reputation is per sending-domain + per-IP. If the burner domain gets blocklisted from the inevitable bounces during scrubbing, PW and carrierone are untouched. The burner is disposable: if it burns, rotate to a new one. PW only ever sends to the cleaned output.

Compliance guardrails (must-haves)

Real CAN-SPAM compliance: truthful from/subject, physical address, working one-click unsubscribe, honor opt-outs immediately (sync opt-outs back to PW's suppression list too).
Not deceptive: the email is a genuine message (these are public FMCSA business contacts for B2B outreach), not a fake/pretext. The bounce signal is a byproduct, not a trick.
Suppress anyone who ever bounced or opted out from ALL future sends (burner and PW).

Status / next steps

Fix the PW trucking send filter (drop mx_unreachable; recovery mode).
Confirm healthcare unaffected.
Add send_confirmed / hard_bounced result handling to the campaign filter + a writeback path from bounce processing (burner_list_verify.py).
Catch-all auto-rollout instead of the burner domain (2026-06-18). After the DKIM signing fix landed, a root-cause classification of the 75k pre-fix bounces showed the damage was ~55% reputation/auth (which DKIM fixes) and only ~29% genuinely-dead mailboxes. The catch-all pool accepts at RCPT time by definition, so it does not user-unknown bounce at send time -- it is far safer to bleed directly in warmed batches than to stand up + warm a whole separate burner domain/IP/SPF/DKIM identity. So the catch-all pool is now gated by an automatic in-house rollout in build_trucking_campaigns.py (catch_all_enabled()):
- enables only when warmup_day() >= CAMPAIGN_CATCH_ALL_MIN_DAY (21) AND the recent (2-day) live campaign bounce rate is below CAMPAIGN_CATCH_ALL_MAX_BOUNCE_PCT (8%) on a trustworthy sample (>= 300 sent);
- auto-reverts to the clean smtp_valid/send_confirmed pool on the next run if bounces spike back above the ceiling;
- a deliberately SHORT window so a past disaster (the Jun-16 ~45% 7-day rate) cannot block the rollout forever, and a fresh spike trips it fast;
- CAMPAIGN_INCLUDE_CATCH_ALL=1/0 still hard-overrides the auto decision.
Applied uniformly to trucking + IFTA + UCR builders (tc.usable_filter()). The bounce-watcher continues to auto-suppress any individual hard bounces in real time, so PW's own bounce rate stays bounded during the rollout.
~~Stand up the burner verification domain + isolated MTA identity.~~ Dropped -- superseded by the catch-all auto-rollout above (the burner was a panic-era design from before the DKIM fix + per-subscriber bounce tracking made an in-house controlled rollout safe). The mx_probe_blocked consumer-ISP pool (438k, highest dead-mailbox risk) is the only case where a burner would still help; revisit only if that pool is ever needed.
~~Build the verification-send + bounce-writeback worker.~~ Not needed for catch-all (see above). burner_list_verify.py remains available if the mx_probe_blocked pool is ever scrubbed via a burner.
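
The catch-all gate described in the status list can be sketched as follows. This is a hedged illustration of the decision rule only, not the code in build_trucking_campaigns.py. The real catch_all_enabled() takes its inputs from a DB probe; here they are passed in as arguments so the logic is testable. The names `catch_all_decision` and `usable_filter_sql`, the parameter names, and the choice to stay OFF when the sample is under 300 sends are assumptions. The thresholds (21, 8%, 300) and the override variable come from the notes above.

```python
import os
from typing import Mapping

MIN_DAY = 21            # CAMPAIGN_CATCH_ALL_MIN_DAY
MAX_BOUNCE_PCT = 8.0    # CAMPAIGN_CATCH_ALL_MAX_BOUNCE_PCT
MIN_SAMPLE = 300        # minimum sends in the 2-day window to trust the rate


def catch_all_decision(
    warmup_day: int,
    sent_2d: int,
    bounced_2d: int,
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Decide whether catch-all results join the sendable pool for this run."""
    # The manual switch always wins over the automatic decision.
    override = env.get("CAMPAIGN_INCLUDE_CATCH_ALL")
    if override == "1":
        return True
    if override == "0":
        return False
    if warmup_day < MIN_DAY:
        return False
    # Assumption: too small a sample is treated as "not proven safe" -> OFF.
    if sent_2d < MIN_SAMPLE:
        return False
    bounce_pct = 100.0 * bounced_2d / sent_2d
    return bounce_pct < MAX_BOUNCE_PCT


def usable_filter_sql(include_catch_all: bool) -> str:
    """Build the per-run WHERE fragment keyed on email_verify_result."""
    results = ["smtp_valid", "send_confirmed"]
    if include_catch_all:
        results.append("catch_all_domain")
    quoted = ", ".join(f"'{r}'" for r in results)
    return f"email_verify_result IN ({quoted})"


# Made-up counts chosen to match the reported dry-run state (day 15, 42%).
dry_run = catch_all_decision(warmup_day=15, sent_2d=1000, bounced_2d=420, env={})
where = usable_filter_sql(dry_run)
```

Because the decision is recomputed on every run from a 2-day window, the auto-revert needs no separate mechanism: a fresh spike simply makes the next call return False, and the filter falls back to the clean pool.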
