new-site/docs/campaign-deliverability-plan.md
justin 40da017b79 campaigns: auto-rollout catch-all pool gated by warmup day + live bounce rate
Replaces the panic-era burner-domain verification plan with an in-house
automatic catch-all rollout in the trucking/IFTA/UCR builders. Root-cause
classification of the 75k pre-DKIM-fix bounces showed ~55% were reputation/
auth (now fixed by DKIM signing) and only ~29% genuinely-dead mailboxes;
catch-all domains accept at RCPT time so they do not user-unknown bounce at
send, making a controlled in-house bleed safer than warming a separate burner.

catch_all_enabled() adds catch-all results only when warmup_day >=
CAMPAIGN_CATCH_ALL_MIN_DAY (21) AND the recent 2-day live bounce rate is below
CAMPAIGN_CATCH_ALL_MAX_BOUNCE_PCT (8%) on a >=300-sent sample; auto-reverts to
the clean smtp_valid/send_confirmed pool on the next run if bounces spike.
Short window so a past disaster cannot block the rollout forever and a fresh
spike trips fast. CAMPAIGN_INCLUDE_CATCH_ALL=1/0 still hard-overrides.

USABLE_FILTER (static) -> usable_filter() (per-run, memoized, one DB probe).
IFTA/UCR SELECT_SQL -> _select_sql() so tc.usable_filter() resolves at call
time, not import. 13 logic unit tests pass; live dry-run decision = OFF
(day 15 < 21 and recent 2d bounce 42% from the aging-out Jun-16 disaster).
2026-06-18 01:39:09 -05:00

# Campaign Deliverability — Diagnosis & List-Verification Plan
_Created 2026-06-17 after trucking conversions went to zero._
## TL;DR
Trucking conversions stopped on **June 9**. Campaigns never stopped sending
(they still send ~2,400/day with ~1,800 opens per 3 days); the cause was a
**filter bug blasting ~438k dead `mx_unreachable` domains**, producing a
**~47% hard-bounce rate (~1,100/day)** that blocklisted **half the 120k
subscriber base** and torched sender reputation, so real prospects never saw
the offer.
- **Fixed** (`build_trucking_campaigns.py`): send filter now keys only off
`email_verify_result` (never the broken `email_verified` boolean), and defaults
to **recovery mode = `smtp_valid` only** until reputation recovers. Set
`CAMPAIGN_INCLUDE_CATCH_ALL=1` to re-add catch-all domains afterward.
- **Healthcare is fine** — separate instance (`listmonk-hc` / DB `listmonk_hc`),
cleaned list (`clean_hc_warmup_list.py` already drops `mx_unreachable`), bounce
rate ~2-3%. No change needed; it proves the fix is correct.
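
The filter rule in the first bullet can be sketched as follows. This is a minimal illustration, not the actual code in `build_trucking_campaigns.py`: the helper name `build_usable_filter` and the shape of the SQL fragment are assumptions; only the `email_verify_result` column, the result values, and the `CAMPAIGN_INCLUDE_CATCH_ALL` variable come from this document.

```python
import os

# Result values come from this document. The helper name and the SQL shape
# are illustrative assumptions, not the real build_trucking_campaigns.py code.
CLEAN_RESULTS = ("smtp_valid",)
CATCH_ALL_RESULT = "catch_all_domain"


def build_usable_filter(env=None):
    """Return a SQL predicate keyed only off email_verify_result."""
    env = os.environ if env is None else env
    results = list(CLEAN_RESULTS)
    # Recovery mode is the default: catch-all is added only on explicit opt-in.
    if env.get("CAMPAIGN_INCLUDE_CATCH_ALL") == "1":
        results.append(CATCH_ALL_RESULT)
    quoted = ", ".join(f"'{r}'" for r in results)
    return f"email_verify_result IN ({quoted})"


print(build_usable_filter({}))
print(build_usable_filter({"CAMPAIGN_INCLUDE_CATCH_ALL": "1"}))
```

The point of the sketch is that the broken `email_verified` boolean never appears in the predicate, and `mx_unreachable` cannot be selected in either mode.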
## Why the SMTP-probe verification under-counts deliverable addresses
`email_verifier.py` does syntax → MX → SMTP `RCPT TO`. Results:

| result | count | sendable? | why |
|---|---|---|---|
| `catch_all_domain` | 1,082,817 | risky | domain accepts ALL rcpts at SMTP time, then may bounce later |
| `mx_unreachable` | 438,163 | **NO** | MX exists but never answered the probe; **hard-bounces on real send** |
| `smtp_valid` | 11,774 | **YES** | an MX explicitly accepted this exact mailbox |
| `no_mx_records` / `invalid_syntax` / `smtp_rejected_550` | ~46k | no | dead |

The probe can only *confirm* a mailbox on non-catch-all domains that answer the
RCPT handshake — which is a small slice. Only **~3,042 `smtp_valid` are still
unsent**, so recovery mode will exhaust the clean pool in ~1 day. **We need a way
to grow the verified-deliverable list without burning PW's reputation.**
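
The "small slice" and "~1 day" figures can be checked with a few lines of arithmetic. The counts below are copied from the table and the text above; treating "~46k" as exactly 46,000 and using the TL;DR's ~2,400/day as the send volume are simplifying assumptions.

```python
# Counts copied from the table above; "~46k" is approximated as 46_000.
counts = {
    "catch_all_domain": 1_082_817,
    "mx_unreachable": 438_163,
    "smtp_valid": 11_774,
    "dead (no_mx_records / invalid_syntax / smtp_rejected_550)": 46_000,
}
UNSENT_SMTP_VALID = 3_042  # "~3,042 smtp_valid are still unsent"
DAILY_SEND_VOLUME = 2_400  # "~2,400/day" from the TL;DR (assumed constant)

total = sum(counts.values())
for result, n in counts.items():
    # Share of all probed addresses that landed in each result bucket.
    print(f"{result:<60} {n:>10,}  {n / total:6.2%}")

days_of_runway = UNSENT_SMTP_VALID / DAILY_SEND_VOLUME
print(f"clean-pool runway: {days_of_runway:.1f} days")
```

Under these assumptions `smtp_valid` is well under 1% of the probed pool, and the unsent remainder covers a little over one day of sending.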
## The real fix: burner-domain bounce verification
SMTP-probe verification is unreliable (catch-alls mask validity; many MTAs refuse
probes but accept real mail). The only ground truth is to **actually send a message
and see if it bounces.** But doing that from PW's domain is what got us here. So:
### Design
1. **Dedicated throwaway verification domain** (NOT performancewest.net and NOT
   carrierone.com; both are reputation assets we must protect). Register a cheap
   neutral `.com` via Porkbun (we already have the Porkbun integration). Give it
   its own SPF/DKIM/DMARC and a dedicated sending IP/identity (separate postfix
   instance or a transactional provider sub-account that isolates reputation).
2. **Send a low-key, CAN-SPAM-compliant, non-commercial verification email** to
   the unverified pool (e.g. a plain "is this the right contact for <DOT#>?" or a
   bland newsletter-style note with a working unsubscribe). It must be a real,
   legitimate message, never deceptive, but its ONLY purpose is to elicit a
   delivered-vs-bounced signal. Throttled and warmed like any send.
3. **Catch bounces from that domain's own MTA log** (reuse `bounce-watcher.sh`'s
   `status=bounced` tail pattern) and **write the result back to
   `fmcsa_carriers.email_verify_result`**:
   - delivered (no bounce within N hours) → upgrade to a new `send_confirmed`
     result that the PW campaign filter treats as sendable.
   - hard-bounced → mark `hard_bounced`, permanently excluded from PW sends.
4. **PW campaigns then send only to `smtp_valid` + `send_confirmed`** (addresses
   proven deliverable by a real send), keeping PW's bounce rate near zero.
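
The writeback rule in step 3 can be sketched as a small pure function. This is an illustration under stated assumptions, not the logic of `burner_list_verify.py`: the function name `classify_send` is hypothetical, and because the design leaves "N hours" unspecified, the 72-hour window below is only a placeholder.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the design leaves "N hours" open; 72 is a placeholder value.
CONFIRM_AFTER = timedelta(hours=72)


def classify_send(sent_at, bounced, now, current_result):
    """Return the new email_verify_result for one verification send.

    bounced=True means a hard bounce for this address was seen in the MTA log.
    """
    if bounced:
        return "hard_bounced"      # permanently excluded from PW sends
    if now - sent_at >= CONFIRM_AFTER:
        return "send_confirmed"    # no bounce within the window
    return current_result          # too early to decide; leave unchanged


sent = datetime(2026, 6, 18, tzinfo=timezone.utc)
print(classify_send(sent, False, sent + timedelta(hours=80), "catch_all_domain"))
print(classify_send(sent, True, sent + timedelta(hours=1), "catch_all_domain"))
```

Keeping the decision separate from the log tailing and the database write makes the rule easy to unit-test; the caller would persist the returned value to `fmcsa_carriers.email_verify_result`.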
### Why a separate domain/IP
Reputation is per sending-domain + per-IP. If the burner domain gets blocklisted
from the inevitable bounces during scrubbing, **PW and carrierone are untouched.**
The burner is disposable: if it burns, rotate to a new one. PW only ever sends to
the cleaned output.
### Compliance guardrails (must-haves)
- Real **CAN-SPAM** compliance: truthful from/subject, physical address, working
one-click unsubscribe, honor opt-outs immediately (sync opt-outs back to PW's
suppression list too).
- **Not deceptive**: the email is a genuine message (these are public FMCSA
business contacts for B2B outreach), not a fake/pretext. The bounce signal is a
byproduct, not a trick.
- Suppress anyone who ever bounced or opted out from ALL future sends (burner and
PW).
## Status / next steps
- [x] Fix the PW trucking send filter (drop `mx_unreachable`; recovery mode).
- [x] Confirm healthcare unaffected.
- [x] Add `send_confirmed` / `hard_bounced` result handling to the campaign
  filter + a writeback path from bounce processing (`burner_list_verify.py`).
- [x] **Catch-all auto-rollout instead of the burner domain (2026-06-18).** After
  the DKIM signing fix landed, a root-cause classification of the 75k
  pre-fix bounces showed the damage was ~55% reputation/auth (which DKIM
  fixes) and only ~29% genuinely-dead mailboxes. The catch-all pool accepts
  at RCPT time by definition, so it does not user-unknown bounce at send
  time -- it is far safer to bleed directly in warmed batches than to stand
  up + warm a whole separate burner domain/IP/SPF/DKIM identity. So the
  catch-all pool is now gated by an **automatic in-house rollout** in
  `build_trucking_campaigns.py` (`catch_all_enabled()`):
  - enables only when `warmup_day() >= CAMPAIGN_CATCH_ALL_MIN_DAY` (21)
    AND the **recent** (2-day) live campaign bounce rate is below
    `CAMPAIGN_CATCH_ALL_MAX_BOUNCE_PCT` (8%) on a trustworthy sample
    (>= 300 sent);
  - **auto-reverts** to the clean `smtp_valid`/`send_confirmed` pool on the
    next run if bounces spike back above the ceiling;
  - a deliberately SHORT window so a past disaster (the Jun-16 ~45% 7-day
    rate) cannot block the rollout forever, and a fresh spike trips it fast;
  - `CAMPAIGN_INCLUDE_CATCH_ALL=1/0` still hard-overrides the auto decision.

  Applied uniformly to trucking + IFTA + UCR builders (`tc.usable_filter()`).
  The bounce-watcher continues to auto-suppress any individual hard bounces
  in real time, so PW's own bounce rate stays bounded during the rollout.
- [ ] ~~Stand up the burner verification domain + isolated MTA identity.~~
  **Dropped** -- superseded by the catch-all auto-rollout above (the burner
  was a panic-era design from before the DKIM fix + per-subscriber bounce
  tracking made an in-house controlled rollout safe). The `mx_probe_blocked`
  consumer-ISP pool (438k, highest dead-mailbox risk) is the only case where
  a burner would still help; revisit only if that pool is ever needed.
- [x] ~~Build the verification-send + bounce-writeback worker.~~ Not needed for
  catch-all (see above). `burner_list_verify.py` remains available if the
  `mx_probe_blocked` pool is ever scrubbed via a burner.
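
The gating rule described for the catch-all auto-rollout can be sketched as a pure decision function. This is a hedged illustration, not the real `catch_all_enabled()`: the name `catch_all_decision`, its parameters, and the choice to stay OFF when fewer than 300 messages were sent are assumptions; the thresholds (day 21, 8%, 300 sent) and the environment variable names come from this document.

```python
import os

# Thresholds taken from this document; the signature is a hypothetical sketch.
DEFAULT_MIN_DAY = 21
DEFAULT_MAX_BOUNCE_PCT = 8.0
MIN_SAMPLE = 300


def catch_all_decision(warmup_day, sent_2d, bounced_2d, env=None):
    """Decide whether the catch-all pool is included in this run."""
    env = os.environ if env is None else env

    # The hard override wins over the automatic decision.
    override = env.get("CAMPAIGN_INCLUDE_CATCH_ALL")
    if override in ("0", "1"):
        return override == "1"

    min_day = int(env.get("CAMPAIGN_CATCH_ALL_MIN_DAY", DEFAULT_MIN_DAY))
    max_pct = float(
        env.get("CAMPAIGN_CATCH_ALL_MAX_BOUNCE_PCT", DEFAULT_MAX_BOUNCE_PCT)
    )

    if warmup_day < min_day:
        return False
    # Assumption: an untrustworthy (too small) sample keeps the gate closed.
    if sent_2d < MIN_SAMPLE:
        return False
    bounce_pct = 100.0 * bounced_2d / sent_2d
    return bounce_pct < max_pct


# Illustrative inputs only: day 15 with a 42% recent bounce rate.
print(catch_all_decision(warmup_day=15, sent_2d=1000, bounced_2d=420, env={}))
```

Because the function is re-evaluated on every run from the recent 2-day window, a spike above the ceiling closes the gate on the next run without any manual step, which is the auto-revert behaviour described in the list.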