Fix 1 (consumer mx: exclusion) and Fix 3 (pw-mx-tag cron) live as of 9eeed47.
Verified: warmup pool 353,909 after fix (not starved), mx:yahoodns.net cap=0
during warmup, cron tags idempotently. Fix 2 (NULL bucket cap) deferred.
Plan: close the MX-exclusion gaps in the trucking warmup
Status: Fix 1 + Fix 3 SHIPPED 2026-06-20 (commit 9eeed47); Fix 2 deferred.
Owner context: warmup day 17; big operators (Google/Microsoft/Proofpoint/
Mimecast/Barracuda/Cisco/Broadcom) are EXCLUDED until day 30, then re-introduced
via mx_daily_caps(). This plan fixes three holes that let throttling/consumer
MX operators through during that window.
What shipped (2026-06-20, commit 9eeed47)
- Fix 1 (DONE): CONSUMER_MX_OPERATORS (mx:yahoodns.net, mx:icloud.com, comcast/charter/centurylink/windstream/tds/earthlink) folded into WARMUP_EXCLUDE_OPERATORS, used by both the fetch_carriers() exclusion SQL and mx_daily_caps() (same day-30 ramp). Verified live: warmup-eligible pool = 353,909 carriers after the fix (not starved), and mx_daily_caps() returns cap 0 for mx:yahoodns.net during warmup.
- Fix 3 (DONE): infra/cron/pw-mx-tag installed to /etc/cron.d/ (05:45 UTC daily, --only-unsent --limit-domains 20000). Verified: a 200-domain test run tagged 216 domains; idempotent/bounded.
- Fix 2 (DEFERRED): bounding the NULL bucket cap. The cron will drain the backlog; revisit if NULL stays large.
Background: how the two MX layers work today
Sender reputation is judged by the receiving operator (MX), not the recipient
domain string. There are two independent gates in scripts/build_trucking_campaigns.py:
- fetch_carriers() big-MX EXCLUSION (SQL big_mx_exclude): during warmup (main_warmup_day() <= MAIN_BIG_MX_EXCLUDE_UNTIL_DAY, currently day 30) it drops carriers whose mx_provider IN BIG_MX_OPERATORS. mx_provider IS NULL is deliberately KEPT (so the pool isn't starved before tagging completes).
- select_sendable_carriers() per-MX THROTTLE (mx_daily_caps + per_op cap): bounds how many of a run's quota go to each KNOWN operator so we never concentrate on one. NULL is NOT capped (would collapse onto one bucket and starve the pool).
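A minimal sketch of how the first gate's SQL predicate might be assembled. The helper name big_mx_exclude_clause, the %s placeholder style, and the tuple contents (taken from the owner context above) are assumptions for illustration, not the builder's actual code.

```python
# Hypothetical sketch of gate 1. Only BIG_MX_OPERATORS,
# MAIN_BIG_MX_EXCLUDE_UNTIL_DAY and mx_provider are names from this plan.
BIG_MX_OPERATORS = ("google", "microsoft", "proofpoint", "mimecast",
                    "barracuda", "cisco", "broadcom")  # assumed contents
MAIN_BIG_MX_EXCLUDE_UNTIL_DAY = 30


def big_mx_exclude_clause(warmup_day, operators=BIG_MX_OPERATORS):
    """Return (sql_fragment, params) for the warmup exclusion."""
    if warmup_day > MAIN_BIG_MX_EXCLUDE_UNTIL_DAY:
        return "", ()  # warmup window is over: no exclusion
    placeholders = ", ".join(["%s"] * len(operators))
    # The explicit IS NULL branch keeps untagged carriers in the pool.
    # A bare NOT IN would drop them, because NULL NOT IN (...) is NULL.
    sql = f" AND (mx_provider IS NULL OR mx_provider NOT IN ({placeholders}))"
    return sql, tuple(operators)


sql, params = big_mx_exclude_clause(warmup_day=17)
print(sql)
```

The point of the sketch is the NULL handling: keeping untagged carriers requires the explicit IS NULL branch, which is what makes the anti-starvation behaviour deliberate rather than accidental.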
mx_provider is populated by scripts/mx_tag_carriers.py, which resolves each
domain's MX and returns either a clean label (google, microsoft,
proofpoint, mimecast, cisco, barracuda, broadcom, godaddy, zoho,
rackspace) or, for everything else, an mx:<root-domain> prefix (e.g.
mx:yahoodns.net, mx:icloud.com, mx:comcast.net).
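To illustrate the two kinds of label, here is a rough sketch of the mapping. The CLEAN_LABELS table and the "last two DNS labels" root-domain heuristic are assumptions made for this example; the real mx_tag_carriers.py may resolve and match differently.

```python
# Illustrative only: the pattern table and the two-label heuristic are
# assumptions, not the actual mx_tag_carriers.py logic.
CLEAN_LABELS = {
    "google.com": "google",
    "googlemail.com": "google",
    "outlook.com": "microsoft",
    "pphosted.com": "proofpoint",
    "mimecast.com": "mimecast",
}


def root_domain(mx_host):
    """Naive root domain: the last two DNS labels of the MX host."""
    labels = mx_host.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def label_for_mx(mx_host):
    """Clean label when the operator is known, else an mx:<root> tag."""
    root = root_domain(mx_host)
    return CLEAN_LABELS.get(root, f"mx:{root}")


for host in ("aspmx.l.google.com",
             "mx-biz.mail.am0.yahoodns.net.",
             "mx01.mail.icloud.com"):
    print(host, "->", label_for_mx(host))
```

The two-label heuristic breaks on suffixes such as co.uk; a production tagger would need a public-suffix-aware split.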
The three gaps (with live numbers, 2026-06-20)
Gap 1 — consumer/throttling MX behind the mx: prefix are NOT excluded
BIG_MX_OPERATORS only lists the clean labels. The big consumer mailbox
operators get tagged with the mx: prefix and so slip BOTH gates during warmup:
| mx_provider | sendable carriers | why it's a problem |
|---|---|---|
| mx:yahoodns.net | 283,113 | Yahoo Small Business / AOL custom domains: same aggressive consumer filtering and complaint-driven blocking as consumer Yahoo. By far the biggest hole. |
| mx:icloud.com | 24,985 | Apple iCloud+ Custom Domain: Apple consumer filtering; iCloud was the biggest consumer leak we already scrubbed from Listmonk. |
| mx:comcast.net | 12,251 | Comcast consumer infra; historically bouncy. |
| mx:charter.net | 5,860 | Spectrum/Charter consumer. |
| mx:centurylink.net / mx:windstream.net / mx:tds.net / mx:earthlink-vadesecure.net | ~8,100 | Legacy/satellite ISP consumer mail; many already in DEAD_ISP_DOMAINS as literal domains but NOT caught when a custom domain points its MX there. |
mx:yahoodns.net alone is 283k carriers that look "long-tail/safe" to the
warmup but actually filter like a big operator. This is the headline fix.
NOTE: the literal-domain layer (BLOCKED_EMAIL_DOMAINS incl. the Yahoo family, Apple, dead ISPs) already blocks someone@yahoo.com / @icloud.com. The hole is a custom domain whose MX points at Yahoo/iCloud: invisible to the string layer, only visible via MX tagging. That's exactly what this closes.
Gap 2 — 315,892 untagged (NULL) carriers are sent to unvetted
mx_provider IS NULL is kept by both gates by design (anti-starvation). With
315,892 sendable NULLs vs 1,187,054 tagged, a meaningful slice of every run
goes to domains we've never MX-resolved — some of which are Google/MS/Yahoo we'd
otherwise exclude. This is acceptable as a bootstrap but should shrink over time.
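For scale, the two counts above put the untagged share of the sendable pool at roughly a fifth. The calculation below assumes a run samples the pool uniformly, which the per-MX throttle does not guarantee, so treat it as an order-of-magnitude figure only.

```python
# Counts from this section (2026-06-20). Uniform sampling is an assumption.
null_sendable = 315_892
tagged_sendable = 1_187_054

share = null_sendable / (null_sendable + tagged_sendable)
print(f"untagged share of sendable pool: {share:.1%}")  # 21.0%
```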
Gap 3 — mx_tag_carriers.py is not on a cron
There is no infra/cron/pw-mx-tag (confirmed: no cron references it). So the NULL
backlog only shrinks when someone runs it by hand. New carriers imported by the
FMCSA census downloader land as NULL and stay NULL. Without continuous tagging,
Gaps 1 and 2 slowly re-open.
Proposed fixes
Fix 1 — exclude consumer/throttling mx: operators during warmup (HIGH)
Add an explicit set of mx:-prefixed operators that should be treated like the
big operators during warmup, and fold them into BOTH the exclusion and the
throttle. Keep it data-driven and documented.
```python
# scripts/build_trucking_campaigns.py
# Consumer / aggressively-filtering mailbox operators that mx_tag_carriers.py
# labels with the "mx:" prefix (no clean label). They throttle/complaint-block
# like the big operators, so hold them out during warmup too. (yahoodns =
# Yahoo Small Business + AOL custom domains; icloud = Apple custom domains.)
CONSUMER_MX_OPERATORS = (
    "mx:yahoodns.net", "mx:icloud.com", "mx:comcast.net", "mx:charter.net",
    "mx:centurylink.net", "mx:windstream.net", "mx:tds.net",
    "mx:earthlink-vadesecure.net",
)
# Everything held out of the warmup pool entirely (until MAIN_BIG_MX_EXCLUDE_UNTIL_DAY).
WARMUP_EXCLUDE_OPERATORS = BIG_MX_OPERATORS + CONSUMER_MX_OPERATORS
```
- In fetch_carriers(): build big_mx_exclude from WARMUP_EXCLUDE_OPERATORS (not just BIG_MX_OPERATORS).
- In mx_daily_caps(): give CONSUMER_MX_OPERATORS the same big ramp as the clean big operators after day 30 (so they re-introduce gradually, not all at once on day 31).
- Keep it behind the existing MAIN_SKIP_BIG_MX switch so it's reversible.

Effect: removes ~330k consumer-MX carriers from the warmup-window pool; the long tail of genuinely small/self-hosted systems carries the volume, which is the whole point of the warmup strategy.
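The "cap 0 during warmup, shared ramp afterwards" behaviour can be sketched as follows. The linear ramp, the step/ceiling/default numbers, and the function name mx_daily_caps_sketch are all assumptions; the plan does not specify the real ramp shape used by mx_daily_caps().

```python
# Sketch only: the ramp shape and every number here are assumed.
BIG_MX_OPERATORS = ("google", "microsoft", "proofpoint", "mimecast",
                    "barracuda", "cisco", "broadcom")  # assumed contents
CONSUMER_MX_OPERATORS = (
    "mx:yahoodns.net", "mx:icloud.com", "mx:comcast.net", "mx:charter.net",
    "mx:centurylink.net", "mx:windstream.net", "mx:tds.net",
    "mx:earthlink-vadesecure.net",
)
WARMUP_EXCLUDE_OPERATORS = BIG_MX_OPERATORS + CONSUMER_MX_OPERATORS
MAIN_BIG_MX_EXCLUDE_UNTIL_DAY = 30


def mx_daily_caps_sketch(warmup_day, step=25, ceiling=500, default_cap=200):
    """Per-operator caps: 0 through day 30, then one shared linear ramp."""
    days_past = warmup_day - MAIN_BIG_MX_EXCLUDE_UNTIL_DAY
    big_cap = 0 if days_past <= 0 else min(ceiling, days_past * step)
    # Consumer mx: operators get exactly the same cap as the clean labels.
    caps = {op: big_cap for op in WARMUP_EXCLUDE_OPERATORS}
    caps["__default__"] = default_cap
    return caps


for day in (17, 31, 60):
    print(day, mx_daily_caps_sketch(day)["mx:yahoodns.net"])
```

Driving both groups from one tuple means a newly added consumer operator cannot be excluded in the SQL but forgotten in the ramp, or the reverse.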
Fix 2 — bound the NULL bucket with a small cap (MEDIUM)
Don't exclude NULL (still anti-starvation), but give it a real per-run cap in
select_sendable_carriers() instead of "uncapped". E.g. treat unknown/NULL like
__default__ but at a fraction (say 40/run) so an untagged Google/Yahoo domain
can't flood a run. Pairs with Fix 3 (continuous tagging) to shrink the bucket.
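A rough sketch of what a capped NULL bucket could look like inside a selector. The function select_with_caps, its candidate format, and the "__null__" bucket key are hypothetical; only the idea of a small NULL cap alongside __default__ comes from the proposal above.

```python
from collections import Counter


def select_with_caps(candidates, quota, caps, null_cap=40):
    """Pick up to `quota` carriers without exceeding any per-operator cap.

    candidates: iterable of (carrier_id, mx_provider) in priority order,
    where mx_provider is None for untagged carriers.
    """
    taken = Counter()
    selected = []
    for carrier_id, mx in candidates:
        if len(selected) >= quota:
            break
        if mx is None:
            bucket, limit = "__null__", null_cap  # bounded, not excluded
        else:
            bucket, limit = mx, caps.get(mx, caps["__default__"])
        if taken[bucket] >= limit:
            continue  # this bucket is full; try the next candidate
        taken[bucket] += 1
        selected.append(carrier_id)
    return selected


pool = [(i, None) for i in range(100)]
pool += [(100 + i, "mx:small.example") for i in range(50)]
picked = select_with_caps(pool, quota=80, caps={"__default__": 30})
print(len(picked))  # 70: 40 untagged + 30 from the tagged operator
```

Note that the run comes in under quota here (70 of 80), which is the starvation trade-off the regression check needs to watch.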
Fix 3 — put mx_tag_carriers.py on a daily cron (MEDIUM)
Add infra/cron/pw-mx-tag (model on pw-listmonk-scrub) running e.g. 05:45 UTC
(before the 08:00 trucking builder), tagging the next N thousand NULL domains/day:
```
45 5 * * * deploy cd /opt/performancewest && docker compose exec -T workers python3 -m scripts.mx_tag_carriers --limit-domains 20000 >> /var/log/pw-mx-tag.log 2>&1
```

Keep the entry on a single line: cron does not join backslash-continued lines. Install to /etc/cron.d/ (deploy.sh doesn't run ansible). This continuously shrinks the 315k NULL backlog and keeps newly-imported carriers tagged, so Fixes 1 & 2 stay effective.
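Why a daily bounded run is safe to repeat can be shown with a small in-memory model. The carriers table layout, the tag_batch helper, and the fake resolver are assumptions for illustration; the real mx_tag_carriers.py and its schema may differ.

```python
import sqlite3


def tag_batch(conn, resolve, limit_domains):
    """Tag up to limit_domains distinct untagged domains; return the count."""
    rows = conn.execute(
        "SELECT DISTINCT domain FROM carriers "
        "WHERE mx_provider IS NULL ORDER BY domain LIMIT ?",
        (limit_domains,),
    ).fetchall()
    for (domain,) in rows:
        # Only NULL rows are written, so already-tagged rows are untouched.
        conn.execute(
            "UPDATE carriers SET mx_provider = ? "
            "WHERE domain = ? AND mx_provider IS NULL",
            (resolve(domain), domain),
        )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE carriers (domain TEXT, mx_provider TEXT)")
conn.executemany(
    "INSERT INTO carriers VALUES (?, NULL)",
    [("a.example",), ("a.example",), ("b.example",), ("c.example",)],
)
runs = [tag_batch(conn, lambda d: "mx:" + d, 2) for _ in range(3)]
print(runs)  # [2, 1, 0]
```

Each run touches at most the limit and only rows that are still NULL, so rerunning after the backlog is drained is a no-op.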
Validation plan (verify before/after, no sends triggered)
- Dry-run the selector before/after Fix 1 and diff the per-MX composition of a simulated run (the builder has list_segments() / quota selection paths that can be exercised read-only). Assert 0 carriers from CONSUMER_MX_OPERATORS are selected while main_warmup_day() <= 30.
- SQL sanity: run SELECT mx_provider, count(*) ... WHERE listmonk_sent_at IS NULL GROUP BY 1 and confirm the excluded operators drop out of the candidate pool.
- Cron (Fix 3): run mx_tag_carriers --limit-domains 1000 once by hand, confirm the NULL count falls and no errors; then install the cron and confirm the next-day count fell again (idempotent, bounded).
- Regression: confirm the long-tail pool is still large enough to hit daily quota at warmup caps (so we don't starve the send). If the long tail is too small after excluding 330k consumer-MX, that's a signal to either lower the daily quota or accept a smaller controlled slice of one consumer operator.
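The dry-run assertion in the first check could be expressed along these lines. check_dry_run and its input format are hypothetical; the sketch assumes the read-only selection path can hand back (carrier_id, mx_provider) pairs.

```python
from collections import Counter

CONSUMER_MX_OPERATORS = (
    "mx:yahoodns.net", "mx:icloud.com", "mx:comcast.net", "mx:charter.net",
    "mx:centurylink.net", "mx:windstream.net", "mx:tds.net",
    "mx:earthlink-vadesecure.net",
)


def check_dry_run(selected, warmup_day, exclude_until_day=30):
    """Return the per-MX composition; fail if consumer MX leaked in warmup.

    selected: iterable of (carrier_id, mx_provider) from a read-only run.
    """
    composition = Counter(mx for _, mx in selected)
    if warmup_day <= exclude_until_day:
        leaked = {op: composition[op]
                  for op in CONSUMER_MX_OPERATORS if composition[op]}
        assert not leaked, f"consumer MX leaked into warmup run: {leaked}"
    return composition


before = check_dry_run([(1, "mx:yahoodns.net"), (2, None)], warmup_day=31)
after = check_dry_run([(2, None), (3, "mx:small.example")], warmup_day=17)
print(before - after)  # operators that dropped out of the run
```

Subtracting the two Counters gives the before/after diff of the per-MX composition directly.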
Open questions / decisions for owner
- Re-introduction after day 30: treat CONSUMER_MX_OPERATORS identically to the big operators (same ramp), or keep Yahoo/iCloud custom domains excluded longer (they convert worse and complain more)? Recommendation: same ramp, but watch the reputation monitor's per-operator reject% and pull back if Yahoo spikes.
- NULL cap size (Fix 2): 40/run is a guess; tune against how fast Fix 3 drains the backlog.
- Should mx: consumer exclusion be permanent (not just warmup)? For a B2B compliance product, a carrier reachable only at a Yahoo/iCloud custom domain is a low-value, high-complaint segment regardless of warmup. Worth considering a permanent down-weight, not just a warmup hold.