# Plan: close the MX-exclusion gaps in the trucking warmup

**Status:** ALL THREE FIXES SHIPPED 2026-06-20 (Fix 1+3 `9eeed47`, Fix 2 `bc93d93`).

**Owner context:** warmup day 17; big operators (Google/Microsoft/Proofpoint/Mimecast/Barracuda/Cisco/Broadcom) are EXCLUDED until day 30, then re-introduced via `mx_daily_caps()`. This plan fixes three holes that let throttling/consumer MX operators through during that window.

## What shipped (2026-06-20, commits `9eeed47` and `bc93d93`)

- **Fix 1 (DONE):** `CONSUMER_MX_OPERATORS` (mx:yahoodns.net, mx:icloud.com, comcast/charter/centurylink/windstream/tds/earthlink) folded into `WARMUP_EXCLUDE_OPERATORS`, used by both the `fetch_carriers()` exclusion SQL and `mx_daily_caps()` (same day-30 ramp). Verified live: warmup-eligible pool = 353,909 carriers after the fix (not starved), and `mx_daily_caps()` returns cap 0 for mx:yahoodns.net during warmup.
- **Fix 3 (DONE):** `infra/cron/pw-mx-tag` installed to `/etc/cron.d/` (05:45 UTC daily, `--only-unsent --limit-domains 20000`). Verified: a 200-domain test run tagged 216 domains; idempotent/bounded.
- **Fix 2 (DONE):** `select_sendable_carriers()` now bounds the untagged (NULL mx_provider) bucket with a single shared `untagged_cap` (env `MAIN_UNTAGGED_MX_CAP`, default `max(quota, 200)`, so no starvation and no behavior change today). Only ~3,035 distinct verified-sendable untagged domains remain, so pw-mx-tag drains them in its first run; tighten the cap to a fraction of quota afterward to prefer the tagged long tail. Commit `bc93d93`.

---

## Background: how the two MX layers work today

Sender reputation is judged by the **receiving operator (MX)**, not the recipient domain string. There are two independent gates in `scripts/build_trucking_campaigns.py`:

1. **`fetch_carriers()` big-MX EXCLUSION** (SQL `big_mx_exclude`): during warmup (`main_warmup_day() <= MAIN_BIG_MX_EXCLUDE_UNTIL_DAY`, currently day 30) it drops carriers whose `mx_provider IN BIG_MX_OPERATORS`.
   `mx_provider IS NULL` is deliberately KEPT (so the pool isn't starved before tagging completes).
2. **`select_sendable_carriers()` per-MX THROTTLE** (`mx_daily_caps` + `per_op` cap): bounds how many of a run's quota go to each KNOWN operator so we never concentrate on one. NULL is NOT capped (would collapse onto one bucket and starve the pool).

`mx_provider` is populated by `scripts/mx_tag_carriers.py`, which resolves each domain's MX and returns either a **clean label** (`google`, `microsoft`, `proofpoint`, `mimecast`, `cisco`, `barracuda`, `broadcom`, `godaddy`, `zoho`, `rackspace`) or, for everything else, an **`mx:` prefix** (e.g. `mx:yahoodns.net`, `mx:icloud.com`, `mx:comcast.net`).

---

## The three gaps (with live numbers, 2026-06-20)

### Gap 1: consumer/throttling MX behind the `mx:` prefix are NOT excluded

`BIG_MX_OPERATORS` only lists the clean labels. The big consumer mailbox operators get tagged with the `mx:` prefix and so slip past BOTH gates during warmup:

| mx_provider | sendable carriers | why it's a problem |
| --- | --- | --- |
| `mx:yahoodns.net` | **283,113** | Yahoo Small Business / AOL custom domains: same aggressive consumer filtering + complaint-driven blocking as consumer Yahoo. By far the biggest hole. |
| `mx:icloud.com` | **24,985** | Apple iCloud+ Custom Domain: Apple consumer filtering; iCloud was the biggest consumer leak we already scrubbed from Listmonk. |
| `mx:comcast.net` | 12,251 | Comcast consumer infra; historically bouncy. |
| `mx:charter.net` | 5,860 | Spectrum/Charter consumer. |
| `mx:centurylink.net` / `mx:windstream.net` / `mx:tds.net` / `mx:earthlink-vadesecure.net` | ~8,100 | Legacy/satellite ISP consumer mail; many already in `DEAD_ISP_DOMAINS` as literal domains but NOT caught when a custom domain points its MX there. |

`mx:yahoodns.net` alone is **283k** carriers that look "long-tail/safe" to the warmup but actually filter like a big operator. This is the headline fix.
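
To make the gap concrete, here is a minimal sketch of the labelling behaviour described above and of why a membership test against `BIG_MX_OPERATORS` misses it. The suffix map, `label_mx()`, and `excluded_during_warmup()` are hypothetical stand-ins, not the real `mx_tag_carriers.py` or `fetch_carriers()` code, and the tuple contents are assumed from the operator list in the owner context.

```python
# Hypothetical illustration only; the real tagger and exclusion SQL may differ.

# Assumed from the owner context (clean labels only).
BIG_MX_OPERATORS = (
    "google", "microsoft", "proofpoint", "mimecast",
    "barracuda", "cisco", "broadcom",
)

# Assumed MX-host suffix -> clean-label map (illustrative, incomplete).
CLEAN_LABELS = {
    "google.com": "google",
    "outlook.com": "microsoft",
    "pphosted.com": "proofpoint",
    "mimecast.com": "mimecast",
}


def label_mx(mx_host: str) -> str:
    """Return a clean label if the MX host is recognised, else an 'mx:' label."""
    host = mx_host.rstrip(".").lower()
    for suffix, label in CLEAN_LABELS.items():
        if host == suffix or host.endswith("." + suffix):
            return label
    # Naive fallback: keep the last two DNS labels behind an "mx:" prefix.
    return "mx:" + ".".join(host.split(".")[-2:])


def excluded_during_warmup(mx_provider, exclude=BIG_MX_OPERATORS) -> bool:
    """Mimic `mx_provider IN (...)`: NULL (None) is never excluded."""
    return mx_provider is not None and mx_provider in exclude


yahoo_custom = label_mx("mx-biz.mail.am0.yahoodns.net")  # "mx:yahoodns.net"
print(yahoo_custom, excluded_during_warmup(yahoo_custom))  # slips through
print(excluded_during_warmup(yahoo_custom, BIG_MX_OPERATORS + ("mx:yahoodns.net",)))
```

The last line shows the shape of the fix: once the `mx:` label is part of the exclusion set, the same membership test catches it.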

> NOTE: the literal-domain layer (`BLOCKED_EMAIL_DOMAINS` incl. the Yahoo family,
> Apple, dead ISPs) already blocks `someone@yahoo.com` / `@icloud.com`. The hole
> is a **custom domain whose MX points at Yahoo/iCloud**: invisible to the
> string layer, only visible via MX tagging. That's exactly what this closes.

### Gap 2: 315,892 untagged (NULL) carriers are mailed without MX vetting

`mx_provider IS NULL` is kept by both gates by design (anti-starvation). With **315,892** sendable NULLs vs 1,187,054 tagged, a meaningful slice of every run goes to domains we've never MX-resolved, some of which are Google/MS/Yahoo we'd otherwise exclude. This is acceptable as a bootstrap but should shrink over time.

### Gap 3: `mx_tag_carriers.py` is not on a cron

There is no `infra/cron/pw-mx-tag` (confirmed: no cron references it). So the NULL backlog only shrinks when someone runs it by hand. New carriers imported by the FMCSA census downloader land as NULL and stay NULL. Without continuous tagging, Gaps 1 and 2 slowly re-open.

---

## Proposed fixes

### Fix 1: exclude consumer/throttling `mx:` operators during warmup (HIGH)

Add an explicit set of `mx:`-prefixed operators that should be treated like the big operators during warmup, and fold them into BOTH the exclusion and the throttle. Keep it data-driven and documented.

```python
# scripts/build_trucking_campaigns.py

# Consumer / aggressively-filtering mailbox operators that mx_tag_carriers.py
# labels with the "mx:" prefix (no clean label). They throttle/complaint-block
# like the big operators, so hold them out during warmup too. (yahoodns =
# Yahoo Small Business + AOL custom domains; icloud = Apple custom domains.)
CONSUMER_MX_OPERATORS = (
    "mx:yahoodns.net",
    "mx:icloud.com",
    "mx:comcast.net",
    "mx:charter.net",
    "mx:centurylink.net",
    "mx:windstream.net",
    "mx:tds.net",
    "mx:earthlink-vadesecure.net",
)

# Everything held out of the warmup pool entirely (until MAIN_BIG_MX_EXCLUDE_UNTIL_DAY).
# Tuple concatenation: this assumes BIG_MX_OPERATORS is also a tuple.
WARMUP_EXCLUDE_OPERATORS = BIG_MX_OPERATORS + CONSUMER_MX_OPERATORS
```

- In `fetch_carriers()`: build `big_mx_exclude` from `WARMUP_EXCLUDE_OPERATORS` (not just `BIG_MX_OPERATORS`).
- In `mx_daily_caps()`: give `CONSUMER_MX_OPERATORS` the same `big` ramp as the clean big operators after day 30 (so they re-introduce gradually, not all at once on day 31).
- Keep it behind the existing `MAIN_SKIP_BIG_MX` switch so it's reversible.

**Effect:** removes ~330k consumer-MX carriers from the warmup-window pool; the long tail of genuinely small/self-hosted systems carries the volume, which is the whole point of the warmup strategy.

### Fix 2: bound the NULL bucket with a small cap (MEDIUM)

Don't exclude NULL (still anti-starvation), but give it a real per-run cap in `select_sendable_carriers()` instead of "uncapped". E.g. treat unknown/NULL like `__default__` but at a fraction (say 40/run) so an untagged Google/Yahoo domain can't flood a run. Pairs with Fix 3 (continuous tagging) to shrink the bucket.

### Fix 3: put `mx_tag_carriers.py` on a daily cron (MEDIUM)

Add `infra/cron/pw-mx-tag` (model on `pw-listmonk-scrub`) running e.g. 05:45 UTC (before the 08:00 trucking builder), tagging the next N thousand NULL domains/day. The entry is a single line because crontab entries cannot be continued with a trailing backslash:

```
45 5 * * * deploy cd /opt/performancewest && docker compose exec -T workers python3 -m scripts.mx_tag_carriers --limit-domains 20000 >> /var/log/pw-mx-tag.log 2>&1
```

Install to `/etc/cron.d/` (deploy.sh doesn't run ansible). This continuously shrinks the 315k NULL backlog and keeps newly-imported carriers tagged, so Fixes 1 & 2 stay effective.

---

## Validation plan (verify before/after, no sends triggered)

1. **Dry-run the selector** before/after Fix 1 and diff the per-MX composition of a simulated run (the builder has `list_segments()` / quota selection paths that can be exercised read-only). Assert 0 carriers from `CONSUMER_MX_OPERATORS` are selected while `main_warmup_day() <= 30`.
2. **SQL sanity:** `SELECT mx_provider, count(*) ...
   WHERE listmonk_sent_at IS NULL GROUP BY 1`, then confirm the excluded operators drop out of the candidate pool.
3. **Cron (Fix 3):** run `mx_tag_carriers --limit-domains 1000` once by hand, confirm the NULL count falls and no errors; then install the cron and confirm the next-day count fell again (idempotent, bounded).
4. **Regression:** confirm the long-tail pool is still large enough to hit daily quota at warmup caps (so we don't starve the send). If the long tail is too small after excluding 330k consumer-MX, that's a signal to either lower the daily quota or accept a smaller controlled slice of one consumer operator.

---

## Open questions / decisions for owner

- **Re-introduction after day 30:** treat `CONSUMER_MX_OPERATORS` identically to the big operators (same ramp), or keep Yahoo/iCloud custom domains excluded *longer* (they convert worse and complain more)? Recommendation: same ramp, but watch the reputation monitor's per-operator reject% and pull back if Yahoo spikes.
- **NULL cap size (Fix 2):** 40/run is a guess; tune against how fast Fix 3 drains the backlog.
- **Should `mx:` consumer exclusion be permanent (not just warmup)?** For a B2B compliance product, a carrier reachable only at a Yahoo/iCloud custom domain is a low-value, high-complaint segment regardless of warmup. Worth considering a permanent down-weight, not just a warmup hold.
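
For the NULL cap question, a minimal sketch of a selector that applies per-operator caps plus one shared cap for untagged carriers may help when tuning. `select_with_caps()` and all of its parameters are hypothetical; the real `select_sendable_carriers()` is not shown in this plan and may be structured differently.

```python
from collections import Counter


def select_with_caps(candidates, quota, caps, default_cap, untagged_cap):
    """Pick up to `quota` carrier ids, in order, honouring per-MX caps.

    candidates:   iterable of (carrier_id, mx_provider or None), already
                  in priority order (hypothetical input shape).
    caps:         per-operator caps, e.g. {"mx:yahoodns.net": 0} in warmup.
    default_cap:  cap for any tagged operator missing from `caps`.
    untagged_cap: one shared cap for every carrier with mx_provider None.
    """
    used = Counter()
    picked = []
    for carrier_id, mx in candidates:
        if len(picked) >= quota:
            break
        cap = untagged_cap if mx is None else caps.get(mx, default_cap)
        if used[mx] >= cap:
            continue  # this operator (or the untagged bucket) is full
        used[mx] += 1
        picked.append(carrier_id)
    return picked


pool = [
    (1, None), (2, None), (3, None),
    (4, "mx:smallhost.example"), (5, "mx:yahoodns.net"),
    (6, "mx:otherhost.example"),
]
print(select_with_caps(pool, quota=4, caps={"mx:yahoodns.net": 0},
                       default_cap=2, untagged_cap=2))
```

With `untagged_cap = max(quota, 200)` the untagged bucket can never fill before the quota does, which is consistent with the shipped no-behavior-change default; only a value below the quota actually bounds it.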