diff --git a/docs/plan.intl-compliance-expansion.md b/docs/plan.intl-compliance-expansion.md new file mode 100644 index 0000000..1f4313a --- /dev/null +++ b/docs/plan.intl-compliance-expansion.md @@ -0,0 +1,281 @@ +# Plan — International Compliance-Services Expansion (UK / AU / IE / NZ) + +_Drafted 2026-06-19. **Planning only — nothing implemented yet.** Builds on the +US FMCSA/DOT cold-email + filing model. Sister doc to +`docs/campaign-deliverability-plan.md`, `docs/foreign-incorporation-guide.md`, +and `docs/billing.md`._ + +## Goal + +Replicate the US "regulatory-burden compliance services" model (sell filing / +renewal / monitoring services to small operators, acquired via legal unsolicited +B2B email) in other English-speaking markets that allow cold B2B email. Target +markets, ranked: **UK ⭐, Australia ⭐, Ireland, New Zealand.** (Canada & South +Africa excluded — opt-in-only marketing law; see deliverability doc / prior memo.) + +This doc answers the **two blocking questions** before any market entry: + +1. **Must we be incorporated / locally registered to legally sell this service + type in-region?** (and to send the marketing) +2. **Merchant processing**: how do we take ecommerce payments from in-region + customers, and how do we remit any **fees to the authorities** on their behalf? + +Everything else (template localization, list sourcing, burner sending) is +deferred to follow-up docs once these two gates are cleared. + +--- + +## Question 1 — Do we need a local entity? + +Two separate legal tests. Don't conflate them. + +- **(a) To OPERATE the business** (sell a filing/agent service to locals). +- **(b) To send the MARKETING** (cold B2B email into that country). + +For (b) none of these countries require a local entity — the anti-spam laws +(UK PECR, AU Spam Act, IE ePrivacy, NZ UEMA) bind on **conduct** (sender ID + +unsubscribe + B2B-consent basis), not on where the sender is incorporated. 
A +foreign sender can lawfully email; it just must comply with the rules. So the +entity question is really about (a): operating, contracting, and getting paid. + +| Market | Local entity legally required to operate? | Reality / why | +|---|---|---| +| **UK** | **No** (can trade as overseas entity), but **strongly advised** | A foreign company can sell services into the UK. BUT: (i) UK merchant acquirers/Stripe UK want a UK or EEA entity for GBP settlement + lower fees; (ii) **VAT registration** likely required (see Q2); (iii) credibility — UK SMEs distrust a US filing agent for their O-licence. A **UK Ltd** is cheap (~£12/yr Companies House) and removes all three frictions. **No UK-resident director required.** | +| **Australia** | **No** to sell remotely; **registration triggered** if "carrying on business in Australia" | Foreign co can sell in. Once you're "carrying on business in AU" you must register as a **foreign company with ASIC (ARBN)** OR form a local **Pty Ltd**. Pty Ltd is cleaner BUT **requires at least one director who ordinarily resides in Australia** (Corporations Act s201A) — this is the real blocker; needs a resident director / nominee service. GST registration required once turnover ≥ A$75k (see Q2). | +| **Ireland** | **No**, but EU-presence helps | Foreign (incl. UK post-Brexit, US) co can sell in. An **Irish Ltd requires at least one EEA-resident director** OR a **s137 non-resident bond** (~€25k insurance bond, ~€2k/yr). VAT registration required (Q2). If we already have a UK Ltd, can often sell into IE from UK without a separate IE entity. | +| **New Zealand** | **No**, low bar to localize | Foreign co can sell in. NZ company formation is fast/cheap BUT **requires one director living in NZ (or in Australia AND a director of an AU company)** — Companies Act 1993 s10. GST registration once turnover ≥ NZ$60k. Smallest market; defer. | + +### Sharper question: does the *service itself* (acting as filing agent) require a license? 
+ +This is the one to verify per-vertical before launch — being someone's agent for +a government filing can be a regulated activity. + +| Market | Filing-agent licensing for transport compliance? | Notes / open item | +|---|---|---| +| **UK** | **No license to be a paid agent**, but the **Transport Manager** role on an O-licence is statutory (must hold a CPC and be a real person of repute). We can sell **prep / monitoring / renewal admin**, and optionally broker **external Transport Manager CPC holders**, but we **cannot ourselves "be" the Transport Manager** without a qualified person. **VERIFY: don't market as providing the TM unless we contract real CPC holders.** | +| **AU** | **No agent license** for NHVR/CoR advisory or NHVAS prep. NHVAS auditors must be approved, but we'd sell prep, not audit. Low risk. | +| **IE** | Same as UK (EU-harmonized: Transport Manager CPC required on the operator licence). | +| **NZ** | Transport Service Licence has a "fit and proper person" + certificate requirements; advisory/prep is unregulated. | + +**Recommendation (Q1):** +1. **UK first.** Form a **UK entity** (no resident director/member needed, cheap, + unlocks Stripe UK + GBP + VAT + credibility). Sell prep/monitoring/renewal; + partner with external CPC Transport Managers rather than claiming to be one. + + **Entity choice — LLP (chosen) vs Ltd:** Going with a **UK LLP** for + **tax pass-through** (no corporation tax at entity level; profits taxed in + members' hands). Trade-offs to plan around: + - **LLP needs ≥2 members** (a Ltd can be a single person). Need a second + member/designated member. + - Pass-through is **not** zero UK tax: non-UK-resident members with **UK-source + trading profit** owe **UK self-assessment**; the LLP files a **partnership + return (SA800)**. So two layers of personal filing, not entity tax. + - **US tax:** a UK LLP defaults to a **partnership** for US purposes (or + check-the-box) → flows to US members' returns; watch for extra US filing. 
+ - **VAT obligation is identical to a Ltd** (see Q2). No saving there. + - **No UK-resident member required** for an LLP — good. +2. **AU second.** Start by **selling remotely as the existing entity** (legal) to + validate demand; only stand up a **Pty Ltd (needs resident-director nominee)** + or **ASIC ARBN** once revenue justifies it / once "carrying on business" is + triggered. +3. **IE / NZ** deferred — both need a resident-director or bond workaround and are + smaller; revisit after UK proves the playbook. + +--- + +## Question 2 — Merchant processing & remitting fees to authorities + +Two money flows, kept strictly separate (same separation we already enforce in +`docs/billing.md`: **our service fee** vs **government filing fee**): + +- **Flow A — collect from customer** (ecommerce checkout, multi-currency). +- **Flow B — pay the authority** the actual government fee on the customer's behalf. + +### Flow A — Collecting payment (merchant processing) + +Today's stack (`api/src/routes/checkout.ts`): **Stripe Checkout (card + ACH), +PayPal Orders v2, SHKeeper crypto**; Stripe Subscriptions for recurring; +Adyen aspirational/not live. We extend this, we don't replace it. + +| Market | Best acquiring approach | Currency / settlement | Notes | +|---|---|---|---| +| **UK** | **Stripe UK** under the new **UK Ltd** | Settle **GBP** to a UK/EEA business account (Wise, Airwallex, Revolut Business, or a UK high-street acct) | Lowest fees, local card success rates, supports **BACS Direct Debit** (the UK ACH analog — good for recurring monitoring subs) and local methods. PayPal UK as fallback. | +| **AU** | **Stripe AU** (needs AU entity) **or** sell via existing Stripe charging in **AUD as a foreign business** initially | AUD; settle via Airwallex/Wise AUD until Pty Ltd exists | Stripe supports AUD on a non-AU account but settlement/fees are worse; **PayID/BECS Direct Debit** need a local Stripe. Start cross-border, localize when entity lands. 
| +| **IE** | Stripe (UK or IE entity), **EUR**, SEPA Direct Debit | EUR | If UK Ltd exists, can run IE sales through it in EUR. | +| **NZ** | Stripe NZ (needs NZ entity) or cross-border NZD | NZD | Defer. | + +**Multi-currency mechanics (low-lift path):** +- Stripe can present/settle multiple currencies on one account; quickest start is + **charge in local currency on the existing/US or new UK account**, accept FX + until per-market entities exist. +- Use **Wise Business / Airwallex** to hold GBP/AUD/EUR/NZD and avoid double FX. +- Keep **ERPNext as system of record** (multi-currency invoices already supported) + exactly as in `docs/billing.md`; add per-market price lists + tax templates. + +**Surcharge note:** our card surcharge model (`docs/billing.md`) is **illegal/capped +in several of these markets** — **UK & EU cap/ban surcharges on consumer cards +(PSD2 surcharging ban); AU allows surcharge only up to actual cost of acceptance +(RBA rules).** ⚠️ **Do NOT copy the US 3% card surcharge into UK/EU/AU.** Bake +processor cost into price or absorb it there. + +### Flow A.1 — Sales tax / VAT / GST on OUR service fee + +This is mandatory homework, not optional. Selling services to local businesses +generally creates a tax-collection obligation. + +| Market | Tax | Registration trigger | Mechanic | +|---|---|---|---| +| **UK** | **VAT 20%** | If UK-established: register at **£90k** turnover. **If we sell from a non-UK entity into UK, threshold can be £0** (non-established taxable person) → register from first sale. A UK Ltd is simpler. B2B may use **reverse charge** (customer self-accounts) which can reduce our collection burden — **VERIFY per service**. | Register for VAT, charge 20% (or reverse-charge B2B), file quarterly (MTD). | +| **AU** | **GST 10%** | Register at **A$75k** turnover (lower/zero for some non-resident supplies) | Charge 10%, remit to ATO (BAS). B2B reverse-charge may apply for non-resident suppliers. 
| +| **IE** | **VAT 23%** | Non-established → effectively from first B2B sale; reverse charge common for B2B | File via Revenue. | +| **NZ** | **GST 15%** | Register at **NZ$60k** turnover | Defer. | + +**Open item:** for **B2B** sales the **reverse charge** mechanism may mean the +*customer* accounts for VAT/GST, dramatically simplifying our obligation — but it +depends on whether the supply is "digital service" vs "professional service" and +our establishment status. **Get a one-off cross-border VAT opinion before launch.** + +### Flow B — Paying the government authority on the customer's behalf + +This is the operationally hard part. In the US we front/relay the filing fee. The +analog per market: + +| Market | Authority + typical fee | How fees are paid | Our remittance mechanism | +|---|---|---|---| +| **UK** | **Traffic Commissioner / DVSA** — O-licence app ~£257 + ~£401 grant + ~£401/5yr; **DVSA** for MOT/tacho; **Companies House** for any co. admin | Mostly **GOV.UK online card/Direct Debit**, agent can pay on behalf | Pay via a **UK business debit card** (from the UK Ltd's bank) at GOV.UK; pass the exact fee through to the customer with no surcharge. Need a funded GBP account (Wise/Revolut/UK bank). | +| **AU** | **NHVR** (registration/accreditation), state road agencies, **ASIC** | NHVR Portal card payment; state portals | Pay via **AU business card**; needs AUD float. Until Pty Ltd, may need customer to pay authority directly while we do prep-only (avoids handling AU gov payments cross-border). | +| **IE** | **RSA** / Dept of Transport, CRO (companies) | gov.ie / RSA online card | EUR business card. | +| **NZ** | **NZTA** (TSL, RUC) | NZTA online | Defer. | + +**Key design decisions for Flow B:** +1. **Pass through, never mark up, the government fee** — same rule as US billing + (surcharges apply to service fees only, not filing fees — `docs/billing.md`). + Display the gov fee as a separate, at-cost line item. 
+ + **Card to pay the authorities — funding rail (decided):** GOV.UK / DVSA / + Companies House all take **Visa/Mastercard**, so we need a GBP-funded card. + Options: + - **Stripe Issuing (UK/EU): yes, virtual cards exist.** Stripe Issuing offers + **virtual + physical Visa** in the **UK and EU** (not US-only), funded from + the Stripe balance, with per-card limits. Good for **programmatic per-filing + virtual cards** later. Caveat: needs **Issuing approval/eligibility**, Visa + network only, pitched for platform/expense use — an application, not + instant-on. + - **Wise Business / Revolut Business (preferred for launch):** one product gives + **real UK account details (sort code + acct no.)** that receive + **Faster Payments / BACS / CHAPS**, PLUS **virtual + physical debit cards**, + PLUS multi-currency GBP/EUR/AUD holding. Fund GBP via **Faster Payments** + (instant, free, ~£1M cap) and pay authorities with the attached virtual card. + No prepaid card and no Stripe Issuing approval needed. + - **Transfer-rail note:** you fund an **account that has a card attached**, not a + card directly. Use **Faster Payments** for top-ups (instant/free). **CHAPS** + (£25-35) only for high-value one-offs; **BACS** (3-day batch) for Direct + Debit/payroll, not ad-hoc. Use **Stripe Issuing** only if/when we want + per-filing programmatic cards. +2. **Two models for who pays the authority:** + - **(i) We pay (agent model):** we hold a funded local-currency business card, + pay GOV.UK/NHVR directly, recoup via the customer's checkout. Best UX, needs + local banking + float + reconciliation. **UK = yes (UK Ltd + Wise/Revolut).** + - **(ii) Customer pays the authority directly (prep-only model):** we charge + only our service fee; customer enters their own card at the gov portal. **No + gov-money handling, no float, no entity needed for Flow B.** Best for AU/NZ + market-validation phase and avoids money-transmission questions. +3. 
**Avoid looking like a money transmitter.** Fronting third-party gov fees at + scale can edge toward regulated payment activity. Keep it as **agency + disbursement of a clearly-itemized pass-through cost**, not a stored-value / + FX product. **VERIFY threshold with counsel if volume grows.** + +**Recommendation (Q2):** +- **UK:** UK entity (LLP per Q1) → **Stripe UK (GBP, no card surcharge) + Wise/Revolut GBP + account** for both collecting (Flow A) and paying GOV.UK (Flow B, agent model). + Register for VAT. ERPNext stays system of record. +- **AU/IE/NZ:** launch **prep-only / customer-pays-authority** (Flow B model ii) on + cross-border Stripe in local currency to validate demand **before** committing to + a local entity + resident director + local acquiring. + +--- + +## Cost / friction summary (entity + payments to launch) + +| Market | Entity to operate | Hard blocker | Payments-in | Pay-authority | Verdict | +|---|---|---|---|---|---| +| **UK** ⭐ | UK entity (LLP chosen in Q1; no resident director/member, ~£12/yr) | VAT registration | Stripe UK / GBP, no surcharge | Agent model via GBP card | **Go first** | +| **AU** ⭐ | None to start; Pty Ltd later | Pty Ltd needs **AU-resident director** | Cross-border AUD → Stripe AU later | Prep-only first | **Go second, prep-only** | +| **IE** | UK entity can serve; IE Ltd needs **EEA director / €25k bond** | Director/bond | Stripe EUR | Prep-only / agent | Defer | +| **NZ** | NZ co needs **NZ/AU-resident director** | Director | Cross-border NZD | Prep-only | Defer | + +--- + +## Open questions (need answers before build) + +1. **Cross-border VAT/GST opinion** — does B2B **reverse charge** cover our service + so we don't have to collect? (UK + AU + IE). Single biggest unknown for Q2. +2. **UK LLP formation** — confirm no-resident-member is fine, **line up the + required 2nd member/designated member**, pick a registered-office/agent + provider (mirror `docs/foreign-incorporation-guide.md`). 
Confirm LLP + pass-through vs the extra UK SA800 + members' self-assessment + US + partnership-filing burden is acceptable vs a single-member Ltd. Banking: + **Wise vs Revolut Business vs Airwallex** for the GBP account + virtual card + (Flow B); decide whether to also apply for **Stripe Issuing** later. +3. **AU resident-director nominee** — cost/availability of a nominee director + service if/when we localize; or stick to ARBN (foreign-company) route. +4. **Money-transmission line** — confirm fronting GOV.UK fees as itemized + pass-through disbursement does not trigger payment-institution licensing at our + volumes. +5. **Transport Manager (UK/IE)** — confirm we can sell prep/monitoring without + holding the statutory TM CPC ourselves, and line up external CPC holders to + broker if we want to offer the full O-licence package. +6. **Surcharge legality** — strip the US card surcharge from all UK/EU/AU pricing; + reprice to absorb processor cost. (Confirmed needed, just needs implementation.) +7. **Vertical fit** — this doc assumes the **transport/trucking** analog (closest + to FMCSA). See the **Vertical portability matrix** below for how the rest of the + US stack ports; **healthcare does NOT port (NHS, no billing-enrollment model).** + +--- + +## Vertical portability matrix (US stack → UK / European-English markets) + +European English-speaking ≈ **UK + Ireland (+ Malta, negligible)**. Each vertical is +judged on the same two gates as the US playbook: **(1) is there a recurring +regulatory clock to sell against, and (2) can we actually get emails** (every UK +public register lists the regulated entity but **not** its email, so all of these +collapse to the same spine: **free public register × Companies House join × +scrape-published-emails / paid append** — build it once, run all verticals on it). + +| US vertical (ours) | UK/EU analog | Recurring clock? 
| Email/data path | Verdict | +|---|---|---|---|---| +| **Formation + annual report + registered agent** | **Companies House**: formation, **confirmation statement (annual)**, registered office, **ECCT identity verification** (2025) | ✅ annual | Companies House **free bulk register** (no email) → enrich | ⭐ Best 1:1 transfer; ~5M cos; **but saturated** (1stFormations/Tide) | +| **TCPA / data-privacy** | **ICO data protection fee** — every UK business processing personal data pays £40–£2,900/yr; **PECR** is the marketing law itself | ✅ **mandatory annual**, widely missed | **ICO public register** (name+status, no email) → can flag the *unregistered* → enrich | ⭐ **Sleeper / lead UK product.** Mandatory, recurring, under-served, we already operate under this law | +| **Trucking / FMCSA** | **O-licence** (Traffic Commissioner/DVSA) | ✅ 5-yr + ongoing | O-licence register (no email) × Companies House × scrape | ⭐ Main plan; ~85k UK + ~4k IE operators | +| **EPA RCRA hazardous waste** | **Environment Agency** waste carrier/broker registration (renew **3-yr**) + hazardous waste producer | ✅ 3-yr | EA public carrier/broker register (limited contact) → enrich | ✅ Decent niche, clear clock, public register | +| **Employment / contractor classification** | **IR35 / off-payroll working**, worker status | ⚠️ event-driven, no registry | no registry; reach via contractor/accountant channels | ⚠️ Real pain but **not list/cold-email driven** → inbound/content | +| **Telecom (CRTC / FCC 499 / USAC)** | **Ofcom** comms-provider notification + annual admin charge; **CCTS→ADR (CISAS/Ombudsman)** | ✅ annual admin charge | Ofcom lists exist, **no rich email register** like FCC RMD | ⚠️ Small universe, weak data, niche | +| **FMC ocean (NVOCC/forwarders)** | **BIFA membership, AEO, CDS customs** | ⚠️ mostly one-time/voluntary | BIFA member list, no clean email feed | ⚠️ Niche, weak clock | +| **Healthcare (Medicare/PECOS/Medicaid/CLIA/DEA)** | **NHS single-payer kills the 
billing-enrollment model.** Only **GMC revalidation (5yr)/NMC (3yr)** + **CQC provider registration** map | ⚠️ revalidation is **personal attestation** | GMC/NMC registers (no email); CQC has provider contact | ❌ **Worst transfer — skip.** No Medicare-enrollment analog; don't spend burner infra here | + +### Takeaways +1. **Two verticals beat trucking for the UK launch:** + - **Companies House corporate services** — most direct transfer of our entire + formation/RA/annual-report engine, but the most crowded market. + - **ICO data protection fee** ⭐ — the sleeper: mandatory + recurring + widely + ignored, the public register lets us target the **non-compliant**, per-deal + value is small but volume is enormous, and we already understand PECR. +2. **Healthcare does NOT port** — entire US healthcare stack assumes fee-for-service + billing the NHS doesn't have. Exclude from UK/IE. +3. **One enrichment spine serves all** — Companies House-anchored verticals + (corporate, ICO, trucking, waste) are all **Tier-2 "one hop to email"** (per + `docs/vertical-lead-source-analysis.md`); telecom/FMC/healthcare are Tier-3/4. +4. **Lead UK products:** **ICO data-protection-fee + Companies House corporate + services**, alongside the **O-licence** trucking stream. + +## Next docs (after Q1/Q2 cleared) + +- `plan.uk-olicence-stream.md` — UK Traffic Commissioner O-licence product, + template localization, Companies House entity-type segmentation (Ltd/LLP/PLC = + legal cold B2B; sole traders/partnerships = need soft opt-in). +- `plan.au-nhvr-stream.md` — NHVR / Chain of Responsibility, inferred-consent list + sourcing from published business addresses. +- `plan.uk-ico-fee-stream.md` — ICO data-protection-fee renewal product; target the + unregistered/lapsed from the ICO public register; PECR-compliant outreach. +- `plan.uk-companies-house-stream.md` — confirmation statement + registered office + + ECCT identity verification; the enrichment spine (Companies House bulk × SIC). 
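
The Flow A / Flow B separation in Question 2 (our service fee vs. the at-cost government fee, with no card surcharge) can be sketched as an invoice builder. This is a minimal illustration, not the ERPNext or `docs/billing.md` implementation: `build_invoice_lines`, `TAX_RATES`, and the `reverse_charge` flag are hypothetical names, the rates are the headline figures from the Flow A.1 table, and treating the government fee as untaxed is an assumption that depends on the cross-border VAT/GST opinion.

```python
from decimal import Decimal, ROUND_HALF_UP

# Headline rates from the Flow A.1 table. Whether they apply to a given sale
# (vs. reverse charge) is the open VAT/GST question, so treat them as placeholders.
TAX_RATES = {
    "UK": Decimal("0.20"),
    "AU": Decimal("0.10"),
    "IE": Decimal("0.23"),
    "NZ": Decimal("0.15"),
}


def build_invoice_lines(market, service_fee, gov_fee, reverse_charge=False):
    """Return (lines, total) with the government fee as a separate at-cost line.

    Assumptions (hypothetical, pending the VAT opinion):
    - tax is charged on our service fee only;
    - the government fee is passed through untaxed and unmarked-up;
    - no card surcharge line is ever added.
    """
    rate = Decimal("0") if reverse_charge else TAX_RATES[market]
    tax = (service_fee * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    lines = [("service_fee", service_fee), ("tax_on_service_fee", tax)]
    if gov_fee > 0:
        lines.append(("government_fee_at_cost", gov_fee))
    total = sum((amount for _, amount in lines), Decimal("0"))
    return lines, total


if __name__ == "__main__":
    # Illustrative amounts only: a made-up service fee plus the ~£257 application fee.
    lines, total = build_invoice_lines("UK", Decimal("150.00"), Decimal("257.00"))
    for name, amount in lines:
        print(f"{name:26s} {amount}")
    print(f"{'total':26s} {total}")
```

Keeping the government fee as its own line makes the prep-only model (ii) a matter of passing `gov_fee=Decimal("0")`, so the same checkout path can serve both models.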
diff --git a/docs/plan.mx-exclusion-gaps.md b/docs/plan.mx-exclusion-gaps.md new file mode 100644 index 0000000..c256238 --- /dev/null +++ b/docs/plan.mx-exclusion-gaps.md @@ -0,0 +1,151 @@ +# Plan: close the MX-exclusion gaps in the trucking warmup + +**Status:** PROPOSED (2026-06-20). Analysis + design only; no code shipped yet. +**Owner context:** warmup day 17; big operators (Google/Microsoft/Proofpoint/ +Mimecast/Barracuda/Cisco/Broadcom) are EXCLUDED until day 30, then re-introduced +via `mx_daily_caps()`. This plan fixes three holes that let throttling/consumer +MX operators through during that window. + +--- + +## Background: how the two MX layers work today + +Sender reputation is judged by the **receiving operator (MX)**, not the recipient +domain string. There are two independent gates in `scripts/build_trucking_campaigns.py`: + +1. **`fetch_carriers()` big-MX EXCLUSION** (SQL `big_mx_exclude`): during warmup + (`main_warmup_day() <= MAIN_BIG_MX_EXCLUDE_UNTIL_DAY`, currently day 30) it + drops carriers whose `mx_provider IN BIG_MX_OPERATORS`. `mx_provider IS NULL` + is deliberately KEPT (so the pool isn't starved before tagging completes). +2. **`select_sendable_carriers()` per-MX THROTTLE** (`mx_daily_caps` + + `per_op` cap): bounds how many of a run's quota go to each KNOWN operator so + we never concentrate on one. NULL is NOT capped (would collapse onto one + bucket and starve the pool). + +`mx_provider` is populated by `scripts/mx_tag_carriers.py`, which resolves each +domain's MX and returns either a **clean label** (`google`, `microsoft`, +`proofpoint`, `mimecast`, `cisco`, `barracuda`, `broadcom`, `godaddy`, `zoho`, +`rackspace`) or, for everything else, an **`mx:` prefix** (e.g. +`mx:yahoodns.net`, `mx:icloud.com`, `mx:comcast.net`). + +--- + +## The three gaps (with live numbers, 2026-06-20) + +### Gap 1 — consumer/throttling MX behind the `mx:` prefix are NOT excluded +`BIG_MX_OPERATORS` only lists the clean labels. 
The big consumer mailbox +operators get tagged with the `mx:` prefix and so slip through BOTH gates during warmup: + +| mx_provider | sendable carriers | why it's a problem | +| --- | --- | --- | +| `mx:yahoodns.net` | **283,113** | Yahoo Small Business / AOL custom domains — same aggressive consumer filtering + complaint-driven blocking as consumer Yahoo. By far the biggest hole. | +| `mx:icloud.com` | **24,985** | Apple iCloud+ Custom Domain — Apple consumer filtering; iCloud was the biggest consumer leak we already scrubbed from Listmonk. | +| `mx:comcast.net` | 12,251 | Comcast consumer infra; historically bouncy. | +| `mx:charter.net` | 5,860 | Spectrum/Charter consumer. | +| `mx:centurylink.net` / `mx:windstream.net` / `mx:tds.net` / `mx:earthlink-vadesecure.net` | ~8,100 | Legacy/satellite ISP consumer mail; many already in `DEAD_ISP_DOMAINS` as literal domains but NOT caught when a custom domain points its MX there. | + +`mx:yahoodns.net` alone is **283k** carriers that look "long-tail/safe" to the +warmup but actually filter like a big operator. This is the headline fix. + +> NOTE: the literal-domain layer (`BLOCKED_EMAIL_DOMAINS` incl. the Yahoo family, +> Apple, dead ISPs) already blocks `someone@yahoo.com` / `@icloud.com`. The hole +> is a **custom domain whose MX points at Yahoo/iCloud** — invisible to the +> string layer, only visible via MX tagging. That's exactly what this closes. + +### Gap 2 — 315,892 untagged (NULL) carriers are mailed unvetted +`mx_provider IS NULL` is kept by both gates by design (anti-starvation). With +**315,892** sendable NULLs vs 1,187,054 tagged, a meaningful slice of every run +goes to domains we've never MX-resolved — some of which sit behind Google/Microsoft/Yahoo +MX that we'd otherwise exclude. This is acceptable as a bootstrap but should shrink over time. + +### Gap 3 — `mx_tag_carriers.py` is not on a cron +There is no `infra/cron/pw-mx-tag` (confirmed: no cron references it). So the NULL +backlog only shrinks when someone runs it by hand. 
New carriers imported by the +FMCSA census downloader land as NULL and stay NULL. Without continuous tagging, +Gaps 1 and 2 slowly re-open. + +--- + +## Proposed fixes + +### Fix 1 — exclude consumer/throttling `mx:` operators during warmup (HIGH) +Add an explicit set of `mx:`-prefixed operators that should be treated like the +big operators during warmup, and fold them into BOTH the exclusion and the +throttle. Keep it data-driven and documented. + +```python +# scripts/build_trucking_campaigns.py +# Consumer / aggressively-filtering mailbox operators that mx_tag_carriers.py +# labels with the "mx:" prefix (no clean label). They throttle/complaint-block +# like the big operators, so hold them out during warmup too. (yahoodns = +# Yahoo Small Business + AOL custom domains; icloud = Apple custom domains.) +CONSUMER_MX_OPERATORS = ( + "mx:yahoodns.net", "mx:icloud.com", "mx:comcast.net", "mx:charter.net", + "mx:centurylink.net", "mx:windstream.net", "mx:tds.net", + "mx:earthlink-vadesecure.net", +) +# Everything held out of the warmup pool entirely (until MAIN_BIG_MX_EXCLUDE_UNTIL_DAY). +WARMUP_EXCLUDE_OPERATORS = BIG_MX_OPERATORS + CONSUMER_MX_OPERATORS +``` +- In `fetch_carriers()`: build `big_mx_exclude` from `WARMUP_EXCLUDE_OPERATORS` + (not just `BIG_MX_OPERATORS`). +- In `mx_daily_caps()`: give `CONSUMER_MX_OPERATORS` the same `big` ramp as the + clean big operators after day 30 (so they re-introduce gradually, not all at + once on day 31). +- Keep it behind the existing `MAIN_SKIP_BIG_MX` switch so it's reversible. + +**Effect:** removes ~330k consumer-MX carriers from the warmup-window pool; the +long tail of genuinely small/self-hosted systems carries the volume, which is the +whole point of the warmup strategy. + +### Fix 2 — bound the NULL bucket with a small cap (MEDIUM) +Don't exclude NULL (still anti-starvation), but give it a real per-run cap in +`select_sendable_carriers()` instead of "uncapped". E.g. 
treat unknown/NULL like +`__default__` but at a fraction (say 40/run) so an untagged Google/Yahoo domain +can't flood a run. Pairs with Fix 3 (continuous tagging) to shrink the bucket. + +### Fix 3 — put `mx_tag_carriers.py` on a daily cron (MEDIUM) +Add `infra/cron/pw-mx-tag` (model on `pw-listmonk-scrub`) running e.g. 05:45 UTC +(before the 08:00 trucking builder), tagging the next N thousand NULL domains/day: +``` +45 5 * * * deploy cd /opt/performancewest && docker compose exec -T workers \ + python3 -m scripts.mx_tag_carriers --limit-domains 20000 \ + >> /var/log/pw-mx-tag.log 2>&1 +``` +Install to `/etc/cron.d/` (deploy.sh doesn't run ansible). This continuously +shrinks the 315k NULL backlog and keeps newly-imported carriers tagged, so Fixes +1 & 2 stay effective. + +--- + +## Validation plan (verify before/after, no sends triggered) + +1. **Dry-run the selector** before/after Fix 1 and diff the per-MX composition of + a simulated run (the builder has `list_segments()` / quota selection paths that + can be exercised read-only). Assert 0 carriers from `CONSUMER_MX_OPERATORS` + are selected while `main_warmup_day() <= 30`. +2. **SQL sanity:** `SELECT mx_provider, count(*) ... WHERE listmonk_sent_at IS NULL + GROUP BY 1` — confirm the excluded operators drop out of the candidate pool. +3. **Cron (Fix 3):** run `mx_tag_carriers --limit-domains 1000` once by hand, + confirm the NULL count falls and no errors; then install the cron and confirm + the next-day count fell again (idempotent, bounded). +4. **Regression:** confirm the long-tail pool is still large enough to hit daily + quota at warmup caps (so we don't starve the send). If the long tail is too + small after excluding 330k consumer-MX, that's a signal to either lower the + daily quota or accept a smaller controlled slice of one consumer operator. 
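
Validation step 1 can be sketched as a pure function over a simulated selection, so nothing is sent. This is an illustration only: `warmup_violations` and `mx_composition` are hypothetical helpers, the row shape (`dict` with an `mx_provider` key) is an assumption, and the `BIG_MX_OPERATORS` tuple below is copied from the operator list in the status header rather than from the real constant in `scripts/build_trucking_campaigns.py`.

```python
from collections import Counter

# Assumed contents; the real constant lives in scripts/build_trucking_campaigns.py.
BIG_MX_OPERATORS = (
    "google", "microsoft", "proofpoint", "mimecast",
    "barracuda", "cisco", "broadcom",
)
# Copied from the Fix 1 snippet.
CONSUMER_MX_OPERATORS = (
    "mx:yahoodns.net", "mx:icloud.com", "mx:comcast.net", "mx:charter.net",
    "mx:centurylink.net", "mx:windstream.net", "mx:tds.net",
    "mx:earthlink-vadesecure.net",
)
WARMUP_EXCLUDE_OPERATORS = BIG_MX_OPERATORS + CONSUMER_MX_OPERATORS


def mx_composition(selected):
    """Count selected carriers per mx_provider; untagged rows are reported as 'NULL'."""
    return Counter(row["mx_provider"] or "NULL" for row in selected)


def warmup_violations(selected, warmup_day, exclude_until_day=30):
    """Return rows that should have been held out during the warmup window.

    After exclude_until_day the operators are re-introduced via the ramp,
    so nothing counts as a violation.
    """
    if warmup_day > exclude_until_day:
        return []
    excluded = set(WARMUP_EXCLUDE_OPERATORS)
    return [row for row in selected if row["mx_provider"] in excluded]


if __name__ == "__main__":
    simulated_run = [
        {"domain": "a.example", "mx_provider": "mx:yahoodns.net"},
        {"domain": "b.example", "mx_provider": None},
        {"domain": "c.example", "mx_provider": "zoho"},
    ]
    print(mx_composition(simulated_run))
    print(warmup_violations(simulated_run, warmup_day=17))
```

Diffing the `mx_composition` output of a before-Fix-1 and after-Fix-1 dry run gives the per-MX comparison the step asks for, and an empty `warmup_violations` result is the assertion.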
+ +--- + +## Open questions / decisions for owner + +- **Re-introduction after day 30:** treat `CONSUMER_MX_OPERATORS` identically to + the big operators (same ramp), or keep Yahoo/iCloud custom domains excluded + *longer* (they convert worse and complain more)? Recommendation: same ramp, but + watch the reputation monitor's per-operator reject% and pull back if Yahoo + spikes. +- **NULL cap size (Fix 2):** 40/run is a guess; tune against how fast Fix 3 drains + the backlog. +- **Should `mx:` consumer exclusion be permanent (not just warmup)?** For a + B2B compliance product, a carrier reachable only at a Yahoo/iCloud custom + domain is a low-value, high-complaint segment regardless of warmup. Worth + considering a permanent down-weight, not just a warmup hold.
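
For the NULL cap question (Fix 2), the intended behaviour can be sketched as a greedy selector with one counter per operator and a separate, smaller cap for untagged rows. This is not the real `select_sendable_carriers()`: `select_with_caps` and its parameters are hypothetical, the row shape is assumed, and 40 is the placeholder value from Fix 2.

```python
from collections import Counter


def select_with_caps(candidates, quota, per_op_caps, default_cap, null_cap=40):
    """Pick up to `quota` rows without exceeding any per-operator cap.

    - Known operators use per_op_caps[op], falling back to default_cap.
    - Untagged rows (mx_provider is None) share a single bucket capped at
      null_cap, instead of being uncapped.
    """
    taken = Counter()
    picked = []
    for row in candidates:
        if len(picked) >= quota:
            break
        op = row["mx_provider"]
        key = "NULL" if op is None else op
        cap = null_cap if op is None else per_op_caps.get(op, default_cap)
        if taken[key] >= cap:
            continue  # this operator's bucket is full for this run
        taken[key] += 1
        picked.append(row)
    return picked


if __name__ == "__main__":
    pool = (
        [{"domain": f"n{i}.example", "mx_provider": None} for i in range(100)]
        + [{"domain": f"z{i}.example", "mx_provider": "zoho"} for i in range(100)]
    )
    run = select_with_caps(pool, quota=500, per_op_caps={"zoho": 25}, default_cap=50)
    print(Counter(r["mx_provider"] or "NULL" for r in run))
```

Because the NULL bucket is capped rather than excluded, the anti-starvation intent is preserved while an untagged Google or Yahoo domain can take at most `null_cap` slots per run; the right value depends on how quickly the Fix 3 cron drains the backlog.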