new-site/docs/healthcare-email-stream-plan.md
justin 40090da1dd docs: plan dual-stream outbound email (healthcare-hot + trucking-trickle)
Today one global Listmonk cap + shared Postfix rotation pool governs all mail,
sized to protect consumer-ISP (Gmail/MS/Yahoo) reputation for trucking cold mail.
Healthcare practice-domain (institutional) mail has an independent deliverability
profile and should run hotter without endangering the warmed trucking IPs.

Plan: isolate two streams sharing one Postfix/Listmonk:
- carve hc-dedicated sending IPs (.107-.109) with their own PTR/SPF + warmup;
- a 2nd Postfix submission service (:2526) bound to the hc pool;
- a 2nd Listmonk instance (or SMTP server) with its own sliding-window cap;
- split the healthcare list into institutional (hot) vs consumer-webmail (rides
  trucking discipline) vs DirectTrust (parked);
- free MX+SMTP verify the institutional list on a non-sending IP first.

Includes mermaid topology, separate hc warmup/cap schedule, validation (isolation/
identity/deliverability/cap proofs), and open decisions for sizing.
2026-06-05 18:51:05 -05:00


Plan — Dual-Stream Outbound Email (Healthcare hot + Trucking trickle)

Why this exists

Today one global throttle governs all outbound mail: the Listmonk sliding window (app.message_sliding_window_rate, currently 150/h ramping to a 300/h hard ceiling ≈ 4k/day) plus a shared Postfix rotation pool (.94/.95/.96).

That ceiling exists to protect consumer-ISP reputation (Gmail / Microsoft / Yahoo), since those are the providers the FMCSA trucking campaigns mail to. The May 30-31 collapse (a 29k blast drew Gmail 550-5.7.1 and Yahoo 421 TSS04 responses, and delivery fell to ~13%) is why the whole warmup/cap machinery exists.

Healthcare's reachable audience is different in kind, so it should NOT be constrained by the same ceiling:

  • The cold-emailable NPPES-endpoint slice is "tens of thousands"; a large part is consumer webmail (gmail ~12.4k) but a meaningful tail is practice/clinic domains (their own MX, Google Workspace / Microsoft 365 tenants).
  • Practice-domain (institutional) mail does not share the consumer-ISP snowshoe heuristics that torch the trucking IPs. Its deliverability is largely independent of the reputation we're protecting on .94-.96.

So the goal is stream isolation: let healthcare-institutional mail run hot on its own IPs/cap while trucking keeps trickling on the warmed consumer-facing IPs, with neither able to damage the other.

Honesty caveat (do not skip): the consumer-webmail portion of the healthcare list (gmail/outlook/icloud addresses) is NOT institutional and MUST ride the same cautious consumer-ISP discipline as trucking. "Run healthcare hot" applies ONLY to the practice-domain (non-consumer, non-DirectTrust) segment. We split the healthcare list itself into healthcare-institutional vs healthcare-consumer and route each to the matching stream.

Architecture: two independent streams on one Postfix (Listmonk split decided below)

flowchart TD
    LM[Listmonk] -->|SMTP server A: 172.18.0.1:25\nhello perfwest...| PFA[Postfix submission]
    LM -->|SMTP server B: 172.18.0.1:2526\nhello hc-mta...| PFB[Postfix submission hc]
    PFA --> TR{transport map}
    PFB --> TRH{transport_maps hc}
    TR -->|yahoo family| HOLD[hold:]
    TR -->|consumer + everything else| ROT[randmap rotation\nout05..out17\n.94-.106]
    TRH -->|practice domains| HCROT[randmap hc pool\nhcout1..hcout4\n.107-.109 + spare]
    ROT --> NET1[(consumer ISPs:\nGmail / MS, capped low)]
    HCROT --> NET2[(practice MX /\nWorkspace / M365, hot)]

Two coordinated changes:

1. Postfix: a dedicated healthcare submission service + IP sub-pool

  • Carve 2-3 IPs out of the existing 20 (.107/.108/.109 = out18/19/20, currently unused at the warmup tail) into a healthcare-only rotation pool. They get their own HELO (hcmtaNN.performancewest.net — confirm/lay down PTR + SPF first) so healthcare reputation is built and judged separately from trucking. They are removed from the trucking ALL=(...) array so the trucking warmup never reclaims them.
  • Add a second Postfix submission entry in master.cf listening on a distinct port (e.g. 2526) whose injected mail is tagged to the healthcare pool. Two clean ways to bind the pool:
    • (preferred) sender-dependent / class-based transport: route by the submission port via a dedicated smtpd/cleanup service that tags the message (a header, or a per-service content_filter / FILTER action) so that healthcare mail hits randmap:{hcout1:,hcout2:,hcout3:}. Caveat to confirm before committing: transport_maps is resolved instance-wide by trivial-rewrite, so a second transport_maps most likely cannot simply be attached to the :2526 listener with -o.
    • Simpler alternative: a separate Postfix instance (postmulti) listening on 2526, with its own main.cf bound to the hc IPs. More isolation, more moving parts. Decide in step 1 (recommend the single-instance class-based route unless isolation is required).
  • Keep the Yahoo-family hold: backstop in BOTH transports. The healthcare list is pre-filtered, so this is defense in depth rather than the primary filter.
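
The pool carve-out described above can be sketched as a small partition check. This is only an illustration: the outNN-to-last-octet offset is inferred from ".107/.108/.109 = out18/19/20", the hcoutN names follow the plan's own examples, and `split_pool`, `last_octet` and `randmap` are hypothetical helpers, not existing scripts.

```python
# Sketch only: helper names are hypothetical, and the out05..out20 range is
# taken from the topology diagram in this plan.

def last_octet(out_index: int) -> int:
    """out18 -> .107, so the offset is 89 (assumed to hold for the whole pool)."""
    return out_index + 89


def split_pool(all_outs, hc_outs):
    """Return (trucking, hc) lists of out indexes with no overlap."""
    hc_set = set(hc_outs)
    missing = hc_set - set(all_outs)
    if missing:
        raise ValueError(f"hc transports not in pool: {sorted(missing)}")
    trucking = [n for n in all_outs if n not in hc_set]
    return trucking, sorted(hc_set)


def randmap(names):
    """Render a randmap: table in the form used elsewhere in this plan."""
    return "randmap:{" + ",".join(f"{name}:" for name in names) + "}"


all_outs = list(range(5, 21))                      # out05..out20
trucking, hc = split_pool(all_outs, [18, 19, 20])  # carve out18/19/20

trucking_map = randmap(f"out{n:02d}" for n in trucking)
hc_map = randmap(f"hcout{i}" for i, _ in enumerate(hc, start=1))
hc_octets = [last_octet(n) for n in hc]

print(trucking_map)
print(hc_map)
print(hc_octets)
```

Generating both tables from one source list is a cheap way to guarantee that an IP never appears in both rotations, which is the property the trucking ALL array edit is meant to enforce.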

2. Listmonk: a second SMTP server, used only by healthcare campaigns

Listmonk's settings.smtp is a JSON array and already supports multiple SMTP servers. Add a second entry:

{ "host":"172.18.0.1", "port":2526, "uuid":"healthcare",
  "enabled":true, "hello_hostname":"hcmta.performancewest.net",
  "max_conns":4, "tls_type":"none", "auth_protocol":"none" }

Listmonk round-robins across enabled SMTP servers, and this plan assumes it has no native per-campaign SMTP pinning (confirm against the deployed version), so we do NOT rely on per-campaign SMTP selection to keep the streams isolated. Instead we isolate either by running separate Listmonk instances or by the cheaper operational split below. Decide in step 1:

  • Option A — second Listmonk instance (listmonk-hc) on the same Postgres, separate app.message_sliding_window_rate, pointed only at port 2526. Cleanest isolation of caps; ~zero risk of cross-stream throttle coupling. This is the recommended option because the whole point is independent caps.
  • Option B — one Listmonk, single SMTP server B for healthcare, and we accept Listmonk's single global cap by running trucking and healthcare in non-overlapping send windows. Cheaper but couples the caps (defeats the goal).

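Recommend Option A (second listmonk-hc service in compose). It gets its own app.message_sliding_window_rate (the healthcare cap) and its own SMTP server (port 2526 → hc IPs). It can share the Postgres server, but a separate database is probably required rather than optional: Listmonk keeps its runtime settings (the SMTP servers and sliding-window values included) in its database, so two instances on one database would likely end up sharing one cap. A separate database also keeps bounce/complaint reputation accounting clean per stream. Verify this against the deployed Listmonk version before standing up listmonk-hc.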

Healthcare-stream cap (institutional segment)

Institutional B2B mail tolerates much higher volume than consumer cold mail, but we still warm the new hc IPs (they're fresh) and we still respect per-domain practice MX limits. Proposed hc warmup (separate stamp /etc/postfix/hc-warmup-start):

| hc warmup day | hourly cap | ~daily | notes |
|---|---|---|---|
| 0-1 | 100/h | ~1,000 | brand-new hc IPs, prove clean |
| 2-4 | 300/h | ~3,000 | |
| 5-9 | 600/h | ~6,000 | |
| 10+ | 1,000/h | ~10,000 | institutional ceiling; revisit with data |

These are separate from and additive to the trucking ~4k/day ceiling, because they hit a disjoint set of receiving systems on disjoint sending IPs.

Per-domain politeness still applies (smtp_destination_concurrency_limit, smtp_destination_rate_delay) so we never hammer one clinic's MX.
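
As a rough illustration of how a ramp-cap script could turn the hc warmup schedule above into a cap, the lookup below maps a warmup day to the hourly figure. It is a sketch under assumptions: the function names are hypothetical, and it assumes the warmup start can be read as a calendar date (the format of the stamp file is not specified in this plan).

```python
from datetime import date

# (first warmup day, hourly cap) rows copied from the proposed hc schedule.
HC_SCHEDULE = [(0, 100), (2, 300), (5, 600), (10, 1000)]


def warmup_day(start: date, today: date) -> int:
    """Whole days elapsed since the hc warmup start (day 0 = start date)."""
    return (today - start).days


def hc_hourly_cap(day: int) -> int:
    """Return the hourly cap for a warmup day; the last row is the ceiling."""
    if day < 0:
        raise ValueError("hc warmup has not started yet")
    cap = HC_SCHEDULE[0][1]
    for first_day, hourly in HC_SCHEDULE:
        if day >= first_day:
            cap = hourly
    return cap


# Example: a warmup that started 7 days ago falls in the 5-9 row.
example_day = warmup_day(date(2026, 6, 5), date(2026, 6, 12))
print(example_day, hc_hourly_cap(example_day))
```

Keeping the schedule as data rather than branching logic means a later revision of the institutional ceiling is a one-line change.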

Audience split (must happen before any send)

Extend scripts/build_npi_outreach_lists.py (or a thin post-processor) to emit THREE files instead of lumping cold together:

  1. npi_healthcare_institutional.csv — cold, non-Direct, non-consumer-webmail (practice/clinic domains). → healthcare HOT stream.
  2. npi_healthcare_consumer.csv — cold consumer webmail (gmail/outlook/icloud…). → rides the TRUCKING consumer-discipline stream (low cap), NOT the hot one.
  3. npi_direct_secure.csv — DirectTrust/HISP. → parked until DirectTrust signup.

Classification rule: institutional = cold channel AND domain NOT in CONSUMER_WEBMAIL AND not Direct. (We already compute cold/direct and a cold_consumer count; just split on the consumer set.)
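
The classification rule can be sketched as a single function. Everything here is illustrative: the `classify` helper, the `channel` labels, and the three-domain consumer set are assumptions standing in for whatever scripts/build_npi_outreach_lists.py actually uses, and the real CONSUMER_WEBMAIL set is presumably much longer.

```python
# Illustrative subset only; the real CONSUMER_WEBMAIL set lives in the list builder.
CONSUMER_WEBMAIL = {"gmail.com", "outlook.com", "icloud.com"}


def classify(email: str, channel: str) -> str:
    """Map one contact to an output file stem.

    channel is assumed to be 'cold' or 'direct', as computed by the builder.
    """
    if channel == "direct":
        return "npi_direct_secure"
    if channel != "cold":
        raise ValueError(f"unexpected channel: {channel!r}")
    if "@" not in email:
        raise ValueError(f"not an email address: {email!r}")
    domain = email.rsplit("@", 1)[1].strip().lower()
    if domain in CONSUMER_WEBMAIL:
        return "npi_healthcare_consumer"
    return "npi_healthcare_institutional"


print(classify("frontdesk@clinic.example", "cold"))
print(classify("Dr.Smith@GMAIL.com", "cold"))
print(classify("provider@direct.hisp.example", "direct"))
```

Checking the Direct channel first mirrors the rule's "not Direct" clause, so a Direct address can never leak into either cold stream regardless of its domain.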

Always run the existing free MX + SMTP RCPT verification on a NON-sending IP (doc sec 8.2) over the institutional list before importing, so we never mail dead practice mailboxes (550 5.1.1 from a clinic MX still hurts the hc IPs).
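
For readers unfamiliar with the RCPT check, the probe below shows the general shape using Python's standard smtplib. It is not the existing verification tool, just a minimal sketch: `rcpt_probe` is a hypothetical name, the MX host is assumed to have been resolved already, and the HELO name and envelope sender are placeholders to be supplied by the caller. No message body is ever sent.

```python
import smtplib


def rcpt_probe(mx_host, address, helo_name, mail_from,
               smtp_factory=smtplib.SMTP, timeout=15):
    """Return 'accept', 'reject', or 'unknown' for one address.

    Runs HELO / MAIL FROM / RCPT TO and then quits without sending DATA.
    A catch-all MX accepts every RCPT, so 'accept' is not proof of a mailbox.
    """
    try:
        conn = smtp_factory(mx_host, 25, timeout=timeout)
    except (OSError, smtplib.SMTPException):
        return "unknown"
    try:
        conn.helo(helo_name)
        mail_code, _ = conn.mail(mail_from)
        if mail_code >= 400:
            return "unknown"   # sender refused, so RCPT would tell us nothing
        code, _ = conn.rcpt(address)
    except (OSError, smtplib.SMTPException):
        return "unknown"
    finally:
        try:
            conn.quit()
        except (OSError, smtplib.SMTPException):
            pass
    if 200 <= code < 300:
        return "accept"
    if 500 <= code < 600:
        return "reject"        # e.g. 550 5.1.1, drop before import
    return "unknown"           # 4xx greylisting or deferral, retry later
```

The `smtp_factory` parameter exists so the logic can be exercised against a stub without touching the network; in real use it stays at its default and the script runs from the non-sending IP.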

Reputation hygiene (per stream, independent)

  • Separate PTR/FCrDNS (hcmtaNN.performancewest.net) + separate SPF authorization for the hc IPs (still under the same domain so DKIM/DMARC pass).
  • DKIM/DMARC unchanged (domain-level) — healthcare mail still signs as performancewest.net, which is fine and desirable.
  • Separate bounce/complaint monitoring per pool (grep by hc IP / by hc syslog_name). The existing monitoring commands extend trivially with the hc IPs.
  • A healthcare ramp-cap script (pw-hc-rampcap) mirroring pw-listmonk-rampcap but driving the listmonk-hc cap off /etc/postfix/hc-warmup-start.

Concrete ordered steps

  1. Decide: single Postfix instance + class-based hc transport vs postmulti; and Listmonk Option A (2nd instance) vs B. (Recommend: single instance + class transport, and Listmonk Option A.)
  2. DNS/identity: add PTR hcmtaNN for .107/.108/.109, extend SPF, confirm DKIM/DMARC still pass for those IPs. (No send until green.)
  3. Postfix: new submission service on :2526; carve out18/19/20 into an hc rotation pool; remove them from the trucking ALL array; add the hc-warmup-start stamp + pw-hc-mta-warmup. Keep Yahoo hold: backstop.
  4. Listmonk-hc: add listmonk-hc compose service (same image, own LISTMONK_app__* cap env / settings, SMTP server = 172.18.0.1:2526), behind nginx at a separate vhost or path. Wire pw-hc-rampcap.
  5. Audience: extend the list builder to emit the 3 split files; run free MX + SMTP verification (non-sending IP) on the institutional file.
  6. Campaign: build a healthcare-institutional campaign (revalidation-overdue first → free NPI tool link → $399 PECOS Revalidation product), import the verified institutional list into listmonk-hc, send small focused batches.
  7. Deploy wiring: add the new services/scripts to deploy.sh / deploy-dev.sh and ansible templates, mirroring the proxy-relay pattern just landed.

Validation

  • Isolation proof: send a trucking batch and an hc batch simultaneously; confirm via mail.log that trucking mail egresses ONLY from .94-.96 and hc mail ONLY from .107-.109, and that each respects its own cap independently.
  • Identity proof: an hc test send to a mail-tester/aboutmy.email account shows PTR hcmtaNN, SPF pass, DKIM pass, DMARC pass.
  • Deliverability proof: hc test sends to a Google Workspace test domain + an M365 test domain land in inbox (not spam); record per-domain disposition.
  • Cap proof: pw-hc-rampcap sets the listmonk-hc cap from the hc warmup day and does NOT touch the trucking Listmonk cap (and vice-versa).
  • No regression: trucking delivery mix unchanged after the split (same monitoring commands, same .94-.96 volumes).
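
One way the isolation proof could be automated is a pass over mail.log that flags any delivery made through the wrong pool. The sketch below assumes each outbound transport sets a distinct syslog_name (shown here as postfix-outNN and postfix-hcoutN, which are hypothetical labels), and the sample lines are made up for illustration. It checks the transport label only; Postfix delivery lines name the remote relay rather than the local source address, so the IP binding itself still needs a separate check, for example the Received headers on a test message.

```python
import re

# Matches e.g. "postfix-hcout1/smtp[2101]: 4F2A61C0B7: to=<user@clinic.example>"
LINE = re.compile(
    r"postfix-(?P<name>[a-z]+\d+)/smtp\[\d+\]: \w+: "
    r"to=<[^@>]+@(?P<domain>[^>]+)>"
)


def isolation_violations(lines, hc_domains):
    """Return (transport, domain) pairs delivered through the wrong pool."""
    bad = []
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue
        domain = m["domain"].lower()
        via_hc = m["name"].startswith("hcout")
        is_hc_rcpt = domain in hc_domains
        if via_hc != is_hc_rcpt:
            bad.append((m["name"], domain))
    return bad


# Made-up sample lines; real output depends on the local syslog_name settings.
SAMPLE = [
    "Jun  5 18:51:05 mx1 postfix-hcout1/smtp[2101]: 4F2A61C0B7: "
    "to=<frontdesk@clinic.example>, relay=mx.clinic.example[203.0.113.10]:25, status=sent",
    "Jun  5 18:51:06 mx1 postfix-out05/smtp[2102]: 4F2A61C0B8: "
    "to=<driver@gmail.com>, relay=gmail-smtp-in.l.google.com[203.0.113.20]:25, status=sent",
    "Jun  5 18:51:07 mx1 postfix-out05/smtp[2103]: 4F2A61C0B9: "
    "to=<billing@clinic.example>, relay=mx.clinic.example[203.0.113.10]:25, status=sent",
]

violations = isolation_violations(SAMPLE, {"clinic.example"})
print(violations)
```

An empty result over a simultaneous trucking and hc batch is the outcome the isolation proof is looking for; any pair in the list points at a routing leak to fix before raising the hc cap.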

Open decisions for Justin

  1. Real institutional-domain count: re-run the list builder on fresh NPPES data to get the exact npi_healthcare_institutional.csv size before we size the hc cap.
  2. Single Postfix instance (class transport) vs postmulti second instance.
  3. Listmonk: second instance (recommended, true cap isolation) vs single instance with windowed sends.
  4. How aggressive on the institutional ceiling (10k/day proposed) — start conservative and let data raise it.
  5. Whether hc uses a separate Listmonk contacts DB (cleaner per-stream complaint accounting) or shares the existing one.