new-site/docs/healthcare-email-stream-plan.md

Plan — Dual-Stream Outbound Email (Healthcare hot + Trucking trickle)

Why this exists

Today one global throttle governs all outbound mail: the Listmonk sliding window (app.message_sliding_window_rate, currently 150/h ramping to a 300/h hard ceiling ≈ 4k/day) plus a shared Postfix rotation pool (.94/.95/.96).

That ceiling protects our reputation with the consumer ISPs (Gmail / Microsoft / Yahoo) that receive the FMCSA trucking campaigns. The May 30-31 collapse (29k blast → Gmail 550-5.7.1, Yahoo 421 TSS04, delivery fell to ~13%) is why the warmup/cap machinery exists.

Healthcare's reachable audience is different in kind, so it should NOT be constrained by the same ceiling:

  • The cold-emailable NPPES-endpoint slice is "tens of thousands"; a large part is consumer webmail (gmail ~12.4k) but a meaningful tail is practice/clinic domains (their own MX, Google Workspace / Microsoft 365 tenants).
  • Practice-domain (institutional) mail does not share the consumer-ISP snowshoe heuristics that torch the trucking IPs. Its deliverability is largely independent of the reputation we're protecting on .94-.96.

Verified audience size (May 2026 NPPES endpoint_pfile, measured)

Classifying every email-formatted endpoint (deduped) with the tightened Direct/HISP filter (direct, medicity.net, surescripts, updox, maxmd, …) and the consumer-webmail set:

| Segment | Rows | NPIs | Routing |
|---|---|---|---|
| Direct / HISP | 242,441 | | parked (DirectTrust-only routing, won't cold-deliver) |
| Consumer webmail | 19,366 | ~19,072 | rides the trucking consumer-discipline stream |
| Institutional (practice domains) | 94,348 | ~92,592 | HEALTHCARE HOT stream |

Institutional spread: 38,873 distinct domains, 76% of which have exactly 1 provider (small practices = our $399 PECOS-revalidation buyer). Top-100 domains are only 23% of volume → healthy long tail, no single MX gets hammered. (Excludes a handful of non-prospect giants — va.gov, mail.mil, cvshealth.com, walgreens.com, wal-mart.com — that we drop in the audience build.)

This sizes the hot stream: at ~92k deliverable institutional addresses a 10k/day ceiling drains the list in ~2 weeks; stuck behind the 4k trucking cap it would take ~23 days AND poison the trucking IPs. Hence the split.
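The drain-time estimate can be reproduced with a few lines of Python. This is a back-of-envelope sketch, not repo code: `days_to_drain` is a hypothetical helper, the warmup steps are taken from the proposed hc warmup schedule later in this plan (~1,000/day on days 0-1, ~3,000 on days 2-4, ~6,000 on days 5-9, then the 10k/day ceiling), and it assumes all ~92,592 institutional addresses survive verification.

```python
import math

def days_to_drain(total, warmup_steps, ceiling_per_day):
    """Sending days needed to mail `total` addresses.

    warmup_steps: list of (number_of_days, daily_volume) applied in order.
    ceiling_per_day: daily volume once the warmup steps are exhausted.
    """
    remaining = total
    days = 0
    for num_days, daily_volume in warmup_steps:
        for _ in range(num_days):
            if remaining <= 0:
                return days
            remaining -= daily_volume
            days += 1
    if remaining > 0:
        days += math.ceil(remaining / ceiling_per_day)
    return days

INSTITUTIONAL = 92_592  # measured institutional NPIs from the table above
HC_WARMUP = [(2, 1_000), (3, 3_000), (5, 6_000)]  # assumed from the hc warmup plan

print(days_to_drain(INSTITUTIONAL, HC_WARMUP, 10_000))  # 16: hot stream with warmup
print(days_to_drain(INSTITUTIONAL, [], 10_000))         # 10: flat 10k/day, no warmup
print(days_to_drain(INSTITUTIONAL, [], 4_000))          # 24: flat 4k/day trucking cap
```

Under these assumptions the hot stream needs about 16 sending days once warmup is counted, which is where "~2 weeks" comes from, while the flat 4k/day figure is 92,592 / 4,000 ≈ 23.1 days rounded up to whole sending days.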

So the goal is stream isolation: let healthcare-institutional mail run hot on its own IPs/cap while trucking keeps trickling on the warmed consumer-facing IPs, with neither able to damage the other.

Honesty caveat (do not skip): the consumer-webmail portion of the healthcare list (gmail/outlook/icloud addresses) is NOT institutional and MUST ride the same cautious consumer-ISP discipline as trucking. "Run healthcare hot" applies ONLY to the practice-domain (non-consumer, non-DirectTrust) segment. We split the healthcare list itself into healthcare-institutional vs healthcare-consumer and route each to the matching stream.

Architecture: two independent streams, one Postfix, one Listmonk

flowchart TD
    LM[Listmonk] -->|SMTP server A: 172.18.0.1:25\nhello perfwest...| PFA[Postfix submission]
    LM -->|SMTP server B: 172.18.0.1:2526\nhello hc-mta...| PFB[Postfix submission hc]
    PFA --> TR{transport map}
    PFB --> TRH{transport_maps hc}
    TR -->|yahoo family| HOLD[hold:]
    TR -->|consumer + everything else| ROT[randmap rotation\nout05..out17\n.94-.106]
    TRH -->|practice domains| HCROT[randmap hc pool\nhcout1..hcout4\n.107-.109 + spare]
    ROT --> NET1[(consumer ISPs:\nGmail / MS, capped low)]
    HCROT --> NET2[(practice MX /\nWorkspace / M365, hot)]

Two coordinated changes:

1. Postfix: a dedicated healthcare submission service + IP sub-pool

  • Carve 2-3 IPs out of the existing 20 (.107/.108/.109 = out18/19/20, currently unused at the warmup tail) into a healthcare-only rotation pool. They get their own HELO (hcmtaNN.performancewest.net — confirm/lay down PTR + SPF first) so healthcare reputation is built and judged separately from trucking. They are removed from the trucking ALL=(...) array so the trucking warmup never reclaims them.
  • Add a second Postfix submission entry in master.cf listening on a distinct port (e.g. 2526) whose injected mail is tagged to the healthcare pool. Two clean ways to bind the pool:
    • (preferred) sender-dependent / class-based transport: route by the submission port via a dedicated cleanup/smtpd service that sets a header or uses a separate transport_maps so healthcare recipients hit randmap:{hcout1:,hcout2:,hcout3:}.
    • Simpler alternative: a separate Postfix instance (postmulti) listening on 2526, with its own main.cf bound to the hc IPs. More isolation, more moving parts. Decide in step 0 (recommend the single-instance class-based route unless isolation is required).
  • Keep the Yahoo-family hold: backstop in BOTH transports. The healthcare list is pre-filtered, but the backstop stays in place as defense in depth.
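The carve-out can be sanity-checked with a small script. This is only a sketch: the `outNN` to last-octet mapping (`outNN` → `.(89 + NN)`) is inferred from this plan's statement that out18/19/20 are .107/.108/.109, the starting pool `out05..out20` is taken from the diagram, and all names below are hypothetical rather than taken from the real warmup scripts.

```python
BASE_OCTET = 89  # assumption: out18 -> .107, so outNN -> .(89 + NN)

def out_octet(n):
    """Last IPv4 octet for transport outNN under the assumed numbering."""
    return BASE_OCTET + n

ALL_OUT = list(range(5, 21))   # out05..out20, the pool before the split
HC_OUT = [18, 19, 20]          # carved out for healthcare
TRUCKING_OUT = [n for n in ALL_OUT if n not in HC_OUT]

def randmap(transports):
    """Render a Postfix randmap table string from transport names."""
    return "randmap:{" + ",".join(f"{t}:" for t in transports) + "}"

hc_transports = [f"hcout{i}" for i in range(1, len(HC_OUT) + 1)]
trucking_transports = [f"out{n:02d}" for n in TRUCKING_OUT]

# The two pools must never share an IP.
assert set(TRUCKING_OUT).isdisjoint(HC_OUT)

print(randmap(hc_transports))                 # randmap:{hcout1:,hcout2:,hcout3:}
print([out_octet(n) for n in HC_OUT])         # [107, 108, 109]
print(out_octet(TRUCKING_OUT[0]), out_octet(TRUCKING_OUT[-1]))  # 94 106
```

The point of the check is the disjointness assertion: once out18..out20 leave the trucking array, the trucking rotation tops out at .106 and the hc rotation owns .107-.109 exclusively.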

2. Listmonk: a second SMTP server, used only by healthcare campaigns

Listmonk's settings.smtp is a JSON array and already supports multiple SMTP servers. Add a second entry:

{ "host":"172.18.0.1", "port":2526, "uuid":"healthcare",
  "enabled":true, "hello_hostname":"hcmta.performancewest.net",
  "max_conns":4, "tls_type":"none", "auth_protocol":"none" }

Listmonk spreads outgoing mail across all enabled SMTP servers and lacks native per-campaign SMTP pinning, so adding server B to the existing instance would not by itself keep the streams apart. Isolation has to come from either a second Listmonk instance or an operational split of send windows. Decide in step 0:

  • Option A — second Listmonk instance (listmonk-hc) on the same Postgres, separate app.message_sliding_window_rate, pointed only at port 2526. Cleanest isolation of caps; ~zero risk of cross-stream throttle coupling. This is the recommended option because the whole point is independent caps.
  • Option B — one Listmonk, single SMTP server B for healthcare, and we accept Listmonk's single global cap by running trucking and healthcare in non-overlapping send windows. Cheaper but couples the caps (defeats the goal).

Recommend Option A (second listmonk-hc service in compose). It gets its own app.message_sliding_window_rate (the healthcare cap), its own SMTP server (port 2526 → hc IPs), and shares the contacts DB only if we want (probably separate DB to keep bounce/complaint reputation accounting clean per stream).

Healthcare-stream cap (institutional segment)

Institutional B2B mail tolerates much higher volume than consumer cold mail, but we still warm the new hc IPs (they're fresh) and we still respect per-domain practice MX limits. Proposed hc warmup (separate stamp /etc/postfix/hc-warmup-start):

| hc warmup day | Hourly cap | ~Daily | Notes |
|---|---|---|---|
| 0-1 | 100/h | ~1,000 | brand-new hc IPs, prove clean |
| 2-4 | 300/h | ~3,000 | |
| 5-9 | 600/h | ~6,000 | |
| 10+ | 1,000/h | ~10,000 | institutional ceiling; revisit with data |

These are separate from and additive to the trucking ~4k/day ceiling, because they hit a disjoint set of receiving systems on disjoint sending IPs.
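The warmup schedule reduces to a small lookup. The sketch below mirrors the table only; it is not the actual ramp-cap script, the function names are hypothetical, and because this plan does not say how the `/etc/postfix/hc-warmup-start` stamp encodes the start date, the start is simply passed in as a `date`.

```python
from datetime import date

# (first warmup day of the step, hourly cap), taken from the table above
HC_WARMUP_STEPS = [(0, 100), (2, 300), (5, 600), (10, 1000)]

def warmup_day(start: date, today: date) -> int:
    """Whole days elapsed since the hc warmup started (day 0 = start date)."""
    return (today - start).days

def hc_hourly_cap(day: int) -> int:
    """Hourly cap for a given hc warmup day."""
    if day < 0:
        raise ValueError("warmup has not started yet")
    cap = HC_WARMUP_STEPS[0][1]
    for first_day, hourly in HC_WARMUP_STEPS:
        if day >= first_day:
            cap = hourly
    return cap

day = warmup_day(date(2026, 6, 8), date(2026, 6, 14))  # example dates only
print(day, hc_hourly_cap(day))  # 6 600
```

Keeping the schedule as data rather than branching logic makes it easy to raise the day-10+ ceiling later without touching the lookup.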

Per-domain politeness still applies (smtp_destination_concurrency_limit, smtp_destination_rate_delay) so we never hammer one clinic's MX.

Audience split (must happen before any send)

Extend scripts/build_npi_outreach_lists.py (or a thin post-processor) to emit THREE files instead of lumping cold together:

  1. npi_healthcare_institutional.csv — cold, non-Direct, non-consumer-webmail (practice/clinic domains). → healthcare HOT stream.
  2. npi_healthcare_consumer.csv — cold consumer webmail (gmail/outlook/icloud…). → rides the TRUCKING consumer-discipline stream (low cap), NOT the hot one.
  3. npi_direct_secure.csv — DirectTrust/HISP. → parked until DirectTrust signup.

Classification rule: institutional = cold channel AND domain NOT in CONSUMER_WEBMAIL AND not Direct. (We already compute cold/direct and a cold_consumer count; just split on the consumer set.)
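As a rough illustration of that rule, the sketch below classifies one address at a time. It is not the shared classifier itself: the sets are small illustrative subsets built from names mentioned in this plan, the label-based Direct/HISP match is an assumed simplification of the tightened filter, and `classify_endpoint` is a hypothetical name.

```python
# Illustrative subsets only; the real lists live in the audience build.
CONSUMER_WEBMAIL = {"gmail.com", "outlook.com", "icloud.com"}
DIRECT_LABELS = {"direct", "medicity", "surescripts", "updox", "maxmd"}
NON_PROSPECT = {"va.gov", "mail.mil", "cvshealth.com", "walgreens.com", "wal-mart.com"}

def classify_endpoint(email: str) -> str:
    """Return 'direct', 'consumer', 'excluded', 'institutional' or 'invalid'."""
    email = email.strip().lower()
    if "@" not in email:
        return "invalid"
    domain = email.rpartition("@")[2]
    labels = domain.split(".")
    # Direct/HISP first: these addresses only route over DirectTrust.
    if any(label in DIRECT_LABELS for label in labels):
        return "direct"
    if domain in CONSUMER_WEBMAIL:
        return "consumer"
    # Non-prospect giants are dropped, including their subdomains.
    if any(domain == d or domain.endswith("." + d) for d in NON_PROSPECT):
        return "excluded"
    return "institutional"

for addr in ["frontdesk@smithfamilypractice.example", "Dr.Jones@Gmail.com",
             "jones@direct.clinic.example", "someone@va.gov"]:
    print(addr, "->", classify_endpoint(addr))
```

The order of the checks matters: Direct/HISP is tested before the consumer set so that a Direct address never leaks into a cold stream.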

Always run the existing free MX + SMTP RCPT verification on a NON-sending IP (doc sec 8.2) over the institutional list before importing, so we never mail dead practice mailboxes (550 5.1.1 from a clinic MX still hurts the hc IPs).
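For orientation, an SMTP RCPT probe can be sketched with the standard library alone. This is not the existing verification tool: `probe_rcpt` and `rcpt_verdict` are hypothetical names, the MX lookup is left out (it needs a DNS library, so the MX host is passed in), and a 250 from a catch-all server does not prove the mailbox exists.

```python
import smtplib

def rcpt_verdict(code: int) -> str:
    """Map an SMTP RCPT reply code to a coarse verdict."""
    if 200 <= code < 300:
        return "deliverable"   # accepted (may still be a catch-all domain)
    if 500 <= code < 600:
        return "rejected"      # permanent failure, e.g. 550 5.1.1 unknown user
    return "retry"             # 4xx: greylisting or temporary trouble

def probe_rcpt(mx_host: str, address: str, helo_name: str,
               mail_from: str, timeout: float = 15.0) -> str:
    """Issue HELO / MAIL FROM / RCPT TO against mx_host, then quit.

    No DATA is ever sent, so no message is delivered. Run this from a
    NON-sending IP, as the plan requires.
    """
    try:
        with smtplib.SMTP(mx_host, 25, timeout=timeout) as smtp:
            smtp.helo(helo_name)
            smtp.mail(mail_from)
            code, _message = smtp.rcpt(address)
    except (smtplib.SMTPException, OSError):
        return "retry"         # connection problems are not proof of a dead mailbox
    return rcpt_verdict(code)

print(rcpt_verdict(250), rcpt_verdict(550), rcpt_verdict(451))
```

Only addresses with a "rejected" verdict should be dropped outright; "retry" results are better re-probed later than treated as dead.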

Reputation hygiene (per stream, independent)

  • Separate PTR/FCrDNS (hcmtaNN.performancewest.net) + separate SPF authorization for the hc IPs (still under the same domain so DKIM/DMARC pass).
  • DKIM/DMARC unchanged (domain-level) — healthcare mail still signs as performancewest.net, which is fine and desirable.
  • Separate bounce/complaint monitoring per pool (grep by hc IP / by hc syslog_name). The existing monitoring commands extend trivially with the hc IPs.
  • A healthcare ramp-cap script (pw-hc-rampcap) mirroring pw-listmonk-rampcap but driving the listmonk-hc cap off /etc/postfix/hc-warmup-start.

Concrete ordered steps

  1. Decide: single Postfix instance + class-based hc transport vs postmulti; and Listmonk Option A (2nd instance) vs B. (Recommend: single instance + class transport, and Listmonk Option A.)
  2. DNS/identity: add PTR hcmtaNN for .107/.108/.109, extend SPF, confirm DKIM/DMARC still pass for those IPs. (No send until green.)
  3. Postfix: new submission service on :2526; carve out18/19/20 into an hc rotation pool; remove them from the trucking ALL array; add the hc-warmup-start stamp + pw-hc-mta-warmup. Keep Yahoo hold: backstop.
  4. Listmonk-hc: add listmonk-hc compose service (same image, own LISTMONK_app__* cap env / settings, SMTP server = 172.18.0.1:2526), behind nginx at a separate vhost or path. Wire pw-hc-rampcap.
  5. Audience: extend the list builder to emit the 3 split files; run free MX + SMTP verification (non-sending IP) on the institutional file.
  6. Campaign: build a healthcare-institutional campaign (revalidation-overdue first → free NPI tool link → $399 PECOS Revalidation product), import the verified institutional list into listmonk-hc, send small focused batches.
  7. Deploy wiring: add the new services/scripts to deploy.sh / deploy-dev.sh and ansible templates, mirroring the proxy-relay pattern that just landed.
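The "no send until green" gate in step 2 includes forward-confirmed reverse DNS, which is easy to script. A minimal sketch, assuming the system resolver sees the published records; `fcrdns_ok` is a hypothetical helper, and the resolver functions are injectable so the logic can be exercised without network access.

```python
import socket

def fcrdns_ok(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """True if ip -> PTR hostname -> A records contains ip again."""
    try:
        hostname, _aliases, _addrs = reverse(ip)          # PTR lookup
        _name, _aliases, addresses = forward(hostname)    # forward A lookup
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    return ip in addresses

# Offline demonstration with stand-in resolvers (no DNS traffic).
def fake_reverse(ip):
    return ("hcmta01.performancewest.net", [], [ip])

def fake_forward(hostname):
    return (hostname, [], ["207.174.124.107"])

print(fcrdns_ok("207.174.124.107", reverse=fake_reverse, forward=fake_forward))  # True
print(fcrdns_ok("207.174.124.108", reverse=fake_reverse, forward=fake_forward))  # False
```

Called with the defaults, `fcrdns_ok("207.174.124.107")` performs the real lookups, so it can run as a pre-send check for each hc IP.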

Validation

  • Isolation proof: send a trucking batch and an hc batch simultaneously; confirm via mail.log that trucking mail egresses ONLY from .94-.96 and hc mail ONLY from .107-.109, and that each respects its own cap independently.
  • Identity proof: an hc test send to a mail-tester/aboutmy.email account shows PTR hcmtaNN, SPF pass, DKIM pass, DMARC pass.
  • Deliverability proof: hc test sends to a Google Workspace test domain + an M365 test domain land in inbox (not spam); record per-domain disposition.
  • Cap proof: pw-hc-rampcap sets the listmonk-hc cap from the hc warmup day and does NOT touch the trucking Listmonk cap (and vice-versa).
  • No regression: trucking delivery mix unchanged after the split (same monitoring commands, same .94-.96 volumes).
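The isolation proof can be partly automated by correlating queue IDs in mail.log. The sketch below rests on assumptions about the log format that are not confirmed by this plan: the hc submission services log under a tag starting with `hcsubmit`, the hc delivery transports log under a tag containing `hcout`, and queue IDs are the short hexadecimal form. `find_stream_leaks` is a hypothetical helper.

```python
import re

# Matches e.g. " hcsubmit107/smtpd[111]: 4F1A2B3C9D: " (format is an assumption).
LINE_RE = re.compile(
    r"\s(?P<tag>[\w-]+)/(?P<daemon>smtpd|smtp)\[\d+\]: (?P<qid>[0-9A-F]{6,}): "
)

def find_stream_leaks(lines):
    """Return (queue_id, egress_tag) pairs where ingress and egress streams differ."""
    accepted_by_hc = {}   # queue id -> True if accepted by an hc submission service
    leaks = []
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        tag, daemon, qid = match["tag"], match["daemon"], match["qid"]
        if daemon == "smtpd":
            accepted_by_hc[qid] = tag.startswith("hcsubmit")
        elif "to=<" in line and qid in accepted_by_hc:
            delivered_by_hc = "hcout" in tag
            if delivered_by_hc != accepted_by_hc[qid]:
                leaks.append((qid, tag))
    return leaks

SAMPLE = [
    "Jun  6 10:00:01 mta hcsubmit107/smtpd[111]: 4F1A2B3C9D: client=unknown[172.18.0.5]",
    "Jun  6 10:00:02 mta postfix-hcout1/smtp[112]: 4F1A2B3C9D: to=<frontdesk@clinic.example>, status=sent (250 ok)",
    "Jun  6 10:00:03 mta postfix/smtpd[113]: 7A9B0C1D2E: client=unknown[172.18.0.4]",
    "Jun  6 10:00:04 mta postfix-out05/smtp[114]: 7A9B0C1D2E: to=<driver@gmail.com>, status=sent (250 ok)",
]
print(find_stream_leaks(SAMPLE))  # []
```

An empty result over a log window that contains both a trucking batch and an hc batch is evidence that neither stream used the other's pool; any returned pair names the message and the wrong transport.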

Decisions (locked)

  1. Postfix: single instance + class-based hc transport (port :2526 → hc rotation pool). No postmulti.
  2. Listmonk: a second instance (listmonk-hc) with its own sliding-window cap → true cap isolation.
  3. Institutional ceiling: 10k/day (warm up to it).
  4. Contacts DB: separate (listmonk_hc database) — cleaner per-stream bounce/complaint accounting, and the hc instance needs its own DB anyway.
  5. Audience count: measured — ~92,592 institutional NPIs / 38,873 domains (see table above).

Open / for-later

  • How aggressive on the institutional ceiling beyond 10k/day — raise only with clean delivery data.
  • DirectTrust signup to unlock the 242k Direct/HISP segment (separate effort).

Implementation status (built + validated)

Committed and validated on dev:

  • Audience split: scripts/healthcare_email_streams.py (shared classifier) + reworked scripts/build_npi_outreach_lists.py emit npi_healthcare_institutional/consumer.csv + npi_direct_secure.csv. Verified on May 2026 NPPES: 89,557 institutional rows.
  • Postfix hc stream: infra/postfix/hc_stream_setup.sh applied on the app server: ports 2526/2527/2528 -> hcout1/2/3 -> IPs .107/.108/.109 (HELO hcmta01-03). Proven: a send on :2527 egressed via hcout2 (.108) to the real gmail MX; trucking transport_maps (.94-.96) untouched.
  • listmonk-hc: second instance (own listmonk_hc DB, own cap), 3 SMTP servers = the 3 hc ports. Proven on dev: listmonk-hc container -> host :2526 (hcsubmit107) -> hcout1 (.107) -> real gmail MX.
  • Ramp-cap: infra/postfix/pw-hc-rampcap.sh (100->1000/h off /etc/postfix/hc-warmup-start), independent of the trucking ramp.
  • Deploy wiring: deploy.sh/deploy-dev.sh bring up listmonk-hc; docker-compose.dev.override.yml keeps dev (shared host) from clashing on prod host ports / postgres volume.

REMAINING before any healthcare send (manual, needs Justin/DNS)

  1. PTR / FCrDNS for the hc IPs — DONE 2026-06-06. .107->hcmta01, .108->hcmta02, .109->hcmta03 (.performancewest.net), plus matching forward A records, verified resolving on the authoritative NS AND HE.net secondaries (SOA serial in sync). FCrDNS confirmed both ways.

    How (for future reference): HestiaCP box cp.carrierone.com = 207.174.124.22, SSH port 22022 (not 22). admin@ is sftp-only, but root@.22:22022 accepts our default ~/.ssh/id_ed25519 → full shell + Hestia CLI. Forward zone performancewest.net and reverse zone 124.174.207.in-addr.arpa are both owned by Hestia user justin; HE.net auto-zone-transfers (secondaries). Commands used:

    export PATH=$PATH:/usr/local/hestia/bin
    # forward A:  USER DOMAIN RECORD TYPE VALUE
    v-add-dns-record justin performancewest.net hcmta01 A 207.174.124.107
    # reverse PTR: USER REVZONE OCTET PTR FQDN. "" "" <restart yes/no>
    v-add-dns-record justin 124.174.207.in-addr.arpa 107 PTR hcmta01.performancewest.net. "" "" yes
    v-delete-dns-record justin 124.174.207.in-addr.arpa <ID> no   # remove stale
    v-rebuild-dns-domain justin 124.174.207.in-addr.arpa          # bump serial
    

    (Also removed pre-existing duplicate mta18-20 PTRs in the reverse zone.) NOTE: the workers' hestia_provisioner.py path (admin@:22 + mounted key) remains unfinished/unused — the working path is root@:22022 with our key.

  2. SPF/DKIM/DMARC VERIFIED 2026-06-06. SPF already authorizes .107/.108/.109 explicitly and ends in -all (only two DNS-lookup mechanisms, a and mx, so it is safely under the 10-lookup limit). DKIM selector mail published (2048-bit). DMARC p=quarantine; pct=100; rua=dmarc@. All domain-level, no change needed.

  3. Install on prod: create listmonk_hc DB + --install, configure its 3 SMTP servers (commands in deploy.sh header), run hc_stream_setup.sh on the prod MTA, install pw-hc-rampcap cron.

  4. Verify identity with mail-tester / aboutmy.email from an hc IP (PTR + SPF + DKIM + DMARC all pass) BEFORE importing the list.
  5. Free MX+SMTP verify the institutional CSV on a non-sending IP, import the verified file into listmonk-hc, send small focused batches (overdue-first).
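The SPF lookup budget mentioned in item 2 can be rechecked whenever the record changes. A minimal sketch: it counts only the top-level terms that trigger DNS lookups under RFC 7208 (include, a, mx, ptr, exists, redirect) and does not follow nested includes. The record string is a hypothetical example shaped like the description above, not the published record, and `spf_lookup_count` is an illustrative name.

```python
# Terms that consume one of the 10 permitted DNS lookups (RFC 7208, section 4.6.4).
LOOKUP_TERMS = {"include", "a", "mx", "ptr", "exists", "redirect"}

def spf_lookup_count(record: str) -> int:
    """Count top-level SPF terms that cause a DNS lookup."""
    count = 0
    for term in record.split()[1:]:          # skip the leading "v=spf1"
        name = term.lstrip("+-~?")           # drop the qualifier, if any
        for separator in (":", "/", "="):    # drop domain-spec, CIDR, modifier value
            name = name.split(separator, 1)[0]
        if name.lower() in LOOKUP_TERMS:
            count += 1
    return count

# Hypothetical record for illustration only.
example = ("v=spf1 a mx ip4:207.174.124.107 ip4:207.174.124.108 "
           "ip4:207.174.124.109 -all")
print(spf_lookup_count(example))  # 2
```

Because ip4 mechanisms cost no lookups, authorizing further hc IPs explicitly leaves the count unchanged; only new include, a, mx, ptr, exists or redirect terms eat into the limit.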