docs: plan dual-stream outbound email (healthcare-hot + trucking-trickle)
Today one global Listmonk cap + shared Postfix rotation pool govern all mail, sized to protect consumer-ISP (Gmail/MS/Yahoo) reputation for trucking cold mail. Healthcare practice-domain (institutional) mail has an independent deliverability profile and should run hotter without endangering the warmed trucking IPs.

Plan: isolate two streams sharing one Postfix/Listmonk:
- carve hc-dedicated sending IPs (.107-.109) with their own PTR/SPF + warmup;
- a 2nd Postfix submission service (:2526) bound to the hc pool;
- a 2nd Listmonk instance (or SMTP server) with its own sliding-window cap;
- split the healthcare list into institutional (hot) vs consumer-webmail (rides trucking discipline) vs DirectTrust (parked);
- free MX+SMTP verify the institutional list on a non-sending IP first.

Includes mermaid topology, separate hc warmup/cap schedule, validation (isolation/identity/deliverability/cap proofs), and open decisions for sizing.
commit 40090da1dd (parent 4f49fad7f9)
1 file changed, 185 additions, 0 deletions: `docs/healthcare-email-stream-plan.md` (new file)
# Plan: Dual-Stream Outbound Email (Healthcare hot + Trucking trickle)

## Why this exists

Today **one global throttle governs all outbound mail**: the Listmonk sliding
window (`app.message_sliding_window_rate`, currently 150/h ramping to a 300/h
hard ceiling ≈ 4k/day) plus a shared Postfix rotation pool (`.94/.95/.96`).

That ceiling exists to protect **consumer-ISP reputation** (Gmail / Microsoft /
Yahoo), which is what the FMCSA trucking campaigns mail. The May 30-31 collapse
(29k blast → Gmail `550-5.7.1`, Yahoo `421 TSS04`, delivery fell to ~13%) is why
the whole warmup/cap machinery exists.

Healthcare's reachable audience is **different in kind**, so it should NOT be
constrained by the same ceiling:

- The cold-emailable NPPES-endpoint slice is "tens of thousands"; a large part is
  consumer webmail (gmail ~12.4k), but a meaningful tail is **practice/clinic
  domains** (their own MX, Google Workspace / Microsoft 365 tenants).
- **Practice-domain (institutional) mail does not share the consumer-ISP
  snowshoe heuristics** that torch the trucking IPs. Its deliverability is
  largely independent of the reputation we're protecting on `.94-.96`.

So the goal is **stream isolation**: let healthcare-institutional mail run hot on
its own IPs/cap while trucking keeps trickling on the warmed consumer-facing IPs,
with neither able to damage the other.

> Honesty caveat (do not skip): the *consumer-webmail* portion of the healthcare
> list (gmail/outlook/icloud addresses) is NOT institutional and MUST ride the
> same cautious consumer-ISP discipline as trucking. "Run healthcare hot" applies
> ONLY to the practice-domain (non-consumer, non-DirectTrust) segment. We split
> the healthcare list itself into `healthcare-institutional` vs
> `healthcare-consumer` and route each to the matching stream.

## Architecture: two independent streams, one Postfix, one Listmonk

```mermaid
flowchart TD
    LM[Listmonk] -->|SMTP server A: 172.18.0.1:25<br/>hello perfwest...| PFA[Postfix submission]
    LM -->|SMTP server B: 172.18.0.1:2526<br/>hello hc-mta...| PFB[Postfix submission hc]
    PFA --> TR{transport map}
    PFB --> TRH{transport_maps hc}
    TR -->|yahoo family| HOLD[hold:]
    TRH -->|yahoo family| HOLD
    TR -->|consumer + everything else| ROT[randmap rotation<br/>out05..out17<br/>.94-.106]
    TRH -->|practice domains| HCROT[randmap hc pool<br/>hcout1..hcout4<br/>.107-.109 + spare]
    ROT --> NET1[(consumer ISPs:<br/>Gmail / MS, capped low)]
    HCROT --> NET2[(practice MX /<br/>Workspace / M365, hot)]
```

Two coordinated changes:

### 1. Postfix: a dedicated healthcare submission service + IP sub-pool

- Carve **2-3 IPs out of the existing 20** (`.107/.108/.109` = `out18/19/20`,
  currently unused at the warmup tail) into a **healthcare-only rotation pool**.
  They get their own HELO (`hcmtaNN.performancewest.net`; confirm/lay down PTR +
  SPF first) so healthcare reputation is built and judged separately from
  trucking. They are removed from the trucking `ALL=(...)` array so the trucking
  warmup never reclaims them.
- Add a **second Postfix submission entry** in `master.cf` listening on a distinct
  port (e.g. `2526`) whose injected mail is tagged to the healthcare pool. Two
  clean ways to bind the pool:
  - **(preferred) sender-dependent / class-based transport:** route by the
    submission port via a dedicated `cleanup`/`smtpd` service that sets a header
    or uses a separate `transport_maps`, so healthcare recipients hit
    `randmap:{hcout1:,hcout2:,hcout3:}`.
  - Alternative: a separate Postfix **instance** (`postmulti`) listening on
    `2526`, with its own `main.cf` bound to the hc IPs. More isolation, but more
    moving parts. Decide in step 0 (recommend the single-instance class-based
    route unless isolation is required).
- Keep the **Yahoo-family `hold:` backstop** in BOTH transports. The healthcare
  list is pre-filtered, but this is defense in depth.

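A minimal sketch of what the hc half of `master.cf` could look like, generated the way a deploy/ansible template might: one `smtp(8)` client transport per hc IP, each with its own `smtp_bind_address`, `smtp_helo_name` and `syslog_name` (all real Postfix parameters), plus the `randmap:` literal the plan uses. The `203.0.113.` prefix is a documentation placeholder for the real block, the `hcmta01..03` expansion of `hcmtaNN` is a guess, and the chroot column should match whatever the existing `out05..out20` entries use.

```python
"""Render hypothetical master.cf transports for the healthcare rotation pool."""

# ASSUMPTION: 203.0.113.x is a placeholder for the real sending block, and
# hcmta01..hcmta03 is a guess at how the plan's hcmtaNN names expand.
HC_PREFIX = "203.0.113."
HC_OCTETS = (107, 108, 109)  # out18/19/20 per the plan
HC_DOMAIN = "performancewest.net"


def hc_transport_entries(prefix: str = HC_PREFIX, octets: tuple = HC_OCTETS) -> str:
    """One smtp(8) client transport per hc IP, each with its own source IP,
    HELO name and syslog_name, so mail.log lines can be attributed per pool."""
    blocks = []
    for n, octet in enumerate(octets, start=1):
        name = f"hcout{n}"
        blocks.append(
            # Columns: service type private unpriv chroot wakeup maxproc command.
            # Match the chroot flag used by the existing out05..out20 entries.
            f"{name:<9} unix  -       -       n       -       -       smtp\n"
            f"  -o smtp_bind_address={prefix}{octet}\n"
            f"  -o smtp_helo_name=hcmta{n:02d}.{HC_DOMAIN}\n"
            f"  -o syslog_name=postfix-{name}"
        )
    return "\n".join(blocks)


def hc_randmap(count: int = len(HC_OCTETS)) -> str:
    """The randmap table literal the plan uses to rotate across hc transports."""
    return "randmap:{" + ",".join(f"hcout{n}:" for n in range(1, count + 1)) + "}"


if __name__ == "__main__":
    print(hc_transport_entries())
    print(hc_randmap())
```

Giving every transport its own `syslog_name` is what later makes per-pool bounce monitoring a simple log filter.
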
### 2. Listmonk: a second SMTP server, used only by healthcare campaigns

Listmonk's `settings.smtp` is a JSON array and **already supports multiple SMTP
servers**. Add a second entry:

```json
{ "host":"172.18.0.1", "port":2526, "uuid":"healthcare",
  "enabled":true, "hello_hostname":"hcmta.performancewest.net",
  "max_conns":4, "tls_type":"none", "auth_protocol":"none" }
```

Listmonk round-robins across enabled SMTP servers and lacks native per-campaign
SMTP pinning, so we do NOT rely on per-campaign SMTP selection to keep the
streams isolated. Instead we isolate either with a separate Listmonk instance or
with an operational split. Decide in step 0:

- **Option A: second Listmonk instance** (`listmonk-hc`) on the same Postgres
  server, with its own `app.message_sliding_window_rate`, pointed only at port
  `2526`. Cleanest cap isolation; ~zero risk of cross-stream throttle coupling.
  This is the recommended option because the *whole point* is independent caps.
- **Option B: one Listmonk** that uses SMTP server B only for healthcare sends,
  accepting Listmonk's single global cap by running trucking and healthcare in
  non-overlapping send windows. Cheaper, but it couples the caps (which defeats
  the goal).

→ **Recommend Option A** (a second `listmonk-hc` service in compose). It gets its
own `app.message_sliding_window_rate` (the healthcare cap), its own SMTP server
(port 2526 → hc IPs), and shares the contacts DB only if we want it to (probably
a separate DB, to keep bounce/complaint reputation accounting clean per stream).

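Because Listmonk spreads load over every *enabled* SMTP server, Option A only stays isolated if each instance has exactly the right servers enabled. A small guard sketch, assuming we can fetch each instance's `smtp` settings array as JSON (for example from its settings API; how it is fetched is left open): it flags any enabled server that points at the other stream's Postfix port.

```python
import json

# Ports from the plan: Postfix :25 for the trucking stream, :2526 for healthcare.
TRUCKING_PORT = 25
HC_PORT = 2526


def smtp_isolation_errors(smtp_settings_json: str, expected_port: int) -> list:
    """Check one Listmonk instance's `smtp` settings array (a JSON list of
    server objects) and report anything that could route mail into the
    other stream's Postfix listener."""
    servers = json.loads(smtp_settings_json)
    enabled = [s for s in servers if s.get("enabled")]
    if not enabled:
        return ["no enabled SMTP server"]
    return [
        f"enabled server {s.get('uuid', '?')} uses port {s.get('port')}, "
        f"expected {expected_port}"
        for s in enabled
        if s.get("port") != expected_port
    ]
```

Run it against `listmonk-hc` with `HC_PORT` and against the trucking instance with `TRUCKING_PORT`; an empty list on both is the precondition for the cap-isolation claim.
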
## Healthcare-stream cap (institutional segment)

Institutional B2B mail tolerates much higher volume than consumer cold mail, but
we still **warm the new hc IPs** (they're fresh) and we still respect per-domain
practice MX limits. Proposed hc warmup (separate stamp `/etc/postfix/hc-warmup-start`):

| hc warmup day | hourly cap | ~daily | notes |
|---:|---:|---:|---|
| 0-1 | 100/h | ~1,000 | brand-new hc IPs, prove clean |
| 2-4 | 300/h | ~3,000 | |
| 5-9 | 600/h | ~6,000 | |
| 10+ | 1,000/h | ~10,000 | institutional ceiling; revisit with data |

These are **separate** from and additive to the trucking ~4k/day ceiling, because
they hit a disjoint set of receiving systems on disjoint sending IPs.

Per-domain politeness still applies (`smtp_destination_concurrency_limit`,
`smtp_destination_rate_delay`, or their per-transport `hcoutN_destination_*`
overrides) so we never hammer one clinic's MX.

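The schedule above reduces to a step function of the warmup day, which is the core of what `pw-hc-rampcap` would compute before writing the hourly value into `listmonk-hc`'s `app.message_sliding_window_rate` (assuming a 1h window duration). A sketch under two stated assumptions: the stamp file holds an ISO date, and the ~daily column's constant 10× ratio means roughly 10 sending hours per day (inferred from the table, not a stated setting). How the script writes the setting is not shown.

```python
from datetime import date

# Rows of the hc warmup table: (first warmup day of the row, hourly cap).
HC_SCHEDULE = ((0, 100), (2, 300), (5, 600), (10, 1000))
# The table's ~daily column is 10x the hourly cap in every row, i.e. it implies
# about 10 sending hours per day. Inferred from the table, not a stated setting.
SEND_HOURS_PER_DAY = 10


def parse_stamp(text: str) -> date:
    """ASSUMPTION: /etc/postfix/hc-warmup-start holds an ISO date (YYYY-MM-DD)."""
    return date.fromisoformat(text.strip())


def hc_hourly_cap(warmup_day: int) -> int:
    """Hourly cap for a given hc warmup day (day 0 = the stamp date)."""
    if warmup_day < 0:
        raise ValueError("warmup day is before the hc-warmup-start stamp")
    cap = HC_SCHEDULE[0][1]
    for first_day, hourly in HC_SCHEDULE:
        if warmup_day >= first_day:
            cap = hourly
    return cap


def hc_cap_today(stamp_text: str, today: date) -> tuple:
    """(warmup day, hourly cap, approximate daily volume) for `today`."""
    day = (today - parse_stamp(stamp_text)).days
    hourly = hc_hourly_cap(day)
    return day, hourly, hourly * SEND_HOURS_PER_DAY
```

Keeping the table as data makes "revisit with data" a one-line change to `HC_SCHEDULE` rather than a code edit.
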
## Audience split (must happen before any send)

Extend `scripts/build_npi_outreach_lists.py` (or a thin post-processor) to emit
THREE files instead of lumping all cold addresses together:

1. `npi_healthcare_institutional.csv`: cold, non-Direct, **non-consumer-webmail**
   (practice/clinic domains). → healthcare HOT stream.
2. `npi_healthcare_consumer.csv`: cold consumer webmail (gmail/outlook/icloud…).
   → rides the TRUCKING consumer-discipline stream (low cap), NOT the hot one.
3. `npi_direct_secure.csv`: DirectTrust/HISP. → parked until DirectTrust signup.

Classification rule: institutional = `cold` channel AND domain NOT in
`CONSUMER_WEBMAIL` AND not Direct. (We already compute `cold`/`direct` and a
`cold_consumer` count; just split on the consumer set.)

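The classification rule as a thin post-processor, sketched under assumptions: each row carries `email` and `channel` columns with `channel` in `{"cold", "direct"}`, and the `CONSUMER_WEBMAIL` set below is a hypothetical subset standing in for the real one in `build_npi_outreach_lists.py`.

```python
import csv
from typing import Dict, Iterable, List, Optional

# HYPOTHETICAL subset: the real CONSUMER_WEBMAIL set lives in
# scripts/build_npi_outreach_lists.py and should be imported from there.
CONSUMER_WEBMAIL = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com", "icloud.com", "aol.com"}

OUTPUT_FILES = {
    "institutional": "npi_healthcare_institutional.csv",  # hc HOT stream
    "consumer": "npi_healthcare_consumer.csv",            # trucking-discipline stream
    "direct": "npi_direct_secure.csv",                    # parked
}


def classify(email: str, channel: str) -> Optional[str]:
    """Apply the classification rule; None means 'not in any of the three files'."""
    if channel == "direct":
        return "direct"
    if channel != "cold" or "@" not in email:
        return None
    domain = email.rsplit("@", 1)[1].strip().lower()
    return "consumer" if domain in CONSUMER_WEBMAIL else "institutional"


def split_rows(rows: Iterable[dict]) -> Dict[str, List[dict]]:
    """ASSUMPTION: each row has 'email' and 'channel' columns."""
    buckets: Dict[str, List[dict]] = {k: [] for k in OUTPUT_FILES}
    for row in rows:
        bucket = classify(row["email"], row["channel"])
        if bucket is not None:
            buckets[bucket].append(row)
    return buckets


def write_buckets(buckets: Dict[str, List[dict]], fieldnames: List[str]) -> None:
    """Write each bucket to its CSV in the current directory."""
    for bucket, path in OUTPUT_FILES.items():
        with open(path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(buckets[bucket])
```

Lower-casing the domain matters here: a `@GMAIL.com` address that slipped past a case-sensitive check would land in the hot stream.
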
Always run the existing **free MX + SMTP RCPT verification on a NON-sending IP**
(doc sec 8.2) over the institutional list before importing, so we never mail
dead practice mailboxes (a `550 5.1.1` from a clinic MX still hurts the hc IPs).

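For reference, the RCPT probe boils down to the sketch below (not the existing sec 8.2 tool). It uses only `smtplib`: `source_address` pins the connection to the non-sending IP, and the session stops at `RCPT TO` without sending `DATA`. MX lookup is assumed to be done already (it needs a DNS library), and the parameter names are illustrative. Treating 5.7.x replies as "unknown" is a design choice: they usually mean the probe IP was blocked, not that the mailbox is dead. Catch-all domains accept everything, so "accepted" is not proof of a live inbox.

```python
import smtplib


def classify_rcpt(code: int, reply: bytes = b"") -> str:
    """Map an RCPT TO reply to a list decision."""
    if code in (250, 251):
        return "accepted"      # includes catch-all domains
    if 500 <= code < 600:
        if b"5.7." in reply:
            return "unknown"   # policy block on the probe IP, not a dead mailbox
        return "invalid"       # e.g. 550 5.1.1 user unknown: drop before import
    return "unknown"           # 4xx greylisting/tempfail: re-check later


def probe_rcpt(mx_host: str, rcpt: str, probe_ip: str, mail_from: str,
               helo: str, timeout: float = 20.0) -> str:
    """Probe one address from `probe_ip` (a NON-sending IP) without sending DATA."""
    try:
        with smtplib.SMTP(mx_host, 25, local_hostname=helo, timeout=timeout,
                          source_address=(probe_ip, 0)) as smtp:
            smtp.ehlo_or_helo_if_needed()
            code, _ = smtp.mail(mail_from)
            if code != 250:
                return "unknown"
            code, reply = smtp.rcpt(rcpt)
            return classify_rcpt(code, reply)
    except (smtplib.SMTPException, OSError):
        return "unknown"
```

Only rows classified "accepted" would go into the `listmonk-hc` import; "unknown" rows are worth a second pass later rather than a send.
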
## Reputation hygiene (per stream, independent)

- Separate **PTR/FCrDNS** (`hcmtaNN.performancewest.net`) + separate **SPF**
  authorization for the hc IPs (still under the same domain, so DKIM/DMARC pass).
- **DKIM/DMARC unchanged** (domain-level): healthcare mail still signs as
  performancewest.net, which is fine and desirable.
- Separate **bounce/complaint monitoring** per pool (grep by hc IP / by hc
  syslog_name). The existing monitoring commands extend trivially with the hc IPs.
- A **healthcare ramp-cap script** (`pw-hc-rampcap`) mirroring `pw-listmonk-rampcap`
  but driving the `listmonk-hc` cap off `/etc/postfix/hc-warmup-start`.

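Per-pool bounce monitoring by `syslog_name` can be sketched as a small `mail.log` summarizer. It assumes hc transports log as `postfix-hcoutN/smtp` (i.e. `syslog_name=postfix-hcoutN`) and counts every other `smtp` delivery line as trucking. It counts delivery *attempts*, so a message deferred three times counts three times, and complaints (FBL/postmaster data) are not in `mail.log` at all.

```python
import re
from collections import Counter, defaultdict

# One delivery attempt logged by an smtp(8) client transport, e.g.
# "postfix-hcout1/smtp[4242]: 4F1A2B3C: to=<dr@clinic.org>, ... status=sent (...)".
# ASSUMPTION: hc transports use syslog_name=postfix-hcoutN; anything else
# (postfix-outNN or plain postfix) is counted as the trucking pool.
DELIVERY_RE = re.compile(
    r"\b(?P<syslog>postfix(?:-[\w.-]+)?)/smtp\[\d+\]: \w+: "
    r"to=<(?P<rcpt>[^>]*)>.*?\bstatus=(?P<status>\w+)"
)


def pool_of(syslog_name: str) -> str:
    return "healthcare" if syslog_name.startswith("postfix-hcout") else "trucking"


def summarize(lines) -> dict:
    """Count delivery statuses (sent / deferred / bounced ...) per pool."""
    counts = defaultdict(Counter)
    for line in lines:
        m = DELIVERY_RE.search(line)
        if m:
            counts[pool_of(m["syslog"])][m["status"]] += 1
    return counts


def bounce_rate(counter: Counter) -> float:
    """Hard bounces as a share of final outcomes (sent + bounced)."""
    attempts = counter["sent"] + counter["bounced"]
    return counter["bounced"] / attempts if attempts else 0.0
```

Usage: `summarize(open("/var/log/mail.log"))` and compare `bounce_rate(...)` for the two pools day over day.
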
## Concrete ordered steps

0. **Decide:** single Postfix instance + class-based hc transport vs `postmulti`;
   and Listmonk Option A (2nd instance) vs B. (Recommend: single instance +
   class transport, and Listmonk Option A.)
1. **DNS/identity:** add PTR `hcmtaNN` for `.107/.108/.109`, extend SPF, confirm
   DKIM/DMARC still pass for those IPs. (No send until green.)
2. **Postfix:** new submission service on `:2526`; carve `out18/19/20` into an
   hc rotation pool; remove them from the trucking `ALL` array; add the
   `hc-warmup-start` stamp + `pw-hc-mta-warmup`. Keep the Yahoo `hold:` backstop.
3. **Listmonk-hc:** add a `listmonk-hc` compose service (same image, own
   `LISTMONK_app__*` cap env / settings, SMTP server = `172.18.0.1:2526`),
   behind nginx at a separate vhost or path. Wire `pw-hc-rampcap`.
4. **Audience:** extend the list builder to emit the 3 split files; run free MX +
   SMTP verification (non-sending IP) on the institutional file.
5. **Campaign:** build a healthcare-institutional campaign (revalidation-overdue
   first → free NPI tool link → $399 PECOS Revalidation product), import the
   verified institutional list into `listmonk-hc`, send small focused batches.
6. **Deploy wiring:** add the new services/scripts to `deploy.sh` / `deploy-dev.sh`
   and ansible templates, mirroring the proxy-relay pattern just landed.

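Step 1's "no send until green" gate can include a forward-confirmed reverse DNS check per hc IP. A stdlib-only sketch: `socket.gethostbyaddr` gives the PTR name and `socket.gethostbyname_ex` its A records; the resolver functions are parameters only so the check can be exercised without live DNS. SPF, DKIM and DMARC are not covered here (the identity proof below does that end to end).

```python
import socket


def fcrdns_ok(ip: str, suffix: str = ".performancewest.net",
              reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: PTR(ip) -> name under `suffix`, and the
    name's A records include ip again. Returns (ok, detail)."""
    try:
        name = reverse(ip)[0].rstrip(".").lower()
    except OSError as exc:  # socket.herror / socket.gaierror subclass OSError
        return False, f"no PTR for {ip}: {exc}"
    if not name.endswith(suffix):
        return False, f"PTR {name} is not under {suffix}"
    try:
        addrs = forward(name)[2]
    except OSError as exc:
        return False, f"{name} does not resolve: {exc}"
    if ip not in addrs:
        return False, f"{name} resolves to {addrs}, not {ip}"
    return True, name
```

Run `fcrdns_ok(ip)` for each of the three hc addresses (real prefix, not shown in this plan) from the mail host; any `False` blocks step 2.
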
## Validation

- **Isolation proof:** send a trucking batch and an hc batch simultaneously;
  confirm via `mail.log` (by transport `syslog_name`, since each transport binds
  one IP) that trucking mail egresses ONLY from `.94-.96` and hc mail ONLY from
  `.107-.109`, and that each respects its own cap independently.
- **Identity proof:** an hc test send to a mail-tester/aboutmy.email account
  shows PTR `hcmtaNN`, SPF pass, DKIM pass, DMARC pass.
- **Deliverability proof:** hc test sends to a Google Workspace test domain + an
  M365 test domain land in the inbox (not spam); record per-domain disposition.
- **Cap proof:** `pw-hc-rampcap` sets the `listmonk-hc` cap from the hc warmup day
  and does NOT touch the trucking Listmonk cap (and vice versa).
- **No regression:** trucking delivery mix unchanged after the split (same
  monitoring commands, same `.94-.96` volumes).

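The isolation proof can be made mechanical: given the recipients imported into `listmonk-hc`, any delivery line where an hc recipient left through a trucking transport (or the reverse) is a leak. A sketch, assuming hc transports log as `postfix-hcoutN/smtp` and that each transport binds a single IP, so the transport name stands in for the egress IP (Postfix's delivery lines record the remote relay, not the local source address).

```python
import re

# ASSUMPTION: hc transports log as postfix-hcoutN/smtp (syslog_name); every
# other smtp(8) delivery line belongs to the trucking pool.
DELIVERY_RE = re.compile(
    r"\b(?P<syslog>postfix(?:-[\w.-]+)?)/smtp\[\d+\]: \w+: to=<(?P<rcpt>[^>]*)>"
)


def cross_stream_lines(lines, hc_recipients):
    """Return delivery lines where an hc recipient left via a trucking transport,
    or a non-hc recipient left via an hc transport. Empty list = isolated."""
    hc = {r.strip().lower() for r in hc_recipients}
    leaks = []
    for line in lines:
        m = DELIVERY_RE.search(line)
        if not m:
            continue
        via_hc = m["syslog"].startswith("postfix-hcout")
        to_hc = m["rcpt"].lower() in hc
        if via_hc != to_hc:
            leaks.append(line)
    return leaks
```

An empty result over the simultaneous-send window, together with the per-pool cap readings, is the evidence this bullet asks for.
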
## Open decisions for Justin

1. Real institutional-domain count: re-run the list builder on fresh NPPES data to
   get the exact `npi_healthcare_institutional.csv` size before we size the hc cap.
2. Single Postfix instance (class transport) vs `postmulti` second instance.
3. Listmonk: second instance (recommended, true cap isolation) vs single instance
   with windowed sends.
4. How aggressive to be on the institutional ceiling (10k/day proposed): start
   conservative and let data raise it.
5. Whether hc uses a **separate Listmonk contacts DB** (cleaner per-stream
   complaint accounting) or shares the existing one.