# Plan — Dual-Stream Outbound Email (Healthcare hot + Trucking trickle)
## Why this exists

Today **one global throttle governs all outbound mail**: the Listmonk sliding
window (`app.message_sliding_window_rate`, currently 150/h ramping to a 300/h
hard ceiling ≈ 4k/day) plus a shared Postfix rotation pool (`.94/.95/.96`).

That ceiling exists to protect **consumer-ISP reputation** (Gmail / Microsoft /
Yahoo), which is what the FMCSA trucking campaigns mail. The May 30-31 collapse
(29k blast → Gmail `550-5.7.1`, Yahoo `421 TSS04`, delivery fell to ~13%) is why
the whole warmup/cap machinery exists.

Healthcare's reachable audience is **different in kind**, so it should NOT be
constrained by the same ceiling:

- The cold-emailable NPPES-endpoint slice is "tens of thousands"; a large part is
  consumer webmail (gmail ~12.4k) but a meaningful tail is **practice/clinic
  domains** (their own MX, Google Workspace / Microsoft 365 tenants).
- **Practice-domain (institutional) mail does not share the consumer-ISP
  snowshoe heuristics** that torch the trucking IPs. Its deliverability is
  largely independent of the reputation we're protecting on `.94-.96`.

### Verified audience size (May 2026 NPPES endpoint_pfile, measured)

Classifying every email-formatted endpoint (deduped) with the tightened
Direct/HISP filter (`direct`, `medicity.net`, `surescripts`, `updox`, `maxmd`, …)
and the consumer-webmail set:

| segment | rows | NPIs | routing |
|---|---:|---:|---|
| Direct / HISP | 242,441 | — | **parked** (DirectTrust-only routing, won't cold-deliver) |
| Consumer webmail | 19,366 | ~19,072 | rides the **trucking** consumer-discipline stream |
| **Institutional (practice domains)** | **94,348** | **~92,592** | **HEALTHCARE HOT stream** |

Institutional spread: **38,873 distinct domains**, **76% of which have exactly 1
provider** (small practices = our $399 PECOS-revalidation buyer). Top-100 domains
are only 23% of volume → healthy long tail, no single MX gets hammered. (Excludes
a handful of non-prospect giants — `va.gov`, `mail.mil`, `cvshealth.com`,
`walgreens.com`, `wal-mart.com` — that we drop in the audience build.)

This sizes the hot stream: at ~92k deliverable institutional addresses a 10k/day
ceiling drains the list in ~2 weeks; stuck behind the 4k trucking cap it would
take ~23 days AND poison the trucking IPs. Hence the split.

So the goal is **stream isolation**: let healthcare-institutional mail run hot on
its own IPs/cap while trucking keeps trickling on the warmed consumer-facing IPs,
with neither able to damage the other.

> Honesty caveat (do not skip): the *consumer-webmail* portion of the healthcare
> list (gmail/outlook/icloud addresses) is NOT institutional and MUST ride the
> same cautious consumer-ISP discipline as trucking. "Run healthcare hot" applies
> ONLY to the practice-domain (non-consumer, non-DirectTrust) segment. We split
> the healthcare list itself into `healthcare-institutional` vs
> `healthcare-consumer` and route each to the matching stream.

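
The institutional spread figures quoted above (share of single-provider domains, top-100 share of volume) can be recomputed from any list of endpoint addresses. The following is a minimal sketch, not the list builder's actual code: the function name `domain_concentration` and the sample addresses are illustrative, and it counts deduplicated addresses per domain as a stand-in for providers per domain.

```python
from collections import Counter


def domain_concentration(emails, top_n=100):
    """Summarise how evenly a list of addresses spreads across domains."""
    # Deduplicate on the lower-cased, trimmed address before counting.
    unique = {e.strip().lower() for e in emails if "@" in e}
    per_domain = Counter(e.rsplit("@", 1)[1] for e in unique)
    total = sum(per_domain.values())
    # Domains that contribute exactly one address (proxy for "1 provider").
    single = sum(1 for n in per_domain.values() if n == 1)
    # Volume carried by the top_n largest domains.
    top = sum(n for _, n in per_domain.most_common(top_n))
    return {
        "addresses": total,
        "domains": len(per_domain),
        "single_address_domain_share": single / len(per_domain) if per_domain else 0.0,
        "top_n_volume_share": top / total if total else 0.0,
    }


# Illustrative input only; real input would be the institutional CSV.
sample = [
    "a@clinic-one.example",
    "b@clinic-one.example",
    "A@clinic-one.example ",  # duplicate of the first once normalised
    "c@solo-practice.example",
    "d@other-practice.example",
]
stats = domain_concentration(sample, top_n=1)
print(stats)
```

A low `top_n_volume_share` together with a high single-address share is the "healthy long tail" condition the plan relies on when it argues that no single MX gets hammered.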
## Architecture: two independent streams, one Postfix, one Listmonk

```mermaid
flowchart TD
    LM[Listmonk] -->|SMTP server A: 172.18.0.1:25\nhello perfwest...| PFA[Postfix submission]
    LM -->|SMTP server B: 172.18.0.1:2526\nhello hc-mta...| PFB[Postfix submission hc]
    PFA --> TR{transport map}
    PFB --> TRH{transport_maps hc}
    TR -->|yahoo family| HOLD[hold:]
    TR -->|consumer + everything else| ROT[randmap rotation\nout05..out20\n.94-.109]
    TRH -->|practice domains| HCROT[randmap hc pool\nhcout1..hcout4\n.107-.109 + spare]
    ROT --> NET1[(consumer ISPs:\nGmail / MS, capped low)]
    HCROT --> NET2[(practice MX /\nWorkspace / M365, hot)]
```

Two coordinated changes:

### 1. Postfix: a dedicated healthcare submission service + IP sub-pool

- Carve **2-3 IPs out of the existing 20** (`.107/.108/.109` = `out18/19/20`,
  currently unused at the warmup tail) into a **healthcare-only rotation pool**.
  They get their own HELO (`hcmtaNN.performancewest.net` — confirm/lay down PTR +
  SPF first) so healthcare reputation is built and judged separately from
  trucking. They are removed from the trucking `ALL=(...)` array so the trucking
  warmup never reclaims them.
- Add a **second Postfix submission entry** in `master.cf` listening on a distinct
  port (e.g. `2526`) whose injected mail is tagged to the healthcare pool. Two
  clean ways to bind the pool:
  - **(preferred) sender-dependent / class-based transport:** route by the
    submission port via a dedicated `cleanup`/`smtpd` service that sets a header
    or uses a separate `transport_maps` so healthcare recipients hit
    `randmap:{hcout1:,hcout2:,hcout3:}`.
  - Alternative: a separate Postfix **instance** (`postmulti`) listening
    on `2526`, with its own `main.cf` bound to the hc IPs. More isolation, more
    moving parts. Decide in step 0 (recommend the single-instance class-based
    route unless isolation is required).
- Keep the **Yahoo-family `hold:` backstop** in BOTH transports. The healthcare
  list is pre-filtered, but this is defense in depth.

### 2. Listmonk: a second SMTP server, used only by healthcare campaigns

Listmonk's `settings.smtp` is a JSON array and **already supports multiple SMTP
servers**. Add a second entry:

```json
{ "host":"172.18.0.1", "port":2526, "uuid":"healthcare",
  "enabled":true, "hello_hostname":"hcmta.performancewest.net",
  "max_conns":4, "tls_type":"none", "auth_protocol":"none" }
```

Listmonk round-robins across enabled SMTP servers, so to keep streams isolated we
do NOT rely on per-campaign SMTP selection (Listmonk lacks native per-campaign
SMTP pinning). Instead we isolate by **separate Listmonk instances OR** by the
cleaner operational split below. Decide in step 0:

- **Option A — second Listmonk instance** (`listmonk-hc`) on the same Postgres,
  separate `app.message_sliding_window_rate`, pointed only at port `2526`.
  Cleanest isolation of caps; ~zero risk of cross-stream throttle coupling. This
  is the recommended option because the *whole point* is independent caps.
- **Option B — one Listmonk**, single SMTP server B for healthcare, and we accept
  Listmonk's single global cap by running trucking and healthcare in
  non-overlapping send windows. Cheaper but couples the caps (defeats the goal).

→ **Recommend Option A** (second `listmonk-hc` service in compose). It gets its
own `app.message_sliding_window_rate` (the healthcare cap), its own SMTP server
(port 2526 → hc IPs), and shares the contacts DB only if we want (probably
separate DB to keep bounce/complaint reputation accounting clean per stream).

## Healthcare-stream cap (institutional segment)

Institutional B2B mail tolerates much higher volume than consumer cold mail, but
we still **warm the new hc IPs** (they're fresh) and we still respect per-domain
practice MX limits. Proposed hc warmup (separate stamp `/etc/postfix/hc-warmup-start`):

| hc warmup day | hourly cap | ~daily | notes |
|---:|---:|---:|---|
| 0-1 | 100/h | ~1,000 | brand-new hc IPs, prove clean |
| 2-4 | 300/h | ~3,000 | |
| 5-9 | 600/h | ~6,000 | |
| 10+ | 1,000/h | ~10,000 | institutional ceiling; revisit with data |

These are **separate** from and additive to the trucking ~4k/day ceiling, because
they hit a disjoint set of receiving systems on disjoint sending IPs.

Per-domain politeness still applies (`smtp_destination_concurrency_limit`,
`smtp_destination_rate_delay`) so we never hammer one clinic's MX.

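
The warmup table above reduces to a lookup from warmup day to hourly cap, which is the calculation a ramp-cap script needs. The sketch below is an illustration under stated assumptions, not the contents of `pw-hc-rampcap`: the names `hc_hourly_cap`, `warmup_day`, and `days_to_drain` are hypothetical, the dates are placeholders, and the ten sending hours per day are inferred from the table's own ratio (100/h giving ~1,000/day).

```python
from datetime import date

# (first warmup day of the tier, hourly cap), copied from the hc warmup table.
HC_WARMUP_TIERS = [(0, 100), (2, 300), (5, 600), (10, 1000)]

# Assumption: the table's daily column implies about ten sending hours per day.
SENDING_HOURS_PER_DAY = 10


def hc_hourly_cap(day):
    """Return the hourly cap for a warmup day (day 0 is the first send day)."""
    cap = HC_WARMUP_TIERS[0][1]
    for first_day, tier_cap in HC_WARMUP_TIERS:
        if day >= first_day:
            cap = tier_cap
    return cap


def warmup_day(start, today):
    """Whole days elapsed since the warmup start date, never negative."""
    return max(0, (today - start).days)


def days_to_drain(addresses):
    """Days needed to mail every address once while following the ramp."""
    remaining, day = addresses, 0
    while remaining > 0:
        remaining -= hc_hourly_cap(day) * SENDING_HOURS_PER_DAY
        day += 1
    return day


# Placeholder dates: three days into the warmup lands in the 300/h tier.
print(hc_hourly_cap(warmup_day(date(2026, 1, 1), date(2026, 1, 4))))
print(days_to_drain(92_592))
```

Under these assumptions the ~92.6k institutional list drains in 16 days including the ramp, which is in line with the "~2 weeks" estimate in the audience sizing.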
## Audience split (must happen before any send)

Extend `scripts/build_npi_outreach_lists.py` (or a thin post-processor) to emit
THREE files instead of lumping cold together:

1. `npi_healthcare_institutional.csv` — cold, non-Direct, **non-consumer-webmail**
   (practice/clinic domains). → healthcare HOT stream.
2. `npi_healthcare_consumer.csv` — cold consumer webmail (gmail/outlook/icloud…).
   → rides the TRUCKING consumer-discipline stream (low cap), NOT the hot one.
3. `npi_direct_secure.csv` — DirectTrust/HISP. → parked until DirectTrust signup.

Classification rule: institutional = `cold` channel AND domain NOT in
`CONSUMER_WEBMAIL` AND not Direct. (We already compute `cold`/`direct` and a
`cold_consumer` count; just split on the consumer set.)

Always run the existing **free MX + SMTP RCPT verification on a NON-sending IP**
(doc sec 8.2) over the institutional list before importing, so we never mail
dead practice mailboxes (`550 5.1.1` from a clinic MX still hurts the hc IPs).

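
The classification rule above can be sketched as a single function that maps one address to a stream. This is a hedged illustration, not the shared classifier itself: `classify_endpoint` is a hypothetical name, the sets below are deliberately short subsets (the plan's own Direct/HISP list is elided with "…" and is not completed here), and substring matching on the Direct markers is an assumption about how the filter works.

```python
# Illustrative subsets only; the real sets are longer than shown here.
CONSUMER_WEBMAIL = {"gmail.com", "outlook.com", "icloud.com", "yahoo.com"}
DIRECT_MARKERS = ("direct", "medicity.net", "surescripts", "updox", "maxmd")
NON_PROSPECT_DOMAINS = {
    "va.gov", "mail.mil", "cvshealth.com", "walgreens.com", "wal-mart.com",
}


def classify_endpoint(email):
    """Return 'direct', 'consumer', 'institutional', or 'drop' for one address."""
    address = email.strip().lower()
    if address.count("@") != 1:
        return "drop"  # not email-formatted
    domain = address.split("@")[1]
    if not domain or domain in NON_PROSPECT_DOMAINS:
        return "drop"  # non-prospect giants are removed in the audience build
    # Assumption: a Direct/HISP address is recognised by a marker in its domain.
    if any(marker in domain for marker in DIRECT_MARKERS):
        return "direct"  # parked until DirectTrust signup
    if domain in CONSUMER_WEBMAIL:
        return "consumer"  # rides the trucking consumer-discipline stream
    return "institutional"  # healthcare hot stream


for addr in ("dr.smith@smithfamilyclinic.example", "jane@gmail.com",
             "jane@direct.examplehisp.example", "someone@va.gov"):
    print(addr, "->", classify_endpoint(addr))
```

Checking Direct before consumer webmail keeps a Direct address from ever landing in a cold stream. A substring test on `direct` can over-match ordinary practice domains, so a production filter would likely need the tighter matching the plan refers to.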
## Reputation hygiene (per stream, independent)

- Separate **PTR/FCrDNS** (`hcmtaNN.performancewest.net`) + separate **SPF**
  authorization for the hc IPs (still under the same domain so DKIM/DMARC pass).
- **DKIM/DMARC unchanged** (domain-level) — healthcare mail still signs as
  performancewest.net, which is fine and desirable.
- Separate **bounce/complaint monitoring** per pool (grep by hc IP / by hc
  syslog_name). The existing monitoring commands extend trivially with the hc IPs.
- A **healthcare ramp-cap script** (`pw-hc-rampcap`) mirroring `pw-listmonk-rampcap`
  but driving the `listmonk-hc` cap off `/etc/postfix/hc-warmup-start`.

## Concrete ordered steps

0. **Decide:** single Postfix instance + class-based hc transport vs `postmulti`;
   and Listmonk Option A (2nd instance) vs B. (Recommend: single instance +
   class transport, and Listmonk Option A.)
1. **DNS/identity:** add PTR `hcmtaNN` for `.107/.108/.109`, extend SPF, confirm
   DKIM/DMARC still pass for those IPs. (No send until green.)
2. **Postfix:** new submission service on `:2526`; carve `out18/19/20` into an
   hc rotation pool; remove them from the trucking `ALL` array; add the
   `hc-warmup-start` stamp + `pw-hc-mta-warmup`. Keep Yahoo `hold:` backstop.
3. **Listmonk-hc:** add `listmonk-hc` compose service (same image, own
   `LISTMONK_app__*` cap env / settings, SMTP server = `172.18.0.1:2526`),
   behind nginx at a separate vhost or path. Wire `pw-hc-rampcap`.
4. **Audience:** extend the list builder to emit the 3 split files; run free MX +
   SMTP verification (non-sending IP) on the institutional file.
5. **Campaign:** build a healthcare-institutional campaign (revalidation-overdue
   first → free NPI tool link → $399 PECOS Revalidation product), import the
   verified institutional list into `listmonk-hc`, send small focused batches.
6. **Deploy wiring:** add the new services/scripts to `deploy.sh` / `deploy-dev.sh`
   and ansible templates, mirroring the proxy-relay pattern just landed.

## Validation

- **Isolation proof:** send a trucking batch and an hc batch simultaneously;
  confirm via `mail.log` that trucking mail egresses ONLY from `.94-.96` and hc
  mail ONLY from `.107-.109`, and that each respects its own cap independently.
- **Identity proof:** an hc test send to a mail-tester/aboutmy.email account
  shows PTR `hcmtaNN`, SPF pass, DKIM pass, DMARC pass.
- **Deliverability proof:** hc test sends to a Google Workspace test domain + an
  M365 test domain land in inbox (not spam); record per-domain disposition.
- **Cap proof:** `pw-hc-rampcap` sets the `listmonk-hc` cap from the hc warmup day
  and does NOT touch the trucking Listmonk cap (and vice-versa).
- **No regression:** trucking delivery mix unchanged after the split (same
  monitoring commands, same `.94-.96` volumes).

## Decisions (locked)

1. **Postfix:** single instance + class-based hc transport (port `:2526` →
   hc rotation pool). No `postmulti`.
2. **Listmonk:** a **second instance** (`listmonk-hc`) with its own
   sliding-window cap → true cap isolation.
3. **Institutional ceiling:** **10k/day** (warm up to it).
4. **Contacts DB:** separate (`listmonk_hc` database) — cleaner per-stream
   bounce/complaint accounting, and the hc instance needs its own DB anyway.
5. **Audience count:** measured — ~92,592 institutional NPIs / 38,873 domains
   (see table above).

## Open / for-later

- How aggressive on the institutional ceiling beyond 10k/day — raise only with
  clean delivery data.
- DirectTrust signup to unlock the 242k Direct/HISP segment (separate effort).

## Implementation status (built + validated)

Committed and validated on dev:

- **Audience split** — `scripts/healthcare_email_streams.py` (shared classifier)
  + reworked `scripts/build_npi_outreach_lists.py` emit
  `npi_healthcare_institutional.csv`, `npi_healthcare_consumer.csv`, and
  `npi_direct_secure.csv`.
  Verified on May 2026 NPPES: 89,557 institutional rows.
- **Postfix hc stream** — `infra/postfix/hc_stream_setup.sh` applied on the app
  server: ports 2526/2527/2528 -> hcout1/2/3 -> IPs .107/.108/.109 (HELO
  hcmta01-03). Proven: a send on :2527 egressed via hcout2 (.108) to the real
  gmail MX; trucking transport_maps (.94-.96) untouched.
- **listmonk-hc** — second instance (own `listmonk_hc` DB, own cap), 3 SMTP
  servers = the 3 hc ports. Proven on dev: listmonk-hc container -> host :2526
  (hcsubmit107) -> hcout1 (.107) -> real gmail MX.
- **Ramp-cap** — `infra/postfix/pw-hc-rampcap.sh` (100->1000/h off
  `/etc/postfix/hc-warmup-start`), independent of the trucking ramp.
- **Deploy wiring** — deploy.sh/deploy-dev.sh bring up listmonk-hc;
  `docker-compose.dev.override.yml` keeps dev (shared host) from clashing on
  prod host ports / postgres volume.

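
The port-to-IP mapping in the Postfix hc stream bullet (one submission port per transport, each transport pinned to one source IP and HELO name) can be written down as data and rendered into `master.cf`-style stanzas. This is a sketch under assumptions and not the output of `hc_stream_setup.sh`: the stanza layout, the `render_master_cf` helper, and the `syslog_name` values are illustrative, while the ports, transport names, IPs, and HELO hosts come from the notes in this plan.

```python
# Port -> (transport name, bind IP, HELO host), as listed in the status notes.
HC_STREAMS = {
    2526: ("hcout1", "207.174.124.107", "hcmta01.performancewest.net"),
    2527: ("hcout2", "207.174.124.108", "hcmta02.performancewest.net"),
    2528: ("hcout3", "207.174.124.109", "hcmta03.performancewest.net"),
}
LISTEN_ADDR = "172.18.0.1"  # docker bridge gateway used by listmonk-hc


def render_master_cf(streams=HC_STREAMS, listen_addr=LISTEN_ADDR):
    """Render one smtpd listener and one smtp client stanza per hc port."""
    lines = []
    for port, (transport, bind_ip, helo) in sorted(streams.items()):
        # Listener: everything accepted on this port is handed to one transport.
        lines.append(f"{listen_addr}:{port} inet n - n - - smtpd")
        lines.append(f"  -o content_filter={transport}:")
        # Outbound client pinned to a single source IP and HELO name.
        lines.append(f"{transport} unix - - n - - smtp")
        lines.append(f"  -o smtp_bind_address={bind_ip}")
        lines.append(f"  -o smtp_helo_name={helo}")
        # Hypothetical log tag so each stream can be grepped separately.
        lines.append(f"  -o syslog_name=postfix-{transport}")
    return "\n".join(lines)


print(render_master_cf())
```

Keeping the mapping in one table makes the isolation check mechanical: any egress IP seen in the logs that is not a value in `HC_STREAMS` did not come from the hc stream.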
## REMAINING before any healthcare send (manual, needs Justin/DNS)

1. **PTR / FCrDNS** for the hc IPs — ✅ **DONE 2026-06-06.**
   `.107->hcmta01`, `.108->hcmta02`, `.109->hcmta03` (.performancewest.net),
   plus matching forward A records, verified resolving on the authoritative NS
   AND HE.net secondaries (SOA serial in sync). FCrDNS confirmed both ways.

   **How (for future reference):** HestiaCP box `cp.carrierone.com` =
   `207.174.124.22`, **SSH port 22022** (not 22). `admin@` is sftp-only, but
   **`root@.22:22022` accepts our default `~/.ssh/id_ed25519`** → full shell +
   Hestia CLI. Forward zone `performancewest.net` and reverse zone
   `124.174.207.in-addr.arpa` are both owned by Hestia user **`justin`**; HE.net
   auto-zone-transfers (secondaries). Commands used:

   ```
   export PATH=$PATH:/usr/local/hestia/bin
   # forward A: USER DOMAIN RECORD TYPE VALUE
   v-add-dns-record justin performancewest.net hcmta01 A 207.174.124.107
   # reverse PTR: USER REVZONE OCTET PTR FQDN. "" "" <restart yes/no>
   v-add-dns-record justin 124.174.207.in-addr.arpa 107 PTR hcmta01.performancewest.net. "" "" yes
   v-delete-dns-record justin 124.174.207.in-addr.arpa <ID> no # remove stale
   v-rebuild-dns-domain justin 124.174.207.in-addr.arpa # bump serial
   ```

   (Also removed pre-existing duplicate `mta18-20` PTRs in the reverse zone.)
   NOTE: the workers' `hestia_provisioner.py` path (admin@:22 + mounted key)
   remains unfinished/unused — the working path is root@:22022 with our key.
2. **SPF/DKIM/DMARC** — ✅ **VERIFIED 2026-06-06.** SPF already authorizes
   `.107/.108/.109` explicitly and ends `-all` (only 2 DNS-lookup mechanisms,
   `a mx` — safe under the 10 limit). DKIM selector `mail` published (2048-bit).
   DMARC `p=quarantine; pct=100; rua=dmarc@`. All domain-level, no change needed.
3. **Install on prod** — ✅ **DONE 2026-06-06.**
   - Postfix hc stream already live on the app host (Postfix is co-located):
     ports `2526/2527/2528` → `content_filter=hcout1/2/3:` → `smtp_bind_address`
     `.107/.108/.109` + HELO `hcmta01/02/03`. Verified in master.cf.
   - `listmonk_hc` DB existed (owner `pw`, was empty); ran
     `docker compose run --rm --entrypoint /bin/sh listmonk-hc -c
     './listmonk --install --idempotent --yes --config /listmonk/config.toml'`
     → 16 tables, superadmin `api` created. `docker compose up -d listmonk-hc`
     → container Up, `:9101` → 200.
   - **3 SMTP servers configured directly in the `listmonk_hc.settings` table**
     (the env-installed admin is a UI user, not an API-token user, so the REST
     API rejects basic-auth; DB update is the clean path). Each points at
     `172.18.0.1:2526/2527/2528` (docker bridge gateway → host Postfix hc ports),
     `auth_protocol=none`, `tls_type=none`, `max_conns=2`,
     `hello_hostname=hcmta0N`. Restart loaded "3 SMTP messengers".
   - **End-to-end validated:** submitted one probe through each of 2526/2527/2528;
     maillog shows each routed via its own `hcout1/2/3`, established a **Trusted
     TLS connection to gmail-smtp-in.l.google.com:25**, and got a genuine Gmail
     `550-5.1.1 NoSuchUser` (expected for the dummy recipient) — i.e. **no
     PTR/SPF/reputation rejection**, FCrDNS accepted from all 3 hc IPs.
   - ✅ `pw-hc-rampcap` installed at `/usr/local/bin/` + `/etc/cron.d/pw-hc-rampcap`
     (daily 07:20, mirrors the trucking rampcap). The hc warmup stamp
     `/etc/postfix/hc-warmup-start` exists (created by `hc_stream_setup.sh`), so
     the ramp is on **day 0 → cap 100/h** (sliding window, 1h). Ramps to 1000/h
     by day 10. Nothing sends until a list is imported.
4. **Verify identity** — ⚠️ **PARTIAL.** The live-send probes already prove Gmail
   accepts mail from `.107/.108/.109` with no PTR/SPF/reputation rejection (only
   the dummy-recipient `550 NoSuchUser`). Still worth a **mail-tester.com /
   aboutmy.email** run from an hc IP (send to their probe address through
   listmonk-hc) to confirm the numeric score (DKIM-signed, DMARC aligned, content
   spamassassin score) BEFORE the first real batch. That run has not been started.
5. **Free MX+SMTP verify** the institutional CSV on a non-sending IP, import the
   verified file into listmonk-hc, send small focused batches (overdue-first).

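
The SPF note in item 2 ("only 2 DNS-lookup mechanisms, safe under the 10 limit") refers to the RFC 7208 rule that the `include`, `a`, `mx`, `ptr`, and `exists` mechanisms plus the `redirect` modifier each cost a DNS lookup, with at most 10 allowed per evaluation. A minimal sketch of that count follows. The record string is illustrative, built only from what this plan states (explicit hc IPs, `a mx`, `-all`), and is not the published record; `spf_lookup_count` is a hypothetical helper that counts top-level terms only and does not follow nested includes.

```python
import re

# Terms that trigger a DNS lookup under RFC 7208 section 4.6.4.
LOOKUP_TERMS = {"include", "a", "mx", "ptr", "exists", "redirect"}
SPF_LOOKUP_LIMIT = 10


def spf_lookup_count(record):
    """Count the DNS-lookup terms in one SPF record (top level only)."""
    count = 0
    for term in record.split()[1:]:  # skip the leading "v=spf1"
        # Drop the qualifier (+, -, ~, ?) and keep the mechanism/modifier name.
        name = re.split(r"[:/=]", term.lstrip("+-~?").lower(), maxsplit=1)[0]
        if name in LOOKUP_TERMS:
            count += 1
    return count


# Illustrative record only, not the one published for performancewest.net.
example = (
    "v=spf1 a mx ip4:207.174.124.107 ip4:207.174.124.108 "
    "ip4:207.174.124.109 -all"
)
lookups = spf_lookup_count(example)
print(lookups, "lookup terms; within limit:", lookups <= SPF_LOOKUP_LIMIT)
```

Authorizing the hc addresses with `ip4:` terms is what keeps the count flat: adding sending IPs this way never consumes lookup budget, whereas each extra `include:` would.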
## Campaign tracking (opens/clicks) — fixed 2026-06-06

listmonk-hc was installed with the default `app.root_url=http://localhost:9000`,
so every tracking pixel + click-link in sent emails pointed at localhost ->
recipients couldn't reach them -> **views/clicks always showed 0**. Fixed:

- `app.root_url` -> `https://lists-hc.performancewest.net` (the public portal vhost)
- `privacy.individual_tracking` -> `true` (per-subscriber opens/clicks)
- `privacy.disable_tracking` -> `false`
- restarted listmonk-hc to load the new root_url.

Verified: hitting the tracking pixel publicly records a `campaign_views` row, and
the campaign preview renders pixel+links as `lists-hc.performancewest.net/...`
with zero `localhost` references. NOTE: emails sent BEFORE this fix (the first
~100 warmup) had localhost tracking baked in and won't track retroactively; all
future sends track correctly.
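
The "zero `localhost` references" check on the campaign preview can be automated before each send. The sketch below is an assumption-laden illustration rather than part of the deployed tooling: `localhost_references` is a hypothetical helper, the sample HTML and its URL paths are made up for the example, and it only inspects `href` and `src` attributes.

```python
import re
from urllib.parse import urlparse

# Matches the URL inside href="..." or src="..." (single or double quotes).
URL_ATTR = re.compile(r'(?:href|src)\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)
LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def localhost_references(html):
    """Return every href/src URL in the HTML that points at a local host."""
    return [
        url for url in URL_ATTR.findall(html)
        if urlparse(url).hostname in LOCAL_HOSTS
    ]


# Made-up previews: one rendered with the default root_url, one after the fix.
broken = (
    '<a href="http://localhost:9000/link/abc">Check your NPI</a>'
    '<img src="http://localhost:9000/campaign/abc/px.png">'
)
fixed = (
    '<a href="https://lists-hc.performancewest.net/link/abc">Check your NPI</a>'
    '<img src="https://lists-hc.performancewest.net/campaign/abc/px.png">'
)
print(localhost_references(broken))
print(localhost_references(fixed))
```

Running this against the rendered preview of every new campaign would catch a reverted `app.root_url` before a batch goes out with untrackable links.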