# Email Deliverability Runbook

**Owner action items are marked 🔴 MANUAL. Everything else is already done/automated.**

Last updated: 2026-06-19 (bulk subdomain + SPF trim + Microsoft/audience analysis).

---

## TL;DR of the 2026-06-18/19 deliverability incident

- **Symptom:** ~30% "open" rates but **0 human clicks, 0 sales** across both trucking and healthcare streams.
- **Root cause:** NOT a blocklist, NOT the IPs. Proven by a controlled A/B test (2026-06-19): from the **same mail server / same IPs**, a message From `justin@carrierone.com` landed in the **Inbox** while From `justin@performancewest.net` went to **Junk**. The variable is the **From domain's reputation**. `carrierone.com` (reg. 2006, years of steady low-volume mail, tight 2-IP SPF) is trusted; `performancewest.net` (only started bulk in ~May 2026, broken DKIM until 2026-06-17, 21-IP snowshoe SPF, May 30-31 over-volume blast) is cold/damaged.
- **Where the audience actually is (24h receiver mix):** **~85% Microsoft** (M365/Outlook/Hotmail), ~14% Google, <1% Yahoo. Our list is B2B, so Microsoft is the game, not Gmail.
  - **Microsoft is NOT reputation-blocking us** (only ~1.6% 5.7.x/S3150 rejects; it accepts ~2,138 msgs/24h). But acceptance != inbox, so the engagement problem there is likely Junk-foldering, with the same domain-reputation cause.
  - Gmail rejects ~95% of its (smaller) slice on `550-5.7.1 ... very low reputation of the sending domain`.
  - The single biggest bounce bucket is actually **list hygiene**: ~1,012/24h Microsoft `451 4.4.4 no mail-enabled subscriptions` (dead tenant domains) + dead recipients.
- **Fixes applied (2026-06-18/19):**
  1. Consolidated to ONE IP per stream (snowshoe was a band-aid for broken DKIM).
  2. **Dedicated bulk subdomain** `send.performancewest.net` so bulk reputation is isolated from the root domain (which stays clean for transactional mail).
  3. Trimmed root SPF from 21 IPs to the 3 `ip4` entries actually in use, plus `a`/`mx` (the bloated record was itself a snowshoe signal).
  4. Disabled the pointless `pw-ip-rehab` cron (we have no IP reputation problem).

---

## Bulk subdomain: send.performancewest.net (2026-06-19)

**Why:** isolate bulk/cold-campaign sending reputation from the root domain. The root domain carries transactional/verification/receipt mail (via co.carrierone.com relay + the .71 default egress) and must stay clean; cold campaigns are inherently reputation-risky. This is the industry-standard split (SendGrid, Mailchimp, etc.).

**Customer experience is unchanged:** From is the subdomain, but **Reply-To stays `info@performancewest.net`**, so replies land in the real inbox and look normal.

| Piece | Value |
|-------|-------|
| Trucking From | `Performance West ` |
| Healthcare From | `Performance West Compliance ` |
| Reply-To (both) | `info@performancewest.net` |
| DKIM selector | `send` (`send._domainkey.send.performancewest.net`), 2048-bit |
| SPF | `v=spf1 ip4:207.174.124.94 ip4:207.174.124.107 -all` |
| DMARC | inherits root `p=reject` (explicit `_dmarc.send` also published) |
| MX / Return-Path | `co.carrierone.com` (bounces) |
| Egress IPs | .94 (trucking) / .107 (HC), unchanged |

**Code:** `from_email` is set in `scripts/build_trucking_campaigns.py` (`FROM_EMAIL`, env `CAMPAIGN_FROM`) and `scripts/build_healthcare_campaigns_cron.py` (`FROM_EMAIL`, env `HC_CAMPAIGN_FROM`). Bounce-watchers (`scripts/bounce-watcher.sh`, `scripts/hc-bounce-watcher.sh`) track the new subdomain sender (and keep the legacy root sender so the pre-cutover queue drains).

**Infra:** OpenDKIM signs both domains; see `infra/ansible/roles/mail` (the `opendkim_signing_domains` list generates per-domain keys + KeyTable/SigningTable). DNS is published on the Hestia master (see the DNS automation section below). Verified end-to-end 2026-06-19: a test send signs `d=send.performancewest.net; s=send;` and egresses out05/.94.

**Listmonk global `app.from_email`** was also updated in both DBs as a fallback for any UI/test send that doesn't set From explicitly.
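
The end-to-end check noted above (a test send signing `d=send.performancewest.net; s=send;`) can be repeated on any received copy of a campaign. A minimal sketch, assuming the raw message is available as a string: the helper names `dkim_tags` and `signed_by` and the sample message are hypothetical, and the code only reads the header tags. It does not cryptographically verify the signature.

```python
from email import message_from_string


def dkim_tags(raw_message):
    """Return one dict of tag=value pairs per DKIM-Signature header."""
    msg = message_from_string(raw_message)
    signatures = []
    for header in msg.get_all("DKIM-Signature", []):
        tags = {}
        # Collapse header folding, then split the "k=v; k=v" pairs.
        for part in " ".join(header.split()).split(";"):
            if "=" in part:
                key, value = part.split("=", 1)
                tags[key.strip()] = value.strip()
        signatures.append(tags)
    return signatures


def signed_by(raw_message, domain, selector):
    """True if any signature carries the expected d= and s= tags."""
    return any(
        tags.get("d") == domain and tags.get("s") == selector
        for tags in dkim_tags(raw_message)
    )


# Hypothetical sample message; the bh=/b= values are placeholders.
SAMPLE = (
    "DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;\n"
    "\td=send.performancewest.net; s=send; h=from:to:subject;\n"
    "\tbh=PLACEHOLDER; b=PLACEHOLDER\n"
    "From: sender@send.performancewest.net\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

print(signed_by(SAMPLE, "send.performancewest.net", "send"))
```

To check a real delivery, save the received message with full headers and pass its text to `signed_by`. The receiving provider's `Authentication-Results` header remains the authoritative pass/fail verdict.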

> ⚠️ The subdomain starts at NEUTRAL reputation (not negative, not warm). It still
> needs the same warm-up discipline: steady low volume to engaged recipients. It is
> NOT a magic reset, but it protects the root domain and starts cleaner than the
> damaged root.

---

## Sending architecture (after 2026-06-18/19 consolidation)

| Stream | IP | PTR / HELO | Path |
|--------|----|------------|------|
| **Trucking** (listmonk) | **207.174.124.94** | mta05.performancewest.net | listmonk -> :25 -> `randmap:{out05:}` |
| **Healthcare** (listmonk-hc) | **207.174.124.107** | hcmta01.performancewest.net | listmonk-hc SMTP server 1 -> :2526 -> hcout1 |
| Transactional / verification | 207.174.124.71 + co.carrierone.com (.15) | perfwest | default `smtp_bind_address` (.71) + :587 relay (.15) |
| Yahoo/AOL trickle | 207.174.124.90 | mta01 | `yahooslow` transport (hash:transport) |
| Retired (torched May 30-31) | .91 / .92 / .93 | mta02-04 | rehab02-04; **`pw-ip-rehab` cron DISABLED 2026-06-19** |
| Dormant (re-expand later) | .95-.105, .108-.109 | mta06-17, hcmta02-03 | disabled |

**Root SPF (trimmed 2026-06-19):** `v=spf1 a mx ip4:207.174.124.15 ip4:207.174.124.94 ip4:207.174.124.107 -all`. Here `a`=.71, `mx`=co.carrierone.com (.15), plus the two bulk IPs. The old 21-IP record was a snowshoe signal; this matches carrierone.com's tight style.

**To re-expand after reputation is established:** add transports back to `ALL=()` in `infra/postfix/pw-mta-warmup.sh` and re-enable the HC SMTP servers (ports 2527/2528) in the `listmonk_hc` DB `settings.smtp`. Re-expand SLOWLY (one IP at a time, days apart) and only after Postmaster Tools shows a green/medium reputation. If you re-expand, also add the IPs back to BOTH the root SPF and the `send` subdomain SPF.

---

## DNS automation (Hestia is the master)

**DNS is fully automatable.** Hestia (`cp.carrierone.com`, 207.174.124.22) is the DNS master; the HE.net servers are slaves.

Access: `ssh -p 22022 root@cp.carrierone.com` using the **local workstation's** `~/.ssh/id_ed25519` (NOT the app server, NOT justin@, which is SFTP-only). The `justin` Hestia user owns the `performancewest.net` zone.

```
# add (note: Hestia appends the base domain to the RECORD name, so a record at
# send._domainkey.send.performancewest.net needs RECORD = "send._domainkey.send")
v-add-dns-record justin performancewest.net "RECORD" TYPE "VALUE" [prio]

# change / delete (find the numeric id with v-list-dns-records ... plain)
v-change-dns-record justin performancewest.net ID "RECORD" TYPE "VALUE" "" yes
v-delete-dns-record justin performancewest.net ID

# list
v-list-dns-records justin performancewest.net plain
```

Each write triggers a ~30s zone rebuild + DNSSEC re-sign; slaves sync via NOTIFY / SOA refresh, usually within a minute. Verify on `@8.8.8.8` AND the master `@207.174.124.22` (the master is authoritative; public resolvers may lag).

---

## Monitoring tools (set these up to SEE reputation directly)

These all require a provider account login and, for Google, a DNS TXT record (added on the Hestia master), so they can't be fully automated. Steps are pre-filled below.

### 🔴 MANUAL 1 - Google Postmaster Tools (Gmail is our biggest blocker)

Gmail's verbatim rejection names "the sending **domain**", so Postmaster Tools is the most direct view of that domain reputation. (By audience share, Microsoft in MANUAL 2 is the higher priority.)

Add the TXT record as root on the Hestia master (see the DNS automation section above): `ssh -p 22022 root@cp.carrierone.com`, then `v-add-dns-record justin performancewest.net "@" TXT '"google-site-verification=..."'`. The zone owner is the `justin` Hestia user; expect a ~30s zone rebuild, then the slaves sync via NOTIFY or the 2h SOA refresh, usually within a minute.

Status 2026-06-18: **TXT added + verified live** (record id 14464, `google-site-verification=p8s3RaN5wi81350wToMpdPMho5Gcel4RGT1Q1SXj7vg`), resolving on 8.8.8.8/1.1.1.1/9.9.9.9 and 4/5 HE.net slaves. Owner just needs to click **Verify** in the Postmaster console once. Data populates 24-48h after volume flows from the consolidated IP.
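
The "resolving on the public resolvers and the master" check can be scripted rather than run by hand. A minimal sketch, assuming `dig` is installed on the workstation: the helper names are hypothetical, the resolver list is taken from this runbook, and the parser only handles `dig +short` TXT output.

```python
import shlex
import subprocess

# Master first (authoritative), then the public resolvers named in this runbook.
RESOLVERS = ["207.174.124.22", "8.8.8.8", "1.1.1.1", "9.9.9.9"]


def dig_command(name, resolver, rtype="TXT"):
    """Build the dig invocation for one resolver."""
    return ["dig", "+short", f"@{resolver}", name, rtype]


def parse_txt(dig_output):
    """Turn `dig +short` TXT output into a list of record strings.

    Each output line is one record; long values are printed as several
    quoted chunks, which are joined back together here.
    """
    records = []
    for line in dig_output.splitlines():
        line = line.strip()
        if line:
            records.append("".join(shlex.split(line)))
    return records


def txt_visible(name, expected, resolver, run=subprocess.run):
    """True if `expected` is one of the TXT records the resolver returns."""
    result = run(
        dig_command(name, resolver), capture_output=True, text=True, timeout=10
    )
    return expected in parse_txt(result.stdout)
```

Usage would be a loop over `RESOLVERS` calling `txt_visible("performancewest.net", token, resolver)` with the verification token. A `False` from a public resolver while the master returns `True` usually just means the resolver is still lagging.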

To set up from scratch next time: postmaster.google.com -> +Add domain -> performancewest.net -> copy the `google-site-verification=...` token -> add via the Hestia command above -> Verify.

### 🔴 MANUAL 2 - Microsoft SNDS + JMRP (Outlook/Hotmail/Live) - **#1 PRIORITY**

**85% of our audience is Microsoft-hosted** (M365/Outlook/Hotmail), so this is the single most important monitoring tool. Microsoft already *accepts* our mail (~1.6% reputation rejects), so this tells us inbox-vs-junk + complaint rates. SNDS is **IP-based** (register the sending IPs); JMRP is the complaint feedback loop.

> **2026 URL MIGRATION:** Microsoft moved SNDS off
> `sendersupport.olc.protection.outlook.com`. The old `/snds/` and `/pm/` links now
> 308-redirect to the new app at **`substrate.office.com/ip-domain-management-snds/`**.
> The *footer/help* links on that page ("contact sender support", "Privacy",
> "Microsoft Services Agreement") go to generic `microsoft.com` pages. That is
> normal: they are boilerplate, NOT the broken task. **You must click "Log in"
> (top-right) with a personal Microsoft account FIRST**; until you authenticate, the
> "Request Access" / "Junk Mail Reporting Program" links just bounce to
> `login.microsoftonline.com`, which looks like a dead redirect but is the expected
> auth step. After login the real forms render.

1. **SNDS - Request Access:** open the SNDS app, either via the legacy entry (it 308-redirects to the new app) or directly at `https://substrate.office.com/ip-domain-management-snds/SNDS`, then **Log in** -> left-nav **"Request Access"** (direct: `https://substrate.office.com/ip-domain-management-snds/SNDS/AddNetwork`) -> register IPs **207.174.124.94** and **207.174.124.107** (the two live stream IPs; add .90 and .71 if you want full coverage). Verification goes to a role address on the IP's domain (use `postmaster@` or `abuse@performancewest.net`, now live). (NOTE: `snds.microsoft.com` does NOT resolve; do not use it.)
2. **JMRP:** same site, left-nav **"Junk Mail Reporting Program"** (direct: `https://substrate.office.com/ip-domain-management-snds/SNDS/Jmrp`) -> register the same IPs + complaint-destination mailbox **`fbl@performancewest.net`**. Complaints then arrive as ARF emails.

   **✅ DONE 2026-06-19:** both IPs registered as feeds: `pw1` = 207.174.124.94, `pw2` = 207.174.124.107, complaint destination set to **`fbl@performancewest.net`** (live, routes to ops@). ARF complaint reports now land there automatically.

**✅ PREREQ DONE (2026-06-19):** the role mailboxes Microsoft needs now exist and deliver. Created as Carbonio distribution lists routing to `ops@performancewest.net`: `postmaster@`, `abuse@`, `fbl@`, `dmarc@`. All verified ACCEPT at the MX + delivered end-to-end. (They previously REJECTED with 5.1.1, which would have blocked SNDS verification.) Use `postmaster@` or `abuse@` for SNDS verification and `fbl@performancewest.net` as the JMRP complaint destination.

> Carbonio mail admin: `ssh -p 22022 justin@207.174.124.15` (the **co.carrierone.com**
> mail host; local workstation key, justin has NOPASSWD sudo). Run prov as zextras:
> `sudo -u zextras /opt/zextras/bin/carbonio prov` followed by a subcommand
> (e.g. `gaa`, `gadl`, `cdl`, `adlm`, `gdlm`).

### 🔴 MANUAL 3 - Yahoo Complaint Feedback Loop (Yahoo/AOL + att/sbcglobal/verizon)

Lowest priority (<1% of audience), but cheap. CFL keys off the DKIM `d=` domain.

1. -> sign in -> register the domains `performancewest.net` **and** `send.performancewest.net` (bulk mail now signs `d=send.performancewest.net`).
2. Set the complaint destination to `fbl@performancewest.net` (now live, see above).

### ✅ DMARC aggregate reports - mailbox FIXED 2026-06-19 (parser still TODO)

Gmail/Yahoo/Microsoft send daily per-IP auth+disposition XML to `dmarc@performancewest.net` (DMARC record has `rua=mailto:dmarc@...`). **That mailbox was REJECTING (5.1.1) until 2026-06-19, so we were silently losing every report.** It's now a Carbonio DL -> ops@ (verified delivering). Next: add IMAP creds for ops@ (or a dedicated dmarc mailbox) and build a small collector/parser worker to chart per-IP/per-domain pass-fail without any provider login. This is now worth doing, since the data finally arrives.

---

## Ongoing hygiene (reduce reputation damage)

- **Dead-address scrub:** ~110 genuine `5.1.1 user unknown` bounces/day. listmonk already blocklists hard bounces after 1 (`bounce.actions hard->blocklist`), so these self-clean, but pre-scrubbing the dirtiest segments before send avoids the reputation hit. See `data/` segment exports.
- **Consumer-domain exclusion (two layers).** The authoritative list lives in `scripts/_email_exclusions.py` (`BLOCKED_EMAIL_DOMAINS`): gmail/google, the full Yahoo/Verizon-Media family, Microsoft consumer, **Apple/iCloud (added 2026-06-19)**, dead/legacy ISPs, and the legal do-not-contact list.
  1. *NEW selections:* the per-vertical builders filter these domains out of the audience SQL, and `listmonk_import.py` refuses to import a blocked address.
  2. *Already-imported subs:* LIST-BASED campaigns (FCC Direct Contacts list 3, CRTC/USF blasts) can still hit consumer subs imported BEFORE a domain joined the list. `scripts/scrub_listmonk_consumer.py` reconciles the live subscriber table against the exclusion list and blocklists any ENABLED match (idempotent; `--dry-run` supported; covers both `listmonk` + `listmonk_hc`). It runs daily at 06:30 UTC via `/etc/cron.d/pw-listmonk-scrub` (tracked at `infra/cron/pw-listmonk-scrub`). The first run on 2026-06-19 blocklisted **7,943** trucking + **21** HC stale consumer subs (1,321 iCloud, 267 gmail, etc.) that were leaking via the running CRTC campaign. Re-run the scrub whenever you add a domain to the exclusion list.
- **Don't re-expand IPs** until Postmaster Tools shows recovered reputation.
- **Volume discipline:** keep the global 200/hr sliding window until reputation is green; concentrated low volume on one warm IP beats bursts.
- **Watch the rejection mix:** `5.7.1 reputation/spam/blocked` should fall over the next 1-2 weeks as the single-IP reputation builds. Track via: `ssh ... 'sudo grep status=bounced /var/log/mail.log | grep -c 5.7.1'`
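
The rejection-mix check above counts a single code; the same log can be bucketed by every DSN code in one pass, which separates reputation rejects (`5.7.1`) from list-hygiene bounces (`5.1.1`). A minimal sketch, assuming standard Postfix delivery lines that carry `dsn=X.Y.Z, status=...`: the function name `bounce_mix` is hypothetical, and the sample lines are trimmed, made-up examples rather than real log excerpts.

```python
import re
from collections import Counter

# Postfix delivery lines carry "dsn=X.Y.Z, status=bounced|deferred|sent".
DSN_RE = re.compile(r"\bdsn=(\d\.\d{1,3}\.\d{1,3}), status=(\w+)")


def bounce_mix(lines, statuses=("bounced",)):
    """Count log lines per DSN code, keeping only the given delivery statuses."""
    counts = Counter()
    for line in lines:
        match = DSN_RE.search(line)
        if match and match.group(2) in statuses:
            counts[match.group(1)] += 1
    return counts


# Hypothetical, trimmed sample lines (not real log excerpts).
SAMPLE = [
    "postfix/smtp[101]: A1: relay=mx.example.com[192.0.2.1]:25, dsn=5.7.1, status=bounced (550 5.7.1 blocked)",
    "postfix/smtp[102]: A2: relay=mx.example.com[192.0.2.1]:25, dsn=5.1.1, status=bounced (550 5.1.1 user unknown)",
    "postfix/smtp[103]: A3: relay=mx.example.net[192.0.2.2]:25, dsn=4.4.4, status=deferred (451 4.4.4 try later)",
    "postfix/smtp[104]: A4: relay=mx.example.org[192.0.2.3]:25, dsn=5.7.1, status=bounced (550 5.7.1 low reputation)",
    "postfix/smtp[105]: A5: relay=mx.example.org[192.0.2.3]:25, dsn=2.0.0, status=sent (250 2.0.0 OK)",
]

for code, count in bounce_mix(SAMPLE).most_common():
    print(code, count)
```

On the mail host, the same function can be fed the open log file directly (for example `bounce_mix(open("/var/log/mail.log"))`). Passing `statuses=("bounced", "deferred")` also picks up the `4.4.4` deferrals, which the TL;DR identifies as the largest bucket.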