new-site/docs/deliverability.md
Email Deliverability Runbook

Owner action items are marked 🔴 MANUAL. Everything else is already done/automated.

Last updated: 2026-06-19 (bulk subdomain + SPF trim + Microsoft/audience analysis).


TL;DR of the 2026-06-18/19 deliverability incident

  • Symptom: ~30% "open" rates but 0 human clicks, 0 sales across both trucking and healthcare streams.
  • Root cause: NOT a blocklist, NOT the IPs. Proven by a controlled A/B test (2026-06-19): from the same mail server / same IPs, a message From justin@carrierone.com landed in the Inbox while From justin@performancewest.net went to Junk. The variable is the From domain's reputation. carrierone.com (reg. 2006, years of steady low-volume mail, tight 2-IP SPF) is trusted; performancewest.net (only started bulk in ~May 2026, broken DKIM until 2026-06-17, 21-IP snowshoe SPF, May 30-31 over-volume blast) is cold/damaged.
  • Where the audience actually is (24h receiver mix): ~85% Microsoft (M365/Outlook/Hotmail), ~14% Google, <1% Yahoo. Our list is B2B, so Microsoft is the game, not Gmail. Microsoft is NOT reputation-blocking us (only ~1.6% 5.7.x/S3150 rejects; it accepts ~2,138 msgs/24h). But acceptance != inbox, so the engagement problem there is likely Junk-foldering, with the same domain-reputation cause. Gmail rejects ~95% of its (smaller) slice with "550-5.7.1 ... very low reputation of the sending domain". The single biggest bounce bucket is actually list hygiene: ~1,012/24h Microsoft "451 4.4.4 no mail-enabled subscriptions" (dead tenant domains), plus dead recipients.
  • Fixes applied (2026-06-18/19):
    1. Consolidated to ONE IP per stream (snowshoe was a band-aid for broken DKIM).
    2. Dedicated bulk subdomain send.performancewest.net so bulk reputation is isolated from the root domain (which stays clean for transactional mail).
    3. Trimmed root SPF from 21 IPs to the real 3 (the bloated record was itself a snowshoe signal).
    4. Disabled the pointless pw-ip-rehab cron (we have no IP reputation problem).

Bulk subdomain: send.performancewest.net (2026-06-19)

Why: isolate bulk/cold-campaign sending reputation from the root domain. The root domain carries transactional/verification/receipt mail (via the co.carrierone.com relay + the .71 default egress) and must stay clean; cold campaigns are inherently reputation-risky. Splitting bulk onto a subdomain is the industry-standard approach (SendGrid, Mailchimp, etc.).

Customer experience is unchanged: From is the subdomain, but Reply-To stays info@performancewest.net, so replies land in the real inbox and look normal.

| Piece | Value |
| --- | --- |
| Trucking From | Performance West <noreply@send.performancewest.net> |
| Healthcare From | Performance West Compliance <compliance@send.performancewest.net> |
| Reply-To (both) | info@performancewest.net |
| DKIM selector | send (send._domainkey.send.performancewest.net), 2048-bit |
| SPF | v=spf1 ip4:207.174.124.94 ip4:207.174.124.107 -all |
| DMARC | inherits root p=reject (explicit _dmarc.send also published) |
| MX / Return-Path | co.carrierone.com (bounces) |
| Egress IPs | .94 (trucking) / .107 (HC), unchanged |
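
As an illustration of the From / Reply-To split in the table above, here is a minimal sketch using only the Python standard library. The helper name `build_campaign_message` and the recipient address are hypothetical; the real headers are set by listmonk and the campaign builder scripts, not by this code.

```python
from email.message import EmailMessage
from email.utils import formataddr

BULK_DOMAIN = "send.performancewest.net"   # bulk subdomain from the table
REPLY_TO = "info@performancewest.net"      # replies still land on the root domain


def build_campaign_message(display_name, local_part, to_addr, subject, body):
    """Build a message whose From is on the bulk subdomain and whose
    Reply-To points at the root-domain inbox (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = formataddr((display_name, f"{local_part}@{BULK_DOMAIN}"))
    msg["Reply-To"] = REPLY_TO
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


# Illustrative recipient only; example.com is a reserved documentation domain.
msg = build_campaign_message(
    "Performance West", "noreply", "dispatcher@example.com", "Test", "Hello"
)
print(msg["From"])
print(msg["Reply-To"])
```

A reply from the recipient's mail client goes to the Reply-To address, so the subdomain only ever appears as the sender.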

Code: from_email is set in scripts/build_trucking_campaigns.py (FROM_EMAIL, env CAMPAIGN_FROM) and scripts/build_healthcare_campaigns_cron.py (FROM_EMAIL, env HC_CAMPAIGN_FROM). Bounce-watchers (scripts/bounce-watcher.sh, scripts/hc-bounce-watcher.sh) track the new subdomain sender (and keep the legacy root sender so the pre-cutover queue drains).

Infra: OpenDKIM signs both domains — see infra/ansible/roles/mail (opendkim_signing_domains list generates per-domain keys + KeyTable/SigningTable). DNS published on the Hestia master (see DNS automation note below). Verified end-to-end 2026-06-19: a test send signs d=send.performancewest.net; s=send; and egresses out05/.94.
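
The end-to-end check described above (a test send signs d=send.performancewest.net; s=send) can be repeated by reading the DKIM-Signature header of a received test message. The sketch below only parses the tag list; it does not verify the signature cryptographically. The function names and the sample header value (including the PLACEHOLDER hashes) are made up for illustration.

```python
def parse_dkim_tags(header_value):
    """Split a DKIM-Signature header value into a {tag: value} dict.

    Folding whitespace inside values is dropped, which is harmless for the
    d= and s= tags inspected here.
    """
    tags = {}
    for item in header_value.split(";"):
        if "=" not in item:
            continue
        name, _, value = item.partition("=")
        tags[name.strip()] = "".join(value.split())
    return tags


def signed_by_bulk_subdomain(header_value):
    """True when the signature names the bulk subdomain and the 'send' selector."""
    tags = parse_dkim_tags(header_value)
    return tags.get("d") == "send.performancewest.net" and tags.get("s") == "send"


# Fabricated sample header value; bh= and b= are placeholders, not real hashes.
SAMPLE = (
    "v=1; a=rsa-sha256; c=relaxed/relaxed; d=send.performancewest.net; "
    "s=send; h=from:to:subject; bh=PLACEHOLDER=; b=PLACEHOLDER"
)
print(signed_by_bulk_subdomain(SAMPLE))
```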

Listmonk global app.from_email was also updated in both DBs as a fallback for any UI/test send that doesn't set From explicitly.

⚠️ The subdomain starts at NEUTRAL reputation (not negative, not warm). It still needs the same warm-up discipline: steady low volume to engaged recipients. It is NOT a magic reset — but it protects the root domain and starts cleaner than the damaged root.


Sending architecture (after 2026-06-18/19 consolidation)

| Stream | IP | PTR / HELO | Path |
| --- | --- | --- | --- |
| Trucking (listmonk) | 207.174.124.94 | mta05.performancewest.net | listmonk -> :25 -> randmap:{out05:} |
| Healthcare (listmonk-hc) | 207.174.124.107 | hcmta01.performancewest.net | listmonk-hc SMTP server 1 -> :2526 -> hcout1 |
| Transactional / verification | 207.174.124.71 + co.carrierone.com (.15) | perfwest | default smtp_bind_address (.71) + :587 relay (.15) |
| Yahoo/AOL trickle | 207.174.124.90 | mta01 | yahooslow transport (hash:transport) |
| Retired (torched May 30-31) | .91 / .92 / .93 | mta02-04 | rehab02-04; pw-ip-rehab cron DISABLED 2026-06-19 |
| Dormant (re-expand later) | .95-.105, .108-.109 | mta06-17, hcmta02-03 | disabled |

Root SPF (trimmed 2026-06-19): v=spf1 a mx ip4:207.174.124.15 ip4:207.174.124.94 ip4:207.174.124.107 -all. Here a = .71, mx = co.carrierone.com (.15), and the remaining ip4 entries are the two bulk IPs. The old 21-IP record was a snowshoe signal; this matches carrierone.com's tight style.
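
A quick way to sanity-check a trimmed record before publishing it is to parse the ip4 mechanisms and confirm which egress IPs they list. This is a rough sketch, not a full SPF evaluator: it ignores the `a` and `mx` mechanisms (which need DNS lookups) and any qualifier prefix on ip4 terms. The function names are hypothetical.

```python
import ipaddress

ROOT_SPF = (
    "v=spf1 a mx ip4:207.174.124.15 ip4:207.174.124.94 "
    "ip4:207.174.124.107 -all"
)


def spf_ip4_networks(record):
    """Return the networks named by ip4: mechanisms in an SPF record."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    return [
        ipaddress.ip_network(term.split(":", 1)[1], strict=False)
        for term in terms[1:]
        if term.lower().startswith("ip4:")
    ]


def spf_lists_ip(record, ip):
    """True when an ip4: mechanism covers the address (a/mx are NOT resolved)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in spf_ip4_networks(record))


def spf_all_term(record):
    """Return the trailing 'all' term with its qualifier, e.g. '-all'."""
    for term in record.split()[1:]:
        if term.lstrip("+-~?").lower() == "all":
            return term
    return None


print(len(spf_ip4_networks(ROOT_SPF)))          # number of ip4 mechanisms
print(spf_lists_ip(ROOT_SPF, "207.174.124.94"))  # live trucking IP
print(spf_lists_ip(ROOT_SPF, "207.174.124.91"))  # retired IP
```

The same check can be pointed at the send subdomain record when IPs are added back during re-expansion, since both records have to be updated together.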

To re-expand after reputation is established: add transports back to ALL=() in infra/postfix/pw-mta-warmup.sh and re-enable the HC SMTP servers (ports 2527/2528) in the listmonk_hc DB settings.smtp. Re-expand SLOWLY (one IP at a time, days apart) and only after Postmaster Tools shows a green/medium reputation. If you re-expand, also add the IPs back to BOTH the root SPF and the send subdomain SPF.


DNS automation (Hestia is the master)

DNS is fully automatable — Hestia (cp.carrierone.com, 207.174.124.22) is the DNS master; HE.net are slaves. Access: ssh -p 22022 root@cp.carrierone.com using the local workstation's ~/.ssh/id_ed25519 (NOT the app server, NOT justin@ which is SFTP-only). The justin Hestia user owns the performancewest.net zone.

# add  (note: Hestia appends the base domain to the RECORD name, so a record at
#        send._domainkey.send.performancewest.net needs RECORD = "send._domainkey.send")
v-add-dns-record justin performancewest.net "<record>" <TYPE> "<value>" [prio]
# change / delete (find the numeric id with v-list-dns-records ... plain)
v-change-dns-record justin performancewest.net <id> "<record>" <TYPE> "<value>" "" yes <ttl>
v-delete-dns-record justin performancewest.net <id>
# list
v-list-dns-records  justin performancewest.net plain

Each write triggers a ~30s zone rebuild + DNSSEC re-sign; slaves sync via NOTIFY / SOA refresh, usually within a minute. Verify on @8.8.8.8 AND the master @207.174.124.22 (the master is authoritative; public resolvers may lag).
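
Because Hestia appends the base domain to the RECORD argument, it is easy to pass a full hostname by mistake. The helper below is a small sketch of that conversion; `hestia_record_name` is a hypothetical name, and using "@" for the zone apex follows the TXT example elsewhere in this runbook.

```python
def hestia_record_name(fqdn, zone):
    """Convert a fully qualified name into the relative RECORD value
    that v-add-dns-record expects for the given zone."""
    fqdn = fqdn.rstrip(".").lower()
    zone = zone.rstrip(".").lower()
    if fqdn == zone:
        return "@"  # zone apex
    suffix = "." + zone
    if not fqdn.endswith(suffix):
        raise ValueError(f"{fqdn} is not inside zone {zone}")
    return fqdn[: -len(suffix)]


ZONE = "performancewest.net"
print(hestia_record_name("send._domainkey.send.performancewest.net", ZONE))
print(hestia_record_name("_dmarc.send.performancewest.net", ZONE))
print(hestia_record_name("performancewest.net", ZONE))
```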


Monitoring tools (set these up to SEE reputation directly)

These all require a provider account login (plus, for Google, a DNS TXT record, which is added on the Hestia master), so they can't be fully automated. Steps are pre-filled below.

🔴 MANUAL 1 — Google Postmaster Tools (Gmail is our biggest blocker)

Gmail's verbatim rejection names "the sending domain", so this is priority #1.

The TXT record is added on the Hestia master (cp.carrierone.com; HE.net are slaves). As root: ssh -p 22022 root@cp.carrierone.com, then run v-add-dns-record justin performancewest.net "@" TXT '"<value>"'. If the command is instead passed to ssh as a single-quoted one-liner, each inner single quote must be written as '"'"'. The zone owner is the justin Hestia user. Expect a ~30s zone rebuild; slaves then sync via NOTIFY / the 2h SOA refresh, usually within a minute.

Status 2026-06-18: TXT added + verified live (record id 14464, google-site-verification=p8s3RaN5wi81350wToMpdPMho5Gcel4RGT1Q1SXj7vg), resolving on 8.8.8.8/1.1.1.1/9.9.9.9 and 4/5 HE.net slaves. Owner just needs to click Verify in the Postmaster console once. Data populates 24-48h after volume flows from the consolidated IP.

To set up from scratch next time: postmaster.google.com -> +Add domain -> performancewest.net -> copy the google-site-verification=... token -> add via the Hestia command above -> Verify.

MANUAL 2 — Microsoft SNDS + JMRP (Outlook/Hotmail/Live) — DONE 2026-06-19

~85% of our audience is Microsoft-hosted (M365/Outlook/Hotmail), so this is the single most important monitoring tool. Microsoft already accepts our mail (only ~1.6% reputation rejects), so what these tools add is visibility into inbox-vs-junk and complaint rates. SNDS is IP-based (register the sending IPs); JMRP is the complaint feedback loop. Both SNDS access and JMRP are now registered for 207.174.124.94 + .107.

2026 URL MIGRATION: Microsoft moved SNDS off sendersupport.olc.protection.outlook.com. The old /snds/ and /pm/ links now 308-redirect to the new app at substrate.office.com/ip-domain-management-snds/. The footer/help links on that page ("contact sender support", "Privacy", "Microsoft Services Agreement") go to generic microsoft.com pages — that is normal, they are boilerplate, NOT the broken task. You must click "Log in" (top-right) with a personal Microsoft account FIRST; until you authenticate the "Request Access" / "Junk Mail Reporting Program" links just bounce to login.microsoftonline.com, which looks like a dead redirect but is the expected auth step. After login the real forms render.

  1. SNDS — Request Access: open the SNDS app — either the legacy entry https://sendersupport.olc.protection.outlook.com/snds/ (it 308-redirects to the new app) or directly https://substrate.office.com/ip-domain-management-snds/SNDS — then Log in -> left-nav "Request Access" (direct: https://substrate.office.com/ip-domain-management-snds/SNDS/AddNetwork) -> register IPs 207.174.124.94 and 207.174.124.107 (the two live stream IPs; add .90 and .71 if you want full coverage). Verification goes to a role address on the IP's domain (use postmaster@ or abuse@performancewest.net, now live). (NOTE: snds.microsoft.com does NOT resolve — do not use it.) DONE 2026-06-19: access requested/granted for .94 + .107. Data populates over ~24-48h; then check the dashboard for the per-IP RED/YELLOW/GREEN status, spam-trap hits, and complaint rate.
  2. JMRP: same site, left-nav "Junk Mail Reporting Program" (direct: https://substrate.office.com/ip-domain-management-snds/SNDS/Jmrp) -> register the same IPs + complaint-destination mailbox fbl@performancewest.net. Complaints then arrive as ARF emails. DONE 2026-06-19: both IPs registered as feeds — pw1 = 207.174.124.94, pw2 = 207.174.124.107, complaint destination set to fbl@performancewest.net (live, routes to ops@). ARF complaint reports now land there automatically.
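
Once complaints arrive at fbl@, the useful fields can be pulled out with the standard library. The sketch below assumes the reports follow the generic ARF layout (RFC 5965: a multipart/report with a message/feedback-report part); the exact shape of Microsoft's reports has not been checked here, and the sample report is fabricated for illustration.

```python
from email import message_from_string, policy


def extract_arf_fields(raw):
    """Return the feedback-report fields of an ARF message as a dict
    with lower-cased field names (empty dict if no report part exists)."""
    msg = message_from_string(raw, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() != "message/feedback-report":
            continue
        payload = part.get_payload()
        # The stdlib parser treats message/* parts as a nested message, so
        # the report fields normally appear as headers of payload[0].
        if isinstance(payload, list):
            fields = payload[0]
        else:
            fields = message_from_string(payload)
        return {name.lower(): str(value) for name, value in fields.items()}
    return {}


# Fabricated sample report, not a real complaint.
SAMPLE_REPORT = "\n".join([
    "From: feedback@example.net",
    "To: fbl@performancewest.net",
    "Subject: complaint",
    "MIME-Version: 1.0",
    'Content-Type: multipart/report; report-type=feedback-report; boundary="arf"',
    "",
    "--arf",
    "Content-Type: text/plain",
    "",
    "This is a spam complaint report.",
    "--arf",
    "Content-Type: message/feedback-report",
    "",
    "Feedback-Type: abuse",
    "User-Agent: ExampleFBL/1.0",
    "Version: 1",
    "Source-IP: 207.174.124.94",
    "",
    "--arf",
    "Content-Type: message/rfc822",
    "",
    "From: Performance West <noreply@send.performancewest.net>",
    "To: someone@example.com",
    "Subject: Original campaign message",
    "",
    "Original body.",
    "--arf--",
    "",
])

fields = extract_arf_fields(SAMPLE_REPORT)
print(fields.get("feedback-type"), fields.get("source-ip"))
```

Grouping by Source-IP would show which stream (.94 trucking or .107 HC) is drawing complaints.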

PREREQ DONE (2026-06-19): the role mailboxes Microsoft needs now exist and deliver. Created as Carbonio distribution lists routing to ops@performancewest.net: postmaster@, abuse@, fbl@, dmarc@ — all verified ACCEPT at the MX + delivered end-to-end. (They previously REJECTED with 5.1.1, which would have blocked SNDS verification.) Use postmaster@ or abuse@ for SNDS verification and fbl@performancewest.net as the JMRP complaint destination.

Carbonio mail admin: ssh -p 22022 justin@207.174.124.15 (the co.carrierone.com mail host; local workstation key, justin has NOPASSWD sudo). Run prov as zextras: sudo -u zextras /opt/zextras/bin/carbonio prov <cmd> (e.g. gaa, gadl, cdl <addr>, adlm <dl> <member>, gdlm <dl>).

🔴 MANUAL 3 — Yahoo Complaint Feedback Loop (Yahoo/AOL + att/sbcglobal/verizon)

Lowest priority (<1% of audience), but cheap. CFL is DKIM-d= based.

  1. https://senders.yahooinc.com/complaint-feedback-loop/ -> sign in -> register the domains performancewest.net and send.performancewest.net (CFL keys off the DKIM d= value; bulk mail now signs d=send.performancewest.net).
  2. Set the complaint destination to fbl@performancewest.net (now live, see above).

DMARC aggregate reports — mailbox FIXED 2026-06-19 (parser still TODO)

Gmail/Yahoo/Microsoft send daily per-IP auth+disposition XML to dmarc@performancewest.net (DMARC record has rua=mailto:dmarc@...). That mailbox was REJECTING (5.1.1) until 2026-06-19 — we were silently losing every report. It's now a Carbonio DL -> ops@ (verified delivering). Next: add IMAP creds for ops@ (or a dedicated dmarc mailbox) and build a small collector/parser worker to chart per-IP/per-domain pass-fail without any provider login. Now actually worth doing since the data finally arrives.
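
As a starting point for the parser TODO, the core of it is small: walk the record elements of an aggregate report and total the message counts per source IP. This is a sketch under stated assumptions: the XML is already extracted (real reports usually arrive as gzip or zip attachments), the elements carry no XML namespace, and a row counts as "pass" when either DKIM or SPF passed DMARC evaluation. The sample report is fabricated; 203.0.113.9 is a documentation address.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def summarize_dmarc(xml_text):
    """Return {source_ip: {"pass": n, "fail": n}} from one aggregate report."""
    root = ET.fromstring(xml_text)
    summary = defaultdict(lambda: {"pass": 0, "fail": 0})
    for record in root.iter("record"):
        row = record.find("row")
        if row is None:
            continue
        evaluated = row.find("policy_evaluated")
        if evaluated is None:
            continue
        ip = row.findtext("source_ip", default="unknown")
        count = int(row.findtext("count", default="0"))
        # DMARC passes when at least one of DKIM / SPF passes with alignment.
        passed = "pass" in (evaluated.findtext("dkim"), evaluated.findtext("spf"))
        summary[ip]["pass" if passed else "fail"] += count
    return dict(summary)


SAMPLE_XML = """
<feedback>
  <record>
    <row>
      <source_ip>207.174.124.94</source_ip>
      <count>120</count>
      <policy_evaluated>
        <disposition>none</disposition><dkim>pass</dkim><spf>pass</spf>
      </policy_evaluated>
    </row>
  </record>
  <record>
    <row>
      <source_ip>203.0.113.9</source_ip>
      <count>3</count>
      <policy_evaluated>
        <disposition>reject</disposition><dkim>fail</dkim><spf>fail</spf>
      </policy_evaluated>
    </row>
  </record>
</feedback>
"""

report = summarize_dmarc(SAMPLE_XML)
print(report)
```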


Ongoing hygiene (reduce reputation damage)

  • Dead-address scrub: ~110 genuine 5.1.1 user unknown bounces/day. listmonk already blocklists an address after 1 hard bounce (bounce.actions hard->blocklist), so these self-clean, but pre-scrubbing the dirtiest segments before send avoids the reputation hit. See data/ segment exports.
  • Consumer-domain exclusion (two layers). The authoritative list lives in scripts/_email_exclusions.py (BLOCKED_EMAIL_DOMAINS): gmail/google, the full Yahoo/Verizon-Media family, Microsoft consumer, Apple/iCloud (added 2026-06-19), dead/legacy ISPs, and the legal do-not-contact list.
    1. NEW selections: the per-vertical builders filter it out of audience SQL and listmonk_import.py refuses to import a blocked address.
    2. Already-imported subs: LIST-BASED campaigns (FCC Direct Contacts list 3, CRTC/USF blasts) can still hit consumer subs imported BEFORE a domain joined the list. scripts/scrub_listmonk_consumer.py reconciles the live subscriber table against the exclusion list and blocklists any ENABLED match (idempotent; --dry-run supported; both listmonk + listmonk_hc). Runs daily 06:30 UTC via /etc/cron.d/pw-listmonk-scrub (tracked at infra/cron/pw-listmonk-scrub). First run 2026-06-19 blocklisted 7,943 trucking + 21 HC stale consumer subs (1,321 iCloud, 267 gmail, etc.) that were leaking via the running CRTC campaign. Re-run the scrub whenever you add a domain to the exclusion list.
  • Don't re-expand IPs until Postmaster Tools shows recovered reputation.
  • Volume discipline: keep the global 200/hr sliding window until reputation is green; concentrated low volume on one warm IP beats bursts.
  • Watch the rejection mix: 5.7.1 reputation/spam/blocked should fall over the next 1-2 weeks as the single-IP reputation builds. Track via: ssh ... 'sudo grep status=bounced /var/log/mail.log | grep -c 5.7.1'
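
To watch the whole rejection mix rather than a single code, the grep above can be widened into a per-code tally. The sketch below assumes Postfix's usual delivery log format, where each delivery line carries a `dsn=X.Y.Z` field next to `status=bounced`; unlike the grep, it does not match codes that appear only in the remote server's reply text. The sample lines are fabricated for illustration.

```python
import re
from collections import Counter

DSN_RE = re.compile(r"\bdsn=(\d\.\d{1,3}\.\d{1,3})")


def tally_bounce_codes(lines):
    """Count enhanced status codes on lines that Postfix marked as bounced."""
    counts = Counter()
    for line in lines:
        if "status=bounced" not in line:
            continue  # skip sent / deferred deliveries
        match = DSN_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Fabricated log lines in the general shape of postfix/smtp entries.
SAMPLE_LOG = [
    "Jun 19 02:00:01 mta05 postfix/smtp[101]: A1: to=<a@example.com>, "
    "dsn=5.7.1, status=bounced (host said: 550 5.7.1 rejected)",
    "Jun 19 02:00:02 mta05 postfix/smtp[102]: A2: to=<b@example.com>, "
    "dsn=5.7.1, status=bounced (host said: 550 5.7.1 rejected)",
    "Jun 19 02:00:03 mta05 postfix/smtp[103]: A3: to=<c@example.com>, "
    "dsn=5.1.1, status=bounced (host said: 550 5.1.1 user unknown)",
    "Jun 19 02:00:04 mta05 postfix/smtp[104]: A4: to=<d@example.com>, "
    "dsn=4.4.4, status=deferred (host said: 451 4.4.4 try later)",
    "Jun 19 02:00:05 mta05 postfix/smtp[105]: A5: to=<e@example.com>, "
    "dsn=2.0.0, status=sent (250 2.0.0 OK)",
]

counts = tally_bounce_codes(SAMPLE_LOG)
for code, n in counts.most_common():
    print(code, n)
```

In practice the input would be the lines of /var/log/mail.log, for example `tally_bounce_codes(open("/var/log/mail.log"))` run with sufficient privileges on the mail host.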