docs: deliverability runbook (incident, IP consolidation, monitoring setup)
Documents the 2026-06-18 reputation incident (snowshoe -> Gmail domain-rep blocks, RBLs all clean), the single-IP-per-stream consolidation, and fill-in-the-blanks setup steps for Google Postmaster Tools, Microsoft SNDS/JMRP, and Yahoo CFL (all require owner account login + HE.net DNS). Plus ongoing hygiene + how to re-expand IPs once reputation recovers.
Parent: 545e6f7ed7
Commit: 5253f16675

1 changed file with 102 additions and 0 deletions: `docs/deliverability.md` (new file)

# Email Deliverability Runbook

**Owner action items are marked 🔴 MANUAL. Everything else is already done/automated.**

Last updated: 2026-06-18 (IP consolidation + monitoring-tools setup).

---

## TL;DR of the 2026-06-18 deliverability incident

- **Symptom:** ~30% "open" rates but **0 human clicks, 0 sales** across both trucking
  and healthcare streams.
- **Root cause:** NOT a blocklist. Swept all 21 sending IPs against ~40 RBLs
  (Spamhaus via authoritative NS, Barracuda, SpamCop, SORBS, UCEPROTECT L1/2/3,
  Mailspike, SpamRATS, etc.) -> **every IP clean.** The real problem was
  **domain reputation**: Gmail rejected ~150 msgs/day with
  `550-5.7.1 ... very low reputation of the sending domain`. We were
  **snowshoeing** ~3k trucking msgs/day across 12 IPs + ~1.2k healthcare across
  3 IPs, so no single IP sent enough per-receiver volume to build reputation.
  This rotation was a band-aid for the **broken DKIM** (fixed 2026-06-17) and the
  May 30-31 over-volume blast.
- **Fix applied:** consolidated to ONE IP per stream (below) so each accrues real
  reputation now that DKIM signs correctly.
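
For reference, an RBL sweep like the one in the root-cause bullet follows the standard DNSBL convention: reverse the IPv4 octets, append the list's zone, and look up an A record. The sketch below is a minimal illustration, not the tooling actually used for the sweep. The function names are hypothetical, and the zone passed in (for example `zen.spamhaus.org`) is up to the caller.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers with an A record for this IP.

    Needs network access. A failed lookup (NXDOMAIN) means "not listed".
    Answers in 127.255.255.0/24 are treated as errors rather than listings,
    because Spamhaus uses that range to signal refused queries (for example
    when asked through a large public resolver).
    """
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return not answer.startswith("127.255.255.")
```

Usage: `dnsbl_query_name("207.174.124.94", "zen.spamhaus.org")` builds the name that a resolver would be asked for; `is_listed` wraps the actual lookup.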

---

## Sending architecture (after 2026-06-18 consolidation)

| Stream | IP | PTR / HELO | Path |
|--------|----|-----------|----|
| **Trucking** (listmonk) | **207.174.124.94** | mta05.performancewest.net | listmonk -> :25 -> `randmap:{out05:}` |
| **Healthcare** (listmonk-hc) | **207.174.124.107** | hcmta01.performancewest.net | listmonk-hc SMTP server 1 -> :2526 -> hcout1 |
| Yahoo/AOL trickle | 207.174.124.90 | mta01 | `yahooslow` transport (hash:transport) |
| Transactional | 207.174.124.71 | perfwest | default `smtp_bind_address` |
| Retired (torched May 30-31) | .91 / .92 / .93 | mta02-04 | rehab02-04 (reputation rebuild only) |
| Dormant (re-expand later) | .95-.105, .108-.109 | mta06-17, hcmta02-03 | disabled |

**To re-expand after reputation is established:** add transports back to `ALL=()`
in `infra/postfix/pw-mta-warmup.sh` and re-enable the HC SMTP servers (ports
2527/2528) in the `listmonk_hc` DB `settings.smtp`. Re-expand SLOWLY (one IP at a
time, days apart) and only after Postmaster Tools shows a green/medium reputation.

SPF already authorizes the whole `.71/.90-.109` set; this is harmless and leaves
room to re-expand without a DNS change.
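
A quick sanity check that every live IP in the table sits inside the SPF-authorized set can be sketched as below. This assumes the shorthand `.71/.90-.109` expands to 207.174.124.71 plus 207.174.124.90 through 207.174.124.109; the actual SPF record text is not shown in this runbook, and the helper names are hypothetical.

```python
import ipaddress

PREFIX = "207.174.124."  # assumed from the IPs in the table above

# Live IPs copied from the sending-architecture table.
LIVE_IPS = {
    "trucking": "207.174.124.94",
    "healthcare": "207.174.124.107",
    "yahoo_trickle": "207.174.124.90",
    "transactional": "207.174.124.71",
}


def spf_authorized_set() -> set[ipaddress.IPv4Address]:
    """Expand the assumed `.71/.90-.109` shorthand into concrete addresses."""
    last_octets = [71, *range(90, 110)]
    return {ipaddress.IPv4Address(f"{PREFIX}{n}") for n in last_octets}


def unauthorized(ips: dict[str, str]) -> dict[str, str]:
    """Return the streams whose IP is NOT in the authorized set."""
    allowed = spf_authorized_set()
    return {
        name: ip
        for name, ip in ips.items()
        if ipaddress.IPv4Address(ip) not in allowed
    }
```

An empty result from `unauthorized(LIVE_IPS)` means no stream is sending from an address outside the assumed set.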

---

## Monitoring tools (set these up to SEE reputation directly)

These all require a provider account login + (for Google) a DNS TXT record on
HE.net, so they can't be fully automated. Steps are pre-filled below.

### 🔴 MANUAL 1: Google Postmaster Tools (Gmail is our biggest blocker)
Gmail's verbatim rejection names "the sending **domain**", so this is priority #1.
1. Go to <https://postmaster.google.com> and sign in with any Google account.
2. Click **+ (Add domain)** -> enter `performancewest.net`.
3. Google shows a **TXT record** like `google-site-verification=XXXXXXXX`.
4. Add it at **HE.net DNS** (dns.he.net -> performancewest.net zone):
   - Type: `TXT`, Name: `@` (apex), Value: the full `google-site-verification=...`
     string. (This coexists with the existing SPF TXT; multiple TXT records on
     the apex are fine.)
5. Wait ~15 min for propagation, then click **Verify** in Postmaster Tools.
6. Data (Domain Reputation, IP Reputation, Spam Rate, Auth pass %, Feedback Loop)
   starts populating in 24-48h once volume flows from the consolidated IP.
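
Before clicking **Verify**, it can help to confirm that the apex TXT set looks right. The sketch below is one possible check, with hypothetical function names: it classifies TXT values and confirms that exactly one SPF record exists (more than one `v=spf1` record makes SPF evaluation fail) alongside at least one Google verification record. `fetch_apex_txt` assumes `dig` is installed and the network is reachable.

```python
import subprocess


def classify_txt(values: list[str]) -> dict[str, list[str]]:
    """Split apex TXT values into SPF and Google verification records."""
    found: dict[str, list[str]] = {"spf": [], "google_verification": []}
    for raw in values:
        # `dig +short` wraps each TXT string in double quotes. Records that
        # dig prints as several quoted chunks are not rejoined here.
        value = raw.strip().strip('"')
        lowered = value.lower()
        if lowered == "v=spf1" or lowered.startswith("v=spf1 "):
            found["spf"].append(value)
        elif value.startswith("google-site-verification="):
            found["google_verification"].append(value)
    return found


def ready_to_verify(values: list[str]) -> bool:
    """True when one SPF record and at least one verification record exist."""
    found = classify_txt(values)
    return len(found["spf"]) == 1 and len(found["google_verification"]) >= 1


def fetch_apex_txt(domain: str) -> list[str]:
    """Fetch TXT records via `dig +short TXT <domain>` (needs dig + network)."""
    result = subprocess.run(
        ["dig", "+short", "TXT", domain],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]
```

Usage: `ready_to_verify(fetch_apex_txt("performancewest.net"))`.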

### 🔴 MANUAL 2: Microsoft SNDS + JMRP (Outlook/Hotmail/Live)
SNDS is **IP-based** (register the sending IPs), JMRP is the complaint feedback loop.
1. **SNDS:** <https://sendersupport.olc.protection.outlook.com/snds/> -> "Request
   access" -> register IPs: **207.174.124.94** and **207.174.124.107** (the two
   live stream IPs; add .90 and .71 if you want full coverage). Verification goes
   to a role address on the IP's domain; use `postmaster@performancewest.net` or
   `abuse@performancewest.net` (ensure one of those receives mail via carrierone).
2. **JMRP:** <https://sendersupport.olc.protection.outlook.com/pm/> -> sign in with
   a Microsoft account -> register the same IPs + a complaint-destination mailbox
   (e.g. `fbl@performancewest.net`). Complaints then arrive as ARF emails.
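
Once complaints land in the `fbl@` mailbox, the machine-readable part can be pulled out with the Python standard library. This is a minimal sketch that assumes the reports follow the ARF layout from RFC 5965 (a `multipart/report` message with a `message/feedback-report` part); `parse_arf` is a hypothetical helper, not an existing tool in this repo.

```python
from email import message_from_string


def parse_arf(raw: str) -> dict[str, str]:
    """Return the feedback-report fields of an ARF complaint, keys lowercased.

    Returns an empty dict when the message has no message/feedback-report part.
    """
    msg = message_from_string(raw)
    for part in msg.walk():
        if part.get_content_type() != "message/feedback-report":
            continue
        payload = part.get_payload()
        if isinstance(payload, list) and payload:
            # The stdlib parser treats message/* bodies as a nested message,
            # so the report fields show up as headers of a sub-message.
            report = payload[0]
        else:
            # Fallback: parse the raw body text as a header block.
            report = message_from_string(str(payload))
        return {name.lower(): value.strip() for name, value in report.items()}
    return {}
```

Fields such as `feedback-type` and `source-ip` can then be used to attribute each complaint to a stream IP.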

### 🔴 MANUAL 3: Yahoo Complaint Feedback Loop (Yahoo/AOL + att/sbcglobal/verizon)
1. <https://senders.yahooinc.com/complaint-feedback-loop/> -> sign in -> register
   the domain `performancewest.net` (CFL is DKIM-d= based, so it covers all our
   IPs automatically since they all sign with the same `mail._domainkey`).
2. Set the complaint destination to `fbl@performancewest.net`.
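
Because CFL enrollment keys on the DKIM `d=` domain, a sent message is covered only if its `DKIM-Signature` header carries `d=performancewest.net`. The sketch below shows one way to check that on a sample header; the helper names are hypothetical, and the selector `mail` is inferred from the `mail._domainkey` record mentioned above.

```python
def dkim_tags(header_value: str) -> dict[str, str]:
    """Parse a DKIM-Signature header value into its tag=value pairs.

    Whitespace inside values is dropped, which is fine for d= and s= and for
    the base64 tags, where folding whitespace carries no meaning.
    """
    tags: dict[str, str] = {}
    for item in header_value.split(";"):
        if "=" not in item:
            continue
        # Split on the first "=" only, so base64 padding in b=/bh= survives.
        name, _, value = item.partition("=")
        tags[name.strip()] = "".join(value.split())
    return tags


def covered_by_cfl(header_value: str, enrolled_domain: str) -> bool:
    """True when the signature's d= tag equals the enrolled domain."""
    signing_domain = dkim_tags(header_value).get("d", "")
    return signing_domain.lower() == enrolled_domain.lower()
```

Usage: pass the value of a real `DKIM-Signature` header from a delivered test message to `covered_by_cfl(value, "performancewest.net")`.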

### ✅ AUTOMATABLE LATER: DMARC aggregate reports (all providers, free)
Gmail/Yahoo/Microsoft already send daily per-IP auth+disposition XML to
`dmarc@performancewest.net` (our DMARC record has `rua=mailto:dmarc@...`). Nobody
parses them yet. If we add IMAP creds for that mailbox (it's on carrierone MX) we
can build a small collector/parser worker to chart per-IP pass/fail without any
provider login. Deferred for now: the provider dashboards above are faster to stand up.
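
The parsing half of such a worker could look roughly like the sketch below. It assumes the XML has already been extracted from the mail attachment (reports usually arrive gzip- or zip-compressed) and uses the element names from the DMARC aggregate report schema without an XML namespace; reports that declare a namespace would need it handled first. `summarize_dmarc` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def summarize_dmarc(xml_text: str) -> dict[str, dict[str, int]]:
    """Tally message counts per source IP by DMARC-evaluated DKIM/SPF result."""
    totals: dict[str, dict[str, int]] = defaultdict(
        lambda: {"messages": 0, "dkim_pass": 0, "spf_pass": 0}
    )
    root = ET.fromstring(xml_text)
    for record in root.iter("record"):
        row = record.find("row")
        if row is None:
            continue
        ip = row.findtext("source_ip", default="unknown")
        count = int(row.findtext("count", default="0"))
        evaluated = row.find("policy_evaluated")
        entry = totals[ip]
        entry["messages"] += count
        if evaluated is not None:
            # policy_evaluated holds the alignment-aware DKIM/SPF verdicts.
            if evaluated.findtext("dkim") == "pass":
                entry["dkim_pass"] += count
            if evaluated.findtext("spf") == "pass":
                entry["spf_pass"] += count
    return dict(totals)
```

Summing these per day and per IP gives the pass/fail chart described above without any provider login.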

---

## Ongoing hygiene (reduce reputation damage)

- **Dead-address scrub:** ~110 genuine `5.1.1 user unknown` bounces/day. listmonk
  already blocklists hard bounces after 1 bounce (`bounce.actions hard->blocklist`), so
  these self-clean, but pre-scrubbing the dirtiest segments before send avoids the
  reputation hit. See `data/` segment exports.
- **Don't re-expand IPs** until Postmaster Tools shows recovered reputation.
- **Volume discipline:** keep the global 200/hr sliding window until reputation is
  green; concentrated low volume on one warm IP beats bursts.
- **Watch the rejection mix:** `5.7.1 reputation/spam/blocked` should fall over the
  next 1-2 weeks as the single-IP reputation builds. Track via (`-F` makes grep
  match the dots in `5.7.1` literally instead of as regex wildcards):
  `ssh ... 'sudo grep status=bounced /var/log/mail.log | grep -cF 5.7.1'`
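
To watch the trend rather than a single total, the same filter can be bucketed per day. The sketch below mirrors the grep pipeline above and assumes traditional syslog timestamps, where the first six characters of each line are the month and day (for example `Jun 18`); ISO-formatted logs would need a different key. `count_571_by_day` is a hypothetical helper.

```python
import re
from collections import Counter

# Literal "5.7.1" that is not part of a longer dotted number
# (so "15.7.1", "5.7.10" and IP-like "5.7.1.2" do not match).
CODE_571 = re.compile(r"(?<![\d.])5\.7\.1(?![\d.])")


def count_571_by_day(lines) -> Counter:
    """Count bounced deliveries mentioning 5.7.1, keyed by "Mon DD"."""
    per_day: Counter = Counter()
    for line in lines:
        if "status=bounced" in line and CODE_571.search(line):
            per_day[line[:6]] += 1
    return per_day
```

Usage on the mail host: `count_571_by_day(open("/var/log/mail.log", errors="replace"))`. A falling daily count over the following weeks is the signal the bullet above describes.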