Email Deliverability Runbook
Owner action items are marked 🔴 MANUAL. Everything else is already done/automated.
Last updated: 2026-06-19 (bulk subdomain + SPF trim + Microsoft/audience analysis).
TL;DR of the 2026-06-18/19 deliverability incident
- Symptom: ~30% "open" rates but 0 human clicks, 0 sales across both trucking and healthcare streams.
- Root cause: NOT a blocklist, NOT the IPs. Proven by a controlled A/B test
  (2026-06-19): from the same mail server / same IPs, a message From
  justin@carrierone.com landed in the Inbox while From justin@performancewest.net
  went to Junk. The variable is the From domain's reputation. carrierone.com
  (reg. 2006, years of steady low-volume mail, tight 2-IP SPF) is trusted;
  performancewest.net (only started bulk in ~May 2026, broken DKIM until
  2026-06-17, 21-IP snowshoe SPF, May 30-31 over-volume blast) is cold/damaged.
- Where the audience actually is (24h receiver mix): ~85% Microsoft
  (M365/Outlook/Hotmail), ~14% Google, <1% Yahoo. Our list is B2B, so Microsoft
  is the game, not Gmail. Microsoft is NOT reputation-blocking us (only ~1.6%
  5.7.x/S3150 rejects; it accepts ~2,138 msgs/24h), but acceptance != inbox, so
  the engagement problem there is likely Junk-foldering, same domain-reputation
  cause. Gmail rejects ~95% of its (smaller) slice on
  "550-5.7.1 ... very low reputation of the sending domain". The single biggest
  bounce bucket is actually list hygiene: ~1,012/24h Microsoft
  "451 4.4.4 no mail-enabled subscriptions" (dead tenant domains) + dead recipients.
- Fixes applied (2026-06-18/19):
  - Consolidated to ONE IP per stream (snowshoe was a band-aid for broken DKIM).
  - Dedicated bulk subdomain send.performancewest.net so bulk reputation is
    isolated from the root domain (which stays clean for transactional mail).
  - Trimmed root SPF from 21 IPs to the real 3 (the bloated record was itself a
    snowshoe signal).
  - Disabled the pointless pw-ip-rehab cron (we have no IP reputation problem).
Bulk subdomain: send.performancewest.net (2026-06-19)
Why: isolate bulk/cold-campaign sending reputation from the root domain. The root domain carries transactional/verification/receipt mail (via co.carrierone.com relay + the .71 default egress) and must stay clean; cold campaigns are inherently reputation-risky. Industry-standard (SendGrid/Mailchimp/etc.) split.
Customer experience is unchanged: From is the subdomain, but Reply-To stays
info@performancewest.net, so replies land in the real inbox and look normal.
| Piece | Value |
|---|---|
| Trucking From | Performance West <noreply@send.performancewest.net> |
| Healthcare From | Performance West Compliance <compliance@send.performancewest.net> |
| Reply-To (both) | info@performancewest.net |
| DKIM selector | send (send._domainkey.send.performancewest.net), 2048-bit |
| SPF | v=spf1 ip4:207.174.124.94 ip4:207.174.124.107 -all |
| DMARC | inherits root p=reject (explicit _dmarc.send also published) |
| MX / Return-Path | co.carrierone.com (bounces) |
| Egress IPs | .94 (trucking) / .107 (HC) — unchanged |
Code: from_email is set in scripts/build_trucking_campaigns.py (FROM_EMAIL,
env CAMPAIGN_FROM) and scripts/build_healthcare_campaigns_cron.py (FROM_EMAIL,
env HC_CAMPAIGN_FROM). Bounce-watchers (scripts/bounce-watcher.sh,
scripts/hc-bounce-watcher.sh) track the new subdomain sender (and keep the legacy
root sender so the pre-cutover queue drains).
Infra: OpenDKIM signs both domains — see infra/ansible/roles/mail
(opendkim_signing_domains list generates per-domain keys + KeyTable/SigningTable).
DNS published on the Hestia master (see DNS automation note below). Verified
end-to-end 2026-06-19: a test send signs d=send.performancewest.net; s=send; and
egresses out05/.94.
Listmonk global app.from_email was also updated in both DBs as a fallback for
any UI/test send that doesn't set From explicitly.
⚠️ The subdomain starts at NEUTRAL reputation (not negative, not warm). It still needs the same warm-up discipline: steady low volume to engaged recipients. It is NOT a magic reset — but it protects the root domain and starts cleaner than the damaged root.
Sending architecture (after 2026-06-18/19 consolidation)
| Stream | IP | PTR / HELO | Path |
|---|---|---|---|
| Trucking (listmonk) | 207.174.124.94 | mta05.performancewest.net | listmonk -> :25 -> randmap:{out05:} |
| Healthcare (listmonk-hc) | 207.174.124.107 | hcmta01.performancewest.net | listmonk-hc SMTP server 1 -> :2526 -> hcout1 |
| Transactional / verification | 207.174.124.71 + co.carrierone.com (.15) | perfwest | default smtp_bind_address (.71) + :587 relay (.15) |
| Removed 2026-06-23 (snowshoe cleanup) | .90-.93, .95-.106, .108-.109 | mta01-04/06-17, hcmta02-03 | transports + host IP bindings DELETED |
Snowshoe IP cleanup (2026-06-23): the 18 dormant sending IPs (.90-.93,
.95-.106, .108-.109) were fully removed from BOTH postfix (master.cf
transports yahooslow/out02-04/out06-20/rehab02-04/2527/2528/
hcout2/hcout3) AND the host (/etc/network/interfaces + live ip addr del).
Only the two warm sending IPs (.94 trucking, .107 HC) plus infra (.71/.72)
remain bound. A 20-IP footprint reads as snowshoe spam and was hurting domain
reputation; the SPF was already trimmed to .94/.107 on 2026-06-19, so this just
makes the host/postfix match the SPF intent. Verified live: postfix check OK,
both streams still status=sent post-change, SSH unaffected. Reference snapshots
committed at infra/postfix/live-snapshots/master.cf + infra/network/interfaces
(live backups /root/master.cf.bak_snowshoe_* + /root/interfaces.bak_snowshoe_*).
Root SPF (trimmed 2026-06-19): v=spf1 a mx ip4:207.174.124.15 ip4:207.174.124.94 ip4:207.174.124.107 -all — a=.71, mx=co.carrierone.com(.15),
plus the two bulk IPs. The old 21-IP record was a snowshoe signal; this matches
carrierone.com's tight style.
To re-expand after reputation is established: add transports back to ALL=()
in infra/postfix/pw-mta-warmup.sh and re-enable the HC SMTP servers (ports
2527/2528) in the listmonk_hc DB settings.smtp. Re-expand SLOWLY (one IP at a
time, days apart) and only after Postmaster Tools shows a green/medium reputation.
If you re-expand, also add the IPs back to BOTH the root SPF and the send
subdomain SPF.
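A minimal sketch of a pre-flight check for that last step: given an SPF record string and the list of sending IPs, report any IP that no ip4: mechanism covers. The function name and the extra IP in the usage note are hypothetical, and the check deliberately ignores the a, mx, and include mechanisms, so it only answers "is this bulk IP listed explicitly?".

```python
import ipaddress

def ips_missing_from_spf(spf_record, sending_ips):
    """Return the sending IPs not covered by any ip4: mechanism in the record."""
    networks = []
    for term in spf_record.split():
        # Strip an optional qualifier (+, -, ~, ?) before the mechanism name.
        mech = term.lstrip("+-~?")
        if mech.lower().startswith("ip4:"):
            # strict=False tolerates host bits in a CIDR such as ip4:10.0.0.5/24.
            networks.append(ipaddress.ip_network(mech[4:], strict=False))
    return [
        ip for ip in sending_ips
        if not any(ipaddress.ip_address(ip) in net for net in networks)
    ]

# The two records as published in this runbook.
ROOT_SPF = "v=spf1 a mx ip4:207.174.124.15 ip4:207.174.124.94 ip4:207.174.124.107 -all"
SEND_SPF = "v=spf1 ip4:207.174.124.94 ip4:207.174.124.107 -all"
```

Run it against both records before enabling a new transport, for example `ips_missing_from_spf(SEND_SPF, ["207.174.124.94", "207.174.124.95"])`, where .95 stands in for whichever IP is being re-added. An empty list means every IP is listed.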
Resuming Gmail sends: the stale-Date / inbox-burial problem (READ BEFORE re-enabling Gmail)
Status: Gmail is currently EXCLUDED from all sends (scripts/_email_exclusions.py
BLOCKED_EMAIL_DOMAINS includes gmail/google). This section is the documented
procedure for when we resume Gmail, and the reasoning for the chosen design. It is
NOT yet implemented — implement it at the moment Gmail is re-enabled.
The problem
We inject the whole daily batch into Postfix in a ~2.5h burst (today: 1,430 + 1,419 +
1,077 messages in the 07:00-09:30 window, with a 932-in-one-minute spike at
08:30), then Postfix slow-drains the queue over ~24h because receivers throttle a
warming IP/domain (Microsoft 451 4.7.500 Server busy).
Listmonk stamps the Date: header at the moment it hands each message to Postfix
(injection time), NOT at delivery time. Empirically verified 2026-06-23: a queued
message had Date: 19:47:28 matching its Postfix arrival log line exactly, and was
still deferred ~4h47m later. So a message injected at 08:00 keeps an 08:00 Date:
even when the receiver finally accepts it at 14:00.
Why this matters ONLY for Gmail: inbox sort order depends on the client.
- Outlook / Exchange / M365 (our current #1 audience, ~2,000 delivered/day) and
  most webmail (Proton, etc.) sort by received time (PR_MESSAGE_DELIVERY_TIME),
  i.e. when THEIR server accepted it. A late-delivered message surfaces fresh at
  the top on arrival; only the displayed date looks old. So for today's audience
  the burial is cosmetic and NOT worth fixing.
- Gmail sorts the inbox by the Date: header. A message accepted at 14:00 but
  Date-stamped 08:00 is filed 6h down the inbox, below mail the user has already
  read. That is real burial and real lost opens, and it only bites once we send
  to Gmail again (our B2B list is ~85% Microsoft / ~14% Google, so Gmail is a
  meaningful slice).
Why NOT to future-date / spoof the Date: header
The tempting "just stamp a future Date" fix is a net negative:
- Spam signal. A Date: in the future is a classic filter heuristic: Proofpoint,
  Mimecast, and Microsoft all penalize it. We'd trade a cosmetic timestamp for
  WORSE inbox placement.
- It breaks our DKIM. OpenDKIM signs the Date header (only From is over-signed,
  but Date is in the signed set). Rewriting Date after signing invalidates the
  signature -> DMARC p=reject -> hard bounce.
- It doesn't even help Outlook (received-time sort) and is the wrong lever for
  Gmail (see the real fix below).
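To confirm the DKIM point on a real message, one option is to read the h= tag of its DKIM-Signature header and check that date is listed. The sketch below uses only the standard library. The function name is hypothetical, and the sample message in the usage note is an illustration rather than a captured send, with placeholder bh=/b= values.

```python
from email import message_from_string

def dkim_signed_headers(raw_message):
    """Return the lowercase header names listed in every DKIM-Signature h= tag."""
    msg = message_from_string(raw_message)
    signed = set()
    for sig in msg.get_all("DKIM-Signature", []):
        # Remove folding whitespace, then split the tag=value list on ';'.
        for tag in "".join(sig.split()).split(";"):
            name, _, value = tag.partition("=")
            if name == "h":
                signed.update(h.lower() for h in value.split(":"))
    return signed

# Illustrative message only; the signature values are placeholders.
SAMPLE = (
    "DKIM-Signature: v=1; a=rsa-sha256; d=send.performancewest.net; s=send;\n"
    "\th=From:From:To:Subject:Date; bh=PLACEHOLDER; b=PLACEHOLDER\n"
    "From: Performance West <noreply@send.performancewest.net>\n"
    "To: someone@example.com\n"
    "Subject: test\n"
    "Date: Tue, 23 Jun 2026 08:00:00 +0000\n"
    "\n"
    "body\n"
)
```

If `"date" in dkim_signed_headers(raw)` is true for a message pulled from the queue, any later rewrite of Date would invalidate that signature.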
The fix: pace Listmonk INJECTION to match Gmail's accept rate (just-in-time Date)
Because Date: is stamped at injection, the solution is to release each Gmail
message close to when Gmail will actually accept it, so Date: ≈ received time ≈
now, and it lands at the top of the Gmail inbox. Keep the Postfix queue shallow for
the Gmail stream so no message sits for hours collecting a stale Date.
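The effect of pacing can be illustrated with a toy FIFO model: messages are injected at one fixed rate and the receiver accepts them at another, and the worst inject->accept gap is how stale the oldest Date: gets. This is a sketch under simplifying assumptions (constant rates, no retries or backoff); the function name and the numbers in the usage note are illustrative, not measurements.

```python
def max_queue_age_hours(n_messages, inject_per_hour, accept_per_hour):
    """Worst inject->accept gap (hours) for FIFO delivery at fixed rates."""
    worst = 0.0
    free_at = 0.0  # earliest time the receiver will take the next message
    for i in range(n_messages):
        injected = i / inject_per_hour
        accepted = max(injected, free_at)   # wait if the receiver is still busy
        free_at = accepted + 1.0 / accept_per_hour
        worst = max(worst, accepted - injected)
    return worst

# Burst: 120 messages injected at the 932/min spike rate, accepted at 30/hr.
burst_gap = max_queue_age_hours(120, 932 * 60, 30)
# Paced: the same 120 messages injected at the accept rate.
paced_gap = max_queue_age_hours(120, 30, 30)
```

In this model the burst leaves the last message waiting roughly four hours, while the paced run keeps the gap near zero, which is the just-in-time Date behaviour the fix is after.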
Implementation when re-enabling Gmail:
- Segment Gmail into its OWN Listmonk campaign on its OWN single IP
  (snowshoe-safe), separate from the Microsoft/Proofpoint stream, so its
  deliberately slow pace does not bottleneck the fast stream. Each stream gets
  its own injection cadence. (Add the new IP to host + Postfix transport + BOTH
  SPF records first, per the re-expand note above.)
- Set the Gmail campaign's sliding-window injection rate at or below Gmail's
  sustained cold-domain accept rate (app.message_sliding_window_rate/_duration
  on that Listmonk instance). Start low (~20-30/hr/IP for a cold domain) and
  ramp as Postmaster Tools reputation climbs. This spreads injection across the
  whole sending window instead of front-loading it, so the queue never builds a
  backlog of stale-dated Gmail mail.
- Queue-age guard. Monitor the inject->deliver gap for the Gmail stream (delay=
  in the maillog). If it exceeds ~30 min, injection is outrunning acceptance ->
  throttle the sliding-window rate down further. Verify after a day that the
  Gmail stream's delay= stays small and the "6-24h late" bucket is ~0.
This is strictly better than date-spoofing: no spam signal, no DKIM break, and because Gmail/Microsoft both reward steady paced volume, pacing injection also RAISES the accept quota over time (the deliverability principle "concentrated low volume beats bursts"). Win-win.
Note: this same pacing slightly helps Outlook's displayed date too, but since Outlook sorts by received time it is not necessary there. Only spend the effort on the Gmail stream.
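The queue-age guard from the implementation list could be sketched as a small log scan: read Postfix smtp delivery lines, keep status=sent deliveries to Gmail recipients, and count how many had delay= above the ~30 min threshold. The function name, the domain tuple, and the regex are assumptions for illustration. Matching on recipient domain misses Google Workspace tenants on custom domains, so matching on the relay= host may be the better key in practice.

```python
import re

# Matches the recipient domain, total delay (seconds), and final status
# in a Postfix smtp delivery log line.
DELIVERY_RE = re.compile(
    r"to=<[^>]*@([^>]+)>.*?\bdelay=([0-9.]+),.*?\bstatus=(\w+)"
)

def late_gmail_deliveries(log_lines, max_delay_s=1800.0,
                          domains=("gmail.com", "googlemail.com")):
    """Return (late, total) counts of sent Gmail deliveries over the threshold."""
    late = total = 0
    for line in log_lines:
        m = DELIVERY_RE.search(line)
        if not m or m.group(3) != "sent":
            continue  # ignore deferred/bounced attempts and unrelated lines
        if m.group(1).lower() not in domains:
            continue
        total += 1
        if float(m.group(2)) > max_delay_s:
            late += 1
    return late, total
```

Usage would look like `late, total = late_gmail_deliveries(open("/var/log/mail.log"))`; a non-zero `late` is the signal to lower the sliding-window rate.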
DNS automation (Hestia is the master)
DNS is fully automatable — Hestia (cp.carrierone.com, 207.174.124.22) is the
DNS master; HE.net are slaves. Access: ssh -p 22022 root@cp.carrierone.com using
the local workstation's ~/.ssh/id_ed25519 (NOT the app server, NOT justin@
which is SFTP-only). The justin Hestia user owns the performancewest.net zone.
# add (note: Hestia appends the base domain to the RECORD name, so a record at
# send._domainkey.send.performancewest.net needs RECORD = "send._domainkey.send")
v-add-dns-record justin performancewest.net "<record>" <TYPE> "<value>" [prio]
# change / delete (find the numeric id with v-list-dns-records ... plain)
v-change-dns-record justin performancewest.net <id> "<record>" <TYPE> "<value>" "" yes <ttl>
v-delete-dns-record justin performancewest.net <id>
# list
v-list-dns-records justin performancewest.net plain
Each write triggers a ~30s zone rebuild + DNSSEC re-sign; slaves sync via NOTIFY /
SOA refresh, usually within a minute. Verify on @8.8.8.8 AND the master
@207.174.124.22 (the master is authoritative; public resolvers may lag).
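Because Hestia appends the base domain to the RECORD argument, it is easy to pass a full name by mistake. A small helper can convert a fully qualified name into the value the v-add-dns-record command expects. This is a sketch: the function name is hypothetical, and the "@" convention for the zone apex follows the Postmaster Tools TXT command later in this runbook.

```python
def hestia_record_name(fqdn, zone="performancewest.net"):
    """Convert a fully qualified name to the RECORD argument Hestia expects."""
    fqdn = fqdn.rstrip(".").lower()
    zone = zone.rstrip(".").lower()
    if fqdn == zone:
        return "@"  # zone apex
    suffix = "." + zone
    if not fqdn.endswith(suffix):
        raise ValueError(f"{fqdn} is not inside zone {zone}")
    # Drop the trailing ".<zone>" that Hestia will append again.
    return fqdn[: -len(suffix)]
```

For example, `hestia_record_name("send._domainkey.send.performancewest.net")` yields `send._domainkey.send`, matching the note in the command comments above.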
Monitoring tools (set these up to SEE reputation directly)
These all require a provider account login (plus, for Google, a DNS TXT record, which is added on the Hestia master), so they can't be fully automated. Steps are pre-filled below.
🔴 MANUAL 1 — Google Postmaster Tools (Gmail is our biggest blocker)
Gmail's verbatim rejection names "the sending domain", so this is priority #1.
DNS is fully automatable — Hestia (cp.carrierone.com) is the DNS master,
HE.net are slaves. Add records as root: ssh -p 22022 root@cp.carrierone.com
then v-add-dns-record justin performancewest.net "@" TXT '"'"'"<value>"'"'"'
(zone owner is the justin Hestia user; ~30s zone rebuild + slaves sync via the
2h SOA refresh / NOTIFY, usually within a minute).
Status 2026-06-18: TXT added + verified live (record id 14464,
google-site-verification=p8s3RaN5wi81350wToMpdPMho5Gcel4RGT1Q1SXj7vg),
resolving on 8.8.8.8/1.1.1.1/9.9.9.9 and 4/5 HE.net slaves. Owner just needs to
click Verify in the Postmaster console once. Data populates 24-48h after
volume flows from the consolidated IP.
To set up from scratch next time: postmaster.google.com -> +Add domain ->
performancewest.net -> copy the google-site-verification=... token -> add via
the Hestia command above -> Verify.
✅ MANUAL 2 — Microsoft SNDS + JMRP (Outlook/Hotmail/Live) — DONE 2026-06-19
85% of our audience is Microsoft-hosted (M365/Outlook/Hotmail), so this is the single most important monitoring tool. Microsoft already accepts our mail (~1.6% reputation rejects), so this tells us inbox-vs-junk + complaint rates. SNDS is IP-based (register the sending IPs), JMRP is the complaint feedback loop. Both SNDS access and JMRP are now registered for 207.174.124.94 + .107.
2026 URL MIGRATION: Microsoft moved SNDS off sendersupport.olc.protection.outlook.com. The old /snds/ and /pm/ links now 308-redirect to the new app at substrate.office.com/ip-domain-management-snds/. The footer/help links on that page ("contact sender support", "Privacy", "Microsoft Services Agreement") go to generic microsoft.com pages; that is normal, they are boilerplate, NOT the broken task. You must click "Log in" (top-right) with a personal Microsoft account FIRST; until you authenticate, the "Request Access" / "Junk Mail Reporting Program" links just bounce to login.microsoftonline.com, which looks like a dead redirect but is the expected auth step. After login the real forms render.
- SNDS (Request Access): open the SNDS app, either the legacy entry
  https://sendersupport.olc.protection.outlook.com/snds/ (it 308-redirects to the
  new app) or directly https://substrate.office.com/ip-domain-management-snds/SNDS,
  then Log in -> left-nav "Request Access" (direct:
  https://substrate.office.com/ip-domain-management-snds/SNDS/AddNetwork) ->
  register IPs 207.174.124.94 and 207.174.124.107 (the two live stream IPs; add
  .90 and .71 if you want full coverage). Verification goes to a role address on
  the IP's domain (use postmaster@ or abuse@performancewest.net, now live).
  (NOTE: snds.microsoft.com does NOT resolve; do not use it.)
  ✅ DONE 2026-06-19: access requested/granted for .94 + .107. Data populates
  over ~24-48h; then check the dashboard for the per-IP RED/YELLOW/GREEN status,
  spam-trap hits, and complaint rate.
- JMRP: same site, left-nav "Junk Mail Reporting Program" (direct:
  https://substrate.office.com/ip-domain-management-snds/SNDS/Jmrp) -> register
  the same IPs + complaint-destination mailbox fbl@performancewest.net.
  Complaints then arrive as ARF emails.
  ✅ DONE 2026-06-19: both IPs registered as feeds: pw1 = 207.174.124.94,
  pw2 = 207.174.124.107, complaint destination set to fbl@performancewest.net
  (live, routes to ops@). ARF complaint reports now land there automatically.
✅ PREREQ DONE (2026-06-19): the role mailboxes Microsoft needs now exist and
deliver. Created as Carbonio distribution lists routing to ops@performancewest.net:
postmaster@, abuse@, fbl@, dmarc@ — all verified ACCEPT at the MX +
delivered end-to-end. (They previously REJECTED with 5.1.1, which would have blocked
SNDS verification.) Use postmaster@ or abuse@ for SNDS verification and
fbl@performancewest.net as the JMRP complaint destination.
Carbonio mail admin: ssh -p 22022 justin@207.174.124.15 (the co.carrierone.com mail host; local workstation key, justin has NOPASSWD sudo). Run prov as zextras: sudo -u zextras /opt/zextras/bin/carbonio prov <cmd> (e.g. gaa, gadl, cdl <addr>, adlm <dl> <member>, gdlm <dl>).
✅ MANUAL 3 — Yahoo Complaint Feedback Loop — keys added 2026-06-19
Lowest priority (<1% of audience), but cheap. CFL is DKIM-d= based.
- https://senders.yahooinc.com/complaint-feedback-loop/ -> sign in -> register
  the domains performancewest.net and send.performancewest.net (CFL keys off the
  DKIM d= value; bulk mail now signs d=send.performancewest.net).
- Set the complaint destination to fbl@performancewest.net (now live, see above).
✅ ENROLLED 2026-06-19: both domains show Enrolled in the Yahoo Sender Hub
CFL with reporting email fbl@performancewest.net:
- performancewest.net: Enrolled, reporting fbl@performancewest.net
- send.performancewest.net: Enrolled, reporting fbl@performancewest.net
(Reporting-email code was delivered to fbl@ → ops@ and verified; the Selector column is intentionally blank = match any DKIM selector on the verified domain.)
✅ DNS verification keys added + propagated 2026-06-19 (Hestia TXT, verified on all HE.net slaves + 8.8.8.8/1.1.1.1/9.9.9.9):
- performancewest.net TXT yahoo-verification-key=IMx+OO5aKUE1nu9JwP6eSBMfSYZu8VcXjpkvEVXS84w=
- send.performancewest.net TXT yahoo-verification-key=Ps5hGjVxXgeQcLcxr671YG0/RxzjjL0eqh6vfULubEo= (added alongside the existing send SPF record; both TXT coexist).
✅ DMARC aggregate reports — DONE 2026-06-19 (dedicated mailbox + parser)
Gmail/Yahoo/Microsoft + dozens of operators (Comcast, Cox, Bell, Mimecast, Cisco
ESA, GMX, mail.com, gosecure, ...) send daily per-IP auth+disposition XML to
dmarc@performancewest.net (DMARC record: p=reject; rua=mailto:dmarc@; ruf=mailto:dmarc@; fo=1).
That mailbox was REJECTING (5.1.1) until 2026-06-19 — we silently lost every
report. Now fully wired:
- Dedicated mailbox. dmarc@performancewest.net is its own Carbonio account (was
  a DL -> ops@, which buried ops@ under report XML). Isolated IMAP credential in
  the server .env (DMARC_IMAP_{HOST,PORT,USER,PASS}), surfaced to the workers
  container in docker-compose.yml (mirrors the OPS_IMAP_* pattern). The 29
  historical reports that had landed in ops@ were moved over via IMAP.
- Parser worker. scripts/dmarc_report_parser.py IMAP-fetches unseen messages,
  decompresses the .gz/.zip/.xml attachment (namespace-agnostic: handles both
  the classic and the urn:ietf:params:xml:ns:dmarc-2.0 GMX/mail.com schema),
  parses the aggregate XML, and upserts one dmarc_report row (keyed
  (org_name, report_id), so re-parsing is a no-op) + one dmarc_record row per
  source IP into the schema from api/migrations/102_dmarc_aggregate.sql.
  dmarc_pass = dkim_aligned=pass OR spf_aligned=pass. Marks each message \Seen
  so each run only handles new reports. Flags: --dry-run, --all (backfill
  seen), --alert (7-day per-IP summary + Telegram if one of OUR IPs drops below
  95% pass, or an EXTERNAL IP sends >=20 failing msgs as us = spoofing under
  p=reject).
- Cron. /etc/cron.d/pw-dmarc-parser (tracked at infra/cron/pw-dmarc-parser) runs
  ... workers python3 -m scripts.dmarc_report_parser --alert daily at 06:20 UTC.
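For orientation, the namespace-agnostic parsing and the dmarc_pass rule described above can be sketched with the standard library alone. This is not the code in scripts/dmarc_report_parser.py: the function names are hypothetical, and it covers only the per-record fields (source_ip, count, policy_evaluated dkim/spf) from the standard aggregate report layout, with no IMAP, decompression, or database work.

```python
import xml.etree.ElementTree as ET

def _local(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]

def _first_text(elem, name):
    """Text of the first descendant whose local tag name matches, else ''."""
    for child in elem.iter():
        if _local(child.tag) == name:
            return (child.text or "").strip()
    return ""

def parse_aggregate(xml_text):
    """Return one dict per <record>: source_ip, msg_count, dmarc_pass."""
    rows = []
    for rec in ET.fromstring(xml_text).iter():
        if _local(rec.tag) != "record":
            continue
        policy = next(
            (e for e in rec.iter() if _local(e.tag) == "policy_evaluated"), None
        )
        if policy is None:
            continue
        # Aligned results live under policy_evaluated, not auth_results.
        dkim = _first_text(policy, "dkim")
        spf = _first_text(policy, "spf")
        rows.append({
            "source_ip": _first_text(rec, "source_ip"),
            "msg_count": int(_first_text(rec, "count") or 0),
            "dmarc_pass": dkim == "pass" or spf == "pass",
        })
    return rows
```

Matching on the local tag name is what lets the same code read both the un-namespaced classic reports and the urn:ietf:params:xml:ns:dmarc-2.0 variant.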
Query examples once populated:
-- who sends as us, and are they aligning? (the payoff of the DKIM/subdomain fixes)
SELECT source_ip, sum(msg_count) total,
sum(msg_count) FILTER (WHERE dmarc_pass) pass,
round(100.0*sum(msg_count) FILTER (WHERE dmarc_pass)/sum(msg_count)) pass_pct
FROM dmarc_record r JOIN dmarc_report rep ON rep.id=r.report_id
WHERE rep.date_begin >= now()-interval '7 days'
GROUP BY source_ip ORDER BY total DESC;
-- any UNKNOWN IP failing alignment = spoofing/forgotten relay (reputation poison)
Ongoing hygiene (reduce reputation damage)
- Dead-address scrub: ~110 genuine 5.1.1 user unknown bounces/day. listmonk
  already blocklists hard bounces after 1 (bounce.actions hard->blocklist), so
  these self-clean, but pre-scrubbing the dirtiest segments before send avoids
  the reputation hit. See data/ segment exports.
- Consumer-domain exclusion (two layers). The authoritative list lives in
  scripts/_email_exclusions.py (BLOCKED_EMAIL_DOMAINS): gmail/google, the full
  Yahoo/Verizon-Media family, Microsoft consumer, Apple/iCloud (added
  2026-06-19), dead/legacy ISPs, and the legal do-not-contact list.
  - NEW selections: the per-vertical builders filter it out of audience SQL and
    listmonk_import.py refuses to import a blocked address.
  - Already-imported subs: LIST-BASED campaigns (FCC Direct Contacts list 3,
    CRTC/USF blasts) can still hit consumer subs imported BEFORE a domain joined
    the list. scripts/scrub_listmonk_consumer.py reconciles the live subscriber
    table against the exclusion list and blocklists any ENABLED match
    (idempotent; --dry-run supported; both listmonk + listmonk_hc). Runs daily
    06:30 UTC via /etc/cron.d/pw-listmonk-scrub (tracked at
    infra/cron/pw-listmonk-scrub). First run 2026-06-19 blocklisted 7,943
    trucking + 21 HC stale consumer subs (1,321 iCloud, 267 gmail, etc.) that
    were leaking via the running CRTC campaign. Re-run the scrub whenever you add
    a domain to the exclusion list.
- Don't re-expand IPs until Postmaster Tools shows recovered reputation.
- Volume discipline: keep the global 200/hr sliding window until reputation is
  green; concentrated low volume on one warm IP beats bursts.
- Watch the rejection mix: 5.7.1 reputation/spam/blocked should fall over the
  next 1-2 weeks as the single-IP reputation builds. Track via:
  ssh ... 'sudo grep status=bounced /var/log/mail.log | grep -c 5.7.1'
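If the single grep count is too coarse, the bounce mix can be broken out per DSN code. The following is a rough sketch, assuming the usual Postfix "dsn=X.Y.Z, status=bounced" log fields; the function name is hypothetical. Unlike the grep, it keys on the dsn= field rather than on any "5.7.1" substring in the line.

```python
import re
from collections import Counter

# Postfix logs the enhanced status code right before the final status.
BOUNCE_RE = re.compile(r"\bdsn=(\d\.\d+\.\d+),\s*status=bounced\b")

def bounce_mix(log_lines):
    """Count bounced deliveries per DSN code, e.g. {'5.7.1': 40, '5.1.1': 110}."""
    counts = Counter()
    for line in log_lines:
        m = BOUNCE_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Comparing `bounce_mix(...)["5.7.1"]` day over day gives the same trend as the grep while also showing whether 5.1.1 (dead addresses) is shrinking as the scrubs take effect.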