Document the self-hosted MTA layout, the May 30-31 reputation collapse, the Jun 02 remediation (retired burned IPs .91/.92/.93, swapped rotation to fresh .94/.95/.96, full Yahoo-family hold map, Listmonk sliding-window cap, paused the 13k-recipient blast scheduled for Jun 03), and the fresh-IP warmup rules + monitoring commands.
Email Deliverability & IP Warmup Runbook
Performance West self-hosts its outbound MTA (Postfix on the app server) because transactional relays (SES, Postmark, SendGrid) forbid the cold prospecting email our FMCSA trucking and telecom campaigns depend on. That means we own our sending-IP reputation and must manage it manually. This doc is the operational guide for keeping it healthy.
Infrastructure layout
- Host: Postfix on the app server (207.174.124.71), reached by Listmonk via SMTP at 172.18.0.1:25.
- Sending IPs: 207.174.124.90 through .109 (20 IPs), each with valid FCrDNS (mtaNN.performancewest.net) and authorized in SPF (-all).
  - .90 / mta01: historically a dedicated Yahoo trickle IP. We no longer mail Yahoo at all, so it is idle.
  - .91-.109 / mta02-mta20: rotation pool, selected via transport_maps = hash:/etc/postfix/transport, randmap:{<active pool>}.
- Warmup scheduler: /usr/local/bin/pw-mta-warmup (daily cron /etc/cron.d/pw-mta-warmup, 07:17 UTC). Recomputes the active rotation pool from a start date stamped in /etc/postfix/pw-warmup-start. Ramp schedule: day 0-3 -> 3 IPs, 4-7 -> 5, 8-11 -> 8, 12-17 -> 12, 18-24 -> 16, 25+ -> 19. The pool only ever grows. It picks IPs from the front of the ALL=(...) array.
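The ramp schedule can be sketched as a small lookup. This is an illustration of the table above, not the contents of pw-mta-warmup: the function names are hypothetical, the pool entries are assumed to be the short names out05 through out20, and a plain string stands in for the script's ALL=(...) array so the sketch runs in any POSIX shell.

```shell
# Hypothetical pool, front of the list is used first (assumption: 16 entries,
# out05..out20, i.e. the IPs that were never burned).
ALL="out05 out06 out07 out08 out09 out10 out11 out12 out13 out14 out15 out16 out17 out18 out19 out20"

# Map a warmup day number to the scheduled pool size from the ramp table.
pool_size_for_day() {
  local day=$1
  if   [ "$day" -le 3 ];  then echo 3
  elif [ "$day" -le 7 ];  then echo 5
  elif [ "$day" -le 11 ]; then echo 8
  elif [ "$day" -le 17 ]; then echo 12
  elif [ "$day" -le 24 ]; then echo 16
  else                         echo 19
  fi
}

# Print the first N pool entries for a given day. The loop simply runs out
# of entries if the schedule asks for more IPs than the list contains.
active_pool_for_day() {
  local n count pool name
  n=$(pool_size_for_day "$1")
  count=0
  pool=""
  for name in $ALL; do
    if [ "$count" -ge "$n" ]; then break; fi
    pool="${pool:+$pool }$name"
    count=$((count + 1))
  done
  echo "$pool"
}
```

For example, active_pool_for_day 2 prints the first three entries. With a 16-entry list, the 25+ step of the schedule cannot exceed 16 IPs, so the table's final value of 19 is only reachable if the list holds at least 19 entries.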
What we do NOT mail
The Yahoo / Verizon-Media family is excluded entirely (yahoo, aol, att,
verizon, frontier, sbcglobal, bellsouth, pacbell, ameritech, ymail, rocketmail,
aim, netscape, compuserve, etc.). They aggressively defer cold senders with
421 4.7.0 [TSS04] ... unexpected volume or user complaints, and that deferral
poisons the sending IP for Gmail and Microsoft too.
Enforced in two layers:
- Audience build (authoritative): scripts/_email_exclusions.py (BLOCKED_EMAIL_DOMAINS), imported by build_trucking_campaigns.py and populate_new_carrier_startup_campaign.py. New campaigns never include them.
- Postfix backstop: /etc/postfix/transport maps every Yahoo-family domain to hold:. If any leak into the queue they are parked, never sent from a rotation IP.
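As an extra sanity check before a send, an exported audience can be scanned for blocked domains. The sketch below is not part of either enforcement layer. The function name is hypothetical, and the domain list is a short illustrative subset; the authoritative list remains BLOCKED_EMAIL_DOMAINS in scripts/_email_exclusions.py.

```shell
# Illustrative subset only (assumption); the real list lives in
# scripts/_email_exclusions.py as BLOCKED_EMAIL_DOMAINS.
BLOCKED_DOMAINS="yahoo.com aol.com att.net sbcglobal.net ymail.com"

# Read one email address per line on stdin and print every address whose
# domain (compared case-insensitively) is in BLOCKED_DOMAINS.
find_blocked_addresses() {
  local addr domain
  while IFS= read -r addr || [ -n "$addr" ]; do
    # Everything after the last "@", lowercased.
    domain=$(printf '%s' "${addr##*@}" | tr '[:upper:]' '[:lower:]')
    case " $BLOCKED_DOMAINS " in
      *" $domain "*) echo "$addr" ;;
    esac
  done
}
```

Piping an address list (one per line) into find_blocked_addresses should print nothing for a clean audience; any output means the audience was built without the exclusion list.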
Incident: May 30-31 2026 reputation collapse
A campaign blast pushed ~29k sends in a day across cold IPs .91/.92/.93 with no
daily volume cap. Result:
- Gmail: 550-5.7.1 ... likely unsolicited mail (hard spam block).
- Yahoo: 421 TSS04 on the rotation IPs.
- Steady state afterward: ~13% delivery (10k sent vs 68k deferred + 7k bounced in a day). Listmonk open rate ~4%, clicks ~0.
Remediation (Jun 02 2026)
- Retired the 3 burned IPs (.91/.92/.93 = out02/03/04) from rotation. Confirmed .94-.109 had never sent outbound (only inbound port-scan noise), so they are pristine.
- Swapped rotation to fresh .94/.95/.96 (out05/06/07) and reset the warmup start date to day 0.
- Patched the pw-mta-warmup ALL array to start at out05 so the daily cron never reverts to the burned IPs.
- Rewrote /etc/postfix/transport to hold: the full Yahoo family (was a partial list with buggy duplicate keys routing to yahooslow).
- Flushed the entire stale queue (1,846 blast-era messages, mostly dead satellite ISPs) so fresh IPs start clean.
- Enabled Listmonk sliding-window rate limit so no campaign can blast again: app.message_sliding_window=true, duration 1h, rate 50, message_rate=2.
- Paused 19 trucking campaigns (IDs 275-293, ~13k recipients) that were scheduled to fire Jun 03; they were built before the exclusion fix and would have re-torched the fresh IPs. Rebuild them small/clean before resending.
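To put the sliding-window cap in perspective, the arithmetic below gives a ceiling on daily volume. It is a sketch under one assumption: the limiter admits at most the configured rate in any window of the configured duration. The function name is hypothetical.

```shell
# Upper bound on messages sent in 24 hours when at most RATE messages are
# allowed per window of WINDOW_HOURS hours. The day is covered by
# ceil(24 / WINDOW_HOURS) windows, each contributing at most RATE messages.
max_daily_sends() {
  local rate=$1 window_hours=$2 windows
  windows=$(( (24 + window_hours - 1) / window_hours ))
  echo $(( rate * windows ))
}

max_daily_sends 50 1    # prints 1200
```

At rate 50 per 1h the ceiling is 1,200 messages a day, compared with the ~29k sent on the day of the incident. The cap is a backstop, not a target: the warmup rules still call for only a few hundred total per day on days 0-3.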
Fresh-IP warmup discipline (the rules)
- Small audiences. Day 0-3: a few hundred TOTAL per day, not per campaign. Lower the limit values in build_trucking_campaigns.py segment specs while warming.
- Best recipients first. Only verified / engaged addresses. Gmail and Microsoft only (Yahoo family already excluded).
- Scrub hard bounces immediately. 550 5.1.1 (no such user), full mailbox, "not our customer" all hurt reputation signals.
- Watch the signals daily (see commands below). If Gmail 550-5.7.1 or Yahoo 421 TSS04 reappear, STOP and hold for several days.
- Ramp Listmonk's sliding window in step with the IP warmup (e.g. 50/h -> 150/h -> 300/h as days pass and signals stay clean). Restart the listmonk container after changing settings.
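The stop rule can be turned into a simple gate. This is a minimal sketch, assuming the two signals show up in mail.log with the literal strings 550-5.7.1 and TSS04; the function names are hypothetical.

```shell
# Count stdin lines that carry either stop signal named in the rules:
# Gmail's 550-5.7.1 block or Yahoo's TSS04 deferral.
count_stop_signals() {
  # grep -c prints 0 but exits non-zero when nothing matches; "|| true"
  # keeps the exit status clean for callers.
  grep -cE '550-5\.7\.1|TSS04' || true
}

# Exit status 0 when no stop signal is present, 1 otherwise.
safe_to_send() {
  local hits
  hits=$(count_stop_signals)
  [ "$hits" -eq 0 ]
}
```

One way to use it is to pipe today's log lines into safe_to_send and treat a non-zero exit status as the cue to stop and hold, for example by pausing campaigns before the next scheduled send.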
Monitoring commands
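# delivery mix today
# (classic syslog pads single-digit days with a space, e.g. "Jun  2"; if mail.log uses that format, use '+%b %e' in the date calls in this section)
sudo grep "^$(date '+%b %d')" /var/log/mail.log | grep -oE 'status=(sent|deferred|bounced)' | sort | uniq -c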
# per-IP outbound volume today (catch a runaway blast early); extend the IP list as the rotation pool grows
for ip in 94 95 96; do printf '.%s: ' "$ip"; sudo grep "^$(date '+%b %d')" /var/log/mail.log | grep -cF "207.174.124.$ip"; done
# top deferral / bounce reasons today
sudo grep "^$(date '+%b %d')" /var/log/mail.log | grep status=deferred | grep -oE 'said: [0-9]{3}[^)]{0,50}' | sort | uniq -c | sort -rn | head
# queue size
sudo postqueue -p | tail -1
# active rotation pool + warmup day
sudo postconf -h transport_maps
echo $(( ($(date +%s) - $(sudo cat /etc/postfix/pw-warmup-start)) / 86400 ))
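The delivery mix can also be reduced to a single percentage for day-to-day comparison. The sketch below is a hypothetical helper that reads log lines on stdin. It assumes the denominator is every line with a sent, deferred, or bounced status, which may not be the same base used for the ~13% figure in the incident notes.

```shell
# Read mail.log lines on stdin; print per-status counts and the share of
# status lines that were "sent".
delivery_summary() {
  awk '
    match($0, /status=(sent|deferred|bounced)/) {
      # "status=" is 7 characters; keep only the status word after it.
      s = substr($0, RSTART + 7, RLENGTH - 7)
      n[s]++
      total++
    }
    END {
      pct = total ? 100 * n["sent"] / total : 0
      printf "sent=%d deferred=%d bounced=%d delivered=%.1f%%\n", n["sent"], n["deferred"], n["bounced"], pct
    }'
}
```

Feed it the same date-filtered grep output used for the delivery mix command, and record the result daily so a drop is visible before a blast does lasting damage.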
Backups left on the server (Jun 02 2026 remediation)
- /etc/postfix/main.cf.bak.*
- /etc/postfix/transport.bak.*
- /usr/local/bin/pw-mta-warmup.bak.*