infra(mail): fix warmed sending IPs dropping off ens18 on reboot (Jun 24 outage)

Unattended kernel-upgrade reboot (Jun 24 04:04) left only .71 bound because
classic ifupdown applies just the first 'address' line. Postfix then failed to
bind .94/.107 ('Cannot assign requested address') and silently egressed from
.71 -- which is NOT in SPF (every fallback msg failed SPF) and is on RLR621 +
Trend ERS-QIL. ~37h of bypassed IP-warming + a near-zero sales day.

Fixes:
- /etc/network/interfaces: explicit up/down ip-addr hooks for .72/.94/.107
- pw-mail-ips.service: systemd oneshot re-binds IPs + flushes queue on boot
- pw-mail-ip-watchdog: */5 cron re-binds missing IPs + flushes, also catches
  'Cannot assign' bind failures
- runbook: full incident writeup + reboot-test lesson

Host already remediated live; this commits the host artifacts + docs.
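
The failure described above (sending IPs silently missing after a reboot) can be checked for directly as part of a reboot test. A minimal sketch, assuming the interface and addresses named in this commit; the function name `missing_sending_ips` is hypothetical and is not one of the committed artifacts. It reads `ip -o -4 addr show` output on stdin, so it can be exercised without touching the host:

```shell
#!/bin/sh
# Hypothetical helper, not part of this commit: print each expected sending
# IP that does NOT appear in `ip -o -4 addr show` output read from stdin.
missing_sending_ips() {
    addrs=$(cat)   # capture the address listing once
    for ip in "$@"; do
        # -F matches literally, so dots are not treated as regex wildcards;
        # "inet <ip>/" pins both ends of the address.
        printf '%s\n' "$addrs" | grep -qF "inet $ip/" || printf '%s\n' "$ip"
    done
}

# Example use after a reboot (run on the mail host):
#   ip -o -4 addr show dev ens18 \
#     | missing_sending_ips 207.174.124.72 207.174.124.94 207.174.124.107
# Empty output means every listed sending IP is bound.
```

Any output at all means the host would fall back to an unintended source address, so the check is cheap to run before re-enabling outbound mail.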
parent 7ad4c920c6
commit 4276adab80
3 changed files with 74 additions and 0 deletions
20
infra/mail/pw-mail-ip-watchdog
Executable file
@@ -0,0 +1,20 @@
#!/bin/sh
# Guard against the Jun 24 incident: an unattended reboot dropped the warmed
# sending IPs (.94/.107) off ens18 because classic ifupdown only applies the
# first "address" line. Postfix then fell back to egressing from .71 (NOT in
# SPF, on RLR621/Trend ERS-QIL) for ~37h, tanking deliverability silently.
# This re-binds any missing sending IP and logs/flushes if it had to act.
CHANGED=0
for ip in 207.174.124.72 207.174.124.94 207.174.124.107; do
  if ! ip addr show ens18 | grep -q "$ip/"; then
    ip addr add "$ip/23" dev ens18 && CHANGED=1
    logger -t pw-mail-ip-watchdog "re-bound missing sending IP $ip to ens18"
  fi
done
# Also catch silent bind failures even if the IP looks present.
if tail -n 500 /var/log/mail.log 2>/dev/null | grep -q "Cannot assign requested address"; then
  logger -t pw-mail-ip-watchdog "postfix bind failures detected in recent mail.log"
  CHANGED=1
fi
[ "$CHANGED" = 1 ] && /usr/sbin/postqueue -f 2>/dev/null
exit 0
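
One property of the log check in the watchdog: a single old "Cannot assign requested address" line keeps matching on every 5-minute run until it scrolls out of the last 500 lines, so the queue is flushed repeatedly for one event. A possible refinement, sketched under the assumption that the caller remembers how many log lines it has already inspected; the function name `new_bind_failures` and the idea of a stored line count are hypothetical and not part of this commit:

```shell
#!/bin/sh
# Hypothetical helper, not part of this commit: count bind-failure lines that
# appear AFTER the first $1 lines of the log read from stdin.
new_bind_failures() {
    seen=${1:-0}
    # `tail -n +K` starts output at line K. `grep -c` prints the match count
    # but exits non-zero when the count is 0, hence the `|| true`.
    tail -n "+$((seen + 1))" | grep -cF "Cannot assign requested address" || true
}

# Example use (the state file path is an assumption):
#   seen=$(cat /run/pw-mail-ip-watchdog.seen 2>/dev/null || echo 0)
#   new_bind_failures "$seen" < /var/log/mail.log
```

A real caller would also need to reset the stored count when mail.log is rotated, since the file then becomes shorter than the remembered position.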
13
infra/mail/pw-mail-ips.service
Normal file
@@ -0,0 +1,13 @@
[Unit]
Description=Ensure Performance West mail sending IPs are bound to ens18
After=network-online.target networking.service
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/sh -c "for ip in 207.174.124.72 207.174.124.94 207.174.124.107; do ip addr show ens18 | grep -q \"$ip/\" || ip addr add $ip/23 dev ens18; done"
ExecStart=/usr/sbin/postqueue -f

[Install]
WantedBy=multi-user.target
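
The unit's inline loop and the watchdog both carry their own copy of the IP list and the bind logic. One way to keep them in sync would be a shared helper that both call. The following is only a sketch: the function name `ensure_ips` and its argument order are hypothetical and not part of this commit. Unlike the watchdog's loop, it reports an address only when `ip addr add` actually succeeded:

```shell
#!/bin/sh
# Hypothetical shared helper, not part of this commit. Usage:
#   ensure_ips ens18 23 207.174.124.72 207.174.124.94 207.174.124.107
# Adds each address missing from the interface, prints the ones it added,
# and returns non-zero if any add failed.
ensure_ips() {
    dev=$1 prefix=$2
    shift 2
    rc=0
    for addr in "$@"; do
        # Literal match against one-line-per-address output.
        if ip -o -4 addr show dev "$dev" | grep -qF "inet $addr/"; then
            continue            # already bound, nothing to do
        fi
        if ip addr add "$addr/$prefix" dev "$dev"; then
            printf '%s\n' "$addr"
        else
            rc=1                # keep going, but report the failure
        fi
    done
    return "$rc"
}
```

A caller could flush the Postfix queue only when the helper printed something, which mirrors the CHANGED flag in the watchdog.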