mail: fix OpenDKIM not signing campaign mail (Docker-injected) + codify in Ansible
Root cause of the Jun 2026 deliverability collapse / 'no new sales': opendkim.conf was in single-key mode with no InternalHosts, so it signed only mail from 127.0.0.1. Transactional/cron mail (injected locally) was signed, but ALL campaign mail -- injected over the Docker bridge from the Listmonk containers (172.18.0.5 trucking, 172.18.0.25 healthcare) -- went out UNSIGNED.

Gmail/Yahoo have required DKIM on bulk mail since Feb 2024, so cold campaigns were junked/blocked (~23% delivery, 550-5.7.1). Proof: 2,620 campaign msgs that day, 0 DKIM sigs.

The correct table files already existed on the server but were never wired into opendkim.conf. The fix points the daemon at key.table/signing.table and sets InternalHosts/ExternalIgnoreList to trusted.hosts (which includes 172.16.0.0/12, the private range that contains the Docker bridge subnets). This fixes BOTH streams: HC submission ports 2526-2528 inherit the global smtpd_milters and *@performancewest.net covers compliance@.

Verified by injecting from a Docker IP through port 25 and port 2526 -- both now get 'DKIM-Signature field added'. Codified as new Ansible role 'mail' so it can't silently regress (OpenDKIM was previously not in IaC at all).
parent f7212b3969
commit 4d5901921e
7 changed files with 217 additions and 0 deletions
@@ -115,3 +115,59 @@ echo $(( ($(date +%s) - $(sudo cat /etc/postfix/pw-warmup-start)) / 86400 ))
- `/etc/postfix/main.cf.bak.*`
- `/etc/postfix/transport.bak.*`
- `/usr/local/bin/pw-mta-warmup.bak.*`

## Incident: Jun 17 2026 - campaign mail sent UNSIGNED (no DKIM)

**Symptom:** "no new sales." Campaigns were sending (~3-4k/day) but delivery was
~23% (sent 1,802 vs deferred 5,143 + bounced 580), Gmail returned `550-5.7.1
likely unsolicited mail`, and there were **zero clicks since Jun 8** despite
~600 opens/day.

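As a sanity check on the ~23% figure, the ratio can be recomputed from the three counts quoted in the symptom. This is only a sketch: `delivery_pct` is a hypothetical helper name, and using sent + deferred + bounced log events as the denominator is an assumption about how the figure was derived (deferred events can include retries of the same message).

```bash
# Hypothetical helper (not part of the repo): share of delivery-status
# events that were "sent", as a percentage with one decimal place.
delivery_pct() {
  local sent=$1 deferred=$2 bounced=$3
  awk -v s="$sent" -v d="$deferred" -v b="$bounced" 'BEGIN {
    total = s + d + b
    if (total == 0) { print "n/a"; exit }   # avoid dividing by zero
    printf "%.1f\n", s * 100 / total
  }'
}

delivery_pct 1802 5143 580   # counts from the symptom line
```

With the incident-day counts this prints `23.9`, which is consistent with the ~23% quoted.
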
**Root cause:** OpenDKIM was signing **nothing** that came from Listmonk.
`/etc/opendkim.conf` was in single-key mode with **no `InternalHosts`**, so it
defaulted to signing only mail from `127.0.0.1`. Cron/transactional mail is
injected locally (127.0.0.1) so it WAS signed, but campaign mail is injected
over the Docker bridge from the Listmonk containers (`172.18.0.5` trucking,
`172.18.0.25` healthcare). Those clients were not "internal," so OpenDKIM
*verified* their mail (instead of *signing* it): every cold email went out
**unsigned**. Gmail/Yahoo have required DKIM on bulk mail since Feb 2024, so
unsigned campaigns were junked/blocked. Proof: `2,620` campaign messages that
day, `0` "DKIM-Signature field added" events, while the every-5-min cron mail
was signed.

The correct table files already existed
(`/etc/opendkim/{key.table,signing.table,trusted.hosts}`, and `trusted.hosts`
already listed `172.16.0.0/12`); they were simply **never wired into
`opendkim.conf`**.

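To illustrate why a single `172.16.0.0/12` entry covers both Listmonk container addresses, the check below does the CIDR arithmetic in plain bash. It is a sketch for intuition only: `ip_to_int` and `ip_in_cidr` are hypothetical helpers, the `192.168.1.10` address is just an illustrative non-match, and OpenDKIM performs its own matching against `trusted.hosts` internally.

```bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

# Return 0 if address $1 lies inside CIDR block $2, else 1.
ip_in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")     # network part before the slash
  bits=${2#*/}                   # prefix length after the slash
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

for ip in 172.18.0.5 172.18.0.25 192.168.1.10; do
  if ip_in_cidr "$ip" 172.16.0.0/12; then
    echo "$ip covered"
  else
    echo "$ip not covered"
  fi
done
```

Both container addresses report `covered` and the illustrative `192.168.1.10` reports `not covered`, so the scope was already right on disk; it simply was not being read.
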
**Fix (now codified in Ansible `roles/mail`):** point `opendkim.conf` at the
tables and set the signing scope:
```
# trusted.hosts includes 172.16.0.0/12 (Docker)
KeyTable            refile:/etc/opendkim/key.table
SigningTable        refile:/etc/opendkim/signing.table
InternalHosts       /etc/opendkim/trusted.hosts
ExternalIgnoreList  /etc/opendkim/trusted.hosts
OversignHeaders     From
```
then `systemctl restart opendkim`. This fixes BOTH streams at once: the
healthcare submission instances (ports 2526-2528) inherit the global
`smtpd_milters` and the `*@performancewest.net` signing table covers
`compliance@`. Verified by injecting a message from a Docker IP through both
port 25 and port 2526 and confirming "DKIM-Signature field added" for each.

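Because the failure mode was a config file that silently lacked the table and scope directives, a small guard along these lines could flag the same gap outside of Ansible. This is a minimal sketch, not the role's actual check: `missing_dkim_directives` is a hypothetical name, and it only tests that each directive is present and uncommented, not that the referenced files are correct.

```bash
# Hypothetical helper: print each required directive that is absent (or
# commented out) in an opendkim.conf-style file; return 1 if any is missing.
missing_dkim_directives() {
  local conf=$1 key rc=0
  for key in KeyTable SigningTable InternalHosts ExternalIgnoreList; do
    # Directive must start the line (indentation allowed) and have a value.
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]+[^[:space:]#]" "$conf"; then
      echo "$key"
      rc=1
    fi
  done
  return "$rc"
}

# Example use (path as in this runbook):
#   missing_dkim_directives /etc/opendkim.conf || echo "signing scope incomplete"
```

Run against the pre-fix single-key config it would list all four directives; after the fix it prints nothing and returns 0.
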
**Verify DKIM is actually signing campaign mail:**
```bash
# Should be NON-ZERO and roughly track campaign volume:
sudo journalctl -u opendkim --since today | grep -c 'DKIM-Signature field added'
# Cross-check: cleanup events today for ALL mail, cron included
# (should be a similar order of magnitude):
sudo grep "^$(date '+%b %e')" /var/log/mail.log | grep -c postfix/cleanup
# Key still matches published DNS:
sudo opendkim-testkey -d performancewest.net -s mail -vvv   # expect "key OK"
```

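The two counts from the verification commands can be turned into a single pass/fail signal. The sketch below is one way to do that; `dkim_coverage` is a hypothetical helper and the default 50% threshold is an arbitrary assumption that would need tuning against real traffic.

```bash
# Hypothetical helper: print signed/total as an integer percentage and
# return 1 when it falls below a minimum (default 50, an arbitrary choice).
dkim_coverage() {
  local signed=$1 total=$2 min=${3:-50} pct
  if (( total == 0 )); then
    echo "no mail seen"
    return 0
  fi
  pct=$(( signed * 100 / total ))
  echo "${pct}%"
  (( pct >= min ))
}

# Example wiring, reusing the runbook's commands:
#   signed=$(sudo journalctl -u opendkim --since today | grep -c 'DKIM-Signature field added')
#   total=$(sudo grep "^$(date '+%b %e')" /var/log/mail.log | grep -c postfix/cleanup)
#   dkim_coverage "$signed" "$total" || echo "ALERT: DKIM coverage low"
```

On the incident day (0 signatures against 2,620 campaign messages) this would have printed `0%` and returned a failure status.
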
**Still TODO from this incident (list quality + content, not yet done):**
- Scrub dead rural/satellite ISPs + dead M365 tenants from audiences and
  suppress repeat-deferring/bouncing domains (extend `_email_exclusions.py`).
- Throttle/pause Gmail until reputation recovers (`550-5.7.1` was still firing).
- Add a plaintext (altbody) MIME part: all campaigns are currently HTML-only,
  itself a spam signal.
- Fix the self-bounce cron emailing the nonexistent `deploy@performancewest.net`
  (~700 self-inflicted `550` bounces/day).

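For the first TODO item, a possible starting point for finding repeat-deferring/bouncing domains is to tally failures per recipient domain from the mail log. This is a sketch under assumptions: it expects standard Postfix delivery lines of the form `to=<user@domain>, ... status=deferred|bounced`, `failing_domains` is a hypothetical name, and it is not the logic in `_email_exclusions.py`.

```bash
# Hypothetical helper: read Postfix log lines on stdin and print
# "<count> <domain>" for deferred or bounced recipients, most frequent first.
failing_domains() {
  grep -E 'status=(deferred|bounced)' \
    | sed -nE 's/.* to=<[^@>]+@([^>]+)>.*/\1/p' \
    | tr '[:upper:]' '[:lower:]' \
    | sort | uniq -c | sort -k1,1nr -k2,2 \
    | awk '{ print $1, $2 }'
}

# Example use:
#   sudo cat /var/log/mail.log | failing_domains | head -20
```

Domains that stay at the top of this list across several days would be candidates for suppression.
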