mail: fix OpenDKIM not signing campaign mail (Docker-injected) + codify in Ansible
Root cause of the Jun 2026 deliverability collapse / 'no new sales': opendkim.conf was in single-key mode with no InternalHosts, so it signed only mail from 127.0.0.1. Transactional/cron mail (injected locally) was signed, but ALL campaign mail -- injected over the Docker bridge from the Listmonk containers (172.18.0.5 trucking, 172.18.0.25 healthcare) -- went out UNSIGNED. Gmail/Yahoo have required DKIM on bulk mail since Feb 2024, so cold campaigns were junked/blocked (~23% delivery, 550-5.7.1). Proof: 2,620 campaign msgs that day, 0 DKIM sigs.

The correct table files already existed on the server but were never wired into opendkim.conf. The fix points the daemon at key.table/signing.table and sets InternalHosts/ExternalIgnoreList to trusted.hosts (which includes 172.16.0.0/12, the range that contains the Docker bridge subnets). It fixes BOTH streams: the HC submission ports 2526-2528 inherit the global smtpd_milters, and *@performancewest.net covers compliance@. Verified by injecting from a Docker IP through port 25 and port 2526 -- both now get 'DKIM-Signature field added'.

Codified as new Ansible role 'mail' so it can't silently regress (OpenDKIM was previously not in IaC at all).
parent f7212b3969
commit 4d5901921e
7 changed files with 217 additions and 0 deletions
@ -115,3 +115,59 @@ echo $(( ($(date +%s) - $(sudo cat /etc/postfix/pw-warmup-start)) / 86400 ))
- `/etc/postfix/main.cf.bak.*`
- `/etc/postfix/transport.bak.*`
- `/usr/local/bin/pw-mta-warmup.bak.*`

## Incident: Jun 17 2026 — campaign mail sent UNSIGNED (no DKIM)

**Symptom:** "no new sales." Campaigns were sending (~3-4k/day) but delivery was
~23% (sent 1,802 vs deferred 5,143 + bounced 580), Gmail returned
`550-5.7.1 likely unsolicited mail`, and there were **zero clicks since Jun 8**
despite ~600 opens/day.

**Root cause:** OpenDKIM was signing **nothing** that came from Listmonk.
`/etc/opendkim.conf` was in single-key mode with **no `InternalHosts`**, so it
defaulted to signing only mail from `127.0.0.1`. Cron/transactional mail is
injected locally (127.0.0.1), so it WAS signed, but campaign mail is injected
over the Docker bridge from the Listmonk containers (`172.18.0.5` trucking,
`172.18.0.25` healthcare). Those clients were not "internal," so OpenDKIM
*verified* (instead of *signed*) them: every cold email went out **unsigned**.
Gmail/Yahoo have required DKIM on bulk mail since Feb 2024, so unsigned
campaigns were junked/blocked. Proof: `2,620` campaign messages that day, `0`
"DKIM-Signature field added" events, while the every-5-min cron mail was signed.

The correct table files already existed
(`/etc/opendkim/{key.table,signing.table,trusted.hosts}`, and `trusted.hosts`
already listed `172.16.0.0/12`); they were simply **never wired into
`opendkim.conf`**.
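
The sign-or-not decision described above comes down to whether the connecting client address falls inside the `InternalHosts` list. A minimal Python sketch of that membership test follows. It is an illustration only: `is_internal` and `TRUSTED_HOSTS` are hypothetical names, the list copies the role's `opendkim_internal_hosts`, and OpenDKIM's real matching (hostnames, other entry forms) is not modelled.

```python
import ipaddress

# Entries as listed in the role's opendkim_internal_hosts. Hostnames such as
# "localhost" are skipped: this sketch only compares IP literals and CIDRs.
TRUSTED_HOSTS = ["127.0.0.1", "localhost", "172.16.0.0/12", "10.0.0.0/8"]


def is_internal(client_ip, trusted=TRUSTED_HOSTS):
    """Return True if client_ip falls inside any IP/CIDR entry in trusted."""
    addr = ipaddress.ip_address(client_ip)
    for entry in trusted:
        try:
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            continue  # not an IP or CIDR (for example a hostname)
        if addr in network:
            return True
    return False


# Compare the pre-fix scope (127.0.0.1 only) with the post-fix list.
before_fix = ["127.0.0.1"]
for ip in ("127.0.0.1", "172.18.0.5", "172.18.0.25"):
    print(ip, is_internal(ip, before_fix), is_internal(ip))
```

With the pre-fix scope only `127.0.0.1` counts as internal; with the full list both Listmonk container addresses do, because `172.18.0.0/16` lies inside `172.16.0.0/12`.
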

**Fix (now codified in Ansible `roles/mail`):** point `opendkim.conf` at the
tables and set the signing scope:
```
KeyTable            refile:/etc/opendkim/key.table
SigningTable        refile:/etc/opendkim/signing.table
InternalHosts       /etc/opendkim/trusted.hosts   # includes 172.16.0.0/12 (Docker)
ExternalIgnoreList  /etc/opendkim/trusted.hosts
OversignHeaders     From
```
then `systemctl restart opendkim`. This fixes BOTH streams at once: the
healthcare submission instances (ports 2526-2528) inherit the global
`smtpd_milters` and the `*@performancewest.net` signing table covers
`compliance@`. Verified by injecting a message from a Docker IP through both
port 25 and port 2526 and confirming "DKIM-Signature field added" for each.
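
To see why the `*@performancewest.net` pattern covers `compliance@`, here is a small Python sketch of the two-step lookup: the signing table maps a sender pattern to a key name, and the key table maps that name to `domain:selector:keyfile`. This is an approximation under stated assumptions: `select_key` is a hypothetical helper, the table contents mirror what the role renders, and `fnmatch` stands in for the daemon's own `refile:` wildcard matching.

```python
from fnmatch import fnmatchcase

# Contents mirror the role's rendered signing.table and key.table.
SIGNING_TABLE = [
    ("*@performancewest.net", "mail._domainkey.performancewest.net"),
]
KEY_TABLE = {
    "mail._domainkey.performancewest.net":
        "performancewest.net:mail:/etc/opendkim/keys/performancewest.net/mail.private",
}


def select_key(from_address):
    """Return (domain, selector, key_path) for the first matching pattern, else None."""
    sender = from_address.lower()
    for pattern, key_name in SIGNING_TABLE:
        if fnmatchcase(sender, pattern):
            domain, selector, key_path = KEY_TABLE[key_name].split(":", 2)
            return domain, selector, key_path
    return None  # no pattern matched: the message would not be signed


print(select_key("compliance@performancewest.net"))
print(select_key("someone@example.com"))
```

Any sender at the signing domain resolves to the same `mail` selector, while addresses at other domains match nothing.
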

**Verify DKIM is actually signing campaign mail:**
```bash
# Should be NON-ZERO and roughly track campaign volume:
sudo journalctl -u opendkim --since today | grep -c 'DKIM-Signature field added'
# Cross-check: campaign cleanup events today (should be similar order of magnitude)
sudo grep "^$(date '+%b %e')" /var/log/mail.log | grep -c postfix/cleanup
# Key still matches published DNS:
sudo opendkim-testkey -d performancewest.net -s mail -vvv   # expect "key OK"
```

**Still TODO from this incident (list quality + content, not yet done):**
- Scrub dead rural/satellite ISPs + dead M365 tenants from audiences and
  suppress repeat-deferring/bouncing domains (extend `_email_exclusions.py`).
- Throttle/pause Gmail until reputation recovers (`550-5.7.1` was still firing).
- Add a plaintext (altbody) MIME part: all campaigns are currently HTML-only,
  itself a spam signal.
- Fix the self-bounce cron emailing the nonexistent `deploy@performancewest.net`
  (~700 self-inflicted `550` bounces/day).
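
For the plaintext TODO item, the target message shape is `multipart/alternative` with a `text/plain` part ahead of the HTML part. The sketch below builds that shape with Python's standard `email` package. It only illustrates the MIME structure: it is not Listmonk's API, `build_campaign_message` is a hypothetical helper, and the recipient, subject, and bodies are placeholders.

```python
from email.message import EmailMessage


def build_campaign_message(sender, recipient, subject, text_body, html_body):
    """Build a multipart/alternative message: plain text first, HTML second."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(text_body)                      # text/plain part
    msg.add_alternative(html_body, subtype="html")  # text/html alternative
    return msg


msg = build_campaign_message(
    "compliance@performancewest.net",   # sender from the signing domain
    "recipient@example.com",            # placeholder recipient
    "Example subject",                  # placeholder subject
    "Plain-text version of the campaign.",
    "<p>HTML version of the campaign.</p>",
)
print(msg.get_content_type())
print([part.get_content_type() for part in msg.iter_parts()])
```

In Listmonk itself the plain-text body would be supplied through the campaign's own settings; the sketch just shows what a receiving server should see.
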
@ -15,6 +15,7 @@
# minio — MinIO object storage + bucket creation
# workers — Python job server + Ollama LLM
# shkeeper — k3s + Helm + SHKeeper (crypto payments: BTC/ETH/USDC/Polygon/TRX/BNB/LTC)
# mail — OpenDKIM signing for outbound Postfix mail (incl. Listmonk campaigns)
# nginx — nginx + certbot TLS for all domains + fail2ban
- name: Provision Performance West server
@ -31,6 +32,7 @@
    - workers
    - worker-crons
    - shkeeper
    - mail
    - nginx
    - monitoring
    - security-updates
22
infra/ansible/roles/mail/defaults/main.yml
Normal file
@ -0,0 +1,22 @@
---
# OpenDKIM signing for outbound mail (Postfix milter).
#
# CRITICAL: campaign mail is injected into Postfix from the Listmonk containers
# over the Docker bridge network, NOT from localhost. OpenDKIM only signs mail
# whose client is in InternalHosts; if the Docker subnet is missing there,
# OpenDKIM *verifies* (rather than *signs*) campaign mail, so every cold email
# goes out UNSIGNED. Since Feb 2024 Gmail/Yahoo require DKIM on bulk mail, so
# unsigned campaigns get junked/blocked (this caused the Jun 2026 deliverability
# collapse: ~23% delivery, Gmail 550-5.7.1). The Docker subnet below MUST be in
# opendkim_internal_hosts.
opendkim_selector: mail
opendkim_signing_domain: performancewest.net
opendkim_socket: "inet:8891@localhost"

# Hosts OpenDKIM will SIGN for (vs verify). Must include the Docker bridge
# subnet so Listmonk container traffic is signed.
opendkim_internal_hosts:
  - "127.0.0.1"
  - "localhost"
  - "172.16.0.0/12"   # Docker bridge networks (Listmonk, workers, etc.)
  - "10.0.0.0/8"
10
infra/ansible/roles/mail/handlers/main.yml
Normal file
@ -0,0 +1,10 @@
---
- name: Restart opendkim
  ansible.builtin.systemd:
    name: opendkim
    state: restarted

- name: Reload postfix
  ansible.builtin.command:
    cmd: postfix reload
  changed_when: true
98
infra/ansible/roles/mail/tasks/main.yml
Normal file
@ -0,0 +1,98 @@
---
- name: Install OpenDKIM + tools
  ansible.builtin.apt:
    name:
      - opendkim
      - opendkim-tools
    state: present

- name: Ensure OpenDKIM key directory exists
  ansible.builtin.file:
    path: "/etc/opendkim/keys/{{ opendkim_signing_domain }}"
    state: directory
    owner: opendkim
    group: opendkim
    mode: "0750"

- name: Generate DKIM keypair if missing
  ansible.builtin.command:
    cmd: >-
      opendkim-genkey
      -b 2048
      -d {{ opendkim_signing_domain }}
      -s {{ opendkim_selector }}
      -D /etc/opendkim/keys/{{ opendkim_signing_domain }}
    creates: "/etc/opendkim/keys/{{ opendkim_signing_domain }}/{{ opendkim_selector }}.private"
  register: dkim_keygen

- name: Fix DKIM private key ownership
  ansible.builtin.file:
    path: "/etc/opendkim/keys/{{ opendkim_signing_domain }}/{{ opendkim_selector }}.private"
    owner: opendkim
    group: opendkim
    mode: "0600"

- name: Show DKIM public DNS record to publish (only when newly generated)
  ansible.builtin.debug:
    msg: >-
      A new DKIM key was generated. Publish the TXT record from
      /etc/opendkim/keys/{{ opendkim_signing_domain }}/{{ opendkim_selector }}.txt
      at {{ opendkim_selector }}._domainkey.{{ opendkim_signing_domain }}
  when: dkim_keygen is changed

- name: Deploy OpenDKIM KeyTable
  ansible.builtin.copy:
    dest: /etc/opendkim/key.table
    content: |
      {{ opendkim_selector }}._domainkey.{{ opendkim_signing_domain }} {{ opendkim_signing_domain }}:{{ opendkim_selector }}:/etc/opendkim/keys/{{ opendkim_signing_domain }}/{{ opendkim_selector }}.private
    owner: root
    group: root
    mode: "0644"
  notify: Restart opendkim

- name: Deploy OpenDKIM SigningTable
  ansible.builtin.copy:
    dest: /etc/opendkim/signing.table
    content: |
      *@{{ opendkim_signing_domain }} {{ opendkim_selector }}._domainkey.{{ opendkim_signing_domain }}
    owner: root
    group: root
    mode: "0644"
  notify: Restart opendkim

- name: Deploy OpenDKIM trusted/internal hosts (MUST include Docker subnet)
  ansible.builtin.template:
    src: trusted.hosts.j2
    dest: /etc/opendkim/trusted.hosts
    owner: root
    group: root
    mode: "0644"
  notify: Restart opendkim

- name: Deploy opendkim.conf (table signing + InternalHosts)
  ansible.builtin.template:
    src: opendkim.conf.j2
    dest: /etc/opendkim.conf
    owner: root
    group: root
    mode: "0644"
    validate: "opendkim -n -f -x %s"
  notify: Restart opendkim

- name: Ensure OpenDKIM is enabled and running
  ansible.builtin.systemd:
    name: opendkim
    enabled: true
    state: started

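- name: Wire Postfix to the OpenDKIM milter
  ansible.builtin.command:
    cmd: "postconf -e {{ item }}"
  loop:
    - "smtpd_milters={{ postfix_milter_socket }}"
    - "non_smtpd_milters={{ postfix_milter_socket }}"
    - "milter_default_action=accept"
    - "milter_protocol=6"
  vars:
    # Postfix writes inet milter endpoints as inet:host:port, while
    # opendkim_socket uses OpenDKIM's inet:port@host form, so convert it here.
    postfix_milter_socket: >-
      inet:{{ opendkim_socket.split('@')[1] }}:{{ opendkim_socket.split('@')[0].split(':')[1] }}
  register: postfix_milter
  # postconf -e gives no change signal. Report changed so the handler fires;
  # with changed_when: false the notify below would never trigger.
  changed_when: true
  notify: Reload postfix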
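
A note on the socket value used in the task above: Postfix's `smtpd_milters` syntax for TCP endpoints is `inet:host:port`, whereas OpenDKIM's `Socket` setting uses the Milter form `inet:port@host`. Because `milter_default_action=accept` lets mail through when the milter cannot be reached, a mismatch here could fail silently. The Python sketch below shows the conversion between the two notations; `to_postfix_milter` is a hypothetical helper for illustration and is not part of the role.

```python
def to_postfix_milter(opendkim_socket):
    """Convert OpenDKIM's 'inet:port@host' into Postfix's 'inet:host:port'."""
    scheme, _, rest = opendkim_socket.partition(":")
    if scheme != "inet" or "@" not in rest:
        # Only the TCP form is handled in this sketch.
        raise ValueError(f"unsupported socket spec: {opendkim_socket!r}")
    port, _, host = rest.partition("@")
    if not port.isdigit() or not host:
        raise ValueError(f"malformed socket spec: {opendkim_socket!r}")
    return f"inet:{host}:{port}"


# The role default for opendkim_socket:
print(to_postfix_milter("inet:8891@localhost"))
```

After a deploy, `postconf -h smtpd_milters` shows the value Postfix is actually using.
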
22
infra/ansible/roles/mail/templates/opendkim.conf.j2
Normal file
@ -0,0 +1,22 @@
Syslog                  yes
SyslogSuccess           yes
LogWhy                  yes
Mode                    s
Canonicalization        relaxed/simple
Socket                  {{ opendkim_socket }}
PidFile                 /run/opendkim/opendkim.pid
UserID                  opendkim:opendkim
UMask                   007

# Multi-domain table-based signing. Lets us add domains/selectors without
# touching the daemon config.
KeyTable                refile:/etc/opendkim/key.table
SigningTable            refile:/etc/opendkim/signing.table

# Hosts we SIGN for (must include the Docker bridge subnet so Listmonk
# container campaign mail is signed, not just localhost cron mail).
InternalHosts           /etc/opendkim/trusted.hosts
ExternalIgnoreList      /etc/opendkim/trusted.hosts

# Oversign From to prevent header-injection / replay of an extra From.
OversignHeaders         From
7
infra/ansible/roles/mail/templates/trusted.hosts.j2
Normal file
@ -0,0 +1,7 @@
# OpenDKIM signing/trusted hosts. Mail whose client matches an entry here is
# SIGNED (InternalHosts) and never treated as external to verify
# (ExternalIgnoreList). The Docker bridge subnet is REQUIRED so campaign mail
# injected by the Listmonk containers is signed -- see roles/mail/defaults.
{% for h in opendkim_internal_hosts %}
{{ h }}
{% endfor %}