diff --git a/docs/email-deliverability-runbook.md b/docs/email-deliverability-runbook.md
index 33d248d..843f1ae 100644
--- a/docs/email-deliverability-runbook.md
+++ b/docs/email-deliverability-runbook.md
@@ -115,3 +115,59 @@ echo $(( ($(date +%s) - $(sudo cat /etc/postfix/pw-warmup-start)) / 86400 ))
 - `/etc/postfix/main.cf.bak.*`
 - `/etc/postfix/transport.bak.*`
 - `/usr/local/bin/pw-mta-warmup.bak.*`
+
+## Incident: Jun 17 2026 — campaign mail sent UNSIGNED (no DKIM)
+
+**Symptom:** "no new sales." Campaigns were sending (~3-4k/day) but delivery was
+~23% (sent 1,802 vs deferred 5,143 + bounced 580), Gmail returned `550-5.7.1
+likely unsolicited mail`, and there were **zero clicks since Jun 8** despite
+~600 opens/day.
+
+**Root cause:** OpenDKIM was signing **nothing** that came from Listmonk.
+`/etc/opendkim.conf` was in single-key mode with **no `InternalHosts`**, so it
+defaulted to signing only `127.0.0.1`. Cron/transactional mail is injected
+locally (127.0.0.1) so it WAS signed — but campaign mail is injected over the
+Docker bridge from the Listmonk containers (`172.18.0.5` trucking,
+`172.18.0.25` healthcare). Those clients were not "internal," so OpenDKIM
+*verified* (instead of *signed*) them: every cold email went out **unsigned**.
+Since Feb 2024 Gmail/Yahoo require DKIM on bulk mail, so unsigned campaigns were
+junked/blocked. Proof: `2,620` campaign messages that day, `0` "DKIM-Signature
+field added" events, while the every-5-min cron mail was signed.
+
+The correct table files already existed (`/etc/opendkim/{key.table,
+signing.table,trusted.hosts}`, and `trusted.hosts` already listed
+`172.16.0.0/12`) — they were simply **never wired into `opendkim.conf`**.
+ +**Fix (now codified in Ansible `roles/mail`):** point `opendkim.conf` at the +tables and set the signing scope — +``` +KeyTable refile:/etc/opendkim/key.table +SigningTable refile:/etc/opendkim/signing.table +InternalHosts /etc/opendkim/trusted.hosts # includes 172.16.0.0/12 (Docker) +ExternalIgnoreList /etc/opendkim/trusted.hosts +OversignHeaders From +``` +then `systemctl restart opendkim`. This fixes BOTH streams at once: the +healthcare submission instances (ports 2526-2528) inherit the global +`smtpd_milters` and the `*@performancewest.net` signing table covers +`compliance@`. Verified by injecting a message from a Docker IP through both +port 25 and port 2526 and confirming "DKIM-Signature field added" for each. + +**Verify DKIM is actually signing campaign mail:** +```bash +# Should be NON-ZERO and roughly track campaign volume: +sudo journalctl -u opendkim --since today | grep -c 'DKIM-Signature field added' +# Cross-check: campaign cleanup events today (should be similar order of magnitude) +sudo grep "^$(date '+%b %e')" /var/log/mail.log | grep -c postfix/cleanup +# Key still matches published DNS: +sudo opendkim-testkey -d performancewest.net -s mail -vvv # expect "key OK" +``` + +**Still TODO from this incident (list quality + content, not yet done):** +- Scrub dead rural/satellite ISPs + dead M365 tenants from audiences and + suppress repeat-deferring/bouncing domains (extend `_email_exclusions.py`). +- Throttle/pause Gmail until reputation recovers (`550-5.7.1` was still firing). +- Add a plaintext (altbody) MIME part — all campaigns are currently HTML-only, + itself a spam signal. +- Fix the self-bounce cron emailing the nonexistent `deploy@performancewest.net` + (~700 self-inflicted `550` bounces/day). 
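The root cause comes down to a membership test: OpenDKIM signs only when the connecting client matches an `InternalHosts` entry, and otherwise treats the message as external. As a rough illustration, here is a simplified JavaScript model of that decision for the three clients named in the incident. It is a sketch under stated assumptions, not OpenDKIM's matching code. It handles only IPv4 literals and CIDR blocks, skipping hostname entries like `localhost` (real OpenDKIM also accepts hostnames and IPv6). The helper names `ipv4ToInt`, `inCidr` and `dkimAction` are hypothetical.

```javascript
// Hypothetical sketch: a simplified model of OpenDKIM choosing between
// signing and verifying based on whether the SMTP client address matches an
// InternalHosts entry. IPv4 literals and CIDR blocks only; hostname entries
// such as "localhost" are skipped. This is NOT OpenDKIM's actual code.

// Convert dotted-quad IPv4 text to an unsigned 32-bit integer.
function ipv4ToInt(ip) {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error(`not an IPv4 address: ${ip}`);
  }
  // ">>> 0" reinterprets the signed bitwise result as unsigned.
  return ((parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]) >>> 0;
}

// True if `ip` falls inside `cidr`; a bare address is treated as a /32.
function inCidr(ip, cidr) {
  const [base, bitsText = "32"] = cidr.split("/");
  const bits = Number(bitsText);
  // JS shift counts are taken mod 32, so a /0 prefix needs an explicit zero mask.
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// Entries this sketch understands: a.b.c.d or a.b.c.d/nn.
const IPV4_ENTRY = /^\d{1,3}(\.\d{1,3}){3}(\/\d{1,2})?$/;

// "sign" if the client matches any IPv4/CIDR entry, otherwise "verify".
function dkimAction(clientIp, internalHosts) {
  const internal = internalHosts.some((entry) => IPV4_ENTRY.test(entry) && inCidr(clientIp, entry));
  return internal ? "sign" : "verify";
}

// Before the fix: no InternalHosts line, so only the 127.0.0.1 default applied.
const BEFORE = ["127.0.0.1"];
// After the fix: the list rendered from opendkim_internal_hosts.
const AFTER = ["127.0.0.1", "localhost", "172.16.0.0/12", "10.0.0.0/8"];

const clients = [
  ["cron (local)", "127.0.0.1"],
  ["Listmonk trucking", "172.18.0.5"],
  ["Listmonk healthcare", "172.18.0.25"],
];
for (const [label, ip] of clients) {
  console.log(`${label}: ${dkimAction(ip, BEFORE)} -> ${dkimAction(ip, AFTER)}`);
}
```

With the default list, both Docker clients fall through to `verify`, which is consistent with the zero "DKIM-Signature field added" count for campaign mail. Adding `172.16.0.0/12` flips them to `sign` and leaves local cron mail unchanged.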
diff --git a/infra/ansible/playbooks/site.yml b/infra/ansible/playbooks/site.yml
index 02500fc..ada4b31 100644
--- a/infra/ansible/playbooks/site.yml
+++ b/infra/ansible/playbooks/site.yml
@@ -15,6 +15,7 @@
 # minio — MinIO object storage + bucket creation
 # workers — Python job server + Ollama LLM
 # shkeeper — k3s + Helm + SHKeeper (crypto payments: BTC/ETH/USDC/Polygon/TRX/BNB/LTC)
+# mail — OpenDKIM signing for outbound Postfix mail (incl. Listmonk campaigns)
 # nginx — nginx + certbot TLS for all domains + fail2ban
 
 - name: Provision Performance West server
@@ -31,6 +32,7 @@
     - workers
     - worker-crons
     - shkeeper
+    - mail
     - nginx
     - monitoring
     - security-updates
diff --git a/infra/ansible/roles/mail/defaults/main.yml b/infra/ansible/roles/mail/defaults/main.yml
new file mode 100644
index 0000000..e027fe6
--- /dev/null
+++ b/infra/ansible/roles/mail/defaults/main.yml
@@ -0,0 +1,22 @@
+---
+# OpenDKIM signing for outbound mail (Postfix milter).
+#
+# CRITICAL: campaign mail is injected into Postfix from the Listmonk containers
+# over the Docker bridge network, NOT from localhost. OpenDKIM only signs mail
+# whose client is in InternalHosts; if the Docker subnet is missing there,
+# OpenDKIM *verifies* (rather than *signs*) campaign mail, so every cold email
+# goes out UNSIGNED. Since Feb 2024 Gmail/Yahoo require DKIM on bulk mail, so
+# unsigned campaigns get junked/blocked (this caused the Jun 2026 deliverability
+# collapse: ~23% delivery, Gmail 550-5.7.1). The Docker subnet below MUST be in
+# opendkim_internal_hosts.
+opendkim_selector: mail
+opendkim_signing_domain: performancewest.net
+opendkim_socket: "inet:8891@localhost"
+
+# Hosts OpenDKIM will SIGN for (vs verify). Must include the Docker bridge
+# subnet so Listmonk container traffic is signed.
+opendkim_internal_hosts:
+  - "127.0.0.1"
+  - "localhost"
+  - "172.16.0.0/12"  # Docker bridge networks (Listmonk, workers, etc.)
+  - "10.0.0.0/8"
diff --git a/infra/ansible/roles/mail/handlers/main.yml b/infra/ansible/roles/mail/handlers/main.yml
new file mode 100644
index 0000000..45e0d0d
--- /dev/null
+++ b/infra/ansible/roles/mail/handlers/main.yml
@@ -0,0 +1,10 @@
+---
+- name: Restart opendkim
+  ansible.builtin.systemd:
+    name: opendkim
+    state: restarted
+
+- name: Reload postfix
+  ansible.builtin.command:
+    cmd: postfix reload
+  changed_when: true
diff --git a/infra/ansible/roles/mail/tasks/main.yml b/infra/ansible/roles/mail/tasks/main.yml
new file mode 100644
index 0000000..c99d787
--- /dev/null
+++ b/infra/ansible/roles/mail/tasks/main.yml
@@ -0,0 +1,98 @@
+---
+- name: Install OpenDKIM + tools
+  ansible.builtin.apt:
+    name:
+      - opendkim
+      - opendkim-tools
+    state: present
+
+- name: Ensure OpenDKIM key directory exists
+  ansible.builtin.file:
+    path: "/etc/opendkim/keys/{{ opendkim_signing_domain }}"
+    state: directory
+    owner: opendkim
+    group: opendkim
+    mode: "0750"
+
+- name: Generate DKIM keypair if missing
+  ansible.builtin.command:
+    cmd: >-
+      opendkim-genkey
+      -b 2048
+      -d {{ opendkim_signing_domain }}
+      -s {{ opendkim_selector }}
+      -D /etc/opendkim/keys/{{ opendkim_signing_domain }}
+    creates: "/etc/opendkim/keys/{{ opendkim_signing_domain }}/{{ opendkim_selector }}.private"
+  register: dkim_keygen
+
+- name: Fix DKIM private key ownership
+  ansible.builtin.file:
+    path: "/etc/opendkim/keys/{{ opendkim_signing_domain }}/{{ opendkim_selector }}.private"
+    owner: opendkim
+    group: opendkim
+    mode: "0600"
+
+- name: Show DKIM public DNS record to publish (only when newly generated)
+  ansible.builtin.debug:
+    msg: >-
+      A new DKIM key was generated. Publish the TXT record from
+      /etc/opendkim/keys/{{ opendkim_signing_domain }}/{{ opendkim_selector }}.txt
+      at {{ opendkim_selector }}._domainkey.{{ opendkim_signing_domain }}
+  when: dkim_keygen is changed
+
+- name: Deploy OpenDKIM KeyTable
+  ansible.builtin.copy:
+    dest: /etc/opendkim/key.table
+    content: |
+      {{ opendkim_selector }}._domainkey.{{ opendkim_signing_domain }} {{ opendkim_signing_domain }}:{{ opendkim_selector }}:/etc/opendkim/keys/{{ opendkim_signing_domain }}/{{ opendkim_selector }}.private
+    owner: root
+    group: root
+    mode: "0644"
+  notify: Restart opendkim
+
+- name: Deploy OpenDKIM SigningTable
+  ansible.builtin.copy:
+    dest: /etc/opendkim/signing.table
+    content: |
+      *@{{ opendkim_signing_domain }} {{ opendkim_selector }}._domainkey.{{ opendkim_signing_domain }}
+    owner: root
+    group: root
+    mode: "0644"
+  notify: Restart opendkim
+
+- name: Deploy OpenDKIM trusted/internal hosts (MUST include Docker subnet)
+  ansible.builtin.template:
+    src: trusted.hosts.j2
+    dest: /etc/opendkim/trusted.hosts
+    owner: root
+    group: root
+    mode: "0644"
+  notify: Restart opendkim
+
+- name: Deploy opendkim.conf (table signing + InternalHosts)
+  ansible.builtin.template:
+    src: opendkim.conf.j2
+    dest: /etc/opendkim.conf
+    owner: root
+    group: root
+    mode: "0644"
+    validate: "opendkim -n -f -x %s"
+  notify: Restart opendkim
+
+- name: Ensure OpenDKIM is enabled and running
+  ansible.builtin.systemd:
+    name: opendkim
+    enabled: true
+    state: started
+
+- name: Wire Postfix to the OpenDKIM milter
+  ansible.builtin.command:
+    cmd: "postconf -e {{ item }}"
+  loop:
+    - "smtpd_milters={{ opendkim_socket }}"
+    - "non_smtpd_milters={{ opendkim_socket }}"
+    - "milter_default_action=accept"
+    - "milter_protocol=6"
+  register: postfix_milter
+  changed_when: true  # notify only fires on "changed"; with false, Postfix was never reloaded
+  notify: Reload postfix
diff --git a/infra/ansible/roles/mail/templates/opendkim.conf.j2 b/infra/ansible/roles/mail/templates/opendkim.conf.j2
new file mode 100644
index 0000000..190f4ae
--- /dev/null
+++ b/infra/ansible/roles/mail/templates/opendkim.conf.j2
@@ -0,0 +1,22 @@
+Syslog yes
+SyslogSuccess yes
+LogWhy yes
+Mode s
+Canonicalization relaxed/simple
+Socket {{ opendkim_socket }}
+PidFile /run/opendkim/opendkim.pid
+UserID opendkim:opendkim
+UMask 007
+
+# Multi-domain table-based signing. Lets us add domains/selectors without
+# touching the daemon config.
+KeyTable refile:/etc/opendkim/key.table
+SigningTable refile:/etc/opendkim/signing.table
+
+# Hosts we SIGN for (must include the Docker bridge subnet so Listmonk
+# container campaign mail is signed, not just localhost cron mail).
+InternalHosts /etc/opendkim/trusted.hosts
+ExternalIgnoreList /etc/opendkim/trusted.hosts
+
+# Oversign From to prevent header-injection / replay of an extra From.
+OversignHeaders From
diff --git a/infra/ansible/roles/mail/templates/trusted.hosts.j2 b/infra/ansible/roles/mail/templates/trusted.hosts.j2
new file mode 100644
index 0000000..4797ed7
--- /dev/null
+++ b/infra/ansible/roles/mail/templates/trusted.hosts.j2
@@ -0,0 +1,7 @@
+# OpenDKIM signing/trusted hosts. Mail whose client matches an entry here is
+# SIGNED (InternalHosts), and is not flagged as an external host sending as a
+# signing domain (ExternalIgnoreList). The Docker bridge subnet is REQUIRED so
+# campaign mail injected by the Listmonk containers is signed; see roles/mail/defaults.
+{% for h in opendkim_internal_hosts %}
+{{ h }}
+{% endfor %}
diff --git a/scripts/rescue-paul-set-password.mjs b/scripts/rescue-paul-set-password.mjs
new file mode 100644
index 0000000..6b87920
--- /dev/null
+++ b/scripts/rescue-paul-set-password.mjs
@@ -0,0 +1,69 @@
+/**
+ * One-off: send Paul Wilson (Compound Technologies, Inc) a fresh password-set
+ * link so he can log in to the portal.
+ *
+ * Context: customer portal auth now uses ERPNext as the single source of truth
+ * for passwords (commit 9c87759). Paul's old Postgres password is no longer
+ * used for login, and his previous 7-day set-password link has expired. This
+ * mints a fresh 7-day reset token and emails ONLY the set-password link
+ * (NOT the earlier "next steps" email). When he clicks it, /reset-password
+ * writes his chosen password to ERPNext. CC justin@performancewest.net.
+ *
+ * Run in the api container (uses its DATABASE_URL + SMTP_* env), piped via stdin:
+ *   docker exec -i performancewest-api-1 node --input-type=module < scripts/rescue-paul-set-password.mjs
+ */
+import pg from "pg";
+import crypto from "crypto";
+import nodemailer from "nodemailer";
+
+const EMAIL = "synthetic@pipeline.com";
+const CC = "justin@performancewest.net";
+const NAME = "Paul Wilson";
+const SITE = process.env.DOMAIN ? `https://${process.env.DOMAIN}` : "https://performancewest.net";
+const firstName = NAME.split(" ")[0];
+
+const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });
+const mailer = nodemailer.createTransport({
+  host: process.env.SMTP_HOST || "co.carrierone.com",
+  port: parseInt(process.env.SMTP_PORT || "587", 10),
+  secure: false,
+  auth: { user: process.env.SMTP_USER, pass: process.env.SMTP_PASS },
+});
+const FROM = process.env.SMTP_FROM || "Performance West ";
+const log = (m) => console.log("[rescue] " + m);
+
+// Look up his customers row (portal profile + reset-token owner).
+const cust = await pool.query(`SELECT id, email FROM customers WHERE email = $1`, [EMAIL]);
+if (cust.rows.length === 0) throw new Error(`no customers row for ${EMAIL}`);
+const customer = cust.rows[0];
+log(`customers row id=${customer.id} email=${customer.email}`);
+
+// Mint a fresh 7-day reset token.
+const token = crypto.randomBytes(32).toString("hex");
+const expires = new Date(Date.now() + 7 * 24 * 60 * 60 * 1000);
+await pool.query(
+  `INSERT INTO password_reset_tokens (customer_id, token, expires_at) VALUES ($1, $2, $3)`,
+  [customer.id, token, expires],
+);
+const resetLink = `${SITE}/account/reset-password?token=${token}`;
+log(`reset token minted, expires ${expires.toISOString()}`);
+
+await mailer.sendMail({
+  from: FROM, to: EMAIL, cc: CC,
+  subject: "Set your Performance West password to log in",
+  html: `
+    <h2>Set your password</h2>
+    <p>Hi ${firstName},</p>
+    <p>To log in to the Performance West portal and track your filings, click below to
+    choose your password. This link is valid for 7 days.</p>
+    <p><a href="${resetLink}">Set my password →</a></p>
+    <p>Or paste this link into your browser:<br>${resetLink}</p>
+    <p>Once you're in, you can view your orders and complete any remaining intake forms. Questions? Reply to this email or call 1-888-411-0383.</p>
+    <p>Performance West Inc. · performancewest.net · 1-888-411-0383</p>
+  `,
+  text: `Hi ${firstName}, set your Performance West password to log in: ${resetLink} (valid for 7 days). Questions? 1-888-411-0383.`,
+});
+log(`password-set link sent to ${EMAIL} (cc ${CC})`);
+
+await pool.end();
+log("DONE");