docs+infra(deliverability): document bulk subdomain; ansible signs send.performancewest.net
- infra/ansible/roles/mail: refactor OpenDKIM to support multiple signing domains via opendkim_signing_domains list (root + send.performancewest.net). Loops keygen/ownership/keytable/signingtable so the live two-domain setup is reproducible from ansible.
- infra/ansible group_vars: add bulk_mail_subdomain + campaign_from_* + campaign_reply_to documentation vars (map to CAMPAIGN_FROM / HC_CAMPAIGN_FROM env read by the builder scripts). smtp_from (transactional) stays on root.
- docs/deliverability.md: rewrite TL;DR with the carrierone-vs-performancewest A/B proof (same server/IPs, different From domain -> Inbox vs Junk) and the ~85% Microsoft / 14% Google / <1% Yahoo audience mix; add the bulk-subdomain section, SPF trim, rehab-disabled, and the Hestia DNS automation runbook.
parent 5c3b4291e7
commit 3ca960aca5
4 changed files with 158 additions and 29 deletions
|
|
@@ -2,45 +2,130 @@
|
||||||
|
|
||||||
**Owner action items are marked 🔴 MANUAL. Everything else is already done/automated.**
|
**Owner action items are marked 🔴 MANUAL. Everything else is already done/automated.**
|
||||||
|
|
||||||
Last updated: 2026-06-18 (IP consolidation + monitoring-tools setup).
|
Last updated: 2026-06-19 (bulk subdomain + SPF trim + Microsoft/audience analysis).
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## TL;DR of the 2026-06-18 deliverability incident
|
## TL;DR of the 2026-06-18/19 deliverability incident
|
||||||
|
|
||||||
- **Symptom:** ~30% "open" rates but **0 human clicks, 0 sales** across both trucking
|
- **Symptom:** ~30% "open" rates but **0 human clicks, 0 sales** across both trucking
|
||||||
and healthcare streams.
|
and healthcare streams.
|
||||||
- **Root cause:** NOT a blocklist. Swept all 21 sending IPs against ~40 RBLs
|
- **Root cause:** NOT a blocklist, NOT the IPs. Proven by a controlled A/B test
|
||||||
(Spamhaus via authoritative NS, Barracuda, SpamCop, SORBS, UCEPROTECT L1/2/3,
|
(2026-06-19): from the **same mail server / same IPs**, a message From
|
||||||
Mailspike, SpamRATS, etc.) -> **every IP clean.** The real problem was
|
`justin@carrierone.com` landed in the **Inbox** while From
|
||||||
**domain reputation**: Gmail rejected ~150 msgs/day with
|
`justin@performancewest.net` went to **Junk**. The variable is the **From
|
||||||
`550-5.7.1 ... very low reputation of the sending domain`. We were
|
domain's reputation**. `carrierone.com` (reg. 2006, years of steady low-volume
|
||||||
**snowshoeing** ~3k trucking msgs/day across 12 IPs + ~1.2k healthcare across
|
mail, tight 2-IP SPF) is trusted; `performancewest.net` (only started bulk in
|
||||||
3 IPs, so no single IP sent enough per-receiver volume to build reputation.
|
~May 2026, broken DKIM until 2026-06-17, 21-IP snowshoe SPF, May 30-31
|
||||||
This rotation was a band-aid for the **broken DKIM** (fixed 2026-06-17) and the
|
over-volume blast) is cold/damaged.
|
||||||
May 30-31 over-volume blast.
|
- **Where the audience actually is (24h receiver mix):** **~85% Microsoft**
|
||||||
- **Fix applied:** consolidated to ONE IP per stream (below) so each accrues real
|
(M365/Outlook/Hotmail), ~14% Google, <1% Yahoo. Our list is B2B, so Microsoft
|
||||||
reputation now that DKIM signs correctly.
|
is the game, not Gmail. **Microsoft is NOT reputation-blocking us** (only ~1.6%
|
||||||
|
5.7.x/S3150 rejects; it accepts ~2,138 msgs/24h) — but acceptance != inbox, so
|
||||||
|
the engagement problem there is likely Junk-foldering, same domain-reputation
|
||||||
|
cause. Gmail rejects ~95% of its (smaller) slice on `550-5.7.1 ... very low
|
||||||
|
reputation of the sending domain`. The single biggest bounce bucket is actually
|
||||||
|
**list hygiene**: ~1,012/24h Microsoft `451 4.4.4 no mail-enabled subscriptions`
|
||||||
|
(dead tenant domains) + dead recipients.
|
||||||
|
- **Fixes applied (2026-06-18/19):**
|
||||||
|
1. Consolidated to ONE IP per stream (snowshoe was a band-aid for broken DKIM).
|
||||||
|
2. **Dedicated bulk subdomain** `send.performancewest.net` so bulk reputation is
|
||||||
|
isolated from the root domain (which stays clean for transactional mail).
|
||||||
|
3. Trimmed root SPF from 21 IPs to the real 3 (the bloated record was itself a
|
||||||
|
snowshoe signal).
|
||||||
|
4. Disabled the pointless `pw-ip-rehab` cron (we have no IP reputation problem).
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Sending architecture (after 2026-06-18 consolidation)
|
## Bulk subdomain: send.performancewest.net (2026-06-19)
|
||||||
|
|
||||||
|
**Why:** isolate bulk/cold-campaign sending reputation from the root domain. The
|
||||||
|
root domain carries transactional/verification/receipt mail (via co.carrierone.com
|
||||||
|
relay + the .71 default egress) and must stay clean; cold campaigns are inherently
|
||||||
|
reputation-risky. Industry-standard (SendGrid/Mailchimp/etc.) split.
|
||||||
|
|
||||||
|
**Customer experience is unchanged:** From is the subdomain, but **Reply-To stays
|
||||||
|
`info@performancewest.net`**, so replies land in the real inbox and look normal.
|
||||||
|
|
||||||
|
| Piece | Value |
|
||||||
|
|-------|-------|
|
||||||
|
| Trucking From | `Performance West <noreply@send.performancewest.net>` |
|
||||||
|
| Healthcare From | `Performance West Compliance <compliance@send.performancewest.net>` |
|
||||||
|
| Reply-To (both) | `info@performancewest.net` |
|
||||||
|
| DKIM selector | `send` (`send._domainkey.send.performancewest.net`), 2048-bit |
|
||||||
|
| SPF | `v=spf1 ip4:207.174.124.94 ip4:207.174.124.107 -all` |
|
||||||
|
| DMARC | inherits root `p=reject` (explicit `_dmarc.send` also published) |
|
||||||
|
| MX / Return-Path | `co.carrierone.com` (bounces) |
|
||||||
|
| Egress IPs | .94 (trucking) / .107 (HC) — unchanged |
|
||||||
|
|
||||||
|
**Code:** `from_email` is set in `scripts/build_trucking_campaigns.py` (`FROM_EMAIL`,
|
||||||
|
env `CAMPAIGN_FROM`) and `scripts/build_healthcare_campaigns_cron.py` (`FROM_EMAIL`,
|
||||||
|
env `HC_CAMPAIGN_FROM`). Bounce-watchers (`scripts/bounce-watcher.sh`,
|
||||||
|
`scripts/hc-bounce-watcher.sh`) track the new subdomain sender (and keep the legacy
|
||||||
|
root sender so the pre-cutover queue drains).
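A minimal sketch of how a builder script could resolve its From header from the environment, assuming the variables behave as described above. The helper names `resolve_from` and `from_domain` are hypothetical, and the real scripts may read `CAMPAIGN_FROM` / `HC_CAMPAIGN_FROM` differently:

```python
import os
from email.utils import parseaddr

# Defaults taken from the table above; the helpers below are illustrative only.
TRUCKING_DEFAULT = "Performance West <noreply@send.performancewest.net>"
HEALTHCARE_DEFAULT = (
    "Performance West Compliance <compliance@send.performancewest.net>"
)


def resolve_from(env_var, default, environ=None):
    """Return the From header from `env_var`, falling back to `default`."""
    environ = os.environ if environ is None else environ
    value = (environ.get(env_var) or "").strip() or default
    _name, addr = parseaddr(value)
    if "@" not in addr:
        raise ValueError(f"{env_var} is not a usable address: {value!r}")
    return value


def from_domain(from_header):
    """Return the lowercased domain part of a From header."""
    _name, addr = parseaddr(from_header)
    return addr.rpartition("@")[2].lower()


trucking_from = resolve_from("CAMPAIGN_FROM", TRUCKING_DEFAULT, environ={})
healthcare_from = resolve_from("HC_CAMPAIGN_FROM", HEALTHCARE_DEFAULT, environ={})
```

Checking `from_domain(...)` against `send.performancewest.net` before a campaign is built is one cheap way to catch a stale override that still points at the root domain.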
|
||||||
|
|
||||||
|
**Infra:** OpenDKIM signs both domains — see `infra/ansible/roles/mail`
|
||||||
|
(`opendkim_signing_domains` list generates per-domain keys + KeyTable/SigningTable).
|
||||||
|
DNS published on the Hestia master (see DNS automation note below). Verified
|
||||||
|
end-to-end 2026-06-19: a test send signs `d=send.performancewest.net; s=send;` and
|
||||||
|
egresses out05/.94.
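The end-to-end verification above comes down to reading the `d=` and `s=` tags of the `DKIM-Signature` header on a received test message. A minimal sketch of that check, assuming the raw header value is already in hand; the function name `dkim_tags` and the sample header (with placeholder `bh=`/`b=` values) are illustrative and not taken from the repo:

```python
def dkim_tags(header_value: str) -> dict:
    """Parse a DKIM-Signature header value into a tag -> value dict."""
    tags = {}
    # Unfold the header (drop CR/LF), then split on the ';' tag separator.
    unfolded = header_value.replace("\r", "").replace("\n", "")
    for part in unfolded.split(";"):
        if "=" not in part:
            continue
        # Split on the first '=' only, so base64 padding in b=/bh= survives.
        name, _, value = part.partition("=")
        # Folding whitespace inside a value is not significant.
        tags[name.strip()] = "".join(value.split())
    return tags


# Illustrative header only; not a real signature.
sample = (
    "v=1; a=rsa-sha256; c=relaxed/simple; d=send.performancewest.net;\r\n"
    "\ts=send; h=from:to:subject; bh=PLACEHOLDER; b=PLACEHOLDER"
)
tags = dkim_tags(sample)
signed_by_bulk = (
    tags.get("d") == "send.performancewest.net" and tags.get("s") == "send"
)
```

This only confirms which domain and selector signed the message. It does not verify the signature itself, which needs the public key from DNS.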
|
||||||
|
|
||||||
|
**Listmonk global `app.from_email`** was also updated in both DBs as a fallback for
|
||||||
|
any UI/test send that doesn't set From explicitly.
|
||||||
|
|
||||||
|
> ⚠️ The subdomain starts at NEUTRAL reputation (not negative, not warm). It still
|
||||||
|
> needs the same warm-up discipline: steady low volume to engaged recipients. It is
|
||||||
|
> NOT a magic reset — but it protects the root domain and starts cleaner than the
|
||||||
|
> damaged root.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Sending architecture (after 2026-06-18/19 consolidation)
|
||||||
|
|
||||||
| Stream | IP | PTR / HELO | Path |
|
| Stream | IP | PTR / HELO | Path |
|
||||||
|--------|----|-----------|----|
|
|--------|----|-----------|----|
|
||||||
| **Trucking** (listmonk) | **207.174.124.94** | mta05.performancewest.net | listmonk -> :25 -> `randmap:{out05:}` |
|
| **Trucking** (listmonk) | **207.174.124.94** | mta05.performancewest.net | listmonk -> :25 -> `randmap:{out05:}` |
|
||||||
| **Healthcare** (listmonk-hc) | **207.174.124.107** | hcmta01.performancewest.net | listmonk-hc SMTP server 1 -> :2526 -> hcout1 |
|
| **Healthcare** (listmonk-hc) | **207.174.124.107** | hcmta01.performancewest.net | listmonk-hc SMTP server 1 -> :2526 -> hcout1 |
|
||||||
|
| Transactional / verification | 207.174.124.71 + co.carrierone.com (.15) | perfwest | default `smtp_bind_address` (.71) + :587 relay (.15) |
|
||||||
| Yahoo/AOL trickle | 207.174.124.90 | mta01 | `yahooslow` transport (hash:transport) |
|
| Yahoo/AOL trickle | 207.174.124.90 | mta01 | `yahooslow` transport (hash:transport) |
|
||||||
| Transactional | 207.174.124.71 | perfwest | default `smtp_bind_address` |
|
| Retired (torched May 30-31) | .91 / .92 / .93 | mta02-04 | rehab02-04 — **`pw-ip-rehab` cron DISABLED 2026-06-19** |
|
||||||
| Retired (torched May 30-31) | .91 / .92 / .93 | mta02-04 | rehab02-04 (reputation rebuild only) |
|
|
||||||
| Dormant (re-expand later) | .95-.105, .108-.109 | mta06-17, hcmta02-03 | disabled |
|
| Dormant (re-expand later) | .95-.105, .108-.109 | mta06-17, hcmta02-03 | disabled |
|
||||||
|
|
||||||
|
**Root SPF (trimmed 2026-06-19):** `v=spf1 a mx ip4:207.174.124.15
|
||||||
|
ip4:207.174.124.94 ip4:207.174.124.107 -all` — `a`=.71, `mx`=co.carrierone.com(.15),
|
||||||
|
plus the two bulk IPs. The old 21-IP record was a snowshoe signal; this matches
|
||||||
|
carrierone.com's tight style.
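A small sketch for sanity-checking which egress IPs the trimmed records cover, using only the `ip4:` mechanisms. The record strings are copied from this document; the helper names are hypothetical, and the `a` / `mx` mechanisms are deliberately ignored here because resolving them needs DNS lookups:

```python
import ipaddress

ROOT_SPF = (
    "v=spf1 a mx ip4:207.174.124.15 "
    "ip4:207.174.124.94 ip4:207.174.124.107 -all"
)
SEND_SPF = "v=spf1 ip4:207.174.124.94 ip4:207.174.124.107 -all"


def ip4_networks(spf: str) -> list:
    """Return the networks named by ip4: mechanisms (a/mx/include ignored)."""
    nets = []
    for term in spf.split():
        # Drop an optional qualifier (+, -, ~, ?) before the mechanism name.
        mech = term.lstrip("+-~?")
        if mech.lower().startswith("ip4:"):
            nets.append(ipaddress.ip_network(mech[4:], strict=False))
    return nets


def ip4_authorized(spf: str, ip: str) -> bool:
    """True if `ip` matches one of the record's ip4: mechanisms."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ip4_networks(spf))


trucking_ok = ip4_authorized(SEND_SPF, "207.174.124.94")
yahoo_trickle_ok = ip4_authorized(ROOT_SPF, "207.174.124.90")
```

Under this check `.71` reports as not authorized by the root record, which is expected: per the note above it is covered by the `a` mechanism rather than an `ip4:` entry.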
|
||||||
|
|
||||||
**To re-expand after reputation is established:** add transports back to `ALL=()`
|
**To re-expand after reputation is established:** add transports back to `ALL=()`
|
||||||
in `infra/postfix/pw-mta-warmup.sh` and re-enable the HC SMTP servers (ports
|
in `infra/postfix/pw-mta-warmup.sh` and re-enable the HC SMTP servers (ports
|
||||||
2527/2528) in the `listmonk_hc` DB `settings.smtp`. Re-expand SLOWLY (one IP at a
|
2527/2528) in the `listmonk_hc` DB `settings.smtp`. Re-expand SLOWLY (one IP at a
|
||||||
time, days apart) and only after Postmaster Tools shows a green/medium reputation.
|
time, days apart) and only after Postmaster Tools shows a green/medium reputation.
|
||||||
|
If you re-expand, also add the IPs back to BOTH the root SPF and the `send`
|
||||||
|
subdomain SPF.
|
||||||
|
|
||||||
SPF authorizes the whole `.71/.90-.109` set already — harmless, gives flexibility.
|
---
|
||||||
|
|
||||||
|
## DNS automation (Hestia is the master)
|
||||||
|
|
||||||
|
**DNS is fully automatable** — Hestia (`cp.carrierone.com`, 207.174.124.22) is the
|
||||||
|
DNS master; HE.net are slaves. Access: `ssh -p 22022 root@cp.carrierone.com` using
|
||||||
|
the **local workstation's** `~/.ssh/id_ed25519` (NOT the app server, NOT justin@
|
||||||
|
which is SFTP-only). The `justin` Hestia user owns the `performancewest.net` zone.
|
||||||
|
|
||||||
|
```
|
||||||
|
# add (note: Hestia appends the base domain to the RECORD name, so a record at
|
||||||
|
# send._domainkey.send.performancewest.net needs RECORD = "send._domainkey.send")
|
||||||
|
v-add-dns-record justin performancewest.net "<record>" <TYPE> "<value>" [prio]
|
||||||
|
# change / delete (find the numeric id with v-list-dns-records ... plain)
|
||||||
|
v-change-dns-record justin performancewest.net <id> "<record>" <TYPE> "<value>" "" yes <ttl>
|
||||||
|
v-delete-dns-record justin performancewest.net <id>
|
||||||
|
# list
|
||||||
|
v-list-dns-records justin performancewest.net plain
|
||||||
|
```
|
||||||
|
|
||||||
|
Each write triggers a ~30s zone rebuild + DNSSEC re-sign; slaves sync via NOTIFY /
|
||||||
|
SOA refresh, usually within a minute. Verify on `@8.8.8.8` AND the master
|
||||||
|
`@207.174.124.22` (the master is authoritative; public resolvers may lag).
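Because Hestia appends the base domain to the RECORD argument, it is easy to publish a record one label too deep. A minimal sketch of the name conversion, assuming the zone is `performancewest.net`; the helper `hestia_record_name` is hypothetical, and returning `@` for the zone apex is an assumption about Hestia's convention rather than something stated in this runbook:

```python
def hestia_record_name(fqdn: str, zone: str) -> str:
    """Return the RECORD argument for a name inside `zone`."""
    fqdn = fqdn.rstrip(".").lower()
    zone = zone.rstrip(".").lower()
    if fqdn == zone:
        return "@"  # assumed apex convention
    suffix = "." + zone
    if not fqdn.endswith(suffix):
        raise ValueError(f"{fqdn!r} is not inside zone {zone!r}")
    # Strip the zone so Hestia can append it again.
    return fqdn[: -len(suffix)]


dkim_record = hestia_record_name(
    "send._domainkey.send.performancewest.net", "performancewest.net"
)
dmarc_record = hestia_record_name(
    "_dmarc.send.performancewest.net", "performancewest.net"
)
```

The resulting values are what would go in the `"<record>"` position of `v-add-dns-record`.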
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@@ -80,6 +80,21 @@ smtp_pass: "{{ vault_smtp_pass }}"
|
||||||
smtp_from: "Performance West <noreply@performancewest.net>"
|
smtp_from: "Performance West <noreply@performancewest.net>"
|
||||||
smtp_admin_email: ops@performancewest.net
|
smtp_admin_email: ops@performancewest.net
|
||||||
|
|
||||||
|
# ── Bulk campaign From (Listmonk) ────────────────────────────────────────────
|
||||||
|
# Cold/bulk campaign mail is sent From a dedicated bulk subdomain so its sending
|
||||||
|
# reputation is ISOLATED from the root domain. The root domain (smtp_from above)
|
||||||
|
# carries transactional/verification/receipt mail and stays clean. Replies still
|
||||||
|
# route to the root domain via Reply-To, so the customer reply experience is
|
||||||
|
# unchanged. These map to the CAMPAIGN_FROM / HC_CAMPAIGN_FROM env vars read by
|
||||||
|
# scripts/build_trucking_campaigns.py and build_healthcare_campaigns_cron.py.
|
||||||
|
# See docs/deliverability.md. The subdomain's DNS (A/MX/SPF/DKIM selector=send/
|
||||||
|
# DMARC) is published on the Hestia DNS master; OpenDKIM signs it (see role mail,
|
||||||
|
# opendkim_signing_domains).
|
||||||
|
bulk_mail_subdomain: send.performancewest.net
|
||||||
|
campaign_from_trucking: "Performance West <noreply@send.performancewest.net>"
|
||||||
|
campaign_from_healthcare: "Performance West Compliance <compliance@send.performancewest.net>"
|
||||||
|
campaign_reply_to: info@performancewest.net
|
||||||
|
|
||||||
# ── Listmonk (mass-mail via the LOCAL MTA) ───────────────────────────────────
|
# ── Listmonk (mass-mail via the LOCAL MTA) ───────────────────────────────────
|
||||||
# Listmonk SMTP is configured via its web admin UI, not env vars. Listmonk relays
|
# Listmonk SMTP is configured via its web admin UI, not env vars. Listmonk relays
|
||||||
# through the host Postfix (172.18.0.1:25 from inside the Docker network), which
|
# through the host Postfix (172.18.0.1:25 from inside the Docker network), which
|
||||||
|
|
|
||||||
|
|
@@ -13,6 +13,19 @@ opendkim_selector: mail
|
||||||
opendkim_signing_domain: performancewest.net
|
opendkim_signing_domain: performancewest.net
|
||||||
opendkim_socket: "inet:8891@localhost"
|
opendkim_socket: "inet:8891@localhost"
|
||||||
|
|
||||||
|
# Signing domains. The root domain carries transactional/verification mail; the
|
||||||
|
# dedicated bulk subdomain (send.performancewest.net) carries Listmonk campaign
|
||||||
|
# mail so its sending reputation is isolated from the root domain (which then
|
||||||
|
# stays clean and recovers faster). Each entry generates its own key + selector
|
||||||
|
# and contributes a line to KeyTable/SigningTable. The first entry is treated as
|
||||||
|
# the primary (kept for backwards-compat with opendkim_signing_domain above).
|
||||||
|
# See docs/deliverability.md.
|
||||||
|
opendkim_signing_domains:
|
||||||
|
- domain: "{{ opendkim_signing_domain }}"
|
||||||
|
selector: "{{ opendkim_selector }}"
|
||||||
|
- domain: "send.performancewest.net"
|
||||||
|
selector: "send"
|
||||||
|
|
||||||
# Hosts OpenDKIM will SIGN for (vs verify). Must include the Docker bridge
|
# Hosts OpenDKIM will SIGN for (vs verify). Must include the Docker bridge
|
||||||
# subnet so Listmonk container traffic is signed.
|
# subnet so Listmonk container traffic is signed.
|
||||||
opendkim_internal_hosts:
|
opendkim_internal_hosts:
|
||||||
|
|
|
||||||
|
|
@@ -8,43 +8,57 @@
|
||||||
|
|
||||||
- name: Ensure OpenDKIM key directory exists
|
- name: Ensure OpenDKIM key directory exists
|
||||||
ansible.builtin.file:
|
ansible.builtin.file:
|
||||||
path: "/etc/opendkim/keys/{{ opendkim_signing_domain }}"
|
path: "/etc/opendkim/keys/{{ item.domain }}"
|
||||||
state: directory
|
state: directory
|
||||||
owner: opendkim
|
owner: opendkim
|
||||||
group: opendkim
|
group: opendkim
|
||||||
mode: "0750"
|
mode: "0750"
|
||||||
|
loop: "{{ opendkim_signing_domains }}"
|
||||||
|
loop_control:
|
||||||
|
label: "{{ item.domain }}"
|
||||||
|
|
||||||
- name: Generate DKIM keypair if missing
|
- name: Generate DKIM keypair if missing
|
||||||
ansible.builtin.command:
|
ansible.builtin.command:
|
||||||
cmd: >-
|
cmd: >-
|
||||||
opendkim-genkey
|
opendkim-genkey
|
||||||
-b 2048
|
-b 2048
|
||||||
-d {{ opendkim_signing_domain }}
|
-d {{ item.domain }}
|
||||||
-s {{ opendkim_selector }}
|
-s {{ item.selector }}
|
||||||
-D /etc/opendkim/keys/{{ opendkim_signing_domain }}
|
-D /etc/opendkim/keys/{{ item.domain }}
|
||||||
creates: "/etc/opendkim/keys/{{ opendkim_signing_domain }}/{{ opendkim_selector }}.private"
|
creates: "/etc/opendkim/keys/{{ item.domain }}/{{ item.selector }}.private"
|
||||||
|
loop: "{{ opendkim_signing_domains }}"
|
||||||
|
loop_control:
|
||||||
|
label: "{{ item.domain }} ({{ item.selector }})"
|
||||||
register: dkim_keygen
|
register: dkim_keygen
|
||||||
|
|
||||||
- name: Fix DKIM private key ownership
|
- name: Fix DKIM private key ownership
|
||||||
ansible.builtin.file:
|
ansible.builtin.file:
|
||||||
path: "/etc/opendkim/keys/{{ opendkim_signing_domain }}/{{ opendkim_selector }}.private"
|
path: "/etc/opendkim/keys/{{ item.domain }}/{{ item.selector }}.private"
|
||||||
owner: opendkim
|
owner: opendkim
|
||||||
group: opendkim
|
group: opendkim
|
||||||
mode: "0600"
|
mode: "0600"
|
||||||
|
loop: "{{ opendkim_signing_domains }}"
|
||||||
|
loop_control:
|
||||||
|
label: "{{ item.domain }}"
|
||||||
|
|
||||||
- name: Show DKIM public DNS record to publish (only when newly generated)
|
- name: Show DKIM public DNS records to publish (only when newly generated)
|
||||||
ansible.builtin.debug:
|
ansible.builtin.debug:
|
||||||
msg: >-
|
msg: >-
|
||||||
A new DKIM key was generated. Publish the TXT record from
|
A new DKIM key was generated. Publish the TXT record from
|
||||||
/etc/opendkim/keys/{{ opendkim_signing_domain }}/{{ opendkim_selector }}.txt
|
/etc/opendkim/keys/{{ item.item.domain }}/{{ item.item.selector }}.txt
|
||||||
at {{ opendkim_selector }}._domainkey.{{ opendkim_signing_domain }}
|
at {{ item.item.selector }}._domainkey.{{ item.item.domain }}
|
||||||
when: dkim_keygen is changed
|
loop: "{{ dkim_keygen.results }}"
|
||||||
|
loop_control:
|
||||||
|
label: "{{ item.item.domain }}"
|
||||||
|
when: item is changed
|
||||||
|
|
||||||
- name: Deploy OpenDKIM KeyTable
|
- name: Deploy OpenDKIM KeyTable
|
||||||
ansible.builtin.copy:
|
ansible.builtin.copy:
|
||||||
dest: /etc/opendkim/key.table
|
dest: /etc/opendkim/key.table
|
||||||
content: |
|
content: |
|
||||||
{{ opendkim_selector }}._domainkey.{{ opendkim_signing_domain }} {{ opendkim_signing_domain }}:{{ opendkim_selector }}:/etc/opendkim/keys/{{ opendkim_signing_domain }}/{{ opendkim_selector }}.private
|
{% for d in opendkim_signing_domains %}
|
||||||
|
{{ d.selector }}._domainkey.{{ d.domain }} {{ d.domain }}:{{ d.selector }}:/etc/opendkim/keys/{{ d.domain }}/{{ d.selector }}.private
|
||||||
|
{% endfor %}
|
||||||
owner: root
|
owner: root
|
||||||
group: root
|
group: root
|
||||||
mode: "0644"
|
mode: "0644"
|
||||||
|
|
@@ -54,7 +68,9 @@
|
||||||
ansible.builtin.copy:
|
ansible.builtin.copy:
|
||||||
dest: /etc/opendkim/signing.table
|
dest: /etc/opendkim/signing.table
|
||||||
content: |
|
content: |
|
||||||
*@{{ opendkim_signing_domain }} {{ opendkim_selector }}._domainkey.{{ opendkim_signing_domain }}
|
{% for d in opendkim_signing_domains %}
|
||||||
|
*@{{ d.domain }} {{ d.selector }}._domainkey.{{ d.domain }}
|
||||||
|
{% endfor %}
|
||||||
owner: root
|
owner: root
|
||||||
group: root
|
group: root
|
||||||
mode: "0644"
|
mode: "0644"
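To preview what the two templated tables should contain for the two-domain setup, the Jinja loops above can be mirrored in a few lines of Python. This is a sketch for review purposes, not part of the role: the function names are hypothetical, and the list assumes the defaults shown in this diff (`opendkim_selector: mail`, `opendkim_signing_domain: performancewest.net`):

```python
# Mirrors opendkim_signing_domains with the defaults shown in this diff.
signing_domains = [
    {"domain": "performancewest.net", "selector": "mail"},
    {"domain": "send.performancewest.net", "selector": "send"},
]


def key_table(domains) -> str:
    """Render one KeyTable line per signing domain."""
    lines = []
    for d in domains:
        name = f"{d['selector']}._domainkey.{d['domain']}"
        key_path = f"/etc/opendkim/keys/{d['domain']}/{d['selector']}.private"
        lines.append(f"{name} {d['domain']}:{d['selector']}:{key_path}")
    return "\n".join(lines) + "\n"


def signing_table(domains) -> str:
    """Render one SigningTable line per signing domain."""
    return "".join(
        f"*@{d['domain']} {d['selector']}._domainkey.{d['domain']}\n"
        for d in domains
    )


expected_key_table = key_table(signing_domains)
expected_signing_table = signing_table(signing_domains)
```

Comparing this output with `/etc/opendkim/key.table` and `/etc/opendkim/signing.table` after a run is a quick way to confirm both domains were rendered. Wildcard entries such as `*@domain` are only matched when OpenDKIM loads the SigningTable as a `refile:` data set; whether the role's `opendkim.conf` does so is not visible in this diff.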
|
||||||
|
|
|
||||||