docs(deliverability): document Gmail re-enablement stale-Date/burial fix

When we resume Gmail sends, the front-loaded-inject + slow-drain pattern
buries mail: Listmonk stamps Date at injection (verified live: queued msg
Date matched postfix arrival, deferred 4h47m later), and Gmail sorts the
inbox by the Date header. So a msg injected at 08:00 but accepted at 14:00
files 6h down a Gmail inbox.

Documents: why NOT to future-date the Date header (spam signal + breaks our
DKIM which signs Date + doesn't help Outlook's received-time sort), and the
real fix -- pace Listmonk injection to match Gmail's accept rate (just-in-time
Date) via a dedicated Gmail stream on its own IP + low sliding-window rate +
queue-age guard. Outlook/M365 (current audience) sorts by received time so the
burial is cosmetic there and not worth fixing.

Procedure only; Gmail still excluded in _email_exclusions.py until re-enabled.
justin 2026-06-24 01:24:24 -05:00
parent 9dd6f53eb2
commit c20edb28cd

@ -115,6 +115,82 @@ subdomain SPF.
---
## Resuming Gmail sends: the stale-Date / inbox-burial problem (READ BEFORE re-enabling Gmail)
**Status:** Gmail is currently EXCLUDED from all sends (`scripts/_email_exclusions.py`
`BLOCKED_EMAIL_DOMAINS` includes gmail/google). This section is the documented
procedure for when we resume Gmail, and the reasoning for the chosen design. It is
NOT yet implemented — implement it at the moment Gmail is re-enabled.
### The problem
We inject the whole daily batch into Postfix in a ~2.5h burst (today: 1,430 + 1,419
+ 1,077 messages in the 07:00-09:30 window, with a 932-in-one-minute spike at
08:30), then Postfix slow-drains the queue over ~24h because receivers throttle a
warming IP/domain (Microsoft `451 4.7.500 Server busy`).
**Listmonk stamps the `Date:` header at the moment it hands each message to Postfix
(injection time), NOT at delivery time.** Empirically verified 2026-06-23: a queued
message had `Date: 19:47:28` matching its Postfix arrival log line exactly, and was
still deferred ~4h47m later. So a message injected at 08:00 keeps an 08:00 `Date:`
even when the receiver finally accepts it at 14:00.
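The staleness described above can be measured directly. This is a minimal sketch using only the Python standard library; the sample message, addresses, and the helper name `date_staleness_seconds` are hypothetical and only illustrate the 08:00-injected / 14:00-accepted case.

```python
from datetime import datetime
from email import message_from_string
from email.utils import parsedate_to_datetime


def date_staleness_seconds(raw_message: str, accepted_at: datetime) -> float:
    """Seconds between the message's Date: header and the receiver's accept time."""
    msg = message_from_string(raw_message)
    stamped = parsedate_to_datetime(msg["Date"])  # timezone-aware datetime
    return (accepted_at - stamped).total_seconds()


# Hypothetical message: Date stamped at 08:00 injection, accepted at 14:00.
raw = (
    "From: news@example.com\r\n"
    "To: user@gmail.com\r\n"
    "Date: Tue, 23 Jun 2026 08:00:00 -0500\r\n"
    "Subject: demo\r\n"
    "\r\n"
    "body\r\n"
)
accepted = parsedate_to_datetime("Tue, 23 Jun 2026 14:00:00 -0500")
stale = date_staleness_seconds(raw, accepted)
print(f"Date header is {stale / 3600:.1f}h older than acceptance")
```

In practice the accept time would come from the Postfix delivery log line rather than a hard-coded value.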
**Why this matters ONLY for Gmail:** inbox sort order depends on the client.
- **Outlook / Exchange / M365** (our current #1 audience, ~2,000 delivered/day) and
most webmail (Proton, etc.) sort by **received time**, i.e. when THEIR server
accepted the message (in Outlook/Exchange this is the MAPI property
`PR_MESSAGE_DELIVERY_TIME`). A late-delivered message surfaces fresh at the
top on arrival; only the *displayed* date looks old. So for today's audience the
burial is cosmetic and NOT worth fixing.
- **Gmail sorts the inbox by the `Date:` header.** A message accepted at 14:00 but
Date-stamped 08:00 is filed **6h down** the inbox, below mail the user has already
read. That is real burial and real lost opens, and it only bites once we send to
Gmail again. Our B2B list is ~85% Microsoft / ~14% Google, so Gmail is
a meaningful slice.
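A toy model of the two sort orders shows the difference. It is a deliberately simplified sketch: the inbox contents are invented, and real clients also apply threading, tabs, and other logic not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Mail:
    subject: str
    date_header: str  # HH:MM taken from the Date: header
    received: str     # HH:MM when the receiving server accepted it


# Hypothetical inbox; only the campaign was delayed in our queue.
inbox = [
    Mail("colleague reply", "09:15", "09:15"),
    Mail("vendor invoice", "11:40", "11:40"),
    Mail("calendar invite", "13:30", "13:30"),
    Mail("our campaign", "08:00", "14:00"),
]

# Newest first, as an inbox would display it.
by_received = sorted(inbox, key=lambda m: m.received, reverse=True)
by_date = sorted(inbox, key=lambda m: m.date_header, reverse=True)

received_pos = [m.subject for m in by_received].index("our campaign")
date_pos = [m.subject for m in by_date].index("our campaign")
print(f"received-time sort: position {received_pos}")
print(f"Date-header sort:   position {date_pos}")
```

Under received-time sorting the campaign is at the top (position 0); under Date-header sorting it falls to the bottom of this four-message inbox (position 3).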
### Why NOT to future-date / spoof the `Date:` header
The tempting "just stamp a future Date" fix is a net negative:
1. **Spam signal.** A `Date:` in the future is a classic filter heuristic —
Proofpoint, Mimecast, and Microsoft all penalize it. We'd trade a cosmetic
timestamp for WORSE inbox placement.
2. **It breaks our DKIM.** OpenDKIM signs the `Date` header (only `From` is
over-signed, but `Date` is in the signed set). Rewriting `Date` after signing
invalidates the signature -> DKIM fails -> with no aligned DKIM pass, DMARC
`p=reject` can turn the message into a hard bounce.
3. **It doesn't even help Outlook** (received-time sort) and is the wrong lever for
Gmail (see the real fix below).
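The DKIM point in item 2 can be checked against any sent message by reading the `h=` tag of its `DKIM-Signature` header. The sketch below is a small standard-library parser; the sample signature value (domain, selector, hashes) is made up for illustration and is not taken from our mail.

```python
def dkim_signed_headers(dkim_signature_value: str) -> list[str]:
    """Return the lowercased header names listed in the h= tag of a DKIM-Signature value."""
    tags = {}
    for part in dkim_signature_value.split(";"):
        if "=" in part:
            name, value = part.split("=", 1)
            # Remove folding whitespace that may appear inside tag values.
            tags[name.strip()] = "".join(value.split())
    return [h.lower() for h in tags.get("h", "").split(":") if h]


# Hypothetical DKIM-Signature value; bh= and b= are placeholders, not real hashes.
sample = (
    "v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.com; s=mail;\r\n"
    "\th=from:from:date:subject:to:message-id;\r\n"
    "\tbh=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=;\r\n"
    "\tb=dGhpcyBpcyBub3QgYSByZWFsIHNpZ25hdHVyZQ=="
)
signed = dkim_signed_headers(sample)
print("Date is signed:", "date" in signed)
print("From is over-signed:", signed.count("from") > 1)
```

If `date` appears in the list, any change to `Date:` made after signing will cause verification to fail at the receiver.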
### The fix: pace Listmonk INJECTION to match Gmail's accept rate (just-in-time Date)
Because `Date:` is stamped at injection, the solution is to **release each Gmail
message close to when Gmail will actually accept it**, so `Date:` ≈ received time ≈
now, and it lands at the top of the Gmail inbox. Keep the Postfix queue shallow for
the Gmail stream so no message sits for hours collecting a stale Date.
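The effect of pacing on queue depth can be illustrated with a small hour-by-hour simulation. This is a sketch under stated assumptions: a FIFO queue, a fixed accept rate, and invented numbers (200 messages, 25 accepted per hour) that are not measurements of Gmail's real limits.

```python
def max_queue_wait_hours(inject_per_hour: list[int], accept_per_hour: int) -> int:
    """Worst-case whole hours any message waits in a FIFO queue.

    inject_per_hour[i] messages are injected at the start of hour i; the
    receiver accepts at most accept_per_hour messages during each hour.
    """
    if accept_per_hour <= 0:
        raise ValueError("accept_per_hour must be positive")
    queue: list[int] = []  # injection hour of each waiting message, oldest first
    worst = 0
    hour = 0
    while hour < len(inject_per_hour) or queue:
        if hour < len(inject_per_hour):
            queue.extend([hour] * inject_per_hour[hour])
        accepted, queue = queue[:accept_per_hour], queue[accept_per_hour:]
        for injected_at in accepted:
            worst = max(worst, hour - injected_at)
        hour += 1
    return worst


# Hypothetical: 200 Gmail messages, receiver accepts 25 per hour.
burst = max_queue_wait_hours([200], 25)      # everything injected in hour 0
paced = max_queue_wait_hours([25] * 8, 25)   # injection matched to accept rate
print(f"burst: worst Date staleness ~{burst}h; paced: ~{paced}h")
```

With these numbers the burst leaves the last messages waiting about 7 hours, while paced injection keeps the wait at about 0, which is the just-in-time `Date:` behaviour this section aims for.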
Implementation when re-enabling Gmail:
1. **Segment Gmail into its OWN Listmonk campaign on its OWN single IP** (snowshoe-
safe), separate from the Microsoft/Proofpoint stream, so its deliberately slow
pace does not bottleneck the fast stream. Each stream gets its own injection
cadence. (Add the new IP to host + Postfix transport + BOTH SPF records first,
per the re-expand note above.)
2. **Set the Gmail campaign's sliding-window injection rate at or below Gmail's
sustained cold-domain accept rate** (`app.message_sliding_window_rate` /
`app.message_sliding_window_duration` on that Listmonk instance). Start low
(~20-30/hr/IP for a cold domain) and ramp as Postmaster Tools reputation climbs.
This spreads injection across the whole sending window instead of front-loading
it, so the queue never builds a backlog of stale-dated Gmail mail.
3. **Queue-age guard.** Monitor the inject->deliver gap for the Gmail stream
(`delay=` in the maillog). If it exceeds ~30 min, injection is outrunning
acceptance -> throttle the sliding-window rate down further. Verify after a day
that the Gmail stream's `delay=` stays small and the "6-24h late" bucket is ~0.
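The queue-age guard in step 3 could be sketched as a small maillog scanner. The assumptions are labelled in the code: it relies on the usual Postfix `smtp` delivery-line fields (`to=`, `relay=`, `delay=`, `status=`), it identifies the Gmail stream by a `google.com` relay host, and the sample log lines are invented.

```python
import re

# Matches the to=, relay=, delay= and status= fields of a Postfix smtp delivery line.
DELIVERY_RE = re.compile(
    r"to=<(?P<rcpt>[^>]+)>.*?relay=(?P<relay>[^,\s]+).*?"
    r"delay=(?P<delay>[\d.]+),.*?status=(?P<status>\w+)"
)

MAX_DELAY_SECONDS = 30 * 60  # the ~30 min threshold from step 3


def stale_gmail_deliveries(log_lines, max_delay=MAX_DELAY_SECONDS):
    """Yield (recipient, delay_seconds) for sent Gmail deliveries over max_delay."""
    for line in log_lines:
        m = DELIVERY_RE.search(line)
        if not m or m["status"] != "sent":
            continue
        # Assumption: the Gmail stream is recognisable by a google.com relay host.
        if "google.com" not in m["relay"]:
            continue
        delay = float(m["delay"])
        if delay > max_delay:
            yield m["rcpt"], delay


# Hypothetical log lines for illustration only.
sample = [
    "Jun 23 14:00:02 mx1 postfix/smtp[2101]: 4A1B2C3D4E: to=<a@gmail.com>, "
    "relay=gmail-smtp-in.l.google.com[192.0.2.10]:25, delay=21602, "
    "delays=0.1/21600/0.6/1.3, dsn=2.0.0, status=sent (250 2.0.0 OK)",
    "Jun 23 14:05:10 mx1 postfix/smtp[2102]: 5B2C3D4E5F: to=<b@gmail.com>, "
    "relay=gmail-smtp-in.l.google.com[192.0.2.10]:25, delay=42, "
    "delays=0.1/40/0.6/1.3, dsn=2.0.0, status=sent (250 2.0.0 OK)",
    "Jun 23 14:06:00 mx1 postfix/smtp[2103]: 6C3D4E5F60: to=<c@contoso.com>, "
    "relay=contoso-com.mail.protection.outlook.com[192.0.2.20]:25, delay=5000, "
    "delays=0.1/4990/4/6, dsn=4.7.500, status=deferred (451 4.7.500 Server busy)",
]
flagged = list(stale_gmail_deliveries(sample))
for rcpt, delay in flagged:
    print(f"{rcpt}: delivered {delay / 60:.0f} min after injection")
```

Any flagged delivery is a cue to lower the sliding-window rate. A fuller version might also count still-deferred Gmail messages, since those have not yet produced a `status=sent` line.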
This is strictly better than date-spoofing: no spam signal, no DKIM break, and
because Gmail/Microsoft both reward steady paced volume, pacing injection also
RAISES the accept quota over time (the deliverability principle "concentrated low
volume beats bursts"). Win-win.
> Note: this same pacing slightly helps Outlook's *displayed* date too, but since
> Outlook sorts by received time it is not necessary there. Only spend the effort on
> the Gmail stream.
---
## DNS automation (Hestia is the master)
**DNS is fully automatable** — Hestia (`cp.carrierone.com`, 207.174.124.22) is the