# Carbonio `noreply@` mailbox auto-purge

Server-side maintenance for the `noreply@performancewest.net` mailbox on the
Carbonio (Zextras) mail host `co.carrierone.com`.

## Problem

The `noreply@` mailbox accumulated **35,337 messages (~488 MB)**. A sampled
audit showed **~98.6% were machine noise**: bounce DSNs (this box's own Postfix
backscatter), out-of-office / auto-reply messages, and helpdesk/ticket
auto-acknowledgements. Buried in the rest were a small number of **genuine
human replies** to the trucking (DOT#/MCS-150) and telecom/FCC campaigns --
these land here because of the historical Reply-To behaviour -- plus the
occasional **unsubscribe** request.

## Policy (explicit)

- **DELETE**: bounces, ticket/case auto-acknowledgements, out-of-office and
  auto-reply messages, delivery-status notifications, authentication reports.
- **KEEP**: genuine human replies (`Re:`/`Fwd:`) and unsubscribe/opt-out
  requests.
- **Fail-safe**: when a message is not clearly machine-generated, KEEP it.
- Deletions **move to `/Trash`** (reversible), never hard-delete.

## Why a header/sender classifier, not subject matching

Subject text alone is unreliable: auto-responders frequently reply with a
deceptive `Re:` prefix (e.g. an auto-responder answering our campaign with
`Re: `). The classifier therefore uses, in precedence order:

1. **Unsubscribe guard** (compliance) -- always KEEP, overrides everything.
2. **RFC 3834 `Auto-Submitted:` header** -- if present and != `no`, the sending
   system has declared the message automatic (bounces = `auto-generated`,
   vacation/auto-replies = `auto-replied`). This is the single most reliable
   signal and it catches the deceptive `Re:` auto-responders.
3. **Machine From-address** -- exact bot localparts (`mailer-daemon`,
   `postmaster`, `no-reply`, ...), strong tokens anywhere in the localpart
   (`...-bounces@`, `expense-noreply-...@`, `auth-results@`), and display-name
   bots (`Mail Delivery System`, `System Administrator`, ...).
4. **STRONG auto subjects** -- unambiguous machine markers no human types
   (`New Ticket Created`, `(autoresponse)`, `Auto Re:`,
   `your request with id ##...##`, `we're on it`, `Undeliverable`,
   `Authentication Report`, ...). Checked **before** the human `Re:` guard so
   ticket auto-acks dressed as `Re:` are still removed.
5. **Human `Re:`/`Fwd:`** -- KEEP.
6. **Ticket tag `[##...##]` / broad auto-ack subjects** -- DELETE.
7. **Default -> KEEP** (human-safe).

Subjects are RFC 2047 MIME-decoded first (campaign subjects contain an em-dash,
so they arrive `=?utf-8?Q?...?=` encoded and would otherwise evade matching).

The ruleset was validated against a hand-labelled set drawn from the live
mailbox: **23/23 cases correct**, including keeping the real `Re:` replies from
the same campaigns whose auto-responder twins were deleted.

## Execution model

`nr_purge.sh` runs in three stages so the expensive part stays small:

- **Phase 1** -- fast server-side search-delete of `from:MAILER-DAEMON` bounces
  (the ~97% bulk), guarded against unsubscribe. No per-message fetch.
- **Phase 1.5** -- fast search-delete of the common non-MAILER machine classes
  (`from:postmaster`, `Undeliverable`, `automatic reply`, `out of office`,
  `delivery status notification`), each hard-guarded with
  `AND NOT (subject:Re OR subject:Fwd OR subject:unsubscribe ...)` so anything
  ambiguous falls through to the accurate classifier.
- **Phase 2** -- header-classify the small remainder one message at a time
  using the full `decide()` ruleset; KEEP decisions are cached so survivors are
  not re-fetched on subsequent pages.

On the initial backfill this reduced **35,337 -> 68** messages (67 genuine
human replies + 1 unsubscribe), moving ~35,269 machine items to Trash.
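
The precedence order listed under "Why a header/sender classifier" can be
sketched in Python as below. This is an illustrative reconstruction, not the
actual `decide()` from `nr_purge.sh`: the function signature, the regexes, and
the token lists are assumptions built only from the examples named in this
README, matching the unsubscribe guard against the subject is also an
assumption, and the broad auto-ack subject patterns of step 6 are omitted.

```python
import re
from email.header import decode_header, make_header
from email.utils import parseaddr
from typing import Optional

# Illustrative subsets (assumed); the real lists are longer.
BOT_LOCALPARTS = {"mailer-daemon", "postmaster", "no-reply"}
BOT_TOKENS = ("-bounces", "noreply", "auth-results")
BOT_NAMES = ("mail delivery system", "system administrator")

UNSUBSCRIBE = re.compile(r"unsubscribe|opt[- ]?out", re.IGNORECASE)
STRONG_SUBJECT = re.compile(
    r"new ticket created|\(autoresponse\)|^auto re:|your request with id ##"
    r"|we're on it|undeliverable|authentication report",
    re.IGNORECASE,
)
HUMAN_PREFIX = re.compile(r"^\s*(re|fwd)\s*:", re.IGNORECASE)
TICKET_TAG = re.compile(r"\[##.*?##\]")


def decode_subject(raw: str) -> str:
    """Decode RFC 2047 encoded-words so '=?utf-8?Q?...?=' subjects can match."""
    try:
        return str(make_header(decode_header(raw)))
    except Exception:
        return raw  # fail-safe: fall back to the undecoded text


def decide(from_header: str, subject: str, auto_submitted: Optional[str]) -> str:
    """Return 'KEEP' or 'DELETE', applying the rules in precedence order."""
    subj = decode_subject(subject or "")
    name, addr = parseaddr(from_header or "")
    local = addr.split("@", 1)[0].lower()

    # 1. Unsubscribe guard: always KEEP, overrides everything.
    if UNSUBSCRIBE.search(subj):
        return "KEEP"
    # 2. RFC 3834: any Auto-Submitted value other than "no" means automatic.
    if auto_submitted and auto_submitted.strip().lower() != "no":
        return "DELETE"
    # 3. Machine From-address: exact localpart, token, or display name.
    if (
        local in BOT_LOCALPARTS
        or any(tok in local for tok in BOT_TOKENS)
        or any(bot in name.lower() for bot in BOT_NAMES)
    ):
        return "DELETE"
    # 4. STRONG auto subjects, checked before the human Re:/Fwd: guard.
    if STRONG_SUBJECT.search(subj):
        return "DELETE"
    # 5. Human Re:/Fwd: reply.
    if HUMAN_PREFIX.match(subj):
        return "KEEP"
    # 6. Ticket tag such as [##12345##].
    if TICKET_TAG.search(subj):
        return "DELETE"
    # 7. Default: keep anything not clearly machine-generated.
    return "KEEP"
```

Under these assumptions, `decide("jane@example.com", "Re: DOT update", "auto-replied")`
returns `DELETE` (the deceptive `Re:` case caught by rule 2), while the same
call with `auto_submitted=None` returns `KEEP` via rule 5.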
## Usage

```sh
# read-only preview of the N most-recent messages (prints survivors + sample deletes)
bash nr_purge.sh --preview 150

# full purge (move matches to /Trash)
bash nr_purge.sh --apply

# date-bounded purge (only inspect last N days) -- used by the daily cron
bash nr_purge.sh --apply --days 3

# Phase-1-only fast bounce sweep
bash nr_purge.sh --apply --quick
```

## Deployment

The script lives on the Carbonio host at `/opt/zextras/nr_purge.sh` (and a copy
in `~zextras/`). It must run as the `zextras` user (owns `zmmailbox`).

A daily cron is installed in **root's** crontab (not the zextras crontab, which
Carbonio/`zmcontrol` regenerates and would wipe):

```cron
17 4 * * * su - zextras -c 'bash /opt/zextras/nr_purge.sh --apply --days 3' >> /var/log/nr_purge_cron.log 2>&1
```

`--days 3` keeps the daily run cheap: it only header-inspects mail from the
last three days (a few dozen messages), which is more than enough overlap to
catch anything that arrived since the previous run.

To (re)deploy after editing the script:

```sh
scp -P 22022 nr_purge.sh justin@co.carrierone.com:/tmp/nr_purge.sh
ssh -p 22022 justin@co.carrierone.com \
  'sudo cp /tmp/nr_purge.sh /opt/zextras/nr_purge.sh && sudo chown zextras: /opt/zextras/nr_purge.sh && sudo chmod +x /opt/zextras/nr_purge.sh'
```

## Notes / gotchas

- `zmmailbox search -l` works up to 1000 results/page; offset paging (`-o`)
  does not work reliably and large limits (2000+) silently return empty. The
  script loops on "delete the top page, re-search" instead of offset paging.
- Trash still counts against mailbox size until emptied. The initial backfill
  left Trash populated (reversible); emptying it is an optional, irreversible
  follow-up.
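
The "delete the top page, re-search" loop from the first note can be sketched
as follows. This is a minimal Python illustration of the pattern, not the
script's code: `search_page` and `move_to_trash` are hypothetical callables
standing in for the real `zmmailbox` invocations, and `max_rounds` is an
assumed safety cap. It relies on the query excluding `/Trash`, so every
re-search returns only messages that have not been moved yet.

```python
from typing import Callable, List

PAGE_LIMIT = 1000  # per the note above, larger limits may silently return empty


def purge_by_research(
    search_page: Callable[[int], List[str]],
    move_to_trash: Callable[[List[str]], None],
    max_rounds: int = 1000,
) -> int:
    """Repeatedly fetch the first page of matches and move it to Trash.

    No offset is ever used: once a page is moved, the same query returns the
    next batch as its first page. Returns the number of messages moved.
    """
    moved = 0
    for _ in range(max_rounds):  # cap guards against a search that never drains
        ids = search_page(PAGE_LIMIT)
        if not ids:
            break  # nothing left that matches
        move_to_trash(ids)
        moved += len(ids)
    return moved


# Usage with an in-memory stand-in for the mailbox (hypothetical data).
inbox = [str(i) for i in range(2500)]


def fake_search(limit: int) -> List[str]:
    return inbox[:limit]


def fake_move(ids: List[str]) -> None:
    gone = set(ids)
    inbox[:] = [m for m in inbox if m not in gone]


total_moved = purge_by_research(fake_search, fake_move)
```

With 2,500 matching messages the loop takes three rounds (1000, 1000, 500) and
stops on the fourth search, which comes back empty.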