ops(carbonio): add noreply@ mailbox auto-purge + daily cron
Server-side classifier for the noreply@performancewest.net Carbonio mailbox (35,337 msgs, ~98.6% machine noise). Deletes bounces, auto-replies and ticket auto-acks; keeps genuine human Re: replies and unsubscribes. Deletions move to Trash (reversible).

Classifier precedence: unsubscribe guard > RFC 3834 Auto-Submitted header > machine From-address (localpart/strong-token/display-bot) > STRONG auto subjects (deletes deceptive Re: auto-acks) > human Re: keep > broad auto-ack subjects > default keep. Subjects are RFC 2047 MIME-decoded first.

Three-phase execution:
- Phase 1: fast MAILER-DAEMON search-delete
- Phase 1.5: fast search-delete of common auto classes (guarded against Re:/unsub)
- Phase 2: header-classify the small remainder with KEEP-caching

Validated 23/23 against a hand-labelled live sample. Initial backfill reduced 35,337 -> 68 (67 human replies + 1 unsubscribe). Daily cron installed in root crontab: 17 4 * * * --apply --days 3.

parent e414ec4a5f
commit 2d220a273d
3 changed files with 333 additions and 0 deletions

scripts/ops/carbonio/README.md (normal file, 121 lines)
@@ -0,0 +1,121 @@

# Carbonio `noreply@` mailbox auto-purge

Server-side maintenance for the `noreply@performancewest.net` mailbox on the
Carbonio (Zextras) mail host `co.carrierone.com`.

## Problem

The `noreply@` mailbox accumulated **35,337 messages (~488 MB)**. A sampled
audit showed **~98.6% were machine noise**: bounce DSNs (this box's own Postfix
backscatter), out-of-office / auto-reply messages, and helpdesk/ticket
auto-acknowledgements. Buried in the rest were a small number of **genuine human
replies** to the trucking (DOT#/MCS-150) and telecom/FCC campaigns -- these land
here because of the historical Reply-To behaviour -- plus the occasional
**unsubscribe** request.

## Policy (explicit)

- **DELETE**: bounces, ticket/case auto-acknowledgements, out-of-office and
  auto-reply messages, delivery-status notifications, authentication reports.
- **KEEP**: genuine human replies (`Re:`/`Fwd:`) and unsubscribe/opt-out
  requests.
- **Fail-safe**: when a message is not clearly machine-generated, KEEP it.
- Deletions **move to `/Trash`** (reversible), never hard-delete.

## Why a header/sender classifier, not subject matching

Subject text alone is unreliable: auto-responders frequently reply with a
deceptive `Re:` prefix (e.g. an auto-responder answering our campaign with
`Re: <our subject>`). The classifier therefore uses, in precedence order:

1. **Unsubscribe guard** (compliance) -- always KEEP, overrides everything.
2. **RFC 3834 `Auto-Submitted:` header** -- if present and != `no`, the sending
   system has declared the message automatic (bounces = `auto-generated`,
   vacation/auto-replies = `auto-replied`). This is the single most reliable
   signal and it catches the deceptive `Re:` auto-responders.
3. **Machine From-address** -- exact bot localparts (`mailer-daemon`,
   `postmaster`, `no-reply`, ...), strong tokens anywhere in the localpart
   (`...-bounces@`, `expense-noreply-...@`, `auth-results@`), and display-name
   bots (`Mail Delivery System`, `System Administrator`, ...).
4. **STRONG auto subjects** -- unambiguous machine markers no human types
   (`New Ticket Created`, `(autoresponse)`, `Auto Re:`, `your request with id
   ##...##`, `we're on it`, `Undeliverable`, `Authentication Report`, ...).
   Checked **before** the human `Re:` guard so ticket auto-acks dressed as `Re:`
   are still removed.
5. **Human `Re:`/`Fwd:`** -- KEEP.
6. **Ticket tag `[##...##]` / broad auto-ack subjects** -- DELETE.
7. **Default -> KEEP** (human-safe).

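
The precedence order above can be illustrated with a reduced sketch. The function name `decide_sketch`, its three positional arguments, the sample senders and subjects, and the shortened patterns are all illustrative assumptions; the full rules are the ones in `decide()` in `nr_purge.sh`.

```sh
#!/bin/bash
# Reduced illustration of the precedence order; NOT the production ruleset.
# decide_sketch <auto-submitted value> <from localpart> <subject>
decide_sketch() {
  local as="$1" lp="$2" subj="$3" s
  s=$(printf '%s' "$subj" | tr 'A-Z' 'a-z')
  # 1) unsubscribe guard overrides every machine signal
  if printf '%s' "$s" | grep -qE 'unsubscribe|opt[ -]?out'; then echo "KEEP unsubscribe"; return; fi
  # 2) Auto-Submitted present and not "no"
  if [ -n "$as" ] && [ "$as" != "no" ]; then echo "DEL auto-submitted"; return; fi
  # 3) machine sender localpart (shortened list)
  if printf '%s' "$lp" | grep -qE '^(mailer-daemon|postmaster|no-reply|noreply)$'; then echo "DEL from-machine"; return; fi
  # 4) strong auto subjects, checked before the Re: guard
  if printf '%s' "$s" | grep -qE 'new ticket|undeliverable|\(autoresponse\)'; then echo "DEL strong-auto-subject"; return; fi
  # 5) human threaded reply
  if printf '%s' "$subj" | grep -qE '^[[:space:]]*(Re|RE|Fwd|Fw|FW):'; then echo "KEEP human-reply"; return; fi
  # 6) broad auto-ack subject patterns (shortened)
  if printf '%s' "$s" | grep -qE 'we have received your|your ticket'; then echo "DEL auto-ack-subject"; return; fi
  # 7) default: keep
  echo "KEEP default"
}

decide_sketch "auto-replied" "jane"    "Re: MCS-150 update"       # DEL auto-submitted
decide_sketch ""             "jane"    "Re: MCS-150 update"       # KEEP human-reply
decide_sketch "auto-replied" "jane"    "Unsubscribe"              # KEEP unsubscribe
decide_sketch ""             "support" "Re: New Ticket Created"   # DEL strong-auto-subject
```

The first two calls share a subject and differ only in the `Auto-Submitted` value, which is the case that subject matching alone cannot separate.
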

Subjects are RFC 2047 MIME-decoded first (campaign subjects contain an em-dash,
so they arrive `=?utf-8?Q?...?=` encoded and would otherwise evade matching).


The ruleset was validated against a hand-labelled set drawn from the live
mailbox: **23/23 cases correct**, including keeping the real `Re:` replies from
the same campaigns whose auto-responder twins were deleted.

## Execution model

`nr_purge.sh` runs in three stages so the expensive part stays small:

- **Phase 1** -- fast server-side search-delete of `from:MAILER-DAEMON` bounces
  (the ~97% bulk), guarded against unsubscribe. No per-message fetch.
- **Phase 1.5** -- fast search-delete of the common non-MAILER machine classes
  (`from:postmaster`, `Undeliverable`, `Undelivered Mail`, `automatic reply`,
  `out of office`, `failure notice`, `delivery status notification`), each
  hard-guarded with
  `AND NOT (subject:Re OR subject:Fwd OR subject:unsubscribe ...)` so anything
  ambiguous falls through to the accurate classifier.
- **Phase 2** -- header-classify the small remainder one message at a time using
  the full `decide()` ruleset; KEEP decisions are cached so survivors are not
  re-fetched on subsequent pages.

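
The Phase 2 loop can be sketched against a mock mailbox. Everything below is a stand-in: a temp file plays the inbox, and `mock_search`, `mock_classify` and `mock_trash` are hypothetical helpers that replace the `zmmailbox` calls. The sketch also trashes each DEL immediately, whereas the script batches them per page.

```sh
#!/bin/bash
# Mock of the Phase 2 loop: classify the top page, trash DELs, cache KEEPs, re-search.
# A temp file stands in for the inbox; nothing here calls zmmailbox.
INBOX=$(mktemp); SEEN=$(mktemp)
printf '%s\n' "101 bounce" "102 human" "103 bounce" "104 human" "105 bounce" > "$INBOX"
PAGE=3   # tiny page size (the real script uses 1000) so several passes are needed

mock_search()   { head -n "$PAGE" "$INBOX" | awk '{print $1}'; }
mock_classify() { if grep -q "^$1 bounce\$" "$INBOX"; then echo DEL; else echo KEEP; fi; }
mock_trash()    { grep -v "^$1 " "$INBOX" > "$INBOX.tmp"; mv "$INBOX.tmp" "$INBOX"; }

moved=0
while :; do
  newwork=0
  for id in $(mock_search); do
    grep -qx "$id" "$SEEN" && continue      # cached KEEP: skip, no re-fetch
    newwork=1
    if [ "$(mock_classify "$id")" = DEL ]; then
      mock_trash "$id"; moved=$((moved+1))
    else
      echo "$id" >> "$SEEN"
    fi
  done
  [ "$newwork" = 0 ] && break               # page held only cached KEEPs: done
done
survivors=$(wc -l < "$SEEN" | tr -d ' ')
echo "moved=$moved survivors=$survivors"
rm -f "$INBOX" "$SEEN"
```

One consequence of this termination rule is that it relies on the cached KEEPs staying below the page size: once a full page consists only of already-seen survivors, the loop stops without looking further.
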

On the initial backfill this reduced **35,337 -> 68** messages (67 genuine human
replies + 1 unsubscribe), moving ~35,269 machine items to Trash.

## Usage

```sh
# read-only preview of the N most-recent messages (prints survivors + sample deletes)
bash nr_purge.sh --preview 150

# full purge (move matches to /Trash)
bash nr_purge.sh --apply

# date-bounded purge (only inspect last N days) -- used by the daily cron
bash nr_purge.sh --apply --days 3

# Phase-1-only fast bounce sweep
bash nr_purge.sh --apply --quick
```

## Deployment

The script lives on the Carbonio host at `/opt/zextras/nr_purge.sh` (and a copy
in `~zextras/`). It must run as the `zextras` user (owns `zmmailbox`).

A daily cron is installed in **root's** crontab (not the zextras crontab, which
Carbonio/`zmcontrol` regenerates and would wipe):

```cron
17 4 * * * su - zextras -c 'bash /opt/zextras/nr_purge.sh --apply --days 3' >> /var/log/nr_purge_cron.log 2>&1
```

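
Installing that line should be safe to repeat. A minimal sketch of the idea, written as a pure text filter so it can be tried without touching a real crontab; `merge_cron_line` and the unrelated backup entry are hypothetical, while the filter-then-append approach mirrors `nr_cron_install.sh`.

```sh
#!/bin/bash
# merge_cron_line <line>: reads an existing crontab on stdin, drops any older
# nr_purge.sh entry, then appends <line>. It does not call crontab itself.
merge_cron_line() {
  grep -v 'nr_purge.sh' || true   # grep exits 1 when nothing survives the filter
  printf '%s\n' "$1"
}

line="17 4 * * * su - zextras -c 'bash /opt/zextras/nr_purge.sh --apply --days 3' >> /var/log/nr_purge_cron.log 2>&1"
existing='0 1 * * * /usr/local/bin/backup.sh'   # hypothetical unrelated entry

once=$(printf '%s\n' "$existing" | merge_cron_line "$line")
twice=$(printf '%s\n' "$once" | merge_cron_line "$line")

# Running the merge a second time leaves the result unchanged.
[ "$once" = "$twice" ] && echo "idempotent: $(printf '%s\n' "$twice" | grep -c nr_purge.sh) nr_purge line"
```
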

`--days 3` keeps the daily run cheap: it only header-inspects mail from the last
three days (a few dozen messages), which is more than enough overlap to catch
anything that arrived since the previous run.

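
How the day count turns into a search clause can be sketched as follows. `build_dateq` is a hypothetical helper; the script builds the same ` AND after:-Nday` suffix inline in `DATEQ`. The numeric check is an added assumption and is not present in `nr_purge.sh`.

```sh
#!/bin/bash
# build_dateq <days> -> echoes the clause appended to each search query,
# or nothing when <days> is empty (unbounded, full-mailbox run).
build_dateq() {
  local days="$1"
  [ -z "$days" ] && return 0
  # Hypothetical hardening, not in nr_purge.sh: reject non-numeric input.
  case "$days" in *[!0-9]*) echo "invalid --days: $days" >&2; return 1;; esac
  printf ' AND after:-%sday' "$days"
}

q="in:inbox from:MAILER-DAEMON$(build_dateq 3)"
echo "$q"   # in:inbox from:MAILER-DAEMON AND after:-3day
```
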

To (re)deploy after editing `nr_purge.sh`:

```sh
scp -P 22022 nr_purge.sh justin@co.carrierone.com:/tmp/nr_purge.sh
ssh -p 22022 justin@co.carrierone.com \
  'sudo cp /tmp/nr_purge.sh /opt/zextras/nr_purge.sh && sudo chown zextras: /opt/zextras/nr_purge.sh && sudo chmod +x /opt/zextras/nr_purge.sh'
```

## Notes / gotchas

- `zmmailbox search -l` works up to 1000 results/page; offset paging (`-o`) does
  not work reliably and large limits (2000+) silently return empty. The script
  loops on "delete the top page, re-search" instead of offset paging.
- Trash still counts against mailbox size until emptied. The initial backfill
  left Trash populated (reversible); emptying it is an optional, irreversible
  follow-up.

scripts/ops/carbonio/nr_cron_install.sh (normal file, 10 lines)
@@ -0,0 +1,10 @@
#!/bin/bash
# Install daily noreply@ auto-purge cron in ROOT crontab (NOT zextras' -- that one is
# regenerated by Carbonio/zmcontrol and would wipe our line). Root crontab is stable.
# Invokes the purge as the zextras user. Date-bounded (last 3 days) so it stays cheap.
set -e
SCRIPT=/opt/zextras/nr_purge.sh
LOG=/var/log/nr_purge_cron.log
CRON_LINE="17 4 * * * su - zextras -c 'bash $SCRIPT --apply --days 3' >> $LOG 2>&1"
( crontab -l 2>/dev/null | grep -v 'nr_purge.sh' ; echo "$CRON_LINE" ) | crontab -
echo "=== root crontab nr_purge line ==="; crontab -l | grep nr_purge

scripts/ops/carbonio/nr_purge.sh (executable file, 202 lines)
@@ -0,0 +1,202 @@
#!/bin/bash
# nr_purge.sh -- auto-purge noreply@ mailbox on Carbonio.
# Policy: DELETE bounces + ticket auto-acks + auto-replies; KEEP human replies + unsubscribes.
# Discriminator: RFC 3834 Auto-Submitted header (reliable; catches fake "Re:" auto-responders).
# Reversible: deletions MOVE to /Trash (not hard delete).
#
# Modes:
#   (no args)           preview: classify most-recent $PREVIEW_N msgs, read-only, print decisions+survivors
#   --preview N         preview N most-recent
#   --apply [--days N]  full three-phase purge (Phase 1 bulk bounces, Phase 1.5 common auto classes, Phase 2 header-classify remainder); --days N limits searches to the last N days
#   --apply --quick     Phase 1 only (bulk bounce delete), skip header classify
set -uo pipefail
M="noreply@performancewest.net"
TRASH="/Trash"
PREVIEW_N=200
MODE="preview"; QUICK=0; DAYS="${NR_DAYS:-}"
while [ $# -gt 0 ]; do case "$1" in
  --apply) MODE="apply";;
  --quick) QUICK=1;;
  --days) DAYS="${2:-}"; shift;;
  --preview) MODE="preview"; PREVIEW_N="${2:-200}"; shift;;
  *) ;;
esac; shift; done
# Optional date bound appended to every apply-phase search (daily cron uses a small window; initial run leaves blank=all)
DATEQ=""; [ -n "$DAYS" ] && DATEQ=" AND after:-${DAYS}day"
TS=$(date +%Y%m%d_%H%M%S); LOG="/tmp/nr_purge_$TS.log"
zm(){ zmmailbox -z -m "$M" "$@" 2>/dev/null; }

# ---- RFC 2047 MIME-header decode (handles =?utf-8?Q?..?= and ?B?..?=) ----
mime_decode(){ perl -MEncode -CS -ne 'print Encode::decode("MIME-Header",$_)' 2>/dev/null; }

# Machine-sender address localparts (exact, lowercased): definitionally non-human.
# Matched against the localpart of the From address only (not display name) to avoid eating humans.
FROM_MACHINE_RE='^(mailer-daemon|postmaster|auto-reply|autoreply|auto-responder|autoresponder|no-reply|noreply|donotreply|do-not-reply|bounce|bounces|mdaemon|odoobot|helpdesk|notification|notifications|notify|sysadmin|system|root|abuse)([._+-].*)?$'
# Strong machine tokens that may appear ANYWHERE in the localpart (no human address has these).
FROM_TOKEN_RE='noreply|no-reply|donotreply|do-not-reply|mailer-daemon|auto-reply|autoreply|autoresponse|auto-response|bounces|auth-results|postmaster'
# Display-name bots (substring, lowercased) that use human-ish addresses but are clearly automated.
FROM_DISPLAY_BOT_RE='odoobot|mail delivery (sub)?system|system administrator|microsoft outlook|mail administrator|postmaster'
# STRONG auto markers checked BEFORE the human "Re:" guard -- unambiguous machine subjects that
# no human types, so safe to delete even when wearing a "Re:" prefix (e.g. ticket auto-acks).
STRONG_AUTO_RE='new ticket|ticket created|ticket #|ticket no\.?|ticket has been|has been (assigned|received|resolved|closed|created|opened|updated)|your request with id|request with id|we.?re on it|\(autoresponse\)|auto ?re:|automatic (reply|response)|auto-?response|out of office|out-of-office|authentication report|undeliverable|undelivered|delivery status notification|could ?n.?t be delivered|could not be delivered|message could ?n.?t be|failure notice|returned mail|welcome to .*help ?desk|new case notification'
# Broader auto-ack / bounce subject patterns (lowercased subject), checked AFTER the Re: guard.
SUBJ_AUTO_RE='has been (received|resolved|closed|updated|created|opened|assigned)|case (received|closed|resolved|notification)|ticket ?#|ticket no\.?|ticket has been|your ticket|new ticket|ticket created|your request with id|thanks,? we got your|we have received your|out of office|out-of-office|automatic reply|automatic response|auto[- ]?reply|autoreply|auto-?response|\(autoresponse\)|new case|message (recieved|received)|delivery (status notification|failure|has failed)|undelivered|undeliverable|failure notice|^failed:|returned mail|mail delivery|could not be delivered|could ?n.?t be delivered|delayed mail|invalid address|address not found|recipient (address )?rejected|new email address|quota|read-?receipt|priority opened|authentication report|help ?desk'

# from_localpart <header-block> -> echoes lowercased localpart of From address
from_localpart(){
  printf '%s' "$1" | grep -iE '^From:' | head -1 \
    | sed -E 's/^From:[[:space:]]*//I' \
    | grep -oE '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+' | head -1 \
    | sed -E 's/@.*$//' | tr 'A-Z' 'a-z'
}
from_display(){ printf '%s' "$1" | grep -iE '^From:' | head -1 | sed -E 's/^From:[[:space:]]*//I' | tr 'A-Z' 'a-z'; }

# decide <header-block> <decoded-subject> -> prints "KEEP <reason>" | "DEL <reason>"
# Precedence: (1) unsubscribe wins; (2) Auto-Submitted header; (3) machine From-address (exact
# localpart / strong token / display-bot); (4) STRONG auto subjects (delete even if "Re:");
# (5) genuine human Re:; (6) ticket-tag / broad auto-ack subjects; (7) default keep.
decide(){
  local H="$1" subj="$2"
  local s as lp disp
  s=$(printf '%s' "$subj" | tr 'A-Z' 'a-z')
  as=$(printf '%s' "$H" | grep -iE '^Auto-Submitted:' | head -1 | sed -E 's/^Auto-Submitted:[[:space:]]*//I' | tr 'A-Z' 'a-z' | tr -d ' ')
  lp=$(from_localpart "$H"); disp=$(from_display "$H")
  # 1) compliance: unsubscribe/opt-out always KEEP (overrides every machine signal)
  if printf '%s' "$s" | grep -qE 'unsubscribe|opt[ -]?out|remove me|stop emailing'; then echo "KEEP unsubscribe"; return; fi
  # 2) RFC 3834 Auto-Submitted present & != no -> machine
  if [ -n "$as" ] && [ "$as" != "no" ]; then echo "DEL auto-submitted=$as"; return; fi
  # 3) machine From-address (exact localpart, strong token anywhere, or display-name bot)
  if printf '%s' "$lp" | grep -qE "$FROM_MACHINE_RE"; then echo "DEL from-machine=$lp"; return; fi
  if printf '%s' "$lp" | grep -qE "$FROM_TOKEN_RE"; then echo "DEL from-token=$lp"; return; fi
  if printf '%s' "$disp" | grep -qE "$FROM_DISPLAY_BOT_RE"; then echo "DEL from-bot"; return; fi
  # 4) STRONG auto subjects: unambiguous machine markers, delete even if dressed as "Re:"
  if printf '%s' "$s" | grep -qE "$STRONG_AUTO_RE"; then echo "DEL strong-auto-subject"; return; fi
  # 5) genuine human threaded reply (auto ones already removed above)
  if printf '%s' "$subj" | grep -qE '^[[:space:]]*(Re|RE|Fwd|Fw|FW)[:[]'; then echo "KEEP human-reply"; return; fi
  # 6) ticket tag / broad auto-ack subject patterns
  if printf '%s' "$subj" | grep -qE '^[[:space:]]*\[##'; then echo "DEL ticket-tag"; return; fi
  if printf '%s' "$s" | grep -qE "$SUBJ_AUTO_RE"; then echo "DEL auto-ack-subject"; return; fi
  # 7) default: keep (human-safe)
  echo "KEEP default"
}

# fetch the joined+decoded Subject for a message id (standalone helper; the phases below do not call it)
get_subject(){ zm getRestURL "/?id=$1&fmt=rfc822" | sed -n '1,/^$/p' \
  | awk 'BEGIN{IGNORECASE=1} /^Subject:/{s=$0;getline n; while(n ~ /^[ \t]/){sub(/^[ \t]+/," ",n); s=s n; getline n} print s; exit}' \
  | sed -E 's/^Subject:[[:space:]]*//I' | mime_decode; }

# ---- classify one message id -> prints "KEEP <reason>" or "DEL <reason>" ----
classify(){
  local H subj
  H=$(zm getRestURL "/?id=$1&fmt=rfc822" | sed -n '1,/^$/p')
  subj=$(printf '%s' "$H" | awk 'BEGIN{IGNORECASE=1} /^Subject:/{s=$0;getline n; while(n ~ /^[ \t]/){sub(/^[ \t]+/," ",n); s=s n; getline n} print s; exit}' | sed -E 's/^Subject:[[:space:]]*//I' | mime_decode)
  decide "$H" "$subj"
}

# classify + emit decoded subject (single fetch) -> "<DECISION>\t<subject>"
classify2(){
  local H subj
  H=$(zm getRestURL "/?id=$1&fmt=rfc822" | sed -n '1,/^$/p')
  subj=$(printf '%s' "$H" | awk 'BEGIN{IGNORECASE=1} /^Subject:/{s=$0;getline n; while(n ~ /^[ \t]/){sub(/^[ \t]+/," ",n); s=s n; getline n} print s; exit}' | sed -E 's/^Subject:[[:space:]]*//I' | mime_decode)
  printf '%s\t%s\n' "$(decide "$H" "$subj")" "$(printf '%s' "$subj" | cut -c1-70)"
}

ids_for(){ zm search -l 1000 -t message "$1" | grep -w mess | awk '{print $2}'; }

move_to_trash(){ # stdin: ids one per line
  local buf=() id n=0
  while read -r id; do [ -z "$id" ] && continue; buf+=("$id"); n=$((n+1))
    if [ ${#buf[@]} -ge 200 ]; then
      local c="${buf[*]}"; zm moveMessage "${c// /,}" "$TRASH" >/dev/null; buf=(); fi
  done
  if [ ${#buf[@]} -gt 0 ]; then local c="${buf[*]}"; zm moveMessage "${c// /,}" "$TRASH" >/dev/null; fi
  echo "$n"
}

if [ "$MODE" = "preview" ]; then
  echo "=== PREVIEW (read-only) most-recent $PREVIEW_N ===" | tee -a "$LOG"
  IDS=$(zm search -l "$PREVIEW_N" -t message "in:inbox" | grep -w mess | awk '{print $2}')
  keep=0; del=0; survivors="/tmp/nr_survivors_$TS.txt"; deletes="/tmp/nr_deletes_$TS.txt"
  : > "$survivors"; : > "$deletes"
  for id in $IDS; do
    line=$(classify2 "$id") # "<DECISION>\t<subject>"
    d=${line%%$'\t'*}; subj=${line#*$'\t'}
    if [[ "$d" == KEEP* ]]; then keep=$((keep+1)); echo "id=$id [$d] $subj" >> "$survivors"
    else del=$((del+1)); echo "id=$id [$d] $subj" >> "$deletes"; fi
  done
  echo "kept=$keep deleted=$del" | tee -a "$LOG"
  echo "--- SURVIVORS (would KEEP) ---" | tee -a "$LOG"
  cat "$survivors" | tee -a "$LOG"
  echo "--- sample DELETES (first 25) ---" | tee -a "$LOG"
  head -25 "$deletes" | tee -a "$LOG"
  exit 0
fi

# ---- APPLY ----
echo "=== APPLY purge $TS (move to $TRASH) ===" | tee -a "$LOG"
# Phase 1: fast bulk bounce delete (MAILER-DAEMON = definitionally bounce), keep-guard on unsubscribe
echo "PHASE1 bulk bounces..." | tee -a "$LOG"
p1=0; g1=0
while :; do
  B=$(ids_for "in:inbox from:MAILER-DAEMON AND NOT (subject:unsubscribe OR subject:\"opt out\")$DATEQ")
  [ -z "${B// }" ] && break
  n=$(printf '%s\n' "$B" | move_to_trash)
  p1=$((p1+n)); echo " moved $n (cum $p1)" | tee -a "$LOG"
  [ "$n" -lt 1 ] && break
  g1=$((g1+1)); [ "$g1" -gt 200 ] && { echo " PHASE1 guard stop" | tee -a "$LOG"; break; }
done
echo "PHASE1 done: $p1 bounces -> Trash" | tee -a "$LOG"
[ "$QUICK" = "1" ] && { echo "quick mode: stop after phase1"; exit 0; }

# Phase 1.5: fast SEARCH-based bulk delete of the common non-MAILER machine classes
# (postmaster bounces, Undeliverable/Undelivered DSNs, OOO/automatic-reply). These are
# matched server-side (no per-message fetch) and HARD-guarded so anything wearing a
# genuine "Re:"/Fwd: or unsubscribe falls through to the accurate Phase 2 classifier.
echo "PHASE1.5 fast search-delete of common auto classes..." | tee -a "$LOG"
GUARD='AND NOT (subject:Re OR subject:Fwd OR subject:unsubscribe OR subject:"opt out")'
p15=0
for q in \
  "in:inbox from:postmaster $GUARD$DATEQ" \
  "in:inbox subject:Undeliverable $GUARD$DATEQ" \
  "in:inbox subject:\"Undelivered Mail\" $GUARD$DATEQ" \
  "in:inbox subject:\"automatic reply\" $GUARD$DATEQ" \
  "in:inbox subject:\"out of office\" $GUARD$DATEQ" \
  "in:inbox subject:\"failure notice\" $GUARD$DATEQ" \
  "in:inbox subject:\"delivery status notification\" $GUARD$DATEQ" \
  ; do
  qg=0
  while :; do
    B=$(ids_for "$q")
    [ -z "${B// }" ] && break
    n=$(printf '%s\n' "$B" | move_to_trash)
    p15=$((p15+n)); echo " [$q] moved $n (cum $p15)" | tee -a "$LOG"
    [ "$n" -lt 1 ] && break
    qg=$((qg+1)); [ "$qg" -gt 50 ] && break
  done
done
echo "PHASE1.5 done: $p15 auto-class -> Trash" | tee -a "$LOG"

# Phase 2: header-classify the remainder. Offset paging is unreliable on this box,
# so we loop: classify the top page, delete its DELs, cache KEEPs as "seen" so we
# don't re-fetch them next pass. Terminate when a page yields only already-seen KEEPs.
echo "PHASE2 header-classify remainder..." | tee -a "$LOG"
p2=0; guard=0; SEEN="/tmp/nr_seen_$TS.txt"; : > "$SEEN"
while :; do
  IDS=$(zm search -l 1000 -t message "in:inbox AND NOT from:MAILER-DAEMON$DATEQ" | grep -w mess | awk '{print $2}')
  [ -z "${IDS// }" ] && break
  delbuf=""; newwork=0
  for id in $IDS; do
    grep -qx "$id" "$SEEN" && continue # already classified KEEP, skip
    newwork=1
    d=$(classify "$id")
    if [[ "$d" == DEL* ]]; then delbuf+="$id"$'\n'; else echo "$id" >> "$SEEN"; fi
  done
  if [ -n "${delbuf// }" ]; then
    n=$(printf '%s' "$delbuf" | move_to_trash); p2=$((p2+n)); echo " page moved $n (cum $p2)" | tee -a "$LOG"
  fi
  # A page with no new (unseen) messages means everything left is cached-KEEP -> done.
  if [ "$newwork" = "0" ]; then echo " page all-seen-KEEP, stop" | tee -a "$LOG"; break; fi
  guard=$((guard+1)); [ "$guard" -gt 120 ] && { echo "guard stop" | tee -a "$LOG"; break; }
done
echo "PHASE2 done: $p2 auto/ack -> Trash; survivors cached in $SEEN ($(wc -l < "$SEEN"))" | tee -a "$LOG"
# Total covers all three phases (Phase 1 + Phase 1.5 + Phase 2).
echo "TOTAL moved to Trash: $((p1+p15+p2))" | tee -a "$LOG"