mail: Fix 2 — bound the untagged (NULL mx_provider) bucket in the selector
Completes the MX-exclusion plan. Untagged carriers can't be excluded (the big-MX gate is MX-based, so an unresolved Google/Yahoo domain would slip through), and were previously UNCAPPED in select_sendable_carriers -- a flood of freshly-imported, never-resolved domains could dominate a run before pw-mx-tag resolves them.

Added a single shared untagged_cap (env MAIN_UNTAGGED_MX_CAP, default max(quota, 200)) so untagged sends are bounded without starving the pool. At the default the bucket can still fill an entire run's quota (no behavior change today), but the cap can be tightened to a fraction once pw-mx-tag has drained the backlog -- which is fast, since only ~3,035 distinct *verified-sendable* untagged domains remain (< one 20k/day tag run). Tagged carriers keep their per-operator caps unchanged.

Verified: compiles; cap logic never starves at default, enforces the limit when set lower.
parent b7cce370d7
commit bc93d93c5b
1 changed file with 22 additions and 5 deletions
@@ -989,6 +989,16 @@ def select_sendable_carriers(
     caps = mx_daily_caps(main_warmup_day())
     per_op: dict = {}
     default_cap = caps.get("__default__", 50)
+    # Untagged (NULL mx_provider) safety cap. We can't exclude NULLs (the big-MX
+    # exclusion is MX-based, so an untagged Google/Yahoo domain would slip through),
+    # but we also shouldn't let a flood of freshly-imported, never-resolved domains
+    # dominate a run -- some are big/consumer operators we'd otherwise hold out.
+    # The pw-mx-tag cron drains the *sendable* untagged backlog fast (only ~3k
+    # distinct verified domains as of 2026-06-20, < one 20k/day run), so this is a
+    # between-runs safety net, not the primary gate. Generous enough to never starve
+    # the pool in normal operation. Tunable via MAIN_UNTAGGED_MX_CAP.
+    untagged_cap = int(os.getenv("MAIN_UNTAGGED_MX_CAP", str(max(quota, 200))))
+    untagged_used = 0
     MX_IDX = 5 # mx_provider is the 6th column from fetch_carriers
     # Warmup caps are small, but old audiences can contain many prior bounces or
     # unsubscribes. Scan beyond the quota while still bounding worst-case API calls.
@@ -1007,23 +1017,30 @@ def select_sendable_carriers(
             continue
         seen_emails.add(email)
         # Per-MX-operator cap (reputation is per receiving operator).
-        # Untagged carriers (no mx_provider yet) are NOT capped here -- they
-        # would otherwise all collapse onto one __default__ bucket and starve
-        # the pool before tagging completes. The big-operator EXCLUSION in
-        # fetch_carriers already keeps Google/MS out during warmup; this cap
-        # bounds the KNOWN operators once tagging fills in.
+        # Tagged carriers are capped per operator; untagged (no mx_provider
+        # yet) are bounded by a single shared safety cap (untagged_cap) instead
+        # of being uncapped -- this stops a flood of unresolved domains (which
+        # could include big/consumer operators) from dominating a run, without
+        # starving the pool. The big-operator EXCLUSION in fetch_carriers keeps
+        # KNOWN Google/MS/consumer-MX out; the pw-mx-tag cron keeps NULL small.
         prov = (row[MX_IDX] or "").strip().lower() if len(row) > MX_IDX else ""
         if prov:
             cap = caps.get(prov, default_cap)
             if per_op.get(prov, 0) >= cap:
                 skipped[f"mx_cap:{prov}"] = skipped.get(f"mx_cap:{prov}", 0) + 1
                 continue
+        else:
+            if untagged_used >= untagged_cap:
+                skipped["mx_cap:untagged"] = skipped.get("mx_cap:untagged", 0) + 1
+                continue
         ok, reason = listmonk_sendable(email)
         if not ok:
             skipped[reason] = skipped.get(reason, 0) + 1
             continue
         if prov:
             per_op[prov] = per_op.get(prov, 0) + 1
+        else:
+            untagged_used += 1
         selected.append(row)
         if len(selected) >= quota:
             break
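
The two behaviours claimed in the commit message (no starvation at the default cap, enforcement when the cap is set lower) can be illustrated with a small standalone sketch. It is not the project's code: `resolve_untagged_cap`, `select`, the `(email, mx_provider)` row shape, and the `smallhost` operator are hypothetical stand-ins, and dedup and the `listmonk_sendable` check are left out. Because nothing can fail between the cap check and the selection here, the counters are incremented right after the check instead of after a sendability test.

```python
import os


def resolve_untagged_cap(quota, env=None):
    """Mirror the diff's default: MAIN_UNTAGGED_MX_CAP, else max(quota, 200)."""
    env = os.environ if env is None else env
    return int(env.get("MAIN_UNTAGGED_MX_CAP", str(max(quota, 200))))


def select(rows, quota, caps, untagged_cap, default_cap=50):
    """Simplified stand-in for the selection loop (assumed row shape: (email, mx))."""
    per_op = {}          # sends counted per tagged operator
    untagged_used = 0    # sends counted for the shared untagged bucket
    selected, skipped = [], {}
    for email, mx in rows:
        prov = (mx or "").strip().lower()
        if prov:
            # Tagged: per-operator cap, falling back to the default cap.
            if per_op.get(prov, 0) >= caps.get(prov, default_cap):
                skipped[f"mx_cap:{prov}"] = skipped.get(f"mx_cap:{prov}", 0) + 1
                continue
            per_op[prov] = per_op.get(prov, 0) + 1
        else:
            # Untagged: one shared cap across all unresolved domains.
            if untagged_used >= untagged_cap:
                skipped["mx_cap:untagged"] = skipped.get("mx_cap:untagged", 0) + 1
                continue
            untagged_used += 1
        selected.append((email, mx))
        if len(selected) >= quota:
            break
    return selected, skipped


# 300 untagged rows followed by 300 rows tagged with a made-up operator.
rows = [(f"u{i}@example.test", None) for i in range(300)]
rows += [(f"t{i}@example.test", "smallhost") for i in range(300)]
caps = {"smallhost": 100, "__default__": 50}

# Default cap is max(250, 200) = 250, so untagged rows alone can fill the quota.
default_sel, default_skip = select(rows, 250, caps, resolve_untagged_cap(250, env={}))

# Tightened cap: at most 40 untagged rows pass; tagged rows fill in up to their own cap.
tight_cap = resolve_untagged_cap(250, env={"MAIN_UNTAGGED_MX_CAP": "40"})
tight_sel, tight_skip = select(rows, 250, caps, tight_cap)
```

With the tightened cap the run ends below quota in this toy input, because the only tagged operator has its own cap of 100. This is the trade-off the commit message points at: lowering the cap is only safe once enough tagged carriers exist to fill the remainder.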