healthcare: fix 4 bugs in segment-assignment + free-check email

Found during a bug-review pass of the one-email-per-provider work:

1. assign_all overwrite bug: an email on MULTIPLE rows (shared practice inbox /
   multiple NPIs -- 2,592 such emails, 299 with mixed status) was assigned by
   the LAST row, so a less-urgent row could clobber an urgent one (overdue ->
   free check). Now keeps the most-urgent assignment (lowest priority number).

2. warm_segment double-import + wrong-row render: all of an email's rows passed
   the candidate filter, so it could be imported twice (over-counting the slice)
   and attribs_for could render a sibling row's blank due-date in the overdue
   email. Now requires row_matches(seg_key, r) for the specific row AND dedupes
   by email (one row per email).

3. free-check email rendered broken text ('last updated on  -- about  years
   ago', 'Last updated  . ~ yrs ago') for any provider whose NPPES date isn't
   cached yet (the free check goes to everyone, and the fill is gradual). Wrapped
   the example sentence + official-record card in listmonk {{ if
   .nppes_last_updated }}...{{ else }}...{{ end }}; added a date-free else
   branch. altbody keeps the conditionals (listmonk evaluates body+altbody), and
   the test/preview renderer gained a minimal {{ if/else/end }} evaluator so
   previews match real sends. Verified both branches render with zero unfilled
   tokens.

4. cross-cron double-send: pw-hc-campaign (warmup file) and pw-hc-nppes (63k
   file) share state but tracked imports per-segment; 312 emails overlap both
   files, so a provider could get an urgent email from one cron AND the free
   check from the other. Added load_all_imported() global guard (union of all
   segment state) so each provider gets exactly one healthcare email overall.
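
For reference, the most-urgent-wins rule from item 1 can be sketched roughly
as below. This is only an illustration: the PRIORITY table, its numbers, and
the "segment" key on each row are assumptions standing in for _seg_priority()
and assign_segment(), whose real definitions are not part of this diff.

```python
# Hypothetical priority table: lower number = more urgent. The real ordering
# comes from _seg_priority(), which this commit does not show.
PRIORITY = {"revalidation_overdue": 0, "nppes_outdated": 9}


def assign_all_sketch(rows):
    """Map email -> most-urgent segment across all of that email's rows."""
    out = {}
    for r in rows:
        email = (r.get("email") or "").strip().lower()
        # Stand-in for assign_segment(r, active_segments).
        seg = r.get("segment")
        if not email or seg is None:
            continue
        prev = out.get(email)
        # Only overwrite when the new row is strictly more urgent.
        if prev is None or PRIORITY[seg] < PRIORITY[prev]:
            out[email] = seg
    return out


# One shared inbox, two NPIs with different statuses, in both row orders.
rows = [
    {"email": "Billing@Example.org", "segment": "revalidation_overdue"},
    {"email": "billing@example.org", "segment": "nppes_outdated"},
]
result = assign_all_sketch(rows)
result_reversed = assign_all_sketch(list(reversed(rows)))
```

With the old last-row-wins behavior the first ordering would have produced
nppes_outdated; here both orderings should agree on the urgent segment.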

All verified: assignment regression test (10 cases) + new dup-email/guard checks
pass; all 6 templates render clean.
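
The two branches from item 3 can also be checked in isolation. The sketch
below reuses the regex approach of _eval_conditionals from this commit in a
simplified standalone form; the TEMPLATE string and the sample date are made
up for illustration, and it handles non-nested blocks only.

```python
import re

# Matches {{ if .Subscriber.Attribs.X }}A{{ else }}B{{ end }} (else optional).
_COND = re.compile(
    r"\{\{\s*if\s+\.Subscriber\.Attribs\.(\w+)\s*\}\}(.*?)"
    r"(?:\{\{\s*else\s*\}\}(.*?))?\{\{\s*end\s*\}\}",
    re.DOTALL,
)


def eval_conditionals(html, attribs):
    """Resolve non-nested if/else/end blocks; other tokens are left alone."""
    def repl(m):
        key, if_body, else_body = m.group(1), m.group(2), m.group(3) or ""
        # Truthy means present and non-empty after stripping whitespace.
        return if_body if str(attribs.get(key, "")).strip() else else_body
    return _COND.sub(repl, html)


# Made-up template fragment, not the real campaign HTML.
TEMPLATE = (
    "{{ if .Subscriber.Attribs.nppes_last_updated }}"
    "Last updated on {{ .Subscriber.Attribs.nppes_last_updated }}."
    "{{ else }}Your NPI touches several public records.{{ end }}"
)

with_date = eval_conditionals(TEMPLATE, {"nppes_last_updated": "2019-03-04"})
without_date = eval_conditionals(TEMPLATE, {})
```

Note that this step only picks a branch; the remaining
{{ .Subscriber.Attribs.* }} tokens are still filled by a separate
substitution pass, as in render().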
justin 2026-06-20 16:14:44 -05:00
parent 0320dc17ba
commit 1acae2f20c
3 changed files with 92 additions and 8 deletions

View file

@@ -14,7 +14,7 @@
 <tr><td class="pw-pad" style="padding:28px;font-family:Inter,system-ui,sans-serif;color:#1f2937;">
 <p style="font-size:15px;margin:0 0 18px;line-height:1.5;">Hi {{ .Subscriber.Name }},</p>
 <h2 style="font-size:19px;margin:0 0 14px;color:#0f172a;line-height:1.3;">We pulled the public records for NPI {{ .Subscriber.Attribs.npi }} &mdash; here&rsquo;s a free check</h2>
-<p style="font-size:14px;line-height:1.7;margin:0 0 18px;">As a quick example, the public NPPES NPI Registry shows the record for <strong>{{ .Subscriber.Attribs.practice }}</strong> was <strong>last updated on {{ .Subscriber.Attribs.nppes_last_updated }}</strong> &mdash; about <strong>{{ .Subscriber.Attribs.nppes_years_stale }} years ago</strong>. That&rsquo;s usually fine, but it&rsquo;s only one of several things payers and CMS check. Our free tool runs your NPI against the public government sources in one place &mdash; <strong>no signup, no cost</strong> &mdash; and tells you exactly where you stand.</p>
+<p style="font-size:14px;line-height:1.7;margin:0 0 18px;">{{ if .Subscriber.Attribs.nppes_last_updated }}As a quick example, the public NPPES NPI Registry shows the record for <strong>{{ .Subscriber.Attribs.practice }}</strong> was <strong>last updated on {{ .Subscriber.Attribs.nppes_last_updated }}</strong> &mdash; about <strong>{{ .Subscriber.Attribs.nppes_years_stale }} years ago</strong>. That&rsquo;s usually fine, but it&rsquo;s only one of several things payers and CMS check. {{ else }}Your NPI touches several public government records &mdash; NPPES, the Medicare revalidation list, and the federal exclusion lists &mdash; and any one of them being off can hold up your payments. {{ end }}Our free tool runs your NPI against those public sources in one place &mdash; <strong>no signup, no cost</strong> &mdash; and tells you exactly where you stand.</p>
 <table role="presentation" width="100%" cellpadding="0" cellspacing="0" style="margin:22px 0;"><tr><td style="background:#f0fdfa;border:1px solid #99f6e4;border-radius:10px;padding:18px;">
 <p style="margin:0 0 10px;font-size:14px;color:#0f766e;font-weight:700;">Your free check covers:</p>
@@ -31,7 +31,9 @@
 <div style="font-size:13px;color:#065f46;line-height:1.7;">Payers, clearinghouses, and CMS pull from NPPES. A stale address, taxonomy, or contact can cause <strong>claim denials, mail you never receive, and failed credentialing</strong>. CMS requires you to correct your NPPES record within 30 days of any change.</div>
 </td></tr></table>
-<!-- Official-record card: NPPES is fully public, so this mirrors the registry. -->
+<!-- Official-record card: NPPES is fully public, so this mirrors the registry.
+Only shown when we have the real Last Updated date for this NPI. -->
+{{ if .Subscriber.Attribs.nppes_last_updated }}
 <table role="presentation" width="100%" cellpadding="0" cellspacing="0" style="margin:22px 0;">
 <tr><td style="border:1px solid #cbd5e1;border-radius:10px;overflow:hidden;">
 <table role="presentation" width="100%" cellpadding="0" cellspacing="0">
@@ -49,6 +51,7 @@
 </table>
 </td></tr>
 </table>
+{{ end }}
 <!-- Free-first reassurance: the check is free; a fix is optional + flat-fee. -->
 <table role="presentation" width="100%" cellpadding="0" cellspacing="0" style="margin:18px 0;"><tr><td style="background:#f8fafc;border:1px solid #e5e7eb;border-radius:10px;padding:14px 18px;">

View file

@@ -140,6 +140,32 @@ def template_path(seg_key: str) -> str:
     return os.path.join(OUT_DIR, SEGMENTS[seg_key]["template"])


+def _eval_conditionals(html: str, attribs: dict) -> str:
+    """Minimal evaluator for the listmonk/Go `{{ if .Subscriber.Attribs.X }}A
+    {{ else }}B{{ end }}` blocks used in the templates, so TEST/PREVIEW renders
+    match what listmonk produces at send time (listmonk itself evaluates these
+    server-side; this is only for the standalone preview/test-send path). Treats
+    an attribute as truthy when it is present and non-empty. Supports an optional
+    {{ else }} and is non-nested (which is all the templates use)."""
+    import re
+    pat = re.compile(
+        r"\{\{\s*if\s+\.Subscriber\.Attribs\.(\w+)\s*\}\}(.*?)"
+        r"(?:\{\{\s*else\s*\}\}(.*?))?\{\{\s*end\s*\}\}",
+        re.DOTALL,
+    )
+
+    def repl(m: "re.Match") -> str:
+        key, if_body, else_body = m.group(1), m.group(2), m.group(3) or ""
+        return if_body if str(attribs.get(key, "")).strip() else else_body
+
+    # Loop until stable so adjacent/multiple blocks all resolve.
+    prev = None
+    while prev != html:
+        prev = html
+        html = pat.sub(repl, html)
+    return html
+
+
 def render(seg_key: str, *, test: bool = False) -> tuple[str, str]:
     """Return (subject, html) for a segment. The html is the canonical
     data/hc_campaigns/<template> file -- the single source of truth. For test
@@ -148,6 +174,9 @@ def render(seg_key: str, *, test: bool = False) -> tuple[str, str]:
     s = SEGMENTS[seg_key]
     html = open(template_path(seg_key)).read()
     if test:
+        # Resolve {{ if .Subscriber.Attribs.X }} blocks first (listmonk does this
+        # server-side on real sends), using SAMPLE as the attrib source.
+        html = _eval_conditionals(html, SAMPLE)
         html = (html
                 .replace("{{ .Subscriber.Name }}", SAMPLE["name"])
                 .replace("{{ .Subscriber.Attribs.npi }}", SAMPLE["npi"])

View file

@@ -247,6 +247,24 @@ def save_imported(seg_key: str, emails: set[str]):
         f.write("\n".join(sorted(emails)) + "\n")


+def load_all_imported() -> set[str]:
+    """Union of EVERY segment's imported-emails state, i.e. everyone who has
+    already been emailed by ANY segment. Used as a cross-segment AND cross-cron
+    guard so a provider gets exactly one healthcare email overall: the two crons
+    (pw-hc-campaign on the small warmup file, pw-hc-nppes on the 63k institutional
+    file) share these state files, and ~312 emails overlap both files, so without
+    this a provider warmed as 'revalidation_overdue' by one cron could also be
+    warmed as the free 'nppes_outdated' check by the other. Reads all
+    hc_imported_*.txt plus the legacy single-segment file."""
+    seen: set[str] = set()
+    for key in SEGMENTS:
+        seen |= load_imported(key)
+    legacy = os.path.join(STATE_DIR, "hc_imported_emails.txt")
+    if os.path.exists(legacy):
+        seen |= {ln.strip().lower() for ln in open(legacy) if ln.strip()}
+    return seen
+
+
 def add_subscriber(list_id: int, email: str, name: str, attribs: dict) -> bool:
     try:
         lm("/subscribers", {
@@ -410,14 +428,25 @@ def assign_segment(r: dict, active_segments: list[str]) -> str | None:
 def assign_all(rows: list[dict], active_segments: list[str]) -> dict[str, str]:
     """Map email -> assigned segment across the whole list, so each segment's
-    importer can claim only its assigned providers. Computed once per run."""
+    importer can claim only its assigned providers. Computed once per run.
+
+    An email can appear on MULTIPLE rows (a shared practice inbox covering
+    several NPIs, e.g. a credentialing address) and those rows can carry
+    DIFFERENT statuses (one NPI overdue, another not on the list). We must keep
+    the MOST-URGENT assignment across all of that email's rows -- otherwise a
+    later, less-urgent row would clobber an earlier urgent one and the provider
+    would get the free check instead of the overdue email. So we compare
+    priorities and keep the winner (lower number = more urgent)."""
     out: dict[str, str] = {}
     for r in rows:
         email = (r.get("email") or "").strip().lower()
         if not email:
             continue
         seg = assign_segment(r, active_segments)
-        if seg is not None:
+        if seg is None:
+            continue
+        prev = out.get(email)
+        if prev is None or _seg_priority(seg) < _seg_priority(prev):
             out[email] = seg
     return out
@@ -472,24 +501,47 @@ def warm_segment(seg_key: str, rows: list[dict], slice_n: int,
     keeps working unchanged."""
     seg = SEGMENTS[seg_key]
     imported = load_imported(seg_key)
+    # Cross-segment + cross-cron guard: skip anyone already emailed by ANY
+    # segment so each provider gets exactly one healthcare email overall.
+    already_anywhere = load_all_imported()
     suppressed = load_suppressed()

     def _is_candidate(r: dict) -> bool:
         email = r.get("email", "").strip().lower()
-        if not email or email in imported or email in suppressed:
+        if not email or email in already_anywhere or email in suppressed:
             return False
         if _is_google_hosted(r):
             return False
         if assignment is not None:
-            return assignment.get(email) == seg_key
+            # The email must be assigned to THIS segment AND this specific row
+            # must be the one that earns it. An email can span several rows (a
+            # shared practice inbox over multiple NPIs); only the row whose own
+            # status matches this segment's selector should represent it, so the
+            # template renders that row's real data (e.g. the overdue NPI's due
+            # date, never a sibling 'not_on_list' row's blank one). This also
+            # dedupes: at most one row per email passes.
+            return assignment.get(email) == seg_key and row_matches(seg_key, r)
         return row_matches(seg_key, r)

-    candidates = [r for r in rows if _is_candidate(r)]
+    # Dedupe by email: an email can legitimately appear on multiple matching
+    # rows (e.g. two overdue NPIs share one inbox). Keep the first so the email
+    # is imported once and counted once against the slice budget.
+    candidates = []
+    seen_emails: set[str] = set()
+    for r in rows:
+        if not _is_candidate(r):
+            continue
+        email = r["email"].strip().lower()
+        if email in seen_emails:
+            continue
+        seen_emails.add(email)
+        candidates.append(r)
     # Spread the slice across MX operators so no single receiving system (e.g.
     # Microsoft 365) gets the whole batch. Caps ramp with the warmup day.
     todo = mx_throttled(candidates, slice_n, mx_daily_caps(warmup_day()))
     print(f"[hc-cron] {seg_key}: candidates={len(candidates)} "
-          f"already={len(imported)} to_import={len(todo)}")
+          f"in_segment={len(imported)} emailed_anywhere={len(already_anywhere)} "
+          f"to_import={len(todo)}")
     if dry_run:
         for r in todo[:3]: