healthcare: fix 4 bugs in segment-assignment + free-check email

Found during a bug-review pass of the one-email-per-provider work:

1. assign_all overwrite bug: an email that appears on MULTIPLE rows (shared practice inbox / multiple NPIs -- 2,592 such emails, 299 with mixed status) was assigned by its LAST row, so a less-urgent row could clobber an urgent one (overdue -> free check). Now keeps the most-urgent assignment (lowest priority number).

2. warm_segment double-import + wrong-row render: every row for an email passed the candidate filter, so the email could be imported twice (over-counting the slice) and attribs_for could render a sibling row's blank due date in the overdue email. Now requires row_matches(seg) for the specific row AND dedupes by email (one row per email).

3. Free-check email rendered broken text ('last updated on -- about years ago', 'Last updated . ~ yrs ago') for any provider whose NPPES date isn't cached yet (the free check goes to everyone, and the fill is gradual). Wrapped the example sentence + official-record card in listmonk {{ if .nppes_last_updated }}...{{ else }}...{{ end }}; added a date-free else branch. altbody keeps the conditionals (listmonk evaluates body+altbody), and the test/preview renderer gained a minimal {{ if/else/end }} evaluator so previews match real sends. Verified both branches render with zero unfilled tokens.

4. Cross-cron double-send: pw-hc-campaign (warmup file) and pw-hc-nppes (63k file) share state but tracked imports per segment; 312 emails overlap both files, so a provider could get an urgent email from one cron AND the free check from the other. Added a load_all_imported() global guard (union of all segment state) so each provider gets exactly one healthcare email overall.

All verified: assignment regression test (10 cases) + new dup-email/guard checks pass; all 6 templates render clean.
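The most-urgent-wins rule from item 1 can be sketched in isolation as below. This is a minimal illustration, not the code in the commit: the PRIORITY table and its numbers, the merge_most_urgent helper, and the example addresses are all hypothetical; only the segment names and the "lower number = more urgent" convention come from the change itself.

```python
# Hypothetical priority table (assumption): lower number = more urgent.
PRIORITY = {"revalidation_overdue": 0, "nppes_outdated": 9}


def merge_most_urgent(pairs):
    """Map email -> segment, keeping the most urgent segment per email.

    `pairs` is an iterable of (email, segment_or_None), one per input row.
    """
    out = {}
    for email, seg in pairs:
        email = (email or "").strip().lower()
        if not email or seg is None:
            continue
        prev = out.get(email)
        # Only overwrite when the new row is strictly more urgent, so a
        # later, less-urgent row can no longer clobber an earlier urgent one.
        if prev is None or PRIORITY[seg] < PRIORITY[prev]:
            out[email] = seg
    return out


# A shared inbox with mixed statuses: the urgent row comes first, and the
# later free-check row must not replace it.
rows = [
    ("billing@example.org", "revalidation_overdue"),
    ("Billing@example.org ", "nppes_outdated"),
    ("solo@example.org", "nppes_outdated"),
]
assignment = merge_most_urgent(rows)
```

With a plain `out[email] = seg` overwrite, the shared inbox would end up in 'nppes_outdated'; with the comparison it stays in 'revalidation_overdue' regardless of row order.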
2026-06-20 16:14:44 -05:00
commit 1acae2f20c
parent 0320dc17ba
3 changed files with 92 additions and 8 deletions
--- a/data/hc_campaigns/hc_nppes_outdated.html
+++ b/data/hc_campaigns/hc_nppes_outdated.html
@@ -14,7 +14,7 @@
 <tr><td class="pw-pad" style="padding:28px;font-family:Inter,system-ui,sans-serif;color:#1f2937;">
  <p style="font-size:15px;margin:0 0 18px;line-height:1.5;">Hi {{ .Subscriber.Name }},</p>
  <h2 style="font-size:19px;margin:0 0 14px;color:#0f172a;line-height:1.3;">We pulled the public records for NPI {{ .Subscriber.Attribs.npi }} &mdash; here&rsquo;s a free check</h2>
-  <p style="font-size:14px;line-height:1.7;margin:0 0 18px;">As a quick example, the public NPPES NPI Registry shows the record for <strong>{{ .Subscriber.Attribs.practice }}</strong> was <strong>last updated on {{ .Subscriber.Attribs.nppes_last_updated }}</strong> &mdash; about <strong>{{ .Subscriber.Attribs.nppes_years_stale }} years ago</strong>. That&rsquo;s usually fine, but it&rsquo;s only one of several things payers and CMS check. Our free tool runs your NPI against the public government sources in one place &mdash; <strong>no signup, no cost</strong> &mdash; and tells you exactly where you stand.</p>
+  <p style="font-size:14px;line-height:1.7;margin:0 0 18px;">{{ if .Subscriber.Attribs.nppes_last_updated }}As a quick example, the public NPPES NPI Registry shows the record for <strong>{{ .Subscriber.Attribs.practice }}</strong> was <strong>last updated on {{ .Subscriber.Attribs.nppes_last_updated }}</strong> &mdash; about <strong>{{ .Subscriber.Attribs.nppes_years_stale }} years ago</strong>. That&rsquo;s usually fine, but it&rsquo;s only one of several things payers and CMS check. {{ else }}Your NPI touches several public government records &mdash; NPPES, the Medicare revalidation list, and the federal exclusion lists &mdash; and any one of them being off can hold up your payments. {{ end }}Our free tool runs your NPI against those public sources in one place &mdash; <strong>no signup, no cost</strong> &mdash; and tells you exactly where you stand.</p>

  <table role="presentation" width="100%" cellpadding="0" cellspacing="0" style="margin:22px 0;"><tr><td style="background:#f0fdfa;border:1px solid #99f6e4;border-radius:10px;padding:18px;">
    <p style="margin:0 0 10px;font-size:14px;color:#0f766e;font-weight:700;">Your free check covers:</p>
@@ -31,7 +31,9 @@
    <div style="font-size:13px;color:#065f46;line-height:1.7;">Payers, clearinghouses, and CMS pull from NPPES. A stale address, taxonomy, or contact can cause <strong>claim denials, mail you never receive, and failed credentialing</strong>. CMS requires you to correct your NPPES record within 30 days of any change.</div>
  </td></tr></table>

-  <!-- Official-record card: NPPES is fully public, so this mirrors the registry. -->
+  <!-- Official-record card: NPPES is fully public, so this mirrors the registry.
+       Only shown when we have the real Last Updated date for this NPI. -->
+  {{ if .Subscriber.Attribs.nppes_last_updated }}
  <table role="presentation" width="100%" cellpadding="0" cellspacing="0" style="margin:22px 0;">
    <tr><td style="border:1px solid #cbd5e1;border-radius:10px;overflow:hidden;">
      <table role="presentation" width="100%" cellpadding="0" cellspacing="0">
@@ -49,6 +51,7 @@
      </table>
    </td></tr>
  </table>
+  {{ end }}

  <!-- Free-first reassurance: the check is free; a fix is optional + flat-fee. -->
  <table role="presentation" width="100%" cellpadding="0" cellspacing="0" style="margin:18px 0;"><tr><td style="background:#f8fafc;border:1px solid #e5e7eb;border-radius:10px;padding:14px 18px;">
--- a/scripts/build_healthcare_campaigns.py
+++ b/scripts/build_healthcare_campaigns.py
@@ -140,6 +140,32 @@ def template_path(seg_key: str) -> str:
    return os.path.join(OUT_DIR, SEGMENTS[seg_key]["template"])


+def _eval_conditionals(html: str, attribs: dict) -> str:
+    """Minimal evaluator for the listmonk/Go `{{ if .Subscriber.Attribs.X }}A
+    {{ else }}B{{ end }}` blocks used in the templates, so TEST/PREVIEW renders
+    match what listmonk produces at send time (listmonk itself evaluates these
+    server-side; this is only for the standalone preview/test-send path). Treats
+    an attribute as truthy when it is present and non-empty. Supports an optional
+    {{ else }} and is non-nested (which is all the templates use)."""
+    import re
+    pat = re.compile(
+        r"\{\{\s*if\s+\.Subscriber\.Attribs\.(\w+)\s*\}\}(.*?)"
+        r"(?:\{\{\s*else\s*\}\}(.*?))?\{\{\s*end\s*\}\}",
+        re.DOTALL,
+    )
+
+    def repl(m: "re.Match") -> str:
+        key, if_body, else_body = m.group(1), m.group(2), m.group(3) or ""
+        return if_body if str(attribs.get(key, "")).strip() else else_body
+
+    # Loop until stable so adjacent/multiple blocks all resolve.
+    prev = None
+    while prev != html:
+        prev = html
+        html = pat.sub(repl, html)
+    return html
+
+
 def render(seg_key: str, *, test: bool = False) -> tuple[str, str]:
    """Return (subject, html) for a segment. The html is the canonical
    data/hc_campaigns/<template> file -- the single source of truth. For test
@@ -148,6 +174,9 @@ def render(seg_key: str, *, test: bool = False) -> tuple[str, str]:
    s = SEGMENTS[seg_key]
    html = open(template_path(seg_key)).read()
    if test:
+        # Resolve {{ if .Subscriber.Attribs.X }} blocks first (listmonk does this
+        # server-side on real sends), using SAMPLE as the attrib source.
+        html = _eval_conditionals(html, SAMPLE)
        html = (html
                .replace("{{ .Subscriber.Name }}", SAMPLE["name"])
                .replace("{{ .Subscriber.Attribs.npi }}", SAMPLE["npi"])
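The preview-side conditional evaluator added above can be exercised on its own with a toy template. The sketch below copies the regex from the change into a standalone, single-pass function; the TEMPLATE string, the sample date, and the eval_conditionals name are illustrative assumptions, not part of the commit.

```python
import re

# Same pattern as the preview evaluator: a non-nested if block with an
# optional else branch, keyed on one .Subscriber.Attribs field.
_IF_BLOCK = re.compile(
    r"\{\{\s*if\s+\.Subscriber\.Attribs\.(\w+)\s*\}\}(.*?)"
    r"(?:\{\{\s*else\s*\}\}(.*?))?\{\{\s*end\s*\}\}",
    re.DOTALL,
)


def eval_conditionals(html, attribs):
    """Keep the if-branch when the attribute is present and non-blank,
    otherwise keep the else-branch (or nothing when there is no else)."""
    def repl(m):
        key, if_body, else_body = m.group(1), m.group(2), m.group(3) or ""
        return if_body if str(attribs.get(key, "")).strip() else else_body
    return _IF_BLOCK.sub(repl, html)


# Hypothetical toy template with both branches.
TEMPLATE = (
    "{{ if .Subscriber.Attribs.nppes_last_updated }}"
    "Last updated {{ .Subscriber.Attribs.nppes_last_updated }}."
    "{{ else }}No date on file.{{ end }}"
)

with_date = eval_conditionals(TEMPLATE, {"nppes_last_updated": "2019-04-02"})
without_date = eval_conditionals(TEMPLATE, {})
```

Only the block markers are resolved here; the inner `{{ .Subscriber.Attribs.nppes_last_updated }}` token is left for the later `.replace(...)` pass. One difference worth keeping in mind: this evaluator strips whitespace before testing truthiness, whereas Go templates treat any string of non-zero length as true, so a whitespace-only attribute would preview differently from a real send.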
--- a/scripts/build_healthcare_campaigns_cron.py
+++ b/scripts/build_healthcare_campaigns_cron.py
@@ -247,6 +247,24 @@ def save_imported(seg_key: str, emails: set[str]):
        f.write("\n".join(sorted(emails)) + "\n")


+def load_all_imported() -> set[str]:
+    """Union of EVERY segment's imported-emails state, i.e. everyone who has
+    already been emailed by ANY segment. Used as a cross-segment AND cross-cron
+    guard so a provider gets exactly one healthcare email overall: the two crons
+    (pw-hc-campaign on the small warmup file, pw-hc-nppes on the 63k institutional
+    file) share these state files, and ~312 emails overlap both files, so without
+    this a provider warmed as 'revalidation_overdue' by one cron could also be
+    warmed as the free 'nppes_outdated' check by the other. Reads all
+    hc_imported_*.txt plus the legacy single-segment file."""
+    seen: set[str] = set()
+    for key in SEGMENTS:
+        seen |= load_imported(key)
+    legacy = os.path.join(STATE_DIR, "hc_imported_emails.txt")
+    if os.path.exists(legacy):
+        seen |= {ln.strip().lower() for ln in open(legacy) if ln.strip()}
+    return seen
+
+
 def add_subscriber(list_id: int, email: str, name: str, attribs: dict) -> bool:
    try:
        lm("/subscribers", {
@@ -410,14 +428,25 @@ def assign_segment(r: dict, active_segments: list[str]) -> str | None:

 def assign_all(rows: list[dict], active_segments: list[str]) -> dict[str, str]:
    """Map email -> assigned segment across the whole list, so each segment's
-    importer can claim only its assigned providers. Computed once per run."""
+    importer can claim only its assigned providers. Computed once per run.
+
+    An email can appear on MULTIPLE rows (a shared practice inbox covering
+    several NPIs, e.g. a credentialing address) and those rows can carry
+    DIFFERENT statuses (one NPI overdue, another not on the list). We must keep
+    the MOST-URGENT assignment across all of that email's rows -- otherwise a
+    later, less-urgent row would clobber an earlier urgent one and the provider
+    would get the free check instead of the overdue email. So we compare
+    priorities and keep the winner (lower number = more urgent)."""
    out: dict[str, str] = {}
    for r in rows:
        email = (r.get("email") or "").strip().lower()
        if not email:
            continue
        seg = assign_segment(r, active_segments)
-        if seg is not None:
+        if seg is None:
+            continue
+        prev = out.get(email)
+        if prev is None or _seg_priority(seg) < _seg_priority(prev):
            out[email] = seg
    return out

@@ -472,24 +501,47 @@ def warm_segment(seg_key: str, rows: list[dict], slice_n: int,
    keeps working unchanged."""
    seg = SEGMENTS[seg_key]
    imported = load_imported(seg_key)
+    # Cross-segment + cross-cron guard: skip anyone already emailed by ANY
+    # segment so each provider gets exactly one healthcare email overall.
+    already_anywhere = load_all_imported()
    suppressed = load_suppressed()

    def _is_candidate(r: dict) -> bool:
        email = r.get("email", "").strip().lower()
-        if not email or email in imported or email in suppressed:
+        if not email or email in already_anywhere or email in suppressed:
            return False
        if _is_google_hosted(r):
            return False
        if assignment is not None:
-            return assignment.get(email) == seg_key
+            # The email must be assigned to THIS segment AND this specific row
+            # must be the one that earns it. An email can span several rows (a
+            # shared practice inbox over multiple NPIs); only the row whose own
+            # status matches this segment's selector should represent it, so the
+            # template renders that row's real data (e.g. the overdue NPI's due
+            # date, never a sibling 'not_on_list' row's blank one). This also
+            # dedupes: at most one row per email passes.
+            return assignment.get(email) == seg_key and row_matches(seg_key, r)
        return row_matches(seg_key, r)

-    candidates = [r for r in rows if _is_candidate(r)]
+    # Dedupe by email: an email can legitimately appear on multiple matching
+    # rows (e.g. two overdue NPIs share one inbox). Keep the first so the email
+    # is imported once and counted once against the slice budget.
+    candidates = []
+    seen_emails: set[str] = set()
+    for r in rows:
+        if not _is_candidate(r):
+            continue
+        email = r["email"].strip().lower()
+        if email in seen_emails:
+            continue
+        seen_emails.add(email)
+        candidates.append(r)
    # Spread the slice across MX operators so no single receiving system (e.g.
    # Microsoft 365) gets the whole batch. Caps ramp with the warmup day.
    todo = mx_throttled(candidates, slice_n, mx_daily_caps(warmup_day()))
    print(f"[hc-cron]   {seg_key}: candidates={len(candidates)} "
-          f"already={len(imported)} to_import={len(todo)}")
+          f"in_segment={len(imported)} emailed_anywhere={len(already_anywhere)} "
+          f"to_import={len(todo)}")

    if dry_run:
        for r in todo[:3]: