Commit graph

23 commits

Author SHA1 Message Date
justin
b350a1367d fix(hc-cron): stop re-mailing the whole list daily (per-day send lists)
The HC warmup imported ~1000 fresh providers/day into a persistent segment list
(list 10), but each day's campaign targeted that WHOLE cumulative list -- so the
cohort imported on day 1 received the identical 'Free NPI Check' email every
subsequent day (verified: subscriber 3410 got campaigns 38-42, 5x). Program-wide
that was 23,843 sends to 6,587 people (3.6x avg, max 30x) and blocklisted ~12%
of the list -- a Yahoo 'user complaints' deferral now confirms the burn.

Fix: import each day's slice into a dedicated dated SEND list and bind that day's
campaign to ONLY that list, so every provider gets exactly one send. The
persistent segment list is still used for dedup/records. ensure_campaign now
matches the exact dated name (never reuses a prior day's campaign/slice).
2026-06-26 12:19:41 -05:00
justin
1acae2f20c healthcare: fix 4 bugs in segment-assignment + free-check email
Found during a bug-review pass of the one-email-per-provider work:

1. assign_all overwrite bug: an email on MULTIPLE rows (shared practice inbox /
   multiple NPIs -- 2,592 such emails, 299 with mixed status) was assigned by
   the LAST row, so a less-urgent row could clobber an urgent one (overdue ->
   free check). Now keeps the most-urgent (lowest-priority) assignment.

2. warm_segment double-import + wrong-row render: all of an email's rows passed
   the candidate filter, so it could be imported twice (over-counting the slice)
   and attribs_for could render a sibling row's blank due-date in the overdue
   email. Now requires row_matches(seg) for the specific row AND dedupes by
   email (one row per email).

3. free-check email rendered broken text ('last updated on  -- about  years
   ago', 'Last updated  . ~ yrs ago') for any provider whose NPPES date isn't
   cached yet (the free check goes to everyone, and the fill is gradual). Wrapped
   the example sentence + official-record card in listmonk {{ if
   .nppes_last_updated }}...{{ else }}...{{ end }}; added a date-free else
   branch. altbody keeps the conditionals (listmonk evaluates body+altbody), and
   the test/preview renderer gained a minimal {{ if/else/end }} evaluator so
   previews match real sends. Verified both branches render with zero unfilled
   tokens.

4. cross-cron double-send: pw-hc-campaign (warmup file) and pw-hc-nppes (63k
   file) share state but tracked imports per-segment; 312 emails overlap both
   files, so a provider could get an urgent email from one cron AND the free
   check from the other. Added load_all_imported() global guard (union of all
   segment state) so each provider gets exactly one healthcare email overall.

All verified: assignment regression test (10 cases) + new dup-email/guard checks
pass; all 6 templates render clean.
2026-06-20 16:14:44 -05:00
justin
0320dc17ba healthcare: one-email-per-provider by urgency priority + free check as default
Make the free NPI compliance check the catch-all for ALL verified institutional
providers, but route anyone with a more important/time-sensitive issue to THAT
email instead -- each provider gets exactly one email, their most urgent.

- SEGMENTS gain a 'priority' (lower=more urgent): reactivation 10, revalidation
  overdue 20, due-soon 30, bundle 45, free-NPI-check 100 (catch-all).
- assign_segment()/assign_all(): route each provider to the single
  highest-priority active segment whose selector matches; warm_segment() takes
  the assignment map and only claims its assigned providers (disjoint pools, no
  double-mailing). main() now splits the daily slice by priority order, serving
  urgent segments fully before the broad free-check consumes the remainder.
- nppes_outdated selector -> 'institutional_default' (every verified, non-
  deactivated row), since the free check's value no longer depends on staleness;
  list/campaign renamed 'HC Warmup - Free NPI Check'.
- FIX latent bug: reactivation selector treated 'not on CMS reval list' as
  deactivated -- false for org NPIs (would mis-tell active practices they're
  deactivated). Now uses the REAL nppes_deactivated flag (or OIG/SAM exclusion).
- Drop blanket oig_screening from the default rotation: it matched every row and
  would starve the catch-all, and the free check already screens OIG/SAM and
  routes to the paid fix on a hit. Still runnable via --segments.
- Add scripts/test_segment_assignment.py (10 cases incl. 'overdue AND stale ->
  overdue wins'); all pass.
2026-06-20 16:01:23 -05:00
justin
744f0a89cf healthcare: bound NPPES-stale window [3,10]yr + restore verify_ok gate
- Add NPPES_STALE_MAX_YEARS (default 10): a record untouched for many years is
  a stronger signal the practice closed/moved, and a bounce burns the warming
  IP. Observed institutional distribution clusters 3-7yrs with ~0 beyond 8, so
  10 is a safe ceiling that mails the whole real pool while excluding any
  outlier ancient record. MIN stays 3 (keeps the 'out of date' claim credible).
- Restore the SMTP-verification gate (verify_ok) that the shared
  institutional_verified selector had -- the swap to nppes_stale dropped it; we
  only mail inboxes we already proved live.
- enrich: process the re-fetch queue STALEST-FIRST so a bounded (--limit) or
  --max-age refresh spends its budget on the most-overdue cache entries (and new
  NPIs) first, never starving them behind merely-aging ones.
- Selector unit-tested (10 cases incl. window edges, verify gate, deactivated).
2026-06-20 15:28:12 -05:00
justin
9e155d214c healthcare: cite REAL NPPES last_updated date in 'outdated' email
The NPPES 'may be out of date' email previously asserted staleness with no
per-record evidence (softened earlier to a generic 'periodic review required').
NPPES is fully public and every record carries basic.last_updated, so we now
cite the actual government date the provider can verify on the registry.

- enrich_nppes_last_updated.py: joins real basic.last_updated /
  enumeration_date / deactivated onto the institutional list via a cached,
  resumable per-NPI crawl (no batch endpoint exists). Adds nppes_last_updated,
  nppes_enumeration, nppes_years_stale, nppes_deactivated.
- cron: new 'nppes_stale' selector mails ONLY records >= 3yrs stale (env
  HC_NPPES_STALE_MIN_YEARS) and excludes deactivated NPIs; empty date => no
  match, so we never claim staleness without the government date to back it.
- template: headline + official-record card now show the real last_updated
  date and ~N-years-ago, sourced to npiregistry.cms.hhs.gov.
- attribs + test SAMPLE expose the new fields; verified render + plaintext.
2026-06-20 15:21:15 -05:00
justin
5c3b4291e7 feat(deliverability): send bulk campaigns from dedicated subdomain send.performancewest.net
Isolates bulk sending reputation onto a dedicated subdomain so the root domain
stays clean for transactional/verification mail (and recovers faster). Replies
still go to the root domain via Reply-To, so the customer-facing reply experience
is unchanged.

- build_trucking_campaigns.py: add env-overridable FROM_EMAIL
  (noreply@send.performancewest.net); use it for both scheduled + test sends
  instead of inheriting base["from_email"] from the DB base campaign.
- build_healthcare_campaigns_cron.py: FROM_EMAIL ->
  compliance@send.performancewest.net (env-overridable).
- bounce-watcher.sh / hc-bounce-watcher.sh: track the new subdomain envelope
  sender (keep legacy root-domain sender so the pre-cutover queue still drains;
  HC also tracks by hcout transport regardless of sender).

Infra already live (separate, non-code): subdomain DNS (A/MX/SPF/DKIM
selector=send/DMARC p=reject) on the Hestia master, OpenDKIM signs
d=send.performancewest.net (verified end-to-end), egress .94/.107. Root SPF
trimmed to the real IPs; pointless IP-rehab cron disabled.
2026-06-18 23:07:23 -05:00
justin
a32a3b05a0 email: add plaintext MIME part + stable Message-ID hostname
Two deliverability hardening fixes from the email audit:

1. Plaintext (altbody): all campaigns were HTML-only. Listmonk only emits
   multipart/alternative when altbody is set, and HTML-only bulk mail is a
   spam-score signal. New scripts/_email_plaintext.py renders a readable
   text/plain part from the HTML body (dependency-free; preserves Listmonk
   {{ .Subscriber }}/{{ UnsubscribeURL }} template tags, turns links into
   'text (url)'). Wired into the trucking builder (and thus UCR + IFTA, which
   reuse create_and_schedule_campaign) and the healthcare builder.

2. Stable container hostname: Listmonk derived its Message-ID from the random
   docker container id -> @localhost.localdomain (spam-score signal). Pin both
   listmonk + listmonk-hc hostname to perfwest.performancewest.net, matching
   Listmonk's SMTP hello_hostname.

Part of the email-deliverability incident hardening.
2026-06-17 20:09:02 -05:00
justin
2caab6aa69 hc: warmup must run DAILY for the full 21-day ramp (not weekdays-only)
The HC warmup crons were '* * 1-5' (Mon-Fri), silently skipping weekends -- but a
proper warmup needs CONTINUOUS daily volume for 21 days (mailbox providers reward
consistency; gaps stall reputation). The Jun 14 'HC 0 sent' alert was just a
skipped Sunday, but the weekend skips also broke ramp continuity.

- pw-hc-campaign + pw-hc-nppes: '* * 1-5' -> '* * *' (daily), vendored + applied live.
- Re-aligned the warmup start stamp from calendar-day 9 to send-day 5 so the
  volume ramp matches reputation actually built (it had skipped ~4 weekend days,
  running the ramp ahead of real history).
- Fixed the stale 'Mon-Fri only' comment in daily_slice().
- Vendored nppes cron now carries the enriched-CSV + 4-segment config.
2026-06-14 21:02:08 -05:00
justin
b73edadb89 hc: unlock the full 62k verified institutional pool for broad offers
The OIG-screening + NPPES-update segments were effectively limited to ~1,437
providers because the warmup 'any' selector excluded not-on-reval-list rows as a
deliverability proxy -- but that excludes almost the ENTIRE institutional list
(org NPIs aren't individual Medicare enrollees). Since we already SMTP-verified
all 63k inboxes, add an 'institutional_verified' selector that trusts our own
verification instead of reval-list presence. Result: OIG + NPPES-update now
address 62,422 (43x more), giving multiple broad offers to test engagement on.

- enrich_institutional_revalidation.py: fast local join of the institutional
  list to the CMS Revalidation Due Date List bulk file (revalidation_base.csv)
  by NPI -> adds reval_due_date/days_overdue/reval_status. ~1,437 are genuine
  Medicare enrollees (197 overdue / 164 due-soon) -> flagship $599 reval pitch.
- npi_reactivation stays on leie_or_deactivated (only REAL deactivations -- no
  false 'your NPI is deactivated' claims to active orgs).
2026-06-14 01:07:40 -05:00
justin
3f7ecf9d13 hc: persist mx_provider on imported subscribers (per-operator audit)
So we can verify/analyze the per-MX-operator throttle distribution from the
listmonk DB after import (and re-throttle future segment membership).
2026-06-12 22:28:49 -05:00
justin
5237c81385 hc: per-MX-operator warmup throttle (spread load across receiving systems)
Reputation is tracked per receiving mail operator, not per recipient domain, so
the daily warmup slice is now distributed across MX operators with per-operator
daily caps (ramping with the warmup day): Microsoft/Google/Proofpoint/etc. capped
individually, long-tail operators each get a generous default. This lets total
daily volume be much higher than a flat cap without hammering any single system.
mx_throttled() respects the mx_provider column the verifier now writes; falls back
to flat slicing if absent.
2026-06-12 22:09:29 -05:00
justin
6c8c823e5e hc: refresh attribs when cross-adding an existing subscriber to a segment
add_subscriber only attached an already-existing subscriber to the new list
without updating attribs, so the due-soon template's days_until merge field was
blank for providers already imported by another segment. Now PUT the merged
attribs (existing + this segment's npi/practice/due-date/days_until) before
adding to the list.
2026-06-12 19:37:01 -05:00
justin
c8c9a04c1d hc: add 'revalidation due soon' warmup segment (proactive, grows supply)
The HC warmup pool is supply-constrained (~400 verified providers, all fed by
the same narrow 'revalidation 1-90 days OVERDUE' slice). This adds a mirror-image
proactive segment that targets providers whose Medicare revalidation is UPCOMING
within the next 1-90 days, drawn from the same CMS Revalidation Due Date List --
no new data source needed. 'Handle it before your deadline' is a strong pitch and
roughly doubles the deliverable pool.

- New selector reval_due_soon (status=upcoming, days_until in [HC_DUE_SOON_MIN,
  HC_DUE_SOON_MAX] default 1-90).
- New segment revalidation_due_soon reusing the existing /order/npi-revalidation
  service ($599) with template hc_revalidation_due_soon.html.
- attribs_for now exposes days_until (positive days to due date).
- Added to ACTIVE_SEGMENTS.
2026-06-12 19:33:49 -05:00
justin
a78d60a127 hc: auto-reply for 'already revalidated' replies + permanent suppression
A lead replied with proof their Medicare revalidation was already approved (CMS
data-lag: the public Revalidation Due Date List still showed them overdue weeks
after approval). Two of these arrived same-day, so:

- Carbonio auto-reply (deployed on co.carrierone.com): created mailbox
  hc-replies@ on the info@ distribution list with a Sieve that auto-acknowledges
  'my revalidation is already complete' replies (tag + mark read + file into a
  'Reval Completed (auto-acked)' folder + on-brand reply explaining the CMS lag).
  CRITICAL: info@ is the shared reply-to for ALL campaigns (healthcare, trucking,
  telecom), so every rule is anchored to Medicare/revalidation context -- a
  trucking 'MCS-150 done, this is bogus' or telecom 'RMD done' reply does NOT
  trigger it (tested + passing). A buyer guard ('please file / how much') also
  suppresses the auto-reply so a human handles the sale.
  Carbonio 25.x Sieve quirks documented (vacation/imap4flags/body :text all
  unsupported; use reply/flag/tag/body :contains).

- Permanent suppression: new data/hc_suppress.txt do-not-contact list the warmup
  honors at import AND --prune removes from the live lists. Seeded with the two
  completed providers (Pangea Lab, Yakima Valley FWC); both also blocklisted in
  listmonk_hc and removed from lists 3 + 4.
2026-06-08 10:37:49 -05:00
justin
9cb10b18e0 feat(hc): deliverability prune -- evict newly-Google-hosted subscribers
Belt-and-suspenders for the edge you flagged: a domain already in a warmup list
could flip its MX to Google Workspace between weekly refreshes, after which it
would hard-bounce from the cold IP. The import-time guard only catches NEW adds.

- prune_holdouts(): enumerates each warmup list's subscribers, matches them
  against the FRESH master CSV (re-classified weekly), and removes any whose
  domain is now Google-hosted. DELIVERABILITY-ONLY -- it never evicts for
  audience reasons (an overdue provider drifting out of the 1-90 day window was
  a valid target when warmed; re-litigating that just wastes warmup progress).
- --prune (run alongside warming) and --prune-only (prune then exit).
- Wired into the weekly refresh cron as a --prune-only chained step, so MX is
  re-checked and holdouts removed every Monday before the weekday sends.

Verified end-to-end: with no Google domains in lists it's a 0-op; injecting a
simulated Google-flipped domain into the master, the prune correctly detects and
(in a real run) would remove it from every list it's on.
2026-06-08 03:39:56 -05:00
justin
54b92b1f06 fix(hc deliverability): MX-based Google-host exclusion during warmup
Found via live mail.log: Google-Workspace-hosted PRACTICE domains (custom
domains whose MX is aspmx.l.google.com, e.g. moosepharmacy.com, hc2kidney.com)
were getting hard 550-5.7.1 rejects from Google's cold-IP bulk filter -- exactly
the bounces that wreck a warming IP's reputation. The original google/non-google
split classified by the email's domain STRING, which can't see that a custom
domain silently uses Google Workspace; only an MX lookup reveals it (33% of our
domains, 228/689, are Google-hosted this way).

- hc_data_refresh.py: new MX classification (one lookup per unique domain via
  dnspython, cached) writes an mx_provider=google/other flag into the master and
  propagates it into the channel CSVs (auto-adding the column). --skip-mx for a
  fast status-only run.
- build_healthcare_campaigns_cron.py: warm_segment now drops mx_provider=google
  rows during warmup (HC_SKIP_GOOGLE=1 default; set 0 once IPs are warm). This is
  defense-in-depth -- correct regardless of which CSV the cron is pointed at.

Verified: today's sends (nongoogle CSV) had 0 Google bounces; the guard cuts the
Google-containing week1_verified cohort's revalidation candidates 82->8.
2026-06-08 03:32:12 -05:00
justin
feb677f6ce fix(hc warmup): only mail slightly-overdue providers (deliverability)
Mailing heavily-overdue NPIs (months/years past due) risks hitting practices
that have closed, merged, or abandoned the inbox -> hard bounces, which are the
fastest way to wreck a warming IP's reputation. The warmup now restricts the
reval_overdue selector to an inclusive [HC_OVERDUE_MIN, HC_OVERDUE_MAX] window
(default 1-90 days) and the OIG 'any' selector likewise excludes heavily-overdue
and dropped-off-list rows. On the current cohort this trims the overdue audience
178->96 and the OIG audience 399->317, holding out the stale long tail
(181-365d + 366d+). upcoming/active providers are unaffected.
2026-06-08 03:27:22 -05:00
justin
c79a7715e1 fix(hc): bugs found in self-audit of the new refresh + warmup + templates
Refresh (hc_data_refresh.py):
- CRITICAL: drop optout_ending from REFRESHED_FIELDS -- the refresh never
  computes it, so propagating it blanked the channel CSVs and would starve the
  compliance_bundle segment (whose selector IS optout_ending).
- MAJOR: only rewrite leie_excluded when OIG was actually pulled (guard was
  'not skip_oig OR not skip_sam', so a --skip-oig run blanked all exclusion
  flags). Also write 'Y' (matching the original list builder) not '1'.
- Use 'no_reval_flag' (the original vocabulary) instead of 'not_on_list' when an
  NPI drops off the reval list, and clear reval_due_date too.
- Throttle politeness: move time.sleep(0.05) above the early-continue paths so
  EVERY CMS request is spaced, not just the minority that are on the list.
- Guard blank-NPI rows (leave their status untouched instead of mislabeling).
- Master write preserves any columns beyond HEADER (no silent column drop).

Warmup cron (build_healthcare_campaigns_cron.py):
- Fix the daily-slice split: it summed to less than the budget (dropped ~2/day)
  and could OVERSHOOT on tiny totals (each 'other' floored to >=1). Now uses
  divmod for an even remainder and reclaims rounding onto the lead, so
  sum(per_seg) == total_slice exactly for every input (verified 0,1,2,7,100,300).

Templates: the non-revalidation emails rendered {{ .Subscriber.Attribs.detail }}
(a reval due date) under a 'Practice'/'Status'/'Record' label -- a wrong/
confusing personalization on a live send (esp. OIG, selector 'any'). All four
now show the practice name; 'detail' is retired from rendering (revalidation
uses reval_due_date/days_overdue directly).
2026-06-08 03:23:47 -05:00
justin
4f455475c0 hc: weekly data-refresh pipeline + multi-segment warmup cron
Two gaps closed:

1. hc_data_refresh.py (NEW): weekly source-data refresh. Re-checks every
   emailable NPI against the LIVE government sources so sends never go stale:
   - CMS Revalidation Due Date List (data.cms.gov per-NPI API; handles both ISO
     and US date formats, normalizes to MM/DD/YYYY).
   - OIG LEIE full CSV download (the NPI-bearing exclusion source).
   - SAM.gov v4 exclusions (key in .secrets/sam-api-key) -- OFF by default since
     SAM exclusions rarely carry an NPI and the full set is ~167k records; it's
     opt-in via --sam-pages. SAM's real value is the live per-name screening
     service, not a bulk NPI join.
   Writes the master CSV atomically (temp+rename). A provider who has since
   revalidated flips overdue->upcoming/not_on_list, so we stop nagging them.

2. build_healthcare_campaigns_cron.py: was revalidation-only (one hardcoded
   list/campaign/CSV/template). Now multi-segment: imports SEGMENTS from the
   single-source-of-truth registry, warms ALL five programs in parallel, each
   with its own list, dated campaign, and per-segment import-state file (so
   dedup is per-segment). A  per segment maps master-CSV rows to the
   right program (reval_overdue / reval_upcoming / leie_or_deactivated /
   optout_ending / any). Daily ramp slice is split across segments (revalidation
   leads at 50%, rest share the remainder) so every program collects engagement
   data while the IPs warm. Back-compat: seeds revalidation import-state from the
   legacy hc_imported_emails.txt once.
2026-06-08 03:06:29 -05:00
justin
483f185861 feat(healthcare): prove revalidation is real via official CMS data + self-verify
Skepticism ("is this even real?") is the top objection. The data IS accurate
(verified our subscribers' NPIs match the official CMS Revalidation Due Date List
exactly), so this is a credibility-presentation fix:

1. Email: replace the plain detail row with an "Official record - CMS Medicare
   Revalidation Due Date List" card (NPI, legal name, due date, days overdue)
   plus a "Verify on CMS.gov" button. Clearly labeled as our presentation of
   public CMS data, not a CMS screenshot (no impersonation).
2. API: npi/lookup now pulls the revalidation due date LIVE from the public CMS
   dataset (data.cms.gov) instead of the empty local table, and returns a
   revalidation{ due_date, source, cms_legal_name, verify_url } proof object.
3. Tool: /tools/npi-compliance-check shows a live "official record" card with a
   self-verify link when CMS returns a due date.

Builder now stores reval_due_date/days_overdue as separate attribs for the card
(existing 194 subscribers backfilled from their detail string).
2026-06-07 23:54:01 -05:00
justin
4233c90a4f hc email: reframe value-add to 'No 2FA. No government portals.' (we have a portal; the pain is CMS 2FA/identity-proofing); cron creates fresh dated campaign when prior is finished; add hc bounce watcher (Postfix->listmonk-hc webhook, hard/complaint->blocklist) 2026-06-06 16:47:12 -05:00
justin
95698852ce healthcare warmup: gate Google/Workspace domains out of week 1 (they hard-reject cold IPs 550-5.7.1); send 501 non-Google practice domains first, defer 222 Google to week 2-3; cron uses hc_warmup_nongoogle.csv 2026-06-06 04:02:00 -05:00
justin
2bc86268f7 healthcare: HC warmup campaign cron (Mon-Fri 7AM Central) - imports overdue-first verified slice into listmonk-hc + runs Medicare-revalidation campaign via hc HOT stream; rate-throttled by pw-hc-rampcap 2026-06-06 03:57:08 -05:00