Commit graph

2 commits

Author SHA1 Message Date
justin
85dc3d5c3b hc refresh: propagate fresh status into the channel CSVs the cron reads
The channel CSVs (hc_warmup_nongoogle/google/week1_verified) are email-keyed
subsets of the master with extra deliverability columns (verify_ok/verify_reason).
The refresh now writes the fresh status fields (reval_due_date, days_overdue,
reval_status, leie_excluded, optout_ending, name/specialty/state) back into each,
preserving the extra columns and row membership, so a single weekly run updates
everything the campaign cron consumes -- not just the master.
2026-06-08 03:13:00 -05:00
justin
4f455475c0 hc: weekly data-refresh pipeline + multi-segment warmup cron
Two gaps closed:

1. hc_data_refresh.py (NEW): weekly source-data refresh. Re-checks every
   emailable NPI against the LIVE government sources so sends never go stale:
   - CMS Revalidation Due Date List (data.cms.gov per-NPI API; handles both ISO
     and US date formats, normalizes to MM/DD/YYYY).
   - OIG LEIE full CSV download (the NPI-bearing exclusion source).
   - SAM.gov v4 exclusions (key in .secrets/sam-api-key) -- OFF by default since
     SAM exclusions rarely carry an NPI and the full set is ~167k records; it's
     opt-in via --sam-pages. SAM's real value is the live per-name screening
     service, not a bulk NPI join.
   Writes the master CSV atomically (temp+rename). A provider who has since
   revalidated flips overdue->upcoming/not_on_list, so we stop nagging them.

2. build_healthcare_campaigns_cron.py: was revalidation-only (one hardcoded
   list/campaign/CSV/template). Now multi-segment: imports SEGMENTS from the
   single-source-of-truth registry, warms ALL five programs in parallel, each
   with its own list, dated campaign, and per-segment import-state file (so
   dedup is per-segment). A  per segment maps master-CSV rows to the
   right program (reval_overdue / reval_upcoming / leie_or_deactivated /
   optout_ending / any). Daily ramp slice is split across segments (revalidation
   leads at 50%, rest share the remainder) so every program collects engagement
   data while the IPs warm. Back-compat: seeds revalidation import-state from the
   legacy hc_imported_emails.txt once.
2026-06-08 03:06:29 -05:00