justin/new-site

Commit graph

Author	SHA1	Message	Date
justin	c79a7715e1	fix(hc): bugs found in self-audit of the new refresh + warmup + templates	2026-06-08 03:23:47 -05:00
justin	85dc3d5c3b	hc refresh: propagate fresh status into the channel CSVs the cron reads	2026-06-08 03:13:00 -05:00
justin	4f455475c0	hc: weekly data-refresh pipeline + multi-segment warmup cron	2026-06-08 03:06:29 -05:00

justin

c79a7715e1

fix(hc): bugs found in self-audit of the new refresh + warmup + templates

Refresh (hc_data_refresh.py):
- CRITICAL: drop optout_ending from REFRESHED_FIELDS -- the refresh never
  computes it, so propagating it blanked the channel CSVs and would starve the
  compliance_bundle segment (whose selector IS optout_ending).
- MAJOR: only rewrite leie_excluded when OIG was actually pulled (guard was
  'not skip_oig OR not skip_sam', so a --skip-oig run blanked all exclusion
  flags). Also write 'Y' (matching the original list builder) not '1'.
- Use 'no_reval_flag' (the original vocabulary) instead of 'not_on_list' when an
  NPI drops off the reval list, and clear reval_due_date too.
- Throttle politeness: move time.sleep(0.05) above the early-continue paths so
  EVERY CMS request is spaced, not just the minority that are on the list.
- Guard blank-NPI rows (leave their status untouched instead of mislabeling).
- Master write preserves any columns beyond HEADER (no silent column drop).
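
The throttle and blank-NPI fixes above can be illustrated with a minimal sketch. It is not the code in hc_data_refresh.py: the function name `refresh_statuses`, the `lookup` callable, and the `status` key are assumptions made for illustration; only the 0.05 s delay and the 'no_reval_flag' label come from the commit message.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def refresh_statuses(
    npis: Iterable[str],
    lookup: Callable[[str], Optional[dict]],  # hypothetical per-NPI request
    sleep: Callable[[float], None] = time.sleep,
    delay: float = 0.05,
) -> Dict[str, str]:
    """Return {npi: status}, pausing before every lookup."""
    statuses: Dict[str, str] = {}
    for npi in npis:
        if not npi.strip():
            # Blank NPI: no request is made and no status is written.
            continue
        # Sleep BEFORE any early-continue path so every request is spaced,
        # including the ones whose NPI turns out not to be on the list.
        sleep(delay)
        record = lookup(npi)
        if record is None:
            statuses[npi] = "no_reval_flag"  # NPI dropped off the list
            continue
        statuses[npi] = record.get("status", "")
    return statuses
```

Passing `sleep` as a parameter is only a convenience for testing the spacing without real delays.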

Warmup cron (build_healthcare_campaigns_cron.py):
- Fix the daily-slice split: it summed to less than the budget (dropped ~2/day)
  and could OVERSHOOT on tiny totals (each 'other' floored to >=1). Now uses
  divmod for an even remainder and reclaims rounding onto the lead, so
  sum(per_seg) == total_slice exactly for every input (verified 0,1,2,7,100,300).
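
One way to get a split with this property is sketched below. It is an assumption about the shape of the fix, not the code in build_healthcare_campaigns_cron.py: the function name `split_daily_slice` and the placeholder segment names are hypothetical, and the exact rounding rule in the real cron may differ.

```python
from typing import Dict, List


def split_daily_slice(total_slice: int, lead: str, others: List[str]) -> Dict[str, int]:
    """Split total_slice so the lead gets about half and the rest share evenly."""
    if total_slice < 0:
        raise ValueError("total_slice must be non-negative")
    if not others:
        return {lead: total_slice}
    lead_share = total_slice // 2
    # divmod gives each non-lead segment the same base share; the leftover
    # is handed back to the lead instead of being dropped or rounded up.
    base, remainder = divmod(total_slice - lead_share, len(others))
    per_seg = {name: base for name in others}
    per_seg[lead] = lead_share + remainder
    return per_seg


# Placeholder names for the four non-lead segments (hypothetical).
OTHERS = ["seg_b", "seg_c", "seg_d", "seg_e"]
for total in (0, 1, 2, 7, 100, 300):
    shares = split_daily_slice(total, "revalidation", OTHERS)
    assert sum(shares.values()) == total
```

Because no share is floored to a minimum of 1, a tiny total can leave the non-lead segments at 0 for that day, which is what keeps the sum from overshooting.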

Templates: the non-revalidation emails rendered {{ .Subscriber.Attribs.detail }}
(a reval due date) under a 'Practice'/'Status'/'Record' label -- a wrong/
confusing personalization on a live send (esp. OIG, selector 'any'). All four
now show the practice name; 'detail' is retired from rendering (revalidation
uses reval_due_date/days_overdue directly).

2026-06-08 03:23:47 -05:00

justin

85dc3d5c3b

hc refresh: propagate fresh status into the channel CSVs the cron reads

The channel CSVs (hc_warmup_nongoogle/google/week1_verified) are email-keyed
subsets of the master with extra deliverability columns (verify_ok/verify_reason).
The refresh now writes the fresh status fields (reval_due_date, days_overdue,
reval_status, leie_excluded, optout_ending, name/specialty/state) back into each,
preserving the extra columns and row membership, so a single weekly run updates
everything the campaign cron consumes -- not just the master.
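
The write-back described above amounts to an email-keyed update that touches only the refreshed fields. A minimal sketch, assuming rows are already loaded as dicts and that the key column is named `email` (both are assumptions; the function name `propagate` is hypothetical):

```python
from typing import Dict, Iterable, List


def propagate(
    master_rows: Iterable[Dict[str, str]],
    channel_rows: Iterable[Dict[str, str]],
    fields: List[str],
) -> List[Dict[str, str]]:
    """Copy `fields` from master into channel rows that share an email."""
    by_email = {
        row["email"].strip().lower(): row
        for row in master_rows
        if row.get("email")
    }
    updated_rows = []
    for row in channel_rows:
        updated = dict(row)  # keeps extra columns such as verify_ok
        fresh = by_email.get(row.get("email", "").strip().lower())
        if fresh is not None:
            for field in fields:
                if field in fresh:
                    updated[field] = fresh[field]
        # Rows with no match in the master are kept unchanged, so the
        # channel file's row membership never changes.
        updated_rows.append(updated)
    return updated_rows
```

Only the listed fields are overwritten, so any field the refresh does not actually compute should be left out of `fields`.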

2026-06-08 03:13:00 -05:00

justin

4f455475c0

hc: weekly data-refresh pipeline + multi-segment warmup cron

Two gaps closed:

1. hc_data_refresh.py (NEW): weekly source-data refresh. Re-checks every
   emailable NPI against the LIVE government sources so sends never go stale:
   - CMS Revalidation Due Date List (data.cms.gov per-NPI API; handles both ISO
     and US date formats, normalizes to MM/DD/YYYY).
   - OIG LEIE full CSV download (the NPI-bearing exclusion source).
   - SAM.gov v4 exclusions (key in .secrets/sam-api-key) -- OFF by default since
     SAM exclusions rarely carry an NPI and the full set is ~167k records; it's
     opt-in via --sam-pages. SAM's real value is the live per-name screening
     service, not a bulk NPI join.
   Writes the master CSV atomically (temp+rename). A provider who has since
   revalidated flips overdue->upcoming/not_on_list, so we stop nagging them.
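
Two mechanics from item 1, the date normalization and the temp+rename write, can be sketched as follows. This is an illustration under stated assumptions rather than the code in hc_data_refresh.py: the function names, the two accepted input formats, and the column names in the usage note are hypothetical.

```python
import csv
import os
import tempfile
from datetime import datetime
from typing import Dict, List


def normalize_date(raw: str) -> str:
    """Return MM/DD/YYYY for ISO (YYYY-MM-DD) or US (M/D/YYYY) input."""
    text = raw.strip()
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text, fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue
    return text  # unrecognized format: returned as-is, not guessed


def write_csv_atomic(path: str, header: List[str], rows: List[Dict[str, str]]) -> None:
    """Write rows to a temp file in the target directory, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=header)
            writer.writeheader()
            writer.writerows(rows)
        # os.replace swaps the file in one step, so a reader sees either
        # the old file or the new one, never a half-written one.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A call such as `write_csv_atomic("master.csv", ["npi", "reval_due_date"], rows)` creates the temp file next to the target, which keeps the rename on the same filesystem.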

2. build_healthcare_campaigns_cron.py: was revalidation-only (one hardcoded
   list/campaign/CSV/template). Now multi-segment: imports SEGMENTS from the
   single-source-of-truth registry, warms ALL five programs in parallel, each
   with its own list, dated campaign, and per-segment import-state file (so
   dedup is per-segment). A selector per segment maps master-CSV rows to the
   right program (reval_overdue / reval_upcoming / leie_or_deactivated /
   optout_ending / any). Daily ramp slice is split across segments (revalidation
   leads at 50%, rest share the remainder) so every program collects engagement
   data while the IPs warm. Back-compat: seeds revalidation import-state from the
   legacy hc_imported_emails.txt once.

2026-06-08 03:06:29 -05:00

3 commits