docs: record HC org-email diversification + CMS-URL maintenance note

justin 2026-06-14 01:10:31 -05:00
parent b73edadb89
commit 1465690832


@@ -78,3 +78,26 @@ emails we already harvested**. Everything else is either a scrape gamble or a
phone/mail channel. Before building any new vertical, confirm its email path
falls in Tier 1 or 2; if it is Tier 3, pilot the scrape yield FIRST (as we should
have done for CLIA); if it is Tier 4, don't pretend it is an email channel.
## Update 2026-06-13: healthcare org-email diversification (ACTED ON #1)
Unlocked the full verified institutional pool for broad offers:
- Root cause found: the OIG/NPPES segments were gated by a warmup selector that
excluded `not_on_list` rows. That flag is only a deliverability proxy, and it
dropped ~62k of the 63k rows because org NPIs are not individual Medicare
enrollees. Since we had already SMTP-verified every inbox, we added an
`institutional_verified` selector that trusts our own verification. OIG
screening + NPPES update now address **62,422** recipients (was ~1,437).
- `enrich_institutional_revalidation.py` joins the institutional list to the CMS
Revalidation Due Date List (revalidation_base.csv) by NPI, yielding ~1,437 genuine
Medicare enrollees (197 overdue, 164 due soon) for the flagship $599 reval pitch.
- `pw-hc-nppes` cron now runs `oig_screening`, `nppes_outdated`, `revalidation_overdue`,
and `revalidation_due_soon` against the enriched file (still warmup-capped and
MX-throttled: bigger supply, same safe send rate). `npi_reactivation` stays on the
accurate `leie_or_deactivated` selector, so we make no false "deactivated" claims.
- `pw-hc-refresh` cron now re-downloads + re-joins the reval base so overdue
figures stay accurate.
- MAINTENANCE: the CMS bulk-file URLs behind revalidation_base.csv and the CLIA file
embed a dated path that rotates roughly monthly. If the download 404s, re-fetch the
dataset's current `downloadURL` from https://data.cms.gov/data.json. Consider
switching to the dataset's stable data-api endpoint.
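The selector change from the root-cause bullet can be sketched as follows. This sketch assumes each row carries a boolean `smtp_verified` result alongside the `not_on_list` flag. The `Row` type, its fields, and the name `warmup_selector` are hypothetical. Only `not_on_list` and `institutional_verified` come from this note.

```python
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical row shape; field names other than not_on_list are assumptions.
    npi: str
    email: str
    smtp_verified: bool  # result of our own SMTP verification
    not_on_list: bool    # the flag the old warmup gate keyed on


def warmup_selector(rows):
    """Old gate (sketch): drop not_on_list rows, i.e. use list membership as a deliverability proxy."""
    return [r for r in rows if not r.not_on_list]


def institutional_verified(rows):
    """New gate (sketch): trust our SMTP verification and ignore list membership."""
    return [r for r in rows if r.smtp_verified]
```

Org NPIs are almost never on the enrollee list, so the old gate discards nearly all of them even when the inbox itself is verified.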
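The NPI join and the overdue/due-soon split could look roughly like the sketch below. It is not `enrich_institutional_revalidation.py` itself. Several details are assumptions to check against the real revalidation_base.csv header and the cutoff the cron actually uses: the column names `National Provider Identifier` and `Revalidation Due Date`, the `MM/DD/YYYY` date format, and the 90-day "due soon" window.

```python
import csv
from datetime import date, datetime, timedelta

# Assumed column names and window; verify against the real file and business rule.
NPI_COL = "National Provider Identifier"
DUE_COL = "Revalidation Due Date"
DUE_SOON_DAYS = 90  # hypothetical "due soon" window


def load_due_dates(lines):
    """Map NPI -> earliest parseable revalidation due date from the CMS CSV lines."""
    due = {}
    for rec in csv.DictReader(lines):
        npi = (rec.get(NPI_COL) or "").strip()
        raw = (rec.get(DUE_COL) or "").strip()
        try:
            d = datetime.strptime(raw, "%m/%d/%Y").date()
        except ValueError:
            continue  # blank or non-date values (e.g. "TBD") are skipped
        # One NPI can appear on several enrollments; keep the earliest due date.
        if npi and (npi not in due or d < due[npi]):
            due[npi] = d
    return due


def classify(npis, due, today: date):
    """Split institutional NPIs that match the CMS list into overdue / due-soon buckets."""
    overdue, due_soon = [], []
    horizon = today + timedelta(days=DUE_SOON_DAYS)
    for npi in npis:
        d = due.get(npi)
        if d is None:
            continue  # not on the revalidation list, so not a target for the reval pitch
        if d < today:
            overdue.append(npi)
        elif d <= horizon:
            due_soon.append(npi)
    return overdue, due_soon
```

Re-running both steps on each refresh keeps the buckets tied to the current file rather than to a stale snapshot.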
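"Warmup-capped and MX-throttled" can be read as two limits on each day's batch: a total cap and a per-MX-host cap. The sketch below is hypothetical. `plan_batch`, `mx_of`, and both cap values are illustrative, and a real `mx_of` would do a cached DNS MX lookup instead of taking the domain. The note does not describe how the cron implements throttling.

```python
from collections import defaultdict


def plan_batch(recipients, mx_of, warmup_cap, per_mx_cap):
    """Pick today's sends: at most warmup_cap in total and at most per_mx_cap per MX host."""
    per_mx = defaultdict(int)
    batch = []
    for email in recipients:
        if len(batch) >= warmup_cap:
            break  # daily warmup cap reached
        mx = mx_of(email)
        if per_mx[mx] >= per_mx_cap:
            continue  # this MX host is saturated for today; try the next recipient
        per_mx[mx] += 1
        batch.append(email)
    return batch
```

Recipients behind a saturated MX host are skipped rather than ending the batch. A larger supply therefore fills the same cap from more distinct hosts without raising the per-host rate.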
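For the 404 case in the maintenance bullet, here is a sketch that re-resolves the current file from the catalog. It assumes data.json follows the DCAT-US layout: a top-level `dataset` list whose entries carry a `title` and a `distribution` list with `downloadURL` and `mediaType`. It also assumes the first CSV distribution listed is the current file. Verify both assumptions against the live catalog before relying on this.

```python
import json
import urllib.request

CATALOG_URL = "https://data.cms.gov/data.json"


def fetch_catalog(url=CATALOG_URL, timeout=60):
    """Download and parse the data.cms.gov catalog (network call)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def find_download_url(catalog, title_substring, media_type="text/csv"):
    """Return the first downloadURL of a dataset whose title matches, or None.

    Assumption (unverified): the first distribution with the requested
    media type is the current monthly file.
    """
    needle = title_substring.lower()
    for ds in catalog.get("dataset", []):
        if needle not in ds.get("title", "").lower():
            continue
        for dist in ds.get("distribution", []):
            url = dist.get("downloadURL")
            if url and dist.get("mediaType") == media_type:
                return url
    return None


# Usage (hits the network):
# url = find_download_url(fetch_catalog(), "Revalidation Due Date List")
```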