healthcare: bound NPPES-stale window [3,10]yr + restore verify_ok gate
- Add NPPES_STALE_MAX_YEARS (default 10): a record untouched for many years is a stronger signal the practice closed/moved, and a bounce burns the warming IP. Observed institutional distribution clusters 3-7yrs with ~0 beyond 8, so 10 is a safe ceiling that mails the whole real pool while excluding any outlier ancient record. MIN stays 3 (keeps the 'out of date' claim credible).
- Restore the SMTP-verification gate (verify_ok) that the shared institutional_verified selector had -- the swap to nppes_stale dropped it; we only mail inboxes we already proved live.
- enrich: process the re-fetch queue STALEST-FIRST so a bounded (--limit) or --max-age refresh spends its budget on the most-overdue cache entries (and new NPIs) first, never starving them behind merely-aging ones.
- Selector unit-tested (10 cases incl. window edges, verify gate, deactivated).
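The selector rule in the bullets above could be sketched roughly as follows. Only `NPPES_STALE_MAX_YEARS` and `verify_ok` are named in this commit; `NPPES_STALE_MIN_YEARS`, `completed_years`, `is_mailable`, and the `last_updated` / `deactivated` record fields are hypothetical names, and treating both window edges as inclusive whole years is an assumption rather than the confirmed implementation.

```python
from datetime import date

NPPES_STALE_MIN_YEARS = 3   # hypothetical name; the commit only says "MIN stays 3"
NPPES_STALE_MAX_YEARS = 10  # named in the commit, default 10


def completed_years(since: date, today: date) -> int:
    """Whole years elapsed between `since` and `today` (assumed age measure)."""
    before_anniversary = (today.month, today.day) < (since.month, since.day)
    return today.year - since.year - int(before_anniversary)


def is_mailable(record: dict, today: date) -> bool:
    """Sketch of the gate: live inbox, not deactivated, staleness in [3, 10] years."""
    # Deactivated records and inboxes that never passed SMTP verification are skipped.
    if record.get("deactivated") or not record.get("verify_ok"):
        return False
    last_updated = record.get("last_updated")  # hypothetical field name
    if last_updated is None:
        return False
    age = completed_years(last_updated, today)
    return NPPES_STALE_MIN_YEARS <= age <= NPPES_STALE_MAX_YEARS
```

Under these assumptions, a verified record last updated exactly 3 or exactly 10 years ago is selected, while one updated 2 or 11 years ago is not.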
parent 9e155d214c
commit 744f0a89cf
2 changed files with 28 additions and 11 deletions
@@ -175,11 +175,17 @@ def main() -> int:
     cache = load_cache(args.cache)
     log(f"cache={args.cache} entries={len(cache):,}")
 
-    # Determine which NPIs need a (re)fetch.
+    # Determine which NPIs need a (re)fetch, STALEST FIRST so a bounded run
+    # (--limit) always spends its budget on the most-overdue cache entries.
+    # Never-fetched entries have an empty fetched_at, which sorts first, so new
+    # NPIs are prioritized over merely-aging ones.
     todo = [n for n in npis if not is_fresh(cache.get(n, {}), today, args.max_age)]
+    todo.sort(key=lambda n: cache.get(n, {}).get("fetched_at", "") or "")
+    n_due = len(todo)
     if args.limit:
         todo = todo[:args.limit]
-    log(f"to_fetch={len(todo):,} (of {len(npis):,} unique NPIs; limit={args.limit or 'all'})")
+    log(f"to_fetch={len(todo):,} (of {n_due:,} due / {len(npis):,} unique NPIs; "
+        f"limit={args.limit or 'all'})")
 
     fetched = 0
     t0 = time.time()
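The stalest-first ordering in this hunk can be tried in isolation with a small sketch. It assumes `fetched_at` is stored as an ISO-8601 date string (so lexicographic order matches chronological order) and omits the `is_fresh` filter; `stalest_first` and the sample cache are hypothetical and only mirror the sort key and `--limit` slice shown in the diff.

```python
def stalest_first(npis, cache, limit=None):
    """Order NPIs oldest fetch first; never-fetched entries (empty key) lead."""
    # Missing entries, missing fields, and None all collapse to "", which
    # sorts before any ISO date string. list.sort/sorted is stable, so ties
    # keep their input order.
    todo = sorted(npis, key=lambda n: cache.get(n, {}).get("fetched_at", "") or "")
    # As in the diff, a falsy limit (None or 0) means "no limit".
    return todo[:limit] if limit else todo


# Hypothetical sample data for illustration only.
sample_cache = {
    "A": {"fetched_at": "2024-05-01"},
    "B": {"fetched_at": "2021-01-15"},
    "C": {"fetched_at": None},
}
ordered = stalest_first(["A", "B", "C", "D"], sample_cache, limit=3)
print(ordered)  # ['C', 'D', 'B']
```

With a limit of 3, the two never-fetched NPIs and the oldest cached entry are kept, and the most recently fetched entry is the one dropped from the run.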