new-site/infra/cron/pw-hc-refresh
justin 9cb10b18e0 feat(hc): deliverability prune -- evict newly-Google-hosted subscribers
Belt-and-suspenders for the edge you flagged: a domain already in a warmup list
could flip its MX to Google Workspace between weekly refreshes, after which it
would hard-bounce from the cold IP. The import-time guard only catches NEW adds.
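
For illustration only, a minimal sketch of what "Google-hosted" detection from MX records could look like. The function name, the suffix list, and the idea of passing in already-resolved MX hostnames are assumptions; the real classifier in hc_data_refresh.py is not shown here and may work differently.

```python
# Hypothetical sketch: classify a domain as Google-hosted from its MX hostnames.
# Assumes the MX lookup has already been done elsewhere; this only inspects names.

# Google Workspace MX targets live under these zones (e.g. aspmx.l.google.com,
# smtp.google.com, and the legacy aspmx2.googlemail.com).
GOOGLE_MX_SUFFIXES = (".google.com", ".googlemail.com")


def is_google_hosted(mx_hosts):
    """Return True if any MX hostname points at Google's mail servers."""
    for host in mx_hosts:
        # Normalize: drop whitespace and the trailing root dot, ignore case.
        name = host.strip().rstrip(".").lower()
        if name.endswith(GOOGLE_MX_SUFFIXES):
            return True
    return False
```

Matching on a dotted suffix rather than a substring avoids false positives on look-alike hosts such as mail.notgoogle.com.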

- prune_holdouts(): enumerates each warmup list's subscribers, matches them
  against the FRESH master CSV (re-classified weekly), and removes any whose
  domain is now Google-hosted. DELIVERABILITY-ONLY -- it never evicts for
  audience reasons (an overdue provider drifting out of the 1-90 day window was
  a valid target when warmed; re-litigating that just wastes warmup progress).
- New flags: --prune (prune alongside warming) and --prune-only (prune, then exit).
- Wired into the weekly refresh cron as a --prune-only chained step, so MX is
  re-checked and holdouts removed every Monday before the weekday sends.
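
The matching step described in the bullets above can be sketched roughly as follows. This is not the actual prune_holdouts() code: the function names, the in-memory dict of lists, and the set of Google-hosted domains are all hypothetical stand-ins for whatever the script reads from the master CSV and the mailing platform. It only computes a removal plan, which mirrors a dry run.

```python
# Hypothetical sketch of a deliverability-only prune plan.
# warmup_lists: {list_name: [subscriber_email, ...]}  (assumed shape)
# google_domains: domains the fresh master data marks as Google-hosted (assumed input)


def email_domain(address):
    """Lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].strip().lower()


def plan_prune(warmup_lists, google_domains):
    """Map each list name to the subscribers whose domain is now Google-hosted.

    Only the hosting flag is consulted; audience criteria are deliberately
    not re-evaluated, so nobody is evicted for audience reasons.
    """
    google = {d.strip().lower() for d in google_domains}
    plan = {}
    for name, subscribers in warmup_lists.items():
        hits = [s for s in subscribers if email_domain(s) in google]
        if hits:
            plan[name] = hits
    return plan
```

With no Google-hosted domains the plan is empty, which corresponds to the no-op case; a domain that appears on several lists shows up under each of them.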

Verified end-to-end: with no Google-hosted domains in the lists it's a no-op;
after injecting a simulated Google-flipped domain into the master, the prune
correctly detects it and (in a real run) would remove it from every list it's on.
2026-06-08 03:39:56 -05:00


# Healthcare data refresh: weekly re-check of every emailable NPI against the
# live government sources (CMS Revalidation list, OIG LEIE) + MX re-classification
# (Google-host detection) so warmup sends never go stale. Runs Mon 06:00 Central,
# ~1h before the 07:00 weekday send, propagating fresh status into the channel
# CSVs the campaign cron reads. Takes ~8 min. SAM is opt-in (--sam-pages); SAM
# exclusions rarely carry an NPI, so OIG LEIE is the NPI-bearing exclusion source.
# Then prune-only: remove from the warmup lists any subscriber whose domain has
# newly become Google-hosted (deliverability safety net; removes only likely
# bouncers, never evicts for audience reasons).
0 6 * * 1 deploy cd /opt/performancewest && python3 -u scripts/hc_data_refresh.py >> /var/log/pw-hc-refresh.log 2>&1 && python3 -u scripts/build_healthcare_campaigns_cron.py --prune-only >> /var/log/pw-hc-refresh.log 2>&1