# NOTE: logs go to /opt/performancewest/logs/ (deploy-owned). The deploy user
# cannot write /var/log, so a /var/log redirect makes cron silently fail before
# the command runs. Ensure /opt/performancewest/logs exists and is owned by deploy.
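# One-time setup sketch, run as root. Assumption: the deploy user's primary
# group is also named "deploy"; adjust -g if it is not.
#   install -d -o deploy -g deploy -m 0755 /opt/performancewest/logs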
# Healthcare data refresh: re-check every emailable NPI against the live
# government sources (CMS Revalidation list, OIG LEIE) + MX re-classification
# (Google-host detection) so warmup sends never go stale. Runs Mon/Wed/Fri 06:00
# Central, ~1h before the 07:00 weekday send. Running Mon/Wed/Fri instead of
# weekly shrinks the CMS data-lag window to ~2-3 days, so a provider who has just
# completed revalidation drops out of targeting sooner (fewer "already done" replies).
# Takes ~8 min. SAM is opt-in (--sam-pages); SAM exclusions rarely carry an NPI,
# so OIG LEIE is the NPI-bearing exclusion source. Pipeline:
#   1. hc_data_refresh.py            -- re-verify NPIs vs CMS/OIG + MX reclassify
#   2. curl revalidation_base.csv   -- download CMS institutional revalidation dates
#   3. enrich_institutional_revalidation.py -- merge reval dates into the
#      institutional CSV consumed by the pw-hc-nppes builder
#   4. build_healthcare_campaigns_cron.py --prune-only -- evict newly-Google-
#      hosted + suppressed subscribers from the warmup lists
# curl -fsS: -f exits non-zero on HTTP errors, so an error page is never saved as
# the CSV and the && chain stops; -S keeps error messages visible despite -s.
0 6 * * 1,3,5 deploy cd /opt/performancewest && python3 -u scripts/hc_data_refresh.py >> /opt/performancewest/logs/pw-hc-refresh.log 2>&1 && curl -fsS "https://data.cms.gov/sites/default/files/2026-05/96484587-20ec-4070-a4de-cd7de3ec0093/revalidation_base.csv" -o data/npi_build/revalidation_base.csv 2>>/opt/performancewest/logs/pw-hc-refresh.log && python3 -u scripts/enrich_institutional_revalidation.py data/hc_nppes_institutional_verified.csv data/npi_build/revalidation_base.csv data/hc_nppes_institutional_enriched.csv >> /opt/performancewest/logs/pw-hc-refresh.log 2>&1 && python3 -u scripts/build_healthcare_campaigns_cron.py --prune-only >> /opt/performancewest/logs/pw-hc-refresh.log 2>&1
