- Add NPPES_STALE_MAX_YEARS (default 10): a record untouched for many years is
a stronger signal the practice closed/moved, and a bounce burns the warming
IP. Observed institutional distribution clusters 3-7yrs with ~0 beyond 8, so
10 is a safe ceiling that mails the whole real pool while excluding any
outlier ancient record. MIN stays 3 (keeps the 'out of date' claim credible).
- Restore the SMTP-verification gate (verify_ok) that the shared
institutional_verified selector had -- the swap to nppes_stale dropped it; we
only mail inboxes we already proved live.
- enrich: process the re-fetch queue STALEST-FIRST so a bounded (--limit) or
--max-age refresh spends its budget on the most-overdue cache entries (and new
NPIs) first, never starving them behind merely-aging ones.
- Selector unit-tested (10 cases incl. window edges, verify gate, deactivated).
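A minimal sketch of the selector window and the stalest-first ordering described above, not the shipped code. The column names verify_ok, nppes_deactivated and nppes_years_stale are the ones these changes use; the 'Y' flag encoding, the inclusive upper bound, and the fetched_at map (NPI to epoch seconds of the last fetch) are assumptions made for illustration.

```python
def nppes_stale(row, min_years=3.0, max_years=10.0):
    """Return True only for verified, active rows inside the staleness window."""
    if row.get("verify_ok") != "Y":          # assumed encoding of the SMTP gate
        return False
    if row.get("nppes_deactivated") == "Y":  # assumed encoding of deactivation
        return False
    raw = (row.get("nppes_years_stale") or "").strip()
    if not raw:
        return False  # no government date, so no staleness claim
    try:
        years = float(raw)
    except ValueError:
        return False
    return min_years <= years <= max_years


def stalest_first(npis, fetched_at):
    """Order NPIs so never-fetched ones come first, then the oldest cache entries."""
    return sorted(npis, key=lambda npi: fetched_at.get(npi, 0.0))
```

With this ordering, a bounded run that only takes the first N items of the queue always reaches new NPIs and the oldest cache entries before recently refreshed ones.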
The NPPES 'may be out of date' email previously asserted staleness with no
per-record evidence (softened earlier to a generic 'periodic review required').
NPPES is fully public and every record carries basic.last_updated, so we now
cite the actual government date the provider can verify on the registry.
- enrich_nppes_last_updated.py: joins real basic.last_updated /
enumeration_date / deactivated onto the institutional list via a cached,
resumable per-NPI crawl (no batch endpoint exists). Adds nppes_last_updated,
nppes_enumeration, nppes_years_stale, nppes_deactivated.
- cron: new 'nppes_stale' selector mails ONLY records >= 3yrs stale (env
HC_NPPES_STALE_MIN_YEARS) and excludes deactivated NPIs; empty date => no
match, so we never claim staleness without the government date to back it.
- template: headline + official-record card now show the real last_updated
date and ~N-years-ago, sourced to npiregistry.cms.hhs.gov.
- attribs + test SAMPLE expose the new fields; verified render + plaintext.
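For illustration, the staleness figure behind the ~N-years-ago wording could be derived as below. This assumes basic.last_updated arrives as an ISO YYYY-MM-DD string and uses a 365.25-day year; years_stale is a hypothetical helper, not the enrichment script's actual function.

```python
from datetime import date


def years_stale(last_updated, today):
    """Return years since last_updated (one decimal), or None when the date is unusable."""
    if not last_updated:
        return None  # empty date: the caller must not claim staleness
    try:
        updated = date.fromisoformat(last_updated.strip())
    except ValueError:
        return None
    return round((today - updated).days / 365.25, 1)
```

Returning None rather than 0 for a missing date keeps the "empty date => no match" rule easy to enforce downstream.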
Isolates bulk sending reputation onto a dedicated subdomain so the root domain
stays clean for transactional/verification mail (and recovers faster). Replies
still go to the root domain via Reply-To, so the customer-facing reply experience
is unchanged.
- build_trucking_campaigns.py: add env-overridable FROM_EMAIL
(noreply@send.performancewest.net); use it for both scheduled + test sends
instead of inheriting base["from_email"] from the DB base campaign.
- build_healthcare_campaigns_cron.py: FROM_EMAIL ->
compliance@send.performancewest.net (env-overridable).
- bounce-watcher.sh / hc-bounce-watcher.sh: track the new subdomain envelope
sender (keep legacy root-domain sender so the pre-cutover queue still drains;
HC also tracks by hcout transport regardless of sender).
Infra already live (separate, non-code): subdomain DNS (A/MX/SPF/DKIM
selector=send/DMARC p=reject) on the Hestia master, OpenDKIM signs
d=send.performancewest.net (verified end-to-end), egress .94/.107. Root SPF
trimmed to the real IPs; pointless IP-rehab cron disabled.
Two deliverability hardening fixes from the email audit:
1. Plaintext (altbody): all campaigns were HTML-only. Listmonk only emits
multipart/alternative when altbody is set, and HTML-only bulk mail is a
spam-score signal. New scripts/_email_plaintext.py renders a readable
text/plain part from the HTML body (dependency-free; preserves Listmonk
{{ .Subscriber }}/{{ UnsubscribeURL }} template tags, turns links into
'text (url)'). Wired into the trucking builder (and thus UCR + IFTA, which
reuse create_and_schedule_campaign) and the healthcare builder.
2. Stable container hostname: Listmonk derived its Message-ID from the random
docker container id -> @localhost.localdomain (spam-score signal). Pin both
listmonk + listmonk-hc hostname to perfwest.performancewest.net, matching
Listmonk's SMTP hello_hostname.
Part of the email-deliverability incident hardening.
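A rough, standard-library-only sketch of the kind of HTML-to-text rendering item 1 describes. It is not the contents of scripts/_email_plaintext.py; the class and function names are hypothetical, and the list of block-level tags is an assumption. Template tags pass through untouched because the parser treats them as ordinary text or attribute values.

```python
import re
from html.parser import HTMLParser


class _TextRenderer(HTMLParser):
    BLOCK_TAGS = {"p", "div", "br", "tr", "li", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.href = None
        self.skip = 0  # nesting depth inside <style>/<script>

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self.skip += 1
        elif tag == "a":
            self.href = dict(attrs).get("href")
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("style", "script"):
            self.skip = max(0, self.skip - 1)
        elif tag == "a":
            if self.href:
                self.parts.append(f" ({self.href})")  # link becomes 'text (url)'
            self.href = None
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip:
            self.parts.append(data)


def html_to_plaintext(html):
    """Render a readable text/plain body from an HTML email body."""
    renderer = _TextRenderer()
    renderer.feed(html)
    renderer.close()
    text = "".join(renderer.parts)
    text = re.sub(r"[ \t]+", " ", text)      # collapse runs of spaces
    text = re.sub(r" *\n *", "\n", text)     # trim spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)   # at most one blank line
    return text.strip()
```

The result would be passed as the campaign's altbody so Listmonk emits multipart/alternative.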
The HC warmup crons were '* * 1-5' (Mon-Fri), silently skipping weekends -- but a
proper warmup needs CONTINUOUS daily volume for 21 days (mailbox providers reward
consistency; gaps stall reputation). The Jun 14 'HC 0 sent' alert was just a
skipped Sunday, but the weekend skips also broke ramp continuity.
- pw-hc-campaign + pw-hc-nppes: '* * 1-5' -> '* * *' (daily), vendored + applied live.
- Re-aligned the warmup start stamp from calendar-day 9 to send-day 5 so the
volume ramp matches reputation actually built (it had skipped ~4 weekend days,
running the ramp ahead of real history).
- Fixed the stale 'Mon-Fri only' comment in daily_slice().
- Vendored nppes cron now carries the enriched-CSV + 4-segment config.
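The re-alignment arithmetic can be checked with a small sketch. Under the old Mon-Fri schedule only weekdays counted as send days, so a 9-calendar-day span that starts on a Saturday contains 5 send days. send_days is a hypothetical helper and the dates are illustrative, not the campaign's real start date.

```python
from datetime import date, timedelta


def send_days(start, today):
    """Count Mon-Fri days in the inclusive range [start, today]."""
    count = 0
    day = start
    while day <= today:
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            count += 1
        day += timedelta(days=1)
    return count


start = date(2024, 1, 6)    # a Saturday
today = date(2024, 1, 14)   # the Sunday eight days later
calendar_day = (today - start).days + 1
print(calendar_day, send_days(start, today))
```

Ramping on the send-day count keeps the volume curve tied to days on which mail actually went out.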
The OIG-screening + NPPES-update segments were effectively limited to ~1,437
providers because the warmup 'any' selector excluded not-on-reval-list rows as a
deliverability proxy -- but that excludes almost the ENTIRE institutional list
(org NPIs aren't individual Medicare enrollees). Since we already SMTP-verified
all 63k inboxes, add an 'institutional_verified' selector that trusts our own
verification instead of reval-list presence. Result: OIG + NPPES-update now
address 62,422 (43x more), giving multiple broad offers to test engagement on.
- enrich_institutional_revalidation.py: fast local join of the institutional
list to the CMS Revalidation Due Date List bulk file (revalidation_base.csv)
by NPI -> adds reval_due_date/days_overdue/reval_status. ~1,437 are genuine
Medicare enrollees (197 overdue / 164 due-soon) -> flagship $599 reval pitch.
- npi_reactivation stays on leie_or_deactivated (only REAL deactivations -- no
false 'your NPI is deactivated' claims to active orgs).
Reputation is tracked per receiving mail operator, not per recipient domain, so
the daily warmup slice is now distributed across MX operators with per-operator
daily caps (ramping with the warmup day): Microsoft/Google/Proofpoint/etc. capped
individually, long-tail operators each get a generous default. This lets total
daily volume be much higher than a flat cap without hammering any single system.
mx_throttled() respects the mx_provider column the verifier now writes; falls back
to flat slicing if absent.
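One way the per-operator capping could work, as a sketch under stated assumptions. The function name, signature and cap values are hypothetical (this is not the real mx_throttled()); only the mx_provider column and the flat-slice fallback come from the description above. How caps ramp with the warmup day is left to the caller.

```python
from collections import defaultdict


def throttle_by_operator(rows, total, caps, default_cap):
    """Pick up to `total` rows without exceeding any operator's daily cap."""
    if not rows or "mx_provider" not in rows[0]:
        return rows[:total]  # column absent: fall back to flat slicing
    taken = defaultdict(int)
    picked = []
    for row in rows:
        if len(picked) >= total:
            break
        operator = row["mx_provider"] or "unknown"
        if taken[operator] < caps.get(operator, default_cap):
            taken[operator] += 1
            picked.append(row)
    return picked
```

Rows for an operator that has hit its cap are skipped rather than blocking the slice, so the remaining budget flows to other operators.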
add_subscriber only attached an already-existing subscriber to the new list
without updating attribs, so the due-soon template's days_until merge field was
blank for providers already imported by another segment. Now PUT the merged
attribs (existing + this segment's npi/practice/due-date/days_until) before
adding to the list.
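The merge step, sketched in isolation with the HTTP call omitted. merged_attribs is a hypothetical helper, and skipping blank segment values so they cannot erase existing data is an assumption rather than confirmed behavior.

```python
def merged_attribs(existing, segment_fields):
    """Overlay this segment's non-blank fields on the subscriber's existing attribs."""
    merged = dict(existing or {})
    for key, value in segment_fields.items():
        if value not in (None, ""):
            merged[key] = value
    return merged
```

The merged dict is what would be sent in the update, so fields written by other segments survive while days_until is filled in.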
The HC warmup pool is supply-constrained (~400 verified providers, all fed by
the same narrow 'revalidation 1-90 days OVERDUE' slice). This adds a mirror-image
proactive segment that targets providers whose Medicare revalidation is UPCOMING
within the next 1-90 days, drawn from the same CMS Revalidation Due Date List --
no new data source needed. 'Handle it before your deadline' is a strong pitch and
roughly doubles the deliverable pool.
- New selector reval_due_soon (status=upcoming, days_until in [HC_DUE_SOON_MIN,
HC_DUE_SOON_MAX] default 1-90).
- New segment revalidation_due_soon reusing the existing /order/npi-revalidation
service ($599) with template hc_revalidation_due_soon.html.
- attribs_for now exposes days_until (positive days to due date).
- Added to ACTIVE_SEGMENTS.
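A sketch of the due-soon test and the days_until value, assuming reval_due_date is stored as MM/DD/YYYY and reval_status carries the literal 'upcoming'. The function signatures are illustrative; the real selector reads its bounds from HC_DUE_SOON_MIN and HC_DUE_SOON_MAX.

```python
from datetime import date, datetime


def days_until(due, today):
    """Days from today to the due date, or None if the date is missing or malformed."""
    try:
        due_date = datetime.strptime(due, "%m/%d/%Y").date()
    except (TypeError, ValueError):
        return None
    return (due_date - today).days


def reval_due_soon(row, today, lo=1, hi=90):
    """True when revalidation is upcoming within the inclusive [lo, hi] day window."""
    if row.get("reval_status") != "upcoming":
        return False
    remaining = days_until(row.get("reval_due_date"), today)
    return remaining is not None and lo <= remaining <= hi
```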
A lead replied with proof their Medicare revalidation was already approved (CMS
data-lag: the public Revalidation Due Date List still showed them overdue weeks
after approval). Two of these arrived same-day, so:
- Carbonio auto-reply (deployed on co.carrierone.com): created mailbox
hc-replies@ on the info@ distribution list with a Sieve that auto-acknowledges
'my revalidation is already complete' replies (tag + mark read + file into a
'Reval Completed (auto-acked)' folder + on-brand reply explaining the CMS lag).
CRITICAL: info@ is the shared reply-to for ALL campaigns (healthcare, trucking,
telecom), so every rule is anchored to Medicare/revalidation context -- a
trucking 'MCS-150 done, this is bogus' or telecom 'RMD done' reply does NOT
trigger it (tested + passing). A buyer guard ('please file / how much') also
suppresses the auto-reply so a human handles the sale.
Carbonio 25.x Sieve quirks documented (vacation/imap4flags/body :text all
unsupported; use reply/flag/tag/body :contains).
- Permanent suppression: new data/hc_suppress.txt do-not-contact list the warmup
honors at import AND --prune removes from the live lists. Seeded with the two
completed providers (Pangea Lab, Yakima Valley FWC); both also blocklisted in
listmonk_hc and removed from lists 3 + 4.
Belt-and-suspenders for the edge case you flagged: a domain already in a warmup list
could flip its MX to Google Workspace between weekly refreshes, after which it
would hard-bounce from the cold IP. The import-time guard only catches NEW adds.
- prune_holdouts(): enumerates each warmup list's subscribers, matches them
against the FRESH master CSV (re-classified weekly), and removes any whose
domain is now Google-hosted. DELIVERABILITY-ONLY -- it never evicts for
audience reasons (an overdue provider drifting out of the 1-90 day window was
a valid target when warmed; re-litigating that just wastes warmup progress).
- --prune (run alongside warming) and --prune-only (prune then exit).
- Wired into the weekly refresh cron as a --prune-only chained step, so MX is
re-checked and holdouts removed every Monday before the weekday sends.
Verified end-to-end: with no Google domains in the lists it's a no-op; after injecting a
simulated Google-flipped domain into the master, the prune correctly detects it and
(in a real run) would remove it from every list it's on.
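The deliverability-only rule can be sketched as a pure function. The name find_holdouts and the mx_by_domain map (domain to the mx_provider value from the fresh master) are hypothetical. Note that audience fields such as days overdue are deliberately not consulted.

```python
def find_holdouts(subscriber_emails, mx_by_domain):
    """Return the subscribers whose domain is now classified as Google-hosted."""
    holdouts = []
    for email in subscriber_emails:
        domain = email.rsplit("@", 1)[-1].lower()
        if mx_by_domain.get(domain) == "google":
            holdouts.append(email)
    return holdouts
```

A domain missing from the master is left alone here, which matches the intent of evicting only on positive evidence of a Google MX.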
Found via live mail.log: Google-Workspace-hosted PRACTICE domains (custom
domains whose MX is aspmx.l.google.com, e.g. moosepharmacy.com, hc2kidney.com)
were getting hard 550-5.7.1 rejects from Google's cold-IP bulk filter -- exactly
the bounces that wreck a warming IP's reputation. The original google/non-google
split classified by the email's domain STRING, which can't see that a custom
domain silently uses Google Workspace; only an MX lookup reveals it (33% of our
domains, 228/689, are Google-hosted this way).
- hc_data_refresh.py: new MX classification (one lookup per unique domain via
dnspython, cached) writes an mx_provider=google/other flag into the master and
propagates it into the channel CSVs (auto-adding the column). --skip-mx for a
fast status-only run.
- build_healthcare_campaigns_cron.py: warm_segment now drops mx_provider=google
rows during warmup (HC_SKIP_GOOGLE=1 default; set 0 once IPs are warm). This is
defense-in-depth -- correct regardless of which CSV the cron is pointed at.
Verified: today's sends (nongoogle CSV) had 0 Google bounces; the guard cuts the
Google-containing week1_verified cohort's revalidation candidates 82->8.
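A sketch of MX-based classification with one lookup per unique domain. The lookup argument is a hypothetical callable that returns MX hostnames for a domain (in practice it could wrap a dnspython query); injecting it keeps the sketch runnable without network access. The list of Google MX zones is an assumption and may not match what hc_data_refresh.py checks.

```python
GOOGLE_MX_ZONES = ("google.com", "googlemail.com")  # assumed zone list


def classify_mx(mx_hosts):
    """Return 'google' if any MX host sits under a Google zone, else 'other'."""
    for host in mx_hosts:
        name = host.rstrip(".").lower()
        if any(name == zone or name.endswith("." + zone) for zone in GOOGLE_MX_ZONES):
            return "google"
    return "other"


def classify_domains(emails, lookup):
    """Map each unique email domain to its mx_provider, calling lookup once per domain."""
    cache = {}
    for email in emails:
        domain = email.rsplit("@", 1)[-1].lower()
        if domain not in cache:
            cache[domain] = classify_mx(lookup(domain))
    return cache
```

Matching on the MX host's zone rather than the email's domain string is what catches custom domains that are hosted on Google Workspace.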
Mailing heavily-overdue NPIs (months/years past due) risks hitting practices
that have closed, merged, or abandoned the inbox -> hard bounces, which are the
fastest way to wreck a warming IP's reputation. The warmup now restricts the
reval_overdue selector to an inclusive [HC_OVERDUE_MIN, HC_OVERDUE_MAX] window
(default 1-90 days) and the OIG 'any' selector likewise excludes heavily-overdue
and dropped-off-list rows. On the current cohort this trims the overdue audience
178->96 and the OIG audience 399->317, holding out the stale long tail
(181-365d + 366d+). upcoming/active providers are unaffected.
Refresh (hc_data_refresh.py):
- CRITICAL: drop optout_ending from REFRESHED_FIELDS -- the refresh never
computes it, so propagating it blanked the channel CSVs and would starve the
compliance_bundle segment (whose selector IS optout_ending).
- MAJOR: only rewrite leie_excluded when OIG was actually pulled (guard was
'not skip_oig OR not skip_sam', so a --skip-oig run blanked all exclusion
flags). Also write 'Y' (matching the original list builder) not '1'.
- Use 'no_reval_flag' (the original vocabulary) instead of 'not_on_list' when an
NPI drops off the reval list, and clear reval_due_date too.
- Throttle politeness: move time.sleep(0.05) above the early-continue paths so
EVERY CMS request is spaced, not just the minority that are on the list.
- Guard blank-NPI rows (leave their status untouched instead of mislabeling).
- Master write preserves any columns beyond HEADER (no silent column drop).
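For the last fix, a small sketch of how the output header could be extended instead of truncated. output_fieldnames is a hypothetical helper; HEADER is the script's canonical column list, and any extra keys found on rows are appended in first-seen order.

```python
def output_fieldnames(header, rows):
    """Return header plus any extra columns present on the rows, in first-seen order."""
    extras = []
    for row in rows:
        for key in row:
            if key not in header and key not in extras:
                extras.append(key)
    return list(header) + extras
```

Passing this list to the CSV writer means a column added by another script (for example mx_provider) is carried through a refresh rather than silently dropped.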
Warmup cron (build_healthcare_campaigns_cron.py):
- Fix the daily-slice split: it summed to less than the budget (dropped ~2/day)
and could OVERSHOOT on tiny totals (each 'other' floored to >=1). Now uses
divmod for an even remainder and reclaims rounding onto the lead, so
sum(per_seg) == total_slice exactly for every input (verified 0,1,2,7,100,300).
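One way to get an exact split, sketched under assumptions: the first segment is the lead with a 50% share, the others share the rest evenly via divmod, and the remainder goes back to the lead. split_slice and the segment labels in the usage lines are hypothetical; the actual code may distribute the remainder differently.

```python
def split_slice(total, segments, lead_share=0.5):
    """Split `total` across segments so the parts always sum to `total`."""
    if not segments:
        return {}
    lead, others = segments[0], segments[1:]
    if not others:
        return {lead: total}
    lead_n = int(total * lead_share)
    each, leftover = divmod(total - lead_n, len(others))
    per_seg = {seg: each for seg in others}   # no floor to >=1, so no overshoot
    per_seg[lead] = lead_n + leftover         # rounding reclaimed onto the lead
    return per_seg


segments = ["lead", "b", "c", "d", "e"]
for total in (0, 1, 2, 7, 100, 300):
    assert sum(split_slice(total, segments).values()) == total
```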
Templates: the non-revalidation emails rendered {{ .Subscriber.Attribs.detail }}
(a reval due date) under a 'Practice'/'Status'/'Record' label -- a wrong/
confusing personalization on a live send (esp. OIG, selector 'any'). All four
now show the practice name; 'detail' is retired from rendering (revalidation
uses reval_due_date/days_overdue directly).
Two gaps closed:
1. hc_data_refresh.py (NEW): weekly source-data refresh. Re-checks every
emailable NPI against the LIVE government sources so sends never go stale:
- CMS Revalidation Due Date List (data.cms.gov per-NPI API; handles both ISO
and US date formats, normalizes to MM/DD/YYYY).
- OIG LEIE full CSV download (the NPI-bearing exclusion source).
- SAM.gov v4 exclusions (key in .secrets/sam-api-key) -- OFF by default since
SAM exclusions rarely carry an NPI and the full set is ~167k records; it's
opt-in via --sam-pages. SAM's real value is the live per-name screening
service, not a bulk NPI join.
Writes the master CSV atomically (temp+rename). A provider who has since
revalidated flips overdue->upcoming/not_on_list, so we stop nagging them.
2. build_healthcare_campaigns_cron.py: was revalidation-only (one hardcoded
list/campaign/CSV/template). Now multi-segment: imports SEGMENTS from the
single-source-of-truth registry, warms ALL five programs in parallel, each
with its own list, dated campaign, and per-segment import-state file (so
dedup is per-segment). A per-segment selector maps master-CSV rows to the
right program (reval_overdue / reval_upcoming / leie_or_deactivated /
optout_ending / any). Daily ramp slice is split across segments (revalidation
leads at 50%, rest share the remainder) so every program collects engagement
data while the IPs warm. Back-compat: seeds revalidation import-state from the
legacy hc_imported_emails.txt once.
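Two pieces of item 1, sketched with hypothetical helper names: normalizing ISO or US dates to MM/DD/YYYY, and the temp+rename write. The accepted input formats are assumptions about what the CMS API returns. The temp file is created in the destination directory so the final rename stays on one filesystem.

```python
import csv
import os
import tempfile
from datetime import datetime

_FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%S", "%m/%d/%Y")  # assumed input formats


def normalize_due_date(raw):
    """Return the date as MM/DD/YYYY, or '' when it cannot be parsed."""
    raw = (raw or "").strip()
    for fmt in _FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue
    return ""


def write_csv_atomic(path, fieldnames, rows):
    """Write rows to a temp file next to `path`, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        os.replace(tmp, path)  # readers see the old file or the new one, never a partial
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

If the refresh crashes mid-write, the previous master CSV is left untouched and the daily cron keeps working from it.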
Skepticism ("is this even real?") is the top objection. The data IS accurate
(verified our subscribers' NPIs match the official CMS Revalidation Due Date List
exactly), so this is a credibility-presentation fix:
1. Email: replace the plain detail row with an "Official record - CMS Medicare
Revalidation Due Date List" card (NPI, legal name, due date, days overdue)
plus a "Verify on CMS.gov" button. Clearly labeled as our presentation of
public CMS data, not a CMS screenshot (no impersonation).
2. API: npi/lookup now pulls the revalidation due date LIVE from the public CMS
dataset (data.cms.gov) instead of the empty local table, and returns a
revalidation{ due_date, source, cms_legal_name, verify_url } proof object.
3. Tool: /tools/npi-compliance-check shows a live "official record" card with a
self-verify link when CMS returns a due date.
Builder now stores reval_due_date/days_overdue as separate attribs for the card
(existing 194 subscribers backfilled from their detail string).