Consolidate the outbound mail footprint to match the SPF intent (already
trimmed to .94/.107 on 2026-06-19). A 20-IP sending footprint reads as
snowshoe spam to receivers and was contributing to domain-reputation
throttling (Microsoft 451 4.7.500, Gmail low-reputation).
Removed from /etc/postfix/master.cf: transports yahooslow, out02-04,
out06-20, rehab02-04, HC submission ports 2527/2528, hcout2/hcout3.
Removed from /etc/network/interfaces (+ live ip addr del): host bindings
.90-.93, .95-.106, .108-.109. Kept: .94 (trucking/out05), .107 (HC/hcout1),
.71/.72 (infra).
Verified live: postfix check OK, both streams still status=sent post-change,
SSH session on .71 unaffected, transport_maps still routes via out05.
Snapshots: infra/postfix/live-snapshots/master.cf, infra/network/interfaces.
Live backups on server: /root/{master.cf,interfaces}.bak_snowshoe_*.
api.performancewest.net uses an explicit per-path allowlist; everything else
falls through to a trusted-IP-only catch-all that returns 403. Six browser-
facing routes had no location block, so they 403'd for every public visitor:
/api/v1/npi/ <- THE healthcare sales killer. The 'Free NPI
Compliance Check' tool (top of the HC funnel,
where every HC campaign sends traffic) fetches
/api/v1/npi/lookup. It 403'd -> CORS error in
the browser -> the tool never rendered results
or the upsell CTAs (Revalidation $399 / NPPES
$149 / Bundle $899) -> 0 HC sales despite 17
sessions reaching it in 30d and 0 HC orders
EVER created in the compliance DB.
/api/v1/cdr/ telecom CDR profile tool
/api/v1/icc/ intrastate/ICC profile tool
/api/v1/corp/ corporate foreign-qual check
/api/v1/foreign-qualification/ foreign qualification quote/jurisdictions
/api/v1/lnpa-regions LNPA region lookup
Added explicit proxy_pass blocks (mirroring the existing entities/identity
pattern) before the catch-all. Verified live: all six now reach the app with
proper CORS; the NPI tool renders results + order CTAs end-to-end via a real
browser; npi-revalidation order page -> Stripe confirmed.
The live /etc/nginx/sites-enabled/pw-api.conf was hand-edited and untracked;
committing the current state here so it is version-controlled. (Live backup:
/root/pw-api.conf.bak_20260623.)
The day-9 Gmail block that forced the 200/h hold is resolved: per-MX throttling
shipped, Google is excluded entirely (MAIN_EXCLUDE_OPERATORS=google), and the
OpenDKIM signing bug is fixed. With Google out of the mix, 400/h (~4k/day) is
within the envelope these IPs cleanly sustained at 68-76% delivery with zero
blocks. Lets the post-DKIM re-send backlog drain in ~1 day instead of ~3.
Fix 1 (build_trucking_campaigns.py): the warmup big-MX exclusion only covered the
clean-label operators (google/microsoft/proofpoint/...). Consumer mailbox
operators that mx_tag_carriers.py labels with an "mx:" prefix slipped BOTH the
exclusion and the per-MX throttle -- notably mx:yahoodns.net (283k sendable
carriers = Yahoo Small Business/AOL custom domains) and mx:icloud.com (25k), plus
comcast/charter/centurylink/windstream/tds/earthlink. These are custom domains
whose MX points at a consumer provider, invisible to the literal-domain blocklist.
Added CONSUMER_MX_OPERATORS, folded into WARMUP_EXCLUDE_OPERATORS used by both the
fetch_carriers() exclusion SQL and mx_daily_caps() (same day-30 ramp). Behind the
existing MAIN_SKIP_BIG_MX switch.
Validated read-only: after the fix the warmup-eligible pool is 353,909 carriers
(315,892 untagged + ~38k genuinely small/self-hosted operators), so the long tail
still sustains the daily quota -- not starved -- while 0 consumer-MX carriers are
selected during warmup.
Fix 3 (infra/cron/pw-mx-tag): mx_tag_carriers.py was on no cron, so the untagged
(NULL) backlog (~316k) never drained and new FMCSA imports stayed untagged,
slowly re-opening the gap. Added a daily 05:45 UTC cron (--only-unsent
--limit-domains 20000), before the 08:00 builder. Idempotent/bounded (only tags
mx_provider IS NULL). Verified live: a 200-domain test run tagged 216 domains.
(Fix 2 -- bounding the NULL bucket cap -- deferred; the cron will drain it.)
Tool 2 of the deliverability monitoring pair (Tool 1 = mail_reputation_monitor).
DMARC rua reports from dozens of operators (Google, Yahoo, Comcast, Cox, Bell,
Mimecast, Cisco ESA, GMX, mail.com, ...) were landing in ops@ (dmarc@ was a DL),
burying real mail and never parsed. Now ingested + queryable:
- dmarc@performancewest.net converted DL -> dedicated Carbonio mailbox; isolated
IMAP creds in server .env, surfaced to workers in docker-compose.yml (mirrors
OPS_IMAP_*). 29 historical reports moved ops@ -> dmarc@ via IMAP.
- scripts/dmarc_report_parser.py: IMAP fetch unseen -> decompress .gz/.zip/.xml
(namespace-agnostic: classic + urn:ietf:params:xml:ns:dmarc-2.0 GMX/mail.com) ->
parse aggregate XML -> upsert dmarc_report (keyed (org_name,report_id), no-op on
re-parse) + dmarc_record per source IP. dmarc_pass = dkim_aligned OR spf_aligned.
Marks \Seen. --dry-run/--all/--alert (7d per-IP summary + Telegram if one of OUR
IPs <95% pass, or EXTERNAL IP sends >=20 failing msgs as us = spoofing under
p=reject). psycopg2 imported lazily so --dry-run runs without the driver.
- api/migrations/102_dmarc_aggregate.sql: dmarc_report + dmarc_record tables.
- infra/cron/pw-dmarc-parser: 06:20 UTC daily --alert (after reputation, before scrub).
- docs/deliverability.md: DMARC section DONE; query examples.
Verified: dry-run --all parses all 28 reports (1 non-report test probe), 0 unknown
after the namespace fix.
Runs mail_reputation_monitor --alert at 06:10 UTC, piping the day's postfix log
(sudo cat, same pattern as pw-warmup-tg-alert) into the DB-connected workers
container. Builds the daily SNDS-equivalent reputation trend and Telegram-alerts
on operator regressions. Installed to /etc/cron.d/pw-mail-reputation.
Runs scrub_listmonk_consumer against both listmonk and listmonk_hc at 06:30 UTC,
before the campaign builders, so any ENABLED subscriber matching the authoritative
exclusion list is blocklisted retroactively. Keeps list-based campaigns (FCC
Direct Contacts, CRTC/USF, etc.) from leaking onto consumer mailboxes after a new
domain (e.g. Apple/iCloud) is added to the exclusion list. Installed to
/etc/cron.d/pw-listmonk-scrub on the host.
- infra/ansible/roles/mail: refactor OpenDKIM to support multiple signing domains
via opendkim_signing_domains list (root + send.performancewest.net). Loops
keygen/ownership/keytable/signingtable so the live two-domain setup is
reproducible from ansible.
- infra/ansible group_vars: add bulk_mail_subdomain + campaign_from_* +
campaign_reply_to documentation vars (map to CAMPAIGN_FROM / HC_CAMPAIGN_FROM
env read by the builder scripts). smtp_from (transactional) stays on root.
- docs/deliverability.md: rewrite TL;DR with the carrierone-vs-performancewest
A/B proof (same server/IPs, different From domain -> Inbox vs Junk) and the
~85% Microsoft / 14% Google / <1% Yahoo audience mix; add the bulk-subdomain
section, SPF trim, rehab-disabled, and the Hestia DNS automation runbook.
The multi-IP rotation was built to spread risk while DKIM was broken (fixed
2026-06-17) and after the May 30-31 over-volume blast. With DKIM signing
correctly, spreading ~3k trucking msgs/day across 12 IPs (.94-.105) + ~1.2k
healthcare msgs/day across 3 IPs (.107-.109) gave each IP far too little
per-receiver volume to build reputation. Gmail/Outlook read it as snowshoe spam
and reputation-blocked ~200 msgs/day ("very low reputation of the sending
domain") -> 0 human clicks, 0 sales.
Consolidate to ONE IP per stream so each accrues real reputation:
- trucking: pw-mta-warmup ALL=(out05) -> randmap collapses to {out05:} = .94
- healthcare: listmonk-hc SMTP servers 2/3 (ports 2527/2528 -> .108/.109)
disabled in DB; all HC mail now egresses .107 (hcmta01). [applied live]
Applied live: transport_maps now randmap:{out05:}; listmonk-hc restarted.
To re-expand later: add transports back to ALL + re-enable the HC SMTP servers.
Convert OIG/SAM from one-time $299/yr to recurring $79/month (card+ACH only) -
the first real recurring-billing product in the system. Exclusion screening is
a *monthly* federal obligation, so recurring monitoring fits the requirement and
is the biggest valuation lever (vs a one-time annual run).
Catalog (single source of truth):
- service-catalog.ts: add billing_interval + allowed_methods to ComplianceService;
oig-sam-screening -> 7900c, billing_interval:"month", allowed_methods:[card,ach],
name "(Monthly Monitoring)".
- gen-service-catalog.py + check-service-catalog-drift.py: carry/guard the two new
fields; regenerate site catalog.
Checkout (api/src/routes/checkout.ts):
- mode:"subscription" with recurring price_data when billing_interval is set;
surcharge absorbed for recurring (clean $79/mo); server-side METHOD_NOT_ALLOWED
re-validation against allowed_methods.
- ensureColumns + migration 100: compliance_orders.stripe_subscription_id,
bundle_upsell_sent_at (+ subscription index).
Webhooks (api/src/routes/webhooks.ts):
- record stripe_subscription_id on checkout.session.completed (subscription mode).
- invoice.paid (subscription_cycle only) -> re-dispatch screening for the cycle;
invoice.payment_failed -> admin alert + first-failure customer nudge;
customer.subscription.deleted -> mark order cancelled. (API 2026-03-25 moved the
subscription link to invoice.parent.subscription_details.subscription.)
Fulfillment:
- job_server.py: pass recurring_cycle/invoice_id into the order.
- npi_provider.py: OIG handler labels renewal cycles "[Monthly cycle]" + re-screen
note; bundle action runs only the FIRST screening + flags the $79/mo upsell.
Bundle land-and-expand:
- Provider Compliance Bundle now includes only the first OIG/SAM screening (was
giving away $948/yr of monitoring inside an $899 bundle).
- new worker scripts/workers/bundle_upsell.py (+ pw-bundle-upsell timer): ~3 weeks
after a paid bundle, emails the customer to continue $79/mo monitoring; dedup via
bundle_upsell_sent_at; skips customers who already have an OIG/SAM order.
Surfaces updated to $79/mo: PaymentStep (filters methods, "Billed every month,
cancel anytime"), order pages, healthcare index, npi-compliance-check tool (also
fixed stale $699 bundle drift -> $899), hc_oig_screening + hc_compliance_bundle
emails.
Docs: billing.md gains a "Stripe-native Subscriptions" section + a reality-check
banner (Adyen/ERPNext-gateway model documented there is NOT live; Stripe is the
real rail). Fixed run-migrations.yml container name bug
(performancewest-postgres-1 -> performancewest-api-postgres-1, overridable).
Tests: api/tests/recurring-subscription.test.ts (28 assertions) covers catalog
gating, method validation, surcharge suppression, recurring line-item build,
invoiceSubscriptionId extraction, renewal-cycle gating. tsc clean; site build
clean; catalog drift OK.
Manual deploy step: enable invoice.paid, invoice.payment_failed,
customer.subscription.deleted on the Stripe webhook endpoint.
SMTP2GO is no longer used: Listmonk relays through the local Postfix MTA
(172.18.0.1:25 from the Docker network), which DKIM-signs and delivers
direct-to-recipient-MX; transactional mail goes through Carbonio. Verified
zero smtp2go in any live container env + postfix has no external relayhost.
Removed the stale references so a rebuild/new dev can't re-introduce it:
- api/src/config.ts: SMTP_HOST default mail.smtp2go.com -> co.carrierone.com
- scripts/workers/crypto_payment_worker.py: same default fix
- infra/ansible all.yml: listmonk_smtp_* now 172.18.0.1:25, no auth (+comment)
- app.env.j2 / email.ts / crm.md / go-live-todo.md / architecture.svg: docs
The FMCSA census was a one-time snapshot (last loaded ~May 30) with NO refresh
timer -- carriers newly falling out of MCS-150/UCR compliance were never picked
up. New scripts/workers/fmcsa_source_refresh.py orchestrates the full pipeline
(census download -> enrichment -> deficiency flag -> verify new emails ->
MX-tag new) and runs weekly via cron pw-fmcsa-refresh (Sun 09:00 UTC), codified
in the mail-pipeline Ansible role.
Idempotent + incremental: the census upsert preserves email_verified /
listmonk_sent_at / deficiency_flags, so existing carriers keep their send state
and only census fields refresh; new DOTs flow into verification then campaigns.
A carrier who refiled gets a fresh mcs150_parsed, so the builder's overdue
WHERE clause stops targeting them automatically. Verify is capped per run
(20k) so it never stalls on millions of rows.
(Healthcare already auto-catches newly-revalidation-overdue providers within
its 63k institutional pool via pw-hc-refresh Mon/Wed/Fri.)
The entire outbound campaign pipeline lived ONLY on the host and was never in
IaC -- a fresh rebuild would have silently shipped NO campaigns, NO IP warmup/
ramp, and NO bounce processing. New mail-pipeline role + deploy-mail-pipeline.yml
playbook deploy it from the canonical repo copies:
cron.d (infra/cron/):
- pw-trucking-campaign-builder, pw-ifta-campaign, pw-ucr-campaign
- pw-hc-campaign, pw-hc-nppes, pw-hc-refresh
- pw-mta-warmup, pw-listmonk-rampcap, pw-hc-rampcap
- pw-ip-rehab, pw-warmup-tg-alert
helper scripts (-> /usr/local/bin):
- pw-mta-warmup, pw-listmonk-rampcap, pw-hc-rampcap, pw-warmup-tg-alert
- postfix-bounce-notify.sh, postfix-hc-bounce-notify.sh, listmonk-bounce-sync.py
systemd services:
- pw-bounce-watcher.service (was missing from repo), pw-hc-bounce-watcher.service
Also creates the deploy-owned {{project_dir}}/logs dir (deploy can't write
/var/log, so a missing dir made cron redirects fail). Added the 6 cron.d files
that existed only on the host, the trucking bounce-watcher unit, and synced
infra/cron/pw-hc-refresh to the live version (revalidation download + enrich
steps). Role wired into site.yml after the mail (OpenDKIM) role.
Part of the email-deliverability incident hardening.
mail.log had no logrotate rule and grew unbounded to ~1GB (~150MB/day)
since Jun 8. This host logs via Postfix's built-in postlogd (maillog_file
mode), not rsyslog (no rsyslog.service exists), so postlogd holds the file
open -- a plain rename+create would leave it writing to the stale inode.
Use copytruncate (no daemon signal needed). Rotate daily, keep 14 days
compressed. Applied live: forced first rotation, compressed the 1GB
archive (->99MB), verified logging + bounce watchers + DKIM signing intact.
Part of the email-deliverability incident hardening (follows DKIM fix 4d59019).
Root cause of the Jun 2026 deliverability collapse / 'no new sales':
opendkim.conf was in single-key mode with no InternalHosts, so it signed only
127.0.0.1. Transactional/cron mail (injected locally) was signed, but ALL
campaign mail -- injected over the Docker bridge from the Listmonk containers
(172.18.0.5 trucking, 172.18.0.25 healthcare) -- went out UNSIGNED. Gmail/Yahoo
require DKIM on bulk mail since Feb 2024, so cold campaigns were junked/blocked
(~23% delivery, 550-5.7.1). Proof: 2,620 campaign msgs that day, 0 DKIM sigs.
The correct table files already existed on the server but were never wired into
opendkim.conf. Fix points the daemon at key.table/signing.table and sets
InternalHosts/ExternalIgnoreList to trusted.hosts (which includes 172.16.0.0/12,
the Docker subnet). Fixes BOTH streams: HC submission ports 2526-2528 inherit
the global smtpd_milters and *@performancewest.net covers compliance@.
Verified by injecting from a Docker IP through port 25 and port 2526 -- both now
get 'DKIM-Signature field added'. Codified as new Ansible role 'mail' so it
can't silently regress (OpenDKIM was previously not in IaC at all).
Non-attorney 'Service' filer account registered under Performance West
(filings@performancewest.net). Credentials live only in the server .env
(blank default in template, never committed). Consumed by the upcoming SC
intrastate Playwright e-filer.
Intrastate operating authority is state-specific + application-based like IRP, so
it reuses the same email/POA + invoice-reconciliation flow:
- intrastate_filing.send_intrastate_submission: emails the state PSC/PUC the
authority application with the signed POA attached (subject tag [PW-ISA CO-..]),
reusing irp_filing's MinIO download + census enrich helpers.
- The shared poller (irp_invoice_poller) now matches BOTH [PW-IRP] and [PW-ISA]
tags, parses the fee, Telegram-alerts, and bills the customer the exact amount
with the correct service slug.
- state_trucking gov-fee gate routes intrastate-authority to the PSC/PUC email
path; if no submission email is configured for the base state it falls back
to a manual todo (safe default — no emailing guessed agency addresses).
Per-state ISA_<ST>_EMAIL env (blank until the exact agency address is verified).
SC/GA/TX scaffolded. Customer still only sees an exact-fee payment link; you only
approve the final filing.
- gov_fee: add AGENCY_PROCESSING_FEE (per-service card/convenience fee passed
through so the customer pays the true all-in cost); estimate_gov_fee now folds
it into the billed total. IFTA/intrastate/UCR fees are published/near-exact.
- IRP fees can't be looked up — only the base state computes them. New
irp_filing.py: emails the base-state IRP unit a Schedule A/B request (Reply-To
the IRP filings mailbox, [PW-IRP CO-...] subject tag), and a 15-min cron
(irp_invoice_poller) scans the mailbox for the state's invoice reply, parses
the exact apportioned fee, Telegram-alerts you, and bills the customer the
EXACT amount via a gov-fee child order + payment link. Then it proceeds to
ready_to_file for your final approval.
- state_trucking gov-fee gate now routes IRP to the email/invoice path and
IFTA/intrastate to immediate exact-fee billing.
- Mailbox is configurable (IRP_FILINGS_IMAP_* in app.env.j2); falls back to
OPS_IMAP_* filtered by the [PW-IRP] tag until a dedicated mailbox exists.
Telegram alerts fire on IRP submission sent, invoice received (billed), and
un-parseable replies (so you can read + enter the fee manually).
The shared security snippet blocked any path matching /(admin|administrator|
login.action|struts) with 'return 444', which drops the connection. That bare
'admin' token also matched our own operations dashboard at /admin and the new
/admin/compliance-orders, so the browser showed 'This site can't be reached'.
Dropped the bare 'admin' token; administrator/login.action/struts stay blocked.
Applied live on prod (sudo edit + nginx reload); this updates the source of
truth so the ansible nginx role won't reintroduce it.
The HC warmup crons were '* * 1-5' (Mon-Fri), silently skipping weekends -- but a
proper warmup needs CONTINUOUS daily volume for 21 days (mailbox providers reward
consistency; gaps stall reputation). The Jun 14 'HC 0 sent' alert was just a
skipped Sunday, but the weekend skips also broke ramp continuity.
- pw-hc-campaign + pw-hc-nppes: '* * 1-5' -> '* * *' (daily), vendored + applied live.
- Re-aligned the warmup start stamp from calendar-day 9 to send-day 5 so the
volume ramp matches reputation actually built (it had skipped ~4 weekend days,
running the ramp ahead of real history).
- Fixed the stale 'Mon-Fri only' comment in daily_slice().
- Vendored nppes cron now carries the enriched-CSV + 4-segment config.
Day 9 (2026-06-13) alert: main pool 54% delivery, 202 Gmail spam-blocks
(550-5.7.1 'Gmail has detected') on warming IPs .94-.98. The 4k/day (400/h)
ramp was too aggressive AND the trucking pool lacks the per-MX throttling the HC
pool got -- Google-Workspace-hosted business domains (weberfarms.net, uatruck.com,
etc.) concentrated and Gmail blocked us. Held at 200/h (~2k/day) through day 20 to
recover, then slow step to 300/h. Applied live (cap already set to 200/h).
Adds /etc/cron.d/pw-hc-nppes (weekdays 07:30) that imports the verified NPPES
institutional general-compliance base into the OIG screening segment, throttled
per MX operator. Separate from the 07:00 reval-segment run so the two pipelines
stay independent. Vendored the cron file under infra/cron/.
The main sending IPs are cleanly warmed: today 3,845 sent at 0.18% bounce,
ZERO deferrals, ZERO ISP rate-limit/blocklist/Spamhaus hits. The script's own
note records these IPs historically sustained ~2,500/day at 68-76% delivery;
collapses only ever came from 17k-29k spikes. So we have ample headroom to
accelerate the trucking ramp safely:
day 7-13: 300/h -> 400/h (~4,000/day) [applied now, day 8]
day 14+: new 500/h (~5,000/day) [hard ceiling, well under ~17k]
Also vendored pw-listmonk-rampcap into the repo (infra/postfix/) -- it
previously lived only on the server at /usr/local/bin. Live script updated and
applied (listmonk cap now 400/h).
Add ezstorehost to trusted_admin in both layers — the nft input set and
the DOCKER-USER iptables chain (Forgejo is containerised; DNAT means the
post-DNAT dport 22 rule applies). Required for static-tenant deploys from
ezStorehost-infra to clone repos over ssh://.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Tracks the rehab pool (rehab02-04 / .91-.93) delivery + bounce + Spamhaus ZEN
DNSBL status in the daily report and alert body. Alerts only if a rehab IP lands
on a DNSBL or rehab delivery drops <40% with real volume (recipient quality
slipped) -- a recovering IP naturally bounces more so the threshold is lenient.
The 3 IPs (mta02-04 / .91-.93) retired after the May 30-31 over-volume blast are
NOT on any DNSBL (Spamhaus/Barracuda/SpamCop/SORBS all clean) and have clean PTRs
+ SPF/DKIM/DMARC -- the damage was provider-internal reputation, which recovers
with slow clean sending. scripts/ip_rehab.py sends a tiny ramping trickle
(10/IP/day -> cap 60) of genuine CAN-SPAM-compliant compliance check-in mail to
clean business-domain, never-bounced recipients via dedicated heavily-throttled
postfix transports rehab02/03/04 (30s/msg, bound to .91/.92/.93). Routing uses an
X-PW-Rehab-IP header + header_checks FILTER to override the transport_maps randmap
warmup rotation (verified: mail routes via rehab transports, status=sent). Daily
cron pw-ip-rehab. After ~2-3 weeks of clean sending the IPs can be reallocated.
The HC warmup builder ran from cron at 07:00 but the >> /var/log/pw-hc-campaign.log
redirect failed (deploy user cannot write /var/log), and a failed output redirect
makes cron abort the command BEFORE it runs -> HC sent 0/day since the log file was
removed. Route HC cron logs to /opt/performancewest/logs/ (deploy-owned) so the
redirect always succeeds. Builder itself was fine (verified: imports + sends work,
0 bounces). Also removed the stale 'campaign-warmup.sh 122' root-cron line that
pointed at a finished campaign + no longer existed.
crypto.performancewest.net kept going down because the shkeeper-deployment web
pod periodically HANGS (HTTP server deadlocks while the apscheduler background
thread keeps the process alive). The helm chart (shkeeper-1.7.15) ships NO
liveness or readiness probe, so k8s saw the hung pod as Running and never
restarted it, and kept routing traffic to the dead backend -> site down until a
manual restart.
Added HTTP probes on / :5000 (302 = healthy): liveness auto-restarts a hung pod,
readiness pulls it from the Service endpoints. Applied live via kubectl patch
(chart does not expose probes via values; re-apply after any helm upgrade --
command in the file header). Verified: new pod comes up READY 1/1 (probe passes)
and crypto.performancewest.net serves 302 again.
End-of-day (20:00 Central) check of campaign deliverability across both sending
pools (main out05-09 + healthcare hcout). Sends a Telegram alert ONLY when there
is a reputation problem -- delivery below 65% or a spam/policy-block (550-5.7.1)
spike above 150/day -- so healthy days stay silent. Reuses the existing
TELEGRAM_BOT_TOKEN/CHAT_ID from /opt/performancewest/.env. Logs every run to
/var/log/pw-warmup-healthcheck.log for history. Excludes internal/probe noise so
the delivery figure reflects real external recipients.
The 'already revalidated' replies come from the CMS data-lag window (a provider
completes their revalidation but CMS's public Due Date List still shows them
overdue for weeks). Running the refresh 3x/week instead of weekly shrinks that
window from up to 7 days to ~2-3, so a provider who just completed stops being
targeted faster. No change to the overdue window or audience size -- this is the
lever that reduces stale-data complaints without losing prospects.
Belt-and-suspenders for the edge you flagged: a domain already in a warmup list
could flip its MX to Google Workspace between weekly refreshes, after which it
would hard-bounce from the cold IP. The import-time guard only catches NEW adds.
- prune_holdouts(): enumerates each warmup list's subscribers, matches them
against the FRESH master CSV (re-classified weekly), and removes any whose
domain is now Google-hosted. DELIVERABILITY-ONLY -- it never evicts for
audience reasons (an overdue provider drifting out of the 1-90 day window was
a valid target when warmed; re-litigating that just wastes warmup progress).
- --prune (run alongside warming) and --prune-only (prune then exit).
- Wired into the weekly refresh cron as a --prune-only chained step, so MX is
re-checked and holdouts removed every Monday before the weekday sends.
Verified end-to-end: with no Google domains in lists it's a 0-op; injecting a
simulated Google-flipped domain into the master, the prune correctly detects and
(in a real run) would remove it from every list it's on.
Mailing heavily-overdue NPIs (months/years past due) risks hitting practices
that have closed, merged, or abandoned the inbox -> hard bounces, which are the
fastest way to wreck a warming IP's reputation. The warmup now restricts the
reval_overdue selector to an inclusive [HC_OVERDUE_MIN, HC_OVERDUE_MAX] window
(default 1-90 days) and the OIG 'any' selector likewise excludes heavily-overdue
and dropped-off-list rows. On the current cohort this trims the overdue audience
178->96 and the OIG audience 399->317, holding out the stale long tail
(181-365d + 366d+). upcoming/active providers are unaffected.
Tracks the deployed cron.d files in the repo:
- pw-hc-campaign: updated comment to reflect the now multi-segment warmup
(revalidation + OIG + NPPES + reactivation + bundle); command unchanged.
- pw-hc-refresh (NEW): Mon 06:00 Central weekly data refresh, ~1h before the
07:00 weekday send, so every send uses fresh CMS/OIG status.
Standard (no-login) CMS filings are mailed in one Priority Mail envelope per
destination agency, batched each postal working-day morning to save postage.
- migration 089: paper_filing_batches table + esign_records.paper_batch_id /
filing_destination_key (idempotent: a filing is batched at most once).
- batch_cover_sheet.py: per-agency cover sheet (sender/dest/date/manifest) +
merged print-job PDF (cover + all enclosed signed filings).
- daily_paper_batch.py worker: gather signed+unbatched cms855/cms10114 filings,
group by destination (MAC by state via mac_routing; Fargo for CMS-10114),
build cover+merged PDF per agency, persist batch, mark filings batched.
Self-gates on postal working days (skips weekends + federal/USPS holidays).
Phase 1 = human prints+mails; phase 2 = wire print-mail API.
- worker-crons: pw-paper-batch systemd timer (Mon-Fri 13:30 UTC, self-gated).
- test_paper_batch.py: 15/15 pass (working-day gating, routing, cover+merge).
- deploy.sh/deploy-dev.sh: bring up listmonk-hc (upstream image, excluded from
build); document the one-time listmonk_hc DB create + --install.
- docker-compose.dev.override.yml: dev-only override (committed) that drops the
prod host-port bindings and pins dev's own postgres volume (dev-pgdata) via
compose !override tags. deploy-dev ships it as docker-compose.override.yml so
syncing the canonical compose to the shared host no longer breaks dev's
api-postgres (port :5432 clash + volume switch). Discovered + fixed while
validating listmonk-hc on dev.
- pw-hc-rampcap.sh: healthcare analogue of pw-listmonk-rampcap, ramps the
listmonk_hc cap 100->1000/h off /etc/postfix/hc-warmup-start, fully
independent of the trucking ramp/cap.
Adds 3 hc submission ports (2526/2527/2528) in the single Postfix instance,
each content_filter'd onto a dedicated hc transport (hcout1/2/3) binding the
hc IPs .107/.108/.109 with hc HELO identity (hcmta01-03) and hotter concurrency.
listmonk-hc round-robins the 3 ports.
Discovered + documented the constraint that drove this shape: transport_maps
randmap is owned by the shared trivial-rewrite(8) and is global, so neither a
per-smtpd -o transport_maps nor a FILTER randmap:{...} can scope a separate IP
pool (FILTER parses randmap as a literal transport). content_filter=hcoutN:
(empty nexthop) overrides transport_maps and keeps the real recipient domain.
Verified end-to-end on the server: :2527 -> hcout2 (.108) -> real gmail MX;
trucking transport_maps (.94-.96) untouched. Idempotent, postfix-check gated
with auto-rollback.
Chromium rejects authenticated SOCKS5 ('Browser does not support socks5 proxy
authentication'). Add a gost (ginuerzh/gost:2.11.5) 'proxy-relay' sidecar that
listens unauthenticated on socks5://proxy-relay:11080 and forwards to the
authenticated residential upstream (HEALTHCARE_PROXY_UPSTREAM_URL). Workers point
Playwright at the relay via HEALTHCARE_PROXY_URL=socks5://proxy-relay:11080.
env template: split into HEALTHCARE_PROXY_UPSTREAM_URL (authenticated, password
percent-encoded so '#' -> %23) and HEALTHCARE_PROXY_URL (the relay address).
Validated end-to-end on dev: workers Chromium -> proxy-relay -> residential
egress IP 76.228.206.147; NPPES + PECOS both HTTP 200.
CMS healthcare portals (NPPES, PECOS, I&A) block datacenter IPs, so the
healthcare browser automation needs to egress via the residential proxy on
hg409y7ez04.sn.mynetname.net (username 'performancewest').
- undetected_browser: use_proxy now accepts an env-var name, so callers can
select a domain-specific proxy. _proxy_config(proxy_env) reads it and falls
back to UNDETECTED_PROXY_URL. Healthcare uses 'HEALTHCARE_PROXY_URL'.
- probe_npi_undetected: launches with use_proxy='HEALTHCARE_PROXY_URL' when set.
- npi_provider: documents that the (future) automated NPPES/PECOS flows must
use the healthcare proxy.
- Plumb HEALTHCARE_PROXY_URL (+ UNDETECTED_PROXY_URL fallback) through the
ansible env template and docker-compose workers env.
The credential itself is NOT in the repo. Set the full URL in the ansible
vault as vault_healthcare_proxy_url:
socks5://performancewest:<password>@hg409y7ez04.sn.mynetname.net:<port>
Verified parsing + Playwright proxy-dict wiring with a unit test.