Commit graph

59 commits

Author SHA1 Message Date
justin
8e5590b492 mail: DMARC aggregate-report parser + dedicated dmarc@ mailbox ingestion
Tool 2 of the deliverability monitoring pair (Tool 1 = mail_reputation_monitor).
DMARC rua reports from dozens of operators (Google, Yahoo, Comcast, Cox, Bell,
Mimecast, Cisco ESA, GMX, mail.com, ...) were landing in ops@ (dmarc@ was a DL),
burying real mail and never parsed. Now ingested + queryable:

- dmarc@performancewest.net converted DL -> dedicated Carbonio mailbox; isolated
  IMAP creds in server .env, surfaced to workers in docker-compose.yml (mirrors
  OPS_IMAP_*). 29 historical reports moved ops@ -> dmarc@ via IMAP.
- scripts/dmarc_report_parser.py: IMAP fetch unseen -> decompress .gz/.zip/.xml
  (namespace-agnostic: classic + urn:ietf:params:xml:ns:dmarc-2.0 GMX/mail.com) ->
  parse aggregate XML -> upsert dmarc_report (keyed (org_name,report_id), no-op on
  re-parse) + dmarc_record per source IP. dmarc_pass = dkim_aligned OR spf_aligned.
  Marks \Seen. --dry-run/--all/--alert (7d per-IP summary + Telegram if one of OUR
  IPs <95% pass, or EXTERNAL IP sends >=20 failing msgs as us = spoofing under
  p=reject). psycopg2 imported lazily so --dry-run runs without the driver.
- api/migrations/102_dmarc_aggregate.sql: dmarc_report + dmarc_record tables.
- infra/cron/pw-dmarc-parser: 06:20 UTC daily --alert (after reputation, before scrub).
- docs/deliverability.md: DMARC section DONE; query examples.

Verified: dry-run --all parses all 28 reports (1 non-report test probe), 0 unknown
after the namespace fix.
2026-06-19 08:50:20 -05:00
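Sketch of the namespace-agnostic parse and the pass rule described in 8e5590b492 (dmarc_pass = dkim_aligned OR spf_aligned). This is not the code in scripts/dmarc_report_parser.py; the helper names are hypothetical, and it assumes the aligned results live under policy_evaluated as in the standard aggregate-report layout.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop a leading '{namespace}' so both report flavours look the same."""
    return tag.rsplit("}", 1)[-1]


def child_text(elem, name: str) -> str:
    """Text of the first descendant with the given local name, or ''."""
    for node in elem.iter():
        if local(node.tag) == name:
            return (node.text or "").strip()
    return ""


def parse_records(xml_text: str) -> list[dict]:
    """One dict per <record>: source IP, message count, and the pass flag."""
    rows = []
    for rec in ET.fromstring(xml_text).iter():
        if local(rec.tag) != "record":
            continue
        evaluated = next(
            (n for n in rec.iter() if local(n.tag) == "policy_evaluated"), None
        )
        if evaluated is None:
            continue
        dkim_aligned = child_text(evaluated, "dkim").lower() == "pass"
        spf_aligned = child_text(evaluated, "spf").lower() == "pass"
        rows.append({
            "source_ip": child_text(rec, "source_ip"),
            "count": int(child_text(rec, "count") or 0),
            "dmarc_pass": dkim_aligned or spf_aligned,
        })
    return rows
```

Matching on the local tag name means a report with xmlns="urn:ietf:params:xml:ns:dmarc-2.0" and a classic un-namespaced report go through the same code path.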
justin
b45332b5f7 infra(cron): nightly mail-reputation snapshot (pw-mail-reputation)
Runs mail_reputation_monitor --alert at 06:10 UTC, piping the day's postfix log
(sudo cat, same pattern as pw-warmup-tg-alert) into the DB-connected workers
container. Builds the daily SNDS-equivalent reputation trend and Telegram-alerts
on operator regressions. Installed to /etc/cron.d/pw-mail-reputation.
2026-06-19 08:38:35 -05:00
justin
72c69a05c9 infra(cron): daily Listmonk consumer-domain reconciliation (pw-listmonk-scrub)
Runs scrub_listmonk_consumer against both listmonk and listmonk_hc at 06:30 UTC,
before the campaign builders, so any ENABLED subscriber matching the authoritative
exclusion list is blocklisted retroactively. Keeps list-based campaigns (FCC
Direct Contacts, CRTC/USF, etc.) from leaking onto consumer mailboxes after a new
domain (e.g. Apple/iCloud) is added to the exclusion list. Installed to
/etc/cron.d/pw-listmonk-scrub on the host.
2026-06-19 00:00:46 -05:00
justin
3ca960aca5 docs+infra(deliverability): document bulk subdomain; ansible signs send.performancewest.net
- infra/ansible/roles/mail: refactor OpenDKIM to support multiple signing domains
  via opendkim_signing_domains list (root + send.performancewest.net). Loops
  keygen/ownership/keytable/signingtable so the live two-domain setup is
  reproducible from ansible.
- infra/ansible group_vars: add bulk_mail_subdomain + campaign_from_* +
  campaign_reply_to documentation vars (map to CAMPAIGN_FROM / HC_CAMPAIGN_FROM
  env read by the builder scripts). smtp_from (transactional) stays on root.
- docs/deliverability.md: rewrite TL;DR with the carrierone-vs-performancewest
  A/B proof (same server/IPs, different From domain -> Inbox vs Junk) and the
  ~85% Microsoft / 14% Google / <1% Yahoo audience mix; add the bulk-subdomain
  section, SPF trim, rehab-disabled, and the Hestia DNS automation runbook.
2026-06-18 23:12:05 -05:00
justin
545e6f7ed7 infra(mail): consolidate sending IPs (kill snowshoe) now that DKIM is fixed
The multi-IP rotation was built to spread risk while DKIM was broken (fixed
2026-06-17) and after the May 30-31 over-volume blast. With DKIM signing
correctly, spreading ~3k trucking msgs/day across 12 IPs (.94-.105) + ~1.2k
healthcare msgs/day across 3 IPs (.107-.109) gave each IP far too little
per-receiver volume to build reputation. Gmail/Outlook read it as snowshoe spam
and reputation-blocked ~200 msgs/day ("very low reputation of the sending
domain") -> 0 human clicks, 0 sales.

Consolidate to ONE IP per stream so each accrues real reputation:
 - trucking: pw-mta-warmup ALL=(out05) -> randmap collapses to {out05:} = .94
 - healthcare: listmonk-hc SMTP servers 2/3 (ports 2527/2528 -> .108/.109)
   disabled in DB; all HC mail now egresses .107 (hcmta01). [applied live]

Applied live: transport_maps now randmap:{out05:}; listmonk-hc restarted.
To re-expand later: add transports back to ALL + re-enable the HC SMTP servers.
2026-06-18 17:41:07 -05:00
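The per-IP arithmetic behind 545e6f7ed7, as a quick check. Volumes are the approximate figures from the commit message (~3k and ~1.2k msgs/day); the function name is hypothetical.

```python
def per_ip_daily(msgs_per_day: int, ip_count: int) -> float:
    """Average daily volume each sending IP sees when a stream is spread evenly."""
    return msgs_per_day / ip_count


# Before consolidation: volume diluted across the rotation pools.
trucking_before = per_ip_daily(3000, 12)   # .94-.105
healthcare_before = per_ip_daily(1200, 3)  # .107-.109

# After consolidation: one IP per stream carries the whole volume.
trucking_after = per_ip_daily(3000, 1)
healthcare_after = per_ip_daily(1200, 1)
```

Spread out, each trucking IP averaged about 250 msgs/day across all receivers combined, so the share reaching any single mailbox provider was smaller still.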
justin
cf021e2f91 feat(healthcare): OIG/SAM exclusion screening as $79/mo Stripe Subscription
Convert OIG/SAM from one-time $299/yr to recurring $79/month (card+ACH only) -
the first real recurring-billing product in the system. Exclusion screening is
a *monthly* federal obligation, so recurring monitoring fits the requirement and
is the biggest valuation lever (vs a one-time annual run).

Catalog (single source of truth):
- service-catalog.ts: add billing_interval + allowed_methods to ComplianceService;
  oig-sam-screening -> 7900c, billing_interval:"month", allowed_methods:[card,ach],
  name "(Monthly Monitoring)".
- gen-service-catalog.py + check-service-catalog-drift.py: carry/guard the two new
  fields; regenerate site catalog.

Checkout (api/src/routes/checkout.ts):
- mode:"subscription" with recurring price_data when billing_interval is set;
  surcharge absorbed for recurring (clean $79/mo); server-side METHOD_NOT_ALLOWED
  re-validation against allowed_methods.
- ensureColumns + migration 100: compliance_orders.stripe_subscription_id,
  bundle_upsell_sent_at (+ subscription index).

Webhooks (api/src/routes/webhooks.ts):
- record stripe_subscription_id on checkout.session.completed (subscription mode).
- invoice.paid (subscription_cycle only) -> re-dispatch screening for the cycle;
  invoice.payment_failed -> admin alert + first-failure customer nudge;
  customer.subscription.deleted -> mark order cancelled. (API 2026-03-25 moved the
  subscription link to invoice.parent.subscription_details.subscription.)

Fulfillment:
- job_server.py: pass recurring_cycle/invoice_id into the order.
- npi_provider.py: OIG handler labels renewal cycles "[Monthly cycle]" + re-screen
  note; bundle action runs only the FIRST screening + flags the $79/mo upsell.

Bundle land-and-expand:
- Provider Compliance Bundle now includes only the first OIG/SAM screening (was
  giving away $948/yr of monitoring inside an $899 bundle).
- new worker scripts/workers/bundle_upsell.py (+ pw-bundle-upsell timer): ~3 weeks
  after a paid bundle, emails the customer to continue $79/mo monitoring; dedup via
  bundle_upsell_sent_at; skips customers who already have an OIG/SAM order.

Surfaces updated to $79/mo: PaymentStep (filters methods, "Billed every month,
cancel anytime"), order pages, healthcare index, npi-compliance-check tool (also
fixed stale $699 bundle drift -> $899), hc_oig_screening + hc_compliance_bundle
emails.

Docs: billing.md gains a "Stripe-native Subscriptions" section + a reality-check
banner (Adyen/ERPNext-gateway model documented there is NOT live; Stripe is the
real rail). Fixed run-migrations.yml container name bug
(performancewest-postgres-1 -> performancewest-api-postgres-1, overridable).

Tests: api/tests/recurring-subscription.test.ts (28 assertions) covers catalog
gating, method validation, surcharge suppression, recurring line-item build,
invoiceSubscriptionId extraction, renewal-cycle gating. tsc clean; site build
clean; catalog drift OK.

Manual deploy step: enable invoice.paid, invoice.payment_failed,
customer.subscription.deleted on the Stripe webhook endpoint.
2026-06-18 07:54:38 -05:00
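Sketch of the invoice handling described in cf021e2f91: read the subscription link from invoice.parent.subscription_details.subscription, and re-dispatch screening only for subscription_cycle invoices. The real handler lives in api/src/routes/webhooks.ts (TypeScript); this Python version works on plain dicts, the function names are hypothetical, and the fallback to a top-level subscription field for older payloads is an assumption.

```python
from typing import Optional


def invoice_subscription_id(invoice: dict) -> Optional[str]:
    """Return the subscription ID linked to an invoice payload, if any."""
    parent = invoice.get("parent") or {}
    details = parent.get("subscription_details") or {}
    # Assumption: older payloads may still carry a top-level 'subscription'.
    sub = details.get("subscription") or invoice.get("subscription")
    if isinstance(sub, dict):  # expanded object instead of a bare ID
        return sub.get("id")
    return sub


def is_renewal_cycle(invoice: dict) -> bool:
    """True only for recurring-cycle invoices, not the first checkout invoice."""
    return invoice.get("billing_reason") == "subscription_cycle"


def should_rescreen(invoice: dict) -> bool:
    """Re-dispatch screening only for a renewal invoice tied to a subscription."""
    return is_renewal_cycle(invoice) and invoice_subscription_id(invoice) is not None
```

Gating on billing_reason keeps the first invoice (created at checkout) from triggering a second screening on top of the one started by checkout.session.completed.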
justin
a04ecf7df3 chore(email): decommission SMTP2GO references — local MTA only
SMTP2GO is no longer used: Listmonk relays through the local Postfix MTA
(172.18.0.1:25 from the Docker network), which DKIM-signs and delivers
direct-to-recipient-MX; transactional mail goes through Carbonio. Verified
zero smtp2go in any live container env + postfix has no external relayhost.

Removed the stale references so a rebuild/new dev can't re-introduce it:
- api/src/config.ts: SMTP_HOST default mail.smtp2go.com -> co.carrierone.com
- scripts/workers/crypto_payment_worker.py: same default fix
- infra/ansible all.yml: listmonk_smtp_* now 172.18.0.1:25, no auth (+comment)
- app.env.j2 / email.ts / crm.md / go-live-todo.md / architecture.svg: docs
2026-06-17 22:46:59 -05:00
justin
899b880e7f trucking: weekly FMCSA source refresh so new non-compliant carriers are caught
The FMCSA census was a one-time snapshot (last loaded ~May 30) with NO refresh
timer -- carriers newly falling out of MCS-150/UCR compliance were never picked
up. New scripts/workers/fmcsa_source_refresh.py orchestrates the full pipeline
(census download -> enrichment -> deficiency flag -> verify new emails ->
MX-tag new) and runs weekly via cron pw-fmcsa-refresh (Sun 09:00 UTC), codified
in the mail-pipeline Ansible role.

Idempotent + incremental: the census upsert preserves email_verified /
listmonk_sent_at / deficiency_flags, so existing carriers keep their send state
and only census fields refresh; new DOTs flow into verification then campaigns.
A carrier who refiled gets a fresh mcs150_parsed, so the builder's overdue
WHERE clause stops targeting them automatically. Verify is capped per run
(20k) so it never stalls on millions of rows.

(Healthcare already auto-catches newly-revalidation-overdue providers within
its 63k institutional pool via pw-hc-refresh Mon/Wed/Fri.)
2026-06-17 20:44:54 -05:00
justin
4dc5690666 infra: codify the email-campaign pipeline in Ansible (new mail-pipeline role)
The entire outbound campaign pipeline lived ONLY on the host and was never in
IaC -- a fresh rebuild would have silently shipped NO campaigns, NO IP warmup/
ramp, and NO bounce processing. New mail-pipeline role + deploy-mail-pipeline.yml
playbook deploy it from the canonical repo copies:

  cron.d (infra/cron/):
    - pw-trucking-campaign-builder, pw-ifta-campaign, pw-ucr-campaign
    - pw-hc-campaign, pw-hc-nppes, pw-hc-refresh
    - pw-mta-warmup, pw-listmonk-rampcap, pw-hc-rampcap
    - pw-ip-rehab, pw-warmup-tg-alert
  helper scripts (-> /usr/local/bin):
    - pw-mta-warmup, pw-listmonk-rampcap, pw-hc-rampcap, pw-warmup-tg-alert
    - postfix-bounce-notify.sh, postfix-hc-bounce-notify.sh, listmonk-bounce-sync.py
  systemd services:
    - pw-bounce-watcher.service (was missing from repo), pw-hc-bounce-watcher.service

Also creates the deploy-owned {{project_dir}}/logs dir (deploy can't write
/var/log, so a missing dir made cron redirects fail). Added the 6 cron.d files
that existed only on the host, the trucking bounce-watcher unit, and synced
infra/cron/pw-hc-refresh to the live version (revalidation download + enrich
steps). Role wired into site.yml after the mail (OpenDKIM) role.

Part of the email-deliverability incident hardening.
2026-06-17 20:26:01 -05:00
justin
2e4388a803 mail: add logrotate for Postfix mail.log (postlogd copytruncate)
mail.log had no logrotate rule and grew unbounded to ~1GB (~150MB/day)
since Jun 8. This host logs via Postfix's built-in postlogd (maillog_file
mode), not rsyslog (no rsyslog.service exists), so postlogd holds the file
open -- a plain rename+create would leave it writing to the stale inode.
Use copytruncate (no daemon signal needed). Rotate daily, keep 14 days
compressed. Applied live: forced first rotation, compressed the 1GB
archive (->99MB), verified logging + bounce watchers + DKIM signing intact.

Part of the email-deliverability incident hardening (follows DKIM fix 4d59019).
2026-06-17 19:47:13 -05:00
justin
4d5901921e mail: fix OpenDKIM not signing campaign mail (Docker-injected) + codify in Ansible
Root cause of the Jun 2026 deliverability collapse / 'no new sales':
opendkim.conf was in single-key mode with no InternalHosts, so it signed only
127.0.0.1. Transactional/cron mail (injected locally) was signed, but ALL
campaign mail -- injected over the Docker bridge from the Listmonk containers
(172.18.0.5 trucking, 172.18.0.25 healthcare) -- went out UNSIGNED. Gmail/Yahoo
have required DKIM on bulk mail since Feb 2024, so cold campaigns were junked/blocked
(~23% delivery, 550-5.7.1). Proof: 2,620 campaign msgs that day, 0 DKIM sigs.

The correct table files already existed on the server but were never wired into
opendkim.conf. Fix points the daemon at key.table/signing.table and sets
InternalHosts/ExternalIgnoreList to trusted.hosts (which includes 172.16.0.0/12,
the Docker subnet). Fixes BOTH streams: HC submission ports 2526-2528 inherit
the global smtpd_milters and *@performancewest.net covers compliance@.

Verified by injecting from a Docker IP through port 25 and port 2526 -- both now
get 'DKIM-Signature field added'. Codified as new Ansible role 'mail' so it
can't silently regress (OpenDKIM was previously not in IaC at all).
2026-06-17 19:31:19 -05:00
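Quick check of the InternalHosts reasoning in 4d5901921e: both Listmonk container IPs fall inside 172.16.0.0/12, while the old single-key setup effectively trusted only 127.0.0.1. The TRUSTED list below is an assumption standing in for trusted.hosts; only the 172.16.0.0/12 entry is confirmed by the commit message.

```python
import ipaddress

# Assumed stand-in for trusted.hosts; only 172.16.0.0/12 is confirmed.
TRUSTED = [
    ipaddress.ip_network("127.0.0.1/32"),
    ipaddress.ip_network("172.16.0.0/12"),
]


def is_internal(ip: str, networks=TRUSTED) -> bool:
    """True if mail injected from `ip` would count as internal (and get signed)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```

172.16.0.0/12 spans 172.16.0.0 through 172.31.255.255, so the default Docker bridge ranges are covered but 172.32.x.x is not.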
justin
01b3e1d234 chore(env): scaffold ISA_SC_DMS_USER/PASS for SC PSC MyDMS e-file portal
Non-attorney 'Service' filer account registered under Performance West
(filings@performancewest.net). Credentials live only in the server .env
(blank default in template, never committed). Consumed by the upcoming SC
intrastate Playwright e-filer.
2026-06-16 08:19:17 -05:00
justin
c27cfd3242 docs(crons): note IRP invoice poller now also handles intrastate [PW-ISA] replies 2026-06-16 07:59:38 -05:00
justin
b125d46663 feat(intrastate): automate state PUC/PSC authority filing (email + invoice + auto-bill)
Intrastate operating authority is state-specific + application-based like IRP, so
it reuses the same email/POA + invoice-reconciliation flow:
  - intrastate_filing.send_intrastate_submission: emails the state PSC/PUC the
    authority application with the signed POA attached (subject tag [PW-ISA CO-..]),
    reusing irp_filing's MinIO download + census enrich helpers.
  - The shared poller (irp_invoice_poller) now matches BOTH [PW-IRP] and [PW-ISA]
    tags, parses the fee, Telegram-alerts, and bills the customer the exact amount
    with the correct service slug.
  - state_trucking gov-fee gate routes intrastate-authority to the PSC/PUC email
    path; if no submission email is configured for the base state it falls back
    to a manual todo (safe default — no emailing guessed agency addresses).

Per-state ISA_<ST>_EMAIL env (blank until the exact agency address is verified).
SC/GA/TX scaffolded. Customer still only sees an exact-fee payment link; you only
approve the final filing.
2026-06-16 07:57:57 -05:00
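Sketch of how a shared poller could match both subject tags from b125d46663. This is not the irp_invoice_poller code; the regex, the function name, and the shape of the order reference after "CO-" are assumptions.

```python
import re
from typing import Optional, Tuple

# Assumption: the tag is "[PW-IRP CO-<ref>]" or "[PW-ISA CO-<ref>]",
# where <ref> has no whitespace or closing bracket.
TAG_RE = re.compile(r"\[PW-(IRP|ISA)\s+(CO-[^\]\s]+)\]")


def parse_tag(subject: str) -> Optional[Tuple[str, str]]:
    """Return (kind, order_ref) from a reply subject, or None if untagged."""
    match = TAG_RE.search(subject)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Using search rather than match lets the tag survive "Re:" / "FW:" prefixes that agencies add when replying.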
justin
ea695d6828 feat(govfee): exact fees + agency processing fees; IRP email/invoice reconciliation
- gov_fee: add AGENCY_PROCESSING_FEE (per-service card/convenience fee passed
  through so the customer pays the true all-in cost); estimate_gov_fee now folds
  it into the billed total. IFTA/intrastate/UCR fees are published/near-exact.

- IRP fees can't be looked up — only the base state computes them. New
  irp_filing.py: emails the base-state IRP unit a Schedule A/B request (Reply-To
  the IRP filings mailbox, [PW-IRP CO-...] subject tag), and a 15-min cron
  (irp_invoice_poller) scans the mailbox for the state's invoice reply, parses
  the exact apportioned fee, Telegram-alerts you, and bills the customer the
  EXACT amount via a gov-fee child order + payment link. Then it proceeds to
  ready_to_file for your final approval.

- state_trucking gov-fee gate now routes IRP to the email/invoice path and
  IFTA/intrastate to immediate exact-fee billing.

- Mailbox is configurable (IRP_FILINGS_IMAP_* in app.env.j2); falls back to
  OPS_IMAP_* filtered by the [PW-IRP] tag until a dedicated mailbox exists.

Telegram alerts fire on IRP submission sent, invoice received (billed), and
un-parseable replies (so you can read + enter the fee manually).
2026-06-16 04:58:14 -05:00
justin
d65f5ea279 nginx: stop blocking /admin (bot-scan rule matched our own dashboard)
The shared security snippet blocked any path matching /(admin|administrator|
login.action|struts) with 'return 444', which drops the connection. That bare
'admin' token also matched our own operations dashboard at /admin and the new
/admin/compliance-orders, so the browser showed 'This site can't be reached'.
Dropped the bare 'admin' token; administrator/login.action/struts stay blocked.
Applied live on prod (sudo edit + nginx reload); this updates the source of
truth so the ansible nginx role won't reintroduce it.
2026-06-16 00:05:54 -05:00
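Illustration of the over-match fixed in d65f5ea279, using Python's re as a stand-in for nginx's PCRE. The alternation is copied from the commit message; anchoring and flags of the real snippet are not shown in the source, so treat the patterns as an approximation.

```python
import re

# Alternation as quoted in the commit message (before and after the fix).
OLD_RULE = re.compile(r"/(admin|administrator|login.action|struts)")
NEW_RULE = re.compile(r"/(administrator|login.action|struts)")


def blocked(rule: re.Pattern, path: str) -> bool:
    """True if the bot-scan rule would drop a request for `path`."""
    return rule.search(path) is not None
```

With the bare admin token, /admin and /admin/compliance-orders both match; without it they pass, while /administrator still matches through its own alternative.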
justin
2caab6aa69 hc: warmup must run DAILY for the full 21-day ramp (not weekdays-only)
The HC warmup crons were '* * 1-5' (Mon-Fri), silently skipping weekends -- but a
proper warmup needs CONTINUOUS daily volume for 21 days (mailbox providers reward
consistency; gaps stall reputation). The Jun 14 'HC 0 sent' alert was just a
skipped Sunday, but the weekend skips also broke ramp continuity.

- pw-hc-campaign + pw-hc-nppes: '* * 1-5' -> '* * *' (daily), vendored + applied live.
- Re-aligned the warmup start stamp from calendar-day 9 to send-day 5 so the
  volume ramp matches reputation actually built (it had skipped ~4 weekend days,
  running the ramp ahead of real history).
- Fixed the stale 'Mon-Fri only' comment in daily_slice().
- Vendored nppes cron now carries the enriched-CSV + 4-segment config.
2026-06-14 21:02:08 -05:00
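The calendar-day vs send-day gap from 2caab6aa69, worked through. The start date 2026-06-06 is an assumption, chosen so that Jun 14 is calendar-day 9; the function names are hypothetical.

```python
from datetime import date, timedelta


def calendar_day(start: date, today: date) -> int:
    """Day number counting every calendar day, start day = 1."""
    return (today - start).days + 1


def send_day(start: date, today: date) -> int:
    """Day number counting only Mon-Fri, i.e. days the old cron actually ran."""
    span = (today - start).days + 1
    return sum(
        1 for offset in range(span)
        if (start + timedelta(days=offset)).weekday() < 5
    )


START = date(2026, 6, 6)   # assumed warmup start (a Saturday)
TODAY = date(2026, 6, 14)  # the skipped Sunday from the alert
```

Under that assumed start, Jun 6-14 covers nine calendar days but only five weekdays, which matches the day-9 to day-5 re-alignment.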
justin
dd4ed3ea38 warmup: ROLL BACK main pool to 200/h after Gmail spam-blocked IPs at 400/h
Day 9 (2026-06-13) alert: main pool 54% delivery, 202 Gmail spam-blocks
(550-5.7.1 'Gmail has detected') on warming IPs .94-.98. The 4k/day (400/h)
ramp was too aggressive AND the trucking pool lacks the per-MX throttling the HC
pool got -- Google-Workspace-hosted business domains (weberfarms.net, uatruck.com,
etc.) concentrated and Gmail blocked us. Held at 200/h (~2k/day) through day 20 to
recover, then slow step to 300/h. Applied live (cap already set to 200/h).
2026-06-13 20:10:13 -05:00
justin
ff4ab262a8 hc: cron to feed NPPES institutional base (63k verified) into warmup, MX-throttled
Adds /etc/cron.d/pw-hc-nppes (weekdays 07:30) that imports the verified NPPES
institutional general-compliance base into the OIG screening segment, throttled
per MX operator. Separate from the 07:00 reval-segment run so the two pipelines
stay independent. Vendored the cron file under infra/cron/.
2026-06-12 22:11:12 -05:00
justin
887bf9a14a warmup: grow main (trucking) pool faster -- 3k -> 4k/day now, 5k at day 14
The main sending IPs are cleanly warmed: today 3,845 sent at 0.18% bounce,
ZERO deferrals, ZERO ISP rate-limit/blocklist/Spamhaus hits. The script's own
note records these IPs historically sustained ~2,500/day at 68-76% delivery;
collapses only ever came from 17k-29k spikes. So we have ample headroom to
accelerate the trucking ramp safely:
  day 7-13: 300/h -> 400/h (~4,000/day)   [applied now, day 8]
  day 14+:  new    500/h    (~5,000/day)   [hard ceiling, well under ~17k]

Also vendored pw-listmonk-rampcap into the repo (infra/postfix/) -- it
previously lived only on the server at /usr/local/bin. Live script updated and
applied (listmonk cap now 400/h).
2026-06-11 00:13:41 -05:00
justin
c8a0824143 firewall: allow ezstorehost (207.174.124.51) to reach Forgejo SSH
Add ezstorehost to trusted_admin in both layers — the nft input set and
the DOCKER-USER iptables chain (Forgejo is containerised; DNAT means the
post-DNAT dport 22 rule applies). Required for static-tenant deploys from
ezStorehost-infra to clone repos over ssh://.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 22:45:43 -05:00
justin
1854753c70 monitoring: add .91-.93 IP rehab to daily Telegram warmup alert
Tracks the rehab pool (rehab02-04 / .91-.93) delivery + bounce + Spamhaus ZEN
DNSBL status in the daily report and alert body. Alerts only if a rehab IP lands
on a DNSBL or rehab delivery drops <40% with real volume (recipient quality
slipped) -- a recovering IP naturally bounces more so the threshold is lenient.
2026-06-09 20:34:41 -05:00
justin
25f4a7503b warmup: IP rehab for .91-.93 so they can be reallocated
The 3 IPs (mta02-04 / .91-.93) retired after the May 30-31 over-volume blast are
NOT on any DNSBL (Spamhaus/Barracuda/SpamCop/SORBS all clean) and have clean PTRs
+ SPF/DKIM/DMARC -- the damage was provider-internal reputation, which recovers
with slow clean sending. scripts/ip_rehab.py sends a tiny ramping trickle
(10/IP/day -> cap 60) of genuine CAN-SPAM-compliant compliance check-in mail to
clean business-domain, never-bounced recipients via dedicated heavily-throttled
postfix transports rehab02/03/04 (30s/msg, bound to .91/.92/.93). Routing uses an
X-PW-Rehab-IP header + header_checks FILTER to override the transport_maps randmap
warmup rotation (verified: mail routes via rehab transports, status=sent). Daily
cron pw-ip-rehab. After ~2-3 weeks of clean sending the IPs can be reallocated.
2026-06-09 20:27:47 -05:00
justin
9fa2c86f01 fix(warmup): HC cron logged to /var/log (deploy can't write) -> cron silently died
The HC warmup builder ran from cron at 07:00 but the >> /var/log/pw-hc-campaign.log
redirect failed (deploy user cannot write /var/log), and a failed output redirect
makes cron's shell abort the command BEFORE it runs -> HC sent 0/day since the log file was
removed. Route HC cron logs to /opt/performancewest/logs/ (deploy-owned) so the
redirect always succeeds. Builder itself was fine (verified: imports + sends work,
0 bounces). Also removed the stale 'campaign-warmup.sh 122' root-cron line, which
pointed at a finished campaign and a script that no longer existed.
2026-06-09 16:06:28 -05:00
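Reproduction of the failure mode in 9fa2c86f01: when sh cannot open the redirect target, the command itself never runs. A missing directory stands in for the unwritable /var/log here (a permission test would behave differently when run as root); the function name is hypothetical.

```python
import shlex
import subprocess


def run_like_cron(command: str, log_path: str) -> int:
    """Run `command >> log_path 2>&1` through sh, the way a cron.d line would."""
    line = f"{command} >> {shlex.quote(log_path)} 2>&1"
    # sh's own "cannot create" message goes to its stderr; silence it here.
    return subprocess.run(["sh", "-c", line], stderr=subprocess.DEVNULL).returncode
```

Usage: pass a command with a visible side effect (for example `touch /tmp/marker`) and a log path in a directory that does not exist. The exit status is non-zero and the marker is never created, because the shell sets up redirections before it executes the command.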
justin
9b9d317916 infra/k8s: shkeeper liveness+readiness probes (fix recurring crypto.performancewest.net downtime)
crypto.performancewest.net kept going down because the shkeeper-deployment web
pod periodically HANGS (HTTP server deadlocks while the apscheduler background
thread keeps the process alive). The helm chart (shkeeper-1.7.15) ships NO
liveness or readiness probe, so k8s saw the hung pod as Running and never
restarted it, and kept routing traffic to the dead backend -> site down until a
manual restart.

Added HTTP probes on / :5000 (302 = healthy): liveness auto-restarts a hung pod,
readiness pulls it from the Service endpoints. Applied live via kubectl patch
(chart does not expose probes via values; re-apply after any helm upgrade --
command in the file header). Verified: new pod comes up READY 1/1 (probe passes)
and crypto.performancewest.net serves 302 again.
2026-06-09 04:57:50 -05:00
justin
7c39a858cc monitoring: daily warmup IP-reputation Telegram alert
End-of-day (20:00 Central) check of campaign deliverability across both sending
pools (main out05-09 + healthcare hcout). Sends a Telegram alert ONLY when there
is a reputation problem -- delivery below 65% or a spam/policy-block (550-5.7.1)
spike above 150/day -- so healthy days stay silent. Reuses the existing
TELEGRAM_BOT_TOKEN/CHAT_ID from /opt/performancewest/.env. Logs every run to
/var/log/pw-warmup-healthcheck.log for history. Excludes internal/probe noise so
the delivery figure reflects real external recipients.
2026-06-08 21:06:41 -05:00
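Sketch of the alert rule in 7c39a858cc: alert only when delivery is below 65% or 550-5.7.1 blocks exceed 150/day. Not the deployed script; the function name and the treatment of a zero-send day (no alert) are assumptions.

```python
def should_alert(sent: int, delivered: int, policy_blocks: int,
                 min_delivery: float = 0.65, max_blocks: int = 150) -> bool:
    """True when the day's numbers indicate a reputation problem."""
    if sent == 0:
        # Assumption: a day with no external volume is not a reputation signal.
        return False
    delivery_rate = delivered / sent
    return delivery_rate < min_delivery or policy_blocks > max_blocks
```

A day at 54% delivery with 202 blocks trips both conditions; a healthy day stays silent.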
justin
2156a5e05f hc refresh: run Mon/Wed/Fri instead of weekly to shrink CMS data-lag
The 'already revalidated' replies come from the CMS data-lag window (a provider
completes their revalidation but CMS's public Due Date List still shows them
overdue for weeks). Running the refresh 3x/week instead of weekly shrinks that
window from up to 7 days to ~2-3, so a provider who just completed stops being
targeted faster. No change to the overdue window or audience size -- this is the
lever that reduces stale-data complaints without losing prospects.
2026-06-08 10:53:36 -05:00
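The cadence arithmetic from 2156a5e05f: the longest wait between refreshes for a weekly schedule versus Mon/Wed/Fri. Weekdays use Python's numbering (Mon=0); the function name is hypothetical.

```python
def max_gap_days(run_weekdays: list[int]) -> int:
    """Longest number of days between consecutive runs in a repeating week."""
    days = sorted(set(run_weekdays))
    gaps = []
    for i, day in enumerate(days):
        nxt = days[(i + 1) % len(days)]
        # Wrap around the week; a single weekly run is 7 days from itself.
        gaps.append((nxt - day) % 7 or 7)
    return max(gaps)


weekly = max_gap_days([0])           # Monday only
three_x = max_gap_days([0, 2, 4])    # Mon/Wed/Fri
```

Mon/Wed/Fri gives gaps of 2, 2, and 3 days (Fri to Mon), so the worst case drops from 7 days to 3.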
justin
9cb10b18e0 feat(hc): deliverability prune -- evict newly-Google-hosted subscribers
Belt-and-suspenders for the edge case you flagged: a domain already in a warmup list
could flip its MX to Google Workspace between weekly refreshes, after which it
would hard-bounce from the cold IP. The import-time guard only catches NEW adds.

- prune_holdouts(): enumerates each warmup list's subscribers, matches them
  against the FRESH master CSV (re-classified weekly), and removes any whose
  domain is now Google-hosted. DELIVERABILITY-ONLY -- it never evicts for
  audience reasons (an overdue provider drifting out of the 1-90 day window was
  a valid target when warmed; re-litigating that just wastes warmup progress).
- --prune (run alongside warming) and --prune-only (prune then exit).
- Wired into the weekly refresh cron as a --prune-only chained step, so MX is
  re-checked and holdouts removed every Monday before the weekday sends.

Verified end-to-end: with no Google domains in lists it's a 0-op; injecting a
simulated Google-flipped domain into the master, the prune correctly detects and
(in a real run) would remove it from every list it's on.
2026-06-08 03:39:56 -05:00
justin
feb677f6ce fix(hc warmup): only mail slightly-overdue providers (deliverability)
Mailing heavily-overdue NPIs (months/years past due) risks hitting practices
that have closed, merged, or abandoned the inbox -> hard bounces, which are the
fastest way to wreck a warming IP's reputation. The warmup now restricts the
reval_overdue selector to an inclusive [HC_OVERDUE_MIN, HC_OVERDUE_MAX] window
(default 1-90 days) and the OIG 'any' selector likewise excludes heavily-overdue
and dropped-off-list rows. On the current cohort this trims the overdue audience
178->96 and the OIG audience 399->317, holding out the stale long tail
(181-365d + 366d+). upcoming/active providers are unaffected.
2026-06-08 03:27:22 -05:00
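Sketch of the inclusive overdue window from feb677f6ce. HC_OVERDUE_MIN / HC_OVERDUE_MAX are named in the commit; how the real selector reads them is not shown, so plain module constants with the documented defaults are used here, and the helper names are hypothetical.

```python
HC_OVERDUE_MIN = 1   # default from the commit message
HC_OVERDUE_MAX = 90  # default from the commit message


def in_overdue_window(days_overdue: int,
                      lo: int = HC_OVERDUE_MIN,
                      hi: int = HC_OVERDUE_MAX) -> bool:
    """Inclusive on both ends: 1 and 90 are mailed, 0 and 91 are not."""
    return lo <= days_overdue <= hi


def select_overdue(rows: list[dict]) -> list[dict]:
    """Keep rows whose (hypothetical) 'days_overdue' field is inside the window."""
    return [r for r in rows if in_overdue_window(r["days_overdue"])]
```

Anything past the upper bound, such as the 181-365d and 366d+ tail, is held out rather than deleted, so it can be revisited once the IPs are warm.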
justin
167c4a3847 infra/cron: multi-segment hc warmup + weekly data-refresh cron
Tracks the deployed cron.d files in the repo:
- pw-hc-campaign: updated comment to reflect the now multi-segment warmup
  (revalidation + OIG + NPPES + reactivation + bundle); command unchanged.
- pw-hc-refresh (NEW): Mon 06:00 Central weekly data refresh, ~1h before the
  07:00 weekday send, so every send uses fresh CMS/OIG status.
2026-06-08 03:15:47 -05:00
justin
138fec17e9 healthcare: daily batched paper-filing fulfillment
Standard (no-login) CMS filings are mailed in one Priority Mail envelope per
destination agency, batched each postal working-day morning to save postage.

- migration 089: paper_filing_batches table + esign_records.paper_batch_id /
  filing_destination_key (idempotent: a filing is batched at most once).
- batch_cover_sheet.py: per-agency cover sheet (sender/dest/date/manifest) +
  merged print-job PDF (cover + all enclosed signed filings).
- daily_paper_batch.py worker: gather signed+unbatched cms855/cms10114 filings,
  group by destination (MAC by state via mac_routing; Fargo for CMS-10114),
  build cover+merged PDF per agency, persist batch, mark filings batched.
  Self-gates on postal working days (skips weekends + federal/USPS holidays).
  Phase 1 = human prints+mails; phase 2 = wire print-mail API.
- worker-crons: pw-paper-batch systemd timer (Mon-Fri 13:30 UTC, self-gated).
- test_paper_batch.py: 15/15 pass (working-day gating, routing, cover+merge).
2026-06-07 00:30:01 -05:00
justin
bf4e8c2277 infra: MTA-STS HTTPS vhost (cert issued, policy live) 2026-06-06 21:03:30 -05:00
justin
34daa0c1d3 infra: MTA-STS status note - cert pending stable HE.net DNS propagation 2026-06-06 19:37:37 -05:00
justin
7bd2f70de4 infra: MTA-STS policy + vhost + README (cert pending DNS propagation) 2026-06-06 19:36:27 -05:00
justin
4233c90a4f hc email: reframe value-add to 'No 2FA. No government portals.' (we have a portal; the pain is CMS 2FA/identity-proofing); cron creates fresh dated campaign when prior is finished; add hc bounce watcher (Postfix->listmonk-hc webhook, hard/complaint->blocklist) 2026-06-06 16:47:12 -05:00
justin
6738a335af infra: nginx vhost for listmonk-hc admin portal (lists-hc.performancewest.net -> 127.0.0.1:9101, LE cert) 2026-06-06 07:02:50 -05:00
justin
95698852ce healthcare warmup: gate Google/Workspace domains out of week 1 (they hard-reject cold IPs 550-5.7.1); send 501 non-Google practice domains first, defer 222 Google to week 2-3; cron uses hc_warmup_nongoogle.csv 2026-06-06 04:02:00 -05:00
justin
2bc86268f7 healthcare: HC warmup campaign cron (Mon-Fri 7AM Central) - imports overdue-first verified slice into listmonk-hc + runs Medicare-revalidation campaign via hc HOT stream; rate-throttled by pw-hc-rampcap 2026-06-06 03:57:08 -05:00
justin
695c3e2431 security: drop all CBC TLS suites (Qualys WEAK -> AEAD-only, still A+); sync ansible nginx templates (ciphers + ywxi CSP); capture host firewall as IaC 2026-06-06 00:49:21 -05:00
justin
90d8b94f3f feat(email): wire listmonk-hc into deploy + dev override + hc ramp-cap
- deploy.sh/deploy-dev.sh: bring up listmonk-hc (upstream image, excluded from
  build); document the one-time listmonk_hc DB create + --install.
- docker-compose.dev.override.yml: dev-only override (committed) that drops the
  prod host-port bindings and pins dev's own postgres volume (dev-pgdata) via
  compose !override tags. deploy-dev ships it as docker-compose.override.yml so
  syncing the canonical compose to the shared host no longer breaks dev's
  api-postgres (port :5432 clash + volume switch). Discovered + fixed while
  validating listmonk-hc on dev.
- pw-hc-rampcap.sh: healthcare analogue of pw-listmonk-rampcap, ramps the
  listmonk_hc cap 100->1000/h off /etc/postfix/hc-warmup-start, fully
  independent of the trucking ramp/cap.
2026-06-05 19:19:45 -05:00
justin
70d742df08 feat(mta): healthcare HOT-stream Postfix setup (dedicated hc IPs, isolated)
Adds 3 hc submission ports (2526/2527/2528) in the single Postfix instance,
each content_filter'd onto a dedicated hc transport (hcout1/2/3) binding the
hc IPs .107/.108/.109 with hc HELO identity (hcmta01-03) and hotter concurrency.
listmonk-hc round-robins the 3 ports.

Discovered + documented the constraint that drove this shape: transport_maps
randmap is owned by the shared trivial-rewrite(8) and is global, so neither a
per-smtpd -o transport_maps nor a FILTER randmap:{...} can scope a separate IP
pool (FILTER parses randmap as a literal transport). content_filter=hcoutN:
(empty nexthop) overrides transport_maps and keeps the real recipient domain.

Verified end-to-end on the server: :2527 -> hcout2 (.108) -> real gmail MX;
trucking transport_maps (.94-.96) untouched. Idempotent, postfix-check gated
with auto-rollback.
2026-06-05 19:07:02 -05:00
justin
a79d6b1906 feat(healthcare): add gost proxy-relay so Chromium can use the residential proxy
Chromium rejects authenticated SOCKS5 ('Browser does not support socks5 proxy
authentication'). Add a gost (ginuerzh/gost:2.11.5) 'proxy-relay' sidecar that
listens unauthenticated on socks5://proxy-relay:11080 and forwards to the
authenticated residential upstream (HEALTHCARE_PROXY_UPSTREAM_URL). Workers point
Playwright at the relay via HEALTHCARE_PROXY_URL=socks5://proxy-relay:11080.

env template: split into HEALTHCARE_PROXY_UPSTREAM_URL (authenticated, password
percent-encoded so '#' -> %23) and HEALTHCARE_PROXY_URL (the relay address).

Validated end-to-end on dev: workers Chromium -> proxy-relay -> residential
egress IP 76.228.206.147; NPPES + PECOS both HTTP 200.
2026-06-05 18:39:26 -05:00
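Why the upstream password must be percent-encoded (a79d6b1906): an unescaped '#' starts the URL fragment, so everything after it is lost from the authority part. A minimal sketch with hypothetical credentials and host; the function name is an assumption.

```python
from urllib.parse import quote, unquote, urlsplit


def upstream_url(user: str, password: str, host: str, port: int) -> str:
    """Build an authenticated SOCKS5 URL with credentials percent-encoded."""
    return (
        f"socks5://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}"
    )


# Hypothetical values for illustration only.
url = upstream_url("user", "p#ss", "proxy.example", 1080)
parts = urlsplit(url)
recovered_password = unquote(parts.password)
```

With the encoded form, the parser sees the intended host and the password round-trips through unquote.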
justin
17318f6e7d feat(healthcare): route NPPES/PECOS Playwright flows through residential SOCKS proxy
CMS healthcare portals (NPPES, PECOS, I&A) block datacenter IPs, so the
healthcare browser automation needs to egress via the residential proxy on
hg409y7ez04.sn.mynetname.net (username 'performancewest').

- undetected_browser: use_proxy now accepts an env-var name, so callers can
  select a domain-specific proxy. _proxy_config(proxy_env) reads it and falls
  back to UNDETECTED_PROXY_URL. Healthcare uses 'HEALTHCARE_PROXY_URL'.
- probe_npi_undetected: launches with use_proxy='HEALTHCARE_PROXY_URL' when set.
- npi_provider: documents that the (future) automated NPPES/PECOS flows must
  use the healthcare proxy.
- Plumb HEALTHCARE_PROXY_URL (+ UNDETECTED_PROXY_URL fallback) through the
  ansible env template and docker-compose workers env.

The credential itself is NOT in the repo. Set the full URL in the ansible
vault as vault_healthcare_proxy_url:
  socks5://performancewest:<password>@hg409y7ez04.sn.mynetname.net:<port>
Verified parsing + Playwright proxy-dict wiring with a unit test.
2026-06-05 14:36:01 -05:00
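Sketch of the env-var selection with fallback described in 17318f6e7d, producing the server/username/password dict that Playwright's proxy option accepts. This is not the undetected_browser code; the signature (including the injectable environ argument) is an assumption made for testability.

```python
import os
from typing import Mapping, Optional
from urllib.parse import unquote, urlsplit


def proxy_config(proxy_env: str = "UNDETECTED_PROXY_URL",
                 environ: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Read the named proxy URL, falling back to UNDETECTED_PROXY_URL."""
    env = os.environ if environ is None else environ
    url = env.get(proxy_env) or env.get("UNDETECTED_PROXY_URL")
    if not url:
        return None  # no proxy configured: launch direct
    parts = urlsplit(url)
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port:
        server += f":{parts.port}"
    config = {"server": server}
    if parts.username:
        config["username"] = unquote(parts.username)
    if parts.password:
        config["password"] = unquote(parts.password)
    return config
```

A caller would pass proxy_env="HEALTHCARE_PROXY_URL" for the CMS flows and leave the default for everything else.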
justin
c027d49f43 Fix trucking campaign cron send date 2026-06-04 03:19:35 -05:00
justin
b48fc3a406 Retire burned MTA IPs in warmup script 2026-06-03 23:37:27 -05:00
justin
5c35140a22 Configure trucking deficiency campaign cron env 2026-06-03 23:04:41 -05:00
justin
6d4c323ab6 feat: daily intake-reminder worker for paid orders with incomplete intake
Adds a systemd-timed worker that nudges customers who paid but never completed
their intake form (which stalls fulfillment).

- migration 087: intake_reminder_count + intake_reminder_last_at on
  compliance_orders (makes the daily run idempotent and bounded), plus a
  partial index for the paid-order eligibility scan.
- scripts/workers/intake_reminder.py: each run emails any paid order with
  intake_data_validated != TRUE, capped at 10 reminders/order, at most one
  consolidated email per customer per day (groups a customer's incomplete
  services into one email). Reuses the post-payment intake URL format
  (/order/{slug}?order={n}) and the API's email validation, skipping
  placeholder/invalid addresses (synthetic@, pipeline.com, etc.). Sends via
  smtplib with SMTP_PASS (verified working in the worker container).
- worker-crons: pw-intake-reminder timer, daily ~noon ET (16:00 UTC).
2026-06-03 00:20:37 -05:00
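Sketch of the bounding rules in 6d4c323ab6: skip validated orders, stop at 10 reminders per order, and send at most one consolidated email per customer per day. Not the intake_reminder.py code; the 'email' and 'slug' keys and the function name are hypothetical, and the input is assumed to be already filtered to paid orders.

```python
from collections import defaultdict
from datetime import date

MAX_REMINDERS = 10  # per-order cap from the commit message


def plan_reminders(orders: list[dict], today: date) -> dict[str, list[str]]:
    """Map customer email -> service slugs to mention in today's single email."""
    # Customers already emailed today get nothing more, whichever order it was for.
    emailed_today = {
        o["email"] for o in orders if o["intake_reminder_last_at"] == today
    }
    plan = defaultdict(list)
    for o in orders:
        if o["intake_data_validated"] is True:
            continue
        if o["intake_reminder_count"] >= MAX_REMINDERS:
            continue
        if o["email"] in emailed_today:
            continue
        plan[o["email"]].append(o["slug"])
    return dict(plan)
```

Persisting the count and last-sent date is what makes a re-run on the same day a no-op.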
justin
2b13c36c93 ansible: sync portal nginx template with live working config
The pw-portal-tls.conf.j2 template was stale (basic 47-line version) while the
live /etc/nginx/sites-enabled/pw-portal.conf was hand-maintained with branding,
/assets/ and /files/ serving. A future ansible run would have clobbered the
working config. Sync the template to the live config (templatized) and document
why /files/ must be served from /opt/erpnext-assets, not the docker volume.
2026-06-02 22:20:08 -05:00
justin
2fab98c0a8 postfix: multi-IP warmup sending pool (20 IPs, gradual rotation)
- 20 IPs (.90-.109 / mta01-mta20) with FCrDNS + SPF in HestiaCP
- .90 (mta01) dedicated Yahoo/AOL recovery IP (yahooslow, 20s trickle)
- .91-.109 (out02-out20) rotation pool via transport_maps randmap
- pw-mta-warmup: cron-driven scheduler grows the active rotation pool
  3 -> 5 -> 8 -> 12 -> 16 -> 19 IPs over ~25 days
- mta_setup.sh: idempotent installer (backups + postfix-check-gated reload)

New IPs verified clean on Spamhaus/Barracuda/SpamCop/SORBS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 19:03:30 -05:00
justin
0b7a35a58e trucking campaigns: daily builder + MX verifier concurrency + tracking column
- build_trucking_campaigns.py: nightly script that creates 8 Listmonk campaigns
  per day (4 TZ x 2 types: MCS-150 overdue 2k/TZ, inactive USDOT 1k/TZ)
  at 4AM ET / 5AM ET (CT) / 6AM ET (MT) / 7AM ET (PT). Deduplicates via
  listmonk_sent_at column.
- migration 083: add listmonk_sent_at + listmonk_campaign_type to fmcsa_carriers
- email_verifier.py: bump max_workers from 5 to 20 for 4x faster throughput
- cron: daily pw-trucking-campaigns at 08:00 UTC (3 AM EST / 4 AM EDT)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 10:07:44 -05:00
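The campaign matrix from 0b7a35a58e, enumerated: 4 time zones x 2 campaign types, with the per-zone volumes and ET send hours from the commit message. The type keys and function name are hypothetical; this is not the build_trucking_campaigns.py code.

```python
from itertools import product

# (zone, send hour expressed in ET) as listed in the commit message.
ZONES = [("ET", 4), ("CT", 5), ("MT", 6), ("PT", 7)]
# (hypothetical type key, recipients per zone).
TYPES = [("mcs150_overdue", 2000), ("inactive_usdot", 1000)]


def daily_campaigns() -> list[dict]:
    """One entry per (zone, type) pair the nightly builder would create."""
    return [
        {"tz": tz, "send_hour_et": hour, "type": kind, "cap": cap}
        for (tz, hour), (kind, cap) in product(ZONES, TYPES)
    ]


campaigns = daily_campaigns()
total_recipients = sum(c["cap"] for c in campaigns)
```

That yields 8 campaigns and, if every slice fills, 12,000 recipients per day, which is the ceiling the later warmup commits throttle well below.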