new-site

Author	SHA1	Message	Date
justin	9dd6f53eb2	infra(mail): remove 18 dormant snowshoe IPs from postfix + host Consolidate the outbound mail footprint to match the SPF intent (already trimmed to .94/.107 on 2026-06-19). A 20-IP sending footprint reads as snowshoe spam to receivers and was contributing to domain-reputation throttling (Microsoft 451 4.7.500, Gmail low-reputation). Removed from /etc/postfix/master.cf: transports yahooslow, out02-04, out06-20, rehab02-04, HC submission ports 2527/2528, hcout2/hcout3. Removed from /etc/network/interfaces (+ live ip addr del): host bindings .90-.93, .95-.106, .108-.109. Kept: .94 (trucking/out05), .107 (HC/hcout1), .71/.72 (infra). Verified live: postfix check OK, both streams still status=sent post-change, SSH session on .71 unaffected, transport_maps still routes via out05. Snapshots: infra/postfix/live-snapshots/master.cf, infra/network/interfaces. Live backups on server: /root/{master.cf,interfaces}.bak_snowshoe_*.	2026-06-23 23:45:41 -05:00
justin	14357a0223	fix(nginx): unblock public API routes powering lead tools/flows (HC sales killer) api.performancewest.net uses an explicit per-path allowlist; everything else falls through to a trusted-IP-only catch-all that returns 403. Six browser-facing routes had no location block, so they 403'd for every public visitor: /api/v1/npi/ <- THE healthcare sales killer. The 'Free NPI Compliance Check' tool (top of the HC funnel, where every HC campaign sends traffic) fetches /api/v1/npi/lookup. It 403'd -> CORS error in the browser -> the tool never rendered results or the upsell CTAs (Revalidation $399 / NPPES $149 / Bundle $899) -> 0 HC sales despite 17 sessions reaching it in 30d and 0 HC orders EVER created in the compliance DB. /api/v1/cdr/ telecom CDR profile tool /api/v1/icc/ intrastate/ICC profile tool /api/v1/corp/ corporate foreign-qual check /api/v1/foreign-qualification/ foreign qualification quote/jurisdictions /api/v1/lnpa-regions LNPA region lookup Added explicit proxy_pass blocks (mirroring the existing entities/identity pattern) before the catch-all. Verified live: all six now reach the app with proper CORS; the NPI tool renders results + order CTAs end-to-end via a real browser; npi-revalidation order page -> Stripe confirmed. The live /etc/nginx/sites-enabled/pw-api.conf was hand-edited and untracked; committing the current state here so it is version-controlled. (Live backup: /root/pw-api.conf.bak_20260623.)	2026-06-23 15:51:30 -05:00
justin	1e9dcfcfd1	mail(rampcap): step trucking cap back up to 400/h (day 19-20), 500/h ceiling The day-9 Gmail block that forced the 200/h hold is resolved: per-MX throttling shipped, Google is excluded entirely (MAIN_EXCLUDE_OPERATORS=google), and the OpenDKIM signing bug is fixed. With Google out of the mix, 400/h (~4k/day) is within the envelope these IPs cleanly sustained at 68-76% delivery with zero blocks. Lets the post-DKIM re-send backlog drain in ~1 day instead of ~3.	2026-06-22 12:49:54 -05:00
justin	9eeed47c4b	mail: close MX-exclusion gaps — exclude consumer mx: operators + add mx-tag cron Fix 1 (build_trucking_campaigns.py): the warmup big-MX exclusion only covered the clean-label operators (google/microsoft/proofpoint/...). Consumer mailbox operators that mx_tag_carriers.py labels with an "mx:" prefix slipped BOTH the exclusion and the per-MX throttle -- notably mx:yahoodns.net (283k sendable carriers = Yahoo Small Business/AOL custom domains) and mx:icloud.com (25k), plus comcast/charter/centurylink/windstream/tds/earthlink. These are custom domains whose MX points at a consumer provider, invisible to the literal-domain blocklist. Added CONSUMER_MX_OPERATORS, folded into WARMUP_EXCLUDE_OPERATORS used by both the fetch_carriers() exclusion SQL and mx_daily_caps() (same day-30 ramp). Behind the existing MAIN_SKIP_BIG_MX switch. Validated read-only: after the fix the warmup-eligible pool is 353,909 carriers (315,892 untagged + ~38k genuinely small/self-hosted operators), so the long tail still sustains the daily quota -- not starved -- while 0 consumer-MX carriers are selected during warmup. Fix 3 (infra/cron/pw-mx-tag): mx_tag_carriers.py was on no cron, so the untagged (NULL) backlog (~316k) never drained and new FMCSA imports stayed untagged, slowly re-opening the gap. Added a daily 05:45 UTC cron (--only-unsent --limit-domains 20000), before the 08:00 builder. Idempotent/bounded (only tags mx_provider IS NULL). Verified live: a 200-domain test run tagged 216 domains. (Fix 2 -- bounding the NULL bucket cap -- deferred; the cron will drain it.)	2026-06-20 00:03:47 -05:00
justin	8e5590b492	mail: DMARC aggregate-report parser + dedicated dmarc@ mailbox ingestion Tool 2 of the deliverability monitoring pair (Tool 1 = mail_reputation_monitor). DMARC rua reports from dozens of operators (Google, Yahoo, Comcast, Cox, Bell, Mimecast, Cisco ESA, GMX, mail.com, ...) were landing in ops@ (dmarc@ was a DL), burying real mail and never parsed. Now ingested + queryable: - dmarc@performancewest.net converted DL -> dedicated Carbonio mailbox; isolated IMAP creds in server .env, surfaced to workers in docker-compose.yml (mirrors OPS_IMAP_*). 29 historical reports moved ops@ -> dmarc@ via IMAP. - scripts/dmarc_report_parser.py: IMAP fetch unseen -> decompress .gz/.zip/.xml (namespace-agnostic: classic + urn:ietf:params:xml:ns:dmarc-2.0 GMX/mail.com) -> parse aggregate XML -> upsert dmarc_report (keyed (org_name,report_id), no-op on re-parse) + dmarc_record per source IP. dmarc_pass = dkim_aligned OR spf_aligned. Marks \Seen. --dry-run/--all/--alert (7d per-IP summary + Telegram if one of OUR IPs <95% pass, or EXTERNAL IP sends >=20 failing msgs as us = spoofing under p=reject). psycopg2 imported lazily so --dry-run runs without the driver. - api/migrations/102_dmarc_aggregate.sql: dmarc_report + dmarc_record tables. - infra/cron/pw-dmarc-parser: 06:20 UTC daily --alert (after reputation, before scrub). - docs/deliverability.md: DMARC section DONE; query examples. Verified: dry-run --all parses all 28 reports (1 non-report test probe), 0 unknown after the namespace fix.	2026-06-19 08:50:20 -05:00
justin	b45332b5f7	infra(cron): nightly mail-reputation snapshot (pw-mail-reputation) Runs mail_reputation_monitor --alert at 06:10 UTC, piping the day's postfix log (sudo cat, same pattern as pw-warmup-tg-alert) into the DB-connected workers container. Builds the daily SNDS-equivalent reputation trend and Telegram-alerts on operator regressions. Installed to /etc/cron.d/pw-mail-reputation.	2026-06-19 08:38:35 -05:00
justin	72c69a05c9	infra(cron): daily Listmonk consumer-domain reconciliation (pw-listmonk-scrub) Runs scrub_listmonk_consumer against both listmonk and listmonk_hc at 06:30 UTC, before the campaign builders, so any ENABLED subscriber matching the authoritative exclusion list is blocklisted retroactively. Keeps list-based campaigns (FCC Direct Contacts, CRTC/USF, etc.) from leaking onto consumer mailboxes after a new domain (e.g. Apple/iCloud) is added to the exclusion list. Installed to /etc/cron.d/pw-listmonk-scrub on the host.	2026-06-19 00:00:46 -05:00
justin	3ca960aca5	docs+infra(deliverability): document bulk subdomain; ansible signs send.performancewest.net - infra/ansible/roles/mail: refactor OpenDKIM to support multiple signing domains via opendkim_signing_domains list (root + send.performancewest.net). Loops keygen/ownership/keytable/signingtable so the live two-domain setup is reproducible from ansible. - infra/ansible group_vars: add bulk_mail_subdomain + campaign_from_* + campaign_reply_to documentation vars (map to CAMPAIGN_FROM / HC_CAMPAIGN_FROM env read by the builder scripts). smtp_from (transactional) stays on root. - docs/deliverability.md: rewrite TL;DR with the carrierone-vs-performancewest A/B proof (same server/IPs, different From domain -> Inbox vs Junk) and the ~85% Microsoft / 14% Google / <1% Yahoo audience mix; add the bulk-subdomain section, SPF trim, rehab-disabled, and the Hestia DNS automation runbook.	2026-06-18 23:12:05 -05:00
justin	545e6f7ed7	infra(mail): consolidate sending IPs (kill snowshoe) now that DKIM is fixed The multi-IP rotation was built to spread risk while DKIM was broken (fixed 2026-06-17) and after the May 30-31 over-volume blast. With DKIM signing correctly, spreading ~3k trucking msgs/day across 12 IPs (.94-.105) + ~1.2k healthcare msgs/day across 3 IPs (.107-.109) gave each IP far too little per-receiver volume to build reputation. Gmail/Outlook read it as snowshoe spam and reputation-blocked ~200 msgs/day ("very low reputation of the sending domain") -> 0 human clicks, 0 sales. Consolidate to ONE IP per stream so each accrues real reputation: - trucking: pw-mta-warmup ALL=(out05) -> randmap collapses to {out05:} = .94 - healthcare: listmonk-hc SMTP servers 2/3 (ports 2527/2528 -> .108/.109) disabled in DB; all HC mail now egresses .107 (hcmta01). [applied live] Applied live: transport_maps now randmap:{out05:}; listmonk-hc restarted. To re-expand later: add transports back to ALL + re-enable the HC SMTP servers.	2026-06-18 17:41:07 -05:00
justin	cf021e2f91	feat(healthcare): OIG/SAM exclusion screening as $79/mo Stripe Subscription Convert OIG/SAM from one-time $299/yr to recurring $79/month (card+ACH only) - the first real recurring-billing product in the system. Exclusion screening is a monthly federal obligation, so recurring monitoring fits the requirement and is the biggest valuation lever (vs a one-time annual run). Catalog (single source of truth): - service-catalog.ts: add billing_interval + allowed_methods to ComplianceService; oig-sam-screening -> 7900c, billing_interval:"month", allowed_methods:[card,ach], name "(Monthly Monitoring)". - gen-service-catalog.py + check-service-catalog-drift.py: carry/guard the two new fields; regenerate site catalog. Checkout (api/src/routes/checkout.ts): - mode:"subscription" with recurring price_data when billing_interval is set; surcharge absorbed for recurring (clean $79/mo); server-side METHOD_NOT_ALLOWED re-validation against allowed_methods. - ensureColumns + migration 100: compliance_orders.stripe_subscription_id, bundle_upsell_sent_at (+ subscription index). Webhooks (api/src/routes/webhooks.ts): - record stripe_subscription_id on checkout.session.completed (subscription mode). - invoice.paid (subscription_cycle only) -> re-dispatch screening for the cycle; invoice.payment_failed -> admin alert + first-failure customer nudge; customer.subscription.deleted -> mark order cancelled. (API 2026-03-25 moved the subscription link to invoice.parent.subscription_details.subscription.) Fulfillment: - job_server.py: pass recurring_cycle/invoice_id into the order. - npi_provider.py: OIG handler labels renewal cycles "[Monthly cycle]" + re-screen note; bundle action runs only the FIRST screening + flags the $79/mo upsell. Bundle land-and-expand: - Provider Compliance Bundle now includes only the first OIG/SAM screening (was giving away $948/yr of monitoring inside an $899 bundle). - new worker scripts/workers/bundle_upsell.py (+ pw-bundle-upsell timer): ~3 weeks after a paid bundle, emails the customer to continue $79/mo monitoring; dedup via bundle_upsell_sent_at; skips customers who already have an OIG/SAM order. Surfaces updated to $79/mo: PaymentStep (filters methods, "Billed every month, cancel anytime"), order pages, healthcare index, npi-compliance-check tool (also fixed stale $699 bundle drift -> $899), hc_oig_screening + hc_compliance_bundle emails. Docs: billing.md gains a "Stripe-native Subscriptions" section + a reality-check banner (Adyen/ERPNext-gateway model documented there is NOT live; Stripe is the real rail). Fixed run-migrations.yml container name bug (performancewest-postgres-1 -> performancewest-api-postgres-1, overridable). Tests: api/tests/recurring-subscription.test.ts (28 assertions) covers catalog gating, method validation, surcharge suppression, recurring line-item build, invoiceSubscriptionId extraction, renewal-cycle gating. tsc clean; site build clean; catalog drift OK. Manual deploy step: enable invoice.paid, invoice.payment_failed, customer.subscription.deleted on the Stripe webhook endpoint.	2026-06-18 07:54:38 -05:00
justin	a04ecf7df3	chore(email): decommission SMTP2GO references — local MTA only SMTP2GO is no longer used: Listmonk relays through the local Postfix MTA (172.18.0.1:25 from the Docker network), which DKIM-signs and delivers direct-to-recipient-MX; transactional mail goes through Carbonio. Verified zero smtp2go in any live container env + postfix has no external relayhost. Removed the stale references so a rebuild/new dev can't re-introduce it: - api/src/config.ts: SMTP_HOST default mail.smtp2go.com -> co.carrierone.com - scripts/workers/crypto_payment_worker.py: same default fix - infra/ansible all.yml: listmonk_smtp_* now 172.18.0.1:25, no auth (+comment) - app.env.j2 / email.ts / crm.md / go-live-todo.md / architecture.svg: docs	2026-06-17 22:46:59 -05:00
justin	899b880e7f	trucking: weekly FMCSA source refresh so new non-compliant carriers are caught The FMCSA census was a one-time snapshot (last loaded ~May 30) with NO refresh timer -- carriers newly falling out of MCS-150/UCR compliance were never picked up. New scripts/workers/fmcsa_source_refresh.py orchestrates the full pipeline (census download -> enrichment -> deficiency flag -> verify new emails -> MX-tag new) and runs weekly via cron pw-fmcsa-refresh (Sun 09:00 UTC), codified in the mail-pipeline Ansible role. Idempotent + incremental: the census upsert preserves email_verified / listmonk_sent_at / deficiency_flags, so existing carriers keep their send state and only census fields refresh; new DOTs flow into verification then campaigns. A carrier who refiled gets a fresh mcs150_parsed, so the builder's overdue WHERE clause stops targeting them automatically. Verify is capped per run (20k) so it never stalls on millions of rows. (Healthcare already auto-catches newly-revalidation-overdue providers within its 63k institutional pool via pw-hc-refresh Mon/Wed/Fri.)	2026-06-17 20:44:54 -05:00
justin	4dc5690666	infra: codify the email-campaign pipeline in Ansible (new mail-pipeline role) The entire outbound campaign pipeline lived ONLY on the host and was never in IaC -- a fresh rebuild would have silently shipped NO campaigns, NO IP warmup/ramp, and NO bounce processing. New mail-pipeline role + deploy-mail-pipeline.yml playbook deploy it from the canonical repo copies: cron.d (infra/cron/): - pw-trucking-campaign-builder, pw-ifta-campaign, pw-ucr-campaign - pw-hc-campaign, pw-hc-nppes, pw-hc-refresh - pw-mta-warmup, pw-listmonk-rampcap, pw-hc-rampcap - pw-ip-rehab, pw-warmup-tg-alert helper scripts (-> /usr/local/bin): - pw-mta-warmup, pw-listmonk-rampcap, pw-hc-rampcap, pw-warmup-tg-alert - postfix-bounce-notify.sh, postfix-hc-bounce-notify.sh, listmonk-bounce-sync.py systemd services: - pw-bounce-watcher.service (was missing from repo), pw-hc-bounce-watcher.service Also creates the deploy-owned {{project_dir}}/logs dir (deploy can't write /var/log, so a missing dir made cron redirects fail). Added the 6 cron.d files that existed only on the host, the trucking bounce-watcher unit, and synced infra/cron/pw-hc-refresh to the live version (revalidation download + enrich steps). Role wired into site.yml after the mail (OpenDKIM) role. Part of the email-deliverability incident hardening.	2026-06-17 20:26:01 -05:00
justin	2e4388a803	mail: add logrotate for Postfix mail.log (postlogd copytruncate) mail.log had no logrotate rule and grew unbounded to ~1GB (~150MB/day) since Jun 8. This host logs via Postfix's built-in postlogd (maillog_file mode), not rsyslog (no rsyslog.service exists), so postlogd holds the file open -- a plain rename+create would leave it writing to the stale inode. Use copytruncate (no daemon signal needed). Rotate daily, keep 14 days compressed. Applied live: forced first rotation, compressed the 1GB archive (->99MB), verified logging + bounce watchers + DKIM signing intact. Part of the email-deliverability incident hardening (follows DKIM fix `4d59019`).	2026-06-17 19:47:13 -05:00
justin	4d5901921e	mail: fix OpenDKIM not signing campaign mail (Docker-injected) + codify in Ansible Root cause of the Jun 2026 deliverability collapse / 'no new sales': opendkim.conf was in single-key mode with no InternalHosts, so it signed only 127.0.0.1. Transactional/cron mail (injected locally) was signed, but ALL campaign mail -- injected over the Docker bridge from the Listmonk containers (172.18.0.5 trucking, 172.18.0.25 healthcare) -- went out UNSIGNED. Gmail/Yahoo require DKIM on bulk mail since Feb 2024, so cold campaigns were junked/blocked (~23% delivery, 550-5.7.1). Proof: 2,620 campaign msgs that day, 0 DKIM sigs. The correct table files already existed on the server but were never wired into opendkim.conf. Fix points the daemon at key.table/signing.table and sets InternalHosts/ExternalIgnoreList to trusted.hosts (which includes 172.16.0.0/12, the Docker subnet). Fixes BOTH streams: HC submission ports 2526-2528 inherit the global smtpd_milters and *@performancewest.net covers compliance@. Verified by injecting from a Docker IP through port 25 and port 2526 -- both now get 'DKIM-Signature field added'. Codified as new Ansible role 'mail' so it can't silently regress (OpenDKIM was previously not in IaC at all).	2026-06-17 19:31:19 -05:00
justin	01b3e1d234	chore(env): scaffold ISA_SC_DMS_USER/PASS for SC PSC MyDMS e-file portal Non-attorney 'Service' filer account registered under Performance West (filings@performancewest.net). Credentials live only in the server .env (blank default in template, never committed). Consumed by the upcoming SC intrastate Playwright e-filer.	2026-06-16 08:19:17 -05:00
justin	c27cfd3242	docs(crons): note IRP invoice poller now also handles intrastate [PW-ISA] replies	2026-06-16 07:59:38 -05:00
justin	b125d46663	feat(intrastate): automate state PUC/PSC authority filing (email + invoice + auto-bill) Intrastate operating authority is state-specific + application-based like IRP, so it reuses the same email/POA + invoice-reconciliation flow: - intrastate_filing.send_intrastate_submission: emails the state PSC/PUC the authority application with the signed POA attached (subject tag [PW-ISA CO-..]), reusing irp_filing's MinIO download + census enrich helpers. - The shared poller (irp_invoice_poller) now matches BOTH [PW-IRP] and [PW-ISA] tags, parses the fee, Telegram-alerts, and bills the customer the exact amount with the correct service slug. - state_trucking gov-fee gate routes intrastate-authority to the PSC/PUC email path; if no submission email is configured for the base state it falls back to a manual todo (safe default — no emailing guessed agency addresses). Per-state ISA_<ST>_EMAIL env (blank until the exact agency address is verified). SC/GA/TX scaffolded. Customer still only sees an exact-fee payment link; you only approve the final filing.	2026-06-16 07:57:57 -05:00
justin	ea695d6828	feat(govfee): exact fees + agency processing fees; IRP email/invoice reconciliation - gov_fee: add AGENCY_PROCESSING_FEE (per-service card/convenience fee passed through so the customer pays the true all-in cost); estimate_gov_fee now folds it into the billed total. IFTA/intrastate/UCR fees are published/near-exact. - IRP fees can't be looked up — only the base state computes them. New irp_filing.py: emails the base-state IRP unit a Schedule A/B request (Reply-To the IRP filings mailbox, [PW-IRP CO-...] subject tag), and a 15-min cron (irp_invoice_poller) scans the mailbox for the state's invoice reply, parses the exact apportioned fee, Telegram-alerts you, and bills the customer the EXACT amount via a gov-fee child order + payment link. Then it proceeds to ready_to_file for your final approval. - state_trucking gov-fee gate now routes IRP to the email/invoice path and IFTA/intrastate to immediate exact-fee billing. - Mailbox is configurable (IRP_FILINGS_IMAP_* in app.env.j2); falls back to OPS_IMAP_* filtered by the [PW-IRP] tag until a dedicated mailbox exists. Telegram alerts fire on IRP submission sent, invoice received (billed), and un-parseable replies (so you can read + enter the fee manually).	2026-06-16 04:58:14 -05:00
justin	d65f5ea279	nginx: stop blocking /admin (bot-scan rule matched our own dashboard) The shared security snippet blocked any path matching /(admin|administrator|login.action|struts) with 'return 444', which drops the connection. That bare 'admin' token also matched our own operations dashboard at /admin and the new /admin/compliance-orders, so the browser showed 'This site can't be reached'. Dropped the bare 'admin' token; administrator/login.action/struts stay blocked. Applied live on prod (sudo edit + nginx reload); this updates the source of truth so the ansible nginx role won't reintroduce it.	2026-06-16 00:05:54 -05:00
justin	2caab6aa69	hc: warmup must run DAILY for the full 21-day ramp (not weekdays-only) The HC warmup crons were '* * 1-5' (Mon-Fri), silently skipping weekends -- but a proper warmup needs CONTINUOUS daily volume for 21 days (mailbox providers reward consistency; gaps stall reputation). The Jun 14 'HC 0 sent' alert was just a skipped Sunday, but the weekend skips also broke ramp continuity. - pw-hc-campaign + pw-hc-nppes: '* * 1-5' -> '* * *' (daily), vendored + applied live. - Re-aligned the warmup start stamp from calendar-day 9 to send-day 5 so the volume ramp matches reputation actually built (it had skipped ~4 weekend days, running the ramp ahead of real history). - Fixed the stale 'Mon-Fri only' comment in daily_slice(). - Vendored nppes cron now carries the enriched-CSV + 4-segment config.	2026-06-14 21:02:08 -05:00
justin	dd4ed3ea38	warmup: ROLL BACK main pool to 200/h after Gmail spam-blocked IPs at 400/h Day 9 (2026-06-13) alert: main pool 54% delivery, 202 Gmail spam-blocks (550-5.7.1 'Gmail has detected') on warming IPs .94-.98. The 4k/day (400/h) ramp was too aggressive AND the trucking pool lacks the per-MX throttling the HC pool got -- Google-Workspace-hosted business domains (weberfarms.net, uatruck.com, etc.) concentrated and Gmail blocked us. Held at 200/h (~2k/day) through day 20 to recover, then slow step to 300/h. Applied live (cap already set to 200/h).	2026-06-13 20:10:13 -05:00
justin	ff4ab262a8	hc: cron to feed NPPES institutional base (63k verified) into warmup, MX-throttled Adds /etc/cron.d/pw-hc-nppes (weekdays 07:30) that imports the verified NPPES institutional general-compliance base into the OIG screening segment, throttled per MX operator. Separate from the 07:00 reval-segment run so the two pipelines stay independent. Vendored the cron file under infra/cron/.	2026-06-12 22:11:12 -05:00
justin	887bf9a14a	warmup: grow main (trucking) pool faster -- 3k -> 4k/day now, 5k at day 14 The main sending IPs are cleanly warmed: today 3,845 sent at 0.18% bounce, ZERO deferrals, ZERO ISP rate-limit/blocklist/Spamhaus hits. The script's own note records these IPs historically sustained ~2,500/day at 68-76% delivery; collapses only ever came from 17k-29k spikes. So we have ample headroom to accelerate the trucking ramp safely: day 7-13: 300/h -> 400/h (~4,000/day) [applied now, day 8] day 14+: new 500/h (~5,000/day) [hard ceiling, well under ~17k] Also vendored pw-listmonk-rampcap into the repo (infra/postfix/) -- it previously lived only on the server at /usr/local/bin. Live script updated and applied (listmonk cap now 400/h).	2026-06-11 00:13:41 -05:00
justin	c8a0824143	firewall: allow ezstorehost (207.174.124.51) to reach Forgejo SSH Add ezstorehost to trusted_admin in both layers — the nft input set and the DOCKER-USER iptables chain (Forgejo is containerised; DNAT means the post-DNAT dport 22 rule applies). Required for static-tenant deploys from ezStorehost-infra to clone repos over ssh://. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 22:45:43 -05:00
justin	1854753c70	monitoring: add .91-.93 IP rehab to daily Telegram warmup alert Tracks the rehab pool (rehab02-04 / .91-.93) delivery + bounce + Spamhaus ZEN DNSBL status in the daily report and alert body. Alerts only if a rehab IP lands on a DNSBL or rehab delivery drops <40% with real volume (recipient quality slipped) -- a recovering IP naturally bounces more so the threshold is lenient.	2026-06-09 20:34:41 -05:00
justin	25f4a7503b	warmup: IP rehab for .91-.93 so they can be reallocated The 3 IPs (mta02-04 / .91-.93) retired after the May 30-31 over-volume blast are NOT on any DNSBL (Spamhaus/Barracuda/SpamCop/SORBS all clean) and have clean PTRs + SPF/DKIM/DMARC -- the damage was provider-internal reputation, which recovers with slow clean sending. scripts/ip_rehab.py sends a tiny ramping trickle (10/IP/day -> cap 60) of genuine CAN-SPAM-compliant compliance check-in mail to clean business-domain, never-bounced recipients via dedicated heavily-throttled postfix transports rehab02/03/04 (30s/msg, bound to .91/.92/.93). Routing uses an X-PW-Rehab-IP header + header_checks FILTER to override the transport_maps randmap warmup rotation (verified: mail routes via rehab transports, status=sent). Daily cron pw-ip-rehab. After ~2-3 weeks of clean sending the IPs can be reallocated.	2026-06-09 20:27:47 -05:00
justin	9fa2c86f01	fix(warmup): HC cron logged to /var/log (deploy can't write) -> cron silently died The HC warmup builder ran from cron at 07:00 but the >> /var/log/pw-hc-campaign.log redirect failed (deploy user cannot write /var/log), and a failed output redirect makes cron abort the command BEFORE it runs -> HC sent 0/day since the log file was removed. Route HC cron logs to /opt/performancewest/logs/ (deploy-owned) so the redirect always succeeds. Builder itself was fine (verified: imports + sends work, 0 bounces). Also removed the stale 'campaign-warmup.sh 122' root-cron line that pointed at a finished campaign + no longer existed.	2026-06-09 16:06:28 -05:00
justin	9b9d317916	infra/k8s: shkeeper liveness+readiness probes (fix recurring crypto.performancewest.net downtime) crypto.performancewest.net kept going down because the shkeeper-deployment web pod periodically HANGS (HTTP server deadlocks while the apscheduler background thread keeps the process alive). The helm chart (shkeeper-1.7.15) ships NO liveness or readiness probe, so k8s saw the hung pod as Running and never restarted it, and kept routing traffic to the dead backend -> site down until a manual restart. Added HTTP probes on / :5000 (302 = healthy): liveness auto-restarts a hung pod, readiness pulls it from the Service endpoints. Applied live via kubectl patch (chart does not expose probes via values; re-apply after any helm upgrade -- command in the file header). Verified: new pod comes up READY 1/1 (probe passes) and crypto.performancewest.net serves 302 again.	2026-06-09 04:57:50 -05:00
justin	7c39a858cc	monitoring: daily warmup IP-reputation Telegram alert End-of-day (20:00 Central) check of campaign deliverability across both sending pools (main out05-09 + healthcare hcout). Sends a Telegram alert ONLY when there is a reputation problem -- delivery below 65% or a spam/policy-block (550-5.7.1) spike above 150/day -- so healthy days stay silent. Reuses the existing TELEGRAM_BOT_TOKEN/CHAT_ID from /opt/performancewest/.env. Logs every run to /var/log/pw-warmup-healthcheck.log for history. Excludes internal/probe noise so the delivery figure reflects real external recipients.	2026-06-08 21:06:41 -05:00
justin	2156a5e05f	hc refresh: run Mon/Wed/Fri instead of weekly to shrink CMS data-lag The 'already revalidated' replies come from the CMS data-lag window (a provider completes their revalidation but CMS's public Due Date List still shows them overdue for weeks). Running the refresh 3x/week instead of weekly shrinks that window from up to 7 days to ~2-3, so a provider who just completed stops being targeted faster. No change to the overdue window or audience size -- this is the lever that reduces stale-data complaints without losing prospects.	2026-06-08 10:53:36 -05:00
justin	9cb10b18e0	feat(hc): deliverability prune -- evict newly-Google-hosted subscribers Belt-and-suspenders for the edge you flagged: a domain already in a warmup list could flip its MX to Google Workspace between weekly refreshes, after which it would hard-bounce from the cold IP. The import-time guard only catches NEW adds. - prune_holdouts(): enumerates each warmup list's subscribers, matches them against the FRESH master CSV (re-classified weekly), and removes any whose domain is now Google-hosted. DELIVERABILITY-ONLY -- it never evicts for audience reasons (an overdue provider drifting out of the 1-90 day window was a valid target when warmed; re-litigating that just wastes warmup progress). - --prune (run alongside warming) and --prune-only (prune then exit). - Wired into the weekly refresh cron as a --prune-only chained step, so MX is re-checked and holdouts removed every Monday before the weekday sends. Verified end-to-end: with no Google domains in lists it's a 0-op; injecting a simulated Google-flipped domain into the master, the prune correctly detects and (in a real run) would remove it from every list it's on.	2026-06-08 03:39:56 -05:00
justin	feb677f6ce	fix(hc warmup): only mail slightly-overdue providers (deliverability) Mailing heavily-overdue NPIs (months/years past due) risks hitting practices that have closed, merged, or abandoned the inbox -> hard bounces, which are the fastest way to wreck a warming IP's reputation. The warmup now restricts the reval_overdue selector to an inclusive [HC_OVERDUE_MIN, HC_OVERDUE_MAX] window (default 1-90 days) and the OIG 'any' selector likewise excludes heavily-overdue and dropped-off-list rows. On the current cohort this trims the overdue audience 178->96 and the OIG audience 399->317, holding out the stale long tail (181-365d + 366d+). upcoming/active providers are unaffected.	2026-06-08 03:27:22 -05:00
justin	167c4a3847	infra/cron: multi-segment hc warmup + weekly data-refresh cron Tracks the deployed cron.d files in the repo: - pw-hc-campaign: updated comment to reflect the now multi-segment warmup (revalidation + OIG + NPPES + reactivation + bundle); command unchanged. - pw-hc-refresh (NEW): Mon 06:00 Central weekly data refresh, ~1h before the 07:00 weekday send, so every send uses fresh CMS/OIG status.	2026-06-08 03:15:47 -05:00
justin	138fec17e9	healthcare: daily batched paper-filing fulfillment Standard (no-login) CMS filings are mailed in one Priority Mail envelope per destination agency, batched each postal working-day morning to save postage. - migration 089: paper_filing_batches table + esign_records.paper_batch_id / filing_destination_key (idempotent: a filing is batched at most once). - batch_cover_sheet.py: per-agency cover sheet (sender/dest/date/manifest) + merged print-job PDF (cover + all enclosed signed filings). - daily_paper_batch.py worker: gather signed+unbatched cms855/cms10114 filings, group by destination (MAC by state via mac_routing; Fargo for CMS-10114), build cover+merged PDF per agency, persist batch, mark filings batched. Self-gates on postal working days (skips weekends + federal/USPS holidays). Phase 1 = human prints+mails; phase 2 = wire print-mail API. - worker-crons: pw-paper-batch systemd timer (Mon-Fri 13:30 UTC, self-gated). - test_paper_batch.py: 15/15 pass (working-day gating, routing, cover+merge).	2026-06-07 00:30:01 -05:00
justin	bf4e8c2277	infra: MTA-STS HTTPS vhost (cert issued, policy live)	2026-06-06 21:03:30 -05:00
justin	34daa0c1d3	infra: MTA-STS status note - cert pending stable HE.net DNS propagation	2026-06-06 19:37:37 -05:00
justin	7bd2f70de4	infra: MTA-STS policy + vhost + README (cert pending DNS propagation)	2026-06-06 19:36:27 -05:00
justin	4233c90a4f	hc email: reframe value-add to 'No 2FA. No government portals.' (we have a portal; the pain is CMS 2FA/identity-proofing); cron creates fresh dated campaign when prior is finished; add hc bounce watcher (Postfix->listmonk-hc webhook, hard/complaint->blocklist)	2026-06-06 16:47:12 -05:00
justin	6738a335af	infra: nginx vhost for listmonk-hc admin portal (lists-hc.performancewest.net -> 127.0.0.1:9101, LE cert)	2026-06-06 07:02:50 -05:00
justin	95698852ce	healthcare warmup: gate Google/Workspace domains out of week 1 (they hard-reject cold IPs 550-5.7.1); send 501 non-Google practice domains first, defer 222 Google to week 2-3; cron uses hc_warmup_nongoogle.csv	2026-06-06 04:02:00 -05:00
justin	2bc86268f7	healthcare: HC warmup campaign cron (Mon-Fri 7AM Central) - imports overdue-first verified slice into listmonk-hc + runs Medicare-revalidation campaign via hc HOT stream; rate-throttled by pw-hc-rampcap	2026-06-06 03:57:08 -05:00
justin	695c3e2431	security: drop all CBC TLS suites (Qualys WEAK -> AEAD-only, still A+); sync ansible nginx templates (ciphers + ywxi CSP); capture host firewall as IaC	2026-06-06 00:49:21 -05:00
justin	90d8b94f3f	feat(email): wire listmonk-hc into deploy + dev override + hc ramp-cap - deploy.sh/deploy-dev.sh: bring up listmonk-hc (upstream image, excluded from build); document the one-time listmonk_hc DB create + --install. - docker-compose.dev.override.yml: dev-only override (committed) that drops the prod host-port bindings and pins dev's own postgres volume (dev-pgdata) via compose !override tags. deploy-dev ships it as docker-compose.override.yml so syncing the canonical compose to the shared host no longer breaks dev's api-postgres (port :5432 clash + volume switch). Discovered + fixed while validating listmonk-hc on dev. - pw-hc-rampcap.sh: healthcare analogue of pw-listmonk-rampcap, ramps the listmonk_hc cap 100->1000/h off /etc/postfix/hc-warmup-start, fully independent of the trucking ramp/cap.	2026-06-05 19:19:45 -05:00
justin	70d742df08	feat(mta): healthcare HOT-stream Postfix setup (dedicated hc IPs, isolated) Adds 3 hc submission ports (2526/2527/2528) in the single Postfix instance, each content_filter'd onto a dedicated hc transport (hcout1/2/3) binding the hc IPs .107/.108/.109 with hc HELO identity (hcmta01-03) and hotter concurrency. listmonk-hc round-robins the 3 ports. Discovered + documented the constraint that drove this shape: transport_maps randmap is owned by the shared trivial-rewrite(8) and is global, so neither a per-smtpd -o transport_maps nor a FILTER randmap:{...} can scope a separate IP pool (FILTER parses randmap as a literal transport). content_filter=hcoutN: (empty nexthop) overrides transport_maps and keeps the real recipient domain. Verified end-to-end on the server: :2527 -> hcout2 (.108) -> real gmail MX; trucking transport_maps (.94-.96) untouched. Idempotent, postfix-check gated with auto-rollback.	2026-06-05 19:07:02 -05:00
justin	a79d6b1906	feat(healthcare): add gost proxy-relay so Chromium can use the residential proxy Chromium rejects authenticated SOCKS5 ('Browser does not support socks5 proxy authentication'). Add a gost (ginuerzh/gost:2.11.5) 'proxy-relay' sidecar that listens unauthenticated on socks5://proxy-relay:11080 and forwards to the authenticated residential upstream (HEALTHCARE_PROXY_UPSTREAM_URL). Workers point Playwright at the relay via HEALTHCARE_PROXY_URL=socks5://proxy-relay:11080. env template: split into HEALTHCARE_PROXY_UPSTREAM_URL (authenticated, password percent-encoded so '#' -> %23) and HEALTHCARE_PROXY_URL (the relay address). Validated end-to-end on dev: workers Chromium -> proxy-relay -> residential egress IP 76.228.206.147; NPPES + PECOS both HTTP 200.	2026-06-05 18:39:26 -05:00
justin	17318f6e7d	feat(healthcare): route NPPES/PECOS Playwright flows through residential SOCKS proxy CMS healthcare portals (NPPES, PECOS, I&A) block datacenter IPs, so the healthcare browser automation needs to egress via the residential proxy on hg409y7ez04.sn.mynetname.net (username 'performancewest'). - undetected_browser: use_proxy now accepts an env-var name, so callers can select a domain-specific proxy. _proxy_config(proxy_env) reads it and falls back to UNDETECTED_PROXY_URL. Healthcare uses 'HEALTHCARE_PROXY_URL'. - probe_npi_undetected: launches with use_proxy='HEALTHCARE_PROXY_URL' when set. - npi_provider: documents that the (future) automated NPPES/PECOS flows must use the healthcare proxy. - Plumb HEALTHCARE_PROXY_URL (+ UNDETECTED_PROXY_URL fallback) through the ansible env template and docker-compose workers env. The credential itself is NOT in the repo. Set the full URL in the ansible vault as vault_healthcare_proxy_url: socks5://performancewest:<password>@hg409y7ez04.sn.mynetname.net:<port> Verified parsing + Playwright proxy-dict wiring with a unit test.	2026-06-05 14:36:01 -05:00
justin	c027d49f43	Fix trucking campaign cron send date	2026-06-04 03:19:35 -05:00
justin	b48fc3a406	Retire burned MTA IPs in warmup script	2026-06-03 23:37:27 -05:00
justin	5c35140a22	Configure trucking deficiency campaign cron env	2026-06-03 23:04:41 -05:00
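
Note on commit a79d6b1906 (password percent-encoded so '#' -> %23): a minimal sketch of why the encoding matters when building the authenticated upstream URL. The password, host, and port below are placeholders invented for illustration, not values from the repo or the vault; only the username 'performancewest' appears in the log. The helper name build_socks5_url is likewise hypothetical.

```python
from urllib.parse import quote, unquote, urlsplit


def build_socks5_url(user: str, password: str, host: str, port: int) -> str:
    """Return a socks5:// URL with the credentials percent-encoded."""
    # safe="" encodes every reserved character ('#', '@', ':', '/'),
    # so none of them can be mistaken for URL structure.
    enc_user = quote(user, safe="")
    enc_password = quote(password, safe="")
    return f"socks5://{enc_user}:{enc_password}@{host}:{port}"


# Placeholder credentials and endpoint (assumptions, not real values).
url = build_socks5_url("performancewest", "s3cr#t", "proxy.example.invalid", 1080)
parts = urlsplit(url)

# The encoded URL parses cleanly and the password round-trips.
print(url)
print(parts.hostname, parts.port, unquote(parts.password))

# Without encoding, everything after '#' is parsed as a fragment,
# so the host and port are lost from the authority section.
broken = urlsplit("socks5://performancewest:s3cr#t@proxy.example.invalid:1080")
print(broken.fragment)
```

The same encoding would apply to any other reserved character in the password, which is why the sketch encodes with an empty safe set instead of replacing '#' by hand.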


63 commits