Commit graph

799 commits

Author SHA1 Message Date
justin
a9bbfbf59b docs(deliverability): Microsoft MANUAL 2 fully DONE — SNDS access + JMRP both set
SNDS access requested/granted for 207.174.124.94 + .107; JMRP feeds registered
with complaint dest fbl@. Section marked complete. SNDS data populates in ~24-48h.
2026-06-19 02:03:30 -05:00
justin
f293466519 docs(deliverability): JMRP complaint dest set to fbl@performancewest.net
Corrected: JMRP feed destination was set to fbl@ directly (no forward needed);
ARF complaints route to ops@.
2026-06-19 01:00:16 -05:00
justin
60540f949d docs(deliverability): JMRP done — both IPs registered (pw1/.94, pw2/.107)
Note JMRP delivers ARF complaints to the signed-in MS account's email, not
automatically to fbl@; set a forward if that account isn't fbl@performancewest.net.
2026-06-19 00:59:49 -05:00
justin
776817c727 docs(deliverability): correct SNDS entry URL (snds.microsoft.com does not resolve)
Use the legacy sendersupport.olc.protection.outlook.com/snds/ (308-redirects) or
the direct substrate.office.com/ip-domain-management-snds/SNDS app URL. Flag that
snds.microsoft.com has no DNS.
2026-06-19 00:46:25 -05:00
justin
7828ee4587 docs(deliverability): fix SNDS/JMRP URLs for Microsoft's 2026 substrate migration
SNDS moved off sendersupport.olc.protection.outlook.com to
substrate.office.com/ip-domain-management-snds/. The old /snds/ and /pm/ links
308-redirect there. Document that the footer/help links going to microsoft.com
are boilerplate (not broken), and that you must Log in FIRST or the Request
Access / JMRP links bounce to login.microsoftonline.com (expected, not dead).
Add working direct links + canonical https://snds.microsoft.com entry point.
2026-06-19 00:45:59 -05:00
justin
e18f23634a docs(deliverability): document consumer-domain exclusion two-layer model + scrub
Records the Apple/iCloud addition, the builder-vs-list-based distinction, the
scrub_listmonk_consumer reconciliation tool + daily cron, and the 2026-06-19
first-run numbers (7,943 trucking + 21 HC stale consumer subs blocklisted).
2026-06-19 00:01:17 -05:00
justin
72c69a05c9 infra(cron): daily Listmonk consumer-domain reconciliation (pw-listmonk-scrub)
Runs scrub_listmonk_consumer against both listmonk and listmonk_hc at 06:30 UTC,
before the campaign builders, so any ENABLED subscriber matching the authoritative
exclusion list is blocklisted retroactively. Keeps list-based campaigns (FCC
Direct Contacts, CRTC/USF, etc.) from leaking onto consumer mailboxes after a new
domain (e.g. Apple/iCloud) is added to the exclusion list. Installed to
/etc/cron.d/pw-listmonk-scrub on the host.
2026-06-19 00:00:46 -05:00
justin
b40fc7ec36 feat(deliverability): exclude Apple consumer mail + scrub stale consumer subs from Listmonk
The fmcsa campaign builders already exclude gmail/yahoo/microsoft/etc. from NEW
audience selections, but two reputation leaks remained on the LIST-BASED side:

1. iCloud/Apple gap. icloud.com/me.com/mac.com were never in the exclusion set.
   A 2026-06 Listmonk audit found 1,321 ENABLED iCloud subscribers on list 3
   ("FCC Carriers - Direct Contacts") -- the single largest enabled-consumer
   bucket -- being cold-blasted with no exclusion at all. Add APPLE_CONSUMER_DOMAINS.

2. Stale already-imported consumer subs. List-based campaigns (e.g. the running
   CRTC/USF blast on list 3) keep hitting consumer addresses imported BEFORE the
   relevant domain joined the exclusion list. gmail.com was still the #1 bounce
   domain via that campaign even though new selections exclude it. Add
   scrub_listmonk_consumer.py: reconciles the live Listmonk subscriber table
   against the authoritative exclusion list and blocklists any ENABLED subscriber
   whose address is_blocked(). Idempotent; re-run whenever the exclusion grows so
   it applies retroactively. Uses the same 'blocklisted' terminal state as the
   bounce handler, so contacts are excluded from all current/future campaigns
   without deleting history. Supports --dry-run and both listmonk / listmonk_hc.
2026-06-18 23:55:58 -05:00
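The reconciliation described in b40fc7ec36 can be sketched roughly as below, using an in-memory subscriber list rather than the live Listmonk table. `APPLE_CONSUMER_DOMAINS` and `is_blocked()` are named in the commit; their contents and signatures here, plus `plan_scrub`, `apply_scrub`, the other sample domains, and the subscriber dict shape, are assumptions for illustration.

```python
# Hypothetical sketch: the real scrub_listmonk_consumer.py works against the
# live Listmonk subscriber table, not an in-memory list.
APPLE_CONSUMER_DOMAINS = {"icloud.com", "me.com", "mac.com"}

# Assumed stand-in for the rest of the authoritative exclusion list.
BLOCKED_EMAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com"} | APPLE_CONSUMER_DOMAINS


def is_blocked(email: str) -> bool:
    """Return True when the address's domain is on the exclusion list."""
    _, _, domain = email.strip().lower().rpartition("@")
    return domain in BLOCKED_EMAIL_DOMAINS


def plan_scrub(subscribers: list[dict]) -> list[int]:
    """Return ids of ENABLED subscribers that should be blocklisted.

    Rows that are already blocklisted are skipped, so a re-run is a no-op.
    """
    return [
        s["id"]
        for s in subscribers
        if s["status"] == "enabled" and is_blocked(s["email"])
    ]


def apply_scrub(subscribers: list[dict], dry_run: bool = False) -> int:
    """Blocklist matching subscribers in place; return how many matched."""
    ids = set(plan_scrub(subscribers))
    if not dry_run:
        for s in subscribers:
            if s["id"] in ids:
                s["status"] = "blocklisted"
    return len(ids)
```

Moving rows to a terminal status instead of deleting them is what keeps campaign history intact while still excluding the contact from later sends.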
justin
49842bddbb docs(deliverability): Microsoft #1 priority + role mailboxes created (Carbonio)
Created postmaster@/abuse@/fbl@/dmarc@ as Carbonio DLs -> ops@ (they previously
REJECTED 5.1.1, which would have blocked SNDS verification AND was silently
dropping all DMARC aggregate reports). Verified accept-at-MX + delivered E2E.
Reframe Microsoft as the #1 monitoring priority (85% of audience), Yahoo as
lowest (<1%); add Carbonio admin access note; note DMARC parser now worth building.
2026-06-18 23:31:20 -05:00
justin
3ca960aca5 docs+infra(deliverability): document bulk subdomain; ansible signs send.performancewest.net
- infra/ansible/roles/mail: refactor OpenDKIM to support multiple signing domains
  via opendkim_signing_domains list (root + send.performancewest.net). Loops
  keygen/ownership/keytable/signingtable so the live two-domain setup is
  reproducible from ansible.
- infra/ansible group_vars: add bulk_mail_subdomain + campaign_from_* +
  campaign_reply_to documentation vars (map to CAMPAIGN_FROM / HC_CAMPAIGN_FROM
  env read by the builder scripts). smtp_from (transactional) stays on root.
- docs/deliverability.md: rewrite TL;DR with the carrierone-vs-performancewest
  A/B proof (same server/IPs, different From domain -> Inbox vs Junk) and the
  ~85% Microsoft / 14% Google / <1% Yahoo audience mix; add the bulk-subdomain
  section, SPF trim, rehab-disabled, and the Hestia DNS automation runbook.
2026-06-18 23:12:05 -05:00
justin
5c3b4291e7 feat(deliverability): send bulk campaigns from dedicated subdomain send.performancewest.net
Isolates bulk sending reputation onto a dedicated subdomain so the root domain
stays clean for transactional/verification mail (and recovers faster). Replies
still go to the root domain via Reply-To, so the customer-facing reply experience
is unchanged.

- build_trucking_campaigns.py: add env-overridable FROM_EMAIL
  (noreply@send.performancewest.net); use it for both scheduled + test sends
  instead of inheriting base["from_email"] from the DB base campaign.
- build_healthcare_campaigns_cron.py: FROM_EMAIL ->
  compliance@send.performancewest.net (env-overridable).
- bounce-watcher.sh / hc-bounce-watcher.sh: track the new subdomain envelope
  sender (keep legacy root-domain sender so the pre-cutover queue still drains;
  HC also tracks by hcout transport regardless of sender).

Infra already live (separate, non-code): subdomain DNS (A/MX/SPF/DKIM
selector=send/DMARC p=reject) on the Hestia master, OpenDKIM signs
d=send.performancewest.net (verified end-to-end), egress .94/.107. Root SPF
trimmed to the real IPs; pointless IP-rehab cron disabled.
2026-06-18 23:07:23 -05:00
justin
1056705cf9 docs(deliverability): Google Postmaster TXT added+verified via Hestia DNS master
DNS is fully automatable: Hestia (cp.carrierone.com, zone owner = justin user)
is the DNS master, HE.net are slaves. Added google-site-verification TXT (id
14464) via v-add-dns-record as root; verified resolving on public resolvers +
HE.net slaves. Owner just clicks Verify in the Postmaster console. Documents the
v-add-dns-record path for future records.
2026-06-18 22:05:01 -05:00
justin
5253f16675 docs: deliverability runbook (incident, IP consolidation, monitoring setup)
Documents the 2026-06-18 reputation incident (snowshoe -> Gmail domain-rep
blocks, RBLs all clean), the single-IP-per-stream consolidation, and
fill-in-the-blanks setup steps for Google Postmaster Tools, Microsoft SNDS/JMRP,
and Yahoo CFL (all require owner account login + HE.net DNS). Plus ongoing
hygiene + how to re-expand IPs once reputation recovers.
2026-06-18 17:46:28 -05:00
justin
545e6f7ed7 infra(mail): consolidate sending IPs (kill snowshoe) now that DKIM is fixed
The multi-IP rotation was built to spread risk while DKIM was broken (fixed
2026-06-17) and after the May 30-31 over-volume blast. With DKIM signing
correctly, spreading ~3k trucking msgs/day across 12 IPs (.94-.105) + ~1.2k
healthcare msgs/day across 3 IPs (.107-.109) gave each IP far too little
per-receiver volume to build reputation. Gmail/Outlook read it as snowshoe spam
and reputation-blocked ~200 msgs/day ("very low reputation of the sending
domain") -> 0 human clicks, 0 sales.

Consolidate to ONE IP per stream so each accrues real reputation:
 - trucking: pw-mta-warmup ALL=(out05) -> randmap collapses to {out05:} = .94
 - healthcare: listmonk-hc SMTP servers 2/3 (ports 2527/2528 -> .108/.109)
   disabled in DB; all HC mail now egresses .107 (hcmta01). [applied live]

Applied live: transport_maps now randmap:{out05:}; listmonk-hc restarted.
To re-expand later: add transports back to ALL + re-enable the HC SMTP servers.
2026-06-18 17:41:07 -05:00
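The per-IP volume argument in 545e6f7ed7 reduces to simple arithmetic. The sketch below uses the approximate daily volumes and IP counts from the commit message; the even split across IPs is an assumption, and `per_ip` is a hypothetical helper.

```python
# Approximate figures from the commit message; an even split across the
# rotating IPs is assumed for illustration.
STREAMS = {
    "trucking": {"msgs_per_day": 3000, "ips_before": 12, "ips_after": 1},
    "healthcare": {"msgs_per_day": 1200, "ips_before": 3, "ips_after": 1},
}


def per_ip(msgs_per_day: int, ips: int) -> float:
    """Average daily messages each sending IP carries."""
    return msgs_per_day / ips


for name, s in STREAMS.items():
    before = per_ip(s["msgs_per_day"], s["ips_before"])
    after = per_ip(s["msgs_per_day"], s["ips_after"])
    print(f"{name}: {before:.0f}/IP/day before, {after:.0f}/IP/day after")
```

Each IP's total is further divided among receivers, so the volume any single mailbox provider saw from one IP was smaller still.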
justin
f43957882f docs(billing): record OIG/SAM recurring validation status
Checkout half proven against live Stripe (dry-run session created + expired,
zero charge), webhook subscription-id extraction + worker renewal fulfillment
covered by unit tests (31 + 13). Remaining gap: full E2E with a Stripe test
clock, which needs test-mode keys in the server .env (currently unset).
2026-06-18 09:38:51 -05:00
justin
5c1f239307 test(workers): NPI recurring-cycle fulfillment path (13 assertions)
Runs the real _BaseNPIHandler.handle() with _create_todo monkeypatched (no DB /
ERPNext / email side effects) and asserts:
 - first OIG/SAM screening has no [Monthly cycle] prefix / RECURRING banner
 - a recurring_cycle order gets the [Monthly cycle] title prefix, the
   "RECURRING MONTHLY CYCLE" banner, the invoice id, and the re-run-against-
   CURRENT-data + issue-NEW-certificate instructions
 - recurring_cycle works with and without an invoice id
 - the bundle handler's first run is not flagged recurring

Verified passing both locally and inside the deployed workers container.
2026-06-18 09:38:26 -05:00
justin
0083bc1354 docs(billing): record Stripe subscription webhook events as ENABLED + api-version caveat
The 3 subscription-lifecycle events (invoice.paid, invoice.payment_failed,
customer.subscription.deleted) are now enabled on the live endpoint
we_1THBjyB46qMvF2jnYyN8IfkK (6 events total). Documents the unpinned-endpoint
api_version caveat (account default 2024-12-18.acacia, not the SDK's dahlia) and
why invoiceSubscriptionId() must read both invoice shapes. Notes that
charge.dispute.created / balance.available are handled in code but not yet
enabled on the endpoint.
2026-06-18 08:45:22 -05:00
justin
8af2685d07 fix(webhooks): read invoice.subscription in both API shapes (acacia + dahlia)
The live Stripe webhook endpoint has NO pinned api_version, so it follows the
account default (currently 2024-12-18.acacia), which delivers the subscription
link as the top-level invoice.subscription. The code only read the new
2026-03-25.dahlia shape (invoice.parent.subscription_details.subscription), so
recurring renewal/payment-failed events would have returned a null subscription
id and silently failed to fulfill once the events were enabled.

invoiceSubscriptionId() now reads the modern shape first, then falls back to the
legacy top-level field. All other invoice fields used by the handlers
(amount_due, attempt_count, hosted_invoice_url, id) are stable across both
versions. +5 tests (legacy string/object, modern-preferred-over-legacy).
2026-06-18 08:42:29 -05:00
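The fallback order described in 8af2685d07 can be illustrated as follows. This is a Python rendering over plain dicts, not the TypeScript `invoiceSubscriptionId()` from the repository; `invoice_subscription_id` and `_as_id` are hypothetical names, and only the two field paths come from the commit message.

```python
from typing import Any, Optional


def _as_id(value: Any) -> Optional[str]:
    """Accept either a bare id string or an expanded object carrying 'id'."""
    if isinstance(value, str) and value:
        return value
    if isinstance(value, dict):
        ident = value.get("id")
        if isinstance(ident, str) and ident:
            return ident
    return None


def invoice_subscription_id(invoice: dict) -> Optional[str]:
    """Read the modern (dahlia) shape first, then the legacy top-level field."""
    parent = invoice.get("parent") or {}
    details = parent.get("subscription_details") or {}
    modern = _as_id(details.get("subscription"))
    if modern:
        return modern
    return _as_id(invoice.get("subscription"))
```

Reading both shapes means the handler does not depend on which API version the unpinned endpoint happens to deliver.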
justin
cf021e2f91 feat(healthcare): OIG/SAM exclusion screening as $79/mo Stripe Subscription
Convert OIG/SAM from one-time $299/yr to recurring $79/month (card+ACH only) -
the first real recurring-billing product in the system. Exclusion screening is
a *monthly* federal obligation, so recurring monitoring fits the requirement and
is the biggest valuation lever (vs a one-time annual run).

Catalog (single source of truth):
- service-catalog.ts: add billing_interval + allowed_methods to ComplianceService;
  oig-sam-screening -> 7900c, billing_interval:"month", allowed_methods:[card,ach],
  name "(Monthly Monitoring)".
- gen-service-catalog.py + check-service-catalog-drift.py: carry/guard the two new
  fields; regenerate site catalog.

Checkout (api/src/routes/checkout.ts):
- mode:"subscription" with recurring price_data when billing_interval is set;
  surcharge absorbed for recurring (clean $79/mo); server-side METHOD_NOT_ALLOWED
  re-validation against allowed_methods.
- ensureColumns + migration 100: compliance_orders.stripe_subscription_id,
  bundle_upsell_sent_at (+ subscription index).

Webhooks (api/src/routes/webhooks.ts):
- record stripe_subscription_id on checkout.session.completed (subscription mode).
- invoice.paid (subscription_cycle only) -> re-dispatch screening for the cycle;
  invoice.payment_failed -> admin alert + first-failure customer nudge;
  customer.subscription.deleted -> mark order cancelled. (API 2026-03-25 moved the
  subscription link to invoice.parent.subscription_details.subscription.)

Fulfillment:
- job_server.py: pass recurring_cycle/invoice_id into the order.
- npi_provider.py: OIG handler labels renewal cycles "[Monthly cycle]" + re-screen
  note; bundle action runs only the FIRST screening + flags the $79/mo upsell.

Bundle land-and-expand:
- Provider Compliance Bundle now includes only the first OIG/SAM screening (was
  giving away $948/yr of monitoring inside an $899 bundle).
- new worker scripts/workers/bundle_upsell.py (+ pw-bundle-upsell timer): ~3 weeks
  after a paid bundle, emails the customer to continue $79/mo monitoring; dedup via
  bundle_upsell_sent_at; skips customers who already have an OIG/SAM order.

Surfaces updated to $79/mo: PaymentStep (filters methods, "Billed every month,
cancel anytime"), order pages, healthcare index, npi-compliance-check tool (also
fixed stale $699 bundle drift -> $899), hc_oig_screening + hc_compliance_bundle
emails.

Docs: billing.md gains a "Stripe-native Subscriptions" section + a reality-check
banner (Adyen/ERPNext-gateway model documented there is NOT live; Stripe is the
real rail). Fixed run-migrations.yml container name bug
(performancewest-postgres-1 -> performancewest-api-postgres-1, overridable).

Tests: api/tests/recurring-subscription.test.ts (28 assertions) covers catalog
gating, method validation, surcharge suppression, recurring line-item build,
invoiceSubscriptionId extraction, renewal-cycle gating. tsc clean; site build
clean; catalog drift OK.

Manual deploy step: enable invoice.paid, invoice.payment_failed,
customer.subscription.deleted on the Stripe webhook endpoint.
2026-06-18 07:54:38 -05:00
justin
f481a1d13c analytics: filter email-scanner / headless traffic out of Umami stats
Email security gateways (Microsoft Defender Safe Links / ATP, Proofpoint,
Mimecast, Barracuda, etc.) auto-fetch and often render every link in a
campaign email to scan for malware. The advanced ones drive a real headless
browser, execute JS, and fire Umami pageviews/clicks that masquerade as human
visits -- inflating campaign click-through.

New site/public/js/pw-bot-filter.js queries multiple real-browser signals and
gates Umami via its official data-before-send hook (umamiBeforeSend), dropping
all events when the visitor is a bot. Signals (from empirical chromium probing):
  decisive: navigator.webdriver, HeadlessChrome UA, known scanner UAs, zero/
            collapsed screen|viewport|outer geometry, window LARGER than the
            physical screen (impossible on real HW; uses outerW/H so page zoom
            does not false-positive), software GPU rasterizer (SwiftShader/
            llvmpipe/swrast via WebGL UNMASKED_RENDERER), zero logical CPUs.
  soft (>=2 to trip): tiny screen, inner>screen, low color depth, empty
            navigator.languages, no input device (no fine/coarse pointer + no
            hover + 0 touch), no WebGL on a desktop UA.
Designed to FAIL OPEN: only strong/corroborated evidence suppresses, so real
visitors (incl. zoomed, privacy-tooled, remote-desktop, kiosk) still count.

Wired before the Umami tag in Base.astro (Astro pages) and all 86 static
public/**/*.html pages; both load with defer so order is guaranteed and the
hook is defined before Umami reads it.

Tested end-to-end with chromium (site/tests/bot-filter.test.sh, 4/4):
default headless-new, spoofed-Windows-UA + normal 1366x768 window, and
spoofed-UA + 1x1 window are all caught; hook returns null to drop the event.
2026-06-18 02:02:34 -05:00
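The decision rule in f481a1d13c (any decisive signal, or at least two soft signals) can be sketched as below. The real filter is browser JavaScript in pw-bot-filter.js; this Python rendering only models the rule, and `looks_like_bot` and `before_send` are hypothetical names. Returning `None` stands in for the hook returning null to drop the event.

```python
from typing import Iterable, Optional


def looks_like_bot(
    decisive: Iterable[bool],
    soft: Iterable[bool],
    soft_threshold: int = 2,
) -> bool:
    """One decisive signal is enough; soft signals must corroborate each other."""
    if any(decisive):
        return True
    return sum(1 for hit in soft if hit) >= soft_threshold


def before_send(event_type: str, payload: dict, bot: bool) -> Optional[dict]:
    """Model of the analytics hook: drop the event for bots, pass it through otherwise."""
    return None if bot else payload
```

A single soft signal never suppresses on its own, which is how the fail-open behaviour for zoomed or privacy-tooled visitors falls out of the rule.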
justin
40da017b79 campaigns: auto-rollout catch-all pool gated by warmup day + live bounce rate
Replaces the panic-era burner-domain verification plan with an in-house
automatic catch-all rollout in the trucking/IFTA/UCR builders. Root-cause
classification of the 75k pre-DKIM-fix bounces showed ~55% were reputation/
auth (now fixed by DKIM signing) and only ~29% genuinely-dead mailboxes;
catch-all domains accept at RCPT time so they do not user-unknown bounce at
send, making a controlled in-house bleed safer than warming a separate burner.

catch_all_enabled() adds catch-all results only when warmup_day >=
CAMPAIGN_CATCH_ALL_MIN_DAY (21) AND the recent 2-day live bounce rate is below
CAMPAIGN_CATCH_ALL_MAX_BOUNCE_PCT (8%) on a >=300-sent sample; auto-reverts to
the clean smtp_valid/send_confirmed pool on the next run if bounces spike.
Short window so a past disaster cannot block the rollout forever and a fresh
spike trips fast. CAMPAIGN_INCLUDE_CATCH_ALL=1/0 still hard-overrides.

USABLE_FILTER (static) -> usable_filter() (per-run, memoized, one DB probe).
IFTA/UCR SELECT_SQL -> _select_sql() so tc.usable_filter() resolves at call
time, not import. 13 logic unit tests pass; live dry-run decision = OFF
(day 15 < 21 and recent 2d bounce 42% from the aging-out Jun-16 disaster).
2026-06-18 01:39:09 -05:00
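The gate described in 40da017b79 can be sketched as a pure function. The thresholds (day 21, 8%, 300 sent) come from the commit message; the signature, the constant names, the strict less-than comparison, and the choice to stay OFF on an undersized sample are assumptions, and the real `catch_all_enabled()` probes the database itself.

```python
from typing import Optional

CATCH_ALL_MIN_DAY = 21           # CAMPAIGN_CATCH_ALL_MIN_DAY
CATCH_ALL_MAX_BOUNCE_PCT = 8.0   # CAMPAIGN_CATCH_ALL_MAX_BOUNCE_PCT
CATCH_ALL_MIN_SAMPLE = 300       # hypothetical name for the >=300-sent floor


def catch_all_enabled(
    warmup_day: int,
    sent_2d: int,
    bounced_2d: int,
    override: Optional[str] = None,
) -> bool:
    """Decide whether catch-all addresses join this run's audience."""
    # CAMPAIGN_INCLUDE_CATCH_ALL=1/0 hard-overrides the automatic decision.
    if override in ("0", "1"):
        return override == "1"
    if warmup_day < CATCH_ALL_MIN_DAY:
        return False
    # Assumption: without a large enough recent sample, stay on the clean pool.
    if sent_2d < CATCH_ALL_MIN_SAMPLE:
        return False
    bounce_pct = 100.0 * bounced_2d / sent_2d
    return bounce_pct < CATCH_ALL_MAX_BOUNCE_PCT
```

Because the decision is recomputed on every run from a two-day window, a bounce spike turns the pool off at the next run and an old incident ages out on its own.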
justin
c36ef07310 crtc site: defensible framing + 'who this is for' compliance posture
Reduce evasion optics that would draw FCC enforcement attention while keeping the
real value props:
- 'What they avoid by being Canadian' -> 'What the Canadian structure changes'
- Drop 'No US telecom taxes on invoices (15-40% saved)' -> Canadian tax treatment
  on the Canadian entity's billing; 'No US FCC regulatory fees on the Canadian entity'
- '...avoid this by routing US traffic...' -> '...instead route US traffic through
  US intermediaries who carry the 499-A obligation...'
- Add prominent 'Who this is for - and who it isn't' section: legitimate
  conversational voice (UCaaS/PBX/business/residential/live-agent) yes;
  short-duration/dialer/robocall-evasion no. States upstreams are fully
  STIR/SHAKEN compliant and we don't onboard traffic designed to evade
  caller-ID auth; notes Canadian carriers police ASR/ACD more strictly than
  anywhere (a feature). HTML validated balanced.
2026-06-18 00:22:58 -05:00
justin
720197095c CRTC USF email: defensible framing + conversational-voice caveat
Reframe away from 'escape the FCC' optics that would draw enforcement attention:
- Header/flagbar: 'Move your VoIP home to Canada' / 'US obligations ride on your
  upstream' (was 'no FCC reporting, no USAC, no S/S to run')
- Recast claims to 'CRTC regulatory home, not FCC' and scope the no-USF/no-499/
  no-RMD claims to the Canadian-jurisdiction traffic (accurate for US-number
  traffic, which rides on the compliant US upstream)
- STIR/SHAKEN bullet now explicitly pro-compliance: 'we don't help anyone dodge
  call-authentication; upstream partners are fully S/S compliant'
- Drop 'outside the FCC's reach'
- Add honest caveat: Canada is not for short-duration/dialer traffic; Canadian
  carriers are more stringent on ACD/ASR than anywhere; this is for real
  conversational voice (UCaaS/PBX/business/residential/live-agent)
2026-06-18 00:20:44 -05:00
justin
a82b356921 CRTC USF email: reframe to 'run your whole VoIP as a Canadian carrier'
Pivot from the hedge/second-entity framing to the consolidation pitch: one CRTC
carrier as the home base, nexus in Canada, customers onboarded from anywhere.
Lead value props with the three concrete reseller realities:
- No FCC reporting (no 499-A/Q, no RMD recert)
- No USAC/USF on your revenue (contribution sits upstream)
- No STIR/SHAKEN to set up or run (reseller can't get a US token; upstream signs)
Add: No FCC Section 214 / no ongoing 214 burden -- CRTC BITS is a cheap,
low-burden notification by comparison. Header/subject reworked; keeps the honest
US-termination + upstream-signing explanation.
2026-06-18 00:10:06 -05:00
justin
d9ecb94b27 CRTC USF email: add honest US-termination + STIR/SHAKEN section
Address the two most common objections truthfully (researched against CRTC,
FCC 2025 Third-Party Authentication Order, and STIR/SHAKEN cross-border docs):
- US-based long-distance termination operators routinely accept traffic from
  Canadian carriers (cross-border voice is a standard interconnect).
- STIR/SHAKEN: a Canadian reseller cannot get a US SPC token (US-carrier-only),
  so US-bound calls are signed by the upstream US-number provider that assigns
  the DIDs -- exactly how most small US carriers already rely on upstream
  signing. Canadian-origin traffic falls under the lighter CRTC regime, handled
  by the upstream Canadian carrier. Does NOT claim S/S disappears -- it moves to
  the upstream, off the carrier's day-to-day operation.
2026-06-18 00:03:31 -05:00
justin
8099afc5ab CRTC USF email: note US DIDs available from Canadian carriers + point to guide
Address the obvious 'but I need US numbers' objection: several Canadian
wholesale carriers (Fibernetics, Iristel, VoIP.ms, Telnyx, Bandwidth, Twilio,
Frontier) provision US DIDs to CRTC-registered carriers, so they can keep
serving US customers from the Canadian entity. Adds a Canada-advantage bullet
and updates the guide block to call out both US + Canadian DIDs.
2026-06-17 23:53:19 -05:00
justin
1c63e8f4b5 CRTC USF email: add FCC photo-ID KYC requirement to the burden list + Canada contrast
The FCC's 2025 Robocall Mitigation Order (47 CFR 64.1200(n)(4), FCC 25-6)
requires collecting + authenticating a government-issued photo ID for every
new customer before turning up voice service. Add it to the US-carrier burden
list and the matching 'does not apply in Canada' advantage.
2026-06-17 23:46:04 -05:00
justin
2611b5458b CRTC USF campaign: shared campaign_helpers + Q3 38.8% USF email builder
- campaign_helpers.py: extract the branded Listmonk HTML helpers (hdr/flagbar/
  stats/cta/footer/P/UL/etc.) + create_campaign() from create_campaigns.py into
  a side-effect-free shared module; create_campaign() now takes an altbody so
  every campaign ships a plaintext alternative (deliverability).
- create_crtc_usf_campaign.py: build the one-off CRTC email hooked on the Q3
  2026 USF factor (38.8%, +1.8pts, eff Jul 1), with a $200-off CANADA200 banner
  (expires Fri 23:59 ET, CTA links carry ?code= for auto-apply), the full US
  carrier burden vs Canada advantage, BC/ON incorporation, and a hosted
  carrier-guide PDF download. Creates a DRAFT only; sending stays manual.
2026-06-17 23:40:01 -05:00
justin
e379e2b10f CRTC: ERPNext as portal source of truth + harden discount expiry + carrier guide PDF
- checkout.ts: generalize ensureCompliancePortalUser -> ensurePortalUser and
  call it in the CRTC post-payment path so PayPal/crypto/webhook-confirmed CRTC
  orders always get an ERPNext Customer + Website User (the single source of
  truth for portal login/password), matching the compliance fix from the
  PayPal incident. Also flip portal_user_created for canada_crtc/formation.
- canada-crtc.ts: enforce discount active+start/expiry windows, global usage
  limit and applies_to scope server-side at checkout (was active-only), so a
  promo like CANADA200 actually stops working after its expiry.
- scripts/generate_canada_carrier_guide_pdf.py: render the public Canadian
  wholesale carrier/vendor guide PDF (reuses the canonical VENDORS list) to
  site/public/guides/canada-carrier-guide.pdf for the CRTC campaign lead magnet.
2026-06-17 23:34:13 -05:00
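The server-side discount checks listed in e379e2b10f (active flag, start/expiry window, global usage limit, applies_to scope) might look like the sketch below. The real code is TypeScript in canada-crtc.ts; the `Discount` fields and `discount_is_valid` are hypothetical, as is treating a missing bound or scope as "no restriction".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Discount:
    code: str
    active: bool
    starts_at: Optional[datetime] = None   # None = no start bound (assumed)
    expires_at: Optional[datetime] = None  # None = never expires (assumed)
    usage_limit: Optional[int] = None      # None = unlimited (assumed)
    times_used: int = 0
    applies_to: Optional[set] = None       # None = all services (assumed)


def discount_is_valid(d: Discount, service: str, now: datetime) -> bool:
    """Apply every gate; any failing gate rejects the code."""
    if not d.active:
        return False
    if d.starts_at is not None and now < d.starts_at:
        return False
    if d.expires_at is not None and now >= d.expires_at:
        return False
    if d.usage_limit is not None and d.times_used >= d.usage_limit:
        return False
    if d.applies_to is not None and service not in d.applies_to:
        return False
    return True
```

Checking only the active flag, as before the fix, would let an expired code keep working until someone disabled it by hand.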
justin
eed5e4a258 campaigns: disable daily discount by default — test normal-price deals
The daily 40%-off coupon was being merged into every trucking/UCR/IFTA/OTC
send, but those discount sends were not actually being delivered (the
DKIM-broken window). Now that deliverability is fixed, re-test whether
normal-price offers convert before giving margin away.

New CAMPAIGN_ENABLE_COUPON env flag (default OFF) gates daily-coupon
minting in build_trucking_campaigns + the UCR/IFTA/OTC builders (which
import it as tc.COUPON_ENABLED). With it off, no code is minted and an
empty coupon_code is merged -> the campaign templates' existing
{{ if .Subscriber.Attribs.coupon_code }} guard falls through to the
normal-price {{ else }} branch and landing-page links carry no ?code=.
No template or DB changes; fully reversible (set CAMPAIGN_ENABLE_COUPON=1).

Verified: COUPON_ENABLED defaults False, coupon_attribs(None) -> empty,
lp_link drops ?code= when no coupon, all 4 builders compile.
2026-06-17 22:51:28 -05:00
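The flag behaviour verified in eed5e4a258 can be sketched as below. `CAMPAIGN_ENABLE_COUPON`, `coupon_attribs`, and `lp_link` are named in the commit; the signatures, the `coupon_enabled` helper, the accepted flag value, and the returned dict shape are assumptions.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlencode


def coupon_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Default OFF; only an explicit '1' turns daily-coupon minting on."""
    env = os.environ if env is None else env
    return env.get("CAMPAIGN_ENABLE_COUPON", "0") == "1"


def coupon_attribs(code: Optional[str]) -> dict:
    """Subscriber attribs merged into the campaign; empty code when disabled."""
    return {"coupon_code": code or ""}


def lp_link(base_url: str, code: Optional[str]) -> str:
    """Landing-page link; carries ?code= only when a coupon exists."""
    if not code:
        return base_url
    return f"{base_url}?{urlencode({'code': code})}"
```

An empty `coupon_code` is falsy in the template's `{{ if .Subscriber.Attribs.coupon_code }}` guard, so the normal-price branch renders without any template change.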
justin
a04ecf7df3 chore(email): decommission SMTP2GO references — local MTA only
SMTP2GO is no longer used: Listmonk relays through the local Postfix MTA
(172.18.0.1:25 from the Docker network), which DKIM-signs and delivers
direct-to-recipient-MX; transactional mail goes through Carbonio. Verified
zero smtp2go in any live container env + postfix has no external relayhost.

Removed the stale references so a rebuild/new dev can't re-introduce it:
- api/src/config.ts: SMTP_HOST default mail.smtp2go.com -> co.carrierone.com
- scripts/workers/crypto_payment_worker.py: same default fix
- infra/ansible all.yml: listmonk_smtp_* now 172.18.0.1:25, no auth (+comment)
- app.env.j2 / email.ts / crm.md / go-live-todo.md / architecture.svg: docs
2026-06-17 22:46:59 -05:00
justin
eba525f83f docs: runbook fix #8 -- telecom/transactional HTML-only plaintext fix + campaign 407 finding
2026-06-17 21:17:06 -05:00
justin
b375385efd fix(email): add text/plain part to every transactional + telecom email
All transactional/worker senders built multipart/alternative (or mixed)
messages with ONLY an HTML part. A single-part multipart/alternative is
malformed and HTML-only mail is a spam-score signal -- the same class of
deliverability bug that hurt the campaign pipeline, but on the telecom /
filing / customer-transactional path (499-Q reminders, RMD/FCC filing
review links, intake/completion/delivery emails, commissions, etc).

- worker_email.send_worker_email: auto-derive plaintext from HTML when
  caller omits text= (fixes the shared helper for all current+future use)
- 16 rolled-their-own senders in scripts/workers/** + scripts/formation/
  document_delivery.py: attach html_to_text(...) plaintext sibling before
  the HTML part (job_server + document_delivery wrap text+html in an
  alternative sub-part so PDFs still attach to the mixed root)
- api/src/email.ts: add dependency-free htmlToText() and default
  sendEmail text to it (fixes checkout/webhook HTML-only sends)

Verified: all py files compile + import at runtime, api tsc passes,
htmlToText handles hrefs/lists/entities, 11 plaintext unit tests pass.
Telecom campaign 407 (Jun 8) was HTML-only + sent in the DKIM-broken
window -> 384 sent / 0 clicks (same junked-mail signature).
2026-06-17 21:07:40 -05:00
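The MIME layout described in b375385efd (text and HTML inside an alternative sub-part, with PDFs attached to a mixed root) can be produced with the standard library's `EmailMessage`. This is a sketch of the structure only; `build_message` and its parameters are hypothetical and are not the repository's `send_worker_email`.

```python
from email.message import EmailMessage
from typing import Optional


def build_message(
    subject: str,
    html: str,
    text: str,
    pdf: Optional[bytes] = None,
    pdf_name: str = "document.pdf",
) -> EmailMessage:
    """Build text+HTML alternatives, optionally wrapped in a mixed root."""
    msg = EmailMessage()
    msg["Subject"] = subject
    # text/plain goes first so the HTML part is the preferred alternative.
    msg.set_content(text)
    # Converts the message to multipart/alternative with both parts.
    msg.add_alternative(html, subtype="html")
    if pdf is not None:
        # Converts the root to multipart/mixed; the alternative pair
        # becomes its first sub-part and the PDF its second.
        msg.add_attachment(
            pdf, maintype="application", subtype="pdf", filename=pdf_name
        )
    return msg
```

Without an attachment the root stays `multipart/alternative` with two parts, which avoids the single-part alternative the commit calls malformed.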
justin
899b880e7f trucking: weekly FMCSA source refresh so new non-compliant carriers are caught
The FMCSA census was a one-time snapshot (last loaded ~May 30) with NO refresh
timer -- carriers newly falling out of MCS-150/UCR compliance were never picked
up. New scripts/workers/fmcsa_source_refresh.py orchestrates the full pipeline
(census download -> enrichment -> deficiency flag -> verify new emails ->
MX-tag new) and runs weekly via cron pw-fmcsa-refresh (Sun 09:00 UTC), codified
in the mail-pipeline Ansible role.

Idempotent + incremental: the census upsert preserves email_verified /
listmonk_sent_at / deficiency_flags, so existing carriers keep their send state
and only census fields refresh; new DOTs flow into verification then campaigns.
A carrier who refiled gets a fresh mcs150_parsed, so the builder's overdue
WHERE clause stops targeting them automatically. Verify is capped per run
(20k) so it never stalls on millions of rows.

(Healthcare already auto-catches newly-revalidation-overdue providers within
its 63k institutional pool via pw-hc-refresh Mon/Wed/Fri.)
2026-06-17 20:44:54 -05:00
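The "refresh census fields, preserve send state" upsert from 899b880e7f can be illustrated with SQLite's `ON CONFLICT ... DO UPDATE`. The table, the `legal_name` and `mcs150_date` columns, and `upsert_census` are hypothetical, and the production database is presumably not SQLite; only the three preserved column names come from the commit.

```python
import sqlite3

# Hypothetical table; only email_verified, listmonk_sent_at and
# deficiency_flags are column names taken from the commit message.
SCHEMA = """
CREATE TABLE carriers (
    dot_number       INTEGER PRIMARY KEY,
    legal_name       TEXT,
    mcs150_date      TEXT,
    email_verified   INTEGER DEFAULT 0,
    listmonk_sent_at TEXT,
    deficiency_flags TEXT
)
"""

# The UPDATE clause lists census fields only, so send-state columns on
# existing rows are left exactly as they were.
UPSERT = """
INSERT INTO carriers (dot_number, legal_name, mcs150_date)
VALUES (?, ?, ?)
ON CONFLICT(dot_number) DO UPDATE SET
    legal_name  = excluded.legal_name,
    mcs150_date = excluded.mcs150_date
"""


def upsert_census(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Insert new DOT numbers and refresh census fields on existing ones."""
    conn.executemany(UPSERT, rows)
    conn.commit()
```

New DOT numbers arrive with default send state, which is what lets them flow into verification and then campaigns on a later run.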
justin
4171f48736 docs: record post-incident email hardening (7 fixes) in runbook
2026-06-17 20:30:59 -05:00
justin
466460112b email: handle unquoted hrefs in plaintext converter + add tests
The anchor regex only matched quoted hrefs; unquoted (href=URL) dropped the
URL from the plaintext part. Now handles double/single/unquoted. Added
scripts/test_email_plaintext.py (11 cases: link forms, mailto, template-tag
preservation, tag stripping, entity unescape, blank-line collapse).
2026-06-17 20:28:15 -05:00
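The anchor handling fixed in 466460112b (double-quoted, single-quoted, and unquoted hrefs) can be sketched with one regex alternation. This is an illustrative converter, not the repository's `_email_plaintext.py`; `html_to_text` here is a simplified stand-in, and the `text (url)` output form follows the description in the plaintext commit.

```python
import html
import re

# href value in three forms: "double", 'single', or unquoted up to space/'>'.
_ANCHOR = re.compile(
    r"""<a\b[^>]*?\bhref\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))[^>]*>(.*?)</a>""",
    re.IGNORECASE | re.DOTALL,
)


def _anchor_to_text(match: re.Match) -> str:
    url = match.group(1) or match.group(2) or match.group(3) or ""
    label = re.sub(r"<[^>]+>", "", match.group(4)).strip()
    if not label or label == url:
        return url
    return f"{label} ({url})"


def html_to_text(markup: str) -> str:
    """Very small HTML-to-plaintext pass that keeps link targets."""
    text = _ANCHOR.sub(_anchor_to_text, markup)
    text = re.sub(r"<br\s*/?>|</p>", "\n", text, flags=re.IGNORECASE)
    text = re.sub(r"<[^>]+>", "", text)      # strip remaining tags
    text = html.unescape(text)               # &amp; -> &
    text = re.sub(r"\n{3,}", "\n\n", text)   # collapse blank-line runs
    return text.strip()
```

Template tags such as `{{ UnsubscribeURL }}` contain no angle brackets, so the tag-stripping step leaves them in place.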
justin
4dc5690666 infra: codify the email-campaign pipeline in Ansible (new mail-pipeline role)
The entire outbound campaign pipeline lived ONLY on the host and was never in
IaC -- a fresh rebuild would have silently shipped NO campaigns, NO IP warmup/
ramp, and NO bounce processing. New mail-pipeline role + deploy-mail-pipeline.yml
playbook deploy it from the canonical repo copies:

  cron.d (infra/cron/):
    - pw-trucking-campaign-builder, pw-ifta-campaign, pw-ucr-campaign
    - pw-hc-campaign, pw-hc-nppes, pw-hc-refresh
    - pw-mta-warmup, pw-listmonk-rampcap, pw-hc-rampcap
    - pw-ip-rehab, pw-warmup-tg-alert
  helper scripts (-> /usr/local/bin):
    - pw-mta-warmup, pw-listmonk-rampcap, pw-hc-rampcap, pw-warmup-tg-alert
    - postfix-bounce-notify.sh, postfix-hc-bounce-notify.sh, listmonk-bounce-sync.py
  systemd services:
    - pw-bounce-watcher.service (was missing from repo), pw-hc-bounce-watcher.service

Also creates the deploy-owned {{project_dir}}/logs dir (deploy can't write
/var/log, so a missing dir made cron redirects fail). Added the 6 cron.d files
that existed only on the host, the trucking bounce-watcher unit, and synced
infra/cron/pw-hc-refresh to the live version (revalidation download + enrich
steps). Role wired into site.yml after the mail (OpenDKIM) role.

Part of the email-deliverability incident hardening.
2026-06-17 20:26:01 -05:00
justin
c183957939 email: suppress defunct/legacy/satellite ISP domains in cold sends
Added DEAD_ISP_DOMAINS (52 domains) to BLOCKED_EMAIL_DOMAINS, so every
campaign builder that imports the shared exclusions (trucking, UCR, IFTA via
create_and_schedule_campaign, and the healthcare importer) stops cold-mailing
them. Domains were identified from our own Listmonk bounce table (top bounced
recipient domains) cross-checked against ISP status: defunct dial-up brands
(earthlink, netzero, juno, mindspring...), Qwest/Embarq legacy, satellite
(hughes, wildblue, dishmail), Altice/Suddenlink rural, WOW!/Knology, small
rural ISPs (windstream, tds, iowatelecom...) and Alaska regional.

Deliberately keeps still-active large consumer ISPs (comcast/charter/cox/
centurylink) -- their bounces were the cold-IP/no-DKIM reputation problem
(now fixed), not dead mailboxes, and they carry real prospects.

Part of the email-deliverability incident hardening.
2026-06-17 20:16:00 -05:00
justin
a32a3b05a0 email: add plaintext MIME part + stable Message-ID hostname
Two deliverability hardening fixes from the email audit:

1. Plaintext (altbody): all campaigns were HTML-only. Listmonk only emits
   multipart/alternative when altbody is set, and HTML-only bulk mail is a
   spam-score signal. New scripts/_email_plaintext.py renders a readable
   text/plain part from the HTML body (dependency-free; preserves Listmonk
   {{ .Subscriber }}/{{ UnsubscribeURL }} template tags, turns links into
   'text (url)'). Wired into the trucking builder (and thus UCR + IFTA, which
   reuse create_and_schedule_campaign) and the healthcare builder.

2. Stable container hostname: Listmonk derived its Message-ID from the random
   docker container id -> @localhost.localdomain (spam-score signal). Pin both
   listmonk + listmonk-hc hostname to perfwest.performancewest.net, matching
   Listmonk's SMTP hello_hostname.

Part of the email-deliverability incident hardening.
2026-06-17 20:09:02 -05:00
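Note on a32a3b05a0: a rough, dependency-free sketch of the kind of HTML-to-text rendering the commit describes, using only the standard library. This is not the contents of scripts/_email_plaintext.py; the class and function names are hypothetical. It keeps template tags untouched (they are ordinary text to the parser) and renders links as 'text (url)'.

```python
import re
from html.parser import HTMLParser


class _TextRenderer(HTMLParser):
    """Collect readable text; hypothetical helper, not the repo's code."""

    BLOCK_TAGS = {"p", "div", "tr", "li", "h1", "h2", "h3"}
    SKIP_TAGS = {"style", "script", "head"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._href = None
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            self._href = dict(attrs).get("href")
        elif tag == "br" or tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "a":
            if self._href:
                # Links become 'text (url)'; template tags in href survive as-is.
                self.parts.append(f" ({self._href})")
            self._href = None
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_plaintext(html: str) -> str:
    """Render a text/plain alternative from an HTML campaign body."""
    renderer = _TextRenderer()
    renderer.feed(html)
    renderer.close()
    text = re.sub(r"[ \t]+", " ", "".join(renderer.parts))
    text = "\n".join(line.strip() for line in text.splitlines())
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

The result would be passed as the campaign's altbody so that Listmonk emits multipart/alternative, as the commit message explains.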
justin
2e4388a803 mail: add logrotate for Postfix mail.log (postlogd copytruncate)
mail.log had no logrotate rule and grew unbounded to ~1GB (~150MB/day)
since Jun 8. This host logs via Postfix's built-in postlogd (maillog_file
mode), not rsyslog (no rsyslog.service exists), so postlogd holds the file
open -- a plain rename+create would leave it writing to the stale inode.
Use copytruncate (no daemon signal needed). Rotate daily, keep 14 days
compressed. Applied live: forced first rotation, compressed the 1GB
archive (->99MB), verified logging + bounce watchers + DKIM signing intact.

Part of the email-deliverability incident hardening (follows DKIM fix 4d59019).
2026-06-17 19:47:13 -05:00
justin
4d5901921e mail: fix OpenDKIM not signing campaign mail (Docker-injected) + codify in Ansible
Root cause of the Jun 2026 deliverability collapse / 'no new sales':
opendkim.conf was in single-key mode with no InternalHosts, so it signed only
127.0.0.1. Transactional/cron mail (injected locally) was signed, but ALL
campaign mail -- injected over the Docker bridge from the Listmonk containers
(172.18.0.5 trucking, 172.18.0.25 healthcare) -- went out UNSIGNED. Gmail/Yahoo
have required DKIM on bulk mail since Feb 2024, so cold campaigns were junked/blocked
(~23% delivery, 550-5.7.1). Proof: 2,620 campaign msgs that day, 0 DKIM sigs.

The correct table files already existed on the server but were never wired into
opendkim.conf. Fix points the daemon at key.table/signing.table and sets
InternalHosts/ExternalIgnoreList to trusted.hosts (which includes 172.16.0.0/12,
the Docker subnet). Fixes BOTH streams: HC submission ports 2526-2528 inherit
the global smtpd_milters and *@performancewest.net covers compliance@.

Verified by injecting from a Docker IP through port 25 and port 2526 -- both now
get 'DKIM-Signature field added'. Codified as new Ansible role 'mail' so it
can't silently regress (OpenDKIM was previously not in IaC at all).
2026-06-17 19:31:19 -05:00
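Note on 4d5901921e: a small sanity check, using Python's ipaddress module, that the 172.16.0.0/12 range named in the commit does cover both Listmonk container addresses. This only models the address matching; it is not how OpenDKIM evaluates InternalHosts internally, and the list contents other than 172.16.0.0/12 are an assumption about trusted.hosts.

```python
import ipaddress

# Before the fix: with no InternalHosts set, only localhost was signed.
BEFORE_FIX = [ipaddress.ip_network("127.0.0.1/32")]

# After the fix: assumed shape of trusted.hosts (the commit confirms only
# that it includes 172.16.0.0/12).
AFTER_FIX = BEFORE_FIX + [ipaddress.ip_network("172.16.0.0/12")]


def would_sign(client_ip: str, networks) -> bool:
    """Return True when the injecting client falls inside a trusted network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


LISTMONK_CONTAINERS = {"trucking": "172.18.0.5", "healthcare": "172.18.0.25"}
```

Checking each container address against BEFORE_FIX and AFTER_FIX reproduces the failure mode in the commit: local injection was always covered, Docker-bridge injection only after the change.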
justin
f7212b3969 scripts: one-off fresh password-set link for Paul Wilson (ERPNext auth)
2026-06-17 10:19:53 -05:00
justin
9c87759501 auth: make ERPNext the single source of truth for customer passwords
Customer portal login previously checked a bcrypt customers.password_hash
in Postgres, while portal.performancewest.net validated against ERPNext —
two stores that drifted (the Paul Wilson lockout). Consolidate on ERPNext:

- erpnext-client: add verifyWebsiteUserPassword() — delegates the credential
  check to Frappe /api/method/login (Host header = site name; 200=ok,401=bad).
- portal-auth /login: verify against ERPNext, then mint the pw_customer cookie.
- portal-auth /register: create+set the ERPNext password (authority) and upsert
  a password-less customers profile row; takeover guard still honors any legacy
  PG password until the column is dropped.
- portal-auth /reset-password + /forgot-password: write the new password to
  ERPNext; forgot-password now also works for ERPNext-only users (creates the
  PG profile row on demand).
- Legacy customers who have only a PG bcrypt password must reset it via forgot-password.
- checkout: refresh the stale comment (customers row is now a profile, no pw).

Build + typecheck green.
2026-06-17 10:09:32 -05:00
justin
557b45f65d fix(erpnext): self-heal outgoing Email Account password from SMTP_* env
Root cause of recurring 'Password not found for Email Account Performance West
Outgoing': the account was shipped as a fixture with awaiting_password=1 and no
password. Email Account SMTP passwords are encrypted per-site and cannot live in
a fixture, so every `bench migrate` reimported the fixture and re-broke
outgoing mail (login notifications, password resets, welcome emails).

- Remove the Email Account fixture (it cannot carry the encrypted secret).
- Add email_account_sync.sync_outgoing_password: idempotent, exception-safe
  upsert that reconciles the account + password from SMTP_* env and clears
  awaiting_password.
- Wire it to after_migrate (repairs at end of every deploy/migrate, right after
  fixtures import) and the daily scheduler (heals out-of-band restore/restart
  drift).
- Pass SMTP_* into the erpnext + erpnext-scheduler containers so the sync has
  the secret (they previously had no SMTP env).
2026-06-17 09:48:28 -05:00
justin
1eb29f80be fix(verifier): mx_unreachable was mislabeling live big-ISP mailboxes
The verifier returned (True, 'mx_unreachable') when it couldn't complete a port-25
probe to ANY MX — marking 438,163 addresses email_verified=TRUE. But these are NOT
dead: they're dominated by Comcast (13.7k), AT&T/SBCGlobal (13.5k), Verizon, Cox,
Charter, Frontier, etc. — major ISPs that deliberately tarpit/refuse probes from
unknown IPs. Confirmed from prod: comcast MX connects + returns 220. The probe
failure ≠ undeliverable.

Fix: return (False, 'mx_probe_blocked'): the MX exists, but deliverability is
UNKNOWN and must be confirmed by a real send. Such rows are excluded from PW
campaigns and are the prime burner-verification target (burner_list_verify upgrades
them to send_confirmed on delivery). The existing 438,163 mx_unreachable rows were
reclassified in prod to mx_probe_blocked / verified=FALSE.
2026-06-17 05:48:08 -05:00
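Note on 1eb29f80be: a sketch of the classification change, assuming the verifier returns a (verified, result) tuple as the commit shows. The function name, its parameters, and the 'no_mx' and 'smtp_rejected' labels are hypothetical; only 'mx_probe_blocked', 'mx_unreachable' and 'smtp_valid' appear in the commit history.

```python
from typing import Optional, Sequence, Tuple


def classify_probe(mx_hosts: Sequence[str],
                   rcpt_code: Optional[int]) -> Tuple[bool, str]:
    """Map a probe outcome to (email_verified, email_verify_result).

    rcpt_code is None when no port-25 probe to any MX could be completed.
    """
    if not mx_hosts:
        return (False, "no_mx")  # hypothetical label
    if rcpt_code is None:
        # Old behaviour was (True, 'mx_unreachable'). A blocked probe says
        # nothing about the mailbox, so report it as unverified.
        return (False, "mx_probe_blocked")
    if 200 <= rcpt_code < 300:
        return (True, "smtp_valid")
    return (False, "smtp_rejected")  # hypothetical label
```

The key design point is that an incomplete probe never yields verified=True; only a positive SMTP answer does.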
justin
c2737f2001 feat(deliverability): burner-domain list verification + plan doc
The smtp_valid pool is only ~3k unsent — too small to sustain campaigns. SMTP
probing can't confirm catch-all/mx_unreachable deliverability; only a REAL send
can. burner_list_verify.py reconciles a verification send from a DISPOSABLE burner
domain (isolated from PW/carrierone reputation):
  - hard bounce  -> fmcsa_carriers.email_verify_result='hard_bounced' (excluded)
  - delivered    -> 'send_confirmed' (proven deliverable; PW campaigns send to it)
It tails the burner MTA mail.log (reuses bounce-watcher's status= pattern) and
writes back idempotently. The PW trucking filter now treats smtp_valid +
send_confirmed as sendable. docs/campaign-deliverability-plan.md captures the full
diagnosis, the burner design, and CAN-SPAM guardrails.

Remaining (needs a domain + isolated MTA identity — operator/infra decision):
stand up the burner domain, the verification-send worker, and a writeback cron.
2026-06-16 22:28:24 -05:00
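Note on c2737f2001: a minimal sketch of reconciling Postfix delivery log lines into the two outcomes the commit names. It assumes standard Postfix smtp log lines with to=<...>, dsn= and status= fields (Postfix logs a successful delivery as status=sent); the regex and function names are hypothetical and are not taken from burner_list_verify.py or the bounce watcher.

```python
import re

# Assumed Postfix line shape: "... to=<rcpt>, ..., dsn=5.1.1, status=bounced (...)"
_LINE = re.compile(
    r"to=<(?P<rcpt>[^>]+)>.*?\bdsn=(?P<dsn>\d\.\d+\.\d+).*?\bstatus=(?P<status>\w+)"
)


def classify_line(line: str):
    """Return (recipient, result) for a conclusive line, else None."""
    m = _LINE.search(line)
    if not m:
        return None
    rcpt = m["rcpt"].lower()
    if m["status"] == "sent":
        return rcpt, "send_confirmed"
    if m["status"] == "bounced" and m["dsn"].startswith("5."):
        return rcpt, "hard_bounced"
    return None  # deferred / soft failures are not conclusive


def reconcile(lines):
    """Fold log lines into {recipient: result}; replaying the log is harmless."""
    results = {}
    for line in lines:
        hit = classify_line(line)
        if hit:
            results[hit[0]] = hit[1]
    return results
```

Writing the resulting mapping back with an UPDATE keyed on the address keeps the writeback idempotent, since re-reading the same log produces the same mapping.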
justin
1652a3b8bc fix(campaigns): stop sending trucking blasts to mx_unreachable dead domains
Root cause of zero conversions since Jun 9 + the Gmail/Outlook block storm:
the send filter was '(email_verified IS TRUE OR result IN ...)'. The verifier
sets email_verified=TRUE optimistically for mx_unreachable (domain exists but
its mail server never answered the RCPT probe) — 438,163 such rows. Those HARD
BOUNCE on send, producing ~1,100 bounces/day (~47% rate) and blocklisting half
the 120k subscriber base, so real prospects never saw the offer.

Fix: key the send filter ONLY off email_verify_result, never the broken boolean.
Recovery mode (default): send only 'smtp_valid' to drive bounce rate to ~0 and
rebuild reputation; set CAMPAIGN_INCLUDE_CATCH_ALL=1 to re-add catch-all domains
once recovered. Mirrors the healthcare list-cleaning approach (HC bounces ~2-3%,
which proves the fix). Note: only ~3k smtp_valid unsent remain — list growth via
real-send bounce verification (separate burner domain) is the follow-up.
2026-06-16 22:24:15 -05:00
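Note on 1652a3b8bc: a sketch of a send filter keyed only on email_verify_result, with the recovery-mode default and the CAMPAIGN_INCLUDE_CATCH_ALL switch from the commit. The 'catch_all' result label, the function names, and the %s placeholder style (as used by psycopg-like drivers) are assumptions for illustration.

```python
import os


def sendable_results(env=None):
    """Return the verify results that may be mailed in the current mode."""
    env = os.environ if env is None else env
    results = ["smtp_valid"]  # recovery mode default
    if env.get("CAMPAIGN_INCLUDE_CATCH_ALL") == "1":
        results.append("catch_all")  # hypothetical label for catch-all domains
    return results


def build_send_filter(env=None):
    """Build a parameterised WHERE fragment plus its bind values.

    Deliberately never references the email_verified boolean.
    """
    results = sendable_results(env)
    placeholders = ", ".join(["%s"] * len(results))
    return f"email_verify_result IN ({placeholders})", results
```

For example, build_send_filter({}) yields a single-placeholder clause bound to 'smtp_valid', while passing {"CAMPAIGN_INCLUDE_CATCH_ALL": "1"} adds the second value.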
justin
35f204c2b8 fix(mcs150): point intake email to per-slug wizard (not sales page) + add Trailers field
The MCS-150 intake-completion email linked customers to /order/dot-compliance,
which is the sales/checkout page -- it ignores ?order= and asks the customer to
re-pick services and pay again, so they 'cannot enter any data' (Paul Wilson's
report). Link to the per-service intake wizard /order/<slug>?order=... instead,
which loads the paid order, pre-fills from the FMCSA census, and drops payment.

Also add a Trailers field to the DOT intake fleet section and wire it through to
the MCS-150 PDF Q26 trailer row, so carriers can update trucks AND trailers.
2026-06-16 16:21:57 -05:00
justin
674979c928 tweak(sc-coc): tell carrier to check with insurer before answering + Reply-To info@
- Added a line asking them to call their insurance agent to confirm Form E
  ability before clicking yes/no, so we pick the right path first time.
- Reply-To now routes to info@performancewest.net (monitored), overridable via
  SC_COC_REPLY_TO env.
2026-06-16 09:35:13 -05:00
justin
ab9491be6a fix(deploy): hard-reset to origin/main + assert HEAD advanced (stop silent strands)
deploy.sh used 'git pull origin main', which silently ABORTS when the tracked
tree is dirty (generated site files, or any drift), stranding new commits on an
old checkout — this bit us twice today (prod stuck at b125d46 while origin had
the COC work). Replaced with:
  git fetch origin main && git reset --hard origin/main
The deploy box is a pure mirror of origin (all real changes land via git), so a
hard reset is safe and untracked files (data/*, .secrets/) are preserved. Added
a post-reset check that asserts HEAD == origin/main and exits 1 loudly otherwise, so
a strand can never again be masked by a '| tail' in the caller.
2026-06-16 09:25:11 -05:00