Commit graph

754 commits

Author SHA1 Message Date
justin
8e5590b492 mail: DMARC aggregate-report parser + dedicated dmarc@ mailbox ingestion
Tool 2 of the deliverability monitoring pair (Tool 1 = mail_reputation_monitor).
DMARC rua reports from dozens of operators (Google, Yahoo, Comcast, Cox, Bell,
Mimecast, Cisco ESA, GMX, mail.com, ...) were landing in ops@ (dmarc@ was a DL),
burying real mail and never parsed. Now ingested + queryable:

- dmarc@performancewest.net converted DL -> dedicated Carbonio mailbox; isolated
  IMAP creds in server .env, surfaced to workers in docker-compose.yml (mirrors
  OPS_IMAP_*). 29 historical reports moved ops@ -> dmarc@ via IMAP.
- scripts/dmarc_report_parser.py: IMAP fetch unseen -> decompress .gz/.zip/.xml
  (namespace-agnostic: classic + urn:ietf:params:xml:ns:dmarc-2.0 GMX/mail.com) ->
  parse aggregate XML -> upsert dmarc_report (keyed (org_name,report_id), no-op on
  re-parse) + dmarc_record per source IP. dmarc_pass = dkim_aligned OR spf_aligned.
  Marks \Seen. --dry-run/--all/--alert (7d per-IP summary + Telegram if one of OUR
  IPs <95% pass, or EXTERNAL IP sends >=20 failing msgs as us = spoofing under
  p=reject). psycopg2 imported lazily so --dry-run runs without the driver.
- api/migrations/102_dmarc_aggregate.sql: dmarc_report + dmarc_record tables.
- infra/cron/pw-dmarc-parser: 06:20 UTC daily --alert (after reputation, before scrub).
- docs/deliverability.md: DMARC section DONE; query examples.

Verified: dry-run --all parses all 28 reports (1 non-report test probe), 0 unknown
after the namespace fix.
2026-06-19 08:50:20 -05:00
justin
b45332b5f7 infra(cron): nightly mail-reputation snapshot (pw-mail-reputation)
Runs mail_reputation_monitor --alert at 06:10 UTC, piping the day's postfix log
(sudo cat, same pattern as pw-warmup-tg-alert) into the DB-connected workers
container. Builds the daily SNDS-equivalent reputation trend and Telegram-alerts
on operator regressions. Installed to /etc/cron.d/pw-mail-reputation.
2026-06-19 08:38:35 -05:00
justin
08f651dc1e feat(deliverability): mail reputation monitor (SNDS-equivalent from postfix logs)
Adds scripts/mail_reputation_monitor.py + migration 101 (mail_reputation_daily).
Sender reputation is judged by the RECEIVING operator (Microsoft/Google/Yahoo/
Proofpoint), and the provider portals (SNDS/Postmaster/CFL) need a login and lag
24-48h. Our postfix logs already carry the ground truth in real time: every send
records the receiving host + SMTP response, and the response classifies WHY:
  250            -> accepted
  451 4.7.500    -> throttled (Microsoft rate-limiting a cold IP)
  550 5.7.x      -> reject_reputation (spam/reputation)
  550 5.1.1/5.4.1-> reject_recipient (dead mailbox / access denied = list hygiene)
  550 ...SPAM    -> reject_content (SpamAssassin)

The parser classifies each egress delivery (out0x/hcout/relay) by (sending_ip,
receiver, outcome, reason_code) and upserts ONE daily aggregate row per bucket
(idempotent ON CONFLICT), so a nightly cron over the rotated log gives a queryable
trend without re-parse double-counting. --alert prints a per-operator summary and
Telegram-alerts on regressions (>=10% reputation rejects, or Microsoft >=70%
throttled). Reads stdin ("-") so the host-owned /var/log/mail.log can be piped
into the DB-connected workers container.

Motivation: 2026-06-19 audit found ~80% of Microsoft sends were getting 451 4.7.500
throttles on the warming IPs -- this makes that trend visible as reputation recovers.
2026-06-19 08:35:45 -05:00
justin
bd7ba23841 docs(deliverability): Yahoo CFL ENROLLED for both domains (reporting fbl@)
performancewest.net + send.performancewest.net both show Enrolled in the Yahoo
Sender Hub, reporting email fbl@. All three FBLs (Google Postmaster, MS SNDS+JMRP,
Yahoo CFL) now complete.
2026-06-19 08:29:12 -05:00
justin
b8b6444084 docs(deliverability): Yahoo CFL verification keys added for both domains
Added yahoo-verification-key TXT records via Hestia for performancewest.net
(apex) and send.performancewest.net; both propagated to all HE.net slaves +
public resolvers. Ready to click Verify in the Yahoo CFL form, complaint dest fbl@.
2026-06-19 02:13:48 -05:00
justin
a9bbfbf59b docs(deliverability): Microsoft MANUAL 2 fully DONE — SNDS access + JMRP both set
SNDS access requested/granted for 207.174.124.94 + .107; JMRP feeds registered
with complaint dest fbl@. Section marked complete. SNDS data populates in ~24-48h.
2026-06-19 02:03:30 -05:00
justin
f293466519 docs(deliverability): JMRP complaint dest set to fbl@performancewest.net
Corrected: JMRP feed destination was set to fbl@ directly (no forward needed);
ARF complaints route to ops@.
2026-06-19 01:00:16 -05:00
justin
60540f949d docs(deliverability): JMRP done — both IPs registered (pw1/.94, pw2/.107)
Note JMRP delivers ARF complaints to the signed-in MS account's email, not
automatically to fbl@; set a forward if that account isn't fbl@performancewest.net.
2026-06-19 00:59:49 -05:00
justin
776817c727 docs(deliverability): correct SNDS entry URL (snds.microsoft.com does not resolve)
Use the legacy sendersupport.olc.protection.outlook.com/snds/ (308-redirects) or
the direct substrate.office.com/ip-domain-management-snds/SNDS app URL. Flag that
snds.microsoft.com has no DNS.
2026-06-19 00:46:25 -05:00
justin
7828ee4587 docs(deliverability): fix SNDS/JMRP URLs for Microsoft's 2026 substrate migration
SNDS moved off sendersupport.olc.protection.outlook.com to
substrate.office.com/ip-domain-management-snds/. The old /snds/ and /pm/ links
308-redirect there. Document that the footer/help links going to microsoft.com
are boilerplate (not broken), and that you must Log in FIRST or the Request
Access / JMRP links bounce to login.microsoftonline.com (expected, not dead).
Add working direct links + canonical https://snds.microsoft.com entry point.
2026-06-19 00:45:59 -05:00
justin
e18f23634a docs(deliverability): document consumer-domain exclusion two-layer model + scrub
Records the Apple/iCloud addition, the builder-vs-list-based distinction, the
scrub_listmonk_consumer reconciliation tool + daily cron, and the 2026-06-19
first-run numbers (7,943 trucking + 21 HC stale consumer subs blocklisted).
2026-06-19 00:01:17 -05:00
justin
72c69a05c9 infra(cron): daily Listmonk consumer-domain reconciliation (pw-listmonk-scrub)
Runs scrub_listmonk_consumer against both listmonk and listmonk_hc at 06:30 UTC,
before the campaign builders, so any ENABLED subscriber matching the authoritative
exclusion list is blocklisted retroactively. Keeps list-based campaigns (FCC
Direct Contacts, CRTC/USF, etc.) from leaking onto consumer mailboxes after a new
domain (e.g. Apple/iCloud) is added to the exclusion list. Installed to
/etc/cron.d/pw-listmonk-scrub on the host.
2026-06-19 00:00:46 -05:00
justin
b40fc7ec36 feat(deliverability): exclude Apple consumer mail + scrub stale consumer subs from Listmonk
The fmcsa campaign builders already exclude gmail/yahoo/microsoft/etc. from NEW
audience selections, but two reputation leaks remained on the LIST-BASED side:

1. iCloud/Apple gap. icloud.com/me.com/mac.com were never in the exclusion set.
   A 2026-06 Listmonk audit found 1,321 ENABLED iCloud subscribers on list 3
   ("FCC Carriers - Direct Contacts") -- the single largest enabled-consumer
   bucket -- being cold-blasted with no exclusion at all. Add APPLE_CONSUMER_DOMAINS.

2. Stale already-imported consumer subs. List-based campaigns (e.g. the running
   CRTC/USF blast on list 3) keep hitting consumer addresses imported BEFORE the
   relevant domain joined the exclusion list. gmail.com was still the #1 bounce
   domain via that campaign even though new selections exclude it. Add
   scrub_listmonk_consumer.py: reconciles the live Listmonk subscriber table
   against the authoritative exclusion list and blocklists any ENABLED subscriber
   whose address is_blocked(). Idempotent; re-run whenever the exclusion grows so
   it applies retroactively. Uses the same 'blocklisted' terminal state as the
   bounce handler, so contacts are excluded from all current/future campaigns
   without deleting history. Supports --dry-run and both listmonk / listmonk_hc.
2026-06-18 23:55:58 -05:00
justin
49842bddbb docs(deliverability): Microsoft #1 priority + role mailboxes created (Carbonio)
Created postmaster@/abuse@/fbl@/dmarc@ as Carbonio DLs -> ops@ (they previously
REJECTED 5.1.1, which would have blocked SNDS verification AND was silently
dropping all DMARC aggregate reports). Verified accept-at-MX + delivered E2E.
Reframe Microsoft as the #1 monitoring priority (85% of audience), Yahoo as
lowest (<1%); add Carbonio admin access note; note DMARC parser now worth building.
2026-06-18 23:31:20 -05:00
justin
3ca960aca5 docs+infra(deliverability): document bulk subdomain; ansible signs send.performancewest.net
- infra/ansible/roles/mail: refactor OpenDKIM to support multiple signing domains
  via opendkim_signing_domains list (root + send.performancewest.net). Loops
  keygen/ownership/keytable/signingtable so the live two-domain setup is
  reproducible from ansible.
- infra/ansible group_vars: add bulk_mail_subdomain + campaign_from_* +
  campaign_reply_to documentation vars (map to CAMPAIGN_FROM / HC_CAMPAIGN_FROM
  env read by the builder scripts). smtp_from (transactional) stays on root.
- docs/deliverability.md: rewrite TL;DR with the carrierone-vs-performancewest
  A/B proof (same server/IPs, different From domain -> Inbox vs Junk) and the
  ~85% Microsoft / 14% Google / <1% Yahoo audience mix; add the bulk-subdomain
  section, SPF trim, rehab-disabled, and the Hestia DNS automation runbook.
2026-06-18 23:12:05 -05:00
justin
5c3b4291e7 feat(deliverability): send bulk campaigns from dedicated subdomain send.performancewest.net
Isolates bulk sending reputation onto a dedicated subdomain so the root domain
stays clean for transactional/verification mail (and recovers faster). Replies
still go to the root domain via Reply-To, so the customer-facing reply experience
is unchanged.

- build_trucking_campaigns.py: add env-overridable FROM_EMAIL
  (noreply@send.performancewest.net); use it for both scheduled + test sends
  instead of inheriting base["from_email"] from the DB base campaign.
- build_healthcare_campaigns_cron.py: FROM_EMAIL ->
  compliance@send.performancewest.net (env-overridable).
- bounce-watcher.sh / hc-bounce-watcher.sh: track the new subdomain envelope
  sender (keep legacy root-domain sender so the pre-cutover queue still drains;
  HC also tracks by hcout transport regardless of sender).

Infra already live (separate, non-code): subdomain DNS (A/MX/SPF/DKIM
selector=send/DMARC p=reject) on the Hestia master, OpenDKIM signs
d=send.performancewest.net (verified end-to-end), egress .94/.107. Root SPF
trimmed to the real IPs; pointless IP-rehab cron disabled.
2026-06-18 23:07:23 -05:00
justin
1056705cf9 docs(deliverability): Google Postmaster TXT added+verified via Hestia DNS master
DNS is fully automatable: Hestia (cp.carrierone.com, zone owner = justin user)
is the DNS master, HE.net are slaves. Added google-site-verification TXT (id
14464) via v-add-dns-record as root; verified resolving on public resolvers +
HE.net slaves. Owner just clicks Verify in the Postmaster console. Documents the
v-add-dns-record path for future records.
2026-06-18 22:05:01 -05:00
justin
5253f16675 docs: deliverability runbook (incident, IP consolidation, monitoring setup)
Documents the 2026-06-18 reputation incident (snowshoe -> Gmail domain-rep
blocks, RBLs all clean), the single-IP-per-stream consolidation, and
fill-in-the-blanks setup steps for Google Postmaster Tools, Microsoft SNDS/JMRP,
and Yahoo CFL (all require owner account login + HE.net DNS). Plus ongoing
hygiene + how to re-expand IPs once reputation recovers.
2026-06-18 17:46:28 -05:00
justin
545e6f7ed7 infra(mail): consolidate sending IPs (kill snowshoe) now that DKIM is fixed
The multi-IP rotation was built to spread risk while DKIM was broken (fixed
2026-06-17) and after the May 30-31 over-volume blast. With DKIM signing
correctly, spreading ~3k trucking msgs/day across 12 IPs (.94-.105) + ~1.2k
healthcare msgs/day across 3 IPs (.107-.109) gave each IP far too little
per-receiver volume to build reputation. Gmail/Outlook read it as snowshoe spam
and reputation-blocked ~200 msgs/day ("very low reputation of the sending
domain") -> 0 human clicks, 0 sales.

Consolidate to ONE IP per stream so each accrues real reputation:
 - trucking: pw-mta-warmup ALL=(out05) -> randmap collapses to {out05:} = .94
 - healthcare: listmonk-hc SMTP servers 2/3 (ports 2527/2528 -> .108/.109)
   disabled in DB; all HC mail now egresses .107 (hcmta01). [applied live]

Applied live: transport_maps now randmap:{out05:}; listmonk-hc restarted.
To re-expand later: add transports back to ALL + re-enable the HC SMTP servers.
2026-06-18 17:41:07 -05:00
justin
f43957882f docs(billing): record OIG/SAM recurring validation status
Checkout half proven against live Stripe (dry-run session created + expired,
zero charge), webhook subscription-id extraction + worker renewal fulfillment
covered by unit tests (31 + 13). Remaining gap: full E2E with a Stripe test
clock, which needs test-mode keys in the server .env (currently unset).
2026-06-18 09:38:51 -05:00
justin
5c1f239307 test(workers): NPI recurring-cycle fulfillment path (13 assertions)
Runs the real _BaseNPIHandler.handle() with _create_todo monkeypatched (no DB /
ERPNext / email side effects) and asserts:
 - first OIG/SAM screening has no [Monthly cycle] prefix / RECURRING banner
 - a recurring_cycle order gets the [Monthly cycle] title prefix, the
   "RECURRING MONTHLY CYCLE" banner, the invoice id, and the re-run-against-
   CURRENT-data + issue-NEW-certificate instructions
 - recurring_cycle works with and without an invoice id
 - the bundle handler's first run is not flagged recurring

Verified passing both locally and inside the deployed workers container.
2026-06-18 09:38:26 -05:00
justin
0083bc1354 docs(billing): record Stripe subscription webhook events as ENABLED + api-version caveat
The 3 subscription-lifecycle events (invoice.paid, invoice.payment_failed,
customer.subscription.deleted) are now enabled on the live endpoint
we_1THBjyB46qMvF2jnYyN8IfkK (6 events total). Documents the unpinned-endpoint
api_version caveat (account default 2024-12-18.acacia, not the SDK's dahlia) and
why invoiceSubscriptionId() must read both invoice shapes. Notes that
charge.dispute.created / balance.available are handled in code but not yet
enabled on the endpoint.
2026-06-18 08:45:22 -05:00
justin
8af2685d07 fix(webhooks): read invoice.subscription in both API shapes (acacia + dahlia)
The live Stripe webhook endpoint has NO pinned api_version, so it follows the
account default (currently 2024-12-18.acacia), which delivers the subscription
link as the top-level invoice.subscription. The code only read the new
2026-03-25.dahlia shape (invoice.parent.subscription_details.subscription), so
recurring renewal/payment-failed events would have returned a null subscription
id and silently failed to fulfill once the events were enabled.

invoiceSubscriptionId() now reads the modern shape first, then falls back to the
legacy top-level field. All other invoice fields used by the handlers
(amount_due, attempt_count, hosted_invoice_url, id) are stable across both
versions. +5 tests (legacy string/object, modern-preferred-over-legacy).
2026-06-18 08:42:29 -05:00
justin
cf021e2f91 feat(healthcare): OIG/SAM exclusion screening as $79/mo Stripe Subscription
Convert OIG/SAM from one-time $299/yr to recurring $79/month (card+ACH only) -
the first real recurring-billing product in the system. Exclusion screening is
a *monthly* federal obligation, so recurring monitoring fits the requirement and
is the biggest valuation lever (vs a one-time annual run).

Catalog (single source of truth):
- service-catalog.ts: add billing_interval + allowed_methods to ComplianceService;
  oig-sam-screening -> 7900c, billing_interval:"month", allowed_methods:[card,ach],
  name "(Monthly Monitoring)".
- gen-service-catalog.py + check-service-catalog-drift.py: carry/guard the two new
  fields; regenerate site catalog.

Checkout (api/src/routes/checkout.ts):
- mode:"subscription" with recurring price_data when billing_interval is set;
  surcharge absorbed for recurring (clean $79/mo); server-side METHOD_NOT_ALLOWED
  re-validation against allowed_methods.
- ensureColumns + migration 100: compliance_orders.stripe_subscription_id,
  bundle_upsell_sent_at (+ subscription index).

Webhooks (api/src/routes/webhooks.ts):
- record stripe_subscription_id on checkout.session.completed (subscription mode).
- invoice.paid (subscription_cycle only) -> re-dispatch screening for the cycle;
  invoice.payment_failed -> admin alert + first-failure customer nudge;
  customer.subscription.deleted -> mark order cancelled. (API 2026-03-25 moved the
  subscription link to invoice.parent.subscription_details.subscription.)

Fulfillment:
- job_server.py: pass recurring_cycle/invoice_id into the order.
- npi_provider.py: OIG handler labels renewal cycles "[Monthly cycle]" + re-screen
  note; bundle action runs only the FIRST screening + flags the $79/mo upsell.

Bundle land-and-expand:
- Provider Compliance Bundle now includes only the first OIG/SAM screening (was
  giving away $948/yr of monitoring inside an $899 bundle).
- new worker scripts/workers/bundle_upsell.py (+ pw-bundle-upsell timer): ~3 weeks
  after a paid bundle, emails the customer to continue $79/mo monitoring; dedup via
  bundle_upsell_sent_at; skips customers who already have an OIG/SAM order.

Surfaces updated to $79/mo: PaymentStep (filters methods, "Billed every month,
cancel anytime"), order pages, healthcare index, npi-compliance-check tool (also
fixed stale $699 bundle drift -> $899), hc_oig_screening + hc_compliance_bundle
emails.

Docs: billing.md gains a "Stripe-native Subscriptions" section + a reality-check
banner (Adyen/ERPNext-gateway model documented there is NOT live; Stripe is the
real rail). Fixed run-migrations.yml container name bug
(performancewest-postgres-1 -> performancewest-api-postgres-1, overridable).

Tests: api/tests/recurring-subscription.test.ts (28 assertions) covers catalog
gating, method validation, surcharge suppression, recurring line-item build,
invoiceSubscriptionId extraction, renewal-cycle gating. tsc clean; site build
clean; catalog drift OK.

Manual deploy step: enable invoice.paid, invoice.payment_failed,
customer.subscription.deleted on the Stripe webhook endpoint.
2026-06-18 07:54:38 -05:00
justin
f481a1d13c analytics: filter email-scanner / headless traffic out of Umami stats
Email security gateways (Microsoft Defender Safe Links / ATP, Proofpoint,
Mimecast, Barracuda, etc.) auto-fetch and often render every link in a
campaign email to scan for malware. The advanced ones drive a real headless
browser, execute JS, and fire Umami pageviews/clicks that masquerade as human
visits -- inflating campaign click-through.

New site/public/js/pw-bot-filter.js queries multiple real-browser signals and
gates Umami via its official data-before-send hook (umamiBeforeSend), dropping
all events when the visitor is a bot. Signals (from empirical chromium probing):
  decisive: navigator.webdriver, HeadlessChrome UA, known scanner UAs, zero/
            collapsed screen|viewport|outer geometry, window LARGER than the
            physical screen (impossible on real HW; uses outerW/H so page zoom
            does not false-positive), software GPU rasterizer (SwiftShader/
            llvmpipe/swrast via WebGL UNMASKED_RENDERER), zero logical CPUs.
  soft (>=2 to trip): tiny screen, inner>screen, low color depth, empty
            navigator.languages, no input device (no fine/coarse pointer + no
            hover + 0 touch), no WebGL on a desktop UA.
Designed to FAIL OPEN: only strong/corroborated evidence suppresses, so real
visitors (incl. zoomed, privacy-tooled, remote-desktop, kiosk) still count.

Wired before the Umami tag in Base.astro (Astro pages) and all 86 static
public/**/*.html pages; both load with defer so order is guaranteed and the
hook is defined before Umami reads it.

Tested end-to-end with chromium (site/tests/bot-filter.test.sh, 4/4):
default headless-new, spoofed-Windows-UA + normal 1366x768 window, and
spoofed-UA + 1x1 window are all caught; hook returns null to drop the event.
2026-06-18 02:02:34 -05:00
justin
40da017b79 campaigns: auto-rollout catch-all pool gated by warmup day + live bounce rate
Replaces the panic-era burner-domain verification plan with an in-house
automatic catch-all rollout in the trucking/IFTA/UCR builders. Root-cause
classification of the 75k pre-DKIM-fix bounces showed ~55% were reputation/
auth (now fixed by DKIM signing) and only ~29% genuinely-dead mailboxes;
catch-all domains accept at RCPT time so they do not user-unknown bounce at
send, making a controlled in-house bleed safer than warming a separate burner.

catch_all_enabled() adds catch-all results only when warmup_day >=
CAMPAIGN_CATCH_ALL_MIN_DAY (21) AND the recent 2-day live bounce rate is below
CAMPAIGN_CATCH_ALL_MAX_BOUNCE_PCT (8%) on a >=300-sent sample; auto-reverts to
the clean smtp_valid/send_confirmed pool on the next run if bounces spike.
Short window so a past disaster cannot block the rollout forever and a fresh
spike trips fast. CAMPAIGN_INCLUDE_CATCH_ALL=1/0 still hard-overrides.

USABLE_FILTER (static) -> usable_filter() (per-run, memoized, one DB probe).
IFTA/UCR SELECT_SQL -> _select_sql() so tc.usable_filter() resolves at call
time, not import. 13 logic unit tests pass; live dry-run decision = OFF
(day 15 < 21 and recent 2d bounce 42% from the aging-out Jun-16 disaster).
2026-06-18 01:39:09 -05:00
justin
c36ef07310 crtc site: defensible framing + 'who this is for' compliance posture
Reduce evasion optics that would draw FCC enforcement attention while keeping the
real value props:
- 'What they avoid by being Canadian' -> 'What the Canadian structure changes'
- Drop 'No US telecom taxes on invoices (15-40% saved)' -> Canadian tax treatment
  on the Canadian entity's billing; 'No US FCC regulatory fees on the Canadian entity'
- '...avoid this by routing US traffic...' -> '...instead route US traffic through
  US intermediaries who carry the 499-A obligation...'
- Add prominent 'Who this is for - and who it isn't' section: legitimate
  conversational voice (UCaaS/PBX/business/residential/live-agent) yes;
  short-duration/dialer/robocall-evasion no. States upstreams are fully
  STIR/SHAKEN compliant and we don't onboard traffic designed to evade
  caller-ID auth; notes Canadian carriers police ASR/ACD more strictly than
  anywhere (a feature). HTML validated balanced.
2026-06-18 00:22:58 -05:00
justin
720197095c CRTC USF email: defensible framing + conversational-voice caveat
Reframe away from 'escape the FCC' optics that would draw enforcement attention:
- Header/flagbar: 'Move your VoIP home to Canada' / 'US obligations ride on your
  upstream' (was 'no FCC reporting, no USAC, no S/S to run')
- Recast claims to 'CRTC regulatory home, not FCC' and scope the no-USF/no-499/
  no-RMD claims to the Canadian-jurisdiction traffic (accurate for US-number
  traffic, which rides on the compliant US upstream)
- STIR/SHAKEN bullet now explicitly pro-compliance: 'we don't help anyone dodge
  call-authentication; upstream partners are fully S/S compliant'
- Drop 'outside the FCC's reach'
- Add honest caveat: Canada is not for short-duration/dialer traffic; Canadian
  carriers are more stringent on ACD/ASR than anywhere; this is for real
  conversational voice (UCaaS/PBX/business/residential/live-agent)
2026-06-18 00:20:44 -05:00
justin
a82b356921 CRTC USF email: reframe to 'run your whole VoIP as a Canadian carrier'
Pivot from the hedge/second-entity framing to the consolidation pitch: one CRTC
carrier as the home base, nexus in Canada, customers onboarded from anywhere.
Lead value props with the three concrete reseller realities:
- No FCC reporting (no 499-A/Q, no RMD recert)
- No USAC/USF on your revenue (contribution sits upstream)
- No STIR/SHAKEN to set up or run (reseller can't get a US token; upstream signs)
Add: No FCC Section 214 / no ongoing 214 burden -- CRTC BITS is a cheap,
low-burden notification by comparison. Header/subject reworked; keeps the honest
US-termination + upstream-signing explanation.
2026-06-18 00:10:06 -05:00
justin
d9ecb94b27 CRTC USF email: add honest US-termination + STIR/SHAKEN section
Address the two most common objections truthfully (researched against CRTC,
FCC 2025 Third-Party Authentication Order, and STIR/SHAKEN cross-border docs):
- US-based long-distance termination operators routinely accept traffic from
  Canadian carriers (cross-border voice is a standard interconnect).
- STIR/SHAKEN: a Canadian reseller cannot get a US SPC token (US-carrier-only),
  so US-bound calls are signed by the upstream US-number provider that assigns
  the DIDs -- exactly how most small US carriers already rely on upstream
  signing. Canadian-origin traffic falls under the lighter CRTC regime, handled
  by the upstream Canadian carrier. Does NOT claim S/S disappears -- it moves to
  the upstream, off the carrier's day-to-day operation.
2026-06-18 00:03:31 -05:00
justin
8099afc5ab CRTC USF email: note US DIDs available from Canadian carriers + point to guide
Address the obvious 'but I need US numbers' objection: several Canadian
wholesale carriers (Fibernetics, Iristel, VoIP.ms, Telnyx, Bandwidth, Twilio,
Frontier) provision US DIDs to CRTC-registered carriers, so they can keep
serving US customers from the Canadian entity. Adds a Canada-advantage bullet
and updates the guide block to call out both US + Canadian DIDs.
2026-06-17 23:53:19 -05:00
justin
1c63e8f4b5 CRTC USF email: add FCC photo-ID KYC requirement to the burden list + Canada contrast
The FCC's 2025 Robocall Mitigation Order (47 CFR 64.1200(n)(4), FCC 25-6)
requires collecting + authenticating a government-issued photo ID for every
new customer before turning up voice service. Add it to the US-carrier burden
list and the matching 'does not apply in Canada' advantage.
2026-06-17 23:46:04 -05:00
justin
2611b5458b CRTC USF campaign: shared campaign_helpers + Q3 38.8% USF email builder
- campaign_helpers.py: extract the branded Listmonk HTML helpers (hdr/flagbar/
  stats/cta/footer/P/UL/etc.) + create_campaign() from create_campaigns.py into
  a side-effect-free shared module; create_campaign() now takes an altbody so
  every campaign ships a plaintext alternative (deliverability).
- create_crtc_usf_campaign.py: build the one-off CRTC email hooked on the Q3
  2026 USF factor (38.8%, +1.8pts, eff Jul 1), with a $200-off CANADA200 banner
  (expires Fri 23:59 ET, CTA links carry ?code= for auto-apply), the full US
  carrier burden vs Canada advantage, BC/ON incorporation, and a hosted
  carrier-guide PDF download. Creates a DRAFT only; sending stays manual.
2026-06-17 23:40:01 -05:00
justin
e379e2b10f CRTC: ERPNext as portal source of truth + harden discount expiry + carrier guide PDF
- checkout.ts: generalize ensureCompliancePortalUser -> ensurePortalUser and
  call it in the CRTC post-payment path so PayPal/crypto/webhook-confirmed CRTC
  orders always get an ERPNext Customer + Website User (the single source of
  truth for portal login/password), matching the compliance fix from the
  PayPal incident. Also flip portal_user_created for canada_crtc/formation.
- canada-crtc.ts: enforce discount active+start/expiry windows, global usage
  limit and applies_to scope server-side at checkout (was active-only), so a
  promo like CANADA200 actually stops working after its expiry.
- scripts/generate_canada_carrier_guide_pdf.py: render the public Canadian
  wholesale carrier/vendor guide PDF (reuses the canonical VENDORS list) to
  site/public/guides/canada-carrier-guide.pdf for the CRTC campaign lead magnet.
2026-06-17 23:34:13 -05:00
justin
eed5e4a258 campaigns: disable daily discount by default — test normal-price deals
The daily 40%-off coupon was being merged into every trucking/UCR/IFTA/OTC
send, but those discount sends were not actually being delivered (the
DKIM-broken window). Now that deliverability is fixed, re-test whether
normal-price offers convert before giving margin away.

New CAMPAIGN_ENABLE_COUPON env flag (default OFF) gates daily-coupon
minting in build_trucking_campaigns + the UCR/IFTA/OTC builders (which
import it as tc.COUPON_ENABLED). With it off, no code is minted and an
empty coupon_code is merged -> the campaign templates' existing
{{ if .Subscriber.Attribs.coupon_code }} guard falls through to the
normal-price {{ else }} branch and landing-page links carry no ?code=.
No template or DB changes; fully reversible (set CAMPAIGN_ENABLE_COUPON=1).

Verified: COUPON_ENABLED defaults False, coupon_attribs(None) -> empty,
lp_link drops ?code= when no coupon, all 4 builders compile.
2026-06-17 22:51:28 -05:00
justin
a04ecf7df3 chore(email): decommission SMTP2GO references — local MTA only
SMTP2GO is no longer used: Listmonk relays through the local Postfix MTA
(172.18.0.1:25 from the Docker network), which DKIM-signs and delivers
direct-to-recipient-MX; transactional mail goes through Carbonio. Verified
zero smtp2go in any live container env + postfix has no external relayhost.

Removed the stale references so a rebuild/new dev can't re-introduce it:
- api/src/config.ts: SMTP_HOST default mail.smtp2go.com -> co.carrierone.com
- scripts/workers/crypto_payment_worker.py: same default fix
- infra/ansible all.yml: listmonk_smtp_* now 172.18.0.1:25, no auth (+comment)
- app.env.j2 / email.ts / crm.md / go-live-todo.md / architecture.svg: docs
2026-06-17 22:46:59 -05:00
justin
eba525f83f docs: runbook fix #8 — telecom/transactional HTML-only plaintext fix + campaign 407 finding 2026-06-17 21:17:06 -05:00
justin
b375385efd fix(email): add text/plain part to every transactional + telecom email
All transactional/worker senders built multipart/alternative (or mixed)
messages with ONLY an HTML part. A single-part multipart/alternative is
malformed and HTML-only mail is a spam-score signal -- the same class of
deliverability bug that hurt the campaign pipeline, but on the telecom /
filing / customer-transactional path (499-Q reminders, RMD/FCC filing
review links, intake/completion/delivery emails, commissions, etc).

- worker_email.send_worker_email: auto-derive plaintext from HTML when
  caller omits text= (fixes the shared helper for all current+future use)
- 16 rolled-their-own senders in scripts/workers/** + scripts/formation/
  document_delivery.py: attach html_to_text(...) plaintext sibling before
  the HTML part (job_server + document_delivery wrap text+html in an
  alternative sub-part so PDFs still attach to the mixed root)
- api/src/email.ts: add dependency-free htmlToText() and default
  sendEmail text to it (fixes checkout/webhook HTML-only sends)

Verified: all py files compile + import at runtime, api tsc passes,
htmlToText handles hrefs/lists/entities, 11 plaintext unit tests pass.
Telecom campaign 407 (Jun 8) was HTML-only + sent in the DKIM-broken
window -> 384 sent / 0 clicks (same junked-mail signature).
2026-06-17 21:07:40 -05:00
justin
899b880e7f trucking: weekly FMCSA source refresh so new non-compliant carriers are caught
The FMCSA census was a one-time snapshot (last loaded ~May 30) with NO refresh
timer -- carriers newly falling out of MCS-150/UCR compliance were never picked
up. New scripts/workers/fmcsa_source_refresh.py orchestrates the full pipeline
(census download -> enrichment -> deficiency flag -> verify new emails ->
MX-tag new) and runs weekly via cron pw-fmcsa-refresh (Sun 09:00 UTC), codified
in the mail-pipeline Ansible role.

Idempotent + incremental: the census upsert preserves email_verified /
listmonk_sent_at / deficiency_flags, so existing carriers keep their send state
and only census fields refresh; new DOTs flow into verification then campaigns.
A carrier who refiled gets a fresh mcs150_parsed, so the builder's overdue
WHERE clause stops targeting them automatically. Verify is capped per run
(20k) so it never stalls on millions of rows.

(Healthcare already auto-catches newly-revalidation-overdue providers within
its 63k institutional pool via pw-hc-refresh Mon/Wed/Fri.)
2026-06-17 20:44:54 -05:00
justin
4171f48736 docs: record post-incident email hardening (7 fixes) in runbook 2026-06-17 20:30:59 -05:00
justin
466460112b email: handle unquoted hrefs in plaintext converter + add tests
The anchor regex only matched quoted hrefs; unquoted (href=URL) dropped the
URL from the plaintext part. Now handles double/single/unquoted. Added
scripts/test_email_plaintext.py (11 cases: link forms, mailto, template-tag
preservation, tag stripping, entity unescape, blank-line collapse).
2026-06-17 20:28:15 -05:00
justin
4dc5690666 infra: codify the email-campaign pipeline in Ansible (new mail-pipeline role)
The entire outbound campaign pipeline lived ONLY on the host and was never in
IaC -- a fresh rebuild would have silently shipped NO campaigns, NO IP warmup/
ramp, and NO bounce processing. New mail-pipeline role + deploy-mail-pipeline.yml
playbook deploy it from the canonical repo copies:

  cron.d (infra/cron/):
    - pw-trucking-campaign-builder, pw-ifta-campaign, pw-ucr-campaign
    - pw-hc-campaign, pw-hc-nppes, pw-hc-refresh
    - pw-mta-warmup, pw-listmonk-rampcap, pw-hc-rampcap
    - pw-ip-rehab, pw-warmup-tg-alert
  helper scripts (-> /usr/local/bin):
    - pw-mta-warmup, pw-listmonk-rampcap, pw-hc-rampcap, pw-warmup-tg-alert
    - postfix-bounce-notify.sh, postfix-hc-bounce-notify.sh, listmonk-bounce-sync.py
  systemd services:
    - pw-bounce-watcher.service (was missing from repo), pw-hc-bounce-watcher.service

Also creates the deploy-owned {{project_dir}}/logs dir (deploy can't write
/var/log, so a missing dir made cron redirects fail). Added the 6 cron.d files
that existed only on the host, the trucking bounce-watcher unit, and synced
infra/cron/pw-hc-refresh to the live version (revalidation download + enrich
steps). Role wired into site.yml after the mail (OpenDKIM) role.

Part of the email-deliverability incident hardening.
2026-06-17 20:26:01 -05:00
justin
c183957939 email: suppress defunct/legacy/satellite ISP domains in cold sends
Added DEAD_ISP_DOMAINS (52 domains) to BLOCKED_EMAIL_DOMAINS, so every
campaign builder that imports the shared exclusions (trucking, UCR, IFTA via
create_and_schedule_campaign, and the healthcare importer) stops cold-mailing
them. Domains were identified from our own Listmonk bounce table (top bounced
recipient domains) cross-checked against ISP status: defunct dial-up brands
(earthlink, netzero, juno, mindspring...), Qwest/Embarq legacy, satellite
(hughes, wildblue, dishmail), Altice/Suddenlink rural, WOW!/Knology, small
rural ISPs (windstream, tds, iowatelecom...) and Alaska regional.

Deliberately keeps still-active large consumer ISPs (comcast/charter/cox/
centurylink) -- their bounces were the cold-IP/no-DKIM reputation problem
(now fixed), not dead mailboxes, and they carry real prospects.

Part of the email-deliverability incident hardening.
2026-06-17 20:16:00 -05:00
justin
a32a3b05a0 email: add plaintext MIME part + stable Message-ID hostname
Two deliverability hardening fixes from the email audit:

1. Plaintext (altbody): all campaigns were HTML-only. Listmonk only emits
   multipart/alternative when altbody is set, and HTML-only bulk mail is a
   spam-score signal. New scripts/_email_plaintext.py renders a readable
   text/plain part from the HTML body (dependency-free; preserves Listmonk
   {{ .Subscriber }}/{{ UnsubscribeURL }} template tags, turns links into
   'text (url)'). Wired into the trucking builder (and thus UCR + IFTA, which
   reuse create_and_schedule_campaign) and the healthcare builder.

2. Stable container hostname: Listmonk derived its Message-ID from the random
   docker container id -> @localhost.localdomain (spam-score signal). Pin both
   listmonk + listmonk-hc hostname to perfwest.performancewest.net, matching
   Listmonk's SMTP hello_hostname.

Part of the email-deliverability incident hardening.
2026-06-17 20:09:02 -05:00
justin
2e4388a803 mail: add logrotate for Postfix mail.log (postlogd copytruncate)
mail.log had no logrotate rule and grew unbounded to ~1GB (~150MB/day)
since Jun 8. This host logs via Postfix's built-in postlogd (maillog_file
mode), not rsyslog (no rsyslog.service exists), so postlogd holds the file
open -- a plain rename+create would leave it writing to the stale inode.
Use copytruncate (no daemon signal needed). Rotate daily, keep 14 days
compressed. Applied live: forced first rotation, compressed the 1GB
archive (->99MB), verified logging + bounce watchers + DKIM signing intact.

Part of the email-deliverability incident hardening (follows DKIM fix 4d59019).
2026-06-17 19:47:13 -05:00
justin
4d5901921e mail: fix OpenDKIM not signing campaign mail (Docker-injected) + codify in Ansible
Root cause of the Jun 2026 deliverability collapse / 'no new sales':
opendkim.conf was in single-key mode with no InternalHosts, so it signed only
127.0.0.1. Transactional/cron mail (injected locally) was signed, but ALL
campaign mail -- injected over the Docker bridge from the Listmonk containers
(172.18.0.5 trucking, 172.18.0.25 healthcare) -- went out UNSIGNED. Gmail/Yahoo
require DKIM on bulk mail since Feb 2024, so cold campaigns were junked/blocked
(~23% delivery, 550-5.7.1). Proof: 2,620 campaign msgs that day, 0 DKIM sigs.

The correct table files already existed on the server but were never wired into
opendkim.conf. Fix points the daemon at key.table/signing.table and sets
InternalHosts/ExternalIgnoreList to trusted.hosts (which includes 172.16.0.0/12,
the Docker subnet). Fixes BOTH streams: HC submission ports 2526-2528 inherit
the global smtpd_milters and *@performancewest.net covers compliance@.

Verified by injecting from a Docker IP through port 25 and port 2526 -- both now
get 'DKIM-Signature field added'. Codified as new Ansible role 'mail' so it
can't silently regress (OpenDKIM was previously not in IaC at all).
2026-06-17 19:31:19 -05:00
justin
f7212b3969 scripts: one-off fresh password-set link for Paul Wilson (ERPNext auth) 2026-06-17 10:19:53 -05:00
justin
9c87759501 auth: make ERPNext the single source of truth for customer passwords
Customer portal login previously checked a bcrypt customers.password_hash
in Postgres, while portal.performancewest.net validated against ERPNext —
two stores that drifted (the Paul Wilson lockout). Consolidate on ERPNext:

- erpnext-client: add verifyWebsiteUserPassword() — delegates the credential
  check to Frappe /api/method/login (Host header = site name; 200=ok,401=bad).
- portal-auth /login: verify against ERPNext, then mint the pw_customer cookie.
- portal-auth /register: create+set the ERPNext password (authority) and upsert
  a password-less customers profile row; takeover guard still honors any legacy
  PG password until the column is dropped.
- portal-auth /reset-password + /forgot-password: write the new password to
  ERPNext; forgot-password now also works for ERPNext-only users (creates the
  PG profile row on demand).
- Legacy customers with only a PG bcrypt password reset via forgot-password.
- checkout: refresh the stale comment (customers row is now a profile, no pw).

Build + typecheck green.
2026-06-17 10:09:32 -05:00
justin
557b45f65d fix(erpnext): self-heal outgoing Email Account password from SMTP_* env
Root cause of recurring 'Password not found for Email Account Performance West
Outgoing': the account was shipped as a fixture with awaiting_password=1 and no
password. Email Account SMTP passwords are encrypted per-site and cannot live in
a fixture, so every `bench migrate` reimported the fixture and re-broke
outgoing mail (login notifications, password resets, welcome emails).

- Remove the Email Account fixture (it cannot carry the encrypted secret).
- Add email_account_sync.sync_outgoing_password: idempotent, exception-safe
  upsert that reconciles the account + password from SMTP_* env and clears
  awaiting_password.
- Wire it to after_migrate (repairs at end of every deploy/migrate, right after
  fixtures import) and the daily scheduler (heals out-of-band restore/restart
  drift).
- Pass SMTP_* into the erpnext + erpnext-scheduler containers so the sync has
  the secret (they previously had no SMTP env).
2026-06-17 09:48:28 -05:00
justin
1eb29f80be fix(verifier): mx_unreachable was mislabeling live big-ISP mailboxes
The verifier returned (True, 'mx_unreachable') when it couldn't complete a port-25
probe to ANY MX — marking 438,163 addresses email_verified=TRUE. But these are NOT
dead: they're dominated by Comcast (13.7k), AT&T/SBCGlobal (13.5k), Verizon, Cox,
Charter, Frontier, etc. — major ISPs that deliberately tarpit/refuse probes from
unknown IPs. Confirmed from prod: comcast MX connects + returns 220. The probe
failure ≠ undeliverable.

Fix: return (False, 'mx_probe_blocked') — MX exists, deliverability UNKNOWN, must
be confirmed by a real send. Excluded from PW campaigns; prime burner-verification
target (burner_list_verify upgrades it to send_confirmed on delivery). Existing
438,163 mx_unreachable rows reclassified in prod to mx_probe_blocked / verified=FALSE.
2026-06-17 05:48:08 -05:00