Commit graph

719 commits

Author SHA1 Message Date
justin
a04ecf7df3 chore(email): decommission SMTP2GO references — local MTA only
SMTP2GO is no longer used: Listmonk relays through the local Postfix MTA
(172.18.0.1:25 from the Docker network), which DKIM-signs and delivers
direct-to-recipient-MX; transactional mail goes through Carbonio. Verified
zero smtp2go in any live container env + postfix has no external relayhost.

Removed the stale references so a rebuild/new dev can't re-introduce it:
- api/src/config.ts: SMTP_HOST default mail.smtp2go.com -> co.carrierone.com
- scripts/workers/crypto_payment_worker.py: same default fix
- infra/ansible all.yml: listmonk_smtp_* now 172.18.0.1:25, no auth (+comment)
- app.env.j2 / email.ts / crm.md / go-live-todo.md / architecture.svg: docs
2026-06-17 22:46:59 -05:00
justin
eba525f83f docs: runbook fix #8 — telecom/transactional HTML-only plaintext fix + campaign 407 finding 2026-06-17 21:17:06 -05:00
justin
b375385efd fix(email): add text/plain part to every transactional + telecom email
All transactional/worker senders built multipart/alternative (or mixed)
messages with ONLY an HTML part. A single-part multipart/alternative is
malformed and HTML-only mail is a spam-score signal -- the same class of
deliverability bug that hurt the campaign pipeline, but on the telecom /
filing / customer-transactional path (499-Q reminders, RMD/FCC filing
review links, intake/completion/delivery emails, commissions, etc).

- worker_email.send_worker_email: auto-derive plaintext from HTML when
  caller omits text= (fixes the shared helper for all current+future use)
- 16 rolled-their-own senders in scripts/workers/** + scripts/formation/
  document_delivery.py: attach html_to_text(...) plaintext sibling before
  the HTML part (job_server + document_delivery wrap text+html in an
  alternative sub-part so PDFs still attach to the mixed root)
- api/src/email.ts: add dependency-free htmlToText() and default
  sendEmail text to it (fixes checkout/webhook HTML-only sends)

Verified: all py files compile + import at runtime, api tsc passes,
htmlToText handles hrefs/lists/entities, 11 plaintext unit tests pass.
Telecom campaign 407 (Jun 8) was HTML-only + sent in the DKIM-broken
window -> 384 sent / 0 clicks (same junked-mail signature).
2026-06-17 21:07:40 -05:00
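The MIME layout described in this commit (text+html wrapped in an alternative sub-part, PDFs attached to a mixed root) can be sketched with the standard library. This is a minimal illustration, not the repo's helper: `build_message` and its parameters are hypothetical names.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(subject, text_body, html_body, pdf_bytes=None, pdf_name="document.pdf"):
    """Build a message whose body always carries a text/plain sibling before the HTML."""
    alternative = MIMEMultipart("alternative")
    # Order matters: the plain part goes first, the richer HTML part last.
    alternative.attach(MIMEText(text_body, "plain", "utf-8"))
    alternative.attach(MIMEText(html_body, "html", "utf-8"))

    if pdf_bytes is None:
        # No attachment: the alternative container is the whole message.
        root = alternative
    else:
        # With an attachment: mixed root holds the alternative body plus the PDF.
        root = MIMEMultipart("mixed")
        root.attach(alternative)
        attachment = MIMEApplication(pdf_bytes, _subtype="pdf")
        attachment.add_header("Content-Disposition", "attachment", filename=pdf_name)
        root.attach(attachment)

    root["Subject"] = subject
    return root
```

Keeping the alternative container as a child of the mixed root means a mail client still chooses between the text and HTML renderings while treating the PDF as a separate attachment.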
justin
899b880e7f trucking: weekly FMCSA source refresh so new non-compliant carriers are caught
The FMCSA census was a one-time snapshot (last loaded ~May 30) with NO refresh
timer -- carriers newly falling out of MCS-150/UCR compliance were never picked
up. New scripts/workers/fmcsa_source_refresh.py orchestrates the full pipeline
(census download -> enrichment -> deficiency flag -> verify new emails ->
MX-tag new) and runs weekly via cron pw-fmcsa-refresh (Sun 09:00 UTC), codified
in the mail-pipeline Ansible role.

Idempotent + incremental: the census upsert preserves email_verified /
listmonk_sent_at / deficiency_flags, so existing carriers keep their send state
and only census fields refresh; new DOTs flow into verification then campaigns.
A carrier who refiled gets a fresh mcs150_parsed, so the builder's overdue
WHERE clause stops targeting them automatically. Verify is capped per run
(20k) so it never stalls on millions of rows.

(Healthcare already auto-catches newly-revalidation-overdue providers within
its 63k institutional pool via pw-hc-refresh Mon/Wed/Fri.)
2026-06-17 20:44:54 -05:00
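The "refresh census fields, preserve send state" behaviour of the upsert can be illustrated with an in-memory SQLite table. This is only a sketch under stated assumptions: SQLite stands in for the real database, and the `dot_number` and `legal_name` columns and the `refresh` helper are hypothetical; only `fmcsa_carriers`, `mcs150_parsed`, `email_verified` and `listmonk_sent_at` are names taken from the commit messages.

```python
import sqlite3

# Only census-sourced columns appear in the DO UPDATE clause, so send-state
# columns (email_verified, listmonk_sent_at) survive every refresh untouched.
UPSERT = """
INSERT INTO fmcsa_carriers (dot_number, legal_name, mcs150_parsed)
VALUES (?, ?, ?)
ON CONFLICT(dot_number) DO UPDATE SET
    legal_name = excluded.legal_name,
    mcs150_parsed = excluded.mcs150_parsed
"""


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE fmcsa_carriers ("
        " dot_number INTEGER PRIMARY KEY,"
        " legal_name TEXT,"
        " mcs150_parsed TEXT,"
        " email_verified INTEGER DEFAULT 0,"
        " listmonk_sent_at TEXT)"
    )
    return db


def refresh(db, census_rows):
    """Apply one census snapshot; safe to run repeatedly with the same rows."""
    db.executemany(UPSERT, census_rows)
    db.commit()
```

Running `refresh` twice with the same rows leaves the table unchanged, which is what makes a weekly cron safe to re-run.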
justin
4171f48736 docs: record post-incident email hardening (7 fixes) in runbook 2026-06-17 20:30:59 -05:00
justin
466460112b email: handle unquoted hrefs in plaintext converter + add tests
The anchor regex only matched quoted hrefs; unquoted (href=URL) dropped the
URL from the plaintext part. Now handles double/single/unquoted. Added
scripts/test_email_plaintext.py (11 cases: link forms, mailto, template-tag
preservation, tag stripping, entity unescape, blank-line collapse).
2026-06-17 20:28:15 -05:00
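A converter that accepts double-quoted, single-quoted and unquoted hrefs might look like the following. It is a simplified stand-in, not the contents of scripts/_email_plaintext.py: the function name `html_to_text_sketch` and the exact regex are assumptions made for illustration.

```python
import html
import re

# One alternation per href form: "double", 'single', or unquoted (runs to whitespace or '>').
_ANCHOR = re.compile(
    r"""<a\b[^>]*?\bhref\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))[^>]*>(.*?)</a>""",
    re.IGNORECASE | re.DOTALL,
)


def _anchor_to_text(match):
    url = match.group(1) or match.group(2) or match.group(3) or ""
    label = re.sub(r"<[^>]+>", "", match.group(4)).strip()
    if not label or label == url:
        return url
    return f"{label} ({url})"


def html_to_text_sketch(markup):
    """Render a readable text/plain body: links become 'text (url)', tags are stripped."""
    text = _ANCHOR.sub(_anchor_to_text, markup)
    text = re.sub(r"<br\s*/?>|</p>", "\n", text, flags=re.IGNORECASE)
    text = re.sub(r"<[^>]+>", "", text)   # drop remaining tags; {{ ... }} template tags are untouched
    text = html.unescape(text)             # &amp; -> &, etc.
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse runs of blank lines
    return text.strip()
```

Trying the quoted forms before the unquoted one keeps a quoted URL containing spaces (such as a template tag) in one piece.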
justin
4dc5690666 infra: codify the email-campaign pipeline in Ansible (new mail-pipeline role)
The entire outbound campaign pipeline lived ONLY on the host and was never in
IaC -- a fresh rebuild would have silently shipped NO campaigns, NO IP warmup/
ramp, and NO bounce processing. New mail-pipeline role + deploy-mail-pipeline.yml
playbook deploy it from the canonical repo copies:

  cron.d (infra/cron/):
    - pw-trucking-campaign-builder, pw-ifta-campaign, pw-ucr-campaign
    - pw-hc-campaign, pw-hc-nppes, pw-hc-refresh
    - pw-mta-warmup, pw-listmonk-rampcap, pw-hc-rampcap
    - pw-ip-rehab, pw-warmup-tg-alert
  helper scripts (-> /usr/local/bin):
    - pw-mta-warmup, pw-listmonk-rampcap, pw-hc-rampcap, pw-warmup-tg-alert
    - postfix-bounce-notify.sh, postfix-hc-bounce-notify.sh, listmonk-bounce-sync.py
  systemd services:
    - pw-bounce-watcher.service (was missing from repo), pw-hc-bounce-watcher.service

Also creates the deploy-owned {{project_dir}}/logs dir (deploy can't write
/var/log, so a missing dir made cron redirects fail). Added the 6 cron.d files
that existed only on the host, the trucking bounce-watcher unit, and synced
infra/cron/pw-hc-refresh to the live version (revalidation download + enrich
steps). Role wired into site.yml after the mail (OpenDKIM) role.

Part of the email-deliverability incident hardening.
2026-06-17 20:26:01 -05:00
justin
c183957939 email: suppress defunct/legacy/satellite ISP domains in cold sends
Added DEAD_ISP_DOMAINS (52 domains) to BLOCKED_EMAIL_DOMAINS, so every
campaign builder that imports the shared exclusions (trucking, UCR, IFTA via
create_and_schedule_campaign, and the healthcare importer) stops cold-mailing
them. Domains were identified from our own Listmonk bounce table (top bounced
recipient domains) cross-checked against ISP status: defunct dial-up brands
(earthlink, netzero, juno, mindspring...), Qwest/Embarq legacy, satellite
(hughes, wildblue, dishmail), Altice/Suddenlink rural, WOW!/Knology, small
rural ISPs (windstream, tds, iowatelecom...) and Alaska regional.

Deliberately keeps still-active large consumer ISPs (comcast/charter/cox/
centurylink) -- their bounces were the cold-IP/no-DKIM reputation problem
(now fixed), not dead mailboxes, and they carry real prospects.

Part of the email-deliverability incident hardening.
2026-06-17 20:16:00 -05:00
justin
a32a3b05a0 email: add plaintext MIME part + stable Message-ID hostname
Two deliverability hardening fixes from the email audit:

1. Plaintext (altbody): all campaigns were HTML-only. Listmonk only emits
   multipart/alternative when altbody is set, and HTML-only bulk mail is a
   spam-score signal. New scripts/_email_plaintext.py renders a readable
   text/plain part from the HTML body (dependency-free; preserves Listmonk
   {{ .Subscriber }}/{{ UnsubscribeURL }} template tags, turns links into
   'text (url)'). Wired into the trucking builder (and thus UCR + IFTA, which
   reuse create_and_schedule_campaign) and the healthcare builder.

2. Stable container hostname: Listmonk derived its Message-ID from the random
   docker container id -> @localhost.localdomain (spam-score signal). Pin both
   listmonk + listmonk-hc hostname to perfwest.performancewest.net, matching
   Listmonk's SMTP hello_hostname.

Part of the email-deliverability incident hardening.
2026-06-17 20:09:02 -05:00
justin
2e4388a803 mail: add logrotate for Postfix mail.log (postlogd copytruncate)
mail.log had no logrotate rule and grew unbounded to ~1GB (~150MB/day)
since Jun 8. This host logs via Postfix's built-in postlogd (maillog_file
mode), not rsyslog (no rsyslog.service exists), so postlogd holds the file
open -- a plain rename+create would leave it writing to the stale inode.
Use copytruncate (no daemon signal needed). Rotate daily, keep 14 days
compressed. Applied live: forced first rotation, compressed the 1GB
archive (->99MB), verified logging + bounce watchers + DKIM signing intact.

Part of the email-deliverability incident hardening (follows DKIM fix 4d59019).
2026-06-17 19:47:13 -05:00
justin
4d5901921e mail: fix OpenDKIM not signing campaign mail (Docker-injected) + codify in Ansible
Root cause of the Jun 2026 deliverability collapse / 'no new sales':
opendkim.conf was in single-key mode with no InternalHosts, so it signed only
127.0.0.1. Transactional/cron mail (injected locally) was signed, but ALL
campaign mail -- injected over the Docker bridge from the Listmonk containers
(172.18.0.5 trucking, 172.18.0.25 healthcare) -- went out UNSIGNED. Gmail/Yahoo
require DKIM on bulk mail since Feb 2024, so cold campaigns were junked/blocked
(~23% delivery, 550-5.7.1). Proof: 2,620 campaign msgs that day, 0 DKIM sigs.

The correct table files already existed on the server but were never wired into
opendkim.conf. Fix points the daemon at key.table/signing.table and sets
InternalHosts/ExternalIgnoreList to trusted.hosts (which includes 172.16.0.0/12,
the Docker subnet). Fixes BOTH streams: HC submission ports 2526-2528 inherit
the global smtpd_milters and *@performancewest.net covers compliance@.

Verified by injecting from a Docker IP through port 25 and port 2526 -- both now
get 'DKIM-Signature field added'. Codified as new Ansible role 'mail' so it
can't silently regress (OpenDKIM was previously not in IaC at all).
2026-06-17 19:31:19 -05:00
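Why the trusted.hosts entry 172.16.0.0/12 covers both Listmonk containers can be checked with a few lines of Python. This is an illustration of the subnet arithmetic only, not of how OpenDKIM evaluates InternalHosts; `TRUSTED_NETWORKS` and `is_internal_host` are hypothetical names.

```python
import ipaddress

# Loopback (the only source signed before the fix) plus the Docker range named in the commit.
TRUSTED_NETWORKS = [ipaddress.ip_network(n) for n in ("127.0.0.1/32", "172.16.0.0/12")]


def is_internal_host(addr):
    """Return True when addr falls inside any trusted network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in TRUSTED_NETWORKS)
```

The /12 spans 172.16.0.0 through 172.31.255.255, so 172.18.0.5 and 172.18.0.25 both qualify while an address such as 172.32.0.1 does not.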
justin
f7212b3969 scripts: one-off fresh password-set link for Paul Wilson (ERPNext auth) 2026-06-17 10:19:53 -05:00
justin
9c87759501 auth: make ERPNext the single source of truth for customer passwords
Customer portal login previously checked a bcrypt customers.password_hash
in Postgres, while portal.performancewest.net validated against ERPNext —
two stores that drifted (the Paul Wilson lockout). Consolidate on ERPNext:

- erpnext-client: add verifyWebsiteUserPassword() — delegates the credential
  check to Frappe /api/method/login (Host header = site name; 200=ok,401=bad).
- portal-auth /login: verify against ERPNext, then mint the pw_customer cookie.
- portal-auth /register: create+set the ERPNext password (authority) and upsert
  a password-less customers profile row; takeover guard still honors any legacy
  PG password until the column is dropped.
- portal-auth /reset-password + /forgot-password: write the new password to
  ERPNext; forgot-password now also works for ERPNext-only users (creates the
  PG profile row on demand).
- Legacy customers with only a PG bcrypt password reset via forgot-password.
- checkout: refresh the stale comment (customers row is now a profile, no pw).

Build + typecheck green.
2026-06-17 10:09:32 -05:00
justin
557b45f65d fix(erpnext): self-heal outgoing Email Account password from SMTP_* env
Root cause of recurring 'Password not found for Email Account Performance West
Outgoing': the account was shipped as a fixture with awaiting_password=1 and no
password. Email Account SMTP passwords are encrypted per-site and cannot live in
a fixture, so every `bench migrate` reimported the fixture and re-broke
outgoing mail (login notifications, password resets, welcome emails).

- Remove the Email Account fixture (it cannot carry the encrypted secret).
- Add email_account_sync.sync_outgoing_password: idempotent, exception-safe
  upsert that reconciles the account + password from SMTP_* env and clears
  awaiting_password.
- Wire it to after_migrate (repairs at end of every deploy/migrate, right after
  fixtures import) and the daily scheduler (heals out-of-band restore/restart
  drift).
- Pass SMTP_* into the erpnext + erpnext-scheduler containers so the sync has
  the secret (they previously had no SMTP env).
2026-06-17 09:48:28 -05:00
justin
1eb29f80be fix(verifier): mx_unreachable was mislabeling live big-ISP mailboxes
The verifier returned (True, 'mx_unreachable') when it couldn't complete a port-25
probe to ANY MX — marking 438,163 addresses email_verified=TRUE. But these are NOT
dead: they're dominated by Comcast (13.7k), AT&T/SBCGlobal (13.5k), Verizon, Cox,
Charter, Frontier, etc. — major ISPs that deliberately tarpit/refuse probes from
unknown IPs. Confirmed from prod: comcast MX connects + returns 220. The probe
failure ≠ undeliverable.

Fix: return (False, 'mx_probe_blocked') — MX exists, deliverability UNKNOWN, must
be confirmed by a real send. Excluded from PW campaigns; prime burner-verification
target (burner_list_verify upgrades it to send_confirmed on delivery). Existing
438,163 mx_unreachable rows reclassified in prod to mx_probe_blocked / verified=FALSE.
2026-06-17 05:48:08 -05:00
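The reclassification can be summarized as a small decision function. This is a sketch of the idea rather than the verifier's code: the function name, its boolean inputs, and the `no_mx` and `smtp_rejected` labels are assumptions; only `mx_probe_blocked` and `smtp_valid` come from the commit messages.

```python
def classify_probe(mx_hosts, probe_completed, rcpt_accepted):
    """Map a port-25 probe outcome to (email_verified, email_verify_result)."""
    if not mx_hosts:
        return (False, "no_mx")  # hypothetical label: domain publishes no MX
    if not probe_completed:
        # MX exists but refused or tarpitted the probe: deliverability is
        # unknown, so do NOT mark the address verified.
        return (False, "mx_probe_blocked")
    if rcpt_accepted:
        return (True, "smtp_valid")
    return (False, "smtp_rejected")  # hypothetical label: RCPT explicitly refused
```

The point of the fix is the second branch: a failed probe is reported as unknown rather than as a verified address.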
justin
c2737f2001 feat(deliverability): burner-domain list verification + plan doc
The smtp_valid pool is only ~3k unsent — too small to sustain campaigns. SMTP
probing can't confirm catch-all/mx_unreachable deliverability; only a REAL send
can. burner_list_verify.py reconciles a verification send from a DISPOSABLE burner
domain (isolated from PW/carrierone reputation):
  - hard bounce  -> fmcsa_carriers.email_verify_result='hard_bounced' (excluded)
  - delivered    -> 'send_confirmed' (proven deliverable; PW campaigns send to it)
It tails the burner MTA mail.log (reuses bounce-watcher's status= pattern) and
writes back idempotently. The PW trucking filter now treats smtp_valid +
send_confirmed as sendable. docs/campaign-deliverability-plan.md captures the full
diagnosis, the burner design, and CAN-SPAM guardrails.

Remaining (needs a domain + isolated MTA identity — operator/infra decision):
stand up the burner domain, the verification-send worker, and a writeback cron.
2026-06-16 22:28:24 -05:00
justin
1652a3b8bc fix(campaigns): stop sending trucking blasts to mx_unreachable dead domains
Root cause of zero conversions since Jun 9 + the Gmail/Outlook block storm:
the send filter was '(email_verified IS TRUE OR result IN ...)'. The verifier
sets email_verified=TRUE optimistically for mx_unreachable (domain exists but
its mail server never answered the RCPT probe) — 438,163 such rows. Those HARD
BOUNCE on send, producing ~1,100 bounces/day (~47% rate) and blocklisting half
the 120k subscriber base, so real prospects never saw the offer.

Fix: key the send filter ONLY off email_verify_result, never the broken boolean.
Recovery mode (default): send only 'smtp_valid' to drive bounce rate to ~0 and
rebuild reputation; set CAMPAIGN_INCLUDE_CATCH_ALL=1 to re-add catch-all domains
once recovered. Mirrors the healthcare list-cleaning approach (HC bounces ~2-3%,
which proves the fix). Note: only ~3k smtp_valid unsent remain — list growth via
real-send bounce verification (separate burner domain) is the follow-up.
2026-06-16 22:24:15 -05:00
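A send filter keyed only on `email_verify_result` could be assembled as below. This is a hedged sketch: `build_send_filter`, the `catch_all` result label and the `%s` placeholder style are assumptions; `smtp_valid` and the `CAMPAIGN_INCLUDE_CATCH_ALL` switch are taken from the commit message.

```python
import os


def sendable_results(env=None):
    """Recovery mode sends only smtp_valid; catch-all is opt-in via the env switch."""
    env = os.environ if env is None else env
    results = ["smtp_valid"]
    if env.get("CAMPAIGN_INCLUDE_CATCH_ALL") == "1":
        results.append("catch_all")  # hypothetical label for catch-all domains
    return results


def build_send_filter(env=None):
    """Return a parameterized WHERE fragment; the email_verified boolean is never consulted."""
    results = sendable_results(env)
    placeholders = ", ".join(["%s"] * len(results))
    return f"email_verify_result IN ({placeholders})", results
```

Passing the results as query parameters rather than interpolating them keeps the fragment safe to combine with other clauses.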
justin
35f204c2b8 fix(mcs150): point intake email to per-slug wizard (not sales page) + add Trailers field
The MCS-150 intake-completion email linked customers to /order/dot-compliance,
which is the sales/checkout page -- it ignores ?order= and asks the customer to
re-pick services and pay again, so they 'cannot enter any data' (Paul Wilson's
report). Link to the per-service intake wizard /order/<slug>?order=... instead,
which loads the paid order, pre-fills from the FMCSA census, and drops payment.

Also add a Trailers field to the DOT intake fleet section and wire it through to
the MCS-150 PDF Q26 trailer row, so carriers can update trucks AND trailers.
2026-06-16 16:21:57 -05:00
justin
674979c928 tweak(sc-coc): tell carrier to check with insurer before answering + Reply-To info@
- Added a line asking them to call their insurance agent to confirm Form E
  ability before clicking yes/no, so we pick the right path first time.
- Reply-To now routes to info@performancewest.net (monitored), overridable via
  SC_COC_REPLY_TO env.
2026-06-16 09:35:13 -05:00
justin
ab9491be6a fix(deploy): hard-reset to origin/main + assert HEAD advanced (stop silent strands)
deploy.sh used 'git pull origin main', which silently ABORTS when the tracked
tree is dirty (generated site files, or any drift), stranding new commits on an
old checkout — this bit us twice today (prod stuck at b125d46 while origin had
the COC work). Replaced with:
  git fetch origin main && git reset --hard origin/main
The deploy box is a pure mirror of origin (all real changes land via git), so a
hard reset is safe and untracked files (data/*, .secrets/) are preserved. Added
a post-reset assertion that HEAD == origin/main and exits 1 loudly otherwise, so
a strand can never again be masked by a '| tail' in the caller.
2026-06-16 09:25:11 -05:00
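The fetch, hard reset and HEAD assertion are implemented in deploy.sh; the same sequence is sketched here in Python purely for illustration. The `git` and `sync_to_origin` helpers are hypothetical, and the injectable `run` parameter exists only so the logic can be exercised without a real repository.

```python
import subprocess


def git(*args, cwd="."):
    """Run a git command and return its trimmed stdout; raises on non-zero exit."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def sync_to_origin(cwd=".", run=git):
    """Mirror origin/main and fail loudly if HEAD did not land on it."""
    run("fetch", "origin", "main", cwd=cwd)
    run("reset", "--hard", "origin/main", cwd=cwd)
    head = run("rev-parse", "HEAD", cwd=cwd)
    target = run("rev-parse", "origin/main", cwd=cwd)
    if head != target:
        # SystemExit with a message prints it to stderr and exits with status 1.
        raise SystemExit(f"deploy stranded: HEAD {head} != origin/main {target}")
    return head
```

Unlike `git pull`, the reset does not depend on the tracked tree being clean, and untracked files are left in place.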
justin
147657d82d fix(docker): COPY SC COC Form.pdf into workers image
The Dockerfile copies form PDFs explicitly by name; the SC COC template was
missing, so fill_sc_coc() would FileNotFoundError in the container. Added it.
2026-06-16 09:23:43 -05:00
justin
c46efe5730 feat(sc-coc): SC intrastate Certificate of Compliance flow (insurance gate -> $25 fee -> file)
Routes SC intrastate-authority orders to the real SCDMV COC product instead of a
PSC certificate (which doesn't apply to property carriers):

  - sc_coc_filing.py: emails the carrier a one-click yes/no — does your insurer
    have / can they file a Form E (SC intrastate liability, $750k or $300k by
    GVWR) with SCDMV? Records the answer; builds the filled COC package.
  - state_trucking._handle_sc_coc_gate: SC intrastate gate —
      no answer    -> email the question once, HOLD
      answered no  -> broker referral opened, HOLD (ops todo)
      answered yes -> proceed to bill the exact $25 SCDMV COC fee (at cost) + file
  - API POST /compliance-orders/:id/sc-insurance: records yes/no in intake_data
    (no schema change); NO opens an insurance_lead broker-referral ticket +
    Telegram; YES re-dispatches the worker to bill the $25 + file.
  - site/order/sc-insurance: customer one-click yes/no page (auto-submits when
    the email links straight to ?have=yes|no).

Non-SC intrastate still uses the PSC/PUC email path or a manual todo.
2026-06-16 09:15:55 -05:00
justin
dae9603808 fix(erpnext): remove default 'BC' from Sales Order incorporation_province
The custom_incorporation_province field had default='BC', which stamped 'BC'
on EVERY Sales Order (US trucking, formation, compliance) — not just Canadian
CRTC orders. This leaked a meaningless 'BC' onto e.g. an SC scrap-metal carrier's
order. Removed the default and added a blank option so it's empty unless it's an
actual Canadian incorporation. Existing non-canada_crtc orders cleared in prod
via db_set (13 fixed; the 2 real canada_crtc orders keep BC).
2026-06-16 09:12:49 -05:00
justin
ad590aab7c feat(sc-coc): SCDMV Certificate of Compliance PDF filler + correct $25 state fee
SC for-hire PROPERTY carriers (not passenger/HHG/hazwaste) register intrastate
via the SCDMV Certificate of Compliance (COC), not a PSC certificate. This adds:
  - sc_coc_pdf_filler.fill_sc_coc(): fills the official SCDMV Form COC from
    intake (business name, officers, physical/mailing address, phone), picks
    New vs Renewal, and stamps the coverage class (E-L low-value / E-LC).
    Field names in the source PDF are auto-generated + offset from their labels;
    mapped here by verified on-page geometry. Verified by render.
  - suggest_coverage_class(): E-L for low-value cargo (scrap/dump/aggregate),
    else E-LC (safer default).
  - gov_fee: SC intrastate fee corrected from $0 placeholder to the real $25
    COC new-application fee (renewals $0), billed at cost.

The carrier's INSURER files the Form E (liability) + Form H (cargo, E-LC only)
directly with SCDMV; we collect the COC app + $25 and submit it.
2026-06-16 09:08:50 -05:00
justin
01b3e1d234 chore(env): scaffold ISA_SC_DMS_USER/PASS for SC PSC MyDMS e-file portal
Non-attorney 'Service' filer account registered under Performance West
(filings@performancewest.net). Credentials live only in the server .env
(blank default in template, never committed). Consumed by the upcoming SC
intrastate Playwright e-filer.
2026-06-16 08:19:17 -05:00
justin
c27cfd3242 docs(crons): note IRP invoice poller now also handles intrastate [PW-ISA] replies 2026-06-16 07:59:38 -05:00
justin
b125d46663 feat(intrastate): automate state PUC/PSC authority filing (email + invoice + auto-bill)
Intrastate operating authority is state-specific + application-based like IRP, so
it reuses the same email/POA + invoice-reconciliation flow:
  - intrastate_filing.send_intrastate_submission: emails the state PSC/PUC the
    authority application with the signed POA attached (subject tag [PW-ISA CO-..]),
    reusing irp_filing's MinIO download + census enrich helpers.
  - The shared poller (irp_invoice_poller) now matches BOTH [PW-IRP] and [PW-ISA]
    tags, parses the fee, Telegram-alerts, and bills the customer the exact amount
    with the correct service slug.
  - state_trucking gov-fee gate routes intrastate-authority to the PSC/PUC email
    path; if no submission email is configured for the base state it falls back
    to a manual todo (safe default — no emailing guessed agency addresses).

Per-state ISA_<ST>_EMAIL env (blank until the exact agency address is verified).
SC/GA/TX scaffolded. Customer still only sees an exact-fee payment link; you only
approve the final filing.
2026-06-16 07:57:57 -05:00
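Matching both subject tags in one pass might be done with a single regular expression. This is a minimal sketch: `parse_filing_tag` is a hypothetical helper, and the assumption that order numbers look like `CO-` followed by alphanumerics is inferred from the order number CO-FE07212A that appears elsewhere in this log.

```python
import re

# Accept either tag family, then capture the order number that follows it.
_TAG = re.compile(r"\[(PW-(?:IRP|ISA))\s+(CO-[A-Za-z0-9]+)\]")


def parse_filing_tag(subject):
    """Return (tag, order_number) from a reply subject, or None when no tag is present."""
    match = _TAG.search(subject)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Returning the tag alongside the order number lets the caller pick the service to bill without a second lookup.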
justin
42b433db5a deploy: reset generated site files before pull (fixes silently-stranded commits)
deploy.sh ran sync_nav.py / gen-service-catalog.py which dirty site/public +
site/src in place; that made 'git pull' abort, so recent commits never reached
prod until pulled manually. Reset those generated paths before pulling so deploys
always fast-forward. Also document the IRP POA signer-name/title follow-up.
2026-06-16 05:28:45 -05:00
justin
a74516a255 irp: attach signed POA + census-enrich address; fix date JSON crash
- send_irp_submission now REQUIRES and ATTACHES the signed Power of Attorney PDF
  (downloaded from MinIO) — the state won't act on a third-party filing without
  it, and 'on file, available on request' stalls the request. If the POA isn't
  available we don't email and fall back to a manual todo.
- Backfill missing legal_name + registered address from the FMCSA census so the
  submission isn't sent with a blank address (root cause of the empty
  'Legal/registered address: , ,' line). Customer-supplied values win.
- state_trucking passes signed_auth_key through to the IRP submitter.
- Fix 'Object of type date is not JSON serializable' when creating the admin
  todo (json.dumps(..., default=str)) — broke the intrastate (bash-fee) path.
2026-06-16 05:18:23 -05:00
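The date serialization crash and the `default=str` fix can be reproduced in isolation. The payload below is made up for illustration; only the error text and the `json.dumps(..., default=str)` call come from the commit message.

```python
import datetime
import json

payload = {"order": "CO-EXAMPLE", "due": datetime.date(2026, 6, 16)}

# Without a default hook, json cannot encode date objects.
try:
    json.dumps(payload)
    error = None
except TypeError as exc:
    error = str(exc)

# default=str is called for any object json cannot encode natively,
# so the date is written as its ISO string.
encoded = json.dumps(payload, default=str)
```

`default=str` is a blunt tool: it stringifies every unsupported type, which is fine for a human-read admin todo but loses type information on the way back in.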
justin
1d6693adb9 govfee: itemize the estimate in the email + add a 'fix my fee' dispute path
The gov-fee email now lists exactly what the amount covers (full breakdown) so
the customer can check it for accuracy, with two clear actions: a pay link and
a 'something looks wrong' link to /order/dispute.

New /order/dispute page shows the fee breakdown and lets the customer describe
what's wrong; it opens an 'issue' support ticket pre-tagged with the order
(amount + label + their note) via /api/v1/tickets, so ops corrects the fee
before any payment is taken. The /order/pay page also shows the itemized
breakdown and a dispute link.
2026-06-16 05:00:31 -05:00
justin
ea695d6828 feat(govfee): exact fees + agency processing fees; IRP email/invoice reconciliation
- gov_fee: add AGENCY_PROCESSING_FEE (per-service card/convenience fee passed
  through so the customer pays the true all-in cost); estimate_gov_fee now folds
  it into the billed total. IFTA/intrastate/UCR fees are published/near-exact.

- IRP fees can't be looked up — only the base state computes them. New
  irp_filing.py: emails the base-state IRP unit a Schedule A/B request (Reply-To
  the IRP filings mailbox, [PW-IRP CO-...] subject tag), and a 15-min cron
  (irp_invoice_poller) scans the mailbox for the state's invoice reply, parses
  the exact apportioned fee, Telegram-alerts you, and bills the customer the
  EXACT amount via a gov-fee child order + payment link. Then it proceeds to
  ready_to_file for your final approval.

- state_trucking gov-fee gate now routes IRP to the email/invoice path and
  IFTA/intrastate to immediate exact-fee billing.

- Mailbox is configurable (IRP_FILINGS_IMAP_* in app.env.j2); falls back to
  OPS_IMAP_* filtered by the [PW-IRP] tag until a dedicated mailbox exists.

Telegram alerts fire on IRP submission sent, invoice received (billed), and
un-parseable replies (so you can read + enter the fee manually).
2026-06-16 04:58:14 -05:00
justin
861f2fbfd4 feat(govfee): auto-quote + collect state fees for at-cost trucking services
At-cost services (IRP/IFTA/intrastate) only collected our service fee at
checkout; the variable state fee was never billed, so orders stalled at
authorization_signed and the filing card would have had to front large IRP fees.

New end-to-end, hands-off flow (you only approve the final filing):
  1. After authorization is signed, state_trucking auto-estimates the gov fee
     from intake (base/op states, power units, weight) via gov_fee.estimate_gov_fee.
  2. Creates a CHILD compliance order (CG-..., service_fee=0, gov_fee=estimate,
     parent_order_number set, migration 099) that flows through the EXISTING
     checkout/payment/webhook machinery.
  3. Emails the customer a payment link to /order/pay (new self-contained page)
     showing every method with correct surcharges — ACH 0% (Stripe 0.8%/ cap
     absorbed, no GoCardless needed), card/PayPal 3%, Klarna 6%, crypto 0%.
  4. Order holds at awaiting_government_fee_approval until paid.
  5. On payment, handlePaymentComplete detects the child (parent_order_number)
     and re-dispatches the PARENT with gov_fee_paid=true, which proceeds to
     prepare + queue the filing and stops at ready_to_file for your approval.

IRP fees are estimates billed at cost (refund overage / rebill shortfall); IFTA
decals + most intrastate fees are near-exact. Tunable via env.
2026-06-16 04:35:45 -05:00
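The per-method surcharges listed in step 3 reduce to a small lookup. This sketch assumes the surcharge is a flat percentage added on top of the government fee and rounded to cents; the `SURCHARGE_RATE` table and `amount_due` helper are hypothetical names, while the rates themselves are the ones quoted in the commit message.

```python
from decimal import ROUND_HALF_UP, Decimal

# Customer-facing surcharge per payment method, as listed in the commit message.
SURCHARGE_RATE = {
    "ach": Decimal("0"),
    "card": Decimal("0.03"),
    "paypal": Decimal("0.03"),
    "klarna": Decimal("0.06"),
    "crypto": Decimal("0"),
}


def amount_due(gov_fee, method):
    """Return the total the customer pays for a gov-fee child order."""
    rate = SURCHARGE_RATE[method]  # KeyError for an unknown method is intentional
    total = Decimal(gov_fee) * (1 + rate)
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Using `Decimal` with string inputs avoids binary floating-point drift on money amounts.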
justin
3e13b722f6 fix(relay): logging.getenv -> crash on import (card loading was broken)
relay_integration.py line 34 called logging.getenv (no such attr), which threw
AttributeError on import -> load_card_from_erpnext() crashed for every caller
(BOC-3 and now UCR filing payment). Drop the bogus line; LOG is set correctly on
the next line. Present since the initial commit.
2026-06-16 03:30:40 -05:00
justin
aadf9f5bc1 feat(ucr): Playwright auto-filing for UCR registration on approval
Adds scripts/workers/services/ucr_playwright.py — a UCR.gov National Registration
System automation that, given a USDOT + fleet size, runs the register/pay flow,
pays the federal UCR fee with the matched PW filing card (Relay/Stripe Issuing),
and captures a confirmation screenshot + number. Conventions match
boc3_playwright / fmcsa_web_submitter: dev-mode dry-run guard, undetected
(patchright) browser, CAPTCHA detection, screenshot evidence, dataclass result.

Safety: verifies the displayed fee against the federal schedule before paying and
refuses to auto-charge a surprising amount (UCR_MAX_AUTO_FEE_USD) — falls back to
manual filing instead.

Wires it into MCS150UpdateHandler: when an approved (admin_approved) order has
slug ucr-registration, _file_ucr_registration runs the automation, uploads the
confirmation screenshot to MinIO, records filing_status + confirmation, and sets
fulfillment_status=completed on success. On CAPTCHA / fee-mismatch / failure it
reverts to ready_to_file with a high-priority 'file manually' todo. This replaces
the old behavior where approving a UCR just sat at authorization_signed.
2026-06-16 03:29:05 -05:00
justin
bf69960e8c admin: mark-filed action to advance manual/admin-assisted orders to completed
Admin-assisted services (UCR, MC authority, etc.) have no automated submission,
so approving them only flips to authorization_signed and then sits there -- there
was no way to advance to completed. Add POST /mark-filed (filed_waiting_state |
completed, optional confirmation #, transactional + audit-logged) and drawer
buttons 'Mark as filed (waiting on agency)' / 'Mark completed' shown for orders in
authorization_signed / ready_to_file / filed_waiting_state. Confirmation number
is recorded into intake_data.filing_status.manual_confirmation.
2026-06-16 03:12:57 -05:00
justin
6c10c6a6cd mcs150 handler: service-aware todos/notifications/emails (stop mislabeling UCR as MCS-150)
UCR (and other admin-assisted DOT services) route through MCS150UpdateHandler,
which hardcoded 'MCS-150' and self.SERVICE_SLUG in the admin todo, the Telegram
fulfillment notification, and the customer status email -- so approving Paul's
UCR produced an 'MCS-150 Review / mcs150-update / PDF: not generated' alert and
an 'MCS-150 biennial update' customer email, both wrong.

Add SERVICE_DISPLAY_NAMES + _service_label(slug); use the actual slug everywhere.
Admin-assisted services now show 'UCR Annual Registration — FILE NOW ... file
manually on the portal (no auto-generated form)' instead of MCS-150/PDF wording,
and the customer email names the right service.
2026-06-16 03:02:53 -05:00
justin
326aee7714 admin: inline filing screenshots + atomic approve transaction
- Documents now flag is_image and the drawer renders screenshots / confirmation
  images as inline clickable thumbnails (click to open full size); PDFs keep the
  View link. Evidence keys are labeled (Filing confirmation screenshot, etc.),
  the worker-temp screenshot_path (not a MinIO key) is dropped in favor of the
  durable evidence copy, and non-file evidence (fax_log_id) is skipped.
- Wrap approve's status-update + audit-insert in a transaction so a failure can
  no longer leave an order out of ready_to_file without dispatching (the earlier
  audit CHECK violation did exactly that to Paul's UCR; it has been reset).
2026-06-16 02:57:24 -05:00
justin
73c27c75b1 migration 098: allow compliance order types in order_audit_log
The admin compliance-orders approve/re-arm actions write order_audit_log rows
with order_type='compliance', but the CHECK constraint (from migration 004)
only allowed formation/service/quote -- so every approve failed with a 500
('Approve failed.'). Expand the constraint to include compliance + compliance_batch.
2026-06-16 02:49:46 -05:00
justin
3df3a08221 mcs150 handler: derive admin-assisted intake from census; gate ready_to_file
Admin-assisted DOT services (UCR, BOC-3) routed to this handler were marked
ready_to_file with whatever intake existed -- e.g. a UCR with only a DOT number,
missing legal name / state / fleet-size bracket (which sets the UCR fee tier).
That made the admin 'ready to file' status dishonest and unfileable.

Now, for ADMIN_ASSISTED_REQUIRED services we first enrich intake from the FMCSA
census (legal_name, address_state, power_units) + the order email, and derive
the UCR fleet_size_bracket from power units (UCR_FLEET_BRACKETS). If every
required field is then present we persist it and mark intake validated (falls
through to the admin review gate -> ready_to_file). If anything is still
missing, we persist what we have, set fulfillment_status=awaiting_intake, and
email the customer to complete intake -- instead of falsely showing ready_to_file.
2026-06-16 02:46:10 -05:00
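Deriving the bracket from power units and gating on required fields could be sketched as follows. The bracket boundaries are the standard UCR fee tiers; the layout of `UCR_FLEET_BRACKETS`, the bracket labels and the `missing_fields` helper are assumptions, not the handler's actual code.

```python
# (low, high, label); high=None means open-ended. Labels are illustrative.
UCR_FLEET_BRACKETS = [
    (0, 2, "0-2"),
    (3, 5, "3-5"),
    (6, 20, "6-20"),
    (21, 100, "21-100"),
    (101, 1000, "101-1000"),
    (1001, None, "1001+"),
]

REQUIRED_FIELDS = ("legal_name", "address_state", "fleet_size_bracket")


def fleet_size_bracket(power_units):
    """Map a census power-unit count to its UCR fleet-size bracket label."""
    if power_units < 0:
        raise ValueError("power_units must be non-negative")
    for low, high, label in UCR_FLEET_BRACKETS:
        if power_units >= low and (high is None or power_units <= high):
            return label
    raise ValueError(f"no bracket for {power_units}")  # unreachable with the table above


def missing_fields(intake):
    """List required fields that are absent or empty after enrichment."""
    return [name for name in REQUIRED_FIELDS if not intake.get(name)]
```

An empty list from `missing_fields` corresponds to marking intake validated; anything else corresponds to the awaiting_intake path.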
justin
8e1e2f16bf admin docs: only list objects that actually exist (drop dead/phantom rows)
Filter the documents list to objects that exist in storage, so stray keys (a
template pdf_minio_path, or a phantom mcs150 esign_records row on a UCR order
from the shared remediation pipeline) no longer surface as dead rows. The UI
drops the now-unreachable 'not generated yet' branch.
2026-06-16 02:37:33 -05:00
justin
c8e0065729 admin docs: hide phantom prepared-filing PDF for non-form services
The dot-compliance-remediation pipeline seeds filing_status.pdf_minio_path on
every order in a batch, but only MCS-150-producing slugs (mcs150-update,
dot-registration, usdot-reactivation, dot-full-compliance) ever generate it.
For admin-assisted services like UCR it was a phantom 'Prepared filing PDF /
not generated yet' row. Gate the prepared-filing artifacts on FORM_PRODUCING_SLUGS
(mirrors the worker's MCS150_FORM_SLUGS) and give the empty state a clearer
explanation.
2026-06-16 02:35:29 -05:00
justin
d18de006d8 admin approve: block filing when intake incomplete (force override + warning)
Paul Wilson's UCR (CO-FE07212A) sat at fulfillment_status=ready_to_file with
intake_data_validated=false, so the Approve & File button would have dispatched
it for government submission with incomplete intake and no document to review.

Backend: /approve now refuses an order whose intake_data_validated is false
unless {force:true} is passed (409 code=intake_incomplete); the override is
recorded in order_audit_log. The fulfillment_status=ready_to_file requirement
is unchanged, so awaiting_intake orders (e.g. Mitchell's MCS-150s) still 409.

UI: the drawer shows an amber 'intake not complete' warning above the approve
button, and approving an intake-incomplete order triggers an explicit
override confirmation before sending force=true.
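
The gate order described above can be sketched in Python as follows. This is
an illustration of the decision logic only: the function name, the return
shape, and the not_ready_to_file code are hypothetical, and the real /approve
route lives in the API rather than in Python.

```python
from typing import Tuple


def check_approve(order: dict, force: bool = False) -> Tuple[int, dict]:
    """Return (http_status, body) for an approve request on one order."""
    # Unchanged requirement: only ready_to_file orders can be approved,
    # and force does not bypass this check.
    if order.get("fulfillment_status") != "ready_to_file":
        return 409, {"code": "not_ready_to_file"}  # hypothetical code name

    forced = False
    if not order.get("intake_data_validated"):
        if not force:
            # New requirement: incomplete intake blocks filing by default.
            return 409, {"code": "intake_incomplete"}
        forced = True

    # The audit entry records whether the override was used.
    return 200, {"approved": True,
                 "audit": {"action": "approve", "forced": forced}}
```

Checking fulfillment_status first keeps awaiting_intake orders at 409 even
when force=true is sent, which matches the behaviour stated above.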
2026-06-16 00:33:22 -05:00
justin
aa498fdfdf admin docs: probe existence with ranged GET (HEAD fails presigned-URL sig) 2026-06-16 00:23:40 -05:00
justin
1f3b36b29e admin docs: verify object existence, mark dead links, cleaner 404
The DB can record a pdf_minio_path before the object is uploaded (e.g. a
prepared-filing path written for an order whose prep never completed -- Paul
Wilson / Mark Adams MCS-150s). The documents list now HEAD-checks each key and
returns an exists flag; the UI shows 'not generated yet' instead of a dead View
button, and the stream endpoint returns a clean 404 for a missing object.
2026-06-16 00:22:35 -05:00
justin
bce5db4a09 admin: view order PDFs from MinIO (signed forms, prepared filings, evidence)
Adds a Documents section to the compliance-order detail drawer so you can
review the actual filing PDFs before approving an order:
  GET /api/v1/admin/compliance-orders/:id/documents  list viewable objects
  GET /api/v1/admin/compliance-orders/:id/document?key=&token=  stream one

Key discovery pulls from esign_records (unsigned + signed docs per order),
intake_data.filing_status (pdf_minio_path, attested_pdf, evidence/*), and the
order's engagement_letter / rmd_packet columns.

Rather than hand out presigned URLs (MinIO's public host is IP-allowlisted to a
few office IPs, so links break elsewhere), the API streams the object through
itself from internal minio:9000, gated by the admin JWT. The stream endpoint
accepts the token via ?token= (new middleware requireAdminQueryOrHeader) so a
PDF opens in a new tab, and refuses any key that isn't one of the order's own
documents.
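
A rough Python sketch of the "only the order's own documents" check, assuming
the key sources listed above. The esign column names (unsigned_path,
signed_path), the shape of evidence as a name-to-key mapping, and both
function names are assumptions; the actual route is part of the API.

```python
def collect_order_keys(order: dict, esign_records: list) -> set:
    """Gather every object key that belongs to this order."""
    keys = set()

    # Unsigned + signed documents per e-sign record (column names assumed).
    for rec in esign_records:
        for col in ("unsigned_path", "signed_path"):
            if rec.get(col):
                keys.add(rec[col])

    # Prepared filing, attested PDF and evidence from intake_data.filing_status.
    filing = (order.get("intake_data") or {}).get("filing_status") or {}
    for name in ("pdf_minio_path", "attested_pdf"):
        if filing.get(name):
            keys.add(filing[name])
    for key in (filing.get("evidence") or {}).values():  # assumed dict shape
        if key:
            keys.add(key)

    # Columns stored directly on the order row.
    for col in ("engagement_letter", "rmd_packet"):
        if order.get(col):
            keys.add(order[col])
    return keys


def is_key_allowed(requested_key: str, order: dict,
                   esign_records: list) -> bool:
    """Exact-match allowlist: no prefix or path matching, so no traversal."""
    return requested_key in collect_order_keys(order, esign_records)
```

An exact-match set lookup means a caller cannot reach another order's object
by guessing a key or appending path segments.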
2026-06-16 00:20:15 -05:00
justin
d65f5ea279 nginx: stop blocking /admin (bot-scan rule matched our own dashboard)
The shared security snippet blocked any path matching /(admin|administrator|
login.action|struts) with 'return 444', which drops the connection. That bare
'admin' token also matched our own operations dashboard at /admin and the new
/admin/compliance-orders, so the browser showed 'This site can't be reached'.
Dropped the bare 'admin' token; administrator/login.action/struts stay blocked.
Applied live on prod (sudo edit + nginx reload); this updates the source of
truth so the ansible nginx role won't reintroduce it.
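
The over-match is easy to reproduce with Python's re module, used here only
as a stand-in for nginx's regex engine. The sketch assumes the rule is an
unanchored search over the request path, with the pattern as quoted above;
the exact nginx directive is not shown in this commit.

```python
import re

# Pattern as quoted in the old snippet, and the same pattern without 'admin'.
OLD_RULE = re.compile(r"/(admin|administrator|login.action|struts)")
NEW_RULE = re.compile(r"/(administrator|login.action|struts)")


def is_blocked(rule: "re.Pattern", path: str) -> bool:
    """True if the rule would match somewhere in the request path."""
    return rule.search(path) is not None


for path in ("/admin", "/admin/compliance-orders",
             "/administrator/index.php", "/struts/login.action"):
    print(path, "old:", is_blocked(OLD_RULE, path),
          "new:", is_blocked(NEW_RULE, path))
```

Under that assumption the old rule blocks /admin and /admin/compliance-orders,
while the new rule lets both through and still blocks the administrator,
login.action and struts probes.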
2026-06-16 00:05:54 -05:00
justin
48fab25840 docs: document the admin compliance-orders surface in the runbook 2026-06-16 00:00:12 -05:00
justin
2296566e85 admin: compliance-orders dashboard (view, approve-to-file, re-arm intake)
The admin SPA only managed formation_orders; compliance service orders
(telecom/DOT/healthcare) had no admin surface, so you couldn't see what was
paid, what was stuck on intake, or approve a prepared filing for submission.

API (api/src/routes/admin.ts), all requireAdmin:
  GET  /api/v1/admin/compliance-orders            list, grouped by batch, filters
  GET  /api/v1/admin/compliance-orders/stats      queue overview counts
  GET  /api/v1/admin/compliance-orders/:id        full detail + audit log
  POST /api/v1/admin/compliance-orders/:id/approve       approve ready_to_file + dispatch worker
  POST /api/v1/admin/compliance-orders/:id/rearm-intake  clear reminder stamp so daily nudge resumes

UI: new static page /admin/compliance-orders/ (self-contained, CSP-safe inline
CSS, no external JS framework) reusing the existing pw_admin_token session.
Cards group multi-service batches, flag paid+intake-incomplete in red, show
reminder counts, and expose Approve & Re-arm buttons. Linked from the main
/admin top bar. Every approve/re-arm writes an order_audit_log entry.
2026-06-15 23:57:05 -05:00
justin
b48d0cb799 docserver: self-healing Task Scheduler config + docs
Companion to the worker MinIO-retry fix. Makes the worker auto-recover from
process death (crash, manual kill, missed boot trigger), not just MinIO outages.

- start_worker.bat: propagate Python's exit code (exit /b %rc%) so Task
  Scheduler can actually detect a failed run (it previously always exited 0).
- reconfigure_task.ps1 (new): re-registers PW-DocserverWorker with
  RestartCount=99 / 1-min interval, StartWhenAvailable, and two triggers —
  AtStartup plus a 5-min repeating trigger with MultipleInstances=IgnoreNew, so
  a dead worker relaunches within ~5 min and never double-runs. Idempotent.
- install.ps1: same self-healing settings for fresh installs.
- Verified on the box: killed the worker -> task relaunched it; firing again
  while running stayed at one instance.

Docs updated to match reality:
- docserver/README.md: new 'Reliability / self-healing' section.
- document-generation.md: corrected the stale 'Flask DocServer :5050 / HTTP'
  description to the actual MinIO outbound-only transport.
- e2e-test-plan.md: removed the outdated 'Word COM fails under SYSTEM / requires
  RDP after every reboot' limitation; now self-healing under SYSTEM session 0.
- infrastructure.md: fixed VM spec (Win Server 2019, Word 16.0, Python 3.13,
  SSH port 22422) + self-healing note.
- architecture.md / formation-system.md: trigger + self-healing details.
2026-06-15 22:49:21 -05:00
justin
7929413eeb docserver: survive MinIO outages instead of exiting
The worker called sys.exit(1) on any MinIO connection error, so a single
transient 502 from MinIO/its reverse proxy left it dead until a manual restart
or reboot (its scheduled task only runs at system startup). It had been dead
for ~5 weeks after a 502 on May 9.

- _connect_minio_forever(): retry the initial MinIO connect indefinitely with
  capped exponential backoff (5s..120s) instead of exiting.
- main loop: wrap each poll cycle; on any error, log + rebuild the client and
  keep polling rather than crashing.

Verified on the box: normal DOCX->PDF still works (~11s e2e); a bogus endpoint
now retries forever without ever calling sys.exit (was the exact May-9 failure).
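
A minimal Python sketch of retry-forever with capped exponential backoff, in
the spirit of _connect_minio_forever() but not its actual code. The doubling
factor, the helper names, and the injectable connect/sleep/log callables are
assumptions; the commit only states the 5s..120s range.

```python
import time
from typing import Callable, Iterator


def backoff_delays(base: float = 5.0, cap: float = 120.0,
                   factor: float = 2.0) -> Iterator[float]:
    """Yield base, base*factor, ... and stay at cap once it is reached."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def connect_forever(connect: Callable[[], object],
                    sleep: Callable[[float], None] = time.sleep,
                    log: Callable[[str], None] = print) -> object:
    """Call connect() until it succeeds; never exit the process on failure."""
    for attempt, delay in enumerate(backoff_delays(), start=1):
        try:
            return connect()
        except Exception as exc:  # any connection error is treated as transient
            log(f"MinIO connect failed (attempt {attempt}): {exc}; "
                f"retrying in {delay:.0f}s")
            sleep(delay)
    raise AssertionError("unreachable: backoff_delays() is infinite")
```

With these defaults the waits run 5, 10, 20, 40, 80, then 120 seconds for
every later attempt, so a long outage costs one reconnect attempt every two
minutes rather than a dead worker.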
2026-06-15 22:40:27 -05:00