new-site/docs/production-runbook.md

Production Runbook — FCC Filing + Treasury Stack

This runbook covers what an operator has to provision before the FCC filing automation and crypto-treasury pipeline can run in production. Each section lists the specific env vars, portal credentials, and one-time setup steps.


1. Admin dashboard auth (Blocker 1)

The admin dashboard and every /api/v1/admin/* endpoint are guarded by a JWT signed with ADMIN_JWT_SECRET. The API refuses to boot in production if the secret is still the built-in placeholder.

One-time setup

  1. Generate a strong random secret:

    openssl rand -base64 48

  2. Set on the API process (Docker / systemd env file):

    ADMIN_JWT_SECRET=

  3. Provision an admin user:

    psql "$DATABASE_URL" <<'SQL'
    INSERT INTO admin_users (username, password_hash, display_name, email, active)
    VALUES (
      'justin',
      crypt('<password>', gen_salt('bf', 12)),
      'Justin Tyson',
      'ops@performancewest.net',
      TRUE
    );
    SQL

    (Use bcryptjs from Node to hash if pgcrypto is unavailable.)

  4. Verify login:

    curl -s -X POST https://api.performancewest.net/api/v1/admin/login \
      -H 'Content-Type: application/json' \
      -d '{"username":"justin","password":"<password>"}'

  • ADMIN_JWT_SECRET — JWT signing secret. Required in production.
  • WEBHOOK_SECRET — shared secret for ERPNext → API formation/CRTC webhooks.
  • SHKEEPER_API_KEY — API key that SHKeeper sends in the X-Shkeeper-Api-Key header to authenticate its callback.
  • STRIPE_WEBHOOK_SECRET — signing secret used to verify the HMAC signature on Stripe webhooks.

The startup guard in api/src/config.ts (refuseInsecureProduction) blocks boot if any of the above are unset or still set to change-this-in-production.
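
An operator can run the same kind of check before deploying. The sketch below is illustrative Python, not the TypeScript guard in api/src/config.ts: the function name find_insecure_secrets is hypothetical, and it assumes the guard treats blank values the same as unset ones.

```python
# Placeholder value named in this runbook; a secret left at this value counts as unset.
PLACEHOLDER = "change-this-in-production"

# Secrets this section lists as checked at boot.
REQUIRED_SECRETS = (
    "ADMIN_JWT_SECRET",
    "WEBHOOK_SECRET",
    "SHKEEPER_API_KEY",
    "STRIPE_WEBHOOK_SECRET",
)


def find_insecure_secrets(env):
    """Return the required secret names that are missing, blank, or still the placeholder."""
    bad = []
    for name in REQUIRED_SECRETS:
        value = (env.get(name) or "").strip()
        if not value or value == PLACEHOLDER:
            bad.append(name)
    return bad
```

Calling find_insecure_secrets(os.environ) inside the target container and failing the deploy on a non-empty result catches the problem before the API refuses to boot.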


2. USAC E-File storage state (Blocker 2)

USAC's E-File portal (https://www2.usac.org/cr/) requires a logged-in session cookie to submit Form 499-A. We drive it via Playwright. The filer's session (login cookies + MFA state) must be provisioned once per filing entity.

One-time setup per telecom entity

  1. Log in manually to E-File using the entity's FRN + the assigned E-File administrator account.

  2. Complete MFA (USAC MFA is TOTP-based as of 2026).

  3. Export the session state to MinIO:

    bucket: playwright-storage
    key:    usac/<telecom_entity_id>/storage_state.json

    The filer reads this key at the start of each fcc-499a / fcc-499-initial job. If missing or expired, the handler logs a ToDo for the admin.

  4. Renewal: the USAC session expires after about 14 days of inactivity. The filer reuses it while it remains valid, and the scheduled usac_session_refresh cron (every 7 days) logs in again and re-exports it. The cron requires a stored TOTP secret:

    ERPNext Sensitive ID: usac-totp-<telecom_entity_id>
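
A cheap local pre-check on the exported file can flag stale state before a filing job starts. The sketch below assumes the file is a standard Playwright storage state, where each cookie carries an `expires` Unix timestamp in seconds and session cookies use -1. Both helper names are hypothetical, and the check cannot see USAC's server-side idle timeout.

```python
import time


def storage_state_key(telecom_entity_id):
    """Build the MinIO object key described in step 3."""
    return f"usac/{telecom_entity_id}/storage_state.json"


def expired_cookie_names(storage_state, now=None):
    """Return names of persistent cookies whose expiry time has already passed.

    Session cookies (expires == -1) are skipped because they carry no expiry.
    """
    now = time.time() if now is None else now
    expired = []
    for cookie in storage_state.get("cookies", []):
        expires = cookie.get("expires", -1)
        if expires != -1 and expires <= now:
            expired.append(cookie.get("name", ""))
    return expired
```

A non-empty result is a hint to re-run the manual login. An empty result does not prove the session is still accepted by the portal.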

Env vars

  • PLAYWRIGHT_STORAGE_BUCKET=playwright-storage
  • USAC_MFA_VIA=totp (alternative: sms — not supported in automation)
  • See scripts/workers/services/form_499a.py for the filer entry point.
  • See docs/fcc-references/499a-filing.md for screen-by-screen form notes.
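
For reference, this is roughly what TOTP automation has to compute from the stored secret. It is a minimal sketch that assumes USAC uses standard RFC 6238 codes (HMAC-SHA1, 30-second step, 6 digits) and that the secret is stored base32-encoded. Neither is confirmed here, and the function name is hypothetical.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32, for_time=None, digits=6, step=30):
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) from a base32 secret."""
    for_time = time.time() if for_time is None else for_time
    cleaned = secret_b32.replace(" ", "").upper()
    cleaned += "=" * (-len(cleaned) % 8)  # restore padding that authenticator URIs often omit
    key = base64.b32decode(cleaned)
    counter = struct.pack(">Q", int(for_time) // step)  # 8-byte big-endian time-step counter
    digest = hmac.new(key, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation, RFC 4226 section 5.3
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

Because a code is only valid for one time step, clock drift on the worker host is a likely cause of failed re-logins.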

3. Relay debit card (Blocker 4)

Filing portal charges settle on RELAY_FILING_CARD_ID — a Relay debit card whose balance is the Relay business account balance. Once Bridge offramps crypto USD to Relay, the same balance funds the card.

One-time setup

  1. In the Relay dashboard → Cards → Issue card.

  2. Choose a virtual card with no per-transaction cap, and lock it to "Online purchases only".

  3. Whitelist MCCs 9399 (government services) and 7372 (computer services).

  4. Copy the card's internal id from Relay (visible in URL of the card detail page) and set:

    RELAY_FILING_CARD_ID=

  5. Fallback chain in scripts/workers/relay_integration.py:

    CRYPTO_FILING_CARD_ID → STRIPE_FILING_CARD_ID → PAYPAL_FILING_CARD_ID → RELAY_FILING_CARD_ID

    For crypto-funded orders, set PREFERRED_FUNDING_CARD=RELAY_FILING_CARD_ID so the Playwright filer charges Relay first.
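
The selection in step 5 can be sketched as follows. This is an illustration only, not the code in scripts/workers/relay_integration.py: the function name is hypothetical, and it assumes PREFERRED_FUNDING_CARD simply moves the named variable to the front of the chain.

```python
# Fallback order as listed in step 5.
FALLBACK_CHAIN = (
    "CRYPTO_FILING_CARD_ID",
    "STRIPE_FILING_CARD_ID",
    "PAYPAL_FILING_CARD_ID",
    "RELAY_FILING_CARD_ID",
)


def resolve_filing_card(env):
    """Return (env_var_name, card_id) for the first configured card, or None."""
    order = list(FALLBACK_CHAIN)
    preferred = (env.get("PREFERRED_FUNDING_CARD") or "").strip()
    if preferred in order:
        # Try the preferred variable first, then the rest in chain order.
        order.remove(preferred)
        order.insert(0, preferred)
    for name in order:
        card_id = (env.get(name) or "").strip()
        if card_id:
            return name, card_id
    return None
```

With only STRIPE_FILING_CARD_ID and RELAY_FILING_CARD_ID set, this returns the Stripe card unless PREFERRED_FUNDING_CARD=RELAY_FILING_CARD_ID is also set.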

Statement reconciliation

  • Daily: scripts/workers/relay_deposit_monitor.py parses Relay IMAP alerts into relay_deposits. Offramp deposits have source_kind='offramp_bridge'; vendor charges appear as outgoing card transactions.
  • Monthly: export Relay statement CSV, import into bookkeeping/imports/, and reconcile against filing_fee_reservations.status='spent' rows.

4. Webhook → worker dispatch chain

Confirmed wiring as of this commit:

  1. POST /api/v1/webhooks/stripe → verifies Stripe HMAC → handlePaymentComplete(order_id, order_type, session_id).
  2. POST /api/v1/webhooks/shkeeper → verifies X-Shkeeper-Api-Key → enqueues crypto_payment_jobs + calls handlePaymentComplete.
  3. For compliance orders, handlePaymentComplete:
    • Flips ERPNext Sales Order workflow_state to Service Queued.
    • Dispatches directly to the worker at ${WORKER_URL}/jobs with action=process_compliance_service (no dependency on an ERPNext Webhook fixture).
  4. POST /api/v1/webhooks/service/queued (ERPNext-driven) remains as a backup path — if you configure a Frappe Webhook on Sales Order workflow_state → Service Queued, it fires the same worker action.
  5. In the worker, handle_process_compliance_service (job_server.py:748) routes the job to the handler registered in SERVICE_HANDLERS[service_slug].
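
The routing in step 5 is a plain registry lookup. The sketch below shows the pattern only. The registry entry, the handler, and the job fields `service_slug` and `order_id` are hypothetical stand-ins, not the contents of job_server.py.

```python
def handle_example_service(order_id):
    """Hypothetical handler used only for this illustration."""
    return f"handled {order_id}"


# Hypothetical registry contents; the real mapping lives in job_server.py.
SERVICE_HANDLERS = {"example-service": handle_example_service}


def handle_process_compliance_service(job):
    """Look up the handler for the job's service slug and run it."""
    slug = job.get("service_slug")
    handler = SERVICE_HANDLERS.get(slug)
    if handler is None:
        # Fail loudly so an unmapped slug shows up in the worker logs.
        raise ValueError(f"No handler registered for service slug: {slug!r}")
    return handler(job["order_id"])
```

Raising on an unknown slug, rather than silently dropping the job, keeps a paid order from disappearing when a new service is sold before its handler is registered.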

Env vars

  • WORKER_URL=http://workers:8090 (internal Docker network name)
  • WEBHOOK_SECRET=<shared-with-ERPNext>
  • SHKEEPER_API_KEY=<configured-in-SHKeeper-admin>
  • STRIPE_WEBHOOK_SECRET=whsec_... (from dashboard.stripe.com/webhooks)
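
To make the role of STRIPE_WEBHOOK_SECRET concrete, the sketch below reproduces Stripe's documented signature scheme: the Stripe-Signature header carries `t=<timestamp>` and one or more `v1=<hex>` values, and the signature is HMAC-SHA256 over `<timestamp>.<raw body>`. It is illustrative Python with a hypothetical function name. The API itself is TypeScript and would normally rely on Stripe's own library for this check.

```python
import hashlib
import hmac
import time


def verify_stripe_signature(payload, header, secret, tolerance=300, now=None):
    """Check a Stripe-Signature header against the raw request body (bytes)."""
    parts = [item.split("=", 1) for item in header.split(",") if "=" in item]
    timestamp = next((v for k, v in parts if k.strip() == "t"), None)
    candidates = [v for k, v in parts if k.strip() == "v1"]
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False
    now = time.time() if now is None else now
    if abs(now - int(timestamp)) > tolerance:
        return False  # outside the tolerance window: treat as a possible replay
    signed = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    # Constant-time comparison against every v1 signature in the header.
    return any(
        hmac.compare_digest(expected.encode(), sig.encode()) for sig in candidates
    )
```

The check must run on the raw request bytes. Re-serialized JSON will not match the signature.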

Verification

After deploying, confirm with:

# trigger a compliance test checkout
# then tail the API logs for these three lines per order:
[checkout] Payment confirmed: compliance CO-xxx via <method>
[checkout] Advanced compliance Sales Order SAL-xxx to Service Queued
[checkout] Worker dispatched: CO-xxx (<service-slug>)

# and the worker logs for:
[worker] process_compliance_service: CO-xxx (<handler>)
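
The three API log lines above can be checked mechanically. This is a minimal sketch that assumes the log text matches the formats shown exactly. The function name and stage labels are hypothetical.

```python
def missing_stages(log_text, order_id, sales_order_id):
    """Return the checkout stages that have no matching API log line."""
    expected = {
        "payment_confirmed": f"[checkout] Payment confirmed: compliance {order_id} via ",
        "sales_order_queued": (
            f"[checkout] Advanced compliance Sales Order {sales_order_id} to Service Queued"
        ),
        "worker_dispatched": f"[checkout] Worker dispatched: {order_id} (",
    }
    lines = log_text.splitlines()
    return [
        stage
        for stage, needle in expected.items()
        if not any(needle in line for line in lines)
    ]
```

An empty list means all three stages were logged for the order. Any returned label names the first place to look in the dispatch chain.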

5. Crypto treasury env (manual mode)

Until Bridge is approved, treasury runs in manual mode — admin approves every offramp before it touches Bridge.

CRYPTO_TREASURY_MODE=manual        # default; flip to "auto" when Bridge is live

# Bridge (when approved):
BRIDGE_API_KEY=
BRIDGE_API_URL=https://api.bridge.xyz
BRIDGE_RELAY_EXTERNAL_ACCOUNT_ID=
BRIDGE_DEVELOPER_FEE_USD=0

RELAY_BANK_MEMO_PREFIX=PW-ORDER-
MAX_SLIPPAGE_BPS=300

# Cold wallet (Bridge approval not required to sweep — hardware wallet is live)
COLD_WALLET_BTC_ADDR=
COLD_WALLET_ETH_ADDR=
COLD_WALLET_USDC_ADDR=
COLD_WALLET_USDT_ADDR=
COLD_WALLET_HOT_FLOAT_USD_CENTS=50000
COLD_WALLET_AUTO_SWEEP_CEILING_USD_CENTS=500000
CRYPTO_SWEEP_ADMIN_EMAIL=ops@performancewest.net

In manual mode the crypto_payment_worker parks every received job at state='manual' and an admin approves via POST /api/v1/admin/crypto-payments/:order_id/retry-offramp.
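
MAX_SLIPPAGE_BPS in the env block above is expressed in basis points, so 300 means 3%. The sketch below shows the arithmetic only. How the worker actually applies the limit is an assumption (here, shortfall of the received USD amount against the quoted amount), and both function names are hypothetical.

```python
def slippage_bps(quoted_usd_cents, received_usd_cents):
    """Shortfall of the received amount relative to the quote, in basis points."""
    if quoted_usd_cents <= 0:
        raise ValueError("quoted amount must be positive")
    shortfall = max(quoted_usd_cents - received_usd_cents, 0)
    return shortfall * 10_000 / quoted_usd_cents


def within_slippage(quoted_usd_cents, received_usd_cents, max_bps=300):
    """Return True when the shortfall does not exceed max_bps basis points."""
    return slippage_bps(quoted_usd_cents, received_usd_cents) <= max_bps
```

For a quoted $100.00 (10000 cents), anything at or above $97.00 passes a 300 bps limit.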


6. Scheduled worker jobs (systemd timers)

Deployed by the worker-crons ansible role (infra/ansible/roles/worker-crons/). Each timer runs docker compose exec -T workers python -m <module> on its schedule.

| Timer | Cadence | Module |
| --- | --- | --- |
| pw-usf-factor-monitor.timer | daily 09:00 CT | scripts.workers.usf_factor_monitor |
| pw-deminimis-factor-check.timer | daily 03:00 UTC | scripts.workers.deminimis_factor_check |
| pw-cold-wallet-sweep.timer | every 30 min | scripts.workers.cold_wallet_sweeper |
| pw-crypto-payment-worker.timer | every 60 s | scripts.workers.crypto_payment_worker |
| pw-relay-deposit-monitor.timer | every 5 min | scripts.workers.relay_deposit_monitor |
| pw-commission-worker.timer | daily 02:00 UTC | scripts.workers.commission_worker |
| pw-renewal-worker.timer | daily 04:00 UTC | scripts.workers.renewal_worker |
| pw-cdr-retention.timer | daily 05:00 UTC | scripts.workers.cdr_retention_sweeper |
| pw-cdr-unlock-nudge.timer | daily 10:00 CT | scripts.workers.cdr_unlock_nudge |
| pw-payment-reminder.timer | daily 11:00 CT | scripts.workers.payment_reminder |
| pw-fcc-rmd-removed.timer | weekly Wed 08:00 CT | scripts.workers.fcc_rmd_removed_scraper |

Verification

# list active timers
systemctl list-timers 'pw-*'

# tail a specific job's history
journalctl -u pw-usf-factor-monitor.service --since '1 day ago'

# trigger a job ad-hoc for testing
systemctl start pw-deminimis-factor-check.service

Adding a new cron

Add an entry to infra/ansible/roles/worker-crons/defaults/main.yml:

- name: pw-my-new-job
  description: What it does
  module: scripts.workers.my_new_job
  on_calendar: "*-*-* 06:00:00 UTC"
  persistent: true            # run on boot if missed

Then re-run ansible-playbook playbooks/site.yml.
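
To see what such an entry turns into, the sketch below renders a systemd timer unit from one entry. It uses standard timer directives (OnCalendar, Persistent, Unit), but the exact template used by the worker-crons role is an assumption and the function name is hypothetical.

```python
def render_timer_unit(entry):
    """Render a systemd .timer unit from one worker-crons entry (a dict)."""
    persistent = "true" if entry.get("persistent", False) else "false"
    return "\n".join([
        "[Unit]",
        f"Description={entry['description']}",
        "",
        "[Timer]",
        f"OnCalendar={entry['on_calendar']}",
        f"Persistent={persistent}",  # catch up after boot if a run was missed
        f"Unit={entry['name']}.service",
        "",
        "[Install]",
        "WantedBy=timers.target",
        "",
    ])
```

Before committing a new on_calendar value, `systemd-analyze calendar "<expression>"` on the target host shows whether systemd parses it and when it would next fire.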


7. Smoke tests

Run before every release:

# Service handler registry + CPNI/CALEA variant mapping
docker compose exec workers python -m scripts.tests.test_cpni_calea_variants

# Form 499 Initial handler guards
docker compose exec workers python -m scripts.tests.test_form_499_initial_smoke

Both exit with status 0 on success. Wire them into CI.


8. Boot-time health checks

The API and worker services each expose:

  • GET /health — returns 200 when config loaded + DB reachable.
  • GET /health/deep — returns 200 only when ERPNext, MinIO, and the worker message channel all respond.

Set these as the Docker HEALTHCHECK / K8s liveness probe so deploys fail fast when secrets are missing.
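
A probe for these endpoints can be as small as the sketch below. It is a hypothetical helper, not part of the repository: the `opener` parameter exists only so the function can be exercised without a network, and the host and port to probe depend on the deployment.

```python
import urllib.error
import urllib.request


def is_healthy(url, timeout=3.0, opener=urllib.request.urlopen):
    """Return True only if GET url answers with HTTP 200 within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP error statuses, refused connections, timeouts, and bad URLs.
        return False
```

A container health check could call is_healthy on the /health URL and exit non-zero when it returns False, so the orchestrator marks the container unhealthy.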