deploy.sh used 'git pull origin main', which silently ABORTS when the tracked
tree is dirty (generated site files, or any drift), stranding new commits on an
old checkout — this bit us twice today (prod stuck at b125d46 while origin had
the COC work). Replaced with:
git fetch origin main && git reset --hard origin/main
The deploy box is a pure mirror of origin (all real changes land via git), so a
hard reset is safe and untracked files (data/*, .secrets/) are preserved. Added
a post-reset assertion that HEAD == origin/main and exits 1 loudly otherwise, so
a strand can never again be masked by a '| tail' in the caller.
deploy.sh ran sync_nav.py / gen-service-catalog.py which dirty site/public +
site/src in place; that made 'git pull' abort, so recent commits never reached
prod until pulled manually. Reset those generated paths before pulling so deploys
always fast-forward. Also document the IRP POA signer-name/title follow-up.
The host-side generator ran 'node scripts/*.mjs' in deploy.sh, but the prod box
has python3 only (no node outside containers), so the site deploy failed at the
generation step. Reimplemented both in Python (byte-identical output, verified
via diff against the node version; matches scripts/sync_nav.py tooling).
Previously two hand-maintained price lists (API COMPLIANCE_SERVICES + site
SERVICE_META) drifted apart -- that is how the healthcare +$200 raise charged
$399 while displaying $599. Eliminate the drift class entirely:
- Move the catalog to api/src/service-catalog.ts (the authority; checkout
charges from it). compliance-orders.ts imports it.
- scripts/gen-service-catalog.mjs generates site/src/lib/service-catalog.generated.ts
from the API source. intake_manifest.ts re-exports SERVICE_META from it, so all
~60 site pages keep working unchanged.
- deploy.sh regenerates + drift-checks before building (site build context is
./site only and cannot read ../api, so generation happens host-side).
- scripts/check-service-catalog-drift.mjs fails the build if the generated file
ever diverges from the API (verified: passes aligned, fails on mismatch).
To change a price now, edit ONE file: api/src/service-catalog.ts.
`. ./.env` choked on shell-hostile values (SMTP_PASS special chars), aborting
before TELEGRAM vars loaded and rendering an empty alertmanager.yml. Extract just
the two keys with sed instead, and warn if either is empty.
Alertmanager does not expand ${ENV} in its YAML, so the committed config with
${TELEGRAM_BOT_TOKEN}/${TELEGRAM_CHAT_ID} crash-looped it (line 24: cannot
unmarshal !!str `${TELEG...` into int64) - 11k+ restarts on prod, alerting dead.
- rename alertmanager.yml -> alertmanager.yml.template (keeps ${} placeholders)
- deploy.sh: envsubst the template into the (gitignored) alertmanager.yml from
.env, scoped to the two TELEGRAM vars so the {{ }} Go-template message survives
- gitignore the rendered file (contains the bot token)
- warns if the vars are unset
- deploy.sh/deploy-dev.sh: bring up listmonk-hc (upstream image, excluded from
build); document the one-time listmonk_hc DB create + --install.
- docker-compose.dev.override.yml: dev-only override (committed) that drops the
prod host-port bindings and pins dev's own postgres volume (dev-pgdata) via
compose !override tags. deploy-dev ships it as docker-compose.override.yml so
syncing the canonical compose to the shared host no longer breaks dev's
api-postgres (port :5432 clash + volume switch). Discovered + fixed while
validating listmonk-hc on dev.
- pw-hc-rampcap.sh: healthcare analogue of pw-listmonk-rampcap, ramps the
listmonk_hc cap 100->1000/h off /etc/postfix/hc-warmup-start, fully
independent of the trucking ramp/cap.
deploy.sh: include proxy-relay in the default service set; it's an upstream
image (ginuerzh/gost) with no build context, so exclude it from 'compose build'
while keeping it in 'compose up'.
deploy-dev.sh: rsync docker-compose.yml to the dev server (it was never synced,
so new services like proxy-relay never reached dev) and add proxy-relay to the
'compose up --build' set.
The site header / Services mega-dropdown was duplicated across two render
systems (Astro pages via Base.astro->nav.html, and ~80 pre-rendered static
public/**/index.html pages each embedding their own copy). They had drifted
into 5 different variants (missing 'New Carrier Setup', misplaced Healthcare
column, NEW vs FREE badges, em-dash encoding differences), so
dev.performancewest.net, the order pages, and the rest of the site disagreed.
- Make site/src/partials/nav.html the single source of truth (adopts the most
complete variant).
- Add scripts/sync_nav.py to rewrite every static page's <nav> block from
nav.html (idempotent; --check guards against drift in CI/deploy).
- Run the sync automatically in deploy.sh and scripts/deploy-dev.sh.
- Deprecate scripts/inject_healthcare_nav.py (now delegates to sync_nav.py).
- Neutralize the broken no-op SiteNav.astro component.
All 80 headers + the Astro-built order pages now render the identical dropdown.
Two build fixes surfaced while shipping the set-password rename:
1. erpnext/Dockerfile cloned frappe/payments unpinned; its default branch now
requires Python >=3.14 while frappe/erpnext:v15 ships 3.11, so the image
build failed with 'Package payments requires a different Python'. Pin the
clone to --branch version-15.
2. deploy.sh built the erpnext image without first staging the custom Frappe
apps into the build context (erpnext/build.sh). That meant a baked-code
change could silently ship stale code. Stage apps when erpnext is built.
The portal serves Frappe assets from a host copy (/opt/erpnext-assets). Frappe
emits content-hashed filenames that change on every ERPNext rebuild/migrate; the
host copy was never re-synced by deploy.sh, so the manifest referenced hashes
that 404'd on the host -> portal rendered with no CSS (recurring issue).
- Commit extract-erpnext-assets.sh (was untracked, prod-only). It now also runs
bench build to keep assets.json consistent with dist/, copies the manifest,
and verifies the login bundle exists on the host before finishing.
- deploy.sh: add an 'erpnext' target that rebuilds, runs bench migrate, and
re-extracts assets. Plus a cheap drift guard on EVERY deploy that auto-heals
by re-extracting if the portal manifest references a missing CSS bundle.