Root cause of recurring 'Password not found for Email Account Performance West
Outgoing': the account was shipped as a fixture with awaiting_password=1 and no
password. Email Account SMTP passwords are encrypted per-site and cannot live in
a fixture, so every `bench migrate` reimported the fixture and re-broke
outgoing mail (login notifications, password resets, welcome emails).
- Remove the Email Account fixture (it cannot carry the encrypted secret).
- Add email_account_sync.sync_outgoing_password: idempotent, exception-safe
upsert that reconciles the account + password from SMTP_* env and clears
awaiting_password.
- Wire it to after_migrate (repairs at end of every deploy/migrate, right after
fixtures import) and the daily scheduler (heals out-of-band restore/restart
drift).
- Pass SMTP_* into the erpnext + erpnext-scheduler containers so the sync has
the secret (they previously had no SMTP env).
Isolated from the trucking listmonk: own DB (listmonk_hc), own uploads volume,
own sliding-window cap. Configured (on dev) with 3 SMTP servers pointing at the
host Postfix hc submission ports 2526/2527/2528 so it round-robins the dedicated
hc IPs .107/.108/.109. Reaches the host MTA via the docker bridge gateway.
Note: the listmonk image needs an explicit one-time '--install --idempotent
--yes' against listmonk_hc (env vars alone do not auto-install this image tag).
Validated on dev: listmonk-hc container (172.19.0.16) -> host :2526 (hcsubmit107)
-> hcout1 (.107) -> real gmail MX; both listmonk instances Up.
Chromium rejects authenticated SOCKS5 ('Browser does not support socks5 proxy
authentication'). Add a gost (ginuerzh/gost:2.11.5) 'proxy-relay' sidecar that
listens unauthenticated on socks5://proxy-relay:11080 and forwards to the
authenticated residential upstream (HEALTHCARE_PROXY_UPSTREAM_URL). Workers point
Playwright at the relay via HEALTHCARE_PROXY_URL=socks5://proxy-relay:11080.
env template: split into HEALTHCARE_PROXY_UPSTREAM_URL (authenticated, password
percent-encoded so '#' -> %23) and HEALTHCARE_PROXY_URL (the relay address).
Validated end-to-end on dev: workers Chromium -> proxy-relay -> residential
egress IP 76.228.206.147; NPPES + PECOS both HTTP 200.
CMS healthcare portals (NPPES, PECOS, I&A) block datacenter IPs, so the
healthcare browser automation needs to egress via the residential proxy on
hg409y7ez04.sn.mynetname.net (username 'performancewest').
- undetected_browser: use_proxy now accepts an env-var name, so callers can
select a domain-specific proxy. _proxy_config(proxy_env) reads it and falls
back to UNDETECTED_PROXY_URL. Healthcare uses 'HEALTHCARE_PROXY_URL'.
- probe_npi_undetected: launches with use_proxy='HEALTHCARE_PROXY_URL' when set.
- npi_provider: documents that the (future) automated NPPES/PECOS flows must
use the healthcare proxy.
- Plumb HEALTHCARE_PROXY_URL (+ UNDETECTED_PROXY_URL fallback) through the
ansible env template and docker-compose workers env.
The credential itself is NOT in the repo. Set the full URL in the ansible
vault as vault_healthcare_proxy_url:
socks5://performancewest:<password>@hg409y7ez04.sn.mynetname.net:<port>
Verified parsing + Playwright proxy-dict wiring with a unit test.
The erpnext service was missing both env vars that the portal needs:
- CUSTOMER_JWT_SECRET: verifies /set-password magic-link tokens signed by the
API. Without it, the set-password page resolved an empty/placeholder secret
and showed 'Link invalid' for every customer onboarding link.
- DATABASE_URL: lets www/orders.py read compliance_orders from Postgres for the
portal's Compliance section.
Both were present on api/workers but never wired to erpnext -> drift. Now the
single ERPNext portal can actually verify invites and show compliance orders.
Sends to the monitoring bot immediately when payment is confirmed:
- Customer name and email
- Service/slug ordered
- Total amount (includes all fees: service + formation + state + addons)
- Payment method
- Order number and type
Fire-and-forget — never blocks the payment flow.
Requires TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID env vars on API container.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dashboard queries now use max() to pick UP value when old stale
probe targets coexist with new ones. Prometheus admin API enabled
for future TSDB cleanup of stale series.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ERPNext: custom blackbox module with Host: performancewest.net header
(ERPNext multitenancy requires site name in Host for routing)
- Forgejo: add extra_hosts to blackbox-exporter so it can resolve
host.docker.internal to reach forgejo on port 3000
- Blackbox http_erpnext module: sets Host header, expects 200
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
host network mode prevented Prometheus from reaching the exporter.
Switched back to bridge with extra_hosts + explicit port mapping.
Added timeout flag to prevent hanging on stub_status fetch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
nginx-exporter couldn't reach host nginx via host.docker.internal
(connection timeout). Switch to network_mode: host so it can access
127.0.0.1:8888 directly. Prometheus scrapes via host.docker.internal
with extra_hosts mapping.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- nginx stub_status moved to port 8888 (port 80 was being caught
by other server blocks and returning 301)
- nginx-exporter updated to scrape :8888
- Added alertmanager scrape job to Prometheus config (was missing,
so alertmanager dashboard had no data)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>