Commit graph

15 commits

Author SHA1 Message Date
justin
08d5132459 feat(email): add listmonk-hc second instance for the healthcare HOT stream
Isolated from the trucking listmonk: own DB (listmonk_hc), own uploads volume,
own sliding-window cap. Configured (on dev) with 3 SMTP servers pointing at the
host Postfix hc submission ports 2526/2527/2528 so it round-robins the dedicated
hc IPs .107/.108/.109. Reaches the host MTA via the docker bridge gateway.

Note: the listmonk image needs an explicit one-time '--install --idempotent
--yes' against listmonk_hc (env vars alone do not auto-install this image tag).

Validated on dev: listmonk-hc container (172.19.0.16) -> host :2526 (hcsubmit107)
-> hcout1 (.107) -> real gmail MX; both listmonk instances Up.
2026-06-05 19:18:35 -05:00
justin
a79d6b1906 feat(healthcare): add gost proxy-relay so Chromium can use the residential proxy
Chromium rejects authenticated SOCKS5 ('Browser does not support socks5 proxy
authentication'). Add a gost (ginuerzh/gost:2.11.5) 'proxy-relay' sidecar that
listens unauthenticated on socks5://proxy-relay:11080 and forwards to the
authenticated residential upstream (HEALTHCARE_PROXY_UPSTREAM_URL). Workers point
Playwright at the relay via HEALTHCARE_PROXY_URL=socks5://proxy-relay:11080.

env template: split into HEALTHCARE_PROXY_UPSTREAM_URL (authenticated, password
percent-encoded so '#' -> %23) and HEALTHCARE_PROXY_URL (the relay address).

Validated end-to-end on dev: workers Chromium -> proxy-relay -> residential
egress IP 76.228.206.147; NPPES + PECOS both HTTP 200.
2026-06-05 18:39:26 -05:00
justin
17318f6e7d feat(healthcare): route NPPES/PECOS Playwright flows through residential SOCKS proxy
CMS healthcare portals (NPPES, PECOS, I&A) block datacenter IPs, so the
healthcare browser automation needs to egress via the residential proxy on
hg409y7ez04.sn.mynetname.net (username 'performancewest').

- undetected_browser: use_proxy now accepts an env-var name, so callers can
  select a domain-specific proxy. _proxy_config(proxy_env) reads it and falls
  back to UNDETECTED_PROXY_URL. Healthcare uses 'HEALTHCARE_PROXY_URL'.
- probe_npi_undetected: launches with use_proxy='HEALTHCARE_PROXY_URL' when set.
- npi_provider: documents that the (future) automated NPPES/PECOS flows must
  use the healthcare proxy.
- Plumb HEALTHCARE_PROXY_URL (+ UNDETECTED_PROXY_URL fallback) through the
  ansible env template and docker-compose workers env.

The credential itself is NOT in the repo. Set the full URL in the ansible
vault as vault_healthcare_proxy_url:
  socks5://performancewest:<password>@hg409y7ez04.sn.mynetname.net:<port>
Verified parsing + Playwright proxy-dict wiring with a unit test.
2026-06-05 14:36:01 -05:00
justin
c027d49f43 Fix trucking campaign cron send date 2026-06-04 03:19:35 -05:00
justin
5c35140a22 Configure trucking deficiency campaign cron env 2026-06-03 23:04:41 -05:00
justin
668fc6783b compose: give ERPNext CUSTOMER_JWT_SECRET + DATABASE_URL (fix portal drift)
The erpnext service was missing both env vars that the portal needs:
- CUSTOMER_JWT_SECRET: verifies /set-password magic-link tokens signed by the
  API. Without it, the set-password page resolved an empty/placeholder secret
  and showed 'Link invalid' for every customer onboarding link.
- DATABASE_URL: lets www/orders.py read compliance_orders from Postgres for the
  portal's Compliance section.

Both were present on api/workers but never wired to erpnext -> drift. Now the
single ERPNext portal can actually verify invites and show compliance orders.
2026-06-02 23:02:58 -05:00
justin
c9881868dd Add Telegram notification on every new paid order
Sends to the monitoring bot immediately when payment is confirmed:
- Customer name and email
- Service/slug ordered
- Total amount (includes all fees: service + formation + state + addons)
- Payment method
- Order number and type

Fire-and-forget — never blocks the payment flow.
Requires TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID env vars on API container.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 07:32:42 -05:00
justin
15f5c267e7 Fix dashboard stale series + enable Prometheus admin API
Dashboard queries now use max() to pick UP value when old stale
probe targets coexist with new ones. Prometheus admin API enabled
for future TSDB cleanup of stale series.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 03:43:42 -05:00
justin
b190bcef92 Fix ERPNext and Forgejo probes
- ERPNext: custom blackbox module with Host: performancewest.net header
  (ERPNext multitenancy requires site name in Host for routing)
- Forgejo: add extra_hosts to blackbox-exporter so it can resolve
  host.docker.internal to reach forgejo on port 3000
- Blackbox http_erpnext module: sets Host header, expects 200

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 03:35:45 -05:00
justin
0a31313956 Fix nginx-exporter: back to bridge network with host.docker.internal
host network mode prevented Prometheus from reaching the exporter.
Switched back to bridge with extra_hosts + explicit port mapping.
Added timeout flag to prevent hanging on stub_status fetch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 03:21:27 -05:00
justin
433827138b Fix nginx-exporter: use host network mode for direct stub_status access
nginx-exporter couldn't reach host nginx via host.docker.internal
(connection timeout). Switch to network_mode: host so it can access
127.0.0.1:8888 directly. Prometheus scrapes via host.docker.internal
with extra_hosts mapping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 03:19:57 -05:00
justin
27cc925c4d Fix nginx-exporter port and add alertmanager scrape target
- nginx stub_status moved to port 8888 (port 80 was being caught
  by other server blocks and returning 301)
- nginx-exporter updated to scrape :8888
- Added alertmanager scrape job to Prometheus config (was missing,
  so alertmanager dashboard had no data)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 03:17:31 -05:00
justin
b38b1af872 Disable Grafana brute force lockout during initial setup
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 03:11:30 -05:00
justin
a4a5500bfc Add Prometheus + Grafana + Alertmanager monitoring stack
Full observability stack with Telegram alerting:

Components:
- Prometheus: metrics collection, 90-day retention
- Grafana: dashboards at monitoring.performancewest.net
- Alertmanager: routes alerts to Telegram bot
- node-exporter: OS metrics (CPU, RAM, disk, network)
- cAdvisor: container metrics (CPU, memory, restarts)
- postgres-exporter: PostgreSQL connection/query metrics
- nginx-exporter: request rate, 5xx errors, connections
- blackbox-exporter: HTTP/TCP endpoint probing + SSL cert checks

Alert rules:
- Service down (HTTP probe, TCP port, container missing)
- Container restart loops
- High CPU/memory/disk/load
- PostgreSQL down or high connections
- SSL cert expiring (14d warning, 3d critical)
- Slow HTTP responses, high 5xx rate

Blackbox probes all public endpoints:
  performancewest.net, api, dev, crm, lists, analytics,
  minio, crypto, pay

Telegram alerts: critical=1h repeat, warning=6h repeat,
  auto-resolve notifications

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 02:08:39 -05:00
justin
f8cd37ac8c Initial commit — Performance West telecom compliance platform
Includes: API (Express/TypeScript), Astro site, Python workers,
document generators, FCC compliance tools, Canada CRTC formation,
Ansible infrastructure, and deployment scripts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-27 06:54:22 -05:00