Commit graph

10 commits

Author SHA1 Message Date
justin
668fc6783b compose: give ERPNext CUSTOMER_JWT_SECRET + DATABASE_URL (fix portal drift)
The erpnext service was missing both env vars that the portal needs:
- CUSTOMER_JWT_SECRET: verifies /set-password magic-link tokens signed by the
  API. Without it, the set-password page resolved an empty/placeholder secret
  and showed 'Link invalid' for every customer onboarding link.
- DATABASE_URL: lets www/orders.py read compliance_orders from Postgres for the
  portal's Compliance section.

Both were already set on the api and workers services but were never wired into
erpnext, so its config drifted. Now the single ERPNext portal can actually verify
invites and show compliance orders.
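The portal-side check is presumably an HS256 JWT verification keyed on CUSTOMER_JWT_SECRET. The algorithm, the `exp` claim and the helper names below (`make_token`, `verify_magic_link`) are assumptions, not the project's code. This minimal stdlib-only Python sketch shows why an empty or placeholder secret turns every valid link into "Link invalid":

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore the padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def make_token(claims: dict, secret: str) -> str:
    """Sign claims as an HS256 JWT (hypothetical stand-in for the API side)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret.encode(), f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_magic_link(token: str, secret: str) -> Optional[dict]:
    """Return the claims if the HS256 signature and expiry check out, else None."""
    if not secret:
        # Empty secret: nothing the API signed can verify, so fail closed.
        return None
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        signature = _b64url_decode(sig_b64)
        claims = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        return None  # malformed token
    if not isinstance(header, dict) or header.get("alg") != "HS256" or not isinstance(claims, dict):
        return None
    expected = hmac.new(secret.encode(), f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        return None  # the case the portal renders as "Link invalid"
    if "exp" in claims and claims["exp"] < time.time():
        return None  # expired link
    return claims
```

In the portal the secret would be read with something like `os.environ.get("CUSTOMER_JWT_SECRET", "")`. Failing closed on an empty value surfaces the misconfiguration instead of accepting tokens that anyone could sign with an empty key.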
2026-06-02 23:02:58 -05:00
justin
c9881868dd Add Telegram notification on every new paid order
Sends to the monitoring bot immediately when payment is confirmed:
- Customer name and email
- Service/slug ordered
- Total amount (includes all fees: service + formation + state + addons)
- Payment method
- Order number and type

Fire-and-forget — never blocks the payment flow.
Requires TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID env vars on API container.
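A minimal Python sketch of the fire-and-forget pattern (the real hook lives in the TypeScript API container and is not shown here). It calls the Telegram Bot API's `sendMessage` method; the order field names (including `total_cents`) and the `notify_paid_order` helper are hypothetical:

```python
import json
import os
import threading
import urllib.request


def format_order_message(order: dict) -> str:
    # Field names are hypothetical; map them to the real order record.
    return "\n".join([
        f"New paid order #{order['order_number']} ({order['order_type']})",
        f"Customer: {order['customer_name']} <{order['customer_email']}>",
        f"Service: {order['slug']}",
        f"Total: ${order['total_cents'] / 100:,.2f}",
        f"Payment: {order['payment_method']}",
    ])


def _send(token: str, chat_id: str, text: str) -> None:
    req = urllib.request.Request(
        f"https://api.telegram.org/bot{token}/sendMessage",
        data=json.dumps({"chat_id": chat_id, "text": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        urllib.request.urlopen(req, timeout=10).close()
    except Exception:
        pass  # a failed notification must never reach the payment flow


def notify_paid_order(order: dict) -> bool:
    """Start the Telegram send on a background thread and return immediately."""
    token = os.environ.get("TELEGRAM_BOT_TOKEN")
    chat_id = os.environ.get("TELEGRAM_CHAT_ID")
    if not token or not chat_id:
        return False  # not configured: skip without raising
    text = format_order_message(order)
    threading.Thread(target=_send, args=(token, chat_id, text), daemon=True).start()
    return True
```

Running the send on a daemon thread with its own timeout and a blanket `except` keeps a slow or failing Telegram call off the payment path; the trade-off is that a notification can be lost silently.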

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 07:32:42 -05:00
justin
15f5c267e7 Fix dashboard stale series + enable Prometheus admin API
Dashboard queries now use max() to pick the UP value when stale
series from old probe targets coexist with the new ones. Prometheus
admin API enabled for future TSDB cleanup of the stale series.
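The effect of `max()` can be illustrated with a small Python stand-in for a PromQL aggregation such as `max by (instance) (probe_success)`. The metric, label and job names are assumptions about how the dashboard is wired:

```python
from typing import Dict, Iterable, Tuple

Sample = Tuple[Dict[str, str], float]


def max_by(samples: Iterable[Sample], label: str) -> Dict[str, float]:
    """Keep the highest value per distinct `label` value, like PromQL `max by (label)`."""
    result: Dict[str, float] = {}
    for labels, value in samples:
        key = labels[label]
        result[key] = max(result.get(key, value), value)
    return result


# A stale series from an old probe target (0 = down) next to the current one (1 = up).
samples = [
    ({"instance": "https://performancewest.net", "job": "blackbox-old"}, 0.0),
    ({"instance": "https://performancewest.net", "job": "blackbox"}, 1.0),
]
print(max_by(samples, "instance"))  # {'https://performancewest.net': 1.0}
```

The trade-off is that `max()` reports a target as up whenever any of its series is up, which is why deleting the stale series through the admin API is the cleaner long-term fix.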

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 03:43:42 -05:00
justin
b190bcef92 Fix ERPNext and Forgejo probes
- ERPNext: custom blackbox module with Host: performancewest.net header
  (ERPNext multitenancy requires site name in Host for routing)
- Forgejo: add extra_hosts to blackbox-exporter so it can resolve
  host.docker.internal to reach forgejo on port 3000
- Blackbox http_erpnext module: sets Host header, expects 200
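Outside of blackbox-exporter, the same Host-header probe can be approximated in Python. `build_erpnext_probe`, `probe_ok` and the `http://erpnext:8000/` upstream URL are hypothetical; the `performancewest.net` site name and the expect-200 rule come from the commit:

```python
import urllib.request


def build_erpnext_probe(url: str, site: str = "performancewest.net") -> urllib.request.Request:
    # ERPNext routes multi-tenant requests by Host, so the probe must present
    # the site name even when it connects by container name or IP.
    return urllib.request.Request(url, headers={"Host": site}, method="GET")


def probe_ok(url: str, site: str = "performancewest.net", timeout: float = 5.0) -> bool:
    """Return True only for an HTTP 200, mirroring the blackbox module's expectation."""
    try:
        with urllib.request.urlopen(build_erpnext_probe(url, site), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return False
```

For example, `probe_ok("http://erpnext:8000/")` would report up only if ERPNext resolves the request to the `performancewest.net` site and answers 200.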

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 03:35:45 -05:00
justin
0a31313956 Fix nginx-exporter: back to bridge network with host.docker.internal
Host network mode prevented Prometheus from reaching the exporter.
Switched back to bridge with extra_hosts + explicit port mapping.
Added timeout flag to prevent hanging on stub_status fetch.
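What the exporter fetches from nginx is the plain-text `stub_status` page. A Python sketch of a bounded fetch plus parse follows; the `/nginx_status` path and the helper names are assumptions (the commits only establish port 8888), and the parsed layout is nginx's standard stub_status output:

```python
import re
import urllib.request

_STUB_RE = re.compile(
    r"Active connections:\s*(\d+)\s*"
    r"server accepts handled requests\s*(\d+)\s+(\d+)\s+(\d+)\s*"
    r"Reading:\s*(\d+)\s*Writing:\s*(\d+)\s*Waiting:\s*(\d+)"
)
_KEYS = ("active", "accepts", "handled", "requests", "reading", "writing", "waiting")


def parse_stub_status(text: str) -> dict:
    match = _STUB_RE.search(text)
    if match is None:
        raise ValueError("response is not an nginx stub_status page")
    return dict(zip(_KEYS, map(int, match.groups())))


def fetch_stub_status(url: str = "http://host.docker.internal:8888/nginx_status",
                      timeout: float = 2.0) -> dict:
    # The timeout bounds how long one scrape can hang on an unreachable nginx.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_stub_status(resp.read().decode())
```

If a request lands in the wrong server block and gets redirected, as with the old port-80 setup, the body that comes back is not stub_status, so the parse raises instead of reporting bogus numbers.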

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 03:21:27 -05:00
justin
433827138b Fix nginx-exporter: use host network mode for direct stub_status access
nginx-exporter couldn't reach host nginx via host.docker.internal
(connection timeout). Switch to network_mode: host so it can access
127.0.0.1:8888 directly. Prometheus scrapes via host.docker.internal
with extra_hosts mapping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 03:19:57 -05:00
justin
27cc925c4d Fix nginx-exporter port and add alertmanager scrape target
- nginx stub_status moved to port 8888 (port 80 was being caught
  by other server blocks and returning 301)
- nginx-exporter updated to scrape :8888
- Added alertmanager scrape job to Prometheus config (was missing,
  so alertmanager dashboard had no data)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 03:17:31 -05:00
justin
b38b1af872 Disable Grafana brute force lockout during initial setup
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 03:11:30 -05:00
justin
a4a5500bfc Add Prometheus + Grafana + Alertmanager monitoring stack
Full observability stack with Telegram alerting:

Components:
- Prometheus: metrics collection, 90-day retention
- Grafana: dashboards at monitoring.performancewest.net
- Alertmanager: routes alerts to Telegram bot
- node-exporter: OS metrics (CPU, RAM, disk, network)
- cAdvisor: container metrics (CPU, memory, restarts)
- postgres-exporter: PostgreSQL connection/query metrics
- nginx-exporter: request rate, 5xx errors, connections
- blackbox-exporter: HTTP/TCP endpoint probing + SSL cert checks

Alert rules:
- Service down (HTTP probe, TCP port, container missing)
- Container restart loops
- High CPU/memory/disk/load
- PostgreSQL down or high connections
- SSL cert expiring (14d warning, 3d critical)
- Slow HTTP responses, high 5xx rate
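The two-tier certificate rule reduces to comparing time-to-expiry against the 14-day and 3-day thresholds. In Prometheus this is presumably an expression over blackbox-exporter's `probe_ssl_earliest_cert_expiry` metric minus `time()`; the Python below only sketches the classification logic, and `cert_expiry_severity` is a hypothetical name:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

WARNING_DAYS = 14
CRITICAL_DAYS = 3


def cert_expiry_severity(not_after: datetime, now: Optional[datetime] = None) -> Optional[str]:
    """Map a certificate's expiry time to an alert severity, or None if healthy."""
    now = now or datetime.now(timezone.utc)
    remaining = not_after - now
    if remaining < timedelta(days=CRITICAL_DAYS):
        return "critical"
    if remaining < timedelta(days=WARNING_DAYS):
        return "warning"
    return None
```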

Blackbox probes all public endpoints:
  performancewest.net, api, dev, crm, lists, analytics,
  minio, crypto, pay

Telegram alerts: critical alerts repeat every 1h, warnings every 6h;
  resolved alerts trigger an automatic notification
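Alertmanager implements the repeat behaviour itself through each route's `repeat_interval` (and the receiver's `send_resolved` for resolve messages). The Python sketch below only illustrates the timing rule; `should_resend` is a hypothetical name and the fallback interval for unknown severities is an assumption:

```python
from datetime import datetime, timedelta
from typing import Optional

REPEAT_INTERVAL = {"critical": timedelta(hours=1), "warning": timedelta(hours=6)}
DEFAULT_INTERVAL = timedelta(hours=6)  # assumed fallback for other severities


def should_resend(severity: str, last_sent: Optional[datetime], now: datetime) -> bool:
    """True if a still-firing alert is due for another Telegram notification."""
    if last_sent is None:
        return True  # first notification goes out immediately
    return now - last_sent >= REPEAT_INTERVAL.get(severity, DEFAULT_INTERVAL)
```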

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 02:08:39 -05:00
justin
f8cd37ac8c Initial commit — Performance West telecom compliance platform
Includes: API (Express/TypeScript), Astro site, Python workers,
document generators, FCC compliance tools, Canada CRTC formation,
Ansible infrastructure, and deployment scripts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-27 06:54:22 -05:00