new-site


justin 2f9005693e Add deep service health monitoring for all PW dependencies		2026-05-01 03:30:23 -05:00

Each service gets its own Prometheus probe verifying actual functionality:
- API: /status endpoint (checks DB connectivity, returns 503 if down)
- Workers: /health endpoint (job server responsive)
- ERPNext: API method call (MariaDB + Redis + app all working)
- MinIO: /minio/health/live (storage accessible)
- Listmonk: /api/health (email service + DB)
- Ollama: root endpoint (LLM inference available)
- Umami: /api/heartbeat (analytics tracking)
- Forgejo: root page (git server accessible)
- PostgreSQL: pg_up metric from postgres-exporter
- All HTTPS endpoints: SSL + reachability from outside

Service-specific alerts with context:
- API down = DB may be unreachable
- Workers down = compliance orders not processing
- ERPNext down = CRM inaccessible
- MinIO down = document storage unavailable

Custom Grafana dashboard: "Performance West — Services Overview"
- Service status grid (UP/DOWN with colors)
- Response time charts (internal + HTTPS)
- SSL certificate expiry gauges
- Container CPU/memory per service
- PostgreSQL connections, nginx req/s, active alerts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
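A minimal sketch of how one of these probes could be wired end to end (blackbox module, Prometheus scrape job, alert rule), assuming the Prometheus blackbox exporter's HTTP prober is what blackbox.yml configures. This is not copied from blackbox.yml, prometheus.yml or alert_rules.yml: the module name `http_2xx`, the job name `pw-api`, the target URL, the exporter address and the `for:` duration are all hypothetical. The point it illustrates is why a 503 from /status counts as "down": the HTTP prober accepts only 2xx responses by default, so `probe_success` drops to 0.

```yaml
# --- blackbox.yml (hypothetical module) ---
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET
      # valid_status_codes is omitted, so only 2xx responses pass;
      # a 503 from /status (DB unreachable) fails the probe.
---
# --- prometheus.yml (hypothetical job) ---
scrape_configs:
  - job_name: pw-api
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets: ["http://api:8000/status"]  # hypothetical host and port
    relabel_configs:
      # Pass the listed target to the exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Actually scrape the blackbox exporter itself.
      - target_label: __address__
        replacement: blackbox-exporter:9115  # hypothetical exporter address
---
# --- alert_rules.yml (hypothetical rule) ---
groups:
  - name: pw-services
    rules:
      - alert: PWApiDown
        expr: probe_success{job="pw-api"} == 0
        for: 2m  # hypothetical grace period to ride out brief blips
        labels:
          severity: critical
        annotations:
          summary: "API down"
          description: "The /status probe is failing; the DB may be unreachable."
```

The same pattern would extend to the other HTTP services by pointing new targets at their health paths; PostgreSQL is different because it is covered by the `pg_up` metric from postgres-exporter rather than a blackbox probe.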
alert_rules.yml	Add deep service health monitoring for all PW dependencies	2026-05-01 03:30:23 -05:00
alertmanager.yml	Add Prometheus + Grafana + Alertmanager monitoring stack	2026-05-01 02:08:39 -05:00
blackbox.yml	Add Prometheus + Grafana + Alertmanager monitoring stack	2026-05-01 02:08:39 -05:00
grafana-datasources.yml	Remove fixed uid from Grafana datasource provisioning — Grafana 13 rejects it on fresh boot	2026-05-01 03:09:10 -05:00
prometheus.yml	Add deep service health monitoring for all PW dependencies	2026-05-01 03:30:23 -05:00
pw-services-dashboard.json	Add deep service health monitoring for all PW dependencies	2026-05-01 03:30:23 -05:00