Each service gets its own Prometheus probe verifying actual functionality:

- API: /status endpoint (checks DB connectivity, returns 503 if down)
- Workers: /health endpoint (job server responsive)
- ERPNext: API method call (MariaDB + Redis + app all working)
- MinIO: /minio/health/live (storage accessible)
- Listmonk: /api/health (email service + DB)
- Ollama: root endpoint (LLM inference available)
- Umami: /api/heartbeat (analytics tracking)
- Forgejo: root page (git server accessible)
- PostgreSQL: pg_up metric from postgres-exporter
- All HTTPS endpoints: SSL + reachability from outside

Service-specific alerts with context:

- API down = DB may be unreachable
- Workers down = compliance orders not processing
- ERPNext down = CRM inaccessible
- MinIO down = document storage unavailable

Custom Grafana dashboard: "Performance West — Services Overview"

- Service status grid (UP/DOWN with colors)
- Response time charts (internal + HTTPS)
- SSL certificate expiry gauges
- Container CPU/memory per service
- PostgreSQL connections, nginx req/s, active alerts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

| | | |
|---|---|---|
| .claude/projects/-home-justin-projects-performancewest-new-site/memory | ||
| api | ||
| chrome-extension/fcc-access-helper | ||
| docs | ||
| docserver | ||
| frappe_adyen | ||
| frappe_ca_registry | ||
| frappe_crypto | ||
| infra | ||
| mcp | ||
| monitoring | ||
| node-compile-cache/v25.1.0-x64-392347a2-1000 | ||
| performancewest_erpnext | ||
| scripts | ||
| site | ||
| src | ||
| .gitignore | ||
| CLAUDE.md | ||
| deploy.sh | ||
| docker-compose.yml | ||
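
The commit message above says the API's /status endpoint checks DB connectivity and returns 503 when the database is down, which is what lets a Prometheus probe tell "process is running" apart from "service actually works". A minimal sketch of that pattern follows. The function name `status_response`, the `check_db` callable, and the JSON body shape are all hypothetical, chosen for illustration; the repository's actual handler may look quite different.

```python
import json
from typing import Callable, Tuple


def status_response(check_db: Callable[[], bool]) -> Tuple[int, str]:
    """Build an (HTTP status code, JSON body) pair for a /status-style check.

    `check_db` is a hypothetical callable that returns True when the
    database answers (for example, after running a trivial query).
    Any exception it raises is treated as "database unreachable".
    """
    try:
        db_ok = bool(check_db())
    except Exception:
        # A failed connection attempt should surface as 503, not crash the handler.
        db_ok = False

    # 200 lets the probe mark the service UP; 503 makes the probe fail,
    # which is what drives the "API down = DB may be unreachable" alert.
    code = 200 if db_ok else 503
    body = json.dumps({
        "status": "ok" if db_ok else "unavailable",
        "database": db_ok,
    })
    return code, body


if __name__ == "__main__":
    # Simulated healthy and unhealthy database checks.
    print(status_response(lambda: True))
    print(status_response(lambda: False))
```

Keeping the decision in a plain function, separate from any web framework, makes the 200/503 behavior easy to unit-test without starting a server.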