Add deep service health monitoring for all PW dependencies
Each service gets its own Prometheus probe verifying actual functionality:
- API: /status endpoint (checks DB connectivity, returns 503 if down)
- Workers: /health endpoint (job server responsive)
- ERPNext: API method call (MariaDB + Redis + app all working)
- MinIO: /minio/health/live (storage accessible)
- Listmonk: /api/health (email service + DB)
- Ollama: root endpoint (LLM inference available)
- Umami: /api/heartbeat (analytics tracking)
- Forgejo: root page (git server accessible)
- PostgreSQL: pg_up metric from postgres-exporter
- All HTTPS endpoints: SSL + reachability from outside

Service-specific alerts with context:
- API down = DB may be unreachable
- Workers down = compliance orders not processing
- ERPNext down = CRM inaccessible
- MinIO down = document storage unavailable

Custom Grafana dashboard: "Performance West — Services Overview"
- Service status grid (UP/DOWN with colors)
- Response time charts (internal + HTTPS)
- SSL certificate expiry gauges
- Container CPU/memory per service
- PostgreSQL connections, nginx req/s, active alerts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
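
The service-specific alerts described in the commit message could be expressed as Prometheus alerting rules along the following lines. This is a minimal sketch, not the rules file from this commit: the group name, alert names, `for` durations, and `severity` values are assumptions made for illustration. `probe_success` is the standard result metric exposed by the blackbox exporter, and the `service` and `env` labels are the ones attached to the probe jobs in the diff.

```yaml
# Hypothetical alert rules; names, durations, and severities are illustrative.
groups:
  - name: pw_service_health
    rules:
      # Fires when the API /status probe has been failing for 2 minutes.
      - alert: PWApiDown
        expr: probe_success{service="api", env="prod"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "API probe failing on {{ $labels.instance }}"
          description: "The /status probe failed; the database may be unreachable."

      # Fires when the workers /health probe has been failing for 2 minutes.
      - alert: PWWorkersDown
        expr: probe_success{service="workers"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Workers probe failing on {{ $labels.instance }}"
          description: "The job server is not responding; compliance orders are not processing."
```

Matching on the `service` label rather than the job name keeps such rules readable and lets the same expression cover both `env: prod` and `env: dev` when the `env` matcher is dropped.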
parent cc463a662f
commit 2f9005693e
3 changed files with 547 additions and 87 deletions
@@ -44,23 +44,22 @@ scrape_configs:
     static_configs:
       - targets: ["nginx-exporter:9113"]
 
-  # ── Blackbox probes (HTTP endpoint monitoring) ─────────────────────
-  - job_name: blackbox_http
+  # ══════════════════════════════════════════════════════════════════════
+  # Performance West Service Health Probes
+  # Each probe verifies the service is FUNCTIONAL, not just responding
+  # ══════════════════════════════════════════════════════════════════════
+
+  # ── Prod API + DB (returns 503 if DB unreachable) ──────────────────
+  - job_name: pw_api_prod
     metrics_path: /probe
     params:
       module: [http_2xx]
     static_configs:
       - targets:
-          - https://performancewest.net
-          - https://api.performancewest.net/api/v1/fcc/search?q=test
-          - https://dev.performancewest.net
-          - https://api.dev.performancewest.net/api/v1/fcc/search?q=test
-          - https://crm.performancewest.net
-          - https://lists.performancewest.net
-          - https://analytics.performancewest.net
-          - http://minio:9000/minio/health/live
-          - https://crypto.performancewest.net
-          - https://pay.performancewest.net
+          - http://api:3001/api/v1/status
+        labels:
+          service: api
+          env: prod
     relabel_configs:
       - source_labels: [__address__]
         target_label: __param_target
@@ -69,7 +68,203 @@ scrape_configs:
       - target_label: __address__
         replacement: blackbox-exporter:9115
 
-  # ── Blackbox TCP probes (port monitoring) ──────────────────────────
+  # ── Dev API + DB ───────────────────────────────────────────────────
+  - job_name: pw_api_dev
+    metrics_path: /probe
+    params:
+      module: [http_2xx]
+    static_configs:
+      - targets:
+          - http://host.docker.internal:3002/api/v1/status
+        labels:
+          service: api
+          env: dev
+    relabel_configs:
+      - source_labels: [__address__]
+        target_label: __param_target
+      - source_labels: [__param_target]
+        target_label: instance
+      - target_label: __address__
+        replacement: blackbox-exporter:9115
+
+  # ── Prod Site (Astro static) ───────────────────────────────────────
+  - job_name: pw_site_prod
+    metrics_path: /probe
+    params:
+      module: [http_2xx]
+    static_configs:
+      - targets:
+          - http://site:80/
+        labels:
+          service: site
+          env: prod
+    relabel_configs:
+      - source_labels: [__address__]
+        target_label: __param_target
+      - source_labels: [__param_target]
+        target_label: instance
+      - target_label: __address__
+        replacement: blackbox-exporter:9115
+
+  # ── Workers (Python job server) ────────────────────────────────────
+  - job_name: pw_workers
+    metrics_path: /probe
+    params:
+      module: [http_2xx]
+    static_configs:
+      - targets:
+          - http://workers:8090/health
+        labels:
+          service: workers
+          env: prod
+    relabel_configs:
+      - source_labels: [__address__]
+        target_label: __param_target
+      - source_labels: [__param_target]
+        target_label: instance
+      - target_label: __address__
+        replacement: blackbox-exporter:9115
+
+  # ── ERPNext CRM ────────────────────────────────────────────────────
+  - job_name: pw_erpnext
+    metrics_path: /probe
+    params:
+      module: [http_2xx]
+    static_configs:
+      - targets:
+          - http://erpnext:8000/api/method/frappe.client.get_count?doctype=Customer
+        labels:
+          service: erpnext
+          env: prod
+    relabel_configs:
+      - source_labels: [__address__]
+        target_label: __param_target
+      - source_labels: [__param_target]
+        target_label: instance
+      - target_label: __address__
+        replacement: blackbox-exporter:9115
+
+  # ── MinIO object storage ───────────────────────────────────────────
+  - job_name: pw_minio
+    metrics_path: /probe
+    params:
+      module: [http_2xx]
+    static_configs:
+      - targets:
+          - http://minio:9000/minio/health/live
+        labels:
+          service: minio
+          env: prod
+    relabel_configs:
+      - source_labels: [__address__]
+        target_label: __param_target
+      - source_labels: [__param_target]
+        target_label: instance
+      - target_label: __address__
+        replacement: blackbox-exporter:9115
+
+  # ── Listmonk email marketing ───────────────────────────────────────
+  - job_name: pw_listmonk
+    metrics_path: /probe
+    params:
+      module: [http_2xx]
+    static_configs:
+      - targets:
+          - http://listmonk:9000/api/health
+        labels:
+          service: listmonk
+          env: prod
+    relabel_configs:
+      - source_labels: [__address__]
+        target_label: __param_target
+      - source_labels: [__param_target]
+        target_label: instance
+      - target_label: __address__
+        replacement: blackbox-exporter:9115
+
+  # ── Ollama LLM ────────────────────────────────────────────────────
+  - job_name: pw_ollama
+    metrics_path: /probe
+    params:
+      module: [http_2xx]
+    static_configs:
+      - targets:
+          - http://ollama:11434/
+        labels:
+          service: ollama
+          env: prod
+    relabel_configs:
+      - source_labels: [__address__]
+        target_label: __param_target
+      - source_labels: [__param_target]
+        target_label: instance
+      - target_label: __address__
+        replacement: blackbox-exporter:9115
+
+  # ── Umami analytics ────────────────────────────────────────────────
+  - job_name: pw_umami
+    metrics_path: /probe
+    params:
+      module: [http_2xx]
+    static_configs:
+      - targets:
+          - http://umami:3000/api/heartbeat
+        labels:
+          service: umami
+          env: prod
+    relabel_configs:
+      - source_labels: [__address__]
+        target_label: __param_target
+      - source_labels: [__param_target]
+        target_label: instance
+      - target_label: __address__
+        replacement: blackbox-exporter:9115
+
+  # ── Forgejo git server ─────────────────────────────────────────────
+  - job_name: pw_forgejo
+    metrics_path: /probe
+    params:
+      module: [http_2xx]
+    static_configs:
+      - targets:
+          - http://host.docker.internal:3030/
+        labels:
+          service: forgejo
+          env: prod
+    relabel_configs:
+      - source_labels: [__address__]
+        target_label: __param_target
+      - source_labels: [__param_target]
+        target_label: instance
+      - target_label: __address__
+        replacement: blackbox-exporter:9115
+
+  # ══════════════════════════════════════════════════════════════════════
+  # External-facing HTTPS probes (SSL + reachability from outside)
+  # ══════════════════════════════════════════════════════════════════════
+  - job_name: blackbox_https
+    metrics_path: /probe
+    params:
+      module: [http_2xx]
+    static_configs:
+      - targets:
+          - https://performancewest.net
+          - https://api.performancewest.net/api/v1/status
+          - https://dev.performancewest.net
+          - https://crm.performancewest.net
+          - https://lists.performancewest.net
+          - https://analytics.performancewest.net
+          - https://monitoring.performancewest.net
+          - https://crypto.performancewest.net
+    relabel_configs:
+      - source_labels: [__address__]
+        target_label: __param_target
+      - source_labels: [__param_target]
+        target_label: instance
+      - target_label: __address__
+        replacement: blackbox-exporter:9115
+
+  # ── TCP port probes (databases, caches) ────────────────────────────
   - job_name: blackbox_tcp
     metrics_path: /probe
     params: