new-site/monitoring/blackbox.yml at 433827138b88ed2a40dbbce1bcb3ddd28a0e8691 - justin/new-site

justin a4a5500bfc Add Prometheus + Grafana + Alertmanager monitoring stack

Full observability stack with Telegram alerting:

Components:
- Prometheus: metrics collection, 90-day retention
- Grafana: dashboards at monitoring.performancewest.net
- Alertmanager: routes alerts to Telegram bot
- node-exporter: OS metrics (CPU, RAM, disk, network)
- cAdvisor: container metrics (CPU, memory, restarts)
- postgres-exporter: PostgreSQL connection/query metrics
- nginx-exporter: request rate, 5xx errors, connections
- blackbox-exporter: HTTP/TCP endpoint probing + SSL cert checks

Alert rules:
- Service down (HTTP probe, TCP port, container missing)
- Container restart loops
- High CPU/memory/disk/load
- PostgreSQL down or high connections
- SSL cert expiring (14d warning, 3d critical)
- Slow HTTP responses, high 5xx rate
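
As a hedged illustration of the SSL expiry thresholds listed above, a Prometheus rule file could look roughly like the sketch below. The group and alert names are hypothetical and the repository's actual rules are not shown here; `probe_ssl_earliest_cert_expiry` is the metric the blackbox exporter reports for HTTPS probes.

```yaml
# Sketch only: group and alert names are assumptions, not taken from the repository.
groups:
  - name: ssl-expiry
    rules:
      - alert: SSLCertExpiringSoon
        # Fires when the earliest certificate in the chain expires in under 14 days.
        expr: probe_ssl_earliest_cert_expiry - time() < 14 * 24 * 3600
        labels:
          severity: warning
      - alert: SSLCertExpiryCritical
        # Fires when fewer than 3 days remain.
        expr: probe_ssl_earliest_cert_expiry - time() < 3 * 24 * 3600
        labels:
          severity: critical
```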

Blackbox probes all public endpoints:
  performancewest.net, api, dev, crm, lists, analytics,
  minio, crypto, pay

Telegram alerts: critical=1h repeat, warning=6h repeat,
  auto-resolve notifications
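
The repeat intervals above could be expressed in Alertmanager along the lines of this sketch. The receiver name, bot token, and chat ID are placeholders, and the real alertmanager.yml may be organized differently; `send_resolved: true` is what produces the auto-resolve notifications.

```yaml
# Sketch only: receiver name, bot_token and chat_id are placeholders.
route:
  receiver: telegram
  group_by: [alertname]
  routes:
    - matchers:
        - severity="critical"
      repeat_interval: 1h   # critical alerts repeat hourly
    - matchers:
        - severity="warning"
      repeat_interval: 6h   # warnings repeat every six hours

receivers:
  - name: telegram
    telegram_configs:
      - bot_token: "<telegram-bot-token>"
        chat_id: 0            # replace with the numeric chat ID
        send_resolved: true   # notify when an alert resolves
```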

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-01 02:08:39 -05:00

15 lines

323 B

YAML

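modules:
  # HTTP probe: succeeds when the endpoint answers with an accepted status code.
  http_2xx:
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200, 301, 302]
      # Redirects are followed, so the status check applies to the final response.
      follow_redirects: true
      preferred_ip_protocol: ip4
      tls_config:
        # Keep certificate verification on so invalid or expired certs fail the probe.
        insecure_skip_verify: false
  # TCP probe: succeeds when a connection to the target port can be opened.
  tcp_connect:
    prober: tcp
    timeout: 5s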
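
For context, a module such as `http_2xx` is normally selected from the Prometheus side through the `module` URL parameter. The fragment below is a minimal sketch of that wiring, following the relabeling pattern from the blackbox exporter documentation. The job name and the exporter address `blackbox-exporter:9115` are assumptions (9115 is the exporter's default port), and the `https://` scheme on the target is also assumed; only the hostname comes from the commit message.

```yaml
# prometheus.yml fragment (sketch; job name and exporter address are assumptions)
scrape_configs:
  - job_name: blackbox-http
    metrics_path: /probe
    params:
      module: [http_2xx]          # module defined in blackbox.yml
    static_configs:
      - targets:
          - https://performancewest.net
    relabel_configs:
      # Pass the listed target to the exporter as the ?target= parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Send the scrape itself to the blackbox exporter.
      - target_label: __address__
        replacement: blackbox-exporter:9115
```

A second job using `module: [tcp_connect]` with `host:port` targets would follow the same pattern.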