Full observability stack with Telegram alerting:

Components:
- Prometheus: metrics collection, 90-day retention
- Grafana: dashboards at monitoring.performancewest.net
- Alertmanager: routes alerts to Telegram bot
- node-exporter: OS metrics (CPU, RAM, disk, network)
- cAdvisor: container metrics (CPU, memory, restarts)
- postgres-exporter: PostgreSQL connection/query metrics
- nginx-exporter: request rate, 5xx errors, connections
- blackbox-exporter: HTTP/TCP endpoint probing + SSL cert checks

Alert rules:
- Service down (HTTP probe, TCP port, container missing)
- Container restart loops
- High CPU/memory/disk/load
- PostgreSQL down or high connections
- SSL cert expiring (14d warning, 3d critical)
- Slow HTTP responses, high 5xx rate

Blackbox probes all public endpoints:
performancewest.net, api, dev, crm, lists, analytics, minio, crypto, pay

Telegram alerts: critical=1h repeat, warning=6h repeat,
auto-resolve notifications

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
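The SSL expiry thresholds listed in the commit message (14d warning, 3d critical) could be expressed as Prometheus alerting rules along the following lines. This is only a sketch: the group name, alert names, `for` durations, and annotation text are assumptions rather than the rules actually shipped in this commit. The one name taken from upstream is `probe_ssl_earliest_cert_expiry`, the blackbox-exporter metric holding the certificate expiry as a Unix timestamp.

```yaml
# Hypothetical rules file; names and durations are illustrative assumptions.
groups:
  - name: ssl-expiry
    rules:
      # Warning: earliest certificate in the chain expires in under 14 days.
      - alert: SSLCertExpiringSoon
        expr: probe_ssl_earliest_cert_expiry - time() < 14 * 86400
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "TLS certificate for {{ $labels.instance }} expires in under 14 days"

      # Critical: earliest certificate in the chain expires in under 3 days.
      - alert: SSLCertExpiryCritical
        expr: probe_ssl_earliest_cert_expiry - time() < 3 * 86400
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "TLS certificate for {{ $labels.instance }} expires in under 3 days"
```

If a file like this were rendered through an Ansible template, the `{{ $labels.instance }}` placeholders would need to be protected from Jinja2 (for example with a raw block) so that Prometheus receives them unchanged.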
13 lines
497 B
YAML
---
monitoring_domain: monitoring.performancewest.net
grafana_port: 3200
prometheus_port: 9090
alertmanager_port: 9093
# Telegram bot for alerts (set in vault)
telegram_bot_token: "{{ vault_telegram_bot_token | default('') }}"
telegram_chat_id: "{{ vault_telegram_chat_id | default('') }}"
# Grafana admin credentials (set in vault)
grafana_admin_user: "{{ vault_grafana_admin_user | default('admin') }}"
grafana_admin_password: "{{ vault_grafana_admin_password | default('pw_grafana_2026') }}"
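
The `telegram_bot_token` and `telegram_chat_id` defaults above are presumably consumed by an Alertmanager configuration template. The sketch below shows one way the routing described in the commit message (critical repeats every 1h, warning every 6h, resolve notifications sent) might look. It assumes Alertmanager 0.24 or newer, which provides a native `telegram_configs` receiver. The template file name, receiver name, and `group_by` labels are hypothetical, and the real role may use a different mechanism such as a webhook bridge.

```yaml
# Hypothetical templates/alertmanager.yml.j2; file name and layout are assumptions.
route:
  receiver: telegram
  group_by: ['alertname', 'instance']
  routes:
    # Critical alerts are re-sent every hour while they keep firing.
    - matchers:
        - severity="critical"
      receiver: telegram
      repeat_interval: 1h
    # Warning alerts are re-sent every six hours.
    - matchers:
        - severity="warning"
      receiver: telegram
      repeat_interval: 6h

receivers:
  - name: telegram
    telegram_configs:
      - bot_token: "{{ telegram_bot_token }}"
        # chat_id is rendered unquoted because Alertmanager expects an integer.
        chat_id: {{ telegram_chat_id }}
        # Also notify when an alert resolves.
        send_resolved: true
```

Because both variables default to an empty string, a rendered file like this would probably not yield a usable receiver until the vault values are set.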