new-site/infra/ansible
justin 78c04b8bc3 Add Playwright failure monitoring: Telegram alerts + screenshots + health check
When any Playwright submission fails (selector not found, timeout, etc.):
1. Full-page screenshot captured and uploaded to MinIO
2. Telegram alert sent immediately with error details + screenshot link
3. Email alert to ops with same info
4. Admin todo includes screenshot MinIO path for debugging
5. Client order stays pending for manual completion
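
The failure flow above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `handle_submission_failure`, `FailureReport`, the `playwright-failures/` key prefix, and the injected `capture_screenshot`, `upload`, and `notifiers` callables are all hypothetical names standing in for the real Playwright, MinIO, Telegram, and email integrations.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FailureReport:
    """What the admin todo would need: the error plus the screenshot path."""
    order_id: str
    error: str
    screenshot_path: str
    order_status: str = "pending"  # step 5: the order is never auto-failed
    notified: List[str] = field(default_factory=list)


def handle_submission_failure(
    order_id: str,
    error: Exception,
    capture_screenshot: Callable[[], bytes],   # assumed: returns full-page PNG bytes
    upload: Callable[[str, bytes], str],       # assumed: stores bytes, returns the object path
    notifiers: Dict[str, Callable[[str], None]],  # assumed: e.g. {"telegram": ..., "email": ...}
) -> FailureReport:
    # Step 1: capture the screenshot and upload it under a hypothetical key.
    png = capture_screenshot()
    path = upload(f"playwright-failures/{order_id}.png", png)

    report = FailureReport(order_id=order_id, error=str(error), screenshot_path=path)
    message = (
        f"Playwright submission failed for order {order_id}: {error}\n"
        f"Screenshot: {path}"
    )

    # Steps 2-3: send the same message on every channel. A failing channel
    # is skipped so that it cannot block the remaining ones.
    for name, send in notifiers.items():
        try:
            send(message)
            report.notified.append(name)
        except Exception:
            continue

    # Steps 4-5: the caller attaches report.screenshot_path to the admin todo
    # and leaves the order in report.order_status ("pending").
    return report
```

Passing the integrations in as callables keeps the sketch testable without a browser or network access; the real handlers may wire these together differently.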

Proactive selector health check (daily 7am CT cron):
- Navigates to each portal (FCC RMD, USAC E-File, FCC CPNI/ECFS)
- Verifies all critical selectors are still present in the DOM
- If selectors are missing (UI changed): alerts via Telegram + email
  BEFORE any real client order fails
- Reports which service slugs are affected
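
A rough sketch of the health-check logic described above, under stated assumptions: the portal table layout, the `find_missing_selectors` name, and the `is_present` probe are hypothetical and do not come from this commit. The probe is injected so that the comparison logic can run without a browser.

```python
from typing import Callable, Dict, List


def find_missing_selectors(
    portals: List[Dict],                      # assumed shape: name, url, selectors, slugs
    is_present: Callable[[str, str], bool],   # assumed probe: (url, selector) -> found in DOM?
) -> List[Dict]:
    """Return one finding per portal that has at least one missing selector."""
    findings = []
    for portal in portals:
        missing = [
            selector
            for selector in portal["selectors"]
            if not is_present(portal["url"], selector)
        ]
        if missing:
            findings.append(
                {
                    "portal": portal["name"],
                    "missing": missing,
                    # Service slugs that depend on this portal, for the alert text.
                    "slugs": portal["slugs"],
                }
            )
    return findings


def format_alert(findings: List[Dict]) -> str:
    """Build a plain-text alert body; an empty string means nothing to report."""
    lines = []
    for finding in findings:
        lines.append(
            f"{finding['portal']}: missing {', '.join(finding['missing'])} "
            f"(affects: {', '.join(finding['slugs'])})"
        )
    return "\n".join(lines)
```

With Playwright's Python API, `is_present` could plausibly be built on `page.goto(url)` followed by `page.locator(selector).count() > 0`. A non-empty result from `format_alert` would then go out through the same Telegram and email channels.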

Integrated into:
- RMD filing handler (fccprod.servicenowservices.com)
- Form 499-A handler (forms.universalservice.org)
- Form 499-Q handler (already had error handling)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 02:44:02 -05:00
Name         Last commit                                                                       Date
inventory    Add Prometheus + Grafana + Alertmanager monitoring stack                          2026-05-01 02:08:39 -05:00
playbooks    Add Prometheus + Grafana + Alertmanager monitoring stack                          2026-05-01 02:08:39 -05:00
roles        Add Playwright failure monitoring: Telegram alerts + screenshots + health check   2026-05-04 02:44:02 -05:00
ansible.cfg  Add Prometheus + Grafana + Alertmanager monitoring stack                          2026-05-01 02:08:39 -05:00