new-site

Author	SHA1	Message	Date
justin	8e5590b492	mail: DMARC aggregate-report parser + dedicated dmarc@ mailbox ingestion Tool 2 of the deliverability monitoring pair (Tool 1 = mail_reputation_monitor). DMARC rua reports from dozens of operators (Google, Yahoo, Comcast, Cox, Bell, Mimecast, Cisco ESA, GMX, mail.com, ...) were landing in ops@ (dmarc@ was a DL), burying real mail and never parsed. Now ingested + queryable: - dmarc@performancewest.net converted DL -> dedicated Carbonio mailbox; isolated IMAP creds in server .env, surfaced to workers in docker-compose.yml (mirrors OPS_IMAP_*). 29 historical reports moved ops@ -> dmarc@ via IMAP. - scripts/dmarc_report_parser.py: IMAP fetch unseen -> decompress .gz/.zip/.xml (namespace-agnostic: classic + urn:ietf:params:xml:ns:dmarc-2.0 GMX/mail.com) -> parse aggregate XML -> upsert dmarc_report (keyed (org_name,report_id), no-op on re-parse) + dmarc_record per source IP. dmarc_pass = dkim_aligned OR spf_aligned. Marks \Seen. --dry-run/--all/--alert (7d per-IP summary + Telegram if one of OUR IPs <95% pass, or EXTERNAL IP sends >=20 failing msgs as us = spoofing under p=reject). psycopg2 imported lazily so --dry-run runs without the driver. - api/migrations/102_dmarc_aggregate.sql: dmarc_report + dmarc_record tables. - infra/cron/pw-dmarc-parser: 06:20 UTC daily --alert (after reputation, before scrub). - docs/deliverability.md: DMARC section DONE; query examples. Verified: dry-run --all parses all 28 reports (1 non-report test probe), 0 unknown after the namespace fix.	2026-06-19 08:50:20 -05:00
justin	a32a3b05a0	email: add plaintext MIME part + stable Message-ID hostname Two deliverability hardening fixes from the email audit: 1. Plaintext (altbody): all campaigns were HTML-only. Listmonk only emits multipart/alternative when altbody is set, and HTML-only bulk mail is a spam-score signal. New scripts/_email_plaintext.py renders a readable text/plain part from the HTML body (dependency-free; preserves Listmonk {{ .Subscriber }}/{{ UnsubscribeURL }} template tags, turns links into 'text (url)'). Wired into the trucking builder (and thus UCR + IFTA, which reuse create_and_schedule_campaign) and the healthcare builder. 2. Stable container hostname: Listmonk derived its Message-ID from the random docker container id -> @localhost.localdomain (spam-score signal). Pin both listmonk + listmonk-hc hostname to perfwest.performancewest.net, matching Listmonk's SMTP hello_hostname. Part of the email-deliverability incident hardening.	2026-06-17 20:09:02 -05:00
justin	557b45f65d	fix(erpnext): self-heal outgoing Email Account password from SMTP_* env Root cause of recurring 'Password not found for Email Account Performance West Outgoing': the account was shipped as a fixture with awaiting_password=1 and no password. Email Account SMTP passwords are encrypted per-site and cannot live in a fixture, so every `bench migrate` reimported the fixture and re-broke outgoing mail (login notifications, password resets, welcome emails). - Remove the Email Account fixture (it cannot carry the encrypted secret). - Add email_account_sync.sync_outgoing_password: idempotent, exception-safe upsert that reconciles the account + password from SMTP_* env and clears awaiting_password. - Wire it to after_migrate (repairs at end of every deploy/migrate, right after fixtures import) and the daily scheduler (heals out-of-band restore/restart drift). - Pass SMTP_* into the erpnext + erpnext-scheduler containers so the sync has the secret (they previously had no SMTP env).	2026-06-17 09:48:28 -05:00
justin	08d5132459	feat(email): add listmonk-hc second instance for the healthcare HOT stream Isolated from the trucking listmonk: own DB (listmonk_hc), own uploads volume, own sliding-window cap. Configured (on dev) with 3 SMTP servers pointing at the host Postfix hc submission ports 2526/2527/2528 so it round-robins the dedicated hc IPs .107/.108/.109. Reaches the host MTA via the docker bridge gateway. Note: the listmonk image needs an explicit one-time '--install --idempotent --yes' against listmonk_hc (env vars alone do not auto-install this image tag). Validated on dev: listmonk-hc container (172.19.0.16) -> host :2526 (hcsubmit107) -> hcout1 (.107) -> real gmail MX; both listmonk instances Up.	2026-06-05 19:18:35 -05:00
justin	a79d6b1906	feat(healthcare): add gost proxy-relay so Chromium can use the residential proxy Chromium rejects authenticated SOCKS5 ('Browser does not support socks5 proxy authentication'). Add a gost (ginuerzh/gost:2.11.5) 'proxy-relay' sidecar that listens unauthenticated on socks5://proxy-relay:11080 and forwards to the authenticated residential upstream (HEALTHCARE_PROXY_UPSTREAM_URL). Workers point Playwright at the relay via HEALTHCARE_PROXY_URL=socks5://proxy-relay:11080. env template: split into HEALTHCARE_PROXY_UPSTREAM_URL (authenticated, password percent-encoded so '#' -> %23) and HEALTHCARE_PROXY_URL (the relay address). Validated end-to-end on dev: workers Chromium -> proxy-relay -> residential egress IP 76.228.206.147; NPPES + PECOS both HTTP 200.	2026-06-05 18:39:26 -05:00
justin	17318f6e7d	feat(healthcare): route NPPES/PECOS Playwright flows through residential SOCKS proxy CMS healthcare portals (NPPES, PECOS, I&A) block datacenter IPs, so the healthcare browser automation needs to egress via the residential proxy on hg409y7ez04.sn.mynetname.net (username 'performancewest'). - undetected_browser: use_proxy now accepts an env-var name, so callers can select a domain-specific proxy. _proxy_config(proxy_env) reads it and falls back to UNDETECTED_PROXY_URL. Healthcare uses 'HEALTHCARE_PROXY_URL'. - probe_npi_undetected: launches with use_proxy='HEALTHCARE_PROXY_URL' when set. - npi_provider: documents that the (future) automated NPPES/PECOS flows must use the healthcare proxy. - Plumb HEALTHCARE_PROXY_URL (+ UNDETECTED_PROXY_URL fallback) through the ansible env template and docker-compose workers env. The credential itself is NOT in the repo. Set the full URL in the ansible vault as vault_healthcare_proxy_url: socks5://performancewest:<password>@hg409y7ez04.sn.mynetname.net:<port> Verified parsing + Playwright proxy-dict wiring with a unit test.	2026-06-05 14:36:01 -05:00
justin	c027d49f43	Fix trucking campaign cron send date	2026-06-04 03:19:35 -05:00
justin	5c35140a22	Configure trucking deficiency campaign cron env	2026-06-03 23:04:41 -05:00
justin	668fc6783b	compose: give ERPNext CUSTOMER_JWT_SECRET + DATABASE_URL (fix portal drift) The erpnext service was missing both env vars that the portal needs: - CUSTOMER_JWT_SECRET: verifies /set-password magic-link tokens signed by the API. Without it, the set-password page resolved an empty/placeholder secret and showed 'Link invalid' for every customer onboarding link. - DATABASE_URL: lets www/orders.py read compliance_orders from Postgres for the portal's Compliance section. Both were present on api/workers but never wired to erpnext -> drift. Now the single ERPNext portal can actually verify invites and show compliance orders.	2026-06-02 23:02:58 -05:00
justin	c9881868dd	Add Telegram notification on every new paid order Sends to the monitoring bot immediately when payment is confirmed: - Customer name and email - Service/slug ordered - Total amount (includes all fees: service + formation + state + addons) - Payment method - Order number and type Fire-and-forget — never blocks the payment flow. Requires TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID env vars on API container. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-04 07:32:42 -05:00
justin	15f5c267e7	Fix dashboard stale series + enable Prometheus admin API Dashboard queries now use max() to pick UP value when old stale probe targets coexist with new ones. Prometheus admin API enabled for future TSDB cleanup of stale series. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-01 03:43:42 -05:00
justin	b190bcef92	Fix ERPNext and Forgejo probes - ERPNext: custom blackbox module with Host: performancewest.net header (ERPNext multitenancy requires site name in Host for routing) - Forgejo: add extra_hosts to blackbox-exporter so it can resolve host.docker.internal to reach forgejo on port 3000 - Blackbox http_erpnext module: sets Host header, expects 200 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-01 03:35:45 -05:00
justin	0a31313956	Fix nginx-exporter: back to bridge network with host.docker.internal host network mode prevented Prometheus from reaching the exporter. Switched back to bridge with extra_hosts + explicit port mapping. Added timeout flag to prevent hanging on stub_status fetch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-01 03:21:27 -05:00
justin	433827138b	Fix nginx-exporter: use host network mode for direct stub_status access nginx-exporter couldn't reach host nginx via host.docker.internal (connection timeout). Switch to network_mode: host so it can access 127.0.0.1:8888 directly. Prometheus scrapes via host.docker.internal with extra_hosts mapping. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-01 03:19:57 -05:00
justin	27cc925c4d	Fix nginx-exporter port and add alertmanager scrape target - nginx stub_status moved to port 8888 (port 80 was being caught by other server blocks and returning 301) - nginx-exporter updated to scrape :8888 - Added alertmanager scrape job to Prometheus config (was missing, so alertmanager dashboard had no data) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-01 03:17:31 -05:00
justin	b38b1af872	Disable Grafana brute force lockout during initial setup Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-01 03:11:30 -05:00
justin	a4a5500bfc	Add Prometheus + Grafana + Alertmanager monitoring stack Full observability stack with Telegram alerting: Components: - Prometheus: metrics collection, 90-day retention - Grafana: dashboards at monitoring.performancewest.net - Alertmanager: routes alerts to Telegram bot - node-exporter: OS metrics (CPU, RAM, disk, network) - cAdvisor: container metrics (CPU, memory, restarts) - postgres-exporter: PostgreSQL connection/query metrics - nginx-exporter: request rate, 5xx errors, connections - blackbox-exporter: HTTP/TCP endpoint probing + SSL cert checks Alert rules: - Service down (HTTP probe, TCP port, container missing) - Container restart loops - High CPU/memory/disk/load - PostgreSQL down or high connections - SSL cert expiring (14d warning, 3d critical) - Slow HTTP responses, high 5xx rate Blackbox probes all public endpoints: performancewest.net, api, dev, crm, lists, analytics, minio, crypto, pay Telegram alerts: critical=1h repeat, warning=6h repeat, auto-resolve notifications Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-01 02:08:39 -05:00
justin	f8cd37ac8c	Initial commit — Performance West telecom compliance platform Includes: API (Express/TypeScript), Astro site, Python workers, document generators, FCC compliance tools, Canada CRTC formation, Ansible infrastructure, and deployment scripts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-27 06:54:22 -05:00

18 commits
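
The DMARC parsing step described in commit 8e5590b492 (namespace-agnostic XML handling, the (org_name, report_id) report key, and dmarc_pass = dkim_aligned OR spf_aligned) can be sketched roughly as below. This is a minimal illustration, not the contents of scripts/dmarc_report_parser.py: the function names, the SAMPLE report, and the returned dict layout are assumptions made for the example, and the IMAP fetch, decompression, and database upsert are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample report using the dmarc-2.0 namespace mentioned in the commit.
SAMPLE = """<feedback xmlns="urn:ietf:params:xml:ns:dmarc-2.0">
  <report_metadata>
    <org_name>example-operator</org_name>
    <report_id>r-1</report_id>
  </report_metadata>
  <record>
    <row>
      <source_ip>192.0.2.10</source_ip>
      <count>5</count>
      <policy_evaluated><dkim>pass</dkim><spf>fail</spf></policy_evaluated>
    </row>
  </record>
  <record>
    <row>
      <source_ip>198.51.100.7</source_ip>
      <count>20</count>
      <policy_evaluated><dkim>fail</dkim><spf>fail</spf></policy_evaluated>
    </row>
  </record>
</feedback>"""


def strip_namespaces(root):
    """Drop any '{namespace}' prefix so classic and namespaced reports parse alike."""
    for el in root.iter():
        if isinstance(el.tag, str) and "}" in el.tag:
            el.tag = el.tag.split("}", 1)[1]
    return root


def parse_aggregate(xml_text):
    """Return ((org_name, report_id), [one dict per source-IP row])."""
    root = strip_namespaces(ET.fromstring(xml_text))
    report_key = (
        root.findtext("report_metadata/org_name"),
        root.findtext("report_metadata/report_id"),
    )
    records = []
    for row in root.iter("row"):
        dkim_aligned = row.findtext("policy_evaluated/dkim") == "pass"
        spf_aligned = row.findtext("policy_evaluated/spf") == "pass"
        records.append({
            "source_ip": row.findtext("source_ip"),
            "count": int(row.findtext("count") or 0),
            # DMARC passes when either aligned mechanism passes.
            "dmarc_pass": dkim_aligned or spf_aligned,
        })
    return report_key, records


report_key, records = parse_aggregate(SAMPLE)
```

Because the report key is (org_name, report_id), a caller could use it as the conflict target of an upsert so that re-parsing the same report is a no-op, as the commit message describes.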