First live ingest (28 reports) showed our warmup rotation pool (.91-.109, out0x)
mislabeled EXTERNAL because OUR_IPS only listed 4 specific IPs -- every one was
100% DMARC-passing, clearly ours, and would have generated false spoofing alerts.
Replace the literal-IP set with an ipaddress subnet check on 207.174.124.0/24
(our whole block). The only genuinely-external failing sender is 35.174.145.124
(AWS, 32 msgs spoofing us, SPF-fail/no-DKIM, all correctly rejected by p=reject) --
exactly the signal the --alert path is meant to surface.
Tool 2 of the deliverability monitoring pair (Tool 1 = mail_reputation_monitor).
DMARC rua reports from dozens of operators (Google, Yahoo, Comcast, Cox, Bell,
Mimecast, Cisco ESA, GMX, mail.com, ...) were landing in ops@ (dmarc@ was a DL),
burying real mail and never parsed. Now ingested + queryable:
- dmarc@performancewest.net converted DL -> dedicated Carbonio mailbox; isolated
IMAP creds in server .env, surfaced to workers in docker-compose.yml (mirrors
OPS_IMAP_*). 29 historical reports moved ops@ -> dmarc@ via IMAP.
- scripts/dmarc_report_parser.py: IMAP fetch unseen -> decompress .gz/.zip/.xml
(namespace-agnostic: classic + urn:ietf:params:xml:ns:dmarc-2.0 GMX/mail.com) ->
parse aggregate XML -> upsert dmarc_report (keyed (org_name,report_id), no-op on
re-parse) + dmarc_record per source IP. dmarc_pass = dkim_aligned OR spf_aligned.
Marks \Seen. --dry-run/--all/--alert (7d per-IP summary + Telegram if one of OUR
IPs <95% pass, or EXTERNAL IP sends >=20 failing msgs as us = spoofing under
p=reject). psycopg2 imported lazily so --dry-run runs without the driver.
- api/migrations/102_dmarc_aggregate.sql: dmarc_report + dmarc_record tables.
- infra/cron/pw-dmarc-parser: 06:20 UTC daily --alert (after reputation, before scrub).
- docs/deliverability.md: DMARC section DONE; query examples.
Verified: dry-run --all parses all 28 reports (1 non-report test probe), 0 unknown
after the namespace fix.
Runs mail_reputation_monitor --alert at 06:10 UTC, piping the day's postfix log
(sudo cat, same pattern as pw-warmup-tg-alert) into the DB-connected workers
container. Builds the daily SNDS-equivalent reputation trend and Telegram-alerts
on operator regressions. Installed to /etc/cron.d/pw-mail-reputation.
Adds scripts/mail_reputation_monitor.py + migration 101 (mail_reputation_daily).
Sender reputation is judged by the RECEIVING operator (Microsoft/Google/Yahoo/
Proofpoint), and the provider portals (SNDS/Postmaster/CFL) need a login and lag
24-48h. Our postfix logs already carry the ground truth in real time: every send
records the receiving host + SMTP response, and the response classifies WHY:
250 -> accepted
451 4.7.500 -> throttled (Microsoft rate-limiting a cold IP)
550 5.7.x -> reject_reputation (spam/reputation)
550 5.1.1/5.4.1-> reject_recipient (dead mailbox / access denied = list hygiene)
550 ...SPAM -> reject_content (SpamAssassin)
The parser classifies each egress delivery (out0x/hcout/relay) by (sending_ip,
receiver, outcome, reason_code) and upserts ONE daily aggregate row per bucket
(idempotent ON CONFLICT), so a nightly cron over the rotated log gives a queryable
trend without re-parse double-counting. --alert prints a per-operator summary and
Telegram-alerts on regressions (>=10% reputation rejects, or Microsoft >=70%
throttled). Reads stdin ("-") so the host-owned /var/log/mail.log can be piped
into the DB-connected workers container.
Motivation: 2026-06-19 audit found ~80% of Microsoft sends were getting 451 4.7.500
throttles on the warming IPs -- this makes that trend visible as reputation recovers.
performancewest.net + send.performancewest.net both show Enrolled in the Yahoo
Sender Hub, reporting email fbl@. All three FBLs (Google Postmaster, MS SNDS+JMRP,
Yahoo CFL) now complete.
Added yahoo-verification-key TXT records via Hestia for performancewest.net
(apex) and send.performancewest.net; both propagated to all HE.net slaves +
public resolvers. Ready to click Verify in the Yahoo CFL form, complaint dest fbl@.
SNDS access requested/granted for 207.174.124.94 + .107; JMRP feeds registered
with complaint dest fbl@. Section marked complete. SNDS data populates in ~24-48h.
Note JMRP delivers ARF complaints to the signed-in MS account's email, not
automatically to fbl@; set a forward if that account isn't fbl@performancewest.net.
Use the legacy sendersupport.olc.protection.outlook.com/snds/ (308-redirects) or
the direct substrate.office.com/ip-domain-management-snds/SNDS app URL. Flag that
snds.microsoft.com has no DNS.
SNDS moved off sendersupport.olc.protection.outlook.com to
substrate.office.com/ip-domain-management-snds/. The old /snds/ and /pm/ links
308-redirect there. Document that the footer/help links going to microsoft.com
are boilerplate (not broken), and that you must Log in FIRST or the Request
Access / JMRP links bounce to login.microsoftonline.com (expected, not dead).
Add working direct links + canonical https://snds.microsoft.com entry point.
Records the Apple/iCloud addition, the builder-vs-list-based distinction, the
scrub_listmonk_consumer reconciliation tool + daily cron, and the 2026-06-19
first-run numbers (7,943 trucking + 21 HC stale consumer subs blocklisted).
Runs scrub_listmonk_consumer against both listmonk and listmonk_hc at 06:30 UTC,
before the campaign builders, so any ENABLED subscriber matching the authoritative
exclusion list is blocklisted retroactively. Keeps list-based campaigns (FCC
Direct Contacts, CRTC/USF, etc.) from leaking onto consumer mailboxes after a new
domain (e.g. Apple/iCloud) is added to the exclusion list. Installed to
/etc/cron.d/pw-listmonk-scrub on the host.
The fmcsa campaign builders already exclude gmail/yahoo/microsoft/etc. from NEW
audience selections, but two reputation leaks remained on the LIST-BASED side:
1. iCloud/Apple gap. icloud.com/me.com/mac.com were never in the exclusion set.
A 2026-06 Listmonk audit found 1,321 ENABLED iCloud subscribers on list 3
("FCC Carriers - Direct Contacts") -- the single largest enabled-consumer
bucket -- being cold-blasted with no exclusion at all. Add APPLE_CONSUMER_DOMAINS.
2. Stale already-imported consumer subs. List-based campaigns (e.g. the running
CRTC/USF blast on list 3) keep hitting consumer addresses imported BEFORE the
relevant domain joined the exclusion list. gmail.com was still the #1 bounce
domain via that campaign even though new selections exclude it. Add
scrub_listmonk_consumer.py: reconciles the live Listmonk subscriber table
against the authoritative exclusion list and blocklists any ENABLED subscriber
whose address is_blocked(). Idempotent; re-run whenever the exclusion grows so
it applies retroactively. Uses the same 'blocklisted' terminal state as the
bounce handler, so contacts are excluded from all current/future campaigns
without deleting history. Supports --dry-run and both listmonk / listmonk_hc.
Created postmaster@/abuse@/fbl@/dmarc@ as Carbonio DLs -> ops@ (they previously
REJECTED 5.1.1, which would have blocked SNDS verification AND was silently
dropping all DMARC aggregate reports). Verified accept-at-MX + delivered E2E.
Reframe Microsoft as the #1 monitoring priority (85% of audience), Yahoo as
lowest (<1%); add Carbonio admin access note; note DMARC parser now worth building.
- infra/ansible/roles/mail: refactor OpenDKIM to support multiple signing domains
via opendkim_signing_domains list (root + send.performancewest.net). Loops
keygen/ownership/keytable/signingtable so the live two-domain setup is
reproducible from ansible.
- infra/ansible group_vars: add bulk_mail_subdomain + campaign_from_* +
campaign_reply_to documentation vars (map to CAMPAIGN_FROM / HC_CAMPAIGN_FROM
env read by the builder scripts). smtp_from (transactional) stays on root.
- docs/deliverability.md: rewrite TL;DR with the carrierone-vs-performancewest
A/B proof (same server/IPs, different From domain -> Inbox vs Junk) and the
~85% Microsoft / 14% Google / <1% Yahoo audience mix; add the bulk-subdomain
section, SPF trim, rehab-disabled, and the Hestia DNS automation runbook.
Isolates bulk sending reputation onto a dedicated subdomain so the root domain
stays clean for transactional/verification mail (and recovers faster). Replies
still go to the root domain via Reply-To, so the customer-facing reply experience
is unchanged.
- build_trucking_campaigns.py: add env-overridable FROM_EMAIL
(noreply@send.performancewest.net); use it for both scheduled + test sends
instead of inheriting base["from_email"] from the DB base campaign.
- build_healthcare_campaigns_cron.py: FROM_EMAIL ->
compliance@send.performancewest.net (env-overridable).
- bounce-watcher.sh / hc-bounce-watcher.sh: track the new subdomain envelope
sender (keep legacy root-domain sender so the pre-cutover queue still drains;
HC also tracks by hcout transport regardless of sender).
Infra already live (separate, non-code): subdomain DNS (A/MX/SPF/DKIM
selector=send/DMARC p=reject) on the Hestia master, OpenDKIM signs
d=send.performancewest.net (verified end-to-end), egress .94/.107. Root SPF
trimmed to the real IPs; pointless IP-rehab cron disabled.
DNS is fully automatable: Hestia (cp.carrierone.com, zone owner = justin user)
is the DNS master, HE.net are slaves. Added google-site-verification TXT (id
14464) via v-add-dns-record as root; verified resolving on public resolvers +
HE.net slaves. Owner just clicks Verify in the Postmaster console. Documents the
v-add-dns-record path for future records.
Documents the 2026-06-18 reputation incident (snowshoe -> Gmail domain-rep
blocks, RBLs all clean), the single-IP-per-stream consolidation, and
fill-in-the-blanks setup steps for Google Postmaster Tools, Microsoft SNDS/JMRP,
and Yahoo CFL (all require owner account login + HE.net DNS). Plus ongoing
hygiene + how to re-expand IPs once reputation recovers.
The multi-IP rotation was built to spread risk while DKIM was broken (fixed
2026-06-17) and after the May 30-31 over-volume blast. With DKIM signing
correctly, spreading ~3k trucking msgs/day across 12 IPs (.94-.105) + ~1.2k
healthcare msgs/day across 3 IPs (.107-.109) gave each IP far too little
per-receiver volume to build reputation. Gmail/Outlook read it as snowshoe spam
and reputation-blocked ~200 msgs/day ("very low reputation of the sending
domain") -> 0 human clicks, 0 sales.
Consolidate to ONE IP per stream so each accrues real reputation:
- trucking: pw-mta-warmup ALL=(out05) -> randmap collapses to {out05:} = .94
- healthcare: listmonk-hc SMTP servers 2/3 (ports 2527/2528 -> .108/.109)
disabled in DB; all HC mail now egresses .107 (hcmta01). [applied live]
Applied live: transport_maps now randmap:{out05:}; listmonk-hc restarted.
To re-expand later: add transports back to ALL + re-enable the HC SMTP servers.
Checkout half proven against live Stripe (dry-run session created + expired,
zero charge), webhook subscription-id extraction + worker renewal fulfillment
covered by unit tests (31 + 13). Remaining gap: full E2E with a Stripe test
clock, which needs test-mode keys in the server .env (currently unset).
Runs the real _BaseNPIHandler.handle() with _create_todo monkeypatched (no DB /
ERPNext / email side effects) and asserts:
- first OIG/SAM screening has no [Monthly cycle] prefix / RECURRING banner
- a recurring_cycle order gets the [Monthly cycle] title prefix, the
"RECURRING MONTHLY CYCLE" banner, the invoice id, and the re-run-against-
CURRENT-data + issue-NEW-certificate instructions
- recurring_cycle works with and without an invoice id
- the bundle handler's first run is not flagged recurring
Verified passing both locally and inside the deployed workers container.
The 3 subscription-lifecycle events (invoice.paid, invoice.payment_failed,
customer.subscription.deleted) are now enabled on the live endpoint
we_1THBjyB46qMvF2jnYyN8IfkK (6 events total). Documents the unpinned-endpoint
api_version caveat (account default 2024-12-18.acacia, not the SDK's dahlia) and
why invoiceSubscriptionId() must read both invoice shapes. Notes that
charge.dispute.created / balance.available are handled in code but not yet
enabled on the endpoint.
The live Stripe webhook endpoint has NO pinned api_version, so it follows the
account default (currently 2024-12-18.acacia), which delivers the subscription
link as the top-level invoice.subscription. The code only read the new
2026-03-25.dahlia shape (invoice.parent.subscription_details.subscription), so
recurring renewal/payment-failed events would have returned a null subscription
id and silently failed to fulfill once the events were enabled.
invoiceSubscriptionId() now reads the modern shape first, then falls back to the
legacy top-level field. All other invoice fields used by the handlers
(amount_due, attempt_count, hosted_invoice_url, id) are stable across both
versions. +5 tests (legacy string/object, modern-preferred-over-legacy).
Convert OIG/SAM from one-time $299/yr to recurring $79/month (card+ACH only) -
the first real recurring-billing product in the system. Exclusion screening is
a *monthly* federal obligation, so recurring monitoring fits the requirement and
is the biggest valuation lever (vs a one-time annual run).
Catalog (single source of truth):
- service-catalog.ts: add billing_interval + allowed_methods to ComplianceService;
oig-sam-screening -> 7900c, billing_interval:"month", allowed_methods:[card,ach],
name "(Monthly Monitoring)".
- gen-service-catalog.py + check-service-catalog-drift.py: carry/guard the two new
fields; regenerate site catalog.
Checkout (api/src/routes/checkout.ts):
- mode:"subscription" with recurring price_data when billing_interval is set;
surcharge absorbed for recurring (clean $79/mo); server-side METHOD_NOT_ALLOWED
re-validation against allowed_methods.
- ensureColumns + migration 100: compliance_orders.stripe_subscription_id,
bundle_upsell_sent_at (+ subscription index).
Webhooks (api/src/routes/webhooks.ts):
- record stripe_subscription_id on checkout.session.completed (subscription mode).
- invoice.paid (subscription_cycle only) -> re-dispatch screening for the cycle;
invoice.payment_failed -> admin alert + first-failure customer nudge;
customer.subscription.deleted -> mark order cancelled. (API 2026-03-25 moved the
subscription link to invoice.parent.subscription_details.subscription.)
Fulfillment:
- job_server.py: pass recurring_cycle/invoice_id into the order.
- npi_provider.py: OIG handler labels renewal cycles "[Monthly cycle]" + re-screen
note; bundle action runs only the FIRST screening + flags the $79/mo upsell.
Bundle land-and-expand:
- Provider Compliance Bundle now includes only the first OIG/SAM screening (was
giving away $948/yr of monitoring inside an $899 bundle).
- new worker scripts/workers/bundle_upsell.py (+ pw-bundle-upsell timer): ~3 weeks
after a paid bundle, emails the customer to continue $79/mo monitoring; dedup via
bundle_upsell_sent_at; skips customers who already have an OIG/SAM order.
Surfaces updated to $79/mo: PaymentStep (filters methods, "Billed every month,
cancel anytime"), order pages, healthcare index, npi-compliance-check tool (also
fixed stale $699 bundle drift -> $899), hc_oig_screening + hc_compliance_bundle
emails.
Docs: billing.md gains a "Stripe-native Subscriptions" section + a reality-check
banner (Adyen/ERPNext-gateway model documented there is NOT live; Stripe is the
real rail). Fixed run-migrations.yml container name bug
(performancewest-postgres-1 -> performancewest-api-postgres-1, overridable).
Tests: api/tests/recurring-subscription.test.ts (28 assertions) covers catalog
gating, method validation, surcharge suppression, recurring line-item build,
invoiceSubscriptionId extraction, renewal-cycle gating. tsc clean; site build
clean; catalog drift OK.
Manual deploy step: enable invoice.paid, invoice.payment_failed,
customer.subscription.deleted on the Stripe webhook endpoint.
Email security gateways (Microsoft Defender Safe Links / ATP, Proofpoint,
Mimecast, Barracuda, etc.) auto-fetch and often render every link in a
campaign email to scan for malware. The advanced ones drive a real headless
browser, execute JS, and fire Umami pageviews/clicks that masquerade as human
visits -- inflating campaign click-through.
New site/public/js/pw-bot-filter.js queries multiple real-browser signals and
gates Umami via its official data-before-send hook (umamiBeforeSend), dropping
all events when the visitor is a bot. Signals (from empirical chromium probing):
decisive: navigator.webdriver, HeadlessChrome UA, known scanner UAs, zero/
collapsed screen|viewport|outer geometry, window LARGER than the
physical screen (impossible on real HW; uses outerW/H so page zoom
does not false-positive), software GPU rasterizer (SwiftShader/
llvmpipe/swrast via WebGL UNMASKED_RENDERER), zero logical CPUs.
soft (>=2 to trip): tiny screen, inner>screen, low color depth, empty
navigator.languages, no input device (no fine/coarse pointer + no
hover + 0 touch), no WebGL on a desktop UA.
Designed to FAIL OPEN: only strong/corroborated evidence suppresses, so real
visitors (incl. zoomed, privacy-tooled, remote-desktop, kiosk) still count.
Wired before the Umami tag in Base.astro (Astro pages) and all 86 static
public/**/*.html pages; both load with defer so order is guaranteed and the
hook is defined before Umami reads it.
Tested end-to-end with chromium (site/tests/bot-filter.test.sh, 4/4):
default headless-new, spoofed-Windows-UA + normal 1366x768 window, and
spoofed-UA + 1x1 window are all caught; hook returns null to drop the event.
Replaces the panic-era burner-domain verification plan with an in-house
automatic catch-all rollout in the trucking/IFTA/UCR builders. Root-cause
classification of the 75k pre-DKIM-fix bounces showed ~55% were reputation/
auth (now fixed by DKIM signing) and only ~29% genuinely-dead mailboxes;
catch-all domains accept at RCPT time so they do not user-unknown bounce at
send, making a controlled in-house bleed safer than warming a separate burner.
catch_all_enabled() adds catch-all results only when warmup_day >=
CAMPAIGN_CATCH_ALL_MIN_DAY (21) AND the recent 2-day live bounce rate is below
CAMPAIGN_CATCH_ALL_MAX_BOUNCE_PCT (8%) on a >=300-sent sample; auto-reverts to
the clean smtp_valid/send_confirmed pool on the next run if bounces spike.
Short window so a past disaster cannot block the rollout forever and a fresh
spike trips fast. CAMPAIGN_INCLUDE_CATCH_ALL=1/0 still hard-overrides.
USABLE_FILTER (static) -> usable_filter() (per-run, memoized, one DB probe).
IFTA/UCR SELECT_SQL -> _select_sql() so tc.usable_filter() resolves at call
time, not import. 13 logic unit tests pass; live dry-run decision = OFF
(day 15 < 21 and recent 2d bounce 42% from the aging-out Jun-16 disaster).
Reduce evasion optics that would draw FCC enforcement attention while keeping the
real value props:
- 'What they avoid by being Canadian' -> 'What the Canadian structure changes'
- Drop 'No US telecom taxes on invoices (15-40% saved)' -> Canadian tax treatment
on the Canadian entity's billing; 'No US FCC regulatory fees on the Canadian entity'
- '...avoid this by routing US traffic...' -> '...instead route US traffic through
US intermediaries who carry the 499-A obligation...'
- Add prominent 'Who this is for - and who it isn't' section: legitimate
conversational voice (UCaaS/PBX/business/residential/live-agent) yes;
short-duration/dialer/robocall-evasion no. States upstreams are fully
STIR/SHAKEN compliant and we don't onboard traffic designed to evade
caller-ID auth; notes Canadian carriers police ASR/ACD more strictly than
anywhere (a feature). HTML validated balanced.
Reframe away from 'escape the FCC' optics that would draw enforcement attention:
- Header/flagbar: 'Move your VoIP home to Canada' / 'US obligations ride on your
upstream' (was 'no FCC reporting, no USAC, no S/S to run')
- Recast claims to 'CRTC regulatory home, not FCC' and scope the no-USF/no-499/
no-RMD claims to the Canadian-jurisdiction traffic (accurate for US-number
traffic, which rides on the compliant US upstream)
- STIR/SHAKEN bullet now explicitly pro-compliance: 'we don't help anyone dodge
call-authentication; upstream partners are fully S/S compliant'
- Drop 'outside the FCC's reach'
- Add honest caveat: Canada is not for short-duration/dialer traffic; Canadian
carriers are more stringent on ACD/ASR than anywhere; this is for real
conversational voice (UCaaS/PBX/business/residential/live-agent)
Pivot from the hedge/second-entity framing to the consolidation pitch: one CRTC
carrier as the home base, nexus in Canada, customers onboarded from anywhere.
Lead value props with the three concrete reseller realities:
- No FCC reporting (no 499-A/Q, no RMD recert)
- No USAC/USF on your revenue (contribution sits upstream)
- No STIR/SHAKEN to set up or run (reseller can't get a US token; upstream signs)
Add: No FCC Section 214 / no ongoing 214 burden -- CRTC BITS is a cheap,
low-burden notification by comparison. Header/subject reworked; keeps the honest
US-termination + upstream-signing explanation.
Address the two most common objections truthfully (researched against CRTC,
FCC 2025 Third-Party Authentication Order, and STIR/SHAKEN cross-border docs):
- US-based long-distance termination operators routinely accept traffic from
Canadian carriers (cross-border voice is a standard interconnect).
- STIR/SHAKEN: a Canadian reseller cannot get a US SPC token (US-carrier-only),
so US-bound calls are signed by the upstream US-number provider that assigns
the DIDs -- exactly how most small US carriers already rely on upstream
signing. Canadian-origin traffic falls under the lighter CRTC regime, handled
by the upstream Canadian carrier. Does NOT claim S/S disappears -- it moves to
the upstream, off the carrier's day-to-day operation.
Address the obvious 'but I need US numbers' objection: several Canadian
wholesale carriers (Fibernetics, Iristel, VoIP.ms, Telnyx, Bandwidth, Twilio,
Frontier) provision US DIDs to CRTC-registered carriers, so they can keep
serving US customers from the Canadian entity. Adds a Canada-advantage bullet
and updates the guide block to call out both US + Canadian DIDs.
The FCC's 2025 Robocall Mitigation Order (47 CFR 64.1200(n)(4), FCC 25-6)
requires collecting + authenticating a government-issued photo ID for every
new customer before turning up voice service. Add it to the US-carrier burden
list and the matching 'does not apply in Canada' advantage.
- campaign_helpers.py: extract the branded Listmonk HTML helpers (hdr/flagbar/
stats/cta/footer/P/UL/etc.) + create_campaign() from create_campaigns.py into
a side-effect-free shared module; create_campaign() now takes an altbody so
every campaign ships a plaintext alternative (deliverability).
- create_crtc_usf_campaign.py: build the one-off CRTC email hooked on the Q3
2026 USF factor (38.8%, +1.8pts, eff Jul 1), with a $200-off CANADA200 banner
(expires Fri 23:59 ET, CTA links carry ?code= for auto-apply), the full US
carrier burden vs Canada advantage, BC/ON incorporation, and a hosted
carrier-guide PDF download. Creates a DRAFT only; sending stays manual.
- checkout.ts: generalize ensureCompliancePortalUser -> ensurePortalUser and
call it in the CRTC post-payment path so PayPal/crypto/webhook-confirmed CRTC
orders always get an ERPNext Customer + Website User (the single source of
truth for portal login/password), matching the compliance fix from the
PayPal incident. Also flip portal_user_created for canada_crtc/formation.
- canada-crtc.ts: enforce discount active+start/expiry windows, global usage
limit and applies_to scope server-side at checkout (was active-only), so a
promo like CANADA200 actually stops working after its expiry.
- scripts/generate_canada_carrier_guide_pdf.py: render the public Canadian
wholesale carrier/vendor guide PDF (reuses the canonical VENDORS list) to
site/public/guides/canada-carrier-guide.pdf for the CRTC campaign lead magnet.
The daily 40%-off coupon was being merged into every trucking/UCR/IFTA/OTC
send, but those discount sends were not actually being delivered (the
DKIM-broken window). Now that deliverability is fixed, re-test whether
normal-price offers convert before giving margin away.
New CAMPAIGN_ENABLE_COUPON env flag (default OFF) gates daily-coupon
minting in build_trucking_campaigns + the UCR/IFTA/OTC builders (which
import it as tc.COUPON_ENABLED). With it off, no code is minted and an
empty coupon_code is merged -> the campaign templates' existing
{{ if .Subscriber.Attribs.coupon_code }} guard falls through to the
normal-price {{ else }} branch and landing-page links carry no ?code=.
No template or DB changes; fully reversible (set CAMPAIGN_ENABLE_COUPON=1).
Verified: COUPON_ENABLED defaults False, coupon_attribs(None) -> empty,
lp_link drops ?code= when no coupon, all 4 builders compile.
SMTP2GO is no longer used: Listmonk relays through the local Postfix MTA
(172.18.0.1:25 from the Docker network), which DKIM-signs and delivers
direct-to-recipient-MX; transactional mail goes through Carbonio. Verified
zero smtp2go in any live container env + postfix has no external relayhost.
Removed the stale references so a rebuild/new dev can't re-introduce it:
- api/src/config.ts: SMTP_HOST default mail.smtp2go.com -> co.carrierone.com
- scripts/workers/crypto_payment_worker.py: same default fix
- infra/ansible all.yml: listmonk_smtp_* now 172.18.0.1:25, no auth (+comment)
- app.env.j2 / email.ts / crm.md / go-live-todo.md / architecture.svg: docs
All transactional/worker senders built multipart/alternative (or mixed)
messages with ONLY an HTML part. A single-part multipart/alternative is
malformed and HTML-only mail is a spam-score signal -- the same class of
deliverability bug that hurt the campaign pipeline, but on the telecom /
filing / customer-transactional path (499-Q reminders, RMD/FCC filing
review links, intake/completion/delivery emails, commissions, etc).
- worker_email.send_worker_email: auto-derive plaintext from HTML when
caller omits text= (fixes the shared helper for all current+future use)
- 16 rolled-their-own senders in scripts/workers/** + scripts/formation/
document_delivery.py: attach html_to_text(...) plaintext sibling before
the HTML part (job_server + document_delivery wrap text+html in an
alternative sub-part so PDFs still attach to the mixed root)
- api/src/email.ts: add dependency-free htmlToText() and default
sendEmail text to it (fixes checkout/webhook HTML-only sends)
Verified: all py files compile + import at runtime, api tsc passes,
htmlToText handles hrefs/lists/entities, 11 plaintext unit tests pass.
Telecom campaign 407 (Jun 8) was HTML-only + sent in the DKIM-broken
window -> 384 sent / 0 clicks (same junked-mail signature).
The FMCSA census was a one-time snapshot (last loaded ~May 30) with NO refresh
timer -- carriers newly falling out of MCS-150/UCR compliance were never picked
up. New scripts/workers/fmcsa_source_refresh.py orchestrates the full pipeline
(census download -> enrichment -> deficiency flag -> verify new emails ->
MX-tag new) and runs weekly via cron pw-fmcsa-refresh (Sun 09:00 UTC), codified
in the mail-pipeline Ansible role.
Idempotent + incremental: the census upsert preserves email_verified /
listmonk_sent_at / deficiency_flags, so existing carriers keep their send state
and only census fields refresh; new DOTs flow into verification then campaigns.
A carrier who refiled gets a fresh mcs150_parsed, so the builder's overdue
WHERE clause stops targeting them automatically. Verify is capped per run
(20k) so it never stalls on millions of rows.
(Healthcare already auto-catches newly-revalidation-overdue providers within
its 63k institutional pool via pw-hc-refresh Mon/Wed/Fri.)
The anchor regex only matched quoted hrefs; unquoted (href=URL) dropped the
URL from the plaintext part. Now handles double/single/unquoted. Added
scripts/test_email_plaintext.py (11 cases: link forms, mailto, template-tag
preservation, tag stripping, entity unescape, blank-line collapse).
The entire outbound campaign pipeline lived ONLY on the host and was never in
IaC -- a fresh rebuild would have silently shipped NO campaigns, NO IP warmup/
ramp, and NO bounce processing. New mail-pipeline role + deploy-mail-pipeline.yml
playbook deploy it from the canonical repo copies:
cron.d (infra/cron/):
- pw-trucking-campaign-builder, pw-ifta-campaign, pw-ucr-campaign
- pw-hc-campaign, pw-hc-nppes, pw-hc-refresh
- pw-mta-warmup, pw-listmonk-rampcap, pw-hc-rampcap
- pw-ip-rehab, pw-warmup-tg-alert
helper scripts (-> /usr/local/bin):
- pw-mta-warmup, pw-listmonk-rampcap, pw-hc-rampcap, pw-warmup-tg-alert
- postfix-bounce-notify.sh, postfix-hc-bounce-notify.sh, listmonk-bounce-sync.py
systemd services:
- pw-bounce-watcher.service (was missing from repo), pw-hc-bounce-watcher.service
Also creates the deploy-owned {{project_dir}}/logs dir (deploy can't write
/var/log, so a missing dir made cron redirects fail). Added the 6 cron.d files
that existed only on the host, the trucking bounce-watcher unit, and synced
infra/cron/pw-hc-refresh to the live version (revalidation download + enrich
steps). Role wired into site.yml after the mail (OpenDKIM) role.
Part of the email-deliverability incident hardening.
Added DEAD_ISP_DOMAINS (52 domains) to BLOCKED_EMAIL_DOMAINS, so every
campaign builder that imports the shared exclusions (trucking, UCR, IFTA via
create_and_schedule_campaign, and the healthcare importer) stops cold-mailing
them. Domains were identified from our own Listmonk bounce table (top bounced
recipient domains) cross-checked against ISP status: defunct dial-up brands
(earthlink, netzero, juno, mindspring...), Qwest/Embarq legacy, satellite
(hughes, wildblue, dishmail), Altice/Suddenlink rural, WOW!/Knology, small
rural ISPs (windstream, tds, iowatelecom...) and Alaska regional.
Deliberately keeps still-active large consumer ISPs (comcast/charter/cox/
centurylink) -- their bounces were the cold-IP/no-DKIM reputation problem
(now fixed), not dead mailboxes, and they carry real prospects.
Part of the email-deliverability incident hardening.
Two deliverability hardening fixes from the email audit:
1. Plaintext (altbody): all campaigns were HTML-only. Listmonk only emits
multipart/alternative when altbody is set, and HTML-only bulk mail is a
spam-score signal. New scripts/_email_plaintext.py renders a readable
text/plain part from the HTML body (dependency-free; preserves Listmonk
{{ .Subscriber }}/{{ UnsubscribeURL }} template tags, turns links into
'text (url)'). Wired into the trucking builder (and thus UCR + IFTA, which
reuse create_and_schedule_campaign) and the healthcare builder.
2. Stable container hostname: Listmonk derived its Message-ID from the random
docker container id -> @localhost.localdomain (spam-score signal). Pin both
listmonk + listmonk-hc hostname to perfwest.performancewest.net, matching
Listmonk's SMTP hello_hostname.
Part of the email-deliverability incident hardening.
mail.log had no logrotate rule and grew unbounded to ~1GB (~150MB/day)
since Jun 8. This host logs via Postfix's built-in postlogd (maillog_file
mode), not rsyslog (no rsyslog.service exists), so postlogd holds the file
open -- a plain rename+create would leave it writing to the stale inode.
Use copytruncate (no daemon signal needed). Rotate daily, keep 14 days
compressed. Applied live: forced first rotation, compressed the 1GB
archive (->99MB), verified logging + bounce watchers + DKIM signing intact.
Part of the email-deliverability incident hardening (follows DKIM fix 4d59019).
Root cause of the Jun 2026 deliverability collapse / 'no new sales':
opendkim.conf was in single-key mode with no InternalHosts, so it signed only
127.0.0.1. Transactional/cron mail (injected locally) was signed, but ALL
campaign mail -- injected over the Docker bridge from the Listmonk containers
(172.18.0.5 trucking, 172.18.0.25 healthcare) -- went out UNSIGNED. Gmail/Yahoo
require DKIM on bulk mail since Feb 2024, so cold campaigns were junked/blocked
(~23% delivery, 550-5.7.1). Proof: 2,620 campaign msgs that day, 0 DKIM sigs.
The correct table files already existed on the server but were never wired into
opendkim.conf. Fix points the daemon at key.table/signing.table and sets
InternalHosts/ExternalIgnoreList to trusted.hosts (which includes 172.16.0.0/12,
the Docker subnet). Fixes BOTH streams: HC submission ports 2526-2528 inherit
the global smtpd_milters and *@performancewest.net covers compliance@.
Verified by injecting from a Docker IP through port 25 and port 2526 -- both now
get 'DKIM-Signature field added'. Codified as new Ansible role 'mail' so it
can't silently regress (OpenDKIM was previously not in IaC at all).
Customer portal login previously checked a bcrypt customers.password_hash
in Postgres, while portal.performancewest.net validated against ERPNext —
two stores that drifted (the Paul Wilson lockout). Consolidate on ERPNext:
- erpnext-client: add verifyWebsiteUserPassword() — delegates the credential
check to Frappe /api/method/login (Host header = site name; 200=ok,401=bad).
- portal-auth /login: verify against ERPNext, then mint the pw_customer cookie.
- portal-auth /register: create+set the ERPNext password (authority) and upsert
a password-less customers profile row; takeover guard still honors any legacy
PG password until the column is dropped.
- portal-auth /reset-password + /forgot-password: write the new password to
ERPNext; forgot-password now also works for ERPNext-only users (creates the
PG profile row on demand).
- Legacy customers with only a PG bcrypt password reset via forgot-password.
- checkout: refresh the stale comment (customers row is now a profile, no pw).
Build + typecheck green.
Root cause of recurring 'Password not found for Email Account Performance West
Outgoing': the account was shipped as a fixture with awaiting_password=1 and no
password. Email Account SMTP passwords are encrypted per-site and cannot live in
a fixture, so every `bench migrate` reimported the fixture and re-broke
outgoing mail (login notifications, password resets, welcome emails).
- Remove the Email Account fixture (it cannot carry the encrypted secret).
- Add email_account_sync.sync_outgoing_password: idempotent, exception-safe
upsert that reconciles the account + password from SMTP_* env and clears
awaiting_password.
- Wire it to after_migrate (repairs at end of every deploy/migrate, right after
fixtures import) and the daily scheduler (heals out-of-band restore/restart
drift).
- Pass SMTP_* into the erpnext + erpnext-scheduler containers so the sync has
the secret (they previously had no SMTP env).