Two routing bugs that sent carriers to wrong/dead order pages:
1. MCS-150 + Inactive campaigns linked to /order/dot-full-compliance ($399)
instead of their actual service: build_lp_link()/lp_slug_for() fell through
to the dot-full-compliance catch-all for any campaign_type not in
DEFICIENCY_SEGMENTS, ignoring the existing PRICE_SLUG_BY_CAMPAIGN map. So
MCS-150 carriers (should be mcs150-update $79) and Inactive carriers (should
be usdot-reactivation $149) were both quoted a 5x-priced bundle they never
asked for — a severe conversion killer on the two highest-volume segments.
Fix: lp_slug_for() now checks PRICE_SLUG_BY_CAMPAIGN first; build_lp_link()
delegates to it (single source of truth).
2. IFTA-quarterly + UCR-annual builders set lp_link to a BARE path when no
coupon was active (LP_LINK with no query). The body appends '&utm_source=...'
so the CTA rendered as /order/ifta-quarterly&utm... (no '?') = 404. Fix:
both now always emit a leading '?' query carrying dot= (and code= when a
coupon is on), mirroring the main builder's lp_link_with_coupon().
Audited every campaign_type: all 14 order slugs now resolve 200 and match the
intended service/price. Compliance-check secondary links (/tools/dot-compliance-
check) verified correct and intentionally kept where a 'check status' CTA fits.
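A minimal sketch of the two fixes, not the actual builder code. The slugs and prices come from the message above; the campaign_type keys ("mcs150", "inactive") and both function signatures are assumptions for illustration, and the DEFICIENCY_SEGMENTS branch is left out.

```python
from urllib.parse import urlencode

# Slugs are quoted from the commit message; the dict keys are hypothetical
# campaign_type labels chosen for this sketch.
PRICE_SLUG_BY_CAMPAIGN = {
    "mcs150": "mcs150-update",          # $79
    "inactive": "usdot-reactivation",   # $149
}
CATCH_ALL_SLUG = "dot-full-compliance"  # $399 bundle


def lp_slug_for(campaign_type: str) -> str:
    """Prefer the explicit per-campaign slug; fall back to the catch-all."""
    return PRICE_SLUG_BY_CAMPAIGN.get(campaign_type, CATCH_ALL_SLUG)


def lp_link_with_coupon(campaign_type: str, dot: str, code: str = "") -> str:
    """Always emit a '?' query so templates can safely append '&utm_source=...'."""
    params = {"dot": dot}
    if code:
        params["code"] = code
    return f"/order/{lp_slug_for(campaign_type)}?{urlencode(params)}"
```

Because the link always carries its own '?', a template that appends '&utm_source=email' produces a valid URL whether or not a coupon is active.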
Root cause of the order-CTA 404s recurring after the prior live fix: the
builder clones email bodies from STORED Listmonk source campaigns (ids
186/188/271-274/309/310/469/473), not from the edited source files. Those
stored bodies still carried @TrackLink on the per-subscriber order CTA, so
every nightly build re-registered a single static /order/<slug>&utm... link
(no '?') that 404s for every recipient. This morning's 3,000 real sends AND
the owner spot-check both went out with dead order links.
Two durable guards:
1. get_base_campaign() now strips @TrackLink from any cloned body (with a
warning), so a stale/re-edited source campaign can never reach recipients
broken again. Human clicks are already attributed via Umami.
2. The owner test-send now builds the CTA via lp_link_with_coupon(dot=...)
(leading '?') instead of build_lp_link() (bare path).
Also fixed live: stripped @TrackLink from the 10 stored source campaign
bodies; rewrote the 12 already-registered broken links. Backups in listmonk:
pw_source_tracklink_bak_20260623 + pw_links_tracklink_bak_20260623.
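The first guard could look roughly like the sketch below. It assumes the marker is the literal text "@TrackLink" inside the stored body; the function name, its signature, and the logger name are hypothetical and not taken from get_base_campaign().

```python
import logging
import re

log = logging.getLogger("campaign_builder")  # hypothetical logger name

# The marker is the literal text "@TrackLink" appended to an href.
_TRACKLINK = re.compile(r"@TrackLink")


def strip_tracklink(body: str, campaign_id: int) -> str:
    """Remove every @TrackLink marker from a cloned body and warn if any were found."""
    cleaned, hits = _TRACKLINK.subn("", body)
    if hits:
        log.warning("campaign %s: stripped %d @TrackLink marker(s)", campaign_id, hits)
    return cleaned
```

Stripping at clone time means a stale stored body is repaired on every build instead of relying on the source campaign staying clean.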
Listmonk @TrackLink registers ONE static URL per tracked link and points
every recipient's /link/<uuid> redirect at it. On per-subscriber hrefs
({{ lp_link }}, ?dot=, ?npi=, ?clia=) this is doubly broken:
- the registered links.url was captured before the {{ lp_link }} token
rendered, yielding /order/slug&utm_source=... (first &, no ?) -> 404
- even when valid it collapses every carrier/provider onto the first
subscriber's dot/npi/clia value
Real human clicks are already tracked via Umami campaign-click (bot
filtered), so Listmonk link tracking here is redundant and destructive.
Stripped @TrackLink from per-subscriber CTAs:
- scripts/create_deficiency_source_campaigns.py (_cta, _dot_check_cta)
- data/trucking_campaigns/{ucr,ifta}_*.html
- data/hc_campaigns/*.html (10 templates)
Static CTAs (e.g. CRTC ?code= order link) keep @TrackLink (safe).
Live fix to the 10 broken registered links.url rows applied separately
(first & -> ?), backup in listmonk.pw_links_dkim_fix_bak_20260622.
Docs: new runbook incident section + corrected the disproven
'use @TrackLink on all CTAs' guidance in fmcsa/hc plans.
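A lint along these lines could catch the mistake before a template ships. It is a sketch only: it assumes double-quoted hrefs, treats the tokens named above ({{ ... }}, ?dot=, ?npi=, ?clia=) as the per-subscriber markers, and uses a hypothetical function name.

```python
import re

# Matches href="...@TrackLink" (double-quoted hrefs only, by assumption).
_TRACKED_HREF = re.compile(r'href="([^"]*?)@TrackLink"')
# Markers that make an href differ per subscriber.
_PER_SUBSCRIBER = ("{{", "?dot=", "?npi=", "?clia=")


def unsafe_tracked_hrefs(html: str) -> list[str]:
    """Return tracked hrefs that vary per subscriber and so should not be tracked."""
    return [
        match.group(1)
        for match in _TRACKED_HREF.finditer(html)
        if any(token in match.group(1) for token in _PER_SUBSCRIBER)
    ]
```

A static link such as a fixed ?code= order URL passes the check, while any tracked href containing a template token is reported.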
After the Jun 2026 no-DKIM incident (campaign mail went out unsigned ->
junked/blocked, ~23% delivery), DKIM is fixed and we must re-send to the
now-signed audience. The builder previously held Google AND Microsoft AND
consumer-MX out until warmup day 30; that blocks the re-send of the Microsoft-
hosted business domains that are most of the list.
Add MAIN_EXCLUDE_OPERATORS (comma-separated mx_provider labels) to override
WARMUP_EXCLUDE_OPERATORS. Set it to 'google' in the workers env so we send to
everything EXCEPT Google's consumer inboxes (still recovering reputation),
including Microsoft/Hotmail. Drives both the SQL exclude and the per-operator
daily cap consistently. Unset => prior default; '' => exclude nobody.
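The three env states (unset, empty, comma-separated list) can be told apart as in this sketch. The function name and the example default set are assumptions; only the variable name and its semantics come from the message above.

```python
from typing import Iterable, Mapping, Set


def resolve_excluded_operators(
    env: Mapping[str, str], warmup_default: Iterable[str]
) -> Set[str]:
    """Unset -> prior default; '' -> exclude nobody; otherwise the listed labels."""
    raw = env.get("MAIN_EXCLUDE_OPERATORS")
    if raw is None:
        return set(warmup_default)
    return {label.strip() for label in raw.split(",") if label.strip()}
```

Using the one resolved set for both the SQL exclusion and the per-operator cap is what keeps the two consistent.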
Listmonk applies campaign headers as `for hdr,val := range set { h.Add(hdr,val) }`
(internal/manager/manager.go v6.1.0): each map's KEY is the literal header name.
The trucking/CRTC/deficiency builders wrote {"name":"Reply-To","value":..} (and
{"key":..,"value":..}), which emits junk `name:`/`value:` headers and NO real
Reply-To, so replies fell back to the From address (noreply@send.performancewest.net)
instead of info@performancewest.net. HC builder already used the correct
{"Reply-To": value} shape; match it everywhere. Verified against listmonk source.
Impact: outbound only; no customer replies were lost (noreply@ is a real mailbox),
but reply UX pointed at a no-reply address. Live campaign headers re-patched separately.
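The shape difference is small enough to normalize mechanically. The helper below is a hypothetical sketch, not code from the builders: it rewrites the two wrong shapes into the {"Header-Name": value} form and passes already-correct entries through.

```python
from typing import Dict, List


def normalize_headers(headers: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Rewrite {"name"|"key": H, "value": V} entries as {H: V}."""
    fixed = []
    for entry in headers:
        name = entry.get("name") or entry.get("key")
        if name is not None and "value" in entry:
            fixed.append({name: entry["value"]})
        else:
            fixed.append(dict(entry))  # already in the {"Header-Name": value} shape
    return fixed
```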
CAMPAIGN_COUPON_AB_PCTS="20,30,0" now means 20% / 30% / full-price. The 0 arm
mints no code; pick_coupon_for_email returns ("","") so it renders identically
to a normal-price send, while carriers are still deterministically hash-bucketed
into it (re-hash a converter's email to recover their arm). Even ~33/33/33 split
incl. the control verified over 30k. Adds test_full_price_control_arm; 8/8 pass.
Two correctness fixes that gate enabling the coupon test:
1. On-the-fly pricing. The coupon block hardcoded '$79 $47' (only true at 40%
off) — a false claim on the 20/30% arms. Now build_trucking_campaigns.py
reads api/src/service-catalog.ts (same source checkout uses) and computes
coupon_price_full / coupon_price_deal per recipient as full - round(full*pct/100),
exactly matching the server. Service-fee-only; non-discountable services
(boc3-filing passthrough) get NO price and fall back to percent-only copy.
Quotes the service the email is ABOUT (mcs150 $79, reactivation $149), not the
bundle the CTA happens to link to. service-catalog.ts now ships in the worker
image; helper degrades to percent-only if it can't be read.
2. CTA URL bug (likely a big driver of the zero-click problem). Main campaign
CTAs render '/order/slug&utm_source=...' (no '?') -> HTTP 404, verified live.
Deficiency CTAs would double-'?' once a coupon added '?code='. lp_link now
owns the query (?dot=...&code=...) so every template appends with a leading
'&' and is valid in all 4 states (main/deficiency x coupon on/off), verified
against live URLs returning 200.
Deficiency _deal_box now shows real was/now prices (percent-only for boc3).
Tests: 7/7 pass (adds URL-wellformed + price-matches-checkout cases).
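The price arithmetic, sketched under two assumptions: prices are whole dollars, and the server rounds halves up (as JavaScript's Math.round does for positive numbers). Python's built-in round() rounds halves to even, so the sketch uses integer arithmetic instead. The function name and the None convention for percent-only copy are hypothetical.

```python
from typing import Optional, Tuple


def coupon_prices(full: Optional[int], pct: int) -> Optional[Tuple[int, int]]:
    """Return (full, deal) in whole dollars, or None to fall back to percent-only copy."""
    if full is None or pct <= 0:
        return None  # non-discountable service, or a full-price arm
    discount = (full * pct + 50) // 100  # full * pct / 100, rounded half up
    return full, full - discount
```

For the $79 service this gives $63, $55 and $47 at 20%, 30% and 40% off, which matches the '$79 $47' copy only on the 40% arm.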
- CAMPAIGN_COUPON_AB_PCTS="20,30,40" mints one daily code per arm; each
carrier is bucketed by a stable sha256(email) hash so the split is even
(~33/33/33 verified over 30k) and stable across re-sends (no arm-hopping).
- Each arm's code stores its own percent in discount_codes, so the advertised
discount always matches what checkout applies; redemptions are countable per
code (marker campaign-daily:<date>:<pct>).
- Empty/unset keeps legacy single-arm behavior (COUPON_PCT, legacy marker).
- coupon_attribs() now takes per-recipient pct.
- Tests: scripts/tests/test_coupon_ab.py (5 pass). SpamAssassin: both main
campaigns (186/188) score 0.0 HAM across all 3 arms, coupon block renders
clean; harness saved for re-runs.
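The bucketing can be sketched as follows. The sha256(email) hash is from the message above; lowercasing and trimming the address, and taking the first 8 digest bytes modulo the arm count, are assumptions about details the message does not spell out.

```python
import hashlib
from typing import List


def parse_arms(raw: str) -> List[int]:
    """Parse CAMPAIGN_COUPON_AB_PCTS, e.g. "20,30,40" -> [20, 30, 40]."""
    return [int(part) for part in raw.split(",") if part.strip()]


def arm_for_email(email: str, arms: List[int]) -> int:
    """Deterministically map an email to one arm's percent."""
    digest = hashlib.sha256(email.strip().lower().encode("utf-8")).digest()
    return arms[int.from_bytes(digest[:8], "big") % len(arms)]
```

Because the arm depends only on the address and the arm list, re-hashing a converter's email later recovers the arm they were in without storing it.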
Found during a bug-review pass of the one-email-per-provider work:
1. assign_all overwrite bug: an email on MULTIPLE rows (shared practice inbox /
multiple NPIs -- 2,592 such emails, 299 with mixed status) was assigned by
the LAST row, so a less-urgent row could clobber an urgent one (overdue ->
free check). Now keeps the most-urgent (lowest priority number) assignment.
2. warm_segment double-import + wrong-row render: all of an email's rows passed
the candidate filter, so it could be imported twice (over-counting the slice)
and attribs_for could render a sibling row's blank due-date in the overdue
email. Now requires row_matches(seg) for the specific row AND dedupes by
email (one row per email).
3. free-check email rendered broken text ('last updated on -- about years
ago', 'Last updated . ~ yrs ago') for any provider whose NPPES date isn't
cached yet (the free check goes to everyone, and the fill is gradual). Wrapped
the example sentence + official-record card in listmonk {{ if
.nppes_last_updated }}...{{ else }}...{{ end }}; added a date-free else
branch. altbody keeps the conditionals (listmonk evaluates body+altbody), and
the test/preview renderer gained a minimal {{ if/else/end }} evaluator so
previews match real sends. Verified both branches render with zero unfilled
tokens.
4. cross-cron double-send: pw-hc-campaign (warmup file) and pw-hc-nppes (63k
file) share state but tracked imports per-segment; 312 emails overlap both
files, so a provider could get an urgent email from one cron AND the free
check from the other. Added load_all_imported() global guard (union of all
segment state) so each provider gets exactly one healthcare email overall.
All verified: assignment regression test (10 cases) + new dup-email/guard checks
pass; all 6 templates render clean.
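Fix 1 amounts to a keep-the-minimum reduction per email. In this sketch the segment keys, the input shape (email, segment) and the email normalization are assumptions; only the "lower number wins" rule is from the message.

```python
from typing import Dict, Iterable, Tuple

# Lower number = more urgent. Keys are hypothetical labels for this sketch.
SEGMENT_PRIORITY = {"overdue": 20, "free_check": 100}


def assign_all(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Keep the most urgent segment per email, regardless of row order."""
    best: Dict[str, str] = {}
    for email, segment in rows:
        key = email.strip().lower()
        current = best.get(key)
        if current is None or SEGMENT_PRIORITY[segment] < SEGMENT_PRIORITY[current]:
            best[key] = segment
    return best
```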
Make the free NPI compliance check the catch-all for ALL verified institutional
providers, but route anyone with a more important/time-sensitive issue to THAT
email instead -- each provider gets exactly one email, their most urgent.
- SEGMENTS gain a 'priority' (lower=more urgent): reactivation 10, revalidation
overdue 20, due-soon 30, bundle 45, free-NPI-check 100 (catch-all).
- assign_segment()/assign_all(): route each provider to the single
highest-priority active segment whose selector matches; warm_segment() takes
the assignment map and only claims its assigned providers (disjoint pools, no
double-mailing). main() now splits the daily slice by priority order, serving
urgent segments fully before the broad free-check consumes the remainder.
- nppes_outdated selector -> 'institutional_default' (every verified, non-
deactivated row), since the free check's value no longer depends on staleness;
list/campaign renamed 'HC Warmup - Free NPI Check'.
- FIX latent bug: reactivation selector treated 'not on CMS reval list' as
deactivated -- false for org NPIs (would mis-tell active practices they're
deactivated). Now uses the REAL nppes_deactivated flag (or OIG/SAM exclusion).
- Drop blanket oig_screening from the default rotation: it matched every row and
would starve the catch-all, and the free check already screens OIG/SAM and
routes to the paid fix on a hit. Still runnable via --segments.
- Add scripts/test_segment_assignment.py (10 cases incl. 'overdue AND stale ->
overdue wins'); all pass.
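The priority-ordered split of the daily slice might look like this. The priority numbers are the ones listed above; the segment keys, the function name, and the pool-size input are assumptions made for the sketch.

```python
from typing import Dict

# Priorities from the message above; the keys are hypothetical segment names.
PRIORITY = {
    "reactivation": 10,
    "revalidation_overdue": 20,
    "due_soon": 30,
    "bundle": 45,
    "free_npi_check": 100,  # catch-all
}


def split_daily_slice(pool_sizes: Dict[str, int], daily_slice: int) -> Dict[str, int]:
    """Serve urgent segments fully; the catch-all takes whatever remains."""
    remaining = daily_slice
    allocation: Dict[str, int] = {}
    for segment in sorted(pool_sizes, key=PRIORITY.__getitem__):
        take = min(pool_sizes[segment], remaining)
        allocation[segment] = take
        remaining -= take
    return allocation
```

With a slice of 200 and pools of 40 reactivation, 100 due-soon and 5,000 free-check providers, the free check receives the remaining 60.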
Pivot the weakest healthcare email from a 'your record is out of date -> buy an
update' sell into a free, value-first compliance check (the funnel already
exists: /tools/npi-compliance-check + /api/v1/npi/lookup run 5 live gov checks --
NPI status, Medicare revalidation, OIG/SAM exclusions, NPPES freshness -- and
deep-link to the right paid fix).
- Subject: 'A free compliance check for your NPI' (was 'may be out of date').
- Header: 'Free NPI Compliance Check' covering NPPES/revalidation/exclusions/NPI.
- Body: keep the REAL last_updated date as a credibility hook ('we pulled your
public records'), but frame it honestly ('that's usually fine') and pivot to
the broader free check. Adds a 4-item 'your free check covers' card.
- CTA now -> /tools/npi-compliance-check?npi={npi} (prefills + auto-runs their
own check on landing) with @TrackLink + UTM; dropped the straight-to-order
NPPES CTA and the redundant 'look up on NPPES' button.
- Reassurance reframed to free-first ('the check is completely free; a fix is
optional, flat-fee'). cta_path updated in the segment registry.
- Verified: render + plaintext + headless screenshot, CTA tracked, no stray
order link, zero unfilled tokens.
- Add NPPES_STALE_MAX_YEARS (default 10): a record untouched for many years is
a stronger signal the practice closed/moved, and a bounce burns the warming
IP. Observed institutional distribution clusters 3-7yrs with ~0 beyond 8, so
10 is a safe ceiling that mails the whole real pool while excluding any
outlier ancient record. MIN stays 3 (keeps the 'out of date' claim credible).
- Restore the SMTP-verification gate (verify_ok) that the shared
institutional_verified selector had -- the swap to nppes_stale dropped it; we
only mail inboxes we already proved live.
- enrich: process the re-fetch queue STALEST-FIRST so a bounded (--limit) or
--max-age refresh spends its budget on the most-overdue cache entries (and new
NPIs) first, never starving them behind merely-aging ones.
- Selector unit-tested (10 cases incl. window edges, verify gate, deactivated).
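A sketch of the selector logic described above. It assumes the row already holds parsed values (booleans and a numeric nppes_years_stale) and that both window edges are inclusive; neither detail is stated in the message, and the function name is hypothetical.

```python
from typing import Any, Mapping


def nppes_stale_matches(
    row: Mapping[str, Any], min_years: float = 3.0, max_years: float = 10.0
) -> bool:
    """True only for verified, non-deactivated rows with a date inside the window."""
    if not row.get("verify_ok"):          # SMTP-verification gate
        return False
    if row.get("nppes_deactivated"):      # never mail deactivated NPIs
        return False
    if not row.get("nppes_last_updated"):  # no government date => no staleness claim
        return False
    years = row.get("nppes_years_stale")
    if years is None:
        return False
    return min_years <= float(years) <= max_years
```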
The NPPES 'may be out of date' email previously asserted staleness with no
per-record evidence (softened earlier to a generic 'periodic review required').
NPPES is fully public and every record carries basic.last_updated, so we now
cite the actual government date the provider can verify on the registry.
- enrich_nppes_last_updated.py: joins real basic.last_updated /
enumeration_date / deactivated onto the institutional list via a cached,
resumable per-NPI crawl (no batch endpoint exists). Adds nppes_last_updated,
nppes_enumeration, nppes_years_stale, nppes_deactivated.
- cron: new 'nppes_stale' selector mails ONLY records >= 3yrs stale (env
HC_NPPES_STALE_MIN_YEARS) and excludes deactivated NPIs; empty date => no
match, so we never claim staleness without the government date to back it.
- template: headline + official-record card now show the real last_updated
date and ~N-years-ago, sourced to npiregistry.cms.hhs.gov.
- attribs + test SAMPLE expose the new fields; verified render + plaintext.
Diagnosing zero healthcare sales (11k sent, 5479 opens, 0 clicks, 0 orders).
Root cause of clicks=0: Listmonk only registers a link for tracking when the
href ends with the literal @TrackLink marker; all 10 hc templates lacked it
(trucking/CRTC have it). So the entire funnel was unmeasurable below 'open'.
Changes:
- Click tracking: append @TrackLink + UTM to every /order/ CTA across all 10
templates (external gov self-verify links left untracked on purpose).
- Remove all service prices from emails (99/49/49/99yr/9mo). Price is
now revealed on the order page after value is established; catalog
(api/src/service-catalog.ts) stays source of truth. Kept the 0,000 OIG
penalty stat (regulatory fact, not our price). Added a neutral 'flat fee shown
up front' reassurance block where the fee table used to be.
- Compliance/honesty: the nppes_outdated email asserted a per-record
'FLAGGED OUT OF DATE / detected' status, but its selector only checks
deliverability and the data has no NPPES last-updated field -> unsubstantiated
for every recipient. Reframed to a generally-true periodic-attestation message
('PERIODIC REVIEW REQUIRED', 'most practices drift out of date'). Same hedging
applied to npi_reactivation ('may be deactivated ... confirm on official
sources'). Substantiated reval 'past due' claims (backed by the public CMS
Revalidation list) were kept.
- Fixed stale $299 OIG metadata in build script -> $79/mo (reference only).
Docs: docs/healthcare-competitive-pricing.md (benchmark research) and
docs/healthcare-email-compliance-review.md (CAN-SPAM / FTC / impersonation pass;
flags SOC2/HIPAA/PCI badge claims for owner confirmation).
Verified headless: all 10 render with 0 JS errors, exactly 1 tracked CTA each,
no price leaks.
Completes the MX-exclusion plan. Untagged carriers can't be excluded (the big-MX
gate is MX-based, so an unresolved Google/Yahoo domain would slip through), and
were previously UNCAPPED in select_sendable_carriers -- a flood of freshly-imported,
never-resolved domains could dominate a run before pw-mx-tag resolves them.
Added a single shared untagged_cap (env MAIN_UNTAGGED_MX_CAP, default max(quota,200))
so untagged sends are bounded without starving the pool: at the default the bucket
can still fill an entire run's quota (no behavior change today), but the cap can be
tightened to a fraction once pw-mx-tag has drained the backlog -- which is fast,
since only ~3,035 distinct *verified-sendable* untagged domains remain (< one
20k/day tag run). Tagged carriers keep their per-operator caps unchanged.
Verified: compiles; cap logic never starves at default, enforces the limit when
set lower.
Fix 1 (build_trucking_campaigns.py): the warmup big-MX exclusion only covered the
clean-label operators (google/microsoft/proofpoint/...). Consumer mailbox
operators that mx_tag_carriers.py labels with an "mx:" prefix slipped BOTH the
exclusion and the per-MX throttle -- notably mx:yahoodns.net (283k sendable
carriers = Yahoo Small Business/AOL custom domains) and mx:icloud.com (25k), plus
comcast/charter/centurylink/windstream/tds/earthlink. These are custom domains
whose MX points at a consumer provider, invisible to the literal-domain blocklist.
Added CONSUMER_MX_OPERATORS, folded into WARMUP_EXCLUDE_OPERATORS used by both the
fetch_carriers() exclusion SQL and mx_daily_caps() (same day-30 ramp). Behind the
existing MAIN_SKIP_BIG_MX switch.
Validated read-only: after the fix the warmup-eligible pool is 353,909 carriers
(315,892 untagged + ~38k genuinely small/self-hosted operators), so the long tail
still sustains the daily quota -- not starved -- while 0 consumer-MX carriers are
selected during warmup.
Fix 3 (infra/cron/pw-mx-tag): mx_tag_carriers.py was on no cron, so the untagged
(NULL) backlog (~316k) never drained and new FMCSA imports stayed untagged,
slowly re-opening the gap. Added a daily 05:45 UTC cron (--only-unsent
--limit-domains 20000), before the 08:00 builder. Idempotent/bounded (only tags
mx_provider IS NULL). Verified live: a 200-domain test run tagged 216 domains.
(Fix 2 -- bounding the NULL bucket cap -- deferred; the cron will drain it.)
Campaign 509 (CRTC USF Q3, 4,156 sent) shipped with raw <a href> URLs, so
Listmonk never registered the links and recorded ZERO clicks -- even though
Umami logged the real order-page visits AND a carrier phoned in after clicking.
Same mistake docs/fmcsa-trucking-plan.md already flagged ("Use @TrackLink on all
CTAs"); the trucking campaigns do it, the CRTC one didn't.
Listmonk only tracks a link when its href ends with the literal @TrackLink marker
(it strips it and rewrites through lists.performancewest.net/link/). Added a
_track() helper that appends UTM params (so Umami attributes the visit too) +
@TrackLink, applied to both the primary order CTA and the guide-PDF download.
The running campaign 509's body was also patched live in the DB (same two links)
so its remaining sends record clicks. Future CRTC campaigns get it from source.
First live ingest (28 reports) showed our warmup rotation pool (.91-.109, out0x)
mislabeled EXTERNAL because OUR_IPS only listed 4 specific IPs -- every one was
100% DMARC-passing, clearly ours, and would have generated false spoofing alerts.
Replace the literal-IP set with an ipaddress subnet check on 207.174.124.0/24
(our whole block). The only genuinely-external failing sender is 35.174.145.124
(AWS, 32 msgs spoofing us, SPF-fail/no-DKIM, all correctly rejected by p=reject) --
exactly the signal the --alert path is meant to surface.
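The subnet check is a one-liner with the standard ipaddress module. The constant and function names here are hypothetical; the /24 block and the two example addresses are the ones mentioned above.

```python
import ipaddress

OUR_NETWORK = ipaddress.ip_network("207.174.124.0/24")


def is_our_ip(source_ip: str) -> bool:
    """True when the reporting source IP falls inside our block."""
    try:
        return ipaddress.ip_address(source_ip) in OUR_NETWORK
    except ValueError:
        return False  # malformed address in a report: treat as external
```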
Tool 2 of the deliverability monitoring pair (Tool 1 = mail_reputation_monitor).
DMARC rua reports from dozens of operators (Google, Yahoo, Comcast, Cox, Bell,
Mimecast, Cisco ESA, GMX, mail.com, ...) were landing in ops@ (dmarc@ was a DL),
burying real mail and never parsed. Now ingested + queryable:
- dmarc@performancewest.net converted DL -> dedicated Carbonio mailbox; isolated
IMAP creds in server .env, surfaced to workers in docker-compose.yml (mirrors
OPS_IMAP_*). 29 historical reports moved ops@ -> dmarc@ via IMAP.
- scripts/dmarc_report_parser.py: IMAP fetch unseen -> decompress .gz/.zip/.xml
(namespace-agnostic: classic + urn:ietf:params:xml:ns:dmarc-2.0 GMX/mail.com) ->
parse aggregate XML -> upsert dmarc_report (keyed (org_name,report_id), no-op on
re-parse) + dmarc_record per source IP. dmarc_pass = dkim_aligned OR spf_aligned.
Marks \Seen. --dry-run/--all/--alert (7d per-IP summary + Telegram if one of OUR
IPs is <95% pass, or an EXTERNAL IP sends >=20 failing msgs as us = spoofing under
p=reject). psycopg2 imported lazily so --dry-run runs without the driver.
- api/migrations/102_dmarc_aggregate.sql: dmarc_report + dmarc_record tables.
- infra/cron/pw-dmarc-parser: 06:20 UTC daily --alert (after reputation, before scrub).
- docs/deliverability.md: DMARC section DONE; query examples.
Verified: dry-run --all parses all 28 reports (1 non-report test probe), 0 unknown
after the namespace fix.
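The namespace-agnostic parse can be done by comparing local tag names only. This is a simplified sketch rather than the parser itself: it reads just source_ip, count and the policy_evaluated dkim/spf results, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def _local(tag: str) -> str:
    """Drop any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def _first(parent: Optional[ET.Element], name: str) -> Optional[ET.Element]:
    """First descendant whose local tag name matches, in document order."""
    if parent is None:
        return None
    for element in parent.iter():
        if _local(element.tag) == name:
            return element
    return None


def _text(parent: Optional[ET.Element], name: str) -> str:
    element = _first(parent, name)
    return (element.text or "").strip() if element is not None else ""


def parse_records(xml_text: str) -> List[Dict[str, object]]:
    """One dict per <record>; dmarc_pass = dkim_aligned OR spf_aligned."""
    records = []
    for record in ET.fromstring(xml_text).iter():
        if _local(record.tag) != "record":
            continue
        row = _first(record, "row")
        evaluated = _first(row, "policy_evaluated")  # aligned results live here
        dkim_aligned = _text(evaluated, "dkim") == "pass"
        spf_aligned = _text(evaluated, "spf") == "pass"
        records.append({
            "source_ip": _text(row, "source_ip"),
            "count": int(_text(row, "count") or 0),
            "dmarc_pass": dkim_aligned or spf_aligned,
        })
    return records
```

Reading dkim/spf from policy_evaluated rather than auth_results matters: auth_results also contains dkim and spf elements, but those report raw authentication, not alignment.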
Adds scripts/mail_reputation_monitor.py + migration 101 (mail_reputation_daily).
Sender reputation is judged by the RECEIVING operator (Microsoft/Google/Yahoo/
Proofpoint), and the provider portals (SNDS/Postmaster/CFL) need a login and lag
24-48h. Our postfix logs already carry the ground truth in real time: every send
records the receiving host + SMTP response, and the response classifies WHY:
  250              -> accepted
  451 4.7.500      -> throttled (Microsoft rate-limiting a cold IP)
  550 5.7.x        -> reject_reputation (spam/reputation)
  550 5.1.1/5.4.1  -> reject_recipient (dead mailbox / access denied = list hygiene)
  550 ...SPAM      -> reject_content (SpamAssassin)
The parser classifies each egress delivery (out0x/hcout/relay) by (sending_ip,
receiver, outcome, reason_code) and upserts ONE daily aggregate row per bucket
(idempotent ON CONFLICT), so a nightly cron over the rotated log gives a queryable
trend without re-parse double-counting. --alert prints a per-operator summary and
Telegram-alerts on regressions (>=10% reputation rejects, or Microsoft >=70%
throttled). Reads stdin ("-") so the host-owned /var/log/mail.log can be piped
into the DB-connected workers container.
Motivation: 2026-06-19 audit found ~80% of Microsoft sends were getting 451 4.7.500
throttles on the warming IPs -- this makes that trend visible as reputation recovers.
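The response table can be expressed as a small classifier. This sketch assumes single-line responses of the form "<status> <enhanced-code> text", checks the recipient codes before the SPAM text match, and adds an "other" bucket for anything unlisted; those three choices are assumptions, not the monitor's documented behavior.

```python
def classify(response: str) -> str:
    """Map an SMTP response line to an outcome bucket."""
    parts = response.split()
    if not parts:
        return "other"
    status = parts[0]
    enhanced = parts[1] if len(parts) > 1 else ""
    if status == "250":
        return "accepted"
    if status == "451" and enhanced == "4.7.500":
        return "throttled"
    if status == "550":
        if enhanced in ("5.1.1", "5.4.1"):
            return "reject_recipient"
        if "SPAM" in response:
            return "reject_content"
        if enhanced.startswith("5.7."):
            return "reject_reputation"
    return "other"
```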
The fmcsa campaign builders already exclude gmail/yahoo/microsoft/etc. from NEW
audience selections, but two reputation leaks remained on the LIST-BASED side:
1. iCloud/Apple gap. icloud.com/me.com/mac.com were never in the exclusion set.
A 2026-06 Listmonk audit found 1,321 ENABLED iCloud subscribers on list 3
("FCC Carriers - Direct Contacts") -- the single largest enabled-consumer
bucket -- being cold-blasted with no exclusion at all. Add APPLE_CONSUMER_DOMAINS.
2. Stale already-imported consumer subs. List-based campaigns (e.g. the running
CRTC/USF blast on list 3) keep hitting consumer addresses imported BEFORE the
relevant domain joined the exclusion list. gmail.com was still the #1 bounce
domain via that campaign even though new selections exclude it. Add
scrub_listmonk_consumer.py: reconciles the live Listmonk subscriber table
against the authoritative exclusion list and blocklists any ENABLED subscriber
whose address is_blocked(). Idempotent; re-run whenever the exclusion grows so
it applies retroactively. Uses the same 'blocklisted' terminal state as the
bounce handler, so contacts are excluded from all current/future campaigns
without deleting history. Supports --dry-run and both listmonk / listmonk_hc.
Isolates bulk sending reputation onto a dedicated subdomain so the root domain
stays clean for transactional/verification mail (and recovers faster). Replies
still go to the root domain via Reply-To, so the customer-facing reply experience
is unchanged.
- build_trucking_campaigns.py: add env-overridable FROM_EMAIL
(noreply@send.performancewest.net); use it for both scheduled + test sends
instead of inheriting base["from_email"] from the DB base campaign.
- build_healthcare_campaigns_cron.py: FROM_EMAIL ->
compliance@send.performancewest.net (env-overridable).
- bounce-watcher.sh / hc-bounce-watcher.sh: track the new subdomain envelope
sender (keep legacy root-domain sender so the pre-cutover queue still drains;
HC also tracks by hcout transport regardless of sender).
Infra already live (separate, non-code): subdomain DNS (A/MX/SPF/DKIM
selector=send/DMARC p=reject) on the Hestia master, OpenDKIM signs
d=send.performancewest.net (verified end-to-end), egress .94/.107. Root SPF
trimmed to the real IPs; pointless IP-rehab cron disabled.
Runs the real _BaseNPIHandler.handle() with _create_todo monkeypatched (no DB /
ERPNext / email side effects) and asserts:
- first OIG/SAM screening has no [Monthly cycle] prefix / RECURRING banner
- a recurring_cycle order gets the [Monthly cycle] title prefix, the
"RECURRING MONTHLY CYCLE" banner, the invoice id, and the re-run-against-
CURRENT-data + issue-NEW-certificate instructions
- recurring_cycle works with and without an invoice id
- the bundle handler's first run is not flagged recurring
Verified passing both locally and inside the deployed workers container.
Convert OIG/SAM from one-time $299/yr to recurring $79/month (card+ACH only) -
the first real recurring-billing product in the system. Exclusion screening is
a *monthly* federal obligation, so recurring monitoring fits the requirement and
is the biggest valuation lever (vs a one-time annual run).
Catalog (single source of truth):
- service-catalog.ts: add billing_interval + allowed_methods to ComplianceService;
oig-sam-screening -> 7900c, billing_interval:"month", allowed_methods:[card,ach],
name "(Monthly Monitoring)".
- gen-service-catalog.py + check-service-catalog-drift.py: carry/guard the two new
fields; regenerate site catalog.
Checkout (api/src/routes/checkout.ts):
- mode:"subscription" with recurring price_data when billing_interval is set;
surcharge absorbed for recurring (clean $79/mo); server-side METHOD_NOT_ALLOWED
re-validation against allowed_methods.
- ensureColumns + migration 100: compliance_orders.stripe_subscription_id,
bundle_upsell_sent_at (+ subscription index).
Webhooks (api/src/routes/webhooks.ts):
- record stripe_subscription_id on checkout.session.completed (subscription mode).
- invoice.paid (subscription_cycle only) -> re-dispatch screening for the cycle;
invoice.payment_failed -> admin alert + first-failure customer nudge;
customer.subscription.deleted -> mark order cancelled. (API 2026-03-25 moved the
subscription link to invoice.parent.subscription_details.subscription.)
Fulfillment:
- job_server.py: pass recurring_cycle/invoice_id into the order.
- npi_provider.py: OIG handler labels renewal cycles "[Monthly cycle]" + re-screen
note; bundle action runs only the FIRST screening + flags the $79/mo upsell.
Bundle land-and-expand:
- Provider Compliance Bundle now includes only the first OIG/SAM screening (was
giving away $948/yr of monitoring inside an $899 bundle).
- new worker scripts/workers/bundle_upsell.py (+ pw-bundle-upsell timer): ~3 weeks
after a paid bundle, emails the customer to continue $79/mo monitoring; dedup via
bundle_upsell_sent_at; skips customers who already have an OIG/SAM order.
Surfaces updated to $79/mo: PaymentStep (filters methods, "Billed every month,
cancel anytime"), order pages, healthcare index, npi-compliance-check tool (also
fixed stale $699 bundle drift -> $899), hc_oig_screening + hc_compliance_bundle
emails.
Docs: billing.md gains a "Stripe-native Subscriptions" section + a reality-check
banner (Adyen/ERPNext-gateway model documented there is NOT live; Stripe is the
real rail). Fixed run-migrations.yml container name bug
(performancewest-postgres-1 -> performancewest-api-postgres-1, overridable).
Tests: api/tests/recurring-subscription.test.ts (28 assertions) covers catalog
gating, method validation, surcharge suppression, recurring line-item build,
invoiceSubscriptionId extraction, renewal-cycle gating. tsc clean; site build
clean; catalog drift OK.
Manual deploy step: enable invoice.paid, invoice.payment_failed,
customer.subscription.deleted on the Stripe webhook endpoint.
Replaces the panic-era burner-domain verification plan with an in-house
automatic catch-all rollout in the trucking/IFTA/UCR builders. Root-cause
classification of the 75k pre-DKIM-fix bounces showed ~55% were reputation/
auth (now fixed by DKIM signing) and only ~29% genuinely-dead mailboxes;
catch-all domains accept at RCPT time so they do not user-unknown bounce at
send, making a controlled in-house bleed safer than warming a separate burner.
catch_all_enabled() adds catch-all results only when warmup_day >=
CAMPAIGN_CATCH_ALL_MIN_DAY (21) AND the recent 2-day live bounce rate is below
CAMPAIGN_CATCH_ALL_MAX_BOUNCE_PCT (8%) on a >=300-sent sample; auto-reverts to
the clean smtp_valid/send_confirmed pool on the next run if bounces spike.
Short window so a past disaster cannot block the rollout forever and a fresh
spike trips fast. CAMPAIGN_INCLUDE_CATCH_ALL=1/0 still hard-overrides.
USABLE_FILTER (static) -> usable_filter() (per-run, memoized, one DB probe).
IFTA/UCR SELECT_SQL -> _select_sql() so tc.usable_filter() resolves at call
time, not import. 13 logic unit tests pass; live dry-run decision = OFF
(day 15 < 21 and recent 2d bounce 42% from the aging-out Jun-16 disaster).
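The gate reduces to a few comparisons. In this sketch the thresholds are the defaults named above; the signature, the override argument standing in for CAMPAIGN_INCLUDE_CATCH_ALL, and treating an undersized sample as OFF are assumptions.

```python
from typing import Optional


def catch_all_enabled(
    warmup_day: int,
    sent_2d: int,
    bounced_2d: int,
    override: Optional[str] = None,   # stand-in for CAMPAIGN_INCLUDE_CATCH_ALL
    min_day: int = 21,
    max_bounce_pct: float = 8.0,
    min_sample: int = 300,
) -> bool:
    """Decide per run whether catch-all addresses join the sendable pool."""
    if override in ("0", "1"):
        return override == "1"        # hard override wins
    if warmup_day < min_day:
        return False
    if sent_2d < min_sample:
        return False                  # assumption: too small a sample keeps it OFF
    return 100.0 * bounced_2d / sent_2d < max_bounce_pct
```

With the live dry-run inputs (day 15, 42% recent bounce) the decision is OFF on both conditions.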
Reframe away from 'escape the FCC' optics that would draw enforcement attention:
- Header/flagbar: 'Move your VoIP home to Canada' / 'US obligations ride on your
upstream' (was 'no FCC reporting, no USAC, no S/S to run')
- Recast claims to 'CRTC regulatory home, not FCC' and scope the no-USF/no-499/
no-RMD claims to the Canadian-jurisdiction traffic (accurate for US-number
traffic, which rides on the compliant US upstream)
- STIR/SHAKEN bullet now explicitly pro-compliance: 'we don't help anyone dodge
call-authentication; upstream partners are fully S/S compliant'
- Drop 'outside the FCC's reach'
- Add honest caveat: Canada is not for short-duration/dialer traffic; Canadian
carriers are more stringent on ACD/ASR than carriers anywhere else; this is for
real conversational voice (UCaaS/PBX/business/residential/live-agent)
Pivot from the hedge/second-entity framing to the consolidation pitch: one CRTC
carrier as the home base, nexus in Canada, customers onboarded from anywhere.
Lead value props with the three concrete reseller realities:
- No FCC reporting (no 499-A/Q, no RMD recert)
- No USAC/USF on your revenue (contribution sits upstream)
- No STIR/SHAKEN to set up or run (reseller can't get a US token; upstream signs)
Add: No FCC Section 214 / no ongoing 214 burden -- CRTC BITS is a cheap,
low-burden notification by comparison. Header/subject reworked; keeps the honest
US-termination + upstream-signing explanation.
Address the two most common objections truthfully (researched against CRTC,
FCC 2025 Third-Party Authentication Order, and STIR/SHAKEN cross-border docs):
- US-based long-distance termination operators routinely accept traffic from
Canadian carriers (cross-border voice is a standard interconnect).
- STIR/SHAKEN: a Canadian reseller cannot get a US SPC token (US-carrier-only),
so US-bound calls are signed by the upstream US-number provider that assigns
the DIDs -- exactly how most small US carriers already rely on upstream
signing. Canadian-origin traffic falls under the lighter CRTC regime, handled
by the upstream Canadian carrier. Does NOT claim S/S disappears -- it moves to
the upstream, off the carrier's day-to-day operation.
Address the obvious 'but I need US numbers' objection: several Canadian
wholesale carriers (Fibernetics, Iristel, VoIP.ms, Telnyx, Bandwidth, Twilio,
Frontier) provision US DIDs to CRTC-registered carriers, so they can keep
serving US customers from the Canadian entity. Adds a Canada-advantage bullet
and updates the guide block to call out both US + Canadian DIDs.
The FCC's 2025 Robocall Mitigation Order (47 CFR 64.1200(n)(4), FCC 25-6)
requires collecting + authenticating a government-issued photo ID for every
new customer before turning up voice service. Add it to the US-carrier burden
list and the matching 'does not apply in Canada' advantage.
- campaign_helpers.py: extract the branded Listmonk HTML helpers (hdr/flagbar/
stats/cta/footer/P/UL/etc.) + create_campaign() from create_campaigns.py into
a side-effect-free shared module; create_campaign() now takes an altbody so
every campaign ships a plaintext alternative (deliverability).
- create_crtc_usf_campaign.py: build the one-off CRTC email hooked on the Q3
2026 USF factor (38.8%, +1.8pts, eff Jul 1), with a $200-off CANADA200 banner
(expires Fri 23:59 ET, CTA links carry ?code= for auto-apply), the full US
carrier burden vs Canada advantage, BC/ON incorporation, and a hosted
carrier-guide PDF download. Creates a DRAFT only; sending stays manual.
- checkout.ts: generalize ensureCompliancePortalUser -> ensurePortalUser and
call it in the CRTC post-payment path so PayPal/crypto/webhook-confirmed CRTC
orders always get an ERPNext Customer + Website User (the single source of
truth for portal login/password), matching the compliance fix from the
PayPal incident. Also flip portal_user_created for canada_crtc/formation.
- canada-crtc.ts: enforce discount active+start/expiry windows, global usage
limit and applies_to scope server-side at checkout (was active-only), so a
promo like CANADA200 actually stops working after its expiry.
- scripts/generate_canada_carrier_guide_pdf.py: render the public Canadian
wholesale carrier/vendor guide PDF (reuses the canonical VENDORS list) to
site/public/guides/canada-carrier-guide.pdf for the CRTC campaign lead magnet.
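A minimal sketch of the server-side discount checks described for canada-crtc.ts
above. It is written in Python for illustration only (the real code is
TypeScript), and the Discount fields and reason strings are hypothetical names,
not the actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Optional


@dataclass
class Discount:
    # Hypothetical shape; field names are assumptions, not the real table.
    code: str
    active: bool = True
    starts_at: Optional[datetime] = None
    expires_at: Optional[datetime] = None
    usage_limit: Optional[int] = None      # None = unlimited
    used_count: int = 0
    applies_to: Optional[FrozenSet[str]] = None  # None = any product


def discount_rejection(d: Discount, product: str, now: datetime) -> Optional[str]:
    """Return a reason string if the code must be refused, else None."""
    if not d.active:
        return "inactive"
    if d.starts_at is not None and now < d.starts_at:
        return "not_started"
    if d.expires_at is not None and now >= d.expires_at:
        return "expired"
    if d.usage_limit is not None and d.used_count >= d.usage_limit:
        return "usage_limit_reached"
    if d.applies_to is not None and product not in d.applies_to:
        return "out_of_scope"
    return None
```

The point of the ordering is that an active flag alone is never sufficient:
every check must pass at checkout time, so an expired promo is refused even if
nobody remembered to deactivate it.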
The daily 40%-off coupon was being merged into every trucking/UCR/IFTA/OTC
send, but those discount sends were not actually being delivered (the
DKIM-broken window). Now that deliverability is fixed, re-test whether
normal-price offers convert before giving margin away.
New CAMPAIGN_ENABLE_COUPON env flag (default OFF) gates daily-coupon
minting in build_trucking_campaigns + the UCR/IFTA/OTC builders (which
import it as tc.COUPON_ENABLED). With it off, no code is minted and an
empty coupon_code is merged -> the campaign templates' existing
{{ if .Subscriber.Attribs.coupon_code }} guard falls through to the
normal-price {{ else }} branch and landing-page links carry no ?code=.
No template or DB changes; fully reversible (set CAMPAIGN_ENABLE_COUPON=1).
Verified: COUPON_ENABLED defaults to False, coupon_attribs(None) -> empty,
lp_link drops ?code= when no coupon is active, all 4 builders compile.
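A rough sketch of how the flag gates the coupon, assuming the helpers look
roughly like this. The accepted truthy values, the lp_link() signature, and the
example slug/DOT/code are assumptions for illustration; only the env var name,
coupon_attribs(None) -> empty, and the "?code= only when a coupon is on"
behaviour come from this change.

```python
import os
from urllib.parse import urlencode


def coupon_enabled(env=None):
    """True only when CAMPAIGN_ENABLE_COUPON is explicitly switched on (default OFF)."""
    env = os.environ if env is None else env
    # Assumption: "1" is the documented value; the other spellings are a convenience.
    return env.get("CAMPAIGN_ENABLE_COUPON", "").strip().lower() in {"1", "true", "yes", "on"}


def coupon_attribs(code):
    """Attribs merged into the subscriber; an empty coupon_code makes the
    template's {{ if .Subscriber.Attribs.coupon_code }} guard fall through."""
    return {"coupon_code": code or ""}


def lp_link(slug, dot, code=None):
    """Landing-page link; ?code= is appended only when a coupon is active."""
    params = {"dot": dot}
    if code:
        params["code"] = code
    return f"/order/{slug}?{urlencode(params)}"
```

With the flag off the builder would pass code=None everywhere, so the same
template renders the normal-price branch without any template or DB change.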
SMTP2GO is no longer used: Listmonk relays through the local Postfix MTA
(172.18.0.1:25 from the Docker network), which DKIM-signs and delivers
direct-to-recipient-MX; transactional mail goes through Carbonio. Verified
zero smtp2go in any live container env + postfix has no external relayhost.
Removed the stale references so a rebuild/new dev can't re-introduce it:
- api/src/config.ts: SMTP_HOST default mail.smtp2go.com -> co.carrierone.com
- scripts/workers/crypto_payment_worker.py: same default fix
- infra/ansible all.yml: listmonk_smtp_* now 172.18.0.1:25, no auth (+comment)
- app.env.j2 / email.ts / crm.md / go-live-todo.md / architecture.svg: docs
All transactional/worker senders built multipart/alternative (or mixed)
messages with ONLY an HTML part. A single-part multipart/alternative is
malformed and HTML-only mail is a spam-score signal -- the same class of
deliverability bug that hurt the campaign pipeline, but on the telecom /
filing / customer-transactional path (499-Q reminders, RMD/FCC filing
review links, intake/completion/delivery emails, commissions, etc).
- worker_email.send_worker_email: auto-derive plaintext from HTML when
caller omits text= (fixes the shared helper for all current+future use)
- 16 rolled-their-own senders in scripts/workers/** + scripts/formation/
document_delivery.py: attach html_to_text(...) plaintext sibling before
the HTML part (job_server + document_delivery wrap text+html in an
alternative sub-part so PDFs still attach to the mixed root)
- api/src/email.ts: add dependency-free htmlToText() and default
sendEmail text to it (fixes checkout/webhook HTML-only sends)
Verified: all py files compile + import at runtime, api tsc passes,
htmlToText handles hrefs/lists/entities, 11 plaintext unit tests pass.
Telecom campaign 407 (Jun 8) was HTML-only + sent in the DKIM-broken
window -> 384 sent / 0 clicks (same junked-mail signature).
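For reference, a minimal sketch of the intended MIME layout using the standard
library's EmailMessage. build_message() and its parameters are hypothetical
names, not the signature of send_worker_email; the structure (text/plain before
text/html inside an alternative part, PDFs on a mixed root) is what the bullets
above describe.

```python
from email.message import EmailMessage


def build_message(subject, sender, to, html, text, pdf_bytes=None, pdf_name="document.pdf"):
    """Build text+html as multipart/alternative; wrap in multipart/mixed if a PDF is attached."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = to
    # Plaintext goes first: clients pick the LAST alternative they can render.
    msg.set_content(text)
    msg.add_alternative(html, subtype="html")
    if pdf_bytes is not None:
        # add_attachment() promotes the root to multipart/mixed and keeps the
        # text+html pair together as an alternative sub-part.
        msg.add_attachment(pdf_bytes, maintype="application", subtype="pdf", filename=pdf_name)
    return msg
```

Senders that assemble MIME parts by hand need the same nesting; attaching the
PDF directly to a multipart/alternative root would present it as just another
rendering of the body.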
The FMCSA census was a one-time snapshot (last loaded ~May 30) with NO refresh
timer -- carriers newly falling out of MCS-150/UCR compliance were never picked
up. New scripts/workers/fmcsa_source_refresh.py orchestrates the full pipeline
(census download -> enrichment -> deficiency flag -> verify new emails ->
MX-tag new) and runs weekly via cron pw-fmcsa-refresh (Sun 09:00 UTC), codified
in the mail-pipeline Ansible role.
Idempotent + incremental: the census upsert preserves email_verified /
listmonk_sent_at / deficiency_flags, so existing carriers keep their send state
and only census fields refresh; new DOTs flow into verification then campaigns.
A carrier who refiled gets a fresh mcs150_parsed, so the builder's overdue
WHERE clause stops targeting them automatically. Verify is capped per run
(20k) so it never stalls on millions of rows.
(Healthcare already auto-catches newly-revalidation-overdue providers within
its 63k institutional pool via pw-hc-refresh Mon/Wed/Fri.)
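A small sketch of the "refresh census fields, keep send state" upsert. It uses
sqlite3 so it runs standalone; the production database and the real column set
are not shown here, and dot_number/legal_name are hypothetical columns. The
technique is simply that the conflict branch names only census columns.

```python
import sqlite3

# Hypothetical minimal schema; only the column names quoted in the text are real.
SCHEMA = """
CREATE TABLE IF NOT EXISTS fmcsa_carriers (
    dot_number       INTEGER PRIMARY KEY,
    legal_name       TEXT,
    mcs150_parsed    TEXT,
    email_verified   INTEGER,
    listmonk_sent_at TEXT,
    deficiency_flags TEXT
)
"""

# Only census fields appear in DO UPDATE SET, so email_verified,
# listmonk_sent_at and deficiency_flags survive every refresh untouched.
UPSERT = """
INSERT INTO fmcsa_carriers (dot_number, legal_name, mcs150_parsed)
VALUES (:dot_number, :legal_name, :mcs150_parsed)
ON CONFLICT(dot_number) DO UPDATE SET
    legal_name    = excluded.legal_name,
    mcs150_parsed = excluded.mcs150_parsed
"""


def upsert_census(conn, rows):
    """Insert new DOTs; for existing DOTs refresh the census fields only."""
    conn.execute(SCHEMA)
    conn.executemany(UPSERT, rows)
    conn.commit()
```

Running the same census file twice leaves the table unchanged, which is what
makes a weekly cron safe to re-run after a partial failure.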
The anchor regex only matched quoted hrefs, so an unquoted link (href=URL) lost
its URL in the plaintext part. It now handles double-quoted, single-quoted, and
unquoted hrefs. Added
scripts/test_email_plaintext.py (11 cases: link forms, mailto, template-tag
preservation, tag stripping, entity unescape, blank-line collapse).
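A simplified sketch of an anchor pattern that covers all three href forms. This
is not the code in scripts/_email_plaintext.py; the regex and the reduced
html_to_text() below are assumptions written to show the technique (one
alternation group per quoting style, then 'text (url)' output).

```python
import html
import re

# Three alternatives for the href value: "double", 'single', or unquoted.
_ANCHOR = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s>]+))[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def _link_to_text(match):
    url = match.group(1) or match.group(2) or match.group(3) or ""
    label = re.sub(r"<[^>]+>", "", match.group(4)).strip()
    # Avoid "url (url)" when the link text is the URL itself.
    return f"{label} ({url})" if label and label != url else url


def html_to_text(source):
    text = _ANCHOR.sub(_link_to_text, source)
    text = re.sub(r"(?i)<br\s*/?>|</p>|</div>|</li>", "\n", text)  # block ends -> newline
    text = re.sub(r"<[^>]+>", "", text)                            # strip remaining tags
    text = html.unescape(text)                                     # &amp; -> &
    text = re.sub(r"[ \t]+\n", "\n", text)
    return re.sub(r"\n{3,}", "\n\n", text).strip()                 # collapse blank runs
```

Listmonk template tags such as {{ UnsubscribeURL }} contain no angle brackets,
so the tag-stripping step leaves them intact inside a quoted href.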
Added DEAD_ISP_DOMAINS (52 domains) to BLOCKED_EMAIL_DOMAINS, so every
campaign builder that imports the shared exclusions (trucking, UCR, IFTA via
create_and_schedule_campaign, and the healthcare importer) stops cold-mailing
them. Domains were identified from our own Listmonk bounce table (top bounced
recipient domains) cross-checked against ISP status: defunct dial-up brands
(earthlink, netzero, juno, mindspring...), Qwest/Embarq legacy, satellite
(hughes, wildblue, dishmail), Altice/Suddenlink rural, WOW!/Knology, small
rural ISPs (windstream, tds, iowatelecom...) and Alaska regional.
Deliberately keeps still-active large consumer ISPs (comcast/charter/cox/
centurylink) -- their bounces were the cold-IP/no-DKIM reputation problem
(now fixed), not dead mailboxes, and they carry real prospects.
Part of the email-deliverability incident hardening.
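Illustrative sketch of the shared exclusion check. The four domains below are a
guessed subset standing in for the real 52-entry DEAD_ISP_DOMAINS list (the
exact domain spellings are assumptions), and is_blocked()/filter_recipients()
are hypothetical helper names.

```python
# Illustrative subset only; the real DEAD_ISP_DOMAINS list has 52 entries.
DEAD_ISP_DOMAINS = frozenset({"earthlink.net", "netzero.net", "juno.com", "mindspring.com"})

# Placeholder for the pre-existing blocked set, extended with the dead ISPs.
BLOCKED_EMAIL_DOMAINS = frozenset({"example.invalid"}) | DEAD_ISP_DOMAINS


def is_blocked(email, blocked=BLOCKED_EMAIL_DOMAINS):
    """True when the address is malformed or its domain is on the exclusion list."""
    _local, sep, domain = email.rpartition("@")
    if not sep:
        return True  # no "@": not a usable address
    return domain.strip().lower() in blocked


def filter_recipients(emails, blocked=BLOCKED_EMAIL_DOMAINS):
    """Drop blocked addresses while keeping the original order."""
    return [e for e in emails if not is_blocked(e, blocked)]
```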
Two deliverability hardening fixes from the email audit:
1. Plaintext (altbody): all campaigns were HTML-only. Listmonk only emits
multipart/alternative when altbody is set, and HTML-only bulk mail is a
spam-score signal. New scripts/_email_plaintext.py renders a readable
text/plain part from the HTML body (dependency-free; preserves Listmonk
{{ .Subscriber }}/{{ UnsubscribeURL }} template tags, turns links into
'text (url)'). Wired into the trucking builder (and thus UCR + IFTA, which
reuse create_and_schedule_campaign) and the healthcare builder.
2. Stable container hostname: Listmonk derived its Message-ID from the random
docker container id -> @localhost.localdomain (spam-score signal). Pin both
listmonk + listmonk-hc hostname to perfwest.performancewest.net, matching
Listmonk's SMTP hello_hostname.
Part of the email-deliverability incident hardening.
The verifier returned (True, 'mx_unreachable') when it couldn't complete a port-25
probe to ANY MX, which marked 438,163 addresses email_verified=TRUE with no
evidence of deliverability. These addresses are not provably dead either: they
are dominated by Comcast (13.7k), AT&T/SBCGlobal (13.5k), Verizon, Cox, Charter,
Frontier, etc. -- major ISPs that deliberately tarpit/refuse probes from unknown
IPs. Confirmed from prod: comcast MX connects + returns 220. A failed probe
proves neither deliverable nor undeliverable.
Fix: return (False, 'mx_probe_blocked') — MX exists, deliverability UNKNOWN, must
be confirmed by a real send. Excluded from PW campaigns; prime burner-verification
target (burner_list_verify upgrades it to send_confirmed on delivery). Existing
438,163 mx_unreachable rows reclassified in prod to mx_probe_blocked / verified=FALSE.
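A sketch of the corrected classification, with the network probe injected as a
callable so it can run offline. classify_probe(), the probe() contract, and the
'no_mx' / 'smtp_invalid' labels are assumptions for illustration; 'smtp_valid'
and 'mx_probe_blocked' are the result names used in this change. Catch-all
detection is out of scope here.

```python
def classify_probe(mx_hosts, probe):
    """Return (email_verified, email_verify_result) for one address.

    probe(host) is assumed to return the SMTP RCPT reply code as an int,
    or None when the connection was refused, tarpitted, or timed out.
    """
    if not mx_hosts:
        return (False, "no_mx")
    for host in mx_hosts:
        code = probe(host)
        if code is None:
            continue  # this MX would not talk to us; try the next one
        if 200 <= code < 300:
            return (True, "smtp_valid")
        if 500 <= code < 600:
            return (False, "smtp_invalid")
        # 4xx and anything else is inconclusive; keep trying other MXes.
    # MX exists but no probe completed: deliverability UNKNOWN, never verified.
    return (False, "mx_probe_blocked")
```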
The smtp_valid pool is only ~3k unsent — too small to sustain campaigns. SMTP
probing can't confirm catch-all/mx_unreachable deliverability; only a REAL send
can. burner_list_verify.py reconciles a verification send from a DISPOSABLE burner
domain (isolated from PW/carrierone reputation):
- hard bounce -> fmcsa_carriers.email_verify_result='hard_bounced' (excluded)
- delivered -> 'send_confirmed' (proven deliverable; PW campaigns send to it)
It tails the burner MTA mail.log (reuses bounce-watcher's status= pattern) and
writes back idempotently. The PW trucking filter now treats smtp_valid +
send_confirmed as sendable. docs/campaign-deliverability-plan.md captures the full
diagnosis, the burner design, and CAN-SPAM guardrails.
Remaining (needs a domain + isolated MTA identity — operator/infra decision):
stand up the burner domain, the verification-send worker, and a writeback cron.
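A sketch of the log-line mapping the reconciler performs, assuming standard
Postfix delivery lines (to=<...>, dsn=x.y.z, status=...). The regex and
parse_delivery() are illustrative, not the bounce-watcher pattern itself; the
two result values are the ones named above.

```python
import re

# Matches the recipient, DSN code and final status of a Postfix delivery line.
_DELIVERY = re.compile(
    r"to=<(?P<rcpt>[^>]+)>.*?\bdsn=(?P<dsn>\d\.\d+\.\d+).*?\bstatus=(?P<status>\w+)"
)


def parse_delivery(line):
    """Return (email, email_verify_result) or None if the line is not conclusive."""
    m = _DELIVERY.search(line)
    if m is None:
        return None
    status, dsn = m["status"], m["dsn"]
    if status == "sent":
        result = "send_confirmed"      # remote MX accepted the message
    elif status == "bounced" and dsn.startswith("5."):
        result = "hard_bounced"        # permanent failure only
    else:
        return None                    # deferred / soft failures: wait for a final status
    return m["rcpt"].lower(), result
```

Because each line maps to a single (email, result) pair, writing it back with
an UPDATE keyed on the address is naturally idempotent when the log is re-read.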
Root cause of zero conversions since Jun 9 + the Gmail/Outlook block storm:
the send filter was '(email_verified IS TRUE OR result IN ...)'. The verifier
sets email_verified=TRUE optimistically for mx_unreachable (domain exists but
its mail server never answered the RCPT probe) — 438,163 such rows. Those HARD
BOUNCE on send, producing ~1,100 bounces/day (~47% rate) and blocklisting half
the 120k subscriber base, so real prospects never saw the offer.
Fix: key the send filter ONLY off email_verify_result, never the broken boolean.
Recovery mode (default): send only 'smtp_valid' to drive bounce rate to ~0 and
rebuild reputation; set CAMPAIGN_INCLUDE_CATCH_ALL=1 to re-add catch-all domains
once recovered. Mirrors the healthcare list-cleaning approach (HC bounces ~2-3%,
which proves the fix). Note: only ~3k smtp_valid unsent remain — list growth via
real-send bounce verification (separate burner domain) is the follow-up.
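A sketch of the recovery-mode filter, assuming the builder assembles a WHERE
fragment from an allow-list. The function names, the 'catch_all' label, and the
%s placeholder style are assumptions; the key point from this fix is that
email_verified never appears in the condition.

```python
import os


def sendable_results(env=None):
    """Allow-list of email_verify_result values; recovery mode is smtp_valid only."""
    env = os.environ if env is None else env
    results = ["smtp_valid"]
    if env.get("CAMPAIGN_INCLUDE_CATCH_ALL") == "1":
        results.append("catch_all")  # hypothetical label for catch-all domains
    return results


def send_filter_sql(results):
    """Return (where_fragment, params) keyed ONLY on email_verify_result."""
    if not results:
        raise ValueError("refusing to build a send filter with an empty allow-list")
    placeholders = ", ".join(["%s"] * len(results))
    return f"email_verify_result IN ({placeholders})", list(results)
```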
The MCS-150 intake-completion email linked customers to /order/dot-compliance,
which is the sales/checkout page -- it ignores ?order= and asks the customer to
re-pick services and pay again, so they 'cannot enter any data' (Paul Wilson's
report). Link to the per-service intake wizard /order/<slug>?order=... instead,
which loads the paid order, pre-fills from the FMCSA census, and skips the
payment step.
Also add a Trailers field to the DOT intake fleet section and wire it through to
the MCS-150 PDF Q26 trailer row, so carriers can update trucks AND trailers.
- Added a line to the SC COC email asking the carrier to call their insurance
agent and confirm the insurer can file a Form E before clicking yes/no, so we
pick the right path the first time.
- Reply-To now routes to info@performancewest.net (monitored), overridable via
SC_COC_REPLY_TO env.
The Dockerfile copies form PDFs explicitly by name; the SC COC template was
missing, so fill_sc_coc() would raise FileNotFoundError in the container. Added
it.
Routes SC intrastate-authority orders to the real SCDMV COC product instead of a
PSC certificate (which doesn't apply to property carriers):
- sc_coc_filing.py: emails the carrier a one-click yes/no question: does your
insurer have, or can they file, a Form E (SC intrastate liability, $750k or
$300k by GVWR) with SCDMV? Records the answer; builds the filled COC package.
- state_trucking._handle_sc_coc_gate: SC intrastate gate:
    no answer    -> email the question once, HOLD
    answered no  -> broker referral opened, HOLD (ops todo)
    answered yes -> proceed to bill the exact $25 SCDMV COC fee (at cost) + file
- API POST /compliance-orders/:id/sc-insurance: records yes/no in intake_data
(no schema change); NO opens an insurance_lead broker-referral ticket +
Telegram; YES re-dispatches the worker to bill the $25 + file.
- site/order/sc-insurance: customer one-click yes/no page (auto-submits when
the email links straight to ?have=yes|no).
Non-SC intrastate still uses the PSC/PUC email path or a manual todo.
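The SC gate described above is a three-state decision; a minimal sketch of it
follows. The intake_data key, the action names, and the (action, hold) return
shape are hypothetical, chosen only to make the branches explicit.

```python
SC_COC_FEE_USD = 25  # SCDMV COC new-application fee, billed at cost


def sc_coc_gate(intake_data, question_already_sent):
    """Return (action, hold) for an SC intrastate-authority order."""
    answer = intake_data.get("sc_form_e_answer")  # hypothetical key: "yes" / "no" / absent
    if answer is None:
        # Ask exactly once, then keep holding until the carrier answers.
        action = "wait_for_answer" if question_already_sent else "email_question"
        return action, True
    if answer == "no":
        return "broker_referral", True   # ops todo; order stays on hold
    if answer == "yes":
        return "bill_and_file", False    # bill SC_COC_FEE_USD, then file the COC
    raise ValueError(f"unexpected Form E answer: {answer!r}")
```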
SC for-hire PROPERTY carriers (not passenger/HHG/hazwaste) register intrastate
via the SCDMV Certificate of Compliance (COC), not a PSC certificate. This adds:
- sc_coc_pdf_filler.fill_sc_coc(): fills the official SCDMV Form COC from
intake (business name, officers, physical/mailing address, phone), picks
New vs Renewal, and stamps the coverage class (E-L low-value / E-LC).
Field names in the source PDF are auto-generated + offset from their labels;
mapped here by verified on-page geometry. Verified by render.
- suggest_coverage_class(): E-L for low-value cargo (scrap/dump/aggregate),
else E-LC (safer default).
- gov_fee: SC intrastate fee corrected from $0 placeholder to the real $25
COC new-application fee (renewals $0), billed at cost.
The carrier's INSURER files the Form E (liability) + Form H (cargo, E-LC only)
directly with SCDMV; we collect the COC app + $25 and submit it.
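A guess at the shape of the coverage-class rule, for illustration. The keyword
set and the "every declared cargo type must be low-value" condition are
assumptions; the text above only states E-L for low-value cargo
(scrap/dump/aggregate) and E-LC as the safer default.

```python
# Assumed keyword list, taken from the examples in the commit text.
LOW_VALUE_CARGO = frozenset({"scrap", "dump", "aggregate"})


def suggest_coverage_class(cargo_types):
    """Suggest 'E-L' only when ALL declared cargo is low-value; else 'E-LC'."""
    cargo = {c.strip().lower() for c in cargo_types if c and c.strip()}
    if cargo and cargo <= LOW_VALUE_CARGO:
        return "E-L"
    # Unknown, empty, or mixed cargo falls back to the safer class, which
    # also requires the insurer's Form H (cargo) filing.
    return "E-LC"
```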