Diagnosing zero healthcare sales (11k sent, 5479 opens, 0 clicks, 0 orders).
Root cause of clicks=0: Listmonk only registers a link for tracking when the
href ends with the literal @TrackLink marker; all 10 hc templates lacked it
(trucking/CRTC have it). So the entire funnel was unmeasurable below 'open'.
Changes:
- Click tracking: append @TrackLink + UTM to every /order/ CTA across all 10
templates (external gov self-verify links left untracked on purpose).
- Remove all service prices from emails (99/49/49/99yr/9mo). Price is
now revealed on the order page after value is established; catalog
(api/src/service-catalog.ts) stays source of truth. Kept the 0,000 OIG
penalty stat (regulatory fact, not our price). Added a neutral 'flat fee shown
up front' reassurance block where the fee table used to be.
- Compliance/honesty: the nppes_outdated email asserted a per-record
'FLAGGED OUT OF DATE / detected' status, but its selector only checks
deliverability and the data has no NPPES last-updated field -> unsubstantiated
for every recipient. Reframed to a generally-true periodic-attestation message
('PERIODIC REVIEW REQUIRED', 'most practices drift out of date'). Same hedging
applied to npi_reactivation ('may be deactivated ... confirm on official
sources'). Substantiated reval 'past due' claims (backed by the public CMS
Revalidation list) were kept.
- Fixed stale $299 OIG metadata in build script -> $79/mo (reference only).
Docs: docs/healthcare-competitive-pricing.md (benchmark research) and
docs/healthcare-email-compliance-review.md (CAN-SPAM / FTC / impersonation pass;
flags SOC2/HIPAA/PCI badge claims for owner confirmation).
Verified headless: all 10 render with 0 JS errors, exactly 1 tracked CTA each,
no price leaks.
Completes the MX-exclusion plan. Untagged carriers can't be excluded (the big-MX
gate is MX-based, so an unresolved Google/Yahoo domain would slip through), and
were previously UNCAPPED in select_sendable_carriers -- a flood of freshly-imported,
never-resolved domains could dominate a run before pw-mx-tag resolves them.
Added a single shared untagged_cap (env MAIN_UNTAGGED_MX_CAP, default max(quota,200))
so untagged sends are bounded without starving the pool: at the default the bucket
can still fill an entire run's quota (no behavior change today), but the cap can be
tightened to a fraction once pw-mx-tag has drained the backlog -- which is fast,
since only ~3,035 distinct *verified-sendable* untagged domains remain (< one
20k/day tag run). Tagged carriers keep their per-operator caps unchanged.
Verified: compiles; cap logic never starves at default, enforces the limit when
set lower.
Fix 1 (build_trucking_campaigns.py): the warmup big-MX exclusion only covered the
clean-label operators (google/microsoft/proofpoint/...). Consumer mailbox
operators that mx_tag_carriers.py labels with an "mx:" prefix slipped BOTH the
exclusion and the per-MX throttle -- notably mx:yahoodns.net (283k sendable
carriers = Yahoo Small Business/AOL custom domains) and mx:icloud.com (25k), plus
comcast/charter/centurylink/windstream/tds/earthlink. These are custom domains
whose MX points at a consumer provider, invisible to the literal-domain blocklist.
Added CONSUMER_MX_OPERATORS, folded into WARMUP_EXCLUDE_OPERATORS used by both the
fetch_carriers() exclusion SQL and mx_daily_caps() (same day-30 ramp). Behind the
existing MAIN_SKIP_BIG_MX switch.
Validated read-only: after the fix the warmup-eligible pool is 353,909 carriers
(315,892 untagged + ~38k genuinely small/self-hosted operators), so the long tail
still sustains the daily quota -- not starved -- while 0 consumer-MX carriers are
selected during warmup.
Fix 3 (infra/cron/pw-mx-tag): mx_tag_carriers.py was on no cron, so the untagged
(NULL) backlog (~316k) never drained and new FMCSA imports stayed untagged,
slowly re-opening the gap. Added a daily 05:45 UTC cron (--only-unsent
--limit-domains 20000), before the 08:00 builder. Idempotent/bounded (only tags
mx_provider IS NULL). Verified live: a 200-domain test run tagged 216 domains.
(Fix 2 -- bounding the NULL bucket cap -- deferred; the cron will drain it.)
Campaign 509 (CRTC USF Q3, 4,156 sent) shipped with raw <a href> URLs, so
Listmonk never registered the links and recorded ZERO clicks -- even though
Umami logged the real order-page visits AND a carrier phoned in after clicking.
Same mistake docs/fmcsa-trucking-plan.md already flagged ("Use @TrackLink on all
CTAs"); the trucking campaigns do it, the CRTC one didn't.
Listmonk only tracks a link when its href ends with the literal @TrackLink marker
(it strips it and rewrites through lists.performancewest.net/link/). Added a
_track() helper that appends UTM params (so Umami attributes the visit too) +
@TrackLink, applied to both the primary order CTA and the guide-PDF download.
The running campaign 509's body was also patched live in the DB (same two links)
so its remaining sends record clicks. Future CRTC campaigns get it from source.
First live ingest (28 reports) showed our warmup rotation pool (.91-.109, out0x)
mislabeled EXTERNAL because OUR_IPS only listed 4 specific IPs -- every one was
100% DMARC-passing, clearly ours, and would have generated false spoofing alerts.
Replace the literal-IP set with an ipaddress subnet check on 207.174.124.0/24
(our whole block). The only genuinely-external failing sender is 35.174.145.124
(AWS, 32 msgs spoofing us, SPF-fail/no-DKIM, all correctly rejected by p=reject) --
exactly the signal the --alert path is meant to surface.
Tool 2 of the deliverability monitoring pair (Tool 1 = mail_reputation_monitor).
DMARC rua reports from dozens of operators (Google, Yahoo, Comcast, Cox, Bell,
Mimecast, Cisco ESA, GMX, mail.com, ...) were landing in ops@ (dmarc@ was a DL),
burying real mail and never parsed. Now ingested + queryable:
- dmarc@performancewest.net converted DL -> dedicated Carbonio mailbox; isolated
IMAP creds in server .env, surfaced to workers in docker-compose.yml (mirrors
OPS_IMAP_*). 29 historical reports moved ops@ -> dmarc@ via IMAP.
- scripts/dmarc_report_parser.py: IMAP fetch unseen -> decompress .gz/.zip/.xml
(namespace-agnostic: classic + urn:ietf:params:xml:ns:dmarc-2.0 GMX/mail.com) ->
parse aggregate XML -> upsert dmarc_report (keyed (org_name,report_id), no-op on
re-parse) + dmarc_record per source IP. dmarc_pass = dkim_aligned OR spf_aligned.
Marks \Seen. --dry-run/--all/--alert (7d per-IP summary + Telegram if one of OUR
IPs <95% pass, or EXTERNAL IP sends >=20 failing msgs as us = spoofing under
p=reject). psycopg2 imported lazily so --dry-run runs without the driver.
- api/migrations/102_dmarc_aggregate.sql: dmarc_report + dmarc_record tables.
- infra/cron/pw-dmarc-parser: 06:20 UTC daily --alert (after reputation, before scrub).
- docs/deliverability.md: DMARC section DONE; query examples.
Verified: dry-run --all parses all 28 reports (1 non-report test probe), 0 unknown
after the namespace fix.
Adds scripts/mail_reputation_monitor.py + migration 101 (mail_reputation_daily).
Sender reputation is judged by the RECEIVING operator (Microsoft/Google/Yahoo/
Proofpoint), and the provider portals (SNDS/Postmaster/CFL) need a login and lag
24-48h. Our postfix logs already carry the ground truth in real time: every send
records the receiving host + SMTP response, and the response classifies WHY:
250 -> accepted
451 4.7.500 -> throttled (Microsoft rate-limiting a cold IP)
550 5.7.x -> reject_reputation (spam/reputation)
550 5.1.1/5.4.1-> reject_recipient (dead mailbox / access denied = list hygiene)
550 ...SPAM -> reject_content (SpamAssassin)
The parser classifies each egress delivery (out0x/hcout/relay) by (sending_ip,
receiver, outcome, reason_code) and upserts ONE daily aggregate row per bucket
(idempotent ON CONFLICT), so a nightly cron over the rotated log gives a queryable
trend without re-parse double-counting. --alert prints a per-operator summary and
Telegram-alerts on regressions (>=10% reputation rejects, or Microsoft >=70%
throttled). Reads stdin ("-") so the host-owned /var/log/mail.log can be piped
into the DB-connected workers container.
Motivation: 2026-06-19 audit found ~80% of Microsoft sends were getting 451 4.7.500
throttles on the warming IPs -- this makes that trend visible as reputation recovers.
The fmcsa campaign builders already exclude gmail/yahoo/microsoft/etc. from NEW
audience selections, but two reputation leaks remained on the LIST-BASED side:
1. iCloud/Apple gap. icloud.com/me.com/mac.com were never in the exclusion set.
A 2026-06 Listmonk audit found 1,321 ENABLED iCloud subscribers on list 3
("FCC Carriers - Direct Contacts") -- the single largest enabled-consumer
bucket -- being cold-blasted with no exclusion at all. Add APPLE_CONSUMER_DOMAINS.
2. Stale already-imported consumer subs. List-based campaigns (e.g. the running
CRTC/USF blast on list 3) keep hitting consumer addresses imported BEFORE the
relevant domain joined the exclusion list. gmail.com was still the #1 bounce
domain via that campaign even though new selections exclude it. Add
scrub_listmonk_consumer.py: reconciles the live Listmonk subscriber table
against the authoritative exclusion list and blocklists any ENABLED subscriber
whose address is_blocked(). Idempotent; re-run whenever the exclusion grows so
it applies retroactively. Uses the same 'blocklisted' terminal state as the
bounce handler, so contacts are excluded from all current/future campaigns
without deleting history. Supports --dry-run and both listmonk / listmonk_hc.
Isolates bulk sending reputation onto a dedicated subdomain so the root domain
stays clean for transactional/verification mail (and recovers faster). Replies
still go to the root domain via Reply-To, so the customer-facing reply experience
is unchanged.
- build_trucking_campaigns.py: add env-overridable FROM_EMAIL
(noreply@send.performancewest.net); use it for both scheduled + test sends
instead of inheriting base["from_email"] from the DB base campaign.
- build_healthcare_campaigns_cron.py: FROM_EMAIL ->
compliance@send.performancewest.net (env-overridable).
- bounce-watcher.sh / hc-bounce-watcher.sh: track the new subdomain envelope
sender (keep legacy root-domain sender so the pre-cutover queue still drains;
HC also tracks by hcout transport regardless of sender).
Infra already live (separate, non-code): subdomain DNS (A/MX/SPF/DKIM
selector=send/DMARC p=reject) on the Hestia master, OpenDKIM signs
d=send.performancewest.net (verified end-to-end), egress .94/.107. Root SPF
trimmed to the real IPs; pointless IP-rehab cron disabled.
Runs the real _BaseNPIHandler.handle() with _create_todo monkeypatched (no DB /
ERPNext / email side effects) and asserts:
- first OIG/SAM screening has no [Monthly cycle] prefix / RECURRING banner
- a recurring_cycle order gets the [Monthly cycle] title prefix, the
"RECURRING MONTHLY CYCLE" banner, the invoice id, and the re-run-against-
CURRENT-data + issue-NEW-certificate instructions
- recurring_cycle works with and without an invoice id
- the bundle handler's first run is not flagged recurring
Verified passing both locally and inside the deployed workers container.
Convert OIG/SAM from one-time $299/yr to recurring $79/month (card+ACH only) -
the first real recurring-billing product in the system. Exclusion screening is
a *monthly* federal obligation, so recurring monitoring fits the requirement and
is the biggest valuation lever (vs a one-time annual run).
Catalog (single source of truth):
- service-catalog.ts: add billing_interval + allowed_methods to ComplianceService;
oig-sam-screening -> 7900c, billing_interval:"month", allowed_methods:[card,ach],
name "(Monthly Monitoring)".
- gen-service-catalog.py + check-service-catalog-drift.py: carry/guard the two new
fields; regenerate site catalog.
Checkout (api/src/routes/checkout.ts):
- mode:"subscription" with recurring price_data when billing_interval is set;
surcharge absorbed for recurring (clean $79/mo); server-side METHOD_NOT_ALLOWED
re-validation against allowed_methods.
- ensureColumns + migration 100: compliance_orders.stripe_subscription_id,
bundle_upsell_sent_at (+ subscription index).
Webhooks (api/src/routes/webhooks.ts):
- record stripe_subscription_id on checkout.session.completed (subscription mode).
- invoice.paid (subscription_cycle only) -> re-dispatch screening for the cycle;
invoice.payment_failed -> admin alert + first-failure customer nudge;
customer.subscription.deleted -> mark order cancelled. (API 2026-03-25 moved the
subscription link to invoice.parent.subscription_details.subscription.)
Fulfillment:
- job_server.py: pass recurring_cycle/invoice_id into the order.
- npi_provider.py: OIG handler labels renewal cycles "[Monthly cycle]" + re-screen
note; bundle action runs only the FIRST screening + flags the $79/mo upsell.
Bundle land-and-expand:
- Provider Compliance Bundle now includes only the first OIG/SAM screening (was
giving away $948/yr of monitoring inside an $899 bundle).
- new worker scripts/workers/bundle_upsell.py (+ pw-bundle-upsell timer): ~3 weeks
after a paid bundle, emails the customer to continue $79/mo monitoring; dedup via
bundle_upsell_sent_at; skips customers who already have an OIG/SAM order.
Surfaces updated to $79/mo: PaymentStep (filters methods, "Billed every month,
cancel anytime"), order pages, healthcare index, npi-compliance-check tool (also
fixed stale $699 bundle drift -> $899), hc_oig_screening + hc_compliance_bundle
emails.
Docs: billing.md gains a "Stripe-native Subscriptions" section + a reality-check
banner (Adyen/ERPNext-gateway model documented there is NOT live; Stripe is the
real rail). Fixed run-migrations.yml container name bug
(performancewest-postgres-1 -> performancewest-api-postgres-1, overridable).
Tests: api/tests/recurring-subscription.test.ts (28 assertions) covers catalog
gating, method validation, surcharge suppression, recurring line-item build,
invoiceSubscriptionId extraction, renewal-cycle gating. tsc clean; site build
clean; catalog drift OK.
Manual deploy step: enable invoice.paid, invoice.payment_failed,
customer.subscription.deleted on the Stripe webhook endpoint.
Replaces the panic-era burner-domain verification plan with an in-house
automatic catch-all rollout in the trucking/IFTA/UCR builders. Root-cause
classification of the 75k pre-DKIM-fix bounces showed ~55% were reputation/
auth (now fixed by DKIM signing) and only ~29% genuinely-dead mailboxes;
catch-all domains accept at RCPT time so they do not user-unknown bounce at
send, making a controlled in-house bleed safer than warming a separate burner.
catch_all_enabled() adds catch-all results only when warmup_day >=
CAMPAIGN_CATCH_ALL_MIN_DAY (21) AND the recent 2-day live bounce rate is below
CAMPAIGN_CATCH_ALL_MAX_BOUNCE_PCT (8%) on a >=300-sent sample; auto-reverts to
the clean smtp_valid/send_confirmed pool on the next run if bounces spike.
Short window so a past disaster cannot block the rollout forever and a fresh
spike trips fast. CAMPAIGN_INCLUDE_CATCH_ALL=1/0 still hard-overrides.
USABLE_FILTER (static) -> usable_filter() (per-run, memoized, one DB probe).
IFTA/UCR SELECT_SQL -> _select_sql() so tc.usable_filter() resolves at call
time, not import. 13 logic unit tests pass; live dry-run decision = OFF
(day 15 < 21 and recent 2d bounce 42% from the aging-out Jun-16 disaster).
Reframe away from 'escape the FCC' optics that would draw enforcement attention:
- Header/flagbar: 'Move your VoIP home to Canada' / 'US obligations ride on your
upstream' (was 'no FCC reporting, no USAC, no S/S to run')
- Recast claims to 'CRTC regulatory home, not FCC' and scope the no-USF/no-499/
no-RMD claims to the Canadian-jurisdiction traffic (accurate for US-number
traffic, which rides on the compliant US upstream)
- STIR/SHAKEN bullet now explicitly pro-compliance: 'we don't help anyone dodge
call-authentication; upstream partners are fully S/S compliant'
- Drop 'outside the FCC's reach'
- Add honest caveat: Canada is not for short-duration/dialer traffic; Canadian
carriers are more stringent on ACD/ASR than anywhere; this is for real
conversational voice (UCaaS/PBX/business/residential/live-agent)
Pivot from the hedge/second-entity framing to the consolidation pitch: one CRTC
carrier as the home base, nexus in Canada, customers onboarded from anywhere.
Lead value props with the three concrete reseller realities:
- No FCC reporting (no 499-A/Q, no RMD recert)
- No USAC/USF on your revenue (contribution sits upstream)
- No STIR/SHAKEN to set up or run (reseller can't get a US token; upstream signs)
Add: No FCC Section 214 / no ongoing 214 burden -- CRTC BITS is a cheap,
low-burden notification by comparison. Header/subject reworked; keeps the honest
US-termination + upstream-signing explanation.
Address the two most common objections truthfully (researched against CRTC,
FCC 2025 Third-Party Authentication Order, and STIR/SHAKEN cross-border docs):
- US-based long-distance termination operators routinely accept traffic from
Canadian carriers (cross-border voice is a standard interconnect).
- STIR/SHAKEN: a Canadian reseller cannot get a US SPC token (US-carrier-only),
so US-bound calls are signed by the upstream US-number provider that assigns
the DIDs -- exactly how most small US carriers already rely on upstream
signing. Canadian-origin traffic falls under the lighter CRTC regime, handled
by the upstream Canadian carrier. Does NOT claim S/S disappears -- it moves to
the upstream, off the carrier's day-to-day operation.
Address the obvious 'but I need US numbers' objection: several Canadian
wholesale carriers (Fibernetics, Iristel, VoIP.ms, Telnyx, Bandwidth, Twilio,
Frontier) provision US DIDs to CRTC-registered carriers, so they can keep
serving US customers from the Canadian entity. Adds a Canada-advantage bullet
and updates the guide block to call out both US + Canadian DIDs.
The FCC's 2025 Robocall Mitigation Order (47 CFR 64.1200(n)(4), FCC 25-6)
requires collecting + authenticating a government-issued photo ID for every
new customer before turning up voice service. Add it to the US-carrier burden
list and the matching 'does not apply in Canada' advantage.
- campaign_helpers.py: extract the branded Listmonk HTML helpers (hdr/flagbar/
stats/cta/footer/P/UL/etc.) + create_campaign() from create_campaigns.py into
a side-effect-free shared module; create_campaign() now takes an altbody so
every campaign ships a plaintext alternative (deliverability).
- create_crtc_usf_campaign.py: build the one-off CRTC email hooked on the Q3
2026 USF factor (38.8%, +1.8pts, eff Jul 1), with a $200-off CANADA200 banner
(expires Fri 23:59 ET, CTA links carry ?code= for auto-apply), the full US
carrier burden vs Canada advantage, BC/ON incorporation, and a hosted
carrier-guide PDF download. Creates a DRAFT only; sending stays manual.
- checkout.ts: generalize ensureCompliancePortalUser -> ensurePortalUser and
call it in the CRTC post-payment path so PayPal/crypto/webhook-confirmed CRTC
orders always get an ERPNext Customer + Website User (the single source of
truth for portal login/password), matching the compliance fix from the
PayPal incident. Also flip portal_user_created for canada_crtc/formation.
- canada-crtc.ts: enforce discount active+start/expiry windows, global usage
limit and applies_to scope server-side at checkout (was active-only), so a
promo like CANADA200 actually stops working after its expiry.
- scripts/generate_canada_carrier_guide_pdf.py: render the public Canadian
wholesale carrier/vendor guide PDF (reuses the canonical VENDORS list) to
site/public/guides/canada-carrier-guide.pdf for the CRTC campaign lead magnet.
The daily 40%-off coupon was being merged into every trucking/UCR/IFTA/OTC
send, but those discount sends were not actually being delivered (the
DKIM-broken window). Now that deliverability is fixed, re-test whether
normal-price offers convert before giving margin away.
New CAMPAIGN_ENABLE_COUPON env flag (default OFF) gates daily-coupon
minting in build_trucking_campaigns + the UCR/IFTA/OTC builders (which
import it as tc.COUPON_ENABLED). With it off, no code is minted and an
empty coupon_code is merged -> the campaign templates' existing
{{ if .Subscriber.Attribs.coupon_code }} guard falls through to the
normal-price {{ else }} branch and landing-page links carry no ?code=.
No template or DB changes; fully reversible (set CAMPAIGN_ENABLE_COUPON=1).
Verified: COUPON_ENABLED defaults False, coupon_attribs(None) -> empty,
lp_link drops ?code= when no coupon, all 4 builders compile.
SMTP2GO is no longer used: Listmonk relays through the local Postfix MTA
(172.18.0.1:25 from the Docker network), which DKIM-signs and delivers
direct-to-recipient-MX; transactional mail goes through Carbonio. Verified
zero smtp2go in any live container env + postfix has no external relayhost.
Removed the stale references so a rebuild/new dev can't re-introduce it:
- api/src/config.ts: SMTP_HOST default mail.smtp2go.com -> co.carrierone.com
- scripts/workers/crypto_payment_worker.py: same default fix
- infra/ansible all.yml: listmonk_smtp_* now 172.18.0.1:25, no auth (+comment)
- app.env.j2 / email.ts / crm.md / go-live-todo.md / architecture.svg: docs
All transactional/worker senders built multipart/alternative (or mixed)
messages with ONLY an HTML part. A single-part multipart/alternative is
malformed and HTML-only mail is a spam-score signal -- the same class of
deliverability bug that hurt the campaign pipeline, but on the telecom /
filing / customer-transactional path (499-Q reminders, RMD/FCC filing
review links, intake/completion/delivery emails, commissions, etc).
- worker_email.send_worker_email: auto-derive plaintext from HTML when
caller omits text= (fixes the shared helper for all current+future use)
- 16 rolled-their-own senders in scripts/workers/** + scripts/formation/
document_delivery.py: attach html_to_text(...) plaintext sibling before
the HTML part (job_server + document_delivery wrap text+html in an
alternative sub-part so PDFs still attach to the mixed root)
- api/src/email.ts: add dependency-free htmlToText() and default
sendEmail text to it (fixes checkout/webhook HTML-only sends)
Verified: all py files compile + import at runtime, api tsc passes,
htmlToText handles hrefs/lists/entities, 11 plaintext unit tests pass.
Telecom campaign 407 (Jun 8) was HTML-only + sent in the DKIM-broken
window -> 384 sent / 0 clicks (same junked-mail signature).
The FMCSA census was a one-time snapshot (last loaded ~May 30) with NO refresh
timer -- carriers newly falling out of MCS-150/UCR compliance were never picked
up. New scripts/workers/fmcsa_source_refresh.py orchestrates the full pipeline
(census download -> enrichment -> deficiency flag -> verify new emails ->
MX-tag new) and runs weekly via cron pw-fmcsa-refresh (Sun 09:00 UTC), codified
in the mail-pipeline Ansible role.
Idempotent + incremental: the census upsert preserves email_verified /
listmonk_sent_at / deficiency_flags, so existing carriers keep their send state
and only census fields refresh; new DOTs flow into verification then campaigns.
A carrier who refiled gets a fresh mcs150_parsed, so the builder's overdue
WHERE clause stops targeting them automatically. Verify is capped per run
(20k) so it never stalls on millions of rows.
(Healthcare already auto-catches newly-revalidation-overdue providers within
its 63k institutional pool via pw-hc-refresh Mon/Wed/Fri.)
The anchor regex only matched quoted hrefs; unquoted (href=URL) dropped the
URL from the plaintext part. Now handles double/single/unquoted. Added
scripts/test_email_plaintext.py (11 cases: link forms, mailto, template-tag
preservation, tag stripping, entity unescape, blank-line collapse).
Added DEAD_ISP_DOMAINS (52 domains) to BLOCKED_EMAIL_DOMAINS, so every
campaign builder that imports the shared exclusions (trucking, UCR, IFTA via
create_and_schedule_campaign, and the healthcare importer) stops cold-mailing
them. Domains were identified from our own Listmonk bounce table (top bounced
recipient domains) cross-checked against ISP status: defunct dial-up brands
(earthlink, netzero, juno, mindspring...), Qwest/Embarq legacy, satellite
(hughes, wildblue, dishmail), Altice/Suddenlink rural, WOW!/Knology, small
rural ISPs (windstream, tds, iowatelecom...) and Alaska regional.
Deliberately keeps still-active large consumer ISPs (comcast/charter/cox/
centurylink) -- their bounces were the cold-IP/no-DKIM reputation problem
(now fixed), not dead mailboxes, and they carry real prospects.
Part of the email-deliverability incident hardening.
Two deliverability hardening fixes from the email audit:
1. Plaintext (altbody): all campaigns were HTML-only. Listmonk only emits
multipart/alternative when altbody is set, and HTML-only bulk mail is a
spam-score signal. New scripts/_email_plaintext.py renders a readable
text/plain part from the HTML body (dependency-free; preserves Listmonk
{{ .Subscriber }}/{{ UnsubscribeURL }} template tags, turns links into
'text (url)'). Wired into the trucking builder (and thus UCR + IFTA, which
reuse create_and_schedule_campaign) and the healthcare builder.
2. Stable container hostname: Listmonk derived its Message-ID from the random
docker container id -> @localhost.localdomain (spam-score signal). Pin both
listmonk + listmonk-hc hostname to perfwest.performancewest.net, matching
Listmonk's SMTP hello_hostname.
Part of the email-deliverability incident hardening.
The verifier returned (True, 'mx_unreachable') when it couldn't complete a port-25
probe to ANY MX — marking 438,163 addresses email_verified=TRUE. But these are NOT
dead: they're dominated by Comcast (13.7k), AT&T/SBCGlobal (13.5k), Verizon, Cox,
Charter, Frontier, etc. — major ISPs that deliberately tarpit/refuse probes from
unknown IPs. Confirmed from prod: comcast MX connects + returns 220. The probe
failure ≠ undeliverable.
Fix: return (False, 'mx_probe_blocked') — MX exists, deliverability UNKNOWN, must
be confirmed by a real send. Excluded from PW campaigns; prime burner-verification
target (burner_list_verify upgrades it to send_confirmed on delivery). Existing
438,163 mx_unreachable rows reclassified in prod to mx_probe_blocked / verified=FALSE.
The smtp_valid pool is only ~3k unsent — too small to sustain campaigns. SMTP
probing can't confirm catch-all/mx_unreachable deliverability; only a REAL send
can. burner_list_verify.py reconciles a verification send from a DISPOSABLE burner
domain (isolated from PW/carrierone reputation):
- hard bounce -> fmcsa_carriers.email_verify_result='hard_bounced' (excluded)
- delivered -> 'send_confirmed' (proven deliverable; PW campaigns send to it)
It tails the burner MTA mail.log (reuses bounce-watcher's status= pattern) and
writes back idempotently. The PW trucking filter now treats smtp_valid +
send_confirmed as sendable. docs/campaign-deliverability-plan.md captures the full
diagnosis, the burner design, and CAN-SPAM guardrails.
Remaining (needs a domain + isolated MTA identity — operator/infra decision):
stand up the burner domain, the verification-send worker, and a writeback cron.
Root cause of zero conversions since Jun 9 + the Gmail/Outlook block storm:
the send filter was '(email_verified IS TRUE OR result IN ...)'. The verifier
sets email_verified=TRUE optimistically for mx_unreachable (domain exists but
its mail server never answered the RCPT probe) — 438,163 such rows. Those HARD
BOUNCE on send, producing ~1,100 bounces/day (~47% rate) and blocklisting half
the 120k subscriber base, so real prospects never saw the offer.
Fix: key the send filter ONLY off email_verify_result, never the broken boolean.
Recovery mode (default): send only 'smtp_valid' to drive bounce rate to ~0 and
rebuild reputation; set CAMPAIGN_INCLUDE_CATCH_ALL=1 to re-add catch-all domains
once recovered. Mirrors the healthcare list-cleaning approach (HC bounces ~2-3%,
which proves the fix). Note: only ~3k smtp_valid unsent remain — list growth via
real-send bounce verification (separate burner domain) is the follow-up.
The MCS-150 intake-completion email linked customers to /order/dot-compliance,
which is the sales/checkout page -- it ignores ?order= and asks the customer to
re-pick services and pay again, so they 'cannot enter any data' (Paul Wilson's
report). Link to the per-service intake wizard /order/<slug>?order=... instead,
which loads the paid order, pre-fills from the FMCSA census, and drops payment.
Also add a Trailers field to the DOT intake fleet section and wire it through to
the MCS-150 PDF Q26 trailer row, so carriers can update trucks AND trailers.
- Added a line asking them to call their insurance agent to confirm Form E
ability before clicking yes/no, so we pick the right path first time.
- Reply-To now routes to info@performancewest.net (monitored), overridable via
SC_COC_REPLY_TO env.
The Dockerfile copies form PDFs explicitly by name; the SC COC template was
missing, so fill_sc_coc() would FileNotFoundError in the container. Added it.
Routes SC intrastate-authority orders to the real SCDMV COC product instead of a
PSC certificate (which doesn't apply to property carriers):
- sc_coc_filing.py: emails the carrier a one-click yes/no — does your insurer
have / can they file a Form E (SC intrastate liability, $750k or $300k by
GVWR) with SCDMV? Records the answer; builds the filled COC package.
- state_trucking._handle_sc_coc_gate: SC intrastate gate —
no answer -> email the question once, HOLD
answered no -> broker referral opened, HOLD (ops todo)
answered yes-> proceed to bill the exact $25 SCDMV COC fee (at cost) + file
- API POST /compliance-orders/:id/sc-insurance: records yes/no in intake_data
(no schema change); NO opens an insurance_lead broker-referral ticket +
Telegram; YES re-dispatches the worker to bill the $25 + file.
- site/order/sc-insurance: customer one-click yes/no page (auto-submits when
the email links straight to ?have=yes|no).
Non-SC intrastate still uses the PSC/PUC email path or a manual todo.
SC for-hire PROPERTY carriers (not passenger/HHG/hazwaste) register intrastate
via the SCDMV Certificate of Compliance (COC), not a PSC certificate. This adds:
- sc_coc_pdf_filler.fill_sc_coc(): fills the official SCDMV Form COC from
intake (business name, officers, physical/mailing address, phone), picks
New vs Renewal, and stamps the coverage class (E-L low-value / E-LC).
Field names in the source PDF are auto-generated + offset from their labels;
mapped here by verified on-page geometry. Verified by render.
- suggest_coverage_class(): E-L for low-value cargo (scrap/dump/aggregate),
else E-LC (safer default).
- gov_fee: SC intrastate fee corrected from $0 placeholder to the real $25
COC new-application fee (renewals $0), billed at cost.
The carrier's INSURER files the Form E (liability) + Form H (cargo, E-LC only)
directly with SCDMV; we collect the COC app + $25 and submit it.
Intrastate operating authority is state-specific + application-based like IRP, so
it reuses the same email/POA + invoice-reconciliation flow:
- intrastate_filing.send_intrastate_submission: emails the state PSC/PUC the
authority application with the signed POA attached (subject tag [PW-ISA CO-..]),
reusing irp_filing's MinIO download + census enrich helpers.
- The shared poller (irp_invoice_poller) now matches BOTH [PW-IRP] and [PW-ISA]
tags, parses the fee, Telegram-alerts, and bills the customer the exact amount
with the correct service slug.
- state_trucking gov-fee gate routes intrastate-authority to the PSC/PUC email
path; if no submission email is configured for the base state it falls back
to a manual todo (safe default — no emailing guessed agency addresses).
Per-state ISA_<ST>_EMAIL env (blank until the exact agency address is verified).
SC/GA/TX scaffolded. Customer still only sees an exact-fee payment link; you only
approve the final filing.
- send_irp_submission now REQUIRES and ATTACHES the signed Power of Attorney PDF
(downloaded from MinIO) — the state won't act on a third-party filing without
it, and 'on file, available on request' stalls the request. If the POA isn't
available we don't email and fall back to a manual todo.
- Backfill missing legal_name + registered address from the FMCSA census so the
submission isn't sent with a blank address (root cause of the empty
'Legal/registered address: , ,' line). Customer-supplied values win.
- state_trucking passes signed_auth_key through to the IRP submitter.
- Fix 'Object of type date is not JSON serializable' when creating the admin
todo (json.dumps(..., default=str)) — broke the intrastate (bash-fee) path.
The gov-fee email now lists exactly what the amount covers (full breakdown) so
the customer can check it for accuracy, with two clear actions: a ✅ pay link and
a ❓ 'something looks wrong' link to /order/dispute.
New /order/dispute page shows the fee breakdown and lets the customer describe
what's wrong; it opens an 'issue' support ticket pre-tagged with the order
(amount + label + their note) via /api/v1/tickets, so ops corrects the fee
before any payment is taken. The /order/pay page also shows the itemized
breakdown and a dispute link.
- gov_fee: add AGENCY_PROCESSING_FEE (per-service card/convenience fee passed
through so the customer pays the true all-in cost); estimate_gov_fee now folds
it into the billed total. IFTA/intrastate/UCR fees are published/near-exact.
- IRP fees can't be looked up — only the base state computes them. New
irp_filing.py: emails the base-state IRP unit a Schedule A/B request (Reply-To
the IRP filings mailbox, [PW-IRP CO-...] subject tag), and a 15-min cron
(irp_invoice_poller) scans the mailbox for the state's invoice reply, parses
the exact apportioned fee, Telegram-alerts you, and bills the customer the
EXACT amount via a gov-fee child order + payment link. Then it proceeds to
ready_to_file for your final approval.
- state_trucking gov-fee gate now routes IRP to the email/invoice path and
IFTA/intrastate to immediate exact-fee billing.
- Mailbox is configurable (IRP_FILINGS_IMAP_* in app.env.j2); falls back to
OPS_IMAP_* filtered by the [PW-IRP] tag until a dedicated mailbox exists.
Telegram alerts fire on IRP submission sent, invoice received (billed), and
un-parseable replies (so you can read + enter the fee manually).
At-cost services (IRP/IFTA/intrastate) only collected our service fee at
checkout; the variable state fee was never billed, so orders stalled at
authorization_signed and the filing card would have had to front large IRP fees.
New end-to-end, hands-off flow (you only approve the final filing):
1. After authorization is signed, state_trucking auto-estimates the gov fee
from intake (base/op states, power units, weight) via gov_fee.estimate_gov_fee.
2. Creates a CHILD compliance order (CG-..., service_fee=0, gov_fee=estimate,
parent_order_number set, migration 099) that flows through the EXISTING
checkout/payment/webhook machinery.
3. Emails the customer a payment link to /order/pay (new self-contained page)
showing every method with correct surcharges — ACH 0% (Stripe 0.8%/ cap
absorbed, no GoCardless needed), card/PayPal 3%, Klarna 6%, crypto 0%.
4. Order holds at awaiting_government_fee_approval until paid.
5. On payment, handlePaymentComplete detects the child (parent_order_number)
and re-dispatches the PARENT with gov_fee_paid=true, which proceeds to
prepare + queue the filing and stops at ready_to_file for your approval.
IRP fees are estimates billed at cost (refund overage / rebill shortfall); IFTA
decals + most intrastate fees are near-exact. Tunable via env.
relay_integration.py line 34 called logging.getenv (no such attr), which threw
AttributeError on import -> load_card_from_erpnext() crashed for every caller
(BOC-3 and now UCR filing payment). Drop the bogus line; LOG is set correctly on
the next line. Present since the initial commit.
Adds scripts/workers/services/ucr_playwright.py — a UCR.gov National Registration
System automation that, given a USDOT + fleet size, runs the register/pay flow,
pays the federal UCR fee with the matched PW filing card (Relay/Stripe Issuing),
and captures a confirmation screenshot + number. Conventions match
boc3_playwright / fmcsa_web_submitter: dev-mode dry-run guard, undetected
(patchright) browser, CAPTCHA detection, screenshot evidence, dataclass result.
Safety: verifies the displayed fee against the federal schedule before paying and
refuses to auto-charge a surprising amount (UCR_MAX_AUTO_FEE_USD) — falls back to
manual filing instead.
Wires it into MCS150UpdateHandler: when an approved (admin_approved) order has
slug ucr-registration, _file_ucr_registration runs the automation, uploads the
confirmation screenshot to MinIO, records filing_status + confirmation, and sets
fulfillment_status=completed on success. On CAPTCHA / fee-mismatch / failure it
reverts to ready_to_file with a high-priority 'file manually' todo. This replaces
the old behavior where approving a UCR just sat at authorization_signed.
UCR (and other admin-assisted DOT services) route through MCS150UpdateHandler,
which hardcoded 'MCS-150' and self.SERVICE_SLUG in the admin todo, the Telegram
fulfillment notification, and the customer status email -- so approving Paul's
UCR produced an 'MCS-150 Review / mcs150-update / PDF: not generated' alert and
an 'MCS-150 biennial update' customer email, both wrong.
Add SERVICE_DISPLAY_NAMES + _service_label(slug); use the actual slug everywhere.
Admin-assisted services now show 'UCR Annual Registration — FILE NOW ... file
manually on the portal (no auto-generated form)' instead of MCS-150/PDF wording,
and the customer email names the right service.
Admin-assisted DOT services (UCR, BOC-3) routed to this handler were marked
ready_to_file with whatever intake existed -- e.g. a UCR with only a DOT number,
missing legal name / state / fleet-size bracket (which sets the UCR fee tier).
That made the admin 'ready to file' status dishonest and unfileable.
Now, for ADMIN_ASSISTED_REQUIRED services we first enrich intake from the FMCSA
census (legal_name, address_state, power_units) + the order email, and derive
the UCR fleet_size_bracket from power units (UCR_FLEET_BRACKETS). If every
required field is then present we persist it and mark intake validated (falls
through to the admin review gate -> ready_to_file). If anything is still
missing, we persist what we have, set fulfillment_status=awaiting_intake, and
email the customer to complete intake -- instead of falsely showing ready_to_file.
Two of our three real paid customers (Mark Adams / mark@adamslumber.com and
Paul Wilson / synthetic@pipeline.com) never completed intake. They each hit the
old hard cap of 10 daily reminders (last sent Jun 12 / Jun 13) and the worker
then went permanently silent -- the last two daily runs reminded 0 orders even
though both still owe us intake on paid work. (The third, mitchell allen /
mitchell@allenscrapmetal.com, did complete intake; his orders are dispatched.)
Replace the dead-stop cap with a two-phase cadence:
- daily for the first DAILY_PHASE (10) nudges -- the initial burst,
- then weekly (WEEKLY_INTERVAL_DAYS) up to an absolute MAX_REMINDERS (60),
so a paid order with missing intake keeps getting a gentle nudge instead of
being dropped. Tunable via INTAKE_REMINDER_DAILY_PHASE /
INTAKE_REMINDER_WEEKLY_INTERVAL_DAYS / INTAKE_REMINDER_MAX. Clearing
intake_reminder_last_at re-arms an order immediately (documented in the
module docstring).
Main pool is calendar-day 12 but reputation is wrecked (54% delivery, Gmail+
Outlook blocks) -- NOT warmed. MX tagging confirmed the cause: 702k carriers on
Google + 135k on Microsoft = the warmup was hammering exactly the two operators
blocking us. Hold Google/MS/Proofpoint/etc. OUT entirely until day 30 (configurable),
sending only to the long-tail operators (yahoo/comcast/charter/centurylink/etc.)
that don't bot-throttle, so reputation can recover; then re-introduce big
operators gradually via mx_daily_caps. 1.24M/1.49M carriers now tagged.
The real bottleneck was the write, not DNS: each per-domain UPDATE full-scanned
fmcsa_carriers (no functional index on lower(split_part(email,'@',2))). Resolve
all domains concurrently into a list, load a temp table, then ONE join-UPDATE =
single table scan. Tags ~12k domains -> hundreds of thousands of carriers fast.
The serial path (verifier's 8s+6s lifetime per domain) was far too slow for
bulk tagging -- 0 tagged in 3 min on dead domains. Self-contained fast resolver
+ ThreadPoolExecutor(40) resolves thousands of domains in minutes.