Vertical Lead-Source Analysis: Ranked by Email Reliability
Date: 2026-06-13
Purpose: The proven bottleneck for every cold-email vertical is NOT the deficiency signal or the audience size -- it is whether a reliable, public, bulk source gives us a deliverable email (or a clean, high-yield path to one). This ranks candidate verticals by that single criterion, using what we verified this session (FCC and FMCSA work; the CLIA email match was 0.3%, i.e. dead).
The rule (learned the hard way)
A vertical is email-viable only if ONE of these is true:
- The public registry contains the email (FCC RMD contact_email, FMCSA carrier email_address). -> Tier 1, just send.
- The registry maps to a second free public source that has email by a clean key (NPI, FRN, CIK, domain). -> Tier 2, one enrichment hop.
- The targets reliably have websites, so a domain->scrape gets email at decent yield. -> Tier 3, scrape pipeline (proxy).

Otherwise it is phone / direct-mail only (CLIA, EPA RCRA, raw NPPES individuals). Still real money, just not cold email.
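The rule above is a short decision procedure. A minimal sketch of it as a function, assuming a hypothetical EmailPath record with one boolean per condition (none of these names exist in our codebase):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailPath:
    """Hypothetical summary of how a vertical's targets can be emailed."""
    email_in_registry: bool          # e.g. FCC RMD contact_email, FMCSA email_address
    clean_key_to_email_source: bool  # NPI, FRN, CIK, or domain into a free public source
    targets_have_websites: bool      # domain -> scrape is plausible


def email_tier(path: EmailPath) -> int:
    """Return 1-3 for email-viable verticals, 4 for phone/direct-mail only."""
    if path.email_in_registry:
        return 1  # just send
    if path.clean_key_to_email_source:
        return 2  # one enrichment hop
    if path.targets_have_websites:
        return 3  # scrape pipeline (proxy); pilot the yield first
    return 4      # phone / direct mail, not cold email


fmcsa = EmailPath(True, False, False)
clia = EmailPath(False, False, False)  # name+zip match is not a clean key
print(email_tier(fmcsa), email_tier(clia))  # 1 4
```

The checks run in tier order on purpose: a vertical that qualifies for a cheaper tier never gets routed to a more expensive path.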
Tier 1 -- Email is IN the registry (send today)
| Vertical | Source | Email field | Recurring obligation | Status |
|---|---|---|---|---|
| FCC carriers / VoIP / ISP | FCC RMD, 499 filer, CORES | contact_email (native) | RMD annual, 499-A/Q, CPNI annual | LIVE (built) |
| FMCSA trucking | FMCSA carrier census | email_address (native) | MCS-150 biennial, IFTA quarterly, UCR annual | LIVE (built) |
These are the whole reason the business works. Nothing else is as clean.
Tier 2 -- One free public hop to email (worth building)
| Vertical | Registry (no email) | Email source + key | Yield estimate | Notes |
|---|---|---|---|---|
| Healthcare providers (org NPIs) | NPPES | NPPES endpoint_pfile (Direct/email endpoints), keyed by NPI | ~88k institutional emails harvested, ~63k verified | ALREADY HARVESTED. The org/institutional slice has real emails (we filtered HISP/Direct gateways). Individual NPIs do NOT. Recurring: revalidation, NPPES update, OIG screening. |
| Public companies (OTC/SEC filers) | SEC EDGAR (CIK, state of incorp, phone, addr, website) | website domain -> scrape IR/contact email; or email-append | Medium-high (real cos w/ IR pages) | ~2,771 SEC-reporting OTC issuers; Delaware/Nevada heavy. Hook: reincorporate-to-TX, annual report, RA, franchise tax. Small but high-ticket. |
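The NPPES row's one hop amounts to grouping endpoint records by NPI and dropping Direct/HISP gateway addresses. A sketch, assuming endpoint rows already loaded as dicts with hypothetical keys npi, endpoint_type, endpoint; the "DIRECT" type value and the gateway heuristic (a domain label equal to direct or hisp) are assumptions, not the filter we actually ran:

```python
import re
from collections import defaultdict

# Heuristic, hypothetical: domain labels that suggest a Direct/HISP gateway.
GATEWAY_LABELS = {"direct", "hisp"}
EMAIL_RE = re.compile(r"^[^@\s]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})$")


def looks_like_gateway(domain: str) -> bool:
    """True if any dot-separated label of the domain is a gateway marker."""
    return any(label in GATEWAY_LABELS for label in domain.lower().split("."))


def emails_by_npi(endpoint_rows):
    """Map NPI -> set of plain email addresses, skipping Direct/HISP endpoints."""
    out = defaultdict(set)
    for row in endpoint_rows:
        if row["endpoint_type"].upper() == "DIRECT":
            continue  # Direct messaging address, not a normal inbox
        address = row["endpoint"].strip()
        match = EMAIL_RE.match(address)
        if not match or looks_like_gateway(match.group(1)):
            continue  # not an email, or a gateway-looking domain
        out[row["npi"]].add(address.lower())
    return dict(out)
```

Keying the result by NPI keeps it joinable to any other NPI-keyed source (revalidation lists, OIG screening) without a second match step.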
Tier 3 -- Domain-scrape required (proxy pipeline; medium yield)
| Vertical | Registry | Why scrape | Yield |
|---|---|---|---|
| FMC Ocean Transportation Intermediaries (NVOCC/forwarders) | FMC OTI lookup | few thousand licensees, most have websites | medium-high; small universe but real businesses + bonds renew |
| State business entities (formation/RA/foreign-qual) | State SOS bulk (FL/CA/VA/TX free; Socrata) | millions of entities, name+addr+officers, often a website | low-medium per scrape, but HUGE universe; better to target by trigger (newly-formed, delinquent, foreign-qual) |
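For Tier 3 (and the SEC row's website-domain scrape), the extraction step reduces to normalizing the registry's website value and keeping only emails on that site's own domain, so third-party addresses (webmail, vendors) do not leak in. A sketch over already-fetched HTML; fetching and proxying are out of scope, and the helper names are hypothetical:

```python
import re
from urllib.parse import urlparse

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def site_host(website: str) -> str:
    """Normalize a registry 'website' value to a bare lowercase host without www."""
    if "://" not in website:
        website = "http://" + website
    host = (urlparse(website).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def emails_on_domain(html: str, website: str) -> list[str]:
    """Distinct emails in html whose domain is the site's host or one of its subdomains."""
    host = site_host(website)
    found: list[str] = []
    for email in EMAIL_RE.findall(html):
        email = email.lower()
        domain = email.split("@", 1)[1]
        if (domain == host or domain.endswith("." + host)) and email not in found:
            found.append(email)
    return found
```

Restricting to the site's own domain trades some yield for precision: a gmail address on a contact page may be real, but it cannot be tied to the entity from the page alone.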
Tier 4 -- Phone / direct-mail only (NOT cold email)
| Vertical | Registry | Why not email | Best channel |
|---|---|---|---|
| CLIA labs | CMS POS CLIA file | no NPI, no email; NPPES name+zip match = 0.3% (verified dead) | postcard (~3,100/wk full coverage), phone |
| EPA RCRA hazardous-waste handlers | ECHO bulk | no email anywhere in ECHO | phone (RCRAInfo), mail, append |
| NPPES individual providers | NPPES | individuals have phone/fax, rarely a usable org email | phone, fax, web inbound |
Net recommendation (where to invest next, in order)
- Mine the already-harvested healthcare ORG emails harder (Tier 2, zero new cost). Use the 63k verified institutional emails to diversify triggers beyond NPI revalidation: NPPES staleness, OIG/SAM screening, org-NPI corrections. The data is already on prod.
- SEC/OTC corporate (Tier 2). Small universe (~2,771) but high-ticket (reincorporation, RA, franchise tax, foreign-qual) and a timely TX hook. EDGAR is free + bulk-OK; emails via website-domain scrape (we have the pipeline design from CLIA). Worth a pilot because the per-deal value is high.
- State business entities by TRIGGER (Tier 3, biggest universe). Do NOT blast all entities; target newly-formed (need RA/EIN/OA), delinquent/admin-dissolved (reinstatement), or foreign-qualification candidates. Free bulk from FL/CA/VA; email via domain-scrape. This is the largest TAM if the scrape yields.
- FMC OTI (Tier 3, small but clean): few thousand, website-rich, bonds renew annually. Quick win if we want another trucking-adjacent vein.
- CLIA / EPA RCRA: keep as phone/postcard, not email. Service + LP exist for CLIA; drive via mail to a "check your expiration" web tool that captures email.
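The target-by-trigger rule for state entities can be expressed as a per-record classifier that returns no trigger (and therefore no send) for most entities. A sketch, assuming a hypothetical normalized Entity record; real SOS status codes, the source of "states doing business", and the 60-day "newly formed" window vary by state and are assumptions here:

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Entity:
    """Hypothetical normalized Secretary of State record."""
    name: str
    status: str                 # assumed values: "ACTIVE", "DELINQUENT", "ADMIN_DISSOLVED"
    formed_on: date
    home_state: str
    states_doing_business: tuple[str, ...] = ()


def trigger_for(e: Entity, today: date, new_window_days: int = 60):
    """Return the first matching trigger name, or None (do not send untriggered)."""
    if e.status in ("DELINQUENT", "ADMIN_DISSOLVED"):
        return "reinstatement"
    if today - e.formed_on <= timedelta(days=new_window_days):
        return "new_formation"          # needs RA / EIN / operating agreement
    if any(s != e.home_state for s in e.states_doing_business):
        return "foreign_qualification"  # operating outside its formation state
    return None
```

Ordering matters: a delinquent entity formed last month gets the reinstatement pitch, since that problem blocks everything else.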
The honest meta-point
We have spent effort proving that most government registries are email-poor. The reliable email money is: FCC + FMCSA (native), plus the healthcare org emails we already harvested. Everything else is either a scrape gamble or a phone/mail channel. Before building any new vertical, confirm its email path falls in Tier 1-2; if it is Tier 3, pilot the scrape yield FIRST (like we should have for CLIA); if Tier 4, don't pretend it is an email channel.
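"Pilot the scrape yield FIRST" needs a go/no-go rule. One reasonable choice (not something we have adopted) is to compare the lower bound of a Wilson score interval on the pilot hit rate against a minimum yield, so a small lucky sample does not greenlight a build. The threshold values are hypothetical:

```python
from math import sqrt


def wilson_lower(successes: int, n: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a binomial proportion (~95% at z=1.96)."""
    if n == 0:
        return 0.0
    p = successes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


def pilot_verdict(emails_found: int, domains_tried: int, min_yield: float) -> str:
    """Build the email pipeline only if even a pessimistic yield clears min_yield."""
    if wilson_lower(emails_found, domains_tried) >= min_yield:
        return "build"
    return "stay phone/mail"
```

A 0.3% hit rate (3 of 1,000) lands on "stay phone/mail" against a 5% bar, while 180 hits on 500 domains clears a 25% bar because its lower bound is about 32%.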
Update 2026-06-13: healthcare org-email diversification (ACTED ON #1)
Unlocked the full verified institutional pool for broad offers:
- Root cause found: the OIG/NPPES segments were gated by a warmup selector that excluded not_on_list rows (a deliverability proxy that excluded ~62k of the 63k, because org NPIs are not individual Medicare enrollees). Since we had already SMTP-verified every inbox, we added an institutional_verified selector that trusts our verification. OIG screening + NPPES update now address 62,422 (was ~1,437).
- enrich_institutional_revalidation.py joins the institutional list to the CMS Revalidation Due Date List (revalidation_base.csv) by NPI -> ~1,437 genuine Medicare enrollees (197 overdue / 164 due soon) for the flagship $599 reval pitch.
- The pw-hc-nppes cron now runs oig_screening + nppes_outdated + revalidation_overdue + revalidation_due_soon against the enriched file (still warmup-capped + MX-throttled: bigger supply, same safe send rate). npi_reactivation stays on the accurate leie_or_deactivated selector (no false "deactivated" claims).
- The pw-hc-refresh cron now re-downloads + re-joins the reval base so overdue figures stay accurate.
- MAINTENANCE: the CMS bulk file URLs (revalidation_base.csv, CLIA) embed a dated path that rotates ~monthly. If the download 404s, re-fetch the dataset's current downloadURL from https://data.cms.gov/data.json. Consider switching to the dataset's stable data-api endpoint.
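The revalidation enrichment is essentially a join on NPI followed by date bucketing. A sketch of that shape, not the code in enrich_institutional_revalidation.py: the CSV column names, the MM/DD/YYYY date format, and the 90-day due-soon window are all assumptions to check against the actual revalidation_base.csv header:

```python
import csv
import io
from datetime import date, datetime


def bucket_revalidation(institutional_npis, reval_rows, today: date, *, due_soon_days=90,
                        npi_col="National Provider Identifier",
                        due_col="Revalidation Due Date", date_fmt="%m/%d/%Y"):
    """Join institutional NPIs to revalidation rows by NPI and bucket by due date."""
    wanted = {str(n).strip() for n in institutional_npis}
    buckets = {"overdue": set(), "due_soon": set()}
    for row in reval_rows:
        npi = (row.get(npi_col) or "").strip()
        if npi not in wanted:
            continue
        try:
            due = datetime.strptime((row.get(due_col) or "").strip(), date_fmt).date()
        except ValueError:
            continue  # blank or placeholder dates: make no overdue claim
        days_left = (due - today).days
        if days_left < 0:
            buckets["overdue"].add(npi)
        elif days_left <= due_soon_days:
            buckets["due_soon"].add(npi)
    # An NPI with several enrollments counts once, under its most urgent bucket.
    buckets["due_soon"] -= buckets["overdue"]
    return buckets


def load_rows(csv_text: str):
    """Parse CSV text into dict rows keyed by the header line."""
    return list(csv.DictReader(io.StringIO(csv_text)))
```

Skipping unparseable dates rather than guessing keeps the overdue pitch honest, in the same spirit as keeping npi_reactivation on the accurate selector.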
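For the MAINTENANCE item: data.cms.gov/data.json is a DCAT-US (Project Open Data) catalog, so the current downloadURL can be looked up by dataset title instead of hard-coding the dated path. A sketch, assuming the dataset title matches exactly (case-insensitive) and that CSV distributions carry mediaType text/csv plus a sortable ISO modified date; verify both against the live catalog:

```python
import json
from urllib.request import urlopen

CATALOG_URL = "https://data.cms.gov/data.json"


def current_download_url(catalog: dict, dataset_title: str, media_type: str = "text/csv"):
    """Return the newest matching distribution's downloadURL, or None if not found."""
    want = dataset_title.strip().lower()
    for ds in catalog.get("dataset", []):
        if (ds.get("title") or "").strip().lower() != want:
            continue
        dists = [d for d in ds.get("distribution", [])
                 if d.get("mediaType") == media_type and d.get("downloadURL")]
        if not dists:
            return None
        # Assumes ISO 'modified' strings; if none are present, max() keeps the first listed.
        newest = max(dists, key=lambda d: d.get("modified") or "")
        return newest["downloadURL"]
    return None


def fetch_catalog(url: str = CATALOG_URL) -> dict:
    """Download and parse the catalog (network call; the catalog file is large)."""
    with urlopen(url, timeout=60) as resp:
        return json.load(resp)
```

Usage in a refresh job would be current_download_url(fetch_catalog(), "<dataset title>") with the title copied from the catalog, then the normal download.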
Update 2026-06-14: trucking/main pool per-MX throttling (deliverability fix)
Root cause of the persistent main-pool 54% delivery rate and the Gmail/Outlook block storm (Jun 13-14), now PROVEN by MX-tagging the carrier pool:
- 702,214 carriers on Google + 135,129 on Microsoft -- the warmup was hammering exactly the two operators blocking us (no per-MX throttle on trucking, only HC had it).
- Fix: migration 097 (mx_provider) + mx_tag_carriers.py (concurrent MX resolve, bulk temp-table-join write; 1.24M/1.49M carriers tagged). build_trucking_campaigns now EXCLUDES Google/Microsoft/Proofpoint/etc. until warmup day 30 (reputation recovery), with per-MX caps thereafter. Untagged carriers pass (most are now tagged).
- Effect on the MCS-150 overdue pool: 496,743 sendable -> 230,135 after excluding 263,515 Google/MS carriers. Plenty of long-tail volume (yahoo/comcast/charter/centurylink/windstream/earthlink/...) to warm on safely while reputation recovers.
- MAINTENANCE: re-run mx_tag_carriers.py periodically (or add to the trucking cron precursor) to tag newly-added carriers; flip MAIN_BIG_MX_EXCLUDE_UNTIL_DAY or MAIN_SKIP_BIG_MX=0 once Postmaster Tools shows recovered reputation.
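A sketch of the tagging idea (not the actual mx_tag_carriers.py): resolve MX once per distinct domain on a thread pool, then classify by hostname suffix. The suffix table is illustrative (Google MX hosts sit under google.com, Microsoft 365 under outlook.com, Proofpoint under pphosted.com / ppe-hosted.com), and the resolver is injected so a real DNS call or a cache can be swapped in:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative suffix table; extend as other big operators show up.
MX_SUFFIXES = {
    "google": ("google.com", "googlemail.com"),
    "microsoft": ("outlook.com",),
    "proofpoint": ("pphosted.com", "ppe-hosted.com"),
}


def classify_mx(mx_hosts):
    """Provider of the first matching MX host, 'other' if none match, None if no MX."""
    for host in mx_hosts:
        h = host.lower().rstrip(".")
        for provider, suffixes in MX_SUFFIXES.items():
            if any(h == s or h.endswith("." + s) for s in suffixes):
                return provider
    return "other" if mx_hosts else None


def tag_domains(domains, resolve_mx, workers=32):
    """Resolve each distinct domain's MX concurrently; map domain -> provider tag.

    resolve_mx(domain) must return MX hostnames ordered by preference.
    """
    unique = sorted({d.lower() for d in domains})

    def one(domain):
        try:
            return domain, classify_mx(resolve_mx(domain))
        except Exception:
            return domain, None  # lookup failed: leave untagged (untagged carriers pass)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(one, unique))
```

With dnspython, resolve_mx could wrap dns.resolver.resolve(domain, "MX") and return str(r.exchange) for each answer sorted by r.preference. Resolving per distinct domain, not per carrier, is what keeps a 1.49M-row pool tractable.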
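The send gate, sketched with the two environment variables named in the MAINTENANCE note. Their semantics here (big MX excluded while warmup_day is below MAIN_BIG_MX_EXCLUDE_UNTIL_DAY, default 30; MAIN_SKIP_BIG_MX=0 disables the exclusion) and the per-provider caps are assumptions, not the code in build_trucking_campaigns:

```python
import os
from collections import Counter

BIG_MX = {"google", "microsoft", "proofpoint"}
# Hypothetical daily caps applied once big MX is re-admitted.
PER_MX_DAILY_CAP = {"google": 500, "microsoft": 300, "proofpoint": 100}


def big_mx_excluded(warmup_day: int, env=os.environ) -> bool:
    """True while big-MX providers should be skipped entirely."""
    if env.get("MAIN_SKIP_BIG_MX", "1") == "0":
        return False
    return warmup_day < int(env.get("MAIN_BIG_MX_EXCLUDE_UNTIL_DAY", "30"))


def select_sendable(carriers, warmup_day: int, env=os.environ):
    """carriers: iterable of (carrier_id, mx_provider or None). Returns ids to send today."""
    excluded = big_mx_excluded(warmup_day, env)
    sent = Counter()
    chosen = []
    for carrier_id, provider in carriers:
        if provider in BIG_MX:  # untagged (None) and long-tail providers skip this branch
            if excluded:
                continue
            if sent[provider] >= PER_MX_DAILY_CAP.get(provider, float("inf")):
                continue
            sent[provider] += 1
        chosen.append(carrier_id)
    return chosen
```

Providers outside BIG_MX are left uncapped in this sketch; in practice the global warmup cap and MX throttle still bound them.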