# Vertical Lead-Source Analysis: Ranked by Email Reliability

**Date:** 2026-06-13

**Purpose:** The proven bottleneck for every cold-email vertical is NOT the deficiency signal or the audience size -- it is whether a reliable, public, bulk source gives us a **deliverable email** (or a clean, high-yield path to one). This ranks candidate verticals by that single criterion, using what we verified this session (FCC, FMCSA work; CLIA email-match was 0.3% = dead).

## The rule (learned the hard way)

A vertical is **email-viable** only if ONE of these is true:

1. The public registry **contains the email** (FCC RMD `contact_email`, FMCSA carrier `email_address`). -> Tier 1, just send.
2. The registry maps to a **second free public source that has email** by a clean key (NPI, FRN, CIK, domain). -> Tier 2, one enrichment hop.
3. The targets reliably have **websites** so a domain->scrape gets email at decent yield. -> Tier 3, scrape pipeline (proxy).

Otherwise it is **phone / direct-mail only** (CLIA, EPA RCRA, raw NPPES individuals). Still real money, just not cold email.

## Tier 1 -- Email is IN the registry (send today)

| Vertical | Source | Email field | Recurring obligation | Status |
|---|---|---|---|---|
| **FCC carriers / VoIP / ISP** | FCC RMD, 499 filer, CORES | `contact_email` (native) | RMD annual, 499-A/Q, CPNI annual | LIVE (built) |
| **FMCSA trucking** | FMCSA carrier census | `email_address` (native) | MCS-150 biennial, IFTA quarterly, UCR annual | LIVE (built) |

These are the whole reason the business works. Nothing else is as clean.

## Tier 2 -- One free public hop to email (worth building)

| Vertical | Registry (no email) | Email source + key | Yield estimate | Notes |
|---|---|---|---|---|
| **Healthcare providers (org NPIs)** | NPPES | NPPES **endpoint_pfile** (Direct/email endpoints), keyed by NPI | ~88k institutional emails harvested, ~63k verified | ALREADY HARVESTED. The org/institutional slice has real emails (we filtered HISP/Direct gateways). Individual NPIs do NOT. Recurring: revalidation, NPPES update, OIG screening. |
| **Public companies (OTC/SEC filers)** | SEC EDGAR (CIK, state of incorp, phone, addr, **website**) | website domain -> scrape IR/contact email; or email-append | Medium-high (real cos w/ IR pages) | ~2,771 SEC-reporting OTC issuers; Delaware/Nevada heavy. Hook: reincorporate-to-TX, annual report, RA, franchise tax. Small but high-ticket. |

## Tier 3 -- Domain-scrape required (proxy pipeline; medium yield)

| Vertical | Registry | Why scrape | Yield |
|---|---|---|---|
| **FMC Ocean Transportation Intermediaries (NVOCC/forwarders)** | FMC OTI lookup | few thousand licensees, most have websites | medium-high; small universe but real businesses + bonds renew |
| **State business entities (formation/RA/foreign-qual)** | State SOS bulk (FL/CA/VA/TX free; Socrata) | millions of entities, name+addr+officers, often a website | low-medium per scrape, but HUGE universe; better to target by trigger (newly-formed, delinquent, foreign-qual) |

## Tier 4 -- Phone / direct-mail only (NOT cold email)

| Vertical | Registry | Why not email | Best channel |
|---|---|---|---|
| **CLIA labs** | CMS POS CLIA file | no NPI, no email; NPPES name+zip match = **0.3%** (verified dead) | postcard (~3,100/wk full coverage), phone |
| **EPA RCRA hazardous-waste handlers** | ECHO bulk | no email anywhere in ECHO | phone (RCRAInfo), mail, append |
| **NPPES individual providers** | NPPES | individuals have phone/fax, rarely a usable org email | phone, fax, web inbound |

## Net recommendation (where to invest next, in order)

1. **Mine the healthcare ORG emails we already harvested harder** (Tier 2, zero new cost). 63k verified institutional emails -> diversify triggers beyond NPI revalidation: NPPES staleness, OIG/SAM screening, org-NPI corrections. The data is already on prod.
2. **SEC/OTC corporate** (Tier 2). Small universe (~2.7k) but high-ticket (reincorporation, RA, franchise tax, foreign-qual) and a timely TX hook. EDGAR is free + bulk-OK; emails via website-domain scrape (we have the pipeline design from CLIA). Worth a pilot because the per-deal value is high.
3. **State business entities by TRIGGER** (Tier 3, biggest universe). Do NOT blast all entities; target newly-formed (need RA/EIN/OA), delinquent/admin-dissolved (reinstatement), or foreign-qualification candidates. Free bulk from FL/CA/VA; email via domain-scrape. This is the largest TAM if the scrape yields.
4. **FMC OTI** (Tier 3, small but clean): few thousand, website-rich, bonds renew annually. Quick win if we want another trucking-adjacent vein.
5. **CLIA / EPA RCRA: keep as phone/postcard**, not email. Service + LP exist for CLIA; drive via mail to a "check your expiration" web tool that captures email.

## The honest meta-point

We have spent effort proving that **most government registries are email-poor.** The reliable email money is: FCC + FMCSA (native), plus the **healthcare org emails we already harvested**. Everything else is either a scrape gamble or a phone/mail channel. Before building any new vertical, confirm its email path falls in Tier 1-2; if it is Tier 3, pilot the scrape yield FIRST (like we should have for CLIA); if Tier 4, don't pretend it is an email channel.

## Update 2026-06-13: healthcare org-email diversification (ACTED ON #1)

Unlocked the full verified institutional pool for broad offers:

- Root cause found: OIG/NPPES segments were gated by a warmup selector that excluded `not_on_list` rows (a deliverability proxy that excluded ~62k of the 63k -- org NPIs are not individual Medicare enrollees). Since we already SMTP-verified every inbox, added `institutional_verified` selector that trusts our verification. OIG screening + NPPES update now address **62,422** (was ~1,437).
- `enrich_institutional_revalidation.py` joins the institutional list to the CMS Revalidation Due Date List (revalidation_base.csv) by NPI -> ~1,437 genuine Medicare enrollees (197 overdue / 164 due-soon) for the flagship $599 reval pitch.
- `pw-hc-nppes` cron now runs oig_screening + nppes_outdated + revalidation_overdue + revalidation_due_soon against the enriched file (still warmup-capped + MX-throttled; bigger supply, same safe send rate). npi_reactivation stays on the accurate leie_or_deactivated selector (no false "deactivated" claims).
- `pw-hc-refresh` cron now re-downloads + re-joins the reval base so overdue figures stay accurate.
- MAINTENANCE: the CMS bulk file URLs (revalidation_base.csv, CLIA) embed a dated path that rotates ~monthly. If the download 404s, re-fetch the dataset's current downloadURL from https://data.cms.gov/data.json. Consider switching to the dataset's stable data-api endpoint.
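
The MAINTENANCE fallback above can be sketched roughly as follows. This is a minimal illustration, not the code in `pw-hc-refresh`: the function names are hypothetical, and it assumes the catalog at data.json follows the common DCAT-US layout (a top-level `dataset` list whose entries carry a `title` and a `distribution` list with `downloadURL` and `mediaType`). The exact dataset title in the live catalog is also an assumption and should be checked before relying on the match.

```python
import json
import urllib.request

CATALOG_URL = "https://data.cms.gov/data.json"  # from the maintenance note


def fetch_catalog(url=CATALOG_URL, timeout=60):
    """Download and parse the catalog JSON (network call; never run on import)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def current_csv_urls(catalog, title_fragment):
    """Return CSV downloadURLs for datasets whose title contains title_fragment.

    ASSUMPTION: DCAT-US style layout, i.e.
    {"dataset": [{"title": ..., "distribution": [{"downloadURL": ..., "mediaType": ...}]}]}
    """
    needle = title_fragment.lower()
    urls = []
    for dataset in catalog.get("dataset", []):
        if needle not in dataset.get("title", "").lower():
            continue
        for dist in dataset.get("distribution", []):
            url = dist.get("downloadURL")
            if not url:
                continue  # e.g. API-only distributions without a file URL
            is_csv = dist.get("mediaType") == "text/csv" or url.lower().endswith(".csv")
            if is_csv:
                urls.append(url)
    return urls


# Offline illustration with a made-up catalog (hypothetical titles and URLs).
sample_catalog = {
    "dataset": [
        {
            "title": "Revalidation Due Date List",
            "distribution": [
                {"accessURL": "https://example.invalid/data-api/reval"},
                {
                    "downloadURL": "https://example.invalid/2026-06/revalidation_base.csv",
                    "mediaType": "text/csv",
                },
            ],
        },
        {
            "title": "Some Other Dataset",
            "distribution": [
                {"downloadURL": "https://example.invalid/other.csv", "mediaType": "text/csv"}
            ],
        },
    ]
}

reval_urls = current_csv_urls(sample_catalog, "revalidation due date")
```

In the refresh job, the idea would be to call `current_csv_urls(fetch_catalog(), ...)` only after the pinned URL returns a 404, then retry the download with the first match. Matching on a title fragment is brittle if CMS renames the dataset, which is one more argument for the stable data-api endpoint mentioned above.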