# Vertical Lead-Source Analysis: Ranked by Email Reliability

**Date:** 2026-06-13

**Purpose:** The proven bottleneck for every cold-email vertical is NOT the
deficiency signal or the audience size -- it is whether a reliable, public, bulk
source gives us a **deliverable email** (or a clean, high-yield path to one).
This ranks candidate verticals by that single criterion, using what we verified
this session (FCC, FMCSA work; CLIA email-match was 0.3% = dead).

## The rule (learned the hard way)

A vertical is **email-viable** only if ONE of these is true:

1. The public registry **contains the email** (FCC RMD `contact_email`, FMCSA
   carrier `email_address`). -> Tier 1, just send.
2. The registry maps to a **second free public source that has email** by a clean
   key (NPI, FRN, CIK, domain). -> Tier 2, one enrichment hop.
3. The targets reliably have **websites** so a domain->scrape gets email at
   decent yield. -> Tier 3, scrape pipeline (proxy).

Otherwise it is **phone / direct-mail only** (CLIA, EPA RCRA, raw NPPES
individuals). Still real money, just not cold email.

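
The rule above reads as a first-match decision ladder. A minimal sketch of it, assuming a hypothetical `Vertical` record whose three fields are filled in by hand after inspecting a registry (none of these names exist in our codebase):

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    NATIVE_EMAIL = 1     # email is a field in the registry itself
    ONE_HOP = 2          # one free public join away from an email
    DOMAIN_SCRAPE = 3    # needs a website scrape to find an email
    PHONE_MAIL_ONLY = 4  # not a cold-email channel


@dataclass(frozen=True)
class Vertical:
    """Hypothetical record describing what a registry gives us."""
    name: str
    registry_has_email: bool = False
    join_key: Optional[str] = None  # e.g. "NPI", "FRN", "CIK", "domain"
    targets_have_websites: bool = False


def classify(v: Vertical) -> Tier:
    """Apply the rule in order; the first condition that holds wins."""
    if v.registry_has_email:
        return Tier.NATIVE_EMAIL
    if v.join_key:
        return Tier.ONE_HOP
    if v.targets_have_websites:
        return Tier.DOMAIN_SCRAPE
    return Tier.PHONE_MAIL_ONLY


examples = [
    Vertical("FMCSA trucking", registry_has_email=True),
    Vertical("Healthcare org NPIs", join_key="NPI"),
    Vertical("FMC OTI", targets_have_websites=True),
    Vertical("CLIA labs"),
]
tiers = {v.name: classify(v) for v in examples}
```

The ordering matters: a registry that has both a native email and a website still lands in Tier 1, because the cheapest reliable path is checked first.
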
## Tier 1 -- Email is IN the registry (send today)

| Vertical | Source | Email field | Recurring obligation | Status |
|---|---|---|---|---|
| **FCC carriers / VoIP / ISP** | FCC RMD, 499 filer, CORES | `contact_email` (native) | RMD annual, 499-A/Q, CPNI annual | LIVE (built) |
| **FMCSA trucking** | FMCSA carrier census | `email_address` (native) | MCS-150 biennial, IFTA quarterly, UCR annual | LIVE (built) |

These are the whole reason the business works. Nothing else is as clean.

## Tier 2 -- One free public hop to email (worth building)

| Vertical | Registry (no email) | Email source + key | Yield estimate | Notes |
|---|---|---|---|---|
| **Healthcare providers (org NPIs)** | NPPES | NPPES **endpoint_pfile** (Direct/email endpoints), keyed by NPI | ~88k institutional emails harvested, ~63k verified | ALREADY HARVESTED. The org/institutional slice has real emails (we filtered HISP/Direct gateways). Individual NPIs do NOT. Recurring: revalidation, NPPES update, OIG screening. |
| **Public companies (OTC/SEC filers)** | SEC EDGAR (CIK, state of incorp, phone, addr, **website**) | website domain -> scrape IR/contact email; or email-append | Medium-high (real cos w/ IR pages) | ~2,771 SEC-reporting OTC issuers; Delaware/Nevada heavy. Hook: reincorporate-to-TX, annual report, RA, franchise tax. Small but high-ticket. |

## Tier 3 -- Domain-scrape required (proxy pipeline; medium yield)

| Vertical | Registry | Why scrape | Yield |
|---|---|---|---|
| **FMC Ocean Transportation Intermediaries (NVOCC/forwarders)** | FMC OTI lookup | few thousand licensees, most have websites | medium-high; small universe but real businesses + bonds renew |
| **State business entities (formation/RA/foreign-qual)** | State SOS bulk (FL/CA/VA/TX free; Socrata) | millions of entities, name+addr+officers, often a website | low-medium per scrape, but HUGE universe; better to target by trigger (newly-formed, delinquent, foreign-qual) |

## Tier 4 -- Phone / direct-mail only (NOT cold email)

| Vertical | Registry | Why not email | Best channel |
|---|---|---|---|
| **CLIA labs** | CMS POS CLIA file | no NPI, no email; NPPES name+zip match = **0.3%** (verified dead) | postcard (~3,100/wk full coverage), phone |
| **EPA RCRA hazardous-waste handlers** | ECHO bulk | no email anywhere in ECHO | phone (RCRAInfo), mail, append |
| **NPPES individual providers** | NPPES | individuals have phone/fax, rarely a usable org email | phone, fax, web inbound |

## Net recommendation (where to invest next, in order)

1. **Mine the healthcare ORG emails we already harvested harder** (Tier 2, zero
   new cost). 63k verified institutional emails -> diversify triggers beyond NPI
   revalidation: NPPES staleness, OIG/SAM screening, org-NPI corrections. The
   data is already on prod.
2. **SEC/OTC corporate** (Tier 2). Small universe (~2.7k) but high-ticket
   (reincorporation, RA, franchise tax, foreign-qual) and a timely TX hook.
   EDGAR is free + bulk-OK; emails via website-domain scrape (we have the
   pipeline design from CLIA). Worth a pilot because the per-deal value is high.
3. **State business entities by TRIGGER** (Tier 3, biggest universe). Do NOT
   blast all entities; target newly-formed (need RA/EIN/OA),
   delinquent/admin-dissolved (reinstatement), or foreign-qualification
   candidates. Free bulk from FL/CA/VA; email via domain-scrape. This is the
   largest TAM if the scrape yields.
4. **FMC OTI** (Tier 3, small but clean): few thousand, website-rich, bonds renew
   annually. Quick win if we want another trucking-adjacent vein.
5. **CLIA / EPA RCRA: keep as phone/postcard**, not email. Service + LP exist for
   CLIA; drive via mail to a "check your expiration" web tool that captures email.

## The honest meta-point

We have spent effort proving that **most government registries are email-poor.**
The reliable email money is: FCC + FMCSA (native), plus the **healthcare org
emails we already harvested**. Everything else is either a scrape gamble or a
phone/mail channel. Before building any new vertical, confirm its email path
falls in Tier 1-2; if it is Tier 3, pilot the scrape yield FIRST (like we should
have for CLIA); if Tier 4, don't pretend it is an email channel.

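
One way to make "pilot the scrape yield FIRST" concrete is to score a small random sample before building anything. The sketch below is only an illustration: it uses a Wilson score lower bound so that a lucky small sample does not look better than it is, and the go/no-go threshold, sample size and universe size are hypothetical numbers, not figures from our pipelines.

```python
import math


def wilson_lower_bound(hits: int, n: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for the proportion hits/n."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = hits / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return max(0.0, (centre - margin) / denom)


def pilot_verdict(hits: int, sampled: int, universe: int, min_yield: float) -> dict:
    """Summarise a pilot: observed yield, conservative yield, projected emails."""
    lower = wilson_lower_bound(hits, sampled)
    return {
        "observed_yield": hits / sampled,
        "yield_lower_bound": lower,
        "projected_emails_floor": math.floor(universe * lower),
        "go": lower >= min_yield,  # min_yield is a hypothetical threshold
    }


# Illustrative only: 3 usable emails from 1,000 sampled records (a 0.3% rate)
# against a hypothetical universe of 10,000 and a hypothetical 5% threshold.
verdict = pilot_verdict(hits=3, sampled=1000, universe=10_000, min_yield=0.05)
```

A pilot that returns `go: False` at this stage costs a sample, not a build.
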
## Update 2026-06-13: healthcare org-email diversification (ACTED ON #1)

Unlocked the full verified institutional pool for broad offers:

- Root cause found: OIG/NPPES segments were gated by a warmup selector that
  excluded `not_on_list` rows (a deliverability proxy that excluded ~62k of the
  63k -- org NPIs are not individual Medicare enrollees). Since we already
  SMTP-verified every inbox, added `institutional_verified` selector that trusts
  our verification. OIG screening + NPPES update now address **62,422** (was
  ~1,437).
- `enrich_institutional_revalidation.py` joins the institutional list to the CMS
  Revalidation Due Date List (`revalidation_base.csv`) by NPI -> ~1,437 genuine
  Medicare enrollees (197 overdue / 164 due-soon) for the flagship $599 reval pitch.
- `pw-hc-nppes` cron now runs `oig_screening` + `nppes_outdated` +
  `revalidation_overdue` + `revalidation_due_soon` against the enriched file
  (still warmup-capped + MX-throttled; bigger supply, same safe send rate).
  `npi_reactivation` stays on the accurate `leie_or_deactivated` selector (no
  false "deactivated" claims).
- `pw-hc-refresh` cron now re-downloads + re-joins the reval base so overdue
  figures stay accurate.
- MAINTENANCE: the CMS bulk file URLs (`revalidation_base.csv`, CLIA) embed a
  dated path that rotates ~monthly. If the download 404s, re-fetch the dataset's
  current downloadURL from https://data.cms.gov/data.json. Consider switching to
  the dataset's stable data-api endpoint.

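
The join-by-NPI step described above can be sketched as follows. This is not the contents of `enrich_institutional_revalidation.py`; the column names (`npi`, `email`, `due_date`), the ISO date format and the 90-day "due soon" window are all assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Iterable, Optional


def parse_due(value: str) -> Optional[date]:
    """Parse an ISO date; return None for blanks or placeholders such as 'TBD'."""
    try:
        return date.fromisoformat(value.strip())
    except ValueError:
        return None


def bucket(due: date, today: date, soon_days: int = 90) -> str:
    """Classify a due date relative to today (soon_days is a hypothetical window)."""
    if due < today:
        return "overdue"
    if due <= today + timedelta(days=soon_days):
        return "due_soon"
    return "not_due"


def enrich(institutional: Iterable[dict], reval: Iterable[dict],
           today: date, soon_days: int = 90) -> list:
    """Left-join institutional rows to the revalidation list by NPI."""
    due_by_npi = {r["npi"].strip(): parse_due(r.get("due_date", "")) for r in reval}
    out = []
    for row in institutional:
        npi = row["npi"].strip()
        if npi not in due_by_npi:
            status = "not_on_list"  # org NPI with no Medicare enrollment row
        elif due_by_npi[npi] is None:
            status = "no_due_date"
        else:
            status = bucket(due_by_npi[npi], today, soon_days)
        out.append({**row, "reval_status": status})
    return out


sample = enrich(
    institutional=[{"npi": "1", "email": "a@example.org"},
                   {"npi": "2", "email": "b@example.org"},
                   {"npi": "4", "email": "d@example.org"}],
    reval=[{"npi": "1", "due_date": "2026-05-01"},
           {"npi": "2", "due_date": "2026-07-01"}],
    today=date(2026, 6, 13),
)
```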
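
For the rotating-URL maintenance note, a hedged sketch of looking up the current `downloadURL` in the catalog. It assumes the catalog uses the usual open-data layout (a top-level `dataset` list whose entries carry a `title` and a `distribution` list with `downloadURL` fields), that the first matching CSV distribution is the current one, and that the dataset title matches exactly; verify all three against the live file before relying on it.

```python
import json
from typing import Optional
from urllib.request import urlopen

CATALOG_URL = "https://data.cms.gov/data.json"


def fetch_catalog(url: str = CATALOG_URL, timeout: int = 60) -> dict:
    """Download and parse the catalog JSON (network call, not run here)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def find_download_url(catalog: dict, title: str, suffix: str = ".csv") -> Optional[str]:
    """Return the first downloadURL ending in `suffix` for the titled dataset."""
    wanted = title.strip().lower()
    for dataset in catalog.get("dataset", []):
        if dataset.get("title", "").strip().lower() != wanted:
            continue
        for dist in dataset.get("distribution", []):
            url = dist.get("downloadURL")
            if url and url.lower().endswith(suffix):
                return url
    return None


# Offline example with a made-up catalog fragment (URLs are placeholders).
fake_catalog = {
    "dataset": [
        {"title": "Some Other Dataset",
         "distribution": [{"downloadURL": "https://example.org/other.csv"}]},
        {"title": "Revalidation Due Date List",
         "distribution": [{"accessURL": "https://example.org/api"},
                          {"downloadURL": "https://example.org/2026-06/reval.csv"}]},
    ]
}
current = find_download_url(fake_catalog, "Revalidation Due Date List")
```

In a refresh job this would run only after a 404, for example `find_download_url(fetch_catalog(), "Revalidation Due Date List")`, with a `None` result treated as an alert rather than silently skipped.
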
## Update 2026-06-14: trucking/main pool per-MX throttling (deliverability fix)

Root cause of the persistent main-pool 54% delivery rate and the Gmail/Outlook
block storm (Jun 13-14), now PROVEN by MX-tagging the carrier pool:

- **702,214 carriers on Google + 135,129 on Microsoft** -- the warmup was
  hammering exactly the two operators blocking us (no per-MX throttle on trucking,
  only HC had it).
- Fix: migration 097 (`mx_provider`) + `mx_tag_carriers.py` (concurrent MX resolve,
  bulk temp-table-join write -- 1.24M/1.49M carriers tagged).
  `build_trucking_campaigns` now EXCLUDES Google/Microsoft/Proofpoint/etc. until
  warmup day 30 (reputation recovery), per-MX caps thereafter. Untagged carriers
  pass (most are now tagged).
- Effect on the MCS-150 overdue pool: 496,743 sendable -> 230,135 after excluding
  263,515 Google/MS carriers. Plenty of long-tail volume (yahoo/comcast/charter/
  centurylink/windstream/earthlink/...) to warm on safely while reputation recovers.
- MAINTENANCE: re-run `mx_tag_carriers.py` periodically (or add to the trucking
  cron precursor) to tag newly-added carriers; flip `MAIN_BIG_MX_EXCLUDE_UNTIL_DAY`
  or `MAIN_SKIP_BIG_MX=0` once Postmaster Tools shows recovered reputation.

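
The tagging and gating logic described above can be sketched roughly as below. This is not the code in `mx_tag_carriers.py` or `build_trucking_campaigns`: the suffix table is a small hypothetical subset, the daily cap value is made up, and treating days before `exclude_until_day` as excluded is an assumption about how the cutoff is counted. The keyword arguments loosely mirror `MAIN_SKIP_BIG_MX` and `MAIN_BIG_MX_EXCLUDE_UNTIL_DAY`.

```python
from typing import Mapping, Optional

# Hypothetical, incomplete suffix table keyed by provider tag.
MX_SUFFIXES = {
    "google": (".google.com", ".googlemail.com"),
    "microsoft": (".outlook.com",),
    "proofpoint": (".pphosted.com", ".ppe-hosted.com"),
}
BIG_MX = frozenset(MX_SUFFIXES)


def mx_provider(mx_host: str) -> str:
    """Map an already-resolved MX hostname to a provider tag."""
    host = mx_host.strip().rstrip(".").lower()
    for provider, suffixes in MX_SUFFIXES.items():
        if host.endswith(suffixes):
            return provider
    return "other"


def allow_send(provider: Optional[str], warmup_day: int,
               sent_today: Mapping[str, int], *,
               skip_big_mx: bool = True, exclude_until_day: int = 30,
               daily_cap: int = 200) -> bool:
    """Gate one recipient: untagged and long-tail pass, big MX is held then capped."""
    if provider is None or provider not in BIG_MX:
        return True  # untagged carriers and long-tail operators pass
    if skip_big_mx and warmup_day < exclude_until_day:
        return False  # reputation-recovery window
    return sent_today.get(provider, 0) < daily_cap  # per-MX cap (hypothetical value)


tags = [mx_provider(h) for h in (
    "aspmx.l.google.com.",
    "contoso-com.mail.protection.outlook.com",
    "mx0a-001.pphosted.com",
    "mx.centurylink.net",
)]
```

Keeping classification separate from the gate means the suffix table can grow without touching campaign logic, and the gate can be relaxed by configuration alone once reputation recovers.
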