docs: verified free NPI email-append paths (NPPES endpoint file + free SMTP/MX verify)

justin 2026-06-05 01:00:54 -05:00
parent 604ad151c7
commit 091ebbd7f9


@ -403,3 +403,77 @@ The single best, defensible, dateable hook is **217,968 providers with OVERDUE
Medicare revalidation**, each enrichable with NPPES address/phone/fax for
outreach. That is a larger and harder-deadline audience than the FCC RMD list,
and the $399 revalidation filing is a clean flagship product.

---
## 8. Free Email Append for NPI — VERIFIED FINDINGS
**Yes, there is a partial free email source, plus a free verification path.**
Investigated and tested live on the session date.
### 8.1 NPPES Endpoint file = free, NPI-keyed email addresses
The NPPES dissemination ZIP contains a separate **`endpoint_pfile`** (123 MB)
that we had not previously parsed. It holds electronic endpoints keyed by **NPI**.
Verified contents:
- **597,927 endpoint rows, covering 491,761 distinct NPIs.**
- **390,639 rows are email-formatted** (`user@domain.tld`).
- Endpoint types: DIRECT 356,394 · CONNECT 91,616 · SOAP 56,543 · FHIR 46,764 ·
OTHERS 45,938 · REST 672.
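
The counts above could be reproduced with one streaming pass over the CSV. This is a minimal sketch, not the script that produced the figures. The column names `NPI`, `Endpoint Type`, and `Endpoint` are assumptions to check against the file's real header row, and the email regex is a deliberately loose illustration.

```python
import csv
import re
from collections import Counter

# Loose "looks like user@domain.tld" check (an assumption, not a full RFC 5322 parser).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")


def summarize_endpoints(rows, npi_col="NPI", type_col="Endpoint Type",
                        value_col="Endpoint"):
    """Count rows, distinct NPIs, email-shaped endpoints, and endpoint types.

    `rows` is any iterable of dicts, e.g. a csv.DictReader over the endpoint
    file. The default column names are assumed, not confirmed.
    """
    total = 0
    npis = set()
    email_rows = 0
    types = Counter()
    for row in rows:
        total += 1
        npis.add((row.get(npi_col) or "").strip())
        types[(row.get(type_col) or "").strip().upper()] += 1
        if EMAIL_RE.match((row.get(value_col) or "").strip()):
            email_rows += 1
    return {
        "rows": total,
        "distinct_npis": len(npis),
        "email_rows": email_rows,
        "types": dict(types),
    }


def summarize_file(path):
    """Stream the endpoint CSV from disk without loading it all into memory."""
    with open(path, newline="", encoding="utf-8") as fh:
        return summarize_endpoints(csv.DictReader(fh))
```

Streaming through `csv.DictReader` keeps memory use flat on the 123 MB file; only the set of distinct NPIs grows with the input.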
**The honest catch:** most are **Direct Secure Messaging (HISP) addresses**, not
normal inboxes. The top domains are health-system Direct gateways
(`ehrdirect.mayoclinicmsg.org`, `direct.iuhealth.org`, `upmcdirect.com`, …).
Direct addresses route only inside the DirectTrust network, so **you cannot
cold-email them from a normal mail server; the messages will not deliver.** The
raw ~391k email-formatted rows are therefore NOT a usable marketing email list.
**BUT — the usable slice:** a meaningful subset are **real consumer/business
inboxes** the provider self-published:
- **~19,759 rows on common consumer webmail** (gmail.com 12,427, plus yahoo,
hotmail, outlook, aol, icloud).
- Verified samples are clearly personal/practice inboxes: `tcneurology@gmail.com`,
`veinsofkc@yahoo.com`, `kendalncarlsondmd@gmail.com`, `scottcopt@aol.com`, etc.
- Plus an additional long tail of non-Direct **practice-domain** emails
(clinic websites) that are also normal inboxes.
So the genuinely free, cold-emailable slice from NPPES endpoints is on the order
of **tens of thousands** (consumer webmail + real practice domains), not the full
391k. Still free, still NPI-keyed (joins to revalidation/LEIE/etc.), and exactly
the small-practice owner-operators who are our buyer.
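
One way to separate the usable slice from the Direct addresses is a simple domain-based classifier. The sketch below is a heuristic under stated assumptions: the consumer-domain set mirrors only the providers named above, the `direct`/`hisp` substring hints are guesses based on the sample gateway domains, and the bucket names are hypothetical.

```python
# Consumer webmail domains named in the findings above (not an exhaustive list).
CONSUMER_DOMAINS = {
    "gmail.com", "yahoo.com", "hotmail.com",
    "outlook.com", "aol.com", "icloud.com",
}

# Assumed substrings that suggest a Direct/HISP gateway domain (heuristic only).
DIRECT_HINTS = ("direct", "hisp")


def classify_endpoint_email(address, endpoint_type=""):
    """Bucket one endpoint value by how likely it is to be a normal inbox."""
    local, sep, domain = address.strip().lower().rpartition("@")
    if not sep or not local or "." not in domain:
        return "not_email"
    # Consumer webmail wins even when the row is typed DIRECT, because the
    # provider self-published a normal inbox in that field.
    if domain in CONSUMER_DOMAINS:
        return "consumer_webmail"
    if endpoint_type.strip().upper() == "DIRECT":
        return "direct"
    if any(hint in domain for hint in DIRECT_HINTS):
        return "direct"
    # Everything else still needs MX/SMTP verification before use.
    return "practice_domain_candidate"
```

The classifier is deliberately conservative: a practice-domain inbox filed under the DIRECT type is treated as Direct and dropped, which loses some yield but avoids sending to addresses that cannot deliver.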
### 8.2 Free SMTP/MX verification is possible from our infra
Tested from this host:
- **Port 25 egress is OPEN** (connected to `gmail-smtp-in.l.google.com:25`, got
`220` banner).
- **MX lookups work** (resolved MX for gmail.com, mayoclinic.org).
That means we can run **free email verification** ourselves (MX check + SMTP
RCPT-TO probe) with no paid validation vendor, to:
1. Filter the endpoint emails down to deliverable ones.
2. Verify guessed emails for the domain-inference path below.
> Caution: aggressive SMTP probing can get an IP greylisted/blocked. Throttle,
> rotate, and prefer MX-only validation where possible. Do it from a non-sending
> IP so it never touches our warmed MTA reputation.
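
The RCPT-TO probe described above can be sketched with the standard-library `smtplib`. This is an illustration, not the probe that was run for these notes. The MX host is passed in because the standard library has no MX resolver (a DNS library such as dnspython would be needed for that step), and `helo_name`, `mail_from`, and the result labels are hypothetical.

```python
import smtplib


def classify_rcpt_code(code):
    """Map an SMTP reply code for RCPT TO onto a coarse verdict."""
    if 200 <= code < 300:
        # Accepted. Not proof of an inbox: catch-all domains accept everything.
        return "accepted"
    if 400 <= code < 500:
        # Temporary failure, e.g. greylisting; retry later instead of dropping.
        return "retry_later"
    if 500 <= code < 600:
        return "rejected"
    return "unknown"


def probe_rcpt(mx_host, address, helo_name, mail_from, timeout=10.0):
    """Open an SMTP session, issue MAIL FROM and RCPT TO, and send no message.

    Returns one of: accepted, retry_later, rejected, unknown.
    """
    try:
        with smtplib.SMTP(mx_host, 25, timeout=timeout) as smtp:
            smtp.ehlo(helo_name)
            mail_code, _ = smtp.mail(mail_from)
            if not 200 <= mail_code < 300:
                return "unknown"
            rcpt_code, _ = smtp.rcpt(address)
    except (smtplib.SMTPException, OSError):
        # Connection refused, timeout, or the server dropped the session.
        return "unknown"
    return classify_rcpt_code(rcpt_code)
```

No DATA command is issued, so nothing is delivered. Keeping the reply-code mapping in its own function lets it be tested without touching the network, and a caller can add per-domain throttling around `probe_rcpt` as the caution above recommends.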
### 8.3 Free domain-inference append (for the rest)
For NPIs without a usable endpoint email but with an org name + practice address:
1. Find the practice **website** (search name + city; or guess `name.com`).
2. Generate candidate emails (`info@`, `office@`, `contact@`, `first.last@`).
3. **MX + SMTP verify for free** (8.2) and keep only deliverable.
This costs nothing in vendor fees, only our own time and infrastructure. Expect a
lower hit rate than a paid append vendor.
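
Step 2 of the list above is mechanical and can be sketched as follows. The function name and the name-normalization rule are assumptions for illustration; the candidate patterns are the four listed in step 2.

```python
import re

# Role-based local parts from step 2 of the list above.
ROLE_LOCALS = ("info", "office", "contact")


def candidate_emails(domain, first_name="", last_name=""):
    """Generate guessable addresses for a practice domain.

    Role addresses are always produced; `first.last@` is added only when
    both names are present after stripping non-letters.
    """
    domain = domain.strip().lower()
    candidates = [f"{role}@{domain}" for role in ROLE_LOCALS]
    first = re.sub(r"[^a-z]", "", first_name.lower())
    last = re.sub(r"[^a-z]", "", last_name.lower())
    if first and last:
        candidates.append(f"{first}.{last}@{domain}")
    return candidates
```

Each candidate would then go through the MX and SMTP checks from 8.2, and only addresses that pass are kept.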
### 8.4 Bottom line
| Path | Cost | Yield | Cold-emailable? |
|---|---|---|---|
| NPPES endpoint Direct addresses (~356k) | free | high count | ❌ no (HISP-only routing) |
| NPPES endpoint consumer/practice inboxes (~20k+) | free | tens of thousands | ✅ yes |
| Domain-inference + free SMTP verify | free (compute) | medium, varies | ✅ yes |
| Paid B2B email append vendor | $ per match | highest | ✅ yes |

**Recommendation:** build a free pipeline that (a) extracts the cold-emailable
subset of endpoint emails, (b) domain-infers and free-SMTP-verifies the rest, and
(c) falls back to phone/fax/mail for non-matches. This recovers a real email
channel for a meaningful chunk of the 217,968 overdue-revalidation targets at
**zero vendor cost**, and we can verify deliverability ourselves since port 25
egress and MX lookups both work from this host.