docs: SEC/OTC pilot results - viable (domain free from EDGAR filings, 100%)

Ran the email-findability pilot we should have run for CLIA. SEC/OTC is viable:
~940 US-domestic OTC issuers; domain recoverable from the 10-K/8-K filing itself
at ~100% (4/4 in test; free, no scrape); email via site scrape ~25% measured
(1/4), ~25-50% expected with a deeper scrape; phone 100%. High per-deal value
(reincorporation/RA/foreign-qual/franchise tax). Documented the build plan.
justin 2026-06-14 01:22:04 -05:00
parent 1465690832
commit 591e387513


@@ -200,3 +200,37 @@ For a *reincorporation* pitch (a 4-5 figure decision), a tighter, partly direct-
- SEC Fair-Access policy: `https://www.sec.gov/os/accessing-edgar-data` ("no more than 10 requests per second", declare User-Agent)
- EDGAR full-text search: `https://efts.sec.gov/LATEST/search-index` (Texas reincorporation filing counts)
- Texas Business Organizations Code Ch. 10 Subch. C (conversion/domestication); Texas Business Court (eff. 2024-09-01); Texas Stock Exchange (TXSE), 2024-2025.
---
## PILOT RESULTS (2026-06-13) -- SEC/OTC is VIABLE (better than CLIA)
Ran the email-findability pilot (the make-or-break test we skipped on CLIA):
| Metric | Result | How |
|---|---|---|
| OTC/None SEC issuers (universe) | 2,771 | `company_tickers_exchange.json` |
| US-domestic (reincorporation-eligible) | ~34% = **~940** | `stateOfIncorporation` in the per-CIK submissions JSON |
| DE/NV (prime reincorp/foreign-qual) | ~22% = **~610** | same |
| **Website/domain recoverable** | **~100%** | extracted directly from the company's recent 10-K/8-K filing HTML on EDGAR -- FREE, bulk-OK, NO scraping/proxy needed (4/4 in test: fortitudegold.com, mobivity.com, good-gaming.com, fzmd.com) |
| Email via basic home/contact scrape | ~25% (1/4: info@fortitudegold.com) | many use contact forms / JS mailto -> improvable with deeper scrape |
| Phone + business address | **100%** | submissions JSON `phone` + `addresses.business` |
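A minimal sketch of the universe and US-domestic filters behind the first rows of the table, run here on in-memory sample data rather than live EDGAR responses. The JSON shapes are assumptions: a `fields`/`data` layout for `company_tickers_exchange.json`, and two-letter US state codes in `stateOfIncorporation`. The helper names `otc_ciks` and `issuer_record` and the sample issuers are hypothetical.

```python
# Assumption: stateOfIncorporation holds a two-letter US state code for
# domestic issuers and some other code (or nothing) for foreign ones.
US_STATES = set(
    "AL AK AZ AR CA CO CT DE DC FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN "
    "MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA "
    "WV WI WY".split()
)
PRIME_STATES = {"DE", "NV"}  # prime reincorp/foreign-qual targets per the table


def otc_ciks(tickers_exchange):
    """Return CIKs whose exchange is "OTC" or missing (None).

    Assumes the master list looks like
    {"fields": ["cik", "name", "ticker", "exchange"], "data": [[...], ...]}.
    """
    fields = tickers_exchange["fields"]
    i_cik = fields.index("cik")
    i_exch = fields.index("exchange")
    return [row[i_cik] for row in tickers_exchange["data"]
            if row[i_exch] in ("OTC", None)]


def issuer_record(submission):
    """Build a lead record from one per-CIK submissions dict.

    Returns None when the issuer is not US-domestic.
    """
    state = (submission.get("stateOfIncorporation") or "").upper()
    if state not in US_STATES:
        return None
    addresses = submission.get("addresses") or {}
    return {
        "name": submission.get("name"),
        "cik": submission.get("cik"),
        "state": state,
        "prime": state in PRIME_STATES,
        "phone": submission.get("phone"),
        "business_address": addresses.get("business"),
    }


# Hypothetical sample data, shaped like the assumed EDGAR payloads.
sample_master = {
    "fields": ["cik", "name", "ticker", "exchange"],
    "data": [
        [1, "Example Listed Co", "EXL", "NYSE"],
        [2, "Example OTC Co", "EXO", "OTC"],
        [3, "Example Unlisted Co", "EXU", None],
    ],
}
sample_submission = {
    "cik": "2",
    "name": "Example OTC Co",
    "stateOfIncorporation": "NV",
    "phone": "555-0100",
    "addresses": {"business": {"city": "Reno", "stateOrCountry": "NV"}},
}
```

With the sample data, `otc_ciks(sample_master)` keeps the OTC and unlisted rows, and `issuer_record(sample_submission)` yields a record flagged `prime` because the issuer is incorporated in NV.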
**Why this beats CLIA:** the domain (the thing CLIA lacked) comes FREE from the
filing itself. Email yield is ~25% measured (1/4), ~25-50% expected with a deeper
scrape; phone is 100%. Small universe but high per-deal value (reincorporation,
registered agent, foreign-qualification, franchise tax, annual report). EDGAR is
free + explicitly bulk-OK (max 10 req/s, declare a User-Agent).
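One way to stay inside the fair-access limit quoted above (10 req/s, declared User-Agent) is a small client-side throttle. This is a sketch under stated assumptions, not the harvester's actual code: the `Throttle` class and `build_request` helper are hypothetical names, and the clock and sleep functions are injectable only so the pacing logic can be checked without real waiting.

```python
import time
import urllib.request


class Throttle:
    """Space calls at least 1/max_per_second seconds apart."""

    def __init__(self, max_per_second=10.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0  # earliest time the next call may proceed

    def wait(self):
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = self._next_allowed - now
        if delay > 0:
            self._sleep(delay)
            now += delay
        else:
            delay = 0.0
        self._next_allowed = now + self.min_interval
        return delay


def build_request(url, user_agent):
    """Build a request that declares a User-Agent, as the policy asks."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

A caller would invoke `throttle.wait()` before each `urllib.request.urlopen(build_request(url, ua))`. Setting `max_per_second` below 10 leaves headroom if several workers share one IP.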
### Build plan
1. `harvest_otc_issuers.py`: pull master list -> filter exchange OTC/None ->
per-CIK submissions JSON -> keep US-domestic -> record name, ticker, CIK,
stateOfIncorporation, phone, business address, and the **domain extracted from
the latest 10-K/8-K** (regex the filing HTML, drop sec.gov/filing-agent noise).
2. Scrape domain -> contact/IR email (home + /contact + /investors +
/investor-relations; gzip+HTML-only; ~25-50% yield). Phone is the fallback channel.
3. Verify emails (existing verifier, .72).
4. Offer/segment: lead with the reincorporate-to-Texas hook (Business Court +
TXSE, real trend in filings) for DE/NV issuers; cross-sell RA / foreign-qual /
annual-report / franchise tax. CAN-SPAM B2B, full address + unsubscribe.
5. Channel split: email the ~25-50% we get addresses for; the rest are a clean
PHONE list (100% have phone) -- corporate/IR lines, real businesses.
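Steps 1 and 2 both come down to regex extraction over HTML, which can be sketched as follows. This is a hedged illustration, not the planned implementation: the `NOISE` host list, both regexes, and the function names `extract_domain` and `extract_emails` are assumptions, and a real filing-agent blocklist would need to be built from observed filings.

```python
import re
from collections import Counter

# Hosts that appear in filing HTML but are not the issuer's own site.
# Assumption: an illustrative list only; extend with filing-agent domains.
NOISE = ("sec.gov", "xbrl.org", "w3.org", "fasb.org")

DOMAIN_RE = re.compile(
    r"(?:https?://|www\.)([a-z0-9][a-z0-9.-]*\.[a-z]{2,})", re.IGNORECASE
)
EMAIL_RE = re.compile(
    r"[a-z0-9._%+-]+@([a-z0-9.-]+\.[a-z]{2,})", re.IGNORECASE
)


def extract_domain(filing_html):
    """Return the most frequently mentioned non-noise host, or None."""
    counts = Counter()
    for match in DOMAIN_RE.finditer(filing_html):
        host = match.group(1).lower().rstrip(".")
        if host.startswith("www."):
            host = host[4:]
        if any(host == n or host.endswith("." + n) for n in NOISE):
            continue
        counts[host] += 1
    return counts.most_common(1)[0][0] if counts else None


def extract_emails(page_html, domain):
    """Return unique addresses on `domain` (or a subdomain), in page order."""
    found = []
    for match in EMAIL_RE.finditer(page_html):
        address = match.group(0).lower()
        host = match.group(1).lower()
        on_domain = host == domain or host.endswith("." + domain)
        if on_domain and address not in found:
            found.append(address)
    return found
```

Picking the most frequent host is a simple heuristic; it assumes the issuer's own site is mentioned more often than any third party once the noise hosts are dropped. Plain regex matching will not see addresses assembled by JavaScript or hidden behind contact forms, which is consistent with the low basic-scrape yield in the pilot.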