scripts(otc): dedupe by CIK; commit the 861-company lead list

The master file lists warrants/units as separate tickers under one CIK, so the
pull now dedupes to one row per company (other tickers kept in all_tickers).

data/otc_leads.csv: 861 unique active US-domestic microcap OTC issuers
(<$75M float, all actively filing, 100% with business address + phone). By
incorporation: DE 365, NV 325 (DE+NV=690 = the reincorporation targets), WY 44,
FL 39, MD 38. Dropped from the 2,771 OTC universe: 1,672 foreign, 62
accelerated/large filers, 73 delinquent/dark. EDGAR provides no email
addresses, so phone + address are captured for enrichment / direct mail / calls.
justin 2026-06-09 07:10:54 -05:00
parent 1b3cbf2fbf
commit 37393e5bbc
3 changed files with 2684 additions and 0 deletions

@@ -154,6 +154,20 @@ def main() -> int:
    ix = {f: i for i, f in enumerate(fields)}
    rows = master["data"]
    otc = [r for r in rows if r[ix["exchange"]] in OTC_EXCHANGES]
    # The master file lists warrants/units as separate tickers under the same CIK
    # (e.g. ABPO + ABPWW = Abpro). Dedupe to one row per company (first ticker
    # wins); the issuer's other tickers are still captured in `all_tickers`.
    seen_cik: set[int] = set()
    deduped = []
    for r in otc:
        cik = r[ix["cik"]]
        if cik in seen_cik:
            continue
        seen_cik.add(cik)
        deduped.append(r)
    if len(deduped) != len(otc):
        log(f"deduped {len(otc) - len(deduped)} warrant/unit ticker rows -> {len(deduped)} unique companies")
    otc = deduped
    if args.limit:
        otc = otc[: args.limit]
    log(f"OTC/off-exchange issuers to inspect: {len(otc)} (of {len(rows)} total tickers)")
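
The first-ticker-wins dedupe in this hunk can be tried in isolation with the sketch below. It is a minimal illustration, not the script's actual code: the helper name `dedupe_by_cik`, the `ticker` and `name` columns, the sample CIK values, and the way tickers are grouped per CIK are all assumptions. The diff only shows that `all_tickers` exists, not how it is built.

```python
from collections import defaultdict


def dedupe_by_cik(rows, ix):
    """Keep the first row seen per CIK and collect every ticker for that CIK.

    Hypothetical helper: mirrors the loop in the hunk, plus an assumed
    per-CIK ticker grouping for illustration.
    """
    tickers_by_cik = defaultdict(list)
    seen_cik = set()
    deduped = []
    for r in rows:
        cik = r[ix["cik"]]
        # Record the ticker even when the row itself is dropped as a duplicate.
        tickers_by_cik[cik].append(r[ix["ticker"]])
        if cik in seen_cik:
            continue
        seen_cik.add(cik)
        deduped.append(r)
    return deduped, dict(tickers_by_cik)


# Sample rows; the CIK numbers and the second company are made up.
fields = ["cik", "name", "ticker", "exchange"]
ix = {f: i for i, f in enumerate(fields)}
rows = [
    [1001, "Abpro", "ABPO", "OTC"],
    [1001, "Abpro", "ABPWW", "OTC"],  # warrant ticker under the same CIK
    [1002, "Example Co", "EXMP", "OTC"],
]

deduped, all_tickers = dedupe_by_cik(rows, ix)
print(len(rows) - len(deduped), "duplicate ticker rows dropped")
```

Because the first row per CIK wins, which ticker represents a company depends on row order in the master file; a pull that needs the common-stock ticker specifically would have to sort or filter before deduping.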