scripts(otc): dedupe by CIK; commit the 861-company lead list
The master file lists warrants/units as separate tickers under the same CIK, so the pull now dedupes to one row per company (the issuer's other tickers are kept in `all_tickers`).

data/otc_leads.csv: 861 unique, active, US-domestic microcap OTC issuers (public float under $75M, all actively filing, every row with a business address and phone number). By state of incorporation: DE 365, NV 325 (DE + NV = 690, the reincorporation targets), WY 44, FL 39, MD 38.

Dropped from the 2,771 OTC universe: 1,672 foreign, 62 accelerated/large accelerated filers, 73 delinquent/dark. EDGAR does not publish email addresses, so phone and business address are captured for enrichment, direct mail, or calls.
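The screening rules above reduce to a per-issuer predicate plus a tally by state of incorporation. A minimal sketch, assuming a hypothetical simplified `Issuer` record: the field names, category strings, and sample data are illustrative only, not the script's actual schema or EDGAR's.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record shape; the real script's fields are not shown in this commit.
@dataclass
class Issuer:
    cik: int
    name: str
    is_foreign: bool
    filer_category: str          # illustrative values: "non-accelerated", "accelerated", ...
    is_delinquent: bool          # stopped filing / gone dark
    public_float_usd: float
    state_of_incorporation: str  # two-letter code, e.g. "DE"


FLOAT_CAP_USD = 75_000_000
EXCLUDED_CATEGORIES = {"accelerated", "large accelerated"}
REINCORPORATION_STATES = {"DE", "NV"}


def keep(issuer: Issuer) -> bool:
    """US-domestic, not an accelerated/large accelerated filer,
    actively filing, and under the $75M float cap."""
    return (
        not issuer.is_foreign
        and issuer.filer_category not in EXCLUDED_CATEGORIES
        and not issuer.is_delinquent
        and issuer.public_float_usd < FLOAT_CAP_USD
    )


def summarize(issuers: list[Issuer]) -> tuple[list[Issuer], Counter, int]:
    """Return the kept leads, counts by state, and the DE + NV target count."""
    leads = [i for i in issuers if keep(i)]
    by_state = Counter(i.state_of_incorporation for i in leads)
    targets = sum(by_state[s] for s in REINCORPORATION_STATES)
    return leads, by_state, targets


# Toy data; positional order: cik, name, is_foreign, filer_category,
# is_delinquent, public_float_usd, state_of_incorporation.
sample = [
    Issuer(1, "A Corp", False, "non-accelerated", False, 10e6, "DE"),
    Issuer(2, "B Inc", False, "non-accelerated", False, 40e6, "NV"),
    Issuer(3, "C Ltd", True, "non-accelerated", False, 5e6, "DE"),    # foreign
    Issuer(4, "D Co", False, "accelerated", False, 200e6, "DE"),      # accelerated
    Issuer(5, "E Co", False, "non-accelerated", True, 1e6, "WY"),     # delinquent
    Issuer(6, "F Co", False, "non-accelerated", False, 2e6, "WY"),
]
leads, by_state, targets = summarize(sample)
print(len(leads), dict(by_state), targets)
```

Keeping the rules in one predicate makes each drop bucket (foreign, accelerated, delinquent) easy to count separately if the funnel numbers need to be reported.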
commit 37393e5bbc (parent 1b3cbf2fbf)
3 changed files with 2,684 additions and 0 deletions
@@ -154,6 +154,20 @@ def main() -> int:
     ix = {f: i for i, f in enumerate(fields)}
     rows = master["data"]
     otc = [r for r in rows if r[ix["exchange"]] in OTC_EXCHANGES]
+    # The master file lists warrants/units as separate tickers under the same CIK
+    # (e.g. ABPO + ABPWW = Abpro). Dedupe to one row per company (first ticker
+    # wins); the issuer's other tickers are still captured in `all_tickers`.
+    seen_cik: set[int] = set()
+    deduped = []
+    for r in otc:
+        cik = r[ix["cik"]]
+        if cik in seen_cik:
+            continue
+        seen_cik.add(cik)
+        deduped.append(r)
+    if len(deduped) != len(otc):
+        log(f"deduped {len(otc) - len(deduped)} warrant/unit ticker rows -> {len(deduped)} unique companies")
+    otc = deduped
     if args.limit:
         otc = otc[: args.limit]
     log(f"OTC/off-exchange issuers to inspect: {len(otc)} (of {len(rows)} total tickers)")
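The hunk keeps the first row per CIK, and the commit message says the remaining tickers end up in `all_tickers`, but that collection step is not part of this hunk. A minimal sketch of doing both in one pass, assuming the master file's `fields` + `data` layout that the hunk indexes into (with `cik`, `name`, `ticker`, `exchange` columns). `dedupe_by_cik` and the output record shape are hypothetical, and the CIK values below are placeholders.

```python
from typing import Any

Row = list[Any]


def dedupe_by_cik(rows: list[Row], ix: dict[str, int]) -> list[dict[str, Any]]:
    """Collapse ticker rows to one record per CIK (hypothetical helper).

    The first row seen for a CIK supplies the name and primary ticker;
    every distinct ticker for that CIK, in input order, goes into
    `all_tickers`.
    """
    by_cik: dict[int, dict[str, Any]] = {}  # dicts keep insertion order (3.7+)
    for r in rows:
        cik = r[ix["cik"]]
        ticker = r[ix["ticker"]]
        rec = by_cik.get(cik)
        if rec is None:
            by_cik[cik] = {
                "cik": cik,
                "name": r[ix["name"]],
                "ticker": ticker,
                "all_tickers": [ticker],
            }
        elif ticker not in rec["all_tickers"]:
            rec["all_tickers"].append(ticker)
    return list(by_cik.values())


fields = ["cik", "name", "ticker", "exchange"]
ix = {f: i for i, f in enumerate(fields)}
rows = [
    [1001, "Abpro", "ABPO", "OTC"],   # placeholder CIK
    [1001, "Abpro", "ABPWW", "OTC"],  # warrant row under the same CIK
    [2002, "Other Co", "OTHR", "OTC"],
]
companies = dedupe_by_cik(rows, ix)
print(len(companies), companies[0]["all_tickers"])
```

Keying a dict by CIK gives the same first-ticker-wins result as the `seen_cik` set while keeping the grouped tickers, so no second pass over the master file is needed. How `all_tickers` is serialized into the CSV (for example, joined into one string) is not shown in the commit.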