Commit graph

1 commit

Author: justin
SHA1: 4d3af2aeae
Message: otc: domain->email scraper + filing-agent domain filtering
scrape_otc_emails.py: for each issuer domain, fetch its IR/contact pages (gzip
encoding, HTML responses only, early abort) and extract one contact email,
preferring ir@/investor@/info@ addresses.
Skip filing-agent domains (DFN, Donnelley, Broadridge, etc.) that leak into the
extracted domains; those sites belong to the agent, not the issuer. The same
filter is added to the harvester's DOMAIN_NOISE for future runs. Phone (100%)
remains the fallback contact channel for issuers with no email found.
Date: 2026-06-14 06:56:45 -05:00
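The commit message summarizes the scraper as fetching IR/contact pages with gzip, HTML-only responses, an early abort, and a preference for ir@/investor@/info@. The sketch below shows one way those pieces could fit together. It is not the code in scrape_otc_emails.py. CANDIDATE_PATHS, MAX_BYTES, fetch_html, pick_contact_email and scrape_domain are hypothetical names. Reading "early abort" as a Content-Type check plus a byte cap is an assumption, and so is keeping only addresses on the issuer's own domain.

```python
import http.client
import re
import urllib.request
import zlib
from typing import Optional

# Assumed values: the commit does not give the real paths, byte cap or headers.
CANDIDATE_PATHS = ("/", "/investors", "/investor-relations", "/contact", "/contact-us")
PREFERRED_LOCAL_PARTS = ("ir", "investor", "info")
MAX_BYTES = 256 * 1024
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def fetch_html(url: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch one page (hypothetical helper); return None for non-HTML responses."""
    req = urllib.request.Request(
        url, headers={"Accept-Encoding": "gzip", "User-Agent": "Mozilla/5.0"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        if "html" not in resp.headers.get("Content-Type", "").lower():
            return None  # assumed early abort: skip PDFs/images without reading the body
        raw = resp.read(MAX_BYTES)  # assumed early abort: cap bytes read per page
        if resp.headers.get("Content-Encoding", "").lower() == "gzip":
            # decompressobj tolerates a gzip stream truncated by the byte cap
            raw = zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(raw)
        charset = resp.headers.get_content_charset() or "utf-8"
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:  # unknown charset name in the Content-Type header
        return raw.decode("utf-8", errors="replace")


def pick_contact_email(html: str, domain: str) -> Optional[str]:
    """Return the best on-domain email in a page, preferring ir@/investor@/info@."""
    domain = domain.lower()
    found: list[str] = []
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        host = email.split("@", 1)[1]
        # assumption: keep only addresses on the issuer's domain or a subdomain
        if (host == domain or host.endswith("." + domain)) and email not in found:
            found.append(email)
    for local in PREFERRED_LOCAL_PARTS:
        for email in found:
            if email.split("@", 1)[0] == local:
                return email
    return found[0] if found else None


def scrape_domain(domain: str) -> Optional[str]:
    """Walk the candidate pages and stop at the first preferred address."""
    fallback = None
    for path in CANDIDATE_PATHS:
        try:
            html = fetch_html(f"https://{domain}{path}")
        except (OSError, ValueError, http.client.HTTPException, zlib.error):
            continue  # network/HTTP errors, malformed URLs, broken gzip data
        if html is None:
            continue
        email = pick_contact_email(html, domain)
        if email and email.split("@", 1)[0] in PREFERRED_LOCAL_PARTS:
            return email
        fallback = fallback or email
    return fallback  # None is an email miss; phone is the fallback channel
```

Two design choices are worth noting. Checking the Content-Type before calling `read()` means a large PDF linked from an IR page costs only its headers. Stopping at the first preferred address avoids fetching the remaining candidate pages.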
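The filing-agent filter can be sketched as a suffix match against a noise set, applied after normalizing each extracted URL or host. The commit names DFN, Donnelley and Broadridge but not their hostnames, and it does not show how the harvester's DOMAIN_NOISE is structured. FILING_AGENT_DOMAINS and its two entries are therefore assumed stand-ins. The helper names are also hypothetical.

```python
from typing import Iterable, List, Optional
from urllib.parse import urlparse

# Hypothetical, incomplete stand-in for the real filter (assumed hostnames).
FILING_AGENT_DOMAINS = frozenset({"broadridge.com", "dfinsolutions.com"})


def is_filing_agent(host: str, noise: frozenset = FILING_AGENT_DOMAINS) -> bool:
    """True if host is a noise domain or any subdomain of one."""
    host = host.lower().rstrip(".")
    # the "." prefix keeps e.g. notbroadridge.com from matching broadridge.com
    return any(host == d or host.endswith("." + d) for d in noise)


def issuer_domain(url_or_host: str) -> Optional[str]:
    """Normalize a URL or bare host; return None for filing-agent domains."""
    target = url_or_host if "://" in url_or_host else "//" + url_or_host
    host = urlparse(target).hostname  # lowercased, port stripped
    if not host:
        return None
    if host.startswith("www."):
        host = host[4:]
    return None if is_filing_agent(host) else host


def filter_issuer_domains(candidates: Iterable[str]) -> List[str]:
    """Keep unique issuer domains in input order, dropping filing agents."""
    kept: List[str] = []
    for cand in candidates:
        dom = issuer_domain(cand)
        if dom and dom not in kept:
            kept.append(dom)
    return kept
```

For example, `filter_issuer_domains(["acme.com", "www.dfinsolutions.com/x", "https://acme.com/ir"])` keeps only `"acme.com"`. The agent host is dropped, and the second Acme URL collapses into the first.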