Entity Cache Data Sources
Bulk business entity data for the corporation status check feature. Updated: 2026-04-20
Working Socrata SODA API States (free, JSON, unlimited)
| State | Dataset ID | Records | Status Field | Formation State Field | Notes |
|---|---|---|---|---|---|
| CO | `4ykn-tg5h` | ~3M | `entitystatus` | `jurisdictonofformation` | Fully loaded |
| IA | `ykb6-ywnd` | ~500K | `entity_status` | `home_state` | Working |
| CT | `n7gp-d28j` | ~1.2M | `status` | `state_of_formation` | Working |
| OR | `tckn-sxa6` | ~800K | `status` | `state_of_origin` | Active businesses only |
| NY | `n9v6-gdp6` | ~2M | N/A (active only) | `jurisdiction` | No status field; all records are active |
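Because each state names its status and formation-state columns differently, a loader needs a per-state mapping. The following is a minimal sketch of such a mapping, using only the field names from the table above. The names `STATE_FIELDS` and `extract_status_fields` are hypothetical and are not taken from the project's code; treating NY rows as `ACTIVE` follows from the note that the dataset lists active entities only.

```python
from typing import Optional

# Field names copied from the table above (spelling preserved as published).
# A status field of None means the dataset has no status column.
STATE_FIELDS = {
    "CO": {"status": "entitystatus", "formation_state": "jurisdictonofformation"},
    "IA": {"status": "entity_status", "formation_state": "home_state"},
    "CT": {"status": "status", "formation_state": "state_of_formation"},
    "OR": {"status": "status", "formation_state": "state_of_origin"},
    "NY": {"status": None, "formation_state": "jurisdiction"},
}


def extract_status_fields(state: str, row: dict) -> dict:
    """Pull status and formation state out of one raw Socrata row (hypothetical helper)."""
    fields = STATE_FIELDS[state]
    status_key: Optional[str] = fields["status"]
    if status_key is None:
        # Assumption: a dataset without a status column contains active entities only.
        status = "ACTIVE"
    else:
        status = row.get(status_key)
    return {
        "state": state,
        "status": status,
        "formation_state": row.get(fields["formation_state"]),
    }
```

For example, `extract_status_fields("NY", {"jurisdiction": "DELAWARE"})` yields a record with status `ACTIVE` even though the row carries no status value.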
API pattern: https://data.{state}.gov/resource/{id}.json?$limit=50000&$offset=0&$order=:id
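The pattern above implies offset-based paging: request 50,000 rows at a time, ordered by `:id`, until a short page comes back. A minimal sketch of that loop follows. The function names and the injectable `fetch` parameter are hypothetical, and the portal host is passed in explicitly because the `{state}` part of the hostname differs per portal.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, List

PAGE_SIZE = 50000  # matches $limit in the pattern above


def page_url(host: str, dataset_id: str, offset: int, limit: int = PAGE_SIZE) -> str:
    """Build one page URL following the documented pattern."""
    query = urllib.parse.urlencode(
        {"$limit": limit, "$offset": offset, "$order": ":id"},
        safe="$:",  # keep the SODA "$" and ":" characters unescaped
    )
    return f"https://{host}/resource/{dataset_id}.json?{query}"


def http_get_json(url: str) -> List[dict]:
    """Default fetcher: plain HTTP GET, JSON array response."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)


def iter_rows(
    host: str,
    dataset_id: str,
    fetch: Callable[[str], List[dict]] = http_get_json,
    limit: int = PAGE_SIZE,
) -> Iterator[dict]:
    """Yield every row, advancing $offset until a page is shorter than $limit."""
    offset = 0
    while True:
        rows = fetch(page_url(host, dataset_id, offset, limit))
        yield from rows
        if len(rows) < limit:
            break
        offset += limit
```

Ordering by `:id` keeps the paging stable between requests, so rows should not be skipped or repeated as the offset advances. Passing `fetch` as a parameter is only a convenience for testing without network access.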
Broken Socrata URLs (portals reorganized, need new IDs)
| State | Old ID | Notes |
|---|---|---|
| WA | `7naq-cqm3` | 404. data.wa.gov catalog empty for business category |
| IL | `vqps-xatp` | 404. IL SOS prohibits bulk scraping officially |
| PA | `6ftj-q3fu` | 404. PA has xvd7-5r2c but no status field |
| MI | `uc6u-xab8` | 404. LARA portal, no confirmed free download |
| AK | `p2kg-xwxr` | DNS failure. data.alaska.gov may be deprecated |
| VT | `c7cm-s92n` | 404. VT open data portal reorganized |
Free Bulk Download (non-Socrata)
| State | Source | Format | Cost | Fields | Status |
|---|---|---|---|---|---|
| FL | Sunbiz FTP | Fixed-width ASCII | Free (register for FTP creds) | Name, status (A/I), filing type, date, EIN, address, RA, officers | Has status |
| VA | data.virginia.gov | XLSX (~86MB) | Free | Name, address, officers, status, type, creation date | Has status |
- FL download: https://dos.fl.gov/sunbiz/other-services/data-downloads/
- VA download: https://data.virginia.gov/dataset/corporation
Free Subscription Downloads
| State | Source | Cost | Records | Notes |
|---|---|---|---|---|
| CA | bizfileOnline.sos.ca.gov | FREE (weekly subscription) | ~17M | Sign up at BizFileOnline → BE & UCC Bulk Orders → Weekly Data Download |
| FL | sftp.floridados.gov | FREE (SFTP) | ~4M | User: Public / Pass: PubAccess1845! — Quarterly full + daily diffs |
Paid Bulk Data
| State | Source | Cost | Notes |
|---|---|---|---|
| WY | SOS subscription form | $10K+/year | Too expensive — we scrape WyoBiz instead |
| TX | SOSDirect bulk orders | $20/month (weekly) or $1,350 one-time | https://direct.sos.state.tx.us/help/help-corp.asp?pg=bulk |
| TX | Comptroller franchise tax | FREE on data.texas.gov (xn8i-yb9w) | 3.2M records but SODA API returns empty — may need portal CSV export |
| MN | SOS data subscription | $30/week (free non-commercial) | CSV, delivered within 10 days |
| NE | SOS special request | $15 per 1,000 records | CSV with filters |
| AZ | Corp Commission form M027 | $75 partial / $1,000 full | Importable format |
| NC | SOS data subscription | $750 initial + $250/year | FTP weekly updates |
| LA | SOS office | $6,900–$12,500 | Too expensive |
No Bulk Access (Playwright search only)
These states require live SOS portal searches via our Playwright adapters (~3-20s per lookup, cached 24h):
DE, IL, GA, MA, MD, NH, NJ, SC, SD, TN, KY, IN, MS, MO, WV, ND, OK, RI, HI, NM, NV (search API only), MT, NE (unless paid), AL, AR, KS, LA, ME
Our state adapters handle all 52 jurisdictions via search_name() for on-demand lookups.
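The 24-hour caching of live lookups mentioned above can be pictured as a small time-to-live cache in front of the adapter call. This is a sketch only: the `TTLCache` class is hypothetical, and the `(state, name)` call signature assumed for the wrapped lookup is a guess, since the real signature of `search_name()` is not shown here.

```python
import time
from typing import Any, Callable, Dict, Tuple

DAY_SECONDS = 24 * 60 * 60


class TTLCache:
    """Cache lookup results for a fixed time-to-live (hypothetical helper)."""

    def __init__(
        self,
        lookup: Callable[[str, str], Any],
        ttl_seconds: float = DAY_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def get(self, state: str, name: str) -> Any:
        # Normalize the key so "Acme LLC" and "ACME LLC " share one entry.
        key = (state.upper(), name.strip().upper())
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        # Miss or expired entry: run the slow portal search and store the result.
        result = self._lookup(state, name)
        self._entries[key] = (now, result)
        return result
```

With lookups taking roughly 3 to 20 seconds each, repeat requests for the same entity within a day are served from memory instead of hitting the portal again.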
SEC EDGAR (public companies only)
For ~10K publicly-traded companies, SEC filings include authoritative state of incorporation:
- Company list: https://www.sec.gov/files/company_tickers.json
- Detail: https://data.sec.gov/submissions/CIK{padded_10}.json
- Fields: `stateOfIncorporation`, `name`, `ein`, `addresses`
- Rate limit: 10 req/sec, free, requires User-Agent header
- Limitation: Only SEC-registered filers (public companies, not private LLCs)
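A rough sketch of how the EDGAR points above fit together: zero-pad the CIK to 10 digits, send a User-Agent header, and stay under 10 requests per second. The helper names and the `USER_AGENT` value are placeholders, not project code; substitute a real contact string before use.

```python
import json
import time
import urllib.request
from typing import Iterable, Iterator

USER_AGENT = "example-app admin@example.com"  # placeholder; use a real contact
MIN_INTERVAL = 0.1  # seconds between requests, i.e. at most 10 req/sec
FIELDS = ("stateOfIncorporation", "name", "ein", "addresses")


def submissions_url(cik: int) -> str:
    """Build the detail URL with the CIK zero-padded to 10 digits."""
    return f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"


def build_request(cik: int) -> urllib.request.Request:
    """Attach the User-Agent header that the rate-limit note above requires."""
    return urllib.request.Request(submissions_url(cik), headers={"User-Agent": USER_AGENT})


def pick_fields(payload: dict) -> dict:
    """Keep only the fields listed above; missing keys become None."""
    return {key: payload.get(key) for key in FIELDS}


def fetch_many(ciks: Iterable[int]) -> Iterator[dict]:
    """Fetch submissions one at a time, sleeping to respect the rate limit."""
    for cik in ciks:
        started = time.monotonic()
        with urllib.request.urlopen(build_request(cik), timeout=30) as resp:
            yield pick_fields(json.load(resp))
        elapsed = time.monotonic() - started
        time.sleep(max(0.0, MIN_INTERVAL - elapsed))
```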
Aggregator APIs
| Service | Free Tier | Coverage | Notes |
|---|---|---|---|
| OpenCorporates | 200 calls/month | 170+ jurisdictions | Not viable for bulk. Paid plans start GBP 2,250/yr |
| Cobalt Intelligence | 20 free lookups | All 50 states | Credit-based paid API. Gold standard but expensive |
| Apify "US Business Entity Search" | Pay-per-use | 34 state registries | Uses SIP Public Data Gateway. Most comprehensive |
Daily Cron
The pw-entity-cache-refresh timer runs daily at 07:00 UTC (2 a.m. CDT, 1 a.m. CST):
python -m scripts.formation.bulk_download --all
Downloads all configured Socrata states and upserts into entity_cache.
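To illustrate what "upserts" means here, the sketch below inserts rows and updates them in place when the same entity arrives again on a later run. It uses an in-memory SQLite table purely so it runs standalone; the production table is presumably PostgreSQL. The reduced column set, the helper names, and the `(state, entity_number)` conflict target are all assumptions; the real migration's unique constraint may name its columns differently.

```python
import sqlite3
from typing import Iterable

# Assumption: (state, entity_number) identifies an entity for conflict purposes.
UPSERT_SQL = """
INSERT INTO entity_cache
    (entity_name, entity_number, status, formation_state, state, source)
VALUES
    (:entity_name, :entity_number, :status, :formation_state, :state, 'socrata')
ON CONFLICT (state, entity_number) DO UPDATE SET
    entity_name = excluded.entity_name,
    status = excluded.status,
    formation_state = excluded.formation_state
"""


def make_demo_db() -> sqlite3.Connection:
    """Create a cut-down stand-in for entity_cache (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE entity_cache (
            entity_name TEXT NOT NULL,
            entity_number TEXT,
            status TEXT,
            formation_state TEXT,
            state TEXT NOT NULL,
            source TEXT DEFAULT 'socrata',
            UNIQUE (state, entity_number)
        )
        """
    )
    return conn


def upsert_rows(conn: sqlite3.Connection, rows: Iterable[dict]) -> None:
    """Insert new entities and refresh existing ones in a single transaction."""
    with conn:
        conn.executemany(UPSERT_SQL, rows)
```

Running the same download twice therefore leaves one row per entity, with the status from the most recent run.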
Schema
-- entity_cache table (migration 009)
entity_name TEXT NOT NULL -- Uppercase
entity_number TEXT -- State filing number
entity_type TEXT -- LLC, CORPORATION, LP, NONPROFIT
status TEXT -- ACTIVE, DISSOLVED, SUSPENDED, DELINQUENT, INACTIVE
formation_date DATE
formation_state TEXT -- 2-letter code of state where entity was originally formed
registered_agent TEXT
principal_address TEXT
state TEXT NOT NULL -- State this record is registered in
source TEXT DEFAULT 'socrata'
UNIQUE(jurisdiction, entity_number)
INDEX gin_trgm on entity_name -- Fuzzy search
INDEX on state
INDEX on status
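The trigram index on entity_name supports fuzzy matching by comparing sets of three-character substrings. The sketch below is a simplified Python approximation of that idea, meant only to show why near-identical names score high. It is not the database's implementation, and the function names are hypothetical.

```python
import re
from typing import Set


def trigrams(text: str) -> Set[str]:
    """Split text into words and collect each word's padded 3-character windows."""
    grams: Set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

Under this approximation, "ACME HOLDINGS LLC" and "ACME HOLDING LLC" share 16 of 19 distinct trigrams, so a lookup with a slightly misspelled name can still rank the intended entity first.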