# Entity Cache Data Sources

Bulk business entity data for the corporation status check feature.

Updated: 2026-04-20

## Working Socrata SODA API States (free, JSON, unlimited)

| State | Dataset ID | Records | Status Field | Formation State Field | Notes |
|-------|-----------|---------|--------------|----------------------|-------|
| CO | `4ykn-tg5h` | ~3M | `entitystatus` | `jurisdictonofformation` | Fully loaded |
| IA | `ykb6-ywnd` | ~500K | `entity_status` | `home_state` | Working |
| CT | `n7gp-d28j` | ~1.2M | `status` | `state_of_formation` | Working |
| OR | `tckn-sxa6` | ~800K | `status` | `state_of_origin` | Active businesses only |
| NY | `n9v6-gdp6` | ~2M | N/A (active only) | `jurisdiction` | No status field; all records are active |

**API pattern:** `https://data.{state}.gov/resource/{id}.json?$limit=50000&$offset=0&$order=:id`

## Broken Socrata URLs (portals reorganized, need new IDs)

| State | Old ID | Notes |
|-------|--------|-------|
| WA | `7naq-cqm3` | 404. data.wa.gov catalog empty for business category |
| IL | `vqps-xatp` | 404. IL SOS prohibits bulk scraping officially |
| PA | `6ftj-q3fu` | 404. PA has `xvd7-5r2c` but no status field |
| MI | `uc6u-xab8` | 404. LARA portal, no confirmed free download |
| AK | `p2kg-xwxr` | DNS failure. data.alaska.gov may be deprecated |
| VT | `c7cm-s92n` | 404. VT open data portal reorganized |

## Free Bulk Download (non-Socrata)

| State | Source | Format | Cost | Fields | Status |
|-------|--------|--------|------|--------|--------|
| FL | Sunbiz FTP | Fixed-width ASCII | Free (register for FTP creds) | Name, status (A/I), filing type, date, EIN, address, RA, officers | Has status |
| VA | data.virginia.gov | XLSX (~86MB) | Free | Name, address, officers, status, type, creation date | Has status |

**FL download:** https://dos.fl.gov/sunbiz/other-services/data-downloads/

**VA download:** https://data.virginia.gov/dataset/corporation

## Free Subscription Downloads

| State | Source | Cost | Records | Notes |
|-------|--------|------|---------|-------|
| CA | bizfileOnline.sos.ca.gov | **FREE** (weekly subscription) | ~17M | Sign up at BizFileOnline → BE & UCC Bulk Orders → Weekly Data Download |
| FL | sftp.floridados.gov | **FREE** (SFTP) | ~4M | User: `Public` / Pass: `PubAccess1845!`; quarterly full + daily diffs |

## Paid Bulk Data

| State | Source | Cost | Notes |
|-------|--------|------|-------|
| WY | SOS subscription form | $10K+/year | Too expensive; we scrape WyoBiz instead |
| TX | SOSDirect bulk orders | $20/month (weekly) or $1,350 one-time | https://direct.sos.state.tx.us/help/help-corp.asp?pg=bulk |
| TX | Comptroller franchise tax | **FREE** on data.texas.gov (`xn8i-yb9w`) | 3.2M records but SODA API returns empty; may need portal CSV export |
| MN | SOS data subscription | $30/week (free non-commercial) | CSV, delivered within 10 days |
| NE | SOS special request | $15 per 1,000 records | CSV with filters |
| AZ | Corp Commission form M027 | $75 partial / $1,000 full | Importable format |
| NC | SOS data subscription | $750 initial + $250/year | FTP weekly updates |
| LA | SOS office | $6,900–$12,500 | Too expensive |

## No Bulk Access (Playwright search only)

These states require live SOS portal searches via our Playwright adapters (~3-20s per lookup, cached 24h):

DE, IL, GA, MA, MD, NH, NJ, SC, SD, TN, KY, IN, MS, MO, WV, ND, OK, RI, HI, NM, NV (search API only), MT, NE (unless paid), AL, AR, KS, LA, ME

Our state adapters handle all 52 jurisdictions via `search_name()` for on-demand lookups.

## SEC EDGAR (public companies only)

For ~10K publicly-traded companies, SEC filings include authoritative state of incorporation:

- **Company list:** https://www.sec.gov/files/company_tickers.json
- **Detail:** https://data.sec.gov/submissions/CIK{padded_10}.json
- **Fields:** `stateOfIncorporation`, `name`, `ein`, `addresses`
- **Rate limit:** 10 req/sec, free, requires User-Agent header
- **Limitation:** Only SEC-registered filers (public companies, not private LLCs)

## Aggregator APIs

| Service | Free Tier | Coverage | Notes |
|---------|-----------|----------|-------|
| OpenCorporates | 200 calls/month | 170+ jurisdictions | Not viable for bulk. Paid plans start GBP 2,250/yr |
| Cobalt Intelligence | 20 free lookups | All 50 states | Credit-based paid API. Gold standard but expensive |
| Apify "US Business Entity Search" | Pay-per-use | 34 state registries | Uses SIP Public Data Gateway. Most comprehensive |

## Daily Cron

The `pw-entity-cache-refresh` timer runs at 07:00 UTC (2am CT) daily:

```
python -m scripts.formation.bulk_download --all
```

Downloads all configured Socrata states and upserts into `entity_cache`.

## Schema

```sql
-- entity_cache table (migration 009)
entity_name       TEXT NOT NULL  -- Uppercase
entity_number     TEXT           -- State filing number
entity_type       TEXT           -- LLC, CORPORATION, LP, NONPROFIT
status            TEXT           -- ACTIVE, DISSOLVED, SUSPENDED, DELINQUENT, INACTIVE
formation_date    DATE
formation_state   TEXT           -- 2-letter code of state where entity was originally formed
registered_agent  TEXT
principal_address TEXT
state             TEXT NOT NULL  -- State this record is registered in
source            TEXT DEFAULT 'socrata'
UNIQUE(jurisdiction, entity_number)
INDEX gin_trgm on entity_name    -- Fuzzy search
INDEX on state
INDEX on status
```
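
## Example: Paging a Socrata Dataset

The Socrata API pattern listed earlier (`$limit=50000&$offset=0&$order=:id`) implies paging by advancing `$offset` until a page comes back short. The following is a minimal sketch of that loop, not the actual `bulk_download` implementation. The function names (`page_url`, `fetch_json`, `iter_records`) are hypothetical, and the portal host is passed in explicitly because this document does not spell out what `{state}` expands to for each portal.

```python
import json
from typing import Callable, Dict, Iterator, List
from urllib.parse import urlencode
from urllib.request import urlopen

PAGE_SIZE = 50_000  # matches $limit in the documented API pattern


def page_url(domain: str, dataset_id: str, offset: int, limit: int = PAGE_SIZE) -> str:
    """Build one page URL following the documented API pattern."""
    # safe="$:" keeps the SODA "$" parameter prefix and ":id" readable.
    query = urlencode(
        {"$limit": limit, "$offset": offset, "$order": ":id"}, safe="$:"
    )
    return f"https://{domain}/resource/{dataset_id}.json?{query}"


def fetch_json(url: str) -> List[Dict]:
    """Default fetcher: one HTTP GET, decoded as a JSON array of records."""
    with urlopen(url, timeout=60) as resp:
        return json.load(resp)


def iter_records(
    domain: str,
    dataset_id: str,
    fetch: Callable[[str], List[Dict]] = fetch_json,
    limit: int = PAGE_SIZE,
) -> Iterator[Dict]:
    """Yield every record, advancing $offset one page at a time."""
    offset = 0
    while True:
        page = fetch(page_url(domain, dataset_id, offset, limit))
        yield from page
        if len(page) < limit:  # a short or empty page means we reached the end
            break
        offset += limit
```

A call such as `iter_records("data.colorado.gov", "4ykn-tg5h")` would walk the CO dataset, assuming that host is the right expansion of the pattern. Ordering by `:id` keeps pages stable between requests, and injecting `fetch` lets the loop be exercised without network access.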
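
## Example: Building EDGAR Submission Requests

The SEC EDGAR notes above give a detail URL with a 10-digit zero-padded CIK, a required User-Agent header, and a 10 req/sec limit. The sketch below shows one way those three constraints might fit together. All function names are hypothetical, the `USER_AGENT` value is a placeholder, and the throttle is a simple fixed sleep rather than anything this project is known to use.

```python
import json
import time
from typing import Callable, Dict, Iterable, Optional, Union
from urllib.request import Request, urlopen

USER_AGENT = "example-app admin@example.com"  # placeholder; replace with real contact info
MIN_INTERVAL = 0.1  # seconds between requests, i.e. at most 10 req/sec


def submissions_url(cik: Union[int, str]) -> str:
    """Detail URL with the CIK zero-padded to 10 digits."""
    return f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"


def state_of_incorporation(submission: Dict) -> Optional[str]:
    """Return the `stateOfIncorporation` field, or None if missing or empty."""
    return submission.get("stateOfIncorporation") or None


def fetch_submission(cik: Union[int, str]) -> Dict:
    """GET one submissions document, sending the required User-Agent header."""
    req = Request(submissions_url(cik), headers={"User-Agent": USER_AGENT})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


def fetch_many(
    ciks: Iterable[Union[int, str]],
    fetch: Callable[[Union[int, str]], Dict] = fetch_submission,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[int, Optional[str]]:
    """Map each CIK to its state of incorporation, pausing between requests."""
    result: Dict[int, Optional[str]] = {}
    for cik in ciks:
        result[int(cik)] = state_of_incorporation(fetch(cik))
        sleep(MIN_INTERVAL)  # naive throttle to stay under the rate limit
    return result
```

For example, `submissions_url(320193)` yields `https://data.sec.gov/submissions/CIK0000320193.json`. Passing `fetch` and `sleep` as parameters keeps the throttling logic testable offline.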