docs: verify NPPES + EPA RCRA field schemas against live files

justin 2026-06-05 00:34:56 -05:00
parent 70d05e0607
commit 5e4e73674a

@@ -15,24 +15,44 @@ sectors and, critically, **how to reach the license holders besides postal mail.
## 1. NPPES / Healthcare Providers (NPI)
**Source:** CMS NPPES monthly full-replacement dissemination file (free bulk CSV,
millions of rows). Cross-joinable with OIG LEIE (exclusions) and the CMS
revalidation list, both free.
~10M rows). Verified live against `npidata_pfile_20050523-20260510.csv`
(`download.cms.gov/nppes/`). Cross-joinable with OIG LEIE (exclusions) and the
CMS revalidation list, both free.
**Email in file:** No. Practice/mailing address, phone, fax only.
**Email in file:** ❌ **VERIFIED — no email field exists** (file has 104 columns;
none is email). Contact info available: **mailing + practice TELEPHONE
(cols 27, 35), mailing + practice FAX (cols 28, 36)**, full mailing + practice
addresses, and Authorized Official telephone (col 47). So channel = fax, phone,
mail, or email-append. Not email-native.
### Detectable from the file
### Verified columns we care about (104-col file)
| Col # | Field (exact) |
|---|---|
| 1, 2 | NPI, Entity Type Code (1=individual, 2=org) |
| 5-11 | Org legal name / provider name + credential |
| 21-28 | Mailing address, **mailing telephone (27)**, **mailing fax (28)** |
| 29-36 | Practice location address, **practice telephone (35)**, **practice fax (36)** |
| 37 | Provider Enumeration Date |
| 38 | Last Update Date |
| 39, 40, 41 | NPI Deactivation Reason Code, Deactivation Date, Reactivation Date |
| 43-47 | Authorized Official name/title + **telephone (47)** |
| 48-103 | Up to **15× {Taxonomy Code, License Number, License State Code, Primary Taxonomy Switch}** |
> Note: the public file does NOT expose a usable "Is Sole Proprietor" or
> EIN-validated field (EIN col 4 is usually masked). Earlier guess corrected.
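
Since outreach depends on the phone and fax columns, a minimal sketch of pulling them out of one row by position may help. It assumes the 1-based column numbers in the table above are correct and that each row arrives as a list of strings (for example from `csv.reader`); the `CONTACT_COLS` mapping and `contact_channels` helper are hypothetical names, not part of any CMS tooling.

```python
# 1-based column positions taken from the table above (assumed, not re-verified here).
CONTACT_COLS = {
    "mailing_phone": 27,
    "mailing_fax": 28,
    "practice_phone": 35,
    "practice_fax": 36,
    "authorized_official_phone": 47,
}


def contact_channels(row):
    """Return the non-empty contact fields of one NPPES row (a list of strings)."""
    found = {}
    for name, col in CONTACT_COLS.items():
        # Convert the 1-based column number to a 0-based list index;
        # treat columns missing from a short row as blank.
        value = row[col - 1].strip() if col <= len(row) else ""
        if value:
            found[name] = value
    return found
```

In practice this would be called once per row while streaming the CSV, so the full file never has to sit in memory.
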
### Detectable from the file (verified)
| Signal | Field(s) | Obligation | Service |
|---|---|---|---|
| Stale `Last Update Date` (>12 yrs) | Last Update Date | NPPES update within 30 days of any change | NPPES refresh/attestation |
| Deactivated NPI | NPI Deactivation Date / Reactivation Date | Deactivated NPI can't bill | NPI reactivation |
| Old enumeration + never updated | Provider Enumeration Date vs Last Update Date | Likely overdue Medicare revalidation (5-yr) | PECOS revalidation |
| Taxonomy vs license-state mismatch | Taxonomy, License Number, License State | Specialty/license inconsistency | License/taxonomy reconcile |
| No primary taxonomy flagged | taxonomy primary switch | Billing/credentialing errors | Taxonomy cleanup |
| Org (Type 2) missing Authorized Official | Authorized Official block | Incomplete org NPI | Org NPI correction |
| Sole-proprietor flag vs entity-type conflict | Is Sole Proprietor, Entity Type Code | Enrollment/tax classification issue | Enrollment review |
| Stale `Last Update Date` (>12 yrs) | col 38 | NPPES update within 30 days of any change | NPPES refresh/attestation |
| Deactivated NPI | cols 39-41 | Deactivated NPI can't bill | NPI reactivation |
| Old enumeration + never updated | col 37 vs 38 | Likely overdue Medicare revalidation (5-yr) | PECOS revalidation |
| Taxonomy w/ license but no license-state | taxonomy/license/state sets | License/specialty inconsistency | License/taxonomy reconcile |
| No primary taxonomy flagged (switch all N) | Primary Taxonomy Switch_n | Billing/credentialing errors | Taxonomy cleanup |
| Org (Type 2) missing Authorized Official | cols 2, 43-47 | Incomplete org NPI | Org NPI correction |
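
Three of the signals above can be sketched as a per-row check. This is an illustration under stated assumptions, not a tested pipeline: it assumes the column positions listed in this section, that dates are written as `MM/DD/YYYY`, and that a blank cell means "no value". The helper names (`parse_date`, `cell`, `nppes_signals`) and the signal labels are hypothetical.

```python
from datetime import date, datetime


def parse_date(text):
    """Parse an MM/DD/YYYY string (assumed format); a blank value becomes None."""
    text = text.strip()
    return datetime.strptime(text, "%m/%d/%Y").date() if text else None


def cell(row, col):
    """Read a 1-based column from a row list, treating missing columns as blank."""
    return row[col - 1].strip() if col <= len(row) else ""


def nppes_signals(row, today, stale_years=12):
    """Return the list of signal labels that fire for one NPPES row."""
    signals = []
    updated = parse_date(cell(row, 38))
    deactivated = parse_date(cell(row, 40))
    reactivated = parse_date(cell(row, 41))

    # Stale record: Last Update Date older than the threshold.
    if updated and (today - updated).days > stale_years * 365.25:
        signals.append("stale_last_update")

    # Deactivated and not reactivated afterwards.
    if deactivated and (reactivated is None or reactivated < deactivated):
        signals.append("deactivated_npi")

    # Organization (Entity Type 2) with an empty Authorized Official block.
    if cell(row, 2) == "2" and not any(cell(row, c) for c in range(43, 48)):
        signals.append("org_missing_authorized_official")

    return signals
```

Passing `today` explicitly keeps the check reproducible when the same monthly file is re-scored later.
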
**Inferable only (not in file):** exact revalidation due date (PECOS), HIPAA
posture, active billing, sanctions (use OIG LEIE join).
posture, active billing, sanctions (use OIG LEIE join), email.
**Best cross-join hook:** NPPES ⨝ OIG LEIE ⨝ CMS revalidation list.
@@ -62,29 +82,50 @@ Closest analog to FCC RMD in size and clock.
## 3. EPA RCRA Hazardous Waste Handlers (via ECHO / RCRAInfo / FRS)
**Source:** ECHO downloadable files, RCRAInfo public data, Facility Registry
Service. Richest enforcement data of the three. Cross-join with TRI.
**Source:** ECHO bulk files (`echo.epa.gov/files/echodownloads/`) — verified live.
Two relevant downloads:
- **`ECHO_EXPORTER`** (137 cols) — one row per facility across all programs, holds
the compliance signals. Column dict: `echo_exporter_columns_*.xlsx`.
- **`rcra_downloads.zip`** — 6 RCRA-specific CSVs: `RCRA_FACILITIES.csv` (15 cols),
`RCRA_VIOLATIONS.csv`, `RCRA_EVALUATIONS.csv`, `RCRA_ENFORCEMENTS.csv`,
`RCRA_NAICS.csv`, `RCRA_VIOSNC_HISTORY.csv`.
**Email in file:** Largely absent. Facility/owner contact name, phone, mailing
address present.
**Email in file:** ❌ **VERIFIED — no email anywhere in ECHO bulk.**
`RCRA_FACILITIES.csv` has only: `ID_NUMBER, FACILITY_NAME, ACTIVITY_LOCATION,
FULL_ENFORCEMENT, HREPORT_UNIVERSE_RECORD, STREET_ADDRESS, CITY_NAME, STATE_CODE,
ZIP_CODE, LATITUDE83, LONGITUDE83, FED_WASTE_GENERATOR, TRANSPORTER, ACTIVE_SITE,
OPERATING_TSDF`. **No contact name, no phone, no email** in ECHO RCRA.
Owner/operator contact NAME + PHONE (still no email) exist only in the deeper
RCRAInfo handler download (`rcrapublic.epa.gov`).
So channel = phone (from RCRAInfo) + mail + email-append. Not email-native.
### Detectable from the data
### Verified ECHO_EXPORTER RCRA signal columns
`RCRA_FLAG`, `RCRA_IDS`, `RCRA_PERMIT_TYPES`, `RCRA_NAICS`,
`RCRA_INSPECTION_COUNT`, `RCRA_DAYS_LAST_EVALUATION`, `RCRA_INFORMAL_COUNT`,
`RCRA_FORMAL_ACTION_COUNT`, `RCRA_DATE_LAST_FORMAL_ACTION`, `RCRA_PENALTIES`,
`RCRA_LAST_PENALTY_DATE`, `RCRA_LAST_PENALTY_AMT`, `RCRA_QTRS_WITH_NC`,
`RCRA_COMPLIANCE_STATUS`, `RCRA_SNC_FLAG`, `RCRA_3YR_COMPL_QTRS_HISTORY`. Plus
facility-level: `FAC_DATE_LAST_INSPECTION`, `FAC_SNC_FLG`, `FAC_COMPLIANCE_STATUS`.
### Detectable from the data (verified)
| Signal | Field(s) | Obligation | Service |
|---|---|---|---|
| Generator status LQG/SQG/VSQG | handler classification | Biennial report + manifest + training | Generator program |
| Biennial report not filed | RCRAInfo biennial flag | LQG Biennial Report (odd yrs, by Mar 1) | Biennial filing |
| Open/current violation | ECHO CurrViolation/history | Return-to-compliance | Violation remediation |
| SNC / HPV flag | ECHO SNC/SVQ flags | High enforcement priority | Audit prep + corrective |
| Old inspection + LQG | last inspection date | Overdue inspection risk | Self-audit |
| Permit expired/expiring | permit status/expiration | TSDF permit renewal | Permit renewal |
| Stale SQG re-notification | notification date | SQG re-notify (~4 yrs, state-dependent) | Re-notification |
| NAICS implies waste, no RCRA ID | FRS NAICS w/o RCRA link | Should be registered as generator | Generator registration |
| EPCRA/Tier II non-filer | facility + chemical thresholds | Tier II annual report (by Mar 1) | Tier II / SPCC filing |
| Generator status (LQG/SQG/VSQG) | `FED_WASTE_GENERATOR` (1/2/3/N), `RCRA_PERMIT_TYPES` | Biennial report + manifest + training | Generator program |
| Open/current violation | `RCRA_COMPLIANCE_STATUS`, `RCRA_QTRS_WITH_NC` | Return-to-compliance | Violation remediation |
| SNC flag | `RCRA_SNC_FLAG`, `FAC_SNC_FLG` | High enforcement priority | Audit prep + corrective |
| Old/never evaluated + LQG | `RCRA_DAYS_LAST_EVALUATION`, `FAC_DATE_LAST_INSPECTION` | Overdue inspection risk | Self-audit |
| Recent penalty / formal action | `RCRA_PENALTIES`, `RCRA_DATE_LAST_FORMAL_ACTION` | Active enforcement | Remediation/defense |
| TSDF without active permit | `OPERATING_TSDF`, `RCRA_PERMIT_TYPES` | TSDF permit renewal | Permit renewal |
| NAICS implies waste, no RCRA ID | `RCRA_NAICS` / FRS NAICS w/o `RCRA_FLAG` | Should be registered as generator | Generator registration |
| Cross-program: RCRA + TRI reporter | `RCRA_FLAG` + `TRI_FLAG` | EPCRA/Tier II overlap | Tier II / SPCC filing |
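
A few of the rows above can be sketched as a check on one `ECHO_EXPORTER` record. The column names come from the list in this section, but the value encodings are assumptions that should be confirmed against the column dictionary: flags are taken to be `"Y"`/`"N"`, counts and day totals to be integer strings, and a blank `RCRA_DAYS_LAST_EVALUATION` to mean "never evaluated". The five-year threshold and the helper names are illustrative only.

```python
def to_int(text):
    """Convert an integer-looking string to int; blank or malformed becomes None."""
    try:
        return int((text or "").strip())
    except ValueError:
        return None


def rcra_signals(row, overdue_days=5 * 365):
    """Return signal labels for one ECHO_EXPORTER row (a dict keyed by column name)."""
    signals = []
    # Only score facilities that ECHO marks as RCRA handlers.
    if row.get("RCRA_FLAG") != "Y":
        return signals

    if row.get("RCRA_SNC_FLAG") == "Y":
        signals.append("snc")

    quarters = to_int(row.get("RCRA_QTRS_WITH_NC"))
    if quarters is not None and quarters > 0:
        signals.append("quarters_in_noncompliance")

    days = to_int(row.get("RCRA_DAYS_LAST_EVALUATION"))
    if days is None:
        signals.append("never_evaluated")
    elif days > overdue_days:
        signals.append("evaluation_overdue")

    return signals
```

Rows read with `csv.DictReader` fit this shape directly, since every value arrives as a string keyed by the header name.
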
**Inferable only:** SPCC plan existence, actual chemical inventory, contact email.
**Inferable only (not in file):** biennial-report-not-filed status (need RCRAInfo
BR module, not in ECHO bulk), SPCC plan existence, actual chemical inventory,
contact email. (Earlier "biennial flag" claim corrected — ECHO bulk does not
expose a clean biennial-filed flag.)
**Cross-join opportunity:** ECHO ⨝ TRI ⨝ FRS NAICS to find facilities that
should be reporting but aren't.
**Cross-join opportunity:** ECHO_EXPORTER `RCRA_FLAG` + `TRI_FLAG` + `FAC_NAICS_CODES`
to find facilities that should be reporting but aren't.
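
The cross-join above might look like the following filter. It is a sketch under assumptions: `FAC_NAICS_CODES` is taken to hold whitespace-separated codes, flags are taken to be `"Y"`/`"N"`, and the NAICS prefixes in `WATCHLIST_PREFIXES` are illustrative placeholders rather than a vetted list of waste-generating industries.

```python
# Illustrative NAICS prefixes only (325 = chemical manufacturing,
# 3328 = coating, engraving, heat treating); replace with a vetted list.
WATCHLIST_PREFIXES = ("325", "3328")


def unregistered_candidates(rows, prefixes=WATCHLIST_PREFIXES):
    """Yield (row, is_tri_reporter) for facilities whose NAICS code suggests
    hazardous waste activity but that carry no RCRA flag in ECHO_EXPORTER."""
    for row in rows:
        if row.get("RCRA_FLAG") == "Y":
            continue  # already a known RCRA handler
        codes = (row.get("FAC_NAICS_CODES") or "").split()
        # str.startswith accepts a tuple, so one call covers every prefix.
        if any(code.startswith(prefixes) for code in codes):
            yield row, row.get("TRI_FLAG") == "Y"
```

A facility that reports to TRI but has no RCRA ID is the stronger lead, which is why the TRI flag is returned alongside each row instead of being used as a filter.
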
---