From 5e4e73674a0c94bac4327e44fb0bb1fd1c391eb9 Mon Sep 17 00:00:00 2001
From: justin
Date: Fri, 5 Jun 2026 00:34:56 -0500
Subject: [PATCH] docs: verify NPPES + EPA RCRA field schemas against live files

---
 docs/new-sector-compliance-targets.md | 99 +++++++++++++++++++--------
 1 file changed, 70 insertions(+), 29 deletions(-)

diff --git a/docs/new-sector-compliance-targets.md b/docs/new-sector-compliance-targets.md
index 453629b..7e843cd 100644
--- a/docs/new-sector-compliance-targets.md
+++ b/docs/new-sector-compliance-targets.md
@@ -15,24 +15,44 @@ sectors and, critically, **how to reach the license holders besides postal mail.
 ## 1. NPPES / Healthcare Providers (NPI)
 
 **Source:** CMS NPPES monthly full-replacement dissemination file (free bulk CSV,
-millions of rows). Cross-joinable with OIG LEIE (exclusions) and the CMS
-revalidation list, both free.
+~10M rows). Verified live against `npidata_pfile_20050523-20260510.csv`
+(`download.cms.gov/nppes/`). Cross-joinable with OIG LEIE (exclusions) and the
+CMS revalidation list, both free.
 
-**Email in file:** No. Practice/mailing address, phone, fax only.
+**Email in file:** ❌ **VERIFIED — no email field exists** (file has 104 columns;
+none is email). Contact info available: **mailing + practice TELEPHONE
+(cols 27, 35), mailing + practice FAX (cols 28, 36)**, full mailing + practice
+addresses, and Authorized Official telephone (col 47). So channel = fax, phone,
+mail, or email-append. Not email-native.
 
-### Detectable from the file
+### Verified columns we care about (104-col file)
+| Col # | Field (exact) |
+|---|---|
+| 1, 2 | NPI, Entity Type Code (1=individual, 2=org) |
+| 5–11 | Org legal name / provider name + credential |
+| 21–28 | Mailing address, **mailing telephone (27)**, **mailing fax (28)** |
+| 29–36 | Practice location address, **practice telephone (35)**, **practice fax (36)** |
+| 37 | Provider Enumeration Date |
+| 38 | Last Update Date |
+| 39, 40, 41 | NPI Deactivation Reason Code, Deactivation Date, Reactivation Date |
+| 43–47 | Authorized Official name/title + **telephone (47)** |
+| 48–103 | Up to **15× {Taxonomy Code, License Number, License State Code, Primary Taxonomy Switch}** |
+
+> Note: the public file does NOT contain a "Is Sole Proprietor" or EIN-validated
+> field in a usable way (EIN col 4 is usually masked). Earlier guess corrected.
+
+### Detectable from the file (verified)
 | Signal | Field(s) | Obligation | Service |
 |---|---|---|---|
-| Stale `Last Update Date` (>1–2 yrs) | Last Update Date | NPPES update within 30 days of any change | NPPES refresh/attestation |
-| Deactivated NPI | NPI Deactivation Date / Reactivation Date | Deactivated NPI can't bill | NPI reactivation |
-| Old enumeration + never updated | Provider Enumeration Date vs Last Update Date | Likely overdue Medicare revalidation (5-yr) | PECOS revalidation |
-| Taxonomy vs license-state mismatch | Taxonomy, License Number, License State | Specialty/license inconsistency | License/taxonomy reconcile |
-| No primary taxonomy flagged | taxonomy primary switch | Billing/credentialing errors | Taxonomy cleanup |
-| Org (Type 2) missing Authorized Official | Authorized Official block | Incomplete org NPI | Org NPI correction |
-| Sole-proprietor flag vs entity-type conflict | Is Sole Proprietor, Entity Type Code | Enrollment/tax classification issue | Enrollment review |
+| Stale `Last Update Date` (>1–2 yrs) | col 38 | NPPES update within 30 days of any change | NPPES refresh/attestation |
+| Deactivated NPI | cols 39–41 | Deactivated NPI can't bill | NPI reactivation |
+| Old enumeration + never updated | col 37 vs 38 | Likely overdue Medicare revalidation (5-yr) | PECOS revalidation |
+| Taxonomy w/ license but no license-state | taxonomy/license/state sets | License/specialty inconsistency | License/taxonomy reconcile |
+| No primary taxonomy flagged (switch all N) | Primary Taxonomy Switch_n | Billing/credentialing errors | Taxonomy cleanup |
+| Org (Type 2) missing Authorized Official | cols 2, 43–47 | Incomplete org NPI | Org NPI correction |
 
 **Inferable only (not in file):** exact revalidation due date (PECOS), HIPAA
-posture, active billing, sanctions (use OIG LEIE join).
+posture, active billing, sanctions (use OIG LEIE join), email.
 
 **Best cross-join hook:** NPPES ⨝ OIG LEIE ⨝ CMS revalidation list.
 
@@ -62,29 +82,50 @@ Closest analog to FCC RMD in size and clock.
 
 ## 3. EPA RCRA Hazardous Waste Handlers (via ECHO / RCRAInfo / FRS)
 
-**Source:** ECHO downloadable files, RCRAInfo public data, Facility Registry
-Service. Richest enforcement data of the three. Cross-join with TRI.
+**Source:** ECHO bulk files (`echo.epa.gov/files/echodownloads/`) — verified live.
+Two relevant downloads:
+- **`ECHO_EXPORTER`** (137 cols) — one row per facility across all programs, holds
+  the compliance signals. Column dict: `echo_exporter_columns_*.xlsx`.
+- **`rcra_downloads.zip`** — 6 RCRA-specific CSVs: `RCRA_FACILITIES.csv` (15 cols),
+  `RCRA_VIOLATIONS.csv`, `RCRA_EVALUATIONS.csv`, `RCRA_ENFORCEMENTS.csv`,
+  `RCRA_NAICS.csv`, `RCRA_VIOSNC_HISTORY.csv`.
 
-**Email in file:** Largely absent. Facility/owner contact name, phone, mailing
-address present.
+**Email in file:** ❌ **VERIFIED — no email anywhere in ECHO bulk.**
+`RCRA_FACILITIES.csv` has only: `ID_NUMBER, FACILITY_NAME, ACTIVITY_LOCATION,
+FULL_ENFORCEMENT, HREPORT_UNIVERSE_RECORD, STREET_ADDRESS, CITY_NAME, STATE_CODE,
+ZIP_CODE, LATITUDE83, LONGITUDE83, FED_WASTE_GENERATOR, TRANSPORTER, ACTIVE_SITE,
+OPERATING_TSDF`. **No contact name, no phone, no email** in ECHO RCRA. Owner/
+operator contact NAME + PHONE (still no email) exists only in the deeper RCRAInfo
+handler download (`rcrapublic.epa.gov`), where a PHONE field is present.
+So channel = phone (from RCRAInfo) + mail + email-append. Not email-native.
 
-### Detectable from the data
+### Verified ECHO_EXPORTER RCRA signal columns
+`RCRA_FLAG`, `RCRA_IDS`, `RCRA_PERMIT_TYPES`, `RCRA_NAICS`,
+`RCRA_INSPECTION_COUNT`, `RCRA_DAYS_LAST_EVALUATION`, `RCRA_INFORMAL_COUNT`,
+`RCRA_FORMAL_ACTION_COUNT`, `RCRA_DATE_LAST_FORMAL_ACTION`, `RCRA_PENALTIES`,
+`RCRA_LAST_PENALTY_DATE`, `RCRA_LAST_PENALTY_AMT`, `RCRA_QTRS_WITH_NC`,
+`RCRA_COMPLIANCE_STATUS`, `RCRA_SNC_FLAG`, `RCRA_3YR_COMPL_QTRS_HISTORY`. Plus
+facility-level: `FAC_DATE_LAST_INSPECTION`, `FAC_SNC_FLG`, `FAC_COMPLIANCE_STATUS`.
+
+### Detectable from the data (verified)
 | Signal | Field(s) | Obligation | Service |
 |---|---|---|---|
-| Generator status LQG/SQG/VSQG | handler classification | Biennial report + manifest + training | Generator program |
-| Biennial report not filed | RCRAInfo biennial flag | LQG Biennial Report (odd yrs, by Mar 1) | Biennial filing |
-| Open/current violation | ECHO CurrViolation/history | Return-to-compliance | Violation remediation |
-| SNC / HPV flag | ECHO SNC/SVQ flags | High enforcement priority | Audit prep + corrective |
-| Old inspection + LQG | last inspection date | Overdue inspection risk | Self-audit |
-| Permit expired/expiring | permit status/expiration | TSDF permit renewal | Permit renewal |
-| Stale SQG re-notification | notification date | SQG re-notify (~4 yrs, state-dependent) | Re-notification |
-| NAICS implies waste, no RCRA ID | FRS NAICS w/o RCRA link | Should be registered as generator | Generator registration |
-| EPCRA/Tier II non-filer | facility + chemical thresholds | Tier II annual report (by Mar 1) | Tier II / SPCC filing |
+| Generator status (LQG/SQG/VSQG) | `FED_WASTE_GENERATOR` (1/2/3/N), `RCRA_PERMIT_TYPES` | Biennial report + manifest + training | Generator program |
+| Open/current violation | `RCRA_COMPLIANCE_STATUS`, `RCRA_QTRS_WITH_NC` | Return-to-compliance | Violation remediation |
+| SNC flag | `RCRA_SNC_FLAG`, `FAC_SNC_FLG` | High enforcement priority | Audit prep + corrective |
+| Old/never evaluated + LQG | `RCRA_DAYS_LAST_EVALUATION`, `FAC_DATE_LAST_INSPECTION` | Overdue inspection risk | Self-audit |
+| Recent penalty / formal action | `RCRA_PENALTIES`, `RCRA_DATE_LAST_FORMAL_ACTION` | Active enforcement | Remediation/defense |
+| TSDF without active permit | `OPERATING_TSDF`, `RCRA_PERMIT_TYPES` | TSDF permit renewal | Permit renewal |
+| NAICS implies waste, no RCRA ID | `RCRA_NAICS` / FRS NAICS w/o `RCRA_FLAG` | Should be registered as generator | Generator registration |
+| Cross-program: RCRA + TRI reporter | `RCRA_FLAG` + `TRI_FLAG` | EPCRA/Tier II overlap | Tier II / SPCC filing |
 
-**Inferable only:** SPCC plan existence, actual chemical inventory, contact email.
+**Inferable only (not in file):** biennial-report-not-filed status (need RCRAInfo
+BR module, not in ECHO bulk), SPCC plan existence, actual chemical inventory,
+contact email. (Earlier "biennial flag" claim corrected — ECHO bulk does not
+expose a clean biennial-filed flag.)
 
-**Cross-join opportunity:** ECHO ⨝ TRI ⨝ FRS NAICS to find facilities that
-should be reporting but aren't.
+**Cross-join opportunity:** ECHO_EXPORTER `RCRA_FLAG` + `TRI_FLAG` + `FAC_NAICS_CODES`
+to find facilities that should be reporting but aren't.
 
 ---