# VM / host security hardening
Prod app host: `207.174.124.71` (Debian 13 trixie, k3s + Docker, SSH on 22022).
## Baseline (already in place before 2026-06)
- SSH: `permitrootlogin no`, `passwordauthentication no`, key-only, port 22022.
- fail2ban active (sshd, nginx-badbots).
- unattended-upgrades enabled; 0 pending security updates.
- TLS: Qualys SSL Labs **A+**, SecurityHeaders **A**, full CSP/HSTS-preload
(set in `/etc/nginx/snippets/pw-security.conf` + `pw-site.conf`).
## CRITICAL issue found + fixed (2026-06-06): wide-open host ports
`ss` output and external probing showed a large attack surface reachable from the
public internet. There was no host firewall (iptables INPUT policy ACCEPT),
kube-router does not firewall host ports, and Docker publishes container ports
via 0.0.0.0 DNAT. Exposed:
- **5432 Postgres** (customer DB!), **6443 k8s API**, **10250 kubelet**,
**3022 Forgejo SSH**, **9100/9101 Listmonk admin**, **3001/3002 APIs**,
**8080, 3033, 3100, 5555/5556, 8888, 4322/4323**, and the hc submission
ports **2526-2528** — all OPEN to the internet. Confirmed via off-network
`/dev/tcp` probes. Only :25 was provider-filtered.
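
A minimal sketch of such a `/dev/tcp` probe, for re-checking from an off-network machine. The function name `probe_port` and the 3-second timeout are illustrative assumptions, not the script that was actually used.

```shell
# Sketch only: probe_port and the 3 s timeout are assumptions for illustration.
# /dev/tcp is a bash feature, so the connection attempt runs in an inner bash.
probe_port() {
  local host=$1 port=$2
  # Success means something accepted the TCP connection; a refused connection
  # or a silent drop (caught by the timeout) both count as not reachable.
  if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "open"
  else
    echo "closed-or-filtered"
  fi
}

# Example, run from outside the host's network:
#   for p in 5432 6443 10250 3022 9100 9101; do
#     printf '%s\t%s\n' "$p" "$(probe_port 207.174.124.71 "$p")"
#   done
```

The probe cannot tell a refused connection from a filtered one, which is enough for an exposed/not-exposed check.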
### Fix
Two layers, installed as a persistent, boot-enabled systemd service
`pw-firewall.service` (auto-rollback timer was used during rollout):
1. **`/etc/pw-firewall/pw-firewall.nft`** - dedicated `inet pw_fw` table with an
input hook at priority -150 (evaluated before kube-router's ACCEPT). Allows
loopback, established/related, internal subnets (127.0.0.0/8, 172.16.0.0/12, k3s
10.42.0.0/16 + 10.43.0.0/16, docker/cni/flannel/veth ifaces), ICMP, then a public
allow-list **{ 22, 22022, 80, 443 }** and DROPs all other NEW inbound on
`ens18`. (Port **25 inbound removed** - the VM only SENDS bulk mail; inbound
mail is handled by Carbonio elsewhere. Outbound :25 egress is unaffected.)
2. **`/usr/local/sbin/pw-docker-fw.sh`** - because Docker-published ports are
DNAT'd and traverse FORWARD (not input), this adds DOCKER-USER rules:
RETURN established, **DROP NEW inbound on `ens18`** (scoped to the uplink so
container<->container + nginx->container loopback are untouched), RETURN.
Re-applied on docker restart via `docker.service.d/pw-firewall.conf`
ExecStartPost.
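
The first layer can be sketched as an nftables ruleset like the one below. This is a reconstruction from the description above, not a copy of `/etc/pw-firewall/pw-firewall.nft`: the helper name `render_pw_fw` and the interface names `docker0`, `cni0` and `flannel.1` are assumptions, and the real file may order or scope its rules differently.

```shell
# Sketch only: prints an nftables ruleset matching the description above.
# render_pw_fw and the interface names docker0/cni0/flannel.1 are assumptions.
render_pw_fw() {
  cat <<'EOF'
table inet pw_fw {
  chain input {
    # Priority -150 so this chain is evaluated before kube-router's rules.
    type filter hook input priority -150; policy accept;

    iifname "lo" accept
    ct state established,related accept

    # Internal sources: loopback, Docker ranges, k3s pod and service CIDRs.
    ip saddr { 127.0.0.0/8, 172.16.0.0/12, 10.42.0.0/16, 10.43.0.0/16 } accept
    iifname { "docker0", "cni0", "flannel.1" } accept
    iifname "veth*" accept

    meta l4proto { icmp, ipv6-icmp } accept

    # Public allow-list, then drop every other new connection on the uplink.
    iifname "ens18" tcp dport { 22, 22022, 80, 443 } accept
    iifname "ens18" ct state new drop
  }
}
EOF
}

# Syntax check without loading anything (needs nft and root):
#   render_pw_fw > /tmp/pw-firewall.nft && nft -c -f /tmp/pw-firewall.nft
```

Keeping the chain policy at `accept` and dropping explicitly on `ens18` means traffic on interfaces the ruleset does not name is left alone.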
### Verified after fix (off-network)
- Sensitive ports 5432/3022/9100/9101/8080/3001/3002/3100/3033/6443/10250/2526/25
-> **blocked**. Public 80/443/22022 -> **open**. Site HTTPS -> 200.
- Internal intact: nginx->container loopback (listmonk 9100/9101 = 200), api->DB
`select 1`, container->internet DNS egress, k3s node Ready, outbound mail still
relays to Gmail MX from the hc IPs.
## Files
- `/etc/pw-firewall/pw-firewall.nft`
- `/usr/local/sbin/pw-docker-fw.sh`
- `/etc/systemd/system/pw-firewall.service` (enabled)
- `/etc/systemd/system/docker.service.d/pw-firewall.conf`
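
For orientation, the Docker-side pieces listed above could look roughly like this. It is a sketch assuming Docker's iptables backend and the `ens18` uplink; the function names are hypothetical, matching `RELATED` alongside `ESTABLISHED` is an assumption, and the real `pw-docker-fw.sh` may differ.

```shell
# Sketch only: function names are hypothetical, and RELATED is an assumption
# (the notes above only say "established").
UPLINK="ens18"

# Rules in the order they should appear in the DOCKER-USER chain.
docker_user_rules() {
  cat <<EOF
-m conntrack --ctstate ESTABLISHED,RELATED -j RETURN
-i ${UPLINK} -m conntrack --ctstate NEW -j DROP
-j RETURN
EOF
}

apply_docker_user_rules() {
  iptables -N DOCKER-USER 2>/dev/null || true  # chain already exists once Docker has started
  iptables -F DOCKER-USER                      # note: also removes any unrelated rules
  docker_user_rules | while read -r rule; do
    # Unquoted on purpose so each rule splits into separate arguments.
    iptables -A DOCKER-USER $rule
  done
}

# Run as root:  apply_docker_user_rules
#
# Drop-in so the rules are re-applied whenever Docker restarts
# (/etc/systemd/system/docker.service.d/pw-firewall.conf):
#   [Service]
#   ExecStartPost=/usr/local/sbin/pw-docker-fw.sh
```

Scoping the DROP to the uplink interface is what keeps container-to-container and nginx-to-container traffic working.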
## TODO / follow-ups
- Bind the Docker-published ports to `127.0.0.1` in compose
(`127.0.0.1:PORT:PORT`) so they never listen on 0.0.0.0 in the first place. The
firewall already covers them; this is defence in depth.
- k8s API (6443) / kubelet (10250): now firewalled; if remote kubectl is ever
needed, allow-list the specific admin source IP rather than reopening.
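
To find which containers still publish on all interfaces before changing compose files, a small filter over `docker ps` output is enough. A sketch, with `wildcard_bindings` as a hypothetical helper name and `|` chosen as the field separator:

```shell
# Sketch only: wildcard_bindings is a hypothetical helper.
# Reads "name|ports" lines and prints the names of containers that publish at
# least one port on 0.0.0.0 or the IPv6 wildcard (shown as ::: or [::]:).
wildcard_bindings() {
  awk -F '|' '$2 ~ /(^|, )(0\.0\.0\.0|\[::\]|::):[0-9]/ { print $1 }'
}

# Example:
#   docker ps --format '{{.Names}}|{{.Ports}}' | wildcard_bindings
```

Containers bound to `127.0.0.1` and those with unpublished ports such as `6379/tcp` are not listed.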
## Free security badge / scanner (2026-06-06)
Ran **ImmuniWeb Community Edition** SSL/TLS scan (free, embeddable seal +
live grade page) for performancewest.net. Results:
- **PCI DSS: fully compliant** (all cipher suites + protocols compliant).
- **HIPAA / NIST: compliant** after fix (see below).
- **GDPR: compliant.** Industry best practices: no issues. Post-quantum:
hybrid key-exchange supported.
### TLS cipher hardening (made HIPAA/NIST perfectly clean)
ImmuniWeb flagged 1 of 9 cipher suites (the SHA-1 MAC CBC suites
`ECDHE-ECDSA-AES{128,256}-SHA`) as non-NIST/HIPAA. The nginx cipher list was
the broad `HIGH:!aNULL:!MD5` repeated across all PW server blocks. Replaced it
globally with an explicit modern list (ECDHE + GCM/CHACHA20 + SHA256/384 CBC,
**no SHA-1**). Verified: SHA-1 CBC suites no longer negotiate, GCM + TLS 1.3
still work, site serves 200, and **Qualys SSL Labs still A+**. nginx config
backups moved to `/etc/nginx/backups/` (NOT in an include path).
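
One way to re-check that a specific suite no longer negotiates is to offer only that suite in a TLS 1.2 handshake. A sketch, assuming `openssl` is installed locally and that `s_client` exits non-zero when the handshake fails; `negotiates_tls12_cipher` is a hypothetical helper name.

```shell
# Sketch only: negotiates_tls12_cipher is a hypothetical helper.
# Returns 0 if the server completes a TLS 1.2 handshake using only the given
# suite. Returns non-zero otherwise, including when the connection itself
# fails or the local OpenSSL build does not offer the suite.
negotiates_tls12_cipher() {
  local host=$1 cipher=$2 port=${3:-443}
  openssl s_client -connect "${host}:${port}" -servername "$host" \
    -tls1_2 -cipher "$cipher" </dev/null >/dev/null 2>&1
}

# Example:
#   if negotiates_tls12_cipher performancewest.net ECDHE-ECDSA-AES128-SHA; then
#     echo "SHA-1 CBC suite still accepted"
#   else
#     echo "SHA-1 CBC suite refused (or connection failed)"
#   fi
```

Because a failed connection also returns non-zero, a "refused" result is only meaningful when the same check succeeds for a GCM suite.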
### Trust badges we can legitimately display (for TrustStrip.astro)
- **Qualys SSL Labs A+** (verify link: ssllabs.com/ssltest)
- **SecurityHeaders.com A**
- **ImmuniWeb: PCI DSS / HIPAA / NIST / GDPR compliant TLS** (seal + report)
- **Payments by Stripe (PCI DSS Level 1)**
- **256-bit TLS, HSTS preloaded**
- **Hosted in a SOC 2 Type II compliant data center**
TODO: TrustedSite (ex-McAfee SECURE) free tier needs a signup to get the
daily-scan trustmark image - add later if an image seal is wanted.
### TLS cipher: removed all CBC suites (2026-06-06)
Qualys flagged the two remaining CBC suites as WEAK:
`TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384` (0xc024) and
`TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256` (0xc023). CBC modes carry the historic
padding-oracle risk; every modern client supports AEAD, so they were dropped.
Final cipher list = AEAD only: GCM (AES-128/256) + CHACHA20-POLY1305 (TLS 1.2)
plus TLS 1.3 suites. Verified: CBC no longer negotiates, GCM/TLS1.3 work, site
200, **Qualys A+ with WEAK suites: NONE**. The cipher list and the cdn.ywxi.net CSP
addition are now in the Ansible templates (`infra/ansible/roles/nginx/templates/`)
so they don't drift on the next Ansible run. The firewall is captured as IaC in
`infra/firewall/`.
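
A cipher string can also be audited offline before it goes into a template. The sketch below filters `openssl ciphers -v` output for anything whose MAC column is not `AEAD`. `non_aead_suites` is a hypothetical helper, and the example cipher string is an assumed AEAD-only list, not the exact one in the Ansible template.

```shell
# Sketch only: non_aead_suites is a hypothetical helper.
# Reads `openssl ciphers -v` output and prints every suite whose last column
# is not Mac=AEAD, i.e. CBC-style suites with a separate MAC.
non_aead_suites() {
  awk 'NF > 0 && $NF != "Mac=AEAD" { print $1 }'
}

# Example with an assumed AEAD-only list (not necessarily the deployed one):
#   CIPHERS='ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305'
#   openssl ciphers -v "$CIPHERS" | non_aead_suites
```

Empty output means the string expands to AEAD suites only; the old `HIGH:!aNULL:!MD5` string would list the SHA-1 and SHA-2 CBC suites here.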