Companion to the worker MinIO-retry fix. Makes the worker auto-recover from process death (crash, manual kill, missed boot trigger), not just MinIO outages.

- start_worker.bat: propagate Python's exit code (exit /b %rc%) so Task Scheduler can actually detect a failed run (it previously always exited 0).
- reconfigure_task.ps1 (new): re-registers PW-DocserverWorker with RestartCount=99 / 1-min interval, StartWhenAvailable, and two triggers: AtStartup plus a 5-min repeating trigger with MultipleInstances=IgnoreNew, so a dead worker relaunches within ~5 min and never double-runs. Idempotent.
- install.ps1: same self-healing settings for fresh installs.
- Verified on the box: killed the worker -> task relaunched it; firing again while running stayed at one instance.

Docs updated to match reality:

- docserver/README.md: new 'Reliability / self-healing' section.
- document-generation.md: corrected the stale 'Flask DocServer :5050 / HTTP' description to the actual MinIO outbound-only transport.
- e2e-test-plan.md: removed the outdated 'Word COM fails under SYSTEM / requires RDP after every reboot' limitation; now self-healing under SYSTEM session 0.
- infrastructure.md: fixed VM spec (Win Server 2019, Word 16.0, Python 3.13, SSH port 22422) + self-healing note.
- architecture.md / formation-system.md: trigger + self-healing details.
Performance West — Document Conversion Worker
Converts DOCX files to pixel-perfect PDFs using Microsoft Word on a Windows VM. No HTTP server, no open ports, no SSH tunnel needed.
Architecture
The Windows VM connects outbound to MinIO only. No inbound access required.
Linux workers container   MinIO (S3)               Windows VM (any NAT)
│                         │                        │
├─ PUT docx ─────────────→│                        │
│  to-convert/{id}.docx   │←─ poll every 3s ───────┤
│                         │   list to-convert/     │
│                         │                        ├─ Word.SaveAs PDF
│                         │←─ PUT pdf ─────────────┤
│                         │   converted/{id}.pdf   │
│                         │←─ DELETE docx ─────────┤
│←─ GET pdf ──────────────┤                        │
│  converted/{id}.pdf     │                        │
└─ DELETE pdf ────────────┤                        │
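The Windows-side half of the diagram can be sketched as a single poll cycle. This is a minimal illustration, not the actual docserver_worker.py: the `ObjectStore` interface, the `DictStore` test double, and the `convert` callable are hypothetical stand-ins for the MinIO client and the Word COM call.

```python
from typing import Callable, Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal storage interface (stand-in for a MinIO client)."""

    def list_keys(self, prefix: str) -> list[str]: ...
    def get_bytes(self, key: str) -> bytes: ...
    def put_bytes(self, key: str, data: bytes) -> None: ...
    def delete(self, key: str) -> None: ...


def poll_once(store: ObjectStore, convert: Callable[[bytes], bytes]) -> list[str]:
    """Run one poll cycle and return the keys of the PDFs that were written."""
    written = []
    for key in store.list_keys("to-convert/"):
        if not key.endswith(".docx"):
            continue
        doc_id = key[len("to-convert/"):-len(".docx")]
        pdf_bytes = convert(store.get_bytes(key))  # Word.SaveAs PDF in the real worker
        pdf_key = f"converted/{doc_id}.pdf"
        store.put_bytes(pdf_key, pdf_bytes)        # PUT pdf first ...
        store.delete(key)                          # ... then DELETE docx
        written.append(pdf_key)
    return written


class DictStore:
    """In-memory store used only to exercise the sketch."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def list_keys(self, prefix: str) -> list[str]:
        return sorted(k for k in self.objects if k.startswith(prefix))

    def get_bytes(self, key: str) -> bytes:
        return self.objects[key]

    def put_bytes(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def delete(self, key: str) -> None:
        del self.objects[key]
```

Writing the PDF before deleting the DOCX means a crash between the two steps leaves the input in place, so the next cycle would simply convert it again.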
The pdf_converter.py on the Linux side uploads the DOCX and polls until
the PDF appears (up to DOCSERVER_TIMEOUT seconds, default 120).
If the Windows VM is unavailable or slow, conversion falls back automatically to LibreOffice headless in the workers container (70-80% fidelity).
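The upload, poll, and fallback flow described above might look roughly like this. It is a sketch under assumptions, not the real pdf_converter.py: `upload`, `fetch_pdf`, and `fallback` are hypothetical callables standing in for the MinIO PUT, the MinIO GET (returning `None` while the PDF is missing), and the LibreOffice headless call. The 1-second poll interval is also an assumption; only the 120-second default timeout comes from the text.

```python
import time
from typing import Callable, Optional


def wait_for_pdf(
    fetch_pdf: Callable[[], Optional[bytes]],
    timeout: float = 120.0,        # DOCSERVER_TIMEOUT default
    poll_interval: float = 1.0,    # assumed value, not from the source
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Poll fetch_pdf() until it returns bytes or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        pdf = fetch_pdf()
        if pdf is not None:
            return pdf
        if clock() >= deadline:
            return None
        sleep(poll_interval)


def convert_docx(
    docx: bytes,
    upload: Callable[[bytes], None],
    fetch_pdf: Callable[[], Optional[bytes]],
    fallback: Callable[[bytes], bytes],
    **wait_kwargs,
) -> bytes:
    """Try the Windows worker first; fall back if no PDF appears in time."""
    upload(docx)
    pdf = wait_for_pdf(fetch_pdf, **wait_kwargs)
    return pdf if pdf is not None else fallback(docx)
```

Injecting `clock` and `sleep` keeps the timeout logic testable without actually waiting two minutes.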
Windows VM Requirements
- Windows 10/11 Pro or Windows Server 2022
- Microsoft Word (Office 2021+ recommended)
- Python 3.12+ (from python.org — check "Add to PATH")
- Outbound internet access to MinIO (HTTPS, no inbound ports needed)
Setup
Run install.ps1 as Administrator in PowerShell on the Windows VM:
cd C:\path\to\docserver
.\install.ps1 `
-MinioEndpoint "minio.performancewest.net" `
-MinioPort 443 `
-MinioSecure $true `
-MinioAccessKey "your_access_key" `
-MinioSecretKey "your_secret_key"
This will:
- Verify Python and Word are installed
- Install pywin32 and minio Python packages
- Copy docserver_worker.py to C:\docserver\
- Write C:\docserver\docserver.env with your MinIO credentials
- Register a Task Scheduler task (PW-DocserverWorker) that starts at login
- Start the worker immediately
The worker must run as a logged-in user — Word COM requires an interactive Windows session and will fail under a system service account.
Reliability / self-healing
The worker is designed to recover from outages without manual intervention:
- MinIO outages don't kill it. The worker retries the MinIO connection
  indefinitely with capped exponential backoff (5s → 120s) instead of exiting,
  and each poll cycle is wrapped so a transient network error / 502 just
  rebuilds the client and keeps going. (Previously a single 502 made the worker
  sys.exit(1), leaving it dead until a reboot.)
- Crashes / kills are auto-recovered by Task Scheduler. The PW-DocserverWorker task has:
  - RestartCount=99, RestartInterval=1 min: relaunch if the action fails,
  - two triggers: AtStartup plus a repeating trigger every 5 minutes with MultipleInstances=IgnoreNew, so if the process ever dies (crash, manual kill, or a missed boot trigger) it relaunches within ~5 min and never runs more than one instance,
  - StartWhenAvailable to catch up a missed trigger.

  start_worker.bat propagates Python's exit code (exit /b %rc%) so Scheduler can actually detect a failed run.
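The capped exponential backoff mentioned in the first bullet can be sketched as follows. The 5s start and 120s cap come from the text; the doubling factor and the names `backoff_delays` and `connect_with_retry` are assumptions made for illustration, and the real worker may differ.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(
    initial: float = 5.0, cap: float = 120.0, factor: float = 2.0
) -> Iterator[float]:
    """Yield retry delays forever, growing by `factor` until they hit `cap`."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def connect_with_retry(
    connect: Callable[[], T], sleep: Callable[[float], None] = time.sleep
) -> T:
    """Call connect() until it succeeds, sleeping between attempts.

    Never gives up: the process stays alive through an outage instead of
    exiting. KeyboardInterrupt and SystemExit still propagate because only
    Exception is caught.
    """
    for delay in backoff_delays():
        try:
            return connect()
        except Exception as exc:
            print(f"MinIO connect failed ({exc!r}); retrying in {delay:.0f}s")
            sleep(delay)
    raise AssertionError("unreachable: backoff_delays() is infinite")
```

With the assumed factor of 2, the delays run 5, 10, 20, 40, 80 seconds and then stay at 120.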
To re-apply these task settings on an existing install, run as Administrator:
powershell -ExecutionPolicy Bypass -File C:\docserver\reconfigure_task.ps1
How to access MinIO externally
The Windows VM needs to reach MinIO. Options:
A. MinIO exposed externally (simplest)
Set MINIO_ENDPOINT=minio.performancewest.net, MINIO_PORT=443, MINIO_SECURE=true.
Add a MinIO nginx vhost on the Debian server that proxies port 443 → MinIO port 9000.
B. VPN / WireGuard
Connect the Windows VM to the same private network as the Debian server.
Use the internal IP 192.168.x.x:9000 and MINIO_SECURE=false.
C. Cloudflare Tunnel
Run a cloudflared tunnel on the Debian server and connect from Windows.
Heartbeat monitoring
The worker writes minio://{bucket}/docserver-heartbeat.json every 60 seconds:
{
"status": "ok",
"word_version": "16.0",
"host": "WINVM-01",
"ts": "2026-04-05T12:00:00+00:00"
}
Read this to check if the worker is alive. The health_check() function in
pdf_converter.py reads it automatically.
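A freshness check on that heartbeat could look like the sketch below. It is not the actual health_check() implementation: the function name `worker_is_alive` and the 180-second staleness threshold (three missed 60-second heartbeats) are assumptions.

```python
import json
from datetime import datetime, timedelta, timezone


def worker_is_alive(
    heartbeat_json: str,
    now: datetime,
    max_age: timedelta = timedelta(seconds=180),  # assumed threshold
) -> bool:
    """Return True if the heartbeat says "ok" and is recent enough.

    `now` must be timezone-aware, like the "ts" field in the heartbeat.
    """
    try:
        beat = json.loads(heartbeat_json)
        ts = datetime.fromisoformat(beat["ts"])
    except (ValueError, KeyError, TypeError):
        return False  # malformed JSON, missing field, or wrong type
    if ts.tzinfo is None:
        return False  # a naive timestamp cannot be compared safely
    return beat.get("status") == "ok" and now - ts <= max_age
```

For example, `worker_is_alive(body, datetime.now(timezone.utc))` would be called with `body` set to the text of docserver-heartbeat.json after it has been fetched from the bucket.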
Manual test
Place a .docx file in minio://{bucket}/to-convert/test.docx and watch for
minio://{bucket}/converted/test.pdf to appear within a few seconds.
Using the MinIO web console (http://server:9001) or mc CLI:
mc cp mydoc.docx local/performancewest/to-convert/test.docx
# wait a few seconds...
mc ls local/performancewest/converted/
mc cp local/performancewest/converted/test.pdf ./test.pdf
Logs
Worker logs: C:\docserver\logs\worker.log
Task Scheduler log: Event Viewer → Task Scheduler → PW-DocserverWorker