The worker called sys.exit(1) on any MinIO connection error, so a single transient 502 from MinIO or its reverse proxy left it dead until a manual restart or reboot (its scheduled task only runs at system startup). It had been dead ~5 weeks after a 502 on May 9.
- _connect_minio_forever(): retry the initial MinIO connect indefinitely with capped exponential backoff (5s..120s) instead of exiting.
- main loop: wrap each poll cycle; on any error, log + rebuild the client and keep polling rather than crashing.
Verified on the box: normal DOCX->PDF still works (~11s e2e); a bogus endpoint now retries forever without ever calling sys.exit (was the exact May-9 failure).
Files in this directory:
- docserver_worker.py
- fix_dcom.bat
- install.ps1
- README.md
- requirements.txt
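The retry behavior described in the commit message above can be sketched as follows. This is a minimal illustration, not the code in docserver_worker.py: the names `backoff_delays` and `connect_forever` are hypothetical, and the doubling factor is an assumption (the source only states a capped exponential backoff between 5s and 120s).

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(initial: float = 5.0, cap: float = 120.0,
                   factor: float = 2.0) -> Iterator[float]:
    """Yield delays that grow geometrically and never exceed `cap`.

    The factor of 2 is an assumption; only the 5s..120s range is documented.
    """
    delay = initial
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def connect_forever(connect: Callable[[], T],
                    sleep: Callable[[float], None] = time.sleep,
                    log: Callable[[str], None] = print) -> T:
    """Call `connect` until it succeeds, sleeping between failed attempts.

    `connect` is any zero-argument callable that returns a ready client or
    raises on failure. The function never exits the process on error.
    """
    for attempt, delay in enumerate(backoff_delays(), start=1):
        try:
            return connect()
        except Exception as exc:  # any failure is treated as retryable
            log(f"MinIO connect attempt {attempt} failed: {exc}; "
                f"retrying in {delay:.0f}s")
            sleep(delay)
    raise AssertionError("unreachable: backoff_delays() is infinite")
```

Passing `sleep` and `log` as parameters keeps the loop easy to exercise without real waiting. The same wrapper could plausibly be reused by the main loop when it rebuilds the client after a failed poll cycle.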
Performance West — Document Conversion Worker
Converts DOCX files to pixel-perfect PDFs using Microsoft Word on a Windows VM. No HTTP server, no open ports, no SSH tunnel needed.
Architecture
The Windows VM connects outbound to MinIO only. No inbound access required.
Linux workers container   MinIO (S3)               Windows VM (any NAT)
│                         │                        │
├─ PUT docx ─────────────→│                        │
│  to-convert/{id}.docx   │←─ poll every 3s ───────┤
│                         │   list to-convert/     │
│                         │                        ├─ Word.SaveAs PDF
│                         │←─ PUT pdf ─────────────┤
│                         │   converted/{id}.pdf   │
│                         │←─ DELETE docx ─────────┤
│←─ GET pdf ──────────────┤                        │
│  converted/{id}.pdf     │                        │
└─ DELETE pdf ────────────┤                        │
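The Windows VM column of the diagram above amounts to one repeated poll cycle. The sketch below shows a single cycle under stated assumptions: `poll_once` is a hypothetical name, `client` is expected to behave like a `minio.Minio` instance, and `convert` stands in for the Word automation step, which is not reproduced here.

```python
import os
import tempfile
from typing import Callable


def poll_once(client, bucket: str,
              convert: Callable[[str, str], None]) -> int:
    """Process every DOCX currently under to-convert/; return the count.

    `convert(docx_path, pdf_path)` is a placeholder for the Word SaveAs
    step. The order mirrors the diagram: PUT pdf first, then DELETE docx,
    so a crash between the two leaves the job retryable.
    """
    converted = 0
    # Snapshot the listing so deletions do not interfere with iteration.
    for obj in list(client.list_objects(bucket_name=bucket,
                                        prefix="to-convert/")):
        name = obj.object_name
        if not name.lower().endswith(".docx"):
            continue  # ignore anything that is not a DOCX job
        job_id = name.rsplit("/", 1)[-1][: -len(".docx")]
        with tempfile.TemporaryDirectory() as tmp:
            docx_path = os.path.join(tmp, f"{job_id}.docx")
            pdf_path = os.path.join(tmp, f"{job_id}.pdf")
            client.fget_object(bucket_name=bucket, object_name=name,
                               file_path=docx_path)
            convert(docx_path, pdf_path)
            client.fput_object(bucket_name=bucket,
                               object_name=f"converted/{job_id}.pdf",
                               file_path=pdf_path)
            client.remove_object(bucket_name=bucket, object_name=name)
        converted += 1
    return converted
```

A worker loop would call `poll_once` and then sleep for the 3-second poll interval shown in the diagram.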
The pdf_converter.py on the Linux side uploads the DOCX and polls until
the PDF appears (up to DOCSERVER_TIMEOUT seconds, default 120).
If the Windows VM is unavailable or slow, conversion falls back automatically to LibreOffice headless in the workers container (70-80% fidelity).
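The upload-then-poll flow on the Linux side, including the timeout that triggers the fallback, might look roughly like this. It is a sketch rather than the actual pdf_converter.py: `convert_via_docserver` and the 1-second poll interval are assumptions, and `client` is expected to behave like a `minio.Minio` instance.

```python
import time
from typing import Callable


def convert_via_docserver(client, bucket: str, job_id: str,
                          docx_path: str, pdf_path: str,
                          timeout: float = 120.0,
                          poll_interval: float = 1.0,
                          sleep: Callable[[float], None] = time.sleep,
                          clock: Callable[[], float] = time.monotonic) -> bool:
    """Upload a DOCX, wait for the PDF, download it, and clean up.

    Returns True when the PDF was fetched and False on timeout, in which
    case the caller is expected to fall back to LibreOffice headless.
    `timeout` plays the role of DOCSERVER_TIMEOUT (default 120 seconds).
    """
    src = f"to-convert/{job_id}.docx"
    dst = f"converted/{job_id}.pdf"
    client.fput_object(bucket_name=bucket, object_name=src,
                       file_path=docx_path)

    deadline = clock() + timeout
    while clock() < deadline:
        listing = client.list_objects(bucket_name=bucket, prefix=dst)
        if any(obj.object_name == dst for obj in listing):
            client.fget_object(bucket_name=bucket, object_name=dst,
                               file_path=pdf_path)
            client.remove_object(bucket_name=bucket, object_name=dst)
            return True
        sleep(poll_interval)
    return False
```

Whether a timed-out DOCX should also be removed from to-convert/ is not stated in this README, so the sketch leaves it in place.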
Windows VM Requirements
- Windows 10/11 Pro or Windows Server 2022
- Microsoft Word (Office 2021+ recommended)
- Python 3.12+ (from python.org — check "Add to PATH")
- Outbound internet access to MinIO (HTTPS, no inbound ports needed)
Setup
Run install.ps1 as Administrator in PowerShell on the Windows VM:
cd C:\path\to\docserver
.\install.ps1 `
-MinioEndpoint "minio.performancewest.net" `
-MinioPort 443 `
-MinioSecure $true `
-MinioAccessKey "your_access_key" `
-MinioSecretKey "your_secret_key"
This will:
- Verify Python and Word are installed
- Install pywin32 and minio Python packages
- Copy docserver_worker.py to C:\docserver\
- Write C:\docserver\docserver.env with your MinIO credentials
- Register a Task Scheduler task (PW-DocserverWorker) that starts at login
- Start the worker immediately
The worker must run as a logged-in user — Word COM requires an interactive Windows session and will fail under a system service account.
How to access MinIO externally
The Windows VM needs to reach MinIO. Options:
A. MinIO exposed externally (simplest)
Set MINIO_ENDPOINT=minio.performancewest.net, MINIO_PORT=443, MINIO_SECURE=true.
Add a MinIO nginx vhost on the Debian server that proxies port 443 → MinIO port 9000.
B. VPN / WireGuard
Connect the Windows VM to the same private network as the Debian server.
Use the internal IP 192.168.x.x:9000 and MINIO_SECURE=false.
C. Cloudflare Tunnel
Run a cloudflared tunnel on the Debian server and connect from Windows.
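Whichever option is used, the worker ends up turning the same few settings into a client configuration. A minimal sketch of that step follows; `minio_settings` is a hypothetical helper, the variable names MINIO_ACCESS_KEY and MINIO_SECRET_KEY are assumptions (only MINIO_ENDPOINT, MINIO_PORT and MINIO_SECURE appear above), and the defaults are illustrative.

```python
import os
from typing import Mapping


def minio_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for a MinIO client from environment values.

    Accepts either a bare host plus MINIO_PORT (option A) or an endpoint
    that already contains host:port (option B).
    """
    secure = env.get("MINIO_SECURE", "true").strip().lower() in ("1", "true", "yes")
    endpoint = env["MINIO_ENDPOINT"].strip()
    if ":" not in endpoint:
        # Assumed defaults: 443 behind TLS, 9000 for a direct connection.
        port = env.get("MINIO_PORT", "443" if secure else "9000")
        endpoint = f"{endpoint}:{port}"
    return {
        "endpoint": endpoint,
        "access_key": env["MINIO_ACCESS_KEY"],   # assumed variable name
        "secret_key": env["MINIO_SECRET_KEY"],   # assumed variable name
        "secure": secure,
    }
```

With the minio package installed, the result could be passed as `Minio(**minio_settings())`, since `endpoint`, `access_key`, `secret_key` and `secure` are parameters of that constructor.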
Heartbeat monitoring
The worker writes minio://{bucket}/docserver-heartbeat.json every 60 seconds:
{
  "status": "ok",
  "word_version": "16.0",
  "host": "WINVM-01",
  "ts": "2026-04-05T12:00:00+00:00"
}
Read this to check if the worker is alive. The health_check() function in
pdf_converter.py reads it automatically.
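A liveness check based on this heartbeat could be as simple as comparing `ts` with the current time. The sketch below is not the real health_check(): `heartbeat_is_fresh` is a hypothetical name and the 180-second tolerance (three missed 60-second beats) is an assumption.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def heartbeat_is_fresh(raw: str,
                       now: Optional[datetime] = None,
                       max_age: timedelta = timedelta(seconds=180)) -> bool:
    """Return True if the heartbeat JSON reports "ok" and is recent enough.

    `raw` is the text content of docserver-heartbeat.json. Malformed or
    incomplete documents count as "not alive" rather than raising.
    """
    try:
        beat = json.loads(raw)
        ts = datetime.fromisoformat(beat["ts"])
    except (ValueError, KeyError, TypeError):
        return False
    if ts.tzinfo is None:
        # Treat a timestamp without an offset as UTC (assumption).
        ts = ts.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return beat.get("status") == "ok" and now - ts <= max_age
```

A caller on the Linux side might use the result to skip the upload entirely and go straight to the LibreOffice fallback when the worker looks dead.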
Manual test
Place a .docx file in minio://{bucket}/to-convert/test.docx and watch for
minio://{bucket}/converted/test.pdf to appear within a few seconds.
Using the MinIO web console (http://server:9001) or mc CLI:
mc cp mydoc.docx local/performancewest/to-convert/test.docx
# wait a few seconds...
mc ls local/performancewest/converted/
mc cp local/performancewest/converted/test.pdf ./test.pdf
Logs
Worker logs: C:\docserver\logs\worker.log
Task Scheduler log: Event Viewer → Task Scheduler → PW-DocserverWorker