new-site/docserver/README.md
justin b48d0cb799 docserver: self-healing Task Scheduler config + docs
Companion to the worker MinIO-retry fix. Makes the worker auto-recover from
process death (crash, manual kill, missed boot trigger), not just MinIO outages.

- start_worker.bat: propagate Python's exit code (exit /b %rc%) so Task
  Scheduler can actually detect a failed run (it previously always exited 0).
- reconfigure_task.ps1 (new): re-registers PW-DocserverWorker with
  RestartCount=99 / 1-min interval, StartWhenAvailable, and two triggers —
  AtStartup plus a 5-min repeating trigger with MultipleInstances=IgnoreNew, so
  a dead worker relaunches within ~5 min and never double-runs. Idempotent.
- install.ps1: same self-healing settings for fresh installs.
- Verified on the box: killed the worker -> task relaunched it; firing again
  while running stayed at one instance.

Docs updated to match reality:
- docserver/README.md: new 'Reliability / self-healing' section.
- document-generation.md: corrected the stale 'Flask DocServer :5050 / HTTP'
  description to the actual MinIO outbound-only transport.
- e2e-test-plan.md: removed the outdated 'Word COM fails under SYSTEM / requires
  RDP after every reboot' limitation; now self-healing under SYSTEM session 0.
- infrastructure.md: fixed VM spec (Win Server 2019, Word 16.0, Python 3.13,
  SSH port 22422) + self-healing note.
- architecture.md / formation-system.md: trigger + self-healing details.
2026-06-15 22:49:21 -05:00

Performance West — Document Conversion Worker

Converts DOCX files to pixel-perfect PDFs using Microsoft Word on a Windows VM. No HTTP server, no open ports, no SSH tunnel needed.

Architecture

The Windows VM connects outbound to MinIO only. No inbound access required.

Linux workers container       MinIO (S3)           Windows VM (any NAT)
       │                         │                        │
       ├─ PUT docx ─────────────→│                        │
       │  to-convert/{id}.docx   │←─ poll every 3s ───────┤
       │                         │   list to-convert/     │
       │                         │                        ├─ Word.SaveAs PDF
       │                         │←─ PUT pdf ─────────────┤
       │                         │  converted/{id}.pdf    │
       │                         │←─ DELETE docx ─────────┤
       │←─ GET pdf ──────────────┤                        │
       │  converted/{id}.pdf     │                        │
       └─ DELETE pdf ────────────┤                        │
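
The Windows-side half of the diagram can be sketched as a single poll pass. This is a minimal illustration, not the code in docserver_worker.py: a plain dict stands in for the MinIO bucket, and `poll_once` and `convert` are hypothetical names (in the real worker the conversion step is Word's save-as-PDF call).

```python
from typing import Callable, Dict


def poll_once(store: Dict[str, bytes], convert: Callable[[bytes], bytes]) -> int:
    """Convert every pending DOCX once; return the number of files processed.

    `store` is a dict standing in for the bucket (assumption for illustration).
    """
    pending = sorted(
        key for key in store
        if key.startswith("to-convert/") and key.endswith(".docx")
    )
    for key in pending:
        job_id = key[len("to-convert/"):-len(".docx")]
        pdf = convert(store[key])               # stands in for Word's save-as-PDF step
        store[f"converted/{job_id}.pdf"] = pdf  # PUT pdf first ...
        del store[key]                          # ... then DELETE docx, as in the diagram
    return len(pending)
```

The worker would call a pass like this every 3 seconds. Writing the PDF before deleting the DOCX means a crash between the two steps leaves the request in place to be retried rather than lost.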

On the Linux side, pdf_converter.py uploads the DOCX and polls MinIO until the PDF appears, for up to DOCSERVER_TIMEOUT seconds (default 120).

If the Windows VM is unavailable or too slow to deliver the PDF in time, conversion falls back automatically to headless LibreOffice in the workers container, at roughly 70-80% fidelity.
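
The upload, poll, and fallback flow on the Linux side might look roughly like the sketch below. It is not the actual pdf_converter.py: `ObjectStore`, `convert_via_worker`, the `fallback` callable, and the 1-second poll interval are all assumptions made for illustration, and removing the request on timeout is a design choice of this sketch rather than documented behaviour.

```python
import time
from typing import Callable, Optional, Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal storage interface; stands in for a real MinIO client."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> Optional[bytes]: ...
    def delete(self, key: str) -> None: ...


def convert_via_worker(
    store: ObjectStore,
    job_id: str,
    docx: bytes,
    fallback: Callable[[bytes], bytes],
    timeout: float = 120.0,      # mirrors the DOCSERVER_TIMEOUT default
    poll_interval: float = 1.0,  # assumed; the Linux-side interval is not documented
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bytes:
    src_key = f"to-convert/{job_id}.docx"
    dst_key = f"converted/{job_id}.pdf"
    store.put(src_key, docx)

    deadline = clock() + timeout
    while clock() < deadline:
        pdf = store.get(dst_key)
        if pdf is not None:
            store.delete(dst_key)  # the Linux side removes the PDF after fetching it
            return pdf
        sleep(poll_interval)

    # Timed out. Assumption: drop the request so a late worker does not pick it up,
    # then fall back (for example to headless LibreOffice).
    store.delete(src_key)
    return fallback(docx)
```

Passing `sleep` and `clock` as parameters keeps the function testable without waiting for real time to pass.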

Windows VM Requirements

  • Windows 10/11 Pro or Windows Server 2019 or later
  • Microsoft Word (Office 2021+ recommended)
  • Python 3.12+ (from python.org — check "Add to PATH")
  • Outbound internet access to MinIO (HTTPS, no inbound ports needed)

Setup

Run install.ps1 as Administrator in PowerShell on the Windows VM:

cd C:\path\to\docserver

.\install.ps1 `
  -MinioEndpoint  "minio.performancewest.net" `
  -MinioPort      443 `
  -MinioSecure    $true `
  -MinioAccessKey "your_access_key" `
  -MinioSecretKey "your_secret_key"

This will:

  1. Verify Python and Word are installed
  2. Install pywin32 and minio Python packages
  3. Copy docserver_worker.py to C:\docserver\
  4. Write C:\docserver\docserver.env with your MinIO credentials
  5. Register a Task Scheduler task (PW-DocserverWorker) that starts at boot (AtStartup trigger) and re-fires every 5 minutes
  6. Start the worker immediately

The task runs under the SYSTEM account (session 0), so the worker comes back after a reboot without an interactive login or RDP session.

Reliability / self-healing

The worker is designed to recover from outages without manual intervention:

  • MinIO outages don't kill it. The worker retries the MinIO connection indefinitely with capped exponential backoff (5s → 120s) instead of exiting, and each poll cycle is wrapped so a transient network error / 502 just rebuilds the client and keeps going. (Previously a single 502 made the worker sys.exit(1), leaving it dead until a reboot.)
  • Crashes / kills are auto-recovered by Task Scheduler. The PW-DocserverWorker task has:
    • RestartCount=99, RestartInterval=1 min — relaunch if the action fails,
    • two triggers: AtStartup plus a repeating trigger every 5 minutes with MultipleInstances=IgnoreNew, so if the process ever dies (crash, manual kill, or a missed boot trigger) it relaunches within ~5 min and never runs more than one instance,
    • StartWhenAvailable to catch up a missed trigger.
  • start_worker.bat propagates Python's exit code (exit /b %rc%) so Task Scheduler can detect a failed run. Previously the script always exited 0, so failures went unnoticed.
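
The capped exponential backoff described in the first bullet can be sketched as follows. This is an illustration under stated assumptions, not the worker's actual code: the doubling factor, the function names, and the log message are hypothetical; only the 5 s start and 120 s cap come from the description above.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 5.0, cap: float = 120.0, factor: float = 2.0) -> Iterator[float]:
    """Yield 5, 10, 20, 40, 80, 120, 120, ... seconds (doubling is an assumption)."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def connect_with_retry(
    connect: Callable[[], T],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, sleeping between attempts; never exits."""
    for delay in backoff_delays():
        try:
            return connect()
        except Exception as exc:  # broad on purpose: any failure means "try again"
            print(f"MinIO connection failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)
    raise AssertionError("unreachable: backoff_delays() is infinite")
```

Capping the delay bounds how long the worker takes to notice that MinIO is back: at most about two minutes after the outage ends, instead of an ever-growing wait.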

To re-apply these task settings on an existing install, run as Administrator:

powershell -ExecutionPolicy Bypass -File C:\docserver\reconfigure_task.ps1

How to access MinIO externally

The Windows VM needs to reach MinIO. Options:

A. MinIO exposed externally (simplest): set MINIO_ENDPOINT=minio.performancewest.net, MINIO_PORT=443, MINIO_SECURE=true. Add a MinIO nginx vhost on the Debian server that proxies port 443 → MinIO port 9000.

B. VPN / WireGuard: connect the Windows VM to the same private network as the Debian server. Use the internal IP 192.168.x.x:9000 and MINIO_SECURE=false.

C. Cloudflare Tunnel: run a cloudflared tunnel on the Debian server and connect to it from the Windows VM.
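
As a rough illustration of how options A and B differ in configuration, the sketch below turns the three MINIO_* variables into connection settings. `MinioSettings`, `settings_from_env`, and the default values are assumptions for this example, and splitting option B into MINIO_ENDPOINT plus MINIO_PORT=9000 is a guess at how the worker expects it; 192.168.x.x remains a placeholder for your internal address.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class MinioSettings:
    endpoint: str  # "host:port", the form most S3 clients accept
    secure: bool   # True -> HTTPS, False -> plain HTTP


def settings_from_env(env: Mapping[str, str]) -> MinioSettings:
    """Build connection settings from MINIO_* variables (defaults are assumptions)."""
    host = env["MINIO_ENDPOINT"]
    secure = env.get("MINIO_SECURE", "true").strip().lower() in ("1", "true", "yes")
    port = int(env.get("MINIO_PORT", "443" if secure else "9000"))
    return MinioSettings(endpoint=f"{host}:{port}", secure=secure)


# Option A: public HTTPS endpoint behind the nginx vhost.
option_a = settings_from_env({
    "MINIO_ENDPOINT": "minio.performancewest.net",
    "MINIO_PORT": "443",
    "MINIO_SECURE": "true",
})

# Option B: private network, plain HTTP straight to MinIO's port 9000.
option_b = settings_from_env({
    "MINIO_ENDPOINT": "192.168.x.x",  # placeholder, as in the text above
    "MINIO_PORT": "9000",
    "MINIO_SECURE": "false",
})
```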

Heartbeat monitoring

The worker writes minio://{bucket}/docserver-heartbeat.json every 60 seconds:

{
  "status": "ok",
  "word_version": "16.0",
  "host": "WINVM-01",
  "ts": "2026-04-05T12:00:00+00:00"
}

Read this to check if the worker is alive. The health_check() function in pdf_converter.py reads it automatically.
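
A freshness check on the heartbeat could look like the sketch below. It is not the real health_check() from pdf_converter.py: the function name and the 180-second threshold (three missed 60-second beats) are assumptions, and fetching the object from MinIO is left out so the logic can be read on its own.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def heartbeat_is_fresh(
    raw: bytes,
    now: Optional[datetime] = None,
    max_age: timedelta = timedelta(seconds=180),  # assumed: three missed 60 s beats
) -> bool:
    """Return True if the heartbeat JSON reports "ok" and is recent enough."""
    try:
        beat = json.loads(raw)
        ts = datetime.fromisoformat(beat["ts"])
    except (ValueError, KeyError, TypeError):
        return False  # an unreadable heartbeat counts as "not alive"
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assume UTC if the offset is missing
    now = now or datetime.now(timezone.utc)
    return beat.get("status") == "ok" and now - ts <= max_age
```

Checking the timestamp matters as much as the status field: a worker that died an hour ago leaves behind a heartbeat that still says "ok".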

Manual test

Upload a .docx file as minio://{bucket}/to-convert/test.docx and watch for minio://{bucket}/converted/test.pdf to appear within a few seconds.

You can do this from the MinIO web console (http://server:9001) or with the mc CLI:

mc cp mydoc.docx local/performancewest/to-convert/test.docx
# wait a few seconds...
mc ls local/performancewest/converted/
mc cp local/performancewest/converted/test.pdf ./test.pdf

Logs

Worker logs: C:\docserver\logs\worker.log
Task Scheduler log: Event Viewer → Task Scheduler → PW-DocserverWorker