Performance West — Document Conversion Worker

Converts DOCX files to pixel-perfect PDFs using Microsoft Word on a Windows VM. No HTTP server, no open ports, no SSH tunnel needed.

Architecture

The Windows VM connects outbound to MinIO only. No inbound access required.

Linux workers container       MinIO (S3)           Windows VM (any NAT)
       │                         │                        │
       ├─ PUT docx ─────────────→│                        │
       │  to-convert/{id}.docx   │←─ poll every 3s ───────┤
       │                         │   list to-convert/      │
       │                         │                        ├─ Word.SaveAs PDF
       │                         │←─ PUT pdf ─────────────┤
       │                         │  converted/{id}.pdf     │
       │                         │←─ DELETE docx ──────────┤
       │←─ GET pdf ──────────────┤                        │
       │  converted/{id}.pdf     │                        │
       └─ DELETE pdf ────────────┤                        │
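
The Windows-side half of the diagram can be sketched as a single polling pass. This is a minimal illustration, not the code in docserver_worker.py: a plain dict stands in for the MinIO bucket, convert stands in for the Word COM call, and the names pdf_key_for and process_pending are hypothetical.

```python
from typing import Callable, Dict, List, Optional

IN_PREFIX = "to-convert/"
OUT_PREFIX = "converted/"


def pdf_key_for(docx_key: str) -> Optional[str]:
    """Map to-convert/{id}.docx to converted/{id}.pdf; return None for other keys."""
    if not (docx_key.startswith(IN_PREFIX) and docx_key.lower().endswith(".docx")):
        return None
    job_id = docx_key[len(IN_PREFIX):-len(".docx")]
    if not job_id or "/" in job_id:
        return None  # assumption: job ids are flat names, no nested folders
    return f"{OUT_PREFIX}{job_id}.pdf"


def process_pending(bucket: Dict[str, bytes],
                    convert: Callable[[bytes], bytes]) -> List[str]:
    """Run one polling pass: convert every pending DOCX and return the PDF keys written."""
    written: List[str] = []
    for key in sorted(bucket):  # sorted() copies the keys, so the dict can be mutated
        out_key = pdf_key_for(key)
        if out_key is None:
            continue
        bucket[out_key] = convert(bucket[key])  # PUT pdf first ...
        del bucket[key]                         # ... then DELETE docx, as in the diagram
        written.append(out_key)
    return written
```

Writing the PDF before deleting the DOCX means a crash between the two steps leaves the request in place to be retried rather than lost. A real worker would call this pass in a loop with a 3-second sleep between iterations.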

On the Linux side, pdf_converter.py uploads the DOCX and then polls for converted/{id}.pdf until it appears or DOCSERVER_TIMEOUT seconds have elapsed (default 120).

If the Windows VM is unavailable or too slow, conversion automatically falls back to headless LibreOffice in the workers container, at reduced fidelity (roughly 70-80% of Word's output).
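
The upload, poll, and fallback flow described above could look roughly like the following. This is a sketch under stated assumptions, not the contents of pdf_converter.py: InMemoryStore is a hypothetical stand-in for the MinIO bucket, convert_via_docserver and the fallback callable are illustrative names, and withdrawing the request on timeout is an assumption rather than documented behaviour.

```python
import time
from typing import Callable, Dict, Optional


class InMemoryStore:
    """Stand-in for the MinIO bucket so the sketch runs without a server."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> Optional[bytes]:
        return self.objects.get(key)

    def delete(self, key: str) -> None:
        self.objects.pop(key, None)


def convert_via_docserver(store: InMemoryStore,
                          job_id: str,
                          docx: bytes,
                          fallback: Callable[[bytes], bytes],
                          timeout: float = 120.0,
                          poll_interval: float = 1.0) -> bytes:
    """Upload a DOCX, wait for the PDF, and fall back if it never arrives."""
    src = f"to-convert/{job_id}.docx"
    dst = f"converted/{job_id}.pdf"
    store.put(src, docx)

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        pdf = store.get(dst)
        if pdf is not None:
            store.delete(dst)  # final DELETE pdf step from the diagram
            return pdf
        time.sleep(poll_interval)

    # Assumption: withdraw the request so a late worker does not convert it anyway.
    store.delete(src)
    return fallback(docx)  # e.g. a headless LibreOffice conversion
```

Using time.monotonic() for the deadline keeps the timeout correct even if the system clock is adjusted while the job is waiting.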

Windows VM Requirements

  • Windows 10/11 Pro or Windows Server 2022
  • Microsoft Word (Office 2021+ recommended)
  • Python 3.12+ (from python.org — check "Add to PATH")
  • Outbound network access to MinIO (HTTPS when MinIO is reached over the internet; no inbound ports needed)

Setup

Run install.ps1 as Administrator in PowerShell on the Windows VM:

cd C:\path\to\docserver

.\install.ps1 `
  -MinioEndpoint  "minio.performancewest.net" `
  -MinioPort      443 `
  -MinioSecure    $true `
  -MinioAccessKey "your_access_key" `
  -MinioSecretKey "your_secret_key"

This will:

  1. Verify Python and Word are installed
  2. Install pywin32 and minio Python packages
  3. Copy docserver_worker.py to C:\docserver\
  4. Write C:\docserver\docserver.env with your MinIO credentials
  5. Register a Task Scheduler task (PW-DocserverWorker) that starts at login
  6. Start the worker immediately

The worker must run as a logged-in user — Word COM requires an interactive Windows session and will fail under a system service account.

How to access MinIO externally

The Windows VM needs to reach MinIO. Options:

A. MinIO exposed externally (simplest): Set MINIO_ENDPOINT=minio.performancewest.net, MINIO_PORT=443, and MINIO_SECURE=true. Add an nginx vhost for MinIO on the Debian server that proxies port 443 to MinIO on port 9000.

B. VPN / WireGuard: Connect the Windows VM to the same private network as the Debian server. Use the internal address 192.168.x.x:9000 with MINIO_SECURE=false.

C. Cloudflare Tunnel: Run a cloudflared tunnel on the Debian server and connect to MinIO from the Windows VM through it.
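
To illustrate how the three settings in options A and B fit together, here is a small sketch that turns them into the host:port string and TLS flag a MinIO client expects. The function name minio_target is hypothetical, and the accepted spellings of the boolean are an assumption; the real worker may parse docserver.env differently.

```python
from typing import Mapping, Tuple


def minio_target(env: Mapping[str, str]) -> Tuple[str, bool]:
    """Return ("host:port", secure) from MINIO_ENDPOINT, MINIO_PORT and MINIO_SECURE."""
    host = env["MINIO_ENDPOINT"].strip()
    port = int(env["MINIO_PORT"])
    # Assumption: "true", "1" and "yes" (any case) all mean TLS is on.
    secure = env["MINIO_SECURE"].strip().lower() in ("true", "1", "yes")
    return f"{host}:{port}", secure


# Option A: public hostname behind nginx, TLS on.
option_a = minio_target({
    "MINIO_ENDPOINT": "minio.performancewest.net",
    "MINIO_PORT": "443",
    "MINIO_SECURE": "true",
})

# Option B: private address over the VPN, plain HTTP (address left elided as in the text).
option_b = minio_target({
    "MINIO_ENDPOINT": "192.168.x.x",
    "MINIO_PORT": "9000",
    "MINIO_SECURE": "false",
})
```

The resulting pair can be passed to the minio package as Minio(endpoint, access_key=..., secret_key=..., secure=secure).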

Heartbeat monitoring

The worker writes minio://{bucket}/docserver-heartbeat.json every 60 seconds:

{
  "status": "ok",
  "word_version": "16.0",
  "host": "WINVM-01",
  "ts": "2026-04-05T12:00:00+00:00"
}

Read this object to check whether the worker is alive. The health_check() function in pdf_converter.py reads it automatically.
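
A freshness check on the heartbeat could be sketched as follows. This is not the actual health_check() implementation: the function name heartbeat_is_fresh and the three-minute tolerance (three missed 60-second beats) are assumptions chosen for illustration.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def heartbeat_is_fresh(raw: str,
                       now: Optional[datetime] = None,
                       max_age: timedelta = timedelta(seconds=180)) -> bool:
    """Return True if the heartbeat reports "ok" and is no older than max_age."""
    beat = json.loads(raw)
    if beat.get("status") != "ok":
        return False
    ts = datetime.fromisoformat(beat["ts"])
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive timestamps are UTC
    if now is None:
        now = datetime.now(timezone.utc)
    return now - ts <= max_age


sample = """{
  "status": "ok",
  "word_version": "16.0",
  "host": "WINVM-01",
  "ts": "2026-04-05T12:00:00+00:00"
}"""
```

For the sample above, a check made 90 seconds after the timestamp passes, while one made ten minutes later fails, so a stopped worker is detected within a few missed beats.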

Manual test

Place a .docx file in minio://{bucket}/to-convert/test.docx and watch for minio://{bucket}/converted/test.pdf to appear within a few seconds.

Upload the file with the MinIO web console (http://server:9001) or the mc CLI:

mc cp mydoc.docx local/performancewest/to-convert/test.docx
# wait a few seconds...
mc ls local/performancewest/converted/
mc cp local/performancewest/converted/test.pdf ./test.pdf

Logs

Worker logs: C:\docserver\logs\worker.log
Task Scheduler log: Event Viewer → Task Scheduler → PW-DocserverWorker