
# Performance West — Document Conversion Worker
Converts DOCX files to pixel-perfect PDFs using Microsoft Word on a Windows VM.
No HTTP server, no open ports, no SSH tunnel needed.
## Architecture
The Windows VM connects **outbound** to MinIO only. No inbound access required.
```
Linux workers container   MinIO (S3)                Windows VM (any NAT)
│                         │                         │
├─ PUT docx ─────────────→│                         │
│  to-convert/{id}.docx   │←─ poll every 3s ────────┤
│                         │   list to-convert/      │
│                         │                         ├─ Word.SaveAs PDF
│                         │←─ PUT pdf ──────────────┤
│                         │   converted/{id}.pdf    │
│                         │←─ DELETE docx ──────────┤
│←─ GET pdf ──────────────┤                         │
│   converted/{id}.pdf    │                         │
└─ DELETE pdf ────────────┤                         │
```
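The Windows-side half of the diagram can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `docserver_worker.py`: `InMemoryStore` is a hypothetical in-memory stand-in for the MinIO bucket, and `convert` is a placeholder for the Word COM call.

```python
import time
from typing import Callable, Dict, List


class InMemoryStore:
    """Hypothetical stand-in for a MinIO bucket: maps object keys to bytes."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def list(self, prefix: str) -> List[str]:
        return sorted(k for k in self.objects if k.startswith(prefix))

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def delete(self, key: str) -> None:
        self.objects.pop(key, None)


def process_pending(store: InMemoryStore, convert: Callable[[bytes], bytes]) -> int:
    """One polling pass: convert every DOCX under to-convert/, return the count."""
    done = 0
    for key in store.list("to-convert/"):
        if not key.endswith(".docx"):
            continue
        job_id = key[len("to-convert/"):-len(".docx")]
        pdf = convert(store.get(key))              # placeholder for Word.SaveAs PDF
        store.put(f"converted/{job_id}.pdf", pdf)  # PUT pdf first...
        store.delete(key)                          # ...then DELETE docx
        done += 1
    return done


def run_forever(store: InMemoryStore, convert: Callable[[bytes], bytes],
                interval: float = 3.0) -> None:
    """Poll every `interval` seconds (3s in the diagram above)."""
    while True:
        process_pending(store, convert)
        time.sleep(interval)
```

Uploading the PDF before deleting the DOCX means a crash between the two steps leaves the job to be retried rather than lost.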
On the Linux side, `pdf_converter.py` uploads the DOCX and polls until
the PDF appears (up to `DOCSERVER_TIMEOUT` seconds, default 120).
If the Windows VM is unavailable or does not finish in time, conversion falls
back automatically to headless LibreOffice in the workers container, at reduced
fidelity (70-80%).
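The upload, poll, and fallback flow described above might look something like the sketch below. It is not the actual `pdf_converter.py`: the function name and the `upload`, `fetch_pdf`, and `fallback` callables are hypothetical, and reading `DOCSERVER_TIMEOUT` from the environment is an assumption.

```python
import os
import time
from typing import Callable, Optional

# Assumption: DOCSERVER_TIMEOUT is an environment variable, in seconds.
DEFAULT_TIMEOUT = float(os.environ.get("DOCSERVER_TIMEOUT", "120"))


def convert_with_fallback(
    upload: Callable[[], None],
    fetch_pdf: Callable[[], Optional[bytes]],
    fallback: Callable[[], bytes],
    timeout: float = DEFAULT_TIMEOUT,
    poll_interval: float = 1.0,
) -> bytes:
    """Upload the DOCX, poll for the PDF, and fall back once the deadline passes.

    upload    -- hypothetical: PUTs to-convert/{id}.docx
    fetch_pdf -- hypothetical: returns converted/{id}.pdf bytes, or None if absent
    fallback  -- hypothetical: runs the headless LibreOffice conversion locally
    """
    upload()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        pdf = fetch_pdf()
        if pdf is not None:
            return pdf
        time.sleep(poll_interval)
    # The Windows worker never produced a PDF in time: use the local converter.
    return fallback()
```

Using `time.monotonic()` for the deadline keeps the timeout unaffected by wall-clock adjustments on the host.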
## Windows VM Requirements
- Windows 10/11 Pro or Windows Server 2022
- Microsoft Word (Office 2021+ recommended)
- Python 3.12+ (from python.org — check "Add to PATH")
- Outbound internet access to MinIO (HTTPS, no inbound ports needed)
## Setup
Run `install.ps1` as Administrator in PowerShell on the Windows VM:
```powershell
cd C:\path\to\docserver
.\install.ps1 `
-MinioEndpoint "minio.performancewest.net" `
-MinioPort 443 `
-MinioSecure $true `
-MinioAccessKey "your_access_key" `
-MinioSecretKey "your_secret_key"
```
This will:
1. Verify Python and Word are installed
2. Install `pywin32` and `minio` Python packages
3. Copy `docserver_worker.py` to `C:\docserver\`
4. Write `C:\docserver\docserver.env` with your MinIO credentials
5. Register a Task Scheduler task (`PW-DocserverWorker`) that starts at login
6. Start the worker immediately

The worker must run as a **logged-in user**: Word COM requires an interactive
Windows session and will fail under a system service account.
## How to access MinIO externally
The Windows VM needs to reach MinIO. Options:

**A. MinIO exposed externally (simplest)**

Set `MINIO_ENDPOINT=minio.performancewest.net`, `MINIO_PORT=443`, `MINIO_SECURE=true`.
Add a MinIO nginx vhost on the Debian server that proxies port 443 → MinIO port 9000.

**B. VPN / WireGuard**

Connect the Windows VM to the same private network as the Debian server.
Use the internal IP `192.168.x.x:9000` and `MINIO_SECURE=false`.

**C. Cloudflare Tunnel**

Run a `cloudflared` tunnel on the Debian server and connect from Windows.
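As a rough illustration of how these settings could be turned into connection parameters, the sketch below builds a `host:port` endpoint and a TLS flag from `MINIO_ENDPOINT`, `MINIO_PORT`, and `MINIO_SECURE`. The function name `minio_settings`, the accepted truthy strings, and the default of `true` for `MINIO_SECURE` are assumptions, not taken from the worker's code.

```python
import os
from typing import Mapping


def minio_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Derive an endpoint string and TLS flag from the MINIO_* variables."""
    host = env["MINIO_ENDPOINT"]
    port = env.get("MINIO_PORT", "")
    # Assumption: "1", "true", and "yes" (any case) all enable TLS.
    secure = env.get("MINIO_SECURE", "true").strip().lower() in ("1", "true", "yes")
    # Option B gives the endpoint with the port already attached (host:9000),
    # so only append MINIO_PORT when the host has no port of its own.
    endpoint = f"{host}:{port}" if port and ":" not in host else host
    return {"endpoint": endpoint, "secure": secure}
```

With the `minio` package installed, the result could then be passed along the lines of `Minio(s["endpoint"], access_key=..., secret_key=..., secure=s["secure"])`.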
## Heartbeat monitoring
The worker writes `minio://{bucket}/docserver-heartbeat.json` every 60 seconds:
```json
{
"status": "ok",
"word_version": "16.0",
"host": "WINVM-01",
"ts": "2026-04-05T12:00:00+00:00"
}
```
Read this to check if the worker is alive. The `health_check()` function in
`pdf_converter.py` reads it automatically.
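A liveness check over this JSON could be sketched as below. This is not the actual `health_check()` implementation: the function name `heartbeat_is_fresh` and the 180-second staleness threshold (three missed 60-second beats) are assumptions chosen for illustration.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def heartbeat_is_fresh(
    raw: bytes,
    max_age: timedelta = timedelta(seconds=180),  # assumption: 3 missed beats
    now: Optional[datetime] = None,
) -> bool:
    """Return True if the heartbeat reports "ok" and its timestamp is recent."""
    try:
        beat = json.loads(raw)
        ts = datetime.fromisoformat(beat["ts"])
    except (ValueError, KeyError, TypeError):
        return False  # malformed or incomplete heartbeat counts as not alive
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive timestamps are UTC
    now = now or datetime.now(timezone.utc)
    return beat.get("status") == "ok" and (now - ts) <= max_age
```

Treating an unparsable heartbeat as "not alive" lets the caller go straight to the LibreOffice fallback instead of waiting out the full timeout.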
## Manual test
Place a `.docx` file at `minio://{bucket}/to-convert/test.docx` and watch for
`minio://{bucket}/converted/test.pdf` to appear within a few seconds.
You can do this with the MinIO web console (`http://server:9001`) or the `mc` CLI:
```bash
mc cp mydoc.docx local/performancewest/to-convert/test.docx
# wait a few seconds...
mc ls local/performancewest/converted/
mc cp local/performancewest/converted/test.pdf ./test.pdf
```
## Logs
- Worker logs: `C:\docserver\logs\worker.log`
- Task Scheduler log: Event Viewer → Task Scheduler → `PW-DocserverWorker`