Performance West — Document Generation System
Last updated: 2026-03-27
Overview
The document generation system produces professional compliance documents for customers. It supports two generation modes:
- Template-based — DOCX templates with Jinja2 placeholders, filled with order data
- LLM-based — Templates provide structure; Ollama generates analysis sections
All generated documents pass through a quality gate (admin review) before delivery.
Architecture
                  ┌─────────────┐
                  │   ERPNext   │  (order data + intake forms)
                  └──────┬──────┘
                         │
                  ┌──────┴──────┐
                  │   Worker    │  (Python — polls for Queued orders)
                  └──────┬──────┘
                         │
            ┌────────────┼────────────┐
            │                         │
   ┌────────┴────────┐      ┌─────────┴─────────┐
   │ Template-based  │      │    LLM-based      │
   │  (DocxBuilder)  │      │  (DocxBuilder +   │
   │                 │      │   Ollama/LLM)     │
   └────────┬────────┘      └─────────┬─────────┘
            │                         │
            └────────────┬────────────┘
                         │
                  ┌──────┴──────┐
                  │ PDF Convert │
                  │ ┌─────────┐ │
                  │ │DocServer│ │  ← PRIMARY (Windows, MS Word COM)
                  │ │ (MinIO) │ │
                  │ └────┬────┘ │
                  │      │ fail │
                  │ ┌────┴────┐ │
                  │ │LibreOfc │ │  ← FALLBACK (headless, in Docker)
                  │ └─────────┘ │
                  └──────┬──────┘
                         │
                  ┌──────┴──────┐
                  │    MinIO    │  (upload DOCX + PDF)
                  └──────┬──────┘
                         │
                  ┌──────┴──────┐
                  │   ERPNext   │  (update status → Review)
                  └─────────────┘
Template-Based Generation
When Used
- Operating agreements (formation orders)
- Privacy policies
- Invoices
- CRTC registration letter (Canada CRTC Carrier Package)
- BC corporate binder (9 sections — cover page, incorporation certificate placeholder, articles of incorporation, registered office, directors/officers, share structure, CRTC registration, vendor directory, compliance calendar)
- Vendor directory PDF (Canadian telecom vendors and contacts)
- Any document where the content is deterministic (no analysis needed)
How It Works
- Worker fetches the .docx template from MinIO (templates/{template-name}.docx)
- DocxBuilder loads the template via python-docx
- Variables from the ERPNext order are substituted into Jinja2 placeholders
- The filled document is saved as DOCX
- The DOCX is converted to PDF (Word DocServer first, LibreOffice as fallback; see PDF Conversion)
- Both files are uploaded to MinIO
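The steps above can be sketched as one function with its dependencies passed in. This is a minimal illustration, not the worker's code: TemplateDeps, generate_from_template, and the four callables are hypothetical stand-ins for minio_client, DocxBuilder, and pdf_converter, and the orders/{order_id}/{template_name}.pdf key layout is assumed from the upload example in the MinIO Upload/Download section.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical dependency bundle: each callable stands in for a real module
# (minio_client, DocxBuilder, pdf_converter). Signatures are assumptions.
@dataclass
class TemplateDeps:
    fetch_template: Callable[[str], bytes]             # template name -> DOCX bytes
    fill: Callable[[bytes, Dict[str, object]], bytes]  # DOCX + order data -> filled DOCX
    to_pdf: Callable[[bytes], bytes]                   # DOCX -> PDF
    upload: Callable[[str, bytes, str], None]          # object path, data, content type

DOCX_TYPE = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

def generate_from_template(deps: TemplateDeps, template_name: str,
                           order_id: str, context: Dict[str, object]) -> Dict[str, str]:
    """Run the template-based steps in order and return the uploaded object paths."""
    template = deps.fetch_template(template_name)  # templates/{name}.docx from MinIO
    docx = deps.fill(template, context)            # substitute Jinja2 placeholders
    pdf = deps.to_pdf(docx)                        # DOCX -> PDF
    base = f"orders/{order_id}/{template_name}"    # assumed key layout
    deps.upload(f"{base}.docx", docx, DOCX_TYPE)   # upload both files
    deps.upload(f"{base}.pdf", pdf, "application/pdf")
    return {"docx": f"{base}.docx", "pdf": f"{base}.pdf"}
```

Passing the steps in as callables keeps the ordering testable without MinIO, Word, or LibreOffice.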
DOCX Template Format
Templates are standard .docx files with Jinja2 syntax embedded in the text:
Simple variables:
This Operating Agreement of {{ entity_name }}, a limited liability company
organized under the laws of {{ state_name }}...
Conditionals:
{% if management_type == 'manager' %}
The Manager(s) of the Company shall be {{ managers }}.
{% else %}
All Members shall have the authority to manage the business.
{% endif %}
Loops (for tables or repeated sections):
{% for member in members %}
{{ member.name }} — {{ member.ownership_pct }}% ownership
{% endfor %}
Section placeholders (for LLM-generated content):
{{ executive_summary }}
{{ classification_analysis }}
{{ remediation_plan }}
Creating a New Template
- Run python scripts/templates/create_templates.py to generate the base templates, or create one manually in Word/LibreOffice
- Use {{ variable_name }} for all dynamic content
- Use Times New Roman for body text and navy blue (#2D4E78) for headings
- Include the Performance West header, confidentiality footer, and page numbers
- Save as .docx (not .doc)
- Upload to MinIO: mc cp template.docx minio/performancewest/templates/
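Because Word and LibreOffice can both still save the legacy .doc format, a quick check before the mc cp upload can catch the wrong format. This is a sketch under the assumption that it runs on the local file; is_docx is a hypothetical helper. It relies on two facts about the formats: a .docx is a ZIP (OOXML) package containing word/document.xml, while a legacy .doc is an OLE2 compound file with a fixed 8-byte signature.

```python
import io
import zipfile

# Legacy .doc files are OLE2 compound documents and start with this signature.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

def is_docx(data: bytes) -> bool:
    """Return True if data looks like a Word .docx (OOXML) package.

    This is a sanity check on the container format, not full validation.
    """
    if data.startswith(OLE2_MAGIC):
        return False  # saved as .doc, not .docx
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return "word/document.xml" in zf.namelist()
    except zipfile.BadZipFile:
        return False
```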
Modifying an Existing Template
- Download from MinIO: mc cp minio/performancewest/templates/name.docx .
- Edit in Word or LibreOffice, preserving all {{ }} placeholders
- Test locally: python -c "from scripts.document_gen.docx_builder import DocxBuilder; ..."
- Upload the updated template back to MinIO
- Existing generated documents are not affected (they are separate files)
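To confirm an edit did not drop or mangle a placeholder, one option is to compare the {{ ... }} expressions before and after editing. A minimal stdlib sketch; the helper names are hypothetical, and it only reads word/document.xml, so placeholders in headers or footers are not checked. Word often splits a single placeholder across several text runs, so the sketch joins the runs of each paragraph before searching.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
PLACEHOLDER = re.compile(r"\{\{\s*(.+?)\s*\}\}")

def docx_text(data: bytes) -> str:
    """Join the text runs (w:t) of each paragraph (w:p) in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = ["".join(t.text or "" for t in p.iter(f"{W_NS}t"))
                  for p in root.iter(f"{W_NS}p")]
    return "\n".join(paragraphs)

def placeholders(data: bytes) -> set:
    """Return the set of {{ ... }} expressions found in a .docx template body."""
    return set(PLACEHOLDER.findall(docx_text(data)))

def missing_placeholders(before: bytes, after: bytes) -> set:
    """Placeholders present in the original template but lost after editing."""
    return placeholders(before) - placeholders(after)
```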
LLM-Based Generation
When Used
- FLSA/wage & hour audit reports
- CCPA/CPRA compliance audit reports
- TCPA consent audit reports
- Independent contractor classification assessments
- Employee handbook reviews
- Data breach response plans
How It Works
- Worker fetches the DOCX template (provides structure and formatting)
- Worker constructs a prompt from the service-specific handler + intake data
- Worker sends the prompt to Ollama (qwen2.5:7b running locally)
- LLM returns analysis text for each section
- DocxBuilder.insert_section() replaces section placeholders with LLM output
- Simple variables (company name, dates) are filled via DocxBuilder.fill()
- Document is converted to PDF and uploaded to MinIO
- Status is always set to Review — LLM output must be human-reviewed
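Before DocxBuilder.insert_section() is called, it helps to check that the model's reply actually covers every section placeholder in the template. A hedged sketch: parse_sections is a hypothetical helper, and it assumes each section value is a plain string, matching the JSON output format requested in the prompt guidelines. Extra keys are dropped so stray model output never reaches the document.

```python
import json
from typing import Dict, Iterable

def parse_sections(raw: str, expected: Iterable[str]) -> Dict[str, str]:
    """Parse the LLM's JSON reply and return one cleaned string per expected section.

    Raises ValueError if the reply is not a JSON object or a section is missing/empty.
    """
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("LLM reply is not a JSON object")
    expected = list(expected)
    missing = [key for key in expected
               if not isinstance(data.get(key), str) or not data[key].strip()]
    if missing:
        raise ValueError(f"LLM reply missing sections: {missing}")
    # Keep only the expected keys, in template order.
    return {key: data[key].strip() for key in expected}
```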
Prompt Engineering Guidelines
Each compliance service has a dedicated handler in scripts/workers/services/ that constructs the prompt. Follow these guidelines:
Structure:
You are a compliance consultant preparing a {document_type} for {company_name}.
CONTEXT:
{intake_data formatted as structured text}
INSTRUCTIONS:
- Write in a professional, objective tone
- Cite specific regulations by name and section number
- Identify concrete findings (compliant, non-compliant, needs improvement)
- Provide actionable remediation steps with deadlines
- Do not include legal advice disclaimers (the template adds these)
OUTPUT FORMAT:
Return a JSON object with the following keys:
- executive_summary: 2-3 paragraph overview
- {section_name}: detailed analysis for each section
- remediation_plan: prioritized action items
Write for a business audience. Be specific, not generic.
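The structure above maps naturally onto a Python format string. This sketch is illustrative only: PROMPT_TEMPLATE, format_intake, and build_prompt are hypothetical names, and the real handlers in scripts/workers/services/ may assemble the prompt differently. format_intake shows one way to turn intake answers into structured "Label: value" lines rather than a raw form dump.

```python
from typing import Dict

PROMPT_TEMPLATE = """You are a compliance consultant preparing a {document_type} for {company_name}.

CONTEXT:
{context}

INSTRUCTIONS:
- Write in a professional, objective tone
- Cite specific regulations by name and section number
- Identify concrete findings (compliant, non-compliant, needs improvement)
- Provide actionable remediation steps with deadlines
- Do not include legal advice disclaimers (the template adds these)

OUTPUT FORMAT:
Return a JSON object with the following keys:
{keys}

Write for a business audience. Be specific, not generic."""

def format_intake(intake: Dict[str, object]) -> str:
    """Render intake answers as 'Label: value' lines instead of a raw form dump."""
    lines = []
    for field, value in intake.items():
        label = field.replace("_", " ").capitalize()
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        lines.append(f"{label}: {value}")
    return "\n".join(lines)

def build_prompt(document_type: str, company_name: str,
                 intake: Dict[str, object], sections: Dict[str, str]) -> str:
    """sections maps each template placeholder name to a short description."""
    keys = "\n".join(f"- {key}: {desc}" for key, desc in sections.items())
    return PROMPT_TEMPLATE.format(document_type=document_type, company_name=company_name,
                                  context=format_intake(intake), keys=keys)
```

Deriving the keys from the template's section names keeps the JSON keys and the placeholders in sync.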
Key rules:
- Always request JSON output — easier to parse and insert into template sections
- Include the intake data as structured context, not raw form dumps
- Specify the exact section names that match template placeholders
- Set temperature to 0.3 for consistency; compliance documents should not be creative
- Maximum token limit: 4096 per section to prevent rambling
- If the LLM returns malformed JSON, retry once with a stricter prompt
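The retry rule can be expressed as a small wrapper around whatever function calls the model. A sketch assuming a hypothetical generate(prompt) -> str callable; the wording of STRICT_SUFFIX is an illustration, not the prompt the handlers actually use.

```python
import json
from typing import Callable, Dict

STRICT_SUFFIX = (
    "\n\nIMPORTANT: Respond with a single valid JSON object only. "
    "No markdown fences, no commentary before or after the JSON."
)

def generate_json(generate: Callable[[str], str], prompt: str) -> Dict[str, object]:
    """Call the model and parse a JSON object, retrying once with a stricter prompt."""
    last_error = None
    for attempt_prompt in (prompt, prompt + STRICT_SUFFIX):  # original, then one retry
        raw = generate(attempt_prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
            continue
        if isinstance(data, dict):
            return data
        last_error = ValueError("reply is JSON but not an object")
    raise ValueError(f"LLM returned malformed JSON twice: {last_error}")
```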
Model selection:
- Default: qwen2.5:7b (good balance of quality and speed for 16GB VRAM)
- For complex multi-state analysis: qwen2.5:14b if GPU memory allows
- Configured via the OLLAMA_MODEL environment variable
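Temperature, the token cap, and the model choice all end up in the request sent to Ollama. A minimal stdlib sketch against Ollama's /api/generate endpoint, assuming Ollama listens on its default port 11434 on localhost; the function names and the 300-second HTTP timeout are assumptions. Ollama's "format": "json" option constrains output to JSON, and the num_predict option caps the tokens generated in one call.

```python
import json
import os
import urllib.request

def ollama_request(prompt: str, host: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming /api/generate request using the settings above."""
    body = {
        "model": os.environ.get("OLLAMA_MODEL", "qwen2.5:7b"),
        "prompt": prompt,
        "format": "json",   # ask Ollama to constrain the reply to JSON
        "stream": False,    # one response object instead of a token stream
        "options": {"temperature": 0.3, "num_predict": 4096},
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ollama_generate(prompt: str) -> str:
    """Send the request and return the raw reply text (still needs JSON parsing)."""
    with urllib.request.urlopen(ollama_request(prompt), timeout=300) as resp:
        return json.loads(resp.read())["response"]
```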
PDF Conversion
DOCX to PDF conversion uses a two-tier approach:
PRIMARY: Windows DocServer (Microsoft Word COM)
A Windows server runs docserver_worker.py that uses Microsoft Word via COM
automation for pixel-perfect DOCX → PDF conversion. This produces the highest-
fidelity output (exact font rendering, correct page breaks, proper table
formatting).
The transport is MinIO, not HTTP — the Windows VM only makes outbound connections to MinIO, so there are no open inbound ports / SSH tunnels and it works behind any NAT:
pdf_converter.py (Linux)              MinIO (S3)            docserver_worker.py (Windows)
PUT docx → to-convert/{id}.docx ─────────► │
                                           │◄─ poll every 12s ───────┤
                                           │                         ├─ Word.SaveAs → PDF
GET pdf ← converted/{id}.pdf    ◄──────────│◄─ PUT converted/{id}.pdf┘
DEL docx / DEL pdf (cleanup)
# pdf_converter.py — primary path (simplified)
mc.put_object(bucket, f"to-convert/{job_id}.docx", docx_stream, length)
# ...poll until converted/{job_id}.pdf appears (DOCSERVER_TIMEOUT, default 120s)...
pdf_bytes = mc.get_object(bucket, f"converted/{job_id}.pdf").read()
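The elided polling step might look like the following. wait_for_pdf is a hypothetical helper; exists stands in for a MinIO existence check (for example, a stat call that treats a not-found error as False), and the 2-second interval is an assumption, since only the Windows side's 12 s poll is documented. The clock and sleep functions are injectable so the timeout logic can be tested without waiting.

```python
import os
import time
from typing import Callable, Optional

def wait_for_pdf(exists: Callable[[str], bool], job_id: str,
                 timeout: Optional[float] = None, interval: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until converted/{job_id}.pdf exists; return its key or raise TimeoutError.

    timeout defaults to the DOCSERVER_TIMEOUT environment variable, else 120 s.
    """
    if timeout is None:
        timeout = float(os.environ.get("DOCSERVER_TIMEOUT", "120"))
    key = f"converted/{job_id}.pdf"
    deadline = clock() + timeout
    while True:
        if exists(key):
            return key
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"{key} not produced within {timeout:.0f}s")
        sleep(min(interval, remaining))  # never sleep past the deadline
```

On TimeoutError the caller would fall back to LibreOffice, as described below.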
The Windows worker is self-healing: it retries MinIO with backoff instead of
exiting on a transient outage, and its PW-DocserverWorker scheduled task
restarts on failure plus re-fires every 5 minutes if the process dies. See
docserver/README.md → "Reliability / self-healing".
FALLBACK: LibreOffice Headless
If DocServer is unavailable (network error, timeout, Windows server down), the converter falls back to LibreOffice in headless mode:
libreoffice --headless --convert-to pdf --outdir /tmp document.docx
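When calling LibreOffice from Python, passing an argument list instead of a shell string avoids quoting problems with unusual file names. A sketch with hypothetical helper names and an assumed 120-second timeout: LibreOffice writes the PDF as the input's stem plus .pdf inside --outdir, and because a failed conversion may not produce a non-zero exit status, the sketch also checks that the output file exists.

```python
import subprocess
from pathlib import Path
from typing import List

def libreoffice_command(docx_path: str, outdir: str = "/tmp") -> List[str]:
    """Argument list for the headless conversion shown above."""
    return ["libreoffice", "--headless", "--convert-to", "pdf", "--outdir", outdir, docx_path]

def expected_pdf_path(docx_path: str, outdir: str = "/tmp") -> Path:
    """LibreOffice names the output after the input file's stem, inside --outdir."""
    return Path(outdir) / (Path(docx_path).stem + ".pdf")

def libreoffice_to_pdf(docx_path: str, outdir: str = "/tmp", timeout: float = 120) -> Path:
    """Run the conversion; raise if LibreOffice fails, hangs, or writes no PDF."""
    subprocess.run(libreoffice_command(docx_path, outdir), check=True,
                   timeout=timeout, capture_output=True)
    pdf_path = expected_pdf_path(docx_path, outdir)
    if not pdf_path.exists():
        raise RuntimeError(f"LibreOffice did not produce {pdf_path}")
    return pdf_path
```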
Converter Logic
The pdf_converter.py module handles:
- DocServer first: upload the DOCX to to-convert/ in MinIO and poll converted/ for the PDF (DOCSERVER_TIMEOUT, default 120s)
- Fallback to LibreOffice if the DocServer conversion fails or times out
- Retry logic (up to 3 attempts per converter)
- Temporary file cleanup
- Error reporting to ERPNext
- Logs which converter was used for each document
LibreOffice is installed in the Python worker Docker container (scripts/Dockerfile).
DocServer host is configured via DOCSERVER_HOST environment variable (default: 192.168.1.x).
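The converter order with per-converter retries can be sketched as follows. This is not pdf_converter.py itself: convert_with_fallback and the (name, callable) list are hypothetical, and each converter is assumed to raise on failure or timeout. Returning the converter name makes it easy to log which one produced each document.

```python
import logging
from typing import Callable, List, Tuple

log = logging.getLogger("pdf_converter")

Converter = Callable[[bytes], bytes]

def convert_with_fallback(docx: bytes, converters: List[Tuple[str, Converter]],
                          attempts: int = 3) -> Tuple[str, bytes]:
    """Try each converter in order, up to `attempts` times each.

    Returns (converter_name, pdf_bytes). Raises RuntimeError listing every failure
    if all attempts fail (the caller would report that to ERPNext).
    """
    errors = []
    for name, convert in converters:
        for attempt in range(1, attempts + 1):
            try:
                pdf = convert(docx)
            except Exception as exc:  # timeouts, network errors, converter crashes
                errors.append(f"{name} attempt {attempt}: {exc}")
                log.warning("PDF conversion via %s failed (attempt %d/%d): %s",
                            name, attempt, attempts, exc)
                continue
            log.info("Converted with %s", name)
            return name, pdf
    raise RuntimeError("all PDF converters failed: " + "; ".join(errors))
```

One trade-off: with 3 attempts and a 120 s DocServer timeout, a dead Windows VM could delay the LibreOffice fallback by several minutes, so the attempt count per converter may be worth tuning separately.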
MinIO Upload/Download
The minio_client.py module provides:
# Upload a generated document
upload_document(
local_path="/tmp/operating-agreement.pdf",
minio_path="orders/FO-2026-0001/operating-agreement.pdf",
content_type="application/pdf",
)
# Download a template
download_template(
template_name="operating-agreement", # downloads operating-agreement.docx
local_path="/tmp/operating-agreement.docx",
)
# Generate a pre-signed URL for customer download
url = presign_url(
minio_path="orders/FO-2026-0001/operating-agreement.pdf",
expires=3600, # 1 hour
)
Bucket structure: See docs/crm.md for the full MinIO directory layout.
Security: MinIO is not exposed externally. The Express API generates time-limited pre-signed URLs for customer downloads.
Quality Gates
Admin Review
Every generated document enters Review status before delivery:
- Admin opens the order in ERPNext
- Downloads the DOCX/PDF from the attached MinIO link
- Reviews for accuracy, completeness, and professionalism
- Actions:
  - Approve — moves to Ready
  - Request Revision — moves to Revision with notes; worker re-generates
  - Reject — flags for manual document creation
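The three reviewer actions can be modeled as a small decision function. This is a sketch only: Decision, ReviewOutcome, and apply_review are hypothetical, requiring notes for a revision request is an assumption drawn from "with notes", and because this page does not say which status a rejected order takes, the sketch leaves the status unchanged (None) and only sets a manual-creation flag.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Decision(Enum):
    APPROVE = "approve"
    REQUEST_REVISION = "request_revision"
    REJECT = "reject"

@dataclass
class ReviewOutcome:
    status: Optional[str]          # new order status, or None if left unchanged
    manual_creation: bool = False  # True when the document must be built by hand
    notes: str = ""

def apply_review(current_status: str, decision: Decision, notes: str = "") -> ReviewOutcome:
    """Map a reviewer's action to its outcome; only orders in Review can be decided."""
    if current_status != "Review":
        raise ValueError(f"cannot review an order in status {current_status!r}")
    if decision is Decision.APPROVE:
        return ReviewOutcome(status="Ready")
    if decision is Decision.REQUEST_REVISION:
        if not notes.strip():
            raise ValueError("a revision request needs reviewer notes")
        return ReviewOutcome(status="Revision", notes=notes)
    # Reject: flag for manual document creation; status handling is unspecified here.
    return ReviewOutcome(status=None, manual_creation=True)
```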
Revision Loop
When a reviewer requests changes:
- Order status returns to Processing
- Reviewer's notes are stored in the ERPNext order comments
- Worker re-generates with adjusted prompts or manual edits
- Document re-enters Review
- Maximum 3 automated revision cycles; after that, manual creation is required
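The revision cap and the prompt adjustment could look like this. These are hypothetical helpers: how the worker counts completed revisions (for example, from the ERPNext order comments) is not specified on this page, and the REVIEWER FEEDBACK wording is illustrative.

```python
from typing import List

MAX_AUTOMATED_REVISIONS = 3

def next_step_after_revision_request(revisions_done: int) -> str:
    """Decide what happens when a reviewer asks for changes.

    revisions_done counts automated re-generations already completed for this order.
    """
    if revisions_done < 0:
        raise ValueError("revisions_done cannot be negative")
    if revisions_done < MAX_AUTOMATED_REVISIONS:
        return "Processing"  # worker re-generates, then the document re-enters Review
    return "Manual creation required"

def revision_prompt(base_prompt: str, reviewer_notes: List[str]) -> str:
    """Append the reviewer's notes to the original prompt for the next attempt."""
    if not reviewer_notes:
        return base_prompt
    bullets = "\n".join(f"- {note}" for note in reviewer_notes)
    return f"{base_prompt}\n\nREVIEWER FEEDBACK (address every point):\n{bullets}"
```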
File Reference
scripts/
├── document_gen/
│   ├── __init__.py
│   ├── docx_builder.py    # DOCX template filling (Jinja2 + python-docx)
│   ├── llm_writer.py      # Ollama prompt construction and parsing
│   ├── minio_client.py    # MinIO upload/download/presign
│   └── pdf_converter.py   # DOCX→PDF: Word DocServer via MinIO, LibreOffice fallback
├── templates/
│   ├── create_templates.py            # Generates all .docx templates (run once)
│   ├── crtc-registration-letter.docx  # CRTC carrier registration letter template
│   ├── bc-corporate-binder.docx       # BC corporate binder (9 sections)
│   ├── vendor-directory.docx          # Canadian telecom vendor directory
│   └── *.docx                         # Other generated template files
└── workers/
    ├── base_worker.py       # ERPNext polling loop, status transitions
    ├── erpnext_client.py    # ERPNext REST API client
    ├── delivery_worker.py   # Email delivery with SMTP
    ├── renewal_worker.py    # Subscription renewal reminders
    └── services/
        ├── base_handler.py       # Base class for service handlers
        ├── privacy_policy.py     # Template-based: fill and convert
        ├── breach_response.py    # LLM: breach response plan
        ├── flsa_audit.py         # LLM: FLSA audit report
        ├── ccpa_audit.py         # LLM: CCPA audit report
        ├── consent_audit.py      # LLM: TCPA consent audit
        ├── contractor_review.py  # LLM: contractor classification
        ├── handbook_review.py    # LLM: handbook review
        ├── campaign_review.py    # LLM: marketing campaign review
        └── dnc_review.py         # LLM: DNC compliance review