# Performance West: Document Generation System

**Last updated:** 2026-03-27

## Overview

The document generation system produces professional compliance documents for customers. It supports two generation modes:

1. **Template-based**: DOCX templates with Jinja2 placeholders, filled with order data
2. **LLM-based**: templates provide structure; Ollama generates the analysis sections

All generated documents pass through a quality gate (admin review) before delivery.

## Architecture

```
               ┌─────────────┐
               │   ERPNext   │  (order data + intake forms)
               └──────┬──────┘
                      │
               ┌──────┴──────┐
               │   Worker    │  (Python, polls for Queued orders)
               └──────┬──────┘
                      │
         ┌────────────┼────────────┐
         │                         │
┌────────┴────────┐      ┌─────────┴─────────┐
│ Template-based  │      │     LLM-based     │
│  (DocxBuilder)  │      │  (DocxBuilder +   │
│                 │      │    Ollama/LLM)    │
└────────┬────────┘      └─────────┬─────────┘
         │                         │
         └────────────┬────────────┘
                      │
               ┌──────┴──────┐
               │ PDF Convert │
               │ ┌─────────┐ │
               │ │DocServer│ │  ← PRIMARY (Windows, MS Word COM, :5050)
               │ │  :5050  │ │
               │ └────┬────┘ │
               │      │ fail │
               │ ┌────┴────┐ │
               │ │LibreOfc │ │  ← FALLBACK (headless, in Docker)
               │ └─────────┘ │
               └──────┬──────┘
                      │
               ┌──────┴──────┐
               │    MinIO    │  (upload DOCX + PDF)
               └──────┬──────┘
                      │
               ┌──────┴──────┐
               │   ERPNext   │  (update status → Review)
               └─────────────┘
```

## Template-Based Generation

### When Used

- Operating agreements (formation orders)
- Privacy policies
- Invoices
- CRTC registration letter (Canada CRTC Carrier Package)
- BC corporate binder (9 sections: cover page, incorporation certificate placeholder, articles of incorporation, registered office, directors/officers, share structure, CRTC registration, vendor directory, compliance calendar)
- Vendor directory PDF (Canadian telecom vendors and contacts)
- Any document where the content is deterministic (no analysis needed)

### How It Works

1. Worker fetches the `.docx` template from MinIO (`templates/{template-name}.docx`)
2. `DocxBuilder` loads the template via `python-docx`
3. Variables from the ERPNext order are substituted into Jinja2 placeholders
4. The filled document is saved as DOCX
5. The DOCX is converted to PDF (DocServer first, LibreOffice as fallback; see PDF Conversion)
6. Both files are uploaded to MinIO

### DOCX Template Format

Templates are standard `.docx` files with Jinja2 syntax embedded in the text:

**Simple variables:**

```
This Operating Agreement of {{ entity_name }}, a limited liability company
organized under the laws of {{ state_name }}...
```

**Conditionals:**

```
{% if management_type == 'manager' %}
The Manager(s) of the Company shall be {{ managers }}.
{% else %}
All Members shall have the authority to manage the business.
{% endif %}
```

**Loops (for tables or repeated sections):**

```
{% for member in members %}
{{ member.name }}: {{ member.ownership_pct }}% ownership
{% endfor %}
```

**Section placeholders (for LLM-generated content):**

```
{{ executive_summary }}
{{ classification_analysis }}
{{ remediation_plan }}
```

### Creating a New Template

1. Run `python scripts/templates/create_templates.py` to generate the base templates, or create one manually in Word/LibreOffice
2. Use `{{ variable_name }}` for all dynamic content
3. Use Times New Roman for body text, navy blue (`#2D4E78`) for headings
4. Include the Performance West header, confidentiality footer, and page numbers
5. Save as `.docx` (not `.doc`)
6. Upload to MinIO: `mc cp template.docx minio/performancewest/templates/`

### Modifying an Existing Template

1. Download from MinIO: `mc cp minio/performancewest/templates/name.docx .`
2. Edit in Word or LibreOffice, preserving all `{{ }}` placeholders
3. Test locally: `python -c "from scripts.document_gen.docx_builder import DocxBuilder; ..."`
4. Upload the updated template back to MinIO
5. Existing generated documents are not affected (they are separate files)

## LLM-Based Generation

### When Used

- FLSA/wage & hour audit reports
- CCPA/CPRA compliance audit reports
- TCPA consent audit reports
- Independent contractor classification assessments
- Employee handbook reviews
- Data breach response plans

### How It Works

1. Worker fetches the DOCX template (provides structure and formatting)
2. Worker constructs a prompt from the service-specific handler + intake data
3. Worker sends the prompt to Ollama (`qwen2.5:7b` running locally)
4. LLM returns analysis text for each section
5. `DocxBuilder.insert_section()` replaces section placeholders with LLM output
6. Simple variables (company name, dates) are filled via `DocxBuilder.fill()`
7. Document is converted to PDF and uploaded to MinIO
8. Status is always set to **Review**: LLM output must be human-reviewed

### Prompt Engineering Guidelines

Each compliance service has a dedicated handler in `scripts/workers/services/` that constructs the prompt. Follow these guidelines:

**Structure:**

```
You are a compliance consultant preparing a {document_type} for {company_name}.

CONTEXT:
{intake_data formatted as structured text}

INSTRUCTIONS:
- Write in a professional, objective tone
- Cite specific regulations by name and section number
- Identify concrete findings (compliant, non-compliant, needs improvement)
- Provide actionable remediation steps with deadlines
- Do not include legal advice disclaimers (the template adds these)

OUTPUT FORMAT:
Return a JSON object with the following keys:
- executive_summary: 2-3 paragraph overview
- {section_name}: detailed analysis for each section
- remediation_plan: prioritized action items

Write for a business audience. Be specific, not generic.
```

**Key rules:**

- Always request JSON output; it is easier to parse and insert into template sections
- Include the intake data as structured context, not raw form dumps
- Specify the exact section names that match template placeholders
- Set temperature to 0.3 for consistency; compliance documents should not be creative
- Maximum token limit: 4096 per section to prevent rambling
- If the LLM returns malformed JSON, retry once with a stricter prompt

**Model selection:**

- Default: `qwen2.5:7b` (good balance of quality and speed for 16 GB VRAM)
- For complex multi-state analysis: `qwen2.5:14b` if GPU memory allows
- Configured via the `OLLAMA_MODEL` environment variable

## PDF Conversion

DOCX to PDF conversion uses a two-tier approach:

### PRIMARY: Windows DocServer (Microsoft Word COM)

A Windows server runs a Flask-based DocServer at `:5050` that uses Microsoft Word via COM automation for pixel-perfect DOCX → PDF conversion. This produces the highest-fidelity output (exact font rendering, correct page breaks, proper table formatting).

```python
# pdf_converter.py: primary path
import requests

# Open the DOCX in a context manager so the file handle is always closed
with open(docx_path, "rb") as docx_file:
    response = requests.post(
        f"http://{DOCSERVER_HOST}:5050/convert",
        files={"file": docx_file},
        timeout=60,
    )
response.raise_for_status()  # treat a non-2xx response as a DocServer failure
pdf_bytes = response.content
```

### FALLBACK: LibreOffice Headless

If DocServer is unavailable (network error, timeout, Windows server down), the converter falls back to LibreOffice in headless mode:

```bash
libreoffice --headless --convert-to pdf --outdir /tmp document.docx
```

### Converter Logic

The `pdf_converter.py` module handles:

- **DocServer first**: POST to `:5050/convert`, 60-second timeout
- **Fallback to LibreOffice**: if DocServer returns an error or times out
- Retry logic (up to 3 attempts per converter)
- Temporary file cleanup
- Error reporting to ERPNext
- Logging of which converter was used for each document

LibreOffice is installed in the Python worker Docker container (`scripts/Dockerfile`).
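
The converter logic listed above can be sketched as an ordered fallback chain with per-converter retries. This is a minimal illustration under stated assumptions, not the contents of `pdf_converter.py`: `convert_with_fallback`, the `Converter` type, and the two stub converters are hypothetical names, and temporary file cleanup and ERPNext error reporting are left out.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("pdf_converter_sketch")

# Hypothetical converter signature: takes a DOCX path, returns PDF bytes,
# and raises an exception on any failure.
Converter = Callable[[str], bytes]


def convert_with_fallback(
    docx_path: str,
    converters: Sequence[tuple[str, Converter]],
    attempts_per_converter: int = 3,
) -> tuple[str, bytes]:
    """Try each converter in order; return (converter_name, pdf_bytes)."""
    errors: list[str] = []
    for name, convert in converters:
        for attempt in range(1, attempts_per_converter + 1):
            try:
                pdf_bytes = convert(docx_path)
            except Exception as exc:  # network error, timeout, bad exit code, ...
                errors.append(f"{name} attempt {attempt}: {exc}")
                logger.warning("%s failed (attempt %d): %s", name, attempt, exc)
                continue
            # Record which converter produced the PDF for this document.
            logger.info("converted %s with %s", docx_path, name)
            return name, pdf_bytes
    raise RuntimeError("all converters failed: " + "; ".join(errors))


# Usage with stand-in converters (assumptions, no network or LibreOffice needed).
def docserver_stub(path: str) -> bytes:
    raise ConnectionError("DocServer unreachable")


def libreoffice_stub(path: str) -> bytes:
    return b"%PDF-stub"


used, pdf = convert_with_fallback(
    "operating-agreement.docx",
    [("docserver", docserver_stub), ("libreoffice", libreoffice_stub)],
)
```

Passing the converters in as callables keeps the ordering (DocServer first, LibreOffice second) in one place and lets the fallback path be exercised without a Windows host.
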
The DocServer host is configured via the `DOCSERVER_HOST` environment variable (default: `192.168.1.x`).

## MinIO Upload/Download

The `minio_client.py` module provides:

```python
from scripts.document_gen.minio_client import (
    download_template,
    presign_url,
    upload_document,
)

# Upload a generated document
upload_document(
    local_path="/tmp/operating-agreement.pdf",
    minio_path="orders/FO-2026-0001/operating-agreement.pdf",
    content_type="application/pdf",
)

# Download a template
download_template(
    template_name="operating-agreement",  # downloads operating-agreement.docx
    local_path="/tmp/operating-agreement.docx",
)

# Generate a pre-signed URL for customer download
url = presign_url(
    minio_path="orders/FO-2026-0001/operating-agreement.pdf",
    expires=3600,  # 1 hour
)
```

**Bucket structure:** See `docs/crm.md` for the full MinIO directory layout.

**Security:** MinIO is not exposed externally. The Express API generates time-limited pre-signed URLs for customer downloads.

## Quality Gates

### Admin Review

Every generated document enters **Review** status before delivery:

1. Admin opens the order in ERPNext
2. Downloads the DOCX/PDF from the attached MinIO link
3. Reviews for accuracy, completeness, and professionalism
4. Actions:
   - **Approve**: moves to Ready
   - **Request Revision**: moves to Revision with notes; worker re-generates
   - **Reject**: flags for manual document creation

### Revision Loop

When a reviewer requests changes:

1. Order status returns to **Processing**
2. Reviewer's notes are stored in the ERPNext order comments
3. Worker re-generates with adjusted prompts or manual edits
4. Document re-enters **Review**
5. Maximum 3 automated revision cycles; after that, manual creation is required

## File Reference

```
scripts/
├── document_gen/
│   ├── __init__.py
│   ├── docx_builder.py       # DOCX template filling (Jinja2 + python-docx)
│   ├── llm_writer.py         # Ollama prompt construction and parsing
│   ├── minio_client.py       # MinIO upload/download/presign
│   └── pdf_converter.py      # DOCX→PDF (DocServer primary, LibreOffice fallback)
├── templates/
│   ├── create_templates.py            # Generates all .docx templates (run once)
│   ├── crtc-registration-letter.docx  # CRTC carrier registration letter template
│   ├── bc-corporate-binder.docx       # BC corporate binder (9 sections)
│   ├── vendor-directory.docx          # Canadian telecom vendor directory
│   └── *.docx                         # Other generated template files
└── workers/
    ├── base_worker.py        # ERPNext polling loop, status transitions
    ├── erpnext_client.py     # ERPNext REST API client
    ├── delivery_worker.py    # Email delivery with SMTP
    ├── renewal_worker.py     # Subscription renewal reminders
    └── services/
        ├── base_handler.py       # Base class for service handlers
        ├── privacy_policy.py     # Template-based: fill and convert
        ├── breach_response.py    # LLM: breach response plan
        ├── flsa_audit.py         # LLM: FLSA audit report
        ├── ccpa_audit.py         # LLM: CCPA audit report
        ├── consent_audit.py      # LLM: TCPA consent audit
        ├── contractor_review.py  # LLM: contractor classification
        ├── handbook_review.py    # LLM: handbook review
        ├── campaign_review.py    # LLM: marketing campaign review
        └── dnc_review.py         # LLM: DNC compliance review
```
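
`llm_writer.py` is listed above as handling Ollama prompt construction and parsing. The parsing side, together with the key rule of retrying once with a stricter prompt when the model returns malformed JSON, could be sketched roughly as follows. Everything named here is an assumption for illustration: `call_llm` stands in for whatever function sends a prompt to Ollama and returns the raw reply, `STRICT_SUFFIX` is an invented wording, and treating a missing section as a parse failure goes slightly beyond the documented rule.

```python
import json
from typing import Callable

# Section names must match the template placeholders; these three appear in
# the section-placeholder example of the template format.
REQUIRED_SECTIONS = ("executive_summary", "classification_analysis", "remediation_plan")

# Hypothetical wording for the stricter second attempt.
STRICT_SUFFIX = (
    "\n\nReturn ONLY a valid JSON object with the required keys. "
    "No prose and no code fences."
)


def parse_sections(raw: str, required: tuple[str, ...]) -> dict[str, str]:
    """Parse the reply and check that every required section is non-empty text."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) if malformed
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [
        key for key in required
        if not isinstance(data.get(key), str) or not data[key].strip()
    ]
    if missing:
        raise ValueError(f"missing or empty sections: {missing}")
    return {key: data[key] for key in required}


def generate_sections(
    prompt: str,
    call_llm: Callable[[str], str],
    required: tuple[str, ...] = REQUIRED_SECTIONS,
) -> dict[str, str]:
    """Call the model; on an unusable reply, retry exactly once with a stricter prompt."""
    try:
        return parse_sections(call_llm(prompt), required)
    except ValueError:
        # A second failure propagates so the order can be flagged for manual handling.
        return parse_sections(call_llm(prompt + STRICT_SUFFIX), required)
```

The returned dictionary is keyed by placeholder name, so each value could be handed to `DocxBuilder.insert_section()` for the matching section.
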