Performance West — Document Generation System

Last updated: 2026-03-27

Overview

The document generation system produces professional compliance documents for customers. It supports two generation modes:

  1. Template-based — DOCX templates with Jinja2 placeholders, filled with order data
  2. LLM-based — Templates provide structure; Ollama generates analysis sections

All generated documents pass through a quality gate (admin review) before delivery.

Architecture

                    ┌─────────────┐
                    │   ERPNext   │  (order data + intake forms)
                    └──────┬──────┘
                           │
                    ┌──────┴──────┐
                    │   Worker    │  (Python — polls for Queued orders)
                    └──────┬──────┘
                           │
              ┌────────────┼────────────┐
              │                         │
     ┌────────┴────────┐     ┌─────────┴─────────┐
     │  Template-based │     │    LLM-based      │
     │  (DocxBuilder)  │     │  (DocxBuilder +   │
     │                 │     │   Ollama/LLM)     │
     └────────┬────────┘     └─────────┬─────────┘
              │                         │
              └────────────┬────────────┘
                           │
                    ┌──────┴──────┐
                    │ PDF Convert │
                    │ ┌─────────┐ │
                    │ │DocServer│ │  ← PRIMARY (Windows, MS Word COM, :5050)
                    │ │ :5050   │ │
                    │ └────┬────┘ │
                    │      │ fail │
                    │ ┌────┴────┐ │
                    │ │LibreOfc │ │  ← FALLBACK (headless, in Docker)
                    │ └─────────┘ │
                    └──────┬──────┘
                           │
                    ┌──────┴──────┐
                    │    MinIO    │  (upload DOCX + PDF)
                    └──────┬──────┘
                           │
                    ┌──────┴──────┐
                    │   ERPNext   │  (update status → Review)
                    └─────────────┘
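
The branch in the middle of the diagram can be pictured as a lookup from service to generation mode. The sketch below is only an illustration: the service slugs mirror the handler file names listed under File Reference, while `GenerationMode`, `MODE_BY_SERVICE`, and `pick_mode` are hypothetical names, not the worker's actual code.

```python
from enum import Enum


class GenerationMode(Enum):
    TEMPLATE = "template"
    LLM = "llm"


# Hypothetical routing table; slugs follow scripts/workers/services/*.py.
MODE_BY_SERVICE = {
    "privacy_policy": GenerationMode.TEMPLATE,
    "breach_response": GenerationMode.LLM,
    "flsa_audit": GenerationMode.LLM,
    "ccpa_audit": GenerationMode.LLM,
    "consent_audit": GenerationMode.LLM,
    "contractor_review": GenerationMode.LLM,
    "handbook_review": GenerationMode.LLM,
    "campaign_review": GenerationMode.LLM,
    "dnc_review": GenerationMode.LLM,
}


def pick_mode(service: str) -> GenerationMode:
    """Return the generation mode for a service, failing loudly if unknown."""
    try:
        return MODE_BY_SERVICE[service]
    except KeyError:
        raise ValueError(f"no generation mode registered for {service!r}") from None
```

Failing on an unknown service, rather than defaulting to one mode, keeps a new handler from silently skipping the LLM path or the template path.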

Template-Based Generation

When Used

  • Operating agreements (formation orders)
  • Privacy policies
  • Invoices
  • CRTC registration letter (Canada CRTC Carrier Package)
  • BC corporate binder (9 sections — cover page, incorporation certificate placeholder, articles of incorporation, registered office, directors/officers, share structure, CRTC registration, vendor directory, compliance calendar)
  • Vendor directory PDF (Canadian telecom vendors and contacts)
  • Any document where the content is deterministic (no analysis needed)

How It Works

  1. Worker fetches the .docx template from MinIO (templates/{template-name}.docx)
  2. DocxBuilder loads the template via python-docx
  3. Variables from the ERPNext order are substituted into Jinja2 placeholders
  4. The filled document is saved as DOCX
  5. The DOCX is converted to PDF (DocServer first, LibreOffice as fallback; see PDF Conversion)
  6. Both files are uploaded to MinIO
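
Steps 1 and 6 both depend on predictable object keys. As a minimal sketch, the helpers below build them from the `templates/{template-name}.docx` pattern in step 1 and the `orders/{order-id}/...` pattern used in the MinIO examples later in this document. The function names are hypothetical, and storing the DOCX next to the PDF under the same order prefix is an assumption.

```python
def template_key(template_name: str) -> str:
    """Object key of a template, following templates/{template-name}.docx."""
    return f"templates/{template_name}.docx"


def output_keys(order_id: str, document_name: str) -> dict[str, str]:
    """Object keys for the generated DOCX and PDF of one order (assumed layout)."""
    prefix = f"orders/{order_id}/{document_name}"
    return {"docx": f"{prefix}.docx", "pdf": f"{prefix}.pdf"}
```

For example, `output_keys("FO-2026-0001", "operating-agreement")["pdf"]` yields `orders/FO-2026-0001/operating-agreement.pdf`.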

DOCX Template Format

Templates are standard .docx files with Jinja2 syntax embedded in the text:

Simple variables:

This Operating Agreement of {{ entity_name }}, a limited liability company
organized under the laws of {{ state_name }}...

Conditionals:

{% if management_type == 'manager' %}
The Manager(s) of the Company shall be {{ managers }}.
{% else %}
All Members shall have the authority to manage the business.
{% endif %}

Loops (for tables or repeated sections):

{% for member in members %}
{{ member.name }} — {{ member.ownership_pct }}% ownership
{% endfor %}

Section placeholders (for LLM-generated content):

{{ executive_summary }}
{{ classification_analysis }}
{{ remediation_plan }}
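
Before filling a template it can help to check that the order data covers every placeholder. The following is a rough sketch using only the standard library; it is not how DocxBuilder works internally. It treats the target of a `{% for %}` tag as a loop variable and the iterable as the required name, and it does not detect names that appear only inside `{% if %}` tests. `required_names` and `missing_variables` are hypothetical helpers.

```python
import re

# First identifier inside {{ ... }}; attributes such as member.name are cut at the dot.
VARIABLE = re.compile(r"\{\{\s*([A-Za-z_]\w*)")
# {% for member in members %} -> ("member", "members")
FOR_LOOP = re.compile(r"\{%\s*for\s+([A-Za-z_]\w*)\s+in\s+([A-Za-z_]\w*)")


def required_names(template_text: str) -> set[str]:
    """Top-level names the template expects from the order data."""
    names = set(VARIABLE.findall(template_text))
    for loop_var, iterable in FOR_LOOP.findall(template_text):
        names.discard(loop_var)  # defined by the loop, not by the context
        names.add(iterable)
    return names


def missing_variables(template_text: str, context: dict) -> set[str]:
    """Names the template references that the context does not supply."""
    return required_names(template_text) - set(context)
```

Running `missing_variables` on the template text before rendering turns a silently blank field into an explicit error that can be reported on the order.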

Creating a New Template

  1. Run python scripts/templates/create_templates.py to generate the base templates, or create manually in Word/LibreOffice
  2. Use {{ variable_name }} for all dynamic content
  3. Use Times New Roman for body text, navy blue (#2D4E78) for headings
  4. Include the Performance West header, confidentiality footer, and page numbers
  5. Save as .docx (not .doc)
  6. Upload to MinIO: mc cp template.docx minio/performancewest/templates/

Modifying an Existing Template

  1. Download from MinIO: mc cp minio/performancewest/templates/name.docx .
  2. Edit in Word or LibreOffice — preserve all {{ }} placeholders
  3. Test locally: python -c "from scripts.document_gen.docx_builder import DocxBuilder; ..."
  4. Upload the updated template back to MinIO
  5. Existing generated documents are not affected (they are separate files)
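
One way to confirm that step 2 preserved every placeholder is to compare the old and new files directly. A .docx file is a ZIP archive whose main body lives in `word/document.xml`, so the standard library is enough for a rough check. The helper names below are hypothetical, and the sketch inspects only the document body, not headers or footers.

```python
import re
import zipfile

TAG = re.compile(r"<[^>]+>")
PLACEHOLDER = re.compile(r"\{\{\s*(.*?)\s*\}\}")


def docx_placeholders(docx_path: str) -> set[str]:
    """Return the {{ ... }} expressions found in the body of a .docx file."""
    with zipfile.ZipFile(docx_path) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    # Word may split one placeholder across several runs, so drop the XML
    # tags before searching.
    text = TAG.sub("", xml)
    return set(PLACEHOLDER.findall(text))


def dropped_placeholders(old_path: str, new_path: str) -> set[str]:
    """Placeholders present in the old template but missing from the new one."""
    return docx_placeholders(old_path) - docx_placeholders(new_path)
```

An empty result from `dropped_placeholders("name-old.docx", "name.docx")` suggests the edit is safe to upload; a non-empty set names the placeholders that were lost.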

LLM-Based Generation

When Used

  • FLSA/wage & hour audit reports
  • CCPA/CPRA compliance audit reports
  • TCPA consent audit reports
  • Independent contractor classification assessments
  • Employee handbook reviews
  • Data breach response plans

How It Works

  1. Worker fetches the DOCX template (provides structure and formatting)
  2. Worker constructs a prompt from the service-specific handler + intake data
  3. Worker sends the prompt to Ollama (qwen2.5:7b running locally)
  4. LLM returns analysis text for each section
  5. DocxBuilder.insert_section() replaces section placeholders with LLM output
  6. Simple variables (company name, dates) are filled via DocxBuilder.fill()
  7. Document is converted to PDF and uploaded to MinIO
  8. Status is always set to Review — LLM output must be human-reviewed

Prompt Engineering Guidelines

Each compliance service has a dedicated handler in scripts/workers/services/ that constructs the prompt. Follow these guidelines:

Structure:

You are a compliance consultant preparing a {document_type} for {company_name}.

CONTEXT:
{intake_data formatted as structured text}

INSTRUCTIONS:
- Write in a professional, objective tone
- Cite specific regulations by name and section number
- Identify concrete findings (compliant, non-compliant, needs improvement)
- Provide actionable remediation steps with deadlines
- Do not include legal advice disclaimers (the template adds these)

OUTPUT FORMAT:
Return a JSON object with the following keys:
- executive_summary: 2-3 paragraph overview
- {section_name}: detailed analysis for each section
- remediation_plan: prioritized action items

Write for a business audience. Be specific, not generic.
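
A handler could assemble this structure along the following lines. This is a minimal sketch, not the code in scripts/workers/services/: `format_intake` and `build_prompt` are hypothetical names, and the "Label: value" rendering of intake data is one assumed way to produce structured context.

```python
INSTRUCTIONS = """\
INSTRUCTIONS:
- Write in a professional, objective tone
- Cite specific regulations by name and section number
- Identify concrete findings (compliant, non-compliant, needs improvement)
- Provide actionable remediation steps with deadlines
- Do not include legal advice disclaimers (the template adds these)"""


def format_intake(intake: dict) -> str:
    """Render intake answers as '- Label: value' lines instead of a raw dump."""
    lines = []
    for key, value in intake.items():
        label = key.replace("_", " ").capitalize()
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(item) for item in value)
        lines.append(f"- {label}: {value}")
    return "\n".join(lines)


def build_prompt(document_type: str, company_name: str,
                 intake: dict, sections: list[str]) -> str:
    """Assemble the prompt; `sections` must match the template placeholders."""
    keys = "\n".join(f"- {name}" for name in sections)
    return "\n\n".join([
        f"You are a compliance consultant preparing a {document_type} for {company_name}.",
        "CONTEXT:\n" + format_intake(intake),
        INSTRUCTIONS,
        "OUTPUT FORMAT:\nReturn a JSON object with exactly these keys:\n" + keys,
        "Write for a business audience. Be specific, not generic.",
    ])
```

Passing the section names in as a list keeps the prompt and the template placeholders in one place, so they cannot drift apart.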

Key rules:

  • Always request JSON output — easier to parse and insert into template sections
  • Include the intake data as structured context, not raw form dumps
  • Specify the exact section names that match template placeholders
  • Set temperature to 0.3 for consistency; compliance documents should not be creative
  • Cap output at 4096 tokens per section to prevent rambling
  • If the LLM returns malformed JSON, retry once with a stricter prompt
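
The retry-once rule might look like the sketch below. The `generate` argument is any callable that sends a prompt to the model and returns its raw text reply; `parse_sections`, `generate_sections`, and the wording of `STRICT_SUFFIX` are hypothetical. Treating a missing section key the same way as malformed JSON is an assumption that goes slightly beyond the rule as stated.

```python
import json
from typing import Callable

STRICT_SUFFIX = "\n\nReturn ONLY a valid JSON object. No prose, no markdown fences."


def parse_sections(raw: str, required: list[str]) -> dict:
    """Parse the model reply and check that every required section is present."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in required if name not in data]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    return data


def generate_sections(prompt: str, required: list[str],
                      generate: Callable[[str], str]) -> dict:
    """Call the model, retrying once with a stricter prompt on a bad reply."""
    try:
        return parse_sections(generate(prompt), required)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # A second failure propagates so the order can be flagged.
        return parse_sections(generate(prompt + STRICT_SUFFIX), required)
```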

Model selection:

  • Default: qwen2.5:7b (good balance of quality and speed for 16 GB VRAM)
  • For complex multi-state analysis: qwen2.5:14b if GPU memory allows
  • Configured via OLLAMA_MODEL environment variable
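
For reference, a request to Ollama's `/api/generate` endpoint with these settings could be built as follows. This is a sketch using only the standard library, not the contents of llm_writer.py. `OLLAMA_URL` is an assumed environment variable (the document only names `OLLAMA_MODEL`), and applying the 4096-token cap through `num_predict` on the whole reply is an assumption about how the per-section limit is enforced.

```python
import json
import os
import urllib.request

# Assumed variable name; 11434 is Ollama's default port.
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")


def build_request(prompt: str) -> dict:
    """Payload for POST /api/generate with the settings from the guidelines."""
    return {
        "model": os.environ.get("OLLAMA_MODEL", "qwen2.5:7b"),
        "prompt": prompt,
        "format": "json",   # ask Ollama to constrain the reply to JSON
        "stream": False,    # one complete reply instead of chunks
        "options": {"temperature": 0.3, "num_predict": 4096},
    }


def generate(prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt to Ollama and return the raw text of the reply."""
    body = json.dumps(build_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["response"]
```

Reading the model name at call time rather than at import time means a changed `OLLAMA_MODEL` takes effect without restarting the worker.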

PDF Conversion

DOCX to PDF conversion uses a two-tier approach:

PRIMARY: Windows DocServer (Microsoft Word COM)

A Windows server runs a Flask-based DocServer at :5050 that uses Microsoft Word via COM automation for pixel-perfect DOCX → PDF conversion. This produces the highest-fidelity output (exact font rendering, correct page breaks, proper table formatting).

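# pdf_converter.py: primary path (DocServer)
import requests

with open(docx_path, "rb") as docx_file:  # close the handle once the upload finishes
    response = requests.post(
        f"http://{DOCSERVER_HOST}:5050/convert",
        files={"file": docx_file},
        timeout=60,
    )
response.raise_for_status()  # treat an HTTP error as a failed conversion
pdf_bytes = response.content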

FALLBACK: LibreOffice Headless

If DocServer is unavailable (network error, timeout, Windows server down), the converter falls back to LibreOffice in headless mode:

libreoffice --headless --convert-to pdf --outdir /tmp document.docx
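
From Python, the same command could be wrapped roughly as shown here. The function names and the 120-second timeout are assumptions. The explicit check for the output file is there because a zero exit code alone does not prove that a PDF was written.

```python
import subprocess
from pathlib import Path


def libreoffice_command(docx_path: str, outdir: str) -> list[str]:
    """Argument list equivalent to the shell command above."""
    return ["libreoffice", "--headless", "--convert-to", "pdf",
            "--outdir", outdir, docx_path]


def convert_with_libreoffice(docx_path: str, outdir: str = "/tmp",
                             timeout: float = 120.0) -> Path:
    """Run LibreOffice headless and return the path of the resulting PDF."""
    subprocess.run(
        libreoffice_command(docx_path, outdir),
        check=True,            # raise CalledProcessError on a non-zero exit
        timeout=timeout,       # raise TimeoutExpired if the conversion hangs
        capture_output=True,
    )
    pdf_path = Path(outdir) / (Path(docx_path).stem + ".pdf")
    if not pdf_path.exists():
        raise RuntimeError(f"LibreOffice did not produce {pdf_path}")
    return pdf_path
```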

Converter Logic

The pdf_converter.py module handles:

  • DocServer first — POST to :5050/convert, 60-second timeout
  • Fallback to LibreOffice — if DocServer returns error or times out
  • Retry logic (up to 3 attempts per converter)
  • Temporary file cleanup
  • Error reporting to ERPNext
  • Logs which converter was used for each document
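
The ordering, retry, and logging behaviour in this list can be sketched as a single loop. This is an illustration under stated assumptions, not the actual pdf_converter.py: converters are passed in as `(name, callable)` pairs, each callable takes a DOCX path and returns PDF bytes, and temporary file cleanup and ERPNext error reporting are left out.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("pdf_converter")

Converter = Callable[[str], bytes]


def convert_with_fallback(docx_path: str,
                          converters: Sequence[tuple[str, Converter]],
                          attempts: int = 3) -> tuple[str, bytes]:
    """Try each converter in order, up to `attempts` times each.

    Returns (converter_name, pdf_bytes) for the first success.
    """
    errors = []
    for name, convert in converters:
        for attempt in range(1, attempts + 1):
            try:
                pdf_bytes = convert(docx_path)
            except Exception as exc:  # network error, timeout, bad exit code
                logger.warning("%s attempt %d/%d failed: %s",
                               name, attempt, attempts, exc)
                errors.append(f"{name}#{attempt}: {exc}")
            else:
                logger.info("converted %s with %s", docx_path, name)
                return name, pdf_bytes
    raise RuntimeError("all converters failed: " + "; ".join(errors))
```

Returning the converter name alongside the bytes lets the caller record which path produced each document.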

LibreOffice is installed in the Python worker Docker container (scripts/Dockerfile). The DocServer host is configured via the DOCSERVER_HOST environment variable (default: 192.168.1.x).

MinIO Upload/Download

The minio_client.py module provides:

# Upload a generated document
upload_document(
    local_path="/tmp/operating-agreement.pdf",
    minio_path="orders/FO-2026-0001/operating-agreement.pdf",
    content_type="application/pdf",
)

# Download a template
download_template(
    template_name="operating-agreement",  # downloads operating-agreement.docx
    local_path="/tmp/operating-agreement.docx",
)

# Generate a pre-signed URL for customer download
url = presign_url(
    minio_path="orders/FO-2026-0001/operating-agreement.pdf",
    expires=3600,  # 1 hour
)

Bucket structure: See docs/crm.md for the full MinIO directory layout.

Security: MinIO is not exposed externally. The Express API generates time-limited pre-signed URLs for customer downloads.
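
The three helpers could be thin wrappers over the MinIO Python SDK, along the lines of the sketch below. This is an assumption about the implementation, not a copy of minio_client.py. The `client` argument is expected to be a `minio.Minio` instance and is passed explicitly here so the sketch runs without a live server, whereas the real module presumably holds its own client. The bucket name is taken from the `mc cp` commands earlier in this document.

```python
from datetime import timedelta

BUCKET = "performancewest"  # as in: mc cp template.docx minio/performancewest/templates/


def upload_document(client, local_path: str, minio_path: str, content_type: str) -> None:
    """Upload a local file to the bucket under minio_path."""
    client.fput_object(BUCKET, minio_path, local_path, content_type=content_type)


def download_template(client, template_name: str, local_path: str) -> None:
    """Download templates/{template_name}.docx to local_path."""
    client.fget_object(BUCKET, f"templates/{template_name}.docx", local_path)


def presign_url(client, minio_path: str, expires: int = 3600) -> str:
    """Return a time-limited GET URL; `expires` is in seconds."""
    return client.presigned_get_object(
        BUCKET, minio_path, expires=timedelta(seconds=expires)
    )
```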

Quality Gates

Admin Review

Every generated document enters Review status before delivery:

  1. Admin opens the order in ERPNext
  2. Downloads the DOCX/PDF from the attached MinIO link
  3. Reviews for accuracy, completeness, and professionalism
  4. Actions:
    • Approve — moves to Ready
    • Request Revision — moves to Revision with notes; worker re-generates
    • Reject — flags for manual document creation

Revision Loop

When a reviewer requests changes:

  1. Order status returns to Processing
  2. Reviewer's notes are stored in the ERPNext order comments
  3. Worker re-generates with adjusted prompts or manual edits
  4. Document re-enters Review
  5. Maximum 3 automated revision cycles; after that, manual creation is required
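
The review actions and the three-cycle cap can be summarised as a small transition function. This is a sketch only: `apply_review` and the action strings are hypothetical, the `Ready` and `Revision` statuses follow the Admin Review list, and `Manual` is a placeholder label because the document does not name the status used for manual creation.

```python
MAX_AUTOMATED_REVISIONS = 3


def apply_review(action: str, revisions_done: int) -> tuple[str, int]:
    """Return (next_status, revisions_done) after a reviewer action."""
    if action == "approve":
        return "Ready", revisions_done
    if action == "reject":
        return "Manual", revisions_done  # placeholder label for manual creation
    if action == "request_revision":
        if revisions_done >= MAX_AUTOMATED_REVISIONS:
            # Cap reached: stop re-generating and hand over to a person.
            return "Manual", revisions_done
        return "Revision", revisions_done + 1
    raise ValueError(f"unknown review action: {action!r}")
```

For example, a third revision request moves the order to `Revision` with a count of 3, and a fourth request is routed to manual creation instead.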

File Reference

scripts/
├── document_gen/
│   ├── __init__.py
│   ├── docx_builder.py          # DOCX template filling (Jinja2 + python-docx)
│   ├── llm_writer.py            # Ollama prompt construction and parsing
│   ├── minio_client.py          # MinIO upload/download/presign
│   └── pdf_converter.py         # DOCX→PDF (DocServer primary, LibreOffice fallback)
├── templates/
│   ├── create_templates.py      # Generates all .docx templates (run once)
│   ├── crtc-registration-letter.docx  # CRTC carrier registration letter template
│   ├── bc-corporate-binder.docx       # BC corporate binder (9 sections)
│   ├── vendor-directory.docx          # Canadian telecom vendor directory
│   └── *.docx                   # Other generated template files
└── workers/
    ├── base_worker.py           # ERPNext polling loop, status transitions
    ├── erpnext_client.py        # ERPNext REST API client
    ├── delivery_worker.py       # Email delivery with SMTP
    ├── renewal_worker.py        # Subscription renewal reminders
    └── services/
        ├── base_handler.py      # Base class for service handlers
        ├── privacy_policy.py    # Template-based: fill and convert
        ├── breach_response.py   # LLM: breach response plan
        ├── flsa_audit.py        # LLM: FLSA audit report
        ├── ccpa_audit.py        # LLM: CCPA audit report
        ├── consent_audit.py     # LLM: TCPA consent audit
        ├── contractor_review.py # LLM: contractor classification
        ├── handbook_review.py   # LLM: handbook review
        ├── campaign_review.py   # LLM: marketing campaign review
        └── dnc_review.py        # LLM: DNC compliance review