Performance West — Document Generation System
Last updated: 2026-03-27
Overview
The document generation system produces professional compliance documents for customers. It supports two generation modes:
- Template-based — DOCX templates with Jinja2 placeholders, filled with order data
- LLM-based — Templates provide structure; Ollama generates analysis sections
All generated documents pass through a quality gate (admin review) before delivery.
Architecture
               ┌─────────────┐
               │   ERPNext   │  (order data + intake forms)
               └──────┬──────┘
                      │
               ┌──────┴──────┐
               │   Worker    │  (Python, polls for Queued orders)
               └──────┬──────┘
                      │
         ┌────────────┼────────────┐
         │                         │
┌────────┴────────┐      ┌─────────┴─────────┐
│ Template-based  │      │     LLM-based     │
│  (DocxBuilder)  │      │  (DocxBuilder +   │
│                 │      │   Ollama/LLM)     │
└────────┬────────┘      └─────────┬─────────┘
         │                         │
         └────────────┬────────────┘
                      │
               ┌──────┴──────┐
               │ PDF Convert │
               │ ┌─────────┐ │
               │ │DocServer│ │  ← PRIMARY (Windows, MS Word COM, :5050)
               │ │  :5050  │ │
               │ └────┬────┘ │
               │      │ fail │
               │ ┌────┴────┐ │
               │ │LibreOfc │ │  ← FALLBACK (headless, in Docker)
               │ └─────────┘ │
               └──────┬──────┘
                      │
               ┌──────┴──────┐
               │    MinIO    │  (upload DOCX + PDF)
               └──────┬──────┘
                      │
               ┌──────┴──────┐
               │   ERPNext   │  (update status → Review)
               └─────────────┘
Template-Based Generation
When Used
- Operating agreements (formation orders)
- Privacy policies
- Invoices
- CRTC registration letter (Canada CRTC Carrier Package)
- BC corporate binder (9 sections — cover page, incorporation certificate placeholder, articles of incorporation, registered office, directors/officers, share structure, CRTC registration, vendor directory, compliance calendar)
- Vendor directory PDF (Canadian telecom vendors and contacts)
- Any document where the content is deterministic (no analysis needed)
How It Works
- Worker fetches the .docx template from MinIO (templates/{template-name}.docx)
- DocxBuilder loads the template via python-docx
- Variables from the ERPNext order are substituted into Jinja2 placeholders
- The filled document is saved as DOCX
- The DOCX is converted to PDF (DocServer first, LibreOffice as fallback; see PDF Conversion)
- Both files are uploaded to MinIO
DOCX Template Format
Templates are standard .docx files with Jinja2 syntax embedded in the text:
Simple variables:
This Operating Agreement of {{ entity_name }}, a limited liability company
organized under the laws of {{ state_name }}...
Conditionals:
{% if management_type == 'manager' %}
The Manager(s) of the Company shall be {{ managers }}.
{% else %}
All Members shall have the authority to manage the business.
{% endif %}
Loops (for tables or repeated sections):
{% for member in members %}
{{ member.name }} — {{ member.ownership_pct }}% ownership
{% endfor %}
Section placeholders (for LLM-generated content):
{{ executive_summary }}
{{ classification_analysis }}
{{ remediation_plan }}
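Before filling a template, it can help to check that the order data covers every name the template refers to. The sketch below is illustrative only and is not part of DocxBuilder: it scans template text with regular expressions instead of using the Jinja2 parser, and the names required_variables and missing_variables are hypothetical. It handles only the simple forms shown above (plain variables, `{% if name ... %}`, and `{% for x in y %}`); filters, negated conditions, and nested expressions are out of scope.

```python
import re

# Root name of a simple variable expression: {{ entity_name }}, {{ member.name }}.
_VARIABLE = re.compile(r"\{\{\s*([A-Za-z_]\w*)[\w.]*\s*\}\}")
# Loop variable and iterable of a for tag: {% for member in members %}.
_FOR_LOOP = re.compile(r"\{%\s*for\s+([A-Za-z_]\w*)\s+in\s+([A-Za-z_]\w*)")
# First name tested by an if/elif tag: {% if management_type == 'manager' %}.
_CONDITION = re.compile(r"\{%\s*(?:if|elif)\s+([A-Za-z_]\w*)")


def required_variables(template_text: str) -> set:
    """Return the top-level names the template expects from the order data."""
    names = set(_VARIABLE.findall(template_text))
    names.update(_CONDITION.findall(template_text))
    for loop_var, iterable in _FOR_LOOP.findall(template_text):
        names.discard(loop_var)  # defined by the loop itself, not by the order
        names.add(iterable)
    return names


def missing_variables(template_text: str, order_data: dict) -> list:
    """Return the required names that the order data does not provide."""
    return sorted(required_variables(template_text) - set(order_data))
```

A worker could call missing_variables() before filling the template and fail the order early with a clear message instead of producing a document with empty fields.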
Creating a New Template
- Run python scripts/templates/create_templates.py to generate the base templates, or create manually in Word/LibreOffice
- Use {{ variable_name }} for all dynamic content
- Use Times New Roman for body text, navy blue (#2D4E78) for headings
- Include the Performance West header, confidentiality footer, and page numbers
- Save as .docx (not .doc)
- Upload to MinIO: mc cp template.docx minio/performancewest/templates/
Modifying an Existing Template
- Download from MinIO: mc cp minio/performancewest/templates/name.docx .
- Edit in Word or LibreOffice, preserving all {{ }} placeholders
- Test locally: python -c "from scripts.document_gen.docx_builder import DocxBuilder; ..."
- Upload the updated template back to MinIO
- Existing generated documents are not affected (they are separate files)
LLM-Based Generation
When Used
- FLSA/wage & hour audit reports
- CCPA/CPRA compliance audit reports
- TCPA consent audit reports
- Independent contractor classification assessments
- Employee handbook reviews
- Data breach response plans
How It Works
- Worker fetches the DOCX template (provides structure and formatting)
- Worker constructs a prompt from the service-specific handler + intake data
- Worker sends the prompt to Ollama (qwen2.5:7b running locally)
- LLM returns analysis text for each section
- DocxBuilder.insert_section() replaces section placeholders with LLM output
- Simple variables (company name, dates) are filled via DocxBuilder.fill()
- Document is converted to PDF and uploaded to MinIO
- Status is always set to Review — LLM output must be human-reviewed
Prompt Engineering Guidelines
Each compliance service has a dedicated handler in scripts/workers/services/ that constructs the prompt. Follow these guidelines:
Structure:
You are a compliance consultant preparing a {document_type} for {company_name}.
CONTEXT:
{intake_data formatted as structured text}
INSTRUCTIONS:
- Write in a professional, objective tone
- Cite specific regulations by name and section number
- Identify concrete findings (compliant, non-compliant, needs improvement)
- Provide actionable remediation steps with deadlines
- Do not include legal advice disclaimers (the template adds these)
OUTPUT FORMAT:
Return a JSON object with the following keys:
- executive_summary: 2-3 paragraph overview
- {section_name}: detailed analysis for each section
- remediation_plan: prioritized action items
Write for a business audience. Be specific, not generic.
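A handler might assemble this prompt roughly as follows. This is a minimal sketch, not the code in scripts/workers/services/: the function names format_intake and build_prompt are hypothetical, the instruction list is abbreviated, and the intake fields in the usage note are invented for illustration.

```python
def format_intake(intake: dict) -> str:
    """Render intake answers as labelled lines instead of a raw form dump."""
    lines = []
    for field, value in intake.items():
        label = field.replace("_", " ").capitalize()
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(item) for item in value)
        lines.append(f"{label}: {value}")
    return "\n".join(lines)


def build_prompt(document_type: str, company_name: str,
                 intake: dict, section_names: list) -> str:
    """Build the prompt; section_names must match the template placeholders."""
    keys = ["executive_summary", *section_names, "remediation_plan"]
    key_lines = "\n".join(f"- {key}" for key in keys)
    return (
        f"You are a compliance consultant preparing a {document_type} "
        f"for {company_name}.\n"
        f"CONTEXT:\n{format_intake(intake)}\n"
        "INSTRUCTIONS:\n"
        "- Write in a professional, objective tone\n"
        "- Cite specific regulations by name and section number\n"
        "OUTPUT FORMAT:\n"
        f"Return a JSON object with the following keys:\n{key_lines}\n"
    )
```

For example, build_prompt("FLSA audit report", "Acme LLC", {"employee_count": 42, "states": ["CA", "NV"]}, ["classification_analysis"]) yields a prompt whose CONTEXT block reads "Employee count: 42" and "States: CA, NV".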
Key rules:
- Always request JSON output — easier to parse and insert into template sections
- Include the intake data as structured context, not raw form dumps
- Specify the exact section names that match template placeholders
- Set temperature to 0.3 for consistency; compliance documents should not be creative
- Maximum token limit: 4096 per section to prevent rambling
- If the LLM returns malformed JSON, retry once with a stricter prompt
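The retry rule can be sketched as below. This is an illustration under assumptions, not the logic in llm_writer.py: generate stands for any callable that sends a prompt to the model and returns its raw text, and the names generate_sections and STRICT_SUFFIX, as well as the wording of the stricter prompt, are hypothetical.

```python
import json
from typing import Callable, Sequence

# Hypothetical wording for the stricter second attempt.
STRICT_SUFFIX = "\n\nReturn ONLY a valid JSON object. No prose, no code fences."


def generate_sections(prompt: str, generate: Callable[[str], str],
                      required_keys: Sequence[str]) -> dict:
    """Ask for JSON; on a malformed or incomplete reply, retry exactly once."""
    last_error = None
    for attempt_prompt in (prompt, prompt + STRICT_SUFFIX):
        raw = generate(attempt_prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
            continue
        if isinstance(data, dict) and all(key in data for key in required_keys):
            return data
        last_error = ValueError("response is missing required section keys")
    raise ValueError(f"LLM did not return usable JSON: {last_error}")
```

Checking the required keys as well as the JSON syntax catches the case where the model returns valid JSON that omits a section the template expects.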
Model selection:
- Default: qwen2.5:7b (good balance of quality and speed for 16 GB VRAM)
- For complex multi-state analysis: qwen2.5:14b if GPU memory allows
- Configured via OLLAMA_MODEL environment variable
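As a rough sketch, the settings above could map onto the request body for Ollama's /api/generate endpoint like this. The helper name build_generate_payload is hypothetical, and applying the 4096-token limit as num_predict on a single request is an assumption; the worker may enforce the per-section limit differently.

```python
import os
from typing import Optional


def build_generate_payload(prompt: str, model: Optional[str] = None) -> dict:
    """Build a JSON body for POST /api/generate on a local Ollama server."""
    return {
        # OLLAMA_MODEL overrides the default model described above.
        "model": model or os.environ.get("OLLAMA_MODEL", "qwen2.5:7b"),
        "prompt": prompt,
        "stream": False,       # return one complete response object
        "format": "json",      # ask Ollama to constrain output to valid JSON
        "options": {
            "temperature": 0.3,   # low temperature for consistent wording
            "num_predict": 4096,  # assumed mapping of the token limit
        },
    }
```

Setting OLLAMA_MODEL=qwen2.5:14b in the worker environment switches models without a code change.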
PDF Conversion
DOCX to PDF conversion uses a two-tier approach:
PRIMARY: Windows DocServer (Microsoft Word COM)
A Windows server runs a Flask-based DocServer at :5050 that uses Microsoft Word via COM
automation for pixel-perfect DOCX → PDF conversion. This produces the highest-fidelity
output (exact font rendering, correct page breaks, proper table formatting).
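# pdf_converter.py: primary path
import requests

with open(docx_path, "rb") as docx_file:  # close the file handle after the upload
    response = requests.post(
        f"http://{DOCSERVER_HOST}:5050/convert",
        files={"file": docx_file},
        timeout=60,
    )
response.raise_for_status()  # treat a non-2xx response as a conversion failure
pdf_bytes = response.content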
FALLBACK: LibreOffice Headless
If DocServer is unavailable (network error, timeout, Windows server down), the converter falls back to LibreOffice in headless mode:
libreoffice --headless --convert-to pdf --outdir /tmp document.docx
Converter Logic
The pdf_converter.py module handles:
- DocServer first: POST to :5050/convert, 60-second timeout
- Fallback to LibreOffice if DocServer returns an error or times out
- Retry logic (up to 3 attempts per converter)
- Temporary file cleanup
- Error reporting to ERPNext
- Logs which converter was used for each document
LibreOffice is installed in the Python worker Docker container (scripts/Dockerfile).
DocServer host is configured via DOCSERVER_HOST environment variable (default: 192.168.1.x).
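The ordering, retry, and logging behavior described above could look roughly like this. It is a sketch under assumptions, not the contents of pdf_converter.py: each converter is passed in as a plain callable that takes a DOCX path and returns PDF bytes, and the name convert_with_fallback is hypothetical. Temporary file cleanup and ERPNext error reporting are left out.

```python
import logging
from typing import Callable, Sequence, Tuple

logger = logging.getLogger("pdf_converter")

# A converter takes a DOCX path and returns PDF bytes, or raises on failure.
Converter = Callable[[str], bytes]


def convert_with_fallback(docx_path: str,
                          converters: Sequence[Tuple[str, Converter]],
                          attempts: int = 3) -> Tuple[str, bytes]:
    """Try each converter in order, up to `attempts` times each."""
    errors = []
    for name, convert in converters:
        for attempt in range(1, attempts + 1):
            try:
                pdf_bytes = convert(docx_path)
            except Exception as exc:  # network error, timeout, non-zero exit
                errors.append(f"{name} attempt {attempt}: {exc}")
                logger.warning("%s failed (attempt %d): %s", name, attempt, exc)
                continue
            logger.info("Converted %s using %s", docx_path, name)
            return name, pdf_bytes
    raise RuntimeError("All converters failed: " + "; ".join(errors))
```

Passing [("docserver", docserver_convert), ("libreoffice", libreoffice_convert)] keeps the primary and fallback order explicit, and the returned name records which converter produced the PDF.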
MinIO Upload/Download
The minio_client.py module provides:
# Upload a generated document
upload_document(
    local_path="/tmp/operating-agreement.pdf",
    minio_path="orders/FO-2026-0001/operating-agreement.pdf",
    content_type="application/pdf",
)
# Download a template
download_template(
    template_name="operating-agreement",  # downloads operating-agreement.docx
    local_path="/tmp/operating-agreement.docx",
)
# Generate a pre-signed URL for customer download
url = presign_url(
    minio_path="orders/FO-2026-0001/operating-agreement.pdf",
    expires=3600,  # 1 hour
)
Bucket structure: See docs/crm.md for the full MinIO directory layout.
Security: MinIO is not exposed externally. The Express API generates time-limited pre-signed URLs for customer downloads.
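The object paths in the examples above follow an orders/{order ID}/{filename} pattern. A small helper, sketched here with hypothetical names (order_object_key, content_type_for) that are not part of minio_client.py, could build those keys and pick the content type from the file extension.

```python
from pathlib import PurePosixPath

# Content types for the two file kinds the worker uploads.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def order_object_key(order_id: str, filename: str) -> str:
    """Build the MinIO object key for a generated document."""
    name = PurePosixPath(filename).name  # drop any local directory part
    return f"orders/{order_id}/{name}"


def content_type_for(filename: str) -> str:
    """Return the content type, raising KeyError for unsupported extensions."""
    return CONTENT_TYPES[PurePosixPath(filename).suffix.lower()]
```

For instance, order_object_key("FO-2026-0001", "/tmp/operating-agreement.pdf") gives "orders/FO-2026-0001/operating-agreement.pdf", matching the upload example.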
Quality Gates
Admin Review
Every generated document enters Review status before delivery:
- Admin opens the order in ERPNext
- Downloads the DOCX/PDF from the attached MinIO link
- Reviews for accuracy, completeness, and professionalism
- Actions:
- Approve — moves to Ready
- Request Revision — moves to Revision with notes; worker re-generates
- Reject — flags for manual document creation
Revision Loop
When a reviewer requests changes:
- Order status returns to Processing
- Reviewer's notes are stored in the ERPNext order comments
- Worker re-generates with adjusted prompts or manual edits
- Document re-enters Review
- Maximum 3 automated revision cycles; after that, manual creation is required
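The review outcomes and the revision cap can be summarized as a small decision function. This is a sketch only: the action names and the "Manual" status label are hypothetical (the text above says rejected orders are flagged for manual creation but does not name a status), and it assumes the worker later moves a Revision order back to Processing.

```python
MAX_AUTOMATED_REVISIONS = 3


def review_decision(action: str, completed_revisions: int) -> str:
    """Map a reviewer action on an order in Review to its next status."""
    if action == "approve":
        return "Ready"
    if action == "reject":
        return "Manual"  # hypothetical label for manual document creation
    if action == "request_revision":
        if completed_revisions >= MAX_AUTOMATED_REVISIONS:
            return "Manual"  # cap reached, stop automated re-generation
        return "Revision"
    raise ValueError(f"unknown review action: {action!r}")
```

A third revision request (completed_revisions == 2) still returns "Revision", while a fourth is routed to manual creation.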
File Reference
scripts/
├── document_gen/
│ ├── __init__.py
│ ├── docx_builder.py # DOCX template filling (Jinja2 + python-docx)
│ ├── llm_writer.py # Ollama prompt construction and parsing
│ ├── minio_client.py # MinIO upload/download/presign
│ └── pdf_converter.py # DOCX→PDF (DocServer primary, LibreOffice headless fallback)
├── templates/
│ ├── create_templates.py # Generates all .docx templates (run once)
│ ├── crtc-registration-letter.docx # CRTC carrier registration letter template
│ ├── bc-corporate-binder.docx # BC corporate binder (9 sections)
│ ├── vendor-directory.docx # Canadian telecom vendor directory
│ └── *.docx # Other generated template files
└── workers/
    ├── base_worker.py # ERPNext polling loop, status transitions
    ├── erpnext_client.py # ERPNext REST API client
    ├── delivery_worker.py # Email delivery with SMTP
    ├── renewal_worker.py # Subscription renewal reminders
    └── services/
        ├── base_handler.py # Base class for service handlers
        ├── privacy_policy.py # Template-based: fill and convert
        ├── breach_response.py # LLM: breach response plan
        ├── flsa_audit.py # LLM: FLSA audit report
        ├── ccpa_audit.py # LLM: CCPA audit report
        ├── consent_audit.py # LLM: TCPA consent audit
        ├── contractor_review.py # LLM: contractor classification
        ├── handbook_review.py # LLM: handbook review
        ├── campaign_review.py # LLM: marketing campaign review
        └── dnc_review.py # LLM: DNC compliance review