# Performance West — Document Generation System

**Last updated:** 2026-03-27

## Overview

The document generation system produces professional compliance documents for customers. It supports two generation modes:

1. **Template-based**: DOCX templates with Jinja2 placeholders, filled with order data
2. **LLM-based**: templates provide structure; Ollama generates the analysis sections

All generated documents pass through a quality gate (admin review) before delivery.

## Architecture

```
               ┌─────────────┐
               │   ERPNext   │  (order data + intake forms)
               └──────┬──────┘
                      │
               ┌──────┴──────┐
               │   Worker    │  (Python, polls for Queued orders)
               └──────┬──────┘
                      │
         ┌────────────┼────────────┐
         │                         │
┌────────┴────────┐      ┌─────────┴─────────┐
│ Template-based  │      │     LLM-based     │
│ (DocxBuilder)   │      │  (DocxBuilder +   │
│                 │      │   Ollama/LLM)     │
└────────┬────────┘      └─────────┬─────────┘
         │                         │
         └────────────┬────────────┘
                      │
               ┌──────┴──────┐
               │ PDF Convert │
               │ ┌─────────┐ │
               │ │DocServer│ │  ← PRIMARY (Windows, MS Word COM, :5050)
               │ │  :5050  │ │
               │ └────┬────┘ │
               │      │ fail │
               │ ┌────┴────┐ │
               │ │LibreOfc │ │  ← FALLBACK (headless, in Docker)
               │ └─────────┘ │
               └──────┬──────┘
                      │
               ┌──────┴──────┐
               │    MinIO    │  (upload DOCX + PDF)
               └──────┬──────┘
                      │
               ┌──────┴──────┐
               │   ERPNext   │  (update status → Review)
               └─────────────┘
```

## Template-Based Generation

### When Used

- Operating agreements (formation orders)
- Privacy policies
- Invoices
- CRTC registration letter (Canada CRTC Carrier Package)
- BC corporate binder (9 sections: cover page, incorporation certificate placeholder,
  articles of incorporation, registered office, directors/officers, share structure,
  CRTC registration, vendor directory, compliance calendar)
- Vendor directory PDF (Canadian telecom vendors and contacts)
- Any document where the content is deterministic (no analysis needed)

### How It Works

1. Worker fetches the `.docx` template from MinIO (`templates/{template-name}.docx`)
2. `DocxBuilder` loads the template via `python-docx`
3. Variables from the ERPNext order are substituted into Jinja2 placeholders
4. The filled document is saved as DOCX
5. The DOCX is converted to PDF (DocServer first, LibreOffice headless as fallback; see PDF Conversion)
6. Both files are uploaded to MinIO

### DOCX Template Format

Templates are standard `.docx` files with Jinja2 syntax embedded in the text:

**Simple variables:**
```
This Operating Agreement of {{ entity_name }}, a limited liability company
organized under the laws of {{ state_name }}...
```

**Conditionals:**
```
{% if management_type == 'manager' %}
The Manager(s) of the Company shall be {{ managers }}.
{% else %}
All Members shall have the authority to manage the business.
{% endif %}
```

**Loops (for tables or repeated sections):**
```
{% for member in members %}
{{ member.name }} — {{ member.ownership_pct }}% ownership
{% endfor %}
```

**Section placeholders (for LLM-generated content):**
```
{{ executive_summary }}
{{ classification_analysis }}
{{ remediation_plan }}
```

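When editing or creating a template, it can help to list the variables the template expects and compare them against the order data. The following is a minimal sketch that works on plain template text, not on the `.docx` file itself. The function names `required_variables` and `missing_variables` are hypothetical and not part of `DocxBuilder`, and the regexes only cover the simple `{{ name }}` and `{% for x in xs %}` forms shown above (variables used only inside `{% if %}` are not detected).

```python
import re

# {{ entity_name }} or {{ member.name }}: capture the root name before any dot.
VAR_RE = re.compile(r"\{\{\s*([A-Za-z_]\w*)")
# {% for member in members %}: capture the loop variable and the iterable.
FOR_RE = re.compile(r"\{%\s*for\s+([A-Za-z_]\w*)\s+in\s+([A-Za-z_]\w*)")


def required_variables(template_text):
    """Return the context variables a template's placeholders refer to."""
    names = set(VAR_RE.findall(template_text))
    for loop_var, iterable in FOR_RE.findall(template_text):
        names.discard(loop_var)  # defined by the loop, not by the context
        names.add(iterable)      # the context must supply the iterable
    return names


def missing_variables(template_text, context):
    """Return the variables the template needs that the context lacks."""
    return required_variables(template_text) - set(context)


if __name__ == "__main__":
    text = (
        "This Operating Agreement of {{ entity_name }}, organized under "
        "the laws of {{ state_name }}... "
        "{% for member in members %}{{ member.name }}{% endfor %}"
    )
    print(sorted(missing_variables(text, {"entity_name": "Acme LLC"})))
```

Running a check like this before generation would flag a missing order field early, rather than leaving an empty placeholder in a document that then goes to review.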
### Creating a New Template

1. Run `python scripts/templates/create_templates.py` to generate the base templates, or create one manually in Word/LibreOffice
2. Use `{{ variable_name }}` for all dynamic content
3. Use Times New Roman for body text, navy blue (`#2D4E78`) for headings
4. Include the Performance West header, confidentiality footer, and page numbers
5. Save as `.docx` (not `.doc`)
6. Upload to MinIO: `mc cp template.docx minio/performancewest/templates/`

### Modifying an Existing Template

1. Download from MinIO: `mc cp minio/performancewest/templates/name.docx .`
2. Edit in Word or LibreOffice, preserving all `{{ }}` placeholders
3. Test locally: `python -c "from scripts.document_gen.docx_builder import DocxBuilder; ..."`
4. Upload the updated template back to MinIO
5. Existing generated documents are not affected (they are separate files)

## LLM-Based Generation

### When Used

- FLSA/wage & hour audit reports
- CCPA/CPRA compliance audit reports
- TCPA consent audit reports
- Independent contractor classification assessments
- Employee handbook reviews
- Data breach response plans

### How It Works

1. Worker fetches the DOCX template (provides structure and formatting)
2. Worker constructs a prompt from the service-specific handler + intake data
3. Worker sends the prompt to Ollama (qwen2.5:7b running locally)
4. LLM returns analysis text for each section
5. `DocxBuilder.insert_section()` replaces section placeholders with LLM output
6. Simple variables (company name, dates) are filled via `DocxBuilder.fill()`
7. Document is converted to PDF and uploaded to MinIO
8. Status is always set to **Review**; LLM output must be human-reviewed

### Prompt Engineering Guidelines

Each compliance service has a dedicated handler in `scripts/workers/services/` that constructs the prompt. Follow these guidelines:

**Structure:**
```
You are a compliance consultant preparing a {document_type} for {company_name}.

CONTEXT:
{intake_data formatted as structured text}

INSTRUCTIONS:
- Write in a professional, objective tone
- Cite specific regulations by name and section number
- Identify concrete findings (compliant, non-compliant, needs improvement)
- Provide actionable remediation steps with deadlines
- Do not include legal advice disclaimers (the template adds these)

OUTPUT FORMAT:
Return a JSON object with the following keys:
- executive_summary: 2-3 paragraph overview
- {section_name}: detailed analysis for each section
- remediation_plan: prioritized action items

Write for a business audience. Be specific, not generic.
```

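As an illustration of the structure above, a handler could assemble the prompt roughly as follows. This is a sketch only: `format_intake` and `build_prompt` are hypothetical names, the intake field names in the usage example are made up, and the real handlers in `scripts/workers/services/` may be organized differently.

```python
def format_intake(intake):
    """Render intake answers as labeled lines rather than a raw form dump."""
    lines = []
    for key, value in intake.items():
        label = key.replace("_", " ").capitalize()
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(item) for item in value)
        lines.append(f"- {label}: {value}")
    return "\n".join(lines)


def build_prompt(document_type, company_name, intake, section_names):
    """Assemble the prompt; section_names must match the template placeholders."""
    section_lines = "\n".join(
        f"- {name}: detailed analysis for this section" for name in section_names
    )
    return "\n".join([
        f"You are a compliance consultant preparing a {document_type} for {company_name}.",
        "",
        "CONTEXT:",
        format_intake(intake),
        "",
        "INSTRUCTIONS:",
        "- Write in a professional, objective tone",
        "- Cite specific regulations by name and section number",
        "- Identify concrete findings (compliant, non-compliant, needs improvement)",
        "- Provide actionable remediation steps with deadlines",
        "- Do not include legal advice disclaimers (the template adds these)",
        "",
        "OUTPUT FORMAT:",
        "Return a JSON object with the following keys:",
        "- executive_summary: 2-3 paragraph overview",
        section_lines,
        "- remediation_plan: prioritized action items",
        "",
        "Write for a business audience. Be specific, not generic.",
    ])


if __name__ == "__main__":
    print(build_prompt(
        "FLSA audit report",
        "Acme LLC",
        {"employee_count": 42, "states": ["CA", "NV"]},  # made-up intake fields
        ["classification_analysis"],
    ))
```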
**Key rules:**
- Always request JSON output; it is easier to parse and insert into template sections
- Include the intake data as structured context, not raw form dumps
- Specify the exact section names that match template placeholders
- Set temperature to 0.3 for consistency; compliance documents should not be creative
- Maximum token limit: 4096 per section to prevent rambling
- If the LLM returns malformed JSON, retry once with a stricter prompt

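The "retry once with a stricter prompt" rule can be sketched as below. This is an assumption-laden illustration, not the contents of `llm_writer.py`: `generate_sections` is a hypothetical name, the LLM call is passed in as a plain callable so the logic can be exercised without Ollama, the wording of the stricter suffix is invented, and treating missing section keys as a failure is an extra check the guidelines do not spell out.

```python
import json

# Invented wording for the stricter second attempt.
STRICT_SUFFIX = "\n\nReturn ONLY a valid JSON object. No prose, no code fences."


def generate_sections(call_llm, prompt, required_keys):
    """Call the LLM, parse its JSON reply, and retry once if it is unusable."""
    last_error = None
    for attempt_prompt in (prompt, prompt + STRICT_SUFFIX):
        raw = call_llm(attempt_prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
            continue
        if not isinstance(data, dict):
            last_error = ValueError("expected a JSON object")
            continue
        missing = [key for key in required_keys if key not in data]
        if missing:
            last_error = ValueError(f"missing sections: {missing}")
            continue
        return data
    raise ValueError(f"LLM did not return usable JSON after one retry: {last_error}")


if __name__ == "__main__":
    replies = iter(["not json", '{"executive_summary": "ok"}'])
    print(generate_sections(lambda _prompt: next(replies), "PROMPT", ["executive_summary"]))
```

Keeping the LLM call injectable means the parsing and retry behavior can be unit-tested with canned replies.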
**Model selection:**
- Default: `qwen2.5:7b` (good balance of quality and speed for 16 GB VRAM)
- For complex multi-state analysis: `qwen2.5:14b` if GPU memory allows
- Configured via `OLLAMA_MODEL` environment variable

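For reference, the settings above could map onto an Ollama request body along these lines. This sketch assumes the worker talks to Ollama's `/api/generate` HTTP endpoint, which this document does not state; `build_ollama_request` is a hypothetical helper, and using `num_predict` to enforce the 4096-token limit assumes one request per section.

```python
import os


def build_ollama_request(prompt):
    """Build a JSON body for Ollama's /api/generate endpoint (assumed)."""
    return {
        # OLLAMA_MODEL overrides the default model named in this document.
        "model": os.environ.get("OLLAMA_MODEL", "qwen2.5:7b"),
        "prompt": prompt,
        "stream": False,    # wait for the full reply instead of streaming chunks
        "format": "json",   # ask Ollama to constrain the reply to valid JSON
        "options": {
            "temperature": 0.3,   # low temperature for consistent output
            "num_predict": 4096,  # cap on the number of generated tokens
        },
    }


if __name__ == "__main__":
    print(build_ollama_request("Summarize the findings."))
```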
## PDF Conversion

DOCX to PDF conversion uses a two-tier approach:

### PRIMARY: Windows DocServer (Microsoft Word COM)

A Windows server runs a Flask-based DocServer at `:5050` that uses Microsoft Word via COM
automation for pixel-perfect DOCX → PDF conversion. This produces the highest-fidelity
output (exact font rendering, correct page breaks, proper table formatting).

```python
# pdf_converter.py: primary path
import requests

# Open the DOCX in a context manager so the file handle is always closed.
with open(docx_path, "rb") as docx_file:
    response = requests.post(
        f"http://{DOCSERVER_HOST}:5050/convert",
        files={"file": docx_file},
        timeout=60,
    )
response.raise_for_status()  # surface HTTP errors so the caller can fall back
pdf_bytes = response.content
```

### FALLBACK: LibreOffice Headless

If DocServer is unavailable (network error, timeout, Windows server down), the converter
falls back to LibreOffice in headless mode:

```bash
libreoffice --headless --convert-to pdf --outdir /tmp document.docx
```

### Converter Logic

The `pdf_converter.py` module handles:
- **DocServer first**: POST to `:5050/convert`, 60-second timeout
- **Fallback to LibreOffice**: if DocServer returns an error or times out
- Retry logic (up to 3 attempts per converter)
- Temporary file cleanup
- Error reporting to ERPNext
- Logging of which converter was used for each document

LibreOffice is installed in the Python worker Docker container (`scripts/Dockerfile`).
DocServer host is configured via `DOCSERVER_HOST` environment variable (default: `192.168.1.x`).

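The ordering and retry behavior listed above can be sketched as a small loop over converters. This is an illustration rather than the actual `pdf_converter.py` code: `convert_with_fallback` is a hypothetical name, each converter is passed in as a callable that takes a DOCX path and returns PDF bytes, and temporary file cleanup and ERPNext error reporting are left out.

```python
import logging

logger = logging.getLogger("pdf_converter")


def convert_with_fallback(docx_path, converters, max_attempts=3):
    """Try each (name, convert) pair in order, with retries per converter.

    Returns (converter_name, pdf_bytes) for the first success and raises
    RuntimeError if every attempt of every converter fails.
    """
    errors = []
    for name, convert in converters:
        for attempt in range(1, max_attempts + 1):
            try:
                pdf_bytes = convert(docx_path)
            except Exception as exc:  # network error, timeout, bad exit code...
                errors.append(f"{name} attempt {attempt}: {exc}")
                logger.warning("%s failed (attempt %d): %s", name, attempt, exc)
                continue
            logger.info("Converted %s using %s", docx_path, name)
            return name, pdf_bytes
    raise RuntimeError("All converters failed: " + "; ".join(errors))


if __name__ == "__main__":
    def docserver(_path):
        raise ConnectionError("DocServer unreachable")

    def libreoffice(_path):
        return b"%PDF-stub"

    print(convert_with_fallback(
        "document.docx",
        [("docserver", docserver), ("libreoffice", libreoffice)],
    ))
```

Returning the converter name alongside the bytes makes it straightforward to log which path produced each document.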
## MinIO Upload/Download

The `minio_client.py` module provides:

```python
# Upload a generated document
upload_document(
    local_path="/tmp/operating-agreement.pdf",
    minio_path="orders/FO-2026-0001/operating-agreement.pdf",
    content_type="application/pdf",
)

# Download a template
download_template(
    template_name="operating-agreement",  # downloads operating-agreement.docx
    local_path="/tmp/operating-agreement.docx",
)

# Generate a pre-signed URL for customer download
url = presign_url(
    minio_path="orders/FO-2026-0001/operating-agreement.pdf",
    expires=3600,  # 1 hour
)
```

**Bucket structure:** See `docs/crm.md` for the full MinIO directory layout.

**Security:** MinIO is not exposed externally. The Express API generates time-limited pre-signed URLs for customer downloads.

## Quality Gates

### Admin Review

Every generated document enters **Review** status before delivery:

1. Admin opens the order in ERPNext
2. Downloads the DOCX/PDF from the attached MinIO link
3. Reviews for accuracy, completeness, and professionalism
4. Actions:
   - **Approve**: moves to Ready
   - **Request Revision**: moves to Revision with notes; worker re-generates
   - **Reject**: flags for manual document creation

### Revision Loop

When a reviewer requests changes:

1. Order status returns to **Processing**
2. Reviewer's notes are stored in the ERPNext order comments
3. Worker re-generates with adjusted prompts or manual edits
4. Document re-enters **Review**
5. Maximum 3 automated revision cycles; after that, manual creation is required

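The review actions and the revision cap can be expressed as a small transition function. This is a sketch under stated assumptions: `next_status` and the action strings are hypothetical, `"Manual"` is a placeholder status name for "manual creation required" (this document does not name that status), and a revision request is mapped to **Processing** as in the steps above.

```python
from typing import Tuple

MAX_REVISION_CYCLES = 3  # automated revision cycles allowed per order


def next_status(action: str, revision_count: int) -> Tuple[str, int]:
    """Return (new_status, new_revision_count) for a reviewer action."""
    if action == "approve":
        return "Ready", revision_count
    if action == "reject":
        return "Manual", revision_count  # placeholder status name
    if action == "request_revision":
        if revision_count >= MAX_REVISION_CYCLES:
            # Cap reached: stop re-generating and hand off to a person.
            return "Manual", revision_count
        return "Processing", revision_count + 1
    raise ValueError(f"unknown review action: {action!r}")


if __name__ == "__main__":
    status, count = "Review", 0
    for _ in range(4):
        status, count = next_status("request_revision", count)
        print(status, count)
```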
## File Reference

```
scripts/
├── document_gen/
│   ├── __init__.py
│   ├── docx_builder.py                # DOCX template filling (Jinja2 + python-docx)
│   ├── llm_writer.py                  # Ollama prompt construction and parsing
│   ├── minio_client.py                # MinIO upload/download/presign
│   └── pdf_converter.py               # DOCX→PDF (DocServer primary, LibreOffice headless fallback)
├── templates/
│   ├── create_templates.py            # Generates all .docx templates (run once)
│   ├── crtc-registration-letter.docx  # CRTC carrier registration letter template
│   ├── bc-corporate-binder.docx       # BC corporate binder (9 sections)
│   ├── vendor-directory.docx          # Canadian telecom vendor directory
│   └── *.docx                         # Other generated template files
└── workers/
    ├── base_worker.py                 # ERPNext polling loop, status transitions
    ├── erpnext_client.py              # ERPNext REST API client
    ├── delivery_worker.py             # Email delivery with SMTP
    ├── renewal_worker.py              # Subscription renewal reminders
    └── services/
        ├── base_handler.py            # Base class for service handlers
        ├── privacy_policy.py          # Template-based: fill and convert
        ├── breach_response.py         # LLM: breach response plan
        ├── flsa_audit.py              # LLM: FLSA audit report
        ├── ccpa_audit.py              # LLM: CCPA audit report
        ├── consent_audit.py           # LLM: TCPA consent audit
        ├── contractor_review.py       # LLM: contractor classification
        ├── handbook_review.py         # LLM: handbook review
        ├── campaign_review.py         # LLM: marketing campaign review
        └── dnc_review.py              # LLM: DNC compliance review
```