docserver: self-healing Task Scheduler config + docs
Companion to the worker MinIO-retry fix. Makes the worker auto-recover from process death (crash, manual kill, missed boot trigger), not just MinIO outages.

- start_worker.bat: propagate Python's exit code (exit /b %rc%) so Task Scheduler can actually detect a failed run (it previously always exited 0).
- reconfigure_task.ps1 (new): re-registers PW-DocserverWorker with RestartCount=99 / 1-min interval, StartWhenAvailable, and two triggers: AtStartup plus a 5-min repeating trigger with MultipleInstances=IgnoreNew, so a dead worker relaunches within ~5 min and never double-runs. Idempotent.
- install.ps1: same self-healing settings for fresh installs.
- Verified on the box: killed the worker -> task relaunched it; firing again while running stayed at one instance.

Docs updated to match reality:

- docserver/README.md: new 'Reliability / self-healing' section.
- document-generation.md: corrected the stale 'Flask DocServer :5050 / HTTP' description to the actual MinIO outbound-only transport.
- e2e-test-plan.md: removed the outdated 'Word COM fails under SYSTEM / requires RDP after every reboot' limitation; now self-healing under SYSTEM session 0.
- infrastructure.md: fixed VM spec (Win Server 2019, Word 16.0, Python 3.13, SSH port 22422) + self-healing note.
- architecture.md / formation-system.md: trigger + self-healing details.
parent 7929413eeb
commit b48d0cb799
9 changed files with 150 additions and 24 deletions
46
docserver/reconfigure_task.ps1
Normal file
@@ -0,0 +1,46 @@
# Reconfigures the PW-DocserverWorker scheduled task for self-healing:
#  - restart up to 99x at 1-min intervals if the task action fails
#  - StartWhenAvailable (catch up if a trigger was missed)
#  - a repeating safety trigger every 5 min with MultipleInstances=IgnoreNew,
#    so if the worker process ever dies (crash, manual kill, missed boot
#    trigger) it relaunches within ~5 min instead of waiting for a reboot
#  - keeps AtStartup + SYSTEM/Highest (current working config)
# Idempotent: safe to re-run. Run as Administrator.
$ErrorActionPreference = 'Stop'
$taskName = 'PW-DocserverWorker'
$appDir = 'C:\docserver'

$action = New-ScheduledTaskAction -Execute 'cmd.exe' `
    -Argument "/c `"$appDir\start_worker.bat`"" -WorkingDirectory $appDir

# Two triggers: at boot, and a repeating safety net every 5 minutes (indefinitely).
$atStartup = New-ScheduledTaskTrigger -AtStartup
$repeat = New-ScheduledTaskTrigger -Once -At (Get-Date) `
    -RepetitionInterval (New-TimeSpan -Minutes 5)
# Some Windows builds cap repetition without an explicit long duration; set ~10y.
try { $repeat.Repetition.Duration = 'P3650D' } catch {}

$settings = New-ScheduledTaskSettingsSet `
    -ExecutionTimeLimit (New-TimeSpan -Hours 0) `
    -RestartCount 99 `
    -RestartInterval (New-TimeSpan -Minutes 1) `
    -StartWhenAvailable `
    -MultipleInstances IgnoreNew `
    -AllowStartIfOnBatteries `
    -DontStopIfGoingOnBatteries

$principal = New-ScheduledTaskPrincipal -UserId 'SYSTEM' `
    -LogonType ServiceAccount -RunLevel Highest

Register-ScheduledTask -TaskName $taskName -Action $action `
    -Trigger @($atStartup, $repeat) -Settings $settings -Principal $principal `
    -Description 'Performance West DOCX-to-PDF worker (MinIO + Word COM). Self-healing: restarts on failure + 5-min safety trigger.' `
    -Force | Out-Null

Write-Host "Reconfigured ${taskName}:"
$ti = Get-ScheduledTask -TaskName $taskName
$ti.Triggers | ForEach-Object { Write-Host (" trigger: " + $_.CimClass.CimClassName) }
$s = $ti.Settings
Write-Host (" RestartCount=" + $s.RestartCount + " RestartInterval=" + $s.RestartInterval +
    " StartWhenAvailable=" + $s.StartWhenAvailable + " MultipleInstances=" + $s.MultipleInstances)
Write-Host (" State=" + $ti.State)
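The verification described in the commit message (kill the worker, confirm the task relaunches it, fire the task again while it is running) could be spot-checked with something like the sketch below. It is not part of the commit. The task name comes from the script above; the `python.exe` process name and the `docserver` command-line filter are assumptions about how `start_worker.bat` launches the worker and may need adjusting.

```powershell
# Hypothetical spot-check, not part of this commit. Run as Administrator.
$taskName = 'PW-DocserverWorker'   # task name from reconfigure_task.ps1

$task = Get-ScheduledTask -TaskName $taskName
$info = Get-ScheduledTaskInfo -TaskName $taskName

# Current state plus the scheduler's record of the most recent run.
Write-Host ("State          = " + $task.State)
Write-Host ("LastRunTime    = " + $info.LastRunTime)
Write-Host ("LastTaskResult = " + $info.LastTaskResult)
Write-Host ("NextRunTime    = " + $info.NextRunTime)

# Show the repetition interval/duration registered on each trigger.
$task.Triggers | ForEach-Object {
    Write-Host (" trigger: " + $_.CimClass.CimClassName +
        " interval=" + $_.Repetition.Interval +
        " duration=" + $_.Repetition.Duration)
}

# Fire the task manually while it is already running. With
# MultipleInstances=IgnoreNew the scheduler is expected to ignore the request.
Start-ScheduledTask -TaskName $taskName
Start-Sleep -Seconds 5

# ASSUMPTION: the worker runs as python.exe with 'docserver' somewhere in its
# command line. Adjust the filter to match the actual launch command.
$workers = @(Get-CimInstance Win32_Process -Filter "Name = 'python.exe'" |
    Where-Object { $_.CommandLine -like '*docserver*' })
Write-Host ("Worker processes found = " + $workers.Count)
```

If the settings took effect, the process count should stay at one after the manual start. A count of zero more likely means the filter does not match the real worker command line than that the task failed.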