Service Mode
Service mode keeps persistent worker loops alive so you can reuse the same execution context across multiple requests.
It is useful after a project already runs correctly in the normal local or distributed path. It is not part of the first-proof workflow.
Architecturally, service mode is queue-backed persistent worker execution, not a live RPC/session bus. AGILAB keeps worker loops alive, writes tasks into the service queue, tracks heartbeats and status files, and lets workers pull their next unit of work from that queue.
That choice keeps service mode aligned with the normal worker contract, but it also means operators should think in queue semantics rather than interactive request/response semantics.
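To make the queue semantics concrete, here is a minimal sketch of a file-backed queue with one worker poll step. It is not AGILAB's implementation: the `pending`, `running`, `done`, and `heartbeats` directory names, the heartbeat fields, and the `submit_task` and `worker_poll_once` helpers are all hypothetical and exist only to illustrate the pull model.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional


def submit_task(queue_dir: Path, task_id: str, payload: dict) -> Path:
    """Scheduler side: publish one task file into the pending queue."""
    pending = queue_dir / "pending"  # hypothetical layout
    pending.mkdir(parents=True, exist_ok=True)
    tmp = pending / f"{task_id}.tmp"
    tmp.write_text(json.dumps(payload), encoding="utf-8")
    final = pending / f"{task_id}.task.json"
    tmp.replace(final)  # rename so a worker never reads a half-written file
    return final


def worker_poll_once(
    queue_dir: Path, worker_id: str, handler: Callable[[dict], None]
) -> Optional[str]:
    """Worker side: refresh the heartbeat, then pull at most one task."""
    pending = queue_dir / "pending"
    running = queue_dir / "running"
    done = queue_dir / "done"
    heartbeats = queue_dir / "heartbeats"
    for folder in (pending, running, done, heartbeats):
        folder.mkdir(parents=True, exist_ok=True)

    # Heartbeat: lets a status check tell a live worker from a dead one.
    heartbeat = {"worker": worker_id, "timestamp": time.time()}
    (heartbeats / f"{worker_id}.json").write_text(
        json.dumps(heartbeat), encoding="utf-8"
    )

    # Pull the oldest task by name; the rename acts as the claim.
    for task_file in sorted(pending.glob("*.task.json")):
        claimed = running / task_file.name
        try:
            task_file.rename(claimed)
        except FileNotFoundError:
            continue  # another worker claimed it first
        handler(json.loads(claimed.read_text(encoding="utf-8")))
        claimed.rename(done / claimed.name)
        return claimed.name
    return None  # queue empty: the loop would sleep and poll again
```

The submitter never talks to a worker directly. It only writes a file, and the result becomes visible when a worker's next poll picks it up, which is why latency depends on the polling cadence rather than on a live connection.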
Service queue security contract
Service task files use non-executable JSON payloads with schema agi.service.task.v1 and the *.task.json suffix. Workers reject legacy *.task.pkl files by moving them to failed without deserializing them.
The queue is still trusted scheduler-owned state, not a multi-tenant input surface. Keep the queue directory writable only by the scheduler/operator that submits work, and run workers without unnecessary secrets or filesystem access when apps or generated code are not fully trusted.
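The contract above can be sketched as a small intake check. This is an illustration only, not the worker's actual code: the `schema` key name inside the JSON payload, the `classify_task_file` and `quarantine` helpers, and the return labels are assumptions. Only the schema string, the two suffixes, and the move-to-failed behavior come from the text.

```python
import json
import shutil
from pathlib import Path

TASK_SCHEMA = "agi.service.task.v1"  # schema name stated in the contract


def classify_task_file(path: Path) -> str:
    """Return 'accept', 'reject', or 'ignore' for one queue entry."""
    name = path.name
    if name.endswith(".task.pkl"):
        # Legacy pickle: rejected on suffix alone, never opened or unpickled.
        return "reject"
    if not name.endswith(".task.json"):
        return "ignore"
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # unreadable file or malformed JSON
        return "reject"
    # Assumption: the schema id is stored under a top-level "schema" key.
    if not isinstance(payload, dict) or payload.get("schema") != TASK_SCHEMA:
        return "reject"
    return "accept"


def quarantine(path: Path, failed_dir: Path) -> Path:
    """Move a rejected file into the failed directory without reading it."""
    failed_dir.mkdir(parents=True, exist_ok=True)
    target = failed_dir / path.name
    shutil.move(str(path), str(target))
    return target
```

Deciding on the file name before touching the contents is what keeps a stray pickle from ever reaching a deserializer.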
When to use it
Use service mode when you need one or more of the following:
repeated requests on the same app with low latency;
controlled worker lifecycle (start/status/health/stop);
machine-readable health output for monitoring.
Fast path in ORCHESTRATE (web interface)
Open ORCHESTRATE and select your project.
In System settings, configure cluster mode, scheduler, and workers.
In Service mode (persistent workers), click START service once.
Use STATUS service to inspect running/pending workers.
Use HEALTH gate to enforce SLA thresholds from app_settings.toml.
Use EXPORT snapshot to write the current operator summary, health rows, and gate thresholds to a JSON file under ~/log/execute/<app_target>/.
Use STOP service before changing topology or ending the session.
Action semantics
action="start": provisions workers and starts persistent loops.
action="status": returns runtime state (running/degraded/idle/stopped/error).
action="health": returns the same status snapshot plus a JSON export (schema agi.service.health.v1).
action="stop": requests loop termination and optionally shuts down the Dask cluster.
The ORCHESTRATE panel also provides a UI-only export action for operators:
EXPORT snapshot writes service_operator_snapshot.json under ~/log/execute/<app_target>/ with the current status, cached worker health, and effective SLA thresholds.
What service mode is, and what it is not
Service mode is:
persistent worker loops reused across requests
queue-backed execution with heartbeats and status snapshots
a good fit for repeated requests on the same already-installed app
Service mode is not:
a generic live RPC fabric
a per-request interactive remote session
a replacement for making work visible in the normal AGILAB work plan when you need first-class scheduling or telemetry
End-to-end CLI example
import asyncio

from agi_cluster.agi_distributor import AGI
from agi_env import AgiEnv

APPS_PATH = "src/agilab/apps/builtin"
APP = "mycode_project"


async def main():
    env = AgiEnv(apps_path=APPS_PATH, app=APP, verbose=1)

    # Provision workers and start the persistent loops.
    started = await AGI.serve(env, action="start")
    print("START:", started["status"])

    # Inspect the runtime state.
    status = await AGI.serve(env, action="status")
    print("STATUS:", status["status"], status.get("workers_running_count", 0))

    # Same snapshot as status, plus the JSON health export.
    health = await AGI.serve(env, action="health")
    print("HEALTH:", health["status"], health.get("workers_unhealthy_count", 0))

    # Stop the worker loops but leave the Dask cluster running.
    stopped = await AGI.serve(env, action="stop", shutdown_on_stop=False)
    print("STOP:", stopped["status"])


if __name__ == "__main__":
    asyncio.run(main())
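Because every start should be paired with a stop, one way to avoid leaving a service running after an exception is to wrap the lifecycle in an async context manager. The following is a sketch, not part of AGILAB: `service_session` is a hypothetical helper, and it assumes only the call shape used in the example above, `serve(env, action=..., **kwargs)`.

```python
from contextlib import asynccontextmanager


@asynccontextmanager
async def service_session(serve, env, **stop_kwargs):
    """Start the service on entry and always request stop on exit.

    `serve` is any coroutine function with the same call shape as the
    AGI.serve calls shown above (hypothetical wrapper, not an AGILAB API).
    """
    started = await serve(env, action="start")
    try:
        yield started
    finally:
        # Runs even if the body raises, so the next start is not a
        # second start on top of a still-running service.
        await serve(env, action="stop", **stop_kwargs)
```

Under those assumptions, usage would look like `async with service_session(AGI.serve, env, shutdown_on_stop=False) as started: ...`, with the status and health calls placed inside the block.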
SLA thresholds
Per-app defaults are stored in [cluster.service_health]:
[cluster.service_health]
allow_idle = false
max_unhealthy = 0
max_restart_rate = 0.25
These values are used by the ORCHESTRATE HEALTH gate and by tools/service_health_check.py unless overridden on the command line.
Operational checks
Use this checker for automation/monitoring:
uv run python tools/service_health_check.py \
    --app mycode_project \
    --apps-path src/agilab/apps/builtin
Health JSON is written by default to:
${AGI_CLUSTER_SHARE}/service/<app_target>/health.json
Operator snapshot JSON written from the ORCHESTRATE page is stored at:
~/log/execute/<app_target>/service_operator_snapshot.json
Common pitfalls
Calling start twice without stop first: stop the existing service before restarting.
Health status is idle but policy requires activity: set allow_idle = false and enforce with HEALTH gate.
Missing health file in external monitor: call action="health" and verify permissions on the target output path.
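For the last pitfall, an external monitor can separate "file missing", "file unreadable", and "file stale" before it interprets the contents. The sketch below is one way to do that. The `read_health` helper, the reason strings, and the 120-second freshness window are hypothetical, and the payload is returned as parsed JSON without assuming any particular fields.

```python
import json
import os
import time
from pathlib import Path
from typing import Optional, Tuple


def read_health(
    path: Path, max_age_s: float = 120.0, now: Optional[float] = None
) -> Tuple[Optional[dict], str]:
    """Return (payload, reason); payload is None when the file is unusable."""
    if not path.exists():
        return None, "missing: run the health action to generate the file"
    if not os.access(path, os.R_OK):
        return None, "unreadable: check permissions on the output path"
    current = time.time() if now is None else now
    if current - path.stat().st_mtime > max_age_s:
        return None, "stale: the health export has not been refreshed recently"
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except ValueError:  # covers malformed JSON and decoding errors
        return None, "invalid: the file is not valid JSON"
    return payload, "ok"
```

Reporting the reason separately lets the monitor alert on a missing export differently from an unhealthy service.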