Orchestration
Durable execution, retries, and cancellation.
Orchestration decides when an agent runs and how to handle failures mid-run. You submit a run; the orchestrator schedules, retries, and reports on it.
What the orchestrator does
- Schedules the run onto available capacity.
- Wakes the harness when the next step is ready.
- Retries failed steps idempotently, with back-off between attempts.
- Cancels a run cleanly on request, releasing the sandbox and the resources it held.
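The retry behaviour in the list above can be pictured with a small sketch. It assumes exponential back-off with a cap; the document does not say which back-off policy the orchestrator actually uses, and `backoff_delays`, `run_with_retries`, and their default values are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 0.5, cap: float = 30.0) -> List[float]:
    """Delay before each retry: base * 2**n, never above cap.

    The exponential policy and the defaults are assumptions, not documented values.
    """
    return [min(cap, base * 2 ** n) for n in range(retries)]


def run_with_retries(
    step: Callable[[], T],
    max_retries: int = 3,
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call step(); on failure wait, then try again, up to max_retries retries."""
    delays = backoff_delays(max_retries, base, cap)
    for attempt in range(max_retries + 1):
        try:
            return step()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

Retrying like this is only safe when the step itself is idempotent, which is why the two properties are listed together.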
Durability
Every step is appended to the session as an idempotent event, so the orchestrator can resume a run exactly where it left off after a network failure, tool timeout, or pod eviction. Retries don't produce duplicate effects.
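One way to see why idempotent events make resumption safe is the toy model below. It is a sketch only: the `Session` class, the `step_id` key, and the `resume` helper are hypothetical and say nothing about how the session is actually stored.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Session:
    """Toy append-only record of step events, keyed by a hypothetical step_id."""

    events: Dict[str, str] = field(default_factory=dict)

    def append(self, step_id: str, result: str) -> bool:
        """Record a step once; appending the same step_id again is a no-op."""
        if step_id in self.events:
            return False
        self.events[step_id] = result
        return True


def resume(session: Session, steps: List[str], execute: Callable[[str], str]) -> List[str]:
    """Execute only the steps that have no recorded event; return those that ran."""
    ran = []
    for step_id in steps:
        if step_id in session.events:
            continue  # finished before the interruption, so skip it
        session.append(step_id, execute(step_id))
        ran.append(step_id)
    return ran
```

In this model, calling `resume` after an interruption skips every step that already has an event, and calling it a second time runs nothing.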
Triggering a run
A single request creates a run:
curl -X POST https://api.levainlabs.com/api/v1/runs \
  -H "Authorization: Bearer $LEVAIN_API_KEY" \
  -d '{
    "agent_id": "'"$AGENT_ID"'",
    "config": {...}
  }'

From there, poll GET /api/v1/runs/{run_id} or stream logs to watch it work.
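Polling the run endpoint might look like the sketch below. The endpoint path and the bearer token come from the text above; everything about the response is an assumption, including that it is a JSON object with a `status` field and that `succeeded`, `failed`, and `cancelled` are the terminal values. The helper names `fetch_run` and `wait_for_run` are hypothetical.

```python
import json
import os
import time
import urllib.request
from typing import Callable, Dict

BASE_URL = "https://api.levainlabs.com/api/v1"
# Assumed terminal statuses; the response schema is not documented here.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def fetch_run(run_id: str) -> Dict:
    """GET /api/v1/runs/{run_id}; assumes the body is a JSON object."""
    request = urllib.request.Request(
        f"{BASE_URL}/runs/{run_id}",
        headers={"Authorization": f"Bearer {os.environ['LEVAIN_API_KEY']}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_run(
    run_id: str,
    fetch: Callable[[str], Dict] = fetch_run,
    interval: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the run reports an assumed terminal status, or give up."""
    for _ in range(max_polls):
        run = fetch(run_id)
        if run.get("status") in TERMINAL_STATUSES:
            return run
        sleep(interval)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

Passing `fetch` and `sleep` as parameters keeps the loop easy to exercise without touching the network.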
Cancelling a run
To stop a run in flight, call POST /api/v1/runs/{run_id}/cancel. The orchestrator stops scheduling new steps, drains the in-flight step, and marks the run cancelled. The session is preserved for inspection up to the cancellation point.
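A minimal sketch of the cancel call follows. The path and the bearer token mirror the text above; the request is assumed to need no body, and the shape of the response is not documented here, so the sketch returns only the HTTP status code. `build_cancel_request` and `cancel_run` are hypothetical helper names.

```python
import os
import urllib.parse
import urllib.request

BASE_URL = "https://api.levainlabs.com/api/v1"


def build_cancel_request(run_id: str, api_key: str) -> urllib.request.Request:
    """Build POST /api/v1/runs/{run_id}/cancel (assumes no request body)."""
    safe_id = urllib.parse.quote(run_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/runs/{safe_id}/cancel",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def cancel_run(run_id: str) -> int:
    """Send the cancel request and return the HTTP status code."""
    request = build_cancel_request(run_id, os.environ["LEVAIN_API_KEY"])
    with urllib.request.urlopen(request) as response:
        return response.status
```

Because the orchestrator drains the in-flight step before marking the run cancelled, a caller would likely keep polling the run afterwards rather than treat the cancel response as final.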