Usage

archal run <scenario> [options]

Arguments

| Argument | Description |
| --- | --- |
| <scenario> | Scenario path (.md) or bundled scenario name (for example, close-stale-issues) |

Options

| Flag | Description | Default |
| --- | --- | --- |
| -n, --runs <count> | Number of runs (max 100) | 2 |
| -t, --timeout <seconds> | Timeout per run in seconds (max 3600) | 600 |
| -m, --model <model> | Evaluator model for probabilistic criteria | from evaluator.model config |
| -o, --output <format> | Output format: terminal, json, junit | terminal |
| --seed <name> | Override twin seed name | from scenario config |
| --rate-limit <count> | Rate limit: max total requests before 429 | unlimited |
| --pass-threshold <score> | Minimum passing satisfaction score (0–100) | 0 |
| --tag <tag> | Only run if scenario has this tag (exits 0 if not matched) | |
| --api-key <key> | API key for the model provider (overrides env vars) | from env vars |
| --engine-endpoint <url> | Agent gateway URL (remote /v1/responses endpoint) | from ARCHAL_ENGINE_ENDPOINT |
| --engine-token <token> | Bearer token for API engine auth | from ARCHAL_ENGINE_TOKEN |
| --agent-model <model> | Agent model identifier. Required in API mode. | from ARCHAL_ENGINE_MODEL |
| --engine-twin-urls <path> | JSON file mapping twin names to remote-reachable MCP base URLs | from ARCHAL_ENGINE_TWIN_URLS |
| --engine-timeout <seconds> | Timeout for API engine HTTP call per run | run timeout |
| --harness <name> | Use a named harness (react, hardened, zero-shot, naive, openclaw, or ~/.archal/harnesses/<name>) | react (or engine.defaultHarness config) |
| --harness-dir <path> | Local agent execution directory (archal-harness.json is optional) | from ARCHAL_HARNESS_DIR |
| --api-base-urls <path> | JSON file mapping service names to clone API base URLs | off |
| --api-proxy-url <url> | Proxy URL for raw API code routing metadata | from ARCHAL_API_PROXY_URL |
| --preflight-only | Validate environment/config and exit before execution | false |
| --seed-cache | Enable dynamic seed cache reuse | false (off by default) |
| --replay-seed <path> | Replay a previously saved managed seed snapshot | off |
| --save-seed <path> | Save the resolved managed seed snapshot used for this run | off |
| --no-failure-analysis | Skip LLM failure analysis on imperfect scores | false |
| --allow-ambiguous-seed | Allow dynamic seed generation when setup is underspecified | false |
| --strict-seed | Treat seed FK and coverage warnings as hard errors | false |
| --sandbox | Run agent in sandboxed Docker container with TLS proxy | false |
| --no-docker | Skip Docker and run with local OpenClaw CLI + proxy | false |
| --openclaw-home <dir> | Path to full OpenClaw home directory | ~/.openclaw |
| --workspace <dir> | OpenClaw workspace directory to mount (workspace-only mode) | |
| --openclaw-config <path> | Path to openclaw.json (workspace-only mode) | |
| --openclaw-version <version> | OpenClaw version for sandbox image build | |
| --openclaw-eval-mode <mode> | OpenClaw eval mode: isolated or stateful | |
| -q, --quiet | Suppress non-error output | false |
| -v, --verbose | Enable debug logging | false |
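
As a hedged illustration of how several of these flags might be combined, the sketch below wraps a run for CI use. The function name run_for_ci, the chosen values (5 runs, 300 seconds, threshold 80), and the report path are hypothetical. It also assumes the JUnit report is written to standard output, which this page does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical CI helper: run a scenario several times and save a JUnit report.
# Assumption: archal writes the selected output format to standard output.
run_for_ci() {
  local scenario="$1"
  local report="$2"
  archal run "$scenario" \
    --runs 5 \
    --timeout 300 \
    --output junit \
    --pass-threshold 80 > "$report"
}

# Example call (requires archal on PATH):
#   run_for_ci close-stale-issues report.xml
```

The values stay within the documented limits (at most 100 runs and 3600 seconds per run).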

Agent execution modes

The agent execution mode is inferred from the flags you provide. Twins are always cloud-hosted.

API mode (remote agent)

Sends the scenario to a remote /v1/responses endpoint. Inferred when --engine-endpoint is set, or when ARCHAL_ENGINE_ENDPOINT or OPENCLAW_URL is set in the environment.
archal run scenario.md \
  --engine-endpoint "https://gateway.openclaw.ai/v1/responses" \
  --engine-token "$OPENCLAW_GATEWAY_TOKEN" \
  --agent-model "openclaw:main" \
  -n 1

Harness mode (local agent)

Spawns a local agent command. Inferred when --harness, --harness-dir, ARCHAL_HARNESS_DIR, or the engine.defaultHarness config is set:
# Use a bundled or custom harness by name
archal run scenario.md \
  --harness react \
  --agent-model gemini-2.5-flash \
  -n 3

# Or point to a custom harness directory
archal run scenario.md \
  --harness-dir ./my-harness \
  --agent-model "my-model" \
  -n 3
--harness and --harness-dir are mutually exclusive.

Sandbox mode (OpenClaw in Docker)

Runs your OpenClaw agent in a Docker container with a TLS-intercepting proxy. All HTTPS calls to service domains are transparently routed to cloud-hosted twins. Inferred when --sandbox is set.
archal run scenario.md \
  --sandbox \
  --openclaw-home ~/.openclaw
Use --no-docker to skip Docker and run with a local OpenClaw CLI + proxy instead.

Mode inference

  1. --sandbox → sandbox mode
  2. --engine-endpoint, ARCHAL_ENGINE_ENDPOINT, or OPENCLAW_URL is set → API mode
  3. --harness, --harness-dir, ARCHAL_HARNESS_DIR, or engine.defaultHarness config → harness mode
  4. None of the above → error with guidance
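
The precedence in this list can be sketched as a small shell function. This is an illustration of the documented ordering only, not archal's actual implementation. The name infer_mode is hypothetical, and DEFAULT_HARNESS is a made-up stand-in for the engine.defaultHarness config value.

```shell
#!/usr/bin/env bash
# Illustrative sketch of the documented precedence; not archal's actual source.
# DEFAULT_HARNESS is a hypothetical stand-in for the engine.defaultHarness config value.
infer_mode() {
  local sandbox=0
  local endpoint="${ARCHAL_ENGINE_ENDPOINT:-${OPENCLAW_URL:-}}"
  local harness="${ARCHAL_HARNESS_DIR:-${DEFAULT_HARNESS:-}}"

  # Flags override the environment-derived defaults.
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --sandbox) sandbox=1 ;;
      --engine-endpoint)
        endpoint="${2:-}"
        if [ "$#" -gt 1 ]; then shift; fi ;;
      --harness|--harness-dir)
        harness="${2:-}"
        if [ "$#" -gt 1 ]; then shift; fi ;;
    esac
    shift
  done

  # Rules are checked in the documented order: sandbox, API, harness, error.
  if [ "$sandbox" -eq 1 ]; then
    echo "sandbox"
  elif [ -n "$endpoint" ]; then
    echo "api"
  elif [ -n "$harness" ]; then
    echo "harness"
  else
    echo "error: set --sandbox, --engine-endpoint, --harness, or --harness-dir" >&2
    return 1  # arbitrary non-zero status for this sketch
  fi
}
```

Because the rules are checked in order, infer_mode --sandbox --engine-endpoint "https://gateway.openclaw.ai/v1/responses" prints sandbox: rule 1 wins over rule 2.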

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Run succeeded and the score met --pass-threshold, or the scenario was skipped by --tag |
| 1 | Runtime error, or satisfaction score below --pass-threshold |
| 2 | Validation error (bad flags, missing scenario, invalid config) |
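
In a CI script, these codes can be turned into readable messages. The following is a minimal sketch under stated assumptions: run_and_report is a hypothetical helper name, not part of archal. It runs whatever command it is given, prints a message to standard error, and passes the original exit status through.

```shell
#!/usr/bin/env bash
# Run any command and describe its exit status using the exit code table above.
# run_and_report is a hypothetical helper name, not part of archal.
run_and_report() {
  local status=0
  "$@" || status=$?
  case "$status" in
    0) echo "ok: passed, or skipped by --tag" >&2 ;;
    1) echo "fail: runtime error or score below threshold" >&2 ;;
    2) echo "invalid: bad flags, missing scenario, or invalid config" >&2 ;;
    *) echo "unknown exit code: $status" >&2 ;;
  esac
  # Preserve the original status so the calling CI step still fails or passes.
  return "$status"
}

# Example call (requires archal on PATH):
#   run_and_report archal run scenario.md --pass-threshold 80
```

The message goes to standard error so that it does not mix with any report the command prints on standard output.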

Environment variables

| Variable | Description |
| --- | --- |
| ARCHAL_ENGINE_ENDPOINT | Default API engine endpoint |
| ARCHAL_ENGINE_TOKEN | Default API engine auth token |
| ARCHAL_ENGINE_MODEL | Default engine model identifier |
| ARCHAL_ENGINE_TIMEOUT | Default API engine timeout (seconds) |
| ARCHAL_ENGINE_TWIN_URLS | Default path to remote twin URL override map |
| ARCHAL_ENGINE_API_KEY | API key for the model under test |
| OPENCLAW_URL | Legacy OpenClaw endpoint alias for API mode |
| OPENCLAW_GATEWAY_TOKEN | Legacy OpenClaw token alias for API mode |
| OPENCLAW_GATEWAY_PASSWORD | Legacy OpenClaw password alias for API mode |
| OPENCLAW_AGENT_ID | Legacy OpenClaw agent fallback (prefer --agent-model or ARCHAL_ENGINE_MODEL) |
| ARCHAL_HARNESS_DIR | Default harness directory for local agent execution |
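
As a sketch of how these variables might replace the API mode flags, the snippet below exports the same endpoint and model used in the API mode example on this page. MY_GATEWAY_TOKEN is a hypothetical secret name and the timeout value is arbitrary; both are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Endpoint and model are copied from the API mode example on this page.
# MY_GATEWAY_TOKEN is a hypothetical variable holding your secret; 300 is an arbitrary timeout.
export ARCHAL_ENGINE_ENDPOINT="https://gateway.openclaw.ai/v1/responses"
export ARCHAL_ENGINE_TOKEN="${MY_GATEWAY_TOKEN:-}"
export ARCHAL_ENGINE_MODEL="openclaw:main"
export ARCHAL_ENGINE_TIMEOUT=300

# With these exported, the API mode example should reduce to:
#   archal run scenario.md -n 1
```

Since the options table lists these variables as the defaults for the corresponding flags, passing a flag explicitly should still take precedence over the exported value.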