archal run

Usage

archal run <scenario> [options]

Arguments

Argument	Description
`<scenario>`	Scenario path (`.md`) or bundled scenario name (for example `close-stale-issues`)

Options

Flag	Description	Default
`-n, --runs <count>`	Number of runs (max 100)	`2`
`-t, --timeout <seconds>`	Timeout per run in seconds (max 3600)	`600`
`-m, --model <model>`	Evaluator model for probabilistic criteria	from `evaluator.model` config
`-o, --output <format>`	Output format: `terminal`, `json`, `junit`	`terminal`
`--seed <name>`	Override twin seed name	from scenario config
`--rate-limit <count>`	Rate limit: max total requests before 429	unlimited
`--pass-threshold <score>`	Minimum passing satisfaction score (0–100)	`0`
`--tag <tag>`	Only run if scenario has this tag (exits 0 if not matched)	—
`--api-key <key>`	API key for the model provider (overrides env vars)	from env vars
`--engine-endpoint <url>`	Agent gateway URL (remote `/v1/responses` endpoint)	from `ARCHAL_ENGINE_ENDPOINT`
`--engine-token <token>`	Bearer token for API engine auth	from `ARCHAL_ENGINE_TOKEN`
`--agent-model <model>`	Agent model identifier. Required in API mode.	from `ARCHAL_ENGINE_MODEL`
`--engine-twin-urls <path>`	JSON file mapping twin names to remote-reachable MCP base URLs	from `ARCHAL_ENGINE_TWIN_URLS`
`--engine-timeout <seconds>`	Timeout for API engine HTTP call per run	run timeout
`--harness <name>`	Use a named harness (`react`, `hardened`, `zero-shot`, `naive`, `openclaw`, or `~/.archal/harnesses/<name>`)	`react` (or `engine.defaultHarness` config)
`--harness-dir <path>`	Local agent execution directory (`archal-harness.json` is optional)	from `ARCHAL_HARNESS_DIR`
`--api-base-urls <path>`	JSON file mapping service names to clone API base URLs	off
`--api-proxy-url <url>`	Proxy URL for raw API code routing metadata	from `ARCHAL_API_PROXY_URL`
`--preflight-only`	Validate environment/config and exit before execution	`false`
`--seed-cache`	Enable dynamic seed cache reuse	`false`
`--replay-seed <path>`	Replay a previously saved managed seed snapshot	off
`--save-seed <path>`	Save the resolved managed seed snapshot used for this run	off
`--no-failure-analysis`	Skip LLM failure analysis on imperfect scores	`false`
`--allow-ambiguous-seed`	Allow dynamic seed generation when setup is underspecified	`false`
`--strict-seed`	Treat seed FK and coverage warnings as hard errors	`false`
`--sandbox`	Run agent in sandboxed Docker container with TLS proxy	`false`
`--no-docker`	Skip Docker and run with local OpenClaw CLI + proxy	`false`
`--openclaw-home <dir>`	Path to full OpenClaw home directory	`~/.openclaw`
`--workspace <dir>`	OpenClaw workspace directory to mount (workspace-only mode)	—
`--openclaw-config <path>`	Path to openclaw.json (workspace-only mode)	—
`--openclaw-version <version>`	OpenClaw version for sandbox image build	—
`--openclaw-eval-mode <mode>`	OpenClaw eval mode: `isolated` or `stateful`	—
`-q, --quiet`	Suppress non-error output	`false`
`-v, --verbose`	Enable debug logging	`false`
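
As an illustration of how several of these flags combine, the sketch below wraps a CI-style invocation in a shell function. The function name `run_scenario_ci`, the `ARCHAL_BIN` override variable, and the chosen values (3 runs, 300-second timeout, threshold 80, tag `ci`) are assumptions made for the example and are not part of the CLI. Only the flags themselves come from the table above.

```shell
# Hypothetical CI wrapper: only the flags are taken from the options table.
# ARCHAL_BIN is an assumed override so the wrapper can be dry-run with `echo`.
run_scenario_ci() {
  scenario="$1"
  "${ARCHAL_BIN:-archal}" run "$scenario" \
    -n 3 \
    -t 300 \
    -o junit \
    --pass-threshold 80 \
    --tag ci
}
```

Calling `ARCHAL_BIN=echo run_scenario_ci close-stale-issues` prints the assembled arguments instead of starting a run, which is a cheap way to review the command before wiring it into a pipeline.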

Agent execution modes

The agent execution mode is inferred from the flags you provide. Twins are always cloud-hosted.

API mode (remote agent)

Sends the scenario to a remote `/v1/responses` endpoint. Inferred when `--engine-endpoint`, `ARCHAL_ENGINE_ENDPOINT`, or `OPENCLAW_URL` is set.

archal run scenario.md \
  --engine-endpoint "https://gateway.openclaw.ai/v1/responses" \
  --engine-token "$OPENCLAW_GATEWAY_TOKEN" \
  --agent-model "openclaw:main" \
  -n 1

Harness mode (local agent)

Spawns a local agent command. Inferred when `--harness`, `--harness-dir`, `ARCHAL_HARNESS_DIR`, or the `engine.defaultHarness` config is set:

# Use a bundled or custom harness by name
archal run scenario.md \
  --harness react \
  --agent-model gemini-2.5-flash \
  -n 3

# Or point to a custom harness directory
archal run scenario.md \
  --harness-dir ./my-harness \
  --agent-model "my-model" \
  -n 3

`--harness` and `--harness-dir` are mutually exclusive.

Sandbox mode (OpenClaw in Docker)

Runs your OpenClaw agent in a Docker container with a TLS-intercepting proxy. All HTTPS calls to service domains are transparently routed to cloud-hosted twins. Inferred when `--sandbox` is set.

archal run scenario.md \
  --sandbox \
  --openclaw-home ~/.openclaw

Use `--no-docker` to skip Docker and run with a local OpenClaw CLI and proxy instead.

Mode inference

`--sandbox` → sandbox mode
`--engine-endpoint`, `ARCHAL_ENGINE_ENDPOINT`, or `OPENCLAW_URL` is set → API mode
`--harness`, `--harness-dir`, `ARCHAL_HARNESS_DIR`, or `engine.defaultHarness` config → harness mode
None of the above → error with guidance
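
The rules above can be sketched as a small shell function. This is an illustration only, not the CLI's actual implementation: the function name `infer_mode` is hypothetical, it assumes the rules are checked in the order listed (the page does not state a precedence), and it leaves out the `engine.defaultHarness` lookup because that value lives in a config file rather than in flags or environment variables.

```shell
# Sketch of the listed inference rules; NOT the CLI's own code.
# Assumption: rules are evaluated top to bottom, so --sandbox wins.
# The body runs in a subshell ( ... ) so its variables stay local.
infer_mode() (
  sandbox=""
  endpoint="${ARCHAL_ENGINE_ENDPOINT:-${OPENCLAW_URL:-}}"
  harness="${ARCHAL_HARNESS_DIR:-}"

  # Scan the arguments; a flag's value is taken from the next argument.
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --sandbox) sandbox=1 ;;
      --engine-endpoint) endpoint="${2:-}" ;;
      --harness | --harness-dir) harness="${2:-}" ;;
    esac
    shift
  done

  if [ -n "$sandbox" ]; then
    echo "sandbox"
  elif [ -n "$endpoint" ]; then
    echo "api"
  elif [ -n "$harness" ]; then
    echo "harness"
  else
    echo "error: no agent execution mode could be inferred" >&2
    return 1
  fi
)
```

For example, `infer_mode --harness-dir ./my-harness` prints `harness` when none of the endpoint variables are set.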

Exit codes

Code	Meaning
`0`	Run succeeded and score met `--pass-threshold`, or scenario was skipped by `--tag`
`1`	Runtime error, or satisfaction score below `--pass-threshold`
`2`	Validation error (bad flags, missing scenario, invalid config)
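
In a CI script it can help to turn these codes into readable log lines. The helper below is a minimal sketch: the function name `explain_archal_exit` and the message wording are illustrative, and only the code-to-meaning mapping is taken from the table above.

```shell
# Maps an `archal run` exit status to the meanings in the exit code table.
# Function name and message text are illustrative, not part of the CLI.
explain_archal_exit() {
  case "$1" in
    0) echo "ok: run met the pass threshold or was skipped by --tag" ;;
    1) echo "fail: runtime error or satisfaction below threshold" ;;
    2) echo "invalid: bad flags, missing scenario, or invalid config" ;;
    *) echo "unknown exit code: $1" ;;
  esac
}

# Typical use after a run (left commented so the snippet has no side effects):
#   archal run scenario.md --pass-threshold 80
#   explain_archal_exit "$?"
```

Keeping code `1` and code `2` separate lets a pipeline treat a misconfigured job differently from a genuine scenario failure.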

Environment variables

Variable	Description
`ARCHAL_ENGINE_ENDPOINT`	Default API engine endpoint
`ARCHAL_ENGINE_TOKEN`	Default API engine auth token
`ARCHAL_ENGINE_MODEL`	Default engine model identifier
`ARCHAL_ENGINE_TIMEOUT`	Default API engine timeout (seconds)
`ARCHAL_ENGINE_TWIN_URLS`	Default path to remote twin URL override map
`ARCHAL_ENGINE_API_KEY`	API key for the model under test
`OPENCLAW_URL`	Legacy OpenClaw endpoint alias for API mode
`OPENCLAW_GATEWAY_TOKEN`	Legacy OpenClaw token alias for API mode
`OPENCLAW_GATEWAY_PASSWORD`	Legacy OpenClaw password alias for API mode
`OPENCLAW_AGENT_ID`	Legacy OpenClaw agent fallback (prefer `--agent-model` or `ARCHAL_ENGINE_MODEL`)
`ARCHAL_HARNESS_DIR`	Default harness directory for local agent execution
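
As a sketch of how these variables stand in for the matching flags, the snippet below exports the API-mode settings once so later invocations can omit them. The endpoint and model values are copied from the API mode example, and the 120-second timeout is an arbitrary placeholder.

```shell
# Placeholder values; endpoint and model mirror the API mode example.
export ARCHAL_ENGINE_ENDPOINT="https://gateway.openclaw.ai/v1/responses"
export ARCHAL_ENGINE_TOKEN="${OPENCLAW_GATEWAY_TOKEN:-}"
export ARCHAL_ENGINE_MODEL="openclaw:main"
export ARCHAL_ENGINE_TIMEOUT="120"  # seconds; arbitrary example value

# With these exported, the flag defaults listed in the options table apply,
# so --engine-endpoint, --engine-token, and --agent-model can be left off:
#   archal run scenario.md -n 1
```

Flags passed on the command line are still the clearer choice for one-off runs; the variables are mainly convenient in CI, where secrets such as the token are usually injected through the environment.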

Related

Harness configuration: full reference for system prompts, temperature, thinking, and model tuning
OpenClaw setup
How do I run a scenario with a local harness?
How do I run scenarios in CI?
