
What is Archal?

We build pre-deployment testing for AI agents. Just as you wouldn’t hire someone without interviewing them or force-push code to a deployment branch, you shouldn’t let an AI system interact with volatile systems without testing it first.

How it works

Write scenario → Spawn twins → Agent runs → Evaluate → Satisfaction score
  1. Write a scenario in markdown with setup state, expected behavior, and success criteria.
  2. Archal provisions digital twins preloaded with scenario state.
  3. Your agent runs against those twins through MCP-compatible interfaces.
  4. The evaluator scores each run against deterministic and probabilistic criteria.
  5. You review the satisfaction score, per-criterion results, and traces to decide what to improve.
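
To make steps 4 and 5 concrete, here is a minimal Python sketch of how per-criterion results from several runs could be rolled up into a single score. It is not Archal's evaluator: the `CriterionResult` type, the `run_score` and `satisfaction` functions, the 0 to 100 scale, and the plain averaging are all assumptions made for illustration. It assumes a deterministic criterion scores exactly 0.0 or 1.0 and a probabilistic criterion scores anywhere in between.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CriterionResult:
    """Hypothetical result for one success criterion in one run."""
    name: str
    kind: str     # "deterministic" or "probabilistic" (assumed labels)
    score: float  # 0.0 to 1.0; deterministic criteria use only 0.0 or 1.0


def run_score(results: list[CriterionResult]) -> float:
    """Average the criterion scores of a single run (assumed aggregation)."""
    return mean(r.score for r in results)


def satisfaction(runs: list[list[CriterionResult]]) -> float:
    """Average the per-run scores and scale to 0-100 (assumed scale)."""
    return 100 * mean(run_score(results) for results in runs)


if __name__ == "__main__":
    runs = [
        [
            CriterionResult("issue was closed", "deterministic", 1.0),
            CriterionResult("reply was polite", "probabilistic", 0.5),
        ],
        [
            CriterionResult("issue was closed", "deterministic", 0.0),
            CriterionResult("reply was polite", "probabilistic", 1.0),
        ],
    ]
    print(satisfaction(runs))
```

Averaging across repeated runs is one simple way to get a score that reflects how often an agent succeeds rather than whether it succeeded once. The real weighting may differ.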

Key concepts

| Concept | What it means |
| --- | --- |
| Digital twin | A stateful behavioral clone of a real service (not a mock or stub) |
| Scenario | A markdown file describing an agent test: setup, behavior, criteria |
| Satisfaction | A probabilistic score of how well an agent meets scenario criteria |
| Seed | Predefined state used to initialize a twin |
| Trace | Complete record of an agent’s tool calls during a run |
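
The difference between a stateful twin and a stub, and the roles of seed and trace, can be sketched in a few lines of Python. This toy example is purely illustrative and is not Archal's implementation: the `IssueTrackerTwin` class, its tool names, and the seed layout are hypothetical. A stub would return canned responses; here each call changes state that later calls observe, the state starts from a seed, and every call is appended to a trace.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class IssueTrackerTwin:
    """Hypothetical toy twin of an issue tracker (illustration only)."""
    state: dict
    trace: list = field(default_factory=list)

    @classmethod
    def from_seed(cls, seed: dict) -> "IssueTrackerTwin":
        # Copy the seed so every run starts from the same predefined state.
        return cls(state=copy.deepcopy(seed))

    def call(self, tool: str, **args):
        """Execute one tool call against the twin and record it in the trace."""
        issues = self.state["issues"]
        if tool == "create_issue":
            issue_id = max(issues, default=0) + 1
            issues[issue_id] = {"title": args["title"], "open": True}
            result = {"id": issue_id}
        elif tool == "close_issue":
            issue = issues.get(args["id"])
            if issue is None:
                result = {"error": "not found"}
            else:
                issue["open"] = False
                result = {"id": args["id"], "open": False}
        elif tool == "list_open_issues":
            result = sorted(i for i, v in issues.items() if v["open"])
        else:
            result = {"error": f"unknown tool {tool}"}
        self.trace.append({"tool": tool, "args": args, "result": result})
        return result


if __name__ == "__main__":
    seed = {"issues": {1: {"title": "Login fails", "open": True}}}
    twin = IssueTrackerTwin.from_seed(seed)
    twin.call("create_issue", title="Add dark mode")
    twin.call("close_issue", id=1)
    print(twin.call("list_open_issues"))
    print(len(twin.trace))
```

Because the twin copies its seed, the same scenario can be replayed from an identical starting state, and the trace gives an evaluator the full sequence of calls to inspect.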

Next steps

Quickstart

Get up and running