GitHub Issue Triage (LIVE)
Tests agent ability to label, prioritize, and close issues based on content analysis.
Twins: GitHub | 5 scenarios
Pre-built scenario suites that measure agent reliability across multi-step workflows. Each suite runs scenarios multiple times and reports a satisfaction score.
Tests agent ability to categorize support messages and route them to appropriate channels.
Tests coordinated agent workflows across Slack alerting and GitHub issue creation.
Tests agent ability to manage subscriptions, handle failed payments, and issue refunds.
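The page does not say how the satisfaction score is computed. As a rough illustration of running each scenario several times and reporting one number, the sketch below assumes the score is the fraction of passing runs, averaged across the scenarios in a suite. The function name `satisfaction_score`, the `runs` parameter, and the scenario callables are hypothetical and are not part of any documented API.

```python
from statistics import mean
from typing import Callable, Dict, List


def satisfaction_score(scenarios: Dict[str, Callable[[], bool]], runs: int = 5) -> float:
    """Run every scenario `runs` times and return the mean pass rate in [0, 1].

    Assumption: a scenario is a callable that returns True when the agent's
    run is judged satisfactory. The real scoring method may differ.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    pass_rates: List[float] = []
    for _name, scenario in scenarios.items():
        # Count how many of the repeated runs passed for this scenario.
        passes = sum(1 for _ in range(runs) if scenario())
        pass_rates.append(passes / runs)
    # An empty suite has nothing to score; report 0.0 rather than failing.
    return mean(pass_rates) if pass_rates else 0.0


if __name__ == "__main__":
    # Hypothetical stand-ins for real scenarios, with fixed outcomes.
    suite = {
        "label_bug_report": lambda: True,
        "close_duplicate_issue": lambda: False,
    }
    print(satisfaction_score(suite, runs=4))
```

Repeating each run matters because agent behavior is typically non-deterministic: a single pass or fail says little about reliability, while a pass rate over several runs gives a steadier signal.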