Runs

Path: Trust Lab → Runs (fourth icon)

Runs is the execution log: every time a suite or scenario is tested, the result is recorded here.

[Figure: Run summary cards showing passed, failed, total, and pending counts. Example values: Passed 21 runs, Failed 7 runs, Total 29 (this page), Pending / Running 1.]

| Card | Description |
| --- | --- |
| Passed | Total runs where all scenarios passed |
| Failed | Total runs with assertion failures or errors |
| Total | All runs on the current page |
| Pending/Running | Runs still in progress |
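
To make the relationship between the four cards concrete, here is a minimal sketch of how such counts could be tallied from a list of run records. The `Run` class, the `summarize` function, and the outcome strings (`"PASSED"`, `"FAILED"`, `"PENDING"`, `"RUNNING"`) are assumptions for illustration only, not part of any documented Trust Lab API. The sample data simply mirrors the example values shown on the cards.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical record shape; field names are assumptions.
    suite: str
    outcome: str  # "PASSED", "FAILED", "PENDING", or "RUNNING"


def summarize(runs):
    """Tally the four summary-card values for the runs on one page."""
    counts = Counter(run.outcome for run in runs)
    return {
        "passed": counts["PASSED"],
        "failed": counts["FAILED"],
        # The Pending/Running card covers every run still in progress.
        "pending": counts["PENDING"] + counts["RUNNING"],
        "total": len(runs),
    }


# Illustrative data matching the example card values: 21 + 7 + 1 = 29.
runs = (
    [Run("pricing", "PASSED")] * 21
    + [Run("pricing", "FAILED")] * 7
    + [Run("onboarding", "RUNNING")]
)
summary = summarize(runs)
print(summary)
```

Under these assumptions, Total is always the sum of the other three cards, because every run on the page is in exactly one of those states.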

Each row represents a single suite or scenario execution:

| Column | Description |
| --- | --- |
| Suite / Scenario | What was tested |
| Outcome | Passed or Failed |
| Pass Rate | e.g., “3/3 passed” |
| Trigger | How it was initiated: MANUAL or SCHEDULED |
| Started | Exact timestamp |

Use the Suite, Scenario, and Outcome filters to isolate specific failures or track a scenario’s history over time.
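
The filtering described above can be sketched in code. This is only an illustration of the logic, assuming a hypothetical `RunRow` record whose fields mirror the table columns; the class, the `filter_rows` helper, and the sample rows are invented for this example and do not reflect an actual export format or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunRow:
    # Hypothetical row shape mirroring the Runs table columns.
    suite: str
    scenario: str
    outcome: str   # "Passed" or "Failed"
    passed: int    # evals that passed in this run
    total: int     # evals attempted in this run
    trigger: str   # "MANUAL" or "SCHEDULED"
    started: str   # ISO 8601 timestamp, so string order is time order

    @property
    def pass_rate(self) -> str:
        return f"{self.passed}/{self.total} passed"


def filter_rows(
    rows: List[RunRow],
    suite: Optional[str] = None,
    scenario: Optional[str] = None,
    outcome: Optional[str] = None,
) -> List[RunRow]:
    """Keep rows matching every filter that is set; None means 'any'."""
    return [
        row for row in rows
        if (suite is None or row.suite == suite)
        and (scenario is None or row.scenario == scenario)
        and (outcome is None or row.outcome == outcome)
    ]


# Made-up sample rows for illustration.
rows = [
    RunRow("Pricing", "Agent explains pricing tiers", "Failed", 1, 3,
           "SCHEDULED", "2024-05-02T09:00:00Z"),
    RunRow("Pricing", "Agent explains pricing tiers", "Passed", 3, 3,
           "MANUAL", "2024-05-01T09:00:00Z"),
    RunRow("Onboarding", "Agent greets new user", "Passed", 2, 2,
           "MANUAL", "2024-05-03T09:00:00Z"),
]

# Isolate failures across all suites.
failures = filter_rows(rows, outcome="Failed")

# Track one scenario's history, oldest run first.
history = sorted(
    filter_rows(rows, scenario="Agent explains pricing tiers"),
    key=lambda row: row.started,
)
for row in history:
    print(row.started, row.outcome, row.pass_rate)
```

Combining the Scenario filter with a sort on Started is what turns the log into a history: in the sample data, the scenario passed on the first run and regressed on the next.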

[Figure: Failed eval comparison showing expected criteria versus the agent's actual response.]

FAILED: LLM-as-Judge, "Agent explains pricing tiers" (Turn 2 of 4)

Eval criteria: Agent explains that pricing has three tiers: Free, Pro ($49/mo), and Enterprise (custom). Must mention annual discount of 20%. Should offer to navigate user to the pricing page.

Agent response: "We offer several pricing options. Our Pro plan is $49 per month and includes unlimited agents. Would you like to know more?"

Missing:
• Free tier not mentioned
• No annual discount
• No pricing page navigation

Click any row to open the run detail. You’ll see each turn with the agent’s actual response alongside the eval criteria. Failed evals are highlighted in red, showing exactly which assertion didn’t pass and what the agent said instead.