Path: Trust Lab → Runs (fourth icon)
Runs is the execution log: each time scenarios are tested, the result is recorded here.
Summary cards
| Card | Description |
|---|---|
| Passed | Number of runs in which every scenario passed |
| Failed | Number of runs with at least one assertion failure or error |
| Total | All runs on the current page |
| Pending/Running | Runs that are still in progress |
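The relationship between the four cards can be sketched as a simple tally. This is a minimal illustration, not Trust Lab's implementation. It assumes all four cards count the same list of runs (the current page), that a finished run counts as Passed only when every scenario passed and no error occurred, and that in-progress runs are counted separately. The `Run` record and its field names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical run record; field names are illustrative, not the Trust Lab schema.
    status: str            # "completed", "pending", or "running"
    scenarios_passed: int
    scenarios_total: int
    errored: bool = False  # True if the run hit an error rather than a clean result


def summarize(runs):
    """Tally runs into the four summary-card buckets (assumed semantics)."""
    counts = Counter()
    for run in runs:
        counts["total"] += 1
        if run.status in ("pending", "running"):
            # Still in progress: not yet Passed or Failed.
            counts["pending_running"] += 1
        elif run.errored or run.scenarios_passed < run.scenarios_total:
            # Any failed assertion or error makes the whole run Failed.
            counts["failed"] += 1
        else:
            counts["passed"] += 1
    # Counter returns 0 for buckets that never occurred.
    return {key: counts[key] for key in ("passed", "failed", "total", "pending_running")}
```

For example, a page with one clean 3/3 run, one 2/3 run, and one run still executing would show Passed 1, Failed 1, Pending/Running 1, and Total 3.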
Run list
Each row represents a single suite or scenario execution:
| Column | Description |
|---|---|
| Suite / Scenario | The suite or scenario that was run |
| Outcome | Passed or Failed |
| Pass Rate | Number passed out of the total, e.g., “3/3 passed” |
| Trigger | How the run was initiated: MANUAL or SCHEDULED |
| Started | Timestamp when the run started |
Use the Suite, Scenario, and Outcome filters to isolate specific failures or track a scenario’s history over time.
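Conceptually, the filters narrow the run list to rows matching every selected value. The sketch below models that on plain Python records, for instance if you were analyzing run data outside the UI. It is an illustration under assumptions: the `RunRow` type and `filter_runs` helper are hypothetical, filters are assumed to combine with AND, and results are sorted oldest first so a scenario's history reads chronologically.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RunRow:
    # Hypothetical row mirroring the Run list columns; not a Trust Lab API type.
    suite: str
    scenario: str
    outcome: str      # "Passed" or "Failed"
    passed: int
    total: int
    trigger: str      # "MANUAL" or "SCHEDULED"
    started: datetime

    @property
    def pass_rate(self):
        # Formatted like the Pass Rate column, e.g. "3/3 passed".
        return f"{self.passed}/{self.total} passed"


def filter_runs(rows, suite=None, scenario=None, outcome=None):
    """Keep rows matching every filter that is set (None means 'any')."""
    selected = [
        r for r in rows
        if (suite is None or r.suite == suite)
        and (scenario is None or r.scenario == scenario)
        and (outcome is None or r.outcome == outcome)
    ]
    # Oldest first, so a single scenario's runs read as a timeline.
    return sorted(selected, key=lambda r: r.started)
```

Filtering on one scenario gives its history over time; filtering on `outcome="Failed"` isolates failures across suites.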
Drilling into a failed run
Click any row to open the run detail. You’ll see each turn with the agent’s actual response alongside the eval criteria. Failed evals are highlighted in red, showing exactly which assertion didn’t pass and what the agent said instead.