Trust Lab

[Figure: Trust Lab overview showing test suites, a sample scenario with its evals, run results, and schedules]

Trust Lab is Foldspace’s test automation module. It lets you define conversations the agent must handle correctly, then run them automatically to catch regressions before they reach users. Access it by selecting Trust Lab from the top-left app switcher.

A scenario is a single test conversation plus the evals that grade it. A suite groups related scenarios into the unit you run and track. Runs is the execution log, Schedules automate those runs, and the Dashboard is the health view across all of it.

flowchart LR
  SC[Scenarios<br/>test conversations]
  TS[Test Suites<br/>grouped scenarios]
  RU[Runs<br/>execution log]
  SCH[Schedules<br/>automated runs]
  DA[Dashboard<br/>health view]
  SC --> TS
  TS --> RU
  SCH -- "triggers" --> RU
  RU -- "feeds" --> DA
  DA -- "failures surface" --> SC
  • Dashboard: the real-time health view of your test coverage.
  • Test Suites: groups of scenarios you run, schedule, and track together.
  • Scenarios: individual test conversations and the evals that grade them.
  • Runs: the execution log, where you drill into failures.
  • Schedules: automated runs so regressions surface on their own.
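
The relationship between scenarios, evals, and suites can be pictured with a small data model. This is only an illustrative sketch, not Foldspace's API: the class names (`Transcript`, `Eval`, `Scenario`, `Suite`) and the sample order-lookup values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Transcript:
    """What the agent produced for one test conversation (hypothetical shape)."""
    reply: str
    actions_called: list[str] = field(default_factory=list)


@dataclass
class Eval:
    """A single named check that grades a transcript as pass or fail."""
    name: str
    check: Callable[[Transcript], bool]


@dataclass
class Scenario:
    """One test conversation plus the evals that grade it."""
    name: str
    user_message: str
    evals: list[Eval]

    def grade(self, transcript: Transcript) -> dict[str, bool]:
        # Run every eval and report the outcome by eval name.
        return {e.name: e.check(transcript) for e in self.evals}


@dataclass
class Suite:
    """A group of scenarios that are run and tracked together."""
    name: str
    scenarios: list[Scenario]


# Hypothetical scenario: a text eval and a tool-use eval.
order_lookup = Scenario(
    name="Order lookup",
    user_message="Where's my order?",
    evals=[
        Eval("Contains order number", lambda t: "#1234" in t.reply),
        Eval("Called lookupOrder action", lambda t: "lookupOrder" in t.actions_called),
    ],
)
sanity = Suite("Sanity Suite", [order_lookup])

transcript = Transcript(reply="I found order #1234", actions_called=["lookupOrder"])
results = order_lookup.grade(transcript)
print(results)
```

Keeping each eval as a separate named check means a failed run can point at the exact assertion that broke rather than at the scenario as a whole.
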
  1. Import a real conversation: go to Scenarios → + New Scenario → Import Conversation, and pick a production conversation that represents an important user journey.
  2. Review and refine the evals: check the auto-generated evals and tighten any that are too vague. Add tool-use evals if the scenario involves actions or navigation.
  3. Group into a suite: add the scenario to a Sanity or Regression suite.
  4. Run manually once: click Run Suite to confirm everything passes before activating automation.
  5. Set a daily schedule: Schedules → + Add Schedule → daily at 08:00, scoped to your suite.
  6. Check the Dashboard each morning: if the pass rate drops, open Runs to find the failed assertion and fix the underlying Knowledge, action, or navigation config.
  7. Repeat for every new bug: any time a real user surfaces unexpected behavior, import that conversation as a new scenario.
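
The morning check in step 6 comes down to comparing the latest run against the previous one and listing which assertions failed. The sketch below illustrates that logic under stated assumptions: the `AssertionResult` record, the helper names, and the sample data are hypothetical, and the pass rate is computed per eval here, which may differ from how Trust Lab computes it.

```python
from dataclasses import dataclass


@dataclass
class AssertionResult:
    """Outcome of one eval in one run (hypothetical record shape)."""
    scenario: str
    eval_name: str
    passed: bool


def pass_rate(results: list[AssertionResult]) -> float:
    """Fraction of evals that passed; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


def failed_assertions(results: list[AssertionResult]) -> list[tuple[str, str]]:
    """(scenario, eval) pairs to drill into, in run order."""
    return [(r.scenario, r.eval_name) for r in results if not r.passed]


def regressed(previous: list[AssertionResult], current: list[AssertionResult]) -> bool:
    """True when the current run's pass rate is lower than the previous run's."""
    return pass_rate(current) < pass_rate(previous)


# Illustrative data only: two scenarios with two evals each.
yesterday = [
    AssertionResult("Order lookup", "Contains order number", True),
    AssertionResult("Order lookup", "Called lookupOrder action", True),
    AssertionResult("Refund request", "Mentions refund policy", True),
    AssertionResult("Refund request", "Called createRefund action", True),
]
today = [
    AssertionResult("Order lookup", "Contains order number", True),
    AssertionResult("Order lookup", "Called lookupOrder action", True),
    AssertionResult("Refund request", "Mentions refund policy", True),
    AssertionResult("Refund request", "Called createRefund action", False),
]

if regressed(yesterday, today):
    for scenario, eval_name in failed_assertions(today):
        print(f"Investigate: {scenario} / {eval_name}")
```

A failed tool-use eval like the one in this sample would suggest looking at the action configuration first, while a failed text eval would more likely point at Knowledge.
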