Evals

The Evaluations tab is where you test whether a user’s natural-language prompt is correctly translated into the action you’ve defined, and whether the right input parameters are extracted according to the schema.

Use it to:

  • Confirm the AI routes the user intent to the correct action.
  • Check that the schema fields (parameters) are filled correctly from the prompt.
  • Validate how the agent handles the server response using the mock response.

To generate test prompts:

  1. Open the Evaluations tab.
  2. Click Generate Prompts to auto-create a set of test prompts based on your action schema.

Example (for Create Campaign action):

  • “Create a new campaign named ‘Summer Sale’ starting from July 1 to July 31 with a budget of 5000 and status ACTIVE.”
  • “I want to create a campaign called ‘Holiday Promo’.”
  • “Create campaign starting on August 1 with a budget of 2000 and status PAUSED.”

You can also click + Add New Test to write your own custom prompt.

To run a test:

  1. Click Run next to a test prompt.
  2. The agent will process the input and attempt to:
    • Match the correct action.
    • Extract values for each field in the Schema.
    • Return the Response Mock you defined.

On the right-hand side, you’ll see the Agent Testing output.

Example:

Campaign Created: Spring Sale
Name: Spring Sale
Start Date: 2024-05-01
End Date: 2024-05-31
Budget: 10,000
Status: ACTIVE

In the Agent Testing panel, click the action link (e.g., Create Campaign was executed).

This opens the Arguments view, which shows the raw schema extraction:

{
  "startDate": "2024-05-01",
  "endDate": "2024-05-31",
  "budget": 10000,
  "status": "ACTIVE",
  "campaignName": "Spring Sale"
}

This lets you confirm that the user’s wording (e.g., “budget of 10k”) is mapped to the correct structured schema field (here, budget: 10000).
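If you want to check the extracted arguments more rigorously than by eye, one option is to copy the JSON from the Arguments view and compare it against the values you expected. The sketch below is only an illustration: the Evaluations tab is not assumed to expose any Python API, and the helper name diff_arguments is hypothetical.

```python
import json

# Arguments copied by hand from the Arguments view (the Spring Sale example).
extracted = json.loads("""
{
  "startDate": "2024-05-01",
  "endDate": "2024-05-31",
  "budget": 10000,
  "status": "ACTIVE",
  "campaignName": "Spring Sale"
}
""")

# Values you expect the prompt to produce, written by hand for this test.
expected = {
    "campaignName": "Spring Sale",
    "startDate": "2024-05-01",
    "endDate": "2024-05-31",
    "budget": 10000,  # "budget of 10k" should become the number 10000
    "status": "ACTIVE",
}


def diff_arguments(expected, actual):
    """Map each mismatched field to an (expected, actual) pair.

    A field that is absent on one side is reported as None on that side.
    """
    mismatches = {}
    for field in sorted(set(expected) | set(actual)):
        want = expected.get(field)
        got = actual.get(field)
        if want != got:
            mismatches[field] = (want, got)
    return mismatches


# An empty dict means every field was extracted as expected.
print(diff_arguments(expected, extracted))
```

A mismatch such as budget coming back as the string "10k" instead of the number 10000 would appear in the result as a (expected, actual) pair, which usually points to a schema description that needs tightening.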

Create tests that cover:

  • All fields provided, required and optional (happy path).
  • Only required fields provided (minimal input).
  • Missing required fields (should fail validation).
  • Partial optional fields (some extras given, others missing).
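The coverage list above can be sketched as a small table of cases. This is a hedged illustration, not part of the product: it assumes campaignName is the only required field (an inference from the sample prompts, so adjust it to your own schema), and the names REQUIRED and missing_required are hypothetical.

```python
# Assumption: only campaignName is required. Change this to match your schema.
REQUIRED = {"campaignName"}


def missing_required(arguments):
    """Return the sorted list of required fields absent from the arguments."""
    return sorted(REQUIRED - set(arguments))


# One set of expected arguments per coverage case.
cases = {
    "happy path": {
        "campaignName": "Spring Sale",
        "startDate": "2024-05-01",
        "endDate": "2024-05-31",
        "budget": 10000,
        "status": "ACTIVE",
    },
    "minimal input": {"campaignName": "Holiday Promo"},
    "missing required": {"budget": 2000, "status": "PAUSED"},
    "partial optional": {"campaignName": "Holiday Promo", "budget": 2000},
}

for name, arguments in cases.items():
    missing = missing_required(arguments)
    verdict = "should fail validation" if missing else "should pass"
    print(f"{name}: {verdict} {missing}")
```

Under this assumption, only the "missing required" case should fail validation. If the agent accepts it anyway, or rejects one of the others, review the required flags and field descriptions in the schema.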

Update your schema descriptions or instructions if the AI is misinterpreting user prompts.

Always re-run evaluations after editing the schema or response mock.

Use Evaluations before publishing to ensure your action works reliably across different ways a user might phrase their request.