Evals

The Evaluations tab is where you test whether a user’s natural-language prompt is routed to the action you’ve defined, and whether the right input parameters are extracted according to that action’s schema.

Goal

Confirm that the AI routes the user’s intent to the correct action.
Check that the schema fields (parameters) are filled correctly from the prompt.
Validate how the agent handles the server response, using the Response Mock in place of a real server call.

Generating Test Prompts

Open the Evaluations tab.
Click Generate Prompts to auto-create a set of test prompts based on your action schema.

Example (for a Create Campaign action):

“Create a new campaign named ‘Summer Sale’ starting from July 1 to July 31 with a budget of 5000 and status ACTIVE.”
“I want to create a campaign called ‘Holiday Promo’.”
“Create campaign starting on August 1 with a budget of 2000 and status PAUSED.”

You can also click + Add New Test to write your own custom prompt.
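To see which parts of the schema a set of prompts exercises, it can help to write the schema down and annotate each prompt with the fields it mentions. The sketch below is a minimal, hypothetical Python version. The field names come from the Arguments view example on this page, but the JSON Schema layout, the ACTIVE/PAUSED enum, and the choice of campaignName as the only required field are assumptions, not the product’s actual schema format.

```python
from typing import Dict, List, Set

# Hypothetical Create Campaign schema written as a JSON Schema-style dict.
# Field names match the Arguments view example; "required" and the status
# enum are ASSUMPTIONS made for this sketch.
CREATE_CAMPAIGN_SCHEMA = {
    "type": "object",
    "properties": {
        "campaignName": {"type": "string", "description": "Display name of the campaign"},
        "startDate": {"type": "string", "format": "date", "description": "Start date, YYYY-MM-DD"},
        "endDate": {"type": "string", "format": "date", "description": "End date, YYYY-MM-DD"},
        "budget": {"type": "number", "description": "Total budget as a plain number"},
        "status": {"type": "string", "enum": ["ACTIVE", "PAUSED"]},
    },
    "required": ["campaignName"],
}

# Schema fields each example prompt mentions, annotated by hand.
PROMPT_FIELDS: Dict[str, Set[str]] = {
    "Create a new campaign named 'Summer Sale' starting from July 1 to July 31 "
    "with a budget of 5000 and status ACTIVE.":
        {"campaignName", "startDate", "endDate", "budget", "status"},
    "I want to create a campaign called 'Holiday Promo'.": {"campaignName"},
    "Create campaign starting on August 1 with a budget of 2000 and status PAUSED.":
        {"startDate", "budget", "status"},
}


def missing_required(fields: Set[str], schema: dict) -> List[str]:
    """Return the required schema fields that a prompt does not supply."""
    return [name for name in schema["required"] if name not in fields]


def coverage_report(prompt_fields: Dict[str, Set[str]], schema: dict) -> Dict[str, List[str]]:
    """Map each prompt to the required fields it leaves out (empty list = complete)."""
    return {prompt: missing_required(fields, schema) for prompt, fields in prompt_fields.items()}


if __name__ == "__main__":
    for prompt, missing in coverage_report(PROMPT_FIELDS, CREATE_CAMPAIGN_SCHEMA).items():
        print(f"{missing or 'complete'} <- {prompt[:45]}...")
```

Under these assumptions, the third example prompt omits the campaign name, which makes it a natural missing-field test.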

Running a Test

Click Run next to a test prompt.
The agent will process the input and attempt to:
- Match the correct action.
- Extract values for each field in the Schema.
- Return the Response Mock you defined.
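The three steps can be pictured as a small test harness. This is a sketch under stated assumptions, not how the product implements test runs. ToolCall, RunResult, run_test, the action identifier createCampaign, and the response mock contents are all hypothetical, and the model call is replaced by a stub (fake_model) so the code runs offline. The sketch also assumes the mock is returned only when the action matches and every required field was extracted.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ToolCall:
    """What the model produced for one prompt (hypothetical shape)."""
    action: str
    arguments: Dict[str, Any]


@dataclass
class RunResult:
    action_matched: bool
    missing_fields: List[str]
    response: Optional[Dict[str, Any]]


def run_test(
    prompt: str,
    expected_action: str,
    required_fields: List[str],
    response_mock: Dict[str, Any],
    call_model: Callable[[str], ToolCall],
) -> RunResult:
    """Mirror the three steps of a test run with a stubbed model."""
    call = call_model(prompt)                      # 1. route the prompt to an action
    matched = call.action == expected_action
    # 2. check that every required schema field was extracted
    missing = [f for f in required_fields if f not in call.arguments]
    # 3. return the mock instead of calling a server (assumed: only on success)
    response = response_mock if matched and not missing else None
    return RunResult(matched, missing, response)


def fake_model(prompt: str) -> ToolCall:
    """Stand-in for the agent: returns the Spring Sale extraction for any prompt."""
    return ToolCall(
        action="createCampaign",
        arguments={"campaignName": "Spring Sale", "startDate": "2024-05-01",
                   "endDate": "2024-05-31", "budget": 10000, "status": "ACTIVE"},
    )


if __name__ == "__main__":
    result = run_test(
        "Create a campaign called Spring Sale in May with a budget of 10k",
        expected_action="createCampaign",
        required_fields=["campaignName"],
        response_mock={"message": "Campaign Created: Spring Sale"},
        call_model=fake_model,
    )
    print(result)
```

Passing the model in as a function keeps the harness logic testable without a live agent; swapping fake_model for a real call would not change the three checks.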

Reviewing Results

On the right-hand side, you’ll see the Agent Testing output.

Example:

Campaign Created: Spring Sale
Name: Spring Sale
Start Date: 2024-05-01
End Date: 2024-05-31
Budget: 10,000
Status: ACTIVE

In the Agent Testing panel, click the action link (for example, “Create Campaign was executed”).

This opens the Arguments view, which shows the raw arguments extracted according to the schema:

{
  "startDate": "2024-05-01",
  "endDate": "2024-05-31",
  "budget": 10000,
  "status": "ACTIVE",
  "campaignName": "Spring Sale"
}

This lets you confirm that the user’s wording is mapped to structured schema fields; for example, “budget of 10k” should become "budget": 10000.
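If you keep expected values for a test, a short script can compare them with the Arguments view. The sketch below assumes you copy the JSON from the Arguments view and write the expected values yourself. parse_amount is a hypothetical helper that mimics the “10k” to 10000 normalization for your expectations; it is not the agent’s actual extraction logic.

```python
import json
import re
from typing import Any, Dict, Tuple

# Suffix multipliers accepted by the hypothetical parse_amount helper.
_MULTIPLIERS = {"": 1, "k": 1_000, "m": 1_000_000}


def parse_amount(text: str) -> float:
    """Turn user-style amounts such as '10k', '10,000' or '2.5M' into a number."""
    match = re.fullmatch(r"\s*([\d,]*\.?\d+)\s*([kKmM]?)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised amount: {text!r}")
    number = float(match.group(1).replace(",", ""))
    return number * _MULTIPLIERS[match.group(2).lower()]


def diff_arguments(expected: Dict[str, Any], actual: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (expected, actual)} for every field that differs or is missing."""
    fields = sorted(expected.keys() | actual.keys())
    return {f: (expected.get(f), actual.get(f)) for f in fields if expected.get(f) != actual.get(f)}


# JSON copied from the Arguments view example.
ARGUMENTS_VIEW = """
{
  "startDate": "2024-05-01",
  "endDate": "2024-05-31",
  "budget": 10000,
  "status": "ACTIVE",
  "campaignName": "Spring Sale"
}
"""

actual = json.loads(ARGUMENTS_VIEW)
expected = {
    "campaignName": "Spring Sale",
    "startDate": "2024-05-01",
    "endDate": "2024-05-31",
    "budget": parse_amount("10k"),  # 10000.0 compares equal to 10000
    "status": "ACTIVE",
}
print(diff_arguments(expected, actual))  # prints {}
```

An empty result means every field matched; any mismatch is reported with both values so you can see whether the schema description or the prompt needs work.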

Best Practices

Create tests that cover:

All fields provided, required and optional (happy path).
Only required fields provided (minimal input).
Missing required fields (should fail validation).
Partial optional fields (some optional fields given, others omitted).
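One way to keep these four categories consistent is to derive them from a single complete set of arguments. In this hypothetical sketch, SAMPLE reuses the Spring Sale values from the Arguments view, and treating campaignName as the only required field is an assumption; build_cases and should_pass are illustrative helpers, not part of the Evaluations tab.

```python
from typing import Any, Dict, List

# Spring Sale values from the Arguments view example.
SAMPLE: Dict[str, Any] = {
    "campaignName": "Spring Sale",
    "startDate": "2024-05-01",
    "endDate": "2024-05-31",
    "budget": 10000,
    "status": "ACTIVE",
}
REQUIRED: List[str] = ["campaignName"]  # ASSUMPTION for this sketch


def build_cases(sample: Dict[str, Any], required: List[str]) -> Dict[str, Dict[str, Any]]:
    """Derive one argument set per best-practice category (needs at least one required field)."""
    optional = [f for f in sample if f not in required]
    return {
        # All fields, required and optional (happy path).
        "happy_path": dict(sample),
        # Only the required fields (minimal input).
        "minimal_input": {f: sample[f] for f in required},
        # Drop the first required field; this case should fail validation.
        "missing_required": {f: v for f, v in sample.items() if f != required[0]},
        # Required fields plus roughly half of the optional ones.
        "partial_optional": {f: sample[f] for f in required + optional[: len(optional) // 2]},
    }


def should_pass(args: Dict[str, Any], required: List[str]) -> bool:
    """A case is expected to pass validation only if every required field is present."""
    return all(f in args for f in required)


if __name__ == "__main__":
    for name, args in build_cases(SAMPLE, REQUIRED).items():
        verdict = "pass" if should_pass(args, REQUIRED) else "fail"
        print(f"{name:17} expect {verdict}: {sorted(args)}")
```

Each argument set can then be phrased as a natural-language prompt and added with + Add New Test.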

If the AI misinterprets user prompts, refine your schema field descriptions or instructions.

Always re-run evaluations after editing the schema or response mock.
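If you note which tests behaved as expected before and after an edit, comparing the two runs shows what the edit broke or fixed. A minimal sketch, assuming you record the results yourself as test-name to pass/fail pairs; compare_runs and the test names below are hypothetical.

```python
from typing import Dict, List


def compare_runs(before: Dict[str, bool], after: Dict[str, bool]) -> Dict[str, List[str]]:
    """Split tests into regressions (passed before, not after) and fixes (the reverse)."""
    regressions = sorted(n for n, ok in before.items() if ok and not after.get(n, False))
    fixes = sorted(n for n, ok in after.items() if ok and not before.get(n, False))
    return {"regressions": regressions, "fixes": fixes}


# Hypothetical results recorded before and after editing the schema.
before = {"happy_path": True, "minimal_input": True, "budget_10k": False}
after = {"happy_path": True, "minimal_input": False, "budget_10k": True}
print(compare_runs(before, after))
# prints {'regressions': ['minimal_input'], 'fixes': ['budget_10k']}
```

A test missing from the second run counts as a regression here, so deleted tests do not silently disappear from the comparison.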

Use Evaluations before publishing to ensure your action works reliably across different ways a user might phrase their request.