Voice

The Voice tab under Style and Behavior in Agent Studio controls how your agent listens and speaks. Pick the mode, the voice style, and the speaking behavior that fit your product.

There are two modes:

Speech to Text: the user speaks, the agent listens (input only).
Voice Chat: the agent also talks back (two-way conversation).

Enable Speech to Text

Toggle this on or off.

On: users can talk to the agent, and their speech is transcribed into text input.
Off: users can only type.

This setting only captures input. The agent still responds with text on screen.

Enable Voice Chat

Toggle this on or off.

On: the agent replies with synthesized voice, in addition to text.
Off: the agent responds only in text, even if Speech to Text is enabled.

Voice Chat requires the Pro plan.
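
The way the two toggles combine can be sketched in a few lines of Python. This is an illustration only: VoiceSettings, input_channels, and output_channels are hypothetical names, not part of any Agent Studio API, and the sketch does not model plan requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical model of the two Voice tab toggles (not a real API)."""
    speech_to_text: bool = False
    voice_chat: bool = False


def input_channels(settings: VoiceSettings) -> list[str]:
    # Typing is always available; speech is added when Speech to Text is on.
    return ["text", "speech"] if settings.speech_to_text else ["text"]


def output_channels(settings: VoiceSettings) -> list[str]:
    # Text is always shown; synthesized voice is added when Voice Chat is on.
    return ["text", "voice"] if settings.voice_chat else ["text"]
```

With Speech to Text on and Voice Chat off, input_channels includes speech while output_channels stays text-only, which matches the Off behavior described above.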

Choose a voice

Set the Voice Chat Type, the voice the agent uses for spoken responses.

Alloy (default): neutral, professional, multi-purpose.
Additional voices may be available depending on your plan.
Click the speaker icon to preview the selected voice.

Set voice instructions

Provide optional custom guidance for how the agent should sound.

Examples:

“Use a friendly, upbeat tone.”
“Speak slowly and clearly, with short sentences.”
“Adopt a formal, executive style suitable for leadership updates.”

Best practices

Clarify the mode: enable Speech to Text for voice input and Voice Chat for voice output.
Match your audience: for enterprise, keep it professional and concise. For consumer apps, use a warmer, more conversational tone.
Test voice speed: faster speech saves time, slower speech improves clarity.
Document tone and style in Voice Instructions to keep experiences consistent.
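
One way to keep Voice Instructions consistent across agents is to store the agreed wording in a single place and paste it from there. The sketch below assumes a team maintains its own presets. The preset names, the wording, and the voice_instructions_for helper are all hypothetical and are not supplied by the product.

```python
# Hypothetical presets; the wording is illustrative, not product-supplied.
VOICE_INSTRUCTION_PRESETS = {
    "enterprise": "Keep a professional tone. Be concise and use short sentences.",
    "consumer": "Use a warm, conversational tone.",
}


def voice_instructions_for(audience: str) -> str:
    """Return the shared instruction text for an audience, or raise if unknown."""
    try:
        return VOICE_INSTRUCTION_PRESETS[audience]
    except KeyError:
        raise ValueError(f"No voice instruction preset for audience: {audience!r}") from None
```

The returned string is meant to be pasted into the Voice Instructions field. Raising on an unknown audience avoids silently falling back to wording that nobody agreed on.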