Voice Call with an Agent
Talk to your agent in real time using voice instead of text. The platform supports live voice and video calls powered by LiveKit and real-time speech processing, so you can have natural spoken conversations with your agents -- including multi-agent group calls.
Voice calls are currently audio-only. Transcripts are not saved automatically, so nothing said during a call appears as text messages in the conversation history.
Prerequisites
- An XpressAI Platform account with at least one agent deployed
- A working microphone (and optionally a webcam)
- A modern browser with microphone permissions enabled
Steps
1. Open a conversation
Navigate to Conversations in the sidebar and open an existing conversation with your agent, or start a new one.
2. Start the call
Click the call button in the conversation toolbar. The platform creates a LiveKit room and connects your agent automatically. Your browser will ask for microphone (and optionally camera) permissions -- grant them to proceed.
Once the connection is established, you'll see the call interface with your agent shown as a participant.
3. Talk naturally
Speak as you normally would. The platform uses real-time speech processing:
- Your voice is captured and streamed to the agent.
- The agent processes your speech and generates a spoken response.
- The response plays back through your speakers in real time.
There's no need to press a button to talk -- the system handles turn detection automatically.
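The platform does not describe how its turn detection works internally. Purely to illustrate the general idea, the sketch below treats a turn as finished once speech has been heard and is followed by a short run of quiet audio frames. The function name, the per-frame energy values, and both thresholds are hypothetical and are not part of the platform.

```python
from typing import Iterable, Optional


def detect_end_of_turn(
    frame_energies: Iterable[float],
    speech_threshold: float = 0.1,
    silence_frames_needed: int = 3,
) -> Optional[int]:
    """Return the index of the frame at which the turn is considered over.

    Hypothetical illustration only: a turn ends once speech has been heard
    and is followed by `silence_frames_needed` consecutive quiet frames.
    Returns None if the turn does not end within the given frames.
    """
    heard_speech = False
    quiet_run = 0
    for index, energy in enumerate(frame_energies):
        if energy >= speech_threshold:
            # Loud frame: the speaker is (still) talking, so reset the silence count.
            heard_speech = True
            quiet_run = 0
        elif heard_speech:
            # Quiet frame after speech: count towards the end-of-turn timeout.
            quiet_run += 1
            if quiet_run >= silence_frames_needed:
                return index
    return None
```

A brief dip in volume mid-sentence resets the counter as soon as speech resumes, which is why a short pause does not end the turn in this sketch.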
4. Watch your agent use tools
During a call, your agent can still use all its configured tools (searching knowledge bases, calling APIs, etc.). When the agent executes a tool, it pauses briefly while the tool runs, then resumes speaking with the result. You might notice a short silence during tool execution -- this is normal.
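The pause-then-resume behavior can be pictured as an agent awaiting a tool result between two spoken segments. The following is a minimal sketch under that assumption. `respond_with_tool`, `fake_knowledge_base_search`, and the spoken phrases are invented for illustration and do not reflect the platform's actual implementation.

```python
import asyncio
from typing import Awaitable, Callable, List


async def respond_with_tool(
    tool: Callable[[], Awaitable[str]],
    speak: Callable[[str], None],
) -> None:
    """Speak, wait silently for a tool result, then resume speaking."""
    speak("Let me look that up.")
    # No speech is produced while the tool runs; this is the short silence.
    result = await tool()
    speak(f"Here is what I found: {result}")


async def fake_knowledge_base_search() -> str:
    # Hypothetical stand-in for a real tool call; the sleep models its latency.
    await asyncio.sleep(0.01)
    return "3 matching articles"


# Collect the "spoken" segments in a list instead of playing audio.
spoken: List[str] = []
asyncio.run(respond_with_tool(fake_knowledge_base_search, spoken.append))
```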
5. Try a multi-agent call (optional)
If you have a group conversation with multiple agents, all of them can join the call. The platform uses a turn-taking protocol to coordinate speakers.
Voice calls use a Single Active Speaker protocol: one agent speaks at a time while the others listen. An agent that wants to speak requests the floor with the request_turn tool. This keeps agents from talking over each other in group calls.
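To make the single-active-speaker idea concrete, here is a minimal sketch of a floor controller. Only the name request_turn comes from the platform. The `FloorController` class, the `release_turn` method, and the first-come, first-served queue are assumptions made for illustration, not the platform's actual protocol.

```python
from collections import deque
from typing import Deque, Optional


class FloorController:
    """Hypothetical coordinator that lets only one agent speak at a time."""

    def __init__(self) -> None:
        self.active_speaker: Optional[str] = None
        self._waiting: Deque[str] = deque()

    def request_turn(self, agent: str) -> bool:
        """Grant the floor if it is free, otherwise queue the agent.

        Returns True if the agent holds the floor after the call.
        """
        if self.active_speaker is None:
            self.active_speaker = agent
            return True
        if agent != self.active_speaker and agent not in self._waiting:
            # Assumption: waiting agents are served in request order.
            self._waiting.append(agent)
        return agent == self.active_speaker

    def release_turn(self, agent: str) -> Optional[str]:
        """Release the floor and pass it to the next queued agent, if any.

        Returns the agent that holds the floor after the call.
        """
        if agent != self.active_speaker:
            # Only the current speaker can release the floor.
            return self.active_speaker
        self.active_speaker = self._waiting.popleft() if self._waiting else None
        return self.active_speaker
```

In this sketch, a second agent that calls `request_turn` while another is speaking is queued and receives the floor only after the current speaker releases it.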
6. End the call
Click the end call button in the call interface to disconnect. The conversation continues in text mode.
What you've done
- Started a live voice call with an agent
- Had a real-time spoken conversation
- Observed tool usage during a call
- Learned how multi-agent calls use a single-active-speaker protocol
Next steps
Head to Set Up Agent Email to give your agent an email address and configure inbound and outbound email rules.