Voice Call System
Voice calls use LiveKit for WebRTC transport and the OpenAI Realtime API for speech-to-speech processing inside agent containers.
Architecture
Components
| Component | Role |
|---|---|
| VideoCallResource | Creates LiveKit rooms, dispatches /voice/join to agents |
| LiveKit Server | WebRTC signaling and media transport |
| platform_openai_chat.py | Agent-side voice handler (in studio container) |
| OpenAI Realtime API | Speech-to-speech processing |
There are two voice handler files in the codebase. Only platform_openai_chat.py (in containers/studio/xaibo/server/) is used at runtime. The voice_worker.py in agent-templates/_shared/modules/ is a legacy XAIBO module that is never invoked for voice calls.
Single Active Speaker Protocol
In multi-agent voice calls, only one agent speaks at a time:
- Active agent: connected to the user's audio, processes and responds
- Listener agents: connected to the room but muted, monitoring the conversation
- Turn switching: user-initiated (click to switch) or agent-initiated via the request_turn tool
Although the request_turn tool is conceptually invoked by agents, at the implementation level voice tools are LiveKit function_tool instances (not XAIBO @tool decorators). The example below shows the conceptual interface; the actual implementation uses LiveKit's native tool system in _create_voice_tools().
# Conceptual interface -- actual implementation uses LiveKit function_tool
def request_turn(reason: str) -> str:
    """Request speaking privileges in a multi-agent voice call."""
    # Returns status: "granted" or "denied"
    raise NotImplementedError("conceptual stub; see _create_voice_tools()")
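The single-active-speaker rules can be illustrated with a small in-memory model. This is a minimal sketch, not the platform's implementation: the `TurnCoordinator` class, its method names, and the `user_speaking` flag are all hypothetical, and the grant policy (deny while the user is talking or when the agent is not in the room, grant otherwise) is an assumption made only so that both the "granted" and "denied" outcomes appear.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TurnCoordinator:
    """Hypothetical model of which agent currently holds the floor."""

    agents: set = field(default_factory=set)
    active: Optional[str] = None
    user_speaking: bool = False  # assumed signal, e.g. from voice activity detection

    def join(self, agent_id: str) -> None:
        # The first agent to join becomes the active speaker; later ones listen.
        self.agents.add(agent_id)
        if self.active is None:
            self.active = agent_id

    def is_muted(self, agent_id: str) -> bool:
        # Every agent except the active speaker is a muted listener.
        return agent_id != self.active

    def switch_to(self, agent_id: str) -> None:
        # User-initiated switch (click to switch): honoured for any agent in the room.
        if agent_id not in self.agents:
            raise KeyError(f"unknown agent: {agent_id}")
        self.active = agent_id

    def request_turn(self, agent_id: str, reason: str) -> str:
        # Agent-initiated switch. `reason` is accepted to mirror the tool's
        # signature but is not used by this sketch's policy.
        # Assumed policy: deny unknown agents and deny while the user is speaking.
        if agent_id not in self.agents or self.user_speaking:
            return "denied"
        self.active = agent_id
        return "granted"
```

With two agents in the room, a granted `request_turn` call from the listener makes it the active agent and mutes the previous speaker, while a user-initiated `switch_to` always takes effect regardless of `user_speaking`.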
Voice Tools
Voice tools are LiveKit function_tool instances, not XAIBO @tool decorators. They're created in _create_voice_tools() and passed to lk_agents.Agent(tools=...).
Because they use LiveKit's native tool system, they integrate directly with the OpenAI Realtime API's function calling.
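The factory pattern described above, where tools are built per call and handed to the agent as a list, might look roughly like the sketch below. It deliberately avoids importing LiveKit so that it runs anywhere: `create_voice_tools`, `on_turn_request`, and the `wrap` parameter are hypothetical names, and `wrap` only stands in for the place where the real `_create_voice_tools()` would apply LiveKit's `function_tool`. The actual function's signature and contents are not shown in this document.

```python
import asyncio
from typing import Callable, List


def create_voice_tools(
    agent_id: str,
    on_turn_request: Callable[[str, str], bool],
    wrap: Callable[[Callable], Callable] = lambda fn: fn,
) -> List[Callable]:
    """Build the tool list for one agent in one call (hypothetical stand-in)."""

    async def request_turn(reason: str) -> str:
        """Request speaking privileges in a multi-agent voice call."""
        # The closure binds agent_id, so the model only supplies `reason`.
        granted = on_turn_request(agent_id, reason)
        return "granted" if granted else "denied"

    # Assumption: in the real handler, `wrap` would be LiveKit's function_tool,
    # and the returned list would be passed to lk_agents.Agent(tools=...).
    return [wrap(request_turn)]


if __name__ == "__main__":
    # Demo with a callback that always grants the turn.
    tools = create_voice_tools("agent-b", lambda agent, reason: True)
    print(asyncio.run(tools[0]("has the answer")))
```

Building the tools inside a factory keeps per-call state (here, the agent's ID and the turn callback) out of the tool's model-facing signature, which is one plausible reason for a dedicated creation function.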
Supported Features
- Real-time speech-to-speech conversation
- Multi-agent calls with turn-taking
- Tool execution during calls (brief pause while tool runs)
- Transcript logging for conversation history