
Voice Call System

Voice calls use LiveKit for WebRTC transport and the OpenAI Realtime API for speech-to-speech processing inside agent containers.

Architecture

Components

| Component | Role |
|---|---|
| VideoCallResource | Creates LiveKit rooms and dispatches /voice/join to agents |
| LiveKit Server | WebRTC signaling and media transport |
| platform_openai_chat.py | Agent-side voice handler (in the studio container) |
| OpenAI Realtime API | Speech-to-speech processing |
Warning: There are two voice handler files in the codebase. Only platform_openai_chat.py (in containers/studio/xaibo/server/) is used at runtime. voice_worker.py (in agent-templates/_shared/modules/) is a legacy XAIBO module and is never invoked for voice calls.

Single Active Speaker Protocol

In multi-agent voice calls, only one agent speaks at a time:

  • Active agent: receives the user's audio, processes it, and responds
  • Listener agents: connected to the room but muted; they monitor the conversation
  • Turn switching: either user-initiated (click to switch) or agent-initiated via the request_turn tool
Note: Although agents conceptually invoke the request_turn tool, voice tools are implemented as LiveKit function_tool instances (not XAIBO @tool decorators). The example below shows the conceptual interface; the actual implementation uses LiveKit's native tool system in _create_voice_tools().

# Conceptual interface -- the actual implementation uses a LiveKit function_tool
def request_turn(reason: str) -> str:
    """Request speaking privileges in a multi-agent voice call.

    Returns a status string: "granted" or "denied".
    """
    raise NotImplementedError("conceptual interface only")

Voice Tools

Voice tools are LiveKit function_tool instances, not XAIBO @tool decorators. They're created in _create_voice_tools() and passed to lk_agents.Agent(tools=...).

This means voice tools use LiveKit's native tool system, which integrates directly with the OpenAI Realtime API's function calling.
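The OpenAI Realtime API receives tools as JSON Schema function definitions (sent in a session.update event), each with type, name, description, and parameters fields. The sketch below approximates how a tool framework can derive such a definition from a Python function's signature and docstring. It is a simplified illustration, not LiveKit's implementation; realtime_tool_definition and the small type map are hypothetical, and only plain str/int/float/bool parameters are handled.

```python
import inspect
import typing

# Map Python annotations to JSON Schema types (small subset, for illustration).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def realtime_tool_definition(fn) -> dict:
    """Build a Realtime API-style function definition from a Python function."""
    hints = typing.get_type_hints(fn)
    properties = {}
    required = []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": _JSON_TYPES[hints[name]]}
        # Parameters without defaults are required by the model.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def request_turn(reason: str) -> str:
    """Request speaking privileges in a multi-agent voice call."""
    return "granted"  # placeholder body for illustration


definition = realtime_tool_definition(request_turn)
```

For request_turn this yields a definition whose parameters object declares a single required string property, reason. Deriving the schema from the function itself keeps the tool description the model sees in sync with the code that runs when the model calls it.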

Supported Features

  • Real-time speech-to-speech conversation
  • Multi-agent calls with turn-taking
  • Tool execution during calls (brief pause while tool runs)
  • Transcript logging for conversation history
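Transcript logging can be as simple as appending each finalized utterance with its speaker and a timestamp, then serializing the call as JSON Lines for the conversation history. The sketch below assumes that shape; TranscriptEntry, TranscriptLog, and their fields are hypothetical, and the real history schema and the events that feed it may differ.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class TranscriptEntry:
    # Hypothetical record shape; the real history schema may differ.
    call_id: str
    speaker: str  # e.g. "user" or an agent id
    text: str
    timestamp: float = field(default_factory=time.time)


class TranscriptLog:
    """Collects finalized utterances and serializes them as JSON Lines."""

    def __init__(self, call_id: str) -> None:
        self.call_id = call_id
        self.entries: list[TranscriptEntry] = []

    def add(self, speaker: str, text: str) -> None:
        text = text.strip()
        # Skip empty utterances so the history only holds real turns.
        if text:
            self.entries.append(TranscriptEntry(self.call_id, speaker, text))

    def to_jsonl(self) -> str:
        return "\n".join(json.dumps(asdict(e)) for e in self.entries)
```

Recording the speaker on every entry keeps multi-agent calls readable afterwards, since the history shows which agent held the turn for each response.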