Deployment Model

The platform runs across two Kubernetes clusters with strict network isolation between platform infrastructure and user workloads.

Two-Cluster Architecture

Cluster     Purpose                                                         IP Space
Trusted     Platform API, platform-admin, guacd (VNC proxy), shared infra   172.32.0.0/16
Untrusted   User agents, Studio containers, customer workloads              172.16.0.0/16

Both clusters are managed via Fleet (GitOps) from the dc_docs/fleet/ directory.
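A Fleet deployment of this shape is typically driven by a GitRepo resource pointing at the fleet directory. The sketch below is illustrative only: the repository URL, branch, and target cluster labels are placeholders, not taken from the actual platform configuration.

```yaml
# Hypothetical Fleet GitRepo watching the dc_docs/fleet/ directory.
# Repo URL, branch, and cluster labels are placeholders.
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: platform-fleet
  namespace: fleet-default
spec:
  repo: https://example.com/org/dc_docs   # placeholder repository
  branch: main
  paths:
    - fleet/
  targets:
    - clusterSelector:
        matchLabels:
          env: production                 # placeholder label
```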

Namespace Model

Each user and team project gets its own Kubernetes namespace in the untrusted cluster:

  • Personal projects: one namespace per user
  • Team projects: one namespace per project
  • Studio, agents, and all user workloads run in the user's/project's namespace

This provides strong isolation — agents from different users can't access each other's resources.
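Namespace-level isolation of this kind is commonly enforced with a default-deny NetworkPolicy in each user/project namespace. The policy below is a sketch under that assumption; the policy name is ours, not the platform's.

```yaml
# Hypothetical default-deny ingress policy: pods in a user/project
# namespace only accept traffic from pods in the same namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: same-namespace-only
spec:
  podSelector: {}          # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}  # allow traffic from this namespace only
```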

Agent Deployment

Agents deploy as Knative Services, which means they:

  • Scale to zero when idle (no cost when not in use)
  • Scale up automatically when messages arrive
  • Share the same container image as Studio

Config        Value
Image         Studio container image
Entrypoint    /usr/local/xpressai/bin/setup_system.sh continue
NFS PVC       xap-pvc2 (shared across namespaces)
NFS subPath   hdd/user/{userId}/ (personal) or hdd/user/projects/{projectId}/ (team)
Mount path    /data/home

Agent files live at /data/home/agents/{agentName}/ inside the container.
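Putting the configuration table together, an agent's Knative Service might look roughly like the sketch below. The service name, namespace, image tag, and userId value are placeholders; note also that mounting a PersistentVolumeClaim from a Knative Service requires the corresponding Knative feature flag to be enabled.

```yaml
# Sketch of an agent Service assembled from the config table above.
# Names, namespace, and image tag are placeholders.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-agent                  # placeholder service name
  namespace: user-1234            # placeholder per-user namespace
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "0"   # scale to zero when idle
    spec:
      containers:
        - image: registry.example.com/studio:latest   # Studio image (placeholder tag)
          command: ["/usr/local/xpressai/bin/setup_system.sh", "continue"]
          volumeMounts:
            - name: home
              mountPath: /data/home
              subPath: hdd/user/{userId}/   # templated per user
      volumes:
        - name: home
          persistentVolumeClaim:
            claimName: xap-pvc2
```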

NFS Storage Layout

xap-pvc2/
  hdd/user/
    {userId}/                       # Personal project
      agents/
        {agentName}/
          agent.yaml
          tools/
          modules/
          prompts/
          procedures/
          skills/
          knowledge/
      knowledge/                    # User's knowledge base
    projects/
      {projectId}/                  # Team project
        agents/
          {agentName}/
            ...same structure...
        agents/shared/knowledge/    # Shared team knowledge

Domain & Networking

Note: The domain names listed below are for the default production deployment. Actual domain names may vary depending on your deployment environment.

  • Platform API: platform.ap.xpressai.cloud
  • Agent services: {serviceName}.{namespace}.ap.xpressai.cloud
  • LLM Relay: relay.public.cloud.xpress.ai/v1 (OpenAI-compatible proxy)
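The agent URL pattern above can be captured in a small helper. The function name is ours; the default base domain follows the default production deployment and should be overridden for other environments.

```python
def agent_url(service_name: str, namespace: str,
              base_domain: str = "ap.xpressai.cloud") -> str:
    """Build an agent's public URL from the {serviceName}.{namespace} pattern.

    Hypothetical helper: the default base_domain matches the default
    production deployment; pass a different one for other environments.
    """
    return f"https://{service_name}.{namespace}.{base_domain}"

# agent_url("my-agent", "user-1234")
# → "https://my-agent.user-1234.ap.xpressai.cloud"
```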

Cold-Start Latency

Because agents deploy as Knative Services that scale to zero when idle, the first request after an idle period incurs cold-start latency. This typically includes container startup, NFS mount, and module initialization. Subsequent requests within the active window are handled immediately. The cold-start time varies depending on agent complexity but is generally a few seconds.
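Clients that may hit a cold start often wrap their first request in a retry with backoff. The sketch below is a generic client-side pattern, not platform API; `send` stands in for whatever function issues the agent request and is assumed to raise `TimeoutError` on a cold-start timeout.

```python
import time

def call_with_cold_start_retry(send, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call send(), retrying on TimeoutError with linear backoff.

    Hypothetical helper for riding out Knative cold starts: `send` is any
    zero-argument callable that performs the agent request; `sleep` is
    injectable for testing.
    """
    for attempt in range(attempts):
        try:
            return send()
        except TimeoutError:
            if attempt == attempts - 1:
                raise          # still cold after all attempts: give up
            sleep(base_delay * (attempt + 1))
```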