Agent Messaging System

Every interaction between a user and an agent flows through RabbitMQ. This is not an implementation detail you can ignore --- it shapes how agents process messages, how tool calls work, and why agents can recover from crashes mid-conversation. Understanding this system is essential for debugging agent behavior and building reliable tools.

Why RabbitMQ?

Agents are long-running, unpredictable workloads. An LLM call might take 30 seconds or 10 minutes. A tool-call loop might iterate five times before producing a final answer. Handling this synchronously in an HTTP request would tie up server threads and make error recovery painful.

RabbitMQ gives us:

  • Durability: messages survive broker restarts.
  • Ordered processing: one message at a time per conversation.
  • Retry semantics: NACK + requeue lets agents continue multi-step processing.
  • Dead-letter handling: poison messages get routed to a DLQ instead of blocking the queue.

Exchange Topology

The messaging layer uses two exchanges:

Exchange              Type               Purpose
xpress.agents.topic   Topic (durable)    Primary dispatch for all agent messages
xpress.agents.dlq     Fanout (durable)   Dead-letter exchange for failed messages

Messages are routed using topic routing keys that encode the destination:

agent.{projectId}.{agentId}.conv.{conversationKey}
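
For illustration, such a key might be built like this (the variable names and ID formats are assumptions):

  String routingKey = "agent.%s.%s.conv.%s"
          .formatted(projectId, agentId, conversationKey);
  // e.g. "agent.proj-42.agent-7.conv.user-99"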

Each unique combination of project, agent, and conversation gets its own durable queue bound to the topic exchange. This means:

  • Messages for different conversations never interfere with each other.
  • Each queue can be consumed by exactly one worker thread.
  • Queue state is visible in the RabbitMQ management UI for debugging.
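
A minimal sketch of these declarations with the RabbitMQ Java client (naming the queue after its routing key is an assumption, not a documented convention):

  import com.rabbitmq.client.BuiltinExchangeType;
  import com.rabbitmq.client.Channel;
  import java.io.IOException;
  import java.util.Map;

  void declareTopology(Channel ch, String projectId, String agentId, String conversationKey)
          throws IOException {
      // Durable topic exchange for dispatch; durable fanout exchange for dead letters.
      ch.exchangeDeclare("xpress.agents.topic", BuiltinExchangeType.TOPIC, true);
      ch.exchangeDeclare("xpress.agents.dlq", BuiltinExchangeType.FANOUT, true);

      // Per-conversation queue: durable, non-exclusive, non-auto-delete,
      // dead-lettering rejected messages to the DLX.
      Map<String, Object> args = Map.of("x-dead-letter-exchange", "xpress.agents.dlq");
      String key = "agent.%s.%s.conv.%s".formatted(projectId, agentId, conversationKey);
      ch.queueDeclare(key, true, false, false, args);
      ch.queueBind(key, "xpress.agents.topic", key);
  }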

Per-Conversation Processing

Each conversation queue gets its own virtual thread with prefetchCount=1. This is critical:

  • A prefetch of 1 means the broker delivers at most one unacknowledged message at a time; the next message does not arrive until the current one has been ACKed or NACKed.
  • One thread per conversation means messages within a conversation are always processed in order.
  • Virtual threads (Java 21) mean we can have thousands of these without running out of platform threads.

Why Not a Thread Pool?

A fixed thread pool would limit the number of concurrent conversations. Virtual threads let the platform handle as many conversations as there are messages, with the JVM managing scheduling. The actual bottleneck is LLM API rate limits, not thread count.
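
A sketch of what a per-conversation worker can look like under these constraints (handle is a hypothetical message processor; conn and queue are assumed to be in scope):

  import com.rabbitmq.client.Channel;
  import com.rabbitmq.client.Connection;
  import com.rabbitmq.client.GetResponse;

  void startWorker(Connection conn, String queue) {
      Thread.ofVirtual().name("conv-" + queue).start(() -> {
          try (Channel ch = conn.createChannel()) {
              ch.basicQos(1);  // prefetch=1: at most one unacked message in flight
              while (!Thread.currentThread().isInterrupted()) {
                  GetResponse msg = ch.basicGet(queue, false);       // manual-ack mode
                  if (msg == null) { Thread.sleep(100); continue; }  // queue empty, poll again
                  handle(ch, msg);  // ACKs, or NACKs with requeue (next section)
              }
          } catch (Exception e) {
              // A supervisor would log the failure and restart the worker here.
          }
      });
  }

A push-based basicConsume with the same prefetch works equally well; the pull loop above just makes the one-at-a-time ordering explicit.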

The Tool-Call Loop

This is the heart of the messaging system. When an agent receives a message, it calls the LLM, which might return a tool call instead of a final response. The NACK/requeue mechanism handles this elegantly.

How NACK + Requeue Works

When the LLM returns a tool call:

  1. The worker executes the tool (e.g., creating a task, searching files, sending email).
  2. The tool result is persisted to the database.
  3. The worker sends a NACK with requeue to RabbitMQ. (RabbitMQ messages are immutable --- the original message is redelivered as-is.)
  4. When the worker picks up the redelivered message, it reads the latest tool results from the database and continues the conversation with the updated context.
  5. This repeats until the LLM returns a final text response.

When the LLM returns a final response:

  1. The worker saves the response to the database.
  2. The worker sends an ACK to RabbitMQ.
  3. The message is permanently removed from the queue.
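
Sketched in Java, the decision inside the worker might look like this (llm, tools, store, and the response types are hypothetical stand-ins for the platform's own components):

  long tag = msg.getEnvelope().getDeliveryTag();
  LlmResponse resp = llm.complete(loadContext(conversationId));  // context includes persisted tool results
  if (resp.isToolCall()) {
      ToolResult result = tools.execute(resp.toolCall());
      store.saveToolResult(conversationId, resp.toolCall().id(), result);
      ch.basicNack(tag, false, true);   // NACK + requeue: redelivery continues the loop
  } else {
      store.saveAssistantMessage(conversationId, resp.text());
      ch.basicAck(tag, false);          // ACK: the message is removed from the queue
  }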

Dead-Letter Routing

If a message is NACKed without the requeue flag (e.g., after too many retries or an unrecoverable error), it routes to the dead-letter exchange xpress.agents.dlq. Messages in the DLQ can be inspected and replayed manually. This prevents poison messages from blocking a conversation queue forever.
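
Because the dead-letter exchange is a fanout, a single durable queue bound to it captures every dead-lettered message for inspection (the queue name here is illustrative):

  ch.queueDeclare("xpress.agents.dlq.store", true, false, false, null);
  ch.queueBind("xpress.agents.dlq.store", "xpress.agents.dlq", "");  // fanout ignores the binding key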

Deduplication

Because messages can be re-delivered (NACK + requeue), tool executions could theoretically run twice. The platform prevents this with a deduplication table.

Before executing a tool or processing a message, the worker checks:

  1. Has this exact message (by ID) already been processed for this conversation?
  2. If yes, skip the side-effecting operation and use the cached result.
  3. If no, execute and record the message ID.

This is especially important for tools with side effects like send_email or create_task --- you do not want an agent to send the same email twice because of a redelivery. See Conversation Model for details on dispatch-level deduplication, which complements the execution-level deduplication described here.
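
A sketch of the execution-level check, assuming a hypothetical dedupStore backed by a table keyed on (conversationId, messageId, toolCallId):

  Optional<ToolResult> cached = dedupStore.find(conversationId, messageId, toolCallId);
  if (cached.isPresent()) {
      return cached.get();  // redelivery: reuse the recorded result, no second side effect
  }
  ToolResult result = tools.execute(call);  // first delivery: actually run the tool
  dedupStore.record(conversationId, messageId, toolCallId, result);
  return result;

Because prefetch is 1 and each conversation has a single worker, there is no concurrent redelivery within a conversation, so check-then-record is safe here.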

History Window

Agents do not see the entire conversation history. A configurable history window (default: 50 messages) limits how many past messages are included in the LLM prompt. This keeps token usage manageable and prevents context window overflow for long-running conversations.

The history window is a sliding window over the most recent messages. Older messages are still stored in the database and available through the agent's mid-term memory (Vecto), but they are not included in the direct prompt context.
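
The window itself is just a suffix of the stored message list; a sketch (Message and historyWindowSize are illustrative names, with the default of 50 from the configuration below):

  List<Message> prompt = messages.size() <= historyWindowSize
          ? messages
          : messages.subList(messages.size() - historyWindowSize, messages.size());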

Configuration

Key configuration values that control the messaging system:

Property               Default             Description
History window size    50                  Number of past messages included in agent prompt
Prefetch count         1                   Messages delivered to a consumer at a time
Queue durability       Durable             Queues survive broker restarts
Dead-letter exchange   xpress.agents.dlq   Where failed messages are routed

Debugging Tips
  • Check the RabbitMQ management UI to see queue depth per conversation. A queue with many messages might indicate a stuck worker or slow LLM responses.
  • Messages in the dead-letter queue (xpress.agents.dlq) represent unrecoverable failures. Inspect their headers for the original routing key and failure reason.
  • If an agent seems to be "ignoring" messages, verify that its Knative pod is running and consuming from the correct queue.

See Also