Agent Messaging System
Every interaction between a user and an agent flows through RabbitMQ. This is not an implementation detail you can ignore --- it shapes how agents process messages, how tool calls work, and why agents can recover from crashes mid-conversation. Understanding this system is essential for debugging agent behavior and building reliable tools.
Why RabbitMQ?
Agents are long-running, unpredictable workloads. An LLM call might take 30 seconds or 10 minutes. A tool-call loop might iterate five times before producing a final answer. Handling this synchronously in an HTTP request would tie up server threads and make error recovery painful.
RabbitMQ gives us:
- Durability: messages survive broker restarts.
- Ordered processing: one message at a time per conversation.
- Retry semantics: NACK + requeue lets agents continue multi-step processing.
- Dead-letter handling: poison messages get routed to a DLQ instead of blocking the queue.
Exchange Topology
The messaging layer uses two exchanges:
| Exchange | Type | Purpose |
|---|---|---|
| xpress.agents.topic | Topic (durable) | Primary dispatch for all agent messages |
| xpress.agents.dlq | Fanout (durable) | Dead-letter exchange for failed messages |
Messages are routed using topic routing keys that encode the destination:
`agent.{projectId}.{agentId}.conv.{conversationKey}`
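As a sketch, building such a routing key is a simple join. The `RoutingKeys` class and its method name are illustrative, not the platform's actual API:

```java
// Hypothetical helper illustrating the routing-key scheme described above.
final class RoutingKeys {
    private RoutingKeys() {}

    /** Builds agent.{projectId}.{agentId}.conv.{conversationKey}. */
    static String forConversation(String projectId, String agentId, String conversationKey) {
        return String.join(".", "agent", projectId, agentId, "conv", conversationKey);
    }
}
```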
Each unique combination of project, agent, and conversation gets its own durable queue bound to the topic exchange. This means:
- Messages for different conversations never interfere with each other.
- Each queue can be consumed by exactly one worker thread.
- Queue state is visible in the RabbitMQ management UI for debugging.
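A per-conversation queue with dead-lettering might be declared roughly as follows. The `x-dead-letter-exchange` argument is standard RabbitMQ; the helper class itself is a sketch, not the platform's real code:

```java
import java.util.Map;

// Sketch of the arguments a per-conversation queue could be declared with.
// "x-dead-letter-exchange" is a standard RabbitMQ queue argument.
final class QueueTopology {
    private QueueTopology() {}

    static Map<String, Object> conversationQueueArgs() {
        // Messages NACKed without requeue route to the dead-letter exchange.
        return Map.of("x-dead-letter-exchange", "xpress.agents.dlq");
    }

    // With a com.rabbitmq.client.Channel in hand, the declaration would look like:
    // channel.queueDeclare(queueName, /*durable*/ true, /*exclusive*/ false,
    //                      /*autoDelete*/ false, conversationQueueArgs());
    // channel.queueBind(queueName, "xpress.agents.topic", routingKey);
}
```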
Per-Conversation Processing
Each conversation queue gets its own virtual thread with prefetchCount=1. This is critical:
- Prefetch of 1 means the broker delivers exactly one message at a time to the consumer. The next message does not arrive until the current one is either ACKed or NACKed.
- One thread per conversation means messages within a conversation are always processed in order.
- Virtual threads (Java 21) mean we can have thousands of these without running out of platform threads.
A fixed thread pool would limit the number of concurrent conversations. Virtual threads let the platform handle as many conversations as there are messages, with the JVM managing scheduling. The actual bottleneck is LLM API rate limits, not thread count.
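A minimal sketch of the one-virtual-thread-per-conversation idea (Java 21), with a counter standing in for a blocking consume loop:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: thousands of consumers, one virtual thread each.
class VirtualConsumers {
    static int runConsumers(int conversations) {
        AtomicInteger processed = new AtomicInteger();
        // Unbounded virtual-thread executor: blocked consumers are cheap.
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < conversations; i++) {
                // Stand-in for a consumer blocking on its conversation queue.
                pool.submit(() -> { processed.incrementAndGet(); });
            }
        } // close() waits for all submitted tasks to finish
        return processed.get();
    }
}
```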
The Tool-Call Loop
This is the heart of the messaging system. When an agent receives a message, it calls the LLM, which might return a tool call instead of a final response. The NACK/requeue mechanism handles this elegantly.
How NACK + Requeue Works
When the LLM returns a tool call:
- The worker executes the tool (e.g., creating a task, searching files, sending email).
- The tool result is persisted to the database.
- The worker sends a NACK with requeue to RabbitMQ. (RabbitMQ messages are immutable -- the original message is redelivered as-is.)
- When the worker picks up the redelivered message, it reads the latest tool results from the database and continues the conversation with the updated context.
- This repeats until the LLM returns a final text response.
When the LLM returns a final response:
- The worker saves the response to the database.
- The worker sends an ACK to RabbitMQ.
- The message is permanently removed from the queue.
If a message is NACKed without the requeue flag (e.g., after too many retries or an unrecoverable error), it routes to the dead-letter exchange xpress.agents.dlq. Messages in the DLQ can be inspected and replayed manually. This prevents poison messages from blocking a conversation queue forever.
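The loop above can be simulated without a broker or LLM. In this toy sketch a deque stands in for the queue, a list stands in for persisted tool results, and redelivery is modeled by re-adding the same message; all names are illustrative:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy simulation of the ACK / NACK+requeue cycle. State accumulates in
// "the database" (here, a list of tool results) between redeliveries of
// the same immutable message.
class ToolCallLoop {
    static int process(String message, int toolCallsBeforeAnswer) {
        Deque<String> queue = new ArrayDeque<>();
        queue.add(message);                           // publish the user message
        List<String> toolResults = new ArrayList<>(); // stands in for persisted results
        int deliveries = 0;
        while (!queue.isEmpty()) {
            String delivered = queue.poll();          // broker delivers (prefetch = 1)
            deliveries++;
            if (toolResults.size() < toolCallsBeforeAnswer) {
                toolResults.add("result-" + toolResults.size()); // execute tool, persist
                queue.add(delivered);                 // NACK + requeue: redelivered as-is
            }
            // else: final LLM response -> ACK, message removed for good
        }
        return deliveries; // one delivery per tool call, plus the final one
    }
}
```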
Deduplication
Because messages can be re-delivered (NACK + requeue), tool executions could theoretically run twice. The platform prevents this with a deduplication table.
Before executing a tool or processing a message, the worker checks:
- Has this exact message (by ID) already been processed for this conversation?
- If yes, skip the side-effecting operation and use the cached result.
- If no, execute and record the message ID.
This is especially important for tools with side effects like send_email or create_task --- you do not want an agent to send the same email twice because of a redelivery. See Conversation Model for details on dispatch-level deduplication, which complements the execution-level deduplication described here.
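Execution-level deduplication can be sketched as a check-then-cache keyed by message ID. The real platform uses a database table; an in-memory map stands in here, and the class name is hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch: run a side-effecting tool at most once per message ID.
class ToolDeduper {
    private final Map<String, String> executed = new ConcurrentHashMap<>();
    private int sideEffects = 0;

    /** Executes the tool once per message ID; redeliveries reuse the cached result. */
    String executeOnce(String messageId, Supplier<String> tool) {
        return executed.computeIfAbsent(messageId, id -> {
            sideEffects++;   // the side effect (send_email, create_task, ...)
            return tool.get();
        });
    }

    int sideEffectCount() { return sideEffects; }
}
```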
History Window
Agents do not see the entire conversation history. A configurable history window (default: 50 messages) limits how many past messages are included in the LLM prompt. This keeps token usage manageable and prevents context window overflow for long-running conversations.
The history window is a sliding window over the most recent messages. Older messages are still stored in the database and available through the agent's mid-term memory (Vecto), but they are not included in the direct prompt context.
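Selecting the window is a simple tail slice over the stored messages. A minimal sketch, with an assumed helper name:

```java
import java.util.List;

// Sliding history window: only the most recent N messages reach the prompt.
final class HistoryWindow {
    private HistoryWindow() {}

    static <T> List<T> lastN(List<T> messages, int windowSize) {
        int from = Math.max(0, messages.size() - windowSize);
        return messages.subList(from, messages.size());
    }
}
```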
Configuration
Key configuration values that control the messaging system:
| Property | Default | Description |
|---|---|---|
| History window size | 50 | Number of past messages included in agent prompt |
| Prefetch count | 1 | Unacknowledged messages delivered to a consumer at a time |
| Queue durability | Durable | Queues survive broker restarts |
| Dead-letter exchange | xpress.agents.dlq | Where failed messages are routed |
Debugging
- Check the RabbitMQ management UI to see queue depth per conversation. A queue with many messages might indicate a stuck worker or slow LLM responses.
- Messages in the dead-letter queue (xpress.agents.dlq) represent unrecoverable failures. Inspect their headers for the original routing key and failure reason.
- If an agent seems to be "ignoring" messages, verify that its Knative pod is running and consuming from the correct queue.
See Also
- Conversation Model -- dispatch-level deduplication and how conversations create queues
- Task Execution Lifecycle -- the three execution drivers that consume messages
- Agent Deployment Model -- how agent pods connect to RabbitMQ after deployment