Knowledge & Memory
Agents on the XpressAI Platform have three tiers of memory, modeled loosely after how biological memory works. Short-term memory holds the current conversation. Mid-term memory stores semantically searchable past interactions in the cloud. Long-term memory lives as markdown files on disk. Understanding these tiers --- what goes where, how long it lasts, and how the agent retrieves it --- is essential for tuning agent behavior and building agents that actually learn from experience.
The Three Tiers
Tier 1: Short-term Memory (In-Memory, Per-Conversation)
Short-term memory is the simplest tier. It holds the last N messages of the current conversation as a JSON array in the agent's working memory.
| Property | Value |
|---|---|
| Storage | In-memory (agent process) |
| Scope | Current conversation only |
| Default limit | 10 messages |
| Persistence | None --- cleared between conversations |
| Retrieval | Direct inclusion in LLM prompt |
When the short-term buffer is full (more than 10 messages), the oldest messages are pushed to mid-term memory. This keeps the immediate context window focused on the most recent interaction while preserving older messages for semantic retrieval.
The conversation history window (50 messages, described in Conversation Model) is the platform-level limit on what gets sent to the agent. The short-term memory limit (10 messages) is the agent-level limit on what the agent keeps in its active working memory within a single turn. These are different mechanisms at different layers.
- Platform history window (50 messages): The maximum conversation history sent to the messaging system.
- Agent short-term memory (10 messages): The most recent messages the agent's reasoning loop considers for each response.
The agent sees up to 10 recent messages for immediate context, while the platform retains up to 50 messages for the broader conversation history.
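The eviction behavior described above can be sketched as a bounded buffer that hands overflow to mid-term storage. This is an illustrative model, not the platform's actual implementation; `store_midterm` is a hypothetical callback standing in for the Vecto write path.

```python
from collections import deque

class ShortTermMemory:
    """Bounded conversation buffer; overflow is handed off to mid-term memory.

    `store_midterm` is a hypothetical callback standing in for the
    platform's mid-term (Vecto) write path.
    """

    def __init__(self, limit=10, store_midterm=None):
        self.limit = limit
        self.buffer = deque()
        self.store_midterm = store_midterm or (lambda msg: None)

    def add(self, message):
        self.buffer.append(message)
        # When the buffer exceeds its limit, the oldest messages are
        # pushed to mid-term memory for later semantic retrieval.
        while len(self.buffer) > self.limit:
            self.store_midterm(self.buffer.popleft())

    def context(self):
        """Messages the reasoning loop considers for the current turn."""
        return list(self.buffer)
```

With a limit of 3, adding five messages keeps the newest three in the buffer and evicts the oldest two to mid-term storage.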
Tier 2: Mid-term Memory (Vecto Cloud Vectors)
Mid-term memory uses Vecto, a cloud-based vector database, to store and retrieve past interactions through semantic search.
| Property | Value |
|---|---|
| Storage | Vecto cloud (managed vector DB) |
| Scope | All past interactions for this agent |
| Persistence | Survives restarts, pod terminations |
| Retrieval | Semantic similarity search |
| Embeddings | QWEN2 model (a pre-trained embedding model used to convert text into vector representations for semantic search; this is the platform default and is not currently user-configurable) |
The VectoMemory module handles all vector operations:
- Store: new memories are embedded and stored in the agent's vector space.
- Search: given a query (e.g., the current user message), retrieve the most semantically similar past memories.
- Update: existing memories can be rewritten during consolidation.
- Delete: outdated or irrelevant memories can be removed.
Each agent gets its own auto-created vector space in Vecto. This ensures memory isolation between agents --- one agent's memories are never mixed with another's.
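The four VectoMemory operations can be sketched with an in-memory stand-in. Everything here is illustrative: the class name, the dict-backed "vector space", and the `embed` callable (a placeholder for the QWEN2 embedding call) are assumptions, not the actual Vecto client API.

```python
def _cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

class VectoMemoryStub:
    """Illustrative stand-in for the VectoMemory module's four operations.

    Uses an in-memory dict rather than the Vecto cloud API; one instance
    per agent models the per-agent vector-space isolation.
    """

    def __init__(self, agent_id, embed):
        self.agent_id = agent_id   # isolation: one space per agent
        self.embed = embed         # placeholder for the embedding model
        self.space = {}            # memory id -> (vector, text)
        self._next_id = 0

    def store(self, text):
        mem_id = self._next_id
        self._next_id += 1
        self.space[mem_id] = (self.embed(text), text)
        return mem_id

    def search(self, query, top_k=3):
        qv = self.embed(query)
        ranked = sorted(self.space.items(),
                        key=lambda item: _cosine(qv, item[1][0]),
                        reverse=True)
        return [text for _, (_, text) in ranked[:top_k]]

    def update(self, mem_id, new_text):
        # Consolidation can rewrite an existing memory in place.
        self.space[mem_id] = (self.embed(new_text), new_text)

    def delete(self, mem_id):
        self.space.pop(mem_id, None)
```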
Tier 3: Long-term Memory (File-based Zettelkasten)
Long-term memory is the most durable tier. Agents maintain a zettelkasten (a system of interconnected markdown notes) on the NFS filesystem.
| Property | Value |
|---|---|
| Storage | NFS filesystem (persistent volume) |
| Format | Markdown files |
| Personal path | /data/home/knowledge/ |
| Shared path | /data/home/agents/shared/knowledge/ |
| Persistence | Survives everything --- restarts, redeployments, scaling events |
| Retrieval | File listing + reading; agents manage their own knowledge structure |
The zettelkasten is the agent's personal knowledge base. Agents create, update, and organize notes as part of their normal operation. During idle time (see Task Execution Lifecycle), agents often review and consolidate their zettelkasten.
There are two knowledge locations:
- /data/home/knowledge/: the agent's personal knowledge. Only this agent reads and writes here.
- /data/home/agents/shared/knowledge/: shared team knowledge. All agents in the project can read this. It typically contains company context, reference documents, and shared protocols.
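An agent's full knowledge view merges both directories. The paths are the ones documented above; the helper itself is a hypothetical sketch, not a platform API.

```python
from pathlib import Path

# Documented knowledge locations (personal is read-write for the owning
# agent; shared should be treated as read-only team context).
PERSONAL = Path("/data/home/knowledge")
SHARED = Path("/data/home/agents/shared/knowledge")

def list_notes(*roots):
    """Return every markdown note visible under the given roots."""
    notes = []
    for root in roots:
        if root.exists():
            notes.extend(sorted(root.rglob("*.md")))
    return notes

# The agent's complete knowledge view: personal zettelkasten plus
# shared team knowledge.
all_notes = list_notes(PERSONAL, SHARED)
```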
Memory Lifecycle: The Hippo Orchestrator
The Hippo orchestrator (named after the hippocampus, the brain's memory center) manages how memories flow between tiers. It has two modes of operation.
Synchronous Path (During Message Processing)
When an agent receives a message, before generating a response:
- Filter short-term: select relevant messages from the current conversation buffer.
- Search mid-term: query Vecto for semantically related past memories.
- Inject into context: append retrieved memories to the LLM prompt as additional context.
This happens on every message and ensures the agent has relevant historical context even for topics discussed days or weeks ago.
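The three synchronous steps can be sketched as a single context-building function. The function name, the dict shape, and the filtering heuristic are assumptions for illustration; `vecto` is assumed to expose a `search(query, top_k)` method.

```python
def build_context(short_term, vecto, user_message, top_k=3, limit=10):
    """Sketch of the synchronous memory path, run before each response.

    short_term: the current conversation buffer (list of message dicts)
    vecto: mid-term memory exposing search(query, top_k)
    """
    # 1. Filter short-term: keep the most recent messages
    #    (real filtering may be smarter than a simple tail slice).
    recent = short_term[-limit:]

    # 2. Search mid-term: semantically related past memories.
    recalled = vecto.search(user_message, top_k=top_k)

    # 3. Inject into context: retrieved memories become extra prompt text.
    memory_block = "\n".join(f"[memory] {m}" for m in recalled)
    return {"messages": recent, "system_extra": memory_block}
```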
Asynchronous Path (Consolidation)
Every 5 minutes, the Hippo orchestrator runs a consolidation cycle. Each cycle uses three LLM prompts:
| Prompt | Purpose | Decision |
|---|---|---|
| is_useful | Evaluate whether a memory is worth keeping | Keep or discard |
| rewrite_memory | Rewrite the memory for clarity, removing conversational noise | Cleaned text for storage |
| create_memory | Decide whether to create a new mid-term memory entry | Store or skip |
Memory consolidation is computationally expensive --- it requires multiple LLM calls per memory. Running it synchronously during message processing would add seconds of latency to every response. The 5-minute async cycle keeps response times fast while still ensuring memories are consolidated regularly.
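The consolidation pass can be sketched as a pipeline over the three prompts from the table above. The `llm(prompt, text)` callable and the "keep"/"store" return values are assumptions standing in for real LLM calls; only the prompt names come from the documentation.

```python
def consolidate(raw_memories, llm):
    """One async consolidation pass over candidate memories.

    llm(prompt_name, text) is a hypothetical LLM call returning a string.
    """
    stored = []
    for mem in raw_memories:
        # is_useful: keep or discard trivial exchanges.
        if llm("is_useful", mem).strip().lower() != "keep":
            continue
        # rewrite_memory: strip conversational noise before storage.
        cleaned = llm("rewrite_memory", mem)
        # create_memory: decide whether a new mid-term entry is warranted.
        if llm("create_memory", cleaned).strip().lower() == "store":
            stored.append(cleaned)
    return stored
```

Because each candidate memory costs up to three LLM calls, running this inline with message handling would add noticeable latency, which is exactly why it runs on the 5-minute async cycle.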
Knowledge Bases (Project-Level RAG)
Separate from agent memory, the platform supports knowledge bases --- project-wide vector spaces that agents access through RAG (Retrieval-Augmented Generation).
| Aspect | Agent Memory | Knowledge Base |
|---|---|---|
| Scope | Per-agent | Per-project |
| Managed by | Agent (automatic) | Users (via Knowledge UI) |
| Content | Past interactions, learned facts | Uploaded documents, reference material |
| Access | Only the owning agent | All agents in the project |
Knowledge bases are created and managed by users through the Knowledge UI. Users upload documents (PDFs, text files, web pages), which are chunked, embedded, and stored in a Vecto vector space. Agents query these knowledge bases during message processing to ground their responses in project-specific information.
This is distinct from agent memory in an important way: knowledge bases contain information the user provides, while agent memory contains information the agent learns through interaction.
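The first step of the upload pipeline described above (chunking before embedding and storage) can be sketched as a fixed-size chunker. This is a deliberately naive illustration; the platform's actual chunking strategy and the size/overlap parameters are assumptions.

```python
def chunk_document(text, chunk_size=500, overlap=50):
    """Naive fixed-size chunker with overlap between adjacent chunks.

    Overlap keeps sentences that straddle a boundary retrievable from
    both chunks. Real pipelines are often structure-aware (headings,
    paragraphs) rather than purely character-based.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```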
Design Trade-offs
The three-tier architecture involves several deliberate trade-offs:
Speed vs. completeness: short-term memory is fast (in-memory lookup) but limited. Mid-term memory is comprehensive but slower (network call to Vecto). The system queries both and merges results.
Cost vs. retention: not every interaction is worth storing. The is_useful prompt filters out trivial exchanges ("ok", "thanks", "got it") to avoid filling vector storage with noise.
Privacy vs. sharing: personal agent memory is isolated per-agent, but the zettelkasten supports shared knowledge directories. This lets teams share context without agents accidentally accessing each other's conversation history.
Autonomy vs. control: agents manage their own memory automatically, but users control knowledge bases. This split gives agents the ability to learn while keeping users in control of the authoritative information the agents reference.
- Agent forgets too quickly? Increase the short-term memory limit or reduce the consolidation interval.
- Agent remembers irrelevant things? Adjust the is_useful prompt to be more selective.
- Agent lacks domain knowledge? Add documents to a project knowledge base rather than relying on the agent to learn through conversation.
- Multiple agents need the same context? Use the shared knowledge directory at /data/home/agents/shared/knowledge/.
See Also
- Build a Knowledge Base -- end-to-end tutorial for creating and populating a knowledge base
- Create a Vector Space -- how to set up a new vector space for RAG
- Upload Documents to Knowledge Base -- adding documents to an existing knowledge base