
Knowledge & Memory

Agents on the XpressAI Platform have three tiers of memory, modeled loosely after how biological memory works. Short-term memory holds the current conversation. Mid-term memory stores semantically searchable past interactions in the cloud. Long-term memory lives as markdown files on disk. Understanding these tiers --- what goes where, how long it lasts, and how the agent retrieves it --- is essential for tuning agent behavior and building agents that actually learn from experience.

The Three Tiers

Tier 1: Short-term Memory (In-Memory, Per-Conversation)

Short-term memory is the simplest tier. It holds the last N messages of the current conversation as a JSON array in the agent's working memory.

  • Storage: in-memory (agent process)
  • Scope: current conversation only
  • Default limit: 10 messages
  • Persistence: none --- cleared between conversations
  • Retrieval: direct inclusion in the LLM prompt

When the short-term buffer exceeds its limit (more than 10 messages by default), the oldest messages are pushed to mid-term memory. This keeps the immediate context window focused on the most recent interaction while preserving older messages for semantic retrieval.
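The eviction behavior can be sketched as follows. This is an illustrative stand-in, not the platform's actual API: the function names and the plain list handed to mid-term storage are assumptions.

```python
from collections import deque

SHORT_TERM_LIMIT = 10  # platform default

def add_message(buffer: deque, evicted: list, message: str) -> None:
    """Append a message; push the oldest out toward mid-term when full."""
    buffer.append(message)
    while len(buffer) > SHORT_TERM_LIMIT:
        # Evicted messages become candidates for mid-term (Vecto) storage.
        evicted.append(buffer.popleft())

buffer: deque = deque()
evicted: list = []
for i in range(12):
    add_message(buffer, evicted, f"msg-{i}")

print(len(buffer))  # 10 most recent messages remain in working memory
print(evicted)      # ['msg-0', 'msg-1'] were pushed toward mid-term
```

Note that eviction is oldest-first, so the working buffer always holds the most recent exchange.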

Why 10, Not 50?

The conversation history window (50 messages, described in Conversation Model) is the platform-level limit on what gets sent to the agent. The short-term memory limit (10 messages) is the agent-level limit on what the agent keeps in its active working memory within a single turn. These are different mechanisms at different layers.

Two Different Limits
  • Platform history window (50 messages): The maximum conversation history sent to the messaging system.
  • Agent short-term memory (10 messages): The most recent messages the agent's reasoning loop considers for each response.

The agent sees up to 10 recent messages for immediate context, while the platform retains up to 50 messages for the broader conversation history.
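The layering of the two limits can be illustrated with simple slicing. The constant names and the slicing itself are illustrative only; the real mechanisms live at different layers of the stack.

```python
# Illustrative only: the platform retains up to 50 messages of history,
# and the agent's reasoning loop considers only the 10 most recent of
# those for immediate context.
PLATFORM_HISTORY_LIMIT = 50
AGENT_SHORT_TERM_LIMIT = 10

full_history = [f"msg-{i}" for i in range(120)]

platform_window = full_history[-PLATFORM_HISTORY_LIMIT:]    # what the platform keeps
agent_context = platform_window[-AGENT_SHORT_TERM_LIMIT:]   # what the agent reasons over

print(len(platform_window), len(agent_context))  # 50 10
```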

Tier 2: Mid-term Memory (Vecto Cloud Vectors)

Mid-term memory uses Vecto, a cloud-based vector database, to store and retrieve past interactions through semantic search.

  • Storage: Vecto cloud (managed vector DB)
  • Scope: all past interactions for this agent
  • Persistence: survives restarts and pod terminations
  • Retrieval: semantic similarity search
  • Embeddings: QWEN2 model (the platform-default pre-trained embedding model that converts text into vector representations for semantic search; not currently user-configurable)

The VectoMemory module handles all vector operations:

  • Store: new memories are embedded and stored in the agent's vector space.
  • Search: given a query (e.g., the current user message), retrieve the most semantically similar past memories.
  • Update: existing memories can be rewritten during consolidation.
  • Delete: outdated or irrelevant memories can be removed.
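The four operations can be sketched with a minimal in-memory stand-in. Everything below is illustrative: the class name, the toy bag-of-words embedding, and the method signatures are assumptions, while the real VectoMemory module calls the Vecto cloud API and embeds with QWEN2.

```python
import math

def embed(text: str) -> dict:
    """Toy embedding: a word-count vector (the platform uses QWEN2)."""
    vec: dict = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyVectorMemory:
    def __init__(self) -> None:
        self._items: dict = {}  # memory id -> (text, embedding)

    def store(self, memory_id: str, text: str) -> None:
        self._items[memory_id] = (text, embed(text))

    def search(self, query: str, top_k: int = 3) -> list:
        q = embed(query)
        ranked = sorted(self._items.items(),
                        key=lambda kv: cosine(q, kv[1][1]), reverse=True)
        return [text for _, (text, _) in ranked[:top_k]]

    def update(self, memory_id: str, new_text: str) -> None:
        self.store(memory_id, new_text)  # rewritten during consolidation

    def delete(self, memory_id: str) -> None:
        self._items.pop(memory_id, None)

mem = ToyVectorMemory()
mem.store("m1", "user prefers weekly status reports on Mondays")
mem.store("m2", "deployment pipeline uses blue-green releases")
print(mem.search("when does the user want status reports", top_k=1))
```

A separate `ToyVectorMemory` instance per agent mirrors the per-agent vector spaces that keep agent memories isolated.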

Each agent gets its own auto-created vector space in Vecto. This ensures memory isolation between agents --- one agent's memories are never mixed with another's.

Tier 3: Long-term Memory (File-based Zettelkasten)

Long-term memory is the most durable tier. Agents maintain a zettelkasten (a system of interconnected markdown notes) on the NFS filesystem.

  • Storage: NFS filesystem (persistent volume)
  • Format: markdown files
  • Personal path: /data/home/knowledge/
  • Shared path: /data/home/agents/shared/knowledge/
  • Persistence: survives everything --- restarts, redeployments, scaling events
  • Retrieval: file listing and reading; agents manage their own knowledge structure

The zettelkasten is the agent's personal knowledge base. Agents create, update, and organize notes as part of their normal operation. During idle time (see Task Execution Lifecycle), agents often review and consolidate their zettelkasten.

There are two knowledge locations:

  • /data/home/knowledge/: the agent's personal knowledge. Only this agent reads and writes here.
  • /data/home/agents/shared/knowledge/: shared team knowledge. All agents in the project can read this. It typically contains company context, reference documents, and shared protocols.
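A minimal sketch of how an agent might create and list zettelkasten notes. The directory layout mirrors the documented paths; for a runnable demo a temporary directory stands in for /data/home, and the note name is hypothetical.

```python
import tempfile
from pathlib import Path

home = Path(tempfile.mkdtemp())                     # stands in for /data/home
personal = home / "knowledge"                       # agent-private notes
shared = home / "agents" / "shared" / "knowledge"   # team-readable notes
personal.mkdir(parents=True, exist_ok=True)
shared.mkdir(parents=True, exist_ok=True)

# Notes are plain markdown files the agent creates and organizes itself.
note = personal / "deployment-checklist.md"
note.write_text("# Deployment checklist\n\n- Run tests\n- Tag release\n")

print(sorted(p.name for p in personal.glob("*.md")))
```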

Memory Lifecycle: The Hippo Orchestrator

The Hippo orchestrator (named after the hippocampus, the brain's memory center) manages how memories flow between tiers. It has two modes of operation.

Synchronous Path (During Message Processing)

When an agent receives a message, before generating a response:

  1. Filter short-term: select relevant messages from the current conversation buffer.
  2. Search mid-term: query Vecto for semantically related past memories.
  3. Inject into context: append retrieved memories to the LLM prompt as additional context.

This happens on every message and ensures the agent has relevant historical context even for topics discussed days or weeks ago.
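The three steps above can be sketched schematically. The function names and the stubbed mid-term search are assumptions; the real path performs a semantic similarity query against Vecto rather than keyword matching.

```python
def mid_term_search(query: str) -> list:
    """Stub standing in for the Vecto semantic similarity search."""
    past = {"release schedule": "We agreed to ship on the first Monday."}
    return [v for k, v in past.items() if k in query.lower()]

def build_prompt(short_term: list, user_message: str) -> str:
    relevant_recent = short_term[-10:]               # 1. filter short-term
    memories = mid_term_search(user_message)         # 2. search mid-term
    context = "\n".join(memories + relevant_recent)  # 3. inject into context
    return f"{context}\nUser: {user_message}"

prompt = build_prompt(
    short_term=["User: hi", "Agent: hello!"],
    user_message="Remind me of the release schedule?",
)
print(prompt)
```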

Asynchronous Path (Consolidation)

Every 5 minutes, the Hippo orchestrator runs a consolidation cycle. The cycle uses three LLM prompts:

  • is_useful: evaluates whether a memory is worth keeping (decision: keep or discard)
  • rewrite_memory: rewrites the memory for clarity, removing conversational noise (output: cleaned text for storage)
  • create_memory: decides whether to create a new mid-term memory entry (decision: store or skip)

Why Asynchronous?

Memory consolidation is computationally expensive --- it requires multiple LLM calls per memory. Running it synchronously during message processing would add seconds of latency to every response. The 5-minute async cycle keeps response times fast while still ensuring memories are consolidated regularly.
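One consolidation pass over evicted messages might look like the sketch below. Every function body here is a placeholder standing in for an LLM call; the trivial-message set, the noise-stripping rule, and the deduplication check are assumptions for illustration.

```python
TRIVIAL = {"ok", "thanks", "got it"}

def is_useful(message: str) -> bool:
    """Placeholder for the is_useful LLM prompt: keep or discard."""
    return message.strip().lower() not in TRIVIAL

def rewrite_memory(message: str) -> str:
    """Placeholder for rewrite_memory: strip conversational noise."""
    return message.strip().rstrip("!.")

def create_memory(cleaned: str, store: list) -> None:
    """Placeholder for create_memory: store or skip."""
    if cleaned not in store:
        store.append(cleaned)

def consolidate(evicted: list, store: list) -> None:
    for message in evicted:
        if is_useful(message):
            create_memory(rewrite_memory(message), store)

store: list = []
consolidate(["thanks", "The staging DB password rotates monthly."], store)
print(store)  # only the substantive message survives
```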

Knowledge Bases (Project-Level RAG)

Separate from agent memory, the platform supports knowledge bases --- project-wide vector spaces that agents access through RAG (Retrieval-Augmented Generation).

  • Scope: agent memory is per-agent; knowledge bases are per-project.
  • Managed by: agent memory is managed automatically by the agent; knowledge bases are managed by users via the Knowledge UI.
  • Content: agent memory holds past interactions and learned facts; knowledge bases hold uploaded documents and reference material.
  • Access: agent memory is readable only by the owning agent; knowledge bases are readable by all agents in the project.

Knowledge bases are created and managed by users through the Knowledge UI. Users upload documents (PDFs, text files, web pages), which are chunked, embedded, and stored in a Vecto vector space. Agents query these knowledge bases during message processing to ground their responses in project-specific information.
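The chunking step of ingestion can be sketched as a sliding window with overlap. The chunk size and overlap below are illustrative; the platform's actual ingestion parameters are not documented here.

```python
def chunk(text: str, size: int = 40, overlap: int = 10) -> list:
    """Split text into overlapping chunks ready for embedding."""
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks

parts = chunk("a" * 100)
print([len(p) for p in parts])  # three 40-character chunks, 10 chars of overlap
```

Overlap keeps sentences that straddle a chunk boundary retrievable from either side.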

This is distinct from agent memory in an important way: knowledge bases contain information the user provides, while agent memory contains information the agent learns through interaction.

Design Trade-offs

The three-tier architecture involves several deliberate trade-offs:

Speed vs. completeness: short-term memory is fast (in-memory lookup) but limited. Mid-term memory is comprehensive but slower (network call to Vecto). The system queries both and merges results.

Cost vs. retention: not every interaction is worth storing. The is_useful prompt filters out trivial exchanges ("ok", "thanks", "got it") to avoid filling vector storage with noise.

Privacy vs. sharing: personal agent memory is isolated per-agent, but the zettelkasten supports shared knowledge directories. This lets teams share context without agents accidentally accessing each other's conversation history.

Autonomy vs. control: agents manage their own memory automatically, but users control knowledge bases. This split gives agents the ability to learn while keeping users in control of the authoritative information the agents reference.

Tuning Agent Memory
  • Agent forgets too quickly? Increase the short-term memory limit or reduce the consolidation interval.
  • Agent remembers irrelevant things? Adjust the is_useful prompt to be more selective.
  • Agent lacks domain knowledge? Add documents to a project knowledge base rather than relying on the agent to learn through conversation.
  • Multiple agents need the same context? Use the shared knowledge directory at /data/home/agents/shared/knowledge/.

See Also