
Knowledge & Memory

Agents on the XpressAI Platform have three tiers of memory, modeled loosely after how biological memory works. Short-term memory holds the current conversation. Mid-term memory stores semantically searchable past interactions in the cloud. Long-term memory lives as markdown files on disk. Understanding these tiers --- what goes where, how long it lasts, and how the agent retrieves it --- is essential for tuning agent behavior and building agents that actually learn from experience.

The Three Tiers

Tier 1: Short-term Memory (In-Memory, Per-Conversation)

Short-term memory is the simplest tier. It holds the last N messages of the current conversation as a JSON array in the agent's working memory.

  • Storage: in-memory (agent process)
  • Scope: current conversation only
  • Default limit: 10 messages
  • Persistence: none --- cleared between conversations
  • Retrieval: direct inclusion in the LLM prompt

When the short-term buffer exceeds its limit (more than 10 messages by default), the oldest messages are pushed to mid-term memory. This keeps the immediate context window focused on the most recent interaction while preserving older messages for semantic retrieval.
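The eviction behavior can be sketched as follows. This is an illustrative stand-in, not the platform's actual API: the function names and the plain list handed to mid-term storage are assumptions.

```python
from collections import deque

SHORT_TERM_LIMIT = 10  # platform default

def add_message(buffer: deque, evicted: list, message: str) -> None:
    """Append a message; push the oldest out toward mid-term when full."""
    buffer.append(message)
    while len(buffer) > SHORT_TERM_LIMIT:
        # Evicted messages become candidates for mid-term (Vecto) storage.
        evicted.append(buffer.popleft())

buffer: deque = deque()
evicted: list = []
for i in range(12):
    add_message(buffer, evicted, f"msg-{i}")

print(len(buffer))  # 10 most recent messages remain in working memory
print(evicted)      # ['msg-0', 'msg-1'] were pushed toward mid-term
```

Note that eviction is oldest-first, so the working buffer always holds the most recent exchange.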

Why 10, Not 50?

The conversation history window (50 messages, described in Conversation Model) is the platform-level limit on what gets sent to the agent. The short-term memory limit (10 messages) is the agent-level limit on what the agent keeps in its active working memory within a single turn. These are different mechanisms at different layers.

Two Different Limits
  • Platform history window (50 messages): The maximum conversation history sent to the messaging system.
  • Agent short-term memory (10 messages): The most recent messages the agent's reasoning loop considers for each response.

The agent sees up to 10 recent messages for immediate context, while the platform retains up to 50 messages for the broader conversation history.
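The layering of the two limits can be illustrated with simple slicing. The constant names and the slicing itself are illustrative only; the real mechanisms live at different layers of the stack.

```python
# Illustrative only: the platform retains up to 50 messages of history,
# and the agent's reasoning loop considers only the 10 most recent of
# those for immediate context.
PLATFORM_HISTORY_LIMIT = 50
AGENT_SHORT_TERM_LIMIT = 10

full_history = [f"msg-{i}" for i in range(120)]

platform_window = full_history[-PLATFORM_HISTORY_LIMIT:]    # what the platform keeps
agent_context = platform_window[-AGENT_SHORT_TERM_LIMIT:]   # what the agent reasons over

print(len(platform_window), len(agent_context))  # 50 10
```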

Tier 2: Mid-term Memory (Vecto Cloud Vectors)

Mid-term memory uses Vecto, a cloud-based vector database, to store and retrieve past interactions through semantic search.

  • Storage: Vecto cloud (managed vector DB)
  • Scope: all past interactions for this agent
  • Persistence: survives restarts and pod terminations
  • Retrieval: semantic similarity search
  • Embeddings: QWEN2 model (the platform-default pre-trained embedding model that converts text into vector representations for semantic search; not currently user-configurable)

The VectoMemory module handles all vector operations:

  • Store: new memories are embedded and stored in the agent's vector space.
  • Search: given a query (e.g., the current user message), retrieve the most semantically similar past memories.
  • Update: existing memories can be rewritten during consolidation.
  • Delete: outdated or irrelevant memories can be removed.
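The four operations can be sketched with a minimal in-memory stand-in. Everything below is illustrative: the class name, the toy bag-of-words embedding, and the method signatures are assumptions, while the real VectoMemory module calls the Vecto cloud API and embeds with QWEN2.

```python
import math

def embed(text: str) -> dict:
    """Toy embedding: a word-count vector (the platform uses QWEN2)."""
    vec: dict = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyVectorMemory:
    def __init__(self) -> None:
        self._items: dict = {}  # memory id -> (text, embedding)

    def store(self, memory_id: str, text: str) -> None:
        self._items[memory_id] = (text, embed(text))

    def search(self, query: str, top_k: int = 3) -> list:
        q = embed(query)
        ranked = sorted(self._items.items(),
                        key=lambda kv: cosine(q, kv[1][1]), reverse=True)
        return [text for _, (text, _) in ranked[:top_k]]

    def update(self, memory_id: str, new_text: str) -> None:
        self.store(memory_id, new_text)  # rewritten during consolidation

    def delete(self, memory_id: str) -> None:
        self._items.pop(memory_id, None)

mem = ToyVectorMemory()
mem.store("m1", "user prefers weekly status reports on Mondays")
mem.store("m2", "deployment pipeline uses blue-green releases")
print(mem.search("when does the user want status reports", top_k=1))
```

A separate `ToyVectorMemory` instance per agent mirrors the per-agent vector spaces that keep agent memories isolated.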

Each agent gets its own auto-created vector space in Vecto. This ensures memory isolation between agents --- one agent's memories are never mixed with another's.

Tier 3: Long-term Memory (File-based Zettelkasten)

Long-term memory is the most durable tier. Agents maintain a zettelkasten (a system of interconnected markdown notes) on the NFS filesystem.

  • Storage: NFS filesystem (persistent volume)
  • Format: markdown files
  • Personal path: /data/home/knowledge/
  • Shared path: /data/home/agents/shared/knowledge/
  • Persistence: survives everything --- restarts, redeployments, scaling events
  • Retrieval: file listing and reading; agents manage their own knowledge structure

The zettelkasten is the agent's personal knowledge base. Agents create, update, and organize notes as part of their normal operation. During idle time (see Task Execution Lifecycle), agents often review and consolidate their zettelkasten.

There are two knowledge locations:

  • /data/home/knowledge/: the agent's personal knowledge. Only this agent reads and writes here.
  • /data/home/agents/shared/knowledge/: shared team knowledge. All agents in the project can read this. It typically contains company context, reference documents, and shared protocols.
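A minimal sketch of how an agent might create and list zettelkasten notes. The directory layout mirrors the documented paths; for a runnable demo a temporary directory stands in for /data/home, and the note name is hypothetical.

```python
import tempfile
from pathlib import Path

home = Path(tempfile.mkdtemp())                     # stands in for /data/home
personal = home / "knowledge"                       # agent-private notes
shared = home / "agents" / "shared" / "knowledge"   # team-readable notes
personal.mkdir(parents=True, exist_ok=True)
shared.mkdir(parents=True, exist_ok=True)

# Notes are plain markdown files the agent creates and organizes itself.
note = personal / "deployment-checklist.md"
note.write_text("# Deployment checklist\n\n- Run tests\n- Tag release\n")

print(sorted(p.name for p in personal.glob("*.md")))
```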

Memory Lifecycle: The Hippo Orchestrator

The Hippo orchestrator (named after the hippocampus, the brain's memory center) manages how memories flow between tiers. It has two modes of operation.

Synchronous Path (During Message Processing)

When an agent receives a message, before generating a response:

  1. Filter short-term: select relevant messages from the current conversation buffer.
  2. Search mid-term: query Vecto for semantically related past memories.
  3. Inject into context: append retrieved memories to the LLM prompt as additional context.

This happens on every message and ensures the agent has relevant historical context even for topics discussed days or weeks ago.
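The three steps above can be sketched schematically. The function names and the stubbed mid-term search are assumptions; the real path performs a semantic similarity query against Vecto rather than keyword matching.

```python
def mid_term_search(query: str) -> list:
    """Stub standing in for the Vecto semantic similarity search."""
    past = {"release schedule": "We agreed to ship on the first Monday."}
    return [v for k, v in past.items() if k in query.lower()]

def build_prompt(short_term: list, user_message: str) -> str:
    relevant_recent = short_term[-10:]               # 1. filter short-term
    memories = mid_term_search(user_message)         # 2. search mid-term
    context = "\n".join(memories + relevant_recent)  # 3. inject into context
    return f"{context}\nUser: {user_message}"

prompt = build_prompt(
    short_term=["User: hi", "Agent: hello!"],
    user_message="Remind me of the release schedule?",
)
print(prompt)
```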

Asynchronous Path (Consolidation)

Every 5 minutes, the Hippo orchestrator runs a consolidation cycle. The cycle uses three LLM prompts:

  • is_useful: evaluates whether a memory is worth keeping (decision: keep or discard)
  • rewrite_memory: rewrites the memory for clarity, removing conversational noise (output: cleaned text for storage)
  • create_memory: decides whether to create a new mid-term memory entry (decision: store or skip)

Why Asynchronous?

Memory consolidation is computationally expensive --- it requires multiple LLM calls per memory. Running it synchronously during message processing would add seconds of latency to every response. The 5-minute async cycle keeps response times fast while still ensuring memories are consolidated regularly.
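One consolidation pass over evicted messages might look like the sketch below. Every function body here is a placeholder standing in for an LLM call; the trivial-message set, the noise-stripping rule, and the deduplication check are assumptions for illustration.

```python
TRIVIAL = {"ok", "thanks", "got it"}

def is_useful(message: str) -> bool:
    """Placeholder for the is_useful LLM prompt: keep or discard."""
    return message.strip().lower() not in TRIVIAL

def rewrite_memory(message: str) -> str:
    """Placeholder for rewrite_memory: strip conversational noise."""
    return message.strip().rstrip("!.")

def create_memory(cleaned: str, store: list) -> None:
    """Placeholder for create_memory: store or skip."""
    if cleaned not in store:
        store.append(cleaned)

def consolidate(evicted: list, store: list) -> None:
    for message in evicted:
        if is_useful(message):
            create_memory(rewrite_memory(message), store)

store: list = []
consolidate(["thanks", "The staging DB password rotates monthly."], store)
print(store)  # only the substantive message survives
```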

Knowledge Bases (Project-Level RAG)

Separate from agent memory, the platform supports knowledge bases --- project-wide vector spaces that agents access through RAG (Retrieval-Augmented Generation).

  • Scope: agent memory is per-agent; knowledge bases are per-project.
  • Managed by: agent memory is managed automatically by the agent; knowledge bases are managed by users via the Knowledge UI.
  • Content: agent memory holds past interactions and learned facts; knowledge bases hold uploaded documents and reference material.
  • Access: agent memory is readable only by the owning agent; knowledge bases are readable by all agents in the project.

Knowledge bases are created and managed by users through the Knowledge UI. Users upload documents (PDFs, text files, web pages), which are chunked, embedded, and stored in a Vecto vector space. Agents query these knowledge bases during message processing to ground their responses in project-specific information.
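The chunking step of ingestion can be sketched as a sliding window with overlap. The chunk size and overlap below are illustrative; the platform's actual ingestion parameters are not documented here.

```python
def chunk(text: str, size: int = 40, overlap: int = 10) -> list:
    """Split text into overlapping chunks ready for embedding."""
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks

parts = chunk("a" * 100)
print([len(p) for p in parts])  # three 40-character chunks, 10 chars of overlap
```

Overlap keeps sentences that straddle a chunk boundary retrievable from either side.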

This is distinct from agent memory in an important way: knowledge bases contain information the user provides, while agent memory contains information the agent learns through interaction.

Design Trade-offs

The three-tier architecture involves several deliberate trade-offs:

Speed vs. completeness: short-term memory is fast (in-memory lookup) but limited. Mid-term memory is comprehensive but slower (network call to Vecto). The system queries both and merges results.

Cost vs. retention: not every interaction is worth storing. The is_useful prompt filters out trivial exchanges ("ok", "thanks", "got it") to avoid filling vector storage with noise.

Privacy vs. sharing: personal agent memory is isolated per-agent, but the zettelkasten supports shared knowledge directories. This lets teams share context without agents accidentally accessing each other's conversation history.

Autonomy vs. control: agents manage their own memory automatically, but users control knowledge bases. This split gives agents the ability to learn while keeping users in control of the authoritative information the agents reference.

Tuning Agent Memory
  • Agent forgets too quickly? Increase the short-term memory limit or reduce the consolidation interval.
  • Agent remembers irrelevant things? Adjust the is_useful prompt to be more selective.
  • Agent lacks domain knowledge? Add documents to a project knowledge base rather than relying on the agent to learn through conversation.
  • Multiple agents need the same context? Use the shared knowledge directory at /data/home/agents/shared/knowledge/.

See Also