Cognitive Architecture
XpressAI agents use a modular cognitive architecture inspired by neuroscience. Each module handles a specific aspect of reasoning, memory, or execution. Modules are wired together in agent.yaml (see XAIBO Configuration).
Message Flow
- An incoming message arrives at the Hippo module, which enriches it with relevant memories.
- The enriched message passes to the PFC, which runs the main reasoning loop.
- During reasoning, the PFC may invoke System 2 for deep analysis, Meeseeks for focused sub-tasks, or Desktop for computer-use actions.
- The PFC calls the respond tool (from the Response Tool Provider) to deliver the final answer.
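The flow above can be sketched as a simple pipeline. This is an illustration only: the class names, `enrich`, and `reason` methods here are hypothetical stand-ins, not the actual Xaibo module interfaces.

```python
class Hippo:
    """Toy memory orchestrator: attaches recalled memories to the message."""
    def __init__(self, memories):
        self.memories = memories

    def enrich(self, message):
        # The real Hippo combines short-term history with vector search results.
        return {"message": message, "context": list(self.memories)}


class PFC:
    """Toy reasoning engine: produces a final answer from the enriched input."""
    def reason(self, enriched):
        # A real PFC would iterate over LLM calls and tool executions here.
        return (f"Answered '{enriched['message']}' "
                f"using {len(enriched['context'])} memories")


def handle_message(message, hippo, pfc):
    """Route an incoming message through the Hippo -> PFC pipeline."""
    return pfc.reason(hippo.enrich(message))
```

The key structural point is that the PFC never sees the raw message: Hippo always sits in front of it.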
Modules
PFC (Prefrontal Cortex)
The main reasoning engine. Runs an iterative loop: the LLM generates a response, tools are executed, results feed back into the next iteration.
| Parameter | Default | Description |
|---|---|---|
| max_thoughts | 10 | Maximum iterations per turn; prevents runaway loops |
| context_token_budget | 150000 | Token limit for the context window |
How it works:
1. Receive the enriched message (with memory context) from Hippo.
2. Send the context to the main LLM.
3. If the LLM returns a tool call, execute it via Thalamus validation.
4. Append the tool result to the context and repeat from step 2.
5. When the LLM calls the respond tool, the loop ends and the response is delivered.
Stress-level temperature adjustment: The PFC can dynamically adjust the LLM temperature based on how many iterations have been used. As the thought count approaches max_thoughts, temperature may increase to encourage the model to converge on an answer.
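The exact adjustment schedule is not documented here; as a minimal sketch, assuming a linear ramp from a base temperature to a maximum as the thought count approaches max_thoughts (function name and temperature values are illustrative):

```python
def adjusted_temperature(thought_count, max_thoughts=10,
                         base_temp=0.7, max_temp=1.0):
    """Hypothetical linear stress ramp: raise the LLM temperature as the
    reasoning loop nears max_thoughts, nudging the model to commit to an
    answer instead of continuing to call tools."""
    progress = min(thought_count / max_thoughts, 1.0)
    return base_temp + (max_temp - base_temp) * progress
```

Any monotonically increasing schedule would serve the same purpose; the linear ramp is just the simplest choice.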
If an agent hits the thought limit without responding, check max_thoughts: setting it too low can cut the agent off before it finishes complex tasks.
Hippo (Memory Orchestrator)
Manages short-term and mid-term memory. Named after the hippocampus.
| Parameter | Default | Description |
|---|---|---|
| memory_size | 10 | Number of recent messages kept in short-term memory |
| mid_term_memory_size | 3 | Number of vector search results to retrieve |
| consolidation_interval | 300 | Seconds between async consolidation cycles |
Memory tiers:
| Tier | Storage | Retrieval |
|---|---|---|
| Short-term | Last N messages (in-memory) | Always included in context |
| Mid-term | Vecto vector store | Semantic search, top-K results appended to context |
Consolidation: Every consolidation_interval seconds, Hippo runs an async consolidation pass. When short-term memory exceeds memory_size, overflow messages are encoded by the memory LLM and pushed to the Vecto vector store for mid-term retrieval.
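The overflow logic can be modeled in a few lines. This is a toy model, not Hippo's implementation: the class name is invented, the mid-term list stands in for the Vecto store, and `encode` replaces the memory LLM's embedding step.

```python
from collections import deque

class ShortTermMemory:
    """Toy model of Hippo consolidation: when short-term memory exceeds
    memory_size, the oldest messages are encoded and moved to a
    mid-term store (Vecto in the real system)."""
    def __init__(self, memory_size=10):
        self.memory_size = memory_size
        self.messages = deque()
        self.mid_term = []          # stands in for the Vecto vector store

    def add(self, message):
        self.messages.append(message)

    def consolidate(self):
        # Runs every consolidation_interval seconds in the real system.
        while len(self.messages) > self.memory_size:
            overflow = self.messages.popleft()   # oldest message first
            self.mid_term.append(self.encode(overflow))

    def encode(self, message):
        # The real system uses the memory LLM to produce a vector encoding.
        return {"text": message}
```

Note that consolidation drains from the oldest end, so the most recent memory_size messages always remain in short-term memory.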
System 2 Thinking
Provides deep reasoning capabilities via a high-capability reasoning model (see agent.yaml for the specific model configured). Used sparingly due to higher cost and latency.
Tools exposed to PFC:
| Tool | Input | Purpose |
|---|---|---|
| think | problem (string) | Reason through a complex problem step by step |
| analyze | subject (string) | Perform deep analysis of a subject |
| plan | goal (string) | Create a structured plan to achieve a goal |
System 2 calls use a more expensive model. The PFC decides when to invoke System 2 based on task complexity. Simple questions are handled by the main LLM alone.
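A dispatch from tool name to reasoning-model call might look like the sketch below. The prompt templates and function signature are assumptions; the document only specifies the three tool names and their string inputs.

```python
def call_system2(tool_name, argument, reasoning_model):
    """Forward a System 2 tool call to the high-capability reasoning model.
    `reasoning_model` is any callable taking a prompt string."""
    prompts = {
        "think": f"Reason step by step through this problem: {argument}",
        "analyze": f"Provide a deep analysis of: {argument}",
        "plan": f"Create a structured plan to achieve: {argument}",
    }
    if tool_name not in prompts:
        raise ValueError(f"Unknown System 2 tool: {tool_name}")
    return reasoning_model(prompts[tool_name])
```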
Thalamus
Safety validation layer. Sits between the PFC and tool execution. Validates every tool call before it runs.
- Checks that the requested tool exists.
- Validates parameter types and required fields.
- Enforces any tool-level access restrictions.
The Thalamus has no user-facing configuration. It operates transparently.
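The three checks amount to a schema validation pass. A minimal sketch, assuming a hypothetical registry format mapping tool names to required parameter types (the real Thalamus schema is not documented here):

```python
def validate_tool_call(call, registry):
    """Validate a tool call against a registry of tool schemas.
    Returns (ok, reason). Schema format here is hypothetical."""
    schema = registry.get(call["tool"])
    if schema is None:
        return False, f"unknown tool: {call['tool']}"
    for name, expected_type in schema["params"].items():
        if name not in call["args"]:
            return False, f"missing required parameter: {name}"
        if not isinstance(call["args"][name], expected_type):
            return False, f"parameter {name} must be {expected_type.__name__}"
    return True, "ok"
```

Only calls that pass all checks reach the tool; rejected calls can be fed back to the PFC as an error result so the LLM can correct itself on the next iteration.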
Meeseeks
Delegates focused sub-tasks to specialized sub-agents. Each Meeseeks runs a single LLM turn with its own system prompt and returns the result.
Configuration example:

```yaml
meeseeks:
  - name: research
    description: "Delegate a focused research task."
    system_prompt: "You are a research specialist. Given a topic, search the web and compile a concise summary with sources."
  - name: writer
    description: "Delegate a writing task."
    system_prompt: "You are a professional writer. Given a brief, produce polished copy."
```
Each Meeseeks entry becomes a tool available to the PFC. When invoked, the Meeseeks provider:
- Creates a fresh context with the sub-agent's system prompt.
- Passes the task description as the user message.
- Runs a single LLM turn (no iteration).
- Returns the result to the PFC.
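The four steps above can be sketched as a single function. The context shape (OpenAI-style role/content messages) and the function signature are assumptions for illustration:

```python
def run_meeseeks(config, task, llm):
    """Single-turn sub-agent: fresh context, one LLM call, no iteration.
    `config` is one entry from the meeseeks list; `llm` is any callable
    taking a list of role/content messages."""
    context = [
        {"role": "system", "content": config["system_prompt"]},
        {"role": "user", "content": task},
    ]
    return llm(context)   # result goes straight back to the PFC
```

Because the context is rebuilt from scratch each time, a Meeseeks never sees the parent conversation: it gets only its system prompt and the task description.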
Response Tool Provider
Provides the respond tool that the PFC calls to deliver its final answer to the user.
Flags:
| Flag | Type | Description |
|---|---|---|
continue_as_task | boolean | When true, the response is delivered and a background task is created to continue working |
This is the only way an agent can send a message back to the user. The PFC must explicitly call respond -- it cannot implicitly return text.
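The respond tool's behavior, including the continue_as_task flag, can be modeled as below. The class name and the deliver/spawn_task callables are illustrative; only the tool name and the flag come from the document.

```python
class ResponseToolProvider:
    """Sketch: respond delivers the final answer to the user, and
    continue_as_task optionally schedules background work afterward."""
    def __init__(self, deliver, spawn_task):
        self.deliver = deliver          # sends a message to the user
        self.spawn_task = spawn_task    # schedules a background task

    def respond(self, text, continue_as_task=False):
        self.deliver(text)
        if continue_as_task:
            self.spawn_task(text)
        return "delivered"
```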
Desktop Provider
Enables computer-use capabilities via Claude Sonnet. The agent can view and interact with a desktop environment through a screenshot-action loop.
How it works:
1. Take a screenshot of the current desktop state.
2. Send the screenshot to Claude Sonnet with the task description.
3. Receive an action (click, type, scroll, etc.).
4. Execute the action.
5. Repeat from step 1 until the task is complete or the iteration limit is reached.
Limits: Maximum 25 iterations per desktop action to prevent runaway interactions.
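The loop with its safety cap reduces to a bounded for-loop. The 25-iteration limit is from the document; the callable names and the action dict shape are illustrative:

```python
MAX_DESKTOP_ITERATIONS = 25   # cap stated in the docs

def desktop_loop(take_screenshot, ask_model, execute):
    """Screenshot-action loop with the 25-iteration safety cap.
    `ask_model` returns an action dict; {"type": "done"} ends the task."""
    for _ in range(MAX_DESKTOP_ITERATIONS):
        shot = take_screenshot()
        action = ask_model(shot)
        if action["type"] == "done":
            return "complete"
        execute(action)
    return "iteration limit reached"
```

Bounding the loop up front means a confused model can waste at most 25 actions before control returns to the PFC.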
Tool Logger
A transparent logging wrapper that sits around tool execution. Logs all tool calls and their results for debugging and audit purposes.
| Parameter | Default | Description |
|---|---|---|
| max_result_chars | 30000 | Tool results longer than this are truncated in logs |
The Tool Logger does not modify tool behavior. It only observes and records.
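A transparent wrapper of this kind is a one-function decorator. The sketch below is an assumption about shape, not the Xaibo implementation; the important property is that truncation applies only to the logged copy, never to the value returned to the caller.

```python
def logged(tool_fn, name, log, max_result_chars=30000):
    """Wrap a tool so every call and its (possibly truncated) result are
    recorded, without changing what the tool returns."""
    def wrapper(*args, **kwargs):
        result = tool_fn(*args, **kwargs)
        text = str(result)
        if len(text) > max_result_chars:
            text = text[:max_result_chars] + "...[truncated]"
        log.append({"tool": name, "result": text})
        return result   # caller always sees the full, untruncated result
    return wrapper
```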
Supporting Services
The following components are not modules declared in agent.yaml, but are used internally by the cognitive modules described above.
Vecto Memory (Cloud Vector Store)
Used internally by Hippo for mid-term memory storage. This is not a module in agent.yaml -- it is a backing service that Hippo communicates with automatically.
- Storage: Cloud-hosted vector database (Vecto).
- Auto-provisioning: A vector space is automatically created for each agent on first use.
- Semantic search: Queries are embedded and matched against stored memory vectors.
- Encoding: The memory LLM encodes messages into vector representations during consolidation.
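The semantic search step boils down to ranking stored vectors by similarity to an embedded query. Vecto's actual API is not shown in this document; this is a self-contained toy using cosine similarity over plain Python lists:

```python
import math

def top_k(query_vec, stored, k=3):
    """Toy semantic search: return the k stored memories whose vectors
    are most cosine-similar to the query vector. `stored` is a list of
    {"text": ..., "vector": [...]} dicts (hypothetical record shape)."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = (math.sqrt(sum(x * x for x in a))
                * math.sqrt(sum(x * x for x in b)))
        return dot / norm if norm else 0.0
    ranked = sorted(stored, key=lambda m: cosine(query_vec, m["vector"]),
                    reverse=True)
    return ranked[:k]
```

In Hippo's terms, k corresponds to mid_term_memory_size: the top-K matches are what gets appended to the PFC's context.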