# How It Works — Runtime Flow
This page walks through what actually happens under the hood when you chat with an agent that has hybrid memory enabled. The goal is simple: the right memories show up automatically so the agent can give you relevant, personal answers — here’s how that happens each turn.
## The big picture
Every conversation turn goes through this cycle:
```
You send a message
        |
        v
1. AUTO-RECALL (before_agent_start)
   Search both backends for relevant memories
   Inject top matches into the agent's context
        |
        v
2. AGENT PROCESSES your message
   Has access to memory tools (memory_store, memory_recall, lookup, etc.)
   Can explicitly store/search if needed
        |
        v
3. AUTO-CAPTURE (agent_end)
   Scan the assistant's reply for memorable content
   Extract and store facts automatically
        |
        v
Agent responds to you
```
No manual intervention needed — the plugin hooks into OpenClaw’s lifecycle events to capture and recall automatically.
## Step 1: Auto-Recall (before each turn)
When you send a message, before the agent sees it, the plugin:
- Embeds your prompt — sends it to the configured embedding provider (OpenAI by default; or Ollama, ONNX, or Google) to get a vector representation.
- Searches both backends in parallel:
  - SQLite FTS5 — full-text search over all stored facts (free, instant).
  - LanceDB — vector similarity search over embeddings (finds fuzzy/semantic matches).
- Merges and deduplicates — combines results from both backends, removes duplicates, filters superseded facts.
- Scores and ranks — factors in: vector similarity, text relevance, importance, recency, decay class (optionally boosting permanent/stable facts).
- Applies token budget — trims to `maxTokens` (default 800) to avoid overwhelming the context.
- Injects into context — adds a `<memory-context>` block before the agent’s system prompt with the top matches.
What the agent sees (injected at the top of context):
```
<memory-context>
[sqlite/preference] User prefers dark mode in all applications
[lance/decision] Decided to use PostgreSQL because of JSONB support
[sqlite/entity] User's email: john@example.com
</memory-context>
```
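For intuition, here is a minimal sketch of the merge/rank/budget stage. The `Fact` shape, scoring weights, and the rough 4-characters-per-token estimate are illustrative assumptions, not the plugin’s actual code:

```ts
// Hypothetical sketch of the merge/rank/budget stage described above.
interface Fact {
  id: string;
  source: "sqlite" | "lance";
  category: string;
  text: string;
  similarity: number; // vector score from LanceDB (0 for FTS-only hits)
  textRank: number;   // FTS5 relevance (0 for vector-only hits)
  importance: number; // 0..1, set at capture time
  ageDays: number;
  decayBoost: number; // e.g. >1 for permanent/stable facts
}

function score(f: Fact): number {
  const recency = Math.exp(-f.ageDays / 30); // newer facts score higher
  return (
    (0.5 * f.similarity + 0.2 * f.textRank + 0.2 * f.importance + 0.1 * recency) *
    f.decayBoost
  );
}

function rankAndBudget(hits: Fact[], maxTokens = 800): Fact[] {
  // Dedupe by id, keeping the higher-scoring copy when both backends return a fact.
  const byId = new Map<string, Fact>();
  for (const f of hits) {
    const prev = byId.get(f.id);
    if (!prev || score(f) > score(prev)) byId.set(f.id, f);
  }
  // Rank by blended score, then trim to the token budget (~4 chars/token).
  const ranked = [...byId.values()].sort((a, b) => score(b) - score(a));
  const out: Fact[] = [];
  let tokens = 0;
  for (const f of ranked) {
    const cost = Math.ceil(f.text.length / 4);
    if (tokens + cost > maxTokens) break;
    out.push(f);
    tokens += cost;
  }
  return out;
}
```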
Optional enhancements:
- Entity lookup — if your prompt mentions a known entity (e.g. “user”), lookup facts for that entity are merged in.
- Retrieval directives — targeted recall when the prompt mentions an entity, matches keywords, or matches a task type; optional one-time recall at session start. Results are merged with semantic recall. Agent-scoped memory and scope filters apply so specialists see only relevant facts.
- Summary injection — long facts are injected as short summaries to save tokens.
- Graph traversal — if graph memory is enabled, related facts are discovered via typed links (zero LLM cost).
- Query expansion — when `queryExpansion.enabled` is true, the plugin asks the LLM for a short hypothetical answer or expanded query, then embeds that for vector search. The expanded text often sits closer in embedding space to stored facts. Uses the nano-tier model. (Legacy: `search.hydeEnabled` is deprecated; use `queryExpansion.enabled`.) See SEARCH-RRF-INGEST.md.
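A sketch of what query expansion does under the hood, assuming hypothetical `llm` and `embed` signatures standing in for the configured nano-tier model and embedder:

```ts
// HyDE-style query expansion, sketched with stand-in provider signatures.
async function expandAndEmbed(
  prompt: string,
  llm: (system: string, user: string) => Promise<string>,
  embed: (text: string) => Promise<number[]>,
): Promise<number[]> {
  // A short hypothetical answer usually sits closer in embedding space
  // to stored facts than the raw question does.
  const hypothetical = await llm(
    "Write a one-sentence plausible answer to the user's question.",
    prompt,
  );
  return embed(`${prompt}\n${hypothetical}`);
}
```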
Cost per turn: One embedding call per turn (~$0.00002 for OpenAI text-embedding-3-small; free with local providers such as Ollama or ONNX). One nano-tier LLM call if query expansion is enabled (~$0.0001 for nano-tier models). See LLM-AND-PROVIDERS.md for local provider options.
## Step 2: Agent processing
The agent processes your message with the injected memories in context. It also has access to tools it can call explicitly:
| Tool | What it does | When the agent uses it |
|---|---|---|
| `memory_store` | Store a new fact | When it learns something important |
| `memory_recall` | Search memories by query | When auto-recall missed something |
| `memory_forget` | Remove a stored fact | When a fact is outdated or wrong |
| `memory_checkpoint` | Create a snapshot | Before major operations |
| `memory_prune` | Clean up expired facts | Maintenance |
| `lookup` | Exact entity/key lookup | “What’s User’s email?” |
| `memory_link` | Create a relationship between facts | Connect related facts |
| `memory_reflect` | Run pattern synthesis | Extract behavioral patterns |
Most of the time, the agent doesn’t need to use these explicitly — auto-capture handles the common case.
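When it does call a tool, the call might look like this (a hypothetical shape; field names are assumptions, so check the plugin’s tool schemas for the real contract):

```ts
// Illustrative shape of an explicit memory_store call.
const storeCall = {
  tool: "memory_store",
  arguments: {
    text: "User prefers dark mode in all applications",
    category: "preference",
    entity: "user",
    key: "ui.theme", // hypothetical structured key
    value: "dark",
    importance: 0.8,
  },
};
```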
## Step 3: Auto-Capture (after each turn)
After the agent responds, the plugin scans the assistant’s reply:
- Filter check (`shouldCapture()`) — regex triggers look for memorable content (sketched after this list):
  - Preference signals: “prefer”, “like”, “hate”, “want”
  - Decision signals: “decided”, “chose”, “will use”, “always”, “never”
  - Entity signals: email addresses, phone numbers, “is called”
  - Factual signals: “born”, “birthday”, “lives”, “works”
- Sensitive content exclusion — skips passwords, API keys, SSNs, credit cards.
- Length check — skips messages shorter than 10 chars or longer than `captureMaxChars` (default 5000).
- Category detection (`detectCategory()`) — fast regex classifies into: preference, fact, decision, entity, or other. No LLM call.
- Structured field extraction (`extractStructuredFields()`) — extracts entity/key/value triples (e.g. “My birthday is Nov 13” → entity=user, key=birthday, value=Nov 13).
- Classify-before-write (optional) — if enabled, checks existing facts via embedding similarity. Decides: ADD (new fact), UPDATE (supersede old), DELETE (retract), or NOOP (already known).
- Dual store (see the write-path sketch below):
  - WAL — writes to the write-ahead log first (crash protection).
  - SQLite — stores the fact with metadata (category, importance, decay class, tags).
  - LanceDB — stores the embedding vector for semantic search.
- WAL cleanup — removes the WAL entry after successful commit.
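As a rough illustration of the filter step, a minimal `shouldCapture()` sketch; the trigger lists and sensitive-content patterns here are abbreviated assumptions, and the real ones live in the plugin source:

```ts
// Abbreviated stand-ins for the plugin's trigger and exclusion patterns.
const TRIGGERS =
  /\b(prefer|like|hate|want|decided|chose|will use|always|never|born|birthday|lives|works|is called)\b/i;
const SENSITIVE =
  /(password|api[_-]?key|\b\d{3}-\d{2}-\d{4}\b|\b\d{13,16}\b)/i; // secrets, SSNs, card numbers

function shouldCapture(reply: string, captureMaxChars = 5000): boolean {
  if (reply.length < 10 || reply.length > captureMaxChars) return false;
  if (SENSITIVE.test(reply)) return false; // never store secrets
  return TRIGGERS.test(reply);             // only store memorable content
}
```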
Cost per turn: Zero or one embedding call (only if a fact is captured). No LLM calls unless classify-before-write is enabled.
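The dual-store sequence can be sketched as follows, assuming a hypothetical `Store` interface standing in for the plugin’s internal storage layer:

```ts
// A minimal sketch of the dual-store write path with WAL crash protection.
interface StoredFact {
  text: string;
  category: string;
  embedding: number[];
}

interface Store {
  walAppend(fact: StoredFact): Promise<string>;
  walRemove(walId: string): Promise<void>;
  sqliteInsert(fact: StoredFact): Promise<void>;
  lanceInsert(fact: StoredFact): Promise<void>;
}

async function storeFact(store: Store, fact: StoredFact): Promise<void> {
  const walId = await store.walAppend(fact); // 1. durable intent log first
  await store.sqliteInsert(fact);            // 2. metadata row + FTS5 index
  await store.lanceInsert(fact);             // 3. embedding vector
  await store.walRemove(walId);              // 4. committed, so drop the WAL entry
  // If any step throws, the WAL entry survives and startup recovery replays it.
}
```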
## Background jobs (automatic)
These run inside the gateway process — no cron needed:
| Job | Interval | What it does |
|---|---|---|
| Prune | Every 60 minutes | Hard-deletes expired facts; soft-decays confidence for aging facts |
| Auto-classify | Every 24 hours (+ 5 min after startup) | Reclassifies “other” facts into proper categories via nano-tier LLM |
| Proposal prune | Every 60 minutes | Removes expired persona proposals (if enabled) |
| WAL recovery | On startup | Replays any uncommitted operations from the write-ahead log |
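A sketch of how these schedulers could be wired in-process; the intervals mirror the table above, while the job functions and names are stand-ins:

```ts
// In-process schedulers (no cron needed), sketched for illustration.
function startBackgroundJobs(jobs: {
  prune: () => void;
  autoClassify: () => void;
  proposalPrune: () => void;
}): () => void {
  const timers: NodeJS.Timeout[] = [
    setInterval(jobs.prune, 60 * 60 * 1000),             // hourly prune
    setInterval(jobs.proposalPrune, 60 * 60 * 1000),     // hourly proposal prune
    setInterval(jobs.autoClassify, 24 * 60 * 60 * 1000), // daily auto-classify
    setTimeout(jobs.autoClassify, 5 * 60 * 1000),        // first run 5 min after startup
  ];
  // Shutdown calls the returned disposer to cancel every timer.
  return () => timers.forEach((t) => clearInterval(t));
}
```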
## What happens at startup
When the gateway starts (or restarts):
- Config load — reads `openclaw.json`, validates embedding API key.
- Database init — opens SQLite (runs migrations if needed), connects to LanceDB.
- WAL recovery — replays any pending operations from the write-ahead log.
- Startup prune — deletes any expired facts immediately.
- Auto-classify (if enabled) — schedules a classify run 5 minutes after startup.
- Timer setup — starts the hourly prune timer and daily classify timer.
- Tool registration — registers all memory tools with the agent.
- Event hooks — registers `before_agent_start` (auto-recall) and `agent_end` (auto-capture).
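A sketch of the hook wiring; the event names come from this page, but the `api` surface and handler shapes are assumptions:

```ts
// Illustrative lifecycle hook registration.
type LifecycleEvent = "before_agent_start" | "agent_end";

export function registerHooks(api: {
  on(event: LifecycleEvent, handler: (ctx: unknown) => Promise<void>): void;
}): void {
  api.on("before_agent_start", async (_ctx) => {
    // Auto-recall: embed the prompt, search both backends,
    // inject the <memory-context> block.
  });
  api.on("agent_end", async (_ctx) => {
    // Auto-capture: scan the reply, extract facts, dual-store with WAL.
  });
}
```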
## What happens at shutdown
When the gateway stops:
- Timers cleared — prune, classify, and proposal timers are cancelled.
- Databases closed — SQLite, LanceDB, and credentials vault (if enabled) are closed cleanly.
## Data flow diagram
```
┌─────────────────────┐
│    Your message     │
└──────────┬──────────┘
           │
┌──────────v──────────┐
│     AUTO-RECALL     │
│                     │
│  Embed prompt       │
│  Search SQLite FTS5 │──── Free, instant
│  Search LanceDB     │──── ~$0.00002
│  Merge & rank       │
│  Inject top N       │
└──────────┬──────────┘
           │
┌──────────v──────────┐
│   AGENT RESPONSE    │
│                     │
│  (memory tools      │
│   available if      │
│   needed)           │
└──────────┬──────────┘
           │
┌──────────v──────────┐
│    AUTO-CAPTURE     │
│                     │
│  Regex filter       │──── Free, instant
│  Detect category    │──── Free, instant
│  Extract fields     │──── Free, instant
│  Store → WAL        │──── Disk write
│  Store → SQLite     │──── Disk write
│  Store → LanceDB    │──── ~$0.00002
│  Cleanup WAL        │
└─────────────────────┘
```

Background (automatic):

```
┌─────────────────────────────────────────┐
│ Every 60 min: Prune expired facts       │
│ Every 24h:    Auto-classify "other"     │──── ~$0.001/batch
│ On startup:   WAL recovery + prune      │
└─────────────────────────────────────────┘
```
## Cost summary
| Operation | Cost | When | Model tier |
|---|---|---|---|
| Auto-recall (per turn) | ~$0.00002 | Every turn | embedding only |
| Query expansion (per turn, if enabled) | ~$0.0001 | Every turn | nano |
| Auto-capture (per captured fact) | ~$0.00002 | When a fact is captured | embedding only |
| Classify-before-write (per write, if enabled) | ~$0.0001 | On every memory write | nano |
| Auto-classify batch (20 facts) | ~$0.0002–0.001 | Once per 24h | nano |
| Reflection (per run) | ~$0.002 | On-demand (CLI) | default |
| Consolidation (per cluster) | ~$0.002 | On-demand (CLI) | default |
| Session distillation (per session) | ~$0.01–0.05 | On-demand / nightly cron | heavy |
| SQLite operations | Free | Always | — |
With the nano tier (gemini-2.5-flash-lite or gpt-4.1-nano), per-message LLM costs drop 5–10× vs mid-tier models. At typical usage (~50 turns/day with query expansion and auto-classify): roughly $0.10–0.20/month total.
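As a rough check on that estimate: 50 turns/day × ($0.00002 embedding + $0.0001 query expansion) ≈ $0.006/day, or about $0.18 over 30 days; the daily auto-classify batch adds at most another ~$0.03/month.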
## Related docs
- DEEP-DIVE.md — Storage internals, search algorithms, tags, links, deduplication
- ARCHITECTURE.md — System design and workspace layout
- FEATURES.md — Categories, decay, tags, auto-classify
- CONFIGURATION.md — Tune auto-recall, auto-capture, token budgets
- OPERATIONS.md — Background jobs, cron, scripts, upgrades
- TROUBLESHOOTING.md — When things don’t work