Deep Dive — Storage, Search, Tags, Links, and Context

How facts are stored, searched, linked, and injected — the internals explained.


The two backends

Every fact is stored in two places simultaneously:

1. SQLite + FTS5 (structured storage)

File: ~/.openclaw/memory/facts.db

SQLite stores the full fact with all metadata:

facts table
├── id              (UUID)
├── text            ("User prefers dark mode")
├── category        (preference)
├── importance      (0.5)
├── entity          ("user")
├── key             ("preference")
├── value           ("dark mode")
├── source          ("conversation")
├── created_at      (epoch seconds)
├── source_date     (epoch seconds, when fact originated)
├── decay_class     (stable)
├── expires_at      (epoch seconds or null)
├── confidence      (1.0, decays over time)
├── summary         (short summary for long facts)
├── tags            ("ui,preference")
├── normalized_hash (SHA-256 of normalized text, for dedup)
├── recall_count    (how many times recalled)
├── last_accessed   (epoch seconds)
├── access_count    (INTEGER, times recalled — added by migration #237)
├── last_accessed_at (ISO 8601 timestamp of last recall — added by migration #237)
├── valid_from      (bi-temporal: when fact became true)
├── valid_until     (bi-temporal: when fact stopped being true)
├── superseded_at   (epoch seconds, when a newer fact replaced this)
├── superseded_by   (ID of the replacing fact)
└── supersedes_id   (ID of the fact this one replaced)

An FTS5 virtual table (facts_fts) mirrors text, category, entity, key, and value for full-text search. It uses Porter stemming and Unicode tokenization — so searching for “preferred” matches “prefer”, and non-ASCII characters work correctly. Triggers keep FTS in sync on every insert/update/delete.
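The external-content FTS5 setup described above can be sketched with Python's built-in sqlite3 module. The table and column names (facts, facts_fts) come from the schema above; the exact trigger bodies in the plugin are internal, so treat this as a minimal illustration of the sync-on-insert pattern (only the insert trigger is shown):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facts (
  id TEXT PRIMARY KEY, text TEXT, category TEXT,
  entity TEXT, key TEXT, value TEXT
);
-- Mirror of the searchable columns; Porter stemming + Unicode tokenization
CREATE VIRTUAL TABLE facts_fts USING fts5(
  text, category, entity, key, value,
  content='facts', content_rowid='rowid',
  tokenize='porter unicode61'
);
-- Keeps FTS in sync on insert (update/delete triggers are analogous)
CREATE TRIGGER facts_ai AFTER INSERT ON facts BEGIN
  INSERT INTO facts_fts(rowid, text, category, entity, key, value)
  VALUES (new.rowid, new.text, new.category, new.entity, new.key, new.value);
END;
""")
conn.execute(
    "INSERT INTO facts VALUES ('1', 'User prefers dark mode', "
    "'preference', 'user', 'preference', 'dark mode')")

# Porter stemming: searching "preferred" matches "prefers"
rows = conn.execute(
    "SELECT text FROM facts_fts WHERE facts_fts MATCH 'preferred'"
).fetchall()
```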

Indexes on: category, entity, created_at, expires_at, decay_class, tags, source_date, last_accessed, superseded_at, valid_from/valid_until, normalized_hash, access_count, last_accessed_at (partial, non-null only).

2. LanceDB (vector storage)

Directory: ~/.openclaw/memory/lancedb/

LanceDB stores the embedding vector alongside minimal metadata:

memories table (LanceDB)
├── id              (UUID)
├── text            (fact text)
├── vector          (float array, 1536 dims for text-embedding-3-small)
├── importance      (0.5)
├── category        ("preference")
└── createdAt       (epoch seconds)

The vector is generated by sending the fact text to the configured embedding provider (default: OpenAI text-embedding-3-small, 1536 dimensions). Alternative providers — Ollama (local), ONNX (local), or Google (Gemini API) — are also supported; see LLM-AND-PROVIDERS.md. LanceDB uses this vector for approximate nearest neighbor search — finding facts that are semantically similar to a query even when the exact words don’t match.

Why two backends?

| | SQLite + FTS5 | LanceDB |
|---|---|---|
| Query type | Exact match, keyword, entity/key lookup | “What was that thing about…” fuzzy semantic |
| Cost | Free (local) | ~$0.00002 per embedding (OpenAI); free with Ollama or ONNX local providers |
| Speed | Instant (local disk) | Fast (local disk + ANN index) |
| Structured data | Full metadata (entity, key, value, tags, decay) | Minimal (text + vector) |
| When it excels | “What’s User’s email?” — exact entity/key | “That discussion about database choices” — semantic |

By searching both and merging results, the system gets the best of both worlds.


How search works

FTS5 search (SQLite)

When you search for “database performance”:

  1. Query preparation — words are quoted and joined with OR: "database" OR "performance"
  2. FTS5 MATCH — Porter stemming means “databases” matches “database”, “performing” matches “performance”
  3. Scoring — combines three weighted factors, plus a dynamic salience adjustment:
    • BM25 rank (60%) — text relevance from FTS5
    • Freshness (25%) — how far from expiry (1.0 = not expiring, 0.0 = expired)
    • Confidence (15%) — decays over time; refreshed on access
    • Dynamic salience — access boost (frequently recalled facts score higher) and time decay (older unused memories fade). See DYNAMIC-SALIENCE.md.
  4. Filtering — excludes expired facts, superseded facts, and optionally filters by tag
  5. Sorting — by composite score, then by effective date (newer first) on ties
  6. Access tracking — bumps access_count, last_accessed_at (and recall_count) for returned facts; extends TTL for stable/active/durable/normal facts; drives salience scoring
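Steps 1 and 3 can be sketched in a few lines. The weights (60/25/15) come from the list above; how the plugin normalizes FTS5's raw bm25() rank (which is negative) into a 0..1 value is an internal detail, so the function below assumes an already-normalized input:

```python
def prepare_match_query(query: str) -> str:
    # Step 1: quote each word and join with OR
    return " OR ".join(f'"{w}"' for w in query.split())

def composite_score(bm25_norm: float, freshness: float, confidence: float) -> float:
    # Step 3: weighted blend of relevance, freshness, and confidence
    # (salience adjustments would be applied on top of this)
    return 0.60 * bm25_norm + 0.25 * freshness + 0.15 * confidence

query = prepare_match_query("database performance")
# '"database" OR "performance"'
```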

Vector search (LanceDB)

When you search for “database performance”:

  1. Embed the query — sends the text to the configured embedding provider (default: OpenAI), gets a 1536-dim vector (~$0.00002 with OpenAI; free with local providers)
  2. ANN search — LanceDB finds the N nearest vectors by distance
  3. Score conversion — distance → score: score = 1 / (1 + distance) (higher = more similar)
  4. Min score filter — drops results below minScore (default 0.3)
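Steps 3 and 4 are simple arithmetic. A minimal sketch (the tuple shape of the ANN results is illustrative, not the plugin's actual type):

```python
def distance_to_score(distance: float) -> float:
    # Identical vectors (distance 0) score 1.0; larger distances approach 0
    return 1.0 / (1.0 + distance)

def filter_by_min_score(results, min_score: float = 0.3):
    # results: (fact_id, distance) pairs from the ANN search
    scored = [(fid, distance_to_score(d)) for fid, d in results]
    return [(fid, s) for fid, s in scored if s >= min_score]
```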

Merge and deduplicate

Results from both backends are merged:

  1. SQLite results first — added to the merged list (have full metadata)
  2. LanceDB results — added if:
    • Not a duplicate (by ID or by text match with an existing result)
    • Not a superseded fact (checked against superseded texts cache)
  3. Sort by score — highest score first; ties broken by newer effective date (source_date or created_at)
  4. Limit — trim to the requested number of results

LanceDB results that match a SQLite fact by text are enriched — the full metadata from SQLite replaces the minimal LanceDB metadata. This means vector search results get proper entity/key/value, tags, decay info, etc.
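The merge logic above can be sketched as follows. The dict shape (id/text/score/effective_date keys) is hypothetical; the enrichment step is simplified to "the SQLite copy wins" since it already carries full metadata:

```python
def merge_results(sqlite_results, lance_results, superseded_texts, limit):
    # SQLite results first: they carry full metadata
    merged = list(sqlite_results)
    seen_ids = {r["id"] for r in merged}
    seen_texts = {r["text"] for r in merged}
    for r in lance_results:
        if r["id"] in seen_ids or r["text"] in seen_texts:
            continue  # duplicate: keep the metadata-rich SQLite copy
        if r["text"] in superseded_texts:
            continue  # don't resurface superseded facts
        merged.append(r)
    # Highest score first; ties broken by newer effective date
    merged.sort(key=lambda r: (-r["score"], -r.get("effective_date", 0)))
    return merged[:limit]

sqlite_hits = [{"id": "1", "text": "Use pnpm", "score": 0.8, "effective_date": 200}]
lance_hits = [{"id": "2", "text": "DB is Postgres", "score": 0.9},
              {"id": "3", "text": "Use npm", "score": 0.95}]
res = merge_results(sqlite_hits, lance_hits, superseded_texts={"Use npm"}, limit=10)
# "Use npm" is dropped (superseded); "DB is Postgres" outranks "Use pnpm"
```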


How lookup works

Lookup is SQLite-only — no vector search, no embedding cost.

lookup("user", "preference") runs:

SELECT * FROM facts
WHERE lower(entity) = lower('user')
  AND lower(key) = lower('preference')
  AND (expires_at IS NULL OR expires_at > now)
  AND superseded_at IS NULL
ORDER BY confidence DESC, COALESCE(source_date, created_at) DESC

Returns: all matching facts, ordered by confidence (highest first), then by effective date (newest first).

With tag filter: lookup("user", "preference", "ui") adds:

AND (',' || COALESCE(tags,'') || ',') LIKE '%,ui,%'
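The lookup query above can be exercised end-to-end with sqlite3. The schema is pared down to the columns the query touches; note how the superseded fact and the fact without the "ui" tag are both filtered out:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE facts (
  id TEXT, entity TEXT, key TEXT, text TEXT, tags TEXT,
  confidence REAL, created_at INTEGER, source_date INTEGER,
  expires_at INTEGER, superseded_at INTEGER)""")
now = int(time.time())
conn.executemany("INSERT INTO facts VALUES (?,?,?,?,?,?,?,?,?,?)", [
    ("1", "user", "preference", "User prefers dark mode", "ui,preference",
     1.0, now, None, None, None),
    ("2", "user", "preference", "User prefers tabs", "editor",
     0.9, now, None, None, None),
    ("3", "user", "preference", "Old preference", "ui",
     1.0, now - 100, None, None, now),   # superseded
])

rows = conn.execute("""
  SELECT text FROM facts
  WHERE lower(entity) = lower(?) AND lower(key) = lower(?)
    AND (expires_at IS NULL OR expires_at > ?)
    AND superseded_at IS NULL
    AND (',' || COALESCE(tags,'') || ',') LIKE '%,' || ? || ',%'
  ORDER BY confidence DESC, COALESCE(source_date, created_at) DESC
""", ("user", "preference", now, "ui")).fetchall()
# Only fact 1 survives: fact 2 lacks the "ui" tag, fact 3 is superseded
```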

How auto-recall injects context

Each turn, before the agent sees your message:

  1. Search both backends with your prompt as the query
  2. Merge results (as described above)
  3. Optional entity lookup — if your prompt mentions a known entity (e.g. “user”), lookup facts for that entity are merged in. Names come from entityLookup.entities when that list is non-empty; otherwise, with entityLookup.autoFromFacts true (default), from distinct entity values on active facts (capped by maxAutoEntities). See CONFIGURATION-MODES.md and CONFIGURATION.md.
  4. Optional graph traversal — if enabled, follow typed links from seed facts to discover related facts (zero LLM cost)
  5. Score adjustments:
    • preferLongTerm — multiply score by 1.2 for permanent facts, 1.1 for stable
    • useImportanceRecency — factor in importance and recency alongside relevance
  6. Token budget — accumulate facts until maxTokens (default 800) is reached
  7. Summary injection — if a fact is longer than summaryThreshold (default 300 chars) and has a stored summary, inject the summary instead
  8. Format — each fact is formatted as:
    • full: [sqlite/preference] User prefers dark mode
    • short: preference: User prefers dark mode
    • minimal: User prefers dark mode
  9. Inject — the formatted block is prepended to the agent’s context as <memory-context>...</memory-context>
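Steps 8 and 9 can be sketched as a small formatter. The dict keys and the helper name are illustrative; the three output styles and the <memory-context> wrapper match the description above:

```python
def format_fact(fact: dict, style: str = "full") -> str:
    # "backend" is where the result came from (e.g. sqlite), per the full style
    if style == "full":
        return f"[{fact['backend']}/{fact['category']}] {fact['text']}"
    if style == "short":
        return f"{fact['category']}: {fact['text']}"
    return fact["text"]  # minimal

fact = {"backend": "sqlite", "category": "preference",
        "text": "User prefers dark mode"}
block = format_fact(fact, "full")
injected = f"<memory-context>\n{block}\n</memory-context>"
```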

Tags

What tags are

Tags are comma-separated topic labels stored in the tags column. They enable filtered queries — “show me only Zigbee-related facts” — without relying on full-text or semantic matching.

How tags are assigned

Auto-tagging at write time: when tags is omitted from memory_store or CLI store, the plugin runs extractTags(text, entity):

"NIBE F1245 uses Modbus TCP on port 502"
→ tags: ["nibe"]

"Home Assistant Zigbee coordinator on /dev/ttyUSB0"
→ tags: ["homeassistant", "zigbee"]

"OAuth token for the API endpoint"
→ tags: ["auth", "api"]

Built-in patterns: nibe, zigbee, z-wave, auth, homeassistant, openclaw, postgres, sqlite, lancedb, api, docker, kubernetes, ha.

Entity-based tags: if the entity name matches a known tag pattern, it’s added automatically.
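A sketch of what extractTags does, using a subset of the built-in patterns listed above. The plugin's actual regexes are internal, so these patterns are illustrative; this version returns tags in alphabetical order:

```python
import re

# Illustrative subset of the built-in tag patterns
TAG_PATTERNS = {
    "nibe": r"\bnibe\b",
    "zigbee": r"\bzigbee\b",
    "homeassistant": r"\bhome\s*assistant\b",
    "auth": r"\b(oauth|auth|token)\b",
    "api": r"\bapi\b",
}

def extract_tags(text: str, entity: str = "") -> list:
    # Entity-based tags: the entity name is matched against the same patterns
    haystack = f"{text} {entity}".lower()
    return sorted(tag for tag, pat in TAG_PATTERNS.items()
                  if re.search(pat, haystack))

extract_tags("Home Assistant Zigbee coordinator on /dev/ttyUSB0")
# ["homeassistant", "zigbee"]
```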

Manual tags: pass tags explicitly to override auto-tagging:

openclaw hybrid-mem store --text "..." --tags "nibe,modbus,hvac"

How tags are stored

Stored as a comma-separated string: "nibe,modbus,hvac". Indexed with a partial index on non-empty values.

How tag filtering works

Search and lookup accept an optional tag parameter. The filter uses substring matching on the comma-delimited list:

-- "Does the tags string contain ',nibe,' ?"
AND (',' || COALESCE(tags,'') || ',') LIKE '%,nibe,%'

When memory_recall is called with a tag filter, it skips LanceDB and uses only SQLite (FTS search + lookup) with the tag filter applied. This is faster and free.
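The comma-wrapping in the SQL above can be mirrored in plain Python, which makes the point of the delimiters clear: wrapping both sides in commas prevents a tag like "ui" from matching inside an unrelated tag like "guide":

```python
def tag_matches(tags, tag: str) -> bool:
    # Mirror of: (',' || COALESCE(tags,'') || ',') LIKE '%,<tag>,%'
    return f",{tag}," in f",{tags or ''},"
```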


Links

Links are typed, directed relationships between facts, stored in the memory_links table:

memory_links table
├── id               (UUID)
├── source_fact_id   (the "from" fact)
├── target_fact_id   (the "to" fact)
├── link_type        (SUPERSEDES, CAUSED_BY, PART_OF, RELATED_TO, DEPENDS_ON)
├── strength         (0.0 – 1.0, default 1.0)
└── created_at       (epoch seconds)
| Type | Meaning | Example |
|---|---|---|
| SUPERSEDES | New fact replaces old | “Use pnpm” supersedes “Use npm” |
| CAUSED_BY | A caused B | “Build failure” caused by “Dependency update” |
| PART_OF | A is part of B | “Login page” is part of “Auth system” |
| RELATED_TO | General association | “Database schema” related to “API endpoints” |
| DEPENDS_ON | A requires B | “Frontend deploy” depends on “API deploy” |

How links are created

Explicit: The agent calls memory_link(sourceId, targetId, linkType, strength).

Auto-linking (when graph.autoLink is enabled): After storing a new fact, the plugin finds similar existing facts via embedding search. If the similarity score exceeds graph.autoLinkMinScore (default 0.7), a RELATED_TO link is created automatically.

Supersession: When classify-before-write determines a fact should UPDATE an existing one, a SUPERSEDES link is implicit in the supersedes_id / superseded_by columns.

How graph traversal works

When recall is enabled with graph.useInRecall:

  1. Seed set — initial search results (from FTS5 + LanceDB)
  2. BFS traversal — from each seed fact, follow links in both directions up to graph.maxTraversalDepth hops (default 2)
  3. Collect connected facts — all discovered fact IDs
  4. Fetch and merge — load the connected facts from SQLite and add them to the result set

This is zero LLM cost — graph traversal uses only SQLite queries. It finds causally or structurally related facts that vector search would miss.
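The traversal steps above amount to a plain breadth-first search over the link table. A sketch (the pair-based link representation is simplified; the real table also carries link_type and strength):

```python
from collections import deque

def traverse(links, seed_ids, max_depth=2):
    """BFS over memory links; links is a list of
    (source_fact_id, target_fact_id) pairs."""
    neighbors = {}
    for src, dst in links:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)   # follow links both ways
    seen = set(seed_ids)
    queue = deque((fid, 0) for fid in seed_ids)
    while queue:
        fid, depth = queue.popleft()
        if depth == max_depth:
            continue   # don't expand beyond maxTraversalDepth hops
        for nxt in neighbors.get(fid, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen - set(seed_ids)   # connected facts discovered beyond the seeds

traverse([("a", "b"), ("b", "c"), ("c", "d")], ["a"], max_depth=2)
# {"b", "c"}: "d" is 3 hops away, beyond the default depth
```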


Supersession (contradiction resolution)

When a new fact contradicts or updates an old one:

→ Full guide: CONFLICTING-MEMORIES.md

Automatic (classify-before-write)

If store.classifyBeforeWrite is enabled:

  1. Find similar facts — embed the new text, search LanceDB + SQLite for similar existing facts
  2. LLM classification — send the new fact + similar existing facts to a cheap LLM. It decides:
    • ADD — new information, store alongside existing
    • UPDATE — new version of an existing fact; supersede the old one
    • DELETE — retraction of an existing fact; mark it as superseded
    • NOOP — already known; don’t store
  3. On UPDATE: the old fact gets superseded_at, superseded_by, and valid_until set. The new fact gets supersedes_id and valid_from. Search filters exclude superseded facts by default.
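The two writes performed on UPDATE (step 3) can be shown with sqlite3. The helper name and the pared-down schema are illustrative; the column updates match the description above:

```python
import sqlite3
import time
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE facts (
  id TEXT PRIMARY KEY, text TEXT,
  valid_from INTEGER, valid_until INTEGER,
  superseded_at INTEGER, superseded_by TEXT, supersedes_id TEXT)""")

def supersede(conn, old_id: str, new_text: str) -> str:
    # Close out the old fact and link the new fact to it
    now = int(time.time())
    new_id = str(uuid.uuid4())
    conn.execute(
        "UPDATE facts SET superseded_at=?, superseded_by=?, valid_until=? WHERE id=?",
        (now, new_id, now, old_id))
    conn.execute(
        "INSERT INTO facts (id, text, valid_from, supersedes_id) VALUES (?,?,?,?)",
        (new_id, new_text, now, old_id))
    return new_id

conn.execute("INSERT INTO facts (id, text, valid_from) VALUES ('old', 'Use npm', 0)")
new_id = supersede(conn, "old", "Use pnpm")
# Default search filter (superseded_at IS NULL) now sees only the new fact
```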

Manual

The memory_store tool accepts a supersedes parameter (fact ID). When provided, the specified fact is marked as superseded and the new fact is linked. The CLI supports the same: hybrid-mem store --text "..." --supersedes <fact-id>.

Bi-temporal queries

Every fact has valid_from and valid_until (epoch seconds). This enables point-in-time queries:

# What did we know as of January 15?
openclaw hybrid-mem search "database" --as-of 2026-01-15

The search adds: AND valid_from <= @asOf AND (valid_until IS NULL OR valid_until > @asOf).
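The as-of predicate translates directly into a small filter. Combined with the supersession example above, this is what lets an old fact like “Use npm” remain queryable for dates before it was replaced:

```python
def visible_as_of(fact: dict, as_of: int) -> bool:
    # Mirror of: valid_from <= @asOf AND (valid_until IS NULL OR valid_until > @asOf)
    return fact["valid_from"] <= as_of and (
        fact["valid_until"] is None or fact["valid_until"] > as_of)

history = [
    {"text": "Use npm",  "valid_from": 100, "valid_until": 200},
    {"text": "Use pnpm", "valid_from": 200, "valid_until": None},
]
then = [f["text"] for f in history if visible_as_of(f, 150)]   # ["Use npm"]
now = [f["text"] for f in history if visible_as_of(f, 250)]    # ["Use pnpm"]
```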


File-based memory (memorySearch)

Separate from the plugin, but part of the hybrid system:

What it is

memorySearch is an OpenClaw built-in feature that indexes all memory/**/*.md files under your workspace. It provides semantic search over your file-based knowledge.

How it works

  1. Indexing — on session start (and on file watch), memorySearch reads all .md files under memory/, chunks them (500 tokens, 50 overlap), and stores chunks in its own SQLite + vector index.
  2. Search — hybrid BM25 + vector search over the chunks. Results include file path, section, and matching text.
  3. Automatic — happens transparently when the agent needs information. No explicit action required.

How it differs from the plugin

| | memory-hybrid plugin | memorySearch |
|---|---|---|
| Data | Individual facts (1 sentence – 1 paragraph) | Whole markdown files (any size) |
| Storage | Plugin’s SQLite + LanceDB | OpenClaw’s built-in index |
| Write | memory_store, auto-capture, CLI | Manual file editing, agent file writes |
| Search | Auto-recall, memory_recall, lookup | Automatic on session start, on search |
| Best for | Isolated facts, preferences, decisions | Structured docs, project state, reference data |
| Loaded | Auto-injected each turn (top N by relevance) | On-demand (when query matches a chunk) |

When to use which

  • Small, isolated fact (“User’s timezone is CET”) → memory_store (plugin)
  • Structured reference doc (API endpoints, device list) → memory/technical/api.md (file)
  • Project status with roadmap → memory/projects/project.md (file)
  • Decision with rationale → memory_store (auto-captured) + memory/decisions/2026-02.md (file)

Both systems work together: the agent gets auto-recalled facts in context plus can search files when it needs deeper information.


Deduplication

Four levels prevent duplicate facts:

1. Exact text match

Before storing, checks: SELECT id FROM facts WHERE text = ? LIMIT 1. If found, the store is skipped.

2. Fuzzy dedup (optional)

When store.fuzzyDedupe is enabled, text is normalized (trim, collapse whitespace, lowercase) and SHA-256 hashed. If an existing fact has the same hash, the store is skipped. Catches near-identical rephrasing.
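The normalize-then-hash step maps cleanly onto the stdlib; this matches the normalized_hash column described in the schema (trim, collapse whitespace, lowercase, SHA-256):

```python
import hashlib
import re

def normalized_hash(text: str) -> str:
    # Trim, collapse runs of whitespace, lowercase, then SHA-256
    normalized = re.sub(r"\s+", " ", text.strip()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Near-identical rephrasings collapse to the same hash
normalized_hash("User prefers  dark mode ") == normalized_hash("user prefers dark mode")
```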

3. Vector dedup

Before adding to LanceDB, checks if a very similar vector already exists: hasDuplicate(vector, threshold=0.95). If found, the LanceDB write is skipped (SQLite still stores the fact for structured queries).

4. Classify-before-write (optional)

The most sophisticated level: asks an LLM whether the new fact is truly new (ADD), updates an existing fact (UPDATE), retracts one (DELETE), or is already known (NOOP). See the Supersession section above.


