Hybrid-memory audit remediation (spikes + operator runbook)

This document supports the audit-health remediation work. It covers operator commands for your live store, spike conclusions from code inspection, and follow-up fixes shipped in the plugin.

Phase 0 — Run on your live deployment (Node 22+)

Requires the OpenClaw CLI with hybrid-memory loaded (openclaw hybrid-mem …). If the CLI reports a Node version error, switch to Node 22+ (see the project .nvmrc).

Execute in order:

Collapse implicit-feedback paraphrase bloat (preview then apply):

openclaw hybrid-mem reflect-meta --collapse-implicit-feedback --include-legacy --threshold 0.8 --limit 1000 --dry-run
openclaw hybrid-mem reflect-meta --collapse-implicit-feedback --include-legacy --threshold 0.8 --limit 1000

Categories: the defaults now include forge, monitoring, ops_*, etc. Apply the legacy remap policy (forge_* → forge, episode → ops_summary):

openclaw hybrid-mem categories audit
openclaw hybrid-mem categories remap --from forge_busy --to forge --apply
openclaw hybrid-mem categories remap --from forge_dispatch --to forge --apply
openclaw hybrid-mem categories remap --from forge_ops --to forge --apply
openclaw hybrid-mem categories remap --from episode --to ops_summary --apply

Re-embed vectorless facts (after collapse):

openclaw hybrid-mem reembed-vectorless --apply

Lance maintenance:

openclaw hybrid-mem vectordb-optimize --older-than-days 7

Re-baseline:
```
openclaw hybrid-mem audit health
```
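
For unattended runs, the sequence above can be wrapped so that a failing step stops everything after it. The following is a minimal sketch, not part of the plugin: `runInOrder` and the injected `run` callback are hypothetical names, and only the command strings come from this runbook.

```typescript
// Hypothetical helper (not part of hybrid-memory): runs the Phase 0 commands
// in order and stops at the first non-zero exit code.
type Run = (command: string) => number; // returns the command's exit code

const PHASE0_COMMANDS: string[] = [
  "openclaw hybrid-mem reflect-meta --collapse-implicit-feedback --include-legacy --threshold 0.8 --limit 1000 --dry-run",
  "openclaw hybrid-mem reflect-meta --collapse-implicit-feedback --include-legacy --threshold 0.8 --limit 1000",
  "openclaw hybrid-mem categories audit",
  "openclaw hybrid-mem categories remap --from forge_busy --to forge --apply",
  "openclaw hybrid-mem categories remap --from forge_dispatch --to forge --apply",
  "openclaw hybrid-mem categories remap --from forge_ops --to forge --apply",
  "openclaw hybrid-mem categories remap --from episode --to ops_summary --apply",
  "openclaw hybrid-mem reembed-vectorless --apply",
  "openclaw hybrid-mem vectordb-optimize --older-than-days 7",
  "openclaw hybrid-mem audit health",
];

function runInOrder(
  commands: string[],
  run: Run,
): { completed: string[]; failed?: string } {
  const completed: string[] = [];
  for (const command of commands) {
    if (run(command) !== 0) {
      // Later steps depend on earlier ones, so do not continue.
      return { completed, failed: command };
    }
    completed.push(command);
  }
  return { completed };
}
```

The `run` callback is injected so the ordering logic can be exercised without touching a live store. In practice an operator would still review the `--dry-run` output before letting the apply step proceed.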

Spike: vectorless auto-capture / reflection

Finding: audit health counts a fact as “vectorless” when it has no fact_embeddings row (variant = canonical), not when its Lance row is missing. Ingest paths (stage-capture, reflection) previously called vectorDb.store and setEmbeddingModel but never factsDb.storeEmbedding, so large stores looked vectorless while Lance still held the vectors.

Fix: Persist canonical embeddings via factsDb.storeEmbedding alongside Lance on successful embed paths (lifecycle/stage-capture.ts, services/reflection.ts).
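
The fix amounts to a dual write: each successful embed lands in both Lance and the facts database. Below is a minimal sketch using in-memory stand-ins. The `VectorStore` and `FactsStore` interfaces, `hasEmbedding`, `persistEmbedding`, and `countVectorless` are hypothetical; only the method names `store` and `storeEmbedding` appear in the text above, and their real signatures may differ.

```typescript
// In-memory stand-ins. The real vectorDb / factsDb signatures are assumptions.
interface VectorStore {
  store(factId: string, vector: number[]): void;
}
interface FactsStore {
  storeEmbedding(factId: string, vector: number[], variant: string): void;
  hasEmbedding(factId: string, variant: string): boolean; // hypothetical lookup
}

class MemoryVectorStore implements VectorStore {
  readonly rows = new Map<string, number[]>();
  store(factId: string, vector: number[]): void {
    this.rows.set(factId, vector);
  }
}

class MemoryFactsStore implements FactsStore {
  private readonly rows = new Map<string, number[]>();
  storeEmbedding(factId: string, vector: number[], variant: string): void {
    this.rows.set(`${factId}:${variant}`, vector);
  }
  hasEmbedding(factId: string, variant: string): boolean {
    return this.rows.has(`${factId}:${variant}`);
  }
}

// Write to both stores so the audit's view and Lance agree.
function persistEmbedding(
  factId: string,
  vector: number[],
  vectorDb: VectorStore,
  factsDb: FactsStore,
): void {
  vectorDb.store(factId, vector);
  factsDb.storeEmbedding(factId, vector, "canonical");
}

// Mirrors the audit definition: vectorless means no canonical embedding row,
// regardless of what the vector store holds.
function countVectorless(factIds: string[], factsDb: FactsStore): number {
  return factIds.filter((id) => !factsDb.hasEmbedding(id, "canonical")).length;
}
```

A fact written only through `vectorDb.store` (the old ingest behavior) still counts as vectorless here, which is the mismatch the spike describes.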

Spike: procedures stuck in `low_recall`

Finding: procedureFeedback intentionally avoids bumping procedures.success_count, while triage reads the enriched ProcedureEntry.successCount. Validated procedures could therefore still show successCount === 0 with no procedure_versions rows, which lands them in low_recall.

Fix: enrichProcedureWithFeedback applies an implied success of 1 when last_validated is set, base counts are zero, and there are no version failures yet. The rule applies in both branches (no versions, and merged versions).
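
The rule can be sketched as a pure function. This is an illustration under stated assumptions, not the plugin's implementation: the `ProcedureRow` and `VersionStats` shapes and the name `enrichedSuccessCount` are hypothetical, and treating the implied success as a floor of 1 (rather than an increment) is a guess at how the two branches combine.

```typescript
// Hypothetical shapes; only last_validated and successCount are named in the text.
interface ProcedureRow {
  successCount: number;
  failureCount: number;
  lastValidated: number | null; // epoch seconds, or null if never validated
}
interface VersionStats {
  successes: number;
  failures: number;
}

function enrichedSuccessCount(row: ProcedureRow, versions: VersionStats[]): number {
  const versionSuccesses = versions.reduce((sum, v) => sum + v.successes, 0);
  const versionFailures = versions.reduce((sum, v) => sum + v.failures, 0);
  const merged = row.successCount + versionSuccesses;

  const baseIsZero = row.successCount === 0 && row.failureCount === 0;
  const implied = row.lastValidated !== null && baseIsZero && versionFailures === 0;

  // Assumption: the implied success acts as a floor of 1, not an extra +1.
  return implied ? Math.max(merged, 1) : merged;
}
```

With this rule a validated procedure that has no recorded outcomes reports a success count of 1 instead of 0, while a procedure with any version failure keeps its real count.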

Spike: `registerContextEngine` log line

Finding: Older OpenClaw runtimes omit registerContextEngine from openclaw/plugin-sdk/core; registration is already feature-detected.

Follow-up: The type shim now declares registerContextEngine as optional for clarity; the runtime still checks typeof registerContextEngine === "function".
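
The pattern is ordinary feature detection against an optional member. A minimal sketch, assuming a hypothetical `PluginApi` shape and a hypothetical `tryRegisterContextEngine` wrapper; only the name `registerContextEngine` and the `typeof` check come from the text, and the engine argument is illustrative.

```typescript
// Hypothetical API shape: the real SDK type and the engine argument may differ.
interface ContextEngine {
  name: string;
}
interface PluginApi {
  registerContextEngine?: (engine: ContextEngine) => void;
}

function tryRegisterContextEngine(
  api: PluginApi,
  engine: ContextEngine,
): "registered" | "skipped" {
  // Older runtimes omit the function entirely, so check before calling.
  if (typeof api.registerContextEngine === "function") {
    api.registerContextEngine(engine);
    return "registered";
  }
  return "skipped";
}
```

Declaring the member as optional in the type makes the compiler require a check like this one before the call.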

Spike: entity stop-words (`User` vs `user`)

Finding: Historical facts and tool-credential capture can still use stop-word entities; cleanEntityStopwords exists for one-shot cleanup. Audit output now lists topEntitiesFiltered (retrieval-aligned) separately from the stop-word warning (raw topEntities).
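
The split between the raw and filtered lists can be sketched as follows. This is an illustration only: the stop-word set, the `EntityCount` shape, and `splitTopEntities` are hypothetical, and case-insensitive matching is an assumption suggested by the `User` vs `user` heading rather than confirmed behavior.

```typescript
// Illustrative stop-word list; the plugin's actual list is not shown in this document.
const STOP_WORDS = new Set<string>(["user", "assistant"]);

interface EntityCount {
  entity: string;
  count: number;
}

function splitTopEntities(topEntities: EntityCount[]): {
  topEntitiesFiltered: EntityCount[];
  stopWordHits: EntityCount[];
} {
  // Assumption: matching ignores case and surrounding whitespace,
  // so "User" and "user" are treated the same.
  const isStopWord = (e: EntityCount): boolean =>
    STOP_WORDS.has(e.entity.trim().toLowerCase());
  return {
    topEntitiesFiltered: topEntities.filter((e) => !isStopWord(e)),
    stopWordHits: topEntities.filter(isStopWord),
  };
}
```

Keeping both lists lets the audit warn about stop-word entities in the raw data while showing what retrieval would actually see.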

Scheduled maintenance (install cron)

New / updated jobs in cli/cmd-install.ts:

| Job | Schedule (UTC) | Command |
| --- | --- | --- |
| `daily-storage-growth-sample` | `5 4 * * *` | `openclaw hybrid-mem record-storage-sample` |
| `weekly-implicit-feedback-collapse` | `30 4 * * 0` | `reflect-meta --collapse-implicit-feedback --include-legacy …` |
| `weekly-vectordb-optimize-sunday` | `45 4 * * 0` | `vectordb-optimize --older-than-days 7` |

Re-run plugin install / cron sync after upgrading so OpenClaw picks up new jobs.
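
The schedules use the standard five-field cron layout (minute, hour, day of month, month, day of week, with 0 meaning Sunday). The sketch below decodes only the simple shapes used by these jobs; `describeSchedule` is a hypothetical helper and not a general cron parser.

```typescript
// Illustrative only: handles "minute hour * * dow" expressions like the jobs above.
const DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"];

function describeSchedule(expr: string): string {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error(`expected 5 cron fields, got ${fields.length}`);
  }
  const [minute = "", hour = "", , , dayOfWeek = ""] = fields;
  const time = `${hour.padStart(2, "0")}:${minute.padStart(2, "0")} UTC`;
  if (dayOfWeek === "*") {
    return `daily at ${time}`;
  }
  const day = DAYS[Number(dayOfWeek)] ?? `day ${dayOfWeek}`;
  return `${day} at ${time}`;
}
```

Read this way, the daily sample runs at 04:05 UTC, and on Sundays the collapse (04:30) runs before the optimize (04:45), matching the manual order in Phase 0.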

Manual ad hoc sampling

openclaw hybrid-mem record-storage-sample
openclaw hybrid-mem record-storage-sample --force
openclaw hybrid-mem record-storage-sample --dry-run --json

The command is idempotent: it inserts at most one row per UTC calendar day, which enables 7-day storage deltas in audit health once daily samples exist. Use --force for same-day manual QA reruns, or --dry-run --json to preview the sample payload without writing a row. Human-readable output now includes parseable markers such as status=success_recorded and status=skipped_already_sampled_today.
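
The once-per-UTC-day rule can be modeled in a few lines. This is a sketch under assumptions, not the command's implementation: `recordStorageSample` here is an in-memory stand-in, the `dry_run` status is a hypothetical label (only the two status markers above come from the text), and whether `--force` adds or replaces a row is not modeled.

```typescript
// Hypothetical in-memory model; the real command writes to the store.
type SampleStatus = "success_recorded" | "skipped_already_sampled_today" | "dry_run";

interface SampleOptions {
  force?: boolean;
  dryRun?: boolean;
}

// UTC calendar day as YYYY-MM-DD (toISOString always reports UTC).
function utcDay(date: Date): string {
  return date.toISOString().slice(0, 10);
}

function recordStorageSample(
  sampledDays: Set<string>,
  now: Date,
  options: SampleOptions = {},
): SampleStatus {
  const day = utcDay(now);
  if (options.dryRun) {
    return "dry_run"; // preview only, nothing is written
  }
  if (sampledDays.has(day) && !options.force) {
    return "skipped_already_sampled_today";
  }
  sampledDays.add(day);
  return "success_recorded";
}
```

Keying on the UTC date rather than local time keeps the daily cron job and manual runs in agreement about what counts as the same day.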