Self-Correction Analysis Pipeline
Automated detection of user corrections/nudges in session logs and remediation (memory store, TOOLS.md rules, and proposed AGENTS/skill changes).
Multi-language support
Correction detection uses phrases (e.g. “that was wrong”, “try again”) from the same system as memory triggers:
- English phrases are built in; other languages come from `.language-keywords.json`.
- Run `openclaw hybrid-mem build-languages` once (or when you add new languages). It detects the top languages in your memory and translates correction signals (and other keyword groups) into those languages. After that, `self-correction-extract` matches user messages in any of those languages.

So for full multi-language support: run `build-languages`, then use the self-correction commands or the nightly job as below.
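As a rough sketch of how the merged detection might work, the snippet below combines built-in English phrases with translated ones from `.language-keywords.json` into a single regex. The phrase lists, the `corrections` key, and the file shape are assumptions for illustration; the plugin's actual data layout may differ.

```python
import json
import re
from pathlib import Path

# Illustrative subset of built-in English correction signals.
BUILTIN_CORRECTIONS = ["that was wrong", "try again", "not what i asked"]

def build_correction_regex(keywords_path: str) -> re.Pattern:
    """Merge built-in English phrases with translated phrases from
    .language-keywords.json (written by build-languages) into one regex."""
    phrases = list(BUILTIN_CORRECTIONS)
    path = Path(keywords_path).expanduser()
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
        # Assumed shape: {"corrections": {"sv": ["det var fel", ...], ...}}
        for lang_phrases in data.get("corrections", {}).values():
            phrases.extend(lang_phrases)
    # Longest-first alternation so longer phrases win over their prefixes.
    pattern = "|".join(re.escape(p) for p in sorted(set(phrases), key=len, reverse=True))
    return re.compile(pattern, re.IGNORECASE)

rx = build_correction_regex("~/.openclaw/memory/.language-keywords.json")
print(bool(rx.search("No, that was WRONG")))  # matches a built-in phrase
```

When the keywords file is missing (before `build-languages` has run), only the English built-ins match, which mirrors the behavior described above.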
Emoji as signals
User messages that contain emoji are treated as implicit feedback and feed into both pipelines:
- Negative emoji (e.g. 👎 😠 😤 💩 🙁 😞 😒) — Treated as correction signals. A message containing one of these (alone or with text) is picked up by `self-correction-extract`. If you add a follow-up message explaining what was wrong, the analyzer gets both: the emoji shows you were unhappy, and the next message shows what to fix. Useful when you react with a thumbs-down or angry face and then type “the command should use `--dry-run` first”.
- Positive emoji (e.g. 👍 ❤️ 😊 😄 🔥 ⭐ ✨) — Treated as reinforcement (reinforcer). A message containing one of these is picked up by `extract-reinforcement` and used to reinforce the preceding assistant turn (e.g. boost confidence on recalled facts or procedures). A lone “👍” or “❤️” after a good answer is enough to signal “I liked that” and strengthen the associated behavior in memory.

Emoji are language-agnostic and are always included in detection; there is no need to add them to `.language-keywords.json`. The same rate limits, confidence thresholds, and remediation caps apply.
For a short user-facing overview of how your replies and emoji feed into reinforcement and correction, see FAQ — How does the agent learn from my reactions?.
Learning your feedback wording (user-specific phrases)
Different users express praise and frustration differently. The plugin can learn your wording from session logs in a model-agnostic way (the nano-tier and heavy-tier models come from your plugin config):
- Pre-filter: Messages that already match reinforcement/correction phrases are skipped. A nano-tier model labels the rest as positive/negative/neutral feedback.
- Phrase extraction: Only positive/negative messages are sent to a heavy-tier model to extract candidate phrases.
- Window: Omitting `--days` uses 30 days the first time (or when no `.user-feedback-phrases.json` exists), then 3 days on later runs, which suits a weekly nightly.
```shell
# Auto window (30 days first run, 3 days after); models from config
openclaw hybrid-mem analyze-feedback-phrases

# Optional: override window or model
openclaw hybrid-mem analyze-feedback-phrases --days 30 --model <heavy-model>

# Merge discovered phrases into .user-feedback-phrases.json (used by detection from then on)
openclaw hybrid-mem analyze-feedback-phrases --learn
```
Discovered phrases are saved under `~/.openclaw/memory/.user-feedback-phrases.json` and are merged with the built-in correction and reinforcement lists when building the detection regexes. So after you run with `--learn`, both self-correction extract and reinforcement extract will match your typical phrases (and those of anyone else on the same install). Run it periodically (e.g. in a weekly nightly) to keep the list up to date.
Commands
1. Extract incidents (Phase 1)
Scans session JSONL from the last N days and finds user messages that look like corrections, using the merged correction signals (English + translated from `.language-keywords.json`).
```shell
# Default: last 3 days, print summary (and incidents to stdout if any)
openclaw hybrid-mem self-correction-extract

# Last 7 days, write incidents to a file for review or Phase 2
openclaw hybrid-mem self-correction-extract --days 7 --output /path/to/incidents.json
```
- Sessions are read from `~/.openclaw/agents/*/sessions/*.jsonl` (same as session distillation).
- Skip filters: heartbeat prompts, cron job text, compaction messages, sub-agent announcements, very short messages.
- Output: `{ incidents: [...], sessionsScanned }`. Each incident has `userMessage`, `precedingAssistant`, `followingAssistant`, `timestamp`, `sessionFile`.
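For orientation, an incidents file might look like the following. The values are purely illustrative; only the field names come from the output description above.

```json
{
  "incidents": [
    {
      "userMessage": "That was wrong - the command should use --dry-run first",
      "precedingAssistant": "I ran the migration directly.",
      "followingAssistant": "You're right, re-running with --dry-run first.",
      "timestamp": "2025-02-14T09:12:33Z",
      "sessionFile": "~/.openclaw/agents/main/sessions/example.jsonl"
    }
  ],
  "sessionsScanned": 42
}
```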
2. Analyze + remediate + report (Phases 2–4)
Takes incidents (from a file or by running extract in memory), sends them to the LLM to categorize each one and pick a remediation type, then:
- MEMORY_STORE: Stores the suggested fact. Dedup is exact text plus semantic (embedding similarity) when `selfCorrection.semanticDedup` is true (default). The threshold is configurable via `selfCorrection.semanticDedupThreshold` (default 0.92).
- TOOLS_RULE: By default, suggested rules are applied (inserted under the configured section, e.g. “Self-correction rules”). To opt out of applying: set `selfCorrection.applyToolsByDefault: false` in config, or pass `--no-apply-tools` for that run. When opt-out is set, use `--approve` to apply for a run. Auto-rewrite (opt-in): set `selfCorrection.autoRewriteTools: true` to have the LLM rewrite the whole TOOLS.md instead of inserting into a section.
- AGENTS_RULE / SKILL_UPDATE: Always added to the report as proposals (no auto-apply).
Cap: 5 auto-remediations per run. The report is written to `memory/reports/self-correction-YYYY-MM-DD.md`.
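The semantic half of the MEMORY_STORE dedup described above comes down to comparing embedding vectors against a similarity threshold. A minimal cosine-similarity sketch, assuming embeddings arrive as plain float lists (the plugin's actual embedding and storage code are not shown here):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two embedding vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def is_duplicate(candidate_vec, existing_vecs, threshold=0.92) -> bool:
    """Semantic dedup as described: skip storing the fact when any existing
    fact's embedding is at least `threshold` similar to the candidate's."""
    return any(cosine(candidate_vec, v) >= threshold for v in existing_vecs)
```

Raising `semanticDedupThreshold` toward 1.0 means only near-identical facts are skipped; lowering it skips more loosely related ones.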
```shell
# Use incidents from file
openclaw hybrid-mem self-correction-run --extract /path/to/incidents.json

# Run extract in memory then analyze (no file)
openclaw hybrid-mem self-correction-run

# Preview only (no store, no TOOLS changes)
openclaw hybrid-mem self-correction-run --dry-run

# Skip applying TOOLS rules this run (only suggest in report)
openclaw hybrid-mem self-correction-run --no-apply-tools

# Force apply when config has applyToolsByDefault: false
openclaw hybrid-mem self-correction-run --approve

# Custom workspace and model
openclaw hybrid-mem self-correction-run --workspace /path/to/project --model gemini-2.0-flash
```
- Workspace (for TOOLS.md and `memory/reports/`): `--workspace`, or `OPENCLAW_WORKSPACE`, or `~/.openclaw/workspace`.
- Model: `--model`, or `config.distill.defaultModel`, or `gpt-4o-mini`.
- `--no-apply-tools`: Do not insert TOOLS rules this run (only suggest in the report). Opts out of the default apply.
- `--approve`: Force apply TOOLS rules this run when config has `applyToolsByDefault: false`.
Nightly cron job (optional)
To run the full pipeline nightly (e.g. 02:30 Europe/Stockholm):
- Extract from the last 3 days (uses multi-language correction signals if `build-languages` has been run).
- Analyze with the configured LLM (e.g. Gemini for cost/context).
- Auto-remediate (memory store + TOOLS.md append; cap 5).
- Report to `memory/reports/self-correction-YYYY-MM-DD.md`.
Example job definition (schedule format depends on your OpenClaw/jobs setup):
```json
{
  "name": "self-correction-analysis",
  "schedule": "30 2 * * *",
  "tz": "Europe/Stockholm",
  "message": "Run the nightly self-correction analysis: openclaw hybrid-mem self-correction-run. Uses last 3 days of sessions, multi-language correction detection from .language-keywords.json (run build-languages first for non-English). Report is written to workspace memory/reports/self-correction-YYYY-MM-DD.md.",
  "sessionTarget": "isolated",
  "model": "sonnet"
}
```
If your runner executes shell commands, you can instead run:
```shell
openclaw hybrid-mem self-correction-run
```

Ensure `OPENCLAW_WORKSPACE` (or your workspace root) is set so the report and TOOLS.md paths are correct.
Configuration (optional)
Under `plugins.entries["openclaw-hybrid-memory"].config.selfCorrection`:
| Option | Default | Description |
|---|---|---|
| `semanticDedup` | `true` | Skip storing facts that are semantically similar to existing ones (embedding similarity). |
| `semanticDedupThreshold` | `0.92` | Similarity threshold 0–1; higher = stricter (fewer near-duplicates stored). |
| `toolsSection` | `"Self-correction rules"` | TOOLS.md section heading under which to insert rules. |
| `applyToolsByDefault` | `true` | When true, apply (insert) suggested TOOLS rules by default. Set false to only suggest (then use `--approve` to apply). Use CLI `--no-apply-tools` to skip applying for one run. |
| `autoRewriteTools` | `false` | When true, the LLM rewrites TOOLS.md to integrate new rules (no duplicates/contradictions). When false, rules are inserted into a section. |
| `analyzeViaSpawn` | `false` | When true and incident count > `spawnThreshold`, run Phase 2 (analyze) via `openclaw sessions spawn --model <spawnModel>` for large context (e.g. Gemini). |
| `spawnThreshold` | `15` | Use spawn for Phase 2 when incidents exceed this count. |
| `spawnModel` | `"gemini"` | Model for spawn when `analyzeViaSpawn` is true. |
Example (in openclaw.json or plugin config):
```json
"selfCorrection": {
  "semanticDedup": true,
  "semanticDedupThreshold": 0.92,
  "toolsSection": "Self-correction rules",
  "autoRewriteTools": false,
  "analyzeViaSpawn": true,
  "spawnThreshold": 15,
  "spawnModel": "gemini"
}
```
Phase 2 via spawn (large incident batches)
For very large incident batches, Phase 2 (LLM analysis) can be run via openclaw sessions spawn so the analysis uses a separate process and a model with a large context (e.g. Gemini).
- Set `selfCorrection.analyzeViaSpawn: true` and optionally `spawnThreshold` (default 15). When the incident count exceeds the threshold, the plugin runs `openclaw sessions spawn --model <spawnModel> --message "..." --attach <prompt-file>` and parses the JSON array from stdout.
- Requires the OpenClaw CLI and a working `sessions spawn` command. If spawn fails, the run returns an error.
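The dispatch rule above (spawn only when enabled and the batch is large enough) can be expressed in one condition. A sketch, with the function name assumed for illustration:

```python
def should_spawn(incident_count: int,
                 analyze_via_spawn: bool,
                 spawn_threshold: int = 15) -> bool:
    """Phase 2 dispatch as described above: use `sessions spawn` only when
    analyzeViaSpawn is enabled and incidents exceed spawnThreshold."""
    return analyze_via_spawn and incident_count > spawn_threshold

print(should_spawn(16, True))   # True: over the default threshold of 15
print(should_spawn(15, True))   # False: threshold must be exceeded, not met
```

Note the strict comparison: a batch of exactly `spawnThreshold` incidents still runs in-process.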
Historical testing (e.g. Feb 13–18)
To test with a fixed date range or existing extract:
- Extract incidents from the last N days and save to a file:
  `openclaw hybrid-mem self-correction-extract --days 6 --output /path/to/incidents.json`
- Run the pipeline on that file (optionally with `--dry-run` first):
  `openclaw hybrid-mem self-correction-run --extract /path/to/incidents.json`
  Or with approval for TOOLS rules:
  `openclaw hybrid-mem self-correction-run --extract /path/to/incidents.json --approve`
Adjust `--days` and paths as needed. The report is still written to `memory/reports/self-correction-YYYY-MM-DD.md` (today’s date).
Protocol summary (for the cron agent)
- Run `openclaw hybrid-mem self-correction-extract --days 3` (or rely on `self-correction-run` to do the extract in memory).
- Run `openclaw hybrid-mem self-correction-run` (optionally with `--extract <path>` if you saved incidents to a file).
- Report path: `<workspace>/memory/reports/self-correction-YYYY-MM-DD.md`. Review proposals (AGENTS_RULE / SKILL_UPDATE) before applying.
Related
- GitHub issue #34: Nightly Self-Correction Analysis
- `build-languages`: CLI reference; run first for non-English correction detection.
- Reinforcement (positive signals): `openclaw hybrid-mem extract-reinforcement` uses praise phrases and positive emoji (👍 ❤️ etc.) to reinforce facts and procedures; see the cron job `extract-reinforcement` and CLI-REFERENCE.md.
- Session distillation: SESSION-DISTILLATION.md, a separate pipeline (fact extraction from sessions).