Self-Correction Analysis Pipeline
Automated detection of user corrections/nudges in session logs and remediation (memory store, TOOLS.md rules, and proposed AGENTS/skill changes).
Multi-language support
Correction detection uses phrases (e.g. “that was wrong”, “try again”) from the same system as memory triggers:
- English phrases are built in; other languages come from `.language-keywords.json`.
- Run `openclaw hybrid-mem build-languages` once (or when you add new languages). It detects the top languages in your memory and translates correction signals (and other keyword groups) into those languages. After that, `self-correction-extract` matches user messages in any of those languages.

So for full multi-language support: run `build-languages`, then use the self-correction commands or the nightly job as below.
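As a rough sketch of how the merged detection might work, the snippet below combines built-in English phrases with translated ones from `.language-keywords.json` into a single regex. The phrase lists, the `corrections` key, and the file shape are assumptions for illustration; the plugin's actual data layout may differ.

```python
import json
import re
from pathlib import Path

# Illustrative subset of built-in English correction signals.
BUILTIN_CORRECTIONS = ["that was wrong", "try again", "not what i asked"]

def build_correction_regex(keywords_path: str) -> re.Pattern:
    """Merge built-in English phrases with translated phrases from
    .language-keywords.json (written by build-languages) into one regex."""
    phrases = list(BUILTIN_CORRECTIONS)
    path = Path(keywords_path).expanduser()
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
        # Assumed shape: {"corrections": {"sv": ["det var fel", ...], ...}}
        for lang_phrases in data.get("corrections", {}).values():
            phrases.extend(lang_phrases)
    # Longest-first alternation so longer phrases win over their prefixes.
    pattern = "|".join(re.escape(p) for p in sorted(set(phrases), key=len, reverse=True))
    return re.compile(pattern, re.IGNORECASE)

rx = build_correction_regex("~/.openclaw/memory/.language-keywords.json")
print(bool(rx.search("No, that was WRONG")))  # matches a built-in phrase
```

When the keywords file is missing (before `build-languages` has run), only the English built-ins match, which mirrors the behavior described above.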
Emoji as signals
User messages that contain emoji are treated as implicit feedback and feed into both pipelines:
- Negative emoji (e.g. 👎 😠 😤 💩 🙁 😞 😒) — Treated as correction signals. A message containing one of these (alone or with text) is picked up by `self-correction-extract`. If you add a follow-up message explaining what was wrong, the analyzer gets both: the emoji shows you were unhappy, and the next message shows what to fix. Useful when you react with a thumbs-down or angry face and then type “the command should use `--dry-run` first”.
- Positive emoji (e.g. 👍 ❤️ 😊 😄 🔥 ⭐ ✨) — Treated as reinforcement (reinforcer). A message containing one of these is picked up by `extract-reinforcement` and used to reinforce the preceding assistant turn (e.g. boost confidence on recalled facts or procedures). A lone “👍” or “❤️” after a good answer is enough to signal “I liked that” and strengthen the associated behavior in memory.

Emoji are language-agnostic and are always included in detection; there is no need to add them to `.language-keywords.json`. The same rate limits, confidence thresholds, and remediation caps apply.
For a short user-facing overview of how your replies and emoji feed into reinforcement and correction, see FAQ — How does the agent learn from my reactions?.
Learning your feedback wording (user-specific phrases)
Different users express praise and frustration differently. The plugin can learn your wording from session logs in a model-agnostic way (the nano-tier and heavy-tier models come from your plugin config):
- Pre-filter: Messages that already match reinforcement/correction phrases are skipped. A nano-tier model labels the rest as positive/negative/neutral feedback.
- Phrase extraction: Only positive/negative messages are sent to a heavy-tier model to extract candidate phrases.
- Window: Omitting `--days` uses 30 days the first time (or when no `.user-feedback-phrases.json` exists), then 3 days on later runs, which suits a weekly nightly.
```shell
# Auto window (30 days first run, 3 days after); models from config
openclaw hybrid-mem analyze-feedback-phrases

# Optional: override window or model
openclaw hybrid-mem analyze-feedback-phrases --days 30 --model <heavy-model>

# Merge discovered phrases into .user-feedback-phrases.json (used by detection from then on)
openclaw hybrid-mem analyze-feedback-phrases --learn
```
Discovered phrases are saved under `~/.openclaw/memory/.user-feedback-phrases.json` and are merged with the built-in correction and reinforcement lists when building the detection regexes. So after you run with `--learn`, both self-correction extract and reinforcement extract will match your typical phrases (and those of anyone else on the same install). Run it periodically (e.g. in a weekly nightly) to keep the list up to date.
Commands
1. Extract incidents (Phase 1)
Scans session JSONL from the last N days and finds user messages that look like corrections, using the merged correction signals (English + translated from `.language-keywords.json`).
```shell
# Default: last 3 days, print summary (and incidents to stdout if any)
openclaw hybrid-mem self-correction-extract

# Last 7 days, write incidents to a file for review or Phase 2
openclaw hybrid-mem self-correction-extract --days 7 --output /path/to/incidents.json
```
- Sessions are read from `~/.openclaw/agents/*/sessions/*.jsonl` (same as session distillation).
- Skip filters: heartbeat prompts, cron job text, compaction messages, sub-agent announcements, very short messages.
- Output: `{ incidents: [...], sessionsScanned }`. Each incident has `userMessage`, `precedingAssistant`, `followingAssistant`, `timestamp`, `sessionFile`.
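For orientation, an incidents file might look like the following. The values are purely illustrative; only the field names come from the output description above.

```json
{
  "incidents": [
    {
      "userMessage": "That was wrong - the command should use --dry-run first",
      "precedingAssistant": "I ran the migration directly.",
      "followingAssistant": "You're right, re-running with --dry-run first.",
      "timestamp": "2025-02-14T09:12:33Z",
      "sessionFile": "~/.openclaw/agents/main/sessions/example.jsonl"
    }
  ],
  "sessionsScanned": 42
}
```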
2. Analyze + remediate + report (Phases 2–4)
Takes incidents (from a file or by running extract in memory), sends them to the LLM to categorize each one and pick a remediation type, then:
- MEMORY_STORE: Stores the suggested fact. Dedup is exact text plus semantic (embedding similarity) when `selfCorrection.semanticDedup` is true (default). The threshold is configurable via `selfCorrection.semanticDedupThreshold` (default 0.92).
- TOOLS_RULE: By default, suggested rules are applied (inserted under the configured section, e.g. “Self-correction rules”). To opt out of applying: set `selfCorrection.applyToolsByDefault: false` in config, or pass `--no-apply-tools` for that run. When opt-out is set, use `--approve` to apply for a run. Auto-rewrite (opt-in): set `selfCorrection.autoRewriteTools: true` to have the LLM rewrite the whole TOOLS.md instead of inserting into a section.
- AGENTS_RULE / SKILL_UPDATE: Always added to the report as proposals (no auto-apply).
Cap: 5 auto-remediations per run. The report is written to `memory/reports/self-correction-YYYY-MM-DD.md`.
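The semantic half of the MEMORY_STORE dedup described above comes down to comparing embedding vectors against a similarity threshold. A minimal cosine-similarity sketch, assuming embeddings arrive as plain float lists (the plugin's actual embedding and storage code are not shown here):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two embedding vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def is_duplicate(candidate_vec, existing_vecs, threshold=0.92) -> bool:
    """Semantic dedup as described: skip storing the fact when any existing
    fact's embedding is at least `threshold` similar to the candidate's."""
    return any(cosine(candidate_vec, v) >= threshold for v in existing_vecs)
```

Raising `semanticDedupThreshold` toward 1.0 means only near-identical facts are skipped; lowering it skips more loosely related ones.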
```shell
# Use incidents from file
openclaw hybrid-mem self-correction-run --extract /path/to/incidents.json

# Run extract in memory then analyze (no file)
openclaw hybrid-mem self-correction-run

# Preview only (no store, no TOOLS changes)
openclaw hybrid-mem self-correction-run --dry-run

# Skip applying TOOLS rules this run (only suggest in report)
openclaw hybrid-mem self-correction-run --no-apply-tools

# Force apply when config has applyToolsByDefault: false
openclaw hybrid-mem self-correction-run --approve

# Custom workspace and model
openclaw hybrid-mem self-correction-run --workspace /path/to/project --model gemini-2.0-flash
```
- Workspace (for TOOLS.md and `memory/reports/`): `--workspace`, or `OPENCLAW_WORKSPACE`, or `~/.openclaw/workspace`.
- Model: `--model`, or `config.distill.defaultModel`, or `gpt-4o-mini`.
- `--no-apply-tools`: Do not insert TOOLS rules this run (only suggest in the report). Opts out of the default apply.
- `--approve`: Force apply TOOLS rules this run when config has `applyToolsByDefault: false`.
Nightly cron job (optional)
To run the full pipeline nightly (e.g. 02:30 Europe/Stockholm):
- Extract from the last 3 days (uses multi-language correction signals if `build-languages` has been run).
- Analyze with the configured LLM (e.g. Gemini for cost/context).
- Auto-remediate (memory store + TOOLS.md append; cap 5).
- Report to `memory/reports/self-correction-YYYY-MM-DD.md`.
Example job definition (schedule format depends on your OpenClaw/jobs setup):
```json
{
  "name": "self-correction-analysis",
  "schedule": "30 2 * * *",
  "tz": "Europe/Stockholm",
  "message": "Run the nightly self-correction analysis: openclaw hybrid-mem self-correction-run. Uses last 3 days of sessions, multi-language correction detection from .language-keywords.json (run build-languages first for non-English). Report is written to workspace memory/reports/self-correction-YYYY-MM-DD.md.",
  "sessionTarget": "isolated",
  "model": "sonnet"
}
```
If your runner executes shell commands, you can instead run:
```shell
openclaw hybrid-mem self-correction-run
```

Ensure `OPENCLAW_WORKSPACE` (or your workspace root) is set so the report and TOOLS.md paths are correct.
Configuration (optional)
Under `plugins.entries["openclaw-hybrid-memory"].config.selfCorrection`:
| Option | Default | Description |
|---|---|---|
| `semanticDedup` | `true` | Skip storing facts that are semantically similar to existing ones (embedding similarity). |
| `semanticDedupThreshold` | `0.92` | Similarity threshold 0–1; higher = stricter (fewer near-duplicates stored). |
| `toolsSection` | `"Self-correction rules"` | TOOLS.md section heading under which to insert rules. |
| `applyToolsByDefault` | `true` | When true, apply (insert) suggested TOOLS rules by default. Set false to only suggest (then use `--approve` to apply). Use CLI `--no-apply-tools` to skip applying for one run. |
| `autoRewriteTools` | `false` | When true, the LLM rewrites TOOLS.md to integrate new rules (no duplicates/contradictions). When false, rules are inserted into a section. |
| `analyzeViaSpawn` | `false` | When true and incident count > `spawnThreshold`, run Phase 2 (analyze) via `openclaw sessions spawn --model <spawnModel>` for large context (e.g. Gemini). |
| `spawnThreshold` | `15` | Use spawn for Phase 2 when incidents exceed this count. |
| `spawnModel` | `"gemini"` | Model for spawn when `analyzeViaSpawn` is true. |
Example (in openclaw.json or plugin config):
```json
"selfCorrection": {
  "semanticDedup": true,
  "semanticDedupThreshold": 0.92,
  "toolsSection": "Self-correction rules",
  "autoRewriteTools": false,
  "analyzeViaSpawn": true,
  "spawnThreshold": 15,
  "spawnModel": "gemini"
}
```
Phase 2 via spawn (large incident batches)
For very large incident batches, Phase 2 (LLM analysis) can be run via openclaw sessions spawn so the analysis uses a separate process and a model with a large context (e.g. Gemini).
- Set `selfCorrection.analyzeViaSpawn: true` and optionally `spawnThreshold` (default 15). When the incident count exceeds the threshold, the plugin runs `openclaw sessions spawn --model <spawnModel> --message "..." --attach <prompt-file>` and parses the JSON array from stdout.
- Requires the OpenClaw CLI and a working `sessions spawn` command. If spawn fails, the run returns an error.
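The dispatch rule above (spawn only when enabled and the batch is large enough) can be expressed in one condition. A sketch, with the function name assumed for illustration:

```python
def should_spawn(incident_count: int,
                 analyze_via_spawn: bool,
                 spawn_threshold: int = 15) -> bool:
    """Phase 2 dispatch as described above: use `sessions spawn` only when
    analyzeViaSpawn is enabled and incidents exceed spawnThreshold."""
    return analyze_via_spawn and incident_count > spawn_threshold

print(should_spawn(16, True))   # True: over the default threshold of 15
print(should_spawn(15, True))   # False: threshold must be exceeded, not met
```

Note the strict comparison: a batch of exactly `spawnThreshold` incidents still runs in-process.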
Historical testing (e.g. Feb 13–18)
To test with a fixed date range or existing extract:
- Extract incidents from the last N days and save to a file:
  `openclaw hybrid-mem self-correction-extract --days 6 --output /path/to/incidents.json`
- Run the pipeline on that file (optionally with `--dry-run` first):
  `openclaw hybrid-mem self-correction-run --extract /path/to/incidents.json`
  Or with approval for TOOLS rules:
  `openclaw hybrid-mem self-correction-run --extract /path/to/incidents.json --approve`
Adjust `--days` and paths as needed. The report is still written to `memory/reports/self-correction-YYYY-MM-DD.md` (today’s date).
Protocol summary (for the cron agent)
- Run `openclaw hybrid-mem self-correction-extract --days 3` (or rely on `self-correction-run` to do the extract in memory).
- Run `openclaw hybrid-mem self-correction-run` (optionally with `--extract <path>` if you saved incidents to a file).
- Report path: `<workspace>/memory/reports/self-correction-YYYY-MM-DD.md`. Review proposals (AGENTS_RULE / SKILL_UPDATE) before applying.
Related
- GitHub issue #34: Nightly Self-Correction Analysis
- `build-languages`: CLI reference; run first for non-English correction detection.
- Reinforcement (positive signals): `openclaw hybrid-mem extract-reinforcement` uses praise phrases and positive emoji (👍 ❤️ etc.) to reinforce facts and procedures; see the cron job `extract-reinforcement` and CLI-REFERENCE.md.
- Session distillation: SESSION-DISTILLATION.md, a separate pipeline (fact extraction from sessions).