Model-agnostic support: analysis and options

The hybrid-memory plugin currently hardcodes OpenAI for embeddings and chat (classify, consolidate, summarize), and Gemini is recommended in docs/scripts for session distillation. This document analyzes how much work it would take to support OpenAI, Gemini, and Claude (or any combination) in a cleaner way.

Decision (as of 2026.2.16): We are not implementing model-agnostic setup for now. Keep the current hardcoded models (OpenAI for embeddings and chat; Gemini in docs/scripts for distillation). The options below remain for future reference.

⚠️ Status update (implemented): The embedding multi-provider support described in this document has since been implemented (see Option C below and PR #251). As of the current release, the plugin supports four embedding providers: OpenAI, Ollama (fully local), ONNX (fully local, no API key), and Google (Gemini API). Set embedding.provider in config; use embedding.preferredProviders for ordered fallback chains. See LLM-AND-PROVIDERS.md for configuration details. The original decision and analysis below are kept for historical context.
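As a rough illustration of the ordered fallback mentioned above, a resolver could walk `embedding.preferredProviders` and pick the first provider that is usable. This is a sketch under assumptions, not the plugin's actual code: the provider id strings, the `isAvailable` callback, the function name, and the rule that the chain takes precedence over `embedding.provider` are all hypothetical. LLM-AND-PROVIDERS.md remains the reference for the real config shape.

```typescript
// Hypothetical provider ids; the real config values may differ.
type EmbeddingProviderId = "openai" | "ollama" | "onnx" | "google";

interface EmbeddingSelection {
  provider?: EmbeddingProviderId;
  preferredProviders?: EmbeddingProviderId[];
}

// Returns the first usable provider. Assumption: a non-empty
// preferredProviders chain wins over the single `provider` setting.
function resolveEmbeddingProvider(
  cfg: EmbeddingSelection,
  isAvailable: (id: EmbeddingProviderId) => boolean,
): EmbeddingProviderId {
  let chain: EmbeddingProviderId[] = [];
  if (cfg.preferredProviders && cfg.preferredProviders.length > 0) {
    chain = cfg.preferredProviders;
  } else if (cfg.provider) {
    chain = [cfg.provider];
  }
  for (const id of chain) {
    if (isAvailable(id)) return id;
  }
  throw new Error(`No usable embedding provider in chain: [${chain.join(", ")}]`);
}
```

The `isAvailable` check is where a real implementation would test for an API key, a reachable Ollama server, or a local ONNX model file.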

Current hardcoding

Area	What’s hardcoded	Where
Embeddings	OpenAI only: `OpenAI` client, `embeddings.create()`, `provider: "openai"`, model names `text-embedding-3-small` / `text-embedding-3-large`, dimensions 1536/3072	`config.ts` (schema, defaults, `vectorDimsForModel`), `index.ts` (class `Embeddings`, `new OpenAI()`)
Chat (LLM)	Single OpenAI client for: auto-classify, consolidate (merge step), summarize-when-over-budget. Model strings like `gpt-4o-mini`, `gpt-4.1-nano`	`index.ts` (`openaiClient.chat.completions.create`), `config.ts` (autoClassify.model, summarizeModel), defaults in verify/install
Distillation	Docs and scripts say “use Gemini”, cron suggests `model: "gemini"`. Runtime is already model-agnostic (`openclaw sessions spawn --model <any>`)	`SESSION-DISTILLATION.md`, `scripts/distill-sessions/*`, `SETUP-AUTONOMOUS.md`, install/cron snippets

So: embeddings and in-plugin chat are OpenAI-only. Distillation favours Gemini only at the documentation level; the pipeline itself accepts any model.

What “model-agnostic” could mean

Embeddings: User can choose OpenAI, Google (Gemini), or another provider; each has its own API key and model list; dimensions stay correct per model.
Chat (classify / consolidate / summarize): User can choose OpenAI, Anthropic (Claude), or Google (Gemini) for these features, with the right API key and model id.
Distillation: Keep runtime agnostic; docs and examples should say “use any long-context model (e.g. Gemini, Claude, GPT)” and avoid a single hardcoded default.

Option A: Minimal (docs + config only)

Scope: No new code paths. Make wording and suggested config model-agnostic.

Distillation: Replace “use Gemini” with “use any long-context model (Gemini, Claude, GPT); Gemini recommended for 1M context”. Suggested cron/job: model: "<your long-context model>" or keep gemini as one example among others.
Chat: In the config schema and docs, describe autoClassify.model and summarizeModel as “any chat model id your provider supports (e.g. openai/gpt-4o-mini, anthropic/claude-3-haiku, google/gemini-2.0-flash)”, but only if OpenClaw actually resolves these model ids when the plugin makes a call. The plugin currently has only an OpenAI client, so changing the docs alone would not enable Claude or Gemini for classify/consolidate/summarize; it would only document future or external behaviour.

Effort: Small (a few doc/comment edits). Outcome: Distillation is clearly “any model”; plugin behaviour for embeddings and chat stays OpenAI-only unless we add code.

Option B: Chat (and embeddings) via OpenClaw — exploration result

Scope: Use OpenClaw’s existing model routing for chat (and optionally embeddings) so the plugin doesn’t manage providers or API keys.

Exploration (OpenClaw plugin SDK, as of 2026.2.14):

OpenClawPluginApi (the object passed to register(api)) exposes: id, name, config, pluginConfig, runtime, logger, registerTool, registerHook, registerHttpHandler, registerHttpRoute, registerChannel, registerGatewayMethod, registerCli, registerService, registerProvider, registerCommand, resolvePath, on (lifecycle hooks). There is no invokeChat, createCompletion, embed, or similar method.
api.runtime (PluginRuntime) exposes: config, system, media, tts, tools (e.g. createMemoryGetTool, createMemorySearchTool, registerMemoryCli), channel (routing, reply, discord/slack/telegram/etc.), logging, state. Again no model-invocation or embedding API.
OpenClaw internally has embedding clients (OpenAI, Gemini, Voyage) and chat/completion flows, but these are not part of the plugin API surface.

Conclusion: Option B is not available with the current SDK. Plugins cannot call OpenClaw’s chat or embedding APIs; they must use their own clients and keys. To use Option B in the future, OpenClaw would need to add something like:

api.invokeChat(modelId: string, messages: Array<{role, content}>, options?) → completion text or stream
and/or api.embed(modelId: string, text: string) (or batch) → number[]
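To make the shape of such a request concrete, the two methods could be typed roughly as follows. Everything here is hypothetical: `ModelInvocationApi`, the `options` fields, and `classifyViaHost` do not exist in the OpenClaw SDK; they only sketch what a plugin-callable API might look like from the plugin's side.

```typescript
// Hypothetical additions to the plugin API; none of this exists in the SDK today.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface InvokeChatOptions {
  maxTokens?: number;
  temperature?: number;
}

interface ModelInvocationApi {
  invokeChat(modelId: string, messages: ChatMessage[], options?: InvokeChatOptions): Promise<string>;
  embed(modelId: string, text: string): Promise<number[]>;
}

// Sketch of how the plugin's classify step might call the host instead of
// holding its own OpenAI client. The prompt text is illustrative only.
async function classifyViaHost(
  api: ModelInvocationApi,
  modelId: string,
  fact: string,
): Promise<string> {
  const reply = await api.invokeChat(modelId, [
    { role: "system", content: "Classify the memory into one category. Reply with the category only." },
    { role: "user", content: fact },
  ]);
  return reply.trim();
}
```

With an API like this, the plugin would pass a model id through and leave key management and provider routing to the host.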

Recommendation: If you want model-agnostic behaviour without maintaining multiple SDKs in the plugin, consider opening a feature request or PR on the OpenClaw repo for a plugin-callable model/embed API. Until then, use Option C (multi-provider inside the plugin).

Option C: Multi-provider in the plugin (embeddings + chat)

Scope: Plugin owns provider selection and (optionally) multiple API keys.

Embeddings

Config: e.g. embedding: { provider: "openai" | "google", apiKey: string, model: string }. Optional: baseURL for OpenAI-compatible endpoints.
Implementation:
- OpenAI: Keep current Embeddings class (OpenAI client, embeddings.create).
- Google: Gemini offers an OpenAI-compatible embedding endpoint. Use the same OpenAI SDK with baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/", apiKey: GOOGLE_API_KEY, model e.g. text-embedding-004 or gemini-embedding-001. Add dimensions (e.g. 768 for text-embedding-004) to vectorDimsForModel.
- Anthropic: No native embedding API; they recommend Voyage. So “Claude” for embeddings could mean “use Voyage” (separate key) or we only support OpenAI + Google for embeddings and document that.
Code: One interface embed(text: string): Promise<number[]>; factory that returns OpenAI or Google (and optionally Voyage) implementation based on provider. Same vectorDimsForModel lookup for all.
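A minimal sketch of that factory, assuming the config shape shown above. To keep the example self-contained, the OpenAI SDK client is represented by a small structural interface (`EmbeddingsClient`) and injected through a `makeClient` callback; those names, plus `clientOptionsFor` and `createEmbedder`, are illustrative and not taken from the plugin.

```typescript
type EmbeddingProvider = "openai" | "google";

interface EmbeddingConfig {
  provider: EmbeddingProvider;
  apiKey: string;
  model: string;
  baseURL?: string; // optional override for OpenAI-compatible endpoints
}

interface ClientOptions {
  apiKey: string;
  baseURL?: string;
}

// Structural stand-in for the part of the OpenAI SDK client used here.
interface EmbeddingsClient {
  embeddings: {
    create(req: { model: string; input: string }): Promise<{ data: Array<{ embedding: number[] }> }>;
  };
}

interface Embedder {
  embed(text: string): Promise<number[]>;
}

const GOOGLE_OPENAI_COMPAT_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/";

// Google reuses the OpenAI-style client; only the base URL and key differ.
function clientOptionsFor(cfg: EmbeddingConfig): ClientOptions {
  const baseURL = cfg.baseURL ?? (cfg.provider === "google" ? GOOGLE_OPENAI_COMPAT_BASE_URL : undefined);
  return baseURL === undefined ? { apiKey: cfg.apiKey } : { apiKey: cfg.apiKey, baseURL };
}

function createEmbedder(
  cfg: EmbeddingConfig,
  makeClient: (opts: ClientOptions) => EmbeddingsClient,
): Embedder {
  const client = makeClient(clientOptionsFor(cfg));
  return {
    async embed(text: string): Promise<number[]> {
      const res = await client.embeddings.create({ model: cfg.model, input: text });
      const first = res.data[0];
      if (!first) throw new Error(`No embedding returned by ${cfg.provider}/${cfg.model}`);
      return first.embedding;
    },
  };
}
```

In the plugin, `makeClient` would presumably construct the real SDK client from the same options. Injecting it also makes the factory easy to unit-test without network access.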

Effort: Medium (config schema, factory, one new adapter, dimension map, verify/--fix messages). Rough size: on the order of 100–150 lines.

Chat (classify, consolidate, summarize)

Config: e.g. chat: { provider: "openai" | "anthropic" | "google", apiKey?: string, model: string }. If no key, could later plug into OpenClaw auth (Option B).
Implementation: Thin adapters that all expose the same shape: createCompletion(messages, options) -> content string.
- OpenAI: existing openai.chat.completions.create.
- Anthropic: @anthropic-ai/sdk, messages.create(); map to same request/response shape.
- Google: Gemini chat can be called via REST or SDK; map to same shape.
Code: Replace direct openaiClient usage in classifyBatch, runConsolidate (merge step), and summarize-when-over-budget with a call to the chosen chat adapter. One adapter per provider, selected from config.
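The adapter layer can be sketched as pure request/response mappers plus a thin wrapper. This assumes the common `createCompletion` shape above. The Anthropic mapping reflects two known differences in its Messages API (system text is a top-level field, and `max_tokens` is required); the default of 1024 tokens, the `send` callback, and all type and function names are illustrative assumptions. The Google adapter is omitted here.

```typescript
type Role = "system" | "user" | "assistant";
interface ChatMessage { role: Role; content: string }
interface CompletionOptions { maxTokens?: number; temperature?: number }

interface ChatAdapter {
  createCompletion(messages: ChatMessage[], options?: CompletionOptions): Promise<string>;
}

interface AnthropicRequest {
  model: string;
  max_tokens: number;
  system?: string;
  temperature?: number;
  messages: Array<{ role: "user" | "assistant"; content: string }>;
}
interface AnthropicResponse { content: Array<{ type: string; text?: string }> }
interface OpenAIChatResponse { choices: Array<{ message: { content: string | null } }> }

// Anthropic takes system text as a top-level field, not as a message.
function toAnthropicRequest(
  model: string,
  messages: ChatMessage[],
  options: CompletionOptions = {},
): AnthropicRequest {
  const systemParts: string[] = [];
  const turns: AnthropicRequest["messages"] = [];
  for (const m of messages) {
    if (m.role === "system") systemParts.push(m.content);
    else turns.push({ role: m.role, content: m.content });
  }
  // Assumed default; max_tokens is mandatory for Anthropic.
  const req: AnthropicRequest = { model, max_tokens: options.maxTokens ?? 1024, messages: turns };
  if (systemParts.length > 0) req.system = systemParts.join("\n\n");
  if (options.temperature !== undefined) req.temperature = options.temperature;
  return req;
}

// Concatenate text blocks; other block types are ignored.
function fromAnthropicResponse(res: AnthropicResponse): string {
  return res.content.filter((b) => b.type === "text").map((b) => b.text ?? "").join("");
}

function fromOpenAIResponse(res: OpenAIChatResponse): string {
  return res.choices[0]?.message.content ?? "";
}

// `send` stands in for the SDK call (e.g. messages.create on the Anthropic client).
function anthropicAdapter(
  model: string,
  send: (req: AnthropicRequest) => Promise<AnthropicResponse>,
): ChatAdapter {
  return {
    async createCompletion(messages, options) {
      return fromAnthropicResponse(await send(toAnthropicRequest(model, messages, options)));
    },
  };
}
```

classifyBatch, runConsolidate, and the summarize path would then depend only on `ChatAdapter`, with the concrete adapter chosen once from `chat.provider`.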

Effort: Medium (config, 2–3 adapters, wire into 3 features, error messages). Rough size: 150–250 lines including types and errors.

Total for Option C

Embeddings: medium (OpenAI + Google; Voyage optional).
Chat: medium (OpenAI + Anthropic + Google).
Docs/config: update schema, CREDENTIALS/README/SETUP-AUTONOMOUS, verify/--fix copy.
Overall: moderate feature work, no change to core memory or storage; mainly new config and adapter layer.

Option D: OpenClaw as single source of truth (ideal long term)

Scope: Embeddings and chat both go through OpenClaw (or a shared gateway) so the plugin never holds API keys or provider logic.

Embeddings: e.g. api.embed(modelId, text) that uses OpenClaw’s embedding config and routing.
Chat: as in Option B, api.invokeChat(modelId, messages).

Then the plugin only stores model ids (e.g. openai/text-embedding-3-small, google/gemini-embedding-001, openai/gpt-4o-mini, anthropic/claude-3-haiku). Users configure providers and keys in OpenClaw once; the plugin stays model-agnostic.
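If the plugin stores ids in this `provider/model` form, it may still need to split them for logging or for dimension lookups. A minimal sketch, assuming the separator is always the first slash (the `ModelRef` type and `parseModelId` name are hypothetical):

```typescript
interface ModelRef {
  provider: string;
  model: string;
}

// Splits "provider/model" on the first slash so that model names that
// themselves contain slashes are kept intact.
function parseModelId(id: string): ModelRef {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "<provider>/<model>", got "${id}"`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}
```

For example, `parseModelId("openai/text-embedding-3-small")` yields provider `openai` and model `text-embedding-3-small`, while an id without a provider prefix is rejected rather than guessed.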

Effort: Depends entirely on OpenClaw exposing these APIs. If they do: plugin change is similar to Option B + embedding entry point. If they don’t: not feasible without upstream work.

Recommendation

Short term (low effort):
- Distillation: Treat as model-agnostic in docs and scripts (Option A): “use any long-context model (e.g. Gemini, Claude, GPT); Gemini recommended for 1M context,” and avoid a single hardcoded default where possible.
- Chat/embedding: The exploration under Option B found no chat or embedding API in the plugin SDK as of 2026.2.14. If a later SDK release exposes one, prefer Option B (and D for embeddings if available) so the plugin doesn’t hardcode providers.
If the plugin must remain self-contained:
- Option C is the way to support “any combination of OpenAI, Gemini, Claude”:
  - Embeddings: Add provider + Google (OpenAI-compatible endpoint); document that Claude doesn’t provide embeddings (optionally add Voyage).
  - Chat: Add chat.provider and adapters for OpenAI, Anthropic, Google; wire classify, consolidate, and summarize to the chosen adapter.
- Effort: on the order of 2–4 days for a solid implementation (config, adapters, tests, docs, verify/install defaults).
Dimension handling: Each embedding model has a fixed dimension (OpenAI 1536/3072, Gemini 768 or per-model, Voyage 1024). The plugin already has vectorDimsForModel; we’d extend it with a map for each supported model id so LanceDB and similarity logic stay correct.
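The extended lookup could look roughly like this. Only the dimensions stated in this document are listed; other model ids are deliberately left out. Throwing on an unknown model is an assumption about the desired behaviour, not a description of the existing `vectorDimsForModel`, and `assertVectorMatchesModel` is a hypothetical helper.

```typescript
// Dimensions taken from this document; extend the map per supported model id.
const EMBEDDING_DIMS: Record<string, number> = {
  "text-embedding-3-small": 1536, // OpenAI
  "text-embedding-3-large": 3072, // OpenAI
  "text-embedding-004": 768, // Google
};

// Assumed behaviour: fail loudly rather than guess a dimension, since a
// wrong value would corrupt the vector table schema.
function vectorDimsForModel(model: string): number {
  const dims = EMBEDDING_DIMS[model];
  if (dims === undefined) {
    throw new Error(`Unknown embedding model "${model}"; add its dimension to EMBEDDING_DIMS`);
  }
  return dims;
}

// Guard to run before writing a vector to storage.
function assertVectorMatchesModel(model: string, vector: number[]): void {
  const expected = vectorDimsForModel(model);
  if (vector.length !== expected) {
    throw new Error(`Model "${model}" expects ${expected} dims, got ${vector.length}`);
  }
}
```

A check like this matters most when a user switches providers on an existing store: vectors of different lengths cannot share one table or be compared meaningfully.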

Summary table

Area	Current	Option A	Option B	Option C
Distillation	Docs say Gemini	Docs say “any model”	Same	Same
Embeddings	OpenAI only	No change	Use OpenClaw if API exists	OpenAI + Google (+ optional Voyage) in plugin
Chat (classify etc.)	OpenAI only	No change	Use OpenClaw if API exists	OpenAI + Anthropic + Google in plugin
Effort	—	Small	Small–medium (if API exists)	Medium (2–4 days)

If you want to proceed with Option C later, the next step is to add embedding.provider and the Google embedding path (and optionally chat.provider plus one extra chat adapter), then iterate. For now we keep hardcoded models.

README — Project overview and all docs
CONFIGURATION.md — Current config reference (OpenAI-only)
ARCHITECTURE.md — System architecture overview