## 4. Foundry Models from partners and community

These are third-party models deployable in Foundry (often via Azure Marketplace); availability and billing depend on the provider and your subscription region.
### Anthropic (via Foundry)

These specs apply when using Claude through Azure Foundry (Marketplace). All models are in preview and require a paid Azure subscription in a region where the Anthropic offer is available.

| Model | Context window | Max output | Notes |
|---|---|---|---|
| claude-opus-4-6 (Preview) | 1,000,000 | 128,000 | Text, image, code in/out. Tool calling. |
| claude-sonnet-4-6 (Preview) | 1,000,000 | 128,000 | Text, image, code in/out. Tool calling. |
| claude-opus-4-5 (Preview) | 200,000 | 64,000 | Text, image, code. |
| claude-opus-4-1 (Preview) | 200,000 | 32,000 | Text, image, code. |
| claude-sonnet-4-5 (Preview) | 200,000 | 64,000 | Text, image, code. |
| claude-haiku-4-5 (Preview) | 200,000 | 64,000 | Text, image. Tool calling. |
### Cohere (partners)

| Model | Input (tokens) | Output | Type |
|---|---|---|---|
| Cohere-command-r-plus-08-2024 | 131,072 | 4,096 tokens | Chat, tool calling. |
| Cohere-command-r-08-2024 | 131,072 | 4,096 tokens | Chat, tool calling. |
| Cohere-embed-v3-english | 512 | 1,024-dim vector | Embeddings, English. |
| Cohere-embed-v3-multilingual | 512 | 1,024-dim vector | Embeddings, multilingual. |
### Meta (partners)

| Model | Input (tokens) | Output (tokens) | Notes |
|---|---|---|---|
| Llama-3.2-11B-Vision-Instruct | 128,000 (text + image) | 8,192 | Vision. |
| Llama-3.2-90B-Vision-Instruct | 128,000 (text + image) | 8,192 | Vision. |
| Meta-Llama-3.1-405B-Instruct | 131,072 | 8,192 | Text. |
| Meta-Llama-3.1-8B-Instruct | 131,072 | 8,192 | Text. |
| Llama-4-Scout-17B-16E-Instruct | 128,000 (text + image) | 8,192 | Vision. |
### Microsoft (partners — Phi)

| Model | Input (tokens) | Output (tokens) | Notes |
|---|---|---|---|
| Phi-4-mini-instruct | 131,072 | 4,096 | Multilingual. |
| Phi-4-multimodal-instruct | 131,072 (text, images, audio) | 4,096 | Multimodal. |
| Phi-4 | 16,384 | 16,384 | Text. |
| Phi-4-reasoning | 32,768 | 32,768 | Reasoning content. |
| Phi-4-mini-reasoning | 128,000 | 128,000 | Reasoning content. |
### Mistral AI (partners)

| Model | Input (tokens) | Output (tokens) | Notes |
|---|---|---|---|
| Codestral-2501 | 262,144 | 4,096 | Code. |
| Ministral-3B | 131,072 | 4,096 | Tool calling. |
| Mistral-small-2503 | 32,768 | 4,096 | Tool calling. |
| Mistral-medium-2505 | 128,000 (+ image) | 128,000 | Multimodal. |
### Stability AI (partners)

Image generation only: Stable Diffusion 3.5 Large (text + image in), Stable Image Core and Stable Image Ultra (text in). See Foundry Models from partners for details.
## 5. Other providers (used with memory-hybrid)

Models commonly used in the llm.nano / llm.default / llm.heavy tiers. Version and training-data details are taken from public provider docs; we update them as we verify.
### Anthropic Claude

These specs apply when using Claude via the Anthropic API (not Azure Foundry). For a Foundry/Marketplace deployment, see §4, Foundry Models from partners and community (e.g. 1M context and 128k output for opus-4-6 / sonnet-4-6).
### OpenAI

When using the openai provider with api.openai.com (not Azure), the same model names apply, and limits align with Azure where the model is the same. See the Azure tables above for gpt-4.1, gpt-4o, o3, etc.
### Ollama / local

| Model ID (example) | Context / output | Notes |
|---|---|---|
| qwen3:8b | Depends on run | Local; thinking mode supported. |
| Other (Llama, Mistral, Phi, …) | Varies | No central doc; set per deployment. |
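For local and other deployments with no central spec doc, limits have to be declared in config. A hypothetical fragment in the spirit of openclaw.model-tokens-snippet.json (the contextWindow / maxTokens keys follow §6 below; the surrounding schema and the numbers are illustrative assumptions, not the shipped file):

```json
{
  "models": {
    "ollama/qwen3:8b": {
      "contextWindow": 32768,
      "maxTokens": 8192
    }
  }
}
```

Pick values matching how the model was actually launched (e.g. the context length configured for the local runtime), since "depends on run" means the defaults here could silently truncate.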
## 6. How we use this in the plugin

- **Tier choice:** LLM-AND-PROVIDERS.md and FEATURES-AND-TIERS.md recommend models by tier (nano / default / heavy). Use this reference to check context window and max output when you need long context (e.g. distillation → heavy with a 200k+ or 1M context).
- **Config:** The OpenClaw gateway or the plugin can set per-model contextWindow and maxTokens in the provider/model config; this doc is the source of truth for Azure, where the API does not expose them.
- **Plugin catalog:** The memory-hybrid plugin uses extensions/memory-hybrid/services/model-capabilities.ts for distillBatchTokenLimit and distillMaxOutputTokens (and optionally getContextWindow). That catalog is kept in sync with this doc; add new models there when you add them here.
- **Embeddings:** When switching embedding models (e.g. text-embedding-3-small → text-embedding-3-large), the dimensions change (1,536 → 3,072); re-embed and re-index. See Embedding providers and config for embedding.dimensions.
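A minimal sketch of what such a capabilities catalog can look like. The names getContextWindow and distillMaxOutputTokens come from the notes above, and the limits are taken from the tables in this document; the data shape, the fallback values, and the distillBatchTokenLimit numbers are assumptions for illustration, not the plugin's actual code:

```typescript
// Hypothetical shape of one model-capabilities entry.
interface ModelCapabilities {
  contextWindow: number;          // total tokens the model can see
  maxOutputTokens: number;        // hard cap on generated tokens
  distillBatchTokenLimit: number; // tokens of memories per distill batch (illustrative)
}

// Illustrative entries; limits per the Anthropic (via Foundry) table above.
const catalog: Record<string, ModelCapabilities> = {
  "claude-sonnet-4-6": { contextWindow: 1_000_000, maxOutputTokens: 128_000, distillBatchTokenLimit: 400_000 },
  "claude-haiku-4-5":  { contextWindow: 200_000,   maxOutputTokens: 64_000,  distillBatchTokenLimit: 80_000 },
};

// Conservative fallback for models missing from the catalog.
const FALLBACK: ModelCapabilities = {
  contextWindow: 128_000,
  maxOutputTokens: 4_096,
  distillBatchTokenLimit: 16_000,
};

function getContextWindow(model: string): number {
  return (catalog[model] ?? FALLBACK).contextWindow;
}

function distillMaxOutputTokens(model: string): number {
  return (catalog[model] ?? FALLBACK).maxOutputTokens;
}
```

The point of the conservative fallback is that an unknown model degrades to small, safe batches instead of failing or overflowing its real context.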
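A stale index after an embedding-model switch is cheap to detect up front. A sketch of such a guard (the model-to-dimension values come from this document; the function itself and its name are hypothetical):

```typescript
// Known output dimensions per embedding model (per this document).
const EMBEDDING_DIMS: Record<string, number> = {
  "text-embedding-3-small": 1536,
  "text-embedding-3-large": 3072,
  "Cohere-embed-v3-english": 1024,
};

// Returns true when the stored index matches the configured model's
// dimensions; false means a full re-embed and re-index is required.
function indexCompatible(model: string, storedDims: number): boolean {
  const expected = EMBEDDING_DIMS[model];
  if (expected === undefined) {
    throw new Error(`Unknown embedding model: ${model}`);
  }
  return expected === storedDims;
}
```

Run a check like this at startup against the stored embedding.dimensions so a config change cannot silently mix 1,536-dim and 3,072-dim vectors in one index.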
- Added Foundry Models from partners and community: Anthropic (Foundry), Cohere, Meta, Microsoft (Phi), Mistral AI, Stability AI; source: models-from-partners.

2026-03-19
- Plugin: added services/model-capabilities.ts with per-model context window, max output tokens, and batch token limit for distill; chat.ts now uses it for distillBatchTokenLimit and distillMaxOutputTokens.
- Deploy: added openclaw.model-tokens-snippet.json for OpenClaw config.

2026-03-20
- Google: default nano/fallback model switched from the deprecated gemini-2.0-flash-lite (now returns 404) to gemini-2.5-flash-lite. See Gemini deprecations.