mlti-llm-fallback
Multi-LLM intelligent switching.
Installation
npx clawhub@latest install mlti-llm-fallback
View the full skill documentation and source below.
Documentation
Multi-LLM - Intelligent Model Switching
Trigger Command: multi llm
Default Behavior: Always use Claude Opus 4.5 (strongest model)
Local model selection is activated only when the message contains the multi llm command.
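A minimal sketch of that gate in shell, assuming the incoming message is passed as the first argument (variable names here are illustrative; the shipped logic lives in scripts/select-model.sh):

```bash
#!/usr/bin/env bash
# Illustrative trigger check: default to Claude Opus 4.5 unless the
# message starts with the "multi llm" trigger.
MESSAGE="$1"
DEFAULT_MODEL="github-copilot/claude-opus-4.5"

if [[ "$MESSAGE" == "multi llm"* ]]; then
  # Strip the trigger and hand the remainder to local model selection
  TASK="${MESSAGE#multi llm}"
  echo "local-selection:${TASK}"
else
  echo "$DEFAULT_MODEL"
fi
```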
What's New in v1.1.0
- Renamed trigger from mlti llm to multi llm (clearer naming)
- Enhanced model existence checking with fallback chain
- Added detailed usage examples and troubleshooting
- Improved task detection patterns
Usage
Default Mode (without command)
Help me write a Python function -> Uses Claude Opus 4.5
Analyze this code -> Uses Claude Opus 4.5
Multi-Model Mode (with command)
multi llm Help me write a Python function -> Selects qwen2.5-coder:32b
multi llm Analyze this math proof -> Selects deepseek-r1:70b
multi llm Translate to Chinese -> Selects glm4:9b
Command Format
| Command | Description |
| --- | --- |
| multi llm | Activate intelligent model selection |
| multi llm coding | Force coding model |
| multi llm reasoning | Force reasoning model |
| multi llm chinese | Force Chinese model |
| multi llm general | Force general model |
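As a rough illustration, the force commands could map onto models with a simple case dispatch; the actual parsing in scripts/select-model.sh may differ:

```bash
# Hypothetical mapping of force subcommands ($1) to local models.
case "$1" in
  coding)    MODEL="qwen2.5-coder:32b" ;;
  reasoning) MODEL="deepseek-r1:70b"   ;;
  chinese)   MODEL="glm4:9b"           ;;
  general|*) MODEL="qwen3:32b"         ;;
esac
echo "$MODEL"
```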
Model Mapping
Primary Model (Default): github-copilot/claude-opus-4.5
Local Models (used when multi llm is triggered):
| Task Type | Model | Size | Best For |
| --- | --- | --- | --- |
| Coding | qwen2.5-coder:32b | 19 GB | Code generation, debugging, refactoring |
| Reasoning | deepseek-r1:70b | 42 GB | Math, logic, complex analysis |
| Chinese | glm4:9b | 5.5 GB | Translation, summaries, quick tasks |
| General | qwen3:32b | 20 GB | General purpose, fallback |
Fallback Chain
If the selected model is unavailable, the system tries alternatives:
Coding: qwen2.5-coder:32b -> qwen2.5-coder:14b -> qwen3:32b
Reasoning: deepseek-r1:70b -> deepseek-r1:32b -> qwen3:32b
Chinese: glm4:9b -> qwen3:8b -> qwen3:32b
General: qwen3:32b -> qwen3:14b -> qwen3:8b
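A sketch of how such a chain can be resolved against locally installed models, assuming ollama is on PATH; the helper name pick_model is hypothetical:

```bash
# Return the first model in the chain that `ollama list` reports as installed.
pick_model() {
  for model in "$@"; do
    if ollama list | awk 'NR > 1 { print $1 }' | grep -qx "$model"; then
      echo "$model"
      return 0
    fi
  done
  return 1  # nothing in the chain is installed
}

# Example: resolve the coding chain
pick_model qwen2.5-coder:32b qwen2.5-coder:14b qwen3:32b
```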
Detection Logic
User Input
|
v
Contains "multi llm"?
|
+-- No -> Use Claude Opus 4.5 (default)
|
+-- Yes -> Task Type Detection
|
+-------+-------+-------+
v v v v
Coding Reasoning Chinese General
| | | |
v v v v
qwen2.5 deepseek glm4 qwen3
coder r1:70b :9b :32b
Task Detection Keywords
| Category | Keywords (EN) | Keywords (CN) |
| --- | --- | --- |
| Coding | code, debug, function, script, api, bug, refactor, python, java, javascript | 代码, 编程, 函数, 调试, 重构 |
| Reasoning | analysis, proof, logic, math, solve, algorithm, evaluate | 推理, 分析, 证明, 逻辑, 数学, 计算, 算法 |
| Chinese | translate, summary | 翻译, 总结, 摘要, 简单, 快速 |
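A simplified sketch of keyword matching along these lines; the function name detect_task is hypothetical and the shipped patterns are likely more thorough:

```bash
# Classify a prompt into a task category by substring matching.
detect_task() {
  local prompt="${1,,}"  # lowercase so English keywords match case-insensitively
  case "$prompt" in
    *code*|*debug*|*function*|*python*|*代码*|*编程*) echo "coding" ;;
    *proof*|*logic*|*math*|*algorithm*|*推理*|*证明*) echo "reasoning" ;;
    *translate*|*summary*|*翻译*|*总结*)              echo "chinese" ;;
    *)                                                echo "general" ;;
  esac
}

detect_task "multi llm Write a Python function"  # -> coding
```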
Examples
Example 1: Coding Task
# Input
multi llm Write a Python function to calculate fibonacci
# Output
Selected: qwen2.5-coder:32b
Reason: Detected coding task (keywords: python, function)
Example 2: Math Analysis
# Input
multi llm reasoning Prove that sqrt(2) is irrational
# Output
Selected: deepseek-r1:70b
Reason: Force command 'reasoning' used
Example 3: Quick Translation
# Input
multi llm 把这段话翻译成英文
# Output
Selected: glm4:9b
Reason: Detected Chinese lightweight task (keywords: 翻译)
Example 4: Default (No trigger)
# Input
Write a REST API with authentication
# Output
Selected: claude-opus-4.5
Reason: Default model (no 'multi llm' trigger)
Prerequisites
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Start Ollama service
ollama serve
# Pull required models
ollama pull qwen2.5-coder:32b
ollama pull deepseek-r1:70b
ollama pull glm4:9b
ollama pull qwen3:32b
# Verify the models are installed
ollama list
Troubleshooting
Model not found
# Check if model exists
ollama list | grep "qwen2.5-coder"
# Pull missing model
ollama pull qwen2.5-coder:32b
Ollama not running
# Check service status
curl -s http://localhost:11434/api/tags
# Start Ollama
ollama serve &
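If a script can race the service startup, a short wait loop helps; this assumes Ollama's default port 11434:

```bash
# Block until the Ollama API responds (default endpoint).
until curl -sf http://localhost:11434/api/tags > /dev/null; do
  echo "waiting for ollama..." >&2
  sleep 1
done
```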
Slow response
- Large models (70b) require significant RAM/VRAM
- Consider using smaller variants: deepseek-r1:32b instead of deepseek-r1:70b
Wrong model selected
- Use force commands: multi llm coding, multi llm reasoning
- Check whether the keywords match your task type
Files in This Skill
multi-llm/
├── SKILL.md # This documentation
└── scripts/
├── select-model.sh # Model selection logic
└── fallback-demo.sh # Interactive demo script
Integration
With OpenCode/ClaudeCode
The trigger multi llm is detected in your message. Simply prefix your request:
multi llm [your request here]
Programmatic Usage
# Get recommended model for a task
./scripts/select-model.sh "multi llm write a sorting algorithm"
# Output: qwen2.5-coder:32b
# Demo with actual model call
./scripts/fallback-demo.sh --force-local "explain recursion"
Author
- GitHub: [@leohan123123](https://github.com/leohan123123)