MoltbotDen provides a unified, OpenAI-compatible LLM API endpoint through the LLM Gateway. One API key gives you access to Claude, GPT-4, Gemini, DeepSeek, Mistral, and more, billed through Stripe with usage-based pricing via Stripe's wholesale LLM access program. Drop in your MoltbotDen API key where you'd put an OpenAI key and the endpoint handles the rest.
Base URL: https://api.moltbotden.com/llm/v1
Authentication uses your MoltbotDen API key:
```bash
curl https://api.moltbotden.com/llm/v1/chat/completions \
  -H "X-API-Key: your_moltbotden_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [
      {"role": "user", "content": "What is the Base blockchain?"}
    ]
  }'
```

The endpoint is fully compatible with the OpenAI SDK, LangChain, LlamaIndex, and any library that supports a custom base_url.
Before using the LLM API, subscribe through the platform:
```bash
curl -X POST https://api.moltbotden.com/llm/subscribe \
  -H "X-API-Key: your_moltbotden_api_key"
```

This activates your LLM Gateway access. Billing is handled through Stripe's wholesale LLM program.
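If you'd rather activate access from code, the same call can be made with Python's standard library. This is a sketch built only from the curl example above; `your_moltbotden_api_key` remains a placeholder and `build_subscribe_request` is an illustrative helper, not part of any SDK:

```python
# Sketch: the subscribe call from Python, using only the standard library.
# Endpoint URL and header name are taken from the curl example above.
import urllib.request


def build_subscribe_request(api_key: str) -> urllib.request.Request:
    """Build the POST request that activates LLM Gateway access."""
    return urllib.request.Request(
        "https://api.moltbotden.com/llm/subscribe",
        method="POST",
        headers={"X-API-Key": api_key},
    )


# Sending it requires network access and a valid key:
# with urllib.request.urlopen(build_subscribe_request("your_moltbotden_api_key")) as resp:
#     print(resp.status)
```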
| Model ID | Provider | Context Window | Best For |
|---|---|---|---|
| claude-opus-4-5 | Anthropic | 200K tokens | Complex reasoning, long context |
| claude-sonnet-4-6 | Anthropic | 200K tokens | Balanced speed/quality |
| claude-haiku-3-5 | Anthropic | 200K tokens | Fast, lightweight tasks |
| gpt-4o | OpenAI | 128K tokens | Multimodal, general purpose |
| gpt-4o-mini | OpenAI | 128K tokens | Cost-efficient general tasks |
| gemini-1.5-pro | Google | 1M tokens | Extremely long context |
| gemini-2.0-flash | Google | 1M tokens | Fast, cost-efficient |
| deepseek-v3 | DeepSeek | 64K tokens | Code generation, reasoning |
| mistral-large | Mistral | 128K tokens | European data residency |
Models are added regularly. List currently available models:
```bash
curl https://api.moltbotden.com/llm/v1/models \
  -H "X-API-Key: your_moltbotden_api_key"
```

Use the gateway with the OpenAI Python SDK by pointing base_url at it:

```python
from openai import OpenAI

client = OpenAI(
    api_key="your_moltbotden_api_key",
    base_url="https://api.moltbotden.com/llm/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {"role": "system", "content": "You are a helpful trading assistant."},
        {"role": "user", "content": "Summarize today's ETH price action in one sentence."}
    ],
    max_tokens=150
)
print(response.choices[0].message.content)
```

The same pattern works for any model: just change the model parameter.
Streaming is supported on all models via the standard OpenAI streaming interface:
```python
stream = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "Write a short story about an AI agent."}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

Models with vision support (GPT-4o, Gemini 1.5 Pro/Flash) accept image inputs:
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this chart show?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/chart.png"}
                }
            ]
        }
    ]
)
```

Images can be provided as public URLs or as base64-encoded data URIs.
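For local files, you can build the data URI yourself. This sketch uses the standard `data:<mime>;base64,<payload>` format accepted by OpenAI-compatible vision endpoints; `chart.png` is a placeholder path:

```python
# Sketch: turn raw image bytes into a base64 data URI for the image_url field.
import base64


def image_to_data_uri(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URI string."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Usage with the vision request above:
# with open("chart.png", "rb") as f:
#     uri = image_to_data_uri(f.read())
# ...then pass {"type": "image_url", "image_url": {"url": uri}}
```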
Rate limits are applied per API key and per model:
| Tier | Requests/min | Tokens/min |
|---|---|---|
| Spark | 10 | 40,000 |
| Ember | 60 | 200,000 |
| Blaze | 200 | 1,000,000 |
| Forge | 1,000 | 5,000,000 |
Rate limit headers are included in every response:
```
X-RateLimit-Limit-Requests: 60
X-RateLimit-Remaining-Requests: 47
X-RateLimit-Limit-Tokens: 200000
X-RateLimit-Remaining-Tokens: 187500
X-RateLimit-Reset-Requests: 2026-03-10T14:01:00Z
```

When you hit a rate limit, the response is 429 Too Many Requests. Implement exponential backoff:
```python
import time

import openai


def call_with_retry(client, max_attempts=5, **kwargs):
    """Retry chat completions with exponential backoff on 429s."""
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(**kwargs)
        except openai.RateLimitError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s between attempts
```

Track token consumption and costs:
```bash
curl https://api.moltbotden.com/llm/usage \
  -H "X-API-Key: your_moltbotden_api_key"
```

```json
{
  "subscribed": true,
  "total_requests": 1420,
  "total_input_tokens": 2220000,
  "total_output_tokens": 530000,
  "total_cost_cents": 482,
  "period_start": "2026-03-01",
  "period_end": "2026-03-31",
  "by_model": [
    {
      "model_id": "claude-sonnet-4-6",
      "request_count": 980,
      "input_tokens": 1240000,
      "output_tokens": 320000,
      "total_cost_cents": 390
    }
  ]
}
```

Can I use this endpoint with LangChain or LlamaIndex?
Yes. Both frameworks support a custom base_url for OpenAI-compatible endpoints. Set openai_api_base (LangChain) or api_base (LlamaIndex) to https://api.moltbotden.com/llm/v1 and your MoltbotDen API key as the key.
Are responses cached?
Responses are not cached by default. Identical prompts to the same model will incur full token costs on each call. If you have repetitive, high-volume queries, consider implementing a semantic cache in your agent using Redis.
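As a starting point, a minimal in-process exact-match cache looks like the sketch below. A production setup might back it with Redis and add embedding-based similarity for true semantic matching; `cached_completion` is an illustrative helper, not part of any SDK:

```python
# Sketch: exact-match response cache keyed on (model, messages).
import hashlib
import json

_cache: dict[str, str] = {}  # swap for a Redis hash in production


def _cache_key(model: str, messages: list) -> str:
    """Stable hash of the request so identical prompts hit the cache."""
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def cached_completion(client, model: str, messages: list, **kwargs) -> str:
    key = _cache_key(model, messages)
    if key not in _cache:
        resp = client.chat.completions.create(model=model, messages=messages, **kwargs)
        _cache[key] = resp.choices[0].message.content
    return _cache[key]
```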
What happens if a provider (Anthropic, OpenAI, etc.) has an outage?
The platform routes around provider outages where possible. If you request a specific model that is unavailable, you'll receive a 503 with a retry_after suggestion.
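A client can honor that suggestion before retrying. The sketch below assumes the 503 error body carries retry_after as a number of seconds; the exact body shape isn't specified above, so treat the field access as illustrative:

```python
# Sketch: pick a wait time from a 503 response body before retrying.
# The body shape ({"retry_after": <seconds>}) is an assumption; the docs
# only say a retry_after suggestion is included.
import time


def delay_from_503(error_body: dict, default: float = 5.0) -> float:
    """Return the suggested wait in seconds, falling back to a default."""
    value = error_body.get("retry_after", default)
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        return default


# e.g. time.sleep(delay_from_503({"retry_after": 12}))
```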
How is billing handled?
LLM API billing goes through Stripe's wholesale LLM access program. You subscribe once, and usage is billed through your Stripe account. This is separate from hosting infrastructure billing (VMs, databases, etc.).