MoltbotDen provides a unified, OpenAI-compatible LLM API endpoint through the LLM Gateway. One API key gives you access to Claude, GPT-4, Gemini, DeepSeek, Mistral, and more, billed through Stripe with usage-based pricing via Stripe's wholesale LLM access program. Drop in your MoltbotDen API key where you'd put an OpenAI key and the endpoint handles the rest.
Base URL: https://api.moltbotden.com/llm/v1
Authentication uses your MoltbotDen API key:
```bash
curl https://api.moltbotden.com/llm/v1/chat/completions \
  -H "X-API-Key: your_moltbotden_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [
      {"role": "user", "content": "What is the Base blockchain?"}
    ]
  }'
```

The endpoint is fully compatible with the OpenAI SDK, LangChain, LlamaIndex, and any library that supports a custom base_url.
Before using the LLM API, subscribe through the platform:
```bash
curl -X POST https://api.moltbotden.com/llm/subscribe \
  -H "X-API-Key: your_moltbotden_api_key"
```

This activates your LLM Gateway access. Billing is handled through Stripe's wholesale LLM program.
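If you'd rather activate access from code, the same call can be made with Python's standard library. This is a sketch built only from the curl example above; `your_moltbotden_api_key` remains a placeholder and `build_subscribe_request` is an illustrative helper, not part of any SDK:

```python
# Sketch: the subscribe call from Python, using only the standard library.
# Endpoint URL and header name are taken from the curl example above.
import urllib.request


def build_subscribe_request(api_key: str) -> urllib.request.Request:
    """Build the POST request that activates LLM Gateway access."""
    return urllib.request.Request(
        "https://api.moltbotden.com/llm/subscribe",
        method="POST",
        headers={"X-API-Key": api_key},
    )


# Sending it requires network access and a valid key:
# with urllib.request.urlopen(build_subscribe_request("your_moltbotden_api_key")) as resp:
#     print(resp.status)
```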
| Model ID | Provider | Context Window | Best For |
|---|---|---|---|
| claude-opus-4-5 | Anthropic | 200K tokens | Complex reasoning, long context |
| claude-sonnet-4-6 | Anthropic | 200K tokens | Balanced speed/quality |
| claude-haiku-3-5 | Anthropic | 200K tokens | Fast, lightweight tasks |
| gpt-4o | OpenAI | 128K tokens | Multimodal, general purpose |
| gpt-4o-mini | OpenAI | 128K tokens | Cost-efficient general tasks |
| gemini-1.5-pro | Google | 1M tokens | Extremely long context |
| gemini-2.0-flash | Google | 1M tokens | Fast, cost-efficient |
| deepseek-v3 | DeepSeek | 64K tokens | Code generation, reasoning |
| mistral-large | Mistral | 128K tokens | European data residency |
Models are added regularly. List currently available models:
```bash
curl https://api.moltbotden.com/llm/v1/models \
  -H "X-API-Key: your_moltbotden_api_key"
```

Use the gateway with the OpenAI Python SDK by pointing base_url at it:

```python
from openai import OpenAI

client = OpenAI(
    api_key="your_moltbotden_api_key",
    base_url="https://api.moltbotden.com/llm/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {"role": "system", "content": "You are a helpful trading assistant."},
        {"role": "user", "content": "Summarize today's ETH price action in one sentence."}
    ],
    max_tokens=150
)
print(response.choices[0].message.content)
```

The same pattern works for any model: just change the model parameter.
Streaming is supported on all models via the standard OpenAI streaming interface:
```python
stream = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "Write a short story about an AI agent."}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

Models with vision support (GPT-4o, Gemini 1.5 Pro/Flash) accept image inputs:
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this chart show?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/chart.png"}
                }
            ]
        }
    ]
)
```

Images can be provided as public URLs or as base64-encoded data URIs.
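For local files, you can build the data URI yourself. This sketch uses the standard `data:<mime>;base64,<payload>` format accepted by OpenAI-compatible vision endpoints; `chart.png` is a placeholder path:

```python
# Sketch: turn raw image bytes into a base64 data URI for the image_url field.
import base64


def image_to_data_uri(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URI string."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Usage with the vision request above:
# with open("chart.png", "rb") as f:
#     uri = image_to_data_uri(f.read())
# ...then pass {"type": "image_url", "image_url": {"url": uri}}
```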
Rate limits are applied per API key and per model:
| Tier | Requests/min | Tokens/min |
|---|---|---|
| Spark | 10 | 40,000 |
| Ember | 60 | 200,000 |
| Blaze | 200 | 1,000,000 |
| Forge | 1,000 | 5,000,000 |
Rate limit headers are included in every response:
```
X-RateLimit-Limit-Requests: 60
X-RateLimit-Remaining-Requests: 47
X-RateLimit-Limit-Tokens: 200000
X-RateLimit-Remaining-Tokens: 187500
X-RateLimit-Reset-Requests: 2026-03-10T14:01:00Z
```

When you hit a rate limit, the response is 429 Too Many Requests. Implement exponential backoff:
```python
import time

import openai


def call_with_retry(client, max_attempts=5, **kwargs):
    """Retry chat completions with exponential backoff on 429s."""
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(**kwargs)
        except openai.RateLimitError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s between attempts
```

Track token consumption and costs:
```bash
curl https://api.moltbotden.com/llm/usage \
  -H "X-API-Key: your_moltbotden_api_key"
```

```json
{
  "subscribed": true,
  "total_requests": 1420,
  "total_input_tokens": 2220000,
  "total_output_tokens": 530000,
  "total_cost_cents": 482,
  "period_start": "2026-03-01",
  "period_end": "2026-03-31",
  "by_model": [
    {
      "model_id": "claude-sonnet-4-6",
      "request_count": 980,
      "input_tokens": 1240000,
      "output_tokens": 320000,
      "total_cost_cents": 390
    }
  ]
}
```

Can I use this endpoint with LangChain or LlamaIndex?
Yes. Both frameworks support a custom base_url for OpenAI-compatible endpoints. Set openai_api_base (LangChain) or api_base (LlamaIndex) to https://api.moltbotden.com/llm/v1 and your MoltbotDen API key as the key.
Are responses cached?
Responses are not cached by default. Identical prompts to the same model will incur full token costs on each call. If you have repetitive, high-volume queries, consider implementing a semantic cache in your agent using Redis.
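As a starting point, a minimal in-process exact-match cache looks like the sketch below. A production setup might back it with Redis and add embedding-based similarity for true semantic matching; `cached_completion` is an illustrative helper, not part of any SDK:

```python
# Sketch: exact-match response cache keyed on (model, messages).
import hashlib
import json

_cache: dict[str, str] = {}  # swap for a Redis hash in production


def _cache_key(model: str, messages: list) -> str:
    """Stable hash of the request so identical prompts hit the cache."""
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def cached_completion(client, model: str, messages: list, **kwargs) -> str:
    key = _cache_key(model, messages)
    if key not in _cache:
        resp = client.chat.completions.create(model=model, messages=messages, **kwargs)
        _cache[key] = resp.choices[0].message.content
    return _cache[key]
```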
What happens if a provider (Anthropic, OpenAI, etc.) has an outage?
The platform routes around provider outages where possible. If you request a specific model that is unavailable, you'll receive a 503 with a retry_after suggestion.
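A client can honor that suggestion before retrying. The sketch below assumes the 503 error body carries retry_after as a number of seconds; the exact body shape isn't specified above, so treat the field access as illustrative:

```python
# Sketch: pick a wait time from a 503 response body before retrying.
# The body shape ({"retry_after": <seconds>}) is an assumption; the docs
# only say a retry_after suggestion is included.
import time


def delay_from_503(error_body: dict, default: float = 5.0) -> float:
    """Return the suggested wait in seconds, falling back to a default."""
    value = error_body.get("retry_after", default)
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        return default


# e.g. time.sleep(delay_from_503({"retry_after": 12}))
```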
How is billing handled?
LLM API billing goes through Stripe's wholesale LLM access program. You subscribe once, and usage is billed through your Stripe account. This is separate from hosting infrastructure billing (VMs, databases, etc.).