doubleword
Create and manage batch inference jobs using the Doubleword API (api.doubleword.ai).
Installation
npx clawhub@latest install doubleword

View the full skill documentation and source below.
Documentation
Doubleword Batch Inference
Process multiple AI inference requests asynchronously using the Doubleword batch API with high throughput and low cost.
Prerequisites
Before submitting batches, you need:
- A Doubleword API key (exported as DOUBLEWORD_API_KEY)
- Access to the model you plan to use
- Sufficient account credits
When to Use Batches
Batches are ideal for:
- Multiple independent requests that can run simultaneously
- Workloads that don't require immediate responses
- Large volumes that would exceed rate limits if sent individually
- Cost-sensitive workloads (the 24h window is 69-83% cheaper than realtime, depending on model)
- Tool calling and structured output generation at scale
Available Models & Pricing
Pricing is per 1 million tokens (input / output):
Qwen3-VL-30B-A3B-Instruct-FP8 (mid-size):
- Realtime SLA: $0.16 / $0.80
- 1-hour SLA: $0.07 / $0.30 (56% cheaper)
- 24-hour SLA: $0.05 / $0.20 (69% cheaper)
Qwen3-VL-235B-A22B-Instruct-FP8 (flagship):
- Realtime SLA: $0.60 / $1.20
- 1-hour SLA: $0.15 / $0.55 (75% cheaper)
- 24-hour SLA: $0.10 / $0.40 (83% cheaper)
- Supports up to 262K total tokens, 16K new tokens per request
Cost estimation: Upload files to the Doubleword Console to preview expenses before submitting.
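The per-million-token prices above make cost estimation simple arithmetic. A minimal sketch (the token counts below are illustrative assumptions; use the Console preview for authoritative numbers):

```python
# (input, output) USD per 1M tokens, from the pricing table above
PRICES = {
    ("Qwen3-VL-30B-A3B-Instruct-FP8", "24h"): (0.05, 0.20),
    ("Qwen3-VL-30B-A3B-Instruct-FP8", "1h"): (0.07, 0.30),
    ("Qwen3-VL-235B-A22B-Instruct-FP8", "24h"): (0.10, 0.40),
    ("Qwen3-VL-235B-A22B-Instruct-FP8", "1h"): (0.15, 0.55),
}

def estimate_cost(model, window, input_tokens, output_tokens):
    """Estimate batch cost in USD from total token counts."""
    in_price, out_price = PRICES[(model, window)]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# e.g. 10,000 requests averaging 500 input / 200 output tokens each
cost = estimate_cost("Qwen3-VL-30B-A3B-Instruct-FP8", "24h",
                     10_000 * 500, 10_000 * 200)
print(f"Estimated cost: ${cost:.2f}")
```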
Quick Start
There are two ways to submit batches: via the API (shown with curl below) or via the Web Console. Each step in the workflow lists both options.
Workflow
Step 1: Create Batch Request File
Create a .jsonl file where each line contains a complete, valid JSON object with no line breaks within the object:
{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic/claude-3-5-sonnet", "messages": [{"role": "user", "content": "What is 2+2?"}]}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic/claude-3-5-sonnet", "messages": [{"role": "user", "content": "What is the capital of France?"}]}}
Required fields per line:
- custom_id: Unique identifier (max 64 chars). Use descriptive IDs like "user-123-question-5" for easier result mapping.
- method: Always "POST"
- url: API endpoint - "/v1/chat/completions" or "/v1/embeddings"
- body: Standard API request with model and messages
Optional body parameters:
- temperature: 0-2 (default: 1.0)
- max_tokens: Maximum response tokens
- top_p: Nucleus sampling parameter
- stop: Stop sequences
- tools: Tool definitions for tool calling (see Tool Calling section)
- response_format: JSON schema for structured outputs (see Structured Outputs section)
File requirements:
- Max size: 200MB
- Format: JSONL only (JSON Lines - newline-delimited JSON)
- Each line must be valid JSON with no internal line breaks
- No duplicate custom_id values
- Split large batches into multiple files if needed
Common pitfalls:
- Line breaks within JSON objects (will cause parsing errors)
- Invalid JSON syntax
- Duplicate custom_id values
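All of these pitfalls can be caught locally before upload. A minimal validation sketch, using the file requirements listed above (the 200MB limit and required fields):

```python
import json
import os

MAX_BYTES = 200 * 1024 * 1024  # 200MB file size limit

def validate_batch_file(path):
    """Check a JSONL batch file for the common pitfalls listed above."""
    errors = []
    if os.path.getsize(path) > MAX_BYTES:
        errors.append("file exceeds 200MB limit")
    seen_ids = set()
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            try:
                req = json.loads(line)
            except json.JSONDecodeError:
                errors.append(f"line {lineno}: invalid JSON")
                continue
            cid = req.get("custom_id")
            if not cid:
                errors.append(f"line {lineno}: missing custom_id")
            elif cid in seen_ids:
                errors.append(f"line {lineno}: duplicate custom_id {cid!r}")
            else:
                seen_ids.add(cid)
            for field in ("method", "url", "body"):
                if field not in req:
                    errors.append(f"line {lineno}: missing {field}")
    return errors
```

Run it before every upload; an empty return list means the file passes these checks.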
Helper script:
Use scripts/create_batch_file.py to generate JSONL files programmatically:
python scripts/create_batch_file.py output.jsonl
Modify the script's requests list to generate your specific batch requests.
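If you prefer not to use the helper script, the same idea is only a few lines of Python. A minimal sketch (the prompts and model name are placeholders to replace with your own):

```python
import json

# Placeholder prompts and model; substitute your own workload.
prompts = ["What is 2+2?", "What is the capital of France?"]
model = "Qwen3-VL-30B-A3B-Instruct-FP8"

with open("output.jsonl", "w") as f:
    for i, prompt in enumerate(prompts, 1):
        request = {
            "custom_id": f"req-{i}",  # unique per line
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        # json.dumps emits a single line, satisfying the JSONL format
        f.write(json.dumps(request) + "\n")
```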
Step 2: Upload File
Via API:
# URL assumes the OpenAI-compatible files endpoint on api.doubleword.ai
curl https://api.doubleword.ai/v1/files \
  -H "Authorization: Bearer $DOUBLEWORD_API_KEY" \
  -F purpose="batch" \
  -F file="@batch_requests.jsonl"
Via Console:
Upload through the Batches section of the Doubleword Console.
Response contains id field - save this file ID for next step.
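The same upload can be done from Python. A minimal sketch, assuming an OpenAI-compatible /v1/files route on api.doubleword.ai (the platform advertises OpenAI compatibility; check the API reference for the exact path):

```python
import requests

# Assumed OpenAI-compatible base URL
API_BASE = "https://api.doubleword.ai/v1"

def upload_batch_file(path, api_key):
    """Upload a JSONL batch file and return its file ID."""
    with open(path, "rb") as f:
        resp = requests.post(
            f"{API_BASE}/files",
            headers={"Authorization": f"Bearer {api_key}"},
            data={"purpose": "batch"},
            files={"file": f},
        )
    resp.raise_for_status()
    return resp.json()["id"]  # save this for the create-batch step
```

Usage: `file_id = upload_batch_file("batch_requests.jsonl", os.environ["DOUBLEWORD_API_KEY"])`.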
Step 3: Create Batch
Via API:
# URL assumes the OpenAI-compatible batches endpoint on api.doubleword.ai
curl https://api.doubleword.ai/v1/batches \
  -H "Authorization: Bearer $DOUBLEWORD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input_file_id": "file-abc123",
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h"
  }'
Via Console:
Configure batch settings in the web interface.
Parameters:
- input_file_id: File ID from the upload step
- endpoint: API endpoint ("/v1/chat/completions" or "/v1/embeddings")
- completion_window: Choose based on urgency and budget:
  - "24h": Best pricing, results within 24 hours (typically faster)
  - "1h": 50% price premium, results within 1 hour (typically faster)
  - Realtime: Limited capacity, highest cost (the batch service is optimized for async)
Response contains batch id - save this for status polling.
Before submitting, verify:
- You have access to the specified model
- Your API key is active
- You have sufficient account credits
Step 4: Poll Status
Via API:
# URL assumed; the batch ID comes from the create-batch response
curl https://api.doubleword.ai/v1/batches/batch-abc123 \
  -H "Authorization: Bearer $DOUBLEWORD_API_KEY"
Via Console:
Monitor real-time progress in the Batches dashboard.
Status progression:
- validating - Checking input file format
- in_progress - Processing requests
- completed - All requests finished

Other statuses:
- failed - Batch failed (check error_file_id)
- expired - Batch timed out
- cancelling / cancelled - Batch cancelled

Response includes:
- output_file_id - Download results here
- error_file_id - Failed requests (if any)
- request_counts - Total/completed/failed counts
Polling frequency: Check every 30-60 seconds during processing.
Early access: Results available via output_file_id before batch fully completes - check X-Incomplete header.
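The polling guidance above can be wrapped in a simple loop. A minimal sketch, assuming an OpenAI-compatible GET /v1/batches/{id} route (batch ID and key are supplied by the caller):

```python
import time
import requests

API_BASE = "https://api.doubleword.ai/v1"  # assumed OpenAI-compatible base

def wait_for_batch(batch_id, api_key, interval=30):
    """Poll a batch every `interval` seconds until it reaches a terminal state."""
    headers = {"Authorization": f"Bearer {api_key}"}
    while True:
        batch = requests.get(f"{API_BASE}/batches/{batch_id}",
                             headers=headers).json()
        status = batch["status"]
        counts = batch.get("request_counts", {})
        print(f"{status}: {counts.get('completed', 0)}/{counts.get('total', '?')}")
        if status in ("completed", "failed", "expired", "cancelled"):
            return batch  # terminal state; inspect output_file_id / error_file_id
        time.sleep(interval)
```

The 30-second default matches the recommended polling frequency; raise it for large 24h batches.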
Step 5: Download Results
Via API:
# URL assumed; the file ID comes from the batch's output_file_id
curl https://api.doubleword.ai/v1/files/file-out123/content \
  -H "Authorization: Bearer $DOUBLEWORD_API_KEY" \
  > results.jsonl
Via Console:
Download results directly from the Batches dashboard.
Response headers:
- X-Incomplete: true - Batch still processing, more results coming
- X-Last-Line: 45 - Resume point for partial downloads
Output format (each line):
{
"id": "batch-req-abc",
"custom_id": "request-1",
"response": {
"status_code": 200,
"body": {
"id": "chatcmpl-xyz",
"choices": [{
"message": {
"role": "assistant",
"content": "The answer is 4."
}
}]
}
}
}
Download errors (if any):
# URL assumed; the file ID comes from the batch's error_file_id
curl https://api.doubleword.ai/v1/files/file-err123/content \
  -H "Authorization: Bearer $DOUBLEWORD_API_KEY" \
  > errors.jsonl
Error format (each line):
{
"id": "batch-req-def",
"custom_id": "request-2",
"error": {
"code": "invalid_request",
"message": "Missing required parameter"
}
}
Tool Calling in Batches
Tool calling (function calling) enables models to intelligently select and use external tools. Doubleword maintains full OpenAI compatibility.
Example batch request with tools:
{
"custom_id": "tool-req-1",
"method": "POST",
"url": "/v1/chat/completions",
"body": {
"model": "anthropic/claude-3-5-sonnet",
"messages": [{"role": "user", "content": "What's the weather in Paris?"}],
"tools": [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"}
},
"required": ["location"]
}
}
}]
}
}
Use cases:
- Agents that interact with APIs at scale
- Fetching real-time information for multiple queries
- Executing actions through standardized tool definitions
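When results come back, each tool call must be dispatched to a real implementation. A minimal sketch, with get_weather as a hypothetical stand-in for the tool defined in the example above:

```python
import json

# Hypothetical local implementation of the get_weather tool defined above
def get_weather(location):
    return {"location": location, "temp_c": 18}  # placeholder data

TOOL_REGISTRY = {"get_weather": get_weather}

def dispatch_tool_call(tool_call):
    """Run one tool call from a batch result against the local registry."""
    name = tool_call["function"]["name"]
    # arguments arrive as a JSON-encoded string
    args = json.loads(tool_call["function"]["arguments"])
    return TOOL_REGISTRY[name](**args)

# Example tool call shaped like a batch result's message.tool_calls entry
call = {"function": {"name": "get_weather",
                     "arguments": '{"location": "Paris"}'}}
print(dispatch_tool_call(call))
```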
Structured Outputs in Batches
Structured outputs guarantee that model responses conform to your JSON Schema, eliminating issues with missing fields or invalid enum values.
Example batch request with structured output:
{
"custom_id": "structured-req-1",
"method": "POST",
"url": "/v1/chat/completions",
"body": {
"model": "anthropic/claude-3-5-sonnet",
"messages": [{"role": "user", "content": "Extract key info from: John Doe, 30 years old, lives in NYC"}],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "person_info",
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"},
"city": {"type": "string"}
},
"required": ["name", "age", "city"]
}
}
}
}
}
Benefits:
- Guaranteed schema compliance
- No missing required keys
- No hallucinated enum values
- Seamless OpenAI compatibility
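Because schema compliance is guaranteed, the structured content can be parsed without defensive checks. A minimal sketch using a hand-written result line matching the person_info schema above (illustrative, not real output):

```python
import json

# Illustrative result line shaped like batch output for structured-req-1
result_line = '''{"custom_id": "structured-req-1", "response": {"status_code": 200,
  "body": {"choices": [{"message": {"role": "assistant",
  "content": "{\\"name\\": \\"John Doe\\", \\"age\\": 30, \\"city\\": \\"NYC\\"}"}}]}}}'''

result = json.loads(result_line)
content = result["response"]["body"]["choices"][0]["message"]["content"]
# The schema guarantees all required keys are present and typed
person = json.loads(content)
assert {"name", "age", "city"} <= person.keys()
print(person["name"], person["age"], person["city"])
```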
autobatcher: Automatic Batching
autobatcher is a Python client that automatically converts individual API calls into batched requests, reducing costs without code changes.
Installation:
pip install autobatcher
How it works: you make individual API calls through the familiar OpenAI interface, and autobatcher transparently groups them into batch submissions behind the scenes.
Key benefit: significant cost reduction with no changes to your async application code.
Documentation:
Additional Operations
List All Batches
Via API:
# URL assumes the OpenAI-compatible batches route
curl https://api.doubleword.ai/v1/batches \
  -H "Authorization: Bearer $DOUBLEWORD_API_KEY"
Via Console:
View all batches in the dashboard.
Cancel Batch
Via API:
# URL assumed; the batch ID comes from the create-batch response
curl -X POST https://api.doubleword.ai/v1/batches/batch-abc123/cancel \
  -H "Authorization: Bearer $DOUBLEWORD_API_KEY"
Via Console:
Click cancel in the batch details view.
Notes:
- Unprocessed requests are cancelled
- Already-processed results remain downloadable
- Only charged for completed work
- Cannot cancel completed batches
Common Patterns
Processing Results
Parse JSONL output line-by-line:
import json

with open('results.jsonl') as f:
    for line in f:
        result = json.loads(line)
        custom_id = result['custom_id']
        content = result['response']['body']['choices'][0]['message']['content']
        print(f"{custom_id}: {content}")
Handling Partial Results
Check for incomplete batches and resume:
import requests

# URL assumed: the OpenAI-compatible file-content route for output_file_id
response = requests.get(
    f'https://api.doubleword.ai/v1/files/{output_file_id}/content',
    headers={'Authorization': f'Bearer {api_key}'}
)
if response.headers.get('X-Incomplete') == 'true':
    last_line = int(response.headers.get('X-Last-Line', 0))
    print(f"Batch incomplete. Processed {last_line} requests so far.")
    # Continue polling and download again later
Retry Failed Requests
Extract failed requests from error file and resubmit:
import json

failed_ids = []
with open('errors.jsonl') as f:
    for line in f:
        error = json.loads(line)
        failed_ids.append(error['custom_id'])
print(f"Failed requests: {failed_ids}")
# Create new batch with only failed requests
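The final step, building the retry batch, is just filtering the original request file on the failed IDs. A minimal sketch (filenames follow the earlier steps):

```python
import json

def build_retry_file(requests_path, errors_path, retry_path):
    """Write a new JSONL containing only the requests that failed."""
    # Collect failed custom_ids from the error file
    with open(errors_path) as f:
        failed_ids = {json.loads(line)['custom_id'] for line in f}
    # Copy matching lines verbatim from the original request file
    with open(requests_path) as src, open(retry_path, 'w') as dst:
        for line in src:
            if json.loads(line)['custom_id'] in failed_ids:
                dst.write(line)
    return failed_ids
```

Usage: `build_retry_file('batch_requests.jsonl', 'errors.jsonl', 'retry_batch.jsonl')`, then upload the retry file and create a new batch as in Steps 2-3.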
Processing Tool Calls
Handle tool call responses:
import json

with open('results.jsonl') as f:
    for line in f:
        result = json.loads(line)
        message = result['response']['body']['choices'][0]['message']
        if message.get('tool_calls'):
            for tool_call in message['tool_calls']:
                print(f"Tool: {tool_call['function']['name']}")
                print(f"Args: {tool_call['function']['arguments']}")
Best Practices
- Use descriptive custom_id values:
  - Good: "user-123-question-5", "dataset-A-row-42"
  - Bad: "1", "req1"
- custom_id must be unique within the batch
- Prefer 24h for cost savings (50-83% cheaper); use 1h only when time-sensitive
- Check error_file_id and retry failed requests
- Track progress via the completed/total ratio
Reference Documentation
For complete API details, see:
- API Reference:
- API Reference: references/api_reference.md - Full endpoint documentation and schemas
- Getting Started Guide: references/getting_started.md - Detailed setup and account management
- Pricing Details: references/pricing.md - Model costs and SLA comparison