docstrange
Document extraction API by Nanonets.
Installation
npx clawhub@latest install docstrangeView the full skill documentation and source below.
Documentation
DocStrange by Nanonets
Document extraction API — convert PDFs, images, and documents to markdown, JSON, or CSV with per-field confidence scoring.
Get your API key:
Quick Start
curl -X POST "" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@document.pdf" \
-F "output_format=markdown"
Response:
{
"success": true,
"record_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"result": {
"markdown": {
"content": "# Invoice\n\n**Invoice Number:** INV-2024-001..."
}
}
}
Setup
1. Get Your API Key
# Visit the dashboard
Save your API key:
export DOCSTRANGE_API_KEY="your_api_key_here"
2. OpenClaw Configuration (Optional)
Add to your ~/.openclaw/openclaw.json:
{
skills: {
entries: {
"docstrange": {
enabled: true,
apiKey: "your_api_key_here",
env: {
DOCSTRANGE_API_KEY: "your_api_key_here",
},
},
},
},
}
Common Tasks
Extract to Markdown
curl -X POST "" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@document.pdf" \
-F "output_format=markdown"
Access content: response["result"]["markdown"]["content"]
Extract JSON Fields
Simple field list:
curl -X POST "" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@invoice.pdf" \
-F "output_format=json" \
-F 'json_options=["invoice_number", "date", "total_amount", "vendor"]' \
-F "include_metadata=confidence_score"
With JSON schema:
curl -X POST "" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@invoice.pdf" \
-F "output_format=json" \
-F 'json_options={"type": "object", "properties": {"invoice_number": {"type": "string"}, "total_amount": {"type": "number"}}}'
Response with confidence scores:
{
"result": {
"json": {
"content": {
"invoice_number": "INV-2024-001",
"total_amount": 500.00
},
"metadata": {
"confidence_score": {
"invoice_number": 98,
"total_amount": 99
}
}
}
}
}
Extract Tables to CSV
curl -X POST "" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@table.pdf" \
-F "output_format=csv" \
-F "csv_options=table"
Async Extraction (Large Documents)
For documents >5 pages, use async and poll:
Queue the document:
curl -X POST "" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@large-document.pdf" \
-F "output_format=markdown"
# Returns: {"record_id": "12345", "status": "processing"}
Poll for results:
curl -X GET "" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY"
# Returns: {"status": "completed", "result": {...}}
Advanced Features
Bounding Boxes
Get element coordinates for layout analysis:-F "include_metadata=bounding_boxes"
Hierarchy Output
Extract document structure (sections, tables, key-value pairs):-F "json_options=hierarchy_output"
Financial Documents Mode
Enhanced table and number formatting:-F "markdown_options=financial-docs"
Custom Instructions
Guide extraction with prompts:-F "custom_instructions=Focus on financial data. Ignore headers."
-F "prompt_mode=append"
Multiple Formats
Request multiple formats in one call:-F "output_format=markdown,json"
When to Use
Use DocStrange For:
- Invoice and receipt processing
- Contract text extraction
- Bank statement parsing
- Form digitization
- Image OCR (scanned documents)
Don't Use For:
- Documents >5 pages with sync (use async)
- Video/audio transcription
- Non-document images
Best Practices
| Document Size | Endpoint | Notes |
| <=5 pages | /extract/sync | Immediate response |
| >5 pages | /extract/async | Poll for results |
- Field list:
["field1", "field2"]— quick extractions - JSON schema:
{"type": "object", ...}— strict typing, nested data
- Add
include_metadata=confidence_score - Scores are 0-100 per field
- Review fields <80 manually
Schema Templates
Invoice
{
"type": "object",
"properties": {
"invoice_number": {"type": "string"},
"date": {"type": "string"},
"vendor": {"type": "string"},
"total": {"type": "number"},
"line_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": {"type": "string"},
"quantity": {"type": "number"},
"price": {"type": "number"}
}
}
}
}
}
Receipt
{
"type": "object",
"properties": {
"merchant": {"type": "string"},
"date": {"type": "string"},
"total": {"type": "number"},
"items": {
"type": "array",
"items": {"type": "object", "properties": {"name": {"type": "string"}, "price": {"type": "number"}}}
}
}
}
Troubleshooting
400 Bad Request:
- Provide exactly one input:
file,file_url, orfile_base64 - Verify API key is valid
Sync Timeout:
- Use async for documents >5 pages
- Poll
/extract/results/{record_id}
Missing Confidence Scores:
- Requires
json_options(field list or schema) - Add
include_metadata=confidence_score
References
- API Docs:
- Get API Key: