tabstack-extractor
Extract structured data from websites using Tabstack API.
Installation
npx clawhub@latest install tabstack-extractor
View the full skill documentation and source below.
Documentation
Tabstack Extractor
Overview
This skill enables structured data extraction from websites using the Tabstack API. It's ideal for web scraping tasks where you need consistent, schema-based data extraction from job boards, news sites, product pages, or any structured content.
Quick Start
1. Install Babashka (if needed)
# Option A: From GitHub (recommended for sharing)
curl -s https://raw.githubusercontent.com/babashka/babashka/master/install | bash
# Option B: From Nix
nix-shell -p babashka
# Option C: From Homebrew
brew install borkdude/brew/babashka
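Whichever option you use, a quick version check confirms the bb binary is on your PATH:
# Verify the Babashka install
bb --version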
2. Set up API Key
Option A: Environment variable (recommended)
export TABSTACK_API_KEY="your_api_key_here"
Option B: Configuration file
mkdir -p ~/.config/tabstack
echo '{:api-key "your_api_key_here"}' > ~/.config/tabstack/config.edn
Get an API key: Sign up at the Tabstack Console.
3. Test Connection
bb scripts/tabstack.clj test
4. Extract Markdown (Simple)
bb scripts/tabstack.clj markdown "https://example.com"
5. Extract JSON (Start Simple)
# Start with a simple schema (fast, reliable)
bb scripts/tabstack.clj json "https://example.com" references/simple_article.json
# Try more complex schemas (may be slower)
bb scripts/tabstack.clj json "" references/news_schema.json
6. Advanced Features
# Extract with retry logic (3 retries, 1s delay)
bb scripts/tabstack.clj json-retry "https://example.com" references/simple_article.json
# Extract with caching (24-hour cache)
bb scripts/tabstack.clj json-cache "https://example.com" references/simple_article.json
# Batch extract from URLs file
echo "" > urls.txt
echo "" >> urls.txt
bb scripts/tabstack.clj batch urls.txt references/simple_article.json
Core Capabilities
1. Markdown Extraction
Extract clean, readable markdown from any webpage. Useful for content analysis, summarization, or archiving.
When to use: When you need the textual content of a page without the HTML clutter.
Example use cases:
- Extract article content for summarization
- Archive webpage content
- Analyze blog post content
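For instance, a markdown extraction can be written straight to a file for later summarization or archiving (the URL below is only a placeholder):
# Save a page's markdown for later analysis; substitute the real article URL.
bb scripts/tabstack.clj markdown "https://example.com/blog/post" > article.md
head -n 20 article.md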
2. JSON Schema Extraction
Extract structured data using JSON schemas. Define exactly what data you want and get it in a consistent format.
When to use: When scraping job listings, product pages, news articles, or any structured data.
Example use cases:
- Scrape job listings from BuiltIn/LinkedIn
- Extract product details from e-commerce sites
- Gather news articles with consistent metadata
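As a sketch of defining your own schema (the field names below are illustrative and not taken from the bundled templates), you can write a small schema inline and point the json command at it:
# Minimal e-commerce schema; field names here are illustrative assumptions.
cat > product_schema.json <<'EOF'
{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "price": {"type": "string"},
    "availability": {"type": "string"}
  }
}
EOF
# Placeholder URL; replace with the real product page.
bb scripts/tabstack.clj json "https://example.com/product" product_schema.json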
3. Schema Templates
Pre-built schemas for common scraping tasks. See the references/ directory for templates.
Available schemas:
- Job listing schema (see references/job_schema.json)
- News article schema
- Product page schema
- Contact information schema
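Before writing a schema from scratch, it is worth inspecting the bundled templates:
# List the bundled templates and review the job listing schema
ls references/
cat references/job_schema.json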
Workflow: Job Scraping Example
Follow this workflow to scrape job listings:
Use references/job_schema.json as a starting point, or customize it for the target site.
Example job schema:
{
  "type": "object",
  "properties": {
    "title": {"type": "string"},
    "company": {"type": "string"},
    "location": {"type": "string"},
    "description": {"type": "string"},
    "salary": {"type": "string"},
    "apply_url": {"type": "string"},
    "posted_date": {"type": "string"},
    "requirements": {"type": "array", "items": {"type": "string"}}
  }
}
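With the schema saved as references/job_schema.json, a run against a careers page (placeholder URL shown) could look like this; json-retry is used here since job boards can respond slowly:
# Extract a single job posting with retries; substitute the real posting URL.
bb scripts/tabstack.clj json-retry "https://example.com/careers/senior-engineer" references/job_schema.json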
Integration with Other Skills
Combine with Web Search
Use web_search to find relevant URLs to feed into this skill.
Combine with Browser Automation
Use the browser tool to navigate complex sites, then extract from the resulting pages.
Error Handling
Common issues and solutions:
- Missing API key: set the TABSTACK_API_KEY environment variable, or add the key to ~/.config/tabstack/config.edn
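A minimal pre-flight check, assuming the credential locations described in the setup steps above, catches this before any extraction runs:
# Fail fast if no Tabstack credentials are configured.
if [ -z "${TABSTACK_API_KEY:-}" ] && [ ! -f "$HOME/.config/tabstack/config.edn" ]; then
  echo "Set TABSTACK_API_KEY or create ~/.config/tabstack/config.edn" >&2
  exit 1
fi
bb scripts/tabstack.clj test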
Resources
scripts/
- tabstack.clj - Main API wrapper in Babashka (recommended; includes retry logic, caching, and batch processing)
- tabstack_curl.sh - Bash/curl fallback (simple, no dependencies)
- tabstack_api.py - Python API wrapper (requires the requests module)
references/
- job_schema.json - Template schema for job listings
- api_reference.md - Tabstack API documentation
Best Practices
Teaching Focus: How to Create Schemas
This skill is designed to teach agents how to use Tabstack API effectively. The key is learning to create appropriate JSON schemas for different websites.
Learning Path
Start with references/simple_article.json (4 basic fields), then work up to the more detailed templates.
See the Schema Creation Guide for detailed instructions and examples.
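For orientation, a four-field article schema might look like the sketch below; the field names are assumptions, so check references/simple_article.json for the actual definition:
# Hypothetical 4-field article schema; the bundled simple_article.json may differ.
cat > my_article_schema.json <<'EOF'
{
  "type": "object",
  "properties": {
    "title": {"type": "string"},
    "author": {"type": "string"},
    "date": {"type": "string"},
    "content": {"type": "string"}
  }
}
EOF
bb scripts/tabstack.clj json "https://example.com" my_article_schema.json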
Common Mistakes to Avoid
- Over-complex schemas - Start with 2-3 fields, not 20
- Missing fields - Don't require fields that don't exist on the page
- No testing - Always test with example.com first, then target sites
- Ignoring timeouts - Complex schemas take longer (45s timeout)
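As a concrete instance of the testing and timeout advice above, validate a schema against example.com and time the run before pointing it at the target site:
# Dry run with a simple schema first; complex schemas can approach the 45s timeout.
time bb scripts/tabstack.clj json "https://example.com" references/simple_article.json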
Babashka Advantages
Using Babashka for this skill provides:
- Fast startup from a single standalone binary (no JVM warm-up)
- Built-in retry logic, caching, and batch processing in scripts/tabstack.clj
- No additional runtime dependencies, unlike the Python wrapper
Example User Requests
For this skill to trigger:
- "Scrape job listings from Docker careers page"
- "Extract the main content from this article"
- "Get structured product data from this e-commerce page"
- "Pull all the news articles from this site"
- "Extract contact information from this company page"
- "Batch extract job listings from these 20 URLs"
- "Get cached results for this page (avoid API calls)"