How to Use OpenClaw's Agent-Browser for Programmatic Browser Control
Browser automation used to be complex. Selenium, Puppeteer, Playwright: all powerful tools, but they require boilerplate, setup, and a fair bit of code to get anything done. For AI agents, browser control needs to be simpler. That's where OpenClaw's agent-browser skill comes in.
What Is agent-browser?
agent-browser is an OpenClaw skill that lets AI agents control web browsers through simple, declarative commands. No npm packages to install. No WebDriver setup. The agent turns natural-language instructions into small JSON actions like the ones below, and the skill carries them out in the browser.
Think of it as Playwright for agents: snapshot the page, click elements, fill forms, navigate, extract data. All through OpenClaw's unified tool interface.
Getting Started
First, make sure you have OpenClaw installed and configured. The agent-browser skill should be available by default in recent versions. You can verify it's installed:
openclaw skills list | grep agent-browser
If it's not there, install it:
openclaw skills install agent-browser
Basic Browser Actions
Opening a Page
The simplest operation is navigating to a URL:
{
"action": "open",
"url": "https://example.com",
"profile": "openclaw"
}
The profile parameter determines which browser instance to use. "openclaw" gives you an isolated, agent-managed browser. "chrome" lets you take over your existing Chrome instance (requires the OpenClaw Browser Relay extension).
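If your agent builds these payloads in code, a small builder can catch typos before anything reaches the browser. The sketch below assumes only what this article shows: the "open" action shape and the two profile names. `buildOpenAction` is a hypothetical helper, not part of OpenClaw.

```javascript
// Hypothetical helper (not an OpenClaw API): builds the "open" payload shown above.
// The two profile names come from this article.
const PROFILES = new Set(["openclaw", "chrome"]);

function buildOpenAction(url, profile = "openclaw") {
  if (!PROFILES.has(profile)) {
    throw new Error(`Unknown profile "${profile}"; expected "openclaw" or "chrome"`);
  }
  // new URL() throws a TypeError on malformed input, so typos fail early.
  const parsed = new URL(url);
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  // Keep the URL exactly as given (URL.href would append a trailing slash).
  return { action: "open", url, profile };
}

console.log(JSON.stringify(buildOpenAction("https://example.com")));
// {"action":"open","url":"https://example.com","profile":"openclaw"}
```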
Taking Snapshots
Snapshots are how agents "see" the page. They return a structured representation of the DOM:
{
"action": "snapshot",
"refs": "role",
"labels": true
}
This returns element references (like e12, e45) that you can use in subsequent actions. The refs: "role" option gives you role-based selectors. Use refs: "aria" for Playwright-style aria selectors if you need more stability across calls.
Snapshot output looks like this:
button e12 "Sign In"
link e13 "Learn More"
textbox e14 "Email"
These refs are valid within the current page context. When you navigate or refresh, take a new snapshot.
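To use those refs in code, you need to turn the snapshot text into something you can search. Here is a minimal parser sketch. It assumes the one-element-per-line `role ref "name"` format shown above. Real snapshot output may carry more fields or nesting. `parseSnapshot` and `findRef` are hypothetical helpers.

```javascript
// Hypothetical parser for snapshot lines of the form: role ref "name"
function parseSnapshot(text) {
  const pattern = /^(\S+)\s+(e\d+)\s+"(.*)"$/;
  const elements = [];
  for (const line of text.split("\n")) {
    const match = pattern.exec(line.trim());
    // Skip lines that don't fit the simple one-element-per-line format
    if (match) {
      const [, role, ref, name] = match;
      elements.push({ role, ref, name });
    }
  }
  return elements;
}

// Return the ref of the first element with the given role and name, if any.
function findRef(elements, role, name) {
  const hit = elements.find((el) => el.role === role && el.name === name);
  return hit ? hit.ref : undefined;
}

const sample = [
  'button e12 "Sign In"',
  'link e13 "Learn More"',
  'textbox e14 "Email"',
].join("\n");

const elements = parseSnapshot(sample);
console.log(findRef(elements, "button", "Sign In")); // e12
```

Matching on both role and name keeps a link and a button with the same label from being confused.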
Clicking Elements
Once you have a snapshot, clicking is straightforward:
{
"action": "act",
"kind": "click",
"ref": "e12"
}
Or use a selector directly:
{
"action": "act",
"kind": "click",
"selector": "button[type=submit]"
}
Refs are cleaner and more reliable. Selectors are useful when you know the exact CSS selector.
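Because a click can target either a ref or a selector, a builder that accepts exactly one of the two avoids ambiguous payloads. The sketch below mirrors the two JSON shapes above. `buildClickAction` is a hypothetical helper, and the rule against sending both fields is this sketch's own choice, not documented OpenClaw behaviour.

```javascript
// Hypothetical helper: builds a click payload from exactly one of `ref` or `selector`.
function buildClickAction({ ref, selector } = {}) {
  const hasRef = typeof ref === "string" && ref.length > 0;
  const hasSelector = typeof selector === "string" && selector.length > 0;
  // Both or neither is ambiguous, so refuse rather than guess.
  if (hasRef === hasSelector) {
    throw new Error("Provide exactly one of `ref` or `selector`");
  }
  return hasRef
    ? { action: "act", kind: "click", ref }
    : { action: "act", kind: "click", selector };
}

console.log(JSON.stringify(buildClickAction({ ref: "e12" })));
// {"action":"act","kind":"click","ref":"e12"}
```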
Filling Forms
Type into inputs with the type action:
{
"action": "act",
"kind": "type",
"ref": "e14",
"text": "user@example.com"
}
For complex forms, use the fill action with multiple fields:
{
"action": "act",
"kind": "fill",
"fields": [
{"ref": "e14", "text": "user@example.com"},
{"ref": "e15", "text": "password123"}
]
}
Submit the form by clicking the submit button or pressing Enter:
{
"action": "act",
"kind": "press",
"key": "Enter"
}
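Rather than hard-coding refs like e14 and e15, you can look them up by label from the latest snapshot and build the fill payload from that. This sketch assumes the snapshot has already been parsed into `{ role, ref, name }` objects, a hypothetical shape based on the sample output earlier. `buildFillAction` is also hypothetical.

```javascript
// Hypothetical helper: maps field labels to refs and builds one "fill" payload.
function buildFillAction(elements, valuesByLabel) {
  const fields = Object.entries(valuesByLabel).map(([label, text]) => {
    const el = elements.find((e) => e.role === "textbox" && e.name === label);
    if (!el) {
      // Fail loudly instead of typing into the wrong field.
      throw new Error(`No textbox labelled "${label}" in snapshot`);
    }
    return { ref: el.ref, text };
  });
  return { action: "act", kind: "fill", fields };
}

// Example elements, as they might look after parsing a snapshot.
const elements = [
  { role: "textbox", ref: "e14", name: "Email" },
  { role: "textbox", ref: "e15", name: "Password" },
  { role: "button", ref: "e12", name: "Sign In" },
];

const fill = buildFillAction(elements, { Email: "user@example.com", Password: "password123" });
console.log(JSON.stringify(fill.fields));
// [{"ref":"e14","text":"user@example.com"},{"ref":"e15","text":"password123"}]
```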
Real-World Examples
Example 1: Scraping Product Prices
Say you want to monitor a product price on an e-commerce site. In OpenClaw agent code, this might look like:
// Open product page
await browser.open({
url: "https://store.example.com/product/12345",
profile: "openclaw"
});
// Take snapshot
const snapshot = await browser.snapshot({ refs: "role" });
// Find the price element. findElementByText and extractPrice are
// hypothetical helpers you write yourself to parse snapshot.content.
const priceRef = findElementByText(snapshot.content, "$");
// Extract the numeric price from that element's text
const price = extractPrice(snapshot.content, priceRef);
console.log(`Current price: ${price}`);
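The example leaves `findElementByText` and `extractPrice` undefined. Here is one possible implementation of each. It assumes snapshot lines look like `text e31 "$1,299.99"` and that prices are written in dollars with optional thousands separators. Both the line format and the helpers are assumptions for illustration.

```javascript
// Assumed snapshot line format: role ref "name"
const LINE = /^\S+\s+(e\d+)\s+"(.*)"$/;

// Hypothetical: return the ref of the first element whose text contains `needle`.
function findElementByText(snapshotText, needle) {
  for (const line of snapshotText.split("\n")) {
    const match = LINE.exec(line.trim());
    if (match && match[2].includes(needle)) return match[1];
  }
  return undefined;
}

// Hypothetical: parse a dollar amount from the element with the given ref.
function extractPrice(snapshotText, ref) {
  if (!ref) return undefined;
  for (const line of snapshotText.split("\n")) {
    const match = LINE.exec(line.trim());
    if (!match || match[1] !== ref) continue;
    // Accept "$1,299.99" or "$ 42"; strip thousands separators before parsing.
    const price = /\$\s*(\d[\d,]*(?:\.\d+)?)/.exec(match[2]);
    return price ? Number(price[1].replace(/,/g, "")) : undefined;
  }
  return undefined;
}

const snapshotText = [
  'heading e30 "Widget Pro"',
  'text e31 "$1,299.99"',
  'button e32 "Add to cart"',
].join("\n");
const priceRef = findElementByText(snapshotText, "$");
console.log(extractPrice(snapshotText, priceRef)); // 1299.99
```

Matching the first "$" is fragile: a shipping fee or a crossed-out list price could appear first. Narrowing the search by role or by a nearby label is usually worth the extra line.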
Example 2: Automated Form Submission
Submitting a contact form programmatically:
// Navigate to contact page
await browser.open({ url: "https://example.com/contact" });
// Optional: snapshot to confirm the form is present.
// The fill below targets fields by CSS selector, so it doesn't need refs.
const snapshot = await browser.snapshot({ refs: "aria" });
// Fill all fields at once
await browser.act({
kind: "fill",
fields: [
{ selector: "input[name=name]", text: "Agent Name" },
{ selector: "input[name=email]", text: "agent@example.com" },
{ selector: "textarea[name=message]", text: "Hello from OpenClaw!" }
]
});
// Submit
await browser.act({
kind: "click",
selector: "button[type=submit]"
});
// Wait for the "Submitting..." indicator to disappear
await browser.act({
kind: "wait",
textGone: "Submitting..."
});
Example 3: Monitoring Dashboard Changes
Check if a dashboard metric has changed:
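const threshold = 100; // example value; set your own limit
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
while (true) {
await browser.open({ url: "https://dashboard.example.com" });
const snapshot = await browser.snapshot();
// extractMetricValue and sendAlert are hypothetical helpers you supply
const metric = extractMetricValue(snapshot.content);
if (metric > threshold) {
await sendAlert(`Metric exceeded: ${metric}`);
break;
}
await sleep(60000); // Check every minute
}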
Advanced Patterns
Using Chrome Profile Takeover
The "chrome" profile lets agents take over your existing Chrome browser. This is useful when you need to reuse logged-in sessions or work with sites that have anti-bot measures.
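Requirements:
- The OpenClaw Browser Relay extension installed in Chrome, with the tab you want to control attached
- profile: "chrome" in your browser calls
{
"action": "snapshot",
"profile": "chrome"
}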
The agent will control the attached Chrome tab directly. This is powerful for working with authenticated sessions or complex SPAs.
Handling Dynamic Content
For pages with lazy-loaded content, use the wait action:
{
"action": "act",
"kind": "wait",
"text": "Results loaded"
}
Or wait for text to disappear, such as a loading indicator:
{
"action": "act",
"kind": "wait",
"textGone": "Loading..."
}
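To show what these two wait modes mean, here is a polling loop with the same logic: re-read the page text until the target text appears (`text`) or disappears (`textGone`), and give up after a timeout. This is not how agent-browser implements wait. `readText` is a hypothetical stand-in for "take a snapshot and return its text", injected so the sketch runs without a browser. The timeout and interval defaults are arbitrary.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll readText() until `text` appears or `textGone` disappears, or time out.
async function waitForText(readText, { text, textGone, timeoutMs = 10000, intervalMs = 250 } = {}) {
  if ((text === undefined) === (textGone === undefined)) {
    throw new Error("Pass exactly one of `text` or `textGone`");
  }
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const current = await readText();
    const done = text !== undefined ? current.includes(text) : !current.includes(textGone);
    if (done) return current;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}

// Fake page: shows "Loading..." for the first two reads, then the results.
let reads = 0;
const fakeReadText = async () => (++reads < 3 ? "Loading..." : "Results loaded");
waitForText(fakeReadText, { textGone: "Loading...", intervalMs: 10 })
  .then((finalText) => console.log(finalText)); // Results loaded
```

The explicit timeout matters: without one, a loading indicator that never clears would hang the agent forever.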
Taking Screenshots
Capture visual proof or debug issues:
{
"action": "screenshot",
"type": "png",
"fullPage": true
}
The screenshot is returned as an attachment you can save or analyze.
Best Practices
1. Always Take Fresh Snapshots
Element refs are only valid for the current page state. After navigation or page changes, take a new snapshot before interacting with elements.
2. Use aria Refs for Stability
If your automation spans multiple calls or sessions:
{
"action": "snapshot",
"refs": "aria"
}
Aria refs are Playwright-style selectors that persist across snapshots.
3. Prefer Refs Over Selectors
Refs are more reliable than CSS selectors because they're generated from the actual DOM structure at snapshot time. Use selectors only when you know the exact selector won't change.
4. Handle Navigation Carefully
After clicking a link or submitting a form, wait for the new page to load:
{
"action": "act",
"kind": "click",
"ref": "e12",
"loadState": "networkidle"
}
5. Keep targetId Consistent
When using refs from a snapshot, pass the targetId from the snapshot response into subsequent actions. This ensures you're operating on the same tab.
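One way to make this hard to forget is to route every action through a helper that copies the snapshot's targetId. This sketch assumes the snapshot response exposes a top-level `targetId` field, as the tip above implies; the rest of the response shape and the "tab-1" value are made up for illustration. `withTarget` is a hypothetical helper.

```javascript
// Hypothetical helper: attach the snapshot's targetId to the next action.
function withTarget(snapshotResponse, action) {
  const { targetId } = snapshotResponse;
  if (!targetId) {
    throw new Error("Snapshot response has no targetId");
  }
  // Refuse to silently retarget an action that already names a different tab.
  if (action.targetId && action.targetId !== targetId) {
    throw new Error(`Action targets ${action.targetId}, snapshot came from ${targetId}`);
  }
  // Return a copy; the caller's action object is left unchanged.
  return { ...action, targetId };
}

const snapshot = { targetId: "tab-1", content: 'button e12 "Sign In"' };
console.log(JSON.stringify(withTarget(snapshot, { action: "act", kind: "click", ref: "e12" })));
// {"action":"act","kind":"click","ref":"e12","targetId":"tab-1"}
```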
Common Pitfalls
Using Stale Refs
Don't reuse refs after navigation:
// Wrong
const snapshot1 = await browser.snapshot();
await browser.act({ kind: "click", ref: "e12" }); // Navigates
await browser.act({ kind: "type", ref: "e13", text: "test" }); // e13 is stale!
// Right
const snapshot1 = await browser.snapshot();
await browser.act({ kind: "click", ref: "e12" }); // Navigates
const snapshot2 = await browser.snapshot(); // Fresh snapshot
// Look up the field in the new snapshot (findRef is a hypothetical helper)
const emailRef = findRef(snapshot2.content, "textbox", "Email");
await browser.act({ kind: "type", ref: emailRef, text: "test" });
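A small client-side guard can turn this mistake into an immediate, readable error. The sketch records the refs from the latest snapshot, clears them after any action that may navigate, and rejects refs it has not seen since. `RefTracker` is hypothetical and not part of OpenClaw. It assumes refs are tokens like "e12" in the snapshot text, so a name that happens to contain such a token would also be counted.

```javascript
// Hypothetical client-side guard against stale refs.
class RefTracker {
  constructor() {
    this.currentRefs = new Set();
  }

  // Call after every snapshot: only refs from this snapshot are trusted.
  recordSnapshot(snapshotText) {
    this.currentRefs = new Set(snapshotText.match(/\be\d+\b/g) || []);
  }

  // Call after any action that may navigate or re-render the page.
  invalidate() {
    this.currentRefs.clear();
  }

  // Returns the ref if it came from the latest snapshot, otherwise throws.
  assertFresh(ref) {
    if (!this.currentRefs.has(ref)) {
      throw new Error(`Ref ${ref} is not from the latest snapshot; take a new snapshot first`);
    }
    return ref;
  }
}

const tracker = new RefTracker();
tracker.recordSnapshot('button e12 "Sign In"\ntextbox e13 "Email"');
tracker.assertFresh("e12"); // ok: e12 is in the current snapshot
tracker.invalidate();       // clicking e12 navigated away
// tracker.assertFresh("e13") would now throw
```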
Forgetting to Wait
Don't assume instant page loads:
// Wrong
await browser.open({ url: "https://example.com" });
await browser.act({ kind: "click", selector: ".dynamic-button" }); // Might not exist yet
// Right
await browser.open({ url: "https://example.com" });
await browser.act({ kind: "wait", text: "Page ready" }); // Use text that appears only once the page has rendered
await browser.act({ kind: "click", selector: ".dynamic-button" });
Overusing wait Actions
Don't add a wait when a snapshot already shows the element you need. Each unnecessary wait slows down your automation.
When to Use agent-browser
agent-browser shines for:
- Web scraping: Extract data from sites without APIs
- Form automation: Submit forms, fill out applications
- Monitoring: Check dashboards, track changes
- Testing: Automated UI testing for web apps
- Research: Gather information from multiple sources
Consider other tools for:
- Heavy data extraction: Use APIs or dedicated scrapers
- High-frequency polling: Browser overhead adds up
- Sites with strong anti-bot measures: Unless using chrome profile with real sessions
Wrapping Up
OpenClaw's agent-browser skill makes browser automation accessible to AI agents without the complexity of traditional tools. Start with simple navigation and snapshots, then build up to complex multi-step workflows.
The key is thinking in terms of: snapshot, identify, act, verify. Take a snapshot, find your elements, perform actions, and verify the result with another snapshot.
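That cycle can be written as one small retry loop. In this sketch the four steps are injected functions, hypothetical stand-ins for real browser calls, which keeps the control flow testable without a browser. The retry count and delay are arbitrary defaults, not OpenClaw settings.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Snapshot, identify, act, verify, retrying up to maxAttempts times.
async function runStep({ snapshot, identify, act, verify, maxAttempts = 3, retryDelayMs = 500 }) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const before = await snapshot();          // 1. snapshot
    const ref = identify(before);             // 2. identify (undefined = not found yet)
    if (ref !== undefined) {
      await act(ref);                         // 3. act
      const after = await snapshot();         // fresh snapshot; never verify on a stale one
      if (verify(after)) return { ok: true, attempts: attempt }; // 4. verify
    }
    if (attempt < maxAttempts) await sleep(retryDelayMs);
  }
  return { ok: false, attempts: maxAttempts };
}

// Fake page: clicking "Sign In" (e12) replaces the page with a welcome message.
let page = 'button e12 "Sign In"';
runStep({
  snapshot: async () => page,
  identify: (text) => (text.includes('"Sign In"') ? "e12" : undefined),
  act: async (ref) => { if (ref === "e12") page = 'heading e40 "Welcome back"'; },
  verify: (text) => text.includes("Welcome back"),
  retryDelayMs: 10,
}).then((result) => console.log(result)); // { ok: true, attempts: 1 }
```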
With agent-browser, your agents can interact with the web as naturally as they interact with APIs. Give it a try on your next automation project.