What is Browser Automation?

Browser automation lets agents interact with websites:

Navigate pages

Fill forms

Click buttons

Extract data

Complete workflows

Browser Tool Basics

Starting Browser

browser(action="start", profile="clawd")

Opening URLs

browser(action="open", targetUrl="https://example.com")

Taking Snapshots

See the current page state:

browser(action="snapshot")

Screenshots

Visual capture:

browser(action="screenshot")

Navigation

Go to URL

browser(action="navigate", targetUrl="https://example.com/page")

Refresh

browser(action="navigate", targetUrl="current")  // Refresh current

Interacting with Elements

Click

browser(action="act", request={
  kind: "click",
  ref: "button[Submit]"  // Element reference from snapshot
})

Type Text

browser(action="act", request={
  kind: "type",
  ref: "textbox[Search]",
  text: "search query"
})

Fill Form

browser(action="act", request={
  kind: "fill",
  fields: [
    { ref: "textbox[Email]", text: "user@example.com" },
    { ref: "textbox[Password]", text: "password123" }  // Placeholder; never hardcode real credentials
  ]
})

Press Keys

browser(action="act", request={
  kind: "press",
  key: "Enter"
})

Reading Page Content

Snapshot

Get page structure:

browser(action="snapshot")
// Returns structured representation of page

Screenshot

Get visual representation:

browser(action="screenshot")

Extract Text

Read an element's text directly from the snapshot output.

Common Patterns

Login

// Navigate to login
browser(action="open", targetUrl="https://app.example.com/login")

// Fill credentials (placeholder values; load real ones from a secure source)
browser(action="act", request={
  kind: "fill",
  fields: [
    { ref: "textbox[Email]", text: "user@example.com" },
    { ref: "textbox[Password]", text: "password" }
  ]
})

// Submit
browser(action="act", request={
  kind: "click",
  ref: "button[Sign In]"
})

// Verify success
browser(action="snapshot")

Search and Extract

// Go to search page
browser(action="open", targetUrl="https://example.com")

// Enter search
browser(action="act", request={
  kind: "type",
  ref: "textbox[Search]",
  text: "query"
})

// Submit search
browser(action="act", request={
  kind: "press",
  key: "Enter"
})

// Get results
browser(action="snapshot")

Form Filling

browser(action="act", request={
  kind: "fill",
  fields: [
    { ref: "textbox[Name]", text: "John Doe" },
    { ref: "textbox[Email]", text: "john@example.com" },
    { ref: "combobox[Country]", values: ["United States"] }
  ]
})
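
When field values come from data rather than being typed out by hand, the request can be assembled in code. The following is a minimal Python sketch, not part of the browser tool: `build_fill_request` is a hypothetical helper, and the request shape (`kind`, `fields`, `ref`, `text`, `values`) simply mirrors the example above.

```python
from typing import Any, Dict, List


def build_fill_request(values: Dict[str, Any]) -> Dict[str, Any]:
    """Build a "fill" request from a mapping of element ref -> value.

    Hypothetical helper: lists become `values` entries (as for the combobox
    above) and everything else becomes a `text` entry.
    """
    fields: List[Dict[str, Any]] = []
    for ref, value in values.items():
        if isinstance(value, list):
            fields.append({"ref": ref, "values": value})
        else:
            fields.append({"ref": ref, "text": str(value)})
    return {"kind": "fill", "fields": fields}


fill_request = build_fill_request({
    "textbox[Name]": "John Doe",
    "textbox[Email]": "john@example.com",
    "combobox[Country]": ["United States"],
})
```

The resulting dictionary would then be passed as the `request` argument of an `act` call.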

Waiting

Wait for Element

browser(action="act", request={
  kind: "wait",
  ref: "button[Submit]",
  timeMs: 5000
})

Wait for Text to Disappear

browser(action="act", request={
  kind: "wait",
  textGone: "Loading..."
})

Profiles

Isolated Browser

browser(action="start", profile="clawd")

Fresh browser instance, no saved state.

Chrome Extension

browser(action="start", profile="chrome")

Uses your existing Chrome session.

Best Practices

Use Snapshots First

Before interacting, understand the page:

browser(action="snapshot")
// Examine structure
// Then interact

Handle Loading

Wait for pages to load:

browser(action="act", request={
  kind: "wait",
  timeMs: 2000
})
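
A fixed delay can be too short on a slow page and wasteful on a fast one. If the wait request cannot express the condition you need, one option is to poll snapshots from the calling code. The sketch below is an illustration under assumptions: `wait_until` is a hypothetical helper, and `take_snapshot` stands in for whatever call returns the snapshot as text.

```python
import time
from typing import Callable


def wait_until(
    take_snapshot: Callable[[], str],
    is_ready: Callable[[str], bool],
    timeout_s: float = 5.0,
    interval_s: float = 0.25,
) -> str:
    """Poll snapshots until is_ready(snapshot) is true or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        snapshot = take_snapshot()
        if is_ready(snapshot):
            return snapshot
        if time.monotonic() >= deadline:
            raise TimeoutError(f"page not ready after {timeout_s} seconds")
        time.sleep(interval_s)


# Stand-in for the real snapshot call: reports "Loading..." twice, then results.
_pages = iter(["Loading...", "Loading...", "Results: 3 items"])
final_snapshot = wait_until(
    take_snapshot=lambda: next(_pages),
    is_ready=lambda text: "Loading..." not in text,
    interval_s=0.01,
)
```

The helper always takes at least one snapshot, so a page that is already ready returns immediately without sleeping.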

Verify Actions

After actions, verify they worked:

browser(action="act", request={kind: "click", ref: "..."})
browser(action="snapshot")  // Verify state changed
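
The act-then-snapshot pattern can be wrapped in a small helper. This is a sketch under assumptions, not the tool's API: it supposes a Python-callable `browser` that accepts `action` and `request` keywords and returns the snapshot as text. `click_and_verify` and `FakeBrowser` are hypothetical names, and the fake exists only so the example runs without a real browser.

```python
from typing import Any, Callable, Dict, Optional


def click_and_verify(browser: Callable[..., Any], ref: str, expected_text: str) -> bool:
    """Click an element, take a fresh snapshot, and check for expected text."""
    browser(action="act", request={"kind": "click", "ref": ref})
    snapshot = browser(action="snapshot")
    return expected_text in str(snapshot)


class FakeBrowser:
    """Stand-in used only to make this sketch runnable without a browser."""

    def __init__(self) -> None:
        self.page = "button[Sign In]"

    def __call__(self, action: str, request: Optional[Dict[str, Any]] = None) -> str:
        # A click moves the fake page from the login form to the dashboard.
        if action == "act" and request is not None and request["kind"] == "click":
            self.page = "heading[Dashboard]"
        return self.page


signed_in = click_and_verify(FakeBrowser(), "button[Sign In]", "Dashboard")
```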

Handle Errors

Pages can fail to load or respond in unexpected ways:

Check for error states

Handle timeouts

Retry if appropriate
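
The points above can be sketched as a retry wrapper with exponential backoff. How the browser tool actually reports failures is not covered here, so this example assumes a timeout surfaces as a Python `TimeoutError`; `with_retries` and `flaky_open` are hypothetical names used for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], attempts: int = 3, base_delay_s: float = 0.5) -> T:
    """Run action, retrying on TimeoutError with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except TimeoutError:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay_s * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")


calls = {"count": 0}


def flaky_open() -> str:
    """Stand-in for a page load that times out twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("page did not load")
    return "loaded"


result = with_retries(flaky_open, attempts=3, base_delay_s=0.01)
```

Retrying suits idempotent steps such as opening a page or taking a snapshot. Repeating a form submission or a purchase click may not be safe.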

Security Considerations

Credentials

Don't hardcode passwords:

Use environment variables

Store securely

Don't log credentials
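
One way to follow the points above is to read credentials from the environment when building the fill request. This is a minimal sketch: the variable names `APP_EMAIL` and `APP_PASSWORD` and the helper `login_fill_request` are assumptions made for illustration, and the element refs are copied from the earlier login example.

```python
import os
from typing import Any, Dict, Mapping


def login_fill_request(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Build a login "fill" request from environment variables."""
    missing = [name for name in ("APP_EMAIL", "APP_PASSWORD") if not env.get(name)]
    if missing:
        # Report only the variable names, never their values.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "kind": "fill",
        "fields": [
            {"ref": "textbox[Email]", "text": env["APP_EMAIL"]},
            {"ref": "textbox[Password]", "text": env["APP_PASSWORD"]},
        ],
    }


# Demonstration with an explicit mapping; real code would rely on os.environ.
login_request = login_fill_request(
    {"APP_EMAIL": "user@example.com", "APP_PASSWORD": "not-a-real-secret"}
)
```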

Sensitive Data

Be careful when extracting:

Personal information

Financial data

Private content

Session Management

Close browsers when done:

browser(action="stop")

Limitations

What Works Well

Simple navigation
Form filling
Data extraction
Clicking buttons

What's Harder

Complex JavaScript apps
CAPTCHAs
Anti-automation measures
Real-time interactions

Alternative: web_fetch

For straightforward data extraction, web_fetch is a lighter-weight option:

web_fetch(url="https://example.com")

It returns readable content without the overhead of a browser.

Conclusion

Browser automation extends agent capabilities:

Access web applications

Automate workflows

Extract data

Complete tasks

Use it thoughtfully: not every task needs a browser.

Next: Working with Nodes - Device and remote system integration