
gemini-computer-use

Build and run Gemini 2.5 Computer Use browser-control agents with Playwright.

Installation

npx clawhub@latest install gemini-computer-use


Copy the example env file, set your API key in it, and source it:

cp env.example env.sh
$EDITOR env.sh
source env.sh
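To confirm the key is visible to Python after sourcing env.sh, a check like the following could help. It is a minimal sketch that assumes the key is exported as GEMINI_API_KEY or GOOGLE_API_KEY (the names the google-genai client reads). The name used in env.example may differ, and find_api_key is a hypothetical helper, not part of the skill.

```python
import os
from typing import Mapping, Optional

# Assumption: env.sh exports the key under one of these names, which are the
# ones the google-genai client reads. Check env.example for the actual name.
KEY_NAMES = ("GEMINI_API_KEY", "GOOGLE_API_KEY")


def find_api_key(environ: Mapping[str, str]) -> Optional[str]:
    """Return the first non-empty key found under KEY_NAMES, or None."""
    for name in KEY_NAMES:
        value = environ.get(name, "").strip()
        if value:
            return value
    return None


# True once env.sh has been sourced in the shell that launched Python.
key_present = find_api_key(os.environ) is not None
```

If key_present is False, env.sh was most likely not sourced in the current shell.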

Create a virtual environment and install dependencies:

python -m venv .venv
source .venv/bin/activate
pip install google-genai playwright
playwright install chromium

Run the agent script with a prompt:

python scripts/computer_use_agent.py \
  --prompt "Find the latest blog post title on example.com" \
  --start-url "" \
  --turn-limit 6
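The three flags in the command above could be declared with argparse roughly as follows. This is a hypothetical sketch of the interface, not the script's actual code: the defaults, help texts, and the turn-limit check are assumptions.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse --prompt, --start-url and --turn-limit (hypothetical defaults)."""
    parser = argparse.ArgumentParser(description="Gemini Computer Use agent (sketch)")
    parser.add_argument("--prompt", required=True,
                        help="Task for the agent, in natural language.")
    parser.add_argument("--start-url", default="",
                        help="Page to open before the first turn.")
    parser.add_argument("--turn-limit", type=int, default=6,
                        help="Maximum number of model turns.")
    args = parser.parse_args(argv)
    if args.turn_limit < 1:
        parser.error("--turn-limit must be at least 1")  # exits with status 2
    return args


args = parse_args(["--prompt", "Find the latest blog post title on example.com",
                   "--start-url", "", "--turn-limit", "6"])
```

argparse maps --start-url and --turn-limit to the attributes start_url and turn_limit.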

Browser selection

By default the agent uses Playwright's bundled Chromium; no environment variables are required.
To use a Chrome or Edge channel instead, set COMPUTER_USE_BROWSER_CHANNEL.
To use a custom Chromium-based executable (e.g., Brave), set COMPUTER_USE_BROWSER_EXECUTABLE.

If both are set, COMPUTER_USE_BROWSER_EXECUTABLE takes precedence.
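One way to implement these rules is to translate the two variables into keyword arguments for Playwright's chromium.launch(), whose channel and executable_path parameters select a branded browser channel or a specific binary. The helper name browser_launch_options is hypothetical, and the skill's script may structure this differently.

```python
from typing import Any, Dict, Mapping


def browser_launch_options(environ: Mapping[str, str]) -> Dict[str, Any]:
    """Translate the browser env vars into chromium.launch() keyword arguments.

    COMPUTER_USE_BROWSER_EXECUTABLE wins over COMPUTER_USE_BROWSER_CHANNEL;
    with neither set, Playwright's bundled Chromium is used.
    """
    executable = environ.get("COMPUTER_USE_BROWSER_EXECUTABLE", "").strip()
    if executable:
        return {"executable_path": executable}
    channel = environ.get("COMPUTER_USE_BROWSER_CHANNEL", "").strip()
    if channel:
        return {"channel": channel}
    return {}  # no overrides: bundled Chromium
```

With Playwright's sync API, the result would be passed as p.chromium.launch(headless=False, **browser_launch_options(os.environ)).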

How it works

Each run of the agent follows this loop:

Capture a screenshot of the page and send it to the model along with the user's goal.

Parse the function_call parts (the requested actions) from the model's response.
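In the google-genai SDK, a response exposes candidates[0].content.parts, and each part may carry a function_call with a name and args. The sketch below collects those calls through plain attribute access, so it can be demonstrated with SimpleNamespace stubs; the helper name, the action name, and the arguments in the stub are illustrative only.

```python
from types import SimpleNamespace
from typing import Any, List


def extract_function_calls(response: Any) -> List[Any]:
    """Return the function_call objects from the first candidate's parts.

    An empty list means the model replied with text only.
    """
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return []
    content = getattr(candidates[0], "content", None)
    parts = getattr(content, "parts", None) or []
    return [part.function_call for part in parts
            if getattr(part, "function_call", None) is not None]


# Stub shaped like an SDK response: one text part, one action.
fake_response = SimpleNamespace(candidates=[SimpleNamespace(content=SimpleNamespace(parts=[
    SimpleNamespace(text="Opening the blog page.", function_call=None),
    SimpleNamespace(text=None, function_call=SimpleNamespace(
        name="click_at", args={"x": 500, "y": 120})),
]))])
calls = extract_function_calls(fake_response)
```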

Execute each action in Playwright.
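The sketch below maps a few action names to Playwright page calls (mouse.click, keyboard.type, keyboard.press, goto, go_back). It assumes that the model emits x/y as normalized coordinates in the 0 to 999 range, which must be scaled to the real viewport, and it assumes a 1440x900 viewport; check both against your setup. Only a subset of actions is handled, and RecordingPage is a hypothetical stand-in so the example runs without a browser.

```python
from types import SimpleNamespace
from typing import Any, List, Mapping, Tuple

# Assumptions: the model emits x/y on a 0-999 grid, and the browser viewport
# is 1440x900. Adjust both to match your setup.
SCREEN_WIDTH, SCREEN_HEIGHT = 1440, 900


def denormalize(value: int, size: int) -> int:
    """Scale a 0-999 model coordinate to a pixel coordinate."""
    return int(value / 1000 * size)


def execute_action(page: Any, name: str, args: Mapping[str, Any]) -> None:
    """Perform one model action on a Playwright-style page (subset only)."""
    if name in ("click_at", "type_text_at"):
        x = denormalize(args["x"], SCREEN_WIDTH)
        y = denormalize(args["y"], SCREEN_HEIGHT)
        page.mouse.click(x, y)
        if name == "type_text_at":
            page.keyboard.type(args["text"])
            if args.get("press_enter", False):
                page.keyboard.press("Enter")
    elif name == "navigate":
        page.goto(args["url"])
    elif name == "go_back":
        page.go_back()
    else:
        raise NotImplementedError(f"action not handled in this sketch: {name}")


class RecordingPage:
    """Hypothetical stand-in for a Playwright Page that records each call."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Any, ...]] = []
        self.mouse = SimpleNamespace(click=lambda x, y: self.calls.append(("click", x, y)))
        self.keyboard = SimpleNamespace(
            type=lambda text: self.calls.append(("type", text)),
            press=lambda key: self.calls.append(("press", key)),
        )

    def goto(self, url: str) -> None:
        self.calls.append(("goto", url))

    def go_back(self) -> None:
        self.calls.append(("go_back",))


page = RecordingPage()
execute_action(page, "type_text_at", {"x": 500, "y": 500, "text": "hello", "press_enter": True})
```

With a real browser, page would be a Playwright Page whose viewport matches SCREEN_WIDTH and SCREEN_HEIGHT.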

If an action's safety_decision is require_confirmation, ask the user for confirmation before executing it.
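A minimal confirmation gate might look like this. It assumes the safety decision arrives inside the action's args as a safety_decision mapping with decision and explanation keys; verify this against real responses. The helper name is hypothetical, and the ask parameter defaults to input() so it can be swapped out for testing.

```python
from typing import Any, Callable, Mapping


def confirm_if_required(args: Mapping[str, Any],
                        ask: Callable[[str], str] = input) -> bool:
    """Return True if the action may run, prompting only when required.

    Assumes the model puts {"decision": ..., "explanation": ...} under
    args["safety_decision"]; any other decision (or none) is allowed.
    """
    decision = args.get("safety_decision") or {}
    if decision.get("decision") != "require_confirmation":
        return True
    explanation = decision.get("explanation", "The model requested confirmation.")
    answer = ask(f"{explanation}\nProceed? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```

If the gate returns False, the action should be skipped rather than executed; how the refusal is reported back to the model is left to the script.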

Send a function_response for each executed action back to the model, containing the current page URL and a fresh screenshot.

Repeat until the model returns only text (no actions) or the turn limit is reached.
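The loop can be sketched with the model call and the browser work injected as callables, which keeps the stop and turn-limit logic testable without network or browser access. run_agent_loop and its parameters are hypothetical; a real agent would also keep the full conversation history and apply the confirmation step before each action.

```python
from typing import Any, Callable, List, Sequence


def run_agent_loop(send: Callable[[Any], Sequence[Any]],
                   execute: Callable[[Any], Any],
                   first_message: Any,
                   turn_limit: int) -> int:
    """Run the observe-act loop and return the number of model turns used.

    send(message) returns the actions in the model's reply (empty for a
    text-only reply); execute(action) performs one action and returns its
    function_response (URL + screenshot in the real agent).
    """
    message = first_message
    for turn in range(1, turn_limit + 1):
        actions = send(message)
        if not actions:
            return turn  # text only: the model is done
        message = [execute(action) for action in actions]
    return turn_limit  # stopped by the turn limit


# Stubbed run: two turns with actions, then a text-only reply.
scripted_replies: List[List[str]] = [["click_at"], ["type_text_at", "navigate"], []]
sent: List[Any] = []


def fake_send(message: Any) -> Sequence[Any]:
    sent.append(message)
    return scripted_replies.pop(0)


turns = run_agent_loop(fake_send, lambda action: f"response:{action}",
                       "goal + screenshot", turn_limit=6)
```

Here the stubbed model stops on its third turn, well before the limit of 6; a model that keeps returning actions would instead be cut off after turn_limit turns.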