
gemini-computer-use

Build and run Gemini 2.5 Computer Use browser-control agents with Playwright.


Installation

npx clawhub@latest install gemini-computer-use


Documentation

Gemini Computer Use

Quick start

  • Copy the env template, set your Gemini API key in it, and source it:

        cp env.example env.sh
        $EDITOR env.sh
        source env.sh

  • Create a virtual environment and install the dependencies:

        python -m venv .venv
        source .venv/bin/activate
        pip install google-genai playwright
        playwright install chromium

  • Run the agent script with a prompt:

        python scripts/computer_use_agent.py \
          --prompt "Find the latest blog post title on example.com" \
          --start-url "" \
          --turn-limit 6

Browser selection

  • Default: Playwright's bundled Chromium (no environment variables required).
  • To use an installed Chrome or Edge channel, set COMPUTER_USE_BROWSER_CHANNEL.
  • To use a custom Chromium-based executable (e.g., Brave), set COMPUTER_USE_BROWSER_EXECUTABLE.

If both variables are set, COMPUTER_USE_BROWSER_EXECUTABLE takes precedence.
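
The precedence rule above can be expressed as a small helper that builds keyword arguments for Playwright's `chromium.launch()`, which accepts `channel` and `executable_path`. This is a sketch that assumes the script maps the two variables directly onto those parameters. The helper name `resolve_launch_kwargs` is hypothetical.

```python
import os
from typing import Mapping


def resolve_launch_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Build kwargs for Playwright's chromium.launch() from environment variables.

    Hypothetical helper: assumes COMPUTER_USE_BROWSER_EXECUTABLE maps to
    `executable_path` and COMPUTER_USE_BROWSER_CHANNEL maps to `channel`.
    """
    executable = env.get("COMPUTER_USE_BROWSER_EXECUTABLE", "").strip()
    channel = env.get("COMPUTER_USE_BROWSER_CHANNEL", "").strip()
    if executable:
        # A custom Chromium-based binary (e.g. Brave) wins over a channel.
        return {"executable_path": executable}
    if channel:
        # A Playwright channel name such as "chrome" or "msedge".
        return {"channel": channel}
    # Neither variable set: fall back to Playwright's bundled Chromium.
    return {}


if __name__ == "__main__":
    print(resolve_launch_kwargs({"COMPUTER_USE_BROWSER_CHANNEL": "chrome"}))
```

With Playwright's sync API, pass the result straight through, e.g. `p.chromium.launch(headless=False, **resolve_launch_kwargs())`. An empty dict leaves Playwright's defaults untouched.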

Core workflow (agent loop)

  • Capture a screenshot and send the user's goal together with the screenshot to the model.
  • Parse the function_call actions in the response.
  • Execute each action in Playwright.
  • If an action carries a safety_decision of require_confirmation, ask the user before executing it.
  • Send back a function_response for each executed action, containing the current URL and a fresh screenshot.
  • Repeat until the model returns only text (no actions) or you hit the turn limit.
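
The steps above can be sketched as a model-agnostic loop. To keep it runnable without an API key or a browser, the model call, action execution, screenshot capture, and user confirmation are injected as callables. `FunctionCall`, `ModelTurn`, `FunctionResponse`, and `run_agent_loop` are hypothetical stand-ins, not the google-genai SDK's own types. The sketch also assumes the safety decision arrives as `args["safety_decision"]["decision"]`.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FunctionCall:
    """Hypothetical stand-in for a model function_call (name + args)."""
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class ModelTurn:
    """Hypothetical stand-in for one model response: text and/or actions."""
    text: str = ""
    function_calls: list = field(default_factory=list)


@dataclass
class FunctionResponse:
    """What is sent back per executed action: latest URL + screenshot."""
    name: str
    url: str
    screenshot: bytes
    safety_acknowledged: bool = False


def needs_confirmation(call: FunctionCall) -> bool:
    # Assumption: the decision lives in args["safety_decision"]["decision"].
    decision = call.args.get("safety_decision") or {}
    return decision.get("decision") == "require_confirmation"


def run_agent_loop(
    goal: str,
    call_model: Callable[[list], ModelTurn],
    execute: Callable[[FunctionCall], None],
    observe: Callable[[], tuple],
    confirm: Callable[[FunctionCall], bool],
    turn_limit: int = 6,
) -> Optional[str]:
    """Return the model's final text, or None if stopped early."""
    _, screenshot = observe()
    history: list = [("user", goal, screenshot)]  # goal + initial screenshot
    for _ in range(turn_limit):
        turn = call_model(history)
        history.append(("model", turn))
        if not turn.function_calls:
            return turn.text  # text only, no actions: the task is finished
        responses = []
        for call in turn.function_calls:
            acknowledged = False
            if needs_confirmation(call):
                if not confirm(call):
                    return None  # user declined: stop without executing
                acknowledged = True
            execute(call)
            url, screenshot = observe()  # fresh state after each action
            responses.append(FunctionResponse(call.name, url, screenshot, acknowledged))
        history.append(("user", responses))
    return None  # turn limit reached without a final text answer
```

In a real agent, `call_model` would wrap the google-genai client and `execute` would dispatch on `call.name` to Playwright calls. Both of those wirings are outside this sketch.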

Operational guidance

  • Run in a sandboxed browser profile or container.
  • Use --exclude to block risky actions you do not want the model to take.
  • Keep the viewport at 1440x900 unless you have a reason to change it.
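
Google's Computer Use documentation describes click and type coordinates as normalized to a 1000x1000 grid (0-999 on each axis) rather than given in pixels. Under that assumption, they must be scaled to the viewport before calling Playwright. The following minimal sketch targets the 1440x900 viewport recommended above. The function name `denormalize` is hypothetical.

```python
VIEWPORT_WIDTH = 1440
VIEWPORT_HEIGHT = 900
GRID = 1000  # assumption: model coordinates span 0..999 on each axis


def denormalize(x: int, y: int,
                width: int = VIEWPORT_WIDTH,
                height: int = VIEWPORT_HEIGHT) -> tuple[int, int]:
    """Map normalized model coordinates to viewport pixels."""
    px = int(x / GRID * width)
    py = int(y / GRID * height)
    # Clamp so out-of-range model output never lands outside the viewport.
    return min(max(px, 0), width - 1), min(max(py, 0), height - 1)


if __name__ == "__main__":
    print(denormalize(500, 500))
```

A click action could then be executed as `page.mouse.click(*denormalize(args["x"], args["y"]))`, where `args` holds the function call's arguments. If you change the viewport, pass the new width and height so the scaling stays correct.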

Resources

  • Script: scripts/computer_use_agent.py
  • Reference notes: references/google-computer-use.md
  • Env template: env.example