Browser-use Tool

This feature is supported in the Pro Edition.

Browser-use is a browser automation tool for Xpert agents. It wraps the open-source browser-use framework so that large model agents can drive a real browser inside a sandbox environment and carry out specified web operations automatically.

The tool suits scenarios that call for automated browser tasks, such as visiting websites, extracting information, filling in forms, and clicking buttons. It gives large models with general execution abilities the means to operate a browser.

Features

  • ✅ Supports automated browser operations driven by task instructions written in natural language
  • ✅ Supports custom large model and API configuration
  • ✅ Seamlessly integrates with the browser-use open-source framework
  • ✅ Automatically tracks and records execution processes (history, video, trace)
  • ✅ Supports multi-step tasks combined with agent reasoning workflows
  • ✅ Supports runtime parameter configuration, such as enabling screen recording or vision models
  • ✅ Can be used within an Agentic Workflow

Usage Instructions

Tool Parameter Configuration

  • Specify the browser task (provided by the LLM)
  • Configure the LLM
  • Set browser execution parameters (e.g., enable screen recording, use a vision model, timeout duration)

Communication with browser-use in Sandbox

The tool opens a Server-Sent Events (SSE) stream via EventSource to communicate with the Sandbox service, and starts browser task execution through the /operator/stream endpoint.
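
As an illustration of what arrives over such a stream, the following is a minimal sketch of decoding SSE framing, written in Python rather than with EventSource. The helper name parse_sse and the sample messages are hypothetical; only the general SSE rules ("field: value" lines, a blank line ends a message, lines starting with ":" are comments) are assumed, not the actual payloads sent by /operator/stream.

```python
from typing import Dict, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per SSE message from an iterable of text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the message collected so far.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Hypothetical sample stream; real payloads from the Sandbox may differ.
sample = [
    ": keep-alive\n",
    'data: {"thoughts": "open page"}\n',
    "\n",
    "event: update\n",
    "data: finished\n",
    "\n",
]
messages = list(parse_sse(sample))
```

Parsing from an iterable of lines keeps the sketch independent of any HTTP client, so the same function could be fed from a streaming response or from a recorded log.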

Real-time Event Handling

During task execution, the tool listens for and parses event messages returned by browser-use:

  • Execution thoughts for each step (thoughts)
  • Current page URL
  • Error status (errors)
  • Completion status (messages containing done)

Parsed intermediate events are sent in real time to the frontend or debugging interface, where they show the current execution status.
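
The event categories listed above could be mapped to display lines roughly as follows. This is a sketch under assumptions: the JSON keys "thoughts", "url", "errors", and "done" are chosen for illustration from the names in the list, and summarize_event is a hypothetical helper; the real event schema is defined by the Sandbox service.

```python
import json
from typing import Any, Dict, List


def summarize_event(payload: str) -> List[str]:
    """Turn one JSON event payload into display lines for a UI or log.

    The keys used here are assumptions made for this sketch.
    """
    event: Dict[str, Any] = json.loads(payload)
    lines: List[str] = []
    if event.get("thoughts"):
        lines.append(f"thought: {event['thoughts']}")
    if event.get("url"):
        lines.append(f"url: {event['url']}")
    if event.get("errors"):
        lines.append(f"error: {event['errors']}")
    if event.get("done"):
        lines.append("status: done")
    return lines


step = summarize_event(
    '{"thoughts": "Click the search box", "url": "https://example.com"}'
)
failed = summarize_event('{"errors": "element not found"}')
```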

Final Result Return

When a done message signals that the task has completed, the tool extracts the final_result field from the events and uses it as the execution result.
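
A minimal sketch of this step, assuming events have already been decoded into dictionaries and that completion is marked by a truthy "done" key (an assumption; only the final_result field name comes from the text above). The function extract_final_result is hypothetical.

```python
from typing import Any, Dict, Iterable, Optional


def extract_final_result(events: Iterable[Dict[str, Any]]) -> Optional[str]:
    """Return final_result from the first event that signals completion.

    Returns None if no event is marked done or the field is missing.
    """
    for event in events:
        if event.get("done"):  # assumed completion marker
            result = event.get("final_result")
            return None if result is None else str(result)
    return None


# Hypothetical event sequence for illustration only.
events = [
    {"thoughts": "Open the news page"},
    {"done": True, "final_result": "Found 3 headlines"},
]
result = extract_final_result(events)
```

Returning None when the stream ends without a done event lets the caller distinguish an unfinished or timed-out task from an empty result.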

Return Value

Returns a string that summarizes the task execution or describes the operation outcome, for use by the large model agent.

Advanced Configuration

Configuration Item    Description
copilotModel          Current LLM model and its provider information
llm_temperature       Controls the sampling temperature of the LLM (default: 0.5)
enable_recording      Enables browser screen recording (default: enabled)
max_steps             Maximum number of steps for browser tasks (default: 100)
use_vision            Enables visual recognition (e.g., screenshot understanding)
timeout               Task timeout duration
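
The configuration items above could be modeled as follows. This is only a sketch: the class BrowserUseOptions and its to_payload method are hypothetical, the defaults are the ones stated above, and use_vision and timeout are left unset because no default (or unit, for timeout) is given. copilotModel is omitted because the structure of its provider information is not described here.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class BrowserUseOptions:
    # Defaults taken from the configuration items where one is stated.
    llm_temperature: float = 0.5
    enable_recording: bool = True
    max_steps: int = 100
    # No default is documented for these two, so they start unset.
    use_vision: Optional[bool] = None
    timeout: Optional[float] = None  # unit is not specified in the source

    def to_payload(self) -> dict:
        """Drop unset fields so only explicit values are sent."""
        return {k: v for k, v in asdict(self).items() if v is not None}


# Override only what differs from the documented defaults.
options = BrowserUseOptions(max_steps=20, use_vision=True)
payload = options.to_payload()
```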

Application Scenarios

  • Search and summarize specific content on webpages (e.g., news, stock prices, reports)
  • Perform multi-step interactions on complex websites (e.g., querying and exporting data)
  • Simulate the way a human operates the web within large agent systems

Notes

  • This tool relies on the backend browser-use runtime environment (Sandbox), which must be started and reachable.
  • The quality of the result depends on how accurately the browser task is executed and how well the model understands it.
  • The browser runs in headless mode by default.
  • The tool is designed for large models, so describe the task clearly in natural language to express the intent.