Browser-use Tool

This feature is supported in the Pro Edition.

Browser-use is a browser automation tool designed for Xpert agents. It wraps the open-source browser-use framework so that LLM agents can drive a real browser inside a sandbox environment and carry out specified web operations automatically.

This tool suits scenarios that require automated browser tasks, such as visiting websites, extracting information, filling in forms, and clicking buttons. It gives LLMs with general execution abilities the added capability of operating a browser.

Features

✅ Supports automated browser operations driven by task instructions written in natural language
✅ Supports custom LLM and API configuration
✅ Seamlessly integrates with the browser-use open-source framework
✅ Automatically tracks and records the execution process (history, video, trace)
✅ Supports multi-step tasks combined with agent reasoning workflows
✅ Supports runtime parameter configuration, such as enabling screen recording or vision models
✅ Supports usage within Agentic Workflow

Usage Instructions

Tool Parameter Configuration

The tool takes three kinds of input:

Browser task to perform (provided by the LLM)
LLM model configuration
Browser execution parameters (e.g., screen recording, vision model, timeout duration)

Communication with browser-use in Sandbox

The tool opens a Server-Sent Events (SSE) stream with EventSource to communicate with the Sandbox service and starts browser task execution through the /operator/stream endpoint.
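As a rough sketch of this step, the snippet below builds the stream URL and opens it through an injected factory. Only the /operator/stream path comes from the description above. The base URL, the `task` query parameter, and the helper names `buildStreamUrl` and `openOperatorStream` are assumptions for illustration, and the real request format may differ.

```typescript
// Hypothetical helper: the query parameter name "task" is an assumption,
// not a documented part of the Sandbox API.
function buildStreamUrl(sandboxBaseUrl: string, task: string): string {
  const base = sandboxBaseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/operator/stream?task=${encodeURIComponent(task)}`;
}

// Opens the stream through a caller-supplied factory, so the browser's
// EventSource (or a test double) can be plugged in without changes here.
function openOperatorStream<T>(
  sandboxBaseUrl: string,
  task: string,
  createSource: (url: string) => T,
): T {
  return createSource(buildStreamUrl(sandboxBaseUrl, task));
}
```

In a browser, the factory could be `(url) => new EventSource(url)`. Injecting it keeps the URL logic testable outside a browser.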

Real-time Event Handling

During task execution, the tool listens for and parses event messages returned by browser-use:

Execution thoughts for each step (thoughts)
Current page URL
Error status (errors)
Completion status (messages containing done)

The parsed intermediate events are forwarded in real time to the frontend or debugging interface, which displays the execution status.
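The event handling described above could look roughly like the following. This sketch assumes that each SSE message carries a JSON object. The names thoughts, errors, and done appear in the list above, while the `url` key, the payload shape, and the `StepUpdate` type are hypothetical.

```typescript
// Assumed payload shape; only thoughts, errors, and done are named in
// the documentation, the rest is hypothetical.
interface RawStepPayload {
  thoughts?: string;
  url?: string;
  errors?: string;
  done?: boolean;
}

type StepUpdate =
  | { kind: "thought"; text: string }
  | { kind: "url"; url: string }
  | { kind: "error"; message: string }
  | { kind: "done" };

// Converts one SSE data string into zero or more display updates.
function parseStepEvent(data: string): StepUpdate[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(data);
  } catch {
    return [{ kind: "error", message: "unparseable event payload" }];
  }
  if (typeof parsed !== "object" || parsed === null) {
    return [{ kind: "error", message: "unexpected event payload" }];
  }
  const payload = parsed as RawStepPayload;
  const updates: StepUpdate[] = [];
  if (payload.thoughts) updates.push({ kind: "thought", text: payload.thoughts });
  if (payload.url) updates.push({ kind: "url", url: payload.url });
  if (payload.errors) updates.push({ kind: "error", message: payload.errors });
  if (payload.done) updates.push({ kind: "done" });
  return updates;
}

// Forwards every parsed update to a listener, e.g. a debugging panel.
function forwardEvent(data: string, notify: (update: StepUpdate) => void): void {
  for (const update of parseStepEvent(data)) {
    notify(update);
  }
}
```

Returning a list of updates per message lets one event report a thought and a URL change together without losing either.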

Final Result Return

When a done message signals that the task has finished, the tool extracts the final_result field from the received events and uses it as the execution result.
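A minimal sketch of this extraction, assuming that done and final_result arrive as fields of the same JSON event (the documentation names both fields but does not spell out the event layout). The `StreamEvent` type and `extractFinalResult` helper are hypothetical.

```typescript
// Assumed event shape: done and final_result are named in the text,
// but placing them on one object is an assumption.
interface StreamEvent {
  done?: boolean;
  final_result?: string;
}

// Returns the final result once a done event has been seen,
// or undefined while the task is still running.
function extractFinalResult(events: StreamEvent[]): string | undefined {
  for (const event of events) {
    if (event.done) {
      // Falling back to an empty string is a choice made for this sketch.
      return event.final_result ?? "";
    }
  }
  return undefined;
}
```

Returning undefined for an unfinished stream lets the caller distinguish "still running" from "finished with an empty result".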

Return Value

Returns a string to the LLM agent that summarizes the task execution or describes the outcome of the operation.

Advanced Configuration

Configuration Item	Description
`copilotModel`	Current LLM model and its provider information
`llm_temperature`	Controls the sampling temperature of the LLM (default: 0.5)
`enable_recording`	Enables browser screen recording (default: enabled)
`max_steps`	Maximum number of steps for browser tasks (default: 100)
`use_vision`	Enables visual recognition (e.g., screenshot understanding)
`timeout`	Task timeout duration
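To illustrate how these options fit together, the sketch below merges caller-supplied options with the defaults listed in the table. The option names mirror the table. The shape of `copilotModel`, the numeric type of `timeout`, and the `resolveOptions` helper are assumptions, since the table does not specify them.

```typescript
// Option names follow the table above. The copilotModel shape and the
// unit of timeout are not documented there, so both are assumptions.
interface BrowserUseOptions {
  copilotModel?: { provider: string; model: string }; // hypothetical shape
  llm_temperature?: number;
  enable_recording?: boolean;
  max_steps?: number;
  use_vision?: boolean;
  timeout?: number;
}

// Defaults stated in the table. use_vision and timeout have no
// documented default, so they are left unset here.
const DOCUMENTED_DEFAULTS = {
  llm_temperature: 0.5,
  enable_recording: true,
  max_steps: 100,
};

// Caller-supplied values take precedence over the documented defaults.
function resolveOptions(options: BrowserUseOptions) {
  return { ...DOCUMENTED_DEFAULTS, ...options };
}
```

For example, `resolveOptions({ max_steps: 20, use_vision: true })` keeps the default temperature and recording settings while capping the task at 20 steps.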

Application Scenarios

Search and summarize specific content on webpages (e.g., news, stock prices, reports)
Perform multi-step interactions on complex websites (e.g., querying and exporting data)
Simulate the way a real user operates web pages within LLM agent systems

Notes

This tool relies on the backend browser-use runtime environment (Sandbox), which must be running and reachable.
Result quality depends on how accurately the browser task executes and how well the model understands the task.
The browser runs in headless mode by default.
Because the tool is driven by an LLM, describe the task clearly in natural language so that its intent is unambiguous.

Related Links

browser-use GitHub Repository
