Browser-use Tool
This feature is supported in the Pro Edition.
Browser-use is a browser automation tool for Xpert agents. It wraps the open-source browser-use framework so that large model agents can drive a real browser in a sandbox environment and carry out specified web operations automatically.
This tool suits scenarios that call for automated browser tasks, such as visiting websites, extracting information, filling in forms, and clicking buttons. It gives large models with general execution abilities the ability to operate a browser.
Features
- ✅ Supports automated browser operations based on `task` instructions described in natural language
- ✅ Supports custom large model and API configuration
- ✅ Seamlessly integrates with the `browser-use` open-source framework
- ✅ Automatically tracks and records execution processes (history, video, trace)
- ✅ Supports multi-step tasks combined with agent reasoning workflows
- ✅ Supports runtime parameter configuration, such as enabling screen recording or vision models
- ✅ Supports usage within Agentic Workflow
Usage Instructions
Tool Parameter Configuration
- Specify browser tasks (provided by LLM)
- Configure LLM model
- Browser execution parameters (e.g., enable screen recording, use vision model, timeout duration)
Communication with browser-use in Sandbox
The tool establishes an SSE stream via `EventSource` to communicate with the Sandbox service, initiating browser task execution through `/operator/stream`.
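To make the streaming step concrete, here is a minimal sketch of how a client could decode an SSE stream into events. It follows the standard SSE wire format (`data:` lines, blank line as event terminator, `:` lines as comments). The function name `iter_sse_events` is hypothetical, and the assumption that each `data:` payload is a JSON object is not confirmed by this page.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded payload per SSE event.

    Assumption: every event's data is a JSON object. The real payload
    format of /operator/stream is not documented here.
    """
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
            continue
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # The SSE format strips a single leading space from the value.
            data_parts.append(value[1:] if value.startswith(" ") else value)
    # An unterminated trailing event is dropped, as the SSE format requires.
```

The function accepts any iterable of text lines, so it can be fed from an HTTP response body or from a list of strings when debugging.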
Real-time Event Handling
During task execution, the tool listens for and parses event messages returned by browser-use:
- Execution thoughts for each step (`thoughts`)
- Current page URL
- Error status (`errors`)
- Completion status (messages containing `done`)
Parsed intermediate events are sent in real-time to the frontend or debugging interface to display execution status.
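The event handling described above could look roughly like the sketch below, which turns one parsed event into display updates. The keys `thoughts`, `errors`, and `done` come from the list above. The `url` key, the `describe_event` name, and the idea that each event is a flat dictionary are assumptions made for illustration.

```python
def describe_event(event: dict) -> list:
    """Map one parsed event to (kind, text) pairs for a status display.

    Assumption: events are flat dicts; "url" is a hypothetical key name
    for the current page URL.
    """
    updates = []
    if event.get("thoughts"):
        updates.append(("thought", str(event["thoughts"])))
    if event.get("url"):
        updates.append(("url", str(event["url"])))
    if event.get("errors"):
        updates.append(("error", str(event["errors"])))
    if event.get("done"):
        # Completion marker; the result itself is handled separately.
        updates.append(("done", ""))
    return updates
```

Returning a list of pairs keeps the parsing separate from whatever frontend or debugging interface renders the status.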
Final Result Return
Upon task completion (detected by `done`), the tool extracts the `final_result` field from events as the execution result.
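As a rough illustration of this step, the sketch below scans parsed events and returns `final_result` from the first one that signals completion. It assumes that `final_result` arrives in the same event as `done`, which this page does not state. The helper name `collect_final_result` is hypothetical.

```python
from typing import Iterable, Optional


def collect_final_result(events: Iterable[dict]) -> Optional[str]:
    """Return final_result from the first event marked done, else None.

    Assumption: "done" and "final_result" are keys of the same event.
    """
    for event in events:
        if event.get("done"):
            result = event.get("final_result")
            return None if result is None else str(result)
    return None  # the stream ended without a completion event
```

A `None` return lets the caller distinguish a task that never completed from one that completed with an empty string.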
Return Value
Returns a string result summarizing the task execution or describing the operation outcome for the large model agent.
Advanced Configuration
Configuration Item | Description |
---|---|
copilotModel | Current LLM model and its provider information |
llm_temperature | Controls the sampling temperature of the LLM (default: 0.5) |
enable_recording | Enables browser screen recording (default: enabled) |
max_steps | Maximum number of steps for browser tasks (default: 100) |
use_vision | Enables visual recognition (e.g., screenshot understanding) |
timeout | Task timeout duration |
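For reference, the options in the table can be modelled as a small typed structure. This is only a sketch: the class name `BrowserUseOptions`, the validation rules, and the `to_payload` helper are hypothetical. The defaults for `llm_temperature`, `enable_recording`, and `max_steps` are taken from the table, while `use_vision`, `timeout`, and `copilotModel` are left unset because the table gives no default or structure for them.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class BrowserUseOptions:
    copilotModel: Optional[dict] = None  # structure not documented; placeholder
    llm_temperature: float = 0.5         # default from the table
    enable_recording: bool = True        # default from the table
    max_steps: int = 100                 # default from the table
    use_vision: Optional[bool] = None    # no documented default
    timeout: Optional[float] = None      # no documented default or unit

    def __post_init__(self) -> None:
        # Illustrative sanity checks, not rules enforced by the tool itself.
        if self.max_steps < 1:
            raise ValueError("max_steps must be at least 1")
        if self.llm_temperature < 0:
            raise ValueError("llm_temperature must be non-negative")

    def to_payload(self) -> dict:
        """Return only the options that have been set."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```

For example, `BrowserUseOptions(use_vision=True).to_payload()` yields the three documented defaults plus `use_vision`, and omits the unset fields.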
Application Scenarios
- Search and summarize specific content on webpages (e.g., news, stock prices, reports)
- Perform multi-step interactions on complex websites (e.g., querying and exporting data)
- Simulate real human web operation workflows in large agent systems
Notes
- This tool relies on the backend `browser-use` runtime environment (Sandbox), which must be accessible and properly started.
- The tool's results depend on the accuracy of browser task execution and model understanding.
- The browser runs in `headless` mode by default.
- Because the tool is designed for large models, the `task` should be described clearly in natural language so that its intent is unambiguous.