Use Computer Use with an Agent

Assign a cloud desktop to an agent so it can interact with graphical applications. The agent takes screenshots, analyzes them, and performs mouse/keyboard actions in a loop.

Prerequisites

  • You are signed in to the XpressAI Platform.
  • You have a provisioned desktop in Ready status (see Provision a Cloud Desktop).
  • You have a deployed agent.

Steps

  1. Navigate to the agent's profile page.
  2. Select the Desktop tab.
  3. Assign the provisioned desktop to the agent.
  4. Save the configuration.

How computer use works

When the agent is assigned a desktop, it gains access to the meeseeks_desktop tool. The agent uses a screenshot-action loop:

  1. The agent takes a screenshot of the desktop.
  2. It analyzes the screenshot to understand the current state.
  3. It performs an action (click, type, scroll, etc.).
  4. It takes another screenshot to verify the result.
  5. This loop repeats for up to 25 iterations per task.

Tip

You can watch the agent interact with the desktop in real time by opening a VNC connection to the same desktop (see Connect via VNC).
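
The screenshot-action loop described above can be sketched in a few lines of Python. This is an illustrative simulation only: the real interface of the meeseeks_desktop tool is not documented here, so `FakeDesktop`, `Action`, `run_task`, and `type_hello` are hypothetical names, and the "screenshot" is plain text rather than an image. The only detail taken from this guide is the cap of 25 iterations per task.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

MAX_ITERATIONS = 25  # per-task limit stated in this guide


@dataclass
class Action:
    kind: str           # e.g. "click", "type", "scroll"
    argument: str = ""


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a cloud desktop (not the real tool)."""
    typed: str = ""
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real screenshot is an image; here the "screen" is the typed text.
        return self.typed

    def perform(self, action: Action) -> None:
        self.log.append(action.kind)
        if action.kind == "type":
            self.typed += action.argument


def run_task(desktop: FakeDesktop,
             decide: Callable[[str], Optional[Action]],
             max_iterations: int = MAX_ITERATIONS) -> tuple:
    """Run the loop; return (task_done, actions_performed)."""
    for used in range(max_iterations):
        screen = desktop.screenshot()   # take a screenshot
        action = decide(screen)         # analyze the current state
        if action is None:              # nothing left to do
            return True, used
        desktop.perform(action)         # click, type, scroll, ...
    # Budget exhausted: one last screenshot to check the final state.
    return decide(desktop.screenshot()) is None, max_iterations


def type_hello(screen: str) -> Optional[Action]:
    """Toy policy: type the word "hello" one character per iteration."""
    target = "hello"
    if screen == target:
        return None
    return Action("type", target[len(screen)])


desktop = FakeDesktop()
print(run_task(desktop, type_hello))  # (True, 5)
```

In this sketch, a task that needs more actions than the budget allows returns `False` as its first value, which is the situation where splitting the task into smaller sub-tasks helps.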

Common use cases

  • Filling out web forms.
  • Interacting with legacy desktop software that has no API.
  • Performing data entry across multiple applications.
  • Taking screenshots for documentation or reporting.

Warning

The 25-iteration limit prevents runaway loops. If the agent's task requires more steps, break it into smaller sub-tasks.
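
One way to plan sub-tasks is to estimate how many iterations each step needs and group consecutive steps so that every group stays within the limit. The sketch below assumes such estimates are available; the `split_into_subtasks` helper and the example numbers are hypothetical and not part of the platform.

```python
ITERATION_LIMIT = 25  # per-task limit stated in this guide


def split_into_subtasks(steps, limit=ITERATION_LIMIT):
    """Group (description, estimated_iterations) pairs, in order, into
    sub-tasks whose combined estimate does not exceed `limit`."""
    subtasks, current, used = [], [], 0
    for description, cost in steps:
        if cost > limit:
            raise ValueError(f"step {description!r} alone exceeds the limit")
        if used + cost > limit:
            # Current sub-task is full: close it and start a new one.
            subtasks.append(current)
            current, used = [], 0
        current.append(description)
        used += cost
    if current:
        subtasks.append(current)
    return subtasks


# Iteration estimates below are made-up values for illustration.
plan = split_into_subtasks([
    ("Open the browser", 3),
    ("Log in", 8),
    ("Fill out form page 1", 12),
    ("Fill out form page 2", 12),
    ("Submit and capture confirmation", 4),
])
for number, subtask in enumerate(plan, start=1):
    print(number, subtask)
```

With these estimates the plan contains two sub-tasks: the first three steps (23 estimated iterations) and the last two (16). Each sub-task can then be given to the agent as a separate request.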

Verify

  • Confirm that the agent's Desktop tab shows the assigned desktop.
  • Ask the agent to perform a desktop task (for example, "Open the browser and go to example.com").
  • Watch via VNC to confirm that the agent is interacting with the desktop.
  • Confirm that the agent reports the result of the task in the conversation.