AI can control your Desktop

AmDab · 1 pts · 0 comments

Clawd Cursor v1.5.2 — the local MCP server for safe desktop control

v1.5.2 — latest stable

It sees. It acts. It verifies.
Desktop control for any AI agent

Any model. Any app. One MCP entry. Local-only. clawdcursor compiles the screen into a map of stable element ids that your agent acts on symbolically, with no pixel guessing, then confirms that every consequential action actually took effect. 7 compact tools, one safety chokepoint, no telemetry.


Quick Start

Toolbox: compound tools (recommended)

98 Tools: granular (compat / debug)

Operating Systems

Two ways to use it

Run it yourself, or hand it to your agent.

Test from the CLI

Plain English in, actions out.

clawdcursor doctor
clawdcursor agent

Wire it into your agent

One MCP entry, desktop control appears as native tools.

Claude Code · Cursor · Windsurf · OpenClaw · Zed

Pick a mode

How will your AI talk to it?

Same tools, three entry shapes. Pick once during install.

clawdcursor mcp — recommended

Your AI lives in your editor (Claude Code, Cursor, Windsurf, Zed). The editor spawns clawdcursor on demand over stdio. No daemon, no port.

    {
      "mcpServers": {
        "clawdcursor": {
          "command": "clawdcursor",
          "args": ["mcp", "--compact"]
        }
      }
    }

Compact / granular tools: 7 / 98

Transport: stdio
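If your editor already has other MCP servers configured, the entry has to be merged rather than pasted over the file. The sketch below shows one way to do that in Python. The `command` and `args` values come from the snippet above; the helper name `add_clawdcursor` and the `other-server` entry are hypothetical, and where your editor keeps its MCP config file is something to check in its own documentation.

```python
import json

# Values copied from the config snippet above.
CLAWDCURSOR_ENTRY = {"command": "clawdcursor", "args": ["mcp", "--compact"]}


def add_clawdcursor(config_text: str) -> str:
    """Return config JSON with a clawdcursor entry under "mcpServers".

    Existing servers are kept; an empty config is treated as {}.
    """
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["clawdcursor"] = dict(CLAWDCURSOR_ENTRY)
    return json.dumps(config, indent=2)


# Hypothetical existing config with one unrelated server in it.
existing = '{"mcpServers": {"other-server": {"command": "other", "args": []}}}'
merged = json.loads(add_clawdcursor(existing))
print(sorted(merged["mcpServers"]))
```

The function works on text rather than a path so that you can decide separately where the file lives and how to back it up.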

clawdcursor agent — autonomous daemon

clawdcursor brings its own LLM brain (configured via doctor). For unattended runs, scheduled tasks, multi-process orchestration.

Run clawdcursor doctor · pick a provider

Run clawdcursor agent

POST tasks to 127.0.0.1:3847/mcp

HTTP MCP: :3847

Providers: 13+

clawdcursor agent --no-llm — BYO brain

Your agent already has a brain — you just want HTTP tools. Same daemon, no built-in agent loop.

Run clawdcursor agent --no-llm

98 tools on :3847/mcp

Stateless — no session init needed
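For an external brain, talking to the daemon comes down to POSTing JSON to the address above. The following is a minimal sketch that assumes the endpoint accepts standard MCP JSON-RPC 2.0 messages (`tools/list`, `tools/call`); the page does not spell out the wire format, so treat the headers and payload shape as assumptions. The helper names are hypothetical, and tool names passed to `call_tool` should come from whatever `tools/list` actually returns.

```python
import json
import urllib.request
from typing import Optional

MCP_URL = "http://127.0.0.1:3847/mcp"  # address taken from the page


def build_request(method: str, params: Optional[dict] = None,
                  request_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC 2.0 POST for the daemon."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        payload["params"] = params
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: MCP HTTP servers may answer with JSON or an event stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


def call_tool(name: str, arguments: dict) -> urllib.request.Request:
    """Build a tools/call request; `name` must be a tool the server lists."""
    return build_request("tools/call", {"name": name, "arguments": arguments})


def send(request: urllib.request.Request) -> str:
    """Send a request and return the raw response body (daemon must be running)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


list_tools = build_request("tools/list")
print(list_tools.data.decode("utf-8"))
```

With the daemon running, `send(list_tools)` would return the tool catalog; the example only prints the request body so it can be inspected without a live server.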

Granular tools (compat): 98

HTTP client: any

How it works

Cheap paths first.

A11y tree before pixels. Vision only when needed.

1 Compile the screen

Zero pixels. Fuse the a11y tree (+ OCR) into one el_NN map. Act on an element by id: no screenshot, no vision LLM.

2 Escalate as needed

Cheapest rung that works. OCR when the tree is sparse, screenshot when you need pixels, vision only for canvas UIs.

3 Verify & gate

Reactive + one chokepoint. Pass expect and the action confirms its outcome, reporting DEVIATION if the UI didn't obey. Every call routes through one safety layer.
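The "cheapest rung that works" idea from step 2 can be sketched as an ordered fallback chain. This is an illustration of the pattern only, not clawdcursor's code: the rung names, the `locate` function, and the lookup tables standing in for the a11y tree and OCR are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# A locator takes a target label and returns an element id, or None if it
# cannot find the target.
Locator = Callable[[str], Optional[str]]


def locate(target: str, rungs: List[Tuple[str, Locator]]) -> Tuple[str, str]:
    """Try each rung in order and stop at the first one that finds the target."""
    for name, locator in rungs:
        element_id = locator(target)
        if element_id is not None:
            return name, element_id
    raise LookupError(f"no rung could locate {target!r}")


# Stand-in data: the a11y tree only knows "Save"; OCR also sees "Export".
a11y_index = {"Save": "el_03"}
ocr_index = {"Save": "el_03", "Export": "el_11"}

rungs = [
    ("a11y", a11y_index.get),         # cheapest: structured tree lookup
    ("ocr", ocr_index.get),           # used when the tree is sparse
    ("vision", lambda target: None),  # placeholder for the most expensive rung
]

print(locate("Save", rungs))
print(locate("Export", rungs))
```

Ordering the list by cost means the expensive rungs are only reached when every cheaper one has returned nothing.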

🎯

Toolbox — 7 compound tools

The recommended surface — computer, accessibility, window, system, browser, task, batch. ~12× smaller catalog than the granular Tools surface.

🗺️

UI State Compiler

One fused, confidence-scored map of the screen with stable el_NN ids. Act symbolically — it survives DPI, resize, and layout shifts.
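To make the idea of a fused, confidence-scored map concrete, here is a toy sketch. It is not the real compiler: the `Element` fields, the confidence numbers, and the rule that ids follow sorted labels are assumptions chosen so the example stays small. The point it illustrates is that ids derived from what an element is, rather than where it is drawn, do not change when the window is resized.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Element:
    label: str
    role: str
    confidence: float          # illustrative scores, not clawdcursor's
    sources: Tuple[str, ...]


def compile_screen(a11y_nodes: List[Tuple[str, str]],
                   ocr_words: List[str]) -> Dict[str, Element]:
    """Fuse a11y nodes (role, label) and OCR text into one id-keyed map."""
    fused: Dict[str, Element] = {}
    seen_text = set(ocr_words)
    for role, label in a11y_nodes:
        if label in seen_text:
            # Both sources agree, so trust the element more.
            fused[label] = Element(label, role, 0.95, ("a11y", "ocr"))
        else:
            fused[label] = Element(label, role, 0.80, ("a11y",))
    for word in ocr_words:
        if word not in fused:
            # Text seen only by OCR: role unknown, lower confidence.
            fused[word] = Element(word, "text", 0.50, ("ocr",))
    # Ids follow sorted labels, so coordinates never influence numbering.
    return {f"el_{i:02d}": fused[label]
            for i, label in enumerate(sorted(fused), start=1)}


screen = compile_screen([("button", "Save"), ("edit", "Title")], ["Save", "Draft"])
for element_id, element in screen.items():
    print(element_id, element.label, element.confidence)
```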

Features

Any OS. Any model.

🍎

macOS

TCC-safe. clawdcursor grant handles Accessibility + Screen Recording.

🪟

Windows

Native UIA + Windows.Media.Ocr. x64 and ARM64.

🐧

Linux

X11 and Wayland. AT-SPI for a11y, Tesseract for OCR.

Verified actions

Pass expect on send/save/submit — clawdcursor confirms the outcome on screen and reports a DEVIATION instead of a hollow success.
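The verify-after-act pattern described here can be sketched in a few lines. This is a stand-alone illustration under assumptions: `act_and_verify`, the `Result` type, and the substring check are hypothetical, and only the `expect` parameter name and the `DEVIATION` status come from the page.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    status: str  # "OK" or "DEVIATION"
    detail: str


def act_and_verify(action: Callable[[], None],
                   read_screen: Callable[[], str],
                   expect: str) -> Result:
    """Run the action, then re-read the screen and check the expectation."""
    action()
    observed = read_screen()
    if expect in observed:
        return Result("OK", f"found {expect!r}")
    return Result("DEVIATION", f"expected {expect!r}, saw {observed!r}")


# Fake UI state standing in for the real screen.
screen = {"text": "Draft"}


def click_save() -> None:
    screen["text"] = "Saved"


ok = act_and_verify(click_save, lambda: screen["text"], expect="Saved")
ignored = act_and_verify(lambda: None, lambda: "Draft", expect="Saved")
print(ok.status, ignored.status)
```

The second call models a click the UI silently ignored: the action itself raises no error, so only the post-check can tell the two cases apart.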

⌨️

Shortcuts engine

Platform-aware key combos — Cmd on macOS, Ctrl elsewhere. No LLM cost.
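Resolving a platform-aware combo needs no model call, which is why it costs nothing. A minimal sketch, assuming a hypothetical `mod` placeholder in the combo string (the actual shortcut syntax clawdcursor uses is not shown on the page):

```python
import sys


def primary_modifier(platform: str = sys.platform) -> str:
    """Cmd on macOS, Ctrl elsewhere."""
    return "cmd" if platform == "darwin" else "ctrl"


def resolve(combo: str, platform: str = sys.platform) -> str:
    """Replace the hypothetical "mod" placeholder with the platform's key."""
    keys = combo.lower().split("+")
    return "+".join(
        primary_modifier(platform) if key == "mod" else key for key in keys
    )


print(resolve("mod+s", "darwin"))
print(resolve("mod+shift+t", "win32"))
```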

📦

batch — one round-trip

Collapse N deterministic tool calls into a single guarded, safety-gated batch. N calls → 1.
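Conceptually, a guarded batch is a loop that checks each step against a safety gate before running it and stops at the first step the gate rejects. The sketch below simulates that locally. It does not reproduce the real `batch` schema, which the page does not show: the step fields `tool` and `action`, the `run_batch` helper, and the gate rule are all hypothetical.

```python
from typing import Callable, Dict, List

Step = Dict[str, str]


def run_batch(steps: List[Step],
              execute: Callable[[Step], str],
              is_allowed: Callable[[Step], bool]) -> List[Dict[str, object]]:
    """Run steps in order behind one gate; stop at the first blocked step."""
    results: List[Dict[str, object]] = []
    for index, step in enumerate(steps):
        if not is_allowed(step):
            results.append({"index": index, "status": "blocked"})
            break
        results.append({"index": index, "status": execute(step)})
    return results


steps = [
    {"tool": "window", "action": "focus"},
    {"tool": "computer", "action": "type"},
    {"tool": "window", "action": "close"},   # rejected by the toy gate below
    {"tool": "computer", "action": "key"},   # never reached
]

report = run_batch(
    steps,
    execute=lambda step: "ok",                           # stand-in executor
    is_allowed=lambda step: step["action"] != "close",   # toy safety rule
)
print(report)
```

The caller makes one call and gets one report back, which is the round-trip saving the feature describes.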

Tools

7 compact tools + 98 granular

The 7 compact compounds are the recommended public surface. Each entry lists the actions you pass via { "action": "…" }. The 98 granular tools (one schema per verb) are listed below for compatibility and debugging; use them when your runtime requires every primitive as a top-level MCP tool.

computer
Purpose: Mouse, keyboard, screenshots. The raw I/O surface.
Actions: screenshot · click · double_click · right_click · triple_click · hover · scroll · scroll_horizontal · drag · drag_path · type · key · wait

accessibility
Purpose: Drive UI by element name, not by pixel. Survives DPI, resize, layout shifts.
Actions: read_tree · find · get_element · focused · invoke · focus · set_value · get_value · expand · collapse · toggle · select · state · list_children · wait_for

window
Purpose: Launch, focus, resize. App-level state management.
Actions: list · active · focus · maximize · minimize · restore · close · resize · list_displays · screen_size · open_app · open_file · open_url · switch_tab · navigate

system
Purpose: Clipboard, OCR, shortcuts, undo, webview detection + CDP relaunch, the active system prompt, and task delegation. The meta surface for an external brain.
Actions: clipboard_read · clipboard_write · system_time · ocr · undo · shortcuts_list · shortcuts_run · delegate · detect_webview · relaunch_with_cdp · system_prompt

browser
Purpose: Chrome DevTools Protocol: real DOM access for Electron / WebView2 apps whose a11y tree is sparse.
Actions: connect · page_context · read_text · click · type · select_option · evaluate · wait_for · list_tabs · switch_tab · scroll

task
Purpose: Hand off the whole task to clawdcursor's autonomous loop. Daemon mode only; requires clawdcursor agent with an LLM...
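Because each compound takes its verb through an `action` field, a client can catch typos before anything reaches the server. The sketch below builds an arguments object and validates the action against the lists above. Only three compounds are included for brevity, and any parameters beyond `action` (such as `x` and `y` here) are hypothetical, since the page does not document per-action parameters.

```python
from typing import Any, Dict

# Action names copied from the listing above (three compounds only).
ACTIONS = {
    "computer": set(
        "screenshot click double_click right_click triple_click hover scroll "
        "scroll_horizontal drag drag_path type key wait".split()
    ),
    "window": set(
        "list active focus maximize minimize restore close resize list_displays "
        "screen_size open_app open_file open_url switch_tab navigate".split()
    ),
    "system": set(
        "clipboard_read clipboard_write system_time ocr undo shortcuts_list "
        "shortcuts_run delegate detect_webview relaunch_with_cdp system_prompt".split()
    ),
}


def build_arguments(compound: str, action: str, **params: Any) -> Dict[str, Any]:
    """Return {"action": ..., **params} after checking the action name."""
    if compound not in ACTIONS:
        raise ValueError(f"unknown compound: {compound!r}")
    if action not in ACTIONS[compound]:
        raise ValueError(f"{compound!r} has no action {action!r}")
    return {"action": action, **params}


# x and y are hypothetical parameter names used only for illustration.
print(build_arguments("computer", "click", x=10, y=20))
```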
