Clawd Cursor v1.5.2 — the local MCP server for safe desktop control
v1.5.2 — latest stable
It sees. It acts. It verifies.
Desktop control for any AI agent
Any model. Any app. One MCP entry. Local-only. clawdcursor compiles the screen into a map of stable element ids that your agent acts on symbolically, with no pixel guessing, then confirms that every consequential action actually took effect. 7 compact tools, one safety chokepoint, no telemetry.
Toolbox, compound tools (recommended): 7
Tools, granular (compat / debug): 98
Operating systems: 3 (macOS, Windows, Linux)
Two ways to use it
Run it yourself, or hand it to your agent.
Test from the CLI
Plain English in, actions out.
clawdcursor doctor
clawdcursor agent
Wire it into your agent
One MCP entry, and desktop control appears as native tools.
Claude Code · Cursor · Windsurf · OpenClaw · Zed
Pick a mode
How will your AI talk to it?
Same tools, three entry shapes. Pick once during install.
clawdcursor mcp — recommended
Your AI lives in your editor (Claude Code, Cursor, Windsurf, Zed). The editor spawns clawdcursor on demand over stdio. No daemon, no port.
{
  "mcpServers": {
    "clawdcursor": {
      "command": "clawdcursor",
      "args": ["mcp", "--compact"]
    }
  }
}
Compact / granular tools: 7 / 98
Transport: stdio
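If your editor keeps its MCP servers in a JSON file, the entry above can be merged in without disturbing other servers. This is a minimal sketch: the `command` and `args` values come from the snippet above, while the function name `add_clawdcursor` and the idea of merging programmatically are illustrative assumptions, and where each editor stores its config is not covered here.

```python
import copy

# Entry copied from the config snippet above.
CLAWDCURSOR_ENTRY = {"command": "clawdcursor", "args": ["mcp", "--compact"]}


def add_clawdcursor(config: dict) -> dict:
    """Return a copy of an MCP client config with the clawdcursor server added.

    Other servers are preserved; an existing "clawdcursor" entry is replaced.
    The input dict is not modified.
    """
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers["clawdcursor"] = copy.deepcopy(CLAWDCURSOR_ENTRY)
    return merged
```

Load the editor's config with `json.load`, pass it through `add_clawdcursor`, and write the result back with `json.dump`.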
clawdcursor agent — autonomous daemon
clawdcursor brings its own LLM brain (configured via doctor). Use it for unattended runs, scheduled tasks, and multi-process orchestration.
Run clawdcursor doctor · pick a provider
Run clawdcursor agent
POST tasks to 127.0.0.1:3847/mcp
HTTP MCP: :3847
Providers: 13+
clawdcursor agent --no-llm — BYO brain
Your agent already has a brain — you just want HTTP tools. Same daemon, no built-in agent loop.
Run clawdcursor agent --no-llm
98 tools on :3847/mcp
Stateless — no session init needed
Granular tools (compat): 98
HTTP client: any
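Because the endpoint speaks MCP over HTTP, a client can ask the daemon which tools it exposes. A minimal sketch, assuming the daemon answers a plain JSON-RPC 2.0 `tools/list` POST over `http://` at the address above. The method name and the `Accept` header follow the MCP specification; the function names are illustrative, and the response format (JSON or an event stream) is not confirmed by this page, so the body is returned raw.

```python
import json
import urllib.request

MCP_URL = "http://127.0.0.1:3847/mcp"  # host, port and path from the text above


def build_list_tools_request(request_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC 2.0 tools/list request."""
    body = {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


def list_tools(timeout: float = 5.0) -> str:
    """Send the request and return the raw response body.

    Requires `clawdcursor agent --no-llm` to be running locally.
    """
    with urllib.request.urlopen(build_list_tools_request(), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```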
How it works
Cheap paths first.
A11y tree before pixels. Vision only when needed.
1. Compile the screen
Zero pixels
Fuse the a11y tree (+ OCR) into one el_NN map. Act on an element by id: no screenshot, no vision LLM.
2. Escalate as needed
Cheapest rung that works
OCR when the tree is sparse, screenshot when you need pixels, vision only for canvas UIs.
3. Verify & gate
Reactive + one chokepoint
Pass expect and the action confirms its outcome, reporting a DEVIATION if the UI didn't obey. Every call routes through one safety layer.
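The "cheapest rung that works" idea can be sketched as an ordered list of locators tried in turn. This is an illustration of the pattern only, not clawdcursor's implementation: the `locate` function, the rung names, and the toy lookups are all hypothetical, and the screenshot rung is left out for brevity.

```python
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[int, int]
# A rung takes a target label and returns a screen point, or None if it cannot find it.
Rung = Callable[[str], Optional[Point]]


def locate(target: str, rungs: Sequence[Tuple[str, Rung]]) -> Tuple[str, Point]:
    """Try each rung in order (cheapest first) and return the first hit."""
    for name, rung in rungs:
        hit = rung(target)
        if hit is not None:
            return name, hit
    raise LookupError(f"no rung could locate {target!r}")


# Stand-in rungs with hard-coded answers, for illustration only.
def a11y_tree(target: str) -> Optional[Point]:
    return {"Save": (120, 40)}.get(target)


def ocr(target: str) -> Optional[Point]:
    return {"Export": (300, 80)}.get(target)


def vision(target: str) -> Optional[Point]:
    return (0, 0)  # the most expensive rung always answers in this toy


LADDER = [("a11y", a11y_tree), ("ocr", ocr), ("vision", vision)]
```

With this ladder, `locate("Save", LADDER)` is answered by the a11y rung, and the vision rung is reached only when both cheaper rungs return `None`.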
🎯
Toolbox — 7 compound tools
The recommended surface — computer, accessibility, window, system, browser, task, batch. ~12× smaller catalog than the granular Tools surface.
🗺️
UI State Compiler
One fused, confidence-scored map of the screen with stable el_NN ids. Act symbolically — it survives DPI, resize, and layout shifts.
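One way to picture such a map is a dictionary from `el_NN` ids to element records, with coordinates resolved only at action time. This is a sketch of the data structure under stated assumptions: the `Element` fields, `compile_map`, and `click_point` are hypothetical names, and numbering elements in input order is a simplification that a real compiler would need to replace to keep ids stable.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Element:
    role: str
    name: str
    bounds: Tuple[int, int, int, int]  # x, y, width, height in screen pixels
    confidence: float                  # 0.0 to 1.0
    source: str                        # "a11y" or "ocr"


def compile_map(elements: Iterable[Element]) -> Dict[str, Element]:
    """Assign el_NN ids in input order (a simplification, see lead-in)."""
    return {f"el_{i:02d}": el for i, el in enumerate(elements, start=1)}


def click_point(ui: Dict[str, Element], element_id: str) -> Tuple[int, int]:
    """Resolve an id to the centre of its current bounds."""
    x, y, w, h = ui[element_id].bounds
    return x + w // 2, y + h // 2
```

Because the caller holds only the id, a resize or DPI change alters the bounds stored in the map, not the instruction the agent issues.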
Features
Any OS. Any model.
🍎
macOS
TCC-safe. clawdcursor grant handles Accessibility + Screen Recording.
🪟
Windows
Native UIA + Windows.Media.Ocr. x64 and ARM64.
🐧
Linux
X11 and Wayland. AT-SPI for a11y, Tesseract for OCR.
Verified actions
Pass expect on send/save/submit — clawdcursor confirms the outcome on screen and reports a DEVIATION instead of a hollow success.
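The verify step amounts to checking an observable outcome after the action instead of trusting that the call returned. A minimal sketch of that pattern follows; `verified`, `ToyEditor`, and the `"OK"` status string are hypothetical, and only the DEVIATION name comes from the text above.

```python
from typing import Callable


def verified(action: Callable[[], None], expect: Callable[[], bool]) -> str:
    """Run the action, then report whether the expected outcome is observable."""
    action()
    return "OK" if expect() else "DEVIATION"


class ToyEditor:
    """Stand-in app: an unsaved document shows a trailing '*' in its title."""

    def __init__(self, read_only: bool = False) -> None:
        self.read_only = read_only
        self.title = "notes.txt*"

    def save(self) -> None:
        # A read-only document silently ignores the save.
        if not self.read_only:
            self.title = "notes.txt"
```

For a writable editor, `verified(editor.save, lambda: not editor.title.endswith("*"))` reports success; for a read-only one the same call reports `DEVIATION`, even though `save()` itself raised no error.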
⌨️
Shortcuts engine
Platform-aware key combos — Cmd on macOS, Ctrl elsewhere. No LLM cost.
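The platform rule is simple enough to express without a model in the loop. A minimal sketch, assuming a `"modifier+key"` string format that is purely illustrative; `primary_modifier` and `combo` are hypothetical names, not clawdcursor functions.

```python
import sys


def primary_modifier(platform: str = sys.platform) -> str:
    """Cmd on macOS (sys.platform == "darwin"), Ctrl everywhere else."""
    return "cmd" if platform == "darwin" else "ctrl"


def combo(key: str, platform: str = sys.platform) -> str:
    """Build a platform-aware shortcut string such as "cmd+s" or "ctrl+s"."""
    return f"{primary_modifier(platform)}+{key}"
```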
📦
batch — one round-trip
Collapse N deterministic tool calls into a single guarded, safety-gated batch. N calls → 1.
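Conceptually, a batch is a list of steps that all pass through the same gate in one call. The sketch below illustrates that shape only: the step dictionaries, `run_batch`, the `"BLOCKED"` marker, and the stop-at-first-block policy are assumptions, not the documented batch schema.

```python
from typing import Callable, Dict, List

Step = Dict[str, str]


def run_batch(
    steps: List[Step],
    is_allowed: Callable[[Step], bool],
    execute: Callable[[Step], str],
) -> List[str]:
    """Run steps in order through one gate; stop at the first blocked step."""
    results: List[str] = []
    for step in steps:
        if not is_allowed(step):
            results.append("BLOCKED")
            break
        results.append(execute(step))
    return results


# Four steps sent as one request instead of four round-trips.
STEPS = [
    {"tool": "window", "action": "focus"},
    {"tool": "computer", "action": "type"},
    {"tool": "window", "action": "close"},
    {"tool": "computer", "action": "key"},
]
```

With a gate that refuses `close`, the first two steps run, the third is reported as blocked, and the fourth never executes.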
Tools
7 compact tools + 98 granular
The 7 compact compounds are the recommended public surface. Each row lists the actions you pass via { "action": "…" }. The 98 granular tools (one schema per verb) are listed below for compatibility and debugging; use them when your runtime requires every primitive as a top-level MCP tool.
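The { "action": "…" } convention maps naturally onto an MCP `tools/call` message. A minimal sketch, assuming the compound name goes in `params.name` and the action in `params.arguments`, as the MCP specification lays out for tool calls. The helper `tool_call` is hypothetical, and any argument beyond `action` is not documented on this page.

```python
import json


def tool_call(compound: str, action: str, request_id: int = 1, **arguments) -> str:
    """Build a JSON-RPC 2.0 tools/call message for one compound action.

    Extra keyword arguments are passed through untouched; their names are
    up to the tool's schema, which this sketch does not know.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": compound, "arguments": {"action": action, **arguments}},
    }
    return json.dumps(message)
```

For example, `tool_call("computer", "screenshot")` yields a message whose `params` are `{"name": "computer", "arguments": {"action": "screenshot"}}`.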
Compound | Purpose | Actions
computer | Mouse, keyboard, screenshots. The raw I/O surface. | screenshot · click · double_click · right_click · triple_click · hover · scroll · scroll_horizontal · drag · drag_path · type · key · wait
accessibility | Drive UI by element name, not by pixel. Survives DPI, resize, layout shifts. | read_tree · find · get_element · focused · invoke · focus · set_value · get_value · expand · collapse · toggle · select · state · list_children · wait_for
window | Launch, focus, resize. App-level state management. | list · active · focus · maximize · minimize · restore · close · resize · list_displays · screen_size · open_app · open_file · open_url · switch_tab · navigate
system | Clipboard, OCR, shortcuts, undo, webview detection + CDP relaunch, the active system prompt, and task delegation. The meta surface for an external brain. | clipboard_read · clipboard_write · system_time · ocr · undo · shortcuts_list · shortcuts_run · delegate · detect_webview · relaunch_with_cdp · system_prompt
browser | Chrome DevTools Protocol: real DOM access for Electron / WebView2 apps whose a11y tree is sparse. | connect · page_context · read_text · click · type · select_option · evaluate · wait_for · list_tabs · switch_tab · scroll
task | Hand off the whole task to clawdcursor's autonomous loop. Daemon mode only: requires clawdcursor agent with an LLM...