Control your Mac with voice and hand gestures

Gstrl — Control Your Mac With Hand Gestures

Gesture, voice, and AI control for macOS.


Gestures.

Just your Mac's webcam. No special hardware.

cursor 👌 Pinch to move

Right hand pinch — move your hand to drag the cursor anywhere on screen.

click 👌 Pinch to click

A quick left-hand pinch clicks. Hold the pinch for 1 second to right-click.
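The click rule above can be read as a simple duration threshold. The following is a minimal sketch, assuming the start and end times of a pinch are already known. `PinchAction`, `classifyPinch`, and the `holdThreshold` parameter are hypothetical names for illustration and are not taken from the Gstrl source.

```swift
import Foundation

/// Action produced when a left-hand pinch is released (hypothetical type).
enum PinchAction: Equatable {
    case click
    case rightClick
}

/// Classifies a completed pinch by how long it was held.
/// `holdThreshold` defaults to the 1 second mentioned in the text.
func classifyPinch(start: TimeInterval, end: TimeInterval,
                   holdThreshold: TimeInterval = 1.0) -> PinchAction {
    let held = end - start
    return held >= holdThreshold ? .rightClick : .click
}
```

A real implementation might fire the right-click as soon as the threshold is crossed instead of waiting for release. This sketch decides on release only to keep the logic small.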

screenshot ⭕ Circle to screenshot

Draw a circle while pinching — captures that region to your clipboard instantly.

↑ ↓ ← → 🖐 Swipe for arrows

Flick an open hand up, down, left, or right to press the matching arrow key.
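One way to turn a flick into an arrow key is to compare hand velocity along each axis. The sketch below assumes the hand is already known to be open and that positions are normalized to 0...1 with y pointing up. `HandSample`, `detectSwipe`, and the `minSpeed` value are assumptions for illustration, not Gstrl's actual tuning.

```swift
import Foundation

enum ArrowKey: Equatable { case up, down, left, right }

/// A hand position sample in normalized image coordinates (0...1, y up),
/// with a timestamp in seconds.
struct HandSample {
    var x: Double
    var y: Double
    var time: TimeInterval
}

/// Returns the arrow key for a flick between two samples, or nil if the
/// hand moved too slowly. `minSpeed` (image widths per second) is an
/// assumed value.
func detectSwipe(from a: HandSample, to b: HandSample,
                 minSpeed: Double = 1.5) -> ArrowKey? {
    let dt = b.time - a.time
    guard dt > 0 else { return nil }
    let vx = (b.x - a.x) / dt
    let vy = (b.y - a.y) / dt
    guard max(abs(vx), abs(vy)) >= minSpeed else { return nil }
    // The dominant axis decides the direction.
    if abs(vx) > abs(vy) {
        return vx > 0 ? .right : .left
    } else {
        return vy > 0 ? .up : .down
    }
}
```

A webcam image is often mirrored relative to the user, so a real pipeline may need to flip the x axis before calling a function like this.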

dictate ✊ Fist to speak

Hold a fist to activate speech-to-text. Say "press enter" or just dictate.

AI agent ✊✊ Both fists for AI

Hold both fists and ask Claude anything. The answer is spoken back to you.

scroll ↕ 👌✊ Pinch + fist to scroll

Left pinch, right fist. Move left hand up/down to scroll.

⌫ delete 🤙 Six (shaka) to delete

Hold the shaka to delete characters. Hold it with both hands 🤙🤙 to delete lines.

drag / select 👌👌 Both pinch to drag

The left-hand pinch holds the click while the right-hand pinch moves the cursor, so you can drag and drop files or select text.

Voice commands.

Say commands to trigger actions instead of typing. Dictation supports multiple languages.

👆 Click

click · right click · command click

⌨️ Press + key

press enter · press delete · press tab · press escape · press up/down/left/right

Modifiers

command z · control c · shift left · option delete · command shift z
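The phrases listed above follow a small grammar: an optional run of modifier words, then either `click` or a key name, with `press` as a prefix for bare keys. The sketch below shows one way to parse that grammar. The `VoiceCommand` type and `parseVoiceCommand` function are hypothetical, and treating every unrecognized phrase as dictation is an assumption.

```swift
import Foundation

/// A parsed voice command (hypothetical type for illustration).
enum VoiceCommand: Equatable {
    case click(modifiers: [String])        // "click", "command click"
    case rightClick                        // "right click"
    case key(String, modifiers: [String])  // "press enter", "command shift z"
    case dictation(String)                 // anything else is typed as text
}

let modifierWords: Set<String> = ["command", "control", "shift", "option"]

func parseVoiceCommand(_ phrase: String) -> VoiceCommand {
    let words = phrase.lowercased().split(separator: " ").map(String.init)
    guard !words.isEmpty else { return .dictation(phrase) }

    if words == ["right", "click"] { return .rightClick }

    // "press <key>"
    if words.count == 2, words[0] == "press" {
        return .key(words[1], modifiers: [])
    }

    // Leading modifier words followed by exactly one target word.
    let modifiers = Array(words.prefix(while: { modifierWords.contains($0) }))
    let rest = Array(words.dropFirst(modifiers.count))
    if rest == ["click"] { return .click(modifiers: modifiers) }
    if !modifiers.isEmpty, rest.count == 1 {
        return .key(rest[0], modifiers: modifiers)
    }
    return .dictation(phrase)
}
```

For example, `parseVoiceCommand("command shift z")` yields `.key("z", modifiers: ["command", "shift"])`, while a sentence such as "hello world" falls through to `.dictation`.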

Get started.

Clone, build, run. No accounts needed.

Terminal

git clone https://github.com/TomYang-TZ/Gstrl.git
cd Gstrl
make install
make run

Requires macOS 14+, a webcam, and Swift 5.9+. Permissions auto-prompt on first launch. Claude Code CLI optional for AI agent.

How it works.

All processing on-device. No network round trips. Zero cloud.

Webcam captures frames

AVCaptureSession at 30fps (configurable to 120fps) feeds frames to Apple Vision.
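A minimal sketch of the capture step is shown here, assuming the default video device and a delegate-based frame callback. `FrameSource`, `onFrame`, and `SetupError` are hypothetical names; Gstrl's own capture code may be organized differently.

```swift
import AVFoundation

/// Minimal webcam frame source (hypothetical class, not Gstrl's own).
final class FrameSource: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    let session = AVCaptureSession()
    private let queue = DispatchQueue(label: "frame-source")
    /// Called on a background queue for every captured frame.
    var onFrame: ((CVPixelBuffer) -> Void)?

    enum SetupError: Error { case noCamera, cannotConfigure }

    func start() throws {
        guard let camera = AVCaptureDevice.default(for: .video) else {
            throw SetupError.noCamera
        }
        let input = try AVCaptureDeviceInput(device: camera)
        let output = AVCaptureVideoDataOutput()
        // Drop late frames instead of queueing them, to keep latency low.
        output.alwaysDiscardsLateVideoFrames = true
        output.setSampleBufferDelegate(self, queue: queue)

        session.beginConfiguration()
        guard session.canAddInput(input), session.canAddOutput(output) else {
            session.commitConfiguration()
            throw SetupError.cannotConfigure
        }
        session.addInput(input)
        session.addOutput(output)
        session.commitConfiguration()

        // startRunning() blocks, so production code usually calls it
        // off the main thread.
        session.startRunning()
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let buffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        onFrame?(buffer)
    }
}
```

The frame rate is not set in this sketch. On a real device it would typically be adjusted through `activeVideoMinFrameDuration` after `lockForConfiguration()`, and only to a rate the active format reports as supported.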

Vision detects hand poses

VNDetectHumanHandPoseRequest identifies 21 joints per hand, every frame.
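The detection step might look like the following sketch, which runs one request per frame and keeps only confidently detected joints. The function name `detectHandJoints` and the `minConfidence` cutoff are assumptions; the Vision calls themselves are the standard ones.

```swift
import Vision

/// Runs hand pose detection on one frame and returns, per hand, the joints
/// whose confidence is at least `minConfidence` (an assumed cutoff).
func detectHandJoints(in pixelBuffer: CVPixelBuffer, minConfidence: Float = 0.3)
    throws -> [[VNHumanHandPoseObservation.JointName: CGPoint]] {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 2  // the gestures described here use at most two hands

    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    try handler.perform([request])

    return try (request.results ?? []).map { hand in
        // `.all` yields up to 21 joints. Locations are normalized to 0...1
        // with the origin at the bottom-left of the image.
        try hand.recognizedPoints(.all)
            .filter { $0.value.confidence >= minConfidence }
            .mapValues { $0.location }
    }
}
```

Each returned dictionary can then be queried by joint, for example `joints[.thumbTip]` and `joints[.indexTip]` for pinch detection.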

Classifier maps to actions

Pinch detection, velocity-based swipes, and combo tracking turn poses into CGEvents.
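To illustrate how a pose could become a CGEvent, the sketch below pairs a pinch detector with a key-press helper. The detector uses two thresholds (hysteresis) so the state does not flicker near the boundary; whether Gstrl does this is an assumption, as are the threshold values and the names `PinchDetector` and `pressKey`.

```swift
import Foundation
import CoreGraphics

/// Tracks pinch state from the thumb-tip to index-tip distance.
/// Threshold values are assumptions, in normalized image units.
struct PinchDetector {
    var startThreshold: CGFloat = 0.04
    var releaseThreshold: CGFloat = 0.07
    private(set) var isPinching = false

    /// Feeds one frame and returns true if the pinch state changed.
    mutating func update(thumbTip: CGPoint, indexTip: CGPoint) -> Bool {
        let dx = thumbTip.x - indexTip.x
        let dy = thumbTip.y - indexTip.y
        let distance = (dx * dx + dy * dy).squareRoot()
        let wasPinching = isPinching
        if isPinching {
            if distance > releaseThreshold { isPinching = false }
        } else if distance < startThreshold {
            isPinching = true
        }
        return isPinching != wasPinching
    }
}

/// Posts a key press (down, then up) for a macOS virtual key code.
/// Posting events requires the Accessibility permission. Returns false
/// if the events could not be created.
@discardableResult
func pressKey(_ virtualKey: CGKeyCode) -> Bool {
    guard let down = CGEvent(keyboardEventSource: nil, virtualKey: virtualKey, keyDown: true),
          let up = CGEvent(keyboardEventSource: nil, virtualKey: virtualKey, keyDown: false)
    else { return false }
    down.post(tap: .cghidEventTap)
    up.post(tap: .cghidEventTap)
    return true
}

// Virtual key codes for the arrow keys (values from Carbon's Events.h).
let leftArrow: CGKeyCode = 0x7B
let rightArrow: CGKeyCode = 0x7C
let downArrow: CGKeyCode = 0x7D
let upArrow: CGKeyCode = 0x7E
```

A swipe classified as "up" would then call `pressKey(upArrow)`, and a pinch state change would drive mouse events in the same way through the corresponding mouse-event `CGEvent` initializer.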

Speech + voice commands

Hold a fist to activate speech. Dictate text or say "press enter", "command z", "click" to trigger actions. Dictation supports multiple languages.
