Gstrl: Control Your Mac With Hand Gestures
Gesture, voice, and AI control for macOS.
Gestures.
Just your Mac's webcam. No special hardware.
Cursor: Pinch to move
Pinch with your right hand, then move the hand to drag the cursor anywhere on screen.
Click: Pinch to click
A quick left-hand pinch clicks. Hold the pinch for 1 second to right-click.
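One way to tell a click from a right-click is to time how long the left-hand pinch is held. The sketch below is an assumption about how such a check could look, not Gstrl's actual code. `PinchAction` and `classifyPinch` are hypothetical names, and the 1-second default comes from the description above.

```swift
import Foundation

/// Hypothetical result of a completed left-hand pinch.
enum PinchAction: Equatable {
    case click
    case rightClick
}

/// Classifies a pinch by how long it was held.
/// `holdThreshold` defaults to the 1 second mentioned in the text.
func classifyPinch(start: TimeInterval, end: TimeInterval,
                   holdThreshold: TimeInterval = 1.0) -> PinchAction {
    let held = end - start
    return held >= holdThreshold ? .rightClick : .click
}

print(classifyPinch(start: 0.0, end: 0.2))  // click
print(classifyPinch(start: 0.0, end: 1.3))  // rightClick
```

A real implementation would probably fire the right-click as soon as the threshold passes instead of waiting for release. That detail is not stated in the source.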
Screenshot: Circle to screenshot
Draw a circle while pinching to capture that region to your clipboard.
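The region to capture can be derived from the path the cursor traces during the pinch. The following is a minimal sketch under that assumption. `Point`, `Region`, and `captureRegion` are hypothetical names. The sketch only computes the enclosing box and leaves out deciding whether the path really is a circle.

```swift
import Foundation

/// A normalized screen point (0...1 on each axis); hypothetical type for this sketch.
struct Point { var x: Double; var y: Double }

/// Axis-aligned rectangle enclosing a drawn path.
struct Region: Equatable { var x, y, width, height: Double }

/// Returns the bounding box of the path traced while pinching,
/// or nil if there are too few samples to form a region.
func captureRegion(for path: [Point], minSamples: Int = 3) -> Region? {
    guard path.count >= minSamples,
          let minX = path.map(\.x).min(), let maxX = path.map(\.x).max(),
          let minY = path.map(\.y).min(), let maxY = path.map(\.y).max()
    else { return nil }
    return Region(x: minX, y: minY, width: maxX - minX, height: maxY - minY)
}

// Four samples around a rough circle centered at (0.5, 0.5).
let path = [Point(x: 0.5, y: 0.25), Point(x: 0.75, y: 0.5),
            Point(x: 0.5, y: 0.75), Point(x: 0.25, y: 0.5)]
let region = captureRegion(for: path)
```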
Arrow keys: Swipe for arrows
Flick an open hand up, down, left, or right to press the matching arrow key.
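A flick can be separated from ordinary hand movement by its speed. The sketch below shows one possible velocity check between two palm samples. `ArrowKey`, `HandSample`, `detectSwipe`, and the `minSpeed` threshold are all assumptions for illustration, not values taken from Gstrl.

```swift
import Foundation

enum ArrowKey: Equatable { case up, down, left, right }

/// A timestamped palm position in normalized image coordinates
/// (origin at the bottom-left, as Vision reports them).
struct HandSample { var x: Double; var y: Double; var time: TimeInterval }

/// Returns an arrow key if the hand moved faster than `minSpeed`
/// (normalized units per second, an assumed threshold) between two samples.
func detectSwipe(from a: HandSample, to b: HandSample,
                 minSpeed: Double = 1.5) -> ArrowKey? {
    let dt = b.time - a.time
    guard dt > 0 else { return nil }
    let vx = (b.x - a.x) / dt
    let vy = (b.y - a.y) / dt
    // Ignore slow movement on both axes.
    guard max(abs(vx), abs(vy)) >= minSpeed else { return nil }
    // The dominant axis decides the direction.
    if abs(vx) > abs(vy) { return vx > 0 ? .right : .left }
    return vy > 0 ? .up : .down
}
```

If the webcam image is mirrored, the sign of `vx` would need to be flipped before mapping to left and right.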
Dictate: Fist to speak
Hold a fist to activate speech-to-text, then dictate text or say a command such as "press enter".
AI agent: Both fists for AI
Hold both fists and ask Claude anything. The answer is spoken back to you.
Scroll: Pinch + fist to scroll
Pinch with your left hand and make a fist with your right, then move your left hand up or down to scroll.
Delete: Six to delete
Hold the shaka sign (🤙) to delete characters. Hold it with both hands to delete whole lines.
Drag / select: Both pinch to drag
A left-hand pinch holds the click while a right-hand pinch moves the cursor, so you can drag and drop files or select text.
Voice commands.
Say commands to trigger actions instead of typing. Dictation supports multiple languages.
Click
click · right click · command click
Press + key
press enter · press delete · press tab · press escape · press up/down/left/right
Modifiers
command z · control c · shift left · option delete · command shift z
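Modifier phrases like these follow a simple shape: one or more modifier words followed by a single key. The sketch below parses that shape. It is an illustration only; `KeyCommand` and `parseShortcut` are hypothetical names, and the set of modifier words is taken from the examples above.

```swift
import Foundation

/// Hypothetical parsed form of a spoken shortcut such as "command shift z".
struct KeyCommand: Equatable {
    var modifiers: Set<String>
    var key: String
}

/// Splits a phrase into leading modifier words and one trailing key.
/// Returns nil when the phrase has no modifier or not exactly one key word.
func parseShortcut(_ phrase: String) -> KeyCommand? {
    let modifierWords: Set<String> = ["command", "control", "shift", "option"]
    let words = phrase.lowercased().split(separator: " ").map(String.init)
    let modifiers = words.prefix(while: { modifierWords.contains($0) })
    let rest = words.dropFirst(modifiers.count)
    guard !modifiers.isEmpty, rest.count == 1, let key = rest.first else {
        return nil
    }
    return KeyCommand(modifiers: Set(modifiers), key: key)
}
```

With this sketch, `parseShortcut("command shift z")` yields the modifiers `command` and `shift` with key `z`, while `parseShortcut("press enter")` returns nil because "press" is not a modifier and would be handled by a separate rule.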
Get started.
Clone, build, run. No accounts needed.
Terminal
git clone https://github.com/TomYang-TZ/Gstrl.git
cd Gstrl
make install
make run
Requires macOS 14+, a webcam, and Swift 5.9+. Permission prompts appear automatically on first launch. The Claude Code CLI is optional and only needed for the AI agent.
How it works.
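Hand tracking and gesture classification run on-device, so there is no network round trip. The optional AI agent calls Claude through the Claude Code CLI.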
Webcam captures frames
AVCaptureSession at 30fps (configurable to 120fps) feeds frames to Apple Vision.
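As a rough sketch of this step, the function below builds a capture session with AVFoundation and requests a target frame rate. The function name `makeCaptureSession`, the queue label, and the choice to skip unsupported rates silently are assumptions made for illustration. How Gstrl actually configures its session is not shown in the source.

```swift
import AVFoundation

/// Builds a capture session that delivers webcam frames to `delegate`.
/// `fps` is a requested rate; it is applied only if the active format supports it.
func makeCaptureSession(fps: Int32 = 30,
                        delegate: AVCaptureVideoDataOutputSampleBufferDelegate)
    throws -> AVCaptureSession? {
    guard let camera = AVCaptureDevice.default(for: .video) else { return nil }

    let session = AVCaptureSession()
    let input = try AVCaptureDeviceInput(device: camera)
    guard session.canAddInput(input) else { return nil }
    session.addInput(input)

    let output = AVCaptureVideoDataOutput()
    output.alwaysDiscardsLateVideoFrames = true  // drop late frames instead of queueing them
    output.setSampleBufferDelegate(delegate, queue: DispatchQueue(label: "frames"))
    guard session.canAddOutput(output) else { return nil }
    session.addOutput(output)

    // Setting an unsupported frame duration raises an exception, so check first.
    let supported = camera.activeFormat.videoSupportedFrameRateRanges.contains {
        $0.minFrameRate <= Double(fps) && Double(fps) <= $0.maxFrameRate
    }
    if supported {
        try camera.lockForConfiguration()
        camera.activeVideoMinFrameDuration = CMTime(value: 1, timescale: fps)
        camera.activeVideoMaxFrameDuration = CMTime(value: 1, timescale: fps)
        camera.unlockForConfiguration()
    }
    return session  // the caller starts it with session.startRunning()
}
```

Higher rates such as 120 fps depend on the camera offering a format that supports them, so a real implementation may need to pick a different `activeFormat` first.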
Vision detects hand poses
VNDetectHumanHandPoseRequest identifies 21 joints per hand, every frame.
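The sketch below shows how a single frame could be passed to Vision and how two of the 21 joints can be read back. The function name `thumbIndexDistances` and the 0.3 confidence cutoff are assumptions, and this is not Gstrl's actual code.

```swift
import Vision
import CoreVideo

/// Runs hand-pose detection on one frame and returns, for each detected hand,
/// the normalized distance between the thumb tip and the index fingertip.
func thumbIndexDistances(in pixelBuffer: CVPixelBuffer) throws -> [Double] {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 2  // track both hands

    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                        orientation: .up,
                                        options: [:])
    try handler.perform([request])

    var distances: [Double] = []
    for hand in request.results ?? [] {
        let thumb = try hand.recognizedPoint(.thumbTip)
        let index = try hand.recognizedPoint(.indexTip)
        // Skip joints Vision is unsure about (0.3 is an assumed cutoff).
        guard thumb.confidence > 0.3, index.confidence > 0.3 else { continue }
        let dx = thumb.location.x - index.location.x
        let dy = thumb.location.y - index.location.y
        distances.append(Double((dx * dx + dy * dy).squareRoot()))
    }
    return distances
}
```

A pinch could then be defined as a distance below some small threshold, which would have to be tuned by hand.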
Classifier maps to actions
Pinch detection, velocity-based swipes, and combo tracking turn poses into CGEvents.
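Once a gesture is classified, it has to become a system event. A minimal sketch of posting an arrow key press with Core Graphics follows. `ArrowKeyCode` and `pressArrow` are hypothetical names, while the key codes are the standard macOS virtual key codes for the arrow keys.

```swift
import CoreGraphics

/// macOS virtual key codes for the arrow keys.
enum ArrowKeyCode: CGKeyCode {
    case left = 123, right = 124, down = 125, up = 126
}

/// Posts a key-down followed by a key-up for the given arrow key.
/// Posting requires the Accessibility permission.
/// Returns false if the events could not be created.
@discardableResult
func pressArrow(_ key: ArrowKeyCode) -> Bool {
    let source = CGEventSource(stateID: .hidSystemState)
    guard let down = CGEvent(keyboardEventSource: source,
                             virtualKey: key.rawValue, keyDown: true),
          let up = CGEvent(keyboardEventSource: source,
                           virtualKey: key.rawValue, keyDown: false)
    else { return false }
    down.post(tap: .cghidEventTap)
    up.post(tap: .cghidEventTap)
    return true
}
```

Calling `pressArrow(.left)` after a detected left swipe would send the same event a physical left arrow key press does.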
Speech + voice commands
Hold a fist to activate speech. Dictate text, or say a command such as "press enter", "command z", or "click" to trigger an action.