Control your Mac with voice and hand gestures

motgnay · 1 pts · 0 comments

Gstrl: Control Your Mac With Hand Gestures

Gesture, voice, and AI control for macOS.

Gestures.

Just your Mac's webcam. No special hardware.

cursor
👌 Pinch to move

Pinch with your right hand, then move the hand to drag the cursor anywhere on screen.

click
👌 Pinch to click

A quick left-hand pinch clicks. Hold the pinch for 1 second to right-click.
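The click versus right-click rule above can be sketched as a simple duration check. This is a minimal illustration, not Gstrl's actual code: the names `PinchAction`, `classifyPinch`, and `rightClickHold` are hypothetical, and only the 1-second threshold comes from the description.

```swift
// Hypothetical names for illustration; not taken from the Gstrl source.
enum PinchAction: Equatable {
    case click
    case rightClick
}

/// Classifies a completed left-hand pinch by how long it was held.
/// The 1-second default mirrors the threshold stated in the text.
func classifyPinch(heldFor seconds: Double, rightClickHold: Double = 1.0) -> PinchAction {
    return seconds >= rightClickHold ? .rightClick : .click
}

// Usage: a short tap-like pinch versus a held pinch.
let quick = classifyPinch(heldFor: 0.2)
let held = classifyPinch(heldFor: 1.4)
```

A real implementation would likely measure the hold time from frame timestamps and may fire the right-click while the pinch is still held; this sketch only classifies after release.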

screenshot
⭕ Circle to screenshot

Draw a circle while pinching to capture that region to your clipboard instantly.
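One plausible way to turn a drawn circle into a capture region is to take the bounding box of the traced points. The sketch below assumes exactly that; how Gstrl actually derives the region is not stated, and `TracePoint`, `CaptureRegion`, and `boundingRegion` are hypothetical names.

```swift
// Hypothetical types for illustration; coordinates are assumed to be screen points.
struct TracePoint {
    var x: Double
    var y: Double
}

struct CaptureRegion: Equatable {
    var x: Double
    var y: Double
    var width: Double
    var height: Double
}

/// Returns the axis-aligned bounding box of a traced path,
/// or nil when the path is empty.
func boundingRegion(of path: [TracePoint]) -> CaptureRegion? {
    guard let first = path.first else { return nil }
    var minX = first.x, maxX = first.x
    var minY = first.y, maxY = first.y
    for point in path.dropFirst() {
        minX = min(minX, point.x)
        maxX = max(maxX, point.x)
        minY = min(minY, point.y)
        maxY = max(maxY, point.y)
    }
    return CaptureRegion(x: minX, y: minY, width: maxX - minX, height: maxY - minY)
}
```

A bounding box is a simplification: a rough hand-drawn circle rarely closes cleanly, so the enclosing rectangle is a forgiving approximation of the intended area.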

↑ ↓ ← →
🖐 Swipe for arrows

Flick an open hand in any direction to send the up, down, left, or right arrow key.
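A velocity-based swipe can be sketched as follows: compute the hand's speed between two samples, ignore slow movement, and pick the arrow from the dominant axis. The `minSpeed` value, the function name, and the assumption that y increases upward (as in Vision's normalized coordinates) are illustrative choices, not details confirmed by the project.

```swift
// Hypothetical names and threshold; not taken from the Gstrl source.
enum ArrowKey: Equatable {
    case up, down, left, right
}

/// Maps a hand displacement over `dt` seconds to an arrow key.
/// `dx` and `dy` are in normalized image units, with y increasing upward.
/// Returns nil when the movement is slower than `minSpeed` units per second.
func arrowKey(dx: Double, dy: Double, dt: Double, minSpeed: Double = 1.5) -> ArrowKey? {
    guard dt > 0 else { return nil }
    let vx = dx / dt
    let vy = dy / dt
    let speed = (vx * vx + vy * vy).squareRoot()
    guard speed >= minSpeed else { return nil }
    // The axis with the larger velocity component decides the direction.
    if abs(vx) >= abs(vy) {
        return vx > 0 ? .right : .left
    }
    return vy > 0 ? .up : .down
}
```

Because a webcam image is usually mirrored relative to the user, a real implementation may need to flip the horizontal axis before this step.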

dictate
✊ Fist to speak

Hold a fist to activate speech-to-text. Say "press enter" or just dictate.

AI agent
✊✊ Both fists for AI

Hold both fists and ask Claude anything. The answer is spoken back to you.

scroll ↕
👌✊ Pinch + fist to scroll

Pinch with your left hand and make a fist with your right, then move your left hand up or down to scroll.

⌫ delete
🤙 Six to delete

Hold the shaka to delete characters. Hold it with both hands (🤙🤙) to delete lines.

drag / select
👌👌 Both pinch to drag

A left-hand pinch holds the click while a right-hand pinch moves the cursor, so you can drag and drop files or select text.

Voice commands.

Say commands to trigger actions instead of typing. Dictation supports multiple languages.

👆
Click

click · right click · command click

⌨️
Press + key

press enter · press delete · press tab · press escape · press up/down/left/right

Modifiers

command z · control c · shift left · option delete · command shift z
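Modifier phrases like the ones above follow a simple pattern: one or more modifier words followed by a key name. The sketch below parses that pattern under this assumption. `KeyCommand` and `parseModifierCommand` are hypothetical names, and the real recognizer may accept other phrasings.

```swift
// Hypothetical type for illustration; not taken from the Gstrl source.
struct KeyCommand: Equatable {
    var modifiers: [String]
    var key: String
}

/// Parses phrases such as "command shift z" into modifiers plus a final key.
/// Returns nil when the phrase does not start with a modifier word.
func parseModifierCommand(_ phrase: String) -> KeyCommand? {
    let modifierWords: Set<String> = ["command", "control", "shift", "option"]
    let words = phrase.lowercased().split(separator: " ").map { String($0) }
    guard words.count >= 2, let key = words.last else { return nil }
    let modifiers = Array(words.dropLast())
    // Every word before the last must be a modifier, and the last must not be one.
    guard modifiers.allSatisfy({ modifierWords.contains($0) }) else { return nil }
    guard !modifierWords.contains(key) else { return nil }
    return KeyCommand(modifiers: modifiers, key: key)
}

// Usage: "command shift z" yields two modifiers and the key "z".
let redo = parseModifierCommand("command shift z")
```

Phrases such as "press enter" return nil here because "press" is not a modifier; they would need their own rule.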

Get started.

Clone, build, run. No accounts needed.

Terminal

git clone https://github.com/TomYang-TZ/Gstrl.git
cd Gstrl
make install
make run

Requires macOS 14+, a webcam, and Swift 5.9+. Permission prompts appear automatically on first launch. The Claude Code CLI is optional and only needed for the AI agent.

How it works.

All processing happens on-device. No cloud, and no network latency.

Webcam captures frames

AVCaptureSession at 30 fps (configurable to 120 fps) feeds frames to Apple Vision.

Vision detects hand poses

VNDetectHumanHandPoseRequest identifies 21 joints per hand, every frame.

Classifier maps to actions

Pinch detection, velocity-based swipes, and combo tracking turn poses into CGEvents.
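Pinch detection is commonly done by measuring the distance between the thumb tip and index fingertip. The sketch below assumes that approach with normalized coordinates in the 0 to 1 range, as Vision reports them. The `Joint` type, the `isPinching` function, and the 0.05 threshold are illustrative assumptions rather than values from the project.

```swift
// Hypothetical type standing in for a detected joint location.
struct Joint {
    var x: Double
    var y: Double
}

/// Returns true when the thumb tip and index tip are closer than `threshold`.
/// Coordinates are assumed to be normalized to the 0...1 image range.
func isPinching(thumbTip: Joint, indexTip: Joint, threshold: Double = 0.05) -> Bool {
    let dx = thumbTip.x - indexTip.x
    let dy = thumbTip.y - indexTip.y
    let distance = (dx * dx + dy * dy).squareRoot()
    return distance < threshold
}

// Usage: fingertips nearly touching versus clearly apart.
let touching = isPinching(thumbTip: Joint(x: 0.50, y: 0.50), indexTip: Joint(x: 0.52, y: 0.51))
let apart = isPinching(thumbTip: Joint(x: 0.50, y: 0.50), indexTip: Joint(x: 0.70, y: 0.60))
```

A fixed threshold in image units changes meaning as the hand moves toward or away from the camera, so a production classifier might scale it by hand size or use separate enter and exit thresholds to avoid flicker.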

Speech + voice commands

Hold a fist to activate speech. Dictate text or say "press enter", "command z", "click" to trigger actions. Dictation supports multiple languages.
