GitHub - HUANGCHIHHUNGLeo/claude-real-video: Let Claude (or any LLM) actually watch a video — scene-aware, deduplicated frames + transcript, from a URL or local file. Runs locally, MIT. · GitHub
/" data-turbo-transient="true" />
Skip to content
Search or jump to...
Search code, repositories, users, issues, pull requests...
-->
Search
Clear
Search syntax tips
Provide feedback
--><br>We read every piece of feedback, and take your input very seriously.
Include my email address so I can be contacted
Cancel
Submit feedback
Saved searches
Use saved searches to filter your results more quickly
-->
Name
Query
To see all available qualifiers, see our documentation.
Cancel
Create saved search
Sign in
/;ref_cta:Sign up;ref_loc:header logged out"}"<br>Sign up
Appearance settings
Resetting focus
You signed in with another tab or window. Reload to refresh your session.<br>You signed out in another tab or window. Reload to refresh your session.<br>You switched accounts on another tab or window. Reload to refresh your session.
Dismiss alert
{{ message }}
HUANGCHIHHUNGLeo
claude-real-video
Public
Notifications<br>You must be signed in to change notification settings
Fork
Star
master
BranchesTags
Go to file
CodeOpen more actions menu
Folders and files<br>NameNameLast commit message<br>Last commit date<br>Latest commit
History<br>6 Commits<br>6 Commits
src/claude_real_video
src/claude_real_video
.gitignore
.gitignore
LICENSE
LICENSE
README.md
README.md
pyproject.toml
pyproject.toml
View all files
Repository files navigation
claude-real-video
Let Claude — or any LLM — actually watch a video.
Most AI tools don't really see a video. Paste a YouTube link into ChatGPT and it<br>reads the transcript , not the picture. Claude won't take a video file at all.<br>Even Gemini, which can read video natively, has to send it up to Google and<br>samples frames at a fixed interval (1 fps by default), so fast cuts slip past.
claude-real-video does it differently, and locally : point it at a URL or a<br>file, and it pulls the frames that actually matter (every scene change, not a<br>fixed quota), throws away the near-duplicates, transcribes the audio, and hands<br>you a clean folder any LLM can read — on your own machine, nothing uploaded.
crv "https://www.youtube.com/watch?v=..."<br># → crv-out/frames/*.jpg + crv-out/transcript.txt + crv-out/MANIFEST.txt
Then drop the frames + MANIFEST.txt into Claude / ChatGPT / Gemini and ask away.
Why not just sample frames?
Most "let an LLM watch a video" scripts (and Gemini's own pipeline) grab frames<br>at a fixed interval — e.g. one per second. That over-samples a static<br>screencast and under-samples a fast-cut reel. claude-real-video is smarter:
fixed-interval sampling<br>claude-real-video
Frame selection<br>every N seconds<br>scene-change detection + density floor
Repeated shots (A-B-A cuts)<br>sent again every time<br>sliding-window dedup sends each shot once
Static slide (10 min)<br>~600 near-identical frames<br>collapses to 1 (dedup)
Fast-cut reel<br>misses frames between samples<br>catches each visual change
Audio<br>often ignored<br>Whisper transcript w/ language detect
Where the video goes<br>often uploaded to a cloud<br>stays on your machine
Input<br>usually local file only<br>URL (yt-dlp) or local file
You feed the model fewer, more meaningful frames — cheaper context, better<br>understanding.
Install
pip install claude-real-video # core (frames + dedup)<br>pip install "claude-real-video[whisper]" # + audio transcription
System requirement: ffmpeg
ffmpeg / ffprobe are used for frame extraction and audio, and aren't<br>pip-installable. Install them once:
OS<br>command
macOS<br>brew install ffmpeg
Linux<br>sudo apt install ffmpeg (or your distro's package manager)
Windows<br>winget install Gyan.FFmpeg — or choco install ffmpeg — or download a build and add its bin\ folder to your PATH
Verify it's on your PATH:
ffmpeg -version
Transcription uses the whisper CLI (installed by the [whisper] extra, or<br>pip install openai-whisper). Whisper also relies on ffmpeg.
Works on macOS, Windows, and Linux — Python 3.10+.
Usage
# A YouTube / Instagram / TikTok / ... link<br>crv "https://www.instagram.com/reel/XXXX/"
# A local file, English transcript, output to ./out<br>crv lecture.mp4 -o out --lang en
# Frames only, no transcription<br>crv clip.mp4 --no-transcribe
# A login-gated video (your own / authorised use): pass a Netscape cookie file<br>crv "https://..." --cookies cookies.txt
python -m claude_real_video ... works as an alias for crv too.
Options
flag<br>default<br>meaning
-o, --out<br>crv-out<br>output directory
--scene<br>0.30<br>scene-change sensitivity (lower = more frames)
--fps-floor<br>1.0<br>at least one frame every N seconds
--max-frames<br>150<br>hard cap on total frames
--lang<br>auto<br>Whisper language (en, zh, auto, ...)
--dedup-threshold<br>% of pixels that must change for a frame to count as new; higher = fewer frames
--dedup-window<br>compare against the last N kept frames — a shot the model already saw doesn't come back after a cutaway (1 =...