Claude-real-video - any LLM can watch a video


GitHub - HUANGCHIHHUNGLeo/claude-real-video: Let Claude (or any LLM) actually watch a video — scene-aware, deduplicated frames + transcript, from a URL or local file. Runs locally, MIT. · GitHub



claude-real-video

Let Claude — or any LLM — actually watch a video.

Most AI tools don't really see a video. Paste a YouTube link into ChatGPT and it reads the transcript, not the picture. Claude won't take a video file at all. Even Gemini, which can read video natively, has to send it up to Google and samples frames at a fixed interval (1 fps by default), so fast cuts slip past.

claude-real-video does it differently, and locally: point it at a URL or a file, and it pulls the frames that actually matter (every scene change, not a fixed quota), throws away the near-duplicates, transcribes the audio, and hands you a clean folder any LLM can read, all on your own machine with nothing uploaded.

crv "https://www.youtube.com/watch?v=..."
# → crv-out/frames/*.jpg + crv-out/transcript.txt + crv-out/MANIFEST.txt

Then drop the frames + MANIFEST.txt into Claude / ChatGPT / Gemini and ask away.
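If you would rather send the folder through an API than drag files into a chat window, the step above can be sketched as follows. This is a minimal illustration, not part of claude-real-video: `build_content_blocks` is a hypothetical helper, MANIFEST.txt is read as opaque text because its layout isn't described here, frame filenames are assumed to sort chronologically, and the block shape assumes the base64 image format used by Anthropic's Messages API (other providers expect different shapes).

```python
import base64
from pathlib import Path


def build_content_blocks(out_dir: str, question: str) -> list[dict]:
    """Turn a crv output folder into a list of message content blocks.

    Hypothetical helper: not shipped with claude-real-video.
    """
    out = Path(out_dir)
    blocks: list[dict] = []

    # MANIFEST.txt is passed through as plain text; its format is not assumed.
    manifest = out / "MANIFEST.txt"
    if manifest.exists():
        blocks.append({"type": "text", "text": manifest.read_text(encoding="utf-8")})

    # Assumption: sorting frame filenames yields chronological order.
    for frame in sorted((out / "frames").glob("*.jpg")):
        data = base64.standard_b64encode(frame.read_bytes()).decode("ascii")
        blocks.append(
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/jpeg", "data": data},
            }
        )

    # The question goes last so the model reads it after seeing the frames.
    blocks.append({"type": "text", "text": question})
    return blocks
```

The returned list can be used as the `content` of a single user message. Keep an eye on the frame count, since each image adds to the request size.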

Why not just sample frames?

Most "let an LLM watch a video" scripts (and Gemini's own pipeline) grab frames at a fixed interval, e.g. one per second. That over-samples a static screencast and under-samples a fast-cut reel. claude-real-video is smarter:

| | fixed-interval sampling | claude-real-video |
|---|---|---|
| Frame selection | every N seconds | scene-change detection + density floor |
| Repeated shots (A-B-A cuts) | sent again every time | sliding-window dedup sends each shot once |
| Static slide (10 min) | ~600 near-identical frames | collapses to 1 (dedup) |
| Fast-cut reel | misses frames between samples | catches each visual change |
| Audio | often ignored | Whisper transcript w/ language detect |
| Where the video goes | often uploaded to a cloud | stays on your machine |
| Input | usually local file only | URL (yt-dlp) or local file |

You feed the model fewer, more meaningful frames: cheaper context, better understanding.
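To make the comparison concrete, here is a small sketch of how scene-change timestamps, a density floor, and a frame cap could be combined. It illustrates the idea only and is not the tool's actual code: `select_timestamps` and its parameters are hypothetical, and the order of operations (fill gaps first, then thin evenly down to the cap) is an assumption.

```python
def select_timestamps(scene_changes, duration, floor_interval=1.0, max_frames=150):
    """Pick frame timestamps from scene changes, padded by a density floor.

    Hypothetical sketch: every scene change is kept, any gap longer than
    `floor_interval` seconds is filled with evenly spaced samples, and the
    result is thinned evenly if it exceeds `max_frames`.
    """
    # Always include the first frame; ignore timestamps outside the video.
    anchors = sorted({0.0, *(t for t in scene_changes if 0.0 <= t < duration)})

    picked = []
    for start, end in zip(anchors, anchors[1:] + [duration]):
        picked.append(start)
        # Fill a long gap; multiply instead of accumulating to avoid float drift.
        k = 1
        while start + k * floor_interval < end:
            picked.append(start + k * floor_interval)
            k += 1

    if len(picked) > max_frames:
        step = len(picked) / max_frames
        picked = [picked[int(i * step)] for i in range(max_frames)]
    return picked


# A 10-minute static slide: the floor alone yields 600 candidates (600 s x 1 per second).
static = select_timestamps([], duration=600.0, max_frames=1000)

# A one-second burst of fast cuts: one frame per cut, where 1 fps sampling would see one frame.
reel = select_timestamps([0.2, 0.4, 0.6, 0.8], duration=1.0)
```

In the static-slide case the 600 candidates are near-identical, which is why the deduplication step in the table matters: the selection stage alone doesn't shrink them.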

Install

pip install claude-real-video              # core (frames + dedup)
pip install "claude-real-video[whisper]"   # + audio transcription

System requirement: ffmpeg

ffmpeg / ffprobe are used for frame extraction and audio, and aren't pip-installable. Install them once:

| OS | command |
|---|---|
| macOS | brew install ffmpeg |
| Linux | sudo apt install ffmpeg (or your distro's package manager) |
| Windows | winget install Gyan.FFmpeg, or choco install ffmpeg, or download a build and add its bin\ folder to your PATH |

Verify it's on your PATH:

ffmpeg -version
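The same check can be done from Python, for example in a setup script that should fail early with a readable message. This is a minimal sketch using only the standard library; `missing_tools` is a hypothetical helper, not something claude-real-video provides.

```python
import shutil


def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names of required executables that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        raise SystemExit(f"Not found on PATH: {', '.join(missing)}")
    print("ffmpeg and ffprobe are available.")
```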

Transcription uses the whisper CLI (installed by the [whisper] extra, or pip install openai-whisper). Whisper also relies on ffmpeg.

Works on macOS, Windows, and Linux — Python 3.10+.

Usage

# A YouTube / Instagram / TikTok / ... link
crv "https://www.instagram.com/reel/XXXX/"

# A local file, English transcript, output to ./out
crv lecture.mp4 -o out --lang en

# Frames only, no transcription
crv clip.mp4 --no-transcribe

# A login-gated video (your own / authorised use): pass a Netscape cookie file
crv "https://..." --cookies cookies.txt

python -m claude_real_video ... works as an alias for crv too.
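To process several videos from a script, one option is to shell out to the CLI. The sketch below only assembles the flags shown in the examples above; `crv_command` and `run_crv` are hypothetical wrappers, and it assumes `crv` is on your PATH. No Python API of the package is assumed.

```python
import subprocess


def crv_command(source, out=None, lang=None, transcribe=True, cookies=None):
    """Build the argument list for one crv invocation (hypothetical wrapper)."""
    cmd = ["crv", source]
    if out:
        cmd += ["-o", out]
    if lang:
        cmd += ["--lang", lang]
    if not transcribe:
        cmd.append("--no-transcribe")
    if cookies:
        cmd += ["--cookies", cookies]
    return cmd


def run_crv(source, **options):
    """Run crv and raise CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(crv_command(source, **options), check=True)
```

For example, `run_crv("lecture.mp4", out="out", lang="en")` runs the same command as the second example above. Passing the arguments as a list avoids shell-quoting problems with URLs.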

Options

| flag | default | meaning |
|---|---|---|
| -o, --out | crv-out | output directory |
| --scene | 0.30 | scene-change sensitivity (lower = more frames) |
| --fps-floor | 1.0 | at least one frame every N seconds |
| --max-frames | 150 | hard cap on total frames |
| --lang | auto | Whisper language (en, zh, auto, ...) |
| --dedup-threshold | | % of pixels that must change for a frame to count as new; higher = fewer frames |
| --dedup-window | | compare against the last N kept frames, so a shot the model already saw doesn't come back after a cutaway (1 =... |
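The two dedup options describe a sliding-window comparison, which can be sketched in a few lines. This is an illustration under stated assumptions, not the tool's implementation: frames are modelled as flat lists of grayscale values, `pixel_tolerance` is a made-up parameter, and the `threshold=5.0` and `window=3` defaults are placeholders chosen for the example (the real defaults are not given above).

```python
from collections import deque


def changed_percent(a, b, pixel_tolerance=10):
    """Percentage of pixels whose value differs by more than the tolerance."""
    changed = sum(1 for x, y in zip(a, b) if abs(x - y) > pixel_tolerance)
    return 100.0 * changed / len(a)


def dedup(frames, threshold=5.0, window=3):
    """Return indices of frames that differ from every one of the last
    `window` kept frames by at least `threshold` percent of pixels.

    Hypothetical sketch; defaults are illustrative, not the tool's.
    """
    recent = deque(maxlen=window)  # only the most recently kept frames
    kept = []
    for index, frame in enumerate(frames):
        if all(changed_percent(frame, old) >= threshold for old in recent):
            kept.append(index)
            recent.append(frame)
    return kept


# Two distinct "shots" as tiny 100-pixel frames, cut A-A-B-A-B.
shot_a, shot_b = [0] * 100, [255] * 100
frames = [shot_a, shot_a, shot_b, shot_a, shot_b]

wide = dedup(frames, window=3)    # remembers shot A across the cutaway
narrow = dedup(frames, window=1)  # only compares with the previous kept frame
```

In this sketch a window of 3 keeps each shot once, while a window of 1 keeps shot A again after every cutaway to B, which is the A-B-A behaviour the comparison table refers to.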
