I indexed 936 Lex Fridman episodes into a RAG that cites its sources



🎙️ OmniPod

Chat with 936 podcast episodes. Every answer cites its source.

Ask "What did Karpathy say about neural networks?" — get an answer with the exact transcript chunk it came from. No hallucinations. No guessing.

Why OmniPod?

Most RAG chatbots hallucinate. You ask about a podcast, they invent quotes.

OmniPod doesn't. Every response is grounded — verified against the actual transcript before it reaches you. If the source doesn't support the answer, it says so.

Three query types, one pipeline:

| Type | Example | Strategy |
| --- | --- | --- |
| Factual | "What did Huberman say about sleep?" | Retrieve → Generate → Verify |
| Synthetic | "Compare AI safety views across guests" | Map-Reduce → Deduplicate → Synthesize |
| Generative | "Write an essay on consciousness from these episodes" | Plan → Draft → Ground |
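The routing step in the table can be sketched as a small keyword classifier. This is only an illustration: the cue lists and the `Intent` enum are hypothetical, and the repository's actual `classify_intent()` may work differently (for example, by asking the LLM).

```python
from enum import Enum


class Intent(Enum):
    FACTUAL = "factual"
    SYNTHETIC = "synthetic"
    GENERATIVE = "generative"


# Hypothetical cue phrases, chosen only to match the three example queries.
GENERATIVE_CUES = ("write ", "essay", "draft ")
SYNTHETIC_CUES = ("compare", "contrast", "across")


def classify_intent(query: str) -> Intent:
    """Pick a handler for the query; anything unmatched is treated as factual."""
    q = query.lower()
    if any(cue in q for cue in GENERATIVE_CUES):
        return Intent.GENERATIVE
    if any(cue in q for cue in SYNTHETIC_CUES):
        return Intent.SYNTHETIC
    return Intent.FACTUAL
```

Checking generative cues first means a request such as "Write an essay comparing..." is drafted rather than map-reduced; whether the real router resolves that overlap the same way is an assumption.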

How it works

    You ask a question
           │
           ▼
    ┌─────────────┐
    │   Router    │  classify_intent(): routes to the right handler
    │  LRU cache  │  avoids re-embedding repeated queries
    │  Semaphore  │  caps concurrent LLM calls at 5
    └──────┬──────┘
           ▼
    ┌─────────────┐
    │  Retrieval  │  bge-small-en-v1.5 (384d) → Qdrant cosine
    │   19,140    │  chunks from 936 Lex Fridman episodes
    │   chunks    │  Guest filtering via known-guests index
    └──────┬──────┘
           ▼
    ┌─────────────┐
    │  Generate + │  DeepSeek V4 Flash via OpenCode API
    │   Verify    │  verify_groundedness(): rejects ungrounded answers
    └──────┬──────┘
           ▼
    Cited answer in Chainlit UI (localhost:8000)

60-second setup

    git clone https://github.com/aranajhonny/omnipod && cd omnipod
    python3.13 -m venv .venv && source .venv/bin/activate
    pip install -r requirements.txt
    echo "OPENCODE_API_KEY=sk-your-key" > .env
    docker run -d --name qdrant -p 6333:6333 qdrant/qdrant
    python ingest.py --rebuild
    chainlit run app.py  # → http://localhost:8000

Numbers that matter

| Metric | Value |
| --- | --- |
| Episodes indexed | 936 (Lex Fridman) |
| Chunks | 19,140 (512 chars, 128 overlap) |
| Embedding dim | 384 (bge-small-en-v1.5, MPS GPU) |
| Query embedding | ~100ms |
| Vector search | ~50ms (cosine, 19K points) |
| Full answer | ~2s on M1 Pro |
| Full ingest | ~8 min |
| Codebase | 1,138 lines Python, 9 files |
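The chunking parameters in the table (512 characters with 128 overlap) correspond to a sliding window with a stride of 384 characters. A minimal sketch, assuming plain character-based splitting; `chunk_text` is a hypothetical name and `ingest.py` may split differently, for example on sentence boundaries.

```python
def chunk_text(text: str, size: int = 512, overlap: int = 128) -> list:
    """Split text into fixed-size character windows that share `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # 384 with the defaults
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.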

Transcript scraper included

No YouTube API key needed. Two sources:

- lexfridman.com: scrapes official transcript pages (requests + BeautifulSoup)
- YouTube: uses the free proxy at youtubetranscript.pro for auto-captions

    cd lex_podcast
    pip install requests beautifulsoup4
    python run.py pipeline  # scrapes all 936 episodes

Output lands in data/transcripts/.

Example queries

    "What did Andrej Karpathy say about neural networks?"
    "Compare views on AI safety across all guests"
    "Write a short essay on human consciousness based on these episodes"
    "Summarize what Andrew Huberman says about sleep"

Architecture decisions

Why bge-small-en-v1.5? 384-dim embeddings are fast to search and good enough for conversational podcast text. Runs locally on MPS GPU.

Why Qdrant over Chroma? Cosine search at 19K points in ~50ms. Filterable by guest metadata out of the box.
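To make "cosine search, filterable by guest metadata" concrete, here is a brute-force stand-in in pure Python. It is not Qdrant's API: Qdrant answers the same kind of query through an approximate index, and the point layout used here (`vector`, `guest`, `text` keys) is an assumption for illustration.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: list, points: list, guest: str = None, top_k: int = 3) -> list:
    """Filter by guest metadata first, then rank the survivors by cosine similarity."""
    candidates = [p for p in points if guest is None or p["guest"] == guest]
    ranked = sorted(candidates, key=lambda p: cosine(query_vec, p["vector"]), reverse=True)
    return ranked[:top_k]


# Toy 2-dimensional points; real vectors would have 384 dimensions.
points = [
    {"guest": "Andrej Karpathy", "vector": [1.0, 0.0], "text": "neural networks"},
    {"guest": "Andrew Huberman", "vector": [0.9, 0.1], "text": "sleep"},
    {"guest": "Andrej Karpathy", "vector": [0.0, 1.0], "text": "self-driving"},
]
hits = search([1.0, 0.0], points, guest="Andrej Karpathy", top_k=1)
```

Filtering before ranking keeps a close match from another guest (the Huberman point here) from crowding out the requested guest's chunks.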

Why intent routing? Factual, synthetic, and generative queries need fundamentally different retrieval and generation strategies. A single one-size-fits-all prompt fails at scale.

Why groundedness verification? LLMs default to confident BS. verify_groundedness() forces the model to check its answer against the retrieved context before showing it to the user.
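The shape of such a check can be sketched without an LLM. The version below swaps in a much cruder technique, per-sentence word overlap with the retrieved chunks, purely to show the accept/reject contract; the threshold, the tokenizer, and the signature are assumptions, and the repository's `verify_groundedness()` asks the model itself to do the checking.

```python
import re


def _tokens(text: str) -> set:
    """Lowercased word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def verify_groundedness(answer: str, context_chunks: list, threshold: float = 0.6) -> bool:
    """Accept the answer only if every sentence shares enough words with the context.

    Lexical stand-in for an LLM-based check; the 0.6 threshold is an assumption.
    """
    context_tokens = set()
    for chunk in context_chunks:
        context_tokens |= _tokens(chunk)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    for sentence in sentences:
        words = _tokens(sentence)
        if not words:
            continue
        support = len(words & context_tokens) / len(words)
        if support < threshold:
            return False  # this sentence is not backed by the retrieved text
    return bool(sentences)  # an empty answer is never considered grounded
```

Word overlap cannot catch a fluent answer that reuses the source's vocabulary to say something different, which is one reason a model-based check is the stronger choice.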

License

MIT
