Autolectures
Overview

The goal of this project is simple: given a prompt like "what is a leveraged buyout?", get back a lecture video. More specifically, a narrated, animated explainer video with slides, a voiceover, and elements that appear in time with the narration.

You could try to do this with a video-generation model that produces the pixels directly, but we take a different approach, based on the idea that an LLM today is, in my opinion, better at writing code than a SOTA diffusion model is at generating pixels. Instead of generating a video, the model writes the source code of one: a sequence of Remotion scene components (React that renders to video frames), each with its own narration script. We then render that code ourselves.

Working in code rather than pixels also changes what we can do with the result. A scene is something we can read and edit, so fixing a wrong label or color is a one-line change and a re-render, not a re-roll of the whole video in the hope that the next sample comes out better. Text and diagrams come out exactly as written, which still isn't a given with diffusion models on dense, labelled content. And the part that matters most for what follows is that a scene renders to a DOM we can inspect and measure headlessly, which is exactly what lets us check each scene's layout in Step 2.

The pipeline has five steps: the model writes the scenes in an agent loop, we check and repair their layout, we turn each script into narration with word-level timestamps, we time the reveals so each element appears as it is mentioned, and finally we render everything into a single MP4.

The pipeline end to end: a prompt goes in and an MP4 comes out. Step 2 is a loop rather than a strict stage, since it renders each scene, measures it, and sends anything that overflows back to be repaired and re-measured. Reveal timing is optional.
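To make the shape of the pipeline concrete, here is a minimal sketch of how the five steps could be wired together, including the measure-and-repair loop of Step 2. Every name in it (Stages, fitScene, runPipeline, maxRepairs and the rest) is an assumption made for illustration, not the project's actual code. The stages are modelled as plain synchronous functions to keep the sketch short; real ones would almost certainly be asynchronous.

```typescript
// All names below are hypothetical; the post describes the steps, not an API.
interface Scene {
  source: string; // the scene's .tsx source
  script: string; // its plain-text narration
}

interface WordTiming {
  word: string;
  startSec: number;
  endSec: number;
}

interface TimedScene {
  scene: Scene;
  words: WordTiming[];
  revealTimesSec: number[]; // empty when reveal timing is skipped
}

interface Stages {
  writeScenes: (prompt: string) => Scene[]; // Step 1
  measureOverflowPx: (scene: Scene) => number; // Step 2: px past the frame, 0 = fits
  repairScene: (scene: Scene, overflowPx: number) => Scene; // Step 2
  narrate: (script: string) => WordTiming[]; // Step 3
  timeReveals?: (scene: Scene, words: WordTiming[]) => number[]; // Step 4, optional
  render: (scenes: TimedScene[]) => string; // Step 5: returns the MP4 path
}

// Step 2 as a loop: measure, and while the scene overflows, repair and re-measure.
// Assumption: after maxRepairs repairs we give up and keep the last attempt.
function fitScene(scene: Scene, stages: Stages, maxRepairs: number): Scene {
  let current = scene;
  for (let attempt = 0; attempt < maxRepairs; attempt++) {
    const overflowPx = stages.measureOverflowPx(current);
    if (overflowPx <= 0) break;
    current = stages.repairScene(current, overflowPx);
  }
  return current;
}

function runPipeline(prompt: string, stages: Stages, maxRepairs = 3): string {
  const written = stages.writeScenes(prompt);
  const fitted = written.map((scene) => fitScene(scene, stages, maxRepairs));
  const timed = fitted.map((scene): TimedScene => {
    const words = stages.narrate(scene.script);
    const revealTimesSec = stages.timeReveals
      ? stages.timeReveals(scene, words)
      : [];
    return { scene, words, revealTimesSec };
  });
  return stages.render(timed);
}
```

Passing the stages in as functions keeps the control flow visible on its own, and it makes the optional nature of reveal timing explicit: leaving timeReveals out still produces a video, just without timed reveals.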
Step 1 - Writing the scenes

The first step is a single agent loop. We hand the model the prompt and a small set of tools, and it writes the video one file at a time until it decides it is done.

The unit it produces is a scene, which is a self-contained React component, sized to the 1920x1080 frame and styled with Tailwind, paired with a plain-text narration script. It is up to the model how many scenes the topic needs, what each one shows, and how the narration carries from one to the next.

The tools are deliberately minimal. write_file saves a scene component or a narration file, web_search lets the model ground facts it is unsure about, and search_image pulls real photos through Brave when a topic has a real-world subject like a person, a place, or a product, so the scene can use an actual image instead of a drawn approximation. Importantly, we want to steer the models as little as possible, and nothing about the shape of the video is fixed in advance.

One convention does the heavy lifting later. As it writes a scene, the model tags the elements that should appear one at a time with data-animate="reveal-1", data-animate="reveal-2", and so on, in the order it wants them to land, and it writes the narration to describe them in that same order. Turning those tags into real timings is the job of Step 4; here the model is only declaring intent.

export function Scene() {
  return (
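    <div>
      <div data-animate="reveal-1">
        <p>THE BUSINESS MODEL</p>
        <h1>A bank doesn't keep your money in a vault</h1>
      </div>
      <p data-animate="reveal-2">
        When you deposit $100, the bank keeps only a small fraction on hand.
      </p>
      {/* Flow diagram */}
      <div>
        <div data-animate="reveal-3">
          <p>DEPOSITORS</p>
          <p>$100</p>
        </div>
        <div data-animate="reveal-4">
          <p>THE BANK</p>
          {/* $10 kept as reserves, $90 lent out */}
        </div>
        <div data-animate="reveal-5">
          <p>BORROWERS</p>
          <p>$90</p>
        </div>
      </div>
      <p data-animate="reveal-6">
        The bank profits on the gap. This is called fractional reserve banking.
      </p>
    </div>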
  );
}

The abridged source for one scene from the banks-fail video, and the 1920x1080 frame it renders to. The reveals run down the page in the order the narration introduces them: the headline and the one-line setup, then the depositors, the bank, and the borrowers, and finally the term for it, fractional reserve banking.

One more thing happens on every write. When the model saves a .tsx file, we parse it before accepting it and hand any syntax error straight back as the tool result, so it gets fixed on the next turn. That is the first and cheapest check we run on generated code. The more interesting one is Step 2: whether the scene actually fits on screen.
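Two pieces of this step are small enough to sketch: reading the reveal tags back out of a scene, and the parse-on-write check. The code below is an illustration under stated assumptions, not the project's implementation. The names declaredReveals, handleWriteFile, ParseFn, and ToolResult are invented here, the post does not say which parser is used (so it is passed in as a function), and the regular expression only recognises tags written as a double-quoted string literal.

```typescript
// Hypothetical shapes: the post names the write_file tool but not its signature.
interface SyntaxIssue {
  line: number;
  message: string;
}
type ParseFn = (source: string) => SyntaxIssue[]; // returns [] when the source parses

interface ToolResult {
  ok: boolean;
  message: string; // what the model sees as the tool result
}

// Collect the distinct reveal steps a scene declares, in ascending order.
// Assumption: tags look exactly like data-animate="reveal-N".
function declaredReveals(source: string): number[] {
  const pattern = /data-animate="reveal-(\d+)"/g;
  const steps: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const step = parseInt(match[1], 10);
    if (steps.indexOf(step) === -1) steps.push(step);
  }
  return steps.sort((a, b) => a - b);
}

// Save a file, but parse .tsx sources first and reject them on a syntax error,
// returning the error text so the model can fix it on its next turn.
function handleWriteFile(
  path: string,
  content: string,
  files: Record<string, string>,
  parseTsx: ParseFn,
): ToolResult {
  if (/\.tsx$/.test(path)) {
    const issues = parseTsx(content);
    if (issues.length > 0) {
      const detail = issues
        .map((issue) => `line ${issue.line}: ${issue.message}`)
        .join("\n");
      return {
        ok: false,
        message: `Syntax error in ${path}; file not saved.\n${detail}`,
      };
    }
  }
  files[path] = content;
  return { ok: true, message: `Saved ${path}` };
}
```

Narration files skip the parse entirely, since only .tsx sources can fail it. Returning the error as an ordinary tool result, rather than throwing, is what lets the agent loop treat a syntax error as just another message to respond to.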
Step 2 - Checking and repairing layout

Every model we tried shares one failure mode: it writes scenes that overflow the frame. Because the model never sees what its code renders to, it tends to pack in one card too many or size the text a little too large, and the bottom of the scene ends up past the 1080-pixel edge, where it gets cut off in the video. This is the most common problem by far, and it is also the easiest to catch, because a scene is something we can render and measure.

After the writing agent finishes, we render every scene to a...