We Made a Cinematic Heist Trailer With AI for $60. Here’s Exactly How. | by James Pelton | May, 2026 | Medium
We Made a Cinematic Heist Trailer With AI for $60. Here’s Exactly How.
James Pelton
5 min read·<br>Just now
Four AI models. Zero actors. Zero cameras. Two days. A full production breakdown.<br>Last week we shipped a 60-second heist trailer for Zencoder. No actors. No cameras. No DP. Budget: about sixty bucks.<br>Here’s the trailer if you haven’t seen it:<br>https://www.youtube.com/watch?v=D3GUxviHd4A<br>The concept was simple: explain what multi-model AI orchestration actually does by turning it into a heist movie. Each frontier model plays a specialist on a crew. Claude is the Architect. GPT is the Safecracker. Gemini is the Lookout. Zenflow — the product we were actually trying to explain — is the faceless mastermind in the back of the van, running the whole operation.<br>Ocean’s Eleven, but for code.<br>This post is the full behind-the-scenes breakdown — what tools we used, what they cost, what went wrong, and what we’d do differently.<br>The Stack<br>Four tools, each doing one job:<br>Higgsfield (AI video generation) — all cinematics. Every shot of every character was generated, not filmed.<br>ElevenLabs (AI voice) — the voiceover narration. One take, Antoni voice.<br>Suno (AI music) — the heist-pulse score.<br>Remotion (React video framework) — compositing, text overlays, terminal animations, the end card. The glue that holds it together.<br>No After Effects. No Premiere. The entire composition pipeline is a React app that renders to MP4.<br>The Cost Breakdown<br>Tool | Cost | What it did<br>Higgsfield | $23.76 | 11 source clips across ~30 takes<br>ElevenLabs | ~$2 | Single VO take via API<br>Suno | ~$5 | Custom heist-pulse score<br>Remotion | $0 | Open source, local CPU render<br>Total | ~$60 |<br>The Higgsfield number is the interesting one. We’re on their ULTRA plan ($250/month for 6,000 credits, or roughly 4 cents a credit), so the 570 credits we burned were about 9.5% of one month’s pool. If you were paying per-credit without a plan, it’d be higher — but the point is this wasn’t a $10K production. It was a Tuesday afternoon.<br>The Characters<br>Before generating a single shot, we locked character references. 
This is the step most people skip and then regret.<br>Each character got a detailed visual spec — clothing, lighting, props, accessories — and we generated reference stills first using Higgsfield’s Soul feature. These refs then got passed into every shot containing that character for consistency.<br>The Architect (Claude): Late-30s man, navy three-piece suit, tortoiseshell glasses, brass signet ring. Warm desk-lamp lighting over a drafting table. Blueprint in hand. Calm.<br>The Safecracker (GPT): Black tuxedo, white gloves, stethoscope. Single overhead bulb in a dim corridor. Hands precise. Focused.<br>The Lookout (Gemini): Trinity-from-Matrix archetype. Dark technical jacket, short dark hair, sharp jawline. Lit half-cyan by monitor glow. Expressionless.<br>The Wildcard (Grok): Leather jacket, fingerless gloves, cigar smoke. Dashboard lighting. Matte-black muscle car. Smirk. (Cameo role only — appears when Claude times out.)<br>The Mastermind (Zenflow): Hooded figure, face fully shadowed, tactical vest over a hoodie, six monitors. Faceless on purpose — Zenflow is the system, not a person.<br>What Went Wrong (The Honest Part)<br>This is the section you’re actually here for.<br>1. AI can’t stop hallucinating text on blank surfaces.<br>We needed blank props on several characters — a blank lapel pin on Claude, a blank shoulder patch on Grok — so we could composite real logos in Remotion later. Higgsfield had other ideas.<br>Grok’s leather jacket came back with “BLACK” scrawled across it. The Zenflow mastermind’s tactical vest generated with the word “TALSY” prominently displayed. Nobody knows what TALSY means. The AI certainly doesn’t.<br>Our rule: three attempts to get a clean blank surface, then give up and composite over whatever text the AI hallucinated. We shipped “TALSY” with a Zencoder logo slapped on top. You’d never know.<br>2. 
A man sitting at a desk got flagged as NSFW.<br>Beat 8a was supposed to be Claude (the Architect) calmly setting down a pen at a drafting table. Take 2 came back rejected by Seedance’s content filter. Same prompt as take 1, which passed fine. Completely random false positive. We moved on with one viable take.<br>3. Identity drift is the real enemy.<br>The biggest risk with multi-character AI video isn’t quality — it’s consistency. Character drift between shots. Claude looking like a different person in Beat 3 versus Beat 8. We budgeted 6 takes per character beat specifically for this and ended up doing 3, because the reference-locking worked better than expected. But this is the thing that will eat your entire budget if you don’t plan for it.<br>4. Sub-4-second clips don’t exist.<br>Higgsfield’s practical minimum generation length is about 4 seconds. Half the shots in our edit are under 2 seconds. The workaround: generate everything at 4–6 seconds with...