Show HN: We made a cinematic heist trailer with 4 AI models for $60

We Made a Cinematic Heist Trailer With AI for $60. Here’s Exactly How. | by James Pelton | May, 2026 | Medium


We Made a Cinematic Heist Trailer With AI for $60. Here’s Exactly How.

James Pelton

5 min read· Just now


Four AI models. Zero actors. Zero cameras. Two days. A full production breakdown.

Last week we shipped a 60-second heist trailer for Zencoder. No actors. No cameras. No DP. Budget: about sixty bucks.

Here’s the trailer if you haven’t seen it: https://www.youtube.com/watch?v=D3GUxviHd4A

The concept was simple: explain what multi-model AI orchestration actually does by turning it into a heist movie. Each frontier model plays a specialist on a crew. Claude is the Architect. GPT is the Safecracker. Gemini is the Lookout. Zenflow, the product we were actually trying to explain, is the faceless mastermind in the back of the van, running the whole operation. Ocean’s Eleven, but for code.

This post is the full behind-the-scenes breakdown: what tools we used, what they cost, what went wrong, and what we’d do differently.

The Stack

Four tools, each doing one job:

- Higgsfield (AI video generation): all cinematics. Every shot of every character was generated, not filmed.
- ElevenLabs (AI voice): the voiceover narration. One take, Antoni voice.
- Suno (AI music): the heist-pulse score.
- Remotion (React video framework): compositing, text overlays, terminal animations, the end card. The glue that holds it together.

No After Effects. No Premiere. The entire composition pipeline is a React app that renders to MP4.

The Cost Breakdown

| Tool | Cost | What it did |
|---|---|---|
| Higgsfield | $23.76 | 11 source clips across ~30 takes |
| ElevenLabs | ~$2 | Single VO take via API |
| Suno | ~$5 | Custom heist-pulse score |
| Remotion | $0 | Open source, local CPU render |
| Total | ~$60 | |

The Higgsfield number is the interesting one. We’re on their ULTRA plan ($250/month for 6,000 credits), so the 570 credits we burned were about 9.5% of one month’s pool. If you were paying per-credit without a plan, it’d be higher, but the point is this wasn’t a $10K production. It was a Tuesday afternoon.

The Characters

Before generating a single shot, we locked character references. This is the step most people skip and then regret.
Each character got a detailed visual spec (clothing, lighting, props, accessories), and we generated reference stills first using Higgsfield’s Soul feature. These refs then got passed into every shot containing that character for consistency.

- The Architect (Claude): Late-30s man, navy three-piece suit, tortoiseshell glasses, brass signet ring. Warm desk-lamp lighting over a drafting table. Blueprint in hand. Calm.
- The Safecracker (GPT): Black tuxedo, white gloves, stethoscope. Single overhead bulb in a dim corridor. Hands precise. Focused.
- The Lookout (Gemini): Trinity-from-Matrix archetype. Dark technical jacket, short dark hair, sharp jawline. Lit half-cyan by monitor glow. Expressionless.
- The Wildcard (Grok): Leather jacket, fingerless gloves, cigar smoke. Dashboard lighting. Matte-black muscle car. Smirk. (Cameo role only: appears when Claude times out.)
- The Mastermind (Zenflow): Hooded figure, face fully shadowed, tactical vest over a hoodie, six monitors. Faceless on purpose: Zenflow is the system, not a person.

What Went Wrong (The Honest Part)

This is the section you’re actually here for.

1. AI can’t stop hallucinating text on blank surfaces.

We needed blank props on several characters (a blank lapel pin on Claude, a blank shoulder patch on Grok) so we could composite real logos in Remotion later. Higgsfield had other ideas. Grok’s leather jacket came back with “BLACK” scrawled across it. The Zenflow mastermind’s tactical vest generated with the word “TALSY” prominently displayed. Nobody knows what TALSY means. The AI certainly doesn’t.

Our rule: three attempts to get a clean blank surface, then give up and composite over whatever text the AI hallucinated. We shipped “TALSY” with a Zencoder logo slapped on top. You’d never know.

2. A man sitting at a desk got flagged as NSFW.

Beat 8a was supposed to be Claude (the architect) calmly setting down a pen at a drafting table. Take 2 came back rejected by Seedance’s content filter.
Same prompt as take 1, which passed fine. Completely random false positive. We moved on with one viable take.

3. Identity drift is the real enemy.

The biggest risk with multi-character AI video isn’t quality; it’s consistency. Character drift between shots. Claude looking like a different person in Beat 3 versus Beat 8. We budgeted 6 takes per character beat specifically for this, ended up doing 3 because the reference-locking worked better than expected. But this is the thing that will eat your entire budget if you don’t plan for it.

4. Sub-4-second clips don’t exist.

Higgsfield’s practical minimum generation is about 4 seconds. Half our shots in the edit are under 2 seconds. The workaround: generate everything at 4–6 seconds with...
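
To put the take budget from point 3 in dollar terms, here is a rough sketch of the credit arithmetic in Python. It uses only the figures quoted in this post (ULTRA plan at $250 for 6,000 credits, 570 credits burned, roughly 30 takes). The average credits per take is derived from those numbers rather than stated by Higgsfield, and the count of 8 character beats in the usage section is purely hypothetical, since the post does not give one.

```python
# Figures quoted in the post.
PLAN_PRICE_USD = 250.0   # ULTRA plan, per month
PLAN_CREDITS = 6000      # credits included per month
CREDITS_USED = 570       # credits burned on the trailer
TAKES_GENERATED = 30     # "~30 takes", so this is approximate


def credit_cost_usd(credits: float) -> float:
    """Dollar value of some credits at the plan's effective per-credit rate."""
    return credits * PLAN_PRICE_USD / PLAN_CREDITS


def take_budget_usd(beats: int, takes_per_beat: int) -> float:
    """Estimated spend for a take budget, using the average credits per take.

    ASSUMPTION: every take costs the same as the trailer's average take.
    Real per-take pricing may vary by model, clip length, and resolution.
    """
    credits_per_take = CREDITS_USED / TAKES_GENERATED
    return credit_cost_usd(beats * takes_per_beat * credits_per_take)


if __name__ == "__main__":
    print(f"share of monthly pool: {CREDITS_USED / PLAN_CREDITS:.1%}")
    print(f"Higgsfield spend: ${credit_cost_usd(CREDITS_USED):.2f}")
    # HYPOTHETICAL: 8 character beats; the post does not state this count.
    for takes in (6, 3):
        print(f"{takes} takes/beat x 8 beats: ${take_budget_usd(8, takes):.2f}")
```

Under these assumptions the 570 credits work out to about $23.75, in line with the Higgsfield figure in the cost table, and halving the takes per beat halves the estimated generation spend. Swap in your own beat count and plan pricing before relying on the numbers.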
