Introducing Genblaze: A Python SDK for Generative Media Pipelines
A year ago, a good video model was a novelty. Today there are at least six worth using, and most of the teams we talk to are wiring up two or three of them into the same product, alongside image models, voice synthesis, and music generation. The hard question isn’t whether you can generate this kind of media. It’s how to build a pipeline that handles five providers without falling over.
That’s why we built Genblaze, an open-source Python SDK from Backblaze for building generative media pipelines: one API across video, image, and audio providers, swappable models, durable object storage, and a SHA-256-verified provenance manifest on every run.
The pipeline is becoming the moat
Models are commoditizing. New video, image, and audio releases drop every couple of months, and each one tends to be the best at one specific thing and middling at the rest. Nobody we work with is betting on a single provider anymore. They build a portfolio and configure fallbacks.
The pipeline is what stays. It’s where you’ve figured out which model handles which shot type and which voice fits which brand. It’s where retry logic and output guards live, and where your audit trail comes from. That work survives the next model release. The prompts you tuned for last quarter’s hero model don’t.
For a pipeline to actually be durable, though, it has to be reactive. Hard-coding one provider, blocking on every step, and returning a single synchronous result is fine for a demo. In production it ages out in weeks. The pipelines that hold up stream progress as events, fan out concurrent work, handle backpressure from slow providers, and let you add a new model with a one-line change.
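To make "reactive" concrete, here is a minimal sketch of those three properties using only Python's standard asyncio library. It is not Genblaze's implementation: `ProgressEvent`, `fake_provider_call`, `stream_steps`, and `collect` are hypothetical names invented for this illustration, and the provider call is simulated with a sleep. Steps fan out as concurrent tasks, a semaphore caps how many are in flight (a simple form of backpressure), and progress is yielded as events while the work is still running.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgressEvent:
    step: str
    status: str  # "started", "done", or "failed"
    uri: Optional[str] = None


async def fake_provider_call(step: str, delay: float) -> str:
    """Stand-in for a slow provider request; returns a made-up asset URI."""
    await asyncio.sleep(delay)
    return f"memory://{step}.bin"


async def stream_steps(steps: dict, max_concurrent: int = 2):
    """Run every step concurrently and yield progress events as they occur."""
    events = asyncio.Queue()
    limiter = asyncio.Semaphore(max_concurrent)  # caps in-flight provider calls

    async def worker(step: str, delay: float) -> None:
        async with limiter:
            await events.put(ProgressEvent(step, "started"))
            try:
                uri = await fake_provider_call(step, delay)
                await events.put(ProgressEvent(step, "done", uri))
            except Exception:
                await events.put(ProgressEvent(step, "failed"))

    tasks = [asyncio.create_task(worker(s, d)) for s, d in steps.items()]
    remaining = 2 * len(tasks)  # each step emits exactly two events
    while remaining:
        yield await events.get()
        remaining -= 1
    await asyncio.gather(*tasks)


async def collect(steps: dict, max_concurrent: int = 2) -> list:
    """Drain the event stream into a list (handy for tests and demos)."""
    return [event async for event in stream_steps(steps, max_concurrent)]


if __name__ == "__main__":
    for event in asyncio.run(collect({"image": 0.01, "video": 0.03, "audio": 0.02})):
        print(event)
```

A caller that wants live progress would iterate `stream_steps` directly with `async for` instead of collecting the events into a list.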
That’s what Genblaze is designed to be. One pipeline object, every provider behind the same surface, and a new model is one more .step().
A workflow that uses five providers
Here’s a concrete example: producing a short brand film from a one-paragraph brief.
1. Storyboard frames. Lock the visual direction with Seedream 5.0 Lite or FLUX via GMI Cloud, or Imagen on Google.
2. Animate the approved frame. Kling image-to-video on GMI Cloud, Veo on Google, Runway Gen-4 Turbo, or Luma Ray-2. They’re good at different shot types, so we usually try two and pick the better result. Setting chain=True on the pipeline passes the image from step one into the video step automatically.
3. Score and sound design. Music from Stability AI’s Stable Audio or GMI Cloud’s MiniMax. Ambient effects and voiceover from ElevenLabs. LMNT for low-latency text-to-speech (TTS) when responsiveness matters.
4. Upscale. There’s an upscale step type built in. Route the rendered video through a Replicate upscaler like Real-ESRGAN to hit delivery resolution.
5. Classify and tag. Use a vision-capable chat() call to tag scenes, run brand safety checks, or generate accessibility metadata. Gemini 2.5, GPT-4o, or Llama 3.2 Vision on GMI Cloud all handle this.
That’s five providers across five different model types, defined in one pipeline. The same retry behavior, fallback chains, and provenance manifest apply to every step.
from genblaze_core import Pipeline, Modality
from genblaze_gmicloud import (
    GMICloudImageProvider, GMICloudVideoProvider, GMICloudAudioProvider,
)
from genblaze_replicate import ReplicateProvider
from genblaze_google import GeminiChatProvider

# `storage` is the storage sink for output assets; its setup is not shown here.
run, manifest = (
    Pipeline("brand-film", chain=True)  # chain=True passes each step's output to the next step
    .step(GMICloudImageProvider(), model="seedream-5.0-lite", prompt="...", modality=Modality.IMAGE)
    .step(GMICloudVideoProvider(), model="Kling-Image2Video-V2.1-Master", prompt="...", modality=Modality.VIDEO)
    .step(GMICloudAudioProvider(), model="minimax-music-2.5", prompt="...", modality=Modality.AUDIO)
    .step(ReplicateProvider(), model="nightmareai/real-esrgan", step_type="upscale")
    .step(GeminiChatProvider(), model="gemini-2.5-pro", step_type="classify",
          prompt="Tag scenes, return JSON with shots, mood, brand-safety flags.")
    .run(sink=storage, timeout=900)
)
Swap any step for a different provider and nothing else in the pipeline has to change.
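For readers who want a mental model of the retry behavior and fallback chains mentioned above, the following is a plain-Python sketch of the general pattern. It does not show how Genblaze implements it internally. `ProviderError`, `generate_with_fallback`, and the two stand-in providers are hypothetical names made up for this example.

```python
import time
from typing import Callable, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error type for a failed provider call."""


Generate = Callable[[str], str]  # takes a prompt, returns an asset URI


def generate_with_fallback(
    providers: Sequence[Tuple[str, Generate]],
    prompt: str,
    retries: int = 2,
    backoff_s: float = 0.0,
) -> Tuple[str, str]:
    """Try providers in order, retrying each one up to `retries` times."""
    errors = []
    for name, generate in providers:
        for attempt in range(1, retries + 1):
            try:
                return name, generate(prompt)
            except ProviderError as exc:
                errors.append(f"{name} attempt {attempt}: {exc}")
                time.sleep(backoff_s * attempt)  # linear backoff between attempts
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Stand-in providers, for illustration only.
def always_down(prompt: str) -> str:
    raise ProviderError("503 from upstream")


def healthy(prompt: str) -> str:
    return f"memory://video/{len(prompt)}.mp4"


used, uri = generate_with_fallback(
    [("primary-video", always_down), ("backup-video", healthy)],
    prompt="slow dolly shot of a ceramic mug",
)
print(used, uri)
```

The ordered list of providers is the "portfolio": reordering it changes which model is tried first without touching the calling code.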
Provenance
Every run produces a canonical, hash-bound manifest that records the provider, model, prompt, parameters, timestamps, and the URI of every asset it produced. You can embed it directly into the output file (.mp4, .png, .jpg, .webp, .mp3, .wav are all supported by the matching media handler), or persist it as a sidecar JSON.
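As a rough illustration of what "canonical" and "hash-bound" can mean in practice, here is a sketch built on Python's standard `json` and `hashlib` modules. It is an assumption about the general technique, not Genblaze's actual manifest format: the field names, the `sha256` key, and the `seal` and `verify` helpers are all hypothetical. The idea is that serializing with sorted keys and fixed separators yields the same bytes for the same content, so the SHA-256 digest is reproducible.

```python
import hashlib
import json


def canonical_bytes(payload: dict) -> bytes:
    """Serialize deterministically: sorted keys, no insignificant whitespace."""
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def seal(payload: dict) -> dict:
    """Return a copy of the payload with its SHA-256 digest attached."""
    digest = hashlib.sha256(canonical_bytes(payload)).hexdigest()
    return {**payload, "sha256": digest}


def verify(manifest: dict) -> bool:
    """Recompute the digest over everything except the stored hash."""
    body = {k: v for k, v in manifest.items() if k != "sha256"}
    return hashlib.sha256(canonical_bytes(body)).hexdigest() == manifest.get("sha256")


# Hypothetical manifest contents, for illustration only.
payload = {
    "run_id": "run-001",
    "provider": "example-provider",
    "model": "example-model",
    "prompt": "storyboard frame, warm light",
    "parameters": {"seed": 7},
    "assets": ["s3://example-bucket/run-001/frame.png"],
}
manifest = seal(payload)
print(verify(manifest))                             # True
print(verify({**manifest, "prompt": "edited"}))     # False
```

Any change to a recorded field changes the digest, which is what lets a downstream consumer detect tampering.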
The hash is deterministic, so anyone downstream can verify the file by calling manifest.verify(). The same manifest is replayable: genblaze replay manifest.json reconstructs the run with the same parameters. And because every manifest carries a parent_run_id, you can trace a v3 video back through v2 and v1, including the fork where you tried Runway instead of Kling.
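The lineage idea can be sketched as a walk over parent links. This is a minimal illustration under stated assumptions, not the SDK's API: it assumes manifests are available as dictionaries keyed by run ID, that the root run has no `parent_run_id`, and the `lineage` helper and the sample run IDs are hypothetical.

```python
from typing import Dict, List


def lineage(run_id: str, manifests: Dict[str, dict]) -> List[str]:
    """Follow parent_run_id links from a run back to its root."""
    chain: List[str] = []
    seen = set()
    current = run_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        chain.append(current)
        current = manifests[current].get("parent_run_id")
    return chain


# Hypothetical run history: v2-runway is a fork that was tried and abandoned.
manifests = {
    "v1": {"model": "kling", "parent_run_id": None},
    "v2": {"model": "kling", "parent_run_id": "v1"},
    "v2-runway": {"model": "runway", "parent_run_id": "v1"},
    "v3": {"model": "kling", "parent_run_id": "v2"},
}

print(lineage("v3", manifests))         # ['v3', 'v2', 'v1']
print(lineage("v2-runway", manifests))  # ['v2-runway', 'v1']
```

Because each manifest points only at its parent, forks need no special handling: they are simply two runs that share the same `parent_run_id`.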
If you’re building customer-facing pipelines, this is what gets you from "we generated this" to "here’s the...