Generate and edit videos with Gemini Omni Flash

The Interactions API is now generally available. We recommend using this API for access to all the latest features and models.

Note: Gemini Omni Flash is in preview.

Gemini Omni Flash (gemini-omni-flash-preview) is a multimodal model designed for high-speed video generation, editing, and cinematic control. Gemini Omni is built on the following core capabilities, which distinguish it from previous video models:

Native multimodality: it processes text, image, audio, and video simultaneously, giving you more cohesive, consistent, and controllable output.

Conversational editing: enabled by the Interactions API, it lets you iteratively refine and edit your videos through natural language conversation. Describe what you want to change, and the model applies the edit while preserving the parts of the video you want to keep.

World knowledge: Gemini Omni combines an understanding of physics with Gemini's knowledge of history, science, and cultural context, bridging the gap from photorealism to meaningful storytelling.

Text to video generation

Given a text prompt, the model generates a video with audio that matches your description. For best results, include details such as scene description, camera movement, lighting, and mood.

Python

import base64

from google import genai

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-omni-flash-preview",
    input="A marble rolling fast on a chain reaction style track, continuous smooth shot.",
)

# Decode the base64 video payload and save it as an MP4 file.
with open("marble.mp4", "wb") as f:
    f.write(base64.b64decode(interaction.output_video.data))

JavaScript

import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';

const ai = new GoogleGenAI({});

const interaction = await ai.interactions.create({
  model: 'gemini-omni-flash-preview',
  input: 'A marble rolling fast on a chain reaction style track, continuous smooth shot.',
});

// Decode the base64 video payload and save it as an MP4 file.
if (interaction.output_video?.data) {
  fs.writeFileSync('marble.mp4', Buffer.from(interaction.output_video.data, 'base64'));
}

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions?key=$API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-omni-flash-preview",
    "input": "A marble rolling fast on a chain reaction style track, continuous smooth shot."
  }'
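The prompt guidance in this section (scene description, camera movement, lighting, and mood) can be kept consistent with a small helper. The sketch below is only an illustration: `build_video_prompt` and its labelled-sentence layout are hypothetical, not part of the SDK, and the model does not require any particular prompt format.

```python
def build_video_prompt(scene, camera=None, lighting=None, mood=None):
    """Join a scene description with optional camera, lighting, and mood details.

    Hypothetical helper: the "Label: value" layout is an assumption made for
    readability, not a format the model is documented to expect.
    """
    parts = [scene.strip().rstrip(".")]
    for label, value in (("Camera", camera), ("Lighting", lighting), ("Mood", mood)):
        if value:
            parts.append(f"{label}: {value.strip().rstrip('.')}")
    return ". ".join(parts) + "."


prompt = build_video_prompt(
    "A marble rolling fast on a chain reaction style track",
    camera="continuous smooth shot",
    lighting="warm afternoon light",
    mood="playful",
)
print(prompt)
```

The resulting string can be passed as the `input` argument in the Python example above.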

REST response schema

The convenience field interaction.output_video is SDK-only. When you call the REST API directly, read the video output from the steps array instead.

Raw REST JSON structure:

{
  "steps": [
    {
      "type": "user_input",
      "content": [{"type": "text", "text": "..."}]
    },
    {
      "type": "thought",
      "content": [{"text": "...", "type": "thought"}]
    },
    {
      "type": "model_output",
      "content": [
        {
          "type": "video",
          "mime_type": "video/mp4",
          "data": "AAAAIGZ0eXBpc29t..." // Base64 encoded video data
        }
      ]
    }
  ],
  "id": "v1_...",
  "status": "completed",
  "model": "gemini-omni-flash-preview",
  "object": "interaction"
}
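As a minimal sketch of reading that structure without the SDK, the function below walks the steps array of an already-parsed REST response and decodes the first video part it finds. It assumes the response has exactly the shape shown above. `extract_video_bytes` is a hypothetical helper name, and the `sample` dictionary is a stand-in built for illustration, not real API output.

```python
import base64


def extract_video_bytes(response):
    """Return the decoded bytes of the first video part in a model_output step.

    Returns None when the response contains no video part.
    """
    for step in response.get("steps", []):
        if step.get("type") != "model_output":
            continue  # skip user_input and thought steps
        for part in step.get("content", []):
            if part.get("type") == "video" and part.get("data"):
                return base64.b64decode(part["data"])
    return None


# Stand-in response with a fake payload, mirroring the structure above.
sample = {
    "steps": [
        {"type": "user_input", "content": [{"type": "text", "text": "..."}]},
        {"type": "thought", "content": [{"text": "...", "type": "thought"}]},
        {
            "type": "model_output",
            "content": [
                {
                    "type": "video",
                    "mime_type": "video/mp4",
                    "data": base64.b64encode(b"fake-mp4-bytes").decode("ascii"),
                }
            ],
        },
    ],
    "status": "completed",
}

video = extract_video_bytes(sample)
if video is not None:
    with open("marble.mp4", "wb") as f:
        f.write(video)
```

Returning None instead of raising lets the caller decide how to handle a response that contains no video, for example one whose status is not "completed".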

Control aspect ratio

Set aspect_ratio in response_format to "9:16" to create portrait videos. Landscape ("16:9") is the default.

Python

import base64

from google import genai

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-omni-flash-preview",
    input="A futuristic city with neon lights and flying cars, cyberpunk style",
    response_format={
        "type": "video",  # optional
        "aspect_ratio": "9:16",  # Supported values: "9:16", "16:9"
    },
)

# Decode the base64 video payload and save it as an MP4 file.
with open("example.mp4", "wb") as f:
    f.write(base64.b64decode(interaction.output_video.data))

JavaScript

import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';

const ai = new GoogleGenAI({});

const interaction = await ai.interactions.create({
  model: 'gemini-omni-flash-preview',
  input: 'A futuristic city with neon lights and flying cars, cyberpunk style',
  response_format: {
    type: 'video', // optional
    aspect_ratio: '9:16', // Supported values: '9:16', '16:9'
  },
});

// Decode the base64 video payload and save it as an MP4 file.
if (interaction.output_video?.data) {
  fs.writeFileSync('example.mp4', Buffer.from(interaction.output_video.data, 'base64'));
}

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions?key=$API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-omni-flash-preview",
    "input": "A futuristic city with neon lights and flying cars, cyberpunk style",
    "response_format": {
      "type": "video",
      "aspect_ratio": "9:16"
    }
  }'
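Because only two aspect ratios are listed as supported, it can help to validate the value before sending a request. The following is a sketch under that assumption: `build_video_request` and `SUPPORTED_ASPECT_RATIOS` are hypothetical names, and the dictionary it returns simply mirrors the request fields used in the examples above.

```python
# Values taken from the "Supported values" comment in the examples above.
SUPPORTED_ASPECT_RATIOS = ("16:9", "9:16")


def build_video_request(prompt, aspect_ratio="16:9", model="gemini-omni-flash-preview"):
    """Build the request fields for a video generation call (hypothetical helper).

    Raises ValueError early, before any network call, when the aspect ratio
    is not one of the supported values.
    """
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        raise ValueError(
            f"Unsupported aspect_ratio {aspect_ratio!r}; "
            f"expected one of {SUPPORTED_ASPECT_RATIOS}"
        )
    return {
        "model": model,
        "input": prompt,
        "response_format": {"type": "video", "aspect_ratio": aspect_ratio},
    }


payload = build_video_request(
    "A futuristic city with neon lights and flying cars, cyberpunk style",
    aspect_ratio="9:16",
)
print(payload["response_format"])
```

The returned dictionary can be serialized as the JSON body of the REST call, or its keys can be passed as keyword arguments in the Python example, since they match the argument names used there.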

Image to video generation

You can provide a reference image with your text prompt. Depending on your prompt, the model will decide how to use the image. This is useful for bringing product shots, illustrations, or photographs to life.

The following example...
