Generate and edit videos with Gemini Omni Flash | Gemini API | Google AI for Developers
The Interactions API is now generally available. We recommend using this API for access to all the latest features and models.
Generate and edit videos with Gemini Omni Flash
Note: Gemini Omni Flash is in preview.<br>Gemini Omni Flash (gemini-omni-flash-preview) is a high-performance multimodal<br>model designed for high-speed video generation, editing, and cinematic control.<br>Gemini Omni is built on the following core capabilities that distinguish it from<br>previous video models:
Native multimodality: it processes text, image, audio, and video<br>simultaneously, giving you more cohesive, consistent, and controllable<br>output.
Conversational editing: enabled by the Interactions<br>API, it lets you iteratively refine<br>and edit your videos through natural language conversation. Describe what<br>you want to change, and the model applies the edit while preserving the<br>parts of the video you want to keep.
World knowledge: Gemini Omni combines an understanding of<br>physics with Gemini's knowledge of history, science, and cultural context,<br>bridging the gap from photorealism to meaningful storytelling.
Text to video generation
Generate a video from a text prompt. The model generates a video with audio<br>based on your text description. For best results, include details such as scene<br>description, camera movement, lighting, and mood.
Python
import base64<br>from google import genai
client = genai.Client()
interaction = client.interactions.create(<br>    model="gemini-omni-flash-preview",<br>    input="A marble rolling fast on a chain reaction style track, continuous smooth shot.",<br>)<br><br># Decode the base64 video data and save it as an MP4 file.<br>with open("marble.mp4", "wb") as f:<br>    f.write(base64.b64decode(interaction.output_video.data))
JavaScript
import { GoogleGenAI } from '@google/genai';<br>import * as fs from 'fs';<br>const ai = new GoogleGenAI({});
const interaction = await ai.interactions.create({<br>model: 'gemini-omni-flash-preview',<br>input: 'A marble rolling fast on a chain reaction style track, continuous smooth shot.',<br>});
// Decode the base64 video data and save it as an MP4 file.<br>if (interaction.output_video?.data) {<br>  fs.writeFileSync('marble.mp4', Buffer.from(interaction.output_video.data, 'base64'));<br>}
REST
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions?key=$API_KEY" \<br>-H "Content-Type: application/json" \<br>-d '{<br>"model": "gemini-omni-flash-preview",<br>"input": "A marble rolling fast on a chain reaction style track, continuous smooth shot."<br>}'
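The prompt guidance above (scene description, camera movement, lighting, mood) can be applied with a small helper that assembles those details into one prompt string. This is only an illustrative sketch: `build_video_prompt` and its parameter names are hypothetical and not part of the Gemini SDK, and the "Camera:", "Lighting:", "Mood:" labels are an assumed phrasing rather than a documented prompt format.

```python
def build_video_prompt(scene: str, camera: str = "", lighting: str = "", mood: str = "") -> str:
    """Join a scene description with optional camera, lighting, and mood details.

    Hypothetical helper: the labels below are an assumption, not a required format.
    """
    parts = [scene.strip().rstrip(".")]
    for label, value in (("Camera", camera), ("Lighting", lighting), ("Mood", mood)):
        value = value.strip().rstrip(".")
        if value:  # skip details that were not provided
            parts.append(f"{label}: {value}")
    return ". ".join(parts) + "."


prompt = build_video_prompt(
    "A marble rolling fast on a chain reaction style track",
    camera="continuous smooth shot",
    lighting="warm workshop light",
)
print(prompt)
```

The resulting string can be passed as the `input` value in any of the requests shown above.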
REST response schema
The convenience field interaction.output_video is SDK-only.<br>Get the video output from the steps array when using the REST API directly.
Raw REST JSON structure:
{<br>  "steps": [<br>    { "type": "user_input", "content": [{"type": "text", "text": "..."}] },<br>    { "type": "thought", "content": [{"text": "...", "type": "thought"}] },<br>    {<br>      "type": "model_output",<br>      "content": [<br>        {<br>          "type": "video",<br>          "mime_type": "video/mp4",<br>          "data": "AAAAIGZ0eXBpc29t..." // Base64 encoded video data<br>        }<br>      ]<br>    }<br>  ],<br>  "id": "v1_...",<br>  "status": "completed",<br>  "model": "gemini-omni-flash-preview",<br>  "object": "interaction"<br>}
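When calling the REST API directly, the video has to be pulled out of the steps array by hand. The following is a minimal sketch of that lookup, assuming the response has already been parsed into a Python dict with the shape shown above. `extract_video_bytes` is a hypothetical helper name, and the sample response is a stand-in with fake data, not real API output.

```python
import base64


def extract_video_bytes(interaction: dict) -> bytes:
    """Return the decoded bytes of the first video part in a model_output step."""
    for step in interaction.get("steps", []):
        if step.get("type") != "model_output":
            continue  # skip user_input and thought steps
        for part in step.get("content", []):
            if part.get("type") == "video":
                return base64.b64decode(part["data"])
    raise ValueError("no video part found in interaction steps")


# Stand-in for a parsed REST response (assumed shape, fake payload).
sample_response = {
    "steps": [
        {"type": "user_input", "content": [{"type": "text", "text": "..."}]},
        {"type": "thought", "content": [{"type": "thought", "text": "..."}]},
        {
            "type": "model_output",
            "content": [
                {
                    "type": "video",
                    "mime_type": "video/mp4",
                    "data": base64.b64encode(b"fake-mp4-bytes").decode("ascii"),
                }
            ],
        },
    ],
    "status": "completed",
}

video_bytes = extract_video_bytes(sample_response)
with open("marble.mp4", "wb") as f:
    f.write(video_bytes)
```

Raising an error when no video part is present keeps a failed or text-only response from silently producing an empty file.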
Control aspect ratio
Set aspect_ratio in response_format to "9:16" to create portrait videos. Landscape ("16:9")<br>is the default.
Python
import base64<br>from google import genai
client = genai.Client()
interaction = client.interactions.create(<br>    model="gemini-omni-flash-preview",<br>    input="A futuristic city with neon lights and flying cars, cyberpunk style",<br>    response_format={<br>        "type": "video",  # optional<br>        "aspect_ratio": "9:16",  # Supported values: "9:16", "16:9"<br>    },<br>)<br><br># Decode the base64 video data and save it as an MP4 file.<br>with open("example.mp4", "wb") as f:<br>    f.write(base64.b64decode(interaction.output_video.data))
JavaScript
import { GoogleGenAI } from '@google/genai';<br>import * as fs from 'fs';<br>const ai = new GoogleGenAI({});
const interaction = await ai.interactions.create({<br>model: 'gemini-omni-flash-preview',<br>input: 'A futuristic city with neon lights and flying cars, cyberpunk style',<br>response_format: {<br>type: 'video', // optional<br>aspect_ratio: '9:16' // Supported values: '9:16', '16:9'<br>},<br>});
// Decode the base64 video data and save it as an MP4 file.<br>if (interaction.output_video?.data) {<br>  fs.writeFileSync('example.mp4', Buffer.from(interaction.output_video.data, 'base64'));<br>}
REST
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions?key=$API_KEY" \<br>-H "Content-Type: application/json" \<br>-d '{<br>  "model": "gemini-omni-flash-preview",<br>  "input": "A futuristic city with neon lights and flying cars, cyberpunk style",<br>  "response_format": {<br>    "type": "video",<br>    "aspect_ratio": "9:16"<br>  }<br>}'
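Because only two aspect ratios are listed as supported, it can help to validate the value before sending a request. The sketch below builds the response_format dictionary used in the examples above and rejects anything other than "16:9" or "9:16". `video_response_format` and `SUPPORTED_ASPECT_RATIOS` are hypothetical names introduced for illustration, not SDK members.

```python
# Values taken from the examples above; landscape is the documented default.
SUPPORTED_ASPECT_RATIOS = ("16:9", "9:16")


def video_response_format(aspect_ratio: str = "16:9") -> dict:
    """Build a response_format dict for video output, validating the aspect ratio."""
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        raise ValueError(
            f"unsupported aspect_ratio {aspect_ratio!r}; "
            f"expected one of {SUPPORTED_ASPECT_RATIOS}"
        )
    return {"type": "video", "aspect_ratio": aspect_ratio}


portrait = video_response_format("9:16")
print(portrait)
```

The returned dictionary can be passed as the `response_format` argument in the Python example above.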
Image to video generation
You can provide a reference image along with your text prompt. The model decides<br>how to use the image based on your prompt. This is useful for bringing product shots,<br>illustrations, or photographs to life.
The following example...