Semantic Browsing: Controllable Diversity for Image Generation
ECCV 2026
Sara Dorfman*, Maya Vishnevsky*, Omer Dahary, Or Patashnik, Daniel Cohen-Or
Tel Aviv University
* Equal contribution
arXiv
Code coming soon
TL;DR
Semantic Browsing introduces an agentic workflow that turns a single text prompt into a structured, browsable gallery of diverse image interpretations, where each variation reflects meaningful and controllable semantic choices rather than stochastic sampling.
Abstract
Modern text-to-image models produce high-fidelity images that closely follow prompts, but repeated sampling often collapses toward a single semantic interpretation. Semantic Browsing introduces controllable diversity: users explore generated images through meaningful, interpretable variations rather than incidental stochastic changes. The method shifts diversity to the text level, using a multi-agent workflow to expand prompts into structured scene representations and to identify plausible under-specified axes of variation. Each generated branch corresponds to a specific semantic decision, creating a navigable design space while preserving the original prompt intent.
Semantic Browsing for Image Generation. From a single text prompt, "A poster featuring animals," the system produces a structured gallery of images that explore different meaningful interpretations of the same scene. Rather than random variations, each image reflects a distinct, coherent semantic choice (e.g., changes in character, composition, or style), allowing users to browse a space of alternatives in a deliberate and interpretable way. In this visualization, the leftmost image serves as the root for the four variations in the center. The variation highlighted with a purple border is then selected as the parent of the four children displayed on the right.
Overview
Semantic diversity through structured scene refinement
Structured scene-tree expansion
🧾 We represent each fully specified scene as a structured JSON object that captures objects, attributes, interactions, and global scene properties.
🌳 The method builds a rooted tree of JSON scenes: each node is a complete scene interpretation and each edge applies one semantic constraint.
🔁 The tree grows iteratively by invoking the agentic workflow at a selected node, producing children that preserve the branch history.
Example of semantic browsing produced by our method. Starting from an initial scene interpretation inferred from the user prompt, the method explores alternative realizations by committing explicit semantic constraints at each step. Each branching point corresponds to alternative realizations of a single semantic aspect, while previously fixed constraints are preserved. Branching points also include an option to preserve the current value of the selected aspect, allowing exploration to continue along other semantic dimensions. Every node is a fully specified, renderable scene; preserve branches propagate these states to the final level, ensuring the leaf nodes contain all generated representations ready for rendering.
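The scene tree described above can be sketched as a small data structure. This is a minimal illustration, not the authors' implementation: the names `SceneNode`, `expand`, and `leaves`, the flat scene keys, and the direct key assignment that stands in for the JSON refinement step are all assumptions made for the example.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SceneNode:
    """One fully specified scene plus the constraints committed on its branch."""
    scene: Dict[str, Any]                               # complete scene JSON
    history: List[str] = field(default_factory=list)    # constraints from the root to this node
    children: List["SceneNode"] = field(default_factory=list)


def expand(node: SceneNode, aspect: str, values: List[Any]) -> List[SceneNode]:
    """Branch `node` along a single aspect (hypothetical helper).

    The first child is a "preserve" branch that keeps the current value,
    so exploration can continue along other aspects from the same state.
    """
    node.children.append(SceneNode(copy.deepcopy(node.scene), list(node.history)))
    for value in values:
        scene = copy.deepcopy(node.scene)
        scene[aspect] = value  # assumption: stands in for the JSON refinement step
        node.children.append(SceneNode(scene, node.history + [f"{aspect} = {value}"]))
    return node.children


def leaves(node: SceneNode) -> List[SceneNode]:
    """Collect the leaf nodes, i.e. the scenes that would be rendered."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


root = SceneNode(
    {"prompt": "A poster featuring animals", "style": "flat illustration"},
    ["prompt: A poster featuring animals"],
)
expand(root, "style", ["watercolor", "linocut"])
expand(root.children[1], "composition", ["grid", "circular"])
print(len(leaves(root)))  # 5
```

Because each expansion adds a preserve child, every scene generated along the way (the root scene, the two style variants, and the two composition variants) appears once among the five leaves.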
How to expand the tree?
Tree requirements
📐 Semantic structuring: siblings branch along one shared semantic aspect, such as interaction, composition, or style.
🌈 Heterogeneity: each child realizes that aspect in a distinct way, creating meaningful conceptual spread.
✅ Plausibility: every refinement remains consistent with the original prompt and with constraints already fixed along its path.
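The three requirements above can be approximated by a mechanical check on a set of proposed sibling constraints. The sketch below is only a rough proxy: in the method these judgments are semantic, whereas this example compares strings. `Constraint`, `check_siblings`, and the `fixed` mapping are hypothetical names introduced for illustration.

```python
from typing import Dict, List, NamedTuple


class Constraint(NamedTuple):
    aspect: str  # e.g. "style"
    value: str   # e.g. "watercolor"


def check_siblings(proposed: List[Constraint], fixed: Dict[str, str]) -> List[str]:
    """Return the violated requirements; an empty list means the proposal passes."""
    problems = []

    # Semantic structuring: all siblings vary the same aspect.
    if len({c.aspect for c in proposed}) != 1:
        problems.append("semantic structuring: siblings must share one aspect")

    # Heterogeneity: no two siblings realize the aspect identically
    # (string comparison after trimming and lowercasing, as a crude stand-in).
    values = [c.value.strip().lower() for c in proposed]
    if len(set(values)) != len(values):
        problems.append("heterogeneity: sibling values must be distinct")

    # Plausibility: no sibling overrides a constraint already fixed on the path.
    if any(c.aspect in fixed and fixed[c.aspect] != c.value for c in proposed):
        problems.append("plausibility: proposal contradicts a fixed constraint")

    return problems


fixed = {"subject": "animals"}
good = [Constraint("style", "watercolor"), Constraint("style", "linocut")]
print(check_siblings(good, fixed))  # []
```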
Multi-agent workflow
🔎 Context Analyst: separates fixed constraints from mutable scene details, defining a plausible search space for modification.
💡 Brainstormer: groups mutable details into high-level semantic aspects, encouraging structured branches rather than isolated edits.
🎯 Decision Maker: selects one impactful aspect and instantiates it into diverse alternative constraints for sibling nodes.
🛡️ Critic: validates and refines the proposed constraints so they remain faithful, non-contradictory, and clearly distinct.
Multi-agent workflow guiding an iterative JSON generation process. The pipeline takes as inputs the current JSON configuration and a history of constraints derived from previous modifications (including the user prompt). A sequence of agents (Context Analyst, Brainstormer, Decision Maker, and Critic) analyzes these inputs to select an aspect to modify and to formulate specific instructions. The JSON Refiner then translates these instructions into an updated JSON configuration, and the new modifications are added to the constraint set for subsequent iterations.
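One iteration of this pipeline can be sketched with plain functions standing in for the agents. Everything here is an assumption for illustration: the real agents are presumably language-model calls, while these stubs use a hypothetical lookup table (`TOY_OPTIONS`), a flat scene dictionary, and `"key: value"` strings as constraints.

```python
from typing import Any, Dict, List, Tuple

Scene = Dict[str, Any]

# Hypothetical lookup table standing in for model-generated proposals.
TOY_OPTIONS = {
    "style": ["watercolor", "linocut", "watercolor"],  # duplicate left in on purpose
    "composition": ["grid", "circular"],
}


def context_analyst(scene: Scene, history: List[str]) -> List[str]:
    """Separate fixed from mutable details: keys already constrained are fixed."""
    fixed = {entry.split(":", 1)[0].strip() for entry in history}
    return [key for key in scene if key not in fixed]


def brainstormer(mutable: List[str]) -> List[str]:
    """Group mutable details into aspects (toy rule: keys with known options)."""
    return [key for key in mutable if key in TOY_OPTIONS]


def decision_maker(aspects: List[str]) -> List[str]:
    """Select one aspect and instantiate alternative constraints for siblings."""
    aspect = aspects[0]
    return [f"{aspect}: {value}" for value in TOY_OPTIONS[aspect]]


def critic(proposals: List[str], history: List[str]) -> List[str]:
    """Drop duplicate proposals and anything already committed on this branch."""
    seen, kept = set(history), []
    for proposal in proposals:
        if proposal not in seen:
            seen.add(proposal)
            kept.append(proposal)
    return kept


def json_refiner(scene: Scene, instruction: str) -> Scene:
    """Apply one instruction to a shallow copy of the (flat) scene."""
    key, value = (part.strip() for part in instruction.split(":", 1))
    return {**scene, key: value}


def expand_once(scene: Scene, history: List[str]) -> List[Tuple[Scene, List[str]]]:
    """Run the agents in order; return one (scene, history) pair per child."""
    aspects = brainstormer(context_analyst(scene, history))
    if not aspects:
        return []  # nothing left to vary on this branch
    proposals = critic(decision_maker(aspects), history)
    return [(json_refiner(scene, p), history + [p]) for p in proposals]


scene = {"prompt": "A poster featuring animals", "style": "flat", "composition": "centered"}
history = ["prompt: A poster featuring animals"]
for child_scene, child_history in expand_once(scene, history):
    print(child_history[-1], "->", child_scene)
```

Passing a child's scene and history back into `expand_once` mirrors the iterative loop: the style chosen at the first level is treated as fixed, so the next expansion varies composition instead.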
Results
Galleries of semantic alternatives
All images shown are derived from a single initial scene. The outer gray groupings organize results that share a direct common ancestor scene. Inside, the colored boxes distinguish sibling branches: parallel variations that share that same parent...