AsymFlow: Turning Latent Diffusion Models into Pixel-Space Generators

AsymFlow Claims More Realistic AI Images by Moving Beyond Latent Diffusion - Firethering

Sunday, May 17, 2026

via: AsymFlow Github Repo

By Mohit Geryani

May 17, 2026

Last updated: May 17, 2026

At some point the field quietly agreed that pixel space was too hard and moved on.

Stable Diffusion, FLUX, every serious text-to-image model you’ve used in the last three years works in latent space. Instead of generating actual pixels directly, these models compress images into a smaller mathematical representation, do all the expensive work there, then decompress back to pixels at the end. It’s faster, it’s cheaper to train, and it made the current generation of image models possible.
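
To put rough numbers on "a smaller mathematical representation", the sketch below counts values before and after encoding. The configurations are assumptions for illustration only: 8x spatial downsampling with 4 latent channels (a Stable Diffusion v1-style autoencoder) and the same downsampling with 16 channels (the kind of wider latent used by newer models such as FLUX.1). The function name is hypothetical.

```python
def latent_compression_ratio(height, width, channels=3, downsample=8, latent_channels=4):
    """Ratio of raw pixel values to latent values for an assumed autoencoder config."""
    pixel_values = height * width * channels
    latent_values = (height // downsample) * (width // downsample) * latent_channels
    return pixel_values / latent_values


# 256x256 RGB image through an assumed 8x-downsampling, 4-channel autoencoder
print(latent_compression_ratio(256, 256))  # 48.0
# Same image through an assumed 8x-downsampling, 16-channel autoencoder
print(latent_compression_ratio(256, 256, latent_channels=16))  # 12.0
```

Fewer values does not automatically mean lost information, but at a 12x to 48x reduction, autoencoders for natural images are lossy in practice.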

The cost is subtle but noticeable. That compression step loses information. Fine textures, sharp edges, and precise details, the things that live at the pixel level, get smoothed over in ways latent models can never fully recover, because by the time they’re generating, those details are already gone.
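
A toy way to see why detail lost at encoding cannot come back at decoding: the sketch below uses 2x2 average pooling as a stand-in "encoder" and nearest-neighbour upsampling as a stand-in "decoder". Real latent models use learned autoencoders, not pooling, so this only illustrates the principle; all function names are hypothetical.

```python
import numpy as np

def downsample(img, factor):
    # Average-pool non-overlapping factor x factor blocks (crude stand-in for an encoder).
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(latent, factor):
    # Nearest-neighbour upsampling (crude stand-in for a decoder).
    return np.repeat(np.repeat(latent, factor, axis=0), factor, axis=1)

def round_trip_mse(img, factor):
    # Mean squared error between the image and its encode/decode reconstruction.
    recon = upsample(downsample(img, factor), factor)
    return float(np.mean((img - recon) ** 2))


checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)  # finest possible texture
flat = np.full((8, 8), 0.5)                                    # no fine detail at all
print(round_trip_mse(checker, 2))  # 0.25: the texture is averaged away to uniform grey
print(round_trip_mse(flat, 2))     # 0.0: smooth content survives
```

No decoder, however clever, can tell from the pooled values alone which checkerboard phase the original had; that information was discarded before decoding began.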

Researchers at Stanford just published a way around this. AsymFlow doesn’t ask you to abandon your latent model or train a pixel model from scratch. It takes what you already have and converts it. And the result beats the latent model it started from.

The asymmetric trick that changes the math

Standard flow models predict velocity: essentially the direction and speed the model should move from noise toward a clean image. The problem in pixel space is that predicting velocity means predicting both the data term and the noise term at full pixel resolution simultaneously. That’s an enormous amount of work for a transformer, most of which is spent modeling high-dimensional noise that doesn’t carry much useful information anyway.
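
For readers who want the math behind "velocity", here is a minimal flow-matching sketch. It assumes the common straight-line (rectified flow) interpolation with t = 0 as pure noise and t = 1 as the clean image; conventions differ between codebases, and the function names are illustrative. The point it shows is that the velocity target mixes the data term and the noise term, and both have the full shape of the image.

```python
import numpy as np

def interpolate(x0, eps, t):
    # Point on the straight path between noise (t = 0) and the clean image (t = 1).
    return t * x0 + (1.0 - t) * eps

def target_velocity(x0, eps):
    # Time derivative of interpolate(): data term minus noise term, full image shape.
    return x0 - eps


rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 64, 64))  # stand-in "clean image" (C, H, W)
eps = rng.standard_normal(x0.shape)    # Gaussian noise with the same shape
t = 0.3
x_t = interpolate(x0, eps, t)
v = target_velocity(x0, eps)

# The path is straight, so one Euler step from t to 1 with the exact velocity lands on x0.
print(np.allclose(x_t + (1.0 - t) * v, x0))  # True
print(v.size)  # 12288 values to predict per image
```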

AsymFlow splits that prediction asymmetrically. The data term stays full-dimensional because that’s where the actual image lives. The noise term gets restricted to a low-rank subspace, a mathematically smaller representation that captures the essential noise structure without the computational overhead of full pixel prediction. From those two asymmetric predictions, the full velocity gets recovered analytically without changing the network architecture or the training procedure.
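
The sketch below is one simple way to picture the asymmetric split; it is not the paper’s formulation, which the article does not spell out. Everything here is a hypothetical assumption: the same straight-line convention as above (v = x0 - eps), a fixed orthonormal basis U for the low-rank noise subspace, and a network that outputs a full-dimensional data estimate plus only r noise coefficients. The velocity is then assembled in closed form, with no extra learned parameters.

```python
import numpy as np

def low_rank_basis(dim, rank, seed=0):
    # Hypothetical fixed noise subspace: orthonormal columns from a QR decomposition.
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, rank)))
    return q  # shape (dim, rank)

def assemble_velocity(x0_hat, noise_coeffs, basis):
    # x0_hat: full-dimensional data prediction, shape (dim,)
    # noise_coeffs: low-rank noise prediction, shape (rank,)
    eps_hat = basis @ noise_coeffs  # lift the r coefficients back to pixel space
    return x0_hat - eps_hat         # v = x0 - eps under the assumed convention


dim, rank = 3 * 32 * 32, 16
U = low_rank_basis(dim, rank)
rng = np.random.default_rng(1)
x0 = rng.standard_normal(dim)
eps = U @ rng.standard_normal(rank)  # toy noise that lies inside the subspace

# With perfect predictions, the assembled velocity matches the true one.
v = assemble_velocity(x0, U.T @ eps, U)
print(np.allclose(v, x0 - eps))  # True
print(f"noise values predicted: {rank} instead of {dim}")
```

In this toy, noise outside the subspace is simply not modeled; how AsymFlow actually treats that residual is a detail to check in the paper and repository.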

The practical result is a model that does meaningful work in pixel space without paying the full computational cost that made pixel generation impractical in the first place. Think of it as finding the part of noise prediction that actually matters and ignoring the rest.

On ImageNet 256×256, this approach hits 1.57 FID, the best result among pixel diffusion models in the DiT and JiT family by a clear margin.
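
FID (Fréchet Inception Distance) compares the mean and covariance of Inception-network features for generated and real images; lower is better. The sketch below computes only the final Fréchet distance between two Gaussians, FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)), assuming the feature statistics have already been extracted. It uses a symmetric eigendecomposition instead of a general matrix square root, and the function names are hypothetical.

```python
import numpy as np

def _sqrtm_psd(mat):
    # Square root of a symmetric positive semi-definite matrix via eigendecomposition.
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(mu1, sigma1, mu2, sigma2):
    diff = mu1 - mu2
    root1 = _sqrtm_psd(sigma1)
    # Tr((S1 S2)^(1/2)) equals the trace of the root of the symmetric PSD matrix S1^(1/2) S2 S1^(1/2).
    middle = root1 @ sigma2 @ root1
    middle = (middle + middle.T) / 2.0  # symmetrize against round-off
    tr_covmean = np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(middle), 0.0, None)))
    fid = diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean
    return max(float(fid), 0.0)  # guard against tiny negative round-off


mu = np.zeros(2)
cov = np.array([[2.0, 0.5], [0.5, 1.0]])
print(round(frechet_distance(mu, cov, mu, cov), 6))  # 0.0: identical statistics
print(round(frechet_distance(mu, np.eye(2), mu + np.array([3.0, 4.0]), np.eye(2)), 6))  # 25.0
```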

Surpassing FLUX.2 klein on its own benchmarks

via: AsymFlow Github Repo

Finetuned from FLUX.2 klein 9B, AsymFLUX.2 klein is the pixel-space version of a model that already has serious capabilities. The finetuning works because AsymFlow aligns the latent space mathematically to a low-rank pixel subspace before training starts. The pixel model begins with the latent model’s full understanding of text, composition, and structure already intact. Finetuning then corrects the low-level detail that latent compression lost.

On HPSv3, which measures human preference for image quality and aesthetics, AsymFLUX.2 klein scores 10.66 against FLUX.2 klein base at 9.50. On DPG-Bench, which tests prompt adherence, it scores 86.8 against the base’s 85.2. On GenEval, 0.82 versus 0.80.
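
The gaps are easier to judge side by side. The snippet below only redoes the arithmetic on the scores reported above (higher is better on all three benchmarks); the dictionary layout is just an illustration.

```python
# Reported scores: (FLUX.2 klein base, AsymFLUX.2 klein)
scores = {
    "HPSv3": (9.50, 10.66),
    "DPG-Bench": (85.2, 86.8),
    "GenEval": (0.80, 0.82),
}

def improvements(table):
    # Absolute and relative (%) change from the base model to the pixel-space model.
    return {
        name: (new - base, 100.0 * (new - base) / base)
        for name, (base, new) in table.items()
    }


for name, (absolute, relative) in improvements(scores).items():
    print(f"{name}: +{absolute:.2f} ({relative:.1f}%)")
# HPSv3: +1.16 (12.2%)
# DPG-Bench: +1.60 (1.9%)
# GenEval: +0.02 (2.5%)
```

In relative terms, the human-preference gain is the largest of the three.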

Those aren’t huge gaps, but the direction matters. A pixel model finetuned from a latent base is beating that latent base on its own evaluation benchmarks. The detail and texture improvements you’d expect from pixel-space generation are showing up in the scores.

For context, FLUX.1 dev, a much larger and more established model, sits at 10.43 on HPSv3. AsymFLUX.2 klein is above that.

What this means if you already have a latent...
