Introduction to Conditional Flow Matching – Part I, Normalizing Flows

huet.ing - Introduction to Conditional Flow Matching - Part I, Normalizing Flows

Dec 2024

Generative AI is the study and application of algorithms to generate data following some distribution. The distribution could be of anything, in any modality: images of cats and dogs, samples of human speech, medical imaging data, natural language, etc. The accurate modeling of these distributions has shown great potential in creating commercial value as well as helping to address some fundamental societal problems: generative models are being used to design novel proteins for medicine, to reconstruct medical scans from shorter and safer acquisitions, and to give a synthetic voice back to people who have lost theirs. At their core, all generative techniques share the same goal: to approximate a complex generating distribution using only the samples we have from that distribution.

Over the years, a variety of powerful approaches to achieving this goal have emerged. Some famous examples of these techniques include Generative Adversarial Networks (GANs), autoregressive models, and diffusion models. Each of these methods, while mathematically rich, is grounded in concepts that are relatively straightforward to explain intuitively. GANs approximate the distribution by pitting two networks against each other: a generator, which turns random noise into candidate samples, and a discriminator, which is shown a mix of real training data and the generator's output and must judge which is which. The generator tries to fool the discriminator, while the discriminator tries to catch the generator. Over time, both improve at their respective tasks, resulting in high-quality samples from the generator that accurately represent the underlying training data distribution. Autoregressive models, of which the transformer-based large language models are the most famous example, break the problem of generating a whole sequence into a chain of much smaller ones: predict a distribution over just the next token given everything generated so far, sample from it, append, and repeat. Diffusion models corrupt training samples with noise in many small steps until nothing but noise remains, then train a model to undo the corruption one step at a time; generating means starting from fresh noise and running the learned denoiser step by step.

A more recent arrival is Conditional Flow Matching (CFM), introduced in 2022 and already the engine behind several state-of-the-art image and video generators. In contrast to the techniques mentioned above, it is not so straightforward to give an intuition for what's happening in CFM. At a very high level, it is a way to transform a simple generating distribution into the target distribution by flowing through a continuous set of transformations, with a few tricks thrown in to make the modeling tractable. As that is a little too vague an explanation for anyone's comfort, I've written this article. I aim to provide a thorough, step-by-step explanation of CFM, offering intuition for each part of the journey.

Today we start with Part I on normalizing flows, a method for constructing complex distributions by transforming a simpler distribution through a series of invertible mappings.

Normalizing flows

Before we dive in, it helps to state what we actually need from a generative model. We need two things. First, a way to sample new data points: generation is the whole point, and whatever machinery we build must ultimately let us draw fresh samples that behave as if they came from the data distribution. Second, a way to score how likely a given data point is under the model. This one is less obvious, but it is what makes training possible: if the model can tell us how probable each training example is, then "make the training data probable" becomes an objective we can optimize, and a model under which the real data is likely is precisely a model that has captured the distribution. Normalizing flows are built to deliver exactly these two abilities. They transform a simple distribution (easy to sample from, easy to score) through an invertible transformation, while keeping exact track of what that transformation does to the density. The rest of this article builds up that machinery from scratch.
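The second requirement can be written down compactly. As a sketch in notation this article has not introduced (the model density \(p_\theta\), its parameters \(\theta\), and the training samples \(x_1, \dots, x_N\) are assumed names, and the samples are assumed independent), "make the training data probable" is the usual maximum-likelihood objective:

\[
\theta^\star = \arg\max_\theta \sum_{i=1}^{N} \log p_\theta(x_i)
\]

The logarithm turns the product of per-example densities into a sum, which is easier to optimize and leaves the maximizer unchanged.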

Let's say we have a random variable \(X\) distributed uniformly between 0 and 1: \(X \sim p_x = U(0, 1)\). Its probability density function (PDF) is constant: \(p_x(x) = 1\) for \(x \in [0, 1]\) and \(0\) elsewhere. Any sample between 0 and 1 is equally likely, and the PDF integrates to 1 over the full domain.
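The two abilities we asked for (sampling and scoring) are easy to check numerically for this uniform distribution. The following is a minimal sketch using NumPy; the function name `uniform_pdf`, the seed, and the grid resolution are illustrative choices, not part of the original article.

```python
import numpy as np


def uniform_pdf(x):
    """Density of U(0, 1): 1 inside [0, 1], 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= 0.0) & (x <= 1.0), 1.0, 0.0)


# Sampling: draw points from U(0, 1). The seed is arbitrary.
rng = np.random.default_rng(seed=0)
samples = rng.uniform(0.0, 1.0, size=100_000)

# Scoring: evaluate the density at a point inside and a point outside [0, 1].
inside = uniform_pdf(0.5)
outside = uniform_pdf(1.5)

# Riemann-sum check that the density integrates to (approximately) 1,
# using a grid that extends beyond the support on both sides.
grid = np.linspace(-1.0, 2.0, 300_001)
dx = grid[1] - grid[0]
total_mass = np.sum(uniform_pdf(grid)) * dx

print(inside, outside, total_mass)
```

The integral is only approximated by the Riemann sum, so `total_mass` comes out close to 1 rather than exactly 1.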

But now let's say that after taking our sample \(x\), we transform it by some continuously differentiable invertible function \(\phi(x)\), with a continuously differentiable inverse, to get \(y = \phi(x)\). As an example, let's pick \( \phi(x) = 2x \). This induces by way of our transformation \(\phi(x)\) a new...
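To make the example \(\phi(x) = 2x\) concrete, here is a small numerical sketch. It relies on the standard one-dimensional change-of-variables formula, \(p_y(y) = p_x(\phi^{-1}(y)) \left| \frac{d\phi^{-1}}{dy}(y) \right|\), which is stated here as an assumption about where the derivation is heading rather than as the article's own notation. The names `p_y` and `phi_inv` are likewise illustrative.

```python
import numpy as np


def phi(x):
    """The example transformation from the text: phi(x) = 2x."""
    return 2.0 * x


def phi_inv(y):
    """Inverse of phi."""
    return y / 2.0


def p_x(x):
    """Density of U(0, 1)."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= 0.0) & (x <= 1.0), 1.0, 0.0)


def p_y(y):
    """Density of y = phi(x) via the change-of-variables formula.

    For phi(x) = 2x the inverse has derivative 1/2 everywhere, so the
    density is halved while the support is stretched from [0, 1] to [0, 2].
    """
    return p_x(phi_inv(y)) * 0.5


# Sample x ~ U(0, 1) and push the samples through phi. The seed is arbitrary.
rng = np.random.default_rng(seed=0)
x = rng.uniform(0.0, 1.0, size=200_000)
y = phi(x)

# Compare a normalized histogram of the transformed samples with p_y.
hist, edges = np.histogram(y, bins=20, range=(0.0, 2.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_gap = np.max(np.abs(hist - p_y(centers)))

print(p_y(1.0), max_gap)
```

Stretching the interval to twice its length spreads the same total probability over twice the width, so the density has to drop to one half for the area to remain 1. The histogram only agrees with `p_y` up to sampling noise.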
