Flow Matching for Generative Modeling (Lipman et al.)

May 15, 2026

-diffusion only allows a narrow set of trajectories to go from noise to data (i.e. iterative, stochastic noising/denoising)

Continuous Normalizing Flows (CNFs) - define a continuous-time dynamical system that transports a simple distribution (e.g. Gaussian noise) to the data distribution

$$\frac{d x_t}{dt} = v_\theta(x_t, t)$$

$x_t$ - the state/sample at time $t$

$v_\theta$ - a neural network that defines a velocity field

-ODE solver used to solve for $x(t)$
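A minimal sketch of what "solving for $x(t)$" can look like, assuming the simplest possible solver (fixed-step Euler) and a 1-D state. `euler_integrate` and `toy_velocity` are hypothetical names; `toy_velocity` is only a stand-in for a trained $v_\theta$, and real implementations typically use higher-order or adaptive solvers.

```python
def euler_integrate(velocity, x0, num_steps=100):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = x0
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        # Move a small step in the direction given by the velocity field.
        x = x + dt * velocity(x, t)
    return x


def toy_velocity(x, t):
    # Stand-in for a trained v_theta (assumption): a constant field that
    # shifts every point by +3 over the unit time interval.
    return 3.0


x1 = euler_integrate(toy_velocity, x0=-1.0)
```

Each Euler step costs one evaluation of the velocity field, so sampling cost scales with `num_steps`.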

-originally, training meant numerically solving the ODE through all time steps from $t=0$ to $t=1$ and updating the NN from a loss computed at the end, instead of supervising it with a known correct direction

-much more computationally expensive, since every update needs a full ODE solve

flow matching - a method to learn the $v_\theta(x_t, t)$ function for CNFs

-supervise the correct overall motion (interpolate $x_t = (1-t)x_0 + tx_1$) for a given image/time step to learn the vector field

-takes advantage of the fact that you know the "total displacement" between noise and sample, and uses that to learn the vector field

score matching - learn the score $\nabla_x \log p(x)$, i.e. "which direction in space increases probability density the fastest"

-in diffusion, used to estimate the gradient of the log-density (needed for the reverse process) without modeling the underlying distribution itself
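To make the "direction of increasing density" idea concrete, here is a small sketch for a 1-D Gaussian, where the score has a closed form. The function names are hypothetical, and the finite-difference check is only there to confirm the analytic formula; it is not how score matching is trained.

```python
import math


def gaussian_log_density(x, mean=0.0, std=1.0):
    """Log of the 1-D Gaussian density N(mean, std^2) at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def gaussian_score(x, mean=0.0, std=1.0):
    """Analytic score d/dx log p(x) of a 1-D Gaussian."""
    return -(x - mean) / std ** 2


def numerical_score(log_density, x, eps=1e-5):
    """Central finite-difference estimate of d/dx log p(x)."""
    return (log_density(x + eps) - log_density(x - eps)) / (2 * eps)


score_right = gaussian_score(2.0)    # negative: points back toward the mean
score_left = gaussian_score(-2.0)    # positive: also points toward the mean
```

On either side of the mean the score points back toward it, which is the direction in which the density grows.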

flow matching - objective designed to match a target probability path, allowing a distribution to flow from $p_0 = \mathcal{N}(x|0, I)$ to some unknown, desired $p_1$ for which there are data samples

-construct intermediate distributions $p_t(x)$, and then learn the vector field that flows from $p_0 \rightarrow p_1$ by sampling data points

$u_t(x)$ - velocity vector field evaluated at position $x$ and time $t$

-FM Objective

-regression between the vector field $u_t(x)$ and the neural network $v_t(x)$

$$\mathcal{L}_{\text{FM}}(\theta) = \mathbb{E}_{t, p_t(x)} || v_t(x) - u_t(x)||^2$$

Process:

1.Sample $x_0 \sim \mathcal{N}(0, I)$ and $x_1 \sim p_1$

2.Sample time $t$ (e.g. uniformly from $[0, 1]$)

3.Calculate sample $x_t = f_t(x_0, x_1)$

-e.g. for linear interpolation, we have $x_t = (1-t)x_0 + tx_1$

4.Calculate $u_t(x) = \frac{d}{dt} f_t(x_0, x_1)$

-e.g. $u_t(x) = x_1 - x_0$ for linear interpolation

5.Compute regression loss $\mathcal{L} = ||v_\theta(x_t, t) - u_t(x)||^2$
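The five steps above can be sketched in plain Python for a 1-D toy problem. Everything specific here is an assumption for illustration: the data distribution `toy_data` ($\mathcal{N}(4, 0.5^2)$), the tiny `LinearVelocity` model standing in for $v_\theta$, the learning rate, and the step count. A real setup would use a neural network and an autodiff framework.

```python
import random


def sample_training_example(sample_data, rng):
    """Steps 1-4: draw (x_0, x_1, t), then build x_t and the target velocity."""
    x0 = rng.gauss(0.0, 1.0)          # step 1: noise sample
    x1 = sample_data(rng)             # step 1: data sample
    t = rng.random()                  # step 2: t uniform in [0, 1)
    xt = (1.0 - t) * x0 + t * x1      # step 3: linear interpolation
    target = x1 - x0                  # step 4: d/dt of the interpolation
    return xt, t, target


class LinearVelocity:
    """Hypothetical stand-in for v_theta: v(x, t) = a*x + b*t + c."""

    def __init__(self):
        self.a, self.b, self.c = 0.0, 0.0, 0.0

    def __call__(self, x, t):
        return self.a * x + self.b * t + self.c

    def sgd_step(self, x, t, target, lr):
        # Gradient of (v - target)^2 with respect to (a, b, c).
        err = self(x, t) - target
        self.a -= lr * 2.0 * err * x
        self.b -= lr * 2.0 * err * t
        self.c -= lr * 2.0 * err


def mean_loss(model, examples):
    """Step 5, averaged over examples: mean squared velocity error."""
    return sum((model(x, t) - u) ** 2 for x, t, u in examples) / len(examples)


def toy_data(rng):
    # Assumed 1-D "dataset": samples from N(4, 0.5^2).
    return rng.gauss(4.0, 0.5)


rng = random.Random(0)
eval_set = [sample_training_example(toy_data, rng) for _ in range(500)]

model = LinearVelocity()
loss_before = mean_loss(model, eval_set)
for _ in range(5000):
    x, t, u = sample_training_example(toy_data, rng)
    model.sgd_step(x, t, u, lr=0.01)
loss_after = mean_loss(model, eval_set)
```

Note that no ODE is solved during training: each update only needs one sampled $(x_0, x_1, t)$ triple and one model evaluation. The loss does not go to zero, because many different $(x_0, x_1)$ pairs pass through the same $x_t$.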

conditional probability path - $p_t(x | x_1)$, the intermediate probability path conditioned on ending at the sample $x_1$

-Motivation: the location/point $x_t$ can lie on many trajectories, so the velocity at $x_t$ is ambiguous unless conditioned on the sample
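A tiny worked example of this ambiguity, assuming straight-line (linear interpolation) paths in 1-D. The helper `linear_path` and the two chosen pairs are made up for illustration.

```python
def linear_path(x0, x1, t):
    """Position and velocity on the straight line from x0 to x1 at time t."""
    position = (1.0 - t) * x0 + t * x1
    velocity = x1 - x0
    return position, velocity


# Two (noise, data) pairs whose straight-line paths cross at t = 0.5.
pos_a, vel_a = linear_path(x0=-1.0, x1=1.0, t=0.5)
pos_b, vel_b = linear_path(x0=1.0, x1=-1.0, t=0.5)

# Both paths pass through the same point with opposite velocities, so a
# model that only sees (x_t, t) cannot match both targets. With these two
# pairs equally likely, the squared-error minimizer is their average.
best_single_prediction = (vel_a + vel_b) / 2.0
```

Here both paths sit at $x = 0$ at $t = 0.5$ with velocities $+2$ and $-2$. Conditioning on the endpoint removes the ambiguity, while a model that sees only $(x_t, t)$ ends up predicting the average.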

marginal probability path - $p_t(x)$, mixture of all conditional probability paths over samples

marginal vector field - $u_t(x)$, expectation of the conditional vector fields $u_t(x | x_1)$ over the samples $x_1$, weighted by how likely each $x_1$ is to have produced $x$ at time $t$
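For reference, a sketch of how these two marginals are written out in the paper, where $q(x_1)$ (a symbol not used elsewhere in these notes) denotes the data distribution:

$$p_t(x) = \int p_t(x | x_1)\, q(x_1)\, dx_1$$

$$u_t(x) = \int u_t(x | x_1)\, \frac{p_t(x | x_1)\, q(x_1)}{p_t(x)}\, dx_1$$

The weight $p_t(x | x_1)\, q(x_1) / p_t(x)$ is the posterior over $x_1$ given that the path passes through $x$ at time $t$. Since $u_t(x)$ is intractable, the paper regresses on the conditional field instead, $\mathcal{L}_{\text{CFM}}(\theta) = \mathbb{E}_{t, q(x_1), p_t(x | x_1)} || v_t(x) - u_t(x | x_1)||^2$, which has the same gradients with respect to $\theta$ as $\mathcal{L}_{\text{FM}}$.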