Flow Matching for Generative Modeling (Lipman et al.)

May 15, 2026

-diffusion only allows a narrow set of trajectories to go from noise to data (i.e. iterative, stochastic noising/denoising)

Continuous Normalizing Flows (CNFs) - define a continuous-time dynamical system that transports a simple distribution (e.g. Gaussian noise) to the data distribution

$$\frac{d x_t}{dt} = v_\theta(x_t, t)$$
-$x_t$ - the state/sample at time $t$
-$v_\theta$ - a neural network that defines a velocity field
-ODE solver used to solve for $x(t)$
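-originally trained by simulation: solve the ODE through all time steps between 0 and 1, then update the NN based on a loss at the end of the trajectory, instead of regressing onto a known correct direction
-much more computationally expensive, since every parameter update requires a full ODE solve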
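To make "ODE solver used to solve for $x(t)$" concrete, here is a minimal sketch of sampling by integrating the velocity field from $t=0$ to $t=1$. Fixed-step Euler is just the simplest choice, not something the paper prescribes, and `euler_sample`, `toy_velocity`, and `target` are hypothetical names; the toy field stands in for a trained $v_\theta$.

```python
import numpy as np

def euler_sample(velocity, x0, n_steps=100):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt  # t takes values 0, dt, ..., 1 - dt (never exactly 1)
        x = x + dt * velocity(x, t)
    return x

# Toy field (an assumption for illustration, not a trained network):
# every point moves along a straight line toward `target`.
target = np.array([2.0, -1.0])

def toy_velocity(x, t):
    return (target - x) / (1.0 - t)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 2))     # samples from the simple distribution
x1 = euler_sample(toy_velocity, x0)  # every row ends up at `target`
```

With a learned network, `toy_velocity` would be replaced by a call to $v_\theta$; simulation-based training would additionally have to differentiate through this whole loop.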

flow matching - a method to learn the $v_\theta(x_t, t)$ function for CNFs

-supervise the correct overall motion (interpolate $x_t = (1-t)x_0 + tx_1$) for a given image/time step to learn the vector field
-takes advantage of the fact that you know the "total displacement" between noise and sample, and uses that to learn the vector field

score matching - "which direction in space increases probability density the fastest"

-in diffusion, used to estimate the gradient of the log probability density (the score, needed for the reverse process) without modeling the underlying distribution itself
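As a small illustration of what a score is, the sketch below uses a case where it is known in closed form: for an isotropic Gaussian $\mathcal{N}(\mu, \sigma^2 I)$ the score is $-(x - \mu)/\sigma^2$. The function names are hypothetical, and the finite-difference helper is only there to check the closed form; score matching itself learns this quantity from samples.

```python
import numpy as np

def gaussian_log_density(x, mu, sigma):
    """Log-density of N(mu, sigma^2 I) evaluated at x."""
    d = x.size
    return (-0.5 * np.sum((x - mu) ** 2) / sigma**2
            - 0.5 * d * np.log(2 * np.pi * sigma**2))

def gaussian_score(x, mu, sigma):
    """Score = gradient of the log-density with respect to x."""
    return -(x - mu) / sigma**2

def numerical_grad(f, x, eps=1e-5):
    """Central finite differences, used only to check the closed form."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

x = np.array([1.0, -2.0])
mu = np.zeros(2)
sigma = 1.5
score = gaussian_score(x, mu, sigma)  # points from x back toward the mean
check = numerical_grad(lambda z: gaussian_log_density(z, mu, sigma), x)
```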

flow matching - objective designed to match a target probability path, allowing a distribution to flow from $p_0 = \mathcal{N}(x | 0, I)$ to some unknown, desired $p_1$ for which there are data samples

-construct intermediate distributions $p_t(x)$, and then learn the vector field that flows from $p_0 \rightarrow p_1$ by sampling data points
-$u_t(x)$ - velocity vector field evaluated at position $x$ and time $t$
-FM Objective
-regression between the vector field $u_t(x)$ and the neural network $v_t(x)$
$$\mathcal{L}_{\text{FM}}(\theta) = \mathbb{E}_{t, p_t(x)} \| v_t(x) - u_t(x) \|^2$$

Process:

1. Sample $x_0 \sim \mathcal{N}(0, I)$ and $x_1 \sim p_1$
2. Sample time $t$ (e.g. uniformly from $[0, 1]$)
3. Calculate sample $x_t = f_t(x_0, x_1)$
   1. e.g. for linear interpolation, we have $x_t = (1-t)x_0 + tx_1$
4. Calculate $u_t(x) = \frac{d}{dt} f_t(x_0, x_1)$
   1. e.g. $u_t(x) = x_1 - x_0$ for linear interpolation
5. Compute regression loss
   $$\mathcal{L} = \| v_\theta(x_t, t) - u_t(x) \|^2$$
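The five steps above can be sketched as a single loss function for the linear-interpolation case. This is a minimal NumPy sketch, not the paper's implementation: `flow_matching_loss` is a hypothetical name, `v` stands in for the network $v_\theta$, and the single-point dataset `c` is a toy assumption chosen because the exact velocity field is then known, $(c - x_t)/(1 - t)$.

```python
import numpy as np

def flow_matching_loss(v, x1, rng):
    """Monte Carlo estimate of the regression loss for the linear path.

    v  : callable (x_t, t) -> predicted velocity, same shape as x_t
    x1 : data samples, shape (batch, dim)
    """
    x0 = rng.standard_normal(x1.shape)                 # step 1: noise sample
    t = rng.uniform(0.0, 1.0, size=(x1.shape[0], 1))   # step 2: one time per sample, in [0, 1)
    xt = (1.0 - t) * x0 + t * x1                       # step 3: linear interpolation
    ut = x1 - x0                                       # step 4: d/dt of the interpolation
    return np.mean(np.sum((v(xt, t) - ut) ** 2, axis=1))  # step 5: squared error

# Toy check (an assumption for illustration): all data sits at one point c,
# so x1 - x0 = (c - x_t) / (1 - t) and that field should get ~zero loss.
c = np.array([2.0, -1.0])
x1 = np.tile(c, (256, 1))
rng = np.random.default_rng(0)
loss_exact = flow_matching_loss(lambda x, t: (c - x) / (1.0 - t), x1, rng)
loss_zero = flow_matching_loss(lambda x, t: np.zeros_like(x), x1, rng)
```

In training, `v` would be a network and this loss would be minimized by gradient descent; no ODE solve is needed to compute it.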

conditional probability path - $p_t(x | x_1)$, intermediate probability path conditioned on ending at the sample $x_1$

-Motivation: the location/point $x_t$ can be on many trajectories, making it ambiguous unless conditioned on the sample
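A tiny 1D example of this ambiguity, using the linear interpolation from the Process section: two different (noise, data) pairs cross the same point at $t = 0.5$ with opposite velocities. `linear_path` is a hypothetical helper, and the equal-weight average at the end is an assumption for illustration; in general each pair is weighted by how likely it is.

```python
import numpy as np

def linear_path(x0, x1, t):
    """Position and velocity on the straight line from x0 (t=0) to x1 (t=1)."""
    xt = (1.0 - t) * x0 + t * x1
    ut = x1 - x0
    return xt, ut

t = 0.5
xa, ua = linear_path(-1.0, 1.0, t)   # noise -1 -> data +1
xb, ub = linear_path(1.0, -1.0, t)   # noise +1 -> data -1

# Both trajectories are at the same point at t = 0.5 but move in opposite
# directions, so the target velocity there depends on which x1 we condition on.
same_point = np.isclose(xa, xb)
avg_velocity = 0.5 * (ua + ub)  # equal-weight average of the two conditional targets
```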

marginal probability path - $p_t(x)$, mixture of all conditional probability paths over samples

marginal vector field - $u_t(x)$, expectation of all conditional vector fields over samples
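Written out, these two definitions take the following form in the paper, where $q(x_1)$ denotes the data distribution (what these notes sample from as $x_1 \sim p_1$) and $u_t(x | x_1)$ is the conditional vector field that generates $p_t(x | x_1)$:

$$p_t(x) = \int p_t(x | x_1)\, q(x_1)\, dx_1$$

$$u_t(x) = \int u_t(x | x_1)\, \frac{p_t(x | x_1)\, q(x_1)}{p_t(x)}\, dx_1$$

The marginal $u_t(x)$ is intractable to compute directly, so the paper regresses onto the conditional field instead:

$$\mathcal{L}_{\text{CFM}}(\theta) = \mathbb{E}_{t,\, q(x_1),\, p_t(x | x_1)} \| v_t(x) - u_t(x | x_1) \|^2$$

The paper shows that $\mathcal{L}_{\text{FM}}$ and $\mathcal{L}_{\text{CFM}}$ have the same gradients with respect to $\theta$, which is why the per-sample target in the Process section is enough to learn the marginal field.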