Deep Dives May 10, 2024 15 min read

Mathematical Intuition behind Diffusion Models

At their core, diffusion models are about destroying and reconstructing information. We take an image and slowly add Gaussian noise until it is pure static. The model learns the reverse process: how to take static and denoise it step-by-step into a coherent image.

The Forward Process

We can model the forward diffusion process as a Markov chain. Let $x_0$ be the original image. At each step $t$, we add a small amount of Gaussian noise:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\big)$$

where $\beta_t$ is a variance schedule. As $t \to \infty$, $x_t$ approaches an isotropic Gaussian distribution $\mathcal{N}(0, \mathbf{I})$.
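The forward chain can be sketched in a few lines of NumPy. This is a toy illustration, not a training pipeline: the 8×8 array stands in for an image, and the linear schedule values are arbitrary assumptions chosen so the chain visibly converges to noise.

```python
import numpy as np

def forward_step(x_prev, beta_t, rng):
    """One step of the forward Markov chain:
    q(x_t | x_{t-1}) = N(x_t; sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    eps = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * eps

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))       # toy stand-in for an image x_0
betas = np.linspace(1e-4, 0.5, 200)   # illustrative variance schedule
for beta in betas:
    x = forward_step(x, beta, rng)
# After many steps, x is approximately distributed as N(0, I)
```

Because each step only shrinks the signal by $\sqrt{1-\beta_t}$ and injects fresh Gaussian noise, whatever structure $x_0$ had is washed out geometrically fast.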

The Reverse Process

The goal is to learn the reverse distribution $p_\theta(x_{t-1} \mid x_t)$. Since the exact reverse is intractable, we approximate it with a neural network (usually a U-Net):

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big)$$
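Sampling from this learned chain is just ancestral sampling: start from pure noise and repeatedly draw $x_{t-1}$ from the Gaussian above. A minimal sketch, where `mu_theta` is a hypothetical stand-in for the trained U-Net's predicted mean and the sigma schedule is an illustrative assumption:

```python
import numpy as np

def reverse_step(x_t, mu_theta, sigma_t, rng):
    """One ancestral-sampling step of the learned reverse chain:
    p_theta(x_{t-1} | x_t) = N(x_{t-1}; mu_theta(x_t), sigma_t^2 * I)."""
    z = rng.standard_normal(x_t.shape) if sigma_t > 0 else np.zeros_like(x_t)
    return mu_theta(x_t) + sigma_t * z

# Illustrative placeholder for the network's predicted mean (NOT a real model).
mu_theta = lambda x: 0.99 * x

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))            # start from pure static, x_T ~ N(0, I)
for sigma in np.linspace(0.1, 0.0, 50):    # decaying noise scale; last step deterministic
    x = reverse_step(x, mu_theta, sigma, rng)
```

In a real sampler, `mu_theta` comes from the trained network and the per-step variance is derived from the forward schedule $\beta_t$; the loop structure, however, is exactly this.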

[Figure: diffusion process diagram]

This simple idea connects thermodynamics (non-equilibrium statistical physics) with modern deep learning.

Thanks for reading.