🚬Classifier Free Guidance

Overview

The classifier-free guidance (CFG) scale is a value AI artists tinker with every day.

Classifier guidance

Classifier guidance is a way to incorporate image labels in diffusion models. You can use a label to guide the diffusion process. For example, the label "cat" steers the reverse diffusion process to generate photos of cats.

The classifier guidance scale is a parameter for controlling how closely the diffusion process should follow the label.

Consider an example. Suppose there are 3 groups of images, labeled "cat", "dog" and "human". If the diffusion is unguided, the model draws samples from the total population of all three groups, and it may sometimes draw images that fit two labels, e.g. a boy petting a dog.

With high classifier guidance, the images produced by the diffusion model would be biased toward the extreme or unambiguous examples. If you ask the model for a cat, it will return an image that is unambiguously a cat and nothing else.

In the figure above, the sampling on the right has a higher classifier guidance scale than the one in the middle. In practice, the scale is simply a multiplier on the drift term that pulls samples toward the data with that label.
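The "multiplier" idea can be sketched in a few lines. This is a minimal illustration, not any library's implementation: the function name `guided_score` and the toy numbers are made up, and it assumes the guidance is applied by adding the classifier's gradient, scaled by the guidance scale, to the model's unguided score.

```python
import numpy as np

def guided_score(score, classifier_grad, guidance_scale):
    """Add the scaled classifier gradient to the unguided score.

    score:           the diffusion model's unguided estimate (toward all data)
    classifier_grad: the classifier's gradient (toward data with the label)
    guidance_scale:  multiplier on the pull toward the label
    """
    return score + guidance_scale * classifier_grad

# Toy 2-D values, invented for illustration only.
score = np.array([0.5, -1.0])
classifier_grad = np.array([1.0, 2.0])

unguided = guided_score(score, classifier_grad, 0.0)  # label is ignored
strong = guided_score(score, classifier_grad, 5.0)    # label dominates
```

With a scale of 0 the classifier term vanishes and sampling is unguided. As the scale grows, the label's pull outweighs the unguided term, which matches the bias toward unambiguous examples described above.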

Classifier-free guidance

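Classifier guidance needs an extra model, the classifier, to provide that guidance. Training this extra model has presented some difficulties, since it must classify the noisy intermediate images of the diffusion process.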

Classifier-free guidance, in its authors' terms, is a way to achieve "classifier guidance without a classifier". Instead of using a separate model for guidance, they proposed training a conditional diffusion model that takes the conditioning signal directly as input. In text-to-image, that signal is the image caption.

They fold the classifier's role into the conditioning of the noise predictor U-Net, achieving the so-called "classifier-free" (i.e. without a separate image classifier) guidance in image generation.
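One detail this page does not spell out: for guidance to work later, the same U-Net must also be able to predict noise without any conditioning. A common way to get this is to randomly blank out the caption for a fraction of training examples. The sketch below assumes that scheme; the helper name `maybe_drop_caption`, the empty string as the "no caption" value, and the 10% rate are illustrative assumptions, not values taken from this page.

```python
import random

def maybe_drop_caption(caption, p_uncond=0.1, rng=random):
    """Return the caption, or an empty caption with probability p_uncond.

    Training on a mix of captioned and blank examples lets one network
    learn both the conditioned and the unconditioned noise prediction.
    """
    return "" if rng.random() < p_uncond else caption

# Hypothetical training batch: each caption is independently kept or blanked.
captions = ["a photo of a cat", "a boy petting a dog"]
batch = [maybe_drop_caption(c) for c in captions]
```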

The text prompt provides this guidance in text-to-image.

CFG value

With classifier-free guidance, how do you control how closely the diffusion process follows the conditioning?

The classifier-free guidance (CFG) scale is a value that controls how much the text prompt conditions the diffusion process. When it is set to 0, the image generation is unconditioned (i.e. the prompt is ignored). A higher value steers the diffusion more strongly towards the prompt.
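A minimal sketch of how the scale is typically applied at each sampling step, under the convention used here (0 means the prompt is ignored). The function name `cfg_noise` and the toy arrays are hypothetical; in a real sampler the two inputs would be the U-Net's noise predictions with an empty prompt and with the actual prompt.

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, cfg_scale):
    """Blend unconditioned and conditioned noise predictions.

    Starts from the unconditioned prediction and moves along the
    direction (eps_cond - eps_uncond) by a factor of cfg_scale.
    """
    return eps_uncond + cfg_scale * (eps_cond - eps_uncond)

# Toy predictions, invented for illustration only.
eps_uncond = np.array([0.2, -0.4])  # prompt ignored
eps_cond = np.array([0.6, 0.0])     # prompt used

at_zero = cfg_noise(eps_uncond, eps_cond, 0.0)   # equals eps_uncond
at_one = cfg_noise(eps_uncond, eps_cond, 1.0)    # equals eps_cond
at_seven = cfg_noise(eps_uncond, eps_cond, 7.0)  # pushed past eps_cond
```

Under this convention a scale of 1 reproduces the plain conditioned prediction, and values above 1 extrapolate beyond it, which is what pushes images toward the prompt. Some formulations parameterize the scale differently, so the same behavior may sit at a different number.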
