# VAE

## Variational Autoencoder

{% hint style="info" %}
It is done using a technique called the <mark style="color:red;">**variational autoencoder (VAE)**</mark>.
{% endhint %}

The VAE neural network has two parts:

* An encoder
* A decoder

The encoder compresses an image to a lower dimensional representation in the latent space. The decoder restores the image from the latent space.
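The encode/decode round trip can be illustrated with a toy sketch. This is not the real learned VAE (which uses trained convolutional layers and also changes the channel count); it merely stands in for the idea of mapping an image to a smaller tensor and back, using average pooling as a hypothetical "encoder" and nearest-neighbor upsampling as a hypothetical "decoder":

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((3, 512, 512))  # toy RGB image in pixel space

# Toy "encoder": 8x average pooling per spatial axis (a stand-in for the
# learned convolutional encoder of a real VAE).
def encode(x, factor=8):
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

# Toy "decoder": nearest-neighbor upsampling back to pixel resolution
# (a stand-in for the learned convolutional decoder).
def decode(z, factor=8):
    return z.repeat(factor, axis=1).repeat(factor, axis=2)

latent = encode(image)
restored = decode(latent)
print(latent.shape, restored.shape)  # (3, 64, 64) (3, 512, 512)
```

The point of the sketch is only the shape change: the image is represented by a much smaller tensor, and the decoder maps that tensor back to full pixel resolution.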

<figure><img src="https://3515747285-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fqm1WfU7McQ1hgBrDXi90%2Fuploads%2Fgit-blob-b2dbf1976abcb77c4f426bb97fefa6761da16b11%2Fimage.png?alt=media" alt=""><figcaption><p>Variational autoencoder transforms the image to and from the latent space.</p></figcaption></figure>

*<mark style="color:red;">**The latent space of the Stable Diffusion model is 4x64x64, 48 times smaller than the 3x512x512 image pixel space**</mark>*. All the *<mark style="color:green;">**forward and reverse diffusion**</mark>* processes we talked about are done in the latent space.
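The 48x figure follows directly from the tensor shapes stated above (a 512x512 RGB image versus a 4x64x64 latent):

```python
# Shapes from the text: 3x512x512 pixel space vs. 4x64x64 latent space.
pixel_elems = 3 * 512 * 512    # elements in a 512x512 RGB image
latent_elems = 4 * 64 * 64     # elements in the Stable Diffusion latent tensor
print(pixel_elems // latent_elems)  # 48
```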

During training, *<mark style="color:red;">**instead of generating a noisy image, the model generates a random tensor in the latent space**</mark>* (latent noise). Instead of corrupting an image with noise, it corrupts the image's latent representation with this latent noise. *<mark style="color:red;">**The reason for doing this is speed: the latent space is much smaller, so every diffusion step is much cheaper.**</mark>*
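Corrupting a latent with latent noise can be sketched as a single forward-diffusion step. The variable names and the signal-retention coefficient `alpha` below are illustrative, not taken from the Stable Diffusion codebase:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent representation of one image (4x64x64, as in the text).
latent = rng.standard_normal((4, 64, 64))

# Latent noise: a random tensor with the same shape as the latent.
noise = rng.standard_normal(latent.shape)

# One illustrative noising step: keep sqrt(alpha) of the signal and
# mix in sqrt(1 - alpha) of the noise.
alpha = 0.7
noisy_latent = np.sqrt(alpha) * latent + np.sqrt(1 - alpha) * noise

print(noisy_latent.shape)  # (4, 64, 64)
```

Because this mixing happens on a 4x64x64 tensor rather than a full-resolution image, each training step touches 48 times fewer values.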
