πŸ“–
Wiki
CNCFSkywardAIHuggingFaceLinkedInKaggleMedium
  • Home
    • πŸš€About
  • πŸ‘©β€πŸ’»πŸ‘©Freesoftware
    • πŸ‰The GNU Hurd
      • πŸ˜„The files extension
      • πŸ“½οΈTutorial for starting
      • 🚚Continue Working for the Hurd
      • πŸš΄β€β™‚οΈcgo
        • πŸ‘―β€β™€οΈStatically VS Dynamically binding
        • 🧌Different ways in binding
        • πŸ‘¨β€πŸ’»Segfault
      • πŸ›ƒRust FFI
    • πŸ§šπŸ»β€β™‚οΈProgramming
      • πŸ“–Introduction to programming
      • πŸ“–Mutable Value Semantics
      • πŸ“–Linked List
      • πŸ“–Rust
        • πŸ“–Keyword dyn
        • πŸ“–Tonic framework
        • πŸ“–Tokio
        • πŸ“–Rust read files
  • πŸ›€οΈAI techniques
    • πŸ—„οΈframework
      • 🧷pytorch
      • πŸ““Time components
      • πŸ““burn
    • 🍑Adaptation
      • 🎁LoRA
        • ℹ️Matrix Factorization
        • πŸ“€SVD
          • ✝️Distillation of SVD
          • 🦎Eigenvalues of a covariance matrix
            • 🧧Eigenvalues
            • πŸͺCovariance Matrix
        • πŸ›«Checkpoint
      • 🎨PEFT
    • πŸ™‹β€β™‚οΈTraining
      • πŸ›»Training with QLoRA
      • 🦌Deep Speed
    • 🧠Stable Diffusion
      • πŸ€‘Stable Diffusion model
      • πŸ“ΌStable Diffusion v1 vs v2
      • πŸ€Όβ€β™€οΈThe important parameters for stunning AI image
      • ⚾Diffusion in image
      • 🚬Classifier Free Guidance
      • ⚜️Denoising strength
      • πŸ‘·Stable Diffusion workflow
      • πŸ“™LoRA(Stable Diffusion)
      • πŸ—ΊοΈDepth maps
      • πŸ“‹CLIP
      • βš•οΈEmbeddings
      • πŸ• VAE
      • πŸ’₯Conditioning
      • 🍁Diffusion sampling/samplers
      • πŸ₯ Prompt
      • πŸ˜„ControlNet
        • πŸͺ‘Settings Explained
        • 🐳ControlNet with models
    • πŸ¦™Large Language Model
      • ☺️SMID
      • πŸ‘¨β€πŸŒΎARM NEON
      • 🍊Metal
      • 🏁BLAS
      • πŸ‰ggml
      • πŸ’»llama.cpp
      • 🎞️Measuring model quality
      • πŸ₯žType for NNC
      • πŸ₯žToken
      • πŸ€Όβ€β™‚οΈDoc Retrieval && QA with LLMs
      • Hallucination(AI)
    • 🐹diffusers
      • πŸ’ͺDeconstruct the Stable Diffusion pipeline
  • 🎹Implementing
    • πŸ‘¨β€πŸ’»diffusers
      • πŸ“–The Annotated Diffusion Model
  • 🧩Trending
    • πŸ“–Trending
      • πŸ“–Vector database
      • 🍎Programming Languages
        • πŸ“–Go & Rust manage their memories
        • πŸ“–Performance of Rust and Python
        • πŸ“–Rust ownership and borrowing
      • πŸ“–Neural Network
        • 🎹Sliding window/convolutional filter
      • Quantum Machine Learning
  • 🎾Courses Collection
    • πŸ“–Courses Collection
      • πŸ“šAcademic In IT
        • πŸ“Reflective Writing
      • πŸ“–UCB
        • πŸ“–CS 61A
          • πŸ“–Computer Science
          • πŸ“–Scheme
          • πŸ“–Python
          • πŸ“–Data Abstraction
          • πŸ“–Object-Oriented Programming
          • πŸ“–Interpreters
          • πŸ“–Streams
      • 🍎MIT Algorithm Courses
        • 0️MIT 18.01
          • 0️Limits and continuity
          • 1️Derivatives
          • 3️Integrals
        • 1️MIT 6.042J
          • πŸ”’Number Theory
          • πŸ“ŠGraph Theory
            • 🌴Graph and Trees
            • 🌲Shortest Paths and Minimum Spanning Trees
        • 2️MIT 6.006
          • Intro and asymptotic notation
          • Sorting and Trees
            • Sorting
            • Trees
          • Hashing
          • Graphs
          • Shortest Paths
          • Dynamic Programming
          • Advanced
        • 3️MIT 6.046J
          • Divide and conquer
          • Dynamic programming
          • Greedy algorithms
          • Graph algorithms
Powered by GitBook
On this page
  • Overview
  • Latent diffusion model
  • Image resolution
  • Manifold hypothesis
  • Why is latent space possible?
  • Reverse diffusion in latent space
  • What is a VAE file?
  • Credit

Was this helpful?

Edit on GitHub
  1. πŸ›€οΈAI techniques
  2. 🧠Stable Diffusion

πŸ€‘Stable Diffusion model

PreviousStable DiffusionNextStable Diffusion v1 vs v2

Last updated 1 year ago

Was this helpful?

Overview

The stable diffusion model in image space is slow because the image space is enormous. For example, one 512x512 image with three color channels is a 786,432-dimensional space (Do not say that many values for ONE image). So, the latent diffusion model is used to solve this.

Latent diffusion model

Stable Diffusion is designed to solve the speed problem.

Stable Diffusion is a latent diffusion model. Instead of operating in the high-dimensional image space, it first compresses the image into the latent space. The latent space is 48 times smaller, so it reaps the benefits of crunching a lot fewer numbers. That's why it's a lot faster.

Image resolution

The image resolution is reflected in the size of the latent image tensor. The size of the latent image is 4x64x64 for 512Γ—512 images only. It is 4x96x64 for a 768Γ—512 portrait image. That's why it takes longer and moves VRAM to generate a larger image.

Since Stable Diffusion v1 is fine-tuned on 512x512 images, generating images larger than 512x512 could result in duplicate objects. For example, the infamous two heads, etc. If you must, keep at least one side to 512 pixels and use an AI upscales for higher resolution.

Manifold hypothesis

The high dimensionality of images is artifactual. Natural images can be readily compressed into the much smaller latent space without losing any information. This is called the manifold hypothesis in machine learning.

Why is latent space possible?

VAE can compress an image into a much smaller latent space without losing information. The reason is that natural images are not random. They are highly regular.

in

Here's how latent reverse diffusion in Stable Diffusion works.

  1. A random latent space matrix is generated

  2. The estimated noise is then subtracted from the latent matrix.

  3. Steps 2 and 3 are repeated up to specific sampling steps.

  4. The decoder of VAE converts the latent matrix to the final image.

What is a VAE file?

VAE files are used in Stable Diffusion v1 to improve eyes and faces. They are the decoder of the autoencoder we just talked about. By further fine-tuning the decoder, the model can paint finer details.

However, compressing an image into the latent space does lose information since the original VAE did not recover the fine details. Instead, the VAE decoder is responsible for painting fine details.

Credit

The estimates the noise of the latent matrix.

latent space
Reverse diffusion
noise predictor
LogoHow does Stable Diffusion work? - Stable Diffusion ArtStable Diffusion Art