Diffusion in image

The diffusion process in image generation.


Diffusion model

Its math looks very much like diffusion in physics, which is why it is called a diffusion model.

Stable Diffusion is a text-to-image latent diffusion model. It is called a latent diffusion model because it works with a lower-dimensional representation of the image instead of the actual pixel space, which makes it more memory-efficient.

An encoder compresses the image into a smaller latent representation, and a decoder converts the compressed representation back into an image.
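
As a minimal sketch of that round trip, assuming the diffusers library and the SD v1.5 checkpoint name (the 0.18215 scaling factor is SD v1's convention):

```python
import torch
from diffusers import AutoencoderKL

# Load only the VAE (encoder + decoder) from a Stable Diffusion checkpoint.
vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
)

# A dummy 512x512 RGB image batch, values normalized to [-1, 1].
pixels = torch.randn(1, 3, 512, 512)

with torch.no_grad():
    # Encode: 1x3x512x512 pixels -> 1x4x64x64 latents (48x fewer values).
    latents = vae.encode(pixels).latent_dist.sample() * 0.18215
    # Decode: latents -> reconstructed image.
    reconstructed = vae.decode(latents / 0.18215).sample

print(latents.shape)        # torch.Size([1, 4, 64, 64])
print(reconstructed.shape)  # torch.Size([1, 3, 512, 512])
```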

For text-to-image models, you'll also need a tokenizer and a text encoder to generate text embeddings.
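
A minimal sketch of that step, assuming the transformers library and the CLIP tokenizer/encoder shipped with the SD v1.5 repo:

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="tokenizer"
)
text_encoder = CLIPTextModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="text_encoder"
)

# Tokenize the prompt, padded to CLIP's fixed context length (77 tokens).
tokens = tokenizer(
    "a photo of a cat",
    padding="max_length",
    max_length=tokenizer.model_max_length,
    return_tensors="pt",
)
with torch.no_grad():
    embeddings = text_encoder(tokens.input_ids).last_hidden_state

print(embeddings.shape)  # torch.Size([1, 77, 768])
```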

What can Stable Diffusion do?

Stable Diffusion is a text-to-image deep-learning model: give it a text prompt, and it generates an image matching that prompt.
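
A minimal usage sketch with the diffusers StableDiffusionPipeline (the checkpoint name is one common choice, and a CUDA GPU is assumed; drop the dtype and `.to("cuda")` for CPU):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# One prompt in, one image out.
image = pipe("a photo of a cat wearing a tiny hat").images[0]
image.save("cat.png")
```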

Training part

Forward diffusion

A forward diffusion process adds noise to a training image, gradually turning it into an unrecognizable noise image. The forward process will turn any cat or dog image into a noise image. Eventually, you won't be able to tell whether it was initially a dog or a cat.

It's like a drop of ink falling into a glass of water. The ink diffuses in the water, and after a few minutes it has randomly distributed itself throughout the glass. You can no longer tell whether it initially fell at the center or near the rim.

Example of forward diffusion of a cat image
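
A minimal numeric sketch of forward diffusion using the closed-form DDPM noising step; the linear beta schedule is an assumption (DDPM's default), and `image` is a stand-in tensor rather than a real photo:

```python
import torch

T = 1000                                        # total diffusion steps
betas = torch.linspace(1e-4, 0.02, T)           # assumed linear noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal level

def add_noise(x0, t):
    """Jump from the clean image x0 straight to its noisy version at step t."""
    noise = torch.randn_like(x0)
    return alpha_bars[t].sqrt() * x0 + (1 - alpha_bars[t]).sqrt() * noise

image = torch.randn(3, 64, 64)      # stand-in for a normalized cat photo
slightly_noisy = add_noise(image, 50)
pure_noise = add_noise(image, 999)  # at large t the cat is unrecognizable
```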

Reverse diffusion

The main idea of reverse diffusion is that, starting from a noisy, meaningless image, the model recovers a cat or a dog image.

Summary of the diffusion process

Every diffusion process has two parts:

  • Drift or directed motion

  • Random motion

The reverse diffusion drifts towards either cat images or dog images, but nothing in between. That's why the result is either a cat or a dog.

How training is done

I agree this is a million-dollar question. To reverse the diffusion, we need to know how much noise was added to an image. The answer is to teach a neural network model to predict the added noise. This noise predictor is a U-Net model in Stable Diffusion.

Noise predictor

Here is the training process for the noise predictor:

  1. Pick a training image, like a photo of a cat.

  2. Generate a random noise image.

  3. Corrupt the training image by adding the noise image, up to a certain number of steps.

  4. Teach the noise predictor to tell us how much noise was added. This is done by tuning its weights and showing it the correct answer.
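
A hedged sketch of one such training step under the DDPM-style objective; `model` stands in for the U-Net, and the schedule is the same assumed linear one as in the forward-diffusion sketch:

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # assumed linear schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

def training_step(model, x0, optimizer):
    """One step: corrupt x0 (steps 1-3), then regress the added noise (step 4)."""
    t = torch.randint(0, T, ())                 # random number of noising steps
    noise = torch.randn_like(x0)                # the random noise image
    x_t = alpha_bars[t].sqrt() * x0 + (1 - alpha_bars[t]).sqrt() * noise
    predicted = model(x_t, t)                   # predict how much noise was added
    loss = F.mse_loss(predicted, noise)         # "show it the correct answer"
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```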

Noise is sequentially added at each step, and the noise predictor estimates the total noise added up to each step.

After training, we have a noise predictor capable of estimating the noise added to an image.

Noise predictor in reverse diffusion

Reverse diffusion works by successively subtracting the predicted noise from the image:

  1. Generate a completely random image and ask the noise predictor to estimate its noise.

  2. Subtract the estimated noise from the image.

  3. Repeat this process a number of times.

We will get an image of either a cat or a dog. There is no control over whether it is a cat or a dog (this generation is unconditioned); see conditioning for more detail.
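
A minimal sketch of this loop as the simplified DDPM sampler, reusing the assumed schedule from the earlier sketches (real samplers differ in the exact update rule):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # assumed linear schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

def sample(model, shape=(1, 3, 64, 64)):
    """Run reverse diffusion from pure noise down to a clean image."""
    x = torch.randn(shape)                      # 1. completely random image
    for t in reversed(range(T)):
        predicted_noise = model(x, t)           # ask the noise predictor
        alpha_t = 1.0 - betas[t]
        # 2. subtract the scaled estimated noise (the DDPM mean update)
        x = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * predicted_noise)
        x = x / alpha_t.sqrt()
        if t > 0:
            # 3. repeat, re-injecting a little random noise at each step
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x
```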

Credit

For reverse diffusion in latent space, please see How does Stable Diffusion work? on Stable Diffusion Art, the article this page is based on.