ControlNet

Overview

ControlNet is a neural network model for controlling Stable Diffusion models. You can use ControlNet together with any Stable Diffusion model.

Stable Diffusion models support text-to-image generation: the text prompt is used as conditioning to steer image generation, so that the generated images match the prompt.

ControlNet adds one more conditioning on top of the text prompt, and this extra conditioning can take many forms.
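
As a concrete illustration, here is a minimal sketch of running Stable Diffusion with an extra ControlNet conditioning using the Hugging Face diffusers library. The model IDs, file names, and prompt are assumptions for illustration (a Canny-edge ControlNet checkpoint, a v1.5 base model, and a pre-made control image), not something this page prescribes.

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# Assumed checkpoints: a Canny-edge ControlNet and a Stable Diffusion v1.5 base model.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Two conditionings steer the generation: the text prompt and the control image.
control_image = load_image("canny_control_map.png")  # hypothetical pre-made control map
image = pipe("a dancer on a beach, best quality", image=control_image).images[0]
image.save("output.png")
```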

How does ControlNet work?

ControlNet works by attaching trainable network modules to various parts of the U-Net (the noise predictor) of the Stable Diffusion model. The weights of the Stable Diffusion model are locked so that they remain unchanged during training; only the attached modules are modified. According to the diagram in the paper Adding Conditional Control to Text-to-Image Diffusion Models, the weights of the attached modules are initially all zero, which lets the new model take advantage of the trained and locked base model.

During training, two conditionings are supplied along with each training image.

  • The text prompt

  • The control map

    • OpenPose

    • Canny edges

    • etc

The ControlNet model learns to generate images based on these two inputs.

Each control method is trained independently.
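
To make this training setup concrete, below is a rough sketch of one ControlNet training step written with the diffusers library. It is not the reference implementation: the checkpoint ID is an assumption and the tensors are random stand-ins for a real batch (noised latents, text embeddings, and a preprocessed control map), but it shows the key idea that the base U-Net stays frozen while only the attached ControlNet receives gradients.

```python
import torch
import torch.nn.functional as F
from diffusers import UNet2DConditionModel, ControlNetModel

# Load the locked noise predictor (assumed Stable Diffusion v1.5 checkpoint).
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
# Build the attached modules from the U-Net; diffusers zero-initializes the
# projection ("zero convolution") layers so training starts from the base behaviour.
controlnet = ControlNetModel.from_unet(unet)

unet.requires_grad_(False)   # Stable Diffusion weights stay unchanged
controlnet.train()           # only the attached modules are updated
optimizer = torch.optim.AdamW(controlnet.parameters(), lr=1e-5)

# Random stand-ins for one training example and its two conditionings.
noisy_latents = torch.randn(1, 4, 64, 64)       # noised image latents
timesteps = torch.tensor([500])                 # diffusion timestep
text_embeddings = torch.randn(1, 77, 768)       # conditioning 1: the text prompt (CLIP embeddings)
control_map = torch.randn(1, 3, 512, 512)       # conditioning 2: the control map (e.g. Canny edges)
target_noise = torch.randn_like(noisy_latents)  # noise the model should predict

down_res, mid_res = controlnet(
    noisy_latents, timesteps,
    encoder_hidden_states=text_embeddings,
    controlnet_cond=control_map,
    return_dict=False,
)
noise_pred = unet(
    noisy_latents, timesteps,
    encoder_hidden_states=text_embeddings,
    down_block_additional_residuals=down_res,
    mid_block_additional_residual=mid_res,
).sample

loss = F.mse_loss(noise_pred, target_noise)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```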

Two ways to use ControlNet

  • Edge detection

  • Human pose detection

Edge detection

ControlNet takes an additional input image and detects its outlines using the Canny edge detector. An image containing the detected edges is then saved as a control map and fed into the ControlNet model as an extra conditioning alongside the text prompt. The process of extracting specific information (edges, in this case) from the input image is called annotation in the research paper, or preprocessing in the ControlNet extension.
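
A minimal sketch of this preprocessing step, assuming OpenCV is available; the file names and Canny thresholds are arbitrary choices for illustration.

```python
import cv2
import numpy as np
from PIL import Image

# Detect the outlines of the input image with the Canny edge detector.
gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(gray, threshold1=100, threshold2=200)

# Save the detected edges as the control map. This image is what gets fed to
# ControlNet as the extra conditioning alongside the text prompt.
control_map = Image.fromarray(np.stack([edges] * 3, axis=-1))  # replicate to 3 channels
control_map.save("canny_control_map.png")
```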

Human pose detection

ControlNet workflow using OpenPose

Edge detection is not the only way an image can be preprocessed. OpenPose is a fast human keypoint detection model that can extract human poses, such as the positions of the hands, legs, and head. In this ControlNet workflow, keypoints are extracted from the input image using OpenPose and saved as a control map containing the keypoint positions. The control map is then fed to Stable Diffusion as an extra conditioning together with the text prompt, and images are generated based on these two conditionings.
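
A minimal sketch of this pose-preprocessing step, assuming the controlnet_aux helper package and its published annotator weights (lllyasviel/Annotators); the file names are placeholders.

```python
from PIL import Image
from controlnet_aux import OpenposeDetector

# Keypoint detector used to annotate the input image (head, arms, legs, ...).
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

input_image = Image.open("input.png")
pose_map = openpose(input_image)  # control map containing only the keypoint positions
pose_map.save("openpose_control_map.png")

# pose_map is then passed to the ControlNet pipeline together with the text prompt,
# e.g. pipe("a dancer on a beach", image=pose_map), as in the Overview sketch above.
```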

The difference between using Canny edge detection and OpenPose

The Canny edge detector extracts the edges of the subject and background alike. It tends to translate the scene more faithfully; for instance, the outline and hairstyle are preserved in the edge-detection pictures above.

OpenPose (it reminds me of the Xbox Kinect) only detects human keypoints, such as the positions of the head, arms, and so on. The image generation is more liberal but follows the original pose. For example, the generated woman jumping with her left foot pointing sideways differs from the original image and from the Canny edge example, because OpenPose's keypoint detection does not specify the orientation of the feet.

Difference between the Stable Diffusion depth model and ControlNet

Stability AI, the creator of Stable Diffusion, released a depth-to-image model. It shares a lot of similarities with ControlNet, but there are important differences.

Similar

  • Both are Stable Diffusion models

  • Both use two conditionings (a preprocessed image and text prompt)

  • Both use MiDaS to estimate the depth map (a rough sketch of MiDaS depth estimation follows below)
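
For reference, here is a rough sketch of estimating a depth map with MiDaS via its official torch.hub entry points; the model variant (DPT_Large) and file name are assumptions.

```python
import cv2
import torch

# Load MiDaS and its matching input transform.
midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
midas.eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

# Estimate the depth map of the input image.
img = cv2.cvtColor(cv2.imread("input.png"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    prediction = midas(transforms.dpt_transform(img))
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze()

depth_map = depth.cpu().numpy()  # used as the depth conditioning image
```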

Difference

  • The depth-to-image model is a v2 model, while ControlNet can be used with any v1 or v2 model. Because ControlNet works with any v1 model, it opens up depth conditioning not only to the v1.5 base model but also to the thousands of special models released by the community.

  • ControlNet is more versatile: it can condition on edge detection, pose detection, and so on.

  • ControlNet's depth map has a higher resolution than depth-to-image's

Credit

  • Adding Conditional Control to Text-to-Image Diffusion Models (the ControlNet paper)

  • ControlNet v1.1: A complete guide - Stable Diffusion Art
