Overview

Depth-to-image (Depth2img) is an under-appreciated model in Stable Diffusion v2. It is an enhancement to image-to-image (img2img) which takes advantage of the depth information when generating new images.

With depth-to-image, you gain better control over synthesizing the subject and the background separately.

In depth-to-image, Stable Diffusion takes an image and a prompt as inputs (similar to image-to-image). The model first estimates the depth map of the input image using MiDaS, an AI model developed in 2019 for monocular depth perception (that is, estimating depth from a single view). The depth map is then used by Stable Diffusion as an extra conditioning for image generation.

Depth-to-image uses three conditionings to generate a new image:

  • text prompt

  • original image

  • depth map
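As a rough sketch of how these three conditionings might be passed in practice, the Hugging Face diffusers library provides `StableDiffusionDepth2ImgPipeline`. The checkpoint name `stabilityai/stable-diffusion-2-depth`, the helper names `depth2img_inputs` and `run_depth2img`, and the default strength are assumptions chosen for illustration, not something taken from the original article. When no depth map is supplied, the pipeline estimates one from the input image.

```python
def depth2img_inputs(prompt, image, depth_map=None, strength=0.7):
    """Collect the three conditionings (hypothetical helper for illustration)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    inputs = {"prompt": prompt, "image": image, "strength": strength}
    if depth_map is not None:
        # Optional: supply a precomputed depth map (a torch tensor in diffusers)
        # instead of letting the pipeline estimate one.
        inputs["depth_map"] = depth_map
    return inputs


def run_depth2img(prompt, image_path, strength=0.7):
    """Generate one image; needs torch, diffusers, Pillow, and a CUDA GPU."""
    # Heavy imports stay local so the helper above works without them.
    import torch
    from diffusers import StableDiffusionDepth2ImgPipeline
    from PIL import Image

    pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-depth",  # assumed checkpoint name
        torch_dtype=torch.float16,
    ).to("cuda")
    init_image = Image.open(image_path).convert("RGB")
    result = pipe(**depth2img_inputs(prompt, init_image, strength=strength))
    return result.images[0]
```

Calling `run_depth2img("two wrestlers in a ring", "input.png", strength=1.0)` (a hypothetical prompt and file name) would return a PIL image that follows the prompt while keeping the depth layout of the input.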

Equipped with the depth map, the model has some knowledge of the three-dimensional composition of the scene. Image generations of foreground objects and the background can be separated.
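To illustrate why a depth map carries enough information to tell foreground from background, here is a minimal sketch that splits pixels with a simple threshold. It assumes a grayscale depth map in which brighter pixels are closer to the viewer. The function name and the cutoff of 128 are hypothetical, and the model itself does not literally threshold the map; this only shows the kind of separation the depth signal makes possible.

```python
def split_by_depth(depth_map, threshold=128):
    """Return (foreground_mask, background_mask) from a grayscale depth map.

    depth_map: rows of 0-255 values, where brighter means closer.
    threshold: hypothetical cutoff; pixels at or above it count as foreground.
    """
    foreground = [[value >= threshold for value in row] for row in depth_map]
    background = [[not is_fg for is_fg in row] for row in foreground]
    return foreground, background


# A tiny 2x3 depth map: a bright (near) subject on the left, dark (far) scene on the right.
depth = [
    [250, 240, 30],
    [245, 60, 20],
]
fg, bg = split_by_depth(depth)
print(fg)  # [[True, True, False], [True, False, False]]
```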

Depth map

A depth map is a simple grayscale image, the same size as the original image, that encodes depth information. Pure white means the object is closest to you; the darker a pixel is, the farther away it is.
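The white-is-near convention can be sketched as a simple rescaling. This is a minimal illustration that assumes per-pixel distances are already known; real estimators typically output relative depth rather than metric distances, and the function name and sample values are hypothetical. Plain Python lists keep the sketch dependency-free.

```python
def depth_to_grayscale(distances):
    """Map per-pixel distances to 0-255 gray levels (nearest = 255, farthest = 0)."""
    flat = [d for row in distances for d in row]
    nearest, farthest = min(flat), max(flat)
    span = farthest - nearest
    if span == 0:
        # A flat scene has no depth contrast; render it as uniform white.
        return [[255 for _ in row] for row in distances]
    return [
        [round(255 * (farthest - d) / span) for d in row]
        for row in distances
    ]


# Hypothetical distances in metres for a 2x3 image.
distances = [
    [1.0, 2.0, 5.0],
    [1.0, 4.0, 5.0],
]
print(depth_to_grayscale(distances))  # [[255, 191, 0], [255, 64, 0]]
```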

Here's an example of an image and its depth map estimated by MiDaS.

What can depth-to-image do

Here is an example comparing the effect of denoising strength in image-to-image and depth-to-image.

The image-to-image generations (top row) run into a problem: at low denoising strength, the image doesn't change enough; at high denoising strength, we do see two wrestlers, but the original composition is lost.

Depth-to-image resolves this problem. You can crank up denoising strength all the way to 1 (the maximum) without losing the original composition.
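To make the role of denoising strength concrete, here is a minimal sketch of one common convention, used by img2img-style pipelines such as those in diffusers, in which strength decides how many of the scheduled sampling steps are actually run on the noised input. The function name and the 50-step default are assumptions for illustration.

```python
def steps_to_run(strength, num_inference_steps=50):
    """Number of denoising steps applied for a given strength in [0, 1]."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Low strength: little noise is added and only the last few steps run.
    # Strength 1: the full schedule runs, starting from (almost) pure noise.
    return min(int(num_inference_steps * strength), num_inference_steps)


for s in (0.25, 0.5, 1.0):
    print(s, steps_to_run(s))
# 0.25 12
# 0.5 25
# 1.0 50
```

At strength 1 the input image is almost entirely replaced by noise, so in plain image-to-image little of the original survives. In depth-to-image the depth map is still supplied as conditioning, which is why the composition can be preserved.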

Some useful applications of depth-to-image

Inpainting

If we care about preserving the original composition

Style transfer

We can dial denoising strength all the way up to 1 without losing composition. That makes transforming a scene to a different style easy.

Summary

Depth-to-image is a great alternative to image-to-image, especially when you want to preserve the composition of the scene.

Credit

Depth maps

Depth to image

Depth-to-image in Stable Diffusion 2: All you need to know - Stable Diffusion Art
Original image
Comparing image-to-image and depth-to-image.