Vector Database

A vector database is a type of database that stores data as high-dimensional vectors, which are mathematical representations of features or attributes. Each vector has a certain number of dimensions, which can range from tens to thousands, depending on the complexity and granularity of the data. The vectors are usually generated by applying some kind of transformation or embedding function to the raw data, such as text, images, audio, video, and others. The embedding function can be based on various methods, such as machine learning models, word embeddings, or feature extraction algorithms.

The difference between a vector database and a traditional database

The main difference between a vector database and a traditional (relational) database lies in the type of data they store. While relational databases are designed for structured data that fits into tables, vector databases are intended for unstructured data, such as text or images. The type of data that is stored also influences how the data is retrieved:

In a relational database, query results are based on matches for specific keywords. In a vector database, query results are based on similarity between vectors.

One limitation of relational databases

Typically, the more complex the SQL query, the longer it takes to run. And keyword matching does not guarantee the accuracy of the results: a query can only find records that literally contain its search terms, not records that are merely similar in meaning.

Vector embedding

Vector embedding is the process of transforming data into a vector representation. The vector representation is a mathematical representation of the data that can be used for various purposes, such as similarity search, classification, clustering, and others. Some ML algorithms can convert a given object into a numerical representation that preserves the information of that object; i.e., an ML model accepts a prompt and returns a long list of numbers. That long list of numbers is the numerical representation of our word and is called a vector embedding. Because these embeddings are long lists of numbers, we call them high-dimensional. Let's pretend for a second that these embeddings are only three-dimensional so that we can visualize them.
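
As a concrete sketch, one common way to produce such embeddings is with a pretrained model. The snippet below assumes the sentence-transformers library and the all-MiniLM-L6-v2 model; neither is prescribed by this page, and any embedding model would do.

```python
# Minimal sketch: turning text into vector embeddings.
# Assumes `pip install sentence-transformers` (a library choice made here
# for illustration, not prescribed by this page).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small embedding model

sentences = ["I am hungry", "I am thirsty", "The market closed early"]
embeddings = model.encode(sentences)  # one vector per sentence

print(embeddings.shape)  # (3, 384): three 384-dimensional vectors
```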

The numerical representations enable us to apply mathematical calculations to objects, such as words. For example, the following calculation will not work unless you replace the words with their embeddings:

drink - food + hungry = thirsty

And because we are able to use the embeddings for calculations, we can also calculate the distances between a pair of embedded objects. The closer two embedded objects are to one another, the more similar they are.
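
To make this concrete, here is a toy sketch with made-up three-dimensional embeddings (the numbers are illustrative, not from a real model): the word arithmetic above plus a cosine-similarity lookup recovers "thirsty" as the closest word.

```python
import numpy as np

# Made-up 3-dimensional embeddings, purely for illustration.
embeddings = {
    "drink":   np.array([0.9, 0.1, 0.2]),
    "food":    np.array([0.1, 0.9, 0.2]),
    "hungry":  np.array([0.1, 0.8, 0.7]),
    "thirsty": np.array([0.8, 0.1, 0.8]),
}

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# drink - food + hungry should land closest to "thirsty".
query = embeddings["drink"] - embeddings["food"] + embeddings["hungry"]

best = max(embeddings, key=lambda word: cosine_similarity(query, embeddings[word]))
print(best)  # thirsty
```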

The use cases of vector databases

Vector databases have been around since before the hype around LLMs started. Originally, they were used in recommendation systems because they can quickly find similar objects for a given query. But because they can provide long-term memory to LLMs, they have also been used in question-answering (QA) applications recently.

How do vector databases work?

Vector databases are able to retrieve similar objects for a query quickly because they have already pre-calculated them. The underlying concept is called Approximate Nearest Neighbor (ANN) search, which uses different algorithms for indexing and calculating similarities.

Calculating the similarities between a query and every embedded object you have with a simple k-nearest neighbors (kNN) algorithm can become time-consuming when you have millions of embeddings. With ANN, you can trade some accuracy in exchange for speed and retrieve the approximately most similar objects to a query.
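
For contrast, this is roughly what the exact brute-force kNN baseline looks like: it scores the query against every stored vector, which is the O(N)-per-query cost that ANN indexes avoid. The data here is random and the sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
database = rng.normal(size=(100_000, 384)).astype(np.float32)  # toy stored embeddings
query = rng.normal(size=384).astype(np.float32)

# Exact kNN: cosine-score the query against *every* vector -- O(N) per query.
scores = database @ query / (np.linalg.norm(database, axis=1) * np.linalg.norm(query))

k = 5
top_k = np.argsort(scores)[-k:][::-1]  # indices of the k most similar vectors
print(top_k, scores[top_k])
```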

Indexing

A vector database indexes the vector embeddings. This step maps the vectors to a data structure that enables faster searching. Indexing helps you narrow the search to a smaller portion of all the available vectors and thus speeds up retrieval. For more detail about indexing, look up Hierarchical Navigable Small World (HNSW).
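
As an illustration, the sketch below builds an HNSW index with the hnswlib library (one popular HNSW implementation; this page does not prescribe a specific one) over toy data and queries it approximately.

```python
import hnswlib
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100_000, 384)).astype(np.float32)  # toy embeddings

# Build an HNSW index over the embeddings (parameters are typical defaults,
# not tuned values).
index = hnswlib.Index(space="cosine", dim=384)
index.init_index(max_elements=data.shape[0], ef_construction=200, M=16)
index.add_items(data, np.arange(data.shape[0]))
index.set_ef(50)  # trade-off knob: higher ef = more accurate, slower queries

query = rng.normal(size=(1, 384)).astype(np.float32)
labels, distances = index.knn_query(query, k=5)  # approximate nearest neighbors
print(labels, distances)
```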

Similarity Measures

To find the nearest neighbors to the query among the indexed vectors, a vector database applies a similarity measure. Common similarity measures include cosine similarity, dot product, Euclidean distance, Manhattan distance, and Hamming distance.
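
The measures themselves are one-liners; here is each of them on small example vectors (the vectors are arbitrary, and Hamming distance is shown on binary vectors, where it is usually applied).

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 1.0, 4.0])

dot = np.dot(a, b)                                      # dot product
cosine = dot / (np.linalg.norm(a) * np.linalg.norm(b))  # cosine similarity
euclidean = np.linalg.norm(a - b)                       # Euclidean (L2) distance
manhattan = np.sum(np.abs(a - b))                       # Manhattan (L1) distance

x = np.array([1, 0, 1, 1])
y = np.array([1, 1, 0, 1])
hamming = np.sum(x != y)  # Hamming distance: positions that differ
```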

Credit

https://learn.microsoft.com/en-us/semantic-kernel/memories/vector-db?source=docs
Explaining Vector Databases in 3 Levels of Difficulty: https://towardsdatascience.com/explaining-vector-databases-in-3-levels-of-difficulty-fc392e48ab78
