Retrieval Augmented Generation (RAG) Explained: Embedding, Sentence BERT, Vector Database (HNSW)

Get your $5 coupon for Gradient: https://gradient.1stcollab.com/umarjamilai

In this video we explore the entire Retrieval Augmented Generation pipeline. I start by reviewing language models, their training and inference, and then turn to the main ingredient of a RAG pipeline: embedding vectors. We will see what embedding vectors are, how they are computed, and how to compute embedding vectors for whole sentences. We will also look at what a vector database is and at the popular HNSW (Hierarchical Navigable Small Worlds) algorithm that vector databases use to find the embedding vectors closest to a given query.
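
As a rough illustration of the retrieval step described above (not code from the video or the slides), here is a minimal Python sketch of naive nearest-neighbor retrieval followed by prompt construction. The three-dimensional vectors, the sentences in `STORE`, and the helper names `cosine_similarity`, `top_k` and `build_prompt` are all made up for the example; a real pipeline would get its vectors from a sentence-embedding model and would typically use far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Naive k-NN: score every stored vector against the query, keep the k best."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

def build_prompt(question, contexts):
    """Put the retrieved passages in front of the question for the language model."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Context:\n{context_block}\n\nQuestion: {question}\nAnswer:"

# Hypothetical toy "vector database": (text, embedding) pairs with made-up vectors.
STORE = [
    ("HNSW is a graph-based index for approximate nearest-neighbor search.", [0.9, 0.1, 0.0]),
    ("Sentence BERT produces one embedding vector per sentence.", [0.1, 0.9, 0.1]),
    ("Fine-tuning updates the weights of a pretrained language model.", [0.0, 0.2, 0.9]),
]

# Hypothetical embedding of the user's question.
query_vec = [0.8, 0.3, 0.0]
retrieved = top_k(query_vec, STORE, k=2)
prompt = build_prompt("How do vector databases find similar vectors?", retrieved)
print(prompt)
```

Scoring every stored vector costs time linear in the size of the store, which is why vector databases tend to rely on approximate indexes such as HNSW instead.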

Download the PDF slides: https://github.com/hkproj/retrieval-augmented-generation-notes
Sentence BERT paper: https://arxiv.org/pdf/1908.10084.pdf

Chapters
00:00 - Introduction
02:22 - Language Models
04:33 - Fine-Tuning
06:04 - Prompt Engineering (Few-Shot)
07:24 - Prompt Engineering (QA)
10:15 - RAG pipeline (introduction)
13:38 - Embedding Vectors
19:41 - Sentence Embedding
23:17 - Sentence BERT
28:10 - RAG pipeline (review)
29:50 - RAG with Gradient
31:38 - Vector Database
33:11 - K-NN (Naive)
35:16 - Hierarchical Navigable Small Worlds (Introduction)
35:54 - Six Degrees of Separation
39:35 - Navigable Small Worlds
43:08 - Skip-List
45:23 - Hierarchical Navigable Small Worlds
47:27 - RAG pipeline (review)
48:22 - Closing
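
The Navigable Small Worlds and HNSW chapters deal with graph-based search. As a hedged sketch of the general idea (again, not taken from the video or slides), the Python below performs a greedy walk over a small proximity graph: starting from an entry node, it keeps moving to whichever neighbor is closest to the query and stops when no neighbor improves on the current node. The node names, coordinates and edges are invented for the example, and the function `greedy_search` is a hypothetical helper, not part of any library.

```python
import math

def greedy_search(graph, vectors, query, entry_point):
    """Greedy walk on a proximity graph: hop to the neighbor closest to the
    query until no neighbor is closer than the current node."""
    current = entry_point
    current_dist = math.dist(vectors[current], query)
    while True:
        best, best_dist = current, current_dist
        for neighbor in graph[current]:
            d = math.dist(vectors[neighbor], query)
            if d < best_dist:
                best, best_dist = neighbor, d
        if best == current:
            # Local minimum: no neighbor is closer to the query.
            return current, current_dist
        current, current_dist = best, best_dist

# Made-up 2-D vectors and edges, for illustration only.
VECTORS = {"a": (0, 0), "b": (1, 0), "c": (2, 1), "d": (3, 3), "e": (4, 4)}
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}

node, dist = greedy_search(GRAPH, VECTORS, query=(3.9, 4.2), entry_point="a")
print(node, round(dist, 3))
```

A single greedy walk can stop at a local minimum that is not the true nearest neighbor. Hierarchical variants add sparser upper layers, in the spirit of a skip-list, so that a search can cover long distances in the upper layers before refining in the dense bottom layer.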

Video "Retrieval Augmented Generation (RAG) Explained: Embedding, Sentence BERT, Vector Database (HNSW)" from the channel Umar Jamil

Video information

November 27, 2023, 12:59:38

00:49:24

Umar Jamil

Other videos from the channel

LongNet: Scaling Transformers to 1,000,000,000 tokens: Python Code + Explanation

ML Interpretability: feature visualization, adversarial example, interp. for language models

Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math

Coding LLaMA 2 from scratch in PyTorch - KV Cache, Grouped Query Attention, Rotary PE, RMSNorm

Flash Attention derived and coded from first principles with Triton (Python)

Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation

Distributed Training with PyTorch: complete tutorial with cloud infrastructure and code

Coding Stable Diffusion from scratch in PyTorch

LoRA: Low-Rank Adaptation of Large Language Models - Explained visually + PyTorch code from scratch

Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training

How diffusion models work - explanation and code!

Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code.

Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math

Titans: Learning to Memorize at Test Time

BERT explained: Training, Inference, BERT vs GPT/LLamA, Fine tuning, [CLS] token

Coding a Transformer from scratch on PyTorch, with full explanation, training and inference.

LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU

Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Mistral / Mixtral Explained: Sliding Window Attention, Sparse Mixture of Experts, Rolling Buffer

Variational Autoencoder - Model, ELBO, loss function and maths explained easily!

Kolmogorov-Arnold Networks: MLP vs KAN, Math, B-Splines, Universal Approximation Theorem

Attention is all you need (Transformer) - Model explanation (including math), Inference and Training

Segment Anything - Model explanation with code

CLIP - Paper explanation (training and inference)