Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper reading session held in the Discord group. The entire lecture was improvised.

Join the group: https://discord.gg/JRKsaNbhCg

Link to paper: https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf

Video "Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" from the Umar Jamil channel

Video information

January 22, 2025, 2:16:57

01:19:37

Other videos from the channel

LongNet: Scaling Transformers to 1,000,000,000 tokens: Python Code + Explanation

Coding LLaMA 2 from scratch in PyTorch - KV Cache, Grouped Query Attention, Rotary PE, RMSNorm

ML Interpretability: feature visualization, adversarial example, interp. for language models

Flash Attention derived and coded from first principles with Triton (Python)

Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math

Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation

Distributed Training with PyTorch: complete tutorial with cloud infrastructure and code

LoRA: Low-Rank Adaptation of Large Language Models - Explained visually + PyTorch code from scratch

Coding Stable Diffusion from scratch in PyTorch

How diffusion models work - explanation and code!

Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training

Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code.

Titans: Learning to Memorize at Test Time

Retrieval Augmented Generation (RAG) Explained: Embedding, Sentence BERT, Vector Database (HNSW)

Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math

BERT explained: Training, Inference, BERT vs GPT/LLamA, Fine tuning, [CLS] token

Coding a Transformer from scratch on PyTorch, with full explanation, training and inference.

LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU

Mistral / Mixtral Explained: Sliding Window Attention, Sparse Mixture of Experts, Rolling Buffer

Kolmogorov-Arnold Networks: MLP vs KAN, Math, B-Splines, Universal Approximation Theorem