Kolmogorov-Arnold Networks: MLP vs KAN, Math, B-Splines, Universal Approximation Theorem

In this video, I will be explaining Kolmogorov-Arnold Networks, a new type of network that was presented in the paper "KAN: Kolmogorov-Arnold Networks" by Liu et al.
I will start the video by reviewing Multilayer Perceptrons, to show how the typical Linear layer works in a neural network. I will then introduce the concept of data fitting, which is necessary to understand Bézier Curves and then B-Splines.
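As a rough illustration of the Bézier Curves step, here is a minimal sketch in pure Python that evaluates a curve as a weighted sum of its control points, using Bernstein polynomials as the weights. The function names `bernstein` and `bezier_point` are my own choices for this sketch and do not come from the video or the paper.

```python
from math import comb


def bernstein(n, i, t):
    """Value at t of the i-th Bernstein polynomial of degree n."""
    return comb(n, i) * t**i * (1 - t) ** (n - i)


def bezier_point(control_points, t):
    """Point on the Bezier curve at parameter t in [0, 1].

    The curve is the sum of the control points, each weighted by its
    Bernstein polynomial; the weights are non-negative and sum to 1.
    """
    n = len(control_points) - 1  # degree = number of control points - 1
    dim = len(control_points[0])
    point = [0.0] * dim
    for i, p in enumerate(control_points):
        w = bernstein(n, i, t)
        for d in range(dim):
            point[d] += w * p[d]
    return tuple(point)


# Hypothetical quadratic curve with three 2D control points.
points = [(0, 0), (1, 2), (2, 0)]
midpoint = bezier_point(points, 0.5)
```

The curve starts at the first control point and ends at the last one, while the middle control point only pulls the curve towards itself. Every control point influences the whole curve, and this lack of local control is one of the usual motivations for moving on to B-Splines.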
Before introducing Kolmogorov-Arnold Networks, I will also explain the Universal Approximation Theorem for neural networks and its counterpart for Kolmogorov-Arnold Networks, the Kolmogorov-Arnold Representation Theorem.
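For reference, the Kolmogorov-Arnold Representation Theorem is usually stated as follows. The symbol names ($f$, $\Phi_q$, $\phi_{q,p}$, $n$) follow the common convention and are not necessarily the ones used on the slides.

```latex
% For every continuous function f : [0,1]^n \to \mathbb{R} there exist
% continuous univariate functions \phi_{q,p} : [0,1] \to \mathbb{R}
% and \Phi_q : \mathbb{R} \to \mathbb{R} such that
f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

In words, a continuous multivariate function on a bounded domain can be written using only univariate functions and addition.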
In the final part of the video, I will derive the structure of this new type of network step by step from the formula of the Kolmogorov-Arnold Representation Theorem, comparing it with Multilayer Perceptrons along the way.
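To make the structure concrete, here is a simplified sketch of a single KAN layer in pure Python. It assumes each edge carries a learnable univariate function written as a linear combination of B-spline basis functions (computed with the Cox-de Boor recursion), and that each output node simply sums its incoming edges. The helper names (`bspline_basis`, `make_knots`, `kan_layer`) are hypothetical, and the sketch leaves out the residual SiLU term that the paper adds to each edge function, so treat it as an illustration rather than the paper's implementation.

```python
def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion: i-th B-spline basis function of degree k at t."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left_den = knots[i + k] - knots[i]
    right_den = knots[i + k + 1] - knots[i + 1]
    left = 0.0
    if left_den != 0:
        left = (t - knots[i]) / left_den * bspline_basis(i, k - 1, t, knots)
    right = 0.0
    if right_den != 0:
        right = ((knots[i + k + 1] - t) / right_den
                 * bspline_basis(i + 1, k - 1, t, knots))
    return left + right


def make_knots(grid_size, degree, lo=-1.0, hi=1.0):
    """Uniform knots on [lo, hi], extended by `degree` knots on each side."""
    h = (hi - lo) / grid_size
    return [lo + j * h for j in range(-degree, grid_size + degree + 1)]


def kan_layer(x, coeffs, knots, degree):
    """One KAN layer: output j is the sum over inputs i of phi_{j,i}(x_i).

    coeffs[j][i] holds the spline coefficients (the learnable parameters)
    of the edge function from input i to output j.
    """
    n_basis = len(knots) - degree - 1  # equals grid_size + degree
    basis = [[bspline_basis(k, degree, xi, knots) for k in range(n_basis)]
             for xi in x]
    return [
        sum(sum(c * b for c, b in zip(coeffs[j][i], basis[i]))
            for i in range(len(x)))
        for j in range(len(coeffs))
    ]


# Hypothetical layer with 3 inputs and 2 outputs, cubic splines, 5 grid intervals.
degree, grid_size = 3, 5
knots = make_knots(grid_size, degree)
n_basis = grid_size + degree
coeffs = [[[1.0] * n_basis for _ in range(3)] for _ in range(2)]
out = kan_layer([0.1, -0.5, 0.7], coeffs, knots, degree)
```

With every coefficient set to 1, each edge function is constant and equal to 1 inside the grid, because B-spline basis functions sum to 1 there, so each output is just the number of inputs. Unlike a Linear layer followed by a fixed activation, the learnable parameters here sit inside the functions on the edges, and the nodes only add.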
We will also explore some properties of this type of network, such as its interpretability and its ability to support continual learning.

Paper: https://arxiv.org/abs/2404.19756

Slides PDF: https://github.com/hkproj/kan-notes

Chapters
00:00:00 - Introduction
00:01:10 - Multilayer Perceptron
00:11:08 - Introduction to data fitting
00:15:36 - Bézier Curves
00:28:12 - B-Splines
00:40:42 - Universal Approximation Theorem
00:45:10 - Kolmogorov-Arnold Representation Theorem
00:46:17 - Kolmogorov-Arnold Networks
00:51:55 - MLP vs KAN
00:55:20 - Learnable functions
00:58:06 - Parameters count
01:00:44 - Grid extension
01:03:37 - Interpretability
01:10:42 - Continual learning

Tags: pytorch, python, tutorial, math, language models, deep learning, machine learning, multi layer perceptron, mlp, kolmogorov-arnold networks, kolmogorov-arnold representation theorem, universal approximation theorem, neural networks, bezier curves, splines, b-splines, linear layers

Video information

May 11, 2024, 14:23:09

01:15:39

Umar Jamil
