
Language Model Merging - Techniques, Tools, and Implementations

Model merging is an approach in language modeling that combines multiple trained models into a single, more capable model without any additional training. This sidesteps the main cost of building high-performance models: the time, data, and compute that training normally requires.
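The simplest form of this is linear merging (the "model soup" approach covered in the first chapter): corresponding parameters are averaged across models. A minimal sketch in plain Python, using toy parameter dictionaries in place of real checkpoints:

```python
def linear_merge(state_dicts, weights=None):
    """Average corresponding parameters across models (model-soup style).

    state_dicts: list of {param_name: list_of_floats} dicts that all
    share the same keys and shapes. Toy stand-in for real checkpoints.
    """
    n = len(state_dicts)
    weights = weights or [1.0 / n] * n  # uniform average by default
    merged = {}
    for name in state_dicts[0]:
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(weights, state_dicts))
            for i in range(len(state_dicts[0][name]))
        ]
    return merged

# Toy example: two "models" with a single two-element weight vector each
model_a = {"layer.weight": [1.0, 2.0]}
model_b = {"layer.weight": [3.0, 4.0]}
print(linear_merge([model_a, model_b]))  # {'layer.weight': [2.0, 3.0]}
```

Real merges operate on tensor state dicts (e.g. PyTorch `state_dict()` objects), but the arithmetic is the same element-wise average.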

Resources:
Code: https://github.com/ALucek/language-model-merging
Mergekit: https://github.com/arcee-ai/mergekit
Julien Simon Model Merging Pt.1: https://youtu.be/cvOpX75Kz4M?si=Q91k0viO5e4seNRN
Julien Simon Model Merging Pt.2: https://youtu.be/qbAvOgGmFuE?si=9DtMm3tEamjuX1kk

Models Shown:
Gemma w/Model Stock: https://huggingface.co/AdamLucek/gemma2-2b-it-chinese-german
Llama w/SLERP: https://huggingface.co/AdamLucek/llama3-8b-code-sql-slerp
Phi w/DELLA: https://huggingface.co/AdamLucek/Phi-3-mini-EmoMarketing-DELLA
Mistral w/MoE: https://huggingface.co/AdamLucek/EduMixtral-4x7B

Useful Blogs:
Merging Models With Mergekit: https://huggingface.co/blog/mlabonne/merge-models
Create a MoE: https://mlabonne.github.io/blog/posts/2024-03-28_Create_Mixture_of_Experts_with_MergeKit.html
Model Merging: https://blog.premai.io/model-merging/

Papers:
Model Soups: https://arxiv.org/pdf/2203.05482
SLERP: https://en.wikipedia.org/wiki/Slerp
Task Arithmetic: https://arxiv.org/pdf/2212.04089
TIES: https://arxiv.org/pdf/2306.01708
DARE: https://arxiv.org/pdf/2311.03099
Model Breadcrumbs: https://arxiv.org/pdf/2312.06795
Model Stock: https://arxiv.org/pdf/2403.19522
DELLA: https://arxiv.org/pdf/2406.11617
Mixture of Experts: https://arxiv.org/pdf/2401.04088
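Several of the papers above (Task Arithmetic, TIES, DARE, Model Breadcrumbs, DELLA) build on the idea of a "task vector": the element-wise difference between a fine-tuned model and its base. A hedged sketch of plain task arithmetic on toy parameter lists (the numbers and model roles are illustrative, not from any real checkpoint):

```python
def task_vector(finetuned, base):
    """Task vector: fine-tuned weights minus base weights, element-wise."""
    return [f - b for f, b in zip(finetuned, base)]

def apply_task_vectors(base, vectors, scale=1.0):
    """Add a scaled sum of task vectors back onto the base model."""
    merged = list(base)
    for vec in vectors:
        merged = [m + scale * v for m, v in zip(merged, vec)]
    return merged

base = [1.0, 1.0, 1.0]
code_model = [1.5, 1.0, 0.5]  # hypothetical code fine-tune
sql_model = [1.0, 2.0, 1.0]   # hypothetical SQL fine-tune

vectors = [task_vector(code_model, base), task_vector(sql_model, base)]
merged = apply_task_vectors(base, vectors, scale=0.5)
print(merged)  # [1.25, 1.5, 0.75]
```

TIES, DARE, and DELLA refine this same recipe by trimming, dropping, or re-scaling task-vector entries before the final addition, to reduce interference between tasks.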

Chapters:
00:00 - Intro
01:51 - Method: Linear (Model Soups)
03:14 - Method: SLERP (Spherical Interpolation)
05:14 - Method: Task Arithmetic
08:14 - Method: TIES (Trim & Elect Signs)
11:39 - Method: DARE (Drop & Rescale)
13:26 - Method: Model Breadcrumbs
15:09 - Method: Model Stock
16:58 - Method: DELLA (Drop & Rescale via Sampling with Magnitude)
18:33 - Method: Passthrough (Frankenmerging)
20:02 - Method: Mixture of Experts
21:57 - Merging Your Own Models
22:35 - Showcase: Gemma 2 2B w/Model Stock
23:39 - Showcase: Llama 3 8B w/SLERP
24:19 - Showcase: Phi 3 Mini w/DELLA
24:58 - Showcase: Mistral 7B w/Mixture of Experts
25:26 - How To: Understanding Mergekit
26:29 - How To: Picking Models & Method
27:03 - How To: Config File Setup
28:54 - How To: Merging the Models
31:32 - How To: Testing the Merged Model
34:36 - How To: Concluding Merging
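The "Config File Setup" and "Merging the Models" chapters use mergekit's YAML config format. As a rough illustration of its shape, here is what a SLERP config might look like; the model names and layer range below are placeholders, not taken from the video:

```yaml
# Hypothetical SLERP merge of a base model and a fine-tune (names illustrative)
slices:
  - sources:
      - model: org/base-model-8b          # placeholder base model
        layer_range: [0, 32]
      - model: org/finetuned-model-8b     # placeholder fine-tune
        layer_range: [0, 32]
merge_method: slerp
base_model: org/base-model-8b
parameters:
  t: 0.5          # interpolation factor between the two models
dtype: bfloat16
```

With a config like this saved as config.yaml, mergekit's CLI (e.g. `mergekit-yaml config.yaml ./merged-model`) produces the merged checkpoint; see the mergekit repository linked above for the authoritative schema and options.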

#ai #machinelearning #coding

Video "Language Model Merging - Techniques, Tools, and Implementations" from the Adam Lucek channel.