Language Model Merging - Techniques, Tools, and Implementations
Model merging is an approach in language modeling that combines multiple trained models into a single, more capable model without any additional training. The technique addresses a core challenge of building high-performance models, which otherwise demand significant time, resources, and computational power.
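As a minimal illustration (a simplified sketch, not the exact procedure shown in the video), the most basic merge — the uniform weight averaging behind Model Soups — can be written with NumPy, treating each model as a dict of parameter tensors:

```python
import numpy as np

def linear_merge(state_dicts, coeffs=None):
    """Average corresponding parameter tensors across models (Model Soups style).

    All models must share the same architecture, i.e. identical keys and shapes.
    """
    n = len(state_dicts)
    coeffs = coeffs or [1.0 / n] * n  # uniform soup by default
    return {
        name: sum(c * sd[name] for c, sd in zip(coeffs, state_dicts))
        for name in state_dicts[0]
    }

# Toy "models": one shared parameter tensor each
model_a = {"layer.weight": np.array([1.0, 2.0])}
model_b = {"layer.weight": np.array([3.0, 4.0])}
merged = linear_merge([model_a, model_b])  # elementwise mean: [2.0, 3.0]
```

Real merges operate on full checkpoints with millions of tensor entries, but the arithmetic per parameter is exactly this.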
Resources:
Code: https://github.com/ALucek/language-model-merging
Mergekit: https://github.com/arcee-ai/mergekit
Julien Simon Model Merging Pt.1: https://youtu.be/cvOpX75Kz4M?si=Q91k0viO5e4seNRN
Julien Simon Model Merging Pt.2: https://youtu.be/qbAvOgGmFuE?si=9DtMm3tEamjuX1kk
Models Shown:
Gemma w/Model Stock: https://huggingface.co/AdamLucek/gemma2-2b-it-chinese-german
Llama w/SLERP: https://huggingface.co/AdamLucek/llama3-8b-code-sql-slerp
Phi w/DELLA: https://huggingface.co/AdamLucek/Phi-3-mini-EmoMarketing-DELLA
Mistral w/MoE: https://huggingface.co/AdamLucek/EduMixtral-4x7B
Useful Blogs:
Merging Models With Mergekit: https://huggingface.co/blog/mlabonne/merge-models
Create a MoE: https://mlabonne.github.io/blog/posts/2024-03-28_Create_Mixture_of_Experts_with_MergeKit.html
Model Merging: https://blog.premai.io/model-merging/
Papers:
Model Soups: https://arxiv.org/pdf/2203.05482
SLERP: https://en.wikipedia.org/wiki/Slerp
Task Arithmetic: https://arxiv.org/pdf/2212.04089
TIES: https://arxiv.org/pdf/2306.01708
DARE: https://arxiv.org/pdf/2311.03099
Model Breadcrumbs: https://arxiv.org/pdf/2312.06795
Model Stock: https://arxiv.org/pdf/2403.19522
DELLA: https://arxiv.org/pdf/2406.11617
Mixture of Experts: https://arxiv.org/pdf/2401.04088
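Of the methods above, SLERP is compact enough to sketch directly. A hedged NumPy version for two flattened weight vectors (a simplification — mergekit applies this tensor-by-tensor with additional edge-case handling) might look like:

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flattened weight vectors."""
    u0 = v0 / np.linalg.norm(v0)
    u1 = v1 / np.linalg.norm(v1)
    dot = np.clip(np.dot(u0, u1), -1.0, 1.0)
    theta = np.arccos(dot)                    # angle between the directions
    if theta < eps:                           # nearly parallel: fall back to lerp
        return (1 - t) * v0 + t * v1
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * v0 + (np.sin(t * theta) / s) * v1
```

Unlike linear interpolation, SLERP moves along the arc between the two vectors, so interpolating between two unit vectors stays on the unit sphere instead of cutting through the interior.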
Chapters:
00:00 - Intro
01:51 - Method: Linear (Model Soups)
03:14 - Method: SLERP (Spherical Interpolation)
05:14 - Method: Task Arithmetic
08:14 - Method: TIES (Trim & Elect Signs)
11:39 - Method: DARE (Drop & Rescale)
13:26 - Method: Model Breadcrumbs
15:09 - Method: Model Stock
16:58 - Method: DELLA (Drop & Rescale via Sampling with Magnitude)
18:33 - Method: Passthrough (Frankenmerging)
20:02 - Method: Mixture of Experts
21:57 - Merging Your Own Models
22:35 - Showcase: Gemma 2 2B w/Model Stock
23:39 - Showcase: Llama 3 8B w/SLERP
24:19 - Showcase: Phi 3 Mini w/DELLA
24:58 - Showcase: Mistral 7B w/Mixture of Experts
25:26 - How To: Understanding Mergekit
26:29 - How To: Picking Models & Method
27:03 - How To: Config File Setup
28:54 - How To: Merging the Models
31:32 - How To: Testing the Merged Model
34:36 - How To: Concluding Merging
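The "Config File Setup" step uses mergekit's YAML format. A sketch of a SLERP config (the model names and layer counts below are placeholders — consult the mergekit README for the exact schema and options):

```yaml
# Hypothetical mergekit SLERP config; replace the placeholder model names
slices:
  - sources:
      - model: org/base-model        # placeholder
        layer_range: [0, 32]
      - model: org/finetuned-model   # placeholder
        layer_range: [0, 32]
merge_method: slerp
base_model: org/base-model
parameters:
  t: 0.5      # interpolation factor: 0 = base model, 1 = the other model
dtype: bfloat16
```

The merge is then run from the command line with something like `mergekit-yaml config.yaml ./output-model-dir`.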
#ai #machinelearning #coding
Video "Language Model Merging - Techniques, Tools, and Implementations" from the Adam Lucek channel
Video info:
Published: August 12, 2024, 17:00:33
Duration: 00:35:23