New fine-tuning of language models: Match meaning, not tokens

Language models are usually trained to predict the next word, but that objective does not always produce the best overall answers. We introduce energy-based fine-tuning, a method that trains models to produce better full responses and achieves stronger results without complex reward models or verifiers.

Project: https://energy-based-fine-tuning.github.io
Paper: https://arxiv.org/abs/2603.12248
GitHub: https://github.com/sjelassi/ebft_openrlhf

This session aired on May 14, 2026, at Microsoft Research Forum, Season 2 Episode 4.

Register for the series to hear about new releases: https://www.microsoft.com/en-us/research/event/microsoft-research-forum/?OCID=msr_researchforum_YTDescription
Explore all previous episodes: https://aka.ms/researchforumYTplaylist

Video information

May 14, 2026, 22:06:23

00:07:40

Microsoft Research
