
Pass@k Training for Adaptively Balancing Exploration (Aug 2025)

Title: Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models (Aug 2025)
Link: http://arxiv.org/abs/2508.10751v1
Date: August 2025

Summary:
The paper introduces Pass@k Training, a reinforcement learning method that uses the Pass@k metric as the reward signal to better balance exploration and exploitation in large reasoning models. It includes analytical derivations, empirical validation, and a study of advantage function design.
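
For readers unfamiliar with the metric, Pass@k is the probability that at least one of k sampled answers to a problem is correct. The sketch below computes the standard unbiased estimate of it from n samples, of which c are correct. The function name `pass_at_k` and its parameters are illustrative assumptions, and this summary does not give the paper's exact reward or advantage formulas, so the code should be read as a plain illustration of the metric rather than as the paper's training procedure.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate Pass@k from n sampled answers, c of which are correct.

    Hypothetical helper for illustration only; it is not taken from the paper.
    It returns the probability that a random size-k subset of the n samples
    contains at least one correct answer.
    """
    if not (0 <= c <= n):
        raise ValueError("c must satisfy 0 <= c <= n")
    if not (1 <= k <= n):
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) counts the size-k subsets made only of incorrect answers;
    # it is 0 when fewer than k incorrect answers exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # With 1 correct answer out of 4 samples, a larger k raises the chance
    # that a group of k answers contains the correct one.
    for k in (1, 2, 4):
        print(k, pass_at_k(n=4, c=1, k=k))
```

With k = 1 the estimate reduces to the plain accuracy c / n, so a reward based on a larger k gives credit to a group of answers as long as any one of them succeeds.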

Key Topics:
- Reinforcement Learning
- Large Language Models
- Exploration and Exploitation
- Pass@k metric
- Advantage function design

Chapters:
00:00 - Intro to AI Paper Podcasts
00:06 - LLM Common Headache
00:12 - Pass@k Training
00:15 - Core Insight
00:26 - The Problem
00:44 - Exploration vs. Exploitation
01:10 - Pass@k
01:28 - Computational Enhancements
01:38 - Training Stability
01:53 - Boosting Exploration
02:14 - Answer Diversity
02:27 - Practical Payoff
03:01 - Implicit Reward Design
03:31 - Adaptive Training
03:43 - Adaptive State Optimization
03:51 - Final Thoughts

Video "Pass@k Training for Adaptively Balancing Exploration (Aug 2025)" from the channel AI Paper Slop