
Pass@k Training for Adaptively Balancing Exploration (Aug 2025)

Title: Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models (Aug 2025)
Link: http://arxiv.org/abs/2508.10751v1
Date: August 2025

Summary:
The paper introduces Pass@k Training, a reinforcement learning method that uses the Pass@k metric as the reward to better balance exploration and exploitation in large reasoning models (LRMs). It includes analytical derivations, empirical validation, and an exploration of advantage function design.
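
For readers unfamiliar with the metric, here is a minimal sketch of how Pass@k is commonly computed. It uses the standard unbiased estimator from code-generation evaluation (n samples drawn, c of them correct), which is an assumption on our part and is not taken from the video. The function names `pass_at_k` and `group_reward` are hypothetical, and `group_reward` is only one plausible reading of "Pass@k as a reward" (a group of k responses scores 1 if any response is correct), not necessarily the paper's exact formulation.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate Pass@k from n sampled responses, c of which are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the chance that a random subset of
    # k samples contains no correct response; math.comb returns 0 when
    # fewer than k incorrect samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


def group_reward(rewards: list[float]) -> float:
    """Assumed Pass@k-style reward for one group of k responses.

    With binary per-response rewards (1 = correct, 0 = incorrect),
    the group scores 1 if at least one response is correct.
    """
    if not rewards:
        raise ValueError("group must contain at least one response")
    return max(rewards)
```

Under this sketch, a prompt answered correctly once in four samples has a Pass@1 estimate of 0.25 and a Pass@2 estimate of 0.5, which illustrates why rewarding Pass@k is more tolerant of varied, exploratory attempts than rewarding every single sample.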

Key Topics:
- Reinforcement Learning
- Large Language Models
- Exploration and Exploitation
- Pass@k metric
- Advantage function design

Chapters:
00:00 - Intro to AI Paper Podcasts
00:06 - LLM Common Headache
00:12 - Pass at K Training
00:15 - Core Insight
00:26 - The Problem
00:44 - Exploration vs. Exploitation
01:10 - Pass at K
01:28 - Computational Enhancements
01:38 - Training Stability
01:53 - Boosting Exploration
02:14 - Answer Diversity
02:27 - Practical Payoff
03:01 - Implicit Reward Design
03:31 - Adaptive Training
03:43 - Adaptive State Optimization
03:51 - Final Thoughts

Video "Pass@k Training for Adaptively Balancing Exploration (Aug 2025)" from the channel AI Paper Slop

ai paper explanation research


Video information

August 19, 2025, 18:24:04

00:19:56


