Which Experiences Are Influential For RL Agents

Which experiences in a replay buffer actually help an RL agent, and which ones hurt it?

In this video, we break down the 2025 Reinforcement Learning Journal paper:

“Which Experiences Are Influential for RL Agents? Efficiently Estimating the Influence of Experiences”
by Hiraoka, Wang, Onishi, and Tsuruoka.

The paper introduces PIToD (Policy Iteration with Turn-over Dropout), a method for estimating the influence of experience data in reinforcement learning without expensive Leave-One-Out (LOO) retraining.

We cover why influence estimation matters, why the classic LOO approach is computationally infeasible, how PIToD uses masks and flipped masks to isolate experience influence, and how the method can even improve underperforming agents by disabling harmful experience groups.
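
For readers who want a concrete picture of the mask and flipped-mask idea, here is a minimal NumPy sketch. It is not the authors' implementation. The function names (mask_for, flipped, masked_forward, influence), the two-layer network, the seeding scheme, and the sign convention of the influence score are all assumptions made for illustration. The idea it tries to capture: each experience gets its own fixed binary mask, training on that experience only updates the masked sub-network, and so the flipped mask selects a sub-network that never learned from it. Comparing a metric under the two masks then stands in for retraining without that experience.

```python
import numpy as np

def mask_for(experience_id: int, width: int, seed: int = 0) -> np.ndarray:
    """Fixed binary mask tied to one experience (hypothetical seeding scheme)."""
    rng = np.random.default_rng([seed, experience_id])
    return rng.integers(0, 2, size=width).astype(np.float64)

def flipped(mask: np.ndarray) -> np.ndarray:
    """Complementary mask: keeps exactly the units the original mask drops."""
    return 1.0 - mask

def masked_forward(x, w1, w2, mask):
    """Tiny two-layer network with the mask applied to the hidden units."""
    hidden = np.maximum(x @ w1, 0.0) * mask
    return hidden @ w2

def influence(x, w1, w2, mask, metric):
    """Metric under the flipped mask minus metric under the mask.

    The sign convention is an assumption of this sketch, not taken from the paper.
    """
    with_experience = metric(masked_forward(x, w1, w2, mask))
    without_experience = metric(masked_forward(x, w1, w2, flipped(mask)))
    return without_experience - with_experience

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 3))      # a small batch of made-up inputs
    w1 = rng.normal(size=(3, 16))    # untrained weights, for illustration only
    w2 = rng.normal(size=(16, 1))
    m = mask_for(experience_id=7, width=16)
    score = influence(x, w1, w2, m, lambda q: float(np.mean(q ** 2)))
    print("influence estimate for experience 7:", score)
```

In a real agent the weights would come from training in which each experience only updates the units kept by its own mask. With the untrained weights used here, the printed number has no meaning beyond showing the mechanics.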

Topics covered:
• Experience replay in off-policy reinforcement learning
• Policy evaluation and policy improvement
• Why Leave-One-Out influence estimation is too slow
• PIToD: Policy Iteration with Turn-over Dropout
• Binary masks and flipped masks
• Influence estimation without retraining
• Self-influence and primacy bias
• Fixing underperforming RL agents through amendment
• Open questions for scaling PIToD to larger networks and multi-agent settings

Chapters:
00:00 Introduction
01:30 Experience Replay Foundations
04:00 Why Leave-One-Out Is Too Slow
06:30 PIToD: Policy Iteration with Turn-over Dropout
11:30 Theoretical Foundation
13:30 Evaluation Results
17:30 Fixing Underperforming Agents
19:30 Outro

Paper:
https://rlj.cs.umass.edu/2025/papers/RLJ_RLC_2025_4.pdf

If you found this useful, like the video, leave a comment, and subscribe for more deep dives into reinforcement learning, AI research, and machine learning papers.

Video: Which Experiences Are Influential For RL Agents, from the channel Cindy