Robust-U1: MLLMs Self-Recover Corrupted Images

In this AI Research Roundup episode, Alex discusses the paper 'Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?' Multimodal Large Language Models (MLLMs) struggle to understand images degraded by real-world corruptions such as system noise, compression artifacts, and bad weather. To address this, the researchers introduce Robust-U1, a framework that gives MLLMs explicit visual self-recovery capabilities.

Built on the BAGEL model, Robust-U1 is trained with a three-stage pipeline. The first stage is supervised fine-tuning on ImageNet-C with a rectified-flow loss, which teaches the model to map corrupted inputs to recovered images. The second stage applies reinforcement learning with a dual-reward mechanism that balances pixel-level structure against semantic consistency to ensure high-fidelity image reconstruction. In the final stage, the model undergoes multimodal reasoning training via next-token prediction so that it can robustly understand the recovered visuals.

Paper URL: https://arxiv.org/abs/2606.08063

#AI #MachineLearning #DeepLearning #MLLM #ComputerVision #ImageRestoration #ReinforcementLearning
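
For readers who want a feel for the first two training stages, here is a minimal pure-Python sketch of a rectified-flow loss and a dual reward. It is an illustration under stated assumptions, not the paper's implementation: the function names, the `1 / (1 + MSE)` structure score, the cosine-similarity semantic score, and the `weight` parameter are all hypothetical choices, and images and embeddings are flattened into plain lists of floats.

```python
import math


def rectified_flow_loss(noise, target, t, predict_velocity):
    """Textbook rectified-flow objective on flat lists of floats.

    Interpolates x_t = (1 - t) * noise + t * target and returns the mean
    squared error between the predicted velocity and (target - noise).
    How Robust-U1 conditions on the corrupted image is not modeled here.
    """
    x_t = [(1.0 - t) * n + t * x for n, x in zip(noise, target)]
    v_target = [x - n for n, x in zip(noise, target)]
    v_pred = predict_velocity(x_t, t)
    return sum((p - v) ** 2 for p, v in zip(v_pred, v_target)) / len(v_target)


def pixel_reward(recovered, reference):
    """Hypothetical structure score in (0, 1]: 1 / (1 + MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(recovered, reference)) / len(reference)
    return 1.0 / (1.0 + mse)


def semantic_reward(emb_a, emb_b):
    """Hypothetical semantic score: cosine similarity of two embeddings."""
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined for zero vectors; treat as no similarity
    return dot / (norm_a * norm_b)


def dual_reward(recovered, reference, emb_rec, emb_ref, weight=0.5):
    """Weighted mix of the pixel-level and semantic scores (weight is assumed)."""
    pixel = pixel_reward(recovered, reference)
    semantic = semantic_reward(emb_rec, emb_ref)
    return weight * pixel + (1.0 - weight) * semantic


if __name__ == "__main__":
    # Toy values only; a real setup would use image tensors and a vision encoder.
    loss = rectified_flow_loss([0.0, 1.0], [1.0, 3.0], 0.5, lambda x_t, t: [0.0, 0.0])
    reward = dual_reward([0.2, 0.8], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0])
    print(loss, reward)
```

The single `weight` knob is the simplest way to express the trade-off between pixel-level structure and semantic consistency; the actual reward design in Robust-U1 may differ.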

Resources:
- GitHub: https://github.com/jqtangust/Robust-U1
- Hugging Face model (Robust-U1-SFT): https://huggingface.co/Jiaqi-hkust/Robust-U1-SFT
- Hugging Face model (Robust-U1-RL): https://huggingface.co/Jiaqi-hkust/Robust-U1-RL
- Hugging Face model (Robust-U1): https://huggingface.co/Jiaqi-hkust/Robust-U1

Video "Robust-U1: MLLMs Self-Recover Corrupted Images" from the AI Research Roundup channel