ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

This paper introduces **ShareGPT-4o-Image**, a dataset intended to bring image generation capabilities comparable to those of proprietary models such as GPT-4o-Image to open research. The dataset comprises **91,000 synthetic samples**: **45,000 text-to-image pairs** and **46,000 text-and-image-to-image examples**, all produced with GPT-4o’s image generation in order to distill its capabilities. Using this dataset, the authors trained **Janus-4o**, a multimodal large language model (MLLM). Janus-4o shows **significant improvements in text-to-image generation** over its predecessor, Janus-Pro, and **newly supports text-and-image-to-image generation**. It achieves these results with a relatively small dataset and a short training time (91K samples, 6 hours on a machine with 8 A800 GPUs). Both ShareGPT-4o-Image and Janus-4o are released to **promote further open research** on photorealistic, instruction-aligned image generation.

https://arxiv.org/pdf/2506.18095

Video "ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation" from the channel AI Papers Podcast Daily