ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

This paper introduces **ShareGPT-4o-Image**, a new dataset designed to make advanced image generation capabilities, similar to those of proprietary models like GPT-4o-Image, more widely available for open research. The dataset comprises **91,000 synthetic samples**: **45,000 text-to-image pairs** and **46,000 text-and-image-to-image examples**, all generated with GPT-4o’s image generation abilities in order to distill its performance. Using this dataset, the authors developed **Janus-4o**, a multimodal large language model (MLLM). Janus-4o shows **significant improvements in text-to-image generation** over its predecessor, Janus-Pro, and **newly supports text-and-image-to-image generation**. It achieves these results with a relatively small training set and short training time (91K samples, 6 hours on a machine with 8 A800 GPUs). The release of both ShareGPT-4o-Image and Janus-4o aims to **promote further open research** in photorealistic, instruction-aligned image generation.

https://arxiv.org/pdf/2506.18095

Video "ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation" from the channel AI Papers Podcast Daily