ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation
This paper introduces **ShareGPT-4o-Image**, a dataset intended to bring image generation capabilities comparable to those of proprietary models such as GPT-4o-Image to open research. The dataset comprises **91,000 synthetic samples**: **45,000 text-to-image pairs** and **46,000 text-and-image-to-image examples**, all produced with GPT-4o’s image generation in order to distill its capabilities. Using this dataset, the authors trained **Janus-4o**, a multimodal large language model (MLLM). Janus-4o shows **significant improvements in text-to-image generation** over its predecessor, Janus-Pro, and **newly supports text-and-image-to-image generation**. It reaches these results with a relatively small dataset and short training time: 91K samples and 6 hours on a machine with 8 A800 GPUs. The release of both ShareGPT-4o-Image and Janus-4o aims to **promote further open research** on photorealistic, instruction-aligned image generation.
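As a quick sanity check on the figures quoted in the summary, the arithmetic can be sketched as follows. This is only an illustration: the constant and function names are hypothetical, and the values are the rounded numbers from the summary, not exact counts from the paper.

```python
# Rounded figures as quoted in the summary (not exact counts from the paper).
TEXT_TO_IMAGE = 45_000            # text-to-image pairs
TEXT_AND_IMAGE_TO_IMAGE = 46_000  # text-and-image-to-image examples
TRAIN_HOURS = 6                   # reported wall-clock training time
NUM_GPUS = 8                      # A800 GPUs in the training machine


def dataset_total() -> int:
    """Total number of synthetic samples across both subsets."""
    return TEXT_TO_IMAGE + TEXT_AND_IMAGE_TO_IMAGE


def gpu_hours() -> int:
    """Training budget expressed in GPU-hours (wall-clock hours x GPUs)."""
    return TRAIN_HOURS * NUM_GPUS


def subset_share(count: int) -> float:
    """Fraction of the whole dataset taken up by one subset."""
    return count / dataset_total()


if __name__ == "__main__":
    print(f"total samples: {dataset_total()}")
    print(f"GPU-hours: {gpu_hours()}")
    print(f"text-to-image share: {subset_share(TEXT_TO_IMAGE):.3f}")
```

The two subsets add up to the stated 91,000 samples, and the reported training run corresponds to roughly 48 GPU-hours.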
https://arxiv.org/pdf/2506.18095
Video "ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation" from the channel AI Papers Podcast Daily
Video information
Published 13 hours 6 minutes ago
Duration: 00:18:13