AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
Paper: https://arxiv.org/pdf/2410.24024
This research paper proposes ANDROIDLAB, a systematic framework for training and evaluating Android autonomous agents. The framework provides two operation modes, XML and SoM (Set-of-Mark), so that large language models (LLMs) and large multimodal models (LMMs) work in the same action space. ANDROIDLAB includes a benchmark of 138 tasks across nine Android apps, designed to be reproducible and to remain challenging for current mobile agents. The authors also introduce the Android Instruct dataset, a collection of 10.5k traces and 94.3k steps, and report that fine-tuning open-source models on it significantly improves their performance on the benchmark. The paper concludes by discussing how further fine-tuning and optimization of open-source models could narrow the gap between them and closed-source models.
Video "AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents" from the channel AI Papers Decoded Podcast
Video information
November 5, 2024, 21:19:15
00:22:17
