AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents

Paper: https://arxiv.org/pdf/2410.24024

This paper proposes AndroidLab, a systematic framework for training and evaluating autonomous Android agents. The framework offers two operation modes, XML and SoM (Set-of-Mark), so that large language models (LLMs) and large multimodal models (LMMs) act within the same action space. AndroidLab includes a benchmark of 138 tasks across nine Android apps, designed to be reproducible and to remain challenging for current mobile agents. The authors also introduce the Android Instruct dataset, a collection of 10.5k traces and 94.3k steps, and show that fine-tuning open-source models on it substantially improves their performance on the benchmark. The paper concludes by discussing how further fine-tuning and optimization of open-source models could narrow the gap between them and closed-source models.
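To give a sense of scale for the Android Instruct figures quoted above, the short Python sketch below computes the average number of steps per trace. It assumes the rounded values "10.5k" and "94.3k" mean 10,500 and 94,300; the exact counts may differ slightly, and the function name is only illustrative.

```python
# Figures as quoted in the summary; both are rounded in the source,
# so the result is an approximation, not an exact dataset statistic.
TRACES = 10_500  # "10.5k traces"
STEPS = 94_300   # "94.3k steps"


def average_steps_per_trace(steps: int, traces: int) -> float:
    """Return the mean number of steps per trace."""
    if traces <= 0:
        raise ValueError("traces must be positive")
    return steps / traces


avg = average_steps_per_trace(STEPS, TRACES)
print(f"about {avg:.1f} steps per trace")
```

Under these assumptions a typical trace is roughly nine steps long, which suggests that the recorded tasks are mostly short multi-step interactions rather than single actions.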

Video "AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents" from the channel AI Papers Decoded Podcast

Video information

November 5, 2024, 21:19:15

00:22:17

AI Papers Decoded Podcast
