AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents

Paper: https://arxiv.org/pdf/2410.24024

This paper proposes ANDROIDLAB, a systematic framework for training and evaluating autonomous Android agents. The framework defines two operation modes, XML and SoM (Set-of-Mark), that share one action space, so large language models (LLMs) and large multimodal models (LMMs) can be evaluated on the same terms. ANDROIDLAB includes a benchmark of 138 tasks across nine Android apps, designed to be reproducible while remaining challenging for mobile agents. The authors also introduce the Android Instruct dataset, a collection of 10.5k traces and 94.3k steps (roughly nine steps per trace), and report that fine-tuning open-source models on it significantly improves their benchmark performance. The paper concludes by discussing how further fine-tuning and optimization of open-source models could narrow the performance gap with closed-source models.
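To make the idea of a shared action space concrete, here is a minimal sketch, not the paper's implementation. It assumes both modes number the same UI elements: a text-only model reads the indexed list, a multimodal model sees the same indices as marks on a screenshot, and either one can issue the same tap-by-index action. All names here (`UiElement`, `Tap`, `observe_xml`, `observe_som`, `resolve_tap`) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UiElement:
    """Hypothetical UI element; bounds are (left, top, right, bottom) in pixels."""
    index: int
    label: str
    bounds: tuple[int, int, int, int]


@dataclass(frozen=True)
class Tap:
    """Hypothetical action shared by both modes: tap an element by its index."""
    element_index: int


def observe_xml(elements: list[UiElement]) -> str:
    """Text-only observation (assumed format): one indexed line per element."""
    return "\n".join(f"[{e.index}] {e.label} {e.bounds}" for e in elements)


def observe_som(elements: list[UiElement]) -> list[tuple[int, tuple[int, int, int, int]]]:
    """SoM-style observation (assumed format): numbered marks to draw on a screenshot."""
    return [(e.index, e.bounds) for e in elements]


def resolve_tap(action: Tap, elements: list[UiElement]) -> tuple[int, int]:
    """Turn an index-based tap into screen coordinates (center of the element)."""
    by_index = {e.index: e for e in elements}
    left, top, right, bottom = by_index[action.element_index].bounds
    return ((left + right) // 2, (top + bottom) // 2)


elements = [
    UiElement(1, "Settings", (0, 0, 100, 50)),
    UiElement(2, "Clock", (0, 50, 100, 150)),
]
print(observe_xml(elements))
print(resolve_tap(Tap(2), elements))
```

Because the action refers only to an element index, the same `Tap` can be produced from either observation, which is the property that lets text-only and multimodal models be compared on one benchmark.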

Video "AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents" from the channel AI Papers Decoded Podcast