tinyML Research Symposium: AugViT: Improving Vision Transformer Training by Marrying Attention and Data Augmentation
Zhongzhi YU, PhD Student, EIC Lab at Georgia Institute of Technology
AugViT: Improving Vision Transformer Training by Marrying Attention and Data Augmentation
Despite the impressive accuracy of large-scale vision transformers (ViTs) across various tasks, it remains a challenge for small-scale ViTs (e.g., those with about 1G inference floating-point operations (FLOPs), as in LeViT) to significantly outperform state-of-the-art convolutional neural networks (CNNs) in terms of the accuracy-efficiency trade-off, limiting their wider application, especially on resource-constrained devices. As analyzed in recent works, selecting an effective data augmentation technique can non-trivially improve the accuracy of small-scale ViTs. However, whether existing mainstream data augmentation techniques designed for CNNs are optimal for ViTs remains an open question. To this end, we propose a data augmentation framework called AugViT, which incorporates the key component of ViTs, i.e., self-attention, into the data augmentation intensity to enable ViTs' outstanding performance across various devices. Specifically, motivated by ViTs' patch-based processing pipeline, our proposed AugViT integrates (1) a dedicated scheme for mapping the attention map in ViTs to a suggested augmentation intensity for each patch, (2) a simple but effective strategy for selecting the most effective attention map within ViTs to guide the aforementioned attention-aware data augmentation, and (3) a set of patch-level augmentation techniques that matches the patch-based processing pipeline and enables varying the augmentation intensity of each patch. Extensive experiments and ablation studies on two datasets and ten representative ViT models validate AugViT's effectiveness in boosting ViTs' performance, especially for small-scale ViTs, e.g., improving LeViT-128S's accuracy from 76.6% to 77.1% on the ImageNet dataset, a comparable accuracy to EfficientNet-B0 with 21.8% fewer inference FLOPs.
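To make the core idea concrete, below is a minimal sketch of attention-guided, patch-level augmentation. It is not the authors' implementation: the inverse attention-to-intensity mapping, the per-patch brightness jitter, and all function names are illustrative assumptions standing in for AugViT's actual mapping scheme, attention-map selection strategy, and augmentation set.

```python
# Minimal sketch (not the AugViT code) of attention-guided per-patch
# augmentation. Assumes a ViT whose [CLS]-token attention over patches
# is available as a (num_patches,) tensor. All names are illustrative.
import torch

def attention_to_intensity(cls_attn: torch.Tensor,
                           min_mag: float = 0.1,
                           max_mag: float = 0.9) -> torch.Tensor:
    """Map per-patch attention scores to augmentation magnitudes.

    Here, patches the model attends to less receive stronger
    augmentation; this inverse mapping is one plausible choice,
    not necessarily the mapping scheme used by AugViT.
    """
    attn = (cls_attn - cls_attn.min()) / (cls_attn.max() - cls_attn.min() + 1e-8)
    return min_mag + (1.0 - attn) * (max_mag - min_mag)

def augment_patches(images: torch.Tensor,
                    magnitudes: torch.Tensor,
                    patch_size: int = 16) -> torch.Tensor:
    """Apply a toy patch-level augmentation: per-patch brightness
    jitter whose strength follows the suggested magnitude for that
    patch. For simplicity, the same jitter is applied to every image
    in the batch."""
    b, c, h, w = images.shape
    ph, pw = h // patch_size, w // patch_size
    out = images.clone()
    for idx in range(ph * pw):
        i, j = divmod(idx, pw)
        mag = magnitudes[idx].item()
        # Brightness scale drawn from [1 - mag, 1 + mag].
        jitter = 1.0 + mag * (2 * torch.rand(1).item() - 1)
        out[:, :,
            i * patch_size:(i + 1) * patch_size,
            j * patch_size:(j + 1) * patch_size] *= jitter
    return out.clamp(0.0, 1.0)

# Usage: a 224x224 image with 16x16 patches yields 14x14 = 196 patches.
imgs = torch.rand(2, 3, 224, 224)
cls_attn = torch.rand(196)  # stand-in for a real ViT attention map
mags = attention_to_intensity(cls_attn)
aug = augment_patches(imgs, mags)
```

The key design point the sketch illustrates is that, unlike image-level augmentation for CNNs, the augmentation intensity here is a per-patch quantity aligned with the ViT's tokenization, so the attention map can modulate it patch by patch.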
Video from the channel The tinyML Foundation.