Global Vision Transformer Pruning with Hessian-Aware Saliency | CVPR 2023
Transformers yield state-of-the-art results across many tasks. However, their heuristically designed architectures impose huge computational costs during inference. This work challenges the common design philosophy of the Vision Transformer (ViT), which uses a uniform dimension across all stacked blocks in a model stage: we redistribute parameters both across transformer blocks and between the different structures within a block, in the first systematic attempt at global structural pruning. To handle the diverse structural components of ViTs, we derive a novel Hessian-based structural pruning criterion that is comparable across all layers and structures, together with latency-aware regularization for direct latency reduction.
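To illustrate the idea of a saliency score that is comparable across heterogeneous structures, here is a minimal sketch. It uses the common Gauss-Newton/Fisher approximation of the Hessian, under which the estimated loss increase from zeroing out a parameter group reduces to the squared inner product of that group's weights and gradients; because the score is in units of loss, scores from attention heads and MLP neurons can be ranked on one global list. The helper names are hypothetical and the paper's exact criterion may differ in its details.

```python
def taylor_saliency(weights, grads):
    """Estimated loss increase from pruning one structure (e.g. an
    attention head or a group of MLP neurons), via a second-order Taylor
    expansion with the Gauss-Newton/Fisher approximation H ~ g g^T.
    The score collapses to (sum_i g_i * w_i)**2 and is expressed in
    units of loss, so it is comparable across layers and structures."""
    inner = sum(g * w for g, w in zip(grads, weights))
    return inner * inner

def rank_structures(groups):
    """groups: dict mapping a structure name to its (weights, grads)
    lists. Returns names ordered from least to most salient; a global
    pruner removes structures from the front of this list."""
    scores = {name: taylor_saliency(w, g) for name, (w, g) in groups.items()}
    return sorted(scores, key=scores.get)
```

In an iterative pruning loop, these scores would be recomputed (and typically averaged over mini-batches) after each pruning-and-finetuning step, with a latency term added to each score to steer pruning toward structures that actually reduce runtime.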
Iterative pruning of the DeiT-Base model yields a new architecture family, NViT (Novel ViT), with a novel parameter redistribution that uses parameters more efficiently. On ImageNet-1K, NViT-Base achieves a 2.6x FLOPs reduction, a 5.1x parameter reduction, and a 1.9x runtime speedup over DeiT-Base in a near-lossless manner. Smaller NViT variants gain more than 1% accuracy at the same throughput as the DeiT-Small/Tiny variants, and achieve a lossless 3.3x parameter reduction over the Swin-Small model.
These results outperform prior art by a large margin. We further analyze the parameter redistribution found by NViT, showing the high prunability of ViT models, the distinct sensitivity of different structures within a ViT block, and a unique parameter distribution trend across stacked ViT blocks. These insights suggest a simple yet effective parameter redistribution rule toward more efficient ViTs with an off-the-shelf performance boost.
https://github.com/NVlabs/NViT
#CVPR2023 #cvpr
Video "Global Vision Transformer Pruning with Hessian-Aware Saliency | CVPR 2023" from the NVIDIA Developer channel