Real-Time Panoptic Segmentation From Dense Detections

Authors: Rui Hou, Jie Li, Arjun Bhargava, Allan Raventos, Vitor Guizilini, Chao Fang, Jerome Lynch, Adrien Gaidon

Description: Panoptic segmentation is a complex full scene parsing task requiring simultaneous instance and semantic segmentation at high resolution. Current state-of-the-art approaches cannot run in real-time, and simplifying these architectures to improve efficiency severely degrades their accuracy. In this paper, we propose a new single-shot panoptic segmentation network that leverages dense detections and a global self-attention mechanism to operate in real-time with performance approaching the state of the art. We introduce a novel parameter-free mask construction method that substantially reduces computational complexity by efficiently reusing information from the object detection and semantic segmentation sub-tasks. The resulting network has a simple data flow that requires no feature map re-sampling, enabling significant hardware acceleration. Our experiments on the Cityscapes and COCO benchmarks show that our network works at 30 FPS on 1024x2048 resolution, trading a 3% relative performance degradation from the current state of the art for up to 440% faster inference.
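The description does not spell out how the parameter-free mask construction works. The following is only a minimal sketch of one way detection and semantic segmentation outputs could be reused without extra learned parameters. It assumes every pixel carries its own predicted box and a probability for the object's class, and it assigns a pixel to a detected object when the pixel's box agrees with the object's box (measured by IoU) and the class probability is high. The function names `box_iou` and `build_mask`, the product rule, and the 0.5 threshold are illustrative assumptions, not the authors' implementation.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def build_mask(query_box, dense_boxes, class_prob, threshold=0.5):
    """Build a boolean mask for one detected object (hypothetical helper).

    dense_boxes[y][x] is the box predicted at pixel (x, y);
    class_prob[y][x] is the semantic probability of the object's class there.
    No learned parameters are involved: the score is just IoU * probability.
    """
    mask = []
    for box_row, prob_row in zip(dense_boxes, class_prob):
        mask.append([
            box_iou(query_box, box) * prob >= threshold
            for box, prob in zip(box_row, prob_row)
        ])
    return mask


# Toy 1x3 "image": the first two pixels predict the query box itself,
# the third predicts a neighbouring box that does not overlap it.
query = (0, 0, 2, 2)
boxes = [[(0, 0, 2, 2), (0, 0, 2, 2), (2, 0, 4, 2)]]
probs = [[0.9, 0.4, 0.9]]
mask = build_mask(query, boxes, probs)
```

In this toy input only the first pixel is kept: the second has a matching box but a low class probability, and the third has a high probability but a box with no overlap.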

Video: Real-Time Panoptic Segmentation From Dense Detections, from the channel ComputerVisionFoundation Videos
Video information
July 17, 2020, 13:20:08
00:04:49
Other videos from this channel
Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation
High-Resolution Radar Dataset for Semi-Supervised Learning of Dynamic Objects
Learning to Dress 3D People in Generative Clothing
Learning Physics-Guided Face Relighting Under Directional Light
Orthogonal Convolutional Neural Networks
232 - Improving Video Captioning with Temporal Composition of a Visual-Syntactic Embedding
Match or No Match: Keypoint Filtering Based on Matching Probability
DeepLPF: Deep Local Parametric Filters for Image Enhancement
324 - Weakly Supervised Deep Reinforcement Learning for Video Summarization With Semantically Meani
Neural Architecture Search for Lightweight Non-Local Networks
Inverse Rendering for Complex Indoor Scenes: Shape, Spatially-Varying Lighting and SVBRDF From a...
High-Frequency Component Helps Explain the Generalization of Convolutional Neural Networks
1257 - Multimodal Prototypical Networks for Few-shot Learning
1369 - CenterFusion: Center-based Radar and Camera Fusion for 3D Object Detection
515 - Cinematic-L1 Video Stabilization with a Log-Homography Model
71 - DeepCSR: A 3D Deep Learning Approach For Cortical Surface Reconstruction
Through the Looking Glass: Neural 3D Reconstruction of Transparent Shapes
Rethinking Zero-Shot Video Classification: End-to-End Training for Realistic Applications
12-in-1: Multi-Task Vision and Language Representation Learning
End-to-End Camera Calibration for Broadcast Videos
653 - Misclassification Risk and Uncertainty Quantification in Deep Classifiers