Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes

Tobias Pohlen, Alexander Hermans, Markus Mathias, Bastian Leibe
Semantic image segmentation is an essential component of modern autonomous driving systems, as an accurate understanding of the surrounding scene is crucial to navigation and action planning. Current state-of-the-art approaches in semantic image segmentation rely on pre-trained networks that were initially developed for classifying images as a whole. While these networks exhibit outstanding recognition performance (i.e., what is visible?), they lack localization accuracy (i.e., where precisely is something located?). Therefore, additional processing steps have to be performed in order to obtain pixel-accurate segmentation masks at the full image resolution. To alleviate this problem we propose a novel ResNet-like architecture that exhibits strong localization and recognition performance. We combine multi-scale context with pixel-level accuracy by using two processing streams within our network: One stream carries information at the full image resolution, enabling precise adherence to segment boundaries. The other stream undergoes a sequence of pooling operations to obtain robust features for recognition. The two streams are coupled at the full image resolution using residuals. Without additional processing steps and without pre-training, our approach achieves an intersection-over-union score of 71.8% on the Cityscapes dataset.
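
The two-stream coupling described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract gives no layer details, so the function names (`two_stream_unit`, `max_pool`, `upsample`, `conv1x1`), the use of a single 1x1 convolution with ReLU in place of the real convolutional layers, nearest-neighbour upsampling, and all shapes are assumptions chosen for illustration only. What it shows is the general idea: the full-resolution stream is pooled and merged into the pooled stream for recognition, and the result is sent back to the full-resolution stream as an additive residual.

```python
import numpy as np

def max_pool(x, k):
    """Max-pool a (C, H, W) array by factor k (H and W must be divisible by k)."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).max(axis=(2, 4))

def upsample(x, k):
    """Nearest-neighbour upsampling of a (C, H, W) array by factor k."""
    return x.repeat(k, axis=1).repeat(k, axis=2)

def conv1x1(x, weight):
    """1x1 convolution: mix channels with a (C_out, C_in) weight matrix."""
    return np.einsum("oc,chw->ohw", weight, x)

def two_stream_unit(residual, pooled, w_mix, w_back, scale):
    """One hypothetical coupling step between the two streams.

    residual: full-resolution stream, shape (C_r, H, W)
    pooled:   low-resolution stream, shape (C_p, H/scale, W/scale)
    """
    # Bring the full-resolution stream down to the pooled stream's resolution.
    down = max_pool(residual, scale)
    # Merge both streams; a 1x1 conv + ReLU stands in for the real conv layers.
    joint = np.concatenate([pooled, down], axis=0)
    new_pooled = np.maximum(conv1x1(joint, w_mix), 0.0)
    # Project back to the residual stream's channels, upsample, and add.
    update = upsample(conv1x1(new_pooled, w_back), scale)
    return residual + update, new_pooled

# Example with made-up sizes: 4 residual channels, 6 pooled channels, scale 4.
rng = np.random.default_rng(0)
residual = rng.standard_normal((4, 8, 8))
pooled = rng.standard_normal((6, 2, 2))
w_mix = rng.standard_normal((6, 10))   # (C_p, C_p + C_r)
w_back = rng.standard_normal((4, 6))   # (C_r, C_p)
new_residual, new_pooled = two_stream_unit(residual, pooled, w_mix, w_back, 4)
print(new_residual.shape, new_pooled.shape)
```

Because the full-resolution stream is only ever modified by addition, it keeps its spatial resolution throughout, which is the property the abstract credits for precise adherence to segment boundaries.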

Video "Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes" from the ComputerVisionFoundation Videos channel
Video information
July 26, 2017, 2:14:27
00:11:28