Video Panoptic Segmentation

Authors: Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon

Description: Panoptic segmentation has become a new standard visual recognition task by unifying the previously separate semantic segmentation and instance segmentation tasks. In this paper, we propose and explore a new video extension of this task, called video panoptic segmentation. The task requires generating consistent panoptic segmentation as well as an association of instance ids across video frames. To invigorate research on this new task, we present two types of video panoptic datasets. The first is a re-organization of the synthetic VIPER dataset into the video panoptic format to exploit its large-scale pixel annotations. The second is a temporal extension of the Cityscapes validation set, providing new video panoptic annotations (Cityscapes-VPS). Moreover, we propose a novel video panoptic segmentation network (VPSNet), which jointly predicts object classes, bounding boxes, masks, instance id tracking, and semantic segmentation in video frames. To provide appropriate metrics for this task, we propose a video panoptic quality (VPQ) metric and evaluate our method and several other baselines. Experimental results demonstrate the effectiveness of the two presented datasets. We achieve state-of-the-art results in image PQ on Cityscapes and also in VPQ on the Cityscapes-VPS and VIPER datasets.
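The description names the VPQ metric but does not define it. As a rough illustration only, the sketch below computes the standard image panoptic quality for a single class, PQ = (sum of IoUs over matched segments) / (TP + 0.5 FP + 0.5 FN), where a ground-truth and a predicted segment match when their IoU exceeds 0.5. Representing each segment as a set of (frame, row, column) triples, so that one instance id spans several frames, is an assumption made here to suggest how a video extension could score id consistency. The function name and data layout are hypothetical, and the paper's actual VPQ windowing and averaging are not reproduced.

```python
from typing import Dict, Hashable, Set, Tuple

# Assumed layout: a segment is a set of (frame index, row, column) triples.
Pixel = Tuple[int, int, int]
Segments = Dict[Hashable, Set[Pixel]]


def panoptic_quality(gt: Segments, pred: Segments) -> float:
    """Single-class PQ: sum of matched IoUs / (TP + 0.5*FP + 0.5*FN)."""
    iou_sum = 0.0
    matched_pred = set()
    for gt_pixels in gt.values():
        for pred_id, pred_pixels in pred.items():
            if pred_id in matched_pred:
                continue
            union = len(gt_pixels | pred_pixels)
            iou = len(gt_pixels & pred_pixels) / union if union else 0.0
            # With non-overlapping segments, IoU > 0.5 makes the match unique.
            if iou > 0.5:
                iou_sum += iou
                matched_pred.add(pred_id)
                break
    tp = len(matched_pred)
    fp = len(pred) - tp  # predicted segments with no match
    fn = len(gt) - tp    # ground-truth segments with no match
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0


# One ground-truth instance covering two pixels in each of two frames.
gt_example = {1: {(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 0, 1)}}
# The prediction keeps the same id across frames but misses one pixel.
pred_example = {7: {(0, 0, 0), (0, 0, 1), (1, 0, 0)}}
print(panoptic_quality(gt_example, pred_example))  # 0.75
```

Because a segment spans frames in this layout, a prediction that switches instance id between frames would split into two smaller segments, each with a lower IoU against the ground truth, so the score drops even if every per-frame mask is accurate.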

Video: Video Panoptic Segmentation, from the ComputerVisionFoundation Videos channel
Video information
July 17, 2020, 14:20:21
00:05:01