Sapiens2: 4K High-Fidelity Human Vision Models

In this AI Research Roundup episode, Alex discusses the paper 'Sapiens2'. Sapiens2 is a new family of high-resolution vision transformers designed for precise human-centric tasks such as pose estimation and body-part segmentation. The models range from 0.4 to 5 billion parameters and support native resolutions up to 4K using windowed attention. By combining masked image reconstruction with self-distilled contrastive objectives, the researchers improved both low-level detail capture and high-level semantics. The models were pretrained on a curated dataset of 1 billion high-quality human images and set a new state of the art on these tasks. Beyond existing tasks, Sapiens2 also introduces pointmap and albedo estimation, with significantly lower error rates.

Paper URL: https://arxiv.org/pdf/2604.21681

#AI #MachineLearning #DeepLearning #ComputerVision #VisionTransformer #HumanCentric #PoseEstimation #ImageSegmentation

Resources:
- GitHub: https://github.com/facebookresearch/sapiens2
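
The description credits windowed attention for making 4K native resolution practical. As a rough illustration of why, the sketch below counts query-key pairs for one attention layer with and without windows. The 4096x4096 input, the 16-pixel patch size, and the 32x32-token window are assumptions chosen for the arithmetic. They are not taken from the paper, and `attention_pairs` is a hypothetical helper, not part of the Sapiens2 code.

```python
def attention_pairs(height, width, patch=16, window=None):
    """Count query-key pairs for one self-attention layer.

    height, width: image size in pixels (illustrative values only).
    patch: side of a square patch in pixels (assumed, not from the paper).
    window: side of a square attention window in tokens, or None for
            global attention over every token.
    """
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    rows, cols = height // patch, width // patch

    if window is None:
        # Global attention: every token attends to every token.
        tokens = rows * cols
        return tokens * tokens

    if rows % window or cols % window:
        raise ValueError("token grid must be a multiple of the window size")
    # Windowed attention: tokens attend only within their own window.
    num_windows = (rows // window) * (cols // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window ** 2


global_pairs = attention_pairs(4096, 4096)
windowed_pairs = attention_pairs(4096, 4096, window=32)
print(global_pairs, windowed_pairs, global_pairs // windowed_pairs)
```

Under these assumed sizes the token grid is 256x256, and the windowed layer computes 64 times fewer pairs than the global one. In general the saving equals the total token count divided by the tokens per window, so it grows with resolution when the window size is fixed.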

Video "Sapiens2: 4K High-Fidelity Human Vision Models" from the AI Research Roundup channel