Architecting Object Permanence: The HyDRA Framework
Researchers have introduced **Hybrid Memory**, a new framework for video world models that maintains **static background consistency** while tracking **dynamic subjects** that move in and out of the frame. Unlike previous methods that treat environments as motionless, this approach ensures that subjects re-emerging from off-camera retain their **original appearance and motion trajectory**. To support this, the authors developed **HM-World**, a massive dataset containing 59,000 high-fidelity clips specifically designed with **complex camera movements** and subject "exit-entry" events. They also proposed **HyDRA**, a specialized architecture that uses a **spatiotemporal retrieval mechanism** to pull relevant cues from compressed memory tokens. This system allows the model to "remember" hidden objects, effectively preventing them from **vanishing or distorting** when they return to view. Ultimately, this work significantly advances the ability of AI to simulate **physically coherent and persistent** dynamic worlds.
**Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models**
By Kaijin Chen (Huazhong University of Science and Technology, Kling Team at Kuaishou Technology), Dingkang Liang (Huazhong University of Science and Technology), Xin Zhou (Huazhong University of Science and Technology), Yikang Ding (Kling Team at Kuaishou Technology), Xiaoqiang Liu (Kling Team at Kuaishou Technology), Pengfei Wan (Kling Team at Kuaishou Technology), and Xiang Bai (Huazhong University of Science and Technology).
**What problem was the paper trying to solve?**
The paper addresses the **inability of current video world models to maintain the consistency of moving subjects that temporarily leave the camera's field of view**. Existing memory mechanisms treat simulated environments as static canvases, meaning that when dynamic subjects (like walking pedestrians) exit the frame and later re-enter, models lose track of them and often render the returning subjects as frozen, distorted, or missing altogether.
**What are the paper's key novel ideas?**
The authors introduce **Hybrid Memory**, a new generative paradigm requiring models to simultaneously act as precise archivists for static backgrounds while constantly tracking and predicting the unseen motion of hidden dynamic subjects. To facilitate this, they created **HM-World**, the first large-scale dataset (59K video clips) specifically constructed with decoupled camera and subject trajectories to densely capture complex "exit-and-re-entry" events.
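For concreteness, the sketch below shows how an "exit-and-re-entry" event could be detected when curating this kind of data, assuming each subject comes with a per-frame visibility flag. The function name, the visibility representation, and the `min_gap` threshold are illustrative assumptions, not details of the HM-World construction pipeline.

```python
# Hypothetical helper for dataset curation: given per-frame visibility of one
# subject, find intervals where it leaves the frame and later re-enters.
# The visibility flags and the min_gap threshold are assumptions for
# illustration only; they are not taken from the HM-World pipeline.
from typing import List, Tuple

def exit_reentry_events(visible: List[bool],
                        min_gap: int = 8) -> List[Tuple[int, int]]:
    """Return (exit_frame, reentry_frame) pairs where the subject is
    out of view for at least `min_gap` consecutive frames."""
    events, gap_start = [], None
    for t, v in enumerate(visible):
        if not v and gap_start is None:
            gap_start = t                    # subject just left the frame
        elif v and gap_start is not None:
            if t - gap_start >= min_gap:     # absence long enough to count
                events.append((gap_start, t))
            gap_start = None
    return events

# Example: visible for 5 frames, gone for 10, visible again for 5.
print(exit_reentry_events([True] * 5 + [False] * 10 + [True] * 5))
# -> [(5, 15)]
```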
**What is the architecture or method they are using?**
The researchers propose **HyDRA (Hybrid Dynamic Retrieval Attention)**, a specialized memory architecture built on top of a video diffusion model. It employs a **3D-convolution-based Memory Tokenizer** to compress historical video latents into compact, motion-aware memory tokens. During generation, a **spatiotemporal relevance-driven retrieval mechanism** calculates affinity metrics to actively scan these tokens, selectively pulling the most critical hidden motion and appearance cues into the current denoising process.
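The sketch below illustrates, in PyTorch, how these two pieces might fit together: a 3D-convolutional tokenizer that compresses history latents into memory tokens, and a relevance-driven retrieval step that scores those tokens against the current denoising query and keeps only the most relevant ones. Module names, tensor shapes, and the cosine-similarity affinity are assumptions made for illustration; they do not reproduce the authors' implementation.

```python
# Minimal sketch of the two HyDRA components described above.
# Shapes, hyperparameters, and the cosine affinity are illustrative
# assumptions, not the paper's actual design choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryTokenizer(nn.Module):
    """Compress historical video latents into compact memory tokens
    with a 3D convolution over (time, height, width)."""
    def __init__(self, latent_dim: int, token_dim: int, stride=(4, 2, 2)):
        super().__init__()
        self.conv3d = nn.Conv3d(latent_dim, token_dim,
                                kernel_size=(4, 3, 3),
                                stride=stride, padding=(0, 1, 1))

    def forward(self, history_latents: torch.Tensor) -> torch.Tensor:
        # history_latents: (B, C, T, H, W) -> memory tokens (B, N, token_dim)
        tokens = self.conv3d(history_latents)        # (B, D, T', H', W')
        return tokens.flatten(2).transpose(1, 2)     # (B, T'*H'*W', D)

def retrieve_memory(query: torch.Tensor,
                    memory_tokens: torch.Tensor,
                    top_k: int = 64) -> torch.Tensor:
    """Relevance-driven retrieval: score every memory token against the
    current denoising query, keep the top-k, and hand them to the
    denoiser (e.g. as cross-attention keys/values).  Cosine similarity
    stands in here for the paper's affinity metric."""
    q = F.normalize(query.mean(dim=1, keepdim=True), dim=-1)   # (B, 1, D)
    m = F.normalize(memory_tokens, dim=-1)                      # (B, N, D)
    affinity = (q @ m.transpose(1, 2)).squeeze(1)               # (B, N)
    idx = affinity.topk(top_k, dim=-1).indices                  # (B, k)
    batch = torch.arange(memory_tokens.size(0)).unsqueeze(-1)
    return memory_tokens[batch, idx]                            # (B, k, D)

# Example usage with toy shapes:
tokenizer = MemoryTokenizer(latent_dim=16, token_dim=64)
history = torch.randn(1, 16, 16, 32, 32)          # (B, C, T, H, W)
mem = tokenizer(history)                          # (1, N, 64)
query = torch.randn(1, 256, 64)                   # current denoising latents
kv = retrieve_memory(query, mem, top_k=64)        # (1, 64, 64)
```

In a full model, the retrieved tokens would typically be injected as keys and values of a cross-attention layer inside each diffusion block, so the denoiser can consult the compressed memory at every denoising step.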
**Why the paper matters**
This research overcomes a critical bottleneck in generative video consistency, proving that models can learn to successfully disentangle moving entities from static environments. Extensive experiments show that **HyDRA significantly outperforms existing state-of-the-art methods and leading commercial models** (such as WorldPlay) in preserving subject identity, motion coherence, and overall visual fidelity during complex camera movements.
**What are the potential applications?**
Robust hybrid memory capabilities directly enhance **long-duration, high-fidelity video generation and interactive world modeling**. These advancements are foundational for downstream applications that require consistent physical world simulation, including **autonomous driving testing, embodied intelligence, and interactive open-world gaming**.
Video "Architecting Object Permanence: The HyDRA Framework" from the MLSlops channel
Video information
Published: March 31, 2026, 18:30:13
Duration: 00:07:46