Keras Tutorial: Checkpointing distributed models with Orbax

Don't let device failures or power outages ruin your training runs. In this tutorial, Yufeng Guo demonstrates how to use Keras with the Orbax checkpointing library. Learn how to implement a custom checkpoint manager and Keras callbacks to ensure your model state is always safely stored.

0:00 Introduction to Orbax & Keras Integration
0:39 Exploring Keras Checkpointing
1:11 Why Extend Keras for Multi-Host Environments?
1:48 What is Orbax?
2:29 Building Utility Classes: KerasOrbaxCheckpointManager & OrbaxCheckpointCallback
2:57 Deep Dive into KerasOrbaxCheckpointManager
3:45 Coding the Get, Save, and Restore State Functions
4:37 Implementing the OrbaxCheckpointCallback
5:12 Protecting Against Device Failures & Preemption
5:31 Implementation Details & Model.fit Integration
6:07 Checkpointing in Action: File Directory Walkthrough
6:56 Summary & Final Tips

Resources:
Orbax checkpointing in Keras - Developer guide → https://goo.gle/40T2LI8
ModelCheckpoint - Keras 3 API documentation → https://goo.gle/3PkAlEq
Subscribe to Google for Developers → https://goo.gle/developers

Speaker: Yufeng Guo
Products Mentioned: Google AI

Video "Keras Tutorial: Checkpointing distributed models with Orbax" from the Google for Developers channel