Data Versioning Control with Real ML Project | Hands-On Lesson #1

This hands-on tutorial demonstrates how to use Data Version Control (DVC) commands in a real Machine Learning (ML) project. DVC is an important part of #mlops (Machine Learning Operations). You will learn how to initialize a new DVC session, how to prepare your data for tracking with DVC, how to read .dvc files and understand what information they store, and more. After watching this tutorial, you will also know how to pull data from remote storage into your #datascience project.

Overall, this video shows best practices for working with DVC workflows, for both beginners and advanced users (data scientists, data analysts, MLOps engineers).

DOWNLOAD THE FILES TO START THE TUTORIAL:
- You can follow every step yourself by cloning this GitHub repository to your local machine: https://github.com/vb100/dvc_project
- Training and Validation data: https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz

To complete this lesson, you will create a new branch in your GitHub repository, where all data version control actions will be made.

Keep in mind that when Git and DVC are combined, small files go to Git and large files go to DVC. Each system has its own components, such as the Git staging area, the DVC cache, the DVC remote, and more.

Remote storage can be on the same computer you are working on (as in this tutorial), or it can be in the cloud:
- AWS S3 bucket.
- Google Cloud Storage bucket.
- Azure Blob Storage, etc.
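
To make the remote options above concrete, here is a minimal sketch that builds the `dvc remote add` command line for each kind of storage. The remote name `dvc_remote` comes from the tutorial; the folder path, bucket names, and container name are hypothetical placeholders, and the helper `remote_add_command` is only an illustration, not part of DVC. The sketch only prints the commands and does not run them.

```python
from typing import List

# Hypothetical example locations (assumptions); replace with your own folder or bucket.
EXAMPLE_REMOTES = {
    "local": "/tmp/dvc_remote",
    "s3": "s3://my-bucket/dvc-store",
    "gcs": "gs://my-bucket/dvc-store",
    "azure": "azure://my-container/dvc-store",
}


def remote_add_command(name: str, url: str, default: bool = True) -> List[str]:
    """Build the argv for `dvc remote add`; the -d flag marks the remote as the default."""
    cmd = ["dvc", "remote", "add"]
    if default:
        cmd.append("-d")
    return cmd + [name, url]


if __name__ == "__main__":
    # Print one command per storage type so the URL forms can be compared.
    for kind, url in EXAMPLE_REMOTES.items():
        print(f"{kind}: {' '.join(remote_add_command('dvc_remote', url))}")
```

Only the URL changes between a local folder and a cloud bucket; the rest of the DVC workflow stays the same. Cloud remotes typically also need credentials and an extra DVC dependency for the chosen provider.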

The content of the tutorial:
0:00 - Intro
1:17 - P1. Set up your Python environment
3:50 - P2. Hands-on: the basic DVC workflow
6:50 - Tracking data files with DVC
9:27 - Pushing files to remote storage with DVC
11:59 - Real-life situation: retrieve data from remote
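
The track, push, and retrieve steps in the chapters above can be sketched as a short sequence of Git and DVC commands. This is a rough outline under stated assumptions, not the exact commands typed in the video: the folder name `imagenette2-160` is assumed to be what the downloaded archive extracts to, the commit message is made up, and the helper functions are hypothetical. By default the sketch only prints the commands (dry run).

```python
import os
import subprocess
from typing import List


def track_and_push(data_path: str) -> List[List[str]]:
    """Commands that put data_path under DVC control and upload it to the default remote."""
    data_path = data_path.rstrip("/")
    # `dvc add` writes a .gitignore next to the data so Git ignores the large files.
    gitignore = os.path.join(os.path.dirname(data_path), ".gitignore")
    return [
        ["dvc", "add", data_path],                    # creates <data_path>.dvc and caches the data
        ["git", "add", f"{data_path}.dvc", gitignore],  # Git tracks only the small pointer file
        ["git", "commit", "-m", f"Track {data_path} with DVC"],
        ["dvc", "push"],                              # uploads the cached data to the remote
    ]


def retrieve() -> List[List[str]]:
    """Command that downloads tracked data from the remote into the workspace."""
    return [["dvc", "pull"]]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it as well when dry_run is False."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(track_and_push("imagenette2-160"))  # assumed folder name
    run(retrieve())
```

`dvc pull` downloads the data from the remote into the cache and then places it in the workspace, while `dvc checkout` only restores workspace files from data that is already in the local cache.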

Important moments:
4:44 - Create a remote storage folder (dvc_remote) and connect it to DVC for the data science project.
6:00 - Check the config file in the .dvc folder.
7:37 - What are .dvc files? (Explanation).
8:00 - What is the MD5 hash in DVC? (Explanation).
9:08 - Git control vs. DVC control (diagrams).
10:55 - Check the remote storage folder.
11:36 - Check the .dvc folder and config file in the GitHub repository.
12:20 - Use the dvc checkout command to pull data from remote storage.
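
As a small illustration of the MD5 value mentioned above: MD5 is a hash (a one-way checksum), and a .dvc file records such a hash for the tracked data. The sketch below computes the MD5 of a single file with Python's standard library. The helper names are hypothetical, and the cache layout in `cache_subpath` (first two hex characters as a folder, the rest as the file name) is an assumption that may differ between DVC versions. DVC handles directories, and in some versions text files, in its own way, so its recorded value will not always match this plain file hash.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_subpath(md5_hex: str) -> str:
    """Assumed cache layout: two-character folder, then the remaining hash as file name."""
    return f"{md5_hex[:2]}/{md5_hex[2:]}"


if __name__ == "__main__":
    import sys

    # Usage: python md5_sketch.py <file>
    if len(sys.argv) > 1:
        md5_hex = file_md5(Path(sys.argv[1]))
        print(md5_hex, "->", cache_subpath(md5_hex))
```

Because the hash depends only on the file contents, two identical files get the same value, which is how content-addressed storage avoids keeping duplicates.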

Official DVC documentation: https://dvc.org/

Thank you for watching!
Subscribe to the channel for more content like this in the future! See you there!

#github

Video: Data Versioning Control with Real ML Project | Hands-On Lesson #1, from the channel Data Science Garage
Video information
August 1, 2022, 3:23:00
00:14:31