Algorithms of Data Science (Fall 2025) - Session 6a
Sampling and Bootstrap in Tree Models
Yiqiao introduced the new session on sampling and bootstrap, explaining its importance in building tree-based models and model selection. He noted that students are already familiar with the concept of sampling, making this session's learning curve less steep. Yiqiao discussed different sampling methods, including random sampling and stratified sampling, emphasizing their relevance in survey design but noting their limited use in modern data science and AI.
Understanding Sampling Methods in Data Science
Yiqiao explained different sampling methods in data science, noting that systematic sampling is less common in current AI literature than simple random sampling. He contrasted stratified and cluster sampling: stratified sampling draws a random sample from every subgroup (stratum), while cluster sampling randomly selects a few whole groups (clusters) and uses only the units inside them. Yiqiao emphasized that while stratified sampling is often preferred for fairness in machine learning algorithms, practitioners should be familiar with the various sampling methods and choose the appropriate one for the specific problem at hand.
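The contrast between stratified and cluster sampling can be sketched in a few lines of Python. This is a minimal illustration, not code from the session: the function names, the toy records, and the group labels "A" to "D" are all hypothetical.

```python
import random
from collections import defaultdict


def group_by(records, key):
    """Collect records into a dict of lists, one list per group label."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return groups


def stratified_sample(records, key, per_group, rng):
    """Draw up to `per_group` random records from EVERY group (stratum)."""
    groups = group_by(records, key)
    sample = []
    for name in sorted(groups):
        members = groups[name]
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample


def cluster_sample(records, key, n_clusters, rng):
    """Pick `n_clusters` whole groups at random and keep all their records."""
    groups = group_by(records, key)
    chosen = set(rng.sample(sorted(groups), n_clusters))
    return [rec for rec in records if key(rec) in chosen]


rng = random.Random(0)
# Hypothetical toy data: 4 groups with 10 records each.
records = [(g, i) for g in "ABCD" for i in range(10)]

strat = stratified_sample(records, key=lambda r: r[0], per_group=3, rng=rng)
clust = cluster_sample(records, key=lambda r: r[0], n_clusters=2, rng=rng)

print(sorted({g for g, _ in strat}))  # all four groups are represented
print(sorted({g for g, _ in clust}))  # only the two selected groups appear
```

The stratified sample guarantees that every group is represented, which is why it is often associated with fairness; the cluster sample is cheaper to collect but leaves entire groups out.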
Bootstrap and Sampling Concepts
Yiqiao explained the concepts of sampling and bootstrap: sampling selects a subset of a population for analysis, while bootstrap repeatedly resamples the observed data at random, with replacement, to estimate population parameters and the variability of those estimates. He emphasized that bootstrap can provide accurate results from fewer collected samples than traditional methods, saving time and resources. Yiqiao concluded by introducing an exercise for participants to practice these concepts using a provided website.
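As a rough sketch of the bootstrap idea described above, the following Python resamples one observed dataset with replacement and uses the spread of the resampled means to form a percentile interval. The synthetic data, the 2,000 resamples, and the helper names are assumptions chosen for illustration, not details from the session.

```python
import random
import statistics


def bootstrap_means(data, n_resamples, rng):
    """Mean of each resample drawn WITH replacement, same size as `data`."""
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]


def percentile_interval(values, lower=0.025, upper=0.975):
    """Simple percentile interval taken from the sorted bootstrap statistics."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


rng = random.Random(42)
# Hypothetical observed sample: 50 draws from a normal distribution.
data = [rng.gauss(10.0, 2.0) for _ in range(50)]

means = bootstrap_means(data, n_resamples=2000, rng=rng)
low, high = percentile_interval(means)

print(f"sample mean = {statistics.fmean(data):.3f}")
print(f"approx. 95% bootstrap interval = ({low:.3f}, {high:.3f})")
```

Only the original 50 observations are ever collected; every resample reuses them, which is where the savings in time and resources come from.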
Monte Carlo Pi Estimation Method
Yiqiao explained a method to estimate the value of pi using a Monte Carlo simulation in Python. The experiment generates random points in a square, then counts how many of them fall within a quarter-circle inscribed in the square. A quarter-circle whose radius equals the side of the square covers π/4 of the square's area, so multiplying the fraction of points inside it by 4 gives an approximation of pi. Yiqiao requested a Python function to implement this experiment, with N as the input parameter controlling the sample size.
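A minimal sketch of such a function is shown below. It is not the code written in the session; the name `estimate_pi` and the optional `rng` argument (added so results can be reproduced) are assumptions.

```python
import random


def estimate_pi(n, rng=None):
    """Estimate pi from `n` uniform random points in the unit square.

    A point (x, y) lies inside the quarter-circle of radius 1 when
    x^2 + y^2 <= 1. The fraction of such points approximates pi / 4.
    """
    if rng is None:
        rng = random.Random()
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n


# Seeded generator so repeated runs give the same estimate.
print(estimate_pi(100_000, random.Random(0)))
```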
Monte Carlo Pi Estimation Method
Yiqiao discussed a method to estimate pi using a Monte Carlo approach: sample random points and compare the number that land inside a circle to the total number of points. He demonstrated this in a Colab environment by creating a Python function that approximates pi by sampling points in a quarter circle. Yiqiao noted that a small number of points may not converge well, and started the demonstration with 20 points.
Monte Carlo Sampling Concepts
Yiqiao explained the concept of sampling and its relationship to Monte Carlo simulation, using the circle experiment to demonstrate how increasing the sample size improves the accuracy of the pi estimate. He showed that non-uniform random sampling can lead to incorrect results even with large sample sizes, emphasizing the importance of equal-chance (uniform) random sampling in statistical experiments. Yiqiao also mentioned that the deadline for Project 1 was extended, and encouraged students to reach out if they need further extensions.
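Both effects can be reproduced with a small experiment. The sketch below is an illustration under stated assumptions: the `skewed_point` sampler (which squares each coordinate and so concentrates points near the origin) is a hypothetical stand-in for whatever non-uniform sampler was used in the session, and the sample sizes are arbitrary.

```python
import math
import random


def uniform_point(rng):
    """Every location in the unit square is equally likely."""
    return rng.random(), rng.random()


def skewed_point(rng):
    """Hypothetical non-uniform sampler: squaring pulls points toward (0, 0)."""
    return rng.random() ** 2, rng.random() ** 2


def estimate_pi_with(sampler, n, rng):
    """Quarter-circle estimate of pi using points drawn by `sampler`."""
    inside = 0
    for _ in range(n):
        x, y = sampler(rng)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n


rng = random.Random(0)
for n in (20, 2_000, 200_000):
    uniform_est = estimate_pi_with(uniform_point, n, rng)
    skewed_est = estimate_pi_with(skewed_point, n, rng)
    print(f"N={n:>7}  uniform error={abs(uniform_est - math.pi):.4f}  "
          f"skewed error={abs(skewed_est - math.pi):.4f}")
```

With the uniform sampler the error shrinks as N grows. With the skewed sampler too many points land inside the quarter-circle, so the estimate settles on a value above pi and adding more points does not remove the bias.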
Video: Algorithms of Data Science (Fall 2025) - Session 6a, from the Yiqiao Yin channel
Video information
October 14, 2025, 2:15:01
00:28:15