Outlier detection and removal: z score, standard deviation | Feature engineering tutorial python # 3

If a dataset follows a normal distribution, then we can use a threshold of 3 or more standard deviations from the mean to spot outliers. Many times these are legitimate values, and whether you remove them really depends on the situation. But removing outliers can significantly improve a machine learning model, so it is recommended that you treat outliers before building one. A Z score indicates how many standard deviations away from the mean a given sample is. We are going to go through all this theory and write Python code to remove outliers from a heights dataset that I have taken from Kaggle.
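To make the two approaches concrete, here is a minimal sketch, not the notebook's actual code. It assumes synthetic, roughly normal heights in inches instead of the Kaggle CSV, uses only the Python standard library, and takes a cutoff of 3 as the threshold. The names `heights`, `z_score`, `kept_by_std` and `kept_by_z` are made up for illustration.

```python
import random
import statistics

# Synthetic stand-in for a height column (inches); this is NOT the Kaggle data.
random.seed(0)
heights = [random.gauss(66.0, 3.8) for _ in range(10_000)]
heights += [40.0, 95.0]  # two hand-planted extreme values

mean = statistics.mean(heights)
std = statistics.stdev(heights)  # sample standard deviation

# Method 1: keep values inside mean +/- 3 standard deviations.
lower, upper = mean - 3 * std, mean + 3 * std
kept_by_std = [h for h in heights if lower <= h <= upper]


# Method 2: compute each value's z score and keep those with |z| <= 3.
def z_score(value):
    """Number of standard deviations `value` lies from the mean."""
    return (value - mean) / std


kept_by_z = [h for h in heights if abs(z_score(h)) <= 3]

print(f"rows before: {len(heights)}")
print(f"rows after 3-std filter: {len(kept_by_std)}")
print(f"rows after z-score filter: {len(kept_by_z)}")
```

Both filters express the same rule, since |z| <= 3 is just another way of writing "within 3 standard deviations of the mean". The cutoff of 3 is a convention rather than a law, so a stricter or looser value may suit a given dataset better.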

Link for kaggle dataset: https://www.kaggle.com/mustafaali96/weight-height

Code & Exercise: https://github.com/codebasics/py/blob/master/ML/FeatureEngineering/2_outliers_z_score/2_outliers_z_score.ipynb
CSV file for exercise: https://github.com/codebasics/py/tree/master/ML/FeatureEngineering/2_outliers_z_score/Exercise

Topics
00:00 Introduction
00:20 Exploratory analysis on a kaggle dataset
01:14 Plot histogram and bell curve
06:30 Use 3 standard deviation to remove outliers
12:14 Use Z score to remove outliers
17:39 Exercise

Website: http://codebasicshub.com/
Facebook: https://www.facebook.com/codebasicshub
Twitter: https://twitter.com/codebasicshub

Video "Outlier detection and removal: z score, standard deviation | Feature engineering tutorial python # 3" from the codebasics channel
Video information
May 28, 2020, 17:30:01
00:20:05