Should we split first or scale first?? | Data Preprocessing | EDA | Machine Learning.

I take on freelance projects and final-year engineering projects.
I also teach classes: Machine Learning, Data Science, Fundamentals of Python.

Contact me on WhatsApp: 9403027341

You first need to split the data into training and test sets (a validation set can be useful too).

Don't forget that the test data points stand in for real-world data. Feature normalization (or data standardization) of the explanatory (or predictor) variables is a technique that centers and rescales the data by subtracting the mean and dividing by the standard deviation. If you compute the mean and standard deviation over the whole dataset, you leak information from the test set into the training explanatory variables, because those statistics then depend on the test points.
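As a compact reminder of the transformation described above, standardization of a single feature value can be written as follows. The symbols x, z, mu and sigma are notation chosen here for illustration, not taken from the video:

\[
z = \frac{x - \mu}{\sigma}
\]

Here \mu is the feature's mean and \sigma its standard deviation (the square root of the variance). The question of split-then-scale versus scale-then-split comes down to which rows are used to estimate \mu and \sigma.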

Therefore, you should compute the feature normalization statistics on the training data only. Then normalize the test instances as well, but using the mean and standard deviation of the training explanatory variables. This way, we can evaluate whether the model generalizes well to new, unseen data points.
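The split-then-scale order described above can be sketched with scikit-learn roughly as follows. This is a minimal sketch, not the code from the video: the synthetic data, the 75/25 split, and the variable names are all assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only (an assumption, not from the video).
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y = rng.integers(0, 2, size=200)

# Step 1: split first, before any statistics are computed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 2: estimate mean and standard deviation from the training rows only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Step 3: apply the training statistics to the test rows (transform, not fit).
X_test_scaled = scaler.transform(X_test)
```

Calling `fit_transform` on the test set, or fitting the scaler on `X` before splitting, would let the test rows influence the mean and standard deviation, which is the leakage the text warns about.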

Video from the channel Stats_With_Sakhala_ji


September 6, 2023, 22:11:08

00:04:03
