6. PySpark Interview Task | Deloitte, KPMG, Accenture, PwC, Deutsche Bank Data Engineer Preparation
🚀 Are you preparing for Data Engineering interviews at Deloitte, KPMG, Accenture, PwC, Deutsche Bank, or other top MNCs? This video is a must-watch!

In this session, we cover a real-world PySpark coding task that is frequently asked in Data Engineer interviews. This is essential for candidates aiming for roles in top consulting and banking firms!

🔹 What You’ll Learn in This Video:

✅ Read a CSV file in PySpark
✅ Define Schema for structured data
✅ Create a new full_name column (first_name + last_name)
✅ Apply conditional logic for the address_new field (pin == 1111)
✅ Handle NULL values & update the address column
✅ Use PySpark functions like concat_ws(), when(), col()

📌 This is a common PySpark coding assignment in top MNCs like Deloitte, PwC, Accenture, KPMG, Deutsche Bank, EY, and more! Watch till the end to ace your interview! 💯

⏱️ Timestamps for Quick Navigation

0:00 – Introduction & Why This Is Important
2:35 – Reading a CSV File in PySpark
3:10 – Defining a Schema for the DataFrame
5:49 – Creating the PySpark DataFrame
7:13 – Creating the full_name Column
8:16 – Applying Conditional Logic on address_new
9:37 – Handling NULL Values in the address Column
10:56 – Final Output

🔥 Like, Subscribe, and Hit the Bell Icon 🔔 for more Data Engineering content!

📂 Resources, Code & Dataset:

from pyspark.sql.functions import concat_ws, col, when
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Download the dataset from:
# https://github.com/Rushi21-kesh/YouTube-Question-Dataset/blob/main/data-1.csv
file_path = "dbfs:/FileStore/tables/data-1.csv"

schema = StructType([
    StructField("first_name", StringType(), True),
    StructField("last_name", StringType(), True),
    StructField("pin", IntegerType(), True),
    StructField("address", StringType(), True),
])

df = spark.read.csv(file_path, schema=schema, header=True)

result = (
    df.withColumn("full_name", concat_ws(" ", "first_name", "last_name"))
      # Keep the address only when pin == 1111; otherwise leave it NULL
      .withColumn("address_new", when(col("pin") == 1111, col("address")).otherwise(None))
      # Replace NULL addresses with "Unknown"
      .withColumn("address", when(col("address").isNotNull(), col("address")).otherwise("Unknown"))
)
result.display()  # Databricks-specific; use result.show() outside Databricks

📌 Full PySpark Playlist: https://youtube.com/playlist?list=PLP3N3nYQOEKu0A2-Zzt5c-C7DbdU8uDPe&feature=shared

#pyspark #dataengineering #dataengineeringessentials #bigdata #deloitte #deloittejobs
#kpmg #accentureinterview #deutschebank
#databricks #databrickstutorial #apachespark #etl #sparksql #azuredataengineer #datapipeline
#dataprocessing #python #pythonprogramming #sql #dataanalytics #cloudcomputing #ai
#MachineLearning #SparkStreaming #BigDataAnalytics #TechInterviews #CodingInterview
#PySparkTutorial #PySparkInterviewQuestions #FAANGInterviews #DataScience #GenAI

Video "6. PySpark Interview Task | Deloitte, KPMG, Accenture, PwC, Deutsche Bank Data Engineer Preparation" from the channel The Data Engineering Edge.