6. PySpark Interview Task | Deloitte, KPMG, Accenture, PwC, Deutsche Bank Data Engineer Preparation
🚀 Are you preparing for Data Engineering interviews at Deloitte, KPMG, Accenture, PwC, Deutsche Bank, or other top MNCs? This video is a must-watch!
In this session, we cover a real-world PySpark coding task that is frequently asked in Data Engineer interviews. This is essential for candidates aiming for roles in top consulting and banking firms!
🔹 What You’ll Learn in This Video:
✅ Read a CSV file in PySpark
✅ Define Schema for structured data
✅ Create a new full_name column (first_name + last_name)
✅ Apply conditional logic for the address_new field (pin == 1111)
✅ Handle NULL values & update the address column
✅ Use PySpark functions like concat_ws(), when(), col()
📌 This is a common PySpark coding assignment in top MNCs like Deloitte, PwC, Accenture, KPMG, Deutsche Bank, EY, and more! Watch till the end to ace your interview! 💯
⏱️ Timestamps for Quick Navigation
0:00 – Introduction & Why This is Important
2:35 – Reading the CSV File in PySpark
3:10 – Defining Schema for DataFrame
5:49 – Creating the PySpark DataFrame
7:13 – Creating full_name Column
8:16 – Applying Conditional Logic on address_new
9:37 – Handling NULL Values in address Column
10:56 – Final Output
🔥 Like, Subscribe, and Hit the Bell Icon 🔔 for more Data Engineering content!
📂 Resources & Code:
📌 Code & Dataset:
from pyspark.sql.functions import concat_ws, col, when
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Dataset: https://github.com/Rushi21-kesh/YouTube-Question-Dataset/blob/main/data-1.csv
file_path = "dbfs:/FileStore/tables/data-1.csv"

schema = StructType([
    StructField('first_name', StringType(), True),
    StructField('last_name', StringType(), True),
    StructField('pin', IntegerType(), True),
    StructField('address', StringType(), True)
])

df = spark.read.csv(file_path, schema=schema, header=True)

result = (df
    # full_name = first_name + ' ' + last_name
    .withColumn('full_name', concat_ws(' ', 'first_name', 'last_name'))
    # address_new keeps the address only when pin == 1111, otherwise NULL
    .withColumn('address_new', when(col('pin') == 1111, col('address')).otherwise(None))
    # replace NULL addresses with "Unknown"
    .withColumn('address', when(col('address').isNotNull(), col('address')).otherwise('Unknown')))

result.display()
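To build intuition before running this on a cluster, the same row-level logic can be sketched in plain Python. The sample rows below are hypothetical, made up for illustration; they are not taken from the actual data-1.csv:

```python
# Plain-Python sketch of the row-level transformations above.
# Sample rows are hypothetical, not from data-1.csv.
rows = [
    {"first_name": "Asha", "last_name": "Rao", "pin": 1111, "address": "Pune"},
    {"first_name": "Ben", "last_name": "Lee", "pin": 2222, "address": None},
]

def transform(row):
    out = dict(row)
    # full_name: like concat_ws(' ', first_name, last_name) — skips empty parts
    out["full_name"] = " ".join(
        p for p in (row["first_name"], row["last_name"]) if p
    )
    # address_new: keep the address only when pin == 1111, else NULL
    out["address_new"] = row["address"] if row["pin"] == 1111 else None
    # address: replace NULL with "Unknown"
    out["address"] = row["address"] if row["address"] is not None else "Unknown"
    return out

result = [transform(r) for r in rows]
```

Note that, as in the PySpark version, address_new is derived before the NULL fill, so a NULL address stays NULL in address_new even when the pin matches.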
📌 Full PySpark Playlist: https://youtube.com/playlist?list=PLP3N3nYQOEKu0A2-Zzt5c-C7DbdU8uDPe&feature=shared
#pyspark #dataengineering #dataengineeringessentials #bigdata #deloitte #deloittejobs
#kpmg #accentureinterview #deutschebank
#databricks #databrickstutorial #apachespark #etl #sparksql #azuredataengineer #datapipeline
#dataprocessing #python #pythonprogramming #sql #sparksql #dataanalytics #cloudcomputing #ai
#MachineLearning #SparkStreaming #BigDataAnalytics #TechInterviews #CodingInterview
#PySparkTutorial #PySparkInterviewQuestions #FAANGInterviews #DataScience #GenAI
Video: "6. PySpark Interview Task | Deloitte, KPMG, Accenture, PwC, Deutsche Bank Data Engineer Preparation" from the channel The Data Engineering Edge
Video information
Published: February 21, 2025, 17:02:18
Duration: 00:11:44