Question 6: #Interview questions on #joins #groupby in pyspark #insurance #aggregates
In the insurance domain, you have two PySpark DataFrames:
Policies:
Columns: policy_id (int), customer_id (int), policy_type (string), coverage_amount (double), start_date (date), end_date (date), premium_amount (double)
Sample Data:
policy_id,customer_id,policy_type,coverage_amount,start_date,end_date,premium_amount
1,101,Life,500000.0,2022-01-01,2023-01-01,1000.0
2,102,Health,100000.0,2022-02-01,2023-02-01,1200.0
3,103,Auto,75000.0,2022-03-01,2023-03-01,800.0
4,104,Life,300000.0,2022-04-01,2023-04-01,900.0
5,105,Health,150000.0,2022-05-01,2023-05-01,1100.0
6,106,Auto,100000.0,2022-06-01,2023-06-01,700.0
7,107,Life,700000.0,2022-07-01,2023-07-01,1500.0
8,108,Health,200000.0,2022-08-01,2023-08-01,1300.0
9,109,Auto,50000.0,2022-09-01,2023-09-01,600.0
10,110,Life,400000.0,2022-10-01,2023-10-01,1200.0
Payments:
Columns: payment_id (int), policy_id (int), payment_date (date), payment_amount (double)
Sample Data:
payment_id,policy_id,payment_date,payment_amount
1,1,2022-06-15,300.0
2,2,2022-08-20,400.0
3,1,2022-07-01,250.0
4,3,2022-04-15,200.0
5,4,2022-05-01,350.0
6,5,2022-10-10,450.0
7,6,2022-06-30,180.0
8,7,2022-07-15,600.0
9,8,2022-09-05,300.0
10,9,2022-10-20,120.0
Your task is to find the policy type with the highest average payment amount per customer. Write PySpark code that performs the following steps (one possible implementation is sketched after the list):
a. Join the Policies and Payments DataFrames on the policy_id column.
b. Group the resulting DataFrame by policy_type and customer_id.
c. Calculate the average payment amount per customer for each policy type.
d. Find the policy type with the highest average payment amount per customer.
e. Display the final result with columns policy_type, customer_id, max_average_payment, and average_payment_amount.
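A minimal sketch of one way to implement steps a–e, assuming a local SparkSession and using a window over policy_type to surface the highest per-customer average. The inline rows are just a small subset of the sample data above, and dates are kept as strings for brevity; column and variable names other than the ones given in the task are illustrative.

```python
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("InsurancePayments").getOrCreate()

# Subset of the Policies sample data (dates left as strings for brevity).
policies = spark.createDataFrame(
    [(1, 101, "Life",   500000.0, "2022-01-01", "2023-01-01", 1000.0),
     (2, 102, "Health", 100000.0, "2022-02-01", "2023-02-01", 1200.0),
     (3, 103, "Auto",    75000.0, "2022-03-01", "2023-03-01",  800.0)],
    ["policy_id", "customer_id", "policy_type", "coverage_amount",
     "start_date", "end_date", "premium_amount"],
)

# Subset of the Payments sample data.
payments = spark.createDataFrame(
    [(1, 1, "2022-06-15", 300.0),
     (2, 2, "2022-08-20", 400.0),
     (3, 1, "2022-07-01", 250.0),
     (4, 3, "2022-04-15", 200.0)],
    ["payment_id", "policy_id", "payment_date", "payment_amount"],
)

# a. Join Policies and Payments on policy_id.
joined = policies.join(payments, on="policy_id", how="inner")

# b + c. Group by policy_type and customer_id and average the payments.
avg_per_customer = (
    joined.groupBy("policy_type", "customer_id")
          .agg(F.avg("payment_amount").alias("average_payment_amount"))
)

# d. For each policy type, compute the highest per-customer average.
w = Window.partitionBy("policy_type")
ranked = avg_per_customer.withColumn(
    "max_average_payment", F.max("average_payment_amount").over(w)
)

# e. Keep the customer(s) that reach that maximum and sort so the policy
#    type with the single highest average appears first.
result = (
    ranked.filter(F.col("average_payment_amount") == F.col("max_average_payment"))
          .select("policy_type", "customer_id",
                  "max_average_payment", "average_payment_amount")
          .orderBy(F.col("max_average_payment").desc())
)

result.show()
```

On the subset above this prints Health (customer 102) first, since its per-customer average payment of 400.0 is the highest; with the full sample data the same code ranks every policy type by its best per-customer average.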
#pyspark #bigdata #interview #learnprogramming #pysparktutorial #bigdatatechnologies #interviewquestions #bigdatainterview #databricks