Question 6: #Interview questions on #joins #groupby in pyspark #insurance #aggregates

In the insurance domain, you have two PySpark DataFrames:

Policies:

Columns: policy_id (int), customer_id (int), policy_type (string), coverage_amount (double), start_date (date), end_date (date), premium_amount (double)
Sample Data:
policy_id,customer_id,policy_type,coverage_amount,start_date,end_date,premium_amount
1,101,Life,500000.0,2022-01-01,2023-01-01,1000.0
2,102,Health,100000.0,2022-02-01,2023-02-01,1200.0
3,103,Auto,75000.0,2022-03-01,2023-03-01,800.0
4,104,Life,300000.0,2022-04-01,2023-04-01,900.0
5,105,Health,150000.0,2022-05-01,2023-05-01,1100.0
6,106,Auto,100000.0,2022-06-01,2023-06-01,700.0
7,107,Life,700000.0,2022-07-01,2023-07-01,1500.0
8,108,Health,200000.0,2022-08-01,2023-08-01,1300.0
9,109,Auto,50000.0,2022-09-01,2023-09-01,600.0
10,110,Life,400000.0,2022-10-01,2023-10-01,1200.0

Payments:

Columns: payment_id (int), policy_id (int), payment_date (date), payment_amount (double)

Sample Data:
payment_id,policy_id,payment_date,payment_amount
1,1,2022-06-15,300.0
2,2,2022-08-20,400.0
3,1,2022-07-01,250.0
4,3,2022-04-15,200.0
5,4,2022-05-01,350.0
6,5,2022-10-10,450.0
7,6,2022-06-30,180.0
8,7,2022-07-15,600.0
9,8,2022-09-05,300.0
10,9,2022-10-20,120.0

Your task is to find the policy type with the highest average payment amount per customer. Write PySpark code that performs the following steps:

a. Join the Policies and Payments DataFrames on the policy_id column.

b. Group the resulting DataFrame by policy_type and customer_id.

c. Calculate the average payment amount per customer for each policy type.

d. Find the policy type with the highest average payment amount per customer.

e. Display the final result with columns policy_type, customer_id, max_average_payment, and average_payment_amount.

#pyspark #bigdata #interview #learnprogramming #pysparktutorial #bigdatatechnologies #interviewquestions #bigdatainterview #databricks

Video "Question 6: #Interview questions on #joins #groupby in pyspark #insurance #aggregates" from the pysparkpulse channel.