
Spark with Python. Operations Supported by Spark RDD API

Spark RDD API Operations introduced in the video:
- Map Transformations.
- Reduce Actions.
- Key-value Pairs.
- Join Transformations.
- Set Operations.

For this, I will use Python 3 and the Enthought Canopy framework.

More about each Spark RDD API Operation in the context of the video:

MAP TRANSFORMATIONS: A map transformation applies a function to every element of an RDD and returns a new RDD; for example, mapping a words RDD to uppercase.
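
A minimal sketch of this transformation, assuming PySpark running in local mode; the sample words, app name, and variable names are illustrative, not taken from the video:

    from pyspark import SparkContext

    sc = SparkContext("local", "map-example")
    words = sc.parallelize(["spark", "python", "rdd"])
    # map() is a transformation: it lazily builds a new RDD
    upper = words.map(lambda w: w.upper())
    # collect() is an action that brings the results back to the driver
    print(upper.collect())  # ['SPARK', 'PYTHON', 'RDD']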

REDUCE ACTIONS: An action is a computation that returns a value after running one or more operations on the dataset. An example of an action is the reduce function, which takes two elements from the dataset at a time and combines them into a single value.
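
A minimal sketch of a reduce action, again assuming PySpark in local mode with made-up sample numbers:

    from pyspark import SparkContext

    sc = SparkContext("local", "reduce-example")
    numbers = sc.parallelize([1, 2, 3, 4, 5])
    # reduce() repeatedly takes two elements and combines them into one value
    total = numbers.reduce(lambda a, b: a + b)
    print(total)  # 15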

KEY-VALUE PAIRS: We can define a key-value pair by using a tuple in the format (key, value).
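
A minimal sketch of working with key-value pairs; the fruit counts below are assumed sample data, not from the video:

    from pyspark import SparkContext

    sc = SparkContext("local", "pairs-example")
    pairs = sc.parallelize([("apple", 2), ("banana", 1), ("apple", 3)])
    # reduceByKey() combines the values that share the same key
    counts = pairs.reduceByKey(lambda a, b: a + b)
    print(counts.collect())  # [('apple', 5), ('banana', 1)] (order may vary)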

JOIN TRANSFORMATIONS: A join transformation takes two datasets and creates another one by joining them by key. We can use leftOuterJoin, rightOuterJoin, and fullOuterJoin to perform specific types of joins; these correspond to the standard SQL join types.
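
A minimal sketch of joining two key-value RDDs; the names, ages, and cities are assumed sample data:

    from pyspark import SparkContext

    sc = SparkContext("local", "join-example")
    ages = sc.parallelize([("alice", 30), ("bob", 25)])
    cities = sc.parallelize([("alice", "Vilnius"), ("carol", "Kaunas")])

    print(ages.join(cities).collect())           # inner join: [('alice', (30, 'Vilnius'))]
    print(ages.leftOuterJoin(cities).collect())  # keeps 'bob', paired with None
    print(ages.fullOuterJoin(cities).collect())  # keeps keys from both sides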

SET OPERATIONS: We can perform common set operations, such as union and intersection, between RDDs. I introduce the intersection and union operations on very simple datasets.
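
A minimal sketch of union and intersection on two small assumed datasets:

    from pyspark import SparkContext

    sc = SparkContext("local", "set-example")
    a = sc.parallelize([1, 2, 3, 4])
    b = sc.parallelize([3, 4, 5, 6])

    print(a.union(b).collect())         # [1, 2, 3, 4, 3, 4, 5, 6] -- union keeps duplicates
    print(a.intersection(b).collect())  # [3, 4] (order may vary)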

These RDD API operations can be used in the same way as standard MapReduce commands on Big Data datasets.

Vytautas Bielinskas

Video "Spark with Python. Operations Supported by Spark RDD API" from the Data Science Garage channel.
Video info: published December 24, 2018, 4:55:41; duration 00:14:31.