Spark with Python. Operations Supported by Spark RDD API
Spark RDD API Operations introduced in the video:
- Map Transformations.
- Reduce Actions.
- Key-value Pairs.
- Join Transformations.
- Set Operations.
For this I will use Python 3 and the Enthought Canopy environment.
More about each Spark RDD API Operation in the context of the video:
MAP TRANSFORMATIONS: A map transformation applies a function to every element of an RDD and returns a new RDD. In the video, a words RDD is mapped to uppercase.
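A minimal sketch of such a map transformation, assuming an existing PySpark SparkContext is passed in as `sc`. The function name `uppercase_words` and the sample words are illustrative and not taken from the video.

```python
def uppercase_words(sc, words):
    """Map every word of an RDD to uppercase and return the result as a list.

    `sc` is assumed to be an existing pyspark SparkContext.
    """
    # Distribute the local Python list as an RDD.
    words_rdd = sc.parallelize(words)
    # map() is a transformation: it is lazy and returns a new RDD.
    upper_rdd = words_rdd.map(lambda w: w.upper())
    # collect() is an action: it triggers the computation and
    # brings the results back to the driver as a Python list.
    return upper_rdd.collect()
```

With PySpark installed, this could be called as `uppercase_words(SparkContext.getOrCreate(), ["spark", "with", "python"])`, which should return the same words in uppercase.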
REDUCE ACTIONS: An action is a computation that returns a value after running one or more operations on the dataset. An example of an action is the reduce function, which repeatedly combines two elements of the dataset into one with a user-supplied function until a single value remains.
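As a hedged illustration of a reduce action, the sketch below sums a list of numbers. It again assumes an existing SparkContext `sc`; the name `sum_numbers` is hypothetical.

```python
def sum_numbers(sc, numbers):
    """Sum the elements of an RDD with the reduce action.

    `sc` is assumed to be an existing pyspark SparkContext.
    """
    numbers_rdd = sc.parallelize(numbers)
    # reduce() takes a function of two elements and returns one value.
    # The function should be commutative and associative, because Spark
    # may combine partial results from different partitions in any order.
    return numbers_rdd.reduce(lambda a, b: a + b)
```

Addition satisfies both requirements, so the result does not depend on how the data is partitioned.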
KEY-VALUE PAIRS: We can define a key-value pair as a tuple in the format (key, value).
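One common use of (key, value) tuples is counting occurrences per key. The sketch below is an illustration rather than the example from the video: it assumes an existing SparkContext `sc`, and the name `count_words` is hypothetical.

```python
def count_words(sc, words):
    """Count how often each word occurs, using (key, value) tuples.

    `sc` is assumed to be an existing pyspark SparkContext.
    """
    words_rdd = sc.parallelize(words)
    # Turn every word into a key-value pair: (word, 1).
    pairs_rdd = words_rdd.map(lambda w: (w, 1))
    # reduceByKey() merges the values of each key with the given function.
    counts_rdd = pairs_rdd.reduceByKey(lambda a, b: a + b)
    # Collect the (word, count) tuples and return them as a dictionary.
    return dict(counts_rdd.collect())
```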
JOIN TRANSFORMATIONS: A join transformation takes two datasets of key-value pairs and creates a new one by joining them by key. We can use leftOuterJoin, rightOuterJoin, and fullOuterJoin to perform specific types of join. These correspond to the standard SQL join types.
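The four join variants can be compared side by side with a small sketch. It assumes an existing SparkContext `sc`; the function name `join_by_key` and the sample pairs in the usage note are made up for illustration.

```python
def join_by_key(sc, left_pairs, right_pairs):
    """Join two lists of (key, value) tuples with each RDD join variant.

    `sc` is assumed to be an existing pyspark SparkContext.
    Returns a dict mapping the method name to its key-sorted result.
    """
    left = sc.parallelize(left_pairs)
    right = sc.parallelize(right_pairs)

    def by_key(rdd):
        # Spark does not guarantee the order of join results,
        # so sort by key to get a stable list.
        return sorted(rdd.collect(), key=lambda kv: kv[0])

    return {
        # Inner join: only keys present in both RDDs.
        "join": by_key(left.join(right)),
        # All keys of the left RDD; missing right values become None.
        "leftOuterJoin": by_key(left.leftOuterJoin(right)),
        # All keys of the right RDD; missing left values become None.
        "rightOuterJoin": by_key(left.rightOuterJoin(right)),
        # All keys of both RDDs; missing values on either side become None.
        "fullOuterJoin": by_key(left.fullOuterJoin(right)),
    }
```

For example, joining `[("a", 1), ("b", 2)]` with `[("a", "x"), ("c", "y")]` should give `[("a", (1, "x"))]` for the inner join, while the left outer join also keeps `("b", (2, None))` and the right outer join also keeps `("c", (None, "y"))`.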
SET OPERATIONS: We can perform common set operations such as union and intersection between RDDs. I introduce the intersection and union operations on very simple datasets.
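A short sketch of both set operations, assuming an existing SparkContext `sc`. The function name `set_operations` is hypothetical, and the results are sorted only to make them easy to compare.

```python
def set_operations(sc, first, second):
    """Return the union and the intersection of two lists via RDDs.

    `sc` is assumed to be an existing pyspark SparkContext.
    """
    first_rdd = sc.parallelize(first)
    second_rdd = sc.parallelize(second)
    # union() keeps duplicates: elements present in both RDDs appear twice.
    union = sorted(first_rdd.union(second_rdd).collect())
    # intersection() returns only the elements found in both RDDs,
    # without duplicates.
    intersection = sorted(first_rdd.intersection(second_rdd).collect())
    return union, intersection
```

For `[1, 2, 3]` and `[3, 4]`, the union should be `[1, 2, 3, 3, 4]` and the intersection `[3]`. Chaining `distinct()` after `union()` would remove the repeated element if a true set union is wanted.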
These RDD API operations can be applied to Big Data datasets in the same way as standard MapReduce commands.
Vytautas Bielinskas
Video "Spark with Python. Operations Supported by Spark RDD API" from the Data Science Garage channel