Parallel table ingestion with a Spark Notebook (PySpark + Threading)
If we want a single Apache Spark notebook to process a list of tables, the code is easy to write. A simple loop over the list, though, runs one table after another (sequentially). If none of the tables are very big, it is quicker to have Spark load them concurrently (in parallel) using multithreading. There are several ways to do this; I am sharing the easiest one I have found when working with a PySpark notebook in Databricks, Azure Synapse Spark, Jupyter, or Zeppelin.
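To make the idea concrete, here is a minimal sketch of running a per-table load function on a pool of threads. It is not necessarily the code from the video or the written tutorial; the names `ingest_tables`, `fake_load`, and the schema names in the comment are hypothetical, and the stand-in loader only returns a string so the example runs without a Spark cluster.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def ingest_tables(tables, load_table, max_workers=4):
    """Run load_table(name) for every table name on a pool of threads.

    Returns (succeeded, failed): succeeded maps table name -> return value,
    failed maps table name -> the exception that was raised.
    """
    succeeded, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per table and remember which future belongs to which table.
        futures = {pool.submit(load_table, name): name for name in tables}
        for future in as_completed(futures):
            name = futures[future]
            try:
                succeeded[name] = future.result()
            except Exception as exc:  # keep going if one table fails
                failed[name] = exc
    return succeeded, failed


def fake_load(name):
    # Stand-in for the real work (assumption). In a notebook this might be:
    #   spark.read.table(f"raw.{name}").write.mode("overwrite").saveAsTable(f"bronze.{name}")
    # where `spark` is the notebook's SparkSession and the schema names are hypothetical.
    if name == "bad_table":
        raise ValueError(f"could not load {name}")
    return f"loaded {name}"


if __name__ == "__main__":
    ok, errors = ingest_tables(["customers", "orders", "bad_table"], fake_load, max_workers=3)
    print(sorted(ok))
    print(sorted(errors))
```

Threads are enough here because the driver threads mostly wait while the cluster does the work, and Spark accepts jobs submitted from several threads of the same session. Collecting failures per table, rather than letting the first exception stop everything, is a design choice of this sketch; a sensible `max_workers` depends on cluster size and table sizes.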
Written tutorial and links to code:
https://dustinvannoy.com/2022/05/06/parallel-ingest-spark-notebook/
More from Dustin:
Website: https://dustinvannoy.com
LinkedIn: https://www.linkedin.com/in/dustinvannoy
Twitter: https://twitter.com/dustinvannoy
Github: https://github.com/datakickstart
CHAPTERS:
0:00 Intro and Use Case
1:05 Code example single thread
4:36 Code example multithreaded
7:15 Demo run - Databricks
8:46 Demo run - Azure Synapse
11:48 Outro
Video "Parallel table ingestion with a Spark Notebook (PySpark + Threading)" from the Dustin Vannoy channel
Video information
May 6, 2022, 19:20:42
00:12:33