Processing Large XML Wikipedia Dumps that won't fit in RAM in Python without Spark
Python's ElementTree module lets you read an XML file of any size, limited only by the time you have to process it. A DOM parser loads the whole document into memory. ElementTree, by contrast, can parse the file incrementally, so the entire XML document never needs to be loaded at once. This video shows how all of Wikipedia can be processed in Python without a large amount of RAM.
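Below is a minimal sketch of the streaming idea. It assumes Python 3's standard-library `xml.etree.ElementTree.iterparse` and the `<page>`/`<title>` layout of MediaWiki export files. The helper names `local_name` and `iter_page_titles` are hypothetical, and this is not necessarily the code in the video's notebook. It reads one `<page>` at a time, yields its title, and then discards the finished subtree.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a '{namespace}' prefix: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_page_titles(source):
    """Yield the <title> text of every <page> in a MediaWiki-style XML dump.

    `source` is a filename or binary file object. iterparse reads it
    incrementally, so roughly one <page> subtree is held in memory at a time.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is 'start' for the root element
    for event, elem in context:
        # An 'end' event means the element and all its children are complete.
        if event == "end" and local_name(elem.tag) == "page":
            title = None
            for child in elem:
                if local_name(child.tag) == "title":
                    title = child.text
                    break
            yield title
            # Detach every finished child of the root so memory does not grow.
            root.clear()


# Hypothetical miniature dump standing in for a real multi-gigabyte file.
sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <siteinfo><sitename>Wikipedia</sitename></siteinfo>
  <page><title>Python</title><ns>0</ns><revision><text>wiki markup</text></revision></page>
  <page><title>XML</title><ns>0</ns><revision><text>wiki markup</text></revision></page>
</mediawiki>"""

for title in iter_page_titles(io.BytesIO(sample)):
    print(title)  # prints "Python", then "XML"
```

The key design choice is clearing the root after each page. Without that step, every parsed `<page>` would stay attached to the tree, and memory use would grow with the file. For a compressed dump such as a `.xml.bz2` file, you can pass the file object returned by `bz2.open(path, "rb")` in place of a filename.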
My blog post for this video:
https://www.heatonresearch.com/2017/03/03/python-basic-wikipedia-parsing.html
The code for this video can be found here:
https://github.com/jeffheaton/present/blob/master/youtube/read_wikipedia.ipynb
Video "Processing Large XML Wikipedia Dumps that won't fit in RAM in Python without Spark" from the Jeff Heaton channel
Video information
Published: September 18, 2019, 22:00:05
Duration: 00:17:23