Extending Pandas using Apache Arrow and Numba - Uwe L Korn
PyData Berlin 2018
The latest release of Pandas introduced the ability to extend it with custom dtypes. Using Apache Arrow as the in-memory storage and Numba for fast, vectorized computations on these memory regions, it is possible to extend Pandas in pure Python while achieving the same performance as the built-in types. In the talk, we implement a native string type as an example.
Slides: https://pydata.org/berlin2018/proposals/124/
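To give a rough idea of what "vectorized computations on these memory regions" can look like for a string type, here is a minimal sketch. It is not the code from the talk. It assumes the Arrow-style string layout (one int32 offsets buffer plus one contiguous UTF-8 data buffer) and builds those buffers by hand with NumPy. The helper names `to_arrow_layout`, `byte_lengths` and `startswith_byte` are hypothetical, and null handling (Arrow's validity bitmap) is left out. If Numba is not installed, the loops simply run as plain Python.

```python
import numpy as np

try:
    from numba import njit  # compiles the loops below when Numba is available
except ImportError:
    def njit(func):  # fallback so the sketch still runs without Numba
        return func


def to_arrow_layout(strings):
    """Hypothetical helper: pack Python strings into Arrow-style buffers.

    Returns (offsets, data): string i occupies data[offsets[i]:offsets[i + 1]].
    """
    encoded = [s.encode("utf-8") for s in strings]
    offsets = np.zeros(len(encoded) + 1, dtype=np.int32)
    offsets[1:] = np.cumsum([len(b) for b in encoded])
    data = np.frombuffer(b"".join(encoded), dtype=np.uint8)
    return offsets, data


@njit
def byte_lengths(offsets):
    # Length in bytes of each string; only the offsets buffer is needed.
    out = np.empty(len(offsets) - 1, dtype=np.int32)
    for i in range(len(out)):
        out[i] = offsets[i + 1] - offsets[i]
    return out


@njit
def startswith_byte(offsets, data, byte):
    # True where a string is non-empty and its first byte equals `byte`.
    out = np.zeros(len(offsets) - 1, dtype=np.bool_)
    for i in range(len(out)):
        start = offsets[i]
        if offsets[i + 1] > start and data[start] == byte:
            out[i] = True
    return out


offsets, data = to_arrow_layout(["pandas", "", "arrow", "numba"])
lengths = byte_lengths(offsets)
starts_with_a = startswith_byte(offsets, data, ord("a"))
```

Because every string lives in one contiguous buffer, both loops touch only flat arrays and never create Python string objects, which is the property that makes this layout a good fit for a compiler like Numba.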
---
www.pydata.org
PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
00:00 Welcome!
00:10 Help us add time stamps or captions to this video! See the description for details.
Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVideoTimestamps