Using Apache Arrow, Calcite and Parquet to build a Relational Cache | Dremio
WANT TO EXPERIENCE A TALK LIKE THIS LIVE?
Barcelona: https://www.datacouncil.ai/barcelona
New York City: https://www.datacouncil.ai/new-york-city
San Francisco: https://www.datacouncil.ai/san-francisco
Singapore: https://www.datacouncil.ai/singapore
Download slides for this talk: https://goo.gl/eMWk8i
Everybody wants to get to data faster. As we move from more general solutions to more specific optimization techniques, the performance impact grows. This talk will discuss how in-memory caching, columnar storage and relational caching can be layered to substantially improve the performance of data science and analytical workloads. It will include a detailed overview of how you can use Apache Arrow, Calcite and Parquet to improve performance by multiple orders of magnitude over what is currently possible.
We'll start by talking about in-memory caches and the difference between block-based and data-aware caching strategies. We'll discuss the deployment design of this type of solution and cover the strengths of each strategy. There will also be a discussion of the relationship between security and predicate application in these scenarios. Then we'll go into detail about how columnar storage formats can further enhance performance by minimizing read time, optimizing for vectorized in-memory processing and applying powerful compression techniques.
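To make the columnar point concrete, here is a minimal sketch in plain Python, not taken from the talk. It assumes a tiny made-up dataset (the field names `region`, `status` and `amount` are hypothetical) and uses ordinary lists rather than Arrow or Parquet. It shows why a column layout can cut read work for a single-column query and why sorted, low-cardinality columns compress well under run-length encoding.

```python
from itertools import groupby

# Hypothetical sample rows; the field names are made up for this sketch.
rows = [
    {"region": "EU", "status": "ok", "amount": 10},
    {"region": "EU", "status": "ok", "amount": 20},
    {"region": "US", "status": "ok", "amount": 5},
    {"region": "US", "status": "error", "amount": 7},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


columns = to_columns(rows)

# A query that only needs "amount" scans one contiguous column
# instead of touching every field of every row.
total_amount = sum(columns["amount"])

# Repeated adjacent values collapse into a single (value, count) pair.
encoded_region = rle_encode(columns["region"])
```

Real columnar formats combine several encodings and add per-column statistics; this sketch only illustrates the basic layout idea.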
Lastly, we'll introduce a much more advanced way to speed up access to data, called relational caching. Relational caching builds on columnar in-memory caching techniques but also incorporates a full understanding of how data is being used and how different forms of data relate to each other. This will include leveraging multiple sorting and partitioning strategies as well as maintaining multiple related derivations of data for different types of access patterns. As part of this, we'll also cover approaches to data TTL, relational cache consistency and several different approaches to data mutation and real-time updates.
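The idea of keeping several derivations of the same data, each with a TTL, can be sketched as a toy in plain Python. This is an illustration under stated assumptions, not the speaker's implementation and not how Calcite matches materializations. The class `RelationalCache`, its methods and the sample `sales` rows are all hypothetical. The only aggregate supported is a sum, chosen because a sum can be recomputed correctly from a finer-grained pre-aggregation.

```python
import time
from collections import defaultdict


def aggregate(rows, keys, measure):
    """Sum `measure` over `rows`, grouped by the tuple of `keys`."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[measure]
    return [
        dict(zip(keys, group), **{measure: total})
        for group, total in sorted(totals.items())
    ]


class RelationalCache:
    """Toy cache (hypothetical) holding pre-aggregated derivations with a TTL."""

    def __init__(self, base_rows, measure, clock=time.monotonic):
        self.base_rows = base_rows
        self.measure = measure
        self.clock = clock  # injectable so expiry can be demonstrated
        self.derivations = []  # entries of (keys, rows, expires_at)

    def materialize(self, keys, ttl_seconds):
        """Store a derivation of the base data grouped by `keys`."""
        rows = aggregate(self.base_rows, keys, self.measure)
        self.derivations.append((tuple(keys), rows, self.clock() + ttl_seconds))

    def query(self, keys):
        """Answer from the smallest fresh derivation covering `keys`,
        falling back to the base rows when none qualifies."""
        now = self.clock()
        candidates = [
            (len(rows), d_keys, rows)
            for d_keys, rows, expires_at in self.derivations
            if expires_at > now and set(keys) <= set(d_keys)
        ]
        if candidates:
            _, d_keys, rows = min(candidates, key=lambda c: c[0])
            return aggregate(rows, keys, self.measure), d_keys
        return aggregate(self.base_rows, keys, self.measure), None


# Hypothetical data and a fake clock so the TTL behaviour is deterministic.
sales = [
    {"region": "EU", "product": "a", "amount": 10},
    {"region": "EU", "product": "b", "amount": 20},
    {"region": "US", "product": "a", "amount": 5},
    {"region": "US", "product": "a", "amount": 7},
]
fake_now = [0.0]
cache = RelationalCache(sales, "amount", clock=lambda: fake_now[0])
cache.materialize(["region", "product"], ttl_seconds=60)

# Served by re-aggregating the (region, product) derivation.
result, source = cache.query(["region"])

# After the TTL passes, the same query falls back to the base rows.
fake_now[0] = 61.0
stale_result, stale_source = cache.query(["region"])
```

A production system would also have to handle consistency between derivations and incremental updates, which this sketch leaves out.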
FOLLOW DATA COUNCIL:
Twitter: https://twitter.com/DataCouncilAI
LinkedIn: https://www.linkedin.com/company/datacouncil-ai
Facebook: https://www.facebook.com/datacouncilai
Video "Using Apache Arrow, Calcite and Parquet to build a Relational Cache | Dremio" from the Data Council channel