Get Ready for ML! Level Up Your Data Lake with Delta and lakeFS | Treeverse
A data lake is primarily two things: an object store and the objects being stored. Even with the most basic setup, data lakes can support BI, machine learning, and operational analytics use cases. This versatility speaks to the strength of object stores, particularly how readily they integrate with a diverse set of data processing engines.
As data lakes exploded in adoption, a number of improvements were made to the first architectures. The first and most obvious improvement was to file formats, which led to the development of analytics-optimized formats like Parquet, and eventually Modern Table Formats like Delta Lake. An even newer improvement has been the emergence of Data Source Control tools like lakeFS that bring new levels of manageability across an entire lake! In this talk, we’ll cover how to incorporate these technologies into your data lake, and how they simplify workflows critical to ML experimentation, deployment of datasets, and more!
ABOUT THE SPEAKERS
ADI POLAK
Adi is an open-source technologist who believes in communities and is passionate about building a better world through open collaboration. As Vice President of Developer Experience at Treeverse, Adi helps build lakeFS, a Git-like interface for the data lakehouse. In her work, she brings her vast industry research and engineering experience to bear in educating and helping teams design, architect, and build cost-effective data systems and machine learning pipelines that emphasize scalability, expertise, and business goals. Adi is a frequent worldwide presenter and the author of O'Reilly's upcoming book, "Machine Learning With Apache Spark." Adi is also a proud Beacon for Databricks! Previously, she was a senior manager for Azure at Microsoft, where she focused on building advanced analytics systems and modern architectures.
When Adi isn’t building data pipelines or thinking up new software architecture, you can find her on the local cultural scene or at the beach.
PAUL SINGMAN
Paul is a developer advocate for the lakeFS project, after several years on the analytics team at Equinox Fitness. His goal is to democratize big data analytics by explaining data architectures that are both user-friendly and cost-effective. He's spoken at various conferences and meetups, including the Postgres Conference NYC and AWS re:Invent. When not working, you can find him drinking tea and playing golf.
ABOUT DATA COUNCIL:
Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers. Make sure to subscribe to our channel for more videos, including DC_THURS, our series of live online interviews with leading data professionals from top open source projects and startups.
FOLLOW DATA COUNCIL:
Twitter: https://twitter.com/DataCouncilAI
LinkedIn: https://www.linkedin.com/company/datacouncil-ai/
Eventbrite: https://www.eventbrite.com/o/data-council-30357384520