Delta Live Tables — Databricks Tech & Career Talks (January 27, 2022)
Chris Hoshino-Fish introduces Delta Live Tables, an optimized system for data management in the cloud.
With the advent of cloud computing, the software and hardware industry has been developing new paradigms enabling businesses to scale in the age of data. One of the greatest advantages realized was the separation of storage and compute for data; previous data collection projects were constantly constrained by either storage or compute, and enabling independent horizontal scaling of both disrupted the legacy database and data warehousing industries.
However, enterprises have used databases for decades, and over that time thousands of techniques and optimizations were developed for these data systems that depended on coupled storage and compute. Allowing the compute engine to have certain expectations of the storage layer unlocks techniques for optimizing data access patterns. Some of these techniques remain useful in the world of cloud computing, alongside newly developed techniques specific to the cloud.
Delta Live Tables is a new system for data engineering from Databricks that builds upon technologies like Delta Lake, Apache Spark, and Spark’s Structured Streaming. It focuses on incrementally processing data and optimizing the compute system for cost and performance, while actively managing the data created by a business’s data teams. It’s a key component of a data lakehouse, which requires fast data processing to provide data practitioners with real-time data. Additionally, Delta Live Tables provides data quality monitoring capabilities, helping data teams analyze the quality of data and remediate data problems before they can affect data-driven decisions.
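To make the two ideas in this description concrete (incremental processing and data quality monitoring), here is a minimal plain-Python sketch. It is not the Delta Live Tables API and does not reflect how Databricks implements either feature; the class `IncrementalTable`, its `update` method, and the expectation names are hypothetical and exist only for illustration. The sketch assumes an append-only source, tracks how many records it has already seen, and counts records that fail each named quality rule.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class IncrementalTable:
    """Toy illustration only: a table that is updated incrementally
    and records how many rows violate each named quality rule."""
    expectations: Dict[str, Callable[[Record], bool]]
    rows: List[Record] = field(default_factory=list)
    failures: Dict[str, int] = field(default_factory=dict)
    offset: int = 0  # number of source records already processed

    def update(self, source: List[Record]) -> int:
        """Process only the records appended to `source` since the last call."""
        new_records = source[self.offset:]
        for record in new_records:
            failed = [name for name, check in self.expectations.items()
                      if not check(record)]
            for name in failed:
                self.failures[name] = self.failures.get(name, 0) + 1
            if not failed:
                # Only records that pass every rule reach the table.
                self.rows.append(record)
        self.offset = len(source)
        return len(new_records)


# Hypothetical append-only source with one bad record.
source = [{"id": 1, "amount": 10.0}, {"id": None, "amount": 5.0}]
table = IncrementalTable(expectations={
    "valid_id": lambda r: r["id"] is not None,
    "non_negative_amount": lambda r: r["amount"] >= 0,
})

first = table.update(source)       # processes both initial records
source.append({"id": 3, "amount": -2.0})
source.append({"id": 4, "amount": 7.5})
second = table.update(source)      # processes only the two new records
```

After both updates, `table.rows` holds only the records that passed every rule, and `table.failures` gives a per-rule count that a data team could inspect before the data reaches downstream consumers.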
Chris Hoshino-Fish is a lead solutions architect at Databricks. Chris is an active member of the Performance Subject Matter Expert group and a former principal consultant focused on data engineering, working with several Fortune 500 Databricks customers. Prior to Databricks, Chris worked for an adtech company as a data engineer managing pipelines using Apache Spark for 3.5 years. Chris has a B.A. in computational mathematics from the University of California, Santa Cruz.
https://www.ischool.berkeley.edu/events/2022/delta-live-tables
Video: Delta Live Tables — Databricks Tech & Career Talks (January 27, 2022), from the Berkeley School of Information channel
Video information
February 1, 2022, 5:09:03
00:58:27