Open Source Reliability for Data Lake with Apache Spark
Presenter: Michael Armbrust of Delta Lake
Presented at the Bay Area Apache Spark Meetup hosted at LinkedIn in August 2019.
In this talk, he covers:
* What data quality problems Delta helps address
* How to convert your existing application to Delta Lake
* How the Delta Lake transaction protocol works internally
* The Delta Lake roadmap for the next few releases
Bio: Michael Armbrust is a committer and PMC member of Apache Spark and the original creator of Spark SQL. He currently leads the team at Databricks that designed and built Structured Streaming and the Delta Lake open source project. He received his Ph.D. from UC Berkeley in 2013 and was advised by Michael Franklin, David Patterson, and Armando Fox. His thesis focused on building systems that allow developers to rapidly build scalable interactive applications and specifically defined the notion of scale independence. His interests broadly include distributed systems, large-scale structured storage, and query optimization.
Video: Open Source Reliability for Data Lake with Apache Spark, from the LinkedIn Engineering channel